RU2782182C1

RU2782182C1 - Audio encoder with signal-dependent precision and number control, audio decoder and related methods and computer programs

Info

Publication number: RU2782182C1
Application number: RU2022100599A
Authority: RU
Inventors: Ян БЮТЕ; Маркус ШНЕЛЛЬ; Штефан ДЁЛА; Бернхард ГРИЛЛ; Мартин ДИТЦ
Original assignee: Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.
Priority date: 2019-06-17
Filing date: 2020-06-10
Publication date: 2022-10-21

Abstract

FIELD: computer technology.

SUBSTANCE: the invention relates to computer technology for encoding audio data. The effect is achieved by pre-processing input audio data to obtain audio data to be encoded; encoding the audio data to be encoded; and controlling the encoding such that, depending on a first signal characteristic of a first frame of the audio data to be encoded, the number of audio data items to be encoded for the first frame is reduced compared to a second frame having a second signal characteristic, and a first number of information units used to encode the reduced number of audio data items for the first frame is increased more strongly compared to a second number of information units for the second frame.

EFFECT: improved efficiency with respect to bit rate and improved quality of audio data playback.

35 cl, 14 dwg, 1 tbl

Description

The present invention relates to audio signal processing and, in particular, to audio encoders/decoders that apply signal-dependent control of precision and number.

Modern transform-based audio encoders apply a sequence of psychoacoustically motivated processing steps to the spectral representation of an audio segment (a frame) to obtain a residual spectrum. This residual spectrum is quantized, and the coefficients are encoded using entropy coding.

In this process, the quantization step size, which is usually controlled via a global gain, has a direct impact on the bit consumption of the entropy encoder and must be chosen to meet the bit budget, which is usually limited and often fixed. Because the bit consumption of the entropy encoder, and of an arithmetic encoder in particular, is not known exactly before encoding, the optimal global gain can only be calculated in a closed-loop iteration of quantization and encoding. However, this is not feasible under certain complexity constraints, because arithmetic coding entails significant computational complexity.

Prior art encoders, such as those in the 3GPP EVS codec, therefore include a bit consumption estimator for deriving a first global gain estimate, which typically operates on the power spectrum of the residual signal. Depending on the complexity constraint, this may be followed by a rate loop that refines the first estimate. Using such an estimate alone, or in combination with a very limited correction capability, reduces complexity but also reduces accuracy, resulting in significant under- or overestimation of the bit consumption.

Overestimating the bit consumption leaves surplus bits after the first coding stage. Prior art encoders use them to refine the quantization of the coded coefficients in a second coding stage called "residual coding". Residual coding is fundamentally different from the first coding stage because it operates at bit granularity and therefore does not involve entropy coding. Moreover, residual coding is typically applied only at frequencies with non-zero quantized values, leaving dead zones that are not improved further.

Underestimating the bit consumption, on the other hand, inevitably leads to a partial loss of spectral coefficients, usually those at the highest frequencies. In prior art encoders, this effect is mitigated by applying noise substitution in the decoder, which is based on the assumption that high-frequency content is typically noisy.

In this configuration, it is clearly desirable to encode as much of the signal as possible in the first coding stage, which uses entropy coding and is therefore more efficient than the residual coding stage. Consequently, one would want to choose the global gain so that the bit estimate is as close as possible to the available bit budget. Although the power-spectrum-based estimator works well for most audio content, it can cause problems for highly tonal signals, in which the first-stage estimate is based mainly on irrelevant side lobes of the filterbank frequency decomposition, while important components are lost due to underestimation of the bit consumption.

The object of the present invention is to provide an improved concept for audio encoding or decoding that is nevertheless efficient and achieves good audio quality.

This object is achieved by the audio encoder of claim 1, the method of encoding input audio data of claim 33, the audio decoder of claim 35, the method of decoding encoded audio data of claim 41, or the computer program of claim 42.

The present invention is based on the finding that, in order to increase efficiency, in particular with respect to bit rate on the one hand and audio quality on the other hand, a signal-dependent deviation is required from the typical situation that is determined by psychoacoustic considerations. Conventional psychoacoustic models or psychoacoustic considerations lead to good audio quality at a low bit rate for all signal classes on average, i.e. for all audio signal frames irrespective of their signal characteristics, when an average result is considered. However, it has been found that for certain signal classes, or for signals having certain signal characteristics, such as fairly tonal signals, a simple psychoacoustic model or simple psychoacoustic control of the encoder leads only to sub-optimal results with respect to audio quality (when the bit rate is kept constant) or with respect to bit rate (when the audio quality is kept constant).

Therefore, to overcome this shortcoming of conventional psychoacoustic considerations, the present invention provides, in the context of an audio encoder with a preprocessor for pre-processing input audio data to obtain audio data to be encoded and an encoder processor for encoding the audio data to be encoded, a controller for controlling the encoder processor in such a way that, depending on a certain signal characteristic of a frame, the number of audio data items of the audio data to be encoded by the encoder processor is reduced compared to the typical straightforward results obtained from prior art psychoacoustic considerations. In addition, this reduction in the number of audio data items is performed in a signal-dependent manner, so that for a frame with a certain first signal characteristic the number is reduced more strongly than for another frame with a signal characteristic that differs from that of the first frame. This reduction in the number of audio data items may be regarded as a reduction of the absolute number or of the relative number, although this is not decisive. What matters is that the information units that are "saved" by intentionally reducing the number of audio data items are not simply lost, but are used to encode the remaining data items more precisely, i.e. the data items that were not eliminated by the intentional reduction.

According to the invention, the controller for controlling the encoder processor operates in such a way that, depending on a first signal characteristic of a first frame of the audio data to be encoded, the number of audio data items to be encoded by the encoder processor for the first frame is reduced compared to a second frame having a second signal characteristic, and at the same time a first number of information units used to encode the reduced number of audio data items for the first frame is increased more strongly compared to a second number of information units for the second frame.

In a preferred embodiment, the reduction is performed such that for more tonal signal frames a stronger reduction is applied and, at the same time, the number of bits for the individual lines is increased more strongly than for a frame that is less tonal, i.e. noisier. For the noisier frame, the number of items is not reduced to such a degree and, correspondingly, the number of information units used to encode the less tonal audio data items is not increased as much.

The present invention provides a framework in which, in a signal-dependent manner, the psychoacoustic considerations that are normally applied are violated to a greater or lesser degree. This violation, however, is not treated as in conventional encoders, where psychoacoustic considerations are violated only in an emergency, for example when higher-frequency portions are set to zero in order to maintain the required bit rate. Instead, in accordance with the present invention, such a violation of the normal psychoacoustic considerations is carried out irrespective of any emergency, and the "saved" information units are used to further refine the "surviving" audio data items.

In preferred embodiments, a two-stage encoder processor is used whose initial coding stage is, for example, an entropy encoder such as an arithmetic encoder, or a variable-length encoder such as a Huffman encoder. The second coding stage serves as a detail (refinement) stage. In preferred embodiments, this second encoder is typically implemented as a residual encoder or bit encoder operating at bit granularity, which can be realized, for example, by adding a certain predefined offset when an information unit has a first value and subtracting the offset when the information unit has the opposite value. In an embodiment, this detail encoder is preferably implemented as a residual encoder that adds an offset for a first bit value and subtracts the offset for a second bit value. In a preferred embodiment, reducing the number of audio data items changes the distribution of the available bits in a typical fixed frame rate scenario such that the initial coding stage receives a lower bit budget than the detail coding stage. Until now, the paradigm has been that the initial coding stage should receive a bit budget that is as high as possible regardless of the signal characteristic, since the initial coding stage, such as an arithmetic coding stage, was considered to have the highest efficiency and therefore to encode much better than a residual coding stage from an entropy point of view. In accordance with the present invention, however, this paradigm is abandoned, because it has been found that for certain signals, such as signals with higher tonality, the efficiency of an entropy encoder such as an arithmetic encoder is not as high as the efficiency obtained by a subsequently connected residual encoder such as a bit encoder. While the entropy coding stage is indeed highly efficient for audio signals on average, the present invention addresses this problem by not relying on the average, and instead reducing the bit budget for the initial coding stage in a signal-dependent manner, preferably for tonal signal portions.

In a preferred embodiment, the bit budget is shifted from the initial coding stage to the detail coding stage, based on the signal characteristic of the input data, in such a way that at least two detail information units are available for at least one, preferably 50%, and even more preferably all of the audio data items that survive the reduction in the number of data items. Furthermore, it has been found that a very efficient procedure for calculating these detail information units on the encoder side and applying them on the decoder side is an iterative procedure in which the remaining bits of the bit budget for the detail coding stage are consumed one after another in a certain order, for example from low frequency to high frequency. Depending on the number of surviving audio data items and on the number of information units for the detail coding stage, the number of iterations can be significantly greater than two, and it has been found that for strongly tonal signal frames the number of iterations can be four, five, or even higher.
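The iterative consumption of detail bits described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function names, the uniform quantizer with step `step`, the initial offset of `step / 4`, and the halving of the offset after each pass are all hypothetical choices that the text does not specify.

```python
def refine_encode(values, quantized, step, bit_budget):
    """Spend bit_budget detail bits on the non-zero quantized lines.

    Lines are visited from low to high index (assumed to mean low to high
    frequency); passes repeat until the budget is used up.
    """
    recon = [q * step for q in quantized]
    active = [i for i, q in enumerate(quantized) if q != 0]
    offset = step / 4.0  # assumed initial offset
    bits = []
    while active and len(bits) < bit_budget:
        for i in active:
            if len(bits) >= bit_budget:
                break
            bit = 1 if values[i] >= recon[i] else 0
            recon[i] += offset if bit else -offset  # add or subtract offset
            bits.append(bit)
        offset /= 2.0  # assumed: each further pass refines at half the offset
    return bits, recon


def refine_decode(quantized, step, bits):
    """Mirror of refine_encode: apply the received bits in the same order."""
    recon = [q * step for q in quantized]
    active = [i for i, q in enumerate(quantized) if q != 0]
    offset = step / 4.0
    pos = 0
    while active and pos < len(bits):
        for i in active:
            if pos >= len(bits):
                break
            recon[i] += offset if bits[pos] else -offset
            pos += 1
        offset /= 2.0
    return recon
```

With only two surviving lines and six detail bits, this sketch runs three passes, which illustrates why frames with few surviving items and a large detail budget can need several iterations.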

In a preferred embodiment, the controller determines the control value in an indirect manner, i.e. without explicitly determining the signal characteristic. To this end, the control value is calculated from manipulated input data, where the manipulated input data are, for example, the input data to be quantized or amplitude-related data derived from the data to be quantized. Although the control value for the encoder processor is determined from the manipulated data, the actual quantization/encoding is performed without this manipulation. A signal-dependent procedure is thus obtained by determining the manipulation value in a signal-dependent manner, so that the manipulation influences, to a greater or lesser degree, the resulting reduction in the number of audio data items without explicit knowledge of the specific signal characteristic.

In another implementation, a direct mode may be applied, in which a certain signal characteristic is estimated directly and, depending on the result of this signal analysis, a certain reduction in the number of data items is performed in order to obtain higher precision for the surviving data items.

In a further implementation, a separated procedure may be applied for reducing the audio data items. In the separated procedure, a certain number of data items is first obtained by a quantization that is controlled by a typically psychoacoustically motivated quantizer control, and then, based on the input audio signal, the number of quantized audio data items is reduced. This reduction is preferably performed by eliminating the audio data items that are smallest in terms of amplitude, energy, or power. Control of the reduction can again be obtained either by direct/explicit determination of the signal characteristic or by indirect or implicit signal control.
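The second step of the separated procedure, eliminating the smallest quantized items, can be sketched as below. This is a minimal illustration: the function name `reduce_items` and the parameter `keep_count` are hypothetical, and how the controller derives `keep_count` from the signal characteristic is left open here, as it is in the text.

```python
def reduce_items(quantized, keep_count):
    """Zero all but the keep_count largest-magnitude quantized lines.

    keep_count stands in for the controller decision; ties are resolved in
    favour of the lower index because Python's sort is stable.
    """
    if keep_count >= len(quantized):
        return list(quantized)
    order = sorted(range(len(quantized)),
                   key=lambda i: abs(quantized[i]),
                   reverse=True)
    keep = set(order[:keep_count])
    return [q if i in keep else 0 for i, q in enumerate(quantized)]
```

For example, `reduce_items([7, 1, -1, 0, 12, -2, 1, 0], 3)` keeps only the three lines with the largest magnitude and sets the rest to zero, so the bits that would have coded the small lines become available for refining the survivors.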

In a further preferred embodiment, an integrated procedure is applied in which a variable quantizer is controlled so as to perform a single quantization, with the control derived from manipulated data while the non-manipulated data are what is actually quantized. The quantizer control value, for example a global gain, is calculated using the signal-dependently manipulated data, whereas the data without this manipulation are quantized, and the quantization result is encoded using all available information units, so that in the case of two-stage coding a typically large number of information units remains for the detail coding stage.
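The integrated procedure can be sketched as follows: the global gain is chosen from manipulated data, while the quantizer is applied to the original data. Everything concrete here is an assumption made for illustration: the bit estimator is a crude logarithmic stand-in for a real entropy-coder estimate, the gain grid of quarter-octave steps is arbitrary, and `manipulate` is a hypothetical hook for whatever signal-dependent manipulation the controller applies.

```python
import math


def estimate_bits(amplitudes, gain):
    """Crude stand-in for an entropy-coder bit estimate (assumption)."""
    return sum(math.log2(1.0 + a / gain) for a in amplitudes)


def choose_gain(amplitudes, bit_budget, max_steps=200):
    """Return the smallest gain on an assumed grid whose estimate fits."""
    gain = 1.0
    for _ in range(max_steps):
        if estimate_bits(amplitudes, gain) <= bit_budget:
            break
        gain *= 2.0 ** 0.25
    return gain


def encode_frame(spectrum, bit_budget, manipulate):
    """Derive the gain from manipulated data, quantize the original data."""
    control_data = manipulate([abs(x) for x in spectrum])
    gain = choose_gain(control_data, bit_budget)
    quantized = [round(x / gain) for x in spectrum]  # no manipulation here
    return gain, quantized
```

Calling `encode_frame` once with an identity manipulation and once with a manipulation that raises every amplitude shows the intended effect in this toy model: the manipulated run selects a larger gain, fewer lines survive quantization, and the first stage therefore consumes less of the budget.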

Embodiments provide a solution to the problem of quality loss for highly tonal content that is based on a modification of the power spectrum used to estimate the bit consumption of the entropy coder. This modification consists of a signal-adaptive noise floor adder that keeps the estimate for common audio content with a flat residual spectrum practically unchanged, while it increases the bit budget estimate for highly tonal content. The effect of this modification is twofold. First, it causes filter-bank noise and irrelevant side lobes of harmonic components, which are overlaid by the noise floor, to be quantized to zero. Second, it shifts bits from the first coding stage to the residual coding stage. Although such a shift is not required for most signals, it is fully efficient for highly tonal signals, because the bits are used to increase the quantization precision of the harmonic components. This means that they are used to code bits of low significance, which usually follow a uniform distribution and are therefore coded fully efficiently with a binary representation. In addition, the procedure is computationally inexpensive, which makes it a very effective tool for solving the above problem.
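As a rough illustration of the idea, the following Python sketch adds a noise floor to a power spectrum before it would be handed to a bit-consumption estimator. The rule used here (a floor proportional to the largest spectral power) and the names `add_noise_floor` and `rel_level` are assumptions made for this example; the passage above does not state how the floor is derived.

```python
def add_noise_floor(power, rel_level=2.0 ** -10):
    """Add a signal-adaptive floor to every bin of a power spectrum.

    Assumption for illustration: the floor is the largest bin power
    scaled by rel_level, so it follows the strongest component.
    """
    if not power:
        return []
    floor = max(power) * rel_level
    return [p + floor for p in power]


# Flat residual spectrum: the floor is tiny relative to every bin.
flat = [1.0, 1.0, 1.0, 1.0]
flat_mod = add_noise_floor(flat)

# Highly tonal spectrum: one strong peak, weak neighbours.
tonal = [1.0e6, 1.0, 1.0, 1.0]
tonal_mod = add_noise_floor(tonal)
```

With these inputs the flat spectrum changes by roughly 0.1%, while the weak bins next to the tonal peak are raised by a factor of nearly a thousand. An estimator fed with the modified spectrum would therefore see a larger bit demand only for the tonal frame, which matches the qualitative behaviour described above.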

Preferred embodiments of the present invention are described below with reference to the accompanying drawings, in which:

Fig. 1 is an embodiment of an audio encoder;

Fig. 2 illustrates a preferred implementation of the encoder processor of Fig. 1;

Fig. 3 illustrates a preferred implementation of the refinement coding stage;

Fig. 4a illustrates an exemplary frame syntax for the first or the second frame with iteration refinement bits;

Fig. 4b illustrates a preferred implementation of the audio data item reducer as a variable quantizer;

Fig. 5 illustrates a preferred implementation of the audio encoder with a spectral preprocessor;

Fig. 6 illustrates a preferred embodiment of an audio decoder with a temporal post-processor;

Fig. 7 illustrates an implementation of the coder processor of the audio decoder of Fig. 6;

Fig. 8 illustrates a preferred implementation of the refinement decoding stage of Fig. 7;

Fig. 9 illustrates an implementation of an indirect mode for calculating the control values;

Fig. 10 illustrates a preferred implementation of the manipulation value calculator of Fig. 9;

Fig. 11 illustrates the calculation of the control values in a direct mode;

Fig. 12 illustrates an implementation of the separated audio data item reduction; and

Fig. 13 illustrates an implementation of the integrated audio data item reduction.

Fig. 1 illustrates an audio encoder for encoding input audio data 11. The audio encoder comprises a preprocessor 10, an encoder processor 15 and a controller 20. The preprocessor 10 preprocesses the input audio data 11 to obtain audio data per frame, i.e. the audio data to be encoded, illustrated at item 12. The audio data to be encoded are input into the encoder processor 15, which encodes them and outputs encoded audio data. The controller 20 is connected, with respect to its input, to the per-frame audio data of the preprocessor; alternatively, the controller can also be connected to receive the input audio data without any preprocessing. The controller is configured to reduce the number of audio data items per frame depending on the signal in the frame and, at the same time, to increase the number of information units or, preferably, bits for the reduced number of audio data items depending on the signal in the frame. The controller is configured to control the encoder processor 15 so that, depending on a first signal characteristic of a first frame of the audio data to be encoded, the number of audio data items to be encoded by the encoder processor for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used for encoding the reduced number of audio data items for the first frame is increased more strongly compared to a second number of information units for the second frame.

Fig. 2 illustrates a preferred implementation of the encoder processor. The encoder processor comprises an initial coding stage 151 and a refinement coding stage 152. In one implementation, the initial coding stage comprises an entropy coder such as an arithmetic coder or a Huffman coder. In another embodiment, the refinement coding stage 152 comprises a bit coder or residual coder operating at a granularity of one bit or information unit. Furthermore, the functionality of reducing the number of audio data items is embodied in Fig. 2 by the audio data item reducer 150, which can be implemented, for example, as a variable quantizer in the integrated reduction mode illustrated in Fig. 13 or, alternatively, as a separate element operating on already quantized audio data items, as illustrated in the separated reduction mode 902. In a further, non-illustrated embodiment, the audio data item reducer can also operate on non-quantized items by setting such non-quantized items to zero, or by weighting the data items to be eliminated with a certain weighting number so that these audio data items are quantized to zero and thus eliminated in a subsequently connected quantizer. The audio data item reducer 150 of Fig. 2 may operate on non-quantized or quantized data items in a separated reduction procedure, or it may be implemented by a variable quantizer specifically controlled by a signal-dependent control value, as illustrated in the integrated reduction mode of Fig. 13.

The controller 20 of Fig. 1 is configured to reduce the number of audio data items encoded by the initial coding stage 151 for the first frame, and the initial coding stage 151 is configured to encode the reduced number of audio data items for the first frame using an initial number of information units of the first frame. The calculated bits/units of this initial number of information units are output by block 151, as illustrated in Fig. 2, item 151.

Furthermore, the refinement coding stage 152 is configured to use a remaining number of information units of the first frame for a refinement coding of the reduced number of audio data items for the first frame, and the initial number of information units of the first frame added to the remaining number of information units of the first frame results in a predetermined number of information units for the first frame. In particular, the refinement coding stage 152 outputs the remaining number of bits of the first frame and the remaining number of bits of the second frame, and there are at least two refinement bits for at least one, preferably for at least 50%, and even more preferably for all non-zero audio data items, i.e. for the audio data items that survive the reduction of audio data items and are initially encoded by the initial coding stage 151.

Preferably, the predetermined number of information units for the first frame is equal to the predetermined number of information units for the second frame, or is close enough to it that the audio encoder operates at a constant or substantially constant bit rate.

As illustrated in Fig. 2, the audio data item reducer 150 reduces the number of audio data items beyond the psychoacoustically driven number in a signal-dependent manner. Thus, for a first signal characteristic, the number is reduced only slightly beyond the psychoacoustically driven number, whereas in a frame with a second signal characteristic, for example, the number is reduced substantially beyond the psychoacoustically driven number. Preferably, the audio data item reducer eliminates the data items with the smallest amplitudes/powers/energies, and this operation is preferably performed through an indirect selection obtained in the integrated mode, where the reduction of audio data items takes place by quantizing certain audio data items to zero. In an embodiment, the initial coding stage encodes only those audio data items that have not been quantized to zero, and the refinement coding stage 152 refines only the audio data items already processed by the initial coding stage, i.e. the audio data items that have not been quantized to zero by the audio data item reducer 150 of Fig. 2.
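The indirect selection described above can be pictured with a small Python sketch: a coarser quantizer step sends the smallest-magnitude items to zero, so fewer items are left for the initial coding stage. The uniform rounding quantizer, the step sizes and the names `quantize` and `surviving_items` are assumptions chosen for illustration only and are not taken from the description.

```python
def quantize(spectrum, step):
    """Uniform quantizer; a larger step maps more small values to zero."""
    return [int(round(x / step)) for x in spectrum]


def surviving_items(quantized):
    """Indices of items not quantized to zero (those the initial stage encodes)."""
    return [i for i, q in enumerate(quantized) if q != 0]


# Hypothetical spectral values, ordered from low to high frequency.
spectrum = [40.0, 1.2, -0.8, 22.0, 0.4, -3.1]

fine = quantize(spectrum, step=1.0)     # mild reduction
coarse = quantize(spectrum, step=8.0)   # strong reduction

fine_survivors = surviving_items(fine)      # five items remain
coarse_survivors = surviving_items(coarse)  # only the two dominant items remain
```

In the coarse case only the two dominant items survive, so the units no longer spent on the weak items become available for refining the dominant ones.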

In a preferred embodiment, the refinement coding stage is configured to iteratively assign the remaining number of information units of the first frame to the reduced number of audio data items of the first frame in at least two sequentially performed iterations. In particular, the values of the assigned information units for the at least two sequentially performed iterations are calculated, and the calculated values are introduced into the encoded output frame in a predetermined order. In the first iteration, the refinement coding stage sequentially assigns one information unit to each audio data item of the reduced number of audio data items for the first frame, in an order from low-frequency information to high-frequency information. The audio data items may be individual spectral values obtained by a time-to-spectrum conversion. Alternatively, the audio data items can be tuples of two or more spectral lines that are typically adjacent to each other in the spectrum. The bit values are calculated from a certain start value with low-frequency information to a certain end value with the highest-frequency information, and in a further iteration the same procedure is performed, i.e. once again processing from low spectral information values/tuples to high spectral information values/tuples.

In particular, the refinement coding stage 152 is configured to check whether the number of information units already assigned is lower than the predetermined number of information units for the first frame less the initial number of information units of the first frame. The refinement coding stage is also configured to stop the second iteration in the case of a negative check result or, in the case of a positive check result, to perform a number of further iterations until a negative check result is obtained, the number of further iterations being 1, 2, and so on. Preferably, the maximum number of iterations is bounded by a two-digit number such as a value between 10 and 30, preferably 20 iterations. In an alternative embodiment, the check for a maximum number of iterations can be dispensed with if the non-zero spectral lines are counted first and the number of residual bits is adjusted accordingly for each iteration or for the whole procedure. Hence, when there are, for example, 20 surviving spectral tuples and 50 residual bits, one can determine, without any check during the procedure in the encoder or the decoder, that the number of iterations is three and that, in the third iteration, a refinement bit is to be calculated or is available in the bitstream for the first ten spectral lines/tuples. Thus, this alternative does not require a check during the iterative processing, since the information on the number of non-zero or surviving audio items is known after the processing of the initial stage in the encoder or the decoder.
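The arithmetic behind the 20-tuple, 50-bit example can be written out as a short Python sketch. The function names are illustrative and not taken from the description, and the sketch ignores the optional cap on the number of iterations.

```python
def remaining_units(frame_units, initial_units):
    """Units left for refinement: the frame's predetermined number minus the initial number."""
    return frame_units - initial_units


def refinement_plan(num_surviving, residual_units):
    """Return (iterations, items_refined_in_last_iteration).

    One unit is assigned per surviving item and iteration, from low to
    high frequency, until the residual units are used up.
    """
    if num_surviving == 0 or residual_units <= 0:
        return 0, 0
    full, partial = divmod(residual_units, num_surviving)
    if partial == 0:
        return full, num_surviving
    return full + 1, partial


# Example from the text: 20 surviving tuples and 50 residual bits.
plan = refinement_plan(20, 50)
```

Here `plan` is `(3, 10)`: two complete passes over all 20 tuples consume 40 bits, and the third pass reaches only the first ten tuples. Because both encoder and decoder know the number of surviving items after the initial stage, each can derive the same plan without any check inside the loop.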

Fig. 3 illustrates a preferred implementation of the iterative procedure performed by the refinement coding stage 152 of Fig. 2, which is made possible by the fact that, in contrast to other procedures, the number of refinement bits per frame is significantly increased for certain frames owing to the corresponding reduction of audio data items for those frames.

In step 300, the surviving audio data items are determined. This determination can be performed automatically by operating on the audio data items that have already been processed by the initial coding stage 151 of Fig. 2. In step 302, the procedure starts at a predefined audio data item, such as the audio data item with the lowest spectral information. In step 304, bit values are calculated for each audio data item in a predefined sequence, this predefined sequence being, for example, the sequence from low spectral values/tuples to high spectral values/tuples. The calculation in step 304 is performed using a start offset 305 and under the control 314 that refinement bits are still available. At item 316, the first-iteration refinement information units are output, i.e. a bit pattern with one bit for each surviving audio data item, where the bit indicates whether an offset, i.e. the start offset 305, is to be added or subtracted or, alternatively, whether the start offset is to be added or not added.

In step 306, the offset is reduced according to a predetermined rule. This rule may be, for example, that the offset is halved, i.e. that the new offset is half of the original offset. However, other offset reduction rules that differ from the 0.5 weighting can be applied as well.

In step 308, the bit values for each item in the predefined sequence are calculated again, now in the second iteration. The refined items after the first iteration, illustrated at 307, are the input to the second iteration. Thus, for the calculation in step 314, the refinement represented by the first-iteration refinement information units has already been applied, and provided that refinement bits are still available, as indicated at step 314, the second-iteration refinement information units are calculated and output at 318.

In step 310, the offset is reduced again according to the predetermined rule in preparation for the third iteration. The third iteration again operates on the refined items after the second iteration, illustrated at 309, and, again provided that refinement bits are still available as indicated at 314, the third-iteration refinement information units are calculated and output at 320.
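The steps of Fig. 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the codec's actual implementation: values are expressed in quantizer units, the start offset of 0.25 is a hypothetical choice (the description gives no value), the add-or-subtract variant of the bit meaning is used, the offset is halved after each pass, and the names `refine_encode`, `targets`, `recon` and `budget` are invented for this example.

```python
def refine_encode(targets, recon, budget, offset=0.25, max_iter=20):
    """Iteratively emit refinement bits for the surviving items.

    targets: unquantized values of the surviving items, low to high frequency
    recon:   their values after the initial coding stage
    budget:  number of refinement bits still available (the control at 314)
    """
    recon = list(recon)
    bits = []
    for _ in range(max_iter):
        if len(bits) >= budget:
            break
        for i, target in enumerate(targets):
            if len(bits) >= budget:      # no refinement bits left
                break
            bit = 1 if target >= recon[i] else 0
            recon[i] += offset if bit else -offset   # add or subtract the offset
            bits.append(bit)
        offset /= 2                      # offset reduction rule (halving)
    return bits, recon


# Two surviving items and a budget of five refinement bits (hypothetical data).
bits, refined = refine_encode(targets=[5.3, 2.8], recon=[5, 3], budget=5)
```

With this input the first two passes refine both items and the third pass reaches only the first item, giving the bit pattern `[1, 0, 1, 1, 0]` and refined values `[5.3125, 2.875]`.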

Fig. 4a illustrates an exemplary frame syntax with the information units or bits for the first frame or the second frame. One portion of the bit data for the frame consists of the initial number of bits, i.e. item 400. In addition, the first-iteration refinement bits 316, the second-iteration refinement bits 318 and the third-iteration refinement bits 320 are also included in the frame. In accordance with the frame syntax, the decoder is able to identify which bits of the frame are the initial number of bits, which bits are the first-, second- or third-iteration refinement bits 316, 318, 320, and which bits of the frame are any other bits 402, such as side information that may also include, for example, an encoded representation of a global gain (gg), which can be calculated by the controller 200 directly or can be influenced by the controller by means of controller output information 21. Within sections 316, 318, 320, a certain sequence of the individual information units is given. This sequence is preferably such that the bits in the bit sequence are applied to the initially decoded audio data items to be decoded. Since it is not useful, with respect to bit rate requirements, to explicitly signal anything regarding the first-, second- and third-iteration refinement bits, the order of the individual bits in blocks 316, 318, 320 should be the same as the corresponding order of the surviving audio data items. In view of this, it is preferable to use the same iterative procedure on the encoder side, as illustrated in Fig. 3, and on the decoder side, as illustrated in Fig. 8. It is not necessary to signal any specific bit allocation or bit association, at least in blocks 316 to 320.
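Why no bit allocation needs to be signalled can be seen from a decoder-side sketch in Python: the decoder walks the surviving items in the same low-to-high order and applies the refinement bits exactly as they arrive. As before, the start offset of 0.25, the halving rule, the add-or-subtract bit meaning and the name `refine_decode` are assumptions for illustration only.

```python
def refine_decode(recon, refinement_bits, offset=0.25):
    """Apply refinement bits to the initially decoded surviving items.

    Bits are consumed in transmission order; each pass visits the items
    from low to high frequency, and the offset is halved after each pass.
    """
    recon = list(recon)
    if not recon:
        return recon
    pos = 0
    while pos < len(refinement_bits):
        for i in range(len(recon)):
            if pos >= len(refinement_bits):   # bitstream section exhausted
                break
            recon[i] += offset if refinement_bits[pos] else -offset
            pos += 1
        offset /= 2
    return recon


# Hypothetical frame: two surviving items, five refinement bits
# (three passes, the last one covering only the first item).
decoded = refine_decode([5, 3], [1, 0, 1, 1, 0])
```

The result is `[5.3125, 2.875]`. The association between bits and items follows purely from the bit order and the count of surviving items, both of which the decoder already knows after the initial decoding stage.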

Furthermore, the numbers for the initial number of bits on the one hand and the remaining number of bits on the other hand are only exemplary. Typically, the initial number of bits, which usually encode the most significant bit portion of the audio data items such as spectral values or tuples of spectral values, is greater than the number of iterative refinement bits, which represent the least significant portion of the "surviving" audio data items. Furthermore, the initial number of bits 400 is typically determined by an entropy coder or arithmetic coder, whereas the iterative refinement bits are determined by a residual or bit coder operating at a granularity of one information unit. Although the refinement coding stage does not perform any entropy coding or the like, the least significant bit portion of the audio data items is nevertheless encoded more efficiently by the refinement coding stage, since the least significant bit portion of audio data items such as spectral values can be assumed to be uniformly distributed. Entropy coding with a variable-length code or an arithmetic code together with a certain context therefore brings no additional advantage and, on the contrary, even introduces additional overhead.

In other words, for the least significant bit portion of the audio data elements, an arithmetic coder would be less efficient than a bit coder, since the bit coder requires no bit rate at all for a context. The intentional reduction of the number of audio data elements induced by the controller not only improves the precision of the dominant spectral lines or line tuples, but additionally provides a highly efficient coding operation for refining the MSB portions of these audio data elements, which are represented by the arithmetic or variable length code.
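The efficiency argument above can be checked numerically. The following minimal Python sketch is an illustration only and is not taken from the patent: it evaluates the binary entropy function, which gives the lowest average number of coded bits per source bit that any entropy coder can reach for independent bits. The function name `binary_entropy` is chosen here for illustration.

```python
import math

def binary_entropy(p_one):
    """Lower bound, in coded bits per source bit, for independent bits
    that take the value 1 with probability p_one."""
    if p_one <= 0.0 or p_one >= 1.0:
        return 0.0  # a fully predictable bit carries no information
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))

# Uniformly distributed bits: the bound equals one coded bit per bit.
uniform_bound = binary_entropy(0.5)
# Skewed bits: the bound drops below one, so entropy coding can help.
skewed_bound = binary_entropy(0.9)
```

For uniformly distributed bits the bound is one coded bit per bit, so writing the bits directly is already optimal, and any context modelling only adds cost. Only skewed distributions, such as those of the most significant bit portions, leave room for an entropy coder.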

With this in mind, several advantages, for example the following, are obtained by implementing the encoder processor 15 of FIG. 1 as illustrated in FIG. 2, with the initial coding stage 151 on the one hand and the detail coding stage 152 on the other hand.

An efficient two-stage coding scheme is proposed, comprising a first entropy coding stage and a second residual coding stage based on single-bit (non-entropy) coding.

The scheme uses a low-complexity global gain estimator that includes an energy-based bit consumption estimator for the first coding stage, featuring a signal-adaptive noise floor adder.

The noise floor adder effectively shifts bits from the first coding stage to the second coding stage for highly tonal signals, while leaving the estimate for other signal types unchanged. This shift of bits from the entropy coding stage to the non-entropy coding stage is fully effective for highly tonal signals.

FIG. 4b illustrates a preferred implementation of the variable quantizer, which may, for example, be implemented to reduce the number of audio data elements in a controlled manner, preferably in the integrated reduction mode illustrated with respect to FIG. 13. To this end, the variable quantizer comprises a weighting module 155 that receives the (non-manipulated) audio data to be encoded, illustrated at line 12. This data is also input into the controller 20, and the controller is configured to calculate the global gain 21 on the basis of the non-manipulated data as input into the weighting module 155, but using a signal-dependent manipulation. The global gain 21 is applied in the weighting module 155, and the output of the weighting module is input into the quantizer core 157, which relies on a fixed quantization step size. The variable quantizer 150 is thus implemented as a controlled weighting module, in which the control is performed using the global gain 21 (gg), followed by the quantizer core 157 with a fixed quantization step size. However, other implementations are possible as well, such as a quantizer core with a variable quantization step size that is controlled by an output value of the controller 20.
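The structure of FIG. 4b, a gain-controlled weighting followed by a fixed-step quantizer core, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the weighting is taken to be a division by the global gain, the quantizer core rounds to the nearest integer with a step size of 1, and the names `weight`, `quantizer_core`, and `variable_quantize` are hypothetical.

```python
import math

def weight(spectrum, global_gain):
    """Weighting module 155 (assumed here to divide by the global gain)."""
    return [value / global_gain for value in spectrum]

def quantizer_core(values):
    """Quantizer core 157 with a fixed step size of 1 (assumed rounding rule:
    round the magnitude to the nearest integer and keep the sign)."""
    return [int(math.copysign(math.floor(abs(v) + 0.5), v)) for v in values]

def variable_quantize(spectrum, global_gain):
    """Variable quantizer 150: controlled weighting, then the fixed core."""
    return quantizer_core(weight(spectrum, global_gain))

spectrum = [10.0, -3.2, 0.4, -0.3]
fine = variable_quantize(spectrum, 1.0)    # small gain: more non-zero values
coarse = variable_quantize(spectrum, 8.0)  # large gain: fewer non-zero values
```

A larger global gain drives more of the small spectral values to zero, which is how a single control value can reduce the number of audio data elements that reach the initial coding stage.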

FIG. 5 illustrates a preferred implementation of the audio encoder and, in particular, a certain implementation of the preprocessor 10 of FIG. 1. Preferably, the preprocessor comprises a windowing module 13 that generates, from the input audio data 11, a frame of time domain audio data windowed using a certain analysis window, which may, for example, be a cosine window. The frame of time domain audio data is input into a spectral converter 14, which may be implemented to perform a modified discrete cosine transform (MDCT) or any other transform such as an FFT or MDST, or any other time-to-spectrum transform. Preferably, the windowing module operates with a certain advance value so that overlapping frames are generated. In the case of a 50% overlap, the advance value of the windowing module is half the size of the analysis window applied by the windowing module 13. The (non-quantized) frame of spectral values output by the spectral converter is input into a spectral processor 15, which is implemented to perform some kind of spectral processing, such as a temporal noise shaping operation, a spectral noise shaping operation, or any other operation such as a spectral whitening operation, by which the modified spectral values generated by the spectral processor have a spectral envelope that is flatter than the spectral envelope of the spectral values before the processing by the spectral processor 15. The audio data to be encoded (per frame) is forwarded via line 12 to the encoder processor 15 and to the controller 20, and the controller 20 provides the control information via line 21 to the encoder processor 15. The encoder processor outputs its data to a bitstream writer 30 implemented, for example, as a bitstream multiplexer, and the encoded frames are output on line 35.

Regarding the decoder-side processing, reference is made to FIG. 6. The bitstream output by block 30 may, for example, be input directly into a bitstream reader 40 after some kind of storage or transmission. Naturally, any other processing may be performed between the encoder and the decoder, such as transmission processing in accordance with a wireless transmission protocol such as the DECT protocol, the Bluetooth protocol, or any other wireless transmission protocol. The data input into the audio decoder shown in FIG. 6 is input into the bitstream reader 40. The bitstream reader 40 reads the data and forwards it to a coder processor 50, which is controlled by a controller 60. In particular, the bitstream reader receives encoded audio data that comprises, for a frame, an initial number of frame information units and a remaining number of frame information units. The coder processor 50 processes the encoded audio data and comprises an initial decoding stage and a detail decoding stage, illustrated in FIG. 7 at element 51 for the initial decoding stage and at element 52 for the detail decoding stage, both of which are controlled by the controller 60. The controller 60 is configured to control the detail decoding stage 52 so that, when refining the initially decoded data elements output by the initial decoding stage 51 of FIG. 7, it uses at least two information units of the remaining number of information units for refining one and the same initially decoded data element. Furthermore, the controller 60 is configured to control the coder processor so that the initial decoding stage uses the initial number of frame information units to obtain the initially decoded data elements on the line connecting blocks 51 and 52 in FIG. 7. Preferably, the controller 60 receives an indication of the initial number of frame information units on the one hand and of the initial remaining number of frame information units on the other hand from the bitstream reader 40, as indicated by the input line into block 60 of FIG. 6 or FIG. 7. The postprocessor 70 processes the refined audio data elements to obtain the decoded audio data 80 at the output of the postprocessor 70.

In a preferred implementation of an audio decoder that corresponds to the audio encoder of FIG. 5, the postprocessor 70 comprises, as an input stage, a spectral processor 71 that performs an inverse temporal noise shaping operation, an inverse spectral noise shaping operation, an inverse spectral whitening operation, or any other operation that undoes some kind of processing applied by the spectral processor 15 of FIG. 5. The output of the spectral processor is input into a time converter 72, which performs a conversion from the spectral domain into the time domain; preferably, the time converter 72 matches the spectral converter 14 of FIG. 5. The output of the time converter 72 is input into an overlap-add stage 73 that performs an overlap-add operation on a certain number of overlapping frames, for example at least two overlapping frames, to obtain the decoded audio data 80. Preferably, the overlap-add stage 73 applies a synthesis window to the output of the time converter 72, where this synthesis window matches the analysis window applied by the windowing module 13. Furthermore, the overlap operation performed by block 73 matches the block advance operation performed by the windowing module 13 of FIG. 5.
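How matched analysis and synthesis windows with a 50% overlap fit together can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: it uses a sine window (one common cosine-type window), a direct (slow) MDCT and inverse MDCT, an advance of half the window size, and no quantization or spectral processing between the two transforms. All function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Direct MDCT: 2N windowed time samples -> N spectral values."""
    half = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(half)
    ]

def imdct(spectrum):
    """Direct inverse MDCT: N spectral values -> 2N time-aliased samples."""
    half = len(spectrum)
    return [
        2.0 / half * sum(
            c * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for k, c in enumerate(spectrum))
        for n in range(2 * half)
    ]

def encode_decode(signal, advance):
    """Window and transform overlapping frames, then invert and overlap-add."""
    window = sine_window(2 * advance)  # window size is twice the advance
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * advance + 1, advance):
        segment = signal[start:start + 2 * advance]
        windowed = [w * x for w, x in zip(window, segment)]   # analysis window
        aliased = imdct(mdct(windowed))
        for n, (w, y) in enumerate(zip(window, aliased)):
            output[start + n] += w * y        # synthesis window + overlap-add
    return output

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(64)]
decoded = encode_decode(signal, advance=8)
```

Every sample that is covered by two overlapping frames is reconstructed, because the time-domain aliasing of neighbouring frames cancels in the overlap-add. The first and last `advance` samples are covered by only one frame and are therefore not reconstructed in this sketch.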

As illustrated in FIG. 4a, the remaining number of frame information units comprises calculated values of information units 316, 318, 320 for at least two sequential iterations in a predetermined order, where the embodiment of FIG. 4a even illustrates three iterations. Furthermore, the controller 60 is configured to control the detail decoding stage 52 to use, for a first iteration, the calculated values, for example from block 316, in accordance with the predetermined order, and to use, for a second iteration, the calculated values from block 318 in the predetermined order.

Next, a preferred implementation of the detail decoding stage under the control of the controller 60 is illustrated with respect to FIG. 8. In step 800, the controller or the detail decoding stage 52 of FIG. 7 determines the audio data elements to be refined. These are typically all audio data elements output by block 51 of FIG. 7. As indicated in step 802, the procedure starts at a predetermined audio data element, such as the lowest spectral information. Using a start offset 805, the first iterative detail information units received from the bitstream or from the controller 60, for example the data in block 316 of FIG. 4a, are applied 804 to each element in a predetermined sequence, where the predetermined sequence extends from low to high spectral values, spectral tuples, or spectral information. The results are the refined audio data elements after the first iteration, as illustrated by line 807. In step 808, bit values are applied to each element in the predetermined sequence, where the bit values come from the second iterative detail information units, as illustrated at 818; these bits are received from the bitstream reader or the controller 60, depending on the specific implementation. The result of step 808 is the refined elements after the second iteration. In step 810, the offset is reduced again in accordance with the predetermined offset reduction rule that has already been applied in block 806. With the reduced offset, bit values are applied to each element in the predetermined sequence, as illustrated at 812, using the received third iterative detail information units, for example from the bitstream or from the controller 60. The third iterative detail information units are written into the bitstream at element 320 of FIG. 4a. The result of the procedure in block 812 is the refined elements after the third iteration, as indicated at 821.

This procedure continues until all iterative detail bits included in the bitstream for the frame have been processed. This is checked by the controller 60 via control line 814, which controls the remaining availability of detail bits, preferably for each iteration but at least for the second and third iterations processed in blocks 808, 812. In each iteration, the controller 60 controls the detail decoding stage to check whether the number of information units already read is lower than the number of information units in the remaining frame information units for the frame, in order to stop the second iteration in case of a negative check result or, in case of a positive check result, to perform a number of further iterations until a negative check result is obtained. The number of further iterations is at least one. Because similar procedures are applied on the encoder side, as discussed in the context of FIG. 3, and on the decoder side, as shown in FIG. 8, no specific signaling is required at all. Instead, the multiple iterative detail processing takes place in a highly efficient manner without any specific overhead. In an alternative embodiment, the check for a maximum number of iterations can be omitted if the non-zero spectral lines are counted first and the number of residual bits is adjusted accordingly for each iteration.

In a preferred implementation, the detail decoding stage 52 is configured to add an offset to the initially decoded data element when the information unit read from the remaining number of frame information units has a first value, and to subtract the offset from the initially decoded data element when the information unit read has a second value. For the first iteration, this offset is the start offset 805 of FIG. 8. In the second iteration, as illustrated at 808 in FIG. 8, a reduced offset generated by block 806 is used: the reduced, or second, offset is added to the result of the first iteration when the information unit read from the remaining number of frame information units has the first value, and is subtracted from the result of the first iteration when the information unit read has the second value. Typically, the second offset is lower than the first offset; preferably, the second offset is between 0.4 and 0.6 times the first offset, and most preferably 0.5 times the first offset.
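The iterative procedure of FIG. 8, combined with the add/subtract rule described above, can be sketched in Python as follows. This is a minimal sketch under assumptions rather than the patented implementation: a bit value of 1 is taken as the "first value" (add) and 0 as the "second value" (subtract), the start offset of 0.25 is a placeholder, and the function name `refine` is hypothetical. The offset factor of 0.5 is the most preferred value named in the text.

```python
def refine(initial_elements, detail_bits, start_offset=0.25, reduction=0.5):
    """Refine initially decoded elements with one detail bit per element
    and iteration, walking the elements from low to high index."""
    values = [float(v) for v in initial_elements]
    offset = start_offset
    bits_read = 0
    # Keep iterating while unread detail bits remain for this frame.
    while values and bits_read < len(detail_bits):
        for index in range(len(values)):
            if bits_read >= len(detail_bits):
                break  # bit budget exhausted in the middle of an iteration
            if detail_bits[bits_read] == 1:
                values[index] += offset   # assumed "first value": add
            else:
                values[index] -= offset   # assumed "second value": subtract
            bits_read += 1
        offset *= reduction               # reduced offset for the next pass
    return values

# Two elements, five detail bits: two full iterations plus one extra bit.
refined = refine([2, -3], [1, 0, 0, 1, 1])
```

Because the element order and the offset schedule are fixed, the decoder needs only the number of remaining information units to know which bit belongs to which element and iteration, so no bit allocation has to be signalled. In the example, the first element receives three bits and the second element two.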

In a preferred implementation of the present invention using the indirect mode illustrated in FIG. 9, no explicit determination of signal characteristics is required. Instead, a manipulation value is calculated, preferably using the embodiment illustrated in FIG. 9. For the indirect mode, the controller 20 is implemented as shown in FIG. 9. In particular, the controller comprises a control preprocessor 22, a manipulation value calculation module 23, a combining module 24, and a global gain calculation module 25 that ultimately calculates the global gain for the module 150 of FIG. 2 for reducing the number of audio data elements, which is implemented as the variable quantizer illustrated in FIG. 4b. In particular, the controller 20 is configured to analyze the audio data of the first frame to determine a first control value for the variable quantizer for the first frame, and to analyze the audio data of the second frame to determine a second control value for the variable quantizer for the second frame, the second control value being different from the first control value. The analysis of the audio data of a frame is performed by the manipulation value calculation module 23. The controller 20 is configured to perform a manipulation of the audio data of the first frame. In this operation, the control preprocessor 22 illustrated in FIG. 9 is not present, and the bypass line for block 22 is therefore active.

However, when the manipulation is not performed on the audio data of the first frame or the second frame itself, but is applied to amplitude-related values derived from the audio data of the first frame or the second frame, the control preprocessor 22 is present and the bypass line does not exist. The actual manipulation is performed by the combining module 24, which combines the manipulation value output by block 23 with the amplitude-related values derived from the audio data of the frame. At the output of the combining module 24, manipulated (preferably energy) data is available, and based on this manipulated data, the global gain calculation module 25 calculates the global gain, or at least a control value for the global gain, indicated at 404. The global gain calculation module 25 has to observe restrictions with respect to the allowed bit budget for the spectrum so that a certain data rate or a certain number of information units allowed for a frame is obtained.
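The interplay of the combining module and the global gain calculation under a bit budget can be sketched in Python as follows. This is only a sketch under loudly stated assumptions: the combination is taken to be an addition of the manipulation value to each energy value (in the spirit of a noise floor adder), and the bit estimator is a generic stand-in of roughly half a bit per doubling of the energy-to-step ratio plus a sign bit, because this passage does not give the estimator formula. The names `estimate_bits` and `global_gain` and all numeric values are hypothetical.

```python
import math

def estimate_bits(energies, gain):
    """Hypothetical energy-based bit estimate for the first coding stage."""
    total = 0.0
    for energy in energies:
        ratio = energy / (gain * gain)
        if ratio > 1.0:  # lines below the step size are assumed to cost nothing
            total += 0.5 * math.log2(ratio) + 1.0
    return total

def global_gain(energies, manipulation_value, bit_budget,
                low=1e-6, high=1e6, steps=60):
    """Search for a small gain whose estimated bit demand, computed on the
    manipulated energies, stays within the bit budget."""
    manipulated = [e + manipulation_value for e in energies]  # assumed combiner
    for _ in range(steps):
        middle = math.sqrt(low * high)  # bisection on a logarithmic scale
        if estimate_bits(manipulated, middle) > bit_budget:
            low = middle    # too many bits: a larger gain is needed
        else:
            high = middle   # within budget: try a smaller gain
    return high

energies = [100.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.25, 0.25]  # one dominant line
gain_plain = global_gain(energies, 0.0, bit_budget=6.0)
gain_manipulated = global_gain(energies, 4.0, bit_budget=6.0)
```

In this toy setting the manipulated energies make the estimator expect more bits, so the search settles on a larger gain. With a larger gain, fewer weak lines survive quantization, and the bits that the first stage then does not spend remain available for the detail coding stage.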

In the direct mode illustrated in FIG. 11, the controller 20 comprises an analyzer 201 for determining signal characteristics per frame. The analyzer 201 outputs, for example, quantitative signal characteristic information such as tonality information, and controls the control value calculator 202 using this preferably quantitative data. One procedure for calculating the tonality of a frame is to calculate the spectral flatness measure (SFM) of the frame. Any other tonality determination procedure, or any other signal characteristic determination procedure, can be performed by block 201, and a mapping from a given signal characteristic value to a given control value must be performed to obtain the intended reduction of the number of audio data items for the frame. The output of the control value calculator 202 for the direct mode of FIG. 11 can be a control value for the coder processor, for example for the variable quantizer or, alternatively, for the initial coding stage. When the control value is supplied to the variable quantizer, the integrated reduction mode is performed, whereas when the control value is supplied to the initial coding stage, a separated reduction is performed. Another implementation of the separated reduction is to remove or otherwise influence specifically selected non-quantized audio data items before the actual quantization, so that a given quantizer quantizes these items to zero and they are thereby excluded from entropy coding and from the subsequent refinement coding.

Although the indirect mode of FIG. 9 is shown together with the integrated reduction, i.e., with the global gain calculator 25 configured to calculate a variable global gain, the manipulated data output by the combiner 24 can also be used to directly control the initial coding stage so that it removes certain quantized audio data items, for example the smallest quantized data items. Alternatively, the control value can be sent to an audio data influencing stage (not illustrated) that influences the audio data before the actual quantization. That quantization then uses a variable quantization control value that has been determined without any data manipulation and therefore typically obeys the psychoacoustic rules, which the procedures of the present invention nevertheless intentionally violate.

As illustrated in FIG. 11 for the direct mode, the controller is configured to determine a first tonality characteristic as the first signal characteristic and a second tonality characteristic as the second signal characteristic, such that the bit budget for the refinement coding stage is increased in the case of the first tonality characteristic compared with the bit budget for the refinement coding stage in the case of the second tonality characteristic, where the first tonality characteristic indicates a higher tonality than the second.

The present invention does not result in the coarser quantization that is typically obtained by applying a larger global gain. Instead, calculating the global gain from signal-dependent manipulated data only shifts bit budget from the initial coding stage, which receives a smaller bit budget, to the refinement coding stage, which receives a larger one. This shift is performed in a signal-dependent way and is larger for signal portions with higher tonality.

Preferably, the control preprocessor 22 of FIG. 9 calculates the amplitude-related values as a plurality of power values derived from one or more audio values of the audio data. In particular, it is these power values that are manipulated: the combiner 24 adds the same manipulation value, determined by the manipulation value calculator 23, to every power value of the plurality of power values for the frame.

Alternatively, as indicated by the bypass line, values are added to all audio values of the plurality of audio values included in the frame. These added values can be obtained from the same magnitude of the manipulation value calculated by block 23, but preferably with randomized signs; and/or by subtracting slightly different terms from the same magnitude (again preferably with randomized signs); or as a complex manipulation value; or, more generally, as samples drawn from a certain normalized probability distribution that is scaled using the calculated complex or real magnitude of the manipulation value. The procedure performed by the control preprocessor 22, such as calculating the power spectrum and downsampling, can be included in the global gain calculator 25. Hence, a noise floor is preferably added either directly to the spectral audio values or, alternatively, to the amplitude-related values derived from the audio data per frame, i.e., to the output of the control preprocessor 22. Preferably, the control preprocessor calculates a downsampled power spectrum, which corresponds to an exponentiation with an exponent value equal to 2. Alternatively, a different exponent value greater than 1 can be used. As an example, an exponent value equal to 3 would represent loudness rather than power. Other exponent values, smaller or larger, can be used as well.

In the preferred implementation illustrated in FIG. 10, the manipulation value calculator 23 comprises a searcher 26 for finding the maximum spectral value in the frame, and at least one of a calculation of a signal-independent contribution, indicated by item 27 of FIG. 10, or a calculator for calculating one or more moments per frame, as illustrated by block 28 of FIG. 10. In essence, block 26 or block 28 provides the signal-dependent influence on the manipulation value for the frame. In particular, the searcher 26 is configured to search for the maximum value of the plurality of audio data items or of the amplitude-related values, or for the maximum value of the plurality of downsampled audio data or of the downsampled amplitude-related values for the corresponding frame. The actual calculation is performed by block 29 using the outputs of blocks 26, 27 and 28, where blocks 26 and 28 represent the signal analysis.

Preferably, the signal-independent contribution is determined by the bit rate of the actual encoder session, the frame duration, or the sampling frequency of the actual encoder session. Furthermore, the calculator 28 for calculating one or more moments per frame is configured to calculate a signal-dependent weighting value derived from at least a first sum of the magnitudes of the audio data or downsampled audio data within the frame, a second sum of those magnitudes, each multiplied by the index associated with it, and the quotient of the second sum and the first sum.
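The two sums and their quotient described above can be sketched in C as follows. The function name, the output parameters and the convention of returning 0 for an all-zero frame are assumptions of this sketch; the quotient is the center of mass of the magnitude spectrum, expressed as a spectral line index.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* First sum  m0 = sum_k |x[k]|
 * Second sum m1 = sum_k k * |x[k]|
 * Returns the quotient m1 / m0, or 0 for an all-zero frame
 * (a convention chosen for this sketch). */
double abs_spectrum_centroid(const double *x, size_t n,
                             double *m0_out, double *m1_out)
{
    double m0 = 0.0;
    double m1 = 0.0;

    assert(x != NULL && m0_out != NULL && m1_out != NULL);
    for (size_t k = 0; k < n; k++) {
        double a = fabs(x[k]);
        m0 += a;
        m1 += (double)k * a;
    }
    *m0_out = m0;
    *m1_out = m1;
    return (m0 > 0.0) ? m1 / m0 : 0.0;
}
```

A frame whose energy sits in the lowest lines yields a small quotient, which is the situation in which the text lowers the noise floor.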

In the preferred implementation performed by the global gain calculator 25 of FIG. 9, a required bit estimate is calculated for each energy value, depending on the energy value and on a candidate value for the actual control value. The required bit estimates for the energy values and the candidate control value are accumulated, and it is checked whether the accumulated bit estimate for the candidate control value fulfills an allowed bit consumption criterion, for example the bit budget for the spectrum that is input into the global gain calculator 25 as illustrated in FIG. 9. If the criterion is not fulfilled, the candidate control value is modified, and the calculation of the required bit estimates, their accumulation, and the check against the allowed bit consumption criterion are repeated for the modified candidate. Once such an optimum control value has been found, it is output on line 404 of FIG. 9.

Preferred embodiments are illustrated below.

Detailed description of the encoder (e.g., FIG. 5)

Notation

Let f_s denote the underlying sampling frequency in Hz, N_ms the underlying frame duration in milliseconds, and br the underlying bit rate in bits per second.

Derivation of the residual spectrum (e.g., preprocessor 10)

The embodiment operates on a real-valued residual spectrum X_f(k), which is typically derived by a time-to-frequency transform such as the MDCT, followed by psychoacoustically motivated modifications such as temporal noise shaping (TNS) to remove temporal structure and spectral noise shaping (SNS) to remove spectral structure. For audio content with a slowly varying spectral envelope, the envelope of the residual spectrum X_f(k) is therefore flat.

Global gain estimation (e.g., FIG. 9)

Quantization of the spectrum is controlled by a global gain g_glob via the following:

The initial global gain estimate (item 22 of FIG. 9) is derived from the power spectrum X_f(k)² after downsampling by a factor of 4:

and from a signal-adaptive noise floor N(X_f), which is given by:

(e.g., item 23 of FIG. 9)
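The downsampled power spectrum referred to above can be sketched in C as follows. The sketch assumes that downsampling by 4 means summing the squares of four neighbouring spectral lines; the exact expression is not reproduced in this text, so both that choice and the function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Downsample the power spectrum by a factor of 4: each output value is
 * the sum of the squares of four neighbouring lines (summation rather
 * than averaging is an assumption of this sketch).
 * n must be a multiple of 4; px must have room for n / 4 values. */
void decimated_power_spectrum(const double *x_f, size_t n, double *px)
{
    assert(x_f != NULL && px != NULL && n % 4 == 0);
    for (size_t k = 0; k < n / 4; k++) {
        double acc = 0.0;
        for (size_t j = 0; j < 4; j++) {
            double v = x_f[4 * k + j];
            acc += v * v;
        }
        px[k] = acc;
    }
}
```

The N/4 values produced this way correspond to the loop bound N/4-1 used in the gain search listing further below.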

The parameter regBits depends on the bit rate, the frame duration and the sampling frequency, and is calculated as follows:

(e.g., item 27 of FIG. 10)

where C(N_ms, f_s) is as specified in the table below.

N_ms \ f_s    48000    96000
2.5           -6       -6
5              0        0
10             2        5
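For illustration, the table can be held as a small lookup in C. The function name, the use of tenths of a millisecond to avoid comparing floating-point frame durations, and the return convention are assumptions of this sketch; only the six table entries come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Looks up C(N_ms, f_s) from the table.
 * frame_dms: frame duration in tenths of a millisecond (25, 50 or 100).
 * fs_hz:     sampling frequency in Hz (48000 or 96000).
 * Returns 1 and stores the offset in *c if the combination is listed,
 * otherwise returns 0 and leaves *c untouched. */
int regbits_offset(int frame_dms, long fs_hz, int *c)
{
    static const int durations[3] = {25, 50, 100};
    static const int table[3][2] = {
        {-6, -6},   /* 2.5 ms */
        { 0,  0},   /* 5 ms   */
        { 2,  5}    /* 10 ms  */
    };
    int col;

    assert(c != NULL);
    if (fs_hz == 48000L)
        col = 0;
    else if (fs_hz == 96000L)
        col = 1;
    else
        return 0;

    for (int row = 0; row < 3; row++) {
        if (durations[row] == frame_dms) {
            *c = table[row][col];
            return 1;
        }
    }
    return 0;
}
```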

The parameter lowBits depends on the center of mass of the absolute values of the residual spectrum and is calculated as follows:

(e.g., item 28 of FIG. 10)

where:

and:

are moments of the absolute spectrum.

The global gain is estimated in the form:

from the values:

(e.g., the output of the combiner 24 of FIG. 9),

where gg_off is an offset that depends on the bit rate and the sampling frequency.

Note that adding the noise floor term N(X_f) to PX_lp(k) gives the expected result of adding a corresponding noise floor to the residual spectrum X_f(k), e.g., of randomly adding or subtracting the term

for each spectral line, before calculating the power spectrum.
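The statement above can be checked with a short calculation. In the sketch below, c stands for the per-line term (whose exact expression is not reproduced in this text), s and s_j are independent random signs taking the values +1 and -1 with equal probability, and D is the number of spectral lines collected into one value PX_lp(k). Taking D = 4 and plain summation over those lines are assumptions based on the stated downsampling factor.

```latex
\begin{align*}
\mathbb{E}\!\left[\bigl(X_f(k) + s\,c\bigr)^2\right]
  &= X_f(k)^2 + 2c\,X_f(k)\,\mathbb{E}[s] + c^2\,\mathbb{E}[s^2]
   = X_f(k)^2 + c^2, \\
\mathbb{E}\!\left[\sum_{j=0}^{D-1}\bigl(X_f(Dk+j) + s_j\,c\bigr)^2\right]
  &= PX_{lp}(k) + D\,c^2 .
\end{align*}
```

Under these assumptions, the cross term vanishes because the sign has zero mean, and adding N(X_f) to PX_lp(k) corresponds to a per-line term of magnitude c = sqrt(N(X_f)/D).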

Purely power-spectrum-based estimates can already be found, e.g., in the 3GPP EVS codec (3GPP TS 26.445, section 5.3.3.2.8.1). In the embodiments, a noise floor N(X_f) is added. The noise floor is signal-adaptive in two ways.

First, it scales with the maximum amplitude of X_f. Consequently, the impact on the energy of a flat spectrum, in which all amplitudes are close to the maximum amplitude, is very small. For highly tonal signals, however, where the spectrum and, by extension, the residual spectrum exhibit a number of strong peaks, the overall energy increases significantly, which increases the bit estimate in the global gain calculation described below.

Second, the noise floor is lowered through the parameter lowBits if the spectrum exhibits a low center of mass. In this case, low-frequency content is dominant, so the loss of high-frequency components is likely not as critical as for highly tonal content.

The actual global gain estimation is performed (e.g., block 25 of FIG. 9) by a low-complexity bisection search, as given in the C code below, in which the bit budget for encoding the spectrum appears as nbits_spec (a name assumed here, since the original symbol is not reproduced in this text). The bit consumption estimate (accumulated in the variable tmp) is based on the energy values E(k), taking into account the context dependency of the arithmetic coder used for stage 1 coding.

    /* Bisection over the gain index gg_ind in 0..255 (8 halving steps). */
    fac = 256;                  /* step size; halved before use, so the first step is 128 */
    gg_ind = 255;
    for (iter = 0; iter < 8; iter++)
    {
        fac >>= 1;
        gg_ind -= fac;          /* try a smaller gain index */
        tmp = 0;                /* estimated bit consumption for this candidate */
        iszero = 1;             /* 1 until the first non-zero band has been seen */
        for (i = N/4-1; i >= 0; i--)    /* scan from the highest band downwards */
        {
            if (E[i]*28/20 < (gg_ind+gg_off))
            {
                /* band quantizes to zero; zero bands above the highest
                   non-zero band cost nothing */
                if (iszero == 0)
                {
                    tmp += 2.7*28/20;
                }
            }
            else
            {
                if ((gg_ind+gg_off) < E[i]*28/20-43*28/20)
                {
                    tmp += 2*E[i]*28/20-2*(gg_ind+gg_off)-36*28/20;
                }
                else
                {
                    tmp += E[i]*28/20-(gg_ind+gg_off)+7*28/20;
                }
                iszero = 0;
            }
        }
        /* over budget: undo the step and keep the larger gain index */
        if (tmp > nbits_spec*1.4*28/20 && iszero == 0)
        {
            gg_ind += fac;
        }
    }
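To make the search above easy to try out, the following sketch wraps it in a self-contained C function. The function name, the parameter types (double for the energies and the budget, int for the offset) and the name nbits_spec are assumptions of this sketch. The arithmetic expressions are kept exactly as written in the listing; note that a C compiler evaluates all-integer subexpressions such as 43*28/20 in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Bisection search for the gain index in 0..255.
 * e[0..n_bands-1]: energy values E(k) of the downsampled spectrum.
 * gg_off:          gain index offset.
 * nbits_spec:      bit budget for the spectrum (name assumed here). */
int estimate_gain_index(const double *e, int n_bands, int gg_off,
                        double nbits_spec)
{
    int fac = 256;
    int gg_ind = 255;

    assert(e != NULL && n_bands > 0);
    for (int iter = 0; iter < 8; iter++) {
        double tmp = 0.0;   /* accumulated bit estimate */
        int iszero = 1;     /* 1 until the first non-zero band is seen */

        fac >>= 1;
        gg_ind -= fac;
        for (int i = n_bands - 1; i >= 0; i--) {
            if (e[i]*28/20 < (gg_ind + gg_off)) {
                if (iszero == 0)
                    tmp += 2.7*28/20;
            } else {
                if ((gg_ind + gg_off) < e[i]*28/20 - 43*28/20)
                    tmp += 2*e[i]*28/20 - 2*(gg_ind + gg_off) - 36*28/20;
                else
                    tmp += e[i]*28/20 - (gg_ind + gg_off) + 7*28/20;
                iszero = 0;
            }
        }
        /* Over budget: step back to the larger gain index. */
        if (tmp > nbits_spec*1.4*28/20 && iszero == 0)
            gg_ind += fac;
    }
    return gg_ind;
}
```

With four bands at E(k) = 60 and gg_off = 0, a budget of zero bits drives the index up to the first value at which every band quantizes to zero, while a very large budget lets the index fall to 0. A tighter budget therefore yields a larger index.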

Residual coding (e.g., FIG. 3)

Residual coding uses the excess bits that are available after arithmetic coding of the quantized spectrum X_q(k). Let B denote the number of excess bits, and let K denote the number of coded non-zero coefficients X_q(k). Furthermore, let k_i, i = 1, ..., K, denote the enumeration of these non-zero coefficients from the lowest to the highest frequency. The residual bits b_i(j) (taking the values 0 and 1) for the coefficient k_i are calculated so as to minimize the error:

This can be done iteratively by testing whether the following holds:

If (1) is true, the n-th residual bit b_i(n) for the coefficient k_i is set to 0; otherwise it is set to 1. The residual bits are calculated by first computing the first residual bit for every k_i, then the second bit, and so on, until all residual bits are spent or the maximum number nmax of iterations has been carried out. This leaves:

residual bits for the coefficient X_q(k_i). This residual coding scheme improves on the residual coding scheme applied in the 3GPP EVS codec, which spends at most one bit per non-zero coefficient.

The calculation of the residual bits with nmax = 20 is illustrated by the following pseudocode, where gg denotes the global gain:

    iter = 0;
    nbits_residual = 0;
    offset = 0.25;
    while (nbits_residual < nbits_residual_max && iter < 20)
    {
        k = 0;
        while (k < N_E && nbits_residual < nbits_residual_max)
        {
            if (X_q[k] != 0)    /* only non-zero coefficients are refined */
            {
                if (X_f[k] >= X_q[k]*gg)
                {
                    res_bits[nbits_residual] = 1;
                    X_f[k] -= offset*gg;
                }
                else
                {
                    res_bits[nbits_residual] = 0;
                    X_f[k] += offset*gg;
                }
                nbits_residual++;
            }
            k++;
        }
        iter++;
        offset /= 2;            /* each pass halves the refinement step */
    }

Description of the decoder (e.g., FIG. 6)

At the decoder, the entropy-coded spectrum is obtained by entropy decoding. The residual bits are used to refine this spectrum, as demonstrated by the following pseudocode (see also, e.g., FIG. 8).

    /* Xq_dec[]: entropy-decoded quantized spectrum, refined in place
       (the symbol is missing from this text; the name Xq_dec is assumed) */
    iter = n = 0;
    offset = 0.25;
    while (iter < 20 && n < nResBits)
    {
        k = 0;
        while (k < N_E && n < nResBits)
        {
            if (Xq_dec[k] != 0)
            {
                if (resBits[n++] == 0)
                {
                    Xq_dec[k] -= offset;
                }
                else
                {
                    Xq_dec[k] += offset;
                }
            }
            k++;
        }
        iter++;
        offset /= 2;
    }
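The encoder and decoder loops can be exercised together with the following self-contained C sketch. The function names, the array types and the storage of one bit per byte are assumptions of this sketch; the loop structure follows the two pseudocode listings. The encoder modifies its copy of the spectrum, as the pseudocode does.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITER 20   /* nmax in the text */

/* Encoder side: writes up to max_bits refinement bits (one per byte)
 * for the non-zero coefficients and returns the number written.
 * x_f is a working copy of the spectrum and is modified in place. */
int residual_encode(double *x_f, const int *x_q, int n, double gg,
                    unsigned char *bits, int max_bits)
{
    int nbits = 0;
    double offset = 0.25;

    assert(x_f != NULL && x_q != NULL && bits != NULL);
    for (int iter = 0; iter < MAX_ITER && nbits < max_bits; iter++) {
        for (int k = 0; k < n && nbits < max_bits; k++) {
            if (x_q[k] == 0)
                continue;
            if (x_f[k] >= x_q[k] * gg) {
                bits[nbits] = 1;
                x_f[k] -= offset * gg;
            } else {
                bits[nbits] = 0;
                x_f[k] += offset * gg;
            }
            nbits++;
        }
        offset /= 2;
    }
    return nbits;
}

/* Decoder side: refines the decoded quantized values (held as double)
 * in place using nbits refinement bits. */
void residual_decode(double *xq_dec, int n,
                     const unsigned char *bits, int nbits)
{
    int used = 0;
    double offset = 0.25;

    assert(xq_dec != NULL && bits != NULL);
    for (int iter = 0; iter < MAX_ITER && used < nbits; iter++) {
        for (int k = 0; k < n && used < nbits; k++) {
            if (xq_dec[k] == 0.0)
                continue;
            xq_dec[k] += (bits[used++] == 0) ? -offset : offset;
        }
        offset /= 2;
    }
}
```

For a spectrum {2.3, -0.1, -3.9} quantized to {2, 0, -4} with gg = 1 and four residual bits, the encoder emits the bits 1, 1, 1, 0 and the decoder refines the values to {2.375, 0, -3.875}. Each non-zero coefficient receives two bits, and the zero coefficient receives none.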

The decoded residual spectrum is given by:

Conclusions

The noise floor adder effectively shifts bits from the first coding stage to the second coding stage for highly tonal signals, while leaving the estimate for other signal types unchanged. This shift of bits from an entropy coding stage to a non-entropy coding stage is claimed to be fully efficient for highly tonal signals.

FIG. 12 illustrates a procedure for reducing the number of audio data items in a signal-dependent way using the separated reduction. In step 901, quantization is performed using non-manipulated information, such as a global gain calculated from the signal data without any manipulation. For this purpose, the (full) bit budget for the audio data items is required, and quantized data items are obtained at the output of block 901. In block 902, the number of audio data items is reduced by eliminating a (controlled) number of, preferably, the smallest audio data items based on a signal-dependent control value. At the output of block 902, a reduced number of data items is obtained. In block 903, the initial coding stage is applied, and with the bit budget for residual bits that remains because of the controlled reduction, the refinement coding stage is applied, as illustrated at 904.
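The reduction step of block 902 can be sketched in C as follows. The sketch assumes that the signal-dependent control value has already been mapped to a magnitude threshold; that mapping, the function name and the threshold parameter are assumptions of this sketch and are not specified in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sets every non-zero quantized item whose magnitude does not exceed
 * threshold to zero and returns how many items were removed.
 * threshold is a hypothetical stand-in for the control value. */
size_t drop_small_items(int *x_q, size_t n, int threshold)
{
    size_t removed = 0;

    assert(x_q != NULL && threshold >= 0);
    for (size_t k = 0; k < n; k++) {
        if (x_q[k] != 0 && abs(x_q[k]) <= threshold) {
            x_q[k] = 0;
            removed++;
        }
    }
    return removed;
}
```

Items removed in this way cost no bits in the initial coding stage, so the bits saved become available to the refinement coding stage.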

In addition to the procedure in Fig. 12, the reduction block 902 can also be performed before the actual quantization, using a global gain value or, in general, a certain quantizer step size that has been determined from non-manipulated audio data. The reduction of the number of audio data elements can therefore also be performed in the non-quantized domain, either by setting certain, preferably small, values to zero or by weighting certain values with weighting factors so that they are ultimately quantized to zero. In the separated reduction implementation, an explicit quantization step on the one hand and an explicit reduction step on the other hand are performed, and the control of the quantization itself is carried out without any manipulation of the data.
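The separated reduction of Fig. 12 can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the function name `separated_reduction`, the uniform rounding quantizer and the use of a plain count `n_drop` as the signal-dependent control value are assumptions made for the example.

```python
import numpy as np


def separated_reduction(spectrum, global_gain, n_drop):
    """Quantize with an unmanipulated gain, then zero the smallest items.

    n_drop is a hypothetical stand-in for the signal-dependent control
    value: the number of smallest non-zero items to eliminate.
    """
    # Block 901: quantization controlled by unmanipulated information.
    scaled = np.asarray(spectrum, dtype=float) / global_gain
    quantized = np.round(scaled).astype(int)

    # Block 902: eliminate a controlled number of the smallest non-zero items.
    nonzero = np.flatnonzero(quantized)
    by_magnitude = nonzero[np.argsort(np.abs(quantized[nonzero]), kind="stable")]
    reduced = quantized.copy()
    reduced[by_magnitude[:n_drop]] = 0

    # Blocks 903 and 904 (initial and detail coding) would operate on `reduced`.
    return quantized, reduced


quantized, reduced = separated_reduction(
    [8.0, 0.9, -1.2, 3.1, 0.2, -5.0], global_gain=1.0, n_drop=2
)
```

Because the gain is left untouched, the bits saved by the zeroed items become available to the detail coding stage.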

In contrast, Fig. 13 illustrates the integrated reduction mode in accordance with an embodiment of the present invention. In block 911, the manipulated information, such as the global gain illustrated at the output of block 25 of Fig. 9, is determined by the controller 20. In block 912, the non-manipulated audio data are quantized using the manipulated global gain or, in general, the manipulated information calculated in block 911. The quantization procedure of block 912 yields a reduced number of audio data elements, which are initially coded in block 903 and detail coded in block 904. Due to the signal-dependent reduction of audio data elements, residual bits remain for at least a single full iteration and at least a part of a second iteration, and preferably for even more than two iterations. The shift of the bit budget from the initial coding stage to the detail coding stage is performed in accordance with the present invention and in a signal-dependent manner.
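A minimal sketch of the integrated reduction of Fig. 13 is given below. It assumes a purely hypothetical bit cost model (`estimate_bits`), a simple multiplicative gain search (`find_gain`) and a constant `noise_floor` as the manipulation; none of these names or formulas is taken from the embodiment. The point illustrated is only that the gain is derived from manipulated values while the quantizer is applied to the non-manipulated data.

```python
import math


def estimate_bits(magnitudes, gain):
    # Hypothetical cost model: roughly log2(1 + |x| / gain) bits per item.
    return sum(math.log2(1.0 + abs(m) / gain) for m in magnitudes)


def find_gain(magnitudes, budget, start=1.0, factor=2.0 ** 0.25):
    # Raise the gain until the estimated bit demand fits the budget.
    if budget <= 0:
        raise ValueError("budget must be positive")
    gain = start
    while estimate_bits(magnitudes, gain) > budget:
        gain *= factor
    return gain


def integrated_reduction(spectrum, budget, noise_floor):
    # Block 911: derive the gain from manipulated magnitudes (noise floor added).
    manipulated = [abs(x) + noise_floor for x in spectrum]
    gain = find_gain(manipulated, budget)
    # Block 912: quantize the non-manipulated data with the manipulated gain.
    quantized = [int(round(x / gain)) for x in spectrum]
    return gain, quantized


spectrum = [40.0, 3.0, -2.5, 2.0, 30.0, -1.5, 1.0, 0.5]
plain_gain, plain_q = integrated_reduction(spectrum, budget=12.0, noise_floor=0.0)
manip_gain, manip_q = integrated_reduction(spectrum, budget=12.0, noise_floor=2.0)
```

In this sketch the added noise floor can only raise the estimated bit demand, so the manipulated gain is never smaller than the plain one and the number of non-zero quantized items never grows; the bits not spent in the initial coding stage are then left for detail coding.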

The present invention can be implemented in at least four different modes. The control value can be determined in a direct mode, with an explicit determination of the signal characteristic, or in an indirect mode, without an explicit determination of the signal characteristic but with the addition of a signal-dependent noise floor to the audio data or to derived audio data as an example of a manipulation. At the same time, the reduction of the number of audio data elements is performed in an integrated manner or in a separated manner. Thus, an indirect determination can be combined with an integrated reduction or with a separated reduction, and a direct determination of the control value can likewise be combined with an integrated reduction or with a separated reduction. For efficiency reasons, an indirect determination of the control value together with an integrated reduction of audio data elements is preferred.

It is to be noted here that all alternatives or aspects discussed above and all aspects defined by the independent claims in the following claims can be used individually, i.e., without any alternative or object other than the contemplated alternative, object or independent claim. However, in other embodiments, two or more of the alternatives or aspects or independent claims can be combined with each other and, in further embodiments, all aspects or alternatives and all independent claims can be combined with each other.

An audio signal encoded according to the invention can be stored on a digital storage medium or a non-transitory storage medium, or can be transmitted over a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier or on a non-transitory storage medium.

In other words, an embodiment of the inventive method is therefore a computer program having program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is the intent, therefore, to be limited only by the scope of the following claims and not by the specific details presented herein by way of description and explanation of the embodiments.

Claims

1. Audio encoder for encoding input audio data (11), comprising:

- a preprocessor (10) for pre-processing the input audio data (11) to obtain the audio data to be encoded;

- an encoder processor (15) for encoding the audio data to be encoded; and

- a controller (20) for controlling the encoder processor (15) such that, depending on a first signal characteristic of a first frame of the audio data to be encoded, the number of audio data elements of the audio data to be encoded by the encoder processor (15) for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used to encode the reduced number of audio data elements for the first frame is increased more strongly compared to a second number of information units for the second frame.

2. Audio encoder according to claim 1,

- in which the encoder processor (15) comprises an initial encoding stage (151) and a detail encoding stage (152),

- wherein the controller (20) is configured to reduce the number of audio data elements encoded by the initial coding stage (151) for the first frame,

wherein the initial encoding stage (151) is configured to encode the reduced number of audio data elements for the first frame using the initial number of information units of the first frame, and

wherein the detail coding stage (152) is configured to use the remaining number of information units of the first frame for detail coding of the reduced number of audio data elements for the first frame, wherein the sum of the initial number of information units of the first frame and the remaining number of information units of the first frame is a predetermined number of information units for the first frame.

3. Audio encoder according to claim 2,

- in which the controller (20) is configured to reduce the number of audio data elements encoded by the initial coding stage (151) for the second frame to a higher number of audio data elements compared to the first frame,

wherein the initial encoding stage (151) is configured to encode the reduced number of audio data elements for the second frame using the initial number of information units of the second frame, wherein the initial number of information units of the second frame is greater than the initial number of information units of the first frame, and

wherein the detail coding stage (152) is configured to use the remaining number of information units of the second frame for detail coding of the reduced number of audio data elements for the second frame, wherein the sum of the initial number of information units of the second frame and the remaining number of information units of the second frame is the predetermined number of information units for the first frame.

4. An audio encoder according to any one of the preceding claims,

wherein the initial encoding stage (151) is configured to encode a reduced number of audio data elements for the first frame using the initial number of information units of the first frame,

wherein the detail coding stage (152) is configured to use the remaining number of information units of the first frame for detail coding of the reduced number of audio data elements for the first frame, wherein the sum of the initial number of information units of the first frame and the remaining number of information units of the first frame is a predetermined number of information units for the first frame, and

wherein the controller (20) is configured to control the encoder processor (15) such that the detail coding stage (152) performs detail coding of at least one of the reduced number of audio data elements of the first frame using at least two information units, or such that the detail coding stage (152) performs detail coding on more than 50 percent of the reduced number of audio data items using at least two information units for each audio data item, or

wherein the controller (20) is configured to control the encoder processor (15) such that the detail coding stage (152) performs detail coding of all audio data elements of the second frame using less than two information units, or such that the detail coding stage (152) performs detail coding of less than 50 percent of the reduced number of audio data elements using at least two information units for each audio data element.

5. An audio encoder according to any one of the preceding claims,

wherein the detail coding stage (152) is configured to use the remaining number of information units of the first frame for detail coding for the reduced number of audio data elements for the first frame,

wherein the detail coding stage (152) is configured to iteratively assign (300, 302) the remaining number of information units of the first frame to the reduced number of audio data elements in at least two sequentially performed iterations, to calculate (304, 308, 312) the values of the assigned information units for the at least two sequentially performed iterations, and to introduce (316, 318, 320) the calculated values of the information units for the at least two sequentially performed iterations into the encoded output frame in a predetermined order.

6. The audio encoder of claim 5, wherein the detail encoding stage (152) is configured to sequentially compute (304) an information unit for each audio data element from the reduced number of audio data elements for the first frame, in order from low frequency information for the audio data element to high frequency information for the audio data element in the first iteration,

wherein the detail coding stage (152) is configured to sequentially compute (308) an information unit for each audio data element from the reduced number of audio data elements for the first frame, in order from low frequency information for the audio data element to high frequency information for the audio data element in the second iteration, and

wherein the detail coding stage (152) is configured to check (314) whether the number of information units already assigned is less than the predetermined number of information units for the first frame minus the initial number of information units of the first frame, and to terminate the second iteration in the case of a negative check result or, in the case of a positive check result, to perform (312) a number of further iterations until a negative check result is obtained, the number of further iterations being at least one, or

wherein the detail coding stage (152) is configured to count the number of non-zero audio data elements and to determine the number of iterations from the number of non-zero audio data elements and the predetermined number of information units for the first frame minus the initial number of information units of the first frame.
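The iterative assignment of the remaining information units recited in claims 5 and 6 can be illustrated with the following sketch. It is an assumption-laden example rather than the claimed implementation: the function name `refine`, the one-bit-per-element residual rule, the halving offset and the `max_iterations` cap are all hypothetical.

```python
def refine(original, quantized, gain, budget, max_iterations=8):
    """Emit one refinement bit per non-zero item and iteration.

    Items are visited from low to high frequency index in every iteration,
    and the loop stops as soon as `budget` bits have been assigned.
    """
    recon = [q * gain for q in quantized]
    offset = gain / 4.0  # hypothetical first refinement step
    bits = []
    for _ in range(max_iterations):
        for k, q in enumerate(quantized):
            if q == 0:
                continue  # only the reduced set of non-zero items is refined
            if len(bits) >= budget:
                return bits, recon  # remaining budget exhausted
            bit = 1 if original[k] >= recon[k] else 0
            bits.append(bit)
            recon[k] += offset if bit else -offset
        offset /= 2.0  # each further iteration refines with a finer step
    return bits, recon


bits, recon = refine([2.3, 0.1, -1.2], [2, 0, -1], gain=1.0, budget=5)
# bits == [1, 0, 1, 1, 0]; recon == [2.3125, 0.0, -1.125]
```

With only two non-zero elements and five remaining bits, the sketch completes two full iterations and part of a third, which is the situation the reduction of audio data elements is meant to create.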

7. An audio encoder according to any one of the preceding claims,

wherein the initial coding stage (151) is configured to encode a number of most significant information units for each audio data element of the reduced number of audio data elements for the first frame using the initial number of information units of the first frame, said number being greater than one, and

wherein the detail coding stage (152) is configured to use the remaining number of information units of the first frame to encode a number of least significant information units for each audio data element of the reduced number of audio data elements for the first frame, said number being greater than one for at least one audio data element of the reduced number of audio data elements for the first frame.

8. An audio encoder according to any one of the preceding claims,

wherein the first signal characteristic is a first tonality value, the second signal characteristic is a second tonality value, and the first tonality value indicates a higher tonality than the second tonality value, and

wherein the controller (20) is configured to reduce the number of audio data elements for the first frame to a first number that is less than the number of audio data elements for the second frame, and to increase the average number of information units used to encode each audio data element of the reduced number of audio data elements of the first frame such that it exceeds the average number of information units used to encode each audio data element of the reduced number of audio data elements of the second frame.

9. An audio encoder according to any one of the preceding claims, wherein the encoder processor (15) comprises:

- a variable quantizer (150) for quantizing the audio data of the first frame to obtain quantized audio data for the first frame, and for quantizing the audio data of the second frame to obtain quantized audio data for the second frame;

- an initial coding stage (151) for encoding the quantized audio data of the first frame or the second frame;

- a detail coding stage (152) for encoding the residual data of the first frame and the second frame;

wherein the controller (20) is configured to analyze (26, 28) the audio data of the first frame to determine a first control value (21) for the variable quantizer (150) for the first frame, and to analyze (26, 28) the audio data of the second frame to determine a second control value for the variable quantizer (150) for the second frame, the second control value being different from the first control value (21), and

wherein the controller (20) is configured to perform (23, 24) a manipulation of the audio data of the first frame or the second frame, or of amplitude-related values derived from the audio data of the first frame or the second frame, depending on the audio data, in order to determine the first control value (21) or the second control value (21), wherein the variable quantizer (150) is configured to quantize the audio data of the first frame or the second frame without said manipulation.

10. Audio encoder according to any one of claims 1-9, wherein the encoder processor (15) comprises:

wherein the controller (20) is configured to analyze the audio data of the first frame to determine a first control value (21) for the variable quantizer (150), for the initial coding stage (151) or for the module (150) for reducing the number of audio data elements for the first frame, and to analyze the audio data of the second frame to determine a second control value for the variable quantizer (150), for the initial coding stage (151) or for the module (150) for reducing the number of audio data elements for the second frame, the second control value being different from the first control value, and

wherein the controller (20) is configured (201) to determine a first tonality characteristic as the first signal characteristic for determining the first control value, and a second tonality characteristic as the second signal characteristic for determining the second control value, such that the bit budget for the detail coding stage (152) is increased in the case of the first tonality characteristic compared to the bit budget for the detail coding stage (152) in the case of the second tonality characteristic, the first tonality characteristic indicating a higher tonality than the second tonality characteristic.

11. The audio encoder according to claim 9 or 10, wherein the initial coding stage (151) is an entropy coding stage for entropy coding, or the detail coding stage (152) is a residual or binary coding stage for encoding the residual data of the first frame and the second frame.

12. Audio encoder according to any one of claims 9-11,

- in which the controller (20) is configured to determine the first or second control value such that a first information unit budget for the initial coding stage (151) is less than or equal to a predetermined value, and the controller (20) is configured to derive a second information unit budget for the detail coding stage (152) using the first information unit budget and the maximum number of information units for the first or second frame, or the predetermined value.

13. Audio encoder according to any one of claims 9-12, in which the controller (20) is configured to calculate (22) the amplitude-related values as a plurality of power values derived from one or more audio values of the audio data, and to manipulate (24) the power values by adding the same manipulation value to all power values of the plurality of power values, or

wherein the controller (20) is configured for:

- a randomized addition or subtraction (24) of the same manipulation value to or from all audio values of a plurality of audio values included in the frame, or

- an addition or subtraction of values obtained with the same absolute value of the manipulation value, but preferably with randomized signs, or

- an addition or subtraction of values obtained by subtracting slightly different terms from the same absolute value, or

- an addition or subtraction of values obtained as samples from a normalized probability distribution scaled using a calculated complex or real absolute value of the manipulation value, or

wherein the controller (20) is configured to calculate (22) the amplitude-related values by exponentiating the audio data of the first or second frame, or subsampled audio data of the first or second frame, with an exponent value, the exponent value being greater than 1.

14. Audio encoder according to any one of claims 9-13, in which the controller (20) is configured to calculate (23) the manipulation value for the manipulation using the maximum value (26) of the plurality of audio data or amplitude-related values, or using the maximum value of a plurality of subsampled audio data or a plurality of subsampled amplitude-related values, for the first or second frame.

15. Audio encoder according to any one of claims 9-14, wherein the controller (20) is configured to calculate (23) the manipulation value for the manipulation additionally using a signal-independent weighting value (27), wherein the signal-independent weighting value depends on at least one of the bit rate for the first or second frame, the frame duration and the sampling rate.

16. Audio encoder according to any one of claims 9-15, wherein the controller (20) is configured to calculate (23, 29) the manipulation value for the manipulation using a signal-dependent weighting value derived from at least one of a first sum of the absolute values of the audio data or subsampled audio data in the frame, a second sum of the absolute values of the audio data or subsampled audio data in the frame, each multiplied by the index associated with that absolute value, and the quotient of the second sum and the first sum.

17. Audio encoder according to any one of claims 9-16,

- in which the controller (20) is configured to calculate (29) the manipulation value for the manipulation based on the following equation:

[equation missing in the source]

where k is a frequency index, X_f(k) is the audio data value for frequency index k before quantization, max is the maximum function, regBits is a first, signal-independent weighting value, and lowBits is a second, signal-dependent weighting value.

18. An audio encoder according to any one of the preceding claims, wherein the preprocessor (10) further comprises:

- a time-frequency converter (14) for converting time-domain audio data into spectral values of a frame; and

- a spectral processor (15) for calculating modified spectral values having a flatter spectral envelope than the spectral envelope of the spectral values, wherein the modified spectral values represent the audio data of the first or second frame to be encoded by the encoder processor (15).

19. The audio encoder of claim 18, wherein the spectral processor (15) is configured to perform at least one of a temporal noise shaping operation, a spectral noise shaping operation, and a spectral whitening operation.

20. Audio encoder according to any one of claims 9-19, wherein the controller (20) is configured to calculate the control value using a plurality of energy values as amplitude-related values for a frame, each energy value being derived (22, 23, 24) from a power value as the amplitude-related value and from a signal-dependent manipulation value for said manipulation.

21. Audio encoder according to claim 20, wherein the controller (20) is configured for:

- calculating a required bit estimate for each energy value depending on the energy value and a candidate value for the control value,

- accumulating the required bit estimates for the energy values and the candidate value for the control value,

- checking whether the accumulated bit estimate for the candidate value for the control value fulfils an allowed bit consumption criterion, and

- modifying the candidate value for the control value if the allowed bit consumption criterion is not fulfilled, and repeating the calculation of the required bit estimates, the accumulation of the required bit estimates and the check until the allowed bit consumption criterion is fulfilled for a modified candidate value for the control value.
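The four steps of claim 21 (estimate, accumulate, check, modify) amount to a search loop over candidate control values. The sketch below shows one possible shape of such a loop under stated assumptions: the function name `select_control_value`, the candidate being a gain in dB, the fixed 1 dB step and the rule of thumb of about 6.02 dB per bit are hypothetical and are not taken from the claims.

```python
def select_control_value(energies_db, allowed_bits, start_db=0.0, step_db=1.0):
    """Search for a candidate control value that meets the bit criterion."""
    if allowed_bits < 0:
        raise ValueError("allowed_bits must be non-negative")
    candidate = start_db
    while True:
        accumulated = 0.0
        for energy in energies_db:
            # Required bit estimate for this energy value and candidate
            # (hypothetical model: about one bit per 6.02 dB above the candidate).
            accumulated += max(0.0, energy - candidate) / 6.02
        # Allowed bit consumption criterion.
        if accumulated <= allowed_bits:
            return candidate, accumulated
        # Criterion not met: modify the candidate and repeat.
        candidate += step_db


candidate, accumulated = select_control_value([60.2, 30.1, 0.0], allowed_bits=10)
# candidate == 16.0 with this cost model
```

A real encoder would typically use a faster search than a linear scan; the linear form is kept here only because it mirrors the wording of the claim step by step.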

22. Audio encoder according to claim 20 or 21,

- in which the controller (20) is configured to calculate a plurality of energy values based on the following equation:

where E(k) is the energy value for index k, PX_lp(k) is the power value for index k as an amplitude-related value, and N(X_f) is the signal-dependent manipulation value.
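A minimal sketch of how such energy values could be formed from PX_lp(k) and N(X_f). The functional form used here (the value N(X_f) added to the power value as a noise floor before conversion to decibels, plus a small constant `eps` to avoid a logarithm of zero) is an assumption for illustration and is not taken from the claim text; the function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def band_energies_db(
    power: Sequence[float],
    manipulation_value: float,
    eps: float = 2.0 ** -31,
) -> List[float]:
    """Assumed form: E(k) = 10 * log10(PX_lp(k) + N(X_f) + eps)."""
    return [10.0 * math.log10(p + manipulation_value + eps) for p in power]


if __name__ == "__main__":
    power = [1e-6, 1.0]
    plain = band_energies_db(power, 0.0)
    raised = band_energies_db(power, 1e-3)
    # The weak band is lifted strongly, the strong band is almost unchanged.
    print([round(r - p, 2) for r, p in zip(raised, plain)])
```

Under this assumption the added value mainly raises the energies of weak spectral regions, which is what makes a bit estimate derived from the energies larger than one derived from the unmanipulated data.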

23. Audio encoder according to any one of claims 9-22, wherein the controller (20) is configured to calculate a first or second control value based on an estimate of the accumulated information units required for each manipulated audio data value or manipulated amplitude-related value.

24. Audio encoder according to any one of claims 9-23,

- in which the controller (20) is configured to manipulate in such a way that, due to manipulation, the bit budget for the initial encoding stage (151) is increased or the bit budget for the detailed encoding stage (152) is reduced.

25. Audio encoder according to any one of claims 9-24,

- in which the controller (20) is configured to manipulate such that the manipulation results in a larger bit budget for the residual coding stage for a signal having a first tonality compared to a signal having a second tonality, the second tonality being lower than the first tonality.

26. Audio encoder according to any one of claims 9-25,

- in which the controller (20) is configured to manipulate such that the energy of the audio data from which the bit budget for the initial coding stage (151) is calculated is increased relative to the energy of the audio data to be quantized by the variable quantizer (150).

27. An audio encoder according to any one of the preceding claims, wherein the encoder processor (15) comprises a variable quantizer (150) for quantizing the audio data of the first frame to obtain quantized audio data for the first frame and for quantizing the audio data of the second frame to obtain quantized audio data for the second frame,

- wherein the controller (20) is configured to calculate a global gain for the first or second frame, and

- wherein the variable quantizer (150) comprises: a weighter (155) for weighting with the global gain; and a quantizer core (157) having a fixed quantization step size.
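The structure of claim 27 (weighting with a global gain followed by a quantizer with a fixed step size) can be sketched as follows. The division by the gain, the unit step size, and the round-half-away-from-zero rule are assumptions chosen for illustration; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantize_fixed_step(x: float) -> int:
    """Quantizer core with a fixed step size of 1 (rounding rule is an assumption)."""
    magnitude = math.floor(abs(x) + 0.5)
    return int(math.copysign(magnitude, x))


def variable_quantize(values: Sequence[float], global_gain: float) -> List[int]:
    """Weight each value with the global gain, then apply the fixed-step core."""
    if global_gain <= 0.0:
        raise ValueError("global_gain must be positive")
    weighted = [v / global_gain for v in values]  # weighting stage
    return [quantize_fixed_step(w) for w in weighted]  # fixed-step stage


if __name__ == "__main__":
    spectrum = [0.4, -2.6, 10.0]
    print(variable_quantize(spectrum, 1.0))  # prints [0, -3, 10]
    print(variable_quantize(spectrum, 4.0))  # prints [0, -1, 3]
```

With this split, the effective quantization precision is controlled solely through the global gain, while the core itself never changes.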

28. An audio encoder according to any one of the preceding claims, wherein the encoder processor (15) comprises an initial encoding stage (151) and a detail encoding stage (152),

wherein the detail coding stage (152) is configured to calculate detail bits for the quantized audio values in a plurality of iterations, the detail bits indicating a different amount at each iteration, or

wherein a detail bit at a lower iteration indicates a larger amount than a detail bit at a higher iteration, or

wherein the amount is a fractional amount that is a fraction of the quantizer step size indicated by the control value.

29. An audio encoder according to any one of the preceding claims, wherein the encoder processor (15) comprises a detail coding stage (152), wherein the detail coding stage (152) is configured (304, 308, 312) to:

- perform iterative processing having at least two iterations,

- check whether the quantized audio value, or the quantized audio value together with a potential first amount associated with the detail bit for the quantized audio value at the first iteration and with a second amount for the second iteration added or subtracted, is, when weighted by the global gain, greater or less than the unquantized audio value, and

- set the detail bit for the second iteration depending on the result of the check.
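The iterative check described in claim 29 can be sketched for a single spectral value. The starting amount of one quarter of the quantizer step, its halving per iteration, and all names are assumptions for illustration; the claims only require that the amounts differ between iterations.

```python
from typing import List, Tuple


def detail_bits(
    x: float,
    q: int,
    global_gain: float,
    iterations: int = 2,
    first_amount: float = 0.25,
) -> Tuple[List[int], float]:
    """Return the detail bits for one value and the refined reconstruction.

    x is the unquantized value, q the quantized index, and the amounts are
    fractions of the quantizer step (assumed to halve at each iteration).
    """
    bits: List[int] = []
    offset = 0.0  # correction accumulated from earlier iterations
    amount = first_amount
    for _ in range(iterations):
        # Compare the gain-weighted current reconstruction with the original.
        reconstruction = (q + offset) * global_gain
        bit = 1 if x >= reconstruction else 0
        bits.append(bit)
        # Add or subtract this iteration's amount depending on the bit.
        offset += amount if bit else -amount
        amount /= 2.0
    return bits, (q + offset) * global_gain


if __name__ == "__main__":
    print(detail_bits(2.3, 2, 1.0))  # prints ([1, 1], 2.375)
    print(detail_bits(1.8, 2, 1.0))  # prints ([0, 1], 1.875)
```

In both examples the refined reconstruction lies closer to the unquantized value than the plain quantized value 2.0 does.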

30. An audio encoder according to any one of the preceding claims, wherein the encoder processor (15) comprises a variable quantizer (150) and a detail coding stage (152), wherein the detail coding stage (152) is configured to compute a detail bit only for audio values that are not quantized to zero by a variable quantizer (150).

31. An audio encoder according to any one of the preceding claims,

- in which the controller (20) is configured to reduce the effect of manipulation for audio data having a center of mass at a lower frequency, and

wherein the initial encoding stage (151) of the encoder processor (15) is configured to remove high-frequency spectral values from the audio data if it is determined that the bit budget for the first or second frame is not sufficient to encode the quantized audio data of the frame.
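The removal of high-frequency spectral values when the bit budget is insufficient can be sketched as below. The per-value bit cost model and all names are hypothetical; an actual entropy coder would supply its own estimate.

```python
from typing import Callable, List, Sequence


def truncate_to_budget(
    quantized: Sequence[int],
    bit_budget: int,
    bits_for: Callable[[int], int],
) -> List[int]:
    """Zero quantized values from the highest frequency downward until the
    values that remain to be coded fit into the bit budget."""
    out = list(quantized)
    last = len(out)  # number of low-frequency values still coded
    while last > 0 and sum(bits_for(v) for v in out[:last]) > bit_budget:
        last -= 1
        out[last] = 0  # drop the highest remaining spectral value
    return out


def toy_bits(v: int) -> int:
    """Hypothetical cost: magnitude bits plus one sign/flag bit."""
    return abs(v).bit_length() + 1


if __name__ == "__main__":
    print(truncate_to_budget([5, -3, 0, 2, 1], 9, toy_bits))  # prints [5, -3, 0, 0, 0]
```

Recomputing the sum in every pass keeps the sketch short; a running total would avoid the repeated work.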

32. Audio encoder according to any one of the preceding claims,

- in which the controller (20) is configured to perform a bisection search for each frame separately using the manipulated spectral energy values for the first or second frame as the manipulated amplitude-related values for the first or second frame.
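A per-frame bisection search over the control value can be sketched as follows. It assumes that the accumulated bit estimate does not increase as the control value grows; that assumption, the search range, and the toy estimator are illustrative and not taken from the claims.

```python
from typing import Callable, Sequence


def bisect_control_value(
    energies: Sequence[float],
    bit_budget: float,
    estimate_bits: Callable[[float, int], float],
    lo: int = 0,
    hi: int = 255,
) -> int:
    """Return the smallest value in [lo, hi] whose accumulated bit estimate
    fits the budget; returns hi if no value in the range fits."""
    while lo < hi:
        mid = (lo + hi) // 2
        total = sum(estimate_bits(e, mid) for e in energies)
        if total <= bit_budget:
            hi = mid  # fits: try a smaller (finer) control value
        else:
            lo = mid + 1  # too expensive: move to coarser values
    return lo


def toy_estimate(energy_db: float, candidate: int) -> float:
    """Hypothetical estimator: about 1 bit per 6 dB above the candidate level."""
    return max(0.0, (energy_db - candidate) / 6.0)


if __name__ == "__main__":
    # energies stands in for one frame's manipulated spectral energy values
    print(bisect_control_value([60.0, 48.0, 30.0], 9.6, toy_estimate))  # prints 27
```

For a range of 256 candidate values the search needs eight estimate passes per frame, compared with up to 255 for a linear scan.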

33. A method for encoding input audio data, comprising the steps of:

- pre-processing the input audio data (11) to obtain the audio data to be encoded;

- encoding the audio data to be encoded; and

- controlling the encoding such that, depending on a first signal characteristic of a first frame of the audio data to be encoded, the number of audio data items of the audio data to be encoded for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used to encode the reduced number of audio data items for the first frame is increased more strongly compared to a second number of information units for the second frame.

34. The method of claim 33, wherein the encoding comprises the steps of:

- variably quantizing the audio data of the frame to obtain quantized audio data;

- entropy encoding the quantized audio data of the frame; and

- encoding the residual data of the frame;

- wherein the controlling comprises determining a control value for the variable quantizing, the determining comprising: analyzing the audio data of the first or second frame; and manipulating the audio data of the first or second frame, or amplitude-related values derived from the audio data of the first or second frame, depending on the audio data, in order to determine the control value, wherein the variable quantizing quantizes the audio data of the frame without the manipulation, or

wherein the controlling comprises determining a first or second tonality characteristic of the audio data and determining the control value such that the bit budget for the residual coding is increased in the case of the first tonality characteristic compared to the bit budget for the residual coding in the case of the second tonality characteristic, wherein the first tonality characteristic indicates a higher tonality than the second tonality characteristic.

35. A physical storage medium storing a computer program for performing, when executed on a computer or a processor, the method according to claim 33 or 34.