RU2344493C2

RU2344493C2 - Audio coding with different coding frame durations

Info

Publication number: RU2344493C2
Application number: RU2006139796/09A
Authority: RU
Inventors: Jari Mäkinen (FI)
Original assignee: Nokia Corporation
Priority date: 2004-05-17
Filing date: 2004-05-17
Publication date: 2009-01-20
Also published as: RU2006139796A

Abstract

FIELD: information technologies.

SUBSTANCE: the invention relates to a method for supporting the encoding of an audio signal. At least one segment of the audio signal is to be encoded with a coding model that permits different coding frame durations. According to the method, at least one control parameter is determined on the basis of the audio signal characteristics. This control parameter is then used to restrict the selectable coding frame durations for the at least one segment. The group of inventions also comprises:
- a module (10, 11) implementing this method;
- a device (1) and a system comprising such a module (10, 11);
- a software program product including program code for implementing the proposed method.

EFFECT: enables simple selection of the most suitable coding frame duration.

34 claims, 4 drawings

Description

TECHNICAL FIELD

The invention relates to a method for supporting the encoding of an audio signal. At least one segment of the audio signal is to be encoded with a coding model that permits different coding frame durations. The invention also relates to a corresponding module, electronic device, system and software program product.

BACKGROUND

Encoding audio signals is known to enable their efficient transmission and/or storage.

An audio signal may be a speech signal or another type of audio signal, such as music. Different coding models may suit different types of audio signals.

A widely used technique for encoding speech signals is Algebraic Code-Excited Linear Prediction (ACELP) coding. ACELP models the human speech production system and is well suited to encoding the periodicity of a speech signal. As a result, high speech quality can be achieved at very low bit rates.

The Adaptive Multi-Rate Wideband (AMR-WB) codec, for example, is a speech codec based on ACELP technology. It is described, for example, in the technical specification 3GPP TS 26.190: "Speech Codec speech processing functions; AMR Wideband speech codec; Transcoding functions", V5.1.0 (2001-12).

For other types of audio signals, however, such as music, speech codecs based on the human speech production system usually perform rather poorly.

A widely used method for encoding audio signals other than speech is transform coding (TCX). The advantages of transform coding for audio signals stem from perceptual masking and frequency-domain coding. The quality of the resulting audio signal can be improved further by selecting a suitable coding frame length for the transform coding.

While transform coding yields high quality for non-speech audio, it performs poorly on periodic speech signals. The quality of transform-coded speech is therefore usually rather low, especially with long TCX frame lengths.

The extended AMR-WB codec (AMR-WB+) encodes a stereo audio signal as a high-bitrate mono signal and provides some side information for the stereo extension. For the core mono signal in the 0-6400 Hz band, AMR-WB+ uses both the ACELP and the TCX coding models. For the TCX model, a coding frame duration of 20, 40 or 80 ms is used.

The ACELP model can degrade audio quality. Transform coding usually performs poorly for speech, especially when long frames are used. The better-suited coding model must therefore be chosen in each case. The selection of the coding model actually to be used can be made in a variety of ways.

Some systems require low-complexity algorithms, for example mobile multimedia services (MMS). In such systems, music/speech classification algorithms are usually employed to select the optimal coding model. These algorithms classify the entire source signal as either music or speech, based on an analysis of the energy and frequency content of the audio signal.

If an audio signal consists only of speech or only of music, a single coding model for the whole signal, chosen by such a music/speech classification, is satisfactory. In many other cases, however, the audio signal to be encoded is of a mixed type. For example, speech may be present simultaneously with music and/or alternate with it.

In these cases, classifying entire source signals as either music or speech is too restrictive. Overall audio quality can then be maximized only by switching between coding models while the audio signal is being encoded. That is, the ACELP model is also used in part for a source signal classified as non-speech audio. Likewise, the TCX model is also used in part for a source signal classified as speech.

The extended AMR-WB codec (AMR-WB+) is also designed to encode such mixed-type audio signals. It uses different coding models on a frame-by-frame basis.

Coding model selection in AMR-WB+ can be performed in several ways.

In the most complex approach, the signal is first encoded with all possible combinations of the ACELP and TCX models. The signal is then synthesized again for each combination. The best excitation is selected based on the quality of the synthesized signals. This quality can be measured, for example, by determining the signal-to-noise ratio (SNR) for each combination.

This analysis-by-synthesis approach gives good results. In some applications, however, its very high complexity makes it impractical. Much of the complexity comes from ACELP encoding, which is the most complex part of the encoder.
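To illustrate why the exhaustive search is expensive, the sketch below enumerates the mode combinations for one superframe. It assumes the superframe layout described later in this document: four 20 ms frames, each coded as ACELP or TCX20. It further assumes that TCX40 may only cover an aligned half (frames 1-2 or 3-4) and that TCX80 covers the whole superframe. These alignment rules are an assumption made for illustration. Under them, a full closed-loop search would have to encode and decode every combination listed.

```python
from itertools import product

# Assumed superframe layout: four 20 ms frames; TCX40 may only cover an
# aligned half (frames 1-2 or 3-4) and TCX80 the whole superframe.
FRAMES_PER_MODE = {"ACELP": 1, "TCX20": 1, "TCX40": 2, "TCX80": 4}
FRAMES_PER_SUPERFRAME = 4


def half_options():
    """Every way to code one aligned 40 ms half of a superframe."""
    per_frame = list(product(("ACELP", "TCX20"), repeat=2))  # 4 options
    return per_frame + [("TCX40",)]                          # +1 merged option


def all_combinations():
    """Every ACELP/TCX mode combination for one superframe."""
    combos = [first + second for first, second in product(half_options(), repeat=2)]
    combos.append(("TCX80",))
    return combos


combos = all_combinations()
# Sanity check: each combination covers exactly one superframe.
assert all(sum(FRAMES_PER_MODE[m] for m in c) == FRAMES_PER_SUPERFRAME for c in combos)
print(len(combos))  # 26 candidates a full closed-loop search would have to try
```

Each aligned half has five codings (four ACELP/TCX20 pairs plus one TCX40). This gives 5 x 5 = 25 combinations, plus the single TCX80 option.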

In systems such as MMS, for example, the full closed-loop analysis-by-synthesis approach is too complex to perform. The MMS encoder therefore uses a low-complexity open-loop method to decide whether the ACELP or the TCX coding model is selected for encoding a given frame.

AMR-WB+ offers two different low-complexity open-loop methods for selecting the appropriate coding model for each frame. Both methods evaluate the characteristics of the source signal and the encoding parameters.

In the first open-loop method, the audio signal within each frame is first divided into several frequency bands. The method then examines the relation between the energy in the lower and in the higher frequency bands, as well as the variation of the energy level in these bands. The audio content of each frame is then classified as music-like or speech-like. The decision is based on both measurements, or on various combinations of them using different analysis windows and decision thresholds.
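The kind of features this first open-loop method relies on can be sketched as follows. The sketch computes a low/high band energy ratio and the frame-to-frame change in energy, then applies a threshold rule. The naive DFT, the band split point, the thresholds and the decision rule are all hypothetical choices for illustration. This is not the actual AMR-WB+ classifier.

```python
import math


def band_energies(frame, split_bin):
    """Return (low, high) band energies of one frame using a naive DFT.

    Bins 1 .. split_bin-1 count as the low band, split_bin .. n/2-1 as the
    high band; DC and Nyquist are ignored for simplicity.
    """
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        if k < split_bin:
            low += re * re + im * im
        else:
            high += re * re + im * im
    return low, high


def frame_features(frames, split_bin):
    """Per frame: (low/high energy ratio in dB, change in log10 energy vs previous frame)."""
    eps = 1e-12  # avoids log of zero for silent bands
    feats, prev_log_e = [], None
    for frame in frames:
        low, high = band_energies(frame, split_bin)
        log_e = math.log10(low + high + eps)
        ratio_db = 10 * math.log10((low + eps) / (high + eps))
        delta = 0.0 if prev_log_e is None else abs(log_e - prev_log_e)
        feats.append((ratio_db, delta))
        prev_log_e = log_e
    return feats


def classify(feats, ratio_db_thresh=10.0, delta_thresh=0.3):
    """Hypothetical decision rule: low-band-dominated frames whose energy
    changes quickly are called speech-like, everything else music-like."""
    return ["speech-like" if r > ratio_db_thresh and d > delta_thresh else "music-like"
            for r, d in feats]
```

A real encoder would use a filter bank and tuned thresholds rather than a per-frame DFT. The structure of the decision (band energy relation plus energy variation) is the same.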

In the second open-loop method, also referred to as model classification refinement, the coding model is selected based on the periodicity and stationarity of the audio content in the respective frame. These properties are evaluated more precisely by:
- determining the correlation;
- determining the Long Term Prediction (LTP) parameters;
- measuring the spectral distance.
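Periodicity is commonly measured as the maximum normalized autocorrelation over a range of pitch lags. The sketch below shows that measure only. It is an illustrative stand-in, not the exact AMR-WB+ correlation or LTP computation, and the lag range is a caller-chosen assumption. The spectral distance measure is not shown.

```python
import math


def max_normalized_correlation(x, min_lag, max_lag):
    """Search lags min_lag..max_lag for the highest normalized autocorrelation.

    Values near 1 indicate a strongly periodic (e.g. voiced-speech-like) frame.
    """
    if not 1 <= min_lag <= max_lag < len(x):
        raise ValueError("need 1 <= min_lag <= max_lag < len(x)")
    best_lag, best_corr = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        cur, past = x[lag:], x[:-lag]  # x[n] paired with x[n - lag]
        num = sum(a * b for a, b in zip(cur, past))
        energy = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
        corr = num / energy if energy > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

For a sinusoid with a period of 40 samples and a lag range of 20-60, the search returns lag 40 with a correlation close to 1.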

Suppose the signal properties are examined with an open-loop method to choose between ACELP and TCX, and TCX is selected for encoding. It is then still necessary to decide which TCX frame duration (20, 40 or 80 ms) to use. The optimal TCX frame duration, however, is very difficult to select from the signal characteristics within an open-loop method.

TCX frame durations can therefore be selected only within the analysis-by-synthesis approach described above. For systems that require low-complexity algorithms, however, analysis-by-synthesis is too complex, even if it is used only to select the TCX frame durations.

SUMMARY OF THE INVENTION

The object of the invention is to enable an efficient and simple selection of the coding frame duration to be used for encoding a segment of an audio signal.

A method for supporting the encoding of an audio signal is proposed. At least one segment of the audio signal is to be encoded with a coding model that permits different coding frame durations. The proposed method comprises:
- determining at least one control parameter based, at least in part, on characteristics of the audio signal;
- restricting, by means of the at least one control parameter, the selection of possible coding frame durations for the at least one segment.
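The two steps of the method can be sketched as follows. Step 1 derives a control parameter from a signal characteristic; step 2 uses it to prune the candidate frame durations. The "stationarity score" and its thresholds are hypothetical placeholders for whatever signal analysis an encoder actually performs. Only the 20/40/80 ms TCX lengths come from the text.

```python
# TCX coding frame lengths available in AMR-WB+ (from the text above).
CANDIDATE_LENGTHS_MS = (20, 40, 80)


def control_parameter(stationarity):
    """Hypothetical step 1: map a stationarity score in [0, 1], derived from
    the signal characteristics, to the longest frame length worth trying."""
    if stationarity < 0.3:
        return 20
    if stationarity < 0.7:
        return 40
    return 80


def restrict_lengths(candidates, max_length_ms):
    """Step 2: drop every candidate length ruled out by the control parameter."""
    return [c for c in candidates if c <= max_length_ms]


print(restrict_lengths(CANDIDATE_LENGTHS_MS, control_parameter(0.5)))  # [20, 40]
```

Here the control parameter only caps the longest frame. Other parameters described below (ACELP positions, superframe boundaries, a short-frame indicator) can be applied as further filters on the same candidate list.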

In addition, a module for supporting the encoding of an audio signal is proposed. At least one segment of the audio signal is to be encoded with a coding model that permits different coding frame durations. The module comprises:
- a parameter selection unit capable of determining at least one control parameter based, at least in part, on characteristics of the audio signal;
- a frame duration selection unit configured to restrict the selection of possible coding frame durations for at least one segment of the audio signal, by means of the at least one control parameter provided by the parameter selection unit.

The module may be, for example, an encoder or a part of an encoder.

In addition, an electronic device comprising such a module is proposed.

In addition, an audio coding system is proposed. It comprises such a module and, in addition, a decoder for decoding audio signals that have been encoded with variable coding frame durations.

Finally, a software program product is proposed in which program code for supporting the encoding of an audio signal is stored. At least one segment of the audio signal is to be encoded with a coding model that permits different coding frame durations. When executed by a processing component of an encoder, the program code performs the steps of the proposed method.

The invention proceeds from the following consideration. The final choice of coding frame duration for a particular segment of an audio signal often cannot be made from the signal characteristics alone. Those characteristics nevertheless allow suitable coding frame durations to be preselected. It is therefore proposed to determine at least one control parameter from the signal characteristics of the respective segment. This parameter is then used to restrict the available coding frame duration options.

An advantage of the invention is that it reduces the number of coding frame duration options with a low-complexity approach. Reducing these options, in turn, lowers the complexity of the final selection of the coding frame duration to be used.

In one embodiment of the invention, the final selection of the coding frame duration is made by analysis-by-synthesis. If more than one possible coding frame duration remains after the proposed restriction, the at least one segment is encoded with each of the remaining transform coding frame durations. Each resulting encoded signal is then decoded again with the frame duration that was used to encode it. Finally, the coding frame duration that yields the best decoded audio signal for the at least one segment is selected.
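The final analysis-by-synthesis step can be sketched as a loop over the remaining durations. The `encode_decode` and `score` callables are hypothetical hooks supplied by the caller. They stand in for a real TCX encode/decode pass and a quality measure, and are not AMR-WB+ APIs. When the restriction leaves a single option, no synthesis pass is run at all.

```python
def select_by_synthesis(segment, lengths, encode_decode, score):
    """Pick the coding frame length whose decoded output scores best.

    encode_decode(segment, length) -> decoded samples, and
    score(original, decoded) -> higher-is-better quality, are supplied by
    the caller; both are hypothetical hooks, not AMR-WB+ APIs.
    """
    if not lengths:
        raise ValueError("no candidate frame lengths left")
    if len(lengths) == 1:
        return lengths[0]  # restriction already decided: no synthesis pass needed
    best_length, best_score = None, float("-inf")
    for length in lengths:
        decoded = encode_decode(segment, length)
        s = score(segment, decoded)
        if s > best_score:
            best_length, best_score = length, s
    return best_length
```

Each remaining duration costs one encode/decode pass. Pruning three candidates to two therefore removes a full pass, and pruning to one removes the search entirely.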

Thanks to the preceding restriction, far fewer analysis-by-synthesis passes are needed than with the full closed-loop approach mentioned above. As a result, the overall complexity of an encoder implementing the invention is reduced as well.

The best decoded audio signal can be determined in various ways. One example is to compare the signal-to-noise ratios obtained with each of the remaining coding frame durations. Signal-to-noise ratios are easy to determine and provide a reliable indication of signal quality.
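The SNR comparison can be written directly from its definition:

SNR = 10 log10(sum x^2 / sum (x - x_hat)^2)

Here x is the original segment and x_hat the decoded one. The sketch below uses this plain, whole-segment SNR. A practical encoder might prefer a segmental or perceptually weighted variant; that choice is not specified here.

```python
import math


def snr_db(original, decoded):
    """SNR = 10*log10(sum x^2 / sum (x - x_hat)^2); higher means a closer match."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


def best_length(original, decoded_by_length):
    """Return the frame length (key) whose decoded signal has the highest SNR."""
    return max(decoded_by_length, key=lambda n: snr_db(original, decoded_by_length[n]))
```

For example, an original segment [10.0, 0.0] decoded as [9.0, 0.0] has a signal energy of 100 and a noise energy of 1, which gives an SNR of 20 dB.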

Several coding models may be available for encoding the audio signal, for example the TCX model and the ACELP model. In that case, it also has to be determined which coding model is applied to which segment of the audio signal. As mentioned above, this can be done with a low-complexity method based on the audio signal characteristics of the respective segment.

Some segments are then assigned a coding model other than the one permitting different coding frame durations. The number and/or position of these segments can also serve as a control parameter for restricting the coding frame duration options.

For example, a coding frame cannot be longer than the segment, or run of segments, lying between two segments for which a different coding model was selected.
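This positional constraint can be computed from the per-frame model decisions. For each TCX frame, the sketch below measures the contiguous TCX run containing it; that run length bounds the coding frame length. It assumes 20 ms frames and the 20/40/80 ms TCX lengths. It ignores superframe alignment, which can tighten the bound further (a 60 ms run, for instance, need not admit a 40 ms frame at every position).

```python
FRAME_MS = 20  # assumed duration of one audio frame


def max_tcx_length_per_frame(models):
    """For each frame, the longest coding frame (ms) that stays inside its
    contiguous TCX run; 0 for frames assigned another model such as ACELP."""
    limits = [0] * len(models)
    i = 0
    while i < len(models):
        if models[i] != "TCX":
            i += 1
            continue
        j = i
        while j < len(models) and models[j] == "TCX":
            j += 1  # extend the run of TCX frames
        for k in range(i, j):
            limits[k] = (j - i) * FRAME_MS
        i = j
    return limits


def allowed_lengths(limit_ms, candidates=(20, 40, 80)):
    """Candidate TCX lengths that do not exceed the positional limit."""
    return [c for c in candidates if c <= limit_ms]
```

With decisions ["TCX", "ACELP", "TCX", "TCX"], the limits are [20, 0, 40, 40]. The first frame can then only use 20 ms, and the last two frames may use 20 or 40 ms.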

In further embodiments of the invention, the coding frame duration is selected only within a respective super-segment that contains a predetermined number of segments. In this case, the coding frame duration options for a particular segment can also be restricted based on knowledge of the boundaries of the super-segment to which the segment belongs.

Such a super-segment can be, for example, a superframe that contains four audio frames as segments, each audio frame having a duration of 20 ms. The TCX model can then provide coding frame durations of 20, 40 or 80 ms. Suppose that the ACELP coding model was selected for the second audio frame in the superframe. The third audio frame can then be encoded either with a coding frame duration of only 20 ms, or together with the fourth audio frame with a duration of 40 ms.
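The superframe example can be reproduced by enumerating the tilings consistent with fixed ACELP decisions. The sketch assumes four 20 ms frames per superframe and that TCX40 may only cover an aligned half (frames 1-2 or 3-4). It also assumes TCX80 is possible only when all four frames are TCX. These alignment rules are assumptions consistent with the example, not a quotation of the codec specification.

```python
from itertools import product


def half_tilings(pair):
    """All codings of one aligned 40 ms half, given its two model decisions."""
    single = tuple("ACELP" if m == "ACELP" else "TCX20" for m in pair)
    options = [single]
    if all(m == "TCX" for m in pair):
        options.append(("TCX40",))  # both frames TCX: may merge into one 40 ms frame
    return options


def superframe_tilings(models):
    """Mode sequences for one superframe that respect the ACELP/TCX decisions."""
    if len(models) != 4:
        raise ValueError("a superframe holds exactly four 20 ms frames")
    tilings = [a + b for a, b in product(half_tilings(models[:2]), half_tilings(models[2:]))]
    if all(m == "TCX" for m in models):
        tilings.append(("TCX80",))  # only an all-TCX superframe admits 80 ms
    return tilings
```

With ACELP in the second frame, `superframe_tilings(["TCX", "ACELP", "TCX", "TCX"])` leaves two candidates:
- ("TCX20", "ACELP", "TCX20", "TCX20");
- ("TCX20", "ACELP", "TCX40").

Both match the text: the third frame is coded with 20 ms, or jointly with the fourth frame with 40 ms.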

In another preferred embodiment of the invention, an additional control parameter is provided by an indicator showing whether a longer or a shorter coding frame duration should be used. An indication that a shorter duration should be used excludes at least the option of selecting the longest coding frame duration. An indication that a longer duration should be used excludes at least the option of selecting the shortest coding frame duration.
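The indicator rule can be sketched as a filter that removes the extreme option on the disfavoured side. Treating the indicator as a boolean is an assumption. Keeping a single remaining option untouched, so that the encoder is never left without a choice, is also an assumption.

```python
def apply_short_frame_indicator(lengths, prefer_short):
    """Exclude at least the longest option if short frames are indicated,
    otherwise at least the shortest; a single remaining option is kept."""
    lengths = sorted(lengths)
    if len(lengths) <= 1:
        return lengths
    return lengths[:-1] if prefer_short else lengths[1:]
```

For example, from 20, 40 and 80 ms a "short" indication leaves 20 and 40 ms, and a "long" indication leaves 40 and 80 ms.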

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and features of the present invention will become apparent from the following detailed description, considered in conjunction with the accompanying drawings.

Fig. 1 is a schematic diagram of an audio coding system according to an embodiment of the invention;

Fig. 2 is a flow chart illustrating an embodiment of the method according to the invention, implemented in the system of Fig. 1;

Fig. 3 is a first table illustrating the restriction of mode combinations based on control parameters according to the invention; and

Fig. 4 is a second table illustrating the restriction of mode combinations based on control parameters according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 is a schematic diagram of an audio coding system according to an embodiment of the invention. The system enables the coding frame duration to be selected for a transform coding model.

The system comprises a first device 1 including an AMR-WB+ encoder 10 and a second device 2 including an AMR-WB+ decoder 20. The first device 1 may be, for example, an MMS server, and the second device 2 may be, for example, a mobile phone.

The first device 1 comprises a first evaluation unit 12, which makes a first, open-loop selection of the coding model. It further comprises a second evaluation unit 13. This unit refines the first selection with a further open-loop analysis and, in parallel, determines a short-frame indicator as one control parameter. Together, the first evaluation unit 12 and the second evaluation unit 13 form a parameter selection unit. The first device 1 also comprises a TCX (transform coded excitation) frame length selection unit 14. Whenever a TCX model is selected, unit 14 restricts the options for the coding frame length and then selects the best of the remaining options in a closed-loop procedure. Finally, the first device 1 comprises an encoding unit 15. The encoding unit 15 can apply one of four coding models to the received audio frames:
- the ACELP coding model;
- the TCX20 coding model, which uses a TCX frame length of 20 ms;
- the TCX40 coding model, which uses a TCX frame length of 40 ms;
- the TCX80 coding model, which uses a TCX frame length of 80 ms.

The first evaluation unit 12 is connected to the second evaluation unit 13 and to the encoding unit 15. The second evaluation unit 13 is also connected to the TCX frame length selection unit 14 and to the encoding unit 15. The TCX frame length selection unit 14 is likewise connected to the encoding unit 15.

The blocks 12-15 shown are intended for encoding a mono audio signal, which may be derived from a stereo audio signal. Additional stereo information may be generated in further stereo extension blocks, which are not shown. The encoder 10 also comprises other blocks that are not shown. Blocks 12-15 need not be separate blocks. They may equally be merged with one another or with other blocks.

Blocks 12, 13, 14 and 15 can be implemented, in particular, by software SW running in a processing portion 11 of the encoder 10, indicated by a dashed line.

The processing in the encoder 10 is now described in more detail with reference to the flowchart of Fig. 2.

Processing is performed superframe by superframe. Each superframe is 80 ms long and contains four consecutive 20 ms frames of the audio signal.

The encoder 10 receives the audio signal provided by the first device 1. The audio signal is converted into a mono audio signal. A linear prediction (LP) filter then computes linear prediction coefficients (LPC) in each frame to model the spectral envelope.
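The per-frame LP analysis can be illustrated with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch, not the AMR-WB+ procedure. Windowing, lag windowing and the conversion to ISPs are omitted, and the function names `autocorrelation` and `lpc_levinson` are hypothetical.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def lpc_levinson(frame: Sequence[float], order: int) -> List[float]:
    """LP coefficients [1, a1, ..., a_order] of A(z) = 1 + sum_k a_k z^-k,
    obtained with the Levinson-Durbin recursion on the autocorrelation."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy of the zeroth-order predictor
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: remaining a_k stay 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # error energy shrinks with each order
    return a
```

The prediction order is left to the caller. AMR-WB, for instance, uses order 16.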

In a first open-loop analysis, the first evaluation unit 12 processes the LPC excitation output by the LP filter for each frame of the superframe. Based on characteristics of the source signal, this analysis determines whether the content of each frame can be regarded as speech or as other audio content, such as music. As mentioned above, the analysis can be based, for example, on an evaluation of the energy in different frequency bands. The ACELP coding model is selected for each frame assumed to contain speech, and a TCX model is selected for each frame assumed to contain other audio content. At this stage, no distinction is made between TCX models with different coding frame lengths. Some frames have characteristics that indicate neither speech nor another type of audio content clearly. An uncertain state is selected for these frames.

The first evaluation unit 12 informs the encoding unit 15 of all frames for which the ACELP model has been selected so far.

The second evaluation unit 13 then performs a second, frame-by-frame open-loop analysis based on the signal characteristics, to separate the frames further into ACELP and TCX frames. In parallel, the second evaluation unit 13 determines a short-frame indicator, the flag NoMtcx, as one control parameter. If the NoMtcx flag is set, the use of TCX80 is prohibited.

The second evaluation unit 13 processes a frame only if two conditions hold: the voice activity indicator (the VAD flag) is set for that frame, and the first evaluation unit 12 has not selected the ACELP coding model for it.
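The gating rule for the second evaluation unit 13 is short enough to state directly in code. This is a minimal sketch. The `Mode` enumeration and `needs_second_evaluation` are hypothetical names, not codec identifiers.

```python
from enum import Enum


class Mode(Enum):
    ACELP = "ACELP"
    TCX = "TCX"
    UNCERTAIN = "UNCERTAIN"  # first-stage analysis was inconclusive


def needs_second_evaluation(vad_flag: bool, first_stage_mode: Mode) -> bool:
    """Unit 13 runs only for voice-active frames that unit 12 did not
    already assign to ACELP (i.e. TCX or uncertain frames)."""
    return vad_flag and first_stage_mode is not Mode.ACELP
```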

If the first open-loop analysis by the first evaluation unit 12 resulted in the uncertain state, the spectral distance is computed first and a set of available signal characteristics is collected.

The spectral distance SD_n of the current frame n is calculated from the Immittance Spectral Pair (ISP) parameters according to the following equation:
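The equation itself did not survive extraction. As the name suggests, SD_n presumably measures how far the ISP vector moves from the previous frame n-1 to the current frame n. Under that assumption, one plausible form is the sum of absolute differences below. M, the number of ISP coefficients (16 in AMR-WB), is a symbol introduced here, and the exact form in the original may differ.

```latex
SD_n = \sum_{i=0}^{M-1} \left| ISP_n(i) - ISP_{n-1}(i) \right|
```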

where ISP_n is the vector of ISP coefficients of frame n and ISP_n(i) is the i-th element of this vector. The ISP parameters are available in any case, because the LP coefficients are converted to the ISP domain for quantization and interpolation.

The parameter Lag_n contains the two open-loop lag values of the current frame. The lag is the delay of the long-term filter. It is usually equal to the true pitch period, or to a multiple or submultiple of it. Open-loop pitch analysis is performed twice per frame, i.e. every 10 ms, which yields two pitch lag estimates per frame. This simplifies the pitch analysis and restricts the closed-loop pitch search to a small number of lags around the open-loop estimates.
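Estimating the lag twice per frame can be sketched with a generic normalized-correlation search over each 10 ms half-frame. This illustrates the technique under stated assumptions. It is not the codec's exact open-loop pitch search. The lag range (`lag_min`, `lag_max`) and the helper names are hypothetical. The signal buffer is assumed to hold enough past samples to look back by `lag_max`. The returned correlation is in the spirit of NormCorr_n, although the codec's exact definition may differ.

```python
import math
from typing import List, Sequence, Tuple


def open_loop_pitch(signal: Sequence[float], start: int, length: int,
                    lag_min: int, lag_max: int) -> Tuple[int, float]:
    """Return (lag, normalized correlation) maximizing
    sum x[n]x[n-lag] / sqrt(sum x[n]^2 * sum x[n-lag]^2)
    over the window signal[start:start + length]."""
    if start - lag_max < 0:
        raise ValueError("not enough history before start for lag_max")
    window = range(start, start + length)
    energy = sum(signal[n] ** 2 for n in window)
    best_lag, best_corr = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        cross = sum(signal[n] * signal[n - lag] for n in window)
        past = sum(signal[n - lag] ** 2 for n in window)
        denom = math.sqrt(energy * past)
        corr = cross / denom if denom > 0.0 else 0.0
        if corr > best_corr:  # keep the first (shortest) lag on ties
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def two_lags_per_frame(signal: Sequence[float], frame_start: int, frame_len: int,
                       lag_min: int, lag_max: int) -> List[Tuple[int, float]]:
    """One open-loop estimate per half-frame (e.g. per 10 ms)."""
    half = frame_len // 2
    return [open_loop_pitch(signal, frame_start + k * half, half, lag_min, lag_max)
            for k in range(2)]
```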

LagDif_buf is a buffer containing the open-loop lag values of the previous ten 20 ms frames.

The parameter Gain_n contains the two LTP (long-term prediction) gain values of the current frame n.

The parameter NormCorr_n contains the two normalized correlation values of the current frame n.

The parameter MaxEnergy_buf is the maximum value in a buffer of energy values. The energy buffer contains the energy values of the current frame n and of the five preceding frames, each 20 ms long.
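The energy buffer behind MaxEnergy_buf behaves like a fixed-depth FIFO. This minimal sketch assumes energies are kept in dB. The algorithm below uses a threshold of 60, but the text states neither the unit nor the scaling. `frame_energy_db` and `EnergyHistory` are hypothetical names.

```python
import math
from collections import deque
from typing import Deque, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Frame energy in dB (assumed log scale; not defined in the text)."""
    return 10.0 * math.log10(sum(x * x for x in frame) + 1e-12)


class EnergyHistory:
    """Energies of the current 20 ms frame and the five preceding frames."""

    def __init__(self, depth: int = 6) -> None:
        self._buf: Deque[float] = deque(maxlen=depth)

    def push(self, energy: float) -> None:
        self._buf.append(energy)  # the oldest value is dropped once full

    def max_energy(self) -> float:
        """MaxEnergy_buf: the largest energy currently in the buffer."""
        return max(self._buf)
```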

The coding modes are now selected, and the control parameter NoMtcx is set, according to the following open-loop algorithm:

    if (SD_n > 0.2)
        Mode = ACELP_MODE;
    else
        if (LagDif_buf < 2)
            if (Lag_n == HIGH_LIMIT or Lag_n == LOW_LIMIT) {
                if (Gain_n - NormCorr_n < 0.1 and NormCorr_n > 0.9)
                    Mode = ACELP_MODE;
                else
                    Mode = TCX_MODE;
            }
            else if (Gain_n - NormCorr_n < 0.1 and NormCorr_n > 0.88)
                Mode = ACELP_MODE;
            else if (Gain_n - NormCorr_n > 0.2)
                Mode = TCX_MODE;
            else
                NoMtcx = NoMtcx + 1;
    if (MaxEnergy_buf < 60)
        if (SD_n > 0.15)
            Mode = ACELP_MODE;
        else
            NoMtcx = NoMtcx + 1;

In this way, various signal characteristics and their combinations are compared with various predefined thresholds. The comparison determines whether a frame in the uncertain state contains speech or other audio content and assigns a suitable coding model. In the same way, the short-frame indicator, the NoMtcx flag, is set depending on some of these signal characteristics and their combinations.
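The open-loop algorithm above can be transcribed into runnable form as follows. This is a sketch under stated assumptions. The text stores two values each in Lag_n, Gain_n and NormCorr_n but compares them as scalars. The function therefore takes one scalar per parameter and leaves the reduction of the two values to the caller. LagDif_buf is likewise taken as a scalar. HIGH_LIMIT and LOW_LIMIT are parameters because their values are not given. A frame that no rule decides stays uncertain and goes on to the further check described next.

```python
from typing import Tuple

ACELP_MODE, TCX_MODE, UNCERTAIN_MODE = "ACELP", "TCX", "UNCERTAIN"


def refine_uncertain_frame(sd_n: float, lag_dif_buf: float, lag_n: int,
                           gain_n: float, norm_corr_n: float,
                           max_energy_buf: float, high_limit: int,
                           low_limit: int, no_mtcx: int = 0) -> Tuple[str, int]:
    """Open-loop decision for a frame left uncertain by the first stage.

    Returns (mode, no_mtcx); no_mtcx > 0 means the short-frame flag is set.
    """
    mode = UNCERTAIN_MODE
    gain_minus_corr = gain_n - norm_corr_n
    if sd_n > 0.2:
        mode = ACELP_MODE  # large spectral change from the previous frame
    elif lag_dif_buf < 2:  # pitch lag has been stable
        # assumed: high_limit/low_limit are the ends of the lag search range
        if lag_n == high_limit or lag_n == low_limit:
            if gain_minus_corr < 0.1 and norm_corr_n > 0.9:
                mode = ACELP_MODE
            else:
                mode = TCX_MODE
        elif gain_minus_corr < 0.1 and norm_corr_n > 0.88:
            mode = ACELP_MODE
        elif gain_minus_corr > 0.2:
            mode = TCX_MODE
        else:
            no_mtcx += 1  # inconclusive: prefer short TCX frames
    if max_energy_buf < 60:  # low energy in the recent frames
        if sd_n > 0.15:
            mode = ACELP_MODE
        else:
            no_mtcx += 1
    return mode, no_mtcx
```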

If, on the other hand, the first open-loop analysis by the first evaluation unit 12 resulted in TCX mode, it is checked whether the VAD flag was reset to zero for at least one frame of the preceding superframe. If so, the short-frame indicator NoMtcx is likewise set to '1'.
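The VAD-based rule for frames already classified as TCX can be sketched as a small helper. This assumes the VAD flags of the preceding superframe are available as a sequence of booleans. The name `no_mtcx_for_tcx_frame` is hypothetical.

```python
from typing import Sequence


def no_mtcx_for_tcx_frame(prev_superframe_vad: Sequence[bool], no_mtcx: int) -> int:
    """Short-frame indicator for a frame the first stage classified as TCX.

    If VAD was reset to zero for at least one frame of the preceding
    superframe (for example, at a signal onset after silence), NoMtcx is
    set to 1; otherwise it is left unchanged.
    """
    if any(not vad for vad in prev_superframe_vad):
        return 1
    return no_mtcx
```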

If the coding mode of the current frame has by now been set to TCX mode, or is still uncertain, the mode decision is checked further. To this end, a spectral envelope vector mag is first formed from the LP coefficients of the current frame by means of a discrete Fourier transform (DFT). The coding mode is then checked according to the following algorithm:
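Evaluating the LP synthesis filter 1/A(z) on a uniform frequency grid is equivalent to taking a zero-padded DFT of the coefficients of A(z). This is one way to obtain an envelope vector such as mag. The minimal sketch below assumes linear magnitudes and 64 bins from 0 up to (but excluding) π. The text does not give the scaling the codec uses for mag, and the thresholds 95 and 5 depend on that scaling. `lp_envelope_magnitude` is a hypothetical name.

```python
import cmath
import math
from typing import List, Sequence


def lp_envelope_magnitude(a: Sequence[float], n_bins: int = 64) -> List[float]:
    """|1 / A(e^{jw})| at w = pi * k / n_bins, k = 0 .. n_bins - 1,
    where A(z) = a[0] + a[1] z^-1 + ... with a[0] = 1."""
    mags = []
    for k in range(n_bins):
        w = math.pi * k / n_bins
        a_of_w = sum(coef * cmath.exp(-1j * w * m) for m, coef in enumerate(a))
        mags.append(1.0 / abs(a_of_w))  # envelope = inverse of |A| at this bin
    return mags
```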

    if (Gain_n - NormCorr_n < 0.006 and NormCorr_n > 0.92 and Lag_n > 21) {
        DFTSum = 0;
        for (i = 1; i < 40; i++) {
            DFTSum = DFTSum + mag[i];
        }
        if (DFTSum > 95 and mag[0] < 5) {
            Mode = TCX_MODE;
        }
        else {
            Mode = ACELP_MODE;
        }
        NoMtcx = NoMtcx + 1;
    }

The final value of DFTSum is thus the sum of the first 40 elements of the vector mag, excluding its first element mag[0].
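The refinement check can be written as a small function. This sketch uses hypothetical names and assumes `mag` has at least 40 elements. The brace structure of the printed pseudocode is ambiguous, so the sketch increments NoMtcx after either branch of the inner test. In the ACELP branch the increment has no effect on the superframe decision, because NoMtcx is only evaluated when no frame is ACELP.

```python
from typing import Sequence, Tuple

ACELP_MODE, TCX_MODE = "ACELP", "TCX"


def check_periodic_frame(mode: str, gain_n: float, norm_corr_n: float, lag_n: int,
                         mag: Sequence[float], no_mtcx: int) -> Tuple[str, int]:
    """Re-check a frame currently in TCX or uncertain mode."""
    if gain_n - norm_corr_n < 0.006 and norm_corr_n > 0.92 and lag_n > 21:
        dft_sum = sum(mag[1:40])  # mag[1] + ... + mag[39]; mag[0] is excluded
        if dft_sum > 95 and mag[0] < 5:
            mode = TCX_MODE
        else:
            mode = ACELP_MODE
        no_mtcx += 1  # assumed to follow both branches (see lead-in)
    return mode, no_mtcx
```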

The second evaluation unit 13 likewise informs the encoding unit 15 of all frames for which the ACELP model has been selected.

The TCX frame length selection unit 14 first evaluates control parameters in order to limit the number of TCX frame length options.

One control parameter is the number of frames in the superframe for which ACELP mode was selected. If the ACELP coding model was selected for all four frames of the superframe, no frame remains for which a TCX frame length has to be determined. If the ACELP coding model was selected for three frames of the superframe, the TCX frame length is set to 20 ms.

Further restrictions are applied based on the table of Fig. 3 or Fig. 4. Each of these figures shows a five-column table that relates the selectable TCX frame lengths to various combinations of selected coding modes.

The first column of both tables lists seven possible combinations of coding modes for the four frames of a superframe. Each combination contains at most two ACELP modes. The combinations are (0,1,1,1), (1,0,1,1), (1,1,0,1), (1,1,1,0), (1,1,0,0), (0,0,1,1) and (1,1,1,1), and the last combination appears twice. In this notation, '0' denotes ACELP mode and '1' denotes TCX mode.

The fourth column gives the control parameter Aind, which states, for each combination in the first column, the number of frames with ACELP mode. Only mode combinations with Aind values of '0', '1' and '2' appear. For values of '3' or '4', the TCX frame length selection unit 14 can select the TCX frame length immediately, without further processing.

The fifth column gives the short-frame indicator, the NoMtcx flag. The TCX frame length selection unit 14 evaluates this parameter only if the control parameter Aind is '0', i.e. if ACELP mode was not selected for any frame of the superframe.

For each combination, the second and third columns give the TCX frame lengths that may be selected for the TCX-mode frames, taking into account the restrictions imposed by the control parameters. For each combination in the first column, at most two TCX frame length combinations have to be checked. In these frame length combinations:
- '0' denotes a 20 ms ACELP coding frame;
- '1' denotes a 20 ms TCX frame;
- a sequence of two '2's denotes a 40 ms TCX frame;
- a sequence of four '3's denotes an 80 ms TCX frame.

For example, for the first mode combination (0,1,1,1), the coding frame length combinations (0,1,1,1) and (0,1,2,2) are allowed. In the first case, the second, third and fourth frames are each encoded as a 20 ms TCX frame. In the second case, the second frame is encoded as a 20 ms TCX frame and the third and fourth frames are jointly encoded as one 40 ms TCX frame.

The remaining mode combinations in the tables follow the same pattern:
- for (1,0,1,1), the coding frame length combinations (1,0,1,1) and (1,0,2,2) are allowed;
- for (1,1,0,1), the combinations (1,1,0,1) and (2,2,0,1) are allowed;
- for (1,1,1,0), the combinations (1,1,1,0) and (2,2,1,0) are allowed;
- for (1,1,0,0), the combinations (1,1,0,0) and (2,2,0,0) are allowed;
- for (0,0,1,1), the combinations (0,0,1,1) and (0,0,2,2) are allowed.

For the seventh mode combination (1,1,1,1), the short-frame indicator NoMtcx determines whether longer or shorter TCX frame lengths are to be tried. The NoMtcx flag is set for a superframe if the second evaluation unit 13 has set it for at least one frame of that superframe. If the NoMtcx flag is set for the superframe, only short frame lengths are allowed.

In the table of Fig. 3, this means that the TCX frame length selection unit 14 immediately selects a TCX frame length of 20 ms for the entire superframe. The only allowed TCX frame length combination is then (1,1,1,1). In the table of Fig. 4, a set NoMtcx flag allows the TCX frame length combination (1,1,1,1) and, in addition, the combination (2,2,2,2). The latter denotes two 40 ms TCX frames.

If the short-frame indicator NoMtcx is not set, only longer TCX frame lengths are allowed. In the tables of Figs. 3 and 4, this means that the TCX frame length combinations (2,2,2,2) and (3,3,3,3) are allowed. The latter denotes a single 80 ms TCX frame.

Optimal coding of pure music usually requires longer TCX frames, whereas speech is best coded with ACELP. Using longer TCX frames for speech degrades its quality. This applies in particular at the onset of a music and/or speech signal, when the energy is low or when the voice activity indicator VAD was reset to zero in the preceding frames. Short 20 ms TCX frames, on the other hand, perform comparatively well both for music and for certain speech segments. For some signal characteristics it is difficult to determine whether the content of a frame is music or speech. In such a case, a short TCX frame is a good substitute for the optimal coding model, because it suits both types of content. The short-frame indicator is therefore well suited as a control parameter.

Further coding frame length combinations for the mode combinations shown are ruled out by the encoder structure. The encoder does not allow the TCX40 model for the two middle frames of a superframe, so a 40 ms TCX frame cannot span the second and third frames.

Likewise, the remaining mode combinations with Aind < 3 that are not shown in Figs. 3 and 4 allow only a single coding frame length combination, either inherently or because of the encoder structure. For example, the mode combination (1,0,0,1) allows only the coding frame length combination (1,0,0,1), and the mode combination (0,1,1,0) allows only the coding frame length combination (0,1,1,0).

Because the control parameters Aind and NoMtcx restrict the mode combinations with respect to the TCX frame lengths, at most two frame length combinations have to be checked per superframe.
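The restrictions of Figs. 3 and 4, together with the Aind rules and the single-option patterns described above, can be collected into one lookup. This minimal sketch uses hypothetical names and encodes only what the text states. It uses the same digit conventions (0 = ACELP, 1 = TCX20, a pair of 2s = TCX40, four 3s = TCX80).

```python
from typing import Dict, List, Tuple

Pattern = Tuple[int, int, int, int]

# Mode patterns with 1 <= Aind <= 2 listed in Figs. 3 and 4 (0 = ACELP, 1 = TCX),
# mapped to the allowed frame-length combinations.
TABLE_ROWS: Dict[Pattern, List[Pattern]] = {
    (0, 1, 1, 1): [(0, 1, 1, 1), (0, 1, 2, 2)],
    (1, 0, 1, 1): [(1, 0, 1, 1), (1, 0, 2, 2)],
    (1, 1, 0, 1): [(1, 1, 0, 1), (2, 2, 0, 1)],
    (1, 1, 1, 0): [(1, 1, 1, 0), (2, 2, 1, 0)],
    (1, 1, 0, 0): [(1, 1, 0, 0), (2, 2, 0, 0)],
    (0, 0, 1, 1): [(0, 0, 1, 1), (0, 0, 2, 2)],
}


def allowed_length_combinations(modes: Pattern, no_mtcx: bool,
                                table: str = "fig3") -> List[Pattern]:
    """Candidate frame-length combinations for one superframe."""
    modes = tuple(modes)
    if len(modes) != 4 or any(m not in (0, 1) for m in modes):
        raise ValueError("modes must be four 0/1 flags")
    if table not in ("fig3", "fig4"):
        raise ValueError("table must be 'fig3' or 'fig4'")
    aind = modes.count(0)  # number of ACELP frames in the superframe
    if aind >= 3:
        # all ACELP, or a single remaining TCX frame coded as TCX20
        return [modes]
    if modes in TABLE_ROWS:
        return list(TABLE_ROWS[modes])
    if aind == 0:  # (1, 1, 1, 1): decided by the short-frame indicator
        if no_mtcx:
            return [(1, 1, 1, 1)] if table == "fig3" else [(1, 1, 1, 1), (2, 2, 2, 2)]
        return [(2, 2, 2, 2), (3, 3, 3, 3)]
    # remaining patterns, e.g. (1, 0, 0, 1) or (0, 1, 1, 0): no TCX40 fits
    return [modes]
```

Looping over all 16 mode patterns, both flag values and both tables shows that no superframe is ever left with more than two candidates.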

If two possible TCX frame length combinations remain, the TCX frame length selection unit 14 compares signal-to-noise ratios to find the optimal TCX model or models for the superframe.

To evaluate the selectable TCX frame lengths, the TCX-mode frames of the superframe are transform-coded with both allowed TCX frame length combinations. The TCX model may, for example, be based on a fast Fourier transform (FFT). The encoded signals are then decoded again, and the results for the two TCX frame length combinations are compared on the basis of the segmental signal-to-noise ratio.

The segmental signal-to-noise ratio is the signal-to-noise ratio of one subframe of a TCX frame. A subframe has a length of N samples, which corresponds to 5 ms of the original audio signal.

For each subframe i of the TCX frame, the segmental signal-to-noise ratio segSNR_i is determined according to the following equation:
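The equation was lost in extraction. The form below is assumed to be the standard definition of segmental SNR. It is consistent with the symbols defined in the following paragraph: N samples per subframe, the original signal x_w(n) and the decoded signal x̂_w(n).

```latex
segSNR_i = 10 \log_{10} \frac{\sum_{n=0}^{N-1} x_w(n)^2}{\sum_{n=0}^{N-1} \left( x_w(n) - \hat{x}_w(n) \right)^2}
```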

In this equation, x_w(n) is the amplitude of the digitized original audio signal at position n in the subframe, and x̂_w(n) is the amplitude of the encoded and then decoded audio signal at position n in the subframe.

On this basis, the average segmental signal-to-noise ratio over all subframes of the TCX frame is determined according to the following equation:
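The equation itself is missing from the extracted text. Given the definitions in the surrounding paragraphs, it is presumably the arithmetic mean of the per-subframe values segSNR_i; the overline notation is introduced here.

```latex
\overline{segSNR} = \frac{1}{N_{SF}} \sum_{i=0}^{N_{SF}-1} segSNR_i
```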

где N_SF - число подкадров в кадре ТСХ. Так как кадр ТСХ может иметь длительность 20, 40 или 80 мс, то N_SF может быть 4, 8 или 16.where N _SF is the number of subframes in the TLC frame. Since the TLC frame may have a duration of 20, 40, or 80 ms, N _SF may be 4, 8, or 16.
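The averaging over a whole TCX frame can be sketched as below. N_SF follows directly from the 5 ms subframe duration (20/5 = 4, 40/5 = 8, 80/5 = 16). The sampling rate is not given in this passage, so `sample_rate_hz` is a hypothetical parameter used only to derive N; the helper names are likewise assumptions.

```python
import math
from typing import Sequence

SUBFRAME_MS = 5  # subframe duration stated in the text


def subframes_per_frame(frame_ms: int) -> int:
    """N_SF: number of 5 ms subframes in a 20, 40 or 80 ms TCX frame."""
    if frame_ms not in (20, 40, 80):
        raise ValueError("TCX frame duration must be 20, 40 or 80 ms")
    return frame_ms // SUBFRAME_MS


def seg_snr(x_w: Sequence[float], x_hat_w: Sequence[float]) -> float:
    """Segmental SNR in dB of one subframe (same definition as segSNR_i)."""
    signal_energy = sum(s * s for s in x_w)
    error_energy = sum((s - r) ** 2 for s, r in zip(x_w, x_hat_w))
    if error_energy == 0.0:
        return math.inf
    if signal_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def average_seg_snr(x_w: Sequence[float], x_hat_w: Sequence[float],
                    frame_ms: int, sample_rate_hz: int) -> float:
    """Mean of segSNR_i over the N_SF subframes of one TCX frame."""
    n_sf = subframes_per_frame(frame_ms)
    n = sample_rate_hz * SUBFRAME_MS // 1000  # N samples per 5 ms subframe
    if len(x_w) != n_sf * n or len(x_hat_w) != len(x_w):
        raise ValueError("frame length does not match frame_ms and sample_rate_hz")
    per_subframe = [seg_snr(x_w[i * n:(i + 1) * n], x_hat_w[i * n:(i + 1) * n])
                    for i in range(n_sf)]
    return sum(per_subframe) / n_sf


print([subframes_per_frame(d) for d in (20, 40, 80)])  # [4, 8, 16]
```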

The TCX frame duration selection unit 14 then determines which of the allowed TCX frame durations for a given number of audio signal frames yields the better average signal-to-noise ratio. For example, if each of two audio signal frames could be encoded with the TCX20 model, or both could be encoded together with the TCX40 model, the average signal-to-noise ratio of the TCX40 frame is compared with the mean of the average signal-to-noise ratios of the two TCX20 frames. The TCX frame duration that yields the higher average signal-to-noise ratio is selected and reported to the encoding unit 15.
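The comparison made by unit 14 can be sketched as a small selection helper. This is a hypothetical illustration: the function name `select_tcx_option`, the label strings and the input format are assumptions, and the tie-breaking rule (the first listed candidate wins) is not specified in the text.

```python
from statistics import mean
from typing import Dict, List


def select_tcx_option(candidates: Dict[str, List[float]]) -> str:
    """Return the candidate whose TCX frames give the highest mean average segSNR.

    candidates maps a label such as "TCX40" or "2xTCX20" to the average
    segmental SNR (dB) of each TCX frame that the candidate uses for the
    same stretch of signal. On a tie, the candidate listed first is kept,
    because max() returns the first maximal item and dicts keep insertion order.
    """
    if not candidates:
        raise ValueError("no candidate TCX frame durations to compare")
    return max(candidates, key=lambda label: mean(candidates[label]))


# One 40 ms frame versus two 20 ms frames covering the same 40 ms of signal.
print(select_tcx_option({"TCX40": [12.0], "2xTCX20": [11.0, 14.0]}))  # 2xTCX20
```

Here the two TCX20 frames average 12.5 dB against 12.0 dB for the single TCX40 frame, so the shorter duration is chosen.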

The encoding unit 15 encodes all frames of the audio signal with the coding model selected for each of them by the first evaluation unit 12, the second evaluation unit 13 or the TCX frame duration selection unit 14. For example, TCX is based on an FFT with the selected coding frame duration, while ACELP coding uses, for example, LTP and fixed codebook parameters for the LPC excitation.

The encoding unit 15 then provides the encoded frames for transmission to the second device 2. In the second device 2, the decoder 20 decodes all received frames using either the ACELP coding model or one of the TCX models. The decoded frames are then provided, for example, for playback to the user of the second device 2.

The TCX frame duration selection presented here is thus based on a semi-closed-loop approach: the basic coding model type and the control parameters are selected in open loop, and the TCX frame duration is then selected in closed loop from a limited number of options. Whereas a full closed-loop analysis always performs analysis-by-synthesis four times per superframe, this semi-closed-loop method needs to perform analysis-by-synthesis at most twice per superframe.
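The closed-loop stage of the semi-closed-loop approach can be sketched as follows. This is an illustrative outline under assumptions: the open-loop stage is represented only by the list of options it hands over, and `analysis_by_synthesis` is a hypothetical callback standing in for the real encode/decode/segSNR step. The sketch also counts the trial encodings, so that with at most two remaining options at most two runs are made, and a single remaining option needs none.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def semi_closed_loop(
    allowed: Sequence[str],
    analysis_by_synthesis: Callable[[str], List[float]],
) -> Tuple[str, int]:
    """Closed-loop stage run after the open-loop stage has narrowed the options.

    allowed: TCX frame-duration options that survived the open-loop restriction.
    analysis_by_synthesis: hypothetical callback that encodes and decodes the
        TCX frames of the superframe with one option and returns the average
        segmental SNR (dB) of each resulting TCX frame.
    Returns the chosen option and the number of analysis-by-synthesis runs.
    """
    if not allowed:
        raise ValueError("the open-loop stage left no TCX option")
    if len(allowed) == 1:
        return allowed[0], 0  # a single remaining option needs no trial encoding
    scores = {option: mean(analysis_by_synthesis(option)) for option in allowed}
    best = max(scores, key=scores.get)
    return best, len(allowed)


# Usage with canned scores in place of a real encoder and decoder.
results = {"TCX40": [10.0], "2xTCX20": [9.0, 12.0]}
print(semi_closed_loop(["TCX40", "2xTCX20"], results.__getitem__))  # ('2xTCX20', 2)
```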

It should be noted that the described embodiment is only one of many possible embodiments of the invention.

Claims

1. A method for supporting the encoding of an audio signal, wherein at least one segment of said audio signal is to be encoded using a coding model that allows different coding frame durations to be used, said method comprising:
determining at least one control parameter at least partially based on the characteristics of said audio signal; and
limiting the options for selecting the possible coding frame durations for said at least one segment by means of said at least one control parameter.

2. The method according to claim 1, which comprises determining said at least one control parameter based on at least one of the following parameters:
an indicator of the spectral distance between the current and previous frames;
the number of frames in the superframe that are selected for encoding using another encoding model.

3. The method according to claim 1, comprising:
if more than one option for selecting the possible coding frame durations remains after said limiting, encoding said at least one segment with transform coding using each of the remaining coding frame durations;
decoding said encoded segments with the respective coding frame duration used for the transform coding; and
selecting for said at least one segment the coding frame duration that results in the best decoded audio signal for said at least one segment.

4. The method according to claim 3, wherein the coding frame duration that results in the best decoded segment is determined by comparing the signal-to-noise ratios obtained for each of said coding frame durations.

5. The method according to claim 4, wherein, to obtain said signal-to-noise ratio of the audio signal for a particular coding frame duration, a segmental signal-to-noise ratio is first determined separately for each of a plurality of subframes in the respective coding frame, and said segmental signal-to-noise ratios of the subframes are then averaged over the entire coding frame to obtain said signal-to-noise ratio for said at least one segment.

6. The method according to claim 1, wherein for each segment of said audio signal it is determined, based on characteristics of the audio signal in the respective segment, whether said coding model or another coding model is to be applied, and wherein said at least one control parameter comprises an indication of the segments for which said other coding model has been selected.

7. The method according to claim 6, wherein said coding model is a transform coding model and said other coding model is an algebraic code excited linear prediction (ACELP) coding model.

8. The method according to claim 6, wherein each segment of said audio signal has a predetermined duration, and said indication of the segments for which said other coding model has been selected is provided for a respective super-segment containing a predetermined number of said segments.

9. The method according to claim 1, wherein each segment of said audio signal has a predetermined duration, a predetermined number of consecutive segments forming a respective super-segment, and wherein the options for selecting the coding frame duration for a particular segment are limited by the boundaries of the super-segment to which said segment belongs.

10. The method according to claim 7, wherein each segment of said audio signal has a duration of 20 ms and four consecutive segments form a respective super-segment, wherein said transform coding model allows coding frame durations of 20, 40 and 80 ms, and wherein the options for selecting the coding frame duration for a segment are limited by the boundaries of the super-segment to which said segment belongs.

11. The method according to claim 1, wherein said at least one control parameter comprises an indicator of whether a shorter or a longer coding frame duration is to be used, an indication that a shorter coding frame duration is to be used excluding at least the option of the longest coding frame duration, and an indication that a longer coding frame duration is to be used excluding at least the option of the shortest coding frame duration.

12. A module for supporting the encoding of an audio signal, wherein at least one segment of the audio signal is to be encoded using a coding model that allows different coding frame durations to be used, said module comprising:
a parameter selection unit configured to determine at least one control parameter at least in part based on the characteristics of said audio signal; and
a frame duration selection unit configured to limit the options for selecting the possible coding frame durations for said at least one segment by means of said at least one control parameter provided by said parameter selection unit.

13. The module according to claim 12, which determines said at least one control parameter based on at least one of the following parameters:
a short-frame indicator determined at least on the basis of the spectral distance; and
the number of frames in a superframe selected for algebraic code excited linear prediction (ACELP) coding.

14. The module according to claim 12, wherein said frame duration selection unit is configured, if more than one option for selecting the possible coding frame durations remains after said limiting, to encode said at least one segment with transform coding using each of the remaining coding frame durations, to decode said encoded segments again with the respective coding frame duration used for the transform coding, and to select for said at least one segment the coding frame duration that results in the best decoded audio signal for said at least one segment.

15. The module according to claim 14, wherein said frame duration selection unit is configured to determine the coding frame duration that results in the best decoded segment by comparing the signal-to-noise ratios obtained for each of said coding frame durations.

16. The module according to claim 15, wherein, to determine said signal-to-noise ratio for an audio signal obtained with a particular coding frame duration, said frame duration selection unit is configured first to determine a segmental signal-to-noise ratio separately for each of a plurality of subframes in the respective coding frame and then to average said segmental signal-to-noise ratios of said subframes over the entire coding frame to obtain said signal-to-noise ratio for said at least one segment.

17. The module according to claim 12, wherein said parameter selection unit is configured to determine, at least for some segments of the audio signal and based on characteristics of the audio signal in the respective segment, whether said coding model or another coding model is to be used, and to provide, as one of said control parameters, an indication of the segments for which said other coding model was selected.

18. The module according to claim 17, wherein said coding model is a transform coding model and said other coding model is an algebraic code excited linear prediction (ACELP) coding model.

19. An electronic device comprising a module for supporting the encoding of an audio signal, wherein at least one segment of said audio signal is to be encoded using a coding model that allows different coding frame durations to be used, said module comprising:
a parameter selection unit configured to determine at least one control parameter at least in part based on the characteristics of said audio signal; and
a frame duration selection unit configured to limit the options for selecting the possible coding frame durations for said at least one segment by means of said at least one control parameter provided by said parameter selection unit.

20. The electronic device according to claim 19, which determines said at least one control parameter based on at least one of the following parameters:
a short-frame indicator determined at least on the basis of the spectral distance; and
the number of frames in a superframe selected for algebraic code excited linear prediction (ACELP) coding.

21. The electronic device according to claim 19, wherein said frame duration selection unit is configured, if more than one option for selecting the possible coding frame durations remains after said limiting, to encode said at least one segment with transform coding using each of the remaining coding frame durations, to decode said encoded segments again with the respective coding frame duration used for the transform coding, and to select for said at least one segment the coding frame duration that results in the best decoded audio signal for said at least one segment.

22. The electronic device according to claim 21, wherein said frame duration selection unit is configured to determine the coding frame duration that results in the best decoded segment by comparing the signal-to-noise ratios obtained for each of said coding frame durations.

23. The electronic device according to claim 22, wherein, to determine said signal-to-noise ratio for an audio signal obtained with a particular coding frame duration, said frame duration selection unit is configured first to determine a segmental signal-to-noise ratio separately for each of a plurality of subframes in the respective coding frame and then to average said segmental signal-to-noise ratios of said subframes over the entire coding frame to obtain said signal-to-noise ratio for said at least one segment.

24. The electronic device according to claim 21, wherein said parameter selection unit is configured to determine, at least for some segments of the audio signal and based on characteristics of the audio signal in the respective segment, whether said coding model or another coding model is to be used, and to provide, as one of said control parameters, an indication of the segments for which said other coding model was selected.

25. The electronic device according to claim 24, wherein said coding model is a transform coding model and said other coding model is an algebraic code excited linear prediction (ACELP) coding model.

26. The electronic device according to claim 24, wherein each segment of said audio signal has a predetermined duration and said parameter selection unit is configured to provide the indication of the segments for which said other coding model has been selected for a respective super-segment containing a predetermined number of said segments.

27. The electronic device according to claim 19, wherein each segment of said audio signal has a predetermined duration, a predetermined number of consecutive segments forming a respective super-segment, and wherein said frame duration selection unit is configured to limit the options for selecting the coding frame duration for a particular segment on the basis of the boundaries of the super-segment to which said segment belongs.

28. The electronic device according to claim 25, wherein each segment of said audio signal has a duration of 20 ms and four consecutive segments form a respective super-segment, wherein said transform coding model allows coding frame durations of 20, 40 and 80 ms, and wherein said frame duration selection unit is configured to limit the options for selecting the coding frame duration for a segment on the basis of the boundaries of the super-segment to which said segment belongs.

29. The electronic device according to claim 19, wherein said parameter selection unit is configured to provide, as one of said control parameters, an indicator of whether a shorter or a longer coding frame duration is to be used, an indication that a shorter coding frame duration is to be used excluding at least the option of the longest coding frame duration, and an indication that a longer coding frame duration is to be used excluding at least the option of the shortest coding frame duration.

30. A system for encoding sound, comprising the module according to claim 12 and a decoder for decoding audio signals encoded with a variable coding frame duration.

31. The system of claim 30, which comprises determining at least one control parameter at least partially based on the characteristics of said audio signal.

32. The system according to claim 30, which comprises limiting the options for selecting the possible coding frame durations by means of said at least one control parameter.

33. The system according to claim 31, further comprising, if more than one option for selecting the possible coding frame durations remains after said limiting, encoding said at least one segment with transform coding using each of the remaining coding frame durations;
decoding said encoded segments with the respective coding frame duration used for the transform coding; and
selecting for said at least one segment the coding frame duration that results in the best decoded audio signal for said at least one segment.

34. A software product storing program code for supporting audio encoding, wherein at least one segment of the audio signal is to be encoded using a coding model that allows different coding frame durations to be used, said program code, when executed in a processing part of an encoder, performing the following operations:
determining at least one control parameter at least partially based on the characteristics of said audio signal; and
limiting the options for selecting the possible coding frame durations for said at least one segment by means of said at least one control parameter.
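Claims 9 to 11 narrow the candidate durations in two ways: by the boundaries of the super-segment and by a shorter/longer indicator. The following is a minimal Python sketch of that restriction, not taken from the patent. The function name, the "short"/"long" strings, the alignment of 40 ms frames to the two halves of the super-segment, and the fallback used when the indicator would leave no option are all hypothetical. With a "short" or "long" indication, at most two candidate durations remain for a segment, which is consistent with the description's statement that analysis-by-synthesis is needed at most twice per superframe.

```python
from typing import Optional, Sequence, Set

SEGMENTS_PER_SUPER = 4           # claim 10: four 20 ms segments per super-segment
TCX_DURATIONS_MS = (20, 40, 80)  # claim 10: durations offered by the transform model


def allowed_tcx_durations(is_tcx: Sequence[bool], segment: int,
                          indicator: Optional[str] = None) -> Set[int]:
    """Candidate TCX frame durations (ms) for one 20 ms segment.

    is_tcx: one flag per segment of the super-segment; False marks a segment
        already assigned to the other (ACELP) model by the open-loop stage.
    indicator: None, "short" or "long" (hypothetical encoding of claim 11).
    """
    if len(is_tcx) != SEGMENTS_PER_SUPER:
        raise ValueError("expected one flag per segment of the super-segment")
    if not 0 <= segment < SEGMENTS_PER_SUPER:
        raise ValueError("segment index outside the super-segment")
    if not is_tcx[segment]:
        return set()  # segment is coded with ACELP, no TCX duration applies
    options = {20}
    first = (segment // 2) * 2  # assumption: a 40 ms frame covers segments 0-1 or 2-3
    if is_tcx[first] and is_tcx[first + 1]:
        options.add(40)
    if all(is_tcx):
        options.add(80)  # an 80 ms frame spans the whole super-segment
    if indicator == "short":
        narrowed = options - {max(TCX_DURATIONS_MS)}  # drop the longest option
    elif indicator == "long":
        narrowed = options - {min(TCX_DURATIONS_MS)}  # drop the shortest option
    else:
        narrowed = options
    # Assumption: if the indicator would remove every option, keep the unfiltered set.
    return narrowed or options


print(sorted(allowed_tcx_durations([True] * 4, 0, "short")))         # [20, 40]
print(sorted(allowed_tcx_durations([True, True, False, True], 3)))  # [20]
```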