RU2464651C2

RU2464651C2 - Method and apparatus for multilevel scalable information loss tolerant speech encoding for packet switched networks

Info

Publication number: RU2464651C2
Application number: RU2009147332/08A
Authority: RU
Inventors: Vladimir Aleksandrovich Sviridenko (RU)
Original assignee: Limited Liability Company "Spirit Corp" (Общество с ограниченной ответственностью "Спирит Корп")
Priority date: 2009-12-22
Filing date: 2009-12-22
Publication date: 2012-10-20
Also published as: RU2009147332A

Abstract

FIELD: information technology.

SUBSTANCE: in the method for multilevel speech encoding that is scalable in frequency band and bit rate and is based on "analysis by synthesis", the parameters of the multipulse excitation source and of the synthesis filter are encoded for each speech frame or subframe. The method involves encoding a base level and one or more enhancement levels. The synthesis filter parameters are determined only once, at the analysis step of base-level encoding, and are then encoded so that only the base part of the speech signal can be reconstructed with a given quality in a defined, limited part of the signal's frequency band; at least one subsequent enhancement level expands that part of the frequency band.

EFFECT: high speech quality and providing tolerance to loss of speech frames transmitted in a packet switched network.

9 cl, 10 dwg

Description

FIELD OF THE INVENTION

The invention relates to methods for transmitting multimedia information over communication networks and storing it in electronic devices, and in particular to speech coding aimed, for example, at efficient and reliable transmission of high-quality speech over the links of packet-switched networks such as IP networks (including the Internet).

BACKGROUND OF THE INVENTION

The performance of systems that transmit or store speech in digital form depends strongly on the speech compression method, i.e. on the quality of speech signal coding. The efficiency of such coding is a determining factor both for speech transmission over digital communication channels and in circuit-switched or packet-switched networks. Speech coding for transmission over IP networks (IP telephony, Voice over IP) has the following important features.

1. Packet-switched networks are well suited to reliable data transmission over IP protocols, which tolerates a certain variable delay, but they were not originally designed for real-time traffic (speech in particular). Using UDP or RTP partially solves the delay problem, but can lead to out-of-order delivery and variable delivery times for individual packets. In addition, under network congestion some of the packets sent over virtual channels may be lost. Providing a degree of robustness against this effect of the IP network on speech transmission quality is therefore very important.

Thus, speech coding for VoIP systems must provide sufficiently high-quality real-time speech transmission over IP networks under conditions in which some fraction of the packets may be lost.

2. Because an IP network can carry packets at a high rate, it opens up the possibility of transmitting wideband speech of substantially higher quality than traditional telephony. A speech codec for VoIP systems should therefore exploit the ability of the IP network to carry high-quality wideband speech.

3. IP networks allow a variable information rate. It is therefore advisable that speech traffic in the IP network match the actual information content of the speech message. Speech is exactly such a source of variable information content (for example, the information content of pauses in the speech stream is clearly much lower than that of active speech segments). Within active speech, in turn, voiced parts are more important than unvoiced ones, and the transitions between voiced segments carry substantially more information than their quasi-stationary parts. This means that a speech codec for VoIP must encode speech with a given quality while the rate of the transmitted bit stream varies with the current information content of the input speech signal.

4. Links in an IP network are known to differ in capacity. A speech codec for VoIP systems must therefore support speech transmission between subscribers (clients) of one network or of different networks who are connected to the IP network by digital links of different capacity, without additional transcoding, which introduces extra distortion and delay into the speech. This is especially important for multipoint and broadcast conferencing, when several clients connected to the network by links of different capacity take part in a conference simultaneously and the conference itself is organized according to the so-called "tandem-free" scheme, in which the speech signals are mixed not on the server but directly on the client side (the client's computer). This problem can be solved if speech is encoded by so-called scalable coding. It was apparently first proposed in 1971 by the author of this application (see Inventor's Certificate AC 409477) and described in the article by V. A. Sviridenko, "Method for Compressing an Analog Message and Its Efficiency", Avtometriya, No. 3, 1974, pp. 102-106, as applied to stage-by-stage coding, with controlled losses, of continuous messages and signals, including speech signals.

5. The requirements on a speech codec intended for efficient operation in an IP network are not limited to the list above. Further requirements include low algorithmic delay, which matters for interactive voice communication; a certain level of noise robustness; resilience to errors; reduction of acoustic echo; the ability to meet the requirements of next-generation communication networks; the ability to operate efficiently in integrated networks that combine networks with different architectures and capabilities (in particular, IP networks and mobile radio networks); and others.

By now many of these factors have been taken into account in the development and implementation of speech codecs for IP networks, including codecs conforming to ITU-T Recommendations G.718, G.719, G.722.2 (AMR-WB) and G.729.1, and codecs from individual organizations engaged in designing and operating VoIP and videoconferencing systems (Skype, Polycom, GIPS and others), such as Speex, iSAC, iLBC, AMR-WB+ and others.

Of particular note are the advantages of multilevel coding of a speech source, in which speech is first encoded by a lower-level, or base-layer, encoder that compresses speech with acceptable quality, i.e. provides coding for the lowest bit rate. The difference between the original speech and the speech reconstructed after the base level is then encoded by the first enhancement-level encoder. This yields a specified increase in bit rate and a specified gain in speech quality in the combined signal after two coding levels. Next, the difference between the original speech and the speech reconstructed after the base and first enhancement levels is encoded by the second enhancement-level encoder, and so on step by step until the quality of the speech reconstructed from all coding levels reaches the required quality while the total bit rate does not exceed the specified limit. This is the approach presented in AC 409477 for arbitrary continuous signals, including speech signals.
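The stage-by-stage scheme described above can be illustrated with a minimal Python sketch. It is not the coder of the patent or of AC 409477: each level here is a plain uniform scalar quantizer, and the names `encode_layers`, `decode_layers` and the step sizes are hypothetical choices made only to show how each level encodes the residual left by the levels below it.

```python
def quantize(samples, step):
    """Uniform scalar quantizer: map each sample to an integer index."""
    return [round(s / step) for s in samples]


def encode_layers(signal, steps):
    """Encode `signal` as a base layer plus enhancement layers.

    steps[0] is the coarse base-layer step (an illustrative assumption);
    every later step quantizes the residual left by all previous layers.
    """
    layers = []
    residual = list(signal)
    for step in steps:
        indices = quantize(residual, step)
        layers.append(indices)
        # What this layer could not represent is passed to the next layer.
        residual = [r - i * step for r, i in zip(residual, indices)]
    return layers


def decode_layers(layers, steps):
    """Sum the contributions of however many layers were received."""
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        out = [o + i * step for o, i in zip(out, indices)]
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


signal = [0.93, -0.41, 0.27, 0.68, -0.85]
steps = [0.5, 0.125, 0.03125]
layers = encode_layers(signal, steps)
for n in range(1, len(layers) + 1):
    approx = decode_layers(layers[:n], steps[:n])
    print(n, "layer(s): max error =", max_error(signal, approx))
```

Decoding only the first layer gives the coarse base-quality signal; each further layer that arrives tightens the error bound to half of that layer's step, which mirrors the trade-off between bit rate and quality described in the text.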

If the higher coding levels have practically no effect on the lower ones, the lower levels are independent of the higher levels, which is a useful property.

This method of multilevel coding offers a number of important advantages:

- it provides bit-rate scalability: speech is encoded at the highest quality and highest rate, then decomposed into streams of different rates and delivered to clients at the rates that match the current capacity of the links connecting them to the IP network;

- it is sufficient to apply additional protection against channel losses, for example forward error correction (FEC) or some form of packet loss concealment (PLC), only to the information of the lowest level in order to guarantee the recipient speech quality no lower than a specified acceptable level. Alternatively, in an IP network, for example, the coding results of different levels can be placed in different packets that are assigned different priorities. Base-level information is assigned the highest priority, and each subsequent level is assigned a successively lower priority. Transmission of this information through an IP network with a suitable protocol architecture then takes place with an acceptable probability of losing base-level packets (at the expense of the enhancement-level packets sent with lower priority). This means that even under network congestion the quality of the decoded speech at the recipient does not fall below the specified level, and in the absence of losses it reaches the specified maximum quality;

- the quality of the decoded speech can be compensated to some degree when a packet is lost, if all of the base-level information, or its most important part, is placed not only in the corresponding frame (or packet) but also in the adjacent frame(s) or packet(s).
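A rough Python sketch of two of the advantages listed above: dropping enhancement layers for a slow link, and recovering a lost frame from a redundant copy of its base layer carried in the following packet. The packet layout, the field names (`seq`, `layers`, `prev_base`) and the functions `build_packets`, `thin` and `receive` are hypothetical and serve only as an illustration; they are not taken from the patent or from any real protocol.

```python
def build_packets(frames):
    """Build one packet per frame.

    Each frame is a list of layer payloads; index 0 is the base layer.
    Packet i also carries a copy of the base layer of frame i-1
    (an assumed redundancy layout).
    """
    packets = []
    for seq, layers in enumerate(frames):
        prev_base = frames[seq - 1][0] if seq > 0 else None
        packets.append({"seq": seq, "layers": list(layers), "prev_base": prev_base})
    return packets


def thin(packet, max_layers):
    """Drop enhancement layers beyond `max_layers`; the base layer is always kept."""
    kept = max(1, max_layers)
    return {**packet, "layers": packet["layers"][:kept]}


def receive(packets, n_frames):
    """Return, per frame, the layers available to the decoder.

    A lost frame is rebuilt at base quality from the redundant copy in the
    next packet if that packet arrived; otherwise None is returned and the
    frame is left to packet loss concealment.
    """
    by_seq = {p["seq"]: p for p in packets}
    out = []
    for seq in range(n_frames):
        if seq in by_seq:
            out.append(by_seq[seq]["layers"])
        elif seq + 1 in by_seq:
            out.append([by_seq[seq + 1]["prev_base"]])
        else:
            out.append(None)
    return out


frames = [["B0", "E0a", "E0b"], ["B1", "E1a", "E1b"], ["B2", "E2a", "E2b"]]
packets = [thin(p, 2) for p in build_packets(frames)]  # link carries two layers
arrived = [p for p in packets if p["seq"] != 1]        # packet 1 is lost
print(receive(arrived, len(frames)))
# prints [['B0', 'E0a'], ['B1'], ['B2', 'E2a']]
```

In this run frame 1 is still decoded, at base quality only, because its base layer travelled a second time in packet 2; the price is the extra bit rate spent on the duplicated base-layer data.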

Scaling can also be applied to the bandwidth of the processed speech (bandwidth scalability). For example, the base coding level covers only the minimum acceptable part of the input speech band, such as the "telephone" band of 0.3 to 3.4 kHz, and each subsequent enhancement level gradually widens the band up to the FM radio band (0.04 to 15 kHz) or possibly even further, to 0.02 to 20 kHz (with a corresponding increase in sampling rate). This makes it possible to transmit wideband (WB) and even super-wideband (SWB) speech of very high quality and to provide multichannel transmission (for example, stereo), while in the case of network congestion or relatively low capacity of the link connecting the recipient to the network, some or all of the enhancement levels can be discarded, keeping speech quality no lower than that of ordinary telephone communication.

One well-known codec that uses multilevel speech coding is the MPEG-4 CELP speech codec (ISO/IEC 14496-3), which comprises a base-level encoder (Core Encoder) together with a set of bit-rate scalability tools (Bit Rate Scalable (BRS) tools) and a bandwidth scalability tool (Bandwidth Extension tool). A simplified block diagram of the MPEG-4 CELP scalable encoder is shown in Fig. 1.

This codec is based on Multi Pulse Excitation (MPE), a speech coding method that is a variant of the well-known CELP (Code Excited Linear Predictive) method proposed by M. Schroeder and B. Atal in 1985, which has various implementations (ACELP, RCELP, LD-CELP, VSELP and others) and is now widely used in vocoder development.

The MPE method, like many other speech coding methods, relies on the source-filter model of speech production, in which the vocal tract is represented as an all-pole filter with time-varying parameters, driven by an excitation source signal that characterizes the action of the vocal cords and the parameters of the air stream entering the vocal tract. The CELP method itself rests on four main principles: use of the above source-filter speech production model based on linear prediction (LP); use of an adaptive and a fixed codebook as the excitation source for the LP model; a closed-loop search for the optimal excitation signal using a difference signal weighted according to human speech perception; and vector quantization.

In the MPEG-4 CELP codec, the digital input speech signal, sampled at 16 kHz (sufficient for carrying speech in a 7...8 kHz band), is fed to block 101, which lowers the sampling rate to 8 kHz (sufficient for coding speech in a band of no more than 4 kHz, such as a "telephone channel"), and then to the base-level encoder (block 102), whose output provides the speech production model parameters that are sent out through the multiplexer (block 106). Some of the parameters form the excitation pulse information, which is passed to the bit-rate scalability unit (block 103); at the same time block 103 receives from block 102 the "residual", i.e. the difference between the original speech signal sampled at 8 kHz and the signal synthesized in the encoder. Similar information and "residuals" are passed to the higher-level bit-rate scalability tools, up to the level-N tool (block 104), whose outputs are also fed to block 106. If a wider-band signal (7...8 kHz band) can be transmitted, block 105, the bandwidth extension tool, comes into operation: its main input receives the original speech signal (16 kHz), its information inputs receive the excitation source parameters and the coefficients of the LP synthesis filter from block 104, and its output, connected to block 106, provides the speech signal parameters for the wide band (up to 8 kHz).

A simplified block diagram of the MPEG-4 CELP base-level encoder is shown in Fig. 2. It includes blocks 201, 202, 203 and 204 for synthesizing the speech signal. Blocks 201 and 202 together model the "source" and contain the fixed and adaptive codebooks, whose signals are combined in block 203 to form the excitation signal; the digital filter (block 204), whose LP coefficients (LPC) determine the spectral envelope of the synthesized speech, forms from this excitation the speech signal synthesized in the encoder. The synthesized signal is subtracted from the original speech in block 205 to form the error signal, or "residual", which is fed to the weighting filter (block 208). The output of block 208 drives block 209, which minimizes the power of this residual in order to generate the signal that selects the "optimal excitation signal"; this selection signal is applied to control input 1 of block 201 and to block 202, where the search for the "optimal signal" in the codebooks is carried out. The excitation signal from block 203 is also fed to the generator of the control signal for the adaptive codebook (block 210), whose output goes to the second control input of block 201. The LPC coefficients are estimated in block 206, which receives the speech signal. These coefficients are quantized by a vector quantizer (block 207) and fed to block 204. The encoder outputs are the "residual" (output of block 205) and the speech signal parameters: the pitch (second output of block 201), the excitation pulse information (second output of block 202) and, at the second output of block 207, the line spectral frequencies (LSF), also called line spectral pairs (LSP), which represent the LPC coefficients in a form more convenient for transmission because of its lower sensitivity to quantization noise.

This encoder contains one adaptive and several fixed codebooks (for simplicity, Fig. 2 shows only one fixed codebook). The sum of the output signals of these two codebooks forms the excitation signal of the LPC synthesis filter. Speech is coded frame by frame: the input speech signal is divided into separate frames, and the LPC filter parameters are analyzed for each frame. Each frame is then split into subframes, and for each subframe the parameters of the adaptive and fixed codebooks are optimized by "analysis by synthesis" according to the criterion of minimum weighted mean square error between the input and synthesized signals. The weighting is performed by a so-called "perceptual filter", usually a pole-zero filter whose parameters are derived from the LPC filter coefficients. The purpose of the weighting is to increase the weight of the error in the formant regions of the speech spectrum, which are the most important for perception of the speech signal, and to reduce it in the regions of the spectrum that matter less for human perception of speech. The weighted speech signal can be obtained as:

where:

x(n) is the current sample of the input speech signal,

a_k, b_k are the coefficients of the K-th order LPC filter,

γ₁, γ₂ are constants that determine the degree of weighting.

Equation (1) describes the sequential filtering of the input signal by the all-zero and all-pole LPC filters, respectively, whose impulse responses are adjusted by the constants γ₁ and γ₂.
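As an illustration of the weighting that equation (1) describes, the following minimal sketch applies a pole-zero filter of the commonly used form W(z) = A(z/γ₁)/A(z/γ₂). That form, the sign convention A(z) = 1 + Σ a_k z^(-k), the default values of γ₁ and γ₂, and the function names are assumptions made for the example and are not taken from the description.

```python
def bandwidth_expand(a, gamma):
    """Scale LPC coefficients a_1..a_K by gamma**k (assumed form A(z/gamma))."""
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def perceptual_weight(x, a, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2).

    Assumes A(z) = 1 + sum_k a_k z^-k; the gamma defaults are illustrative.
    """
    num = bandwidth_expand(a, gamma1)  # all-zero (numerator) part
    den = bandwidth_expand(a, gamma2)  # all-pole (denominator) part
    xw = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += num[k - 1] * x[n - k]   # zero filter acts on the input
                acc -= den[k - 1] * xw[n - k]  # pole filter acts on past outputs
        xw.append(acc)
    return xw
```

With γ₁ = γ₂ the numerator and denominator cancel and the signal passes through unchanged, which is a convenient sanity check for an implementation of this kind.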

The codec algorithm is based on the well-known linear prediction method, in which the speech signal is represented by the following equation, describing the filtering of the excitation signal by an all-pole LPC filter:

where:

s(n) is the current sample of the synthesized speech signal,

e(n) is the current sample of the excitation signal,

a_k are the coefficients of the K-th order LPC filter (in the MPEG-4 CELP codec: K=10 for the Core Encoder and K=20 for the Bandwidth Extension Tool).

The LPC filter models the characteristics of the vocal tract and forms the spectral envelope of the coded segment of the synthesized signal, relying on the so-called short-term correlations in the speech signal. The filter coefficients a_k are found by solving the system of autocorrelation prediction equations, which minimizes the energy of the prediction error, using the recursive Levinson-Durbin algorithm.
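The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a minimal textbook version, assuming the convention A(z) = 1 + Σ a_k z^(-k); windowing, lag weighting and other refinements used in practical codecs are omitted, and the function names are chosen for the example only.

```python
def autocorrelation(x, order):
    """Return the autocorrelation values r[0..order] of the sequence x."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err): a = [a_1..a_K] for the assumed convention
    A(z) = 1 + sum_k a_k z^-k, and err is the final prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # degenerate (e.g. silent) input: stop early
            break
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err  # reflection coefficient of stage i + 1
        prev = a[:]
        for j in range(i):
            a[j] = prev[j] + k * prev[i - 1 - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For example, the autocorrelation sequence r = [1, 0.5, 0.25] of a first-order process yields a_1 = -0.5 and a_2 = 0, with a residual energy of 0.75.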

The excitation signal e(n) is found for each subframe as described below.

First, an estimate of the pitch period is found, the so-called Open Loop Pitch, as the time delay at which a segment of past samples of the weighted speech signal has the minimum mean square deviation from the segment of current samples:

where:

- the mean square error to be minimized,

t_op is the ("optimal") delay estimate at which the mean square error is minimal. For voiced segments this delay is practically equal to the pitch period of the speech;

x_w(n) is the current sample of the weighted input speech signal,

- the prediction gain for the optimal delay t_op, which can be calculated as:

N is the number of samples of the weighted speech signal used in estimating the gain.

Comparing the prediction gain with thresholds makes it possible to additionally estimate the degree of voicing of the current speech segment, which can be used later.
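The open-loop search described above can be sketched as follows. For each candidate delay, the gain that minimizes the squared error is the ratio of the cross-correlation to the energy of the delayed segment, so minimizing the error is equivalent to maximizing the squared correlation divided by that energy. The lag range, the buffer layout and the function name are assumptions made for the example.

```python
def open_loop_pitch(xw, n_cur, t_min, t_max):
    """Return (t_op, gain) minimizing sum((xw[n] - g * xw[n - t]) ** 2).

    xw holds past samples followed by the n_cur current samples and must
    contain at least n_cur + t_max samples. Lags with non-positive
    correlation are skipped. Returns (None, 0.0) if no lag qualifies.
    """
    start = len(xw) - n_cur
    cur = xw[start:]
    energy_cur = sum(v * v for v in cur)
    best_t, best_gain, best_err = None, 0.0, energy_cur
    for t in range(t_min, t_max + 1):
        past = xw[start - t:start - t + n_cur]  # segment delayed by t samples
        corr = sum(c * p for c, p in zip(cur, past))
        energy = sum(p * p for p in past)
        if energy == 0.0 or corr <= 0.0:
            continue
        err = energy_cur - corr * corr / energy  # error at the optimal gain
        if err < best_err:
            best_t, best_gain, best_err = t, corr / energy, err
    return best_t, best_gain
```

The returned gain can then be compared with thresholds to judge the degree of voicing, as the text notes.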

All subsequent optimization procedures are carried out by the well-known analysis-by-synthesis method.

To this end, the synthesis of the current segment of weighted speech is represented in the form:

where:

e(n-i) is the excitation signal of the synthesis filter,

h_w are the samples of the impulse response of the weighted synthesis filter, which are computed by sequentially filtering the coefficient vector of the all-zero weighting filter, padded with zeros, through the synthesis and weighting all-pole filters, respectively:

where:

a_k are the LPC filter coefficients for the current subframe, usually obtained by linear interpolation of the LPC coefficients calculated for the past and current frames and converted to the LSF domain.

a(n)=a_k for all n=(k-1)<K,

a(n)=0 for all n≥K,

- the contribution of the past excitation to the current vector of the synthesized signal due to the memory of the weighted synthesis filter, the so-called "ringing".

The target signal is then computed as the weighted input speech signal minus the "ringing":

This "ringing" is usually computed by sequentially filtering the "target" signal through the weighting and synthesis filters. However, an equivalent procedure for obtaining the target signal is often used, which consists in finding the LPC residual signal:

and then sequentially filtering the residual signal through the weighting and synthesis filters:

The filter memory for the next subframe is updated after the excitation signal has been found, by filtering the difference between the residual and the found excitation signal through the weighting and synthesis filters.

The resulting vectors of the impulse response of the weighted synthesis filter h_w(n) and of the target signal tag(n) are used to find the parameters of the adaptive and fixed codebooks.
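The computation of h_w(n) described above can be sketched as follows: the zero-padded coefficient vector of the all-zero weighting filter is passed through the synthesis filter and then through the all-pole weighting filter. The sketch assumes the convention A(z) = 1 + Σ a_k z^(-k) and the weighting form A(z/γ₁)/A(z/γ₂), with the leading coefficient 1 included in the padded vector; these conventions and the function names are assumptions, not taken from the description.

```python
def all_pole(x, a):
    """Filter x through 1/A(z), assuming A(z) = 1 + sum_k a_k z^-k."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def weighted_impulse_response(a, gamma1, gamma2, length):
    """Impulse response of the weighted synthesis filter, `length` samples.

    The coefficients of A(z/gamma1), padded with zeros, are filtered by the
    synthesis filter 1/A(z) and by the weighting pole filter 1/A(z/gamma2).
    """
    zeros = [1.0] + [ak * gamma1 ** k for k, ak in enumerate(a, start=1)]
    padded = (zeros + [0.0] * length)[:length]
    poles_w = [ak * gamma2 ** k for k, ak in enumerate(a, start=1)]
    return all_pole(all_pole(padded, a), poles_w)
```

When γ₁ = γ₂ the weighting cancels and the result reduces to the impulse response of the synthesis filter alone, for example 1, 0.5, 0.25, 0.125 for a single coefficient a_1 = -0.5.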

The adaptive codebook is optimized first, starting from the Open Loop Pitch value t_op. As a rule, the adaptive codebook is implemented as a pitch predictor, and the optimization consists in refining the optimal delay and computing the prediction gain at which the past excitation signal, filtered by the weighted synthesis filter, has the minimum mean square deviation from the target signal vector:

where:

t_op is the optimal delay value found by the open-loop method,

y_t(n) is the past excitation signal filtered by the weighted synthesis filter h_w:

Note that for delays t shorter than the length of the analyzed subframe, the past excitation signal has not yet been fully formed. In this case, as a rule, the previously computed LPC residual res(n) is used to extend the past excitation signal.

The gain of the pitch predictor is calculated as:

The adaptive codebook parameters (t, g_t) are then encoded for transmission.
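A closed-loop refinement of this kind can be sketched as follows. The sketch searches integer delays in a small window around t_op, filters each candidate segment of past excitation by h_w, and keeps the delay with the smallest error at the optimal gain. The window size `span`, the restriction to delays not shorter than the subframe, and the function names are assumptions made for the example; fractional delays and the extension by res(n) are not covered.

```python
def convolve_truncated(e, h):
    """y(n) = sum_i e(n - i) * h(i), truncated to len(e) samples."""
    return [sum(e[n - i] * h[i] for i in range(min(n, len(h) - 1) + 1))
            for n in range(len(e))]


def adaptive_codebook_search(target, past_exc, h_w, t_op, span=2):
    """Refine the delay around t_op; return (t, g_t).

    Only delays >= len(target) are tried, so the past excitation is always
    fully available. Returns (None, 0.0) if no candidate correlates
    positively with the target.
    """
    n = len(target)
    best_t, best_g, best_score = None, 0.0, 0.0
    for t in range(max(n, t_op - span), t_op + span + 1):
        if t > len(past_exc):
            break
        start = len(past_exc) - t
        y = convolve_truncated(past_exc[start:start + n], h_w)
        corr = sum(a * b for a, b in zip(target, y))
        energy = sum(v * v for v in y)
        if energy > 0.0 and corr > 0.0:
            score = corr * corr / energy  # error reduction at the optimal gain
            if score > best_score:
                best_t, best_g, best_score = t, corr / energy, score
    return best_t, best_g
```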

Thus, the adaptive codebook contributes to the current excitation a component of the past excitation taken at the delay t, equal to the pitch period of the speech signal, thereby exploiting the so-called long-term correlations in the speech signal. The contribution of this codebook, measured by the gain g_t, is the greater, the more the current speech signal retains its stationary and periodic (with the pitch period) character compared with the preceding speech signal. Accordingly, in the transitional segments of the speech signal, from voiced to completely unvoiced, the efficiency of the adaptive codebook drops sharply.

Next, the fixed codebook is optimized, also by the criterion of minimum mean square error between the weighted synthesized signal and the target vector:

where:

- the new target vector needed to optimize the fixed codebook, obtained by subtracting the adaptive codebook contribution from the first target vector,

- the contribution of the fixed codebook to the weighted synthesized signal,

- the multipulse excitation signal vector, consisting of M unit pulses d at positions m_l, 0≤l<M, with signs α_l (±1).

The optimization reduces to finding the positions of the M pulses, their signs, and the gain g_fc that yield the minimum mean square error E_fc (formula 10).

The classical algorithm for searching pulse positions in the MPE method consists in finding the position of a pulse by the criterion of the maximum of the expression:

and then its amplitude from the expression:

where:

l is the current number of pulses,

m_i is the position of the i-th pulse.

Thus, using expressions (17) and (18), the positions of all M pulses are found sequentially, and then their common gain is found as:

The optimal positions, signs, and gain found for the pulses are then encoded for transmission.
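The sequential search described above can be sketched as a greedy procedure: each pulse is placed where the filtered unit pulse correlates best with the remaining target, its individual amplitude is removed from the target, and a common gain is computed at the end for the signed unit pulses. The exact criteria of expressions (17) and (18) are not reproduced here; the normalized-correlation criterion and the function name are assumptions made for the example.

```python
def mpe_search(target, h_w, num_pulses):
    """Greedy multipulse search; returns (positions, signs, gain)."""
    n = len(target)
    # Response of the weighted synthesis filter to a unit pulse at position m.
    shifted = [[h_w[i - m] if 0 <= i - m < len(h_w) else 0.0 for i in range(n)]
               for m in range(n)]
    residual = list(target)
    positions, signs = [], []
    for _ in range(num_pulses):
        best_m, best_score, best_amp = None, 0.0, 0.0
        for m in range(n):
            if m in positions:
                continue
            energy = sum(v * v for v in shifted[m])
            if energy == 0.0:
                continue
            corr = sum(r * v for r, v in zip(residual, shifted[m]))
            if corr * corr / energy > best_score:
                best_m, best_score = m, corr * corr / energy
                best_amp = corr / energy
        if best_m is None:  # nothing left to model
            break
        positions.append(best_m)
        signs.append(1 if best_amp > 0 else -1)
        residual = [r - best_amp * v
                    for r, v in zip(residual, shifted[best_m])]
    # Common gain for the vector of unit pulses with the chosen signs.
    synth = [sum(s * shifted[m][i] for m, s in zip(positions, signs))
             for i in range(n)]
    energy = sum(v * v for v in synth)
    gain = (sum(t * v for t, v in zip(target, synth)) / energy
            if energy > 0.0 else 0.0)
    return positions, signs, gain
```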

Thus, the fixed codebook contributes to the current excitation a stochastic component that models the noise component of the excitation source. At the same time, an important property of the fixed codebook is that it effectively supplies the past excitation signal with the missing pulses needed to bring the current total excitation signal closer to the real signal.

This property of the fixed codebook is especially evident when the codebook is very sparse, i.e., when the number of excitation pulses is insufficient to describe the real signal well right away. Therefore, with very sparse codebooks (typical of low-bit-rate codecs), both unvoiced and voiced segments of speech are reproduced with insufficient quality. However, the above-mentioned property of the fixed codebook makes it possible, after only a few analysis intervals, to "saturate" the voiced signal sufficiently with the missing pulses. Nevertheless, the part of the speech signal at the beginning of new voiced segments remains insufficiently well reconstructed.

A block diagram of the bit rate scalability tool of the MPEG-4 CELP codec is shown in Fig. 3.

This scalability tool is in effect an additional encoder. Like the base level encoder, it contains a codebook-based multipulse excitation source (block 302), which is optimized by the analysis-by-synthesis method under the control of block 306, whose output is connected to the first control input of block 302, and of block 301, whose output is connected to the second control input of block 302. It also contains a synthesizing LPC filter (block 303), whose output is fed to the subtraction block 304. The excitation information of the current enhancement level is summed in block 305 and passed to the next enhancement level.

At the same time, there are the following fundamental differences:

- the optimization is based on minimizing the error between the synthesized signal and the residual signal obtained at the previous coding level (rather than the input speech signal), for example, at the base level if the current scalability tool is the first enhancement level,

- only a fixed codebook is used as the excitation signal (as already noted, the MPEG-4 CELP encoder uses MPE as the fixed codebook),

- this tool works autonomously, i.e., it has no effect on the previous levels, including the base level. (In particular, the excitation signal generated at this level does not take part in updating the adaptive codebook memory of the base level.)
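The level structure listed above can be illustrated with a deliberately simplified sketch in which each level codes only the residual left by the previous levels and never modifies them, so that a decoder can stop after any number of levels. The synthesis filter is omitted (treated as identity) and pulse amplitudes are left unquantized; these simplifications and all function names are assumptions made for the example.

```python
def pick_pulses(signal, count):
    """Return {position: amplitude} for the `count` largest-magnitude samples."""
    order = sorted(range(len(signal)), key=lambda i: -abs(signal[i]))
    return {i: signal[i] for i in order[:count]}


def encode_levels(target, pulses_per_level):
    """Encode `target` level by level.

    Each level codes only the residual left by the previous levels and
    leaves the earlier levels untouched.
    """
    residual = list(target)
    levels = []
    for count in pulses_per_level:
        pulses = pick_pulses(residual, count)
        for pos, amp in pulses.items():
            residual[pos] -= amp
        levels.append(pulses)
    return levels


def decode_levels(levels, length, received):
    """Rebuild the signal from the first `received` levels only."""
    out = [0.0] * length
    for pulses in levels[:received]:
        for pos, amp in pulses.items():
            out[pos] += amp
    return out
```

Decoding with one level gives a coarse approximation, and each additional level received refines it without requiring the earlier levels to be re-encoded.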

The last feature provides coding scalability, since it makes each coding level independent of the higher (subsequent) coding levels. The price of this independence, however, is lower speech quality compared with other CELP-like codecs at the same bit rate. The reason is the lower efficiency of the adaptive codebook in the scalable codec: only the fixed codebook of the base level, which has a very limited number of excitation pulses, takes part in updating the adaptive codebook memory. This prevents the voiced signal from being quickly "saturated" with the missing pulses and, as a consequence, prevents the long-term correlations in the speech signal from being fully exploited, which reduces the coding efficiency in terms of wideband speech quality. The lower the bit rate of the codec used at the base coding level, the greater this loss of coding efficiency. In such a situation, even several enhancement levels that add the missing pulses to the excitation signal do not provide an adequate increase in coding efficiency, since the contribution of the past excitation is not fully used, especially at the beginning of voiced segments.

On the other hand, as noted above, the virtual digital channel over which speech packets are transmitted in an IP network is not ideal because of possible errors in it and, above all, because of packet loss. Under such conditions it is important that the loss of part of the information in the speech traffic does not lead to error propagation, i.e., the current information must be independent of the preceding information. In this respect the adaptive codebook has a negative effect: the loss of the parameters of the current predictor, which is usually the adaptive codebook, and the loss of the fixed codebook pulses used to update its memory cause the state of the decoder's adaptive codebook to remain inconsistent with the state of the encoder's adaptive codebook for a long time (theoretically tending to infinity). This leads to a mismatch in their operation and a sharp drop in the quality of the reconstructed speech signal at the receiver when such losses occur.
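The error propagation described above can be illustrated with a toy decoder that uses a one-tap pitch predictor, e(n) = g·e(n - T) + c(n), where c(n) are the fixed codebook pulses. The single-tap model, the replacement of a lost frame by zero pulses, and the function name are assumptions made for the example, not a description of any particular codec.

```python
def decode_excitation(frames, lag, gain, lost=()):
    """Decode frames of fixed codebook pulses with a one-tap pitch predictor.

    e(n) = gain * e(n - lag) + c(n). For a frame whose index is in `lost`,
    the pulses c(n) are replaced by zeros.
    """
    exc = [0.0] * lag  # adaptive codebook memory, initially empty
    for idx, frame in enumerate(frames):
        for c in frame:
            innovation = 0.0 if idx in lost else c
            exc.append(gain * exc[-lag] + innovation)
    return exc[lag:]
```

If the first frame carries a single pulse and is lost, the decoder output differs from the loss-free output in every later frame as well: the difference shrinks only by the factor g per pitch period, so for g close to 1 (strongly voiced speech) it persists almost indefinitely.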

The MPEG-4 CELP codec also has a bandwidth scalability tool (see block 105 in Fig. 1), which is a separate MP-CELP codec operating on a speech signal with twice the bandwidth and, accordingly, twice the sampling rate of the base codec. A block diagram of the bandwidth scalability tool is shown in Fig. 4.

The algorithm of this encoder is similar to that of the base level encoder, except for several features:

- the sampling rate of all signals is 16 kHz,

- the LPC prediction order is 20, i.e., twice that of the base encoder,

- the adaptive codebook search is performed around the pitch frequency value found when encoding the base level,

- the excitation signal is formed from three components:

• the past excitation from the adaptive codebook,

• the excitation from the base level encoder after conversion to the doubled sampling rate,

• the multipulse excitation from the fixed codebook,

- the LPC coefficients are quantized in the LSF domain predictively, taking into account the LSF coefficients previously quantized in the base level encoder.

A block diagram of the predictive quantizer of the LSF coefficients is shown in Fig. 5.

The quantization exploits the correlation between the ten coefficients quantized at the base level (narrowband, NB) and the first ten coefficients quantized in the bandwidth scalability tool (wideband, WB). In addition, the correlation between adjacent frames is used to increase the coding efficiency.

Quantization of the LSF coefficients in the Bandwidth Extension Tool is performed according to the following expressions:

where:

- the quantized LSFs,

a_p(i) are the interframe prediction coefficients of order P,

ε_p(i) is the quantized prediction error from the previous frame at distance p,

b(i) are the estimation coefficients of the intraframe prediction, in which the quantized LSFs from the base (narrowband, NB) encoder are transformed into the corresponding coefficients of the bandwidth extension (wideband, WB) module.
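One form of predictive quantizer that is consistent with the definitions above is given below purely as an illustration. The symbols \hat{f}_{WB}(i), \hat{f}_{NB}(i) and \varepsilon_0(i) (the quantized prediction error of the current frame) are notation introduced for this sketch, and the restriction of the intraframe term to the first ten coefficients is an assumption drawn from the correlation described in the text; the expressions actually used in the Bandwidth Extension Tool may differ.

```latex
\hat{f}_{WB}(i) \;=\; \varepsilon_0(i)
  \;+\; \sum_{p=1}^{P} a_p(i)\,\varepsilon_p(i)
  \;+\; b(i)\,\hat{f}_{NB}(i),
\qquad \text{with } b(i) = 0 \text{ for coefficients that have no NB counterpart.}
```

In such a scheme the sum over p carries the dependence on previous frames, which is the source of the error propagation discussed later in the description.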

Note that all computations in block 105 (the bandwidth scalability, or extension, tool) are carried out in the same way as in the base level encoder. That is, each of the three excitation components is optimized in turn by the criterion of minimum mean square deviation of the weighted synthesized signal from the target vector. The parameters found are then quantized for transmission over the communication channel.

Thus, the MPEG-4 CELP codec provides scalability both in bit rate and in the bandwidth of the processed speech signal.

However, when operating over a channel with variable throughput (an IP network, for example), abrupt transitions from narrowband to wideband and back produce noticeable artifacts in the sound of the speech. Moreover, to make the change in bit rate smooth during the transition, the number of excitation pulses added in block 105 (the bandwidth extension tool) must be minimal. With too few excitation pulses, however, the quality of wideband speech suffers severely, since sparse pulses cannot correctly form the signal in the upper part of the spectrum.

Another significant drawback is the use of different synthesis filters in the base encoder and in the extension encoder. This approach makes it necessary:

- to repeat the LPC analysis in the extension tool, and it complicates the quantization of the filter coefficients,

- to resample both the input speech signal and the excitation signal, which requires additional delay in the filters and additional computation and introduces additional speech distortion,

and it also makes it difficult to build a codec with smooth scaling and a large number of enhancement levels because of the growing complexity.

In addition, the use of interframe correlation in quantizing the LSF coefficients makes this codec very vulnerable to losses in the channel because of error propagation.

All this sharply limits the possibilities of using such a codec for voice transmission over an IP network.

Disclosure of the Invention

The aim of the invention is to provide a method of multilevel speech coding, and a device implementing it, oriented toward transmitting high-quality speech over a packet-switched network (for example, an IP network) and free of the drawbacks noted above, which are inherent in other scalable (multilevel) speech codecs such as MPEG-4 CELP. The following improvements are proposed to achieve this aim.

1. To transmit speech of very high quality, the frequency band of the processed speech signal must be sufficiently wide, and the synthesis filter of the speech coder must have an order adequate to that band. For example, to encode an input signal occupying the band 0.04…15 kHz (the FM radio band), the order of the LPC synthesis filter should be about 40. At the same time, the codec must scale smoothly in bandwidth, starting from the minimum admissible band, for example 0.04…3.4 kHz, and widening it smoothly from level to level up to the full bandwidth. A filter order of 10 is usually sufficient for coding such narrowband speech. Solving this problem in the way considered above (as in the MPEG-4 CELP codec) has the significant drawbacks already noted.

It is proposed to use, at all coding levels including the base level, a single maximum sampling rate (for example, 32 kHz) and a single synthesis filter of maximum order (for example, 40). To minimize the redundancy this creates at the lower coding levels, the parameters of this filter are coded (quantized) so that the spectral envelope is reconstructed with the required accuracy only in the part of the spectrum chosen as the working band for the given coding level. The signal synthesized outside this working band is simply not used, i.e. it is filtered out.

In addition to a finer estimate of the spectral envelope of the speech signal at its maxima, a high-order all-pole filter also estimates the spectral “zeros” fairly accurately, which allows some types of voiced sounds (nasal sounds in particular) to be rendered more accurately.

As is known, to form the spectral envelope in the minimum specified band of 0.04…3.4 kHz (base-level coding), only the first 8-12 filter coefficients need to be transmitted with high accuracy; all subsequent coefficients can be transmitted with lower accuracy, provided that the spectral distortion in the working band of 0.04…3.4 kHz does not exceed a specified level. At each subsequent coding level the working band widens and, accordingly, an ever larger number of coefficients is coded with higher accuracy. At each subsequent level, only the error between the unquantized filter parameters and the parameters quantized at the previous levels is quantized. The band is thus widened smoothly from level to level with controlled additional redundancy, while the “classical” multipulse-excitation coding algorithm is retained, without additional LPC analysis, complex quantization or resampling.
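The level-by-level refinement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the quantizer of the invention: it uses a uniform scalar quantizer per coefficient, and the function names (`quantize_level`, `encode_levels`, `decode_levels`), the parameter values and the per-level step sizes are all hypothetical. Each level quantizes only the residual left by the previous levels, and fine steps are given to more coefficients as the working band widens.

```python
def quantize_level(residual, steps):
    """Uniformly quantize each residual component with its own step size."""
    return [round(r / s) * s for r, s in zip(residual, steps)]


def encode_levels(params, steps_per_level):
    """Quantize `params` in successive levels.

    Each level quantizes only the error between the unquantized parameters
    and the reconstruction accumulated over the previous levels.
    Returns the list of per-level contributions.
    """
    reconstruction = [0.0] * len(params)
    contributions = []
    for steps in steps_per_level:
        residual = [p - q for p, q in zip(params, reconstruction)]
        delta = quantize_level(residual, steps)
        reconstruction = [q + d for q, d in zip(reconstruction, delta)]
        contributions.append(delta)
    return contributions


def decode_levels(contributions, n_levels):
    """Sum the contributions of the first `n_levels` levels."""
    total = [0.0] * len(contributions[0])
    for delta in contributions[:n_levels]:
        total = [t + d for t, d in zip(total, delta)]
    return total


# Four hypothetical filter parameters; the first two are treated as
# belonging to the base working band.
params = [0.123, 0.457, 0.782, 0.911]
steps_per_level = [
    [0.01, 0.01, 0.1, 0.1],    # base level: fine steps only in the base band
    [0.01, 0.01, 0.01, 0.01],  # enhancement level: fine steps everywhere
]
contributions = encode_levels(params, steps_per_level)
for n in (1, 2):
    decoded = decode_levels(contributions, n)
    print(n, max(abs(p - d) for p, d in zip(params, decoded)))
```

A decoder that receives only the base level still reconstructs every parameter, coarsely outside the base band; each additional level tightens the reconstruction without repeating the analysis.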

An example of such a synthesis-filter parameter quantizer is an LSF vector quantizer in which the codebook is searched for the minimum of the weighted mean-square error between the vector of original coefficients and the vector of quantized coefficients:

$$E = \sum_{i=1}^{K} w_i \left(f_i - \hat{f}_i\right)^2$$

where:

$K$ is the order of the synthesis filter (for example, 40),

$f_i$ and $\hat{f}_i$ are the unquantized and quantized LSF coefficients, respectively,

$w_i$ is the vector of weighting coefficients, which sets the quantization accuracy of the individual LSF coefficients.

The weight vector is variable: for each new LSF vector to be quantized, it must be computed from the specified spectral distortion in the working frequency band, as a second derivative (23) defined in terms of:

- the vector of LSF coefficients being quantized,

- the vector of quantized LSF coefficients,

- the weighted spectral distortion, which can be computed as an integral (24) involving:

$C$, a normalizing constant,

the energy LPC spectra of the unquantized and quantized LPC coefficients, respectively,

a frequency weighting function, which is determined by the working bandwidth and by the perceptual importance to the listener of individual regions of the spectrum within the working band, and which can be computed, for example, from:

the unquantized LPC coefficients, expressed as a function of the LSF vector,

$r$, an empirical constant (≈0.15),

$P_{LPF}(\varphi)$, the spectral characteristic of the low-pass filter (LPF) that defines the working frequency band fixed for the current coding level.

The second derivative (23) and the integral (24) can be computed by any suitable implementation algorithm; when choosing one, it is advisable to take computational cost into account.
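For orientation, one form that the second derivative (23) and the integral (24) could take is sketched below. It assumes the standard log-spectral distortion measure and a sensitivity-type weighting. The symbols $D_w$, $P$, $W$ and $a_k$, the integration limits, and the exact shape of the frequency weighting are assumptions made for illustration, not expressions taken from the source.

```latex
% Weight of the i-th LSF coefficient: curvature of the weighted distortion
% around the unquantized vector f (assumed form of the second derivative).
w_i = \left. \frac{\partial^2 D_w(\mathbf{f}, \hat{\mathbf{f}})}
                  {\partial \hat{f}_i^{\,2}} \right|_{\hat{\mathbf{f}} = \mathbf{f}}

% Weighted spectral distortion over normalized frequency \varphi
% (assumed form of the integral).
D_w(\mathbf{f}, \hat{\mathbf{f}}) = C \int_{0}^{\pi} W(\varphi)
  \left[ 10 \log_{10} P(\mathbf{f}, \varphi)
       - 10 \log_{10} P(\hat{\mathbf{f}}, \varphi) \right]^2 d\varphi

% Energy spectrum of an order-K all-pole filter with LPC coefficients a_k(f).
P(\mathbf{f}, \varphi) =
  \frac{1}{\left| 1 + \sum_{k=1}^{K} a_k(\mathbf{f})\, e^{-jk\varphi} \right|^2}

% One possible frequency weighting: emphasis of spectral peaks, limited to
% the working band by the low-pass characteristic (assumed shape).
W(\varphi) = P_{LPF}(\varphi) \left[ P(\mathbf{f}, \varphi) \right]^{r},
  \qquad r \approx 0.15
```

With a weighting of this kind, coefficients whose perturbation changes the spectrum mainly outside the working band receive small weights and can therefore be quantized coarsely at that level.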

The quantizer codebooks can be trained with any known optimization algorithm (for example, the Lloyd algorithm).
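The codebook search by minimum weighted error can be illustrated with a short sketch. It assumes a plain exhaustive search over a small codebook; the function names (`weighted_mse`, `search_codebook`) and the toy codebook values are hypothetical, and the weights are supplied by the caller rather than derived from the spectral distortion.

```python
def weighted_mse(f, f_hat, w):
    """Weighted squared error between an LSF vector and a candidate codevector."""
    return sum(wi * (fi - qi) ** 2 for wi, fi, qi in zip(w, f, f_hat))


def search_codebook(f, codebook, w):
    """Return (index, codevector) of the entry with the smallest weighted error."""
    best = min(range(len(codebook)),
               key=lambda k: weighted_mse(f, codebook[k], w))
    return best, codebook[best]


# Toy two-entry codebook; the values are illustrative only.
codebook = [[0.10, 0.80], [0.20, 0.50]]
f = [0.10, 0.50]

# A heavy weight on the first coefficient favours entry 0, which matches it exactly.
print(search_codebook(f, codebook, [10.0, 0.1])[0])
# A heavy weight on the second coefficient favours entry 1.
print(search_codebook(f, codebook, [0.1, 10.0])[0])
```

The same input vector selects different codevectors depending on the weights, which is how the weight vector steers quantization accuracy toward the coefficients that matter in the current working band.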

2. As is known, using prediction when coding the parameters of a speech signal (the excitation and LPC-filter parameters) increases coding efficiency by exploiting the correlation (redundancy) between current and past signals or parameters. Predictive coding makes use of this redundancy and thereby substantially improves the quality of the reconstructed speech signal compared with non-predictive coding at the same compression ratio (the same bit rate).

On the other hand, as noted above, predictive coding is highly sensitive to errors and information loss when the speech stream is transmitted over a communication channel, because of error propagation.

One known compromise is to alternate non-predictive and predictive coding in time. An error then cannot propagate further than the distance between neighboring frames with non-predictively coded parameters. However, such a compromise, which takes no account of the speech signal itself when choosing the coding type (predictive or non-predictive) for a particular segment of the coded signal, may not give the required quality. Moreover, even a single error degrades the quality of the corresponding portion of the signal through its propagation, although the effect is limited in time.

It is advisable to tie the choice of coding type to the nature of the coded signal through some stationarity criterion. The idea is to detect the initial portion of a quasi-stationary segment of the signal, within which its statistical characteristics are practically constant, and to code that portion non-predictively but sufficiently accurately, while the portions that follow it within the same quasi-stationary segment are coded predictively with respect to the initial, non-predictively coded portion.

A similar approach is known from video coding, where key frames are coded non-predictively and intermediate frames are coded predictively relative to the key frames. Key frames can be placed automatically, based on some criterion of how much the current frame differs from the last key frame.

It is proposed to apply a similar approach (in addition to improvement 1 described above) to coding the parameters of speech signals in a scalable speech codec, in order to reach a compromise between coding efficiency and robustness to losses (error propagation). To this end, the coding of the synthesis-filter parameters relies on some criterion of the difference between the current synthesis-filter parameters and the coded synthesis-filter parameters that were most recently coded non-predictively. If the difference criterion is not met, the current synthesis-filter parameters are quantized predictively relative to the most recent non-predictively coded parameters. If the difference criterion is met, the current parameters are quantized non-predictively.

With this approach, the corruption or loss of any predictively coded parameters does not cause error propagation, because each subsequent set of predictively coded parameters depends only weakly on the preceding predictively coded parameters. The loss of non-predictively coded parameters, however, distorts the whole portion that is coded predictively with respect to them. Nevertheless, first, the probability that an error hits non-predictively quantized parameters is substantially lower than for the intermediate, predictively quantized ones, and second, the non-predictively quantized parameters can be given additional protection (for example, an error-correcting code or duplication). The proposed method of coding the synthesis-filter parameters therefore provides higher robustness to losses, since it reduces the probability of error propagation.

At the same time, such a coding scheme remains highly efficient, because it exploits the strong correlation within quasi-stationary segments and does not use predictive coding where its efficiency is low.

Even more important in IP networks is robustness to the frame loss characteristic of such networks. Various mechanisms exist for this, based either on interpolating the lost information from the nearest frames (transmitted before and/or after the lost one) or on repeating it. Given the proposed approach of forming non-predictively coded key frames, it is advisable to use a repetition mechanism that preserves some minimum necessary information from the non-predictively coded frame, as follows: an important part of the group of base-level parameters of the non-predictively coded frame (primarily the bits most sensitive to errors and losses), or all such parameters, is repeated in the next predictively coded frame (or even in the next two or more). Then, if a non-predictively coded frame is lost, its base part can be recovered with some delay (in most cases one frame) from the information about it carried in the following predictively coded frame or frames. The base-level parameters repeated in the frame that follows a non-predictively coded frame may be quantized more coarsely, to reduce the controlled redundancy thus introduced.
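The repetition mechanism can be sketched as a simple packetization rule. This is an illustration under stated assumptions: the packet layout (dictionaries with `index`, `mode`, `base` and `repeat` fields) and the function names are hypothetical, the copy is carried only in the single next frame, and it is repeated verbatim rather than re-quantized more coarsely.

```python
def build_packets(frames):
    """Attach a copy of each key frame's base-level parameters to the
    predictively coded frame that directly follows it.

    `frames` is a list of dicts with keys "mode" ("non_predictive" or
    "predictive") and "base" (the base-level parameters of that frame).
    """
    packets = []
    previous = None
    for index, frame in enumerate(frames):
        packet = {"index": index, "mode": frame["mode"],
                  "base": frame["base"], "repeat": None}
        follows_key_frame = (previous is not None
                             and previous["mode"] == "non_predictive"
                             and frame["mode"] == "predictive")
        if follows_key_frame:
            # A real coder might store a coarser version here to save bits.
            packet["repeat"] = previous["base"]
        packets.append(packet)
        previous = frame
    return packets


def recover_key_frame_base(received, lost_index):
    """Recover the base part of a lost frame from the next received packet.

    `received` maps frame index to packet. Returns None when the next packet
    is missing or carries no repeated parameters.
    """
    following = received.get(lost_index + 1)
    if following is None:
        return None
    return following["repeat"]


frames = [
    {"mode": "non_predictive", "base": "K0"},
    {"mode": "predictive", "base": "P1"},
    {"mode": "predictive", "base": "P2"},
]
packets = build_packets(frames)
# Simulate the loss of the key frame (index 0).
received = {p["index"]: p for p in packets if p["index"] != 0}
print(recover_key_frame_base(received, 0))
```

Recovery becomes possible only once the following packet arrives, which corresponds to the one-frame delay mentioned above.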

An example of a synthesis-filter parameter quantizer is a switched (predictive/non-predictive) LSF vector quantizer built by one of the known methods. As the difference criterion that switches between the two quantizer types, a simple comparison of the weighted mean-square error between the current LSF coefficients and the previously non-predictively quantized LSF coefficients against a specified threshold can be used:

$$\text{if } \sum_{i=1}^{K} w_i \left(f_i - \tilde{f}_i\right)^2 > \text{Threshold} \text{ then non-predictive quantization, otherwise predictive quantization}$$

where $f_i$ are the current LSF coefficients, $\tilde{f}_i$ are the LSF coefficients most recently quantized non-predictively, and Threshold is the specified threshold.

The weight vector $w_i$ weights the LSF coefficients according to their perceptual significance to the listener.
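The switching rule can be sketched in a few lines. This is a minimal illustration, not the quantizer itself: the function names (`choose_quantizer`, `classify_frames`) are hypothetical, the first frame is assumed to be coded non-predictively, and the unquantized vector stands in for the stored key-frame vector, whereas a real coder would keep its quantized version.

```python
def choose_quantizer(lsf, key_lsf, weights, threshold):
    """Select the quantizer type for the current LSF vector.

    Returns "non_predictive" when the weighted squared difference from the
    last non-predictively coded vector exceeds the threshold (or when no
    such vector exists yet), otherwise "predictive".
    """
    if key_lsf is None:
        return "non_predictive"
    error = sum(w * (a - b) ** 2 for w, a, b in zip(weights, lsf, key_lsf))
    return "non_predictive" if error > threshold else "predictive"


def classify_frames(frames, weights, threshold):
    """Assign a quantizer type to each frame of LSF vectors, in order."""
    modes = []
    key_lsf = None
    for lsf in frames:
        mode = choose_quantizer(lsf, key_lsf, weights, threshold)
        if mode == "non_predictive":
            # Assumption: the unquantized vector stands in for the quantized one.
            key_lsf = lsf
        modes.append(mode)
    return modes


frames = [[0.10, 0.20], [0.11, 0.21], [0.50, 0.60], [0.50, 0.61]]
print(classify_frames(frames, weights=[1.0, 1.0], threshold=0.01))
```

Because every predictive frame refers to the last key vector rather than to its immediate predecessor, losing a predictive frame does not disturb the frames that follow it.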

3. By the reasoning above, the adaptive codebook is also a source of error propagation. It is therefore reasonable either not to use the adaptive codebook where its efficiency is low, thereby striking a compromise between coding efficiency and robustness to losses, or to use the mechanism of controlled redundancy described above to weaken the effect of frame losses.

In the first case, in addition to improvement 1, it is proposed not to use the contribution of the adaptive codebook to the current excitation signal when some criterion of the efficiency of this adaptive codebook is not met.

Such a criterion can be, for example, the mean-square error between the input and predicted speech signals, normalized by the energy of the input signal and compared with a specified threshold:

$$\text{ACB} = \begin{cases} \text{use}, & E \le \text{Threshold} \\ \text{not\_use}, & \text{otherwise} \end{cases}$$

where:

$$E = \frac{\sum_{n} \left(s(n) - \hat{s}(n)\right)^2}{\sum_{n} s^2(n)}$$

is the normalized error between the input signal $s(n)$ and the predicted signal $\hat{s}(n)$ over the current subframe.

That is, if the normalized error does not exceed the specified threshold, the adaptive codebook (ACB) is used to form the excitation of the current subframe; otherwise it is not used. The adaptive-codebook usage flag is transmitted over the communication line together with the other speech parameters.

In the second case, the ACB usage flag and the information about the ACB contribution are repeated in the next frame, which reduces the damage from losing the previous frame by localizing the error.
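The first variant, switching the adaptive codebook off when its prediction is poor, can be sketched as follows. The function names are hypothetical, the signals are plain lists of samples for one subframe, and the handling of a zero-energy subframe is an assumption added to keep the example well defined.

```python
def normalized_prediction_error(target, predicted):
    """Squared error between input and predicted samples, divided by the
    energy of the input samples."""
    energy = sum(x * x for x in target)
    if energy == 0.0:
        # Assumption: on a silent subframe the prediction is treated as useless.
        return float("inf")
    error = sum((x - y) ** 2 for x, y in zip(target, predicted))
    return error / energy


def use_adaptive_codebook(target, predicted, threshold):
    """Return the usage flag: True when the normalized error does not exceed
    the threshold, False otherwise."""
    return normalized_prediction_error(target, predicted) <= threshold


target = [1.0, -1.0, 1.0, -1.0]
good_prediction = [0.9, -0.9, 0.9, -0.9]
no_prediction = [0.0, 0.0, 0.0, 0.0]
print(use_adaptive_codebook(target, good_prediction, threshold=0.5))
print(use_adaptive_codebook(target, no_prediction, threshold=0.5))
```

The returned flag is the one-bit side information that would accompany the other speech parameters, and in the second variant it would additionally be repeated in the next frame.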

4. As noted above, the efficiency of the adaptive codebook remains low at the onset of voiced stationary segments, especially when the number of excitation pulses generated by the fixed codebook is small. Only after several analysis intervals does the adaptive codebook “saturate” the excitation signal. As a result, speech quality suffers badly over a considerable part of these stationary segments. The problem is especially acute when coding speech at the base level, where the minimum bit rate can be reached only with a very small number of excitation pulses generated by the fixed codebook. The usual approach, in which the number of fixed-codebook excitation pulses does not depend on the input signal, forces one to accept either insufficient speech quality at the onset of voiced stationary segments, if few pulses are used, or an inflated bit rate, if more pulses are used.

As noted earlier, packet-switched networks (for example, an IP network) do not require a constant bit rate. It is reasonable to take advantage of this and control the number of excitation pulses in the codec according to the information content (stationarity) of the input signal.

It is proposed, in addition to improvement 1, to tie the number of excitation pulses generated by the fixed codebook to some criterion of the onset of a voiced stationary segment.

As this onset criterion, one can use, for example, a comparison of the prediction gain computed by formula (29) with a specified threshold:

$$\text{Vocal\_Stationary\_Section\_begins if } G > \text{Threshold}, \text{ otherwise not}$$

where $G$ is the prediction gain, Vocal_Stationary_Section_begins means “a voiced stationary segment of speech has begun”, and “otherwise not” means that no such segment has begun.

Analysis of a very large body of speech material shows that prediction accuracy rises sharply at the very onset of voiced stationary segments, when a new stationary speech wave is only beginning to develop. Increasing the number of excitation pulses generated by the fixed codebook makes it possible to form this developing speech wave more accurately in the given subframe (or frame). The excitation signal in subsequent subframes (or frames) is then also formed with higher accuracy, but now by means of the adaptive codebook, with a minimal contribution from the fixed codebook. Thus a brief “injection” of additional pulses at the onset of voiced stationary segments raises the average bit rate of the coded speech only slightly, yet substantially improves the quality of the synthesized speech, because the adaptive-codebook predictor becomes highly efficient from the very start of the voiced stationary segment.
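The pulse “injection” can be sketched as a per-subframe allocation rule. Several points here are assumptions rather than details from the text: the onset is taken to be the first subframe whose prediction gain exceeds the threshold after a subframe where it did not, the extra pulses are granted for that single subframe only, and the function name and pulse counts are hypothetical.

```python
def allocate_pulses(gains, threshold, base_pulses, extra_pulses):
    """Return the number of fixed-codebook pulses for each subframe.

    `gains` holds one prediction-gain value per subframe. A subframe is
    treated as the onset of a voiced stationary segment when its gain
    exceeds the threshold and the previous subframe's gain did not.
    """
    counts = []
    previous_above = False
    for gain in gains:
        above = gain > threshold
        onset = above and not previous_above
        counts.append(base_pulses + extra_pulses if onset else base_pulses)
        previous_above = above
    return counts


gains = [0.1, 0.2, 0.9, 0.95, 0.9, 0.1, 0.8]
print(allocate_pulses(gains, threshold=0.5, base_pulses=2, extra_pulses=6))
```

Only the onset subframes carry the larger pulse count, so the average bit rate rises little while the adaptive codebook is given an accurate excitation history to predict from.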

Brief Description of the Drawings

Figure 1 is a simplified block diagram of the MPEG-4 CELP speech encoder operating in bit-rate and bandwidth scalability mode.

Figure 2 is a simplified block diagram of the multipulse-excitation CELP speech encoder (MP-CELP), which is the base-level encoder in MPEG-4 CELP.

Figure 3 is a simplified block diagram of the Bit Rate Scalable tool of the MPEG-4 CELP speech encoder.

Figure 4 is a simplified block diagram of the Bandwidth Extension Tool of the MPEG-4 CELP speech encoder.

Figure 5 is a simplified block diagram of the LPC quantizer (LSF coefficient quantizer) of the Bandwidth Extension Tool of the MPEG-4 CELP speech encoder.

Figure 6 is a simplified block diagram of a scalable speech encoder using the claimed coding method.

Figure 7 is a simplified block diagram of the base-level MP-CELP encoder using the claimed coding method.

Figure 8 is a simplified block diagram of the LPC quantizer of the base-level MP-CELP encoder using the claimed coding method.

Figure 9 is a simplified block diagram of an enhancement-level MP-CELP encoder using the claimed coding method.

Figure 10 is a simplified block diagram of the LPC quantizer of an enhancement-level MP-CELP encoder using the claimed coding method.

Implementation of the Invention

A block diagram of a scalable speech encoder that uses the coding method of the present invention is shown in Fig. 6.

As Fig. 6 shows, the scalable encoder comprises a base-level MP-CELP encoder (block 601), which receives the digital speech signal at the maximum sampling rate (for example, 32 kHz), and a set of N enhancement-level MP-CELP encoders (blocks 602, 603 and 604). An important feature of this scheme is the absence of any sampling-rate converters for the input speech signal; that is, the encoders of all levels operate on signals at the input (maximum) sampling rate.

Nevertheless, to keep the bit rate low, the base-level encoder encodes the speech signal using a very small number of excitation pulses and only in a narrow (base) frequency band, for example 0.04 to 3.4 kHz. The base-level encoder (block 601) produces two kinds of output. The first is the set of coded base-level parameters, which are sent to the channel through block 605. Block 605 multiplexes or packetizes the bit stream and applies error-correcting coding to all or some of the coded speech parameters in order to combat errors in the digital communication channel. The second is a set of parameters passed to the encoder of the next enhancement level (block 602): the quantized and unquantized LSF coefficients, information about the pitch and the positions of the excitation pulses, and so on, together with the residual signal. This residual signal is obtained as the difference between the weighted input speech signal and the weighted synthesized base-level signal, minus the “ringing” of the weighted synthesis filter, and it serves as the input target vector for the encoder of the next enhancement level. Each subsequent enhancement level (blocks 603 and 604) encodes the residual signal arriving from the previous enhancement level, which is supplied together with the information about the pitch and the excitation pulse positions and with the quantized and unquantized LSF coefficients. The outputs of the series-connected blocks 601, 602, 603 and 604 are fed to the corresponding inputs of block 605. The speech quality of the sum of the signals synthesized at all preceding coding levels increases smoothly from level to level, owing both to the additional pulses in the excitation signal and to the widening of the frequency band. The total synthesized signal of all levels is therefore formed with the maximum number of excitation pulses and has the maximum bandwidth. The speech quality of this signal is close to that of the original speech signal.
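The level-by-level refinement described above can be illustrated with a minimal Python sketch. It is not the MP-CELP algorithm: the synthesis filter is assumed to be the identity, each level simply keeps the largest-magnitude samples of its target as “pulses”, and the names `pick_pulses`, `encode_levels` and `decode_levels` are hypothetical. The sketch only shows how each level encodes the residual left by the previous one and how decoding more levels lowers the reconstruction error.

```python
def pick_pulses(target, num_pulses):
    """Toy stand-in for a pulse search: keep the largest-magnitude samples."""
    order = sorted(range(len(target)), key=lambda i: abs(target[i]), reverse=True)
    return {i: target[i] for i in order[:num_pulses]}


def encode_levels(frame, pulses_per_level):
    """Encode a frame level by level; each level codes the previous residual."""
    residual = list(frame)
    levels = []
    for num_pulses in pulses_per_level:
        pulses = pick_pulses(residual, num_pulses)
        levels.append(pulses)
        for pos, amp in pulses.items():
            residual[pos] -= amp  # what is left becomes the next level's target
    return levels, residual


def decode_levels(levels, frame_len):
    """Sum the contributions of however many levels were received."""
    out = [0.0] * frame_len
    for pulses in levels:
        for pos, amp in pulses.items():
            out[pos] += amp
    return out


def energy(x):
    return sum(v * v for v in x)


# Hypothetical 8-sample frame; base level gets 2 pulses, then 2 and 4 more.
frame = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.2, 0.6]
levels, final_residual = encode_levels(frame, [2, 2, 4])

# Reconstruction error when only the first k levels are decoded.
errors = []
for k in range(1, len(levels) + 1):
    decoded = decode_levels(levels[:k], len(frame))
    errors.append(energy([a - b for a, b in zip(frame, decoded)]))
```

In this toy run the base level alone places pulses at positions 0 and 3, and the error energy drops with every additional level that the decoder receives.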

A block diagram of the base MP-CELP encoder using the coding method of the present application is shown in Fig. 7.

As Fig. 7 shows, this block diagram resembles that of the MPEG-4 CELP encoder shown in Fig. 2. All the main coding computations are therefore performed in the same way, in accordance with expressions (1)-(19). At the same time, the base encoder using the claimed method has the following essential differences.

First, the sampling rate of the input signal corresponds to the maximum bandwidth chosen for all coding levels, for example 32 kHz. Accordingly, the order of the LPC filter is also chosen for the maximum bandwidth, for example 40.

In addition, the block diagram contains an extra block 711, the Adaptive Codebook's Effectiveness Analyzer, which excludes the adaptive codebook contribution from the excitation signal whenever the effectiveness of the adaptive codebook is low. The analyzer can operate, for example, in accordance with expression (28).
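The following Python sketch shows one way such a gate could behave. Expression (28) is not reproduced in this section, so the criterion used here, a minimum prediction gain in dB, is an assumption chosen purely for illustration, as are the function names `adaptive_codebook_is_effective` and `build_excitation` and the 1 dB threshold.

```python
import math


def adaptive_codebook_is_effective(target, prediction, min_gain_db=1.0):
    """Assumed criterion: the optimally scaled adaptive-codebook prediction
    must reduce the target energy by at least min_gain_db decibels."""
    target_energy = sum(t * t for t in target)
    pred_energy = sum(p * p for p in prediction)
    if target_energy == 0.0 or pred_energy == 0.0:
        return False
    # Least-squares gain for the prediction vector.
    gain = sum(t * p for t, p in zip(target, prediction)) / pred_energy
    error_energy = sum((t - gain * p) ** 2 for t, p in zip(target, prediction))
    if error_energy == 0.0:
        return True
    return 10.0 * math.log10(target_energy / error_energy) >= min_gain_db


def build_excitation(fixed_part, adaptive_part, use_adaptive):
    """Models the switch: drop the adaptive contribution when it is not useful."""
    if not use_adaptive:
        return list(fixed_part)
    return [f + a for f, a in zip(fixed_part, adaptive_part)]
```

With a prediction that closely follows the target, for example `[0.9, -1.1, 1.0, -0.8]` against `[1.0, -1.0, 1.0, -1.0]`, the gate stays open; with a prediction orthogonal to the target it closes and only the fixed-codebook part is used.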

The block diagram also contains an extra block 712, the Analyzer of the Stationary Voiced Section Beginning, which controls the contribution of the additional fixed codebook to the excitation signal at the beginning of voiced stationary segments. This analyzer can be implemented, for example, in accordance with expression (30). Blocks 711 and 712 receive the input speech at the maximum sampling rate. The output of block 711 controls switch 706, and the output of block 712 controls switch 705. The additional fixed codebook (block 704) receives the same optimal-codeword search signal as blocks 701 and 702. When the beginning of a voiced stationary speech segment is detected, it supplies the additional “injection” of excitation pulses through switch 705.

The remaining differences lie in the implementation of individual modules of the base encoder, first of all the LPC quantizer. A block diagram of the quantizer is shown in Fig. 8.

The quantizer, whose input receives the LSP coefficients through switch 81.1 (block 801), consists of two parts: non-predictive vector quantizer 1 (block 802) and predictive vector quantizer 2 (block 803) with an inter-frame predictor (block 807), whose output is fed to subtraction block 804 and summation block 810. The inputs and outputs of these quantizers are switched by conditional switches 81.1 (block 801) and 81.2 (block 808), respectively, which are controlled by the Distinction Criterion signal from the output of block 806. This signal is generated by the “difference analyzer” (block 806) as the result of comparing the current input LSF coefficients with the most recent “key LSF coefficients” stored in the memory of block 809 (Key LSF's memory), for example in accordance with expression (27). Depending on this parameter, and hence on the positions of switches 81.1 and 81.2, the current LSF coefficients are quantized either non-predictively, and then stored in memory as the new “key LSFs”, or predictively with respect to the last “key LSFs” stored in the memory of block 809, whose output is connected to the inputs of blocks 806 and 807. The output of the quantizer is the quantized LSF coefficients, which, depending on the position of switch 81.2, come either from the output of block 802 or from block 803 through adder 810.
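The switching logic can be sketched in Python as follows. Expression (27) is not reproduced in this section, so the mean squared difference used as the Distinction Criterion is an assumption, and uniform scalar quantization stands in for the two vector quantizers. The class name `SwitchedLsfQuantizer`, the step sizes and the threshold are all hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantizer used as a stand-in for a vector quantizer."""
    return [round(v / step) * step for v in values]


class SwitchedLsfQuantizer:
    def __init__(self, threshold, key_step=0.02, diff_step=0.005):
        self.threshold = threshold
        self.key_step = key_step      # coarse step, non-predictive branch
        self.diff_step = diff_step    # fine step, predictive branch
        self.key_lsf = None           # plays the role of the key LSF memory

    def _distinction(self, lsf):
        # Assumed criterion: mean squared distance to the stored key LSFs.
        return sum((x - k) ** 2 for x, k in zip(lsf, self.key_lsf)) / len(lsf)

    def encode(self, lsf):
        if self.key_lsf is None or self._distinction(lsf) > self.threshold:
            # Non-predictive branch: the result becomes the new key vector.
            quantized = quantize(lsf, self.key_step)
            self.key_lsf = quantized
            return "key", quantized
        # Predictive branch: code only the difference from the key vector.
        diff = [x - k for x, k in zip(lsf, self.key_lsf)]
        q_diff = quantize(diff, self.diff_step)
        return "pred", [k + d for k, d in zip(self.key_lsf, q_diff)]


# Hypothetical normalized LSF vectors for three consecutive frames.
frames = [[0.10, 0.26, 0.40], [0.11, 0.27, 0.41], [0.30, 0.50, 0.70]]
quantizer = SwitchedLsfQuantizer(threshold=0.001)
results = [quantizer.encode(f) for f in frames]
modes = [mode for mode, _ in results]
```

In this sketch a predictively coded frame refers only to the last key vector, not to the immediately preceding frame, so losing one predictively coded frame does not affect the decoding of later frames.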

The main feature of the quantizer is the “weight forming” block (block 805). Using the current input LSF coefficients and the working bandwidth (Base Bandwidth) adopted at the base coding level (for example, 0.04 to 3.4 kHz), this block adaptively forms, for each current vector of LSF coefficients, a weighting vector that is used in both vector quantizers 1 and 2. This weighting vector can be formed in accordance with expression (23). Thus, although the filter order is the maximum one, the speech signal is coded efficiently only within the working band adopted for the base level. The signal synthesized outside the working band is filtered out in the base-level decoder.
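The effect of band-dependent weights on a codebook search can be shown with a small Python sketch. Expression (23) is not reproduced in this section, so the weighting rule below (weight 1 inside the working band, a small constant outside it) is an assumption made only for illustration. The function names, the LSF values in Hz and the two-entry codebook are hypothetical.

```python
def form_weights(lsf_hz, band_hz, out_of_band_weight=0.01):
    """Assumed rule: full weight inside the working band, small weight outside."""
    low, high = band_hz
    return [1.0 if low <= f <= high else out_of_band_weight for f in lsf_hz]


def weighted_search(lsf_hz, codebook, weights):
    """Return the index of the codebook entry with the smallest weighted error."""
    def distance(entry):
        return sum(w * (x - c) ** 2 for w, x, c in zip(weights, lsf_hz, entry))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


lsf = [500.0, 1500.0, 3000.0, 6000.0, 12000.0]
codebook = [
    [500.0, 1500.0, 3000.0, 8000.0, 14000.0],  # exact in band, poor above it
    [700.0, 1700.0, 3200.0, 6000.0, 12000.0],  # poor in band, exact above it
]

base_weights = form_weights(lsf, (40.0, 3400.0))
in_band_choice = weighted_search(lsf, codebook, base_weights)
flat_choice = weighted_search(lsf, codebook, [1.0] * len(lsf))
```

With the base-band weights the search selects entry 0, which matches the in-band LSFs exactly, whereas uniform weights select entry 1. This mirrors the statement that a full-order filter is coded accurately only within the working band of the level.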

A block diagram of the enhancement-level MP-CELP encoder is shown in Fig. 9.

The block diagram of the enhancement-level encoder is clearly very similar to that of the bit rate scalable tool of the MPEG-4 CELP encoder shown in Fig. 3. The excitation signal of the synthesis filter can be formed by the “classical” pulse position search used in the MPE method, for example in accordance with expressions (17)-(19). At the same time, there are the following essential differences.

First, the sampling rate of the residual signal arriving at the encoder input from the previous level is always equal to the maximum rate (for example, 32 kHz), regardless of the bandwidth adopted as the “working” band at the given coding level. Accordingly, the synthesis filter also always keeps the maximum order (for example, 40).

Second, the block diagram contains an LPC quantizer whose input receives the unquantized LSF coefficients and the LSF coefficients quantized in the encoder of the previous level. The quantizer input also receives the number of the given enhancement level, which indirectly sets the working bandwidth for that coding level.

A block diagram of this LPC quantizer is shown in Fig. 10.

The diagram shows that the vector quantizer (block 1003) quantizes the error (difference), formed by subtraction block 1001, between the input LSF coefficients and the LSF coefficients quantized at the previous coding levels. A distinctive feature of the quantizer is the Weights Forming block 1004, which forms a vector of weighting coefficients from the current values of the input LSF coefficients and the bandwidth adopted as the working band for the given coding level. The weight vector can be formed, for example, in accordance with expression (23). The output of the LPC quantizer is the quantized LSF coefficients of the current level, obtained at the output of summation block 1002, whose two inputs receive, respectively, the input signal of the LSF quantizer (that is, the quantized LSF coefficients of the previous level) and the output signal of block 1003. Although Fig. 10 shows a simple vector quantizer, the use of switched predictive/non-predictive quantization at an enhancement level, as at the base level, is also not ruled out.
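A minimal Python sketch of this refinement step follows. It replaces the weighted vector quantizer with a hard band gate plus a uniform scalar quantizer, which is a deliberate simplification and not the method of expression (23). The table `LEVEL_UPPER_HZ`, which maps a level number to an upper band edge, the step size and the function name `refine_lsf` are hypothetical.

```python
# Hypothetical mapping from coding level number to the upper band edge in Hz.
LEVEL_UPPER_HZ = {0: 3400.0, 1: 7000.0, 2: 16000.0}


def refine_lsf(lsf_hz, prev_quantized_hz, level, step_hz=10.0):
    """Quantize the difference to the previous level's LSFs and add it back.

    Simplification: components above the level's band edge are left as the
    previous level quantized them, instead of being given a small weight.
    """
    upper = LEVEL_UPPER_HZ[level]
    refined = []
    for x, p in zip(lsf_hz, prev_quantized_hz):
        if x <= upper:
            q_diff = round((x - p) / step_hz) * step_hz  # quantized difference
            refined.append(p + q_diff)                   # summation step
        else:
            refined.append(p)
    return refined


lsf = [1234.0, 5678.0, 12346.0]
previous = [1200.0, 5000.0, 10000.0]
level1 = refine_lsf(lsf, previous, level=1)
level2 = refine_lsf(lsf, level1, level=2)
```

Here level 1 refines the two coefficients below its assumed 7 kHz band edge, and level 2, starting from the level 1 result, refines the remaining one.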

Claims

1. A method of multilevel speech coding, scalable in frequency band and bit rate and based on "analysis by synthesis", in which the parameters of a multipulse excitation source and of a synthesis filter are encoded for each speech frame or subframe, the method comprising encoding a base level and encoding one or more enhancement levels, characterized in that, in order to improve speech quality and provide tolerance to the loss of speech frames transmitted in a packet-switched network, the parameters of said synthesis filter are determined only once, at the analysis step when encoding the base level, and these parameters are then encoded so that only the base part of the speech signal can be reconstructed with a given quality in a defined limited part of the frequency band of the signal, and at at least one subsequent enhancement level said part of the frequency band is expanded.

2. The method according to claim 1, in which, when encoding the next speech frame or subframe, the current parameters of said synthesis filter are encoded by predictive coding relative to those past parameters of the synthesis filter that were most recently encoded without prediction in one of the past frames or subframes, if a specified criterion of difference between said current parameters and said past parameters of the synthesis filter is not met, and
otherwise, if said criterion of difference is met, said current parameters of the synthesis filter are encoded without prediction.

3. The method according to claim 1, in which the excitation signal of the synthesis filter is generated using at least one fixed codebook and an adaptive codebook if a specified criterion of adaptive codebook effectiveness is met, and
otherwise, if said effectiveness criterion is not met, the excitation signal of said synthesis filter is generated without using said adaptive codebook, and the information about the use of the adaptive codebook is repeated in the following frame or frames.

4. The method according to claim 3, in which the adaptive codebook memory is reset to zero if said adaptive codebook effectiveness criterion is not met.

5. The method according to claim 1, in which one or more additional excitation pulses are added to the excitation signal of the synthesis filter if a specified criterion characterizing the beginning of a voiced stationary segment of the speech signal is met.

6. The method according to claim 1, 2, 3, 4 or 5, in which some or all of the parameters produced at one or more coding levels are additionally protected by an error-correcting code, and/or some or all of the base-level parameters are repeated in the following frame or frames, to increase tolerance to errors and frame loss.

7. The method according to claim 6, in which error-correcting coding, or repetition at the base level in the following frame or frames, is applied only to those parameters that were encoded without prediction.

8. A device implementing the speech coding method according to claim 1, 2, 3, 4, 5 or 7 and comprising, connected in series, a base encoder, whose input receives a speech signal at the maximum sampling rate and whose output is connected to the first input of a unit for multiplexing or packetization and channel coding of the non-predictively coded parameters of the speech signal, the output of said unit being a bit stream or a packet stream, and a series of extension encoders 1, 2, ..., N, whose outputs are connected to the second, third, ..., (N+1)-th inputs of said unit for multiplexing or packetization and channel coding, respectively, wherein the four information inputs of each extension encoder from the first to the N-th receive information about the excitation pulses, information about the pitch, the quantized and unquantized LSF coefficients, and the residual signal from the base encoder or, respectively, from the extension encoder of the lower level.

9. A device implementing the method according to claim 6 and comprising, connected in series, a base encoder, whose input receives a speech signal at the maximum sampling rate and whose output is connected to the first input of a unit for multiplexing or packetization with repetition of all or some of the base-level parameters, or with channel coding of all or some of the speech signal parameters of the base level and of all or some of the extension levels, the output of said unit being a bit stream or a packet stream, and a series of extension encoders 1, 2, ..., N, whose outputs are connected to the second, third, ..., (N+1)-th inputs of said unit for multiplexing or packetization and channel coding, respectively, wherein the four information inputs of each extension encoder from the first to the N-th receive information about the excitation pulses, information about the pitch, the quantized and unquantized LSF coefficients, and the residual signal from the base encoder or from the extension encoder of the lower level.