RU2248619C2

RU2248619C2 - Method and device for converting speech signal by method of linear prediction with adaptive distribution of information resources

Info

Publication number: RU2248619C2
Application number: RU2003104222/09A
Authority: RU
Inventors: А.А. Рыболовлев (RU); Г.В. Богачев (RU); В.Г. Трубицын (RU); И.А. Азаров (RU)
Original assignee: Рыболовлев Александр Аркадьевич; Богачев Геннадий Васильевич; Трубицын Владимир Геннадьевич; Азаров Игорь Анатольевич
Priority date: 2003-02-12
Filing date: 2003-02-12
Publication date: 2005-03-20

Abstract

FIELD: electric communications.

SUBSTANCE: the method applies acoustic-phonetic classification of the processed speech signal frames into four non-overlapping classes: absence of speech, voiced speech, unvoiced speech, and transition to voiced speech. The classification is performed concurrently with identification of the short-term linear prediction filter, and its result is used to distribute information resources adaptively among the encoded parameters. The classification decision is included in the code combination transmitted over the communication channel and selects the operating mode of the vector quantizer and dequantizer, which are trained separately for each class of speech frames.

EFFECT: higher quality of synthesized speech without an increase in coding rate.

2 cl, 8 dwg

Description

Technical field of the invention.

The invention relates to telecommunication systems and is intended for encoding and decoding a speech signal by the linear prediction method with adaptive distribution of the codec's information resources (the number of bits allocated to encoding the current speech frame) among the encoded parameters.

Background art.

The linear prediction method belongs to the class of speech signal conversion methods that model a discrete speech signal as the response of a linear discrete system with time-varying parameters (the vocal tract) to a corresponding excitation signal (generating signal). Letting the system state vary is intended to improve the efficiency of speech transmission by exploiting the non-stationary properties of speech to the extent available. The time interval over which the parameters of the discrete system are held constant determines the duration of the processed speech frame; it is chosen within the quasi-stationarity interval of the speech signal (up to 30 ms) and is usually fixed. The analyzer of the speech conversion device extracts from the speech frame the state parameters of the linear system and of the excitation signal. These parameters serve as the coordinates of the information exchange vector between the encoder and the decoder and allow the synthesizer to reconstruct the original signal with the required fidelity.
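A minimal sketch of the source-filter model described above may help. It assumes the common sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p; the function names and the toy coefficient are illustrative and do not come from the patent. `synthesize` passes an excitation through the all-pole filter, and `residual` applies the inverse filter to recover that excitation.

```python
def synthesize(excitation, a):
    """All-pole synthesis filter: s[n] = e[n] - sum_k a[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def residual(signal, a):
    """Inverse (analysis) filter: e[n] = s[n] + sum_k a[k-1] * s[n-k]."""
    res = []
    for n, s in enumerate(signal):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * signal[n - k]
        res.append(acc)
    return res


# Toy first-order example; practical coders use a higher filter order.
coeffs = [-0.9]
excitation = [1.0, 0.0, 0.0, 0.0]
frame = synthesize(excitation, coeffs)
recovered = residual(frame, coeffs)
```

Because the analysis filter is the exact inverse of the synthesis filter, `recovered` matches `excitation` up to rounding, which is why transmitting the filter parameters plus a description of the excitation is enough for the decoder to rebuild the frame.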

The many possible ways of defining, combining, and representing the linear prediction parameters and the excitation signal are the main reason for the variety of methods and devices for encoding and decoding speech on the basis of linear prediction, which currently dominates speech conversion devices at coding rates of no more than 16 kbit/s. An analogue of the invention is the method of converting a speech signal by code-excited linear prediction [Korotaev A.G. An effective algorithm for encoding a speech signal at a rate of 4.8 kbit/s and lower // Foreign Electronics, 1996, No. 3, pp. 52-68; Hayashi S., Kataoka A., Moriya T. 8 kbit/s short and medium delay speech codecs based on CELP coding // ETT, Vol. 5, No. 5, September-October 1994, pp. 49-56]. It consists in identifying the short-term linear prediction synthesis filter and then selecting, from fixed codebooks, the vectors of the stochastic and quasiperiodic components of the excitation signal and their scaling coefficients so that the synthesized speech frame is as close as possible to the processed frame in the chosen metric. The best excitation signal is selected by analysis by synthesis. Information about the parameters of the synthesis filter and the excitation signal is transmitted over the communication channel as a binary code combination. Decoding consists in forming a copy of the digital speech frame with the decoder's synthesis filter, whose parameters and excitation signal are determined by the code combination received from the communication channel. The disadvantage of this method is the relatively low quality of the synthesized speech, caused, among other reasons, by the low degree to which the statistical characteristics of the encoded parameters are taken into account. A device implementing this method is known [Korotaev A.G. An effective algorithm for encoding a speech signal at a rate of 4.8 kbit/s and lower // Foreign Electronics, 1996, No. 3, pp. 52-68].

The prototype of the invention is the method of converting a speech signal by linear prediction with algebraic-code excitation and a conjugate-structure quantizer of the scaling coefficients of the stochastic and quasiperiodic components of the excitation signal (CS-ACELP) [Kataoka A., Hayashi S., Moriya T., Kurihara S., Mano K. Basic algorithm of conjugate-structure algebraic CELP (CS-ACELP) speech coder // NTT Review, Vol. 8, No. 4, July 1996, pp. 24-29; Kataoka A., Hayashi S., Moriya T., Ikedo J. LSP and gain quantization for CS-ACELP speech coder // NTT Review, Vol. 8, No. 4, July 1996, pp. 30-35; Kitawaki N. An 8-kbit/s speech coding method (CS-ACELP) standardized by ITU // NTT Review, Vol. 8, No. 4, July 1996, pp. 16-23]. In this method, analysis by synthesis determines the set of quantized linear prediction parameters of a speech frame that yields a synthesized frame differing minimally from the original. The adjustable model is a tenth-order digital all-pole filter whose coefficients, obtained by the identification procedure, are converted into a vector of line spectral frequencies, which is vector-quantized directly. The quantized vector of line spectral frequencies shapes the frequency response of the short-term linear prediction synthesis filter used in the analysis-by-synthesis procedure.

The excitation signal supplied to this filter is not determined directly. It is represented as a linear combination of scaled stochastic and quasiperiodic components of algebraic type and is formed by searching through the possible combinations of code vectors contained in the codebook of the stochastic component of the excitation signal, the codebook of the quasiperiodic component of the excitation signal, and the conjugate-structure codebook of scaling coefficient vectors. The combination of code vectors that forms the best realization of the excitation signal is selected by the minimum of the weighted mean square error between the original and synthesized speech frames.

The processed speech frame is 10 ms long; the vector of line spectral frequencies is determined once per frame, and the excitation signal vector twice (once per 5 ms subframe). Encoding produces a code combination of a binary multiplicative code whose elements carry information about the quantized vector of line spectral frequencies and, for each of the two subframes, the selected code vectors of the quasiperiodic component of the excitation signal, the selected code vectors of the stochastic component of the excitation signal, and the scaling coefficients. The resulting code combination has a fixed structure (a constant number of bits is allocated to each information parameter), represents the processed speech frame, and passes through the communication channel to the decoder (undistorted in the case of an ideal channel).
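The fixed structure can be illustrated with a short sketch. The 10 ms frame, together with the 8 kbit/s rate named in the cited CS-ACELP references, implies 80 bits per frame. The field widths below follow the bit allocation published in ITU-T G.729 and are not stated in this patent; the grouping of fields, the field names, and the `pack`/`unpack` helpers are hypothetical.

```python
FRAME_MS = 10
RATE_BPS = 8000
BITS_PER_FRAME = RATE_BPS * FRAME_MS // 1000  # 80 bits per 10 ms frame

# Assumption: widths taken from the ITU-T G.729 allocation, grouped per parameter.
FIXED_LAYOUT = [
    ("lsf", 18),           # quantized line spectral frequencies, once per frame
    ("periodic_sf1", 8),   # quasiperiodic (pitch) index, subframe 1
    ("parity", 1),         # parity bit protecting the subframe 1 pitch index
    ("stochastic_sf1", 17),
    ("gains_sf1", 7),
    ("periodic_sf2", 5),   # quasiperiodic (pitch) index, subframe 2
    ("stochastic_sf2", 17),
    ("gains_sf2", 7),
]


def pack(values, layout):
    """Concatenate fixed-width binary fields into one code combination."""
    bits = ""
    for name, width in layout:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    return bits


def unpack(bits, layout):
    """Split a code combination back into its fields."""
    values, pos = {}, 0
    for name, width in layout:
        values[name] = int(bits[pos:pos + width], 2)
        pos += width
    return values


example = {name: (1 << width) - 1 for name, width in FIXED_LAYOUT}
codeword = pack(example, FIXED_LAYOUT)
```

Every frame uses the same layout regardless of its content, which is the property the text identifies as a fixed distribution of information resources.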

Decoding consists in forming the quantized vector of line spectral frequencies (once per frame) and the quantized excitation signal vector (twice per frame) from the information received from the communication channel, followed by synthesis of the digital speech frame by an all-pole filter analogous to the one used in the analysis-by-synthesis procedure.

The disadvantage of this method is that it takes the characteristics of the current speech frame into account only to a limited degree: the codec's information resources (the number of bits allocated to encoding a speech frame) are distributed among the encoded parameters in a fixed way, and the set of parameters itself is invariant. The parametric level of adaptation of the coding procedure to speech characteristics used in this method limits how far the contradiction between the non-stationary nature of the speech signal and the locally stationary speech production model can be resolved.

The prototype device is the device for converting a speech signal by linear prediction with algebraic-code excitation and a conjugate-structure quantizer of the scaling coefficients of the excitation signal components (CS-ACELP) [Kataoka A., Hayashi S., Moriya T., Kurihara S., Mano K. Basic algorithm of conjugate-structure algebraic CELP (CS-ACELP) speech coder // NTT Review, Vol. 8, No. 4, July 1996, pp. 24-29; Kataoka A., Hayashi S., Moriya T., Ikedo J. LSP and gain quantization for CS-ACELP speech coder // NTT Review, Vol. 8, No. 4, July 1996, pp. 30-35; Kitawaki N. An 8-kbit/s speech coding method (CS-ACELP) standardized by ITU // NTT Review, Vol. 8, No. 4, July 1996, pp. 16-23], shown in Fig. 1, which implements the method chosen as the prototype. The device consists (Fig. 1) of a transmitting part (encoder) and a receiving part (decoder). The prototype encoder contains a short-term linear prediction filter identifier (IFKLP) 1, a fixed vector quantizer of speech signal parameters (FVK) 2, and a code combination forming device (UFKK) 3, whose output is connected to the decoder through the communication channel. The prototype decoder contains a code combination separation device (URKK) 4, a fixed vector dequantizer of speech signal parameters (FVDK) 5, and a short-term linear prediction synthesis filter (FSKLP) 6.

The block diagram of the prototype encoder is shown in Fig. 2. The processed speech frame arrives at IFKLP 1, at whose output the vector of line spectral frequencies is formed and passed to the vector quantizer of line spectral frequencies (VKLSCh) 7. Quantization yields the quantized vector of line spectral frequencies, which shapes the frequency response of the short-term linear prediction synthesis filter (FSKLP) 11, identical to block 6. The realizations of the quantized excitation signal, one for each subframe of the speech signal, are fed to the second input of FSKLP 11. They are formed by a search procedure at the output of the adder connected to the output of the conjugate-structure codebook of scaling coefficient vectors of the stochastic and quasiperiodic components of the excitation signal (KK 3) 10, and are linear combinations of two scaled code vectors, defined by:

- the code vector of the codebook of the stochastic component of the excitation signal (KK 1) 8, which represents the residual of short-term and long-term linear prediction of the speech subframe and has unit variance;

- the scaling coefficient of that code vector;

- the code vector of the codebook of the quasiperiodic component of the excitation signal (KK 2) 9, which represents the quasiperiodic component of the short-term linear prediction residual of the speech subframe and has unit variance;

- the scaling coefficient of that code vector.
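The linear combination described above can be written compactly. The symbols are assumed names chosen for illustration, not the patent's own notation: $\hat{e}_i$ is the quantized excitation vector of subframe $i$, $c_i$ and $p_i$ are the code vectors taken from KK 1 and KK 2, and $g_{c,i}$, $g_{p,i}$ are their scaling coefficients taken from KK 3.

```latex
\hat{e}_i = g_{c,i}\, c_i + g_{p,i}\, p_i, \qquad i = 1, 2
```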

To select the best realization of the excitation signal, the prototype encoder includes an analysis-by-synthesis system consisting of FSKLP 11, adder 12, a perceptual weighting filter (VFV) 13, and a minimum distortion detector (OMI) 14. These blocks, together with blocks 7, 8, 9, and 10, are the constituent elements of FVK 2. At the output of adder 12 the difference vector between the original and synthesized speech subframes is formed (for the first or the second subframe). It undergoes frequency weighting in VFV 13, which yields the weighted difference vector, after which OMI 14 computes the weighted mean square error (WMSE) between the original speech subframe and the synthesized subframes obtained from each realization of the excitation signal. By the minimum-WMSE criterion, a command to select the best code vectors (KVLKV) is formed and sent to blocks 8, 9, and 10. Information about the selected codebook vectors on both subframes, together with information about the quantized vector of line spectral frequencies, arrives at UFKK 3, from whose output the code combination enters the communication channel.
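The selection loop described above can be sketched as a joint exhaustive search. This is a simplified illustration under stated assumptions, not the prototype's actual procedure: the codebooks are tiny made-up examples, the perceptual weighting filter is replaced by an identity function unless a `weight` callable is supplied, and all names are hypothetical. Practical coders usually search the codebooks sequentially to limit complexity.

```python
from itertools import product


def synthesize(excitation, a):
    """All-pole synthesis filter: s[n] = e[n] - sum_k a[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def build_excitation(c, p, gains):
    """Linear combination of scaled stochastic (c) and quasiperiodic (p) vectors."""
    g_c, g_p = gains
    return [g_c * ci + g_p * pi for ci, pi in zip(c, p)]


def search_excitation(target, a, cb_stoch, cb_per, cb_gain, weight=lambda d: d):
    """Return (error, i, j, k) for the combination with minimum weighted MSE."""
    best = None
    for (i, c), (j, p), (k, gains) in product(
        enumerate(cb_stoch), enumerate(cb_per), enumerate(cb_gain)
    ):
        synth = synthesize(build_excitation(c, p, gains), a)
        diff = weight([t - s for t, s in zip(target, synth)])
        err = sum(d * d for d in diff) / len(diff)
        if best is None or err < best[0]:
            best = (err, i, j, k)
    return best


# Hypothetical toy codebooks for a 4-sample subframe.
cb_stoch = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
cb_per = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
cb_gain = [(0.5, 1.0), (1.0, 0.25)]
lpc = [-0.5]

# Build a target from a known combination, then recover its indices.
target = synthesize(build_excitation(cb_stoch[1], cb_per[0], cb_gain[1]), lpc)
err, i, j, k = search_excitation(target, lpc, cb_stoch, cb_per, cb_gain)
```

The returned indices play the role of the information sent to the code combination forming device: the decoder only needs them, plus the filter parameters, to rebuild the same excitation.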

The block diagram of the prototype decoder is shown in Fig. 3. The code combination arriving from the communication channel at the decoder input is split in URKK 4 into elements that determine the vectors formed once per frame by the vector dequantizer of line spectral frequencies (VDKLSCh) 15 and twice per frame by the codebook of the quasiperiodic component of the excitation signal (KK 2) 17, the codebook of the stochastic component of the excitation signal (KK 1) 16, and the codebook of scaling coefficient vectors of the stochastic and quasiperiodic components of the excitation signal (KK 3) 18. The inputs of FSKLP 6 receive the vector of line spectral frequencies, which determines the frequency response of the filter, and the excitation signal vector. Together they produce the subframes of the synthesized speech frame, which is identical (in the case of an ideal communication channel) to the speech frame at the output of FSKLP 11. Blocks 15, 16, 17, and 18 are the constituent elements of FVDK 5.

The disadvantage of the device is inefficient use of information resources (and hence of the communication channel capacity), caused by the low degree to which the statistical characteristics of the encoded speech parameters are taken into account.

Summary of the invention.

The proposed method of converting a speech signal solves the problem of improving the quality of speech synthesized by the linear prediction method without increasing the coding rate.

This technical result is achieved by supplementing the known method of converting a speech signal by linear prediction with algebraic-code excitation and a conjugate-structure quantizer of the scaling coefficients of the stochastic and quasiperiodic components of the excitation signal with a procedure of acoustic-phonetic classification of the processed speech frames into four non-overlapping classes (frames with no speech, voiced frames, unvoiced frames, and frames of transition to voiced speech), which serves as the control procedure for adaptive distribution of information resources. This approach raises the level of adaptation of the encoding and decoding procedures from parametric to structural. It makes it possible to exploit the differences in the statistical characteristics of the encoded speech parameters across these classes by allocating bits to each parameter in proportion to its informativeness in the given class of frames. The acoustic-phonetic classification is performed concurrently with identification of the adjustable model (the digital all-pole filter), which prevents an undesirable increase in the algorithmic delay of speech frame processing, a critical parameter for real-time telephone communication. The classification decision h (the class number of the processed speech frame) is an additional parameter of information exchange between the encoder and the decoder, so the code combination of the binary multiplicative code additionally carries h, without any change in its length (that is, without any change in the speech coding rate). Encoding h requires two bits of the code combination; the remaining information resources are distributed adaptively among the encoded parameters. This entails a transition from fixed vector quantization and dequantization of the encoded speech parameters to classified vector quantization and dequantization [Satellite Television. New Transmission Methods. Ed. by N.G. Kharatishvili. Moscow: Radio i Svyaz, 1993, pp. 175-199] with four operating modes. The quantizers and dequantizers for each class of speech frames are trained on training samples formed from speech frames belonging to that class.
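The idea of a class-dependent layout with a constant frame length can be sketched as follows. Everything numeric here is an assumption for illustration: the 80-bit frame length is inferred from a 10 ms frame at 8 kbit/s, and the per-class bit counts are invented placeholders, not the allocation used by the invention. Only the two-bit field for h and the four classes come from the text.

```python
FRAME_BITS = 80   # assumption: 10 ms frame at 8 kbit/s
CLASS_BITS = 2    # two bits encode the classification decision h

CLASSES = {
    1: "no speech",
    2: "voiced speech",
    3: "unvoiced speech",
    4: "transition to voiced speech",
}

# Hypothetical allocations; each must leave exactly FRAME_BITS - CLASS_BITS bits.
ALLOCATION = {
    1: {"lsf": 20, "periodic": 0, "stochastic": 44, "gains": 14},
    2: {"lsf": 18, "periodic": 14, "stochastic": 32, "gains": 14},
    3: {"lsf": 16, "periodic": 0, "stochastic": 48, "gains": 14},
    4: {"lsf": 16, "periodic": 12, "stochastic": 36, "gains": 14},
}


def pack_frame(h, values):
    """Write h in the first two bits, then the fields of the layout for class h."""
    bits = format(h - 1, f"0{CLASS_BITS}b")
    for name, width in ALLOCATION[h].items():
        if width == 0:
            continue  # this parameter is not transmitted for class h
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    return bits


def unpack_frame(bits):
    """Read h first, then use it to select the layout for the remaining bits."""
    h = int(bits[:CLASS_BITS], 2) + 1
    values, pos = {}, CLASS_BITS
    for name, width in ALLOCATION[h].items():
        if width == 0:
            continue
        values[name] = int(bits[pos:pos + width], 2)
        pos += width
    return h, values


frame = pack_frame(3, {"lsf": 5, "stochastic": 1234, "gains": 7})
decoded_h, decoded_values = unpack_frame(frame)
```

Because the decoder reads h before anything else, it can pick the matching layout and the matching trained dequantizer mode while the frame length stays the same for every class.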

The proposed device is intended to implement the proposed method as a whole. Speech quality is improved without increasing the coding rate by supplementing the known device for speech signal conversion by linear prediction with algebraic-code excitation and a conjugate-structure quantizer of the scaling coefficients of the stochastic and quasi-periodic excitation components with an acoustic-phonetic classifier (AFK) 19 (Figs. 4, 5), and by using a classified vector quantizer (KVK) 21 and a classified vector dequantizer (KVDK) 24 in place of FVK 2 and FVDK 5. The processed speech frame is fed to the input of AFK 19. From its output, the classification decision h (h = 1, 2, 3, 4), which assigns the processed frame to one of four disjoint classes (non-speech frames, voiced speech frames, unvoiced speech frames, and frames of transition to voiced speech), is fed to the code combination forming device (UFKK) 22 and to the quantizing units of KVK 21: the vector quantizer of line spectral frequencies (VKLSCh) 26, the codebook of the stochastic excitation component (KK 1) 27, the codebook of the quasi-periodic excitation component (KK 2) 28, and the codebook of scaling-coefficient vectors of the excitation components (KK 3) 29. UFKK 22 differs from UFKK 3 in that it uses four variants of the structure of the generated code combination. In all modes two bits are allocated to encoding the classification decision h; the remaining bits are adaptively distributed among the encoded speech parameters depending on the value of h, without increasing the total length of the code combination.

The code combination splitting device (URKK) 23 (Figs. 4, 6) differs from URKK 4 in that, depending on the classification decision h contained in the code combination received from the communication channel, it uses one of four splitting variants that differ in the number of bits allocated to each encoded parameter. From the output of URKK 23 the classification decision h is fed to the control inputs of the elements of KVDK 24: the vector dequantizer of line spectral frequencies (VDKLSCh) 34, the codebook of the stochastic excitation component (KK 1) 35, the codebook of the quasi-periodic excitation component (KK 2) 36, and the codebook of scaling-coefficient vectors of the excitation components (KK 3) 37.
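The forming and splitting of a class-dependent code combination described above can be sketched as follows. This is a minimal illustration, not the patented bit allocation: the 32-bit total, the field names, and every field width in `LAYOUTS` are hypothetical values chosen only so that each of the four layouts has the same overall length, with two leading bits carrying h.

```python
# Hypothetical layouts: the patent gives no field widths in this section.
TOTAL_BITS = 32          # assumed length of the code combination
CLASS_BITS = 2           # two bits always carry the class number h
FIELDS = ("lsf", "kk1", "kk2", "kk3")

# One layout per class h = 1..4 (pause, voiced, unvoiced, transition).
LAYOUTS = {
    1: {"lsf": 10, "kk1": 10, "kk2": 2, "kk3": 8},
    2: {"lsf": 10, "kk1": 6, "kk2": 8, "kk3": 6},
    3: {"lsf": 8, "kk1": 14, "kk2": 2, "kk3": 6},
    4: {"lsf": 9, "kk1": 9, "kk2": 6, "kk3": 6},
}

# Every layout must fill exactly the bits left after the class field.
for layout in LAYOUTS.values():
    assert sum(layout.values()) == TOTAL_BITS - CLASS_BITS


def pack(h, values):
    """Form the code combination: class bits first, then the fields."""
    layout = LAYOUTS[h]
    word = h - 1                      # h = 1..4 is stored as 0..3
    for name in FIELDS:
        width = layout[name]
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split the code combination using the layout selected by h."""
    shift = TOTAL_BITS - CLASS_BITS
    h = (word >> shift) + 1
    layout = LAYOUTS[h]
    values = {}
    for name in FIELDS:
        shift -= layout[name]
        values[name] = (word >> shift) & ((1 << layout[name]) - 1)
    return h, values
```

Because the decoder reads h before anything else, it can choose the matching layout without any side information beyond the two class bits.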

Blocks 26, 27, 28, 29, 34, 35, 36, and 37 differ from blocks 7, 8, 9, 10, 15, 16, 17, and 18, respectively, in that each has four codebook variants (four operating modes), trained on training sets built from speech frames belonging to a particular class of speech frames. The operating mode of these blocks is determined by the classification decision h.
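A classified vector quantizer of this kind can be sketched as one codebook per class, with h selecting which codebook is searched. The sketch below is illustrative only: the class name `ClassifiedVQ`, the nearest-neighbour search by squared Euclidean distance, and the toy two-dimensional codebooks are assumptions, and the training of the codebooks is not shown.

```python
def nearest_index(codebook, vector):
    """Index of the code vector closest to `vector` (squared Euclidean)."""
    def dist2(code_vector):
        return sum((a - b) ** 2 for a, b in zip(code_vector, vector))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


class ClassifiedVQ:
    """Four codebooks, one per frame class h = 1..4 (hypothetical helper)."""

    def __init__(self, codebooks):
        if set(codebooks) != {1, 2, 3, 4}:
            raise ValueError("one codebook is required for each class 1..4")
        self.codebooks = codebooks

    def quantize(self, h, vector):
        # The encoder searches only the codebook of the current class.
        return nearest_index(self.codebooks[h], vector)

    def dequantize(self, h, index):
        # The decoder uses the same h, recovered from the code combination.
        return self.codebooks[h][index]


# Toy codebooks of different sizes; real ones would be trained per class.
toy = ClassifiedVQ({
    1: [(0.0, 0.0)],
    2: [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
    3: [(0.0, 1.0), (1.0, 0.0)],
    4: [(0.0, 0.0), (1.0, 1.0)],
})
```

The same index can decode to different vectors under different values of h, which is why the class number has to travel with the code combination.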

List of figures.

Fig. 1 shows a block diagram of the speech signal conversion device based on linear prediction with a fixed distribution of information resources (the prototype); Fig. 2, a block diagram of the speech encoder of the prototype device; Fig. 3, a block diagram of the speech decoder of the prototype device; Fig. 4, a block diagram of the proposed speech signal conversion device based on linear prediction with adaptive distribution of information resources, by means of which the proposed method is implemented; Fig. 5, a block diagram of the speech encoder of the proposed device; Fig. 6, a block diagram of the speech decoder of the proposed device; Fig. 7, a block diagram of the acoustic-phonetic classifier of the speech encoder of the proposed device; Fig. 8, a flowchart of the algorithm for acoustic-phonetic classification of speech frames.

Information confirming the feasibility of the invention.

The proposed method of speech signal conversion is carried out as follows. An acoustic-phonetic classification of the processed speech frame is introduced into the encoding and decoding procedures, and the resulting classification decision h is used as the control parameter of the adaptive distribution of information resources; it selects the variant of classified vector quantization and dequantization applied to the encoded parameters of the speech frame. This makes it possible to use a separate quantization and dequantization variant for each class of speech frames, each with its own distribution of information resources among the encoded parameters, set by how much each parameter matters for a high-quality representation of frames of that class (that is, chosen to maximize the quality of the synthesized frames of that class). The additional cost of two bits of the code combination is offset by a significant gain in the quality of the quantized representation of each of the four classes of speech frames, so that the quality of speech synthesized by linear prediction improves without any increase in the coding rate.

The acoustic-phonetic classification of speech frames is based on speech analysis at the acoustic and phonetic levels. The energy of the speech frame E_k serves as the criterion for assigning a frame to the class of non-speech frames. To decide whether a frame of active speech belongs to the class of unvoiced speech frames, a generalized criterion J_k is used, which takes into account the frame energy E_k and the number of zero crossings Z_k. The class of transition frames comprises the initial frames of voiced segments of the speech signal.

The proposed device (Fig. 4) consists of a transmitting part (encoder) and a receiving part (decoder). The encoder contains an acoustic-phonetic classifier (AFK) 19, a short-term linear prediction filter identifier (IFKLP) 20, a classified vector quantizer of speech signal parameters (KVK) 21, and a code combination forming device (UFKK) 22, whose output is connected to the decoder through a communication channel. The decoder of the proposed device contains a code combination splitting device (URKK) 23, a classified vector dequantizer of speech signal parameters (KVDK) 24, and a short-term linear prediction synthesis filter (FSKLP) 25.

The block diagram of the encoder is shown in Fig. 5. The output of AFK 19 is connected to UFKK 22 and to the control inputs of the elements of the KVK: the vector quantizer of line spectral frequencies (VKLSCh) 26, the codebook of the stochastic excitation component (KK 1) 27, the codebook of the quasi-periodic excitation component (KK 2) 28, and the codebook of scaling-coefficient vectors of the stochastic and quasi-periodic excitation components (KK 3) 29. Blocks 26, 27, 28, and 29 have four operating variants, selected by one of the four possible values of the signal at the AFK output. The output of IFKLP 20 is the input of VKLSCh 26, whose output is connected to the input of the short-term linear prediction synthesis filter (FSKLP) 30, identical to block 25. The outputs of KK 1 and KK 2 are the inputs of KK 3, whose outputs are connected to the inputs of the adder that forms the excitation signal. The output of this adder is connected to the second input of FSKLP 30. The inputs of adder 31 receive the signal from the encoder input and the inverted signal from the output of FSKLP 30; the output of adder 31 is connected to the input of the perceptual weighting filter (VFV) 32. The output of VFV 32 is the input of the minimum distortion determiner (OMI) 33, whose output is connected to the control inputs of blocks 27, 28, and 29. The second outputs of blocks 26, 27, 28, and 29 are the inputs of UFKK 22. The output of UFKK 22 is the output of the encoder of the proposed device.

The block diagram of the decoder is shown in Fig. 6. Signals from the output of URKK 23 are fed to the control and data inputs of the elements of the KVDK: the vector dequantizer of line spectral frequencies (VDKLSCh) 34, the codebook of the stochastic excitation component (KK 1) 35, the codebook of the quasi-periodic excitation component (KK 2) 36, and the codebook of scaling-coefficient vectors of the stochastic and quasi-periodic excitation components (KK 3) 37. Blocks 34, 35, 36, and 37 have four operating variants, selected by one of the four possible values of the signal h at the AFK output. The outputs of blocks 35 and 36 are connected to the inputs of block 37, whose outputs are connected to the inputs of the adder that forms the excitation signal. The output of the adder is connected to the input of FSKLP 25, whose second input receives the signal from VDKLSCh 34. The output of block 25 is the output of the decoder of the proposed device.

Acoustic-phonetic classifier 19 contains (Fig. 7) an energy determiner (OE) 38 and a zero-crossing count determiner (OChPChN) 39, whose inputs receive the processed signal simultaneously. The outputs of blocks 38 and 39 are connected to the inputs of the unvoiced speech frame determiner (OKNR) 41; in addition, the output of OE 38 is the input of the pause frame determiner (OKP) 40. The input of the voiced speech and transition frame determiner (OKVR and PK) 42 receives signals from the two outputs of OKNR 41 and from the output of OKP 40. The inputs of the classification decision former (FKR) 43 are connected to the two outputs of block 42 and to the outputs of blocks 40 and 41. The output of FKR 43 is the output of the AFK.

The proposed device processes the speech signal frame by frame. The current frame of the speech signal, represented in linear pulse-code modulation format, is fed to the encoder input. The result of parametric coding based on linear prediction with adaptive distribution of information resources is a binary code combination, which passes from the encoder output into the communication channel. At the decoder output a frame of the synthesized speech signal is formed that corresponds to the original frame.

The proposed device operates as follows. The processed speech frame is fed simultaneously to AFK 19 and IFKLP 20, which perform the acoustic-phonetic classification of the frame and the identification of the short-term linear prediction filter, respectively. The algorithm for acoustic-phonetic classification of speech frames is shown in the flowchart of Fig. 8. The processed speech frame is analyzed simultaneously for its energy E_k and its number of zero crossings Z_k. From the frame energy, the first-level classification decision is made: "non-speech frame (pause frame, h = 1) or active speech frame". If the frame is judged to be a non-speech frame, the classification procedure ends. Otherwise the generalized criterion J_k is computed, and from it the second-level classification decision is made: "unvoiced speech frame (h = 3) or voiced speech frame or transition frame". If the frame is judged to be an unvoiced speech frame, the classification procedure ends. Otherwise the current decision is compared with the classification decision for the previous speech frame, and the third-level classification decision is made: "voiced speech frame (h = 2) or transition frame (h = 4)". This completes the classification procedure. Transition frames have the widest range of variation of the encoded parameters; placing them in a separate class improves the accuracy of their quantization, which has a significant effect on the quality of the synthesized speech.
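The three-level decision procedure above can be sketched as follows. This is a hedged illustration only: the formula for J_k is not reproduced in the text, so the expression `z_k / len(frame) - e_k` is a placeholder assumption, as are both thresholds, the normalization of samples to [-1, 1], and the rule that a non-unvoiced active frame counts as a transition frame whenever the previous frame was neither voiced nor a transition frame.

```python
import math

PAUSE, VOICED, UNVOICED, TRANSITION = 1, 2, 3, 4


def frame_energy(frame):
    """Mean energy E_k of a frame of samples scaled to [-1, 1]."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes Z_k between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def classify_frame(frame, prev_h, e_pause=1e-4, j_unvoiced=0.2):
    """Return h in {1, 2, 3, 4}; thresholds are hypothetical."""
    # Level 1: pause frame versus active speech, by energy alone.
    e_k = frame_energy(frame)
    if e_k < e_pause:
        return PAUSE
    # Level 2: unvoiced versus voiced/transition, by a combined criterion.
    # Placeholder J_k: high zero-crossing rate and low energy raise it.
    z_k = zero_crossings(frame)
    j_k = z_k / len(frame) - e_k
    if j_k > j_unvoiced:
        return UNVOICED
    # Level 3: compare with the previous decision; the first frame of a
    # voiced segment is treated as a transition frame (assumed rule).
    if prev_h in (VOICED, TRANSITION):
        return VOICED
    return TRANSITION


# Synthetic 160-sample frames used only to exercise the three levels.
silence = [0.0] * 160
noise_like = [0.1 if n % 2 == 0 else -0.1 for n in range(160)]
tone = [0.5 * math.sin(math.pi * n / 40) for n in range(160)]
```

Classifying `tone` twice in a row, feeding the first result back as `prev_h`, yields a transition frame followed by a voiced frame, which matches the description of transition frames as the initial frames of voiced segments.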

The classification decision h determines the operating mode of UFKK 22 and the current state of VKLSCh 26 and of blocks 27, 28, and 29, thereby adapting the distribution of the device's information resources to the characteristics of the processed speech frame. VKLSCh 26 vector-quantizes the vector of line spectral frequencies produced by the identification procedure in block 20. From the output of VKLSCh 26 the quantized vector is fed to FSKLP 30 and fixes its state for a time interval equal to the duration of the processed frame. Enumerating the code vectors contained in codebooks 27, 28, and 29 produces a set of possible realizations of the excitation signal, in turn for each of the two subframes of the speech signal. At the output of FSKLP 30, sets of realizations of the subframes of the quantized speech frame are formed in turn. At the output of adder 31 a set of subframe quantization error vectors is formed, and at the output of VFV 32 a set of weighted subframe quantization error vectors, which OMI 33 converts into a set of weighted mean-square errors (VSKO).
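The search over the excitation codebooks for one subframe can be sketched as an exhaustive analysis-by-synthesis loop. The sketch rests on several simplifying assumptions that are not taken from the patent: the perceptual weighting filter is omitted (identity weighting), the synthesis filter starts each subframe from a zero state, the three codebooks are searched jointly, and the function names and the sign convention of the predictor coefficients `a` are hypothetical.

```python
from itertools import product


def synthesize(excitation, a):
    """All-pole synthesis: y[n] = x[n] - sum_i a[i] * y[n - 1 - i]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, coeff in enumerate(a):
            if n - 1 - i >= 0:
                acc -= coeff * out[n - 1 - i]
        out.append(acc)
    return out


def search_excitation(target, a, kk1, kk2, kk3):
    """Return (error, i, j, m) for the best code vector combination.

    kk1: stochastic code vectors, kk2: quasi-periodic code vectors,
    kk3: pairs (g1, g2) of scaling coefficients for the two components.
    """
    best = None
    candidates = product(enumerate(kk1), enumerate(kk2), enumerate(kk3))
    for (i, c1), (j, c2), (m, (g1, g2)) in candidates:
        # Excitation is a linear combination of the scaled components.
        excitation = [g1 * u + g2 * v for u, v in zip(c1, c2)]
        synthesized = synthesize(excitation, a)
        # Mean-square error; a weighting filter would be applied here.
        error = sum((t - s) ** 2 for t, s in zip(target, synthesized))
        error /= len(target)
        if best is None or error < best[0]:
            best = (error, i, j, m)
    return best


# Tiny illustrative codebooks (four-sample subframes).
KK1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
KK2 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
KK3 = [(1.0, 0.5), (0.5, 1.0)]
A = [-0.5]
```

The indices returned by the search are what the encoder would place in the code combination; the decoder repeats only the synthesis step with the same codebooks.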

From the smallest of the VSKO values obtained, OMI 33 decides on the best combination of code vectors and sends this decision as the KVLKV command to blocks 27, 28, and 29. When the processing of each subframe is complete, information on the best combinations of code vectors passes from the outputs of these blocks to UFKK 22, where it is combined with the information on the classification decision h and on the quantized vector of line spectral frequencies. At the output of UFKK 22 a code combination is formed, which enters the communication channel and passes from it to the input of URKK 23. The classification decision h, extracted in URKK 23 from the received code combination, determines the splitting mode of the code combination in URKK 23 and the state of VDKLSCh 34 and of blocks 35, 36, and 37, thereby adapting the distribution of the device's information resources to the characteristics of the processed speech frame.

The element of the multiplicative code combination that carries the information on the quantized vector of line spectral frequencies is fed to the input of VDKLSCh 34, at whose output a vector identical to the vector at the output of VKLSCh 26 is formed. The elements of the multiplicative code combination that carry the information on the best code vectors of the first and second subframes are fed to blocks 36, 35, and 37, so that the adder connected between blocks 37 and 25 outputs an excitation signal identical to the signal at the input of FSKLP 30. FSKLP 25 is identical to FSKLP 30. At the output of FSKLP 25 a frame of the synthesized speech signal is formed that is identical to the speech frame at the output of FSKLP 30 and is the closest to the processed frame by the VSKO criterion.

The information given above shows that the means embodying the invention are capable of providing higher-quality speech transmission through the adaptive distribution of the information resources of a speech signal conversion device that uses the linear prediction method.

Claims

1. A method of converting a speech signal by linear prediction, in which, during encoding, a short-term linear prediction filter is identified; the identification result is expressed as a vector of line spectral frequencies, which is quantized directly by a vector quantizer; from the quantized vector of line spectral frequencies the frequency response of a short-term linear prediction synthesis filter is formed; the excitation signal of the short-term linear prediction filter is represented by a linear combination of scaled stochastic and quasi-periodic components contained in the corresponding codebooks and is determined by an analysis-by-synthesis procedure using the criterion of minimum weighted mean-square error between the processed frame of the speech signal and the synthesized frame; for transmission over a communication channel, a combination of a binary multiplicative code is used that contains information on the quantized vector of line spectral frequencies, the code vectors of the excitation components, and their scaling coefficients; during decoding, a frame of the speech signal is formed by a short-term linear prediction synthesis filter whose frequency response and excitation signal are formed according to the information contained in the code combination received from the communication channel, characterized in that, during encoding, an acoustic-phonetic classification of the processed frames of the speech signal into four disjoint classes is performed simultaneously with the identification of the short-term linear prediction filter; the classification decision is included in the structure of the code combination transmitted over the communication channel and is used to determine the operating mode of the classified vector quantizer and dequantizer, which are trained for each class of speech frames on training sets built from speech frames belonging to that class and which perform classified vector quantization and dequantization, respectively, depending on the result of the acoustic-phonetic classification, thereby providing adaptive distribution of information resources.

2. A device for converting a speech signal by the linear prediction method, comprising, in the encoder, a short-term linear prediction filter identifier (IFKLP), whose input receives the frame of the speech signal to be encoded and whose output is the input of a vector quantizer of line spectral frequencies (VKLSCh); codebooks of the stochastic (KK 1) and quasi-periodic (KK 2) excitation components, whose outputs are connected to separate inputs of a codebook of scaling coefficients (KK 3), the outputs of which, after summation, are connected to the input of a short-term linear prediction synthesis filter (FSKLP), whose second input is connected to the output of the VKLSCh and whose output is connected to the inverting input of an adder, the direct input of which receives the frame of the speech signal to be encoded and the output of which is connected, through a perceptual weighting filter (VFV) and a minimum distortion determiner (OMI), to the control inputs of KK 1, KK 2, and KK 3; and a code combination forming device (UFKK), whose inputs are connected to additional outputs of the VKLSCh, KK 1, KK 2, and KK 3; and comprising, in the decoder, a code combination splitting device (URKK), whose outputs are the inputs of a vector dequantizer of line spectral frequencies (VDKLSCh), of codebooks of the stochastic (KK 1) and quasi-periodic (KK 2) excitation components, and of a codebook of scaling coefficients (KK 3), whose inputs are connected to the outputs of KK 1 and KK 2 and whose outputs, after summation, form the input of a short-term linear prediction synthesis filter (FSKLP), whose second input is connected to the output of the VDKLSCh and whose output forms a frame of the synthesized speech signal, characterized in that it contains an acoustic-phonetic classifier (AFK) that assigns the processed frames of the speech signal to four disjoint classes: non-speech frames, voiced speech frames, unvoiced speech frames, and frames of transition to voiced speech; the input of the AFK receives the frame of the speech signal to be encoded, and its output is connected to additional inputs of the VKLSCh, of the codebooks of the stochastic (KK 1) and quasi-periodic (KK 2) excitation components and of their scaling coefficients (KK 3), and of the code combination forming device (UFKK).

3. The device according to claim 2, characterized in that the UFKK has an additional input connected to the output of the AFK, and the URKK has an additional output connected to additional inputs of the VDKLSCh, KK 1, KK 2, and KK 3; the UFKK and the URKK, controlled by the classification decision of the AFK, each have four modes of forming and splitting the code combination, respectively, which differ in how the bits allocated to encoding a speech frame are distributed among the encoded parameters, with two bits reserved for encoding the class number of the speech frame and without changing the total length of the code combination.

4. The device according to claim 2, characterized in that the VKLSCh and the codebooks of the stochastic (KK 1) and quasi-periodic (KK 2) excitation components and of their scaling coefficients (KK 3) in the encoder have additional inputs connected to the output of the AFK and contain four codebook variants, which correspond to the four classes of processed frames of the speech signal and differ in the number of code vectors while their total number remains unchanged; and the VDKLSCh and the codebooks of the stochastic (KK 1) and quasi-periodic (KK 2) excitation components and of the scaling coefficients (KK 3) in the decoder have additional inputs connected to the additional output of the URKK and contain four codebook variants analogous to the codebooks contained in the encoder units.