RU2402879C2

RU2402879C2 - Method for adapting codec parameters and voice signal transmission in a packet-switched network (versions)

Info

Publication number: RU2402879C2
Application number: RU2008146436/09A
Authority: RU
Inventors: Роман Аликович Иманкулов (RU); Андрей Александрович Пинаев (RU); Владимир Иванович Суханов (RU); Сергей Иванович Тимошенко (RU)
Original assignee: Закрытое акционерное общество "Нау-сервис"
Priority date: 2008-11-24
Filing date: 2008-11-24
Publication date: 2010-10-27
Also published as: RU2008146436A

Abstract

FIELD: information technologies.

SUBSTANCE: a tolerance is set for the change in voice signal quality; the network is tested with increased and reduced load; from the test results, the codec and voice signal transmission parameters optimal for the current connection are found and set as current; the change in voice signal quality is then calculated periodically on the basis of the E-model rating factor (R-factor). If the calculated change in voice signal quality exceeds the set tolerance, the network is again tested with increased and reduced load, the codec and voice signal transmission parameters optimal for the current connection are found from the test results and set as current, and the change in voice signal quality is again calculated periodically.

EFFECT: improved quality of voice transmission in IP telephony when the throughput of the communication channel is unknown and unstable and transmitted packets are randomly lost in the channel.

2 tbl

Description

The invention relates to IP telephony and can be used to optimize the quality of voice transmission.

In known voice transmission methods, adaptive quality control in IP networks is implemented by having the terminal devices dynamically change the speech coding and transmission parameters according to the current characteristics of the communication channel. This requires choosing a method for collecting statistics about the communication channel and a method for adapting the coding and transmission parameters of the speech signal.

Statistics about the communication channel are usually collected as specified in RFC 3550, which defines RTCP (RTP Control Protocol, the control protocol that accompanies the Real-time Transport Protocol) and allows terminal devices to obtain statistical information on the characteristics of the communication channel.

The main difficulty is choosing how to adapt the coding and transmission parameters of the speech signal on the basis of the information obtained about the communication channel. There are two problems here.

The first problem is the need for an objective measurement of the quality of the transmitted speech, since an adaptive adjustment method can be called successful only if the new coding parameters give better speech quality under the given conditions than the previous ones. The classical determination of speech quality on the MOS scale requires specially equipped studios and a large number of experts. This way of assessing quality is clearly unacceptable in most applications, so other methods of assessing speech quality have been developed that eliminate the "human factor". The most interesting for practical use is the E-model (ITU-T Recommendation G.107. The E-model, a computational model for use in transmission planning. - Geneva: ITU-T, 2005. - 28 p.) and its variants for IP telephony (Lustosa, L.C.G. E-Model Utilization For Speech Quality Evaluation Over VoIP-Based Communication Systems / L.C.G. Lustosa, L.S.G. Carvalho, P.H.A. Rodrigues, E.S. Mota. - 22nd SBRC, 2004. - P. 933-938).

The E-model assumes that distorting factors affect communication quality additively, and it does not need the original signal to make its assessment. Instead, the E-model derives its result from the characteristics of the communication channel and terminal devices, the background noise level, and so on. It is thus assumed that an estimate of the quality of the output speech signal can be obtained by measuring the characteristics of the terminal devices and the communication channel.

The second problem is determining which parameters can be controlled, and how, to achieve better quality. Common speech codecs (for example, G.729 or G.723.1) offer no way to manipulate the coding parameters and are unsuitable for this purpose. However, codecs with a controllable coding mode are known (for example, Speex and GSM 06.90 AMR (Adaptive Multi-Rate)).

Speech quality in IP telephony is also affected by the transmission modes of the speech signal: the blocking (packetization) factor, that is, the number of frames per packet, and the number of times the same packet is sent into the channel. Increasing the number of frames per packet reduces transmission overhead by saving on packet headers, but increases the overall signal delay. Increasing the number of times the same packet is sent, when the channel has spare throughput, is an effective way to counter random packet loss. These parameters can and should also be controlled.
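As a rough illustration of these trade-offs, the sketch below computes the header share of a packet, the packetization delay, and the residual loss after repeated sending. The 20 ms frame duration, the 320-bit header (IPv4 + UDP + RTP without options), the 160-bit frame used in the usage lines, and the assumption that copies of a packet are lost independently are illustrative assumptions, not values taken from the patent.

```python
def header_share(frame_bits, k, header_bits=320):
    """Fraction of each packet taken up by headers when k frames share one packet."""
    return header_bits / (frame_bits * k + header_bits)


def packetization_delay_ms(k, frame_ms=20):
    """Time needed to collect k frames before the packet can be sent."""
    return k * frame_ms


def residual_loss(loss, r):
    """Probability that all r copies of a packet are lost (independent losses assumed)."""
    return loss ** r


# Larger k lowers the header share but raises the delay.
for k in (1, 2, 4):
    print(k, round(header_share(160, k), 3), packetization_delay_ms(k))

# Sending each packet twice squares the loss probability but doubles the load.
print(residual_loss(0.1, 2))
```

Under these assumptions, going from one to four frames per packet halves the header share (from two thirds to one third) while adding 60 ms of packetization delay.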

An example of a method for adapting speech codec parameters is a known method in which an adaptive codec is provided that can transmit a continuous voice stream and has information about the source data rate and the channel bandwidth; the channel carrying the voice stream is probed to obtain at least one quality parameter; at least one constraint associated with the transmission of the voice stream is determined; and the source data rate and the channel bandwidth are adapted as a function of the quality parameter and the constraint so as to maximize the quality of the received signal during transmission of the continuous voice stream. To assess the quality of speech transmission, this method uses the maximum number of media symbols in a codeword, the maximum codeword length, the network delay, the delay impairment factor, the packet loss factor, the measured signal distortion factor, and the R-factor for calculating the MOS value, which is used to match the source data rate to the throughput of the communication channel (Application No. US2004160979, H04L 1/00; H04L 29/06; H04M 7/00; H04L 12/56, published 19.08.2004).

To apply the known method, one must be able to control the throughput of the communication channel, which is impossible in most IP telephony tasks. Assessing the quality of speech transmission also requires knowing such parameters as the maximum number of media symbols in a codeword and the maximum codeword length; estimating them incurs additional cost and is itself a difficult technical problem. In addition, an IP phone conversation does not always correspond to the transmission of a continuous voice stream, since a conversation between two subscribers contains natural pauses. The known method has no means of compensating for random packet loss in the communication channel or of reducing packet transmission overhead. All this complicates practical implementation of the method in IP telephony.

The technical result of the proposed solution is improved quality of voice transmission in IP telephony when the throughput of the communication channel is unknown and unstable and transmitted packets are randomly lost in the channel.

To achieve this result, two variants of a method for adapting codec and speech signal transmission parameters in a packet-oriented data network are proposed.

Variant 1. A method for adapting codec and speech signal transmission parameters in a packet-switched network, in which initial codec and speech signal transmission parameters and a tolerance for the change in speech signal quality are set; the codec and transmission parameters optimal for the current connection are searched for; the parameters found are set as current; and the change in speech signal quality is calculated periodically. If the calculated change in speech signal quality exceeds the set tolerance, the optimal codec and transmission parameters are searched for again and set as current, and the change in speech signal quality is again calculated periodically.

The proposed method is distinguished in that the change in speech signal quality is calculated from the rating factor (R-factor) of the E-model, and the codec and transmission parameters optimal for the current connection are sought by the coordinate descent method with the R-factor as the objective function.
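The search-then-monitor cycle of Variant 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the hooks find_optimal, apply_params and measure_r are hypothetical callables supplied by the caller, the number of checks is bounded so that the example terminates, and the change in quality is assumed to be measured relative to the R-factor recorded when the parameters were last set.

```python
def adapt(find_optimal, apply_params, measure_r, initial_params, tolerance, checks):
    """Run the search-then-monitor cycle for a fixed number of periodic checks.

    find_optimal(params) returns new parameters, apply_params(params) configures
    the codec and sender, measure_r() returns the current R-factor.
    """
    params = find_optimal(initial_params)
    apply_params(params)
    reference = measure_r()              # quality right after the parameters are set
    searches = 1
    for _ in range(checks):              # a real loop would wait one period per check
        current = measure_r()
        change = abs(current - reference) / reference   # relative change (assumed)
        if change > tolerance:
            params = find_optimal(params)
            apply_params(params)
            reference = measure_r()
            searches += 1
    return params, searches


# Scripted measurements: the drop from 80 to 60 (25 %) exceeds the 20 % tolerance.
readings = iter([80.0, 79.0, 60.0, 62.0, 61.0])
final_params, searches = adapt(
    find_optimal=lambda p: p + 1,        # stand-in "search" that only counts calls
    apply_params=lambda p: None,
    measure_r=lambda: next(readings),
    initial_params=0,
    tolerance=0.2,
    checks=3,
)
print(final_params, searches)
```

In the scripted run the second check triggers one additional search, so the stand-in search is called twice in total.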

Variant 2. A method for adapting codec and speech signal transmission parameters in a packet-switched network, in which a tolerance for the change in speech signal quality is set; the network is tested with increased and reduced load; from the test results, the codec and transmission parameters optimal for the current connection are found; the parameters found are set as current; and the change in speech signal quality is calculated periodically. If the calculated change in speech signal quality exceeds the set tolerance, the network is again tested with increased and reduced load, the codec and transmission parameters optimal for the current connection are found from the test results and set as current, and the change in speech signal quality is again calculated periodically.

The proposed method is distinguished in that, when the network is tested with increased load, the codec and speech signal transmission parameters are chosen so as to exceed the actual throughput B of the established connection; when the network is tested with reduced load, the current codec and transmission parameters are chosen so as not to exceed the actual throughput B of the established connection; and the change in speech signal quality is calculated from the rating factor (R-factor) of the E-model. When the codec and transmission parameters optimal for the current connection are calculated, the actual throughput of the established connection is determined from RTCP statistics by the following formula:

B = (B1 - L1) / (1 - dL),

where B1 is the throughput at which the increased-load test is performed, with B1 > B;

L1 is the loss corresponding to the increased load, obtained from RTCP statistics;

dL = L2 / B2;

B2 is the throughput at which the reduced-load test is performed, with B2 < B;

L2 is the loss corresponding to the reduced load, obtained from RTCP statistics.
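The throughput estimate can be written as a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and B1, L1, B2 and L2 are assumed to be expressed in the same unit (for example bytes per second), so that dL is the dimensionless share of traffic lost at reduced load.

```python
def estimate_throughput(b1, l1, b2, l2):
    """Estimate the actual throughput B = (B1 - L1) / (1 - dL), where dL = L2 / B2.

    b1, l1: offered load and losses measured during the increased-load test.
    b2, l2: offered load and losses measured during the reduced-load test.
    """
    if b2 <= 0:
        raise ValueError("reduced-load throughput B2 must be positive")
    dl = l2 / b2                      # loss share not caused by congestion
    if not 0 <= dl < 1:
        raise ValueError("L2 / B2 must lie in [0, 1)")
    return (b1 - l1) / (1.0 - dl)


# Example numbers (invented): 10000 offered with 2400 lost, 4000 offered with 200 lost.
print(estimate_throughput(10000.0, 2400.0, 4000.0, 200.0))
```

With these invented numbers, 7600 units get through at increased load and 5% are lost even at reduced load, which gives an estimate of about 8000 units.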

The codec and speech signal transmission parameters optimal for the current connection are selected for the found values of B, L1 and L2 using the following condition:

min|B - [(P(m) × k + H) × F(k) / 8] × r|,

where m is the set of values of the controlled codec parameters;

k is the blocking factor (the number of frames per packet);

P(m) is the frame size for the given m, in bits;

H is the size of the packet headers, in bits;

F(k) is the packet sending rate, in 1/s;

r is the number of times the same packet is sent into the communication channel;

min|f(m, k, r)| denotes the search for the minimum of the absolute value of the function f(m, k, r) over the controlled parameters.
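One way to evaluate this condition is an exhaustive search over the small discrete parameter space. The sketch below is illustrative only: the frame-size table is a placeholder rather than real Speex data, the 320-bit header and 50 frames per second are assumptions, F(k) is assumed to equal the frame rate divided by k, and the names load and select_parameters are hypothetical. Because of the division by 8, B is taken to be in bytes per second.

```python
from itertools import product

# Hypothetical frame sizes in bits for coding modes m = 1..8 (placeholders,
# not the real Speex values).
FRAME_BITS = {m: 40 + 40 * m for m in range(1, 9)}
HEADER_BITS = 320          # assumed IPv4 + UDP + RTP headers
FRAMES_PER_SECOND = 50     # assumed 20 ms frames


def load(m, k, r):
    """Channel load [(P(m) * k + H) * F(k) / 8] * r with F(k) = frame rate / k."""
    packets_per_second = FRAMES_PER_SECOND / k
    return (FRAME_BITS[m] * k + HEADER_BITS) * packets_per_second / 8 * r


def select_parameters(b, k_max=4, r_max=2):
    """Return the (m, k, r) triple whose load is closest to the throughput b."""
    candidates = product(FRAME_BITS, range(1, k_max + 1), range(1, r_max + 1))
    return min(candidates, key=lambda mkr: abs(b - load(*mkr)))


print(select_parameters(4250.0))
```

The search space here has only 8 × 4 × 2 = 64 points, so brute force is adequate; the condition as stated minimizes the absolute difference and therefore may select a load slightly above B.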

The proposed method is implemented as follows. In the first variant of the method, the initial codec and speech signal transmission parameters are set, together with the tolerance ΔQ_d for the change in speech signal quality.

The speech signal transmission parameters include the following controlled parameters:

- the blocking factor k (also called the packetization factor or the number of frames per packet; a positive integer);

- the number r of times the same packet is sent into the channel (an integer).

These parameters apply to any codec used in IP telephony.

The codec parameters depend on the codec used. Not every codec allows its coding parameters to be controlled. However, codecs with a controllable coding mode m are known, for example Speex and GSM 06.90 AMR (Adaptive Multi-Rate, see http://en.wikipedia.org/wiki/Adaptive_Multi-Rate and ETSI recommendation EN 301704 V7.2.1 at http://www.esti.org). The coding mode m of the Speex codec is called quality (coding quality, an integer from 1 to 8 inclusive). The corresponding parameter of the GSM 06.90 AMR codec is called mode; it also has eight states and is controlled in a similar way. The rest of the description therefore uses a single codec, Speex, as the example.

Each of the controlled parameters affects speech signal quality in its own way. Increasing k increases the overall delay, which degrades the quality of the received speech signal, but reduces the load on the communication channel. Repeating each packet, when the channel has spare throughput, compensates for random packet loss in the channel but increases the load on it. Changing m controls both the speech coding quality and the load on the channel: the higher the quality, the greater the load.

It is important to note that changing any of the above parameters changes the load that the IP phone places on the communication channel, which on narrowband or congested channels changes the loss percentage and the delay and, as a consequence, the quality of the speech signal. Changing any of the parameters can therefore either improve or degrade signal quality, depending on the properties of the communication channel itself. For this reason it is impossible to choose in advance a coding and transmission mode that is optimal in all cases.

The choice of initial parameter values is not critical in this variant of the method. To speed up the search for optimal parameters, typical codec settings were chosen (for Speex, m₀ = 8), with blocking factor k₀ = 1 and number of sends of the same packet r₀ = 1.

The tolerance ΔQ_d for the change in speech signal quality is specified in relative units (for example, 0.2, which corresponds to a 20% change in quality).

After the initial codec and transmission parameters and the tolerance ΔQ_d have been set, the codec and transmission parameters optimal for the current connection are searched for using the coordinate descent method.

In this method, starting from the initial state of the codec and transmission parameters, described by the triple of controlled parameters (k₀, r₀, m₀), the maximum of the speech signal quality Q is sought along the k axis (blocking (packetization) factor), which gives the point (k₁, r₀, m₀) at which Q is maximal. From this point the maximum is then sought along the r axis (number of sends of the same packet into the channel), which gives the next state (k₁, r₁, m₀). The maximum along the m coordinate (coding mode) is determined in the same way.
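The coordinate-wise search can be sketched as below. This is a minimal illustration under assumptions: the function name, the axis ranges and the toy quality function are hypothetical, and in a real terminal each evaluation of quality would mean applying the candidate parameters and measuring the R-factor over some interval. Since Q is maximized, the procedure is coordinate descent applied to -Q.

```python
def coordinate_search(quality, start, axes, sweeps=1):
    """Maximize quality one coordinate at a time, in the order given by axes.

    quality: callable taking keyword arguments k, r, m and returning Q.
    start:   dict with the initial values, e.g. {"k": 1, "r": 1, "m": 8}.
    axes:    list of (name, candidate values) pairs, searched in that order.
    """
    point = dict(start)
    for _ in range(sweeps):
        for name, values in axes:
            # Vary only one coordinate; the others keep their current values.
            point[name] = max(values, key=lambda v: quality(**{**point, name: v}))
    return point


def toy_quality(k, r, m):
    """Invented stand-in for a measured R-factor, peaking at k=2, r=1, m=6."""
    return -((k - 2) ** 2 + (r - 1) ** 2 + (m - 6) ** 2)


axes = [("k", range(1, 5)), ("r", range(1, 4)), ("m", range(1, 9))]
best = coordinate_search(toy_quality, {"k": 1, "r": 1, "m": 8}, axes)
print(best)
```

A single sweep suffices for this separable toy function; when the coordinates interact, as they do on a real channel, additional sweeps (the sweeps argument) may move the point further.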

The speech signal quality Q is determined by calculating the R-factor of the E-model (ITU-T Recommendation G.107. The E-model, a computational model for use in transmission planning. - Geneva: ITU-T, 2005. - 28 p.). The simplified formula for the R-factor used in IP telephony is written as follows (Lustosa, L.C.G. E-Model Utilization For Speech Quality Evaluation Over VoIP-Based Communication Systems / L.C.G. Lustosa, L.S.G. Carvalho, P.H.A. Rodrigues, E.S. Mota. - 22nd SBRC, 2004. - P. 933-938):

R = 93.4 - Id(Ta) - Ie(codec, loss),

where:

Ta is the absolute one-way delay;

Id is the quality impairment caused by delay;

Ie is the quality impairment caused by coding (codec) and by packet loss in the communication channel (loss).
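The simplified formula translates directly into code once models for Id and Ie are available. In the sketch below the two impairment models are passed in as callables; toy_id and toy_ie are placeholders invented for this example and do not reproduce the G.107 delay model or any measured Speex data.

```python
R0 = 93.4  # constant term of the simplified formula quoted above


def r_factor(ta_ms, codec_mode, loss, id_model, ie_model):
    """R = 93.4 - Id(Ta) - Ie(codec, loss) for caller-supplied impairment models."""
    return R0 - id_model(ta_ms) - ie_model(codec_mode, loss)


def toy_id(ta_ms):
    """Placeholder delay impairment: grows linearly with one-way delay."""
    return 0.02 * ta_ms


def toy_ie(codec_mode, loss):
    """Placeholder equipment impairment: fixed codec term plus a loss term."""
    return 5.0 + 100.0 * loss


r = r_factor(100.0, 8, 0.05, toy_id, toy_ie)
print(round(r, 1))
```

Keeping Id and Ie as separate callables mirrors the additive structure of the formula, so an experimentally determined Ie table can later replace the placeholder without touching the rest of the calculation.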

Параметр Id при отсутствии эффекта эха полностью определяется односторонней задержкой распространения сигнала Та и его вычисление не представляет принципиальной сложности. Расчет задержки обычно производят при помощи специальных управляющих сигналов, встраиваемых в пакеты кодека Speex (аналог ping).In the absence of an echo effect, the parameter Id is completely determined by the one-way propagation delay of the signal Ta and its calculation does not represent a fundamental difficulty. Calculation of the delay is usually performed using special control signals that are built into the Speex codec packets (analogue ping).

The parameter Ie must be determined experimentally. Such measurements have been made for a number of the most popular codecs, but had not previously been made for Speex. Note that Ie is affected both by the quality degradation due to Speex encoding and by the degradation due to losses in the communication channel. These two contributions are not independent, because the Speex decoder uses packet loss concealment algorithms that are very effective for isolated losses but become less effective as the loss percentage grows.

The experimental procedure for estimating the parameter Ie consists of the following steps:

- for a given set of Speex encoding modes and a given set of loss levels, a set of speech files is encoded and then decoded; the decoded audio fragments are compared with the originals using the PESQ algorithm (ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Approved 01.02.2001. Geneva: ITU-T, 2001. 21 p.), which yields a MOS quality score in the range from 0 to 5;

- the MOS score is converted to an R-factor using the formula given in ITU-T Recommendation G.107;

- taking Id equal to zero, Ie is calculated for each set of codec and channel parameters.
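The last two steps can be sketched in Python as follows. The R-to-MOS mapping is the one given in ITU-T G.107; inverting it by bisection is an assumption of this sketch (the description only says that the G.107 formula is used), and all function names are hypothetical.

```python
def r_to_mos(r: float) -> float:
    """G.107 mapping from R-factor to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos: float) -> float:
    """Invert r_to_mos by bisection on 6.5 <= R <= 100, where the mapping is increasing.

    ASSUMPTION: the numerical inversion is illustrative, not the authors' procedure.
    """
    lo, hi = 6.5, 100.0
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))  # clamp to the invertible range
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ie_from_mos(mos: float) -> float:
    """Ie with Id taken as zero: Ie = 93.4 - R."""
    return 93.4 - mos_to_r(mos)
```

MOS scores outside the range reachable by the mapping are clamped, so a very low PESQ score maps to R = 6.5 rather than failing.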

The results of the experimental determination of the parameter Ie are given in Table 1.

In total, about 20 audio files from 15 to 120 seconds long were tested. Results may vary when the experiments are run on other audio files. This is due in particular to the Speex codec being optimized for male voices, so results will differ between adult and child voices and between male and female voices.

Table 1
Ie values for various Speex codec modes and packet loss rates

Codec mode   Packet loss rate (%)
             0       1       2       3       4       8       16
1            53.49   53.85   54.17   54.00   54.15   55.01   56.48
8            43.98   44.41   44.73   45.07   45.54   47.07   50.32
2            33.04   34.53   35.63   36.77   38.07   41.54   46.62
3            27.86   30.30   31.86   33.59   35.38   39.72   45.58
4            23.14   26.44   28.52   30.92   33.03   37.67   43.93
5            21.22   24.82   27.17   29.78   31.99   36.79   42.96
6            17.85   22.65   25.28   28.02   30.58   35.93   42.87
7            15.41   20.94   24.22   27.32   29.90   35.49   42.41
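A measured loss rate will rarely coincide with one of the tabulated columns. The following Python sketch shows one possible way to look up Ie from Table 1; linear interpolation between columns and clamping to the tabulated range are assumptions of this sketch, and the names are hypothetical.

```python
from bisect import bisect_right

LOSS_LEVELS = [0, 1, 2, 3, 4, 8, 16]  # packet loss rate, %
IE_TABLE = {  # Speex mode -> Ie at each loss level (values from Table 1)
    1: [53.49, 53.85, 54.17, 54.00, 54.15, 55.01, 56.48],
    8: [43.98, 44.41, 44.73, 45.07, 45.54, 47.07, 50.32],
    2: [33.04, 34.53, 35.63, 36.77, 38.07, 41.54, 46.62],
    3: [27.86, 30.30, 31.86, 33.59, 35.38, 39.72, 45.58],
    4: [23.14, 26.44, 28.52, 30.92, 33.03, 37.67, 43.93],
    5: [21.22, 24.82, 27.17, 29.78, 31.99, 36.79, 42.96],
    6: [17.85, 22.65, 25.28, 28.02, 30.58, 35.93, 42.87],
    7: [15.41, 20.94, 24.22, 27.32, 29.90, 35.49, 42.41],
}


def ie_lookup(mode: int, loss_pct: float) -> float:
    """Ie for a codec mode and loss rate.

    ASSUMPTION: linear interpolation between tabulated loss levels.
    """
    row = IE_TABLE[mode]
    loss = min(max(loss_pct, LOSS_LEVELS[0]), LOSS_LEVELS[-1])  # clamp to 0..16 %
    i = bisect_right(LOSS_LEVELS, loss) - 1  # index of the column at or below loss
    if i == len(LOSS_LEVELS) - 1:
        return row[-1]
    x0, x1 = LOSS_LEVELS[i], LOSS_LEVELS[i + 1]
    return row[i] + (row[i + 1] - row[i]) * (loss - x0) / (x1 - x0)
```

For example, `ie_lookup(7, 6)` falls halfway between the 4% and 8% columns for mode 7.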

The period between parameter changes and quality checks is determined by the duration of transient processes in IP telephony, and for this reason must be at least 1 s. The optimal value for a particular IP phone implementation can be found empirically; typically the period is 2-4 s. Larger values make the control more sluggish and delay the response of the adaptation system to changes in the channel parameters. The same applies to the second variant of the method described below.

The parameter set found in this way is considered optimal for the given channel and is set as the current one; the corresponding maximum R-factor value Q_opt is subsequently used to calculate the change in quality.

The search for optimal parameters is performed on the receiver side of the speech signal, but the codec parameters must be adjusted on the transmitter side. Control signals are carried by the in-band signalling implemented in the Speex codec.

The main idea of in-band signalling is that the receiver can send the transmitter specially formed "pseudo-frames": Speex packets whose mode field (speex mode) is set to 14 and whose data field carries the encoded control action. The side that receives a packet with this special mode number determines the control action code from the first four bytes and interprets the rest of the packet data as its parameters. The Speex codec programming interface allows handlers to be registered for events of the form "control action with code X received".

Next, the change in speech signal quality is calculated periodically as ΔQ = |(Q - Q_opt)/Q_opt|, where |x| is the absolute value of x and Q is the current quality calculated using the R-factor. The period is chosen on the same grounds as in the search for the optimal codec and speech transmission parameters. If ΔQ exceeds the tolerance ΔQ_d, the search for the codec and speech transmission parameters optimal for the current connection is performed again (this time starting from the previously found optimum). The parameters found are set as current, and the change in speech signal quality is again calculated periodically.
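The tolerance check described above amounts to a single comparison. A minimal Python sketch follows; the function names are hypothetical.

```python
def quality_change(q: float, q_opt: float) -> float:
    """Relative change in quality: |(Q - Q_opt) / Q_opt|."""
    return abs((q - q_opt) / q_opt)


def needs_new_search(q: float, q_opt: float, tolerance: float) -> bool:
    """True when the change in quality exceeds the tolerance (relative units)."""
    return quality_change(q, q_opt) > tolerance
```

With Q_opt = 80 and a tolerance of 0.2, a drop to Q = 60 gives a change of 0.25 and triggers a new search, while Q = 70 (a change of 0.125) does not.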

In effect, for the entire duration of the two-way connection the system cycles between a mode of searching for the optimal codec and speech transmission parameters and a mode of monitoring the change in speech signal quality.

In the second variant of the method, a tolerance ΔQ_d for the change in speech signal quality is set; as in the first variant, ΔQ_d is specified in relative units (for example, 0.2, which corresponds to a 20% change in quality).

Next, the network is tested with increased and reduced load. Testing begins when the two-way connection between subscribers starts. VAD (Voice Activity Detector) must not be enabled during testing, otherwise the test may fall on a pause in speech and the increased-load test will not succeed.

Testing makes it possible to determine the causes of packet loss during transmission over the network and the actual network bandwidth. This matters because packet loss is the main source of degradation in the quality of the reconstructed speech signal.

There are two main causes of packet loss in a channel:

- the limited bandwidth B of the communication channel, applied at the channel input (the maximum throughput of the channel); if the packet stream exceeds the channel bandwidth, some packets are lost;

- the fraction dL of independent random losses in the channel, observed at any bandwidth Bi≤B of the actual packet stream sent into the channel (i=1 when testing the network with increased load, i=2 when testing the network with reduced load).

Based on this simplified model, a system of two equations is set up to find the actual channel bandwidth B and the fraction of independent random losses dL.

The first equation is obtained by testing the channel with a deliberate overload B1 > B, for example by increasing the number of times the same packet is sent into the channel:

L1 = B1 - (B × (1 - dL)),

where L1 is the total stream loss in the channel when the network is tested with increased load.

The second equation is obtained by testing the network with a load B2 < B that does not exceed the channel capacity. B2 is in fact the payload actually received in the first, increased-load test. Then

L2 = B2 - (B2 × (1 - dL)) = B2 × dL,

where L2 is the total stream loss in the channel when the network is tested with reduced load.

Hence

dL = L2 / B2,

and in turn

B = (B1 - L1) / (1 - dL).

All quantities needed for the calculation are obtained directly from RTCP protocol statistics.
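The two equations can be solved in a few lines. The Python sketch below assumes that the offered loads B1, B2 and the losses L1, L2 have already been extracted from RTCP statistics and are expressed in the same unit (for example, bytes/s); the function name is hypothetical.

```python
def estimate_channel(b1: float, l1: float, b2: float, l2: float):
    """Return (dL, B) from the increased-load and reduced-load tests.

    dL = L2 / B2
    B  = (B1 - L1) / (1 - dL)
    """
    if b2 <= 0:
        raise ValueError("B2 must be positive")
    d_l = l2 / b2
    if d_l >= 1.0:
        raise ValueError("random loss fraction must be below 1")
    b = (b1 - l1) / (1.0 - d_l)
    return d_l, b


# Example: 3000 offered with 1200 lost, then 1800 offered with 180 lost
d_l, band = estimate_channel(3000.0, 1200.0, 1800.0, 180.0)
```

In this example the reduced-load test gives dL = 0.1, and the increased-load test then gives B of about 2000, which is consistent with the model: a 2000 channel with 10% random loss delivers 1800 of the 3000 offered.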

Based on the network test results, the codec and speech transmission parameters optimal for the current connection are found as follows (the goal is to fit into the measured bandwidth B):

- looping from larger to smaller frame size P(m), the corresponding Speex encoding mode m, the blocking (packetization) factor k, and the number r of transmissions of the same packet into the channel are chosen so as to fit into the measured bandwidth B; packet repetition is used when packet loss exceeds hdL (which may be set, for example, to 1%) and the bandwidth exceeds hBand (which may be set, for example, to 2500 bytes/s);

- if the parameters fit into the measured bandwidth B, the selection ends; otherwise the next smaller frame size P(m) is tried;

- if no frame size P(m) fits into the bandwidth, the upper level of the IP telephony system decides to terminate the two-way connection.
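The selection loop can be sketched in Python as shown below. Several details are not fixed by the description and are assumptions of this sketch: packet repetition is modelled as r = 2, the smallest k that fits is preferred (lowest packetization delay), k ranges from 1 to 7, and the packet rate is F(k) = 50/k (20 ms frames). The frame sizes P(m) and the header size H are the values used in the description; the function names are hypothetical.

```python
FRAME_BITS = {1: 48, 8: 80, 2: 120, 3: 160, 4: 224, 5: 304, 6: 368, 7: 496}  # P(m), bits
HEADER_BITS = 224       # H, bits
FRAMES_PER_SECOND = 50  # ASSUMPTION: 20 ms frames, so F(k) = 50 / k


def required_band(m: int, k: int, r: int) -> float:
    """B(m, k) = [(P(m) * k + H) * F(k) / 8] * r, in bytes/s."""
    return (FRAME_BITS[m] * k + HEADER_BITS) * (FRAMES_PER_SECOND / k) / 8 * r


def select_params(band: float, d_l: float, h_dl: float = 0.01,
                  h_band: float = 2500.0, max_k: int = 7):
    """Return (m, k, r) that fits into `band` bytes/s, or None if nothing fits."""
    # ASSUMPTION: repetition means sending each packet twice (r = 2).
    r = 2 if (d_l > h_dl and band > h_band) else 1
    # Larger frame sizes first, as in the description.
    for m in sorted(FRAME_BITS, key=FRAME_BITS.get, reverse=True):
        # ASSUMPTION: prefer the smallest k that fits.
        for k in range(1, max_k + 1):
            if required_band(m, k, r) <= band:
                return m, k, r
    return None  # the caller may decide to terminate the connection
```

Returning `None` leaves the decision to terminate the connection to the caller, mirroring the "upper level" decision in the last step.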

The required bandwidth for given k, r and m is calculated by the formula:

B(m, k) = [(P(m) × k + H) × F(k) / 8] × r,

where m is the encoding mode of the Speex codec;

k is the blocking (packetization) factor, i.e. the number of frames per packet;

B(m, k) is the required bandwidth, bytes/s;

P(m) is the frame size for the given m, bits;

H is the size of the headers (usually 224), bits;

F(k) is the packet sending frequency, 1/s;

r is the number of transmissions of the same packet into the communication channel.

Table 2 shows the calculated required bandwidth for various values of m and k at r = 1.
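As a worked check of the formula, two entries can be computed by hand using the tabulated packet rates F(1) = 50 and F(6) = 8.33 packets per second and H = 224 bits. The result is in bytes per second because a bit count is multiplied by a rate in 1/s and divided by 8.

```latex
\begin{align*}
B(1,1) &= \frac{(48 \cdot 1 + 224) \cdot 50}{8} \cdot 1
        = \frac{272 \cdot 50}{8} = 1700 \ \text{bytes/s},\\[4pt]
B(3,6) &= \frac{(160 \cdot 6 + 224) \cdot 8.33}{8} \cdot 1
        = \frac{1184 \cdot 8.33}{8} \approx 1233 \ \text{bytes/s}.
\end{align*}
```

With packet repetition (for example r = 2, an illustrative value) each result simply doubles.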

Table 2
Required bandwidth B(m, k) for various values of m and k at r = 1

F(k)          50      25      16.66   12.5    10      8.33    7.14
k             1       2       3       4       5       6       7
m     P(m)    B(m, k)
1     48      1700    1000    766     650     580     533     500
8     80      1900    1200    966     850     780     733     700
2     120     2150    1450    1216    1100    1030    982     950
3     160     2400    1700    1466    1350    1280    1233    1200
4     224     2800    2100    1866    1750    1680    1633    1599
5     304     3300    2600    2366    2250    2180    2132    2099
6     368     3700    3000    2766    2650    2580    2532    2499
7     496     4500    3800    3565    3450    3380    3332    3299

The codec and speech transmission parameters found are set as current. The quality Q_opt of the transmitted speech signal is then calculated using the R-factor and stored. The period between parameter changes and quality checks, and the R-factor itself, are determined as in the first variant of the method.

Next, the change in speech signal quality is calculated periodically as ΔQ = |(Q - Q_opt)/Q_opt|, where |x| is the absolute value of x and Q is the current quality calculated using the R-factor. The period is chosen on the same grounds as for testing. If ΔQ exceeds the tolerance ΔQ_d, the network is again tested with increased and reduced load, the codec and speech transmission parameters optimal for the current connection are found from the test results and set as current, and the change in speech signal quality is again calculated periodically.

Repeated increased-load tests are performed with the channel load raised only slightly, by a factor Ktest (for example, 1.25), which makes it possible to probe the channel capacity gently.

In effect, for the entire duration of the two-way connection the system cycles between a test mode, in which the optimal codec and speech transmission parameters are determined, and a mode of monitoring the change in speech signal quality.
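The alternation between test mode and monitoring mode can be outlined as a simple loop. This Python sketch is only a skeleton under stated assumptions: `probe_and_select` is a hypothetical hook that tests the network, applies the parameters it finds and returns the resulting Q_opt, and `quality_samples` stands in for the periodic R-factor readings.

```python
def run_adaptation(quality_samples, probe_and_select, tolerance=0.2):
    """Cycle between test mode and monitoring mode; return the number of network tests.

    quality_samples: iterable of periodic R-factor readings Q (hypothetical input).
    probe_and_select: callable returning Q_opt after testing the network and
        applying the selected parameters (hypothetical hook).
    """
    q_opt = probe_and_select()  # initial test at the start of the connection
    tests = 1
    for q in quality_samples:   # monitoring mode
        if abs((q - q_opt) / q_opt) > tolerance:
            q_opt = probe_and_select()  # tolerance exceeded: back to test mode
            tests += 1
    return tests
```

A real implementation would run the monitoring step on a timer with the 2-4 s period discussed earlier rather than iterating over a prepared sequence.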

The proposed technical solution improves the quality of speech transmission when the bandwidth of the communication channel is unknown and unstable, by selecting optimal codec and speech transmission parameters. When random packet losses are present in the channel, they are compensated by repeated transmission of packets (a speech transmission parameter).

Claims

A method for adapting codec parameters and transmitting a speech signal in a packet-switched network, in which a tolerance for the change in speech signal quality is set; the network is tested with increased and reduced load; the codec and speech signal transmission parameters optimal for the current connection are found from the network test results and set as current; the change in speech signal quality is calculated periodically; and, if the calculated change exceeds the set tolerance, the network is again tested with increased and reduced load, the codec and speech signal transmission parameters optimal for the current connection are again found from the test results and set as current, and the change in speech signal quality is again calculated periodically, characterized in that, when the network is tested with increased load, the codec and speech signal transmission parameters are chosen so as to exceed the actual bandwidth B of the established connection, and, when the network is tested with reduced load, the current codec and speech signal transmission parameters are chosen so as not to exceed the actual bandwidth B of the established connection; the change in speech signal quality is calculated on the basis of the rating factor (R-factor) of the E-model; and, in finding the codec and speech signal transmission parameters optimal for the current connection, the actual bandwidth of the established connection is determined from RTCP protocol statistics by the following formula:
B = (B1-L1) / (1-dL),
where B1 is the throughput at which the increased-load test is performed, with B1 > B;
L1 is the loss corresponding to the increased load, obtained from RTCP protocol statistics;
dL = L2 / B2;
B2 is the throughput at which the reduced-load test is performed, with B2 < B;
L2 is the loss corresponding to the reduced load, obtained from RTCP protocol statistics;
wherein the codec and speech signal transmission parameters optimal for the current connection are selected for the found values of B, L1 and L2 using the following condition:
min | B - [(P (m) · k + H) · F (k) / 8] · r |,
where m denotes the values of the controlled codec parameters;
k is the blocking factor (the number of frames per packet);
P (m) is the frame size for the given m, bits;
H is the size of the headers, bits;
F (k) is the packet sending frequency, 1 / s;
r is the number of transmissions of the same packet into the channel;
min | f (m, k, r) | is the operation of finding the minimum of the absolute value of the function f (m, k, r) over the controlled parameters.