RU2312405C2 - Method for realizing machine estimation of quality of sound signals

Info

Publication number: RU2312405C2
Application number: RU2005128572/09A
Authority: RU
Inventors: Михаил Николаевич Гусев (RU); Владимир Михайлович Дегтярев (RU); Игорь Вениаминович Жарков (RU)
Original assignee: Михаил Николаевич Гусев
Priority date: 2005-09-13
Filing date: 2005-09-13
Publication date: 2007-12-10
Also published as: RU2005128572A

Abstract

FIELD: analysis of sound signal quality; can be used to estimate the quality of speech transmitted over radio communication channels.

SUBSTANCE: in the method for machine estimation of sound signal quality, the signal is divided into critical bands and spectral energy values are computed for the critical bands, spectral similarity values are determined for the active phase of the fragments, and the quality of the tested sound signal is determined by a weighted linear combination of the quality values for each phase. The method differs in that the selected fragments of the active and inactive phases of both signals are synchronized, spectra of the inactive phase are determined for each fragment, the resulting spectra of the active and inactive phases of the fragments are divided into additional sets of bands, for each of which spectral energy values are computed, the resulting spectral energies of the active and inactive phases of the fragments are compared pairwise to determine spectral similarity coefficients, and the resulting similarity coefficient for each phase is determined as the average of the similarity coefficients over all sets of bands, which serves as the quality estimate for that phase.

EFFECT: universality of the estimation process and the ability to optimize its quality depending on the purpose of the estimation.

5 cl, 13 dwg, 6 tbl

Description

The invention relates to the analysis of sound signal quality and can be used to assess the quality of speech transmitted over radio channels, telephony, and intercom paths, as well as the quality of sound reproduced by various audio equipment, including sound that has undergone compression and restoration by various vocoders, and to assess the acoustic quality of rooms.

Assessing the quality of sound signals is becoming increasingly important as mobile communications, synthetic telephony systems, and various portable sound recording and reproducing devices spread. The desire for a method that ensures an objective assessment (that is, one independent of any particular person's judgment) and that can be automated is understandable: an objective assessment is needed both to compare competitors' products and to optimize the parameters of one's own.

One of the main indicators of systems for compressing, transmitting, and reproducing audio information is the quality of the restored, received, or reproduced sound.

Quantitative measurement of sound quality has specific features, because the ultimate receiver of a sound signal is always a person, and a person is also the source of most sound signals. Accordingly, the quality of sound signals is determined not only by the technical characteristics of sound processing and transmission systems, but also by the properties of the human speech apparatus and hearing, which vary over time and from person to person.

Methods of measuring speech quality are divided into subjective and objective. Subjective methods are those in which human hearing is an integral part of the measuring system. Objective methods, accordingly, exclude human hearing from the measurement process.

The most common subjective method for assessing speech quality (not necessarily speech, although usually speech) is the MOS (mean opinion score), an assessment on a five-point scale.

The MOS is determined by processing the ratings that groups of listeners give to several sound signals reproduced by various audio systems. Each listener rates each signal, and the results are then averaged.

Organizing and conducting subjective tests is a complex, lengthy, and costly procedure, so for many years work has been under way to find objective methods of assessing intelligibility that yield fast, automated assessments in good agreement with subjective tests.

Various assessment methods are known; some of them are listed below:

AI (Articulation Index). The entire frequency range of the speech signal is divided into 20 bands. The band widths are chosen so that each band contributes equally to speech perception. The signal-to-noise ratio is calculated in each band, and the articulation index is taken to be the weighted sum of the values over the bands.

The drawback of the articulation index is that, although it is oriented toward speech signals, it does not take into account the properties of hearing and speech production.
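The weighted sum over bands described above can be sketched as follows. This is only an illustration: the function name, the equal default weights, and the mapping of each band's signal-to-noise ratio onto the interval from 0 to 1 through an assumed -12 dB to +18 dB clipping range are assumptions made here, not values taken from the text.

```python
def articulation_index(snr_db, weights=None, lo=-12.0, hi=18.0):
    """Weighted sum of per-band SNR contributions.

    snr_db  - signal-to-noise ratio in dB for each band
    weights - per-band weights; equal weights if omitted (assumption)
    lo, hi  - assumed clipping range in dB used to map SNR onto [0, 1]
    """
    n = len(snr_db)
    if n == 0:
        raise ValueError("at least one band is required")
    if weights is None:
        weights = [1.0 / n] * n  # bands chosen to contribute equally
    if len(weights) != n:
        raise ValueError("one weight per band is required")
    total = 0.0
    for snr, w in zip(snr_db, weights):
        clipped = min(max(snr, lo), hi)          # limit the SNR to [lo, hi]
        total += w * (clipped - lo) / (hi - lo)  # band contribution in [0, 1]
    return total


# 20 bands with a 3 dB signal-to-noise ratio in each
index = articulation_index([3.0] * 20)
```

With equal weights the result stays between 0 (all bands at or below the lower limit) and 1 (all bands at or above the upper limit).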

SII (Speech Intelligibility Index) is a development of the AI method. It is included in the American standard ANSI S3.5-1997, which offers four measurement procedures on different groups of bands: critical bands (21 bands), one-third octave bands (18 bands), equally contributing critical bands (17 bands), and octave bands (6 bands). The signal-to-noise ratio is calculated in each band, and the overall SII coefficient, which lies between 0 and 1, is computed.

The speech intelligibility index takes into account only the properties of hearing, not those of speech production.

STI (Speech Transmission Index). A speech signal can be approximately regarded as a broadband signal modulated by a low-frequency signal. The modulation frequency is determined by the rate of articulation. Reducing the modulation depth makes the speech signal more like noise and reduces its intelligibility. Accordingly, a loss of intelligibility can be estimated from the reduction in modulation depth.

The entire speech range is divided into seven octave bands, and an octave-band noise signal is fed to the input of the system under test. The intensity distribution of the test signal matches that of a speech signal. The modulation frequencies range from 0.5 to 12.5 Hz at one-third octave intervals (14 frequencies in total).

The STI measurement method is specified in the international standard IEC 268-16.

RASTI/STIPA (Rapid Speech Transmission Index). The STI method requires a large number of measurements and calculations. A simplified method was developed that measures in only two bands at five modulation frequencies, which reduces the number of measurements and calculations. For good intelligibility, RASTI values should be at least 0.6.

The speech transmission index, like its rapid variant, imitates speech production with a noise model, but this way of accounting for the properties of speech production and hearing is far from optimal.

C50 (clarity coefficient) characterizes the clarity of the sound and is calculated as the ratio of the near (early) echo to the far (late) echo. The method is based on the fact that echo reduces the intelligibility of a signal. The ratio is measured in several frequency bands. The near echo (up to 33 ms) is treated as useful signal, and the far echo (beyond 33 ms) as interference.

The clarity coefficient accounts for only one type of possible distortion, so it is best used as one of several assessments of speech quality.
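The early-to-late ratio described above can be sketched for a single band as follows. The function name, the use of an impulse response as input, and the expression of the ratio in decibels are assumptions made for illustration; the boundary defaults to the 33 ms stated in the text and is left as a parameter.

```python
import math


def clarity_db(impulse_response, sample_rate, boundary_ms=33.0):
    """Ratio of early to late echo energy, in dB (illustrative sketch).

    impulse_response - samples of a room impulse response (assumed input)
    sample_rate      - sampling rate in Hz
    boundary_ms      - boundary between near and far echo, as given in the text
    """
    split = int(round(sample_rate * boundary_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])  # useful signal
    late = sum(x * x for x in impulse_response[split:])   # interfering signal
    if late == 0.0:
        return math.inf
    if early == 0.0:
        return -math.inf
    return 10.0 * math.log10(early / late)


# Toy response sampled at 1 kHz: 33 strong early samples, 66 weaker late ones
value = clarity_db([2.0] * 33 + [1.0] * 66, sample_rate=1000)
```

A per-band figure would be obtained by band-pass filtering the response first, which this sketch leaves out.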

A method is known for assessing the intelligibility of speech received through the intercom paths of personal respiratory protective equipment. It uses a converter of the speech message into an electrical signal and a set of recording and processing equipment to obtain the amplitude-frequency dependence of the speech message, determine the formants of equal intelligibility and their sensation level, and calculate the probability of formant reception, from which speech intelligibility is assessed. The method is characterized in that the converter is connected to the input of the sound adapter of a personal computer with a digitizing board, the information is converted from analog to digital form, and the digital information is processed to determine the output characteristics required for assessing intelligibility (application No. 2002133196).

The disadvantage of this method is that it does not fully take into account the properties of speech production: formants are characteristic only of vowels and voiced consonants. In addition, the method is applicable only to assessing speech intelligibility as a measure of the quality of a speech signal; it is not applicable to sound signals in general.

The closest technical solution to the claimed one is a method for machine assessment of the transmission quality of audio signals, in particular speech signals, in which the spectra of the transmitted source signal and of the received signal are determined in one frequency range, and a spectral similarity value corresponding to the transmission quality is determined by dividing the covariance of the spectra of the source signal and the received signal by the product of the standard deviations of the two spectra (RF Patent No. 2232434).
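Dividing the covariance of two spectra by the product of their standard deviations gives the correlation coefficient of the two spectra. A minimal sketch follows; the function name and the choice to return 0 for a flat spectrum are assumptions made for illustration.

```python
import math


def spectral_similarity(source_spectrum, received_spectrum):
    """Covariance of two spectra divided by the product of their standard deviations."""
    n = len(source_spectrum)
    if n != len(received_spectrum) or n < 2:
        raise ValueError("spectra must have the same length (at least 2)")
    mean_s = sum(source_spectrum) / n
    mean_r = sum(received_spectrum) / n
    cov = sum((s - mean_s) * (r - mean_r)
              for s, r in zip(source_spectrum, received_spectrum)) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in source_spectrum) / n)
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in received_spectrum) / n)
    if std_s == 0.0 or std_r == 0.0:
        return 0.0  # assumption: a flat spectrum carries no shape to compare
    return cov / (std_s * std_r)


# A received spectrum that is a scaled copy of the source is fully similar
similarity = spectral_similarity([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The value lies between -1 and 1 and does not change when either spectrum is scaled by a positive gain.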

In addition, the spectral similarity values are weighted by a coefficient that depends on the ratio of the spectral energy of the received signal to that of the source signal. This accounts for interference, since the higher the energy of the received signal, the more the similarity value is reduced.

Before processing, active and inactive phases are extracted from the source signal and the received signal: signal fragments whose energy exceeds a predetermined threshold are assigned to the active phases, and the remaining fragments are classified as pauses. Pauses and the interference in them are thus separated and given less weight than the active phases of the signals.
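The threshold-based split into active phases and pauses can be sketched as follows. The fixed, non-overlapping frames, the use of mean squared amplitude as the energy measure, and all names are assumptions made for illustration.

```python
def split_phases(samples, frame_len, threshold):
    """Classify fixed-length frames as active or pause by their mean energy.

    Returns two lists of (start, end) sample index pairs.
    Trailing samples that do not fill a whole frame are ignored.
    """
    active, pauses = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy > threshold:
            active.append((start, start + frame_len))
        else:
            pauses.append((start, start + frame_len))
    return active, pauses


# Silence, a loud burst, then low-level noise
signal = [0.0] * 4 + [1.0] * 4 + [0.01] * 4
active, pauses = split_phases(signal, frame_len=4, threshold=0.1)
```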

Accordingly, the spectral similarity value is determined only for fragments of the received signal and the source signal that belong to the active phase. For the inactive phases, a quality function is used that depends on the maximum and average energy over the pause interval and that decreases degressively.

Before the active-phase signals are transformed to the frequency domain, temporal masking is performed, for which the signals are divided into time blocks such that successive blocks overlap by a substantial part, up to 50%; before temporal masking, the spectral components are compressed by raising them to a power with an exponent less than 1.

The resulting spectra of the source and received signals are divided into critical bands (according to Zwicker's model), and similarity coefficients are calculated for them. Before the similarity value is determined, each spectrum is convolved with a frequency-asymmetric smearing function, and before the convolution the spectral components are expanded by raising them to a power with an exponent greater than 1.

The transmission quality is calculated as a weighted linear combination of the similarity value of the active phase and the quality value of the inactive phase.

The main disadvantages of the prototype are:

- in practice only the active phases of the source and received (tested) signals are processed, which reduces the objectivity of the assessment;

- the method does not take into account the properties of speech production, since the Zwicker critical bands used by its authors reflect only the properties of hearing;

- the method accounts for the perception of the inactive phase only by loudness level, which also reduces the accuracy of the assessment.

The objective of the proposed invention is to develop a method for obtaining an objective assessment of sound signal quality that can be used in the fields of application indicated above.

The technical result is achieved by modifying the known method of machine assessment of sound signal quality, in which fragments of the active and inactive phases are extracted from the source signal and the tested signal, the spectrum of the active phase is determined, spectral energy values in the critical bands and similarity values are calculated, and the quality of the tested sound signal is determined by a weighted linear combination of the values obtained for each phase. The modifications are as follows:

- the extracted fragments of the active and inactive phases are synchronized in time;

- the spectra of the inactive-phase fragments are additionally determined;

- the resulting spectra of the fragments of both phases are divided into additional sets of bands, for which spectral energy values are calculated;

- the fragments are compared;

- the resulting similarity coefficient for each phase is determined as the average of the similarity coefficients of the sets of bands over all fragments.

The quality of the tested sound signal is then assessed from these results.
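The last two steps, averaging the similarity coefficients for each phase and combining the two phase scores, can be sketched as follows. The data layout (one list of per-band-set coefficients for each fragment), the function names, and the weights 0.8 and 0.2 are hypothetical and chosen only for illustration.

```python
def phase_score(coefficients_by_fragment):
    """Average the similarity coefficients of all band sets over all fragments.

    coefficients_by_fragment - one inner list per fragment, holding one
    similarity coefficient per set of bands (assumed layout).
    """
    values = [c for fragment in coefficients_by_fragment for c in fragment]
    if not values:
        raise ValueError("no similarity coefficients supplied")
    return sum(values) / len(values)


def overall_quality(active_score, inactive_score, w_active=0.8, w_inactive=0.2):
    """Weighted linear combination of the two phase scores (weights are hypothetical)."""
    return w_active * active_score + w_inactive * inactive_score


# Two active fragments and one pause fragment, two band sets each
active = phase_score([[0.9, 0.7], [0.8, 0.6]])
inactive = phase_score([[0.5, 0.5]])
quality = overall_quality(active, inactive)
```

Changing the weights shifts the emphasis between the active phase and the pauses, depending on the purpose of the assessment.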

In addition:

- either an arbitrary sound signal or a specialized set of signals can be used as the source signal;

- the spectra of the active-phase and inactive-phase fragments are determined using the discrete cosine transform;

- logarithmic bands, resonator bands, and various known critical bands can be used as the additional sets of bands;

- the number and composition of the sets of bands can be varied in different combinations to determine the similarity coefficient of each phase.

The essence of the proposed invention is illustrated by Figures 1-3; Figures 4-8 illustrate an example implementation, and Figures 9-13 show possible uses:

Fig. 1 - high-level algorithm for assessing sound signal quality;

Fig. 2 - algorithm for comparing signal fragments by bands;

Fig. 3 - general algorithm for synchronizing the source and tested signals;

Fig. 4 - algorithm for filtering VAD outliers;

Fig. 5 - operating algorithm of the synchronizer block (beginning);

Fig. 6 - operating algorithm of the synchronizer block (continued);

Fig. 7 - operating algorithm of the synchronizer block (continued);

Fig. 8 - operating algorithm of the synchronizer block (end);

Fig. 9 - example of assessing the quality of sound transmitted through a telephone network;

Fig. 10 - example of assessing the quality of sound transmission over VoIP;

Fig. 11 - example of assessing the quality of sound transmission in cellular and satellite communication networks;

Fig. 12 - example of the use of quality assessments by a group of developers of sound processing system(s);

Fig. 13 - example of assessing the acoustic quality of rooms.

The need to develop new methods and improve existing ones arises from the desire to bring objective quality assessments closer to subjective ones and from the need to take into account the properties of hearing and speech production.

Whether an arbitrary or a specialized signal is used as the source signal depends on the purpose of the assessment (determining speech intelligibility, sound reproduction quality, the quality of speech received through intercom paths, and so on), and choosing accordingly increases the objectivity of the assessment.

Almost any sound signal can be divided into active and inactive phases. The former corresponds to active sound processes, the latter to low-level background noise. The simplest way to separate them is by signal energy level, but this approach is not very accurate. The proposed method separates the signal into active and inactive phases with the known VAD algorithm specified in Recommendation G.723 (as a component of the vocoder of the same name).

The source and tested sound signals are analyzed and divided into active and inactive phases (Fig. 1). The fragments of the active and inactive phases are then synchronized (fragments of the same type are aligned in time) and analyzed by separate blocks using the same algorithm. The synchronization algorithm is described below.

Comparing the aligned pairs of active-phase and inactive-phase fragments separately increases the accuracy of the resulting assessment.

For each fragment an integral spectrum is determined using the discrete cosine transform (DCT), which for achieving the technical result has some advantage over the fast Fourier transform (FFT).

The spectrum is integrated according to formula (1):

where j = 0...N/2-1 are the indices of the spectral energy values;

i is the number of the integration step;

N is the number of signal samples used in calculating the spectrum;

- the resulting averaged value of the spectrum;

- the averaged value of the spectrum at the previous step;

Sp_{i,j} is the spectrum value obtained using the DCT.

When the integral spectrum is calculated, the windows overlap by N/2 samples, and a known window function, Hamming or Blackman-Harris, is applied to each window.
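The windowing scheme can be sketched as follows: windows of N samples taken with a hop of N/2, a Hamming window applied to each, a DCT of each windowed frame, and the results averaged over the windows. This is not formula (1): the plain arithmetic mean, the use of squared DCT-II coefficients as the spectrum, and the fact that all N coefficients are kept (rather than the N/2 values indexed by j) are assumptions made for illustration.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def dct2(x):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def averaged_spectrum(signal, n):
    """Mean of squared DCT coefficients over Hamming windows overlapping by n/2."""
    hop = n // 2
    window = hamming(n)
    accumulated = [0.0] * n
    count = 0
    for start in range(0, len(signal) - n + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n)]
        coefficients = dct2(frame)
        for k in range(n):
            accumulated[k] += coefficients[k] ** 2  # assumed energy measure
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one window")
    return [value / count for value in accumulated]


# A constant signal of 16 samples gives three windows of 8 samples
spectrum = averaged_spectrum([1.0] * 16, n=8)
```

For a constant input the energy concentrates in the first coefficient, which is a quick sanity check on the transform.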

For each selected set of bands, the spectral energy levels in the bands are determined. Several groups of critical bands are known, defined by different authors on the basis of different models of sound perception and speech production.
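Computing band energies for several sets of bands can be sketched as follows. The band edges below are hypothetical placeholders rather than any of the critical-band sets discussed in the text, and the sketch assumes that the spectrum values are spaced uniformly from 0 Hz up to half the sampling rate.

```python
def band_energies(spectrum, sample_rate, band_edges_hz):
    """Sum the spectral energy falling into each band.

    spectrum      - spectral energy values, assumed to cover 0 Hz to
                    sample_rate / 2 in uniform steps
    band_edges_hz - ascending band boundaries; band b spans
                    [band_edges_hz[b], band_edges_hz[b + 1])
    """
    bin_hz = (sample_rate / 2.0) / len(spectrum)
    energies = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        energies.append(sum(value for j, value in enumerate(spectrum)
                            if low <= j * bin_hz < high))
    return energies


# Hypothetical band sets used only to show the calling pattern
band_sets = {
    "coarse": [0.0, 1000.0, 2000.0, 4000.0],
    "fine": [0.0, 500.0, 1000.0, 1500.0, 2000.0, 3000.0, 4000.0],
}
flat_spectrum = [1.0] * 8
levels = {name: band_energies(flat_spectrum, 8000, edges)
          for name, edges in band_sets.items()}
```

Each set of bands yields its own vector of energies, which can then be compared between the source and tested fragments.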

Слуховой аппарат человека является нелинейной системой, что приводит к возникновению явления, называемого маскировкой. Маскировка возникает при прослушивании сообщения на фоне помех, или маскирующих звуков.The human hearing system is a non-linear system, which leads to the appearance of a phenomenon called masking. Masking occurs when listening to a message against a background of clutter, or masking sounds.

В результате исследования маскировки гармонических сигналов узкополосным шумом Цвикер определил, что весь спектр слышимых частот можно разделить на частотные группы или полосы, выделяемые слухом человека. До Цвикера аналогичный вывод был сделан Флетчером, назвавшим выделенные частотные группы критическими полосами слуха.As a result of the study of masking harmonic signals with narrow-band noise, Zwicker determined that the entire spectrum of audible frequencies can be divided into frequency groups or bands emitted by the human ear. Before Zwicker, a similar conclusion was made by Fletcher, who called the selected frequency groups critical hearing bands.

Критические полосы, определенные Флетчером и Цвикером, различаются, т.к. первый определял полосы с помощью маскировки шумом, а второй - из соотношений воспринимаемой громкости.The critical bands defined by Fletcher and Zwicker differ because the first determined the bands using noise masking, and the second from the ratios of perceived loudness.

Sapozhkov defined a critical band as "a strip of the speech frequency range that is perceived as a single whole." In his early studies he even suggested that the audio signal in a band could be replaced by an equivalent tone, but this assumption did not withstand experimental verification. The critical bands defined by Sapozhkov differ from those of Fletcher and Zwicker because Sapozhkov proceeded from the properties of the speech signal.

Pokrovsky also determined critical bands from the properties of the speech signal. The bands defined by Pokrovsky give formants an equal probability of falling into each band.

The spectral energy in the bands can be used for various purposes, one of which is assessing the quality of an audio signal. However, using the critical bands of only one author (the prototype, for example, uses Zwicker's critical bands) does not yield a sufficiently objective estimate, because such bands reflect only one aspect of either perception or speech production. In the proposed invention the spectral energy can be determined on various critical bands as well as on logarithmic and resonator bands, which makes it possible to account for more features of hearing and speech production.

Because the bands defined by Pokrovsky and Sapozhkov suit speech signals better than audio signals in general, taking this into account improves the accuracy of the estimate, depending on its purpose. Table 1 lists the critical bands according to the different authors.

The following notation is used:

Fc is the center frequency of the band;

L is the bandwidth.

Table 1. Critical bands defined by different authors (Fc and L in Hz).

| No. | Zwicker Fc | Zwicker L | Pokrovsky Fc | Pokrovsky L | Fletcher Fc | Fletcher L | Sapozhkov Fc | Sapozhkov L |
|---|---|---|---|---|---|---|---|---|
| 1 | 51 | 80 | 260 | 320 | 200 | 53 | 200 | 60 |
| 2 | 150 | 100 | 495 | 150 | 300 | 50 | 300 | 60 |
| 3 | 250 | 100 | 640 | 140 | 400 | 50 | 500 | 60 |
| 4 | 350 | 100 | 787 | 155 | 500 | 50 | 800 | 70 |
| 5 | 450 | 110 | 947 | 165 | 600 | 53 | 1000 | 80 |
| 6 | 570 | 120 | 1125 | 190 | 700 | 54 | 1500 | 100 |
| 7 | 700 | 140 | 1315 | 190 | 800 | 58 | 2000 | 130 |
| 8 | 840 | 150 | 1505 | 190 | 900 | 60 | 3000 | 200 |
| 9 | 1000 | 160 | 1690 | 180 | 1000 | 63 | 5000 | 300 |
| 10 | 1170 | 190 | 1870 | 180 | 1250 | 71 | 8000 | 600 |
| 11 | 1370 | 210 | 2050 | 180 | 1500 | 80 | | |
| 12 | 1600 | 240 | 2230 | 180 | 1750 | 87 | | |
| 13 | 1850 | 280 | 2435 | 230 | 2000 | 98 | | |
| 14 | 2150 | 320 | 2725 | 350 | 2500 | 120 | | |
| 15 | 2500 | 380 | 3100 | 400 | 3000 | 141 | | |
| 16 | 2900 | 450 | 3480 | 360 | 4000 | 200 | | |
| 17 | 3400 | 550 | 3855 | 390 | 5000 | 276 | | |
| 18 | 4000 | 700 | 4530 | 960 | 6000 | 370 | | |
| 19 | 4800 | 900 | 6130 | 2240 | 7000 | 480 | | |
| 20 | 5800 | 1100 | 8625 | 2750 | 8000 | 590 | | |
| 21 | 7000 | 1300 | | | | | | |
| 22 | 8500 | 1800 | | | | | | |
| 23 | 10500 | 2500 | | | | | | |
| 24 | 13500 | 3500 | | | | | | |

Additionally, it is proposed to use logarithmic bands, or bands of equal loudness. The idea is simple: loudness is proportional to ten times the logarithm of the energy. The boundaries of the logarithmic bands are determined from a recording of a phonetically representative text (a well-known text developed at the Department of Phonetics of St. Petersburg State University) read by speakers of different sex and age.

The vocal tract is a complex acoustic system. Its acoustics are non-stationary and nonlinear. As the articulatory organs move, the shape and volume of the upper resonator change, which is how the speech function is carried out. The pitch of the voice is determined by the number of vocal cord vibrations per second, as well as by the length of the cords, their tension and the position of the epiglottis. The loudness of the sound is determined by the force with which the vocal cords close and by the force of exhalation. The timbre changes with the position of the larynx and the epiglottis.

Owing to the anatomy of the speech apparatus and to skill in using the resonators, in some people the harmonic components of sounds are amplified or attenuated. The upper resonator and the pharynx have the main influence on phonation. The nasal cavity and the paranasal sinuses also act as resonators, reinforcing the tones of the voice and giving it an individual timbre.

The resonator bands characteristic of various speech sounds were determined by V. N. Sorokin (Table 2). Taking resonator bands into account is useful when determining the quality of audio signals, especially speech signals. Resonator bands can also be used to determine how well individual sounds are reproduced.

The subscripts of the center frequencies and bandwidths follow Sorokin. F_x corresponds to Fc, and L_x to L.

Table 2. Resonator bands.

| No. | Sound | F_p | L_p | F₁ | L₁ | F₂ | L₂ | F₃ | L₃ | F₄ | L₄ | F₅ | L₅ | F₆ | L₆ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | «А» (a) | 273.5 | 72.4 | 574.6 | 78.1 | 994.1 | 48.3 | 2404.8 | 77.7 | 2711.4 | 102.5 | 3796.5 | 145.6 | 4735.3 | 221.8 |
| 2 | «О» (o) | 287.6 | 72.4 | 497.1 | 100.9 | 914.2 | 47.1 | 2316.4 | 67.9 | 2635.1 | 87.6 | 4030.9 | 142.3 | 4728.3 | 189.5 |
| 3 | «У» (u) | 296.8 | 72.4 | 408.6 | 149.2 | 858.0 | 41.9 | 2042.8 | 54.2 | 2761.3 | 71.2 | 3612.3 | 92.4 | 4434.3 | 122.7 |
| 4 | «И» (i) | 287.7 | 72.4 | 393.5 | 54.9 | 2272.1 | 66.1 | 3094.6 | 77.6 | 4003.6 | 83.7 | 5047.3 | 117.0 | 6103.5 | 133.6 |
| 5 | «Ы» (y) | 302.6 | 72.4 | 485.7 | 85.5 | 1378.4 | 47.0 | 1847.7 | 46.3 | 2574.5 | 63.3 | 3732.5 | 97.7 | 4421.9 | 124.8 |
| 6 | «Э» (e) | 279.0 | 72.4 | 490.9 | 73.1 | 1353.0 | 41.4 | 2235.0 | 60.8 | 2775.0 | 78.5 | 3575.7 | 109.4 | 4226.4 | 141.3 |
| 7 | «С» (s) | 325.4 | 72.4 | 482.7 | 72.7 | 1619.4 | 45.7 | 2861.0 | 72.7 | 4029.8 | 106.3 | 4406.1 | 115.9 | 5290.6 | 153.9 |
| 8 | «Ш» (sh) | 335.1 | 72.4 | 473.4 | 97.5 | 1439.9 | 53.7 | 2101.6 | 57.1 | 2528.8 | 62.8 | 3159.8 | 72.9 | 4516.78 | 117.3 |
| 9 | «Х» (kh) | 349.9 | 72.4 | 543.8 | 91.9 | 1459.7 | 54.8 | 2035.0 | 53.5 | 2915.1 | 78.5 | 3699.1 | 93.5 | 4540.6 | 120.5 |
| 10 | «Ф» (f) | 274.9 | 72.4 | 338.9 | 83.2 | 1024.6 | 37.4 | 2110.2 | 43.2 | 2694.5 | 53.5 | 3872.9 | 78.0 | 4798.0 | 104.9 |

In addition, "importance coefficients" of the bands can be determined, on the assumption that the lower the integral energy in a band, the more important that band is for speech perception. Accordingly, when assessing the quality of audio signals in general it is reasonable to treat the bands as equally important, and when assessing the quality of speech signals transmitted over intercom paths, to take the importance coefficients into account.
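One way to turn this assumption into numbers is sketched below. The source states only the direction of the dependence (less energy, more importance), so the inverse-proportional rule, the normalization to a sum of 1 and the function name are all assumptions made for illustration.

```python
def importance_coefficients(band_energy):
    """Weights that grow as band energy falls, normalised to sum to 1.

    Assumption: inverse proportionality to the integral band energy.
    All energies must be positive.
    """
    inverse = [1.0 / e for e in band_energy]
    total = sum(inverse)
    return [v / total for v in inverse]
```

For two bands with integral energies 1 and 3, this rule would give weights 0.75 and 0.25.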

The band boundaries (start and end indices) are determined by the following formulas:

where nSpecLen is the number of points in the spectrum (N/2);

SampleRate is the signal sampling rate;

n is the band number.
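The boundary formulas themselves are not reproduced in the text. The sketch below shows one plausible mapping from a band's center frequency Fc and width L to spectrum indices, assuming the nSpecLen points are spread evenly over 0 to SampleRate/2; that assumption, the clamping and the function name are not from the source.

```python
def band_indices(fc, width, n_spec_len, sample_rate):
    """Return inclusive (start, end) spectrum indices for one band.

    Assumption: spectrum points are evenly spaced over 0..sample_rate/2.
    """
    hz_per_point = (sample_rate / 2) / n_spec_len
    start = int((fc - width / 2) / hz_per_point)
    end = int((fc + width / 2) / hz_per_point)
    # clamp to the valid index range so bands above Nyquist do not overflow
    start = max(0, min(start, n_spec_len - 1))
    end = max(start, min(end, n_spec_len - 1))
    return start, end
```

Under this assumption, with nSpecLen = 512 and SampleRate = 8000 each point spans 7.8125 Hz, so Zwicker band 9 of Table 1 (Fc = 1000, L = 160) maps to indices 117 through 138.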

The energies in the bands are defined as:

where the spectrum values entering the formula are the values of the integrated spectrum (taken equal to the averaged spectrum obtained on the last window of the fragment).
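A minimal sketch of the band-energy step follows. Since the formula is not reproduced in the text, it is assumed here that a band's energy is the plain sum of the integrated-spectrum values between the band's start and end indices, inclusive; the function name is hypothetical.

```python
def band_energies(spectrum, bands):
    """Sum integrated-spectrum values over each inclusive (start, end) range.

    Assumption: band energy is an unweighted sum of spectrum values.
    """
    return [sum(spectrum[start:end + 1]) for start, end in bands]
```

Here `bands` is a list of index pairs, one per band of the chosen set.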

The band-by-band comparison algorithm (for one set of bands) is shown in Fig. 2. The initial quality estimate is taken to be 100%. It is then reduced in proportion to the difference between the band energies. Quality estimates are determined for each set of bands. The quality estimate over all sets of bands is defined as the mean of the individual estimates, by the formula:

where Nk is the number of band tables used;

k is the number of the current table;

dQ_k is the estimate obtained for the k-th band table;

- the integral estimate over all tables.
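The two steps above can be illustrated as follows. The averaging over the Nk tables follows directly from the text; the per-set scoring rule is only an assumption, because the Fig. 2 algorithm is not reproduced here. In this sketch each band can remove at most an equal share of the 100% in proportion to its relative energy difference. Function names are hypothetical.

```python
def set_quality(ref_energies, test_energies):
    """Quality score in percent for one set of bands.

    Assumption: each band lowers the score by its relative energy
    difference, with the 100% shared equally among the bands.
    """
    score = 100.0
    share = 100.0 / len(ref_energies)
    for ref, test in zip(ref_energies, test_energies):
        largest = max(ref, test)
        if largest > 0:
            score -= share * abs(ref - test) / largest
    return score

def overall_quality(set_scores):
    """Mean of the per-set scores dQ_k over the Nk band tables."""
    return sum(set_scores) / len(set_scores)
```

Identical band energies leave the score at 100%, and losing all energy in one of two bands halves it under this rule.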

The quality estimate for each phase is determined as the mean over all pairs of fragments:

where

- the resulting integral value of the quality loss coefficient;

- the integral value of the quality coefficient at the previous step;

- the value of the quality coefficient for the pair of fragments with number t;

- the value of the quality coefficient for the first pair of fragments;

t is the number of the pair of fragments.
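The definitions above (a value at the previous step, a value for pair t, and a separate value for the first pair) suggest a mean that is updated pair by pair. The sketch below assumes the standard incremental-mean update; the exact formula is not reproduced in the text, and the function name is hypothetical.

```python
def phase_quality(pair_scores):
    """Running mean of per-pair quality coefficients for one phase.

    Assumption: incremental mean, initialised with the first pair's value.
    Returns None if there are no pairs.
    """
    mean = None
    for t, score in enumerate(pair_scores, start=1):
        if t == 1:
            mean = score  # the first pair initialises the estimate
        else:
            mean = mean + (score - mean) / t
    return mean
```

Updating incrementally avoids storing every per-pair score, which suits processing fragments as they arrive.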

The resulting quality estimate for the whole signal (dQGlobal) is defined as the sum of the weighted quality estimates of the active phase (Active) and the inactive phase (Pause):
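The weighted combination might be written as follows. The weight symbols w_A and w_P and the overline notation for the per-phase estimates are assumptions introduced here; the source gives neither the weights nor the original symbols, and states only that the result is a weighted sum of the two per-phase estimates.

```latex
dQ_{Global} = w_{A}\,\overline{dQ}(\mathrm{Active}) + w_{P}\,\overline{dQ}(\mathrm{Pause})
```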

The general signal synchronization algorithm is shown in Fig. 3. The signal synchronizer receives signal segments (pDATA), each equal in duration to one VAD frame, together with the VAD activity flags for those segments. There are two inputs: one for the reference (original) signal and one for the signal under test.

Before synchronization, outliers in the VAD activity flags are filtered: the activity flag on short runs (shorter than a threshold duration) is set equal to the activity flag of the surrounding signal.

After the filter, the state flags and signal frames are passed to the synchronizer blocks, which align fragments of active signal and pauses. The blocks share common data: the active reference signal buffer (EBuffer1), the active test signal buffer (TBuffer1), the reference signal pause buffer (EBuffer0), the test signal pause buffer (TBuffer0) and the readiness flags of the active-signal and pause buffers (dReady[0...1]); a synchronization error counter (dErrorCounter) is also provided.

The synchronizer outputs either a pair of buffers with active signal or a pair of buffers with pauses. Either synchronizer block can trigger the output of a pair of synchronized buffers.

Depending on the activity flag, the synchronized buffers are passed to the comparison unit for active fragments or for pauses (Fig. 1).

The proposed method is still being evaluated for assessing the quality of telephone channels and IP telephony. Optimal synchronization algorithms are being sought, and the relationship between the quality estimate and syllable intelligibility is being refined.

An implementation of the method is described below. The proposed method for estimating the quality of audio signals is implemented on a personal computer using software developed by the inventors. The method is implemented as a program for assessing the quality of vocoders and for comparing external original and test signals.

Arbitrary signals recorded at a sampling rate of 8 kHz with 16-bit samples can be used as external signals. The test signal is assumed to have been obtained from the original signal by some transformation (for example, compression/decompression, transmission over communication channels, or filtering).

In addition, a recording of a phonetically representative text read by several speakers of different sex and age can be used as the external original signal.

The internal original signals (signals to which the user of the program has no access) are signals generated according to a noise model (the generator is described below) and signals generated from a statistical model.

The internal signals are fed to an implementation of an audio compression/decompression system, supplied as a DLL with an agreed interface. DLLs developed either by the authors of the proposed method or by third parties may be used. A signal processed by the methods contained in the DLL is treated as the test signal and undergoes the quality estimation procedure described above.

Fig. 4 shows the VAD outlier filtering algorithm. Its input data are the signal segments pDATA and the VAD activity flags dVAD. Table 3 lists the variable names, their purpose and their initial values. Besides the variables, the algorithm uses three constants: the threshold for correcting pauses to the active state (dBound[0]=6), the threshold for correcting the active state to a pause (dBound[1]=4) and the length of the delay line (dDLSize=max(dBound[])+1).

The constant values used were determined experimentally (for the case of assessing signals that have undergone compression/decompression) and may be changed in an implementation to synchronize particular signals better.

Table 3. Variables used by the VAD outlier filter.

| Variable | Purpose | Initial value |
|---|---|---|
| dVAD | Value of the activity flag arriving at the input of the algorithm | - |
| pDATA | Array of signal samples with a length equal to one VAD frame | - |
| dState | Activity flag of the segment (the previous value of the activity flag) | -1 |
| dSLen | Number of consecutive frames with the same activity flag | 0 |
| dNDLFrames | Total number of frames received at the input of the algorithm | 0 |
| DelayLine[] | Delay line; stores activity flags and sample arrays | - |

The algorithm checks the activity flag of the current signal block. If the flag matches the currently accepted state, the incoming frame is simply appended to the delay line, and the first element of the delay line is passed to the input of the synchronizer block.

If the flag does not match the currently accepted state, the algorithm checks whether this is the first frame of the signal. The first frame is simply placed in the delay line, and its activity flag is taken as the current state.

If the activity of the incoming signal changes during filtering, the number of frames received in the previous state is checked. If that number is below the set threshold, the activity flag of those frames is flipped; otherwise the current state is simply changed and the counter of frames received in the current state is reset. After all state-change operations, the frame is placed in the delay line.

The algorithm terminates on receiving the end-of-signal flag. All the accumulated signal, if any, is then passed to the input of the synchronizer block, followed by the end-of-signal flag.
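The effect of the filter can be sketched offline on a complete list of activity flags. This is a simplified illustration, not the streaming algorithm of Fig. 4: it relabels short runs in place instead of using a delay line, and the function name is hypothetical. The thresholds are the dBound values given in the text.

```python
D_BOUND = (6, 4)  # dBound[0]: pause runs under 6 frames; dBound[1]: active runs under 4

def filter_vad(flags, bound=D_BOUND):
    """Flip runs of 0/1 activity flags that are shorter than the threshold.

    A run is examined when the flag changes; the final run is left as is
    because no state change follows it.
    """
    out = []
    run_start = 0  # index in `out` where the current run begins
    for flag in flags:
        if out and flag != out[-1]:
            run_len = len(out) - run_start
            if run_len < bound[out[-1]]:
                # short run: relabel it with the flag of the incoming frame
                for k in range(run_start, len(out)):
                    out[k] = flag
                # the relabelled run merges with the run before it
                while run_start > 0 and out[run_start - 1] == flag:
                    run_start -= 1
            else:
                run_start = len(out)
        out.append(flag)
    return out
```

A streaming version cannot relabel frames it has already emitted, which is presumably why the source holds max(dBound[])+1 = 7 frames in the delay line before passing them on.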

The signals are synchronized by a pair of synchronizer blocks that work with the shared variables described above. The operation of a synchronizer block is shown in Figs. 5-8.

Synchronizer block 0 processes the reference signal (Fig. 5), and block 1 the test signal. The algorithms of the two blocks are identical, and the blocks use cross-references to the buffers. In block 0, XBuffer0 is the pause buffer of the reference signal and its cross-referenced counterpart is the pause buffer of the test signal; conversely, in block 1, XBuffer0 is the pause buffer of the test signal and its counterpart is that of the reference signal.

Similarly, in block 0, XBuffer1 is the active-signal buffer of the reference signal and its counterpart is that of the test signal; conversely, in block 1, XBuffer1 is the active-signal buffer of the test signal and its counterpart is that of the reference signal.

По получению признака конца сигнала алгоритм завершает свою работу. Ветка останова представлена на фиг.8.Upon receipt of the sign of the end of the signal, the algorithm completes its work. The stop branch is shown in Fig. 8.

В зависимости от признака активности VAD сигнал помещается либо в буфер пауз, либо в буфер активного сигнала. Если размер буфера превышает пороговое значение, то производится выдача синхронизированных буферов на модуль сравнения. Ветки, выдающие синхронизацию по размеру буфера, представлены на фиг.7.Depending on the sign of VAD activity, the signal is placed either in the pause buffer or in the active signal buffer. If the buffer size exceeds the threshold value, then synchronized buffers are issued to the comparison module. Branches issuing synchronization by the size of the buffer are presented in Fig.7.

После помещения сигнала в буфер проверяется текущее состояние активности сигнала. Если оно прежнее, то производится переход к началу и ожидаются новые данные. При изменении состояния проверяется, не была ли это первая порция данных? Если «да», то принимается ее состояние активности и осуществляется переход на начало.After placing the signal in the buffer, the current state of signal activity is checked. If it is the same, then a transition to the beginning is made and new data is expected. When the state changes, it is checked if this was the first piece of data? If “yes”, then its state of activity is accepted and the transition to the beginning is carried out.

Если нет, то увеличивается признак готовности сигнала в данном состоянии, после чего проверяется не готовы ли оба сигнала, т.е. участки активного сигнала или паузу синхронизированы. Если есть синхронизированные фрагменты сигнала, переходим к ветке, представленной на фиг.6. Если нет, то переход на начало алгоритма.If not, then the sign of signal availability in this state increases, after which it is checked whether both signals are ready, i.e. sections of the active signal or pause are synchronized. If there are synchronized signal fragments, go to the branch shown in Fig.6. If not, then go to the beginning of the algorithm.

По текущему состоянию определяется, была ли найдена синхронизация для пауз или для активного сигнала. Проверяем результат синхронизации на ошибку путем сравнения с нулем размеров буферов (своего и буфера из параллельного блока) сигнала. Ели хоть один из них равен нулю, то произошла ошибка синхронизации.The current state determines whether synchronization was found for pauses or for the active signal. We check the synchronization result for an error by comparing with zero the size of the buffers (ours and the buffer from the parallel block) of the signal. If at least one of them is equal to zero, then a synchronization error has occurred.

Если все в порядке синхронизированные буфера выдаются на вход модуля сравнения. Если нет - то счетчик ошибок увеличивается, буфера сбрасываются, изменяется состояние активности и происходит возврат к ожиданию новой порции данных.If everything is in order, the synchronized buffers are issued to the input of the comparison module. If not, then the error counter is increased, the buffers are discarded, the state of activity changes and there is a return to waiting for a new piece of data.

Before buffers are handed over because the segment size has been exceeded, the size of the parallel buffer is checked (Fig. 7). If the buffer of the parallel block is empty, the buffers are reset and the synchronization error counter is incremented. If data is present in both buffers, the synchronized fragments are passed to the signal comparison module.
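The empty-buffer check applied before a hand-off can be sketched as follows. This is a minimal illustration rather than the implementation described in the patent: the names `SyncState` and `hand_off` are hypothetical, buffers are modeled as plain Python lists, and clearing both buffers after a successful hand-off is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SyncState:
    """Hypothetical bookkeeping: error counter and pairs given to the comparator."""
    errors: int = 0
    delivered: list = field(default_factory=list)


def hand_off(own, parallel, state):
    """Pass a synchronized buffer pair on, or count a synchronization error.

    Returns True if the pair was delivered, False if either buffer was empty.
    """
    if len(own) == 0 or len(parallel) == 0:
        # An empty buffer on either side is treated as a synchronization error.
        state.errors += 1
        own.clear()
        parallel.clear()
        return False
    # Copy the fragments before clearing so the comparator keeps its own data.
    state.delivered.append((list(own), list(parallel)))
    own.clear()
    parallel.clear()
    return True
```

A caller would invoke `hand_off` whenever a segment boundary is reached and read `state.errors` afterwards to see how many fragments were lost.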

Before the work is finished, a check is made whether data remains in the pause buffers and the active-signal buffers. If so, the corresponding synchronized pair (or pairs) of signals is passed to the comparison module, after which the comparison module is given an end-of-signal flag.

Next, the integral spectra of the extracted and aligned fragments are calculated in accordance with the description of the method given above. The spectra are calculated with a 1024-point discrete cosine transform, which determines the band boundaries with sufficient accuracy for an 8 kHz signal.
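To make the resolution argument concrete: assuming the transform is a DCT-II over blocks of N = 1024 samples (the text does not name the DCT variant), coefficient k corresponds to a frequency of k·fs/(2N), i.e. about 3.9 Hz per coefficient at fs = 8 kHz. The sketch below uses a direct, unnormalized DCT-II written in plain Python; the function names are hypothetical, and the demonstration uses a 64-point block only to keep the direct form fast.

```python
import math


def dct2(samples):
    """Unnormalized DCT-II of a sequence of samples (direct O(N^2) form)."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def bin_frequency(k, n=1024, fs=8000.0):
    """Frequency in Hz represented by DCT-II coefficient k of an n-point block."""
    return k * fs / (2 * n)


# Demonstration: a cosine that matches basis function k = 5 of a 64-point block
# concentrates its energy in coefficient 5.
n = 64
tone = [math.cos(math.pi * (i + 0.5) * 5 / n) for i in range(n)]
spectrum = dct2(tone)
peak = max(range(n), key=lambda k: abs(spectrum[k]))
```

With these definitions `bin_frequency(1)` gives the spacing between adjacent coefficients for the 1024-point, 8 kHz case.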

Table 4 lists the importance coefficients of the individual bands, determined in accordance with the description of the method. The coefficients are defined for signals recorded at a sampling rate of 8 kHz.

Table 4. Importance coefficients of the critical bands

No.   Zwicker             Pokrovsky           Fletcher            Sapozhkov
      Vc_log    Vc_line   Vc_log    Vc_line   Vc_log    Vc_line   Vc_log    Vc_line
1     .112257   .022757   .023221   .000234   .060620   .000554   .119224   .002201
2     .071777   .001918   .052324   .001002   .062399   .000774   .122034   .002950
3     .063108   .000816   .059593   .002275   .056358   .001181   .150998   .008126
4     .066354   .001426   .057859   .004045   .058028   .001615   .136877   .027655
5     .063906   .001986   .061305   .009510   .061681   .003002   .141754   .081240
6     .063221   .003309   .059533   .019082   .064525   .004624   .124286   .165172
7     .056019   .005001   .061430   .029982   .067569   .006987   .123389   .462036
8     .057442   .009524   .064073   .032110   .072189   .012508   .073987   .249604
9     .061323   .023545   .068123   .037674   .068211   .020201   -         -
10    .055594   .037177   .074750   .066339   .068900   .045774   -         -
11    .052207   .048333   .082703   .121272   .062134   .041917   -         -
12    .046928   .043929   .086423   .153918   .059783   .055413   -         -
13    .043235   .066483   .067701   .114197   .063507   .122423   -         -
14    .043545   .132619   .042399   .063481   .049434   .112761   -         -
15    .037087   .111828   .035929   .044686   .041380   .072089   -         -
16    .029355   .072354   .045791   .100261   .081286   .497994   -         -
17    .026423   .087377   .055098   .199775   -         -         -         -
18    .049202   .329361   -         -         -         -         -         -

Logarithmic bands and their importance coefficients, valid for an 8 kHz signal, are likewise determined in accordance with the description of the method (Table 5).

Table 5. Logarithmic bands and their importance coefficients

No.   Fc     L     Vc_line
1     74     149   .005170
2     207    117   .000426
3     324    117   .000556
4     445    125   .000925
5     574    133   .001577
6     715    148   .002717
7     867    156   .005893
8     1035   180   .013173
9     1219   188   .022271
10    1410   195   .029766
11    1609   203   .027939
12    1816   211   .042986
13    2047   250   .079971
14    2301   258   .099413
15    2551   242   .090052
16    2797   250   .081422
17    3035   227   .069182
18    3273   250   .079362
19    3539   281   .142375
20    3836   313   .204682
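The numbers in Table 5 are consistent with Fc being a band center and L a band width, both in Hz, although this part of the text does not define the columns. The sketch below works under that assumption: it derives band edges as Fc ± L/2, maps them to coefficient indices of a 1024-point DCT at 8 kHz (assuming a spacing of fs/(2N) per coefficient), and checks that neighboring bands meet and that the Vc_line coefficients sum to roughly one. The function names are hypothetical.

```python
# (Fc, L, Vc_line) rows of Table 5.
LOG_BANDS = [
    (74, 149, 0.005170), (207, 117, 0.000426), (324, 117, 0.000556),
    (445, 125, 0.000925), (574, 133, 0.001577), (715, 148, 0.002717),
    (867, 156, 0.005893), (1035, 180, 0.013173), (1219, 188, 0.022271),
    (1410, 195, 0.029766), (1609, 203, 0.027939), (1816, 211, 0.042986),
    (2047, 250, 0.079971), (2301, 258, 0.099413), (2551, 242, 0.090052),
    (2797, 250, 0.081422), (3035, 227, 0.069182), (3273, 250, 0.079362),
    (3539, 281, 0.142375), (3836, 313, 0.204682),
]


def band_edges(fc, width):
    """Lower and upper band edge in Hz, assuming fc is the center and width the span."""
    return max(0.0, fc - width / 2), fc + width / 2


def band_bins(fc, width, n=1024, fs=8000.0):
    """Nearest DCT coefficient indices for the band edges (assumed spacing fs / (2n))."""
    step = fs / (2 * n)
    lo, hi = band_edges(fc, width)
    return round(lo / step), round(hi / step)


edges = [band_edges(fc, w) for fc, w, _ in LOG_BANDS]
# Distance in Hz between the top of one band and the bottom of the next.
gaps = [abs(edges[i + 1][0] - edges[i][1]) for i in range(len(edges) - 1)]
weight_sum = sum(v for _, _, v in LOG_BANDS)
```

Under this reading the bands tile the range from 0 Hz to just under 4 kHz, with neighboring edges agreeing to within the rounding of the tabulated values.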

In accordance with the description of the method, the integral spectra of the fragments, the band energies, the quality estimates for each pair of fragments, and the overall resulting quality estimate are calculated. This implementation uses all sets of bands.

Then, for convenient comparison with subjective MOS scores, the objective estimate in percent is converted to points by dividing by 20.
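The conversion is a single division, so 100% corresponds to 5 points and 80% to 4 points. A minimal sketch follows; the function name and the range check are assumptions added for illustration.

```python
def percent_to_points(percent):
    """Convert an objective quality estimate in percent to points (percent / 20)."""
    if not 0.0 <= percent <= 100.0:
        # The valid range is an assumption; the text only gives the division.
        raise ValueError("estimate must lie between 0 and 100 percent")
    return percent / 20.0
```

Note that this mapping sends 0% to 0 points, whereas the conventional MOS scale runs from 1 to 5, so the two scales coincide only at the top end.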

The signal generator corresponding to the noise model of speech production works as follows. White noise is generated, and the critical bands defined by Pokrovsky or Sapozhkov (Table 1) are cut out of it. Each band is modulated by the frequencies listed below. The modulation frequencies are applied in sequence, each for the number of samples specified for it (the number in parentheses). After all modulation frequencies have been applied, a pause of 8000 samples (1 second) is inserted and the generator moves on to the next band.

The following modulation frequencies are used: 0.63 (40000), 0.84 (40000), 1.05 (40000), 1.26 (40000), 1.68 (40000), 2.10 (20000), 2.52 (20000), 3.36 (20000), 4.20 (20000), 5.04 (20000), 6.72 (10000), 8.40 (10000), 10.08 (10000), 13.44 (10000).
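The schedule above fixes how long the generator spends on each band. The short calculation below sums the sample counts and adds the pause; the 8000 Hz sampling rate follows from the statement that 8000 samples last 1 second, and the variable names are chosen for illustration only.

```python
FS = 8000  # samples per second, implied by "8000 samples (1 second)"

# (modulation frequency, number of samples) pairs as listed in the text.
SCHEDULE = [
    (0.63, 40000), (0.84, 40000), (1.05, 40000), (1.26, 40000), (1.68, 40000),
    (2.10, 20000), (2.52, 20000), (3.36, 20000), (4.20, 20000), (5.04, 20000),
    (6.72, 10000), (8.40, 10000), (10.08, 10000), (13.44, 10000),
]
PAUSE = 8000  # samples of silence after each band

modulated_samples = sum(count for _, count in SCHEDULE)
samples_per_band = modulated_samples + PAUSE
seconds_per_band = samples_per_band / FS
```

Each band therefore occupies 348000 samples, or 43.5 seconds, so the total test signal length is 43.5 seconds times the number of bands in the chosen set.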

The statistical model generates a sound signal from knowledge of the sound inventory of the Russian language, the frequency of occurrence of sounds, statistical information on the physical characteristics of sounds, statistical data on the composition of the population, and voice samples of several speakers. The model generates the source sound signal as a sequence of speaker voice samples taken in random order, in numbers proportional to their frequency of occurrence.
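Drawing samples in random order with frequency-proportional counts can be sketched with weighted random selection. The sample identifiers and weights below are illustrative placeholders, not real frequency data for Russian, and `build_sequence` is a hypothetical name. With `random.choices` the counts are proportional to the weights in expectation rather than exactly.

```python
import random


def build_sequence(sample_ids, frequencies, length, seed=None):
    """Pick `length` sample ids at random, with probability proportional to frequency."""
    rng = random.Random(seed)
    return rng.choices(sample_ids, weights=frequencies, k=length)


# Placeholder inventory: three sample ids with made-up relative frequencies.
sequence = build_sequence(["a", "o", "t"], [3, 1, 0], length=50, seed=1)
```

If exact proportions were required, an alternative would be to repeat each identifier a fixed number of times and shuffle the resulting list.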

Trial use of the proposed method yielded quality estimates for several standard vocoders. Table 6 lists these estimates, obtained on various test signals with the implementation described above, alongside MOS scores for comparison.

Table 6. Sound quality estimates for vocoders

Codec       MOS    Noise model                                 Statistical model   FPT
                   Minimum       Abbreviated   Full
                   -      Vc     -      Fc     -      Vc       -      Vc           -      Vc
A-Law       4.10   4.79   4.73   4.78   4.78   4.78   4.78     4.79   4.80         4.80   4.84
Mu-Law      4.10   4.79   4.84   4.77   4.77   4.77   4.78     4.78   4.79         4.79   4.82
G.723.6.3   3.90   4.25   4.48   4.21   4.29   4.22   4.33     4.15   4.04         4.08   3.95
GSM.6.10    3.70   3.20   1.99   3.01   1.65   3.04   1.78     4.22   3.66         4.01   3.21
G.723.5.3   3.65   4.23   4.44   4.18   4.27   4.19   4.32     4.14   4.04         4.06   3.93

The columns marked "-" give the estimates obtained when the bands are treated as equally probable, and the columns marked "Vc" give the estimates obtained with the importance coefficients taken into account.
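One way to examine how closely a column of Table 6 follows the MOS scores is a Pearson correlation coefficient. The sketch below is an illustration added here, not a procedure described in the text: it takes the MOS column and the last column of Table 6, and the helper name `pearson` is hypothetical. With only five codecs the result is indicative at best.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# MOS column and last column of Table 6, in row order.
MOS = [4.10, 4.10, 3.90, 3.70, 3.65]
LAST_COLUMN = [4.84, 4.82, 3.95, 3.21, 3.93]

r = pearson(MOS, LAST_COLUMN)
```

The same helper can be applied to any other column of the table to compare the test signals and weighting options.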

The proposed method of evaluating sound signals has a number of advantages over known methods of measuring quality, namely:

- it is universal, since it allows the quality of signals of different origin that have undergone different processing procedures to be judged;

- the quality evaluation process can be optimized depending on the purpose of the evaluation:

  - for speed (for example, a rough estimate can be obtained quickly);

  - for signal type (different bands are used for speech signals and for sound signals in general);

- the resulting estimates correlate well with MOS scores;

- quality estimates obtained for speech signals can be converted into values of various kinds of intelligibility.

Several possible applications of the proposed method for evaluating sound quality are briefly described below.

Fig. 9 shows how the proposed method is applied to evaluate the quality of sound transmission over the public switched telephone network. The scheme applies to both local and long-distance or international communication.

The sound quality evaluation server generates a source signal (or selects one from a set prepared in advance) and sends it to one of the subscribers taking part in the test.

The subscriber that received the signal establishes an ordinary telephone connection with the second subscriber and plays back the source signal. The second subscriber records the received sound signal and sends it to the sound quality evaluation server.

The sound quality evaluation server compares the source and test signals in accordance with the proposed method and outputs an estimate of the quality of the sound that passed through the telephone network. The estimate can be used to improve subscriber service quality, to decide whether equipment needs to be replaced or adjusted (on the subscriber side or on the exchange side), for advertising purposes, and so on.

Sound quality over an IP network is evaluated in the same way, as shown in Fig. 10. The difference from the previous application lies in how the source and test sound signals are transferred between the sound quality evaluation server and the subscribers, and in how data is transferred between the subscribers.

In addition, the quality estimates obtained can be used to select the codecs used in VoIP communication and to select operators that provide IP telephony services.

The proposed method can likewise be used to evaluate cellular and satellite communication (see Fig. 11). The estimates can be used by subscribers to choose operators and phone models, and by operators to optimize the placement of base stations.

Fig. 12 shows how developers and testers of systems and algorithms (methods) for compressing audio data can use the proposed method of sound quality evaluation. Each version of a codec (or a codec with a given set of parameters) needs to be evaluated and compared with its counterparts. Any developer can access a database of sound samples, compress and restore a signal, and obtain an objective estimate of the codec's quality.

Such a system makes it possible to manage the development of codecs and the optimization of their parameters; moreover, the end user can obtain an optimal algorithm rather than one that merely works.

Fig. 13 shows the process of evaluating the acoustic quality of rooms. Here the source signal is the signal from a microphone placed in front of the speaker, and the test signals are the signals from microphones placed in different parts of the room, at the positions of listeners and of sound-reproducing equipment.

The estimates can be used to optimize the placement of sound-reproducing equipment, furniture, and audience seating.

After trials, the proposed method will be widely used in various fields of technology during 2005-2006.

Claims

1. A method for machine evaluation of the quality of sound signals, in which active and inactive phases are extracted from fragments of the source and test signals, the spectrum of the active phase is determined and divided into critical bands, spectral energy values are calculated on the critical bands, spectral similarity values of the active phase of the fragments are determined, and the quality of the test sound signal is determined by a weighted linear combination of the quality values obtained for each phase, characterized in that the extracted fragments of the active and inactive phases of both signals are synchronized, the spectra of the inactive phase are determined for each of the fragments, the obtained spectra of the active and inactive phases of the fragments are divided into sets of bands, including additional sets of critical bands as well as logarithmic and resonator bands, for each of which spectral energy values are calculated, the obtained spectral energies of the active and inactive phases of the fragments are compared pairwise to determine spectral similarity coefficients, and the resulting similarity coefficient for each phase is determined as the average of the similarity coefficients over all sets of bands, which is the quality estimate of that phase.

2. The method according to claim 1, characterized in that either an arbitrary sound signal or a specialized set of signals is used as the source signal.

3. The method according to claim 1, characterized in that the spectra of fragments of the active and inactive phases are determined using a discrete cosine transform.

4. The method according to claim 1, characterized in that the spectral energy of the active and inactive phases of each fragment is calculated taking into account the importance factors of each band included in the set.

5. The method according to claim 1 or 4, characterized in that various combinations of logarithmic, resonator, and known critical bands are used as the sets of bands.