RU2433488C1

RU2433488C1 - Method of detecting pathology of voice leading in speech

Info

Publication number: RU2433488C1
Application number: RU2010104610/08A
Authority: RU
Inventors: Евгений Михайлович Воронин (RU); Сергей Станиславович Дериглазов (RU); Дмитрий Викторович Ламтюгин (RU); Владимир Карпович Макуха (RU); Артем Владимирович Марков (RU); Ольга Геннадьевна Фетисова (RU)
Priority date: 2010-02-09
Filing date: 2010-02-09
Publication date: 2011-11-10
Also published as: RU2010104610A

Abstract

FIELD: medicine.

SUBSTANCE: pairs of sets of low-frequency harmonics and/or overtones and sets of high-frequency overtones, corresponding to a definite type of voice leading pathology, are distinguished in the spectrum of the voice-speech signal. After that, for each pair of sets, coefficients of voice harmonisation are calculated as the ratio of the total energy of a definite set of relatively high-frequency overtones to the total energy of a definite set of relatively low-frequency harmonics and/or overtones. These are compared with the values of the corresponding one or several coefficients of voice harmonisation for the norm and for the pathology, and a conclusion is made about the presence of one or another type of pathology of voice leading in speech.

EFFECT: increased selectivity and sensitivity of the method of detecting voice leading pathology.

2 dwg

Description

The proposed invention relates to the field of psychophysiology, specifically the psychophysiology of speech, and can be used in analysing the characteristics of the human vocal apparatus to diagnose various types of voice production pathology and to assess objectively the effectiveness of treatment.

A known method for detecting voice production pathology, used to assess vocal giftedness (RF patent No. 2204170, "Method for the comprehensive assessment of vocal giftedness"), analyses the subject's voice with respect to the frequency of the high singing formant, which, depending on the individual characteristics of the vocal tract, may lie anywhere in a wide frequency range (2350 to 3700 Hz). The analysis computes the ratio of the electrical voltage in the 1/3-octave band of the sound spectrum corresponding to the high singing formant to the electrical voltage of the sound containing the full voice spectrum, and from it determines the voice ringing (sonority) coefficient, which corresponds to the relative level of the singing formant in the voice spectrum.

However, that method is intended for analysing voice production quality in singing and is tied to specific spectral frequencies in the range 2350 to 3700 Hz. At the same time, it is known that in most people the spectral density of the voice in this range is low during speech without pathology. Consequently, the voices of such people cannot be analysed by that method, which narrows its applicability.

In addition, a method for detecting voice production pathology in speech is known (A computer system for acoustic analysis of pathological voices and laryngeal diseases screening. Mitev P., Hadjitodorov S., Medical engineering & physics, 2002, p.419-429); it is the prototype of the proposed invention. In it, the NFHE (Normalized First Harmonic Energy) parameter is introduced for analysing the voice signal. The recording of the voice-speech signal must be divided into time segments. The segment length must equal 8 periods of the minimum analysed frequency, to keep errors in the analysis of the voice signal sufficiently small. For each segment, a fast Fourier transform (FFT) is computed using a Hamming window. The NFHE parameter for each segment is calculated by the following formula:

where P(f) is the power spectrum of the signal,

int is the conversion operator that rounds a value to the nearest integer,

f₀ is the fundamental frequency,

b = 1.5 N/W is half the bandwidth of a harmonic (N is the number of points used to compute the FFT and W is the number of points defining the length of the time window),

k is the harmonic number,

k_max is the number of harmonics in the range up to 4 kHz.
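The formula itself did not survive in this text. From the definitions above and from the later description of NFHE as the ratio of the energy of the harmonic at the fundamental frequency to the total energy of the other harmonics, one plausible reconstruction is given below. The summation limits, and in particular whether the denominator starts at k = 2 or k = 1, are assumptions and should be checked against Mitev and Hadjitodorov (2002).

```latex
\mathrm{NFHE}_i =
  \frac{\displaystyle\sum_{f=\mathrm{int}(f_0-b)}^{\mathrm{int}(f_0+b)} P(f)}
       {\displaystyle\sum_{k=2}^{k_{\max}}\;
        \sum_{f=\mathrm{int}(k f_0-b)}^{\mathrm{int}(k f_0+b)} P(f)}
```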

The NFHE value for the entire voice signal is calculated by the following formula:

$$\mathrm{NFHE} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{NFHE}_i$$

where i is the segment number,

n is the number of segments.

However, this method cannot distinguish between different manifestations of dysphonia, because some manifestations are characterised by a redistribution of energy in the voice spectrum between sets of high-frequency and low-frequency overtones while the ratio of the energy of the harmonic at the fundamental frequency to the sum of the energies of the remaining overtones stays the same. Moreover, when dysphonia shows an abnormal energy distribution in the voice spectrum above the fundamental frequency, the method cannot detect the pathology even in principle. The NFHE coefficient also has low sensitivity: its value for a voice signal with pronounced pathology differs from the value for a voice without pathology by only a few tens of percent. As a result, the NFHE coefficient is uninformative, for example, when detecting pathology at an early stage.

The objective of the proposed invention is to create a method for detecting voice production pathology in speech that has higher selectivity, i.e. that can distinguish different manifestations of dysphonia from one another and detect types of pathology in which the deviation from the norm lies in the spectrum structure above the fundamental frequency, and that also has greater sensitivity.

This objective is achieved in that the method analyses, in the spectrum of the voice-speech signal, the ratio of the total energy of a definite set of relatively high-frequency overtones to the total energy of a definite set of relatively low-frequency harmonics and/or overtones. This contrasts with the known method, which analyses the voice signal by the ratio of the energy of the harmonic at the fundamental frequency to the total energy of the other harmonics via the NFHE parameter (and so does not take into account the energy distribution in the voice spectrum above the fundamental frequency). Higher sensitivity and selectivity are achieved by selecting the harmonics and/or overtones that are informative for a particular type of pathology, for example on the basis of the formant structure of the spectrum of the voice-speech signal.

Fig. 1 shows a block diagram of the hardware needed to implement the proposed method; Fig. 2 shows the spectrum of a voice-speech signal with numbered harmonics (overtones).

In the diagram (Fig. 1), 1 is a microphone, 2 is an analog-to-digital converter, and 3 is a microprocessor system. The components are connected so that the signal from the microphone (1) passes to the analog-to-digital converter (2) and from there to the microprocessor system (3). The analog-to-digital converter (2) may be the sound card of a personal computer (PC), a digital voice recorder, or a device based on a microcontroller or digital signal processor. The microprocessor system (3) may be built on a microcontroller or digital signal processor, or may be a personal computer.

The method is carried out as follows. First, the voice-speech signal is recorded digitally using a microphone, for example with a digital voice recorder.

Then the signal spectrum is computed using a specific implementation of the microprocessor system, for example a PC. The spectrum may be computed, for example, by the fast Fourier transform. Next, harmonics and overtones are identified in the spectrum: these are strongly pronounced peaks in the spectrum of the voice-speech signal, and each harmonic or overtone is characterised by a centre frequency and by a frequency range measured at half the peak height. Fig. 2 shows the spectrum of a voice-speech signal with numbered harmonics (overtones). Pairs of sets are then selected. Each pair consists of a set of low-frequency harmonics and/or overtones (the low-frequency set) and a set of high-frequency overtones (the high-frequency set). One or more pairs correspond to a particular type of voice production pathology. To detect several manifestations of pathology, different collections of pairs should be used.
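The spectrum and peak-selection steps can be sketched as follows. This is a minimal illustration, not the software used by the authors: the signal is a synthetic three-harmonic tone, a direct DFT stands in for the FFT to avoid dependencies, and the function names and the 1 % relative threshold are assumptions made for the example.

```python
import cmath
import math

def hamming(n_points):
    """Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def power_spectrum(samples):
    """Power spectrum of a Hamming-windowed frame via a direct DFT.

    The O(N^2) DFT only keeps the sketch dependency-free; an FFT
    yields the same values faster.
    """
    n_points = len(samples)
    windowed = [s * w for s, w in zip(samples, hamming(n_points))]
    spectrum = []
    for k in range(n_points // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def find_peaks(spectrum, rel_threshold=0.01):
    """Bins that are local maxima above rel_threshold * max(spectrum)."""
    floor = rel_threshold * max(spectrum)
    return [k for k in range(1, len(spectrum) - 1)
            if spectrum[k] > floor
            and spectrum[k] > spectrum[k - 1]
            and spectrum[k] >= spectrum[k + 1]]

# Synthetic tone (assumed data): 200 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 8000
N_POINTS = 400                      # 20 Hz per spectral bin
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
          + 0.25 * math.sin(2 * math.pi * 600 * t / SAMPLE_RATE)
          for t in range(N_POINTS)]

peaks = find_peaks(power_spectrum(signal))
peak_freqs = [k * SAMPLE_RATE / N_POINTS for k in peaks]
print(peak_freqs)
```

Numbering the detected peaks in order of frequency gives the harmonic/overtone numbers from which the low-frequency and high-frequency sets are chosen. Measuring the width of each peak at half height is left out of the sketch.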

When detecting pathology, the voice-speech signal is assessed quantitatively by means of the voice harmonization coefficient, which is the ratio of the total energy of the high-frequency set to the total energy of the low-frequency set in a pair for the voice-speech signal:

$$K_{h_1,h_2,\dots,h_n-l_1,l_2,\dots,l_m} = \frac{\sum_{i=1}^{n} P_{h_i}}{\sum_{j=1}^{m} P_{l_j}}$$

where h1, h2…hn are the numbers of the relatively high-frequency overtones,

l1, l2…lm are the numbers of the relatively low-frequency harmonics and/or overtones,

P_hi is the energy of the i-th relatively high-frequency overtone,

P_lj is the energy of the j-th relatively low-frequency harmonic or overtone,

P_h1, P_h2…P_hn is the high-frequency set,

P_l1, P_l2…P_lm is the low-frequency set.
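The coefficient defined above is a plain ratio of two sums, which can be sketched as follows. The function name, the dictionary layout and the energy values are assumptions made for illustration; the energies are not measured data.

```python
def voice_harmonization(energies, high_set, low_set):
    """Ratio of total energy in the high-frequency set to that in the low set.

    energies maps a harmonic/overtone number to its energy.
    """
    high = sum(energies[h] for h in high_set)
    low = sum(energies[l] for l in low_set)
    if low == 0:
        raise ValueError("low-frequency set has zero total energy")
    return high / low

# Hypothetical energies for overtones 1..8 (illustrative only).
energies = {1: 4.0, 2: 6.0, 3: 1.0, 4: 0.5, 5: 0.5, 6: 5.0, 7: 3.0, 8: 2.0}

# High-frequency set {6, 7, 8} against low-frequency set {1, 2}.
k_678_12 = voice_harmonization(energies, high_set=(6, 7, 8), low_set=(1, 2))
print(k_678_12)
```

Passing the sets as arguments keeps the function reusable for any pair of sets, which matters because different pathologies call for different pairs.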

A voice harmonization coefficient is calculated for each pair of low-frequency and high-frequency sets of harmonics and overtones. The calculated values are compared with the values of the corresponding voice harmonization coefficients for the norm and for various types of pathology. On the basis of this comparison, a conclusion is drawn about the presence of a particular voice production pathology.
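The comparison step might look like the sketch below. The reference ranges and the condition labels are hypothetical placeholders, since the description only says that such ranges are determined empirically; the two measured values are the ones reported later in the description.

```python
def matching_conditions(measured, reference_ranges):
    """Return the conditions whose ranges contain every listed coefficient."""
    matches = []
    for condition, ranges in reference_ranges.items():
        if all(name in measured and low <= measured[name] <= high
               for name, (low, high) in ranges.items()):
            matches.append(condition)
    return matches

# HYPOTHETICAL reference ranges; real ones must be determined empirically.
REFERENCE = {
    "norm": {"K_6,7,8-1,2": (8.0, 20.0)},
    "pathology (type A)": {"K_6,7,8-1,2": (0.5, 3.0)},
}

result_norm = matching_conditions({"K_6,7,8-1,2": 13.77}, REFERENCE)
result_path = matching_conditions({"K_6,7,8-1,2": 1.29}, REFERENCE)
print(result_norm, result_path)
```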

The pairs of harmonic/overtone sets used to calculate the voice harmonization coefficient differ for different types of voice production pathology.

Experiments have shown that for voice-speech samples with pathology this coefficient differs significantly from that for samples with correct voice production. For each type of pathology, the norm is a definite range of values of the voice harmonization coefficients, determined empirically.

The sensitivity of the proposed method can be illustrated by coefficients calculated for a selected sample with pronounced pathology and for a sample with normal voice production. In both cases the 2-second recording of the voice-speech sample contained the vowel /A/ and was saved as 16-bit WAV with a sampling rate of 44100 Hz. The spectrum of the sample was computed on a personal computer using WavLAB 6 software, with a discrete Fourier transform implemented by fast methods (2048 samples, Hamming window 1024 samples wide).

To calculate the NFHE coefficient, the minimum analysed frequency was set to 40 Hz. This is justified because speech carries no informative component in this frequency range. The entire recording was then divided into segments, each lasting eight periods of the minimum analysed frequency, i.e. the duration of one segment was 8 × 1/(40 Hz) = 0.20 s, and the number of segments was 2 s / 0.20 s = 10. The spectrum of each segment was computed on a personal computer as described above, and the coefficient NFHE_i was calculated by the following formula:
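The segmentation arithmetic above can be checked with a few lines. The constant names are illustrative, and mapping the 2048-point transform and 1024-sample window onto N and W in b = 1.5 N/W is an assumption based on the earlier definitions.

```python
F_MIN_HZ = 40             # minimum analysed frequency
PERIODS_PER_SEGMENT = 8   # segment length in periods of F_MIN_HZ
RECORD_S = 2              # recording duration, seconds
SAMPLE_RATE_HZ = 44100
N_FFT = 2048              # assumed to be N in b = 1.5 N/W
WINDOW = 1024             # assumed to be W in b = 1.5 N/W

segment_s = PERIODS_PER_SEGMENT / F_MIN_HZ               # 0.2 s
n_segments = round(RECORD_S / segment_s)                 # 10
samples_per_segment = round(segment_s * SAMPLE_RATE_HZ)  # 8820
half_band = 1.5 * N_FFT / WINDOW                         # 3.0

print(segment_s, n_segments, samples_per_segment, half_band)
```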

For the individual segments of the voice-speech sample with pathological voice production, the following values were obtained:

NFHE₁ = 0.0067;

NFHE₂ = 0.0071;

NFHE₃ = 0.0072;

NFHE₄ = 0.0067;

NFHE₅ = 0.0069;

NFHE₆ = 0.0065;

NFHE₇ = 0.0065;

NFHE₈ = 0.0077;

NFHE₉ = 0.0074;

NFHE₁₀ = 0.0063.

For the individual segments of the voice-speech sample with normal voice production, the following values were obtained:

NFHE₁ = 0.014;

NFHE₂ = 0.011;

NFHE₃ = 0.013;

NFHE₄ = 0.013;

NFHE₅ = 0.013;

NFHE₆ = 0.012;

NFHE₇ = 0.013;

NFHE₈ = 0.015;

NFHE₉ = 0.015;

NFHE₁₀ = 0.012.

The NFHE value for the entire voice signal was calculated by the following formula:

$$\mathrm{NFHE} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{NFHE}_i$$

where n is the number of segments, here equal to 10.

The NFHE coefficients calculated for the voice-speech sample in the norm and with pathology are, respectively, NFHE_norm = 0.0069 and NFHE_pathology = 0.013; they differ by approximately a factor of two.
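Averaging the ten per-segment values listed above reproduces the two whole-signal figures, as the short check below shows. With the lists as labelled, 0.0069 is the mean of the segments listed for the pathological sample and 0.013 that of the segments listed for the normal sample, whereas the summary sentence attaches the subscripts the other way round; the text does not settle which labelling is intended. The variable names follow the list headings.

```python
from statistics import mean

# Per-segment NFHE values as listed under "pathological voice production".
listed_pathological = [0.0067, 0.0071, 0.0072, 0.0067, 0.0069,
                       0.0065, 0.0065, 0.0077, 0.0074, 0.0063]
# Per-segment NFHE values as listed under "normal voice production".
listed_normal = [0.014, 0.011, 0.013, 0.013, 0.013,
                 0.012, 0.013, 0.015, 0.015, 0.012]

mean_pathological = mean(listed_pathological)
mean_normal = mean(listed_normal)
print(round(mean_pathological, 4), round(mean_normal, 3))
```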

To calculate the voice harmonization coefficients, the signal spectrum was computed for the samples with pathological and normal voice production in the same way and with the same parameters as for the NFHE coefficients, but over the entire recording. Harmonics and overtones were then identified and numbered in the spectrum using specialised software written for this purpose in MATLAB.

The voice harmonization coefficients for the various sets of harmonics and/or overtones were calculated by the formula:

$$K_{h_1,h_2,\dots,h_n-l_1,l_2,\dots,l_m} = \frac{\sum_{i=1}^{n} P_{h_i}}{\sum_{j=1}^{m} P_{l_j}}$$

where, as before, h1, h2…hn are the numbers of the relatively high-frequency overtones and l1, l2…lm are the numbers of the relatively low-frequency harmonics and/or overtones.

The most indicative values of the voice harmonization coefficients, calculated for the same voice-speech samples in the norm and with pathology, are respectively K_{6,7,8-1,2 (norm)} = 13.77 and K_{6,7,8-1,2 (pathology)} = 1.29; they differ by a factor of about ten. For the same samples the NFHE coefficients are NFHE_norm = 0.0069 and NFHE_pathology = 0.013, which differ by only about a factor of two.
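The two contrast ratios quoted above follow directly from the reported values; the variable names are illustrative.

```python
# Reported voice harmonization coefficients for sets {6,7,8} / {1,2}.
k_norm, k_pathology = 13.77, 1.29
# Reported whole-signal NFHE values (smaller and larger of the two).
nfhe_small, nfhe_large = 0.0069, 0.013

k_ratio = k_norm / k_pathology        # about 10.7
nfhe_ratio = nfhe_large / nfhe_small  # about 1.9

print(round(k_ratio, 2), round(nfhe_ratio, 2))
```

On these figures the voice harmonization coefficient separates the two samples roughly five to six times more strongly than NFHE does.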

Thus, by calculating voice harmonization coefficients with different indices, selectivity for different types of dysphonia can be achieved, and pathology can be detected when the deviations lie in the spectrum structure above the first harmonic. Choosing the most indicative harmonics/overtones gives higher sensitivity in detecting pathology.

Claims

A method for detecting voice production pathology in speech, comprising analysing the energy distribution in the spectrum of a voice-speech signal, characterised in that pairs of sets of low-frequency harmonics and/or overtones and sets of high-frequency overtones corresponding to a definite type of voice production pathology are distinguished in the spectrum, after which, for each pair of sets, voice harmonization coefficients are calculated as the ratio of the total energy of a definite set of relatively high-frequency overtones to the total energy of a definite set of relatively low-frequency harmonics and/or overtones and are compared with the values of the corresponding one or more voice harmonization coefficients for the norm and for pathology, and a conclusion is drawn about the presence of a particular type of voice production pathology in speech.