RU2559710C2

RU2559710C2 - Method of processing autocorrelation function for measuring fundamental tone of speech signal

Info

Publication number: RU2559710C2
Application number: RU2013104317/08A
Authority: RU
Inventors: Александр Сергеевич Колоколов; Марианна Иосифовна Павлова
Priority date: 2013-02-04
Filing date: 2013-02-04
Publication date: 2015-08-10
Also published as: RU2013104317A

Abstract

FIELD: physics, computer engineering.

SUBSTANCE: the invention relates to means of processing an autocorrelation function for measuring the fundamental tone of a speech signal and can be used for signal processing in speech recognition systems. The main peak in the autocorrelation function is emphasised by subtracting, from the autocorrelation function obtained for a segment of the signal, the lower-amplitude smoothed autocorrelation function of the signal modulus on the same segment, and nulling negative differences.

EFFECT: high reliability of measuring the fundamental frequency of a speech signal.

2 cl, 3 dwg

Description

The invention relates to the field of signal processing and can be used to measure the fundamental tone (pitch) of speech signals, as well as of other quasi-periodic signals.

Measuring the fundamental tone is a basic procedure in the analysis and recognition of speech signals. For this purpose, the short-time autocorrelation function of short voiced signal segments of duration ΔT = 20-50 ms is often used. Let s(t) be a segment of a speech signal defined on the interval [0, ΔT]. The fundamental frequency f₀ = 1/T₀ is then the reciprocal of the position τ = T₀ of the main peak of the autocorrelation function

$r(τ) = \frac{1}{ΔT} \int_{0}^{ΔT - τ} s(t)\, s(t + τ)\, dt$

or of the normalized autocorrelation function r₀(τ) = r(τ)/r(0). However, since the speech signal is the convolution of the voice source signal, produced by the vocal cords, with the impulse response of the vocal tract, a peak of the autocorrelation function associated with the first formant of the speech signal may be mistaken for the pitch peak, which leads to gross errors in the measurement of the fundamental tone.
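The peak-picking idea described above can be sketched in discrete form as follows. This is a minimal illustration, not the patent's implementation: the function names, the 50-400 Hz search range, and the toy test signal (a 100 Hz sinusoid plus a weaker fifth harmonic) are assumptions introduced here.

```python
import math

def autocorr(x):
    """Biased short-time autocorrelation: r[k] = (1/N) * sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(n)]

def pitch_from_autocorr(s, fs, f_min=50.0, f_max=400.0):
    """Estimate f0 as fs divided by the lag of the largest r0 value in the search range.

    The search range (f_min, f_max) is an assumed parameter, not taken from the text.
    """
    r = autocorr(s)
    r0 = [v / r[0] for v in r]                  # normalized: r0 = r / r(0)
    k_lo = int(fs / f_max)                      # shortest candidate period, in samples
    k_hi = min(int(fs / f_min), len(s) - 1)     # longest candidate period, in samples
    k_peak = max(range(k_lo, k_hi + 1), key=lambda k: r0[k])
    return fs / k_peak, r0

fs = 10_000
# Toy voiced segment (assumption): 100 Hz fundamental plus a weaker 500 Hz component.
s = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 500 * n / fs)
     for n in range(256)]
f0, r0 = pitch_from_autocorr(s, fs)
print(f"estimated f0: {f0:.1f} Hz")
```

With a strong first formant, a secondary peak of r0 can approach the height of the pitch peak, which is the failure mode the text goes on to address.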

To reduce the amplitude of the peak in r(τ) associated with the first formant, center clipping of the speech signal is used (Sondhi M.M. New methods of pitch extraction // IEEE Trans. Audio and Electroacoust. 1968. V. AU-16. No. 2. 262-266), which equalizes the amplitudes of the harmonics of the speech signal and thereby weakens its formant resonances. This procedure emphasizes the peak of the correlation function at τ = T₀ for stationary sections of the speech signal, but it is unsatisfactory when the amplitude of the speech signal changes over the interval ΔT or in the presence of impulse noise.
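As a rough illustration of center clipping, the sketch below zeroes samples whose magnitude is under a threshold and shifts the remaining samples toward zero by that threshold. The function name `center_clip` and the threshold ratio of 0.3 of the segment's peak magnitude are assumptions chosen for illustration; they are not values given in this document.

```python
def center_clip(s, ratio=0.3):
    """Center-clip a segment: samples with |s| <= c become 0, others move toward 0 by c.

    c = ratio * max|s| is an assumed way of choosing the clipping level.
    """
    c = ratio * max(abs(v) for v in s)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in s]

clipped = center_clip([1.0, 0.2, -0.5, -1.0])
print(clipped)
```

In this sketch the threshold is fixed by the largest sample in the segment, so if the amplitude falls across the segment the quieter periods can be clipped away entirely, which is consistent with the limitation noted above.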

The closest technical solution to the proposed method is a method of emphasizing the peak at τ = T₀ in the autocorrelation function (Kolokolov A.S., Lyubinsky I.A., Meshcheryakov A.Yu. Measurement of the fundamental tone of a speech signal based on its autocorrelation function // Naukoemkie Tekhnologii, 2012, vol. 13, No. 5, pp. 26-29). It is based on clipping positive peaks in the autocorrelation function r₀(τ) with a linearly decreasing threshold function p₀(τ), where α is a parameter that sets the clipping level of r₀(τ), chosen in the range 0 < α < 1, and τ ∈ [0, ΔT]. The result is the clipped autocorrelation function r_c1(τ).

This clipping procedure emphasizes the peak of the autocorrelation function at τ = T₀ for stationary sections of the speech signal and is insensitive to impulse noise, but it is unsatisfactory when the amplitude of the speech signal changes over the interval ΔT, because in that case the autocorrelation function r₀(τ) decays faster than the threshold function p₀(τ).

The technical result of the invention is an increase in the reliability of measuring the fundamental frequency f₀ of a speech signal by processing the autocorrelation function r₀(τ) so as to emphasize its peak at τ = 1/f₀.

The technical result is achieved by emphasizing the main peak in the autocorrelation function: the lower-amplitude autocorrelation function of the signal modulus on the same segment is subtracted from the autocorrelation function obtained for the signal segment, and negative differences are set to zero.

In addition, the autocorrelation function found for the signal modulus is smoothed.

FIG. 1 is a block diagram illustrating the processing of the autocorrelation function r₀(τ) in the proposed method.

FIG. 2 illustrates the proposed method using a two-formant synthetic vowel of constant amplitude.

FIG. 3 demonstrates the robustness of the method when the amplitude decreases linearly over the vowel segment to a level of 0.5 (a) and 0.25 (b).

FIG. 1 shows a unit 1 for obtaining the autocorrelation function of the signal, a unit 2 for obtaining the autocorrelation function of the signal modulus, a smoothing unit 3, a unit 4 for multiplication by a constant coefficient, a subtraction unit 5, and a unit 6 for zeroing negative values.

This goal is achieved by finding the weighted difference

$r_{c2}(τ) = \begin{cases} r_{0}(τ) - α r_{0e}(τ), & \text{if } r_{0}(τ) - α r_{0e}(τ) \otimes h(τ) > 0, \\ 0, & \text{if } r_{0}(τ) - α r_{0e}(τ) \otimes h(τ) \leq 0, \end{cases}$

where

$r_{0e}(τ) = \frac{1}{ΔT} \int_{0}^{ΔT - τ} |s(t)|\, |s(t + τ)|\, dt;$

⊗ denotes convolution; h(τ) is the symmetric impulse response of the smoothing filter, which in the special case of no smoothing is the Dirac δ-function; 0 < α < 1; τ ∈ [0, T]; |s(t)| is the modulus of s(t).
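The weighted difference can be sketched in discrete form for the special case without smoothing (h equal to the δ-function). This is an illustrative sketch under stated assumptions: the function names and the toy test signal are introduced here, and the modulus autocorrelation is normalized by r(0) in the same way as r₀(τ), which the formula above does not spell out. That normalizer is the same for both functions because s(t)² = |s(t)|².

```python
import math

def autocorr(x):
    """Biased short-time autocorrelation: r[k] = (1/N) * sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(n)]

def emphasize_main_peak(s, alpha=0.85):
    """Return max(r0 - alpha * r0e, 0) for every lag (no smoothing, h = delta)."""
    r = autocorr(s)                              # autocorrelation of the signal
    r_e = autocorr([abs(v) for v in s])          # autocorrelation of the signal modulus
    # Assumption: both functions are normalized by r(0); note r(0) == r_e(0).
    r0 = [v / r[0] for v in r]
    r0e = [v / r[0] for v in r_e]
    # Weighted difference with negative values set to zero.
    return [max(a - alpha * b, 0.0) for a, b in zip(r0, r0e)]

fs = 10_000
# Toy voiced segment (assumption): period of 100 samples, i.e. f0 = 100 Hz.
s = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 500 * n / fs)
     for n in range(256)]
rc2 = emphasize_main_peak(s, alpha=0.85)
print(f"rc2 at lag 100: {rc2[100]:.4f}")
```

At lags where s(t) and s(t + τ) mostly share the same sign, as at τ = T₀, the two autocorrelations nearly coincide and a fraction of about 1 - α of the peak survives; elsewhere the modulus term dominates and the difference is zeroed.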

This processing can be regarded as a kind of clipping of r₀(τ) with a threshold function αr_0e(τ) that decays in roughly the same way as r₀(τ). As a result, the emphasis of the peak at τ = T₀ in r_c2(τ) depends less on changes in the amplitude of the speech signal over the interval ΔT than it does in r_c1(τ).
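One way to see why such a threshold follows r₀(τ), offered as a short sketch rather than as part of the patent text: suppose the segment can be written as s(t) = a(t)v(t), where a(t) > 0 is a slowly varying amplitude envelope and v(t) is a constant-amplitude waveform (both symbols are introduced here only for illustration). The two integrands then carry the same envelope factor:

$s(t)\, s(t + τ) = a(t)\, a(t + τ)\, v(t)\, v(t + τ), \qquad |s(t)|\, |s(t + τ)| = a(t)\, a(t + τ)\, |v(t)|\, |v(t + τ)|$

so a change of amplitude over ΔT scales the signal autocorrelation and the modulus autocorrelation in a similar way. In addition, |s(t)||s(t + τ)| ≥ s(t)s(t + τ) for every t, with equality at τ = 0, so the modulus autocorrelation is never smaller than r(τ) and coincides with it at zero lag. The factor α < 1 is what allows the main peak to rise above the threshold.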

The curves in FIG. 2 (a), (b) and (c), which show the autocorrelation functions r₀(τ) and r_0e(τ) and the processing result r_c2(τ) respectively, were obtained for a discrete two-formant synthetic vowel represented by 256 samples at a sampling frequency of 10 kHz, with α = 0.85. The synthesized vowel had a fundamental frequency f₀ = 100 Hz and formant frequencies of 500 and 830 Hz.
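A test signal of this kind could be generated along the following lines. The document does not say how the vowel was synthesized, so everything beyond the stated numbers (256 samples, 10 kHz, f₀ = 100 Hz, formants at 500 and 830 Hz) is an assumption: the impulse-train excitation, the cascade of two-pole resonators, the 80 Hz bandwidth, the warm-up length, the peak normalization, and the hypothetical `end_level` parameter for experimenting with an amplitude that changes linearly within the segment.

```python
import math

def resonator(x, freq, bw, fs):
    """Two-pole digital resonator: y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2]."""
    r = math.exp(-math.pi * bw / fs)                     # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)   # sets the resonance frequency
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        y0 = v + a1 * y1 + a2 * y2
        y.append(y0)
        y1, y2 = y0, y1
    return y

def synth_vowel(n=256, fs=10_000, f0=100.0, formants=(500.0, 830.0),
                bw=80.0, end_level=1.0, warmup=1000):
    """Impulse train through cascaded resonators, with an optional linear amplitude ramp.

    bw, warmup and end_level are assumed parameters, not values from the text.
    """
    period = round(fs / f0)                              # pitch period in samples
    x = [1.0 if i % period == 0 else 0.0 for i in range(warmup + n)]
    for f in formants:
        x = resonator(x, f, bw, fs)
    seg = x[warmup:]                                     # drop the start-up transient
    peak = max(abs(v) for v in seg)
    # Amplitude goes linearly from 1 at the first sample to end_level at the last.
    return [v / peak * (1.0 + (end_level - 1.0) * i / (n - 1))
            for i, v in enumerate(seg)]

steady = synth_vowel()                    # constant amplitude
decayed = synth_vowel(end_level=0.25)     # amplitude falls to a quarter by the end
print(len(steady), f"{decayed[-1] / steady[-1]:.2f}")
```

The warm-up samples are discarded so that the returned segment is close to periodic with a period of 100 samples before the ramp is applied.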

The smoothing of r_0e(τ) was performed with a low-pass filter having the symmetric impulse response h(n) = 0.25u₀(n-1) + 0.5u₀(n) + 0.25u₀(n+1), where n = ..., -2, -1, 0, 1, 2, ...,

$u_{0}(n) = \begin{cases} 1, & n = 0, \\ 0, & n \neq 0. \end{cases}$

The convolution therefore reduced to a sum of three weighted adjacent samples of r_0e(τ). In one case (FIG. 2) the vowel amplitude was constant over the segment of duration ΔT = 25.6 ms; in the other two cases (FIG. 3 (a) and 3 (b)) it decreased linearly to levels two and four times lower than the initial one.
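The three-tap smoothing with weights 0.25, 0.5 and 0.25 can be sketched as below. The function name `smooth3` and the edge handling (repeating the first and last samples) are assumptions; the document does not state how the ends of the array were treated.

```python
def smooth3(r):
    """y[n] = 0.25 * r[n-1] + 0.5 * r[n] + 0.25 * r[n+1].

    Assumption: beyond the array ends, the nearest end sample is reused.
    """
    n = len(r)
    return [0.25 * r[max(k - 1, 0)] + 0.5 * r[k] + 0.25 * r[min(k + 1, n - 1)]
            for k in range(n)]

impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
response = smooth3(impulse)
print(response)  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Feeding a unit impulse through the function returns the filter weights themselves, and because the weights sum to 1 a constant input passes through unchanged. In the processing described above, this smoothing would be applied to the modulus autocorrelation before it is scaled by α and compared with r₀(τ).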

The drawings show that the proposed method of processing the autocorrelation function emphasizes its peak at τ = 1/f₀ both for a speech signal of constant amplitude and when the amplitude of the speech signal changes over the analysis interval ΔT. In all cases, the peak of r_c2(τ) at τ = 1/f₀ is substantially more pronounced relative to the other peaks than that of the autocorrelation function r₀(τ).

Thus, the data above allow the conclusion that the proposed method of processing the autocorrelation function can be used for a more robust measurement of the fundamental tone of a speech signal in the presence of amplitude variations of the signal over the analysis interval ΔT.

Claims

1. A method for processing an autocorrelation function for measuring the fundamental tone of a speech signal, characterized in that the main peak in the autocorrelation function is emphasized by subtracting, from the autocorrelation function obtained for a signal segment, the lower-amplitude autocorrelation function of the signal modulus on the same segment, and zeroing the negative differences.

2. The method according to claim 1, characterized in that additional smoothing of the autocorrelation function obtained for the signal modulus is performed.