RU2580796C1

RU2580796C1 - Method (variants) for filtering a noisy speech signal in a complex interference environment

Info

Publication number: RU2580796C1
Application number: RU2015107227/08A
Authority: RU
Inventors: Олег Николаевич Титов; Андрей Алексеевич Афанасьев; Александр Павлович Рыжков
Priority date: 2015-03-02
Filing date: 2015-03-02
Publication date: 2016-04-10

Abstract

FIELD: radio engineering and communications.

SUBSTANCE: the inventions relate to digital communication and to speech processing under noise. Methods are proposed for filtering a noisy speech signal in a complex interference environment. The results of polyspectral analysis are used to estimate the spectral characteristics of the noise accurately. The spectral subtraction procedure is supplemented with correction signals based on empirical mode decomposition and with adaptive digital filtering that uses a low-frequency bicorrelation coefficient, obtained by analyzing the total bicorrelation in the regions where the low-density area of the bi-amplitude of the processed noisy speech segment is concentrated.

EFFECT: increased signal-to-noise ratio of the cleaned speech.

3 cl, 10 dwg

Description

The presented inventions relate to the field of digital communications and can be used in telecommunication systems to filter a noisy speech signal in a complex interference environment.

Scope of the inventions: radiotelephony and speech processing systems, voice control of electronic devices, and devices for pre- and post-processing of speech signals.

Despite the large number of technical solutions in this field, the problem of processing noisy speech under high-intensity noise remains unsolved, which degrades the quality of the telecommunication services provided.

A known method and device for attenuating noise in a speech signal (RF patent 2121719, G10L 9/00, published 10.11.1998) processes noisy speech in real time. Spectral estimates are determined for each speech segment of a given duration, each segment is analyzed logically for the presence of phonemes and for the class to which they belong, and the frequency spectrum of the segment is then analyzed for features that identify specific phonemes within the type. The phoneme sequence can be stored as compact groups and then transformed for synchronization with the speaker's voice.

Because it relies on the results of phonemic analysis, this approach yields cleaned speech of low quality.

The closest analogue in terms of essential features, taken as the prototype, is a method for improving speech quality and a device for its implementation (RF patent 2391778, H04B 15/00, published 07.09.2005). It comprises the following sequential stages: the noisy speech signal is received and converted from analog to digital at a preset sampling frequency; the signal is divided into quasi-stationary segments; the segments are classified as voiced or unvoiced from the results of low-pass and high-pass filtering; the spectral characteristics of the noise are estimated in preselected segments; noise is suppressed separately for voiced segments, in an adaptive filtering module, and for unvoiced segments, by spectral subtraction of power spectra; the phase spectrum of the noisy segment being processed is estimated; and an inverse Fourier transform of the amplitude and phase spectra yields the cleaned speech signal.

The disadvantages of the analogue and the prototype include:

the inability to detect that the speech signal is noisy, and hence to estimate the noise, when the noise energy is high (signal-to-noise ratios below 10 dB, which are the conditions of a complex interference environment (Fig. 1));

the appearance of strong nonlinear distortion after the noise reduction procedure.

The objective of the claimed inventions is to provide methods for filtering a noisy speech signal in a complex interference environment.

The objective is achieved through the technical result of an increased signal-to-noise ratio of the cleaned speech signal processed by the methods for filtering a noisy speech signal in a complex interference environment.

The claimed methods are characterized in that, at the sampling stage, the sampling frequency is fixed at 44100 Hz; at the segmentation stage, the quasi-stationarity period is fixed at 1024 samples; and polyspectral analysis is applied, which involves estimating and working with not only the power spectrum but also the bi-amplitude of the bispectrum of the processed noisy speech signal (Totsky A.V., Astola J., Signal reconstruction from bispectrum estimates in the presence of Gaussian and non-Gaussian interference, Zarubezhnaya Radioelektronika, 2002, No. 11, pp. 44-58; Nikias C.L., Raghuveer M.R., Bispectral estimation as applied to digital signal processing, TIIER, 1987, vol. 75, No. 7, pp. 5-30; Zhang Ji-Wu, Zheng Chong-Xun, and Xie Au, Bispectrum analysis of focal ischemic cerebral EEG signal using third-order recursion method, IEEE Trans. Biomedical Engineering, vol. 47, No. 3, March 2000, pp. 352-359).

For this purpose, at the design stage, statistics are collected that describe the statistical and parametric properties of Russian speech, and the information obtained is written to the information storage unit.

The accumulation of a priori information about the clean speech signal consists of the following sequentially performed actions:

I) The following test phrases are used (GOST R 51061-97, Low-bit-rate speech transmission systems over digital channels. Speech quality parameters and test methods. Moscow: Gosstandart of Russia, 1997, 230 p.). These phrases characterize Russian speech and describe its statistical and parametric features; the total number of recordings is M = 3:

A) "Если хочешь быть здоров, советует Татьяна Илье - чистить зубы пастой "Жемчуг"" ("If you want to be healthy, Tatyana advises Ilya, brush your teeth with Zhemchug toothpaste");

B) "В клумбах сочинской здравницы "Пуща", как сообщает автоинспектор, обожгли шихту" ("In the flower beds of the Sochi health resort Pushcha, as the traffic inspector reports, the furnace charge was fired");

C) "Актеры и актрисы драматического театра часто покупают в этой аптеке антибиотики" ("Actors and actresses of the drama theater often buy antibiotics at this pharmacy").

These test phrases characterize the variability of the Russian language and accurately describe its parametric and statistical properties.

II) The test phrases are recorded from 40 speakers. Of these, 25 are male (5 under 20 years old, 5 aged 20 to 25, 5 aged 25 to 35, 5 aged 35 to 50, and 5 over 50) and 15 are female (3 under 20 years old, 3 aged 20 to 25, 3 aged 25 to 35, 3 aged 35 to 50, and 3 over 50). The total number of speakers is X = 40 (Fig. 2).

III) Recording is carried out in the absence of noise.

IV) The recorded test phrases, 8 s each, undergo analog-to-digital conversion at a sampling frequency of 44100 Hz.

V) The resulting sample sequences are divided into quasi-stationary segments of 1024 samples each; the total number of segments is D = 344.

VI) The average instantaneous energy over a segment of the clean speech signal is determined from three quantities (the formula images are not preserved in this text):

the energy of the sample with number i, i = 1:1024, in segment d of recording l of speaker x of the clean speech signal, for d = 1, 2, 3, ..., D, l = 1, 2, ..., M, x = 1, 2, 3, ..., X;

the instantaneous energy of segment d of recording l of speaker x of the clean speech signal, for d = 1, 2, 3, ..., D, l = 1, 2, ..., M, x = 1, 2, 3, ..., X;

the average instantaneous energy over a 1024-sample segment, taken over all D segments, M recordings, and X speakers.

VII) The a priori data obtained are written to the information storage unit of the speech processing device.
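Steps IV to VI can be illustrated with a short sketch. The formulas for the sample energy and the segment energy are not preserved in this text, so the code assumes the common definitions: the energy of a sample is its square, and the energy of a segment is the sum of the squared samples. The function names are illustrative and are not taken from the patent.

```python
FS_HZ = 44100        # sampling frequency stated in the text
SEGMENT_LEN = 1024   # quasi-stationarity segment length in samples


def segment_count(duration_s, fs_hz=FS_HZ, seg_len=SEGMENT_LEN):
    """Number of whole segments in a recording of the given duration."""
    return int(duration_s * fs_hz) // seg_len


def split_segments(samples, seg_len=SEGMENT_LEN):
    """Split a sample sequence into whole segments, dropping the incomplete tail."""
    n = len(samples) // seg_len
    return [samples[k * seg_len:(k + 1) * seg_len] for k in range(n)]


def segment_energy(segment):
    """Assumed definition: sum of squared samples of one segment."""
    return sum(s * s for s in segment)


def average_segment_energy(recordings, seg_len=SEGMENT_LEN):
    """Mean segment energy over all segments of all recordings.

    `recordings` is a flat list of sample sequences, one per
    (speaker, phrase) pair, i.e. X * M sequences in the text's notation.
    """
    energies = [
        segment_energy(seg)
        for rec in recordings
        for seg in split_segments(rec, seg_len)
    ]
    if not energies:
        raise ValueError("no complete segments found")
    return sum(energies) / len(energies)


# An 8 s phrase at 44100 Hz gives 352800 samples, i.e. 344 whole segments.
print(segment_count(8.0))  # 344
```

The single number returned by `average_segment_energy` plays the role of the a priori value stored in step VII, under the assumptions above.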

The complete sequence of actions performed according to the proposed methods for filtering a noisy speech signal in a complex interference environment can be represented as follows (Fig. 3). Here A1, A2, ..., A10 are sequences of composite actions for processing the noisy speech signal, described below; d is the sequential segment number, so that d is the current processing segment, d-1 is the segment preceding it, and so on; L is the total number of segments required for a consistent estimate of the spectral characteristics of the noise (for a sampling frequency of 44100 Hz and a local stationarity segment of 23 ms, L ≥ 88, equivalent to about 2 s):

A1) Reception of the continuous noisy speech signal.

A2):

1) Analog-to-digital conversion at a sampling frequency of 44100 Hz.

2) Segmentation of the speech signal into local stationarity segments of 23 ms (1024 samples).
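The timing figures quoted above follow directly from the sampling frequency and the segment length. The sketch below checks them and shows the d > L test that separates the prototype path from the polyspectral path. The function names and the branch labels are illustrative assumptions; only the constants come from the text.

```python
FS_HZ = 44100        # sampling frequency stated in the text
SEGMENT_LEN = 1024   # samples per local stationarity segment
L_SEGMENTS = 88      # segments needed for a consistent noise estimate


def segment_duration_ms(seg_len=SEGMENT_LEN, fs_hz=FS_HZ):
    """Duration of one segment in milliseconds."""
    return 1000.0 * seg_len / fs_hz


def window_duration_s(n_segments=L_SEGMENTS, seg_len=SEGMENT_LEN, fs_hz=FS_HZ):
    """Duration of the noise-estimation window in seconds."""
    return n_segments * seg_len / fs_hz


def processing_branch(d, l_segments=L_SEGMENTS):
    """Pick the processing path for the segment with sequential number d.

    Until more than L segments have been received, the text falls back to
    the prototype's actions (A3 to A5); afterwards the polyspectral
    analysis is used. The string labels are hypothetical.
    """
    return "polyspectral" if d > l_segments else "prototype"


print(round(segment_duration_ms(), 1))  # 23.2
print(round(window_duration_s(), 2))    # 2.04
```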

If the sequential segment number does not satisfy the condition for a consistent estimate of the spectral characteristics of the noise, the sequence of actions of the prototype is performed.

A3) Division of the processed signal into voiced and unvoiced sections of the speech signal by low-pass and high-pass filtering.

A4) Spectral subtraction for the unvoiced sections of speech.

A5) Amplitude-linear filtering (ALF) for the voiced sections of speech (the dominant spectral component in the region of the fundamental frequency is identified, and ALF with an attenuation of 6 dB per octave is applied relative to it).

A1, A2, A3, A4 and A5 are described in sufficient detail in [O.I. Shelukhin, N.F. Lukyantsev, Digital Processing and Transmission of Speech, Moscow: Radio i Svyaz, 2000, pp. 102-112, 123-146; Bykov S.F., Zhuravlev V.I., Shalimov I.A., Digital Telephony: a textbook for universities, Moscow: Radio i Svyaz, 2003, 144 p.] and in the prototype.

If the sequential number of the processing segment satisfies the condition for a consistent estimate of the spectral characteristics of the noise (d > L), the following actions are performed.

A6):

3) The instantaneous empirical noise-to-signal ratio (NSR, the inverse of the signal-to-noise ratio) is estimated for each segment in the noise-estimation window, i.e. the analyzed segment and the L-1 segments preceding it (88 segments, about 2 s, for a 1024-sample analysis segment at a sampling frequency of 44100 Hz):

where G(d) is the noise-to-signal ratio in dB; d is the sequential number of the processing segment; U(i) is the sample number of the processing segment.

4) An estimate is obtained of the average empirical noise-to-signal ratio for the processing segment, taking into account the instantaneous NSR estimates over the noise-estimation window, i.e. the analyzed segment and the L-1 segments preceding it (88 segments, about 2 s, for a 1024-sample analysis segment at a sampling frequency of 44100 Hz):

5) An estimate is obtained of the threshold value of the resolution of the bi-amplitude of the bispectrum of the noisy speech segment:

6) An estimate is obtained of the threshold value of the segment selection procedure by analyzing the low-density region of the bi-amplitude:

7) A forward fast discrete Fourier transform is performed on the segment:

8) An estimate is obtained of the truncated Fourier amplitude spectrum of the noisy speech segment at i = 1:122:

9) An estimate is obtained of the Fourier phase spectrum of the noisy speech segment at i = 1:1024:

where the expression uses the value of the imaginary component and the value of the real component of the complex Fourier spectrum at frequency i.
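Steps 7 to 9 can be sketched as follows. The formulas themselves are not preserved in this text, so the code assumes the standard definitions: the amplitude spectrum is the modulus of the complex Fourier spectrum, and the phase is the four-quadrant arctangent of the imaginary part over the real part. The pure-Python radix-2 FFT is only there to keep the example self-contained, and the mapping of the text's indices i = 1:122 onto the first 122 bins is an assumption.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def amplitude_spectrum(spectrum, n_bins=122):
    """Truncated amplitude spectrum: modulus of the first n_bins bins."""
    return [abs(c) for c in spectrum[:n_bins]]


def phase_spectrum(spectrum):
    """Phase of every bin from its imaginary and real parts."""
    return [math.atan2(c.imag, c.real) for c in spectrum]


# Example: a 1024-sample segment holding exactly three periods of a sine.
segment = [math.sin(2 * math.pi * 3 * n / 1024) for n in range(1024)]
spectrum = fft(segment)
amplitudes = amplitude_spectrum(spectrum)  # 122 values
phases = phase_spectrum(spectrum)          # 1024 values
```

The full-length phase spectrum is kept because the prototype description reuses the phase of the noisy segment in the inverse transform, while the amplitude spectrum is truncated to the range used by the later steps.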

10) An estimate is obtained of the slice of the bi-amplitude, which is synthesized by the direct method according to the following expression for p = 11, q ≤ 122, p + q ≤ 122:

where the expression uses the values of the amplitude Fourier spectrum at frequency p, at frequency q, and at frequency p + q.
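The expression for step 10 is not preserved, but the three quantities it names match the usual direct estimate of the bispectrum magnitude, namely the product of the amplitude spectrum at p, q and p + q. The sketch below computes that product along the slice p = 11 under this assumption. Treating the spectrum as indexable by the text's frequency index and starting q at 1 are also assumptions.

```python
def biamplitude_slice(amp, p=11, max_index=122):
    """Bi-amplitude slice at fixed p, assuming B(p, q) = A(p) * A(q) * A(p + q).

    `amp` must be indexable by frequency index up to max_index.
    Returns a dict {q: B(p, q)} for q = 1 .. max_index - p, which
    satisfies both q <= max_index and p + q <= max_index.
    """
    if len(amp) <= max_index:
        raise ValueError("amplitude spectrum is too short for max_index")
    return {q: amp[p] * amp[q] * amp[p + q] for q in range(1, max_index - p + 1)}


# Toy example with amp[i] = i, p = 2 and indices limited to 6.
toy = list(range(10))
print(biamplitude_slice(toy, p=2, max_index=6))
# {1: 6, 2: 16, 3: 30, 4: 48}
```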

11) Осуществляют стабилизацию разрешающей способности биамплитуды на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с при р=11, q≤122, p+q≤122:11) Stabilize the resolution of the bi-amplitude for the duration of the noise assessment, i.e. the analyzed segment and L-1 - segments preceding it, that for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s at p = 11, q≤122, p + q≤122:

12) Находят значение суммарной бикорреляции для вокализованных элементов речи на сегменте анализа р=11:12) Find the value of the total bicorrelation for voiced speech elements on the analysis segment p = 11:

13) Находят среднее значение суммарной бикорреляции для вокализованных элементов речи на сегменте анализа:13) Find the average value of the total bicorrelation for voiced speech elements on the analysis segment:

14) Находят максимальное значение суммарной бикорреляции для вокализованных элементов речи на сегменте анализа:14) Find the maximum value of the total bicorrelation for voiced speech elements on the analysis segment:

15) Осуществляют первую ступень нормировки суммарной бикорреляции для вокализованных элементов речи на сегменте:15) Carry out the first stage of normalization of the total bicorrelation for voiced elements of speech on the segment:

16) Находят максимальное значение 1-нормированной суммарной бикорреляции для вокализованных элементов речи C₁(d) на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:16) Find the maximum value of the 1-normalized total bicorrelation for voiced speech elements C ₁ (d) over the duration of the noise estimate, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

17) Осуществляют вторую ступень нормировки суммарной бикорреляции для вокализованных элементов речи на сегменте:17) Carry out the second stage of normalization of the total bicorrelation for voiced elements of speech on the segment:

18) Получают оценку среднего значения 2-нормированной суммарной бикорреляции для вокализованных элементов речи C₂(d) на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:18) An estimate is obtained of the average value of the 2-normalized total bicorrelation for voiced speech elements C ₂ (d) over the duration of the noise estimate, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

19) Осуществляют третью ступень нормировки суммарной бикорреляции для вокализованных элементов речи на сегменте:19) Carry out the third stage of normalization of the total bicorrelation for voiced elements of speech on the segment:

20) Находят значение суммарной бикорреляции для невокализованных элементов речи на сегменте анализа р=11:20) Find the value of the total bicorrelation for unvoiced elements of speech on the analysis segment p = 11:

21) Находят среднее значение бикорреляции для невокализованных элементов речи на сегменте анализа:21) Find the average bicorrelation value for unvoiced speech elements on the analysis segment:

22) Находят максимальное значение бикорреляции для невокализованных элементов речи на сегменте анализа:22) Find the maximum bicorrelation value for unvoiced speech elements on the analysis segment:

23) Осуществляют первую ступень нормировки суммарной бикорреляции для невокализованных элементов речи на сегменте:23) Carry out the first stage of normalization of total bicorrelation for unvoiced speech elements on a segment:

24) Находят максимальное значение 1-нормированной суммарной бикорреляции для невокализованных элементов речи Н₁(d) на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:24) Find the maximum value of the 1-normalized total bicorrelation for unvoiced speech elements H ₁ (d) over the duration of the noise estimate, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

25) Осуществляют вторую ступень нормировки суммарной бикорреляции для невокализованных элементов речи на сегменте:25) Carry out the second stage of normalization of the total bicorrelation for unvoiced speech elements on the segment:

26) Находят среднее значение 2-нормированной суммарной бикорреляции для невокализованных элементов речи H₂(d) на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:26) Find the average value of the 2-normalized total bicorrelation for unvoiced speech elements H ₂ (d) over the duration of the noise estimate, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

27) Осуществляют третью ступень нормировки суммарной бикорреляции для невокализованных элементов речи на сегменте:27) Carry out the third stage of normalization of the total bicorrelation for unvoiced speech elements on the segment:

28) Получают оценку 3-нормированной суммарной бикорреляции для всех элементов речи на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:28) An estimate of the 3-normalized total bicorrelation for all speech elements is obtained for the duration of the noise estimate, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

29) Получают оценку среднего значения 2-нормированной суммарной бикорреляции для всех элементов речи на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:29) Get an estimate of the average value of the 2-normalized total bicorrelation for all speech elements on the duration of the noise assessment, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

30) Получают оценку коэффициента стабилизации порогового значения процедуры выделения сегментов:30) Get an estimate of the stabilization coefficient of the threshold value of the procedure for allocating segments:

31) Осуществляют стабилизацию порогового значения процедуры выделения сегментов:31) Perform the stabilization of the threshold value of the procedure for the allocation of segments:

32) Осуществляют выделение сегментов для оценки спектральных характеристик шумового воздействия на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 с:32) Carry out the allocation of segments to assess the spectral characteristics of the noise exposure on the duration of the noise assessment, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 s:

где J_S(d) - признак сегмента, выделенного для оценки спектральных характеристик шумового воздействия.where J _S (d) is a sign of the segment allocated to assess the spectral characteristics of noise exposure.

33) Осуществляют выделение сегментов для оценки спектральных характеристик шумового воздействия для процедуры эмпирической модовой декомпозиции (ЭМД) на длительности оценки шума, т.е. анализируемый сегмент и L-1 - сегментов предшествующих ему, что для сегмента анализа 1024 отсчета и частоте дискретизации 44100 Гц составляет 88 сегментов примерно 2 секунды:33) The segments are selected to evaluate the spectral characteristics of the noise exposure for the empirical mode decomposition (EMD) procedure for the duration of the noise estimation, i.e. the analyzed segment and L-1 - segments preceding it, which for the analysis segment of 1024 samples and the sampling frequency of 44100 Hz is 88 segments for about 2 seconds:

где J_E(d)- признак сегмента, выделенного для оценки спектральных характеристик шумового воздействия, подаваемого на вход процедуры ЭМД.where J _E (d) is a sign of the segment allocated for evaluating the spectral characteristics of the noise exposure supplied to the input of the EMD procedure.

34) Find the instantaneous power spectra of each segment over the noise-estimation interval, i.e. the analyzed segment and the L-1 segments preceding it (for an analysis segment of 1024 samples and a sampling rate of 44100 Hz this is 88 segments, or approximately 2 s), for i = 1:122, d = d-L+1:d:

35) Estimate the spectral characteristics of the noise for the spectral subtraction procedure over the noise-estimation interval, i.e. the analyzed segment and the L-1 segments preceding it (for an analysis segment of 1024 samples and a sampling rate of 44100 Hz this is 88 segments, or approximately 2 s), for i = 1:122:

36) Check the estimate of the spectral characteristics of the noise for narrowband character on the analysis segment, for spectral component index i = 1:122:

37) Estimate the bicorrelation coefficient on the analyzed segment:

38) Perform the first stage of stabilizing the bicorrelation coefficient estimate on the analyzed segment:

39) Perform the second stage of stabilizing the bicorrelation coefficient estimate on the analyzed segment:

A7):

40) Perform spectral subtraction according to the following expressions for i = 1:122:

where

- the value of the spectral component at frequency i of the Fourier amplitude spectrum of the cleaned segment of the processed speech signal at the output of the spectral subtraction procedure.
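The expressions for spectral subtraction are not reproduced in this text. For orientation only, the sketch below shows the textbook form of amplitude spectral subtraction with a spectral floor. The function name, the `floor` parameter and the clipping rule are assumptions and may differ from the patented expressions.

```python
import numpy as np

def spectral_subtract(noisy_amp, noise_amp, floor=0.0):
    """Subtract a noise amplitude estimate bin by bin.

    Bins where the difference falls below floor * noisy_amp are clamped
    to that floor, so the result is never negative for floor >= 0.
    """
    noisy_amp = np.asarray(noisy_amp, dtype=float)
    noise_amp = np.asarray(noise_amp, dtype=float)
    return np.maximum(noisy_amp - noise_amp, floor * noisy_amp)

# Toy amplitudes for three bins (illustrative values only)
cleaned = spectral_subtract([3.0, 1.0, 2.0], [1.0, 2.0, 2.0])
```

Clamping is what introduces the nonlinear residual artifacts that later correction stages are meant to reduce.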

41) Perform the inverse discrete Fourier transform:

where S^*(1:1024)_d is the cleaned segment of the processed speech signal at the output of the inverse Fourier transform block,

- the Fourier amplitude spectrum at the output of the spectral subtraction procedure,

F_U(1:1024)_d is the Fourier phase spectrum of the analysis segment of the noisy speech signal.
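Step 41 pairs the cleaned amplitude spectrum with the phase spectrum of the noisy segment. A minimal sketch of that resynthesis follows. It assumes a one-sided spectrum handled with NumPy's real FFT, whereas the description speaks of a 1024-point phase spectrum, and the function name is hypothetical.

```python
import numpy as np

SEGMENT_LEN = 1024

def resynthesize(clean_amp, noisy_phase):
    """Combine amplitude and phase into a one-sided spectrum and invert it."""
    spectrum = np.asarray(clean_amp) * np.exp(1j * np.asarray(noisy_phase))
    return np.fft.irfft(spectrum, n=SEGMENT_LEN)

# Round trip: with unmodified amplitudes the original segment is recovered
rng = np.random.default_rng(1)
segment = rng.normal(size=SEGMENT_LEN)
one_sided = np.fft.rfft(segment)
restored = resynthesize(np.abs(one_sided), np.angle(one_sided))
```

Reusing the noisy phase is the usual choice in spectral subtraction, because only the amplitude spectrum is estimated and corrected.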

A8):

42) Estimate the spectral characteristics of the noise for the EMD procedure over the noise-estimation interval, i.e. the analyzed segment and the L-1 segments preceding it (for an analysis segment of 1024 samples and a sampling rate of 44100 Hz this is 88 segments, or approximately 2 s), for i = 1:122:

43) Check the residual noise for narrowband character on the analysis segment, for i = 1:122:

44) Form the time-domain realizations of the residual noise for an analysis segment of 1024 samples at a sampling rate of 44100 Hz (a noise-estimation interval of 88 segments, approximately 2 s):

where E(1:1024)_d is the time-domain realization of the residual noise on the analyzed segment of the cleaned signal at the output of the spectral subtraction procedure,

- the Fourier amplitude spectrum of the residual noise.

45) Perform the empirical mode decomposition, i.e. decompose the sample sequences S^*(1:1024)_d and E(1:1024)_d into their constituent modes according to the following expressions:

46) Perform mode-wise subtraction of the resulting empirical decompositions for i = 1:1024, j = 1:15:

47) Restore the complete signal after the mode-wise subtraction:

where S^**(1:1024)_d is the cleaned segment of the processed speech signal at the output of the empirical mode decomposition procedure.

A9):

48) The time-domain realization of the cleaned segment of the processed speech signal at the output of the empirical mode decomposition procedure, S^**(1:1024)_d, is fed to the input of the adaptive digital low-pass filter, and the following operations are performed:

where S^***(1:1024)_d is the cleaned segment of the processed speech signal at the output of the adaptive digital low-pass filter,

S^**(1:1024)_d is the cleaned segment of the processed speech signal at the output of the empirical mode decomposition procedure,

O(d) is the estimate of the bicorrelation coefficient on the analyzed segment,

×I_f is the operation of convolution with the impulse response of the digital low-pass filter.
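The last definition refers to convolution with the impulse response I_f of a digital low-pass filter. The sketch below illustrates only that convolution, using a windowed-sinc FIR design chosen for illustration. How O(d) adapts the filter is defined by expressions not reproduced in this text, so the cutoff is left as a plain parameter; the function names, the tap count and the cutoff values are assumptions.

```python
import numpy as np

SAMPLE_RATE_HZ = 44100

def lowpass_fir(cutoff_hz, num_taps=101):
    """Windowed-sinc low-pass impulse response, normalized to unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / SAMPLE_RATE_HZ * n) * np.hamming(num_taps)
    return h / h.sum()

def apply_filter(segment, impulse_response):
    """Convolve a segment with the impulse response, keeping its length."""
    return np.convolve(segment, impulse_response, mode="same")

# A 15 kHz tone is suppressed, a 500 Hz tone passes (4 kHz cutoff assumed)
t = np.arange(1024) / SAMPLE_RATE_HZ
h = lowpass_fir(4000.0)
high_out = apply_filter(np.sin(2 * np.pi * 15000 * t), h)
low_out = apply_filter(np.sin(2 * np.pi * 500 * t), h)
```

The first and last samples of each output are affected by the segment edges, which is why segment-wise filtering normally needs some form of overlap or state carry-over between segments.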

Based on the sequence of actions described above, the functional composition of each claim is as follows (Fig. 3):

Method according to claim 1: A1-A7;

Method according to claim 2: A1-A8;

Method according to claim 3: A1-A9.

A possible device for implementing the claimed methods is shown in Fig. 4:

1) The level of control actions and preset a priori data on the clean speech signal (it can be implemented with a processor combined with read-only memory), comprising:

1 - control unit (functionally connected to all blocks);

2 - a priori data storage unit (connected to block 8, which estimates the empirical noise-to-signal ratio);

3 - short-term data storage unit (can be implemented in random-access memory);

2) The stage of receiving the continuous speech signal, comprising:

4 - block for receiving the continuous noisy speech signal;

3) The stage of analog-to-digital conversion and segmentation of the speech signal, comprising:

5 - analog-to-digital conversion block;

6 - block for segmenting the discrete noisy speech signal being processed into quasi-stationary segments;

4) The stage of polyspectral analysis of the speech signal with serial-parallel processing, comprising:

7 - forward Fourier transform block;

8 - block for estimating the empirical noise-to-signal ratio;

9 - block for estimating the spectral characteristics of the noise and the bicorrelation properties of the processed segment of the noisy speech signal;

10 - spectral subtraction block;

11 - inverse Fourier transform block;

5) The correction stage, comprising:

12 - empirical mode decomposition block;

13 - adaptive digital low-pass filter.

The procedures for receiving, analog-to-digital conversion and segmentation of a speech signal, and their implementation, are described in sufficient detail in (Solonina A.I., Ulakhovich D.A., Arbuzov S.M., Solovieva E.B. Fundamentals of Digital Signal Processing: A Course of Lectures. St. Petersburg: BHV-Petersburg, 2003, pp. 425-446). The formation and reception of a transmission frame, performed by blocks 3, 4 and 5, are described in (Bykov S.V., Zhuravlev V.I., Shalimov I.A. Digital Telephony: A Textbook for Universities. Moscow: Radio i Svyaz, 2003, pp. 79-87).

Blocks 1 to 13 can together be implemented on the TORNADO-P64 module developed by MicroLAB Systems (Zhuchkov K., Khoruzhiy S., Chepel E. Polyspectral signal analyzer based on the TMS320C6416 digital signal processor module. CHIP NEWS, Digital Signal Processing).

The device implementing the claimed methods operates as follows (Fig. 4), where P0 is the connection to the input of the device proposed in the prototype, P1 is the cleaned-signal output according to claim 1, P2 is the cleaned-signal output according to claim 2, and P3 is the cleaned-signal output according to claim 3.

According to the method of claim 1:

A continuous noisy acoustic signal arrives at the input of block 4, where it undergoes acoustoelectric conversion. The resulting continuous electrical signal from the output of block 4 is fed to the input of the analog-to-digital conversion block 5, which produces discrete samples of the speech signal at a sampling rate of 44100 Hz. The sequence of samples from the output of block 5 is fed to the input of the segmentation block 6, which divides it into quasi-stationary segments of 1024 samples. From the output of block 6 the speech signal is passed, segment by segment, to the inputs of the forward Fourier transform block 7 and of block 8, which estimates the empirical noise-to-signal ratio. From the output of block 7, the sequence of values of the truncated Fourier amplitude spectrum is fed to the inputs of block 9 (estimation of the spectral characteristics of the noise and of the bicorrelation properties of the processed segment of the noisy speech signal) and of the spectral subtraction block 10; also from the output of block 7, the sequence of values of the phase spectrum of the processed segment of the noisy speech signal is fed to the input of the inverse Fourier transform block 11. From the output of block 8, the input of block 9 receives the mean empirical noise-to-signal ratio on the processing segment and the sequence of instantaneous empirical noise-to-signal ratios for each segment over the preceding 2 s. From the output of block 9, the input of block 10 receives the sequence of values of the estimated amplitude spectrum of the noise for the processing segment. Block 10 performs spectral subtraction on the Fourier amplitude spectra, and the sequence of values of the Fourier amplitude spectrum of the cleaned speech signal passes from its output to the input of block 11. The output of block 11 is the sequence of values of the time-domain realization of the cleaned speech segment.
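The signal flow for claim 1 can be sketched end to end. This is a simplified illustration rather than the patented procedure: the noise amplitude estimate from block 9 is passed in as an argument, the full spectrum is used instead of the truncated one, negative differences are clipped to zero, and all function names are hypothetical.

```python
import numpy as np

SEGMENT_LEN = 1024

def split_into_segments(samples):
    """Block 6: cut the sample stream into whole 1024-sample segments.

    Trailing samples that do not fill a segment are dropped.
    """
    samples = np.asarray(samples, dtype=float)
    count = len(samples) // SEGMENT_LEN
    return samples[: count * SEGMENT_LEN].reshape(count, SEGMENT_LEN)

def clean_segment(segment, noise_amp):
    """Blocks 7, 10 and 11: FFT, amplitude subtraction, inverse FFT."""
    spectrum = np.fft.fft(segment)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    cleaned_amp = np.maximum(amplitude - noise_amp, 0.0)
    # The phase of the noisy segment is reused unchanged
    return np.real(np.fft.ifft(cleaned_amp * np.exp(1j * phase)))

stream = np.sin(0.1 * np.arange(2500))        # stand-in for the sampled input
segments = split_into_segments(stream)
outputs = [clean_segment(seg, 0.0) for seg in segments]
```

With a zero noise estimate each segment is returned unchanged, which is a convenient sanity check before plugging in a real estimator.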

According to the method of claim 2:

As in the method of claim 1, with the following additional connections and operations:

From the output of block 10, the input of block 9 receives the sequences of values of the amplitude spectra of the cleaned speech signal over the last 2 s, and block 9 estimates the spectral characteristics of the residual noise. From the output of the noise estimation block 9, the input of block 12 then receives the sequence of values of the estimated amplitude spectrum of the residual noise. From the output of block 11, the second input of block 12 receives the sequence of values of the time-domain realization of the speech segment produced by the spectral subtraction procedure. From the output of the forward Fourier transform block, the input of block 12 receives the sequence of values of the Fourier phase spectrum of the noisy speech signal. Block 12 synthesizes the time-domain realization of the residual noise for the processing segment and corrects the processing segment of the cleaned speech signal from the spectral subtraction output on the basis of empirical mode decomposition. The output of block 12 is the sequence of values of the time-domain realization of the cleaned speech segment.

According to the method of claim 3:

As in the method of claim 2, with the following additional connections and operations:

From the output of block 9, the input of the adaptive digital low-pass filter 13 receives the value of the bicorrelation coefficient for the processing segment. From the output of block 12, the second input of block 13 receives the sequence of values of the cleaned speech segment produced by the empirical mode decomposition procedure. The output of block 13 is the sequence of values of the time-domain realization of the cleaned speech segment.

The control unit 1 operates in real time and exercises overall control of all procedures. The a priori data storage unit 2 is built on read-only memory and stores the mean value of the instantaneous energy on a segment of the clean speech signal. The short-term data storage unit 3 operates in real time; it receives and stores the estimates of all the informative features described above for the proposed method over the last two seconds of processing, and exchanges these data among all blocks.

When filtering under weak noise, the decision rules proposed in the prototype are highly effective. Under high-intensity noise, however, their effectiveness falls, because the spectral characteristics of the noise become difficult to estimate and nonlinear distortions arise after spectral subtraction, which ultimately lowers the signal-to-noise ratio.

To evaluate the effect of introducing the various operations on the noisy speech signal being processed, the signal-to-noise ratio (SNR) of the cleaned speech signal produced by the filtering method is used.

In physical terms, the SNR measures how close the noisy speech signal is to the clean one before filtering by the method, and how close the cleaned speech signal is to the clean one after filtering by the method.
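A common way to compute such an SNR, when the clean reference is available, is to treat the difference between the processed and clean signals as noise. The sketch below assumes that definition, which the description does not spell out, and uses a hypothetical function name.

```python
import math

def snr_db(clean, processed):
    """SNR in dB, treating (processed - clean) as the noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((p - c) ** 2 for c, p in zip(clean, processed))
    if noise_power == 0:
        return math.inf   # processed signal is identical to the clean one
    return 10.0 * math.log10(signal_power / noise_power)

# Toy example: an error of 0.1 on samples of magnitude 1
clean = [1.0, -1.0, 1.0, -1.0]
processed = [1.1, -0.9, 1.1, -0.9]
value = snr_db(clean, processed)
```

The same function serves both comparisons mentioned above: pass the noisy signal to measure the SNR before filtering, and the cleaned signal to measure it after.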

Accordingly, as an objective indicator of the improvement of the proposed methods over the prototype, we take the average increase in the SNR of the cleaned speech signal over the range from 10 to -10 dB, since an increase in SNR is the technical result achieved by the proposed inventions.

Fig. 10 plots the SNR before filtering against the SNR after filtering, for the proposed methods and the prototype, under various noises.

The average SNR increase ΔD^j of the j-th method, in dB, is estimated according to the following expression:

- the SNR at the output of the proposed j-th method for the (10-i)-th signal-to-noise ratio at the input;

P_i is the SNR at the output of the prototype for the (10-i)-th signal-to-noise ratio at the input;

h is the type of noise;

I is the number of signal-to-noise ratio levels at the input of the methods, 20 (from plus 10 to minus 10 dB);

H is the number of noise types, 9:

1) White Gaussian noise.

2) Engine noise.

3) City noise.

4) Wind noise.

5) Helicopter noise.

6) Idealized sinusoid (tonal noise).

7) Passing train noise.

8) Battle noise.

9) Burning building noise.
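The expression for the average SNR increase is not reproduced in this text. One reading consistent with the definitions above is a plain mean, over all H noise types and I input levels, of the difference between the method's and the prototype's output SNR. The sketch below assumes that reading; the function name and the toy numbers are illustrative and are not measured data.

```python
def mean_snr_gain(method_snr, prototype_snr):
    """Average of (method - prototype) output SNR over all noises and levels.

    Both arguments are lists of rows: one row per noise type h,
    one entry per input SNR level i.
    """
    diffs = [
        d - p
        for method_row, proto_row in zip(method_snr, prototype_snr)
        for d, p in zip(method_row, proto_row)
    ]
    return sum(diffs) / len(diffs)

# Toy table with H = 2 noise types and I = 2 input levels (made-up values)
method = [[5.0, 6.0], [7.0, 8.0]]
prototype = [[4.0, 4.0], [5.0, 5.0]]
gain_db = mean_snr_gain(method, prototype)
```

In the setting described here the tables would have 9 rows and 20 columns.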

The effectiveness check yielded the following values of the average increase in the signal-to-noise ratio:

ΔD¹ = 3.33678708525837 dB for the method of claim 1,

ΔD² = 3.75325376882644 dB for the method of claim 2,

ΔD³ = 4.1522149764898 dB for the method of claim 3.

Based on this evaluation of the proposed methods, it can be stated with confidence that they filter a noisy speech signal with an average increase in the signal-to-noise ratio of 3.33 dB to 4.15 dB.

The reliability of the technical result is confirmed by experimental data obtained in tests. Various recordings of speech signals were additively corrupted with white Gaussian noise and with various kinds of real noise at different signal-to-noise ratios, and the noisy signals were tested repeatedly, comparing the different methods as implemented in the MATLAB environment. The tests followed standard industry procedures (GOST R 51061-97. Low-bit-rate speech transmission systems over digital channels. Speech quality parameters and test methods. Moscow: Gosstandart of Russia, 1997, 230 pp.) and showed that the proposed methods increase the signal-to-noise ratio.
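Additive corruption at a prescribed SNR, as used in these tests, is commonly done by scaling the noise before adding it. A minimal sketch follows, assuming that the SNR is defined as a ratio of mean powers and that the noise recording is at least as long as the speech. The function name is hypothetical, and the original experiments were run in MATLAB rather than Python.

```python
import numpy as np

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(SNR/10)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(4410) / 44100)   # stand-in for speech
noise = rng.normal(size=4410)                                 # white Gaussian noise
noisy = mix_at_snr(speech, noise, -10.0)
```

Sweeping `target_snr_db` from 10 down to -10 dB for each noise recording reproduces the kind of test grid described in the text.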

The analysis of the prior art established that there are no analogues characterized by a set of features identical to all the features of the claimed method of filtering a noisy speech signal. The claimed invention therefore meets the patentability condition of novelty.

A search for known solutions in this and related fields of technology, aimed at identifying features that coincide with the features distinguishing the claimed subject matter from the prototype, showed that these features do not follow explicitly from the prior art. Nor does the prior art disclose the effect of the transformations provided for by the essential features of the claimed invention on achieving the stated technical result. The claimed invention therefore meets the patentability condition of inventive step.

The claimed invention is illustrated by the following figures:

Fig. 1 shows the conditions of a complex interference environment:

A) clean speech signal;

B) speech signal + white Gaussian noise, SNR minus 10 dB;

C) speech signal + engine noise, SNR minus 10 dB;

D) speech signal + wind noise, SNR minus 20 dB.

Fig. 2 shows recordings of the test phrases:

A) test phrase No. 1, speaker 5 (woman, 30 years old);

B) histogram of the sample values of test phrase No. 1, speaker 5;

C) test phrase No. 2, speaker 23 (man, 66 years old);

D) histogram of the sample values of test phrase No. 2, speaker 23;

E) test phrase No. 3, speaker 37 (man, 21 years old);

F) histogram of the sample values of test phrase No. 3, speaker 37.

Fig. 3 shows flowcharts of the methods according to the claims; П1 - П7 denote items of the claims.

Fig. 4 shows an embodiment of a device for implementing the claimed methods. The composition of the device is given above in the description.

Fig. 5 shows the procedure for stabilizing the resolution of the biamplitude:

А) cross-section of the biamplitude of the noisy SS by segments before the resolution stabilization procedure;

Б) cross-section of the biamplitude of the noisy SS by segments before the resolution stabilization procedure, on the plane of equal values;

В) cross-section of the biamplitude of the noisy SS by segments after the resolution stabilization procedure;

Г) cross-section of the biamplitude of the noisy SS by segments after the resolution stabilization procedure, on the plane of equal values.

Fig. 6 shows the procedure for selecting segments used to estimate the spectral characteristics of the noise by analyzing the zones of concentration of the low-density region of the biamplitude:

А) estimate of the empirical noise-to-signal ratio by segments;

Б) selection of segments for analysis of the spectral characteristics of the noise;

В) indicator of a segment selected for estimating the spectral characteristics of the noise;

Г) indicator of a segment selected for estimating the spectral characteristics of the residual noise;

Д) estimate of the bicorrelation coefficient by segments.

Fig. 7 shows the spectral subtraction procedure:

А) spectrogram of the noisy SS by segments;

Б) true spectrogram of the noise by segments;

В) spectrogram of the noise estimate by segments;

Г) spectrogram of the clean SS by segments;

Д) spectrogram by segments of the cleaned SS at the output of the spectral subtraction procedure.
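The spectral subtraction step illustrated in Fig. 7 can be sketched for a single segment as follows. This is a minimal illustration under stated assumptions, not the patented procedure: it subtracts a given noise amplitude estimate from the Fourier amplitude spectrum and reuses the phase of the noisy segment. The function name `spectral_subtract`, the `floor` parameter, and the constant noise estimate are hypothetical; how the noise estimate is obtained from the bicorrelation analysis is not shown here.

```python
import numpy as np

def spectral_subtract(noisy_segment, noise_amplitude, floor=0.0):
    """Subtract a noise amplitude estimate from one segment's amplitude spectrum.

    noise_amplitude must have len(noisy_segment) // 2 + 1 entries (rfft bins).
    floor is an assumed spectral floor, expressed as a fraction of the
    noisy amplitude, that prevents negative amplitudes.
    """
    spectrum = np.fft.rfft(noisy_segment)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)  # phase of the noisy segment is reused
    cleaned_amplitude = np.maximum(amplitude - noise_amplitude, floor * amplitude)
    # Inverse Fourier transform of the corrected amplitude and the noisy phase.
    return np.fft.irfft(cleaned_amplitude * np.exp(1j * phase), n=len(noisy_segment))

rng = np.random.default_rng(1)
segment = rng.standard_normal(256)      # stand-in for a noisy speech segment
noise_amplitude = np.full(129, 0.5)     # hypothetical flat noise estimate
cleaned = spectral_subtract(segment, noise_amplitude)
```

With a zero noise estimate the segment is returned unchanged, and with a very large estimate and `floor=0.0` the output is silence, which is a quick sanity check of the subtraction logic.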

Fig. 8 shows the correction of the signals at the output of spectral subtraction, based on empirical mode decomposition (EMD) and adaptive digital low-pass filtering using the bicorrelation coefficient:

А) segment of the noisy speech signal, SNR minus 7 dB;

Б) segment of the clean speech signal;

В) segment of the cleaned speech signal at the output of the spectral subtraction block, SNR minus 3.45 dB;

Г) empirical mode decomposition of the speech signal segment at the output of the spectral subtraction procedure;

Д) segment of the time-domain realization of the residual noise estimate;

Е) empirical mode decomposition of the residual noise signal segment;

Ж) result of mode-by-mode subtraction;

З) segment of the cleaned speech signal at the output of the EMD procedure, SNR minus 1.2 dB;

И) segment of the cleaned speech signal at the output of the digital low-pass filter, SNR plus 0.7 dB.

Fig. 9 shows the basis of the approach to analyzing the zones of concentration of the low-density region of the biamplitude: a comparison of voiced, high-energy speech elements with unvoiced, low-energy speech elements.
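For orientation, the biamplitude referred to above is the magnitude of the bispectrum. The sketch below computes it for one segment using the standard direct definition B(k1, k2) = X(k1) X(k2) X*(k1 + k2). It is an assumption-laden illustration: the function name `biamplitude` is hypothetical, only a single segment is used (practical estimates usually average over several segments), and neither the resolution stabilization nor the selection of low-density regions described in the patent is reproduced.

```python
import numpy as np

def biamplitude(segment, nfft=None):
    """Magnitude of the direct bispectrum estimate of one segment."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                      # remove the constant component
    n = len(x) if nfft is None else nfft
    X = np.fft.fft(x, n)
    k = np.arange(n // 2)                 # non-negative frequency bins
    k_sum = (k[:, None] + k[None, :]) % n # index of the sum frequency k1 + k2
    # B(k1, k2) = X(k1) * X(k2) * conj(X(k1 + k2))
    B = X[k][:, None] * X[k][None, :] * np.conj(X[k_sum])
    return np.abs(B)

rng = np.random.default_rng(2)
segment = rng.standard_normal(64)         # stand-in for a speech segment
bi = biamplitude(segment)
```

The result is a symmetric, non-negative matrix over pairs of frequency bins, so only one triangular region needs to be inspected.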

Fig. 10 shows efficiency plots, that is, the output signal-to-noise ratio as a function of the input signal-to-noise ratio, for the proposed methods and the prototype for several types of noise:

1) White Gaussian noise (WGN).

2) Engine noise.

3) City noise.

4) Wind noise.

Based on the efficiency assessment carried out, and in accordance with the solution of the inventive problem, it can be stated with confidence that the proposed methods provide effective filtering in the noise-suppression task, with an average increase in signal-to-noise ratio of 3.33 to 4.15 dB; the objective of the claimed inventions is therefore achieved.
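An SNR gain of this kind can be measured when the clean reference signal is available. The sketch below is illustrative only: the names `mix_at_snr`, `measure_snr_db`, and `snr_gain_db` are hypothetical, a sine wave stands in for speech, and the "processed" signal is an artificial one whose residual noise is simply halved, not the output of the patented methods.

```python
import numpy as np

def mix_at_snr(clean, noise, target_snr_db):
    """Scale noise so that clean + noise has the requested SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_clean / 10^(target_snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return clean + gain * noise

def measure_snr_db(clean, signal):
    """SNR of signal relative to the clean reference, in dB."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(signal, dtype=float) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

def snr_gain_db(clean, noisy, processed):
    """Output SNR minus input SNR, in dB."""
    return measure_snr_db(clean, processed) - measure_snr_db(clean, noisy)

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 200 * t)                 # stand-in for speech
noisy = mix_at_snr(clean, rng.standard_normal(len(t)), -10.0)
processed = clean + 0.5 * (noisy - clean)           # artificial: residual noise halved
gain = snr_gain_db(clean, noisy, processed)
```

Halving the residual noise amplitude reduces its power by a factor of four, so the gain in this artificial case is 20·log10(2), roughly 6 dB.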

Claims

1. A method for filtering a noisy speech signal in a complex interference environment, comprising sequentially executed steps in which a noisy speech signal is received and subjected to analog-to-digital conversion at a predetermined sampling frequency; the noisy speech signal is then divided into quasi-stationary segments; the segments are then classified as voiced or unvoiced based on analysis of the results of low-frequency and high-frequency filtering; the spectral characteristics of the noise are then estimated in preselected segments, and noise reduction is performed separately for a voiced segment in an adaptive filtering module and for an unvoiced segment by spectral subtraction in power spectra; the phase spectrum of the processed segment of the noisy speech signal is then estimated, followed by an inverse Fourier transform of the amplitude spectrum and the phase spectrum to obtain a cleaned speech signal, characterized in that, when estimating the characteristics of the noise, the results of polyspectral analysis are used, on the basis of which the values of the total bicorrelation coefficients are applied, these being obtained from estimates of the bispectrum of the noisy speech signal in the zones of concentration of the low-density region of the biamplitude, with subsequent spectral subtraction in Fourier amplitude spectra for all segments.

2. A method for filtering a noisy speech signal in a complex interference environment, comprising sequentially executed steps in which a noisy speech signal is received and subjected to analog-to-digital conversion at a predetermined sampling frequency; the noisy speech signal is then divided into quasi-stationary segments; the segments are then classified as voiced or unvoiced based on analysis of the results of low-frequency and high-frequency filtering; the spectral characteristics of the noise are then estimated in preselected segments, and noise reduction is performed separately for a voiced segment in an adaptive filtering module and for an unvoiced segment by spectral subtraction in power spectra; the phase spectrum of the processed segment of the noisy speech signal is then estimated, followed by an inverse Fourier transform of the amplitude spectrum and the phase spectrum to obtain a cleaned speech signal, characterized in that, when estimating the characteristics of the noise, the results of polyspectral analysis are used, on the basis of which the values of the total bicorrelation coefficients are applied, these being obtained from estimates of the bispectrum of the noisy speech signal in the zones of concentration of the low-density region of the biamplitude, with subsequent spectral subtraction in Fourier amplitude spectra for all segments; then, after the inverse Fourier transform, the empirical mode decomposition method is additionally applied to eliminate nonlinear artifacts in the cleaned speech signal.

3. A method for filtering a noisy speech signal in a complex interference environment, comprising sequentially executed steps in which a noisy speech signal is received and subjected to analog-to-digital conversion at a predetermined sampling frequency; the noisy speech signal is then divided into quasi-stationary segments; the segments are then classified as voiced or unvoiced based on analysis of the results of low-frequency and high-frequency filtering; the spectral characteristics of the noise are then estimated in preselected segments, and noise reduction is performed separately for a voiced segment in an adaptive filtering module and for an unvoiced segment by spectral subtraction in power spectra; the phase spectrum of the processed segment of the noisy speech signal is then estimated, followed by an inverse Fourier transform of the amplitude spectrum and the phase spectrum to obtain a cleaned speech signal, characterized in that, when estimating the characteristics of the noise, the results of polyspectral analysis are used, on the basis of which the values of the total bicorrelation coefficients are applied, these being obtained from estimates of the bispectrum of the noisy speech signal in the zones of concentration of the low-density region of the biamplitude, with subsequent spectral subtraction in Fourier amplitude spectra for all segments; then, after the inverse Fourier transform, the empirical mode decomposition method is additionally applied to eliminate nonlinear artifacts in the cleaned speech signal; adaptive digital low-pass filtering is then applied to further attenuate the noise in pauses, using the bicorrelation coefficient calculated in the zones of concentration of the low-density region of the biamplitude of the processed noisy speech signal.