RU2814115C1

RU2814115C1 - Method for separating speech and pauses by analyzing characteristics of spectral components of mixture of signal and noise

Info

Publication number: RU2814115C1
Application number: RU2023120818A
Authority: RU
Inventors: Владимир Алексеевич Золотарев
Original assignee: Акционерное общество "Концерн "Созвездие"
Filing date: 2023-08-09
Publication date: 2024-02-22

Abstract

FIELD: computer technology for digital processing of voice information.

SUBSTANCE: spectral analysis is carried out by analysing multi-frequency periodic signals represented by digital samples using compensation of cross products. For several positions of the sliding window, based on the results of the analysis of the values of the interference parameters, threshold values are set for the powers and number of detected spectral components, as well as for the average value of the powers of the spectral components of the interference. For sliding window positions for which a signal may be present, the average number of detected components is calculated. If this number exceeds the corresponding threshold, then the signal duration is calculated. If the signal duration exceeds the minimum threshold value and does not exceed the maximum threshold value, then the average value of the powers of the spectral components of the mixture of speech signal and noise is calculated. If this value exceeds the threshold, then a signal is considered to be present in these sliding windows. Otherwise, it is decided that there is only interference in these time slots.

EFFECT: increasing the accuracy of speech and pause separation in the presence of interference with rapidly changing power.

1 cl, 3 dwg, 1 tbl

Description

The invention relates to the field of digital processing of speech signals and can be used in communication devices.

A method for spectral analysis of electrical signals is known (RF patent No. 2431853), in which the analyzed electrical signal is fed simultaneously to a bank of filters tuned to different frequencies and the signals at the outputs of these filters are measured; before the measurements, the range of monitored frequencies is divided into resolution elements with a sampling step corresponding to the desired accuracy and resolution of the spectral analysis. The disadvantages of this method are the complexity of its technical implementation and its insufficient effectiveness in separating speech and pauses.

A method for spectral analysis of signals is known (RF patent No. 2127888), in which, during sampling and quantization of the signal, sequences of discrete signal values are created, each with a different sample rate. The discrete values of these sequences are filtered with digital band-pass filters and digital low-pass filters. The signals from the outputs of the digital band-pass filters are processed to determine their amplitude values and, from these, the other informative parameters of the band-pass signals. However, this method is not sufficiently effective in separating speech and pauses.

A method for spectral analysis of multi-frequency periodic signals represented by digital samples is known (Functional Monitoring and Diagnostics of Electrical Systems and Devices Using Digital Samples of Instantaneous Current and Voltage Values, edited by E.I. Goldstein. Tomsk: Pechatnaya Manufaktura, 2003, pp. 92-94). The disadvantage of this method is its insufficient effectiveness in separating speech and pauses.

A method for spectral analysis of signals is known (RF patent No. 2730043, G01R23/16). The disadvantage of this method is its insufficient effectiveness in separating speech and pauses.

A method for separating speech and pauses is known from the book: L.R. Rabiner, R.W. Schafer, “Digital Processing of Speech Signals”, translated from English under the editorship of M.V. Nazarov and Yu.N. Prokhorov, Moscow, Radio i Svyaz, 1981, pp. 123-126. The disadvantage of this method is the high probability of an erroneous decision that a signal has appeared when acoustic noise is present.

A method is known for separating speech and pauses by comparative analysis of the power of the interference and the power of the mixture of signal and interference (RF patent 2668407, G10L 25/93). It is not sufficiently effective in separating speech and pauses in the presence of powerful acoustic interference.

A method is known for separating speech and pauses by analyzing the phases of the frequency components of the noise and the signal (RF patent 2680735, G10L 21/0272). It is not sufficiently effective in separating speech and pauses when the acoustic interference has a large number of frequency components.

A method is known for separating speech and pauses by analyzing the values of the correlation function of the interference and of the mixture of signal and interference (RF patent 2691603, G10L 15/00). This solution is not sufficiently effective in separating speech and pauses when it is not known a priori whether the analysis interval contains only interference or a mixture of interference and signal.

A method is known for separating speech and speech-like noise by analyzing the energies and phases of the frequency components of the signal and the noise (RF patent 2700189, H04Q1/46). Its disadvantage is insufficient effectiveness in separating speech and pauses when the acoustic interference has a large number of frequency components.

The closest analogue of the proposed method in technical essence is the method for separating speech and pauses from the variances of the amplitudes of spectral components described in RF patent 2723301, G10L 25/93, which is taken as the prototype.

The prototype method is as follows.

Over the entire analysis interval, which contains noise, a speech signal, or a mixture of speech signal and noise arriving at the device (the input signal), the signal is split into two identical components; one is filtered with a low-pass filter (LPF) and the other with a band-pass filter. The signals at the filter outputs are sampled and stored in memory for subsequent processing. A “sliding window” consisting of intervals of equal duration is formed and is shifted by a predetermined number of samples. The window comprises two analysis intervals, each consisting of several intervals of equal duration, and its first position is set so that only interference is present in the first analysis interval.

Spectral analysis of the input signal is carried out for each interval as follows. Each result of transforming the input signal, obtained by multiplying the input signal by the sine and cosine of the reference frequencies, is split into two identical components. The first component is filtered with an LPF whose bandwidth is matched to that of the analyzed signal; at the same time, the second component is filtered with a band-pass filter whose upper cutoff frequency corresponds to the upper frequency of the analyzed signal and whose lower cutoff frequency is set to a predetermined value. The LPF and the band-pass filter are chosen so that their phase-frequency responses are as nearly identical as possible, so that the amplitude-frequency response (AFR) of the band-pass filter has the steepest possible slope near zero frequency, and so that, starting from the frequency at which the difference between the two AFRs falls below a predetermined value, the AFRs are as nearly identical as possible. The signals that have passed through the LPF and the band-pass filter are subtracted one from the other, and the results are converted to digital form. From the values corresponding to the sine and cosine components of a given frequency, the instantaneous spectral density (ISD) is determined for each reference frequency, and these values, which are proportional to the signal amplitudes, are stored. The mean ISD is found, and a threshold is obtained by multiplying it by a preset coefficient. The ISD values are compared with the threshold, and on that basis a decision is made on the presence or absence of a signal at the corresponding frequency. The power of each detected signal is found by squaring the corresponding ISD value.

For each harmonic, the variance of the power values is found for the first and second analysis intervals, and the mean variance, averaged over the number of harmonics, is calculated for each interval. A threshold is determined by multiplying the mean variance of the power values of the first analysis interval of the sliding window by a preset coefficient. The difference between the mean variances of the first and second analysis intervals is found and compared with the threshold. If the difference does not exceed the threshold, the second analysis interval is considered to contain only interference; otherwise it is considered to contain a signal or a mixture of signal and interference. The sliding window is then shifted by the specified number of intervals and the procedure is repeated. For subsequent steps, the threshold for the difference of mean variances is determined using the average of the mean power variances of the analysis intervals, calculated on a first-in, first-out basis. The process continues until the time allotted for analyzing the input signal has elapsed.
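The prototype's final decision step can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `prototype_decision`, the use of population variance, and the sign convention of the difference (second interval minus first) are assumptions, since the text does not fix them.

```python
from statistics import mean, pvariance

def prototype_decision(first_interval, second_interval, coeff):
    """Return True if the second analysis interval is judged to contain signal.

    Each argument is a list with one entry per harmonic; each entry is the
    sequence of power values of that harmonic within the interval.
    """
    # Variance of the power values per harmonic, averaged over the harmonics.
    mean_var_first = mean(pvariance(p) for p in first_interval)
    mean_var_second = mean(pvariance(p) for p in second_interval)
    # Threshold: mean variance of the first interval times a preset coefficient.
    threshold = coeff * mean_var_first
    # Assumed sign convention: second interval minus first interval.
    return (mean_var_second - mean_var_first) > threshold
```

Because the threshold is tied to the variance in the first interval, interference whose power changes quickly inflates both the reference variance and the difference, which is the weakness the proposed method is meant to address.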

The disadvantage of the prototype method is its insufficient effectiveness in separating speech and pauses in the presence of interference with rapidly changing power.

The objective of the proposed method is to increase the likelihood of a correct decision that a speech signal has appeared when acoustic noise-like interference is present.

To solve this problem, the method for separating speech and pauses by analyzing the characteristics of the spectral components of a mixture of signal and interference includes the following known steps. Over the entire analysis interval, which contains noise, a speech signal, or a mixture of speech signal and noise arriving at the device (the input signal), the input signal is sampled and stored in memory for subsequent processing. A “sliding window” is formed and shifted by a predetermined number of samples; its first position is set so that only interference is present in the first analysis interval. Spectral analysis of the input signal is carried out for each interval as follows. Each result of transforming the input signal, obtained by multiplying the input signal by the sine and cosine of the reference frequencies, is split into two identical components. The first component is filtered with a low-pass filter (LPF) whose bandwidth is matched to that of the analyzed signal; at the same time, the second component is filtered with a band-pass filter whose upper cutoff frequency corresponds to the upper frequency of the analyzed signal and whose lower cutoff frequency is set to a predetermined value. The LPF and the band-pass filter are chosen so that their phase-frequency responses are as nearly identical as possible, so that the amplitude-frequency response (AFR) of the band-pass filter has the steepest possible slope near zero frequency, and so that, starting from the frequency at which the difference between the two AFRs falls below a predetermined value, the AFRs are as nearly identical as possible. The signals that have passed through the LPF and the band-pass filter are subtracted one from the other, and the results are converted to digital form. From the values corresponding to the sine and cosine components of a given frequency, the powers of the spectral components are calculated and stored.

According to the invention, the following values are set in advance: the analysis interval; the duration of the “sliding window”; the duration of the time interval by which the “sliding window” is shifted; the number of “sliding window” positions in which the presence of a signal is analyzed; the minimum and maximum duration of the speech signal; and the coefficients used to calculate the thresholds for the powers of the spectral components, for the mean number of spectral components whose power exceeded the threshold (the detected components), and for the mean power of the detected spectral components of the mixture of signal and interference. The “sliding window” is shifted by the time interval of the set length, and spectral analysis is performed for each position of the “sliding window”.

For the first several positions of the “sliding window”, the number of which is set in advance and for which the no-signal condition is satisfied, the following are calculated: the number of detected spectral components of the interference for each position of the “sliding window” and the threshold for the mean number of detected spectral components of the mixture of signal and interference; the mean power of the spectral components of the interference; and the thresholds for the power of the spectral components of the mixture of signal and interference and for the mean power of the spectral components of the mixture of signal and interference.

For several positions of the “sliding window”, the number of which is set in advance and in which a signal may be present, the number of detected spectral components is calculated for each position, together with the mean number of detected spectral components. If the mean number of detected spectral components exceeds the corresponding threshold, a speech signal is considered possibly present at these positions; this event is recorded and the signal duration is calculated. If the signal duration exceeds the minimum threshold and does not exceed the maximum threshold, the mean power of the detected spectral components of the signal and interference is calculated. If the calculated mean power exceeds the threshold, a signal is considered present in these “sliding windows”; otherwise it is decided that only interference is present in these time intervals. If, for the positions of the “sliding window” in which a signal may be present, no speech signal has been recorded, or the calculated signal duration does not exceed the minimum threshold or exceeds the maximum threshold, only interference is considered present in these “sliding windows”.

For the “sliding windows” judged to contain only interference, the thresholds are calculated: for the mean number of detected spectral components of the mixture of signal and interference; for the power of the spectral components of the mixture of signal and interference; and for the mean power of the spectral components of the mixture of signal and interference. The “sliding window” continues to be moved until the signal analysis interval is exhausted.
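The three-stage decision described above (mean count of detected components, then duration, then mean power) can be sketched as follows. This is only an illustration under stated assumptions: the function name `classify_positions`, its parameters, and the way the signal duration is estimated (number of candidate window positions times the window shift) are hypothetical and are not specified in the text.

```python
def classify_positions(counts, detected_powers, shift_s,
                       count_threshold, mean_power_threshold,
                       min_duration_s, max_duration_s):
    """Classify a group of sliding-window positions as 'signal' or 'interference'.

    counts          -- number of detected spectral components per position
    detected_powers -- list (per position) of powers of the detected components
    shift_s         -- window shift in seconds (used for the assumed duration)
    """
    # Stage 1: mean number of detected components must exceed its threshold.
    if sum(counts) / len(counts) <= count_threshold:
        return "interference"
    # Stage 2: duration must exceed the minimum and not exceed the maximum.
    duration_s = len(counts) * shift_s  # assumed duration estimate
    if not (min_duration_s < duration_s <= max_duration_s):
        return "interference"
    # Stage 3: mean power of the detected components must exceed its threshold.
    powers = [p for position in detected_powers for p in position]
    mean_power = sum(powers) / len(powers) if powers else 0.0
    return "signal" if mean_power > mean_power_threshold else "interference"
```

The duration check is what lets the procedure reject short bursts and very long stretches of strong interference even when many spectral components are detected.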

The proposed method is as follows.

The following values are set in advance:

– the analysis interval;

– the duration of the “sliding window”;

– the duration of the time interval by which the “sliding window” is shifted;

– the number of “sliding window” positions in which the presence of a signal is analyzed, and the minimum and maximum duration of the speech signal.

The values of the coefficients used to calculate the thresholds are also set in advance:

– for the powers of the spectral components;

– for the mean number of spectral components whose power exceeded the threshold (the detected components);

– for the mean power of the detected spectral components of the mixture of signal and interference.

These values are established for the typical operating conditions of the device that implements the speech and pause separation method, either by mathematical modeling or experimentally.
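One plausible way the preset coefficients could turn interference-only measurements into thresholds is sketched below. The text states only that the coefficients are used to calculate the thresholds; the multiplicative form (coefficient times a mean interference statistic), the function name `calibrate_thresholds`, and the dictionary keys are assumptions made for illustration.

```python
from statistics import mean

def calibrate_thresholds(noise_powers, k_power, k_count, k_mean_power):
    """Derive thresholds from window positions assumed to hold only interference.

    noise_powers -- one list of spectral-component powers per window position
    k_*          -- preset coefficients (chosen by modeling or experiment)
    """
    all_powers = [p for position in noise_powers for p in position]
    mean_noise_power = mean(all_powers)
    # Threshold for the power of an individual spectral component.
    power_threshold = k_power * mean_noise_power
    # Number of interference components detected at each position.
    counts = [sum(1 for p in position if p > power_threshold)
              for position in noise_powers]
    return {
        "power": power_threshold,
        "mean_count": k_count * mean(counts),
        "mean_power": k_mean_power * mean_noise_power,
    }
```

Calling the same function again on later windows judged to contain only interference would let the thresholds follow changes in the interference level.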

The input signal is converted into digital form and stored in memory for subsequent processing.

A “sliding window” is formed.

The “sliding window” is shifted by several time intervals. The number of shifts is set in advance.

Spectral analysis is performed for each position of the “sliding window”.

The spectral analysis is carried out, for example, by the method described in RF patent No. 2730043, G01R23/16.
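The formation and shifting of the “sliding window” over the stored samples can be sketched as follows. This is a minimal illustration in Python; the function name and the use of sample counts rather than time intervals are assumptions.

```python
def sliding_windows(samples, window_len, shift_len):
    """Yield (start_index, window) for every full position of the window.

    window_len and shift_len are given in samples (an assumption; the
    description specifies them as time intervals).
    """
    if window_len <= 0 or shift_len <= 0:
        raise ValueError("window_len and shift_len must be positive")
    for start in range(0, len(samples) - window_len + 1, shift_len):
        yield start, samples[start:start + window_len]


# Ten stored samples, a window of four samples shifted by two samples.
positions = list(sliding_windows(list(range(10)), window_len=4, shift_len=2))
```

Each yielded window would then be passed to the spectral analysis step.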

Each result of transforming the input signal, obtained by multiplying the input signal by the sine and cosine of the reference frequencies, is split into two identical components.

The first component is filtered by a low-pass filter (LPF) whose band is matched to the band of the analyzed signal. At the same time, the second component is filtered by a band-pass filter whose passband is chosen so that its upper frequency corresponds to the upper frequency of the analyzed signal and its lower frequency is set equal to a predetermined value. The LPF and the band-pass filter are chosen so that their phase-frequency characteristics are as nearly identical as possible and so that the amplitude-frequency response (AFR) of the band-pass filter has the maximum possible slope at frequencies close to zero; at frequencies starting from the value at which the difference between the AFRs of the LPF and the band-pass filter becomes smaller than a predetermined value, their AFRs are made as nearly identical as possible (an illustrative example is shown in Fig. 1).

The signals that have passed through the LPF and the band-pass filter are subtracted one from the other. The subtraction results are converted into digital form; from the values corresponding to the sine and cosine components of the same frequency, the power (variance) of each spectral component is calculated, and these values are stored.
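The effect of subtracting the band-pass output from the low-pass output can be illustrated with a purely digital sketch. The description places the filters before the analog-to-digital conversion; the Python code below instead assumes sampled data, Hamming-windowed FIR filters of equal length (so that their phase responses coincide), a band-pass filter built as the difference of two low-pass filters, and illustrative values for the sampling rate and filter length. All function names are hypothetical.

```python
import math


def lowpass_taps(cutoff_hz, fs_hz, num_taps):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at 0 Hz."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = 2.0 * cutoff_hz / fs_hz * (n - mid)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir(x, taps):
    """FIR filtering that keeps only outputs computed from a full set of samples."""
    m = len(taps)
    return [sum(taps[k] * x[i + m - 1 - k] for k in range(m))
            for i in range(len(x) - m + 1)]


def component_power(x, ref_hz, fs_hz, f_upper_hz=3400.0, f_p_hz=200.0, num_taps=129):
    """Mean of (sine branch)^2 + (cosine branch)^2 for one reference frequency."""
    lpf = lowpass_taps(f_upper_hz, fs_hz, num_taps)
    narrow = lowpass_taps(f_p_hz, fs_hz, num_taps)
    bpf = [a - b for a, b in zip(lpf, narrow)]  # passes roughly f_p_hz .. f_upper_hz
    branches = []
    for ref in (math.sin, math.cos):
        mixed = [s * ref(2.0 * math.pi * ref_hz * n / fs_hz) for n, s in enumerate(x)]
        low = fir(mixed, lpf)
        band = fir(mixed, bpf)
        branches.append([a - b for a, b in zip(low, band)])  # LPF output minus BPF output
    sin_part, cos_part = branches
    return sum(s * s + c * c for s, c in zip(sin_part, cos_part)) / len(sin_part)


fs = 8000.0  # assumed sampling rate for the illustration
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(800)]
p_on = component_power(tone, 1000.0, fs)   # reference frequency matches the tone
p_off = component_power(tone, 2000.0, fs)  # reference frequency does not match
```

With these assumptions the difference of the two branches keeps only the band from zero to about 200 Hz around each reference frequency, which is why the mismatched reference yields a far smaller value than the matched one.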

For the first several positions of the “sliding window”, whose number is set in advance and for which the condition that no speech signal is present is satisfied, the following are calculated:

– the number of detected spectral components of the interference for each position of the “sliding window”, and the threshold value for the average number of detected spectral components of the signal and interference mixture;

– the average power of the spectral components of the interference, and the threshold values for the power of the spectral components of the signal and interference mixture and for the average power of the spectral components of the signal and interference mixture.

The threshold value for the average number of detected spectral components of the signal and interference mixture is obtained by calculating the average number of detected spectral components of the interference over these positions of the “sliding window” and multiplying it by the corresponding coefficient.

The threshold values for the power of the spectral components of the signal and interference mixture and for the average power of the spectral components of the signal and interference mixture are obtained by multiplying the average power of the spectral components of the signal and interference mixture by the corresponding coefficients.

The number of initial positions of the “sliding window” for which the no-signal condition is satisfied is determined for the typical operating conditions of the device that implements the speech and pause separation method, by mathematical modeling or experimentally.
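The threshold calculation over the interference-only window positions might look as follows in Python. The sketch assumes that the power threshold is computed first and then used to count the detected components; the function name, the dictionary keys and the sample numbers are hypothetical.

```python
def noise_thresholds(noise_powers, k_power, k_count, k_mean_power):
    """Derive the three thresholds from interference-only window positions.

    noise_powers: one list of spectral-component powers per window position.
    The order of operations (power threshold first, then counting) is an
    assumption made for this sketch.
    """
    all_powers = [p for window in noise_powers for p in window]
    mean_power = sum(all_powers) / len(all_powers)
    power_threshold = k_power * mean_power
    # Components above the power threshold are the "detected" components.
    counts = [sum(1 for p in window if p > power_threshold) for window in noise_powers]
    mean_count = sum(counts) / len(counts)
    return {
        "power": power_threshold,
        "count": k_count * mean_count,
        "mean_power": k_mean_power * mean_power,
    }


# Two window positions with three components each (made-up numbers).
th = noise_thresholds([[1.0, 1.0, 4.0], [2.0, 2.0, 2.0]],
                      k_power=1.5, k_count=2.0, k_mean_power=2.0)
```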

For several positions of the “sliding window”, whose number is set in advance and for which a signal may be present, the number of detected spectral components is calculated for each position, together with the average number of detected spectral components over these positions.

If the average number of detected spectral components exceeds the corresponding threshold value, it is considered that a speech signal may be present at these positions of the “sliding window”, and this event is recorded.

In this case the signal duration is calculated by the formula

where T_со is the duration of the “sliding window”;

T_ссо is the duration of the interval by which the “sliding window” is shifted;

N is the number of “sliding window” positions.
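The duration formula itself is not reproduced in this text. One form that is consistent with the three symbols defined, offered purely as an assumption, treats the N window positions as overlapping intervals, each shifted by T_ссо from the previous one, so that together they span:

```latex
T_{\mathrm{sig}} = T_{\text{со}} + (N - 1)\, T_{\text{ссо}}
```

Here T_sig is a hypothetical name for the calculated signal duration. With the made-up values T_со = 10 ms, T_ссо = 2 ms and N = 15, this reading would give 10 + 14 · 2 = 38 ms.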

If the signal duration exceeds the minimum threshold value and does not exceed the maximum threshold value, the average power of the detected spectral components of the signal and interference is calculated. If this average power exceeds the threshold value, a signal is considered to be present in these “sliding windows”; otherwise it is decided that only interference is present in these time intervals.

If, for the positions of the “sliding window” at which a signal may be present, the presence of a speech signal has not been recorded, or the calculated signal duration does not exceed the minimum threshold value or exceeds the maximum threshold value, it is considered that only interference is present in these “sliding windows”.
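The three-stage decision described above (average count of detected components, then duration, then average power) can be written as a short Python sketch. The function name, the threshold dictionary keys and the sample numbers are assumptions; the signal duration is passed in by the caller rather than computed here.

```python
def classify_block(window_powers, thresholds, duration_s, min_speech_s, max_speech_s):
    """Return "speech" or "interference" for a group of window positions.

    window_powers: one list of spectral-component powers per window position.
    thresholds: dict with keys "power", "count" and "mean_power" (assumed names).
    """
    detected = [[p for p in w if p > thresholds["power"]] for w in window_powers]
    mean_count = sum(len(d) for d in detected) / len(detected)
    if mean_count <= thresholds["count"]:
        return "interference"          # no possible speech signal recorded
    if not (min_speech_s < duration_s <= max_speech_s):
        return "interference"          # duration outside the allowed range
    flat = [p for d in detected for p in d]
    if not flat:
        return "interference"
    mean_power = sum(flat) / len(flat)
    return "speech" if mean_power > thresholds["mean_power"] else "interference"


th = {"power": 3.0, "count": 1.0, "mean_power": 4.0}   # made-up thresholds
loud = [[5.0, 6.0, 1.0], [7.0, 1.0, 1.0]]
quiet = [[1.0, 1.0, 1.0], [1.0, 1.0, 4.0]]
```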

For the “sliding windows” for which it has been decided that only interference is present, the threshold values are calculated:

– for the average number of detected spectral components of the signal and interference mixture;

– for the power of the spectral components of the signal and interference mixture;

– for the average power of the spectral components of the signal and interference mixture.

The process of changing the position of the “sliding window” is continued until the signal analysis interval is exhausted.
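The recalculation of thresholds after every interference-only decision makes the procedure adaptive. A minimal Python sketch of that outer loop is given below; the classification and threshold functions are passed in as parameters, and the toy rules in the usage lines are hypothetical stand-ins, not the rules of the method.

```python
def run_detector(blocks, thresholds, classify, recompute):
    """Classify successive groups of window positions, adapting the thresholds.

    classify(block, thresholds) -> "speech" or "interference"
    recompute(block) -> new thresholds, used only after an interference decision
    """
    decisions = []
    for block in blocks:
        label = classify(block, thresholds)
        decisions.append(label)
        if label == "interference":
            thresholds = recompute(block)   # track the current interference level
    return decisions


# Toy stand-ins: a block is "speech" if its peak exceeds a scalar threshold,
# and the threshold follows twice the peak of the latest interference block.
toy_classify = lambda block, th: "speech" if max(block) > th else "interference"
toy_recompute = lambda block: 2.0 * max(block)
labels = run_detector([[1.0], [1.5], [5.0], [2.5]], 2.0, toy_classify, toy_recompute)
```

In the toy run the last block would have exceeded the initial threshold of 2.0, but it is labeled as interference because the threshold had meanwhile risen to 3.0.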

The results of simulating the detection of the presence or absence of a speech signal in the presence of interference are given below.

The noise-like interference was modeled as a sum of harmonic signals with random amplitudes (U_si) and phases (φ_si), distributed according to the normal law (amplitudes) and the uniform law (phases), respectively,

where ω_si, φ_si and U_si are the frequency, phase and amplitude of the i-th harmonic signal;

Nsp is the number of harmonic signals.
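The expression for the interference model is not reproduced in this text. A form consistent with the symbols defined above, given here only as an assumption (including the choice of sine rather than cosine and the name u_s(t) for the interference waveform), would be:

```latex
u_s(t) = \sum_{i=1}^{N_{sp}} U_{si} \, \sin\!\left(\omega_{si} t + \varphi_{si}\right)
```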

The frequencies of the interference harmonics were generated as random variables uniformly distributed within the signal band.

The durations of the interference harmonics were generated as random variables uniformly distributed between one and two harmonic periods. The period corresponds to the frequency of the interference harmonic.

The signal was modeled as a sum of harmonic signals with a certain value of the first frequency and fixed “distances” between the frequencies of the other harmonics. The first frequency was chosen as a value uniformly distributed in the interval from 300 to 800 Hz. The phases of the signal harmonics were set equal to one another.

The amplitudes of the signal harmonics were generated as random variables distributed according to the normal law.
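The signal and interference models can be sketched in Python as below. Several details are assumptions because the description does not state them: the harmonic spacing of 300 Hz, the mean and spread of the normal amplitude distributions, the common phase of zero, and the sampling rate. The random durations of the interference harmonics are omitted for brevity, so each interference harmonic lasts for the whole block.

```python
import math
import random


def speech_like_signal(n_samples, fs_hz, rng, n_harmonics=8, spacing_hz=300.0):
    """Sum of equally spaced harmonics with a common (zero) phase."""
    first_hz = rng.uniform(300.0, 800.0)            # first frequency, uniform in 300..800 Hz
    freqs = [first_hz + i * spacing_hz for i in range(n_harmonics)]
    amps = [rng.gauss(1.0, 0.2) for _ in freqs]     # normal amplitudes (parameters assumed)
    samples = [sum(a * math.sin(2.0 * math.pi * f * n / fs_hz)
                   for a, f in zip(amps, freqs))
               for n in range(n_samples)]
    return samples, freqs


def noise_like_interference(n_samples, fs_hz, rng, n_harmonics=100,
                            band_hz=(300.0, 3400.0)):
    """Sum of harmonics with normal amplitudes, uniform frequencies and phases."""
    parts = [(rng.gauss(0.0, 1.0),                  # amplitude (parameters assumed)
              rng.uniform(*band_hz),                # frequency within the signal band
              rng.uniform(0.0, 2.0 * math.pi))      # phase
             for _ in range(n_harmonics)]
    return [sum(u * math.sin(2.0 * math.pi * f * n / fs_hz + phi)
                for u, f, phi in parts)
            for n in range(n_samples)]


rng = random.Random(1)                               # fixed seed for repeatability
speech, speech_freqs = speech_like_signal(240, 8000.0, rng)   # 30 ms at 8 kHz
interference = noise_like_interference(240, 8000.0, rng)
mixture = [s + v for s, v in zip(speech, interference)]
```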

The simulation was carried out for the following parameter values:

– frequency range of the speech signal: 300 Hz – 3400 Hz;

– number of realizations: 500;

– number of signal harmonics: 8;

– number of interference harmonics: 100 on average for one position of the “sliding window”;

– number of “sliding window” positions: 15;

– coefficient determining the sampling frequency: 64000;

– number of reference frequencies: 30;

– first reference frequency: 300 Hz;

– last reference frequency: 3350 Hz;

– frequency band of the band-pass filter with the maximum AFR slope: 200 Hz (0 – F_р, see Fig. 1);

– duration of the speech signal (one phoneme): 30 ms.

The results of simulating the separation of speech and pauses for noise-like interference are given in the table, where PPOS is the probability of deciding that a speech signal is present when it is in fact present, and PPOP is the probability of deciding that a speech signal is present when only interference is present.

Interference type         Parameter   Signal-to-interference power ratio
                                      0.5        1
Noise-like interference   PPOS        0.95       0.998
                          PPOP        0.12       0.08

The data in the table support the conclusion that the method under consideration is highly effective, which is explained by the high effectiveness of the spectral analysis method used.

A block diagram of a device that implements the proposed method is shown in Fig. 3, where the following designations are used:

1 – electroacoustic device;

2 – low-frequency amplifier;

3.1 – 3.n – first to n-th multiplication blocks;

4.1 – 4.n – first to n-th low-pass filters (LPF);

5.1 – 5.n – first to n-th subtraction devices;

6.1 – 6.n – first to n-th analog-to-digital converters (ADC);

7.1 – 7.n – first to n-th band-pass filters;

8 – computing device.

The device contains an electroacoustic device 1 and a low-frequency amplifier 2 connected in series, the input of the electroacoustic device 1 being the input of the device. It also contains n parallel lines, each consisting of a multiplication block 3, an LPF 4, a subtraction device 5 and an ADC 6 connected in series, with a band-pass filter 7 connected between the output of the multiplication block 3 and the second input of the subtraction device 5. The inputs of the n multiplication blocks 3.1 – 3.n are combined and connected to the output of the low-frequency amplifier 2. The outputs of the first to n-th ADCs 6.1 – 6.n are connected to the corresponding first to n-th inputs of the computing device 8, whose output is the output of the device. The second inputs of the multiplication blocks 3.1 – 3.n are the inputs for the reference signals.

The device operates as follows.

The interference, or the additive mixture of signal and interference, coming from the output of the electroacoustic device 1 is amplified in the low-frequency amplifier 2 and fed to the combined input of the n parallel lines.

Two lines of the device are used to process one harmonic. That is, when k reference frequencies are used, the number of lines is n = 2k.

The interference, or the additive mixture of signal and interference, from the output of the low-frequency amplifier 2 is fed to the first inputs of the multiplication blocks 3.1 – 3.n, to whose second inputs the corresponding reference signals are supplied, for example:

U_оп1 = sin(x);

U_оп2 = cos(x);

…

U_оп(n-1) = sin(x);

U_опn = cos(x).
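A bank of sine and cosine reference signals, two per reference frequency, could be generated as in the Python sketch below. The uniform spacing of the reference frequencies between the first and last values, the sampling rate, the reading of the argument x as 2πft, and the function names are all assumptions.

```python
import math


def reference_frequencies(first_hz=300.0, last_hz=3350.0, count=30):
    """Equally spaced reference frequencies (uniform spacing is an assumption)."""
    step = (last_hz - first_hz) / (count - 1)
    return [first_hz + j * step for j in range(count)]


def reference_bank(freqs_hz, fs_hz, n_samples):
    """Return the reference signals in the order sin, cos, sin, cos, ..."""
    bank = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs_hz
        bank.append([math.sin(w * n) for n in range(n_samples)])  # sine reference
        bank.append([math.cos(w * n) for n in range(n_samples)])  # cosine reference
    return bank


freqs = reference_frequencies()
bank = reference_bank(freqs, fs_hz=8000.0, n_samples=64)
```

With 30 reference frequencies the bank holds 60 signals, one for each line of the device.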

The result of multiplying the signal and interference by the reference signals is split into two identical components. The first component is filtered by the LPFs 4.1 – 4.n, the band of each of which is matched to the signal band. At the same time, the second component is filtered by the band-pass filters 7.1 – 7.n, the passband of each of which is chosen so that its upper frequency corresponds to the upper frequency of the signal, while its lower frequency is set as close to zero as possible.

The LPFs 4.1 – 4.n and the band-pass filters 7.1 – 7.n are chosen so that their phase-frequency characteristics are as nearly identical as possible and so that the AFR of the band-pass filters 7.1 – 7.n has the maximum possible slope at frequencies close to zero; at frequencies starting from the value (F_р) at which the difference between the AFRs of the LPFs 4.1 – 4.n and the band-pass filters 7.1 – 7.n becomes smaller than a predetermined value, their AFRs are made as nearly identical as possible (an illustrative example is shown in Fig. 1).

The signals that have passed through the LPFs 4.1 – 4.n and the band-pass filters 7.1 – 7.n are subtracted one from the other. That is, the signal of the first band-pass filter 7.1 is subtracted from the signal of the first LPF 4.1, the signal of the second band-pass filter 7.2 is subtracted from the signal of the second LPF 4.2, and so on.

The resulting signals are converted into digital form in the corresponding first to n-th ADCs 6.1 – 6.n. These digital signals are fed to the computing device 8.

In the computing device 8, from the values corresponding to the sine and cosine components of the same frequency, the variance (power) of the spectral component for each reference frequency is determined by taking the square root of the sum of the squares of the sine and cosine components, and these values are stored.
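Written as a formula, with S_j and C_j as hypothetical names for the digitized outputs of the sine and cosine lines of the j-th reference frequency, the calculation described above is:

```latex
P_j = \sqrt{S_j^{2} + C_j^{2}}, \qquad j = 1, \dots, k
```

Strictly speaking, the square root of the sum of squares gives the magnitude of the component, and its square is proportional to the power; since the square root is monotonic, comparing either quantity with a suitably scaled threshold leads to the same detections.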

In the computing device 8, the presence or absence of a speech signal is detected by the algorithm given on pages 8 – 12 of the description.

The results of simulating the spectral analysis process are given above.

Microphones or laryngophones, for example, can be used as the electroacoustic device 1.

The low-frequency amplifier 2 can be implemented, for example, on the OP467GS chip from Analog Devices.

The multiplication blocks 3.1 – 3.n can be implemented, for example, as a frequency converter (mixer); see, for example, the textbook “Fundamentals of the Theory of Radio Engineering Systems”, V.I. Borisov, V.M. Zinchuk, A.E. Limarev, N.P. Mukhin, ed. V.I. Borisov, Voronezh Scientific Research Institute of Communications, 2004, pp. 186 – 189.

The ADCs 6.1 – 6.n can be implemented, for example, on the AD7495BR chip from Analog Devices.

The computing device can be implemented, for example, as a single microprocessor device with the appropriate software, for example a TMS320VC5416 series processor from Texas Instruments, or as a field-programmable gate array (FPGA) with the appropriate software, for example the XCV400 FPGA from Xilinx.

Thus, the claimed method can be implemented by the described device.

The technical result of the proposed method is an increase in the effectiveness of making the correct decision about the appearance of a speech signal in the presence of acoustic interference.

Claims

A method for separating speech and pauses by analyzing the values of the characteristics of the spectral components of a mixture of signal and interference, in which, throughout the entire analysis interval, which consists of intervals containing interference, a speech signal, or a mixture of a speech signal and interference arriving at the input of the device, the input signal is sampled and stored in memory for subsequent processing; a “sliding window” is formed; the “sliding window” is shifted by a predetermined number of samples; the first position of the “sliding window” is set so that only interference is present in the first analysis interval; spectral analysis of the input signal is performed for each interval as follows: each result of transforming the input signal, obtained by multiplying the input signal by the sine and cosine of the reference frequencies, is split into two identical components; the first component is filtered by a low-pass filter (LPF) whose band is matched to the band of the analyzed signal; at the same time, the second component is filtered by a band-pass filter whose passband is chosen so that its upper frequency corresponds to the upper frequency of the analyzed signal and its lower frequency is set equal to a predetermined value; the LPF and the band-pass filter are chosen so that their phase-frequency characteristics are as nearly identical as possible and so that the amplitude-frequency response (AFR) of the band-pass filter has the maximum possible slope at frequencies close to zero, while at frequencies starting from the value at which the difference between the AFRs of the LPF and the band-pass filter becomes smaller than a predetermined value their AFRs are made as nearly identical as possible; the signals that have passed through the LPF and the band-pass filter are subtracted one from the other; the subtraction results are converted into digital form; from the values corresponding to the sine and cosine components of the same frequency, the power values of the spectral components are calculated and stored; characterized in that the following values are set in advance: the analysis interval, the duration of the “sliding window”, the duration of the time interval by which the “sliding window” is shifted, the number of “sliding window” positions at which the presence of a signal is analyzed, the minimum and maximum duration of the speech signal, and the coefficients used to calculate the threshold values for the power values of the spectral components, for the average number of spectral components whose power exceeds the threshold (the detected components), and for the average power of the detected spectral components of the signal and interference mixture; the “sliding window” is shifted by a time interval of the set value; spectral analysis is performed for each position of the “sliding window”; for the first several positions of the “sliding window”, whose number is set in advance and for which the no-signal condition is satisfied, the following are calculated: the number of detected spectral components of the interference for each position of the “sliding window” and the threshold value for the average number of detected spectral components of the signal and interference mixture; the average power of the spectral components of the interference; and the threshold values for the power of the spectral components of the signal and interference mixture and for the average power of the spectral components of the signal and interference mixture; for several positions of the “sliding window”, whose number is set in advance and for which a signal may be present, the number of detected spectral components is calculated for each position of the “sliding window”, together with the average number of detected spectral components; if the average number of detected spectral components exceeds the corresponding threshold value, it is considered that a speech signal may be present at these positions of the “sliding window”, this event is recorded, and the signal duration is calculated for this case; if the signal duration exceeds the minimum threshold value and does not exceed the maximum threshold value, the average power of the detected spectral components of the signal and interference is calculated; if the calculated average power exceeds the threshold value, a signal is considered to be present in these “sliding windows”, otherwise it is decided that only interference is present in these time intervals; if, for the positions of the “sliding window” at which a signal may be present, the presence of a speech signal has not been recorded, or the calculated signal duration does not exceed the minimum threshold value or exceeds the maximum threshold value, it is considered that only interference is present in these “sliding windows”; for the “sliding windows” for which it is decided that only interference is present, threshold values are calculated for the average number of detected spectral components of the signal and interference mixture, for the power of the spectral components of the signal and interference mixture, and for the average power of the spectral components of the signal and interference mixture; the process of changing the position of the “sliding window” is continued until the signal analysis interval is exhausted.