RU99107842A

RU99107842A - SPEECH ANALYSIS METHOD

Info

Publication number: RU99107842A
Application number: RU99107842/09A
Authority: RU
Inventors: Геннадий Дмитриевич Толстых; Эммануил Григорьевич Кнеллер; Валерий Владимирович Сборщиков; Сергей Валерьевич Суслов; Евгений Юрьевич Демин
Original assignee: Closed Joint-Stock Company "ИстраСофт"
Filing date: 1999-04-14
Publication date: 2001-03-10

Claims

1. A method for analyzing speech, in which samples are taken from an input signal at a given sampling frequency and converted into a digital signal by analog-to-digital conversion; the digital signal is stored over a time interval whose length is at least twice the maximum allowable pitch period (the period of the fundamental tone); the presence of a speech signal and/or a pause is detected in the stored digital signal; when a pause is detected, its duration is determined; when a pitch signal is detected and a given time segment of the digital signal contains at least two pitch periods that differ by no more than a predetermined threshold, a decision is made that a "vowel" is present in the speech signal, the stored digital signal is then divided into frames, each with a duration of T samples equal to the pitch period, in each frame the T samples are interpolated into N samples, where N = 2ⁿ and n is an integer, and the resulting digital signal of N samples is subjected to an N-point Fourier transform, on the basis of which the signal spectrum is extracted and/or measured; if no pitch signal is detected in the stored digital signal, the amplitude changes of the stored digital signal are measured and, if they lie within a predetermined range, a decision is made that a "hissing consonant" is present, N samples are then selected from the stored digital signal, normalized by their effective (RMS) value, and subjected to an N-point Fourier transform, on the basis of which the energies of the resulting spectrum are measured over the critical bands of hearing; if neither a "vowel" nor a "hissing consonant" is detected, a decision is made that an "explosive consonant" is present, N samples are then selected from the stored digital signal and subjected to an N-point Fourier transform, on the basis of which the signal spectrum is extracted and/or measured.

2. The method according to claim 1, characterized in that, when a decision is made that an "explosive consonant" is present, the frame duration of N samples is reduced before the N-point Fourier transform.

3. The method according to claim 2, characterized in that the frame duration is reduced by no more than 5-15%.
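
The processing branches recited in claims 1-3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function and parameter names are hypothetical, and the sketch makes several assumptions the claims do not state:

- the two compared pitch periods are adjacent;
- interpolation from T to N samples is linear and treats the frame as periodic;
- the critical-band edges are supplied by the caller;
- a shortened frame is zero-padded back to N samples before the transform.

Pitch detection and pause detection are outside its scope.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two (N = 2**n)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def classify_segment(pitch_periods, amplitude_change, period_tol, amp_range):
    """Label a stored segment following the decision order of claim 1."""
    # Vowel: two pitch periods (assumed adjacent) differ by at most the threshold.
    for a, b in zip(pitch_periods, pitch_periods[1:]):
        if abs(a - b) <= period_tol:
            return "vowel"
    # Hissing consonant: no pitch found and amplitude changes inside the range.
    low, high = amp_range
    if len(pitch_periods) == 0 and low <= amplitude_change <= high:
        return "hissing consonant"
    # Neither of the above: explosive consonant.
    return "explosive consonant"


def interpolate_period(frame, n_exp):
    """Linearly resample one pitch period of T samples to N = 2**n_exp samples."""
    t, n = len(frame), 2 ** n_exp
    out = []
    for m in range(n):
        pos = m * t / n                 # fractional index into the original frame
        i = int(pos)
        frac = pos - i
        nxt = frame[(i + 1) % t]        # wrap around: frame assumed periodic
        out.append((1 - frac) * frame[i] + frac * nxt)
    return out


def pitch_synchronous_spectrum(frame, n_exp):
    """Vowel branch: interpolate T -> N samples, then N-point transform."""
    n = 2 ** n_exp
    return [abs(c) for c in fft(interpolate_period(frame, n_exp))[: n // 2 + 1]]


def rms_normalized_band_energies(samples, sample_rate, band_edges_hz):
    """Hissing branch: RMS-normalize N samples, transform, sum power per band."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    scaled = [s / rms for s in samples] if rms > 0 else list(samples)
    power = [abs(c) ** 2 for c in fft(scaled)[: n // 2 + 1]]
    return [
        sum(p for k, p in enumerate(power) if lo <= k * sample_rate / n < hi)
        for lo, hi in zip(band_edges_hz, band_edges_hz[1:])
    ]


def shortened_frame_spectrum(samples, reduction):
    """Explosive branch (claims 2-3): drop the frame tail, zero-pad (assumed)."""
    n = len(samples)
    keep = int(round(n * (1.0 - reduction)))
    padded = list(samples[:keep]) + [0.0] * (n - keep)
    return [abs(c) for c in fft(padded)[: n // 2 + 1]]
```

Choosing N as a power of two lets a simple radix-2 transform be used. Resampling each frame from exactly one pitch period places the fundamental in the first non-zero bin whatever the speaker's pitch. A call such as `shortened_frame_spectrum(samples, 0.10)` would correspond to a reduction inside the range named in claim 3.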