RU1795515C

RU1795515C - Method of conversion of time-amplitude presentation of sound wave

Info

Publication number: RU1795515C
Application number: SU904840016A
Authority: RU
Inventors: Сергей Васильевич Мышко; Анатолий Иванович Шевченко
Priority date: 1990-06-19
Filing date: 1990-06-19
Publication date: 1993-02-15

Abstract

The invention relates to speech acoustics and can be used in the design of systems for the automatic recognition and synthesis of sound waves that carry speech information. The aim of the invention is to increase the accuracy with which the image of a sound wave is represented. This is achieved by dividing the amplitude-time representation of the sound wave into time segments that are elementary components of the wave process, each corresponding to one full oscillation of the function, and converting it into a sequence of values: the lengths of the full oscillations and the oscillations of the function over them. 3 figures.

Description

The invention relates to speech acoustics and can be used in the design of systems for the automatic recognition and synthesis of sound waves that carry speech information.

A speech-signal coding device is known in which the extreme values of the speech signal and the time intervals between them are proposed as features for speech recognition systems.

However, for noisy signals, whose half-waves contain many extrema that carry no significant information, such a representation leads to redundant information.

The closest technical solution is a method of constructing dynamic portraits, in which the amplitude-time representation of the speech signal is divided into time segments 10 ms long. For each of these segments, the value of the maximum sample and the number of zero crossings of the function are determined. The sound wave is thus represented as sequences of the maximum sample value on each time segment and the number of zero crossings of the function on that segment.
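The fixed-frame prior-art representation described above can be illustrated with a minimal sketch. The function name `frame_features`, its parameters, and the choice to count a zero crossing as a sign change between consecutive samples are assumptions made for illustration; the source does not specify them.

```python
def frame_features(samples, sample_rate_hz, frame_ms=10):
    """Return (max_sample, zero_crossings) for each fixed-length frame.

    Hypothetical helper: the patent only states that 10 ms segments are
    described by their maximum sample and their number of zero crossings.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame must contain at least one sample")
    features = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # A zero crossing is counted when consecutive samples differ in sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((max(frame), crossings))
    return features
```

Note that the frame boundaries here are fixed by the clock, not by the signal, which is exactly the property the description goes on to criticize.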

The known method has the following disadvantages. First, the arbitrary choice of the time-interval length deprives this representation of the sound wave of physical meaning and reduces it to a subjective, statistical one: it is unclear what a segment of the sound wave over an interval of, say, 10 ms represents, or why an interval of exactly that length was chosen.

Second, a single half-wave can have many local extrema, and when the amplitude-time representation is divided arbitrarily into time segments, the local extrema of one half-wave can fall into adjacent intervals, which distorts the representation of the sound wave.

The problem of converting the amplitude-time representation of a sound wave into a sequence of segments that possess definite physical properties of the wave process reduces to justifying the choice of the lengths of these segments and expressing their physical nature through definite physical parameters of the wave process.

The aim of the method is to increase the accuracy of the representation of the sound wave image.

This aim is achieved in that the amplitude-time representation of the sound wave is divided into time segments that are elementary components of the wave process, each corresponding to a full oscillation of the function, and is converted into a sequence of values of the lengths of the full oscillations and of the oscillations of the function over them.

Representing the sound wave as a sequence of elementary components of the amplitude-time representation (ATR) that correspond to full oscillations of the function solves the problem of choosing the lengths of the time segments from the standpoint of the physical nature of the wave process.

The length of a time segment is determined by the period of a full oscillation of the function. Taking the lengths of the full oscillations and the oscillations of the function over those lengths as the quantities that characterize the elementary components of the wave process makes it possible to study a speech message as a wave process characterized by definite parameters, in terms of which the sources of sound waves can be assessed. The function $U(t)$ corresponds to the image of the wave process, whose elementary component is the full oscillation, that is, a segment of the domain of $U(t)$ at whose endpoints $U(t) = 0$ and which contains exactly one point $t_1$ inside the segment such that $U(t_1) = 0$.

If $K_i$ denotes a full oscillation of the function $U(t)$ and $\varphi(K_i)$ denotes the feature vector function of $K_i$, then the function has the form $\varphi(K_i) = (d_i, n_i)$, where $d_i$ is the length of the full oscillation $K_i$ and $n_i$ is the oscillation of the function $U(t)$ on $K_i$, i.e.

$$n_i = \sup_{t_1, t_2 \in K_i} \{ U(t_1) - U(t_2) \}.$$
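The definitions above can be made concrete with a minimal sketch for a sampled signal. It assumes that each full oscillation runs from one upward zero crossing to the next, that its length is measured in samples, and that the supremum reduces to the difference between the largest and smallest sample; the function name `full_oscillations` is hypothetical.

```python
def full_oscillations(samples):
    """Split a sampled signal into full oscillations.

    Returns a list of (length, oscillation) pairs, one per full oscillation:
    length is the segment length in samples, and oscillation is the largest
    difference between any two samples of the segment (max minus min).
    Assumption: a full oscillation starts at each upward zero crossing.
    """
    # Indices at which the signal goes from negative to non-negative.
    ups = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    features = []
    for start, end in zip(ups, ups[1:]):
        segment = samples[start:end]
        features.append((end - start, max(segment) - min(segment)))
    return features
```

Samples before the first upward crossing and after the last one do not form a complete oscillation and are ignored in this sketch.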

The proposed representation of the function U(t) leads to selecting the fragments of the function to be studied in accordance with its physical nature as a reflection of the wave process. The proposed technical solution is illustrated by the drawings.

Figs. 1 and 2 show, respectively, the amplitude-time representation of a speech signal and the representation of the speech signal as sequences of values of the lengths of the full oscillations (dashed lines 2) and of the oscillations of the function of the amplitude-time representation of the sound waves (dashed lines 1). Fig. 3 shows a block diagram of a device that produces the amplitude-time representation of sound waves as sequences of values of the lengths of the full oscillations and of the oscillations of the function over those lengths.

Figs. 1 and 2 show the representation of a speech signal according to the proposed method. This representation is used in the analysis of speech messages.

The device implementing the method (Fig. 3) consists of a reference frequency generator 1, a counter 2, a register 3, a zero detection unit 4, a maximum detection unit 5, a minimum detection unit 6, an adder 7, an analog-to-digital converter (ADC) 8, and parallel interfaces 9 and 10. The output of generator 1 is connected to the first input of counter 2 and to the first input of zero detection unit 4. The output of counter 2 is connected to the first input of register 3, and the outputs of zero detection unit 4 are connected, respectively, to the second inputs of counter 2, register 3, ADC 8, and interface 9. The outputs of maximum detection unit 5 and minimum detection unit 6 are connected to the input of adder 7, whose output is connected to the first input of ADC 8. The second output of ADC 8 is connected to the second inputs of maximum detection unit 5, minimum detection unit 6, and interface 10. Zero detection unit 4 consists of a ready flip-flop, a Schmitt trigger, and four one-shot multivibrators. Maximum detection unit 5 and minimum detection unit 6 each consist of a detector, a switch, and a one-shot multivibrator.

The device operates as follows. The input signal is applied to the second input of zero detection unit 4 and to the first inputs of maximum detection unit 5 and minimum detection unit 6. Generator 1, counter 2, and register 3 are used to measure the length of a full oscillation. In zero detection unit 4, the Schmitt trigger output forms a sequence of rectangular pulses corresponding to the instants of zero crossing. The leading edge of these pulses generates a signal that writes the current value of counter 2 into register 3, a signal that interface 9 is ready for exchange with the microprocessor system, and a reset signal for counter 2. Maximum detection unit 5 and minimum detection unit 6 are used to measure the amplitude of a full oscillation: over the period of a full oscillation, the amplitudes of the global maximum and minimum are stored on storage capacitors, then summed in absolute value in adder 7 and applied to the input of ADC 8. ADC 8 is started on the leading edge of the pulses from the Schmitt trigger of zero detection unit 4. On completion of the analog-to-digital conversion, a signal is generated that discharges the storage capacitors and produces the signal that interface 10 is ready for exchange with the microprocessor system.
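The signal path described above can be emulated in software as a rough sketch. It assumes one counter tick per input sample, a symmetric Schmitt-trigger threshold, and peak holds that start from zero after each discharge; the class name `OscillationMeter` and the `hysteresis` parameter are hypothetical and do not come from the patent.

```python
class OscillationMeter:
    """Software stand-in for the counter, peak detectors, and adder."""

    def __init__(self, hysteresis=0.0):
        self.hysteresis = hysteresis  # assumed Schmitt-trigger threshold
        self.state = False            # Schmitt-trigger output level
        self.armed = False            # True once a first leading edge was seen
        self.counter = 0              # stands in for counter 2
        self.peak_max = 0.0           # stands in for maximum detection unit 5
        self.peak_min = 0.0           # stands in for minimum detection unit 6

    def push(self, sample):
        """Feed one sample; return (length, swing) on a leading edge, else None."""
        result = None
        if not self.state and sample > self.hysteresis:
            # Leading edge: latch the counter and the summed peaks, then reset.
            self.state = True
            if self.armed:
                result = (self.counter, self.peak_max + abs(self.peak_min))
            self.armed = True
            self.counter = 0
            self.peak_max = 0.0
            self.peak_min = 0.0
        elif self.state and sample < -self.hysteresis:
            self.state = False
        # One reference-clock tick per sample; update the peak holds.
        self.counter += 1
        self.peak_max = max(self.peak_max, sample)
        self.peak_min = min(self.peak_min, sample)
        return result
```

Unlike the hardware, this sketch latches the result and clears the peak holds in the same step, whereas the device discharges the capacitors only after the conversion completes.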

The proposed method of converting the amplitude-time representation of sound waves can be used in the analysis and synthesis of speech messages. The speech signal is represented as a sequence of values of the lengths of the full oscillations and of the oscillations of the function over them. With this representation, the memory needed to store speech information is reduced by a factor of 4 compared with the conventional amplitude-time representation.

The information content carried by the sound wave can be recovered by synthesis from the sequence of values of the lengths of the full oscillations and of the oscillations of the function over the corresponding lengths.
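The source does not say how the waveform is synthesized from the stored pairs. As one possible illustration only, the sketch below renders each pair as a single sine period whose length in samples and peak-to-peak swing match the stored values; the sine shape, the symmetry about zero, and the name `synthesize` are all assumptions.

```python
import math


def synthesize(features):
    """Render each (length, swing) pair as one sine period.

    Assumption: the waveform shape is not given in the source, so a sine
    with peak-to-peak swing equal to the stored oscillation is used here.
    """
    samples = []
    for length, swing in features:
        amplitude = swing / 2.0  # peak-to-peak swing to sine amplitude
        for k in range(length):
            samples.append(amplitude * math.sin(2.0 * math.pi * k / length))
    return samples
```

A symmetric sine discards any asymmetry between the positive and negative half-waves, so this sketch preserves only the two stored parameters of each oscillation.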

Claims

A method of converting the amplitude-time representation of a sound wave by dividing it into time segments, characterized in that, in order to increase the accuracy of the representation of the sound wave image, the amplitude-time representation of the sound wave is divided into time segments corresponding to full oscillations of the function and is converted into a sequence of values of the lengths of the full oscillations and of the oscillations of the function over them.