SU1599888A1

SU1599888A1 - Method of compilation speech synthesis

Info

Publication number: SU1599888A1
Application number: SU884410536A
Authority: SU
Inventors: Альберт Айрапетович Григорян; Карен Оникович Канаян
Original assignee: Ереванский политехнический институт им.К.Маркса
Priority date: 1988-04-18
Filing date: 1988-04-18
Publication date: 1990-10-15

Abstract

The invention relates to speech informatics. The aim of the invention is to increase the naturalness of compiled speech messages. This is achieved by joining fragments of recordings of pre-recorded diphones and sustained vowel sounds, which are inserted for 20-40 ms before pre-stressed diphones and for 40-60 ms before stressed ones. 1 figure.

Description

The invention relates to speech informatics and instrument engineering, and is intended for synthesizing speech messages from text in systems for acoustic communication between humans and machines.

The aim of the invention is to increase the intelligibility and naturalness of the synthesized speech.

The naturalness and intelligibility of the compiled messages are increased by joining recording fragments selected from pre-recorded segments of the corresponding diphones of natural speech. Each diphone contains the final part of the preceding sound and the initial part of the following sound, with a total duration of 80 to 120 ms. The intonational properties of utterances are modeled by inserts taken from separately stored recordings of the stationary portions of vowel sounds. During compilation, the inserts between diphones corresponding to a pre-stressed vowel have a duration of 20 to 40 ms, and those between diphones corresponding to a stressed vowel have a duration of 40 to 60 ms.
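The duration rules above can be illustrated with a minimal Python sketch that turns a stress-marked phone sequence into a plan of diphones and vowel inserts. The names `Phone`, `Segment`, `plan_segments`, the stress labels `"pre"` and `"stressed"`, and the pause symbol `#` are hypothetical and do not come from the patent. The sketch also assumes that each duration is taken at the midpoint of its stated range and that vowels which are neither stressed nor pre-stressed receive no insert; the description does not specify either point.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Duration ranges in milliseconds, as stated in the description.
DIPHONE_MS = (80, 120)
INSERT_MS = {
    "pre": (20, 40),       # insert for a pre-stressed vowel
    "stressed": (40, 60),  # insert for a stressed vowel
}


@dataclass(frozen=True)
class Phone:
    symbol: str                   # hypothetical phone label; "#" marks a pause
    stress: Optional[str] = None  # "pre", "stressed", or None


@dataclass(frozen=True)
class Segment:
    kind: str         # "diphone" or "vowel_insert"
    name: str
    duration_ms: int


def midpoint(bounds: Tuple[int, int]) -> int:
    """Pick the middle of a range (an assumption; any value in range is allowed)."""
    low, high = bounds
    return (low + high) // 2


def plan_segments(phones: List[Phone]) -> List[Segment]:
    """Build the sequence of diphones with vowel inserts between them."""
    plan: List[Segment] = []
    last = len(phones) - 1
    for i in range(last):
        left, right = phones[i], phones[i + 1]
        name = f"{left.symbol}-{right.symbol}"
        plan.append(Segment("diphone", name, midpoint(DIPHONE_MS)))
        # An insert sits between two diphones sharing the vowel `right`,
        # so it is added only when another diphone follows.
        if i + 1 < last and right.stress in INSERT_MS:
            duration = midpoint(INSERT_MS[right.stress])
            plan.append(Segment("vowel_insert", right.symbol, duration))
    return plan


# Hypothetical two-syllable word with stress on the second vowel.
word = [
    Phone("#"), Phone("v"), Phone("a", "pre"),
    Phone("d"), Phone("a", "stressed"), Phone("#"),
]
for segment in plan_segments(word):
    print(segment)
```

Padding the word with pause symbols gives the final stressed vowel a following diphone, so it receives its insert like any word-internal vowel.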

The drawing shows a block diagram illustrating the method.

Text with added stress marks is entered into the text processor 1, which is connected to the read-only memories 2 and 3, where the pre-recorded diphones and the individually pronounced sustained vowel sounds are stored, respectively.

The recordings are read out block by block through the buffer memory 4 in accordance with the selected durations, converted by the digital-to-analog converter 5, and reproduced by the electroacoustic unit 6.
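As a rough software analogy of this signal path, the following Python sketch models memories 2 and 3 as dictionaries of sample lists and buffer memory 4 as a list that collects the selected blocks. It is only a sketch under stated assumptions: the patent describes hardware and gives no sampling rate, so `SAMPLE_RATE_HZ`, the function names, and the choice to read each block from the start of its recording are all hypothetical. Digital-to-analog conversion and playback are left out.

```python
from typing import Dict, List, Tuple

SAMPLE_RATE_HZ = 8000  # assumption: the source does not state a sampling rate

# One plan entry: (kind, name, duration in ms); kind is "diphone" or "vowel_insert".
PlanEntry = Tuple[str, str, int]


def samples_for(duration_ms: int, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples that cover the given duration."""
    return rate_hz * duration_ms // 1000


def take_block(recording: List[int], duration_ms: int) -> List[int]:
    """Read a block of the selected duration from the start of a recording."""
    count = samples_for(duration_ms)
    if count > len(recording):
        raise ValueError("recording is shorter than the selected duration")
    return recording[:count]


def compile_message(
    plan: List[PlanEntry],
    diphone_store: Dict[str, List[int]],  # stands in for memory 2
    vowel_store: Dict[str, List[int]],    # stands in for memory 3
) -> List[int]:
    """Concatenate the selected blocks in plan order (the role of buffer 4)."""
    buffer: List[int] = []
    for kind, name, duration_ms in plan:
        store = diphone_store if kind == "diphone" else vowel_store
        buffer.extend(take_block(store[name], duration_ms))
    return buffer


# Dummy recordings: constant values make the block boundaries easy to see.
diphones = {"m-a": [1] * 960, "a-m": [2] * 960}
vowels = {"a": [3] * 4000}
plan = [("diphone", "m-a", 100), ("vowel_insert", "a", 50), ("diphone", "a-m", 100)]
message = compile_message(plan, diphones, vowels)
print(len(message))
```

The returned sample list is what would be handed to a digital-to-analog converter in a real system.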

Compiling messages from natural diphones and fragments of sustained vowels gives the synthesized messages a natural coloring and good intelligibility.


Claims

Invention Formula

A method of compilation speech synthesis, comprising the reproduction of segments from previously isolated and recorded fragments of natural speech, with the duration of the segments controlled by signals generated by transcribing the texts to be synthesized into a sequence of segments, characterized in that, in order to improve the intelligibility and naturalness of the synthesized speech, all diphones containing the final part of the preceding sound and the initial part of the following sound, with a total duration of 80-120 ms, and isolated sustained vowels of natural speech are recorded, and during reproduction the synthesized message is compiled from diphones, with segments of the corresponding vowels of 20-40 ms duration inserted between diphones corresponding to pre-stressed vowels, and segments of the corresponding vowels of 40-60 ms duration inserted between diphones containing stressed vowels.
