KR19980053336A - Apparatus for preventing sound quality deterioration in a speech synthesis system

Info

Publication number: KR19980053336A
Application number: KR1019960072435A
Authority: KR
Inventors: 정준구
Original assignee: 구자홍; 엘지전자 주식회사
Priority date: 1996-12-26
Filing date: 1996-12-26
Publication date: 1998-09-25
Also published as: KR100429978B1

Abstract

The present invention relates to a text-to-speech (TTS) system that receives an input sentence (text), synthesizes it into speech, and outputs the speech. More particularly, it relates to an apparatus for preventing sound quality deterioration in a speech synthesis system which, in order to prevent degradation of unvoiced sounds, distinguishes voiced sounds from unvoiced sounds so that the input excitation signal of the speech synthesis filter is selected accordingly.

A conventional text-to-speech system controls the input of the speech synthesis using the voiced and unvoiced sound information provided by a database. This has the drawback that sound quality deteriorates in unvoiced sections and the intelligibility of the synthesized speech is reduced.

As shown in FIG. 2, the present invention builds a database (206) of unvoiced residual signals. When the input of the speech synthesis filter (203) is controlled according to voiced and unvoiced sound information, the unvoiced residual signal is used so that unvoiced sound identical to the original signal is synthesized. The invention thereby provides an apparatus for preventing sound quality deterioration in a speech synthesis system.

Description

Apparatus for preventing sound quality deterioration in a speech synthesis system

As shown in FIG. 1, a conventional speech synthesis system comprises a language processing unit (101) that preprocesses the input sentence, for example by handling long vowels and irregular phonological changes, and means for receiving the output of the language processing unit (101) and generating synthesized speech. The latter consists of a prosody control unit (102) that realizes the prosody of the sentence and handles phonological phenomena; a speech synthesis filter (103) that receives the prosody-controlled signal from the prosody control unit (102), synthesizes the sentence into speech, and outputs it; and a database (104) that provides parameters for the speech synthesis, such as the feature coefficients of the synthesis filter, voiced and unvoiced sound information, and pitch information. The system operates as follows.

A sentence input to the language processing unit (101) is analyzed according to various kinds of phonological information, and a synthesis unit sequence is generated as a result.

The sentence processing information used for language processing is provided by databases such as a long-vowel dictionary, an irregular phonological change dictionary, and a dictionary of particles and endings. Based on the information from these databases, the language processing unit performs long-vowel handling, irregular-form handling, and similar processing on the input text and supplies the result to the prosody control unit (102).

Using the result from the language processing unit, the prosody control unit (102) processes phonological phenomena, stress, syllable duration, intonation changes within a word phrase, and the like according to the grapheme composition, and thereby realizes prosody appropriate to the language.

To realize this prosody, the prosody control unit receives pitch contour information and energy information from the database (104) and controls the prosody on that basis.

The prosody-controlled synthesis unit sequence is input to the synthesis filter (103), which turns the input sentence information into sound according to the speech data provided by the database (104) and outputs speech corresponding to the input sentence.

Here, the speech synthesis filter (103) receives voiced and unvoiced sound information from the database (104) as control signals and controls the voiced input (103a) and the unvoiced input (103b) accordingly. It also receives the feature coefficients and synthesizes the voiced and unvoiced sounds.

Early synthesis unit sequences used simple phonological units such as phonemes or syllables. More recently, complex synthesis units in which voiced and unvoiced sections are mixed have come into use in order to make the synthesized speech sound more natural.

However, as naturalness improves, the intelligibility of the synthesized speech decreases.

The present invention builds a database of unvoiced residual signals. When the input of the speech synthesis filter is controlled according to voiced and unvoiced sound information, the unvoiced residual signal is used so that unvoiced sound identical to the original signal is synthesized. The invention thereby provides an apparatus for preventing sound quality deterioration in a speech synthesis system.

FIG. 1 is a block diagram of a conventional text-to-speech system.

FIG. 2 is a block diagram of the text-to-speech system according to the present invention.

FIG. 2 shows the configuration of the text-to-speech system of the present invention.

The speech synthesis system of the present invention comprises a language processing unit (201) that processes an input sentence and outputs a unit sequence for speech synthesis, and means for synthesizing the information of the synthesis unit sequence output from the language processing unit (201) into a speech signal and outputting it. The latter consists of a prosody control unit (202) that performs prosody processing on the synthesis unit sequence; a speech synthesis filter (203) that receives the synthesis unit sequence processed by the prosody control unit (202) and synthesizes voiced and unvoiced sounds from it, performing the unvoiced synthesis with residual signal information from the unvoiced residual signal database (206); a synthesis database (204) that supplies the parameters for prosody control and speech synthesis; a voiced/unvoiced control memory (205) that supplies the signals for controlling the voiced and unvoiced inputs of the speech synthesis filter (203); and the unvoiced residual signal database (206), which supplies the unvoiced residual signals used by the speech synthesis filter (203) for unvoiced synthesis.

The speech synthesis system of the present invention, configured as described above, operates as follows.

A sentence input to the language processing unit (201) is analyzed on the basis of phonological information, and a synthesis unit sequence is generated as a result. The prosody control unit (202) then performs prosody processing on this synthesis unit sequence.

Parameters required for the prosody processing, such as pitch, are provided by the synthesis database (204).

The synthesis unit sequence that has been prosody-controlled by the prosody control unit (202) is input to the synthesis filter (203). This sequence is the sequential index information of the synthesis units needed for speech synthesis. Following this index sequence, the parameters needed to synthesize each unit, such as pitch values and feature coefficient values, are read from the synthesis database (204) frame by frame.
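The frame-by-frame lookup described above might be sketched in Python as follows. This is only an illustration under assumptions: the patent does not describe the layout of the synthesis database (204), so the `FrameParams` record, the `SYNTHESIS_DB` table, its contents, and the `read_frames` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FrameParams:
    """Parameters of one frame (hypothetical layout, not from the patent)."""
    pitch_period: int           # pitch period in samples (assumed unit)
    coeffs: Tuple[float, ...]   # feature coefficients of the synthesis filter


# Hypothetical synthesis database: synthesis unit index -> frames of that unit.
# The values are made up for illustration only.
SYNTHESIS_DB: Dict[int, List[FrameParams]] = {
    7: [FrameParams(80, (0.5,)), FrameParams(82, (0.4,))],
    12: [FrameParams(0, (0.1,))],
}


def read_frames(index_sequence: List[int]) -> List[FrameParams]:
    """Follow the index sequence and collect the frames of each unit in order."""
    frames: List[FrameParams] = []
    for unit_index in index_sequence:
        frames.extend(SYNTHESIS_DB[unit_index])
    return frames
```

For instance, `read_frames([7, 12])` would return the two frames of unit 7 followed by the single frame of unit 12.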

In addition, for each frame of the synthesis unit sequence, voiced (U) or unvoiced (V) information is input from the voiced/unvoiced control memory (TLB: Table Lookahead Buffer).

The voiced information (U) and the unvoiced information (V) control the inputs (203a, 203b) of the synthesis filter (203) and thereby select the input excitation signal. When the current input frame of the synthesis filter (203) is voiced, an impulse sequence (203a) whose period follows the pitch value from the synthesis database (204) is input to the synthesis filter (203), and voiced sound is synthesized.

When the input frame of the synthesis filter (203) is unvoiced, however, the residual signal (203b) supplied from the unvoiced residual signal database (206) is input to the synthesis filter (203), and unvoiced sound identical to the original signal is synthesized.
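The excitation switching described in the two preceding paragraphs might be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: the patent does not specify the structure of the synthesis filter, so a simple all-pole recursive filter is assumed, and the function names and the unit impulse amplitude are hypothetical. For simplicity the impulse train restarts at the beginning of each frame.

```python
from typing import List, Sequence


def impulse_train(length: int, pitch_period: int) -> List[float]:
    """Voiced excitation: one unit impulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def select_excitation(voiced: bool, length: int, pitch_period: int,
                      residual: Sequence[float]) -> List[float]:
    """Pick the input excitation for one frame.

    Voiced frame   -> impulse train at the pitch period (input 203a).
    Unvoiced frame -> stored residual signal (input 203b).
    """
    if voiced:
        return impulse_train(length, pitch_period)
    return list(residual[:length])


def all_pole_filter(excitation: Sequence[float],
                    coeffs: Sequence[float]) -> List[float]:
    """Assumed filter form: y[n] = x[n] + sum_k coeffs[k] * y[n-1-k]."""
    out: List[float] = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out
```

A frame would then be synthesized with something like `all_pole_filter(select_excitation(voiced, frame_len, pitch_period, residual), coeffs)`, where all arguments come from the per-frame data described above.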

The speech signal synthesized in this way is converted from digital to analog and finally output as the synthesized speech signal corresponding to the input sentence.

As described above, the present invention synthesizes unvoiced sounds using the unvoiced residual signal database. This prevents sound quality deterioration in unvoiced sections and also increases the intelligibility of the synthesized speech.
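One way to see why a stored residual can reproduce the original unvoiced sound is the following Python sketch. It assumes, which the patent does not state, that the residual is obtained by inverse filtering the original signal with the same coefficients the synthesis filter uses, in the manner of linear prediction. The function names, coefficients, and sample values are hypothetical.

```python
from typing import List, Sequence


def inverse_filter(signal: Sequence[float],
                   coeffs: Sequence[float]) -> List[float]:
    """Residual e[n] = s[n] - sum_k coeffs[k] * s[n-1-k] (assumed analysis step)."""
    residual: List[float] = []
    for n, s in enumerate(signal):
        pred = sum(a * signal[n - 1 - k]
                   for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(s - pred)
    return residual


def synthesis_filter(excitation: Sequence[float],
                     coeffs: Sequence[float]) -> List[float]:
    """Output y[n] = e[n] + sum_k coeffs[k] * y[n-1-k] (assumed all-pole form)."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - 1 - k]
                   for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(e + pred)
    return out


# Made-up unvoiced samples and coefficients, for illustration only.
original = [0.3, -0.2, 0.5, 0.1, -0.4]
coeffs = [0.6, -0.2]

stored_residual = inverse_filter(original, coeffs)      # what a residual database would hold
rebuilt = synthesis_filter(stored_residual, coeffs)     # what the synthesis filter outputs
```

Under these assumptions `rebuilt` matches `original` up to floating-point rounding, because the synthesis filter is the exact inverse of the analysis step.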

Claims

An apparatus for preventing sound quality deterioration in a speech synthesis system, characterized by comprising: means for converting an input sentence into a synthesis unit sequence based on phonological information and controlling its prosody, in order to synthesize the input sentence into a speech signal corresponding to the sentence; means for synthesizing the speech separately for voiced and unvoiced sounds using synthesis parameters from database means; and database means that provides the parameters required for speech synthesis, including residual signals of unvoiced sounds, and supplies them to the synthesizing means.