KR0134707B1

KR0134707B1 - Voice synthesizer

Info

Publication number: KR0134707B1
Application number: KR1019940036104A
Authority: KR
Inventors: 이승훈; 강동규; 한민수
Original assignee: 양승택; 재단법인 한국전자통신연구소; 조백제; 한국전기통신공사
Priority date: 1994-12-22
Filing date: 1994-12-22
Publication date: 1998-05-15
Also published as: KR960024888A

Abstract

The present invention relates to a method of synthesizing Korean text into speech. Speech is segmented into diphone units, the parameters for each unit are retrieved from a synthesis database, and the result is converted to speech by LSP (line spectrum pair) synthesis. The invention subdivides the classification and concatenation of diphone units efficiently and, when the synthesis database is built, attaches special boundary markers to each diphone. Each marker has a different meaning depending on the diphone type, and the markers are arranged so that duration adjustment and concatenation are easy when synthesis units are joined. Because the quality of the synthesized sound is the key to speech synthesis technology, a modified LF model and a residual signal are used as the sound source of the LSP synthesis filter to improve naturalness and intelligibility.

When connected to the growing number of information and communication services, the invention is expected to increase information retrieval by means of speech synthesis and thereby advance new information-delivery technology.

Description

LSP Speech Synthesis Method Using Diphone Unit

FIG. 1 is a basic configuration diagram of the hardware to which the present invention is applied;

FIG. 2 is a structural diagram of the synthesis database used in the present invention;

FIG. 3 is a chart showing the permitted combinations of the diphone types used in the present invention;

FIG. 4 is an overall processing flowchart of the speech synthesis method according to the present invention;

FIG. 5 is a block diagram of the sound source model used in the present invention.

* Explanation of symbols for the main parts of the drawings

1: character input device
2: central processing module
3: synthesis database
4: D/A converter
10: frame-by-frame structure
20: structure of the parameters in each frame
30: boundary markers

The present invention relates to an LSP speech synthesis method that uses a synthesis parameter database organized in diphone units, and it is applicable to any field that requires converting Korean text into speech.

Speech has long been the most widely used means of human communication. Conveying information by voice requires no special training and can be done while the listener is engaged in other work, and for these reasons speech has attracted attention as a new information medium as computer and communication technology has developed. As computers have become more widespread, the number of users who exchange information and search for material over information and communication service networks has grown steadily, and the demand for text-to-speech conversion, a media-conversion service for information, continues to expand accordingly.

Speech synthesis technology can be applied to information services delivered over telephone lines, to speaking aids for people with speech impairments, and to human-machine interfaces for office automation equipment and home appliances. Producing natural, human-like synthesized speech is still difficult, however, so the products actually commercialized in Korea can be counted on one hand. In rule-based speech synthesis in particular, many kinds of synthesis unit have been studied in order to produce human-like speech, and these units have mostly consisted of a phoneme or a sequence of several consecutive phonemes.

In general, the larger the synthesis unit, the simpler the rules for joining units and the better the quality of the synthesized sound, but the greater the number of units needed to produce an unrestricted phoneme sequence. The characteristics of a synthesis unit also change under the influence of the preceding and following phonemes, accent, sentence type, and context, so it is very difficult to determine the most efficient and appropriate synthesis unit, and equally difficult to determine how many units are needed.

Among the synthesis units studied so far, the diphone includes the transition between phonemes inside the unit itself. The joining rules become more complicated, but the connection between phonemes sounds natural.

Accordingly, an object of the present invention is to provide a speech synthesis method that is implemented with diphone units and that, for the sake of intelligibility, synthesizes unvoiced sounds from a residual signal, so that unvoiced sounds are represented more accurately at the cost of a larger synthesis database.

To achieve the above object, the present invention provides a speech synthesis method applied to an apparatus comprising: character input means that receives and forwards Korean characters representable in the complete-form code; central processing means that receives the characters from the character input means and executes the speech synthesis algorithm; a synthesis database that stores the diphone-unit parameters used by the synthesis algorithm and transmits the required parameters to the central processing means; and a digital/analog converter that converts the digital data synthesized by the central processing means into analog form and outputs the synthesized sound.

The method comprises: a first step of performing Hangul processing, in which the complete-form characters entered through the character input means are converted into a 3-byte internal code using a conversion table, followed by processing of alphabetic characters, numerals, and a limited set of abbreviations; a second step of generating prosody control information through boundary analysis and breath-pause processing, and of generating a phonetic symbol string in pronounced form by applying the phonological rules of Korean in pronunciation rule processing; a third step of first adjusting the phoneme durations using the symbol string generated in the second step; a fourth step of converting (160) each syllable, represented in 3-byte form, into the diphone types defined in the synthesis database according to the result of the third step; a fifth step of retrieving the parameters from the synthesis database using the indices of the generated diphones and then performing linear interpolation of the LSP parameters between adjacent units and energy adjustment using the energy weights; a sixth step of performing prosody control, in which the fundamental frequency, which conveys information about sentence structure, meaning, emotion, and the like, is determined using the boundary analysis information; and a seventh step of synthesizing speech using the synthesis parameters generated through the sixth step.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows the hardware structure to which the synthesis technique of the present invention is applied. As shown in the drawing, it comprises a character input device (1), a central processing module (2), a synthesis database (3), and a digital/analog (D/A) converter (4).

The character input device (1) receives Korean characters representable in the KS5601 complete-form code and forwards them to the central processing module (2). The central processing module (2) is where the software that carries out the algorithm of the present invention is loaded and executed. The synthesis database (3) is a parameter database organized in diphone units for use by the synthesis algorithm; it is recorded in a storage device and transmits the required parameters to the central processing module (2). The D/A converter (4) converts the synthesized digital data into analog form and plays the synthesized sound.

Table 1 below shows the classification of the diphone types used in the present invention.

To synthesize speech, each syllable is split into diphones as shown in the table, with reference to the preceding and following syllables. Each resulting diphone type is used to retrieve parameters from the synthesis database and is also used when the diphone units are joined.

FIG. 2 shows the structure of the diphone-unit database of synthesis parameters. Each diphone consists of a synthesis parameter part organized in frames and a part holding the segment information used for duration adjustment and for joining adjacent diphones. A diphone is a sequence (10) of fixed-length frames, and each frame is divided into four kinds of synthesis parameters (20).

Of these, the pitch indicates whether the current frame is unvoiced or voiced. A value of 0 means the frame is unvoiced and is to be synthesized using the residual signal; a value greater than 0 means the frame is voiced and is to be synthesized using the modified LF model. The energy weight determines the amplitude of the synthesized sound and is used as the gain of the LSP synthesis filter. The LSP parameters P and Q, twelve in all, are used as the coefficients of the synthesis filter, which is represented by a 12th-order all-pole model.

The residual signal is a parameter used as the sound source when synthesizing consonants, and it can produce synthesized consonants that are nearly identical to the original sound. The part drawn at the very top (30) shows the special boundary markers, such as the start of the diphone, the phoneme boundary, the joining point, and the end, which are specified so that these frame-based diphones (10) can be joined and synthesized into continuous speech. Up to five markers are defined, depending on the diphone type. The markers for each diphone type are detailed below.

The circled numbers below correspond to the numbers in the boundary markers (30) of FIG. 2; markers that a type does not use are not listed.
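The frame layout described above can be sketched in Python as follows. This is a minimal illustration only: the class name `Frame`, its field names, and the function `excitation_kind` are hypothetical, and the source does not specify storage types or the length of the residual signal.

```python
from dataclasses import dataclass
from typing import List

LSP_ORDER = 12  # the text states that there are 12 LSP parameters in total


@dataclass
class Frame:
    """One fixed-length frame holding the four kinds of synthesis parameters."""
    pitch: float           # 0 means unvoiced; a value above 0 means voiced
    energy_weight: float   # used as the gain of the LSP synthesis filter
    lsp: List[float]       # the 12 LSP parameters (P and Q together)
    residual: List[float]  # residual samples; length is an assumption here

    def __post_init__(self) -> None:
        if len(self.lsp) != LSP_ORDER:
            raise ValueError(f"expected {LSP_ORDER} LSP parameters, got {len(self.lsp)}")


def excitation_kind(frame: Frame) -> str:
    """Return which sound source the frame calls for, based on its pitch value."""
    return "residual" if frame.pitch == 0 else "lf_model"
```

A frame with `pitch == 0` selects the residual signal, and any positive pitch selects the modified LF model, mirroring the rule stated in the text.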

- Diphone type 1)
① Position where the transition from silence to the vowel begins
② Position where the transition from silence to the vowel ends and the stable vowel begins
③ Position where the stable section of the vowel ends, before the transition from the vowel to silence begins
④ Position where the transition from the vowel to silence ends

- Diphone type 2)
① Position where the transition from silence to the semivowel begins
② Position where the transition from the semivowel to the vowel ends and the stable vowel begins

- Diphone type 3)
① Position where the transition from silence to the initial consonant begins
② Position where the transition from silence through the consonant to the vowel ends and the stable vowel begins
⑤ Position where the unvoiced sound ends, if the initial consonant is unvoiced

- Diphone type 4)
② Position where the transition from silence through the consonant and the semivowel to the vowel ends and the stable vowel begins

- Diphone type 5)
② Position where the transition from silence (through a semivowel in the case of a diphthong) to the vowel ends and the stable vowel begins
③ Position where the stable section of the vowel ends, before the transition from the vowel to the final consonant begins
④ Position where the transition from the vowel through the final consonant to silence ends

- Diphone type 6)
① Position where the stable section of the preceding vowel begins
② Position where the stable section of the preceding vowel ends, before the transition from the preceding vowel to the following vowel begins
③ Position where the transition from the preceding vowel to the following vowel ends and the stable section of the following vowel begins
④ Position where the stable section of the following vowel ends

- Diphone type 7)
① Position where the stable section of the vowel begins
② Position where the stable section of the vowel ends, before the transition from the vowel to the semivowel begins
③ Position where the transition from the vowel to the semivowel ends

- Diphone type 8)
② Position where the stable section of the vowel ends, before the transition from the vowel to the consonant begins
③ Position where the transition from the vowel to the consonant ends

- Diphone type 9)
① Position where the transition from the semivowel to the vowel begins
② Position where the transition from the semivowel to the vowel ends and the stable section of the vowel begins
③ Position where the stable section of the vowel ends

- Diphone type 10)
① Position where the consonant begins
② Position where the transition from the consonant to the vowel ends and the stable section of the vowel begins
⑤ Position where the unvoiced sound ends

- Diphone type 11)
① Position where the consonant begins
② Position where the transition from the consonant through the semivowel to the vowel ends and the stable section of the vowel begins
⑤ Position where the unvoiced sound ends, if the consonant is unvoiced

- Diphone type 12)
② Position where the stable section of the vowel ends, before the transition from the vowel to the final consonant begins
③ Position where the transition from the vowel to the final consonant ends and the stable section of the final consonant begins
④ Position where the stable section of the final consonant ends

- Diphone type 13)
① Position where the stable section of the final consonant begins
② Position where the stable section of the final consonant ends, before the transition from the final consonant to the initial consonant begins

- Diphone type 14)
② Position where the stable section of the vowel ends, before the transition from the vowel to /ㄹ/ begins
③ Position where the transition from the vowel to /ㄹ/ ends and the stable section of /ㄹ/ begins
④ Position where the stable section of /ㄹ/ ends

- Diphone type 15)
① Position where the stable section of /ㄹ/ begins
② Position where the stable section of /ㄹ/ ends, before the transition from /ㄹ/ to the vowel begins
③ Position where the transition from /ㄹ/ to the vowel ends and the stable section of the vowel begins
④ Position where the stable section of the vowel ends
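The marker listing above lends itself to a small lookup table. The following Python sketch records which of the five markers each diphone type uses, exactly as listed in the text, and checks a diphone's marker positions against it. The names `MARKERS_BY_TYPE` and `DiphoneBoundaries` and the choice of a marker-to-frame-index mapping are assumptions made for illustration; the source does not describe how the markers are stored.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Marker numbers (1-5) used by each diphone type, as listed in the text.
MARKERS_BY_TYPE: Dict[int, FrozenSet[int]] = {
    1: frozenset({1, 2, 3, 4}),
    2: frozenset({1, 2}),
    3: frozenset({1, 2, 5}),
    4: frozenset({2}),
    5: frozenset({2, 3, 4}),
    6: frozenset({1, 2, 3, 4}),
    7: frozenset({1, 2, 3}),
    8: frozenset({2, 3}),
    9: frozenset({1, 2, 3}),
    10: frozenset({1, 2, 5}),
    11: frozenset({1, 2, 5}),
    12: frozenset({2, 3, 4}),
    13: frozenset({1, 2}),
    14: frozenset({2, 3, 4}),
    15: frozenset({1, 2, 3, 4}),
}


@dataclass
class DiphoneBoundaries:
    """Boundary markers of one diphone (hypothetical in-memory layout)."""
    diphone_type: int
    frame_index: Dict[int, int]  # marker number -> frame index within the diphone

    def is_consistent(self) -> bool:
        """True if exactly the markers listed for this type are present."""
        return frozenset(self.frame_index) == MARKERS_BY_TYPE[self.diphone_type]
```

For example, a type 3 diphone would carry positions for markers 1, 2, and 5 only, and a check like `is_consistent()` could catch a database entry that is missing the unvoiced-end marker.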

FIG. 3 shows the combinations that are possible for each diphone type used in the present invention. How the synthesis units are joined to one another strongly affects both the complexity of the joining rules and the naturalness of the synthesized sound. The cells shaded black in the drawing indicate combinations that are allowed; when two diphones are joined, L denotes the synthesis unit on the left and R the synthesis unit on the right.

FIG. 4 is an overall processing flowchart of the text-to-speech conversion method according to the present invention, which is described in detail below with reference to the drawing.

The process of converting text into speech consists of two main parts: language processing (200) and speech synthesis (300), which produces the actual speech.

The incoming complete-form characters first go through Hangul processing (110), which converts them into a 3-byte internal code using a conversion table, and then through processing of alphabetic characters, numerals, and a limited set of abbreviations (120). Next, boundary analysis and breath-pause processing (130) generates the prosody control information. Pronunciation rule processing (140), the last stage of language processing, applies the phonological rules of Korean to generate a phonetic symbol string in pronounced form.

Speech synthesis first uses this symbol string to adjust the phoneme durations (150). For each phonetic symbol string received, durations are determined in the order of syllable and then phoneme, based on the boundary analysis information for words, phrases, clauses, and sentences and on experimentally determined minimum and intrinsic phoneme durations. Each syllable, represented in 3-byte form, is then converted (160) into the diphone types defined in the synthesis database. The indices of the resulting diphones are used to retrieve (170) the parameters from the synthesis database, after which linear interpolation of the LSP parameters and energy adjustment using the energy weights are performed between adjacent units. The prosody control unit (180) uses the boundary analysis information to determine the fundamental frequency, which conveys information about sentence structure, meaning, emotion, and the like. In the present invention this is implemented with the following quadratic function P(t).
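As a rough illustration of the Hangul processing step, the sketch below splits a syllable into three small integers and packs them into three bytes. Two loud assumptions apply: the source converts KS5601 complete-form codes through a conversion table, whereas this sketch uses the arithmetic layout of the Unicode Hangul Syllables block instead, and the source does not say what the three bytes of the internal code contain, so the initial/medial/final split shown here is only one plausible reading. The function names are hypothetical.

```python
from typing import Tuple

HANGUL_BASE = 0xAC00   # first precomposed syllable in Unicode
N_INITIAL = 19         # number of initial consonants
N_MEDIAL = 21          # number of vowels
N_FINAL = 28           # number of final consonants, including "none"


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Split one precomposed Hangul syllable into (initial, medial, final) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_INITIAL * N_MEDIAL * N_FINAL:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


def to_internal_code(syllable: str) -> bytes:
    """Pack the three indices into a 3-byte code (assumed layout)."""
    return bytes(decompose(syllable))
```

Working per syllable in this decomposed form makes the later steps (pronunciation rules and diphone selection) a matter of inspecting neighboring initial, medial, and final sounds.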

$$
P(t) =
\begin{cases}
P_b - (P_b - P_a)\left(\dfrac{T_b - t}{T_b - T_a}\right)^2, & T_a \le t < T_b \\
P_c, & T_b \le t < T_c \\
P_b - (P_b - P_d)\left(\dfrac{t - T_c}{T_d - T_c}\right)^2, & T_c \le t < T_d
\end{cases}
$$

Here Pa, Pb, Pc, and Pd are constants.
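The piecewise function P(t) can be evaluated directly. The sketch below is a plain transcription of the three segments, assuming the half-open intervals Ta ≤ t < Tb, Tb ≤ t < Tc, and Tc ≤ t < Td; the function name and argument order are hypothetical, and the constants in any call are illustrative values, not figures from the source.

```python
def pitch_contour(t: float,
                  ta: float, tb: float, tc: float, td: float,
                  pa: float, pb: float, pc: float, pd: float) -> float:
    """Evaluate the piecewise fundamental-frequency contour P(t)."""
    if ta <= t < tb:
        # Rising segment: starts at Pa when t = Ta and approaches Pb at t = Tb.
        return pb - (pb - pa) * ((tb - t) / (tb - ta)) ** 2
    if tb <= t < tc:
        # Flat segment.
        return pc
    if tc <= t < td:
        # Final segment: starts at Pb when t = Tc and approaches Pd at t = Td.
        return pb - (pb - pd) * ((t - tc) / (td - tc)) ** 2
    raise ValueError("t lies outside the interval [Ta, Td)")
```

With Pa below Pb and Pd below Pb, the contour rises over the first segment, holds over the second, and falls over the third, which gives a simple rise-plateau-fall shape for each prosodic span.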

The LSP synthesis (190) at the bottom of the drawing synthesizes speech from the synthesis parameters produced by the preceding steps (110 to 180). It uses a 12th-order all-pole LSP filter and the D/A converter (4), which converts the digitally represented synthesized sound into analog form.
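A 12th-order all-pole LSP filter can be sketched in pure Python as shown below. This follows the standard textbook construction rather than anything spelled out in the source: the LSP parameters are assumed to be sorted line spectral frequencies in radians, the 1st, 3rd, ... frequencies are assigned to the symmetric polynomial and the 2nd, 4th, ... to the antisymmetric one, and the predictor polynomial is A(z) = (P(z) + Q(z)) / 2. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def _polymul(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsf: Sequence[float]) -> List[float]:
    """Convert sorted line spectral frequencies (radians) to A(z) coefficients."""
    order = len(lsf)
    if order % 2:
        raise ValueError("this sketch handles even filter orders only")
    sym, anti = [1.0, 1.0], [1.0, -1.0]  # fixed roots at z = -1 and z = +1
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            sym = _polymul(sym, factor)    # 1st, 3rd, ... frequencies
        else:
            anti = _polymul(anti, factor)  # 2nd, 4th, ... frequencies
    # The highest-order terms of the two polynomials cancel, leaving order + 1 taps.
    return [0.5 * (s + q) for s, q in zip(sym, anti)][: order + 1]


def all_pole_filter(a: Sequence[float], excitation: Sequence[float],
                    gain: float) -> List[float]:
    """Run s[n] = gain * e[n] - sum_{k>=1} a[k] * s[n-k]."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out
```

In a per-frame loop, the frame's twelve LSP values would be passed to `lsp_to_lpc`, the energy weight would serve as `gain`, and `excitation` would be either the residual signal or the output of the voiced source model.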

FIG. 5 shows the sound source used by the LSP synthesizer (190) of FIG. 4. The pitch information in the generated synthesis parameters selects the source. A value of 0 indicates an unvoiced frame, so the residual signal is used as the sound source of the filter in order to produce synthesized sound whose intelligibility is close to that of the original. If pitch information is present, the frame to be synthesized is voiced, and the sound source is an LF model, which resembles the human voice production model, modified to suit Korean.

The present invention, configured and carried out as described above, provides the following effects.

First, any Korean text written in the complete-form code can be synthesized.

Second, placing up to five boundary markers on each diphone unit according to its characteristics makes it easy to join synthesis units and to adjust phoneme durations.

Third, using the modified LF model and the residual signal in the sound source model yields more natural and more intelligible synthesized sound.

Fourth, because users can listen to the information they want, they can carry on with other work at the same time.

Fifth, text received from an information and communication network currently in service can be synthesized.

Sixth, the invention can be used in welfare services for people with disabilities and in the synthesis stage of automatic interpretation technology.

Claims

A speech synthesis method applied to an apparatus comprising character input means (1) that receives and forwards Korean characters representable in the complete-form code, central processing means (2) that receives the characters from the character input means (1) and executes a speech synthesis algorithm, a synthesis database (3) that stores the diphone-unit parameters used by the synthesis algorithm and transmits the required parameters to the central processing means (2), and a digital/analog converter (4) that converts the digital data synthesized by the central processing means (2) into analog form and outputs the synthesized sound, the method comprising:

a first step of performing Hangul processing, in which the complete-form characters entered through the character input means (1) are converted into a 3-byte internal code using a conversion table, followed by processing of alphabetic characters, numerals, and a limited set of abbreviations;

a second step of generating prosody control information through boundary analysis and breath-pause processing, and of generating a phonetic symbol string in pronounced form by applying the phonological rules of Korean in pronunciation rule processing;

a third step of first adjusting the phoneme durations using the symbol string generated in the second step;

a fourth step of converting each syllable, represented in 3-byte form, into the diphone types defined in the synthesis database according to the result of the third step;

a fifth step of retrieving the parameters from the synthesis database using the indices of the generated diphones, and then performing linear interpolation of the LSP parameters between adjacent units and energy adjustment using the energy weights;

a sixth step of performing prosody control, in which the fundamental frequency, which conveys information about sentence structure, meaning, emotion, and the like, is determined using the boundary analysis information; and

a seventh step of synthesizing speech using the synthesis parameters generated through the sixth step.

The method of claim 1, wherein the duration adjustment in the third step is determined, for each phonetic symbol string received, in the order of syllable and then phoneme, based on the boundary analysis information for words, phrases, clauses, and sentences and on the minimum and intrinsic durations of the phonemes.

The method of claim 1, wherein the prosody control in the sixth step is implemented using the quadratic function P(t)

$$
P(t) =
\begin{cases}
P_b - (P_b - P_a)\left(\dfrac{T_b - t}{T_b - T_a}\right)^2, & T_a \le t < T_b \\
P_c, & T_b \le t < T_c \\
P_b - (P_b - P_d)\left(\dfrac{t - T_c}{T_d - T_c}\right)^2, & T_c \le t < T_d
\end{cases}
$$

where Pa, Pb, Pc, and Pd are constants.

2. The speech synthesis method, characterized in that the speech synthesis in the seventh step is performed using a 12th-order all-pole LSP filter and a D/A converter (4) that converts the digitally represented synthesized speech into analog form.

The speech synthesis method of claim 1, wherein the parameters retrieved from the synthesis database comprise a pitch, an energy weight, LSP parameters, and a residual signal.

5. The speech synthesis method according to claim 4, wherein the speech synthesis in the seventh step uses a modified LF model and a residual signal as the sound source.