KR100355393B1

KR100355393B1 - Phoneme length deciding method in voice synthesis and method of learning phoneme length decision tree

Info

Publication number: KR100355393B1
Application number: KR1019950019064A
Authority: KR
Inventors: 곽동후; 기석철
Original assignee: 삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 1995-06-30
Filing date: 1995-06-30
Publication date: 2002-12-26
Also published as: KR970002851A

Abstract

PURPOSE: A phoneme length deciding method in voice synthesis and a method of learning a phoneme length decision tree are provided, which decide syllable length statistically in order to generate natural synthesized speech. CONSTITUTION: The syntactic attributes given to each of the training phonemes are analyzed. The phonemes are classified according to an attribute, and the average error of the resulting groups is obtained. Among the attributes, the one yielding the smallest average error is selected. The groups classified by the selected attribute are classified again according to the other attributes, and the average error of the resulting groups is obtained. These steps are repeated until the average error becomes smaller than a predetermined value, yielding a phoneme length decision tree. The average phoneme length of each group is allocated to the corresponding terminal node of the tree. The length of a phoneme to be synthesized is then decided with reference to these phoneme lengths.

Description

Phoneme length determination method and phoneme length decision tree learning method in speech synthesis

The present invention relates to a speech synthesis method that generates synthesized speech in phoneme units. More particularly, it relates to a method of determining phoneme length using the syntactic attributes of phonemes and to a method of learning a phoneme length decision tree.

Speech synthesis is one means of conveying information from a computer to a human, and it is a core component of text-to-speech systems. In a text-to-speech system, it is important to generate natural synthesized speech that is indistinguishable from natural speech. Much research is currently being conducted toward this goal.

Speech has semantic elements and syntactic elements. The semantic elements include the speaker's intention, the content of the utterance, and the situation in which it is spoken. The syntactic elements include the position, function, and mutual relationships of units within the sentence. Speech synthesis methods can accordingly be semantic or syntactic. The semantic approach has the drawback that its rules are difficult to keep consistent. The syntactic approach, by contrast, can be implemented easily with control rules that are determined deterministically by sentence structure.

One example of the syntactic approach is "A Study on Prosody Control by Syntactic Analysis in a Korean Text-to-Speech System," Proceedings of the 10th Workshop on Speech Communication and Signal Processing, Vol. 10, No. 1, pp. 285-290. It discloses a method that determines length in phoneme units by considering the number of syllables, the position of the syllable, and the part of speech of the syllable.

However, this method does not cover the full range of syntactic attributes, so the syllable lengths it produces are imprecise. It also sounds unnatural, because it relies on a syllable length table built from a one-dimensional classification.

Another method, disclosed in Korean Patent Publication No. 92-0132247, uses the length of a phoneme's speech segment directly as the phoneme length. This approach is rigid: the phoneme length is fixed by the segment length determined when the segment database was built.

Accordingly, an object of the present invention is to provide a phoneme length determination method that generates natural synthesized speech. It does so in two ways:

- It uses a variety of syntactic attributes to predict phoneme length, which improves the precision of syllable length.
- It determines syllable length statistically, taking into account the interactions among attributes.

Another object of the present invention is to provide a method of learning the phoneme length decision tree used in the above phoneme length determination method.

The speech synthesis method according to the present invention, which achieves the above object, is characterized as follows.

A method of determining the length of phonemes, which are the units of speech synthesis, using syntactic attributes, comprising:

(a) analyzing the syntactic attributes assigned to each of the training phonemes;

(b) classifying the phonemes according to the attributes and learning a characteristic phoneme length for each class; and

(c) determining the length of a phoneme to be synthesized by referring to the phoneme lengths determined in step (b).

The phoneme length decision tree learning method according to the present invention, which achieves the other object above, is characterized as follows.

A method of learning a phoneme length decision tree, which is used to determine the length of phonemes as units of speech synthesis by classifying the phonemes of a plurality of sentences according to their syntactic characteristics, comprising:

(o) classifying the phonemes by an arbitrary attribute and computing the average error of the resulting groups;

(p) repeating step (o) for every attribute and selecting the attribute whose classification yields the smallest average error;

(q) reclassifying the sets produced by the attribute selected in step (p) according to the remaining attributes, excluding the selected one, and computing the average error of the resulting groups;

(r) repeating steps (p) and (q) until no attributes remain to be applied or the average error of the classified groups falls below a predetermined value, thereby obtaining a tree-structured phoneme length decision tree; and

(s) assigning the average phoneme length of each group to the corresponding terminal node of the phoneme length decision tree obtained in step (r).

The present invention is described in detail below with reference to the accompanying drawings.

FIGS. 1A and 1B show phoneme length determination methods according to the prior art. The method of FIG. 1A is disclosed in Patent Publication No. 92-0132247. When a phoneme p is input, it retrieves the speech segment of p according to a predetermined segment selection rule and outputs it.

The method of FIG. 1B is disclosed in "A Study on Prosody Control by Syntactic Analysis in a Korean Text-to-Speech System." When a syllable is input, an attribute analysis step determines the attributes of each syllable. The method then retrieves from a syllable length table the syllable length corresponding to the analysis result and outputs it.

FIG. 2 shows the speech synthesis method according to the present invention. The method is performed through the following steps.

(a) Analyze the syntactic attributes assigned to each training phoneme. The training phonemes are extracted from a plurality of recorded sentences. Each has its own syntactic attributes and an actual measured length.

For example, the first phoneme "ㅜ" in the sentence "우리는 학교에 간다" ("We go to school") is represented by attributes, attribute values, and a length as follows.

A) First phoneme: ㅜ

1) Stress: 0 /* value marked in the dictionary */

2) Preceding phoneme: ㅇ /* none: syllable-initial ㅇ is silent */

3) Following phoneme: ㄹ

4) Long/short vowel mark: none /* value marked in the dictionary */

5) Number of syllables in the word (eojeol): 3 /* syllables in the word containing "ㅜ" */

6) Number of syllables in the break (phrasing) unit: 3 /* 우리는 (pause) 학교에 간다 */

7) Distance from the start of the word: 1 /* "우" is the first syllable of the word "우리는" */

8) Distance from the end of the word: 3

9) Distance from the start of the break unit: 1

10) Distance from the end of the break unit: 3

11) Length: 30 /* length of the "ㅜ" sound in the recorded waveform, in units of 1/1000 s */

B) Second phoneme: ㄹ

1) Stress: 0

2) Preceding phoneme: ㅜ

3) Following phoneme: ㅣ

4) Long/short vowel mark: none

(b) Classify the phonemes by their attributes and learn a characteristic phoneme length for each class. The phonemes are organized into a tree structure according to the phoneme length decision tree generation rules that follow.

(c) Determine the phoneme length applied in actual speech synthesis by referring to the phoneme lengths learned in step (b).
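The positional attributes 5) to 10) in the example for "ㅜ" can be computed mechanically once a sentence has been segmented into break units, words (eojeol), and syllables. The following Python sketch makes some assumptions of its own:

- Counting is 1-based and inclusive. This is what the listed values imply: the first of three syllables is 1 from the start and 3 from the end.
- The nested-list segmentation format is assumed.
- The names `PositionAttrs` and `position_attrs` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PositionAttrs:
    word_syllables: int        # 5) number of syllables in the word (eojeol)
    unit_syllables: int        # 6) number of syllables in the break unit
    dist_from_word_start: int  # 7) distance from the start of the word
    dist_from_word_end: int    # 8) distance from the end of the word
    dist_from_unit_start: int  # 9) distance from the start of the break unit
    dist_from_unit_end: int    # 10) distance from the end of the break unit


def position_attrs(unit: List[List[str]], word_idx: int, syl_idx: int) -> PositionAttrs:
    """Positional attributes of the syllable that holds a phoneme.

    unit     -- one break unit, as a list of words, each a list of syllables
    word_idx -- 0-based index of the word within the unit
    syl_idx  -- 0-based index of the syllable within that word
    Distances are 1-based and inclusive (assumed from the example values).
    """
    n_word = len(unit[word_idx])
    n_unit = sum(len(word) for word in unit)
    # 0-based position of the syllable within the whole break unit
    pos_in_unit = sum(len(word) for word in unit[:word_idx]) + syl_idx
    return PositionAttrs(
        word_syllables=n_word,
        unit_syllables=n_unit,
        dist_from_word_start=syl_idx + 1,
        dist_from_word_end=n_word - syl_idx,
        dist_from_unit_start=pos_in_unit + 1,
        dist_from_unit_end=n_unit - pos_in_unit,
    )


# "우리는 / 학교에 간다": the first break unit holds the single word 우리는.
first = position_attrs([["우", "리", "는"]], word_idx=0, syl_idx=0)
```

For "우" this reproduces the values listed for ㅜ in attributes 5) to 10): 3, 3, 1, 3, 1, 3.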

FIG. 3 shows how the phoneme length decision tree used in the method of FIG. 2 is determined. The process according to the present invention is as follows.

A) Classify the training phonemes by an arbitrary attribute and compute the average error of the resulting groups. The average error is given by the following equation.

Here, S_i (i = 0, 1, ..., N) are the training phonemes, LEN(S_i) is the length of phoneme S_i, and m_k is the average of the lengths LEN(S_i) of the phonemes belonging to the k-th group.

B) Repeat step A) for every attribute and select the attribute whose classification yields the smallest average error.

C) Reclassify the sets produced by the attribute selected in step B) according to the remaining attributes, excluding the selected one, and compute the average error of the resulting groups.

D) Repeat steps B) and C) until no attributes remain to be applied or the average error of the classified groups falls below a predetermined value. This yields a tree-structured phoneme length decision tree.

E) Assign the average phoneme length of each group to the corresponding terminal node of the phoneme length decision tree obtained in step D).
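Steps A) to E) amount to greedy, top-down induction of a regression tree with multiway splits on categorical attribute values. The Python sketch below follows that outline. It makes the following assumptions, none of which come from the method as described:

- **Average-error equation.** The equation is not reproduced in this text, so the sketch uses the mean absolute deviation of each LEN(S_i) from its group mean m_k. A squared-error variant would be equally plausible.
- **Candidate attributes.** Attributes that take only one value within a group are skipped, because they cannot split it.
- **Stopping threshold.** The threshold is checked per group.
- **Names and layout.** The data layout and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Hashable, Iterable, List, Optional, Sequence

Phoneme = Dict[str, Hashable]  # hypothetical layout: attribute name -> value


@dataclass
class Node:
    mean_length: float               # average length of this group (step E)
    attribute: Optional[str] = None  # attribute tested here; None at a leaf
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def split_by(idx: List[int], phonemes: Sequence[Phoneme], attr: str) -> Dict[Hashable, List[int]]:
    """Partition phoneme indices by their value of `attr`."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i in idx:
        groups[phonemes[i][attr]].append(i)
    return dict(groups)


def average_error(groups: Iterable[List[int]], lengths: Sequence[float]) -> float:
    """ASSUMPTION: mean absolute deviation of LEN(S_i) from its group mean m_k."""
    total, n = 0.0, 0
    for members in groups:
        m_k = sum(lengths[i] for i in members) / len(members)
        total += sum(abs(lengths[i] - m_k) for i in members)
        n += len(members)
    return total / n


def learn_tree(phonemes: Sequence[Phoneme], lengths: Sequence[float],
               attributes: List[str], threshold: float,
               idx: Optional[List[int]] = None) -> Node:
    if idx is None:
        idx = list(range(len(phonemes)))
    node = Node(mean_length=sum(lengths[i] for i in idx) / len(idx))
    # Step D stopping rule: no attributes left, or the group is accurate enough.
    if not attributes or average_error([idx], lengths) < threshold:
        return node
    # ASSUMPTION: skip attributes with a single value in this group (no split).
    candidates = [a for a in attributes if len(split_by(idx, phonemes, a)) > 1]
    if not candidates:
        return node
    # Steps A-B: keep the attribute whose grouping has the smallest average error.
    best = min(candidates,
               key=lambda a: average_error(split_by(idx, phonemes, a).values(), lengths))
    node.attribute = best
    remaining = [a for a in attributes if a != best]
    # Step C: reclassify each group with the remaining attributes, recursively.
    for value, members in split_by(idx, phonemes, best).items():
        node.children[value] = learn_tree(phonemes, lengths, remaining, threshold, members)
    return node


# Toy data invented for illustration; lengths in ms.
phonemes = [{"type": "V", "stress": 0}, {"type": "V", "stress": 1},
            {"type": "C", "stress": 0}, {"type": "C", "stress": 1}]
lengths = [30.0, 40.0, 80.0, 90.0]
tree = learn_tree(phonemes, lengths, ["type", "stress"], threshold=1.0)
```

With threshold 1.0, the tree splits first on "type", because that attribute separates short from long phonemes, and then on "stress." Raising the threshold to 6.0 turns the "V" branch into a terminal node whose stored length is its group mean, 35.0.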

FIG. 4 shows an example of a phoneme length decision tree produced by the method of FIG. 3.

FIG. 5 shows the phoneme length generation process of FIG. 2. When a phoneme is input, its syntactic attributes are analyzed. Using the analyzed attributes, the phoneme length actually applied is determined by referring to a phoneme length decision tree such as the one shown in FIG. 4.
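Looking up a length at synthesis time is a walk from the root of the decision tree to a terminal node, driven by the analyzed attributes. The sketch below makes these assumptions:

- **Node layout.** Each internal node stores the attribute it tests, and every node stores the mean length of its training group.
- **Toy tree.** The tree's values are invented for illustration and are not those of FIG. 4.
- **Fallback.** For an attribute value never seen in training, the lookup falls back to the current node's mean. This is an added assumption, not part of the described method.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class Node:
    mean_length: float               # mean length of the training group at this node
    attribute: Optional[str] = None  # attribute tested here; None at a terminal node
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def phoneme_length(tree: Node, attrs: Dict[str, Hashable]) -> float:
    """Walk from the root to a terminal node using the analyzed attributes."""
    node = tree
    while node.attribute is not None:
        child = node.children.get(attrs.get(node.attribute))
        if child is None:
            # ASSUMPTION: unseen or missing value -> stop and use this node's mean
            break
        node = child
    return node.mean_length


# Toy tree with invented values (ms), not the tree of FIG. 4.
toy = Node(60.0, "type", {
    "V": Node(35.0, "stress", {0: Node(30.0), 1: Node(40.0)}),
    "C": Node(85.0),
})
print(phoneme_length(toy, {"type": "V", "stress": 1}))  # 40.0
```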

As described above, the phoneme length determination method according to the present invention predicts phoneme length while taking into account the interactions among various syntactic attributes. It determines syllable length statistically and can therefore generate natural synthesized speech.

FIGS. 1A and 1B show phoneme length determination methods according to the prior art.

FIG. 2 is a schematic diagram showing the concept of the speech synthesis method according to the present invention.

FIG. 3 is a flowchart showing the process of generating a phoneme length decision tree.

FIG. 4 shows an example of a phoneme length decision tree generated by the method shown in FIG. 3.

FIG. 5 is a schematic diagram showing the phoneme length determination process according to the present invention.

Claims

1. A method of determining the length of phonemes, which are the units of speech synthesis, using syntactic attributes, comprising:

(a) analyzing the syntactic attributes assigned to each of the training phonemes;

(b) classifying the phonemes according to the attributes and learning a characteristic phoneme length; and

(c) determining the length of a phoneme to be synthesized by referring to the phoneme lengths determined in step (b).

2. The method of claim 1, wherein step (b) comprises:

(d) classifying the phonemes by an arbitrary attribute and computing the average error of the resulting groups;

(e) repeating step (d) for every attribute and selecting the attribute whose classification yields the smallest average error;

(f) reclassifying the sets produced by the attribute selected in step (e) according to the remaining attributes, excluding the selected one, and computing the average error of the resulting groups;

(g) repeating steps (e) and (f) until no attributes remain to be applied or the average error of the classified groups falls below a predetermined value, thereby obtaining a tree-structured phoneme length decision tree; and

(h) assigning the average phoneme length of each group to the corresponding terminal node of the phoneme length decision tree obtained in step (g).

3. The method of claim 2, wherein the average error is determined by the following equation:

where S_i (i = 0, 1, ..., N) are the training phonemes, LEN(S_i) is the length of phoneme S_i, and m_k is the average of the lengths LEN(S_i) of the phonemes belonging to the k-th group.

4. A method of learning a phoneme length decision tree, which is used to determine the length of phonemes as units of speech synthesis by classifying the phonemes constituting a plurality of sentences according to their syntactic characteristics, comprising:

(o) classifying the phonemes by an arbitrary attribute and computing the average error of the resulting groups;

(p) repeating step (o) for every attribute and selecting the attribute whose classification yields the smallest average error;

(q) reclassifying the sets produced by the attribute selected in step (p) according to the remaining attributes, excluding the selected one, and computing the average error of the resulting groups;

(r) repeating steps (p) and (q) until no attributes remain to be applied or the average error of the classified groups falls below a predetermined value, thereby obtaining a tree-structured phoneme length decision tree; and

(s) assigning the average phoneme length of each group to the corresponding terminal node of the phoneme length decision tree obtained in step (r).

5. The method of claim 4, wherein the average error is determined by the following equation:

where S_i (i = 0, 1, ..., N) are the training phonemes, LEN(S_i) is the length of phoneme S_i, and m_k is the average of the lengths LEN(S_i) of the phonemes belonging to the k-th group.