KR19980031888A

KR19980031888A - Korean prosody generating device and method

Info

Publication number: KR19980031888A
Application number: KR1019960051453A
Authority: KR
Inventors: 김정수
Original assignee: 김광호; 삼성전자 주식회사
Priority date: 1996-10-31
Filing date: 1996-10-31
Publication date: 1998-07-25
Also published as: KR100387232B1

Abstract

Disclosed are an apparatus and a method for generating Korean prosody that improve the naturalness and intelligibility of synthesized speech. The apparatus comprises: a context phoneme length storage unit that stores contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, together with the actual phoneme length for each context; a context silence length storage unit that stores contexts consisting of the preceding and following phonemes, together with the actual silence length and the probability that silence occurs for each context; a context phoneme amplitude storage unit that stores contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, together with the actual phoneme amplitude for each context; a phoneme length storage unit that stores average phoneme lengths; a phoneme amplitude storage unit that stores average phoneme amplitudes; a phrase-break unit that finds the break type using the syntactic depth difference and energy; a phoneme length generation unit that generates the actual length of each phoneme and inter-phoneme silence symbol in an utterance phrase, searching the context silence length storage unit if the phoneme is silence and otherwise the context phoneme length storage unit and the phoneme length storage unit; a phoneme amplitude generation unit that generates the actual amplitude of each phoneme in an utterance phrase by searching the context phoneme amplitude storage unit and the phoneme amplitude storage unit; and a pitch pattern generation unit that generates the pitch pattern of the utterance phrase.

According to the present invention, the naturalness and intelligibility of synthesized speech are improved.

Description

Korean prosody generating device and method

The present invention proposes a method for improving the naturalness and intelligibility of synthesized speech by generating Korean prosody from a dependency tree (syntax tree), which is the result of language processing, and from statistical prosody data extracted from sentences spoken by humans.

In general, prosody comprises intonation (pitch), rhythm, stress (accent), and the like, and refers to the properties of speech that convey meaning, emphasis, emotion, and so on without changing the inherent characteristics of the phonemes. Speech with no prosody, or with only simple prosody, not only conveys meaning poorly but is also monotonous and tedious, and soon becomes unpleasant to listen to. A TTS (Text to Speech) system, which converts text into speech, consists of three stages: text analysis, prosody generation, and waveform synthesis. Prosody generation generally produces information on the utterance phrases (prosodic phrases), phoneme lengths (segmental durations), phoneme amplitudes (segmental amplitudes), and pitch pattern for an input sentence. Among these, utterance phrase generation, that is, phrase breaking, governs the overall impression of the synthesized speech, yet research to date has relied on highly simplified methods or on the results of morphological analysis.

When a person reads a text aloud, however, phrase breaks are made by grasping the sentence structure and taking into account both the degree of semantic cohesion and the breath the reader has inhaled. In deciding break positions in text for speech synthesis, conventional methods have instead used the probability that a pause exists between words (eojeol) and the probability distribution of the number of words, rather than the sentence structure and the speaker's breath (energy). Such methods differ considerably from the way a person breaks a sentence according to semantic cohesion, and they therefore greatly reduce the naturalness of the synthesized speech.

In addition, the pitch contour is generally regarded as the main factor constituting the prosody of speech. In actual Korean, however, the pitch contour of the utterance phrase, the phoneme lengths, the phoneme amplitudes, and the like are all main factors of prosody. If the prosody does not reflect these various factors, the result is monotonous, machine-like synthesized speech.

Furthermore, in speech synthesis the main factors of prosody, such as pitch, phoneme length, and phoneme amplitude, are closely related to the results of language processing such as morphemes and the syntax tree. Conventional speech synthesis techniques use the number of syllables in a sentence or phrase rather than this linguistic information, which likewise yields synthesized speech of low naturalness and intelligibility.

The present invention was devised to solve the problems described above, and an object of the invention is to provide a phrase-break method that improves the naturalness of synthesized speech in speech synthesis.

Another object of the present invention is to provide a phoneme length generation method that improves the naturalness and intelligibility of synthesized speech in speech synthesis.

Still another object of the present invention is to provide a phoneme amplitude generation method that improves the naturalness and intelligibility of synthesized speech in speech synthesis.

Still another object of the present invention is to provide a pitch pattern generation method that improves the naturalness and intelligibility of synthesized speech in speech synthesis.

Still another object of the present invention is to provide a Korean prosody generating apparatus and method that improve the naturalness and intelligibility of synthesized speech.

FIG. 1 illustrates the speech synthesis process of a general speech synthesis apparatus (TTS).

FIG. 2 is a block diagram of the Korean prosody generating apparatus according to the present invention.

FIG. 3 is a block diagram of the pitch pattern generation unit.

FIG. 4 illustrates a syntax tree, which is the input form of the present invention, together with syntactic depths and syntactic depth differences.

FIG. 5 is a flowchart of the operation of the phrase-break unit.

FIG. 6 is a flowchart of the operation of the control information generation unit.

FIG. 7 is a flowchart of the operation of the phoneme length generation unit.

FIG. 8 is a flowchart of the operation of the phoneme amplitude generation unit.

FIG. 9 is a flowchart of the operation of the pitch pattern generation unit.

To achieve the above object, a phrase-break method for improving the naturalness of synthesized speech in speech synthesis preferably comprises: an energy initialization step corresponding to a person taking a breath; an energy use step of using the energy needed to pronounce the current word (eojeol); a step of calculating the syntactic depth difference between the following word and the current word and setting the syntactic type; a step of, if the energy is exhausted, raising the break type, which is one of a predetermined set of break types defined according to the distribution of pause lengths following utterance phrases extracted from a speech corpus, to the type whose pause is one level longer, and recharging the energy; and a step of repeating the above steps up to the last word and, at the last word, setting the break type to the type with the longest pause.

To achieve the other object, a phoneme length generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis preferably comprises: a step of creating and providing a context phoneme length table composed of predetermined contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme length for each context, a context silence length table composed of contexts consisting of the preceding and following phonemes, the actual silence length for each context, and the probability that the silence occurs, and a phoneme length table composed of the average length of each phoneme according to its position in the sentence; a step of determining the context centered on the current phoneme; a step of checking whether the current phoneme is silence and, if it is, searching the context silence length table to set the silence length, and otherwise searching the context phoneme length table to set the phoneme length; a step of adjusting the phoneme length according to predetermined control information; and a step of repeating the above steps up to the last phoneme in the utterance phrase.

To achieve still another object, a phoneme amplitude generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis preferably comprises: a step of providing a context phoneme amplitude table that stores contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, together with the actual phoneme amplitude for each context, and a phoneme amplitude table that stores the amplitude of each phoneme according to its position in the sentence; a step of determining the context centered on the current phoneme; a step of searching the context phoneme amplitude table to set the phoneme amplitude value and, if that search does not succeed, searching the phoneme amplitude table to set the phoneme amplitude value; a step of adjusting the phoneme amplitude according to predetermined control information; and a step of repeating the above steps up to the last phoneme in the utterance phrase.

To achieve still another object, a pitch pattern generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis preferably comprises: a step of setting predetermined pitch control points within each phoneme; a step of generating the overall contour once the pitch control points have been set; a step of generating the head pattern of the utterance phrase, which is determined by the part of speech or the vowel of the word that begins the utterance phrase; a step of generating the tail pattern, which is determined by the break type of the utterance phrase and the part of speech of its final word; a step of generating the word patterns, which are determined by the type of each word in the utterance phrase and the current pitch value; a step of generating the accent patterns, which are determined by whether a word carries an accent; and a step of repeating the above steps, utterance phrase by utterance phrase, up to the last utterance phrase.

To achieve still another object, a Korean prosody generating apparatus that improves the naturalness and intelligibility of synthesized speech preferably comprises: a context phoneme length storage unit storing contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme length for each context; a context silence length storage unit storing contexts consisting of the preceding and following phonemes, together with the actual silence length and the silence occurrence probability data for each context; a context phoneme amplitude storage unit storing contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme amplitude for each context; a phoneme length storage unit storing the average length of each phoneme according to its position in the sentence; a phoneme amplitude storage unit storing the amplitude of each phoneme according to its position in the sentence; a phrase-break unit that, taking the syntactic depth to be the distance from the root node of the syntax tree to each node and the energy to be the lung capacity a person needs in order to speak, calculates the syntactic depth differences in the input sentence and uses the syntactic depth differences and the energy to find the predetermined break types; a phoneme length generation unit that, once the break types forming a sentence have been determined by the phrase-break unit, generates the actual length of each phoneme and inter-phoneme silence symbol in an utterance phrase by searching the context silence length storage unit if the phoneme is silence, and otherwise the context phoneme length storage unit and the phoneme length storage unit; a phoneme amplitude generation unit that generates the actual amplitude of each phoneme in an utterance phrase by searching the context phoneme amplitude storage unit and the phoneme amplitude storage unit; and a pitch pattern generation unit that generates the pitch pattern of an utterance phrase, consisting of an overall contour, a head pattern, a tail pattern, word patterns, and accent patterns, by assigning actual pitch values to the control points of each phoneme in the utterance phrase.

The predetermined break types of the phrase-break unit are divided according to the distribution of pause lengths following utterance phrases extracted from a speech corpus: break type 0 is an utterance phrase that is not followed by a pause but shows a marked pitch change; break type 1 is an utterance phrase followed by a pause of about 150 msec, occurring where the syntactic depth difference is small; break type 2 is an utterance phrase followed by a pause of about 150-400 msec, occurring where the syntactic depth difference is large, the pause length being determined by the number of phonemes in the utterance phrase; and break type 3 is the last utterance phrase in a sentence.

The Korean prosody generating apparatus further comprises a control information generation unit that regroups the pronunciation symbols produced for each word into utterance phrases according to the break types generated by the phrase-break unit, and that generates control information for voiced/unvoiced determination, insertion of silence symbols before and after stop (closure) sounds, and length and pitch control. The phoneme length generation unit and the phoneme amplitude generation unit generate the phoneme lengths and phoneme amplitudes and then adjust them according to the control information generated by the control information generation unit.

To achieve still another object, a Korean prosody generating method that improves the naturalness and intelligibility of synthesized speech is preferably carried out where the following are provided: a context phoneme length table storing contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme length for each context; a context silence length table storing contexts consisting of the preceding and following phonemes, together with the actual silence length and the silence occurrence probability data for each context; a context phoneme amplitude table storing contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme amplitude for each context; a phoneme length table storing the average length of each phoneme according to its position in the sentence; and a phoneme amplitude table storing the amplitude of each phoneme according to its position in the sentence. The method comprises: a phrase-break step of, taking the syntactic depth to be the distance from the root node of the syntax tree to each node and the energy to be the lung capacity a person needs in order to speak, calculating the syntactic depth differences in the input sentence and using the syntactic depth differences and the energy to find the predetermined break types; a phoneme length step of generating the actual length of each phoneme and inter-phoneme silence symbol in an utterance phrase by searching the context silence length table if the phoneme is silence, and otherwise the context phoneme length table and the phoneme length table; a phoneme amplitude step of generating the actual amplitude of each phoneme in an utterance phrase by searching the context phoneme amplitude table and the phoneme amplitude table; and a pitch pattern step of generating the pitch pattern of an utterance phrase, consisting of an overall contour, a head pattern, a tail pattern, word patterns, and accent patterns, by assigning actual pitch values to the control points of each phoneme in the utterance phrase.

The method further comprises, between the phrase-break step and the phoneme length step: a step of regrouping the pronunciation symbols produced for each word into utterance phrases according to the break types generated by the phrase-break unit; and a step of generating control information for voiced/unvoiced determination, insertion of silence symbols before and after stop (closure) sounds, and length and pitch control. The phoneme length step and the phoneme amplitude step each further comprise adjusting the generated phoneme length or phoneme amplitude according to the generated control information.

The present invention is described in detail below with reference to the accompanying drawings. FIG. 1 illustrates the speech synthesis process of a general speech synthesis apparatus (TTS), which consists of three stages: text analysis, prosody generation, and waveform synthesis. Text processing generally passes through stages such as preprocessing, morphological analysis, and syntactic analysis, and outputs as its result the syntax tree corresponding to the input sentence.

The present invention concerns prosody generation among these stages, and takes a dependency tree, which is one kind of syntax tree, as the input to prosody generation. The dependency tree represents the dependency (modification) relations between the words that make up a sentence as directed binary relations.

FIG. 2 is a block diagram of the Korean prosody generating apparatus according to the present invention, which comprises a phrase-break unit (200), a context phoneme length storage unit (225), a context silence length storage unit (230), a context phoneme amplitude storage unit (245), a phoneme length storage unit (235), a phoneme amplitude storage unit (240), a control information generation unit (205), a phoneme length generation unit (210), a phoneme amplitude generation unit (215), and a pitch pattern generation unit (220).

The phrase-break unit (200) finds the break types in the input sentence. Taking the syntactic depth to be the distance from the root node of the syntax tree to each node, and the energy to be the lung capacity a person needs in order to speak, the unit uses the syntactic depth difference between two nodes and the energy value to find break types that produce human-like phrase breaks. FIG. 4 illustrates a syntax tree, which is the input form of the present invention, together with syntactic depths and syntactic depth differences. The syntax tree is the result of syntactic analysis based on dependency grammar. The syntactic depth of the root node is 0, and the syntactic depth of any other node is the length of the path down to it from the root. The syntactic depth difference between two adjacent words is the difference between the syntactic depth of the following word and that of the preceding word.
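The depth definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dependency tree is encoded as a hypothetical `heads` list (index of the word each word depends on, with -1 for the root), the tree is assumed to be well formed, and the difference is taken as following depth minus preceding depth, a sign convention the text does not spell out.

```python
from typing import List


def syntactic_depths(heads: List[int]) -> List[int]:
    """Depth of each word: number of dependency links up to the root (root = 0).

    heads[i] is the index of the word that word i depends on; -1 marks the root.
    This encoding is an assumption made for the sketch, not taken from the patent.
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:  # walk up until the root is reached
            node = heads[node]
            depth += 1
        depths.append(depth)
    return depths


def depth_differences(depths: List[int]) -> List[int]:
    """Difference between each following word and the word before it.

    The sign (following minus preceding) is an assumption.
    """
    return [depths[i + 1] - depths[i] for i in range(len(depths) - 1)]


# Toy four-word sentence: word 0 depends on word 1; words 1 and 2 depend on word 3.
heads = [1, 3, 3, -1]
depths = syntactic_depths(heads)
diffs = depth_differences(depths)
```

In this toy example the last word is the root, which matches the head-final tendency of Korean dependency structures, but nothing in the sketch relies on that.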

A phrase break, that is, an utterance phrase, is the unit at which a person breaks when reading, and it can be classified in many ways according to phonetic, syntactic, and semantic factors. An utterance phrase is generally followed by a pause. According to the distribution of pause lengths following utterance phrases extracted from a speech corpus, utterance phrase (break) types can be divided into four, as shown in Table 1.

Break type 0 is an utterance phrase that is not followed by a pause but shows a marked pitch change; it occurs mainly between the nouns that make up a compound noun. Break type 1 is an utterance phrase followed by a pause of about 150 msec and occurs where the syntactic depth difference is small. Break type 2 is an utterance phrase followed by a pause of about 150-400 msec and occurs where the syntactic depth difference is large; the length of the pause is determined by the number of phonemes in the utterance phrase. Break type 3 is the last utterance phrase in a sentence and is characterized by a sharp fall in pitch.

Table 1. Phrase-break types

Break type | Pause duration | Meaning
0 | 50 msec | Break between the nouns of a compound noun
1 | 150 msec | Break with a small syntactic depth difference
2 | 150-400 msec | Break with a large syntactic depth difference
3 | 700 msec | Break between sentences

Once the phrase-break unit (200) has determined the utterance phrase type of each word in a sentence, the control information generation unit (205) regroups the pronunciation symbols produced for each word into utterance phrases according to the break types, and in doing so determines voiced and unvoiced sounds, inserts silence symbols before and after stop (closure) sounds, and generates control information for length and pitch control.

The context phoneme length storage unit (225) stores a context phoneme length table, which is composed of contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme length for each context.

The context silence length storage unit (230) stores a context silence length table, which is composed of contexts consisting of the preceding and following phonemes, the actual silence length for each context, and the probability that silence occurs.

The phoneme length storage unit (235) stores a phoneme length table, which is composed of the average length of each phoneme according to its position in the sentence.

The phoneme length generation unit (210) generates the actual length of each phoneme and inter-phoneme silence symbol in an utterance phrase by referring to the context phoneme length table of the context phoneme length storage unit (225), the context silence length table of the context silence length storage unit (230), and the phoneme length table of the phoneme length storage unit (235).
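The three-table lookup can be sketched as follows. This is a minimal sketch, not the patent's implementation: the key layouts, the silence symbol `#`, the position labels, the `default_ms` fallback, and all numeric values are hypothetical. The order of the fallback (context table first, then the average table) is assumed by analogy with the phoneme amplitude method, which states that order explicitly.

```python
from typing import Dict, Tuple

SIL = "#"  # hypothetical silence symbol

ContextKey = Tuple[str, str, str, str]  # (preceding, current, following, position)


def phoneme_length(
    prev: str,
    cur: str,
    nxt: str,
    pos: str,
    context_len: Dict[ContextKey, float],
    silence_len: Dict[Tuple[str, str], Tuple[float, float]],
    average_len: Dict[Tuple[str, str], float],
    default_ms: float = 0.0,
) -> float:
    """Return a length in msec for the current symbol.

    Silence is looked up by (preceding, following) context; any other phoneme
    is looked up by its full context, falling back to the average table
    (assumed fallback order).
    """
    if cur == SIL:
        entry = silence_len.get((prev, nxt))  # entry = (length, probability)
        return entry[0] if entry is not None else default_ms
    key = (prev, cur, nxt, pos)
    if key in context_len:
        return context_len[key]
    return average_len.get((cur, pos), default_ms)


# Illustrative values only; they are not taken from the patent.
context_len = {("k", "a", "n", "initial"): 85.0}
silence_len = {("p", "t"): (40.0, 0.8)}
average_len = {("a", "initial"): 70.0}
```

A context that was never observed in the corpus, such as ("m", "a", "n", "initial") here, falls through to the position-dependent average, so every phoneme still receives a length.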

The context phoneme amplitude storage unit (245) stores a context phoneme amplitude table, which is composed of contexts, each consisting of the current phoneme, the preceding phoneme, the following phoneme, and the position in the sentence, and the actual phoneme amplitude for each context.

The phoneme amplitude storage unit (240) stores a phoneme amplitude table, which is composed of the average amplitude of each phoneme according to its position in the sentence.

The phoneme amplitude generation unit (215) generates the actual amplitude of each phoneme in an utterance phrase by referring to the context phoneme amplitude table of the context phoneme amplitude storage unit (245) and the phoneme amplitude table of the phoneme amplitude storage unit (240).

The pitch pattern generation unit (220) generates the pitch pattern of an utterance phrase by assigning actual pitch values to the control points of each phoneme in the utterance phrase, and is configured as shown in FIG. 3. The overall contour (300) is the pitch contour of the whole utterance phrase, a pitch pattern whose shape varies with the speaker's sex and the length of the utterance phrase. The head pattern (310) is the pitch pattern at the start of the utterance phrase and is determined by the part of speech or the vowel of the initial word. The tail pattern (320) is the pitch pattern at the end of the utterance phrase and is determined by the break type and the part of speech of the final word. The word pattern (330) is the pitch pattern at the end of each word within the utterance phrase and is determined by the type of the word and the current pitch value. The accent pattern (340) is the pitch pattern of an accented phoneme.

본 발명에 대한 구체적 동작은 다음과 같다. 도 5는 상기 끊어읽기부(200)의 동작을 흐름도로 도시한 것이다. 먼저, 사람이 말을 하기 위해 공기를 들이마시는 것에 해당하는 에너지 초기화를 수행한다.(501단계) 다음으로 현재 어절을 발음하기 위해 필요한 에너지를 사용한다.(502단계) 다음으로 도 4의 예와 같은 방법으로 뒤 어절과 현재어절의 구문깊이를 계산한다.(503단계)Specific operation of the present invention is as follows. 5 is a flowchart illustrating an operation of the interrupt reading unit 200. First, an energy initialization corresponding to a person breathing air to speak is performed (step 501). Next, the energy required to pronounce the current word is used. (Step 502) Next, the example of FIG. In the same way, the syntax depth of the next word and the current word is calculated (step 503).

Next, it is determined whether the syntax depth difference is 2 or more (step 504). If it is 2 or more, the break-reading type is set to 2 (step 505); if it is less than 2, the break-reading type is set to the value of the syntax depth difference (step 506). It is then checked whether the energy is exhausted, that is, whether the energy value is 0 (step 507). If the energy is 0, the break-reading type is increased by 1 (step 508). The energy is then recharged according to the break-reading type, just as a person inhales (step 509). Finally, it is checked whether the current word phrase is the last one (step 510). If it is not, the process returns to the energy use step (502); if it is, the break-reading type is set to 3 (step 511) and break reading ends.
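The loop of steps 501 to 511 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `assign_break_types`, the per-word-phrase energy costs, the initial energy, and the `recharge` amounts per break-reading type are all hypothetical, because the text gives no numeric values for them. The syntax depth differences of step 503 are assumed to be precomputed, non-negative integers.

```python
def assign_break_types(depth_diffs, energy_costs, initial_energy, recharge):
    """Assign a break-reading type to each word phrase (steps 501-511).

    depth_diffs[i]  : syntax depth difference after word phrase i (assumed given).
    energy_costs[i] : hypothetical energy needed to pronounce word phrase i.
    recharge[t]     : hypothetical energy regained after a break of type t.
    """
    energy = initial_energy                          # step 501: initial breath
    break_types = []
    last_index = len(depth_diffs) - 1
    for i, (diff, cost) in enumerate(zip(depth_diffs, energy_costs)):
        energy = max(0, energy - cost)               # step 502: spend energy
        break_type = 2 if diff >= 2 else diff        # steps 504-506
        if energy == 0:                              # step 507: out of breath?
            break_type += 1                          # step 508
        # step 509: recharge according to the break-reading type
        energy = min(initial_energy, energy + recharge[break_type])
        if i == last_index:                          # step 510
            break_type = 3                           # step 511
        break_types.append(break_type)
    return break_types


if __name__ == "__main__":
    recharge = {0: 0, 1: 3, 2: 6, 3: 10}             # assumed values
    print(assign_break_types([0, 1, 3, 0], [4, 4, 4, 4], 10, recharge))
```

With these assumed numbers, a word phrase that drains the energy to zero has its type raised by one even when the syntax depth difference alone would give no break.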

FIG. 6 is a flowchart of the operation of the control information generation unit (205). Once the break-reading unit (200) has determined the break-reading type of each word phrase in a sentence, the phonetic-symbol concatenation step joins the phonetic symbols of the word phrases, utterance phrase by utterance phrase, to build each utterance phrase (step 601). An utterance phrase here is a unit whose break-reading type is 1, 2, or 3. Next, those voiceless consonants in the utterance phrase that can be voiced are voiced (step 602). The voicing rule is given in Table 2.

Next, with reference to the context silence length table in the context silence length storage unit (230), a silence symbol is inserted between two phonemes if the probability of silence between them is high (step 603). Control information is then extracted while the non-phonemic symbols are deleted from the phonetic symbols of the utterance phrase (step 604). The control information includes the long-sound symbol, the accent symbol, the English letter-by-letter symbol, the digit break-reading symbol, the word-phrase end symbol, and the sentence end symbol. Finally, it is checked whether the current utterance phrase is the last one (step 605). If it is not, the process returns to the phonetic-symbol concatenation step (601); if it is, control information generation ends.
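Steps 601, 603, and 604 can be illustrated with the following Python sketch. Everything named here is an assumption made for illustration: the `<sil>` symbol, the 0.5 probability threshold (the text only says the probability must be "high"), the representation of control information as `(index, symbol)` pairs, and the reading that a word phrase with break-reading type 1, 2, or 3 closes the current utterance phrase.

```python
SILENCE = "<sil>"  # hypothetical silence symbol


def build_utterance_phrases(word_phrases, break_types):
    """Step 601: join word phrases; a type 1, 2, or 3 closes the phrase."""
    phrases, current = [], []
    for symbols, break_type in zip(word_phrases, break_types):
        current.extend(symbols)
        if break_type in (1, 2, 3):
            phrases.append(current)
            current = []
    if current:  # trailing type-0 word phrases, kept as a final phrase
        phrases.append(current)
    return phrases


def insert_silences(symbols, silence_prob, threshold=0.5):
    """Step 603: insert a silence symbol where the table probability is high.

    silence_prob maps (preceding, following) pairs to a probability.
    """
    out = []
    for i, sym in enumerate(symbols):
        out.append(sym)
        if i + 1 < len(symbols):
            if silence_prob.get((sym, symbols[i + 1]), 0.0) > threshold:
                out.append(SILENCE)
    return out


def split_control_info(symbols, control_symbols):
    """Step 604: drop non-phonemic symbols and keep them as control info.

    Each control entry is (index of the preceding phoneme, symbol).
    """
    phonemes, controls = [], []
    for sym in symbols:
        if sym in control_symbols:
            controls.append((len(phonemes) - 1, sym))
        else:
            phonemes.append(sym)
    return phonemes, controls
```

Keeping the index of the preceding phoneme with each control symbol is one simple way to let the later length and size steps find the phoneme a symbol applies to.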

Table 2. Voicing rules
Type of preceding phoneme | Current phoneme | Type of following phoneme
Voiced | Voiceless: /ㄱ/, /ㄴ/, /ㄷ/, /ㅈ/, /ㅎ/ | Voiced
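The voicing rule of Table 2 amounts to a three-phoneme window check, sketched below in Python. The function name `apply_voicing` and the caller-supplied predicate `is_voiced` are hypothetical; the consonant set is copied as listed in the table, and the output format (a list of phoneme and voiced-flag pairs) is an assumption.

```python
# Consonants that Table 2 lists as candidates for voicing (copied as given).
VOICEABLE = {"ㄱ", "ㄴ", "ㄷ", "ㅈ", "ㅎ"}


def apply_voicing(phonemes, is_voiced):
    """Mark a listed consonant as voiced when both neighbors are voiced.

    is_voiced: hypothetical predicate giving each phoneme's inherent voicing.
    Returns a list of (phoneme, voiced_flag) pairs.
    """
    result = []
    for i, ph in enumerate(phonemes):
        voiced = is_voiced(ph)
        has_both_neighbors = 0 < i < len(phonemes) - 1
        if ph in VOICEABLE and has_both_neighbors:
            if is_voiced(phonemes[i - 1]) and is_voiced(phonemes[i + 1]):
                voiced = True
        result.append((ph, voiced))
    return result
```

A consonant at the edge of the utterance phrase has only one neighbor, so this sketch leaves it unchanged.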

FIG. 7 is a flowchart of the operation of the phoneme length generation unit (210). Phoneme lengths are generated one phoneme at a time within an utterance phrase. First, the context is determined around the current phoneme (step 701). Next, it is checked whether the current phoneme is a silence (step 702). If it is, the context silence length table in the context silence length storage unit (230) is searched to set the length of the silence (step 703). If it is not, the phoneme is an ordinary phoneme, so the context phoneme length table in the context phoneme length storage unit (225) is searched (step 704). If that search fails, the phoneme length table in the phoneme length storage unit (235) is searched (step 706). Once the length of the phoneme has been set, it is adjusted according to the control information (step 707). The phoneme length adjustment rules are given in Table 3. Finally, it is checked whether the current phoneme is the last one (step 708). If it is not, the process returns to the context determination step (701); if it is, phoneme length generation ends.
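The lookup with fallback in steps 702 to 706 can be sketched as below. The dictionary layouts, the `<sil>` symbol, and the function name `phoneme_length` are illustrative assumptions; the patent only states which table is searched in which case. For brevity the context silence table here maps a context directly to a length and omits the stored probability.

```python
SILENCE = "<sil>"  # hypothetical silence symbol


def phoneme_length(prev, cur, nxt, position, ctx_len, ctx_sil, avg_len):
    """Return the length of one phoneme (steps 702-706).

    ctx_len : {(current, preceding, following, position): length}
    ctx_sil : {(preceding, following): silence length}
    avg_len : {phoneme: average length}, used when the context is unseen
    """
    if cur == SILENCE:                     # step 702
        return ctx_sil[(prev, nxt)]        # step 703
    key = (cur, prev, nxt, position)
    if key in ctx_len:                     # step 704: context table hit
        return ctx_len[key]
    return avg_len[cur]                    # step 706: fall back to average
```

The same two-level lookup, context table first and average table second, also applies to phoneme size generation.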

Table 3. Phoneme length adjustment rules
Control information | Rule
Sentence end, word-phrase end, digit break reading | Large increase
English letter-by-letter reading | Medium increase
Break-reading type 0, title | Small increase
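The adjustment of step 707 could look like the following sketch. The table gives only qualitative amounts, so the three multipliers are placeholder values chosen for illustration, the control-information keys are hypothetical names, and the choice to apply the largest matching increase when several rules match is likewise an assumption.

```python
# Placeholder multipliers; the source gives only "large/medium/small".
LENGTH_FACTORS = {"large": 1.5, "medium": 1.25, "small": 1.125}

# Hypothetical control-information keys mapped to the three rule classes.
RULES = {
    "sentence_end": "large",
    "word_phrase_end": "large",
    "digit_break": "large",
    "english_letter": "medium",
    "break_type_0": "small",
    "title": "small",
}


def adjust_length(length, controls):
    """Scale a phoneme length by the largest applicable rule (step 707)."""
    factor = max(
        (LENGTH_FACTORS[RULES[c]] for c in controls if c in RULES),
        default=1.0,  # no matching control information: leave unchanged
    )
    return length * factor
```

For example, `adjust_length(100, ["title", "sentence_end"])` applies only the large increase under this assumption.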

FIG. 8 is a flowchart of the operation of the phoneme size generation unit (215). Phoneme sizes are generated one phoneme at a time within an utterance phrase. First, the context is determined around the current phoneme (step 801). Next, the context phoneme size table in the context phoneme size storage unit (245) is searched (step 802). If the search fails (step 803), the phoneme size table in the phoneme size storage unit (240) is searched (step 804). Once the size of the phoneme has been set, it is adjusted according to the control information (step 805). Finally, it is checked whether the current phoneme is the last one (step 806). If it is not, the process returns to the context determination step (801); if it is, phoneme size generation ends.

FIG. 9 is a flowchart of the operation of the pitch pattern generation unit (220). To represent a pitch pattern, the present invention sets n pitch control points within each phoneme (step 901) and assigns an actual pitch value to each control point. Once the pitch control points are set, the overall contour is generated first (step 902). The slope of the overall contour depends on the speaker's sex and the number of phonemes in the utterance phrase; the rule for a female speaker is given in Table 4. Next, the head pattern of the utterance phrase is generated (step 903). The head pattern is determined by the part of speech or the vowel of the first word of the utterance phrase. Next, the tail pattern of the utterance phrase is generated (step 904). The tail pattern is determined by the break-reading type of the utterance phrase and the part of speech of its last word. Next, the word-phrase patterns of the utterance phrase are generated (step 905). A word-phrase pattern is determined by the type of each word phrase in the utterance phrase and the current pitch value. Next, the accent pattern is generated (step 906). The accent pattern is determined by whether a word carries an accent. Finally, it is checked whether the current utterance phrase is the last one (step 907). If it is not, the process returns to the pitch control point setting step (901); if it is, pitch pattern generation ends.

Table 4. Overall contour generation rules
Number of phonemes in the utterance phrase (Nph) | Start pitch | End pitch
Nph 8 | reference pitch | reference pitch
9 ≤ Nph 15 | reference pitch + 5 | reference pitch - 10
16 ≤ Nph 20 | reference pitch + 10 | reference pitch - 20
21 ≤ Nph 30 | reference pitch + 15 | reference pitch - 30
Nph ≥ 30 | reference pitch + 20 | reference pitch - 40
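A possible reading of the overall contour rule is sketched below. Several points are assumptions rather than statements from the source: the comparison operators missing from the upper bounds are treated as inclusive (so 8, 15, 20, and 30 belong to the row that names them, and the last row applies above 30), the contour is interpolated linearly between the start and end pitch, and the names `contour_endpoints`, `overall_contour`, and `points_per_phoneme` are hypothetical.

```python
def contour_endpoints(nph, reference_pitch):
    """Return (start_pitch, end_pitch) for an utterance of nph phonemes.

    Assumption: each row's upper bound is inclusive.
    """
    if nph <= 8:
        up, down = 0, 0
    elif nph <= 15:
        up, down = 5, 10
    elif nph <= 20:
        up, down = 10, 20
    elif nph <= 30:
        up, down = 15, 30
    else:
        up, down = 20, 40
    return reference_pitch + up, reference_pitch - down


def overall_contour(nph, reference_pitch, points_per_phoneme):
    """Assign a pitch to every control point by linear interpolation."""
    start, end = contour_endpoints(nph, reference_pitch)
    total = nph * points_per_phoneme
    if total == 1:
        return [float(start)]
    step = (end - start) / (total - 1)   # constant slope (assumed)
    return [start + step * i for i in range(total)]
```

Longer utterance phrases start higher and end lower, so the declination spans a wider pitch range as the phoneme count grows.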

As described above, the present invention improves the naturalness of synthesized speech by using the structure of the sentence (the syntax tree) and energy (human breath) when performing break reading.

In addition, by treating several factors, such as the pitch pattern of the utterance phrase and the length and size of each phoneme, as the main components of prosody, the invention improves the naturalness and intelligibility of synthesized speech.

Furthermore, by using linguistic information, statistical data extracted from human speech, and rules to set the values of the main prosodic factors such as pitch, length, and size, the invention improves the naturalness and intelligibility of synthesized speech.

Claims

A break-reading method for improving the naturalness of synthesized speech in speech synthesis, comprising:

an energy initialization step corresponding to a person inhaling;

an energy use step of consuming the energy required to pronounce the current word phrase;

a step of setting a break-reading type by calculating the syntax depth difference between the following word phrase and the current word phrase;

a step of, when the energy is exhausted, raising the break-reading type, which is one of predetermined types defined according to the distribution of pause lengths following utterance phrases extracted from a speech corpus, to the type whose pause is one level longer, and recharging the energy; and

a step of repeating the above steps up to the last word phrase and, at the last word phrase, setting the break-reading type to the type having the longest pause.

A phoneme length generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis, comprising:

a step of preparing a context phoneme length table consisting of predetermined contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, together with the actual phoneme length for each context; a context silence length table consisting of contexts, each composed of a preceding phoneme and a following phoneme, together with the actual length of the silence for each context and the probability that the silence occurs; and a phoneme length table consisting of the average length of each phoneme according to its position in the sentence;

a step of determining a context around the current phoneme;

a step of checking whether the current phoneme is a silence and, if it is, searching the context silence length table to set the length of the silence and, if it is not, searching the context phoneme length table to set the phoneme length;

a step of adjusting the phoneme length according to predetermined control information; and

a step of repeating the above steps up to the last phoneme in the utterance phrase.

A phoneme size generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis, comprising:

a step of providing a context phoneme size table that stores contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, together with the actual phoneme size for each context, and a phoneme size table that stores the size of each phoneme according to its position in the sentence;

a step of determining a context around the current phoneme;

a step of searching the context phoneme size table to set a phoneme size value and, if the search fails, searching the phoneme size table to set the phoneme size value;

a step of adjusting the phoneme size according to predetermined control information; and

a step of repeating the above steps up to the last phoneme in the utterance phrase.

A pitch pattern generation method for improving the naturalness and intelligibility of synthesized speech in speech synthesis, comprising:

a step of setting a predetermined number of pitch control points within each phoneme;

a step of generating an overall contour once the pitch control points are set;

a step of generating a head pattern of the utterance phrase, determined by the part of speech or the vowel of the first word of the utterance phrase;

a step of generating a tail pattern, determined by the break-reading type of the utterance phrase and the part of speech of its last word;

a step of generating a word-phrase pattern, determined by the type of each word phrase in the utterance phrase and the current pitch value;

a step of generating an accent pattern, determined by whether a word carries an accent; and

a step of repeating the above steps for each utterance phrase up to the last utterance phrase.

A Korean prosody generation apparatus for improving the naturalness and intelligibility of synthesized speech, comprising:

a context phoneme length storage unit that stores contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, and the actual length of the phoneme for each context;

a context silence length storage unit that stores contexts, each composed of a preceding phoneme and a following phoneme, the actual length of the silence for each context, and probability data on the occurrence of the silence;

a context phoneme size storage unit that stores contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, and the actual size of the phoneme for each context;

a phoneme length storage unit that stores the average length of each phoneme according to its position in the sentence;

a phoneme size storage unit that stores the size of each phoneme according to its position in the sentence;

a break-reading unit that, where the syntax depth is the distance from the root node of the syntax tree to each node and energy denotes the breath capacity a person needs to speak, calculates the syntax depth difference from the input sentence and determines one of predetermined break-reading types using the syntax depth difference and the energy;

a phoneme length generation unit that, once the break-reading unit has determined the break-reading type of each word phrase in a sentence, generates the actual length of each phoneme and each inter-phoneme silence symbol in the utterance phrase by searching the context silence length storage unit if the phoneme is a silence and, otherwise, by searching the context phoneme length storage unit and the phoneme length storage unit;

a phoneme size generation unit that generates the actual size of each phoneme in the utterance phrase by searching the context phoneme size storage unit and the phoneme size storage unit; and

a pitch pattern generation unit that generates a pitch pattern of the utterance phrase, consisting of an overall contour, a head pattern, a tail pattern, a word-phrase pattern, and an accent pattern, by assigning actual pitch values to the control points of each phoneme in the utterance phrase.

The Korean prosody generation apparatus of claim 5, wherein the predetermined break-reading types of the break-reading unit are

defined according to the distribution of pause lengths following utterance phrases extracted from a speech corpus, such that

break-reading type 0 denotes an utterance phrase that is not followed by a pause but is set off by a large change in pitch,

break-reading type 1 denotes an utterance phrase followed by a pause of 150 msec,

break-reading type 2 denotes an utterance phrase followed by a pause of 150 to 400 msec, which is an utterance phrase with a large syntax depth difference and whose following pause length is determined by the number of phonemes in the utterance phrase, and

break-reading type 3 denotes the last utterance phrase in the sentence.

The Korean prosody generation apparatus of claim 5,

further comprising a control information generation unit that regroups the phonetic symbols, produced word phrase by word phrase, into utterance phrases according to the break-reading types generated by the break-reading unit, determines whether voiceless sounds are to be voiced, inserts silence symbols before and after stop sounds, and generates control information for adjusting length and pitch,

wherein the phoneme length generation unit and the phoneme size generation unit

generate a phoneme length and a phoneme size and then adjust them according to the control information generated by the control information generation unit.

A Korean prosody generation method for improving the naturalness and intelligibility of synthesized speech, which, given

a context phoneme length table that stores contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, and the actual length of the phoneme for each context;

a context silence length table that stores contexts, each composed of a preceding phoneme and a following phoneme, the actual length of the silence for each context, and probability data on the occurrence of the silence;

a context phoneme size table that stores contexts, each composed of a current phoneme, a preceding phoneme, a following phoneme, and a position in the sentence, and the actual size of the phoneme for each context;

a phoneme length table that stores the average length of each phoneme according to its position in the sentence; and

a phoneme size table that stores the size of each phoneme according to its position in the sentence,

comprises:

a break-reading step of, where the syntax depth is the distance from the root node of the syntax tree to each node and energy denotes the breath capacity a person needs to speak, calculating the syntax depth difference from the input sentence and determining one of predetermined break-reading types using the syntax depth difference and the energy;

a phoneme length step of generating the actual length of each phoneme and each inter-phoneme silence symbol in the utterance phrase by searching the context silence length table if the phoneme is a silence and, otherwise, by searching the context phoneme length table and the phoneme length table;

a phoneme size step of generating the actual size of each phoneme in the utterance phrase by searching the context phoneme size table and the phoneme size table; and

a pitch pattern step of generating a pitch pattern of the utterance phrase, consisting of an overall contour, a head pattern, a tail pattern, a word-phrase pattern, and an accent pattern, by assigning an actual pitch value to each control point of each phoneme.

The method of claim 8, further comprising, between the break-reading step and the phoneme length step:

a step of regrouping the phonetic symbols generated for each word phrase into utterance phrases according to the break-reading types generated in the break-reading step; and

a step of determining whether voiceless sounds are to be voiced, inserting silence symbols before and after stop sounds, and generating control information for adjusting length and pitch,

wherein the phoneme length step and the phoneme size step

adjust the generated phoneme length and phoneme size according to the generated control information.