KR20030060588A

KR20030060588A - Method for selecting recording sentence for voice synthesis on corpus

Info

Publication number: KR20030060588A
Application number: KR1020020001353A
Authority: KR
Inventors: Ryu Seung-pyo (류승표); Kwon Oh-il (권오일)
Original assignee: Hyundai Autonet Co., Ltd. (주식회사 현대오토넷)
Priority date: 2002-01-10
Filing date: 2002-01-10
Publication date: 2003-07-16

Abstract

PURPOSE: A method for selecting recording sentences for corpus-based speech synthesis is provided. It builds the corpus from a minimal number of sentences that together include all synthesis units, thereby optimizing database size. CONSTITUTION: The flags of all primary corpus sentences are initialized to 1 (100). All synthesis units are collected and classified, and a synthesis unit reverse-sorted table (116) is built in which the units are sorted in reverse order (102). All sentences in which the first synthesis unit of the sorted table appears are selected (104), and all sentences containing that unit are picked out (106). Among them, the sentence with the largest number of not-yet-found synthesis units is chosen and its flag is changed to 0 (108). All synthesis units contained in the chosen sentence are found in the reverse-sorted table and their flags are changed to 0 (110). It is then checked whether the flags of all synthesis units in the table are 0 (112). If so, the process ends; if not, the next synthesis unit whose flag is still 1 is found in the table, and the routine of selecting the sentences that contain it is repeated (114).

Description

A method for selecting recording sentences for corpus-based speech synthesis {METHOD FOR SELECTING RECORDING SENTENCE FOR VOICE SYNTHESIS ON CORPUS}

The present invention relates to a method for selecting recording sentences for corpus-based speech synthesis. Specifically, it is a method for screening, from a large primary corpus, the secondary corpus sentences from which synthesis units are selected in corpus-based speech synthesis.

In the prior art, no particular criterion for selecting sentences was established, so sentences were simply extracted at random and used.

Conventional sentence selection often chose sentences at random, with no particular criterion regarding synthesis units or prosody.

With the method of the present invention, however, all synthesis units can be covered with a minimal number of sentences, so more diverse speech can be produced when the method is applied to speech synthesis.

The present invention is intended to overcome the conventional problems described above. Its object is to provide a method for selecting recording sentences for corpus-based speech synthesis that selects sentences according to two criteria:

1. The selected sentences should include all synthesis units.

2. The total size of the selected sentences should be as small as possible.

As an example, a method for selecting recording sentences for corpus-based speech synthesis according to the present invention, achieving the above object, consists of the steps of:

initializing the flags of all primary corpus sentences to 1;

collecting and classifying all synthesis units, counting their frequencies of occurrence, and sorting them in reverse order of name and frequency to build a synthesis unit reverse-sorted table;

selecting all sentences in which the first synthesis unit of the reverse-sorted table appears;

picking out all sentences that contain this synthesis unit;

excluding the synthesis units contained in sentences already selected, selecting the sentence with the largest number of not-yet-found synthesis units and changing the flag of that sentence to 0;

finding all synthesis units contained in the selected sentence in the synthesis unit reverse-sorted table and changing their flags to 0; and

checking whether the flags of all synthesis units in the synthesis unit reverse-sorted table (116) are 0, the operation being complete if they are all 0, and otherwise finding the next synthesis unit in the table (116) whose flag is still 1 and repeating the routine from the step of selecting all sentences that contain that unit.

Fig. 1 is a flowchart illustrating the method for selecting recording sentences for corpus-based speech synthesis according to the present invention.

< Description of Reference Numerals for Main Parts of the Drawing >

116: synthesis unit reverse-sorted table

118: secondary corpus sentences

120: primary corpus sentences

Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawing.

Before the method for selecting recording sentences for corpus-based speech synthesis according to the present invention is described, the related terms are defined.

1. Speech synthesis

Speech synthesis is technology that enables a machine or computer to produce output sound in the form of human language; in the narrow sense, it refers to TTS. TTS (text-to-speech) is a system that receives data in the form of written sentences as input and outputs it as spoken language.

2. Corpus

A corpus (literally, a "bundle of words") is a collection of sentences of every kind, such as novels, newspapers, magazines, and dictionaries, or a large set of sentences selected from such a collection.

3. Corpus-based speech synthesis

Saying that speech synthesis is corpus-based means that it is based on the large set of sentences described above. In other words, this large body of sentences is analyzed linguistically and phonetically to model prosodic elements, and the prosody for speech synthesis is generated from these models. In addition, the same sentences are read aloud by a single speaker and recorded; the recorded sentences are then cut into small segments and stored in a database, and during synthesis these segments are joined appropriately according to the pronunciation sequence to produce artificial synthesized speech.

Therefore, in corpus-based speech synthesis, how effectively these sentences are selected is a primary concern.

4. Sentence selection

Here, "sentences" means the minimal set of sentences, drawn from a large sentence corpus, that is best suited to speech synthesis, and sentence selection is the task of picking out this minimal set from the large set of sentences.

The large set of sentences is called the primary corpus, and the selected set of sentences is called the secondary corpus. It is the secondary corpus that is used for speech synthesis.

In other words, sentence selection is the task of screening the secondary corpus sentences out of the primary corpus.

5. Segment

A segment is a portion of the waveform recorded for an arbitrary sentence. Phonemes, syllables, diphones, and triphones are commonly used as segments in speech synthesis.

6. Synthesis unit

A synthesis unit is the smallest unit element that is joined when speech synthesis is performed. The segments described above correspond to synthesis units: the speech waveform is divided into small pieces of the same unit size, and during synthesis these pieces are concatenated in sequence to produce the complete waveform.
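The segment-and-concatenate idea above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's synthesizer: the unit database layout (`UnitDB`, a dict from unit name to a list of samples) and the helper names `cut_segment` and `concatenate_units` are hypothetical, waveforms are plain lists of floats, and no smoothing is applied at the joins.

```python
from typing import Dict, List, Sequence

# Hypothetical unit database: synthesis-unit name -> samples of one recorded segment.
UnitDB = Dict[str, List[float]]


def cut_segment(waveform: Sequence[float], start: float, end: float,
                sample_rate: int) -> List[float]:
    """Return the part of a recorded waveform between start and end (in seconds)."""
    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return list(waveform[first:last])


def concatenate_units(units: Sequence[str], db: UnitDB) -> List[float]:
    """Join the stored segments for a unit sequence, in order (naive, no join smoothing)."""
    out: List[float] = []
    for unit in units:
        if unit not in db:
            # A unit that was never recorded cannot be synthesized at all.
            raise KeyError(f"synthesis unit {unit!r} is not in the database")
        out.extend(db[unit])
    return out
```

A unit missing from the database raises `KeyError`, which is the situation the sentence selection method described below is meant to prevent by covering every synthesis unit.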

7. Pronunciation conversion

Pronunciation conversion means rewriting textual sentence data as it is pronounced. If the synthesis unit is the phoneme, the converted pronunciation sequence is a phoneme sequence; if the synthesis unit is the diphone, it is a diphone sequence.
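For the diphone case, a phoneme sequence produced by pronunciation conversion can be rewritten as a diphone sequence by pairing neighbouring phonemes. The sketch below assumes the grapheme-to-phoneme step has already run (it is not shown), uses romanized placeholder phoneme symbols, and adopts a hypothetical `#` silence symbol and a `left-right` naming scheme for diphones.

```python
from typing import List, Sequence

SILENCE = "#"  # hypothetical boundary symbol marking sentence start and end


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Rewrite a phoneme sequence as a diphone sequence.

    Each diphone spans two adjacent phonemes; silence is padded at both
    ends so the first and last phonemes also get a transition unit.
    """
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]
```

For example, the phoneme sequence `["h", "a", "n"]` becomes `["#-h", "h-a", "a-n", "n-#"]`.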

Hereinafter, the method for selecting recording sentences for corpus-based speech synthesis according to the present invention will be described with reference to Fig. 1.

In the present invention, two criteria for selecting sentences were established, as follows.

First, a large number of sentences are collected and designated as the primary corpus.

The flags of all sentences in this primary corpus are initialized to 1 (100).

Next, every sentence in the primary corpus is pronunciation-converted into a sequence of synthesis units. All of these synthesis units are then collected and classified, their frequencies of occurrence are counted, and they are sorted in reverse order of name and frequency to build the synthesis unit reverse-sorted table (116) (102).
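Step 102 can be sketched as follows, assuming each primary corpus sentence has already been converted into a list of synthesis-unit names (the dict `corpus_units` and both function names are hypothetical). The source does not spell out exactly what "reverse order of name and frequency" means; here it is read as descending frequency with ties broken by descending unit name, and every unit starts with flag 1 (not yet found).

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_reverse_sorted_table(corpus_units: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count every synthesis unit in the primary corpus and sort the counts.

    Assumed ordering: descending frequency, ties broken by descending name.
    """
    counts = Counter(unit for units in corpus_units.values() for unit in units)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)


def initial_unit_flags(table: List[Tuple[str, int]]) -> Dict[str, int]:
    """Give every unit in the table flag 1, meaning 'not yet found'."""
    return {unit: 1 for unit, _ in table}
```

If the opposite reading of "reverse order" is intended, removing `reverse=True` gives ascending order instead.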

Next, all sentences in which the first synthesis unit of the reverse-sorted table appears are selected (104).

All sentences containing this synthesis unit are then picked out (106). That is, following the order of the synthesis unit reverse-sorted table, the sentences that contain the current synthesis unit are found; for each of them, the number of synthesis units not yet found is counted and taken as its frequency sum, and the sentence with the largest frequency sum is selected. In other words, the sentence with the most not-yet-found synthesis units becomes the first selected sentence. This sentence is stored separately and becomes part of the secondary corpus (118).
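The frequency sum used in step 106 can be illustrated with a small helper. It is a sketch under assumptions the source does not state: only sentences whose flag is still 1 are candidates, each distinct not-yet-found unit counts once even if it occurs several times in a sentence, and ties go to the sentence that appears first in the corpus. `pick_best_sentence` and its parameters are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def pick_best_sentence(unit: str,
                       corpus_units: Dict[str, List[str]],
                       sentence_flags: Dict[str, int],
                       found: Set[str]) -> Optional[str]:
    """Return the unselected sentence containing `unit` with the largest frequency sum.

    The frequency sum is taken here as the number of distinct units in the
    sentence that are not yet in `found`. Returns None if no sentence qualifies.
    """
    best_id: Optional[str] = None
    best_score = -1
    for sid, units in corpus_units.items():
        if sentence_flags.get(sid) != 1 or unit not in units:
            continue  # already selected, or does not contain the current unit
        score = len(set(units) - found)
        if score > best_score:  # strict '>' keeps the earliest sentence on ties
            best_id, best_score = sid, score
    return best_id
```

With the corpus `{"s1": ["a", "b"], "s2": ["a", "c", "d"], "s3": ["b", "c"]}` and nothing found yet, the unit `a` selects `s2`, which contributes three new units against two for `s1`.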

The secondary corpus for speech synthesis should include, as far as possible, all synthesis units extracted from the primary corpus, and it would be desirable, where possible, to choose sentences that reflect a variety of prosodic phenomena.

The criteria for choosing the sentences to be recorded can therefore be summarized as follows.

(1) All synthesis units must be included.

(2) A variety of prosody must be reflected.

(3) The total size of the selected sentences should be as small as possible.

The third condition means that, because database size must be taken into account when building a practical synthesis system, the size of the sentence set should be optimized to achieve the maximum effect with the minimum capacity.

The second constraint, which concerns prosodic elements, is omitted, because it is practically impossible to take every prosodic element into account. Sentences are therefore selected using only criteria (1) and (3).
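Criteria (1) and (3) can be checked on any candidate secondary corpus with a helper like the one below. Measuring "size" as the total number of synthesis units in the selected sentences is an assumption made for illustration; character count or recording duration would be equally reasonable proxies for database size. `coverage_report` and `CoverageReport` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Sequence


class CoverageReport(NamedTuple):
    covers_all: bool    # criterion (1): every unit of the primary corpus is present
    missing: List[str]  # units not covered by the selection, sorted by name
    size: int           # criterion (3) proxy: total synthesis units in the selection


def coverage_report(corpus_units: Dict[str, List[str]],
                    selected: Sequence[str]) -> CoverageReport:
    """Evaluate a selected sentence set against criteria (1) and (3)."""
    all_units = {u for units in corpus_units.values() for u in units}
    covered = {u for sid in selected for u in corpus_units[sid]}
    missing = sorted(all_units - covered)
    size = sum(len(corpus_units[sid]) for sid in selected)
    return CoverageReport(covers_all=not missing, missing=missing, size=size)
```

Among selections with `covers_all` true, a smaller `size` better satisfies criterion (3).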

Excluding the synthesis units contained in sentences already selected, the sentence with the largest number of not-yet-found synthesis units is selected, and its flag is changed to 0 (108).

All synthesis units contained in the selected sentence are looked up in the synthesis unit reverse-sorted table (116), and their flags are changed to 0 (110). Sentences are then selected for the next synthesis unit in the reverse-sorted table by the same procedure.

When sentences have been selected for every synthesis unit in the synthesis unit reverse-sorted table (116), sentence selection is complete and the selected sentences are output as the result.

It is checked whether the flags of all synthesis units in the synthesis unit reverse-sorted table (116) are 0 (112). If they are all 0, the operation is complete; otherwise, the next synthesis unit in the table (116) whose flag is still 1 is found (114), and the routine from step 106 is repeated.
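Putting steps 100 to 114 together gives a greedy loop. The Python below is a minimal sketch of that flowchart under assumed details (descending-frequency table with ties broken by descending name, distinct-unit scoring, first-in-corpus tie-breaking); `select_recording_sentences` and its input format are hypothetical, and it is not presented as the patentee's implementation. Walking the table once and skipping units whose flag is already 0 is equivalent to repeatedly checking step 112 and jumping to the next flag-1 unit in step 114.

```python
from collections import Counter
from typing import Dict, List


def select_recording_sentences(corpus_units: Dict[str, List[str]]) -> List[str]:
    """Greedily select secondary corpus sentences that cover every synthesis unit."""
    # (100) sentence flags: 1 = still available in the primary corpus, 0 = selected
    sentence_flags = {sid: 1 for sid in corpus_units}
    # (102) reverse-sorted unit table (116); unit flags: 1 = not yet found
    counts = Counter(u for units in corpus_units.values() for u in units)
    table = sorted(counts, key=lambda u: (counts[u], u), reverse=True)
    unit_flags = {u: 1 for u in table}
    secondary: List[str] = []  # (118) secondary corpus, in selection order

    for unit in table:
        if unit_flags[unit] == 0:
            continue  # (112)/(114) already covered; move on to the next flag-1 unit
        # (104)/(106) candidates: unselected sentences that contain this unit
        candidates = [sid for sid, units in corpus_units.items()
                      if sentence_flags[sid] == 1 and unit in units]
        # (108) choose the candidate with the most distinct not-yet-found units;
        # max() returns the first candidate among equal scores
        best = max(candidates,
                   key=lambda sid: sum(unit_flags[u] for u in set(corpus_units[sid])))
        sentence_flags[best] = 0
        secondary.append(best)
        # (110) mark every unit of the chosen sentence as found
        for u in corpus_units[best]:
            unit_flags[u] = 0
    return secondary
```

Because a unit whose flag is still 1 cannot occur in any already-selected sentence, there is always at least one candidate sentence for it, so `max` never receives an empty list. For the corpus `{"s1": ["a", "b"], "s2": ["a", "c", "d"], "s3": ["b", "c"]}`, the sketch selects `s2` and then `s1`.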

As described above, the method for selecting recording sentences for corpus-based speech synthesis according to the present invention is well suited to optimizing database size, because the corpus can be built from a minimal set of sentences that includes all synthesis units.

In addition, the more frequent a synthesis unit is, the more often it appears in the selected sentences, which makes it easier to produce a correspondingly wider variety of prosody for it.

The algorithm is simple, so the system can be built at low cost.

What has been described above is merely one embodiment for carrying out the method for selecting recording sentences for corpus-based speech synthesis according to the present invention. The present invention is not limited to this embodiment; as claimed in the following claims, its technical spirit extends to the full range of modifications that anyone of ordinary skill in the art to which the invention pertains could make without departing from the gist of the invention.

Claims

Initializing the flags of all primary corpus sentences to 1;

Collecting and classifying all synthesis units, counting their frequencies of occurrence, and sorting them in reverse order of name and frequency to build a synthesis unit reverse-sorted table;

Selecting all sentences in which the first synthesis unit of the reverse-sorted table appears;

Picking out all sentences that contain this synthesis unit;

Excluding the synthesis units contained in sentences already selected, selecting the sentence with the largest number of not-yet-found synthesis units and changing the flag of that sentence to 0;

Finding all synthesis units contained in the selected sentence in the synthesis unit reverse-sorted table and changing their flags to 0; and

Checking whether the flags of all synthesis units in the synthesis unit reverse-sorted table (116) are 0, the operation being complete if they are all 0, and otherwise finding the next synthesis unit in the table (116) whose flag is still 1 and repeating the routine of selecting all sentences that contain that unit;

A method for selecting recording sentences for corpus-based speech synthesis, characterized by consisting of the above steps.