KR100959494B1

KR100959494B1 - Voice Synthesizer and Its Method using Processing Not registered Word

Info

Publication number: KR100959494B1
Application number: KR1020030014024A
Authority: KR
Inventors: 한민수; 백승권; 류창선
Original assignee: 주식회사 케이티
Priority date: 2003-03-06
Filing date: 2003-03-06
Publication date: 2010-05-26
Also published as: KR20040079099A

Abstract

The present invention relates to a speech synthesizer using an unregistered-word synthesis function and a method thereof. It aims to provide a speech synthesizer and method that improve the intelligibility and comprehensibility of synthesized speech by inserting break (pause) information into unregistered words, that is, words that are not handled by morphological and syntactic analysis and are not registered in the exception dictionary, before generating the synthesized speech.

To this end, the present invention comprises: first storage means storing exceptional words; second storage means storing syllable-unit data; third storage means storing synthesis-unit data; language processing means for performing morphological and syntactic analysis on externally supplied text data and extracting unregistered words that are not handled by the morphological and syntactic analysis and are not registered in the first storage means; unregistered-word processing means for inserting break information into the unregistered words from the language processing means; prosody processing means for performing prosody processing on the text data analyzed by the language processing means and the text data processed by the unregistered-word processing means; synthesis unit processing means for receiving the text data processed by the prosody processing means and searching the second storage means or the third storage means to insert synthesis-unit information; and synthesized-speech generating means for receiving the text data processed by the synthesis unit processing means and generating synthesized speech. The unregistered-word processing means checks the unregistered word received from the language processing means. If the unregistered word is a numeric string, it segments the word into number units and inserts weak-boundary break information between the segmented numbers; if the unregistered word is not a numeric string, it segments the word into syllables and inserts weak-boundary break information between the segmented syllables. It then inserts strong-boundary break information at both boundaries of the unregistered word into which the weak-boundary break information has been inserted.

Break information, corpus, speech synthesis, strong boundary, weak boundary, unregistered word

Description

Voice Synthesizer and Its Method using Processing Not registered Word

FIG. 1 is a block diagram of a speech synthesizer using an unregistered-word synthesis function according to an embodiment of the present invention.

FIG. 2 is a flowchart of a speech synthesis method using an unregistered-word synthesis function according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating the process of outputting synthesized speech according to a user's selection in an embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings

11: language processing unit
12: unregistered-word processing unit
13: prosody processing unit
14: synthesis unit processing unit
15: synthesis filter
16: exception dictionary
17: syllable database
18: synthesis database

The present invention relates to a speech synthesizer using an unregistered-word synthesis function and a method thereof.

In general, a speech synthesizer synthesizes a speech signal resembling a human voice from text data, and is widely used in telephone services, voice information systems, and the like. Conventional speech synthesis relied widely on rule-based synthesis or parametric synthesis, in which parameters representing the speech signal are extracted and the way speech is generated from them is formalized. Today, however, with advances in computer peripherals, large-corpus-based synthesis, which uses the original speech signal as it is, has been introduced, and the quality of synthesized speech has improved.

A corpus is a collection of speech material in computer-readable form, equipped with supplementary information and documentation so that it can be reused at any time. In this sense, large volumes of broadcast material on analog tape, which lack sufficient supplementary data and are hard for a computer to read, fall outside the definition, whereas physiological signals related to phonation that are collected together with the speech signal (EMG, EGG, etc.) are treated as falling within it.

In the field of speech information processing, such a body of data used to be called a speech database. Because it is better understood as "an accumulation of a large amount of speech data" than as a database system, it is now called a speech corpus or spoken language corpus, where "corpus" means a bundle or body of data. A spoken language corpus also tends to encompass the transcribed forms of conversational or spontaneous speech that were traditionally handled in text corpora.

A spoken language corpus has many applications, which divide broadly into research use and technical (development) use. In research, it serves as a basic research environment for: phonetic research, which elucidates the production, transmission, and perception of speech itself and focuses on the associated linguistic phenomena; sociolinguistic research, which uses spoken language to study variation by gender, age, region, and social class, as well as dialects; psycholinguistic research on the psychological phenomena of language; research on the acquisition and training of a native language or a second foreign language; general linguistic research; and research in audiology and speech pathology. In technical applications, it provides the basic data for extracting the basic synthesis units needed for speech synthesis and for deriving phonological and prosodic rules, and it is an essential resource for training and evaluating recognition algorithms in speech recognition and speaker recognition.

A spoken language corpus does not merely record and preserve speech; it also holds index information on which speech is stored where. A specified word or sentence can therefore be played back directly, and speech material containing a particular phoneme string or phonological phenomenon can be retrieved selectively (for example, "find all words or sentences containing ‘ㄱ’, ‘ㄷ’, or ‘ㅂ’ surrounded by voiced sounds"). Besides the content of the utterances, information about the speakers (gender, age, place of origin, etc.) is also included, so that speech phenomena can be analyzed by speaker. Attaching additional information on the various phonetic and linguistic distinctions so that such searches are possible is called labeling. (At the language level it is called tagging; at the speech level it is called labeling.) Labeling units include the phoneme, the word, the eojeol (a space-delimited word phrase), and the sentence.
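The kind of indexed search described above can be illustrated with a small sketch. It is not part of the patent: the entry layout (`text`, `file`), the function names, and the rule itself are assumptions made for illustration. In particular, the sketch works on the spelling only and treats a syllable-initial ‘ㄱ’, ‘ㄷ’, or ‘ㅂ’ as "surrounded by voiced sounds" when the preceding syllable ends in a vowel or in ‘ㄴ’, ‘ㄹ’, ‘ㅁ’, or ‘ㅇ’; a real corpus would search phonetic labels instead.

```python
HANGUL_BASE = 0xAC00      # code point of the first precomposed Hangul syllable
SYLLABLE_COUNT = 11172    # number of precomposed Hangul syllables
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# Final-consonant indexes treated as voiced (assumption): none, ㄴ, ㄹ, ㅁ, ㅇ
VOICED_FINALS = {0, 4, 8, 16, 21}
TARGET_INITIALS = {"ㄱ", "ㄷ", "ㅂ"}


def decompose(syllable):
    """Return (initial, medial, final) indexes, or None for non-Hangul characters."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return None
    return offset // 588, (offset % 588) // 28, offset % 28


def has_stop_between_voiced(text):
    """True if a syllable-initial ㄱ/ㄷ/ㅂ follows a vowel or sonorant final."""
    for prev, cur in zip(text, text[1:]):
        p, c = decompose(prev), decompose(cur)
        if p is None or c is None:
            continue
        if INITIALS[c[0]] in TARGET_INITIALS and p[2] in VOICED_FINALS:
            return True
    return False


def search_corpus(entries):
    """Return the entries whose transcription matches the phonological query."""
    return [e for e in entries if has_stop_between_voiced(e["text"])]
```

For example, an entry transcribed as "아버지" matches (the ‘ㅂ’ follows a vowel), while "학교" does not (the ‘ㄱ’ follows a final ‘ㄱ’).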

When the unit is the word or larger there is little difficulty, but when labeling at the phoneme level or below it is not easy to determine the segmentation on a temporally continuous waveform. Fixed criteria must therefore be established so that researchers can use the labels in common. There are also corpora annotated not only with phonological information but also with prosodic information (for example, intonation).

Such large-corpus-based synthesis extracts morphological and syntactic information from the given text and selects suitable synthesis units. However, if an analyzed synthesis unit belongs to an unregistered vocabulary item that is not registered in the speech database, such as a proper noun, personal name, place name, numeric string, or trade name, it is replaced with other synthesis units, which produces unnatural synthesized speech. As a result, the naturalness, intelligibility, comprehensibility, and sound quality of the synthesized speech deteriorate.

The present invention has been proposed to solve the above problem. Its object is to provide a speech synthesizer using an unregistered-word synthesis function, and a method thereof, that improve the intelligibility and comprehensibility of synthesized speech by inserting break information into unregistered words, which are neither handled by morphological and syntactic analysis nor registered in the exception dictionary, before generating the synthesized speech.

To achieve the above object, the present invention comprises: first storage means storing exceptional words; second storage means storing syllable-unit data; third storage means storing synthesis-unit data; language processing means for performing morphological and syntactic analysis on externally supplied text data and extracting unregistered words that are not handled by the morphological and syntactic analysis and are not registered in the first storage means; unregistered-word processing means for inserting break information into the unregistered words from the language processing means; prosody processing means for performing prosody processing on the text data analyzed by the language processing means and the text data processed by the unregistered-word processing means; synthesis unit processing means for receiving the text data processed by the prosody processing means and searching the second storage means or the third storage means to insert synthesis-unit information; and synthesized-speech generating means for receiving the text data processed by the synthesis unit processing means and generating synthesized speech. The unregistered-word processing means checks the unregistered word received from the language processing means. If the unregistered word is a numeric string, it segments the word into number units and inserts weak-boundary break information between the segmented numbers; if the unregistered word is not a numeric string, it segments the word into syllables and inserts weak-boundary break information between the segmented syllables. It then inserts strong-boundary break information at both boundaries of the unregistered word into which the weak-boundary break information has been inserted.

The present invention also provides a method comprising: a first step in which a language processing unit receives text data from outside, performs morphological and syntactic analysis, and extracts unregistered words that are not handled by the morphological and syntactic analysis and are not registered in an exception dictionary; a second step in which an unregistered-word processing unit inserts break information into the unregistered words from the language processing unit; a third step in which a prosody processing unit receives the text data analyzed by the language processing unit and the text data processed by the unregistered-word processing unit and performs prosody processing; a fourth step in which a synthesis unit processing unit receives the text data processed by the prosody processing unit and searches a syllable database or a synthesis database to insert synthesis-unit information; and a fifth step in which a synthesis filter receives the text data processed by the synthesis unit processing unit and generates and outputs synthesized speech. The second step comprises: a sixth step in which the unregistered-word processing unit checks whether the unregistered word received from the language processing unit is a numeric string; a seventh step of, if the check in the sixth step shows that the unregistered word is a numeric string, segmenting it into number units and inserting weak-boundary break information between the segmented numbers; an eighth step of, if the check in the sixth step shows that the unregistered word is not a numeric string, segmenting it into syllables and inserting weak-boundary break information between the segmented syllables; and a ninth step in which the unregistered-word processing unit inserts strong-boundary break information at both boundaries of the unregistered word into which the weak-boundary break information has been inserted.

(Deleted)

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a speech synthesizer using an unregistered-word synthesis function according to an embodiment of the present invention.

As shown in FIG. 1, the speech synthesizer (10) using an unregistered-word synthesis function according to the present invention includes: a language processing unit (11) that receives a text sentence from outside, performs morphological and syntactic analysis on it, and passes the result to a prosody processing unit (13), and that also extracts unregistered words, which are neither handled by the morphological and syntactic analysis nor registered in the exception dictionary, and passes them to an unregistered-word processing unit (12); the unregistered-word processing unit (12), which receives the unregistered words from the language processing unit (11), segments them into syllables or number units, inserts break information, and passes the result to the prosody processing unit (13); the prosody processing unit (13), which receives the text data processed by the language processing unit (11) and the unregistered-word processing unit (12) and performs prosody modeling to insert the required prosodic and syntactic information; a synthesis unit processing unit (14), which receives the text data processed by the prosody processing unit (13) and searches the syllable database (17) or the synthesis database (18) to insert suitable synthesis-unit information; and a synthesis filter (15), which receives the text data processed by the synthesis unit processing unit (14) and generates and outputs synthesized speech.
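The data flow among units 11 to 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Token` type, all function names, and the placeholder annotations are hypothetical, and real morphological analysis, prosody modeling, and waveform generation are replaced by stubs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    text: str
    unregistered: bool = False
    annotations: List[str] = field(default_factory=list)


def language_processing(words, is_analyzable, exception_dict):
    """Unit 11 (stub): flag words covered by neither analysis nor the exception dictionary."""
    return [
        Token(w, unregistered=not (is_analyzable(w) or w in exception_dict))
        for w in words
    ]


def unregistered_word_processing(token):
    """Unit 12 (stub): stands in for break-information insertion."""
    token.annotations.append("breaks")
    return token


def prosody_processing(token):
    """Unit 13 (stub): stands in for prosody modeling."""
    token.annotations.append("prosody")
    return token


def synthesis_unit_processing(token):
    """Unit 14 (stub): stands in for the database search for synthesis units."""
    token.annotations.append("units")
    return token


def run_pipeline(words, is_analyzable, exception_dict):
    """Route each word through the units in the order shown in FIG. 1."""
    processed = []
    for token in language_processing(words, is_analyzable, exception_dict):
        if token.unregistered:
            token = unregistered_word_processing(token)
        token = prosody_processing(token)
        token = synthesis_unit_processing(token)
        processed.append(token)
    return processed  # the synthesis filter (15) would turn this into audio
```

Only words flagged as unregistered pass through the extra break-insertion stage; all other words go straight from language processing to prosody processing.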

The exception dictionary (16) is a database storing exceptional words that are not handled by the morphological and syntactic analysis of the language processing unit (11); it stores widely used proper nouns, loanwords, place names, personal names, and the like. An unregistered word is defined as a proper noun, loanword, place name, personal name, or number (numeric string) that is not handled by the morphological and syntactic analysis of the language processing unit (11) and is not registered in the exception dictionary (16) either.

The syllable database (17) stores the data needed for synthesis in syllable units, and the synthesis database (18) stores the data needed for synthesis in triphone or diphone synthesis units.

The manner in which the speech synthesizer using an unregistered-word synthesis function according to the present invention inserts break information into unregistered words that are not registered in the exception dictionary, and the manner in which it outputs synthesized speech, are described below.

FIG. 2 is a flowchart of a speech synthesis method using an unregistered-word synthesis function according to an embodiment of the present invention.

First, the language processing unit (11) receives a text sentence from outside, performs morphological and syntactic analysis, and passes the result to the prosody processing unit (13). It also extracts unregistered words, which are neither handled by the morphological and syntactic analysis nor registered in the exception dictionary, and passes them to the unregistered-word processing unit (12) (201).

The unregistered-word processing unit (12) then inserts break information into the unregistered word received from the language processing unit (11) and passes it to the prosody processing unit (13) (202 to 206). Specifically, the unregistered-word processing unit (12) checks whether the unregistered word received from the language processing unit (11) is a numeric string (202, 203). If it is, the unit segments the word into number units and inserts weak-boundary break information between the segmented numbers (204); if it is not, the unit segments the word into syllables and inserts weak-boundary break information between the segmented syllables (205). For example, if the unregistered word is "사십오" ("forty-five"), the result of inserting weak-boundary break information is "사십-(weak boundary)-오"; if the unregistered word is "홍길동" (the personal name Hong Gil-dong), the result is "홍-(weak boundary)-길-(weak boundary)-동". Here, a weak boundary is defined as the insertion of a pause of 50 to 200 msec.
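Steps 202 to 205 can be sketched as below. The text does not say how a numeric string is recognized or how "number units" are delimited, so the rule used here is an assumption chosen only to reproduce the two examples: a word counts as numeric when every character is a Sino-Korean digit or place marker, and a digit followed by a place marker forms one unit. The names `WEAK`, `DIGITS`, `PLACES`, and all function names are hypothetical.

```python
WEAK = "<weak>"  # placeholder marker for weak-boundary break information

# Assumed character sets for spelled-out Sino-Korean numbers (illustrative only)
DIGITS = set("영일이삼사오육칠팔구")
PLACES = set("십백천만억")


def is_numeric_string(word):
    """Assumed check for step 202/203: every character is a digit or place marker."""
    return bool(word) and all(ch in DIGITS or ch in PLACES for ch in word)


def split_number_units(word):
    """Group a digit with a following place marker, e.g. 사십오 -> 사십, 오."""
    units, i = [], 0
    while i < len(word):
        if word[i] in DIGITS and i + 1 < len(word) and word[i + 1] in PLACES:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def insert_weak_boundaries(word):
    """Steps 204/205: segment the word and place a weak boundary between segments."""
    # list(word) yields one syllable per item for precomposed (NFC) Hangul text
    units = split_number_units(word) if is_numeric_string(word) else list(word)
    result = []
    for index, unit in enumerate(units):
        if index > 0:
            result.append(WEAK)
        result.append(unit)
    return result
```

Because syllables such as "이" and "오" are also ordinary name syllables, this character-set test would misclassify some names; a real system would need a more careful numeric check.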

Next, strong-boundary break information is inserted at both boundaries (at the eojeol level) of the unregistered word into which the weak-boundary break information has been inserted (206). Here, a strong boundary is defined as the insertion of a pause of 200 msec or more. For the examples above, the results of inserting strong-boundary break information are "(strong boundary)-사십-(weak boundary)-오-(strong boundary)" and "(strong boundary)-홍-(weak boundary)-길-(weak boundary)-동-(strong boundary)", respectively.
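Step 206 and the pause definitions can be sketched as follows. The marker strings, the function names, and the specific durations (100 msec for a weak boundary, 200 msec for a strong boundary) are assumptions; the text only gives the ranges of 50 to 200 msec and 200 msec or more.

```python
WEAK = "<weak>"      # placeholder marker for a weak boundary
STRONG = "<strong>"  # placeholder marker for a strong boundary

# Assumed pause lengths in milliseconds, picked from within the stated ranges
PAUSE_MS = {WEAK: 100, STRONG: 200}


def add_strong_boundaries(units):
    """Step 206: wrap an already segmented word in strong boundaries."""
    return [STRONG, *units, STRONG]


def total_pause_ms(sequence):
    """Sum the pauses that the boundary markers in a sequence would introduce."""
    return sum(PAUSE_MS.get(item, 0) for item in sequence)
```

With these assumed values, the "사십오" example carries two strong boundaries and one weak boundary, that is, 500 msec of inserted pause in total.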

Next, the prosody processing unit (13) receives the text data processed by the language processing unit (11) and the unregistered-word processing unit (12) and performs prosody modeling to insert the required prosodic and syntactic information (207).

Next, the synthesis unit processing unit (14) receives the text data processed by the prosody processing unit (13) and searches the syllable database (17) or the synthesis database (18) to insert suitable synthesis-unit information (208). For unregistered words, the syllable database (17) is searched first, and the synthesis database (18) is searched and used for synthesis only if the unit is not found there.
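The search order for unregistered words in step 208 can be sketched as a lookup with a fallback. Representing both databases as dictionaries keyed by syllable is a simplifying assumption; in particular, the way a syllable would map onto triphone or diphone units in the synthesis database (18) is not described in the text, so the stored values here are opaque placeholders and the function name is hypothetical.

```python
def find_synthesis_unit(syllable, syllable_db, synthesis_db):
    """Search the syllable database first, then fall back to the synthesis database.

    Returns a (source, unit) pair, or None when neither database has an entry.
    """
    if syllable in syllable_db:
        return ("syllable_db", syllable_db[syllable])
    if syllable in synthesis_db:
        return ("synthesis_db", synthesis_db[syllable])
    return None
```

A caller would apply this to each segmented syllable of an unregistered word, so that whole-syllable recordings are preferred whenever they exist.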

Finally, the synthesis filter (15) receives the text data processed by the synthesis unit processing unit (14) and generates and outputs synthesized speech (209).

FIG. 3 is a diagram illustrating the process of outputting synthesized speech according to a user's selection in an embodiment of the present invention.

As shown in FIG. 3, the synthesis filter (15) produces its output in one of three forms according to a synthesized-speech output form selection signal received from outside. The three forms are: first, outputting in sequence, separated by a fixed pause (at least twice the strong boundary), the unregistered-word-processed synthesized speech, in which break information and syllable-level synthesis-unit information have been inserted for the unregistered words, and the general synthesized speech (301); second, outputting only the unregistered-word-processed synthesized speech (302); and third, outputting only the general synthesized speech (303).
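The three output forms can be sketched as a selection over an enumeration. The `OutputMode` names, the `plan_output` function, the tuple used to represent the pause, and the 200 msec strong-boundary value are assumptions for illustration; the enumeration values simply reuse the reference numerals 301 to 303.

```python
from enum import Enum


class OutputMode(Enum):
    BOTH_IN_SEQUENCE = 301    # unregistered-word-processed speech, pause, general speech
    UNREGISTERED_ONLY = 302   # only the unregistered-word-processed speech
    GENERAL_ONLY = 303        # only the general synthesized speech


STRONG_PAUSE_MS = 200                  # assumed minimum strong-boundary pause
MIN_GAP_MS = 2 * STRONG_PAUSE_MS       # "at least twice the strong boundary"


def plan_output(mode, unregistered_speech, general_speech, gap_ms=MIN_GAP_MS):
    """Return the ordered playback plan for the selected output form."""
    if mode is OutputMode.BOTH_IN_SEQUENCE:
        if gap_ms < MIN_GAP_MS:
            raise ValueError("gap must be at least twice the strong-boundary pause")
        return [unregistered_speech, ("pause", gap_ms), general_speech]
    if mode is OutputMode.UNREGISTERED_ONLY:
        return [unregistered_speech]
    return [general_speech]
```

The two speech arguments can be any audio representation; the sketch only decides what is played and in which order.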

The method of the present invention as described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, floppy disk, hard disk, magneto-optical disk, etc.). Since a person of ordinary skill in the art to which the present invention pertains can readily carry this out, it is not described in further detail.

The present invention described above is not limited by the foregoing embodiment and the accompanying drawings, since a person of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the present invention.

As described above, when generating synthesized speech for an unregistered word, the present invention inserts strong-boundary break information at the eojeol level and weak boundaries at the syllable level, thereby improving the intelligibility and comprehensibility of the synthesized speech.

In addition, by allowing the synthesized-speech output form to be selected at the user's convenience, the present invention makes it possible to trade off the unnaturalness of the synthesized speech against the gains in comprehensibility and intelligibility that unregistered-word synthesis provides.

Claims

1. (Deleted)

2. A speech synthesizer using an unregistered-word synthesis function, comprising:

first storage means storing exceptional words;

second storage means storing syllable-unit data;

third storage means storing synthesis-unit data;

language processing means for performing morphological and syntactic analysis on externally supplied text data and extracting unregistered words that are not handled by the morphological and syntactic analysis and are not registered in the first storage means;

unregistered-word processing means for inserting break information into the unregistered words from the language processing means;

prosody processing means for performing prosody processing on the text data analyzed by the language processing means and the text data processed by the unregistered-word processing means;

synthesis unit processing means for receiving the text data processed by the prosody processing means and searching the second storage means or the third storage means to insert synthesis-unit information; and

synthesized-speech generating means for receiving the text data processed by the synthesis unit processing means and generating synthesized speech,

wherein the unregistered-word processing means

checks the unregistered word received from the language processing means; if the unregistered word is a numeric string, segments it into number units and inserts weak-boundary break information between the segmented numbers; if the unregistered word is not a numeric string, segments it into syllables and inserts weak-boundary break information between the segmented syllables; and then inserts strong-boundary break information at both boundaries of the unregistered word into which the weak-boundary break information has been inserted.

3. The speech synthesizer of claim 2,

wherein the synthesized-speech generating means,

according to a synthesized-speech output form selection signal received from outside, outputs the unregistered-word-processed synthesized speech and the general synthesized speech in sequence separated by a fixed pause (at least twice the strong boundary), or outputs only the unregistered-word-processed synthesized speech, or outputs only the general synthesized speech.

4. (Deleted)

5. A speech synthesis method using an unregistered-word synthesis function, comprising:

a first step in which a language processing unit receives text data from outside, performs morphological and syntactic analysis, and extracts unregistered words that are not handled by the morphological and syntactic analysis and are not registered in an exception dictionary;

a second step in which an unregistered-word processing unit inserts break information into the unregistered words from the language processing unit;

a third step in which a prosody processing unit receives the text data analyzed by the language processing unit and the text data processed by the unregistered-word processing unit and performs prosody processing;

a fourth step in which a synthesis unit processing unit receives the text data processed by the prosody processing unit and searches a syllable database or a synthesis database to insert synthesis-unit information; and

a fifth step in which a synthesis filter receives the text data processed by the synthesis unit processing unit and generates and outputs synthesized speech,

wherein the second step comprises:

a sixth step in which the unregistered-word processing unit checks whether the unregistered word received from the language processing unit is a numeric string;

a seventh step of, if the check in the sixth step shows that the unregistered word is a numeric string, segmenting it into number units and inserting weak-boundary break information between the segmented numbers;

an eighth step of, if the check in the sixth step shows that the unregistered word is not a numeric string, segmenting it into syllables and inserting weak-boundary break information between the segmented syllables; and

a ninth step in which the unregistered-word processing unit inserts strong-boundary break information at both boundaries of the unregistered word into which the weak-boundary break information has been inserted.

6. The method of claim 5,

wherein the fifth step comprises:

a tenth step in which the synthesis filter receives a synthesized-speech output form selection signal from outside; and

an eleventh step of, according to the received synthesized-speech output form selection signal, outputting the unregistered-word-processed synthesized speech and the general synthesized speech in sequence separated by a fixed pause (at least twice the strong boundary), or outputting only the unregistered-word-processed synthesized speech, or outputting only the general synthesized speech.

7. (Deleted)