KR100292376B1

KR100292376B1 - Device and method for converting sentence

Info

Publication number: KR100292376B1
Application number: KR1019930025928A
Authority: KR
Inventors: 공병구
Original assignee: 윤종용; 삼성전자주식회사
Priority date: 1993-11-30
Filing date: 1993-11-30
Publication date: 2001-06-01
Also published as: KR950015052A

Abstract

PURPOSE: A sentence conversion device and method are provided that make speech synthesis easier by modeling the morphological analysis of an input sentence and the rules of human utterance as separate rule sets, without semantic analysis, and inserting phrase breaks on that basis. CONSTITUTION: An input unit (10) receives a sentence (data) from outside. A dictionary unit (20) stores the base forms of words. A phrase-break model table (30) stores the phrase-break model. A working memory unit (40) temporarily stores the data being processed until the input sentence is converted into speech. A program storage unit (60) stores the phrase-break processing program. A control unit (70) controls data processing with the peripheral units according to the stored program. An output unit (80) outputs the converted sentence. A central processing unit (50) is also provided.

Description

Sentence conversion device and method therefor

FIG. 1 is a block diagram of a sentence conversion device according to the present invention.

FIG. 2 is a flowchart schematically showing a sentence conversion method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: input unit  20: dictionary unit

30: phrase-break model table  40: working memory unit

50: central processing unit  60: program storage unit

70: control unit  80: output unit

The present invention relates to a sentence conversion device and method that convert a sentence so that it can be read in units separated by pauses, and more particularly to a sentence conversion device and method that divide a sentence into phrase-break units according to morphological analysis of the sentence and the rules of human utterance.

Efforts to convert human-language sentences into speech by machine have developed in many directions. To convert a sentence into speech, the sentence must be read with pauses placed as the rules of human utterance would place them; only then can the listener accurately grasp the content of the converted speech, and only then does the speech sound natural.

A human reader, however, interprets the meaning of a sentence while reading it aloud. At the current level of technology it is nearly impossible to perform semantic analysis of an input sentence and then generate speech output on that basis.

Moreover, the conversion methods based on morphological analysis used in typical speech conversion devices convert speech from a partial morphological analysis alone, without considering the sentence as a whole. They therefore always produce the same fixed phrase-break pattern, which is a poor fit for a model of natural human utterance.

The phrase-break methods used in the prior art are examined in more detail below.

First, in phrase breaking based on morphological analysis alone, a morpheme table containing morpheme data is provided. Each space-delimited unit (eojeol, hereinafter "word") of the input sentence is scanned in reverse order against the morphemes defined in the table; if a matching morpheme exists, the part of speech of the word is determined from it, and if none exists, the most plausible part of speech is chosen. Phrase-break rules were then drawn up simply by applying grammar rules (e.g. conjunctive particles, predicative particles) to the morphemes so determined.

This approach, however, produces many morphological-analysis errors, and because those errors strongly affect the phrase-break rules, the accuracy of the resulting phrase breaks is considerably reduced.

Second, in phrase breaking based on semantic analysis, dictionaries of words, inflectional endings, and the like are built in advance, the meaning of the sentence is analyzed, and phrase-break rules are applied according to that meaning. Complete semantic analysis is not possible at the present level of technology. Even incomplete semantic analysis that is merely adequate (here, adequate means that the listener can understand the phrase-break result) requires a dictionary holding information on a vast number of words, and a device implementing it has been very difficult to build. Real-time processing, in which input data is phrase-broken immediately and the result converted to speech, is also not easy, so the approach is hard to apply to speech synthesis.

Accordingly, an object of the present invention is to provide a sentence conversion device and method that avoid semantic analysis, model the morphological analysis of the input sentence and the rules of human utterance as separate rule sets, and perform phrase breaking on that basis, thereby making synthesis into speech easier.

Another object of the present invention is to provide a sentence conversion device and method that determine the number of phrase breaks in a sentence variably, thereby minimizing the problems caused by morphological-analysis errors.

To achieve the above objects, the device of the present invention comprises: input means for receiving sentence data from outside;

a dictionary unit containing the base forms and part-of-speech information of words;

control means for analyzing the input sentence and controlling the determination of the break level between words;

a working memory unit that temporarily stores the sentence input through the input means and, after the sentence has been separated into words under the control of the control means, stores the base form of each word, obtained by morphological analysis through a search of the dictionary unit, together with the part-of-speech information of that word;

a phrase-break model table that defines break levels according to the part-of-speech information of words, used to set the break levels of the words stored in the working memory unit; and an output unit that, under the control of the control means, outputs the input sentence according to the break levels determined from the phrase-break model table.

The method of the present invention comprises: a morphological analysis step of separating input sentence data into words, searching a dictionary for each separated word, and extracting the base form and part-of-speech information of the word by morphological analysis;

a phrase-break processing step of determining the break level of each word according to the information obtained in the morphological analysis step and the relationship between adjacent words in the input sentence; and an output step of outputting the input sentence in phrase-break units according to the level information output by the phrase-break processing step.

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of the sentence conversion device according to the present invention. In FIG. 1, 10 is an input unit that receives a sentence as data from outside; 20 is a dictionary unit in which the base forms of words are stored; 30 is a phrase-break model table in which the phrase-break model is stored; 40 is a working memory unit that temporarily stores the data being processed until the input sentence is converted into speech; 50 is a central processing unit (CPU); 60 is a program storage unit in which the phrase-break processing program according to the present invention is stored; 70 is a control unit that controls data processing with the peripheral units according to the stored program; and 80 is an output unit that outputs the converted sentence.

The operation of the sentence conversion device configured as above is described below with reference to the flowchart of FIG. 2.

When power is applied to the sentence conversion device and operation begins, the control unit (70) sequentially executes the program stored in the internal program storage unit (60). Executing the stored program initializes the device for sentence conversion.

After initialization, the data to be converted to speech is first input through the input unit (10) so that it can be converted into a form suitable for phrase-break reading according to the present invention. The input may be a data file containing sentences or sentences typed on a keyboard. For keyboard input, a sentence is typed on the screen and the data is entered with the return key, a character marking the end of a sentence, or a separate command.

The input sentence is first stored temporarily in the working memory unit (40), which accumulates the input until it forms one complete sentence. Waiting for a complete sentence makes the speech conversion sound more natural: if data were converted to speech and output as it arrived, the meaning of the sentence could not be grasped accurately. Whether the input forms a complete sentence can be checked by convention: detection of a sentence-ending mark ".", "?", or "!"; a run of consecutive spaces longer than a given value (e.g. eight); or consecutive line feeds (LF).
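The completeness check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `is_sentence_complete`, the treatment of "eight" as a minimum run length, and the reading of "consecutive line feeds" as two in a row are all assumptions.

```python
import re

# Assumed threshold: the text gives eight consecutive spaces only as an example.
MAX_SPACE_RUN = 8

def is_sentence_complete(buffer, max_space_run=MAX_SPACE_RUN):
    """Return True when the buffered input looks like one complete sentence."""
    # Rule 1: the text ends with a sentence-ending mark.
    if buffer.rstrip(" \n").endswith((".", "?", "!")):
        return True
    # Rule 2: a long run of consecutive spaces appears in the input.
    if re.search(r" {%d,}" % max_space_run, buffer):
        return True
    # Rule 3: consecutive line feeds (assumed here to mean two in a row).
    return "\n\n" in buffer
```

A caller would keep appending incoming characters to the buffer and pass the sentence on to word separation only once this check succeeds.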

Once a complete sentence is available, it is separated into words (eojeol, space-delimited units); the boundaries are given by the spaces between words. The dictionary unit (20) is then searched for the base form of each separated word; that is, morphological analysis is performed on each word. The base form is found by longest matching: if a longer entry matches, it is chosen even when shorter entries also match. For example, when the word "사랑하는" ("loving") in a sentence is looked up, the dictionary unit (20) may return the following entries.

① 사  ② 사랑  ③ 사랑하

Here ③ "사랑하" is the stem of "사랑하다" ("to love"). Because the endings of verbs and adjectives vary with conjugation, the dictionary generally records only the stem. Given these matches, the longest-match rule described above selects "사랑하".
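The longest-match lookup can be illustrated with a short sketch. The function name `longest_match` and the use of a plain Python set as the dictionary are assumptions made for illustration; the source does not specify a data structure.

```python
def longest_match(word, dictionary):
    """Return the longest prefix of `word` found in `dictionary`, or None."""
    # Try the full word first, then progressively shorter prefixes.
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in dictionary:
            return prefix
    return None

# The three candidate entries named in the text.
entries = {"사", "사랑", "사랑하"}
stem = longest_match("사랑하는", entries)
```

Scanning from the longest prefix downward means the first hit is already the longest entry, so no comparison among candidates is needed.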

Looking at the morphological analysis in more detail: a syllable-level analysis uses the dictionary to separate the morphemes that make up each word of the sentence, restores the original form of any morpheme that has undergone morphological change, and generates the possible analysis candidates. One example of how a morpheme is restored follows.

For example, when the ㅂ-irregular adjective "아름답다" ("to be beautiful") appears in a sentence as "아름다운", morphological analysis resolves "아름다운" into ("아름답-" + "-은"). The input word "아름다운" is first split into ("아름다우-" + "-ㄴ"), and the end of the stem is checked: if the stem ends in "우" and the ending begins with "ㄴ", the "우" is replaced with "ㅂ" and "아름답-" is looked up in the dictionary. Because the dictionary entry for "아름답-" records that it is ㅂ-irregular, the analysis result is ("아름답-" + "-은"). This is only one of several possible methods, described as an example; other methods can be implemented to give the same result.
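The ㅂ-irregular restoration can be sketched using the arithmetic layout of precomposed Hangul syllables in Unicode (a syllable's code point is 0xAC00 plus an offset whose remainder modulo 28 is the final-consonant index). The function names, the dictionary format, and the `"b-irregular"` tag are assumptions for illustration only.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_COUNT = 11172   # number of precomposed Hangul syllables
FINAL_NIEUN = 4        # index of final consonant ㄴ
FINAL_BIEUP = 17       # index of final consonant ㅂ

def final_index(ch):
    """Final-consonant index of a Hangul syllable (0 = none), else None."""
    offset = ord(ch) - HANGUL_BASE
    if 0 <= offset < HANGUL_COUNT:
        return offset % 28
    return None

def restore_b_irregular(word, dictionary):
    """Return (stem, ending) if `word` is a ㅂ-irregular form, else None."""
    # The ending must begin with ㄴ, i.e. the last syllable carries final ㄴ.
    if len(word) < 2 or final_index(word[-1]) != FINAL_NIEUN:
        return None
    stem_last = chr(ord(word[-1]) - FINAL_NIEUN)  # strip ㄴ: "운" -> "우"
    # The stem must end in "우", preceded by a syllable with no final consonant.
    if stem_last != "우" or final_index(word[-2]) != 0:
        return None
    # Replace "우" with ㅂ attached to the preceding syllable: "다" -> "답".
    candidate = word[:-2] + chr(ord(word[-2]) + FINAL_BIEUP)
    # Accept only if the dictionary marks the stem as ㅂ-irregular.
    if dictionary.get(candidate) == "b-irregular":
        return candidate, "은"
    return None

lexicon = {"아름답": "b-irregular"}  # hypothetical dictionary content
result = restore_b_irregular("아름다운", lexicon)
```

The dictionary check matters: a regular form such as "배운" also ends in "우" plus "ㄴ", but the restored candidate is not in the dictionary, so it is rejected.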

The results of the dictionary search (the base form and part-of-speech information of each word) obtained by this morphological analysis are stored in the working memory unit (40). The stored data is passed to the phrase-break model table, which consists of three tables, and the break level is determined by processing three words at a time, shifting by one word at each step. An example follows.

In the sentence "① 나는 ② 영희를 ③ 사랑하고 ④ 있다고 ⑤ 생각한다." ("I think that I love Young-hee."), the break level after word ② is determined first, from the relationship between words ① and ② and between words ② and ③. In other words, the level is set by the type of the word that precedes a word and by its relationship with the word that follows. The break level of word ③ then depends on the relationships between ② and ③ and between ③ and ④, so the tables are applied while shifting one word at a time.

The phrase-break model table used here consists of three tables (Table I, Table II, and Table III), each a 60×60 table. The x and y axes of each table are indexed by word-type IDs (an ID denotes the part-of-speech type of a word; here 60 part-of-speech types are defined, with the IDs assigned in consideration of base forms and their conjugations), and the cell at a given (x, y) position holds a break level.

The model table could instead be implemented without splitting it into three, as a single three-dimensional table with the structure [previous][current][next], but such a table would be very large and would contain much redundant data. The 60×60×60 three-dimensional table is therefore replaced by a 60×60×3 arrangement, that is, three 60×60 two-dimensional tables, which reduces both the data size and the redundancy.

The current word and the previous word are applied to the horizontal and vertical axes of Table I, and the value in the corresponding cell, that is, the value at the position given by the row and column IDs, is taken. Next, the current word and the next word are applied to the horizontal and vertical axes of Table II and the corresponding value is taken. The two values are then projected onto the horizontal and vertical axes of Table III, and the value found there is stored as the provisional break value of the current word.
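The three-table lookup, applied over a three-word window that shifts one word at a time, can be sketched as below. Several points are assumptions not fixed by the source: each word is reduced to a single type ID, the index order of each table is chosen arbitrarily, the values held in Tables I and II are valid indices into Table III, and all table contents are invented toy data.

```python
N_TYPES = 60  # number of part-of-speech type IDs stated in the text

def make_table(fill=0):
    """Build an N_TYPES x N_TYPES table filled with a placeholder level."""
    return [[fill] * N_TYPES for _ in range(N_TYPES)]

def break_value(prev_id, cur_id, next_id, table1, table2, table3):
    """Provisional break value of the current word from its two neighbours."""
    a = table1[cur_id][prev_id]  # Table I: current word vs. previous word
    b = table2[cur_id][next_id]  # Table II: current word vs. next word
    return table3[a][b]          # Table III: combine the two intermediate values

def sentence_break_values(type_ids, table1, table2, table3):
    """Slide a three-word window over the sentence, one word at a time."""
    return [
        break_value(type_ids[i - 1], type_ids[i], type_ids[i + 1],
                    table1, table2, table3)
        for i in range(1, len(type_ids) - 1)
    ]

# Toy data, invented for illustration only.
t1, t2, t3 = make_table(), make_table(), make_table()
t1[5][2] = 3
t2[5][7] = 4
t3[3][4] = 6
values = sentence_break_values([2, 5, 7], t1, t2, t3)
```

Three 60×60 tables hold 10,800 cells in total, against 216,000 cells for a single 60×60×60 table, which is the size reduction the text refers to. How the first and last words of a sentence are handled is not stated in the source, so this sketch simply skips them.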

Once the break values have been determined (that is, once the level of each word has been set by phrase-break processing I), each stored break value is multiplied by a weight defined according to the total syllable count of the word (a syllable without a final consonant counts as 1, a syllable with a final consonant as 1.5), and the result is stored. The weight is applied because the longer the run of syllables, the shorter of breath a real speaker becomes.
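The syllable weighting can be sketched as follows. The 1 and 1.5 syllable values come from the text; the mapping from a syllable total to a weight is not given in the source, so `example_weight` is a purely hypothetical stand-in, and the function names are likewise assumptions.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_COUNT = 11172   # number of precomposed Hangul syllables

def weighted_syllables(word):
    """Sum 1 per syllable without a final consonant, 1.5 per syllable with one."""
    total = 0.0
    for ch in word:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < HANGUL_COUNT:       # ignore punctuation and other characters
            total += 1.5 if offset % 28 else 1.0
    return total

def example_weight(total):
    """Hypothetical weight function; the source does not give the real mapping."""
    return 1.5 if total >= 5 else 1.0

def apply_weight(break_value, word, weight_for=example_weight):
    """Multiply a stored break value by the weight for the word's syllable total."""
    return break_value * weight_for(weighted_syllables(word))
```

Passing the weight function as a parameter keeps the part taken from the text (the syllable scoring) separate from the part that had to be assumed.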

Next, the number of phrase breaks is set according to the total number of syllables in the sentence, and phrase-break processing II is performed to obtain more natural breaks at the words whose values satisfy that number. In this process the first break position is fixed (dividing the sentence into phrases), the above procedure is repeated for each phrase, and the result is sent to the output unit. In other words, given the break-level information, the number of breaks is decided from the number of syllables in the sentence, and the break values are adjusted accordingly. The number of breaks follows from the (average) number of syllables that can be read naturally in one breath: once that figure is fixed, it determines roughly how many units a given sentence is divided into.

Consider the following sentence as an example.

③우리는 ⑤민족중흥의 ③역사적 ③사명을 ②띠고 ③이땅에 ④태어났다. ("We were born in this land charged with the historic mission of national revival." The circled number before each word is its syllable count.)

This sentence consists of 7 words and 23 syllables. Since 13 to 15 syllables can be spoken in one breath, the number of phrase-break units is 2, and processing by the method described above yields the following break information.
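The count of phrase-break units for the example sentence can be reproduced with a small sketch. Rounding up the ratio of total syllables to syllables per breath is an assumption; the source states only the inputs (23 syllables, 13 to 15 per breath) and the result (2). The function names are also hypothetical.

```python
import math

def count_syllables(word):
    """Count precomposed Hangul syllables, ignoring punctuation."""
    return sum(1 for ch in word if 0xAC00 <= ord(ch) <= 0xD7A3)

def break_unit_count(words, syllables_per_breath=15):
    """Number of units to read, assuming the ratio is rounded up."""
    total = sum(count_syllables(w) for w in words)
    return max(1, math.ceil(total / syllables_per_breath))

words = ["우리는", "민족중흥의", "역사적", "사명을", "띠고", "이땅에", "태어났다."]
counts = [count_syllables(w) for w in words]
units = break_unit_count(words)
```

Under this assumption the result is the same for any per-breath value from 13 to 15, which matches the figure given in the text.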

우리는 ⑤ 민족중흥의 ② 역사적 ① 사명을 ④ 띠고 ⑦ 이땅에 ② 태어났다. (The circled number after each word is the break level at that boundary.)

That is, as described above, it is appropriate to break the sentence at level ⑦, the highest level, and read it in two parts. For still more natural reading, a pause slightly shorter than the one at level ⑦ is placed at the next level, ⑤, which gives a very natural phrase-break reading.

The whole process is traced step by step below, using the example sentence "우리는 민족중흥의 역사적 사명을 띠고 이땅에 태어났다."

The input sentence is separated into words so that the syllables can be counted.

우리는 민족중흥의 역사적 사명을 띠고 이땅에 태어났다.

① (3) ② (5) ③ (3) ④ (3) ⑤ (2) ⑥ (3) ⑦ (4)

Once the input sentence has been separated into words, morphological analysis supplies the part-of-speech information for each word.

10,16  3,2,19  3,18  2,17  34,48  12,3,20  34,43

Given the part-of-speech information, the level of each word is determined from the tables.

우리는 ⑤ 민족중흥의 ② 역사적 ① 사명을 ④ 띠고 ⑦ 이땅에 ② 태어났다.

The sentence above, with the level of each word determined, has 23 syllables in total. Since a person can utter 13 to 15 syllables in one breath, it can be read in two units. The highest level lies between "띠고" and "이땅에", so the sentence is read as "우리는 민족중흥의 역사적 사명을 띠고" and "이땅에 태어났다." The first of the two parts is itself long, with 5 words and 16 syllables (3 + 5 + 3 + 3 + 2), and its highest level lies between "우리는" and "민족중흥의"; the best result is obtained by placing a pause there that is shorter than the main break.
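The selection of the main break and the shorter secondary pause can be sketched with the levels and syllable counts listed for the example sentence. The rule used here for adding a secondary pause (a part longer than 15 syllables gets one at its own highest level) is an assumption that reproduces the described result; the source does not state the exact criterion, and the sketch covers only the two-unit case. The function name is hypothetical.

```python
def choose_breaks(levels, syllables, max_per_breath=15):
    """Return (main_break, secondary_breaks) as boundary indices.

    levels[i] is the break level after word i; syllables[i] is the
    syllable count of word i, so len(levels) == len(syllables) - 1.
    """
    # Main break: the boundary with the highest level.
    main = max(range(len(levels)), key=lambda i: levels[i])
    secondary = []
    # The two parts are words [0..main] and [main+1..last].
    for start, end in ((0, main), (main + 1, len(syllables) - 1)):
        too_long = sum(syllables[start:end + 1]) > max_per_breath
        if too_long and end > start:
            # Assumed rule: pause at the highest level inside the long part.
            secondary.append(max(range(start, end), key=lambda i: levels[i]))
    return main, secondary

levels = [5, 2, 1, 4, 7, 2]        # levels after each of the first six words
syllables = [3, 5, 3, 3, 2, 3, 4]  # syllable counts of the seven words
main, secondary = choose_breaks(levels, syllables)
```

Boundary index 4 lies between "띠고" and "이땅에", and boundary index 0 lies between "우리는" and "민족중흥의", which are the two positions named in the text.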

In general, when input data, that is, a sentence, is converted to speech, the accuracy of the phrase breaks is judged by whether the output speech conveys the correct meaning and sounds acceptable to the listener. Table 1 compares the conversion method of the present invention with the conventional methods in terms of accuracy, the time taken to convert the same sentences, and the complexity of the system needed to implement each.

[Table 1]

As Table 1 shows, speech conversion by conventional method I is advantageous in execution time and system complexity, but it does not produce accurate phrase breaks and so cannot give natural speech output. Conventional method II can improve accuracy, but the execution time rises sharply because of semantic analysis and the amount of memory required becomes enormous, which makes it extremely uneconomical. Phrase breaking with the utterance model of the present invention, by contrast, is the most desirable approach when accuracy, execution time, and system complexity are considered together.

The results in Table 1 were obtained on the National Charter of Education (국민교육헌장), 30 randomly chosen editorials of about 30 sentences each, and about 500 sentences excerpted from middle-school Korean language and history textbooks. Accuracy is a figure obtained by comparison with phrase-break positions marked by hand.

In addition, to assess its applicability, the method of the present invention was applied to a speech synthesizer and a listening test was carried out, giving the results shown in Table 2.

[Table 2]

In Table 2, intelligibility is the result of listening to sentences made up of nonsense words, and naturalness is the average of the ratings given by about 20 listeners who were supplied with evaluation criteria. The criteria used are shown in Table 3.

[Table 3]

As the description above and the evaluation tests show, the present invention generates phrase-break units from morphological analysis and the rules of human utterance rather than from semantic analysis. It therefore needs no vast dictionary for semantic analysis, which makes real-time processing easy and keeps the system simple. In addition, because the phrase-break model is not tied to grammar alone but takes actual human speaking habits into account, the result is more natural than with conventional methods.

Furthermore, because post-processing with weights based on syllable counts is possible, natural speech conversion that reflects the characteristics of human breathing can be performed.

Finally, the number of phrase breaks is determined from the total number of syllables, and the break values of the words are used accordingly, so the phrase-break pattern is not always the same; the method thus comes closer to a model of human utterance.

Claims

A sentence conversion device comprising: an input unit for inputting sentence data; a dictionary unit storing the base forms and part-of-speech information of words; a phrase-break model table defining break levels between words according to their part-of-speech information; a working memory unit temporarily storing the input sentence data one sentence at a time; a control unit that divides the stored sentence into words, searches the dictionary unit to perform morphological analysis of each word, stores the base form and part-of-speech information of each analyzed word in the working memory unit, determines the break levels between the stored words by referring to the phrase-break model table, and selects, from among those break levels, final break levels according to a number of breaks that takes into account the total number of syllables of the input sentence; and an output unit that outputs the stored sentence divided according to the final break levels.

The sentence conversion device of claim 1, wherein the control unit applies a weight based on the number of syllables of each word to the break level determined by referring to the phrase-break model table.

The sentence conversion device of claim 1, wherein the phrase-break model table comprises three tables, each a two-dimensional array of a predetermined size.

A sentence conversion method comprising: a morphological analysis step of dividing an input sentence into words, searching a dictionary that stores the base forms and part-of-speech information of words to perform morphological analysis of each word, and obtaining the base form and part-of-speech information of each word; a phrase-break processing step of determining the break levels between words by referring, according to the part-of-speech information of the words, to a phrase-break model table that defines break levels between words, and selecting, from among those break levels, final break levels according to a number of breaks that takes into account the total number of syllables of the input sentence; and an output step of outputting the input sentence divided according to the final break levels.

The method of claim 4, wherein the number of breaks is determined from the total number of syllables of the input sentence, using one value in the range of 13 to 15 syllables as the reference syllable unit.

The sentence conversion method of claim 4, wherein, in the phrase-break processing step, after the break level of each word that has undergone the morphological analysis step is determined, a weight based on the number of syllables of each word is applied to the determined break level.