KR950002705B1

KR950002705B1 - Speech synthetic system

Info

Publication number: KR950002705B1
Application number: KR1019930000760A
Authority: KR
Inventors: 김형욱
Original assignee: 주식회사금성사; 이헌조
Priority date: 1993-01-21
Filing date: 1993-01-21
Publication date: 1995-03-24
Also published as: KR940018737A

Abstract

A syntactic parser that analyzes the semantic relations of a written text in order to control intonation for speech synthesis comprises: a file input means (1) that reads the text stored in an auxiliary memory device; a morphology analyzing means (2) that finds the structure of the text; a dictionary means (3) that holds the information used for morphological analysis; a syntactic handler (5) that processes the text phrase by phrase; a paragraph/phrase handler (4) that separates phrases from a paragraph; an intonation handler (7) that separates the predefined intonation sections; and an intonation extractor that generates intonation information for speech synthesis.

Description

Syntax Analysis Apparatus and Method for Prosody Control in a Speech Synthesis System

FIG. 1 is a flowchart of speech synthesis processing in a conventional speech synthesis system.

FIG. 2 is a flowchart of speech synthesis processing in a speech synthesis system equipped with the syntax analysis apparatus of the present invention.

FIG. 3 is a block diagram of the syntax analysis apparatus of the present invention.

FIG. 4 is a flowchart of the syntax analysis process of the present invention.

FIG. 5(a) is a flowchart of the dictionary processing performed for each eojeol (a space-delimited word unit) according to the present invention, and FIG. 5(b) is a table showing the types of dictionaries according to the present invention.

FIG. 6(a) is an example of the structure of the information frame according to the present invention, and FIG. 6(b) is a table showing the unique values of the information frame and their meanings.

FIG. 7 is a flowchart of the eojeol processing performed during syntax analysis in the present invention.

FIG. 8 is a flowchart of the noun enumeration process of the present invention.

FIG. 9 is a flowchart of the predicate process of the present invention.

FIG. 10 is a flowchart of the process for eojeols linked by a conjunctive particle in the present invention.

FIG. 11 is a flowchart of the syntax indicator process of the present invention.

FIG. 12 is an example of the output of the syntax analysis processing of the present invention.

FIGS. 13(a), 13(b), and 13(c) are examples of applying intonation to sentences according to the present invention.

FIG. 14 is a flowchart of the morphological analysis process of the present invention.

FIG. 15 is a flowchart of the syntactic analysis process of the present invention.

FIG. 16 is a table explaining the terms and symbols used in each process of the present invention.

* Explanation of reference numerals for the main parts of the drawings

1: file input means; 2: morphological analysis means

3: dictionary means; 4: syntactic analysis means

5: syntax rule processing means; 6: phrase separation means

7: intonation phrase separation means; 8: prosody information extraction means

The present invention relates to a syntax analysis apparatus and a syntax analysis method that, in a speech synthesis system which receives an input sentence and outputs it as synthesized speech, parse the sentence to be synthesized so that speech can be synthesized under prosody control. In particular, it relates to a syntax analysis apparatus and method for prosody control in a Korean text-to-speech converter, which improve the naturalness and intelligibility of the synthesized speech by parsing Korean and using the result to control Korean prosody.

As shown in FIG. 1, a conventional speech synthesis system functionally consists of a sentence preprocessing unit that handles long vowels, irregular sound changes, and the like in the input sentence; a prosody realization and phonological phenomenon processing unit that realizes prosody and processes phonological phenomena for the preprocessed sentence; and a speech synthesis unit that synthesizes the prosodically and phonologically processed sentence into speech and outputs it. The speech synthesis process proceeds as follows.

The sentence preprocessing unit performs Korean phonological processing on the input Korean sentence and, as a result, converts the sentence into its pronounced form.

The sentence processing information needed by the sentence preprocessing unit is supplied by a long-vowel dictionary, an irregular phonological change dictionary, and a particle and ending dictionary. Based on the information from these dictionaries, the sentence preprocessing unit handles long vowels, irregular forms, and the like in the input sentence (INPUT TEXT) and supplies the result to the prosody realization and phonological phenomenon processing unit.

Using the result from the sentence preprocessing unit, the prosody realization and phonological phenomenon processing unit realizes Korean prosody by processing phonological phenomena, stress, syllable length, intonation changes within an eojeol (a space-delimited word unit), and so on, according to the composition of the Hangul graphemes.

The information needed for prosody realization and phonological processing is supplied from a database of pitch contour information (PITCH contour DBASE), and the energy information for prosody realization is likewise supplied from a database (ENERGY contour DBASE).

The sentence processed by the prosody realization and phonological phenomenon processing unit using the information from these databases is supplied to the speech synthesis unit.

The speech synthesis unit turns the input sentence information into sound according to the speech data provided by the speech database (speech DBASE) and outputs the speech corresponding to the input sentence.

That is, the speech database stores speech data in units of phonemes, syllables, or demisyllables and supplies them to the speech synthesis unit, which concatenates them according to the input sentence to produce sound, processing the prosodic elements at that point so that the synthesized speech approximates natural speech.

However, because such a conventional speech synthesis system does not first analyze the Korean sentence, the synthesized speech sounds unnatural.

Broadly speaking, Korean prosody comprises the pauses between eojeols, the overall intonation phrases that depend on the sentence, stress, syllable length, and so on. Of these, stress and syllable length are determined by the composition of the graphemes, that is, by the combination of the initial, medial, and final sounds of a word, and can therefore be controlled by rule.

The pauses between eojeols and the intonation phrases, however, could not be controlled as naturally as in human speech.

Admittedly, a minimal unit of intonation called an intonation phrase has been applied to each eojeol, but intonation and pauses in a sentence are determined at the sentence level from semantic, syntactic, and segmental properties and cannot be assigned at the level of the eojeol or syllable.

Consequently, unless the sentence is analyzed first, the quality of the synthesized speech can hardly be expected to exceed a monotonous machine voice. Such a system cannot keep up with the growing demand for human-machine information exchange in areas such as multimedia (MULTIMEDIA), and it cannot produce high-quality synthesized speech that is as natural as the human voice.

An object of the present invention is to provide syntax analysis means that identifies the grammatical relations between the eojeols of a sentence to be synthesized and uses them to extract the information needed for prosody control in a speech synthesizer.

In particular, an object of the present invention is to provide syntax analysis means for Korean that identifies the grammatical connections between the eojeols of a Korean sentence, for example by treating the sentence as an opposition between a subject part (NP: NOUN PHRASE) and a predicate part (VP: VERB PHRASE), and extracts from these connections the information needed to control Korean prosody, so that the synthesized speech approximates natural speech as closely as possible and its quality is improved.

A further object of the present invention is to provide a speech synthesis system in which the naturalness and intelligibility of the synthesized speech are improved by parsing Korean.

The syntax analysis apparatus of the present invention, which achieves the above objects, and the syntax analysis method performed in it are described below with reference to the accompanying FIGS. 2 to 16.

First, FIG. 2 shows the functional configuration of a speech synthesis system that includes the syntax analyzer of the present invention.

That is, syntax analysis information (PARSING information) is extracted on the basis of the phonological sentence processing information supplied by the sentence preprocessing unit, and this syntax analysis information (intonation phrases, phrases, the lengths of the content part and function part of each eojeol, and so on) is supplied to the prosody realization and phonological phenomenon processing unit. The intonation pattern for the whole sentence is thereby obtained and the pauses between eojeols are adjusted, bringing the synthesized speech closer to natural speech.

The syntax analysis performed by the syntax analyzer first analyzes the internal composition of each eojeol of the input sentence through per-eojeol dictionary processing and then identifies the grammatical relations between neighboring eojeols.

That is, the syntax analyzer analyzes the overall syntactic structure as a combination of syntactic elements and accordingly treats the sentence as an opposition between subject part and predicate part. Based on the grammatical connections between eojeols and the composition of the syntax, it separates phrases and intonation phrases from an arbitrary sentence and also determines the lengths of the content part and function part of each eojeol.

The overall configuration of this syntax analyzer is shown in FIG. 3, and the overall syntax analysis process it performs is shown in FIG. 4.

Referring first to FIG. 3, the syntax analysis apparatus of the present invention comprises: file input means (1) for reading the sentence to be synthesized; morphological analysis means (2) for analyzing the structure of the read sentence through morphological analysis; dictionary means (3) for providing the per-eojeol analysis information needed by the morphological analysis means; syntactic analysis means (4) for receiving the result of the morphological analysis means and performing syntactic analysis of the sentence; syntax rule processing means (5) for processing the syntax rules needed by the syntactic analysis means; phrase separation means (6) for receiving the result of the syntactic analysis means and separating the sentence into phrases; intonation phrase separation means (7) for receiving the result of the syntactic analysis means and separating out the intonation phrases of the sentence; and prosody information extraction means (8) for receiving the phrase separation result and the intonation phrase separation result and extracting prosody information.

The parsing of a Korean sentence by the apparatus configured in this way consists of a morphological analysis process, in which information on each eojeol of the input sentence is obtained by morpheme analysis using a small morpheme dictionary and entered into a word-form frame, and a syntactic analysis process, in which syntax analysis is carried out on the word-form frames, the eojeols are divided into sentence elements as the result of that analysis, phrases and intonation phrases are then separated, and the syntax analysis information is attached and output sentence by sentence.

The morphological analysis process consists of a first step of inputting the sentence to be synthesized; a second step of splitting the eojeols of the input sentence into initial, medial, and final sounds; a third step of searching the dictionaries with the split eojeols to determine the internal composition of each eojeol and build the information frame of the word form; and a fourth step of rebuilding the information frame by correcting errors through morphological analysis of the frame obtained in the preceding step.

That is, as shown in FIG. 14, the morphological analysis process splits the eojeols of the input sentence into initial, medial, and final sounds, builds the word-form frame through a per-eojeol dictionary search, and rebuilds the word-form frame after correcting errors through morphological analysis.

The syntactic analysis process consists of a first step of inputting the word-form information frames for one sentence; a second step of performing syntax-analysis eojeol processing based on the input information frames; a third step of separating the phrases and intonation phrases of the sentence from the syntax analysis result; and a fourth step of adjusting the prosody processing values to be applied according to the number of eojeols and syllables within the separated phrases and intonation phrases, attaching the syntax analysis information to the input sentence, and outputting it.

That is, as shown in FIG. 15, the syntactic analysis process receives the word-form frames produced by the morphological analysis process, performs syntax analysis, separates phrases and intonation phrases based on the result, and outputs each sentence with this syntax analysis information (PARSING information) attached.

FIG. 4 shows the overall flow of the syntax analysis process of the present invention described above. With reference to it, the syntax analysis process performed in the syntax analysis apparatus of FIG. 3 is described as follows.

First, the file input means (1) reads the Korean sentences to be synthesized.

The input file is passed to the morphological analysis means (2), which splits it into sentences.

Then, to analyze the internal composition of each eojeol, it reads the information in the dictionary means (3), which consists mainly of bound morphemes (BOUND FORM MORPHEME) such as endings and particles together with a limited set of free morphemes (FREE FORM MORPHEME), and performs dictionary processing after dividing each eojeol into initial, medial, and final sounds.

The dictionary means (3) is searched in the order shown in FIG. 5(a), and the dictionary information is supplied from the dictionary groups of the database shown in FIG. 5(b).

The sequence shown in FIG. 5(a) is the order needed to extract the maximum information from limited dictionaries: the dictionaries are searched in the order sub-sentence indicator (부문지표) dictionaries 1 and 2, element-class (소요류) dictionary, part-of-speech group dictionary, ending dictionary, part-of-speech group dictionary, ending and particle dictionary, part-of-speech group dictionary, and form indicator dictionary.

The dictionary means (3) holds, as a database, the dictionary groups that provide this sentence analysis information: the part-of-speech dictionary group, the element dictionary group, the form indicator dictionary group, case/ending dictionary groups 1 and 2, and sub-sentence indicator dictionary groups 1 and 2.

Each dictionary group is a collection of dictionaries made up of morphological elements of the same class.

Information is retrieved from these dictionaries by dividing the eojeol into initial, medial, and final sounds and matching it against a dictionary. When a match is found, the matched portion is cut from the eojeol and the remainder is matched against the next dictionary group. Two cases are distinguished.

1) Independent word search

The incoming eojeol is matched as a whole; the part-of-speech dictionary group and the element dictionary group are searched in this way.

2) Dependent word search

This search looks for the information needed to divide the eojeol into elements, mainly by means of endings and particles. The eojeol is matched from the back toward the front.

For example, splitting the eojeol '앞으로' ('forward') into initial, medial, and final sounds gives 'ㅇ ㅏ ㅍ ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ'. An independent word search looks up 'ㅇ ㅏ ㅍ ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ' as a whole eojeol. A dependent word search keeps the end fixed, removes one element at a time from the front, and looks for a match in the order (1) ㅏ ㅍ ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ, (2) ㅍ ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ, (3) ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ, (4) ㅡ ㅇ ㄹ ㅗ ㅇ, (5) ㅇ ㄹ ㅗ ㅇ, (6) ㄹ ㅗ ㅇ, (7) ㅗ ㅇ, (8) ㅇ, retrieving the information when a match is found.

In this example the eojeol is the noun '앞' combined with the particle '으로', so '으로' is found first, and the remaining 'ㅇ ㅏ ㅍ' is passed to the search routine for the next dictionary group, where '앞' is found as a noun.
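The back-to-front matching in this example can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the function names `decompose` and `dependent_search`, the two toy dictionaries, and the use of 'ㅇ' as a placeholder for an empty final sound (mirroring the listing in the example) are all hypothetical.

```python
# Jamo tables in the Unicode order used by precomposed Hangul syllables.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
PLACEHOLDER = "ㅇ"  # assumption: fills the slot of an empty final sound


def decompose(word):
    """Split each Hangul syllable into initial, medial, and final sounds."""
    jamo = []
    for ch in word:
        index = ord(ch) - 0xAC00
        if not 0 <= index < 11172:
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        jamo.append(INITIALS[index // 588])
        jamo.append(MEDIALS[(index % 588) // 28])
        jamo.append(FINALS[index % 28] or PLACEHOLDER)
    return jamo


# Toy dictionaries keyed by jamo sequences (hypothetical contents).
PARTICLES = {tuple(decompose("으로")): "particle"}
NOUNS = {tuple(decompose("앞")): "noun"}


def dependent_search(word):
    """Drop one jamo at a time from the front until the tail matches a particle."""
    jamo = decompose(word)
    for item in range(1, len(jamo)):
        tail = tuple(jamo[item:])
        if tail in PARTICLES:
            stem = tuple(jamo[:item])
            # The unmatched remainder is handed to the next dictionary group.
            return item, PARTICLES[tail], NOUNS.get(stem)
    return None


print(dependent_search("앞으로"))  # (3, 'particle', 'noun')
```

The returned item number corresponds to the numbered candidates in the example: the tail 'ㅇ ㅡ ㅇ ㄹ ㅗ ㅇ' is the third one tried.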

The dependent word search for this eojeol therefore completes at item (3). Once the internal structure of the input sentence has been analyzed against the dictionaries in this way, it is entered into a word-form frame consisting of seven cells, as in the example of FIG. 6(a).

The information frame of a word form consists of seven cells (CELL): element, part of speech, indicator, case, ending, case/ending, sentence indicator, and sentence symbol.

The value entered in each cell is the unique value of the matched dictionary; as shown in FIG. 6(b), these unique values represent the characteristics of the morphemes.

This completes the morphological eojeol analysis and the construction of the word-form frame in FIG. 4. The values in the word-form frame are then analyzed according to morphological rules to correct erroneous cell values and to separate the content part from the function part.

This completes the morphological analysis of the input sentence by the morphological analysis means (2) of the syntax analysis apparatus of the present invention.

This morphological analysis information is supplied to the syntactic analysis means (4), which, based on the input information and with reference to the syntax rule processing means (5), performs the syntax-analysis eojeol processing shown in FIG. 7.

The syntax-analysis eojeol processing consists of a noun enumeration process (A), a predicate process (B), a process for eojeols linked by a conjunctive particle (C), a sub-sentence indicator process (D), a process of converting eojeols into element values (E), a process of separating linked compound sentences (F), and a process of separating embedded compound sentences (G). The syntax analysis of the present invention is described process by process with reference to FIGS. 8 to 11.

The symbols and terms used in FIGS. 8 to 11 are explained in FIG. 16.

1. Noun enumeration process (A)

The noun enumeration process consists of: a first step of receiving the information frame of each eojeol; a second step of checking whether the input eojeol has a particle or ending (to, 토) and whether it is an independent morpheme; a third step of checking for a sentence symbol if the second step finds a to or an independent morpheme; a fourth step of terminating if the third step finds a sentence symbol and otherwise returning to the first step to analyze the next eojeol; a fifth step of checking for a comma if the second step finds neither an independent morpheme nor a to; a sixth step of, if no comma is found, checking whether the following eojeol is a syntactic element, binding the following eojeol to the current eojeol and returning to the first step if it is, and simply returning to the first step if it is not; a seventh step of, if a comma is found, checking whether the preceding eojeol contains a modifier; and an eighth step of, if a modifier is found, ending the routine and returning to the first step to receive the next eojeol, and, if the preceding eojeol has no modifier, judging the eojeol to be part of a noun enumeration and returning to the first step to receive the next eojeol.

As shown in FIG. 8, when particles are omitted from eojeols or when nouns are enumerated, the sequence is treated as a single noun.

An example of omitted particles is '국민교육현장' ('national education site'); an example of a noun enumeration is '산, 바다, 강으로' ('to the mountains, the sea, and the rivers').

Accordingly, as shown in FIG. 8, once information frames like those of FIG. 12 have been obtained by building the information of FIG. 6 for each eojeol, they are analyzed and processed one eojeol at a time.

That is, when the information frame (TEMPLETE) of an eojeol is input in step (A), step (B) checks whether the eojeol contains a to (particle or ending) and also whether the input eojeol is an independent morpheme.

If the check finds neither an independent morpheme nor a to, a comma (COMMA) check is made (step C). If there is a comma, step (F) checks whether the preceding eojeol modifies the current eojeol; if the current eojeol is modified, the routine stops at step (H) and the next eojeol is analyzed (step I).

If the preceding eojeol contains no modifier, step (G) takes the next eojeol, judges the sequence to be a noun enumeration, and repeats the routine from the beginning.

If step (B) finds an independent morpheme or a to, step (J) checks for a sentence symbol (SENTENSEMARK). If there is one, the routine ends (step K); if not, the next eojeol is analyzed in the same way via step (I).

If, on the other hand, the comma check (step C) finds no comma, the following eojeol is checked to see whether it is a syntactic element (step D); if it is, the following eojeol and the current eojeol are bound together and the constant is initialized.

The syntactic element check (step D) looks for a to or for a syntactic element.
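The grouping behavior of this routine can be illustrated with a short Python sketch. It is a deliberate simplification under stated assumptions, not the flowchart of FIG. 8: the `Eojeol` record, its field names, and the function `group_nouns` are hypothetical, and the check of whether the following eojeol is a syntactic element (step D) is reduced to keeping the group open.

```python
from dataclasses import dataclass


@dataclass
class Eojeol:
    text: str
    has_to: bool = False         # a particle or ending is attached
    independent: bool = False    # the eojeol is an independent morpheme
    comma: bool = False          # a comma follows the eojeol
    sentence_mark: bool = False  # a sentence symbol follows the eojeol
    modified: bool = False       # the preceding eojeol modifies this one


def group_nouns(eojeols):
    """Bind bare nouns to the eojeols that follow them (simplified)."""
    groups, current = [], []
    for e in eojeols:
        current.append(e.text)
        bare = not (e.has_to or e.independent)
        # A bare eojeol keeps the group open unless it is followed by a
        # comma and is itself modified by the preceding eojeol.
        keep_open = bare and not (e.comma and e.modified)
        if not keep_open:
            groups.append(current)
            current = []
        if e.sentence_mark:
            break
    if current:
        groups.append(current)
    return groups


sentence = [
    Eojeol("산", comma=True),
    Eojeol("바다", comma=True),
    Eojeol("강으로", has_to=True),
    Eojeol("간다", has_to=True, sentence_mark=True),
]
print(group_nouns(sentence))  # [['산', '바다', '강으로'], ['간다']]
```

In this sketch the enumerated nouns are collected into one group that closes at the first eojeol carrying a particle, which matches the idea of treating the enumeration as a single noun.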

When the noun enumeration processing is complete, the predicate processing shown in FIG. 9 is performed.

2. Predicate process (B)

The predicate process consists of: a first step of receiving the information frame of each eojeol; a second step of determining the predicate by finding combinations of particles and substantives in the input eojeol, analyzing the part of speech of the eojeol, and classifying the predicate-word forms in the sentence by their endings and indicators; a third step of, if the eojeol is found to be a predicate-word form, processing the transformative, connective, final, and dependent endings as predicate endings, incrementing the count of bound eojeols, and then checking for a comma and a sentence symbol; a fourth step of, if the second step finds a substantive form or a peripheral word and the routine is continuing from a preceding eojeol, performing grouping and then checking for a comma and a sentence symbol; a fifth step of checking for a comma and a sentence symbol if the second step finds a sub-sentence indicator; and a sixth step of terminating if a comma or sentence symbol is found and otherwise returning to the first step to analyze the next eojeol.

The predicate process finds the predicate. A predicate may be formed by several consecutive eojeols, in which case they are treated as a single predicate.

예를들면 '그날이 오고만다'라는 문장에서 '오고만다'라는 술어를 찾는 것이다. 즉, 제9도를 참조하면, 정보틀에서 한 어절씩 입력하고(단계 A), 이로부터 서술어의 판단을 수행한다.For example, the phrase "coming soon" is found in the phrase "coming soon". That is, referring to FIG. 9, the words are input one by one in the information frame (step A), and the judgment of the predicate is performed therefrom.

상기 서술어의 판단은 조사들과 체언의 결합을 찻아내고 어절에 대한 품사를 분석하며 어미와 지표소로서 문장안에서 용언형을 분류하는 것이다.Judgment of the predicate is to take out the combination of surveys and verbs, to analyze parts of speech for words, and to classify verbal forms in sentences as mothers and indicators.

이와같은 서술어 판단결과, 체언형이나 지엽요소인 어절은 단계(C.D.E)를 거쳐 그루핑(GROUPING) 처리단계(M)로 이행하고 용언형인 어절은 단계(B)를 거쳐 서술어미로 전성어미, 연결어미, 의존형어미의 처리를 수행하며(단계 F,G,H), 묶이는 어절은 어절수(j)를 증가하고 처리루틴의 계속여부를 나타내는 지표(end)를 1로 한다.(단계 I) end=1이면 루틴은 계속된다.As a result of judging such a predicate, the word which is a teletype or a local element is transferred to a grouping processing step (M) through a step (CDE), and the word that is a verbal type is a predicate ending, a connecting word, The dependent words are processed (steps F, G, and H), and the bound words increase the number of words (j) and the index (end) indicating whether the processing routine is continued is set to 1 (step I). If so, the routine continues.

Next, a comma or punctuation mark is searched for (step J). If one is present, processing terminates; if not, the next word is analyzed (step K).

If the determination finds a substantive-type or peripheral word and a routine is continuing from the preceding words, the words are grouped (step M), end is set to 0 to close the routine (step N), and the punctuation search step is performed (step J).

If the determination finds a clause indicator (step L), punctuation is searched for (step J), the next word is analyzed (steps K, B, C), and the words are then either grouped (step M) or the routine is exited. Predicate processing is thus completed, and conjunctive-particle processing follows.
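The grouping behavior of the predicate processing described above can be illustrated with a minimal Python sketch. It is not the flow of FIG. 9 itself: the `Word` record, its `kind` labels, and the example tokens are hypothetical, and only the rule that consecutive verbal words form one predicate is modeled.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    kind: str             # hypothetical labels: "verbal", "substantive", "peripheral"
    punct: bool = False   # True when a comma or punctuation mark follows the word


def group_predicates(words):
    """Collect runs of consecutive verbal words as single predicates."""
    predicates = []
    run = []  # words gathered while the routine continues (end = 1)
    for word in words:
        if word.kind == "verbal":
            run.append(word.text)    # corresponds to incrementing the word count j
        elif run:
            predicates.append(run)   # a non-verbal word closes the run (end = 0)
            run = []
        if word.punct and run:
            predicates.append(run)   # a comma or punctuation mark also closes the run
            run = []
    if run:
        predicates.append(run)
    return predicates


# Hypothetical two-word variant of the example sentence, for illustration only.
sentence = [
    Word("그날이", "substantive"),
    Word("오고야", "verbal"),
    Word("만다", "verbal", punct=True),
]
print(group_predicates(sentence))
```

The sketch keeps a single running list instead of the j and end variables of the description, which is a simplification chosen for readability.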

3. Processing of linked words joined by conjunctive particles (C)

The conjunctive-particle processing comprises: a first step of receiving the information frame for each word; a second step of searching the information frame of the input word for a conjunctive particle; a third step of, if no conjunctive particle is found, searching for a punctuation mark, terminating if one is present, and otherwise searching the next word for a conjunctive particle; a fourth step of, if the second step finds a conjunctive particle, checking whether the preceding word is an adnominal modifier and, if so, checking whether the next word is also an adnominal modifier; a fifth step of, if the fourth step finds that the next word is a substantive other than an adnominal modifier, incrementing the number of grouped words and continuing the conjunctive-particle routine for the next word; a sixth step of, if the fourth step finds that the next word is a noun sequence, or that the preceding word and the current word are both adnominal modifiers, performing grouping and returning to the second step to search the next word for a conjunctive particle; and a seventh step of, if the fourth step finds neither a substantive, a noun sequence, nor an adnominal modifier, changing the conjunctive particle to a case particle and searching for a punctuation mark.

The conjunctive-particle processing handles linked words as follows.

Substantive-type words joined by a conjunctive particle are grouped and treated as a single substantive. Examples of conjunctive particles are '와', '과', '랑', '나', and '에나'.

To process substantives linked by a conjunctive particle, as shown in FIG. 10, the information frame of the input word is first searched for a conjunctive particle (steps A, B).

If no conjunctive particle is found, punctuation is checked (step D). If a punctuation mark is present, processing terminates (step N); if not, the next word is fetched via step S.

If a conjunctive particle is found, qbs is set to 1 and the conjunctive-particle routine is entered.

Because linked substantives in Korean tend to be symmetric, step E checks whether the preceding word is an adnominal modifier; if it is, nb is set to 1 and the next word is checked to see whether it is also an adnominal modifier (steps F, G).

If the check finds that the next word is a substantive (step G), nf is set to 1 (step J); if it is a noun sequence (step H), nf is set to 2 (step K); and if it is an adnominal modifier while nb = 1 (step I), nf is set to 3 (step L).

If the next word is none of a substantive, a noun sequence, or an adnominal modifier, the conjunctive particle is changed to a case particle (step M), and control passes to the punctuation search step and leaves the current routine.

If nf = 1 (step P), the words gathered in the routine are grouped (step Q), the constants are initialized (step R), and the next word is input.

If nf is not 1, the number of grouped words nm is incremented (step O) and the conjunctive-particle routine (F, G, H, I, J, K, L, P, Q) continues with the next word.

In this way a chain of linked substantive phrases can be grouped into a single word, which completes the conjunctive-particle processing. Clause-indicator processing is then performed.
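As a hedged illustration of the idea that substantives joined by a conjunctive particle are treated as one unit, the following Python sketch groups such words. The tuple layout `(stem, particle, is_substantive)` and the sample sentence are assumptions made for the example; the nb, nf, and nm variables of FIG. 10 and the adnominal-modifier symmetry check are not modeled.

```python
# Conjunctive particles listed in the description.
CONJUNCTIVE_PARTICLES = {"와", "과", "랑", "나", "에나"}


def group_conjoined(words):
    """Group substantives linked by a conjunctive particle.

    words: list of (stem, particle, is_substantive) tuples (hypothetical layout).
    Returns a list of groups, each a list of surface words.
    """
    groups = []
    current = []
    for i, (stem, particle, is_substantive) in enumerate(words):
        current.append(stem + particle)
        next_is_substantive = i + 1 < len(words) and words[i + 1][2]
        # The particle links two words only when a substantive follows;
        # otherwise the description reinterprets it as a case particle.
        links = (
            is_substantive
            and particle in CONJUNCTIVE_PARTICLES
            and next_is_substantive
        )
        if not links:
            groups.append(current)
            current = []
    return groups


# Hypothetical example: "사과와 배를 먹는다" ("eats an apple and a pear").
example = [("사과", "와", True), ("배", "를", True), ("먹는다", "", False)]
print(group_conjoined(example))
```

Here the first two words form one group because '와' is followed by another substantive, while the verb stays on its own.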

4. Clause indicator (부문지표) processing (D)

The clause-indicator processing comprises: a first step of receiving the information frame for each word; a second step of analyzing the input information frame to identify an indicator; a third step of searching for a comma if the indicator is a complement indicator, an adnominal indicator, or an adverbial indicator; a fourth step of, if the second step finds none of these indicators, searching for a sentence indicator, terminating if one is present, and otherwise returning to the second step to search the next word for an indicator; a fifth step of, if the comma search in the third step finds no comma, analyzing the next word and, if it is an adnominal, complement, or adverbial indicator word that links with the previously processed value, grouping the words and terminating; and a sixth step of, if the next word in the fifth step is not a complement, adnominal, or adverbial indicator word, returning to the second step to search the next word for an indicator.

The clause-indicator processing proceeds as follows.

Every compounding transformation of sentences can be classed as either embedding or conjunction, and the morphemes used to compound sentences are the clause indicators.

A clause indicator can be used together with an indicator connector of the compound sentence, and analyzing these reveals where clauses are embedded or listed within the compound sentence.

The linkage patterns of clause indicators are: (1) complement indicator (예, …) … complement-indicator connector (관하여 "regarding", 대하여 "about", …); (2) adnominal indicator (으로, …) … adnominal-indicator connector (인한 "caused by", …); (3) adverbial indicator (adnominal indicator, complement indicator) … adverbial-indicator connector (까닭에, 대로, 듯이, 싶이, …).

This processing is described with reference to FIG. 11.

When a word is input in the input step (A), the search steps (B, C, D) look for complement, adnominal, and adverbial indicators and set the corresponding constant to 1 (steps E, P, G).

For a word that matches none of the complement, adnominal, or adverbial indicators in the search steps, either the next word is input (steps J, O, P) or the routine is exited entirely (step Q).

If an indicator is found, the comma search step (H) looks for a comma to decide whether the routine continues.

That is, if a comma is present, the routine exits via step J; if not, i is incremented (i++) and the next word is analyzed in the following pass of the routine.

If the analysis shows that the next word is an adnominal, adverbial, or complement indicator word that links with the previously processed value (steps K, L, M), the words are grouped in the grouping step (N) and the routine ends (step J).
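The pairing of an indicator with its connector, interrupted by a comma, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the routine of FIG. 11: the `(text, particle, has_comma)` tuple layout, the `INDICATOR_CONNECTORS` table, and the sample phrase are hypothetical, and only the 으로 … 인한 pattern named in the description is included.

```python
# Hypothetical lookup table: indicator particle -> connectors that may follow it.
# Only the "으로 ... 인한" pattern from the description is listed here.
INDICATOR_CONNECTORS = {"으로": {"인한"}}


def group_indicator_pairs(words):
    """Group a word carrying an indicator with the connector that follows it.

    words: list of (text, particle, has_comma) tuples (hypothetical layout).
    A comma after the indicator word prevents grouping.
    """
    groups = []
    i = 0
    while i < len(words):
        text, particle, has_comma = words[i]
        connectors = INDICATOR_CONNECTORS.get(particle)
        next_text = words[i + 1][0] if i + 1 < len(words) else None
        if connectors and not has_comma and next_text in connectors:
            groups.append([text, next_text])  # indicator and connector form one unit
            i += 2
        else:
            groups.append([text])
            i += 1
    return groups


# Hypothetical example: "지진으로 인한 피해" ("damage caused by an earthquake").
phrase = [("지진으로", "으로", False), ("인한", "", False), ("피해", "", False)]
print(group_indicator_pairs(phrase))
```

A table keyed by the indicator makes it straightforward to add the other linkage patterns once their members are known.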

When the clause-indicator processing is complete, the words are converted to element values (step E), and the clause indicators are used to find the embedded and listed clauses within the compound sentence, so that conjoined and embedded compound sentences are cut apart (steps F, G).

When the syntactic analysis means (4) has completed the syntactic analysis in this way, the words are classified into syntactic elements, and the phrase separation means (6) and the intonation phrase separation means (7) cut out the phrases and intonation phrases.

That is, the syntactic elements are divided into peripheral and essential elements, their modification relations are examined to group them into intonation phrases, and the phrases are then cut around the predicate.

Peripheral elements are, for example: 1. adnominal modifiers, which are bound to a substantive in an adnominal relation; 2. adverbial modifiers, which are bound to a verbal in an adverbial relation. Essential elements are, for example: 1. subject, 2. object, 3. complement, 4. predicate.

Once the phrases are cut, the prosody information extraction means (8) changes the prosody processing value to be applied according to the number of words and syllables in each phrase and intonation phrase (changing the PARSING LEVEL). The prosody information is then attached to the sentence and output, as in the example of FIG. 12, which completes prosody control by syntactic analysis of Korean.

In FIG. 12, each value in an information frame is the unique value of a morpheme making up the word. Negative (-) values are values that were removed once the error correction by morphological analysis was completed.

From the unique values in the information frame of each word and its grammatical relations with neighboring words, a syntactic element value (CV value) is obtained; this value expresses the function of the word in the sentence as a syntactic element. The meaning of each CV value follows the dictionary unique values shown in FIG. 6(b). For example, CV values 40 and 49 both denote kinds of adnominal modifier.

The parsing result (RESULT) shows the sentence divided into phrases and intonation phrases according to how the syntactic element values (CV) combine.

An intonation phrase is the smallest unit to which intonation can be applied and is determined by the grammatical links between words. In the result, '2' marks an intonation phrase boundary and '10' marks a phrase boundary.

Using the values found above, the prosody realization and phonological processing unit attaches the parsing results for word, intonation phrase, and phrase boundaries to the input sentence, and the speech synthesis unit outputs a synthesis result such as the following.

"조상의 # 빛난 얼을 # 오늘에 # 되살려// 안으로# 자주 독립의 # 자세를 # 확립하고//밖으로 # 인류공영에 # 이바지 # 할때다//" (roughly: "It is time to revive today the shining spirit of our ancestors, to establish an attitude of self-reliant independence within, and to contribute to the common prosperity of humanity without.")

In this output, a space marks a word boundary, # marks an intonation phrase boundary, and // marks a phrase boundary.
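The boundary notation just described can be read back into a nested structure with a few lines of Python. This is only a sketch of how a downstream component might consume the marked sentence; the function name `parse_marked_sentence` is hypothetical and not part of the described system.

```python
def parse_marked_sentence(text):
    """Split a marked sentence into phrases, intonation phrases, and words.

    "//" separates phrases, "#" separates intonation phrases,
    and whitespace separates words.
    """
    phrases = []
    for phrase in text.split("//"):
        if not phrase.strip():
            continue  # skip the empty piece after a trailing "//"
        intonation_phrases = [
            chunk.split() for chunk in phrase.split("#") if chunk.strip()
        ]
        phrases.append(intonation_phrases)
    return phrases


marked = "조상의 # 빛난 얼을 # 오늘에 # 되살려// 안으로# 자주 독립의 # 자세를 # 확립하고//"
for phrase in parse_marked_sentence(marked):
    print(phrase)
```

Each inner list holds the words of one intonation phrase, so the lengths of the lists give the word counts that the prosody extraction step depends on.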

Using the sentence processed in this way, a contour (CONTOUR) such as that of FIG. 13 can be drawn, and the pauses between words can be adjusted.

The rules for adjusting intonation and inter-word pauses within a sentence are as follows.

1. Intonation (INTONATION): the rise and fall of the voice, determined by the pitch (PITCH) of the speech; the pitch varies with the sentence type and the syntactic structure.

(1) PITCH SENTENCE = PITCH BASE LINE + PITCH INTONATION PATTERN

This rule gives the intonation of the whole sentence; PITCH BASE LINE is given by the following expression.

(2) PITCH BASE LINE(t) = (EPH - 1PH) * t / TLS + 1PH
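Rules (1) and (2) amount to a linear baseline plus a pattern offset, which the following Python sketch evaluates. The interpretation of the symbols is an assumption: the term written 1PH is taken as the baseline pitch at t = 0, EPH as the baseline pitch at t = TLS, and TLS as the total length of the sentence. The parameter names and the numeric values are hypothetical.

```python
def pitch_base_line(t, start_pitch, end_pitch, total_length):
    """Rule (2): linear interpolation from start_pitch (t = 0) to end_pitch (t = total_length)."""
    return (end_pitch - start_pitch) * t / total_length + start_pitch


def pitch_sentence(t, start_pitch, end_pitch, total_length, pattern_offset):
    """Rule (1): baseline pitch plus the intonation pattern contribution at time t."""
    return pitch_base_line(t, start_pitch, end_pitch, total_length) + pattern_offset


# Hypothetical values: a baseline falling from 120 to 80 over a 2.0-unit sentence.
for t in (0.0, 1.0, 2.0):
    print(t, pitch_base_line(t, 120.0, 80.0, 2.0))
```

With these sample values the baseline falls steadily across the sentence; the intonation pattern of each intonation phrase would then be added on top as `pattern_offset`.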

As FIG. 13(b) shows, the pitch tends to reset (RESETTING) at phrase boundaries.

(3) The pitch pattern (PITCH PATTERN) for each intonation phrase is obtained as shown in FIG. 13(c).

2. Pause (PAUSE)

(1) Pause between syllables: determined by the initial and final consonants of the syllables.

(2) Pause between words: the value differs according to the syntactic structure, that is, between sentences, between phrases, between intonation phrases, and between words.

Compared with conventional speech synthesis systems, extracting prosody information in this way and synthesizing speech from it improves the naturalness and intelligibility of the synthesized sound, so that machine-generated speech, as a means of conveying human acoustic information, can attain a high quality close to that of human speech.

As described above, according to the present invention the speech synthesis system performs prosody control by parsing Korean syntax, so that high-quality synthesized speech resembling the human voice can be obtained. When applied to systems that use synthesized speech, such as an ARS (automatic response system) or an automatic switching system, the invention improves the intelligibility of the speech, supports accurate delivery of spoken information, and provides high reliability in conveying that information.

Claims

A syntactic analysis apparatus for prosody control in a speech synthesis system, comprising: file input means (1) for reading the sentence to be synthesized; morphological analysis means (2) for analyzing the structure of the read sentence through morphological analysis; dictionary means (3) for providing per-word analysis information to the morphological analysis means; syntactic analysis means (4) for receiving the result of the morphological analysis means and performing syntactic analysis of the sentence; syntax rule processing means (5) for processing the syntax rules required by the syntactic analysis means; phrase separation means (6) for receiving the result of the syntactic analysis means and separating the sentence into phrases; intonation phrase separation means (7) for receiving the result of the syntactic analysis means and separating the sentence into intonation phrases; and prosody information extraction means (8) for receiving the phrase separation result and the intonation phrase separation result and extracting prosody information.

The apparatus of claim 1, wherein the dictionary means (3) comprises a part-of-speech dictionary group, an element dictionary group, a form indicator dictionary group, a case and ending dictionary group, and a clause indicator dictionary group.

A syntactic analysis method for prosody control in a speech synthesis system, comprising: a morphological analysis process of building an information frame for each word by separating the input sentence into words and searching a dictionary for each word; and a syntactic analysis process of receiving the information frames from the preceding process, separating phrases and intonation phrases, and outputting the resulting parsing information unit by unit.

The method of claim 3, wherein the morphological analysis process comprises: a first process of inputting the sentence to be synthesized; a second process of separating the words of the input sentence into initial, medial, and final sounds; a third process of building the information frame of each word by determining its internal structure through a per-word dictionary search; and a fourth process of rebuilding the information frame through error correction by morphological analysis of the information frame obtained in the preceding process.

The method of claim 3, wherein the syntactic analysis process comprises: a first process of inputting the information frames of the words of a sentence; a second process of performing word-level syntactic processing based on the input information frames; a third process of separating the phrases and intonation phrases of the sentence from the syntactic analysis result; and a fourth process of adjusting the prosody processing value to be applied according to the number of words and syllables in the separated phrases and intonation phrases, and then attaching the parsing information to the input sentence and outputting it.

The method of claim 3, wherein the per-word dictionary search in the morphological analysis process searches in the following order: clause indicator dictionary (independent), clause indicator dictionary (dependent), element dictionary (independent), part-of-speech dictionary (independent), ending dictionary (dependent), part-of-speech dictionary (independent), particle and ending dictionary (dependent), part-of-speech dictionary (independent), form indicator dictionary (dependent), and form indicator dictionary (dependent).

The method of claim 3, wherein the information frame of a word in the morphological analysis process is composed of seven cells (CELL): element, part of speech, indicator, interval, ending, sentence indicator, and sentence symbol.

The method of claim 5, wherein the second process of the syntactic analysis comprises: noun-sequence processing, which treats words as a single noun when a particle is omitted or nouns are listed; predicate processing, which finds predicates formed by consecutive words; conjunctive-particle processing, which groups substantive-type words joined by a conjunctive particle and treats them as a single substantive; and clause-indicator processing, which finds the clause indicators used in compounding sentences.

The method of claim 8, wherein the noun-sequence processing comprises: a first step of receiving the information frame for each word; a second step of searching whether the input word has a particle (postposition or ending) or is an independent morpheme; a third step of searching for a punctuation mark if the second step finds a particle or an independent morpheme; a fourth step of terminating if the third step finds a punctuation mark, and otherwise returning to the first step to analyze the next word; a fifth step of searching for a comma if the second step finds neither an independent morpheme nor a particle; a sixth step of, if the comma search finds no comma, binding the next word and the current word together and returning to the first step when the next word is a syntactic element, and returning to the first step when it is not; a seventh step of searching for a modifier phrase if the comma search finds a comma; and an eighth step of, according to the result of the modifier-phrase search, completing the routine where applicable and returning to the first step to receive the next word.

The method of claim 8, wherein the predicate processing comprises: a first step of receiving the information frame for each word; a second step of determining the predicate by finding combinations of particles and substantives in the input word, analyzing the part of speech of the word, and classifying verbal forms in the sentence by their endings and indicator morphemes; a third step of, if the word is found to be verbal, processing its transformative, connective or final, and dependent endings as predicate endings, incrementing the number of grouped words, and then checking for a comma or punctuation mark; a fourth step of, if the second step finds a substantive-type or peripheral word and a routine is continuing from the preceding words, performing grouping and then checking for a comma or punctuation mark; a fifth step of checking for a comma or punctuation mark if the second step finds a clause indicator; and a sixth step of terminating if a comma or punctuation mark is found, and otherwise returning to the first step to analyze the next word.

The method of claim 8, wherein the conjunctive-particle processing comprises: a first step of receiving the information frame for each word; a second step of searching the information frame of the input word for a conjunctive particle; a third step of, if no conjunctive particle is found, searching for a punctuation mark, terminating if one is present, and otherwise searching the next word for a conjunctive particle; a fourth step of, if the second step finds a conjunctive particle, checking whether the preceding word is an adnominal modifier and, if so, checking whether the next word is also an adnominal modifier; a fifth step of, if the fourth step finds that the next word is a substantive other than an adnominal modifier, incrementing the number of grouped words and continuing the conjunctive-particle routine for the next word; a sixth step of, if the fourth step finds that the next word is a noun sequence, or that the preceding word and the current word are both adnominal modifiers, performing grouping and returning to the second step to search the next word for a conjunctive particle; and a seventh step of, if the fourth step finds neither a substantive, a noun sequence, nor an adnominal modifier, changing the conjunctive particle to a case particle and searching for a punctuation mark.

The method of claim 8, wherein the clause-indicator processing comprises: a first step of receiving the information frame for each word; a second step of analyzing the input information frame to identify an indicator; a third step of searching for a comma if the indicator is a complement indicator, an adnominal indicator, or an adverbial indicator; a fourth step of, if the second step finds none of these indicators, searching for a sentence indicator, terminating if one is present, and otherwise returning to the second step to search the next word for an indicator; a fifth step of, if the comma search in the third step finds no comma, analyzing the next word and, if it is an adnominal, complement, or adverbial indicator word that links with the previously processed value, grouping the words and terminating; and a sixth step of, if the next word in the fifth step is not a complement, adnominal, or adverbial indicator word, returning to the second step to search the next word for an indicator.