KR20040025969A

KR20040025969A - Natural Language Processing Method Using Classification And Regression Trees

Info

Publication number: KR20040025969A
Application number: KR1020020056453A
Authority: KR
Inventors: 권오일; 김태수
Original assignee: 주식회사 현대오토넷
Priority date: 2002-09-17
Filing date: 2002-09-17
Publication date: 2004-03-27
Also published as: KR100486457B1

Abstract

PURPOSE: A natural language processing method using classification and regression trees (CART) is provided that analyzes prosodic information probabilistically and statistically to generate control rules. CONSTITUTION: A corpus database required for speech synthesis of text information is constructed (S210). Prosodic phenomena are modeled through CART using the constructed corpus database (S220). Specifically, a phrase break index is estimated using a decision-tree model, and phoneme duration is estimated using a regression-tree model. Natural prosody is predicted on the basis of the modeled data when unknown text information is input (S230).

Description

Natural Language Processing Method Using Classification And Regression Trees

The present invention relates to a natural language processing method, and more specifically to a natural language processing method using CART that generates control rules by probabilistically and statistically analyzing added prosodic information, such as intonation, pauses, and the duration and intensity of sounds, for the development of a high-quality speech synthesizer.

In general, natural language processing technology analyzes a source language, according to the purpose of use, to uncover the meaning of its vocabulary, its syntactic relations, and so on.

FIG. 1 is a flowchart of a general natural language processing method. An input sentence is processed through a morphological analysis step (S110), which takes a grammatically well-formed sentence one word phrase (eojeol) at a time, restores the base form of each word, and divides it into minimal units of meaning; a syntactic analysis step (S120), which identifies the grammatical role of each word phrase in the sentence; a semantic analysis step (S130), which prunes the multiple candidate syntactic structures on the basis of meaning; and a discourse analysis step (S140), which handles whatever the semantic analysis step cannot resolve.

This natural language processing method is widely used in developing speech synthesizers that convert car navigation guidance announcements, FM DARC (DAta Radio Channel) text data, e-mail on an Auto PC, and the like into speech, providing synthesized speech with more natural prosody and sound quality.

However, in conventional speech synthesis, prosody processing amounts to no more than pausing at word spacing, and phoneme durations are poorly predicted, so the synthesized speech lacks natural prosody and sound quality.

Accordingly, an object of the present invention is to provide a natural language processing method using CART that can predict natural prosodic phenomena by constructing a corpus database for speech synthesis, using it to model various prosodic phenomena statistically through CART (Classification And Regression Trees), and then processing the input sentence.

To achieve the above object, the natural language processing method using CART according to the present invention includes the steps of constructing a corpus database required for speech synthesis of information in text form, modeling prosodic phenomena through CART using the constructed corpus database, and predicting natural prosodic phenomena, on the basis of the modeled data, when unknown information in text form is input.

FIG. 1 is a flowchart of a general natural language processing method.

FIG. 2 is a flowchart of a natural language processing method using CART according to the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a flowchart of a natural language processing method using CART according to the present invention.

Referring to FIG. 2, the natural language processing method according to the present invention builds a pronunciation database covering all phonemes for speech synthesis and concatenates its entries to generate continuous speech, adjusting the loudness, duration, pitch, and so on of the speech to synthesize natural-sounding speech.

First, a natural language processing apparatus (not shown) for speech synthesis, which renders text information or symbols as human speech, constructs the corpus database required for speech synthesis of information or symbols in text form (S210).

Here, using corpus-based TTS (Corpus-Based Text To Speech) technology resolves the unnatural speech joins that result from using restricted-vocabulary speech synthesis, another speech processing technology.

For reference, restricted-vocabulary speech synthesis synthesizes speech for a limited vocabulary and limited sentence forms: the required speech fragments are recorded in advance and then concatenated to produce continuous speech. It is mainly applied to automatic response (ATS) services. It is technically very simple, but the sentence forms are limited and the joins between speech fragments sound unnatural. By contrast, TTS places no restriction on the vocabulary to be synthesized and converts general information in text form into speech, implementing intonation, phrasing pauses, and the like so that they resemble real human speech.

Accordingly, for natural speech synthesis, the corpus speech database is constructed by selecting the sentences best suited to predicting prosodic boundary strength and the sentences best suited to predicting phoneme duration.

For prosodic boundary strength prediction, 400 sentences in which linguists have marked the accentual phrases (AP) and intonational phrases (IP) are built into the corpus speech database. For phoneme duration prediction, 65,000 phonemes (about 900 sentences), likewise marked with AP and IP by linguists, are built into the corpus speech database.

When construction of the corpus database is complete (S210), prosodic phenomena are modeled with the CART (Classification And Regression Trees) method using the constructed corpus database (S220). For reference, the CART method is a decision-tree procedure introduced in 1984 by several world-renowned statisticians at UC Berkeley and Stanford.

When the prosodic phenomena are modeled through CART, the prosodic boundary strength is predicted using a decision-tree model, and the duration of each phone, that is, the phoneme duration, is predicted using a regression-tree model.

The prosodic boundary strength (Phrase Break Index) is a parameter indicating the degree of pausing between phrases in a sentence. It reflects the prosodic discontinuity that a listener perceives between word phrases when hearing an utterance, and is a psychoacoustic parameter rather than a value obtained by objective judgment.

The prosodic boundary strength prediction process, which extracts the prosodic elements needed for natural speech synthesis, makes its predictions with the CART method using the 400 sentences in the corpus database in which experts have marked the accentual phrases (AP) and intonational phrases (IP).

In addition, prosodic boundary strength prediction by the CART method uses the parameters in [Table 1] below as the feature parameters for each of the 4,847 word phrases observed in the 400 sentences of the database.

[Table 1]
Parameter | Description
DPOS | Representative part of speech of the current word phrase
DLPOS | Left part of speech of the current word phrase
DPLPOS | Left part of speech of the previous word phrase
DPPLPOS | Left part of speech of the word phrase two positions before
DNLPOS | Left part of speech of the next word phrase
DNNLPOS | Left part of speech of the word phrase two positions after
DRPOS | Right part of speech of the current word phrase
DPRPOS | Right part of speech of the previous word phrase
DPPRPOS | Right part of speech of the word phrase two positions before
DNRPOS | Right part of speech of the next word phrase
DNNRPOS | Right part of speech of the word phrase two positions after
C_LOC | Position of the current word phrase in the sentence
C_LOCFRBEG | Word phrase position counted from the beginning of the sentence
C_LOCFREND | Word phrase position counted from the end of the sentence
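As a rough illustration of how the [Table 1] parameters could be assembled for one word phrase, the following Python sketch builds a feature dictionary from a part-of-speech-tagged sentence. The `Eojeol` type, the tag names, the `PAD` placeholder for positions outside the sentence, and the reading of C_LOC as a relative position are all assumptions made for this example; the source does not define them.

```python
from typing import Dict, List, NamedTuple


class Eojeol(NamedTuple):
    """One word phrase with hypothetical part-of-speech tags."""
    rep_pos: str    # representative part of speech
    left_pos: str   # left part of speech
    right_pos: str  # right part of speech


PAD = "NONE"  # assumed placeholder for neighbours outside the sentence


def boundary_features(sentence: List[Eojeol], i: int) -> Dict[str, object]:
    """Collect the Table 1 parameters for the word phrase at index i."""
    n = len(sentence)

    def left(j: int) -> str:
        return sentence[j].left_pos if 0 <= j < n else PAD

    def right(j: int) -> str:
        return sentence[j].right_pos if 0 <= j < n else PAD

    return {
        "DPOS": sentence[i].rep_pos,
        "DLPOS": left(i),
        "DPLPOS": left(i - 1),
        "DPPLPOS": left(i - 2),
        "DNLPOS": left(i + 1),
        "DNNLPOS": left(i + 2),
        "DRPOS": right(i),
        "DPRPOS": right(i - 1),
        "DPPRPOS": right(i - 2),
        "DNRPOS": right(i + 1),
        "DNNRPOS": right(i + 2),
        "C_LOC": (i + 1) / n,   # assumption: relative position in the sentence
        "C_LOCFRBEG": i + 1,    # 1-based count from the sentence start
        "C_LOCFREND": n - i,    # 1-based count from the sentence end
    }
```

Each of the 4,847 word phrases would yield one such dictionary, paired with the boundary label marked by the annotators.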

In the decision-tree modeling used to predict prosodic boundary strength, the GINI index method is selected as the splitting method, and the minimum-cost tree is selected for the SE rule (Standard Error Rule). In addition, the optimal tree is determined using the V-fold cross-validation method, which divides the data into segments of equal size for testing and holds out one segment at a time, and the prosodic boundary strength is thereby predicted.
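To make the GINI index splitting criterion concrete, here is a minimal Python sketch that computes the Gini impurity of a set of boundary labels and searches one categorical feature for the equality test with the lowest weighted impurity. The function names, the row format, and the restriction to tests of the form `feature == value` are assumptions for illustration and are not taken from the source.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(
    rows: List[Dict[str, str]], labels: List[int], feature: str
) -> Tuple[Optional[str], float]:
    """Return the value v whose test (row[feature] == v) minimizes the
    size-weighted Gini impurity of the two resulting child nodes."""
    n = len(rows)
    best_value, best_score = None, float("inf")
    for value in sorted({row[feature] for row in rows}):
        yes = [y for row, y in zip(rows, labels) if row[feature] == value]
        no = [y for row, y in zip(rows, labels) if row[feature] != value]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if score < best_score:
            best_value, best_score = value, score
    return best_value, best_score
```

A full tree builder would repeat this search over every feature at every node and recurse on the two children; the sketch covers only the scoring of a single split.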

Meanwhile, the duration of a phone is, together with intonation, an important factor determining the naturalness of synthesized speech, and because it is also affected by speaking rate, the process of controlling it is important. For reference, a sound that takes part in forming a word is called a phone. A phone may differ slightly from speaker to speaker, and also according to the environment in which it is pronounced. A phoneme is the abstraction of such phones into a representative sound.

The duration of a phone, that is, the phoneme duration, is predicted using the 900 sentences in the corpus database in which experts have marked the accentual phrases (AP) and intonational phrases (IP). Prosody prediction commonly uses a database of between 400 and 1,000 sentences. Although a database of several thousand sentences was built for phoneme duration prediction, a limitation of the database program meant that only 65,000 phonemes, about 900 sentences, were used for prediction.

In addition, the 900 sentences of the corpus database are further divided into five categories: syllable-initial sounds, syllable-medial sounds, syllable-final sounds, the last phoneme at an AP boundary, and the last phoneme at an IP boundary.

The average duration by phoneme type for each of the phoneme sets in the five categories is shown in [Table 2].

[Table 2]
Phoneme type | Average duration
Syllable-initial | 69.62 msec
Syllable-medial | 74.59 msec
Syllable-final | 55.98 msec
Last phoneme of an AP | 84.74 msec
Last phoneme of an IP | 155.58 msec

The parameters used to control phoneme duration are shown in [Table 3] below.

[Table 3]
Parameter | Description
DLPHONE | Phoneme preceding the observed phoneme
DPHONE | Observed phoneme
DRPHONE | Phoneme following the observed phoneme
DLOCEOJ | Position of the phoneme within its word phrase
DLOCAP | Position of the phoneme within its accentual phrase
DLOCIP | Position of the phoneme within its intonational phrase
DNUMEOJ | Number of syllables in the word phrase
DAP | Whether the word phrase is at an accentual phrase boundary
DIP | Whether the word phrase is at an intonational phrase boundary
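The [Table 3] parameters can be sketched in the same spirit. In the Python example below, each phone carries the indices of the word phrase, accentual phrase, and intonational phrase that contain it. The `Phone` type, the `PAD` symbol, 1-based positions, and the reading of DAP and DIP as "this word phrase is the last one in its AP or IP" are assumptions for illustration; the source does not spell out these details.

```python
from typing import Dict, List, NamedTuple


class Phone(NamedTuple):
    symbol: str       # phoneme label
    eojeol: int       # index of the containing word phrase
    ap: int           # index of the containing accentual phrase
    ip: int           # index of the containing intonational phrase
    n_syllables: int  # syllable count of the containing word phrase


PAD = "SIL"  # assumed placeholder for a missing neighbour


def position_in_group(phones: List[Phone], i: int, key: str) -> int:
    """1-based position of phone i within its run of equal `key` values."""
    group = getattr(phones[i], key)
    pos = 1
    while i - pos >= 0 and getattr(phones[i - pos], key) == group:
        pos += 1
    return pos


def eojeol_closes(phones: List[Phone], i: int, key: str) -> bool:
    """True if the word phrase containing phone i is the last in its group."""
    j = i
    while j + 1 < len(phones) and phones[j + 1].eojeol == phones[i].eojeol:
        j += 1  # advance to the last phone of the word phrase
    if j + 1 == len(phones):
        return True
    return getattr(phones[j + 1], key) != getattr(phones[i], key)


def duration_features(phones: List[Phone], i: int) -> Dict[str, object]:
    """Collect the Table 3 parameters for the phone at index i."""
    p = phones[i]
    return {
        "DLPHONE": phones[i - 1].symbol if i > 0 else PAD,
        "DPHONE": p.symbol,
        "DRPHONE": phones[i + 1].symbol if i + 1 < len(phones) else PAD,
        "DLOCEOJ": position_in_group(phones, i, "eojeol"),
        "DLOCAP": position_in_group(phones, i, "ap"),
        "DLOCIP": position_in_group(phones, i, "ip"),
        "DNUMEOJ": p.n_syllables,
        "DAP": eojeol_closes(phones, i, "ap"),
        "DIP": eojeol_closes(phones, i, "ip"),
    }
```

The sketch assumes that phrase indices are unique across the utterance, so that a change of index always marks a boundary.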

In the regression-tree modeling used to predict phoneme duration, the GINI index method is selected as the splitting method, and the minimum-cost tree is selected for the SE rule (Standard Error Rule). The optimal tree is then determined using the V-fold cross-validation method, and the phoneme duration is thereby predicted.
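The V-fold cross-validation and minimum-cost selection described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular CART package: the two candidate "trees" are a root-only predictor (global mean) and a one-level split on phoneme category (per-category mean), the cost is mean squared error on the held-out segment (substituted here because a regression target has no class proportions for a Gini computation), and all function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, float]         # (phoneme category, duration in msec)
Model = Callable[[str], float]     # maps a category to a predicted duration
Fitter = Callable[[List[Sample]], Model]


def v_fold_segments(n: int, v: int) -> List[List[int]]:
    """Split indices 0..n-1 into v segments whose sizes differ by at most one."""
    return [list(range(k, n, v)) for k in range(v)]


def fit_global_mean(train: List[Sample]) -> Model:
    """Root-only tree: always predict the mean training duration."""
    overall = mean(d for _, d in train)
    return lambda category: overall


def fit_category_mean(train: List[Sample]) -> Model:
    """One-level tree: predict the mean duration of the sample's category."""
    overall = mean(d for _, d in train)
    grouped: Dict[str, List[float]] = {}
    for category, duration in train:
        grouped.setdefault(category, []).append(duration)
    means = {c: mean(ds) for c, ds in grouped.items()}
    return lambda category: means.get(category, overall)


def cv_cost(fit: Fitter, data: List[Sample], v: int) -> float:
    """Mean squared error over all samples, each predicted while held out."""
    errors: List[float] = []
    for held_out in v_fold_segments(len(data), v):
        excluded = set(held_out)
        train = [s for i, s in enumerate(data) if i not in excluded]
        model = fit(train)
        errors.extend((model(data[i][0]) - data[i][1]) ** 2 for i in held_out)
    return mean(errors)


def select_min_cost(
    candidates: Dict[str, Fitter], data: List[Sample], v: int
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the lowest cross-validated cost."""
    costs = {name: cv_cost(fit, data, v) for name, fit in candidates.items()}
    return min(costs, key=costs.get), costs
```

With a handful of made-up durations, for example `[("initial", 70.0), ("ip_final", 150.0), ("initial", 68.0), ("ip_final", 160.0), ("initial", 72.0), ("ip_final", 156.0)]` and `v=3`, `select_min_cost` would favour the per-category predictor, since the two categories differ far more between themselves than within themselves.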

When the modeling of prosodic phenomena through prediction of the prosodic boundary strength and the phoneme duration is complete (S220), natural prosodic phenomena are predicted, on the basis of the modeled data, when unknown information in text form is input (S230).

Accordingly, the natural language processing method using CART according to the present invention can predict natural prosodic phenomena by constructing a corpus database for speech synthesis, using it to model various prosodic phenomena statistically through the CART method, and then processing the input sentence.

What has been described above presents, through an experimental approach, a method for optimal prosody prediction with the natural language processing method using CART according to the present invention. The technical spirit of the present invention extends to the full range of modifications that a person of ordinary skill in the art to which the invention pertains could make.

As described above, the natural language processing method using CART according to the present invention statistically models prosodic information such as intonation, pauses, and the duration and intensity of sounds through the CART method, generates control rules, and then processes the input sentence. This makes it possible to develop a high-quality speech synthesizer that gives the user more natural sound quality in place of the mechanical and unstable sound quality of conventional speech synthesis.

Claims

A natural language processing method using CART, comprising:

constructing a corpus database required for speech synthesis of information in text form;

modeling prosodic phenomena through classification and regression trees (CART) using the constructed corpus database; and

predicting natural prosodic phenomena, on the basis of the modeled data, when unknown information in text form is input.

The method of claim 1, wherein the constructing of the corpus database comprises:

extracting the 400 sentences best suited to prosodic boundary strength prediction, in which accentual phrases (AP) and intonational phrases (IP) are marked, and building them into a database; and extracting the 900 sentences best suited to phoneme duration prediction, in which accentual phrases (AP) and intonational phrases (IP) are marked, dividing them into syllable-initial sounds, syllable-medial sounds, syllable-final sounds, the last phoneme at an AP boundary, and the last phoneme at an IP boundary, and building them into a database.

The method of claim 1, wherein the modeling of the prosodic phenomena through CART comprises:

a prosodic boundary strength prediction process that predicts the prosodic boundary strength using a decision-tree model; and

a phoneme duration prediction process that predicts the phoneme duration using a regression-tree model.

The method of claim 2, wherein, in the prosodic boundary strength prediction process,

the GINI index is selected as the splitting method and the minimum-cost tree is selected for the SE rule.

The method of claim 2, wherein, in the phoneme duration prediction process,

the GINI index is selected as the splitting method and the minimum-cost tree is selected for the SE rule.