KR101025814B1

KR101025814B1 - Method for tagging morphology by using prosody modeling and its apparatus

Info

Publication number: KR101025814B1
Application number: KR1020080127710A
Authority: KR
Inventors: 김정세; 이수종; 윤승; 이일빈; 황규웅; 박준; 박상규
Original assignee: 한국전자통신연구원
Priority date: 2008-12-16
Filing date: 2008-12-16
Publication date: 2011-04-04
Also published as: KR20100069120A

Abstract

The present invention relates to a method and apparatus for morpheme part-of-speech tagging using a prosody model. The morphemes of a speech recognition result or a transcribed text sentence are analyzed. For the parts whose analysis is ambiguous, the prosody models for each morpheme part-of-speech sequence, built in advance from a speech DB, are compared with the prosody model of the input speech to find the optimal morpheme part-of-speech sequence for the prosody model of the input speech. The result is combined with a morpheme part-of-speech tagging method to tag the parts of speech, so that the accuracy of morpheme part-of-speech tagging can be maximized. In addition, by applying the prosody model when tagging, the present invention can identify the speaker's utterance intent.

Morpheme, prosody model, part of speech, tagging

Description

METHOD FOR TAGGING MORPHOLOGY BY USING PROSODY MODELING AND ITS APPARATUS

The present invention relates to a morpheme part-of-speech (POS) tagging method and apparatus using a prosody model. More specifically, it relates to a method and apparatus that build a prosody model for each morpheme POS sequence from a speech database (hereinafter, DB) and resolve ambiguity in morpheme POS tagging using the text sentence and the prosody model of the input speech.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project number: 2008-S-019-01, Project title: Development of portable Korean/English automatic interpretation technology].

As is well known, automatic interpretation or information retrieval devices that use a speech recognizer perform morpheme POS tagging, syntactic analysis, semantic analysis, and so on, based only on the text sentence output by the speech recognizer.

That is, when interpretation relies on the text sentence alone, an error of the morpheme POS tagger can produce a translation entirely different from what the speaker intended. For example, when the speech recognition result is “나는 새를 보았다”, the sentence can be morphologically analyzed as either Example 1) or Example 2):

Example 1) 나/personal pronoun 는/particle 새/noun 를/particle 보/verb 았/ending 다/ending ("I saw a bird")

Example 2) 나/verb 는/adnominal ending 새/noun 를/particle 보/verb 았/ending 다/ending ("[Someone] saw a flying bird")

From the input sentence alone it is impossible to tell whether “나” is the personal pronoun or the verb “날다” (to fly), so the two readings cannot be distinguished even by syntactic analysis or, further, semantic analysis. Therefore, when the result is used in an automatic interpreter, a translation entirely different from the speaker's intent may be output.

As an example, FIG. 1 shows speech files for “나는 새를 보았다”. The first utterance is an actual recording of Example 1) above, and the second is an actual recording of Example 2). Referring to FIG. 1, the pitch information differs depending on whether “나는” is used as a personal pronoun or as a verb. When it is used as a personal pronoun, as in Example 1), the pitch rises and then falls; when it is used as a verb, as in Example 2), the pitch falls and then rises. Using this pitch information, Example 1) and Example 2) can be distinguished. In addition, Example 1) has a break after “나는”, whereas the break in the other utterance falls after “나는 새를”, so the phrase-break information also differs. Utterance length information and phrase-break information can therefore be used as well.
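The pitch pattern described for FIG. 1 can be illustrated with a minimal sketch. The function name `contour_shape`, the three-way labelling, and the simple interior-extremum heuristic are assumptions made for illustration; the patent does not specify how pitch contours are classified.

```python
from typing import Sequence

# Mapping taken from the description of FIG. 1: pronoun reading rises then
# falls, verb reading falls then rises. The labels themselves are hypothetical.
READING_BY_SHAPE = {
    "rise-fall": "나/personal pronoun 는/particle",
    "fall-rise": "나/verb 는/adnominal ending",
}


def contour_shape(pitch_hz: Sequence[float]) -> str:
    """Label a pitch contour as 'rise-fall', 'fall-rise', or 'other'.

    Hypothetical heuristic: an interior peak above both endpoints is
    'rise-fall'; otherwise an interior valley below both endpoints is
    'fall-rise'. A peak takes priority if both are present.
    """
    if len(pitch_hz) < 3:
        return "other"
    first, last = pitch_hz[0], pitch_hz[-1]
    interior = pitch_hz[1:-1]
    if max(interior) > max(first, last):
        return "rise-fall"
    if min(interior) < min(first, last):
        return "fall-rise"
    return "other"
```

A real system would first need a pitch tracker and segment alignment to obtain the contour for “나는”; both are outside the scope of this sketch.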

As an example involving utterance length, consider:

Example 3) 내가 아는 한 교수님은 그렇지 않다.

If the break falls after “내가 아는”, “한” is a proper noun (한/proper noun), meaning "a professor with the surname Han", and is uttered short. If, on the other hand, the break falls after “내가 아는 한”, “한” is a dependent noun (한/dependent noun), giving the meaning "as far as I know", and is uttered long.

In addition, an example using a "method of extracting sentence-type information using sentence-final vocabulary and pitch" has been disclosed. In that method, sentence-type information is first extracted using the sentence-final vocabulary in the speech recognition result of the automatic interpretation device; pitch is then extracted from the speech and combined with the per-sentence-type occurrence frequency of the sentence-final vocabulary to extract the sentence-type information, which yields a higher accuracy rate.

However, the prior art described above operates by linearly combining the occurrence frequency of sentence-final vocabulary in a large text DB with the extracted pitch, and it only distinguishes whether or not a sentence is declarative. Given the current rapid development of semiconductor and information and communication technology, there is a need to further develop a morpheme POS tagging method and apparatus using a prosody model, which builds a prosody model for each morpheme POS sequence from a speech DB and resolves the ambiguity of morpheme POS tagging using the text sentence and the prosody model of the input speech.

Accordingly, the technical object of the present invention, devised in view of the above need, is to provide a morpheme POS tagging method and apparatus using a prosody model in which, when tagging morpheme parts of speech, the morphemes of a speech recognition result or a transcribed text sentence are analyzed; for the parts whose analysis is ambiguous, the prosody models for each morpheme POS sequence, built in advance from a speech DB, are compared with the prosody model of the input speech to find the optimal morpheme POS sequence for the prosody model of the input speech; and the result is combined with a morpheme POS tagging method to tag the morpheme parts of speech.

A morpheme POS tagging method using a prosody model according to one aspect of the present invention comprises: analyzing the morphemes of a text sentence based on prosody models for each morpheme POS sequence; detecting a prosody model for speech when such a morpheme POS sequence exists in the morpheme analysis result; determining the morpheme POS sequence of the prosody model for the speech by comparing the prosody model for the speech with the prosody models for each morpheme POS sequence stored in a data storage DB; and applying a morpheme POS tagging technique to the determined morpheme POS sequence result and adding in the POS sequence information of the prosody model for the speech to perform POS tagging.

A morpheme POS tagging apparatus using a prosody model according to another aspect of the present invention comprises: a morpheme analysis unit that analyzes the morphemes of a text sentence based on prosody models for each morpheme POS sequence; an ambiguity determination unit that determines whether ambiguity exists according to whether such a morpheme POS sequence exists in the morpheme analysis result; a prosody model detection unit that detects a prosody model for speech when ambiguity exists; a prosody model comparison determination unit that determines the morpheme POS sequence of the prosody model for the speech by comparing the prosody model for the speech with the prosody models for each morpheme POS sequence; a POS tagging unit that applies a morpheme POS tagging technique to the determined morpheme POS sequence result and adds in the POS sequence information of the prosody model for the speech to perform POS tagging; and a POS tagging result output unit that outputs the POS-tagged result.

The present invention analyzes the morphemes of a speech recognition result or a transcribed text sentence and, for the parts whose analysis is ambiguous, compares the prosody models for each morpheme POS sequence, built in advance from a speech DB, with the prosody model of the input speech to find the optimal morpheme POS sequence for the prosody model of the input speech. By combining that result with a morpheme POS tagging method to tag the morpheme parts of speech, the accuracy of morpheme POS tagging can be maximized.

In addition, by applying the prosody model when tagging morpheme parts of speech, the present invention has the advantage that the speaker's utterance intent can be identified.

Hereinafter, the operating principle of the present invention is described in detail with reference to the accompanying drawings. In the following description, detailed explanations of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 2 is a block diagram of a morpheme POS tagging apparatus using a prosody model according to an embodiment of the present invention. The apparatus may include a text and speech input unit 201, a morpheme analysis unit 203, an ambiguity determination unit 205, a prosody model detection unit 207, a prosody model comparison determination unit 209, a POS tagging unit 211, a POS tagging result output unit 213, a speech DB 215, a transcribed text DB 217, a morpheme POS tagging DB 219, a prosody model construction unit 221 for each morpheme POS sequence, and a data storage DB 223.

The text and speech input unit 201 may receive text and speech and provide them to the morpheme analysis unit 203.

The morpheme analysis unit 203 may, based on the prosody models for each morpheme POS sequence stored in the data storage DB 223, analyze the text sentence among the text and speech input from the text and speech input unit 201 while generating a list of all possible morpheme analyses, and provide the result to the ambiguity determination unit 205.

The ambiguity determination unit 205 may determine whether a morpheme POS sequence that exists in the pre-built prosody models for each morpheme POS sequence is present in the morpheme analysis result of the morpheme analysis unit 203, and provide the determination result to the prosody model detection unit 207.

When the determination result input from the ambiguity determination unit 205 indicates that ambiguity exists, the prosody model detection unit 207 may detect, from the input speech, a prosody model for the morpheme POS sequence and provide the detected prosody model for the speech to the prosody model comparison determination unit 209.

The prosody model comparison determination unit 209 may compare the prosody model for the speech input from the prosody model detection unit 207 with the prosody models for each morpheme POS sequence stored in the data storage DB 223, determine the optimal morpheme POS sequence for the prosody model of the input speech, and provide it to the POS tagging unit 211.

The POS tagging unit 211 may apply a conventional morpheme POS tagging technique to the lists of combinable morphemes that result from the optimal morpheme POS sequence determined by the prosody model comparison determination unit 209, find the optimal POS sequence information for the prosody model of the input speech, and add it in. That is, it may perform POS-decision tagging using Equation 1 and provide the result to the POS tagging result output unit 213.

[Equation 1]

(Here, Equation 1 incorporates the conventional formula mainly used in statistical POS tagging, the morpheme POS tagging method being based on a hidden Markov model (HMM). P denotes the optimal morpheme POS sequence, W_i denotes the i-th word, P_i denotes the tag of W_i, d_j denotes the prosody model, s_j denotes the morpheme POS sequence, Pd(d_j|s_j) denotes the probability obtained by comparing the prosody model of the input speech for the morpheme POS sequence with the pre-built prosody models for each morpheme POS sequence, and C is a constant that determines how strongly the comparison result is applied to the conventional formula.)
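The image for Equation 1 is not reproduced in this text. Purely as an assumption, one form consistent with the symbol definitions given for it (a conventional HMM tagging term to which a prosody term weighted by C is added) could be written as follows. The bigram factorization, the summation over j, and the index ranges are illustrative guesses and not the patent's confirmed formula.

```latex
% Hypothetical reconstruction, not the published Equation 1.
% n: number of words; m: number of ambiguous morpheme POS sequences (both assumed).
P = \operatorname*{arg\,max}_{P_1,\dots,P_n}
    \left[ \prod_{i=1}^{n} P(W_i \mid P_i)\, P(P_i \mid P_{i-1})
    \;+\; C \sum_{j=1}^{m} P_d(d_j \mid s_j) \right]
```

With C = 0 this reduces to the conventional HMM tagger; larger values of C let the prosody comparison weigh more heavily in the choice of tag sequence.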

The POS tagging result output unit 213 may output the POS-decision tagging result input from the POS tagging unit 211.

The speech DB 215 stores speech, the transcribed text DB 217 stores text transcribed from the speech stored in the speech DB 215, and the morpheme POS tagging DB 219 may store morpheme POS tagging for the transcribed text stored in the transcribed text DB 217.

The prosody model construction unit 221 reads the speech DB 215, the transcribed text DB 217, and the morpheme POS tagging DB 219 for the transcribed text, and finds the morpheme POS sequences in which morpheme POS tagging ambiguity occurs. That is, it finds the speech segments for those morpheme POS sequences in the speech DB 215, collects prosody information, and finds in the collected prosody information one or more prosodic attributes that can distinguish the morpheme POS sequences (for example, any one of pitch, speech length, phrase-break information, speech stress, and energy). On that basis it may build a prosody model for each morpheme POS sequence and provide it to the data storage DB 223.

As an example, the process of building prosody models for each morpheme POS sequence can be explained using Example 1) and Example 2), where the speech recognition result is “나는 새를 보았다”.

That is, the morpheme POS sequences in which ambiguity arises between Example 1) and Example 2), “나/personal pronoun 는/particle” and “나/verb 는/adnominal ending”, are found. The speech segments corresponding to the two morpheme analyses are located throughout the speech DB 215, and prosody information for them is collected. From the collected prosody information, pitch and phrase-break information are identified as the prosodic attributes that can resolve the ambiguity of “나는”. In other words, a prosody model using pitch and phrase-break information can be built for each of “나/personal pronoun 는/particle” and “나/verb 는/adnominal ending”.
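The model-building step described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not define the form of a prosody model, so each model here is simply the mean of hypothetical numeric features (`pitch_slope`, `pause_after`) over all speech segments labelled with the same morpheme POS sequence. The function name and the feature names are invented for this sketch.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Features = Dict[str, float]


def build_prosody_models(
    samples: Iterable[Tuple[str, Features]],
) -> Dict[str, Features]:
    """Average the prosodic features of all segments sharing a POS sequence.

    `samples` pairs a morpheme POS sequence label with the features measured
    on one speech segment (assumed to come from the speech DB, its
    transcription, and the POS-tagged transcription).
    """
    grouped = defaultdict(list)
    for pos_sequence, features in samples:
        grouped[pos_sequence].append(features)

    models = {}
    for pos_sequence, rows in grouped.items():
        # Assumes every row for a sequence carries the same feature names.
        models[pos_sequence] = {
            name: mean(row[name] for row in rows) for name in rows[0]
        }
    return models
```

Averaging is the simplest possible choice; a statistical model per sequence (for example a Gaussian over the same features) would fit the description equally well.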

The data storage DB 223 may store the prosody models for each morpheme POS sequence input from the prosody model construction unit 221.

Accordingly, the present invention analyzes the morphemes of a speech recognition result or a transcribed text sentence and, for the parts whose analysis is ambiguous, compares the prosody models for each morpheme POS sequence, built in advance from a speech DB, with the prosody model of the input speech to find the optimal morpheme POS sequence for the prosody model of the input speech. By combining that result with a morpheme POS tagging method to tag the morpheme parts of speech, the accuracy of morpheme POS tagging can be maximized.

Next, the morpheme POS tagging process using a prosody model in the present embodiment, configured as described above, is explained.

FIG. 3 is a flowchart sequentially illustrating a morpheme POS tagging method using a prosody model according to an embodiment of the present invention.

First, speech is stored in the speech DB 215, text transcribed from the speech stored in the speech DB 215 is stored in the transcribed text DB 217, and morpheme POS tagging for the transcribed text stored in the transcribed text DB 217 may be stored in the morpheme POS tagging DB 219.

With the data stored as described above, the prosody model construction unit 221 reads the speech DB 215, the transcribed text DB 217, and the morpheme POS tagging DB 219 for the transcribed text, and finds the morpheme POS sequences in which morpheme POS tagging ambiguity occurs. That is, it finds the speech segments for those morpheme POS sequences in the speech DB 215, collects prosody information, and finds in the collected prosody information one or more prosodic attributes that can distinguish the morpheme POS sequences (for example, pitch, speech length, and phrase-break information). On that basis it may build a prosody model for each morpheme POS sequence and store it in the data storage DB 223 (S301).

In this state, when text and speech are input (S303), the text and speech input unit 201 may receive the text and speech from outside and provide them to the morpheme analysis unit 203.

The morpheme analysis unit 203 may then, based on the prosody models for each morpheme POS sequence stored in the data storage DB 223, analyze the text sentence among the text and speech input from the text and speech input unit 201 while generating a list of all possible morpheme analyses (S305), and provide the result to the ambiguity determination unit 205.

The ambiguity determination unit 205 determines whether a morpheme POS sequence that exists in the pre-built prosody models for each morpheme POS sequence is present in the morpheme analysis result of the morpheme analysis unit 203 (S307).

If the determination (S307) finds no such sequence in the morpheme analysis result, it is determined that no ambiguity exists (S309), and the POS-decision tagging of step S317 is performed next. If the determination (S307) finds such a sequence in the morpheme analysis result, it is determined that ambiguity exists (S311), and the determination result may be provided to the prosody model detection unit 207.
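The ambiguity check of steps S307 to S311 can be sketched as a lookup of stored sequences inside a candidate analysis. The representation is an assumption made for illustration: each analysis is a list of "morpheme/tag" strings, and each stored prosody model is keyed by a tuple of such strings. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

PosSequence = Tuple[str, ...]


def matching_sequences(
    analysis: Sequence[str], model_keys: Iterable[PosSequence]
) -> List[PosSequence]:
    """Return the stored POS sequences that occur contiguously in `analysis`."""
    found = []
    for key in model_keys:
        n = len(key)
        for start in range(len(analysis) - n + 1):
            if tuple(analysis[start:start + n]) == key:
                found.append(key)
                break  # one occurrence is enough to flag this sequence
    return found


def is_ambiguous(
    analyses: Iterable[Sequence[str]], model_keys: Sequence[PosSequence]
) -> bool:
    """S307: ambiguity exists if any candidate analysis contains a stored sequence."""
    return any(matching_sequences(a, model_keys) for a in analyses)
```

When `is_ambiguous` returns False, the flow would go straight to conventional tagging (S309, S317); otherwise the prosody model of the input speech is extracted next (S311, S313).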

When the determination result input from the ambiguity determination unit 205 indicates that ambiguity exists, the prosody model detection unit 207 may detect, from the input speech, a prosody model for the morpheme POS sequence (S313) and provide the detected prosody model for the speech to the prosody model comparison determination unit 209.

The prosody model comparison determination unit 209 may compare the prosody model for the speech input from the prosody model detection unit 207 with the prosody models for each morpheme POS sequence stored in the data storage DB 223, determine the optimal morpheme POS sequence for the prosody model of the input speech (S315), and provide it to the POS tagging unit 211.
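The comparison in step S315 can be sketched as a nearest-model search. The patent does not state how prosody models are compared, so the Euclidean distance over shared numeric features used here is an assumption, as are the function name and the feature names.

```python
import math
from typing import Dict

Features = Dict[str, float]


def closest_sequence(observed: Features, models: Dict[str, Features]) -> str:
    """Return the POS sequence whose stored model is nearest to `observed`.

    Distance is Euclidean over the feature names present in `observed`
    (a hypothetical choice; features are assumed to be on comparable scales).
    """
    def distance(a: Features, b: Features) -> float:
        return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))

    return min(models, key=lambda seq: distance(observed, models[seq]))
```

For example, with a stored pronoun model of `{"pitch_slope": -2.0, "pause_after": 0.4}` and a verb model of `{"pitch_slope": 2.0, "pause_after": 0.0}`, an observation with a falling pitch slope and a clear pause is assigned to the pronoun sequence.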

The POS tagging unit 211 may apply a conventional morpheme POS tagging technique to the lists of combinable morphemes that result from the optimal morpheme POS sequence determined by the prosody model comparison determination unit 209, perform POS-decision tagging (S317) using Equation 1 above, which finds and adds in the optimal POS sequence information for the prosody model of the input speech, and output the result through the POS tagging result output unit 213 (S319). Here, the prosody model may be obtained per word (eojeol), or extended to phrase or clause units.
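The final decision in step S317 adds prosody information to the score of a conventional tagger. The following sketch assumes an additive combination weighted by the constant C, in line with the description of Equation 1; the function names, the candidate tuple layout, and the example scores are hypothetical, and the conventional tagger's score is treated as a given number rather than computed.

```python
from typing import List, Sequence, Tuple

# (label, conventional tagger score, prosody probabilities Pd(d_j | s_j))
Candidate = Tuple[str, float, Sequence[float]]


def combined_score(tagger_score: float, prosody_probs: Sequence[float], c: float) -> float:
    """Add the C-weighted prosody term to the conventional tagger score."""
    return tagger_score + c * sum(prosody_probs)


def pick_best(candidates: List[Candidate], c: float = 1.0) -> str:
    """Return the label of the candidate with the highest combined score."""
    best = max(candidates, key=lambda cand: combined_score(cand[1], cand[2], c))
    return best[0]
```

With candidates `[("pronoun", 0.40, [0.2]), ("verb", 0.35, [0.9])]`, setting `c=0.0` keeps the conventional tagger's preference for the pronoun reading, while `c=1.0` lets the prosody evidence switch the decision to the verb reading.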

As described above, by applying a prosody model when tagging morpheme parts of speech, the present invention can identify the speaker's utterance intent.

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the present invention. The scope of the present invention should therefore not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

FIG. 1 is a diagram showing speech files,

FIG. 2 is a block diagram of a morpheme POS tagging apparatus using a prosody model according to an embodiment of the present invention,

FIG. 3 is a flowchart sequentially illustrating a morpheme POS tagging method using a prosody model according to an embodiment of the present invention.

Description of reference numerals for the main parts of the drawings

201: text and speech input unit
203: morpheme analysis unit
205: ambiguity determination unit
207: prosody model detection unit
209: prosody model comparison determination unit
211: POS tagging unit
213: POS tagging result output unit
215: speech DB
217: transcribed text DB
219: morpheme POS tagging DB
221: prosody model construction unit for each morpheme POS sequence
223: data storage DB

Claims

A morpheme part-of-speech tagging method using a prosody model, comprising:

analyzing the morphemes of a text sentence based on prosody models for each morpheme part-of-speech sequence;

detecting a prosody model for speech when such a morpheme part-of-speech sequence exists in the morpheme analysis result;

determining the morpheme part-of-speech sequence of the prosody model for the speech by comparing the prosody model for the speech with the prosody models for each morpheme part-of-speech sequence stored in a data storage DB; and

applying a morpheme part-of-speech tagging technique to the determined morpheme part-of-speech sequence result and adding in the part-of-speech sequence information of the prosody model for the speech to perform part-of-speech tagging.

The method of claim 1, wherein the prosody model for each morpheme part-of-speech sequence is built by reading a speech DB, a transcribed text DB, and a morpheme part-of-speech tagging DB for the transcribed text, finding the speech segments for the morpheme part-of-speech sequences in the speech DB, collecting prosody information, and finding, in the collected prosody information, prosodic attributes that can distinguish the morpheme part-of-speech sequences.

The method of claim 2, wherein the prosody model for the speech is obtained per word (eojeol), or extended to phrase or clause units.

The method of claim 2, wherein the prosodic attribute is any one of pitch, speech length, phrase-break information, speech stress, and energy.

The method of claim 1, wherein the part-of-speech tagging is determined by the equation

[Equation]

(where P denotes the optimal morpheme part-of-speech sequence, W_i denotes the i-th word, P_i denotes the tag of W_i, d_j denotes the prosody model, s_j denotes the morpheme part-of-speech sequence, Pd(d_j|s_j) denotes the probability obtained by comparing the prosody model of the input speech for the morpheme part-of-speech sequence with the pre-built prosody models for each morpheme part-of-speech sequence, and C is a constant that determines how strongly the comparison result is applied).

A morpheme part-of-speech tagging apparatus using a prosody model, comprising:

a morpheme analysis unit that analyzes the morphemes of a text sentence based on prosody models for each morpheme part-of-speech sequence;

an ambiguity determination unit that determines whether ambiguity exists according to whether such a morpheme part-of-speech sequence exists in the morpheme analysis result;

a prosody model detection unit that detects a prosody model for speech when the ambiguity exists;

a prosody model comparison determination unit that determines the morpheme part-of-speech sequence of the prosody model for the speech by comparing the prosody model for the speech with the prosody models for each morpheme part-of-speech sequence;

a part-of-speech tagging unit that applies a morpheme part-of-speech tagging technique to the determined morpheme part-of-speech sequence result and adds in the part-of-speech sequence information of the prosody model for the speech to perform part-of-speech tagging; and

a part-of-speech tagging result output unit that outputs the part-of-speech-tagged result.

The apparatus of claim 6, wherein the prosody model for each morpheme part-of-speech sequence is built using a speech DB that stores speech, a transcribed text DB that stores text transcribed from the speech, and a morpheme part-of-speech tagging DB that stores morpheme part-of-speech tagging for the transcribed text.

The apparatus of claim 7, wherein the prosody model is obtained per word (eojeol), or extended to phrase or clause units.

The apparatus of claim 6, wherein the prosody model for each morpheme part-of-speech sequence is stored in a data storage DB.

The apparatus of claim 6, further comprising a prosody model construction unit for each morpheme part-of-speech sequence that reads a speech DB, a transcribed text DB, and a morpheme part-of-speech tagging DB for the transcribed text, finds the speech segments for the morpheme part-of-speech sequences in the speech DB, collects prosody information, and finds, in the collected prosody information, prosodic attributes that can distinguish the morpheme part-of-speech sequences, thereby building the prosody models.