KR20090041923A

KR20090041923A - Voice recognition method

Info

Publication number: KR20090041923A
Application number: KR1020070107705A
Authority: KR
Inventors: 박전규; 이윤근; 강병옥; 강점자; 김갑기; 이성주; 전형배; 정호영; 조훈영; 훈 정
Original assignee: 한국전자통신연구원
Priority date: 2007-10-25
Filing date: 2007-10-25
Publication date: 2009-04-29
Also published as: KR100930715B1

Abstract

A speech recognition method is provided that is based on a language model which, among the various knowledge sources, statistically models diverse textual language phenomena. Morphological analysis is performed on a raw text corpus consisting of Korean eojeols (space-delimited word units), producing a morpheme corpus in which each eojeol is split into morphemes (S201). From the morpheme corpus, a word trigram is generated, which is a language model made up of morpheme unigrams, bigrams, and trigrams (S202). For an utterance, up to N first-pass N-best recognition candidates are generated (S204). The candidates are re-evaluated by applying morpho-syntactic constraints, which yields a second N-best list (S205). The second N-best list is then re-evaluated to generate the final N-best list (S206).

Description

Speech Recognition Method

The present invention relates to a speech recognition method and, more particularly, to a speech recognition method based on a language model that, among the various knowledge sources, statistically models diverse textual language phenomena.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Program of the Ministry of Information and Communication and the Institute for Information Technology Advancement [Project number: 2006-S-036-02, Project title: Development of large-capacity interactive distributed-processing speech interface technology for new growth engine industries].

In general, when constructing a language model for large-vocabulary speech recognition, the recognition unit is set at the morpheme or pseudo-morpheme level, and the language model is trained on a large text corpus on the basis of morphological analysis.

A typical speech recognition method converts a speaker's utterance into text using various knowledge sources such as a language model, an acoustic model, a pronunciation dictionary, and a domain model.

Here, a conventional language model defines as a probability the likelihood that a particular word sequence occurs consecutively, and it therefore requires a corpus large enough to cover or represent all linguistic phenomena of the language. To collect such a corpus, one typically draws on several years' worth of newspaper articles, novels, essays, plays, editorials, Internet chat logs, e-mail, and various other text materials.

The basic unit of such a language model is usually the lexical structure called the 'word'. In English, for example, each unique word delimited by spaces serves as the minimum recognition unit, and the various statistics and language models are built on these units, so the vocabulary and grammatical structure of English can be modeled relatively easily.

However, in a language such as Korean, where various prefixes can be attached freely, endings inflect heavily, and particles are highly developed, building a language model at the level of the eojeol (space-delimited word unit) causes the memory required for speech recognition to grow explosively. For Korean it is therefore reasonable and practical to build the language model from morphemes, the minimum units of meaning, or from phonemes, the phonetic information of an eojeol.

The language model used for such speech recognition is obtained by performing morphological analysis on the training corpus to extract morphemes or pseudo-morphemes and by expressing as statistical values the likelihood that morphemes occur adjacently. Typically, the probability that a particular morpheme occurs in the whole corpus is recorded as a unigram (1-gram), the probability that two particular morphemes occur consecutively as a bigram (2-gram), and the probability that three particular morphemes occur consecutively as a trigram (3-gram); tetragrams (4-grams), pentagrams (5-grams), and so on are built in the same way. These are collectively called 'n-grams'.
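As an illustration of the n-gram statistics described above, the following minimal Python sketch counts unigrams, bigrams, and trigrams over morpheme-tokenized sentences. The function name `ngram_counts` and the toy sentences are assumptions made for this example and are not part of the original disclosure.

```python
from collections import Counter

def ngram_counts(sentences, n):
    """Count n-grams over tokenized sentences, padding each with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts

# Toy morpheme sequences (illustrative only).
corpus = [
    ["한국", "은", "민주", "공화국", "이", "다"],
    ["오늘", "밥", "은", "다", "먹", "었", "다"],
]

unigrams = ngram_counts(corpus, 1)   # single morphemes
bigrams = ngram_counts(corpus, 2)    # two consecutive morphemes
trigrams = ngram_counts(corpus, 3)   # three consecutive morphemes
```

Dividing such counts by the appropriate totals gives the relative-frequency estimates on which an n-gram model is built.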

The n-gram language model trained in this way is applied directly during speech recognition, and trigram-based search is currently the most common.

Because this approach uses only statistical information about the adjacency of words, it inevitably and frequently produces recognition results or word orders that are semantically or syntactically wrong.

In addition, because utterance units differ from lexico-syntactic units, performance degrades on speech events of short duration, such as the adnominal endings 'ㄴ' and 'ㄹ'. In practice, therefore, such units are expanded or adjusted into more complex units called pseudo-morphemes, and these are used as the units of speech recognition.

Meanwhile, a given training corpus cannot reflect every lexical and grammatical phenomenon of Korean, and it is practically impossible to compute language-model probabilities for every Korean morpheme listed in a dictionary.

In other words, probabilities are inevitably overestimated for lexico-syntactic events that occur in the limited training corpus and skewed far too low for events that do not occur. Techniques such as smoothing or back-off are applied to correct this.

A common smoothing technique uses the probability value P(E) obtained, as in Equation 1 below, by dividing the number of times r that a particular morpheme sequence occurs in the training corpus by the number of times R that it could have occurred.

P(E) = r / R    (Equation 1)

In general, when a morpheme or morpheme sequence occurs too rarely in the corpus for a reliable probability to be estimated, it is cut off (excluded). To compensate for this, or to handle unknown morpheme sequences, when a given n-gram (W1, W2, ..., Wn) does not occur in the corpus, the value from the lower-order model, P(Wn | W1, ..., Wn-1), is used instead. This is called 'back-off'.

As described above, applying the language-model techniques of the prior art, such as smoothing or back-off, frequently causes insertion errors, in which unintended or meaningless words or morphemes are inserted.

The technical problem addressed by the present invention is to provide a language model that, among various knowledge sources such as a language model, an acoustic model, a pronunciation dictionary, and a domain model, statistically models diverse textual language phenomena, and to provide a speech recognition method based on that language model.

Another object of the present invention is to effectively control the frequent word insertion errors, problematic semantically or lexico-syntactically, that arise in conventional morpheme-based trigram search.

To solve these problems, the present invention implements a language model that can serve as a core component of Korean large-vocabulary speech recognition, together with a search method based on it. For morpheme-based Korean speech recognition, the invention uses a knowledge base organized to suit language-model construction in light of the characteristics of Korean, so that information at the eojeol, morpheme, and syllable levels can be used efficiently.

In other words, for a large-vocabulary speech recognition system that adopts morpheme-based recognition units, the present invention improves recognition performance by efficiently turning the various semantic and syntactic information produced during morphological analysis into knowledge. In particular, it re-evaluates the recognition candidate sequences using not only the morpheme trigram but also the morpheme part-of-speech trigram and a stem-ending lookup table (Stem-suffix Look-up Table) built from stem-ending combination information. By effectively re-evaluating multiple valid candidate sequences and correcting errors, the invention can improve speech recognition performance.

According to one aspect of the present invention, there is provided a speech recognition method comprising: a corpus training process that generates, from a raw text corpus, the knowledge bases used in speech recognition; and a recognition process that uses the knowledge bases to produce the final recognition result of a speech recognition engine whose recognition unit is the morpheme.

The corpus training process includes: performing morphological analysis on the raw text corpus to generate a morpheme corpus and a morpheme sequence; performing statistical language modeling on the morpheme corpus to generate a word trigram and a morpheme trigram; and, taking the morpheme corpus as input, generating the concatenation patterns of endings or suffixes and the actual concatenated endings and organizing them into a stem-ending lookup table. Here, the morpheme corpus is the raw text corpus with each eojeol split into morphemes. The morpheme sequence consists of the sentences of the morpheme corpus with the morphemes removed, leaving only part-of-speech tags. The word trigram is a language model of morpheme unigrams, bigrams, and trigrams built from the morpheme corpus. The morpheme trigram is a part-of-speech-set language model of unigrams, bigrams, and trigrams of morpheme part-of-speech tags built from the morpheme sequence.

The recognition process includes: performing a trigram search on the utterance to generate a plurality of first N-best recognition candidate sequences; performing a morpheme-trigram-based search on the first N-best candidates to generate a second N-best list; and re-evaluating the second N-best list by applying the stem-ending lookup table to generate the final N-best list.

According to another aspect of the present invention, there is provided a method of recognizing an utterance, comprising: performing a word trigram search on the utterance to generate a plurality of first N-best candidate sequences; performing a morpheme trigram search on the first N-best candidates to generate second N-best candidate sequences; and performing a stem-ending search on the second N-best candidates to generate the final N-best candidate sequences.

Here, generating the second N-best list means re-evaluating the first N-best candidates by applying the morpheme trigram to the part-of-speech tag sequence of each recognition candidate. Generating the final N-best list includes constructing the stem-ending lookup table for the stem-ending search and re-evaluating the second N-best candidates on the basis of that table.

Constructing the stem-ending lookup table includes: separating each eojeol into a stem and an ending; grouping the ending as an ending sequence without separating it further; defining the ending combination sequences as a database; and defining the database fields of the stem-ending lookup table.

Thus, according to the present invention, the frequent word insertion errors that are semantically or lexico-syntactically problematic in prior-art morpheme-based trigram search can be effectively controlled, and speech recognition performance can be improved by effectively re-evaluating multiple valid recognition candidate sequences and correcting errors.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art can readily practice the invention. The invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit", "... device", and "... module" denote a unit that handles at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

A speech recognition method according to an embodiment of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention.

The speech recognition method according to the embodiment is a multi-stage method based on morphemes, morpheme parts of speech, and the definition of a stem-ending lookup table. As shown in FIG. 1, it consists broadly of three stages: word trigram search (S110), morpheme trigram search (S120), and stem-ending search (S130). Here, the morpheme trigram is applied to the part-of-speech tag sequence of each recognition candidate to re-evaluate the N-best candidates. In the stem-ending search, the stem-ending lookup table is constructed and the N-best candidates are re-evaluated on the basis of that table.

In other words, the multi-stage method addresses the problem that prior-art language-model techniques such as smoothing or back-off frequently cause insertion errors of unintended or meaningless words or morphemes. To this end, it builds an ordinary n-gram language model from the training corpus and at the same time computes a morpheme trigram based on morpheme parts of speech. It also divides the Korean eojeol structure into a stem (or root) and an ending (affix), and generates and uses a stem-ending lookup table that explicitly defines, in particular, the structural combination rules of endings.

The multi-stage method takes the multiple recognition candidate sequences (N-best) produced by the first-stage trigram search, which is the basic search structure, and corrects errors as post-processing through re-evaluation and re-ranking in the second and third stages, thereby improving overall speech recognition performance.

Next, the process and principle of generating the stem-ending lookup table from the Korean eojeol structure are described.

First, the usual combination rules for Korean eojeol structure can be categorized as: stem; prefix + stem; stem + ending (affix); and prefix + stem + ending. Of these, the rule that shows the greatest variety in Korean is the combination of endings.

For example, consider the representative ending '고'. As shown below, it attaches to the end of a predicate (용언) or substantive (체언) and takes part in a variety of inflections; the ending "+고" follows a substantive in the inflected forms below and performs a lexico-syntactic function.

1) +고 : substantive (체언) : suffix sequence : +시+고 : 창+설+하+시+고

2) +고 : substantive (체언) : suffix sequence : +리+라+고 : 습+격+해+오+리+라+고

3) +고 : substantive (체언) : suffix sequence : +고 : 부+풀+려+지+고

4) +고 : substantive (체언) : suffix sequence : +시+고 : 창+설+하+시+고

From this viewpoint, the embodiment analyzes the lexico-syntactic connection structure of Korean stems and endings and builds the stem-ending lookup table by storing the combination rules of endings in a database as follows.

First, each eojeol is separated into a stem (i.e., a predicate, substantive, etc.) and an ending (i.e., a particle, suffix, etc.).

Second, Korean has morphemes that derive or re-derive new stems. Although such morphemes belong to the stem in principle, they could themselves be separated into another stem and ending; in view of this, the ending is not separated any further and is grouped as a suffix sequence.

Third, each ending combination sequence is defined as a database entry that is unique in terms of its headword (lemma), morpheme part of speech, combination form, and components.

Fourth, the database fields of the stem-ending lookup table are defined. For the headword ('+고' in the examples above), syllable boundaries within the eojeol are marked with '+', and the final syllable is set as the headword for syllable and lexical parsing. The combined part-of-speech tag ('체언', substantive, in the examples above) specifies the form that can be combined, such as a predicate or a substantive. The part-of-speech combination information ('suffix sequence' in the examples above) defines features according to free or bound morpheme, ending, ending combination, and so on. The composition ('+리+라+고' in the examples above) specifies the syllable composition of the ending. The example eojeol spelling ('습+격+해+오+리+라+고' in the examples above) shows the fully combined eojeol.
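The five fields described above could be held in a small record type, as in the following minimal Python sketch. The class name `StemEndingEntry`, its field names, and the colon-separated input format are assumptions made for this illustration; the specification does not prescribe a storage format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StemEndingEntry:
    headword: str      # final syllable of the ending, e.g. "+고"
    attaches_to: str   # combinable stem category, e.g. "체언" (substantive)
    feature: str       # combination information, e.g. "suffix sequence"
    composition: str   # syllable composition of the ending, e.g. "+리+라+고"
    example: str       # fully combined eojeol, syllables joined by "+"

def parse_entry(line):
    """Parse one 'headword : category : feature : composition : example' line."""
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return StemEndingEntry(*fields)

def build_table(lines):
    """Index entries by headword; a set keeps each entry unique."""
    table = {}
    for line in lines:
        entry = parse_entry(line)
        table.setdefault(entry.headword, set()).add(entry)
    return table

lines = [
    "+고 : 체언 : suffix sequence : +시+고 : 창+설+하+시+고",
    "+고 : 체언 : suffix sequence : +리+라+고 : 습+격+해+오+리+라+고",
    "+고 : 체언 : suffix sequence : +고 : 부+풀+려+지+고",
    "+고 : 체언 : suffix sequence : +시+고 : 창+설+하+시+고",
]
table = build_table(lines)
```

Storing the entries in a set reflects the requirement that each ending combination appear only once, so the repeated first and fourth examples collapse into a single entry.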

The speech recognition method according to the embodiment is described in more detail below with reference to the flowchart of FIG. 2.

FIG. 2 illustrates a multi-stage speech recognition method based on statistical language model training according to an embodiment of the present invention. It shows a corpus training process, which generates from a raw text corpus the knowledge bases used directly in speech recognition, and a recognition process, which uses these knowledge bases to produce the final recognition result of a speech recognition engine whose recognition unit is the morpheme.

Morphological analysis (S201), statistical language modeling (S202), and shallow parsing (S203) belong to the corpus training process; trigram search (S204), morpheme trigram search (S205), and stem-ending search (S206) belong to the recognition process.

The morphological analysis step (S201) performs morphological analysis on a raw text corpus made up of Korean eojeols and generates a morpheme corpus in which each eojeol is split into morphemes. The morphemes of this corpus are used directly as the units of speech recognition. At the same time, the step generates a morpheme sequence consisting only of the part-of-speech tags of these morphemes.

The statistical language modeling step (S202) generates, from the morpheme corpus produced in the morphological analysis step (S201), the word trigram, a language model made up of morpheme unigrams, bigrams, and trigrams. At the same time, from the morpheme sequence produced in S201, it generates the morpheme trigram, a part-of-speech-set language model made up of unigrams, bigrams, and trigrams of morpheme part-of-speech tags.

The key idea of shallow parsing (S203) is that, as in the "+리+라+고" example above, it does not require an exact semantic or part-of-speech analysis of each individual particle or suffix. It is concerned only with the lexical analysis of what functions as the ending within an eojeol, and with the forms that predicates and the predicative particle take when they inflect, relative to the stem, which is the part that does not change under inflection. Accordingly, when stem-ending connection rules are registered, only the part-of-speech tag 'suffix sequence' is used.

The shallow parsing step (S203) takes as input the morpheme corpus output by the morphological analysis step (S201) and generates the stem-ending lookup table by producing the concatenation patterns of endings or suffixes and the actual concatenated endings.
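One possible reading of this step is sketched below in Python. It assumes that the morpheme corpus uses the 'surface/tag' notation with '+' between morphemes, and that the trailing run of morphemes tagged `j` or `e` forms the ending sequence. The tag set `ENDING_TAGS` and the function names are assumptions for this illustration, and the sketch groups endings at the morpheme level rather than the syllable level used in the '+고' examples.

```python
from collections import defaultdict

# Assumption: tags treated as endings (particles and verbal endings).
ENDING_TAGS = {"j", "e"}

def split_eojeol(eojeol):
    """Split 'surface/tag+surface/tag+...' into (stem morphemes, ending morphemes)."""
    morphs = [tuple(m.rsplit("/", 1)) for m in eojeol.split("+")]
    cut = len(morphs)
    # Walk back over the trailing run of ending-tagged morphemes.
    while cut > 0 and morphs[cut - 1][1] in ENDING_TAGS:
        cut -= 1
    return morphs[:cut], morphs[cut:]

def build_lookup(sentences):
    """Map each ending sequence to the set of stem tags it was seen attached to."""
    table = defaultdict(set)
    for sentence in sentences:
        for eojeol in sentence.split():
            if eojeol in ("<s>", "</s>"):
                continue
            stem, ending = split_eojeol(eojeol)
            if stem and ending:
                key = "".join("+" + surface for surface, _ in ending)
                table[key].add(stem[-1][1])
    return dict(table)

morpheme_corpus = [
    "<s> 한국/nq+은/j 민주/nc+공화국/nc+이/j+다/e </s>",
    "<s> 오늘/nc 밥/nc+은/j 다/nc 먹/pv+었/e+다/e </s>",
]
lookup = build_lookup(morpheme_corpus)
```

The ending sequence is kept whole and is never analyzed further, which mirrors the shallow-parsing idea of treating it as a single 'suffix sequence'.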

The trigram search step (S204) is the typical search process of large-vocabulary speech recognition, which converts recorded speech into text using an acoustic model, a language model, a pronunciation dictionary, and so on. For an utterance, it generates up to N first N-best recognition candidate sequences.

The morpheme trigram search step (S205) aims to re-evaluate the recognition candidates by applying morpho-syntactic constraints. That is, it performs a morpheme-trigram-based search on the first N-best list generated in the trigram search step (S204) in order to verify and re-evaluate that list. The first N-best candidates are re-evaluated and re-ranked by score to produce the second N-best list.
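The specification does not give the scoring formula for this re-evaluation. The following Python sketch shows one plausible form, in which a part-of-speech trigram log score is added to each candidate's first-pass score and the list is re-sorted. The function names, the `weight` and `unseen` parameters, and all numeric values are assumptions for illustration only.

```python
def pos_trigram_logprob(tags, trigram_logp, unseen=-5.0):
    """Sum log probabilities of POS-tag trigrams; unseen trigrams get a floor value."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    score = 0.0
    for i in range(len(padded) - 2):
        score += trigram_logp.get(tuple(padded[i:i + 3]), unseen)
    return score

def rescore(nbest, trigram_logp, weight=1.0, unseen=-5.0):
    """nbest: list of (text, first_pass_score, pos_tags). Returns (score, text), best first."""
    rescored = [
        (first + weight * pos_trigram_logprob(tags, trigram_logp, unseen), text)
        for text, first, tags in nbest
    ]
    rescored.sort(key=lambda item: item[0], reverse=True)
    return rescored

# Illustrative POS trigram log probabilities and candidates.
trigram_logp = {("<s>", "nc", "j"): -0.1761, ("nc", "j", "</s>"): -0.2}
first_nbest = [
    ("candidate A", -1.0, ["j", "j"]),    # best first-pass score, implausible tag sequence
    ("candidate B", -1.2, ["nc", "j"]),   # slightly worse first-pass score, plausible tags
]
second_nbest = rescore(first_nbest, trigram_logp)
```

In this toy case candidate B overtakes candidate A, because A's tag sequence contains only trigrams that were never observed.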

The stem-ending search step (S206) applies the stem-ending lookup table output by the shallow parsing step (S203) to re-evaluate the second N-best list generated in the morpheme trigram search step (S205) and produces the final N-best list.
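How the lookup table enters the score is likewise not specified. As a rough sketch, the Python code below penalizes every eojeol whose ending sequence is not licensed for its stem category and then re-ranks the candidates. The table layout (ending sequence mapped to allowed stem tags), the `penalty` value, and the function names are assumptions made for this example.

```python
def table_penalty(eojeols, table, penalty=2.0):
    """eojeols: list of (stem_tag, ending_sequence) pairs; ending_sequence is '' when
    the eojeol has no ending. Each unlicensed stem-ending pair costs `penalty`."""
    violations = sum(
        1
        for stem_tag, ending in eojeols
        if ending and stem_tag not in table.get(ending, ())
    )
    return -penalty * violations

def final_nbest(second_nbest, table, penalty=2.0):
    """second_nbest: list of (text, score, eojeols). Returns (score, text), best first."""
    scored = [
        (score + table_penalty(eojeols, table, penalty), text)
        for text, score, eojeols in second_nbest
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Illustrative table and candidates.
table = {"+은": {"nc", "nq"}, "+었+다": {"pv"}}
second = [
    ("candidate X", -1.0, [("nc", "+은"), ("nc", "+었+다")]),  # noun stem + verbal ending
    ("candidate Y", -1.5, [("nc", "+은"), ("pv", "+었+다")]),  # every pair is licensed
]
final = final_nbest(second, table)
```

A hard filter that discards violating candidates outright would be an alternative design; the soft penalty keeps every candidate in the list, which is safer when the table is incomplete.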

Table 1 below shows an example of the statistical language processing results of the speech recognition method according to the embodiment.

Table 1

Processing target: Raw text corpus
Data or result:
<s> 나는 그가 수습하리라고는 또한 그의 모국이 한국이리라고는 꿈에도 생각하지 못했다 </s>
<s> 한국은 민주공화국이다 </s>
<s> 오늘 밥은 다 먹었다 </s>
(English gloss: "I never even dreamed that he would resolve it, or that his home country would be Korea." "Korea is a democratic republic." "I ate all of today's meal.")

Processing target: Morpheme corpus
Data or result:
<s> 나/nc+는/j 그/np+가/j 수습/nc+하/xsp+리라고는/e 또한/ma 그의/mm 모국/nc+이/j 한국/nq+이/j+리라고는/e 꿈에도/ma 생각/nc+하/xsp+지/e 못하/pv+었/e+다/e </s>
<s> 한국/nq+은/j 민주/nc+공화국/nc+이/j+다/e </s>
<s> 오늘/nc 밥/nc+은/j 다/nc 먹/pv+었/e+다/e </s>

Processing target: Morpheme sequence
Data or result:
<s> nc+j np+j nc+xsp+e ma mm nc+j nq+j+e ma nc+xsp+e pv+e+e </s>
<s> nq+j nc+nc+j+e </s>
<s> nc nc+j nc pv+e+e </s>

Processing target: Word trigram
Data or result:
\1-grams: (unigram probability, word, back-off weight of the word)
-1.1027 <s> 0.0469
-1.6046 가 0.0109
-1.6046 공화국 0.0357
...
\2-grams: (bigram probability, bigram, back-off weight of the bigram)
-0.1249 <s> 나 0.0000
-0.1249 <s> 한국 0.0000
-0.1249 가 수습 0.0000
-0.1249 공화국 이 0.0000
...
\3-grams: (trigram probability, trigram)
-0.1761 </s> <s> 한국
-0.1761 <s> 나 는
-0.1761 <s> 오늘 밥
-0.1761 <s> 한국 은
-0.1761 그 가 수습
...

Processing target: Morpheme trigram
Data or result:
\1-grams: (unigram probability, word, back-off weight of the word)
-1.1027 <s> -0.4536
-0.7347 e -0.8182
-0.7347 j -0.6096
...
\2-grams: (bigram probability, bigram, back-off weight of the bigram)
-0.3010 <s> nc 0.3802
-0.6021 <s> nq 0.4771
-1.0000 e pv 0.6021
-0.6021 j e 0.3010
...
\3-grams: (trigram probability, trigram)
-0.1761 <s> nc j
-0.1761 <s> nc nc
-0.1761 <s> nq j
-0.1761 e ma nc
-0.1761 e pv e
...

The data processed by the multi-stage speech recognition method based on statistical language model training according to the embodiment described above, together with the processing results, can be represented as in the example of Table 1.

The raw text corpus is split into sentences, with a "<s>" symbol attached to the beginning of each sentence and a "</s>" symbol attached to the end.

Morphological analysis of the raw text corpus produces output in the form shown as the morpheme corpus. The morphemes that combine into a given word (eojeol) are joined with the '+' delimiter, and each morpheme's part-of-speech tag is attached after a '/' symbol.

The morpheme sequence is obtained from the morpheme corpus by dropping the morphemes and keeping only the part-of-speech tags, so that each sentence consists of tags alone.
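The conversion from a morpheme corpus line to a tag-only morpheme sequence can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `parse_word` and `to_tag_sequence` are hypothetical, and the code assumes that neither '+' nor '/' occurs inside a morpheme itself.

```python
def parse_word(token):
    """Split one word such as '한국/nq+은/j' into (morpheme, tag) pairs.

    Assumes '+' joins morphemes and the tag follows the last '/' of each unit.
    """
    pairs = []
    for unit in token.split("+"):
        morpheme, _, tag = unit.rpartition("/")
        pairs.append((morpheme, tag))
    return pairs


def to_tag_sequence(line):
    """Turn a morpheme corpus line into its part-of-speech tag sequence."""
    out = []
    for token in line.split():
        if token in ("<s>", "</s>"):
            # Sentence boundary markers are copied through unchanged.
            out.append(token)
        else:
            out.append("+".join(tag for _, tag in parse_word(token)))
    return " ".join(out)


# Second sentence of the morpheme corpus in Table 1.
example = "<s> 한국/nq+은/j 민주/nc+공화국/nc+이/j+다/e </s>"
tag_line = to_tag_sequence(example)
print(tag_line)
```

For the second sentence of Table 1 this yields the same tag line that the table lists under the morpheme sequence.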

The word trigram is the language model used directly in the trigram search step (S204); the pronunciation dictionary includes part-of-speech tag information, which is used in the search.

The morpheme trigram is the language model used in the morpheme trigram search step (S205).
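The n-gram rows of Table 1 list a log probability, the n-gram, and a back-off weight. The patent does not name its smoothing method; the sketch below assumes the conventional back-off lookup used with this three-column layout (log10 values, back-off weight added when a longer n-gram is missing). The dictionary names and `trigram_logprob` are hypothetical, and the entries are only the morpheme trigram rows that Table 1 shows before its "..." elisions, so "missing" here means "not shown in the table".

```python
# (log10 probability, log10 back-off weight), copied from the morpheme trigram rows of Table 1.
UNIGRAMS = {
    "<s>": (-1.1027, -0.4536),
    "e": (-0.7347, -0.8182),
    "j": (-0.7347, -0.6096),
}
BIGRAMS = {
    ("<s>", "nc"): (-0.3010, 0.3802),
    ("<s>", "nq"): (-0.6021, 0.4771),
    ("e", "pv"): (-1.0000, 0.6021),
    ("j", "e"): (-0.6021, 0.3010),
}
TRIGRAMS = {
    ("<s>", "nc", "j"): -0.1761,
    ("<s>", "nc", "nc"): -0.1761,
    ("<s>", "nq", "j"): -0.1761,
    ("e", "ma", "nc"): -0.1761,
    ("e", "pv", "e"): -0.1761,
}


def trigram_logprob(t1, t2, t3):
    """Return log10 P(t3 | t1, t2), backing off to bigram and unigram entries."""
    if (t1, t2, t3) in TRIGRAMS:
        return TRIGRAMS[(t1, t2, t3)]
    # Back-off weight of the context bigram; 0.0 (no adjustment) if it is not listed.
    bow_context = BIGRAMS.get((t1, t2), (None, 0.0))[1]
    if (t2, t3) in BIGRAMS:
        return bow_context + BIGRAMS[(t2, t3)][0]
    # Back off once more to the unigram, adding the back-off weight of t2 if listed.
    bow_t2 = UNIGRAMS.get(t2, (None, 0.0))[1]
    if t3 in UNIGRAMS:
        return bow_context + bow_t2 + UNIGRAMS[t3][0]
    raise KeyError(f"tag {t3!r} is not in the (partial) unigram table")


direct = trigram_logprob("<s>", "nc", "j")    # trigram listed in Table 1
backed_off = trigram_logprob("j", "e", "pv")  # falls back to bow(j e) + P(pv | e)
```

The second call illustrates the back-off path: the trigram "j e pv" is not among the listed rows, so the score is the back-off weight of "j e" plus the bigram log probability of "e pv".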

Table 2 below shows an implementation example of the stem-ending reference table according to an embodiment of the present invention.

Processing unit: Sentence
Example: 나는 그가 수습하리라고는 또한 그의 모국이 한국이리라고는 꿈에도 생각하지 못했다.
Processing step: Text corpus

Processing unit: Word (eojeol)
Example: 수습하리라고는, 한국이리라고는
Processing step: Morphological analysis

Processing unit: Morpheme
Example: 수습하+리+라+고+는
Processing step: Shallow parsing

Processing unit: Stem-ending reference table
Example:
    ...
    (predicate/stem)+(리+라+고+는/suffix sequence)
    (predicate/stem)+(리+라+고/suffix sequence)
    (predicate/stem)+(리+라/suffix sequence)
    (nominal/stem)+(이+리+라+고+는/suffix sequence)
    (nominal/stem)+(이+리+라+고/suffix sequence)
    (nominal/stem)+(이+리+라/suffix sequence)
    (nominal/stem)+(이/suffix)
    (nominal/stem)+(가/suffix)
    (nominal/stem)+(은/suffix)
    (predicate/stem)+(고/suffix)
    ...
Processing step: Stem-ending search

The result of the morphological analysis step (S201) described above, in which each word is split with '+' as the delimiter into morphemes, the basic unit of speech recognition, can be represented as in the example of Table 2.

Each delimited morpheme sequence is separated into a stem ("수습하") and an ending ("리+라+고+는"), and these are registered in the stem-ending reference table, which defines the combination forms and combination relations between the stem group (prefixes, predicates, nominals, etc.) and the ending group (particles, suffixes, endings, etc.).

Not only independent endings but also concatenations of endings (suffix sequences) are registered, so that scoring is performed by directly referring to the combination forms or combination relations of predicates and nominals.
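One way to picture the stem-ending reference table is as a set of (stem class, suffix sequence) pairs that a candidate word is checked against. The sketch below is an assumption about the data structure, not the patent's database design: `REFERENCE_TABLE` and `is_licensed` are hypothetical names, the class labels "predicate" and "nominal" stand for 용언 and 체언, and the entries are only the ten rows that Table 2 shows between its "..." elisions.

```python
# (stem class, suffix sequence) pairs taken from the visible rows of Table 2.
REFERENCE_TABLE = {
    ("predicate", ("리", "라", "고", "는")),
    ("predicate", ("리", "라", "고")),
    ("predicate", ("리", "라")),
    ("nominal", ("이", "리", "라", "고", "는")),
    ("nominal", ("이", "리", "라", "고")),
    ("nominal", ("이", "리", "라")),
    ("nominal", ("이",)),
    ("nominal", ("가",)),
    ("nominal", ("은",)),
    ("predicate", ("고",)),
}


def is_licensed(stem_class, suffixes):
    """Check whether a stem class may combine with the given suffix sequence."""
    return (stem_class, tuple(suffixes)) in REFERENCE_TABLE


# "수습하+리+라+고+는": predicate stem followed by the suffix sequence 리+라+고+는.
ok_predicate = is_licensed("predicate", "리+라+고+는".split("+"))
# "한국+이+리+라+고+는": nominal stem followed by 이+리+라+고+는.
ok_nominal = is_licensed("nominal", "이+리+라+고+는".split("+"))
# A predicate stem followed by the nominal-type sequence is not among the listed rows.
not_listed = is_licensed("predicate", "이+리+라+고+는".split("+"))
```

Storing whole suffix sequences as keys mirrors the text's point that concatenated endings are registered as units rather than being validated morpheme by morpheme.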

Table 3 below shows an example of the step-by-step recognition operation of the speech recognition method according to an embodiment of the present invention.

Processing order: Trigram search
Processing result: Morpheme (word) unit N-best candidate sequences
N-best candidate sequence example:
    1) 모국/nc 이/j 한국/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e
    2) 모국/nc 이/j 항구/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e
    3) 모국/nc 이/j 한국/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 모태/nc 다/j
    4) 모국/nc 이/j 항복/nc 하/xs 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e

Processing order: Morpheme trigram search
Processing result: N-best candidate sequence re-evaluation
N-best candidate sequence example:
    1) 모국/nc 이/j 한국/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e
    2) 모국/nc 이/j 항구/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e
    3) 모국/nc 이/j 한국/nq 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 모태/nc 다/j
    4) 모국/nc 이/j 항복/nc 하/xs 이/j 리라고는/e 꿈에도/ma 생각/nc 하/xsp 지/e 못하/pv 었/e 다/e

Processing order: Stem-ending search
Processing result: N-best candidate sequence re-evaluation
N-best candidate sequence example:
    1) 모국/nc+이/j 한국/nq+이/j+리라고는/e 꿈에도/ma 생각/nc+하/xsp+지/e+못하/pv+었/e+다/e
    2) 모국/nc+이/j 항구/nq+이/j+리라고는/e 꿈에도/ma 생각/nc+하/xsp+지/e+못하/pv+었/e+다/e
    3) 모국/nc+이/j 한국/nq+이/j+리라고는/e 꿈에도/ma 생각/nc+하/xsp+지/e 모태/nc+다/j
    4) 모국/nc+이/j 항복/nc+하/xs?이/j+리라고는/e 꿈에도/ma 생각/nc+하/xsp+지/e+못하/pv+었/e+다/e

Processing result: Final N-best candidate sequences
N-best candidate sequence example:
    1) 모국이 한국이리라고는 꿈에도 생각하지 못했다
    2) 모국이 항구이리라고는 꿈에도 생각하지 못했다
    3) 모국이 한국이리라고는 꿈에도 생각하지 모태다
    4) 모국이 항복하이리라고는 꿈에도 생각하지 못했다

In the multi-stage speech recognition method based on statistical language model training according to the embodiment described above, the recognition operation for the example utterance "모국이 한국이리라고는 꿈에도 생각하지 못했다" ("I never dreamed that the homeland would be Korea") is as shown in Table 3.

In a typical speech recognition engine that uses the morpheme as the recognition unit, the output takes the form shown for the trigram search in Table 3: each recognized morpheme with its part-of-speech tag attached.

In the example of Table 3, the speech recognition engine generated up to four valid recognition candidate sequences; candidates 1) through 4) are listed in rank order, with the first having the highest score.

The next step, the morpheme trigram search step (S205), takes the morpheme (word) unit N-best candidate sequences generated in the trigram search step (S204) and determines, by referring to the morpheme trigram, whether each is a valid morpheme sequence.

Here, the morpheme arrangements of candidates 1) and 2) follow the trained morpheme trigrams, which appear exactly as such in the morpheme corpus. In candidate 3), however, the tag sequence "e nc j" of "지/e 모태/nc 다/j" is far from the trained morpheme trigrams, and in candidate 4) the tag sequence "nc xs j e" of "항복/nc 하/xs 이/j 리라고는/e" is likewise far from them. In such cases, the morpheme trigram search step (S205) applies a penalty to candidates 3) and 4), lowering their scores.
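The check described above can be illustrated with the tag sequences of Table 1 and the candidates of Table 3. The sketch below is a simplification under stated assumptions: instead of trained trigram probabilities it uses a seen/unseen test against the three training sentences and a fixed penalty per unseen tag trigram, and it pads each candidate with sentence markers. All names and the penalty value are hypothetical.

```python
# Tag-only training sentences from Table 1 ('+' joins tags within one word).
TRAINING_TAG_LINES = [
    "nc+j np+j nc+xsp+e ma mm nc+j nq+j+e ma nc+xsp+e pv+e+e",
    "nq+j nc+nc+j+e",
    "nc nc+j nc pv+e+e",
]


def flatten(tag_line):
    """Drop word boundaries so the line becomes a flat list of tags."""
    return tag_line.replace("+", " ").split()


def trigrams(tags):
    """List the tag trigrams of a sequence padded with sentence markers."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


SEEN = {t for line in TRAINING_TAG_LINES for t in trigrams(flatten(line))}


def unseen_trigrams(tags):
    """Tag trigrams of a candidate that never occur in the training data."""
    return [t for t in trigrams(tags) if t not in SEEN]


def rescore(score, tags, penalty=1.0):
    """Lower a candidate's score by a fixed (assumed) penalty per unseen trigram."""
    return score - penalty * len(unseen_trigrams(tags))


# Tag sequences of the four candidates in Table 3.
CANDIDATE_TAGS = {
    1: "nc j nq j e ma nc xsp e pv e e".split(),
    2: "nc j nq j e ma nc xsp e pv e e".split(),
    3: "nc j nq j e ma nc xsp e nc j".split(),
    4: "nc j nc xs j e ma nc xsp e pv e e".split(),
}

for rank, tags in CANDIDATE_TAGS.items():
    print(rank, unseen_trigrams(tags))
```

With this toy training set, candidates 1) and 2) contain no unseen trigrams, while candidate 3) is flagged around "e nc j" and candidate 4) around "nc xs j", in line with the passage above.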

Finally, the stem-ending search step (S206) evaluates the combination structure of stems and endings. In particular, by referring to the connection rules for predicates and nominals and to the stem-ending reference table, it generates the stem-ending combination structure shown for the stem-ending search in Table 3.

In addition, a recognition result whose combination violates the stem-ending connection table or is incomplete, such as "항복/nc+하/xs?이/j+리라고는/e" in candidate 4), is detected automatically; the offending position is marked with a symbol such as '?' and a penalty is applied to lower the score.

As described above, in constructing a large-vocabulary speech recognition system based on a statistical language model, the multi-stage speech recognition method according to an embodiment of the present invention performs, as the first step, a morpheme trigram search based on morpheme sequences; as the second step, a morpheme part-of-speech trigram search based on morpheme part-of-speech sequences; and as the third step, a stem-ending search based on a reference table built as a knowledge base by defining morphemes in terms of the stem-ending combination structure, a concept describing the sub-components of a word. In particular, by simply reorganizing and reusing basic information produced while training the morpheme-based language model, the method re-evaluates speech recognition results efficiently and improves performance.

The embodiments of the present invention are not implemented only through the apparatus and/or method described above; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which such a program is recorded. A person skilled in the art to which the present invention pertains can readily produce such an implementation from the description of the embodiments above.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within its scope.

FIG. 2 is a flowchart showing the speech recognition method according to an embodiment of the present invention in more detail.

Claims

A speech recognition method comprising:

a corpus learning process of generating a knowledge base for speech recognition from a raw text corpus; and

a recognition process of generating a final recognition result, using the knowledge base, in a speech recognition engine that uses the morpheme as its recognition unit.

The method of claim 1,

wherein the corpus learning process comprises:

generating a morpheme corpus and a morpheme sequence by performing morphological analysis on the raw text corpus;

generating a word trigram and a morpheme trigram by performing statistical language modeling on the morpheme corpus; and

constructing, with the morpheme corpus as input, a stem-ending reference table of the concatenation forms of endings or suffixes and the concatenated endings that actually occur.

The method of claim 2,

wherein the morpheme corpus is

the raw text corpus with each of its words separated into morphemes.

The method of claim 3,

wherein the morpheme sequence

forms each sentence from part-of-speech tags only, with the morphemes removed from the morpheme corpus.

The method of claim 2,

wherein the word trigram

is generated as a language model consisting of morpheme unigrams, bigrams, and trigrams for the morpheme corpus.

The method of claim 2,

wherein the morpheme trigram

is generated as a part-of-speech language model consisting of unigrams, bigrams, and trigrams of morpheme parts of speech for the morpheme sequence.

The method of claim 1,

wherein the recognition process comprises:

generating a plurality of primary N-best recognition candidate sequences by performing a trigram search on the uttered speech;

generating a secondary N-best list by performing a search based on a morpheme trigram on the primary N-best recognition candidate sequences; and

generating a final N-best list by re-evaluating the secondary N-best list with a stem-ending reference table.

The method of claim 7,

wherein the trigram search

converts recorded speech into text using an acoustic model, a language model, and a pronunciation dictionary.

The method of claim 7,

wherein generating the secondary N-best list comprises

verifying and re-evaluating the primary N-best recognition candidate sequences and re-ranking them by score to generate the secondary N-best list.

The method of claim 7,

wherein the search based on the morpheme trigram

re-evaluates the recognition result candidates by applying morpho-syntactic constraints.

A method of recognizing uttered speech, comprising:

generating a plurality of primary N-best candidate sequences by performing a word trigram search on the uttered speech;

generating secondary N-best candidate sequences by performing a morpheme trigram search on the primary N-best candidate sequences; and

generating final N-best candidate sequences by performing a stem-ending search on the secondary N-best candidate sequences.

The method of claim 11,

wherein generating the secondary N-best candidate sequences comprises

re-evaluating the primary N-best candidate sequences by applying the morpheme trigram to the part-of-speech tag sequence of each recognition candidate sequence.

The method of claim 11,

wherein generating the final N-best candidate sequences comprises:

constructing a stem-ending reference table for the stem-ending search; and

re-evaluating the secondary N-best candidate sequences based on the constructed stem-ending reference table.

The method of claim 13,

wherein constructing the stem-ending reference table comprises

analyzing the morpho-syntactic connection structure of stems and endings and compiling the combination rules of the endings into a database, thereby constructing the stem-ending reference table.

The method of claim 13,

wherein constructing the stem-ending reference table comprises:

separating words into stems and endings;

grouping the endings into an ending sequence without separating them further;

defining the ending concatenation sequences as a database; and

defining the database fields of the stem-ending reference table.