KR20060092683A

KR20060092683A - Continuity voice recognition model construction method

Info

Publication number: KR20060092683A
Application number: KR1020050013760A
Authority: KR
Inventors: 신원호
Original assignee: 엘지전자 주식회사
Priority date: 2005-02-18
Filing date: 2005-02-18
Publication date: 2006-08-23

Abstract

The present invention relates to a method for constructing a continuous speech recognition model. For each speech recognition application field, a representative word that can stand in for the language model probability values of that field is set, and the language model probability value of the representative word is applied to the remaining words, thereby reducing the corpus generation time and workload required to build the language model. To this end, the method comprises: a user selecting a speech recognition application field; expressing sentences of various patterns belonging to that application field as a grammar; setting representative words for the application field; generating a text corpus using the representative words and the grammar; generating a language model from the text corpus; and mapping words belonging to the same field as the representative words to an additional dictionary list.

Description

CONTINUITY VOICE RECOGNITION MODEL CONSTRUCTION METHOD

FIG. 1 is a flowchart of the continuous speech recognition model construction method of the present invention.

FIG. 2 is an exemplary view showing sentences of various patterns expressed as a grammar in FIG. 1.

FIG. 3 is an exemplary view showing representative words set in FIG. 1.

FIG. 4 is an exemplary view showing text corpus generation in FIG. 1.

FIG. 5 is an exemplary view showing the additional dictionary list in FIG. 1.

FIG. 6 is an exemplary view showing a dictionary in which representative words are set in FIG. 1.

The present invention relates to a method for constructing a speech recognition model, and more particularly to a method for constructing a continuous speech recognition language model in which, for each speech recognition application field, a representative word that can stand in for the language model probability values of that field is set and the language model probability value of the representative word is applied to the remaining words, thereby reducing the corpus generation time and workload required to build the language model.

Speech recognition at the isolated-word level is now relatively widespread and is used in many applications. As isolated-word recognition has been commercialized, users increasingly demand speech recognition products with more advanced capabilities. Accordingly, techniques such as keyword spotting, which can recognize a target word even when other words are spoken before or after it, and continuous speech recognition, which can recognize natural sentences, are being applied or demanded.

In continuous speech recognition, the factors that determine recognition performance fall broadly into two categories.

The first is the performance of the acoustic model: good performance is obtained only when the acoustic model for each phoneme is trained on a sufficient amount of speech data.

The second is the performance of the language model. In continuous speech recognition, good recognition performance cannot be expected by relying on the acoustic model alone. This is similar to how people actually understand and predict what others say from the surrounding context. Understanding and recognition are of course somewhat different things, but the comparison is offered as an aid to explanation.

That is, if the probability that a particular word is followed by a given word can be used during recognition, recognition performance improves. Building such a language model, however, requires a large text corpus. In most cases easily collected text such as online newspapers, novels, and other written material is used, but for recognition systems applied to a specific application domain, such as car navigation or train reservation systems, a suitable corpus is hard to obtain.

Consequently, a general-purpose corpus is used as is, or data is generated artificially. In such applications, most database words that belong to a category, such as place names, do not appear in the actual corpus at all, and those that do appear differ so greatly in frequency that the counts carry little real meaning.

A language model is generated either by building it from a given training corpus or, when no training corpus is available, by generating a corpus artificially.

A generated training corpus departs somewhat from real statistics and is somewhat artificial depending on the generation method and scope, but its use is unavoidable when real data cannot be obtained.

In this case, possible sentences or phrases are generated using a grammar or the like, and common nouns such as place names are inevitably generated repeatedly within the same sentence pattern.

Therefore, assuming that a sentence pattern is composed of several combinations, there is a problem in that the number of generated sentences grows exponentially as the number of words, such as place names, that occupy part of the pattern increases.

The present invention has been devised to solve the above problem. Its object is to provide a method for constructing a continuous speech recognition language model in which, for each speech recognition application field, a representative word that can stand in for the language model probability values of that field is set and the language model probability value of the representative word is applied to the remaining words, thereby reducing the corpus generation time and workload required to build the language model.

To achieve the above object, the present invention is characterized by performing the steps of: a user selecting a speech recognition application field;

expressing sentences of various patterns belonging to the speech recognition application field as a grammar;

setting representative words for the speech recognition application field;

generating a text corpus using the representative words and the grammar;

generating a language model using the text corpus; and

mapping words belonging to the same field as the representative words to an additional dictionary list.

Hereinafter, the operation and effects of the continuous speech recognition language model construction method according to the present invention are described in detail with reference to the accompanying drawings.

First, it should be noted that the present invention is based on the observation that a corpus can be generated efficiently if sentences are generated using only representative words chosen arbitrarily for a specific speech recognition application field, and the remaining words are treated as if they had been generated in the same form as the representative words.
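As a rough illustration of the saving this observation targets, the sketch below counts the sentences a grammar would generate when every place name fills every slot, and when only a few representative words do. The function name, the slot counts, and the vocabulary sizes are assumptions chosen for illustration; none of them come from the specification.

```python
def sentence_count(slots_per_pattern, vocabulary_size):
    """Number of sentences produced when each slot of each pattern can be
    filled independently by any of `vocabulary_size` words.

    slots_per_pattern: list with the number of word slots in each pattern
                       (hypothetical input format).
    """
    return sum(vocabulary_size ** slots for slots in slots_per_pattern)


# Two hypothetical patterns: one with a single place-name slot,
# one with two slots (for example an origin and a destination).
patterns = [1, 2]

# Assumed figures: 1000 place names versus 2 representative words.
full_expansion = sentence_count(patterns, 1000)
representative_only = sentence_count(patterns, 2)

print(full_expansion)       # 1000 + 1000**2 = 1001000
print(representative_only)  # 2 + 2**2 = 6
```

Under these assumed figures the representative-word corpus is several orders of magnitude smaller, and it does not grow when further place names are added to the dictionary.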

FIG. 1 is a flowchart of the continuous speech recognition language model construction method of the present invention.

As shown in FIG. 1, the present invention comprises: a user selecting a speech recognition application field (SP1); expressing sentences of various patterns belonging to the application field as a grammar (SP2); setting representative words for the application field (SP3); generating a text corpus using the representative words and the grammar (SP4); generating a language model using the text corpus (SP5); mapping words belonging to the same field as the representative words to an additional dictionary list (SP6); and updating the language model to reflect the words added to the additional dictionary list (SP7). The operation of the invention is described below.

First, the user selects a speech recognition application field (SP1), for example a specific field such as navigation, airline reservation, or movie reservation.

Next, sentences of various patterns belonging to the selected application field are expressed as a grammar (SP2), for example as shown in FIG. 2.

Next, representative words for the selected application field are set (SP3): the final consonant (받침, batchim) of each representative word is analyzed, and the representative words are classified and set according to the result. For example, as shown in FIG. 3, 'Garak-dong' (가락동) and 'Jamsil' (잠실) are set as representative words. Because the form of an attached Korean particle depends on whether the preceding word ends in a final consonant, representative words are generated separately, even for the same pattern, for the case with a final consonant and the case without.
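The specification does not say how the final consonant is analyzed. One common approach, sketched here as an assumption, relies on the layout of precomposed Hangul syllables in Unicode: each syllable's code point encodes its final consonant as the remainder modulo 28, with 0 meaning no final consonant. The function names and the word '여의도' are illustrative and do not appear in the specification.

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable, '가'
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable, '힣'
FINAL_COUNT = 28      # 27 final consonants plus the "no final" case


def has_final_consonant(word):
    """Return True if the last syllable of a Hangul word ends in a final
    consonant (batchim), False otherwise."""
    code = ord(word[-1])
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("last character is not a precomposed Hangul syllable")
    return (code - HANGUL_BASE) % FINAL_COUNT != 0


def classify_representatives(words):
    """Split candidate representative words into the two groups that
    step SP3 distinguishes (hypothetical group names)."""
    groups = {"with_final": [], "without_final": []}
    for word in words:
        key = "with_final" if has_final_consonant(word) else "without_final"
        groups[key].append(word)
    return groups


# '가락동' and '잠실' are the examples from FIG. 3; '여의도' is an added
# illustrative word that ends without a final consonant.
groups = classify_representatives(["가락동", "잠실", "여의도"])
print(groups)
```

Keeping one representative word per group lets the grammar attach the correct particle form to each, so that every other word in the field can inherit values from the representative whose ending matches its own.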

Next, a text corpus is generated using the representative words and the grammar, as shown in FIG. 4 (SP4), and a language model is generated from that text corpus (SP5).
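Steps SP4 and SP5 can be sketched as follows. The specification does not fix a grammar format or a language model type, so this example assumes sentence templates with a `<PLACE>` slot and a plain maximum-likelihood bigram model; the templates, slot marker, and function names are all hypothetical.

```python
from collections import Counter
from itertools import product

SLOT = "<PLACE>"  # hypothetical slot marker in a sentence template


def expand(template, words):
    """Fill every SLOT in `template` with each combination of `words`."""
    parts = template.split(SLOT)
    sentences = []
    for combo in product(words, repeat=len(parts) - 1):
        pieces = [parts[0]]
        for word, tail in zip(combo, parts[1:]):
            pieces.extend((word, tail))
        sentences.append("".join(pieces))
    return sentences


def build_corpus(templates, representatives):
    """SP4: generate the corpus from representative words only."""
    corpus = []
    for template in templates:
        corpus.extend(expand(template, representatives))
    return corpus


def bigram_probabilities(corpus):
    """SP5 (assumed model type): P(next | prev) from bigram counts."""
    context = Counter()
    pairs = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            context[prev] += 1
            pairs[(prev, nxt)] += 1
    return {pair: count / context[pair[0]] for pair, count in pairs.items()}


templates = ["go to <PLACE>", "from <PLACE> to <PLACE>"]
representatives = ["Garak-dong", "Jamsil"]

corpus = build_corpus(templates, representatives)
model = bigram_probabilities(corpus)
print(len(corpus))                 # 2 + 2*2 = 6 sentences
print(model[("from", "Jamsil")])   # 2 of the 4 "from ..." sentences: 0.5
```

Only the representative words ever appear in the generated corpus; the remaining place names receive their values later, in the mapping step.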

Next, words belonging to the same field as the representative words are mapped to the additional dictionary list shown in FIG. 5 (SP6), and the language model of those words is then updated (SP7). Specifically, it is determined whether the word to be added already exists in the additional dictionary list. If it does not, the language model value of the word to be added is set to the language model value of the representative word of the speech recognition application field to which the word belongs. If it does, the language model value of the word to be added is set according to the correlation between the application field to which the word to be added belongs and the application field to which the existing word belongs.

The correlations between speech recognition application fields are preset and stored.

That is, if the application field of the word to be added and that of the existing word are not correlated, the language model value is set to that of the representative word of the application field to which the word to be added belongs. If the two application fields are correlated, the language model value of that representative word is compared with the language model value of the existing word, and the larger of the two is reset as the language model value of the word to be added.
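The decision rule described above can be written as a short function. This is a minimal sketch under stated assumptions: the data structures (a dictionary mapping each word to its field and value, and a set of correlated field pairs), the numeric values, and the choice to treat a field as correlated with itself are not specified in the source.

```python
def assign_lm_value(word, field, representative_value, dictionary, correlated):
    """Choose the language model value for `word` being added under `field`.

    representative_value: field -> LM value of that field's representative word
    dictionary:           word -> (field, LM value) for existing entries
    correlated:           set of frozenset({field_a, field_b}) preset pairs
    """
    candidate = representative_value[field]

    # Word not yet in the additional dictionary list: inherit the
    # representative word's value.
    if word not in dictionary:
        return candidate

    existing_field, existing_value = dictionary[word]

    # Assumption: identical fields count as correlated.
    related = (field == existing_field
               or frozenset((field, existing_field)) in correlated)

    # Uncorrelated fields: the representative word's value replaces the old one.
    if not related:
        return candidate

    # Correlated fields: keep whichever value is larger.
    return max(candidate, existing_value)


# Hypothetical values for illustration only.
representative_value = {"navigation": 0.02, "movie": 0.01}
dictionary = {"Jamsil": ("movie", 0.05)}
correlated = {frozenset(("navigation", "movie"))}

print(assign_lm_value("Guro-dong", "navigation",
                      representative_value, dictionary, correlated))  # 0.02
print(assign_lm_value("Jamsil", "navigation",
                      representative_value, dictionary, correlated))  # 0.05
```

With an empty `correlated` set, the second call would return 0.02 instead, because the existing value from the unrelated field would be discarded.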

For example, as shown in FIG. 6, Gayang-dong (가양동), Gaepo-dong (개포동), Guro-dong (구로동), and similar entries in the additional dictionary list take the same language model value as representative words such as Garak-dong. The number that follows each entry is '0', or '1' or '2', depending on whether the word to be added already exists in the additional dictionary list.

When the number is set to '1', the existing word's model value is not used and is replaced by the representative word's model value; when it is set to '2', the value is reset to the maximum of the probability values of the existing word and the representative word.
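The flag semantics can be illustrated with a small lookup. The reading of '0' as "the word was not previously present, so the representative value is used" is an interpretation of the description rather than something stated outright, and the flags and probabilities assigned to the example entries below are invented for illustration.

```python
def value_for_flag(flag, representative_prob, existing_prob=None):
    """Resolve a dictionary entry's LM value from its trailing flag.

    0: word was not previously present (assumed reading); use the
       representative word's value.
    1: word exists, but its old value is replaced by the representative's.
    2: word exists; use the larger of the old and representative values.
    """
    if flag in (0, 1):
        return representative_prob
    if flag == 2:
        if existing_prob is None:
            raise ValueError("flag 2 requires the existing word's probability")
        return max(representative_prob, existing_prob)
    raise ValueError(f"unknown flag: {flag}")


REPRESENTATIVE_PROB = 0.02  # hypothetical value for 'Garak-dong'

# (word, flag, existing probability or None): made-up entries.
entries = [
    ("Gayang-dong", 0, None),
    ("Gaepo-dong", 1, 0.001),
    ("Guro-dong", 2, 0.03),
]

resolved = {word: value_for_flag(flag, REPRESENTATIVE_PROB, prob)
            for word, flag, prob in entries}
print(resolved)
```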

If a particular dong (neighborhood) name has distinctive characteristics of its own, it is handled separately from the above method.

In short, the present invention sets, for each speech recognition application field, a representative word that can stand in for the language model probability values of that field, and applies the language model probability value of the representative word to the remaining words according to whether the word to be added already exists in the additional dictionary list.

The specific embodiments and examples given in the detailed description are intended only to clarify the technical content of the present invention. The invention should not be interpreted narrowly as limited to these specific examples, and various modifications may be made within the spirit of the invention and the scope of the claims set forth below.

As described in detail above, the present invention sets, for each speech recognition application field, a representative word that can stand in for the language model probability values of that field, and applies the language model probability value of the representative word to the remaining words according to whether the word to be added already exists in the additional dictionary list, thereby reducing the corpus generation time and workload required to build the language model.

Claims

1. A method for constructing a continuous speech recognition language model, comprising:

a user selecting a speech recognition application field;

expressing sentences of various patterns belonging to the speech recognition application field as a grammar;

setting representative words for the speech recognition application field;

generating a text corpus using the representative words and the grammar;

generating a language model using the text corpus; and

mapping words belonging to the same field as the representative words to an additional dictionary list.

2. The method of claim 1, wherein the setting of the representative words comprises:

analyzing the final consonant of each representative word, and classifying and setting the representative words based on the result of the analysis.

3. The method of claim 1, further comprising updating the language model to reflect the words added to the additional dictionary list.

4. The method of claim 1, wherein the mapping of the words to the additional dictionary list comprises:

determining whether a word to be added exists in the additional dictionary list;

if the word to be added does not exist in the additional dictionary list, setting the language model value of the word to be added to the language model value of the representative word of the speech recognition application field to which the word to be added belongs; and

if the word to be added exists in the additional dictionary list, setting the language model value of the word to be added according to the correlation between the speech recognition application field to which the word to be added belongs and the speech recognition application field to which the existing word belongs.

5. The method of claim 4, further comprising the step of presetting correlations of speech recognition applications.

6. The method of claim 4, wherein the setting of the language model value according to the correlation comprises:

if the speech recognition application field of the word to be added and the speech recognition application field of the existing word are not correlated,

setting the language model value to that of the representative word of the speech recognition application field to which the word to be added belongs; and

if the speech recognition application field of the word to be added and the speech recognition application field of the existing word are correlated,

comparing the language model value of the representative word of the speech recognition application field to which the word to be added belongs with the language model value of the existing word, and resetting the language model value of the word to be added to the larger of the two values.