KR100931786B1 - Speech recognition method according to Korean spelling

Info

Publication number: KR100931786B1
Application number: KR1020020034627A
Authority: KR
Inventors: 최영재
Original assignee: 주식회사 케이티
Priority date: 2002-06-20
Filing date: 2002-06-20
Publication date: 2009-12-14
Also published as: KR20030097309A

Abstract

The present invention relates to a speech recognition method based on spoken Korean spelling. Its object is to provide a method that improves recognition accuracy and feeds the recognition result back to the user quickly.

To this end, the invention provides a speech recognition method applied to a speech recognition system, comprising: a first step of building an isolated word phonetic dictionary for recognizing spoken Korean spelling; a second step of receiving, from a user, a search-target word spoken letter by letter; and a third step of combining the letters recognized on the basis of the isolated word phonetic dictionary into syllables and words and outputting the recognition result as text. The invention is used in speech recognition systems and the like.

Description

Method of Korean utterance recognition using spelling pronunciation

FIG. 1 is an exemplary block diagram of a speech recognition system to which the present invention is applied.

FIG. 2 is a flowchart of one embodiment of the speech recognition method based on spoken Korean spelling according to the present invention.

FIG. 3 is a flowchart of one embodiment showing an example implementation of a speech recognition service according to the present invention.

FIG. 4 is a detailed flowchart of one embodiment showing a specific service implementation of FIG. 3 according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

11: endpoint detector 12: feature extractor

13: Viterbi searcher 14: dictionary

The present invention relates to a speech recognition method that recognizes spoken Korean syllables, words, or sentences using a total of 31 isolated words: 19 Korean consonants, 10 vowels, one word that separates one syllable from the next, and one word that marks the end of the input speech.

A speech recognition telephone information system recognizes a request spoken over the telephone network and provides the related information. For example, in a securities information service that uses speech recognition, a user can hear a company's stock information simply by saying the company name, without memorizing the code number assigned to that company.

A widely known approach to speech recognition uses the hidden Markov model (HMM). Recognition is carried out by a Viterbi search, which compares the HMMs trained in advance for the candidate words against the features of the input speech and selects the most similar candidate word.

However, speech recognition is computationally expensive: as the number of recognizable words grows, the error rate rises and the response time lengthens. Conventional systems therefore could not provide users with the information they wanted accurately and quickly.

That is, current Korean speech recognition systems can recognize a vocabulary of at most about 2,000 to 3,000 words. For services such as the 114 automated directory service for personal and business names, where the recognition vocabulary exceeds 10,000 words or contains many similar names, it is difficult to provide a practical service with current Korean speech recognition systems. Moreover, even for a service whose recognition vocabulary is within 2,000 to 3,000 words, correct recognition results are hard to obtain where ambient noise is relatively loud, such as in a subway, at a theater ticket office, or in an express bus terminal.

The present invention has been proposed to solve the problems described above. Its object is to provide a speech recognition method based on spoken Korean spelling that improves recognition accuracy and feeds the recognition result back to the user quickly.

To achieve this object, the present invention provides a speech recognition method applied to a speech recognition system, comprising: a first step of building an isolated word phonetic dictionary for recognizing spoken Korean spelling; a second step of receiving, from a user, a search-target word spoken letter by letter; and a third step of combining the letters recognized on the basis of the isolated word phonetic dictionary into syllables and words and outputting the recognition result as text.

The present invention further comprises, after the third step, a fourth step of searching for information using the text word of the recognition result and outputting the retrieved text result as speech through a speech synthesizer.

The present invention further comprises, after the fourth step, a fifth step of automatically dialing, at the user's request, the telephone number given as the speech output.

(Deleted)

The present invention recognizes Korean syllables, words, or sentences using a total of 31 isolated words: 19 Korean consonants, 10 vowels, one word that separates one syllable from the next, and one word that marks the end of the input speech. Because the spelled-speech method uses only these 31 isolated words, it can recognize an unlimited range of spoken syllables, words, or sentences. In particular, because a personal or business name in a speech recognition 114 directory service is a single word long, the proposed method can provide a highly accurate directory service. In addition, since only the 31 isolated-word sounds need to be recognized, it becomes possible to develop a large-vocabulary, speaker-independent isolated-word speech recognizer that is robust to ambient noise and whose recognition rate approaches 100%.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary block diagram of a speech recognition system to which the present invention is applied.

As shown in FIG. 1, the speech recognition system to which the present invention is applied includes an endpoint detector (11) that detects both endpoints of the input speech, a feature extractor (12) that extracts speech features from the endpoint-detected speech signal, and a Viterbi searcher (13) that uses the speech feature values to select the most similar word among the words registered in a dictionary (14).

Here, the dictionary (14) is an isolated word phonetic dictionary of 31 entries in total: the names of the 19 Korean consonants ("기역" (ㄱ), "쌍기역" (ㄲ), "니은" (ㄴ), "디귿" (ㄷ), "쌍디귿" (ㄸ), "리을" (ㄹ), "미음" (ㅁ), "비읍" (ㅂ), "쌍비읍" (ㅃ), "시옷" (ㅅ), "쌍시옷" (ㅆ), "이응" (ㅇ), "지읒" (ㅈ), "쌍지읒" (ㅉ), "치읓" (ㅊ), "키읔" (ㅋ), "티읕" (ㅌ), "피읖" (ㅍ), "히읗" (ㅎ)), the names of the 10 vowels ("아" (ㅏ), "야" (ㅑ), "어" (ㅓ), "여" (ㅕ), "오" (ㅗ), "요" (ㅛ), "우" (ㅜ), "유" (ㅠ), "으" (ㅡ), "이" (ㅣ)), one word that separates one syllable from the next (this description uses "그리고", geurigo, meaning "and"), and one word that marks the end of the input speech (this description uses "이상", isang, roughly "that is all").
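As a rough illustration of the dictionary (14), the 31 entries can be written as a lookup table from each spoken word to the symbol it stands for. This is a minimal Python sketch under stated assumptions: the names `CONSONANTS`, `VOWELS`, `SYLLABLE_END`, `WORD_END`, and `build_dictionary` are illustrative and do not appear in the patent, and a real dictionary would hold acoustic models rather than strings.

```python
# Illustrative only: the patent's dictionary (14) would hold acoustic models;
# here each entry is just the spoken word mapped to the symbol it denotes.
CONSONANTS = {
    "기역": "ㄱ", "쌍기역": "ㄲ", "니은": "ㄴ", "디귿": "ㄷ", "쌍디귿": "ㄸ",
    "리을": "ㄹ", "미음": "ㅁ", "비읍": "ㅂ", "쌍비읍": "ㅃ", "시옷": "ㅅ",
    "쌍시옷": "ㅆ", "이응": "ㅇ", "지읒": "ㅈ", "쌍지읒": "ㅉ", "치읓": "ㅊ",
    "키읔": "ㅋ", "티읕": "ㅌ", "피읖": "ㅍ", "히읗": "ㅎ",
}
VOWELS = {
    "아": "ㅏ", "야": "ㅑ", "어": "ㅓ", "여": "ㅕ", "오": "ㅗ",
    "요": "ㅛ", "우": "ㅜ", "유": "ㅠ", "으": "ㅡ", "이": "ㅣ",
}
SYLLABLE_END = "그리고"  # example marker used in the description
WORD_END = "이상"        # example marker used in the description


def build_dictionary():
    """Return the 31 isolated words, each tagged with its role and symbol."""
    entries = {}
    for word, jamo in CONSONANTS.items():
        entries[word] = ("consonant", jamo)
    for word, jamo in VOWELS.items():
        entries[word] = ("vowel", jamo)
    entries[SYLLABLE_END] = ("syllable_end", None)
    entries[WORD_END] = ("word_end", None)
    return entries
```

The table has 19 + 10 + 1 + 1 = 31 keys, which matches the vocabulary size the description relies on.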

The Viterbi searcher (13) determines the most similar candidate word in the conventional manner, by comparing the HMMs trained in advance for the candidate words against the features of the input speech.
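To make this selection step concrete, the following Python sketch scores an observation sequence against one small HMM per candidate word using the Viterbi recursion and returns the best-scoring word. It is a toy under stated assumptions: the features are taken to be already quantized into discrete symbols (0 or 1), and the two models in `TOY_MODELS` and all their probabilities are made up for illustration. The patent gives no model parameters, and `viterbi_log_score` and `recognize` are hypothetical names.

```python
import math


def _log(p):
    # Log-probability that tolerates zero-probability transitions.
    return math.log(p) if p > 0 else -math.inf


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for a non-empty obs."""
    n = len(start)
    # delta[j]: best log-probability of any path that ends in state j.
    delta = [_log(start[j]) + _log(emit[j][obs[0]]) for j in range(n)]
    for symbol in obs[1:]:
        delta = [
            max(delta[i] + _log(trans[i][j]) for i in range(n))
            + _log(emit[j][symbol])
            for j in range(n)
        ]
    return max(delta)


def recognize(obs, models):
    """Pick the candidate word whose HMM best explains the observations."""
    return max(models, key=lambda word: viterbi_log_score(obs, *models[word]))


# Made-up two-state left-to-right models: (start, transitions, emissions).
TOY_MODELS = {
    "오": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.8, 0.2], [0.2, 0.8]]),
    "이": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.2, 0.8], [0.8, 0.2]]),
}
```

With only 31 candidate models to score, the search cost per utterance stays small regardless of how many names the directory contains, which is the point the description makes about vocabulary size.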

The configuration and operation of the speech recognition system described above are well known in the art, so a detailed description is omitted here.

The following describes how a service with a vocabulary of tens of thousands of words, such as the 114 directory service, can be provided quickly and accurately over wired or wireless telephones by speech recognition, without a human operator. In the 114 directory service, for example, the system recognizes the spoken Korean spelling, runs a text search on the recognized result, and then either reads the retrieved telephone number to the caller through a speech synthesizer or allows it to be dialed directly.

FIG. 2 is a flowchart of one embodiment of the speech recognition method based on spoken Korean spelling according to the present invention.

First, the user (speaker) spells out the input word (201) using the names of the 19 Korean consonants ("기역" (ㄱ) through "히읗" (ㅎ)), the names of the 10 vowels ("아" (ㅏ) through "이" (ㅣ)), one word that separates one syllable from the next ("그리고" in this example), and one word that marks the end of the input speech ("이상" in this example). The utterance passes through the endpoint detector (11), the feature extractor (12), and the Viterbi searcher (13), which recognize each spoken letter (202 to 204). When the syllable-end marker ("그리고" in this example) is recognized (206), the current input syllable is stored (207) and recognition of the following spoken letters continues (201 to 204). When the word-end marker ("이상" in this example) is recognized (205), the letters recognized so far are combined into syllables and a word, and the recognition result is output as text (208).

For example, to input "홍길동" (Hong Gil-dong), the user says "히읗", "오", "이응", "그리고", "기역", "이", "리을", "그리고", "디귿", "오", "이응", "이상" in that order.

Thus, when the user says "히읗", "오", "이응" in sequence followed by the syllable-end marker "그리고" (206), "히읗, 오, 이응" is recognized as one syllable and stored (207). When the user then says "기역", "이", "리을" followed by the syllable-end marker "그리고" (206), "기역, 이, 리을" is recognized as one syllable and stored (207). Finally, when the user says "디귿", "오", "이응" followed by the word-end marker "이상" (205), "디귿, 오, 이응" is recognized as one syllable, and "홍길동", composed of the syllables "히읗, 오, 이응", "기역, 이, 리을", and "디귿, 오, 이응", is output (208).
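The letter-by-letter assembly in steps 205 to 208 can be sketched in Python as follows. The sketch assumes the recognizer has already turned each spoken word into a text token. It composes each syllable with the standard Unicode Hangul formula, 0xAC00 + (lead × 21 + vowel) × 28 + tail; that choice is an assumption, since the patent does not say how the letters are combined. It also assumes every syllable is spelled as two or three letters, so compound vowels and double final consonants are out of scope. `NAME_TO_JAMO`, `compose`, and `assemble` are hypothetical names.

```python
# Unicode orderings of initial consonants, vowels, and final consonants.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0: no final

# The 29 spoken letter names from the description, paired with their letters.
NAMES = ("기역 쌍기역 니은 디귿 쌍디귿 리을 미음 비읍 쌍비읍 시옷 쌍시옷 이응 "
         "지읒 쌍지읒 치읓 키읔 티읕 피읖 히읗 아 야 어 여 오 요 우 유 으 이").split()
JAMO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎㅏㅑㅓㅕㅗㅛㅜㅠㅡㅣ"
NAME_TO_JAMO = dict(zip(NAMES, JAMO))


def compose(jamo):
    """Combine [lead, vowel] or [lead, vowel, tail] into one Hangul syllable."""
    if len(jamo) not in (2, 3):
        raise ValueError("this sketch expects two or three letters per syllable")
    lead = LEADS.index(jamo[0])
    vowel = VOWELS.index(jamo[1])
    tail = TAILS.index(jamo[2]) if len(jamo) == 3 else 0
    return chr(0xAC00 + (lead * 21 + vowel) * 28 + tail)


def assemble(tokens, syllable_end="그리고", word_end="이상"):
    """Turn recognized tokens into text, following steps 205 to 208."""
    syllables, current = [], []
    for token in tokens:
        if token == word_end:                    # step 205
            if current:
                syllables.append(compose(current))
            return "".join(syllables)            # step 208
        if token == syllable_end:                # step 206
            syllables.append(compose(current))   # step 207
            current = []
        else:
            current.append(NAME_TO_JAMO[token])
    raise ValueError("input ended without the word-end marker")


tokens = ["히읗", "오", "이응", "그리고", "기역", "이", "리을", "그리고",
          "디귿", "오", "이응", "이상"]
print(assemble(tokens))  # 홍길동
```

Keeping the markers as parameters reflects the description's note that "그리고" and "이상" are only examples of the two marker words.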

FIG. 3 is a flowchart of one embodiment showing an example implementation of a speech recognition service according to the present invention.

First, the user of the service is given guidance on how to use the system (301), namely how to spell out input using the 31 isolated words: 19 Korean consonants, 10 vowels, one word that separates one syllable from the next, and one word that marks the end of the input speech. In particular, the announcement states the syllable-end marker ("그리고" in this example) and the word-end marker ("이상" in this example) clearly so that the user can recognize them.

The word output in text form as the result of FIG. 2 is converted to speech by an unlimited-vocabulary Korean speech synthesizer and played back to the user, who confirms whether the recognized word is correct (303).

If the user indicates that the recognized word is wrong (303), the system asks the user to spell the word again (304), and the spelled-speech recognition process (302; see FIG. 2) and the confirmation process (303) are repeated until the user confirms the result. Once the result is confirmed as correct (303), the system performs an information search using the text word of the recognition result (305) and then either reads the retrieved text result to the user through the unlimited-vocabulary speech synthesizer or offers automatic dialing when the retrieved information is announced (306).

The present invention applies not only to recognizing Korean words spelled aloud in Korean, but equally to recognizing foreign words spelled aloud in Korean. This covers the case where a foreign word is spelled out using Korean letter names. For example, when searching Japanese web information from a wired or wireless telephone on which Japanese cannot be entered, whether or not a Japanese input program is available, the user spells out the Japanese keyword as it sounds in hiragana (katakana); the system recognizes it, converts it into the Japanese keyword, and can provide the Japanese web documents retrieved with that keyword after converting them back into Korean. That is, to search with the Korean keyword "주가안내" (stock price information), the user spells out the corresponding Japanese "カブカアンナイ" as "카부카안나이". The system recognizes this as "카부카안나이", converts "카부카안나이" into "カブカアンナイ", and converts the Japanese web documents retrieved with the search term "カブカアンナイ" back into Korean. In this way, Japanese web information can be searched from a wired or wireless telephone on which Japanese input is not possible, whether or not a Japanese input program is available.
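The confirm-and-retry flow of FIG. 3 (steps 302 to 306) can be sketched in Python as a loop over caller-supplied functions. This is a minimal sketch under stated assumptions: `run_service` and its parameters are hypothetical names, the recognizer, confirmation prompt, search, and speech output are passed in as plain callables, and the `max_attempts` limit is added here for safety, whereas the description simply repeats until the user confirms.

```python
def run_service(recognize_spelling, confirm, search, announce, max_attempts=3):
    """Recognize a spelled word, confirm it with the user, then search.

    recognize_spelling() -> str : steps 302 (FIG. 2), returns recognized text
    confirm(word) -> bool       : step 303, plays the word back and asks
    search(word) -> str         : step 305, text lookup
    announce(result) -> None    : step 306, speech output of the result
    """
    for _ in range(max_attempts):
        word = recognize_spelling()
        if confirm(word):
            result = search(word)
            announce(result)
            return result
        # Step 304: the user rejected the result, so ask for the spelling again.
    return None  # assumption: give up after max_attempts rejections
```

Passing the stages in as functions keeps the control flow of FIG. 3 visible without committing to any particular recognizer or synthesizer.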

FIG. 4 is a detailed flowchart of one embodiment showing a specific service implementation of FIG. 3 according to the present invention. It shows an example of wired/wireless telephone number lookup in the Hanmir telephone directory service using the Korean speech recognition method based on spelled speech.

First, when a service user calls the speech recognition Hanmir telephone directory service, the service system plays an announcement explaining how to use the service (401) (see step 301 of FIG. 3). A user who already knows how to use the service can begin speaking at any time during the announcement.

If the user knows the exact business name (402), the user spells out the business name (403) and confirms that it was recognized correctly, then spells out the address of the business (404) and goes through the same confirmation. The Hanmir telephone directory service is then searched using these two search terms (405).

The business categories of the search results are announced through the unlimited-vocabulary Korean speech synthesizer according to Hanmir's top-level category classification, and the user is asked to select one (406). The Hanmir telephone number search service currently has 22 top-level categories: living services; medical services; real estate and leasing; travel and lodging; information and communications; research and development; electricity, gas, and water supply; food and beverage services; finance and insurance; sports and recreation services; trade and commodity brokerage; transportation; media and advertising; social services; business services; education and culture services; foreign institutions and civic organizations; manufacturing, wholesale, and retail; public administration, defense, and social security; metals, recycled materials, and processing; forestry, agriculture and livestock, mining, and fishing; and construction.

Next, all telephone exchange numbers in the search results are announced through the unlimited-vocabulary speech synthesizer, and the user is asked to select one (407). If there is only one telephone number for the business name within the selected exchange number, the speech synthesizer announces the number and the call is connected automatically after 2 to 3 seconds (408). If there are two or more telephone numbers for the same business name within the same exchange number, all of them are announced and the user is asked to select one (407). The service ends when the call is connected automatically to the selected number after 2 to 3 seconds (408).

If, on the other hand, the user does not know the exact business name, the user spells out the business category name (409) (the user must be familiar with the category names used by Hanmir) and confirms that it was recognized correctly, then spells out the address of the business (410) and goes through the same confirmation. The Hanmir telephone directory service is searched using these two search terms (411). Based on the search results, the categories are announced through the Korean speech synthesizer according to Hanmir's lowest-level category classification, and the user is asked to select one (412). All resulting business names are then announced and the user is asked to select one (413). Next, all telephone exchange numbers in the search results are announced through the speech synthesizer, and the user is asked to select one (407). If there is only one telephone number for the business name within the selected exchange number, the speech synthesizer announces the number and the call is connected automatically after 2 to 3 seconds (408). If there are two or more telephone numbers for the same business name within the same exchange number, all of them are announced and the user is asked to select one (407). The service ends when the call is connected automatically to the selected number after 2 to 3 seconds (408).

To successfully obtain the telephone number of a business without knowing its exact name, the user must be well acquainted with the business category classification of the Hanmir telephone number search service.

The method of the present invention as described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The present invention described above is not limited to the foregoing embodiment and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention can effectively perform large-vocabulary, speaker-independent speech recognition using only 31 isolated words. For example, in a 114 directory service with a vocabulary of tens of thousands of words, a personal or business name is a single word long, so the invention is very useful for this and similar services. In particular, it can serve as the input technology for speech recognition Internet information retrieval services over wired or wireless telephones.

In particular, while driving a car in a noisy environment, the driver cannot use eyes or hands, so to call a person or company the driver must enter the name accurately by voice alone. Because the present invention only needs to recognize 31 isolated-word sounds, it can perform noise-robust speech recognition and thus provides a very useful input method.

Claims

A speech recognition method based on spoken Korean spelling, applied to a speech recognition system, the method comprising:

a first step of building an isolated word phonetic dictionary for recognizing spoken Korean spelling;

a second step of receiving, from a user, a search-target word spoken letter by letter; and

a third step of combining the letters recognized on the basis of the isolated word phonetic dictionary into syllables and words and outputting the recognition result as text.

The method of claim 1,

further comprising, after the third step, a fourth step of searching for information using the text word of the recognition result and outputting the retrieved text result as speech through a speech synthesizer.

The method of claim 2,

further comprising, after the fourth step, a fifth step of automatically dialing, at the user's request, the telephone number given as the speech output.

The method according to any one of claims 1 to 3,

wherein the isolated word phonetic dictionary includes Korean consonants and vowels, a syllable separator that separates one syllable from the next, and a vocabulary-end separator that marks the end of the input speech.

The method of claim 4,

wherein the isolated word phonetic dictionary is a dictionary of 31 isolated words in total: the 19 Korean consonants ("기역" (ㄱ), "쌍기역" (ㄲ), "니은" (ㄴ), "디귿" (ㄷ), "쌍디귿" (ㄸ), "리을" (ㄹ), "미음" (ㅁ), "비읍" (ㅂ), "쌍비읍" (ㅃ), "시옷" (ㅅ), "쌍시옷" (ㅆ), "이응" (ㅇ), "지읒" (ㅈ), "쌍지읒" (ㅉ), "치읓" (ㅊ), "키읔" (ㅋ), "티읕" (ㅌ), "피읖" (ㅍ), "히읗" (ㅎ)), the 10 vowels ("아" (ㅏ), "야" (ㅑ), "어" (ㅓ), "여" (ㅕ), "오" (ㅗ), "요" (ㅛ), "우" (ㅜ), "유" (ㅠ), "으" (ㅡ), "이" (ㅣ)), one syllable separator, and one vocabulary-end separator.

(Deleted)