KR20010044675A

KR20010044675A - Method of Performing Speech Recognition by syllable and Apparatus Thereof

Info

Publication number: KR20010044675A
Application number: KR1020010013408A
Authority: KR
Inventors: 이종석; 이윤근
Original assignee: 백종관; 주식회사 보이스웨어
Priority date: 2001-03-15
Filing date: 2001-03-15
Publication date: 2001-06-05
Also published as: KR100366601B1

Abstract

PURPOSE: A method and apparatus for performing speech recognition in units of syllables are provided, so that the tens of thousands of words used in everyday life can be recognized with current speech recognition technology. CONSTITUTION: A speech recognition unit (100) performs speech recognition on the first syllable of the uttered speech. A word construction unit (200) uses the results of the speech recognition unit (100) to output the final recognition target word. The speech recognition unit (100) comprises the following elements. A feature extraction unit (10) extracts a feature vector sequence from the uttered speech. A weighting unit (20) weights the portion of the extracted feature vector sequence that corresponds to the first syllable. A search unit (30) compares the feature vector sequence emphasized by the weighting unit (20) with a dictionary (40), selects the most similar word, and finally selects the first syllable of that word. The dictionary (40) contains the text information of the words to be recognized and the feature vector sequence corresponding to each piece of text information.

Description

Method of performing speech recognition in syllable units and apparatus therefor

Field of the Invention

The present invention relates to a method of performing speech recognition in units of the syllables that make up a word. More specifically, it relates to a method of obtaining each syllable of a speech recognition target word from registered words and finally combining those syllables into the target word.

Background of the Invention

In the telecommunications industry, rudimentary speech recognition technology such as voice dialing has proven its convenience, and demand for speech recognition is spreading to many other industries. As the need for it in everyday life grows, many companies in the speech recognition field are investing considerable time and money in research to build better-performing recognizers. However, no technology yet exists that can recognize every word a user conveniently uses in daily life.

Users nevertheless want better-performing speech recognizers, so a technique is needed that can meet their needs even with today's limited technology. Current speech recognition technology can recognize several thousand words (about 6,000) in restricted application domains, but this is not enough to cover all the language used in everyday life.

The present inventors therefore propose a speech recognizer and method that, even with current speech recognition technology, can recognize the tens or hundreds of thousands of words used in everyday life. This is made possible by performing speech recognition on a single syllable rather than on a whole word.

An object of the present invention is to provide a speech recognizer and method capable of recognizing tens or hundreds of thousands of words, enough for everyday use, with current speech recognition technology.

Another object of the present invention is to provide a speech recognizer and method that can recognize tens or hundreds of thousands of words by performing speech recognition on one syllable at a time rather than on whole words.

Still another object of the present invention is to provide a speech recognizer and method in which a multi-syllable Hangul word is entered syllable by syllable: instead of each syllable being spoken in isolation, it is selected from a vocabulary word that contains it, and the selected syllables are finally assembled into the target word.

Still another object of the present invention is to provide a speech recognizer and method that can recognize a word even if it is not registered in the recognizer.

Still another object of the present invention is to provide a speech recognizer and method that can easily recognize unregistered words or proper nouns in a voice-driven document input device.

These and other objects of the present invention can all be achieved by the invention described in detail below.

FIG. 1 is a block diagram schematically showing a speech recognizer constructed in accordance with the present invention.

FIG. 2 is a flowchart illustrating the process of performing speech recognition in units of syllables according to the present invention.

* Brief description of the reference numerals in the drawings *

10: feature extraction unit    20: weighting unit

30: search unit    40: dictionary

100: speech recognition unit    200: word construction unit

Summary of the Invention

The present invention relates to a method of entering Hangul (Korean text) by speech recognition, and in particular to a method that enters Hangul accurately by performing speech recognition on syllables. To this end, the speech recognizer of the present invention comprises a speech recognition unit 100, which performs speech recognition on only one syllable of a spoken word, and a word construction unit 200, which collects the syllables recognized by the speech recognition unit and finally outputs the speech recognition target word. The speech recognition unit 100 comprises: a feature extraction unit 10 that extracts, from the spoken word, a feature vector sequence representing the characteristics of the speech; a weighting unit 20 that assigns a weight to the portion of the extracted feature vector sequence corresponding to the syllable of the target word; a dictionary 40 containing the text information of the words to be recognized and the feature vector sequence corresponding to that text information; and a search unit 30 that compares the feature vector sequence emphasized by the weighting unit 20 with the dictionary 40, selects the word with the highest similarity, and finally selects from that word only the syllable belonging to the target word.

In addition, the Hangul input method using speech recognition according to the present invention comprises the steps of: selecting and uttering vocabulary words that contain the syllables constituting the speech recognition target word; extracting, from each uttered word, a feature vector sequence representing the characteristics of its speech; locating the portion of the extracted feature vector sequence that corresponds to the syllable of the target word and assigning it a weight; using the feature vector sequence emphasized in the weighting step to search a dictionary 40, which contains the text information of the words to be recognized and the feature vector sequences of the corresponding words, and selecting the word identical to the word uttered in the selecting-and-uttering step; selecting, from the text information of the selected word, only the text information corresponding to the syllable of the target word; and outputting the text information of all the syllables of the target word, selected through the preceding steps, as the speech recognition result.

Detailed Description of Embodiments of the Invention

The feature that most distinguishes the present invention from conventional speech recognizers is that it performs speech recognition on syllables rather than on words. More specifically, one syllable is selected from each of several vocabulary words, each containing one syllable of the target word, and speech recognition is performed on it; finally, the syllables are combined into a single word that is output as the speech recognition result.

Current speech recognition technology performs recognition on whole words: it selects from the dictionary the vocabulary entry whose speech information matches the word being recognized. Consequently, if the target word is not registered in the dictionary, the recognizer either cannot output a result or outputs the most similar registered word, which degrades recognition performance.

Because the present invention instead targets the syllables that make up a word, it can recognize tens of thousands of words, unlike conventional speech recognizers, which could handle only a limited vocabulary with limited accuracy.

An embodiment of the present invention having these remarkable effects is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically showing a speech recognizer constructed in accordance with the present invention. The recognizer consists of a speech recognition unit 100, which performs speech recognition on only the first syllable of the uttered speech, and a word construction unit 200, which uses the results of the speech recognition unit to output the final speech recognition target word.

The speech recognition unit 100 consists of: a feature extraction unit 10 that extracts a feature vector sequence from the uttered speech; a weighting unit 20 that weights the portion of the extracted feature vector sequence corresponding to the first syllable; a search unit 30 that compares the feature vector sequence emphasized by the weighting unit 20 with the dictionary 40, selects the word with the highest similarity, and finally selects only the first syllable of that word; and a dictionary 40 containing the text information of the words to be recognized and the feature vector sequence corresponding to that text information.

The present invention is basically characterized in that the syllables making up the target word, rather than the target word itself, are chosen as the objects of recognition. Recognition is therefore not performed on the uttered word as a whole; instead, the syllable of the uttered word that belongs to the target word is selected as the recognition target.

For convenience of explanation, assume that the word "이순신" (Yi Sun-sin) is to be recognized, and that "이순신" is not registered in the dictionary 40. A conventional speech recognizer presented with "이순신" either cannot perform recognition at all, because the word is not registered in the dictionary 40, or, even if it does, cannot output an accurate result.

In the present invention, however, an accurate result can be output even for a word that is not registered in the dictionary.
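Whether an unregistered word can be entered this way depends only on whether each of its syllables occurs in some registered word. The sketch below checks that condition. It assumes the text side of the dictionary 40 is a plain list of Hangul strings; the `REGISTERED` list and all function names are hypothetical. Words that begin with the syllable are ranked first, reflecting the preference, described for step S100, that the target syllable be the first syllable of the uttered word.

```python
# Hypothetical text side of dictionary 40 (registered words only).
REGISTERED = ["이름", "이슬", "순서", "순위", "신문", "신발", "사과"]


def carrier_words(syllable, vocabulary):
    """Registered words containing `syllable`; words that start with it come first."""
    starts = [w for w in vocabulary if w.startswith(syllable)]
    others = [w for w in vocabulary if syllable in w and not w.startswith(syllable)]
    return starts + others


def coverage(target, vocabulary):
    """Map each syllable of `target` to the registered words that could carry it."""
    # A precomposed Hangul syllable is one code point, so iterating over a str
    # yields the word one syllable at a time.
    return {syllable: carrier_words(syllable, vocabulary) for syllable in target}


def can_enter(target, vocabulary):
    """True if every syllable of `target` has at least one carrier word."""
    return all(coverage(target, vocabulary).values())
```

With this toy vocabulary, `coverage("이순신", REGISTERED)` yields `{"이": ["이름", "이슬"], "순": ["순서", "순위"], "신": ["신문", "신발"]}`, so "이순신" can be entered even though it is not itself registered.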

FIG. 2 is a flowchart of this process. The word "이순신" consists of three syllables: "이" (i), "순" (sun), and "신" (sin). In the present invention, a vocabulary word containing each syllable is therefore selected and uttered (S100). The target syllable is preferably the first syllable of the vocabulary word, but it need not be.

Common everyday words beginning with "이" include "이별" (ibyeol, 'farewell'), "이슬" (iseul, 'dew'), "이사" (isa, 'moving house'), and "이름" (ireum, 'name'). Even if not all of these words are registered in the dictionary, some registered words beginning with the syllable "이" will exist.

The feature extraction unit 10 extracts from the uttered word a feature vector sequence representing the characteristics of its speech (S200). Each word has the speech information needed to utter it, namely a feature vector sequence. In this step (S200), therefore, the characteristics of the word are found from the speech in order to select the word corresponding to the utterance.
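The patent does not say which features the feature extraction unit 10 computes. As a minimal sketch under that gap, the code below splits a list of audio samples into overlapping frames and emits a two-dimensional vector per frame: log energy and zero-crossing rate. These simple features are a stand-in chosen for illustration only; practical recognizers typically use richer spectral features such as MFCCs. The frame length of 200 samples and hop of 80 samples (25 ms and 10 ms at an assumed 8 kHz sampling rate) are hypothetical values.

```python
import math


def extract_features(samples, frame_len=200, hop=80):
    """Return one [log_energy, zero_crossing_rate] vector per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append([log_energy, crossings / (frame_len - 1)])
    return features
```

The result is the per-frame feature vector sequence that the weighting and search steps consume.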

Next, the weighting unit 20 weights the feature vector sequence extracted in step S200: it assigns a weight to the portion corresponding to the target syllable within the vocabulary word uttered in step S100 (S300). For example, when the word "이름" is uttered, the portion corresponding to its first syllable is weighted. Weighting in this way improves the accuracy with which the syllable is matched, and thus the accuracy of speech recognition.
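The patent does not specify how the weight is applied or how the frames of the target syllable are located. One possible reading, sketched below, gives each frame a multiplier that the matcher later applies to that frame's local distance: frames inside the target syllable's span get `weight`, all others 1.0. The span comes from a deliberately crude, hypothetical equal-duration segmentation (a k-syllable word split into k equal parts); a real system would need a proper syllable segmenter. Both function names and the default weight of 2.0 are assumptions.

```python
def first_syllable_span(n_frames, n_syllables):
    """Hypothetical segmentation: assume every syllable lasts equally long.

    Returns the (start, end) frame range of the first syllable, end exclusive.
    """
    return (0, max(1, n_frames // n_syllables))


def frame_weights(n_frames, span, weight=2.0):
    """Per-frame multipliers: `weight` inside the target span, 1.0 elsewhere."""
    start, end = span
    return [weight if start <= i < end else 1.0 for i in range(n_frames)]
```

For a 10-frame utterance of the two-syllable word "이름", `frame_weights(10, first_syllable_span(10, 2))` weights the first five frames by 2.0 and the remaining five by 1.0.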

Using the feature vector sequence emphasized in step S300, the search unit 30 searches the dictionary 40, which contains the text information of the words to be recognized and the feature vector sequences of the corresponding words, and selects the matching word, that is, the word identical to the one uttered in step S100 (S400). For example, if the word "이름" was uttered in step S100, the search unit 30 retrieves the text information "이름" registered in the dictionary 40.
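The search in step S400 can be sketched as a nearest-template lookup. The version below uses dynamic time warping (DTW), a standard template-matching technique assumed here because the patent names no matching algorithm, with the Euclidean cost of each query frame multiplied by that frame's weight from step S300 (so `weights` must have one entry per query frame). The dictionary is modeled as a hypothetical `dict` mapping each word's text to its feature vector sequence.

```python
import math


def weighted_dtw(query, template, weights):
    """DTW distance in which query frame i's local cost is scaled by weights[i]."""
    n, m = len(query), len(template)
    # d[i][j] = best cumulative cost aligning query[:i] with template[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = weights[i - 1] * math.dist(query[i - 1], template[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def search(query, weights, dictionary):
    """Return the text of the dictionary entry closest to `query` (step S400)."""
    return min(dictionary, key=lambda text: weighted_dtw(query, dictionary[text], weights))
```

Because emphasized frames contribute more to every candidate's distance, a mismatch in the target syllable costs more than an equal mismatch elsewhere in the word.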

The search unit 30 then selects, from the chosen word, only the portion corresponding to the syllable of the target word and passes that syllable alone to the word construction unit 200 (S500). In other words, from the text information for the word "이름", the search unit 30 selects only the information for the syllable "이".
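Step S500 operates on the text of the selected word. In Unicode, each precomposed Hangul syllable (the Hangul Syllables block, U+AC00 to U+D7A3, 11,172 syllables) is a single code point, so selecting a syllable reduces to string indexing, provided the text is stored in composed (NFC) form. The helper name and the `position` parameter are hypothetical; `position=0` corresponds to the preferred first-syllable case.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # Unicode Hangul Syllables block


def select_syllable(word, position=0):
    """Return the syllable at `position` in the recognized word's text."""
    syllable = word[position]
    if not HANGUL_FIRST <= ord(syllable) <= HANGUL_LAST:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    return syllable
```

Decomposed (NFD) text would store each syllable as two or three jamo code points, which is why the sketch rejects anything outside the precomposed block.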

These steps are repeated so that the search unit 30 selects from the dictionary 40 only the text information for "순" and only the text information for "신", passing each to the word construction unit 200. The word construction unit 200 then combines the text information for each syllable of the target word received from the search unit 30 into a single word and outputs it as the speech recognition result (S600).
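The word construction unit 200 then only has to concatenate the selected syllables in the order they were entered. A minimal sketch, assuming the search unit hands over (recognized word, syllable position) pairs; the function name and input shape are hypothetical:

```python
def construct_word(recognized):
    """Join the target syllable of each (word, position) pair, in entry order (S600)."""
    return "".join(word[position] for word, position in recognized)
```

For example, `construct_word([("이름", 0), ("순서", 0), ("신문", 0)])` returns `"이순신"` from the registered words "이름" ('name'), "순서" ('order'), and "신문" ('newspaper').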

The present invention as described above can also be applied directly to recognizing spoken Korean numerals. To enter a single-syllable numeral such as "일" (il, one), "이" (i, two), "삼" (sam, three), ..., "십" (sip, ten), the user pronounces any word that contains that syllable. Because spoken Korean numerals are often misheard even by native speakers, artillery units, for example, designate a word for each number and use those words instead.
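For numerals, the same pipeline ends with a lookup from the selected syllable to a digit. The sketch below uses the Sino-Korean digit syllables (일, 이, 삼, 사, 오, 육, 칠, 팔, 구, with 공 or 영 for zero); the function name and the choice to reject any non-digit syllable are assumptions for illustration. "십" (ten) is left out to keep the mapping one syllable to one digit.

```python
# Sino-Korean digit syllables; zero is commonly read as 공 or 영.
SINO_KOREAN_DIGITS = {
    "공": "0", "영": "0", "일": "1", "이": "2", "삼": "3",
    "사": "4", "오": "5", "육": "6", "칠": "7", "팔": "8", "구": "9",
}


def digits_from_carriers(carrier_words, position=0):
    """Map the target syllable of each recognized carrier word to a digit string."""
    digits = []
    for word in carrier_words:
        syllable = word[position]
        if syllable not in SINO_KOREAN_DIGITS:
            raise ValueError(f"{syllable!r} in {word!r} is not a digit syllable")
        digits.append(SINO_KOREAN_DIGITS[syllable])
    return "".join(digits)
```

For example, `digits_from_carriers(["일기", "이름", "삼촌"])` ('diary', 'name', 'uncle') returns `"123"`, without the user having to learn a fixed word for each digit.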

With the present invention, however, there is no need to designate particular words and train people to use them.

The present invention can be used to build a voice-dialed phone book in a telephone, and can also be used to build a Korean numeral speech recognizer. Korean numerals are single syllables, which has made them difficult to recognize reliably; with the present invention, a recognizer that readily recognizes numerals can be built. The invention can also serve as a means of entering unregistered words or proper nouns in voice-driven document input devices such as voice e-mail composers and voice search tools.

According to the speech recognizer and method of the present invention, tens or hundreds of thousands of words, enough for everyday use, can be recognized even with current speech recognition technology, and even words not registered in the recognizer can be recognized.

Simple modifications and variations of the present invention can readily be made by those of ordinary skill in the art, and all such modifications and variations are considered to fall within the scope of the present invention.

Claims

A speech recognizer comprising a speech recognition unit (100) that performs speech recognition on only one syllable of a spoken word, and a word construction unit (200) that collects the syllables recognized by the speech recognition unit and finally outputs a speech recognition target word, wherein the speech recognition unit (100) comprises:

a feature extraction unit (10) for extracting, from the spoken word, a feature vector sequence representing features of the speech;

a weighting unit (20) for assigning a weight to the portion of the feature vector sequence extracted by the feature extraction unit that corresponds to a syllable of the word to be recognized;

a dictionary (40) containing text information of the words to be recognized and feature vector sequences corresponding to the text information; and

a search unit (30) for comparing the feature vector sequence emphasized by the weighting unit (20) with the dictionary (40) to select the word having the highest similarity, and finally selecting, from the selected word, only the syllable of the word to be recognized.

The speech recognizer of claim 1, wherein the weighted syllable is a first syllable.

A method of performing speech recognition in units of syllables, comprising the steps of:

selecting and uttering vocabulary words that contain the syllables constituting a speech recognition target word;

extracting, from the uttered vocabulary word, a feature vector sequence of the speech representing the features of that word;

locating, in the extracted feature vector sequence, the portion corresponding to a syllable of the speech recognition target word and assigning it a weight;

searching, with the feature vector sequence emphasized in the weighting step, a dictionary (40) containing text information of the words to be recognized and the feature vector sequences of the words corresponding to the text information, and selecting the word identical to the word uttered in the selecting-and-uttering step;

selecting, from the text information corresponding to the selected word, only the text information corresponding to the syllable constituting the speech recognition target word; and

outputting, as the speech recognition result, the text information of all the syllables constituting the selected speech recognition target word, combined in order.

The method of claim 3, wherein the portion weighted in the weighting step is the first syllable of the uttered vocabulary word.