KR100358992B1 - Voice recognition apparatus depended on speakers for being based on the phoneme like unit and the method

Info

Publication number: KR100358992B1
Application number: KR1019990037534A
Authority: KR
Inventors: 이봉우; 권오일
Original assignee: 주식회사 현대오토넷
Priority date: 1999-09-04
Filing date: 1999-09-04
Publication date: 2002-10-30
Also published as: KR20010026288A

Abstract

The present invention relates to a speaker-dependent speech recognition apparatus based on phoneme-like units (PLUs), and to a method thereof.

The apparatus comprises: a text input unit (10); a voice input unit (20); a ROM (30) that stores phoneme-like units (PLUs); a RAM (40) that stores word models generated from wave files made by concatenating the PLUs of the ROM (30); a CPU (50); and a target controller (60) that controls the operation of a target according to the word output by the CPU (50). The CPU (50) creates a wave file by concatenating the PLUs of the ROM (30) that correspond to a word in the input text, extracts feature vectors for the word, and uses the wave file to generate and register a word model having representative vectors. When speech is input, the CPU (50) extracts feature vectors from the input speech data, matches them against the representative vectors of the word models registered in the RAM (40) to measure the distance between the speech data and each word model, and outputs the word model with the minimum distance as the speech recognition result.

Accordingly, a word can be registered simply and easily without receiving the user's voice to register a word model.

Description

Speaker dependent speech recognition device based on similar phoneme and its method {VOICE RECOGNITION APPARATUS DEPENDED ON SPEAKERS FOR BEING BASED ON THE PHONEME LIKE UNIT AND THE METHOD}

The present invention relates to a speech recognition apparatus and, more particularly, to a speaker-dependent speech recognition apparatus based on phoneme-like units and a method thereof.

A conventional speaker-dependent speech recognition apparatus receives each whole word needed for recognition directly from the user, generates a word model from it, and then, when speech is input, transmits the recognized word to a predetermined target controller (e.g., the dialing controller of a mobile phone) according to the recognition result.

However, such a conventional speaker-dependent speech recognition apparatus always requires user intervention, which makes it inconvenient for the user. In particular, because word models must be retained, registering many words in memory increases the required storage capacity correspondingly. As a result, the technique has not been widely applied to mobile computers (e.g., notebooks, laptops, PDAs).

Accordingly, the present invention is intended to overcome the above problems. Its object is to provide a speaker-dependent speech recognition apparatus and method based on phoneme-like units in which the word to be registered is received as text and a word model is generated from a wave file of that word, the wave file being made by concatenating phoneme-like units (Phoneme Like Unit; hereinafter PLU) built in memory in advance.

To achieve this object, the speaker-dependent speech recognition apparatus based on phoneme-like units comprises: a text input unit that receives, as text, a word to be registered for speech recognition; a voice input unit that receives a speaker's voice; a ROM that stores phoneme-like units (PLUs) used to generate wave files for speech recognition; a RAM that stores word models generated from wave files made for predetermined words by concatenating the PLUs of the ROM; a CPU; and a target controller that controls the operation of a predetermined target according to the word the CPU outputs on recognizing the speaker's voice. The CPU creates a wave file by concatenating the PLUs of the ROM that correspond to a word in the input text, extracts feature vectors for the word, uses the wave file to generate a word model having representative vectors, and registers the word model in the RAM. When the speaker's voice is input, the CPU extracts feature vectors from the input speech data, matches them against the representative vectors of the word models registered in the RAM to measure the distance between the speech data and each word model, and outputs the word model with the minimum distance as the speech recognition result.

Accordingly, the speaker-dependent speech recognition apparatus based on phoneme-like units can register a word simply and easily without receiving the user's voice to register a word model, and, because word models are easy to generate, memory space for storing many word models is not required.

FIG. 1 is a block diagram of a speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention;

FIG. 2 is a flowchart of the word registration process of the speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention;

FIG. 3 is a flowchart of the word recognition process of the speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention; and

FIG. 4 is a waveform diagram showing the waveform of a wave file and the speech waveform of a speaker in the speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention.

<Explanation of symbols for the main parts of the drawings>

10: text input unit
20: voice input unit
30: ROM
40: RAM
50: CPU
60: target controller

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

Referring to FIG. 1, the text input unit (10) receives, as text, a word to be registered for speech recognition.

The voice input unit (20) receives the speaker's voice.

The ROM (30) stores the phoneme-like units (PLUs) used to generate wave files for speech recognition.

The ROM (30) stores a total of 46 PLUs, obtained by dividing Korean words into initial consonants (choseong), medial vowels (jungseong), and final consonants (jongseong).

The PLUs are: "ㅃ, ㄸ, ㅔ(ㅐ), ㄲ, syllable-final ㅇ, ㅆ, silence, ㅡ, ㅓ, ㅉ, ㅏ, ㅂ, voiced ㅂ (e.g., the ㅂ in 아버지), unreleased ㅂ, ㅊ, ㄷ, voiced ㄷ, unreleased ㄷ, final (batchim) ㄱ, voiced ㄱ, unreleased ㄱ, ㅎ, voiced ㅎ, ㅣ, ㅑ, ㅒ(ㅖ), ㅛ, ㅠ, ㅕ, ㅋ, final (batchim) ㄹ, ㅁ, ㄴ, ㅗ, ㅍ, initial ㄹ, ㅅ, ㅌ, ㅜ, ㅓ, ㅚ(ㅞ, ㅙ), ㅘ, ㅟ, ㅝ, ㅈ, voiced ㅈ".
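The division of a Korean word into initial, medial, and final elements can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. The following Python sketch is illustrative only: the function name `decompose` is hypothetical, and the further mapping from jamo to the 46 PLUs (for example, choosing the voiced or unreleased variant of a consonant) is not specified in this document and is not attempted here.

```python
# Illustrative sketch: split precomposed Hangul syllables (U+AC00..U+D7A3)
# into initial / medial / final jamo using standard Unicode arithmetic.
# Mapping the jamo to the 46 PLUs is context dependent and NOT shown.
from typing import List, Tuple

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initials
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medials
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]          # 28 finals ("" = none)


def decompose(word: str) -> List[Tuple[str, str, str]]:
    """Return (initial, medial, final) for each Hangul syllable in word."""
    result = []
    for ch in word:
        index = ord(ch) - 0xAC00
        if not 0 <= index < 19 * 21 * 28:
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        cho, rest = divmod(index, 21 * 28)   # 588 syllables per initial
        jung, jong = divmod(rest, 28)        # 28 finals per medial
        result.append((CHOSEONG[cho], JUNGSEONG[jung], JONGSEONG[jong]))
    return result


print(decompose("어머니"))
```

For the word "어머니" (mother), which this document uses as its example word, the sketch yields the jamo triples for 어, 머, and 니, each with an empty final.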

The RAM (40) stores word models generated from wave files made for predetermined words by concatenating the PLUs of the ROM (30).

The CPU (50) creates a wave file by concatenating the PLUs of the ROM (30) that correspond to a word in the input text, extracts feature vectors for the word, uses the wave file to generate a word model having representative vectors, and registers the word model in the RAM (40).

When the speaker's voice is input, the CPU (50) extracts feature vectors from the input speech data, matches them against the representative vectors of the word models registered in the RAM (40) to measure the distance between the speech data and each word model, and outputs the word model with the minimum distance as the speech recognition result.

The target controller (60) controls the operation of a predetermined target (e.g., a mobile phone dialing device) according to the word that the CPU (50) outputs on recognizing the speaker's voice.

With the above configuration, the speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention operates as follows.

The apparatus basically performs two processes: word registration and word recognition.

First, the word registration process is described with reference to FIG. 2.

When text containing a word to be registered is input through the text input unit (10) (S10), the CPU (50) extracts, from the PLUs stored in the ROM (30), the PLUs corresponding to the text (S11).

Once the PLUs corresponding to the input text have been extracted, the CPU (50) concatenates the extracted PLUs to create a wave file (S12).
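Steps S11 and S12 can be sketched as a table lookup followed by concatenation. In the Python sketch below, the PLU database is modeled as a dictionary from a PLU label to a list of samples; this layout, the labels, and the function name `concatenate_plus` are assumptions made for illustration, and the document does not describe how the PLU waveforms are stored in the ROM (30) or whether any smoothing is applied at the joins.

```python
# Illustrative sketch of S11-S12: look up each PLU waveform and join them.
# The dict-based "ROM" and the toy sample values are hypothetical.
from typing import Dict, List, Sequence


def concatenate_plus(plu_labels: Sequence[str],
                     plu_db: Dict[str, List[int]]) -> List[int]:
    """Concatenate stored PLU waveforms in the order given by plu_labels."""
    wave: List[int] = []
    for label in plu_labels:
        if label not in plu_db:
            raise KeyError(f"no PLU stored for label {label!r}")
        wave.extend(plu_db[label])   # plain concatenation, no cross-fade
    return wave


# Toy database with three very short "waveforms".
toy_db = {"ㅁ": [1, 2], "ㅓ": [3, 4, 5], "silence": [0]}
print(concatenate_plus(["silence", "ㅁ", "ㅓ", "silence"], toy_db))
```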

Next, the CPU (50) applies the VMS-VQ (variable multi-section vector quantization) algorithm to the wave file to extract feature vectors (S13), and classifies the feature vectors to generate vector groups (S14).
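The details of VMS-VQ are not given in this document, so the following Python sketch only illustrates the general shape of steps S13 and S14 under stated assumptions: the signal is cut into overlapping frames, a small feature vector is computed per frame (log energy and zero-crossing rate are used here purely as stand-ins for whatever features the apparatus uses), and the frames are grouped into consecutive time sections. The names `frame_features` and `split_into_sections` and all parameter values are hypothetical.

```python
# Illustrative stand-in for S13-S14; NOT the VMS-VQ algorithm itself.
import math
from typing import List, Sequence


def frame_features(samples: Sequence[float], frame_len: int = 160,
                   hop: int = 80) -> List[List[float]]:
    """Compute [log energy, zero-crossing rate] for each overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append([math.log(energy + 1e-9), crossings / (frame_len - 1)])
    return feats


def split_into_sections(feats: List[List[float]],
                        n_sections: int) -> List[List[List[float]]]:
    """Group frame features into n_sections consecutive time sections."""
    n = len(feats)
    bounds = [round(i * n / n_sections) for i in range(n_sections + 1)]
    return [feats[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


# Toy signal: a 100-sample-period sine wave, 800 samples long.
signal = [math.sin(2 * math.pi * t / 100) for t in range(800)]
groups = split_into_sections(frame_features(signal), n_sections=3)
print([len(g) for g in groups])
```

Each resulting group would then be the input to the clustering step that produces representative vectors.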

Once the vector groups have been generated, the CPU (50) finally applies the MKM (modified K-means) clustering algorithm to the vector groups to extract representative vectors (S15), generates a word model, and stores it in the RAM (40) (S16). The wave file that was created in order to extract the representative vectors is then deleted.
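The modifications that distinguish MKM from ordinary K-means are not described in this document. As a rough illustration of step S15 only, the Python sketch below uses plain K-means (Lloyd's algorithm) with a deterministic initialization as a stand-in; the centroids it returns play the role of the representative vectors of a word model. The function names and the choice of `k` are hypothetical.

```python
# Illustrative stand-in for S15: plain K-means, NOT the MKM variant.
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int,
           iters: int = 20) -> List[List[float]]:
    """Return up to k centroids (representative vectors) for the vectors."""
    if not vectors:
        raise ValueError("no vectors to cluster")
    k = min(k, len(vectors))
    # Deterministic initialization: evenly spaced picks from the input.
    centroids = [list(vectors[(i * len(vectors)) // k]) for i in range(k)]
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                updated.append([sum(m[d] for m in members) / len(members)
                                for d in range(dim)])
            else:
                updated.append(centroids[j])   # keep an empty cluster's centroid
        if updated == centroids:               # converged
            break
        centroids = updated
    return centroids


print(kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2))
```

Only the centroids need to be kept as the word model; as the text states, the intermediate wave file can be discarded once they have been computed.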

In particular, because the word registration process generates word models from the PLUs stored in the ROM (30), it suffices to reserve memory space in the ROM (30) for the PLU database; an unlimited variety of word models can then be generated and registered in the RAM (40).

With word registration completed as described above, when the speaker's voice is input through the voice input unit (20), the apparatus performs the word recognition process.

The word recognition process is described below with reference to FIG. 3.

When the speaker's voice is input through the voice input unit (20) (S20), the CPU (50) detects the endpoints of the input speech, that is, its start point and end point, and extracts only the speech data needed for recognition (S21).
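The document does not say how the endpoints are detected. One common and simple approach, shown here only as an assumption, is to compare short-time frame energy against a threshold and take the first and last frames that exceed it. The function name `detect_endpoints`, the frame length, and the threshold value are all hypothetical.

```python
# Illustrative sketch of S21 using a short-time energy threshold.
# The detection method is an assumption; the document does not specify one.
from typing import Optional, Sequence, Tuple


def detect_endpoints(samples: Sequence[float], frame_len: int = 160,
                     threshold: float = 1e4) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech region, or None."""
    energies = [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None                       # nothing above the threshold
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Toy input: silence, a loud burst, then silence again.
toy = [0] * 320 + [1000] * 480 + [0] * 320
print(detect_endpoints(toy))
```

The slice `samples[start:end]` would then be passed on to feature extraction.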

Once the speech data has been extracted, the CPU (50) extracts feature vectors from the extracted speech data (S22).

Once the feature vectors of the speech data have been extracted, the CPU (50) applies the VMS-VQ algorithm to the feature vectors of the speech data and the representative vectors of the word models registered in the RAM (40), matching them against each other to measure the distance between the speech data and each word model, and selects the word model with the minimum distance as the recognized word for the speaker's voice (S23).
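The minimum-distance selection of step S23 can be sketched as follows. The distance used here is the average squared Euclidean distance from each input feature vector to its nearest representative vector, which is a common vector-quantization distortion measure and is an assumption; the exact distance computed by VMS-VQ is not given in this document. The names `distortion` and `recognize` and the toy word models are hypothetical.

```python
# Illustrative sketch of S23: pick the word model with the smallest
# average quantization distortion. The distance measure is an assumption.
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(features: List[List[float]],
               codebook: List[List[float]]) -> float:
    """Average distance from each feature vector to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook)
               for f in features) / len(features)


def recognize(features: List[List[float]],
              models: Dict[str, List[List[float]]]) -> str:
    """Return the registered word whose model is closest to the features."""
    return min(models, key=lambda word: distortion(features, models[word]))


# Toy word models: each word maps to its representative vectors.
toy_models = {
    "어머니": [[0.0, 0.0], [1.0, 1.0]],
    "아버지": [[10.0, 10.0], [11.0, 11.0]],
}
print(recognize([[0.1, 0.2], [0.9, 1.1]], toy_models))
```

The returned word is what would be handed to the target controller (60) as the recognition result.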

Once the word model with the minimum distance has been selected, the CPU (50) outputs it as the speech recognition result and applies it to the target controller (60) (S24). The target controller (60) then controls the operation of a predetermined target (e.g., a mobile phone dialing device) according to the word that the CPU (50) has output.

For reference, suppose the word "어머니" (mother) is registered in the RAM (40) and the speaker then utters "어머니" through the voice input unit (20) to the CPU (50). The waveform of the wave file generated during word registration to extract the representative vectors of "어머니" appears as shown in FIG. 4(a), and the speech waveform of the word "어머니" uttered by the speaker appears as shown in FIG. 4(b).

Comparing the waveforms shown in FIG. 4, the waveform of the wave file made by concatenating the PLUs stored in the ROM (30) is similar to the speaker's speech waveform, which indicates that the speaker-dependent speech recognition apparatus based on phoneme-like units according to the present invention can perform accurate speech recognition.

As described above, the speaker-dependent speech recognition apparatus and method based on phoneme-like units according to the present invention receive the word to be registered as text and generate a word model from a wave file of that word, made by concatenating PLUs built in memory in advance. Consequently, a word can be registered simply and easily without receiving the user's voice to register a word model, and, because word models are easy to generate, memory space for storing many word models is not required.

What has been described above is merely one embodiment for implementing the speaker-dependent speech recognition apparatus and method based on phoneme-like units according to the present invention. The present invention is not limited to this embodiment, and anyone with ordinary skill in the field to which the invention belongs may make various modifications without departing from the gist of the invention as claimed in the following claims.

Claims

A text input unit (10) that receives, as text, a word to be registered for speech recognition;

a voice input unit (20) that receives a speaker's voice;

a ROM (30) that stores phoneme-like units (PLUs) for generating a wave file for speech recognition;

a RAM (40) that stores a predetermined word model generated by using a wave file for a predetermined word made by concatenating the PLUs of the ROM (30);

a CPU (50) that creates a wave file by concatenating the PLUs of the ROM (30) corresponding to a word included in the input text and extracting feature vectors for the predetermined word, generates a word model having predetermined representative vectors by using the wave file and registers it in the RAM (40), and, when the speaker's voice is input, extracts feature vectors of the input speech data, matches the feature vectors of the speech data against the representative vectors of the word models registered in the RAM (40) to measure the distance between the speech data and each word model, and extracts the word model having the minimum distance and outputs it as the speech recognition result; and

a target controller (60) that controls the operation of a predetermined target according to the word that the CPU (50) outputs on recognizing the speaker's voice;

a speaker-dependent speech recognition apparatus based on phoneme-like units, characterized by comprising the above.

(Twice correction)

The speaker-dependent speech recognition apparatus based on phoneme-like units according to claim 1, wherein the word registration process of the CPU (50) comprises:

a step (S11) of, when text including a word to be registered is input through the text input unit (10) (S10), extracting the PLUs corresponding to the text from the PLUs stored in the ROM (30);

a step (S12) of concatenating the extracted PLUs to create a predetermined wave file;

a step of extracting feature vectors by applying a VMS-VQ algorithm to the wave file (S13) and classifying the feature vectors to generate a predetermined vector group (S14); and

a step of extracting representative vectors by applying an MKM clustering algorithm to the vector group (S15), and generating a predetermined word model and storing it in the RAM (40) (S16);

a speaker-dependent speech recognition method based on phoneme-like units, characterized by comprising the above.

(Twice correction)

The speaker-dependent speech recognition apparatus based on phoneme-like units according to claim 1, wherein the word recognition process of the CPU (50) comprises:

a step (S21) of, when the speaker's voice is input through the voice input unit (20) (S20), detecting the endpoints of the input speech and extracting only the speech data;

a step (S22) of extracting feature vectors from the extracted speech data;

a step (S23) of applying the VMS-VQ algorithm to the feature vectors of the speech data and the representative vectors of the word models registered in the RAM (40), matching them against each other to measure the distance between the speech data and each word model, and extracting the word model having the minimum distance; and

a step (S24) of outputting the word model having the minimum distance as the speech recognition result and applying it to the target controller (60)