KR100577990B1

KR100577990B1 - Apparatus of talker dependence/independence speech recognition

Info

Publication number: KR100577990B1
Application number: KR1019970081825A
Authority: KR
Inventors: 김락용
Original assignee: 엘지전자 주식회사
Priority date: 1997-12-31
Filing date: 1997-12-31
Publication date: 2006-08-30
Also published as: KR19990061558A

Abstract

The present invention relates to a speaker-dependent/independent speech recognition apparatus capable of recognizing speaker-dependent words and speaker-independent words simultaneously.

This speaker-dependent/independent speech recognition apparatus comprises: speech section detection means for detecting a speech section in a speech signal; feature vector extraction means for extracting feature vectors from the speech signal supplied by the speech section detection means; vector quantization means for quantizing the feature vectors from the feature vector extraction means using a codebook; parameter estimation means for receiving the output signal of the vector quantization means and estimating dependent-model parameters; first storage means for storing the dependent model from the parameter estimation means; second storage means for storing an independent model built from the speech data of many speakers; pattern matching means for pattern-matching the dependent model of the first storage means and the independent model of the second storage means each against the output signal of the preprocessing means, thereby generating dependent-model similarity information and independent-model similarity information; selection switch means for connecting the vector quantization means to either the parameter estimation means or the pattern matching means according to the speaker's selection; and post-processing means for using the dependent-model and independent-model similarity information from the pattern matching means to determine the mode to which the input speech belongs and for outputting the recognition result determined in that mode. The post-processing means includes a decision logic unit that determines the mode by comparing the dependent-model similarity information with the independent-model similarity information, and a rejection unit that operates selectively according to the mode determined by the decision logic unit and outputs a recognition result determined from the similarity information of that mode.

Description

Speaker dependent / independent speech recognition device {Apparatus of Talker Dependence / Independence Speech Recognition}

The present invention relates to a speech recognition system, and more particularly to a speaker-dependent/independent speech recognition apparatus capable of recognizing speaker-dependent words and speaker-independent words simultaneously.

Speech recognition methods are generally classified, according to the speaker, into speaker-dependent and speaker-independent methods. A speaker-dependent method serves one specific person, so the user must go through a training process to register a dependent reference model. A speaker-independent method, by contrast, serves an unspecified number of people: it recognizes speech using a general independent reference model obtained from training in which many people took part. Because this independent reference model is built at the factory and supplied with the product, the user needs no training process.

A typical speech recognition apparatus includes a speaker-dependent recognizer and a speaker-independent recognizer in order to offer both of the methods described above. A conventional apparatus first registers dependent reference models so that it is prepared to recognize both dependent words and independent words. Consequently, a first-time user must register dependent reference models unconditionally, which is cumbersome, and the speaker-dependent and speaker-independent recognizers must each be controlled depending on the situation, which complicates the control needed for recognition. These problems are examined in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of the speaker-dependent speech recognizer in a conventional speech recognition apparatus. The speaker-dependent speech recognizer of FIG. 1 includes a preprocessor 10 that extracts feature vectors from the input speech signal and quantizes them; a selection switch 12 that switches the output signal of the preprocessor 10 between a Baum-Welch estimator 14 and a pattern matching unit 18; the Baum-Welch estimator 14, connected to the selection switch 12, which estimates HMM parameters; a storage unit 16 that stores the HMM parameters from the Baum-Welch estimator 14 as reference patterns; the pattern matching unit 18, which matches the input signal arriving via the selection switch 12 against the reference patterns of the storage unit 16; and a recognition determination unit 20, connected to the pattern matching unit 18, which outputs the recognition result.

In the speaker-dependent speech recognizer of FIG. 1, the preprocessor 10 consists of a speech section detector 2, a feature extractor 4, and a vector quantizer 6. The speech section detector 2 detects the speech section in the input speech signal and outputs it. The feature extractor 4 extracts feature vectors from the output signal of the speech section detector 2. The vector quantizer 6 refers to a codebook 8 to quantize the feature vectors from the feature extractor 4 and outputs them as a discrete signal; in other words, it compares each feature vector with the N code vectors of the codebook 8 and quantizes it to the nearest code vector. The codebook 8 consists of N multidimensional feature vectors obtained by a clustering method. The selection switch 12 connects the preprocessor 10 to either the Baum-Welch estimator 14 or the pattern matching unit 18 according to the user's selection: to the Baum-Welch estimator 14 when the user wants to register his or her voice, and to the pattern matching unit 18 when the user's speech is to be recognized. During registration, the Baum-Welch estimator 14 estimates hidden Markov model (hereinafter, HMM) parameters from the discrete signal supplied by the vector quantizer 6 via the selection switch 12. To do so, it takes the signals of a word the user has pronounced two or three times and extracts general HMM parameters. The storage unit 16 stores the HMM parameters from the Baum-Welch estimator 14 as reference patterns. During recognition, the pattern matching unit 18 matches the discrete signal supplied by the vector quantizer 6 via the selection switch 12 against the reference patterns in the storage unit 16, and the recognition determination unit 20 outputs the reference model with the highest similarity as the recognition result.
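The quantization step performed by the vector quantizer 6 can be sketched as a nearest-neighbor lookup in the codebook. The text only says that each feature vector is mapped to the "nearest" code vector; the squared Euclidean distance used here, and the names `quantize` and `quantize_sequence`, are assumptions made for illustration.

```python
def quantize(feature, codebook):
    """Return the index of the code vector closest to `feature`.

    Distance is squared Euclidean, which is an assumption; the source
    does not name the distance measure.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: dist2(feature, codebook[k]))


def quantize_sequence(features, codebook):
    """Turn a sequence of feature vectors into a discrete symbol sequence."""
    return [quantize(f, codebook) for f in features]


# Hypothetical 2-dimensional codebook with N = 3 code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
symbols = quantize_sequence([[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]], codebook)
```

The resulting symbol indices are the "discrete signal" that is passed on through the selection switch 12.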

The speech recognition apparatus also includes a speaker-independent recognizer, which consists of a preprocessor like the one above and a recognizer that performs recognition against independent models.

However, the speech recognition apparatus described above must register dependent reference models first so that it is prepared to recognize dependent words alongside independent words. For example, speech recognition apparatuses currently used in mobile communication networks register important menu commands in the speaker-dependent state in advance, to cover the case where speaker-dependent and speaker-independent words must be recognized at the same time. As a result, a first-time user of the conventional apparatus must register dependent reference models unconditionally.

In addition, the operation of the speaker-dependent and speaker-independent recognizers in the conventional apparatus is controlled by control means such as a microcomputer, which makes the control needed for recognition complicated.

Accordingly, an object of the present invention is to provide a speaker-dependent/independent speech recognition apparatus in which speaker-dependent and speaker-independent words are recognized simultaneously and the recognizer itself determines the applicable mode, so that the structure of the recognizer can be simplified.

Another object of the present invention is to provide a speaker-dependent/independent speech recognition apparatus that performs speaker-dependent and speaker-independent recognition with the same codebook, so that the required memory capacity can be reduced.

To achieve the above objects, a speaker-dependent/independent speech recognition apparatus according to the present invention comprises: speech section detection means for detecting a speech section in a speech signal; feature vector extraction means for extracting feature vectors from the speech signal supplied by the speech section detection means; vector quantization means for quantizing the feature vectors from the feature vector extraction means using a codebook; parameter estimation means for receiving the output signal of the vector quantization means and estimating dependent-model parameters; first storage means for storing the dependent model from the parameter estimation means; second storage means for storing an independent model built from the speech data of many speakers; pattern matching means for pattern-matching the dependent model of the first storage means and the independent model of the second storage means each against the output signal of the preprocessing means, thereby generating dependent-model similarity information and independent-model similarity information; selection switch means for connecting the vector quantization means to either the parameter estimation means or the pattern matching means according to the speaker's selection; and post-processing means for using the dependent-model and independent-model similarity information from the pattern matching means to determine the mode to which the input speech belongs and for outputting the recognition result determined in that mode. The post-processing means includes a decision logic unit that determines the mode by comparing the dependent-model similarity information with the independent-model similarity information, and a rejection unit that operates selectively according to the mode determined by the decision logic unit and outputs a recognition result determined from the similarity information of that mode.

The selection switch means connects the preprocessing means to the parameter estimation means when the speaker wants to register speech.

The selection switch means connects the preprocessing means to the pattern matching means when the speaker's speech is to be recognized.

The rejection unit includes a dependent-model rejection unit that determines the recognition result using the dependent-model similarity information, and an independent-model rejection unit that determines the recognition result using the independent-model similarity information.

When the decision logic unit determines that the input speech signal is similar to a dependent model, the dependent-model rejection unit compares the highest recognition probability in the dependent-model similarity information with the highest recognition probability of the filler model corresponding to the dependent models. If the dependent model's probability is larger, that dependent model is output as the recognition result; if the filler model's probability is larger, output of the recognition result is blocked.

When output of the dependent-model rejection unit's recognition result is blocked, the independent-model rejection unit compares the highest recognition probability in the independent-model similarity information with the highest recognition probability of the filler model corresponding to the independent models. If the independent model's probability is larger, that independent model is output as the recognition result; if the filler model's probability is larger, a message indicating that the input cannot be recognized is output.

When the decision logic unit determines that the input speech signal is similar to an independent model, the independent-model rejection unit compares the highest recognition probability in the independent-model similarity information with the highest recognition probability of the filler model corresponding to the independent models. If the independent model's probability is larger, that independent model is output as the recognition result; if the filler model's probability is larger, output of the recognition result is blocked.

When output of the independent-model rejection unit's recognition result is blocked, the dependent-model rejection unit compares the highest recognition probability in the dependent-model similarity information with the highest recognition probability of the filler model corresponding to the dependent models. If the dependent model's probability is larger, that dependent model is output as the recognition result; if the filler model's probability is larger, a message indicating that the input cannot be recognized is output.

Other objects and advantages of the present invention besides those above will become apparent from the description of the preferred embodiment with reference to the accompanying drawings.

A preferred embodiment of the present invention is described in detail below with reference to FIGS. 2 and 3.

FIG. 2 is a block diagram showing the configuration of a speaker-dependent/independent speech recognition apparatus according to the present invention. The apparatus of FIG. 2 includes a preprocessor 10 that extracts feature vectors from the input speech signal and quantizes them; a selection switch 12 that switches the output signal of the preprocessor 10 between a Baum-Welch estimator 14 and a pattern matching unit 24; the Baum-Welch estimator 14, connected to the selection switch 12, which estimates HMM parameters; a dependent model storage unit 16 that stores the HMM parameters from the Baum-Welch estimator 14 as reference patterns; an independent model storage unit 22 that stores independent models built from the speech data of many speakers; the pattern matching unit 24, which matches the input signal arriving via the selection switch 12 against the dependent models of the dependent model storage unit 16 and the independent models of the independent model storage unit 22, respectively; and a post-processor 26, connected to the pattern matching unit 24, which performs mode determination and rejection and outputs the recognition result.

In the speaker-dependent/independent speech recognition apparatus of FIG. 2, the preprocessor 10 consists of a speech section detector 2, a feature extractor 4, and a vector quantizer 6. The speech section detector 2 detects the speech section in the input speech signal and outputs it. The feature extractor 4 extracts feature vectors from the output signal of the speech section detector 2. The vector quantizer 6 refers to a codebook 8 to quantize the feature vectors from the feature extractor 4 and outputs them as a discrete signal; in other words, it compares each feature vector with the N code vectors of the codebook 8 and quantizes it to the nearest code vector. The codebook 8 consists of N multidimensional feature vectors obtained by a clustering method. The selection switch 12 connects the preprocessor 10 to either the Baum-Welch estimator 14 or the pattern matching unit 24 according to the user's selection: to the Baum-Welch estimator 14 when the user wants to register his or her voice, and to the pattern matching unit 24 when the user's speech is to be recognized. During registration, the Baum-Welch estimator 14 estimates HMM parameters from the discrete signal supplied by the vector quantizer 6 via the selection switch 12. To do so, it takes the signals of a word the user has pronounced two or three times and extracts general HMM parameters. The dependent model storage unit 16 stores the HMM parameters from the Baum-Welch estimator 14 as dependent models.
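The registration path through the Baum-Welch estimator 14 can be illustrated with one re-estimation pass of a discrete HMM over the two or three quantized repetitions of a word. This is a minimal sketch under stated assumptions: the number of states, the initial parameter values, and all function names are hypothetical, and the unscaled forward/backward recursions used here are only suitable for short symbol sequences.

```python
import math


def forward(pi, A, B, obs):
    """Forward probabilities alpha[t][i] for one symbol sequence."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Backward probabilities beta[t][i] for one symbol sequence."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def log_likelihood(pi, A, B, sequences):
    """Total log-likelihood of all repetitions under the model."""
    return sum(math.log(sum(forward(pi, A, B, obs)[-1])) for obs in sequences)


def baum_welch_step(pi, A, B, sequences):
    """One Baum-Welch re-estimation pass over several repetitions of a word."""
    n, m = len(pi), len(B[0])
    pi_num = [0.0] * n
    a_num = [[0.0] * n for _ in range(n)]
    a_den = [0.0] * n
    b_num = [[0.0] * m for _ in range(n)]
    b_den = [0.0] * n
    for obs in sequences:
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        like = sum(alpha[-1])
        last = len(obs) - 1
        for t, o in enumerate(obs):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i] / like  # state occupancy
                if t == 0:
                    pi_num[i] += gamma
                b_num[i][o] += gamma
                b_den[i] += gamma
                if t < last:
                    a_den[i] += gamma
                    for j in range(n):  # expected i -> j transitions
                        a_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / like)
    new_pi = [x / len(sequences) for x in pi_num]
    new_A = [[a_num[i][j] / a_den[i] if a_den[i] > 0 else A[i][j]
              for j in range(n)] for i in range(n)]
    new_B = [[b_num[i][k] / b_den[i] if b_den[i] > 0 else B[i][k]
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B


# Hypothetical 2-state model over a 3-symbol codebook, three repetitions.
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
repetitions = [[0, 1, 2, 2], [0, 0, 2, 2], [0, 1, 1, 2]]
pi1, A1, B1 = baum_welch_step(pi0, A0, B0, repetitions)
```

In practice the step would be repeated until the log-likelihood stops improving, and the resulting parameters would be what the dependent model storage unit 16 holds for that word.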

The independent model storage unit 22 holds independent models built from the speech data of many speakers.

During recognition, the pattern matching unit 24 performs a Viterbi search on the discrete signal supplied by the vector quantizer 6 via the selection switch 12. The reference models used are the dependent models in the dependent model storage unit 16 and the independent models in the independent model storage unit 22. Through the Viterbi search, the pattern matching unit 24 outputs similarity information between the input signal and the dependent models (hereinafter, dependent model learning information) and similarity information between the input signal and the independent models (hereinafter, independent model learning information). Each similarity is expressed as a log probability.
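A log-domain Viterbi score of the kind the pattern matching unit 24 produces can be sketched as follows. The model layout (a tuple of initial, transition, and emission probabilities per word) and the names `viterbi_log_score` and `match` are assumptions for illustration; the source only states that a Viterbi search yields a log probability per reference model.

```python
import math

NEG_INF = float("-inf")


def safe_log(x):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else NEG_INF


def viterbi_log_score(pi, A, B, obs):
    """Log probability of the best state path for the symbol sequence `obs`."""
    n = len(pi)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        delta = [max(delta[i] + safe_log(A[i][j]) for i in range(n))
                 + safe_log(B[j][o]) for j in range(n)]
    return max(delta)


def match(models, obs):
    """Score `obs` against every word model; returns {word: log probability}."""
    return {word: viterbi_log_score(pi, A, B, obs)
            for word, (pi, A, B) in models.items()}


# Hypothetical usage: the same matcher is run over both model stores.
dependent_models = {"home": ([1.0], [[1.0]], [[0.9, 0.1]])}
independent_models = {"call": ([1.0], [[1.0]], [[0.2, 0.8]])}
obs = [0, 0, 1]
dependent_info = match(dependent_models, obs)
independent_info = match(independent_models, obs)
```

Running one matcher over both stores mirrors the point that a single pattern matching unit, fed from a single codebook, serves both modes.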

The post-processor 26 uses the dependent model learning information and independent model learning information output by the pattern matching unit 24 to judge whether the input speech is closer to a dependent model or to an independent model, and outputs the recognition result determined accordingly. To this end, the post-processor 26 includes a decision logic unit 28 and a rejection unit 30, and the rejection unit 30 consists of a dependent rejection unit 32 and an independent rejection unit 34. The decision logic unit 28 uses the dependent model learning information and independent model learning information from the pattern matching unit 24 to judge whether the input speech is closer to a dependent model or to an independent model, and outputs the result. In detail, an independent model, which is typically built from speech signals of the same word collected from tens to hundreds of speakers, characteristically yields a higher probability after a Viterbi search on the corresponding speech signal than a dependent model trained on two or three repeated pronunciations. The decision logic unit 28 uses this characteristic to decide whether the input speech is a dependent word or an independent word, and passes its output to the corresponding mode of the rejection unit 30. The dependent rejection unit 32 and the independent rejection unit 34 of the rejection unit 30 operate selectively according to the output of the decision logic unit 28.
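The text states that the decision logic unit 28 compares the two sets of similarity information and exploits the tendency of independent models to score higher, but it does not give the comparison rule itself. The sketch below is therefore only one possible reading: it compares the best log probability on each side and adds a hypothetical offset `bias` to the dependent side to compensate for that tendency. Both `decide_mode` and `bias` are assumptions, not part of the source.

```python
def decide_mode(dependent_info, independent_info, bias=0.0):
    """Pick the mode whose best log probability is higher.

    `bias` is a hypothetical offset added to the dependent side; the
    source does not specify how the score difference is handled.
    """
    best_dep = max(dependent_info.values(), default=float("-inf"))
    best_ind = max(independent_info.values(), default=float("-inf"))
    return "dependent" if best_dep + bias > best_ind else "independent"


# Hypothetical scores: no dependent word has been registered yet.
mode = decide_mode({}, {"call": -20.0})
```

With an empty dependent model store the function falls back to independent mode, which matches the stated goal that a first-time user need not register anything before using the recognizer.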

In detail, when the decision logic unit 28 decides that the input speech is an independent word, the independent rejection unit 34 compares the highest probability in the independent model learning information with the highest of the probabilities of the filler models corresponding to the independent models. If the independent model's probability is larger, that independent model is output as the recognition result. If the filler model's probability is larger, output of the recognition result is blocked. In that case, the dependent rejection unit 32 compares the highest value in the dependent model learning information received from the decision logic unit 28 with the highest of the probabilities of the filler models corresponding to the dependent models, and outputs that dependent model as the recognition result if its probability is larger. If the filler model's probability is larger, output of the recognition result is blocked and an out-of-vocabulary (recognition impossible) message is output.

Conversely, when the decision logic unit 28 decides that the input speech is a dependent word, operation starts from the dependent rejection unit 32 and proceeds in the same manner as described above to determine and output whether the input is a word in the recognition vocabulary or an unrecognizable word.
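The two-stage rejection described above, in which the rejection unit for the decided mode runs first and the other unit serves as a fallback, can be sketched as follows. The function names, the dictionary and list data layout, and the literal message string are assumptions for illustration; the comparisons follow the text, with a strict "larger than the best filler score" test.

```python
def reject_stage(scores, filler_scores):
    """Return the best word if it beats the best filler score, else None.

    `scores` maps word -> log probability; `filler_scores` is a list of
    log probabilities of the filler models for the same mode.
    """
    if not scores:
        return None
    word = max(scores, key=scores.get)
    best_filler = max(filler_scores, default=float("-inf"))
    return word if scores[word] > best_filler else None  # None = output blocked


def post_process(mode, dep_info, dep_fillers, ind_info, ind_fillers):
    """Try the decided mode first, then the other mode, then give up."""
    stages = {
        "dependent": (dep_info, dep_fillers),
        "independent": (ind_info, ind_fillers),
    }
    order = [mode] + [m for m in stages if m != mode]
    for m in order:
        word = reject_stage(*stages[m])
        if word is not None:
            return m, word
    return None, "out-of-vocabulary"


# Hypothetical scores: the independent stage is blocked by its filler
# model, so the dependent stage is consulted next.
result = post_process("independent",
                      {"mom": -10.0}, [-12.0],
                      {"call": -8.0}, [-7.0])
```

Keeping the fallback inside the post-processor is what lets the recognizer handle both vocabularies without an external controller switching between two separate recognizers.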

Thus the speaker-dependent/independent speech recognition apparatus described above does not require important menu items to be trained in dependent mode as in the prior art, and because the recognizer automatically recognizes and determines the dependent and independent modes, its structure can be simplified.

As described above, the speaker-dependent/independent speech recognition apparatus according to the present invention recognizes speaker-dependent and speaker-independent words simultaneously with the same recognition unit, and that recognition unit determines the applicable mode, so the structure of the recognizer can be simplified. In addition, because the apparatus performs speaker-dependent and speaker-independent recognition with the same codebook, the required memory capacity can be reduced.

Those skilled in the art will appreciate from the foregoing that various changes and modifications are possible without departing from the technical idea of the present invention. Accordingly, the technical scope of the present invention is not limited to the contents of the detailed description but must be determined by the claims.

FIG. 1 is a block diagram showing the configuration of the speaker-dependent speech recognizer in a conventional speech recognition apparatus.

FIG. 2 is a block diagram showing the configuration of a speaker-dependent/independent speech recognition apparatus according to the present invention.

FIG. 3 is a block diagram showing in detail the configuration of the post-processor in the speech recognition apparatus of FIG. 2.

<Brief description of reference numerals for the main parts of the drawings>

2: speech section detector 4: feature extractor

6: vector quantizer 8: codebook

10: preprocessor 12: selection switch

14: Baum-Welch estimator 16: dependent model storage unit

18, 24: pattern matching unit 20: recognition determination unit

22: independent model storage unit 26: post-processor

28: decision logic unit 30: rejection unit

32: dependent rejection unit 34: independent rejection unit

Claims

Voice section detection means for detecting a voice section from the voice signal;

Feature vector extracting means for extracting feature vectors from the speech signal from said speech segment detecting means;

Vector quantization means for quantizing a feature vector from the feature vector extracting means using a codebook;

Parameter estimation means for receiving an output signal of the quantization means and estimating the dependent model parameter;

First storage means for storing the dependent model from the parameter estimating means;

Second storage means for storing an independent model made from voice data of multiple speakers;

Pattern matching means for pattern matching each of the dependent model of the first storage means and the independent model of the second storage means with the output signal of the preprocessing means to generate the dependent model similarity information and the independent model similarity information, respectively;

Selection switch means for connecting the vector quantization means to one of the parameter estimating means and the pattern matching means according to a speaker's selection; and

Post-processing means for determining the mode to which the input voice corresponds, using the dependent model similarity information and the independent model similarity information from the pattern matching means, and for outputting the recognition result determined in that mode, the post-processing means including a decision logic unit that determines the corresponding mode by comparing the dependent model similarity information with the independent model similarity information, and a rejection unit that operates selectively according to the mode determined by the decision logic unit and outputs a recognition result determined using the similarity information of that mode; the foregoing means together constituting a speaker-dependent/independent speech recognition apparatus.
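As an illustration only, and not part of the claims, the vector quantization element of claim 1 can be pictured as a nearest-codeword lookup. This is a minimal sketch under stated assumptions: the squared Euclidean distance, the example codebook values, and the function name `quantize` are all hypothetical, since the claim does not specify how the codebook is searched.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword.

    Assumption: nearness is measured by squared Euclidean distance.
    """
    symbols = []
    for vec in features:
        # Distance from this feature vector to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code))
                 for code in codebook]
        symbols.append(dists.index(min(dists)))
    return symbols


# Hypothetical 2-dimensional codebook with three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4]]
symbols = quantize(frames, codebook)
```

The resulting symbol sequence is what the selection switch would route either to parameter estimation (registration) or to pattern matching (recognition).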

The apparatus of claim 1,

wherein said selection switch means connects said preprocessing means to said parameter estimating means when a speaker wants to register speech.

The apparatus of claim 1,

wherein said selection switch means connects said preprocessing means to said pattern matching means when a speaker's voice is to be recognized.

The apparatus of claim 1,

wherein the rejection unit comprises:

a dependent model rejection unit configured to determine a recognition result using the dependent model similarity information; and

an independent model rejection unit configured to determine a recognition result using the independent model similarity information.

The apparatus of claim 4, wherein,

when the decision logic unit determines that the input speech signal is similar to the dependent model, the dependent model rejection unit compares the highest recognition probability value in the dependent model similarity information with the highest recognition probability value of the filler model corresponding to the dependent model, and outputs the dependent model as the recognition result if the probability value of the dependent model is larger,

and blocks output of the recognition result if the probability value of the filler model is larger.
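The comparison recited in claim 5 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `apply_rejection`, the dictionary representation of similarity information, and the example scores are hypothetical, and treating a tie as a rejection is an assumption because the claim only covers the two "larger" cases.

```python
from typing import Dict, Optional


def apply_rejection(word_scores: Dict[str, float],
                    filler_scores: Dict[str, float]) -> Optional[str]:
    """Return the best-scoring word, or None if its output is blocked.

    The highest word probability is compared with the highest
    filler-model probability. Assumption: a tie counts as a rejection.
    """
    best_word = max(word_scores, key=word_scores.get)
    best_filler = max(filler_scores.values())
    if word_scores[best_word] > best_filler:
        return best_word
    return None  # recognition result blocked


# Hypothetical similarity values for illustration only.
filler_scores = {"filler_a": 0.40, "filler_b": 0.35}
accepted = apply_rejection({"home": 0.62, "office": 0.21}, filler_scores)
blocked = apply_rejection({"home": 0.30, "office": 0.21}, filler_scores)
```

Here `accepted` is the word "home" because 0.62 exceeds the best filler score of 0.40, while `blocked` is `None` because no word beats the filler model.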

The apparatus of claim 5, wherein,

when the recognition result output of the dependent model rejection unit is blocked, the independent model rejection unit compares the highest recognition probability value in the independent model similarity information with the highest recognition probability value of the filler model corresponding to the independent model, and outputs the independent model as the recognition result if the probability value of the independent model is larger,

and outputs an unrecognized message if the probability value of the filler model is larger.
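Taken together, claims 5 and 6 describe a two-stage fallback: the mode chosen by the decision logic unit is tried first, the other mode is tried if that result is blocked, and an unrecognized message is produced if both are blocked. The sketch below is one possible reading. The rule that the decision logic picks the mode with the higher best score, the names `post_process` and `UNRECOGNIZED`, and the example values are assumptions that the claims do not state.

```python
from typing import Dict, Optional, Tuple

UNRECOGNIZED = "unrecognized"  # hypothetical placeholder message

Scores = Dict[str, float]


def accept(word_scores: Scores, filler_scores: Scores) -> Optional[str]:
    """Return the best word if it beats the best filler score, else None."""
    word = max(word_scores, key=word_scores.get)
    return word if word_scores[word] > max(filler_scores.values()) else None


def post_process(dep: Scores, dep_fill: Scores,
                 ind: Scores, ind_fill: Scores) -> Tuple[Optional[str], str]:
    """Return (mode, result); mode is None when nothing is recognized."""
    stages = [("dependent", dep, dep_fill), ("independent", ind, ind_fill)]
    # Assumed decision logic: try first the mode whose best score is higher.
    if max(ind.values()) > max(dep.values()):
        stages.reverse()
    for mode, scores, fillers in stages:
        word = accept(scores, fillers)
        if word is not None:
            return mode, word
    return None, UNRECOGNIZED


# Hypothetical similarity values for illustration only.
first = post_process({"home": 0.7}, {"f": 0.5}, {"call": 0.6}, {"f": 0.4})
fallback = post_process({"home": 0.7}, {"f": 0.8}, {"call": 0.6}, {"f": 0.4})
nothing = post_process({"home": 0.7}, {"f": 0.8}, {"call": 0.6}, {"f": 0.9})
```

Reversing the stage order when the independent model scores higher gives the mirrored path, in which the independent model rejection unit runs first and the dependent model rejection unit serves as the fallback.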

The apparatus of claim 1, wherein,

when the decision logic unit determines that the input voice signal is similar to the independent model, the independent model rejection unit compares the highest recognition probability value in the independent model similarity information with the highest recognition probability value of the filler model corresponding to the independent model, and outputs the independent model as the recognition result if the probability value of the independent model is larger,

The apparatus of claim 7, wherein,

when the recognition result output of the independent model rejection unit is blocked, the dependent model rejection unit compares the highest recognition probability value in the dependent model similarity information with the highest recognition probability value of the filler model corresponding to the dependent model, and outputs the dependent model as the recognition result if the probability value of the dependent model is larger,