KR101195742B1 - Keyword spotting system having filler model by keyword model and method for making filler model by keyword model

Info

Publication number: KR101195742B1
Application number: KR1020100032089A
Authority: KR
Inventors: 김영준
Original assignee: 에스케이플래닛 주식회사
Priority date: 2010-04-08
Filing date: 2010-04-08
Publication date: 2012-11-01
Also published as: KR20110112890A

Abstract

The present invention relates to a keyword detection system having a filler model for each keyword, and to a method of implementing a filler model for each keyword, in which a separate filler model is implemented for each keyword so as to improve keyword detection capability.
To this end, the system preferably comprises: at least one keyword model that compares a feature vector extracted from a speech signal with a stored keyword and calculates and outputs a similarity (likelihood), i.e. the degree of closeness to the keyword; a filler model that is implemented for each keyword model according to the acoustic characteristics of the corresponding keyword and that calculates and outputs a similarity for the feature vector; and a similarity comparison unit that detects a keyword by comparing the similarity received from each keyword model with the similarity received from the filler model implemented for that keyword model.
Accordingly, by implementing a filler model for each keyword according to the acoustic characteristics of that keyword, the present invention can improve keyword detection performance.

Description

KEYWORD SPOTTING SYSTEM HAVING FILLER MODEL BY KEYWORD MODEL AND METHOD FOR MAKING FILLER MODEL BY KEYWORD MODEL

The present invention relates to a keyword detection system having a filler model for each keyword and to a method of implementing a filler model for each keyword, and more particularly to such a system and method in which a separate filler model is implemented for each keyword so as to improve keyword detection capability.

In general, keyword spotting is a branch of speech recognition in which a computer receives human speech, determines whether the speech contains a predetermined word or any of a plurality of predetermined words, and identifies that word.

Such a keyword detection system detects a keyword by comparing the score of a filler model, which models general speech other than the keywords, with the score of a keyword model, which models the speech of a keyword, i.e. a word to be found.

Here, keyword detection performance depends heavily on how well the filler model filters out words other than the keywords.

Conventionally, however, as shown in FIG. 1, keywords are detected with the same filler model shared by all keywords, which makes it difficult to filter out words other than the keywords.

That is, because the acoustic characteristics of each keyword differ, when all keywords share the same filler model as in the prior art, an utterance that sounds similar to a keyword may be detected as that keyword even though it is not one. For example, when the word '가방' (bag) is registered as a keyword and the similar-sounding utterance '가발' (wig) is input, the utterance '가발' is detected as the keyword '가방'.

In addition, when similar-sounding words are registered as keywords, input speech may be detected as the wrong keyword. For example, when the words '담배' (cigarette) and '담비' (marten) are registered as keywords, the utterance '담배' may be falsely detected as '담비'.

The present invention has been made to solve the problems described above. Its object is to provide a keyword detection system having a filler model for each keyword, and a method of implementing a filler model for each keyword, in which a filler model is implemented separately for each keyword according to that keyword's acoustic characteristics so as to improve keyword detection capability.

To achieve this object, a keyword detection system having a filler model for each keyword according to a first aspect of the present invention preferably comprises: at least one keyword model that compares a feature vector extracted from a speech signal with a stored keyword and calculates and outputs a similarity (likelihood), i.e. the degree of closeness to the keyword; a filler model that is implemented for each keyword model according to the acoustic characteristics of the corresponding keyword and that calculates and outputs a similarity for the feature vector; and a similarity comparison unit that detects a keyword by comparing the similarity received from each keyword model with the similarity received from the filler model implemented for that keyword model.

Furthermore, the filler model implemented for each keyword model is preferably implemented as the acoustic model whose acoustic distance from the corresponding keyword is largest.

Meanwhile, a method of implementing a filler model for each keyword according to a second aspect of the present invention preferably comprises: an inter-keyword acoustic distance measurement step in which a filler model implementing apparatus measures the acoustic distance between keywords through acoustic analysis of each keyword; and a filler model implementation step in which, for each keyword, a filler model is implemented using the acoustic model that is acoustically farthest from that keyword, with the inter-keyword acoustic distances measured in the measurement step reflected in the filler model for each keyword.

Once a filler model has been implemented for each keyword model through the filler model implementation step, the method preferably further comprises: a step of comparing, for each keyword model, the similarity of the keyword model for words other than the keyword with the similarity of the filler model implemented for that keyword model, and measuring for each keyword a threshold value, i.e. the difference between the two similarity values; and a step of re-implementing, for any keyword model whose measured threshold value is at or below a reference value, the filler model for that keyword model so that the similarity values of the keyword model and the filler model for words other than the keyword differ by at least a preset value.

According to the keyword detection system having a filler model for each keyword and the method of implementing a filler model for each keyword of the present invention, keyword detection performance can be improved by implementing a filler model for each keyword according to the acoustic characteristics of that keyword.

FIG. 1 is a diagram for explaining a method of detecting a keyword according to the prior art.
FIG. 2 is a diagram schematically showing the configuration of a keyword detection system having a filler model for each keyword according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of a keyword model applied to the present invention.
FIG. 4 is a diagram showing an example of a filler model implemented for each keyword model according to the present invention.
FIG. 5 is a flowchart for explaining a method of implementing a filler model for each keyword according to an embodiment of the present invention.

Hereinafter, a keyword detection system having a filler model for each keyword and a method of implementing a filler model for each keyword according to a preferred embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 schematically shows the configuration of a keyword detection system having a filler model for each keyword according to an embodiment of the present invention.

In FIG. 2, the voice receiver 10 is typically implemented as a microphone. It converts the received speech into an electrical signal and passes the converted speech signal to the feature vector extractor 20.

The feature vector extractor 20 calculates the frequency characteristics of the speech signal received from the voice receiver 10 frame by frame to extract the feature vectors contained in the speech signal, and passes the extracted feature vectors to each keyword model 30 and to the filler model 40 implemented for each keyword model 30.
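The frame-by-frame frequency analysis can be illustrated with a minimal sketch. The source does not specify the feature type, frame length, or hop size, so the magnitude spectrum, `frame_len`, `hop`, and all function names below are assumptions chosen for illustration only.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def extract_features(samples, frame_len=8, hop=4):
    """Return one feature vector (here: a magnitude spectrum) per frame."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]

# Toy input: a pure tone with 2 cycles per 8-sample frame.
tone = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
features = extract_features(tone)
peak_bins = [max(range(len(v)), key=v.__getitem__) for v in features]
```

Practical systems usually derive more compact features from such spectra; the sketch only shows the per-frame structure that the keyword and filler models consume.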

Each keyword model 30 compares the feature vector received from the feature vector extractor 20 with its stored keyword and calculates and outputs a similarity (likelihood), i.e. the degree of closeness to the keyword.

The keyword model 30 may be built from the individual phonemes of the keyword, or, as shown in FIG. 3, by modeling the entire keyword (for example, '담배') as a single HMM (Hidden Markov Model).
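As a rough illustration of how a whole-word HMM can score an input, the sketch below computes a log-likelihood with the standard forward algorithm. It assumes a toy HMM with discrete observation symbols; the source does not state the emission model, and real systems score continuous feature vectors. All parameter values and names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) computed with the forward algorithm."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
                 + safe_log(emit[s][o])
                 for s in range(n)]
    return logsumexp(alpha)

# Hypothetical 2-state left-to-right HMM over symbols {0, 1}.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.1, 0.9]]

matching = forward_log_likelihood([0, 1], start, trans, emit)
mismatching = forward_log_likelihood([1, 0], start, trans, emit)
```

An observation sequence that follows the state order the model expects receives the higher score, which is the quantity the text calls the similarity.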

Meanwhile, a filler model 40 is implemented separately for each keyword model 30 according to the acoustic characteristics of the corresponding keyword. Each filler model 40 compares the feature vector received from the feature vector extractor 20 with its stored phonemes and calculates and outputs a similarity.

The filler model 40 gathers statistical information on all phonemes used in the language into one or several states, as shown in FIG. 4. Statistics for various acoustic information (for example, bus sounds, doors opening and closing, footsteps, street noise, and indoor sounds) may also be added.

The keyword detection system detects a keyword by comparing the similarity of the keyword model 30 with the similarity of the filler model 40 for the input speech. For speech input other than a keyword, the similarity of the filler model 40 must be greater than that of the keyword model 30; for keyword speech input, the similarity of the keyword model 30 must be greater than that of the filler model 40.

To raise detection performance in this way, i.e. to make the similarity of the filler model 40 greater than that of the keyword model 30 for non-keyword speech input and the similarity of the keyword model 30 greater than that of the filler model 40 for keyword speech input, it is preferable, when implementing a filler model 40 for each keyword model 30, to use as the filler model 40 the acoustic model whose acoustic distance from that keyword model 30 is largest.

Meanwhile, the similarity comparison unit 50 detects a keyword by comparing the similarity received from each keyword model 30 with the similarity received from the filler model 40 implemented for that keyword model.

Specifically, the similarity comparison unit 50 compares, for the input speech, the similarity of each keyword model 30 with the similarity of the filler model 40 implemented for that keyword model 30. If the similarity of a given filler model 40 is greater than that of its keyword model 30, the input speech is recognized as non-keyword speech; if the similarity of a given keyword model 30 is greater than that of its filler model 40, the input speech is recognized as that keyword.
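The comparison rule can be sketched in a few lines. The function name, the dictionary layout, and the log-likelihood values are hypothetical; only the rule itself (a keyword is detected when its keyword model outscores its own filler model) comes from the text.

```python
def keywords_detected(scores):
    """scores maps keyword -> (keyword_model_score, filler_model_score).

    A keyword counts as detected when its keyword model scores higher
    than the filler model paired with it; otherwise it is rejected.
    """
    return [kw for kw, (kw_score, filler_score) in scores.items()
            if kw_score > filler_score]

# Hypothetical log-likelihoods for one utterance.
scores = {
    "담배": (-42.0, -47.5),  # keyword model wins -> detected
    "담비": (-51.0, -46.0),  # filler model wins  -> rejected
}
detected = keywords_detected(scores)
```

Because each keyword is paired with its own filler model, the decision for one keyword does not depend on the filler model of another.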

FIG. 5 is a flowchart for explaining a method of implementing a filler model for each keyword according to an embodiment of the present invention.

First, the filler model implementing apparatus, which is to implement a filler model for each keyword, measures the acoustic distance between keywords through acoustic analysis of each keyword (S10).

The acoustic distance between keywords is measured in step S10 so that, when a filler model is implemented for each keyword, the inter-keyword acoustic distance can be reflected in that filler model. This is explained in more detail below.

After the acoustic distance between keywords has been measured in step S10, a filler model is implemented for each keyword using the acoustic model that is acoustically farthest from that keyword, with the inter-keyword acoustic distances measured in step S10 reflected in the filler model for each keyword (S12).
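Selecting the acoustically farthest model in step S12 amounts to an argmax over candidate models. The sketch below assumes each model is summarized by a small vector and uses Euclidean distance; the source defines neither the representation nor the distance measure, so both, along with the candidate names, are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_filler(keyword_vec, candidate_models):
    """Return the name of the candidate farthest from the keyword vector."""
    return max(candidate_models,
               key=lambda name: euclidean(keyword_vec, candidate_models[name]))

# Hypothetical summary vectors for one keyword and three candidate models.
keyword_vec = [1.0, 0.0]
candidates = {
    "filler_a": [0.9, 0.1],
    "filler_b": [-1.0, 0.5],
    "filler_c": [0.0, 0.0],
}
chosen = pick_filler(keyword_vec, candidates)
```

Running the same selection once per keyword yields a different filler model for each keyword whenever the keywords lie in different regions of the acoustic space.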

If the measurement in step S10 shows that the acoustic distance between keywords is small, that is, the keywords have similar acoustic characteristics, those keywords may be misrecognized as one another.

For example, suppose the words '담배' (cigarette) and '담비' (marten) are registered as keywords. The two keywords differ only in their final phoneme ('ㅐ' versus 'ㅣ'); the first four phonemes (ㄷ, ㅏ, ㅁ, ㅂ) are identical. The acoustic distance between '담배' and '담비' is therefore very small.
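One simple proxy for this closeness is the edit distance between phoneme sequences. The patent does not say how acoustic distance is computed, so using Levenshtein distance here is an assumption made purely to illustrate the '담배' / '담비' example; the phoneme lists are written out by hand.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

dambae = ["ㄷ", "ㅏ", "ㅁ", "ㅂ", "ㅐ"]  # 담배
dambi = ["ㄷ", "ㅏ", "ㅁ", "ㅂ", "ㅣ"]   # 담비
gabang = ["ㄱ", "ㅏ", "ㅂ", "ㅏ", "ㅇ"]  # 가방

close = edit_distance(dambae, dambi)
far = edit_distance(dambae, gabang)
```

Under this proxy the two confusable keywords are a single substitution apart, while an unrelated word is several edits away.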

As described above, because the acoustic distance between the keywords '담배' and '담비' is small, '담배' may be misrecognized as '담비', or '담비' as '담배'. To prevent such misrecognition, when filler models are implemented for keywords that are acoustically close, the filler model for the '담배' keyword model and the filler model for the '담비' keyword model are implemented so that they differ from each other. That is, the filler models are implemented so that, when the utterance '담배' is input, the comparison value between the similarity of the '담배' keyword model and the similarity of its filler model is greater than the comparison value between the similarity of the '담비' keyword model and the similarity of its filler model; and, when the utterance '담비' is input, the comparison value for the '담비' keyword model and its filler model is greater than the comparison value for the '담배' keyword model and its filler model. This prevents acoustically close keywords from being misrecognized as one another.
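This condition can be written as a check on score margins. The sketch treats the "comparison value" as the keyword-model score minus the filler-model score, which is an interpretation rather than something the source states; the function names and all numbers are hypothetical.

```python
def margin(scores, keyword):
    """Keyword-model score minus the paired filler-model score."""
    kw_score, filler_score = scores[keyword]
    return kw_score - filler_score

def fillers_separate(scores_by_utterance):
    """scores_by_utterance maps spoken keyword -> {keyword: (kw, filler)}.

    True when, for every spoken keyword, its own margin is strictly
    larger than the margin of every other keyword.
    """
    for spoken, scores in scores_by_utterance.items():
        own = margin(scores, spoken)
        if any(margin(scores, other) >= own
               for other in scores if other != spoken):
            return False
    return True

# Per-keyword fillers tuned so the spoken keyword has the larger margin.
tuned = {
    "담배": {"담배": (-40.0, -48.0), "담비": (-43.0, -45.0)},
    "담비": {"담배": (-44.0, -45.0), "담비": (-41.0, -47.0)},
}
# One shared filler score: the wrong keyword ends up with the larger margin.
shared = {
    "담배": {"담배": (-40.0, -46.0), "담비": (-39.5, -46.0)},
}
tuned_ok = fillers_separate(tuned)
shared_ok = fillers_separate(shared)
```

A check of this kind could be run on held-out utterances of each confusable pair to decide whether the paired filler models are different enough.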

After a filler model has been implemented for each keyword model in step S12, then for each keyword model the similarity of the keyword model for words other than the keyword is compared with the similarity of the filler model implemented for that keyword model, and a threshold value, i.e. the difference between the two similarity values, is measured for each keyword (S14).

For any keyword model whose threshold value measured in step S14 is at or below the reference value, the filler model for that keyword model is re-implemented so that the similarity values of the keyword model and the filler model for words other than the keyword differ by at least a preset value (S16, S18). The method then returns to step S14, where, for the keyword model whose filler model was re-implemented, the similarity of the keyword model and the similarity of the filler model for words other than the keyword are compared again and the threshold value, i.e. the difference between the two, is measured.
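The S14 to S18 loop can be sketched as a measure-and-rebuild iteration. The callbacks `measure_gap` and `rebuild`, the single `reference` value, and the `max_rounds` cap are assumptions added for illustration; the source describes the loop but not how a filler model is actually re-implemented, and it sets no iteration limit.

```python
def tune_filler(measure_gap, rebuild, filler, reference, max_rounds=10):
    """Repeat measurement (S14) and re-implementation (S18) until the gap
    between filler and keyword scores on non-keyword words exceeds
    `reference`, or until `max_rounds` is reached (a safety cap that is
    not part of the source)."""
    for _ in range(max_rounds):
        gap = measure_gap(filler)   # S14: measure the threshold value
        if gap > reference:         # S16: gap is large enough, stop
            return filler, gap
        filler = rebuild(filler)    # S18: re-implement the filler model
    return filler, measure_gap(filler)

# Toy stand-ins: the "filler model" is a single number, the gap is that
# number minus a fixed keyword-model score of 1.0, and each rebuild
# raises the filler score by 0.5.
final_filler, final_gap = tune_filler(
    measure_gap=lambda f: f - 1.0,
    rebuild=lambda f: f + 0.5,
    filler=1.0,
    reference=1.0,
)
```

In this toy run the filler is rebuilt three times before the measured gap exceeds the reference value.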

The keyword detection system having a filler model for each keyword and the method of implementing a filler model for each keyword of the present invention are not limited to the embodiments described above and may be modified in various ways within the scope permitted by the technical idea of the present invention.

The keyword detection system having a filler model for each keyword and the method of implementing a filler model for each keyword of the present invention can be used in any field that employs a speech recognition system based on keyword detection.

10. Voice receiver, 20. Feature vector extractor,
30. Keyword model, 40. Filler model,
50. Similarity comparison unit

Claims

A keyword detection system having a filler model for each keyword, comprising: at least one keyword model that compares a feature vector extracted from a speech signal with a stored keyword and calculates and outputs a similarity (likelihood), i.e. the degree of closeness to the keyword;
a filler model that is implemented, for each keyword model, as the acoustic model having the largest acoustic distance from the corresponding keyword, and that calculates and outputs a similarity for the feature vector; and
a similarity comparison unit that detects a keyword by comparing the similarity received from each keyword model with the similarity received from the filler model implemented for that keyword model.

The keyword detection system of claim 1, wherein the similarity comparison unit
compares, for each keyword model, the similarity of the keyword model for words other than the keyword with the similarity of the filler model implemented for that keyword model, measures for each keyword a threshold value that is the difference between the two similarity values, and detects the keyword corresponding to a keyword model whose measured threshold value is equal to or less than a reference value.

A method of implementing a filler model for each keyword, comprising: an inter-keyword acoustic distance measurement step in which a filler model implementing apparatus measures the acoustic distance between keywords through acoustic analysis of each keyword; and
a filler model implementation step in which, for each keyword, a filler model is implemented using the acoustic model that is acoustically farthest from that keyword, with the inter-keyword acoustic distances measured in the measurement step reflected in the filler model for each keyword.

The method of claim 3, further comprising, once a filler model has been implemented for each keyword model through the filler model implementation step: a step of comparing, for each keyword model, the similarity of the keyword model for words other than the keyword with the similarity of the filler model implemented for that keyword model, and measuring for each keyword a threshold value that is the difference between the two similarity values; and
a step of re-implementing, for any keyword model whose measured threshold value is equal to or less than a reference value, the filler model for that keyword model so that the similarity values of the keyword model and the filler model for words other than the keyword differ by at least a preset value.