KR20100030483A

KR20100030483A - Speech recognition apparatus using the multi-thread and methmod thereof

Info

Publication number: KR20100030483A
Application number: KR1020080089453A
Authority: KR
Inventors: 정두경
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2008-09-10
Filing date: 2008-09-10
Publication date: 2010-03-18
Also published as: KR101529918B1

Abstract

PURPOSE: A speech recognition apparatus using multiple threads and a method thereof are provided. Speech recognition is processed in multiple threads that use different acoustic models, which shortens the response time for general commands compared with natural language recognition. CONSTITUTION: A feature vector extractor (120) extracts a feature vector from an input speech signal. A speech recognition server (130) performs speech recognition on the extracted feature vector through a plurality of threads that use different acoustic models. A first thread performs speech recognition on the extracted feature vector using an acoustic model. A second thread performs speech recognition on the extracted feature vector using a statistical language model.

Description

Speech Recognition Apparatus and Method Using Multiple Threads {SPEECH RECOGNITION APPARATUS USING THE MULTI-THREAD AND METHMOD THEREOF}

The present invention relates to a speech recognition apparatus using multiple threads and a method thereof.

In general, speech recognition covers two cases: recognizing general commands spoken by a speaker and recognizing natural language.

The response time for recognizing a general command is shorter than the response time for recognizing natural language.

An object of the present invention is to provide a speech recognition apparatus and method that perform speech recognition with multiple threads, each of which uses a different acoustic model.

Another object of the present invention is to provide a speech recognition apparatus and method using multiple threads for systems that recognize both general commands and natural language. Speech recognition is processed by multiple threads that use different acoustic models, so that recognizing a general command takes less time than recognizing natural language.

To achieve these objects, a speech recognition method using multiple threads according to the present invention comprises: a first step of extracting a feature vector from an input speech signal; a second step in which each of a plurality of threads, which use different acoustic models, performs speech recognition on the extracted feature vector; and a third step of outputting the speech recognition result of the thread with the fastest response time.

To achieve these objects, a speech recognition method using multiple threads according to another aspect of the present invention comprises: a first step of extracting a feature vector from an input speech signal; a second step of performing speaker recognition based on the extracted feature vector; a third step, performed after the speaker recognition, in which each of a plurality of threads that use different acoustic models performs speech recognition on the extracted feature vector; and a fourth step of outputting the speech recognition result of the thread with the fastest response time.

To achieve these objects, a speech recognition apparatus using multiple threads according to the present invention comprises: a feature vector extractor that extracts a feature vector from an input speech signal; and a speech recognition server that performs speech recognition on the extracted feature vector through a plurality of threads that use different acoustic models.

To achieve these objects, a speech recognition apparatus using multiple threads according to another aspect of the present invention comprises: a feature vector extractor that extracts a feature vector from an input speech signal; a speaker recognition server that performs speaker recognition based on the extracted feature vector; and a speech recognition server that, once speaker recognition has completed successfully, performs speech recognition on the extracted feature vector through a plurality of threads that use different acoustic models.

The embodiments of the present invention apply to systems that can recognize both general commands (Command & Control) and natural language (Flexible Speech Recognition, or free speech). In these embodiments, the speech recognition apparatus and method process speech recognition through multiple threads that use different acoustic models. They adopt the result of whichever thread finishes recognition first, which reduces response time.
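
The Python sketch below illustrates the "use the first result" principle. It runs several recognizers in parallel with the standard library's `concurrent.futures` and returns whichever one finishes first. The functions `recognize_command` and `recognize_natural_language` are hypothetical stand-ins that only sleep. Their delays are scaled-down versions of the approximately 300 ms and 1300 ms response times given for the acoustic model and the statistical language model.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def recognize_command(features):
    # Hypothetical fast command-grammar recognizer (about 300 ms in the text, scaled down).
    time.sleep(0.03)
    return ("command", "volume up")


def recognize_natural_language(features):
    # Hypothetical slower natural-language recognizer (about 1300 ms in the text, scaled down).
    time.sleep(0.13)
    return ("natural", "turn the volume up a little")


def first_result(features, recognizers):
    """Run each recognizer in its own thread and return the result that finishes first."""
    pool = ThreadPoolExecutor(max_workers=len(recognizers))
    futures = [pool.submit(recognize, features) for recognize in recognizers]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        future.cancel()  # only stops work that has not started yet
    pool.shutdown(wait=False)  # do not block on the slower recognizers
    return next(iter(done)).result()


print(first_result(None, [recognize_natural_language, recognize_command]))  # ('command', 'volume up')
```

Python threads cannot be killed from outside. `Future.cancel()` only affects work that has not started, so the slower recognizer keeps running in the background until it returns.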

In addition, the apparatus and method use multiple threads with different acoustic models and adopt the result of the thread with the better recognition accuracy. This allows them to provide reliable recognition results.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or corresponding components carry the same reference numerals in every figure, and their descriptions are not repeated.

FIG. 1 is a block diagram of a speech recognition apparatus using multiple threads according to a first embodiment of the present invention. As shown, the speech recognition apparatus 100 comprises an input unit 110, a feature vector extractor 120, a speech recognition server 130, an application program unit 140, and an output unit 150.

The input unit 110 receives a speech signal uttered by any speaker.

The input unit 110 may also include a module such as a filter to remove noise from the input speech signal.

The feature vector extractor 120 extracts a feature vector from the speech signal received through the input unit 110. Suitable feature extraction techniques include linear predictive coefficients (LPC), the cepstrum, mel-frequency cepstral coefficients (MFCC), line spectral frequencies (LSF), and filter bank energy (energy per frequency band).
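
The description names LPC as one possible feature but does not say how it is computed. A common approach is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below uses that approach as an assumption, not as the patent's own method. Windowing, pre-emphasis, and framing are omitted to keep it short.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] (x[n] ~ sum_k a[k] * x[n-k]) and the residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal is perfectly predicted; remaining coefficients stay zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order  # silent frame
    return levinson_durbin(r, order)[0]


# Toy check: a decaying signal x[n] = 0.9**n is predicted by x[n] ~ 0.9 * x[n-1].
print(round(lpc([0.9 ** n for n in range(200)], order=1)[0], 6))  # 0.9
```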

The speech recognition server 130 comprises a speech recognition unit 131 and an acoustic model database 132.

The speech recognition unit 131 includes multiple threads 131-1, ..., 131-N. Each thread decodes the feature vector extracted by the feature vector extractor 120 using its own acoustic model, pre-stored in the acoustic model database 132. The decoding result is the top-ranked candidate: the entry in a pre-stored finite set with the highest similarity, as determined by a similarity comparison or a comparable measure.
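
The patent does not specify the decoder. For a small, finite command set, one classic way to find the most similar entry is dynamic time warping (DTW) against stored templates. The sketch below uses that swapped-in technique with hypothetical template names and two-dimensional toy feature vectors. It ranks every candidate by distance, so element 0 is the top result.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def decode_finite_set(features, templates):
    """Rank every entry of the finite set by DTW distance; element 0 is the top result."""
    scored = [(label, dtw_distance(features, template)) for label, template in templates.items()]
    return sorted(scored, key=lambda item: item[1])


templates = {  # hypothetical stored command templates (toy 2-D features)
    "volume up": [[0, 0], [1, 1], [2, 2]],
    "volume down": [[2, 2], [1, 1], [0, 0]],
}
utterance = [[0, 0], [0, 0], [1, 1], [2, 2]]  # same shape as "volume up", spoken more slowly
print(decode_finite_set(utterance, templates)[0])  # ('volume up', 0.0)
```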

After the threads decode with their respective acoustic models, the speech recognition unit 131 evaluates a confidence score for the thread that responds first. Based on that evaluation, it decides whether to adopt the thread's result as the speech recognition result.
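
The description does not state how the confidence score is computed. One simple, hypothetical choice is a posterior-like score derived from the N-best distances, as sketched below. The `scale` and `threshold` values are illustrative assumptions. The acceptance test ("greater than the threshold") follows the rule used throughout this description.

```python
import math


def nbest_confidence(distances, scale=1.0):
    """Posterior-like confidence of the best hypothesis, given N-best distances (lower is better)."""
    best = min(distances)
    # exp(-scale * (d - best)) equals 1 for the best hypothesis and avoids overflow.
    weights = [math.exp(-scale * (d - best)) for d in distances]
    return 1.0 / sum(weights)


def accept_thread_result(distances, threshold=0.7, scale=1.0):
    """Adopt the thread's result only if its confidence is greater than the threshold."""
    confidence = nbest_confidence(distances, scale)
    return confidence > threshold, confidence


print(accept_thread_result([0.0, 5.0, 6.0]))  # clear winner: accepted
print(accept_thread_result([0.0, 0.1, 0.2]))  # near tie: rejected
```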

When one of the threads decodes the extracted feature vector using an acoustic model, the speech recognition unit 131 may evaluate that thread's confidence. Based on the evaluation, it decides whether to adopt the thread's result as the speech recognition result. With the acoustic model, the response time is about 300 ms. The response time is the time from detecting the end point of speech until the recognition result appears, also called the end-of-utterance time-out.

When another thread decodes the extracted feature vector using a statistical language model, the speech recognition unit 131 may first perform semantic analysis on the decoding result and then evaluate that thread's confidence. Based on the evaluation, it decides whether to adopt the thread's result as the speech recognition result. With the statistical language model, the response time is about 1300 ms.

If the confidence scores of both threads are less than or equal to a preset threshold, the speech recognition unit 131 may adopt the result of the thread with the higher confidence as the speech recognition result.

The acoustic model database 132 contains an acoustic model, a statistical language model (SLM), and similar models.

The application program unit 140 runs the application program, among those it contains, that corresponds to the speech recognition result from the speech recognition server 130.

The output unit 150 outputs the speech recognition result from the speech recognition server 130.

Thus, when both general commands and natural language can be recognized, the apparatus runs multiple threads that use different acoustic models and uses the result of the thread that responds fastest. This lets it return a speech recognition result to the speaker quickly.

In addition, because the apparatus has multiple threads that use different acoustic models and selects the recognition result according to each thread's confidence, it can provide the speaker with a reliable speech recognition result.

Further, when both general commands and natural language can be recognized, running speech recognition in multiple threads with different acoustic models means that general commands do not need as long a response time as natural language recognition. This allows the speech recognition apparatus to operate efficiently.

FIG. 2 is a block diagram of a speech recognition apparatus using multiple threads according to a second embodiment of the present invention. As shown, the speech recognition apparatus 100 comprises an input unit 110, a feature vector extractor 120, a speech recognition server 130, an application program unit 140, an output unit 150, and a speaker recognition server 160.

The input unit 110, feature vector extractor 120, speech recognition server 130, application program unit 140, and output unit 150 are configured as in the first embodiment, so their description is omitted.

The speaker recognition server 160 performs speaker recognition based on the feature vector extracted by the feature vector extractor 120.

The speaker recognition server 160 comprises a speaker recognition unit 161, a speaker model database 162, and a speaker model adaptation server 163.

The speaker recognition unit 161 computes a probability value between the feature vector extracted by the feature vector extractor 120 and each of one or more speaker models pre-stored in the speaker model database 162. Based on these probability values, it performs one of two tasks. Speaker identification determines whether the speaker is already registered in the speaker model database 162. Speaker verification determines whether the access is by the legitimate user.

Specifically, maximum likelihood estimation is performed over the speaker models stored in the speaker model database 162, and the speaker model with the highest probability value is selected as the speaker of the utterance. If the highest probability value is less than or equal to a preset threshold, none of the registered speakers is judged to have produced the utterance, and speaker identification reports the speaker as unregistered. For example, the similarity between the extracted feature vector and each feature vector stored in the speaker model database 162 is determined. If a similarity is greater than a preset threshold, the speaker model corresponding to the extracted feature vector is judged to be registered. If it is less than or equal to the threshold, that speaker model is judged not to be registered.
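
The following sketch shows the maximum-likelihood decision with a rejection threshold described above. It assumes each registered speaker is modelled by a diagonal-covariance GMM, given as a list of `(weight, mean, variance)` tuples, and scores an utterance by its average per-frame log-likelihood. The speaker names, model values, and threshold are made-up examples.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """log p(x | GMM) via log-sum-exp over (weight, mean, var) components."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify_speaker(frames, speaker_models, threshold):
    """Pick the speaker with the highest average log-likelihood, or None if it is <= threshold."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in speaker_models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


speaker_models = {  # hypothetical enrolled speakers
    "alice": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "bob": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
print(identify_speaker([[0.1, -0.1], [0.0, 0.2]], speaker_models, threshold=-5.0)[0])  # alice
print(identify_speaker([[20.0, 20.0]], speaker_models, threshold=-5.0)[0])  # None
```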

For speaker verification, the log-likelihood ratio (LLR) method is used to determine whether the speaker is the legitimate speaker.
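
A log-likelihood-ratio test compares how well the claimed speaker's model explains the input with how well a background model explains it. For brevity, both models in this sketch are single diagonal Gaussians `(mean, variance)`; a real system would typically use a speaker GMM and a UBM. The decision threshold of 0 is an assumption.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def verify_speaker(frames, claimed_model, background_model, threshold=0.0):
    """Accept the identity claim if the average per-frame log-likelihood ratio exceeds the threshold."""
    llr = sum(
        log_gaussian_diag(f, *claimed_model) - log_gaussian_diag(f, *background_model) for f in frames
    ) / len(frames)
    return llr > threshold, llr


claimed = ([0.0, 0.0], [1.0, 1.0])     # hypothetical model of the claimed speaker (mean, var)
background = ([0.0, 0.0], [9.0, 9.0])  # hypothetical broad background model
print(verify_speaker([[0.1, -0.1], [0.0, 0.2]], claimed, background)[0])  # True
print(verify_speaker([[6.0, 6.0]], claimed, background)[0])               # False
```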

If the speaker is found to be registered, the speaker recognition unit 161 uses the speaker model adaptation server 163 to adapt that speaker's model, stored in the speaker model database 162, to the extracted feature vector.

If the speaker is found not to be registered, the speaker recognition unit 161 generates a new speaker model from the extracted feature vector.

The speaker recognition unit 161 generates the speaker model using a Gaussian mixture model (GMM), a hidden Markov model (HMM), a neural network, or a similar model.

The speaker recognition unit 161 may also generate a GMM speaker model from the extracted feature vectors using the expectation-maximization (EM) algorithm.
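
EM training of a GMM alternates two steps. The E-step assigns each sample a soft responsibility for each component, and the M-step re-estimates the weights, means, and variances from those responsibilities. For readability, the minimal sketch below fits a one-dimensional mixture; real feature vectors are multidimensional. The deterministic initialisation and the variance floor are assumptions of this sketch.

```python
import math


def em_gmm_1d(data, k, iterations=50, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture to `data` with expectation-maximization."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialisation: means at evenly spaced order statistics, pooled variance.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample (log-sum-exp for stability).
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each component from its soft share of the data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)  # keep log(weight) finite
            if nj < 1e-10:
                continue  # collapsed component: keep its previous mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


data = [0.0, 0.2, -0.1, 0.1, -0.2, 10.0, 10.2, 9.9, 10.1, 9.8]
weights, means, variances = em_gmm_1d(data, k=2)
print(sorted(means))  # means close to 0 and 10
```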

Alternatively, the speaker recognition unit 161 may build a universal background model (UBM) from the extracted feature vectors using the EM algorithm. It then applies an adaptation algorithm stored in the speaker model adaptation server 163 to the UBM, which produces a speaker model, i.e., a GMM, adapted to the speaker. The stored adaptation algorithms may include maximum a posteriori (MAP) adaptation, maximum likelihood linear regression (MLLR), and the eigenvoice method.
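
Of the adaptation algorithms listed, MAP adaptation of the UBM means is the simplest to show. The sketch follows the common relevance-factor form. For each component k, alpha_k = n_k / (n_k + r) and the new mean is alpha_k * E_k[x] + (1 - alpha_k) * mu_k. Here n_k is the soft count of frames assigned to component k, E_k[x] is their responsibility-weighted mean, and r is the relevance factor. The one-dimensional UBM layout `(weights, means, variances)` and the relevance factor of 16 are assumptions. Weights and variances are left unadapted.

```python
import math


def map_adapt_means(ubm, data, relevance=16.0):
    """MAP-adapt the component means of a 1-D GMM-UBM to a speaker's data (means only)."""
    weights, means, variances = ubm
    k = len(weights)
    n_k = [0.0] * k    # soft frame counts per component
    sum_k = [0.0] * k  # responsibility-weighted sums of the data
    for x in data:
        logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]
        total = sum(probs)
        for j in range(k):
            r = probs[j] / total
            n_k[j] += r
            sum_k[j] += r * x
    adapted = []
    for j in range(k):
        alpha = n_k[j] / (n_k[j] + relevance)  # more data -> trust the speaker data more
        data_mean = sum_k[j] / n_k[j] if n_k[j] > 0 else means[j]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[j])
    return list(weights), adapted, list(variances)


ubm = ([0.5, 0.5], [0.0, 10.0], [1.0, 1.0])  # hypothetical 2-component UBM
print(map_adapt_means(ubm, [1.0] * 16)[1])    # [0.5, 10.0]
```

A component that sees no speaker data keeps its UBM mean, which is why this form is commonly used when only a little enrollment speech is available.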

The speaker model database 162 stores one or more registered speaker models.

The speaker model database 162 also stores speaker models newly generated by the speaker recognition unit 161.

As described above, the speaker model adaptation server 163 stores adaptation algorithms such as MAP, MLLR, and the eigenvoice method. Under the control of the speaker recognition unit 161, it adapts a speaker model to the extracted feature vector.

Under the control of the speaker recognition unit 161, the speaker model adaptation server 163 may also apply one of its stored adaptation algorithms to a UBM generated from the extracted feature vectors to produce a GMM.

The speaker recognition server 160 sends the speaker recognition result to the output unit 150 or passes it to the speech recognition server 130.

The speech recognition server 130 performs speech recognition for the speaker of the utterance according to the result from the speaker recognition server 160. That is, speech recognition for a speaker begins only after the speaker recognition server 160 has successfully recognized that speaker by speaker verification or speaker identification.

As described above, the speech recognition server 130 contains multiple threads 131-1, ..., 131-N. Each thread decodes (performs speech recognition on) the feature vector extracted by the feature vector extractor 120, using its own acoustic model pre-stored in the acoustic model database 132.

The speech recognition unit 131 evaluates the confidence of each thread's decoding result and decides whether to adopt that thread's result as the speech recognition result.

By combining speaker recognition with speech recognition in this way, the apparatus can give the speaker both reliable speaker recognition and fast speech recognition results.

The speech recognition method using multiple threads according to the present invention is described in detail below with reference to FIGS. 1 and 2.

FIG. 3 is a flowchart of a speech recognition method using multiple threads according to the first embodiment of the present invention.

First, a feature vector is extracted from the speech signal received through the input unit 110 (S10).

Next, each of a plurality of threads that use different acoustic models performs speech recognition on the extracted feature vector. The different models may include an acoustic model, a statistical language model, and so on (S20).

Then the speech recognition result of the first thread, i.e., the thread with the fastest response time, is output.

Specifically, if the confidence of the first thread's result is greater than a preset threshold, the first thread's result is output and all other threads are forcibly terminated.

If the first thread's confidence is less than or equal to the preset threshold, the second thread (the next to respond after the first) is considered. Its result may be output if its confidence is greater than the preset threshold.

That is, if the first thread's confidence is less than or equal to the preset threshold and the second thread's confidence is greater than it, the second thread's result is output. If the second thread's confidence is also less than or equal to the threshold, both threads are judged to have low confidence, and one of two fallbacks applies. In the first, the speaker is asked to repeat the utterance and the above steps are performed again on the new speech signal. In the second, the two threads' confidences are compared and the result of the thread with the higher confidence is output.

More generally, the threads' confidences are checked against the preset threshold in order of response time. If a thread's confidence is greater than the threshold, its result is output. If it is less than or equal to the threshold, the next-fastest thread is checked against the threshold in the same way.
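
The ordered acceptance rule, together with the two fallbacks described for the low-confidence case, can be sketched as a small pure function. `ThreadResult` and `select_result` are hypothetical names. The caller is assumed to pass the results already sorted by response time.

```python
from typing import List, NamedTuple, Optional


class ThreadResult(NamedTuple):
    thread_name: str
    text: str
    confidence: float


def select_result(results_in_arrival_order: List[ThreadResult], threshold: float,
                  fall_back_to_best: bool = True) -> Optional[ThreadResult]:
    """Accept the first result (in response-time order) whose confidence exceeds the threshold.

    If none qualifies, return the highest-confidence result, or None to signal that the
    speaker should be asked to repeat the utterance.
    """
    for result in results_in_arrival_order:
        if result.confidence > threshold:
            return result
    if fall_back_to_best and results_in_arrival_order:
        return max(results_in_arrival_order, key=lambda r: r.confidence)
    return None


results = [ThreadResult("acoustic", "volume up", 0.4), ThreadResult("slm", "turn it up", 0.8)]
print(select_result(results, threshold=0.6).thread_name)  # slm
```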

The application program corresponding to the output speech recognition result may then be run (S30).

FIG. 4 is a flowchart of a speech recognition method using multiple threads according to the second embodiment of the present invention.

First, a feature vector is extracted from the speech signal received through the input unit 110 (S110).

Next, each of a plurality of threads that use different acoustic models performs speech recognition on the extracted feature vector.

For example, suppose the threads 131-1, ..., 131-N include a first thread 131-1 and a second thread 131-2. The first thread 131-1 performs speech recognition (decoding) using an acoustic model, and the second thread 131-2 performs speech recognition using a statistical language model (S120).

Next, the method checks whether the first thread 131-1, which uses the acoustic model, produces its recognition result before the second thread 131-2 (S130).

If the first thread 131-1 finishes first, as when a general command is recognized, the confidence of its speech recognition result is evaluated. The evaluation compares the first thread's confidence with a preset threshold, and the thread's result can be used or discarded accordingly (S140).

If step S140 finds that the first thread's confidence is greater than the preset threshold, the first thread's result is output and the second thread 131-2 is forcibly terminated. The first thread's result is reliable enough that, once it is output, the second thread's result is no longer needed (S150).

If step S130 finds that the first thread 131-1 does not finish before the second thread 131-2, as when natural language is recognized, the method waits until the second thread 131-2 finishes and produces its speech recognition result.

Likewise, if the reliability comparison in step S140 shows that the reliability of the first thread 131-1 is less than or equal to the predetermined threshold, the method waits until the second thread 131-2 outputs its speech recognition result (S160).

The reliability of the speech recognition result of the second thread 131-2 is then evaluated. This evaluation checks whether the reliability of the second thread 131-2 is greater than the predetermined threshold. Depending on the outcome, the apparatus may be configured to use or discard that thread's result.

The second thread performs natural language recognition using the statistical language model. Semantic analysis is performed on its speech recognition result before the reliability is evaluated (S170).

If the reliability comparison in step S170 shows that the reliability of the second thread 131-2 is greater than the predetermined threshold, the speech recognition result of the second thread 131-2 is output (S180).

If the reliability comparison in step S170 shows that the reliability of the second thread 131-2 is less than or equal to the predetermined threshold, one option is to ask the speaker to re-enter the speech signal and repeat the above steps on the new input. Alternatively, the reliabilities of the first and second threads may be compared, and the speech recognition result of the more reliable thread is output.

In the second case, the reliability of the first thread is compared with that of the second thread. If the first thread's reliability is greater, the speech recognition result of the first thread 131-1 is output. If it is less than or equal to the second thread's reliability, the speech recognition result of the second thread 131-2 is output (S190).
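The decision sequence of steps S140 to S190 can be summarized as a single function, including the semantic analysis of S170 and the optional re-input request. This is an illustrative sketch only, and several pieces are hypothetical:

- the `Hypothesis` type;
- the `wait_second` callable, which stands for blocking until the second thread finishes;
- the keyword-based `toy_semantic_analysis`.

The source does not specify how semantic analysis changes a result's reliability.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Hypothesis:
    text: str
    reliability: float  # assumed confidence score in [0, 1]


def toy_semantic_analysis(h: Hypothesis) -> Hypothesis:
    """Hypothetical semantic check: halve the score when no known intent
    keyword appears in the recognized text."""
    known_intents = {"call", "play", "navigate"}
    if known_intents & set(h.text.split()):
        return h
    return Hypothesis(h.text, h.reliability * 0.5)


def decide(first: Hypothesis,
           wait_second: Callable[[], Hypothesis],
           first_was_faster: bool,
           threshold: float,
           semantic_analysis: Callable[[Hypothesis], Hypothesis],
           reprompt_on_failure: bool = False) -> Optional[Tuple[str, Hypothesis]]:
    """Return ("first" | "second", hypothesis), or None to ask the
    speaker to repeat the utterance."""
    # S130/S140: accept the first thread only if it answered first and
    # its reliability exceeds the threshold (S150).
    if first_was_faster and first.reliability > threshold:
        return "first", first
    # S160: otherwise block until the second thread has finished.
    second = semantic_analysis(wait_second())  # S170: analyse, then score
    if second.reliability > threshold:
        return "second", second                # S180
    if reprompt_on_failure:
        return None                            # request re-input
    # S190: fall back to the more reliable thread; ties go to the second.
    if first.reliability > second.reliability:
        return "first", first
    return "second", second
```

Passing the second thread as a callable mirrors the flow: `wait_second` is never invoked when the first thread's result is accepted, which corresponds to not waiting for the second thread.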

In this way, when both general commands and natural language can be recognized, speech recognition is performed by a plurality of threads that use different acoustic models. The result of the thread with the faster response time is used. Every other thread that is still running is forcibly terminated, which reduces the load those threads would otherwise impose.
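The same idea extends to more than two threads, as also recited in the claims. Threads are examined in order of response time, the first one whose reliability exceeds the threshold is used, and if none does, the most reliable result is used. A sketch with Python's `concurrent.futures.as_completed`, which yields futures in completion order, might look like this. The recognizer factory `make_recognizer` and its timings are hypothetical, and termination is again cooperative through a shared `threading.Event`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_recognizer(name, seconds, reliability):
    """Hypothetical recognizer that needs `seconds` to decode and polls a
    stop event so that the coordinator can terminate it early."""
    def run(stop):
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            if stop.is_set():
                return None  # terminated before finishing
            time.sleep(0.005)
        return name, reliability
    return run


def recognize_multi(recognizers, threshold):
    stop = threading.Event()
    finished = []
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(rec, stop) for rec in recognizers]
        # as_completed yields results fastest-first, i.e. in response-time order.
        for fut in as_completed(futures):
            result = fut.result()
            if result is None:
                continue
            finished.append(result)
            if result[1] > threshold:
                stop.set()     # stop every thread that is still running
                return result  # leaving `with` waits for them to exit
    # No thread exceeded the threshold: use the most reliable result.
    return max(finished, key=lambda r: r[1])
```

A fast thread that falls below the threshold does not end the search. A slower thread that clears the threshold wins, even if an even slower thread would have scored higher, because that thread is cancelled. This matches the response-time-first ordering described above.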

FIG. 5 is a flowchart illustrating a speech recognition method using multiple threads according to a third embodiment of the present invention.

First, a feature vector is extracted from the speech signal received through the input unit 110 (S210).

Next, speaker recognition (speaker identification and/or speaker verification) is performed based on the extracted feature vector. If the speaker who uttered the speech signal is not already registered in the speaker model database 162, an additional process generates a speaker model for that speaker and registers it in the speaker model database 162. The extracted feature vector is then adapted to the corresponding speaker model stored in the speaker model database 162 (S220).
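Step S220 can be illustrated with a deliberately simple model. The source does not say how speakers are modelled or how the feature vector is adapted, so everything below is an assumption:

- each speaker model is a single mean vector;
- identification picks the nearest model by Euclidean distance;
- verification accepts that model only within `accept_distance`;
- an unknown speaker is enrolled with the current vector as its model;
- adaptation is approximated by per-speaker mean normalization.

`SpeakerModelDB` and `speaker_front_end` are hypothetical names. A practical system would use richer speaker models.

```python
import math


class SpeakerModelDB:
    """Hypothetical stand-in for the speaker model database 162: each
    speaker is modelled by a single mean feature vector."""

    def __init__(self, accept_distance):
        self.accept_distance = accept_distance
        self.models = {}  # speaker id -> mean feature vector

    def identify(self, feat):
        """Identification: pick the nearest registered model.
        Verification: accept it only within `accept_distance`."""
        best, best_d = None, math.inf
        for spk, mean in self.models.items():
            d = math.dist(feat, mean)
            if d < best_d:
                best, best_d = spk, d
        return best if best_d <= self.accept_distance else None

    def enroll(self, feat):
        """Register an unknown speaker, using this vector as the model."""
        spk = f"speaker{len(self.models) + 1}"
        self.models[spk] = list(feat)
        return spk

    def adapt(self, spk, feat):
        """Assumed adaptation: subtract the speaker's mean vector."""
        return [x - m for x, m in zip(feat, self.models[spk])]


def speaker_front_end(db, feat):
    """S220 as sketched here: recognize (or enroll) the speaker, then
    return the speaker id and the adapted feature vector."""
    spk = db.identify(feat)
    if spk is None:  # not a registered speaker
        spk = db.enroll(feat)
    return spk, db.adapt(spk, feat)
```

The adapted vector returned by `speaker_front_end` is what the multi-thread recognition of the following steps would consume.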

Speech recognition is then performed on the feature vector adapted to the speaker model, separately by each of a plurality of threads that use different acoustic models.

The subsequent steps that perform speech recognition on the adapted feature vector (S230 to S300) correspond one-to-one to steps S120 to S190 of the second embodiment. Their content is identical, so their description is omitted.

With the speech recognition apparatus and method using multiple threads of the present invention, general command recognition and natural language recognition can be supported at the same time. Speech recognition is performed by a plurality of threads, each using a different acoustic model, and the result of the thread that finishes processing first is used. This prevents the response-time delay that arises from handling recognition tasks of different kinds and improves user convenience, so the invention has high industrial applicability.

FIG. 1 is a block diagram illustrating a speech recognition apparatus using multiple threads according to a first embodiment of the present invention.

FIG. 2 is a block diagram illustrating a speech recognition apparatus using multiple threads according to a second embodiment of the present invention.

FIG. 4 is a flowchart illustrating a speech recognition method using multiple threads according to the second embodiment of the present invention.

FIG. 5 is a flowchart illustrating a speech recognition method using multiple threads according to a third embodiment of the present invention.

*** Description of Reference Numerals for Main Parts of the Drawings ***

110: input unit
120: feature vector extractor
130: speech recognition server
131: speech recognizer
132: acoustic model database
140: application program unit
150: output unit
160: speaker recognition server
161: speaker recognizer
162: speaker model database
163: speaker model adaptation server

Claims

A first step of extracting a feature vector from an input speech signal;

A second step of performing speech recognition on the extracted feature vector separately through each of a plurality of threads that use different acoustic models; and

A third step of outputting the speech recognition result of the thread, among the plurality of threads, that has the fastest response time for the speech recognition.

The method of claim 1, wherein the different acoustic models are an acoustic model and a statistical language model.

The method of claim 1, wherein the third step comprises:

A first process of comparing the reliability of each thread with a predetermined threshold, taking the threads in order of their response time for the speech recognition, fastest first;

A second process of outputting the speech recognition result of the thread when, as a result of the comparison, its reliability is greater than the predetermined threshold; and

A third process of performing the first process on the thread with the next fastest response time when, as a result of the comparison, the reliability of the thread is less than or equal to the predetermined threshold.

The method of claim 3, further comprising a fourth process of outputting the speech recognition result of the thread having the highest reliability among the plurality of threads when the reliabilities of all the threads are less than or equal to the predetermined threshold.

The method of claim 1, wherein the plurality of threads using the different acoustic models comprise:

A first thread that performs speech recognition on the extracted feature vector using an acoustic model; and

A second thread that performs speech recognition on the extracted feature vector using a statistical language model.

The method of claim 5, wherein the first thread performs speech recognition for general commands.

The method of claim 5, wherein the second thread performs natural language recognition.

The method of claim 5, wherein the third step comprises:

A first process of comparing the reliability of the first thread with a preset threshold when the response time of the first thread is faster than the response time of the second thread; and

A second process of outputting the speech recognition result of the first thread and stopping the speech recognition process of the second thread when the reliability of the first thread is greater than the preset threshold.

The method of claim 8, further comprising a third process of comparing the reliability of the second thread with the predetermined threshold when the reliability of the first thread is less than or equal to the predetermined threshold.

The method of claim 9, wherein semantic analysis is performed on the speech recognition result of the second thread before the reliability of the second thread is compared with the predetermined threshold.

The method of claim 9, further comprising:

A fourth process of outputting the speech recognition result of the second thread when, as a result of the comparison in the third process, the reliability of the second thread is greater than the predetermined threshold;

A fifth process of comparing the reliability of the first thread with the reliability of the second thread when the reliability of the second thread is less than or equal to the predetermined threshold;

A sixth process of outputting the speech recognition result of the first thread when, as a result of the comparison in the fifth process, the reliability of the first thread is greater than the reliability of the second thread; and

A seventh process of outputting the speech recognition result of the second thread when, as a result of the comparison in the fifth process, the reliability of the first thread is less than or equal to the reliability of the second thread.

The method of claim 1, further comprising a step of operating an application program according to the output speech recognition result.

A first step of extracting a feature vector from an input speech signal;

A second step of performing speaker recognition based on the extracted feature vector;

A third step of performing, after the speaker recognition, speech recognition on the extracted feature vector separately through each of a plurality of threads that use different acoustic models; and

A fourth step of outputting the speech recognition result of the thread, among the plurality of threads, that has the fastest response time for the speech recognition.

The method of claim 13, wherein the second step performs speaker identification using the extracted feature vector.

The method of claim 13, wherein the second step performs speaker verification using the extracted feature vector.

The method of claim 13, wherein the second step comprises:

Generating a new speaker model based on the extracted feature vector when the speaker recognition result indicates that the speaker is not registered; and

Performing speaker recognition using the generated speaker model.

The method of claim 13, wherein the plurality of threads using different acoustic models,

The method of claim 17, wherein the fourth step,

The method of claim 18,

The method of claim 19,

A feature vector extractor extracting a feature vector from an input voice signal;

And a speech recognition server that performs speech recognition on the extracted feature vector through a plurality of threads using different acoustic models.

The apparatus of claim 22, wherein the speech recognition server,

The apparatus of claim 23, wherein the first thread performs speech recognition for general commands.

The apparatus of claim 23, wherein the second thread performs natural language recognition.

The apparatus of claim 23, wherein the speech recognition server outputs the speech recognition result of the first thread and stops the operation of the second thread when the response time of the first thread is faster than that of the second thread and the reliability of the first thread is greater than a preset threshold.

The apparatus of claim 23, wherein the speech recognition server performs semantic analysis on the speech recognition result of the second thread when the response time of the first thread is faster than that of the second thread and the reliability of the first thread is less than or equal to a preset threshold.

The apparatus of claim 27, wherein the speech recognition server outputs the speech recognition result of the second thread when the reliability of the second thread is greater than the preset threshold.

The apparatus of claim 27, wherein, when the reliability of the second thread is less than or equal to the preset threshold, the speech recognition server compares the reliability of the first thread with that of the second thread and outputs the speech recognition result of whichever thread has the higher reliability.

A speaker recognition server that performs speaker recognition based on the extracted feature vector;

And a speech recognition server configured to perform speech recognition on the extracted feature vector through a plurality of threads using different acoustic models after the speaker recognition is normally performed.

The apparatus of claim 30, wherein the speech recognition server,

The apparatus of claim 31, wherein the speech recognition server outputs the speech recognition result of the first thread and stops the operation of the second thread when the response time of the first thread is faster than that of the second thread and the reliability of the first thread is greater than a preset threshold.

The apparatus of claim 31, wherein the speech recognition server,

The apparatus of claim 33, wherein, when the reliability of the second thread is less than or equal to the preset threshold, the speech recognition server compares the reliability of the first thread with that of the second thread and outputs the speech recognition result of whichever thread has the higher reliability.