KR0136426B1

KR0136426B1 - Speech recognition method in a hidden Markov modeling (HMM) speech recognition system

Info

Publication number: KR0136426B1
Application number: KR1019950001401A
Authority: KR
Inventors: 구명완
Original assignee: 조백제; 한국전기통신공사
Priority date: 1995-01-26
Filing date: 1995-01-26
Publication date: 1998-05-15
Also published as: KR960030078A

Abstract

The present invention relates to a speech recognition method for a hidden Markov modeling (HMM) speech recognition system that reduces repeated computation when implementing the Viterbi algorithm, which is essential to the speech recognition process. To provide a speech recognition method that reduces the amount of Viterbi computation by performing the subword first calculation and the second calculation separately, the method comprises: a first step (401 to 404) of determining, after initialization, whether the current frame is the last frame, outputting the recognition result if it is, and otherwise performing the Viterbi first calculation in units of subwords; and a second step (405, 406) of performing the Viterbi second calculation in units of words to obtain the Viterbi values, then performing language processing, and repeating the last-frame determination of the first step (401 to 404). The method markedly reduces the amount of Viterbi computation, so that speech can be recognized in real time.

Description

Speech recognition method in a hidden Markov modeling (HMM) speech recognition system

FIG. 1 is an example of representing words with subword units, as used in a conventional speech recognition method.

FIG. 2 is a flowchart of a conventional speech recognition method.

FIG. 3 is a block diagram of an HMM speech recognition system to which the present invention is applied.

FIG. 4 is a flowchart of the speech recognition method according to the present invention.

FIG. 5 is a detailed flowchart of the Viterbi first calculation method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

301: feature extraction unit

302: word recognition unit

303: word modeling unit

304: subword model

305: pronunciation dictionary

306: sentence recognition unit

309: language model

The present invention relates to a speech recognition method for a hidden Markov modeling (HMM) speech recognition system that uses subwords, such as allophones or phonemes, as the basic HMM unit. The method reduces repeated computation when implementing the Viterbi algorithm, which is essential to the speech recognition process.

FIG. 1 shows an example of representing words with subword units, as used in a conventional speech recognition method.

For example, to recognize the three words 감다, 간다, and 같다, the words must first be represented as sequences of subword units. Part (a) shows the three words in a linear structure, and part (b) shows them in a tree structure.

When words represented in a linear structure are recognized, the word 감다 consists of five subword units, so it requires the Viterbi computation of five subword units. Applying the same reasoning to all three words gives 15 subword-unit Viterbi computations. In the tree structure, by contrast, the subword units ㄱ and ㅏ are shared, so recognizing the three words requires 11 subword-unit Viterbi computations.
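The unit counts above can be checked with a short sketch. The jamo decomposition of the three words and the names `LEXICON`, `linear_units`, and `tree_units` are illustrative assumptions, not part of the patent.

```python
# Illustrative lexicon (an assumption): each word written as its sequence
# of subword units, here jamo, following the example of FIG. 1.
LEXICON = {
    "감다": ["ㄱ", "ㅏ", "ㅁ", "ㄷ", "ㅏ"],
    "간다": ["ㄱ", "ㅏ", "ㄴ", "ㄷ", "ㅏ"],
    "같다": ["ㄱ", "ㅏ", "ㅌ", "ㄷ", "ㅏ"],
}


def linear_units(lexicon):
    """Linear structure: every word carries its own copy of each unit."""
    return sum(len(units) for units in lexicon.values())


def tree_units(lexicon):
    """Tree structure: words share units along a common prefix.

    Each distinct non-empty prefix corresponds to one tree node,
    so counting distinct prefixes counts the nodes.
    """
    prefixes = set()
    for units in lexicon.values():
        for end in range(1, len(units) + 1):
            prefixes.add(tuple(units[:end]))
    return len(prefixes)


print(linear_units(LEXICON))  # 15
print(tree_units(LEXICON))    # 11
```

Only the shared prefix ㄱ, ㅏ is merged in the tree, which is why the trailing ㄷ, ㅏ units still appear once per word.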

The tree structure thus requires less computation than the linear structure. In continuous speech recognition, however, the word currently being recognized is known only after the Viterbi computation of the last subword unit of each word has finished, which makes it difficult to add grammatical information.

In addition, as shown in FIG. 1(b), the subword units ㄷ and ㅏ are still computed repeatedly.

FIG. 2 is a flowchart of a conventional speech recognition method. In the conventional recognition process, the first and second calculations are performed together when the Viterbi value is obtained for every possible candidate word in each frame.

To reduce repeated computation when implementing the Viterbi algorithm, which is essential to recognition, such a conventional HMM speech recognition system represents the candidate words as a tree built from the basic HMM units.

However, with a tree structure it is difficult to apply the grammar rules needed to implement a continuous speech recognition system, and the data structure required to represent the tree is complicated.

Accordingly, the present invention, devised to solve the above problems, aims to provide a speech recognition method that reduces the amount of Viterbi computation by performing the subword first calculation and the second calculation separately, without using a tree structure.

To achieve the above object, the present invention provides a method applied to a speech recognition system comprising: feature extraction means for receiving speech and extracting features; word modeling means for modeling words with a subword model according to the information in a pronunciation dictionary; word recognition means for receiving the speech features from the feature extraction means and the word model information from the word modeling means and recognizing words by performing Viterbi computation; and sentence recognition means for receiving the output of the word recognition means and recognizing sentences according to the information of a language model. The method comprises: a first step of determining, after initialization, whether the current frame is the last frame, outputting the recognition result if it is, and otherwise performing the Viterbi first calculation in units of subwords; and a second step of, after the first step, performing the Viterbi second calculation in units of words to obtain the Viterbi values, then performing language processing, and repeating the last-frame determination of the first step.

An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 3 is a block diagram of an HMM speech recognition system to which the present invention is applied.

First, when speech is input to the HMM speech recognition system, the feature extraction unit (301) extracts the characteristic features of the speech and outputs them to the word recognition unit (302). The word recognition unit (302) recognizes words by performing Viterbi computation using these features and the word model information from the word modeling unit (303). So that words can be recognized easily, the word modeling unit (303) represents each word using the subword model (304) according to the information in the pronunciation dictionary (305). Typical representations are the linear structure and the tree structure. The subword model (304) determines the subword units.

Once the word recognition unit (302) has recognized a word, the sentence recognition unit (306) recognizes the sentence via (307). At this point, the information of the language model (309) is used to select the words that can follow the word previously obtained by the word recognition unit (302). If a word does not conform to the grammar, control returns to the word recognition unit (302) via (308), the next most likely word is selected, and the sentence recognition unit (306) again recognizes the sentence via (307). Without elements (306) to (309), the system is an isolated word recognition system; with them, it is a continuous word recognition system.
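The fallback between the word recognition unit and the sentence recognition unit can be sketched as follows. This is a minimal illustration only: the function name `pick_next_word`, the set of allowed word pairs standing in for the language model (309), and the English example words are all assumptions, since the patent does not specify how the language model is represented.

```python
def pick_next_word(ranked_words, previous_word, allowed_pairs):
    """Return the best-scoring word that may follow previous_word.

    ranked_words: candidate words from the word recognizer, best first.
    allowed_pairs: set of (previous, next) pairs, a stand-in for the
        language model (assumed representation).
    """
    for word in ranked_words:                 # via (307): try a candidate
        if (previous_word, word) in allowed_pairs:
            return word
        # via (308): rejected, fall back to the next most likely word
    return None


allowed = {("we", "go"), ("we", "wind")}
print(pick_next_word(["same", "go", "wind"], "we", allowed))  # go
```

Here the top-ranked candidate is rejected by the grammar, so the second candidate is returned instead.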

FIG. 4 is a flowchart of the speech recognition method according to the present invention.

The key concept of the Viterbi calculation scheme according to the present invention is as follows.

$$
\begin{aligned}
\delta_i^{sub}(t) &= \max_j \left( \delta_j^{sub}(t-1) + \log \alpha_{ji}^{sub} + \log b_{ji}^{sub}(O_t) \right) \\
&= \max_j \left( \delta_j^{sub}(t-1) + \mathrm{First\_cal}_{ji}^{sub}(O_t) \right) \\
&= \mathrm{Sec\_cal}_{ji}^{sub}(O_t)
\end{aligned}
$$

(Here $\delta_i^{sub}(t)$ is the Viterbi value for subword sub at frame t in state i; $b_{ji}^{sub}(O_t)$ is the observation probability of the speech feature $O_t$ at frame t on the transition from state j to state i; and $\alpha_{ji}^{sub}$ is the probability of the transition from state j to state i.)

The most time-consuming part of the Viterbi calculation is $\log b_{ji}^{sub}(O_t)$. The Viterbi calculation is performed subword by subword, and to reduce computation each subword's Viterbi calculation is divided into a first calculation and a second calculation. The first calculation is $\mathrm{First\_cal}_{ji}^{sub}(O_t)$ above; it depends only on the state transition ji of the subword and the frame value $O_t$, and not on the word that the subword belongs to. The second calculation adds the first-calculation value to the Viterbi value $\delta_j^{sub}(t-1)$ obtained at the previous frame t-1; that previous Viterbi value does depend on the word that the current subword belongs to.

Therefore, the Viterbi value at each frame t is computed in two parts: a first calculation that is unaffected by the candidate words and a second calculation that depends on them. Computed this way, even if a subword belongs to n candidate words, its first calculation needs to be performed only once, so the first-calculation workload, previously n evaluations, is reduced to 1/n.
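The split can be illustrated with a minimal sketch in the log domain. It assumes the maximum is taken over the predecessor state j, as the symbol definitions suggest, and uses a toy unit-variance Gaussian for the observation probability. The model values and the names `first_cal`, `second_cal`, and `MODEL_A` are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean):
    """Log density of a unit-variance Gaussian (toy observation model)."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def first_cal(model, o_t):
    """First calculation: log a_ji + log b_ji(O_t) for every arc (j, i).

    Depends only on the subword model and the current frame value.
    """
    return {
        (j, i): log_a + log_gauss(o_t, model["mean"][(j, i)])
        for (j, i), log_a in model["log_a"].items()
    }


def second_cal(prev_delta, first, n_states):
    """Second calculation: delta_i(t) = max_j(delta_j(t-1) + first[(j, i)])."""
    new_delta = [NEG_INF] * n_states
    for (j, i), score in first.items():
        new_delta[i] = max(new_delta[i], prev_delta[j] + score)
    return new_delta


# Toy two-state, left-to-right model for one subword (all values assumed).
MODEL_A = {
    "log_a": {(0, 0): math.log(0.6), (0, 1): math.log(0.4), (1, 1): 0.0},
    "mean": {(0, 0): 0.0, (0, 1): 1.0, (1, 1): 1.0},
}

# The same subword appears in three candidate words, each with its own
# Viterbi values from frame t-1.
prev_deltas = {
    "word1": [-1.0, -3.0],
    "word2": [-2.0, -2.5],
    "word3": [-4.0, -1.5],
}
o_t = 0.8

shared = first_cal(MODEL_A, o_t)  # computed once for this frame
split = {w: second_cal(d, shared, 2) for w, d in prev_deltas.items()}

# Conventional approach: redo the first calculation for every word.
naive = {w: second_cal(d, first_cal(MODEL_A, o_t), 2)
         for w, d in prev_deltas.items()}

print(split == naive)  # True
```

Both approaches give the same Viterbi values; the split version evaluates the observation probabilities once instead of once per word.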

Thus, in the recognition process according to the present invention, the Viterbi first calculation is performed for every possible subword in each frame, and the second calculation is performed for every candidate word to obtain the final Viterbi values. For example, under the conventional method with the linear structure of FIG. 1, the Viterbi first calculation is repeated three times for ㄱ and six times for ㅏ. With the tree structure, of the two only ㅏ is repeated, four times. Under the present invention, by contrast, the Viterbi first calculation is performed once each for ㄱ and ㅏ with the linear structure, and the first calculation for ㅏ is likewise performed only once with the tree structure. The amount of computation is therefore greatly reduced.
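These repetition counts can be reproduced with a short sketch. The jamo decomposition and the variable names are illustrative assumptions. The count for the proposed method further assumes that a subword's first calculation is the same wherever the subword occurs, in line with the statement that it does not depend on the word.

```python
from collections import Counter

# Illustrative lexicon (an assumption), following the example of FIG. 1.
LEXICON = {
    "감다": ["ㄱ", "ㅏ", "ㅁ", "ㄷ", "ㅏ"],
    "간다": ["ㄱ", "ㅏ", "ㄴ", "ㄷ", "ㅏ"],
    "같다": ["ㄱ", "ㅏ", "ㅌ", "ㄷ", "ㅏ"],
}

# Conventional linear structure: one first calculation per unit occurrence.
linear = Counter(u for units in LEXICON.values() for u in units)

# Conventional tree structure: one first calculation per tree node,
# where each distinct prefix is one node labelled by its last unit.
nodes = {
    tuple(units[:end])
    for units in LEXICON.values()
    for end in range(1, len(units) + 1)
}
tree = Counter(prefix[-1] for prefix in nodes)

# Proposed method: one first calculation per distinct subword.
proposed = Counter({u for units in LEXICON.values() for u in units})

print(linear["ㄱ"], linear["ㅏ"])      # 3 6
print(tree["ㄱ"], tree["ㅏ"])          # 1 4
print(proposed["ㄱ"], proposed["ㅏ"])  # 1 1
print(sum(linear.values()), sum(tree.values()), sum(proposed.values()))  # 15 11 6
```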

In operation, the HMM speech recognition system is initialized (401), and it is then determined whether the current frame is the last frame (402). If it is, the recognition result is output (403). If it is not, the Viterbi first calculation is performed for every possible subword in the current frame (404), the second calculation is performed for every possible candidate word in the current frame to obtain the Viterbi values (405), language processing is performed (406), and the last-frame determination (402) is repeated.
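The loop of steps 401 to 406 can be sketched as a toy isolated-word recognizer. Everything concrete here is an assumption made for illustration: each subword is reduced to a single state with a self-loop arc and an entry arc, observations are scalar with a unit-variance Gaussian, and the language processing step is a pass-through hook. The names `recognize`, `LEXICON`, and `MODELS` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean):
    """Log density of a unit-variance Gaussian (toy observation model)."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def first_calculation(frame, subwords, models):
    """Step 404: per-subword scores for this frame, shared by all words."""
    scores = {}
    for sub in subwords:
        log_b = log_gauss(frame, models[sub]["mean"])
        scores[sub] = (models[sub]["log_stay"] + log_b,   # self-loop arc
                       models[sub]["log_enter"] + log_b)  # entry arc
    return scores


def second_calculation(delta, units, scores):
    """Step 405: update one word's Viterbi values from the shared scores."""
    new = []
    for k, sub in enumerate(units):
        stay, enter = scores[sub]
        best = delta[k] + stay
        if k > 0:
            best = max(best, delta[k - 1] + enter)
        new.append(best)
    return new


def recognize(frames, lexicon, models, language_step=lambda deltas: deltas):
    # Step 401: initialization. Every word starts in its first subword.
    deltas = {w: [0.0] + [NEG_INF] * (len(u) - 1) for w, u in lexicon.items()}
    subwords = {s for units in lexicon.values() for s in units}
    for frame in frames:  # step 402: the loop ends after the last frame
        scores = first_calculation(frame, subwords, models)      # step 404
        deltas = {w: second_calculation(deltas[w], lexicon[w], scores)
                  for w in lexicon}                              # step 405
        deltas = language_step(deltas)                           # step 406
    # Step 403: output the word whose final state scores best.
    return max(deltas, key=lambda w: deltas[w][-1])


# Toy lexicon and models (all values assumed).
LEXICON = {"ab": ["a", "b"], "ac": ["a", "c"]}
MODELS = {
    "a": {"mean": 0.0, "log_stay": math.log(0.5), "log_enter": math.log(0.5)},
    "b": {"mean": 1.0, "log_stay": math.log(0.5), "log_enter": math.log(0.5)},
    "c": {"mean": -1.0, "log_stay": math.log(0.5), "log_enter": math.log(0.5)},
}

print(recognize([0.1, 0.0, 0.9, 1.1], LEXICON, MODELS))  # ab
```

The subword "a" is scored once per frame in `first_calculation` even though both words contain it; only `second_calculation` runs per word.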

FIG. 5 is a detailed flowchart of the Viterbi first calculation method according to the present invention, showing how all possible subwords in the current frame are found from all possible candidate words in the current frame. Because every candidate word in the current frame is represented as a sequence of subwords, before the first calculation of a subword is performed it is checked whether that first calculation has already been done. If it has, the same procedure continues with the next subword in the sequence; if it has not, the first calculation is performed from the output value of the current frame and the result is stored with the current subword. Whether the first calculation has been performed is tracked by a flag set for each subword.

In detail, the first candidate word among the candidate words for the current frame is obtained (501), the first subword of the current candidate word is obtained (502), and the subword's Viterbi first-calculation flag is checked (503). If the first calculation has already been performed, the next subword is obtained (504) and the flag check is repeated.

If the first calculation has not been performed, the Viterbi first calculation is performed for the current subword and the result is stored (505), the first-calculation completion flag of the current subword is set (506), and it is determined whether this is the last subword (507). If it is not the last subword, the next subword is obtained (504) and the flag check (503) is repeated.

If it is the last subword, it is determined whether the current word is the last candidate word (508). If it is, the procedure ends; if not, the next candidate word is selected (509) and the process of obtaining the first subword from the candidate word (502) is repeated.
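The flag-based pass of steps 501 to 509 can be sketched as below. The function name `first_calculation_pass`, the `compute` callback, and the placeholder score are hypothetical. The sketch also assumes the flags are created fresh for each frame, which the text does not state explicitly.

```python
def first_calculation_pass(candidates, lexicon, compute):
    """Compute each subword's first calculation once for the current frame.

    compute(sub) is a hypothetical callback that returns the
    first-calculation value of a subword for the current frame.
    """
    flags = {}   # subword -> True once its first calculation is done
    stored = {}  # subword -> stored first-calculation value
    for word in candidates:              # 501, 508, 509: walk candidate words
        for sub in lexicon[word]:        # 502, 504, 507: walk the subwords
            if flags.get(sub, False):    # 503: flag already set, skip
                continue
            stored[sub] = compute(sub)   # 505: compute and store
            flags[sub] = True            # 506: set the completion flag
    return stored


# Illustrative lexicon (an assumption), following the example of FIG. 1.
LEXICON = {
    "감다": ["ㄱ", "ㅏ", "ㅁ", "ㄷ", "ㅏ"],
    "간다": ["ㄱ", "ㅏ", "ㄴ", "ㄷ", "ㅏ"],
    "같다": ["ㄱ", "ㅏ", "ㅌ", "ㄷ", "ㅏ"],
}

calls = []


def compute(sub):
    calls.append(sub)  # record every actual first calculation
    return 0.0         # placeholder score


stored = first_calculation_pass(["감다", "간다", "같다"], LEXICON, compute)
print(len(calls))  # 6
```

Fifteen subword occurrences are visited, but only six first calculations are actually carried out.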

As described above, the present invention markedly reduces the amount of Viterbi computation, making it possible to recognize speech in real time.

Claims

1. A method applied to a speech recognition system comprising: feature extraction means (301) for receiving speech and extracting features; word modeling means (303) for modeling words using a subword model (304) according to the information in a pronunciation dictionary (305); word recognition means (302) for receiving the speech features from the feature extraction means (301) and the word model information from the word modeling means (303) and recognizing words by performing Viterbi computation; and sentence recognition means (306) for receiving the output of the word recognition means (302) and recognizing sentences according to the information of a language model (309), the method comprising:

a first step (401 to 404) of determining, after initialization, whether the current frame is the last frame, outputting the recognition result if it is, and otherwise performing the Viterbi first calculation in units of subwords; and

a second step (405, 406) of, after the first step (401 to 404), performing the Viterbi second calculation in units of words to obtain the Viterbi values, then performing language processing, and repeating the last-frame determination of the first step (401 to 404).

2. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 1, wherein the Viterbi first calculation is affected only by the speech feature output value ($O_t$) of each frame t and the corresponding subword (sub).

3. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 1, wherein the Viterbi first calculation of the first step (401 to 404) computes

$$\log \alpha_{ji}^{sub} + \log b_{ji}^{sub}(O_t)$$

(subword sub, frame t, state transition ji; $b_{ji}^{sub}(O_t)$: observation probability of the speech feature $O_t$ at frame t on the transition from state j to state i; $\alpha_{ji}^{sub}$: probability of the transition from state j to state i).

4. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 2, wherein the Viterbi second calculation of the second step (405, 406) adds the Viterbi value of the previous frame to the result of the Viterbi first calculation.

5. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 1, wherein the Viterbi second calculation of the second step (405, 406) computes

$$\max_j \left( \delta_j^{sub}(t-1) + \mathrm{First\_cal}_{ji}^{sub}(O_t) \right)$$

($\delta_i^{sub}(t)$: Viterbi value for subword sub at frame t in state i; ji: state transition; $\mathrm{First\_cal}_{ji}^{sub}(O_t)$: result of the Viterbi first calculation).

6. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 1, wherein the Viterbi first calculation comprises:

a third step (501) of obtaining the first candidate word among the candidate words for the current frame;

a fourth step (502 to 506) of, after the third step (501), performing the first calculation for every possible subword of the current candidate word from the output value of the current frame, storing the result with the current subword, and setting the Viterbi first-calculation completion flag; and

a fifth step (508, 509) of, after the fourth step (502 to 506), repeating the fourth step (502 to 506) up to the last candidate word.

7. The speech recognition method in a hidden Markov modeling (HMM) speech recognition system of claim 6, wherein the fourth step (502 to 506) comprises:

a sixth step (502, 503) of obtaining the first subword from the current candidate word and then checking the subword's Viterbi first-calculation flag;

a seventh step (504) of, after the sixth step (502, 503), obtaining the next subword and repeating the flag check of the sixth step (502, 503);

an eighth step (505 to 507) of, after the sixth step (502, 503), if the first calculation has not been performed, performing the Viterbi first calculation for the current subword, storing the result with the current subword, setting the Viterbi first-calculation completion flag of the current subword, and then determining whether the current subword is the last subword; and

a ninth step (504) of, after the eighth step (505 to 507), if the current subword is not the last subword, obtaining the next subword and repeating the flag check of the sixth step (502, 503).