KR100269429B1

KR100269429B1 - Transient voice determining method in voice recognition

Info

Publication number: KR100269429B1
Application number: KR1019980002593A
Authority: KR
Inventors: 이석필
Original assignee: 전주범; 대우전자주식회사
Priority date: 1998-01-30
Filing date: 1998-01-30
Publication date: 2000-10-16
Also published as: KR19990066557A

Abstract

PURPOSE: A method for identifying speech in a transition section during speech recognition is provided to reduce recognition errors in transition sections by correcting a weakness of the recognition decision algorithm. CONSTITUTION: A pattern matching algorithm is performed using an input pattern obtained from the feature parameters of the current block and the stored reference pattern of each class. When the current block is the first block of a small section, a decision algorithm is performed and its recognition result is output. For every other block, whether the current block is a transition section is determined by comparing the differences between the distances to each class. If the current block is a transition section, the recognition result of the previous block is output as the recognition result of the current block. If the current block is not a transition section, the recognition result of the current block is output as it is. These steps are repeated up to the last block.

Description

Transient voice determining method in voice recognition

The present invention relates to speech recognition and, more particularly, to a transition-section speech identification method that determines the pattern for speech lying in a transition section during the recognition step, in which an input speech signal is compared with reference speech patterns to obtain a similar pattern.

FIG. 1 is a block diagram of a general speech recognition system. Speech recognition is carried out by a speech input unit (10) that receives the input speech to be recognized through a microphone and converts it into a digital signal; a feature analysis unit (11) that analyzes the speech signal obtained by the speech input unit (10), extracts the feature parameters needed for recognition, and generates a test pattern for the input speech; a pattern recognition unit (12) that performs a pattern matching algorithm by comparing the test pattern obtained from the feature analysis unit (11) with the reference patterns stored in advance in a reference pattern storage unit (13); and a recognition decision unit (14) that determines the pattern with the highest similarity among the plurality of candidate reference patterns matched by the pattern recognition unit (12).

The speech analysis process for the input speech is described in more detail with reference to FIG. 2. The input speech is passed through a low-pass filter (LPF, 20) to reduce the aliasing effect of sampling, then sampled and quantized by an A/D converter (21) to obtain speech samples, and pre-emphasis (22) is applied so that the signal has a flat spectral characteristic. To analyze speech that is continuous in time, the samples are divided into frames of a fixed length of N samples (23), and a window function (24) is applied to each frame while advancing M samples at a time through the consecutive speech samples, which removes the abrupt changes at both ends of the analysis interval. In other words, a window function of the same length as the frame (N samples) is slid M samples at a time so that (N-M) samples overlap between temporally adjacent windows, which preserves the temporal continuity of the speech. The windowed signal undergoes frequency analysis (25), such as FFT analysis, and feature parameters that characterize the speech are extracted (26) to generate a test pattern for the input speech signal.
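The framing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the pre-emphasis coefficient 0.95, and the choice of a Hamming window for the generic "window function" are assumptions made here.

```python
import math

def preemphasize(samples, alpha=0.95):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1].
    alpha = 0.95 is an assumed, commonly used value; the text gives none."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

def hamming(length):
    """Hamming window of the given length (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]

def frame_signal(samples, frame_len, hop):
    """Cut samples into windowed frames of frame_len (N) samples,
    advancing hop (M) samples each time, so that adjacent frames
    share frame_len - hop (N-M) samples."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

For example, `frame_signal(preemphasize(samples), 256, 128)` would give frames of N = 256 samples that overlap by N-M = 128 samples; both numbers are illustrative only.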

Speech feature parameters include the zero crossing rate, the frequency spectrum, formant frequencies, autocorrelation coefficients, the cepstrum, linear prediction coefficients (LPC), partial autocorrelation coefficients (PARCOR), and the log area ratio. A system may be built on just one of these, but several feature parameters can be combined as needed to raise recognition efficiency.

The pattern recognition process uses the feature parameters extracted through speech analysis to find the linguistic expression that best matches the input speech. A basic unit of speech for pattern recognition is chosen and a reference pattern is stored for each unit; for an unknown input speech pattern, the most similar of the stored reference patterns are selected and taken as the recognized candidate words (phonemes).

Here, the speech pattern recognition unit may be a word, a phoneme, a syllable, and so on, depending on the system. Word units give the best performance because they include coarticulation effects, but they require a large vocabulary and therefore considerable memory and processing time. Phoneme units make the system simple to implement because the number of phonemes is small and limited, but performance degrades because of coarticulation. When recognizing in phoneme units, recognition is sometimes performed frame by frame, or the speech is segmented into phonemes using a spectral transition measure, which indicates the degree of change in the frequency spectrum.

The most basic method of pattern recognition, so-called template matching, directly compares the time sequence of an unknown input feature pattern with the time sequence of the reference pattern of a stored speech unit. To compensate for differences in speaking rate and duration, a dynamic programming technique usually called DTW (dynamic time warping) is used to align the stored reference speech and the unknown input speech nonlinearly, and the input is recognized as the speech of the reference pattern with the highest similarity. DTW has the drawbacks of a heavy computational load during recognition and of being hard to extend to continuous speech recognition, but it offers a high recognition rate and an easily implemented recognition system, so it suits applications with a small target vocabulary.
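As an illustration of the nonlinear alignment, a textbook form of DTW can be sketched as below. The step pattern (match, insertion, deletion with equal weight) and the function names are assumptions; the text does not specify which DTW variant is used.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.
    cost[i][j] holds the cheapest alignment of the first i vectors of
    seq_a with the first j vectors of seq_b."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Because a vector may be matched to several consecutive vectors of the other sequence, an utterance spoken more slowly than its reference can still reach a low accumulated cost. The double loop also makes the computational cost noted above visible: it grows with the product of the two sequence lengths for every reference pattern compared.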

In particular, because not every pattern can be examined in advance in speech recognition, accurately modeling the reference patterns from training data is essential for improving the recognition rate. This requires a clustering process that, given a distribution of patterns, partitions it into several classes whose members are considered to be of the same kind. That is, clustering determines the number of clusters and decides which cluster each pattern belongs to. For example, once the number of clusters has been determined, each class has an arbitrary representative pattern; each training pattern is assigned to the class whose representative pattern is at the minimum distance from it, and then the central pattern among the patterns of each class is determined and taken as the reference pattern used for comparison.
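One pass of this assignment and center selection might look like the sketch below. It assumes fixed-length feature vectors and Euclidean distance, and it reads "central pattern" as the class member with the smallest total distance to the other members (a medoid); the text does not say whether a member or an average is meant, and all names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_to_classes(patterns, representatives):
    """Put each training pattern into the class whose representative
    pattern is nearest to it."""
    classes = [[] for _ in representatives]
    for pattern in patterns:
        nearest = min(range(len(representatives)),
                      key=lambda k: euclidean(pattern, representatives[k]))
        classes[nearest].append(pattern)
    return classes

def central_pattern(members):
    """Member with the smallest summed distance to all members
    (assumed interpretation of the 'central pattern')."""
    return min(members,
               key=lambda c: sum(euclidean(c, other) for other in members))

def build_reference_patterns(patterns, representatives):
    """Return one reference pattern per class; an empty class keeps
    its initial representative."""
    classes = assign_to_classes(patterns, representatives)
    return [central_pattern(members) if members else rep
            for members, rep in zip(classes, representatives)]
```

In practice the assignment and center steps would typically be repeated until the classes stop changing; a single pass is shown to mirror the description.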

The recognition decision algorithm, which determines the similarity between the reference patterns obtained through this clustering process and the input pattern to be recognized, compares the pattern-matched distance data class by class, selects the class with the smallest distance, and takes the reference pattern of that class as the recognition result.

However, the patterns belonging to each class are statistically distributed, so it is difficult to draw a clear boundary between classes. That is, the tails of the two distributions overlap at a class boundary, so it is hard to decide which side a pattern lying on the boundary belongs to, and the resulting misrecognition is unavoidable.

In particular, for part of a transition section of time-series speech there is no way of knowing which class region it belongs to, and it may carry characteristics of both classes. Therefore, when speech recognition in a block obtained by dividing the signal into small sections is ambiguous, for example when the pattern-matched minimum distance is the same for two or more classes, as in an ambiguous part of a transition section, misrecognition can occur.

The present invention was devised to solve the above problem. Its object is to provide a speech identification method that improves the recognition rate in transition sections: when the current speech signal, input as a time series, lies in a transition section, the recognition result for the current speech pattern is replaced with the previous recognition pattern that was already determined at the preceding point in time.

To achieve this object, the present invention comprises the steps of: performing a pattern matching algorithm using an input pattern, obtained by extracting the feature parameters of the current block of a time-series speech signal divided into small sections, and the previously stored reference pattern of each class; when the current block is the first block of a small section, performing a decision algorithm on the pattern matching result and outputting a recognition result; when the current block is any block other than the first block of the small section, determining whether the current block is a transition section by comparing the differences between the distances from the input pattern to each class obtained from the pattern matching result; when the current block is a transition section, outputting the recognition result obtained for the previous block in place of the recognition result of the current block, and when the current block is not a transition section, outputting the recognition result of the current block as it is; and repeating the steps from pattern matching through recognition result output up to the last block of the time-series input speech.

FIG. 1 is a block diagram of a general speech recognition system;

FIG. 2 is a diagram illustrating the speech analysis process for input speech; and

FIG. 3 is a flowchart of the speech identification algorithm for transition sections according to the present invention.

Speech recognition can be broadly divided into four stages: input speech processing, feature extraction, similarity measurement (pattern matching), and recognized-word decision. The present invention relates to the recognized-word decision stage.

The input speech processing stage receives and stores the speech signal to be recognized, detects its start and end points, and removes the portions that are not signal. The feature extraction stage partitions the speech waveform, which represents the amplitude of the speech signal over time, into shorter intervals (small sections) to form speech frames, extracts feature parameters such as the frequency spectrum and LPC for each interval, and also obtains feature parameters such as the zero crossing rate, average energy, and autocorrelation coefficients within the frame. The similarity measurement stage compares the feature parameters stored as reference patterns with the feature parameters given as the input pattern, obtained by processing the input speech signal, and measures and quantifies their similarity; methods include template matching, the probabilistic hidden Markov model, and neural network matching, which has a learning capability. The recognized-word decision stage selects the phoneme, word, or sentence that best fits the input speech.

An embodiment of the present invention is described below with reference to the accompanying drawings.

This embodiment presents a method for reducing misrecognition in cases where the recognition result is ambiguous under the conventional recognized-word decision scheme, which applies template pattern matching and selects the class with the minimum distance as the recognition result. The recognition result is ambiguous when, for example, two or more classes share the minimum distance to the input pattern, or the differences between the distances from the input pattern to the classes are so small that the classes cannot be clearly distinguished. Such cases often arise in transition sections, and because transition sections carry important information in speech recognition, handling them properly is quite important.

First, the variables in the flowchart of FIG. 3 are defined as follows: N is the total number of blocks in the input speech signal to be recognized, a_i is the feature parameter of the i-th block, and A_i is the reference pattern matched to a_i.

To recognize the input speech signal in order from the 1st block to the N-th block, the block index is initialized to i=1 in step (S1), and the feature parameter a_i of the i-th block is obtained in step (S2). That is, the time-continuous input speech signal is processed block by block; to extract the feature parameter of each block, a Hamming window function is applied frame by frame, and the LPC coefficients of the frames are summed to give the feature parameter of the i-th block.
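Step (S2) can be sketched as follows. The text does not say how the LPC coefficients are computed or what order is used, so this sketch assumes the autocorrelation method with the Levinson-Durbin recursion and leaves the order as a caller-supplied parameter; all function names are hypothetical.

```python
import math

def hamming(length):
    """Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Predictor coefficients a[1..order], with x[n] ~ sum_k a[k] * x[n-k],
    obtained by the Levinson-Durbin recursion (assumed method)."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:               # silent frame: no prediction possible
        return a[1:]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
        if err <= 0.0:           # signal already perfectly predicted
            break
    return a[1:]

def block_feature(frames, order):
    """Feature of one block: Hamming-window each frame, compute its LPC
    coefficients, and sum them over the frames of the block."""
    total = [0.0] * order
    for frame in frames:
        windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
        total = [t + c for t, c in zip(total, lpc(windowed, order))]
    return total
```

The result is a single vector of `order` values per block, which is what the later distance computation would take as a_i under these assumptions.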

In step (S3), the feature parameter a_i of the i-th block obtained in step (S2) is compared with the feature parameters of the previously stored reference pattern of each class, and the distances used for similarity measurement are computed and stored. That is, the DTW algorithm is used with the Euclidean distance as the distance measure; the stored distances to the reference patterns are compared, the class with the smallest distance is selected, and the reference pattern of that class is set as the matched pattern A_i for the input speech a_i.

Next, step (S4) determines whether the i-th block corresponds to a transition section. Here, a block is a transition section when comparison of the stored distance values shows that two or more classes share the minimum distance, or that the difference in distance between classes is very small, below a predetermined threshold.
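The test in step (S4) can be sketched as a comparison of the two smallest distances. Taking "the difference in distance between classes" to mean the gap between the best and second-best class is an assumption of this sketch, as are the function name and the threshold value in the example.

```python
def is_transition(distances, threshold):
    """Return True when the block looks like a transition section.

    distances: one pattern-matching distance per class for this block.
    threshold: assumed tuning parameter; the text gives no value.
    """
    if len(distances) < 2:
        return False                      # nothing to be ambiguous between
    best, second = sorted(distances)[:2]
    # Either two classes tie for the minimum distance, or the runner-up
    # is closer to the minimum than the threshold allows.
    return second == best or (second - best) < threshold
```

With a threshold of 0.1, for instance, the distances `[1.0, 1.05, 5.0]` would count as a transition section, while `[1.0, 2.0, 5.0]` would not.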

In step (S5), when the i-th block corresponds to a transition section, the pattern A_(i-1) matched for the previous, (i-1)-th block is output as the final recognition result of the i-th block. In step (S6), when the i-th block does not correspond to a transition section, the matched pattern A_i obtained in step (S3) is output as the final recognition result of the i-th block.

In step (S7), the block index i is incremented by 1. In step (S8), i is compared with N; if i is less than or equal to N, the process repeats from step (S2), so that the recognition results for the 1st through N-th blocks are output in order, after which the process ends.
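Steps (S1) through (S8) can be put together in a short sketch. To keep it self-contained, it assumes the per-class distances of each block (the output of steps S2 and S3) have already been computed, identifies each class by its index, and carries forward the previous block's output result; the names and the threshold are hypothetical.

```python
def is_transition(distances, threshold):
    """True when two classes tie for the minimum distance or the two
    smallest distances differ by less than the threshold (assumed test)."""
    if len(distances) < 2:
        return False
    best, second = sorted(distances)[:2]
    return second == best or (second - best) < threshold

def recognize_blocks(block_distances, threshold):
    """Return one class index per block, following steps S1 to S8.

    block_distances[i - 1] holds the distance from block i to each class.
    """
    results = []
    n_blocks = len(block_distances)
    i = 1                                            # S1: start at block 1
    while i <= n_blocks:                             # S8: loop while i <= N
        distances = block_distances[i - 1]           # S2, S3 done upstream
        matched = min(range(len(distances)), key=distances.__getitem__)
        if i > 1 and is_transition(distances, threshold):   # S4
            results.append(results[-1])              # S5: reuse previous result
        else:
            results.append(matched)                  # S6: keep own result
        i += 1                                       # S7
    return results
```

For the distances `[[0.2, 3.0], [0.3, 2.5], [1.02, 1.0], [2.8, 0.4]]` and a threshold of 0.1, the third block is nearly equidistant from both classes, so it inherits class 0 from the second block instead of its own marginal winner, class 1.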

Through this process, after the recognition result A_1 of the 1st block in the small section has been output, from the 2nd block onward, whenever the current block corresponds to a transition section, the recognition result of the previous block is forced to be the recognition result of the current block.

Conventionally, the recognition decision algorithm was executed independently even in cases where recognition is ambiguous, such as speech transition sections, and this caused misrecognition. In the present invention, when the current block is a transition section in which the evidence for recognition is ambiguous, no separate recognition is performed; the result recognized for the immediately preceding block is substituted as the recognized speech of the current block, which reduces misrecognition of transition-section speech.

Claims

Performing a pattern matching algorithm using an input pattern, obtained by extracting feature parameters of a current block of a time-series speech signal divided into small sections, and a previously stored reference pattern of each class;

Outputting a recognition result by performing a decision algorithm on the pattern matching result when the current block is the first block of a small section;

Determining whether the current block is a transition section, when the current block is any block other than the first block of the small section, by comparing the differences between the distances from the input pattern to each class according to the pattern matching result;

Outputting the recognition result obtained for the previous block in place of the recognition result of the current block if the current block is a transition section, and outputting the recognition result according to the pattern matching result of the current block as it is if the current block is not a transition section; and

Repeatedly performing the steps from the pattern matching step through the recognition result output step up to the last block of the time-series input speech.

The method of claim 1, wherein the pattern matching algorithm is a template matching method using dynamic time warping (DTW).

The method of claim 1, wherein the transition section corresponds to a case in which, as a result of comparing the distance values between the input pattern and the reference pattern of each class, two or more classes have the minimum distance.

The method of claim 1, wherein the transition section corresponds to a case in which, as a result of comparing the distance values between the input pattern and the reference pattern of each class, the difference in distance between classes is very small, below a predetermined threshold.