KR950009328B1

KR950009328B1 - Voice recognizing method of monosyllable unit

Info

Publication number: KR950009328B1
Application number: KR1019890010935A
Authority: KR
Inventors: 최상대
Original assignee: 엘지전자주식회사; 구자홍
Priority date: 1989-07-31
Filing date: 1989-07-31
Publication date: 1995-08-19
Also published as: KR910003567A

Abstract

The method introduces monosyllable-based speech recognition to overcome the drawbacks of conventional word-based and phoneme-based speech recognition systems. The method includes the steps of: detecting the endpoints of the speech by the zero-crossing rate (ZCR) and then segmenting it into an initial sound, a medial sound, and a final sound using the log energy; extracting features by the fast Fourier transform (FFT) and recognizing the medial sound first; recognizing the initial sound by comparing it with the reference patterns; and finally recognizing the final sound by comparing it with the reference patterns.

Description

Single syllable speech recognition method

FIG. 1 is a block diagram of the speech recognition system of the present invention.

FIG. 2 is an analysis waveform diagram of monosyllable speech according to the present invention.

FIG. 3 is a signal flow diagram for monosyllable-unit speech recognition according to the present invention.

* Explanation of symbols for main parts of the drawings

1: microphone 2: amplifier

3: low-pass filter 4: A/D converter

5: computer

The present invention relates to a speech recognizer, and more particularly to a monosyllable-unit speech recognition method that separates a monosyllable into an initial sound, a medial sound, and a final sound, recognizes the medial sound first, and then recognizes the initial and final sounds from among the patterns for that medial sound.

Speech recognition to date has mainly been performed in word units, with some systems working in phoneme units. Word-unit recognition requires too many registered patterns, and phoneme-unit recognition makes accurate segmentation difficult.

To solve these conventional problems, the present invention provides a monosyllable-unit speech recognition method that first recognizes the medial sound, which is a vowel, and then matches against the reference patterns of the initial and final sounds to recognize them, thereby recognizing the monosyllable. The invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the speech recognition system of the present invention. As shown, a voice signal input through the microphone (1) is amplified by the amplifier (2), its low-frequency components are passed by the low-pass filter (3), and the result is converted into a digital signal by the A/D converter (4) for processing by the computer (5).

FIG. 2 is an analysis waveform diagram of monosyllable speech according to the present invention. As shown, it plots the initial sound (consonant), the medial sound (vowel), and the final sound (consonant) as log energy over time, where TS(1) is the threshold for the initial sound, VP is the threshold for the final sound, TS(2) is the threshold for the final sound, and ① and ② are the transition points between the regions.

The operation and effects of the present invention configured as described above are as follows.

First, when the monosyllable speech to be recognized has been A/D-converted and received in digital form, its endpoints are found by the ZCR (zero-crossing rate), the log energy is computed and used for segmentation, and the FFT (fast Fourier transform) is computed to obtain the features for recognition.
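The frame-level measurements named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length, the ZCR threshold, the rule that a frame counts as speech when its ZCR exceeds that threshold, and all function names are assumptions made for the example.

```python
import math

def frames(signal, frame_len):
    """Split a list of samples into non-overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def log_energy(frame, floor=1e-10):
    """Log10 of the summed squared amplitude; the floor avoids log(0)."""
    return math.log10(sum(x * x for x in frame) + floor)

def find_endpoints(signal, frame_len, zcr_threshold):
    """Return (first, last) indices of frames whose ZCR exceeds zcr_threshold.

    Assumed decision rule for illustration; returns None if no frame qualifies.
    """
    active = [i for i, f in enumerate(frames(signal, frame_len))
              if zero_crossing_rate(f) > zcr_threshold]
    return (active[0], active[-1]) if active else None
```

The log-energy value per frame is what the segmentation step then compares against the thresholds TS(1) and TS(2).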

As shown in FIG. 2, the log energy is examined: the span from the start of the speech to the point (①) where the log energy exceeds the predetermined threshold TS(1) is the initial part, the span from there to the point (③) where it falls below the threshold TS(2) is the medial part, and the span from there to the end of the speech is the final part.
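A minimal sketch of this threshold-based segmentation, assuming the log energy is available as one value per frame and the input has already been trimmed to its endpoints. The function name and the half-open index ranges are choices made for the example.

```python
def segment_syllable(log_energy, ts1, ts2):
    """Split a per-frame log-energy contour into initial, medial, final ranges.

    The initial part runs from the start to the first frame whose log energy
    exceeds ts1; the medial part runs from there to the first later frame
    whose log energy falls below ts2; the final part is the remainder.
    Each range is a half-open (start, stop) pair of frame indices.
    """
    n = len(log_energy)
    p1 = next((i for i, e in enumerate(log_energy) if e > ts1), None)
    if p1 is None:
        return None  # the energy never exceeds TS(1), so no medial part exists
    p2 = next((i for i in range(p1 + 1, n) if log_energy[i] < ts2), n)
    return (0, p1), (p1, p2), (p2, n)
```

For a contour such as `[1.0, 2.0, 5.0, 6.0, 6.0, 5.0, 3.0, 2.0, 1.0]` with `ts1 = 4.0` and `ts2 = 3.5`, the three ranges are `(0, 2)`, `(2, 6)`, and `(6, 9)`.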

Feature extraction for recognizing the medial sound (vowel), and for the training that builds its reference patterns, uses the steady-state portion of the vowel, namely about 40 ms starting 30 ms after the point (①) where the log energy of the speech exceeds the initial-sound threshold TS(1). Because a vowel repeats the same waveform, taking only 40 ms of the steady-state portion is sufficient for recognition and reduces the amount of data.
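The medial window and its spectral features might be computed as in the sketch below. The patent does not state a sampling rate, so `sample_rate` is a parameter, and the 8000 Hz used in the usage note is an assumption. The patent names the FFT; a naive DFT is used here only to keep the example dependency-free, and it yields the same magnitudes.

```python
import cmath
import math

def medial_window(samples, p1, sample_rate, skip_ms=30, take_ms=40):
    """Return take_ms of samples starting skip_ms after sample index p1 (point 1)."""
    start = p1 + sample_rate * skip_ms // 1000
    stop = start + sample_rate * take_ms // 1000
    return samples[start:stop]

def magnitude_spectrum(window):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]
```

At an assumed 8000 Hz, the window is 320 samples long and begins 240 samples after point ①.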

Likewise, because the features for recognizing and training the initial sound are largely contained in the transition region, about 90 ms before the transition point (①) is taken, as shown in FIG. 2, and for recognizing and training the final sound about 90 ms after the transition point (②) is taken.
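The two 90 ms windows can be sketched in the same way. The sampling rate is again an assumed parameter, the function names are hypothetical, and clipping the initial window at the start of the recording is a choice made for the example.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * ms // 1000

def initial_window(samples, p1, sample_rate, take_ms=90):
    """Samples in the take_ms immediately before transition point 1 (index p1)."""
    n = ms_to_samples(take_ms, sample_rate)
    return samples[max(0, p1 - n):p1]

def final_window(samples, p2, sample_rate, take_ms=90):
    """Samples in the take_ms immediately after transition point 2 (index p2)."""
    n = ms_to_samples(take_ms, sample_rate)
    return samples[p2:p2 + n]
```

If the syllable has less than 90 ms of signal before point ①, the initial window is simply shorter.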

The recognition process is described below with reference to FIG. 3, the signal flow diagram.

First, the endpoints of the voice input are detected by the ZCR, the input is segmented by the log energy, features are extracted by the FFT, and the initial-sound threshold TS(1) is obtained. Then 40 ms is taken starting 30 ms after the point (①) where TS(1) is exceeded, the addresses (ℓ, m, n) are set, A/D conversion is performed, and the data at address ℓ is loaded.

Address ℓ is then incremented up to 21, the number of vowels. When the input matches a medial reference pattern, the medial sound is recognized; if ℓ exceeds 21, the process returns to the voice input step.

Next, the 90 ms before the initial-sound threshold TS(1) is taken, A/D conversion is performed, and the data at address m is loaded. Address m is incremented up to 19, the number of initial consonants. When the initial consonant matches a reference pattern, the initial sound is recognized; if m exceeds 19, the process returns to the voice input step.

Next, the 90 ms after the medial-sound threshold TS(2) is taken, A/D conversion is performed, and the data at address n is loaded. Address n is incremented up to 27, the number of final consonants. When the final consonant matches a reference pattern, the final sound is recognized; if n exceeds 27, the process returns to the voice input step.

Thus the medial sound is recognized first, and matching then proceeds in the order of initial sound and final sound, yielding the recognition result for the monosyllable.
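The medial-first matching order can be sketched as below. The patent says only that a sound is recognized when it becomes equal to a reference pattern, so the Euclidean distance, the `tolerance` parameter, and the function names are assumptions for illustration. Returning `None` stands in for going back to the voice input step.

```python
def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_pattern(features, references, tolerance):
    """Scan references in order and return the index of the first match, else None."""
    for index, reference in enumerate(references):
        if distance(features, reference) <= tolerance:
            return index
    return None

def recognize_syllable(medial, initial, final,
                       medial_refs, initial_refs, final_refs, tolerance):
    """Match the medial sound first, then the initial, then the final.

    Returns (initial_index, medial_index, final_index), or None as soon as
    one stage runs through all of its reference patterns without a match.
    """
    found = []
    for features, references in ((medial, medial_refs),
                                 (initial, initial_refs),
                                 (final, final_refs)):
        index = match_pattern(features, references, tolerance)
        if index is None:
            return None  # stands in for returning to voice input
        found.append(index)
    medial_index, initial_index, final_index = found
    return initial_index, medial_index, final_index
```

In the method described here the three reference lists would hold 21, 19, and 27 patterns respectively.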

As described in detail above, the present invention recognizes Korean speech in monosyllable units, so recognition does not require a large number of reference patterns, which reduces the amount of data, and the recognition rate can be improved even without exact segmentation.
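As a rough illustration of the saving in reference patterns, the counts given above (19 initial consonants, 21 vowels, 27 final consonants) can be compared with one template per whole syllable. The whole-syllable comparison is not made in the patent; it assumes every initial, vowel, and optional final combination needs its own template.

```python
INITIALS, MEDIALS, FINALS = 19, 21, 27  # counts stated in the description

def component_pattern_count():
    """One reference pattern per initial, medial, and final sound."""
    return INITIALS + MEDIALS + FINALS

def whole_syllable_pattern_count():
    """One template per syllable; the +1 covers syllables with no final consonant."""
    return INITIALS * MEDIALS * (FINALS + 1)

print(component_pattern_count(), whole_syllable_pattern_count())
```

Under these assumptions the component approach needs 67 reference patterns, against 11,172 for whole-syllable templates.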

Claims

1. A monosyllable-unit speech recognition method comprising: finding the endpoints of a voice input by the ZCR, dividing the monosyllable into an initial sound, a medial sound, and a final sound by log energy values, and extracting features by the FFT (fast Fourier transform); recognizing the medial sound when it becomes equal to a reference pattern while incrementing through the number of medial sounds; recognizing the initial sound when it becomes equal to a reference pattern while incrementing through the number of initial sounds; and recognizing the final sound when it becomes equal to a reference pattern while incrementing through the number of final sounds.

2. The method of claim 1, wherein speech recognition is performed by taking only the 90 ms before the initial-sound threshold for the initial sound, only the 40 ms after the initial-sound threshold for the medial sound, and only the 90 ms after the final-sound threshold for the final sound.