KR960007131B1 - Continuous number tone recognizing method of voice recognizing system

Info

Publication number: KR960007131B1
Application number: KR1019900022294A
Authority: KR
Inventors: 김민성
Original assignee: 엘지전자주식회사; 구자홍
Priority date: 1990-12-28
Filing date: 1990-12-28
Publication date: 1996-05-27
Also published as: KR920013249A

Abstract

A continuous digit recognition method for a speech recognition system, comprising: storing data of a digital signal processing unit in a reference RAM and a program RAM; comparing the stored data according to the algorithm of the program RAM; driving a sound recognition device through an input/output unit; finding a reference pattern and its paths from all training data; averaging the frames matched along each path and obtaining the frequency of each path; obtaining a weight from each frequency; and adjusting the path by reducing the weight on frequently selected paths and increasing the weight on rarely selected paths.

Description

Continuous Number Recognition Method of Speech Recognition System

Fig. 1 is a block diagram of the speech recognition system of the present invention for continuous digit recognition.

Fig. 2 is a signal flow diagram for obtaining the reference pattern and the frequency of each path in continuous digit recognition by the speech recognition system of the present invention.

Fig. 3 is a schematic diagram showing the paths along which matching proceeds in continuous digit recognition by the speech recognition system of the present invention.

Fig. 4 is a signal flow diagram of the recognition process in continuous digit recognition by the speech recognition system of the present invention.

Fig. 5 is a characteristic diagram showing the paths that can be selected at each frame of the reference pattern in a conventional speech recognition system.

* Explanation of symbols for main parts of the drawings

1: microphone
2: analog unit
3: A/D converter
4: digital signal processor (DSP)
5: address decoder
6: buffer
7: program RAM
8: reference data RAM
9: program ROM
10: input/output unit
11: voice-using device

The present invention relates to a speech recognition system, and in particular to a continuous digit recognition method for a speech recognition system that assigns different weights to the stable sections and the transition sections of speech, thereby reducing misrecognition caused by differences in speech duration.

In a conventional speech recognition system, as shown in Fig. 5, when the reference pattern (y-axis) and the input pattern (x-axis) are matched by DTW (Dynamic Time Warping), the paths that can be selected at each frame of the reference pattern are all treated equally.

Here, DTW is a matching method that aligns two patterns (the input and the reference pattern) through nonlinear time-axis correction. The cumulative distance g(i, j) is obtained recursively for all i and j (the recurrence itself is not reproduced in this text).

The finally obtained g(In, Rn) is the distance between I and Rn. If path A in the characteristic diagram of Fig. 5 represents this correspondence, the partial path selected at input frame i is path 2.
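The recursive computation of g(i, j) described above can be sketched as follows. Because the recurrence is not reproduced in the text, this is only a minimal illustration under stated assumptions: local paths 1, 2, and 3 are assumed to arrive from reference frames j, j-1, and j-2 of the previous input frame, and the local distance is assumed to be Euclidean. The names `PATH_STEPS`, `frame_distance`, and `dtw` are hypothetical.

```python
import math

# Assumption: path 1 stays on the same reference frame, path 2 advances by
# one reference frame, path 3 skips one reference frame.
PATH_STEPS = {1: 0, 2: 1, 3: 2}

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(inp, ref):
    """Return (cumulative distance, steps); steps is a list of (i, j, path),
    one entry per input frame after the first, using 0-based frame indices."""
    I, J = len(inp), len(ref)
    INF = float("inf")
    g = [[INF] * J for _ in range(I)]
    back = [[0] * J for _ in range(I)]
    g[0][0] = frame_distance(inp[0], ref[0])
    for i in range(1, I):
        for j in range(J):
            best, best_p = INF, 0
            for p, step in PATH_STEPS.items():
                if j - step >= 0 and g[i - 1][j - step] < best:
                    best, best_p = g[i - 1][j - step], p
            if best_p:
                g[i][j] = best + frame_distance(inp[i], ref[j])
                back[i][j] = best_p
    if g[I - 1][J - 1] == INF:
        return INF, []  # the end point cannot be reached with these paths
    # Backtrack from the end point to recover the local path taken at each step.
    steps, i, j = [], I - 1, J - 1
    while i > 0:
        p = back[i][j]
        steps.append((i, j, p))
        j -= PATH_STEPS[p]
        i -= 1
    steps.reverse()
    return g[I - 1][J - 1], steps
```

With identical input and reference sequences the alignment follows path 2 at every step, which corresponds to the case described for path A.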

In such a conventional speech recognition system, however, the paths selectable at each frame of the reference are weighted equally, so DTW may match along a path different from the one that actually occurs most often, which increases the likelihood of errors.

The object of the present invention is to solve this problem by providing a continuous digit recognition method for a speech recognition system that finds the paths most often traversed at each frame of the reference pattern, reduces the weight on frequently selected paths, and increases the weight on rarely selected paths, so that matching is performed more accurately. The method is described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram of the speech recognition system of the present invention for continuous digit recognition. As shown, the system comprises: an analog unit (2) that amplifies and filters the speech signal from the microphone (1); an A/D converter (3) that converts the output of the analog unit (2) into a digital signal; a digital signal processor (DSP) (4) that processes the digital signal from the A/D converter (3) and transfers it through an address decoder (5) and a buffer (6); a program ROM (9) that stores the algorithm; a reference data RAM (8) that stores the reference data of the DSP (4); a program RAM (7) that stores the data of the DSP (4); and an input/output unit (10) that decodes the output of the DSP (4) and drives the voice-using device (11). Its operation and effects are described below.

The speech signal is applied through the microphone (1) to the analog unit (2), where it is amplified and filtered. The A/D converter (3) then converts it into a digital signal, which the DSP (4) processes and recognizes. The algorithms executed by the DSP (4), shown in Figs. 2 and 4, are stored in the program ROM (9). The reference patterns and weights produced by the process of Fig. 2 are stored in the reference data RAM (8) and are compared against the input speech when it arrives.

Fig. 2 is a signal flow diagram for obtaining the reference pattern and the frequency of each path in continuous digit recognition by the speech recognition system of the present invention. As shown, it is the process of obtaining, from the training data, the reference pattern and the frequency of each of paths 1, 2, and 3 shown in Fig. 3, and from these the weights W1, W2, and W3.

That is, the training data are aligned by DTW to find the warping path, and the matched frames are averaged to build a model (the reference pattern). For all training data, the counters are first initialized:

C1(j), C2(j), C3(j) = 0 for j = 1, …, J

Here C1(j), C2(j), and C3(j) are the frequencies with which paths 1, 2, and 3, respectively, appear at the j-th frame of the averaged reference pattern. Each item of training data is then aligned with the model by DTW to obtain its path, and if the path at the j-th frame of the model is p, the following update is performed:

Cp(j) ← Cp(j) + 1 for j = 1, …, J

Once this has been done for all training data, the weights are computed:

W1(j) = K · log( C1(j) / [C1(j) + C2(j) + C3(j)] )

W2(j) = K · log( C2(j) / [C1(j) + C2(j) + C3(j)] )

W3(j) = K · log( C3(j) / [C1(j) + C2(j) + C3(j)] )

W1(j), W2(j), and W3(j) are defined so that frequently selected paths receive a smaller weight and rarely selected paths receive a larger weight. K is a constant determined experimentally.
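The counting and weighting steps above can be sketched as follows. This is an illustration only, under assumptions not in the source: each alignment is given as a list of (i, j, p) steps with 0-based reference frame index j and path number p, counts are floored at a small value to avoid log(0), and K is taken as negative so that the weight acts as a penalty that is smaller for frequent paths. The function names and the `floor` parameter are hypothetical.

```python
import math

def path_counts(alignments, num_ref_frames):
    """Accumulate Cp(j): how often path p was chosen at reference frame j."""
    counts = {p: [0] * num_ref_frames for p in (1, 2, 3)}
    for steps in alignments:          # one alignment per training utterance
        for _i, j, p in steps:
            counts[p][j] += 1         # Cp(j) <- Cp(j) + 1
    return counts

def path_weights(counts, K, floor=0.5):
    """Compute Wp(j) = K * log(Cp(j) / (C1(j) + C2(j) + C3(j))).

    Assumption: counts are floored at `floor` so that a path never seen in
    training still gets a finite weight; the source does not address this case.
    """
    num_ref_frames = len(counts[1])
    weights = {p: [0.0] * num_ref_frames for p in (1, 2, 3)}
    for j in range(num_ref_frames):
        c = {p: max(counts[p][j], floor) for p in (1, 2, 3)}
        total = sum(c.values())
        for p in (1, 2, 3):
            weights[p][j] = K * math.log(c[p] / total)
    return weights

# Two toy alignments against a 3-frame reference pattern.
alignments = [[(1, 1, 2), (2, 2, 2)], [(1, 1, 2), (2, 2, 3)]]
counts = path_counts(alignments, 3)
weights = path_weights(counts, K=-1.0)  # assumed sign; K is found experimentally
```

In this toy case path 2 is the most frequent at reference frame 1, so `weights[2][1]` is smaller than `weights[1][1]` and `weights[3][1]`.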

Fig. 4 is a signal flow diagram of the recognition process in continuous digit recognition by the speech recognition system of the present invention. As shown, it is the process of recognizing input digit speech using the weights and reference patterns obtained by the signal flow of Fig. 2.

That is, the input segment from frame i to frame j is compared with every reference by DTW; the minimum distance is stored in D(i, j), and the reference that gives this minimum is stored in B(i, j) (i = j-1, …, 1; I: number of input frames). Next, the cumulative distance T_K(m), obtained by comparing the input up to frame m with a K-digit string, is computed, and the value of l that minimizes T_{K-1}(l) + D(l, m) is stored in B_K(m) (K = 1, …, L; m = 1, …, I; L: maximum number of recognizable digits).

T_K(I) is then compared over all K to find the value Kopt that minimizes it; Kopt is the number of digits recognized in the input speech. From B_K(m), the end point e_p of the input segment matched to the p-th digit is obtained iteratively, and the value B(e_{p-1}, e_p) is read to obtain the recognized digit O_p. The recognized digit string is O_1, O_2, …, O_Kopt.
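The digit-string search and backtracking described above can be sketched as follows. This is a simplified illustration, assuming the segment tables D and B have already been filled by DTW and are indexed by segment boundaries l < m (0 to I), with unreachable segments set to infinity. The function name `recognize_digit_string` and the toy values are hypothetical.

```python
INF = float("inf")

def recognize_digit_string(D, B, num_frames, max_digits):
    """Return (digits, distance) for the best string of 1..max_digits digits.

    D[l][m]: minimum DTW distance of the input segment between boundaries
             l and m over all references; B[l][m]: the reference giving it.
    """
    I, L = num_frames, max_digits
    T = [[INF] * (I + 1) for _ in range(L + 1)]    # T[k][m], i.e. T_K(m)
    back = [[0] * (I + 1) for _ in range(L + 1)]   # back[k][m], i.e. B_K(m)
    T[0][0] = 0.0
    for k in range(1, L + 1):
        for m in range(1, I + 1):
            for l in range(m):
                cand = T[k - 1][l] + D[l][m]
                if cand < T[k][m]:
                    T[k][m], back[k][m] = cand, l
    # Kopt: the digit count whose cumulative distance at the last frame is minimal.
    k_opt = min(range(1, L + 1), key=lambda k: T[k][I])
    # Backtrack the segment boundaries e_p and read the matched references.
    digits, m = [], I
    for k in range(k_opt, 0, -1):
        l = back[k][m]
        digits.append(B[l][m])
        m = l
    digits.reverse()
    return digits, T[k_opt][I]

# Toy tables for a 4-frame input: "3" fits frames 0..2, "7" fits frames 2..4,
# and "1" is a poor fit for the whole input.
I = 4
D = [[INF] * (I + 1) for _ in range(I + 1)]
B = [[None] * (I + 1) for _ in range(I + 1)]
D[0][2], B[0][2] = 1.0, "3"
D[2][4], B[2][4] = 1.5, "7"
D[0][4], B[0][4] = 5.0, "1"
result = recognize_digit_string(D, B, num_frames=I, max_digits=3)
```

Here the two-digit reading has a lower cumulative distance than the one-digit reading, so Kopt is 2 in this toy case.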

The DP equation used for DTW here follows the paths shown in Fig. 3 (the equation itself is not reproduced in this text). The weights that appear in it are those obtained for the q-th reference.

Here i is the input frame, j is the reference frame, and g(i, j) is the cumulative distance when the input up to its i-th frame has been matched to the reference up to its j-th frame.
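Since the DP equation is not reproduced in the text, the following is only one plausible reading, not the patent's confirmed formula: the weight of the local path taken into reference frame j is added to the cumulative distance before the minimum is taken. The predecessor offsets for paths 1, 2, and 3, the Euclidean local distance, and the names `PATH_STEPS`, `frame_distance`, and `weighted_dtw` are all assumptions.

```python
import math

# Assumption: paths 1, 2, 3 arrive from reference frames j, j-1, j-2.
PATH_STEPS = {1: 0, 2: 1, 3: 2}

def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_dtw(inp, ref, weights):
    """Cumulative distance g(I, J) with per-frame path weights.

    weights[p][j] is the weight of path p at reference frame j for this
    reference. Assumed recurrence:
        g(i, j) = d(i, j) + min over p of [ g(i-1, j - step_p) + weights[p][j] ]
    """
    I, J = len(inp), len(ref)
    INF = float("inf")
    g = [[INF] * J for _ in range(I)]
    g[0][0] = frame_distance(inp[0], ref[0])
    for i in range(1, I):
        for j in range(J):
            best = INF
            for p, step in PATH_STEPS.items():
                if j - step >= 0:
                    best = min(best, g[i - 1][j - step] + weights[p][j])
            if best < INF:
                g[i][j] = best + frame_distance(inp[i], ref[j])
    return g[I - 1][J - 1]

seq = [[0.0], [1.0], [2.0]]
zero = {p: [0.0, 0.0, 0.0] for p in (1, 2, 3)}
penalized = {1: [0.0, 0.0, 0.0], 2: [0.0, 0.5, 0.5], 3: [0.0, 0.0, 0.0]}
plain_distance = weighted_dtw(seq, seq, zero)
penalized_distance = weighted_dtw(seq, seq, penalized)
```

With all weights zero this reduces to ordinary DTW; putting a weight on path 2 raises the cost of the diagonal alignment, which shows how the weights steer the chosen path.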

As described in detail above, the present invention improves recognition performance by assigning different weights to frequently selected and rarely selected matching paths during matching.

Claims

A continuous digit recognition method for a speech recognition system in which the data of the digital signal processor (4) are stored in the program RAM (7) and the reference data RAM (8), compared according to the algorithm of the program ROM (9), and used to drive the voice-using device (11) through the input/output unit (10), the method being characterized by: finding the reference pattern and each path from all the training data; building a model by averaging the matched frames and obtaining the frequency of each path; obtaining a weight from each frequency; and adjusting the matching path by giving a smaller weight to frequently selected paths and a larger weight to rarely selected paths.