KR940005043B1

KR940005043B1 - Recognizing method of numeric sound

Info

Publication number: KR940005043B1
Application number: KR1019900006291A
Authority: KR
Inventors: 김민성
Original assignee: 주식회사 금성사; 이헌조
Priority date: 1990-04-30
Filing date: 1990-04-30
Publication date: 1994-06-10
Also published as: KR910018968A

Abstract

The method improves the recognition of spoken digits by using variation in the frequency spectrum. The method employs a microphone (1) that converts a voice signal into an electric signal, an interface (3) that interfaces the electric signal, a clock analog unit (4) that filters, samples, and outputs the electric signal, a clock logic unit (5) that converts the filtered and sampled voice signal into a digital signal, a digital signal processor (6) that controls the overall operation, an address decoder (7) that decodes the data selection signal (10S), input-output selection signal (1IS), and program selection signal (1PS), a buffer (8), reference data RAMs (9)(10), a program ROM (11), an input-output decoding logic unit (12), and a voice recognition controller (13).

Description

Recognition method of digits by spectral change

FIG. 1 is a block diagram of the spoken-digit recognition apparatus based on spectral change according to the present invention.

FIG. 2 is a signal flow diagram of the spoken-digit recognition algorithm performed in the recognition apparatus of FIG. 1.

FIG. 3 is a signal flow diagram for calculating the spectral difference of FIG. 2.

FIG. 4 is a signal flow diagram for detecting the spectral peak values of FIG. 2.

FIG. 5 is a signal flow diagram for calculating the spectral threshold of FIG. 2.

FIG. 6 is a signal flow diagram for dividing the speech at the frames that are peaks in FIG. 2.

FIG. 7 is a signal flow diagram for explaining the matching process by DTW of FIG. 2.

FIG. 8 is a signal flow diagram for explaining the backtracking process of FIG. 7.

* Explanation of symbols for main parts of the drawings

1: microphone    2: speaker

3: interface unit    4: clock analog unit

6: digital signal processor    7: address decoder

8: buffer    9, 10: reference data RAMs

11: program ROM    12: input/output decoding logic unit

13: voice recognition controller

The present invention relates to a speech recognition apparatus that recognizes a speech signal by comparing sections of the input speech signal with reference patterns of speech signals stored in a database. In particular, it relates to a method for recognizing spoken digits by spectral change, in which the speech interval is divided by calculating the amount of change in the frequency spectrum that occurs when the speech region changes, so that spoken digits can be recognized regardless of how many digits are uttered.

A conventional speech recognition apparatus detects the start point and end point of the input pattern of the incoming speech signal, that is, its first and last frames, and applies DP matching between these points and the start and end points of a reference pattern. In other words, while comparing the reference pattern with the input pattern, it stretches and compresses along the time axis the speech pattern, which is unevenly distributed in time, so that the distance between the two patterns is minimized, and it recognizes the speech signal as the reference pattern with the smallest distance.

However, such a conventional speech recognition apparatus has the defect that, when the end point is detected incorrectly, the distance contributed by unneeded parts of the speech signal becomes large, and the input speech signal is consequently misrecognized.

Accordingly, in view of this defect, an object of the present invention is to provide a method for recognizing spoken digits by spectral change that obtains a central coefficient from the change in the frequency spectrum occurring when the speech region changes and uses it to find the boundaries of the speech intervals, so that spoken digits are recognized regardless of the number of digits uttered.

To achieve this object, the method of the present invention comprises the steps of: filtering and sampling the speech signal to be recognized and inputting it as a digital signal; grouping a predetermined number of samples of the sampled speech data into frames; extracting features for each frame using cepstrum coefficients; calculating the spectral difference for all of the frames from which features were extracted; detecting the frames at which the spectral difference is a peak; excluding from the peaks those whose spectral difference is at or below a threshold; dividing the speech region into segments at the frames found to be peaks above the threshold; matching by DTW, in which distances are obtained by comparing the pattern of each divided speech segment with reference patterns stored in a database; and outputting, as the recognition result, the reference pattern with the smallest DTW distance. The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of the spoken-digit recognition apparatus based on spectral change according to the present invention. As shown, the apparatus comprises: a microphone (1) that converts an uttered speech signal into an electrical signal; an interface unit (3) that processes and interfaces the electrical signal received from the microphone (1); a clock analog unit (4) that filters, samples, and outputs the electrical signal received from the interface unit (3); an interrupt and analog/digital conversion clock logic unit (5) that converts the speech signal filtered and sampled by the clock analog unit (4) into a digital signal and generates an interrupt signal; a digital signal processor (6) that controls the overall operation of the speech recognition apparatus in response to the interrupt signal generated by the interrupt and analog/digital conversion clock logic unit (5); an address decoder (7) that decodes the data selection signal, the input/output selection signal, and the program selection signal output from the digital signal processor (6); a buffer (8) that transfers the address signal (AD) output from the digital signal processor (6); reference data RAMs (9)(10) that are chip-selected by the output signal of the address decoder (7), addressed through the buffer (8), and supply the reference data (DATA) stored at the addressed location to the digital signal processor (6); a program ROM (11) that stores the program to be executed by the digital signal processor (6); an input/output decoding logic unit (12) that decodes the data (DATA) output from the reference data RAMs (9)(10) according to the input/output selection signal of the digital signal processor (6) decoded through the address decoder (7); and a voice recognition controller (13) that recognizes the speech according to the speech data decoded by the input/output decoding logic unit (12) and the speech signal recognized by the digital signal processor (6), outputs it to the speaker (2) through the interface unit (3), and controls the interrupt and analog/digital conversion clock logic unit (5).

FIG. 2 is a signal flow diagram of the spoken-digit recognition algorithm performed in the recognition apparatus of FIG. 1. As shown, it consists of: a step (ST100) of filtering and sampling the speech signal to be recognized and receiving it as a digital signal; a step (ST101) of grouping a predetermined number of samples of the input speech data into frames; a step (ST102) of extracting speech features for each frame using cepstrum coefficients; a step (ST103) of calculating the spectral difference for all of the frames from which features were extracted; a step (ST104) of detecting the frames at which the spectral difference is a peak; a step (ST105) of excluding from the peaks those whose spectral difference is at or below a threshold; a step (ST106) of dividing the speech region into segments at each frame whose peak was found in step ST105 to be above the threshold; a step (ST107) of matching by DTW, in which distances are obtained by comparing the pattern of each divided speech segment with reference patterns stored in a database; and a step (ST108) of outputting, as the recognition result, the reference pattern with the smallest DTW distance.

The operation and effects of the present invention configured as above are described in detail below with reference to FIGS. 1 to 8.

As shown in FIG. 1, when reference patterns of speech signals are to be stored, connected-digit speech is first uttered and input through the microphone (1); the input speech signal is processed by the interface unit (3) and passed to the clock analog unit (4).

The clock analog unit (4) low-pass filters and samples the input speech signal and passes it to the interrupt and analog/digital conversion clock logic unit (5).

The interrupt and analog/digital conversion clock logic unit (5) converts the sampled speech signal into a digital signal and outputs an interrupt signal to the digital signal processor (6).

In this state, when the digital signal processor (6) outputs an interrupt acknowledge signal and the clock signal (XF) in response to the interrupt signal, the interrupt and analog/digital conversion clock logic unit (5) supplies the speech signal, converted into a digital signal according to the clock signal (XF), to the digital signal processor (6) as the data signal (DATA).

The digital signal processor (6) applies a fast Fourier transform (FFT) to the input digital data to obtain the frequency spectrum, and divides it into frames of fixed length to create a reference pattern.

At this time, the digital signal processor (6) outputs the data selection signal, the input/output selection signal, and the program selection signal through the address decoder (7) to select the chips of the reference data RAMs (9)(10) and the program ROM (11). It then selects an address through the buffer (8) and stores the reference pattern in the reference data RAMs (9)(10) over the data lines. This operation is repeated for each speech signal to be recognized, so that the reference patterns of the speech signals are stored.

Meanwhile, when the recognition process is performed on an uttered speech signal, connected-digit speech is first input through the microphone (1), and the input speech signal is processed by the interface unit (3) and passed to the clock analog unit (4). As in the storing procedure, the signal is then filtered, sampled, and converted into a digital signal, and an interrupt signal is output to the digital signal processor (6).

In this state, when the digital signal processor (6) outputs an interrupt acknowledge signal and the clock signal (XF) in response to the interrupt signal, the interrupt and analog/digital conversion clock logic unit (5) supplies the speech signal, converted into a digital signal according to the clock signal (XF), to the digital signal processor (6) as the data signal (DATA).

The digital signal processor (6) obtains the frequency spectrum of the input speech signal using cepstrum coefficients, divides it into frames of fixed length to form an input pattern, and compares the input pattern one by one with the reference patterns stored in the reference data RAMs (9)(10). It recognizes the speech signal as the most similar reference pattern and controls the voice recognition controller (13) according to the recognition result.

This is described concretely with reference to FIGS. 2 to 8. First, as shown in FIG. 2, when a speech signal that has been filtered, sampled, and converted into a digital signal by passing in turn through the clock analog unit (4) and the interrupt and analog/digital conversion clock logic unit (5) is input (ST100), the digital signal processor (6) divides the digital speech data into frames (ST101), extracts features for each frame (ST102), and calculates the spectral difference from the extracted features in order to check whether the segmentation boundary condition is met (ST103).

Recognition then proceeds through: a peak detection step (ST104) that finds the local peaks of the data thus obtained; a threshold comparison step (ST105) that discards any local peak whose spectral difference is smaller than a fixed value; a step (ST106) that divides the speech into speech segments, taking the local peaks that finally remain as the boundaries of the speech intervals; a matching step by DTW (ST107) that obtains distances by comparing each speech segment with the reference patterns stored in the reference data RAMs (9)(10); and a step (ST108) that outputs, as the recognition result, the reference pattern with the smallest DTW distance.

Each step of this flow is described concretely below with reference to FIGS. 3 to 8.

First, in the step of dividing the input speech signal into frames (ST101) and the feature extraction step (ST102), the samples of the digital speech input data, which have been filtered, sampled, and converted into digital data, are grouped 256 at a time (25.6 msec) to form one frame.
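The framing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_frames` is hypothetical, and the choices of non-overlapping frames and of dropping a trailing partial frame are assumptions, since the text only states that 256 samples form one frame.

```python
def split_into_frames(samples, frame_len=256):
    """Group consecutive samples into non-overlapping frames of frame_len samples.

    Assumption: a trailing partial frame is dropped; the text does not say.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# 256 samples lasting 25.6 ms imply a sampling rate of 10 kHz.
SAMPLE_RATE_HZ = 256 / 25.6e-3

frames = split_into_frames(list(range(1000)))
print(len(frames), len(frames[0]), round(SAMPLE_RATE_HZ))  # 3 256 10000
```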

Features are then extracted for each frame. The features used are linear predictive coding (LPC) cepstrum coefficients, with an LPC analysis order of 12.
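One common way to obtain order-12 LPC cepstrum coefficients for a frame is sketched below: autocorrelation, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The patent does not state which LPC method, window, or pre-emphasis it uses, so these choices (and all function names) are assumptions made for illustration only.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """r[lag] = sum over t of frame[t] * frame[t + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p for the model s[n] ~ sum_k a_k * s[n - k]."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}, with a_n = 0 for n > p."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpc_cepstrum(frame, order=12):
    """Order-12 LPC cepstrum vector of one frame (no windowing, by assumption)."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:                  # silent frame
        return [0.0] * order
    return lpc_to_cepstrum(levinson_durbin(r, order), order)


# Synthetic 256-sample frame: a sinusoid plus a little reproducible noise.
rng = random.Random(0)
frame = [math.sin(0.3 * t) + 0.1 * rng.uniform(-1.0, 1.0) for t in range(256)]
ceps = lpc_cepstrum(frame)
print(len(ceps))  # 12
```

For a first-order predictor with a_1 = 0.5, the recursion gives c_n = 0.5^n / n, which is a quick sanity check on `lpc_to_cepstrum`.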

From the LPC cepstrum coefficients obtained in this way as feature vectors, the spectral difference is computed in step ST103, as shown in FIG. 3.

That is, let the cepstrum vector of the j-th frame be $C_j = (C_j^1, C_j^2, C_j^3, \ldots, C_j^{12})$. The spectral difference $d_i$ is then computed for each frame $i = 1, 2, \ldots, M$, where M is the last frame, $C_k$ is the cepstrum feature vector of the k-th frame, and L is the maximum number of digits that can be recognized in a string.
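The expression for the spectral difference is not reproduced in this text, so the sketch below substitutes one plausible definition purely as an assumption: the Euclidean distance between the cepstrum vectors of adjacent frames. The function name `spectral_differences` and the convention that the first frame gets a difference of 0 are likewise hypothetical.

```python
import math


def spectral_differences(cepstra):
    """Return d, where d[i] is the Euclidean distance between the cepstrum
    vectors of frames i-1 and i (assumed definition), and d[0] = 0.0."""
    if not cepstra:
        return []
    d = [0.0]
    for prev, cur in zip(cepstra, cepstra[1:]):
        d.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return d


# Toy 2-dimensional "cepstra": the spectrum jumps between frames 1 and 2.
cepstra = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
print(spectral_differences(cepstra))  # [0.0, 0.0, 5.0, 0.0]
```

A large value of d marks a frame where the spectrum changes sharply, which is what the later peak detection step looks for.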

After the spectral difference $d_i$ has been obtained for all input frames in this way, the peak detection step (ST104) is performed.

The peak detection step (ST104), which finds the frames where the spectral difference is a local maximum (local peak), is shown in FIG. 4.

That is, if the j-th frame satisfies both $d_j > d_{j-1}$ and $d_j > d_{j+1}$, that frame is judged to be a local maximum and is stored in the P array; if frame j is not the last input frame, j is incremented by one and the test is repeated.

After this has been carried out up to the (M-1)-th frame, the process moves on to the threshold comparison step (ST105).
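The local-maximum test of step ST104 can be written in a few lines. This is a sketch using 0-based frame indices; the function name `find_local_peaks` and the sample values are illustrative, not taken from the patent.

```python
def find_local_peaks(d):
    """Return the frame indices j with d[j] > d[j-1] and d[j] > d[j+1].

    The returned list plays the role of the P array in the text.
    The first and last frames have only one neighbour and are never peaks.
    """
    return [j for j in range(1, len(d) - 1)
            if d[j] > d[j - 1] and d[j] > d[j + 1]]


d = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 0.2, 1.5, 0.6]
peaks = find_local_peaks(d)
print(peaks)  # [2, 5, 7]
```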

The threshold comparison step (ST105) is shown in FIG. 5.

In this step, among the frames judged to be local peaks in step ST104, any frame whose spectral difference $d_k$ is smaller than a fixed value ($T_{spec}$) is removed from the P array, and a large value (∞) is stored in the P array to mark the removal.

After this has been repeated for the number of local peaks found (np), the next step, speech segment division (ST106), is performed.
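The threshold comparison can be sketched as below, keeping the text's convention of overwriting a rejected entry with a large value. Here `math.inf` stands in for ∞, and the function name, the threshold value 0.5, and the sample data are hypothetical.

```python
import math


def apply_threshold(peaks, d, t_spec):
    """Replace every peak whose spectral difference is below t_spec with inf.

    peaks: frame indices from peak detection (the P array).
    d:     spectral difference per frame.
    """
    return [p if d[p] >= t_spec else math.inf for p in peaks]


d = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 0.2, 1.5, 0.6]
peaks = [2, 5, 7]
marked = apply_threshold(peaks, d, t_spec=0.5)
print(marked)  # [2, inf, 7]
```

Marking in place rather than deleting keeps the P array at a fixed length of np entries, which suits the fixed-size arrays of a DSP program.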

The speech segment division step (ST106) treats the local peaks that remain after the threshold comparison step (ST105) as speech boundaries and divides the speech region at them.

Referring to FIG. 6, entries of the P array that hold the value ∞, that is, frames that are no longer local peaks, are skipped, and the input speech signal is divided into speech segments at the remaining local peaks.

After this has been done for all local peaks, the following matching step (ST107) and speech recognition step (ST108) are performed.
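Segment division at the surviving peaks can be sketched as follows. The function name `split_at_peaks` is hypothetical, and assigning each boundary frame to the segment that follows it is an assumption; the text does not say which side the boundary frame belongs to.

```python
import math


def split_at_peaks(n_frames, marked_peaks):
    """Return (start, end) frame ranges, end exclusive, cut at each surviving peak.

    Entries equal to inf (rejected peaks) are skipped.
    """
    boundaries = [p for p in marked_peaks if p != math.inf]
    edges = [0] + boundaries + [n_frames]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


segments = split_at_peaks(9, [2, math.inf, 7])
print(segments)  # [(0, 2), (2, 7), (7, 9)]
```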

As shown in FIG. 7, the matching step (ST107) compares the speech segments obtained in the speech segment division step (ST106) with previously stored reference patterns for the digit sounds 0 to 9, and outputs the one with the smallest difference.
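A basic dynamic time warping (DTW) comparison of one segment against a set of reference patterns is sketched below. The local distance (Euclidean), the step pattern (the three standard predecessors), and the names `dtw_distance` and `recognize_segment` are assumptions; the patent does not spell out its DTW constraints in this text.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    def local(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    n, m = len(seq_a), len(seq_b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local(seq_a[i - 1], seq_b[j - 1])
            # Allow a repeat in either sequence or a diagonal step.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def recognize_segment(segment, references):
    """Return the label of the reference pattern with the smallest DTW distance."""
    return min(references, key=lambda label: dtw_distance(segment, references[label]))


# Toy one-dimensional feature sequences standing in for cepstrum vectors.
references = {"0": [[0.0], [0.0], [1.0]], "1": [[2.0], [3.0], [3.0]]}
segment = [[2.0], [2.0], [3.0]]
print(recognize_segment(segment, references))  # 1
```

The segment matches reference "1" with zero distance even though the two sequences change value at different times, which is the time-axis stretching that DTW provides.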

In FIG. 7, $T_k^l$ is the distance when the l-th reference is matched to the k-th segment; $B_k^l$ is the last reference matched when matching from the l-th reference up to the k-th segment; $C_k^l$ is a pointer to the segment at which the (l-1)-th reference was matched; DTW(I, K) is the distance obtained by applying DTW to I and K; $I_{ij}$ is the span of input frames from i to j; $R_m$ is the m-th reference; M is the number of references; N is the maximum number of segments that can be matched to one reference; K is the number of segments produced by the process up to step ST106; L is the maximum number of digits that can be recognized in a string; and $Q_1 \ldots Q_n$ is the recognized digit string.

FIG. 8 is a flowchart of the process of backtracking from the minimum distance obtained through the matching process of FIG. 7 to find the recognized digit string.
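The idea of accumulating distances over segments and then tracing back to read off the digit string can be illustrated with a simplified dynamic program. This is a sketch under stated assumptions, not the recurrence of FIGS. 7 and 8: the function `decode_digits`, its `span_cost` callback (which in practice would return a DTW distance), and the toy cost table are all hypothetical.

```python
import math


def decode_digits(n_segments, labels, span_cost, max_span):
    """Find the least-cost labelling of consecutive segment spans, then backtrack.

    span_cost(start, end, label) returns the distance between segments
    start..end-1 taken together and the reference pattern for label.
    Each digit may cover between 1 and max_span consecutive segments.
    """
    best = [math.inf] * (n_segments + 1)   # best[k]: least total distance for the first k segments
    back = [None] * (n_segments + 1)       # back[k]: (previous k, label) on the best path
    best[0] = 0.0
    for end in range(1, n_segments + 1):
        for start in range(max(0, end - max_span), end):
            if best[start] == math.inf:
                continue
            for label in labels:
                total = best[start] + span_cost(start, end, label)
                if total < best[end]:
                    best[end] = total
                    back[end] = (start, label)
    # Backtracking: follow the stored pointers from the last segment to the first.
    digits = []
    k = n_segments
    while k > 0 and back[k] is not None:
        k, label = back[k]
        digits.append(label)
    digits.reverse()
    return digits, best[n_segments]


# Toy cost table: digit "3" fits segments 0..1 together, digit "7" fits segment 2.
costs = {(0, 2, "3"): 1.0, (2, 3, "7"): 0.5, (0, 1, "3"): 4.0, (1, 2, "7"): 4.0}
digits, total = decode_digits(
    3, ["3", "7"], lambda s, e, lab: costs.get((s, e, lab), 10.0), max_span=2)
print(digits, total)  # ['3', '7'] 1.5
```

Storing a back pointer at every step is what makes the final trace-back possible: the minimum total distance alone does not say which digits produced it.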

The recognized digits found in this way are output, and spoken-digit recognition ends.

As described in detail above, the present invention obtains a central coefficient from the change in the frequency spectrum that occurs when the speech region changes and uses it to find the boundaries of the speech intervals, so that spoken digits can be recognized without restriction on the number of digits uttered.

Claims

A method for recognizing spoken digits by spectral change, characterized by comprising the steps of: filtering and sampling a speech signal to be recognized and inputting it as a digital signal; grouping a predetermined number of samples of the sampled speech data into frames; extracting features for each frame using cepstrum coefficients; calculating the spectral difference for all of the frames from which features were extracted; detecting the frames at which the spectral difference is a peak; excluding from the peaks those whose spectral difference is at or below a threshold; dividing the speech region into segments at the frames found to be peaks above the threshold; matching by DTW, in which distances are obtained by comparing the pattern of each divided speech segment with reference patterns stored in a database; and outputting, as the recognition result, the reference pattern with the smallest DTW distance.