KR100322203B1

KR100322203B1 - Device and method for recognizing sound in car

Info

Publication number: KR100322203B1
Application number: KR1019990037721A
Authority: KR
Inventors: 권오일; 이봉우
Original assignee: 윤장진; 주식회사 현대오토넷
Priority date: 1999-09-06
Filing date: 1999-09-06
Publication date: 2002-02-06
Also published as: KR20010026414A

Abstract

The present invention relates to a voice recognition device for a vehicle, and a method thereof, that receives a driver's voice command in the vehicle and detects the start point and end point of the voice section, and in particular to a device and method that detect the voice section in the entire input signal using the zero crossing rate and energy. Because the voice section is detected using the zero crossing rate and energy of each input voice data frame, the voice section can be detected more accurately in environments with heavy background noise than with the conventional method that uses energy alone, which in turn improves the speed at which voice recognition is performed.

Description

Voice recognition device of vehicle and its method {DEVICE AND METHOD FOR RECOGNIZING SOUND IN CAR}

The present invention relates to a voice recognition device for a vehicle, and a method thereof, that receives a driver's voice command in the vehicle and detects the start point and end point of the voice section, and more particularly to a device and method that detect the voice section in the entire input signal using the zero crossing rate and energy.

In general, a vehicle voice recognizer extracts only the pure voice section from the entire input, excluding the noise sections. A conventional vehicle voice recognizer uses energy alone, so detection is inaccurate in the consonant section at the front of the voice section and in the section at its rear, and the voice recognition rate drops considerably. More specifically, voiceless plosives such as the Korean consonants ㅋ, ㅍ, and ㅌ have high frequency and low energy, so the voice section is difficult to detect when only energy is used.
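The point about high-frequency, low-energy sounds can be illustrated with a small sketch. It is not part of the patent: the two synthetic tones, their amplitudes, and the helper names `frame_energy`, `zero_crossings`, and `tone` are all assumptions chosen only to show that a weak high-frequency frame has little energy but many zero crossings.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame (an illustrative energy measure)."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def tone(freq_hz, amplitude, n=100, rate_hz=10_000, phase=0.1):
    """One 10 ms frame (100 samples at 10 kHz) of a sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz + phase)
            for i in range(n)]

# Hypothetical stand-ins: a strong low-frequency frame (vowel-like)
# and a weak high-frequency frame (plosive-like).
vowel_like = tone(200, 1000.0)
plosive_like = tone(2000, 10.0)

print(frame_energy(vowel_like), zero_crossings(vowel_like))
print(frame_energy(plosive_like), zero_crossings(plosive_like))
```

An energy threshold tuned for the vowel-like frame would miss the plosive-like frame, while its zero-crossing count is clearly higher.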

Accordingly, the present invention has been made to solve the above problem, and its object is to provide a voice recognition device for a vehicle, and a method thereof, that can obtain accurate voice recognition results.

To achieve the above object, the voice recognition device of a vehicle according to the present invention is installed in the vehicle to receive the voice of the driver and detect the start point and end point of the voice section, and comprises: a voice recognition start switch that initiates the voice recognition operation; a background-noise microphone through which the background noise of the vehicle is input; a voice-input microphone through which the driver's voice is input when the voice recognition start switch is turned on; and control means that, when the voice recognition start switch is turned on, receives from the background-noise microphone and the voice-input microphone a voice signal containing the background noise and the driver's voice, converts it into a digital signal, divides the voice into frames, and detects the start point and end point of the voice section using the energy and zero crossing rate of the divided voice signal.

To achieve the above object, the voice recognition method of a vehicle according to the present invention receives the voice of the driver and detects the start point and end point of the voice section, and comprises the following steps:

A first step of generating, through a speaker, a sound announcing the start of input when the voice recognition switch is turned on;

A second step of receiving voice data containing the background noise and the driver's voice signal through the background-noise microphone and the voice-input microphone, and dividing the voice data into frames of a predetermined size;

A third step of extracting the average energy of the silent section (five frames) that lies between the moment the voice recognition switch is turned on and the moment the driver's voice is input;

A fourth step of obtaining the energy reference value by adding the minimum energy value to the average energy of the silent section;

A fifth step of calculating the zero crossing rate of each frame of the input voice data using the energy reference value;

A sixth step of calculating the average zero crossing rate of the first five frames of the input voice data;

A seventh step of setting a variable N to 1;

An eighth step of determining whether the zero crossing rate of each frame of the input voice data is equal to or greater than the average zero crossing rate;

A ninth step of setting a variable B to 0 when the zero crossing rate of the frame is equal to or greater than the average zero crossing rate in the eighth step;

A tenth step of determining whether the zero crossing rate of each frame of the input voice data is equal to or greater than the average zero crossing rate;

An eleventh step of incrementing the variables B and N by one each when the zero crossing rate of the frame is equal to or greater than the average zero crossing rate in the tenth step, and then determining whether the variable B has reached 5;

A twelfth step of returning to the tenth step if the variable B has not reached 5 in the eleventh step, and, when the variable B has reached 5, determining the start point of the voice section as the value obtained by subtracting 5 from the variable N;

A thirteenth step of determining whether the zero crossing rate of each frame of the voice data is smaller than the average zero crossing rate;

A fourteenth step of setting a variable S to 0 when the zero crossing rate of the frame is smaller than the average zero crossing rate in the thirteenth step;

A fifteenth step of determining whether the zero crossing rate of each frame of the voice data is smaller than the average zero crossing rate;

A sixteenth step of incrementing the variables S and N by one each when the zero crossing rate of the frame is smaller than the average zero crossing rate in the fifteenth step, and determining whether the variable S is 20; and

A seventeenth step of returning to the fifteenth step if the variable S has not reached 20 in the sixteenth step, and, when the variable S has reached 20, determining the end point of the voice section as the value obtained by subtracting 20 from the variable N.

FIG. 1 is a control block diagram of a voice recognition device of a vehicle according to an embodiment of the present invention.

FIGS. 2A to 2C are flowcharts illustrating a voice recognition method of a vehicle according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating the principle of the zero crossing rate applied to the present invention.

Explanation of symbols for the main parts of the drawings

100: background-noise microphone
110: voice-input microphone
120: voice recognition start switch
200: control means
300: driving device of the vehicle
310: speaker

Hereinafter, a voice recognition device for a vehicle and a method thereof according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a control block diagram of a voice recognition device of a vehicle according to an embodiment of the present invention. The device comprises a voice recognition start switch 120, a background-noise microphone 100, a voice-input microphone 110, control means 200, and a speaker 310.

The voice recognition start switch 120 initiates the voice recognition operation. The background-noise microphone 100 is a device through which the background noise of the vehicle is input, and the voice-input microphone 110 is a device through which the driver's voice is input when the voice recognition start switch 120 is turned on.

The control means 200 consists of an amplifier unit that receives the background noise and the driver's voice signal from the background-noise microphone 100 and the voice-input microphone 110, amplifies them to a predetermined level, and converts them into digital signals; a DSP chip with a built-in control program; and memory elements for storing data.

In particular, when the voice recognition start switch 120 is turned on, the control means 200 receives from the background-noise microphone 100 and the voice-input microphone 110 a voice signal containing the background noise and the driver's voice, converts it into a digital signal, divides the voice into frames, and detects the start point and end point of the voice section using the energy and zero crossing rate of the divided voice signal.

In addition, after voice recognition has been performed, the control means 200 outputs drive control signals for controlling the driving device 300 of the vehicle and the speaker 310.

When the voice recognition start switch 120 is turned on, the speaker 310 receives a control signal output from the control means 200 and generates a sound announcing the start of input.

The operation of the voice recognition device of a vehicle configured as above, and the corresponding method, will now be described.

FIGS. 2A to 2C are flowcharts illustrating a voice recognition method of a vehicle according to an embodiment of the present invention, where S denotes a step.

First, when the voice recognition start switch 120 is turned on (S1), the control means 200 generates a sound announcing the start of input through the speaker 310 (S2).

Next, the control means 200 receives voice data containing the background noise and the driver's voice signal through the background-noise microphone 100 and the voice-input microphone 110 (S3), and divides the voice data into frames of a predetermined size (for example, 10 ms) (S4).
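The framing in step S4 can be sketched as follows. This is only an illustration: the function name `split_into_frames` is hypothetical, the 10 kHz sampling frequency is the one the description mentions together with Equation 2, and dropping a trailing partial frame is an assumption, since the text does not say how leftover samples are handled.

```python
def split_into_frames(samples, sample_rate_hz=10_000, frame_ms=10):
    """Split a sample sequence into consecutive, non-overlapping frames.

    At 10 kHz, a 10 ms frame holds 100 samples.
    Assumption: a trailing partial frame is discarded.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

# 250 samples yield two full 100-sample frames; the last 50 samples are dropped.
frames = split_into_frames(list(range(250)))
print(len(frames), len(frames[0]))
```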

Next, from the divided voice data, the control means 200 extracts the average energy of the silent section (five frames) that lies between the moment the voice recognition start switch 120 is turned on and the moment the driver's voice is input (S5).

The average energy of the silent section is given by Equation 1, and the short-term average energy E(N), when the signal is A/D-converted at a sampling frequency of 10 kHz, is given by Equation 2.

(Equation 1)

(Equation 2)

N = 0, 1, 2, ...

Next, the control means 200 obtains the energy reference values by adding the minimum energy value to the average energy of the silent section (S6), and uses these reference values to calculate the zero crossing rate of each frame of the input voice data (S7).

The energy reference values (ZCR_UP, ZCR_DOWN) in step S6 are given by Equation 3.

(Equation 3)

ZCR_UP, ZCR_DOWN = ± (average energy of the silent section + 20)
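Steps S5 and S6 can be sketched as below. Equations 1 and 2 are not reproduced in this text, so the per-frame energy used here (mean absolute amplitude) is an assumption, as are the function names `silent_average_energy` and `zcr_reference_values`. Only the five-frame silent section and the form of Equation 3, ± (average energy + 20), come from the description.

```python
def silent_average_energy(frames, n_silent=5):
    """Average energy over the first n_silent frames (the silent section).

    Assumption: per-frame energy is the mean absolute amplitude; the patent's
    own definition (Equations 1 and 2) is not available here.
    """
    silent = frames[:n_silent]
    per_frame = [sum(abs(x) for x in f) / len(f) for f in silent]
    return sum(per_frame) / len(per_frame)

def zcr_reference_values(avg_energy, margin=20):
    """Equation 3: ZCR_UP, ZCR_DOWN = +/- (silent-section average energy + 20)."""
    upper = avg_energy + margin
    return upper, -upper

# Five quiet frames followed by one loud frame; only the first five are used.
frames = [[2, -2, 2, -2]] * 5 + [[100, -100, 100, -100]]
avg = silent_average_energy(frames)
zcr_up, zcr_down = zcr_reference_values(avg)
print(avg, zcr_up, zcr_down)
```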

As shown in FIG. 3, the zero crossing rate of each frame of the voice data in step S7 is set to 1 when the signal value of the frame crosses the zero axis (0) and the change includes at least one of the upper reference value (ZCR_UP) or the lower reference value (ZCR_DOWN), and is set to 0 when the signal value crosses the zero axis without including either reference value.
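One possible reading of this rule is sketched below; it is an interpretation, not the patent's stated implementation. Each zero-axis crossing contributes 1 if the half-wave that ends at the crossing reached the upper or lower reference value, and 0 otherwise, and the contributions are summed over the frame. The function name `thresholded_zcr` and the summation over the frame are assumptions.

```python
def thresholded_zcr(frame, upper, lower):
    """Count zero-axis crossings whose preceding half-wave reached a reference value."""
    count = 0
    reached = False   # has the current half-wave reached upper or lower?
    prev_sign = 0     # sign of the last non-zero sample
    for x in frame:
        sign = (x > 0) - (x < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            # The signal crossed the zero axis: score the half-wave that just ended.
            if reached:
                count += 1
            reached = False
        if x >= upper or x <= lower:
            reached = True
        if sign != 0:
            prev_sign = sign
    return count

# Large swings are counted; small swings around zero (noise-like) are ignored.
print(thresholded_zcr([30, -30, 30, -30], 25, -25))
print(thresholded_zcr([5, -5, 5, -5], 25, -25))
```

The design intent suggested by the text is that low-level background noise, which crosses zero often but never reaches the reference values, does not inflate the zero crossing rate.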

Next, the control means 200 calculates the average zero crossing rate (ZCR_TH) of the first five frames of the input voice data (S8), sets a variable N to 1 (S9), and determines whether the zero crossing rate ZCR(N) of the current frame of the input voice data is equal to or greater than ZCR_TH (S10).

If ZCR(N) is equal to or greater than ZCR_TH in step S10 (YES), the control means 200 sets a variable B to 0 (S11) and then determines whether ZCR(N) is equal to or greater than ZCR_TH (S12).

If ZCR(N) is equal to or greater than ZCR_TH in step S12 (YES), the control means 200 increments the variables B and N by one each (S13) and determines whether the variable B has reached 5 (S14).

If the variable B has not reached 5 in step S14 (NO), the process returns to step S12. If the variable B has reached 5 (YES), the control means 200 determines the start point of the voice section as the value obtained by subtracting 5 from the variable N (S15), and then determines whether ZCR(N) is smaller than ZCR_TH (S16).

If ZCR(N) is smaller than ZCR_TH in step S16 (YES), the control means 200 sets a variable S to 0 (S17) and then determines whether ZCR(N) is smaller than ZCR_TH (S18).

If ZCR(N) is smaller than ZCR_TH in step S18 (YES), the variables S and N are incremented by one each (S19), and it is determined whether the variable S is 20 (S20).

If the variable S has not reached 20 in step S20 (NO), the process returns to step S18. If the variable S has reached 20 (YES), the control means 200 determines the end point of the voice section as the value obtained by subtracting 20 from the variable N (S21), and the process ends.

Meanwhile, if in step S10 or S12 the zero crossing rate ZCR(N) of the frame is smaller than the average zero crossing rate ZCR_TH (NO), the control means 200 increments the variable N by one (S22) and returns to step S10.

Likewise, if in step S16 or S18 the zero crossing rate ZCR(N) of the frame is equal to or greater than the average zero crossing rate ZCR_TH (NO), the variable N is incremented by one (S23) and the process returns to step S16.
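The flow from S9 to S23 amounts to looking for 5 consecutive frames at or above ZCR_TH (start point) and then 20 consecutive frames below it (end point). The sketch below follows that flow under stated assumptions: the function name `detect_endpoints` is hypothetical, frames are numbered from 1 as the variable N is, and returning `None` when the data runs out before a point is found is an assumption, because the flowchart does not cover that case.

```python
def detect_endpoints(zcr, zcr_th, start_run=5, end_run=20):
    """Return (start, end) frame numbers, 1-based; zcr[i] belongs to frame i + 1."""
    total = len(zcr)
    n = 1                                  # S9

    # Start point: start_run consecutive frames with ZCR(N) >= ZCR_TH.
    start = None
    b = 0
    while n <= total:
        if zcr[n - 1] >= zcr_th:           # S10 / S12
            b += 1                         # S13
            n += 1
            if b == start_run:             # S14
                start = n - start_run      # S15
                break
        else:
            b = 0                          # S22, then B is reset again at S11
            n += 1
    if start is None:
        return None, None

    # End point: end_run consecutive frames with ZCR(N) < ZCR_TH.
    end = None
    s = 0
    while n <= total:
        if zcr[n - 1] < zcr_th:            # S16 / S18
            s += 1                         # S19
            n += 1
            if s == end_run:               # S20
                end = n - end_run          # S21
                break
        else:
            s = 0                          # S23, then S is reset again at S17
            n += 1
    return start, end

# Frames 1-3 quiet, frames 4-13 active, frames 14-38 quiet.
zcr = [0] * 3 + [5] * 10 + [0] * 25
print(detect_endpoints(zcr, zcr_th=3))
```

In the described method, `zcr_th` would be the average zero crossing rate of the first five frames (S8); here it is passed in directly to keep the sketch short.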

As described above, the voice recognition device of a vehicle and the method thereof according to the present invention detect the voice section using the zero crossing rate and energy of each input voice data frame. The voice section can therefore be detected more accurately in environments with heavy background noise than with the conventional method that uses energy alone, which in turn improves the speed at which voice recognition is performed.

Claims

Claim 1 has been deleted.

Claim 2 has been deleted.

A voice recognition method of a vehicle for receiving the voice of a vehicle driver and detecting the start point and the end point of a voice section, the method comprising:

A first step of generating a sound for notifying input start through a speaker when the voice recognition switch is turned on;

A second step of receiving the voice data including the background noise and the driver's voice signal through the microphone for background noise and the microphone for voice input and dividing the voice data into frame units of a predetermined size;

A third step of extracting an average energy of a silent section (five frames) after the voice recognition switch is turned on among the divided voice data sections and before the driver's voice is input;

A fourth step of extracting a reference value of energy by adding a minimum value of energy to the average energy of the silent section;

A fifth step of calculating a zero crossing rate of each frame of voice data input using the energy reference value;

A sixth step of calculating an average value of zero crossing rates of the first five frames of the input voice data;

A seventh step of setting a variable N to 1;

An eighth step of determining whether the zero crossing rate of each frame of the input voice data is equal to or greater than the average value of the zero crossing rate;

A ninth step of setting the variable B to 0 when the zero crossing rate of each frame is equal to or greater than the average value of the zero crossing rate in the eighth step;

A tenth step of determining whether the zero crossing rate of each frame of the input voice data is equal to or greater than the average value of the zero crossing rate;

An eleventh step of incrementing the variables B and N by one each when, in the tenth step, the zero crossing rate of each frame of the voice data is equal to or greater than the average value of the zero crossing rate, and then determining whether the variable B has reached 5;

A twelfth step of returning to the tenth step if the variable B has not reached 5 in the eleventh step, and, when the variable B has reached 5, determining the start point of the voice section as the value obtained by subtracting 5 from the variable N;

A thirteenth step of determining whether the zero crossing rate of each frame of voice data is smaller than the average value of the zero crossing rate;

A fourteenth step of setting the variable S to 0 when the zero crossing rate of each frame is smaller than the average value of the zero crossing rate in the thirteenth step;

A fifteenth step of determining whether the zero crossing rate of each frame of the voice data is smaller than the average value of the zero crossing rate;

A sixteenth step of incrementing the variables S and N by one each when, in the fifteenth step, the zero crossing rate of each frame of the voice data is smaller than the average value of the zero crossing rate, and determining whether the variable S is 20; and

A seventeenth step of returning to the fifteenth step if the variable S has not reached 20 in the sixteenth step, and, when the variable S has reached 20, determining the end point of the voice section as the value obtained by subtracting 20 from the variable N.

The method of claim 3, wherein

In the fourth step, the reference value of energy is ± (average energy of the silent section + 20).

The method according to claim 3 or 4,

In the fifth step, the zero crossing rate of each frame of the input voice data is set to 1 when the signal value of the frame crosses the zero axis and the change includes at least one of the upper or lower reference values, and is set to 0 when the signal value crosses the zero axis without including either the upper or the lower reference value.

The method of claim 3, wherein

The method further comprises an eighteenth step of incrementing the variable N by one and returning to the eighth step if, in the eighth or tenth step, the zero crossing rate of each frame is smaller than the average value of the zero crossing rate.

The method of claim 3, wherein

The method further comprises a nineteenth step of incrementing the variable N by one and returning to the thirteenth step if, in the thirteenth or fifteenth step, the zero crossing rate of each frame is equal to or greater than the average value of the zero crossing rate.