KR100334238B1 - Apparatus and method for detecting speech/non-speech using the envelope of speech waveform

Info

Publication number: KR100334238B1
Application number: KR1019990061141A
Authority: KR
Inventors: 김승희; 이영직
Original assignee: 오길록; 한국전자통신연구원
Priority date: 1999-12-23
Filing date: 1999-12-23
Publication date: 2002-05-02
Also published as: KR20010057743A

Abstract

1. Technical field of the claimed invention

The present invention relates to a speech/non-speech discrimination apparatus and method using envelope information of the speech waveform.

2. Technical problem to be solved by the invention

The object of the present invention is to provide a speech/non-speech discrimination apparatus and method that use the frequency characteristics of the speech envelope to determine whether an input signal is speech or non-speech, and a computer-readable recording medium storing a program for implementing the method.

3. Summary of the solution

In a speech/non-speech discrimination apparatus using envelope information of a speech waveform, the present invention comprises: storage means for storing an input speech signal; envelope detection means for extracting an envelope from the signal stored in the storage means; frequency transform means for transforming the envelope signal into the frequency domain; and speech/non-speech discrimination means for determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain by the frequency transform means.

4. Important uses of the invention

The present invention is used in speech processing and related fields.

Description

Apparatus and method for detecting speech/non-speech using the envelope of speech waveform

The present invention relates to a speech/non-speech discrimination apparatus and method that determine whether a signal detected as speech is actually speech, for use in systems that take speech as input, such as speech recognition and speech coding systems. In particular, it relates to a speech/non-speech discrimination apparatus and method that use the frequency information carried by the envelope of the speech waveform to decide accurately between speech and non-speech even in signals where noise and speech coexist, and to a computer-readable recording medium storing a program for implementing the method.

In general, one of the main causes of speech recognizer malfunction in real environments is malfunction of the speech detector. When noise is falsely detected as speech, the recognizer inevitably outputs a misrecognition result. When a recognition result is output even though no speech was input, users of the speech recognition system are bound to have serious doubts about the reliability of the system. Therefore, checking whether a signal that the speech detector has flagged as speech really is speech can be of considerable importance, depending on the application of the system.

Conventional methods for deciding between speech and non-speech fall broadly into two groups: methods that build a speech model and a non-speech model from frequency-domain features and decide according to which model the input signal is closer to, and methods that decide using pitch information in the time domain. That is, conventional speech/non-speech discrimination apparatuses either build separate models of noise and speech and decide according to which model yields the higher probability, or compute the pitch of voiced segments and judge the signal to be speech when given conditions are satisfied. However, because these methods must first extract the relevant information for every frame, they are inherently computationally expensive.

In addition, the conventional model-based method suffers a marked drop in performance when the environment in which the system actually operates differs greatly from the environment in which the models were built. That is, the noise and speech models must be built in advance, and stable performance is difficult to guarantee when the environment at model-building time differs greatly from the environment in which the discrimination system actually runs.

In the method using pitch-related information, the key issue is how insensitive to the environment and how stable the pitch extraction and the conditions used for the speech decision are. That is, extracting the pitch accurately is itself a difficult problem in real environments.

The present invention has been proposed to solve the above problems. Its object is to provide a speech/non-speech discrimination apparatus and method that use the frequency characteristics of the speech envelope to determine whether an input signal is speech or non-speech, and a computer-readable recording medium storing a program for implementing the method.

More specifically, the object of the present invention is to provide a speech/non-speech discrimination apparatus and method that down-sample the signal extracted by the envelope detector in order to reduce the amount of computation, and that decide between speech and non-speech using the ratio of the energy in a specific frequency band of the frequency-transformed envelope to its energy over the entire frequency range, together with a computer-readable recording medium storing a program for implementing the method.

FIG. 1 is an example of a speech recognition system to which the speech/non-speech discrimination apparatus according to the present invention is applied.

FIG. 2 is a block diagram of an embodiment of the speech/non-speech discrimination apparatus using envelope information of a speech waveform according to the present invention.

FIG. 3 is a flowchart of an embodiment of the speech/non-speech discrimination method using envelope information of a speech waveform according to the present invention.

* Description of reference numerals for the main parts of the drawings

11: microphone
12: speech detector
13: recognizer
14, 22: envelope detector
15: envelope analyzer
21: buffer
23: down-sampling unit
24: frequency transform unit
25: speech/non-speech discrimination unit

To achieve the above object, the apparatus of the present invention is a speech/non-speech discrimination apparatus using envelope information of a speech waveform, characterized by comprising: storage means for storing an input speech signal; envelope detection means for extracting an envelope from the signal stored in the storage means; frequency transform means for transforming the envelope signal into the frequency domain; and speech/non-speech discrimination means for determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain by the frequency transform means.

The apparatus of the present invention is further characterized by comprising down-sampling means which, in order to reduce the amount of computation in the frequency transform means, down-samples the envelope signal extracted by the envelope detection means, thereby reducing the number of envelope samples, and passes the result to the frequency transform means.

The method of the present invention is a speech/non-speech discrimination method applied to a speech/non-speech discrimination apparatus using envelope information of a speech waveform, characterized by comprising: a first step of storing a speech input signal; a second step of extracting an envelope from the stored input signal; a third step of transforming the envelope signal into the frequency domain; and a fourth step of determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain.

The method of the present invention is further characterized by comprising a fifth step of down-sampling the extracted envelope signal to reduce the number of envelope samples before proceeding to the third step, so that less computation is needed when the time-domain envelope signal is transformed into the frequency domain in the third step.

The present invention also provides a computer-readable recording medium storing a program that causes a speech/non-speech discrimination apparatus having a processor, in order to determine speech or non-speech using envelope information of a speech waveform, to realize: a first function of storing a speech input signal; a second function of extracting an envelope from the stored input signal; a third function of transforming the envelope signal into the frequency domain; and a fourth function of determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain.

The present invention further provides a computer-readable recording medium storing a program that additionally realizes a fifth function of down-sampling the extracted envelope signal to reduce the number of envelope samples before proceeding to the third function, so that less computation is needed when the time-domain envelope signal is transformed into the frequency domain in the third function.

The gist of the present invention is briefly as follows.

In general, for a speech system that must operate in real environments to be considered good, it must excel not only in accuracy but also in stability and insensitivity to the surrounding environment, and it should have as simple a structure as possible. To satisfy these requirements, the present invention proposes a speech/non-speech discrimination apparatus and method using speech envelope information.

Speech always contains voiced segments, and these are generally longer than unvoiced segments. The speech envelope, which is shaped by energy changes within voiced segments or by the alternation of voiced and unvoiced sounds, originates in the physical movement of the articulators, so its energy is concentrated in a limited frequency range. Moreover, the shape of the speech envelope does not change much even when a fair amount of noise is mixed in, which is a major strength in terms of insensitivity to the surrounding environment. The envelope is also related to phoneme duration, so when its frequency characteristics are analyzed, the energy is found to be concentrated in a specific frequency band.

By exploiting this property of the speech envelope, the present invention therefore makes it possible to discriminate speech from non-speech accurately, even for very noisy input signals, with a small amount of computation and a simple algorithm.

In summary, the energy of the envelope of a human speech signal is mostly concentrated in a specific frequency band, and although this varies with speaking rate and with the kinds of phonemes uttered, the band itself does not shift much. The present invention exploits this property: it obtains the envelope of the signal accepted by the speech detector, transforms it into the frequency domain, and then examines the energy distribution to decide between speech and non-speech.

The above objects, features, and advantages will become clearer from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is an example of a speech recognition system to which the speech/non-speech discrimination apparatus according to the present invention is applied.

First, a signal input through an input device such as the microphone 11 passes through the speech detector 12. The speech detector 12 here is a conventional, general-purpose speech detector. It forwards the input signal from the microphone 11 to the following blocks from the moment it judges that speech has started, and stops forwarding the signal once it judges that the speech has ended. The signal that has passed through the speech detector 12 is input to the recognizer 13 and the envelope detector 14. The recognizer 13 is a general recognizer that includes a feature vector extraction unit.

The envelope detector 14 stores the signal input from the speech detector 12 and then detects the envelope from the stored signal. It passes the detected envelope to the envelope analyzer 15, which analyzes the envelope to determine whether or not the signal is speech. The result of this decision is passed to the speech detector 12 and the recognizer 13. The envelope detector 14 and the envelope analyzer 15 operate independently of the recognizer 13.

When the speech detector 12 receives from the envelope analyzer 15 a decision that the signal input from the microphone 11 is not speech, it stops passing the signal from the microphone 11 to the following blocks and switches to its initial mode, in which it searches for the start of speech.

The recognizer 13 waits in a recognition-ready state. When a signal judged to be speech arrives from the speech detector 12, the recognizer processes it (feature vector extraction and so on), performs recognition, and outputs the recognition result. If, during recognition, it receives from the envelope analyzer 15 a decision that the signal input from the speech detector 12 is not speech, it aborts recognition immediately and returns to the initial recognition-ready state.
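The interaction between the speech detector, the recognizer, and the envelope analyzer described above can be sketched as a small state holder. This is an illustration only: the class name `FrontEnd`, the state labels, and the method names are hypothetical and are not taken from the patent, which describes this behavior only in prose.

```python
from dataclasses import dataclass


@dataclass
class FrontEnd:
    """Hypothetical model of the detector/recognizer states in FIG. 1."""

    detector_mode: str = "initial"    # searching for the start of speech
    recognizer_state: str = "ready"   # waiting for a detected signal

    def on_speech_detected(self):
        # The detector starts forwarding the signal; the recognizer begins work.
        self.detector_mode = "forwarding"
        self.recognizer_state = "recognizing"

    def on_verdict(self, is_speech):
        # A non-speech verdict from the envelope analyzer resets both blocks.
        if not is_speech:
            self.detector_mode = "initial"
            self.recognizer_state = "ready"
```

A speech verdict leaves both blocks untouched, so recognition simply continues; only a non-speech verdict causes a reset.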

FIG. 2 is a block diagram of an embodiment of the speech/non-speech discrimination apparatus using envelope information of a speech waveform according to the present invention.

The speech/non-speech discrimination apparatus using speech envelope information according to the present invention comprises: a buffer 21 for temporarily storing the input signal detected by the speech detector 12; an envelope detector 22 for extracting the envelope from the input signal stored in the buffer 21; a down-sampling unit 23 for down-sampling the envelope signal extracted by the envelope detector 22, thereby reducing the number of envelope samples and hence the computation needed for the frequency transform; a frequency transform unit 24 for transforming the down-sampled envelope signal from the down-sampling unit 23 into the frequency domain; and a speech/non-speech discrimination unit 25 for deciding between speech and non-speech using the ratio of the energy in a specific frequency band to the energy over the entire frequency range of the envelope signal transformed into the frequency domain by the frequency transform unit 24.

Its operation is described in detail below with reference to FIG. 3.

FIG. 3 is a flowchart of an embodiment of the speech/non-speech discrimination method using envelope information of a speech waveform according to the present invention.

First, the signal input from the speech detector 12 is temporarily stored in the buffer 21 for a preset length (preferably 1 to 10 seconds) (31, 32). The envelope detector 22 then extracts the envelope from the input signal stored in the buffer 21 (33).
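The buffering and envelope extraction steps can be sketched as follows. The patent does not say how the envelope detector works internally, so this sketch assumes one common choice, full-wave rectification followed by a moving average. The function names `fill_buffer` and `extract_envelope` and the `window` parameter are hypothetical.

```python
from itertools import islice


def fill_buffer(stream, sample_rate_hz, seconds):
    """Collect a fixed-length segment (the text suggests 1 to 10 seconds)."""
    return list(islice(stream, int(sample_rate_hz * seconds)))


def extract_envelope(samples, window):
    """Assumed envelope detector: rectify, then smooth over `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    rectified = [abs(x) for x in samples]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        envelope.append(running / min(i + 1, window))
    return envelope
```

Other envelope detectors (for example, a peak follower or a Hilbert-transform magnitude) would fit the same place in the pipeline; the moving average is used here only because it is short and easy to check.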

Next, to reduce the computation needed to transform the time-domain envelope signal into the frequency domain, the extracted envelope signal is down-sampled to reduce the number of envelope samples (34). The down-sampled envelope signal is then transformed into the frequency domain (35).
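A minimal sketch of the down-sampling and frequency transform steps is given below. The patent names neither a down-sampling method nor a particular transform, so block averaging and a naive discrete Fourier transform are assumptions made here for illustration. The function names and the `factor` parameter are hypothetical.

```python
import cmath


def downsample(envelope, factor):
    """Average each full block of `factor` samples into one output sample."""
    usable = len(envelope) - len(envelope) % factor
    return [sum(envelope[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def power_spectrum(signal):
    """Naive DFT power for bins 0..N//2 (bin k is k * rate / N hertz)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum
```

To see why down-sampling matters, take the purely illustrative case of an 8 kHz input and a 5 second buffer: the envelope has 40,000 samples, and a factor of 80 leaves 500 samples at 100 Hz. The cost of a naive transform scales with the square of the length, so it drops by a factor of 80² = 6,400. In practice a fast Fourier transform would replace the naive loop, but the saving from fewer samples remains.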

Next, whether the envelope signal transformed into the frequency domain is speech or non-speech is decided using Equation 1 below (36). The envelope of speech will have high energy in a specific frequency band. Accordingly, letting alpha (α) denote the ratio of the energy in that specific frequency band to the energy of the envelope over the entire frequency range, the decision between speech and non-speech is made according to the magnitude of α.

α = (energy of the frequency-transformed envelope from frequency f1 to f2) / (total energy of the frequency-transformed envelope)    (Equation 1)
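Equation 1 is stated in words only. One possible formalization, assuming that E[k] denotes the k-th coefficient of a discrete Fourier transform of the (down-sampled) envelope and f_k the frequency of bin k, is the following. The symbols E[k] and f_k are introduced here for illustration and do not appear in the patent.

```latex
\alpha \;=\; \frac{\displaystyle\sum_{k \,:\, f_1 \le f_k \le f_2} \lvert E[k] \rvert^{2}}
                  {\displaystyle\sum_{k} \lvert E[k] \rvert^{2}},
\qquad 0 \le \alpha \le 1 .
```

Because the numerator sums a subset of the non-negative terms in the denominator, α always lies between 0 and 1, which is what makes a fixed threshold meaningful.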

After the ratio α of the energy in the specific frequency band to the total frequency-domain energy of the envelope signal has been obtained according to Equation 1, the signal is judged to be speech if α is greater than or equal to a threshold and non-speech if α is less than the threshold (37, 38).
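The ratio and threshold test can be sketched as follows, taking a list of per-bin power values as input. The patent gives no numerical values for f1, f2, or the threshold, so these are left as parameters. The function names and the `bin_hz` parameter (spacing between adjacent bins, in hertz) are hypothetical.

```python
def band_energy_ratio(power, bin_hz, f1, f2):
    """Equation 1: energy in [f1, f2] divided by total energy."""
    total = sum(power)
    if total == 0:
        return 0.0  # assumption: an all-zero spectrum is treated as ratio 0
    band = sum(p for k, p in enumerate(power) if f1 <= k * bin_hz <= f2)
    return band / total


def is_speech(power, bin_hz, f1, f2, threshold):
    """Speech if the ratio is at or above the threshold, else non-speech."""
    return band_energy_ratio(power, bin_hz, f1, f2) >= threshold
```

Whether the zero-frequency bin should count toward the total is not specified in the text. Since an envelope is non-negative, its mean can dominate the total energy, so an implementation might subtract the mean first; that would be a design choice, not something the patent states.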

The decision is then sent to the recognizer 13 and the speech detector 12. If the decision is non-speech, the speech detector 12 and the recognizer 13 switch to their initial modes.

The present invention described above is not limited by the foregoing embodiment and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention can accurately determine whether an input signal is speech or non-speech by using the frequency characteristics of the speech envelope. Owing to its simple algorithm and the properties of the speech envelope, it also provides stable performance that is less affected by the environment in which the discrimination apparatus actually operates.

Claims

A speech/non-speech discrimination apparatus using envelope information of a speech waveform, comprising:

storage means for storing an input speech signal;

envelope detection means for extracting an envelope from the signal stored in the storage means;

frequency transform means for transforming the envelope signal into the frequency domain; and

speech/non-speech discrimination means for determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain by the frequency transform means.

The apparatus of claim 1, further comprising down-sampling means which, in order to reduce the amount of computation in the frequency transform means, down-samples the envelope signal extracted by the envelope detection means, thereby reducing the number of envelope samples, and passes the result to the frequency transform means.

The apparatus of claim 1 or 2, wherein the speech/non-speech discrimination means obtains the ratio between the total frequency-domain energy and the energy in the specific frequency band of the envelope signal, and judges the signal to be speech if the ratio is greater than or equal to a threshold and non-speech if the ratio is less than the threshold.

A speech/non-speech discrimination method applied to a speech/non-speech discrimination apparatus using envelope information of a speech waveform, comprising:

a first step of storing a speech input signal;

a second step of extracting an envelope from the stored input signal;

a third step of transforming the envelope signal into the frequency domain; and

a fourth step of determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain.

The method of claim 4, further comprising a fifth step of down-sampling the extracted envelope signal to reduce the number of envelope samples before proceeding to the third step, so that less computation is needed when the time-domain envelope signal is transformed into the frequency domain in the third step.

The method of claim 4 or 5, wherein the fourth step obtains the ratio between the total frequency-domain energy and the energy in the specific frequency band of the envelope signal, and judges the signal to be speech if the ratio is greater than or equal to a threshold and non-speech if the ratio is less than the threshold.

A computer-readable recording medium storing a program that causes a speech/non-speech discrimination apparatus having a processor, in order to determine speech or non-speech using envelope information of a speech waveform, to realize:

a first function of storing a speech input signal;

a second function of extracting an envelope from the stored input signal;

a third function of transforming the envelope signal into the frequency domain; and

a fourth function of determining whether the signal is speech or non-speech according to the ratio between the total frequency-domain energy and the energy in a specific frequency band of the envelope signal transformed into the frequency domain.

The recording medium of claim 7, wherein the program further realizes a fifth function of down-sampling the extracted envelope signal to reduce the number of envelope samples before proceeding to the third function, so that less computation is needed when the time-domain envelope signal is transformed into the frequency domain in the third function.