KR100434723B1

KR100434723B1 - Sporadic noise cancellation apparatus and method utilizing a speech characteristics

Info

Publication number: KR100434723B1
Application number: KR10-2001-0084335A
Authority: KR
Inventors: 한민수; 한현배; 정상배
Original assignee: 주식회사 케이티; 정보통신연구진흥원; 학교법인 한국정보통신학원
Priority date: 2001-12-24
Filing date: 2001-12-24
Publication date: 2004-06-07
Also published as: KR20020023858A

Abstract

The present invention relates to an apparatus and method for removing abrupt noise using the characteristics of a speech signal. Its object is to provide an apparatus and method that apply several noise removal stages so that abrupt noise, which has no specific statistical characteristics or correlation function and whose waveform and frequency-spectrum distribution resemble those of speech, can be removed effectively through a multi-stage removal process.

In an abrupt noise removal apparatus for extracting only the speech signal from a signal mixed with abrupt noise, the invention comprises: a speech analysis unit that analyzes the noisy speech signal to obtain filter coefficients; a band-pass filter with a passband of 90 Hz to 3.4 kHz, as determined by the speech analysis unit; a detection parameter extraction unit that extracts, from the output of the band-pass filter, the parameters used for speech section detection; speech section decision logic that uses the extracted parameters to determine the speech sections, treating any section judged not to be speech as a noise section and removing it; and a recognition unit that receives the sections judged to be speech by the speech section decision logic and performs speech recognition.

With the present invention, various forms of abrupt noise that are difficult to remove can be removed successfully, without the side effects of conventional methods such as distortion of the wanted signal components or removal of parts of the signal. Because unwanted noise components are removed in advance and only the wanted signal components remain, the invention is expected to improve the performance of speech signal processing such as speech recognition, improve the bandwidth efficiency of communication networks including data communication, and improve the compression ratio of speech coding.

Description

SPORADIC NOISE CANCELLATION APPARATUS AND METHOD UTILIZING A SPEECH CHARACTERISTICS

The present invention relates to an apparatus and method for removing abrupt noise using speech signal characteristics, and more particularly to an apparatus and method that remove abrupt noise from a signal containing speech and extract the speech sections without other side effects.

As is well known, the main problem in speech recognition and in the signal processing of speech data is removing the noise contained in the signal. This noise can be divided into noise distributed evenly over the whole signal and sporadic noise that is present only temporarily in particular sections.

Sporadic noise is broadly classified into impulsive noise, which lasts a short time and has a very large amplitude and a simple waveform, and abrupt noise, which lasts relatively longer than impulsive noise, has an amplitude similar to that of speech, and has complex signal components. Examples of abrupt noise include the sound of a desk drawer closing, a door closing, and hands clapping.

Conventionally, methods for removing noise present over the entire signal have been used, including classical filters, adaptive filters, and average-noise removal. Methods for removing impulsive noise, one kind of sporadic noise, include using a variable threshold or using the correlation between the preceding and following signal. However, no useful method has been proposed for removing abrupt noise, which strongly affects system performance.

Speech recognition is basically performed by determining the similarity between the speech pattern to be recognized and a reference pattern, so the speech sections in the speech data to be recognized must be detected as accurately as possible to improve recognition performance.

Speech is generally produced when the airflow generated at the human vocal cords is modulated in the vocal tract, that is, by articulatory organs such as the mouth, nose, tongue, and lips. Speech is broadly divided into voiced and unvoiced sounds, which have different sound sources. Voiced sound is produced when a periodic, impulse-like (pitch) airflow from the vocal cords is shaped by articulatory organs such as the mouth, tongue, nose, and teeth. Unvoiced sound is produced when turbulent airflow from the lungs is articulated as it passes through the articulatory organs. This can be modeled as shown in FIG. 1.

In the speech production model of FIG. 1, a speech signal is generated when the voiced excitation from the impulse train generator (10) or the unvoiced excitation from the noise generator (20) drives the digital filter (30), which corresponds to the vocal tract. Because the characteristics of a speech signal generated by such a model are clearly defined, distinguishing speech sections from non-speech sections is relatively easy; when ambient noise is strong, however, the distinction becomes difficult.
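The model of FIG. 1 can be sketched in a few lines of Python. This is an illustration only: the 8 kHz sampling rate, the single two-pole resonator standing in for the vocal-tract filter (30), the resonance frequencies, and all function names are assumptions that do not appear in the patent.

```python
import math
import random

def impulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean uniform noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def two_pole_filter(excitation, freq_hz, radius, fs):
    """Hypothetical stand-in for the vocal-tract filter: one resonance at freq_hz."""
    a1 = -2.0 * radius * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = radius * radius
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = x - a1 * y1 - a2 * y2  # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

FS = 8000  # assumed sampling rate, not stated in the patent
voiced = two_pole_filter(impulse_train(800, FS // 100), 500.0, 0.95, FS)
unvoiced = two_pole_filter(white_noise(800), 2500.0, 0.95, FS)
```

The same filter routine is driven by either excitation, which mirrors the switch between generators (10) and (20) in the model.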

When conventional noise removal methods are used to attenuate or remove noise, wanted signal components may also be distorted or removed as a side effect. Moreover, no existing method removes abrupt noise effectively. Because the characteristics of abrupt noise differ fundamentally from those of other noise, applying existing methods fails to remove the noise, and even when some noise is removed, the signal components are severely distorted or lost, so abrupt noise cannot be removed in this way.

In other words, removing noise components from the input signal and detecting only the wanted speech signal is a very important problem in speech signal processing, speech coding, and data communication, because it can improve system performance and the compression ratio of speech coding and enable efficient use of communication bandwidth. In particular, the various kinds of abrupt noise that are the leading cause of system malfunction have no specific statistical characteristics or correlation function, and some have waveforms and frequency-spectrum distributions similar to those of speech, which makes them very hard to remove; complete removal is not possible with existing methods.

The present invention has been made in view of the prior art described above. Its object is to provide an apparatus and method for removing abrupt noise using speech signal characteristics that apply several noise removal stages so that abrupt noise, which has no specific statistical characteristics or correlation function and whose waveform and frequency-spectrum distribution resemble those of speech, can be removed effectively through a multi-stage removal process.

FIG. 1 shows a conventional model of speech production based on speech characteristics and the configuration of its filter.

FIG. 2 is a block diagram showing the configuration of an abrupt noise removal apparatus using speech signal characteristics according to an embodiment of the present invention.

FIG. 3 is a waveform diagram showing the characteristics of a speech signal containing abrupt noise and of the signal after the noise is removed, according to an embodiment of the present invention.

FIG. 4 is a flowchart showing the signal flow of the abrupt noise removal apparatus using speech signal characteristics according to an embodiment of the present invention.

* Description of reference numerals for the main parts of the drawings *

40: speech analysis unit, 50: band-pass filter,

60: detection parameter extraction unit, 70: speech section decision logic,

72: short-section signal removal unit, 74: consecutive unvoiced sound removal unit,

76: primary noise removal unit, 78: pitch detection unit,

79: secondary noise removal unit.

To achieve the above object, a preferred embodiment of the present invention provides an abrupt noise removal apparatus using speech signal characteristics, for extracting only the speech signal from a signal mixed with abrupt noise, comprising: a speech analysis unit that analyzes the noisy speech signal to obtain filter coefficients; a band-pass filter with a passband of 90 Hz to 3.4 kHz, as determined by the speech analysis unit; a detection parameter extraction unit that extracts, from the output of the band-pass filter, the parameters used for speech section detection; speech section decision logic that uses the extracted parameters to determine the speech sections, treating any section judged not to be speech as a noise section and removing it; and a recognition unit that receives the sections judged to be speech by the speech section decision logic and performs speech recognition.

Preferably, the speech section decision logic comprises: a short-section signal removal unit that, as the first stage of abrupt noise removal, judges any signal section shorter than a preset length not to be speech and removes it; a consecutive unvoiced sound removal unit that, based on the characteristics of Korean, judges a section consisting only of consecutive unvoiced sounds to be noise and removes it; a primary noise removal unit that, in sections of the signal from the consecutive unvoiced sound removal unit whose spectrum changes sharply, distinguishes noise according to the spectrum variance and the third-order cumulant and removes it; a pitch detection unit that detects, in the signal from the primary noise removal unit, sections whose pitch lies outside a preset pitch range (80 to 300 Hz); and a secondary noise removal unit that removes the sections detected by the pitch detection unit (pitch of 80 Hz or less).

The present invention also provides a method for removing only the abrupt noise from a signal in which abrupt noise and speech are mixed, comprising: passing the mixed signal through a band-pass filter; extracting the parameter values of the signal frame by frame; removing short sections that cannot be regarded as speech; determining whether consecutive unvoiced sounds are present in the signal from which the short sections have been removed; removing the consecutive unvoiced sound sections; distinguishing speech from noise according to the spectrum variance and the third-order cumulant and removing the noise portions; and removing sound sections whose pitch is below a certain frequency.

In this method, the band-pass filter applied to the mixed speech-and-noise signal has a passband of 90 Hz to 3.4 kHz.

In the parameter extraction step, each parameter value is extracted using frames of 128 points, sliding by 64 points at a time.

The short section that is not regarded as a speech section is 6 frames, that is, 96 msec.

Preferably, in the step of removing sound sections of a specific pitch, the specific pitch is a pitch of 80 Hz or less.

The present invention is described in detail below with reference to the drawings.

A speech signal consists of voiced and unvoiced sounds. Compared with unvoiced sound, voiced sound has a longer duration and higher energy, and its frequency components are stronger in the lower part of the speech band.

Unvoiced sound has lower energy than voiced sound, stronger high-band frequency components, and a higher LCR (Level Crossing Ratio) or ZCR (Zero Crossing Rate). Voiced and unvoiced sounds comprise many phonemes, and a speech signal generally contains about 10 phonemes per second.

In Korean, one to three phonemes form one syllable. A vowel, which is voiced, can form a syllable by itself, but a consonant, whether voiced or unvoiced, cannot form a syllable as a single phoneme, and no syllable is formed by a combination of unvoiced sounds alone.

By contrast, the abrupt noise commonly encountered in everyday surroundings lasts from tens of milliseconds to several seconds, and its character varies widely, from a simple sound that repeats to noise made up of several sound components combined.

An apparatus that removes abrupt noise based on these characteristics of speech and noise signals is described in detail below with reference to the accompanying drawings.

FIG. 2 is a block diagram showing the configuration of an abrupt noise removal apparatus using speech signal characteristics according to an embodiment of the present invention.

Referring to FIG. 2, the abrupt noise removal apparatus using speech signal characteristics according to the present invention comprises a speech analysis unit, a band-pass filter, a detection parameter extraction unit, speech section decision logic, and a recognition unit. The speech section decision logic in turn comprises a short-section signal removal unit, a consecutive unvoiced sound removal unit, a primary noise removal unit, a pitch detection unit, and a secondary noise removal unit.

More specifically, reference numeral 40 denotes the speech analysis unit, which analyzes the noisy speech signal to obtain the filter coefficients, and reference numeral 50 denotes the band-pass filter, whose passband is determined by the speech analysis unit, for example 90 Hz to 3.4 kHz.

Reference numeral 60 denotes the detection parameter extraction unit, which extracts the parameters for speech section detection from the output of the band-pass filter (50). Like the parameters used to detect speech sections in a quiet environment, it uses the energy, the correlation between adjacent speech samples, the zero-crossing rate, the speech duration, and so on. Here, the energy is the average energy of the speech samples making up one frame. The correlation between adjacent samples represents the instantaneous variation of the speech signal and can be computed as the ratio of the sum, over the whole frame, of the products of each pair of adjacent samples to the energy. The zero-crossing rate measures the number of sign changes among the samples. The output of the band-pass filter (50) is divided into frames of 128 points, and each parameter is extracted while sliding by 64 points.
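The per-frame parameters named above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and normalising the adjacent-sample correlation by the total (rather than average) frame energy is an assumption, since the text does not specify which is meant.

```python
def frames(signal, frame_len=128, hop=64):
    """Split a sample list into frames of 128 points, sliding by 64 points."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def frame_energy(frame):
    """Average energy of the samples in one frame."""
    return sum(x * x for x in frame) / len(frame)

def adjacent_correlation(frame):
    """Sum of products of adjacent samples divided by the frame energy.
    Assumption: the total frame energy is used as the denominator."""
    total = sum(x * x for x in frame)
    if total == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(frame, frame[1:])) / total

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)
```

With this normalisation a slowly varying (voiced-like) frame gives a correlation near +1, while a rapidly alternating (unvoiced-like) frame gives a value near -1 and a high zero-crossing rate.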

Reference numeral 70 denotes the speech section decision logic, which determines the speech sections using the parameters extracted by the detection parameter extraction unit (60). Any section that the speech section decision logic (70) judges not to be speech is treated as a noise section and removed.

Reference numeral 80 denotes the recognition unit, which receives the sections judged to be speech by the speech section decision logic (70) and performs speech recognition.

In particular, the speech section decision logic (70) comprises a short-section signal removal unit, a consecutive unvoiced sound removal unit, a primary noise removal unit, a pitch detection unit, and a secondary noise removal unit.

Reference numeral 72 denotes the short-section signal removal unit, which, as the first stage of abrupt noise removal, judges any signal section shorter than a preset length not to be speech and removes it. The short-section signal removal unit (72) treats signals of 6 frames (96 msec) or less as noise and removes them.

Reference numeral 74 denotes the consecutive unvoiced sound removal unit, which, based on the characteristics of Korean, judges a section consisting only of consecutive unvoiced sounds to be noise and removes it. Reference numeral 76 denotes the primary noise removal unit, which, in sections of the signal from the consecutive unvoiced sound removal unit (74) whose spectrum changes sharply, distinguishes noise according to the spectrum variance and the third-order cumulant and removes it.

Reference numeral 78 denotes the pitch detection unit, which detects, in the signal from the primary noise removal unit (76), sections whose pitch lies outside the preset pitch range (80 to 300 Hz).

Reference numeral 79 denotes the secondary noise removal unit, which removes the sections detected by the pitch detection unit (78) (pitch of 80 Hz or less).

The function and operation of the abrupt noise removal apparatus using speech signal characteristics according to an embodiment of the present invention, configured as described above, are explained in detail below with reference to the accompanying drawings.

FIG. 3 is a waveform diagram showing the characteristics of a speech signal containing abrupt noise and of the signal after the noise is removed, according to an embodiment of the present invention, and FIG. 4 is a flowchart showing the signal flow of the abrupt noise removal apparatus using speech signal characteristics according to an embodiment of the present invention.

First, the noisy speech signal is passed through the speech analysis unit (40) to obtain the filter coefficients, and the signal is then filtered by the band-pass filter (50), whose passband is determined by the speech analysis unit (40), for example 90 Hz to 3.4 kHz (first step: ST-1).
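The patent does not say how the speech analysis unit derives the filter coefficients. As a stand-in, the sketch below designs a fixed windowed-sinc FIR band-pass filter for the 90 Hz to 3.4 kHz passband; the 8 kHz sampling rate, the 401-tap length, the Hamming window, and the function names are all assumptions made for illustration.

```python
import math

def bandpass_fir(low_hz, high_hz, fs, num_taps=401):
    """Windowed-sinc (Hamming) band-pass FIR coefficients; num_taps should be odd."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (high_hz - low_hz) / fs
        else:
            ideal = (math.sin(2 * math.pi * high_hz * k / fs)
                     - math.sin(2 * math.pi * low_hz * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at one frequency."""
    re = sum(h * math.cos(2 * math.pi * freq_hz * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq_hz * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)

def fir_filter(signal, taps):
    """Direct-form convolution, output truncated to the input length."""
    return [sum(h * signal[i - j] for j, h in enumerate(taps) if i - j >= 0)
            for i in range(len(signal))]

taps = bandpass_fir(90.0, 3400.0, 8000)  # 8 kHz sampling rate is assumed
```

A long filter is used here because the 90 Hz lower edge is close to DC; a shorter design would leave the low-frequency stopband poorly attenuated.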

Next, the output of the band-pass filter (50) is passed through the detection parameter extraction unit (60) to extract the detection parameters for speech section detection. The output of the band-pass filter (50) is divided into frames of 128 points, and each parameter is extracted while sliding by 64 points (second step: ST-2).

The signal from the detection parameter extraction unit (60) is then sent to the speech section decision logic (70), which determines the speech sections using the parameters extracted by the detection parameter extraction unit (60). Any section that the speech section decision logic (70) judges not to be speech is treated as a noise section and removed.

More specifically, as the first stage of abrupt noise removal, the signal from the detection parameter extraction unit (60) is sent to the short-section signal removal unit (72), which determines whether any signal section shorter than the preset length is present.

If a signal section of the preset length or shorter (for example, 6 frames, or 96 msec) is present, it is judged not to be speech and is removed.
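The short-section rule can be sketched as a run-length filter over per-frame activity flags. The flag representation and the function name are assumptions; the patent only states the 6-frame (96 msec) limit. Note that 6 frames equalling 96 msec implies 16 msec per frame, which would match 128 samples at an 8 kHz sampling rate, although the patent does not state the sampling rate.

```python
def remove_short_runs(active, max_short=6):
    """Clear every run of consecutive True flags whose length is <= max_short.

    active: list of booleans, one per frame (True = signal present).
    Returns a new list; the input is not modified.
    """
    cleaned = list(active)
    i = 0
    while i < len(cleaned):
        if not cleaned[i]:
            i += 1
            continue
        j = i
        while j < len(cleaned) and cleaned[j]:
            j += 1                      # j is one past the end of the run
        if j - i <= max_short:
            for k in range(i, j):
                cleaned[k] = False      # too short to be speech: treat as noise
        i = j
    return cleaned
```

For example, a 6-frame burst is cleared while a 7-frame segment is kept.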

The speech signal from the short-section signal removal unit (72) is then passed through the consecutive unvoiced sound removal unit (74), which determines whether consecutive unvoiced sounds are present (third step: ST-3); if so, they are judged to be noise and removed (fourth step: ST-4).
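One way to read the third and fourth steps is as a check on each contiguous sound segment: a segment that contains no voiced frame at all is discarded. The sketch below assumes that frames have already been labelled 'V' (voiced), 'U' (unvoiced), or 'S' (silence or already removed); how that labelling is done, and the function name, are assumptions not taken from the patent.

```python
def remove_unvoiced_only_segments(labels):
    """Replace with 'S' every non-silent segment that contains no voiced frame.

    labels: list of 'V' (voiced), 'U' (unvoiced), 'S' (silence/removed),
    one entry per frame. Returns a new list.
    """
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] == 'S':
            i += 1
            continue
        j = i
        while j < len(out) and out[j] != 'S':
            j += 1                          # segment spans frames i .. j-1
        if 'V' not in out[i:j]:
            out[i:j] = ['S'] * (j - i)      # unvoiced only: judged to be noise
        i = j
    return out
```

Unvoiced frames that are adjacent to voiced frames survive, which reflects the observation that Korean sounds are not formed from unvoiced sounds alone.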

The signal from the consecutive unvoiced sound removal unit (74) then passes through the primary noise removal unit (76), where, in sections whose spectrum changes sharply, noise is distinguished according to the spectrum variance and the third-order cumulant and removed (fifth step: ST-5).
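The patent does not define how the cumulant is computed or which thresholds are applied. For reference, a standard sample estimate of the third-order cumulant of a mean-removed frame, C3(lag1, lag2) = E[x(n) x(n+lag1) x(n+lag2)], can be sketched as follows; the function name and the biased (divide by frame length) estimator are assumptions.

```python
def third_order_cumulant(frame, lag1=0, lag2=0):
    """Biased sample estimate of C3(lag1, lag2) for a mean-removed frame.

    Lags must be non-negative and smaller than the frame length.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    span = n - max(lag1, lag2)          # number of valid triple products
    return sum(x[i] * x[i + lag1] * x[i + lag2] for i in range(span)) / n
```

At zero lags this reduces to the third central moment, so a frame with a symmetric amplitude distribution gives a value near zero, while an asymmetric waveform gives a clearly non-zero value.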

Then, in the signal from the primary noise removal unit (76), sections whose pitch lies outside the preset pitch range (80 to 300 Hz) are detected by the pitch detection unit (78) (sixth step: ST-6); sections whose pitch is at or below the preset frequency are judged to be noise and removed (seventh step: ST-7), and the remaining signal is determined to be the speech signal (eighth step: ST-8).
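The pitch check in the sixth and seventh steps can be illustrated with a simple autocorrelation pitch estimator. The patent does not say which pitch detection method is used, so the estimator, the 8 kHz sampling rate, the 256-sample analysis window, the search range, and all function names below are assumptions.

```python
import math

def estimate_pitch_hz(frame, fs, min_hz=50.0, max_hz=500.0):
    """Return the frequency of the strongest autocorrelation peak in the
    searched lag range, or 0.0 if no positive peak is found."""
    min_lag = int(fs / max_hz)
    max_lag = min(int(fs / min_hz), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag if best_lag else 0.0

def outside_speech_pitch_range(pitch_hz, low_hz=80.0, high_hz=300.0):
    """True when a detected pitch lies outside the preset 80 to 300 Hz range."""
    return pitch_hz > 0.0 and not (low_hz <= pitch_hz <= high_hz)

def is_low_pitch_noise(pitch_hz, limit_hz=80.0):
    """True when a detected pitch is at or below 80 Hz (treated as noise)."""
    return 0.0 < pitch_hz <= limit_hz

FS = 8000  # assumed sampling rate
tone = [math.sin(2 * math.pi * 200 * i / FS) for i in range(256)]
pitch = estimate_pitch_hz(tone, FS)
```

Frames for which no pitch is found return 0.0 and are left alone here, on the assumption that unvoiced speech frames should not be removed by this stage.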

The abrupt noise removal apparatus and method using speech signal characteristics according to the embodiment of the present invention are not limited to the embodiment described above, and various modifications are possible without departing from the technical gist of the invention.

As described above, the abrupt noise removal apparatus and method using speech signal characteristics according to the present invention can successfully remove various forms of abrupt noise that are difficult to remove, without the side effects of conventional methods such as distortion of the wanted signal components or removal of parts of the signal. Because unwanted noise components are removed in advance and only the wanted signal components remain, the invention is expected to improve the performance of speech signal processing such as speech recognition, improve the bandwidth efficiency of communication networks including data communication, and improve the compression ratio of speech coding.

Claims

1. An abrupt noise removal apparatus using speech signal characteristics, for extracting only the speech signal from a signal mixed with abrupt noise, comprising:

a speech analysis unit that analyzes the noisy speech signal to obtain filter coefficients;

a band-pass filter having a passband of 90 Hz to 3.4 kHz, as determined by the speech analysis unit;

a detection parameter extraction unit that extracts, from the output of the band-pass filter, the parameters used for speech section detection;

speech section decision logic that uses the parameters extracted by the detection parameter extraction unit to determine the speech sections, treating any section judged not to be speech as a noise section and removing it; and

a recognition unit that receives the sections judged to be speech signal sections by the speech section decision logic and performs speech recognition.

2. The apparatus of claim 1, wherein the speech section decision logic comprises: a short-section signal removal unit that, as the first stage of abrupt noise removal, judges any signal section shorter than a preset length not to be speech and removes it; a consecutive unvoiced sound removal unit that, based on the characteristics of Korean, judges a section consisting only of consecutive unvoiced sounds to be noise and removes it; a primary noise removal unit that, in sections of the signal from the consecutive unvoiced sound removal unit whose spectrum changes sharply, distinguishes noise according to the spectrum variance and the third-order cumulant and removes it; a pitch detection unit that detects, in the signal from the primary noise removal unit, sections whose pitch lies outside a preset pitch range (80 to 300 Hz); and a secondary noise removal unit that removes the sections detected by the pitch detection unit (pitch of 80 Hz or less).

3. A method for removing abrupt noise using speech signal characteristics, for removing only the abrupt noise from a signal in which abrupt noise and speech are mixed, comprising:

passing the mixed speech-and-noise signal through a band-pass filter;

extracting the parameter values of the signal frame by frame;

removing short sections that cannot be regarded as speech;

determining whether consecutive unvoiced sounds are present in the signal from which the short sections have been removed;

removing the consecutive unvoiced sound sections;

distinguishing speech from noise according to the spectrum variance and the third-order cumulant and removing the noise portions; and

removing sound sections whose pitch is below a certain frequency.

4. The method of claim 3, wherein the band-pass filter applied to the mixed speech-and-noise signal has a passband of 90 Hz to 3.4 kHz.

5. The method of claim 3, wherein, in the parameter extraction step, each parameter value is extracted using frames of 128 points, sliding by 64 points at a time.

6. The method of claim 3, wherein the short section that is not regarded as a speech section is 6 frames, that is, 96 msec.

7. The method of claim 3, wherein the specific pitch in the step of removing sound sections of a specific pitch is a pitch of 80 Hz or less.