KR20070050694A

KR20070050694A - Method and apparatus for removing noise of multi-channel voice signal

Info

Publication number: KR20070050694A
Application number: KR1020050108226A
Authority: KR
Inventors: 고한석; 안성주
Original assignee: 고려대학교 산학협력단
Priority date: 2005-11-11
Filing date: 2005-11-11
Publication date: 2007-05-16
Also published as: KR100751921B1

Abstract

Disclosed are a method and an apparatus for removing noise from a multi-channel voice signal. The noise removal method comprises: (a) removing, from a voice signal, noise components inherent to the environment; (b) separating speech and noise in the voice signal from which the inherent noise components have been removed; and (c) a post-processing step of removing residual noise remaining in the separated speech.

According to the present invention, noise is removed at the input stage of a multi-channel noise removal system, which yields better performance than single-channel noise processing methods and improves the performance of the overall system.

Description

Method and apparatus for removing noise of multi-channel voice signal

FIG. 1 is a schematic diagram of a conventional GSC noise removal method.

FIG. 2 is a block diagram of an embodiment of the apparatus for removing noise from a multi-channel voice signal according to the present invention.

FIG. 3 is a block diagram showing the voice & noise separation unit 220 of FIG. 2 in more detail.

FIG. 4 is a block diagram showing the time delay compensation unit 300 of FIG. 3 in more detail.

FIG. 5 is a block diagram showing the eigen filtering unit 340 in more detail.

FIG. 6 is a flowchart of an embodiment of the method for removing noise from a multi-channel voice signal according to the present invention.

FIG. 7 is a flowchart showing the speech and noise separation process (step 620) of FIG. 6 in more detail.

The present invention relates to speech recognition, and more particularly to a method and apparatus for removing noise from a multi-channel voice signal.

Speech recognition and communication systems perform well in noise-free or relatively quiet laboratory environments. In practical field use, however, recognition performance degrades markedly because of various noise sources. Because background noise and interfering signals significantly degrade a speech recognition system, speech preprocessing that eliminates or mitigates the effect of unwanted signals entering the microphone is essential for satisfactory performance in real environments. Such preprocessing techniques for noise removal have been studied extensively over the past several decades. In speech signal processing in particular, single-microphone preprocessing algorithms have been the main noise removal approach because of their low computational cost and ease of implementation, and their performance has steadily improved. However, because they presuppose an accurate estimate of the noise component, they are effective only for stationary (unchanging) noise when information about the noise is insufficient.
Methods have therefore been developed that use several microphones to obtain richer information about the speech signal and the noise, and use that information to remove noise or enhance the speech signal. Multi-microphone methods, however, have the drawback of increased computational load and processing time. Removing noise with two microphones is therefore a suitable compromise.

Among existing multi-channel noise processing methods, Adaptive Noise Cancelling (ANC) assumes that only noise is present at the input of the reference microphone (the second of the two microphones). This assumption is difficult to satisfy: because the two microphones are not completely isolated from each other, the reference microphone also picks up a substantial amount of the desired speech signal. For ANC to be effective in practice, the noise at the primary (first) microphone and at the reference microphone must be highly correlated, which requires placing the two microphones close together. In that case, however, a large amount of the desired speech is also present at the reference microphone (so-called cross-talk interference), and ANC then removes the desired signal along with the noise, which is a major cause of performance degradation.

Another of the most widely known methods is the Generalized Sidelobe Canceller (GSC). FIG. 1 is a schematic diagram of a conventional GSC noise removal method. In FIG. 1, the GSC consists of a mixed-signal pattern generator 100, a pure-noise pattern generator 102, an adaptive filter 104, and a subtractor 106. The mixed-signal pattern generator 100 halves the sum of the two inputs to produce a mixture of the original speech and noise. The pure-noise pattern generator 102 takes the difference of the two inputs to cancel the speech component and halves it, producing the pure-noise difference component of the two inputs. The adaptive filter 104 takes the pure-noise pattern generated by the pure-noise pattern generator 102 as its input and estimates, through adaptive filtering, the noise component in the mixed-signal pattern. The subtractor 106 subtracts the estimated noise from the mixed signal and finally outputs the speech component.
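The conventional two-microphone GSC structure described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `gsc_two_mic`, the use of an NLMS update for the adaptive filter 104, and the parameters `taps`, `mu`, and `eps` are hypothetical choices that do not appear in the source.

```python
import numpy as np

def gsc_two_mic(x1, x2, taps=16, mu=0.5, eps=1e-8):
    """Two-microphone GSC sketch: sum/difference paths plus an NLMS adaptive filter."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    mixed = 0.5 * (x1 + x2)      # block 100: speech + noise mixture
    noise_ref = 0.5 * (x1 - x2)  # block 102: speech cancels only if both paths match
    w = np.zeros(taps)           # block 104: adaptive filter weights (NLMS is an assumption)
    out = np.zeros_like(mixed)
    for n in range(len(mixed)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start
        seg = noise_ref[max(0, n - taps + 1): n + 1][::-1]
        u = np.zeros(taps)
        u[:len(seg)] = seg
        noise_est = w @ u                      # estimated noise in the mixture
        out[n] = mixed[n] - noise_est          # block 106: subtract the estimate
        w += mu * out[n] * u / (u @ u + eps)   # NLMS weight update
    return out
```

In this sketch the speech reaches both microphones identically, so the difference path is a clean noise reference; when the paths differ, speech leaks into `noise_ref` and part of the speech is subtracted at the output.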

However, when the conventional GSC noise removal method is applied as is, the paths to the two microphones are never ideally identical in a real environment, so the noise component produced by the pure-noise pattern generator 102 inevitably contains leaked speech. This prevents the adaptive filter 104 of the GSC from estimating the noise accurately and causes the original speech component to be subtracted in the final subtraction step at the subtractor 106, which degrades the performance of the overall noise removal system.

Cross-talk interference, in which a substantial amount of the desired speech is present at the reference microphone, is therefore unavoidable. Because the two microphones cannot be completely isolated and a noise-only signal cannot be obtained at the reference microphone, a new noise processing method that copes with this situation is required.

The technical problem addressed by the present invention is to provide a method and apparatus for removing noise from a multi-channel voice signal that obtain a cleaner speech signal with the noise components removed, thereby improving sound quality and the performance of speech recognition systems.

To solve the above technical problem, the method for removing noise from a multi-channel voice signal according to the present invention comprises: (a) removing, from a voice signal, noise components inherent to the environment; (b) separating speech and noise in the voice signal from which the inherent noise components have been removed; and (c) a post-processing step of removing residual noise remaining in the separated speech.

In step (a), the inherent noise components are preferably removed by analyzing the frequency content of the noise and applying at least one of a high-pass filter, a low-pass filter, and a band-pass filter.

Step (b) preferably comprises: (b1) compensating for the time delay between the input signals x₁ and x₂, which arises from the time taken to reach each microphone from the sound source, and generating y₁ and y₂, in which the speech component and the noise component contained in the channels are respectively enhanced; (b2) obtaining a data matrix Y for y₁ and y₂ in each frame; and (b3) obtaining an eigen filter B(z) using the data matrix Y to separate speech and noise.

Step (b1) preferably comprises: performing cross-correlation on the input signals x₁ and x₂; obtaining, from the cross-correlation, the time delay of the input signals x₁ and x₂ in arriving at each microphone from the sound source; shifting the input signals x₁ and x₂ by the time delay to generate synchronized signals x'₁ and x'₂; and obtaining y₁ and y₂ as half the sum and half the difference of x'₁ and x'₂, respectively.

Step (b3) preferably comprises: obtaining the eigen filter B(z) using the data matrix Y; constructing a polynomial from the components of a selected eigenvector, finding the roots of the polynomial, moving them inside the unit circle, and constructing a polynomial from the moved roots to generate a filter A(z); and dividing the eigen filter B(z) by the filter A(z) to obtain an infinite impulse response filter H(z) (= B(z) / A(z)) that separates speech and noise.

In step (c), the residual noise remaining in the speech signal is preferably removed using single-channel noise estimation. The residual noise removal preferably comprises: applying voice activity detection (VAD) to find speech intervals and noise intervals; and subtracting a noise signal, periodically updated from the noise intervals, from the signal generated in step (b). The update is preferably performed as a weighted sum of the previous noise estimate and the current noise value.

To solve the above technical problem, the apparatus for removing noise from a multi-channel voice signal according to the present invention preferably comprises: an inherent noise removal unit that removes, from a voice signal, noise components inherent to the environment; a voice & noise separation unit that separates speech and noise in the voice signal from which the inherent noise components have been removed; and a post-processing unit that removes residual noise remaining in the separated speech. The inherent noise removal unit preferably removes the inherent noise components by analyzing the frequency content of the noise and applying at least one of a high-pass filter, a low-pass filter, and a band-pass filter.

The voice & noise separation unit preferably comprises: a time delay compensation unit that compensates for the time delay between the input signals x₁ and x₂, which arises from the time taken to reach each microphone from the sound source, and generates y₁ and y₂, in which the speech component and the noise component contained in the channels are respectively enhanced; a data matrix generation unit that obtains a data matrix Y for y₁ and y₂ in each frame; and an eigen filtering unit that obtains an eigen filter B(z) using the data matrix Y to separate speech and noise.

The time delay compensation unit preferably comprises: a cross-correlation unit that performs cross-correlation on the input signals x₁ and x₂; a time delay acquisition unit that obtains, from the cross-correlation, the time delay of the input signals x₁ and x₂ in arriving at each microphone from the sound source; a synchronization unit that shifts the input signals x₁ and x₂ by the time delay to generate synchronized signals x'₁ and x'₂; and a y₁ & y₂ generation unit that obtains y₁ and y₂ as half the sum and half the difference of x'₁ and x'₂, respectively.

The eigen filtering unit preferably comprises: an eigen filter generation unit that obtains the eigen filter B(z) using the data matrix Y; an A(z) generation unit that constructs a polynomial from the components of a selected eigenvector, finds the roots of the polynomial, moves them inside the unit circle, and constructs a polynomial from the moved roots to generate a filter A(z); and an infinite impulse response filtering unit that divides the eigen filter B(z) by the filter A(z) to obtain an infinite impulse response filter H(z) (= B(z) / A(z)) and separates speech and noise.

The post-processing unit preferably removes the residual noise remaining in the speech signal using single-channel noise estimation. The residual noise removal is preferably performed by applying voice activity detection (VAD) to find speech intervals and noise intervals, and subtracting a noise signal, periodically updated from the noise intervals, from the signal output by the voice & noise separation unit. The update is preferably performed as a weighted sum of the previous noise estimate and the current noise value.

The invention also provides a computer-readable recording medium on which a program for executing the above-described invention on a computer is recorded.

Hereinafter, the method and apparatus for removing noise from a multi-channel voice signal according to the present invention are described in detail with reference to the accompanying drawings. FIG. 2 is a block diagram of an embodiment of the apparatus for removing noise from a multi-channel voice signal according to the present invention, which comprises an inherent noise removal unit 200, a voice & noise separation unit 220, and a post-processing unit 240.

The inherent noise removal unit 200 removes, from the voice signal, noise components inherent to the environment. The inherent noise components can be removed by analyzing the frequency content of the noise and applying at least one of a high-pass filter, a low-pass filter, and a band-pass filter.

The voice & noise separation unit 220 separates speech and noise in the voice signal from which the inherent noise components have been removed. FIG. 3 is a block diagram showing the voice & noise separation unit 220 in more detail; it comprises a time delay compensation unit 300, a data matrix generation unit 320, and an eigen filtering unit 340.

The time delay compensation unit 300 compensates for the time delay between the input signals x₁ and x₂, which arises from the time taken to reach each microphone from the sound source, and generates y₁ and y₂, in which the speech component and the noise component contained in the channels are respectively enhanced. FIG. 4 is a block diagram showing the time delay compensation unit 300 in more detail; it comprises a cross-correlation unit 400, a time delay acquisition unit 420, a synchronization unit 440, and a y₁ & y₂ generation unit 460. The cross-correlation unit 400 performs cross-correlation on the input signals x₁ and x₂. The time delay acquisition unit 420 uses the cross-correlation to obtain the time delay of the input signals x₁ and x₂ in arriving at each microphone from the sound source. The synchronization unit 440 shifts the input signals x₁ and x₂ by the time delay to generate synchronized signals x'₁ and x'₂. The y₁ & y₂ generation unit 460 obtains y₁ and y₂ as half the sum and half the difference of x'₁ and x'₂, respectively.

The data matrix generation unit 320 obtains a data matrix Y for y₁ and y₂ in each frame.

The eigen filtering unit 340 obtains an eigen filter B(z) using the data matrix Y to separate speech and noise. FIG. 5 is a block diagram showing the eigen filtering unit 340 in more detail; it comprises an eigen filter generation unit 500, an A(z) generation unit 520, and an infinite impulse response filtering unit 540. The eigen filter generation unit 500 obtains the eigen filter B(z) using the data matrix Y. The A(z) generation unit 520 constructs a polynomial from the components of a selected eigenvector, finds the roots of the polynomial, moves them inside the unit circle, and constructs a polynomial from the moved roots to generate the filter A(z). The infinite impulse response filtering unit 540 divides the eigen filter B(z) by the filter A(z) to obtain the infinite impulse response filter H(z) (= B(z) / A(z)) and separates speech and noise.

The post-processing unit 240 removes residual noise remaining in the separated speech. The post-processing unit 240 preferably removes the residual noise remaining in the speech signal using single-channel noise estimation. The residual noise removal can be performed by applying voice activity detection (VAD) to find speech intervals and noise intervals, and subtracting a noise signal, periodically updated from the noise intervals, from the signal output by the voice & noise separation unit. The update can be performed as a weighted sum of the previous noise estimate and the current noise value.

FIG. 6 is a flowchart of an embodiment of the method for removing noise from a multi-channel voice signal according to the present invention; the method is described below with reference to FIG. 6.

Removal of only the inherent noise components (step 600) proceeds as follows. The noises found in different environments each contain inherent components that characterize that noise. To determine the characteristics of these components, a frequency analysis is performed on signals from intervals in which only the noise in question is present. Using the noise characteristics thus analyzed, a filter that removes that noise as a first stage is implemented, and the noise is removed.

In a moving car in particular, the noise generated in the vehicle environment is very strong in the low-frequency range. Noise generated while driving generally consists mostly of low-frequency components from wind noise, tire noise, engine noise, and the like. Depending on the driving environment and the condition of the vehicle, car noise generally has its peak power between 100 and 800 Hz. Below 1 kHz the noise spectrum level falls at 6 dB/octave, whereas above 1 kHz it falls rapidly at 12 dB/octave. The spectral power of speech has a shape similar to that of the noise, so it is difficult to separate speech and noise completely. Nevertheless, a cleaner speech signal can be obtained by reducing the low-frequency components, where most of the noise is concentrated. With this in mind, performance can be improved by applying a high-pass filter with a cutoff frequency of 200 to 300 Hz. Equation 1 is an example of a simple high-pass filter (cutoff frequency = 240 Hz).
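As an illustration of the first-stage filtering, the following is a minimal sketch of a first-order high-pass filter with a 240 Hz cutoff. It is not the filter of Equation 1, whose coefficients are not reproduced here; the RC-style difference equation, the function name `highpass`, and the 16 kHz sampling rate `fs_hz` are assumptions.

```python
import math

def highpass(x, cutoff_hz=240.0, fs_hz=16000.0):
    """First-order (RC-style) high-pass filter; fs_hz is an assumed sampling rate."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant for the chosen cutoff
    dt = 1.0 / fs_hz                         # sampling interval
    alpha = rc / (rc + dt)
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        cur = alpha * (prev_y + sample - prev_x)
        y.append(cur)
        prev_x, prev_y = sample, cur
    return y
```

With these assumed values, a 50 Hz component is attenuated to roughly a fifth of its amplitude while a 4 kHz component passes almost unchanged; a steeper filter would be needed for stronger low-frequency rejection.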

Next, in the separation of speech and noise (step 620), a filter is implemented that extracts only the speech component using the two input signals; this proceeds as follows. FIG. 7 is a flowchart showing the speech and noise separation process (step 620) in more detail.

Under the assumption that the speech signal and the noise signal in the two microphone inputs are independent of each other, this method applies a signal separation technique to separate the two signals. That is, using the principle of signal separation, a filter is implemented that extracts only the speech component, so that only the speech signal with the noise removed is obtained.

The method processes the signals from the two microphones frame by frame. That is, a window is applied and shifted along the signal with overlap between successive frames.
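The frame-by-frame processing with a window, overlap, and shift might look like the sketch below. The Hamming window, the frame length of 256 samples, and the hop of 128 samples (50% overlap) are assumed values; the source states only that a window is applied and shifted with overlap.

```python
import math

def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping Hamming-windowed frames (trailing partial frame dropped)."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        # Apply the window to the samples of this frame
        frames.append([x[start + n] * window[n] for n in range(frame_len)])
    return frames
```

Each returned frame would then be passed through the delay compensation and eigen filtering steps described next.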

The noisy speech signals entering the two channels are highly correlated with each other, so the cross-correlation of the two channel signals can be computed (step 700). From it, the delay between the two microphone signals, caused by the different arrival times from the sound source at each microphone, can be obtained (step 710). First, given the signals x₁ and x₂ of the two channels (microphones), the cross-correlation can be obtained as in Equation 2.
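A minimal sketch of a cross-correlation between the two channel signals over a range of lags is given below. The exact normalization used in the source equation is not reproduced here, so the plain sum of products, the function name `cross_correlation`, and the `max_lag` parameter are assumptions.

```python
def cross_correlation(x1, x2, max_lag):
    """Return {lag: sum_n x1[n] * x2[n + lag]} for lag in [-max_lag, max_lag]."""
    n_samples = min(len(x1), len(x2))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(n_samples):
            m = n + lag
            if 0 <= m < n_samples:   # skip products that fall outside the frame
                total += x1[n] * x2[m]
        result[lag] = total
    return result
```

For example, if `x2` is a copy of `x1` delayed by two samples, the largest value appears at lag 2.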

Next, the lag at which the cross-correlation takes its largest value is found to determine how much the two signals are delayed relative to each other. Using this delay, each signal is shifted by the delay to obtain the synchronized signals x'₁ and x'₂ (step 720), and the new input signals y₁ and y₂ are then constructed as in Equation 3, as half the sum and half the difference of x'₁ and x'₂ (step 730).
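The delay estimation, synchronization, and sum/difference steps can be sketched together as follows. This is an illustration under assumptions: the function name `delay_compensate` is hypothetical, the alignment is done by trimming samples rather than by any specific shifting scheme from the source, and the sign convention of the difference (first signal minus second) is assumed.

```python
import numpy as np

def delay_compensate(x1, x2):
    """Estimate the inter-channel delay, align the channels, and form y1, y2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # corr[i] corresponds to lag k = i - (len(x1) - 1), with sum_n x2[n + k] * x1[n]
    corr = np.correlate(x2, x1, mode="full")
    delay = int(np.argmax(corr)) - (len(x1) - 1)   # > 0 means x2 lags x1
    if delay >= 0:
        a, b = x1[:len(x1) - delay], x2[delay:]
    else:
        a, b = x1[-delay:], x2[:len(x2) + delay]
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]            # synchronized signals x'1 and x'2
    y1 = 0.5 * (a + b)             # half the sum: speech-enhanced channel
    y2 = 0.5 * (a - b)             # half the difference: noise-enhanced channel
    return delay, y1, y2
```

When the only difference between the channels is a pure delay, `y2` is zero after alignment; in practice it carries the residual noise difference between the channels.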

Next, for each frame, a data matrix Y is obtained for the newly constructed input signals y₁ and y₂ (step 740).

Here, p is set to a value satisfying […], k is the number of unwanted signals, and N is the length of one frame.

Using the data matrix thus obtained, the correlation matrices are computed as in Equation 5.
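One common way to build a data matrix from a frame and derive a correlation matrix from it is sketched below. The exact layout of the source's data matrix and its correlation formula are not reproduced in this text, so the sliding-window (delay-embedded) layout, the division by the number of rows, and the function names are assumptions; `p` is the number of columns and the frame length N is `len(y)`.

```python
import numpy as np

def data_matrix(y, p):
    """Stack length-p sliding windows of one frame y as rows: shape (N - p + 1, p)."""
    y = np.asarray(y, dtype=float)
    return np.array([y[i:i + p] for i in range(len(y) - p + 1)])

def correlation_matrix(Y):
    """Sample correlation matrix computed from the rows of Y: shape (p, p)."""
    return Y.T @ Y / Y.shape[0]
```

Applying both functions to the frames of y₁ and y₂ separately gives one p-by-p correlation matrix per channel.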

Next, the ratio matrix is obtained using the following relation.

Eigenvalue decomposition is applied to the ratio matrix to obtain its eigenvalues and eigenvectors. These are sorted to obtain the smallest eigenvalues and the corresponding eigenvectors. The components associated with small eigenvalues relate to the speech signal, and those associated with large eigenvalues correspond to noise.
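The eigenvalue decomposition and selection of the smallest eigenvalue can be sketched as follows. The relation defining the ratio matrix is not reproduced in this text, so forming it as inv(R_b) @ R_a from two correlation matrices is purely an assumption, as are the names `R_a`, `R_b`, and `smallest_eigvec`.

```python
import numpy as np

def smallest_eigvec(R_a, R_b):
    """Eigen-decompose an assumed ratio matrix inv(R_b) @ R_a and sort ascending."""
    ratio = np.linalg.solve(R_b, R_a)        # inv(R_b) @ R_a without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(ratio)
    order = np.argsort(eigvals.real)         # ascending eigenvalues
    eigvals = eigvals.real[order]
    eigvecs = eigvecs.real[:, order]
    # Column 0 is the eigenvector paired with the smallest eigenvalue
    return eigvals, eigvecs, eigvecs[:, 0]
```

The elements of the selected eigenvector would then serve as the coefficients of the eigen filter B(z).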

From the eigenvectors thus obtained, an eigen filter B(z) can be obtained (step 750).

To improve the frequency response of the eigen filter B(z), a filter A(z) is obtained (step 760), and finally an Infinite Impulse Response (IIR) filter H(z) is constructed (step 770).

A(z) is obtained as follows. A polynomial is constructed from the elements of the selected eigenvector, its roots are found, and the roots are moved slightly inside the unit circle. A polynomial is then constructed again from the moved roots, and its coefficients become the coefficients of A(z). In this way the frequency response of H(z) is normalized. With the filter constructed as above, each frame of the noisy speech signal is filtered as in the following equation to obtain the desired speech signal with the noise removed.
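The construction of A(z) from the roots of B(z) and the resulting IIR filtering can be sketched as below. The source says only that the roots are moved slightly inside the unit circle; scaling every root radially by a factor `rho` (here 0.95) is an assumed way of doing so, and it yields a stable filter only when the roots of B(z) lie on or near the unit circle. The function names are hypothetical.

```python
import numpy as np

def make_a_from_b(b, rho=0.95):
    """Build A(z) by scaling the roots of B(z) toward the origin by rho (assumed rule)."""
    roots = np.roots(np.asarray(b, dtype=float))
    moved = rho * roots                  # roots moved inside the unit circle
    return np.real(np.poly(moved))       # monic polynomial rebuilt from the moved roots

def iir_filter(b, a, x):
    """Difference equation for H(z) = B(z) / A(z) applied to one frame x."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y
```

Because each pole sits just inside the corresponding zero, H(z) is close to flat away from the zeros and forms a narrow notch at them, which is one reading of the statement that the frequency response is normalized.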

Here, the operator denotes the convolution operation.
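The construction of A(z) and the per-frame filtering with H(z) = B(z)/A(z) can be sketched in Python (NumPy) as follows. This is an illustration under stated assumptions, not the patented implementation: "moving the roots inward" is read here as scaling every root radially by a factor `shrink` slightly below 1, and the coefficient vector `b` is a made-up example rather than a real eigenvector. The names `make_a_from_b` and `iir_filter` are hypothetical. The resulting filter is stable only if all scaled roots lie inside the unit circle.

```python
import numpy as np

def make_a_from_b(b, shrink=0.95):
    """Build A(z) by scaling the roots of B(z) toward the origin.

    Assumption: a single radial scale factor `shrink` (< 1) is applied to all roots.
    """
    roots = np.roots(b)             # roots of the polynomial with coefficients b
    moved = shrink * roots          # move each root toward the inside of the unit circle
    return np.real(np.poly(moved))  # monic polynomial rebuilt from the moved roots

def iir_filter(b, a, x):
    """Apply H(z) = B(z)/A(z) to one frame x using the direct-form difference equation."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):     # feed-forward part (numerator B)
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):  # feedback part (denominator A)
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc / a[0]
    return y

# Hypothetical B(z) whose two roots lie on the unit circle.
b = np.array([1.0, -1.8, 1.0])
a = make_a_from_b(b)        # poles at radius 0.95, same angles as the zeros
frame = np.zeros(8)
frame[0] = 1.0              # unit impulse as a stand-in for one signal frame
h = iir_filter(b, a, frame) # first samples of the impulse response of H(z)
```

Because each pole sits just inside the corresponding zero, the magnitude response of H(z) stays close to 1 away from the zero frequencies, which is one way to read the statement that the frequency response is normalized.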

The third step (640), a post-processing step for removing residual noise, proceeds as follows. The signal obtained through step 2 has already had noise removed, but this step removes the residual noise that still remains. It estimates and subtracts the residual noise in a manner similar to spectral subtraction, a 1-channel noise removal method. Voice activity detection (VAD) is applied to find the speech and noise sections, and the noise estimate is periodically updated from the noise signals in the noise sections. The residual noise is removed by subtracting this noise estimate from the signal that has passed through step 2.
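A minimal Python (NumPy) sketch of this post-processing step is shown below. It assumes that the subtraction is done on magnitude spectra, as in conventional spectral subtraction, and that the VAD decision for each frame is supplied by the caller. The function name `spectral_subtract`, the weight `alpha`, and the half-wave rectification at zero are illustrative assumptions. The weighted-sum update of the noise estimate follows the wording of the claims.

```python
import numpy as np

def spectral_subtract(frames, is_speech, alpha=0.9):
    """Subtract a running noise-magnitude estimate from each frame.

    frames    : 2-D array, one time-domain frame per row (output of step 2)
    is_speech : per-frame VAD decisions (True = speech, False = noise only)
    alpha     : weight of the previous noise estimate (assumed value)
    """
    noise_mag = None
    out = []
    for frame, speech in zip(frames, is_speech):
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        if not speech:
            # Weighted sum of the previous noise estimate and the current noise value.
            noise_mag = mag if noise_mag is None else alpha * noise_mag + (1 - alpha) * mag
        if noise_mag is None:
            clean_mag = mag                                # no noise estimate available yet
        else:
            clean_mag = np.maximum(mag - noise_mag, 0.0)   # clip negative magnitudes to zero
        out.append(np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame)))
    return np.array(out)
```

A larger `alpha` makes the noise estimate change more slowly, which suits stationary noise; a smaller value tracks changing noise faster at the cost of a noisier estimate.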

Through the above process, the noise removal method using two channels (two microphones) yields a cleaner speech signal from which noise components have been removed, and can therefore be used to improve sound quality and the performance of speech recognition systems.

Table 1 compares the input/output signal-to-noise ratio (SNR) values and the speech recognition rates for the case where no noise removal technique is applied and the cases where 1-channel and 2-channel noise removal techniques are applied. The database used in this experiment is Car01 DB, collected in an in-car noise environment during high-speed driving at 80 km/h. The utterance list consists of navigation, car accessory, car audio, and dialing commands and route guidance words.

Two microphone positions, channel 3 and channel 5, were used.

The experimental results in Table 1 show better performance than the results of the earlier paper (Sungjoo Ahn and Hanseok Ko, "Background Noise Reduction via Dual-channel Scheme for Speech Recognition in Vehicular Environment", IEEE Transactions on Consumer Electronics, Vol. 51, No. 1, pp. 22-27, Feb. 2005). Compared with that paper, a post-processing unit (240 in FIG. 2) has been added, and both the SNR and the speech recognition rate are improved. It can also be seen that performance improves considerably when the inherent noise removing unit (200 in FIG. 2) proposed in the present invention is applied to the 2-channel GSC-based noise removal method.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, where "computer" includes any device having an information processing function. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of computer-readable recording devices include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

According to the method and apparatus for removing noise of a multichannel speech signal of the present invention, noise is removed at the input stage of the multichannel noise removal system. As a result, better performance can be obtained than with noise processing methods in a single-channel environment, and the performance of the overall system can be improved.

In addition, removing the noise components inherently present in each noise environment prevents degradation of system performance. The method also outperforms other noise processing methods in a 2-channel environment, which makes it suitable for sound quality enhancement and speech recognition systems. Furthermore, speech preprocessing technology such as that of the present invention provides a basis for applying speech recognition and interface technology, previously used only in office environments, to noisy environments.

Claims

1. A method for removing noise from a multichannel speech signal, the method comprising:

(a) removing, from a speech signal, a noise component inherent to the environment;

(b) separating speech and noise from the speech signal from which the inherent noise component has been removed; and

(c) a post-processing step of removing residual noise remaining in the separated speech.

2. The method of claim 1, wherein, in step (a), the inherent noise component is removed using at least one of a high-pass filter, a low-pass filter, and a band-pass filter, based on frequency analysis of the noise.

3. The method of claim 1, wherein step (b) comprises:

(b1) correcting the time delays with which the input signals x₁ and x₂ arrive at the respective microphones from the sound source, and generating y₁ and y₂, in which the speech signal component and the noise component contained in the channels are respectively enhanced;

(b2) obtaining a data matrix Y for y₁ and y₂ for each frame; and

(b3) separating the speech and the noise by obtaining an eigen filter B(z) using the data matrix Y.

4. The method of claim 3, wherein step (b3) comprises:

obtaining an eigen filter B(z) using the data matrix Y;

generating a filter A(z) by constructing a polynomial from the elements of the selected eigenvector, finding the roots of the polynomial, moving the roots toward the inside of the unit circle, and constructing a polynomial from the moved roots; and

separating speech and noise by dividing the eigen filter B(z) by the filter A(z) to obtain an infinite impulse response filter H(z) (= B(z)/A(z)).

5. The method of claim 3, wherein step (b1) comprises:

performing cross-correlation on the input signals x₁ and x₂;

obtaining, from the cross-correlation information, the time delay with which the input signals x₁ and x₂ arrive at the respective microphones from the sound source;

generating synchronized signals x'₁ and x'₂ by shifting the input signals x₁ and x₂ by the time delay; and

obtaining y₁ and y₂ by taking half of the sum and half of the difference of x'₁ and x'₂, respectively.

6. The method of claim 1, wherein, in step (c), the residual noise remaining in the speech signal is removed using 1-channel noise estimation.

7. The method of claim 6, wherein the residual noise removal comprises:

finding speech sections and noise sections by applying voice activity detection (VAD); and

subtracting, from the signal generated in step (b), a noise signal that is periodically updated from the noise signal of the noise sections.

8. The method of claim 7, wherein the update is performed as a weighted sum of the previous noise estimate and the current noise value.

9. A device for removing noise from a multichannel speech signal, the device comprising:

an inherent noise removing unit for removing, from a speech signal, a noise component inherent to the environment;

a voice & noise separation unit for separating speech and noise from the speech signal from which the inherent noise component has been removed; and

a post-processing unit for removing residual noise remaining in the separated speech.

10. The device of claim 9, wherein the inherent noise removing unit removes the inherent noise component using at least one of a high-pass filter, a low-pass filter, and a band-pass filter, based on frequency analysis of the noise.

11. The device of claim 9, wherein the voice & noise separation unit comprises:

a time delay compensation unit for correcting the time delays with which the input signals x₁ and x₂ arrive at the respective microphones from the sound source, and generating y₁ and y₂, in which the speech signal component and the noise component contained in the channels are respectively enhanced;

a data matrix generation unit for obtaining a data matrix Y for y₁ and y₂ for each frame; and

an eigen filtering unit for separating speech and noise by obtaining an eigen filter B(z) using the data matrix Y.

12. The device of claim 11, wherein the time delay compensation unit comprises:

a cross-correlator for performing cross-correlation on the input signals x₁ and x₂;

a time delay acquisition unit for obtaining, from the cross-correlation information, the time delay with which the input signals x₁ and x₂ arrive at the respective microphones from the sound source;

a synchronization unit for generating synchronized signals x'₁ and x'₂ by shifting the input signals x₁ and x₂ by the time delay; and

a y₁ & y₂ generation unit for obtaining y₁ and y₂ by taking half of the sum and half of the difference of x'₁ and x'₂, respectively.

13. The device of claim 11, wherein the eigen filtering unit comprises:

an eigen filter generation unit for obtaining an eigen filter B(z) using the data matrix Y;

an A(z) generation unit for generating a filter A(z) by constructing a polynomial from the elements of the selected eigenvector, finding the roots of the polynomial, moving the roots toward the inside of the unit circle, and constructing a polynomial from the moved roots; and

an infinite impulse response filtering unit for separating speech and noise by dividing the eigen filter B(z) by the filter A(z) to obtain an infinite impulse response filter H(z) (= B(z)/A(z)).

14. The device of claim 9, wherein the post-processing unit removes the residual noise remaining in the speech signal using 1-channel noise estimation.

15. The device of claim 14, wherein the residual noise removal is performed by applying voice activity detection (VAD) to find speech sections and noise sections, and by subtracting, from the signal output from the voice & noise separation unit, a noise signal that is periodically updated from the noise signal of the noise sections.

16. The device of claim 15, wherein the update is performed as a weighted sum of the previous noise estimate and the current noise value.

17. A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 1 to 8.