KR20070085193A

KR20070085193A - Noise cancellation apparatus and method thereof

Info

Publication number: KR20070085193A
Application number: KR1020070079196A
Authority: KR
Inventors: 정상배; 강민아; 한민수
Original assignee: 한국정보통신대학교 산학협력단
Priority date: 2007-08-07
Filing date: 2007-08-07
Publication date: 2007-08-27
Also published as: KR100917460B1

Abstract

A noise cancellation apparatus and method are provided that use position information of a target signal to separate the frequency response of the target signal from the frequency response of a noise signal, and thereby remove the noise. A plurality of microphones (100a, 100b) receive acoustic signals containing speech and noise. A target signal information input unit (200) receives the position of the target signal, which is a speech signal, as prior information. An equalizer information input unit (300) outputs equalizer information for correcting mismatch among the microphones (100a, 100b). A noise removing unit (400) performs frequency analysis on the acoustic signals and, based on the prior information, estimates whether each frequency response of the acoustic signals belongs to the noise signal or to the target signal. A control unit (500) controls the operation of the microphones (100a, 100b), the target signal information input unit (200), the equalizer information input unit (300), and the noise removing unit (400). Under this control, the acoustic signals received through the microphones (100a, 100b) are frequency-analyzed, each frequency response is attributed to the noise signal or to the target signal on the basis of the prior information, and the noise is removed using the estimated result.

Description

Noise reduction device and method {NOISE CANCELLATION APPARATUS AND METHOD THEREOF}

The present invention relates to a noise cancellation apparatus and method for speech recognition and speech codecs. In particular, the present invention relates to a method of removing noise by using position information of a target signal to separate the frequency response of the target signal from the frequency response of a noise signal.

Noise, the undesired signal in a speech recognition system, is divided into stationary noise and non-stationary noise.

Typical methods for removing stationary noise include the Wiener filter method and the Kalman filter method.

The Wiener filter method estimates the frequency response of the noise signal and detects the speech signal by subtracting the estimated noise frequency response from the frequency response of an arbitrary input signal. Specifically, the method begins by estimating the frequency response of the noise signal either from the initial 100 ms to 200 ms of the input signal or, using a real-time voice detector, continuously over the non-speech intervals. Then, when an input signal arrives, its frequency response is computed and the estimated noise power spectrum is subtracted from it to obtain the power spectrum of the speech signal to be detected. Since the frequency responses of both the noise and the speech are then available, the system function for noise removal can be estimated. The actual sample values of the speech signal to be detected are obtained in the time domain by convolving the input signal with the system function. Equation 1 gives the power spectrum of the speech signal to be detected from a given input signal.

[Equation 1]

Here, the symbols of Equation 1 denote, in order: the frequency response of the speech signal to be detected; the frequency response of the noisy input signal; and the frequency response of the noise signal estimated from the initial portion of the input signal or by the real-time voice detector.

Equation 2 gives the design of the Wiener filter.

[Equation 2]

Equation 3 gives the noise removal method in the time domain.

[Equation 3]

Here, the symbols of Equation 3 denote, in order, the noise-removed speech signal and the linear convolution operator; IFFT (inverse fast Fourier transform) denotes the inverse Fourier transform.
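
The Wiener procedure described above (power spectral subtraction, filter design, time-domain convolution) can be sketched as follows. This is a minimal illustration, not the patent's own formulation: the function names, the clamping of negative power to zero, and the gain form (speech power divided by noisy power) are assumptions, since the original equations are not reproduced in this text.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for one short block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; keeps the real part, assuming a real impulse response."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin gain from power spectral subtraction.

    speech_power = max(noisy_power - noise_power, 0)   (the role of Equation 1)
    gain         = speech_power / noisy_power          (one common Wiener form)
    """
    gains = []
    for px, pn in zip(noisy_power, noise_power):
        ps = max(px - pn, 0.0)  # assumed clamp: power cannot be negative
        gains.append(max(ps / px, floor) if px > 0.0 else floor)
    return gains


def linear_convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def denoise_block(noisy, noise_power):
    """Filter one block in the time domain with the inverse DFT of the gain."""
    noisy_power = [abs(v) ** 2 for v in dft(noisy)]
    impulse_response = idft(wiener_gain(noisy_power, noise_power))
    return linear_convolve(noisy, impulse_response)
```

With a noise power estimate of zero in every bin, the gain is 1 everywhere, the impulse response reduces to a unit impulse, and the block passes through unchanged (followed by zeros from the convolution tail).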

The Kalman filter method statistically computes the optimal speech signal value obtainable from the input given up to time n, using the optimal speech signal value obtainable from the input given up to time (n-1), the optimal Kalman gain, and related quantities. It operates purely in the time domain and, unlike the Wiener filter method, which is a non-causal system, it is a causal system.

Equation 4 gives the noise removal method based on the Kalman filter.

[Equation 4]

Here, the symbols of Equation 4 denote, in order: the optimal speech signal value obtainable from the input given up to time n; the optimal speech signal value obtainable from the input given up to time (n-1); the optimal Kalman gain; and the linear prediction coefficients, which are directly related to the generation of the speech signal.
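
As a rough illustration of the recursion described above, the sketch below uses the simplest possible case: a first-order speech model with a single linear prediction coefficient. The model order, the parameter names (`a`, `process_var`, `noise_var`), and the initial values are assumptions made for the example; the patent's Equation 4 is not reproduced here and may use a higher-order, vector-valued form.

```python
def kalman_denoise(noisy, a, process_var, noise_var):
    """Scalar Kalman filter.

    Assumed model: speech s(n) = a * s(n-1) + w(n), observation y(n) = s(n) + v(n),
    where w has variance process_var and v has variance noise_var.
    """
    estimate = 0.0      # assumed initial speech estimate
    error_var = 1.0     # assumed initial estimation error variance
    output = []
    for y in noisy:
        # Predict from the estimate at time n-1 using the prediction coefficient.
        predicted = a * estimate
        predicted_var = a * a * error_var + process_var
        # Kalman gain balances trust in the prediction against the observation.
        gain = predicted_var / (predicted_var + noise_var)
        # Correct the prediction with the new observation at time n.
        estimate = predicted + gain * (y - predicted)
        error_var = (1.0 - gain) * predicted_var
        output.append(estimate)
    return output
```

Each output sample depends only on the current and past inputs, which is the causal behavior the text contrasts with the Wiener approach.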

However, the Wiener and Kalman filter methods are effective only when the mixed-in noise is stationary. When the noise to be removed is non-stationary, such as music or human voices, they are not effective.

Typical methods for removing non-stationary noise include blind source separation and beamforming. These methods use a microphone array, which makes it possible to exploit the position information and path information of the target signal to be detected.

Blind source separation is used when the number of microphones is equal to or greater than the number of mixed signal sources, counting both the noise and the target signal. The representative algorithm is ICA (independent component analysis). In ICA, the system function between the signal sources and the microphones is given as a path function, and the inverse of the path matrix is obtained under the assumption that the sources are non-Gaussian. That is, the inverse matrix is learned so that the microphone array signals, after passing through it, are as statistically independent of one another as possible, which corresponds to minimizing the mutual information among them.

However, blind source separation is constrained in that the number of signal sources must never exceed the number of microphones. In practice, the number of sources cannot be determined quantitatively and the sources are scattered over a wide area, so good performance cannot be expected. In addition, the algorithm takes a long time to converge during separation, so it cannot readily be used for signal detection in speech recognition and, likewise, is difficult to apply to codecs for voice communication.

Unlike blind source separation, beamforming uses the position of the target signal to be detected as prior information. Beamforming places no constraint on the number of microphones, and better performance can be expected as the number of microphones increases. The most widely used beamforming technique in recent years is the GSC (generalized sidelobe canceller). Based on the target position supplied as prior information, the GSC performs fixed beamforming, steering the microphone array so that the signal-to-noise ratio rises in that direction. The microphone array signals are thereby phase-aligned toward the target direction, yielding a signal with an improved signal-to-noise ratio. Noise nevertheless remains, so an adaptive filtering technique using a reference noise signal is applied to remove the residual noise left after fixed beamforming.

FIG. 1 shows an adaptive-filter-based noise removal technique that uses a reference noise signal.

In FIG. 1, one input is the signal after fixed beamforming, and the other is the reference noise signal used to remove the noise mixed into the fixed beamforming output. The reference noise signal can be estimated by appropriately subtracting the inter-channel signals of the microphone array from one another, based on the previously supplied position information of the target signal.
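
The adaptive stage of FIG. 1 can be sketched with a least-mean-squares (LMS) filter. LMS is chosen here only as a common example of adaptive filtering; the text does not say which adaptation rule is used, and the function name, tap count, and step size below are assumptions.

```python
def lms_noise_canceller(primary, reference, num_taps=4, step=0.05):
    """Subtract an adaptively filtered reference from the fixed-beamformer output.

    primary:   samples after fixed beamforming (target plus residual noise)
    reference: reference noise samples, ideally correlated with the noise only
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # enhanced sample, also used as the error signal
        # LMS update: move the weights along the instantaneous error gradient.
        weights = [w + 2.0 * step * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output
```

If the reference also carries components that resemble the target, the same update removes part of the target, which is the speech distortion the description attributes to beamforming with adaptive filters.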

Beamforming is generally used in radio communication systems, where a powerful adaptive filter removes the noise. In beamforming for speech signal processing, however, the adaptive filter removes noise but at the same time severely distorts the speech, because the signal characteristics of the noise to be removed are similar to those of the target signal. As a result, sound quality degrades in codecs for voice communication, and when beamforming is used as preprocessing for speech recognition, the distortion of the frequency response lowers the recognition rate.

The technical problem to be solved by the present invention is to provide a noise cancellation apparatus and method for speech recognition and speech codecs that can improve the speech recognition rate and the call quality.

According to one aspect of the present invention, a noise cancellation method is provided. The noise cancellation method for speech recognition and speech codecs includes: (a) receiving the position of a target signal, which is a speech signal, as prior information; (b) performing frequency analysis on multi-channel acoustic signals received from a plurality of microphones and then estimating, based on the prior information, whether each frequency response of the received signals is a frequency response of a noise signal or of the target signal; and (c) performing noise removal using the frequency response of the noise signal and the frequency response of the target signal estimated in step (b).

According to another aspect of the present invention, a noise cancellation apparatus is provided. The noise cancellation apparatus for speech recognition and speech codecs includes: a plurality of microphones that receive acoustic signals containing speech and noise; a target signal information input unit that receives the position of a target signal, which is a speech signal, as prior information; an equalizer information input unit that outputs equalizer information for correcting mismatch among the plurality of microphones; a noise removing unit that performs frequency analysis on the acoustic signals and estimates, based on the prior information, whether each frequency response of the received acoustic signals is a frequency response of a noise signal or of the target signal; and a control unit that controls the operation of the plurality of microphones, the target signal information input unit, the equalizer information input unit, and the noise removing unit.

According to the present invention, a noise cancellation apparatus and method for speech recognition and speech codecs can be provided that improve the speech recognition rate and the call quality.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

Hereinafter, a noise cancellation apparatus and method according to an embodiment of the present invention are described in detail.

FIG. 2 is a schematic block diagram of a noise cancellation apparatus according to an embodiment of the present invention, and FIG. 3 is a block diagram of a noise removing unit according to an embodiment of the present invention.

The noise cancellation apparatus according to the embodiment of the present invention includes a plurality of microphones (100a, 100b), a target signal information input unit (200), an equalizer information input unit (300), a noise removing unit (400), and a control unit (500). Each component is briefly described below.

The microphones (100a, 100b) receive externally generated sound, convert it into corresponding acoustic signals, and output them. The acoustic signals contain the desired speech signal and undesired noise signals. The apparatus uses a plurality of microphones; only two are shown in this specification for convenience of description.

The target signal information input unit (200) receives the position of the target signal to be acquired and passes it to the noise removing unit (400).

The equalizer information input unit (300) passes equalizer information for correcting mismatch between the microphones to the noise removing unit (400).

Based on the position information of the target signal and the equalizer information, the noise removing unit (400) extracts the frequency response of the target signal and the frequency response of the noise signal from the received acoustic signals, and detects the target signal with the noise removed.

The control unit (500) controls the operation of the microphones (100a, 100b), the target signal information input unit (200), the equalizer information input unit (300), and the noise removing unit (400).

A more detailed configuration of the noise removing unit (400) is described with reference to FIG. 3.

FIG. 3 is a block diagram showing the detailed configuration of the noise removing unit (400) according to an embodiment of the present invention.

Referring to FIG. 3, the noise removing unit (400) includes a signal block forming unit (410), a frequency dividing unit (420), a frequency response estimating unit (430), a noise processing unit (440), a frequency synthesizing unit (450), and a signal end determining unit (460).

The signal block forming unit (410) divides the input acoustic signal into acoustic signals of unit-time length. That is, it divides the acoustic signal into short-time multi-channel signal blocks.
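
Block formation of this kind can be sketched as below. The function name, the block length, and the hop size are assumptions for illustration; the text does not state the block duration or whether consecutive blocks overlap.

```python
def split_into_blocks(channels, block_len, hop):
    """Cut equally sampled channel signals into time-aligned short-time blocks.

    channels: one list of samples per microphone
    Returns a list of blocks; each block holds one sample list per channel.
    """
    usable = min(len(ch) for ch in channels)
    blocks = []
    for start in range(0, usable - block_len + 1, hop):
        # The same sample range is taken from every channel so that
        # inter-channel gain and time differences are preserved.
        blocks.append([ch[start:start + block_len] for ch in channels])
    return blocks
```

For example, two channels of six samples with `block_len=4` and `hop=2` give two blocks that start at samples 0 and 2.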

The frequency dividing unit (420) performs frequency division filtering in the time domain on the multi-channel time blocks input per unit time, in order to resolve the ambiguity between time delay and direction.

The frequency response estimating unit (430) performs per-channel frequency analysis on the channel-frequency band signals and estimates direction information for the signal block of each frequency band. It then estimates the frequency responses of the noise and the target signal according to the estimated direction information. Because the frequency responses estimated at this point contain undefined portions, the full frequency responses are estimated through frequency response smoothing.

The noise processing unit (440) removes noise using Wiener filter coefficients obtained in the time domain from the full frequency responses of the target signal and the noise signal estimated by the frequency response estimating unit (430). The target signal of each frequency band is thereby extracted.

The frequency synthesizing unit (450) performs time-domain frequency synthesis filtering to combine the per-band target signals extracted by the noise processing unit (440) into a signal covering the entire frequency band.

The signal end determining unit (460) determines whether the input acoustic signal has ended; if it has not, target signal detection is performed on the next signal block.

The noise cancellation method according to an embodiment of the present invention is now described.

FIG. 4 is an overall flowchart of the noise cancellation method according to an embodiment of the present invention.

The noise cancellation apparatus of the present invention receives the position information of the target signal (S101) and receives equalizer information for correcting mismatch between the microphones (S102).

Thereafter, the target signal is extracted from each multi-channel signal block input per unit time.

First, a short-time multi-channel signal is input (S103), and time-domain frequency division filtering is performed to resolve the ambiguity between time delay and direction (S104), producing channel-frequency band signals.

Next, per-channel frequency analysis is performed on the channel-frequency band signals (S105), and direction information is estimated based on the equalizer information (S106). The short-time frequency responses of the noise and the target signal are then estimated according to the estimated direction information (S107). Here, the frequency responses of the noise signal and the target signal are estimated by determining, based on the target signal position information, whether each frequency belongs to the noise signal or to the target signal.

Because the estimated frequency responses of the noise and the target signal contain undefined portions, the full frequency responses are estimated through frequency response smoothing (S108). Noise removal is then performed using Wiener filter coefficients obtained in the time domain (S109).

Through the above process, the target signal of each frequency band is extracted, and time-domain frequency synthesis filtering is performed to synthesize a signal covering the entire frequency band (S110).

It is then determined whether the signal has ended (S111); if it has not, target signal detection is performed on the next signal block.

Each step is described in detail below.

The noise cancellation method according to an embodiment of the present invention first receives the position information of the target signal.

In the target signal position information input step (S101), the position of the target signal to be acquired is received. When the distance between the source of the target signal and the terminal receiving it is short, 1 m or less, the direct-path component dominates the acoustic signal arriving at the microphones, so the reverberant component can be almost ignored. When reverberation is weak and two or more microphones are used, the angle at which each frequency component arrived can be detected without large error.

Therefore, when the position of the target signal is used as prior information as in this embodiment, no training process is needed to locate the target signal. That is, whatever signal arrives, the noise components and the target signal components can be separated in the frequency domain in real time using the position information of the target signal.

Next, the noise cancellation method according to this embodiment performs the equalizer information input step (S102), in which mismatch between the microphones is corrected. In a speech recognition environment, differences between microphone characteristics are unavoidable, and the characteristics of the analog-to-digital (A/D) converters that receive the microphone signals may also differ. The equalizer information input step (S102) compensates for these differences in microphone and A/D converter characteristics in the frequency domain.

Equation 5 shows the cost function for implementing the equalizer when channel 1 is the reference channel and the number of input channels is two.

[Equation 5]

Here, k is the discrete frequency, a second index denotes time, and T is the total number of speech signal blocks. The spectra in Equation 5 are obtained by applying FFT[], the fast Fourier transform function, to the two per-channel input signals. The remaining symbol is the equalizer coefficient for correcting the frequency response of channel 2. Optimizing Equation 5 yields Equation 6.

[Equation 6]
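
For illustration only, a least-squares cost of the kind described for Equation 5, and the closed form that follows from it, can be written as below. The notation is assumed: X_1(k,t) and X_2(k,t) stand for the block spectra of channels 1 and 2, and H(k) for the equalizer coefficient. Fitting channel 2 to the reference channel 1 is also an assumption, chosen because the design procedure of FIG. 5 accumulates the power of the reference channel; the original Equations 5 and 6 are not reproduced in this text and may be arranged differently.

```latex
J(k) = \sum_{t=1}^{T} \bigl| X_2(k,t) - H(k)\,X_1(k,t) \bigr|^{2},
\qquad
\frac{\partial J(k)}{\partial H^{*}(k)} = 0
\;\Longrightarrow\;
H(k) = \frac{\sum_{t=1}^{T} X_2(k,t)\,X_1^{*}(k,t)}
            {\sum_{t=1}^{T} \bigl| X_1(k,t) \bigr|^{2}}
```

Under this reading, H(k) is the ratio of the accumulated inter-channel correlation to the accumulated reference-channel power, and dividing the channel-2 spectrum by H(k) aligns it with the reference channel.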

FIG. 5 is a flowchart of the equalizer design method required for an embodiment of the present invention.

First, the position information of the target signal is input (S201), and a short-time multi-channel signal is input per unit time (S202). It is then determined whether the input signal is a training signal (S203). If the input signal is not a training signal, the next unit-time signal is input.

If the input signal is a training signal, per-channel frequency analysis is performed (S204), the power of the reference channel is accumulated per frequency (S205), and the inter-channel correlation is accumulated per frequency (S206).

It is then determined whether the signal input has ended (S207); if it has not, the next unit-time signal is input.

When the signal input has ended, the ratio of the accumulated correlation to the accumulated power is computed (S208).
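
The accumulation steps S205 to S208 can be sketched as follows. The function and argument names are hypothetical, the training-signal check (S203) and the frequency analysis (S204) are assumed to have been done already, and the direction of the correlation (other channel times the conjugate of the reference) is an assumption.

```python
def design_equalizer(ref_spectra, other_spectra):
    """Estimate one complex equalizer coefficient per frequency bin.

    ref_spectra, other_spectra: lists of per-block spectra (lists of complex
    bins) for the reference channel and the channel to be corrected.
    """
    num_bins = len(ref_spectra[0])
    power = [0.0] * num_bins
    corr = [0j] * num_bins
    for ref, other in zip(ref_spectra, other_spectra):
        for k in range(num_bins):
            power[k] += abs(ref[k]) ** 2               # S205: reference power
            corr[k] += other[k] * ref[k].conjugate()   # S206: cross-correlation
    # S208: ratio of accumulated correlation to accumulated power.
    # Bins with no reference energy fall back to a neutral coefficient of 1.
    return [c / p if p > 0.0 else 1.0 + 0j for c, p in zip(corr, power)]
```

If the second channel is exactly the reference multiplied by a fixed complex gain in every bin, the returned coefficients equal that gain.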

Thereafter, the target signal is extracted from each multi-channel signal block input per unit time. A short-time multi-channel signal is input (S103), and time-domain frequency division filtering is performed to resolve the ambiguity between time delay and direction (S104), producing channel-frequency band signals.

In the time-domain frequency division filtering step (S104), the signal is split by equal-width band-splitting filtering along the frequency axis, together with down-sampling. FIG. 6 is a flowchart of the time-domain frequency division filtering method.

Referring to FIG. 6, the noise cancellation apparatus receives a short-time signal (S301) and performs equally spaced band-pass filtering (S302), splitting the full signal into M pass-band signals of equal width. The signal is thereby divided into signals that each occupy 1/M of the full band (S303).

채널별 주파수 분석(S105) 단계에서는 분할된 신호에 대한 각 채널별 주파수 분석을 수행한다. 즉, 채널-주파수 밴드의 신호에 대한 단위 단구간 주파수 분석을 수행한다.In the channel-specific frequency analysis (S105), frequency analysis for each channel is performed on the divided signal. That is, unit short-term frequency analysis is performed on the signal of the channel-frequency band.

방향 정보 추정(S106) 단계에서는, 채널별 주파수 응답을 이용하여 각 주파수 성분의 방향 정보를 추정한다. 방향 정보는 각 주파수 성분에 대해서 채널간 이득 차이와 채널간 시간 차이를 이용하여 추정할 수 있다.In the direction information estimation step (S106), the direction information of each frequency component is estimated using the frequency response for each channel. The direction information can be estimated using the gain difference between channels and the time difference between channels for each frequency component.

예를 들어, 입력 채널의 수가 2개일 때, 수학식 7, 수학식 8 및 수학식 9의 과정을 통해서 구할 수 있다.For example, when the number of input channels is two, it can be obtained through the process of Equation 7, Equation 8 and Equation 9.

[Equation 7]

[Equation 8]

[Equation 9]

Here, the three symbols defined for these equations denote, in order, the inter-channel gain difference, the inter-channel time difference, and the FFT size.
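Equations 7 to 9 are not reproduced in this text, so the following is only a sketch of one common way to obtain an inter-channel gain difference and time difference per frequency bin for two channels. The dB gain ratio, the phase-based delay formula, and the function name `interchannel_differences` are assumptions for illustration and may differ from the patent's equations.

```python
import numpy as np

def interchannel_differences(x_ref, x_other, fft_size):
    """Return (gain difference in dB, time difference in samples) for bins 1 .. fft_size/2 - 1."""
    X1 = np.fft.rfft(x_ref, fft_size)
    X2 = np.fft.rfft(x_other, fft_size)
    k = np.arange(1, fft_size // 2)  # skip DC and Nyquist, where the delay is undefined
    eps = 1e-12                      # guards against division by zero
    gain_db = 20 * np.log10((np.abs(X2[k]) + eps) / (np.abs(X1[k]) + eps))
    phase = np.angle(X2[k] * np.conj(X1[k]))       # principal value in (-pi, pi]
    delay = -phase * fft_size / (2 * np.pi * k)    # positive when the other channel lags
    return gain_db, delay
```

The delay is only unambiguous while the true phase difference stays inside (-π, π), which is the condition the band splitting is meant to secure.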

In the frequency response estimation step (S107), the unit-interval frequency responses are classified as target signal or noise based on the previously input position information of the target signal. Assuming that the target signal is located directly in front of the microphone array, the classification can be performed as in Equation 10.

[Equation 10]

The threshold value is used to extract the target signal in the plane defined by (inter-channel gain difference, inter-channel time difference). In Equation 6, when the classification value is 1, the k-th unit-interval frequency response of the reference channel is estimated to originate from the target signal; when it is 0, it is estimated to originate from a noise source.
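Equation 10 is not reproduced in this text. As a hedged sketch of the classification idea, the code below marks a bin as target when its point in the (gain difference, time difference) plane lies within a threshold distance of the expected target position, taken as (0, 0) for a source in front of the array. The Euclidean distance, the units, and the name `target_mask` are assumptions, not the patent's definition.

```python
import numpy as np

def target_mask(gain_diff, time_diff, threshold, target_gain=0.0, target_time=0.0):
    """Return 1 for bins attributed to the target signal and 0 for bins attributed to noise."""
    # Distance of each bin from the assumed target position in the (gain, time) plane.
    distance = np.hypot(np.asarray(gain_diff) - target_gain,
                        np.asarray(time_diff) - target_time)
    return (distance <= threshold).astype(int)
```

For instance, with a threshold of 0.4 a bin at (0.0, 0.1) is kept as target while a bin at (3.0, 0.0) is treated as noise.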

In the frequency response smoothing step (S108), the undefined parts of the frequency responses of the noise and the target signal are estimated. That is, because the frequency responses of the noise and the target signal estimated in the previous step contain undefined portions, the full frequency response is estimated through frequency response smoothing. Equation 11 shows the method.

[Equation 11]

Here, the three symbols defined for this equation denote, in order, the smoothed frequency response magnitude, the input frequency response magnitude, and the frequency response magnitude of the smoothed function.
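Equation 11 is not reproduced in this text. One plausible reading of the smoothing step is a convolution across frequency bins that fills undefined bins from their defined neighbours. The normalized convolution below is a sketch under that assumption; `smooth_response`, the `defined` mask, and the kernel are hypothetical names and choices.

```python
import numpy as np

def smooth_response(magnitude, defined, kernel):
    """Fill undefined bins of a magnitude response from neighbouring defined bins."""
    magnitude = np.asarray(magnitude, dtype=float)
    defined = np.asarray(defined, dtype=float)   # 1 where the bin was classified, else 0
    weights = np.convolve(defined, kernel, mode="same")
    values = np.convolve(magnitude * defined, kernel, mode="same")
    # Normalize by the kernel weight that actually fell on defined bins.
    return np.where(weights > 0, values / np.maximum(weights, 1e-12), 0.0)
```

A bin with no defined neighbour inside the kernel support stays at zero, so the kernel width controls how large a gap can be filled.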

Thereafter, noise cancellation is performed on the full frequency response using the coefficients of a Wiener filter obtained in the time domain (S109).
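The text does not give the Wiener filter formula. The sketch below assumes the textbook per-bin gain S/(S+N) computed from the estimated target and noise magnitudes, converts it to time-domain FIR coefficients with an inverse FFT, and applies them by convolution. The function names and the linear-phase centring are illustrative assumptions.

```python
import numpy as np

def wiener_fir(target_mag, noise_mag):
    """Time-domain Wiener filter coefficients from magnitudes on the rfft grid (length N/2 + 1)."""
    s = np.asarray(target_mag, dtype=float) ** 2
    n = np.asarray(noise_mag, dtype=float) ** 2
    gain = s / np.maximum(s + n, 1e-12)   # Wiener gain per bin, between 0 and 1
    h = np.fft.irfft(gain)                # zero-phase impulse response, length N
    return np.fft.fftshift(h)             # centre the response to get a linear-phase FIR

def denoise(x, h):
    """Filter x with h and compensate the len(h) // 2 sample delay of the centred filter."""
    full = np.convolve(x, h)
    start = len(h) // 2
    return full[start:start + len(x)]
```

When the noise estimate is zero the gain is 1 in every bin, the coefficients reduce to a unit impulse, and the signal passes through unchanged.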

Through the above steps, the target signal is extracted for each frequency band of the input signal, and time-domain frequency synthesis filtering (S110) is then performed to synthesize the signal of the entire frequency band.

In the time-domain frequency synthesis filtering step (S110), the signal of each band is up-sampled and equal-band synthesis filtering is performed. Time-domain frequency-division filtering per channel is needed because it removes the spatial ambiguity of the signal source. To remove spatial ambiguity while raising the sampling frequency, the distance between the microphones must be reduced. In practical terms, the phase response used in Equation 9 to obtain the inter-channel time difference must be defined only within (-π, π). This is equivalent to requiring that the time difference of a signal component at digital frequency π be less than one sample.
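The one-sample condition can be checked with simple arithmetic. The sketch below uses the 4 cm microphone spacing and 16 kHz sampling frequency from the example section of this document; the speed of sound (343 m/s) and the function names are assumptions, and the effective rate after splitting into M bands is taken to be the original rate divided by M.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def max_delay_samples(mic_spacing_m, sample_rate_hz):
    """Largest possible inter-microphone delay, in samples, for an end-fire source."""
    return mic_spacing_m / SPEED_OF_SOUND * sample_rate_hz

def min_equal_bands(mic_spacing_m, sample_rate_hz):
    """Smallest M for which the delay at rate sample_rate_hz / M is below one sample."""
    return math.floor(max_delay_samples(mic_spacing_m, sample_rate_hz)) + 1
```

Under these assumptions, 4 cm at 16 kHz gives a maximum delay of about 1.87 samples, which violates the condition, while two bands (8 kHz per band) give about 0.93 samples, which satisfies it.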

According to this embodiment, the sampling frequency can be lowered by increasing the number of equal frequency bands, so that for a given microphone spacing the condition that the phase response of Equation 9 lies within (-π, π) can always be satisfied. FIG. 7 shows an example implemented with the polyphase QMF (quadrature mirror filtering) technique when the signal is divided into two equal frequency bands. In FIG. 7, the two split outputs are the low-band and high-band signals, and the output signal preserves only the power of the input signal in the frequency domain; that is, the phase response may differ.

FIG. 8 is a flowchart illustrating the time-domain frequency synthesis filtering method.

M band-pass signals are received (S401), and zeros are inserted into each band signal to up-sample it by a factor of M (S402). Equally spaced band-pass filtering (bandwidth π/M) is then performed on each band signal (S403), and the band signals are summed (S404) to synthesize the signal of the entire frequency band.
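The synthesis steps S401 to S404 can be sketched for the two-band case as a round trip through an analysis and a synthesis stage. The Haar filter pair used here is an assumption chosen because it reconstructs the input exactly; the patent does not state which QMF filters it uses, and the function names are hypothetical.

```python
import numpy as np

H0 = np.array([1.0, 1.0]) / np.sqrt(2)   # analysis low-pass (Haar, assumed)
H1 = np.array([1.0, -1.0]) / np.sqrt(2)  # analysis high-pass (Haar, assumed)
G0 = H0[::-1]                            # synthesis filters: time-reversed analysis filters
G1 = H1[::-1]

def analyze(x):
    """Split an even-length signal into low and high bands, each down-sampled by 2."""
    return [np.convolve(x, h)[1::2] for h in (H0, H1)]

def synthesize(bands):
    """S401 to S404 for M = 2: zero insertion, band filtering, and summation."""
    out = 0.0
    for band, g in zip(bands, (G0, G1)):
        up = np.zeros(2 * len(band))
        up[::2] = band                   # S402: insert zeros between samples
        out = out + np.convolve(up, g)   # S403: band filtering, S404: sum over bands
    return out[: 2 * len(bands[0])]
```

With this particular filter pair the band energies also add up to the energy of the input, which is a property of the orthonormal Haar pair rather than of QMF banks in general.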

Thereafter, in the signal-end determination step (S111), if the signal has not ended, target signal detection is performed on the next signal block; if the signal has ended, the method ends.

[Example]

FIG. 9 shows, for one example, the distribution of the inter-channel gain differences and time differences of the frequency components. It plots the inter-channel time differences and gain differences of the low-band and high-band signals under the following conditions: sampling frequency 16 kHz, 2 microphones, microphone spacing 4 cm, 2 equal band divisions, FFT size 512, 1 target signal source and 2 noise signal sources, an angle of 150 degrees between the noise signal sources, r = 0.4 in Equation 6, and an SNR of 0 dB. FIG. 9 shows that the target signal components and the noise signal components are mixed together. FIG. 10 shows the waveform of the noisy signal and the result of noise cancellation according to this example.

According to this embodiment, the frequency response of the target signal and the frequency response of the noise signal are separated based on the preceding position information, and noise cancellation is performed using the frequency responses of the target signal and the noise signal. This makes it possible to provide a noise cancellation apparatus and method for speech recognition and speech codecs that can improve the speech recognition rate and the call quality.

The embodiments of the present invention described above are not implemented only through the apparatus and the method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded. Such an implementation can be readily carried out by a person skilled in the art from the description of the embodiments above.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

FIG. 6 is a flowchart illustrating the time-domain frequency-division filtering method.

FIG. 7 shows an example implemented with the polyphase QMF (quadrature mirror filtering) technique when the signal is divided into two equal frequency bands.

FIG. 9 shows the distribution of the inter-channel gain differences and time differences of the frequency components according to the example.

FIG. 10 shows the signal before and after noise cancellation according to the example.

Claims

A noise reduction method for speech recognition and a speech codec, the method comprising:

(a) receiving, as preceding information, a position of a target signal that is a voice signal;

(b) performing frequency analysis on multi-channel acoustic signals received from a plurality of microphones, and estimating, based on the preceding information, whether each frequency response of the received signals is a frequency response of a noise signal or a frequency response of the target signal; and

(c) performing noise cancellation using the result of step (b).

The method of claim 1, wherein step (b) comprises:

(i) performing time-domain frequency-division filtering on the received multi-channel acoustic signal;

(ii) performing per-channel frequency analysis on the signal frequency-divided in step (i);

(iii) estimating direction information based on the frequency analysis of step (ii); and

(iv) estimating the short-term frequency responses of the target signal and the noise signal based on the direction information estimated in step (iii).

The method of claim 2, wherein step (c) comprises:

(v) estimating the overall frequency responses of the noise signal and the target signal by estimating the undefined portions of the frequency responses of the noise signal and the target signal estimated in step (b) from their defined portions; and

(vi) removing noise from the overall frequency response using coefficients of a Wiener filter obtained in the time domain.

The method of claim 3, wherein step (a) further comprises:

(vii) receiving frequency-domain equalizer information, based on a gain difference and a time difference between channels, for compensating the inter-channel characteristics of the multi-channel acoustic signal.

The method of claim 4, wherein step (vii) comprises:

(1) determining whether the received multi-channel acoustic signal is a training signal;

(2) if the received multi-channel acoustic signal is determined to be a training signal in step (1), performing frequency analysis for each channel;

(3) accumulating the power at each frequency of the reference channel and accumulating the correlation at each frequency between the channels;

(4) determining whether the input of the training signal has ended; and

(5) calculating the ratio of the accumulated correlation to the accumulated power when the input of the training signal is determined to have ended in step (4).

The method of claim 5, wherein, in step (i), the received multi-channel acoustic signal is divided into M equally spaced band-pass signals (bandwidth π/M).

The method of claim 6, wherein, in step (iv), the short-term frequency responses of the target signal and the noise signal are estimated using a gain difference and a time difference between frequency components.

A noise cancellation device for speech recognition and a speech codec, the device comprising:

a plurality of microphones for receiving an acoustic signal including voice and noise;

a target signal information input unit that receives, as preceding information, a position of a target signal that is a voice signal;

an equalizer information input unit that outputs equalizer information for correcting mismatches between the plurality of microphones;

a noise canceling unit that performs frequency analysis on the acoustic signal and estimates, based on the preceding information, whether each frequency response of the received acoustic signal is a frequency response of a noise signal or a frequency response of the target signal; and

a control unit that controls the operation of the plurality of microphones, the target signal information input unit, the equalizer information input unit, and the noise canceling unit,

wherein the control unit performs frequency analysis on the acoustic signal received through the plurality of microphones, estimates, based on the preceding information, whether each frequency response is a frequency response of the noise signal or a frequency response of the target signal, and performs noise cancellation using the estimated result.

The device of claim 8, wherein the noise canceling unit comprises:

a signal block forming unit that divides the acoustic signal into short-term multi-channel signal blocks per unit time;

a frequency division unit that performs frequency-division filtering on the short-term multi-channel signal block in the time domain;

a frequency response estimator that performs per-channel frequency analysis on the frequency-divided short-term multi-channel signal block and estimates a frequency response of the target signal based on direction information of each frequency;

a noise processor that removes noise from the estimated frequency responses of the target signal and the noise signal using Wiener filter coefficients obtained in the time domain;

a frequency synthesizer that performs, in the time domain, frequency synthesis filtering on the noise-removed target signal of each frequency band; and

a signal end determination unit that determines whether the acoustic signal has ended.