KR102045953B1

KR102045953B1 - Method for cancelling MIMO acoustic echo based on Kalman filtering

Info

Publication number: KR102045953B1
Application number: KR1020180112036A
Authority: KR
Inventors: 장준혁; 박지환
Original assignee: 한양대학교 산학협력단
Priority date: 2018-09-19
Filing date: 2018-09-19
Publication date: 2019-11-18

Abstract

Disclosed are a Kalman filter-based multichannel input/output acoustic echo cancellation method and system. The acoustic echo cancellation method may comprise the steps of: receiving, through a multichannel microphone, a voice signal and an audio signal output through a speaker; estimating a multichannel echo path, based on a Kalman filter, from the multichannel microphone input signal received through the multichannel microphone; estimating a multichannel echo signal based on the estimated multichannel echo path; and generating a multichannel echo-cancelled output signal based on the estimated multichannel echo signal and the multichannel microphone input signal.

Description

Kalman Filter-Based Multichannel Input / Output Acoustic Echo Cancellation {METHOD FOR CANCELLATING MIMO ACOUSTIC ECHO BASED ON KALMAN FILTERING}

Embodiments of the present invention relate to a technique for cancelling acoustic echo in a multiple-input multiple-output (MIMO) environment.

An acoustic echo arises when a speaker and a microphone are in the same space and the signal output from the speaker is picked up by the microphone; the acoustic echo is that speaker output re-entering the microphone. Such echo signals degrade the quality of voice communication and lower the speech recognition rate.

Acoustic echo cancellation is used to reduce the performance degradation caused by echo signals, and the approach differs with the number of microphones. For example, single-channel microphone echo cancellation estimates the echo signal with an adaptive filter such as LMS (Least Mean Square) or RLS (Recursive Least Squares) and removes the estimated echo signal from the microphone input signal. Other approaches use an adaptive filter such as LMS or RLS to remove the echo signal on each microphone channel and then apply beamforming to produce a single-channel output, or remove the echo signal from the single-channel output obtained after beamforming.

However, although these conventional methods outperform single-channel echo cancellation because they are combined with beamforming, which has excellent spatial filtering capability, they still have two drawbacks. First, because the output signal is single-channel, spatial information is no longer available after echo cancellation, so meaningful direction estimation and multichannel-microphone-based context awareness cannot be performed on the echo-cancelled signal. Second, an adaptive filter with a fixed step size cannot adapt quickly to an echo environment that changes easily, even with movement of the talker, which degrades echo cancellation performance.
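The single-channel adaptive-filter approach described above can be sketched as follows. This is a minimal illustration, not code from the patent: it uses normalized LMS (NLMS), a common LMS variant, and the function name, tap count, step size `mu`, and toy echo path are all assumptions chosen here. The fixed `mu` is the kind of fixed step size the text identifies as a limitation.

```python
import random

def nlms_echo_canceller(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Single-channel NLMS: estimate the echo of `far` in `mic` and subtract it."""
    w = [0.0] * taps            # adaptive filter weights (echo path estimate)
    buf = [0.0] * taps          # most recent loudspeaker samples, newest first
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo sample
        e = d - y_hat                                     # echo-cancelled sample
        norm = sum(b * b for b in buf) + eps
        # Fixed step size mu: the adaptation speed never changes with conditions.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w

# Toy data: a hypothetical 3-tap echo path and no talker signal.
random.seed(0)
far = [random.gauss(0.0, 1.0) for _ in range(4000)]
true_path = [0.6, -0.3, 0.1]
mic = [sum(true_path[j] * far[i - j] for j in range(3) if i - j >= 0)
       for i in range(len(far))]
residual, w_hat = nlms_echo_canceller(far, mic)
```

In this noise-free toy case the weights converge to the echo path. In a real room the path changes over time and a talker is present, which is where a fixed step size becomes a trade-off between tracking speed and stability.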

Non-patent document [1] (S. Malik and G. Enzner, "Recursive Bayesian control of multichannel acoustic echo cancellation," IEEE Signal Processing Letters, Vol. 18, No. 11, pp. 619-622, Nov. 2011) presents a Kalman filter-based echo cancellation technique. However, it assumes an environment with a single-channel microphone and multichannel speakers and does not consider the correlation between microphone input signals, so it is difficult to improve echo cancellation performance by increasing the number of microphones.

Accordingly, a technique is required that removes echo signals effectively in a multichannel input and output environment even as the number of microphones increases.

Korean Patent Laid-Open Publication No. 10-2015-0058980 relates to a multichannel linear-prediction echo cancellation apparatus, an echo cancellation method, and a signal processing apparatus and signal processing method using the same. It describes a technique that uses multichannel linear predictive echo cancellation (Linear Predictive Multi input Equalization) for efficient echo cancellation of acoustic signals in an environment of intercommunication between a user and a device.

[1] S. Malik and G. Enzner, "Recursive Bayesian control of multichannel acoustic echo cancellation," IEEE Signal Processing Letters, Vol. 18, No. 11, pp. 619-622, Nov. 2011.
[2] S. Y. Lee and N. S. Kim, "A statistical model based residual echo suppression," IEEE Signal Processing Letters, Vol. 14, No. 10, pp. 758-761, Oct. 2007.

An embodiment of the present invention aims to remove echo signals by expressing the input and output signals in a linear relationship so that multiple inputs and outputs are supported, and by dynamically modeling that multichannel input/output linear relationship with a Kalman filter. That is, state-space modeling for echo path estimation is performed with the correlation between the multichannel microphones taken into account; the multichannel echo path modeled in state space (that is, dynamically modeled) is estimated with a Kalman filter extended to MIMO form; and the multichannel echo is removed from the multichannel microphone input signal based on the estimated multichannel echo path, thereby performing MIMO echo cancellation.

In addition, MIMO echo cancellation preserves the spatial correlation of the talker's voice captured by the microphones, so that the position of the talker can still be located after echo cancellation.

A method of cancelling acoustic echo may include: receiving, through a multichannel microphone, a voice signal and an audio signal output through a speaker; estimating a linear multichannel echo path, based on a Kalman filter, from the multichannel microphone input signal received through the multichannel microphone; estimating a multichannel echo signal based on the estimated multichannel echo path; and generating a multichannel echo-cancelled output signal based on the estimated multichannel echo signal and the multichannel microphone input signal.

According to one aspect, the multichannel microphone input signal may be based on the product of the multichannel echo path and the near-end signal, that is, the signal before it is output through the speaker.

According to another aspect, estimating the multichannel echo path may include dynamically modeling, based on the Kalman filter, the linear relationship between the multichannel microphone input signal and the output signal.

According to another aspect, estimating the multichannel echo path may include predicting the covariance of the process noise in order to estimate the multichannel acoustic transfer function, which is based on the process noise.

According to another aspect, estimating the multichannel echo path may further include an update step of correcting the predicted covariance based on the multichannel microphone input signal.

According to another aspect, the covariance updated through the correction may be used to estimate the multichannel acoustic transfer function for the next frame.

According to another aspect, generating the multichannel echo-cancelled output signal may include removing the estimated multichannel echo signal from the multichannel microphone input signal.

A system for cancelling acoustic echo may include: an input controller that receives, through a multichannel microphone, a voice signal and an audio signal output through a speaker; an echo signal estimator that estimates a linear multichannel echo path, based on a Kalman filter, from the multichannel microphone input signal received through the multichannel microphone, and estimates a multichannel echo signal based on the estimated multichannel echo path; and an output signal generator that generates a multichannel echo-cancelled output signal based on the estimated multichannel echo signal and the multichannel microphone input signal.

According to another aspect, the echo signal estimator may dynamically model, based on the Kalman filter, the linear relationship between the multichannel microphone input signal and the output signal.

According to another aspect, the echo signal estimator may predict the covariance of the process noise in order to estimate the multichannel acoustic transfer function, which is based on the process noise.

According to another aspect, the echo signal estimator may perform an update that corrects the predicted covariance based on the multichannel microphone input signal.

According to another aspect, the output signal generator may generate the multichannel echo-cancelled output signal by removing the estimated multichannel echo signal from the multichannel microphone input signal.

According to embodiments of the present invention, echo signals can be removed by expressing the input and output signals in a linear relationship so that multiple inputs and outputs are supported, and by dynamically modeling the multichannel input/output linear relationship with a Kalman filter.

In addition, because MIMO echo cancellation preserves the spatial correlation of the talker's voice captured by the microphones, the position of the talker can still be located after echo cancellation.

FIG. 1 is a block diagram illustrating the operation of a Kalman filter-based multichannel input/output echo cancellation system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an acoustic echo cancellation method according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the internal configuration of an acoustic echo cancellation system according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the mathematical relationships used for dynamic modeling with a Kalman filter according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present invention relates to a technique for cancelling multichannel input/output acoustic echo, and in particular to a technique that removes the echo signal by expressing the input and output signals in a linear relationship and dynamically modeling that multichannel input/output linear relationship with a Kalman filter. That is, dynamic (state-space) modeling for multichannel echo path estimation is performed with the correlation between the microphone channels taken into account; the dynamically modeled multichannel echo path is estimated with a Kalman filter extended to MIMO form; the multichannel acoustic echo signal is then estimated from the estimated multichannel echo path; and the estimated multichannel acoustic echo signal is removed from the multichannel microphone input signal.

In the present embodiments, 'dynamic modeling' may refer to state-space modeling.

FIG. 1 is a block diagram illustrating the operation of a Kalman filter-based multichannel input/output echo cancellation system according to an embodiment of the present invention.

Referring to FIG. 1, in an environment where the multichannel microphone and the speaker are in the same space, the multichannel echo path (that is, the multichannel acoustic transfer function, 101) is estimated based on a Kalman filter, and the multichannel echo signal (102) may be estimated based on the estimated multichannel echo path.

For example, electronic devices such as AI speakers, voice-recognition TVs, and smartphones not only carry out commands based on speech recognition but also perform various functions such as playing video or music and providing telephone calls. For instance, when music is playing through an AI speaker and the talker issues a command such as "Play music by singer A," the multichannel microphone (that is, a plurality of microphones) on the AI speaker may receive both the audio signal of the music being played and the talker's voice signal. Here, information about the music being output through the speaker is already known to the system, and the audio signal of that music immediately before it is output through the speaker may correspond to the near-end signal. Thus, the talker's voice signal and the audio signal output through the speaker, both received through the multichannel microphone, may together correspond to the multichannel microphone input signal.

The acoustic echo cancellation system may then remove the echo signal from the multichannel microphone input signal in order to recognize the voice signal uttered by the talker. To do so, the multichannel echo path may be estimated based on a Kalman filter, and the multichannel echo-cancelled output signal (104) may be generated by removing the estimated multichannel echo signal (102) from the multichannel microphone input signal (103). For example, the multichannel microphone input signal (103) may contain the input signal of each channel in vector form, and subtracting the multichannel echo signal (102) channel by channel yields an output signal from which the echo has been removed on every channel. Because the signal before it is output through the speaker is already known to the system, the multichannel microphone input signal may be defined as that signal multiplied by the path between the speaker and each microphone of the multichannel microphone. Accordingly, the echo signal is estimated from the known pre-output speaker signal (that is, the near-end signal), and the multichannel echo-cancelled output signal (104) may be generated by removing the estimated echo signal from the multichannel microphone input signal (103) on a channel-by-channel basis.

FIG. 2 is a flowchart illustrating an acoustic echo cancellation method according to an embodiment of the present invention, and FIG. 3 is a block diagram illustrating the internal configuration of an acoustic echo cancellation system according to an embodiment of the present invention.

Referring to FIG. 3, the acoustic echo cancellation system (300) may include an input controller (310), an echo signal estimator (320), and an output signal controller (330). The steps of the acoustic echo cancellation method shown in FIG. 2 (that is, steps 210 to 240) may be performed by these components of the acoustic echo cancellation system (300) of FIG. 3: the input controller (310), the echo signal estimator (320), and the output signal controller (330).

In step 210, the input controller (310) may receive, through the multichannel microphone, a voice signal and an audio signal output through the speaker.

For example, an input signal in which the audio signal of music or video being played through the speaker is mixed with the voice signal of a user (that is, a talker) speaking toward the acoustic echo cancellation system (300) from a distance may be received through the multichannel microphone as the multichannel microphone input signal. In an environment where the echo signal enters the multichannel microphone, the multichannel microphone input signal in the STFT (Short-Time Fourier Transform) domain may be defined as in Equation 1 below.

[Equation 1]

In Equation 1, k denotes the STFT frequency index and n denotes the frame index. The equation involves the multichannel microphone input signal vector, whose elements are the STFT coefficients of the individual microphone channel inputs (m being the channel index and M the total number of microphones); the multichannel voice signal vector; the STFT coefficient of the near-end signal before it is output through the speaker; and the multichannel acoustic transfer function of filter order L (that is, the order of the Kalman filter).

That is, according to Equation 1, the multichannel microphone input signal in the STFT (Short-Time Fourier Transform) domain may be expressed through the product of the multichannel acoustic transfer function, which represents the multichannel echo path, and the near-end signal. More precisely, it may be defined as that product plus the multichannel voice signal.
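Equation 1 itself is not reproduced in this text. A form consistent with the description above, written per frequency bin, would be the following. The symbol names and the matrix shapes are assumptions made here for illustration, not the patent's notation.

```latex
\mathbf{y}(k,n) = \mathbf{H}(k,n)\,\mathbf{x}(k,n) + \mathbf{s}(k,n)
% y(k,n) = [Y_1(k,n), ..., Y_M(k,n)]^T      microphone input vector (M x 1)
% H(k,n)                                     multichannel acoustic transfer function (assumed M x L)
% x(k,n) = [X(k,n), ..., X(k,n-L+1)]^T       near-end signal frames (assumed L x 1)
% s(k,n) = [S_1(k,n), ..., S_M(k,n)]^T       voice signal vector (M x 1)
```

With L = 1 this reduces to each microphone coefficient being a complex gain times the near-end coefficient, plus the talker's voice on that channel.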

In step 220, the echo signal estimator (320) may estimate a linear multichannel echo path, based on a Kalman filter, from the multichannel microphone input signal received through the multichannel microphone. Here, a linear multichannel echo path means that the path between the multichannel microphone and the speaker is linear.

For example, in the multichannel microphone input signal expressed as in Equation 1, the near-end signal and the multichannel microphone input signal are already known to the system, so the echo-cancelled signal may be obtained by estimating the multichannel acoustic transfer function that represents the multichannel echo path. The echo signal estimator (320) may estimate the multichannel echo path by performing dynamic modeling with a Kalman filter. That is, the echo signal estimator (320) may dynamically model, based on the Kalman filter, the linear relationship between the multichannel microphone input signal and the output signal. Here, the Kalman filter may be broadly divided into a prediction step and an update step.

In step 221, the echo signal estimator (320) may predict the mean and covariance associated with the multichannel acoustic transfer function, which is based on the inter-frame transition constant and the process noise.

In step 222, the echo signal estimator (320) may perform an update that corrects the predicted mean and covariance based on the multichannel microphone input signal. The mean and covariance updated through this correction may be used to estimate the multichannel acoustic transfer function (that is, the multichannel echo path) for the next frame.

In step 230, the echo signal estimator (320) may estimate the multichannel echo signal based on the estimated multichannel echo path.

In step 240, the output signal controller (330) may generate the multichannel echo-cancelled output signal based on the estimated multichannel echo signal and the multichannel microphone input signal.

For example, the output signal controller (330) may perform MIMO echo cancellation by removing the estimated multichannel echo signal from the multichannel microphone input signal.

FIG. 4 is a block diagram illustrating the mathematical relationships used for dynamic modeling with a Kalman filter according to an embodiment of the present invention.

That is, FIG. 4 shows, in block diagram form, the equations that combine the state-space model with the multichannel microphone input signal.

The echo-cancelled signal may be obtained by estimating the multichannel acoustic transfer function from the multichannel microphone input signal expressed as in Equation 1 above. The echo signal estimator (320) may then compute the echo-cancelled signal, expressed as in Equation 2 below, based on the estimated multichannel acoustic transfer function.

[Equation 2]
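Equation 2 is not reproduced in this text. Assuming the microphone input vector is written y(k,n), the near-end signal frames x(k,n), and the estimated transfer function carries a hat (all of these names are assumptions made here), the echo-cancelled signal described above would take the form:

```latex
\hat{\mathbf{d}}(k,n) = \hat{\mathbf{H}}(k,n)\,\mathbf{x}(k,n), \qquad
\mathbf{e}(k,n) = \mathbf{y}(k,n) - \hat{\mathbf{d}}(k,n)
% \hat{d}(k,n): estimated multichannel echo signal (M x 1)
% e(k,n):       multichannel echo-cancelled output signal (M x 1)
```

Because the subtraction is done per channel, the output keeps one signal per microphone, which is what preserves the spatial information mentioned earlier.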

For example, for Kalman filter-based dynamic modeling, the multichannel acoustic transfer function may be expressed as a first-order Markov model, as in Equation 3 below.

[Equation 3]

Equation 3 contains the inter-frame transition constant of the multichannel acoustic transfer function and the process noise, which may be defined as a zero-mean complex Gaussian.
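Equation 3 is not reproduced in this text. A first-order Markov model matching the description above can be written as follows, where h(k,n) stacks the transfer function coefficients of all channels into one vector, A is the scalar inter-frame transition constant, and w(k,n) is the process noise. The symbol names are assumptions made here.

```latex
\mathbf{h}(k,n) = A\,\mathbf{h}(k,n-1) + \mathbf{w}(k,n), \qquad
\mathbf{w}(k,n) \sim \mathcal{CN}\!\left(\mathbf{0},\, \boldsymbol{\Psi}_{w}(k,n)\right)
% h(k,n):      stacked echo-path coefficients (assumed ML x 1)
% Psi_w(k,n):  covariance of the process noise
```

In this model, a value of A close to 1 with a small process noise covariance describes an echo path that drifts slowly from frame to frame.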

A Kalman filter may be used to estimate effectively the multichannel acoustic transfer function dynamically modeled as in Equation 3. For example, the Kalman filter-based prediction is performed based on Equations 4 and 5 below (corresponding to step 221 described with reference to FIG. 2), and the Kalman filter-based update is performed based on Equations 7 and 8 below (corresponding to step 222 described with reference to FIG. 2).

[Equation 4]

[Equation 5]

Equations 4 and 5 above represent the prediction step of the Kalman filter, and Equation 5 contains the covariance of the process noise. That is, the covariance of the process noise may be predicted (estimated) based on Equation 5.
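Equations 4 and 5 are not reproduced in this text. For a state that follows a first-order Markov model with scalar transition constant A and process noise covariance Psi_w, the textbook Kalman prediction step is shown below. This is the standard form under assumed notation, not necessarily the patent's exact equations.

```latex
\begin{align}
\hat{\mathbf{h}}(k,n \mid n-1) &= A\,\hat{\mathbf{h}}(k,n-1 \mid n-1) \\
\mathbf{P}(k,n \mid n-1) &= A^{2}\,\mathbf{P}(k,n-1 \mid n-1) + \boldsymbol{\Psi}_{w}(k,n)
\end{align}
% \hat{h}(k,n|n-1): predicted mean of the stacked echo-path coefficients
% P(k,n|n-1):       predicted state error covariance
```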

The gain of the Kalman filter may be calculated from the mean and covariance obtained in the prediction step performed based on Equations 4 and 5 above. For example, the echo signal estimator (320) may calculate the gain of the Kalman filter based on Equation 6 below.

[Equation 6]

Once the gain of the Kalman filter has been calculated and the covariance of the process noise predicted, an update that refines the process noise estimate may be performed based on Equations 7 and 8 below. Although a zero-mean complex Gaussian is assumed here, the mean of the process noise may also be predicted based on the Kalman filter, and the update may then be performed based on the predicted mean and covariance.

[Equation 7]

[Equation 8]
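Equations 6 to 8 are not reproduced in this text. The textbook Kalman gain and update for a linear observation y(k,n) = X(k,n) h(k,n) + s(k,n) are given below as a reference form. Treating the talker's voice s(k,n) as observation noise with covariance Psi_s, and all symbol names, are assumptions made here.

```latex
\begin{align}
\mathbf{K}(k,n) &= \mathbf{P}(k,n \mid n-1)\,\mathbf{X}^{H}(k,n)
  \left[\mathbf{X}(k,n)\,\mathbf{P}(k,n \mid n-1)\,\mathbf{X}^{H}(k,n)
  + \boldsymbol{\Psi}_{s}(k,n)\right]^{-1} \\
\hat{\mathbf{h}}(k,n \mid n) &= \hat{\mathbf{h}}(k,n \mid n-1)
  + \mathbf{K}(k,n)\left[\mathbf{y}(k,n) - \mathbf{X}(k,n)\,\hat{\mathbf{h}}(k,n \mid n-1)\right] \\
\mathbf{P}(k,n \mid n) &= \left[\mathbf{I}_{ML} - \mathbf{K}(k,n)\,\mathbf{X}(k,n)\right]\mathbf{P}(k,n \mid n-1)
\end{align}
% X(k,n): observation matrix built from the near-end signal (assumed M x ML)
% K(k,n): Kalman gain (ML x M)
```

Unlike a fixed step size, the gain K(k,n) is recomputed every frame from the current covariances, so the adaptation speed follows the estimated uncertainty.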

Referring to FIG. 4, the multichannel acoustic transfer function, estimated based on the process noise and the inter-frame transition constant of Equation 3 above, passes through the near-end signal block to become the echo (that is, the multichannel echo signal), and the multichannel echo signal and the multichannel voice signal are added to give the multichannel microphone input signal. The two quantities that appear in Equations 7 and 8 above and in FIG. 4 may then be defined as in Equations 9 and 10 below.

[Equation 9]

[Equation 10]

In Equations 9 and 10 above, ⊗ denotes the Kronecker product, I denotes the identity matrix, and L denotes the filter order of the Kalman filter. H denotes the Hermitian (conjugate transpose) operator.
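One common use of the Kronecker product with an identity matrix in this setting is to build a single observation matrix that applies the same near-end signal frames to every microphone channel. The sketch below illustrates that idea with NumPy. It is an assumption about the role these operators play, not a reconstruction of Equations 9 and 10, and the sizes and variable names are example choices.

```python
import numpy as np

M, L = 3, 2                                   # microphones, filter order (example values)
rng = np.random.default_rng(1)
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)            # last L near-end frames
H = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))  # row m: path to mic m

X = np.kron(np.eye(M), x[None, :])            # (M, M*L) block-diagonal observation matrix
h = H.reshape(-1)                             # stack the rows: [h_1; h_2; ...; h_M]

echo_stacked = X @ h                          # all channels in one matrix product
echo_per_channel = H @ x                      # channel-by-channel filtering
X_hermitian = X.conj().T                      # Hermitian (conjugate transpose) of X
```

Both products give the same per-channel echo, so the stacked form lets a single Kalman filter carry one joint covariance over all channels.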

In this way, the multichannel acoustic transfer function is estimated based on Equations 4 to 10 above, and the echo signal of each channel can be estimated from the estimated transfer function and the signal to which it is applied. The output signal controller 330 can then generate the final multichannel echo-cancelled output signal based on the estimated multichannel acoustic transfer function and Equation 2 above; that is, it computes the output expressed by Equation 2. For example, the output signal controller 330 may generate an echo-cancelled output signal for each channel by subtracting (i.e., removing) the echo signal estimated for that channel from the multichannel microphone input signal.
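The per-channel subtraction described above can be sketched as follows. The sketch assumes STFT-domain signals, a single loudspeaker reference, and a filter order of 1, so that the estimated echo in each channel is the estimated transfer function multiplied by the reference spectrum. The function name `cancel_echo` and the array layouts are hypothetical.

```python
import numpy as np

def cancel_echo(Y, H_hat, X):
    """Subtract the estimated echo from each microphone channel.

    Y:     (M, F, T) microphone STFT (M channels, F bins, T frames)
    H_hat: (M, F)    estimated acoustic transfer function per channel and bin
    X:     (F, T)    loudspeaker reference STFT
    Returns the (M, F, T) echo-cancelled output.
    """
    echo_hat = H_hat[:, :, None] * X[None, :, :]   # estimated echo per channel
    return Y - echo_hat

# Toy check: with a perfect estimate, only the near-end speech remains.
rng = np.random.default_rng(1)
M, F, T = 4, 8, 5
S = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
H = rng.standard_normal((M, F)) + 1j * rng.standard_normal((M, F))
X = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
Y = S + H[:, :, None] * X[None, :, :]
E = cancel_echo(Y, H, X)
```

Because the output keeps one signal per microphone, it can still be fed to multichannel post-processing such as beamforming.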

Table 1 below shows the echo cancellation performance evaluated for different numbers of microphones.

For Table 1, performance was measured for the case in which the multichannel echo-cancelled output signal is generated by performing dynamic modeling with a Kalman filter in a multichannel input/output environment and updating the Kalman filter. The simulation environment may be as follows.

For example, the simulation may be performed using impulse responses for reverberation times of 0.3 s and 0.5 s in a room of fixed size. Two or four microphones are used, arranged in a circular structure with a spacing of 8 cm between microphones. The user (i.e., the talker) is assumed to be 1 m away from the microphones, the loudspeaker that plays back the echo signal is placed 0.5 m away from the microphones, and the talker and the loudspeaker are assumed to be separated by an angle of 90 degrees. The talker signal is a speech signal of a male user recorded in an anechoic chamber, and the echo signal is assumed to include a speech signal of a female talker recorded in an anechoic chamber. An STFT of size 1024 with 75% overlap was used, and the filter order L was set to 1.

As this setup indicates, the echo signal may include not only the signal that is output through the loudspeaker and picked up again by the microphones, but also speech from users other than the user who utters a voice recognition command to the system. In addition, various naturally occurring sounds picked up by the microphones (for example, birdsong, a chair being dragged, or cooking noises), other than the speech of the user uttering the command, may also correspond to the echo signal.
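Two of the numbers in this setup can be turned into concrete values with a short calculation. The sketch below assumes that the 8 cm spacing is the distance between adjacent microphones on the circle and that the array is centered at the origin; the helper names `stft_hop` and `circular_array` are hypothetical.

```python
import math

def stft_hop(frame_size, overlap):
    """Hop size in samples for a given STFT frame size and overlap ratio."""
    return int(round(frame_size * (1.0 - overlap)))

def circular_array(num_mics, spacing_m):
    """2-D microphone coordinates for a uniform circular array.

    Assumes spacing_m is the chord length between adjacent microphones,
    so the radius is spacing / (2 * sin(pi / num_mics)).
    """
    radius = spacing_m / (2.0 * math.sin(math.pi / num_mics))
    return [
        (radius * math.cos(2.0 * math.pi * k / num_mics),
         radius * math.sin(2.0 * math.pi * k / num_mics))
        for k in range(num_mics)
    ]

hop = stft_hop(1024, 0.75)          # 256 samples between frames
mics2 = circular_array(2, 0.08)     # two microphones, 8 cm apart
mics4 = circular_array(4, 0.08)     # four microphones, 8 cm between neighbours
```

Under this reading, the two-microphone array has a radius of 4 cm and the four-microphone array a radius of about 5.7 cm.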

In Table 1, two measures were used to verify the echo cancellation performance: ERLE (echo return loss enhancement), which measures the degree of echo cancellation and is presented in non-patent document [1] S. Malik and G. Enzner, "Recursive Bayesian control of multichannel acoustic echo cancellation," IEEE Signal Processing Letters, Vol. 18, No. 11, pp. 619-622, Nov. 2011, and PESQ (perceptual evaluation of speech quality), which measures the degree of speech distortion and is presented in non-patent document [2] S. Y. Lee and N. S. Kim, "A statistical model based residual echo suppression," IEEE Signal Processing Letters, Vol. 14, No. 10, pp. 758-761, Oct. 2007.
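For orientation, ERLE is commonly computed as the ratio, in decibels, of the microphone signal power to the residual power after cancellation, measured while only echo is present. The sketch below uses that common definition; it may differ in detail from the procedure in reference [1], and the function name `erle_db` is hypothetical.

```python
import numpy as np

def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement in dB (common textbook definition).

    mic:      microphone signal containing echo only
    residual: signal after echo cancellation
    eps:      small constant that avoids division by zero
    """
    p_mic = np.mean(np.abs(mic) ** 2)
    p_res = np.mean(np.abs(residual) ** 2)
    return 10.0 * np.log10((p_mic + eps) / (p_res + eps))

# Example: reducing the echo amplitude by a factor of 10 gives about 20 dB.
mic = np.ones(100)
residual = 0.1 * np.ones(100)
value = erle_db(mic, residual)
```

A higher ERLE means more echo was removed; it says nothing about speech distortion, which is why a second measure such as PESQ is reported alongside it.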

According to Table 1, ERLE and PESQ both improve as the number of microphones increases in the two reverberant environments. In other words, the echo signal is estimated more accurately in a multichannel microphone environment than in a single-channel one, so echo cancellation performance improves while speech distortion is reduced relative to existing single-channel methods based on the Kalman filter. As the number of microphones increases, acoustic echo cancellation performance improves and speech distortion decreases, so overall speech quality improves. Accordingly, after the echo signal has been removed, multichannel microphone signal processing can be applied as post-processing, which allows the method to be used in applications such as sound source direction estimation and beamforming.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for cancelling acoustic echo, the method comprising:
receiving, through a multichannel microphone, a voice signal and an audio signal output through a speaker;
estimating a linear multichannel echo path based on a Kalman filter applied to the multichannel microphone input signal input through the multichannel microphone;
estimating a multichannel echo signal based on the estimated multichannel echo path; and
generating a multichannel echo cancellation output signal based on the estimated multichannel echo signal and the multichannel microphone input signal,
wherein the multichannel microphone input signal
includes a speech signal that is a speech recognition target and a speech signal other than that speech signal, input through a plurality of microphones, and is based on a product of the multichannel echo path and a near-end signal before being output through the speaker, and
wherein the generating of the multichannel echo cancellation output signal comprises
generating the multichannel echo cancellation output signal for each channel by removing the multichannel echo signal estimated for each channel from the multichannel microphone input signal.

(Deleted)

The method of claim 1,
wherein the estimating of the multichannel echo path comprises
dynamically modeling a linear relationship between the multichannel microphone input signal and the output signal based on the Kalman filter.

The method of claim 1,
wherein the estimating of the multichannel echo path comprises
predicting a covariance of process noise in order to estimate a multichannel acoustic transfer function based on the process noise.

The method of claim 4,
wherein the estimating of the multichannel echo path further comprises
an update step of correcting the predicted covariance based on the multichannel microphone input signal.

The method of claim 5,
wherein the covariance updated through the correction is used to estimate the multichannel acoustic transfer function corresponding to the next frame.

(Deleted)

A system for cancelling acoustic echo, the system comprising:
an input controller configured to receive, through a multichannel microphone, a voice signal and an audio signal output through a speaker;
an echo signal estimator configured to estimate a linear multichannel echo path based on a Kalman filter applied to the multichannel microphone input signal input through the multichannel microphone, and to estimate a multichannel echo signal based on the estimated multichannel echo path; and
an output signal generator configured to generate a multichannel echo cancellation output signal based on the estimated multichannel echo signal and the multichannel microphone input signal,
wherein the multichannel microphone input signal
includes a speech signal that is a speech recognition target and a speech signal other than that speech signal, input through a plurality of microphones, and is based on a product of the multichannel echo path and a near-end signal before being output through the speaker, and
wherein the output signal generator
generates the multichannel echo cancellation output signal for each channel by removing the multichannel echo signal estimated for each channel from the multichannel microphone input signal.

(Deleted)

The system of claim 8,
wherein the echo signal estimator
dynamically models a linear relationship between the multichannel microphone input signal and the output signal based on the Kalman filter.

The system of claim 8,
wherein the echo signal estimator
predicts a covariance of process noise in order to estimate a multichannel acoustic transfer function based on the process noise.

The system of claim 11,
wherein the echo signal estimator
updates the predicted covariance based on the multichannel microphone input signal.

The system of claim 12,
wherein the covariance updated through the correction is used to estimate the multichannel acoustic transfer function corresponding to the next frame.

(Deleted)