KR101537653B1

KR101537653B1 - Method and system for noise reduction based on spectral and temporal correlations

Info

Publication number: KR101537653B1
Application number: KR1020130167932A
Authority: KR
Inventors: 김남수; 진유광
Original assignee: 서울대학교산학협력단
Priority date: 2013-12-31
Filing date: 2013-12-31
Publication date: 2015-07-17
Also published as: KR20150078510A

Abstract

The present invention relates to a noise reduction method and system, and more specifically to a method comprising: (1) receiving an acoustic signal from at least one microphone; (2) generating an extended vector by integrating, into the acoustic signal received in step (1), the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and (3) deriving a noise-reduced acoustic signal by processing the extended vector through a noise reduction filter.
According to the noise reduction method and system reflecting frequency or temporal correlation proposed in the present invention, an extended vector is generated by integrating, into the received acoustic signal, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range, and the extended vector is processed through a noise reduction filter to derive a noise-reduced acoustic signal. Because the input signal components of past frames and adjacent frequencies are considered together with the input signal at the current frequency and current time, more accurate and effective noise reduction is possible.
In addition, according to the present invention, a power spectral density (PSD) matrix is computed for the extended vector, and a predetermined parameterized filter is applied to the computed PSD matrix to derive the noise-reduced acoustic signal. The PSD matrix is updated every frame using a speech presence probability (SPP) estimated from the PSD matrix, which makes the noise reduction more stable.

Description

Method and system for noise reduction based on spectral and temporal correlations

The present invention relates to a noise reduction method and system, and more particularly to a noise reduction method and system that reflects frequency or temporal correlation.

An acoustic signal captured through a microphone or the like contains various kinds of noise in addition to the speech that is actually of interest. To improve sound quality by removing such noise, the channel transfer function between the speech source and the target microphone is estimated from data collected by a microphone array, and a noise reduction filter based on the estimated transfer function is constructed. More recently, the PMWF, the MVDR beamformer, the GSC, and related methods based on noise and speech power spectral density (PSD) matrices have been studied (see M. Souden, J. Benesty, and S. Affes, "On optimal frequency-domain multichannel linear filtering for noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 260-276, Feb. 2010; and M. Souden, J. Chen, J. Benesty, and S. Affes, "An integrated solution for online multichannel noise tracking and reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2159-2169, Sep. 2011). These conventional noise reduction methods consider only the spatial correlation among the N microphone inputs and do not take the frequency or temporal characteristics of the signal into account. Likewise, the speech presence probability (SPP) is generally computed from the input at the current time and current frequency only (see M. Souden, J. Chen, J. Benesty, and S. Affes, "Gaussian model-based multichannel speech presence probability," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 1072-1077, Jul. 2010).

However, acoustic signals are analyzed in the short-time Fourier transform (STFT) domain. When a linear time-invariant (LTI) system defined in the time domain is analyzed in the STFT domain, the finite length of the analysis window inevitably introduces correlation across frequency in the STFT domain, which degrades noise reduction performance (see Y. Avargel and I. Cohen, "System identification in the short-time Fourier transform domain with crossband filtering," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1305-1319, May 2007). In addition, in real environments with reverberation, such as offices or rooms, reverberation strengthens the temporal correlation of the received signal, so conventional methods that ignore this correlation cannot reduce noise effectively.

To solve these problems, the present inventors sought to develop a method and system capable of more effective and accurate noise reduction by considering not only spatial correlation but also frequency correlation and temporal correlation.

The present invention is proposed to solve the above problems of previously proposed methods. Its object is to provide a noise reduction method and system reflecting the correlation of frequency and time, in which an extended vector is generated by integrating, into the received acoustic signal, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range, and the extended vector is processed through a noise reduction filter to derive a noise-reduced acoustic signal, so that the input signal components of past frames and adjacent frequencies are considered together with the input signal at the current frequency and current time and more accurate and effective noise reduction becomes possible.

Another object of the present invention is to provide a noise reduction method and system reflecting frequency or temporal correlation in which a power spectral density (PSD) matrix is computed for the extended vector and a predetermined parameterized filter is applied to the computed PSD matrix to derive the noise-reduced acoustic signal, and in which the PSD matrix is updated every frame using a speech presence probability (SPP) estimated from the PSD matrix, so that noise reduction becomes more stable.

To achieve the above objects, a noise reduction method reflecting frequency or temporal correlation according to a feature of the present invention comprises:

(1) receiving an acoustic signal from at least one microphone;

(2) generating an extended vector by integrating, into the acoustic signal received in step (1), the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and

(3) deriving a noise-reduced acoustic signal by processing the extended vector through a noise reduction filter.

Preferably, step (2) may generate an extended vector that accounts for frequency correlation and temporal correlation by integrating, into the acoustic signal received in step (1), for which spatial correlation is considered, the frequency-axis and time-axis spectra adjacent to the current frequency and the current time within a predetermined range.

Preferably, the extended vector in step (2) may be constructed by integrating frequency-axis or time-axis spectra into each of the noisy speech spectrum, the speech spectrum, and the noise spectrum derived from the acoustic signal.

Preferably, step (3) may compute a power spectral density (PSD) matrix for the extended vector and apply a predetermined parameterized filter to the computed PSD matrix to derive the noise-reduced acoustic signal.

More preferably, the PSD matrix may be updated every frame using a speech presence probability (SPP) estimated from the PSD matrix.

More preferably, among the PSD matrices,

the PSD matrix for the noisy speech spectrum is derived from the acoustic signal received in step (1) and is estimated by multiplying the new input at each frame by a predetermined weight;

the PSD matrix for the noise spectrum is updated according to a predetermined weight when the speech presence probability is zero, and according to a weight that reflects the estimated speech presence probability otherwise; and

the PSD matrix for the speech spectrum may be derived as the difference between the PSD matrix for the noisy speech spectrum and the PSD matrix for the noise spectrum.

To achieve the above objects, a noise reduction system reflecting frequency or temporal correlation according to a feature of the present invention comprises:

an acoustic signal input module that receives an acoustic signal from at least one microphone;

a vector generation module that generates an extended vector by integrating, into the acoustic signal received by the acoustic signal input module, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and

a noise reduction module that derives a noise-reduced acoustic signal by processing the extended vector through a noise reduction filter.

Preferably, the noise reduction module may comprise:

a computation module that computes a power spectral density (PSD) matrix for the extended vector; and

a filter application module that applies a parameterized multichannel Wiener filter to the computed PSD matrix.

According to the noise reduction method and system reflecting frequency or temporal correlation proposed in the present invention, an extended vector is generated by integrating, into the received acoustic signal, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range, and the extended vector is processed through a noise reduction filter to derive a noise-reduced acoustic signal. Because the input signal components of past frames and adjacent frequencies are considered together with the input signal at the current frequency and current time, more accurate and effective noise reduction is possible.

In addition, according to the present invention, a power spectral density (PSD) matrix is computed for the extended vector, and a predetermined parameterized filter is applied to the computed PSD matrix to derive the noise-reduced acoustic signal. The PSD matrix is updated every frame using a speech presence probability (SPP) estimated from the PSD matrix, which makes the noise reduction more stable.

FIG. 1 is a flowchart of a noise reduction method reflecting frequency or temporal correlation according to an embodiment of the present invention.
FIG. 2 is a flowchart of a noise reduction method reflecting frequency or temporal correlation according to another embodiment of the present invention.
FIG. 3 is a schematic diagram of the flow of a noise reduction method reflecting frequency or temporal correlation according to an embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of a noise reduction system reflecting frequency or temporal correlation according to an embodiment of the present invention.
FIG. 5 is a diagram showing experimental results confirming the noise reduction performance.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice it. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention. Parts with similar functions and operations are given the same or similar reference numerals throughout the drawings.

In addition, throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another element in between. Also, to "include" a component means that other components may be further included, not excluded, unless specifically stated otherwise.

Conventional noise reduction methods consider only the spatial correlation (N) among the inputs of N different microphones. In terms of short-time Fourier transform (STFT) coefficients, the noisy speech is denoted y_i(k,t), the clean speech x_i(k,t), and the noise signal v_i(k,t), each for the k-th frequency bin of frame t obtained from the i-th microphone. Their relationship is given by Equation 1: y_i(k,t) = x_i(k,t) + v_i(k,t).

In the conventional approach to multichannel speech enhancement, the speech estimate for the i-th channel is obtained by multiplying the input microphone array signal by an appropriate noise reduction gain, as in Equation 2 below.
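As a point of reference, the conventional estimate described here is usually written in the following form in the multichannel filtering literature. The stacked vector $\mathbf{y}(k,t)$ and the gain vector $\mathbf{g}_i(k,t)$ are notation assumed for illustration; the patent's own rendering of Equation 2 may differ.

```latex
\begin{align}
\hat{x}_i(k,t) &= \mathbf{g}_i^{H}(k,t)\,\mathbf{y}(k,t), \\
\mathbf{y}(k,t) &= [\,y_1(k,t),\ y_2(k,t),\ \dots,\ y_N(k,t)\,]^{T}.
\end{align}
```

Only the N microphone observations at the current bin k and frame t enter this estimate, which is the limitation the extended vector is meant to remove.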

Such conventional methods cannot account for frequency and temporal correlation, and therefore fail to achieve accurate and effective noise reduction in real environments because of reverberation and the finite length of the analysis window.

FIG. 1 is a flowchart of a noise reduction method reflecting frequency or temporal correlation according to an embodiment of the present invention. As shown in FIG. 1, the method may be implemented to include: receiving an acoustic signal from at least one microphone (S100); generating an extended vector by integrating, into the acoustic signal received in step S100, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range (S200); and deriving a noise-reduced acoustic signal by processing the extended vector through a noise reduction filter (S300). That is, the present invention proposes a more general form of noise reduction that considers both temporal correlation (L) and frequency correlation (M) (see Equation 3 below). In Equation 3, G_i(k,t) denotes the noise reduction filter gains. N represents the spatial correlation, and in the noise reduction filter gains the length of the input signal vector (the number of components considered) increases from N to N'. Depending on the embodiment, N' = N x (2M+1) x (L+1), covering a total of 2M+1 adjacent frequency bins from k-M to k+M and L+1 adjacent frames from t-L to t. G_i(k,t), Y(k,t), and the like may each be arranged as a single vector in a predetermined order. In G_i^H(k,t), the superscript H denotes the conjugate transpose of the vector.
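The construction of the extended vector can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the (microphones, bins, frames) array layout, the stacking order, and the zero-filling of out-of-range neighbors are all assumptions made here.

```python
import numpy as np

def build_extended_vector(stft, k, t, M, L):
    """Stack the spectra around bin k and frame t into one vector.

    stft: complex array of shape (N, K, T) = (microphones, bins, frames).
    Returns a vector of length N * (2M + 1) * (L + 1).
    Neighbors that fall outside the spectrogram are zero-filled
    (an assumption; the source does not specify edge handling).
    """
    N, K, T = stft.shape
    parts = []
    for dt in range(L + 1):              # frames t, t-1, ..., t-L
        for dk in range(-M, M + 1):      # bins k-M, ..., k+M
            kk, tt = k + dk, t - dt
            if 0 <= kk < K and 0 <= tt < T:
                parts.append(stft[:, kk, tt])
            else:
                parts.append(np.zeros(N, dtype=stft.dtype))
    return np.concatenate(parts)

def current_index(N, M, i):
    """Position of microphone i at the current bin and frame.

    With the stacking order above, the current frame is the first block
    and the current bin is its M-th sub-block of N entries.
    """
    return M * N + i
```

With N = 2 microphones, M = 1, and L = 2, the vector has 2 x 3 x 3 = 18 entries, and `current_index(2, 1, i)` gives the entry holding microphone i at the current bin and frame.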

Specifically, by considering the spectra of adjacent frequency bins and frames, the effect of the impulse response produced by a reverberant room or similar environment can be taken into account in finer detail to enhance the speech. The approach also overcomes the drawback of the prior art in which the limited length of the windowing function used in acoustic signal processing causes interference between frequency bands, so that a system can be built in which the information the signal carries in the time domain is fully considered in the spectral domain as well. Each step of the proposed noise reduction method is described in detail below.

In step S100, an acoustic signal may be received from at least one microphone. There may be a single microphone or a plurality of two or more microphones.

In step S200, an extended vector may be generated by integrating, into the acoustic signal received in step S100, the frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range. Preferably, an extended vector that accounts for frequency correlation and temporal correlation may be generated by integrating, into the acoustic signal received in step S100, for which spatial correlation is considered, the frequency-axis and time-axis spectra adjacent to the current frequency and the current time within a predetermined range. The extended vector may be constructed by integrating frequency-axis or time-axis spectra into each of the noisy speech spectrum, the speech spectrum, and the noise spectrum derived from the acoustic signal. When the extended vectors are constructed to reflect the relationship of the signals along the frequency and time axes, the speech spectrum in a noisy environment may be written as Equation 4: Y(k,t) = X(k,t) + V(k,t).

(Y: noisy speech spectrum, X: speech spectrum, V: noise spectrum)

In step S300, the extended vector may be processed through a noise reduction filter to derive a noise-reduced acoustic signal. Preferably, a power spectral density (PSD) matrix is computed for the extended vector, and a predetermined parameterized filter is applied to the computed PSD matrix to derive the noise-reduced acoustic signal. That is, the PSD matrices of the extended noise and speech vectors, used to perform speech enhancement in the extended domain, may be defined as in Equation 5 below. Whereas the conventional signal PSD matrices are N x N matrices that consider only the current time and current frequency components, the present invention presents extended PSD matrices of size N' x N' that also consider past-frame components and adjacent frequency components and thus contain the correlations among them. This can be regarded as a newly generalized statistical model that retains all the information of the existing statistical model while additionally carrying frequency and temporal correlation information.
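A plausible form of the extended PSD matrices, written out for orientation. It assumes zero-mean signals and mutually uncorrelated speech and noise, and the symbol $\boldsymbol{\Phi}$ is notation chosen here rather than taken from the source.

```latex
\begin{align}
\boldsymbol{\Phi}_{YY}(k,t) &= E\{\mathbf{Y}(k,t)\,\mathbf{Y}^{H}(k,t)\}, \\
\boldsymbol{\Phi}_{XX}(k,t) &= E\{\mathbf{X}(k,t)\,\mathbf{X}^{H}(k,t)\}, \\
\boldsymbol{\Phi}_{VV}(k,t) &= E\{\mathbf{V}(k,t)\,\mathbf{V}^{H}(k,t)\}, \\
\boldsymbol{\Phi}_{YY}(k,t) &= \boldsymbol{\Phi}_{XX}(k,t) + \boldsymbol{\Phi}_{VV}(k,t).
\end{align}
```

Each matrix is N' x N' and Hermitian. The N x N block belonging to the current bin and frame is the conventional PSD matrix, while the remaining blocks hold the cross-bin and cross-frame correlations.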

Meanwhile, extending the existing parameterized multichannel Wiener filter to the range of adjacent frames and frequency bins, based on the PSD matrices of the extended vectors defined above, yields the filter of Equation 6 below.

This extends the existing parameterized noncausal multichannel Wiener filter (PMWF) to a form that considers frequency and temporal correlation, and it may be based on the extended PSD matrices proposed above. The vector that appears in the filter has a single 1 among its N' components and 0 elsewhere. The position of the 1 depends on the order in which the elements of Y are arranged above; in effect, the vector picks out the one position, among the N' gathered neighboring frequency and time components, where the current-time, current-frequency component is located. The parameter that appears in the filter determines the optimal operating point by balancing noise reduction against speech distortion. The parameterized multichannel Wiener filter is a generalization of the existing minimum variance distortionless response (MVDR) filter and the generalized sidelobe canceller (GSC), and it can estimate the desired signal without knowledge of the transfer function of the input signal. Extending this filter to adjacent frames and frequency bins enables more accurate speech estimation. The speech signal obtained through this noise reduction filter can be expressed as in Equation 7 below.
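A minimal sketch of how the extended filter and its output could be computed, assuming it keeps the standard PMWF form G = Φ_VV^{-1} Φ_XX u / (β + tr(Φ_VV^{-1} Φ_XX)) known from the multichannel Wiener filtering literature. The names `pmwf_filter`, `apply_filter`, `phi_xx`, `phi_vv`, `u`, and `beta` are illustrative and do not come from the patent.

```python
import numpy as np

def pmwf_filter(phi_xx, phi_vv, u, beta=1.0):
    """Parameterized multichannel Wiener filter (assumed standard form).

    phi_xx, phi_vv: (N', N') speech and noise PSD matrices.
    u: length-N' selection vector (a single 1, zeros elsewhere).
    beta: trade-off between noise reduction and speech distortion.
    """
    ratio = np.linalg.solve(phi_vv, phi_xx)   # Phi_VV^{-1} Phi_XX
    lam = np.trace(ratio).real                # tr(Phi_VV^{-1} Phi_XX)
    return (ratio @ u) / (beta + lam)

def apply_filter(G, Y):
    """Speech estimate G^H Y for one bin and frame."""
    return np.vdot(G, Y)   # np.vdot conjugates its first argument
```

With `beta = 0` and a rank-one speech PSD matrix, this form reduces to the MVDR filter and passes the selected component undistorted; larger `beta` values trade some speech distortion for more noise reduction.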

FIG. 2 is a flowchart of a noise reduction method reflecting frequency or temporal correlation according to another embodiment of the present invention. As shown in FIG. 2, the method according to this embodiment may be implemented to include: receiving an acoustic signal from at least one microphone (S100); generating an extended vector that accounts for frequency correlation and temporal correlation by integrating, into the acoustic signal received in step S100, for which spatial correlation is considered, the frequency-axis or time-axis spectra adjacent to the current frequency and the current time within a predetermined range (S200'); and deriving a noise-reduced acoustic signal by processing the extended vector through a noise reduction filter (S300). Step S300 includes: computing a power spectral density (PSD) matrix for the extended vector (S310); applying a predetermined parameterized filter to the computed PSD matrix to derive the noise-reduced acoustic signal (S320); estimating a speech presence probability (SPP) based on the PSD matrix (S330); and updating the PSD matrix every frame using the estimated SPP (S340).

Specifically, step S300 computes a PSD matrix for the extended vector (S310) and applies a predetermined parameterized filter to the computed PSD matrix to derive the noise-reduced acoustic signal (S320). The speech estimate obtained by applying this filter depends on the accuracy of the PSD matrices of the extended vectors. To estimate them stably, it is therefore preferable to update the existing estimate of each PSD matrix at every frame using a new estimate. Depending on the embodiment, the PSD matrix may be updated every frame (S340) using a speech presence probability (SPP) estimated from the PSD matrix (S330).

Meanwhile, among the PSD matrices, the PSD matrix for the noisy speech spectrum is derived from the acoustic signal received in step S100 and may be estimated by multiplying the new input at each frame by a predetermined weight. The PSD matrix for the noise spectrum is updated according to a predetermined weight when the speech presence probability is zero, and may be updated according to a weight that reflects the estimated speech presence probability otherwise (see Equations 8 and 9 below).

Here, the two weights appearing in Equations 8 and 9 are forgetting factors. The one used for the noisy speech PSD matrix is fixed to a constant value, whereas the time-varying, frequency-dependent smoothing factor used for the noise PSD matrix may be updated according to Equation 10 below. The term in Equation 10 that depends on frequency and frame is the speech presence probability (SPP) at each frequency bin and frame, which is described in more detail later.

The PSD matrix of the speech spectrum can be derived as the difference between the PSD matrix of the noisy speech spectrum and the PSD matrix of the noise spectrum (see Equation 11).
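The recursive updates and the subtraction described above can be sketched as follows. This is a minimal illustration, not the patent's Equations 8 to 11, which are not reproduced in this text. It assumes a first-order recursive average with a rank-one instantaneous estimate, and assumes the time-varying smoothing factor takes the commonly used form `alpha_v + (1 - alpha_v) * spp`, so that the noise estimate freezes when speech is certain. The names `update_psd`, `alpha_y`, and `alpha_v`, and the default value 0.92, are hypothetical.

```python
def outer(y):
    """Rank-one instantaneous estimate y y^H for a complex vector y."""
    return [[a * b.conjugate() for b in y] for a in y]


def smooth(prev, new, alpha):
    """Elementwise recursive average: alpha * prev + (1 - alpha) * new."""
    return [[alpha * p + (1.0 - alpha) * n for p, n in zip(prow, nrow)]
            for prow, nrow in zip(prev, new)]


def update_psd(phi_yy, phi_vv, y, spp, alpha_y=0.92, alpha_v=0.92):
    """One frame of PSD tracking for a single frequency bin.

    phi_yy, phi_vv: previous noisy-speech and noise PSD matrices (lists of rows).
    y:              current extended observation vector (complex values).
    spp:            speech presence probability in [0, 1] for this bin and frame.
    alpha_y, alpha_v are assumed forgetting factors, not values from the source.
    """
    inst = outer(y)
    # Noisy-speech PSD: fixed forgetting factor.
    phi_yy = smooth(phi_yy, inst, alpha_y)
    # Noise PSD: the smoothing factor approaches 1 as speech becomes likely,
    # so the noise estimate is left nearly unchanged during speech (assumed form).
    alpha_tilde = alpha_v + (1.0 - alpha_v) * spp
    phi_vv = smooth(phi_vv, inst, alpha_tilde)
    # Speech PSD: difference between the noisy-speech and noise PSD matrices.
    phi_xx = [[a - b for a, b in zip(yrow, vrow)]
              for yrow, vrow in zip(phi_yy, phi_vv)]
    return phi_yy, phi_vv, phi_xx
```

With `spp = 0` the noise matrix follows the noisy-speech matrix exactly (when both start from the same state), and with `spp = 1` the noise matrix is left untouched, which matches the two cases described in the text.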

The speech presence probability (SPP) is a very important parameter in estimating each PSD matrix. To obtain it, the speech and noise vectors extended along the frequency and time axes are assumed to follow Gaussian distributions, and Equation 12 below is derived from the ratio of the Gaussian probability densities. Λ(k, t) is the likelihood ratio at the k-th frequency bin, and det(·) denotes the determinant of a square matrix.

Based on Equation 12, the SPP can be calculated by Equation 13 below.

Here, the a priori speech absence probability is updated according to the MCRA algorithm. The PSD matrix estimates are updated in this way at every frame, and the quantities derived from them are used to estimate the speech spectrum in each frame, which is then transformed back to the time domain to perform multichannel speech enhancement.
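As an illustration of the SPP computation outlined above, the sketch below evaluates a likelihood ratio between two zero-mean complex Gaussian models and converts it to a posterior probability with Bayes' rule. It is not a reproduction of Equations 12 and 13, which are not available in this text. The assumptions are that the speech-present hypothesis uses the noisy-speech PSD matrix as covariance, that the speech-absent hypothesis uses the noise PSD matrix, and that the a priori speech absence probability `q` is passed in as a fixed value rather than tracked with MCRA. The function names are hypothetical.

```python
import math


def det_and_solve(a, b):
    """Return (det(A), x) with A x = b, by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) == 0:
            raise ValueError("singular matrix")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return det, x


def speech_presence_probability(y, phi_yy, phi_vv, q=0.5):
    """Posterior probability of speech presence for one frequency bin and frame.

    y:      extended observation vector.
    phi_yy: noisy-speech PSD matrix (covariance assumed when speech is present).
    phi_vv: noise PSD matrix (covariance assumed when speech is absent).
    q:      a priori speech absence probability (fixed here by assumption).
    Both matrices are expected to be Hermitian positive definite.
    """
    det_y, sol_y = det_and_solve(phi_yy, y)
    det_v, sol_v = det_and_solve(phi_vv, y)
    # Quadratic forms y^H Phi^{-1} y, real for Hermitian positive-definite Phi.
    quad_y = sum(a.conjugate() * b for a, b in zip(y, sol_y)).real
    quad_v = sum(a.conjugate() * b for a, b in zip(y, sol_v)).real
    # Log likelihood ratio of the two zero-mean complex Gaussian densities.
    log_lr = math.log(det_v.real) - math.log(det_y.real) + quad_v - quad_y
    log_lr = max(log_lr, -700.0)  # guard against overflow in exp()
    # Bayes' rule with prior odds q / (1 - q) for speech absence.
    return 1.0 / (1.0 + (q / (1.0 - q)) * math.exp(-log_lr))
```

When the two matrices are equal the likelihood ratio is 1 and the result reduces to the prior probability of speech presence, `1 - q`; observations with much more energy than the noise model predicts push the result toward 1.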

FIG. 3 is a diagram illustrating the flow of a noise removal method reflecting frequency or temporal correlation according to an embodiment of the present invention. As shown in FIG. 3, the method generates an extended vector by integrating the spectra of the adjacent time axis (past time components) and frequency axis (adjacent frequency components) for the input channels; computes the PSD matrix of the extended vector and updates it along the time axis; uses the PSD matrix of the extended vector to detect the speech presence probability for each time frame and frequency bin; and applies a parameterized multichannel Wiener filter to the estimated PSD matrices to obtain an estimate of the speech signal.
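The first stage of this flow, building the extended vector, can be sketched as below. The layout is an assumption made for illustration: the short-time spectrum is stored as `spec[channel][frame][bin]`, the frequency range M is taken symmetrically around the current bin (offsets from -M to +M), the time range L covers the current frame and the L preceding frames, and indices are clamped at the edges. The source does not specify these details, and the function name is hypothetical.

```python
def extended_vector(spec, k, t, M=1, L=1):
    """Stack the current and neighbouring spectral values into one vector.

    spec: complex spectra indexed as spec[channel][frame][bin] (assumed layout).
    k, t: current frequency bin and frame index.
    M:    number of adjacent bins taken on each side of k (assumed symmetric).
    L:    number of past frames taken before t.
    Returns a list of length N * (L + 1) * (2 * M + 1) for N channels.
    """
    n_bins = len(spec[0][0])
    vec = []
    for channel in spec:
        for l in range(L + 1):
            frame = max(t - l, 0)  # clamp at the first frame
            for m in range(-M, M + 1):
                b = min(max(k + m, 0), n_bins - 1)  # clamp at the band edges
                vec.append(channel[frame][b])
    return vec
```

With `M = 0` and `L = 0` the result is the ordinary multichannel observation vector with one entry per microphone, so the conventional spatial-only filter is the special case of this construction.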

FIG. 4 is a diagram illustrating the configuration of a noise removal system reflecting frequency or temporal correlation according to an embodiment of the present invention. As shown in FIG. 4, the system may include an acoustic signal input module (100) that receives an acoustic signal from at least one microphone; a vector generation module (200) that generates an extended vector by integrating, into the acoustic signal received by the acoustic signal input module, frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and a noise removal module (300) that processes the extended vector through a noise removal filter to derive a noise-removed acoustic signal. Depending on the embodiment, the noise removal module (300) may include a computation module (310) that computes a PSD matrix for the extended vector and a filter application module (320) that applies a parameterized multichannel Wiener filter to the computed PSD matrix. Because each component is similar to what was described above with reference to FIGS. 1 to 3, a detailed description is omitted.

Hereinafter, the effects of the present invention are described in more detail through experimental examples, but the scope of the present invention is not limited by the following experimental examples.

Experimental Example 1. Comparison of sound enhancement performance

The noise removal performance of the proposed technique was evaluated in terms of cepstrum distance, log-likelihood ratio, and segmental signal-to-noise ratio. The number of microphones (N) was fixed at 2, and performance was measured while L (time components) and M (frequency components) were each extended over 0, 1, and 2. The acoustic environment was generated with the image method for a room of 6.7 m × 6.1 m × 2.9 m with reverberation time T60 = 300 ms (see J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, pp. 943-950, Apr. 1979; E. A. Lehmann, "Image-source method for room acoustics," http://www.eric-lehmann.com/ism code.html). The speech and interference sources were located at (1.737 m, 4.6 m, 1.4 m) and (3.337 m, 4.6 m, 1.4 m), respectively. For the two-channel scenario (N = 2), the microphones were placed at (2.437 m, 5.6 m, 1.4 m) and (2.637 m, 5.6 m, 1.4 m). Utterances from the TIMIT database were used as the speech source signal, and the babble, factory, and F-16 noises from the NOISEX-92 database were used as the interference source at signal-to-interference ratios (SIR) of 0, 5, 10, 15, and 20 dB; reported results are averaged over these SIRs. Each signal was sampled at 16 kHz, and a half-overlapped analysis window of length 512 was applied. The evaluation metrics were the cepstrum distance (CD), the log-likelihood ratio (LLR) between the estimated and clean signals, and the segmental SNR (segSNR) measured after noise reduction.
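Of the three metrics, the segmental SNR is the simplest to state, and a minimal sketch is given below. The source does not give its exact definition, so several choices here are assumptions: frames of 512 samples with a hop of 256 (mirroring the half-overlapped analysis window), per-frame values clamped to the range -10 dB to 35 dB as is common practice, and a plain average over frames. The function name and parameters are hypothetical.

```python
import math


def segmental_snr(clean, estimate, frame_len=512, hop=256, lo=-10.0, hi=35.0):
    """Average per-frame SNR in dB between a clean signal and its estimate.

    clean, estimate: equal-length sequences of real samples.
    frame_len, hop:  frame size and shift in samples (assumed values).
    lo, hi:          clamping range in dB for each frame (assumed convention).
    """
    eps = 1e-12  # avoids division by zero and log of zero
    values = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        signal_energy = sum(s * s for s in c)
        error_energy = sum((s - r) ** 2 for s, r in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        values.append(min(max(snr_db, lo), hi))
    if not values:
        raise ValueError("signal is shorter than one frame")
    return sum(values) / len(values)
```

Because each frame is clamped before averaging, a few silent or perfectly reconstructed frames cannot dominate the score, which is the usual motivation for using the segmental rather than the global SNR.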

FIG. 5 shows the experimental results on noise removal performance. As shown in FIG. 5, the sound enhancement performance measured in terms of CD, LLR, and segSNR improved as the value of M or L increased.

In addition, performance was measured with M and L both extended to 1 at the same time. The results for the F-16 interference signal are shown in Table 1.

(F-16 interfering signal, averaged over 0, 5, 10, 15, and 20 dB)

As shown in Table 1, the case (M=1, L=1) gave better sound quality than (M=0, L=1) or (M=1, L=0). These experiments concern a directional noise environment produced by a point interference source.

Diffuse white noise with no directionality was then added to create a combined environment, and the sound enhancement performance was tested; the results are shown in Table 2.

(F-16 interfering signal at 10 dB + white noise at 15 dB)

As shown in Table 2, increasing M from 0 to 1 was more effective than increasing L from 0 to 1. Moreover, even in the combined environment with additional non-directional diffuse white noise, the case (M=1, L=1) gave better sound quality than (M=0, L=1) or (M=1, L=0).

As described above, the extended noise removal method and system proposed in the present invention, which take frequency correlation (M) and temporal correlation (L) into account, have an extended filter structure that considers as a whole not only the input signal at the current time and current frequency but also the input signal components at past times and adjacent frequencies. This enables accurate, effective, and stable noise removal.

The present invention described above may be variously modified or applied by those of ordinary skill in the art to which it pertains, and the scope of its technical idea should be determined by the claims below.

100: acoustic signal input module 200: vector generation module
300: noise removal module 310: computation module
320: filter application module
S100: receiving an acoustic signal from at least one microphone
S200: generating an extended vector by integrating, into the acoustic signal received in step S100, frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range
S200': generating an extended vector in which frequency correlation and temporal correlation are taken into account, by integrating, into the acoustic signal received in step S100 in which spatial correlation is taken into account, frequency-axis or time-axis spectra adjacent to the current frequency and the current time within a predetermined range
S300: processing the extended vector through a noise removal filter to derive a noise-removed acoustic signal
S310: computing a power spectral density (PSD) matrix for the extended vector
S320: applying a predetermined parameterized filter to the computed PSD matrix to derive the noise-removed acoustic signal
S330: estimating the speech presence probability (SPP) based on the PSD matrix
S340: updating the PSD matrix every frame using the estimated speech presence probability (SPP)

Claims

A noise removal method, comprising:
(1) receiving an acoustic signal from at least one microphone;
(2) generating an extended vector by integrating, into the acoustic signal received in step (1), frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and
(3) processing the extended vector through a noise removal filter to derive a noise-removed acoustic signal,
wherein step (3) comprises
computing a power spectral density (PSD) matrix for the extended vector, and applying a predetermined parameterized filter to the computed PSD matrix to derive the noise-removed acoustic signal, the noise removal method thereby reflecting frequency or temporal correlation.

The method of claim 1, wherein step (2)
integrates, into the acoustic signal received in step (1) in which spatial correlation is taken into account, frequency-axis or time-axis spectra adjacent to the current frequency and the current time within a predetermined range, thereby generating an extended vector in which frequency correlation and temporal correlation are taken into account.

The method of claim 1, wherein in step (2)
the frequency-axis or time-axis spectra are integrated for each of a noisy speech spectrum, a speech spectrum, and a noise spectrum derived from the acoustic signal.

delete

The method of claim 1, wherein the power spectral density (PSD) matrix
is updated every frame using a speech presence probability (SPP) estimated based on the PSD matrix.

The method of claim 1, wherein, among the power spectral density (PSD) matrices,
the PSD matrix for the noisy speech spectrum is derived from the acoustic signal received in step (1) and is estimated by multiplying a new input value of each frame by a predetermined weight,
the PSD matrix for the noise spectrum is updated according to a predetermined weight when there is no speech presence probability, and according to a weight reflecting the estimated speech presence probability when there is a speech presence probability, and
the PSD matrix for the speech spectrum is derived from the difference between the PSD matrix for the noisy speech spectrum and the PSD matrix for the noise spectrum.

A noise removal system, comprising:
an acoustic signal input module that receives an acoustic signal from at least one microphone;
a vector generation module that generates an extended vector by integrating, into the acoustic signal received by the acoustic signal input module, frequency-axis or time-axis spectra adjacent to the current frequency or the current time within a predetermined range; and
a noise removal module that processes the extended vector through a noise removal filter to derive a noise-removed acoustic signal,
wherein the noise removal module comprises:
a computation module that computes a power spectral density (PSD) matrix for the extended vector; and
a filter application module that applies a parameterized multichannel Wiener filter to the computed PSD matrix.

delete