KR101735313B1

KR101735313B1 - Phase corrected real-time blind source separation device

Info

Publication number: KR101735313B1
Application number: KR1020130092573A
Authority: KR
Inventors: 이윤경; 정호영; 이윤근
Original assignee: 한국전자통신연구원
Priority date: 2013-08-05
Filing date: 2013-08-05
Publication date: 2017-05-16
Also published as: KR20150016745A

Abstract

An embodiment provides a sound source separation device comprising: a noise removing unit that performs blind deconvolution on a sound source signal containing a speech signal and a noise signal, removes the noise signal from the sound source signal, and outputs the speech signal; an impulse response obtaining unit that obtains the impulse response of the noise removing unit based on the speech signal received from the noise removing unit; and a phase compensation unit that compensates for the phase distortion of the speech signal by applying the impulse response in the reverse direction with respect to time.

Description

[0001] The present invention relates to a phase-corrected real-time blind source separation device.

The embodiment relates to a real-time sound source separation device, and more particularly to a real-time sound source separation device that can readily compensate for the phase distortion of a speech signal introduced while removing a noise signal from an input sound source signal and separating the speech signal.

In general, blind source separation (BSS) separates signals collected from two or more microphones according to the statistical characteristics of the sound sources. BSS methods fall broadly into time-domain separation methods and frequency-domain separation methods.

To remove noise using blind source separation, the signals as they were before mixing are recovered by extracting mutually independent signals from input signals in which speech and noise signals are mixed. In other words, mixtures of several speech and noise signals are received as input, the noise signal and the speech signal are separated from this input and output separately, and speech recognition is then performed using only the speech signal from which the noise has been separated.

Time-domain separation methods in theory outperform frequency-domain methods, but in practice they are strongly affected by the position of the speaker and by the environment, the algorithms are complex, and the computational load is high.

Frequency-domain separation methods, by contrast, use intuitive algorithms that are simple to implement, but they inherently involve a scrambling (permutation) problem that is not easy to solve.

Recently, research has been under way on frequency-domain separation methods to compensate for the scrambling problem (phase distortion) of the speech signal from which the noise signal has been removed.

The embodiment aims to provide a real-time sound source separation device that can readily compensate for the phase distortion of the speech signal introduced while removing the noise signal from the sound source signal and separating the speech signal, and that performs sound source separation in real time.

A sound source separation device according to an embodiment includes: a speech signal blocking unit that, when a sound source signal containing a speech signal and a noise signal is input, divides the sound source signal into 1-second intervals and outputs it in block units; a noise removing unit that performs blind deconvolution and outputs the speech signal from which the noise signal has been removed; an impulse response obtaining unit that obtains the impulse response of the noise removing unit based on the speech signal; and a phase compensation unit that compensates for the phase distortion of the speech signal by applying the impulse response in the reverse direction with respect to time.

The noise removing unit according to an embodiment includes an FIR filter that performs blind deconvolution on the sound source signal based on the information maximization approach.

The FIR filter according to an embodiment includes a high-pass filter.

The phase compensation unit according to an embodiment includes an FIR filter whose impulse response is reversed in time and which filters the speech signal.

The FIR filter of the phase compensation unit according to an embodiment includes at least one of: the FIR filter included in the noise removing unit, a filter having the same filter coefficients, and a filter estimated using polynomial curve fitting.

The sound source separation device according to an embodiment addresses the low-frequency distortion of the speech signal that arises during blind deconvolution: it obtains the impulse response and applies it in the reverse direction with respect to time, thereby compensating for the phase shift of the speech signal, so that speech can be processed cleanly during speech signal processing.

In addition, because sound source separation and phase compensation are performed on the input speech signal divided into 1-second intervals, the speech signal can be processed cleanly in real time.

FIG. 1 is a control block diagram showing the control configuration of a sound source separation device according to a first embodiment.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that, in assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they would obscure the gist of the invention. Preferred embodiments are described below, but the technical idea of the invention is not limited to them and may be modified and practiced in various ways by those skilled in the art.

FIG. 1 is a control block diagram showing the control configuration of a sound source separation device according to an embodiment.

Referring to FIG. 1, the sound source separation device includes a speech signal blocking unit (10), a noise removing unit (20), an impulse response obtaining unit (30), and a phase compensation unit (40).

When a sound source signal containing a speech signal and a noise signal is input, the speech signal blocking unit (10) divides the sound source signal into 1-second intervals and outputs the sound source signal in block units.
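The blocking step can be illustrated with a minimal Python sketch. The function name `split_into_blocks`, the 16 kHz sample rate, and the choice to keep a shorter final block are assumptions made for illustration; the source specifies only the 1-second interval.

```python
import numpy as np

def split_into_blocks(signal, sample_rate, block_seconds=1.0):
    """Split a 1-D signal into consecutive blocks of block_seconds each."""
    block_len = int(round(sample_rate * block_seconds))
    if block_len <= 0:
        raise ValueError("block length must be at least one sample")
    signal = np.asarray(signal, dtype=float)
    # Assumption: a final block shorter than one second is kept as-is.
    return [signal[i:i + block_len] for i in range(0, len(signal), block_len)]

# Example: 2.5 s of samples at a hypothetical 16 kHz sample rate.
blocks = split_into_blocks(np.zeros(40000), sample_rate=16000)
block_lengths = [len(b) for b in blocks]
```

Each block would then be passed to the noise removing unit (20) in turn, which is what bounds the processing delay to roughly one block.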

The noise removing unit (20) performs blind deconvolution on the sound source signal and outputs the speech signal from which the noise signal has been removed.

Here, the blind deconvolution uses an FIR filter in the form of a high-pass filter obtained by the information-maximization approach, and the signal after filtering is given by [Equation 1] below, where the sequence Y(t) is the input speech signal.

Blind deconvolution is performed according to the following steps.

Step 1: Set an initial estimate for the filter coefficients wk and initialize the sequences U(t) and Z(t).

Step 2: Compute the gradient descent rule for the filter coefficients.

Step 3: Update the filter coefficients and use them to compute the sequence U(t).

Step 4: If the filter coefficients have not converged, repeat from Step 2.

Step 5: Using [Equation 1], form the weighted sum of the input sound source signal with the filter coefficients to estimate the noise-removed feature sequence U(t).
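The five steps can be sketched as an iterative loop in Python. This is a sketch under stated assumptions, not the patent's method: [Equation 1] and [Equation 2] are not reproduced in this text, so the code assumes a causal FIR form u(t) = sum_k w[k] y(t - k) and swaps in a Bell-Sejnowski-style information-maximization update with a tanh nonlinearity in place of the Gaussian-based learning rule. The names `fir_apply` and `blind_deconvolve`, the tap count, step size, and tolerance are all hypothetical.

```python
import numpy as np

def fir_apply(w, y):
    """Causal FIR filtering (assumed form): u(t) = sum_k w[k] * y(t - k)."""
    return np.convolve(y, w)[:len(y)]

def blind_deconvolve(y, num_taps=8, lr=0.01, tol=1e-6, max_iter=500):
    y = np.asarray(y, dtype=float)
    # Step 1: initial estimate of the filter coefficients (identity filter)
    # and the initial output sequence U(t).
    w = np.zeros(num_taps)
    w[0] = 1.0
    u = fir_apply(w, y)
    for _ in range(max_iter):
        # Step 2: gradient of the assumed infomax objective.
        # z plays the role of Z(t); tanh is an assumption, not the
        # Gaussian-based rule of the source.
        z = np.tanh(u)
        grad = np.empty(num_taps)
        for k in range(num_taps):
            delayed = np.concatenate([np.zeros(k), y[:len(y) - k]])
            grad[k] = -2.0 * np.mean(delayed * z)
        grad[0] += 1.0 / w[0]
        # Step 3: update the coefficients and recompute U(t).
        step = lr * grad
        w = w + step
        u = fir_apply(w, y)
        # Step 4: stop once the update is negligible, otherwise repeat.
        if np.linalg.norm(step) < tol:
            break
    # Step 5: u is the weighted sum of the input with the final coefficients.
    return w, u

# Hypothetical test signal: a Laplacian source smeared by a short filter.
rng = np.random.default_rng(0)
source = rng.laplace(size=4000)
mixed = np.convolve(source, [1.0, 0.5])[:4000]
mixed /= mixed.std()
w, u = blind_deconvolve(mixed)
```

The update here moves the coefficients along the gradient of an entropy objective; the sign convention and nonlinearity in the source's learning rule may differ.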

The activation function for the blind deconvolution is based on the Gaussian distribution, which is commonly applied in speech signal processing. The learning rule for the weights in blind deconvolution with the Gaussian distribution is given by [Equation 2] below.

When the sound source signal is input, the noise removing unit (20) applies a preset nonlinear function to transform the sound source signal into a parameter domain according to its characteristics, and calculates the filter coefficients that maximize the joint entropy between the feature vector sequences computed in the parameter domain. It sets these filter coefficients in the FIR filter as the initial estimate and then repeats the update using the steepest descent method until the filter coefficients converge, thereby obtaining the optimal filter coefficients.

The noise removing unit (20) then performs blind deconvolution using the optimal filter coefficients and outputs the speech signal obtained by removing the noise signal from the sound source signal.

The impulse response obtaining unit (30) obtains the impulse response of the FIR filter included in the noise removing unit (20), based on the speech signal output from the noise removing unit (20), and passes it to the phase compensation unit (40).

Two methods are used to obtain the impulse response of the filter. The first obtains the coefficients of the same filter directly from the FIR filter included in the noise removing unit (20). The second estimates the filter coefficients from the speech signal output by the noise removing unit (20); polynomial curve fitting is used to estimate the filter coefficients and the corresponding impulse response.
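The first method relies on the fact that the impulse response of an FIR filter is its coefficient sequence. A minimal Python sketch, with a hypothetical helper `fir_impulse_response` and made-up coefficients, checks this by filtering a unit impulse. The second method is not sketched because the source does not say how the polynomial fit is set up.

```python
import numpy as np

def fir_impulse_response(coeffs):
    """Filter a unit impulse with an FIR filter and return the response."""
    coeffs = np.asarray(coeffs, dtype=float)
    impulse = np.zeros(len(coeffs))
    impulse[0] = 1.0
    return np.convolve(impulse, coeffs)[:len(coeffs)]

# Hypothetical coefficients standing in for the converged FIR filter.
w = np.array([0.9, -0.4, 0.1])
h = fir_impulse_response(w)
```

Here `h` equals `w` element by element, so reading the coefficients out of the noise removing unit is equivalent to measuring its impulse response.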

The phase compensation unit (40) applies filter coefficients that have the impulse response output by the impulse response obtaining unit (30) reversed in time, that is, a reversed impulse response, and can thereby compensate for the low-frequency distortion contained in the speech signal.

Here, the phase compensation unit (40) uses an FIR filter whose filter coefficients are similar to those of the FIR filter included in the noise removing unit (20). The resulting filter has the same frequency (magnitude) response as the noise removing unit (20) but the opposite phase response, so the phase distortion of the speech signal is undone and the phase of the speech signal is compensated.
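The effect of the time-reversed filter can be checked numerically. The Python sketch below uses a hypothetical function `phase_compensate` and made-up coefficients `h`. It shows that a filter and its time-reversed copy have equal magnitude responses, and that their cascade has a symmetric impulse response, which means the combined phase is linear (a pure delay).

```python
import numpy as np

def phase_compensate(speech, h):
    """Filter speech with the time-reversed impulse response of h."""
    h = np.asarray(h, dtype=float)
    return np.convolve(speech, h[::-1])

# Hypothetical impulse response of the noise removing unit's FIR filter.
h = np.array([0.9, -0.4, 0.1])

# Magnitude responses of the filter and of its time-reversed copy.
mag_forward = np.abs(np.fft.fft(h, 64))
mag_reversed = np.abs(np.fft.fft(h[::-1], 64))

# Overall impulse response of the filter followed by its reversed copy.
cascade = np.convolve(h, h[::-1])
```

`cascade` is the autocorrelation of `h`, so it is symmetric about its centre tap and delays every frequency by the same `len(h) - 1` samples. Note that the magnitude response of the cascade is the square of the original filter's magnitude response.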

Although all components constituting the embodiments of the present invention are described above as being combined into one or operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the invention, any one or more of the components may be selectively combined to operate. In addition, although each component may be implemented as a separate piece of independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored on computer-readable media such as a USB memory, a CD, or a flash memory, and read and executed by a computer to implement the embodiments of the present invention. The recording media of the computer program may include magnetic recording media, optical recording media, carrier wave media, and the like.

Unless defined otherwise in the detailed description, all terms, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Commonly used terms, such as those defined in dictionaries, should be interpreted as consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present invention.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the invention belongs can make various modifications, changes, and substitutions without departing from its essential characteristics. The embodiments and accompanying drawings disclosed herein are therefore intended to explain, not to limit, the technical idea of the invention, and the scope of that technical idea is not limited by these embodiments and drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

10: Speech signal blocking unit
20: Noise removing unit
30: Impulse response obtaining unit
40: Phase compensation unit

Claims

A real-time sound source separation device comprising:
a noise removing unit that performs blind deconvolution on a sound source signal containing a speech signal and a noise signal, removes the noise signal from the sound source signal, and outputs the speech signal;
an impulse response obtaining unit that obtains an impulse response of the noise removing unit based on the speech signal received from the noise removing unit; and
a phase compensation unit that compensates for phase distortion of the speech signal by applying the impulse response in a reverse direction with respect to time,
wherein the noise removing unit comprises a finite impulse response (FIR) filter for performing the blind deconvolution.

The real-time sound source separation device of claim 1, further comprising
a speech signal blocking unit that divides an input sound source signal into predetermined time intervals and outputs it to the noise removing unit.

(Deleted)

The real-time sound source separation device of claim 1,
wherein the FIR filter comprises a high-pass filter.

The real-time sound source separation device of claim 1,
wherein the phase compensation unit comprises an FIR filter that filters the speech signal by applying the impulse response in a reverse direction with respect to time.

The real-time sound source separation device of claim 5,
wherein the FIR filter included in the phase compensation unit has the same filter coefficients as the FIR filter included in the noise removing unit.

The real-time sound source separation device of claim 6,
wherein the coefficients of the FIR filter included in the phase compensation unit are estimated based on the speech signal received from the noise removing unit.