KR101318328B1

KR101318328B1 - Speech enhancement method based on blind signal cancellation and device using the method

Info

Publication number: KR101318328B1
Application number: KR1020120037993A
Authority: KR
Inventors: Park Hyung-Min (박형민); Hwang Jae-Sik (황재식); Lee Min-Ho (이민호)
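Original assignee: Kyungpook National University Industry-Academic Cooperation Foundation (경북대학교 산학협력단); Sogang University Industry-Academic Cooperation Foundation (서강대학교산학협력단)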
Priority date: 2012-04-12
Filing date: 2012-04-12
Publication date: 2013-10-15

Abstract

PURPOSE: A speech enhancement method using blind signal cancellation through sparsity minimization, and a device using the method, are provided to estimate a target speech signal in an environment where the target speech signal is generated from a point source and diffuse noise is present. CONSTITUTION: A gain measuring unit (230) receives a null forming signal from a null forming unit and a beamforming signal from a beamforming unit, and generates and provides a gain using the two signals. A filter unit (240) receives the beamforming signal from the beamforming unit and the gain from the gain measuring unit, and estimates and provides a target speech signal using them. [Reference numerals] (200) Signal input unit; (210) Null forming unit; (220) Beamforming unit; (230) Gain measuring unit; (240) Filter unit

Description

Speech enhancement method based on blind signal cancellation and device using the method

The present invention relates to a speech enhancement method and apparatus, and more particularly, to a speech enhancement method that estimates a target speech signal by applying blind signal cancellation through sparsity minimization to the signal arriving from the sound source direction among the input signals, and to an apparatus using the method.

Separating individual source signals from an acoustic signal in which several source signals are mixed is called BSS (Blind Source Separation or Blind Signal Separation). Here, "blind" means that no information is available about the original source signals or about the mixing environment.

Many BSS schemes have been proposed, but they do not effectively remove diffuse background noise, which predominates in practical applications. Therefore, assuming that the source of interest is a single dominant point source, a sound source enhancement method based on frequency-domain (FD) blind signal extraction (BSE) has been proposed.

FIG. 1 is a block diagram schematically showing a conventional sound source enhancement method based on FD-BSE. Referring to FIG. 1, in the conventional FD-BSE-based method, the signal input unit 100 first applies the short-time Fourier transform (SFT) to the signals input through a plurality of microphones and provides the results to both the beamforming unit 110 and the multi-channel filter unit 120.

The beamforming unit 110 obtains a source signal that maximizes sparsity with respect to the input signals; the speech signal estimator 130 and the noise signal estimator 140 estimate the target speech signal and the noise signal, respectively, from the estimated source signal; and the gain measuring unit 140 calculates a gain from these values. The multi-channel filter unit 120 performs Wiener filtering on each channel using the input signals provided from the signal input unit and the gain values. The per-channel filtered signals are input to a delay-and-sum (DS) beamforming unit 150, which removes the inter-channel phase differences from the per-channel signals, sums them, and outputs the result as the estimated target speech signal.
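The delay-and-sum step of this conventional pipeline can be sketched for a single STFT bin as follows. This is a minimal illustration, assuming the arrival delay of the target at each microphone is already known; the function name `delay_and_sum` and its arguments are hypothetical and are not taken from FIG. 1.

```python
import cmath
import math


def delay_and_sum(channel_bins, delays_s, freq_hz):
    """Delay-and-sum for one STFT bin (hypothetical helper).

    channel_bins: X_j(f, t) for one bin and frame, one value per microphone.
    delays_s: assumed arrival delay (seconds) of the target at each microphone.
    freq_hz: centre frequency of the bin.
    A delay tau multiplies a bin by exp(-j 2 pi f tau); multiplying by the conjugate
    phase removes the inter-channel phase difference before averaging.
    """
    aligned = [x * cmath.exp(2j * math.pi * freq_hz * tau)
               for x, tau in zip(channel_bins, delays_s)]
    return sum(aligned) / len(aligned)
```

After alignment the target components add in phase, while noise that is uncorrelated across channels is only averaged, which is why DS beamforming raises the target-to-noise ratio.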

The conventional sound source enhancement method described above can be used to extract the target speech signal, but its output still contains diffuse background noise. In addition, the conventional method estimates the source signal by maximizing sparsity, which is done by minimizing a cost function; this learning is difficult, does not converge easily, and the weights are often poorly estimated.


An object of the present invention, devised to solve the above problems, is to provide a sound source enhancement method and apparatus for estimating a target speech signal when the target speech signal is generated from a single point source and diffuse noise is present.

Another object of the present invention is to provide a speech enhancement method and apparatus that estimate the target speech signal using blind signal cancellation through sparsity minimization.

A first aspect of the present invention for achieving the above technical objects relates to a sound source enhancement method for estimating a target speech signal using blind signal cancellation, the method comprising:
(a) a signal input step of receiving a target speech signal through a plurality of input devices and converting it into frequency-domain input signals {X_j(f,t), j = 1, 2, ..., n, where n is the number of input devices};
(b) a null forming step of generating, from the frequency-domain input signals converted in the signal input step, a null forming signal Y_N(f,t) and an estimated variable α(f) for blind signal cancellation;
(c) a beamforming step of generating a beamforming signal Y_B(f,t) using the frequency-domain input signals converted in the signal input step and the estimated variable α(f) generated in the null forming step;
(d) a gain measuring step of generating a gain G(f,t) using the null forming signal generated in the null forming step and the beamforming signal generated in the beamforming step; and
(e) a filtering step of estimating a target speech signal S(f,t) using the beamforming signal generated in the beamforming step and the gain generated in the gain measuring step.

A second aspect of the present invention relates to a sound source enhancement apparatus for estimating and providing a target speech signal using blind signal cancellation, the apparatus comprising:
(a) a signal input unit that receives a target speech signal through a plurality of input devices, converts it into frequency-domain input signals {X_j(f,t), j = 1, 2, ..., n, where n is the number of input devices}, and provides them;
(b) a null forming unit that generates, from the frequency-domain input signals provided by the signal input unit, a null forming signal Y_N(f,t) and an estimated variable α(f) for blind signal cancellation;
(c) a beamforming unit that receives the frequency-domain input signals from the signal input unit and the estimated variable α(f) from the null forming unit, and generates and provides a beamforming signal Y_B(f,t) using them;
(d) a gain measuring unit that receives the null forming signal from the null forming unit and the beamforming signal from the beamforming unit, and generates and provides a gain G(f,t) using them; and
(e) a filter unit that receives the beamforming signal from the beamforming unit and the gain from the gain measuring unit, and estimates and provides a target speech signal S(f,t) using them.

In the sound source enhancement method and apparatus according to the first and second aspects, the input devices preferably consist of two input devices spaced apart from each other by a predetermined distance.

The sound source enhancement method and apparatus according to the first and second aspects are preferably applied when the target speech signal is generated from a single point source and diffuse noise is present.

In the sound source enhancement method and apparatus according to the first and second aspects, the null forming signal is preferably determined by the following equation.

In the sound source enhancement method and apparatus according to the first and second aspects, the estimated variable is preferably a variable for cancelling the maximally sparse signal corresponding to the target speech signal, and is preferably estimated using a gradient ascent algorithm that maximizes the cost function J(α(f)) of the estimated variable.

In the sound source enhancement method and apparatus according to the first and second aspects, the beamforming signal is preferably determined by the following equation.

In the sound source enhancement method and apparatus according to the first and second aspects, the gain is preferably determined by the following equation, where β is a parameter that controls noise attenuation.

In the sound source enhancement method and apparatus according to the first and second aspects, the target speech estimation signal is preferably calculated by the following equation.

To test the performance of the sound source enhancement method based on frequency-domain blind signal cancellation (FD-BSC) according to the present invention, the environment shown in FIG. 3 was set up. Referring to FIG. 3, the target speech source and the microphones were placed in a test room: two microphones, marked with white circles, are fixed 4 cm apart, and the candidate target speech sources, marked with black circles, are placed at 30° intervals around the microphones.

FIG. 4 shows graphs comparing the average direction-of-arrival (DOA) errors of the FD-BSC-based sound source enhancement method according to the present invention and of the conventional FD-BSE-based method shown in FIG. 1. Referring to FIG. 4, it can readily be seen that the DOA errors increase for the conventional FD-BSE-based method, whereas they decrease for the FD-BSC-based method according to the present invention.
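The text does not state how the DOA is derived from the estimates. One plausible way, assuming a free-field, far-field plane wave, the 4 cm spacing of FIG. 3, and a speed of sound of 343 m/s, is to read the DOA from the phase of α(f) = A_1(f)/A_2(f); the per-bin DOA error is then the absolute difference from the true angle. All helpers and constants below are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
MIC_SPACING = 0.04       # m, the 4 cm spacing of FIG. 3


def alpha_for_doa(doa_deg, freq_hz, d=MIC_SPACING, c=SPEED_OF_SOUND):
    """Ideal alpha(f) = A_1(f)/A_2(f) for a far-field plane wave in free field (assumed)."""
    tau = d * math.cos(math.radians(doa_deg)) / c    # extra delay at microphone 2
    return cmath.exp(2j * math.pi * freq_hz * tau)    # A_2 = A_1 exp(-j 2 pi f tau)


def doa_from_alpha(alpha, freq_hz, d=MIC_SPACING, c=SPEED_OF_SOUND):
    """Invert the model above; unambiguous only for freq_hz < c / (2 d), about 4.3 kHz here."""
    cos_doa = cmath.phase(alpha) * c / (2 * math.pi * freq_hz * d)
    cos_doa = max(-1.0, min(1.0, cos_doa))            # clip noisy estimates into [-1, 1]
    return math.degrees(math.acos(cos_doa))


def doa_error_deg(alpha_hat, true_doa_deg, freq_hz):
    """Absolute DOA error (degrees) of one per-bin estimate."""
    return abs(doa_from_alpha(alpha_hat, freq_hz) - true_doa_deg)
```

With a 4 cm spacing the phase of α(f) does not wrap below roughly 4.3 kHz under this model, so each bin in that range yields a single DOA candidate; averaging such per-bin errors is one way an "average DOA error" could be formed.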

FIG. 5 shows graphs comparing the signal-to-noise ratio (SNR) of the FD-BSC-based sound source enhancement method according to the present invention and of the conventional FD-BSE-based method shown in FIG. 1. Referring to FIG. 5, the FD-BSC-based method according to the present invention yields a higher SNR than the conventional FD-BSE-based method at every time constant.

FIG. 1 is a block diagram schematically showing a conventional sound source enhancement method based on FD-BSE.
FIG. 2 is a block diagram schematically showing a sound source enhancement apparatus to which the sound source enhancement method according to a preferred embodiment of the present invention is applied.
FIG. 3 shows an environment for testing the performance of the sound source enhancement method according to a preferred embodiment of the present invention.
FIG. 4 shows graphs comparing the average DOA errors of the FD-BSC-based sound source enhancement method according to the present invention and of the conventional FD-BSE-based method shown in FIG. 1.
FIG. 5 shows graphs comparing the signal-to-noise ratio (SNR) of the FD-BSC-based sound source enhancement method according to the present invention and of the conventional FD-BSE-based method shown in FIG. 1.

Hereinafter, the structure and operation of the sound source enhancement method using blind signal cancellation through sparsity minimization according to a preferred embodiment of the present invention, and of the sound source enhancement apparatus using the method, are described in detail with reference to the accompanying drawings.

The sound source enhancement method according to the present invention assumes that the target speech signal is generated by a point source near the microphones, so the target speech signal can be cancelled efficiently by forming a null in the direction of the source (null forming). In addition, because the parameter estimation for null forming, implemented through sparsity minimization, is reliable and stable, the beamforming output is obtained using the negative of that parameter. Accordingly, by using the null forming output and the beamforming output obtained through blind signal cancellation via sparsity minimization, the method improves the estimation of the target speech signal from two microphone signals.

FIG. 2 is a block diagram schematically showing a sound source enhancement apparatus to which the sound source enhancement method according to a preferred embodiment of the present invention is applied. Referring to FIG. 2, the apparatus includes a signal input unit 200, a null forming unit 210, a beamforming unit 220, a gain measuring unit 230, and a filter unit 240.

The signal input unit 200 applies a short-time Fourier transform (hereinafter "SFT") to the signals x(t) input from signal input devices such as two microphones, and provides the results to the null forming unit 210 and the beamforming unit 220. The value of the j-th microphone signal output from the signal input unit in time-frequency segment (f,t) can be expressed by Equation 1.

Here, S(f,t) is the target speech signal generated by a point source, and A_j(f) and N_j(f,t) are, respectively, the transfer function from the target source to the j-th microphone and the diffuse noise at the j-th microphone.
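For experimentation, the signal model just described (each microphone bin is the target S(f,t) passed through A_j(f), plus diffuse noise N_j(f,t), i.e. presumably X_j(f,t) = A_j(f)S(f,t) + N_j(f,t), since Equation 1 itself is not reproduced here) can be simulated for one frequency bin. The sketch is hypothetical: the sparse on/off target and the independent circular Gaussian noises are assumptions chosen to mimic speech sparsity and uncorrelated diffuse noise.

```python
import cmath
import random


def simulate_bin(a, n_frames, noise_std, p_active=0.1, seed=0):
    """Simulate X_j(f,t) = A_j(f) S(f,t) + N_j(f,t) for one frequency bin f.

    a: assumed transfer-function values A_j(f), one per microphone.
    S(f,t) is a sparse, on/off complex sequence (unit power on average) standing in
    for speech; each N_j(f,t) is independent circular complex Gaussian noise with
    power noise_std**2, standing in for diffuse noise.
    Returns (s, x) where x[j] is the list of X_j(f,t) over frames t.
    """
    rng = random.Random(seed)
    s = []
    for _ in range(n_frames):
        if rng.random() < p_active:
            # active frame: magnitude 1/sqrt(p_active), random phase
            s.append(cmath.rect(p_active ** -0.5, rng.uniform(-cmath.pi, cmath.pi)))
        else:
            s.append(0j)  # silent frame
    x = []
    for a_j in a:
        x.append([
            a_j * s_t + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) / 2 ** 0.5
            for s_t in s
        ])
    return s, x
```

With `noise_std=0.0` every frame satisfies X_1 A_2 = X_2 A_1, which is the property the null former relies on.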

The null forming unit 210 cancels the target speech signal generated by the point source by forming a null in the direction of the source, for frequency-domain blind signal cancellation (FD-BSC). Since two microphone signals are input, the input signals of Equation 1 are {X_j(f,t), j = 1, 2}. Since X_1(f,t) is the SFT-converted signal of the target speech signal S(f,t), as expressed in Equation 2, the null forming signal output from the null forming unit for FD-BSC can be expressed as Equation 3.

Here, α(f) is a complex-valued estimated variable for cancelling the maximally sparse signal, which corresponds to the target speech.
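Equation 3 is not reproduced in this text. Assuming it has the form Y_N(f,t) = X_1(f,t) − α(f)X_2(f,t), which is consistent with the later statement that the target is cancelled when α(f) converges to A_1(f)/A_2(f), the null forming step for one bin can be sketched as follows; `null_form` is a hypothetical name.

```python
def null_form(x1, x2, alpha):
    """Assumed form of Equation 3 for one bin: Y_N(f,t) = X_1(f,t) - alpha(f) X_2(f,t).

    With alpha = A_1/A_2 the target term A_1 S - alpha A_2 S vanishes,
    leaving only the noise combination N_1 - alpha N_2.
    """
    return [u - alpha * v for u, v in zip(x1, x2)]
```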

The cost function of α(f) can be expressed as Equation 4, and the α(f) that minimizes sparsity can be estimated using a gradient ascent algorithm that maximizes this cost function, subject to the condition of Equation 5.

Here, γ ≥ 0 is a parameter that controls the sparsity of the signal. To estimate an α(f) that satisfies Equation 5, spatial whitening is applied to the input signals {X_j(f,t), j = 1, 2} to obtain the whitened signals {Z_j(f,t), j = 1, 2} given by Equation 6.

Here, V_jk(f) denotes the elements of the whitening matrix. Next, Equation 3 is modified as Equation 7.

Here, Y'_N(f,t) is the null formed signal obtained after whitening the input signals by Equation 6. In Equation 7, the complex estimated variable α(f) can be replaced by real variables φ(f) and θ(f) in the interval [-π, π]; in this case, the condition of Equation 5 is always satisfied.
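The whitening and reparameterization steps can be sketched for one bin under two assumptions, since Equations 5 to 7 are not reproduced here: the spatial whitening is taken to be the symmetric (ZCA) choice V(f) = R(f)^{-1/2} of the 2x2 sample covariance R(f), and Equation 7 is taken to apply a unit-norm weight vector [cos φ, sin φ e^{jθ}] to the whitened signals. Under these assumptions the output always has unit power, which is one way the constraint of Equation 5 could be "always satisfied". All function names are hypothetical.

```python
import cmath
import math


def whitening_matrix(x1, x2):
    """Symmetric (ZCA) whitening matrix V = R^{-1/2} for a 2x2 sample covariance.

    R = mean(x x^H) with x = [x1, x2]^T must be positive definite. For a 2x2 matrix,
    sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M)); inverting that gives V.
    """
    n = len(x1)
    r11 = sum(abs(u) ** 2 for u in x1) / n
    r22 = sum(abs(v) ** 2 for v in x2) / n
    r12 = sum(u * v.conjugate() for u, v in zip(x1, x2)) / n
    s = math.sqrt(r11 * r22 - abs(r12) ** 2)   # sqrt(det R)
    t = math.sqrt(r11 + r22 + 2 * s)           # sqrt(tr R + 2 sqrt(det R))
    scale = 1.0 / (s * t)
    return [[(r22 + s) * scale, -r12 * scale],
            [-r12.conjugate() * scale, (r11 + s) * scale]]


def whiten(v, x1, x2):
    """Z_j(f,t) = sum_k V_jk(f) X_k(f,t)."""
    z1 = [v[0][0] * a + v[0][1] * b for a, b in zip(x1, x2)]
    z2 = [v[1][0] * a + v[1][1] * b for a, b in zip(x1, x2)]
    return z1, z2


def null_form_whitened(z1, z2, phi, theta):
    """Assumed form of Equation 7: Y'_N = cos(phi) Z_1 + sin(phi) e^{j theta} Z_2."""
    w1, w2 = math.cos(phi), math.sin(phi) * cmath.exp(1j * theta)
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]
```

Because mean(Z Z^H) = I after whitening and the weight vector has unit norm, mean|Y'_N|^2 = 1 for every (φ, θ), so the optimizer can search over the two real angles without any explicit constraint.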

To obtain the gradient for maximizing the cost function J(α(f)) of the estimated variable α(f) given in Equation 4, Δφ(f) and Δθ(f) are obtained as in Equations 8 and 9.

Here, Y'_N(f,t) is expressed in terms of its real and imaginary parts as in Equation 10, and each part can be written as in Equations 11 and 12.

Here, Z_j^r(f,t) and Z_j^i(f,t) denote the real and imaginary parts of Z_j(f,t), respectively. Therefore, φ(f) can be updated by applying Equations 13, 14, and 15 to Equation 8.

Similarly, θ(f) can be updated by applying Equations 16, 17, and 18 to Equation 9.

After φ(f) and θ(f) have been estimated through the above process, the complex estimated variable α(f) is obtained by Equation 19.
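The estimation of φ(f), θ(f), and finally α(f) can be sketched as gradient ascent on a stand-in cost. Because Equations 4 and 8 to 19 are not reproduced here, this sketch assumes J = mean(sqrt(γ + |Y'_N|^2)), which for a fixed output power is largest when |Y'_N| is least sparse, i.e. when the sparse target has been cancelled; the analytic gradients below belong to this assumed J, not to the patent's Equations 13 to 18. It further assumes symmetric (ZCA) whitening and a unit-norm weighting [cos φ, sin φ e^{jθ}] of the whitened signals, and maps the result back to α(f) through Y'_N ∝ X_1 − αX_2. The test scene `sparse_mixture` and all other names are hypothetical.

```python
import cmath
import math
import random


def sparse_mixture(a1, a2, n_frames=1000, noise_power=0.01, p_active=0.1, seed=0):
    """Hypothetical test scene for one bin: X_j = A_j S + N_j, sparse S, white noise."""
    rng = random.Random(seed)
    sd = math.sqrt(noise_power / 2)          # per real/imaginary component
    x1, x2 = [], []
    for _ in range(n_frames):
        active = rng.random() < p_active
        s = cmath.rect(p_active ** -0.5, rng.uniform(-math.pi, math.pi)) if active else 0j
        x1.append(a1 * s + complex(rng.gauss(0, sd), rng.gauss(0, sd)))
        x2.append(a2 * s + complex(rng.gauss(0, sd), rng.gauss(0, sd)))
    return x1, x2


def zca_matrix(x1, x2):
    """V = R^{-1/2} for the 2x2 sample covariance R (closed form for 2x2 matrices)."""
    n = len(x1)
    r11 = sum(abs(u) ** 2 for u in x1) / n
    r22 = sum(abs(v) ** 2 for v in x2) / n
    r12 = sum(u * v.conjugate() for u, v in zip(x1, x2)) / n
    s = math.sqrt(r11 * r22 - abs(r12) ** 2)
    k = 1.0 / (s * math.sqrt(r11 + r22 + 2 * s))
    return [[(r22 + s) * k, -r12 * k], [-r12.conjugate() * k, (r11 + s) * k]]


def estimate_alpha(x1, x2, gamma=0.01, mu=0.5, n_iter=300, phi=math.pi / 4, theta=0.0):
    """Gradient ascent on an ASSUMED cost J(phi, theta) = mean(sqrt(gamma + |Y'_N|^2)).

    Y'_N = cos(phi) Z_1 + sin(phi) e^{j theta} Z_2 with Z = V X; gamma > 0 also avoids
    division by zero. Returns (alpha, J before ascent, J after ascent).
    """
    v = zca_matrix(x1, x2)
    z1 = [v[0][0] * a + v[0][1] * b for a, b in zip(x1, x2)]
    z2 = [v[1][0] * a + v[1][1] * b for a, b in zip(x1, x2)]
    n = len(z1)

    def cost_and_grad(phi, theta):
        e = cmath.exp(1j * theta)
        w1, w2 = math.cos(phi), math.sin(phi) * e
        dw1_dphi, dw2_dphi, dw2_dtheta = -math.sin(phi), math.cos(phi) * e, 1j * w2
        j_val = g_phi = g_theta = 0.0
        for a, b in zip(z1, z2):
            y = w1 * a + w2 * b
            mag = math.sqrt(gamma + abs(y) ** 2)
            j_val += mag
            # d/dp sqrt(gamma + |y|^2) = Re(conj(y) * dy/dp) / sqrt(gamma + |y|^2)
            g_phi += (y.conjugate() * (dw1_dphi * a + dw2_dphi * b)).real / mag
            g_theta += (y.conjugate() * dw2_dtheta * b).real / mag
        return j_val / n, g_phi / n, g_theta / n

    j_before = cost_and_grad(phi, theta)[0]
    for _ in range(n_iter):
        _, g_phi, g_theta = cost_and_grad(phi, theta)
        # ascent step, then wrap both angles back into [-pi, pi)
        phi = (phi + mu * g_phi + math.pi) % (2 * math.pi) - math.pi
        theta = (theta + mu * g_theta + math.pi) % (2 * math.pi) - math.pi
    j_after = cost_and_grad(phi, theta)[0]

    # Y'_N = c1 X_1 + c2 X_2 with c_k = sum_j w_j V_jk, i.e. Y'_N ∝ X_1 - alpha X_2.
    w1, w2 = math.cos(phi), math.sin(phi) * cmath.exp(1j * theta)
    c1 = w1 * v[0][0] + w2 * v[1][0]
    c2 = w1 * v[0][1] + w2 * v[1][1]
    return -c2 / c1, j_before, j_after
```

In this toy scene the estimate should land near A_1/A_2. The assumed cost is very flat close to its maximum (only a little target leakage remains there), so the step size `mu` and the iteration count trade accuracy against run time.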

Using the estimated variable α(f) obtained from Equation 19, the null forming unit computes the null forming signal Y_N(f,t) of Equation 3 and provides it to the gain measuring unit, and provides α(f) to the beamforming unit.

The beamforming unit obtains the beamforming signal Y_B(f,t) of Equation 20 from the input signals X_j(f,t) and α(f), and provides it to the filter unit and the gain measuring unit.
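Equation 20 is not reproduced here either. Reading "using the negative value of the parameter" as replacing −α(f) in an assumed null former Y_N = X_1 − αX_2 with +α(f) gives Y_B = X_1 + αX_2; the sketch below shows that, under this assumption, the target components add in phase when α = A_1/A_2. `beam_form` is a hypothetical name.

```python
def beam_form(x1, x2, alpha):
    """Assumed form of Equation 20: Y_B(f,t) = X_1(f,t) + alpha(f) X_2(f,t).

    With alpha = A_1/A_2 the target terms add in phase: Y_B = 2 A_1 S + (N_1 + alpha N_2).
    """
    return [u + alpha * v for u, v in zip(x1, x2)]
```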

The gain measuring unit receives the null forming signal Y_N(f,t) and the beamforming signal Y_B(f,t) from the null forming unit and the beamforming unit, respectively, calculates the gain G(f,t) of Equation 21, and provides it to the filter unit.

Here, β is a parameter that controls noise attenuation and can be adjusted appropriately according to the channel or the surrounding conditions.

The process by which the gain measuring unit obtains the gain G(f,t) is described below. First, when the estimated variable α(f) converges to A_1(f)/A_2(f), the target speech signal is cancelled from the input signals and only diffuse noise remains in the null forming signal Y_N(f,t) output by the null forming unit, which can be expressed as Equation 22. In this case, the beamforming signal Y_B(f,t) output by the beamforming unit can be expressed as Equation 23.

Assuming that the target speech is uncorrelated with the diffuse noise, and that the diffuse noise signals are mutually uncorrelated, the powers of the output signals are expressed as Equations 24 and 25.

Therefore, the transformed target signal X_1^S(f,t) can be estimated as in Equation 26.
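The power relations behind Equations 24 to 26 can be checked numerically. Assuming Y_N = X_1 − αX_2 and Y_B = X_1 + αX_2 with α = A_1/A_2, the noise terms N_1 ∓ αN_2 have equal expected power when N_1 and N_2 are uncorrelated, so (E|Y_B|^2 − E|Y_N|^2)/4 should approximate E|A_1S|^2, the power of the transformed target X_1^S. The Monte Carlo sketch below uses hypothetical Gaussian signals; `power_check` is not from the source.

```python
import random


def power_check(a1, a2, n_frames=20000, noise_power=1.0, seed=0):
    """Return ((mean|Y_B|^2 - mean|Y_N|^2) / 4, mean|A_1 S|^2) for simulated frames.

    Assumes Y_N = X_1 - alpha X_2 and Y_B = X_1 + alpha X_2 with alpha = A_1/A_2, and
    S, N_1, N_2 mutually uncorrelated circular Gaussians (S with unit power).
    """
    rng = random.Random(seed)
    sd_s = 0.5 ** 0.5                     # per-component std for unit-power S
    sd_n = (noise_power / 2) ** 0.5       # per-component std for each N_j
    alpha = a1 / a2
    p_b = p_n = p_target = 0.0
    for _ in range(n_frames):
        s = complex(rng.gauss(0, sd_s), rng.gauss(0, sd_s))
        n1 = complex(rng.gauss(0, sd_n), rng.gauss(0, sd_n))
        n2 = complex(rng.gauss(0, sd_n), rng.gauss(0, sd_n))
        x1, x2 = a1 * s + n1, a2 * s + n2
        p_b += abs(x1 + alpha * x2) ** 2   # beamformed: target doubled, noise N_1 + alpha N_2
        p_n += abs(x1 - alpha * x2) ** 2   # null formed: target cancelled, noise N_1 - alpha N_2
        p_target += abs(a1 * s) ** 2
    return (p_b - p_n) / (4 * n_frames), p_target / n_frames
```

The factor 4 comes from the target appearing as 2A_1S in the assumed Y_B, whose power is 4|A_1S|^2.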

Accordingly, the gain G(f,t) can be expressed as Equation 21.
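Equation 21 is not reproduced here. One gain that is consistent with Equations 24 to 26 as described in the text, offered only as an assumption, is the Wiener-type gain G = max(1 − β P_N/P_B, 0), where P_N and P_B are recursively smoothed powers of Y_N and Y_B; β = 1 corresponds to plain power subtraction, and a larger β attenuates noise more. The smoothing constant `lam` is a hypothetical parameter standing in for the time constants mentioned in the FIG. 5 discussion, and all function names are hypothetical.

```python
def smoothed_power(y, lam=0.9):
    """Recursive power estimate P(t) = lam * P(t-1) + (1 - lam) * |y(t)|^2 (assumed smoother)."""
    p, out = 0.0, []
    for v in y:
        p = lam * p + (1 - lam) * abs(v) ** 2
        out.append(p)
    return out


def gain(y_n, y_b, beta=1.0, lam=0.9, floor=0.0, eps=1e-12):
    """Hypothetical Wiener-type gain G(f,t) = max(1 - beta * P_N / P_B, floor).

    If E|Y_B|^2 - E|Y_N|^2 equals the target power inside Y_B, then 1 - P_N / P_B
    estimates the fraction of Y_B's power that belongs to the target.
    """
    p_n = smoothed_power(y_n, lam)
    p_b = smoothed_power(y_b, lam)
    return [max(1.0 - beta * pn / max(pb, eps), floor) for pn, pb in zip(p_n, p_b)]
```

When the null former leaves nothing (P_N = 0) the gain is 1 and the beamformer output passes unchanged; when Y_N and Y_B have equal power (noise only) the gain drops to the floor.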

The filter unit 240 receives Y_B(f,t) and G(f,t) from the beamforming unit 220 and the gain measuring unit 230, respectively, and provides the target speech estimation signal of Equation 27.

The filter unit according to the present invention preferably uses a Wiener filter.
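Under the assumed forms Y_N = X_1 − αX_2, Y_B = X_1 + αX_2 and G = max(1 − β P_N/P_B, 0), the filtering step for one bin with a known α(f) can be sketched end to end. The division by 2 is an assumption tied to the assumed Y_B, in which the target appears as 2A_1S, so the output estimates A_1S; Equation 27 itself is not reproduced here, and all names are hypothetical.

```python
def smoothed_power(y, lam=0.9):
    # Recursive power estimate P(t) = lam * P(t-1) + (1 - lam) * |y(t)|^2
    p, out = 0.0, []
    for v in y:
        p = lam * p + (1 - lam) * abs(v) ** 2
        out.append(p)
    return out


def enhance_bin(x1, x2, alpha, beta=1.0, lam=0.9, eps=1e-12):
    """Sketch of null forming -> beamforming -> gain -> Wiener-type filtering for one bin.

    Assumed forms: Y_N = X_1 - alpha X_2, Y_B = X_1 + alpha X_2,
    G = max(1 - beta P_N / P_B, 0), S_hat = G * Y_B / 2 (an estimate of A_1 S).
    """
    y_n = [u - alpha * v for u, v in zip(x1, x2)]
    y_b = [u + alpha * v for u, v in zip(x1, x2)]
    p_n, p_b = smoothed_power(y_n, lam), smoothed_power(y_b, lam)
    g = [max(1.0 - beta * pn / max(pb, eps), 0.0) for pn, pb in zip(p_n, p_b)]
    return [gi * yb / 2 for gi, yb in zip(g, y_b)]
```

With noise-free input the sketch returns A_1S unchanged, and when the beamformer and null former carry equal power (noise only) it suppresses the bin entirely.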

Although the present invention has been described above with reference to preferred embodiments, these are merely examples and do not limit the invention. Those skilled in the art will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the invention, and differences related to such modifications and applications should be construed as falling within the scope of the invention defined in the appended claims.

The sound source enhancement method according to the present invention can be used effectively to estimate the target speech signal when the target speech signal comes from a single point source and diffuse noise is present.

200: signal input unit
210: null forming unit
220: beamforming unit
230: gain measuring unit
240: filter unit

Claims

A sound source enhancement method for estimating a target speech signal using blind signal cancellation, the method comprising:
(a) a signal input step of receiving a target speech signal through a plurality of input devices and converting it into frequency-domain input signals {X_j(f,t), j = 1, 2, ..., n, where n is the number of input devices};
(b) a null forming step of generating, from the frequency-domain input signals converted in the signal input step, a null forming signal Y_N(f,t) and an estimated variable α(f) for blind signal cancellation;
(c) a beamforming step of generating a beamforming signal Y_B(f,t) using the frequency-domain input signals converted in the signal input step and the estimated variable α(f) generated in the null forming step;
(d) a gain measuring step of generating a gain G(f,t) using the null forming signal generated in the null forming step and the beamforming signal generated in the beamforming step; and
(e) a filtering step of estimating a target speech signal S(f,t) using the beamforming signal generated in the beamforming step and the gain generated in the gain measuring step.

The method of claim 1, wherein the signal input step receives a target speech signal through two input devices disposed at positions spaced apart from each other by a predetermined distance.

The method of claim 1, wherein the target speech signal is a signal generated from a single point sound source, and the sound source enhancement method is applied when diffuse noise is present.

The method of claim 1, wherein the null forming signal is determined by the following equation.

The method of claim 1, wherein the estimated variable is a variable for cancelling the maximally sparse signal corresponding to the target speech signal.

The method of claim 1, wherein the estimated variable is a variable for cancelling the maximally sparse signal corresponding to the target speech signal, and is estimated using a gradient ascent algorithm that maximizes the cost function J(α(f)) of the estimated variable.

The method of claim 6, wherein the cost function of the estimated variable is determined by the following equation, where γ is a parameter that controls the sparsity of the signal.

The method of claim 1, wherein the estimated variable is determined by the following equation, where V_jk(f) denotes the elements of the whitening matrix for applying spatial whitening to the input signals, and φ(f) and θ(f) are real numbers in the interval [-π, π].

The method of claim 6, wherein the following equation is always satisfied in the process of estimating the estimated variable.

The method of claim 1, wherein the beamforming signal is determined by the following equation.

The method of claim 1, wherein the gain is determined by the following equation, wherein β is a variable for controlling noise attenuation.

The method of claim 1, wherein the filtering comprises calculating and providing a target speech estimation signal using the following equation.

A sound source enhancement apparatus for estimating and providing a target speech signal using blind signal cancellation, the apparatus comprising:
(a) a signal input unit that receives a target speech signal through a plurality of input devices, converts it into frequency-domain input signals {X_j(f,t), j = 1, 2, ..., n, where n is the number of input devices}, and provides them;
(b) a null forming unit that generates, from the frequency-domain input signals provided by the signal input unit, a null forming signal Y_N(f,t) and an estimated variable α(f) for blind signal cancellation;
(c) a beamforming unit that receives the frequency-domain input signals from the signal input unit and the estimated variable α(f) from the null forming unit, and generates and provides a beamforming signal Y_B(f,t) using them;
(d) a gain measuring unit that receives the null forming signal from the null forming unit and the beamforming signal from the beamforming unit, and generates and provides a gain G(f,t) using them; and
(e) a filter unit that receives the beamforming signal from the beamforming unit and the gain from the gain measuring unit, and estimates and provides a target speech signal S(f,t) using them.

The sound source enhancement apparatus of claim 13, wherein the signal input unit receives input signals from two input devices spaced apart from each other by a predetermined distance.

The sound source enhancement apparatus of claim 13, wherein the target speech signal is a signal generated from a single point sound source, and the apparatus is applied when diffuse noise is present.

The apparatus of claim 13, wherein the null forming signal is determined by the following equation.

The apparatus of claim 13, wherein the estimated variable is a variable for cancelling the maximally sparse signal corresponding to the target speech signal, and is estimated using a gradient ascent algorithm that maximizes the cost function J(α(f)) of the estimated variable determined by the following equation, where γ is a parameter that controls the sparsity of the signal and γ ≥ 0.

The apparatus of claim 13, wherein the estimated variable is determined by the following equation, where V_jk(f) denotes the elements of the whitening matrix for applying spatial whitening to the input signals, and φ(f) and θ(f) are real numbers in the interval [-π, π].

The apparatus of claim 13, wherein the following equation is always satisfied in the process of estimating the estimated variable.

The apparatus of claim 13, wherein the beamforming signal is determined by the following equation.

The apparatus of claim 13, wherein the gain is determined by the following equation, wherein β is a variable controlling noise attenuation.