KR100421013B1

KR100421013B1 - Speech enhancement system and method thereof

Info

Publication number: KR100421013B1
Application number: KR10-2001-0048287A
Authority: KR
Inventors: 김평수
Original assignee: 삼성전자주식회사
Priority date: 2001-08-10
Filing date: 2001-08-10
Publication date: 2004-03-04
Also published as: KR20030034263A

Abstract

The present invention is a speech enhancement system and method that obtains a noise-free speech signal from an input frame using only the time domain, and therefore has a simple configuration and implementation. It can accurately recover the speech signal even when the input frame violates the assumption that the speech and noise signals are stationary, or the assumption that the speech and noise signals occupy uncorrelated frequency bands.

The speech enhancement system according to the present invention comprises: a filter, constructed from a state-space model of the input frame, that filters the input frame so as to detect a speech signal from which the noise signal has been removed; and a parameter estimator that estimates, in real time and from the input frame, the parameters required for the filtering.

Description

Speech enhancement system and method

The present invention relates to a speech enhancement system and method that provide a speech signal from which noise has been removed, and more particularly to a speech enhancement system and method that remove the noise signal mixed into the speech signal detected in an input frame.

Speech enhancement systems are used in systems that handle speech signals, such as wired and wireless telephones, hands-free systems, and speech recognition systems, to provide a speech signal free of noise. A conventional speech enhancement system first estimates the noise signal, then subtracts the estimated noise spectrum from the frequency spectrum of the speech signal estimated from the input frame, and provides the result as the detected speech signal.

Referring to FIG. 1, in a conventional speech enhancement system a voice activity detector (VAD) 101 detects whether a speech signal is present in an input frame y. If no speech signal is present, the input frame y is passed from the voice activity detector 101 to a noise estimator 102.

The noise estimator 102 computes the frequency spectrum of the noise signal using a fast Fourier transform (FFT) algorithm. To make the noise estimate more reliable, the estimation is repeated over a number of input frames y that contain no speech, and the average is taken as the frequency spectrum of the noise signal. This noise spectrum is provided to a subtractor 105.

If the voice activity detector 101 finds that a speech signal is present in the input frame y, the frame is passed from the voice activity detector 101 to a windowing unit 103. The windowing unit 103 applies a window to the input frame y in order to extract only the speech signal. A Hamming or Hanning window, or a similar window, may be used.

A frequency spectrum estimator 104 applies a fast Fourier transform to the speech signal extracted by the windowing unit 103 to obtain its frequency spectrum. The subtractor 105 subtracts the noise spectrum estimated by the noise estimator 102 from the spectrum estimated by the frequency spectrum estimator 104, yielding a noise-reduced frequency spectrum (the speech spectrum minus the noise spectrum).

An inverse fast Fourier transform unit 106 applies the inverse fast Fourier transform to the spectrum produced by the subtractor 105 and outputs the resulting noise-reduced speech signal. This signal is the output of the speech enhancement system and may be supplied to, for example, a loudspeaker in the system in which the speech enhancement system is used.
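The conventional pipeline of FIG. 1 can be sketched as follows. This is a minimal illustration, not the implementation of any particular prior-art system: it assumes magnitude subtraction with the noisy phase retained and a floor at zero, applies the same Hamming window to noise-only and speech frames, and uses a naive DFT in place of an FFT so that it needs only the standard library. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(X):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(X)
    return [(sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames that contain no speech (unit 102)."""
    win = hamming(len(noise_frames[0]))
    spectra = [[abs(c) for c in dft([w * v for w, v in zip(win, frame)])]
               for frame in noise_frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def spectral_subtract(frame, noise_mag):
    """Window (103), transform (104), subtract (105), inverse transform (106)."""
    win = hamming(len(frame))
    Y = dft([w * v for w, v in zip(win, frame)])
    # Assumption: subtract magnitudes, clamp at zero, keep the phase of the noisy frame.
    S = [max(abs(c) - m, 0.0) * cmath.exp(1j * cmath.phase(c))
         for c, m in zip(Y, noise_mag)]
    return idft(S)
```

The sketch makes the weakness discussed in the text visible: `noise_mag` is fixed from earlier noise-only frames, so it cannot follow noise that changes while speech is present.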

However, in the conventional system operating as described above, the noise spectrum is estimated while no speech signal is present in the input frame y, so it may differ from the frequency band of the noise actually mixed into the speech signal of a later input frame y. If the frequency band of the noise mixed into the speech differs from that of the spectrum estimated by the noise estimator 102, the system cannot be expected to deliver a good-quality speech signal.

In addition, the voice activity detector 101 used in conventional systems is known to be unreliable under low signal-to-noise ratio (SNR) conditions. Conventional systems also assume that the speech and noise signals are statistically stationary, an assumption made so that noise can be removed in the frequency domain. In practice, however, speech and noise signals may not be stationary.

Furthermore, conventional systems assume that the noise and speech signals do not affect each other; that is, that they are uncorrelated because they occupy completely different frequency bands. When the noise mixed into the speech signal of an input frame does interact with the speech signal, a conventional system cannot be expected to deliver a good-quality speech signal.

Finally, a conventional system takes an input frame obtained in the time domain, processes it in the frequency domain, and converts the result back to the time domain, which makes its configuration and implementation rather complicated.

An object of the present invention is to provide a speech enhancement system and method with a simple configuration and implementation, achieved by obtaining the noise-free speech signal from the input frame using only the time domain.

Another object of the present invention is to provide a speech enhancement system and method that can accurately obtain the noise-free speech signal even when the input frame departs from the assumption that the speech and noise signals are stationary and from the assumption that the frequency bands of the speech and noise signals are uncorrelated.

To achieve these objects, a speech enhancement system according to the present invention preferably comprises: a filter, constructed from a state-space model of the input frame, that filters the input frame so as to detect a speech signal from which the noise signal has been removed; and a parameter estimator that estimates, in real time and from the input frame, the parameters required for the filtering.

The filter preferably comprises: a sample detector that detects at least one sample of the input frame; a gain matrix detector that uses the parameters estimated by the parameter estimator to detect the gain matrix of the filter for the speech signal contained in the input frame; a first estimate detector that uses the sample detected by the sample detector and the gain matrix detected by the gain matrix detector to detect an estimate of the state variable of the speech signal; and a second estimate detector that detects, from the estimate produced by the first estimate detector, an estimate of the speech signal with the noise signal removed.

To achieve these objects, a speech enhancement method according to the present invention preferably comprises: estimating predetermined parameters from an input frame in real time; and filtering the input frame, using the predetermined parameters, so as to detect a speech signal from which the noise signal has been removed.

FIG. 1 is a functional block diagram of a conventional speech enhancement system.

FIG. 2 is a functional block diagram of a speech enhancement system according to the present invention.

FIG. 3 is a flowchart of a speech enhancement method according to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 is a functional block diagram of a speech enhancement system according to the present invention. Referring to FIG. 2, the system comprises a parameter estimator 201 and an estimation filter 210 that estimates, from the input frame y(k), the speech signal ŝ(k) with the noise signal removed.

The parameter estimator 201 estimates, in real time and from the input frame y(k), the parameters that the estimation filter 210 needs in order to operate. For this purpose the parameter estimator 201 is configured to run a well-known algorithm such as the Yule-Walker algorithm, the extended Yule-Walker algorithm, or the Durbin-Levinson algorithm.
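As one illustration of the kind of algorithm named above, the sketch below solves the Yule-Walker equations for the coefficients of an autoregressive (AR) model using the Levinson-Durbin recursion. It is a simplified sketch under assumptions: the text does not say how the estimates are mapped onto the filter parameters, and reading the AR coefficients as entries of a state matrix and the prediction error variance as a process noise covariance is only one plausible interpretation. The function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Returns (a, err) where x(k) is predicted as
    a[0]*x(k-1) + a[1]*x(k-2) + ... and err is the prediction error variance.
    """
    a = []
    err = r[0]
    for m in range(order):
        # Reflection coefficient for raising the model order from m to m + 1.
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / err
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= 1.0 - k * k
    return a, err
```

For example, `levinson_durbin(autocorr(frame, p), p)` would give order-p coefficients for the samples in `frame`; the recursion costs on the order of p squared operations per frame, which is what makes per-frame, real-time re-estimation practical.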

Suppose the estimation filter 210 is constructed on the assumption that the input frame y(k) is given by Equation 1 and that the corresponding state-space model is given by Equation 2. The parameters required by the estimation filter 210 are then the matrices A, G, and C of Equation 2, together with the covariances (cov) Q and R of the noise signals w(k) and v(k).

y(k) = s(k) + n(k) + v(k)    (1)

x(k+1) = A x(k) + G w(k)

y(k) = C x(k) + v(k)    (2)

In Equation 1, s(k) is the actual speech signal contained in the input frame y(k), n(k) is the noise signal contained in the input frame y(k), and v(k) is the noise introduced by the means that delivers the input frame y(k) to the speech enhancement system. For example, v(k) is the noise generated by a sensor or similar component of the system in which the speech enhancement system is used. The total noise actually contained in the input frame y(k) is therefore the combination of n(k) and v(k).

In Equation 2, x(k) is the state variable of the signal made up of s(k) and n(k) of Equation 1, that is, the state variable of the signal to be estimated from the input frame y(k) at the current time k, and x(k+1) is the corresponding state variable at the next time k+1. The quantities A, G, C, x(k), and w(k) of Equation 2, and the covariances Q and R of w(k) and v(k), are defined as in Equation 3.

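x(k) = [x_s(k); x_n(k)],  w(k) = [w_s(k); w_n(k)],

cov[w(k)] = Q,  cov[v(k)] = R,

A = [A_s 0; 0 A_n],  G = [G_s 0; 0 G_n],  C = [C_s, C_n]    (3)

Here a semicolon separates block rows, and the subscripts s and n mark the blocks that belong to the speech signal and to the noise signal, respectively.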

In Equation 3, x_s(k) and x_n(k) are the state variable vectors of s(k) and n(k), respectively. They are defined as in Equation 4.

x_s(k) = [s(k-p+1), ..., s(k)]^T,  x_n(k) = [n(k-q+1), ..., n(k)]^T    (4)

In Equation 4, p and q are the orders of s(k) and n(k), respectively.
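A state-space model of the form of Equation 2 can be assembled from two AR models, one for the speech and one for the noise, by putting each in companion form. The sketch below does this under stated assumptions: the AR coefficient lists `ar_s` and `ar_n` are hypothetical inputs, each state vector is ordered oldest sample first so that the current sample is its last element, and the speech and noise blocks are stacked block-diagonally. None of the helper names come from the patent.

```python
def companion(ar):
    """Companion matrix for the state [s(k-p+1), ..., s(k)] of an AR(p) model
    s(k+1) = ar[0]*s(k) + ar[1]*s(k-1) + ... + ar[p-1]*s(k-p+1) + w(k)."""
    p = len(ar)
    rows = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
    rows.append(list(reversed(ar)))  # last row predicts the newest sample
    return rows


def block_diag(m1, m2):
    """Stack two matrices block-diagonally, padding with zeros."""
    c1, c2 = len(m1[0]), len(m2[0])
    return [r + [0.0] * c2 for r in m1] + [[0.0] * c1 + r for r in m2]


def build_model(ar_s, ar_n):
    """Assumed block structure: speech block first, noise block second."""
    p, q = len(ar_s), len(ar_n)
    A = block_diag(companion(ar_s), companion(ar_n))
    G = block_diag([[0.0]] * (p - 1) + [[1.0]], [[0.0]] * (q - 1) + [[1.0]])
    C = [0.0] * (p - 1) + [1.0] + [0.0] * (q - 1) + [1.0]  # picks s(k) + n(k)
    return A, G, C


def step(A, G, C, x, w, v):
    """One step of x(k+1) = A x(k) + G w(k), y(k) = C x(k) + v(k)."""
    x_next = [sum(a * xi for a, xi in zip(row_a, x)) +
              sum(g * wi for g, wi in zip(row_g, w))
              for row_a, row_g in zip(A, G)]
    y = sum(c * xi for c, xi in zip(C, x)) + v
    return x_next, y
```

With this ordering the output row C simply adds the last element of each block, which matches y(k) being the sum of the current speech sample, the current noise sample, and v(k).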

The parameter estimator 201 estimates the parameters A, G, C, Q, and R defined in Equation 3 from the input frame y(k) in real time and provides them to the estimation filter 210.

The estimation filter 210 uses the parameters provided by the parameter estimator 201 to filter the input frame y(k) and output an estimate ŝ(k) of the speech signal with the noise signal removed. To this end, the estimation filter 210 comprises a sample detector 211, a gain matrix detector 212, a first estimate detector 213, and a second estimate detector 214.

The sample detector 211 detects the samples Y of the input frame y(k). Y consists of the N+1 samples of the input frame from the past time k-N up to the current time k, arranged as a vector as in Equation 5.
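The sample detector amounts to a sliding window over the most recent N+1 samples. A minimal sketch follows; the class name is hypothetical, and the ordering of Y (oldest sample first) and the choice to return nothing until N+1 samples have arrived are assumptions, since Equation 5 is not reproduced in this text.

```python
from collections import deque


class SampleWindow:
    """Keeps the N+1 most recent samples y(k-N), ..., y(k)."""

    def __init__(self, horizon):
        # horizon is N; the window holds N + 1 samples.
        self.buf = deque(maxlen=horizon + 1)

    def push(self, y):
        """Add the newest sample; return Y once the window is full, else None."""
        self.buf.append(y)
        if len(self.buf) < self.buf.maxlen:
            return None
        return list(self.buf)  # assumed ordering: oldest first
```

Because the deque has a fixed maximum length, appending a new sample discards the oldest one automatically, so each call costs constant time apart from copying the window.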

The gain matrix detector 212 computes the gain matrix H of the estimation filter 210 from the parameters provided by the parameter estimator 201. The gain matrix H has the form given in Equation 6.

Here the j-th element h(j) is computed from the parameters A, G, C, Q, and R as in Equation 7.

The method of computing H according to Equation 7 is disclosed in a paper published by the present inventor, "A Receding Horizon Kalman FIR Filter for Discrete Time-Invariant Systems" (IEEE Transactions on Automatic Control, Vol. 44, No. 9, pp. 1787-1791, 1999).

As shown in Equation 8, the gain matrix H can be partitioned into a gain matrix H_s for the speech signal and a gain matrix H_n for the noise signal.

Accordingly, the gain matrix detector 212 takes the top p rows of the computed matrix H, where p is the order of s(k), and outputs them as H_s.

The first estimate detector 213 multiplies the gain matrix H_s provided by the gain matrix detector 212 by the sample vector Y provided by the sample detector 211, as in Equation 9, to estimate x̂_s(k).

x̂_s(k) = H_s Y    (9)

The estimate x̂_s(k) produced by the first estimate detector 213 is an estimate of x_s(k), the state variable vector of s(k) in Equation 1.

The estimate x̂_s(k) has the same vector form as x_s(k) in Equation 4.

The second estimate detector 214 extracts the noise-free speech signal ŝ(k) from x̂_s(k), which has the vector form of Equation 4. As Equation 4 shows, the actual speech signal s(k) is the last element of the state variable vector x_s(k). The second estimate detector 214 therefore takes the last element of x̂_s(k) and outputs it as the speech signal ŝ(k) with the noise signal removed.
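The three operations just described (taking the top p rows of H, multiplying by Y, and reading off the last element) can be sketched as below. The gain matrix is taken as given, because the formula of Equation 7 is not reproduced in this text; the function names are hypothetical and plain nested lists stand in for matrices.

```python
def speech_gain(H, p):
    """Top p rows of the gain matrix H (the speech block)."""
    return [row[:] for row in H[:p]]


def estimate_state(H_s, Y):
    """First estimate: the matrix-vector product of the speech gain and Y."""
    return [sum(h * y for h, y in zip(row, Y)) for row in H_s]


def estimate_speech(x_s_hat):
    """Second estimate: the last element of the estimated speech state."""
    return x_s_hat[-1]
```

Since only the last element is output, only the p-th row of H is strictly needed per sample; the full product is kept here to mirror the two detectors in the description.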

The estimation filter 210 described above may be implemented as a finite impulse response (FIR) filter.

FIG. 3 is a flowchart of a speech enhancement method according to the present invention.

Referring to FIG. 3, in step 301 the parameters are estimated from the input frame. These are the parameters needed to estimate the speech signal with the noise in the input frame removed; as described for the parameter estimator 201, they are A, G, C, Q, and R of Equation 2.

In step 302, the gain matrix H_s is computed from the parameters estimated in step 301 and, at the same time, the samples Y of the input frame y(k) are detected. The gain matrix H_s is computed in the same way as by the gain matrix detector 212 of FIG. 2, and the samples Y are detected in the same way as by the sample detector 211.

In step 303, the first estimate x̂_s(k) is computed from the gain matrix H_s and the samples Y as in Equation 9. In step 304, the last element of the estimated x̂_s(k) is taken as the noise-free speech signal ŝ(k) for the current input frame y(k). In step 305, it is checked whether detection of the noise-free speech signal ŝ(k) for the input frame y(k) should end. If not, the process returns to step 301 and repeats the steps above; otherwise the process ends.
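Steps 301 to 305 can be arranged as a per-sample loop, sketched below. This is only an outline under assumptions: `estimate_params` and `gain_matrix` are hypothetical callables supplied by the caller (the gain formula is not reproduced in this text), the window is ordered oldest sample first, and running out of input stands in for the termination check of step 305.

```python
def enhance(samples, horizon, p, estimate_params, gain_matrix):
    """Run the estimate / gain / multiply / extract loop over a sample stream.

    horizon is N (the window holds N + 1 samples); p is the speech model order.
    """
    window = []
    enhanced = []
    for y in samples:                                   # step 305: stop when input ends
        window = (window + [y])[-(horizon + 1):]
        if len(window) < horizon + 1:
            continue                                    # not enough history yet
        params = estimate_params(window)                # step 301
        H_s = gain_matrix(params)[:p]                   # step 302: top p rows of H
        x_s_hat = [sum(h * v for h, v in zip(row, window))
                   for row in H_s]                      # step 303
        enhanced.append(x_s_hat[-1])                    # step 304
    return enhanced
```

Passing the two stages in as functions keeps the loop independent of how the parameters and the gain are actually obtained, which is the part this text leaves to Equations 6 and 7.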

Because the speech enhancement system and method of the present invention described above extract the noise-free speech signal from the input frame using only the time domain, they have a simpler configuration and implementation than conventional speech enhancement systems.

Moreover, whereas conventional systems handle the noise signal by approximate methods, the present invention removes the noise in the input frame with an estimation filter built on a corresponding mathematical model. It can therefore provide a good-quality speech signal even when the input frame falls outside the somewhat unrealistic assumptions that the speech and noise signals lie in uncorrelated frequency bands and that they are stationary.

The present invention is not limited to the embodiments described above and may of course be modified by those skilled in the art within the spirit of the invention. Accordingly, the scope of the rights claimed is defined not by the detailed description but by the claims that follow.

Claims

1. A speech enhancement system comprising:

a filter that is constructed from a state-space model of an input frame, detects an estimate of a state variable of a speech signal contained in the input frame, and filters the input frame so as to detect, from the detected estimate of the state variable, a speech signal from which a noise signal has been removed; and

a parameter estimator that estimates, in real time and from the input frame, predetermined parameters required for the filtering and provides them to the filter.

2. The speech enhancement system of claim 1, wherein the filter comprises:

a sample detector that detects at least one sample of the input frame;

a gain matrix detector that detects a gain matrix of the filter for the speech signal contained in the input frame, using the parameters estimated by the parameter estimator;

a first estimate detector that detects an estimate of the state variable of the speech signal, using the sample detected by the sample detector and the gain matrix detected by the gain matrix detector; and

a second estimate detector that detects, from the estimate detected by the first estimate detector, an estimate of the speech signal with the noise signal removed.

3. The speech enhancement system of claim 2 wherein the sample detector detects N + 1 samples from a current time k to a past time k-N for the input frame.

4. A speech enhancement method comprising:

estimating predetermined parameters from an input frame in real time; and

filtering the input frame by detecting, using the predetermined parameters and based on a state-space model of the input frame, an estimate of a state variable of a speech signal contained in the input frame, and detecting, from the detected estimate of the state variable, a speech signal from which a noise signal has been removed.

5. (Deleted)

6. The speech enhancement method of claim 4, wherein the filtering comprises:

detecting, using the estimated predetermined parameters, a gain matrix for the filtering and at least one sample of the input frame;

detecting an estimate of the state variable of the speech signal using the gain matrix and the detected sample; and

estimating the speech signal with the noise signal removed from the estimate of the state variable of the speech signal.