KR101051035B1

KR101051035B1 - Second-Order Conditional Maximum A Posteriori Based Global Soft Decision Method for Speech Enhancement

Info

Publication number: KR101051035B1
Application number: KR1020090054807A
Authority: KR
Inventors: 장준혁; 금종모
Original assignee: 인하대학교 산학협력단
Priority date: 2009-06-19
Filing date: 2009-06-19
Publication date: 2011-07-21
Also published as: KR20100136634A

Abstract

The present invention relates to a second-order conditional maximum a posteriori (CMAP) based global soft decision method for speech enhancement. More specifically, the method comprises: (1) defining conditional maximum a posteriori values for speech presence and absence, taking into account the correlation between adjacent frames; (2) obtaining the speech absence probability of the current frame based on the defined conditional maximum a posteriori values; and (3) enhancing the speech of the current frame by applying the obtained speech absence probability.

According to this method, the second-order conditional maximum a posteriori (CMAP) value for speech presence and absence is defined using the voice activity of the two preceding frames of the speech signal and a hidden Markov model (HMM), so the correlation between adjacent frames can be taken into account. In addition, because the speech absence probability is derived from this second-order CMAP value, which reflects the voice activity of adjacent frames, noise can be estimated accurately and speech can be enhanced even in a frequently changing noise environment.

Speech enhancement, global soft decision, second-order conditional maximum a posteriori

Description

AN IMPROVED GLOBAL SOFT DECISION METHOD INCORPORATING SECOND-ORDER CONDITIONAL MAP FOR SPEECH ENHANCEMENT

The present invention relates to a speech enhancement method, and more particularly to a second-order conditional maximum a posteriori (CMAP) based global soft decision method for speech enhancement.

As speech signal processing systems such as mobile communication terminals and vehicle navigation devices have become more widely used, research on speech enhancement has attracted attention. The most important part of the signal processing for speech enhancement is accurate noise estimation; in particular, uncorrelated noise must be handled effectively. Much work has therefore been done on improving the speech spectrum through accurate noise estimation. Noise estimation methods based on spectral subtraction, Wiener filtering, soft decision, and minimum mean square error (MMSE) have been studied, and estimation based on soft decision in particular is known to perform well.

In the recently proposed global soft decision method, the conventional per-channel local speech absence probability (LSAP) is combined with the global speech absence probability (GSAP), which is determined by all the data in the current frame, to obtain a statistically robust speech absence probability. However, in the conventional global soft decision method the prior probabilities of speech absence and presence are set in advance on the basis of statistical assumptions, and the speech absence probability is derived with these values held fixed. The method therefore cannot take into account the correlation between the frames that make up a speech signal, and it has difficulty estimating the speech absence probability accurately in a frequently changing noise environment.

The present invention has been proposed to solve these problems of the previously proposed methods. One object is to provide a method that takes into account the correlation between adjacent frames by defining second-order conditional maximum a posteriori values for speech presence and absence, using the voice activity of the two preceding frames of the speech signal and a hidden Markov model (HMM).

Another object is to derive the speech absence probability from the second-order CMAP value, which reflects the voice activity of adjacent frames, so that noise is estimated accurately and speech is enhanced even in a frequently changing noise environment.

To achieve these objects, a second-order conditional maximum a posteriori based global soft decision method for speech enhancement according to a feature of the present invention comprises:

(1) defining conditional maximum a posteriori values for speech presence and absence, taking into account the correlation between adjacent frames;

(2) obtaining the speech absence probability of the current frame based on the defined conditional maximum a posteriori values for speech presence and absence; and

(3) enhancing the speech of the current frame by applying the obtained speech absence probability.

Preferably, from the assumption that the spectra of the speech signal and the noise follow complex Gaussian distributions, the probability density functions of the speech signal under the speech presence and absence hypotheses can be defined by the following equation.

Here, λ_x(k, l) is the variance of the speech signal at the k-th frequency component of the l-th frame, and λ_n(k, l) is the variance of the noise at the k-th frequency component of the l-th frame. P(Y(k, l)|H₀) is the probability density function of Y(k, l) when speech is absent, and P(Y(k, l)|H₁) is the probability density function of Y(k, l) when speech is present.

More preferably, from these probability density functions, the likelihood ratio Λ(Y(k, l)) of the k-th frequency channel of the l-th frame can be defined by the following equation.

Here, the two quantities appearing in the equation are the a priori SNR and the a posteriori SNR.

Preferably, in step (1), to take into account the correlation between adjacent frames, the second-order conditional maximum a posteriori values for speech presence and absence can be defined by the following equation, using the voice activity of the two preceding frames and a hidden Markov model (HMM).

Here, k is the frequency component index within a frame, l is the frame index, H₀ is the speech absence hypothesis, and H₁ is the speech presence hypothesis. The four second-order CMAP values correspond, in order, to the case in which speech is absent in the previous frame and absent in the frame before it; the case in which speech is absent in the previous frame and present in the frame before it; the case in which speech is present in the previous frame and absent in the frame before it; and the case in which speech is present in both the previous frame and the frame before it.

Even more preferably, in step (2), the speech absence probability can be defined by the following equation, taking into account the probability density functions of the speech signal under the speech presence and absence hypotheses, the likelihood ratio of the k-th frequency channel of the l-th frame, and the second-order CMAP value.

Here, the conditional term in the equation takes one of the four second-order CMAP values for speech presence and absence.

According to the second-order CMAP based global soft decision method for speech enhancement of the present invention, the second-order CMAP value for speech presence and absence is defined using the voice activity of the two preceding frames of the speech signal and a hidden Markov model (HMM), so the correlation between adjacent frames can be taken into account.

In addition, because the speech absence probability is derived from the second-order CMAP value, which reflects the voice activity of adjacent frames, noise can be estimated accurately and speech can be enhanced even in a frequently changing noise environment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Before the present invention is described in detail, the procedure for obtaining the speech absence probability with the conventional global soft decision method is described first.

First, the noisy speech signal Y(t) is assumed to be the sum of the clean speech signal X(t) and the noise N(t), where t is the discrete time index. The basic hypotheses H₀(k, l) and H₁(k, l) used in speech enhancement denote, respectively, the absence and presence of speech at the k-th frequency component of the l-th frame, and can be expressed as in Equation 1.

Here, Y(k, l), X(k, l), and N(k, l) are the Fourier transform coefficients at the k-th frequency component of the l-th frame of the noisy speech signal, the clean speech signal, and the noise signal, respectively.

From the assumption that the spectra of the speech signal and the noise follow complex Gaussian distributions, the probability density functions of the speech signal under hypotheses H₀(k, l) and H₁(k, l) can be expressed as in Equation 2.
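The equation images for Equations 1 and 2 are not reproduced in this text. Under the additive model and the complex Gaussian assumption stated above, they would take the following standard form. This is a reconstruction from the definitions of λ_x(k, l) and λ_n(k, l) given in this document, not a copy of the original figures.

```latex
% Equation 1 (assumed form): speech absence and presence hypotheses
\begin{align}
H_0(k,l) &: \; Y(k,l) = N(k,l) \\
H_1(k,l) &: \; Y(k,l) = X(k,l) + N(k,l)
\end{align}

% Equation 2 (assumed form): zero-mean complex Gaussian densities
\begin{align}
P\bigl(Y(k,l) \mid H_0\bigr) &= \frac{1}{\pi \lambda_n(k,l)}
  \exp\!\left( -\frac{|Y(k,l)|^2}{\lambda_n(k,l)} \right) \\
P\bigl(Y(k,l) \mid H_1\bigr) &= \frac{1}{\pi \bigl(\lambda_x(k,l) + \lambda_n(k,l)\bigr)}
  \exp\!\left( -\frac{|Y(k,l)|^2}{\lambda_x(k,l) + \lambda_n(k,l)} \right)
\end{align}
```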

Based on the hypotheses on the presence and absence of speech, the speech absence probability for each frequency channel can be defined as in Equation 3.

Here, Λ(Y(k, l)) is the likelihood ratio of the k-th frequency channel of the l-th frame, defined by Equation 4.

Here, the two quantities appearing in Equation 4 are the a priori SNR and the a posteriori SNR.
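As a minimal sketch of Equations 3 and 4, the code below assumes the usual forms that follow from complex Gaussian densities: Λ = exp(γξ/(1+ξ))/(1+ξ) and P(H₀|Y) = 1/(1 + (P(H₁)/P(H₀))·Λ). The symbol names `xi` (a priori SNR, λ_x/λ_n) and `gamma` (a posteriori SNR, |Y|²/λ_n) and both function names are assumptions made for illustration; the original equation images are not available in this text.

```python
import numpy as np


def likelihood_ratio(xi, gamma):
    """Likelihood ratio P(Y|H1) / P(Y|H0) for complex Gaussian spectra.

    xi    : a priori SNR, lambda_x / lambda_n (assumed symbol name)
    gamma : a posteriori SNR, |Y|^2 / lambda_n (assumed symbol name)
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return np.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)


def local_sap(xi, gamma, prior_ratio=1.0):
    """Per-channel speech absence probability P(H0 | Y(k, l)).

    prior_ratio is P(H1) / P(H0); 1.0 corresponds to P(H1) = P(H0) = 0.5.
    """
    return 1.0 / (1.0 + prior_ratio * likelihood_ratio(xi, gamma))
```

With `xi = 0` the likelihood ratio is 1, so equal priors give a speech absence probability of 0.5 regardless of `gamma`.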

In addition, the speech absence probability of a frame can be obtained from the observations of the current frame as in Equation 5.

Assuming that the frequency components are statistically independent, the speech absence probability of a frame can be expressed as in Equation 6.

Here, M is the total number of frequency channels.
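The frame-level probability can be sketched as follows, assuming Equation 6 has the usual form P(H₀|Y(l)) = 1/(1 + (P(H₁)/P(H₀))·∏ₖ Λ(Y(k, l))) with the product taken over the M channels. The function name and the log-domain evaluation are illustrative choices, not part of the source; the log domain is used because the product of M likelihood ratios overflows easily.

```python
import numpy as np


def global_sap(xi, gamma, prior_ratio=1.0):
    """Frame-level speech absence probability over M frequency channels.

    xi, gamma   : length-M arrays of a priori and a posteriori SNR (assumed names)
    prior_ratio : P(H1) / P(H0), fixed in the conventional method
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # log of each channel's likelihood ratio, exp(gamma*xi/(1+xi)) / (1+xi)
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    z = np.log(prior_ratio) + np.sum(log_lr)
    # 1 / (1 + exp(z)) evaluated without overflow
    return float(np.exp(-np.logaddexp(0.0, z)))
```

For a frame with strong speech in every channel the result underflows toward 0 instead of raising an overflow, which is the behavior wanted from a probability.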

The problem with estimating noise using the speech absence probability of Equation 6, obtained with the conventional global soft decision method, is that P(H₁) and P(H₀) in the denominator of Equation 6 are fixed on the basis of statistical assumptions (for example, P(H₁) = P(H₀) = 0.5). As a result, the strong correlation between the frames that make up speech cannot be exploited, and accurate noise estimation is difficult when the acoustic environment changes in ways that depart from those assumptions.

To solve these problems of the conventional global soft decision method, the present invention proposes a global soft decision method based on the second-order conditional maximum a posteriori. To exploit the correlation between adjacent frames, identified above as the first problem, the second-order CMAP value for speech presence and absence is defined by applying the voice activity of the two preceding frames of the speech signal and a hidden Markov model (HMM). To adapt to varied and frequently changing acoustic environments, identified above as the second problem, a speech absence probability that incorporates the second-order CMAP value is derived and used to enhance the speech.

The second-order CMAP based global soft decision method for speech enhancement of the present invention is now described in detail.

As described above, to take into account the strong correlation between adjacent frames, one embodiment of the present invention considers the speech presence and absence conditions of two frames. The per-channel speech absence probability under these conditions can be expressed as in Equation 7.

Here, α and β are given by Equation 8.

The method of the present invention replaces the fixed parameter P(H₁)/P(H₀) in the speech absence probability of Equation 6, obtained with the conventional global soft decision method, with the second-order CMAP value expressed by Equation 9.

When global soft decision is performed with the second-order CMAP value, the speech absence probability can be obtained more accurately than with the conventional method because, as shown in Equation 10, the second-order CMAP value is more reliable than the prior probability based on a statistical assumption.

As shown in Equation 11, the second-order CMAP value takes one of four values, one for each combination of speech absence or presence in the previous frame and in the frame before it.

Replacing the fixed parameter P(H₁)/P(H₀) of Equation 3 with the second-order CMAP value gives the per-channel speech absence probability of Equation 12.
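The substitution can be illustrated with a small lookup table. The four numbers are the values reported in the evaluation section of this document (0.0246, 0.0738, 53.41, 479); pairing them with the four previous-frame cases in the order those cases are listed is an assumption, as are the names `CMAP_RATIO` and `cmap_sap` and the use of boolean voice activity flags. This is a sketch of the idea, not the patent's exact Equation 12.

```python
import numpy as np

# Assumed mapping: (speech in previous frame, speech in the frame before it)
# -> second-order CMAP value used in place of the fixed P(H1)/P(H0).
CMAP_RATIO = {
    (False, False): 0.0246,
    (False, True): 0.0738,
    (True, False): 53.41,
    (True, True): 479.0,
}


def cmap_sap(xi, gamma, prev_active, prev2_active):
    """Per-channel speech absence probability with a CMAP-selected ratio.

    xi, gamma    : a priori and a posteriori SNR (assumed symbol names)
    prev_active  : hard voice activity decision for the previous frame
    prev2_active : hard voice activity decision for the frame before it
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    ratio = CMAP_RATIO[(bool(prev_active), bool(prev2_active))]
    lr = np.exp(gamma * xi / (1.0 + xi)) / (1.0 + xi)
    return 1.0 / (1.0 + ratio * lr)
```

For the same observation, the speech absence probability is pushed toward 1 after two inactive frames and toward 0 after two active frames, which matches the behavior described for the proposed method.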

When noise is estimated using the speech absence probability of Equation 12, the second-order CMAP value in its denominator allows the correlation between adjacent frames to be taken into account and makes the estimate robust across varied acoustic environments. Noise is therefore estimated more accurately than with the conventional global soft decision method, and the quality of the speech signal is further improved.

FIG. 1 is a flowchart of a second-order CMAP based global soft decision method for speech enhancement according to an embodiment of the present invention. As shown in FIG. 1, the method includes defining conditional maximum a posteriori values for speech presence and absence, taking into account the correlation between adjacent frames (S100); obtaining the speech absence probability of the current frame based on the defined conditional maximum a posteriori values (S200); and enhancing the speech of the current frame by applying the obtained speech absence probability (S300).

In step S100, to take into account the correlation between adjacent frames, the second-order CMAP value for speech presence and absence is determined using the voice activity of the two preceding frames and a hidden Markov model (HMM).

In step S200, the speech absence probability is obtained, taking into account the probability density functions of the speech signal under the speech presence and absence hypotheses, the likelihood ratio of the k-th frequency channel of the l-th frame, and the second-order CMAP value.

In step S300, noise is estimated and removed on the basis of the speech absence probability obtained in step S200, which improves the sound quality of the current frame.
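The way steps S100 and S200 chain from frame to frame can be sketched as a simple loop. Several things here are assumptions rather than details from the source: the hard voice activity decision (a frame counts as speech when its speech absence probability falls below `threshold`), the initial state of two silent frames, the frame-level product form of the probability, and the pairing of the four reported values with the four cases. Step S300 (noise estimation and removal) is left out because the source does not give its formulas.

```python
import numpy as np

# Assumed mapping of the reported CMAP values to previous-frame activity.
CMAP_RATIO = {
    (False, False): 0.0246,
    (False, True): 0.0738,
    (True, False): 53.41,
    (True, True): 479.0,
}


def track_global_sap(xi_frames, gamma_frames, threshold=0.5):
    """Return the speech absence probability of each frame.

    xi_frames, gamma_frames : arrays of shape (L, M), L frames by M channels
    threshold               : hypothetical hard-decision level on the SAP
    """
    xi = np.asarray(xi_frames, dtype=float)
    gamma = np.asarray(gamma_frames, dtype=float)
    prev, prev2 = False, False  # assume silence before the first frame
    sap = np.empty(xi.shape[0])
    for l in range(xi.shape[0]):
        ratio = CMAP_RATIO[(prev, prev2)]                      # S100
        log_lr = gamma[l] * xi[l] / (1.0 + xi[l]) - np.log1p(xi[l])
        z = np.log(ratio) + log_lr.sum()
        sap[l] = np.exp(-np.logaddexp(0.0, z))                 # S200
        prev2, prev = prev, bool(sap[l] < threshold)           # update history
    return sap
```

In a complete system the a priori and a posteriori SNR would themselves depend on the running noise estimate, so this loop would sit inside the noise tracking recursion rather than run on precomputed arrays.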

To evaluate the performance of the second-order CMAP based global soft decision method according to an embodiment of the present invention, the widely used ITU-T P.862 PESQ (Perceptual Evaluation of Speech Quality) test is used, and the conventional global soft decision method serves as the baseline for comparison. For the PESQ test, speech signals were obtained from male and female speakers, each uttering 100 sentences, sampled at 8 kHz with a frame length of 10 ms. Three kinds of noise from the NOISEX-92 database (white noise, car noise, and F16 noise) were added to the sampled speech at SNRs of 5, 10, and 15 dB to build the PESQ test files.

FIG. 2 shows the speech waveform when F16 noise is added to the speech sample data at SNR = 10 dB. FIG. 3 compares the speech presence probability obtained with the conventional global soft decision method (dotted line) with that obtained with the method proposed in the present invention (solid line). Through the second-order CMAP value, the proposed method makes the speech absence probability smaller when the two preceding frames are likely to be speech and pushes it close to 1 when the two preceding frames are likely to be noise, which improves reliability. As FIG. 3 shows, the speech presence probability of the conventional global soft decision method fluctuates strongly even where speech is present, whereas with the proposed method it stays close to 1.

For the PESQ test, the fixed parameter P(H₁)/P(H₀) in the speech absence probability of the conventional global soft decision method was set to 1. The four second-order CMAP values of Equation 11 used in the proposed method were set, on the basis of statistics gathered from a long speech file, to 0.0246, 0.0738, 53.41, and 479, respectively.

Table 1 shows the PESQ results for speech enhanced with the conventional global soft decision method and with the method proposed in the present invention. The proposed method scores higher than the conventional method under every test condition, and for car and F16 noise the improvement is largest at low SNR (for example, 2.148 versus 2.196 for F16 noise at 5 dB). As FIG. 3 suggests, this is because the proposed method, which uses the second-order CMAP value instead of the fixed parameter P(H₁)/P(H₀), estimates noise more accurately when computing the speech absence probability in varied noise environments, and therefore performs better when applied to a speech enhancement system.
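Building a test file at a target SNR amounts to scaling the noise before adding it to the speech. The sketch below shows one common way to do this; the function name `mix_at_snr`, the use of whole-signal average power to define the SNR, and the requirement that the noise be at least as long as the speech are assumptions, since the source does not describe how the mixing was carried out. The frame length constant follows from the stated 10 ms frames at 8 kHz.

```python
import numpy as np

SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per frame


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Returns the noisy signal and the gain applied to the noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if noise.size < speech.size:
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: speech.size]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db/10)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain
```

Calling `mix_at_snr(speech, noise, 5.0)`, `mix_at_snr(speech, noise, 10.0)`, and `mix_at_snr(speech, noise, 15.0)` for each of the three noise types would give the nine conditions listed in Table 1.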

Table 1. PESQ scores for the conventional global soft decision method (Global) and the proposed method (Proposed)

Noise type   Method     SNR 5 dB   SNR 10 dB   SNR 15 dB
white        Global     2.080      2.423       2.475
white        Proposed   2.082      2.424       2.478
car          Global     3.310      3.596       3.848
car          Proposed   3.320      3.604       3.854
F16          Global     2.148      2.540       2.847
F16          Proposed   2.196      2.554       2.858
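The size of the improvement in each condition can be read off Table 1 by subtracting the Global score from the Proposed score. The short script below does this arithmetic; the scores are transcribed from the table, and the names `PESQ` and `pesq_gains` are chosen for illustration.

```python
# PESQ scores from Table 1: noise type -> method -> scores at 5, 10, 15 dB.
PESQ = {
    "white": {"Global": (2.080, 2.423, 2.475), "Proposed": (2.082, 2.424, 2.478)},
    "car": {"Global": (3.310, 3.596, 3.848), "Proposed": (3.320, 3.604, 3.854)},
    "F16": {"Global": (2.148, 2.540, 2.847), "Proposed": (2.196, 2.554, 2.858)},
}
SNRS_DB = (5, 10, 15)


def pesq_gains(table=PESQ):
    """Return Proposed minus Global for each noise type and SNR."""
    return {
        noise: tuple(
            round(p - g, 3) for g, p in zip(rows["Global"], rows["Proposed"])
        )
        for noise, rows in table.items()
    }


gains = pesq_gains()
for noise, deltas in gains.items():
    print(noise, dict(zip(SNRS_DB, deltas)))
```

The differences are small for white noise (at most 0.003) and largest for F16 noise at 5 dB (0.048).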

The present invention described above may be modified or applied in various ways by those of ordinary skill in the art to which it pertains, and the scope of its technical idea is to be determined by the claims below.

FIG. 1 is a flowchart of a second-order CMAP based global soft decision method for speech enhancement according to an embodiment of the present invention.

FIG. 2 shows the speech waveform when F16 noise is added to the speech sample data at SNR = 10 dB.

FIG. 3 compares the speech presence probability obtained with the conventional global soft decision method with that obtained with the method proposed in the present invention.

<Description of reference numerals>

S100: defining conditional maximum a posteriori values for speech presence and absence, taking into account the correlation between adjacent frames

S200: obtaining the speech absence probability of the current frame based on the defined conditional maximum a posteriori values for speech presence and absence

S300: enhancing the speech of the current frame by applying the obtained speech absence probability

Claims

A second-order conditional maximum a posteriori based global soft decision method for speech enhancement, comprising:

(1) defining conditional maximum a posteriori values for speech presence and absence, taking into account the correlation between adjacent frames;

(2) obtaining the speech absence probability of the current frame based on the defined conditional maximum a posteriori values for speech presence and absence; and

(3) enhancing the speech of the current frame by applying the obtained speech absence probability;

wherein, in step (1), to take into account the correlation between adjacent frames, the second-order conditional maximum a posteriori values for speech presence and absence are defined by the following equation, using the voice activity of the two preceding frames and a hidden Markov model (HMM).

Here, k is the frequency component index within a frame, l is the frame index, H₀ is the speech absence hypothesis, and H₁ is the speech presence hypothesis. The four second-order conditional maximum a posteriori values correspond, in order, to the case in which speech is absent in the previous frame and absent in the frame before it; the case in which speech is absent in the previous frame and present in the frame before it; the case in which speech is present in the previous frame and absent in the frame before it; and the case in which speech is present in both the previous frame and the frame before it.

The method of claim 1,

wherein, from the assumption that the spectra of the speech signal and the noise follow complex Gaussian distributions, the probability density functions of the speech signal under the speech presence and absence hypotheses are defined by the following equation.

Here, λ_x(k, l) is the variance of the speech signal at the k-th frequency component of the l-th frame, and λ_n(k, l) is the variance of the noise at the k-th frequency component of the l-th frame. P(Y(k, l)|H₀) is the probability density function of Y(k, l) when speech is absent, and P(Y(k, l)|H₁) is the probability density function of Y(k, l) when speech is present.

The method of claim 2,

wherein, from the probability density functions of the speech signal under the speech presence and absence hypotheses, the likelihood ratio Λ(Y(k, l)) of the k-th frequency channel of the l-th frame is defined by the following equation.

Here, the two quantities appearing in the equation are the a priori SNR and the a posteriori SNR.

(Deleted)

The method of claim 3,

wherein, in step (2), the speech absence probability is defined by the following equation, taking into account the probability density functions of the speech signal under the speech presence and absence hypotheses, the likelihood ratio of the k-th frequency channel of the l-th frame, and the second-order conditional maximum a posteriori value.

here,