KR101897242B1

KR101897242B1 - A method for enhancing quality of speech including noise

Info

Publication number: KR101897242B1
Application number: KR1020170013033A
Authority: KR
Inventors: 김선만; 김홍국
Original assignee: 광주과학기술원
Priority date: 2017-01-26
Filing date: 2017-01-26
Publication date: 2018-09-10
Also published as: KR20180088230A

Abstract

An apparatus for enhancing speech signal quality is disclosed. The apparatus according to an embodiment of the present invention includes an a priori SNR estimator that estimates the a priori signal-to-noise ratio (SNR) of a speech signal containing noise, a Wiener filter gain generator that obtains a first Wiener filter gain using the SNR, and a non-negative matrix factorization (NMF) processor that applies NMF to the noisy speech signal to which the first Wiener filter gain has been applied, thereby obtaining an NMF gain. The Wiener filter gain generator then uses the NMF gain and the a priori SNR to obtain a second Wiener filter gain, which is used to obtain a second speech signal.

Description

[0001] The present invention relates to a method for enhancing the quality of speech that includes noise. More specifically, it relates to a method for enhancing speech quality using non-negative matrix factorization and a Wiener filter.

Speech enhancement techniques commonly used to improve the quality of noisy speech, such as the Wiener filter and minimum mean square error (MMSE) estimation, require an accurate estimate of the a priori signal-to-noise ratio (SNR), which is obtained by estimating the noise power spectral density (PSD). Statistical speech enhancement techniques such as the Wiener filter are known to perform relatively well in stationary noise environments.

However, in non-stationary noise environments such as babble, the accuracy of the noise PSD and a priori SNR estimates degrades, which leaves residual noise after speech enhancement.

Non-negative matrix factorization (NMF), which is widely studied as an alternative approach to speech enhancement, uses non-negative bases of speech and noise. Obtaining the noise basis requires accurate detection of noise segments in the observed speech, which is itself a challenging problem that has been studied for decades.

Consequently, there is a need for an NMF-based speech enhancement method that improves speech quality using only a speech basis, without a noise basis.

Prior Art Document 1: M. B. Trawicki, M. T. Johnson, Distributed multichannel speech enhancement with minimum mean-square error short-time spectral amplitude, log-spectral amplitude, and spectral phase estimation. Signal Processing. 92(2) (2012) 345-356.
Prior Art Document 2: W. Lee, J. H. Song, J. H. Chang, Minima-controlled speech presence uncertainty tracking method for speech enhancement. Signal Processing. 91(1) (2011) 155-161.
Prior Art Document 3: P. C. Loizou, Speech Enhancement: Theory and Practice. Second Edition. CRC 2013.
Prior Art Document 4: Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Advanced frontend feature extraction algorithm; Compression algorithms, Tech. Rep. ETSI ES 202 050 V1.1.5 2007.
Prior Art Document 5: S. J. Lee, B. O. Kang, H. Jung, Y. Lee, H. S. Kim, Statistical model-based noise reduction approach for car interior applications to speech recognition. ETRI Journal. 32(5) (2010) 801-809.
Prior Art Document 6: S. Mirzaei, H. V. Hamme, Y. Norouzi, Blind audio source counting and separation of anechoic mixtures using the multichannel complex NMF framework. Signal Processing. 115 (2015) 27-37.
Prior Art Document 7: Y. Xu, G. Bao, X. Xu, Z. Ye, Single-channel speech separation using sequential discriminative dictionary learning. Signal Processing. 106 (2015) 134-140.
Prior Art Document 8: F. Weninger, J. Le Roux, J. R. Hershey, S. Watanabe, Discriminative NMF and its application to single-channel source separation. Proc. Interspeech. Singapore (2014) 865-869.
Prior Art Document 9: N. Mohammadiha, P. Smaragdis, A. Leijon, Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization. IEEE Trans. Audio, Speech, and Language Process. 21(10) (2013) 2140-2151.
Prior Art Document 10: C. Joder, F. Weninger, F. Eyben, D. Virette, B. Schuller, Real-time speech separation by semi-supervised nonnegative matrix factorization. Proc. International Conference on Latent Variable Analysis and Signal Separation. Tel Aviv, Israel (2012) 322-329.
Prior Art Document 11: H. Hu, A. Krasoulis, M. Lutman, S. Bleeck, Development of a real time sparse non-negative matrix factorization module for cochlear implants by using xPC target. Sensors. 13(10) (2013) 13861-13878.
Prior Art Document 12: P. Hoyer, Non-negative sparse coding. Proc. IEEE Workshop on Neural Networks for Signal Processing. (2002) 557-565.
Prior Art Document 13: ITU-T Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs, Feb. 2001.
Prior Art Document 14: J. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, V. Zue, TIMIT Acoustic Phonetic Continuous Speech Corpus (Linguistic Data Consortium, Philadelphia, 1993).
Prior Art Document 15: A. Varga, H. J. M. Steeneken, M. Tomlinson, D. Jones, The NOISEX-92 study on the effect of additive noise on automatic speech recognition (Documentation on the NOISEX-92 CD-ROMs 1992).

An embodiment of the present invention aims to improve the quality of speech that includes noise by extending and improving an NMF-based algorithm.

A speech signal quality enhancement apparatus according to an embodiment of the present invention includes an a priori SNR estimator that estimates the a priori signal-to-noise ratio (SNR) of a speech signal containing noise, a Wiener filter gain generator that obtains a first Wiener filter gain using the a priori SNR, and an NMF processor that applies NMF to the noisy speech signal to which the first Wiener filter gain has been applied, thereby obtaining an NMF gain. The Wiener filter gain generator uses the NMF gain and the a priori SNR to obtain a second Wiener filter gain, which is used to obtain a second speech signal.

The speech quality enhancement method according to an embodiment of the present invention can improve the quality of speech that includes noise by extending and improving an NMF-based algorithm.

FIG. 1 is a block diagram of a speech quality enhancement apparatus according to an embodiment of the present invention.
FIG. 2 is a graph of the Perceptual Evaluation of Speech Quality (PESQ) scores obtained with the method proposed in an embodiment of the present invention, plotted by noise type.
FIG. 3 shows the spectral residual noise.
FIG. 4 shows the spectrogram for each condition.
FIG. 5 shows the PESQ score of each method in different types of noise environments.
FIG. 6 shows the result of comparing the method according to an embodiment of the present invention (PR) with Bayesian NMF combined with a hidden Markov model (BNMF-HMM; see Prior Art Document 9) (R4).
FIG. 7 is a flowchart of a speech quality enhancement method using an NMF Wiener filter according to an embodiment of the present invention.

Specific embodiments of the present invention are described in detail below with reference to the drawings. The spirit of the present invention is not limited to the specific embodiments presented below. A person skilled in the art who understands the spirit of the invention could readily propose other embodiments within the same scope by adding, changing, or deleting components, and such embodiments are also included in the spirit of the invention.

A method for enhancing the quality of a received speech signal is proposed below. Specifically, the method separates and removes noise from a speech signal containing noise in order to obtain speech that is easy to identify. The speech signal obtained according to an embodiment of the present invention may be referred to as the target speech.

A speech quality enhancement method and apparatus according to an embodiment of the present invention are described below with reference to FIG. 1.

Assuming that the original speech signal is corrupted by additive noise in the frequency domain, the noisy speech at the k-th frequency bin of a given frame is expressed by Equation 1.

The spectral gain used for speech enhancement yields an estimate of the target speech, as expressed by Equation 2.
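As a reading aid, the additive model and the spectral-gain estimate described above can be written in commonly used notation. The symbols below are assumptions, since the original symbols are not reproduced in this text: Y_k is the noisy spectrum, S_k the clean speech, N_k the noise, G_k the spectral gain, and the hat marks an estimate.

```latex
Y_k = S_k + N_k, \qquad \hat{S}_k = G_k \, Y_k
```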

In particular, the gain of the Wiener filter is expressed by Equation 3 in terms of the a priori SNR estimate, which is obtained from a noise variance estimate using the decision-directed (DD) approach. See Prior Art Document 3 for a detailed description of the DD approach.
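A minimal Python sketch of the textbook decision-directed rule and the Wiener gain G = xi / (1 + xi) may help here. It is not taken from the patent's Equation 3. The function names, the smoothing factor `alpha = 0.98`, and the toy values are assumptions chosen for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """A priori SNR per frequency bin via the decision-directed rule.

    noisy_power      : |Y_k|^2 of the current frame
    noise_power      : estimated noise variance per bin
    prev_clean_power : |S_hat_k|^2 estimated for the previous frame
    alpha            : smoothing factor (0.98 is a common choice, assumed here)
    """
    xi = []
    for y2, n2, s2 in zip(noisy_power, noise_power, prev_clean_power):
        gamma = y2 / n2                      # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)      # instantaneous SNR, floored at zero
        xi.append(alpha * s2 / n2 + (1.0 - alpha) * ml_term)
    return xi


def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi) for each bin."""
    return [x / (1.0 + x) for x in xi]


# Toy frame with three bins (values are illustrative only).
xi = decision_directed_snr(
    noisy_power=[4.0, 1.0, 9.0],
    noise_power=[1.0, 1.0, 1.0],
    prev_clean_power=[3.0, 0.0, 8.0],
)
gains = wiener_gain(xi)
```

Bins with a high a priori SNR receive a gain close to 1, while bins dominated by noise are attenuated toward 0.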

The speech quality enhancement method according to an embodiment of the present invention improves the enhanced speech estimate by removing the residual noise components that remain after the Wiener filter gain is applied. To explain this, a speech quality enhancement apparatus according to an embodiment of the present invention is described in detail below.

FIG. 1 is a block diagram of a speech quality enhancement apparatus according to an embodiment of the present invention.

The speech quality enhancement apparatus (1) according to an embodiment of the present invention includes an a priori SNR estimator (10), a Wiener filter gain generator (20), an NMF processor (30), and a Wiener filter gain application unit (40).

The a priori SNR estimator (10) estimates the a priori signal-to-noise ratio (SNR) using the DD approach. See Prior Art Documents 4 and 5 for the specific estimation method.

The NMF processor (30) estimates an NMF-based gain in an online framework.

To improve the Wiener-filtered speech estimate using NMF, the NMF processor (30) arranges the estimate over all frequency bins into a vector. This is expressed by Equation 4.

Here, T is the transpose operator. The vector is expressed by Equation 5 using a pre-trained basis matrix and an activation vector.

Here, the residual term is a vector of the residual noise components that remain at all frequencies after the Wiener filter gain is applied. The basis matrix is trained on a general-purpose speech database so that it can reconstruct arbitrary target speech.
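Written out in assumed notation, the model described here says that the Wiener-filtered magnitude vector is a speech part, built from the pre-trained basis, plus a residual noise part. The symbols are illustrative and not the patent's own: x is the Wiener-filtered vector over K frequency bins, B the basis matrix with R basis vectors, a the activation vector, and r the residual noise vector.

```latex
\mathbf{x} = \mathbf{B}\,\mathbf{a} + \mathbf{r},
\qquad \mathbf{B} \in \mathbb{R}_{\ge 0}^{K \times R},\quad
\mathbf{a} \in \mathbb{R}_{\ge 0}^{R},\quad
\mathbf{r} \in \mathbb{R}^{K}
```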

The NMF processing therefore seeks the activation vector that minimizes the error. The activation vector is randomly initialized and then updated iteratively until the error e converges to its minimum. This is expressed by Equation 6.
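The iterative estimation can be sketched with the standard multiplicative update for non-negative sparse coding with a fixed basis and a Euclidean cost, a <- a * (B^T x) / (B^T B a + lambda). This is an assumed stand-in for Equation 6, which is not reproduced in this text. The function names, the sparsity weight, and the stopping tolerance are hypothetical.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def objective(basis, a, x, sparsity):
    """0.5 * ||x - B a||^2 + sparsity * sum(a)."""
    recon = matvec(basis, a)
    err = sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    return 0.5 * err + sparsity * sum(a)


def estimate_activation(basis, x, sparsity=0.1, max_iter=500, tol=1e-12, seed=0):
    """Estimate a non-negative activation a with basis held fixed (B a ~ x)."""
    rng = random.Random(seed)
    basis_t = [list(col) for col in zip(*basis)]         # B^T
    a = [rng.random() + 1e-3 for _ in basis_t]           # random positive init
    numer = matvec(basis_t, x)                           # B^T x, constant
    prev = objective(basis, a, x, sparsity)
    for _ in range(max_iter):
        denom = matvec(basis_t, matvec(basis, a))        # B^T B a
        # Element-wise multiply and divide; 1e-12 guards against division by zero.
        a = [ai * n / (d + sparsity + 1e-12) for ai, n, d in zip(a, numer, denom)]
        cur = objective(basis, a, x, sparsity)
        if abs(prev - cur) < tol:                        # objective has converged
            break
        prev = cur
    return a


# Toy example: x is exactly B @ [2.0, 0.5], so with no sparsity penalty
# the update should recover that activation.
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 0.5, 2.5]
activation = estimate_activation(basis, x, sparsity=0.0)
```

A larger sparsity weight pulls the activations toward zero, which is one way to read the role of the sparse parameter discussed with FIG. 2.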

Here, the multiplication and the division are element-wise operations. Finally, the NMF processor (30) obtains an NMF-based gain for attenuating the residual noise of the Wiener filter. This is expressed by Equation 7.
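The exact form of Equation 7 cannot be recovered from this text, so the following is only a sketch of one plausible ratio-type gain: the NMF speech reconstruction divided by the Wiener-filtered magnitude, with a small floor and a cap at 1. The function name `nmf_gain`, the floor value, and the cap are all assumptions.

```python
def nmf_gain(reconstructed, wiener_enhanced, floor=1e-6):
    """Ratio-type gain per bin (assumed form, not the patent's Equation 7).

    reconstructed   : B a, the speech part modelled by the speech basis
    wiener_enhanced : magnitude after the first Wiener filter gain
    floor           : small positive constant keeping both terms non-zero
    """
    gains = []
    for r, x in zip(reconstructed, wiener_enhanced):
        g = max(r, floor) / max(x, floor)
        gains.append(min(g, 1.0))            # never amplify (assumption)
    return gains


# Illustrative values: the third bin is not explained by the speech basis,
# so it is treated as residual noise and strongly attenuated.
gains = nmf_gain(reconstructed=[0.9, 0.1, 0.0], wiener_enhanced=[1.0, 0.5, 0.2])
```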

Here, the floor constant is a minimum value that keeps the numerator from becoming zero. The obtained NMF-based gain is then used to improve the Wiener filter gain. This is expressed by Equation 8.

In other words, the Wiener filter gain generator (20) obtains an updated Wiener filter gain by taking into account both the NMF-based gain and the a priori SNR estimate.

FIG. 2 is a graph of the PESQ scores obtained for different values of the sparse parameter of the method proposed in an embodiment of the present invention, plotted by noise type.

Panel (a) shows the case of babble noise, and panel (b) shows the case of Gaussian noise.

As shown in FIG. 2, comparing the PESQ scores (which rate sound quality from 1 to 4.5, a higher score indicating better quality; see Prior Art Document 13) shows that the sparse parameter strongly affects the quality improvement regardless of the noise type.

The enhanced speech estimate finally obtained by the Wiener filter gain application unit (40) is expressed by Equation 9.

The enhanced speech estimate is converted into a time-discrete signal by a signal converter (not shown) using the inverse discrete Fourier transform (DFT) and the overlap-and-add method. The estimate is also fed back to the a priori SNR estimator (10), where it is used to estimate the a priori SNR.
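The conversion back to the time domain can be illustrated with a naive inverse DFT followed by overlap-and-add. This is a minimal sketch under simplifying assumptions: no synthesis window is applied, the frame length and hop size are toy values, and the function names are hypothetical.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each time sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def overlap_add(frames, hop):
    """Sum equally long time-domain frames placed `hop` samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for t, sample in enumerate(frame):
            out[i * hop + t] += sample
    return out


# Two 4-point spectra whose inverse DFT is a constant frame of ones.
spectra = [[4, 0, 0, 0], [4, 0, 0, 0]]
frames = [inverse_dft(s) for s in spectra]
signal = overlap_add(frames, hop=2)
```

In a real system each frame would typically be windowed so that the overlapping regions sum to a constant. That step is left out here to keep the example short.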

FIG. 3 shows the spectral residual noise.

In FIG. 3, the black area shows the unfiltered noise contained in the speech in the frequency domain when the SNR is 5. Panel (a) is babble noise and panel (b) is Gaussian noise. The dark gray area shows the noise that remains after the Wiener filter gain is applied to the noisy speech, and the light gray area shows the noise that remains after NMF is applied.

As shown in FIG. 3, the Wiener filter is effective at reducing noise, and the residual noise that remains is further removed by NMF using a pre-trained speech basis.

Euclidean-distance-based NMF (UCL-NMF) and Kullback-Leibler-divergence-based NMF (KL-NMF) can be used for noise removal.

When NMF is used in the speech quality enhancement method according to a specific embodiment of the present invention, Euclidean-distance-based NMF may be preferable.

This is because the method uses only a speech basis as its database and no noise basis, so Euclidean-distance-based NMF, which tends to extract Gaussian noise, can be more effective. Table 1 also shows this.
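The two NMF variants differ in the cost they minimize between an observed vector x and its model y = B a. The sketch below evaluates both costs on toy values; the function names and data are illustrative. As general background, minimizing the squared Euclidean distance corresponds to a Gaussian error model, while the generalized KL divergence corresponds to a Poisson observation model.

```python
import math


def euclidean_cost(x, y):
    """Half the squared Euclidean distance between x and its model y."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def kl_cost(x, y, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(x || y)."""
    return sum(
        xi * math.log((xi + eps) / (yi + eps)) - xi + yi
        for xi, yi in zip(x, y)
    )


x = [1.0, 2.0, 3.0]   # observed magnitudes (illustrative)
y = [1.0, 2.0, 4.0]   # model B a (illustrative)
costs = (euclidean_cost(x, y), kl_cost(x, y))
```

Both costs are zero when the model matches the observation exactly, and they penalize the same mismatch differently.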

SNR (dB)    Wiener    Wiener + KL-NMF    Wiener + UCL-NMF
[20, 15]    3.171     3.192              3.195
[10, 5]     2.438     2.469              2.518
[0, -5]     1.644     1.679              1.805

Table 1 shows the PESQ scores for babble noise when different NMF types are combined with the Wiener filter in the speech quality enhancement method according to an embodiment of the present invention.
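The improvement that each NMF type adds over the Wiener filter alone follows from Table 1 by subtraction. The short script below does that arithmetic; the dictionary keys are illustrative names, and the scores are copied from the table.

```python
# PESQ scores for babble noise, copied from Table 1.
table = {
    "[20, 15]": {"wiener": 3.171, "kl": 3.192, "ucl": 3.195},
    "[10, 5]": {"wiener": 2.438, "kl": 2.469, "ucl": 2.518},
    "[0, -5]": {"wiener": 1.644, "kl": 1.679, "ucl": 1.805},
}

# PESQ gain of each combined method over the Wiener filter alone.
improvement = {
    snr: {
        "kl": round(row["kl"] - row["wiener"], 3),
        "ucl": round(row["ucl"] - row["wiener"], 3),
    }
    for snr, row in table.items()
}

for snr, gain in improvement.items():
    print(snr, gain)
```

The UCL-NMF gain grows from 0.024 at the highest SNR range to 0.161 at the lowest, while the KL-NMF gain stays between 0.021 and 0.035, which matches the stated preference for the Euclidean variant.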

FIG. 4 shows the spectrogram for each condition.

Specifically, (a) shows clean speech; (b) shows speech containing babble noise at 0 dB SNR; (c) shows the speech signal processed with the conventional Wiener filter gain; (d) shows the speech signal processed with non-negative sparse coding according to Prior Art Document 11; and (e) shows the speech signal processed with the updated Wiener filter gain according to an embodiment of the present invention.

As shown in FIG. 4, (e) according to an embodiment of the present invention is more effective at improving speech quality than the conventional speech enhancement techniques (c) and (d).

FIG. 5 shows the PESQ score of each method in different types of noise environments.

In FIG. 5, No denotes unprocessed speech, R1 denotes non-negative sparse coding, R2 denotes the two-stage Mel-warped Wiener filter (see Prior Art Document 4), R3 denotes the model-based Wiener filter (see Prior Art Document 5), and PR denotes the NMF Wiener filter according to an embodiment of the present invention.

As shown in FIG. 5, the method according to an embodiment of the present invention performs better under all noise conditions except the vehicle driving noise condition.

FIG. 6 shows the result of comparing the method according to an embodiment of the present invention (PR) with Bayesian NMF combined with a hidden Markov model (BNMF-HMM; see Prior Art Document 9) (R4).

As shown in FIG. 6, the method according to an embodiment of the present invention performs better than R4 under all noise conditions.

FIG. 7 is a flowchart of a speech quality enhancement method using an NMF Wiener filter according to an embodiment of the present invention.

The speech quality enhancement apparatus (1) according to an embodiment of the present invention obtains a speech signal containing noise (S101). The noise may be, for example, babble or Gaussian noise. The apparatus (1) can separate the noise from the speech and extract only the speech. The extracted speech can be used for automatic speech recognition, hearing aids, and the like.

The a priori SNR estimator (10) obtains an a priori SNR estimate for the speech signal containing noise (S103). In a specific embodiment, the a priori SNR estimator may obtain the a priori SNR estimate from the speech signal estimated in the previous frame or from the speech signal containing noise.

The Wiener filter gain generator (20) obtains a first Wiener filter gain based on the obtained a priori SNR estimate (S105). The first Wiener filter gain can be expressed as in Equation 3 above.

The Wiener filter gain application unit (40) obtains a first enhanced speech signal by applying the first Wiener filter gain to the obtained speech signal containing noise (S107). The first enhanced speech signal can be expressed as in Equation 2.

The NMF processor (30) applies NMF to the first enhanced speech signal, to which the first Wiener filter gain has been applied, and obtains an NMF gain (S109). Specifically, the NMF processor (30) obtains the NMF gain using only a speech basis, excluding a noise basis from database training.

The Wiener filter gain generator (20) obtains an updated second Wiener filter gain using the NMF gain and the a priori SNR estimate (S111). The updated second Wiener filter gain can be expressed as in Equation 8.

The Wiener filter gain application unit (40) applies the second Wiener filter gain to the speech containing noise and outputs a second enhanced speech signal (S113). The speech signal to which the second Wiener filter gain is applied can be expressed as in Equation 9. The final output speech signal is converted into a time-discrete signal by the signal converter.
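Steps S105 to S113 can be strung together for a single frame as in the sketch below. It is a rough illustration, not the patented implementation: the rule used in S111 (scaling the a priori SNR by the NMF gain before recomputing the Wiener gain) is hypothetical because Equation 8 is not reproduced in this text, the NMF gain is an assumed ratio form, and all function names, defaults, and toy values are assumptions.

```python
import random


def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi) per bin."""
    return [x / (1.0 + x) for x in xi]


def apply_gain(gain, spectrum):
    """Element-wise product of a gain and a magnitude spectrum."""
    return [g * y for g, y in zip(gain, spectrum)]


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def nmf_activation(basis, x, sparsity=0.05, n_iter=100, seed=0):
    """Sparse multiplicative update with a fixed speech basis (Euclidean cost)."""
    rng = random.Random(seed)
    basis_t = [list(col) for col in zip(*basis)]
    a = [rng.random() + 1e-3 for _ in basis_t]
    numer = matvec(basis_t, x)
    for _ in range(n_iter):
        denom = matvec(basis_t, matvec(basis, a))
        a = [ai * n / (d + sparsity + 1e-12) for ai, n, d in zip(a, numer, denom)]
    return a


def enhance_frame(noisy_mag, xi, speech_basis, floor=1e-6):
    g1 = wiener_gain(xi)                                  # S105: first Wiener gain
    s1 = apply_gain(g1, noisy_mag)                        # S107: first enhanced speech
    a = nmf_activation(speech_basis, s1)                  # S109: NMF, speech basis only
    recon = matvec(speech_basis, a)
    g_nmf = [min(max(r, floor) / max(s, floor), 1.0)      # assumed ratio-type NMF gain
             for r, s in zip(recon, s1)]
    xi2 = [g * x for g, x in zip(g_nmf, xi)]              # S111: hypothetical update rule
    g2 = wiener_gain(xi2)
    s2 = apply_gain(g2, noisy_mag)                        # S113: second enhanced speech
    return s2, g1, g2


# Toy frame: the speech basis covers bins 0 and 2; bin 1 holds only noise.
enhanced, g1, g2 = enhance_frame(
    noisy_mag=[2.0, 1.0, 2.0],
    xi=[3.0, 0.5, 3.0],
    speech_basis=[[1.0], [0.0], [1.0]],
)
```

In this toy case the second gain stays close to the first gain in the bins that the speech basis can explain and drops toward zero in the noise-only bin, which is the intended effect of the second stage.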

The speech signal quality enhancement method according to an embodiment of the present invention can be implemented as a program. Specifically, it can be coded according to the algorithm described above. A program implementing the algorithm according to an embodiment of the present invention can be recorded on a storage medium.

The present invention has been described above with reference to specific matters such as particular components, limited embodiments, and drawings. These are provided only to aid overall understanding of the invention, and the invention is not limited to the above embodiments. A person of ordinary skill in the art to which the invention pertains can make various modifications and variations based on this description. Accordingly, the spirit of the present invention should not be confined to the described embodiments, and the claims below, together with everything equal or equivalent to those claims, fall within the scope of the spirit of the invention.

Claims

1. A method for enhancing speech signal quality, comprising:
obtaining a speech signal including noise;
obtaining an a priori signal-to-noise ratio (SNR) for the speech signal including noise;
obtaining a first Wiener filter gain using the a priori SNR;
obtaining a first speech signal by applying the first Wiener filter gain to the speech signal including noise;
obtaining a non-negative matrix factorization (NMF) gain by applying NMF to the first speech signal;
obtaining a second Wiener filter gain by updating the first Wiener filter gain using the NMF gain and the a priori SNR; and
obtaining a second speech signal by applying the second Wiener filter gain to the speech signal including noise.

2. The method of claim 1,
wherein applying NMF to the first speech signal comprises
machine learning the first speech signal using only a speech database.

3. The method of claim 2,
wherein the machine learning uses Euclidean-distance-based non-negative matrix factorization (UCL-NMF).

4. An apparatus for enhancing speech signal quality, comprising:
an a priori SNR estimator that estimates an a priori signal-to-noise ratio (SNR) for a speech signal including noise;
a Wiener filter gain generator that obtains a first Wiener filter gain using the a priori SNR; and
an NMF processor that obtains an NMF gain by applying NMF to a first speech signal, the first speech signal being the speech signal including noise to which the first Wiener filter gain has been applied,
wherein the Wiener filter gain generator obtains a second Wiener filter gain, used to obtain a second speech signal, by updating the first Wiener filter gain using the NMF gain and the a priori SNR.

5. The apparatus of claim 4,
wherein the NMF processor performs machine learning of the first speech signal using only a speech database.

6. The apparatus of claim 4,
wherein the NMF processor performs machine learning of the first speech signal using Euclidean-distance-based NMF.