KR100413797B1

KR100413797B1 - Speech signal compensation method and the apparatus thereof

Info

Publication number: KR100413797B1
Application number: KR10-2001-0051006A
Authority: KR
Inventors: 최창규; 김상룡; 최승호
Original assignee: 삼성전자주식회사
Priority date: 2001-08-23
Filing date: 2001-08-23
Publication date: 2003-12-31
Also published as: KR20030016925A

Abstract

According to the present invention, a method for removing noise from noisy speech comprises: preprocessing a time-domain noisy speech signal using independent component analysis (ICA) to obtain a noisy speech coefficient vector; compensating the noisy speech coefficient vector in real time to obtain a speech coefficient vector from which the noise has been removed; and postprocessing the resulting speech coefficient vector to convert it into a time-domain speech signal. The invention thus makes it possible to obtain, in real time, a speech signal free of the noise components introduced by the operating environment of a speech recognition system, such as distortion caused by differences between microphones, distortion caused by the speech transmission channel, and ambient noise in real environments.

Description

Speech signal compensation method and the apparatus

The present invention relates to a method and apparatus for compensating speech by removing noise from a noisy speech signal or a noisy speech feature vector.

The performance of a speech recognition system degrades when there is a mismatch between the clean speech used for training and the noisy input speech. The situation is worse in speech coding systems: the loss of quality is more severe in speech processed by the coding system than in the noisy input speech itself.

One way to address this problem is mean normalization. In this method, the mean of all feature vectors extracted from the entire utterance is computed, and the input speech feature vector is normalized by subtracting that mean from it using a predetermined function. Mean normalization adapts the input speech feature vectors dynamically, but it is not very accurate because a single mean is computed for all feature vectors of the utterance, and it has almost no effect when the additive noise component is relatively large.
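The mean normalization described above can be sketched as follows. This is a minimal illustration, not the prior-art implementation: the function name `mean_normalize` and the toy feature values are assumptions, and the "predetermined function" is taken to be a plain subtraction.

```python
def mean_normalize(features):
    """Subtract the utterance-level mean from every frame's feature vector.

    `features` is a list of frames; each frame is a list of floats.
    A single mean per dimension is computed over the whole utterance,
    which is exactly why the method cannot follow noise that changes in time.
    """
    n_frames = len(features)
    dim = len(features[0])
    mean = [sum(frame[d] for frame in features) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in features]


# Toy utterance: three frames of two-dimensional features (hypothetical values).
frames = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
normalized = mean_normalize(frames)
```

After normalization, every feature dimension has zero mean over the utterance.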

Another approach is signal-to-noise-ratio (SNR) dependent normalization. In this method, the instantaneous SNR of the input speech is computed, and the input speech feature vector is normalized by subtracting from it a correction vector that depends on the SNR. SNR-dependent normalization is more accurate than mean normalization because the correction vector varies with the SNR of the input speech, but it cannot update the values of the correction vectors dynamically.

The present invention aims to improve the quality of noisy speech by compensating it in real time using independent component analysis.

Brief description of the drawings

FIG. 1 is a block diagram of an example speech recognizer in which a speech enhancement unit according to the present invention is applied to the raw speech signal.

FIG. 2 is a block diagram of an example speech recognizer in which a speech enhancement unit according to the present invention is placed after the feature extraction unit.

FIG. 3 is a block diagram showing the speech enhancement unit of FIG. 1 in more detail.

FIG. 4 is a block diagram showing the speech enhancement unit of FIG. 2 in more detail.

FIG. 5 is a flowchart of the speech enhancement method according to the present invention.

* Description of the main parts of the drawings *

120: speech enhancement unit; 130: feature extraction unit

140: speech recognition unit; 320: preprocessing unit

330: speech compensation unit; 340: postprocessing unit

To solve the above problem, one aspect of the present invention is a method of extracting a compensated speech feature vector from noisy speech, comprising: extracting a noisy speech feature vector from the noisy speech; preprocessing the noisy speech feature vector using independent component analysis to obtain a noisy speech coefficient vector; compensating, in real time, the noisy speech coefficient vector obtained by the preprocessing; and postprocessing the resulting speech coefficient vector to convert it into a speech feature vector.

Another aspect of the present invention is a method of extracting a compensated speech feature vector from noisy speech, comprising: a first step of extracting a noisy speech feature vector from the noisy speech; a second step of preprocessing the noisy speech feature vector using independent component analysis to convert it into a noisy speech coefficient vector; a third step of initializing, during a predetermined number of initial frames, the parameters of the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector that are needed for speech compensation; a fourth step of computing the parameters of the noisy speech coefficient vector obtained in the second step; a fifth step of computing the probability that the current frame consists only of noise; a sixth step of updating the parameters of the noise coefficient vector if the current frame is judged to be noise; a seventh step of predicting the parameters of the speech coefficient vector for the current frame; an eighth step of computing, from the noisy speech coefficient vector, a noise-free speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters; a ninth step of postprocessing the noise-free speech coefficient vector to convert it into a speech feature vector; a tenth step of updating the parameters of the speech coefficient vector; and an eleventh step of repeating the second through tenth steps until the last frame.

Preferably, in the second step the noisy speech feature vector undergoes segmentation for overlap-add, windowing, and an ICA basis function transform (ICA-BFT).

Also preferably, the fifth step computes the probability that the current frame consists only of noise by exploiting the mutual independence of the ICA basis coefficients.

Yet another aspect of the present invention is an apparatus for extracting a noise-free speech feature vector from noisy speech, comprising: a feature extraction unit that extracts a noisy speech feature vector from the input noisy speech; a preprocessing unit that preprocesses the noisy speech feature vector output by the feature extraction unit using independent component analysis to obtain a noisy speech coefficient vector; a speech compensation unit that compensates, in real time, the noisy speech coefficient vector output by the preprocessing unit; and a postprocessing unit that postprocesses the speech coefficient vector output by the speech compensation unit to convert it into a speech feature vector.

Preferably, the speech compensation unit initializes the parameters of the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector that are needed for speech compensation; computes the parameters of the noisy speech coefficient vector; updates the parameters of the noise coefficient vector if the current frame consists only of noise; predicts the parameters of the speech coefficient vector for the current frame; and computes, from the noisy speech coefficient vector, a noise-free speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters.

Yet another aspect of the present invention is a method for removing noise from noisy speech, comprising: preprocessing a time-domain noisy speech signal using independent component analysis to obtain a noisy speech coefficient vector; compensating the noisy speech coefficient vector obtained by the preprocessing in real time to obtain a noise-free speech coefficient vector; and postprocessing the resulting speech coefficient vector to convert it into a time-domain speech signal.

Yet another aspect of the present invention is a method for removing noise from noisy speech, comprising: a first step of preprocessing the input time-domain noisy speech signal using independent component analysis to convert it into a noisy speech coefficient vector; a second step of initializing, during a predetermined number of initial frames, the parameters of the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector that are needed for speech compensation; a third step of computing the parameters of the noisy speech coefficient vector obtained in the first step; a fourth step of computing the probability that the current frame consists only of noise; a fifth step of updating the parameters of the noise coefficient vector if the current frame is judged to be noise; a sixth step of predicting the parameters of the speech coefficient vector for the current frame; a seventh step of computing, from the noisy speech coefficient vector, a noise-free speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters; an eighth step of postprocessing the noise-free speech coefficient vector to convert it into a time-domain speech signal; a ninth step of updating the parameters of the speech coefficient vector; and a tenth step of repeating the first through ninth steps until the last frame.

Preferably, in the first step the noisy speech signal undergoes segmentation for overlap-add, windowing, and an ICA basis function transform (ICA-BFT).

Also preferably, the fourth step computes the probability that the current frame consists only of noise by exploiting the mutual independence of the ICA basis coefficients.

Yet another aspect of the present invention is an apparatus for removing noise from noisy speech, comprising: a preprocessing unit that preprocesses the input time-domain noisy speech signal using independent component analysis to convert it into a noisy speech coefficient vector; a speech compensation unit that compensates the noisy speech coefficient vector output by the preprocessing unit in real time to obtain a noise-free speech coefficient vector; and a postprocessing unit that postprocesses the speech coefficient vector output by the speech compensation unit to convert it into a time-domain speech signal.

Preferably, the speech compensation unit initializes the parameters of the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector that are needed for speech compensation; computes the parameters of the noisy speech coefficient vector; updates the parameters of the noise coefficient vector if the current frame consists only of noise; predicts the parameters of the speech coefficient vector for the current frame; and computes, from the noisy speech coefficient vector, a noise-free speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters.

Yet another aspect of the present invention relates to a computer-readable recording medium on which is recorded a program for causing a computer to execute the above method of extracting a compensated speech feature vector from noisy speech and the above method of removing noise from noisy speech.

The present invention is described in detail below with reference to the accompanying drawings.

The speech enhancement method according to the present invention can take a raw time-domain noisy speech signal as input and extract a noise-free speech signal from it. It can equally take as input a noisy speech feature vector, obtained by feature extraction from a noisy speech signal, and extract a noise-free speech feature vector from it.

FIG. 1 shows a speech recognizer in which the speech enhancement unit according to the present invention is applied to the raw noisy speech signal, and FIG. 2 shows a speech recognizer in which the speech enhancement unit is applied to the noisy speech feature vector extracted from the noisy speech signal.

Although FIGS. 1 and 2 show speech recognizers incorporating the speech enhancement unit, the invention concerns a method of enhancing speech by removing noise from a noisy speech signal, and a method of enhancing speech by obtaining a noise-free speech feature vector from a noisy speech feature vector. Those skilled in the art will therefore appreciate that the speech enhancement method is not limited to speech recognizers and can be applied to any device that requires such speech enhancement.

The speech recognizer of FIG. 1 consists of a recognition stage 100 and a training stage 110. In FIG. 1, the speech enhancement method according to the present invention is carried out in the speech enhancement unit 120, which removes noise from the time-domain noisy speech signal. The other parts of FIG. 1, namely the training stage 110, which produces a grammar, a dictionary, and pronunciation models by training on a clean speech database, the feature extraction unit 130, and the speech recognition unit 140, use conventional methods. In the recognition stage 100, the raw noisy speech is input to the speech enhancement unit 120, which outputs noise-free speech to the feature extraction unit 130. The feature extraction unit 130 extracts speech feature vectors from the noise-free speech and passes them to the speech recognition unit 140, which performs recognition using the input speech feature vectors and the speech models obtained in the training stage, and outputs the best-matching sentence. The speech enhancement method carried out in the speech enhancement unit 120 is described in detail below.

The speech recognizer of FIG. 2 has the same components as the speech recognizer of FIG. 1, except that speech enhancement takes place after the feature extraction unit 220, which extracts feature vectors from the speech signal. That is, in the speech recognizer of FIG. 2, the feature extraction unit 220 first extracts feature vectors from the raw noisy speech, and these noisy speech feature vectors are then input to the speech enhancement unit 230 to obtain noise-free speech feature vectors. The noise-free speech feature vectors are likewise input to the speech recognition unit 240, where speech recognition is performed.

The speech enhancement apparatus of FIG. 3 is a block diagram showing in detail only the parts of the speech recognizer of FIG. 1 that relate to the present invention. It consists of a speech enhancement stage 300 and a training stage 310. The speech enhancement stage 300 includes a preprocessing unit 320, a speech compensation unit 330, a postprocessing unit 340, and a feature extraction unit 350. The training stage 310 includes a clean speech database 315, a preprocessing unit 325, an ICA unit 335, a PDF estimation unit 345, a storage unit 355 for the speech basis matrix A, and a storage unit 365 for the clean speech feature vector parameters.

In the training stage 310, the preprocessing unit 325 takes noise-free clean speech from the clean speech database 315 and extracts clean speech training vectors by segmentation for overlap-add and windowing. The ICA unit 335 applies independent component analysis to these clean speech training vectors to generate ICA basis coefficients. The PDF estimation unit 345 uses these basis coefficients to generate the parameters of the clean speech feature vector and stores them in the storage unit 365. The speech basis matrix A produced by the ICA unit 335 is stored in the storage unit 355.

In the speech enhancement stage 300, the actual enhancement is performed by the preprocessing unit 320, the speech compensation unit 330, and the postprocessing unit 340. The preprocessing unit 320 preprocesses the raw time-domain noisy speech signal using the inverse of the speech basis matrix A, and the preprocessed signal is fed to the speech compensation unit 330. The speech compensation unit 330 performs environment compensation on the preprocessed signal using the parameters of the clean speech feature vector, and the compensated signal is fed to the postprocessing unit 340. The postprocessing unit 340 essentially performs the inverse of the preprocessing, using the speech basis matrix A. The noise-free time-domain speech signal output by the postprocessing unit 340 is fed to the feature extraction unit 350 for feature vector extraction.
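The relationship between preprocessing (inverse of A) and postprocessing (A itself) can be illustrated with a toy example. This is only a sketch under stated assumptions: the 2x2 matrix `A`, the helper names `inverse_2x2` and `matvec`, and the frame values are hypothetical, and the compensation step between the two transforms is left out.

```python
def inverse_2x2(a):
    """Invert a 2x2 matrix given as a list of two rows."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("basis matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Hypothetical basis matrix; its columns play the role of basis functions.
A = [[2.0, 1.0], [1.0, 1.0]]
A_inv = inverse_2x2(A)

frame = [3.0, 2.0]                 # one tiny "frame" of the input signal
coeffs = matvec(A_inv, frame)      # preprocessing: signal -> coefficients
restored = matvec(A, coeffs)       # postprocessing: coefficients -> signal
```

With no compensation applied, the postprocessing step returns the original frame, which shows that any change in the output comes from the compensation stage alone.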

The speech enhancement apparatus of FIG. 4 is a block diagram showing in detail only the parts of the speech recognizer of FIG. 2 that relate to the present invention. It has the same components as the apparatus of FIG. 3. The difference is that, in the speech enhancement stage 400, the speech enhancement method according to the present invention, that is, the operation of the preprocessing unit, the speech compensation unit, and the postprocessing unit, is applied to the noisy speech feature vector extracted from the raw noisy speech signal.

The operation of the preprocessing unit, the speech compensation unit, and the postprocessing unit in the speech enhancement stage of FIG. 4 is now described in detail with reference to FIG. 5. Because these units operate identically in FIGS. 3 and 4, the description uses the example of FIG. 4, in which the input signal is a noisy speech feature vector.

In the preprocessing step 500, the noisy speech feature vector y(n) is taken as input and subjected, in order, to segmentation, windowing, and the ICA basis function transform (ICA-BFT). Letting y_m(n) denote the m-th frame signal segmented for overlap-add, the windowing that prevents distortion of the frequency components is expressed as follows.

Here, D is the overlap size, L is the frame shift, and M = D + L is the frame size, which is also the ICA-BFT size. The windowed signal then undergoes the following ICA-BFT.
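The segmentation implied by these definitions (frame size M = D + L, advancing by L samples per frame) can be sketched as below. The function name `segment` and the toy signal are assumptions; the window itself is deliberately omitted because its formula is not available in this text.

```python
def segment(signal, overlap, shift):
    """Split `signal` into frames of size overlap + shift, advancing by `shift`.

    Consecutive frames share `overlap` samples, which is what a later
    overlap-add reconstruction relies on. Trailing samples that do not
    fill a whole frame are dropped in this sketch.
    """
    frame_size = overlap + shift
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += shift
    return frames


D, L = 2, 4                      # overlap size and frame shift (toy values)
M = D + L                        # frame size, also the transform size
frames = segment(list(range(14)), D, L)
```

Each frame here has M = 6 samples, and the last D samples of one frame are the first D samples of the next.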

Here, the matrix A_o is the orthogonalized version of the matrix A and is obtained as follows.

Each column of the matrix A is an ICA basis function, and the matrix can be obtained in various ways. The preprocessing step thus yields the M-dimensional noisy speech coefficient vector Y(m).
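A transform onto an orthogonalized basis can be sketched as follows. The orthogonalization formula used in the patent is not available in this text, so this example substitutes Gram-Schmidt orthonormalization of the columns of A as a stand-in; the matrix values and the names `gram_schmidt`, `forward`, and `inverse` are likewise hypothetical.

```python
import math


def gram_schmidt(a):
    """Orthonormalize the columns of `a` (a list of rows); returns a list of rows."""
    ortho_cols = []
    for col in zip(*a):
        v = list(col)
        for u in ortho_cols:
            proj = sum(x * y for x, y in zip(v, u))
            v = [vi - proj * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        ortho_cols.append([x / norm for x in v])
    return [list(row) for row in zip(*ortho_cols)]


def forward(a_o, frame):
    """Analysis: coefficients are the projections onto each orthonormal column."""
    return [sum(c * x for c, x in zip(col, frame)) for col in zip(*a_o)]


def inverse(a_o, coeffs):
    """Synthesis: rebuild the frame as a weighted sum of the columns."""
    return [sum(r * c for r, c in zip(row, coeffs)) for row in a_o]


A = [[3.0, 1.0], [4.0, 2.0]]     # hypothetical basis matrix
A_o = gram_schmidt(A)
Y = forward(A_o, [1.0, 2.0])     # coefficient vector for one frame
y_back = inverse(A_o, Y)
```

Because the columns of the orthogonalized matrix are orthonormal, its transpose acts as its inverse, so the forward and inverse transforms need no explicit matrix inversion.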

The speech compensation steps (502-518, 522-526) consist of: initializing the parameters needed for compensation during the first INIT-FRAMES frames (502, 504, 506); computing the parameters of the noisy speech coefficient vector Y(m) (508); computing the probability that the current frame consists only of noise (510); updating the parameters of the noise coefficient vector N(m) if the current frame is judged to be noise (512, 514); predicting the parameters of the speech coefficient vector for the current frame (516); computing the noise-free speech coefficient vector S(m) from Y(m) (518); and updating the parameters of the speech coefficient vector S(m) and repeating the whole procedure until the last frame (522-526).

Because the ICA-BFT is a linear transform, the k-th component Y_k(m) of the noisy speech coefficient vector obtained in the preprocessing step is the sum of the k-th component S_k(m) of the speech coefficient vector and the k-th component N_k(m) of the noise coefficient vector, that is, Y_k(m) = S_k(m) + N_k(m).

Here, Y_k(m) and N_k(m) follow a generalized Gaussian distribution and a Gaussian distribution, respectively, as given below.

where [the accompanying definition is missing from the source text].

Next, in steps 502-506, the parameters needed for speech compensation are initialized during the first INIT-FRAMES frames. The parameters needed for compensation are defined as follows.

For the first frame, that is, m = 0, the parameters are initialized as follows.

For the subsequent frames with m < INIT-FRAMES, they are updated as follows.

Here, the two constants in the update (their symbols are missing from the source text) may be set by the user.

Because the first INIT-FRAMES frames are assumed to contain only noise, the speech coefficient vector during this period is obtained as follows.

Here, GAIN_MIN is a constant giving the minimum gain, and it is set to 0.2238, as in IS-127.
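For orientation, the value 0.2238 corresponds to an attenuation of about 13 dB on an amplitude scale. The sketch below computes that figure and shows one plausible use of the floor; applying GAIN_MIN directly to every coefficient of a noise-only frame is an assumption made here for illustration, since the equation itself is not available in this text, and the function name `attenuate` is hypothetical.

```python
import math

GAIN_MIN = 0.2238  # minimum gain constant quoted in the text

# Amplitude gain expressed in decibels: 20 * log10(gain).
gain_db = 20.0 * math.log10(GAIN_MIN)


def attenuate(coeffs, gain=GAIN_MIN):
    """Scale every coefficient of a frame by a fixed gain (assumed usage)."""
    return [gain * y for y in coeffs]


# Hypothetical coefficient vector of a noise-only initial frame.
floored = attenuate([1.0, -2.0])
```

A nonzero floor of this kind leaves a small residual of the input rather than forcing the output to silence.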

Step 508 computes the parameters of the noisy speech coefficient vector Y(m). Taking the correlation between frames into account, they are updated as follows.
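One common way to let a per-coefficient statistic depend on previous frames is a first-order recursive average. The sketch below uses that form purely as an assumption, since the update equations are not available in this text; the forgetting factor, the choice of the second moment as the tracked statistic, and the name `track_second_moment` are all hypothetical.

```python
def track_second_moment(coeff_frames, forgetting=0.9):
    """Recursively average the squared coefficients, one estimate per basis index.

    estimate(m) = forgetting * estimate(m - 1) + (1 - forgetting) * Y_k(m)^2
    The first frame initializes the estimate directly.
    Returns the list of estimates after each frame.
    """
    history = []
    estimate = None
    for frame in coeff_frames:
        squared = [y * y for y in frame]
        if estimate is None:
            estimate = squared
        else:
            estimate = [forgetting * e + (1.0 - forgetting) * s
                        for e, s in zip(estimate, squared)]
        history.append(list(estimate))
    return history


# Two frames with a single coefficient each (toy values).
history = track_second_moment([[2.0], [4.0]], forgetting=0.5)
```

A forgetting factor close to 1 gives a smooth, slowly varying estimate, while a smaller one follows frame-to-frame changes more quickly.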

From the values obtained with the two equations above, the exponent of the generalized Gaussian distribution is computed.

where [the accompanying definition is missing from the source text].

Step 510 computes the probability that the current frame consists only of noise. This probability is called the speech absence probability, and its computation is based on the following two global hypotheses.

However, whether speech is absent differs from one basis function to another, so there are in addition two local hypotheses for each component.

The global speech absence probability p(H_0 | Y(m)) is then computed by the following equation.

Here, [a symbol definition is missing from the source text], and the remaining symbol denotes the likelihood ratio.

The following equations were used to compute the global speech absence probability p(H_0 | Y(m)).

This computation is possible because the coefficients of the ICA basis are mutually independent.
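The role of independence can be illustrated with a simplified Bayes computation in which the joint likelihood ratio factorizes into a product of per-coefficient ratios. This is a sketch under strong assumptions, not the patent's equations: it uses only two global hypotheses, zero-mean Gaussian likelihoods under both, and hypothetical names (`log_gaussian`, `speech_absence_probability`) and variances.

```python
import math


def log_gaussian(x, var):
    """Log density of a zero-mean Gaussian with variance `var` at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def speech_absence_probability(coeffs, noise_var, speech_var, prior_absence=0.5):
    """p(H0 | Y) = 1 / (1 + (p(H1) / p(H0)) * prod_k ratio_k).

    ratio_k = p(Y_k | H1) / p(Y_k | H0). Independence of the coefficients
    lets the joint ratio be written as a product, i.e. a sum of logs.
    """
    log_ratio = 0.0
    for y, nv, sv in zip(coeffs, noise_var, speech_var):
        log_ratio += log_gaussian(y, nv + sv) - log_gaussian(y, nv)
    log_odds = math.log((1.0 - prior_absence) / prior_absence) + log_ratio
    if log_odds > 700.0:          # avoid overflow in exp for very strong speech
        return 0.0
    return 1.0 / (1.0 + math.exp(log_odds))


noise_var = [1.0, 1.0]
speech_var = [9.0, 9.0]
p_quiet = speech_absence_probability([0.1, -0.2], noise_var, speech_var)
p_loud = speech_absence_probability([10.0, -8.0], noise_var, speech_var)
```

Small coefficients are better explained by noise alone and give a probability above the prior, while large coefficients drive it toward zero.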

Steps 512-514 update the parameters for the noise coefficient vector N(m) when the current frame is judged to be noise. If the global speech absence probability computed in step 510 exceeds a user-defined threshold, the parameters for the noise coefficient vector are updated as follows.

The update is performed for all M bases. If the global speech absence probability is below the threshold, no update is made and the existing values are retained. By deciding for every frame whether it consists of noise and updating the noise coefficient vector parameters accordingly, a speech feature vector from which environment-induced noise components have been removed can be obtained in real time.
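The gating logic of steps 512-514 can be sketched as follows. Only the threshold test and the "keep the old values otherwise" behaviour come from the text; the choice of a per-basis variance as the noise parameter, the first-order recursive smoothing, and the values of `threshold` and `alpha` are assumptions for illustration.

```python
import numpy as np


def update_noise_params(noise_var, Y_m, p_absence, threshold=0.9, alpha=0.95):
    """Update per-basis noise parameters only for frames judged to be noise.

    noise_var : current noise parameter for each of the M bases (assumed: variance)
    Y_m       : noisy speech coefficient vector of the current frame
    p_absence : global speech absence probability of the current frame
    """
    noise_var = np.asarray(noise_var, dtype=float)
    if p_absence > threshold:
        # Assumed update rule: recursive smoothing toward the new observation
        return alpha * noise_var + (1.0 - alpha) * np.asarray(Y_m, dtype=float) ** 2
    # Below the threshold the existing values are kept unchanged
    return noise_var.copy()


updated = update_noise_params([1.0, 1.0], [2.0, 0.0], p_absence=0.95)
kept = update_noise_params([1.0, 1.0], [2.0, 0.0], p_absence=0.50)
```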

Step 516 predicts the parameters for the speech coefficient vector of the current frame. This computation is carried out regardless of the global speech absence probability.

Step 518 computes the noise-removed speech coefficient vector S(m) from Y(m). Using the parameters obtained in steps 514 and 516, there are two models for computing the speech coefficient S_k(m) for the k-th basis of the m-th frame. If the following condition is satisfied,

the speech coefficient is obtained with the following equation.


Otherwise, the speech coefficient is obtained with the following equation.


Obtaining the speech coefficient S_k(m) from the above equations requires p(S_k(m) = 0). Because S_k(m) also follows a generalized Gaussian distribution, it can be expressed as follows.

Therefore, p(S_k(m) = 0) is obtained as follows.


The two parameters involved can be computed offline from the database that was used to obtain the ICA basis functions, and their values stored. In that case, v_s(k,m) need not be computed for every frame; the stored value can be used instead.
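As an illustration of evaluating such a density at zero, the sketch below uses the textbook parametrization of the generalized Gaussian, p(x) = β / (2αΓ(1/β)) · exp(-(|x - μ| / α)^β). The patent's own parametrization is not reproduced in this text, so the symbols μ, α, and β and the reading of p(S_k(m) = 0) as a density value at zero are assumptions.

```python
import math


def generalized_gaussian_pdf(x: float, mu: float, alpha: float, beta: float) -> float:
    """Density of a generalized Gaussian with location mu, scale alpha, exponent beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def density_at_zero(mu: float, alpha: float, beta: float) -> float:
    """Value of the density at x = 0, used here as a stand-in for p(S_k(m) = 0)."""
    return generalized_gaussian_pdf(0.0, mu, alpha, beta)


# beta = 2 with alpha = sqrt(2) reduces to the standard normal density
p0_gauss = density_at_zero(0.0, math.sqrt(2.0), 2.0)
# beta = 1 with alpha = 1 reduces to the unit Laplace density
p0_laplace = density_at_zero(0.0, 1.0, 1.0)
```

If the exponent and scale are precomputed offline, as the text allows, the leading coefficient can be cached and only the exponential term needs evaluating per frame.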

In every frame, the magnitude of the speech coefficient S_k(m) is not allowed to fall below GAIN_MIN · Y_k(m). The speech coefficient S_k(m) obtained in step 518 is therefore passed through one further operation, as follows.
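A minimal sketch of this gain floor is given below. The text fixes only the lower bound on the magnitude; the function name and the choice to give a floored coefficient the sign of the noisy coefficient Y_k(m) are assumptions.

```python
import numpy as np

GAIN_MIN = 0.2238  # minimum gain quoted in the text


def apply_gain_floor(S_m, Y_m, gain_min=GAIN_MIN):
    """Ensure |S_k(m)| >= gain_min * |Y_k(m)| for every basis k."""
    S_m = np.asarray(S_m, dtype=float)
    Y_m = np.asarray(Y_m, dtype=float)
    floor = gain_min * np.abs(Y_m)
    too_small = np.abs(S_m) < floor
    # Assumption: a floored coefficient takes the sign of the noisy coefficient
    return np.where(too_small, gain_min * Y_m, S_m)


floored = apply_gain_floor([0.1, -1.0, 0.0], [1.0, -2.0, -1.0])
```

Coefficients that already exceed the floor pass through unchanged, so the operation only limits how strongly any single basis can be attenuated.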

Steps 522-526 update the parameters for the speech coefficient vector S(m) and repeat the whole procedure up to the last frame. If step 522 determines that the current frame is the last one, execution ends; otherwise, the parameters for the noise coefficient vector are retained for use in the next frame, as follows.

The parameters for the speech coefficient vector are updated using the values obtained in step 518, as follows.

Once the parameters for the next frame have been retained and updated in this way, the frame index is incremented by one and the whole procedure is repeated for every frame.

The output of the compensation process described above is the noise-removed speech coefficient vector S(m), which the post-processing step converts back into the speech feature vector S_m(n). Post-processing consists of the inverse ICA-BFT followed by an overlap-add operation, and is computed as follows.
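The two post-processing operations can be sketched as below. The post-processing equation is not reproduced in this text, so the sketch assumes that the forward ICA-BFT is a multiplication of each windowed frame by a square, invertible matrix W, and that frames advance by a fixed hop; the names `W`, `hop`, and both functions are hypothetical.

```python
import numpy as np


def inverse_ica_bft(S: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map coefficient vectors (rows of S) back to windowed frames.

    Assumes the forward transform was y = W @ frame, so frame = W^{-1} @ y.
    """
    return np.linalg.solve(W, S.T).T


def overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Sum frames into one sequence, shifting each frame by `hop` samples."""
    n_frames, frame_len = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for m, frame in enumerate(frames):
        out[m * hop : m * hop + frame_len] += frame
    return out


# Two frames of length 4 with 50 % overlap and a toy transform W = 2 * I
W = 2.0 * np.eye(4)
frames_in = np.ones((2, 4))
S = frames_in @ W.T                    # forward transform of each frame
frames_out = inverse_ica_bft(S, W)     # recovers the original frames
signal = overlap_add(frames_out, hop=2)
```

Solving the linear system instead of forming the inverse of W explicitly is a numerical preference; a precomputed inverse basis matrix would serve equally well when W is fixed.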

The speech enhancement method of the present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed fashion.

According to the present invention as described above, it is possible to obtain in real time a speech feature vector or speech signal from which the noise components introduced by the operating environment of a speech recognition system have been removed, including distortion due to differences between the microphones that pick up the speech signal, distortion due to the speech transmission channel, and ambient noise present in real environments.

Furthermore, because the speech enhancement method according to the present invention can be applied both to speech feature vectors and to the raw noisy speech signal, it is useful not only in speech recognition but also when a speech signal is denoised while being recorded, or when a recorded speech signal is denoised for output.

Claims

A method of extracting a compensated speech feature vector from noisy speech, comprising:

Extracting a noisy speech feature vector from the noisy speech;

Preprocessing the noisy speech feature vector using independent component analysis to obtain a noisy speech coefficient vector from the noisy speech feature vector;

Compensating, in real time, the noisy speech coefficient vector obtained by the preprocessing; and

Postprocessing the speech coefficient vector obtained as a result of the compensation to convert it into a speech feature vector.

A method of extracting a compensated speech feature vector from noisy speech, comprising:

A first step of extracting a noisy speech feature vector from the noisy speech;

A second step of preprocessing the noisy speech feature vector to convert it into a noisy speech coefficient vector using independent component analysis;

A third step of initializing, during a predetermined number of initial frames, the parameters for the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector required for speech compensation;

A fourth step of calculating a parameter for the noisy speech coefficient vector obtained in the second step;

A fifth step of calculating a probability that the current frame consists only of noise;

A sixth step of updating a parameter for the noise coefficient vector when the current frame is determined to be noise;

A seventh step of predicting a parameter for the speech coefficient vector of the current frame;

An eighth step of calculating, from the noisy speech coefficient vector, a speech coefficient vector from which the noise is removed, using the noise coefficient vector parameter and the speech coefficient vector parameter;

A ninth step of post-processing the noise-removed speech coefficient vector to convert it into a speech feature vector;

A tenth step of updating a parameter for the speech coefficient vector; and

An eleventh step of repeating the second to tenth steps up to the last frame.

The method of claim 2,

Wherein the second step comprises segmentation for overlap-add, windowing, and an ICA basis function transform (ICA-BFT) of the noisy speech feature vector.

The method of claim 2,

Wherein the fifth step calculates the probability that the current frame consists only of noise by using the property that the ICA basis coefficients are independent of one another.

The method of claim 4,

Wherein the probability, calculated in the fifth step, that the current frame consists only of noise is obtained by an equation whose expression and symbol definitions are not preserved in this text.

A method of using, for speech recognition, speech feature vectors extracted by the speech feature vector extraction method of claim 2.

An apparatus for extracting a noise-removed speech feature vector from noisy speech, comprising:

A feature extractor for extracting a noisy speech feature vector from the input noisy speech;

A preprocessor for preprocessing the noisy speech feature vector using independent component analysis to obtain a noisy speech coefficient vector from the noisy speech feature vector output from the feature extractor;

A speech compensator for compensating, in real time, the noisy speech coefficient vector output from the preprocessor; and

A postprocessor for post-processing the speech coefficient vector output from the speech compensator to convert it into a speech feature vector.

The apparatus of claim 7,

Wherein the preprocessor preprocesses the noisy speech feature vector by segmentation for overlap-add, windowing, and an independent component analysis basis function transform (ICA-BFT).

The apparatus of claim 8,

Wherein the speech compensator initializes the parameters for the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector required for speech compensation; calculates the parameters for the noisy speech coefficient vector; updates the parameters for the noise coefficient vector if the current frame consists only of noise; predicts the parameters for the speech coefficient vector of the current frame; and calculates, from the noisy speech coefficient vector, a noise-removed speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters.

A speech recognition device comprising the speech feature vector extracting device according to claim 9.

A method for removing noise from noisy speech, comprising:

Preprocessing a noisy speech signal in the time domain using independent component analysis to obtain a noisy speech coefficient vector from the noisy speech signal;

Compensating the noisy speech coefficient vector in real time to obtain, from the noisy speech coefficient vector produced by the preprocessing, a speech coefficient vector from which noise is removed; and

Post-processing the speech coefficient vector obtained as a result of the compensation to convert it into a speech signal in the time domain.

A method for removing noise from noisy speech, comprising:

A first step of preprocessing a noisy speech signal in the time domain to convert it into a noisy speech coefficient vector using independent component analysis;

A second step of initializing, during a predetermined number of initial frames, the parameters for the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector required for speech compensation;

A third step of calculating a parameter for the noisy speech coefficient vector obtained in the first step;

A fourth step of calculating a probability that the current frame consists only of noise;

A fifth step of updating a parameter for the noise coefficient vector when the current frame is determined to be noise;

A sixth step of predicting a parameter for the speech coefficient vector of the current frame;

A seventh step of calculating, from the noisy speech coefficient vector, a speech coefficient vector from which the noise is removed, using the noise coefficient vector parameter and the speech coefficient vector parameter;

An eighth step of post-processing the noise-removed speech coefficient vector to convert it into a speech signal in the time domain;

A ninth step of updating a parameter for the speech coefficient vector; and

A tenth step of repeating the first to ninth steps up to the last frame.

The method of claim 12,

Wherein the first step comprises segmentation for overlap-add, windowing, and an ICA basis function transform (ICA-BFT) of the noisy speech signal.

The method of claim 12,

Wherein the fourth step calculates the probability that the current frame consists only of noise by using the property that the independent component analysis basis coefficients are independent of one another.

The method of claim 14,

Wherein the probability, calculated in the fourth step, that the current frame consists only of noise is obtained by an equation whose expression and symbol definitions are not preserved in this text.

A method of using, for speech recognition, a speech signal from which noise has been removed by the method for removing noise from noisy speech of claim 12.

A device for removing noise from noisy speech, comprising:

A preprocessor for preprocessing an input noisy speech signal using independent component analysis to convert it into a noisy speech coefficient vector;

A speech compensator for compensating the noisy speech coefficient vector in real time to obtain, from the noisy speech coefficient vector output from the preprocessor, a speech coefficient vector from which noise is removed; and

A postprocessor for post-processing the speech coefficient vector output from the speech compensator to convert it into a speech signal in the time domain.

The device of claim 17,

Wherein the preprocessor preprocesses the noisy speech signal in the time domain by segmentation for overlap-add, windowing, and an independent component analysis basis function transform (ICA-BFT).

The device of claim 18,

Wherein the speech compensator initializes the parameters for the noisy speech coefficient vector, the speech coefficient vector, and the noise coefficient vector required for speech compensation; calculates the parameters for the noisy speech coefficient vector; updates the parameters for the noise coefficient vector if the current frame consists only of noise; predicts the parameters for the speech coefficient vector of the current frame; and calculates, from the noisy speech coefficient vector, a noise-removed speech coefficient vector using the noise coefficient vector parameters and the speech coefficient vector parameters.

A speech recognition device comprising the noise canceling device according to claim 19.

A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method of extracting a compensated speech feature vector from noisy speech according to claim 2.

A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method for removing noise from noisy speech according to claim 12.