KR101290997B1

KR101290997B1 - A codebook-based speech enhancement method using adaptive codevector and apparatus thereof

Info

Publication number: KR101290997B1
Application number: KR1020120087259A
Authority: KR
Inventors: Kim Moo-young (김무영); Hwang In-woo (황인우)
Original assignee: Sejong University Industry-Academia Cooperation Foundation (세종대학교산학협력단)
Priority date: 2012-03-26
Filing date: 2012-08-09
Publication date: 2013-07-30

Abstract

PURPOSE: A codebook-based speech enhancement method and apparatus using adaptive codevectors are provided. A super codebook is generated by combining an existing noise codebook with adaptive codevectors obtained through long-term noise estimation. Performance is therefore guaranteed for noise types outside the trained codebooks as well as for those inside them. CONSTITUTION: A speech enhancement apparatus estimates the noise signal contained in an input speech signal using a long-term noise estimation algorithm and calculates the linear prediction parameters of the estimated noise signal (S360, S380). The apparatus generates a super codebook by storing N adaptive codevectors in a selected noise codebook (S400, S410). It then outputs a compensated speech signal (S420). This output uses the spectral envelope of the input speech signal together with integrated codevectors, each of which combines a noise codevector from the super codebook with a clean codevector from a clean codebook. [Reference numerals] (AA) Start; (BB) End; (S310) Input speech signal y(n); (S320) Generate the speech power spectrum |Y(k)|² by Fourier transform; (S330, S370) Inverse Fourier transform; (S340) Extract the linear prediction coefficients a_y of the input speech signal y(n); (S350) Obtain the spectral envelope of the input speech signal y(n); (S360) Estimate the noise signal d(n); (S380) Extract the linear prediction coefficients a_d of the estimated noise signal d(n); (S390) Obtain the spectral envelope of the estimated noise signal d(n); (S400) Select the noise codebook with the most similar noise codevectors; (S410) Generate a super codebook by storing N adaptive codevectors in the selected noise codebook; (S420) Output the compensated speech signal

Description

Codebook-based speech enhancement method and apparatus therefor using adaptive codevector

The present invention relates to a codebook-based speech enhancement method and apparatus using adaptive codevectors. More specifically, it relates to a codebook-based speech enhancement method and apparatus that use adaptive codevectors to guarantee performance for both trained and untrained noise.

Speech enhancement is needed in many fields. These include communication and music information processing on mobile devices, as well as speech recognition for robot control. Noise removal in particular is central to improving speech enhancement performance.

Noise removal techniques include spectral subtraction (SS), which assumes that noise-contaminated speech is the sum of clean speech and background noise. They also include multi-band spectral subtraction (MBSS), which improves on SS by assuming that noise varies independently in each frequency band. Various noise estimation algorithms have also been studied, such as methods based on voice activity detection (VAD) and methods based on minimum statistics (MS).

Methods such as codebook-driven short-term predictor parameter estimation (CDSTP) use the linear prediction coefficients of speech and noise as a database. They remove noise and obtain enhanced speech through maximum likelihood (ML) or minimum mean square error (MMSE) estimation against the input speech signal. Because the codebooks hold many different linear prediction coefficients, this technique performs well in both stationary and non-stationary background noise.

As described above, existing codebook-based speech enhancement uses pre-trained linear prediction coefficients of speech and noise as a database and obtains enhanced speech through ML or MMSE estimation against the input signal. Because it relies only on the coefficients stored in its codebooks, it performs well when the noise is represented in the existing fixed noise codebooks. Its performance is not guaranteed, however, for noise that was not included in training.

An object of the present invention is to provide a codebook-based speech enhancement method and apparatus using an adaptive codebook that guarantees performance for both trained and untrained noise.

To achieve this object, a codebook-based speech enhancement method using adaptive codevectors according to an embodiment of the present invention includes the following steps:
- receiving, frame by frame, an input speech signal in which a clean speech signal and a noise signal are mixed;
- generating an input speech power spectrum by Fourier-transforming the input speech signal;
- obtaining the spectral envelope of the input speech signal;
- estimating the noise signal from the input speech power spectrum and obtaining the spectral envelope of the estimated noise signal;
- selecting, from among a plurality of pre-stored noise codebooks, the noise codebook containing the noise codevector most similar to the spectral envelope of the estimated noise signal;
- setting any N of the spectral envelopes of the estimated noise signal as adaptive codevectors, and storing them in the selected noise codebook to generate a super codebook;
- extracting a corrected speech signal using the spectral envelope of the input speech signal and integrated codevectors, each of which combines a noise codevector from the super codebook with a clean codevector from a clean codebook.

Obtaining the spectral envelope of the input speech signal may include three steps. First, the input speech power spectrum is inverse-Fourier-transformed to generate the autocorrelation of the input speech signal. Second, this autocorrelation is applied to the Levinson algorithm to extract the linear prediction coefficients of the input speech signal. Third, the spectral envelope of the input speech signal is obtained from the frequency response of those coefficients.
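The analysis chain in this step (autocorrelation, Levinson algorithm, envelope) can be sketched in plain Python as shown below. This is an illustrative sketch under stated assumptions, not the patent's implementation, and all function names are hypothetical.

- The autocorrelation is computed directly in the time domain. For a frame of length L, this gives the same lags as the inverse FFT of the power spectrum, provided the FFT is zero-padded to at least 2L-1 points.
- The prediction-error power is used as the envelope gain. This is an assumption: the text defines the envelope only as the squared magnitude of the LPC frequency response.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame.

    Computed directly in the time domain; the IFFT of the zero-padded
    power spectrum yields the same values.
    """
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err): a = [1, a(1), ..., a(P)] are the coefficients of
    A(z) = sum_p a(p) z^-p, and err is the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(a, gain, n_fft):
    """All-pole envelope gain / |A(e^{jw})|^2 on bins k = 0..n_fft//2."""
    env = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        A = sum(a_p * cmath.exp(-1j * w * p) for p, a_p in enumerate(a))
        env.append(gain / abs(A) ** 2)
    return env


# Example: frame -> autocorrelation -> LPC (order 2) -> envelope on an 8-point grid
frame = [1.0, 0.6, 0.2, -0.1, -0.3, -0.2, 0.0, 0.1]
a, err = levinson(autocorrelation(frame, 2), 2)
envelope = lpc_envelope(a, err, 8)
```

For example, `levinson([1.0, 0.5, 0.25], 2)` recovers the predictor [1, -0.5, 0] with error power 0.75. This is the autocorrelation of a first-order autoregressive process with coefficient 0.5.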

Estimating the noise signal and obtaining its spectral envelope may include the following steps:
- estimating the noise signal from the input speech power spectrum using a long-term noise estimation algorithm;
- selecting one noise codebook from among the plurality of pre-stored noise codebooks according to the noise type of the estimated noise signal;
- inverse-Fourier-transforming the noise power spectrum of the estimated noise signal to generate its autocorrelation;
- applying that autocorrelation to the Levinson algorithm to extract the linear prediction coefficients of the estimated noise signal;
- obtaining the spectral envelope of the estimated noise signal from the frequency response of those coefficients.

The noise codebook containing the noise codevector most similar to the spectral envelope of the estimated noise signal may be selected with an Itakura-Saito distortion (IS-D) algorithm. The corresponding equation is not preserved in this text. In that equation:
- n' is the selected noise codebook;
- the estimated noise signal is denoted by a symbol that was also not preserved;
- N_d is the number of noise codebooks in the codebook database;
- v_n is the number of codevectors in the n-th codebook;
- a_d^{n,j} is the linear prediction coefficient vector of the j-th codevector of the n-th noise codebook.
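Because the IS-D selection equation is not preserved, the sketch below shows one plausible reading of it. The function names are hypothetical, and the sketch rests on these assumptions:
- every codevector is stored as a sampled power-spectral envelope on the same frequency grid as the estimated noise envelope;
- the discrete Itakura-Saito distortion is averaged over bins;
- the chosen codebook is the one whose best-matching codevector has the lowest distortion, that is, an argmin over codebooks of a minimum over codevectors.

No gain normalization is applied, although a real system may need it.

```python
import math


def itakura_saito(p_ref, p_model):
    """Discrete Itakura-Saito distortion averaged over bins (both spectra > 0)."""
    total = 0.0
    for p, q in zip(p_ref, p_model):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_ref)


def select_noise_codebook(noise_env, codebooks):
    """Return (codebook_name, codevector_index, distortion) of the best match.

    `codebooks` maps a noise-type name to a list of envelope codevectors.
    The overall minimum over (codebook, codevector) pairs equals taking each
    codebook's best codevector and then the best codebook.
    """
    best = None
    for name, codevectors in codebooks.items():
        for j, cv in enumerate(codevectors):
            d = itakura_saito(noise_env, cv)
            if best is None or d < best[2]:
                best = (name, j, d)
    return best


codebooks = {
    "White": [[1.0, 1.0, 1.0, 1.0]],
    "Pink": [[4.0, 2.0, 1.0, 0.5], [8.0, 4.0, 2.0, 1.0]],
}
print(select_noise_codebook([4.1, 2.0, 1.0, 0.5], codebooks)[:2])  # ('Pink', 0)
```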

Generating the super codebook may include two steps. First, any N of the spectral envelopes of the estimated noise signal are set as adaptive codevectors and stored in the selected noise codebook. Second, the noise codevectors already stored in that codebook are merged with the N adaptive codevectors, which converts the selected noise codebook into the super codebook.
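A minimal sketch of the super-codebook construction follows. It assumes that codevectors are stored as lists of envelope samples. The text allows "any N" estimated envelopes; taking the N most recent is this sketch's own choice. `build_super_codebook` is a hypothetical name.

```python
def build_super_codebook(fixed_codevectors, estimated_envelopes, n_adaptive):
    """Merge the selected fixed noise codebook with N adaptive codevectors.

    Assumption: the N most recent estimated noise envelopes are used.
    All codevectors are copied so the stored fixed codebook is not modified.
    """
    adaptive = [list(env) for env in estimated_envelopes[-n_adaptive:]] if n_adaptive > 0 else []
    return [list(cv) for cv in fixed_codevectors] + adaptive


fixed = [[1.0, 1.0]] * 8                              # e.g. 8 fixed codevectors
envs = [[float(k), 1.0] for k in range(5)]            # envelopes from 5 frames
super_cb = build_super_codebook(fixed, envs, 3)       # 8 fixed + 3 adaptive
```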

Extracting the corrected speech signal may include the following steps:
- combining the noise codevectors in the super codebook with the clean codevectors in the clean codebook to generate integrated codevectors;
- applying a clean speech gain and an estimated noise gain to each integrated codevector to generate gain-combined values;
- computing the similarity between the spectral envelope of the input speech signal and each gain-combined value;
- applying the similarities and gain-combined values to maximum likelihood (ML) or Bayesian (MMSE) estimation to obtain the spectral envelopes of the clean speech signal and of the noise signal.
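The joint search over integrated codevectors can be illustrated with the ML sketch below. It is a sketch under these assumptions:
- the clean and noise codevectors are unit-gain envelopes on a common frequency grid;
- the gains g_x and g_d are supplied by the caller, because the text obtains them with an SAP algorithm that is not reproduced here;
- the similarity measure is the Gaussian log-likelihood of the observed spectrum under the modeled spectrum g_x·clean + g_d·noise. This ranks candidates exactly as the Itakura-Saito distortion does.

An MMSE variant would instead average the candidate envelopes, weighted by their likelihoods. Function names are hypothetical.

```python
import math


def log_likelihood(p_y, model):
    """Gaussian log-likelihood of spectrum p_y under a model power spectrum,
    up to an additive constant (minus the IS distortion plus a constant)."""
    return -sum(py / m + math.log(m) for py, m in zip(p_y, model))


def ml_search(p_y, clean_codebook, noise_codebook, g_x, g_d):
    """Exhaustive ML search over all (clean, noise) codevector pairs.

    noise_codebook would be the super codebook. Returns the gain-scaled
    speech and noise envelopes of the best-scoring pair.
    """
    best = None
    for i, cx in enumerate(clean_codebook):
        for j, cd in enumerate(noise_codebook):
            model = [g_x * u + g_d * v for u, v in zip(cx, cd)]
            ll = log_likelihood(p_y, model)
            if best is None or ll > best[0]:
                best = (ll, i, j)
    _, i, j = best
    speech_env = [g_x * u for u in clean_codebook[i]]
    noise_env = [g_d * v for v in noise_codebook[j]]
    return speech_env, noise_env
```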

Generating the corrected speech signal may further include two steps. First, a Wiener filter generates a corrected speech power spectrum from the spectral envelopes of the clean speech signal and the noise signal. Second, the corrected speech power spectrum is inverse-Fourier-transformed to extract the corrected speech signal.
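A Wiener-filter sketch for a single frame follows. It makes two assumptions that the text does not spell out:
- the filter is applied to the complex noisy spectrum, so the noisy phase is reused;
- the envelopes cover all n DFT bins and are symmetric, so the output is real.

A naive O(n²) DFT keeps the example dependency-free. A real system would use an FFT with windowing and overlap-add. Function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT; the imaginary part is discarded (real signal assumed)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def wiener_enhance(frame, speech_env, noise_env):
    """Apply H(k) = S(k) / (S(k) + D(k)) to the noisy spectrum and return
    the enhanced time-domain frame (noisy phase reused).

    speech_env and noise_env must cover all len(frame) DFT bins and satisfy
    env[k] == env[n - k] so that the result is real.
    """
    Y = dft(frame)
    H = [s / (s + d) for s, d in zip(speech_env, noise_env)]
    return idft([h * y for h, y in zip(H, Y)])
```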

The clean speech gain and the estimated noise gain may be obtained using a speech absence probability (SAP) algorithm.

The clean speech gain c_g and the estimated noise gain n_g may be calculated with an equation and an accompanying definition, neither of which is preserved in this text. In them:
- Y(w) is the frequency response of the input speech signal;
- A_d^n(w) is the n-th codevector stored in the super codebook;
- A_x^m(w) is the m-th codevector stored in the clean codebook.
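The gain equation itself is not preserved, and the SAP algorithm it relies on is not described here. As a clearly swapped-in stand-in, and not the patent's method, the sketch below uses an ordinary least-squares fit. For one codevector pair it chooses c_g and n_g so that c_g·U(w) + n_g·V(w) approximates the observed spectrum |Y(w)|². Here U = 1/|A_x^m(w)|² and V = 1/|A_d^n(w)|² are the pair's unit-gain envelopes sampled on a common grid. Negative solutions are simply clamped to zero. This is a crude substitute for a constrained solver, and the function name is hypothetical.

```python
def fit_gains(p_y, clean_shape, noise_shape):
    """Least-squares gains (c_g, n_g) for p_y ~= c_g*clean_shape + n_g*noise_shape."""
    suu = sum(u * u for u in clean_shape)
    suv = sum(u * v for u, v in zip(clean_shape, noise_shape))
    svv = sum(v * v for v in noise_shape)
    spu = sum(p * u for p, u in zip(p_y, clean_shape))
    spv = sum(p * v for p, v in zip(p_y, noise_shape))
    det = suu * svv - suv * suv                      # 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("clean and noise shapes are (nearly) collinear")
    c_g = (spu * svv - suv * spv) / det
    n_g = (suu * spv - suv * spu) / det
    # Power gains cannot be negative; clamp instead of solving a constrained fit.
    return max(c_g, 0.0), max(n_g, 0.0)
```

For instance, `fit_gains([2.5, 4.5, 6.5], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` recovers the gains (2.0, 0.5) used to build that spectrum.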

The method may further include three steps for the next frame. First, a noise signal is estimated from the input speech signal of the next frame, and the spectral envelope of that estimated noise signal is obtained. Second, that spectral envelope is set as a new adaptive codevector. Third, at least one of the N adaptive codevectors stored in the super codebook is removed, and the new adaptive codevector is stored in the super codebook.

The adaptive codevector removed may be the one stored earliest among the N adaptive codevectors in the super codebook.
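The earliest-first replacement rule maps naturally onto a bounded FIFO. In this sketch, the class and method names are hypothetical. Python's `collections.deque` with `maxlen=N` drops the oldest adaptive codevector automatically when a new one is appended, while the fixed codevectors are never touched.

```python
from collections import deque


class SuperCodebook:
    """Fixed noise codevectors plus a bounded FIFO of N adaptive codevectors."""

    def __init__(self, fixed_codevectors, n_adaptive):
        self.fixed = [list(cv) for cv in fixed_codevectors]
        # deque(maxlen=N) evicts the oldest item when appending to a full
        # deque, i.e. the earliest-stored adaptive codevector.
        self.adaptive = deque(maxlen=n_adaptive)

    def push(self, noise_envelope):
        """Store the current frame's noise envelope as a new adaptive codevector."""
        self.adaptive.append(list(noise_envelope))

    def codevectors(self):
        """All codevectors searched in the current frame: fixed first, then adaptive."""
        return self.fixed + list(self.adaptive)


sc = SuperCodebook([[0.0], [0.5]], n_adaptive=3)
for k in range(5):                 # five frames of (toy) noise envelopes
    sc.push([float(k)])
```

After five frames, only the three most recent envelopes ([2.0], [3.0], [4.0]) remain alongside the two fixed codevectors.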

A codebook-based speech enhancement apparatus using adaptive codevectors according to another embodiment of the present invention includes the following components:
- a speech signal input unit that receives, frame by frame, an input speech signal in which a clean speech signal and a noise signal are mixed;
- a Fourier transform unit that Fourier-transforms the input speech signal to generate an input speech power spectrum;
- an input speech linear prediction coefficient extractor that obtains the spectral envelope of the input speech signal;
- a noise signal linear prediction coefficient extractor that estimates the noise signal from the input speech power spectrum and obtains the spectral envelope of the estimated noise signal;
- a codebook database that includes a clean speech codebook with a plurality of clean codevectors, and a plurality of noise codebooks of different noise types, each with a plurality of noise codevectors;
- a codebook selector that selects, from among the noise codebooks, the one containing the noise codevector most similar to the spectral envelope of the estimated noise signal;
- a controller that sets any N of the spectral envelopes of the estimated noise signal as adaptive codevectors and stores them in the selected noise codebook to generate a super codebook;
- a speech synthesizer that extracts a corrected speech signal using the spectral envelope of the input speech signal and integrated codevectors, each of which combines a noise codevector from the super codebook with a clean codevector from the clean codebook.

As described above, the present invention proceeds in three stages:
1. The noise in the input signal is estimated with a long-term noise estimation technique such as MS or improved minima controlled recursive averaging (IMCRA).
2. Linear prediction coefficients are extracted from the estimated noise to generate adaptive codevectors.
3. These adaptive codevectors are combined with the existing noise codebook to form a super codebook.

This guarantees performance for both trained and untrained noise. In particular, the output speech quality improves further when the clean speech gain and the noise gain are obtained through the speech absence probability (SAP) technique.

FIG. 1 is a diagram illustrating the configuration of an apparatus for improving speech recognition according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the codebook database of FIG. 1.
FIG. 3 is a flowchart illustrating a method for improving speech recognition according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a process of generating a super codebook according to an embodiment of the present invention.

Details of other embodiments are included in the detailed description and the drawings.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described below with reference to the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those skilled in the art; the invention is defined solely by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

FIG. 1 is a diagram illustrating the configuration of an apparatus for improving speech recognition according to an embodiment of the present invention. FIG. 2 is a configuration diagram of the codebook database of FIG. 1.

Referring to FIG. 1, the apparatus 100 for improving speech recognition includes the following components:
- a speech signal input unit 110;
- a Fourier transform unit 120;
- an input speech linear prediction coefficient extractor 130;
- a noise signal linear prediction coefficient extractor 140;
- a codebook database 150;
- a codebook selector 160;
- a controller 170;
- a speech synthesizer 180.

First, the speech signal input unit 110 receives the input speech signal y(n) frame by frame. The input speech signal is a contaminated signal in which a clean speech signal and a noise signal are mixed.

The Fourier transform unit 120 applies a fast Fourier transform (FFT) to convert the input speech signal from the time domain to the frequency domain, and generates the input speech power spectrum.

The input speech linear prediction coefficient extractor 130 obtains the spectral envelope |Y_LPC(k)|² of the input speech signal in three steps:
1. It applies an inverse fast Fourier transform (IFFT) to the input speech power spectrum |Y(k)|² to generate the autocorrelation R_y(n) of the input speech signal y(n).
2. It applies R_y(n) to the Levinson algorithm to extract the linear prediction coefficients a_y of y(n).
3. It obtains |Y_LPC(k)|² from the frequency response of a_y.

Here, |Y_LPC(k)|² is the squared magnitude of the frequency response Y_LPC(k) of the linear prediction coefficients of y(n).

The noise signal linear prediction coefficient extractor 140 estimates the noise signal from the input speech signal y(n) and obtains the spectral envelope |D_LPC(k)|² of the estimated noise signal d(n). It proceeds as follows:
1. It estimates the noise signal from y(n) using a long-term noise estimation algorithm such as improved minima controlled recursive averaging (IMCRA) or minimum statistics (MS).
2. Using the noise type of the estimated noise signal, it selects one of the plurality of noise codebooks.
3. It applies an IFFT to the noise power spectrum |D(k)|² of the estimated noise signal to generate the autocorrelation R_d(n) of d(n).
4. It applies R_d(n) to the Levinson algorithm to extract the linear prediction coefficients a_d of d(n).
5. It obtains |D_LPC(k)|² from the frequency response of a_d.

As shown in FIG. 2, the codebook database 150 includes a clean speech codebook 151 and a plurality of noise codebooks 152a, 152b, 152c, 152d, and 152e. The clean speech codebook 151 contains a plurality of clean codevectors. Each noise codebook 152 corresponds to a different noise type and contains a plurality of noise codevectors.

For convenience of description, the embodiment makes the following assumptions:
- The codebook database 150 consists of one clean speech codebook 151 and five noise codebooks 152a, 152b, 152c, 152d, and 152e, corresponding to the White, Babble, F16, Factory2, and Pink noise types, respectively.
- The clean speech codebook 151 consists of 1024 clean codevectors.
- The noise codebooks 152a to 152e are codebooks for trained noise, and each stores 8 fixed noise codevectors.
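The sizes assumed in this embodiment make the per-frame search cost easy to tally. The sketch below further assumes that:
- codebook selection compares the estimated noise envelope against every fixed codevector;
- the joint search examines every clean/noise codevector pair exhaustively.

The number N of adaptive codevectors is left as a parameter because the text does not fix it. The constant and function names are hypothetical.

```python
# Sizes from the embodiment; the adaptive count N is a free parameter.
NOISE_TYPES = ("White", "Babble", "F16", "Factory2", "Pink")
CLEAN_CODEVECTORS = 1024
FIXED_NOISE_CODEVECTORS = 8


def selection_cost():
    """IS-distortion evaluations to pick one noise codebook (all fixed codevectors)."""
    return len(NOISE_TYPES) * FIXED_NOISE_CODEVECTORS


def joint_search_size(n_adaptive):
    """Clean x noise codevector pairs examined per frame with the super codebook."""
    return CLEAN_CODEVECTORS * (FIXED_NOISE_CODEVECTORS + n_adaptive)
```

Under these assumptions, selecting a codebook costs 40 distortion evaluations per frame. Each adaptive codevector then adds 1024 pairs to the joint search: 8192 pairs with N = 0, 12288 with N = 4.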

Next, the codebook selector 160 selects, from among the plurality of noise codebooks, the noise codebook containing the noise codevector most similar to the spectral envelope |D_LPC(k)|² of the estimated noise signal d(n).

The controller 170 sets any N of the spectral envelopes |D_LPC(k)|² of the estimated noise signal d(n) as adaptive codevectors. It then stores these N adaptive codevectors in the selected noise codebook to generate a super codebook.

The speech synthesizer 180 extracts a corrected speech signal with improved speech recognition performance. It uses the spectral envelope |Y_LPC(k)|² of the input speech signal y(n) together with integrated codevectors, each of which combines a noise codevector from the super codebook with a clean codevector from the clean codebook.

The method for improving speech recognition according to an embodiment of the present invention is described in more detail below with reference to FIGS. 3 and 4. FIG. 3 is a flowchart illustrating the method, and FIG. 4 is a diagram illustrating the process of generating a super codebook according to the embodiment.

As shown in FIG. 3, the speech signal input unit 110 first receives the input speech signal y(n) frame by frame from outside (S310). The input speech signal y(n) is a mixture of the clean speech x(n) and the noise signal d(n), so the noise-contaminated speech can be written as Equation 1:

$$y(n) = x(n) + d(n) \qquad (1)$$

where x(n) is the clean speech signal and d(n) is the noise signal.

When the input speech signal y(n) is received, the Fourier transform unit 120 applies an FFT to the time-domain signal y(n). This produces the frequency-domain signal Y(k) and the noisy power spectrum |Y(k)|² (S320).

The input speech linear prediction coefficient extractor 130 then applies an inverse fast Fourier transform (IFFT) to the power spectrum |Y(k)|². This yields R_y(n), the autocorrelation of the input speech signal y(n) (S330).

The input speech linear prediction coefficient extractor 130 applies R_y(n) to the Levinson algorithm to extract the linear prediction coefficients a_y of the input speech signal y(n) (S340). Extracting a_y with the Levinson algorithm is routine for those skilled in the art, so a detailed description is omitted. Once a_y has been extracted, the extractor 130 uses it to obtain the spectral envelope |Y_LPC(k)|² of y(n) (S350).

Meanwhile, the noise signal linear prediction coefficient extractor 140 estimates the noise signal d(n) contained in the input speech signal y(n) (S360). It does so from the power spectrum |Y(k)|² using a long-term noise estimation algorithm such as improved minima controlled recursive averaging (IMCRA) or minimum statistics (MS). At this point, the noise codebook most similar to the noise type of d(n) is selected from among the plurality of noise codebooks.
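Long-term estimators such as MS and IMCRA are considerably more involved than can be shown here. The sketch below illustrates only the core idea behind minimum statistics. It recursively smooths |Y(k)|² over time, then takes per bin the minimum over a sliding window of recent frames, scaled by a bias factor.

The smoothing constant, window length, and bias are hypothetical fixed values. Neither MS's adaptive smoothing and bias computation nor IMCRA's speech-presence logic is included, and the class name is hypothetical.

```python
from collections import deque


class MinimumTracker:
    """Simplified minimum-tracking noise PSD estimator (not full MS or IMCRA)."""

    def __init__(self, alpha=0.85, window=50, bias=1.5):
        self.alpha = alpha          # recursive smoothing constant (assumed)
        self.bias = bias            # fixed bias compensation (assumed)
        self.smoothed = None
        self.history = deque(maxlen=window)

    def update(self, power_spectrum):
        """Feed one frame's |Y(k)|^2 and return the noise PSD estimate |D(k)|^2."""
        if self.smoothed is None:
            self.smoothed = list(power_spectrum)
        else:
            a = self.alpha
            self.smoothed = [a * s + (1.0 - a) * p
                             for s, p in zip(self.smoothed, power_spectrum)]
        self.history.append(self.smoothed)
        # Per-bin minimum over the last `window` smoothed frames, bias-compensated.
        n_bins = len(self.smoothed)
        return [self.bias * min(frame[k] for frame in self.history)
                for k in range(n_bins)]
```

Because the minimum is taken over a window, a short loud frame raises the smoothed spectrum but not the noise estimate. For example, after several frames of constant power [1, 2], a single frame at [100, 100] leaves the estimate at 1.5 in the first bin.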

The noise signal linear prediction coefficient extractor 140 generates the noise power spectrum |D(k)|² from the estimated noise signal d(n). It then applies an IFFT to this spectrum to generate the autocorrelation R_d(n) of d(n) (S370).

The noise signal linear prediction coefficient extractor 140 applies R_d(n) to the Levinson algorithm to extract the linear prediction coefficients a_d of the estimated noise signal d(n) (S380). It then uses a_d to obtain the spectral envelope |D_LPC(k)|² of d(n) (S390).

The P-th order linear prediction coefficients of the estimated noise signal d(n) can be written as a_d = [1, a_d(1), …, a_d(P)]^T. In the frequency domain they can be expressed in spectral form as in Equation 2 below.
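Equation 2 is not reproduced in this text. For orientation, the conventional all-pole (LPC) spectral envelope is written below. The DFT size K and the gain term σ_d² are assumptions here. σ_d² is often taken as the prediction error power, or set to 1 when codebook entries store gain-normalized shapes, so the patent's own normalization may differ.

```latex
\begin{aligned}
A_d(k) &= \sum_{p=0}^{P} a_d(p)\, e^{-j 2\pi k p / K}, \qquad a_d(0) = 1, \\
|D_{LPC}(k)|^2 &= \frac{\sigma_d^2}{\lvert A_d(k) \rvert^2}, \qquad k = 0, 1, \dots, K-1.
\end{aligned}
```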

After extracting the linear prediction coefficients a_d of the estimated noise signal d(n) in this way, the noise signal linear prediction coefficient extractor 140 computes the spectral envelope |D_LPC(k)|² of the estimated noise signal d(n).
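The sketch of step S390 below evaluates the all-pole envelope on K uniformly spaced frequency bins. The function name and the optional `gain` argument are assumptions.

```python
import cmath
import math


def lpc_envelope(a, K, gain=1.0):
    """|D_LPC(k)|^2 = gain / |A(e^{j w_k})|^2 on K bins, with w_k = 2*pi*k/K."""
    env = []
    for k in range(K):
        w = 2 * math.pi * k / K
        A = sum(a[p] * cmath.exp(-1j * w * p) for p in range(len(a)))
        env.append(gain / abs(A) ** 2)
    return env
```

With a = [1, −0.5] and K = 8, the envelope is 4.0 at k = 0 and 1/2.25 at k = 4, which is a low-pass shape.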

Next, the codebook selector 160 considers the plurality of noise codebooks 152a, 152b, 152c, 152d and 152e stored in the codebook database 150. From these it selects the noise codebook containing the noise codevector most similar to the spectral envelope |D_LPC(k)|² of the estimated noise signal d(n) (S400).

Here, the codebook selector 160 selects the codebook type of the noise signal using the Itakura-Saito distortion (IS-D), according to Equation 3 below.

The symbols in Equation 3 are defined as follows:

- n' is the selected noise codebook.
- A further term denotes the estimated noise signal.
- N_d is the number of noise codebooks in the codebook database.
- v_n is the number of codevectors in the n-th codebook.
- a_d^{n,j} is the linear prediction coefficient vector of the j-th codevector in the n-th noise codebook.

In the example of FIG. 2, N_d is 5 and v_n is 8.
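Equation 3 itself is not reproduced here. The sketch below assumes a selection rule consistent with the surrounding text: compute the IS distortion between the estimated noise envelope and every codevector envelope, then pick the codebook holding the single closest codevector. The codevector envelopes are assumed to have been computed from the stored coefficients a_d^{n,j}, and the data layout is hypothetical.

```python
import math


def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion; zero only when the two spectra are equal."""
    total = 0.0
    for pk, qk in zip(p, p_hat):
        ratio = pk / qk
        total += ratio - math.log(ratio) - 1.0
    return total / len(p)


def select_noise_codebook(noise_env, codebooks):
    """Return (n', j*): the codebook holding the codevector closest to noise_env.

    codebooks[n][j] is the LPC envelope of the j-th codevector in codebook n
    (hypothetical layout). Assumed rule: minimum over j, then argmin over n.
    """
    best = None
    for n, book in enumerate(codebooks):
        for j, envelope in enumerate(book):
            d = itakura_saito(noise_env, envelope)
            if best is None or d < best[0]:
                best = (d, n, j)
    return best[1], best[2]
```

IS-D is asymmetric, so the argument order (estimated envelope first) is part of the assumption.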

The controller 170 then sets N arbitrary spectral envelope values |D_LPC(k)|² of the estimated noise signal d(n) as adaptive codevectors. It stores these N adaptive codevectors in the selected noise codebook n' to generate a super codebook (S410).

The number N of spectral envelope values of the estimated noise signal d(n) can be set arbitrarily; for convenience, this embodiment assumes N = 4. Also for convenience, assume that the selected noise codebook n' is the white-type noise codebook. As shown in FIG. 4, four adaptive codevectors are then added to the eight existing fixed codevectors of the white-type noise codebook 152a. The white-type noise codebook 152a thus becomes a super codebook containing 12 noise codevectors.

Next, the speech synthesizer 170 outputs the corrected speech signal x''(n) (S420). It uses two inputs for this:

- the integrated codevectors, formed by combining the noise codevectors in the super codebook with the clean codevectors in the clean codebook 151;
- the spectral envelope |Y_LPC(k)|² of the input speech signal y(n).

In more detail, the speech synthesizer 170 first generates integrated codevectors. It combines the 12 noise codevectors of the super codebook generated in step S410 (i.e., the white-type noise codebook 152a) with the 1024 clean codevectors of the clean codebook. Since each integrated codevector pairs one clean codevector with one noise codevector, there are 1024 × 12 = 12288 possible integrated codevectors.

The speech synthesizer 170 then uses the IS-D to measure the similarity between each of the 12288 integrated codevectors and the spectral envelope |Y_LPC(k)|² of the input speech signal y(n) obtained in step S350. From these similarities it estimates the clean speech signal and the noise signal contained in the input speech signal.

The similarity computation works as follows. Let c_g be the gain of the clean speech signal x(n) within the input speech signal y(n), and let n_g be the gain of the noise signal d(n). The gain combination values A₁ to A₁₂₂₈₈ for the 12288 integrated codevectors can then be obtained as in Equation 4 below.

As Equation 4 shows, 12288 gain combination values (A₁ to A₁₂₂₈₈) are generated from two sets of codevectors:

- the 1024 codevectors of the clean codebook (C_①, C_②, …);
- the 12 codevectors of the noise codebook, i.e., the super codebook (n_①, n_②, …, n_⑪, n_⑫).

Equation 4 writes the gains c_g of the clean speech signal x(n) and n_g of the noise signal d(n) identically for every combination. In practice they can take different values, and each combination may yield different gains.

The speech synthesizer 170 then uses the IS-D to compute the similarities L₁ to L₁₂₂₈₈. Each similarity is between the spectral envelope |Y_LPC(k)|² of the input speech signal y(n) for one frame and one of the gain combination values A₁ to A₁₂₂₈₈. Computing this similarity (likelihood, L) with the IS-D is straightforward for a person skilled in the art, so a detailed description is omitted.

The speech synthesizer 170 then separates the clean speech signal and the noise signal in the input speech signal. It does so using maximum likelihood (ML) estimation or Bayesian minimum mean squared error (MMSE) estimation.
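The sketch below makes the ML and MMSE steps concrete with a search over all clean/noise pairs (1024 × 12 = 12288 pairs with the sizes in this embodiment). Several details are assumptions, not taken from the patent:

- The per-pair gains come from a weighted least-squares fit with weights 1/P_y².
- Gains are floored at a small positive value.
- Pairs whose codevectors are proportional are skipped.
- The MMSE weights use a hypothetical likelihood exp(−β·d_IS) with a tuning constant `beta`.

Spectra are plain lists of positive power values.

```python
import itertools
import math


def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion d_IS(p, p_hat) of two positive power spectra."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, p_hat)) / len(p)


def fit_gains(py, px, pn, floor=1e-10):
    """Weighted least squares (weights 1/py^2) for py ~ c_g*px + n_g*pn.

    Returns None when px and pn are (nearly) proportional, i.e. the 2x2 system is singular.
    """
    e00 = sum((a / y) ** 2 for a, y in zip(px, py))
    e01 = sum(a * b / y ** 2 for a, b, y in zip(px, pn, py))
    e11 = sum((b / y) ** 2 for b, y in zip(pn, py))
    f0 = sum(a / y for a, y in zip(px, py))
    f1 = sum(b / y for b, y in zip(pn, py))
    det = e00 * e11 - e01 * e01
    if abs(det) <= 1e-12 * e00 * e11:
        return None
    c_g = (e11 * f0 - e01 * f1) / det
    n_g = (e00 * f1 - e01 * f0) / det
    return max(c_g, floor), max(n_g, floor)  # keep the model spectrum positive


def codebook_search(py, clean_book, noise_book, beta=1.0):
    """Return the ML tuple (d, i, j, c_g, n_g) and MMSE clean/noise envelopes."""
    results = []
    for (i, px), (j, pn) in itertools.product(enumerate(clean_book), enumerate(noise_book)):
        gains = fit_gains(py, px, pn)
        if gains is None:
            continue
        c_g, n_g = gains
        model = [c_g * a + n_g * b for a, b in zip(px, pn)]
        results.append((itakura_saito(py, model), i, j, c_g, n_g))
    ml = min(results)  # smallest distortion = highest likelihood
    weights = [math.exp(-beta * (r[0] - ml[0])) for r in results]
    total = sum(weights)
    clean_env = [0.0] * len(py)
    noise_env = [0.0] * len(py)
    for w, (_, i, j, c_g, n_g) in zip(weights, results):
        for k in range(len(py)):
            clean_env[k] += w * c_g * clean_book[i][k] / total
            noise_env[k] += w * n_g * noise_book[j][k] / total
    return ml, clean_env, noise_env
```

When the observed spectrum is exactly 2·(clean codevector 0) + 1·(noise codevector 1), the ML search returns that pair with gains close to (2, 1). A larger `beta` pulls the MMSE envelopes toward the ML estimate.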

In more detail, the speech synthesizer 170 first obtains the spectral envelope |Y_LPC(k)|² of the input speech signal y(n) as in Equation 5 below.

Since A₁ to A₁₂₂₈₈ in Equation 5 are the gain combination values obtained in Equation 4, substituting them into Equation 5 gives Equation 6 below.

The spectral envelope of the input speech signal y(n) obtained in Equation 6 can be split into two parts: the spectral envelope X' of the clean speech signal and the spectral envelope d' of the estimated noise signal. This split is expressed in Equation 7.

The speech synthesizer 170 then forms the Wiener filter X'/(X'+d') from the estimated clean speech envelope X' and noise envelope d', as in Equation 8 below. Multiplying the input speech power spectrum y by this filter removes the estimated noise component and yields the corrected speech power spectrum X''. X'' is the power spectrum of an improved-quality speech signal: the estimated noise has been removed from the input, so the proportion of clean speech is higher.
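As described, Equation 8 amounts to a power-domain gain applied to y in each frequency bin. The minimal sketch below implements that. The function name is hypothetical, and the guard for an all-zero bin is an added assumption.

```python
def wiener_enhance(y_power, clean_env, noise_env):
    """Corrected power spectrum X''(k) = X'(k) / (X'(k) + d'(k)) * y(k), bin by bin."""
    out = []
    for y, x, d in zip(y_power, clean_env, noise_env):
        denom = x + d
        gain = x / denom if denom > 0.0 else 0.0  # guard against an all-zero bin
        out.append(gain * y)
    return out
```

For example, with X' = [3, 1], d' = [1, 1] and y = [8, 4], the gains are [0.75, 0.5] and X'' = [6, 2].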

Here, y denotes the input speech power spectrum |Y(k)|² of the input speech signal, x the clean speech power spectrum, and d the noise power spectrum. The speech synthesizer 170 then applies an inverse Fourier transform (IFFT) to the corrected speech power spectrum X'' to output a speech signal of improved quality.
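The text does not say how the phase is obtained when the corrected power spectrum is inverse transformed. A common choice, assumed here, is to keep the noisy phase and scale the magnitude of each noisy DFT bin by the square root of the power-domain gain. The single-frame sketch below uses a naive DFT and omits windowing and overlap-add between frames. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K)) for k in range(K)]


def idft(X):
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K for n in range(K)]


def enhance_frame(frame, power_gain):
    """Scale each bin's magnitude by sqrt(power_gain[k]) and keep the noisy phase.

    power_gain must satisfy power_gain[k] == power_gain[K-k] (true for gains built
    from real spectra) so that the output is real; the tiny imaginary residue is dropped.
    """
    Y = dft(frame)
    X = [Yk * math.sqrt(gk) for Yk, gk in zip(Y, power_gain)]
    return [v.real for v in idft(X)]
```

A gain of 1 in every bin returns the input frame unchanged. A gain of 0.25 in every bin halves every sample.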

Once step S420 is complete, the input speech signal of the next frame arrives at the speech signal input unit 110. Steps S310 to S420 are then repeated to output an enhanced speech signal, and the controller 170 updates the adaptive codevectors.

That is, whenever a new adaptive codevector is generated through steps S310 to S400, the controller 170 updates the super codebook. It removes at least one of the N adaptive codevectors already stored there and stores the newly generated adaptive codevector in its place. The adaptive codevector removed is the one stored earliest, i.e., the oldest.
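The oldest-first replacement rule is a FIFO of length N. The sketch below uses `collections.deque` with `maxlen`, which silently discards the oldest entry when a new one is appended to a full deque. The class name and its methods are hypothetical. With 8 fixed codevectors and N = 4, the super codebook always holds 12 codevectors once the FIFO has filled.

```python
from collections import deque


class SuperCodebook:
    """Fixed noise codevectors plus a FIFO of N adaptive codevectors (sketch)."""

    def __init__(self, fixed, n_adaptive):
        self.fixed = list(fixed)
        self.adaptive = deque(maxlen=n_adaptive)  # oldest entry is dropped first

    def update(self, envelope):
        """Store a new adaptive codevector, evicting the oldest one if full."""
        self.adaptive.append(envelope)

    def codevectors(self):
        """Fixed codevectors followed by adaptive ones, oldest to newest."""
        return self.fixed + list(self.adaptive)
```

After five updates of an 8 + 4 codebook, the first adaptive entry has been evicted and the codebook still contains 12 codevectors.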

Because this embodiment updates the adaptive codevectors every frame, it can extract a corrected speech signal of improved quality even when the input speech contains noise that was not seen during training.

In addition, according to an embodiment of the present invention, the speech absence probability (SAP) technique can be used to estimate the gains in Equation 4 more accurately. These are the gain c_g of the clean speech signal and the gain n_g of the noise signal. A noise estimation model based on SAP divides the signal into speech-present and speech-absent segments, computes the gains separately for each, and then combines the results.

First, in step S360, the noise signal linear prediction coefficient extractor 140 estimates the noise signal d(n) with the long-term noise estimation technique. While doing so, it also computes the speech absence probability SAP and the speech presence probability 1 − SAP. The speech synthesizer 170 then applies these to Equation 9 below to estimate the clean speech gain c_g and the noise gain n_g.

Here, Y(w) is the frequency response of the input speech signal, and A^n_d(w) is the n-th codevector stored in the super codebook.
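Equation 9 is not reproduced here, so the exact combination rule is unknown. The text describes gains computed separately for the speech-present and speech-absent cases and then combined. The sketch below follows that idea by weighting the two cases with 1 − SAP and SAP. It assumes the speech-present gains are supplied (for example from the E⁻¹F solution described next in the text). For the speech-absent case it assumes c_g = 0 and picks the noise gain that minimizes the IS distortion between P_y and n_g·P_d; setting the derivative to zero gives the closed form n_g = mean(P_y / P_d). Both function names are hypothetical.

```python
def noise_only_gain(py, pn):
    """Gain g minimising the IS distortion between py and g * pn: g = mean(py / pn)."""
    return sum(y / b for y, b in zip(py, pn)) / len(py)


def sap_weighted_gains(sap, g_present, py, pn):
    """Combine speech-present and speech-absent gain estimates (hypothetical rule).

    sap       -- speech absence probability in [0, 1]
    g_present -- (c_g, n_g) estimated under the speech-present hypothesis
    py, pn    -- observed power spectrum and noise codevector spectrum
    """
    if not 0.0 <= sap <= 1.0:
        raise ValueError("sap must lie in [0, 1]")
    cg_present, ng_present = g_present
    ng_absent = noise_only_gain(py, pn)
    c_g = (1.0 - sap) * cg_present  # clean gain is zero when speech is absent
    n_g = (1.0 - sap) * ng_present + sap * ng_absent
    return c_g, n_g
```

With SAP = 1 the clean gain vanishes and the noise gain is fitted to the observation alone. With SAP = 0 the speech-present gains pass through unchanged.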

The gains in the speech-present segment can be expressed as the product of the inverse of matrix E and the matrix F. E and F are given below.

Here, A^m_x(w) denotes the m-th codevector stored in the clean codebook.
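The definitions of E and F are not reproduced in this text. The formulation below produces gains of exactly the E⁻¹F form; it is assumed here for illustration and is not confirmed as the patent's own. It is a least-squares fit of the observed power spectrum P_y(ω) = |Y(ω)|² by the two codevector spectra, with weights 1/P_y²(ω). For small errors, this weighted squared error is, to second order, twice the IS distortion.

```latex
\begin{aligned}
P_y(\omega) &\approx c_g\,P_x^m(\omega) + n_g\,P_d^n(\omega),
\qquad P_x^m = \frac{1}{|A_x^m(\omega)|^2},\quad P_d^n = \frac{1}{|A_d^n(\omega)|^2}, \\
&\min_{c_g,\,n_g} \sum_{\omega}
\left(\frac{P_y(\omega) - c_g P_x^m(\omega) - n_g P_d^n(\omega)}{P_y(\omega)}\right)^2
\;\Longrightarrow\;
E \begin{bmatrix} c_g \\ n_g \end{bmatrix} = F, \\
E &= \sum_{\omega} \frac{1}{P_y^2(\omega)}
\begin{bmatrix} (P_x^m)^2 & P_x^m P_d^n \\ P_x^m P_d^n & (P_d^n)^2 \end{bmatrix},
\qquad
F = \sum_{\omega} \frac{1}{P_y(\omega)} \begin{bmatrix} P_x^m \\ P_d^n \end{bmatrix},
\qquad
\begin{bmatrix} c_g \\ n_g \end{bmatrix} = E^{-1} F.
\end{aligned}
```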

Accordingly, by using the SAP technique, this embodiment can estimate the clean speech gain c_g and the noise gain n_g more accurately. Because the speech synthesizer 170 can then output a cleaner speech signal in step S420, a speech signal of further improved quality can be extracted.

Table 1 below shows the results of experiments evaluating the performance of the noise estimation method according to an embodiment of the present invention.

In this experiment, log-spectral distortion (LSD) was measured on both trained noise (inside noise) and untrained noise (outside noise) for five methods:

- the conventional IMCRA method;
- the conventional codebook-based speech enhancement method (CBSE);
- the speech enhancement method using the super codebook according to the embodiment (CBSE(S));
- the conventional codebook-based speech enhancement method using speech absence probability (SAP-CBSE);
- the speech enhancement method using both SAP and the super codebook according to the embodiment (SAP-CBSE(S)).
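The experiment is scored by log-spectral distortion, but the text does not define it. A commonly used definition, assumed here, is the RMS difference in dB between a reference and an estimated power spectrum, averaged over frames. When evaluating noise estimation, the reference would be the true noise spectrum. In the sketch below, the function name and the `eps` floor are hypothetical.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Mean over frames of the RMS difference (in dB) between two power spectra."""
    per_frame = []
    for ref, est in zip(ref_frames, est_frames):
        sq = [(10.0 * math.log10(max(r, eps)) - 10.0 * math.log10(max(e, eps))) ** 2
              for r, e in zip(ref, est)]
        per_frame.append(math.sqrt(sum(sq) / len(sq)))
    return sum(per_frame) / len(per_frame)
```

Identical spectra give 0 dB. An estimate that is uniformly ten times too large gives 10 dB.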

The trained (inside) noises are white, babble, F16, factory2 and pink, and a noise codebook is stored for each of these five. The untrained (outside) noises are Buccaneer1, car, restaurant, factory1 and street. The speech codebook was trained on 4620 sentences from the TIMIT database sampled at 8000 Hz, and testing used 1680 sentences. Ten noise types were tested in total, with white, babble, F16, factory2 and pink available as the existing (fixed) noise codebooks.

The results show two improvements in noise estimation:

- The super-codebook method CBSE(S) outperformed both the conventional IMCRA method and conventional CBSE.
- SAP-CBSE(S), which combines SAP with the super codebook, outperformed the conventional SAP-based codebook method SAP-CBSE.

In particular, the super-codebook methods CBSE(S) and SAP-CBSE(S) behaved differently on the two noise groups. On trained (inside) noise they estimated noise about as well as the conventional IMCRA method and the codebook-based methods CBSE and SAP-CBSE. On untrained (outside) noise they were markedly better.

As described above, this embodiment proceeds in three stages:

1. It estimates the noise in the input signal with a long-term noise estimation technique such as MS or IMCRA.
2. It extracts linear prediction coefficients from the estimated noise to generate adaptive codevectors.
3. It combines these adaptive codevectors with the existing noise codebook to form a super codebook.

This ensures performance for both trained and untrained noise. In addition, using the clean speech gain c_g and noise gain n_g obtained with the SAP technique allows a speech signal of further improved quality to be output.

The present invention has been described with reference to the embodiments shown in the accompanying drawings, but these are merely exemplary. A person of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

1. A codebook-based speech enhancement method using an adaptive codevector, comprising:
receiving, in units of frames, an input speech signal in which a clean speech signal and a noise signal are mixed;
generating an input speech power spectrum by Fourier transforming the input speech signal;
obtaining a spectral envelope value of the input speech signal;
estimating the noise signal from the input speech power spectrum and obtaining a spectral envelope value of the estimated noise signal;
selecting, from among a plurality of previously stored noise codebooks, a noise codebook having a noise codevector most similar to the spectral envelope value of the estimated noise signal;
setting N arbitrary spectral envelope values of the estimated noise signal as adaptive codevectors and storing the N adaptive codevectors in the selected noise codebook to generate a super codebook; and
extracting a corrected speech signal using (a) an integrated codevector, which combines the noise codevectors in the super codebook with the clean codevectors in a clean codebook, and (b) the spectral envelope value of the input speech signal.

2. The method of claim 1,
wherein obtaining the spectral envelope value of the input speech signal comprises:
generating an autocorrelation value of the input speech signal by inverse Fourier transforming the input speech power spectrum;
extracting linear prediction coefficients of the input speech signal by applying the autocorrelation value of the input speech signal to a Levinson algorithm; and
obtaining the spectral envelope value of the input speech signal using the frequency response of the linear prediction coefficients of the input speech signal.

3. The method of claim 2,
wherein estimating the noise signal and obtaining the spectral envelope value of the estimated noise signal comprises:
estimating the noise signal from the input speech power spectrum using a long-term noise estimation algorithm;
selecting one noise codebook from among the plurality of previously stored noise codebooks using the noise type of the estimated noise signal;
generating an autocorrelation value of the estimated noise signal by inverse Fourier transforming a noise power spectrum of the estimated noise signal;
extracting linear prediction coefficients of the estimated noise signal by applying the autocorrelation value of the estimated noise signal to a Levinson algorithm; and
obtaining the spectral envelope value of the estimated noise signal using the frequency response of the linear prediction coefficients of the estimated noise signal.

4. The method of claim 3,
wherein selecting the noise codebook having the noise codevector most similar to the spectral envelope value of the estimated noise signal
comprises selecting the noise codebook using the Itakura-Saito distortion (IS-D) according to the following equation:

where n' is the selected noise codebook.

5. The method of claim 4,
wherein generating the super codebook comprises:
setting N arbitrary spectral envelope values of the estimated noise signal as the adaptive codevectors and storing the N adaptive codevectors in the selected noise codebook; and
integrating the noise codevectors previously stored in the selected noise codebook with the N adaptive codevectors to change the selected noise codebook into the super codebook.

6. The method of claim 5,
wherein extracting the corrected speech signal comprises:
generating the integrated codevectors by combining the noise codevectors included in the super codebook with the clean codevectors included in the clean codebook;
generating gain combination values by applying a gain value of the clean speech signal and a gain value of the estimated noise signal to each integrated codevector;
obtaining the similarity between the spectral envelope value of the input speech signal and each gain combination value; and
applying the similarities and gain combination values to a maximum likelihood (ML) or Bayesian minimum mean squared error (MMSE) estimation algorithm to obtain a spectral envelope value of the clean speech signal and a spectral envelope value of the noise signal.

7. The method of claim 6,
wherein extracting the corrected speech signal further comprises:
generating a corrected speech power spectrum from the spectral envelope value of the clean speech signal and the spectral envelope value of the noise signal through a Wiener filter; and
extracting the corrected speech signal by inverse Fourier transforming the corrected speech power spectrum.

8. The method of claim 6,
wherein the gain value of the clean speech signal and the gain value of the estimated noise signal are obtained using a speech absence probability (SAP) algorithm.

9. The method of claim 8,
wherein the gain value c_g of the clean speech signal and the gain value n_g of the estimated noise signal are calculated by the following equation:

where,

10. The method of claim 1, further comprising:
estimating a noise signal from the input speech signal of the next frame, which follows the input speech signal, and obtaining a spectral envelope value of the estimated noise signal;
setting the spectral envelope value of the estimated noise signal as a new adaptive codevector; and
removing at least one adaptive codevector from among the N adaptive codevectors stored in the super codebook and storing the new adaptive codevector in the super codebook.

11. The method of claim 10,
wherein the adaptive codevector with the earliest storage time among the N adaptive codevectors stored in the super codebook is removed.

12. A codebook-based speech enhancement apparatus using an adaptive codevector, comprising:
a speech signal input unit configured to receive, in units of frames, an input speech signal in which a clean speech signal and a noise signal are mixed;
a Fourier transform unit configured to Fourier transform the input speech signal to generate an input speech power spectrum;
an input speech linear prediction coefficient extractor configured to obtain a spectral envelope value of the input speech signal;
a noise signal linear prediction coefficient extractor configured to estimate the noise signal from the input speech power spectrum and obtain a spectral envelope value of the estimated noise signal;
a codebook database comprising a clean speech codebook having a plurality of clean codevectors and a plurality of noise codebooks of different types, each having a plurality of noise codevectors;
a codebook selector configured to select, from among the plurality of noise codebooks, a noise codebook having a noise codevector most similar to the spectral envelope value of the estimated noise signal;
a controller configured to set N arbitrary spectral envelope values of the estimated noise signal as adaptive codevectors and store the N adaptive codevectors in the selected noise codebook to generate a super codebook; and
a speech synthesizer configured to extract a corrected speech signal using (a) an integrated codevector, which combines the noise codevectors in the super codebook with the clean codevectors in the clean codebook, and (b) the spectral envelope value of the input speech signal.

13. The apparatus of claim 12,
wherein the input speech linear prediction coefficient extractor
generates an autocorrelation value of the input speech signal by inverse Fourier transforming the input speech power spectrum,
applies the autocorrelation value of the input speech signal to a Levinson algorithm to extract linear prediction coefficients of the input speech signal, and
obtains the spectral envelope value of the input speech signal using the frequency response of the linear prediction coefficients of the input speech signal.

14. The apparatus of claim 13,
wherein the noise signal linear prediction coefficient extractor
estimates the noise signal from the input speech power spectrum using a long-term noise estimation algorithm, selects one noise codebook from among the plurality of noise codebooks using the noise type of the estimated noise signal, and inverse Fourier transforms the noise power spectrum of the estimated noise signal to generate an autocorrelation value of the estimated noise signal, and
applies the autocorrelation value of the estimated noise signal to a Levinson algorithm to extract linear prediction coefficients of the estimated noise signal, and obtains the spectral envelope value of the estimated noise signal using the frequency response of those linear prediction coefficients.

15. The apparatus of claim 14,
wherein the codebook selector
selects the noise codebook using the Itakura-Saito distortion (IS-D) according to the following equation:

where n' is the selected noise codebook.

16. The apparatus of claim 15,
wherein the controller
sets N arbitrary spectral envelope values of the estimated noise signal as the adaptive codevectors and stores them in the selected noise codebook, and
integrates the noise codevectors previously stored in the selected noise codebook with the N adaptive codevectors to change the selected noise codebook into the super codebook.

17. The apparatus of claim 16,
wherein the speech synthesizer
generates the integrated codevectors by combining the noise codevectors included in the super codebook with the clean codevectors included in the clean codebook, and applies a gain value of the clean speech signal and a gain value of the estimated noise signal to each integrated codevector to generate gain combination values, and
after obtaining the similarity between the spectral envelope value of the input speech signal and each gain combination value, applies the similarities and gain combination values to a maximum likelihood (ML) or Bayesian minimum mean squared error (MMSE) estimation algorithm to obtain a spectral envelope value of the clean speech signal and a spectral envelope value of the noise signal.

18. The apparatus of claim 17,
wherein the speech synthesizer
generates a corrected speech power spectrum from the spectral envelope value of the clean speech signal and the spectral envelope value of the noise signal through a Wiener filter, and extracts the corrected speech signal by inverse Fourier transforming the corrected speech power spectrum.

19. The apparatus of claim 17,
wherein the gain value of the clean speech signal and the gain value of the estimated noise signal are obtained using a speech absence probability (SAP) algorithm.

20. The apparatus of claim 19,
wherein the gain value c_g of the clean speech signal and the gain value n_g of the estimated noise signal are calculated by the following equation:

where,

21. The apparatus of claim 12,
wherein, when the input speech signal of the next frame following the input speech signal is input,
the Fourier transform unit Fourier transforms the input speech signal of the next frame to generate an input speech power spectrum,
the noise signal linear prediction coefficient extractor estimates a noise signal from the input speech signal and obtains a spectral envelope value of the estimated noise signal, and
the controller
sets the spectral envelope value of the estimated noise signal as a new adaptive codevector, removes at least one adaptive codevector from among the N adaptive codevectors stored in the super codebook, and stores the new adaptive codevector in the super codebook.

22. The apparatus of claim 21,
wherein the controller
removes the adaptive codevector with the earliest storage time among the N adaptive codevectors stored in the super codebook.