KR20060085118A

KR20060085118A - Method and apparatus for bandwidth extension of speech

Info

Publication number: KR20060085118A
Application number: KR1020050006096A
Authority: KR
Inventors: 마쓰마누; 한우진
Original assignee: 삼성전자주식회사
Priority date: 2005-01-22
Filing date: 2005-01-22
Publication date: 2006-07-26
Also published as: KR100708121B1

Abstract

Disclosed are a method and an apparatus for extending the bandwidth of a speech signal, which generate a wideband speech signal from a narrowband speech signal.

The bandwidth extension apparatus according to the present invention includes means for estimating a high-band speech signal from the narrowband speech signal using a spectral folding technique and a nonlinearization technique, and means for adjusting the spectral envelope of the estimated high-band speech signal.

According to the present invention, a high-band speech signal is estimated from a narrowband speech signal using the spectral folding and nonlinearization techniques, so the bandwidth of the speech signal can be extended effectively. Used at the receiving side of a communication system, the invention can provide a speech signal of improved quality.

Description

Method and apparatus for band extension of speech signal

FIG. 1 is a block diagram showing the schematic configuration of a bandwidth extension apparatus according to the present invention.

FIG. 2 is a block diagram showing the detailed configuration of the high-band signal generator according to the present invention.

FIG. 3 is a block diagram showing the detailed configuration of the high-band estimator according to the present invention.

FIG. 4 shows the spectral window used in the present invention.

FIG. 5 is a flowchart of the bandwidth extension method for a speech signal according to the present invention.

The present invention relates to speech signal processing and, more particularly, to a method and an apparatus for bandwidth extension that generate a wideband speech signal from a narrowband speech signal.

In general, the bandwidth of the telephone network is narrow, 300 to 3200 Hz, so the frequency band of a speech signal transmitted over the telephone network is limited. That is, the signal components in the 0 to 300 Hz and 3.2 to 8 kHz bands are lost in transmission, and the speech signal is degraded.

One approach to this problem, bandwidth extension, has been studied for a long time. In this approach, the transmitting side receives an audio signal, discards the data in the frequency band above a predetermined frequency, generates the side information needed to restore the discarded high-band data, and transmits the low-band signal together with the side information. The receiving side then restores the high-band data using the side information. A representative example is the Spectral Band Replication (SBR) technology of Coding Technologies. SBR is described in detail in Convention Paper 5560, presented at the 112th Convention of the Audio Engineering Society, May 10-13, 2002.

Bandwidth extension based on SBR has been very successful for audio signals such as music. Recently, aacPlus, which combines Advanced Audio Coding (AAC) with SBR, was selected as a standard for third-generation communication. However, SBR and other bandwidth extension techniques focus on general audio signals, and techniques suited to speech remain comparatively scarce.

The present invention has been made to solve the above problem, and its object is to provide a method and an apparatus for extending the bandwidth of a speech signal, in particular ones that can effectively extend the bandwidth of speech.

Another object of the present invention is to provide a bandwidth extension method and apparatus that, when used in a decoder on the receiving side, generate frequency components in the 4 to 8 kHz band and thereby allow the sampling rate to be raised to 16 kHz, providing a speech signal of improved quality.

To achieve these objects, the apparatus for extending the bandwidth of a narrowband speech signal according to the present invention includes: means for estimating a high-band speech signal from the narrowband speech signal using spectral folding and nonlinearization techniques; and means for adjusting the spectral envelope of the estimated high-band speech signal.

Preferably, the means for estimating the high-band speech signal includes a spectral folding unit, which upsamples the narrowband speech signal, filters it with a high-pass filter, and downsamples it to output a signal whose spectral components are mirror-symmetric to those of the narrowband speech signal, and a nonlinearizer, which applies a nonlinearity to the narrowband speech signal. The outputs of the spectral folding unit and the nonlinearizer are combined linearly to estimate the high-band speech signal from the narrowband speech signal.

Preferably, the means for adjusting the spectral envelope of the estimated high-band speech signal includes a mapping unit, which uses a predetermined, pre-trained table to output the high-band speech signal corresponding to the narrowband speech signal, and an envelope adjusting unit, which adjusts the spectrum of the estimated high-band speech signal to match the high-band speech signal output by the mapping unit.

Preferably, the mapping unit and the envelope adjusting unit use features of the speech signal given by linear-frequency cepstral coefficients (LFCC).

The method for extending the bandwidth of a speech signal according to the present invention includes: removing the pitch component of the narrowband speech signal and estimating a high-band speech signal from the pitch-removed narrowband speech signal using spectral folding and nonlinearization techniques; and extracting the linear-frequency cepstral coefficients (LFCC) of the narrowband speech signal, looking up the high-band speech signal corresponding to the narrowband speech signal, and adjusting the spectral envelope of the estimated high-band speech signal to match the high-band speech signal found.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the schematic configuration of a bandwidth extension apparatus according to the present invention.

Bandwidth extension of a narrowband speech signal rests on the observation that, given how speech is produced, the envelopes of the narrowband and high-band portions of a speech signal are mutually dependent. If the narrowband speech signal is known, a high-band speech signal can therefore be generated from it.

Referring to FIG. 1, the bandwidth extension apparatus (100) according to the present invention includes first and second upsampling units (110, 150), a low-pass filter (120), a high-band signal generator (130), a high-pass filter (160), and a combiner (170).

The first upsampling unit (110) upsamples the narrowband speech signal (101) input to the apparatus (100) by a factor of two, so that the upsampled signal has a sampling rate of 16 kHz. As a result, the output of the first upsampling unit (110) has the same spectrum as the input signal in the 0 to 4 kHz band and, in the 4 to 8 kHz high band, the spectrum of a folded version of the input signal.
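The folded image created by upsampling can be checked numerically. The following is a minimal sketch, assuming the upsampling is done by zero insertion (the text does not say how the upsampling unit is implemented). The helper names `upsample2` and `dft_magnitude` and the 1 kHz test tone are illustrative only.

```python
import cmath
import math

def upsample2(x):
    """Double the sampling rate by inserting a zero after every sample."""
    y = []
    for s in x:
        y.extend([s, 0.0])
    return y

def dft_magnitude(x):
    """Magnitude of a naive DFT for bins 0..N/2 (0 Hz up to half the sampling rate)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

fs_in = 8000  # narrowband sampling rate in Hz
tone = [math.cos(2 * math.pi * 1000 * t / fs_in) for t in range(64)]  # 1 kHz tone

up = upsample2(tone)          # now sampled at 16 kHz
mag = dft_magnitude(up)
bin_hz = 2 * fs_in / len(up)  # frequency spacing of the DFT bins (125 Hz)
peaks = [k * bin_hz for k, m in enumerate(mag) if m > 0.5 * max(mag)]
print(peaks)  # [1000.0, 7000.0]: the tone plus its mirror image about 4 kHz
```

The 7 kHz component is the image that a low-pass filter would remove on the low-band path and that a high-pass filter would keep on a folding path.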

The low-pass filter (120) filters the upsampled signal to remove the folded version, and outputs a low-band signal (121) with the same spectral characteristics as the input signal.

The high-band signal generator (130) estimates a high-band speech signal in the 4 to 8 kHz band from the narrowband speech signal (101). As described below, it includes a high-band signal estimating part and a high-band spectral envelope adjusting part: it estimates a high-band signal from the narrowband speech signal, looks up the high-band speech signal corresponding to the narrowband speech signal in a predetermined table, and outputs the estimated high-band signal after correcting it accordingly.

The high-band speech signal output by the high-band signal generator (130) is upsampled by a factor of two in the second upsampling unit (150), and the high-pass filter (160) extracts the speech signal (161) in the 4 to 8 kHz band from the upsampled signal.

The combiner (170) combines the 0 to 4 kHz low-band speech signal (121) output by the low-pass filter (120) with the 4 to 8 kHz high-band speech signal (161) output by the high-pass filter (160), and outputs a wideband speech signal (171) whose bandwidth is extended to 0 to 8 kHz overall.

FIG. 2 is a block diagram showing the detailed configuration of the high-band signal generator (130) according to the present invention.

The high-band signal generator (130) is divided into a part (A) that estimates a high-band speech signal from the input narrowband speech signal and a part (B) that adjusts the spectral envelope of the estimated high-band speech signal.

The part (A) that estimates the high-band speech signal includes a pitch filter (134), a pitch inverse filter (135), and a high-band estimator (136).

Specifically, the pitch filter (134) obtains third-order pitch filter coefficients and pitch information from the input narrowband speech signal. The pitch filter (134) uses a modified covariance method.

The pitch inverse filter (135) removes the pitch component contained in the output signal of the pitch filter (134), because a high-band speech signal generally contains little pitch component.
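As an illustration of pitch removal, the following is a minimal sketch that assumes the common three-tap long-term predictor form, with taps at lags T-1, T, and T+1. The text only states that the pitch filter is third order, so this structure is an assumption. The function name `pitch_inverse_filter` is hypothetical, and the lag and coefficients, which would normally come from the pitch filter (134), are hand-picked here.

```python
def pitch_inverse_filter(x, lag, coeffs):
    """Return the residual e[n] = x[n] - sum_i coeffs[i] * x[n - (lag - 1 + i)].

    coeffs holds three taps applied at lags lag-1, lag, lag+1 (assumed form).
    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n in range(len(x)):
        prediction = 0.0
        for i, b in enumerate(coeffs):
            m = n - (lag - 1 + i)
            if m >= 0:
                prediction += b * x[m]
        residual.append(x[n] - prediction)
    return residual

# Toy signal that repeats exactly every 4 samples.
periodic = [0.0, 1.0, 2.0, 3.0] * 5
residual = pitch_inverse_filter(periodic, lag=4, coeffs=(0.0, 1.0, 0.0))
print(residual[:4], residual[4:8])  # start-up samples pass through, then zeros
```

For a perfectly periodic input the residual vanishes once one full period is available, which is the sense in which the pitch component is removed.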

The high-band estimator (136) estimates a high-band speech signal from the pitch-removed narrowband speech signal output by the pitch inverse filter (135).

Specifically, the high-band estimator (136) includes a spectral folding unit (137) and a nonlinearizer (138).

The spectral folding unit (137) upsamples the pitch-removed narrowband speech signal by a factor of two, filters it with a high-pass filter, and then downsamples it by a factor of two, thereby outputting a signal whose spectral components are mirror-symmetric to those of the original narrowband speech signal.

The nonlinearizer (138) upsamples the pitch-removed narrowband speech signal by a factor of two, filters it with a low-pass filter, and passes the filtered signal through a nonlinear element such as a full-wave rectifier. It then filters the nonlinearly processed signal with a high-pass filter and downsamples it by a factor of two.

The outputs of the spectral folding unit (137) and the nonlinearizer (138) are combined linearly to give the estimated high-band speech signal, that is, the estimated speech signal in the 4 to 8 kHz band.

The configuration of the high-band estimator (136) is described in more detail below with reference to FIG. 3.

Referring to FIG. 3, the high-band estimator (136) includes the spectral folding unit (137), which is mainly effective for restoring the high-frequency band of unvoiced components, and the nonlinearizer (138), which generates the harmonic structure of voiced components.

The spectral folding unit (137) upsamples the band-limited signal (the narrowband speech signal) in an upsampling unit (137a), so that the high-frequency band is filled with a mirror image of the low-frequency components. This process creates components in the high-frequency band as well. Because the unvoiced components of speech generally carry considerable energy in the high-frequency band, the spectral folding unit (137) is effective for restoring the high-frequency band of unvoiced components.

Unlike the spectral folding unit (137), the nonlinearizer (138) is used to produce the harmonic structure of voiced components. Voiced components generally consist of a large number of harmonics, so applying a nonlinearity makes it possible to estimate their high-frequency band. In detail, the original signal is filtered by a low-pass filter (138a) so that only the low-frequency components remain, and these are then extended into the high-frequency band by a nonlinear function (138b), x(n)², which carries the strong harmonic components of the low-frequency band over into the high-frequency band. Nonlinear functions of various other forms may also be used.
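How squaring extends harmonics upward can be seen with two tones. This is a minimal sketch under simplifying assumptions: the input is already at 16 kHz and contains only low frequencies, so the low-pass filter (138a) is omitted, and the test tones (1 kHz and 2 kHz) and the helper `dft_peaks` are illustrative, not taken from the patent.

```python
import cmath
import math

def dft_peaks(x, fs, rel_threshold=0.1):
    """Frequencies (Hz) of DFT bins whose magnitude exceeds rel_threshold * max."""
    n = len(x)
    mag = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
           for k in range(n // 2 + 1)]
    return [k * fs / n for k, m in enumerate(mag) if m > rel_threshold * max(mag)]

fs = 16000
n = 128
# Two harmonics of a 1 kHz fundamental, standing in for a voiced low band.
x = [math.cos(2 * math.pi * 1000 * t / fs) + math.cos(2 * math.pi * 2000 * t / fs)
     for t in range(n)]
squared = [s * s for s in x]  # the nonlinear function x(n)^2

before = dft_peaks(x, fs)
after = dft_peaks(squared, fs)
new_components = sorted(set(after) - set(before))
print(new_components)  # [0.0, 3000.0, 4000.0]
```

Squaring produces sum and difference frequencies, so harmonics appear at 3 kHz and 4 kHz that were not in the input. The DC term is a by-product that a subsequent high-pass filter would remove.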

The signals obtained from the spectral folding unit (137) and the nonlinearizer (138) are multiplied by α and 1-α in weighting units (137b, 138c), respectively, and added, giving a weighted sum. The weight α determines which of the two techniques, spectral folding or nonlinearization, is given more emphasis. Since spectral folding suits unvoiced components and nonlinearization suits voiced components, α can be estimated by measuring the periodicity (pitch) of the speech. There are several ways to measure periodicity. Typically the gain of the pitch filter that a speech codec uses to handle periodicity can be used, or alternatively the normalized autocorrelation coefficient.
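The weighted sum can be sketched as follows. The text does not give a formula for deriving α from the periodicity, so the rule used here (α = 1 minus the normalized autocorrelation at the pitch lag, clipped to the range 0 to 1) is an assumption chosen only so that strongly periodic frames favor the nonlinearizer. All function names are hypothetical.

```python
import math

def normalized_autocorrelation(x, lag):
    """Normalized autocorrelation of x at the given lag (close to 1.0 when x repeats every `lag` samples)."""
    pairs = [(x[n], x[n - lag]) for n in range(lag, len(x))]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den if den > 0.0 else 0.0

def folding_weight(x, lag):
    """Assumed rule: alpha = 1 - periodicity, clipped to [0, 1]."""
    rho = normalized_autocorrelation(x, lag)
    return 1.0 - min(max(rho, 0.0), 1.0)

def weighted_sum(folded, nonlinear, alpha):
    """alpha * (spectral folding output) + (1 - alpha) * (nonlinearizer output)."""
    return [alpha * f + (1.0 - alpha) * g for f, g in zip(folded, nonlinear)]

# A strongly periodic (voiced-like) frame with a period of 40 samples.
voiced_like = [math.sin(2 * math.pi * n / 40) for n in range(200)]
alpha = folding_weight(voiced_like, lag=40)  # near 0: rely on the nonlinearizer
mixed = weighted_sum([1.0, 1.0], [0.0, 0.0], 0.25)
```

A pitch filter gain could replace the autocorrelation measure without changing the structure of the sketch.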

The weighted sum is filtered by a high-pass filter (137d) and downsampled by a factor of two (137e), and the result is output as the estimated high-band signal.

By applying the spectral folding and nonlinearization techniques together in this way, the present invention achieves improved bandwidth extension.

Referring again to FIG. 2, the part (B) that adjusts the spectral envelope of the estimated high-band speech signal includes first and second discrete cosine transform units (131, 139), first and second cepstral coefficient extractors (132, 140), a mapping unit (133), and an envelope adjusting unit (141).

The first and second discrete cosine transform (DCT) units (131, 139) transform the input time-domain speech signal into the frequency domain.

The cepstral coefficient extractors (132, 140) extract, from the frequency-domain speech signal, cepstral coefficients that characterize the speech signal.

As is well known, mel-frequency cepstral coefficients (MFCC) are used to represent the features of speech in fields such as speech recognition. The present invention extracts the features of the speech signal using linear-frequency cepstral coefficients (hereinafter "LFCC"), a variant of MFCC. Because the invention uses frequency bands on a linear scale, the coefficients are named LFCC to distinguish them from conventional MFCC, which use the mel scale.

In extracting the LFCC, the present invention may use the DCT instead of the discrete Fourier transform/fast Fourier transform (DFT/FFT), because the DCT can be implemented at lower cost than the DFT/FFT.

Referring to FIG. 4, which shows the spectral window used in the present invention, the first and last spectral windows used in extracting the LFCC have flat edges. This window shape allows better spectral envelope adjustment than the windows used for conventional MFCC/LFCC.
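A DCT-based LFCC extraction could look roughly like the sketch below. The text gives no band count, window formula, or coefficient count, so everything concrete here is an assumption: triangular windows with evenly spaced centers, a flat outer edge on the first and last window as one reading of FIG. 4, eight bands, six coefficients, and a second DCT applied to the log band energies as in conventional cepstral analysis.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, computed directly from its definition."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def linear_windows(num_bins, num_bands):
    """Triangular windows on a linear frequency scale; the first and last
    windows are held at 1.0 on their outer side (assumed reading of FIG. 4)."""
    spacing = num_bins / (num_bands + 1)
    windows = []
    for b in range(num_bands):
        center = (b + 1) * spacing
        w = []
        for k in range(num_bins):
            if (b == 0 and k < center) or (b == num_bands - 1 and k > center):
                w.append(1.0)  # flat outer edge
            else:
                w.append(max(0.0, 1.0 - abs(k - center) / spacing))
        windows.append(w)
    return windows

def lfcc(frame, num_bands=8, num_coeffs=6):
    """DCT spectrum -> linear band energies -> log -> DCT -> first coefficients."""
    power = [c * c for c in dct2(frame)]
    energies = [sum(w * p for w, p in zip(win, power))
                for win in linear_windows(len(frame), num_bands)]
    log_energies = [math.log(e + 1e-12) for e in energies]  # guard against log(0)
    return dct2(log_energies)[:num_coeffs]

frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
coeffs = lfcc(frame)
```

The flat edges mean that the lowest and highest bins still contribute fully to a band energy instead of being tapered toward zero.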

The discrete cosine transform units (131, 139) and the cepstral coefficient extractors (132, 140) described above capture the features of their respective input signals. The first pair is used to look up the high-band speech signal mapped from the narrowband speech signal, and the second pair is used to obtain the features of the high-band speech signal estimated by the high-band estimator (136).

Specifically, the first discrete cosine transform unit (131), connected to the input, transforms the input narrowband speech signal into the frequency domain, and the first cepstral coefficient extractor (132) outputs the LFCC that characterize the speech signal from the frequency-domain signal.

Using the LFCC output by the first cepstral coefficient extractor (132), the mapping unit (133) looks up, in a predetermined table (not shown), the high-band speech signal corresponding to the input narrowband speech signal. The table built into the mapping unit (133) is trained in advance, using a vector quantizer or the like, and stores the high-band speech signals corresponding to narrowband speech signals.
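The table lookup can be pictured as a nearest-neighbor search in a vector-quantizer codebook. The sketch below assumes that each table entry pairs a narrowband LFCC centroid with the high-band LFCC vector learned for it, and that the match is by squared Euclidean distance. The table layout, the distance measure, and the toy numbers are all illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def map_to_highband(narrowband_lfcc, codebook):
    """Return the high-band entry paired with the closest narrowband centroid.

    codebook is a list of (narrowband_centroid, highband_lfcc) pairs
    (an assumed layout for the pre-trained table).
    """
    _, highband = min(codebook,
                      key=lambda entry: squared_distance(narrowband_lfcc, entry[0]))
    return highband

# Made-up two-entry codebook, for illustration only.
toy_codebook = [
    ([1.0, 0.0], [0.2, 0.1]),
    ([0.0, 1.0], [0.9, 0.4]),
]
highband = map_to_highband([0.1, 0.8], toy_codebook)
print(highband)  # [0.9, 0.4]
```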

Meanwhile, the high-band speech signal estimated by the high-band estimator (136) is transformed into the frequency domain by the second discrete cosine transform unit (139), and the second cepstral coefficient extractor (140) outputs the LFCC that characterize the estimated high-band speech signal.

The envelope adjusting unit (141) adjusts the spectrum of the estimated high-band speech signal so that it matches the high-band speech signal output by the mapping unit (133).
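One simple way to picture the envelope adjustment is a per-band gain applied to the DCT coefficients. The text does not specify the adjustment rule, so this sketch makes several assumptions: the envelope is represented by log energies in equal-width bands (such values could be recovered from LFCC), and each band is scaled by the square root of the target-to-estimate energy ratio. All names are hypothetical.

```python
import math

def band_index(k, num_coeffs, num_bands):
    """Equal-width band that DCT coefficient k falls into."""
    return min(k * num_bands // num_coeffs, num_bands - 1)

def band_log_energies(coeffs, num_bands):
    """Log energy of the DCT coefficients in each band."""
    energies = [0.0] * num_bands
    for k, c in enumerate(coeffs):
        energies[band_index(k, len(coeffs), num_bands)] += c * c
    return [math.log(e + 1e-12) for e in energies]  # guard against log(0)

def adjust_envelope(coeffs, target_log_energies):
    """Scale each band so its log energy matches the target (assumed gain rule)."""
    num_bands = len(target_log_energies)
    current = band_log_energies(coeffs, num_bands)
    gains = [math.exp(0.5 * (t - c)) for c, t in zip(current, target_log_energies)]
    return [c * gains[band_index(k, len(coeffs), num_bands)]
            for k, c in enumerate(coeffs)]

estimated = [1.0, 1.0, 2.0, 2.0]           # toy DCT coefficients, two bands
target = [math.log(8.0), math.log(8.0)]    # toy target envelope
adjusted = adjust_envelope(estimated, target)
```

In this toy case the first band is boosted by a factor of about two while the second band, which already matches the target, is left essentially unchanged.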

The frequency-domain high-band speech signal adjusted by the envelope adjusting unit (141) is transformed into a time-domain signal by an inverse discrete cosine transform (IDCT) unit (142), which finally outputs the high-band speech signal in the 4 to 8 kHz band.

FIG. 5 is a flowchart of the bandwidth extension method for a speech signal according to the present invention.

Referring to FIG. 5, the bandwidth extension method according to the present invention is broadly divided into a step (200) of estimating a high-band speech signal from the input narrowband speech signal and a step (250) of adjusting the envelope of the estimated high-band speech signal using pre-trained, stored information on the high-band speech signal corresponding to the input narrowband speech signal.

First, to estimate the high-band speech signal, pitch information is obtained using the pitch filter (134) and the pitch inverse filter (135), and the pitch component contained in the narrowband speech signal is removed to flatten the pitch (step 202).

Next, using the pitch-removed narrowband speech signal, the spectral folding unit (137) performs upsampling, high-pass filtering, and downsampling in sequence to output a signal whose spectral components are mirror-symmetric to those of the original signal. The nonlinearizer (138) performs upsampling, nonlinearization, high-pass filtering, and downsampling to output a nonlinearly processed signal. The outputs of the spectral folding unit (137) and the nonlinearizer (138) are combined linearly, so that a high-band speech signal is estimated from the input narrowband speech signal (step 204).

Next, to refine the estimated high-band speech signal, the input narrowband speech signal is first transformed into the frequency domain and, as described above, the LFCC characterizing the narrowband speech signal are extracted. Using these LFCC, the mapping unit (133) outputs to the envelope adjusting unit (141) the LFCC information of the high-band speech signal that corresponds to the input narrowband speech signal, as trained in advance and stored in the predetermined table (step 254).

The envelope adjusting unit (141) uses the LFCC information output by the mapping unit (133) to adjust the spectrum of the high-band speech signal estimated by the high-band estimator (136) so that it matches the high-band speech signal of the mapping unit (133), thereby adjusting the spectral envelope of the high-band speech signal (step 256).

The high-band speech signal estimated and adjusted through the above steps is transformed into the time domain by the inverse discrete cosine transform unit (142) and output. As described with reference to FIG. 1, it is then combined with the low-band speech signal, and a wideband signal covering the 0 to 8 kHz band is finally output.

Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive sense and not for purposes of limitation. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

According to the present invention as described above, a high-band speech signal is estimated from a narrowband speech signal using the spectral folding and nonlinearization techniques, so the bandwidth of the speech signal can be extended effectively.

In addition, the present invention can be used at the receiving side of a communication system to provide a speech signal of improved quality.

Claims

An apparatus for extending the bandwidth of a narrowband speech signal, comprising:

means for estimating a high-band speech signal from the narrowband speech signal using spectral folding and nonlinearization techniques; and

means for adjusting the spectral envelope of the estimated high-band speech signal.

The apparatus of claim 1,

wherein the means for estimating a high-band speech signal from the narrowband speech signal comprises:

a spectral folding unit configured to upsample the narrowband speech signal, filter it with a high-pass filter, and downsample it, thereby outputting a signal whose spectral components are mirror-symmetric to those of the narrowband speech signal; and

a nonlinearizer configured to apply a nonlinearity to the narrowband speech signal, and wherein the outputs of the spectral folding unit and the nonlinearizer are linearly combined to estimate the high-band speech signal from the narrowband speech signal.

The apparatus of claim 1,

wherein the means for adjusting the spectral envelope of the estimated high-band speech signal comprises:

a mapping unit configured to output a high-band speech signal corresponding to the narrowband speech signal using a predetermined table; and

an envelope adjusting unit configured to adjust the spectrum of the estimated high-band speech signal to match the high-band speech signal output by the mapping unit.

The apparatus of claim 3,

wherein the mapping unit and the envelope adjusting unit use features of the speech signal given by linear-frequency cepstral coefficients (LFCC).

A method of extending the bandwidth of a narrowband speech signal, comprising:

removing a pitch component of the narrowband speech signal and estimating a high-band speech signal from the pitch-removed narrowband speech signal using spectral folding and nonlinearization techniques; and

extracting linear-frequency cepstral coefficients (LFCC) of the narrowband speech signal, looking up a high-band speech signal corresponding to the narrowband speech signal, and adjusting the spectral envelope of the estimated high-band speech signal to match the high-band speech signal found.