KR20170098761A - Apparatus and method for extending bandwidth of earset with in-ear microphone

Info

Publication number: KR20170098761A
Application number: KR1020170103939A
Authority: KR
Inventors: 김은동
Original assignee: 주식회사 오르페오사운드웍스
Priority date: 2015-12-30
Filing date: 2017-08-17
Publication date: 2017-08-30
Also published as: KR20170080387A; KR101850693B1

Abstract

Disclosed are an apparatus and a method for extending the bandwidth of an earset having an in-ear microphone. The apparatus comprises: a high-frequency signal generation unit that generates a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by doubling the frequency of the super-narrowband signal to extend it and then filtering the result; and a mixing unit that mixes the high-frequency signal with the super-narrowband signal. According to the invention, the narrowband signal input to the in-ear microphone is extended into the high-frequency band by simple doubling, and the high-frequency band is extracted from the extended band by simple filtering alone, so the amount of computation is markedly reduced. The reduced computation makes real-time processing possible and thereby prevents signal transmission delay and similar problems.

Description

The present invention relates to a speech restoration technique. More specifically, it relates to an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which restore the high-frequency range from the low-frequency range input to the in-ear microphone.

In recent years, many earsets that integrate a speaker and a microphone have been proposed.

Such an earset can both deliver sound to the ear canal and pick up the user's voice within a single body. The speaker therefore usually faces the ear canal to deliver sound, and the microphone is exposed to the outside to pick up the user's voice.

However, a microphone exposed to the outside picks up external noise along with the user's voice.

To solve this external noise problem, earsets in which the microphone (an in-ear microphone) faces the ear canal have been proposed. However, the voice transmitted from the vocal cords through the Eustachian tube toward the eardrum is limited to a low range of about 0 to 2 kHz, so it is difficult to restore the original sound from the low range input to the in-ear microphone alone.

To solve this loss of the high-frequency range, a technique has been proposed that uses several microphones and synthesizes the speech in the different frequency bands they pick up to restore the original sound. That is, an in-ear microphone installed on the ear canal side and an out-ear microphone installed outside the auricle are provided together, and the speech in the different frequency bands input to the in-ear and out-ear microphones is synthesized to restore the original sound.

The existing technique of restoring the original sound by synthesizing speech is described next.

FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.

Referring to FIG. 1, the conventional speech synthesis apparatus, which was filed by the present applicant, comprises: an in-ear microphone (1); a frequency band extension unit (2) that extends the signal from the in-ear microphone (1) into a low-frequency band and a high-frequency band; a low-frequency band signal extraction unit (3) that extracts a low-frequency band signal from the extended signal; at least one out-ear microphone (4); a beamforming unit (5) that applies beamforming to the signal from the out-ear microphone (4); a high-frequency band signal extraction unit (6) that extracts a high-frequency band signal from the beamformed signal; a voice activity detection unit (7) that senses the voice amplitude of at least one channel to decide whether to generate packets; and a synthesis unit (8) that is driven in response to voice activity detection and synthesizes the low-frequency and high-frequency band signals delivered from the low-frequency band signal extraction unit (3) and the high-frequency band signal extraction unit (6).

The conventional speech synthesis apparatus configured in this way restores the original sound by synthesizing the beamformed high-frequency band signal from the out-ear microphone (4) with the low-frequency band signal from the in-ear microphone (1).

However, the conventional speech synthesis apparatus requires multiple microphones, which increases the manufacturing cost. In addition, external noise still enters through the out-ear microphone (4), so completely removing external noise is practically impossible, and filtering to remove external noise is always required.

Meanwhile, techniques used for high-frequency restoration include spectral folding, spectral shifting, and linear predictive coding (LPC).

Linear predictive coding is widely used in speech encoding and decoding. As can be seen from, for example, US 8,306,249, a linear predictive coding algorithm is also used in speech decoding apparatuses usable in hearing aids and the like.

According to the source-filter modeling used in linear predictive coding, sound is generated by the vibration of the vocal cords (the source) and is emitted as different sounds depending on the structure of the oral cavity, nasal cavity, and mouth (the filter). Source-filter modeling is the mathematical model of this process. That is, by modeling the source and applying a filter to it, the process by which the vibration is reproduced as a voice (the original sound) can be modeled.
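The source-filter idea above can be illustrated with a minimal Python sketch. It is not taken from the patent: the pulse-train source, the one-coefficient all-pole filter, and the names `impulse_train` and `all_pole_filter` are assumptions chosen only to make the model concrete.

```python
def impulse_train(n_samples, period):
    """Source: a periodic pulse train standing in for vocal-cord vibration."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation, a):
    """Filter: y[n] = x[n] - sum_k a[k] * y[n-1-k], a crude vocal-tract stand-in.

    `a` holds the denominator coefficients of 1 / A(z) without the leading 1.
    """
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


# Hypothetical parameters: one pulse every 8 samples, a single pole at z = 0.5.
source = impulse_train(16, 8)
voice = all_pole_filter(source, [-0.5])
```

Each pulse of the source excites a decaying response of the filter, so the output carries both the pitch of the source and the coloring of the filter.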

FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using the conventional linear predictive coding technique.

Referring to FIG. 2, the conventional speech synthesis apparatus comprises: a linear prediction analysis unit (11) that determines an excitation signal from the input narrowband signal; an excitation signal extension unit (12) that extends the determined excitation signal into a wideband excitation signal by spectral folding, producing the sound; a feature extraction unit (13) that extracts voice feature information from the input narrowband signal, the low-frequency envelope information, and the excitation signal; a spectral envelope extension unit (14) that, according to the feature information, applies one of codebook mapping, an artificial neural network, or a Gaussian mixture model to the envelope component expressed as line spectral frequencies and outputs a wideband high-frequency signal, producing the voice; and a synthesis unit (15) that synthesizes the wideband excitation signal and the wideband high-frequency signal to restore the original sound.

However, in the conventional speech synthesis apparatus configured in this way, the codebook mapping, artificial neural network, and Gaussian mixture model techniques used in the spectral envelope extension unit (14) require so much computation that real-time processing is difficult. For example, when the processing runs on the chipset (DSP) of a Bluetooth earset or headset, the heavy computation can cause delay. A further problem is that applying the existing linear predictive coding technique to the low range input to an in-ear microphone is not suitable.

Document 1. US 8,306,249 (Siemens Medical Instruments Pte. Ltd.; Rosenkranz, Tobias), 2012.11.6
Document 2. Korean Intellectual Property Office, Patent Registration No. 10-0517229, "Method and Apparatus for Improving Recognition Performance of High Frequency Restoration Coding Method by Adaptive Filtering"

Accordingly, the present invention has been made to solve the above problems of the prior art. Its object is to provide an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, in which the narrowband signal input to the in-ear microphone is simply extended into the high-frequency band and the high-frequency band is extracted from the extended band by simple filtering.

To achieve this object, the apparatus for extending the bandwidth of an earset having an in-ear microphone according to the present invention may preferably include: a high-frequency signal generation unit that generates a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by doubling the frequency of the super-narrowband signal to extend it and then filtering the result; and a mixing unit that mixes the high-frequency signal with the super-narrowband signal.

Here, the high-frequency signal generation unit may include: a first linear prediction analysis unit that determines the excitation signal from the super-narrowband signal; an excitation signal extension unit that extends the determined excitation signal into a wideband excitation signal; a high-frequency spectrum extension unit that multiplies the frequency of the super-narrowband signal (N-fold) to extend it into a wideband signal containing a high-frequency band signal; a second linear prediction analysis unit that estimates and determines the high-frequency band signal from the extended wideband signal; a filtering unit that filters the high-frequency band signal output from the second linear prediction analysis unit; and a synthesis unit that synthesizes the high-frequency band signal output from the filtering unit with the wideband excitation signal output from the excitation signal extension unit. The excitation signal may be extended using spectral folding. The wideband signal may be extended using either spectral folding or modulation.

The high-frequency signal generation unit and the mixing unit may be implemented in the circuitry of the earset, in which case the earset may include a Bluetooth chipset. Alternatively, the high-frequency signal generation unit and the mixing unit may be implemented in the circuitry of a smartphone.

The method for extending the bandwidth of an earset having an in-ear microphone according to the present invention may preferably include: (a) generating a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by doubling the frequency of the super-narrowband signal to extend it and then filtering the result; and (b) mixing the high-frequency signal with the super-narrowband signal.

Here, step (a) may include: determining the excitation signal from the super-narrowband signal; extending the determined excitation signal into a wideband excitation signal; multiplying the frequency of the super-narrowband signal (N-fold) to extend it into a wideband signal containing a high-frequency band signal; estimating and determining the high-frequency band signal from the extended wideband signal; filtering the determined high-frequency band signal; and synthesizing the filtered high-frequency band signal with the extended wideband excitation signal.

As described above, according to the apparatus and method for extending the bandwidth of an earset having an in-ear microphone of the present invention, the narrowband signal input to the in-ear microphone is extended into the high-frequency band by simple multiplication, and the high-frequency band is extracted from the extended band by simple filtering alone, so the amount of computation can be markedly reduced.

Because the reduced computation allows real-time processing, signal transmission delay and similar problems can be prevented.

FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.
FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using the conventional linear predictive coding technique.
FIG. 3 is a control circuit block diagram of an apparatus for extending the bandwidth of an earset having an in-ear microphone, according to one embodiment of the present invention.
FIG. 4 is a conceptual diagram of an application of the present invention to a wireless earset or headset.
FIG. 5 is a conceptual diagram of another application of the present invention, to a wired earset or headset.
FIG. 6 is a flowchart of a method for extending the bandwidth of an earset having an in-ear microphone, according to one embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to preferred embodiments and the accompanying drawings, on the premise that like reference numerals in the drawings denote like elements.

When the detailed description or the claims state that one element "includes" another element, this is not to be construed, unless specifically stated otherwise, as meaning that it consists only of that element; it should be understood that other elements may also be included.

In addition, elements named "means", "unit", "module", or "block" in the detailed description or the claims denote units that process at least one function or operation, and each may be implemented in software, hardware, or a combination of the two.

The present invention relates to a method of restoring a high-frequency band signal from the low-frequency band signal of a user's voice delivered through an in-ear microphone. In particular, it presents a technique that enables real-time restoration of high frequencies using a Bluetooth DSP.

As described above, sound is generated by the vibration of the vocal cords (the source) and is emitted as different sounds depending on the structure of the oral cavity, nasal cavity, and mouth (the filter). Speech is therefore divided into an excitation signal component, which represents the vibration (source) or the turbulence produced as air passes through a narrow gap, and an envelope component, which produces the voice (filter). Normally the excitation signal component and the envelope component each undergo wideband extension. Because the excitation signal component influences the reconstructed sound relatively little compared with the envelope component, it is extended with a computationally cheap spectral folding or spectral translation technique. For the envelope component, however, which is expressed as line spectral frequencies, codebook mapping, an artificial neural network, a Gaussian mixture model, or a hidden Markov model (HMM) is used to output a wideband high-frequency signal and generate the voice, which has the drawback of a large amount of computation. As a result, restoring high frequencies in real time on, for example, a Bluetooth DSP is practically impossible. The present invention therefore proposes a way to enable real-time high-frequency restoration and original-sound restoration by significantly reducing the amount of computation.

An example implementation of the apparatus and method for extending the bandwidth of an earset having an in-ear microphone according to the present invention is described below through a specific embodiment.

FIG. 3 is a control circuit block diagram of an apparatus for extending the bandwidth of an earset having an in-ear microphone, according to one embodiment of the present invention.

Referring to FIG. 3, the bandwidth extension apparatus of the present invention includes: a first linear prediction analysis unit (21) that determines an excitation signal from the input super-narrowband signal; an excitation signal extension unit (22) that extends the determined excitation signal into a wideband excitation signal by spectral folding, producing the sound; a high-frequency spectrum extension unit (23) that multiplies the frequency of the super-narrowband signal (N-fold) to extend it into a wideband signal containing a high-frequency band signal; a second linear prediction analysis unit (24) that estimates and determines the high-frequency band signal from the extended wideband signal; a filtering unit (25) that filters the high-frequency band signal output from the second linear prediction analysis unit (24); a synthesis unit (26) that synthesizes the high-frequency band signal output from the filtering unit (25) with the wideband excitation signal output from the excitation signal extension unit (22); and a mixing unit (27) that mixes the high-frequency signal output from the synthesis unit (26) with the super-narrowband signal.
As described above, the bandwidth extension apparatus of the present invention broadly consists of two parts: a high-frequency signal generation unit, which generates a high-frequency signal by synthesizing the excitation signal extended from the input super-narrowband signal with the high-frequency band signal obtained by doubling the frequency of the super-narrowband signal to extend it and then filtering the result; and the mixing unit (27), which mixes the high-frequency signal with the super-narrowband signal.
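As an illustration of what a linear prediction analysis unit such as unit (21) might compute, the following Python sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then takes the prediction residual as the excitation estimate. The patent does not state which LPC estimation method or model order is used, so the method, the order, and all function names here are assumptions.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns the coefficient list (with a[0] = 1) and the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def residual(x, a):
    """Excitation estimate e[n] = sum_k a[k] * x[n - k] (inverse filtering)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


# Hypothetical test signal: impulse response of a one-pole filter with pole 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, pred_err = levinson_durbin(autocorrelation(signal, 2), 2)
excitation = residual(signal, coeffs)
```

For this toy signal the analysis recovers the pole, and the residual collapses to the single pulse that produced the signal, which is the sense in which the residual plays the role of the excitation.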

As an example, when the high-frequency spectrum extension unit (23) upsamples the super-narrowband signal (0 to 2 kHz) by a factor of two, the upsampled signal is sampled at 4 kHz. The signal output from the high-frequency spectrum extension unit (23) then matches the input in the 0 to 4 kHz band and, in the high-frequency band of 4 to 8 kHz, has the same spectrum as a folded version of the input signal. This spectrum is used to estimate the high-frequency band signal. The filtering unit (25) then extracts the speech signal in the 4 to 8 kHz band. Next, the synthesis unit (26) synthesizes the 0 to 4 kHz speech signal with the 4 to 8 kHz speech signal, and the high-frequency speech output from the synthesis unit (26) is mixed with the super-narrowband signal as it was before extension (0 to 2 kHz) to finally restore the original sound.
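The folding effect described above can be checked with a small Python sketch. It assumes that the upsampling is done by zero insertion with no interpolation filter, which is one common way to obtain a folded image; the patent does not spell out the upsampler. The example works in DFT bins rather than the kHz figures of the text, and the names `upsample_by_two` and `dft_magnitude` are illustrative.

```python
import cmath
import math


def upsample_by_two(x):
    """Insert a zero after every sample (no interpolation filter)."""
    out = []
    for s in x:
        out.extend([s, 0.0])
    return out


def dft_magnitude(x):
    """Magnitude of the plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


n = 32
tone_bin = 3                                   # hypothetical low-band tone
narrow = [math.cos(2 * math.pi * tone_bin * t / n) for t in range(n)]
wide = upsample_by_two(narrow)                 # 64 samples at twice the rate
mag = dft_magnitude(wide)

# Bins 0..32 cover 0 to the new Nyquist frequency; bin 16 is the old Nyquist.
peaks = [k for k in range(len(wide) // 2 + 1) if mag[k] > 1.0]
```

The tone stays at bin 3 and a mirror image appears at bin 29, reflected about the old Nyquist bin 16. That image is the folded high-band content that a later filter can isolate.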

In this way, the bandwidth extension apparatus of the present invention can restore the original sound even when a super-narrowband signal is input to the in-ear microphone. That is, whereas high-frequency restoration algorithms generally extend 0 to 4 kHz up to 8 kHz, the present invention performs restoration on the super-narrowband signal below 2 kHz that is input to the in-ear microphone. Moreover, the present invention can restore the original sound even though the amount of computation is markedly reduced.

Compared with the conventional speech synthesis apparatus of FIG. 2, the function of extending the excitation signal after linear predictive coding is retained, but the function of the spectral envelope extension unit is removed. Whereas the conventional apparatus predicts and extends frequencies through an algorithm based on linear predictive coding, the present invention performs no such prediction-based extension and instead performs a simple frequency extension through high-frequency spectrum extension. That is, the operation of predicting frequencies and generating them in real time is omitted, and only the frequency range is extended, using spectral folding or modulation. The amount of computation can thereby be greatly reduced.
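The text names modulation as an alternative to spectral folding without giving details. One common reading is multiplication by a cosine carrier, which shifts spectral content up and down by the carrier frequency. The Python sketch below illustrates only that general property; the carrier bin, the tone, and the function names are assumptions, not values from the patent.

```python
import cmath
import math


def modulate(x, carrier_bin):
    """Multiply by a cosine carrier with `carrier_bin` cycles per frame."""
    n = len(x)
    return [s * math.cos(2 * math.pi * carrier_bin * t / n)
            for t, s in enumerate(x)]


def dft_magnitude(x):
    """Magnitude of the plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


n = 64
tone = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]   # tone at bin 3
shifted = modulate(tone, 16)                                   # carrier at bin 16
mag = dft_magnitude(shifted)
peaks = [k for k in range(n // 2 + 1) if mag[k] > 1.0]
```

The tone at bin 3 is replaced by two components at bins 13 and 19 (carrier minus and plus 3), so low-band content is moved into a higher band without any prediction step.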

When the high-frequency spectrum extension unit (23) outputs a wideband signal by simply extending the frequency in this way, linear prediction analysis is performed on that signal, and then only simple filtering with a filter is performed, without frequency extension through linear prediction modeling. That is, filtering that approximates the original sound is achieved without bandwidth extension. The filtered result is then synthesized with the extended excitation signal to generate the high-frequency signal. Finally, mixing the high-frequency signal with the super-narrowband signal input through the in-ear microphone restores the original sound.
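The "simple filtering" and final mixing steps can be pictured with the following Python sketch. The patent does not specify the filter design or the mixing gains, so the two-tap first-difference high-pass filter, the `high_gain` parameter, and the assumption that both inputs are already at the same sample rate are all illustrative choices.

```python
def first_difference_highpass(x):
    """y[n] = (x[n] - x[n-1]) / 2: gain 0 at DC, gain 1 at the Nyquist frequency."""
    prev = 0.0
    out = []
    for s in x:
        out.append((s - prev) / 2.0)
        prev = s
    return out


def mix(low_band, high_band, high_gain=1.0):
    """Sample-wise sum of the low band and the scaled high band."""
    return [lo + high_gain * hi for lo, hi in zip(low_band, high_band)]


constant = [1.0] * 8                              # pure DC (lowest frequency)
alternating = [(-1.0) ** n for n in range(8)]     # pure Nyquist (highest frequency)

hp_const = first_difference_highpass(constant)    # DC is rejected after start-up
hp_alt = first_difference_highpass(alternating)   # Nyquist passes unchanged
mixed = mix(constant, hp_alt, high_gain=0.5)
```

A real implementation would use a longer band-pass filter tuned to the target high band; the point here is only that the filter and the mix are cheap per-sample operations.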

FIG. 4 is a conceptual diagram of an application of the present invention to a wireless earset or headset.

FIG. 4 illustrates the case in which the bandwidth extension of the present invention is performed in the DSP of the earset, for example a Bluetooth chipset (DSP). In this case the amount of computation is markedly reduced, so real-time processing in the Bluetooth chipset is possible and wireless transmission delay can be minimized. Of course, the earset and the smartphone may instead be connected by wire.

FIG. 5 is a conceptual diagram of another application of the present invention, to a wired earset or headset.

FIG. 5 illustrates the case in which the bandwidth extension of the present invention is performed in a smartphone or similar device. In this case the earset and the smartphone may be connected by wire, and real-time processing is possible in the smartphone chipset. Of course, the earset and the smartphone may instead be connected wirelessly.

The method of the present invention for extending the bandwidth of an earset having an in-ear microphone, using the bandwidth extension apparatus configured as above, is described next.

FIG. 6 is a flowchart of a method for extending the bandwidth of an earset having an in-ear microphone, according to one embodiment of the present invention.

Referring to FIG. 6, when a super-narrowband signal is input to the in-ear microphone (S1), the excitation signal is extended as follows: an excitation signal is determined from the input super-narrowband signal (S2), and the determined excitation signal is extended into a wideband excitation signal (S3).

Meanwhile, the frequency of the input super-narrowband signal is doubled to extend it into a wideband signal containing a high-frequency band signal (S4).

The high-frequency band signal is then estimated and determined from the extended wideband signal (S5).

Next, the estimated and determined high-frequency band signal is filtered (S6).

The filtered high-frequency band signal and the wideband excitation signal are then synthesized to generate the high-frequency signal (S7).

Finally, the high-frequency signal and the super-narrowband signal are mixed to restore the original sound (S8).
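The order of steps S2 to S8, including the two branches that both start from the same input, can be summarized in a short Python sketch. Only the data flow follows the text; the function `extend_bandwidth`, the stage names in the dictionary, and the string-tagging stand-in stages are hypothetical and perform no signal processing.

```python
def extend_bandwidth(narrowband, stages):
    """Wire the stages in the order of steps S2 to S8 of FIG. 6."""
    # Excitation branch
    excitation = stages["lp_analysis_1"](narrowband)            # S2
    wide_excitation = stages["extend_excitation"](excitation)   # S3
    # Spectrum branch
    wideband = stages["extend_spectrum"](narrowband)            # S4
    high_band = stages["lp_analysis_2"](wideband)               # S5
    filtered = stages["band_filter"](high_band)                 # S6
    # Join the branches, then mix with the original input
    high_signal = stages["synthesize"](filtered, wide_excitation)  # S7
    return stages["mix"](high_signal, narrowband)               # S8


# Stand-in stages that only tag their input, to make the data flow visible.
trace_stages = {
    "lp_analysis_1": lambda x: f"exc({x})",
    "extend_excitation": lambda e: f"wide({e})",
    "extend_spectrum": lambda x: f"x2({x})",
    "lp_analysis_2": lambda w: f"est({w})",
    "band_filter": lambda h: f"filt({h})",
    "synthesize": lambda h, e: f"synth({h},{e})",
    "mix": lambda hf, x: f"mix({hf},{x})",
}
result = extend_bandwidth("snb", trace_stages)
```

Passing the stages in as callables keeps the wiring separate from any particular choice of LPC order, extension technique, or filter.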

In this way, the present invention consists of two paths: one that extends the excitation signal obtained from the super-narrowband signal (0 to 2 kHz) by linear prediction, and one that performs a simple frequency extension of the super-narrowband signal, predicts the high-frequency signal from the simply extended signal by linear prediction, and simply filters the predicted high-frequency signal. The extended excitation signal and the filtered high-frequency signal are then synthesized to generate the high-frequency signal, and a wideband signal (0 to 8 kHz) is generated from the high-frequency signal and the super-narrowband signal. Spectral folding or modulation may be used as the simple extension technique in the high-frequency spectrum extension unit (23).

The technical idea of the present invention has been described above through several embodiments.

It is obvious that a person of ordinary skill in the art to which the present invention pertains can variously modify or change the embodiments described above based on the description of the present invention. Moreover, even where not explicitly shown or described, it is obvious that such a person can make various modifications that embody the technical idea of the present invention based on its description, and such modifications still fall within the scope of the present invention. The embodiments described above with reference to the accompanying drawings are provided to illustrate the present invention, and the scope of the present invention is not limited to these embodiments.

21: First linear prediction analysis unit
22: Excitation signal extension unit
23: High-frequency spectrum extension unit
24: Second linear prediction analysis unit
25: Filtering unit
26: Synthesis unit
27: Mixing unit

Claims

A device for extending the bandwidth of an earset having an in-ear microphone, the device comprising:
a high-frequency signal generating unit that generates a high-frequency signal by synthesizing an excitation signal extended from a super-narrowband signal within 2 kHz input from the in-ear microphone and a high-frequency band signal in the range of 4 kHz to 8 kHz that is extended and filtered by doubling the frequency of the super-narrowband signal; and
a mixing unit that mixes the high-frequency signal and the super-narrowband signal,
wherein the in-ear microphone blocks sound from outside the ear and collects the raw voice transmitted through the Eustachian tube toward the auricle;
wherein the high-frequency signal generating unit minimizes the amount of computation by simply doubling the frequency of the super-narrowband signal, within the 0 to 4 kHz range of the raw voice, to extend it to a high-frequency band and by extracting the high-frequency band through simple filtering in the extended high-frequency band; and
wherein the mixing unit restores the sound by mixing the super-narrowband signal within 2 kHz of the raw voice with the high-frequency band signal that the high-frequency signal generating unit generated with the minimized amount of computation.

The device according to claim 1,
wherein the high-frequency signal generating unit comprises:
a first linear prediction analysis unit that determines the excitation signal from the super-narrowband signal through linear prediction;
an excitation signal extension unit that extends the determined excitation signal to a wideband excitation signal within 4 kHz;
a high-frequency spectrum extension unit that doubles the frequency of the super-narrowband signal to extend it to a wideband signal including a high-frequency band signal;
a second linear prediction analysis unit that linearly predicts the high-frequency band signal from the extended wideband signal;
a filtering unit that filters the high-frequency band signal output from the second linear prediction analysis unit to the range of 4 kHz to 8 kHz using a high-pass filter; and
a synthesis unit that synthesizes the high-frequency band signal output from the filtering unit and the wideband excitation signal output from the excitation signal extension unit.

The device according to claim 1,
wherein the extension of the excitation signal uses a spectral folding technique.

The device according to claim 1,
wherein the extension of the wideband signal uses either a spectral folding technique or a modulation technique.

The device according to claim 1,
wherein the high-frequency signal generating unit and the mixing unit are implemented in a circuit of the earset.

The device according to claim 5,
wherein the earset includes a Bluetooth chipset.

The device according to claim 1,
wherein the high-frequency signal generating unit and the mixing unit are implemented in a circuit of a smartphone.

A method for extending the bandwidth of an earset having an in-ear microphone, the method comprising:
(a) generating a high-frequency signal by synthesizing an excitation signal extended from a super-narrowband signal within 2 kHz input from the in-ear microphone and a high-frequency band signal in the range of 4 kHz to 8 kHz that is extended by doubling the frequency of the super-narrowband signal; and
(b) mixing the high-frequency signal and the super-narrowband signal,
wherein the in-ear microphone blocks sound from outside the ear and collects the raw voice transmitted through the Eustachian tube toward the auricle;
wherein, in step (a), the amount of computation is minimized by doubling the frequency of the super-narrowband signal, within the 0 to 4 kHz range of the raw voice, to extend it to a high-frequency band and by extracting the high-frequency band through simple filtering in the extended high-frequency band; and
wherein, in step (b), the sound is restored by mixing the super-narrowband signal within 2 kHz of the raw voice with the high-frequency band signal generated with the minimized amount of computation.

The method according to claim 8,
wherein step (a) comprises:
determining the excitation signal from the super-narrowband signal;
extending the determined excitation signal to a wideband excitation signal;
extending the super-narrowband signal to a wideband signal including a high-frequency band signal by doubling (N times) the frequency of the super-narrowband signal;
estimating and determining a high-frequency band signal from the extended wideband signal;
filtering the determined high-frequency band signal; and
synthesizing the filtered high-frequency band signal with the extended wideband excitation signal.