WO2017116022A1 - Apparatus and method for extending bandwidth of earset having in-ear microphone - Google Patents

Apparatus and method for extending bandwidth of earset having in-ear microphone

Info

Publication number
WO2017116022A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
high frequency
frequency band
wideband
band signal
Prior art date
Application number
PCT/KR2016/013989
Other languages
French (fr)
Korean (ko)
Inventor
김은동
Original Assignee
주식회사 오르페오사운드웍스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020160009803A (published as KR20170080387A)
Application filed by 주식회사 오르페오사운드웍스
Publication of WO2017116022A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06: Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G10L21/04: Time compression or expansion
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/10: Earpieces; Attachments therefor; Earphones; Monophonic headphones

Definitions

  • the present invention relates to a speech reconstruction technique. More specifically, the present invention relates to an apparatus and method for expanding bandwidth of an earset having an in-ear microphone for recovering a high range from a low range input to an in-ear microphone.
  • Such an earset may perform a function of transmitting sound to the ear canal and a function of collecting a user's voice in one body.
  • the speaker is directed toward the ear canal for sound transmission, and the microphone is exposed to the outside for collecting user voice.
  • However, a microphone exposed to the outside picks up not only the user's voice but also external noise.
  • To solve the external noise problem, an earset with a microphone installed facing the ear canal (an in-ear microphone) has been proposed. However, the voice transmitted from the vocal cords toward the eardrum through the Eustachian tube lies in a low range of about 0 to 2 kHz, so it is difficult to restore the original sound from the low range input to the in-ear microphone alone.
  • FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.
  • The conventional speech synthesis apparatus consists of an in-ear microphone 1; a frequency band extension unit 2 that extends the signal transmitted from the in-ear microphone 1 into a low frequency band and a high frequency band; a low frequency band signal extraction unit 3 that extracts a low frequency band signal from the extended signal; at least one out-ear microphone 4; a beamforming unit 5 that beamforms the signal transmitted from the out-ear microphone 4; a high frequency band signal extraction unit 6 that extracts a high frequency band signal from the beamformed signal; a voice activity detection unit 7 that senses the amplitude of the voice on at least one channel to decide whether to generate a packet; and a synthesis unit 8 that, driven in response to the voice activity detection, synthesizes the low frequency band signal and the high frequency band signal transmitted from the low frequency band signal extraction unit 3 and the high frequency band signal extraction unit 6.
  • the original sound is restored by synthesizing the beamformed high frequency band signal on the out-ear microphone 4 side and the low frequency band signal transmitted from the in-ear microphone 1.
  • techniques used for treble recovery include spectral folding, spectral shifting, nonlinear processing using a rectifier, and linear predictive coding (LPC).
  • linear predictive encoding technique is widely used in speech encoding and decoding, and a linear predictive encoding algorithm may be used in a speech decoding apparatus that can be used in hearing aids or the like as described in US Pat. No. 8,306,249.
  • Sound is generated (source) by the vibration of the vocal cords and is shaped (filter) into different sounds by the oral cavity, the nasal cavity, and the mouth structure.
  • Source-filter modeling is the mathematical model of this process. In other words, by modeling the source and adding a filter to it, one can model how the vibration is reproduced as a voice (the original sound).
  • FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using a conventional linear prediction coding technique.
  • The conventional speech synthesis apparatus consists of a linear prediction analyzer 11 that determines an excitation signal from an input narrowband signal; an excitation signal extension unit 12 that generates sound by outputting a wideband excitation signal from the determined excitation signal through a spectral folding technique, a Gaussian noise passband conversion technique, or the like; a feature extraction unit 13 that extracts voice feature information from the input narrowband signal, low frequency envelope information, and the excitation signal; a spectral envelope extension unit 14 that generates voice by outputting a wideband high frequency signal, applying one of codebook mapping, an artificial neural network, or a Gaussian mixture model to the envelope component, expressed as line spectral frequencies, in accordance with the feature information; and a synthesis unit 15 that restores the original sound by synthesizing the wideband excitation signal and the wideband high frequency signal.
  • In the conventional speech synthesis apparatus configured in this way, the codebook mapping, artificial neural network, and Gaussian mixture model techniques used by the spectral envelope extension unit 14 require so much computation that real-time processing is difficult.
  • For example, when the processing is done on a chipset (DSP) included in a Bluetooth earset/headset, the large amount of computation may cause a delay.
  • it is not suitable to apply the existing linear predictive coding technique to the low range input to the in-ear microphone.
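  • An object of the present invention is to provide an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which simply extends a narrowband signal input to the in-ear microphone into a high frequency band and extracts the high frequency band from the extended band through simple filtering.
  • To achieve this object, the apparatus for extending the bandwidth of an earset having an in-ear microphone may preferably include a high frequency signal generation unit that generates a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and a mixing unit that mixes the high frequency signal and the super-narrowband signal.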
  • The high frequency signal generation unit may include a first linear prediction analyzer that determines the excitation signal from the super-narrowband signal; an excitation signal extension unit that extends the determined excitation signal into a wideband excitation signal; a high frequency spectral expansion unit that multiplies (N times) the frequency of the super-narrowband signal to extend it into a wideband signal including a high frequency band signal; a second linear prediction analyzer that estimates and determines a high frequency band signal from the extended wideband signal; a filtering unit that filters the high frequency band signal output from the second linear prediction analyzer; and a synthesis unit that synthesizes the high frequency band signal output from the filtering unit and the wideband excitation signal output from the excitation signal extension unit.
  • the extension of the excitation signal may use any one of a spectral folding technique and a Gaussian noise passband conversion technique.
  • The extension into the wideband signal may use any one of rectifier, spectral folding, and modulation techniques.
  • the high frequency signal generation unit and mixing unit may be configured in the circuit of the ear set, wherein the ear set may include a Bluetooth chipset.
  • the high frequency signal generation unit and the mixing unit may be configured in the circuit of the smartphone.
  • The bandwidth extension method for an earset having an in-ear microphone may preferably include (a) generating a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and (b) mixing the high frequency signal and the super-narrowband signal.
  • Step (a) may include determining the excitation signal from the super-narrowband signal; extending the determined excitation signal into a wideband excitation signal; multiplying (N times) the frequency of the super-narrowband signal to extend it into a wideband signal including a high frequency band signal; estimating and determining a high frequency band signal from the extended wideband signal; filtering the determined high frequency band signal; and synthesizing the filtered high frequency band signal and the extended wideband excitation signal.
  • The narrowband signal input to the in-ear microphone is simply multiplied in frequency to extend it into a high frequency band, and the high frequency band is extracted from the extended band by simple filtering alone, so the amount of computation can be significantly reduced.
  • The reduced computation makes real-time processing possible, which prevents signal transmission delay.
  • FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.
  • FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using a conventional linear prediction coding technique.
  • FIG. 3 is a block diagram of a control circuit of an apparatus for expanding a bandwidth of an ear set having an in-ear microphone according to one embodiment of the present invention.
  • FIG. 4 is a conceptual diagram of an application of the present invention in the case of being applied to a wireless earset / headset.
  • FIG. 5 is a conceptual diagram when the present invention is applied to a wired earset / headset as another application example.
  • FIG. 6 is a flow chart of a method for bandwidth expansion of an earset with an in-ear microphone as an embodiment of the present invention.
  • The term "... means" denotes a unit that processes at least one function or operation; each such unit may be implemented in software, in hardware, or in a combination of the two.
  • the present invention relates to a method of recovering a high frequency band signal from a low frequency band signal of a user voice transmitted through an in-ear microphone.
  • It is a technique that enables real-time restoration of high-range sound using the DSP of a Bluetooth chipset.
  • Sound is generated by the vibration of the vocal cords and is shaped into different sounds by the oral cavity, the nasal cavity, and the mouth structure. That is, speech is divided into an excitation signal component, which represents the source or the turbulence generated as air passes through narrow gaps, and an envelope component, which forms the filter.
  • The excitation signal component and the envelope component are each subjected to a wideband extension process. Since the influence of the excitation signal component is relatively small compared to the envelope component, a spectral folding technique or a spectral shifting technique is used for it.
  • the present invention proposes a method for enabling real-time high-pitched sound restoration and original sound restoration by significantly reducing the amount of computation.
  • FIG. 3 is a block diagram of a control circuit of an apparatus for expanding a bandwidth of an ear set having an in-ear microphone according to one embodiment of the present invention.
  • The bandwidth extension apparatus of the present invention includes a first linear prediction analyzer 21 that determines an excitation signal from an input super-narrowband signal; an excitation signal extension unit 22 that generates sound by outputting a wideband excitation signal from the determined excitation signal through a spectral folding technique or a Gaussian noise passband conversion technique; and a high frequency spectral expansion unit 23 that multiplies (N times) the frequency of the super-narrowband signal to extend it into a wideband signal including a high frequency band signal.
  • In other words, the bandwidth extension apparatus comprises a high frequency signal generation unit, which generates a high frequency signal by synthesizing the excitation signal extended from the input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it, and a mixing unit 27, which mixes the high frequency signal and the super-narrowband signal.
  • The high frequency spectral expansion unit 23 upsamples the super-narrowband signal (0 to 2 kHz) twice, and the upsampled signal is sampled at 4 kHz.
  • The signal output from the high frequency spectral expansion unit 23 is unchanged in the 0 to 4 kHz band, while the high frequency band of 4 to 8 kHz has the same spectrum as a folded version of the input signal.
  • This spectrum is used to estimate the high frequency band signal. Accordingly, the filtering unit 25 extracts the voice signal in the 4 to 8 kHz band.
  • The synthesis unit 26 synthesizes the voice signal in the 0 to 4 kHz band and the voice signal in the 4 to 8 kHz band, and the mixing unit 27 then mixes the high frequency voice output from the synthesis unit 26 with the super-narrowband signal before extension (0 to 2 kHz) to finally restore the original sound.
  • The bandwidth extension apparatus of the present invention can restore the original sound even when a super-narrowband signal is input to the in-ear microphone. That is, whereas a typical high-range restoration algorithm extends a 0 to 4 kHz signal up to 8 kHz, the present invention restores from a narrowband signal below 2 kHz input to the in-ear microphone. In addition, the original sound can be restored even though the amount of computation is significantly reduced.
  • the function of extending the excitation signal after linear prediction encoding is performed as it is, but the function of the spectral envelope expansion unit is removed.
  • Whereas the conventional speech synthesis apparatus predicts and extends frequencies through a linear predictive coding based algorithm, the present invention does not perform that operation and instead extends the frequency simply through high frequency spectrum extension. That is, the operation of estimating and extending the frequency in real time is omitted, and the frequency is merely extended using rectifier, spectral folding, or modulation techniques. This can greatly reduce the amount of computation.
  • FIG. 4 is a conceptual diagram of an application of the present invention in the case of being applied to a wireless earset / headset.
  • The case described here is one in which the bandwidth extension of the present invention is performed in the DSP of the earset, for example in a Bluetooth chipset (DSP).
  • the amount of computation is significantly reduced, enabling real-time processing in the Bluetooth chipset and minimizing radio transmission delay.
  • The earset and the smartphone may also be connected by wire.
  • FIG. 5 is a conceptual diagram when the present invention is applied to a wired earset / headset as another application example.
  • the bandwidth extension of the present invention is performed in a smartphone or the like.
  • the earset and the smartphone may be wired, and real-time processing is possible in the smartphone chipset.
  • the earset and the smartphone can be wirelessly connected.
  • FIG. 6 is a flow chart of a method for bandwidth expansion of an earset with an in-ear microphone as an embodiment of the present invention.
  • A super-narrowband signal is input to the in-ear microphone (S1).
  • To extend the excitation signal, an excitation signal is determined from the input super-narrowband signal (S2), and the determined excitation signal is extended into a wideband excitation signal (S3).
  • The frequency of the input super-narrowband signal is multiplied to extend it into a wideband signal including the high frequency band signal (S4).
  • the high frequency band signal is estimated and determined from the extended wideband signal (S5).
  • the estimated and determined high frequency band signal is filtered (S6).
  • a high frequency signal is generated by combining the filtered high frequency band signal and the wideband excitation signal (S7).
  • the high frequency signal and the ultra narrowband signal are mixed to restore the original sound (S8).
  • Rectifier, spectral folding, and modulation techniques may be used as simple extension techniques in the high frequency spectral expansion unit 23.
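
The folding behaviour described for the high frequency spectral expansion unit 23 can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the source: a 4 kHz input rate, a single 500 Hz test tone, a factor-of-two zero-insertion upsampler with no anti-imaging filter, and a direct DFT written with the standard library. It shows that zero insertion leaves the original tone in place and adds a mirror image in the new upper band.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of a direct O(N^2) DFT; adequate for short demo signals."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def zero_insert_upsample(x, factor=2):
    """Insert factor-1 zeros after every sample (no anti-imaging filter)."""
    y = []
    for sample in x:
        y.append(sample)
        y.extend([0.0] * (factor - 1))
    return y

FS_IN = 4000      # assumed input rate in Hz, enough for a 0 to 2 kHz band
N = 64            # frame length
TONE_HZ = 500     # assumed test tone inside the super-narrowband range

x = [math.sin(2 * math.pi * TONE_HZ * t / FS_IN) for t in range(N)]
y = zero_insert_upsample(x, 2)
fs_out = FS_IN * 2

mags = dft_magnitudes(y)
half = len(y) // 2
bin_hz = fs_out / len(y)
threshold = 0.5 * max(mags[1:half])
# Frequencies below the new Nyquist limit that carry significant energy.
peaks = sorted(k * bin_hz for k in range(1, half) if mags[k] > threshold)
print(peaks)
```

The list holds the original 500 Hz tone and its image mirrored about the old 2 kHz band edge; a filter can then keep only the upper component, which is the role the text assigns to the filtering unit 25.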

Abstract

Disclosed is an apparatus and method for extending the bandwidth of an earset having an in-ear microphone. The apparatus and method for extending the bandwidth of an earset having an in-ear microphone according to the present invention comprises: a high-frequency signal generation unit for generating a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal, wherein the high-frequency band signal is obtained by multiplying the frequency of the super-narrowband signal, extending the super-narrowband signal to the multiplied frequency, and filtering the same; and a mixing unit for mixing the high-frequency signal and the super-narrowband signal.

Description

Bandwidth Expansion Apparatus and Method for Earsets with In-Ear Microphones
The present invention relates to a speech reconstruction technique. More specifically, it relates to an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which restores the high range from the low range input to the in-ear microphone.
Recently, many earsets that integrate a speaker and a microphone have been proposed.
Such an earset can perform both the function of delivering sound to the ear canal and the function of picking up the user's voice in a single body. Typically, the speaker faces the ear canal to deliver sound, and the microphone is exposed to the outside to pick up the user's voice.
However, a microphone exposed to the outside picks up not only the user's voice but also external noise.
To solve this external noise problem, an earset with a microphone installed facing the ear canal (an in-ear microphone) has been proposed. However, the voice transmitted from the vocal cords toward the eardrum through the Eustachian tube lies in a low range of about 0 to 2 kHz, so it is difficult to restore the original sound from the low range input to the in-ear microphone alone.
To solve this loss of the high range, a technique has been proposed in which multiple microphones are used and the voice signals of different frequency bands input to them are synthesized to restore the original sound. That is, an in-ear microphone installed on the ear canal side and an out-ear microphone installed outside the auricle are used together, and the voice signals of different frequency bands input to the in-ear and out-ear microphones are synthesized to restore the original sound.
The existing techniques for synthesizing speech to restore the original sound are described below.
FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.
Referring to FIG. 1, the conventional speech synthesis apparatus, which was filed by the present applicant, consists of an in-ear microphone 1; a frequency band extension unit 2 that extends the signal transmitted from the in-ear microphone 1 into a low frequency band and a high frequency band; a low frequency band signal extraction unit 3 that extracts a low frequency band signal from the extended signal; at least one out-ear microphone 4; a beamforming unit 5 that beamforms the signal transmitted from the out-ear microphone 4; a high frequency band signal extraction unit 6 that extracts a high frequency band signal from the beamformed signal; a voice activity detection unit 7 that senses the amplitude of the voice on at least one channel to decide whether to generate a packet; and a synthesis unit 8 that, driven in response to the voice activity detection, synthesizes the low frequency band signal and the high frequency band signal transmitted from the low frequency band signal extraction unit 3 and the high frequency band signal extraction unit 6.
In the conventional speech synthesis apparatus configured in this way, the original sound is restored by synthesizing the beamformed high frequency band signal from the out-ear microphone 4 with the low frequency band signal transmitted from the in-ear microphone 1.
However, the conventional speech synthesis apparatus requires multiple microphones, which increases the manufacturing cost. In addition, because external noise still enters the out-ear microphone 4, completely removing external noise is practically impossible, and filtering to remove the external noise is always required.
Meanwhile, techniques used for high-range restoration include spectral folding, spectral shifting, nonlinear processing using a rectifier, and linear predictive coding (LPC).
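Of the techniques listed above, nonlinear processing with a rectifier is the easiest to demonstrate. The following is a minimal sketch, not the patent's implementation: the 16 kHz sampling rate, the 1 kHz test tone, and the helper name `dft_magnitudes` are assumptions chosen for illustration. Full-wave rectification of a tone produces energy at even multiples of the tone frequency, which is how a rectifier creates content above the original band.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of a direct O(N^2) DFT; adequate for short demo signals."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

FS = 16000        # assumed sampling rate in Hz
N = 64            # frame length; bin spacing is FS / N = 250 Hz
TONE_HZ = 1000    # assumed narrowband test tone

tone = [math.sin(2 * math.pi * TONE_HZ * t / FS) for t in range(N)]
rectified = [abs(s) for s in tone]          # full-wave rectifier

mags = dft_magnitudes(rectified)
# Skip bin 0: rectification also adds a DC offset.
strongest_bin = max(range(1, N // 2), key=lambda k: mags[k])
strongest_hz = strongest_bin * FS / N
print(strongest_hz)
```

The strongest non-DC component of the rectified tone sits at twice the input frequency. In a bandwidth extension chain, such generated components would normally be filtered and then combined with the original band rather than used alone.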
Linear predictive coding is widely used in speech encoding and decoding. As can be seen from, for example, US Pat. No. 8,306,249, a linear predictive coding algorithm is also used in speech decoding apparatuses usable in hearing aids and the like.
According to the source-filter model of linear predictive coding, sound is generated (source) by the vibration of the vocal cords and is shaped (filter) into different sounds by the oral cavity, the nasal cavity, and the mouth structure. Source-filter modeling is the mathematical model of this process. That is, by modeling the source and adding a filter to it, one can model how the vibration is reproduced as a voice (the original sound).
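The source-filter idea can be made concrete with a small linear prediction example. This is a sketch under stated assumptions, not the analysis used in the patent: it uses the textbook autocorrelation method with the Levinson-Durbin recursion, a synthetic second-order signal with made-up coefficients (`TRUE_A`), and hypothetical helper names. The estimated coefficients play the role of the filter (envelope), and the inverse-filtered residual plays the role of the source (excitation signal).

```python
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the whole frame."""
    return [sum(x[t] * x[t - lag] for t in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with a[0] = 1 for A(z) = 1 + sum_i a[i] z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

def lpc_residual(x, a):
    """Inverse-filter x with A(z); the output estimates the excitation."""
    p = len(a) - 1
    return [sum(a[j] * x[t - j] for j in range(p + 1) if t - j >= 0)
            for t in range(len(x))]

# Synthetic "voice": white-noise source shaped by an assumed all-pole filter.
random.seed(0)
TRUE_A = [1.0, -1.3, 0.6]
x = []
for t in range(4000):
    e = random.gauss(0.0, 1.0)
    past1 = x[t - 1] if t >= 1 else 0.0
    past2 = x[t - 2] if t >= 2 else 0.0
    x.append(e - TRUE_A[1] * past1 - TRUE_A[2] * past2)

a, err = levinson_durbin(autocorrelation(x, 2), 2)
excitation = lpc_residual(x, a)
print([round(c, 2) for c in a])
```

On this synthetic signal the estimated coefficients come out close to `TRUE_A`, and the residual carries less energy than the signal itself, which is what makes the excitation and the envelope separable.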
FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using a conventional linear predictive coding technique.
Referring to FIG. 2, the conventional speech synthesis apparatus consists of a linear prediction analyzer 11 that determines an excitation signal from an input narrowband signal; an excitation signal extension unit 12 that generates sound by outputting a wideband excitation signal from the determined excitation signal through a spectral folding technique, a Gaussian noise passband conversion technique, or the like; a feature extraction unit 13 that extracts voice feature information from the input narrowband signal, low frequency envelope information, and the excitation signal; a spectral envelope extension unit 14 that generates voice by outputting a wideband high frequency signal, applying one of codebook mapping, an artificial neural network, or a Gaussian mixture model to the envelope component, expressed as line spectral frequencies, in accordance with the feature information; and a synthesis unit 15 that restores the original sound by synthesizing the wideband excitation signal and the wideband high frequency signal.
However, in the conventional speech synthesis apparatus configured in this way, the codebook mapping, artificial neural network, and Gaussian mixture model techniques used by the spectral envelope extension unit 14 require so much computation that real-time processing is difficult. For example, when the processing is done on a chipset (DSP) included in a Bluetooth earset/headset, the excessive computation can cause delay. Moreover, applying the existing linear predictive coding technique to the low range input to an in-ear microphone is not suitable.
An object of the present invention is to provide an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which simply extends a narrowband signal input to the in-ear microphone into a high frequency band and extracts the high frequency band from the extended band through simple filtering.
To achieve this object, the apparatus for extending the bandwidth of an earset having an in-ear microphone according to the present invention may preferably include a high frequency signal generation unit that generates a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and a mixing unit that mixes the high frequency signal and the super-narrowband signal.
The high frequency signal generation unit may include a first linear prediction analyzer that determines the excitation signal from the super-narrowband signal; an excitation signal extension unit that extends the determined excitation signal into a wideband excitation signal; a high frequency spectral expansion unit that multiplies (N times) the frequency of the super-narrowband signal to extend it into a wideband signal including a high frequency band signal; a second linear prediction analyzer that estimates and determines a high frequency band signal from the extended wideband signal; a filtering unit that filters the high frequency band signal output from the second linear prediction analyzer; and a synthesis unit that synthesizes the high frequency band signal output from the filtering unit and the wideband excitation signal output from the excitation signal extension unit. The extension of the excitation signal may use either a spectral folding technique or a Gaussian noise passband conversion technique. The extension into the wideband signal may use any one of rectifier, spectral folding, and modulation techniques.
The high-frequency signal generation unit and the mixing unit may be implemented in the circuitry of the earset, in which case the earset may include a Bluetooth chipset. Alternatively, the high-frequency signal generation unit and the mixing unit may be implemented in the circuitry of a smartphone.
Meanwhile, a method for extending the bandwidth of an earset having an in-ear microphone according to the present invention may preferably include: (a) generating a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by multiplying the frequency of the super-narrowband signal to extend it and then filtering the result; and (b) mixing the high-frequency signal with the super-narrowband signal.
Here, step (a) may include: determining the excitation signal from the super-narrowband signal; extending the determined excitation signal into a wideband excitation signal; multiplying the frequency of the super-narrowband signal (by a factor of N) to extend it into a wideband signal containing a high-frequency band signal; estimating and determining the high-frequency band signal from the extended wideband signal; filtering the determined high-frequency band signal; and synthesizing the filtered high-frequency band signal with the extended wideband excitation signal.
As described above, according to the apparatus and method for extending the bandwidth of an earset having an in-ear microphone of the present invention, the narrowband signal input to the in-ear microphone is simply multiplied in frequency to extend it into the high-frequency band, and the high-frequency band is extracted from the extended signal by simple filtering alone, so the amount of computation can be reduced significantly.
Because the reduced computation allows real-time processing, signal transmission delay and similar problems can be prevented.
Fig. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.
Fig. 2 is a control circuit block diagram of a conventional speech synthesis apparatus using linear predictive coding.
Fig. 3 is a control circuit block diagram of an apparatus for extending the bandwidth of an earset having an in-ear microphone, according to an embodiment of the present invention.
Fig. 4 is a conceptual diagram of an application example in which the present invention is applied to a wireless earset/headset.
Fig. 5 is a conceptual diagram of another application example in which the present invention is applied to a wired earset/headset.
Fig. 6 is a flowchart of a method for extending the bandwidth of an earset having an in-ear microphone, according to an embodiment of the present invention.
The present invention is described in detail below with reference to preferred embodiments and the accompanying drawings, on the premise that identical reference numerals in the drawings denote identical components.
When the detailed description or the claims state that one component "includes" another component, this is not to be construed as limited to consisting only of that component, unless specifically stated otherwise, and should be understood to mean that other components may further be included.
In addition, components named "~ means", "~ unit", "~ module", or "~ block" in the detailed description or the claims denote units that process at least one function or operation, and each of them may be implemented in software, in hardware, or in a combination of the two.
The present invention relates to a method of restoring a high-frequency band signal from the low-frequency band signal of a user's voice picked up through an in-ear microphone. In particular, it presents a technique that allows high frequencies to be restored in real time using a Bluetooth DSP.
As described above, sound is generated (source) by vibration of the vocal cords and is emitted (filter) as different sounds depending on the structure of the oral cavity, nasal cavity, and mouth. That is, speech is divided into an excitation signal component, which represents the vibration (source) or the turbulence produced as air passes through a narrow gap, and an envelope component, which shapes the voice (filter). Normally, the excitation signal component and the envelope component each undergo a wideband extension process. Because the excitation signal component has relatively little influence on reproducing the original sound compared with the envelope component, it is extended with a computationally inexpensive technique such as spectral folding or spectral translation. The envelope component, expressed as line spectral frequencies, is instead extended using codebook mapping, an artificial neural network, a Gaussian mixture model (GMM), or a hidden Markov model (HMM) to output a wideband high-frequency signal and generate the voice, which has the drawback of a heavy computational load. As a result, restoring high frequencies in real time on, for example, a Bluetooth DSP is practically impossible. The present invention therefore proposes a scheme that significantly reduces the amount of computation so that high-frequency restoration and original-sound restoration become possible in real time.
An example implementation of the apparatus and method for extending the bandwidth of an earset having an in-ear microphone according to the present invention is described below through a specific embodiment.
Fig. 3 is a control circuit block diagram of an apparatus for extending the bandwidth of an earset having an in-ear microphone, according to an embodiment of the present invention.
Referring to Fig. 3, the bandwidth extension apparatus of the present invention includes: a first linear prediction analysis unit (21) that determines an excitation signal from an input super-narrowband signal; an excitation signal extension unit (22) that extends the determined excitation signal, by spectral folding, Gaussian-noise passband conversion, or a similar technique, and outputs a wideband excitation signal that generates the sound; a high-frequency spectrum extension unit (23) that multiplies the frequency of the super-narrowband signal (by a factor of N) to extend it into a wideband signal containing a high-frequency band signal; a second linear prediction analysis unit (24) that estimates and determines the high-frequency band signal from the extended wideband signal; a filtering unit (25) that filters the high-frequency band signal output from the second linear prediction analysis unit (24); a synthesis unit (26) that synthesizes the high-frequency band signal output from the filtering unit (25) with the wideband excitation signal output from the excitation signal extension unit (22); and a mixing unit (27) that mixes the high-frequency signal output from the synthesis unit (26) with the super-narrowband signal. Broadly, therefore, the bandwidth extension apparatus of the present invention consists of a high-frequency signal generation unit, which generates a high-frequency signal by synthesizing the excitation signal extended from the input super-narrowband signal with the high-frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it, and the mixing unit (27), which mixes the high-frequency signal with the super-narrowband signal.
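As an illustration of what a linear prediction analysis unit such as (21) could compute, the following sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then takes the prediction residual as the excitation estimate. The specification does not state which linear prediction algorithm is used, so the algorithm choice, the function names, the predictor order, and the synthetic test signal are all assumptions made for illustration.

```python
import random

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a[1..order], prediction error).

    The predictor is x_hat[n] = sum_{k=1..order} a[k] * x[n - k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i, using the stage i-1 predictor.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

def lpc_residual(x, coeffs):
    """Excitation estimate e[n] = x[n] - sum_k coeffs[k-1] * x[n-k]."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out

def demo_signal(num_samples=4000, seed=0):
    """Synthetic AR(2) signal, a hypothetical stand-in for microphone input."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(num_samples):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]
```

Calling `levinson_durbin(autocorrelation(x, p), p)` on a frame `x` gives the envelope (filter) coefficients, and `lpc_residual(x, coeffs)` gives the source-like residual that an excitation extension stage would then widen.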
As an example, when the high-frequency spectrum extension unit (23) upsamples the super-narrowband signal (0 to 2 kHz) by a factor of two, the upsampled signal is sampled at 4 kHz. The signal output from the high-frequency spectrum extension unit (23) then matches the input in the 0 to 4 kHz band, and in the high-frequency band of 4 to 8 kHz it has the same spectrum as a folded version of the input signal. This spectrum is used to estimate the high-frequency band signal. The filtering unit (25) accordingly extracts the speech signal in the 4 to 8 kHz band. The synthesis unit (26) then synthesizes the speech signal in the 0 to 4 kHz band with the speech signal in the 4 to 8 kHz band, and the high-frequency speech output from the synthesis unit (26) is mixed with the super-narrowband signal (0 to 2 kHz) as it was before extension, finally restoring the original sound.
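The folding effect described above can be checked numerically. The sketch below upsamples a single tone by inserting zeros between samples, with no anti-imaging filter, and then locates the spectral peaks with a direct DFT. The sampling rates and the tone frequency are assumptions chosen for illustration and are not taken from the specification; the same mirroring about the original Nyquist frequency occurs at any rate.

```python
import cmath
import math

def zero_insert(x, factor=2):
    """Upsample by placing factor-1 zeros after every sample (no anti-imaging filter)."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0..len(x)/2 (direct O(n^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

FS_IN = 4000    # assumed input sampling rate in Hz, so the input band is 0 to 2 kHz
TONE_HZ = 500   # assumed test tone
N = 64          # frame length; the tone falls exactly on a DFT bin

x = [math.cos(2 * math.pi * TONE_HZ * t / FS_IN) for t in range(N)]
y = zero_insert(x, 2)                 # now at 2 * FS_IN = 8000 Hz
mag = dft_magnitude(y)
bin_hz = (2 * FS_IN) / len(y)         # frequency spacing of the DFT bins
peaks = sorted(k * bin_hz for k, m in enumerate(mag) if m > 0.25 * max(mag))
```

Here `peaks` contains the original tone and its mirror image at `FS_IN - TONE_HZ`, which is the folded copy that the later filtering stage isolates.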
In this way, the bandwidth extension apparatus of the present invention can restore the original sound even when a super-narrowband signal is input to the in-ear microphone. That is, whereas high-frequency restoration algorithms generally extend a 0 to 4 kHz signal up to 8 kHz, the present invention performs restoration on the super-narrowband signal below 2 kHz that is input to the in-ear microphone. Moreover, the present invention can restore the original sound even though the amount of computation is significantly reduced.
Compared with the conventional speech synthesis apparatus shown in Fig. 2, the function of extending the excitation signal after linear predictive coding is retained, but the function of the spectral envelope extension unit is removed. Whereas the conventional apparatus predicts and extends frequency content using a linear-predictive-coding-based algorithm, the present invention performs no such prediction and instead carries out a plain frequency extension through high-frequency spectrum extension. That is, the operation of predicting frequency content and generating it in real time is omitted, and only the frequency range is extended using a rectifier, spectral folding, or modulation. This greatly reduces the amount of computation.
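Of the three extension techniques named above, the rectifier is perhaps the least obvious, so a small numerical sketch may help. Full-wave rectification is a memoryless nonlinearity that creates energy at even harmonics of the input, which is how it pushes content into a band the input did not occupy. The frame length and tone bin below are assumptions for illustration; the specification does not say whether half-wave or full-wave rectification is intended.

```python
import cmath
import math

def full_wave_rectify(x):
    """Memoryless nonlinearity: replace each sample by its absolute value."""
    return [abs(v) for v in x]

def dft_bin_magnitude(x, k):
    """Magnitude of DFT bin k of x (direct evaluation)."""
    n = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))

N = 256   # assumed frame length
K0 = 8    # assumed tone position: exactly K0 cycles per frame

tone = [math.sin(2 * math.pi * K0 * t / N) for t in range(N)]
rect = full_wave_rectify(tone)

before = dft_bin_magnitude(tone, 2 * K0)   # energy at twice the tone frequency, input
after = dft_bin_magnitude(rect, 2 * K0)    # same bin after rectification
leftover = dft_bin_magnitude(rect, K0)     # energy remaining at the original frequency
```

The input has essentially no energy at bin `2 * K0`, while the rectified signal does, and the component at the original frequency vanishes. In a complete system the rectified signal would still need filtering to keep only the wanted band.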
When the high-frequency spectrum extension unit (23) outputs a wideband signal by simply extending the frequency range in this way, linear prediction analysis is performed on that signal, and then, rather than extending frequency through linear-prediction modelling, only simple filtering with a filter is applied. In other words, filtering that approximates the original sound is achieved without model-based bandwidth extension. The filtered result is then synthesized with the extended excitation signal to produce the high-frequency signal. Finally, mixing the high-frequency signal with the super-narrowband signal received through the in-ear microphone restores the original sound.
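The "simple filtering" step could be realized in many ways, and the specification does not name a filter design. As one possibility, the sketch below builds a Hamming-windowed-sinc high-pass FIR filter by spectral inversion of a low-pass prototype and applies it by direct convolution. The 16 kHz sampling rate, the 4 kHz cutoff, and the tap count are assumptions chosen so that the pass band matches the 4 to 8 kHz band mentioned in the description.

```python
import math

def highpass_fir(cutoff_hz, fs_hz, num_taps=63):
    """Hamming-windowed-sinc high-pass, built by spectral inversion of a low-pass."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for spectral inversion")
    mid = num_taps // 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    lowpass = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    dc_gain = sum(lowpass)
    taps = [-t / dc_gain for t in lowpass]  # negate the unity-DC-gain low-pass
    taps[mid] += 1.0                        # add a delayed impulse: delta - lowpass
    return taps

def fir_filter(taps, x):
    """Direct-form FIR convolution, assuming zero samples before the start of x."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]
```

With `taps = highpass_fir(4000, 16000)`, a 1 kHz tone passed through `fir_filter` is strongly attenuated while a 6 kHz tone passes almost unchanged. A fixed FIR filter like this costs one multiply-accumulate per tap per sample, which is in the spirit of the low computational load the description aims for.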
Fig. 4 is a conceptual diagram of an application example in which the present invention is applied to a wireless earset/headset.
Fig. 4 illustrates the case in which the bandwidth extension of the present invention is performed in the DSP of the earset, for example a Bluetooth chipset (DSP). In this case the amount of computation is reduced so markedly that real-time processing on the Bluetooth chipset becomes possible and wireless transmission delay can be minimized. The earset and the smartphone may, of course, also be connected by wire.
Fig. 5 is a conceptual diagram of another application example in which the present invention is applied to a wired earset/headset.
Fig. 5 illustrates the case in which the bandwidth extension of the present invention is performed in a smartphone or similar device. In this case the earset and the smartphone may be connected by wire, and real-time processing is possible on the smartphone chipset. The earset and the smartphone may, of course, also be connected wirelessly.
The method for extending the bandwidth of an earset having an in-ear microphone according to the present invention, using the bandwidth extension apparatus configured as above, is described next.
Fig. 6 is a flowchart of a method for extending the bandwidth of an earset having an in-ear microphone, according to an embodiment of the present invention.
Referring to Fig. 6, when a super-narrowband signal is input to the in-ear microphone (S1), the excitation signal is extended as follows: an excitation signal is determined from the input super-narrowband signal (S2), and the determined excitation signal is extended into a wideband excitation signal (S3).
In parallel, the frequency of the input super-narrowband signal is multiplied to extend it into a wideband signal containing a high-frequency band signal (S4).
The high-frequency band signal is then estimated and determined from the extended wideband signal (S5).
Next, the estimated and determined high-frequency band signal is filtered (S6).
The filtered high-frequency band signal and the wideband excitation signal are then synthesized to generate a high-frequency signal (S7).
Finally, the high-frequency signal and the super-narrowband signal are mixed to restore the original sound (S8).
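The data flow of steps S2 to S8 has two branches that both start from the same input and meet at the synthesis step. The sketch below captures only that ordering: each stage is passed in as a callable, so it makes no claim about how any individual stage is implemented. The function name and the dictionary keys are hypothetical labels introduced here, not terms from the specification.

```python
def extend_bandwidth(snb, stages):
    """Run steps S2 to S8 in the order of Fig. 6.

    `snb` is the super-narrowband input (S1). `stages` maps hypothetical
    stage names to callables that implement each step.
    """
    # Excitation branch
    excitation = stages["lp_analysis_1"](snb)                 # S2
    wb_excitation = stages["extend_excitation"](excitation)   # S3

    # Spectrum branch, starting again from the same input
    wideband = stages["extend_spectrum"](snb)                 # S4
    hb_estimate = stages["lp_analysis_2"](wideband)           # S5
    hb_filtered = stages["filter"](hb_estimate)               # S6

    # The branches meet, then the result is mixed with the untouched input
    high = stages["synthesize"](hb_filtered, wb_excitation)   # S7
    return stages["mix"](high, snb)                           # S8
```

Because the two branches share no intermediate results before S7, an implementation is free to compute them in either order or concurrently.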
Thus, the present invention consists of one branch that extends the excitation signal from the super-narrowband signal (0 to 2 kHz) through linear prediction, and another branch that performs a simple frequency extension of the super-narrowband signal, predicts the high-frequency signal from the simply extended signal through linear prediction, and applies simple filtering to the predicted high-frequency signal. The extended excitation signal and the filtered high-frequency signal are then synthesized to generate the high-frequency signal, and a wideband signal (0 to 8 kHz) is generated from the high-frequency signal and the super-narrowband signal. The simple extension in the high-frequency spectrum extension unit (23) may use a rectifier, spectral folding, or modulation.
The technical idea of the present invention has been examined above through several embodiments.
It is evident that a person of ordinary skill in the art to which the present invention pertains can modify or change the embodiments described above in various ways on the basis of this description. It is also evident that, even where not explicitly illustrated or described, such a person can make modifications in various forms that embody the technical idea of the present invention on the basis of this description, and these still fall within the scope of the present invention. The embodiments described above with reference to the accompanying drawings are presented for the purpose of explaining the present invention, and the scope of the present invention is not limited to these embodiments.

Claims (9)

  1. An apparatus for extending the bandwidth of an earset having an in-ear microphone, comprising:
    a high-frequency signal generation unit that generates a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by multiplying the frequency of the super-narrowband signal to extend it and then filtering the result; and
    a mixing unit that mixes the high-frequency signal with the super-narrowband signal.
  2. The apparatus of claim 1,
    wherein the high-frequency signal generation unit comprises:
    a first linear prediction analysis unit that determines the excitation signal from the super-narrowband signal;
    an excitation signal extension unit that extends the determined excitation signal into a wideband excitation signal;
    a high-frequency spectrum extension unit that multiplies the frequency of the super-narrowband signal (by a factor of N) to extend it into a wideband signal containing a high-frequency band signal;
    a second linear prediction analysis unit that estimates and determines the high-frequency band signal from the extended wideband signal;
    a filtering unit that filters the high-frequency band signal output from the second linear prediction analysis unit; and
    a synthesis unit that synthesizes the high-frequency band signal output from the filtering unit with the wideband excitation signal output from the excitation signal extension unit.
  3. The apparatus of claim 2,
    wherein the excitation signal is extended using either a spectral folding technique or a Gaussian-noise passband conversion technique.
  4. The apparatus of claim 2,
    wherein the wideband signal is extended using any one of a rectifier, spectral folding, or modulation.
  5. The apparatus of claim 1,
    wherein the high-frequency signal generation unit and the mixing unit are implemented in the circuitry of the earset.
  6. The apparatus of claim 5,
    wherein the earset includes a Bluetooth chipset.
  7. The apparatus of claim 1,
    wherein the high-frequency signal generation unit and the mixing unit are implemented in the circuitry of a smartphone.
  8. A method for extending the bandwidth of an earset having an in-ear microphone, comprising:
    (a) generating a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal obtained by multiplying the frequency of the super-narrowband signal to extend it and then filtering the result; and
    (b) mixing the high-frequency signal with the super-narrowband signal.
  9. The method of claim 8,
    wherein step (a) comprises:
    determining the excitation signal from the super-narrowband signal;
    extending the determined excitation signal into a wideband excitation signal;
    multiplying the frequency of the super-narrowband signal (by a factor of N) to extend it into a wideband signal containing a high-frequency band signal;
    estimating and determining the high-frequency band signal from the extended wideband signal;
    filtering the determined high-frequency band signal; and
    synthesizing the filtered high-frequency band signal with the extended wideband excitation signal.
PCT/KR2016/013989 2015-12-30 2016-11-30 Apparatus and method for extending bandwidth of earset having in-ear microphone WO2017116022A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2015-0189230 2015-12-30
KR20150189230 2015-12-30
KR10-2016-0009803 2016-01-27
KR1020160009803A KR20170080387A (en) 2015-12-30 2016-01-27 Apparatus and method for extending bandwidth of earset with in-ear microphone

Publications (1)

Publication Number Publication Date
WO2017116022A1 true WO2017116022A1 (en) 2017-07-06

Family

ID=59225212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2016/013989 WO2017116022A1 (en) 2015-12-30 2016-11-30 Apparatus and method for extending bandwidth of earset having in-ear microphone

Country Status (1)

Country Link
WO (1) WO2017116022A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022231977A1 (en) * 2021-04-29 2022-11-03 Bose Corporation Recovery of voice audio quality using a deep learning model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040066835A (en) * 2001-11-23 2004-07-27 코닌클리즈케 필립스 일렉트로닉스 엔.브이. Audio signal bandwidth extension
KR101077328B1 (en) * 2009-09-30 2011-10-26 엘지이노텍 주식회사 System for improving sound quality in stfd type headset
KR20150051301A (en) * 2013-11-02 2015-05-12 삼성전자주식회사 Method and apparatus for generating wideband signal and device employing the same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KORNAGEL, ULRICH: "Techniques for Artificial Bandwidth Extension of Telephone Speech", SIGNAL PROCESSING, vol. 86, no. 6, 1 June 2006 (2006-06-01), pages 1296 - 1306, XP024997679 *
NAGEL, FREDERIK ET AL.: "A Harmonic Bandwidth Extension Method for Audio Codecs", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING 2009 (ICASSP 2009, 2009, pages 145 - 148, XP031459187 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16881974

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16881974

Country of ref document: EP

Kind code of ref document: A1