WO2017116022A1 - Apparatus and method for extending bandwidth of earset having in-ear microphone

Info

Publication number: WO2017116022A1
Application number: PCT/KR2016/013989
Authority: WO
Inventors: 김은동
Original assignee: 주식회사 오르페오사운드웍스
Priority date: 2015-12-30
Filing date: 2016-11-30
Publication date: 2017-07-06

Abstract

Disclosed is an apparatus and method for extending the bandwidth of an earset having an in-ear microphone. The apparatus and method for extending the bandwidth of an earset having an in-ear microphone according to the present invention comprises: a high-frequency signal generation unit for generating a high-frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high-frequency band signal, wherein the high-frequency band signal is obtained by multiplying the frequency of the super-narrowband signal, extending the super-narrowband signal to the multiplied frequency, and filtering the same; and a mixing unit for mixing the high-frequency signal and the super-narrowband signal.

Description

Bandwidth Expansion Apparatus and Method for Earsets with In-Ear Microphones

The present invention relates to a speech reconstruction technique. More specifically, the present invention relates to an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which recover the high frequency range from the low frequency range input to the in-ear microphone.

Recently, many earphones have been proposed in which a speaker and a microphone are integrated.

Such an earset may perform a function of transmitting sound to the ear canal and a function of collecting a user's voice in one body. Typically, the speaker is directed toward the ear canal for sound transmission, and the microphone is exposed to the outside for collecting user voice.

However, a microphone exposed to the outside collects not only the user's voice but also external noise.

To solve the external noise problem, an earset having a microphone installed facing the ear canal (an in-ear microphone) has been proposed. However, the voice transmitted from the vocal cords to the eardrum through the Eustachian tube is limited to a low frequency range of 0 to 2 kHz. It is therefore difficult to restore the original sound from only the low frequency range input to the in-ear microphone.

To solve this high frequency loss problem, a technique has been proposed that uses a plurality of microphones and restores the original sound by synthesizing the voice signals of different frequency bands input to those microphones. That is, an in-ear microphone installed on the ear canal side and an out-ear microphone installed outside the ear canal are provided together, and the original sound is restored by synthesizing the voice signals of the different bands input to the in-ear microphone and the out-ear microphone.

The existing technique for synthesizing speech and restoring the original sound is described here.

FIG. 1 is a control circuit block diagram of a conventional speech synthesis apparatus.

Referring to FIG. 1, a conventional speech synthesis apparatus, as filed by the present applicant, includes an in-ear microphone 1; a frequency band extension unit 2 for extending a signal transmitted from the in-ear microphone 1 into a low frequency band and a high frequency band; a low frequency band signal extraction unit 3 for extracting a low frequency band signal from the extended signal; at least one out-ear microphone 4; a beamforming unit 5 for beamforming the signal transmitted from the out-ear microphone 4; a high frequency band signal extraction unit 6 for extracting a high frequency band signal from the beamformed signal; a voice activity detection unit 7 for sensing the amplitude of the voice on at least one channel and determining whether to generate packets; and a synthesizer 8, driven in response to the voice activity detection unit 7, for synthesizing the low frequency band signal transmitted from the low frequency band signal extraction unit 3 and the high frequency band signal transmitted from the high frequency band signal extraction unit 6.

In the conventional speech synthesizer configured as described above, the original sound is restored by synthesizing the beamformed high frequency band signal on the out-ear microphone 4 side and the low frequency band signal transmitted from the in-ear microphone 1.

However, since the existing speech synthesis apparatus requires a plurality of microphones, the manufacturing cost increases. In addition, since external noise is still input to the out-ear microphone 4, it is practically impossible to remove the external noise completely, and filtering must additionally be performed to remove it.

On the other hand, techniques used for treble recovery include spectral folding, spectral shifting, nonlinear processing using a rectifier, and linear predictive coding (LPC).

Among these, linear predictive coding is widely used in speech encoding and decoding. For example, a linear predictive coding algorithm may be used in a speech decoding apparatus usable in hearing aids or the like, as described in US Pat. No. 8,306,249.

According to the source-filter model of linear predictive coding, sound is generated as a source by the vibration of the vocal cords and is then filtered according to the structure of the oral cavity, the nasal cavity, and the mouth. Source-filter modeling is the mathematical model of this process. In other words, by modeling a source and applying a filter to it, it is possible to model how the vocal cord vibration is reproduced as a voice.
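The source-filter idea can be illustrated with a minimal sketch of linear prediction analysis. The code below is not taken from the patent: it assumes the common autocorrelation method with the Levinson-Durbin recursion, and the function names, the toy two-pole "vocal tract", and the white-noise source are all hypothetical choices made for illustration. It estimates the filter from a synthetic voice and then inverse-filters the voice to recover an excitation (residual) signal.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where a[0] == 1 and the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def excitation(frame, a):
    """Inverse-filter the frame with A(z) to obtain the excitation (residual)."""
    return [sum(a[j] * frame[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(frame))]

# Toy "voice" (assumption): a white-noise source shaped by a known two-pole filter.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
voice = []
for n, s in enumerate(source):
    y1 = voice[n - 1] if n >= 1 else 0.0
    y2 = voice[n - 2] if n >= 2 else 0.0
    voice.append(s + 1.3 * y1 - 0.6 * y2)

coeffs, _ = levinson_durbin(autocorrelation(voice, 2), 2)
residual = excitation(voice, coeffs)
```

In this toy setup the estimated coefficients should land close to the filter that shaped the source (A(z) = 1 - 1.3 z^-1 + 0.6 z^-2), and the residual carries less energy than the voice because the filter contribution has been removed.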

FIG. 2 is a control circuit block diagram of a speech synthesis apparatus using a conventional linear predictive coding technique.

Referring to FIG. 2, the conventional speech synthesis apparatus includes a linear prediction analyzer 11 for determining an excitation signal from an input narrowband signal; an excitation signal expansion unit 12 for generating sound by outputting a wideband excitation signal from the determined excitation signal through a spectral folding technique or a Gaussian noise passband conversion technique; a feature extraction unit 13 for extracting voice feature information from the input narrowband signal, the low frequency envelope information, and the excitation signal; a spectral envelope expansion unit 14 for generating voice by outputting, in response to that information, a wideband high frequency signal using one of codebook mapping, an artificial neural network, and a Gaussian mixture model on the envelope component represented by line spectral frequencies; and a synthesis unit 15 for synthesizing the wideband excitation signal and the wideband high frequency signal into the desired output.

However, in the conventional speech synthesizer configured as described above, the codebook mapping, artificial neural network, and Gaussian mixture model techniques used by the spectral envelope expansion unit 14 require a large amount of computation and are therefore difficult to process in real time. Thus, for example, when processing is performed on a chipset (DSP) included in a Bluetooth earset or headset, the large amount of computation may cause a delay. In addition, the existing linear predictive coding technique is not suitable for the low frequency range input to an in-ear microphone.

An object of the present invention is to provide an apparatus and method for extending the bandwidth of an earset having an in-ear microphone, which simply extend the narrowband signal input to the in-ear microphone into a high frequency band and extract the high frequency band from the extended signal through simple filtering.

In order to achieve the above object, the apparatus for extending the bandwidth of an earset having an in-ear microphone of the present invention preferably includes a high frequency signal generation unit for generating a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and a mixing unit for mixing the high frequency signal and the super-narrowband signal.

In this case, the high frequency signal generation unit may include a first linear prediction analyzer for determining the excitation signal from the super-narrowband signal; an excitation signal expansion unit for extending the determined excitation signal into a wideband excitation signal; a high frequency spectrum expansion unit for multiplying the frequency of the super-narrowband signal (N times) to extend it into a wideband signal including a high frequency band signal; a second linear prediction analyzer for estimating and determining the high frequency band signal from the extended wideband signal; a filtering unit for filtering the high frequency band signal output from the second linear prediction analyzer; and a synthesizer for synthesizing the high frequency band signal output from the filtering unit and the wideband excitation signal output from the excitation signal expansion unit. The extension of the excitation signal may use either a spectral folding technique or a Gaussian noise passband conversion technique. In addition, the extension into the wideband signal may use any one of a rectifier, spectral folding, and modulation techniques.

The high frequency signal generation unit and the mixing unit may be configured in the circuit of the earset, in which case the earset may include a Bluetooth chipset. Alternatively, the high frequency signal generation unit and the mixing unit may be configured in the circuit of a smartphone.

Meanwhile, the method for extending the bandwidth of an earset having an in-ear microphone of the present invention preferably includes (a) generating a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and (b) mixing the high frequency signal and the super-narrowband signal.

In this case, step (a) may include determining the excitation signal from the super-narrowband signal; extending the determined excitation signal into a wideband excitation signal; multiplying the frequency of the super-narrowband signal (N times) to extend it into a wideband signal including a high frequency band signal; estimating and determining the high frequency band signal from the extended wideband signal; filtering the determined high frequency band signal; and synthesizing the filtered high frequency band signal and the extended wideband excitation signal.

As described above, according to the apparatus and method for extending the bandwidth of an earset having an in-ear microphone of the present invention, the narrowband signal input to the in-ear microphone is extended into a high frequency band simply by multiplying its frequency, and only the high frequency band is extracted from the extended signal by simple filtering, so the amount of computation can be significantly reduced.

As a result, the reduced amount of computation makes real-time processing possible and prevents signal transmission delay.

FIG. 3 is a control circuit block diagram of an apparatus for extending the bandwidth of an earset having an in-ear microphone according to one embodiment of the present invention.

FIG. 4 is a conceptual diagram of an application example in which the present invention is applied to a wireless earset or headset.

FIG. 5 is a conceptual diagram of another application example in which the present invention is applied to a wired earset or headset.

FIG. 6 is a flowchart of a method for extending the bandwidth of an earset having an in-ear microphone according to one embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings, in which the same reference numerals denote the same components.

When the description or claims of the invention state that one element "includes" another element, this is, unless otherwise stated, not to be interpreted as consisting only of that element, and it should be understood that other elements may further be included.

Further, in the detailed description of the invention or in the claims, elements designated as "~ means", "~ part", "~ module", and "~ block" denote a unit that processes at least one function or operation, and each of these may be implemented by software, hardware, or a combination thereof.

The present invention relates to a method of recovering a high frequency band signal from the low frequency band signal of a user's voice transmitted through an in-ear microphone. In particular, it proposes a technique that enables the high frequency range to be restored in real time using a Bluetooth DSP.

As described above, sound is generated as a source by the vibration of the vocal cords and is filtered into different sounds depending on the structure of the oral cavity, the nasal cavity, and the mouth. That is, speech is divided into an excitation signal component, which represents the source or the turbulence generated as air passes through narrow gaps, and an envelope component, which forms the filter. In general, the excitation signal component and the envelope component are each subjected to a wideband extension process. Since the influence of the excitation signal component is relatively small compared to that of the envelope component, a spectral folding technique or a spectral shifting technique is used for it. For the envelope component, on the other hand, voice is generated by outputting a wideband high frequency signal using techniques such as codebook mapping, an artificial neural network, a Gaussian mixture model, or a hidden Markov model (HMM) on the envelope component represented by line spectral frequencies, which requires a large amount of computation. As a result, it is practically impossible to recover the high frequency range in real time, for example in a Bluetooth DSP. Accordingly, the present invention proposes a method that enables real-time high frequency restoration and original sound restoration by significantly reducing the amount of computation.

Hereinafter, an example in which an apparatus and method for extending bandwidth of an earset having an in-ear microphone according to the present invention is implemented will be described with reference to a specific embodiment.

Referring to FIG. 3, the bandwidth extension apparatus of the present invention includes a first linear prediction analyzer 21 for determining an excitation signal from an input super-narrowband signal; an excitation signal expansion unit 22 for generating sound by outputting a wideband excitation signal from the determined excitation signal through a spectral folding technique or a Gaussian noise passband conversion technique; a high frequency spectrum expansion unit 23 for multiplying the frequency of the super-narrowband signal (N times) to extend it into a wideband signal including a high frequency band signal; a second linear prediction analyzer 24 for estimating and determining the high frequency band signal from the extended wideband signal; a filtering unit 25 for filtering the high frequency band signal output from the second linear prediction analyzer 24; a synthesizer 26 for synthesizing the high frequency band signal output from the filtering unit 25 and the wideband excitation signal output from the excitation signal expansion unit 22; and a mixing unit 27 for mixing the high frequency signal output from the synthesizer 26 and the super-narrowband signal. In other words, the bandwidth extension apparatus of the present invention consists of a high frequency signal generation unit, which generates a high frequency signal by synthesizing the excitation signal extended from the input super-narrowband signal with the high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it, and the mixing unit 27, which mixes the high frequency signal and the super-narrowband signal.

As an example, the high frequency spectrum expansion unit 23 upsamples the super-narrowband signal (0 to 2 kHz) by a factor of two, and the upsampled signal is sampled at 4 kHz. In the signal output from the high frequency spectrum expansion unit 23, the 0 to 4 kHz band is the same as the input, and the high frequency band of 4 to 8 kHz has the same spectrum as a folded version of the input signal. This spectrum is used to estimate the high frequency band signal. Accordingly, the filtering unit 25 extracts the voice signal of the 4 to 8 kHz band. Thereafter, the synthesizer 26 synthesizes the voice signal of the 0 to 4 kHz band and the voice signal of the 4 to 8 kHz band, and the high frequency voice signal output from the synthesizer 26 is then mixed with the super-narrowband signal before extension (0 to 2 kHz) to finally restore the original sound.
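The folding effect described here can be checked numerically. The following sketch assumes that the upsampling is done by zero insertion without an anti-imaging filter, which is one common way to realize spectral folding; the patent text does not spell out the exact procedure. The helper names, the 64-sample test tone, and the bin numbers are illustrative only and do not correspond to the sampling rates given in the description.

```python
import cmath
import math

def upsample_zero_insert(x, factor=2):
    """Raise the sampling rate by inserting factor-1 zeros between samples."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2), fine for a demo)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def peak_bins(mag, rel=0.5):
    """Indices of bins whose magnitude exceeds rel times the largest bin."""
    top = max(mag)
    return [k for k, m in enumerate(mag) if m > rel * top]

N = 64
tone = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]  # tone in bin 5 of 64
wide = upsample_zero_insert(tone, 2)                           # 128 samples, doubled rate

peaks_in = peak_bins(dft_magnitude(tone))    # single peak at the tone
peaks_out = peak_bins(dft_magnitude(wide))   # tone plus its mirrored image
```

After zero insertion the old Nyquist frequency sits at bin 32 of the 128-point spectrum, so the tone at bin 5 stays where it was and a mirrored copy appears at bin 59. The lower half of the new band therefore matches the input and the upper half is its folded version.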

As described above, the bandwidth extension apparatus of the present invention enables the original sound to be restored even when a super-narrowband signal is input to the in-ear microphone. That is, whereas a typical high frequency reconstruction algorithm extends a 0 to 4 kHz signal up to 8 kHz, the present invention performs reconstruction on a narrowband signal below 2 kHz input to the in-ear microphone. In addition, the present invention can restore the original sound while significantly reducing the amount of computation.

Compared with the conventional speech synthesis apparatus shown in FIG. 2, the function of extending the excitation signal after linear predictive coding is retained as it is, but the function of the spectral envelope expansion unit is removed. While the conventional speech synthesis apparatus predicts and extends the frequency through a linear predictive coding based algorithm, the present invention does not perform this operation and instead extends the frequency simply through high frequency spectrum extension. That is, the operation of estimating and extending the frequency in real time is omitted, and the frequency is merely extended using a rectifier, spectral folding, or modulation technique. This can greatly reduce the amount of computation.
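Two of the simple extension techniques named above, the rectifier and modulation, can be sketched in a few lines to show why they are cheap: each is a single pass over the samples. This is an illustrative sketch under assumptions, not the patent's implementation. It assumes a full-wave rectifier and a cosine carrier, and the helper names, test tone, and carrier bin are hypothetical.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2), fine for a demo)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def full_wave_rectify(x):
    """Nonlinear extension: |x| creates harmonics above the input band."""
    return [abs(v) for v in x]

def modulate(x, carrier_bin, n):
    """Linear extension: multiplying by a cosine shifts the spectrum to carrier +/- f."""
    return [v * math.cos(2 * math.pi * carrier_bin * t / n) for t, v in enumerate(x)]

N = 128
tone = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]  # tone in bin 5

rect_mag = dft_magnitude(full_wave_rectify(tone))
# Strongest component apart from DC: the second harmonic of the tone.
strongest_harmonic = max(range(1, len(rect_mag)), key=lambda k: rect_mag[k])

mod_mag = dft_magnitude(modulate(tone, 40, N))
mod_peaks = [k for k, m in enumerate(mod_mag) if m > 0.5 * max(mod_mag)]
```

The rectified tone gains energy at twice its frequency (bin 10) and at higher even harmonics, while the modulated tone moves to bins 35 and 45 on either side of the carrier. Neither operation needs a trained model or a codebook search, which is consistent with the reduction in computation that the description attributes to simple extension.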

When the high frequency spectrum expansion unit 23 outputs a wideband signal by simply extending the frequency as described above, linear prediction analysis is performed on that signal, and then only simple filtering with a filter is performed, without frequency extension through linear prediction modeling. In other words, filtering that brings the signal close to the original sound is performed without bandwidth extension. Subsequently, a high frequency signal is generated by combining the filtered result with the extended excitation signal. Finally, when the high frequency signal and the super-narrowband signal input through the in-ear microphone are mixed, the original sound is restored.
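The "simple filtering" step can be pictured as an ordinary high-pass filter that keeps only the upper band. The sketch below is one possible realization under stated assumptions: the patent does not specify the filter type, so a windowed-sinc FIR design with a Hamming window is used here, and the 16 kHz sampling rate, 4 kHz cutoff, tap count, and function names are hypothetical values chosen to match the 4 to 8 kHz band mentioned in the description.

```python
import math

def highpass_fir(cutoff, num_taps=101):
    """Windowed-sinc high-pass FIR taps.

    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5).
    Built by spectral inversion of a Hamming-windowed low-pass filter.
    """
    assert num_taps % 2 == 1, "spectral inversion needs an odd tap count"
    mid = num_taps // 2
    lowpass = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * cutoff if m == 0 else math.sin(2 * math.pi * cutoff * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    gain = sum(lowpass)                       # normalize low-pass DC gain to 1
    taps = [-v / gain for v in lowpass]
    taps[mid] += 1.0                          # delta minus low-pass = high-pass
    return taps

def fir_filter(x, taps):
    """Direct-form FIR convolution, output has the same length as x."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 16000.0                                  # assumed sampling rate
taps = highpass_fir(4000.0 / fs)              # keep the band above 4 kHz
low = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(800)]
high = [math.sin(2 * math.pi * 6000.0 * t / fs) for t in range(800)]

# Compare steady-state levels, skipping the filter's start-up transient.
low_gain = rms(fir_filter(low, taps)[200:]) / rms(low[200:])
high_gain = rms(fir_filter(high, taps)[200:]) / rms(high[200:])
```

With these settings a 1 kHz tone is strongly attenuated and a 6 kHz tone passes almost unchanged. A fixed FIR filter like this costs one multiply-accumulate per tap per sample, which fits the kind of DSP budget the description is concerned with.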

Referring to FIG. 4, the bandwidth extension of the present invention may be performed in the DSP of the earset, for example a Bluetooth chipset (DSP). In this case, the significantly reduced amount of computation enables real-time processing in the Bluetooth chipset and minimizes wireless transmission delay. Of course, the earset and the smartphone may also be connected by wire.

Referring to FIG. 5, the bandwidth extension of the present invention may instead be performed in a smartphone or the like. In this case, the earset and the smartphone may be connected by wire, and real-time processing is possible in the smartphone chipset. Of course, the earset and the smartphone may also be connected wirelessly.

Next, the bandwidth extension method for an earset having an in-ear microphone, which uses the bandwidth extension apparatus configured as described above, is described.

Referring to FIG. 6, when a super-narrowband signal is input to the in-ear microphone (S1), an excitation signal is determined from the input super-narrowband signal (S2), and the determined excitation signal is extended into a wideband excitation signal (S3).

Meanwhile, the frequency of the input super-narrowband signal is multiplied to extend it into a wideband signal including a high frequency band signal (S4).

The high frequency band signal is then estimated and determined from the extended wideband signal (S5).

Subsequently, the estimated and determined high frequency band signal is filtered (S6).

A high frequency signal is then generated by combining the filtered high frequency band signal and the wideband excitation signal (S7).

Finally, the high frequency signal and the super-narrowband signal are mixed to restore the original sound (S8).
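The data flow of steps S1 to S8 can be summarized as a small wiring sketch. It only shows how the two branches (S2 to S3 and S4 to S6) feed the synthesis and mixing steps; every stage is passed in as a function, and the stand-in stages used in the demonstration (`identity`, `repeat2`, `add`) are hypothetical placeholders with no signal-processing meaning. The function and parameter names are likewise assumptions, not names from the patent.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Stage = Callable[[Signal], Signal]
Combine = Callable[[Signal, Signal], Signal]

def extend_bandwidth(
    narrowband: Sequence[float],
    determine_excitation: Stage,   # S2
    extend_excitation: Stage,      # S3
    multiply_frequency: Stage,     # S4
    estimate_high_band: Stage,     # S5
    filter_high_band: Stage,       # S6
    synthesize: Combine,           # S7
    mix: Combine,                  # S8
) -> Signal:
    x = list(narrowband)                                        # S1: input signal
    wide_excitation = extend_excitation(determine_excitation(x))
    high_band = filter_high_band(estimate_high_band(multiply_frequency(x)))
    high_signal = synthesize(high_band, wide_excitation)
    return mix(high_signal, x)

# Demonstration with placeholder stages that only record the call order.
calls: List[str] = []

def traced(name, fn):
    def wrapper(*args):
        calls.append(name)
        return fn(*args)
    return wrapper

def identity(x):
    return list(x)

def repeat2(x):
    return [v for v in x for _ in range(2)]    # stand-in for a 2x rate change

def add(a, b):
    return [p + q for p, q in zip(a, b)]

restored = extend_bandwidth(
    [1.0, 2.0],
    determine_excitation=traced("S2", identity),
    extend_excitation=traced("S3", repeat2),
    multiply_frequency=traced("S4", repeat2),
    estimate_high_band=traced("S5", identity),
    filter_high_band=traced("S6", identity),
    synthesize=traced("S7", add),
    mix=traced("S8", lambda high, nb: add(high, repeat2(nb))),
)
```

Passing the stages in as functions keeps the sketch neutral about how each step is realized, so a real linear prediction analyzer, extension technique, or filter could be substituted without changing the wiring.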

As described above, the present invention consists of a region in which the excitation signal is extended from the super-narrowband signal (0 to 2 kHz) through linear prediction, and a region in which simple frequency extension is performed on the super-narrowband signal, the high frequency signal is estimated by linear prediction from the simply extended signal, and the predicted high frequency signal is simply filtered. Thereafter, the extended excitation signal and the filtered high frequency signal are synthesized to generate a high frequency signal, and a wideband signal (0 to 8 kHz) is then generated from the high frequency signal and the super-narrowband signal. As the simple extension technique in the high frequency spectrum expansion unit 23, a rectifier, spectral folding, or a modulation technique may be used.

The technical spirit of the present invention has been described through several embodiments.

It will be apparent to those skilled in the art that the present invention may be variously modified or changed on the basis of the above description. In addition, even if not explicitly shown or described, it is obvious that those skilled in the art to which the present invention pertains can make various modifications that include the technical idea of the present invention on the basis of this description, and such modifications still belong to the scope of the present invention. The above embodiments described with reference to the accompanying drawings are described for the purpose of illustrating the present invention, and the scope of the present invention is not limited to these embodiments.

Claims

1. A bandwidth extension apparatus for an earset having an in-ear microphone, comprising:

a high frequency signal generation unit for generating a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and

a mixing unit for mixing the high frequency signal and the super-narrowband signal.

2. The apparatus of claim 1,

wherein the high frequency signal generation unit comprises:

a first linear prediction analyzer for determining the excitation signal from the super-narrowband signal;

an excitation signal expansion unit for extending the determined excitation signal into a wideband excitation signal;

a high frequency spectrum expansion unit for multiplying the frequency of the super-narrowband signal (N times) to extend it into a wideband signal including a high frequency band signal;

a second linear prediction analyzer for estimating and determining the high frequency band signal from the extended wideband signal;

a filtering unit for filtering the high frequency band signal output from the second linear prediction analyzer; and

a synthesizer for synthesizing the high frequency band signal output from the filtering unit and the wideband excitation signal output from the excitation signal expansion unit.

3. The apparatus of claim 2,

wherein the extension of the excitation signal uses either a spectral folding technique or a Gaussian noise passband conversion technique.

4. The apparatus of claim 2,

wherein the extension into the wideband signal uses any one of a rectifier, spectral folding, and modulation techniques.

5. The apparatus of claim 1,

wherein the high frequency signal generation unit and the mixing unit are configured in a circuit of the earset.

6. The apparatus of claim 5,

wherein the earset includes a Bluetooth chipset.

7. The apparatus of claim 1,

wherein the high frequency signal generation unit and the mixing unit are configured in a circuit of a smartphone.

8. A bandwidth extension method for an earset having an in-ear microphone, comprising:

(a) generating a high frequency signal by synthesizing an excitation signal extended from an input super-narrowband signal with a high frequency band signal obtained by multiplying the frequency of the super-narrowband signal, extending it, and filtering it; and

(b) mixing the high frequency signal with the super-narrowband signal.

9. The method of claim 8,

wherein step (a) comprises:

determining the excitation signal from the super-narrowband signal;

extending the determined excitation signal into a wideband excitation signal;

multiplying the frequency of the super-narrowband signal (N times) to extend it into a wideband signal including a high frequency band signal;

estimating and determining the high frequency band signal from the extended wideband signal;

filtering the determined high frequency band signal; and

synthesizing the filtered high frequency band signal and the extended wideband excitation signal.