JP2012208177A - Band extension device and sound correction device

Info

Publication number: JP2012208177A
Application number: JP2011071837A
Authority: JP
Inventors: Hiroyuki Noto; 広之野戸; Daigo Yamada; 大悟山田; Daisuke Takagi; 大介高木; Yasuhiro Kume; 康裕久米; Hiroki Yamauchi; 寛紀山内; Yoshihiro Nishio; 嘉浩西尾; Kohei Shimada; 浩平島田
Original assignee: NIPPON LOGICS KK; SYSTEM ADVANCE CORP; Takumi Vision Co Ltd
Current assignee: NIPPON LOGICS KK; SYSTEM ADVANCE CORP; Takumi Vision Co Ltd
Priority date: 2011-03-29
Filing date: 2011-03-29
Publication date: 2012-10-25

Abstract

PROBLEM TO BE SOLVED: To provide a band extension device capable of improving auditory recognizability. SOLUTION: A band extension device includes: time frame division means for dividing a time-domain input signal into time frames; Fourier transformation means for performing a Fourier transformation on each time frame to generate a frequency-domain original sound spectrum; harmonic spectrum generation means for generating a harmonic spectrum based on the original sound spectrum; harmonic spectrum addition means for adding the harmonic spectrum to the original sound spectrum; inverse Fourier transformation means for performing an inverse Fourier transformation on the original sound spectrum to which the harmonic spectrum has been added to generate time-domain output signal components; and output signal generation means for adding the output signal components together to generate an output signal whose frequency band is extended.

Description

The present invention relates to a band extension device for extending the frequency band of an input signal. The present invention also relates to a sound correction device for correcting a speech signal from a bone-conduction microphone.

Band extension devices for extending the frequency band of an input signal have been proposed (see, for example, Patent Document 1 and Patent Document 2). A conventional band extension device includes low-frequency signal generation means for generating a low-frequency signal from the input signal, high-frequency signal generation means for generating a high-frequency signal from the input signal, and addition means for adding the generated low-frequency and high-frequency signals to the input signal. The low-frequency signal generation means oversamples the input signal, applies full-wave rectification (a nonlinear process), and band-pass filters the rectified signal in the frequency domain to generate the low-frequency signal. The high-frequency signal generation means oversamples the input signal and then copies a low-frequency band of the input signal to the high-frequency side in the frequency domain to generate the high-frequency signal. Adding the low-frequency and high-frequency signals generated in this way to the input signal yields an output signal with an extended frequency band. Extending the frequency band in this way can improve, for example, telephone call quality and the playback quality of digital audio equipment.

Sound correction devices for correcting a speech signal from a bone-conduction microphone have also been proposed (see, for example, Patent Document 3). A bone-conduction microphone is a type of vibration pickup that records, at the forehead, jaw, cheek, ear canal, or the like, the bone vibration caused by vocal cord vibration during speech, so that the recorded signal can be used as a substitute for the actual speech. Such a bone-conduction microphone can achieve a higher S/N ratio than an air-conduction microphone and is effective as a speech input means in high-noise environments. A conventional sound correction device includes a bone-conduction microphone; time frame division means for dividing the speech signal from the bone-conduction microphone into time frames; LPC analysis means for analyzing pitch frequency data and bone-conduction speech feature parameters based on the time frames; signal conversion means for converting the analyzed bone-conduction speech feature parameters into pseudo air-conduction speech feature parameters; LPC synthesis means for generating a pseudo air-conduction speech signal based on the pitch frequency data and the pseudo air-conduction speech feature parameters; and smoothing means for converting the pseudo air-conduction speech signal into a long-duration signal.

Patent Document 1: Japanese Patent No. 3301473
Patent Document 2: JP 2003-15695 A
Patent Document 3: Japanese Patent No. 3306784

First, the conventional band extension device described above has the following problem. In general, an important factor characterizing an input signal is the frequency at a peak of its spectral envelope (hereinafter, "peak frequency"). In speech, the peak frequencies of the spectral envelope are the formant peak frequencies, which are important for identifying the phonemes produced by the vocal tract transmission system. In musical sound, the peak frequencies of the spectral envelope are the resonance frequencies of the resonant system determined by the type of instrument and the structure of its parts, and these resonance frequencies are important in characterizing the sound. Because such spectral-envelope peaks arise from resonance, there are generally higher-order resonance modes at frequencies that are integer multiples of the first-order resonance mode. Therefore, when generating a high-frequency signal, it is important to place the peaks of the higher-order resonance modes of the spectral envelope at appropriate frequencies, just as with the overtones of the fundamental frequency.

However, in the conventional band extension device described above, the amount by which the low-frequency band of the input signal is shifted when it is copied to the high-frequency side in the frequency domain is unrelated to the peak frequencies of the spectral envelope, so the peaks of the higher-order resonance modes of the spectral envelope cannot be reproduced in the high-frequency signal. It is therefore difficult to obtain an output signal with natural characteristics, and the achievable improvement in auditory recognizability is limited. In addition, the low-frequency signal is generated by full-wave rectification after oversampling the input signal; with this configuration, when noise is mixed into the input signal, the noise has a large effect near the zero crossings, which determine the rectification timing. The output sound quality may therefore be degraded.

Second, the conventional sound correction device described above has the following problem. The conventional sound correction device is configured to analyze pitch frequency data and bone-conduction speech feature parameters from the speech signal, but because of the characteristics of a bone-conduction microphone, pitch frequency data cannot be obtained when the speech signal is unvoiced. Consequently, a pseudo air-conduction speech signal cannot be generated for unvoiced sounds, the output speech becomes unnatural, and it is difficult to improve auditory recognizability.

An object of the present invention is to provide a band extension device and a sound correction device capable of improving auditory recognizability.

The band extension device according to claim 1 of the present invention is a band extension device for extending the frequency band of an input signal,
comprising: time frame division means for dividing a time-domain input signal into time frames; Fourier transform means for Fourier transforming each time frame to generate a frequency-domain original sound spectrum; harmonic spectrum generation means for generating a harmonic spectrum based on the original sound spectrum; harmonic spectrum addition means for adding the harmonic spectrum to the original sound spectrum; inverse Fourier transform means for inverse Fourier transforming the original sound spectrum to which the harmonic spectrum has been added to generate time-domain output signal components; and output signal generation means for adding the output signal components together to generate an output signal with an extended frequency band,
wherein the harmonic spectrum generation means calculates the frequency of an original sound spectrum component included in the original sound spectrum and sets an integer multiple of the calculated frequency as the frequency of a harmonic spectrum component included in the harmonic spectrum.

In the band extension device according to claim 2 of the present invention, the harmonic spectrum generation means calculates the phase angle of the original sound spectrum component included in the original sound spectrum and sets an integer multiple of the calculated phase angle as the phase angle of the harmonic spectrum component included in the harmonic spectrum.

In the band extension device according to claim 3 of the present invention, the harmonic spectrum generation means includes original sound spectrum analysis means for analyzing the original sound spectrum, and weighting means for weighting, with a predetermined weighting coefficient, the magnitude of the harmonic spectrum component included in the harmonic spectrum based on the analysis result of the original sound spectrum analysis means.

In the band extension device according to claim 4 of the present invention, the original sound spectrum analysis means is configured to analyze the slope of the envelope of the original sound spectrum, and the weighting means changes the weighting coefficient based on the analyzed slope of the envelope of the original sound spectrum.

In the band extension device according to claim 5 of the present invention, the harmonic spectrum generation means includes interpolation means for interpolating harmonic spectrum components, and the interpolation means interpolates a harmonic spectrum component whose frequency lies between an integer multiple of the frequency of a first original sound spectrum component included in the original sound spectrum and the same integer multiple of the frequency of a second original sound spectrum component adjacent to the first original sound spectrum component.

In the band extension device according to claim 6 of the present invention, the input signal is a speech signal or a musical sound signal generated by a microphone, a vibration pickup, a bone-conduction microphone, digital audio equipment, a telephone system, an artificial larynx, or the like.

The sound correction device according to claim 7 of the present invention comprises: a bone-conduction microphone; time frame division means for dividing the speech signal from the bone-conduction microphone into time frames; speech characteristic analysis means for analyzing the speech characteristics of the speech signal in each time frame; speech property determination means for determining, based on the analysis result, whether the speech signal in the time frame is voiced or unvoiced; first signal correction means for correcting a speech signal determined to be voiced to generate a pseudo air-conduction speech signal; second signal correction means for correcting a speech signal determined to be unvoiced to generate a pseudo air-conduction speech signal; correction mode switching means for switching between a voiced sound correction mode using the first signal correction means and an unvoiced sound correction mode using the second signal correction means; and output signal generation means for adding the generated pseudo air-conduction speech signals together to generate an output signal.

The sound correction device according to claim 8 of the present invention further comprises parameter storage means in which bone-conduction vocal tract characteristic parameters, air-conduction vocal tract characteristic parameters, and air-conduction sound source characteristic parameters are stored, and the speech characteristic analysis means is configured to analyze the bone-conduction sound source characteristic and the bone-conduction vocal tract characteristic of the speech signal in the time frame.
The first signal correction means reads from the storage means, based on the bone-conduction vocal tract characteristic of the speech signal analyzed by the speech characteristic analysis means, the corresponding air-conduction vocal tract characteristic parameter, and synthesizes the bone-conduction sound source characteristic of the speech signal analyzed by the speech characteristic analysis means with the air-conduction vocal tract characteristic parameter to generate a pseudo air-conduction speech signal.

In the sound correction device according to claim 9 of the present invention, the second signal correction means reads from the storage means, based on the bone-conduction vocal tract characteristic of the speech signal analyzed by the speech characteristic analysis means, the corresponding air-conduction vocal tract characteristic parameter and air-conduction sound source characteristic parameter, and synthesizes the air-conduction vocal tract characteristic parameter and the air-conduction sound source characteristic parameter thus read to generate a pseudo air-conduction speech signal.

According to the band extension device of claim 1 of the present invention, the harmonic spectrum generation means calculates the frequency of an original sound spectrum component included in the original sound spectrum and sets an integer multiple of the calculated frequency as the frequency of a harmonic spectrum component included in the harmonic spectrum, so the peaks of the higher-order resonance modes of the spectral envelope can be reproduced in the output signal. An output signal with natural characteristics can therefore be obtained, and auditory recognizability can be improved. In addition, the S/N ratio of the original sound spectrum components can be reproduced in the harmonic spectrum components, which makes the device less susceptible to noise.

According to the band extension device of claim 2 of the present invention, the harmonic spectrum generation means calculates the phase angle of the original sound spectrum component included in the original sound spectrum and sets an integer multiple of the calculated phase angle as the phase angle of the harmonic spectrum component included in the harmonic spectrum, so the time relationship between the original sound spectrum and the harmonic spectrum can be kept constant. As a result, when the time-domain output signal components are added together, the harmonic components of the individual output signal components are prevented from canceling one another, and the output signal can be generated accurately.

According to the band extension device of claim 3 of the present invention, the weighting means weights, with a predetermined weighting coefficient, the magnitude of the harmonic spectrum component included in the harmonic spectrum based on the analysis result of the original sound spectrum analysis means. This makes it possible, for example, to generate a harmonic spectrum that takes the phoneme of the original sound spectrum into account, and to obtain an output signal with more natural characteristics.

According to the band extension device of claim 4 of the present invention, the weighting means changes the weighting coefficient based on the analyzed slope of the envelope of the original sound spectrum. For example, when the slope of the envelope of the original sound spectrum is large, the weighting coefficient is set small, which reduces the magnitude of the harmonic spectrum components. Vowel spectra generally show strong attenuation of the spectral components on the high-frequency side, so setting the weighting coefficient small in this way allows vowels to be restored more realistically. Conversely, when the slope of the envelope of the original sound spectrum is small, the weighting coefficient is set large, which increases the magnitude of the harmonic spectrum components. Consonant spectra generally show little attenuation of the spectral components on the high-frequency side, so setting the weighting coefficient large in this way allows consonants to be restored more realistically.
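The slope-dependent weighting described above can be illustrated with a minimal sketch. The text in this section does not specify how the slope is measured or how it maps to a coefficient, so everything here is an assumption for illustration: the slope is taken as a least-squares fit of the log-magnitude spectrum in dB per bin, and the hypothetical parameters `steep`, `w_min`, and `w_max` define a simple linear mapping.

```python
import math

def envelope_slope_db_per_bin(magnitudes):
    """Least-squares slope of 20*log10(magnitude) against bin index.

    Assumption: a straight-line fit stands in for the envelope slope
    analysis, whose actual method is not given in this section.
    Requires at least two magnitudes.
    """
    n = len(magnitudes)
    ys = [20.0 * math.log10(max(m, 1e-12)) for m in magnitudes]
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def weighting_coefficient(slope, steep=-0.5, w_min=0.1, w_max=1.0):
    """Map a slope to a weight: steep decay -> small weight, flat -> large.

    steep, w_min and w_max are hypothetical tuning values.
    """
    if slope >= 0.0:
        return w_max
    if slope <= steep:
        return w_min
    # Linear blend between w_max (slope 0) and w_min (slope == steep).
    return w_max + (w_min - w_max) * (slope / steep)
```

With these assumed values, a spectrum that falls by 1 dB per bin (vowel-like) receives the minimum weight, while a flat spectrum (consonant-like) receives the maximum weight.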

According to the band extension device of claim 5 of the present invention, the harmonic spectrum generation means includes interpolation means for interpolating harmonic spectrum components. Interpolating the harmonic spectrum components in this way allows the harmonic spectrum to be generated more accurately.

According to the band extension device of claim 6 of the present invention, the input signal is a speech signal or a musical sound signal generated by a microphone, a vibration pickup, a bone-conduction microphone, digital audio equipment, a telephone system, an artificial larynx, or the like. Such sources have difficulty outputting the high-frequency range because of their limited characteristics, but applying the band extension device of the present invention to them makes it possible to reproduce a natural high-frequency range and to improve auditory recognizability.

According to the sound correction device of claim 7 of the present invention, correction mode switching means is provided for switching between the voiced sound correction mode using the first signal correction means and the unvoiced sound correction mode using the second signal correction means, so a pseudo air-conduction speech signal can be generated not only for voiced sounds, as in the prior art, but also for unvoiced sounds. An output signal with natural characteristics can therefore be obtained, and auditory recognizability can be improved.

According to the sound correction device of claim 8 of the present invention, the first signal correction means reads from the storage means, based on the bone-conduction vocal tract characteristic of the speech signal analyzed by the speech characteristic analysis means, the corresponding air-conduction vocal tract characteristic parameter, and synthesizes the bone-conduction sound source characteristic of the speech signal analyzed by the speech characteristic analysis means with the air-conduction vocal tract characteristic parameter to generate a pseudo air-conduction speech signal. A pseudo air-conduction speech signal can thus be generated easily and accurately for voiced sounds.

According to the sound correction device of claim 9 of the present invention, the second signal correction means reads from the storage means, based on the bone-conduction vocal tract characteristic of the speech signal analyzed by the speech characteristic analysis means, the corresponding air-conduction vocal tract characteristic parameter and air-conduction sound source characteristic parameter, and synthesizes the air-conduction vocal tract characteristic parameter and the air-conduction sound source characteristic parameter thus read to generate a pseudo air-conduction speech signal. A pseudo air-conduction speech signal can thus be generated easily and accurately for unvoiced sounds.

FIG. 1 is a block diagram showing the configuration of a band extension device according to one embodiment of the present invention.
FIG. 2 is a diagram for explaining the state in which an input signal is divided into time frames.
FIG. 3 is a diagram for explaining the process of generating an output signal.
FIG. 4(a) is a spectrum diagram showing an original sound spectrum to which a harmonic spectrum has been added, FIG. 4(b) is a diagram showing, for example, one original sound spectrum component of the original sound spectrum on the complex plane, and FIG. 4(c) is a diagram showing, on the complex plane, one harmonic spectrum component corresponding to the original sound spectrum component of FIG. 4(b).
FIG. 5 is a block diagram showing the configuration of a band extension device according to another embodiment of the present invention.
FIG. 6 is a diagram for explaining a method of analyzing the slope of the envelope of an original sound spectrum.
FIG. 7(a) is a spectrum diagram showing an original sound spectrum whose phoneme is a vowel, and FIG. 7(b) is a spectrum diagram showing the original sound spectrum to which a harmonic spectrum has been added.
FIG. 8(a) is a spectrum diagram showing an original sound spectrum whose phoneme is a consonant, and FIG. 8(b) is a spectrum diagram showing the original sound spectrum to which a harmonic spectrum has been added.
FIG. 9 is a block diagram showing the configuration of a sound correction device according to one embodiment of the present invention.
FIG. 10 is a diagram showing the correspondence among the parameters stored in the parameter storage unit.
FIG. 11 is a block diagram showing the configuration of a parameter creation device for creating air-conduction vocal tract characteristic parameters.
FIG. 12 is a block diagram showing a parameter creation device for creating bone-conduction vocal tract characteristic parameters and air-conduction sound source characteristic parameters.

Hereinafter, various embodiments of a band extension device and a sound correction device according to the present invention will be described with reference to the accompanying drawings.
[Embodiment of the Band Extension Device]
First, one embodiment of the band extension device will be described with reference to FIGS. 1 to 4. FIG. 1 is a block diagram showing the configuration of a band extension device according to one embodiment of the present invention, FIG. 2 is a diagram for explaining the state in which an input signal is divided into time frames, and FIG. 3 is a diagram for explaining the process of generating an output signal. FIG. 4(a) is a spectrum diagram showing an original sound spectrum to which a harmonic spectrum has been added, FIG. 4(b) is a diagram showing, for example, one original sound spectrum component of the original sound spectrum on the complex plane, and FIG. 4(c) is a diagram showing, on the complex plane, one harmonic spectrum component corresponding to the original sound spectrum component of FIG. 4(b).

Referring to FIG. 1, the illustrated band extension device 2 includes an input terminal 4, an oversampling low-pass filter (LPF) 6, a time frame division unit 8 (constituting the time frame division means), a Fourier transform unit 10 (constituting the Fourier transform means), a harmonic spectrum generation unit 12 (constituting the harmonic spectrum generation means), a harmonic spectrum addition unit 14 (constituting the harmonic spectrum addition means), an inverse Fourier transform unit 16 (constituting the inverse Fourier transform means), an output signal generation unit 18 (constituting the output signal generation means), and an output terminal 20.

A time-domain input signal sampled at a sampling frequency of 8 kHz is input to the input terminal 4. The input signal is, for example, a speech signal in a telephone system or a musical sound signal in digital audio equipment.

The oversampling low-pass filter 6 oversamples the input signal from the input terminal 4, raising its sampling frequency from 8 kHz to 16 kHz, and attenuates the frequency band of the input signal at 4 kHz and above.

Using a Hanning window as the window function, the time frame division unit 8 divides the time-domain input signal output from the oversampling low-pass filter 6 into time frames of a predetermined length (for example, 16 msec) (see FIG. 2). The frames are divided so that both ends of each time frame overlap the adjacent time frames on either side. From each time frame divided in this way, 256 samples are taken.
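As a rough illustration of this framing step, the following sketch splits a signal into Hanning-windowed frames of 256 samples (16 msec at 16 kHz). The amount of overlap is not stated in the text above, so the 50% overlap (`hop=128`) is an assumption, as are the function names.

```python
import math

def hann_window(n):
    """Periodic Hanning window: w[i] = 0.5 - 0.5*cos(2*pi*i/n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def split_into_frames(signal, frame_len=256, hop=128):
    """Return overlapping, windowed frames of the signal.

    frame_len=256 matches 16 msec at a 16 kHz sampling frequency.
    hop=128 (50% overlap) is an assumed value, not taken from the text.
    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames
```

A periodic Hanning window with a hop of half the frame length has the convenient property that overlapping windows sum to one, which is why this combination is a common choice when the frames are later recombined by addition.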

The Fourier transform unit 10 applies a Fourier transform (short-time Fourier transform) to each of the time frames divided by the time frame division unit 8 to generate a frequency-domain original sound spectrum. In this embodiment, the frequency band of each original sound spectrum is 0 to 4 kHz.

The harmonic spectrum generation unit 12 includes an original sound spectrum component extraction unit 22, a harmonic spectrum calculation unit 24, and an interpolation calculation unit 26 (constituting the interpolation means). The original sound spectrum component extraction unit 22 extracts a plurality of original sound spectrum components from the 2 to 4 kHz band of the original sound spectrum. The harmonic spectrum calculation unit 24 calculates harmonic spectrum components from the extracted original sound spectrum components, as described later, and combines the calculated harmonic spectrum components to generate a harmonic spectrum having a frequency band of 4 to 8 kHz. The interpolation calculation unit 26 interpolates harmonic spectrum components, also as described later.

The harmonic spectrum addition unit 14 adds the harmonic spectrum generated as described above to the original sound spectrum. This yields an original sound spectrum extended by the 4 to 8 kHz band.

The inverse Fourier transform unit 16 generates a time-domain output signal component by applying an inverse Fourier transform to the original sound spectrum to which the harmonic spectrum has been added.

The output signal generation unit 18 generates a time-domain output signal by adding together the output signal components generated as described above. The output signal components are added so that both ends of each component overlap the adjacent output signal components on either side. The output signal generated in this way is output to the outside from the output terminal 20.
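
The overlapped addition of output signal components is commonly called overlap-add. The sketch below assumes the same hypothetical 50% overlap (hop of 128 samples) and uses an illustrative function name. The short demo at the end shows why a Hanning analysis window suits this scheme: at 50% overlap, shifted periodic Hann windows sum to one, so an unmodified signal would be reconstructed without amplitude ripple.

```python
import math

def overlap_add(frames, hop):
    """Sum equal-length frames, each shifted by `hop` samples from the previous one."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out

# Demo: three Hann windows at 50% overlap sum to 1 where two windows overlap.
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / 256) for i in range(256)]
total = overlap_add([window, window, window], hop=128)
```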

Next, the flow by which the band extension device 2 generates an output signal is described with reference also to FIGS. 3 and 4. First, the time-domain input signal T_in (an analog signal) input from the input terminal 4 is converted into a digital signal by an A/D conversion unit (not shown). The oversampling low-pass filter 6 then oversamples it from a sampling frequency of 8 kHz to 16 kHz and attenuates the frequency band at and above 4 kHz (see FIG. 3(a)). Next, the time frame division unit 8 divides this input signal into time frames T_n (n = 1, 2, 3, ...) (see FIG. 3(b)). The Fourier transform unit 10 applies a Fourier transform to each time frame T_n, generating a frequency-domain original sound spectrum X_n (see FIG. 3(c)). Each original sound spectrum X_n has a frequency band of 0 to 4 kHz.

Thereafter, the harmonic spectrum generation unit 12 generates a harmonic spectrum X'_n from the original sound spectrum X_n as follows. First, the original sound spectrum component extraction unit 22 extracts a plurality of original sound spectrum components X(f_n) from the 2 to 4 kHz band of the original sound spectrum X_n. The original sound spectrum component X(f_n) is expressed by equation (1).
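
Equation (1) itself is not reproduced in this text. Since the description that follows works with the magnitude |X(f_n)| and the phase angle θ_n of each component, it is presumably the polar form of a complex spectral value. The following is a reconstruction under that assumption, not the patent's own typesetting:

```latex
X(f_n) = |X(f_n)|\, e^{j\theta_n} \tag{1}
```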

Next, the harmonic spectrum calculation unit 24 calculates a harmonic spectrum component X'(2f_n) from each extracted original sound spectrum component X(f_n). Specifically, as shown in FIGS. 4(a) to 4(c), the harmonic spectrum calculation unit 24 calculates the frequency f_n of the original sound spectrum component X(f_n) and sets twice that frequency, 2f_n, as the frequency of the harmonic spectrum component X'(2f_n). It also calculates the phase angle θ_n of X(f_n) and sets twice that phase angle, 2θ_n, as the phase angle of X'(2f_n). Further, it calculates the magnitude (level) |X(f_n)| of X(f_n) and sets half that magnitude, (1/2)|X(f_n)|, as the magnitude of X'(2f_n). In FIG. 4, the original sound spectrum is denoted X(f) and the harmonic spectrum X'(f). In this embodiment, the weighting coefficient applied to the magnitude of the harmonic spectrum component X'(2f_n) is 1/2 (constant). The harmonic spectrum component X'(2f_n) is expressed by equation (2).
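
From the three rules above (frequency doubled, phase angle doubled, magnitude halved), equation (2) is presumably of the form X'(2f_n) = (1/2)|X(f_n)| e^{j2θ_n}. A minimal sketch of that mapping on DFT bins follows. The function names and the bin-index representation are assumptions made for illustration. With a 256-point transform at 16 kHz the bin spacing is 62.5 Hz, so the 2 to 4 kHz band corresponds to bins 32 to 64, and the doubled frequencies land on the even bins 64 to 128.

```python
import cmath

def harmonic_component(x_fn, multiple=2, weight=0.5):
    """Build X'(m*f_n) from one complex original component X(f_n).

    The magnitude is scaled by `weight` and the phase angle by `multiple`.
    """
    magnitude, theta = cmath.polar(x_fn)
    return cmath.rect(weight * magnitude, multiple * theta)

def harmonic_spectrum(spectrum, lo_bin, hi_bin, multiple=2, weight=0.5):
    """Map bins lo_bin..hi_bin (inclusive) of `spectrum` to bins multiple*k.

    Returns a dict {target_bin: complex value}; target bins not produced by
    the multiplication (odd bins for multiple=2) are left out.
    """
    return {multiple * k: harmonic_component(spectrum[k], multiple, weight)
            for k in range(lo_bin, hi_bin + 1)}
```

Because only every second target bin is filled when the frequency is doubled, the bins in between have to be obtained some other way, which is the role of the interpolation calculation unit 26.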

In parallel with the calculation described above, the interpolation calculation unit 26 obtains a harmonic spectrum component X'(2f_n+1) whose frequency 2f_n+1 lies between 2f_n, twice the frequency f_n of a first original sound spectrum component X(f_n), and 2f_n+2, twice the frequency f_n+1 of a second original sound spectrum component X(f_n+1), for example by the interpolation of equation (3). In equation (3), the phase angle of the harmonic spectrum component X'(2f_n+1) is calculated by multiplying the phase angles of the first original sound spectrum component X(f_n) and the second original sound spectrum component X(f_n+1) each by the corresponding frequency ratio.
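
Equation (3) is not reproduced in this text, so the exact interpolation rule is unknown. The sketch below is one plausible reading, stated as an assumption: the magnitude is the weighted mean of the two neighbouring magnitudes, and the phase is the mean of the two source phase angles after each has been scaled by the ratio of the target frequency to its own source frequency. Frequencies are expressed as DFT bin indices k and k+1, and the function name is hypothetical.

```python
import cmath

def interpolate_odd_bin(x_a, x_b, k, weight=0.5):
    """Estimate X'(2k+1) from X(k) and X(k+1), with k a DFT bin index (k >= 1).

    Assumed rule (NOT the patent's equation (3)):
      magnitude = weight * mean(|X(k)|, |X(k+1)|)
      phase     = mean(theta_k * (2k+1)/k, theta_{k+1} * (2k+1)/(k+1))
    """
    mag_a, theta_a = cmath.polar(x_a)
    mag_b, theta_b = cmath.polar(x_b)
    target = 2 * k + 1
    phase = 0.5 * (theta_a * target / k + theta_b * target / (k + 1))
    return cmath.rect(weight * 0.5 * (mag_a + mag_b), phase)
```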

The harmonic spectrum components X'(2f_n) and X'(2f_n+1) calculated in this way are combined by the harmonic spectrum calculation unit 24 to generate a harmonic spectrum X'(f) having a frequency band of 4 to 8 kHz (see FIG. 4(a)).

The harmonic spectrum X'_n generated as described above is added to the original sound spectrum X_n by the harmonic spectrum addition unit 14 (see FIG. 3(d)). This yields an original sound spectrum X_n extended by the 4 to 8 kHz band.

Each original sound spectrum X_n to which the harmonic spectrum X'_n has been added is inverse Fourier transformed by the inverse Fourier transform unit 16, generating a time-domain output signal component T_n' (see FIG. 3(e)). The output signal components T_n' are added together by the output signal generation unit 18 to generate the output signal T_out (see FIG. 3(f)). The output signal T_out is converted into an analog signal by a D/A conversion unit (not shown) and then output to the outside from the output terminal 20. The frequency band of the output signal T_out is 0 to 8 kHz, that is, the 0 to 4 kHz band of the input signal T_in extended on the high-frequency side.

In the band extension device 2 of this embodiment, as described above, an integer multiple (twice, in this embodiment) of the frequency of each original sound spectrum component is set as the frequency of the harmonic spectrum component, so the peaks of the higher-order resonance modes of the spectral envelope (the second-order resonance mode in this embodiment) can be reproduced in the output signal. An output signal with natural characteristics is therefore obtained, and auditory recognition can be improved.

The band extension device 2 of this embodiment can be applied to various uses. For example, when applied to a telephone system, it yields natural call speech, which improves auditory recognition when elderly people and others are on a call and can prevent crimes committed by impersonating another person over the telephone (so-called bank transfer fraud and the like). When applied to artificial vocal cords for laryngectomees, it can reproduce natural speech. When applied to a digital audio device, it provides natural playback sound quality. Furthermore, devices that generate the input signal, such as microphones, vibration pickups, and bone-conduction microphones, have difficulty outputting the high-frequency range because of their limited characteristics; applying the band extension device 2 of this embodiment to them makes it easy to reproduce a natural high-frequency range.

Next, another embodiment of the band extension device is described with reference to FIGS. 5 to 8. FIG. 5 is a block diagram showing the configuration of a band extension device according to another embodiment of the present invention. FIG. 6 is a diagram for explaining the method of analyzing the slope of the envelope of the original sound spectrum. FIG. 7(a) is a spectrum diagram showing an original sound spectrum whose phoneme is a vowel, and FIG. 7(b) is a spectrum diagram showing that original sound spectrum with the harmonic spectrum added. FIG. 8(a) is a spectrum diagram showing an original sound spectrum whose phoneme is a consonant, and FIG. 8(b) is a spectrum diagram showing that original sound spectrum with the harmonic spectrum added. In this embodiment, components substantially identical to those of the above embodiment are given the same reference numerals, and their description is omitted.

Referring to FIG. 5, in the band extension device 2A of this embodiment, the harmonic spectrum generation unit 12A further includes an original sound spectrum analysis unit 28 (constituting the original sound spectrum analysis means) and a weighting unit 30 (constituting the weighting means). The original sound spectrum analysis unit 28 analyzes the slope of the envelope of the original sound spectrum. Based on the analyzed slope, the weighting unit 30 weights the magnitudes of the harmonic spectrum components contained in the harmonic spectrum by a weighting coefficient described later. The original sound spectrum analysis unit 28 and the weighting unit 30 are described below.

The original sound spectrum analysis unit 28 divides the frequencies of the original sound spectrum components contained in the original sound spectrum generated by the Fourier transform into octaves and takes the logarithm of the magnitudes of the original sound spectrum components (see FIG. 6). It then performs a least-squares calculation and analyzes the slope α of the envelope of the original sound spectrum using the approximating straight line of equation (4).
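
Equation (4) is not reproduced in this text. A straight-line fit of log magnitude against frequency on an octave (log2) axis would have the form y = αx + β, and the sketch below fits such a line by ordinary least squares. The choice of decibels for the magnitude axis, the use of log2 of the frequency in Hz as the octave axis, and the function name are assumptions made for illustration.

```python
import math

def envelope_slope(freqs_hz, magnitudes):
    """Least-squares line through (log2 f, 20*log10 |X|).

    Returns (alpha, beta), where alpha is the slope in dB per octave.
    All frequencies and magnitudes must be positive.
    """
    xs = [math.log2(f) for f in freqs_hz]
    ys = [20.0 * math.log10(m) for m in magnitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta
```

For a spectrum whose magnitude falls as 1/f, this returns a slope of about -6.02 dB per octave.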

From the slope α analyzed as described above, the weighting unit 30 calculates the weighting coefficient Z(α) of equation (5). Further, when the harmonic spectrum calculation unit 24 calculates the harmonic spectrum component X'(2f_n), the weighting unit 30 weights the magnitude of X'(2f_n) by the calculated weighting coefficient Z(α), as shown in equation (6).
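
Equations (5) and (6) are not reproduced in this text, so the actual form of Z(α) is unknown. The sketch below uses a purely hypothetical mapping to illustrate the behaviour the description requires: a strongly negative slope gives a small coefficient, and a slope near zero gives a large one. The mapping extends the fitted slope by one octave (the harmonic at 2f_n lies one octave above f_n) and caps the result at `z_max`. Both the mapping and the cap are assumptions.

```python
import cmath

def weighting_coefficient(alpha_db_per_octave, z_max=1.0):
    """Hypothetical Z(alpha): the linear gain for one octave of slope alpha, capped at z_max."""
    return min(10.0 ** (alpha_db_per_octave / 20.0), z_max)

def weighted_harmonic(x_fn, alpha_db_per_octave):
    """X'(2 f_n) with magnitude Z(alpha) * |X(f_n)| and phase angle 2 * theta_n."""
    magnitude, theta = cmath.polar(x_fn)
    return cmath.rect(weighting_coefficient(alpha_db_per_octave) * magnitude,
                      2.0 * theta)
```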

For example, when the input signal is a speech signal, the slope α of the envelope of the original sound spectrum changes with the type of phoneme in the original sound spectrum. As shown in FIG. 7(a), when the phoneme is a vowel, the slope α becomes strongly negative because of the characteristics of vowels. The weighting coefficient Z(α) then becomes small, and the harmonic spectrum components become small, as shown in FIG. 7(b). In general, vowel spectra show strong attenuation of the spectral components on the high-frequency side. Setting the weighting coefficient Z(α) small in this way therefore restores vowels more realistically.

As shown in FIG. 8(a), when the phoneme of the original sound spectrum is a consonant, the slope α of the envelope is small in magnitude because of the characteristics of consonants. The weighting coefficient Z(α) then becomes large, and the harmonic spectrum components become large, as shown in FIG. 8(b). In general, consonant spectra show little attenuation of the spectral components on the high-frequency side. Setting the weighting coefficient Z(α) large in this way therefore restores consonants more realistically.

In each of the above embodiments, the harmonic spectrum calculation unit 24 sets 2f_n, twice the frequency f_n of the original sound spectrum component X(f_n), as the frequency of the harmonic spectrum component X'(2f_n). However, three times the frequency (3f_n) or four times the frequency (4f_n) may be used instead; the frequency m·f_n obtained by multiplying by any integer m can be set as the frequency of the harmonic spectrum component X'(m·f_n). Correspondingly, the harmonic spectrum calculation unit 24 can set the phase angle m·θ_n, obtained by multiplying the phase angle θ_n of X(f_n) by the integer m, as the phase angle of X'(m·f_n).
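
Written compactly, the generalization described above takes the following form. The symbol w is introduced here only for illustration and stands for whichever weight is applied to the magnitude (the constant 1/2 of the first embodiment or the coefficient Z(α) of the second); the text does not state how the weight should vary with m.

```latex
X'(m f_n) = w\,|X(f_n)|\, e^{j m \theta_n}, \qquad m = 2, 3, 4, \dots
```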

In each of the above embodiments, it is also possible, for example, to generate a first harmonic spectrum component X'(2f_n) having a frequency 2f_n, twice the frequency f_n of the original sound spectrum component X(f_n), and a second harmonic spectrum component X'(3f_n) having a frequency 3f_n, three times the frequency f_n, and to add to the original sound spectrum the first and second harmonic spectra obtained by combining the first harmonic spectrum components and the second harmonic spectrum components, respectively.

[Embodiment of the sound correction device]
Next, an embodiment of the sound correction device is described with reference to FIGS. 9 to 12. FIG. 9 is a block diagram showing the configuration of a sound correction device according to an embodiment of the present invention. FIG. 10 is a diagram showing the correspondence among the parameters stored in the parameter storage unit. FIG. 11 is a block diagram showing the configuration of a parameter creation device for creating air-conduction vocal tract characteristic parameters. FIG. 12 is a block diagram showing a parameter creation device for creating bone-conduction vocal tract characteristic parameters and air-conducted sound source characteristic parameters.

Referring to FIG. 9, the sound correction device 32 of this embodiment includes a bone-conduction microphone 34, a low-pass filter (LPF) 36, an A/D conversion unit 38, a time frame division unit 40 (constituting the time frame division means), an LPC analysis unit 42 (constituting the speech characteristic analysis means and the speech property determination means), a correction mode switching unit 44 (constituting the correction mode switching means), a first LPC synthesis unit 46 (constituting the first signal correction means), a second LPC synthesis unit 48 (constituting the second signal correction means), a parameter storage unit 50 (constituting the parameter storage means), a smoothing unit 52 (constituting the output signal generation means), a D/A conversion unit 54, a low-pass filter 56, and an output terminal 60.

The bone-conduction microphone 34 is worn on a part of the face, such as the forehead, jaw, cheek, or ear canal, and picks up the speaker's vocal cord vibrations transmitted through bone and skin. The low-pass filter 36 attenuates the frequency band of the speech signal from the bone-conduction microphone 34 at and above a predetermined frequency (for example, 8 kHz). The A/D conversion unit 38 converts the speech signal from an analog signal into a digital signal.

Using a Hanning window as the window function, the time frame division unit 40 divides the time-domain speech signal into time frames of a predetermined length (for example, 16 msec). The frames are cut so that both ends of each time frame overlap the adjacent time frames on either side.

The LPC analysis unit 42 performs linear predictive analysis (LPC) on the speech signal in each time frame and analyzes the speech characteristics of the signal. This analysis separates the speech signal into a bone-conducted sound source characteristic and a bone-conduction vocal tract characteristic. The analyzed bone-conducted sound source characteristic is output to the first LPC synthesis unit 46, and the analyzed bone-conduction vocal tract characteristic is output to the first LPC synthesis unit 46 or the second LPC synthesis unit 48 via the correction mode switching unit 44.
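
The separation into a source characteristic and a vocal tract characteristic can be illustrated with the textbook autocorrelation method of linear prediction. The text does not say which LPC algorithm or prediction order is used, so the Levinson-Durbin recursion and all function names below are assumptions. In this sketch the predictor coefficients play the role of the vocal tract characteristic, and the prediction residual plays the role of the sound source characteristic.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one time frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for a non-silent frame (r[0] > 0).

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the inverse (whitening) filter and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the excitation estimate."""
    p = len(a) - 1
    return [sum(a[j] * frame[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(frame))]
```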

The LPC analysis unit 42 also determines the speech property of the speech signal from the analyzed bone-conducted sound source characteristic. When a pitch component (that is, a pulse train indicating the source characteristic of voiced sound) is detected in the bone-conducted sound source characteristic, the LPC analysis unit 42 determines that the speech signal is voiced; when no pitch component is detected, it determines that the speech signal is unvoiced.
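
The text does not specify how the pitch component is detected. One common approach, used here only as an illustrative assumption, is to look for a strong peak in the normalized autocorrelation of the source (residual) signal within the range of lags that corresponds to plausible pitch frequencies. The sampling rate, the pitch range, the threshold, and the function name are all hypothetical values chosen for the sketch.

```python
def is_voiced(residual, sample_rate=16000, f_min=60.0, f_max=400.0,
              threshold=0.3):
    """Return True if the residual shows a periodic (pitch) component.

    The decision compares the largest normalized autocorrelation value in the
    lag range for f_min..f_max against `threshold` (all assumed values).
    """
    n = len(residual)
    energy = sum(v * v for v in residual)
    if energy == 0.0:
        return False
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(residual[i] * residual[i + lag] for i in range(n - lag))
        best = max(best, corr / energy)
    return best >= threshold
```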

The correction mode switching unit 44 is a switch that changes the output destination of the bone-conduction vocal tract characteristic according to the speech property determined by the LPC analysis unit 42. When the LPC analysis unit 42 determines that the speech signal is voiced, the correction mode switching unit 44 switches the output destination to the first LPC synthesis unit 46. This sets the voiced sound correction mode, in which the first LPC synthesis unit 46 corrects the speech signal. When the LPC analysis unit 42 determines that the speech signal is unvoiced, the correction mode switching unit 44 switches the output destination to the second LPC synthesis unit 48. This sets the unvoiced sound correction mode, in which the second LPC synthesis unit 48 corrects the speech signal.

The parameter storage unit 50 stores a plurality of parameter groups (for example, 40 groups), each group consisting of a bone-conduction vocal tract characteristic parameter, an air-conduction vocal tract characteristic parameter, and an air-conducted sound source characteristic parameter (see FIG. 10). These parameters are created in advance by the parameter creation devices 62 and 76 described later and stored in the parameter storage unit 50. The method of creating each parameter is described later.

The first LPC synthesis unit 46 generates a pseudo air-conducted speech signal by LPC synthesis (that is, synthesis by the linear prediction method) of the bone-conducted sound source characteristic output from the LPC analysis unit 42 and an air-conduction vocal tract characteristic parameter stored in the parameter storage unit 50. The second LPC synthesis unit 48 generates a pseudo air-conducted speech signal by LPC synthesis of an air-conducted sound source characteristic parameter and an air-conduction vocal tract characteristic parameter stored in the parameter storage unit 50. How the first LPC synthesis unit 46 and the second LPC synthesis unit 48 generate the pseudo air-conducted speech signals is described later.
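
In conventional LPC synthesis, an excitation (source) signal is passed through the all-pole filter 1/A(z) defined by the vocal tract coefficients. The sketch below shows that standard operation; representing the stored vocal tract parameter as a coefficient list `a` with a[0] = 1, and the function name, are assumptions about how the patent's parameters are encoded.

```python
def lpc_synthesize(excitation, a):
    """Filter `excitation` through 1/A(z), A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    y[n] = e[n] - sum_{j=1..p} a[j] * y[n-j]
    """
    p = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j in range(1, p + 1):
            if n - j >= 0:
                acc -= a[j] * out[n - j]
        out.append(acc)
    return out
```

In the voiced mode the excitation would be the source signal analyzed from the bone-conducted speech, and in the unvoiced mode a stored air-conducted source; in both cases the coefficients would come from the stored air-conduction vocal tract parameter.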

The smoothing unit 52 generates an output signal by adding together the pseudo air-conducted speech signals generated by the first LPC synthesis unit 46 or the second LPC synthesis unit 48 and smoothing the result. The smoothing uses, for example, a Hanning window as the window function so that the amplitude at the signal joins is brought close to zero. The D/A conversion unit 54 converts the generated output signal from a digital signal into an analog signal. The low-pass filter 56 attenuates the frequency band of the analog output signal at and above a predetermined frequency (for example, 8 kHz). The output signal generated in this way is output to the outside from the output terminal 60.

Next, the flow by which the sound correction device 32 corrects a speech signal is described. First, when a speaker wearing the bone-conduction microphone 34 speaks, the bone-conduction microphone 34 outputs a speech signal. The low-pass filter 36 attenuates the frequency band of this speech signal (an analog signal) at and above the predetermined frequency. The speech signal is then converted into a digital signal by the A/D conversion unit 38 and divided into time frames by the time frame division unit 40.

The LPC analysis unit 42 separates the speech signal in each time frame into a bone-conducted sound source characteristic and a bone-conduction vocal tract characteristic. Based on the analyzed bone-conducted sound source characteristic, the LPC analysis unit 42 also determines whether the speech signal in each time frame is voiced or unvoiced.

When the speech signal in a time frame is determined to be voiced, the correction mode switching unit 44 switches to the voiced sound correction mode. In this mode, the bone-conducted sound source characteristic and the bone-conduction vocal tract characteristic analyzed by the LPC analysis unit 42 are both output to the first LPC synthesis unit 46. The first LPC synthesis unit 46 selects from the parameter storage unit 50 the bone-conduction vocal tract characteristic parameter (for example, b_1) closest to the bone-conduction vocal tract characteristic output from the LPC analysis unit 42 and reads out the air-conduction vocal tract characteristic parameter (for example, a_1) that corresponds to the selected parameter. LPC synthesis of the read air-conduction vocal tract characteristic parameter and the bone-conducted sound source characteristic generates a pseudo air-conducted speech signal.
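
The lookup of the closest stored parameter can be sketched as a nearest-neighbour search over the parameter groups. The text does not define what "closest" means, so the squared Euclidean distance between coefficient vectors is an assumption, as are the dictionary keys and the function name.

```python
def nearest_group(bone_tract, groups):
    """Return the parameter group whose bone-conduction vocal tract parameter
    is closest (squared Euclidean distance, an assumed measure) to `bone_tract`.

    Each group is a dict with the hypothetical keys
    "bone_tract", "air_tract", and "air_source".
    """
    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(groups, key=lambda g: dist2(bone_tract, g["bone_tract"]))
```

For a voiced frame only the "air_tract" entry of the selected group would be used; for an unvoiced frame both "air_source" and "air_tract" would be read.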

When the speech signal in a time frame is determined to be unvoiced, the correction mode switching unit 44 switches to the unvoiced sound correction mode. In this mode, the bone-conduction vocal tract characteristic analyzed by the LPC analysis unit 42 is output to the second LPC synthesis unit 48 via the correction mode switching unit 44. The second LPC synthesis unit 48 selects from the parameter storage unit 50 the bone-conduction vocal tract characteristic parameter (for example, b_2) closest to the bone-conduction vocal tract characteristic output from the LPC analysis unit 42 and reads out the air-conducted sound source characteristic parameter (for example, av_2) and the air-conduction vocal tract characteristic parameter (for example, a_2) that correspond to the selected parameter. LPC synthesis of the read air-conducted sound source characteristic parameter and air-conduction vocal tract characteristic parameter generates a pseudo air-conducted speech signal.

The pseudo air-conducted speech signals generated as described above are added together and smoothed by the smoothing unit 52, generating the output signal. The output signal is converted into an analog signal by the D/A conversion unit 54, has its frequency band at and above the predetermined frequency attenuated by the low-pass filter 56, and is output to the outside from the output terminal 60.

The bone-conduction vocal tract characteristic parameters, air-conduction vocal tract characteristic parameters, and air-conducted sound source characteristic parameters stored in the parameter storage unit 50 are created, for example, as follows. The method of creating the air-conduction vocal tract characteristic parameters is described first. These parameters are created with a parameter creation device 62 such as that shown in FIG. 11. The parameter creation device 62 includes an air-conduction microphone 64, a low-pass filter 66, an A/D conversion unit 68, a time frame division unit 70, an LPC analysis unit 72, and a representative value selection unit 74.

The air conduction microphone 64 is an ordinary microphone that records the speaker's natural voice as it propagates through the air. The low-pass filter 66, the A/D conversion unit 68, and the time frame division unit 70 each have substantially the same function as the corresponding components of the audio correction device 32 described above.

The LPC analysis unit 72 performs linear prediction analysis on the audio signal in each time frame produced by the time frame division unit 70 and analyzes the air-conducted vocal tract characteristic of the audio signal. The representative value selection unit 74 selects representative values, as described later, from the plurality of air-conducted vocal tract characteristics analyzed by the LPC analysis unit 72, and sets these representative values as the air-conducted vocal tract characteristic parameters.

The air-conducted vocal tract characteristic parameters are created as follows. First, a speaker utters vocabulary or sentences in which all features of an audio signal appear, for example, 100 Japanese city names. The uttered voice is input to the air conduction microphone 64. The audio signal from the air conduction microphone 64 is converted into a digital signal via the low-pass filter 66 and the A/D conversion unit 68. The audio signal is then divided into time frames by the time frame division unit 70 and output to the LPC analysis unit 72. The LPC analysis unit 72 performs linear prediction analysis on the audio signal in each time frame and analyzes its air-conducted vocal tract characteristic. Each analyzed air-conducted vocal tract characteristic is classified into one of a plurality of characteristic groups. These characteristic groups serve to sort air-conducted vocal tract characteristics with similar properties, so that each group contains a plurality of similar characteristics. The representative value selection unit 74 selects one air-conducted vocal tract characteristic from each characteristic group as its representative value. Each selected representative value is set as an air-conducted vocal tract characteristic parameter and stored in the parameter storage unit 50.
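The selection of one representative per characteristic group can be illustrated with a small sketch. The description does not say how the representative is chosen, so the medoid rule used below (the member with the smallest total distance to the other members) is an assumption, as is the Euclidean distance. The grouping itself is taken as given, and the function names are hypothetical.

```python
import math

def select_representative(group):
    """Return the group member with the smallest summed distance to all members (medoid)."""
    return min(group, key=lambda v: sum(math.dist(v, w) for w in group))

def build_air_tract_parameters(groups):
    """One representative per characteristic group; index n in the result plays the role of a_n."""
    return [select_representative(g) for g in groups]
```

Choosing an actual member of the group, rather than a computed mean, follows the wording that one characteristic is "selected" from each group.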

Next, the method for creating the bone-conducted vocal tract characteristic parameters and the air-conducted sound source characteristic parameters is described. A parameter creation device 76 as shown in FIG. 12 is used for this purpose. The parameter creation device 76 includes a bone conduction microphone 78, a low-pass filter 80, an A/D conversion unit 82, a time frame division unit 84, an LPC analysis unit 86, an air conduction microphone 88, a low-pass filter 90, an A/D conversion unit 92, a time frame division unit 94, an LPC analysis unit 96, a parameter assignment unit 98, and an averaging unit 100.

The low-pass filters 80 and 90, the A/D conversion units 82 and 92, and the time frame division units 84 and 94 each have substantially the same function as the corresponding components of the audio correction device 32 described above. The LPC analysis unit 86 performs linear prediction analysis on the audio signal in each time frame produced by the time frame division unit 84 and analyzes the bone-conducted vocal tract characteristic of the audio signal. The LPC analysis unit 96 performs linear prediction analysis on the audio signal in each time frame produced by the time frame division unit 94 and analyzes the air-conducted vocal tract characteristic and the air-conducted sound source characteristic of the audio signal. The parameter assignment unit 98 creates, as described later, a plurality of parameter groups each consisting of a bone-conducted vocal tract characteristic parameter, an air-conducted vocal tract characteristic parameter, and an air-conducted sound source characteristic parameter. The averaging unit 100 performs averaging when the same parameter occurs more than once among the parameter groups.

The bone-conducted vocal tract characteristic parameters and the air-conducted sound source characteristic parameters are created as follows. First, as described above, a speaker utters vocabulary or sentences in which all features of an audio signal appear, for example, 100 Japanese city names. The uttered voice is input simultaneously to the bone conduction microphone 78 and the air conduction microphone 88. The audio signal from the bone conduction microphone 78 is converted into a digital signal via the low-pass filter 80 and the A/D conversion unit 82. The audio signal is then divided into time frames by the time frame division unit 84 and output to the LPC analysis unit 86. The LPC analysis unit 86 performs linear prediction analysis on the audio signal in each time frame and analyzes its bone-conducted vocal tract characteristic.

Likewise, the audio signal from the air conduction microphone 88 is converted into a digital signal via the low-pass filter 90 and the A/D conversion unit 92. The audio signal is then divided into time frames by the time frame division unit 94 and output to the LPC analysis unit 96. The LPC analysis unit 96 performs linear prediction analysis on the audio signal in each time frame and analyzes its air-conducted vocal tract characteristic and air-conducted sound source characteristic.
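As a rough illustration of what linear prediction analysis of one time frame yields, the sketch below uses the standard autocorrelation method with the Levinson-Durbin recursion. Treating the predictor coefficients as the vocal tract characteristic and the prediction residual as the sound source characteristic is a common convention and an assumption here; the description does not fix the representation. All function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum_t frame[t] * frame[t - lag] for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[t] ~= sum_k a[k] * x[t-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # silent frame: keep remaining coefficients at zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def lpc_analyze(frame, order):
    """Return (coefficients, residual) for one frame; both roles are assumed, see lead-in."""
    coeffs = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for t, x in enumerate(frame):
        pred = sum(c * frame[t - 1 - k]
                   for k, c in enumerate(coeffs) if t - 1 - k >= 0)
        residual.append(x - pred)
    return coeffs, residual
```

For a frame that decays geometrically by a factor of 0.5 per sample, a first-order analysis recovers a coefficient close to 0.5 and a residual that is essentially a single impulse at the first sample.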

In the parameter assignment unit 98, the air-conducted vocal tract characteristic parameters created as described above are compared with the air-conducted vocal tract characteristic of each time frame, and the air-conducted vocal tract characteristic parameter (for example, a_1) closest to the air-conducted vocal tract characteristic of a given time frame (for example, T_1) is selected. Next, the parameter assignment unit 98 assigns, to the bone-conducted vocal tract characteristic and the air-conducted sound source characteristic analyzed in the same time frame (for example, T_1), a bone-conducted vocal tract characteristic parameter (for example, b_1) and an air-conducted sound source characteristic parameter (for example, av_1) that carry the same parameter number (that is, the number "n" in, for example, a_n) as the selected air-conducted vocal tract characteristic parameter (for example, a_1). By assigning the parameter number of an air-conducted vocal tract characteristic parameter to the bone-conducted vocal tract characteristic and the air-conducted sound source characteristic of each time frame in this way, a plurality of parameter groups, each consisting of a bone-conducted vocal tract characteristic parameter and an air-conducted sound source characteristic parameter, are created.

When this processing is carried out over all time frames, a given parameter, for example the bone-conducted vocal tract characteristic parameter b_1 or the air-conducted sound source characteristic parameter av_1, may appear more than once. In that case, the averaging unit 100 averages the multiple instances of b_1 and the multiple instances of av_1 separately, so that exactly one bone-conducted vocal tract characteristic parameter b_1 and one air-conducted sound source characteristic parameter av_1 are produced. In this way, the bone-conducted vocal tract characteristic parameter and the air-conducted sound source characteristic parameter associated with each air-conducted vocal tract characteristic parameter are stored in the parameter storage unit 50 (see FIG. 10).
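The assignment and averaging steps together amount to building a lookup table keyed by parameter number n. The following sketch shows one way this could look. The Euclidean distance, the element-wise mean, the assumption that all vectors sharing a number have equal length, and every name (`build_parameter_groups`, `mean_vector`, the `"a"`/`"b"`/`"av"` keys) are illustrative assumptions, not details given in the description.

```python
import math
from collections import defaultdict

def mean_vector(vectors):
    """Element-wise mean of equally long vectors (the averaging rule is an assumption)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_parameter_groups(air_params, frames):
    """frames: one (air_tract, bone_tract, air_source) triple per time frame.

    Each frame gets the number n of the nearest a_n; the bone_tract and air_source
    values collected under the same n are then averaged into b_n and av_n.
    """
    bone_by_n = defaultdict(list)
    source_by_n = defaultdict(list)
    for air_tract, bone_tract, air_source in frames:
        n = min(range(len(air_params)),
                key=lambda i: math.dist(air_tract, air_params[i]))
        bone_by_n[n].append(bone_tract)
        source_by_n[n].append(air_source)
    return {n: {"a": air_params[n],
                "b": mean_vector(bone_by_n[n]),
                "av": mean_vector(source_by_n[n])}
            for n in bone_by_n}
```

Parameter numbers that no frame maps to are simply absent from the resulting table in this sketch.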

Various embodiments of the band extension device and the audio correction device according to the present invention have been described above. The present invention is not limited to these embodiments, however, and various modifications and corrections are possible without departing from the scope of the present invention.

Description of symbols
2, 2A Band extension device
8, 40 Time frame division unit
10 Fourier transform unit
12, 12A Harmonic spectrum generation unit
14 Harmonic spectrum addition unit
16 Inverse Fourier transform unit
18 Output signal generation unit
26 Interpolation calculation unit
28 Original sound spectrum analysis unit
30 Weighting unit
32 Audio correction device
34 Bone conduction microphone
42 LPC analysis unit
44 Correction mode switching unit
46 First LPC synthesis unit
48 Second LPC synthesis unit
50 Parameter storage unit
52 Smoothing unit

Claims

1. A band extension device for extending the frequency band of an input signal, comprising:
time frame dividing means for dividing a time-domain input signal into time frames; Fourier transform means for Fourier-transforming each time frame to generate a frequency-domain original sound spectrum; harmonic spectrum generation means for generating a harmonic spectrum based on the original sound spectrum; harmonic spectrum addition means for adding the harmonic spectrum to the original sound spectrum; inverse Fourier transform means for inverse-Fourier-transforming the original sound spectrum to which the harmonic spectrum has been added to generate a time-domain output signal component; and output signal generation means for adding the output signal components together to generate an output signal with an extended frequency band,
wherein the harmonic spectrum generation means calculates the frequency of an original sound spectrum component included in the original sound spectrum, and sets a frequency obtained by multiplying the calculated frequency as the frequency of a harmonic spectrum component included in the harmonic spectrum.

2. The band extension device according to claim 1, wherein the harmonic spectrum generation means calculates the phase angle of the original sound spectrum component included in the original sound spectrum, and sets a phase angle obtained by multiplying the calculated phase angle as the phase angle of the harmonic spectrum component included in the harmonic spectrum.

3. The band extension device according to claim 1, wherein the harmonic spectrum generation means further includes original sound spectrum analysis means for analyzing the original sound spectrum, and weighting means for weighting the magnitude of the harmonic spectrum component included in the harmonic spectrum with a predetermined weighting factor based on the analysis result of the original sound spectrum analysis means.

4. The band extension device according to claim 3, wherein the original sound spectrum analysis means is configured to analyze the slope of the envelope of the original sound spectrum, and the weighting means changes the weighting factor based on the analyzed slope of the envelope of the original sound spectrum.

5. The band extension device, wherein the harmonic spectrum generation means includes interpolation means for interpolating the harmonic spectrum component, and the interpolation means interpolates a harmonic spectrum component having a frequency between a frequency obtained by multiplying the frequency of a first original sound spectrum component included in the original sound spectrum and a frequency obtained by multiplying the frequency of a second original sound spectrum component adjacent to the first original sound spectrum component.

6. The band extension device according to claim 1, wherein the input signal is an audio signal or a musical tone signal generated by a microphone, a vibration pickup, a bone conduction microphone, a digital audio device, a telephone system, an artificial vocal cord, or the like.

7. An audio correction device comprising: a bone conduction microphone; time frame dividing means for dividing the audio signal from the bone conduction microphone into time frames; audio characteristic analysis means for analyzing the audio characteristics of the audio signal in each time frame; sound property determination means for determining, based on the analysis result of the audio characteristics, whether the audio signal in the time frame is a voiced sound or an unvoiced sound; first signal correction means for correcting an audio signal determined to be a voiced sound to generate a pseudo air-conducted audio signal; second signal correction means for correcting an audio signal determined to be an unvoiced sound to generate a pseudo air-conducted audio signal; correction mode switching means for switching between a voiced sound correction mode using the first signal correction means and an unvoiced sound correction mode using the second signal correction means; and output signal generation means for adding the generated pseudo air-conducted audio signals together to generate an output signal.

8. The audio correction device according to claim 7, further comprising parameter storage means for storing bone-conducted vocal tract characteristic parameters, air-conducted vocal tract characteristic parameters, and air-conducted sound source characteristic parameters, wherein the audio characteristic analysis means is configured to analyze the bone-conducted sound source characteristic and the bone-conducted vocal tract characteristic of the audio signal, and
the first signal correction means reads from the storage means the air-conducted vocal tract characteristic parameter corresponding to the bone-conducted vocal tract characteristic of the audio signal analyzed by the audio characteristic analysis means, and generates a pseudo air-conducted audio signal by synthesizing the bone-conducted sound source characteristic of the audio signal analyzed by the audio characteristic analysis means with the air-conducted vocal tract characteristic parameter.

9. The audio correction device according to claim 8, wherein the second signal correction means reads from the storage means, based on the bone-conducted vocal tract characteristic of the audio signal analyzed by the audio characteristic analysis means, the corresponding air-conducted vocal tract characteristic parameter and air-conducted sound source characteristic parameter, and generates a pseudo air-conducted audio signal by synthesizing the read air-conducted vocal tract characteristic parameter and air-conducted sound source characteristic parameter.