JP2013195713A

JP2013195713A - Speech correction device, speech correction method, and computer program for speech correction

Info

Publication number: JP2013195713A
Application number: JP2012062860A
Authority: JP
Inventors: Chisato Ishikawa; 千里石川; Taro Togawa; 太郎外川; Takeshi Otani; 猛大谷; Masanao Suzuki; 政直鈴木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-03-19
Filing date: 2012-03-19
Publication date: 2013-09-30
Anticipated expiration: 2032-03-19
Also published as: JP6098038B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech correction device capable of correcting muffled sound even when the difference between the power spectrum in a low frequency band and the power spectrum in a high frequency band is small. SOLUTION: A speech correction device 6 includes: an effective spectrum extraction unit 13 which, for at least a first frequency band and a second frequency band out of a plurality of frequency bands, calculates effective spectrum signals representing perceivable spectrum signal values by subtracting, from the spectrum signal values, masking thresholds corresponding to spectrum signal values that cannot be heard given human auditory characteristics; an inter-band power difference calculation unit 14 which obtains the difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band; a correction amount calculation unit 15 which determines a correction amount for a prescribed frequency band in accordance with the difference; and a correction unit 16 which corrects the spectrum signal values at the respective frequencies within the prescribed frequency band in accordance with the correction amount.

Description

The present invention relates to, for example, a speech correction device, a speech correction method, and a computer program for speech correction that correct a speech signal.

In a speech signal picked up by a mobile phone, the high-frequency components may be relatively small because of, for example, the frequency characteristics of the phone's microphone. When such a signal is reproduced, the reproduced sound is what is called muffled, and the listener may find it difficult to hear.

To address this problem, techniques for enhancing speech without degrading speech quality have been studied (see, for example, Patent Document 1).

For example, the speech enhancement device disclosed in Patent Document 1 calculates the SNR, which is the component ratio between the received speech and ambient noise, and calculates the brightness of the speech from at least one of the pitch frequency of the received speech and the slope of the power spectrum of the speech. From the SNR and from band division information, which indicates a band that contributes to improving the subjective intelligibility of the received speech and a band that contributes to improving its subjective brightness, the device calculates an enhancement amount for the first band, which contributes to improving the subjective intelligibility of the received speech when the received speech is masked by ambient noise. From the enhancement amount for the first band and the brightness of the speech, the device further calculates an enhancement amount for the second band, which contributes to improving subjective brightness. The device then processes the spectrum of the received speech using the enhancement amounts for the first and second bands.

Patent Document 1: JP 2010-14914 A

In the technique disclosed in Patent Document 1, the enhancement amount depends on the slope of the power spectrum between the low frequency band and the high frequency band. When that slope is reasonably large, the spectral components in the high frequency band are amplified enough that the listener no longer perceives the sound as muffled. Depending on the speech signal, however, the listener may perceive the sound as muffled even when the slope of the power spectrum between the low frequency band and the high frequency band is small. In that case the enhancement amount for the high frequency band does not become large enough, and the listener may still perceive the enhanced speech signal as muffled.
If, on the other hand, the enhancement amount applied for a given slope is increased, the high-frequency spectral components of a speech signal with a large power spectrum slope are amplified excessively, and the signal is distorted to the point of becoming harder to hear.

Accordingly, an object of this specification is to provide a speech correction device that can correct muffled sound even when the difference between the power spectrum in the low frequency band and the power spectrum in the high frequency band is small.

According to one embodiment, a speech correction device is provided. The speech correction device includes: a time-frequency conversion unit that converts a time-domain speech signal into the frequency domain in units of frames having a predetermined time length, thereby calculating a spectrum signal containing a spectrum signal value for each of a plurality of frequencies; a masking threshold calculation unit that calculates, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard given human auditory characteristics; an effective spectrum extraction unit that calculates effective spectrum signals representing the perceivable spectrum signals of at least a first frequency band and a second frequency band, based on the spectrum signal values and masking thresholds of the frequencies contained in those bands; an inter-band power difference calculation unit that obtains the difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band; a correction amount calculation unit that determines a correction amount for a predetermined frequency band according to the difference; a correction unit that calculates a corrected spectrum signal by correcting, according to the correction amount, the spectrum signal value of each frequency within the predetermined frequency band; and a frequency-time conversion unit that obtains a corrected speech signal by converting the corrected spectrum signal into the time domain.

The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.

The speech correction device disclosed in this specification can correct muffled sound even when the difference between the power spectrum in the low frequency band and the power spectrum in the high frequency band is small.

FIG. 1 is a schematic configuration diagram of a mobile phone in which a speech correction device according to one embodiment is mounted.
FIG. 2 is a schematic configuration diagram of the speech correction device.
FIG. 3 is a diagram showing an example of the relationship between the peak frequencies of a power spectrum and the masking threshold of each frequency.
FIG. 4(a) is a diagram showing the power spectrum of an input speech signal and the masking threshold of each frequency, and FIG. 4(b) is a diagram showing an example of the effective power spectrum calculated from the power spectrum shown in FIG. 4(a).
FIG. 5 is a diagram showing an example of the relationship between the power spectrum of the low frequency band, the power spectrum of the high frequency band, and the power difference.
FIG. 6 is a diagram showing an example of the relationship between the power difference and the reference correction coefficient.
FIG. 7 is a diagram showing the relationship between frequency and the coefficient β(f) in equation (5).
FIG. 8(a) is a diagram showing an example of the power spectrum of a speech signal that sounds muffled. FIG. 8(b) is a diagram showing the effective power spectrum within the power spectrum shown in FIG. 8(a). FIG. 8(c) is a diagram showing an example of the power spectra of the speech signals obtained by correcting the power spectrum shown in FIG. 8(a) with the prior art and with the speech correction device according to the present embodiment, respectively.
FIG. 9 is an operation flowchart of the speech correction process.
FIG. 10 is a diagram showing an example of the relationship between the power difference and the reference correction coefficient according to a modification.
FIG. 11 is a diagram showing the relationship between frequency and the coefficients β1(f) and β2(f) in equation (7) according to a modification.
FIG. 12 is a configuration diagram of a computer that operates as a speech correction device by running a computer program that implements the functions of the units of the speech correction device according to the embodiment or its modifications.

Hereinafter, a speech correction device according to one embodiment will be described with reference to the drawings.
The inventors found that the muffled impression of a sound depends less on the difference itself between the power spectrum of the low frequency band and the power spectrum of the high frequency band than on the difference between the components of those power spectra that a person can perceive.
The speech correction device therefore obtains, from the power spectrum values at a plurality of frequencies of the input speech signal, a masking threshold for each frequency that corresponds to a power spectrum value a person cannot perceive. For each frequency, the device subtracts the masking threshold from the power spectrum value to calculate an effective power spectrum value representing the power spectrum component a person can perceive. The larger the difference obtained by subtracting the effective power spectrum of the high frequency band from the effective power spectrum of the low frequency band, the higher the device sets the amplification factor for the frequency signal value of each frequency in the high frequency band.

In this specification, the term “low frequency band” is used, for convenience, to denote a frequency band perceivable by a person in which larger frequency components make the speech signal sound more muffled. The term “high frequency band” is used, for convenience, to denote a frequency band perceivable by a person that is higher than the “low frequency band” and in which larger frequency components make the speech signal sound less muffled.
The term “power spectrum value” denotes the value of the power spectrum at any single frequency. The term “power spectrum”, on the other hand, denotes the sequence of power spectrum values over an entire frequency band containing a plurality of frequencies.

FIG. 1 is a schematic configuration diagram of a mobile phone in which the speech correction device according to the first embodiment is mounted. As shown in FIG. 1, the mobile phone 1 includes a control unit 2, a communication unit 3, a microphone 4, an analog/digital converter 5, a speech correction device 6, a digital/analog converter 7, and a speaker 8.
Of these, the control unit 2, the communication unit 3, and the speech correction device 6 are each formed as a separate circuit. Alternatively, these units may be mounted on the mobile phone 1 as a single integrated circuit in which the circuits corresponding to the units are integrated. These units may also be functional modules realized by a computer program executed on a processor of the mobile phone 1.

The control unit 2 includes at least one processor, a nonvolatile memory, a volatile memory, and their peripheral circuits. When a call is started by an operation on an operation unit (not shown) of the mobile phone 1, such as a keypad, the control unit 2 executes call control processing, such as establishing and releasing the wireless connection between the mobile phone 1 and a base station device (not shown), in accordance with the communication standard with which the mobile phone 1 complies. According to the result of the call control processing, the control unit 2 instructs the communication unit 3 to start or end the voice call. The control unit 2 also extracts the encoded speech signal contained in the downlink signal received from the base station device via the communication unit 3 and decodes it. The control unit 2 then outputs the decoded speech signal to the speech correction device 6 as a received speech signal.

The control unit 2 also encodes the speech signal picked up by the microphone 4 and input via the analog/digital converter 5, and generates an uplink signal containing the encoded speech signal. The control unit 2 then passes the uplink signal to the communication unit 3. The speech signal is encoded using, for example, the Adaptive Multi-Rate NarrowBand (AMR-NB) scheme or the Adaptive Multi-Rate WideBand (AMR-WB) scheme standardized by the Third Generation Partnership Project (3GPP).

The communication unit 3 communicates wirelessly with the base station device. The communication unit 3 receives a radio signal from the base station device and converts it into a downlink signal having a baseband frequency. After performing reception processing such as demultiplexing, demodulation, and error correction decoding on the downlink signal, the communication unit 3 passes the downlink signal to the control unit 2. The communication unit 3 also performs transmission processing such as error correction coding, modulation, and multiplexing on the uplink signal received from the control unit 2, then superimposes the uplink signal on a carrier wave having a radio frequency and transmits it to the base station device.

The microphone 4 is an example of a speech input unit. It picks up sound around the mobile phone 1, generates an analog speech signal corresponding to the intensity of the sound, and outputs the analog speech signal to the analog/digital converter 5.

The analog/digital converter 5 generates a digitized input speech signal by sampling the analog speech signal received from the microphone 4 at a predetermined sampling interval. The analog/digital converter 5 may include an amplifier and may digitize the analog speech signal after amplifying it.
The analog/digital converter 5 outputs the input speech signal to the control unit 2.

The speech correction device 6 calculates a corrected speech signal by enhancing the frequency components in the high frequency band of the received speech signal, so that the reproduced sound of the received speech signal does not sound muffled. The speech correction device 6 then outputs the corrected speech signal to the digital/analog converter 7. Details of the speech correction device 6 will be described later.

The digital/analog converter 7 converts the corrected speech signal received from the speech correction device 6 into an analog signal by digital-to-analog conversion. The digital/analog converter 7 may include an amplifier and may use it to amplify the analog corrected speech signal. The digital/analog converter 7 then outputs the analog corrected speech signal to the speaker 8.
The speaker 8 is an example of a speech output unit and reproduces the corrected speech signal received from the digital/analog converter 7.

Details of the speech correction device 6 are described below.
FIG. 2 is a schematic configuration diagram of the speech correction device 6 according to one embodiment. The speech correction device 6 includes a time-frequency conversion unit 11, a masking threshold calculation unit 12, an effective power spectrum extraction unit 13, an inter-band power difference calculation unit 14, a correction amount calculation unit 15, a correction unit 16, and a frequency-time conversion unit 17.
Each of these units may be mounted on the speech correction device 6 as a separate circuit, or the units may be implemented as a single integrated circuit that realizes their functions.

The time-frequency conversion unit 11 calculates a frequency signal by converting the received speech signal into the frequency domain in units of frames having a predetermined time length (for example, several tens of milliseconds). The frequency signal contains a frequency signal value for each of a plurality of frequencies. To this end, the time-frequency conversion unit 11 applies a time-frequency transform such as the Fast Fourier Transform (FFT) or the Modified Discrete Cosine Transform (MDCT) to the received speech signal. Alternatively, the time-frequency conversion unit 11 may use a Quadrature Mirror Filter (QMF) filter bank or a wavelet transform as the time-frequency transform.
The time-frequency conversion unit 11 calculates the power spectrum value of each frequency according to the following equation.

Here, S(f) is the frequency signal value at frequency f, and F(f) is the power spectrum value at frequency f. The frequency signal value and the power spectrum value are each an example of a spectrum signal value.
For each frame, the time-frequency conversion unit 11 outputs the power spectrum value of each frequency to the masking threshold calculation unit 12 and the effective power spectrum extraction unit 13. The time-frequency conversion unit 11 also outputs the frequency signal value of each frequency to the correction unit 16.
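The equation for F(f) is not reproduced in this text. As a minimal sketch of the processing in the time-frequency conversion unit 11, the following Python code assumes the common decibel form F(f) = 10·log10(|S(f)|²) and uses a naive discrete Fourier transform in place of an FFT so that it needs only the standard library. The function name `power_spectrum_db`, the 8 kHz sampling rate, and the 20 ms frame length are illustrative assumptions, not values taken from the specification.

```python
import cmath
import math

def power_spectrum_db(frame):
    """Return (S, F) for one frame.

    S[k] is the complex frequency signal value of bin k (naive DFT),
    F[k] is the power spectrum value in dB, ASSUMING
    F(f) = 10*log10(|S(f)|^2); the source does not show the equation.
    """
    n = len(frame)
    half = n // 2 + 1  # keep the non-negative frequencies only
    spectrum = []
    for k in range(half):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spectrum.append(s)
    eps = 1e-12  # avoids log10(0) for empty bins
    power = [10.0 * math.log10(abs(s) ** 2 + eps) for s in spectrum]
    return spectrum, power

# Example: one 20 ms frame at 8 kHz (160 samples) holding a 1 kHz tone.
fs = 8000
frame = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(160)]
S, F = power_spectrum_db(frame)
peak_bin = max(range(len(F)), key=F.__getitem__)
print(peak_bin * fs / len(frame))  # frequency of the strongest bin in Hz
```

In this sketch S would go to the correction unit 16 and F to the masking threshold calculation unit 12 and the effective power spectrum extraction unit 13, mirroring the data flow described above. A practical implementation would use an FFT and typically a window function, which the specification does not detail.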

For each frame, the masking threshold calculation unit 12 calculates, for each frequency, a masking threshold corresponding to a power spectrum value that a person cannot perceive, based on human auditory characteristics.
It is generally known that when the spectral component of a sound at a certain frequency is large, a masking effect occurs: spectral components at neighboring frequencies become harder to perceive the closer they are to that frequency and the larger the spectral component at that frequency is. This frequency masking effect is attributed to human psychoacoustic characteristics.
The masking threshold calculation unit 12 therefore detects, for each frame, the peak frequencies at which the power spectrum value is a local maximum by examining the change in power spectrum value between adjacent frequencies. The masking threshold calculation unit 12 then calculates the masking threshold of each frequency, for example according to the following equation, so that the masking threshold is larger the larger the power spectrum value at a peak frequency is and the closer the frequency is to that peak frequency.

Here, f_i is the i-th peak frequency counted from the lowest peak frequency, and F(f_i) is the power spectrum value at the peak frequency f_i. m(f, f_i) is the masking threshold for frequency f calculated on the basis of the peak frequency f_i. The function α(x) is a monotonically decreasing function whose output decreases as the variable x increases. The function max(m(f, f_i)) outputs the largest of the masking thresholds for frequency f calculated for the respective peak frequencies. M(f) is the masking threshold for frequency f. As is apparent from equation (1), the masking threshold M(f) is the maximum of the masking thresholds m(f, f_i) calculated for the individual peak frequencies f_i.
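Equation (1) is not reproduced in this text, so the exact form of m(f, f_i) is unknown. The following Python sketch shows one form that is consistent with the symbol definitions above, assuming m(f, f_i) = F(f_i) + α(|f − f_i|) in decibels with a linear, monotonically decreasing α(x). The function names, the 10 dB offset, and the 3 dB-per-bin slope are hypothetical choices made only for illustration.

```python
def find_peaks(power):
    """Indices of local maxima, found by comparing adjacent bins."""
    return [i for i in range(1, len(power) - 1)
            if power[i - 1] < power[i] >= power[i + 1]]

def alpha(distance_bins, offset_db=10.0, slope_db_per_bin=3.0):
    """Hypothetical alpha(x): decreases monotonically as x grows."""
    return -offset_db - slope_db_per_bin * distance_bins

def masking_threshold(power, floor_db=-120.0):
    """M(f) = max over peaks f_i of m(f, f_i), with the ASSUMED form
    m(f, f_i) = F(f_i) + alpha(|f - f_i|)."""
    peaks = find_peaks(power)
    thresholds = []
    for f in range(len(power)):
        candidates = [power[p] + alpha(abs(f - p)) for p in peaks]
        # floor_db is returned when the frame has no peak at all
        thresholds.append(max(candidates, default=floor_db))
    return thresholds

# Power spectrum values in dB for seven bins; peaks at bins 2 and 5.
F = [20.0, 40.0, 60.0, 40.0, 30.0, 45.0, 25.0]
M = masking_threshold(F)
print(M)
```

With these example values the stronger peak at bin 2 dominates the threshold at every bin, which illustrates why a weaker peak near a strong one contributes little perceivable power.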

FIG. 3 is a diagram showing an example of the relationship between the peak frequencies of the power spectrum and the masking threshold set for each frequency according to the present embodiment. In FIG. 3, the horizontal axis represents frequency and the vertical axis represents power. Graph 301 represents an example of the power spectrum value at each frequency, and graph 302 represents the masking threshold for each frequency. In this example, the power spectrum 301 has local maxima at the frequencies fA, fB, and fC; that is, fA, fB, and fC are the peak frequencies. Accordingly, as graph 302 shows, the masking threshold is set to a larger value the closer a frequency is to one of the peak frequencies and the larger the power spectrum value at that peak frequency is.

The masking effect also arises from changes in loudness over time. For example, if the sound in one frame is loud, a quiet sound in the immediately following frame is difficult to perceive.
As a modification, therefore, the masking threshold calculation unit 12 may calculate the masking threshold of each frequency based on the change in sound over time. In this case, the masking threshold calculation unit 12 sets the masking threshold for each frequency of the current frame, which is the latest frame, to a larger value the larger the power spectrum value of the corresponding frequency is in a frame a predetermined number of frames earlier, for example the immediately preceding frame. For example, the masking threshold calculation unit 12 can calculate the masking threshold according to the threshold calculation process (the threshold corresponds to the masking threshold) described in C.1.4 Steps in Threshold Calculation of C.1 Psychoacoustic Model in Annex C of ISO/IEC 13818-7:2006. Alternatively, the masking threshold calculation unit 12 may calculate the masking threshold according to the method described in section 5.4.2 Threshold Calculation of Third Generation Partnership Project (3GPP) TS 26.403 V9.0.0.

According to yet another modification, the masking threshold calculation unit 12 may determine the final masking threshold for each frequency by combining the masking threshold calculated from the peak frequencies with the masking threshold calculated from the change in sound over time. For example, for each frequency, the masking threshold calculation unit 12 may take the larger of the two masking thresholds as the masking threshold for that frequency.
The masking threshold calculation unit 12 outputs the masking threshold of each frequency to the effective power spectrum extraction unit 13.
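The combination described in this modification can be sketched as follows in Python. The per-frequency maximum follows the text directly. The temporal threshold, however, is only a stand-in: the specification points to the ISO/IEC 13818-7 and 3GPP TS 26.403 procedures, whereas `temporal_threshold` below simply lowers the previous frame's power spectrum by a fixed, hypothetical 15 dB so that a louder previous frame yields a larger threshold.

```python
def temporal_threshold(prev_power, decay_db=15.0):
    """Stand-in temporal masking threshold (an assumption, not the
    standardized procedure): previous frame power minus a fixed decay."""
    return [p - decay_db for p in prev_power]

def combine_thresholds(peak_based, time_based):
    """Per frequency, keep the larger of the two masking thresholds."""
    return [max(a, b) for a, b in zip(peak_based, time_based)]

peak_based = [44.0, 47.0, 50.0, 47.0]        # from the peak frequencies (dB)
prev_frame_power = [70.0, 50.0, 40.0, 65.0]  # previous frame power (dB)
combined = combine_thresholds(peak_based,
                              temporal_threshold(prev_frame_power))
print(combined)
```

Taking the maximum means that a bin counts as masked if either mechanism alone would mask it, which is the conservative choice the text describes.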

The effective power spectrum extraction unit 13 is an example of an effective spectrum extraction unit. For each frame and for each frequency, it subtracts the masking threshold from the power spectrum value to calculate an effective power spectrum value representing the power spectrum component a person can perceive. The effective power spectrum extraction unit 13 calculates the effective power spectrum value according to, for example, the following equation.

Here, F(f) is the power spectrum value at frequency f, M(f) is the masking threshold for frequency f, and F'(f) is the effective power spectrum value at frequency f.
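The equation is not reproduced in this text, but the description states that the masking threshold is subtracted from the power spectrum value. A minimal Python sketch of that step follows. Clamping negative results to zero is an assumption made here, on the reasoning that a bin lying below its masking threshold contributes no perceivable power; the function name is likewise illustrative.

```python
def effective_power_spectrum(power, threshold):
    """F'(f) = F(f) - M(f) per frequency.

    ASSUMPTION: results below zero are clamped to 0.0, i.e. a fully
    masked bin is treated as having no perceivable component.
    """
    return [max(p - m, 0.0) for p, m in zip(power, threshold)]

F = [20.0, 40.0, 60.0, 40.0, 30.0, 45.0, 25.0]  # power spectrum (dB)
M = [44.0, 47.0, 50.0, 47.0, 44.0, 41.0, 38.0]  # masking thresholds (dB)
F_eff = effective_power_spectrum(F, M)
print(F_eff)
```

Only the two peak bins exceed their thresholds in this example, so they alone carry effective power.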

FIG. 4A is a diagram illustrating an example of the power spectrum values and masking thresholds of a sound signal at each frequency. FIG. 4B is a diagram illustrating an example of the effective power spectrum calculated from the power spectrum of the sound signal illustrated in FIG. 4A. In FIGS. 4A and 4B, the horizontal axis represents frequency, and the vertical axis represents power. Graph 401 represents the magnitude of the power spectrum value of the sound signal at each frequency, and graph 402 represents the masking threshold at each frequency. Graph 411 represents the magnitude of the effective power spectrum value at each frequency. The magnitude of the effective power spectrum value represented by graph 411 is equivalent to the size of the hatched area in FIG. 4A.

The effective power spectrum extraction unit 13 outputs the effective power spectrum value of each frequency to the inter-band power difference calculation unit 14.

The inter-band power difference calculation unit 14 calculates the power difference between the effective power spectrum of a relatively low frequency band and the effective power spectrum of a relatively high frequency band, both within the range of frequencies that a person can perceive. The larger this power difference, the smaller the high frequency component of the received sound signal that contributes to human perception is in relative terms. The power difference therefore serves as an index of how muffled the received sound signal is.
For example, the inter-band power difference calculation unit 14 calculates the power difference ΔP according to the following equation, taking the average of the effective power spectrum values of the frequencies in the low frequency band and in the high frequency band as the effective power spectrum of the low frequency band and of the high frequency band, respectively.

ls and le are the lower limit and upper limit frequencies of the low frequency band. For example, ls and le are set to the lower limit and upper limit of the frequencies for which amplifying the power spectrum in the band from ls to le makes the muffled feeling worse. On the other hand, hs and he are the lower limit and upper limit frequencies of the high frequency band. For example, hs and he are set to the lower limit and upper limit of the frequencies for which amplifying the power spectrum in the band from hs to he improves the muffled feeling. For example, ls = 150 [Hz], le = 800 [Hz], hs = 2900 [Hz], and he = 4000 [Hz]. Note that the inter-band power difference calculation unit 14 may instead calculate the power difference ΔP by taking the median of the effective power spectrum values of the frequencies in the low frequency band and in the high frequency band as the effective power spectrum of the respective band.
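
As an illustration of the band averaging described above, the following minimal sketch computes ΔP from per-bin power spectrum values and masking thresholds. It assumes that both are expressed in dB and that each bin comes with its centre frequency in Hz. The function names, the list-based interface and the `reduce` parameter are hypothetical, and negative effective values are not clamped because the text does not say whether they should be.

```python
from statistics import fmean, median

LS, LE = 150.0, 800.0      # low band limits [Hz], example values from the text
HS, HE = 2900.0, 4000.0    # high band limits [Hz], example values from the text

def effective_spectrum(power_db, mask_db):
    """Subtract the masking threshold from the power spectrum, bin by bin."""
    return [p - m for p, m in zip(power_db, mask_db)]

def band_power_difference(freqs_hz, effective_db, reduce=fmean):
    """Low-band average minus high-band average of the effective spectrum.

    Pass reduce=median to use the median variant mentioned in the text.
    """
    low = [e for f, e in zip(freqs_hz, effective_db) if LS <= f <= LE]
    high = [e for f, e in zip(freqs_hz, effective_db) if HS <= f <= HE]
    if not low or not high:
        raise ValueError("both bands must contain at least one frequency bin")
    return reduce(low) - reduce(high)
```

A larger return value means that the perceivable high band content is weaker relative to the low band, which is the condition the description treats as a muffled signal.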

FIG. 5 is a diagram illustrating an example of the relationship between the power spectrum in the low frequency band, the power spectrum in the high frequency band, and the power difference. In FIG. 5, the horizontal axis represents frequency, and the vertical axis represents power. Graph 500 represents the magnitude of the effective power spectrum value at each frequency. As shown in FIG. 5, the power difference ΔP is the value obtained by subtracting the average value 502 of the effective power spectrum values between frequencies hs and he from the average value 501 of the effective power spectrum values between frequencies ls and le. Therefore, the larger the power difference ΔP, the smaller the ratio of the power spectrum in the relatively high frequency band to the power spectrum in the relatively low frequency band.

The inter-band power difference calculation unit 14 outputs the power difference ΔP to the correction amount calculation unit 15.

For each frame, the correction amount calculation unit 15 determines a correction coefficient representing the degree to which the frequency signal value of each frequency within the correction target frequency band is emphasized, so that the emphasis of the high frequency band relative to the low frequency band increases in accordance with the power difference ΔP. In the present embodiment, the correction target frequency band is set so as to include the high frequency band and exclude the low frequency band. In the present embodiment, the lower limit frequency of the correction target frequency band is set to a value obtained by subtracting a predetermined offset value from hs, for example, 2562 Hz. No upper limit is set for the correction target frequency band.

The correction amount calculation unit 15 first obtains a reference correction coefficient Gm, which takes a larger value as the power difference ΔP increases. For example, the correction amount calculation unit 15 determines the reference correction coefficient Gm by referring to a relational expression or table that represents the relationship between the power difference ΔP and the reference correction coefficient Gm and that is stored in advance in a nonvolatile memory circuit included in the correction amount calculation unit 15.

FIG. 6 is a diagram illustrating an example of the relationship between the power difference ΔP and the reference correction coefficient Gm. In FIG. 6, the horizontal axis represents the power difference ΔP, and the vertical axis represents the reference correction coefficient Gm. Graph 600 represents the relationship between the power difference ΔP and the reference correction coefficient Gm.
As shown in graph 600, for example, if the power difference ΔP is equal to or less than the reference value Pl, the reference correction coefficient Gm is set to 0. If the power difference ΔP is larger than the reference value Pl and not more than the correction upper limit value Pu, the reference correction coefficient Gm increases linearly as the power difference ΔP increases. When the power difference ΔP is equal to or greater than the correction upper limit value Pu, the reference correction coefficient Gm is set to its upper limit value Gmax. Note that the reference value Pl is set, for example, to the upper limit of the power difference at which a person does not perceive the sound signal as muffled even without correction, for example, 28 dB. On the other hand, the correction upper limit value Pu and the upper limit value Gmax of the reference correction coefficient are set, respectively, to the lower limit of the power difference and the upper limit of the reference correction coefficient at which distortion of the sound signal caused by emphasizing the frequency signal value of each frequency within the correction target frequency band is not subjectively detected, for example, 48 dB and 20 dB.
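
The piecewise-linear relationship of graph 600 can be written as a short function. The sketch below uses the example values given in the text (Pl = 28 dB, Pu = 48 dB, Gmax = 20 dB). The function and parameter names are chosen here for illustration, and the linear segment is assumed to join 0 at Pl to Gmax at Pu.

```python
def reference_correction_coefficient(delta_p, pl=28.0, pu=48.0, gmax=20.0):
    """Map the power difference delta_p [dB] to the reference coefficient Gm [dB]."""
    if delta_p <= pl:
        return 0.0            # no correction needed at or below Pl
    if delta_p >= pu:
        return gmax           # saturate at the upper limit Gmax
    # Linear increase between Pl and Pu.
    return gmax * (delta_p - pl) / (pu - pl)
```

With these example values, a power difference of 38 dB, halfway between Pl and Pu, yields a reference correction coefficient of 10 dB.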

Once the reference correction coefficient Gm has been determined, the correction amount calculation unit 15 determines the correction coefficient g(f) for each frequency within the correction target frequency band by multiplying the reference correction coefficient Gm by a coefficient β(f) that depends on the frequency, as shown in the following equation.

Here, hs is the lower limit frequency of the high frequency band described above, and is set to 2900 [Hz], for example. Es is the lower limit frequency of the correction target frequency band, and is set to, for example, 2500 to 2700 [Hz]. Eul is the lower limit of the frequencies at which the correction coefficient g(f) is constant, and is set to, for example, 3100 to 3300 [Hz].

FIG. 7 is a diagram illustrating the relationship between frequency and the coefficient β(f) shown in equation (5). In FIG. 7, the horizontal axis represents frequency, and the vertical axis represents the magnitude of the coefficient β(f). Graph 700 represents the relationship between frequency and the coefficient β(f). As shown in graph 700, the coefficient β(f) is 0 below frequency Es, and increases monotonically with frequency from Es up to Eul. Above frequency Eul, β(f) is constant. Because the coefficient β(f) is set in this way, the correction coefficient g(f) gradually decreases toward the lower limit Es of the correction target frequency band in the vicinity of that limit. This prevents the frequency signal from becoming discontinuous near the lower limit of the correction target frequency band, and thus prevents the corrected sound signal from being unnaturally distorted.
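
The shape of β(f) in graph 700 and the resulting correction coefficient g(f) = β(f)·Gm can be sketched as below. Several points are assumptions rather than statements from the text: the ramp between Es and Eul is taken to be linear, the constant value above Eul is taken to be 1, and Es = 2600 Hz and Eul = 3200 Hz are simply picked from the example ranges. All function names are hypothetical.

```python
ES = 2600.0    # lower limit of the correction target band [Hz] (picked from 2500-2700)
EUL = 3200.0   # frequency above which the coefficient is constant [Hz] (picked from 3100-3300)

def beta(f_hz, es=ES, eul=EUL):
    """Frequency weighting: 0 below es, assumed linear ramp up to eul, then 1."""
    if f_hz < es:
        return 0.0
    if f_hz <= eul:
        return (f_hz - es) / (eul - es)
    return 1.0

def correction_coefficient(f_hz, gm):
    """Correction coefficient for one frequency: weighting times reference coefficient."""
    return beta(f_hz) * gm
```

Because the weighting rises gradually from 0 at Es, the correction coefficient has no step at the lower edge of the correction target frequency band, which is the property the description relies on to avoid discontinuities.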

The correction amount calculation unit 15 outputs the correction coefficient g(f) of each frequency within the correction target frequency band to the correction unit 16.

The correction unit 16 corrects, frame by frame, the frequency signal value of each frequency within the correction target frequency band according to the following equation.

S(f) is the frequency signal value at frequency f, and g(f) is the correction coefficient at frequency f. S_out(f) is the corrected frequency signal value. As is apparent from equation (6), when the correction coefficient g(f) = 0, the corrected frequency signal value S_out(f) is equal to the uncorrected frequency signal value S(f), and the larger the correction coefficient g(f), the more the corrected frequency signal value S_out(f) is amplified.
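
One way to realise a correction with the stated properties (no change when g(f) = 0, stronger amplification as g(f) grows) is to treat g(f) as a gain in dB applied to the complex frequency signal. The sketch below makes exactly that assumption. The conversion factor 10^(g/20) and the function name are choices made here for illustration and are not taken from the original equation.

```python
def apply_correction(spectrum, gains_db):
    """Scale each complex frequency signal value by a gain given in dB.

    spectrum: complex frequency signal values, one per bin
    gains_db: correction coefficient per bin, assumed to be an amplitude gain in dB
    """
    if len(spectrum) != len(gains_db):
        raise ValueError("spectrum and gains must have the same length")
    # A gain of 0 dB gives a factor of 1, so the bin is left unchanged.
    return [s * 10.0 ** (g / 20.0) for s, g in zip(spectrum, gains_db)]
```

Scaling the complex value by a real factor changes only its magnitude, so the phase of each frequency component is preserved.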

FIG. 8A is a diagram illustrating an example of the power spectrum of a muffled sound signal. FIG. 8B is a diagram illustrating the effective power spectrum of the power spectrum illustrated in FIG. 8A. FIG. 8C is a diagram illustrating an example of the power spectra of the sound signals obtained by correcting the power spectrum illustrated in FIG. 8A with the conventional technique and with the sound correction device according to the present embodiment, respectively.
In FIGS. 8A to 8C, the horizontal axis represents frequency, and the vertical axis represents power. Graph 800 shown in FIGS. 8A and 8C represents the power spectrum of the muffled sound signal at each frequency. Line 801 represents the average value P_low of the power spectrum values of the frequencies in the low frequency band, while line 802 represents the average value P_high of the power spectrum values of the frequencies in the high frequency band. Graph 810 shown in FIG. 8B represents the effective power spectrum values among the power spectrum values at each frequency shown in FIG. 8A. Lines 811 and 812 represent the average value P'_low of the effective power spectrum values of the frequencies in the low frequency band and the average value P'_high of the effective power spectrum values of the frequencies in the high frequency band, respectively. Graph 820 shown in FIG. 8C represents the power spectrum value at each frequency of the sound signal corrected according to the conventional technique, and graph 821 represents the power spectrum value at each frequency of the sound signal corrected by the sound correction device 6. As shown in FIGS. 8A and 8B, the difference Δ' between the average values P'_low and P'_high of the effective power spectrum values is larger than the difference Δ between the average values P_low and P_high of the power spectrum values. Therefore, as shown in FIG. 8C, the ratio of the high frequency band power spectrum to the low frequency band power spectrum is larger in the sound signal corrected by the sound correction device 6 than in the sound signal corrected according to the conventional technique. Consequently, the muffled feeling is improved more in the sound signal corrected by the sound correction device 6 than in the sound signal corrected according to the conventional technique.

The correction unit 16 outputs the frequency signal values of all frequency bands, including the corrected frequency signal values of the frequencies within the correction target frequency band, to the frequency-time conversion unit 17.

The frequency-time conversion unit 17 obtains the corrected sound signal by converting the corrected frequency signal value of each frequency into the time domain using the inverse of the time-frequency transform used by the time-frequency conversion unit 11. The frequency-time conversion unit 17 then outputs the corrected sound signal to the digital/analog converter 7.

FIG. 9 is an operation flowchart of the sound correction process executed by the sound correction device 6. The sound correction device 6 executes the sound correction process for each frame according to the operation flowchart described below.

The time-frequency conversion unit 11 calculates the frequency signal by converting the sound signal into the frequency domain frame by frame (step S101). The time-frequency conversion unit 11 then calculates the power spectrum value of each frequency (step S102). The time-frequency conversion unit 11 outputs the power spectrum value of each frequency to the masking threshold calculation unit 12 and the effective power spectrum extraction unit 13. The time-frequency conversion unit 11 also outputs the frequency signal value of each frequency to the correction unit 16.

For each frequency, the masking threshold calculation unit 12 obtains a masking threshold corresponding to a power spectrum value that is difficult for a person to perceive (step S103). The masking threshold calculation unit 12 then outputs the masking threshold of each frequency to the effective power spectrum extraction unit 13. For each frequency, the effective power spectrum extraction unit 13 calculates the effective power spectrum value, which is the component that a person can perceive, by subtracting the masking threshold from the power spectrum value (step S104). The effective power spectrum extraction unit 13 outputs the effective power spectrum value of each frequency to the inter-band power difference calculation unit 14.

The inter-band power difference calculation unit 14 calculates the power difference ΔP between the average of the effective power spectrum values of the frequencies in the low frequency band and the average of the effective power spectrum values of the frequencies in the high frequency band (step S105). The inter-band power difference calculation unit 14 then outputs the power difference ΔP to the correction amount calculation unit 15.
The correction amount calculation unit 15 determines the correction coefficient of each frequency within the correction target frequency band so that the frequency signal value of each frequency within that band is amplified more strongly as the power difference ΔP becomes larger (step S106). The correction unit 16 then corrects the frequency signal value of each frequency within the correction target frequency band by amplifying it according to the correction coefficient determined by the correction amount calculation unit 15 (step S107). The frequency-time conversion unit 17 then calculates the corrected sound signal by converting the corrected frequency signal value of each frequency into the time domain (step S108).
The sound correction device 6 then outputs the corrected sound signal and ends the sound correction process.

As described above, this sound correction device determines the degree of emphasis applied to the frequency signal value of each frequency in the high frequency band based on the difference between the human-perceivable components of the power spectra of the low frequency band and the high frequency band. The sound correction device can therefore appropriately reduce the muffled feeling of the sound signal even when the difference between the power spectrum of the low frequency band and the power spectrum of the high frequency band is small.

According to a modification, the masking threshold calculation unit may calculate the masking threshold only for the frequencies included in the low frequency band and in the high frequency band that are used to calculate the power difference. Similarly, the effective power spectrum extraction unit may calculate the effective power spectrum value only for the frequencies included in the low frequency band and in the high frequency band that are used to calculate the power difference. This reduces the amount of computation.

According to another modification, the sound correction device may attenuate the frequency signal value of each frequency included in the low frequency band more strongly as the power difference between the low frequency band and the high frequency band becomes larger. This modification also raises the ratio of the frequency signal values in the high frequency band to the frequency signal values in the low frequency band, so the muffled feeling is improved. In this modification, the correction amount calculation unit increases the attenuation coefficient for each frequency in the low frequency band as the power difference becomes larger. The correction unit then generates the corrected frequency signal by attenuating the frequency signal value of each frequency in the low frequency band more strongly as the attenuation coefficient becomes larger.

According to still another modification, the masking threshold calculation unit may calculate the masking threshold of each frequency using the absolute value of the amplitude of each frequency signal value instead of the power spectrum. The absolute value of the amplitude of the frequency signal value is also an example of a spectrum signal. In this case, the effective power spectrum extraction unit likewise obtains, for each frequency, the value obtained by subtracting the masking threshold from the absolute value of the amplitude of the frequency signal value as the effective spectrum signal. The inter-band power difference calculation unit likewise obtains the difference between the average of the effective spectrum signals of the frequencies in the low frequency band and the average of the effective spectrum signals of the frequencies in the high frequency band, both calculated from the absolute values of the amplitudes of the frequency signal values. The correction amount calculation unit increases the correction coefficient for each frequency within the correction target frequency band as that difference becomes larger.

According to still another modification, the sound correction device may reduce the distortion of sound caused by an excessively large power spectrum in the high frequency band. In this case, when the power difference ΔP is smaller than the reference value Pl, the correction amount calculation unit of the sound correction device determines the correction coefficient so as to attenuate the spectrum signal value of each frequency included in the high frequency band.

FIG. 10 is a diagram illustrating an example of the relationship between the power difference ΔP and the reference correction coefficient Gm according to this modification. In FIG. 10, the horizontal axis represents the power difference ΔP, and the vertical axis represents the reference correction coefficient Gm. Graph 1000 represents the relationship between the power difference ΔP and the reference correction coefficient Gm.
As shown in graph 1000, in this modification, if the power difference ΔP is equal to or less than the reference value Pl, the reference correction coefficient Gm is set to a negative value, and the smaller ΔP becomes, the smaller the reference correction coefficient Gm becomes. When the power difference ΔP is equal to or less than the correction lower limit value Pmin, the reference correction coefficient Gm is set to a constant negative value, for example, -10 dB. When the power difference ΔP is larger than the reference value Pl, the reference correction coefficient Gm is determined according to the power difference ΔP in the same way as in the example shown in FIG. 6.
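
The relationship of graph 1000 can be sketched by extending the mapping of FIG. 6 to negative values. The text gives only the floor of -10 dB; the value Pmin = 8 dB used below is purely hypothetical, and the decrease between Pl and Pmin is assumed to be linear and to reach 0 at Pl. The function and parameter names are illustrative.

```python
def reference_coefficient_with_attenuation(delta_p, pl=28.0, pu=48.0, gmax=20.0,
                                           pmin=8.0, gmin=-10.0):
    """Map delta_p [dB] to Gm [dB], allowing attenuation when delta_p is small.

    pmin is a hypothetical value; the description does not state one.
    """
    if delta_p <= pmin:
        return gmin                                   # constant negative floor
    if delta_p <= pl:
        return gmin * (pl - delta_p) / (pl - pmin)    # assumed linear, 0 at Pl
    if delta_p >= pu:
        return gmax                                   # same saturation as in FIG. 6
    return gmax * (delta_p - pl) / (pu - pl)          # same linear rise as in FIG. 6
```

Under these assumptions a power difference of 18 dB gives Gm = -5 dB, so the high frequency band is attenuated rather than emphasized.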

According to still another modification, the correction amount calculation unit may divide the high frequency band into a plurality of sub frequency bands and use a different correction coefficient for each sub frequency band. For example, depending on the capability of the speaker, amplifying the spectrum signal in the high frequency band too much may cause sound quality degradation such as clipping distortion. According to this modification, however, the sound correction device can reduce muffling within a range in which such sound quality degradation does not occur.

For example, when the high frequency band is divided into two sub frequency bands, the correction coefficient g(f) is calculated according to the following equation.

The coefficients β1(f) and β2(f) correspond to the lower sub frequency band and the higher sub frequency band, respectively.

FIG. 11 is a diagram illustrating the relationship between frequency and the coefficients β1(f) and β2(f) shown in equation (7). In FIG. 11, the horizontal axis represents frequency, and the vertical axis represents the magnitudes of the coefficients β1(f) and β2(f). Graph 1100 represents the relationship between frequency and the coefficient β1(f). Graph 1101 represents the relationship between frequency and the coefficient β2(f). As shown in graph 1100, the coefficient β1(f) is 0 below frequency Es1, and increases monotonically with frequency from Es1 up to Eul1. The coefficient β1(f) is constant above frequency Eul1 and below frequency Em. Above frequency Em and up to frequency Ee, the coefficient β1(f) decreases monotonically as the frequency increases. Above frequency Ee, the coefficient β1(f) is 0. On the other hand, as shown in graph 1101, the coefficient β2(f) is 0 below frequency Es2, and increases monotonically with frequency from Es2 up to Eul2. Above frequency Eul2, the coefficient β2(f) is constant. Note that Ee and Eul2 are set to substantially equal frequencies, as are Es2 and Em.
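
The shapes of graphs 1100 and 1101 can be sketched as a trapezoid for β1(f) and a ramp for β2(f). The sketch covers only these two shapes. All frequency values below are hypothetical, the rising and falling segments are assumed to be linear, and the constant level is assumed to be 1. Em = Es2 and Ee = Eul2 are set exactly equal here, whereas the text says only that they are substantially equal.

```python
ES1, EUL1 = 2600.0, 3200.0   # hypothetical rise of the lower sub band weighting [Hz]
EM, EE = 3600.0, 3800.0      # hypothetical fall of the lower sub band weighting [Hz]
ES2, EUL2 = EM, EE           # rise of the higher sub band weighting, aligned with the fall

def beta1(f_hz):
    """Trapezoid: 0, rising ramp, plateau, falling ramp, 0."""
    if f_hz < ES1 or f_hz > EE:
        return 0.0
    if f_hz <= EUL1:
        return (f_hz - ES1) / (EUL1 - ES1)
    if f_hz < EM:
        return 1.0
    return (EE - f_hz) / (EE - EM)

def beta2(f_hz):
    """Ramp: 0 below ES2, rising up to EUL2, constant above."""
    if f_hz < ES2:
        return 0.0
    if f_hz <= EUL2:
        return (f_hz - ES2) / (EUL2 - ES2)
    return 1.0
```

With the transition regions aligned in this way, one weighting fades out over the same frequency range in which the other fades in, so the two sub frequency bands hand over without a step.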

According to another modification, the sound correction device may execute the sound correction process on an input sound signal that is collected by a microphone mounted on a mobile phone and digitized by an analog/digital converter. In this case, the input sound signal corrected by the sound correction device is output to the control unit of the mobile phone.
In this case, the lower limit ls and the upper limit le of the low frequency band may be set to the lower limit and the upper limit of the frequency band that is emphasized by the proximity effect of the microphone mounted on the mobile phone.
Furthermore, the sound correction device is not limited to mobile phones and may be implemented in a fixed-line telephone, a teleconference system, or the like.

Furthermore, a computer program that causes a computer to realize the functions of the units of the sound correction device according to each of the above embodiments may be provided in a form recorded on a computer-readable medium such as a magnetic recording medium or an optical recording medium.

FIG. 12 is a configuration diagram of a computer that operates as the sound correction device by running a computer program that realizes the functions of the units of the sound correction device according to the above embodiment or its modifications.
The computer 100 includes a user interface unit 101, a communication interface unit 102, a storage unit 103, a storage medium access device 104, and a processor 105. The processor 105 is connected to the user interface unit 101, the communication interface unit 102, the storage unit 103, and the storage medium access device 104 via, for example, a bus.

The user interface unit 101 includes, for example, an input device such as a keyboard and a mouse, and a display device such as a liquid crystal display. Alternatively, the user interface unit 101 may include a device in which the input device and the display device are integrated, such as a touch panel display. The user interface unit 101 outputs, for example, an operation signal that starts the speech correction processing to the processor 105 in response to a user operation.

The communication interface unit 102 may include an audio interface for connecting the computer 100 to a microphone and a speaker, together with its control circuit.
Furthermore, the communication interface unit 102 may include a communication interface for connecting to a communication network that conforms to a communication standard such as Ethernet (registered trademark), together with its control circuit.
In this case, the communication interface unit 102 acquires an audio signal from another device connected to the communication network and passes it to the processor 105. The communication interface unit 102 may also output the corrected audio signal received from the processor 105 to another device via the communication network.

The storage unit 103 includes, for example, a readable/writable semiconductor memory and a read-only semiconductor memory. The storage unit 103 stores the computer program for executing the speech correction processing, which runs on the processor 105, and various data used in the speech correction processing.

The storage medium access device 104 is a device that accesses a storage medium 106 such as a magnetic disk, a semiconductor memory card, or an optical storage medium. For example, the storage medium access device 104 reads the computer program for speech correction processing, which is stored on the storage medium 106 and is to be executed on the processor 105, and passes it to the processor 105.

By executing the computer program for speech correction processing according to the above embodiment or modifications, the processor 105 emphasizes the frequency components in the high frequency band so as to reduce the muffled quality of the audio signal. The processor 105 then outputs the corrected audio signal to another device via the communication interface unit 102.

All examples and conditional language recited herein are intended for pedagogical purposes, to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art. They are to be construed as being without limitation to such specifically recited examples and conditions, and the organization of such examples in the specification does not relate to a showing of the superiority or inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

The following supplementary notes are further disclosed regarding the embodiment described above and its modifications.
(Appendix 1)
A speech correction device comprising:
a time-frequency conversion unit that calculates a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
a masking threshold calculation unit that calculates, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
an effective spectrum extraction unit that calculates, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
an inter-band power difference calculation unit that obtains a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
a correction amount calculation unit that determines a correction amount for a predetermined frequency band according to the difference;
a correction unit that calculates a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
a frequency-time conversion unit that obtains a corrected audio signal by converting the corrected spectrum signal into the time domain.
(Appendix 2)
The speech correction device according to Appendix 1, wherein the effective spectrum extraction unit calculates the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band by subtracting the masking threshold from the spectrum signal value for each frequency included in the first frequency band or the second frequency band.
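The subtraction described in Appendix 2 can be illustrated with a minimal Python sketch. It assumes that the spectrum signal values and masking thresholds are given in dB per frequency bin, that bins at or below their threshold are clamped to zero, and that a band is summarized by the mean of its bins. The clamping, the averaging, the function names, and the sample data are all illustrative assumptions and are not taken from the source.

```python
def effective_spectrum(power_db, mask_db):
    """Per-bin perceivable level: spectrum value minus masking threshold.

    Assumption: a bin at or below its threshold is treated as inaudible
    and contributes 0 dB.
    """
    if len(power_db) != len(mask_db):
        raise ValueError("spectrum and threshold must have the same length")
    return [max(p - m, 0.0) for p, m in zip(power_db, mask_db)]


def band_mean(values, lo, hi):
    """Mean of values over bins lo..hi inclusive (hypothetical band limits)."""
    band = values[lo:hi + 1]
    return sum(band) / len(band)


# Made-up data: bins 0-2 stand for the first (low) band, bins 3-5 for the
# second (high) band.
power_db = [60.0, 58.0, 55.0, 40.0, 38.0, 35.0]
mask_db = [30.0, 30.0, 30.0, 36.0, 36.0, 36.0]

eff = effective_spectrum(power_db, mask_db)
raw_diff = band_mean(power_db, 0, 2) - band_mean(power_db, 3, 5)
eff_diff = band_mean(eff, 0, 2) - band_mean(eff, 3, 5)
print(raw_diff, eff_diff)
```

With this made-up data the raw inter-band difference is 20 dB, while the difference between the effective spectra is about 25.7 dB, because most of the high band lies close to its masking threshold.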
(Appendix 3)
The speech correction device according to Appendix 1 or 2, wherein the masking threshold calculation unit detects, among the plurality of frequency bands in the current frame, a peak frequency at which the spectrum signal value is a local maximum, and sets a larger masking threshold for a frequency the closer that frequency is to the peak frequency and the larger the spectrum signal value at the peak frequency.
(Appendix 4)
The speech correction device according to Appendix 1 or 2, wherein the masking threshold calculation unit sets a larger masking threshold for a frequency the larger the spectrum signal value of that frequency in a frame a predetermined number of frames before the current frame.
(Appendix 5)
The speech correction device according to any one of Appendices 1 to 4, wherein the first frequency band is a frequency band in which amplifying the spectrum signal values of the first frequency band worsens the muffled quality, whereas the second frequency band is a frequency band in which amplifying the spectrum signal values of the second frequency band reduces the muffled quality.
(Appendix 6)
The speech correction device according to any one of Appendices 1 to 5, wherein the first frequency band is included in a third frequency band in which frequency signals are amplified by the characteristics of the audio input unit that acquired the audio signal.
(Appendix 7)
The speech correction device according to any one of Appendices 1 to 6, wherein the second frequency band is higher than the first frequency band, and
the correction unit calculates the corrected spectrum signal so as to increase, according to the correction amount, the ratio of the spectrum signal values of frequencies in the second frequency band to the spectrum signal values of frequencies in the first frequency band.
(Appendix 8)
The speech correction device according to Appendix 7, wherein the correction amount calculation unit determines the correction amount for at least one of the first frequency band and the second frequency band such that the larger the difference, the higher the ratio of the spectrum signal values.
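Appendices 7 and 8 only require that a larger difference lead to a higher ratio of second-band to first-band spectrum signal values. One way to satisfy this is sketched below in Python, assuming a piecewise-linear mapping from the difference (in dB) to a gain (in dB) that is added to the second band only. The break points `d_min` and `d_max`, the maximum gain `g_max`, and the function names are hypothetical values chosen for illustration, not values from the source.

```python
def correction_amount(diff_db, d_min=10.0, d_max=30.0, g_max=6.0):
    """Map the inter-band difference to a gain in dB.

    Assumption: the gain is 0 up to d_min, g_max from d_max upward, and
    linear in between, so a larger difference never yields a smaller gain.
    """
    if diff_db <= d_min:
        return 0.0
    if diff_db >= d_max:
        return g_max
    return g_max * (diff_db - d_min) / (d_max - d_min)


def apply_correction(power_db, gain_db, lo, hi):
    """Return a copy of power_db with gain_db added to bins lo..hi inclusive."""
    return [p + gain_db if lo <= k <= hi else p
            for k, p in enumerate(power_db)]


gain = correction_amount(20.0)
corrected = apply_correction([60.0, 40.0, 38.0], gain, 1, 2)
print(gain, corrected)
```

Boosting only the second band raises the ratio named in Appendix 7. Attenuating the first band, or combining both, would satisfy the same wording.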
(Appendix 9)
A speech correction method comprising:
calculating a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
calculating, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
calculating, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
obtaining a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
determining a correction amount for a predetermined frequency band according to the difference;
calculating a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
obtaining a corrected audio signal by converting the corrected spectrum signal into the time domain.
(Appendix 10)
A computer program for speech correction that causes a computer to execute:
calculating a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
calculating, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
calculating, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
obtaining a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
determining a correction amount for a predetermined frequency band according to the difference;
calculating a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
obtaining a corrected audio signal by converting the corrected spectrum signal into the time domain.
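The sequence of steps in Appendices 9 and 10 can be sketched end to end for a single frame. The Python code below is a rough illustration under several assumptions: a naive DFT stands in for the time-frequency conversion, the masking threshold is a crude placeholder that peaks at the strongest bin and falls off linearly with distance, bands are summarized by their mean, and the gain mapping is piecewise linear. The function names, the parameters `offset_db`, `slope_db`, `d_min`, `d_max`, and `g_max`, and their default values are hypothetical. Windowing and overlap between frames are omitted.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def masking_threshold(p_db, offset_db=20.0, slope_db=3.0):
    """Hypothetical threshold: highest at the strongest bin, falling with distance."""
    peak = max(range(len(p_db)), key=lambda k: p_db[k])
    return [p_db[peak] - offset_db - slope_db * abs(k - peak)
            for k in range(len(p_db))]


def band_mean(values, band):
    """Mean over an inclusive (first_bin, last_bin) band."""
    lo, hi = band
    return sum(values[lo:hi + 1]) / (hi - lo + 1)


def correct_frame(frame, low_band, high_band, d_min=2.0, d_max=12.0, g_max=6.0):
    """Return (corrected frame, applied gain in dB).

    Bands are (first_bin, last_bin) pairs within 0..len(frame)//2.
    """
    n = len(frame)
    spec = dft(frame)
    # Power spectrum in dB for the non-redundant half; the floor avoids log10(0).
    p_db = [10.0 * math.log10(abs(c) ** 2 + 1e-12) for c in spec[:n // 2 + 1]]
    m_db = masking_threshold(p_db)
    # Effective spectrum: spectrum value minus threshold, clamped at zero.
    eff = [max(p - m, 0.0) for p, m in zip(p_db, m_db)]
    diff = band_mean(eff, low_band) - band_mean(eff, high_band)
    # Larger difference gives a larger gain, limited to the range 0..g_max.
    gain_db = g_max * min(max((diff - d_min) / (d_max - d_min), 0.0), 1.0)
    scale = 10.0 ** (gain_db / 20.0)
    for k in range(high_band[0], high_band[1] + 1):
        spec[k] *= scale
        if 0 < k < n - k:
            spec[n - k] *= scale  # mirror bin keeps the output real-valued
    return idft(spec), gain_db


n = 32
frame = [math.sin(2 * math.pi * 2 * t / n) + 0.01 * math.sin(2 * math.pi * 10 * t / n)
         for t in range(n)]
corrected, gain_db = correct_frame(frame, low_band=(1, 3), high_band=(9, 11))
print(round(gain_db, 3))
```

The test frame combines a strong component in the low band with a weak one in the high band, so the sketch applies a positive gain to bins 9 to 11 and leaves the low band untouched.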

DESCRIPTION OF SYMBOLS
1 Mobile phone
2 Control unit
3 Communication unit
4 Microphone
5 Analog/digital converter
6 Speech correction device
7 Digital/analog converter
8 Speaker
11 Time-frequency conversion unit
12 Masking threshold calculation unit
13 Effective power spectrum extraction unit
14 Inter-band power difference calculation unit
15 Correction amount calculation unit
16 Correction unit
17 Frequency-time conversion unit
100 Computer
101 User interface unit
102 Communication interface unit
103 Storage unit
104 Storage medium access device
105 Processor
106 Storage medium

Claims

A speech correction device comprising:
a time-frequency conversion unit that calculates a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
a masking threshold calculation unit that calculates, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
an effective spectrum extraction unit that calculates, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
an inter-band power difference calculation unit that obtains a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
a correction amount calculation unit that determines a correction amount for a predetermined frequency band according to the difference;
a correction unit that calculates a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
a frequency-time conversion unit that obtains a corrected audio signal by converting the corrected spectrum signal into the time domain.

The speech correction device according to claim 1, wherein the effective spectrum extraction unit calculates the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band by subtracting the masking threshold from the spectrum signal value for each frequency included in the first frequency band or the second frequency band.

The speech correction device according to claim 1, wherein the masking threshold calculation unit detects, among the plurality of frequency bands in the current frame, a peak frequency at which the spectrum signal value is a local maximum, and sets a larger masking threshold for a frequency the closer that frequency is to the peak frequency and the larger the spectrum signal value at the peak frequency.

The speech correction device according to claim 1, wherein the masking threshold calculation unit sets a larger masking threshold for a frequency the larger the spectrum signal value of that frequency in a frame a predetermined number of frames before the current frame.

The speech correction device according to claim 1, wherein the second frequency band is higher than the first frequency band, and
the correction unit calculates the corrected spectrum signal so as to increase, according to the correction amount, the ratio of the spectrum signal values of frequencies in the second frequency band to the spectrum signal values of frequencies in the first frequency band.

The speech correction device according to claim 5, wherein the correction amount calculation unit determines the correction amount for at least one of the first frequency band and the second frequency band such that the larger the difference, the higher the ratio of the spectrum signal values.

A speech correction method comprising:
calculating a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
calculating, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
calculating, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
obtaining a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
determining a correction amount for a predetermined frequency band according to the difference;
calculating a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
obtaining a corrected audio signal by converting the corrected spectrum signal into the time domain.

A computer program for speech correction that causes a computer to execute:
calculating a spectrum signal including a spectrum signal value for each of a plurality of frequencies by converting a time-domain audio signal into the frequency domain in units of frames having a predetermined time length;
calculating, for each frequency, a masking threshold corresponding to a spectrum signal value that cannot be heard according to human auditory characteristics;
calculating, based on the spectrum signal values and the masking thresholds for at least the frequencies included in a first frequency band and the frequencies included in a second frequency band, effective spectrum signals representing the perceivable spectrum signals of the first frequency band and the second frequency band;
obtaining a difference between the effective spectrum signal of the first frequency band and the effective spectrum signal of the second frequency band;
determining a correction amount for a predetermined frequency band according to the difference;
calculating a corrected spectrum signal by correcting the spectrum signal value of each frequency within the predetermined frequency band according to the correction amount; and
obtaining a corrected audio signal by converting the corrected spectrum signal into the time domain.