JP2009015209A

JP2009015209A - Speech articulation improving system and speech articulation improving method

Info

Publication number: JP2009015209A
Application number: JP2007179469A
Authority: JP
Inventors: Toru Marumoto (丸本 徹); Nozomi Saito (齊藤 望)
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2007-07-09
Filing date: 2007-07-09
Publication date: 2009-01-22
Anticipated expiration: 2027-07-09
Also published as: JP5383008B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech articulation improving system and a speech articulation improving method that can perform loudness compensation appropriately by accurately detecting level changes in the mid-to-high range using a unidirectional microphone.
SOLUTION: The speech articulation improving system controls the gain of a speech signal on the basis of the speech power of the speech signal generated by a speech signal generation section and the power of the surrounding sound. It includes: a unidirectional microphone that detects the speech output by a loudspeaker and the surrounding sound; an estimating section that estimates the speech power and the surrounding-sound power from the microphone detection signal and the speech signal; and an audibility compensation filter provided between the microphone and the estimating section. The transfer characteristic from the microphone through the audibility compensation filter is made to approximate the A characteristic.
COPYRIGHT: (C) 2009, JPO & INPIT

Description

The present invention relates to a speech intelligibility improvement system and a speech intelligibility improvement method, and more particularly to a system and method in which the microphone that picks up ambient noise can also serve as the talker's voice input microphone.

There are speech intelligibility improvement systems that make sound output from a loudspeaker clearly audible even in noise. For example, an in-vehicle navigation device outputs route guidance and other speech from a loudspeaker into the cabin, but when engine sound, road noise, and similar noise are loud, such as while driving, the masking effect makes the loudspeaker output hard to hear. Therefore, when the noise is loud relative to the loudspeaker output, loudness compensation is applied to the output speech, for example by raising the gain across the whole speech band, so that it remains clearly audible in noise.

FIG. 10 is a block diagram showing an example of in-vehicle equipment that includes a conventional speech intelligibility improvement system. Reference numeral 1 denotes an in-vehicle navigation device serving as the audio signal generation means that outputs audio signals such as route guidance, and 2 denotes a speech intelligibility improvement system that improves the intelligibility of the audio signal. The guidance voice signal output from the navigation device 1 passes through a gain adjustment unit 4, which serves as the loudness compensation unit of the system 2, and a power amplifier 5, and is then converted to sound by a loudspeaker 6 and output into the cabin. The guidance voice A and the ambient noise N (referred to as ambient sound), such as engine sound and road noise, are picked up by an omnidirectional microphone 7 whose frequency response is nearly flat down to low frequencies, and the microphone signal is output to an estimation unit 8. The estimation unit 8 estimates the guidance voice power and the ambient sound power from the guidance voice signal and the microphone detection signal.
Specifically, the estimation unit 8 feeds the guidance voice signal into a transfer characteristic simulation filter 9, which simulates the transfer characteristic from the input of the power amplifier 5 to the output of the microphone 7, to estimate the guidance voice component, and passes that component through a power calculator 10 to estimate the guidance voice power. Meanwhile, a power calculator 11 connected to the microphone 7 computes the microphone input power, and an adder 12 subtracts the guidance voice power from the microphone input power to estimate the ambient sound power contained in the microphone input. Based on the guidance voice power and the ambient sound power, a loudness compensation control unit 13 uses human loudness characteristics to determine a gain at which the guidance voice is clearly audible regardless of the ambient sound level, and sets that gain in the gain adjustment unit 4.
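The power estimation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the FIR model of the transfer characteristic simulation filter, and the clamp of the ambient power at zero are all assumptions introduced here.

```python
def mean_power(samples):
    """Average power of a block of samples (role of power calculators 10 and 11)."""
    return sum(s * s for s in samples) / len(samples)


def fir_filter(signal, taps):
    """Causal FIR filter; stands in for the transfer characteristic simulation filter."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def estimate_powers(guidance, mic, path_taps):
    """Return (guidance voice power, ambient sound power) for one block."""
    voice_component = fir_filter(guidance, path_taps)  # estimated voice at the mic
    voice_power = mean_power(voice_component)
    mic_power = mean_power(mic)
    # Adder 12: microphone power minus voice power. The clamp at zero is an
    # assumption added here to keep the estimate non-negative.
    ambient_power = max(mic_power - voice_power, 0.0)
    return voice_power, ambient_power
```

The subtraction is only meaningful when the voice and the ambient sound are roughly uncorrelated over the block, since only then do their powers add at the microphone.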

Vehicle noise has a very high level in the low range, and this low-range component is present even when the vehicle is stopped and idling. As a result, the total noise level including the low-range component does not differ much between driving and idling. During driving, however, the noise level in the mid-to-high range increases, and because this band overlaps the loudspeaker output speech, the speech becomes harder to hear than when idling.
The conventional speech intelligibility improvement system 2 described above estimates the ambient sound component, including the low range, using the omnidirectional microphone 7, whose frequency response is nearly flat down to low frequencies (C characteristic). With such a microphone, however, the low-range level of automobile noise dominates the overall noise level, and changes in the mid-to-high-range noise level, which affect the audibility of speech more strongly, become hard to capture. In other words, the conventional system 2 cannot detect changes in the mid-to-high-range noise level and therefore cannot control the gain of the loudspeaker output speech correctly.

Separately, various systems that require a microphone for talker voice input, such as voice recognition systems and hands-free telephones, have recently been introduced in vehicles as well. These systems mainly use unidirectional microphones. Many unidirectional microphones form a directivity pattern called a cardioid, which improves the signal-to-noise (SN) ratio in the mid-to-high range, and also cut low-range noise that overlaps little with speech, which further improves the SN ratio. The conventional speech intelligibility improvement system, on the other hand, estimates the ambient noise level including the low range (C characteristic), as described above, so it is difficult for it to adopt a microphone that cuts low-range noise. If the microphone of the speech intelligibility improvement system could also serve as the microphone of the talker voice input system, the overall system configuration could be simplified. For the reasons above, however, the conventional speech intelligibility improvement system cannot share a microphone with voice recognition, a hands-free telephone, or the like.
As prior art, there is a technique that narrows the pass band of the microphone input so that ambient noise does not cause malfunction when the microphone sensitivity is raised (Patent Document 1). However, this prior art does not solve the above problem.
Patent Document 1: JP H10-116099 A

In view of the above, an object of the present invention is to allow the microphone of the speech intelligibility improvement system and the microphone for talker voice input to be shared.
Another object of the present invention is to detect changes in the noise level in the mid-to-high range so that the gain of the loudspeaker output speech can be controlled correctly.
A further object of the present invention is to provide a speech intelligibility improvement system and a speech intelligibility improvement method that use a talker input microphone to accurately capture changes in the mid-to-high-range ambient sound level and thereby perform loudness compensation appropriately.

- Speech intelligibility improvement system
A first aspect of the present invention is a speech intelligibility improvement system that controls the gain of an audio signal based on the voice power of the audio signal generated by an audio signal generation unit and the power of the ambient sound. The system comprises a unidirectional microphone that detects the ambient sound and the voice output from a loudspeaker based on the audio signal, an estimation unit that estimates the voice power and the ambient sound power from the microphone detection signal and the audio signal, and an audibility correction filter provided between the microphone and the estimation unit. The transfer characteristic from the microphone through the audibility correction filter is made approximately the A characteristic.
In this system, let Y(ω) be the frequency-domain representation of the output of an A-characteristic filter connected to an omnidirectional microphone, and let X(ω) be the frequency-domain representation of the output of a unidirectional microphone placed close to the omnidirectional microphone. The transfer characteristic H(ω) of the audibility correction filter is then determined so that
H(ω) = Y(ω) / X(ω).
- Speech intelligibility improvement method
A second aspect of the present invention is a speech intelligibility improvement method in which an audio signal generated by an audio signal generation unit is converted to sound by a loudspeaker, the voice output from the loudspeaker and the ambient sound are detected by a microphone, the voice power and the ambient sound power are estimated from the microphone detection signal and the audio signal, and the gain of the audio signal is controlled based on the estimated voice power and ambient sound power. In this method, the microphone is unidirectional, an audibility correction filter is provided between the microphone and the estimation unit that estimates the voice power and the ambient sound power, the transfer characteristic from the microphone through the audibility correction filter is made approximately the A characteristic, and the estimation unit estimates the voice power and the ambient sound power from the audio signal and the microphone detection signal input via the audibility correction filter.
In this method, let Y(ω) be the frequency-domain representation of the output of an A-characteristic filter connected to an omnidirectional microphone, and let X(ω) be the frequency-domain representation of the output of a unidirectional microphone placed close to the omnidirectional microphone. The transfer characteristic H(ω) of the audibility correction filter is then determined so that
H(ω) = Y(ω) / X(ω).
Alternatively, in this method, an A-characteristic filter is connected to the omnidirectional microphone and an adaptive filter is connected to the unidirectional microphone. While a predetermined sound wave is made incident on the two microphones (the first and second microphones), which are placed close to each other, the filter coefficients of the adaptive filter are updated so as to minimize the error between the output of the A-characteristic filter and the output of the adaptive filter. The transfer characteristic H(ω) of the audibility correction filter is then set to the transfer characteristic of the adaptive filter after this learning.

According to the present invention, the transfer characteristic from the microphone through the audibility correction filter is close to the A characteristic, which is based on human auditory characteristics. Changes in the mid-to-high-range ambient sound level, which affect the audibility of speech more strongly, can therefore be captured accurately for speech intelligibility improvement control, making navigation guidance and other speech easier to hear.
Further, according to the present invention, the microphone of the speech intelligibility improvement system can also serve as the talker input microphone for voice recognition, a hands-free telephone, and the like, so the overall configuration, including other voice input devices that use the talker input microphone, can be simplified.

(A) Overview
Unlike the conventional speech intelligibility improvement system, the present invention does not estimate the ambient noise level including the low range (with the C characteristic). Instead, it mainly observes the noise level in the mid-to-high range, which affects the audibility of speech more strongly. This can be achieved by passing the noise and voice signals through a kind of high-pass filter and observing their levels. The invention therefore estimates the ambient noise and voice levels through the A characteristic (A-weighting), which behaves as a high-pass filter and was devised on the basis of the loudness that humans perceive. Basic research on loudness compensation based on A-weighted level estimates of noise (ambient sound) and voice has already been carried out, as shown in the following references, so the actual control can be based on those results.
[1] Y. Suzuki et al., J. Acoust. Soc. Jpn. (E), 3(2), pp. 55-65, 1982
[2] Y. Suzuki et al., J. Acoust. Soc. Jpn. (E), 6(3), pp. 161-170, 1985
Next, many of the unidirectional microphones mainly used for voice recognition, hands-free telephones, and the like inherently have a characteristic (a high-pass filter characteristic) that cuts low-range noise that overlaps little with speech. Therefore, if a unidirectional microphone is used and suitable filtering is added on its output side so that the overall characteristic becomes the A characteristic, the microphone system can also be used for the speech intelligibility improvement system. Voice recognition, hands-free telephony, and the like can of course use the output of the unidirectional microphone without this filtering. Moreover, because the unidirectional microphone itself acts as a high-pass filter, the filter connected after it can be smaller than in the conventional arrangement, which combines a microphone that is flat down to the low range with an A-characteristic filter.
In summary, the present invention:
(1) estimates the ambient noise and voice levels through the A characteristic, which behaves as a high-pass filter and was devised on the basis of the loudness that humans perceive, making it easier to capture changes in the mid-to-high-range noise level that affect the audibility of speech more strongly; and
(2) uses a unidirectional microphone with suitable filtering on its output side so that the overall characteristic becomes the A characteristic, which allows the microphone to be shared with voice recognition, a hands-free telephone, and the like, and also makes the filter connected after the unidirectional microphone smaller than in the conventional method that uses a microphone flat down to the low range together with an A-characteristic filter.
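To illustrate why the A characteristic acts as a kind of high-pass filter, the sketch below evaluates the commonly published analytic form of the A-weighting curve (as given in IEC 61672). The patent itself does not state this formula; the function name and the choice of sample frequencies are assumptions for illustration only.

```python
import math


def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), standard analytic form."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(ra) + 2.00


if __name__ == "__main__":
    for freq in (50, 100, 500, 1000, 4000):
        print(f"{freq:>5} Hz: {a_weighting_db(freq):6.1f} dB")
```

The curve attenuates low frequencies strongly while leaving the band around 1 kHz essentially unchanged, so low-range engine and road noise contributes little to an A-weighted level estimate.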

(B) Embodiment
An embodiment of the present invention is described next with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of in-vehicle equipment that includes the speech intelligibility improvement system according to the present invention. Components identical to those in FIG. 10 carry the same reference numerals.
In FIG. 1, 2A denotes the speech intelligibility improvement system, and 7A denotes a microphone that detects the guidance voice A output from the loudspeaker and the noise N such as engine sound and road noise. The microphone 7A is unidirectional because it also serves as the microphone for talker voice input. Mass-produced unidirectional microphones form a directivity pattern called a cardioid, which improves the SN ratio in the mid-to-high range, and by their construction also cut low-range noise that overlaps little with speech (see FIG. 2). Reference numeral 20 denotes a voice recognition device, which performs voice recognition on the input from the microphone 7A (the talker's voice) and provides voice control of the audio system, the navigation system, and so on. Reference numeral 21 denotes an audibility correction filter provided between the microphone 7A and the estimation unit 8A. Here, as shown in FIG. 3, it is a single-stage second-order IIR filter with fixed weighting coefficients s1, a11, a21, b01, b11, and b21. If the transfer characteristic (frequency-amplitude characteristic) of the audibility correction filter 21 is determined so that the combined characteristic of the microphone 7A and the filter 21 is approximately the A characteristic, changes in the mid-to-high-range ambient sound level, which affect the audibility of speech more strongly, can be captured accurately.
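A single-stage second-order IIR filter with the six coefficients named above might be sketched as below. The exact topology of FIG. 3 is not recoverable from the text, so the difference equation used here (an input scale s1 applied to the feedforward part, with a11 and a21 as feedback coefficients in the usual sign convention) is an assumption.

```python
def biquad(x, s1, b01, b11, b21, a11, a21):
    """Second-order IIR section (assumed direct form I):

    y[n] = s1*(b01*x[n] + b11*x[n-1] + b21*x[n-2]) - a11*y[n-1] - a21*y[n-2]
    """
    y = []
    x1 = x2 = y1 = y2 = 0.0  # delayed inputs and outputs
    for xn in x:
        yn = s1 * (b01 * xn + b11 * x1 + b21 * x2) - a11 * y1 - a21 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

Because the coefficients are fixed, the filter needs only a few multiply-adds and four state variables per sample.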

The estimation unit 8A estimates the guidance voice power and the ambient sound power from the guidance voice signal and the output signal of the audibility correction filter 21. That is, the estimation unit 8A feeds the guidance voice signal into a transfer characteristic simulation filter 8A, which simulates the transfer characteristic from the input of the power amplifier 5 to the output of the audibility correction filter 21, to obtain the guidance voice component, and feeds that component into the power calculator 10 to estimate the guidance voice power. Meanwhile, the power calculator 11 connected to the audibility correction filter 21 computes the power of the microphone input signal, and the adder 12 subtracts the guidance voice power from the microphone input power to estimate the ambient sound power. Based on the guidance voice power and the ambient sound power, the loudness compensation control unit 13 uses human loudness characteristics to determine a gain at which the guidance voice is clearly audible regardless of the ambient sound level, and sets that gain in the gain adjustment unit 4. Because the ambient sound power has been audibility-corrected, changes in the mid-to-high-range ambient sound level, which affect the audibility of speech more strongly, can be captured accurately, and loudness compensation can be performed appropriately when that level rises during driving.

(C) Method of setting the transfer characteristic of the audibility correction filter 21
When a speech intelligibility improvement system uses an omnidirectional microphone whose response is nearly flat down to the low range (see FIG. 4), an A-characteristic filter AFIL, which has the A characteristic (see FIG. 6) reflecting the frequency dependence of human loudness perception, is connected in series to the omnidirectional microphone MIC as shown in FIG. 5, and the output of the A-characteristic filter AFIL is input to the estimation unit 8A. This arrangement accurately captures changes in the mid-to-high-range ambient sound level, which affect the audibility of speech more strongly. When the A-characteristic filter AFIL is approximated by IIR filters, a two-stage cascade of second-order IIR filters is needed, as shown in FIG. 7, because the attenuation increases steeply in the low range.
Accordingly, if the transfer characteristic (frequency-amplitude characteristic) of the audibility correction filter 21 is set so that the combined characteristic of the microphone 7A and the audibility correction filter 21 is substantially the same as the transfer characteristic from the omnidirectional microphone MIC through the A-characteristic filter AFIL in FIG. 5, the combination of the microphone 7A and the audibility correction filter 21 can accurately capture changes in the mid-to-high-range ambient sound level.

(a) First method of determining the filter characteristic of the audibility correction filter
FIG. 8 illustrates the first method of determining the filter characteristic of the audibility correction filter 21.
The omnidirectional microphone MIC and the unidirectional microphone 7A are placed side by side, close together and facing a loudspeaker SP, and an A-characteristic filter 22 is connected to the output of the microphone MIC. White noise (WN) radiated from the loudspeaker SP is picked up by the microphones MIC and 7A, and the output y of the A-characteristic filter AFIL and the output x of the microphone 7A are frequency-analyzed with an FFT analyzer ANL_FFT. Let Y(ω) be the resulting output of the system in which the A-characteristic filter AFIL is connected to the microphone MIC, and let X(ω) be the output of the microphone 7A. A transfer function H(ω) that satisfies
H(ω) = Y(ω) / X(ω)
is then obtained. The frequency-amplitude characteristic of the audibility correction filter 21 in FIG. 1 is made to match, or approximately match, that of the transfer function H(ω). Specifically, the frequency-amplitude characteristic of H(ω) is simplified to determine the frequency-amplitude characteristic of an analog filter (for example, an LPF), and the weighting coefficients s1, a11, a21, b01, b11, and b21 of the single-stage second-order IIR filter shown in FIG. 3 are determined by a technique such as an s-z transformation.
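The relation H(ω) = Y(ω) / X(ω) can be sketched numerically as below. This is an illustration under stated assumptions, not the measurement procedure of the FFT analyzer: the signals are synthetic, `y` is produced from `x` by a hypothetical two-tap high-pass relation applied circularly (so the ratio is exact), and the helper names are invented here.

```python
import cmath
import random


def dft(x):
    """Plain O(N^2) discrete Fourier transform, adequate for a short demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def transfer_estimate(x, y, eps=1e-12):
    """Per-bin estimate H[k] = Y[k] / X[k]; None where X[k] is negligible."""
    X, Y = dft(x), dft(y)
    return [Yk / Xk if abs(Xk) > eps else None for Xk, Yk in zip(X, Y)]


random.seed(0)
N = 64
x = [random.gauss(0.0, 1.0) for _ in range(N)]  # stand-in for microphone 7A output
# Stand-in for the A-filter output: y[t] = x[t] - 0.9*x[t-1], circular (x[-1] wraps).
y = [x[t] - 0.9 * x[t - 1] for t in range(N)]
H = transfer_estimate(x, y)
```

With measured signals a single-frame ratio is noisy, so an average over many frames of white noise would normally be used before the magnitude response is simplified into a low-order filter.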

(b) Second method of determining the filter characteristic of the audibility correction filter
FIG. 9 illustrates the second method of determining the filter characteristic of the audibility correction filter 21.
The omnidirectional microphone MIC and the unidirectional microphone 7A are placed side by side, close together and facing the loudspeaker SP. The A-characteristic filter AFIL is connected to the output of the microphone MIC, and an adaptive filter ADF is connected to the output of the microphone 7A. With white noise WN radiated from the loudspeaker SP, an adder ADD obtains the error e between the output y of the A-characteristic filter AFIL and the output x of the adaptive filter ADF, and a coefficient update unit CRNW updates the coefficients of the adaptive filter ADF, using the LMS algorithm or a similar method, so that the squared error is minimized. After this learning, the frequency-amplitude characteristic of the adaptive filter ADF is simplified to determine the frequency-amplitude characteristic of an analog filter (for example, an LPF), and the weighting coefficients s1, a11, a21, b01, b11, and b21 of the single-stage second-order IIR filter shown in FIG. 3 are determined by a technique such as an s-z transformation.
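The learning step can be sketched with a basic LMS update as follows. The code is a hedged illustration only: the adaptive filter is assumed to be an FIR filter, the step size `mu` and tap count are arbitrary, and the target signal `d` is generated from a hypothetical two-tap system rather than from a real A-characteristic filter output.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR taps w so that filtering x tracks the target d (LMS update)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))  # adaptive filter output
        e = dn - y_hat                                   # error from the adder
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # coefficient update
    return w


random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]  # stand-in for mic 7A output
# Hypothetical target: d[n] = 0.5*x[n] - 0.3*x[n-1]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w = lms_identify(x, d, num_taps=2, mu=0.1)
```

In this noiseless toy case the taps converge to the generating system; the learned response would then be simplified into the single-stage second-order IIR filter as the text describes.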

As described above, according to the present invention, the transfer characteristic from the unidirectional microphone 7A, which also serves as the talker input microphone, through the audibility correction filter 21 is close to the A characteristic, which models human auditory characteristics. Changes in the mid-to-high-range ambient sound level, which affect the audibility of speech more strongly, can therefore be captured accurately, and loudness compensation can be performed appropriately. In addition, because the microphone 7A of the speech intelligibility improvement system 2A can also serve as the talker voice input microphone of the voice recognition device 20, the microphone configuration is simplified.
Furthermore, because the low-range attenuation of the A-characteristic filter is steep, implementing it with IIR filters requires a two-stage cascade of second-order IIR filters. The unidirectional microphone 7A, however, already attenuates the low range (see FIG. 2), so the low-range attenuation of the audibility correction filter 21 can be gentler than that of the A-characteristic filter, and the filter can be implemented as a single-stage second-order IIR filter (see FIG. 3).
In the embodiment described above, the audibility correction filter is a single-stage second-order IIR filter, but the present invention is not limited to this. The filter may be a two-stage cascade of second-order IIR filters, or one or more stages of third-order or higher IIR filters. It may also be implemented as an FIR filter.

Brief description of the drawings

FIG. 1 is a block diagram showing the configuration of an in-vehicle device including a speech intelligibility improvement system according to the present invention (Example 1).
FIG. 2 is a diagram showing the frequency characteristic of a unidirectional microphone.
FIG. 3 is a block diagram showing a configuration example of the audibility correction filter in FIG. 1.
FIG. 4 is a diagram showing the frequency characteristic of an omnidirectional microphone.
FIG. 5 shows a configuration that uses an omnidirectional microphone to accurately capture changes in the ambient sound level in the mid-to-high range, which more strongly affect how easily speech is heard.
FIG. 6 is a diagram showing the frequency characteristic of the A-characteristic filter.
FIG. 7 is a block diagram showing a configuration example of the A-characteristic filter.
FIG. 8 is an explanatory diagram of a first method of determining the characteristic of the audibility correction filter.
FIG. 9 is an explanatory diagram of a second method of determining the characteristic of the audibility correction filter.
FIG. 10 is a block diagram showing the configuration of an in-vehicle device including a conventional speech intelligibility improvement system.

Explanation of symbols

1 In-vehicle navigation device
2A Speech intelligibility improvement system
4 Gain adjustment unit
6 Speaker
7A Unidirectional microphone
8A Estimation unit
13 Loudness compensation control unit

Claims

A speech intelligibility improvement system that controls the gain of an audio signal based on the audio power of the audio signal generated by an audio signal generation unit and the power of ambient sound, the system comprising:
a unidirectional microphone that detects the sound output from a speaker based on the audio signal and the ambient sound;
an estimation unit that estimates the audio power and the ambient sound power from the microphone detection signal and the audio signal; and
an audibility correction filter provided between the microphone and the estimation unit,
wherein the transfer characteristic from the microphone to the audibility correction filter is substantially the A characteristic.
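
For reference, the A characteristic named in this claim is conventionally the A-weighting curve used in sound level measurement. The Python sketch below evaluates the standard analytic form of that curve; the function name is hypothetical, and the code only illustrates the target response, not the filter of the claimed system.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (standard analytic form, 0 dB at 1 kHz)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00


if __name__ == "__main__":
    for f in (50.0, 100.0, 1000.0, 4000.0):
        print(f, round(a_weighting_db(f), 1))
```

The curve falls by roughly 19 dB at 100 Hz relative to 1 kHz, which is the steep low-frequency attenuation the description refers to.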

The speech intelligibility improvement system according to claim 1, wherein, when Y(ω) denotes the frequency-domain representation of the output of an A-characteristic filter connected to an omnidirectional microphone, and X(ω) denotes the frequency-domain representation of the output of the unidirectional microphone installed close to the omnidirectional microphone, the transfer characteristic H(ω) of the audibility correction filter is determined so that
H(ω) = Y(ω) / X(ω).
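
The determination H(ω) = Y(ω) / X(ω) can be sketched numerically as a bin-by-bin spectral division. The Python example below is a minimal illustration, assuming x and y are equal-length recordings of the unidirectional microphone output and of the A-characteristic filter output, respectively. The function names and the small regularization term eps (added to avoid dividing by near-zero bins) are assumptions that do not appear in the claim.

```python
import cmath


def dft(samples):
    """Plain O(N^2) discrete Fourier transform (kept dependency-free for clarity)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_transfer(x, y, eps=1e-12):
    """Estimate H(w) = Y(w) / X(w) for every frequency bin.

    Computed as Y * conj(X) / (|X|^2 + eps), which equals Y / X when eps is
    negligible; eps is an assumed safeguard, not part of the source formula.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    spec_x = dft(x)
    spec_y = dft(y)
    return [
        yk * xk.conjugate() / (abs(xk) ** 2 + eps)
        for xk, yk in zip(spec_x, spec_y)
    ]
```

In practice the estimate would typically be averaged over several recordings and then approximated by a realizable filter such as the second-order IIR stage of the embodiment; that fitting step is outside this sketch.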

A speech intelligibility improvement method in which an audio signal generated by an audio signal generation unit is converted to sound by a speaker, the sound output from the speaker and ambient sound are detected by a microphone, audio power and ambient sound power are estimated from the microphone detection signal and the audio signal, and the gain of the audio signal is controlled based on the estimated audio power and ambient sound power, wherein:
the microphone is unidirectional;
an audibility correction filter is provided between the microphone and an estimation unit that estimates the audio power and the ambient sound power, and the transfer characteristic from the microphone to the audibility correction filter is substantially the A characteristic; and
the estimation unit estimates the audio power and the ambient sound power from the audio signal and the microphone detection signal input through the audibility correction filter.

The speech intelligibility improvement method according to claim 3, wherein, when Y(ω) denotes the frequency-domain representation of the output of an A-characteristic filter connected to an omnidirectional microphone, and X(ω) denotes the frequency-domain representation of the output of the unidirectional microphone installed close to the omnidirectional microphone, the transfer characteristic H(ω) of the audibility correction filter is determined so that
H(ω) = Y(ω) / X(ω).

The speech intelligibility improvement method according to claim 3, wherein an A-characteristic filter is connected to an omnidirectional microphone and an adaptive filter is connected to a unidirectional microphone, the two microphones being installed close to each other; a predetermined sound wave is made incident on both microphones; learning is performed by updating the filter coefficients of the adaptive filter so that the error between the output of the A-characteristic filter and the output of the adaptive filter is minimized; and the transfer characteristic H(ω) of the audibility correction filter is set to the transfer characteristic of the adaptive filter after learning.
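
The learning procedure of this claim can be sketched with a normalized LMS (NLMS) update. The claim does not name a specific adaptation algorithm, so NLMS, the FIR structure, the tap count, and the step size mu are all assumptions made for illustration. Here x stands for the unidirectional microphone output and d for the output of the A-characteristic filter fed by the omnidirectional microphone.

```python
def nlms_learn(x, d, taps, mu=0.5, eps=1e-8):
    """Learn FIR coefficients w so that (w * x) tracks the reference d.

    x    : input to the adaptive filter (unidirectional microphone output)
    d    : reference signal (A-characteristic filter output)
    taps : number of adaptive FIR coefficients (assumed value, not from the claim)
    mu   : NLMS step size, 0 < mu < 2 (assumed value)
    eps  : small constant that keeps the normalization from dividing by zero
    """
    w = [0.0] * taps
    history = [0.0] * taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        history = [x_n] + history[:-1]
        y_n = sum(w_i * h_i for w_i, h_i in zip(w, history))
        error = d_n - y_n  # error between reference and adaptive filter output
        power = sum(h_i * h_i for h_i in history) + eps
        w = [w_i + mu * error * h_i / power for w_i, h_i in zip(w, history)]
    return w
```

After learning, the returned coefficients play the role of the transfer characteristic H(ω) that is copied into the audibility correction filter; converting them to an IIR structure, if desired, is a separate design step not shown here.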