JP5339849B2

JP5339849B2 - Speech intelligibility improving method and speech intelligibility improving system

Info

Publication number: JP5339849B2
Application number: JP2008275277A
Authority: JP
Inventors: 洋平薮田; 望齊藤; 徹丸本
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2008-10-27
Filing date: 2008-10-27
Publication date: 2013-11-13
Anticipated expiration: 2028-10-27
Also published as: JP2010102223A

Abstract

PROBLEM TO BE SOLVED: To provide a speech intelligibility improving method and a speech intelligibility improving system that increase the effect of speech intelligibility improvement processing by measuring noise power accurately even when there is a delay between the time the voice data RG is generated and the time the voice is detected by a microphone.

SOLUTION: A voice data sequence output by a voice data generation unit is multiplied by a predetermined gain and input to a speaker, and the audio signal detected by the microphone is captured in units of a predetermined length. The output start time and output end time of the voice data sequence are monitored, as are the capture start time and capture end time of each predetermined-length data sequence captured from the microphone. Using these times, it is determined whether the captured data sequence belongs to a noise section, and noise power is detected from data sequences of the noise section. Outside the noise section, voice power is detected from the voice data sequence, and the gain by which the voice data sequence is multiplied is determined using the noise power and the voice power.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a speech intelligibility improving method and a speech intelligibility improving system, and in particular to a method and system that treat a section in which no guidance voice signal is output as a noise section and control the gain of the guidance voice signal using the noise power in the noise section and the voice power in the non-noise section.

There are in-vehicle speech intelligibility improving systems that make speech output from a speaker (navigation guidance, read-aloud news and e-mail, and so on) clearly audible even in noise. For example, an in-vehicle navigation device outputs route guidance and similar speech from a speaker into the cabin, but when engine noise, road noise, and other noise are loud, such as while driving, masking makes the speaker output hard to hear. Such systems therefore apply loudness compensation to the voice data according to the power of the voice data to be output and the power of the noise, for example by raising the gain across the whole voice band, so that the speaker output remains clearly audible in noise.

FIG. 7 is a configuration diagram of a conventional speech intelligibility improving system (Patent Document 1). In the system of FIG. 7 (see Patent Document 1 for the detailed operation), an identification filter 71 simulates the guidance voice signal SG at the position of the microphone 72, and a subtractor 73 extracts the noise signal SN by subtracting that simulated signal from the output of the microphone 72. A loudness compensation gain calculation unit 74 calculates a gain Gopt from the guidance voice signal and the noise signal and inputs it to an RG correction unit (route guidance voice correction unit) 75. The identification processing of the identification filter 76 is performed using an adaptive filter 77. Its adaptive algorithm unit 78 can be implemented with various adaptive algorithms; the LMS algorithm is a typical one, but the filter coefficients may also be updated with, for example, the Fast-LMS algorithm (the LMS algorithm in the frequency domain).

However, when an estimation error occurs in the power of the voice signal, the error in the estimated noise power obtained by subtraction in the subtractor 73 has the opposite sign to the error in the estimated voice power, so the gap between the two estimates widens and the gain can no longer be determined correctly. In addition, the conventional system requires so much computation that an expensive DSP is needed.

A technique has therefore been proposed for systems that control the gain of a voice signal based on voice power and noise power. The technique detects whether the voice power is at or above a set level, that is, whether guidance voice is being output. While the voice power is below the set level (no guidance voice is being output), the noise power is measured and stored. While the voice power is above the set level (guidance voice is being output), the noise power is assumed to equal the stored value, and the gain of the voice signal is controlled based on the voice power and this estimated noise power.

FIG. 8 is a configuration diagram of the proposed speech intelligibility improving system.
The guidance voice generation unit 81 of the navigation device generates a guidance voice signal, for example, when the vehicle approaches an intersection. The sound driver 82 applies sound quality control and similar processing to the guidance voice signal, amplifies it, and outputs it. The RG correction unit 83 multiplies the voice signal output from the sound driver 82 by the gain g determined by the correction value calculation unit 84 (described later) to correct the volume and inputs the result to the DAC 85, which converts it to analog and inputs it to the speaker 86. The speaker 86 outputs the voice signal as sound. The microphone 87 detects the mixture of the guidance voice a and the ambient noise n (engine noise, road noise, and so on); the ADC 88 converts it to digital data, which is input to the power calculation unit 89b through the audibility correction filter 89a. The power calculation unit 89b computes the power by squaring the amplitude of the microphone detection signal and inputs it to the switching unit 89c.

In sections where no guide voice is being output, that is, when the power of the voice signal (voice power) is below the set value, the switching unit 89c passes the power calculated by the power calculation unit 89b to the noise power averaging unit 84b through fixed contact A. In sections where guide voice is being output, that is, when the voice power is above the set value, it routes the calculated power to contact B, so the power is not input to any unit. In sections where no guide voice is being output, the noise power averaging unit 84b treats the power output by the power calculation unit 89b as noise power, computes the moving average of the latest N (N: constant) power values, and stores that moving average in the power storage unit 84c as the noise power. As a result, when guide voice is output, the power storage unit 84c holds the latest noise power from the immediately preceding section without guide voice. The noise power during guide voice output is assumed to equal the stored value, which is input to the loudness compensation gain calculation unit 84a.
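The hold behaviour of the switching unit 89c, the noise power averaging unit 84b, and the power storage unit 84c can be sketched roughly as follows. This is an illustration only: the class name `NoisePowerHold`, the window length, and the sample values are assumptions and do not come from the cited proposal.

```python
from collections import deque


class NoisePowerHold:
    """Moving average of the latest N power values, frozen while guidance plays."""

    def __init__(self, n: int) -> None:
        self._window = deque(maxlen=n)  # latest N microphone power values
        self.stored = 0.0               # plays the role of power storage unit 84c

    def update(self, mic_power: float, guide_active: bool) -> float:
        # Contact A: no guide voice, so the microphone power counts as noise power.
        if not guide_active:
            self._window.append(mic_power)
            self.stored = sum(self._window) / len(self._window)
        # Contact B: guide voice is playing, so the value is dropped and the
        # previously stored noise power is reused unchanged.
        return self.stored


hold = NoisePowerHold(n=3)
for p in (1.0, 2.0, 3.0):
    hold.update(p, guide_active=False)
held = hold.update(100.0, guide_active=True)  # ignored because guidance is playing
```

The stored value only tracks the noise as long as the "guide active" decision is aligned in time with what the microphone actually hears, which is the weakness discussed later in the description.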

In parallel with the above, the voice signal output from the guidance voice generation unit 81 is input to the voice power calculation unit 89f through the audibility correction filter 89e. The voice power calculation unit 89f computes the voice power by squaring the amplitude of the input voice signal and inputs it to the determination unit 89g and the voice power averaging unit 89h. The determination unit 89g compares the voice power with the set level; it judges the section to be one without guide voice when the voice power is below the set level and one with guide voice when the voice power is above it. In sections without guide voice, the determination unit 89g controls the switching unit 89c so that the power calculated by the power calculation unit 89b is input to the noise power averaging unit 84b; in sections with guide voice, the power is not input to any unit.
The voice power averaging unit 89h computes the average of M (M: constant) voice power values output from the voice power calculation unit 89f and inputs it to the variable gain unit 84d, which multiplies the average voice power by the set gain G and inputs the result to the loudness compensation gain calculation unit 84a. The gain G is obtained separately and set by the characteristic identification unit 89i, on the assumption that the propagation characteristic from the input terminal of the speaker 86 to the microphone output terminal can be approximated by a gain alone. In sections where guide voice is being output, the loudness compensation gain calculation unit 84a determines, from the voice power input from the variable gain unit 84d and the noise power input from the power storage unit 84c and according to human loudness characteristics, a gain g at which the guidance voice is clearly audible regardless of the noise level, and inputs it to the RG correction unit 83. The RG correction unit 83 multiplies the guidance voice signal by the gain g and outputs the result. The loudness compensation gain calculation unit 84a does not perform gain determination in sections where no guide voice is being output.
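The voice-side calculations described above (squared-amplitude power, comparison with the set level, averaging over M values, and scaling by the path gain G) might be sketched as below. The function names are hypothetical, averaging the squares over one frame is an assumption, and the text does not say how a voice power exactly equal to the set level is handled; the sketch treats it as "no guide voice".

```python
from typing import Sequence


def frame_power(samples: Sequence[float]) -> float:
    """Power of one frame, taken here as the mean of the squared amplitudes."""
    return sum(s * s for s in samples) / len(samples)


def guide_is_active(voice_power: float, set_level: float) -> bool:
    """Decision of the determination unit 89g: guide voice is present above the set level."""
    return voice_power > set_level


def voice_power_at_mic(voice_powers: Sequence[float], path_gain: float) -> float:
    """Average of M voice powers (unit 89h) scaled by the path gain G (unit 84d)."""
    return path_gain * sum(voice_powers) / len(voice_powers)


p = frame_power([1.0, -1.0, 2.0, -2.0])              # (1 + 1 + 4 + 4) / 4
active = guide_is_active(p, set_level=1.0)
scaled = voice_power_at_mic([2.0, 4.0], path_gain=0.5)
```

The rule that turns the voice power and noise power into the gain g is based on human loudness characteristics and is not detailed in this section, so it is deliberately left out of the sketch.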

FIG. 9 shows an example in which the speech intelligibility improving system of FIG. 8 is implemented with a multi-process general-purpose CPU 91 and a DSP (Digital Signal Processor) 80; parts identical to those in FIG. 8 carry the same reference numerals. The DSP 80 performs the functions of the RG correction unit 83, the correction value calculation unit 84, and the noise separation unit 89 of FIG. 8. The general-purpose CPU 91 performs voice reproduction processing (VOICE application) 91d, which includes the guidance voice data creation of the guidance voice generation unit 81 of FIG. 8 and the handover of that data to the sound driver 82. In addition to voice reproduction, the general-purpose CPU 91 runs multiple applications such as navigation processing 91a, in-vehicle audio processing 91b, and car phone processing 91c, executing higher-priority processing first.

In the voice reproduction processing, the RG generation unit 81b reads encoded guidance voice data from the guidance voice data storage unit 81a, decodes it, and inputs it to the RG playback unit 81c. The RG playback unit 81c temporarily stores the voice data and inputs the voice data RG to the sound driver 82 as appropriate. The sound driver 82 applies predetermined processing to the voice data and inputs it to the RG correction unit 83, which multiplies the voice data RG by the correction value (gain) calculated by the correction value calculation unit 84 to correct the volume; the corrected voice signal RG' is converted to an analog signal and input to the speaker 86. The speaker 86 outputs the voice signal RG', and the microphone 87 detects that voice together with the ambient noise and inputs the detection data (MIC data) to the noise separation unit 89 through the ADC 88. The noise separation unit 89 calculates the guidance voice power level and the noise power level from the MIC data and the guidance voice data RG and inputs them to the correction value calculation unit 84. The correction value calculation unit 84 calculates a gain from these levels and inputs it to the RG correction unit 83, which multiplies the voice data by the correction value and outputs the result. This keeps the gain of the voice signal from carrying a large error and greatly reduces the amount of computation.
It has also been proposed to perform the speech intelligibility improvement processing on the multi-process general-purpose CPU 91 alone, without a DSP.
Patent Document 1: Japanese Patent Laid-Open No. 11-166835

In the proposed technique, however, a multi-process general-purpose CPU that runs several applications assigns a priority to each of them, so the processing of the speech intelligibility improving system is not always running. A time delay therefore arises between the generation of voice data by the RG generation unit and the moment the RG voice output from the speaker is detected by the microphone and input to the noise separation unit. This is explained with reference to FIG. 10. FIG. 10 shows the case where the general-purpose CPU 91 executes the processing of the DSP 80 of FIG. 9, with the GAE unit (correction unit 83, correction value calculation unit 84, noise separation unit 89), the sound driver 82, the RG playback unit 81c, and other parts rearranged accordingly. The processing of the RG playback unit 81c on the general-purpose CPU 91 has a low priority, so voice data input from the RG correction unit 83 cannot always be output to the sound driver 82 immediately; it waits in the built-in buffer, causing a delay, and delays arise in other parts as well.
As a result, the sound driver 82 inputs the microphone detection data to the noise separation unit 89 a considerable time after the RG generation unit 81b has input the voice data RG to the noise separation unit 89. With such a delay, the noise power cannot be measured accurately and the effect of the speech intelligibility improvement processing is reduced. FIG. 11 is a time chart illustrating why the noise power cannot be measured accurately. RGtime-S is the output start time of the voice data RG, RGtime-E is its output end time, MICtime-S is the time at which the microphone starts capturing the audio signal, and MICtime-E is the time at which it finishes. The periods before RGtime-S and after RGtime-E are noise sections, and the period from RGtime-S to RGtime-E is a non-noise section. In the first noise section, no voice is output during period A, so the noise power is measured without error. In the later noise section, however, voice is still being output during period B, so that voice is detected as noise and the noise power cannot be measured accurately.
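The effect of the delay can be illustrated with a small numeric sketch. All times and the 0.3 s delay are made-up values chosen for illustration, and `overlaps` is a hypothetical helper; the point is only that a microphone block judged against the generation-side times can look like noise while the delayed playback is still audible, which corresponds to period B above.

```python
def overlaps(a_start: float, a_end: float, b_start: float, b_end: float) -> bool:
    """True when the two time intervals share a stretch of non-zero length."""
    return a_start < b_end and b_start < a_end


delay = 0.3                                  # assumed buffering delay in seconds
rg_generated = (1.0, 2.0)                    # when RG data reached the noise separation unit
rg_played = (rg_generated[0] + delay, rg_generated[1] + delay)  # when it is actually audible
mic_block = (2.0, 2.25)                      # one captured microphone block

# Judged against the generation-side times, the block looks like pure noise...
naive_says_noise = not overlaps(*mic_block, *rg_generated)
# ...but the delayed guidance is still playing while the block is captured.
block_contains_voice = overlaps(*mic_block, *rg_played)
```

Both flags come out true for these values, so the block would be averaged into the noise power even though it contains guidance voice.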

In view of the above, an object of the present invention is to enable noise power to be measured accurately even when there is a delay between the generation of the voice data RG and the detection of the voice by the microphone.
Another object of the present invention is to improve the effect of the speech intelligibility improvement processing.

The present invention is a speech intelligibility improving method and a speech intelligibility improving system.
- Speech intelligibility improving method
The speech intelligibility improving method of the present invention, executed as one process of a multi-process system, comprises: a first step of multiplying the voice data sequence output by a voice data generation unit by a predetermined gain and outputting it toward the speaker, while converting the audio signal detected by a microphone into digital data and capturing it in units of a predetermined length; a second step of monitoring the output start time and output end time of the voice data sequence toward the speaker, and the capture start time and capture end time of each predetermined-length data sequence captured from the microphone; a third step of determining, using these times, whether the predetermined-length data sequence captured from the microphone is a data sequence of a noise section; and a fourth step of detecting noise power using the data sequence of the noise section. The method further comprises a fifth step of detecting, outside the noise section, voice power based on the voice data sequence output by the voice data generation unit, and a sixth step of determining the gain by which the voice data sequence is multiplied, using the noise power and the voice power.
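Steps 3 to 6 of the method can be sketched for a single captured block as follows. This is a rough illustration under stated assumptions: `GaeState`, `process_block`, and `placeholder_rule` are hypothetical names, power is taken as the mean squared amplitude, and the gain rule is passed in as a parameter because this section does not define it. The placeholder rule shown is not the loudness-based rule of the invention.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GaeState:
    noise_power: float = 0.0
    voice_power: float = 0.0
    gain: float = 1.0


def mean_square(x: Sequence[float]) -> float:
    return sum(v * v for v in x) / len(x)


def process_block(state: GaeState, mic_block: Sequence[float], rg_block: Sequence[float],
                  mic_s: float, mic_e: float, rg_s: float, rg_e: float,
                  gain_rule: Callable[[float, float], float]) -> float:
    if mic_e < rg_s or rg_e < mic_s:                      # step 3: noise section?
        state.noise_power = mean_square(mic_block)        # step 4: noise power
    else:
        state.voice_power = mean_square(rg_block)         # step 5: voice power
        state.gain = gain_rule(state.noise_power, state.voice_power)  # step 6: gain
    return state.gain


def placeholder_rule(noise: float, voice: float) -> float:
    """Stand-in rule (assumption): never attenuate, boost when noise exceeds voice."""
    return max(1.0, (noise / voice) ** 0.5) if voice > 0 else 1.0


state = GaeState()
g1 = process_block(state, [2.0, -2.0], [0.0, 0.0], 0.0, 0.5, 1.0, 2.0, placeholder_rule)
g2 = process_block(state, [3.0, -3.0], [1.0, -1.0], 1.0, 1.5, 1.0, 2.0, placeholder_rule)
```

The first block ends before guidance starts, so it only updates the noise power; the second overlaps the guidance, so it updates the voice power and the gain.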

- Speech intelligibility improving system
The speech intelligibility improving system of the present invention comprises: a voice data generation unit that generates a voice data sequence; a voice signal output unit that receives the voice data sequence, converts the voice data into analog data, and outputs it to a speaker; a voice data capture unit that converts the audio signal detected by a microphone into digital data and captures it in units of a predetermined length; a time monitoring unit that monitors the input start time and input end time of the voice data sequence to the voice signal output unit, and the capture start time and capture end time of each predetermined-length data sequence in the voice data capture unit; a noise section determination unit that determines, using these times, whether the predetermined-length data sequence captured by the voice data capture unit is a data sequence of a noise section; a noise power detection unit that detects noise power using the noise-section data sequence captured by the voice data capture unit; a correction value calculation unit that calculates, using the noise power, the gain by which the output voice data sequence is multiplied; and a correction unit that multiplies the voice data sequence generated by the voice data generation unit by the calculated gain and inputs the result to the voice signal output unit. The system further comprises a voice power detection unit that detects, outside the noise section, voice power based on the voice data sequence output by the voice data generation unit, and the correction value calculation unit compares the noise power with the voice power to determine the gain by which the voice data sequence is multiplied.

In another speech intelligibility improving system of the present invention, the processing of the voice data generation unit, the time monitoring unit, the noise section determination unit, the noise power detection unit, the correction value calculation unit, and the correction unit is implemented by a CPU as one process of a multi-process system.

In yet another speech intelligibility improving system of the present invention, the processing of the voice data generation unit and of a voice data output unit that outputs the input voice data sequence to another CPU is implemented by one CPU as one process of a multi-process system, and the processing of the time monitoring unit, the noise section determination unit, the noise power detection unit, and the voice signal output unit is implemented by the other CPU as one process of a multi-process system.

According to the present invention, the input start time and input end time of the voice data sequence toward the speaker and the capture start time and capture end time of each predetermined-length data sequence captured from the microphone are monitored, and these times are used to determine whether the captured data sequence is a data sequence of a noise section. Voice is therefore no longer detected as noise, and the noise power can be measured accurately.

Furthermore, according to the present invention, the gain by which the voice data sequence input to the speaker is multiplied is determined using the noise power of a correctly identified noise section, so the speech intelligibility improvement processing can be performed accurately and the speaker output can be heard clearly even in noise.

- Outline of the present invention
FIG. 1 is a schematic explanatory diagram of the present invention, and FIG. 2 is an explanatory diagram showing whether microphone detection data is data of a noise section. In FIG. 1, 10 is a general-purpose CPU, 11 is an RG data storage unit, 12 is an RG generation unit, 13 is a GAE (Guidance Articulation Enhancement, i.e. speech intelligibility improvement) unit, 14 is an RG playback unit, 15 is a sound driver, 16 is a D/A converter, 17 is a speaker, 18 is a microphone, and 19 is an A/D converter. RG is the voice data sequence stored in the RG data storage unit, RG' is the voice data sequence multiplied by the correction value obtained by the GAE unit 13, and MIC is the signal captured by the microphone 18, containing the voice and the ambient noise.

The RG generation unit 12 reads the voice data sequence RG from the RG data storage unit 11 and inputs it to the noise separation unit 13b and the RG correction unit 13d. The RG correction unit 13d multiplies the voice data sequence RG by the correction value calculated by the correction value calculation unit 13c and inputs the volume-corrected voice data sequence RG' to the RG playback unit 14. The RG playback unit 14 temporarily holds the voice data sequence RG'; when processing time for data handover is allocated to it, it inputs RG' to the sound driver 15 and reports that time to the noise section determination unit 13a as the input start time RGtime-S (see FIG. 2).
The sound driver 15 inputs the voice data sequence RG' to the D/A converter 16. When the voice data sequence runs out and the input ends, the sound driver 15 reports the input end time RGtime-E (see FIG. 2) to the noise section determination unit 13a through the RG playback unit 14. The D/A converter 16 converts RG' to analog and inputs it to the speaker 17, which outputs the corresponding sound. The microphone 18 picks up the MIC data (the sound of the output voice data sequence RG' plus the ambient noise), which the A/D converter 19 digitizes and inputs to the sound driver 15. The sound driver 15 stores the MIC data in a buffer (storage unit, not shown) of a predetermined capacity, takes the time at which the buffer becomes full as the capture end time MICtime-E (see FIG. 2), and inputs MICtime-E to the noise section determination unit 13a. The sound driver 15 also inputs the buffered MIC data of the predetermined capacity to the noise section determination unit 13a and the noise separation unit 13b. The noise section determination unit 13a obtains the capture start time MICtime-S (see FIG. 2) by subtracting the time MICbuftime, which corresponds to the buffer capacity, from the capture end time. To determine whether the captured MIC data of the predetermined size (for example, S3 in FIG. 2) is data of a noise section (noise), the noise section determination unit 13a checks whether the capture end time MICtime-E is earlier than the input start time RGtime-S, or the input end time RGtime-E is earlier than the capture start time MICtime-S. If the answer is YES, the MIC data is judged to be noise-section data, and the noise separation unit 13b calculates the noise power from the data sequence MIC.
The correction value calculation unit 13c then calculates the correction value by which the voice data sequence RG is multiplied, using this noise power and the power of the voice data sequence. The RG correction unit 13d multiplies the voice data sequence RG by the correction value and inputs the volume-corrected voice data sequence RG' to the speaker 17 through the RG playback unit 14, the sound driver 15, and the D/A converter 16. If the answer is NO, the noise section determination unit 13a judges that the MIC data is not noise-section data, and the MIC data is not used to calculate noise power.
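The determination made by the noise section determination unit 13a reduces to two time comparisons, which can be sketched as below. The function name and the numeric times are assumptions for illustration; the sketch also assumes that both RGtime-S and RGtime-E are already known when a block is judged.

```python
def is_noise_section(mic_time_e: float, mic_buf_time: float,
                     rg_time_s: float, rg_time_e: float) -> bool:
    """True when the captured MIC block lies entirely outside the guidance playback."""
    # MICtime-S is derived from the capture end time and the buffer duration.
    mic_time_s = mic_time_e - mic_buf_time
    # Block ended before playback started, or playback ended before the block started.
    return mic_time_e < rg_time_s or rg_time_e < mic_time_s


# Guidance plays from t = 1.2 s to t = 3.0 s; each MIC block covers 0.5 s.
before = is_noise_section(1.0, 0.5, 1.2, 3.0)     # block 0.5..1.0 s
during = is_noise_section(2.0, 0.5, 1.2, 3.0)     # block 1.5..2.0 s
straddle = is_noise_section(3.2, 0.5, 1.2, 3.0)   # block 2.7..3.2 s
after = is_noise_section(3.6, 0.5, 1.2, 3.0)      # block 3.1..3.6 s
```

Only the first and last blocks are treated as noise. A block that straddles the end of playback is rejected as well, so no guidance voice leaks into the noise power.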

As described above, the noise power can be measured accurately even if there is a delay between the generation of the audio data string and the detection of the audio by the microphone, so the effect of the speech intelligibility improvement processing is improved.

Embodiment
FIG. 3 is a configuration diagram of the speech intelligibility improvement system according to the first embodiment of the present invention.
In normal operation, the RG generation unit 12 of the navigation device reads an audio data string from an RG data storage unit (not shown), for example when an intersection is approached, and generates the audio data string RG of the guidance voice. The RG correction unit 13d multiplies the input audio data string RG by the correction value g calculated by the loudness compensation gain calculation unit 21 (described later) and inputs the volume-corrected audio data string RG' to the RG playback unit 14. The RG playback unit 14 stores the input audio data string in a built-in buffer and, when the CPU permits transfer to the sound driver, reads the audio data string RG' from the buffer in FIFO (first in, first out) order and inputs it to the sound driver 15a. The RG playback unit 14 also measures the time at which input to the sound driver 15a started (input start time RGtime-S) and notifies the noise section determination unit 13a.

The sound driver 15a inputs the audio data string RG' received from the RG playback unit 14 to the D/A converter 16. When the entire audio data string has been input to the D/A converter 16, it measures that time (input end time RGtime-E) and notifies the noise section determination unit 13a via the RG playback unit 14. The D/A converter 16 converts the input audio data string RG' into analog data and inputs it to the speaker 17. The microphone 18 picks up the audio output from the speaker together with ambient noise and inputs the resulting signal to the A/D converter 19. The A/D converter 19 converts the input signal into digital data and inputs it to the sound driver 15b as MIC data. The sound driver 15b stores the input MIC data in a built-in buffer 15c of a predetermined capacity. When the buffer 15c becomes full, it inputs the stored data of the predetermined size to the GAE unit 13, measures the time of that input (capture end time MICtime-E), and inputs the time to the noise section determination unit 13a. The sound driver 15b then starts storing the next MIC data in the buffer 15c, and each time the buffer becomes full it inputs the stored data to the GAE unit 13 and the capture end time MICtime-E to the noise section determination unit 13a.

The noise section determination unit 13a uses the capture end time MICtime-E and the time corresponding to the capacity of the buffer 15c (MICbuftime) to calculate the time at which capture of the MIC data started (capture start time MICtime-S = MICtime-E − MICbuftime). Using the times RGtime-S, RGtime-E, MICtime-S, and MICtime-E, it then determines whether the audio data taken from the buffer is noise section data (see FIG. 2). Specifically, the noise section determination unit 13a regards the period from RGtime-E to RGtime-S as a noise section and the period from RGtime-S to RGtime-E as a non-noise section. It checks whether the MIC data capture end time MICtime-E is older than the input start time RGtime-S, or the input end time RGtime-E is older than the MIC data capture start time MICtime-S. If "YES", the MIC data is judged to be noise section data; if "NO", it is judged to be non-noise section data. A switching signal indicating noise section or non-noise section is input to the switching unit 22. Of the MIC data S0 to S8 in FIG. 2, S2 to S4 are noise section data. The switching unit 22 normally rests on contact A and is switched to contact B by the non-noise section signal from the noise section determination unit 13a. The sound driver 15b inputs the MIC data (noise signal) to the power calculation unit 24 via the fixed contact A and the audibility correction filter 23, and the power calculation unit 24 squares the amplitude of the input MIC data to calculate the noise power.
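The determination described above amounts to an interval non-overlap test, which can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Timestamps` and `is_noise_section` are hypothetical, times are assumed to be seconds on a common clock, and RGtime-S and RGtime-E are assumed to belong to the same guidance output.

```python
from dataclasses import dataclass


@dataclass
class Timestamps:
    rg_time_s: float    # RGtime-S: input start time of the guidance audio
    rg_time_e: float    # RGtime-E: input end time of the guidance audio
    mic_time_e: float   # MICtime-E: time at which the MIC buffer became full
    mic_buftime: float  # MICbuftime: duration represented by one full buffer


def is_noise_section(t: Timestamps) -> bool:
    """Return True if the MIC buffer lies entirely outside the guidance output."""
    # MICtime-S = MICtime-E - MICbuftime
    mic_time_s = t.mic_time_e - t.mic_buftime
    # "Older" means earlier, i.e. a smaller timestamp.
    ends_before_guidance = t.mic_time_e < t.rg_time_s
    starts_after_guidance = t.rg_time_e < mic_time_s
    return ends_before_guidance or starts_after_guidance
```

A buffer that straddles either edge of the guidance output fails both comparisons and is therefore excluded from the noise measurement, which corresponds to the treatment of S1 and S5 in FIG. 2.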

The noise power averaging unit 25 obtains, during the noise section, the moving average of the latest N (N: constant) power values output from the power calculation unit 24 and stores it in the power storage unit 26 as the noise power. As a result, when the audio signal is output, the latest noise power from the immediately preceding noise section is held in the power storage unit 26. In the present invention, the noise power in the non-noise section is regarded as equal to the noise power held in the power storage unit 26, and that stored value is input to the loudness compensation gain calculation unit 21.
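The power calculation and the moving average over the latest N values can be sketched as below. The names `block_power` and `NoisePowerAverager` are hypothetical. Two details are assumptions not stated in the text: one power value per buffer is taken as the mean of the squared samples, and the average is taken over fewer than N values until N values are available.

```python
from collections import deque
from typing import Optional, Sequence


def block_power(samples: Sequence[float]) -> float:
    """Mean of squared amplitudes for one buffer of MIC data (assumed definition)."""
    return sum(x * x for x in samples) / len(samples)


class NoisePowerAverager:
    """Keeps the moving average of the latest N noise power values."""

    def __init__(self, n: int) -> None:
        self._window = deque(maxlen=n)  # the oldest value drops out automatically
        self.stored_power: Optional[float] = None  # plays the role of storage unit 26

    def update(self, power: float) -> float:
        """Call only for buffers judged to be noise section data."""
        self._window.append(power)
        self.stored_power = sum(self._window) / len(self._window)
        return self.stored_power
```

Because `update` is not called during a non-noise section, `stored_power` keeps the value from the immediately preceding noise section, matching the behaviour described for the power storage unit 26.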

In parallel with the above, the audio data string RG output from the RG generation unit 12 is input to the audio power calculation unit 28 via the audibility correction filter 27. The audio power calculation unit 28 squares the amplitude of the input audio data string RG to calculate the audio power and inputs it to the audio power averaging unit 29, which computes the average of M (M: constant) audio power values and inputs the result to the variable gain unit 31. The variable gain unit 31 multiplies the average audio power by the gain G and outputs the product. The gain G is identified and set in advance by the characteristic identification unit 30, on the assumption that the propagation characteristic from the input terminal of the speaker 17 to the microphone output terminal can be approximated by a gain alone.
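The audio power path can be sketched in a few lines. The function name `estimated_voice_power_at_mic` is hypothetical, the audibility correction filter 27 is omitted, and the value of G is assumed to have been identified beforehand; how the characteristic identification unit 30 obtains it is not described here.

```python
from typing import Sequence


def estimated_voice_power_at_mic(audio_powers: Sequence[float], gain_g: float) -> float:
    """Average M audio power values and scale by the pre-identified gain G.

    gain_g stands in for the speaker-to-microphone propagation characteristic,
    which the text approximates by a single gain.
    """
    average_power = sum(audio_powers) / len(audio_powers)
    return gain_g * average_power
```

Scaling by G lets the audio power computed from the source data be compared with the noise power measured at the microphone on the same footing.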

In the non-noise section, the loudness compensation gain calculation unit 21 determines, from the audio power input from the variable gain unit 31 and the noise power input from the power storage unit 26, and in accordance with human loudness characteristics, a gain g at which the audio signal is heard clearly regardless of the noise level, and inputs it to the RG correction unit 13d. The RG correction unit 13d multiplies the audio data string RG by the gain g and outputs the result. The loudness compensation gain calculation unit 21 does not perform gain determination in the noise section.
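The text does not give the loudness characteristic used to derive g, so the following is only a stand-in to show the data flow: it picks the amplitude gain that brings the audio-to-noise power ratio up to a target value. The rule itself, the function name `compensation_gain`, and the parameters `target_ratio` and `g_max` are all assumptions for illustration.

```python
import math


def compensation_gain(voice_power: float,
                      noise_power: float,
                      target_ratio: float = 1.0,
                      g_max: float = 4.0) -> float:
    """Hypothetical rule: amplitude gain g such that g^2 * voice / noise >= target.

    This is NOT the loudness-based rule of the source text, only a placeholder.
    """
    if voice_power <= 0.0:
        return 1.0  # nothing is being played, leave the gain unchanged
    # The gain is applied to amplitude, so power scales with g squared.
    g = math.sqrt(target_ratio * noise_power / voice_power)
    # Never attenuate below unity and never exceed the upper limit.
    return min(max(g, 1.0), g_max)
```

The clamp to at least 1.0 means the guidance voice is only ever raised, and the upper limit keeps the output bounded in very loud environments.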

As described above, according to the present invention, the MIC data S1 and S5 shown in FIG. 2, which contain the guidance voice, are not treated as noise section data, so the noise power in the noise section can be measured and stored accurately.

FIG. 4 shows the processing flow of the RG playback unit 14 and the sound driver 15a, and FIG. 5 shows the processing flow of the noise section determination unit 13a, the correction value calculation unit 13c, and the sound driver 15b. The noise section determination processing of the speech intelligibility improvement system is described below along these flows. It is assumed that the RG generation unit 12 has read the audio data string RG from the RG data storage unit 11 and input it to the RG correction unit 13d, and that the RG correction unit 13d has multiplied it by the gain g calculated by the correction value calculation unit 13c and input the resulting audio data string RG' to the RG playback unit 14.

The RG playback unit 14 measures the input start time RGtime-S at which, with permission from the CPU, it starts inputting the audio data string RG' to the sound driver 15a (step S401), and inputs RGtime-S to the noise section determination unit 13a (step S402).

Next, the RG playback unit 14 passes the audio data string RG' to the sound driver 15a (step S403). The sound driver 15a applies predetermined processing to the received audio data string RG' and inputs it to the D/A converter 16, which converts the digital audio data string RG' into an analog signal and inputs it to the speaker 17; the speaker 17 outputs the audio signal (step S404).

When output of the entire input audio data string is finished, the sound driver 15a measures that time as the input end time RGtime-E (step S405) and inputs RGtime-E to the RG playback unit 14 (step S406), which notifies the noise section determination unit 13a (step S407). With these times, the noise section determination unit 13a regards the period from RGtime-E to RGtime-S as a noise section and the period from RGtime-S to RGtime-E as a non-noise section, and determines whether the MIC data is noise section data according to the processing flow of FIG. 5. That is, it checks whether the MIC data capture end time MICtime-E is older than the input start time RGtime-S, or the input end time RGtime-E is older than the MIC data capture start time MICtime-S; if "YES", the MIC data is judged to be noise section data, and if "NO", non-noise section data.
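Steps S401 to S407 can be sketched as a single function that brackets the playback with two clock readings. The names `GuidanceTimes` and `play_guidance`, the injectable `clock`, and the `write_to_dac` callback are hypothetical. As a simplification, both times are returned together at the end, whereas the flow above reports RGtime-S before playback begins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class GuidanceTimes:
    rg_time_s: float  # RGtime-S: time at which input to the sound driver started
    rg_time_e: float  # RGtime-E: time at which the whole data string had been output


def play_guidance(samples: Iterable[float],
                  clock: Callable[[], float],
                  write_to_dac: Callable[[float], None]) -> GuidanceTimes:
    rg_time_s = clock()        # S401: measure the input start time
    for sample in samples:     # S403, S404: hand the data on towards the D/A converter
        write_to_dac(sample)
    rg_time_e = clock()        # S405: measure the input end time
    # S402, S406, S407: the two times are reported to the noise section determination
    return GuidanceTimes(rg_time_s, rg_time_e)
```

In a real system the clock would need to be the same one used for MICtime-E, for example a monotonic system clock, so that the comparisons between playback and capture times are meaningful.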

The processing that determines whether the MIC data captured from the microphone 18 is noise section data is described below with reference to FIG. 5.

The sound driver 15b stores the MIC data detected by the microphone 18 in the buffer 15c in order (step S501) and checks whether the buffer 15c is full (step S502). If it is full, processing proceeds to step S503; if not, steps S501 and S502 are repeated.

When it determines in step S502 that the buffer 15c is full, the sound driver 15b measures the time at which it became full (capture end time MICtime-E) (step S503), inputs MICtime-E to the noise section determination unit 13a, and inputs the MIC data stored in the buffer 15c to the GAE unit 13 (step S504).
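The capture loop of steps S501 to S504 can be sketched as a generator that emits one block together with its capture end time each time the buffer fills. The name `capture_blocks` and the injectable `clock` are hypothetical, and the buffer capacity is counted in samples as an assumption.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def capture_blocks(sample_source: Iterable[float],
                   capacity: int,
                   clock: Callable[[], float]) -> Iterator[Tuple[List[float], float]]:
    """Yield (MIC data block, MICtime-E) each time the buffer becomes full."""
    buffer: List[float] = []
    for sample in sample_source:
        buffer.append(sample)                # S501: store MIC data in order
        if len(buffer) == capacity:          # S502: is the buffer full?
            mic_time_e = clock()             # S503: measure the capture end time
            yield list(buffer), mic_time_e   # S504: pass the data and the time on
            buffer.clear()                   # start storing the next block
```

Samples left in a partially filled buffer when the source ends are not emitted, since the flow only acts on a full buffer.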

Next, the noise section determination unit 13a subtracts the time MICbuftime corresponding to the capacity of the buffer 15c from MICtime-E to calculate the capture start time MICtime-S (step S505), and uses the times RGtime-S, RGtime-E, MICtime-S, and MICtime-E to determine whether the audio data taken from the buffer 15c is noise section data (step S506). That is, in step S506 the noise section determination unit 13a checks whether the capture end time MICtime-E is older than the input start time RGtime-S, or the input end time RGtime-E is older than the capture start time MICtime-S. If "YES", processing proceeds to step S507; if "NO", to step S510.

If "YES" in step S506, the MIC data taken from the buffer 15c is regarded as noise section data, and the noise separation unit 13b calculates the noise power from the MIC data and inputs it to the correction value calculation unit 13c (step S507).

The correction value calculation unit 13c then uses the input noise power to calculate the correction value g by which the audio data string RG is multiplied and inputs it to the RG correction unit 13d (step S508). The RG correction unit 13d multiplies the audio data string RG by the input correction value g to correct the volume (step S509). The above processing is then repeated to update the correction value and determine the noise section.

If "NO" in step S506, the audio data taken from the buffer is judged not to be noise section data, and no noise power is calculated (step S510). The above processing is then repeated to update the correction value and determine the noise section. In this embodiment, the MIC data capture start time MICtime-S is calculated by subtracting the time corresponding to the buffer capacity (MICbuftime) from the capture end time MICtime-E, but the invention is not limited to this. For example, the time at which capture of the MIC data ended (capture end time MICtime-E) may instead be calculated by adding MICbuftime to the capture start time MICtime-S (MICtime-E = MICtime-S + MICbuftime).
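The two ways of obtaining the capture interval are mirror images of each other, as the short sketch below shows. The text only says that MICbuftime corresponds to the buffer capacity; deriving it as the number of samples divided by the sampling rate is an assumption, as are the function names.

```python
def mic_buftime(capacity_samples: int, sample_rate_hz: float) -> float:
    """Duration in seconds covered by one full buffer (assumed relationship)."""
    return capacity_samples / sample_rate_hz


def capture_start_from_end(mic_time_e: float, buftime: float) -> float:
    """MICtime-S = MICtime-E - MICbuftime (the method of this embodiment)."""
    return mic_time_e - buftime


def capture_end_from_start(mic_time_s: float, buftime: float) -> float:
    """MICtime-E = MICtime-S + MICbuftime (the alternative)."""
    return mic_time_s + buftime
```

Either way only one timestamp per buffer has to be measured; the other follows from the fixed buffer duration.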

As described above, according to this embodiment, the input start and end times of the audio data string and the capture start and end times of the predetermined-length audio data string captured from the microphone are monitored, and these times are used to determine whether the captured audio data string belongs to a noise section. The guidance voice is therefore not detected as noise (for example, the MIC data S1 and S5 in FIG. 2 are not detected as noise), and the noise power can be measured accurately.

Furthermore, according to this embodiment, the gain by which the audio data string input to the speaker is multiplied is determined using noise power taken correctly from the noise section, so the speech intelligibility improvement processing is performed accurately and the speaker output can be heard clearly even in noise.

Modified Example
FIG. 6 is a configuration diagram of a modified example of the present invention; parts identical to those in FIG. 1 are denoted by the same reference numerals. The difference is that the speech intelligibility improvement processing performed by the single general-purpose CPU is instead performed by a separate general-purpose CPU.

The separate general-purpose CPU 40 comprises the noise section determination unit 13a, the noise separation unit 13b, the correction value calculation unit 13c, and the RG correction unit 13d. Units 13a to 13d perform the same processing as in the GAE unit 13 of the above embodiment to carry out the speech intelligibility improvement processing.

As described above, this modified example provides the same effects as the above embodiment: the voice is no longer detected as noise, the noise power can be measured accurately, the speech intelligibility improvement processing is performed accurately, and the speaker output can be heard clearly even in noise.
In addition, because this modified example uses a first general-purpose CPU for application processing and a second general-purpose CPU for GAE processing, the two CPUs can be separated, and the application and the GAE become detachable from each other.

FIG. 1 is a schematic explanatory diagram of the present invention.
FIG. 2 is an explanatory diagram showing whether microphone detection data is audio data of a noise section.
FIG. 3 is a configuration diagram of the speech intelligibility improvement system according to the first embodiment of the present invention.
FIG. 4 is a processing flow of the RG playback unit 14 and the sound driver 15a.
FIG. 5 is a processing flow of the noise section determination unit 13a, the correction value calculation unit 13c, and the sound driver 15b.
FIG. 6 is a configuration diagram of a modified example of the present invention.
FIG. 7 shows a conventional speech intelligibility improvement system.
FIG. 8 is a configuration diagram of a proposed speech intelligibility improvement system.
FIG. 9 is an example in which the speech intelligibility improvement system of FIG. 8 is realized with a multi-process general-purpose CPU and a DSP (Digital Signal Processor).
FIG. 10 is a diagram explaining the cause of delay on a general-purpose CPU.
FIG. 11 is a time chart explaining why the noise power cannot be measured accurately.

Explanation of symbols

10 General-purpose CPU
10a to 10d Applications processed on the general-purpose CPU
11 RG data storage unit
12 RG generation unit
13 Speech intelligibility improvement system
13a Noise section determination unit
13b Noise separation unit
13c Correction value calculation unit
13d RG correction unit
14 RG playback unit
15 Sound driver
15c Buffer
23 Audibility correction filter
24 Power calculation unit
25 Noise power averaging unit
26 Power storage unit
27 Audibility correction filter
28 Audio power calculation unit
29 Audio power averaging unit
30 Characteristic identification unit
31 Variable gain unit
40 Separate general-purpose CPU

Claims

In a speech intelligibility improving method that treats a section in which no audio signal is output as a noise section and controls the gain of the audio signal using the noise power in the noise section and the audio power in a non-noise section,
A first step of, as multi-process processing, multiplying the audio data string output from the audio data generation unit by the gain and outputting the result to the speaker side, and converting the audio signal detected by the microphone into digital data and capturing it in units of a predetermined length;
A second step of monitoring an input start time and an input end time to the speaker side of the audio data sequence, and monitoring an acquisition start time and an acquisition end time of a predetermined length of audio data sequence acquired from the microphone;
A third step of determining whether or not the audio data string of a predetermined length captured from the microphone using each time is an audio data string of a noise section;
A fourth step of detecting noise power using the audio data string of the noise section;
A method for improving speech intelligibility, comprising:

A fifth step of detecting voice power based on a voice data sequence output by the voice data generator when not in the noise section;
A sixth step of determining a gain by which the sound data string is multiplied using the noise power and the sound power;
The method for improving speech intelligibility according to claim 1.

When the audio data string output by the audio data generation unit is input to the speaker via a sound driver, the second step includes:
Storing the time when the input of the audio data string to the sound driver is started as the input start time;
Storing the time when the input of the last audio data string to the speaker by the sound driver is ended as the input end time;
The speech intelligibility improving method according to claim 1 or 2, further comprising:

The second step includes
Storing the audio data sequence captured from the microphone in a buffer of a predetermined capacity and storing the capture start time;
Outputting the entire audio data sequence stored when the buffer is full and the capture start time;
The speech intelligibility improving method according to claim 1 or 2, further comprising:

The third step is a step of obtaining the capture end time by adding a certain time according to the capacity of the buffer to the output capture start time,
The speech intelligibility improving method according to claim 4, further comprising:

The second step includes
Storing the audio data sequence captured from the microphone in a buffer having a predetermined capacity and storing the time as an end time when the buffer is full;
Outputting all stored audio data and the capture end time;
The speech intelligibility improving method according to claim 1 or 2, further comprising:

The third step is a step of subtracting a certain time according to the capacity of the buffer from the output capture end time to obtain the capture start time;
The method for improving speech intelligibility according to claim 6.

When the latest input start time is RGtime-S, the latest input end time is RGtime-E, the latest capture start time is MICtime-S, and the latest capture end time is MICtime-E.
Comparing the input start time RGtime-S and the capture end time MICtime-E;
Comparing the input end time RGtime-E and the capture start time MICtime-S;
If the capture end time MICtime-E is older than the latest input start time RGtime-S, or the latest input end time RGtime-E is older than the latest capture start time MICtime-S, determining that the audio data string of the predetermined length is an audio data string of the noise section;
The speech intelligibility improving method according to claim 1 or 2, characterized by comprising:

In a speech intelligibility improving system that treats a section in which no audio signal is output as a noise section and controls the gain of the audio signal using the noise power in the noise section and the audio power in a non-noise section,
An audio data generator for generating an audio data sequence;
An audio signal output unit that converts the input audio data into analog data and outputs the analog data; and
An audio data capturing unit that converts an audio signal detected by a microphone into digital data and captures the data in units of a predetermined length;
A time monitoring unit that monitors the input start time and input end time of the audio data string to the audio signal output unit, and the capture start time and capture end time of the predetermined-length audio data string in the audio data capturing unit;
A noise section determination unit that determines whether the predetermined length of the voice data sequence captured by the voice data capturing unit using each time is a noise section voice data sequence;
A noise power detection unit for detecting noise power using a voice data string of the noise section captured by the voice data capturing unit;
Using the noise power, a correction value calculation unit for calculating a gain to be multiplied with the output audio data string;
A correction unit that multiplies the audio data string generated by the audio data generation unit by the calculated gain and inputs the result to the audio signal output unit;
A speech intelligibility improvement system characterized by comprising:

A voice power detector that detects voice power based on a voice data sequence output by the voice data generator when not in the noise section;
The correction value calculating unit determines a gain by which the audio data string is multiplied using the noise power and the audio power;
The speech intelligibility improvement system according to claim 9.

The time monitoring unit stores, as the input start time, the time at which input of the audio data string output from the correction unit to the audio signal output unit started, and stores, as the input end time, the time at which input of the last audio data string to the audio signal output unit ended;
The speech intelligibility improvement system according to claim 9 or 10.

Furthermore, the audio data storage unit of the predetermined length that stores the audio data sequence captured by the audio data capturing unit,
The time monitoring unit outputs to the noise section determination unit the entire audio data sequence stored when the audio data storage unit is full and the start time of capturing the audio data sequence of the predetermined length.
The speech intelligibility improvement system according to claim 9 or 10.

The noise section determination unit obtains the capture end time by adding a certain time according to the capacity of the audio data storage unit to the capture start time.
The speech intelligibility improvement system according to claim 12.

Furthermore, the audio data storage unit of the predetermined length that stores the audio data sequence captured by the audio data capturing unit,
The time monitoring unit outputs the entire audio data sequence stored when the audio data storage unit is full and the capture end time to the noise section determination unit.
The speech intelligibility improvement system according to claim 9 or 10.

The noise section determination unit obtains the capture start time by subtracting a certain time according to the capacity of the audio data storage unit from the capture end time.
The speech intelligibility improvement system according to claim 14.

When the latest input start time is RGtime-S, the latest input end time is RGtime-E, the latest capture start time is MICtime-S, and the latest capture end time is MICtime-E,
The noise section determination unit compares the input start time RGtime-S with the capture end time MICtime-E, compares the input end time RGtime-E with the capture start time MICtime-S, and, if the capture end time MICtime-E is older than the latest input start time RGtime-S or the latest input end time RGtime-E is older than the latest capture start time MICtime-S, determines that the audio data string of the predetermined length is an audio data string of the noise section,
The speech intelligibility improvement system according to claim 9 or 10.

The speech intelligibility improvement system according to claim 9, wherein the processes of the audio data generation unit, the time monitoring unit, the noise section determination unit, the noise power detection unit, the correction value calculation unit, and the correction unit are each realized by a CPU as one of multiple processes.

The speech intelligibility improvement system according to claim 9, wherein the processes of the audio data generation unit and of an audio data output unit that outputs the input audio data string to another CPU are each realized by a CPU as one of multiple processes, and the processes of the time monitoring unit, the noise section determination unit, the noise power detection unit, and the audio signal output unit are each realized by the other CPU as one of multiple processes.