JPH10224897A - Processing circuit for audio signal

Info

Publication number: JPH10224897A
Application number: JP9026262A
Authority: JP
Inventors: Masami Miura; 雅美三浦
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1997-02-10
Filing date: 1997-02-10
Publication date: 1998-08-21
Anticipated expiration: 2017-02-10
Also published as: JP4005166B2

Abstract

PROBLEM TO BE SOLVED: To make speech easier to hear, even when sounds follow one another continuously, by reducing the effect of successive masking. SOLUTION: A variable gain amplifier 12 is provided on the signal line of an audio signal S11. A detection circuit 23 detects the end point of the audio signal S11, and a control circuit 24 controls the gain of the variable gain amplifier 12 according to the detection output of the detection circuit 23. When the detection circuit 23 detects the end point of the audio signal S11, the control output of the control circuit 24 causes the variable gain amplifier 12 to enlarge the amplitude of the audio signal S11 for a prescribed period.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio signal processing circuit used in fields such as hearing aids, telephones, loudspeakers, and voice communication.

[0002]

[Description of the Related Art] When speech is transmitted or reproduced, heavy reverberation or echo in the transmission or reproduction system reduces the intelligibility of the resulting speech. In such cases, processing such as slowing the speech rate, or splitting continuously uttered speech sounds into small pieces and reproducing them with gaps in between, is performed.

[0003] When high-frequency sounds such as consonants are hard to hear, the high frequencies may be emphasized by frequency equalization. Attempts have also been made to apply a weighting function that takes into account so-called successive masking (the phenomenon in which, when a high-energy vowel is followed by a consonant, the vowel masks the consonant).

[0004] Furthermore, the above processing is sometimes performed for hearing-impaired or elderly listeners.

[0005]

[Problems to Be Solved by the Invention] However, slowing the speech rate or splitting continuously uttered speech sounds, as described above, gives rise to the following problems.

[0006]
1. A time lag arises relative to the original speech, and immediacy is lost. The processing therefore cannot be used in conversation. Even when listening to a broadcast, it takes longer to finish listening.
2. The rate of change of the speech components is an important cue in the perceptual identification of speech sounds, so slowing the speech rate alters this cue and the sound may be perceived as a different speech sound.
3. When speech sounds are split up and reproduced slowly, the information carried by the speech sounds as a group and the information in the transitional portions are lost, and intelligibility may suffer.
4. Speech whose high frequencies are always amplified loses its tonal balance and may be unpleasant or hard to hear.
5. Applying a weighting function that accounts for successive masking introduces a delay of at least the time length of the weighting function, so immediacy is lost. As a result, a time lag arises between the speaker's mouth movements and the processed sound, which may harm intelligibility. In addition, when there is acoustic feedback from the earphone to the microphone, the delay produces an effect resembling reverberation.

[0007] The present invention aims to eliminate the above problems.

[0008]

[Means for Solving the Problems] To this end, the present invention provides an audio signal processing circuit comprising a variable gain amplifier to which an audio signal is supplied, a detection circuit that detects the end point of the audio signal, and a control circuit that controls the gain of the variable gain amplifier according to the detection output of the detection circuit, wherein, when the detection circuit detects the end point of the audio signal, the control output of the control circuit causes the variable gain amplifier to increase the amplitude of the audio signal over a predetermined period. Consequently, when a consonant follows continuous speech, its amplitude is enlarged.

[0009]

[Description of the Preferred Embodiments] Ordinary conversational speech is uttered in coherent groups of sounds, and the linguistic perception of speech is said to draw on both the perception of each individual sound and the perception of the features of the grouped speech sounds.

[0010] For a listener with normal hearing, speech can be heard well enough without any special processing in a quiet, favorable listening environment, but words become harder to make out in noisy surroundings. Several causes are conceivable, but a major one is successive masking: a preceding vowel can mask the consonant part of the following sound, lowering the auditory sensitivity to the consonant and making it hard to hear.

[0011] Accordingly, in the present invention, the end point of a group of speech sounds is detected, and the amplitude of the audio signal (in particular its high-frequency components) is amplified for a period of roughly ten-odd to several tens of milliseconds from that end point, thereby raising the relative auditory sensitivity to the consonant. The end point of the group is detected by analyzing changes in the level of the pitch and formant components of the speech.

[0012] FIG. 1 shows one embodiment of the present invention. The original, unprocessed audio signal S11 is supplied through an input terminal 11 to a variable gain amplifier 12, and the output signal S12 of the amplifier 12 is taken out at an output terminal 13.

[0013] The signal S11 at the terminal 11 is also supplied, for preprocessing, to a band-pass filter 21 and then to a level detection circuit 22. The band-pass filter 21 extracts the pitch and formant components from the signal S11 so that the end point of continuous speech is easier to detect and the influence of noise is reduced. Its passband is therefore set to, for example, 150 Hz to 1000 Hz.

[0014] The level detection circuit 22 processes the output signal S21 of the band-pass filter 21 so that the end point of continuous speech can be detected. For example, it full-wave rectifies the signal S21 and extracts the low-frequency component of the result (for example, components at or below 60 Hz), thereby forming a signal S22 that indicates the level of the signal S11.
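The level detection described above (full-wave rectification followed by extraction of the low-frequency component) can be sketched as follows. This is only an illustration: the one-pole low-pass filter, the function name `level_detector`, and its parameters are assumptions, since the patent does not specify the filter type. The band-pass filter 21 is not modeled here; the input is taken to be the already filtered signal S21.

```python
import math


def level_detector(samples, sample_rate, cutoff_hz=60.0):
    """Full-wave rectify the input, then smooth it with a one-pole low-pass.

    The one-pole filter is an assumed stand-in for "extract components
    at or below 60 Hz"; the patent does not name a filter design.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    out = []
    for x in samples:
        rectified = abs(x)                    # full-wave rectification
        level += alpha * (rectified - level)  # keep only the slow envelope
        out.append(level)
    return out
```

Feeding the function a burst followed by silence yields an envelope that settles near the rectified amplitude during the burst and decays toward zero afterwards, which is the kind of signal S22 the end point detection needs.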

[0015] The detection signal S22 of the level detection circuit 22 is supplied to an end point detection circuit 23, which detects the end point of continuous speech. The resulting end point detection signal S23 is supplied to a gain control circuit 24, which forms a control signal S24, and the signal S24 is supplied to the variable gain amplifier 12 as its gain control signal.

[0016] The end point of continuous speech is detected when the speech level, that is, the level of the signal S22, has once exceeded a first threshold (the voice-presence threshold) and then falls below a second threshold (the voice-end threshold). The gain is controlled so that it is raised over a period of, for example, ten-odd to several tens of milliseconds immediately after the end point is detected. The first threshold is equal to or greater than the second threshold.
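The two-threshold rule can be illustrated with a short sketch. It uses fixed thresholds for simplicity (the routine described later derives the second threshold from the tracked peak), and the function name `find_end_points` and its arguments are hypothetical.

```python
def find_end_points(levels, presence_threshold, end_threshold):
    """Return sample indices at which a run of continuous speech ends.

    A run starts once the level exceeds presence_threshold and ends when
    the level subsequently falls below end_threshold.
    """
    if presence_threshold < end_threshold:
        raise ValueError("first threshold must be >= second threshold")
    end_points = []
    voiced = False
    for i, level in enumerate(levels):
        if not voiced and level > presence_threshold:
            voiced = True              # first threshold exceeded
        elif voiced and level < end_threshold:
            voiced = False             # fell below the second threshold
            end_points.append(i)
    return end_points
```

Because the first threshold is at least as large as the second, the pair acts as a hysteresis: small fluctuations between the two values neither start nor end a run.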

[0017] With this configuration, while the audio signal S11 input to the terminal 11 is continuous, the control signal S24 holds the gain of the variable gain amplifier 12 at its reference value. The audio signal S11 is therefore taken out unchanged at the terminal 13 as the output signal S12.

[0018] When the continuous audio signal S11 ends, however, the control signal S24 raises the gain of the variable gain amplifier 12 above the reference value from that end point. If the audio signal S11 is present during the predetermined period following the end point, the amplitude of the output signal S12 is larger than it would otherwise be. Therefore, even if the auditory sensitivity to a consonant immediately after continuous speech (audio signal S11) has been lowered by successive masking, the consonant's amplitude is larger than its original value, the loss of sensitivity is offset, and the intelligibility of speech containing that consonant improves.

[0019] FIGS. 2 to 4 show one form of the method by which the detection circuit 23 and the control circuit 24 form the control signal S24 from the detection signal S22. In this case, the entire circuit shown in FIG. 1 is implemented digitally, for example with a DSP, and the audio signal S11 is a digital audio signal obtained by A/D conversion of the original, unprocessed analog audio signal.

[0020] In the detection circuit 23 and the control circuit 24, the processing routine 100 of FIG. 2 is executed for each sample of the digital audio signal S11. When the routine 100 changes the gain of the amplifier 12, it changes the gain gradually, for example as shown in FIG. 5, so that the level of the audio signal S12 does not become discontinuous.

[0021] In the routine 100 and in the following description, the variables have the following meanings.

[0022]
e(i): level of the i-th sample of the audio signal S11.
peak: peak value of the audio signal S11.
peakmax: value used to limit the upper bound of peak.
rate: ratio that determines how far the signal S11 must fall relative to peak before the speech is judged to have ended.
voicemin: first threshold (voice-presence threshold). Speech is recognized as present when the signal S11 exceeds this value.
threshold: second threshold (voice-end threshold). The end point of speech is judged to have been reached when the signal S11 falls below this value. voicemin ≥ threshold, and threshold = peak × rate.
slope1: length of the period T1 during which the gain of the amplifier 12 is increased.
slope2: length of the period T2 during which the gain of the amplifier 12 is returned to its original value.
enable: flag indicating whether speech was present in the (i-1)-th sample. enable = "0": speech not present. enable = "1": speech present.
jj: variable used to count time from the end point of continuous speech.
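As an illustration of how these variables might be given concrete values, the following sketch collects the fixed parameters in a small container and converts the periods T1 and T2 from milliseconds to sample counts. The class name `Routine100Params`, the helper `ms_to_samples`, the 16 kHz sample rate, and every numeric value are assumptions chosen only for the example; the patent gives no specific numbers beyond the ranges stated in the text.

```python
from dataclasses import dataclass


def ms_to_samples(duration_ms, sample_rate_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000.0)


@dataclass
class Routine100Params:
    peakmax: float   # upper limit applied to peak
    rate: float      # threshold = peak * rate
    voicemin: float  # first threshold (voice presence)
    slope1: int      # length of T1 in samples
    slope2: int      # length of T2 in samples

    def is_consistent(self):
        # voicemin >= threshold holds for every possible peak
        # if voicemin >= peakmax * rate.
        return self.voicemin >= self.peakmax * self.rate


# Illustrative values only: T1 = 10 ms and T2 = 20 ms at an assumed 16 kHz.
params = Routine100Params(
    peakmax=0.8,
    rate=0.1,
    voicemin=0.1,
    slope1=ms_to_samples(10, 16000),
    slope2=ms_to_samples(20, 16000),
)
```

With these assumed values, T1 + T2 spans 30 ms, which falls inside the "ten-odd to several tens of milliseconds" range mentioned earlier.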

[0023] In the routine 100, step 101 first checks whether e(i) < threshold, which determines whether speech is absent in the i-th sample. When the signal S11 is absent (e(i) < threshold), processing proceeds to step 102. Step 102 checks whether enable = "1", which determines whether speech was present in the preceding, (i-1)-th sample. When speech was present in the preceding sample (enable = "1"), processing proceeds to step 103.

[0024] Processing therefore reaches step 103 when, of two consecutive samples, speech is present in the earlier one and absent in the later one, that is, when continuous speech has just ended. In other words, the end of continuous speech has been detected.

[0025] In step 103, enable is set to "0" in preparation for the next, (i+1)-th sample, jj is set to 0, and processing of the i-th sample ends.

[0026] When step 102 finds that the signal S11 was not present in the preceding sample (enable = "0"), processing proceeds to step 111, which checks whether jj < slope1 to determine whether the sample falls within the period T1. If it does (jj < slope1), processing proceeds to step 112. That is, during the period T1, processing proceeds to step 112.

[0027] In step 112, the gain of the amplifier 12 is increased by one step and the variable jj is incremented by 1. Processing of the i-th sample then ends.

[0028] When step 111 finds that the sample does not fall within the period T1 (jj ≥ slope1), that is, when the period T1 has passed, processing proceeds to step 121.

[0029] Step 121 checks whether jj ≥ slope1 and jj < (slope1 + slope2) to determine whether the sample falls within the period T2. If it does, processing proceeds to step 122. That is, during the period T2, processing proceeds to step 122.

[0030] In step 122, the gain of the amplifier 12 is decreased by one step and the variable jj is incremented by 1. Processing of the i-th sample then ends.

[0031] When step 121 finds that the sample is past the period T2 (jj ≥ (slope1 + slope2)), processing of the i-th sample ends without any change to the gain of the amplifier 12.
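The gain trajectory produced by steps 111 to 122 after an end point can be sketched as below. The patent only says the gain changes "by one step" per sample; this sketch assumes separate up and down step sizes (`boost / slope1` and `boost / slope2`) so that the gain returns exactly to its reference value at the end of T2. The function name `gain_envelope` and the parameters `base_gain` and `boost` are hypothetical.

```python
def gain_envelope(slope1, slope2, base_gain=1.0, boost=1.0):
    """Gain values for the samples following a detected end point.

    Assumption: the up step and the down step are scaled so that the gain
    rises by `boost` over T1 and falls back by `boost` over T2.
    """
    up_step = boost / slope1
    down_step = boost / slope2
    gain = base_gain
    gains = []
    for jj in range(slope1 + slope2 + 3):   # 3 extra samples past T2
        if jj < slope1:                     # period T1 (step 112)
            gain += up_step
        elif jj < slope1 + slope2:          # period T2 (step 122)
            gain -= down_step
        # past T2: gain is left unchanged
        gains.append(gain)
    return gains
```

For example, `gain_envelope(4, 2)` rises from the reference gain to twice that value over four samples, returns over two samples, and then stays at the reference value.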

[0032] Thus, with the above processing, the end of continuous speech is detected and the gain of the amplifier 12 is controlled as shown in FIG. 5.

[0033] The routine 100 also places an upper limit on the peak value so that the peak value of the speech level does not become excessively large when the gain of the amplifier 12 is increased. In addition, the routine 100 sets the second threshold anew for each run of continuous speech: after the speech level exceeds the first threshold, it finds the peak of the speech level and sets the second threshold to a few percent to a few tens of percent of that peak value.

[0034] That is, when step 101 finds that speech is present (e(i) ≥ threshold), processing proceeds to step 131. Step 131 checks whether enable = "0", which determines whether speech was absent in the preceding, (i-1)-th sample. When speech was absent in the preceding sample (enable = "0"), processing proceeds to step 141.

[0035] Processing reaches step 141 by way of steps 101 and 131, so this is the case in which, of two consecutive samples, speech is absent in the earlier one and present in the later one, that is, the case in which speech has just started.

[0036] Step 141 checks whether e(i) > voicemin to confirm that speech has started. If it has (e(i) > voicemin), processing proceeds to step 142, which determines whether the level e(i) exceeds the upper limit peakmax. If it does not (e(i) ≤ peakmax), processing proceeds to step 143, where the peak value peak is set to the level e(i), and then to step 145.

[0037] If step 142 finds that the level e(i) exceeds the upper limit peakmax (e(i) > peakmax), processing proceeds to step 144, where the peak value peak is set to the upper limit peakmax, and then to step 145.

[0038] In step 145, the second threshold is set to threshold = peak × rate according to the peak value peak set in step 143 or step 144, and enable is set to "1" in preparation for the next, (i+1)-th sample. Processing then proceeds to step 161.

[0039] If step 141 finds that speech has not started (e(i) ≤ voicemin), processing proceeds directly from step 141 to step 161, and neither the peak value peak nor the flag enable is set.

[0040] Thus, when speech starts, the second threshold is set according to the level at the start.
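Steps 141 to 145 can be condensed into a few lines. The sketch below is one possible reading; the function name `on_voice_start` and the convention of returning `None` when speech is not confirmed are assumptions made for the example.

```python
def on_voice_start(level, voicemin, peakmax, rate):
    """Steps 141 to 145 for a sample that follows a sample without speech.

    Returns (peak, threshold) when speech is confirmed, or None when the
    level does not exceed voicemin (peak and enable are then left alone).
    """
    if level <= voicemin:            # step 141: speech not confirmed
        return None
    peak = min(level, peakmax)       # steps 142 to 144: clamp to peakmax
    threshold = peak * rate          # step 145: second threshold
    return peak, threshold
```

Clamping with `min` gives the same result as the two branches in steps 143 and 144: the level itself when it is within the limit, and `peakmax` otherwise.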

[0041] If, on the other hand, step 131 finds that speech was present in the preceding sample (enable = "1"), processing proceeds to step 151. Processing reaches step 151 by way of steps 101 and 131, so speech is present in both of two consecutive samples; that is, the sample lies within a period of continuous speech.

[0042] Step 151 compares the level e(i) of the i-th sample with the peak value peak found so far. If e(i) is larger (e(i) > peak), processing proceeds from step 151 to step 152, which determines whether e(i) exceeds the upper limit peakmax. If it does not (e(i) ≤ peakmax), processing proceeds to step 153, where peak is updated to e(i), and then to step 155.

[0043] If step 152 finds that the level e(i) exceeds the upper limit peakmax (e(i) > peakmax), processing proceeds to step 154, where peak is updated to the upper limit peakmax, and then to step 155.

[0044] In step 155, the second threshold is updated to threshold = peak × rate according to the peak value updated in step 153 or step 154. Processing then proceeds to step 161.

[0045] If step 151 finds that the existing peak value is the larger (e(i) ≤ peak), processing proceeds directly from step 151 to step 161 and peak is not updated.

[0046] Thus, while speech continues, the peak value of the continuous period is tracked and the second threshold is updated according to that peak value.
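The peak tracking of steps 151 to 155 admits a similarly compact sketch. The function name `update_peak` is hypothetical. Unlike the flowchart, the sketch recomputes the threshold on every call, which yields the same value because threshold = peak × rate is maintained throughout.

```python
def update_peak(peak, level, peakmax, rate):
    """Steps 151 to 155 for a sample inside a run of continuous speech.

    Returns the (possibly updated) peak and the matching second threshold.
    """
    if level > peak:                 # step 151: new maximum found
        peak = min(level, peakmax)   # steps 152 to 154: clamp to peakmax
    return peak, peak * rate         # step 155 (no-op if peak unchanged)
```

A run whose level climbs from 0.2 to 0.5 therefore raises the threshold in proportion, and a level above `peakmax` raises it no further than `peakmax * rate`.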

[0047] When processing reaches step 161, it checks whether jj > 0 and jj < slope1 to determine whether the sample falls within the period T1. If it does (jj > 0 and jj < slope1), processing proceeds to step 162. That is, during the period T1, processing proceeds to step 162.

[0048] In step 162, the gain of the amplifier 12 is increased by one step and the variable jj is incremented by 1. Processing of the i-th sample then ends.

[0049] When step 161 finds that the sample does not fall within the period T1 (jj ≥ slope1), that is, when the period T1 has passed, processing proceeds to step 171.

[0050] Step 171 checks whether jj ≥ slope1 and jj < (slope1 + slope2) to determine whether the sample falls within the period T2. If it does, processing proceeds to step 172. That is, during the period T2, processing proceeds to step 172.

[0051] In step 172, the gain of the amplifier 12 is decreased by one step and the variable jj is incremented by 1. Processing of the i-th sample then ends.

[0052] When step 171 finds that the sample is past the period T2 (jj ≥ (slope1 + slope2)), processing proceeds to step 181, where the variable jj is reset to 0, and processing of the i-th sample ends.
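Putting the steps together, the per-sample routine 100 might be sketched as the small state machine below. It follows the step numbers in the text but is not the patent's implementation: the class name `Routine100`, the `boost` parameter, the scaled up and down steps, and the initial values (gain 1.0, threshold 0) are all assumptions. A sample with jj = 0 in the speech branch is treated as falling through to step 181, which is one reading of the flowchart description.

```python
from dataclasses import dataclass


@dataclass
class Routine100:
    voicemin: float          # first threshold (voice presence)
    peakmax: float           # upper limit for peak
    rate: float              # threshold = peak * rate
    slope1: int              # length of T1 in samples
    slope2: int              # length of T2 in samples
    boost: float = 1.0       # assumption: gain added by the end of T1
    gain: float = 1.0        # assumption: reference gain of amplifier 12
    peak: float = 0.0
    threshold: float = 0.0   # assumption: 0 so that initial silence is ignored
    enable: int = 0
    jj: int = 0

    def _ramp(self):
        if self.jj < self.slope1:                   # T1: steps 112 / 162
            self.gain += self.boost / self.slope1
            self.jj += 1
        elif self.jj < self.slope1 + self.slope2:   # T2: steps 122 / 172
            self.gain -= self.boost / self.slope2
            self.jj += 1
        # past T2: gain and jj are left unchanged

    def process(self, e):
        """Handle one sample level e and return the gain for amplifier 12."""
        if e < self.threshold:                      # step 101: no speech
            if self.enable == 1:                    # step 102: speech just ended
                self.enable, self.jj = 0, 0         # step 103
            else:
                self._ramp()                        # steps 111 to 122
            return self.gain
        if self.enable == 0:                        # step 131: possible start
            if e > self.voicemin:                   # step 141
                self.peak = min(e, self.peakmax)    # steps 142 to 144
                self.threshold = self.peak * self.rate  # step 145
                self.enable = 1
        elif e > self.peak:                         # step 151
            self.peak = min(e, self.peakmax)        # steps 152 to 154
            self.threshold = self.peak * self.rate  # step 155
        if 0 < self.jj < self.slope1 + self.slope2:
            self._ramp()                            # steps 161 to 172
        else:
            self.jj = 0                             # step 181
        return self.gain


routine = Routine100(voicemin=0.1, peakmax=0.8, rate=0.5, slope1=4, slope2=2)
levels = [0.0, 0.4, 0.6, 0.6] + [0.2] * 9
gains = [routine.process(e) for e in levels]
```

In this usage example the level falls below the second threshold (0.6 × 0.5 = 0.3) at the fifth sample, so the gain ramps up over the next four samples, returns to the reference value over the following two, and stays there.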

[0053] With the above processing, the amplitude of the audio signal S11 is increased for a period of roughly ten-odd to several tens of milliseconds starting at the end point of continuous speech. Therefore, even if the auditory sensitivity to a consonant immediately following the continuous speech (audio signal S11) has been lowered by successive masking, the loss is offset by the increased amplitude of the signal S11, and the intelligibility of speech containing that consonant improves.

[0054] FIG. 6 shows observed speech waveforms. FIG. 6A is the waveform of the audio signal S11 without processing by the routine 100, and FIG. 6B is the waveform of the audio signal S12 after processing by the routine 100. The utterance is the Japanese sentence "１行目に書いてください" ("Please write on the first line").

[0055] As indicated by arrows A, E, and F, when the interval from the end point of continuous speech to the next consonant is short, successive masking of that consonant is strong, and the consonant's amplitude is enlarged and emphasized. As indicated by arrows B, C, and D, when the interval to the next consonant is long, successive masking is weak, and correspondingly the consonant's amplitude is not emphasized.

[0056] Therefore, with the processing circuit described above, the following effects are obtained when speech is transmitted or reproduced through a system with reverberation or echo, or when a hearing-impaired or elderly person listens to speech.
1. Only the consonant uttered next is emphasized, so that successive masking of that consonant is reduced; the speech becomes clear and intelligibility improves.
2. Constantly emphasizing the high range of speech causes an unpleasant loss of tonal balance, but because the consonant is emphasized only when successive masking is occurring, no such unpleasantness arises.
3. The processing can in principle be performed in real time, so no time difference arises between the speaker's mouth movements and the processed sound. Even if there is acoustic feedback from the earphone to the microphone, the result does not sound like reverberation and remains easy to listen to.
4. The rate of change of the speech components, which is important for the perceptual identification of speech sounds, is preserved, as are the information carried by the speech sounds as a group and the information in the transitional portions.

[0057] The amount of successive masking varies with the level and duration of the speech portion acting as the masker (the continuous speech segment) and with the time elapsed since the end point of that portion. The period over which the amplitude of the original audio signal S11 is controlled, and the size of the amplitude change, can be adjusted to match the amount of masking: when the amount of masking is large, the amplitude and the control period should be made larger.

[0058] In the above description, the variable gain amplifier 12 increases the amplitude over the entire band of the audio signal S11, but the amplitude may instead be increased only in the high band corresponding to consonants.
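One way to realize the high-band-only variant is to split the signal into a low band and a complementary high band and apply the gain to the high band alone. The sketch below is only an illustration: the one-pole crossover, the 1000 Hz crossover frequency, and the function name `boost_high_band` are assumptions, since the patent does not say how the high band is isolated.

```python
import math


def boost_high_band(samples, gains, sample_rate, crossover_hz=1000.0):
    """Apply a per-sample gain to the high band only.

    The low band is estimated with a one-pole low-pass filter (an assumed
    design); the high band is the remainder, so a gain of 1.0 leaves the
    signal unchanged apart from floating-point rounding.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low = 0.0
    out = []
    for x, g in zip(samples, gains):
        low += alpha * (x - low)     # low-band estimate
        high = x - low               # complementary high band
        out.append(low + g * high)   # only the high band is scaled
    return out
```

The `gains` sequence would come from the gain control described above, one value per sample; slowly varying content passes through almost unchanged while rapidly varying content is amplified.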

[0059]

[Effects of the Invention] According to the present invention, speech becomes clear and intelligibility can be improved. The unpleasantness that accompanies constant emphasis of the high range is avoided. In addition, no time difference arises between the speaker's mouth movements and the processed sound.

[0060] Even if there is acoustic feedback from the earphone to the microphone, the result does not sound like reverberation and remains easy to listen to. Furthermore, information useful for the perceptual identification of speech sounds is not lost.

[Brief description of the drawings]

FIG. 1 is a system diagram showing one embodiment of the present invention.

FIG. 2 is a flowchart showing part of one embodiment of the present invention.

FIG. 3 is a flowchart continuing from FIG. 2.

FIG. 4 is a flowchart continuing from FIG. 3.

FIG. 5 is a diagram for explaining the present invention.

FIG. 6 is a diagram for explaining the present invention.

[Explanation of symbols]

12: variable gain amplifier, 21: band-pass filter, 22: level detection circuit, 23: end point detection circuit, 24: gain control circuit, 100: processing routine

Continued from front page: (51) Int. Cl.6: H03G 3/00; FI: H03G 3/00 A

Claims

1. An audio signal processing circuit comprising: a variable gain amplifier to which an audio signal is supplied; a detection circuit for detecting an end point of the audio signal; and a control circuit for controlling a gain of the variable gain amplifier according to a detection output of the detection circuit, wherein, when the detection circuit detects the end point of the audio signal, the variable gain amplifier increases the amplitude of the audio signal over a predetermined period under the control output of the control circuit.

2. The audio signal processing circuit according to claim 1, wherein the detection circuit generates the detection output at the point in time when the level of the audio signal, after once exceeding a first threshold for recognizing the presence of speech, becomes smaller than a second threshold for judging that the speech has ended.

3. The audio signal processing circuit according to claim 1, wherein the audio signal is supplied to a band-pass filter to extract a pitch component and a formant component of the speech as an output signal, and the end point of the audio signal is detected from the output signal of the band-pass filter.

4. The audio signal processing circuit according to claim 1, wherein the first threshold value is equal to or larger than the second threshold value.

5. The audio signal processing circuit according to claim 1, wherein the second threshold value is set to a few percent to a few tens of percent of the smaller of the maximum value of the level of the audio signal during the period in which speech is present and a separately set upper limit.

6. The audio signal processing circuit according to claim 1, wherein a gain of said variable gain amplifier is controlled by a level of said continuous audio portion.

7. The audio signal processing circuit according to claim 1, wherein a gain of said variable gain amplifier is controlled by a time from an end point of said audio signal.