JP3824182B2

JP3824182B2 - Audio amplification device, communication terminal device, and audio amplification method

Info

Publication number: JP3824182B2
Application number: JP27153296A
Authority: JP
Inventors: 祐児前田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1996-09-20
Filing date: 1996-09-20
Publication date: 2006-09-20
Anticipated expiration: 2016-09-20
Also published as: JPH1098344A

Description

【０００１】
【目次】
以下の順序で本発明を説明する。
発明の属する技術分野
従来の技術
発明が解決しようとする課題
課題を解決するための手段
発明の実施の形態（図１〜図４）
発明の効果
【０００２】
【発明の属する技術分野】
本発明は音声増幅装置及び通信端末装置並びに音声増幅方法に関し、例えば携帯電話装置に適用して好適なものである。
【０００３】
【従来の技術】
従来、移動体通信システムのうち最も普及の著しいサービスとして携帯電話装置がある。近年の携帯電話装置の小型化、軽量化は価格の低減効果に大きく寄与し、加入者数の拡大を促進している。しかしながら加入者数の拡大に伴い様々な問題も出現している。例えばユーザが静かな公共の場所で通話すると話し声が周囲に迷惑をかけてしまうおそれがあつたり、また混雑した電車の中で通話すると会話の内容が他人に聞かれてしまうおそれがあるといつた問題である。このようなときユーザは小さな声で通話することによつて他人に迷惑をかけないようにしているが、小さな声の状態のままで送信すると相手が音声を聞き取りにくいという事態が起こる。
【０００４】
従つてこの事態を解消するため従来の携帯電話装置では端末に切換スイツチを設けて、ユーザがこの切換スイツチを操作しマイクゲインを直接切り換ることにより、予め決められた増幅度に変化するようになされている。これにより小さな音声を増幅し、当該音声を相手が聞き取りやすい音量にしてから送信している。
【０００５】
【発明が解決しようとする課題】
ところでかかる構成の携帯電話装置においては、ユーザ自身が通話の状況に応じて切換スイツチを操作し増幅度の調整をしなければならないが、ユーザが会話の状況に応じて切換スイツチを操作しながら通話するということは困難なことで実際そのようなことはほとんど行われていない。例えばユーザが小さい声で通話するため切換スイツチを予め増幅度の大きい側に設定しているとき、ユーザが相手の話を聞いているときのように音声が入力されていない状況になつたからといつて、ユーザはわざわざ切換スイツチを増幅度の小さい側に切り換えるようなことはせず、通常そのまま通話し続けると考えられる。
【０００６】
このように端末に音声が入力されていなく周囲の雑音だけしか入力されていない場合において、従来の携帯電話装置では周囲の雑音を単純に増幅してから相手に送信してしまう不都合がある。またユーザが小さい声で通話するため切換スイツチを増幅度の大きい側に設定しているとき、通話中周囲に突発的な大音量の雑音が発生してこれが端末に入力された場合、従来の携帯電話装置では音声と共に大音量の雑音も増幅してしまう不都合があり、これにより相手に不快感を与えてしまうおそれがある。
【０００７】
本発明は以上の点を考慮してなされたもので、音声の増幅度を適応的に切り換えて最適な音量に調整し得る音声増幅装置及び通信端末装置並びに音声増幅方法を提案しようとするものである。
【０００８】
【課題を解決するための手段】
かかる課題を解決するため本発明においては、入力音声の所定サンプル数をフレームとし、当該フレーム毎に最大振幅を検出し、当該検出した最大振幅に基づいて入力音声が有音か無音かを検出し、その結果、入力音声が有音であると検出した場合には、かかる最大振幅に基づいて入力音声に対する増幅度を決定し、入力音声が無音であると検出した場合には、当該入力音声を増幅しないように増幅度を決定し、当該決定した増幅度に基づいて入力音声を増幅するようにし、入力音声が有音か無音かを検出する際に、直近の所定フレームの最大振幅のうち少なくとも１フレームの最大振幅が所定の有音無音判定しきい値よりも大きい場合、入力音声が有音であると判定し、直近の所定フレーム全ての最大振幅が有音無音判定しきい値以下である場合、入力音声が無音であると判定するようにした。
従つて本発明では、入力音声が有音か無音かの誤検出を防ぎながら、入力音声が無音である場合、雑音だけが増大することを回避すると共に、当該入力音声が有音である場合、最大振幅に応じて増幅度を適応的に切り換えることができる。
【０００９】
また本発明においては、入力音声が有音であると検出した場合、最大振幅が所定の最大振幅しきい値より大きいか否かを判定し、当該判定結果に応じて最大振幅が最大振幅しきい値以下のときには、予め決められた所定の増幅度に決定し、最大振幅が最大振幅しきい値より大きいときには、当該最大振幅が最大振幅しきい値より超えている度合に応じて所定の増幅度を減少させるようにした。
従つて本発明では、入力音声をクリツピングノイズが発生しないように増幅することができる。
【００１０】
さらに本発明においては、増幅度を決定した際、当該決定した増幅度を所定の演算式に従つて入力音声のフレーム内のサンプル毎に順次平滑化するようにした。
従つて本発明では、入力音声の不連続が生じることを回避することができる。
【００１１】
【発明の実施の形態】
以下図面について、本発明の一実施例を詳述する。
【００１２】
図１において、１は全体として携帯電話装置を示し、大きく分けて送信系回路と受信系回路とから形成されている。まず送信系では、通話時にマイク２に入力された音声信号Ｓ１はアナログ／デイジタル変換器３に供給されるようになされている。アナログ／デイジタル変換器３はこの音声信号Ｓ１を音声データＳ_iにアナログ／デイジタル変換した後、切換器４に出力する。
【００１３】
ここでユーザが操作部５において音量増幅の操作をすると、操作部５はその操作情報を表す操作信号Ｓ３を制御部６に送出する。制御部６はこの操作信号Ｓ３に応じて制御信号Ｓ４を切換器４に出力することにより、切換器４を音量増幅器７側に切り換える。これにより音量増幅の操作がなされた場合には、音声データＳ_iは音量増幅器７に供給される。これとは逆に、ユーザが操作部５において音量増幅解除の操作をすると、制御部６は制御信号Ｓ４を出力して切換器４をスピーチエンコーダ８側に切り換える。これにより音量増幅解除の操作がなされた場合には、音声データＳ_iは音量増幅器７を介さずに直接スピーチエンコーダ８に供給され、増幅されないようになされている。
【００１４】
音量増幅器７は後述するような処理によつて音声データＳ_iを増幅し、これを音声データＳ_oとしてスピーチエンコーダ８に出力する。スピーチエンコーダ８は音声データＳ_i又はＳ_oを所定の方式で符号化し、その結果得られる送信データＳ６をチヤンネルエンコーダ９に出力する。チヤンネルエンコーダ９は送信データＳ６に対して誤り訂正符号を付加する等のデータ処理をフレーム単位で行い、その結果得られる送信データＳ７を変調器１０に出力する。
【００１５】
変調器１０は送信データＳ７に所定の変調を施し、その結果得られる送信信号Ｓ８を送信用の高周波処理回路（ＴＸ）１１に出力する。この送信用の高周波処理回路１１は送信信号Ｓ８を高周波帯域に周波数変換し、その結果得られる送信信号Ｓ９をアンテナ１２に供給する。これによりアンテナ１２から送信信号Ｓ９が送信される。
【００１６】
続いて受信系では、アンテナ１２によつて受信された受信信号Ｓ１０は受信用の高周波処理回路（ＲＸ）１３に入力される。この受信用の高周波処理回路１３は受信信号Ｓ１０を低周波帯域に周波数変換し、その結果得られる受信信号Ｓ１１を復調器１４に出力する。復調器１４は受信信号Ｓ１１を復調することにより受信データＳ１２を得、この受信データＳ１２をチヤンネルデコーダ１５に出力する。
【００１７】
チヤンネルデコーダ１５はフレーム単位で受信データＳ１２に所定のデータ処理（例えば伝送途中で発生した符号誤りを検出し、マスクする処理等）を行い、その結果得られる受信データＳ１３をスピーチデコーダ１６に出力する。スピーチデコーダ１６は受信データＳ１３を音声データＳ１４に復号化し、当該音声データＳ１４をデイジタル／アナログ変換器１７に出力する。デイジタル／アナログ変換器１７は音声データＳ１４を音声信号Ｓ１５にデイジタル／アナログ変換した後、スピーカ１８を介して音声として出力する。
【００１８】
ここで上述した音量増幅器７について図２を用いて具体的に説明する。まず音量増幅器７において音声データＳ_iは音量増幅部２０と最大振幅検出部２１に供給される。最大振幅検出部２１は音声データＳ_iの所定サンプル数をフレームとすることにより、音声データＳ_iの１フレーム分を音声フレームデータＳ_i[n] とする。そして最大振幅検出部２１はこの音声フレームデータＳ_i[n] 中の最大振幅Ａ_maxを検出し、当該最大振幅Ａ_maxを有音／無音検出部２２及び音量制御部２３に出力する。
【００１９】
有音／無音検出部２２は、最大振幅Ａ_max及び有音／無音判定しきい値Ａ_muteを基に有音か無音か検出し、その検出結果を有音／無音フラグデータＳ２０として音量制御部２３に出力する。音量制御部２３では最大振幅Ａ_max及び有音／無音フラグデータＳ２０を基に増幅率ａ_oを決定し、当該増幅率ａ_oを音量増幅部２０に出力する。音量増幅部２０は増幅率ａ_oによつて音声フレームデータＳ_i[n] を増幅し、その結果得られる音声フレームデータＳ_o[n] をスピーチエンコーダ８に出力する。
【００２０】
ここで音量増幅器７における音声増幅方法について図３に示すフローチヤートを用いて具体的に説明する。まずステツプＳＰ１から入つたステツプＳＰ２において、最大振幅検出部２１では音声フレームデータＳ_i[n] から最大振幅Ａ_maxを検出し、当該最大振幅Ａ_maxを有音／無音検出部２２及び音量制御部２３に出力する。
【００２１】
ステツプＳＰ３において有音／無音検出部２２では直近Ｎフレームの最大振幅Ａ_maxを保存しており、この直近Ｎフレームの最大振幅Ａ_maxが有音／無音判定しきい値Ａ_muteより大きいか否かを調べることにより有音か無音か検出し、その検出結果を有音／無音フラグデータＳ２０として音量制御部２３に出力する。有音／無音検出部２２は例えば直近Ｎフレームにおいて少なくとも１フレームが有音／無音判定しきい値Ａ_muteより大きい時有音と判定するようになされている。これにより有音／無音検出部２２は誤検出を防ぐようになされている。
ステツプＳＰ４において音量制御部２３では有音／無音フラグデータＳ２０を調べることにより、そのフレームが有音か無音かを判定する。その結果、有音と判定された場合には音量制御部２３はステツプＳＰ５に移行し、無音と判定された場合にはステツプＳＰ６に移行する。
【００２２】
ここで音量制御部２３においては予め決められた増幅率をａ_uとし、この増幅率ａ_uを信号レベルに応じて変化させた増幅率をａ_iとしている。さらにこの増幅率ａ_iをサンプリングデータ毎に平滑化した増幅率をａ_oとしている。
ところで最大振幅Ａ_maxがダイナミツクレンジの最大値より大きい値に増幅されると、波形のピーク値付近が切りとられるようなひずみ、すなわちクリツピングが発生する。このクリツピングは雑音の発生原因になるため、音量制御部２３ではクリツピングが発生しないように増幅率ａ_iを変化させている。
さらに音量制御部２３においては増幅率ａ_uによつてダイナミツクレンジの最大値に増幅される最大振幅Ａ_maxをクリツピングの発生しない最大振幅しきい値Ａ_THとし、この最大振幅しきい値Ａ_THを用いてクリツピングが発生しないように増幅率ａ_iを決定している。
【００２３】
ステツプＳＰ５において音量制御部２３では音声フレームデータＳ_i[n] の最大振幅Ａ_maxがクリツピングの発生しない最大振幅しきい値Ａ_THより大きいか否か判定する。その結果、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより大きくないと判定された場合には音量制御部２３はステツプＳＰ７に移行し、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより大きいと判定された場合にはステツプＳＰ８に移行する。
【００２４】
ステツプＳＰ６において音量制御部２３では、増幅率ａ_iに「１」を設定することにより増幅を行わないようにして、無音のときには増幅しないようにしている。ステツプＳＰ７において音量制御部２３では増幅率ａ_iを増幅率ａ_uに決定し、増幅率ａ_uによつて増幅するようにしている。ステツプＳＰ８において音量制御部２３では増幅率ａ_iを、次式
【数１】

によつて求め、クリツピングの発生を防止している。このようにして最大振幅Ａ_maxが最大振幅しきい値Ａ_THより小さいか或いは等しいとき増幅率ａ_iを増幅率ａ_uに決定し、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより大きいときには増幅率ａ_iは増幅率ａ_uより小さくなるようになされている。
【００２５】
ステツプＳＰ９において音量制御部２３ではカウンタｋを「０」に設定する。
ところで増幅率ａ_iはサンプリングデータ毎に値が極端に異なると、音声の不連続が生じるおそれがあり、ステツプＳＰ１０において音量制御部２３ではこの不都合を解決するため増幅率ａ_iを、次式
【数２】

によつて平滑化することにより増幅率ａ_oを得、この増幅率ａ_oを音量増幅部２０に出力する。ここで右辺の増幅率ａ_oは前回求めた、１つ前のサンプリングデータの増幅率ａ_oである。またｍは１以下の定数であり、例えばこの場合には１／４０程度が用いられる。
【００２６】
ステツプＳＰ１１において音量増幅部２０では、音声フレームデータＳ_i[k] 及び増幅率ａ_oを基に音声フレームデータＳ_o[k] を、次式
【数３】

に示すように、音声フレームデータＳ_i[k] と増幅率ａ_oを乗算することによつて算出する。ステツプＳＰ１２において音量制御部２３では、カウンタｋに「１」を加算して、当該カウンタｋをインクリメントする。
【００２７】
ステツプＳＰ１３において音量制御部２３では、カウンタｋが「ｎ−１」に一致したか否か、すなわちサンプリングデータをｎ個分増幅したか否か判定する。その結果、サンプリングデータをｎ個分増幅したと判定された場合には音量制御部２３はステツプＳＰ１４に移つて処理を終了し、増幅していないと判定された場合にはステツプＳＰ１０に戻つて動作を繰り返す。
【００２８】
以上の構成において、ユーザが小さい声で通話するため操作部５において音量増幅の操作をすると、マイク２に入力された音声信号Ｓ１はアナログ／デイジタル変換器３に供給され、当該アナログ／デイジタル変換器３は音声信号Ｓ１を音声データＳ_iにアナログ／デイジタル変換した後、切換器４を介して音量増幅器７に出力する。音量増幅器７はこの音声データＳ_iを音声データＳ_oに増幅した後、スピーチエンコーダ８に出力する。
【００２９】
ここで音量増幅器７の有音／無音検出部２２では最大振幅Ａ_maxが有音／無音判定しきい値Ａ_muteより大きいか否かを調べることにより、有音か無音かを検出し、その結果得られる有音／無音フラグデータＳ２０を音量制御部２３に出力する。その結果、無音と判定された場合には音量制御部２３では増幅率ａ_iに「１」を設定し増幅を行わないようにする。このようにして音声が無音のときには増幅しないようにしたことにより、雑音だけが増大することを回避することができる。
【００３０】
音量制御部２３では音声フレームデータＳ_i[n] の最大振幅Ａ_maxがクリツピングの発生しない最大振幅しきい値Ａ_THより大きいか否か判定する。その結果、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより大きくないと判定された場合には増幅率ａ_iを増幅率ａ_uに決定し、大きいと判定された場合には上述の（１）式に示すように、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより超えている度合に応じて増幅率ａ_iを減少させるように決定している。従つて最大振幅Ａ_maxが最大振幅しきい値Ａ_THより小さいか或いは等しいときは、増幅率ａ_iを増幅率ａ_uに決定し、最大振幅Ａ_maxが最大振幅しきい値Ａ_THより大きいときは、その超えている度合に応じて増幅率ａ_iを増幅率ａ_uより減少させるように決定する。このようにして入力音声が大きくなると増幅率ａ_iが減少するようにしたことにより、クリツピングの発生を押さえて、雑音が発生しないようにしている。
【００３１】
音量増幅部２０では、音声フレームデータＳ_i[k] 及び増幅率ａ_oを基に音声フレームデータＳ_o[k] を上述の（３）式によつて求め、当該音声フレームデータＳ_o[k] をスピーチエンコーダ８に出力する。これにより音量増幅部２０では小さな声を相手が聞き取りやすい音量に増幅することができる。
【００３２】
このようにして入力音声の信号レベルに応じて増幅度を決定するようにしたことにより、信号レベルに合わせて増幅度を適応的に切り換えることができる。これにより携帯電話装置１では周囲の雑音が増大しないように増幅しながら、ユーザの手間をかけずに容易に最適な音量に調整することができ、従来に比して一段と使い勝手を向上し得る。
【００３３】
以上の構成によれば、入力音声の信号レベルに応じて増幅度を決定するようにしたことにより、信号レベルに合わせて増幅度を適応的に切り換えることができ、かくしてユーザの手間をかけずに容易に最適な音量に調整することができる携帯電話装置１を実現できる。
【００３４】
なお上述の実施例においては、音声フレームデータＳ_i[n] の最大振幅Ａ_maxを直接検出した場合について述べたが、本発明はこれに限らず、フレーム当たりの平均電力から最大振幅を検出しても上述の場合と同様の効果を得ることができる。
【００３５】
ここでフレーム当たりの平均電力を用いる場合について、図３との対応部分に同一符号を付して示す図４を用いて説明する。まずステツプＳＰ２０から入つたステツプＳＰ２１において、最大振幅検出部２１では音声フレームデータＳ_i[n] を基にフレーム当たりの平均電力Ｐ_FRAMEを、次式
【数４】

によつて求める。
【００３６】
ステツプＳＰ２２において最大振幅検出部２１では平均電力Ｐ_FRAMEを基にフレーム当たりの平均振幅Ａ_av(FRAME)を、次式
【数５】

によつて求める。
【００３７】
ステツプＳＰ２３において最大振幅検出部２１では、平均振幅Ａ_av(FRAME)に対する最大振幅の比率である波高率ａ_hを基に推定最大振幅Ａ_es(FRAME)を、次式
【数６】

によつて求める。
【００３８】
このようにして最大振幅検出部２１では１フレーム分のサンプリングデータの積分をサンプル数によつて割り算することにより、フレーム当たりの平均電力を算出する。さらに最大振幅検出部２１ではこの平均電力の平方根をとつて平均振幅を求めた後、当該平均振幅を波高率によつて割り算し最大振幅を得る。この処理が終わるとステツプＳＰ３に移り、当該ステツプＳＰ３からステツプＳＰ１４までにおいて、音量増幅器７は平均電力から算出した最大振幅の大きさに応じて増幅度を切り換え、当該増幅度によつて入力音声を増幅している。すなわち音量増幅器７は図３に示すフローチヤートのステツプＳＰ３からステツプＳＰ１４までと同じ動作を行つている。
【００３９】
また上述の実施例においては、本発明を携帯電話装置に適用した場合について述べたが、本発明はこれに限らず、音声の送受信手段を備える通信端末装置に本発明を適用しても上述の場合と同様の効果を得ることができる。
【００４０】
【発明の効果】
上述のように本発明によれば、入力音声の所定サンプル数をフレームとし、当該フレーム毎に最大振幅を検出し、当該検出した最大振幅に基づいて入力音声が有音か無音かを検出し、その結果、入力音声が有音であると検出した場合には、かかる最大振幅に基づいて入力音声に対する増幅度を決定し、入力音声が無音であると検出した場合には、当該入力音声を増幅しないように増幅度を決定し、当該決定した増幅度に基づいて入力音声を増幅するようにし、入力音声が有音か無音かを検出する際に、直近の所定フレームの最大振幅のうち少なくとも１フレームの最大振幅が所定の有音無音判定しきい値よりも大きい場合、入力音声が有音であると判定し、直近の所定フレーム全ての最大振幅が有音無音判定しきい値以下である場合、入力音声が無音であると判定するようにしたことにより、入力音声が有音か無音かの誤検出を防ぎながら、入力音声が無音である場合、雑音だけが増大することを回避すると共に、当該入力音声が有音である場合、最大振幅に応じて増幅度を適応的に切り換えることができ、かくしてユーザの手間をかけずに入力音声を容易に最適な音量に調整することができる。
【図面の簡単な説明】
【図１】本発明の一実施例による携帯電話装置の構成を示すブロツク図である。
【図２】音量増幅器の内部構成を示すブロツク図である。
【図３】音声増幅方法を示すフローチヤートである。
【図４】平均電力を用いた音声増幅方法を示すフローチヤートである。
【符号の説明】
１……携帯電話装置、２……マイク、３……アナログ／デイジタル変換器、４……切換器、５……操作部、６……制御部、７……音量増幅器、８……スピーチエンコーダ、９……チヤンネルエンコーダ、１０……変調器、１２……アンテナ、１４……復調器、１５……チヤンネルデコーダ、１６……スピーチデコーダ、１７……デイジタル／アナログ変換器、１８……スピーカ、２０……音量増幅部、２１……最大振幅検出部、２２……有音／無音検出部、２３……音量制御部。
[0001]
[Table of contents]
The present invention will be described in the following order.
Technical field of the invention
Prior art
Problems to be solved by the invention
Means for solving the problems
Embodiments of the invention (FIGS. 1 to 4)
Effect of the invention
[0002]
[Technical field of the invention]
The present invention relates to an audio amplifying device, a communication terminal device, and an audio amplifying method, and is suitable for application to, for example, a mobile phone device.
[0003]
[Prior art]
Conventionally, the mobile phone is the most widespread service among mobile communication systems. The recent reduction in the size and weight of mobile phone devices has contributed greatly to lower prices and has driven growth in the number of subscribers. However, various problems have emerged as the number of subscribers has grown. For example, when a user makes a call in a quiet public place, the user's voice may disturb those nearby, and when a user makes a call on a crowded train, the conversation may be overheard by others. In such cases the user speaks quietly so as not to disturb others, but if the quiet voice is transmitted as it is, the other party has difficulty hearing it.
[0004]
To resolve this, a conventional mobile phone device is provided with a changeover switch on the terminal, and the user operates this switch to switch the microphone gain directly to a predetermined amplification degree. A quiet voice is thereby amplified to a volume that the other party can hear easily before it is transmitted.
[0005]
[Problems to be solved by the invention]
In a mobile phone device of this configuration, however, the user must operate the changeover switch and adjust the amplification degree according to the state of the call. Operating the switch while talking is difficult, and in practice it is rarely done. For example, suppose the user has set the changeover switch to the high-amplification side in advance in order to speak quietly. When no voice is being input, such as while the user is listening to the other party, the user is unlikely to go to the trouble of switching to the low-amplification side and will normally continue the call as it is.
[0006]
Thus, when no voice is input to the terminal and only ambient noise is input, the conventional mobile phone device simply amplifies the ambient noise and transmits it to the other party. Furthermore, when the changeover switch is set to the high-amplification side so that the user can speak quietly, and a sudden loud noise occurs nearby during the call and enters the terminal, the conventional mobile phone device amplifies the loud noise together with the voice, which may cause the other party discomfort.
[0007]
The present invention has been made in view of the above points, and proposes an audio amplifying device, a communication terminal device, and an audio amplifying method capable of adaptively switching the amplification degree of the voice and adjusting it to an optimum volume.
[0008]
[Means for Solving the Problems]
To solve this problem, in the present invention a predetermined number of samples of the input voice is treated as a frame, and the maximum amplitude is detected for each frame. Whether the input voice contains sound or is silent is detected on the basis of the detected maximum amplitude. When the input voice is detected as containing sound, the amplification degree for the input voice is determined on the basis of the maximum amplitude; when it is detected as silent, the amplification degree is determined so that the input voice is not amplified. The input voice is then amplified according to the determined amplification degree. In detecting whether the input voice contains sound or is silent, the input voice is judged to contain sound when the maximum amplitude of at least one of the most recent predetermined frames is greater than a predetermined sound/silence determination threshold, and is judged to be silent when the maximum amplitudes of all of the most recent predetermined frames are equal to or less than that threshold.
Accordingly, the present invention prevents erroneous sound/silence detection, avoids amplifying noise alone when the input voice is silent, and adaptively switches the amplification degree according to the maximum amplitude when the input voice contains sound.
[0009]
Also, in the present invention, when the input voice is detected as containing sound, it is determined whether or not the maximum amplitude is greater than a predetermined maximum amplitude threshold. When the maximum amplitude is equal to or less than the maximum amplitude threshold, the amplification degree is set to a predetermined amplification degree; when the maximum amplitude is greater than the maximum amplitude threshold, the predetermined amplification degree is reduced according to the degree to which the maximum amplitude exceeds the threshold.
Therefore, in the present invention, the input voice can be amplified so that clipping noise does not occur.
[0010]
Further, in the present invention, when the amplification degree is determined, the determined amplification degree is sequentially smoothed for each sample in the frame of the input speech according to a predetermined arithmetic expression.
Therefore, in the present invention, it is possible to avoid the occurrence of discontinuity of the input voice.
[0011]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
[0012]
In FIG. 1, reference numeral 1 denotes a mobile phone device as a whole, which consists broadly of a transmission circuit and a reception circuit. First, in the transmission system, the audio signal S1 input to the microphone 2 during a call is supplied to the analog/digital converter 3. The analog/digital converter 3 converts the audio signal S1 into audio data S_i and outputs it to the switch 4.
[0013]
Here, when the user performs a volume amplification operation on the operation unit 5, the operation unit 5 sends an operation signal S3 representing that operation to the control unit 6. In response to the operation signal S3, the control unit 6 outputs a control signal S4 to the switch 4 and thereby switches the switch 4 to the volume amplifier 7 side. Thus, when the volume amplification operation is performed, the audio data S_i is supplied to the volume amplifier 7. Conversely, when the user performs an operation on the operation unit 5 to cancel volume amplification, the control unit 6 outputs the control signal S4 and switches the switch 4 to the speech encoder 8 side. Thus, when volume amplification is cancelled, the audio data S_i is supplied directly to the speech encoder 8 without passing through the volume amplifier 7 and is not amplified.
[0014]
The volume amplifier 7 amplifies the audio data S_i by the processing described later and outputs the result to the speech encoder 8 as audio data S_o. The speech encoder 8 encodes the audio data S_i or S_o by a predetermined method and outputs the resulting transmission data S6 to the channel encoder 9. The channel encoder 9 performs data processing on the transmission data S6 in units of frames, such as adding an error correction code, and outputs the resulting transmission data S7 to the modulator 10.
[0015]
The modulator 10 performs predetermined modulation on the transmission data S7, and outputs a transmission signal S8 obtained as a result to the high-frequency processing circuit (TX) 11 for transmission. The high-frequency processing circuit 11 for transmission converts the frequency of the transmission signal S8 into a high-frequency band and supplies the transmission signal S9 obtained as a result to the antenna 12. As a result, the transmission signal S9 is transmitted from the antenna 12.
[0016]
Subsequently, in the reception system, the reception signal S10 received by the antenna 12 is input to a high frequency processing circuit (RX) 13 for reception. The high-frequency processing circuit 13 for reception converts the frequency of the reception signal S10 into a low frequency band and outputs the reception signal S11 obtained as a result to the demodulator 14. The demodulator 14 obtains reception data S12 by demodulating the reception signal S11, and outputs the reception data S12 to the channel decoder 15.
[0017]
The channel decoder 15 performs predetermined data processing on the reception data S12 in units of frames (for example, detecting and masking code errors that occurred during transmission) and outputs the resulting reception data S13 to the speech decoder 16. The speech decoder 16 decodes the reception data S13 into audio data S14 and outputs the audio data S14 to the digital/analog converter 17. The digital/analog converter 17 converts the audio data S14 into an audio signal S15, which is then output as sound through the speaker 18.
[0018]
Here, the volume amplifier 7 described above will be explained in detail with reference to FIG. 2. First, in the volume amplifier 7, the audio data S_i is supplied to the volume amplifying unit 20 and the maximum amplitude detection unit 21. The maximum amplitude detection unit 21 treats a predetermined number of samples of the audio data S_i as a frame, so that one frame of the audio data S_i becomes the audio frame data S_i[n]. The maximum amplitude detection unit 21 then detects the maximum amplitude A_max in the audio frame data S_i[n] and outputs the maximum amplitude A_max to the sound/silence detection unit 22 and the volume control unit 23.
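The framing and peak detection performed by the maximum amplitude detection unit 21 can be sketched as follows. This is a minimal illustration only: the function names, the handling of a trailing partial frame, and the use of the absolute sample value as "amplitude" are assumptions of the sketch, not details given in the description.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive frames of frame_len samples.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def max_amplitude(frame):
    """Return A_max, taken here as the largest absolute sample value in a frame."""
    return max(abs(s) for s in frame)


if __name__ == "__main__":
    frames = split_into_frames([1, -5, 3, 2, 0, -1, 7], frame_len=3)
    print([max_amplitude(f) for f in frames])
```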
[0019]
The sound/silence detection unit 22 detects whether sound is present or not on the basis of the maximum amplitude A_max and the sound/silence determination threshold A_mute, and outputs the detection result to the volume control unit 23 as sound/silence flag data S20. The volume control unit 23 determines the amplification factor a_o on the basis of the maximum amplitude A_max and the sound/silence flag data S20, and outputs the amplification factor a_o to the volume amplifying unit 20. The volume amplifying unit 20 amplifies the audio frame data S_i[n] by the amplification factor a_o and outputs the resulting audio frame data S_o[n] to the speech encoder 8.
[0020]
Here, the audio amplification method used in the volume amplifier 7 will be explained in detail with reference to the flowchart shown in FIG. 3. First, in step SP2, entered from step SP1, the maximum amplitude detection unit 21 detects the maximum amplitude A_max from the audio frame data S_i[n] and outputs the maximum amplitude A_max to the sound/silence detection unit 22 and the volume control unit 23.
[0021]
In step SP3, the sound/silence detection unit 22, which stores the maximum amplitudes A_max of the most recent N frames, detects whether sound is present or not by checking whether these maximum amplitudes are greater than the sound/silence determination threshold A_mute, and outputs the detection result to the volume control unit 23 as the sound/silence flag data S20. For example, the sound/silence detection unit 22 judges that sound is present when at least one of the most recent N frames exceeds the sound/silence determination threshold A_mute. This prevents erroneous detection by the sound/silence detection unit 22.
In step SP4, the volume control unit 23 examines the sound/silence flag data S20 to determine whether the frame contains sound or is silent. If the frame is judged to contain sound, the volume control unit 23 proceeds to step SP5; if it is judged to be silent, the volume control unit 23 proceeds to step SP6.
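The decision rule of step SP3 (sound if any of the most recent N frame peaks exceeds A_mute, silence only if all of them are at or below it) can be sketched as below. The class name, the constructor parameters, and the behaviour before N frames have been seen are assumptions made for illustration.

```python
from collections import deque


class SoundSilenceDetector:
    """Keeps the last N frame peaks and flags sound if any exceeds A_mute."""

    def __init__(self, a_mute, n_frames):
        self.a_mute = a_mute
        # deque with maxlen discards the oldest peak automatically
        self.history = deque(maxlen=n_frames)

    def update(self, a_max):
        """Store the newest frame peak and return True for sound, False for silence."""
        self.history.append(a_max)
        return any(a > self.a_mute for a in self.history)


if __name__ == "__main__":
    detector = SoundSilenceDetector(a_mute=100, n_frames=3)
    for peak in [50, 500, 10, 10, 10]:
        print(peak, detector.update(peak))
```

Because a single loud frame keeps the flag set for the following N−1 frames, short dips inside an utterance are not mistaken for silence, which appears to be the misdetection the description aims to prevent.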
[0022]
Here, in the volume control unit 23, the predetermined amplification factor is denoted a_u, and the amplification factor obtained by varying a_u according to the signal level is denoted a_i. Furthermore, the amplification factor obtained by smoothing a_i for each sample is denoted a_o.
If the maximum amplitude A_max is amplified to a value larger than the maximum value of the dynamic range, distortion in which the waveform is cut off near its peak, that is, clipping, occurs. Since clipping is a source of noise, the volume control unit 23 varies the amplification factor a_i so that clipping does not occur.
Further, the volume control unit 23 takes as the maximum amplitude threshold A_TH, at which clipping does not occur, the maximum amplitude A_max that is amplified exactly to the maximum value of the dynamic range by the amplification factor a_u, and uses this maximum amplitude threshold A_TH to determine the amplification factor a_i so that clipping does not occur.
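Read literally, the definition above implies A_TH × a_u = (maximum value of the dynamic range). A small sketch of that relationship follows; the function name and the 16-bit full-scale value used in the example are assumptions, since the description does not state the sample format.

```python
def max_amplitude_threshold(full_scale, a_u):
    """Return A_TH, the input peak that a_u amplifies exactly to full scale.

    full_scale: maximum value of the dynamic range (hypothetical parameter).
    a_u:        predetermined amplification factor.
    """
    return full_scale / a_u


if __name__ == "__main__":
    # Example assuming 16-bit signed samples and a_u = 4 (both hypothetical).
    print(max_amplitude_threshold(32767, 4.0))
```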
[0023]
In step SP5, the volume control unit 23 determines whether or not the maximum amplitude A_max of the audio frame data S_i[n] is greater than the maximum amplitude threshold A_TH at which clipping does not occur. If the maximum amplitude A_max is determined not to be greater than the maximum amplitude threshold A_TH, the volume control unit 23 proceeds to step SP7; if the maximum amplitude A_max is determined to be greater than the maximum amplitude threshold A_TH, it proceeds to step SP8.
[0024]
In step SP6, the volume control unit 23 sets the amplification factor a_i to “1” so that no amplification is performed when there is no sound. In step SP7, the volume control unit 23 sets the amplification factor a_i to the amplification factor a_u, so that amplification is performed with the amplification factor a_u. In step SP8, the volume control unit 23 obtains the amplification factor a_i by the following equation
[Equation 1]
and thereby prevents clipping. In this way, when the maximum amplitude A_max is less than or equal to the maximum amplitude threshold A_TH, the amplification factor a_i is set to the amplification factor a_u, and when the maximum amplitude A_max is greater than the maximum amplitude threshold A_TH, the amplification factor a_i is made smaller than the amplification factor a_u.
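Steps SP4 to SP8 can be sketched as one decision function. Equation (1) itself is not reproduced in this text, so the expression used for the SP8 branch, a_i = a_u × A_TH / A_max, is an assumption: it is the form that follows if A_TH is the peak that a_u amplifies exactly to full scale, and it keeps the amplified peak at full scale. The function and parameter names are likewise hypothetical.

```python
def decide_gain(a_max, is_sound, a_u, a_th):
    """Return the per-frame amplification factor a_i.

    SP6: silence          -> 1 (no amplification)
    SP7: A_max <= A_TH    -> a_u
    SP8: A_max >  A_TH    -> a_u * A_TH / A_max (assumed form of equation (1))
    """
    if not is_sound:
        return 1.0
    if a_max <= a_th:
        return a_u
    # Assumed reduction: the amplified peak a_i * A_max equals a_u * A_TH,
    # i.e. the maximum value of the dynamic range.
    return a_u * a_th / a_max


if __name__ == "__main__":
    print(decide_gain(100.0, False, 4.0, 8000.0))
    print(decide_gain(4000.0, True, 4.0, 8000.0))
    print(decide_gain(16000.0, True, 4.0, 8000.0))
```

With this assumed form the gain falls continuously from a_u as A_max rises above A_TH, which matches the statement that a_i is reduced according to how far the threshold is exceeded.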
[0025]
In step SP9, the volume control unit 23 sets the counter k to “0”.
If the value of the amplification factor a_i differs greatly from one sample to the next, discontinuities may arise in the voice. To resolve this, in step SP10 the volume control unit 23 smooths the amplification factor a_i by the following equation
[Equation 2]
to obtain the amplification factor a_o, and outputs the amplification factor a_o to the volume amplifying unit 20. Here, the amplification factor a_o on the right-hand side is the previously obtained value, that is, the amplification factor a_o of the preceding sample. m is a constant of 1 or less; in this case, for example, a value of about 1/40 is used.
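The smoothing of step SP10 can be sketched as a first-order recursive update. Equation (2) is not reproduced in this text; the form below, a_o ← a_o + m (a_i − a_o), which is equivalent to (1 − m) a_o + m a_i, is an assumption chosen to match the description (previous a_o on the right-hand side, constant m ≤ 1, about 1/40).

```python
def smooth_gain(a_o_prev, a_i, m=1 / 40):
    """Move the applied gain a_o a fraction m of the way toward the target a_i.

    Assumed form of equation (2): a_o = a_o_prev + m * (a_i - a_o_prev).
    """
    return a_o_prev + m * (a_i - a_o_prev)


if __name__ == "__main__":
    a_o = 1.0
    for _ in range(5):
        a_o = smooth_gain(a_o, 4.0)  # gain creeps from 1 toward 4
        print(round(a_o, 4))
```

A smaller m gives a slower, smoother transition between frame gains; m = 1 would apply a_i immediately with no smoothing.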
[0026]
In step SP11, the volume amplifying unit 20 calculates the audio frame data S_o[k] from the audio frame data S_i[k] and the amplification factor a_o by multiplying the two together, as shown in the following equation
[Equation 3]
S_o[k] = a_o × S_i[k] …… (3)
In step SP12, the volume control unit 23 adds “1” to the counter k, thereby incrementing the counter k.
[0027]
In step SP13, the volume control unit 23 determines whether or not the counter k matches “n−1”, that is, whether or not n samples have been amplified. If it is determined that n samples have been amplified, the volume control unit 23 moves to step SP14 and ends the process; if it is determined that they have not, it returns to step SP10 and repeats the operation.
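The per-sample loop of steps SP9 to SP13 can be sketched as follows. The sketch processes all n samples of the frame (k = 0 to n−1), uses the assumed smoothing form a_o ← a_o + m (a_i − a_o), and carries a_o over from the previous frame; the function name, the carry-over, and the absence of any output saturation are assumptions of this illustration.

```python
def amplify_frame(frame, a_i, a_o_prev, m=1 / 40):
    """Amplify one frame sample by sample with a smoothed gain.

    frame:    input samples S_i[0..n-1]
    a_i:      target gain decided for this frame
    a_o_prev: smoothed gain left over from the previous sample or frame
    Returns (output samples S_o, final smoothed gain a_o).
    """
    out = []
    a_o = a_o_prev
    for s in frame:                    # SP9, SP12, SP13: k runs over the frame
        a_o = a_o + m * (a_i - a_o)    # SP10: smooth the gain (assumed form)
        out.append(a_o * s)            # SP11: S_o[k] = a_o * S_i[k]
    return out, a_o


if __name__ == "__main__":
    samples_out, gain = amplify_frame([10, 10], a_i=3.0, a_o_prev=1.0, m=0.5)
    print(samples_out, gain)
```

Returning the final a_o lets the caller pass it in as a_o_prev for the next frame, so the gain stays continuous across frame boundaries.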
[0028]
In the above configuration, when the user performs the volume amplification operation on the operation unit 5 in order to speak quietly, the audio signal S1 input to the microphone 2 is supplied to the analog/digital converter 3, which converts the audio signal S1 into audio data S_i and outputs it to the volume amplifier 7 via the switch 4. The volume amplifier 7 amplifies the audio data S_i into audio data S_o and outputs it to the speech encoder 8.
[0029]
Here, the sound/silence detection unit 22 of the volume amplifier 7 detects whether sound is present or not by checking whether or not the maximum amplitude A_max is greater than the sound/silence determination threshold A_mute, and outputs the resulting sound/silence flag data S20 to the volume control unit 23. If silence is determined, the volume control unit 23 sets the amplification factor a_i to “1” so that no amplification is performed. Because no amplification is performed when there is no sound, an increase in noise alone can be avoided.
[0030]
The volume control unit 23 determines whether or not the maximum amplitude A_max of the audio frame data S_i[n] is greater than the maximum amplitude threshold A_TH at which clipping does not occur. If the maximum amplitude A_max is determined not to be greater than the maximum amplitude threshold A_TH, the amplification factor a_i is set to the amplification factor a_u; if it is determined to be greater, the amplification factor a_i is reduced, as shown in equation (1) above, according to the degree to which the maximum amplitude A_max exceeds the maximum amplitude threshold A_TH. Therefore, when the maximum amplitude A_max is less than or equal to the maximum amplitude threshold A_TH, the amplification factor a_i is set to the amplification factor a_u, and when the maximum amplitude A_max is greater than the maximum amplitude threshold A_TH, the amplification factor a_i is reduced below the amplification factor a_u according to the degree of excess. Because the amplification factor a_i decreases as the input voice becomes louder, clipping is suppressed and noise is not generated.
[0031]
The volume amplifying unit 20 obtains the audio frame data S_o[k] from the audio frame data S_i[k] and the amplification factor a_o by equation (3) above, and outputs the audio frame data S_o[k] to the speech encoder 8. The volume amplifying unit 20 can thereby amplify a quiet voice to a volume that the other party can hear easily.
[0032]
Because the amplification degree is determined according to the signal level of the input voice in this way, the amplification degree can be switched adaptively to suit the signal level. As a result, the mobile phone device 1 can amplify the voice without increasing the ambient noise and can adjust easily to an optimum volume without effort on the part of the user, so that usability is improved considerably compared with the prior art.
[0033]
According to the above configuration, the amplification degree is determined according to the signal level of the input voice, so that the amplification degree can be switched adaptively to suit the signal level. A mobile phone device 1 that adjusts easily to an optimum volume without effort on the part of the user can thus be realized.
[0034]
In the above-described embodiment, the case where the maximum amplitude A_max of the audio frame data S_i[n] is detected directly has been described. However, the present invention is not limited to this, and the same effect as described above can be obtained when the maximum amplitude is derived from the average power per frame.
[0035]
Here, the case where the average power per frame is used will be described with reference to FIG. 4, in which parts corresponding to those in FIG. 3 are given the same reference symbols. First, in step SP21, entered from step SP20, the maximum amplitude detection unit 21 obtains the average power P_FRAME per frame from the audio frame data S_i[n] by the following equation
[Equation 4]
[0036]
In step SP22, the maximum amplitude detection unit 21 obtains the average amplitude A_av(FRAME) per frame from the average power P_FRAME by the following equation
[Equation 5]
[0037]
In step SP23, the maximum amplitude detection unit 21 obtains the estimated maximum amplitude A_es(FRAME) from the crest factor a_h, which is the ratio of the maximum amplitude to the average amplitude A_av(FRAME), by the following equation
[Equation 6]
[0038]
In this way, the maximum amplitude detection unit 21 calculates the average power per frame by dividing the integral of the sampling data for one frame by the number of samples. The maximum amplitude detection unit 21 then takes the square root of this average power to obtain the average amplitude, and divides the average amplitude by the crest factor to obtain the maximum amplitude. When this processing is completed, the process proceeds to step SP3, and from step SP3 to step SP14 the volume amplifier 7 switches the amplification degree according to the maximum amplitude calculated from the average power and amplifies the input voice by that amplification degree. That is, the volume amplifier 7 performs the same operations as steps SP3 to SP14 of the flowchart shown in FIG. 3.
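Steps SP21 to SP23 can be sketched as below. Equations (4) to (6) are not reproduced in this text, so the forms used are assumptions: the average power is taken as the mean of the squared samples, the average amplitude as its square root, and the estimate as the average amplitude multiplied by a crest factor defined as peak over average, as in paragraph [0037]. Paragraph [0038] words the last step as a division, which would correspond to defining the factor as the reciprocal ratio; the function and parameter names are hypothetical.

```python
import math


def estimate_max_amplitude(frame, crest_factor):
    """Estimate a frame's peak amplitude from its average power.

    crest_factor is assumed here to be (maximum amplitude) / (average amplitude).
    """
    n = len(frame)
    p_frame = sum(s * s for s in frame) / n   # SP21: assumed form of equation (4)
    a_av = math.sqrt(p_frame)                 # SP22: assumed form of equation (5)
    return a_av * crest_factor                # SP23: assumed form of equation (6)


if __name__ == "__main__":
    print(estimate_max_amplitude([3, -3, 3, -3], crest_factor=2.0))
```

The estimate then takes the place of the directly detected A_max in the sound/silence decision and gain selection.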
[0039]
Further, in the above-described embodiment, the case where the present invention is applied to a mobile phone device has been described. However, the present invention is not limited to this, and the same effect as described above can be obtained when the present invention is applied to any communication terminal device provided with means for transmitting and receiving voice.
[0040]
【The invention's effect】
As described above, according to the present invention, a predetermined number of samples of the input sound is taken as a frame, the maximum amplitude is detected for each frame, and whether the input sound is voiced or silent is detected based on the detected maximum amplitude. When the input sound is detected to be voiced, the amplification degree for the input sound is determined based on the maximum amplitude; when the input sound is detected to be silent, the amplification degree is determined so that the input sound is not amplified. The input sound is then amplified based on the determined amplification degree. In detecting whether the input sound is voiced or silent, the input sound is determined to be voiced when the maximum amplitude of at least one of the most recent predetermined frames is greater than a predetermined voiced/silent determination threshold, and is determined to be silent when the maximum amplitudes of all of the most recent predetermined frames are less than or equal to that threshold. As a result, erroneous detection of whether the input sound is voiced or silent is avoided, noise is prevented from being amplified when the input sound is silent, and when the input sound is voiced the amplification degree is switched adaptively according to the maximum amplitude, so that the input sound can easily be adjusted to the optimum volume without any effort by the user.
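The voiced/silent determination and the resulting choice of amplification degree summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `VoicedSilentDetector` and `choose_gain`, the numeric thresholds, the use of unity gain for "not amplified", and the rule `base_gain * max_threshold / max_amplitude` for reducing the amplification degree of loud frames are all hypothetical and are not taken from the embodiment.

```python
from collections import deque


class VoicedSilentDetector:
    """Voiced/silent decision over the most recent frames."""

    def __init__(self, threshold, history_frames):
        self.threshold = threshold
        # Keeps only the maximum amplitudes of the latest `history_frames` frames.
        self.recent = deque(maxlen=history_frames)

    def update(self, max_amplitude):
        """Record one frame's maximum amplitude and return True if voiced."""
        self.recent.append(max_amplitude)
        # Voiced if at least one recent frame exceeds the threshold;
        # silent only if all recent frames are at or below it.
        return any(a > self.threshold for a in self.recent)


def choose_gain(is_voiced, max_amplitude, base_gain, max_threshold):
    """Pick an amplification degree for the current frame."""
    if not is_voiced:
        return 1.0  # ASSUMPTION: "not amplified" is modelled as unity gain
    if max_amplitude <= max_threshold:
        return base_gain
    # ASSUMPTION: reduce the gain in proportion to how far the frame
    # exceeds the maximum amplitude threshold.
    return base_gain * max_threshold / max_amplitude


detector = VoicedSilentDetector(threshold=100, history_frames=3)
decisions = [detector.update(a) for a in [50, 500, 40, 30, 20]]
gains = [
    choose_gain(False, 500, base_gain=4.0, max_threshold=2000),
    choose_gain(True, 1000, base_gain=4.0, max_threshold=2000),
    choose_gain(True, 4000, base_gain=4.0, max_threshold=2000),
]
```

Looking at several recent frames rather than only the current one keeps the decision at "voiced" across short pauses within speech, which is how the erroneous detection mentioned above is avoided.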
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a cellular phone device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an internal configuration of a volume amplifier.
FIG. 3 is a flowchart showing an audio amplification method.
FIG. 4 is a flowchart showing an audio amplification method using average power.
[Explanation of symbols]
1 ... Cellular phone device, 2 ... Microphone, 3 ... Analog/digital converter, 4 ... Switch, 5 ... Operation unit, 6 ... Control unit, 7 ... Volume amplifier, 8 ... Speech encoder, 9 ... Channel encoder, 10 ... Modulator, 12 ... Antenna, 14 ... Demodulator, 15 ... Channel decoder, 16 ... Speech decoder, 17 ... Digital/analog converter, 18 ... Speaker, 20 ... Volume amplifier, 21 ... Maximum amplitude detector, 22 ... Voiced/silent detector, 23.

Claims

An audio amplification device that amplifies input sound, comprising:
maximum amplitude detection means for detecting a maximum amplitude for each frame, a frame being a predetermined number of samples of the input sound;
voiced/silent detection means for detecting whether the input sound is voiced or silent based on the maximum amplitude detected by the maximum amplitude detection means;
control means for determining an amplification degree for the input sound based on the maximum amplitude detected by the maximum amplitude detection means when the voiced/silent detection means detects that the input sound is voiced, and for determining the amplification degree so as not to amplify the input sound when the voiced/silent detection means detects that the input sound is silent; and
amplification means for amplifying the input sound based on the amplification degree determined by the control means,
wherein the voiced/silent detection means
determines that the input sound is voiced when the maximum amplitude of at least one frame among the maximum amplitudes of the most recent predetermined frames is greater than a predetermined voiced/silent determination threshold, and determines that the input sound is silent when the maximum amplitudes of all of the most recent predetermined frames are less than or equal to the voiced/silent determination threshold.

The audio amplification device according to claim 1, wherein the maximum amplitude detection means
detects the maximum amplitude based on the signal level within the frame of the input sound.

The audio amplification device according to claim 1, wherein the maximum amplitude detection means
detects the maximum amplitude based on the average power within the frame of the input sound.

The audio amplification device according to claim 1, wherein the control means,
when the voiced/silent detection means detects that the input sound is voiced, determines whether or not the maximum amplitude is larger than a predetermined maximum amplitude threshold, and, according to the determination result, sets the amplification degree to a predetermined amplification degree determined in advance when the maximum amplitude is less than or equal to the maximum amplitude threshold, and reduces the predetermined amplification degree according to the degree by which the maximum amplitude exceeds the maximum amplitude threshold when the maximum amplitude is larger than the maximum amplitude threshold.

The audio amplification device according to claim 1, wherein the control means
sequentially smooths the determined amplification degree for each sample within the frame of the input sound according to a predetermined arithmetic expression.

A communication terminal device that amplifies and transmits input sound, comprising:
maximum amplitude detection means for detecting a maximum amplitude for each frame, a frame being a predetermined number of samples of the input sound;
voiced/silent detection means for detecting whether the input sound is voiced or silent based on the maximum amplitude detected by the maximum amplitude detection means;
control means for determining an amplification degree for the input sound based on the maximum amplitude detected by the maximum amplitude detection means when the voiced/silent detection means detects that the input sound is voiced, and for determining the amplification degree so as not to amplify the input sound when the voiced/silent detection means detects that the input sound is silent; and
amplification means for amplifying the input sound based on the amplification degree determined by the control means,
wherein the voiced/silent detection means
determines that the input sound is voiced when the maximum amplitude of at least one frame among the maximum amplitudes of the most recent predetermined frames is greater than a predetermined voiced/silent determination threshold, and determines that the input sound is silent when the maximum amplitudes of all of the most recent predetermined frames are less than or equal to the voiced/silent determination threshold.

The communication terminal device according to claim 6, wherein the maximum amplitude detection means
detects the maximum amplitude based on the signal level within the frame of the input sound.

The communication terminal device according to claim 6, wherein the maximum amplitude detection means
detects the maximum amplitude based on the average power within the frame of the input sound.

The communication terminal device according to claim 6, wherein the control means,
when the voiced/silent detection means detects that the input sound is voiced, determines whether or not the maximum amplitude is larger than a predetermined maximum amplitude threshold, and, according to the determination result, sets the amplification degree to a predetermined amplification degree determined in advance when the maximum amplitude is less than or equal to the maximum amplitude threshold, and reduces the predetermined amplification degree according to the degree by which the maximum amplitude exceeds the maximum amplitude threshold when the maximum amplitude is larger than the maximum amplitude threshold.

The communication terminal device according to claim 6, wherein the control means
sequentially smooths the determined amplification degree for each sample within the frame of the input sound according to a predetermined arithmetic expression.

An audio amplification method for amplifying input sound, comprising:
a maximum amplitude detection step of detecting a maximum amplitude for each frame, a frame being a predetermined number of samples of the input sound;
a voiced/silent detection step of detecting whether the input sound is voiced or silent based on the maximum amplitude detected in the maximum amplitude detection step;
an amplification degree determination step of determining an amplification degree for the input sound based on the maximum amplitude detected in the maximum amplitude detection step when the input sound is detected to be voiced in the voiced/silent detection step, and of determining the amplification degree so as not to amplify the input sound when the input sound is detected to be silent in the voiced/silent detection step; and
an amplification step of amplifying the input sound based on the amplification degree determined in the amplification degree determination step,
wherein, in the voiced/silent detection step,
the input sound is determined to be voiced when the maximum amplitude of at least one frame among the maximum amplitudes of the most recent predetermined frames is greater than a predetermined voiced/silent determination threshold, and the input sound is determined to be silent when the maximum amplitudes of all of the most recent predetermined frames are less than or equal to the voiced/silent determination threshold.

The audio amplification method according to claim 11, wherein, in the maximum amplitude detection step,
the maximum amplitude is detected based on the signal level within the frame of the input sound.

The audio amplification method according to claim 11, wherein, in the maximum amplitude detection step,
the maximum amplitude is detected based on the average power within the frame of the input sound.

The audio amplification method according to claim 11, wherein, in the amplification degree determination step,
when the input sound is detected to be voiced in the voiced/silent detection step, it is determined whether or not the maximum amplitude is larger than a predetermined maximum amplitude threshold, and, according to the determination result, the amplification degree is set to a predetermined amplification degree determined in advance when the maximum amplitude is less than or equal to the maximum amplitude threshold, and the predetermined amplification degree is reduced according to the degree by which the maximum amplitude exceeds the maximum amplitude threshold when the maximum amplitude is larger than the maximum amplitude threshold.

The audio amplification method according to claim 11, wherein, in the amplification degree determination step,
the determined amplification degree is sequentially smoothed for each sample within the frame of the input sound according to a predetermined arithmetic expression.