JP3854188B2 - Audio signal processing device

Info

Publication number: JP3854188B2
Application number: JP2002122863A
Authority: JP
Inventors: 隆小原; 公生三関; 岳彦井阪
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-04-24
Filing date: 2002-04-24
Publication date: 2006-12-06
Anticipated expiration: 2022-04-24
Also published as: JP2003316400A

Description

【０００１】
【発明の属する技術分野】
本発明は、一般的には例えば携帯電話等の移動通信分野のディジタル音声通信方式に適用する音声信号処理装置に関し、特に、音声符号化処理でのノイズ抑圧機能およびエコー抑圧機能に関する。
【０００２】
【従来の技術】
一般的に、例えば携帯電話などの移動通信分野では、ディジタル音声通信方式が応用されている。ディジタル音声通信方式では、音声データを圧縮して伝送するために、音声符号化（圧縮符号化）方式が利用されている。
【０００３】
移動通信分野では、代表的な音声符号化方式としてＣＥＬＰ（ＣｏｄｅＥｘｃｉｔｅｄＬｉｎｅａｒＰｒｅｄｉｃｔｉｏｎ）方式と呼ばれる低ビットレート符号化方式が周知である。このような方式により音声符号化を行なう場合に、音声信号だけでなく、高周囲雑音と呼ぶノイズ成分を含む音声信号を符号化することになる。しかし、ノイズ成分やエコー成分を含む音声信号をそのまま符号化すると、品質が劣化した音声符号化データを生成することが知られている。このため、一般的には、音声符号化回路には、ノイズ成分を抑圧した音声信号のみが入力されるように、ノイズキャンセラと呼ぶノイズ抑圧回路が使用されたり、エコー成分を抑圧した音声信号が入力されるように、エコーキャンセラやボイススイッチといったエコー抑圧回路が使用される。
【０００４】
ノイズキャンセラは、例えば音声信号がないとき、即ち周囲ノイズ信号のみ状態を判定して、その特徴を分析し、音声信号とノイズ成分とが混合している区間で当該特徴を用いてノイズ成分を抑圧するように構成されている。エコーキャンセラは、例えば受話側に音声信号が到来しかつ送話側は何も通話していないとき、即ち受話のシングルトーク状態を判定して、受話から送話への回り込みの音響特性を学習し、当該音響特性を用いて送話側の信号に混入したエコー成分を抑圧するように構成されている。ボイススイッチは、例えば受話と送話で信号パワーを比較してパワーの小さい方にロスを入れて、エコー成分を抑圧するように構成されている。
【０００５】
また、現在の携帯電話で使用されている音声符号化方式は、主として音声信号が存在する帯域に制限されている。近年では、更なる高品質を求めるために、音声信号帯域より広い帯域で音声符号化を行う高域音声符号化方式も標準化されつつある。このような広帯域音声符号化方式においても、ＣＥＬＰ方式を利用することになり、高周囲雑音であるノイズ成分を抑圧するためのノイズキャンセラが必要となったり、エコー成分を抑圧するためのエコーキャンセラやボイススイッチが必要となる。
【０００６】
図１７は、ノイズキャンセラを使用した広帯域音声符号化方式を採用した音声信号処理装置の一般的構成を示すブロック図である。
【０００７】
音声処理装置は、マイクロホン１０に入力された音声信号をＡ／Ｄ変換器１１でディジタル音声信号に変換して、当該ディジタル音声信号から音声符号化データ（ＴＸ）を生成する符号化系と、音声符号化データ（ＲＸ）をＤ／Ａ変換器２１でアナログ信号に変換して、スピーカ２０から音声出力を行なう再生系（復号化系）とに大別される。
【０００８】
符号化系は、マイクロホン１０とＡ／Ｄ変換器１１以外に、ノイズキャンセラ７０、エンコーダ７１及びマルチプレクサ（データ多重部）１４を有する。ノイズキャンセラ７０は、ディジタル音声信号から高周囲雑音であるノイズ成分を抑圧する回路である。エンコーダ７１は、ノイズ成分を抑圧されたディジタル音声信号に対して、所定のアルゴリズム（例えばＣＥＬＰ方式）で圧縮符号化する音声符号化回路である。一方、再生系は、通常ではメモリに格納された音声符号化データを元の音声データに復号化するために、ディマルチプレクサ２３及びデコーダ（音声復号化回路）２２を有する。
【０００９】
ここで、特に広帯域方式のエンコーダ７１は、低域用音声符号化器（Ｌコーダと表記する場合がある）７００と、高域用音声符号化器（Ｈコーダと表記する場合がある）７０１とに分かれている。ところで、ノイズキャンセラ７０を経由したディジタル音声信号は、音声信号としてパワーがなく情報的にもさほど重要でない高域音声信号成分と、その他の低域音声信号成分とに分けられる。ある符号化モード時には、高域の音声信号成分は不要であり、予め音声符号化データから除去する方式がある。このため、マルチプレクサ１４は、低域音声信号成分のみの音声符号化データを出力したり、また高域の音声信号成分も含む音声符号化データを出力する。
【００１０】
【発明が解決しようとする課題】
前述したように、低域用音声符号化器７００と、高域用音声符号化器７０１とに分かれているエンコーダでは、符号化モードに従って、Ｌコーダ７００のみが動作することがある。このような符号化モードでは、ノイズキャンセラ７０は、Ａ／Ｄ変換器１１から出力される全ての帯域のディジタル音声信号に対してノイズ抑圧処理を実行する必要は無く、低域の音声信号成分のみに対するノイズ抑圧処理でよい。
【００１１】
しかしながら、従来の方式では、低域用音声符号化器７００のみが動作するモード時においても、ノイズキャンセラ７０は全ての帯域のディジタル音声信号に対して処理を実行する。ここで、通常では、ノイズキャンセラ７０、エンコーダ７１、及びマルチプレクサ１４は、ディジタル信号プロセッサ（ＤＳＰ）により構成されている。このため、従来の方式では、ＤＳＰに対して、ノイズキャンセラ７０の機能を実現する上で、過大なデータ処理量やメモリ量が要求されている問題がある。
【００１２】
そこで、本発明の目的は、音声品質の低下を招くことなく、特に符号化系でのノイズキャンセラの機能に要するデータ処理量やメモリ量を削減できるようにして、結果として音声信号処理効率を向上できる音声信号処理装置を提供することにある。
【００１３】
図１８はエコーキャンセラ７２を使用した広帯域音声符号化方式を採用した音声信号処理装置の一般的構成を示すブロック図であるが、Ｌコーダ７００のみが動作する場合は、低域の音声信号成分のみに対するエコー抑圧処理でよく、同様に、データ処理量やメモリ量の削減が望まれる。
【００１４】
また、図１９のようにボイススイッチ７３についても同様のことが望まれる。そこで、エコー抑圧機能に要するデータ処理量やメモリ量についても、これらを削減して音声信号処理効率を向上することを本発明の目的とする。
【００１５】
【課題を解決するための手段】
本発明の観点は、特に広帯域の音声符号化回路（エンコーダ）とノイズキャンセラとを有する音声信号処理装置において、当該エンコーダに含まれる高域用音声符号化器を動作させないモード時には、高域用ノイズキャンセラ機能を無効する音声信号処理装置に関する。換言すれば、低域用音声符号化器のみを動作させるモード時には、低域用ノイズキャンセラ機能を有効にする音声信号処理装置である。
【００１６】
本発明の観点に従った音声信号処理装置は、ディジタル音声信号を符号化する音声信号処理装置において、前記ディジタル音声信号を高域成分の信号と低域成分の信号に分割する分割手段と、前記高域成分の信号の符号化を指示する動作モード信号に応じて、前記高域成分の信号を符号化する第１の符号化手段と、前記低域成分の信号を符号化する第２の符号化手段と、前記第１及び第２の符号化手段で符号化される前に、前記ディジタル音声信号に含まれるノイズ成分を抑圧する第１の抑圧手段と、前記第２の符号化手段で符号化される前に、前記低域成分の信号に含まれるノイズ成分を抑圧する第２の抑圧手段と、前記動作モード信号により前記高域成分の信号の符号化が指示されない場合には、前記ディジタル音声信号に対して前記第２の符号化手段により前記低域成分の信号を符号化し、かつ前記第１の抑圧手段を動作させないように制御する制御手段とを備えた構成である。
【００１７】
このような構成により、高域音声信号成分に対する音声符号化処理を実行せずに、低域音声信号成分のみに対する音声符号化処理を実行するときに、低域音声信号成分に対してのみノイズ抑圧処理を実行できる。従って、例えばＤＳＰによりノイズ抑圧処理を実行するような構成では、高域音声符号化処理を実行しないモード時には、ノイズキャンセラの機能に要するデータ処理量やメモリ量を削減することができる。従って、結果として音声信号処理効率を向上できる音声信号処理装置を提供できる。
【００１８】
また、別の本発明の観点は、広帯域のエンコーダとエコー抑圧手段（エコーキャンセラ、ボイススイッチ）とを有する音声信号処理装置において、当該エンコーダに含まれる高域用音声符号化器を動作させないモード時には、高域用エコー抑圧手段の機能を無効する音声信号処理装置に関する。換言すれば、低域用音声符号化器のみを動作させるモード時には、低域用エコー抑圧手段の機能を有効にする音声信号処理装置である。
【００１９】
本発明の観点に従った音声信号処理装置は、ディジタル音声信号を高域成分の信号と低域成分の信号に分割する分割手段と、前記高域成分の信号の符号化を指示する動作モード信号に応じて、前記高域成分の信号を符号化する第１の符号化手段と、前記低域成分の信号を符号化する第２の符号化手段と、受話音声信号に起因して生じ、前記ディジタル音声信号に含まれるエコー成分を抑圧する第１の抑圧手段と、前記受話音声信号に起因して生じ、前記低域成分の信号に含まれるエコー成分を抑圧する第２の抑圧手段と、前記動作モード信号により前記高域成分の信号の符号化が指示されない場合には、前記ディジタル音声信号に対して前記低域成分の信号を符号化し、かつ前記第１の抑圧手段を動作させないように制御する制御手段とを備えた構成である。
【００２０】
このような構成により、高域音声信号成分に対する音声符号化処理を実行せずに、低域音声信号成分に対してのみ音声符号化処理を実行するときに、低域音声信号成分に対してのみエコー抑圧処理を実行できる。従って、例えばＤＳＰによりエコー抑圧処理を実行するような構成では、高域音声符号化処理を実行しないモード時には、エコーキャンセラやボイススイッチの機能に要するデータ処理量やメモリ量を削減することができる。従って、結果として音声信号処理効率を向上できる音声信号処理装置を提供できる。
【００２１】
【発明の実施の形態】
本発明の主要構成は、図１６の（Ａ）〜（Ｄ）に示すように４つのパターンに分類される。図１６（Ａ）は、帯域分割手段によって符号化系の信号を帯域分割した後に低域信号に対して補正を行った後に、高域と低域の各々を符号化する。（Ｂ）は帯域分割手段によって符号化系の信号を帯域分割して、さらに高域と低域の各々を符号化した後に高域符号について補正を行う。（Ｃ）は（Ａ）において低域符号化系の信号を補正する際に、低域復号化後の信号も参照する。（Ｄ）は（Ｂ）において高域符号化系の信号を補正する際に、高域復号化後の信号も参照する。
【００２２】
以上のような構成パターンとすることで、帯域分割前よりも低いサンプリングレートで補正処理を行うことができ、データ処理量やメモリ量を削減することができる。
【００２３】
これを踏まえた上で、以下図面を参照して、本発明の実施の形態を説明する。
【００２４】
（第１の実施形態）
図１は、第１の実施形態に関する音声信号処理装置の要部を示すブロック図である。
【００２５】
本装置は、図１に示すように、大別してディジタル音声信号から音声符号化データ（ＴＸ）を生成する符号化系と、通常ではメモリ１５に格納された音声符号化データ（ＲＸ）を元の音声信号に復号化する再生系（復号化系）とから構成される。
【００２６】
復号化系は、マイクロホン１０に入力された音声信号をディジタル音声信号に変換するＡ／Ｄ変換器１１と、ノイズキャンセラ１２と、エンコーダ１３と、マルチプレクサ（データ多重部）１４とを有する。一方、再生系は、スピーカ２０と、Ｄ／Ａ変換器２１と、デコーダ（音声復号化回路）２２と、ディマルチプレクサ２３とを有する。なお、再生系は、図１に示す従来のものと同様のため説明を省略する。また、符号化系において、ノイズキャンセラ１２、エンコーダ１３、及びマルチプレクサ１４は、通常では、ディジタル信号プロセッサ（ＤＳＰ）により構成されている。
【００２７】
エンコーダ１３は、ディジタル音声信号に対して、所定のアルゴリズム（例えばＣＥＬＰ方式）で圧縮符号化処理して、音声符号化データを生成する音声符号化回路である。エンコーダ１３は広帯域方式の音声符号化回路であり、低域用音声符号化器１３０と、高域用音声符号化器（Ｈコーダと表記する場合がある）１３１とに分かれている。マルチプレクサ１４は、エンコーダ１３により生成された音声符号化データを、伝送路、モデム部または誤り訂正部等の特性に応じた形態に変換してメモリ１５に出力する。
【００２８】
ノイズキャンセラ１２は、エンコーダ１３の動作モードを設定するモード信号（ＨＭ）に従って、ノイズ抑圧機能の有効又は無効を制御される。このモード信号（ＨＭ）は、例えば携帯電話のＣＰＵから出力される信号であり、高域用音声符号化器（Ｈコーダ）１３１を動作させるか否かを決定する。ここでは、便宜的に、「ＨＭ＝１」のときにＨコーダ１３１を動作させて、また「ＨＭ＝０」のときにＨコーダ１３１を動作させないものと想定する。
【００２９】
ノイズキャンセラ１２は、「ＨＭ＝１」のときには動作して、Ａ／Ｄ変換器１１から出力されたディジタル音声信号に対してノイズ成分を抑圧する。一方、ノイズキャンセラ１２は、「ＨＭ＝０」のときにはノイズ抑圧処理を実行せずに、Ａ／Ｄ変換器１１から出力されたディジタル音声信号（ＶＳ）をそのまま通過させる。
【００３０】
低域用音声符号化器１３０は、図２に示すように、ダウンサンプル部２０１及び低域符号化器（Ｌコーダ）２０２を含むモジュール２００と、ノイズキャンセラ部２０３とを有する。ダウンサンプル部２０１は、Ａ／Ｄ変換器１１から出力されるディジタル音声信号（ＶＳ）に対して低域処理を行うために所定のサンプル数を削減するようにダウンサンプルする。
【００３１】
ノイズキャンセラ部２０３は、「ＨＭ＝０」のときには、ダウンサンプル部２０１でダウンサンプルされたディジタル音声信号（ＶＳ）に対するノイズ抑圧処理を実行して、Ｌコーダ２０２に出力する。一方、「ＨＭ＝１」のときには、ノイズキャンセラ部２０３は、ダウンサンプル部２０１でダウンサンプルされたディジタル音声信号（ＶＳ）に対するノイズ抑圧処理を実行せずに、そのままＬコーダ２０２に通過させる。
【００３２】
（第１の実施形態の動作）
以下図１及び図２を参照して、本実施形態の符号化系の動作を説明する。
【００３３】
例えば携帯電話のＣＰＵからモード信号ＨＭが出力されて、エンコーダ１３の動作モード（ＨＭ＝１／０）が設定される。Ａ／Ｄ変換器１１は、マイクロホン１０に入力された音声信号をディジタル音声信号に変換する。
【００３４】
ここで、高域用音声符号化器（Ｈコーダ）１３１を動作させる動作モードが設定された場合を想定する（ＨＭ＝１）。ノイズキャンセラ１２は、「ＨＭ＝１」のときには動作して、Ａ／Ｄ変換器１１から出力されたディジタル音声信号に対してノイズ成分を抑圧した後に、エンコーダ１３に出力する。
【００３５】
エンコーダ１３では、Ｈコーダ１３１は高域音声信号に対する符号化処理を実行する。一方、低域用音声符号化器１３０では、「ＨＭ＝１」のときには、ノイズキャンセラ部２０３は、ダウンサンプル部２０１でダウンサンプルされたディジタル音声信号（ＶＳ）に対するノイズ抑圧処理を実行せずに、そのままＬコーダ２０２に通過させる。但し、ダウンサンプルされたディジタル音声信号（ＶＳ）は、前段のノイズキャンセラ１２によりノイズ抑圧処理されている。Ｈコーダ１３１及びＬコーダ２０２の各出力（音声符号化データ）は、マルチプレクサ１４により多重化されてメモリ１５に格納される。
【００３６】
一方、高域用音声符号化器（Ｈコーダ）１３１を動作させない動作モードを設定された場合を想定する（ＨＭ＝０）。ノイズキャンセラ１２は、「ＨＭ＝０」のときにはノイズ抑圧処理を実行せずに、Ａ／Ｄ変換器１１から出力されたディジタル音声信号（ＶＳ）をそのまま通過させる。Ｈコーダ１３１は非動作状態である。
【００３７】
低域用音声符号化器１３０では、「ＨＭ＝０」のときには、ノイズキャンセラ部２０３は、ダウンサンプル部２０１でダウンサンプルされたディジタル音声信号（ＶＳ）に対するノイズ抑圧処理を実行して、Ｌコーダ２０２に出力する。Ｌコーダ２０２は、低域用の音声符号化データを生成してマルチプレクサ１４に出力する。
【００３８】
以上のように本実施形態によれば、符号化系の動作モードがＨコーダ１３１を動作させない場合（ＨＭ＝０）、エンコーダ１３の前段に設けられたノイズキャンセラ１２も動作しない状態となる。従って、Ａ／Ｄ変換器１１から出力されたディジタル音声信号（ＶＳ）をそのまま通過して、エンコーダ１３の低域用音声符号化器１３０に与えられる。
【００３９】
低域用音声符号化器１３０では、「ＨＭ＝０」のときには、ノイズキャンセラ部２０３は動作状態になり、ダウンサンプル部２０１でダウンサンプルされたディジタル音声信号（ＶＳ）に対するノイズ抑圧処理を実行して、Ｌコーダ２０２に出力する。これにより、低域用音声符号化器１３０は、ノイズ成分が抑圧された低域用ディジタル音声信号から低域用音声符号化データを生成する。
【００４０】
従って、高域用音声符号化器１３１を動作させない動作モード時には、エンコーダ１３の前段に設けられたノイズキャンセラ１２を非動作状態にするため、当該ノイズキャンセラの機能に必要なＤＳＰでのデータ処理量やメモリ量を削減することができる。一方、低域用音声符号化器１３０では、低域用のノイズキャンセラ部２０３が機能するため、音声品質の劣化を招くことなく、低域用音声符号化データを生成することができる。この場合、低域用のノイズキャンセラ部２０３は、ダウンサンプルされた（サンプル数が削減された）ディジタル音声信号に対してノイズ抑圧処理を実行する。従って、ノイズキャンセラ部２０３の機能に必要なＤＳＰでのデータ処理量やメモリ量は、高域のノイズキャンセラ１２を機能させる場合と比較して相対的に削減することができる。
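[0041]
(Second embodiment)
FIG. 3 is a block diagram showing the main part of an audio signal processing apparatus according to the second embodiment.
[0042]
The encoding system of this embodiment has no noise canceller in front of the encoder. Instead, it includes an encoder 30 having a low-frequency speech encoder 300 that contains a low-frequency noise canceller (LNC) and a high-frequency speech encoder 301 that contains a high-frequency noise canceller (HNC). The reproduction system (decoding system) is the same as in the first embodiment (see FIG. 1), so its description is omitted.
[0043]
In the encoder 30, the low-frequency speech encoder 300 has, as shown in FIG. 4, a low-frequency encoder (L coder) 400, a down-sampling unit 401, and a low-frequency noise canceller unit (LNC) 402. The down-sampling unit 401 down-samples the digital audio signal (VS) output from the A/D converter 11, reducing the number of samples by a predetermined amount for low-frequency processing. The LNC 402 performs noise suppression on the down-sampled digital audio signal (VS), mainly suppressing low-frequency high ambient noise. The L coder 400 generates low-frequency encoded speech data from the digital audio signal (the down-sampled signal) whose noise has been suppressed by the LNC 402 and outputs it to the multiplexer 14.
[0044]
The high-frequency speech encoder 301, on the other hand, has a high-frequency encoder (H coder) 500 and a high-frequency noise canceller unit (HNC) 501. Whether the H coder 500 operates is determined by the operation mode (HM = 1/0) set by the mode signal HM described above. That is, when "HM = 1", the H coder 500 operates and encodes the high-frequency audio signal of the digital audio signal (VS) output from the A/D converter 11. The HNC 501 performs noise suppression that suppresses high-frequency high ambient noise. The outputs (encoded speech data) of the HNC 501 and the L coder 400 are multiplexed by the multiplexer 14 and stored in the memory 15.
[0045]
When "HM = 0", the H coder 500 is inactive. In this operation mode, only the low-frequency speech encoder 300 operates and sends the encoded speech data output by the L coder 400 to the multiplexer 14.
[0046]
As described above, according to this embodiment, when the operation mode of the encoding system does not operate the H coder 500 (HM = 0), the high-frequency speech encoder 301 is inactive and only the low-frequency speech encoder 300 operates. Accordingly, when "HM = 0", only the LNC 402 contained in the low-frequency speech encoder 300 operates and performs noise suppression on the digital audio signal (VS) down-sampled by the down-sampling unit 401. In the operation mode in which the high-frequency speech encoder 301 is not operated, the data processing amount and memory amount in the DSP needed for the noise canceller function can therefore be reduced.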
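[0047]
(VAD function)
As shown in FIG. 4, the low-frequency speech encoder 300 has a VAD (Voice Activity Detection) function that determines from the digital audio signal (VS) whether the input is speech or silence, and it outputs a predetermined flag (VADF) to the high-frequency speech encoder 301 when silence is detected.
[0048]
In the high-frequency speech encoder 301, the output of the H coder 500 is encoded speech data that mainly concerns the high-frequency gain of the audio signal. The HNC 501 is a noise suppression unit that cancels noise in a simplified way by processing this encoded speech data. During silence (VADF = 0), the HNC 501 judges the high-frequency gain to be the gain of the noise signal, subtracts a value corresponding to that gain from the output signal of the H coder 500, and outputs the result to the multiplexer 14. During speech (VADF = 1), the HNC 501 subtracts the value that was subtracted during silence (VADF = 0) from the input of the H coder 500 and outputs the result to the multiplexer 14.
[0049]
In the low-frequency speech encoder 300, the VAD function is provided inside the L coder 400. Specifically, as shown in FIG. 5(A), the L coder 400 has a VAD unit 50, a speech coder unit 51, and a silence coder unit 52. The silence coder unit 52 functions when the VAD unit 50 outputs the flag indicating silence (VADF = 0), and the speech coder unit 51 functions when the VAD unit 50 outputs the flag indicating speech (VADF = 1). The VAD unit 50 outputs the flag (VADF = 1/0) to the HNC 501 of the high-frequency speech encoder 301.
[0050]
Alternatively, as shown in FIG. 5(B), the L coder 400 may have a VAD unit 50, a speech coder unit 51, a silence coder unit 52, and a switch unit 53. The switch unit 53 forwards the digital audio signal (VS) to the silence coder unit 52 when the VAD unit 50 outputs the flag indicating silence (VADF = 0).
[0051]
The switch unit 53 forwards the digital audio signal (VS) to the speech coder unit 51 when the VAD unit 50 outputs the flag indicating speech (VADF = 1). The VAD unit 50 outputs the flag (VADF = 1/0) to the HNC 501 of the high-frequency speech encoder 301.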
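The VAD-driven behaviour of the HNC 501 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `HighBandGainNC`, the choice to use the silence-time gain itself as the "value corresponding to that gain", and the floor at zero are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HighBandGainNC:
    """Toy stand-in for the HNC: operates on a coded high-band gain value."""

    noise_gain: float = 0.0  # gain remembered from the last silent frame (assumed unit: dB)

    def process(self, coded_gain: float, vadf: int) -> float:
        if vadf == 0:
            # Silence: the high-band gain is taken to be the noise gain.
            self.noise_gain = coded_gain
        # Subtract the remembered noise value; flooring at zero is an assumption.
        return max(coded_gain - self.noise_gain, 0.0)
```

With this sketch, a silent frame with gain 3.0 yields 0.0, and a following speech frame with gain 10.0 yields 7.0, because the value stored during silence is reused while VADF = 1.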
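[0052]
(Modification)
FIG. 6 is a block diagram of a modification of the second embodiment.
[0053]
In this modification, the operation of the HNC 501 in the high-frequency speech encoder 301 is controlled according to an operation mode signal (MS) from, for example, the CPU of the mobile phone. Specifically, the operation mode signal (MS) is, for example, a signal that sets a mode for processing music signals.
[0054]
In the high-frequency speech encoder 301, when the CPU requests high-frequency encoding of a music signal, the HNC 501 operates in response to the operation mode signal (MS = 1) and performs high-frequency noise suppression suited to music.
[0055]
The operation mode signal (MS) set by the CPU is not limited to music and can also be applied to setting various other modes.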
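[0056]
(Third embodiment)
FIG. 7 is a block diagram showing the main part of an audio signal processing apparatus according to the third embodiment, and FIG. 8 is a block diagram showing the configuration of the low-frequency speech encoder 172 and the low-frequency speech decoder 222 of FIG. 7.
[0057]
As can be seen by comparing FIG. 1 with FIG. 7 and FIG. 2 with FIG. 8, this embodiment is the first embodiment with the noise cancellers replaced by echo cancellers, with an input of the received speech signal (BR signal) from the decoder 22 to the wideband echo canceller 16 added, and with an LBR signal input from the low-frequency speech decoder 222 to the low-frequency speech encoder 172 (echo canceller 204) added.
[0058]
Only one of the echo cancellers 16 and 204 operates at a time: when the high-frequency speech encoder 171 is operating, only the echo canceller 16 operates, and when it is not operating, only the echo canceller 204 operates. When the high-frequency speech encoder 171 is not operating, the data processing amount and memory amount in the DSP needed for the echo canceller function can therefore be reduced.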
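[0059]
(Fourth embodiment)
FIG. 9 is a block diagram showing the main part of an audio signal processing apparatus according to the fourth embodiment, and FIG. 10 is a block diagram showing the configuration of the encoder 31 of FIG. 9.
[0060]
As can be seen by comparing FIG. 3 with FIG. 9 and FIG. 4 with FIG. 10, this embodiment is the second embodiment with the noise cancellers replaced by echo cancellers, with an LBR signal input from the low-frequency speech decoder 222 to the low-frequency speech encoder 312 (low-frequency echo canceller 403) added, and with an HBR signal input from the high-frequency speech decoder 221 to the high-frequency speech encoder 313 (high-frequency echo canceller 502) added.
[0061]
When the high-frequency encoder 500 is not operating, the high-frequency echo canceller 502 is inactive and only the low-frequency echo canceller 403 operates. When the high-frequency encoder 500 is not operating, the data processing amount and memory amount in the DSP needed for the echo canceller function can therefore be reduced.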
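[0062]
(Modification)
FIG. 11 is a block diagram of a modification of the fourth embodiment.
[0063]
In this modification, the operation of the HEC 502 in the high-frequency speech encoder 313 is controlled according to an operation mode signal (RBT) from, for example, the CPU of the mobile phone. Specifically, the operation mode signal (RBT) is, for example, a signal that sets a mode for processing signals with an extremely skewed frequency distribution, such as telephone push tones, ringtone melodies, or alarm sounds.
[0064]
The HEC 502 operates in response to the operation mode signal (RBT = 1), and learning in the HEC 502 and the LEC 403 is stopped.
[0065]
The operation mode signal (RBT) set by the CPU is not limited to push tones, ringtone melodies, or alarm sounds and can also be applied to setting various modes such as encoding modes.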
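The idea of stopping echo-canceller learning while RBT = 1 can be sketched in Python as follows. The patent does not name an adaptation algorithm; the NLMS (normalized least mean squares) filter used here is a common substitute chosen for illustration, and the names `NlmsEchoCanceller` and `learning_enabled`, the tap count, and the step size are all assumptions.

```python
class NlmsEchoCanceller:
    """Illustrative NLMS echo canceller with a switchable learning step."""

    def __init__(self, taps=8, mu=0.5, eps=1e-8):
        self.w = [0.0] * taps   # estimated echo-path coefficients
        self.x = [0.0] * taps   # recent receive-side samples, newest first
        self.mu = mu            # adaptation step size (assumed value)
        self.eps = eps          # guards against division by zero

    def process(self, far, mic, learn=True):
        """Return the transmit-side sample with the estimated echo removed."""
        self.x = [far] + self.x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(self.w, self.x))
        err = mic - echo_est
        if learn:
            norm = sum(xi * xi for xi in self.x) + self.eps
            step = self.mu * err / norm
            self.w = [wi + step * xi for wi, xi in zip(self.w, self.x)]
        return err


def learning_enabled(rbt, receive_single_talk):
    """Learn only during receive-side single talk and while RBT is 0."""
    return rbt == 0 and receive_single_talk
```

Freezing the coefficients while a push tone or ringtone is present keeps a narrow-band signal from distorting the learned echo path; echo suppression itself continues with the coefficients learned earlier.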
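[0066]
The echo cancellers in FIGS. 7 to 10 may also be replaced by voice switches, giving embodiments such as those shown in FIGS. 12 to 15. FIGS. 12 and 13 combine a low-frequency voice switch LVS 81 with a high-frequency voice switch HVS 82, and FIGS. 14 and 15 combine a high-frequency voice switch with a low-frequency voice switch. In either case, operating only the low-frequency voice switch when the high-frequency speech encoder is not operating reduces the data processing amount and memory amount.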
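[0067]
[Effects of the invention]
As described in detail above, according to the present invention, in an audio signal processing apparatus that has a wideband speech encoding circuit (encoder) and at least one of a noise canceller, an echo canceller, and a voice switch, the data processing amount and memory amount required for the noise canceller, echo canceller, or voice switch function, particularly in the encoding system, can be reduced without degrading voice quality. As a result, an audio signal processing apparatus with improved audio signal processing efficiency can be provided.
[0068]
Specifically, when audio encoding is performed only on the low-frequency audio signal component and not on the high-frequency audio signal component, suppression of the noise component or echo component included in the low-frequency audio signal component can be performed. Therefore, in a configuration in which noise or echo suppression is executed by a DSP, for example, the data processing amount and memory amount required for the noise canceller, echo canceller, or voice switch function can be reduced in a mode in which high-frequency audio encoding is not executed.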
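[Brief description of the drawings]
[FIG. 1] A block diagram showing the main part of an audio signal processing apparatus according to the first embodiment of the present invention.
[FIG. 2] A block diagram showing the configuration of the low-frequency speech encoder according to the same embodiment.
[FIG. 3] A block diagram showing the main part of an audio signal processing apparatus according to the second embodiment of the present invention.
[FIG. 4] A block diagram showing the configuration of the encoder according to the same embodiment.
[FIG. 5] A block diagram for explaining the VAD function according to the same embodiment.
[FIG. 6] A block diagram of a modification of the second embodiment.
[FIG. 7] A block diagram showing the main part of an audio signal processing apparatus according to the third embodiment of the present invention.
[FIG. 8] A block diagram showing the configuration of the low-frequency speech encoder according to the same embodiment.
[FIG. 9] A block diagram showing the main part of an audio signal processing apparatus according to the fourth embodiment of the present invention.
[FIG. 10] A block diagram showing the configuration of the encoder according to the same embodiment.
[FIG. 11] A block diagram of a modification of the fourth embodiment.
[FIG. 12] A block diagram showing the main part of an audio signal processing apparatus according to the fifth embodiment of the present invention.
[FIG. 13] A block diagram showing the configuration of the low-frequency speech encoder according to the same embodiment.
[FIG. 14] A block diagram showing the main part of an audio signal processing apparatus according to the sixth embodiment of the present invention.
[FIG. 15] A block diagram showing the configuration of the encoder according to the same embodiment.
[FIG. 16] A block diagram showing the main configuration.
[FIG. 17] A block diagram showing the general configuration of a first conventional audio signal processing apparatus.
[FIG. 18] A block diagram showing the general configuration of a second conventional audio signal processing apparatus.
[FIG. 19] A block diagram showing the general configuration of a third conventional audio signal processing apparatus.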
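[Explanation of reference signs]
1 … Band dividing means
2 … Correction means
3 … Low-frequency encoding means
4 … High-frequency encoding means
5 … Low-frequency decoding means
10 … Microphone
11 … A/D converter
12 … Noise canceller
13, 17, 30, 32, 71 … Encoder (speech encoder)
14 … Multiplexer
15 … Memory
16, 72 … Wideband echo canceller
20 … Speaker
21 … D/A converter
22, 24 … Decoder (audio decoding circuit)
23 … Demultiplexer
50 … VAD unit
73, 82 … Wideband voice switch
80 … High-frequency voice switch
81 … Low-frequency voice switch
130, 173, 202, 242, 320, 400 … Low-frequency encoder (L coder)
131, 171, 241, 321, 500 … High-frequency encoder (H coder)
172, 300, 310 … Low-frequency speech encoder
200, 205 … Module
201, 401 … Down-sampling unit
203 … Noise canceller unit
204 … Low-frequency echo canceller unit
221, 701 … High-frequency decoder (H decoder)
222, 223, 230, 702 … Low-frequency decoder (L decoder)
231 … Up-sampling unit
301, 311 … High-frequency speech encoder
402 … Low-frequency noise canceller unit (LNC)
403 … Low-frequency echo canceller unit (LEC)
501 … High-frequency noise canceller unit (HNC)
502 … High-frequency echo canceller unit (HEC)
[0001]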
BACKGROUND OF THE INVENTION
The present invention relates generally to an audio signal processing apparatus applied to a digital audio communication system in the mobile communication field such as a mobile phone, and more particularly to a noise suppression function and an echo suppression function in audio encoding processing.
[0002]
[Prior art]
In general, in the mobile communication field such as a mobile phone, a digital voice communication system is applied. In the digital voice communication system, a voice coding (compression coding) system is used to compress voice data for transmission.
[0003]
In the mobile communication field, a low-bit-rate coding method called CELP (Code Excited Linear Prediction) is well known as a typical speech coding method. When speech is coded by such a method, the input is often not a clean speech signal but a speech signal that contains a noise component, referred to as high ambient noise. It is known that coding a speech signal that contains noise or echo components as it is produces encoded speech data of degraded quality. For this reason, a noise suppression circuit called a noise canceller is generally used so that only a speech signal with the noise component suppressed is input to the speech coding circuit, and an echo suppression circuit such as an echo canceller or a voice switch is used so that a speech signal with the echo component suppressed is input to it.
[0004]
The noise canceller is configured, for example, to detect the state in which there is no speech signal, that is, in which only ambient noise is present, analyze the characteristics of that noise, and use those characteristics to suppress the noise component in sections where speech and noise are mixed. The echo canceller is configured, for example, to detect the receive-side single-talk state, in which a speech signal arrives on the receiving side while nothing is being spoken on the transmitting side, learn the acoustic characteristics of the leakage path from the receive side to the transmit side, and use those characteristics to suppress the echo component mixed into the transmit-side signal. The voice switch is configured, for example, to compare the signal power of the receive and transmit paths and insert a loss into the path with the smaller power, thereby suppressing the echo component.
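The voice switch is the simplest of the three circuits and can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mean_power` and `voice_switch`, the frame-based processing, the fixed loss factor, and the tie-breaking rule (equal power attenuates the transmit path) are not taken from the patent.

```python
def mean_power(frame):
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def voice_switch(rx_frame, tx_frame, loss=0.1):
    """Insert a loss into whichever direction currently has the smaller power.

    Returns (rx_out, tx_out). The loss factor and the tie-break toward
    attenuating the transmit side are illustrative assumptions.
    """
    if mean_power(rx_frame) >= mean_power(tx_frame):
        return list(rx_frame), [s * loss for s in tx_frame]
    return [s * loss for s in rx_frame], list(tx_frame)
```

When the far-end talker dominates, the transmit frame (which then consists mostly of echo) is attenuated; when the near-end talker dominates, the receive frame is attenuated instead.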
[0005]
In addition, the speech coding methods used in current mobile phones are mainly limited to the band in which the speech signal exists. In recent years, in pursuit of higher quality, wideband speech coding methods that code a band wider than the speech signal band have also been undergoing standardization. Such wideband speech coding methods also use the CELP method, so a noise canceller is needed to suppress the noise component that constitutes high ambient noise, and an echo canceller or voice switch is needed to suppress the echo component.
[0006]
FIG. 17 is a block diagram showing a general configuration of an audio signal processing apparatus adopting a wideband audio encoding method using a noise canceller.
[0007]
The audio processing apparatus is roughly divided into an encoding system, which converts the audio signal input to the microphone 10 into a digital audio signal with the A/D converter 11 and generates encoded audio data (TX) from that digital audio signal, and a reproduction system (decoding system), which converts encoded audio data (RX) into an analog signal with the D/A converter 21 and outputs sound from the speaker 20.
[0008]
The encoding system includes a noise canceller 70, an encoder 71, and a multiplexer (data multiplexing unit) 14 in addition to the microphone 10 and the A / D converter 11. The noise canceller 70 is a circuit that suppresses a noise component that is high ambient noise from a digital audio signal. The encoder 71 is a speech encoding circuit that compresses and encodes a digital speech signal whose noise component is suppressed by a predetermined algorithm (for example, CELP method). On the other hand, the reproduction system usually includes a demultiplexer 23 and a decoder (audio decoding circuit) 22 in order to decode audio encoded data stored in a memory into original audio data.
[0009]
Here, the wideband encoder 71 in particular is divided into a low-frequency speech encoder (sometimes referred to as an L coder) 700 and a high-frequency speech encoder (sometimes referred to as an H coder) 701. The digital audio signal that has passed through the noise canceller 70 can be separated into a high-frequency audio signal component, which carries little power as a speech signal and is not very important in terms of information, and the remaining low-frequency audio signal component. In some encoding modes the high-frequency audio signal component is unnecessary and is removed from the encoded audio data in advance. The multiplexer 14 therefore outputs either encoded audio data containing only the low-frequency audio signal component or encoded audio data that also contains the high-frequency audio signal component.
[0010]
[Problems to be solved by the invention]
As described above, in an encoder divided into the low-frequency speech encoder 700 and the high-frequency speech encoder 701, only the L coder 700 may operate, depending on the encoding mode. In such an encoding mode, the noise canceller 70 does not need to perform noise suppression on the full band of the digital audio signal output from the A/D converter 11; noise suppression on the low-frequency audio signal component alone is sufficient.
[0011]
However, in the conventional method, even in the mode in which only the low-band speech encoder 700 operates, the noise canceller 70 performs processing on digital speech signals in all bands. Here, normally, the noise canceller 70, the encoder 71, and the multiplexer 14 are configured by a digital signal processor (DSP). For this reason, the conventional method has a problem that an excessive data processing amount and memory amount are required for the DSP to realize the function of the noise canceller 70.
[0012]
Accordingly, an object of the present invention is to provide an audio signal processing apparatus that reduces the amount of data processing and memory required for the noise canceller function, particularly in the encoding system, without degrading voice quality, and that thereby improves audio signal processing efficiency.
[0013]
FIG. 18 is a block diagram showing a general configuration of an audio signal processing apparatus that adopts a wideband audio encoding system using an echo canceller 72. When only the L coder 700 operates, echo suppression on the low-frequency audio signal component alone is sufficient, so a reduction in the data processing amount and memory amount is likewise desirable.
[0014]
The same is desired for the voice switch 73 shown in FIG. 19. Accordingly, a further object of the present invention is to reduce the data processing amount and memory amount required for the echo suppression function and thereby improve audio signal processing efficiency.
[0015]
[Means for Solving the Problems]
An aspect of the present invention relates to an audio signal processing apparatus that has a wideband speech encoding circuit (encoder) and a noise canceller, and that disables the high-frequency noise canceller function in a mode in which the high-frequency speech encoder included in the encoder is not operated. In other words, in the mode in which only the low-frequency speech encoder is operated, the apparatus enables the low-frequency noise canceller function.
[0016]
An audio signal processing device according to an aspect of the present invention is a device that encodes a digital audio signal and comprises: dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal; first encoding means for encoding the high-frequency component signal in response to an operation mode signal that instructs encoding of the high-frequency component signal; second encoding means for encoding the low-frequency component signal; first suppression means for suppressing a noise component included in the digital audio signal before encoding by the first and second encoding means; second suppression means for suppressing a noise component included in the low-frequency component signal before encoding by the second encoding means; and control means for performing control such that, when the operation mode signal does not instruct encoding of the high-frequency component signal, the low-frequency component signal of the digital audio signal is encoded by the second encoding means and the first suppression means is not operated.
[0017]
With such a configuration, when audio encoding is performed only on the low-frequency audio signal component and not on the high-frequency audio signal component, noise suppression can be performed on the low-frequency audio signal component alone. Therefore, in a configuration in which noise suppression is executed by a DSP, for example, the data processing amount and memory amount required for the noise canceller function can be reduced in a mode in which high-frequency speech encoding is not executed. As a result, an audio signal processing device with improved audio signal processing efficiency can be provided.
[0018]
Another aspect of the present invention relates to an audio signal processing apparatus that has a wideband encoder and echo suppression means (an echo canceller or a voice switch), and that disables the function of the high-frequency echo suppression means in a mode in which the high-frequency speech encoder included in the encoder is not operated. In other words, in the mode in which only the low-frequency speech encoder is operated, the apparatus enables the function of the low-frequency echo suppression means.
[0019]
An audio signal processing apparatus according to this aspect of the present invention comprises: dividing means for dividing a digital audio signal into a high-frequency component signal and a low-frequency component signal; first encoding means for encoding the high-frequency component signal in response to an operation mode signal that instructs encoding of the high-frequency component signal; second encoding means for encoding the low-frequency component signal; first suppression means for suppressing an echo component that arises from a received voice signal and is included in the digital audio signal; second suppression means for suppressing an echo component that arises from the received voice signal and is included in the low-frequency component signal; and control means for performing control such that, when the operation mode signal does not instruct encoding of the high-frequency component signal, the low-frequency component signal of the digital audio signal is encoded and the first suppression means is not operated.
[0020]
With such a configuration, when speech coding is performed only on the low-frequency speech signal component and not on the high-frequency speech signal component, echo suppression can be performed on the low-frequency speech signal component alone. Therefore, in a configuration in which echo suppression is executed by a DSP, for example, the data processing amount and memory amount required for the echo canceller and voice switch functions can be reduced in a mode in which high-frequency speech encoding is not executed. As a result, an audio signal processing device with improved audio signal processing efficiency can be provided.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
The main configuration of the present invention is classified into four patterns, as shown in FIGS. 16(A) to 16(D). In FIG. 16(A), the signal of the encoding system is divided into bands by the band dividing means, the low-band signal is corrected, and then the high band and the low band are each encoded. In (B), the signal of the encoding system is divided into bands by the band dividing means, the high band and the low band are each encoded, and the high-band code is then corrected. In (C), the signal after low-band decoding is also referenced when correcting the low-band encoding-system signal in (A). In (D), the signal after high-band decoding is also referenced when correcting the high-band encoding-system signal in (B).
[0022]
With the configuration pattern as described above, correction processing can be performed at a lower sampling rate than before band division, and the amount of data processing and memory can be reduced.
[0023]
Based on this, an embodiment of the present invention will be described below with reference to the drawings.
[0024]
(First embodiment)
FIG. 1 is a block diagram showing a main part of the audio signal processing apparatus according to the first embodiment.
[0025]
As shown in FIG. 1, the present apparatus is roughly divided into an encoding system, which generates encoded speech data (TX) from a digital speech signal, and a reproduction system (decoding system), which decodes encoded speech data (RX), normally stored in a memory 15, into the original speech signal.
[0026]
The encoding system includes an A / D converter 11 that converts an audio signal input to the microphone 10 into a digital audio signal, a noise canceller 12, an encoder 13, and a multiplexer (data multiplexing unit) 14. The reproduction system, on the other hand, includes a speaker 20, a D / A converter 21, a decoder (audio decoding circuit) 22, and a demultiplexer 23. The reproduction system is the same as the conventional one (see FIG. 17), so its description is omitted. In the encoding system, the noise canceller 12, the encoder 13, and the multiplexer 14 are usually configured by a digital signal processor (DSP).
[0027]
The encoder 13 is a speech coding circuit that performs speech coding processing on a digital speech signal using a predetermined algorithm (for example, CELP method) to generate speech coded data. The encoder 13 is a wideband speech encoding circuit, and is divided into a low-frequency speech encoder 130 and a high-frequency speech encoder (sometimes referred to as an H coder) 131. The multiplexer 14 converts the speech encoded data generated by the encoder 13 into a form according to the characteristics of the transmission path, modem unit, error correction unit, etc., and outputs it to the memory 15.
[0028]
The noise canceller 12 is controlled to enable or disable the noise suppression function according to a mode signal (HM) for setting the operation mode of the encoder 13. This mode signal (HM) is, for example, a signal output from a CPU of a mobile phone, and determines whether or not to operate the high frequency speech encoder (H coder) 131. Here, for convenience, it is assumed that the H coder 131 is operated when “HM = 1” and the H coder 131 is not operated when “HM = 0”.
[0029]
The noise canceller 12 operates when “HM = 1” and suppresses a noise component with respect to the digital audio signal output from the A / D converter 11. On the other hand, when “HM = 0”, the noise canceller 12 passes the digital audio signal (VS) output from the A / D converter 11 as it is without executing the noise suppression processing.
[0030]
As shown in FIG. 2, the low-frequency speech encoder 130 includes a module 200 including a down-sampling unit 201 and a low-frequency encoder (L coder) 202, and a noise canceller unit 203. The down-sampling unit 201 down-samples the digital audio signal (VS) output from the A / D converter 11 so as to reduce a predetermined number of samples in order to perform low-frequency processing.
[0031]
When “HM = 0”, the noise canceller unit 203 performs noise suppression processing on the digital audio signal (VS) downsampled by the downsampling unit 201 and outputs the result to the L coder 202. On the other hand, when “HM = 1”, the noise canceller unit 203 does not perform noise suppression processing on the digital audio signal (VS) downsampled by the downsampling unit 201 and passes it directly to the L coder 202.
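The complementary switching of the noise canceller 12 and the noise canceller unit 203 by the mode signal HM can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `downsample_by_2` and `encode_path`, the decimation factor of 2, and the representation of the two noise cancellers as plain callables are all assumptions.

```python
def downsample_by_2(samples):
    """Placeholder decimator (assumed factor 2); a real one would low-pass filter first."""
    return samples[::2]


def encode_path(vs, hm, wide_nc, low_nc):
    """Route the digital audio signal VS according to the mode signal HM.

    Returns (low_band_input, high_band_input); high_band_input is None
    when the H coder is not operated (HM = 0).
    """
    if hm == 1:
        cleaned = wide_nc(vs)                      # noise canceller 12 active
        return downsample_by_2(cleaned), cleaned   # unit 203 bypassed
    low = downsample_by_2(vs)                      # noise canceller 12 bypassed
    return low_nc(low), None                       # unit 203 active
```

Exactly one of the two suppressors runs in either mode, and in the HM = 0 case it runs on the shorter, down-sampled signal.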
[0032]
(Operation of the first embodiment)
The operation of the encoding system according to this embodiment will be described below with reference to FIGS. 1 and 2.
[0033]
For example, the mode signal HM is output from the CPU of the mobile phone, and the operation mode (HM = 1/0) of the encoder 13 is set. The A / D converter 11 converts the audio signal input to the microphone 10 into a digital audio signal.
[0034]
Here, it is assumed that the operation mode for operating the high frequency speech coder (H coder) 131 is set (HM = 1). The noise canceller 12 operates when “HM = 1”, suppresses the noise component of the digital audio signal output from the A / D converter 11, and then outputs it to the encoder 13.
[0035]
In the encoder 13, the H coder 131 executes an encoding process for the high frequency audio signal. In the low frequency speech encoder 130, on the other hand, when “HM = 1”, the noise canceller unit 203 does not perform noise suppression processing on the digital speech signal (VS) downsampled by the downsampling unit 201 and passes it unchanged to the L coder 202. Note, however, that the downsampled digital audio signal (VS) has already undergone noise suppression processing in the preceding noise canceller 12. The outputs (encoded speech data) of the H coder 131 and the L coder 202 are multiplexed by the multiplexer 14 and stored in the memory 15.
[0036]
On the other hand, it is assumed that an operation mode in which the high frequency speech coder (H coder) 131 is not operated is set (HM = 0). The noise canceller 12 passes the digital audio signal (VS) output from the A / D converter 11 as it is without executing the noise suppression process when “HM = 0”. The H coder 131 is not operating.
[0037]
In the low frequency speech encoder 130, when “HM = 0”, the noise canceller unit 203 performs noise suppression processing on the digital speech signal (VS) downsampled by the downsampling unit 201 and outputs the result to the L coder 202. The L coder 202 generates low-frequency speech encoded data and outputs it to the multiplexer 14.
[0038]
As described above, according to the present embodiment, when the operation mode of the encoding system does not operate the H coder 131 (HM = 0), the noise canceller 12 provided in the previous stage of the encoder 13 also does not operate. Therefore, the digital speech signal (VS) output from the A / D converter 11 is passed through as it is and is supplied to the low frequency speech encoder 130 of the encoder 13.
[0039]
In the low frequency speech encoder 130, when “HM = 0”, the noise canceller unit 203 is in an operating state, performs noise suppression processing on the digital speech signal (VS) downsampled by the downsampling unit 201, and outputs the result to the L coder 202. Thereby, the low frequency speech coder 130 generates low frequency speech encoded data from the low frequency digital speech signal in which the noise component is suppressed.
[0040]
Therefore, in the operation mode in which the high frequency speech coder 131 is not operated, the noise canceller 12 provided in the preceding stage of the encoder 13 is deactivated, so the data processing amount and memory amount in the DSP necessary for the function of the noise canceller can be reduced. On the other hand, in the low-frequency speech encoder 130, the low-frequency noise canceller unit 203 functions, so that low-frequency speech encoded data can be generated without causing deterioration of speech quality. In this case, the low-frequency noise canceller unit 203 performs noise suppression processing on the down-sampled digital audio signal (in which the number of samples is reduced). Therefore, the data processing amount and memory amount in the DSP necessary for the function of the noise canceller unit 203 can be made relatively small compared with the case where the high frequency noise canceller 12 functions.
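The mode-dependent selection of the first embodiment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function names and the 2:1 downsampling factor are hypothetical, since the text only says that the number of samples is reduced.

```python
def active_noise_canceller(hm: int) -> str:
    """Return which noise canceller runs for a given mode signal HM."""
    if hm not in (0, 1):
        raise ValueError("HM must be 0 or 1")
    # HM = 1: the front-stage noise canceller 12 runs and unit 203 passes through.
    # HM = 0: noise canceller 12 passes through and unit 203 runs after downsampling.
    return "noise canceller 12" if hm == 1 else "noise canceller unit 203"


def samples_processed(frame_len: int, hm: int, decimation: int = 2) -> int:
    """Samples per frame seen by the active canceller.

    The decimation factor is an assumption made for this sketch.
    """
    if hm not in (0, 1):
        raise ValueError("HM must be 0 or 1")
    if hm == 1:
        return frame_len  # full-rate signal from the A/D converter 11
    return frame_len // decimation  # downsampled signal from unit 201
```

With these assumed values, a 320-sample frame is handled in full by the noise canceller 12 when HM = 1, while the noise canceller unit 203 sees only 160 samples when HM = 0, which is where the reduction in processing and memory comes from.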
[0041]
(Second embodiment)
FIG. 3 is a block diagram showing a main part of the audio signal processing apparatus according to the second embodiment.
[0042]
The encoding system of the present embodiment has no high frequency noise canceller in the stage preceding the encoder, and is provided with an encoder 30 having a low frequency speech encoder 300 that includes a low frequency noise canceller (LNC) and a high frequency speech encoder 301 that includes a high frequency noise canceller (HNC). Note that the playback system (decoding system) is the same as that of the first embodiment (see FIG. 1), and thus description thereof is omitted.
[0043]
In the encoder 30, the low frequency speech encoder 300 includes a low frequency encoder (L coder) 400, a downsampling unit 401, and a low frequency noise canceller (LNC) 402 as shown in FIG. 4. The down-sampling unit 401 down-samples the digital audio signal (VS) output from the A / D converter 11 so as to reduce a predetermined number of samples in order to perform low-frequency processing. The LNC 402 executes noise suppression processing for mainly suppressing low-frequency high ambient noise on the down-sampled digital voice signal (VS). The L coder 400 generates low-frequency speech encoded data from the digital speech signal (down-sampled signal) whose noise has been suppressed by the LNC 402 and outputs it to the multiplexer 14.
[0044]
On the other hand, the high frequency speech encoder 301 includes a high frequency encoder (H coder) 500 and a high frequency noise canceller (HNC) 501. Whether the H coder 500 operates is determined by the operation mode (HM = 1/0) set by the mode signal HM. That is, when “HM = 1”, the H coder 500 operates and executes a coding process for the high frequency audio signal of the digital audio signal (VS) output from the A / D converter 11. The HNC 501 executes noise suppression processing that suppresses high ambient noise in a high frequency range. Each output (voice encoded data) of the HNC 501 and the L coder 400 is multiplexed by the multiplexer 14 and stored in the memory 15.
[0045]
Here, when “HM = 0”, the H coder 500 is in a non-operating state. In this operation mode, only the low frequency speech encoder 300 operates, and the speech encoded data that is the output of the L coder 400 is sent to the multiplexer 14.
[0046]
As described above, according to the present embodiment, when the operation mode of the encoding system does not operate the H coder 500 (HM = 0), the high frequency speech encoder 301 is inoperative and only the low frequency speech encoder 300 operates. Therefore, when “HM = 0”, only the LNC 402 included in the low-frequency speech encoder 300 operates to execute noise suppression processing on the digital speech signal (VS) downsampled by the downsampling unit 401. Consequently, in the operation mode in which the high frequency speech encoder 301 is not operated, it is possible to reduce the data processing amount and memory amount in the DSP necessary for the function of the noise canceller.
[0047]
(VAD function)
By the way, as shown in FIG. 4, the low frequency speech encoder 300 has a VAD (Voice Activity Detection) function that determines from the digital speech signal (VS) whether the speech input is sound or silence and, upon detection, outputs a predetermined flag (VADF) to the high frequency speech encoder 301.
[0048]
In the high frequency speech encoder 301, the output of the H coder 500 is speech encoded data mainly relating to the high frequency gain of the speech signal. The HNC 501 is a noise suppression unit that simply cancels noise by processing the speech encoded data. When there is no sound (VADF = 0), the HNC 501 determines that the high-frequency gain is the gain of the noise signal, subtracts a value corresponding to that gain from the output signal of the H coder 500, and outputs the result to the multiplexer 14. On the other hand, when there is sound (VADF = 1), the HNC 501 subtracts the value subtracted when there is no sound (VADF = 0) from the input of the H coder 500 and outputs the result to the multiplexer 14.
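The gain subtraction performed by the HNC 501 can be illustrated with a minimal sketch. Several points are assumptions made for illustration only: the class and method names are hypothetical, the "value corresponding to the gain" is taken to be the gain itself in a linear domain, and the result is clamped at a floor so that it never goes negative. None of these details is specified in the text.

```python
class HighBandGainSuppressor:
    """Illustrative gain-domain noise suppressor driven by the VAD flag."""

    def __init__(self, floor: float = 0.0) -> None:
        self.noise_gain = 0.0  # value subtracted, learned during silence
        self.floor = floor     # assumed lower bound for the output gain

    def process(self, gain: float, vadf: int) -> float:
        if vadf not in (0, 1):
            raise ValueError("VADF must be 0 or 1")
        if vadf == 0:
            # Silence: treat the high-band gain as the noise gain and remember it.
            self.noise_gain = gain
        # Sound or silence: subtract the value determined during silence.
        return max(gain - self.noise_gain, self.floor)
```

In this sketch a silent frame with gain 0.25 yields 0.0 and stores 0.25 as the noise gain; a following voiced frame with gain 1.0 yields 0.75.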
[0049]
Here, in the low frequency speech encoder 300, the VAD function is provided inside the L coder 400. Specifically, the L coder 400 includes a VAD unit 50, a voiced coder unit 51, and a silent coder unit 52, as shown in FIG. The silent coder unit 52 functions when a flag indicating silence (VADF = 0) is output from the VAD unit 50. Further, the voiced coder unit 51 functions when a flag indicating voice (VADF = 1) is output from the VAD unit 50. The VAD unit 50 outputs the flag (VADF = 1/0) to the HNC 501 of the high frequency speech encoder 301.
[0050]
Further, as shown in FIG. 5B, the L coder 400 may include a VAD unit 50, a voiced coder unit 51, a silent coder unit 52, and a switch unit 53. When the flag (VADF = 0) indicating silence is output from the VAD unit 50, the switch unit 53 transfers the digital audio signal (VS) to the silence coder unit 52.
[0051]
Further, the switch unit 53 transfers the digital voice signal (VS) to the voiced coder unit 51 when the flag (VADF = 1) indicating the voice is output from the VAD unit 50. The VAD unit 50 outputs the flag (VADF = 1/0) to the HNC 501 of the high frequency speech encoder 301.
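The routing performed by the switch unit 53 can be sketched as follows. The text does not say how the VAD unit 50 reaches its decision, so the mean-energy threshold used here is purely an assumption, as are the function names and the threshold value.

```python
def vad_flag(frame: list[float], threshold: float = 1e-3) -> int:
    """Return VADF: 1 for sound, 0 for silence (energy threshold is assumed)."""
    if not frame:
        raise ValueError("frame must not be empty")
    energy = sum(x * x for x in frame) / len(frame)
    return 1 if energy > threshold else 0


def route_frame(frame: list[float], threshold: float = 1e-3) -> tuple[int, str]:
    """Return the flag and the coder unit that the switch unit 53 selects."""
    vadf = vad_flag(frame, threshold)
    target = "voiced coder unit 51" if vadf == 1 else "silent coder unit 52"
    # The same flag would also be passed on to the HNC 501.
    return vadf, target
```

An all-zero frame is routed to the silent coder unit 52 with VADF = 0, and a frame of constant amplitude 0.5 is routed to the voiced coder unit 51 with VADF = 1.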
[0052]
(Modification)
FIG. 6 is a block diagram relating to a modification of the second embodiment.
[0053]
In this modification, the high frequency speech encoder 301 controls the operation of the HNC 501 in accordance with, for example, an operation mode signal (MS) from a CPU of a mobile phone. Specifically, the operation mode signal (MS) is, for example, a signal for setting a mode for processing an audio signal for music.
[0054]
In the high frequency speech encoder 301, when the CPU performs high frequency encoding processing on the audio signal for music, the HNC 501 operates in accordance with the operation mode signal (MS = 1) and performs noise suppression processing that is effective for music.
[0055]
Note that the operation mode signal (MS) set from the CPU is not limited to music, and can be applied to setting various modes.
[0056]
(Third embodiment)
FIG. 7 is a block diagram showing the main part of the speech signal processing apparatus according to the third embodiment, and FIG. 8 is a block diagram showing the configuration of the low frequency speech encoder 172 and the low frequency speech decoder 222 of FIG.
[0057]
As can be seen from the comparison between FIG. 1 and FIG. 7 and the comparison between FIG. 2 and FIG. 8, in the present embodiment the noise canceller of the first embodiment is replaced with an echo canceller, a received speech signal (BR signal) input from the decoder 22 to the wideband echo canceller 16 is added, and an LBR signal input from the low frequency speech decoder 222 to the low frequency speech encoder 172 (echo canceller 204) is added.
[0058]
Only one of the echo cancellers 16 and 204 operates at a time. When the high frequency speech coder 171 operates, only the echo canceller 16 operates, and when the high frequency speech coder 171 does not operate, only the echo canceller 204 operates. Therefore, when the high frequency speech encoder 171 is not operating, it is possible to reduce the data processing amount and memory amount in the DSP necessary for the function of the echo canceller.
[0059]
(Fourth embodiment)
FIG. 9 is a block diagram showing a main part of an audio signal processing apparatus according to the fourth embodiment, and FIG. 10 is a block diagram showing a configuration of the encoder 31 of FIG.
[0060]
As can be seen from the comparison between FIG. 3 and FIG. 9 and the comparison between FIG. 4 and FIG. 10, in the present embodiment the noise canceller of the second embodiment is replaced with an echo canceller, an LBR signal input to the low frequency speech encoder 312 (low frequency echo canceller 403) is added, and an HBR signal input from the high frequency speech decoder 221 to the high frequency speech encoder 313 (high frequency echo canceller 502) is added.
[0061]
When the high frequency encoder (H coder) 500 is not operating, the high frequency echo canceller 502 is inactive, and only the low frequency echo canceller 403 operates. Therefore, when the H coder 500 is not operating, it is possible to reduce the data processing amount and memory amount in the DSP necessary for the function of the echo canceller.
[0062]
(Modification)
FIG. 11 is a block diagram relating to a modification of the fourth embodiment.
[0063]
In the present modification, the high frequency speech encoder 313 controls the operation of the HEC 502 in accordance with, for example, an operation mode signal (RBT) from a CPU of a mobile phone. Specifically, the operation mode signal (RBT) is a signal for setting a mode for processing a signal that is extremely biased in frequency, such as a telephone push sound, an incoming melody, or an alarm sound.
[0064]
The HEC 502 operates in response to the operation mode signal (RBT = 1), and stops the learning of the HEC 502 and the LEC 403.
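Stopping the learning of an echo canceller while it continues to cancel can be illustrated with a small adaptive filter. The text does not name the adaptation algorithm, so the NLMS update used here is an assumption, as are the class name, tap count, and step size.

```python
class EchoCanceller:
    """Illustrative NLMS echo canceller whose learning can be frozen by RBT."""

    def __init__(self, taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.w = [0.0] * taps  # estimated echo path coefficients
        self.x = [0.0] * taps  # recent received (far-end) samples
        self.mu = mu
        self.eps = eps

    def process(self, far: float, mic: float, rbt: int = 0) -> float:
        # Shift the newest received sample into the delay line.
        self.x = [far] + self.x[:-1]
        echo_estimate = sum(w * x for w, x in zip(self.w, self.x))
        err = mic - echo_estimate  # echo-suppressed sending-side sample
        if rbt == 0:
            # Normal mode: adapt the coefficients (NLMS is an assumed choice).
            norm = sum(x * x for x in self.x) + self.eps
            self.w = [w + self.mu * err * x / norm
                      for w, x in zip(self.w, self.x)]
        # RBT = 1: coefficients are left untouched, so learning is stopped.
        return err
```

Freezing the coefficients for strongly frequency-biased signals keeps such inputs from disturbing the echo path estimate that was learned on ordinary speech.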
[0065]
Note that the operation mode signal (RBT) set by the CPU is not limited to a push sound, a ringing melody, or an alarm sound, and can be applied when various modes such as an encoding mode are set.
[0066]
Also, embodiments as shown in FIGS. 12 to 15 can be considered by replacing the echo canceller in FIGS. 7 to 10 with a voice switch. FIGS. 12 and 13 show a combination of a low-frequency voice switch LVS81 and a high-frequency voice switch HVS82, and FIGS. 14 and 15 show a combination of a high-frequency voice switch and a low-frequency voice switch. In either case, when the high frequency speech encoder does not operate, the data processing amount and the memory amount can be reduced by operating only the low frequency voice switch.
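As described for the prior art, a voice switch compares the signal power of the receiving and sending sides and inserts a loss into the side with the smaller power. A minimal sketch of that behavior is given below; the function name, the fixed loss factor, and the choice to leave both sides untouched when the powers are equal are assumptions made for illustration.

```python
def voice_switch(send: list[float], recv: list[float],
                 loss: float = 0.1) -> tuple[list[float], list[float]]:
    """Attenuate whichever side has the smaller mean power."""
    if not send or not recv:
        raise ValueError("frames must not be empty")
    p_send = sum(x * x for x in send) / len(send)
    p_recv = sum(x * x for x in recv) / len(recv)
    if p_send < p_recv:
        # Receiving side dominates: insert loss into the sending side.
        return [x * loss for x in send], list(recv)
    if p_recv < p_send:
        # Sending side dominates: insert loss into the receiving side.
        return list(send), [x * loss for x in recv]
    return list(send), list(recv)  # equal power: no loss (assumed)
```

Because only a power comparison and a scaling are needed per frame, running this on the downsampled low-frequency signal alone is correspondingly cheaper.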
[0067]
[Effects of the invention]
As described above in detail, according to the present invention, in an audio signal processing apparatus having a wideband audio encoding circuit (encoder) and at least one of a noise canceller, an echo canceller and a voice switch, it is possible, without degrading voice quality, to reduce the amount of data processing and the amount of memory required for the functions of the noise canceller, echo canceller or voice switch in the coding system. As a result, it is possible to provide an audio signal processing device that can improve audio signal processing efficiency.
[0068]
Specifically, when the audio encoding processing is performed only on the low frequency audio signal component and not on the high frequency audio signal component, the suppression processing can be executed only on the noise component or echo component contained in the low frequency audio signal component. Therefore, for example, in a configuration in which noise or echo suppression processing is executed by a DSP, the amount of data processing and memory required for the function of the noise canceller, echo canceller, or voice switch can be reduced in a mode in which high-frequency speech encoding processing is not executed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a main part of an audio signal processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of a low-frequency speech encoder according to the embodiment.
FIG. 3 is a block diagram showing a main part of an audio signal processing apparatus according to a second embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration of an encoder according to the embodiment.
FIG. 5 is an exemplary block diagram for explaining a VAD function according to the embodiment.
FIG. 6 is a block diagram relating to a modified example of the second embodiment.
FIG. 7 is a block diagram showing a main part of an audio signal processing apparatus according to a third embodiment of the present invention.
FIG. 8 is a block diagram showing a configuration of a low-frequency speech encoder according to the embodiment.
FIG. 9 is a block diagram showing a main part of an audio signal processing apparatus according to a fourth embodiment of the present invention.
FIG. 10 is a block diagram showing a configuration of an encoder according to the embodiment.
FIG. 11 is a block diagram relating to a modified example of the fourth embodiment.
FIG. 12 is a block diagram showing a main part of an audio signal processing apparatus according to a fifth embodiment of the present invention.
FIG. 13 is an exemplary block diagram showing the configuration of a low-frequency speech encoder according to the embodiment.
FIG. 14 is a block diagram showing a main part of an audio signal processing apparatus according to a sixth embodiment of the present invention.
FIG. 15 is a block diagram showing a configuration of an encoder according to the embodiment.
FIG. 16 is a block diagram showing the main configuration.
FIG. 17 is a block diagram showing a general configuration of a first conventional audio signal processing apparatus.
FIG. 18 is a block diagram showing a general configuration of a second conventional audio signal processing apparatus.
FIG. 19 is a block diagram showing a general configuration of a third conventional audio signal processing apparatus.
[Explanation of symbols]
1 ... Band division means
2 ... Correction means
3 ... Low frequency encoding means
4 ... High frequency encoding means
5 ... Low frequency decoding means
10 ... Microphone
11 ... A / D converter
12 ... Noise canceller
13, 17, 30, 32, 71 ... Encoder (speech encoder)
14 ... Multiplexer
15 ... Memory
16, 72 ... Wide area echo canceller
20 ... Speaker
21 ... D / A converter
22, 24 ... Decoder (voice decoding circuit)
23 ... Demultiplexer
50 ... VAD
73, 82 ... Voice switch for wide area
80 ... High frequency voice switch
81 ... Low-range voice switch
130, 173, 202, 242, 320, 400 ... Low frequency coder (L coder)
131, 171, 241, 321, 500 ... High band encoder (H coder)
172, 300, 310 ... Low-range speech encoder
200, 205 ... Module
201, 401 ... Down-sample part
203 ... Noise canceller
204 ... Low frequency echo canceller
221, 701 ... High frequency decoder (H decoder)
222, 223, 230, 702 ... Low band decoder (L decoder)
231 ... Upsample section
301, 311 ... High frequency speech encoder
402 ... Low frequency noise canceller (LNC)
403 ... Low frequency echo canceller (LEC)
501 ... High frequency noise canceller (HNC)
502 ... High frequency echo canceller (HEC)

Claims

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal;
First encoding means for encoding the high-frequency component signal in response to an operation mode signal instructing encoding of the high-frequency component signal;
Second encoding means for encoding the low-frequency component signal;
First suppression means for suppressing noise components included in the digital audio signal before being encoded by the first and second encoding means;
Second suppression means for suppressing a noise component included in the low-frequency component signal before being encoded by the second encoding means;
Control means for performing control, when the operation mode signal does not instruct encoding of the high-frequency component signal, so that the second encoding means encodes the low-frequency component signal of the digital audio signal and the first suppression means is not operated.

The audio signal processing apparatus according to claim 1, further comprising detection means for detecting that the digital audio signal is a silence signal,
wherein the control means performs control so that the first suppression means is not operated when the detection means detects a silence signal.

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal;
Encoding means for encoding the high-frequency component signal in accordance with an operation mode signal instructing encoding of the high-frequency component signal and for encoding the low-frequency component signal independently of the operation mode signal;
Suppression means included in the encoding means and having first suppression means for suppressing a noise component included in the high-frequency component signal of the digital audio signal and second suppression means for suppressing a noise component included in the low-frequency component signal;
Control means for controlling the first suppression means so that the suppression process for the high-frequency component signal is not executed when the encoding of the high-frequency component signal is not instructed by the operation mode signal. An audio signal processing apparatus comprising the above means.

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal;
First encoding means for encoding the high-frequency component signal in response to an operation mode signal instructing encoding of the high-frequency component signal;
Second encoding means for encoding the low-frequency component signal;
First suppression means for suppressing an echo component that occurs due to a received voice signal and is included in the digital voice signal before being encoded by the first and second encoding means;
Before being encoded by the second encoding means, second suppression means for suppressing an echo component that occurs due to the received voice signal and is included in the low-frequency component signal;
Control means for performing control, when the operation mode signal does not instruct encoding of the high-frequency component signal, so that the second encoding means encodes the low-frequency component signal of the digital audio signal and the first suppression means is not operated.

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal;
Encoding means for encoding the high-frequency component signal in accordance with an operation mode signal instructing encoding of the high-frequency component signal and for encoding the low-frequency component signal independently of the operation mode signal;
Suppression means for suppressing an echo component generated due to the received voice signal and included in the digital voice signal;
Control means for controlling the suppression means so that the suppression process for the high-frequency component signal is not executed when the operation mode signal does not instruct encoding of the high-frequency component signal. An audio signal processing apparatus comprising the above means.

In an audio signal processing method for encoding a digital audio signal,
In response to an operation mode signal indicating whether both the high frequency component and the low frequency component of the digital audio signal are to be encoded or only the low frequency component of the digital audio signal is to be encoded, suppression processing is performed on the noise components included in the high frequency component and the low frequency component of the digital audio signal when the operation mode signal instructs encoding of the high frequency component and the low frequency component, and suppression processing is performed only on the noise component included in the low frequency component of the digital audio signal when the operation mode signal instructs encoding of only the low frequency component,
When the operation mode signal indicates encoding of a high frequency component and a low frequency component, the high frequency component and the low frequency component of the digital audio signal in which the noise component is suppressed are encoded,
The audio signal processing method, wherein the low frequency component of the digital audio signal in which the noise component is suppressed is encoded when the operation mode signal instructs encoding of only the low frequency component.

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into signals of a plurality of bands;
Encoding means for encoding, among the signals of the respective bands of the digital audio signal divided by the dividing means, a signal of at least one band in accordance with an operation mode signal instructing encoding;
Suppression means for suppressing an echo component generated due to the received voice signal and included in the digital voice signal;
An audio signal processing apparatus comprising: control means for controlling the suppression means so as not to execute a suppression process on a signal in a band excluding a band encoded by the encoding means.

In an audio signal processing apparatus for encoding a digital audio signal,
Dividing means for dividing the digital audio signal into a high-frequency component signal and a low-frequency component signal;
First encoding means for encoding the high-frequency component signal in response to an operation mode signal instructing encoding of the high-frequency component signal;
Second encoding means for encoding the low-frequency component signal;
First suppression means for suppressing noise components included in the digital audio signal before being encoded by the first and second encoding means;
Second suppression means included in the second encoding means for suppressing noise components included in the low-frequency component signal before being encoded;
Control means for performing control, when the operation mode signal does not instruct encoding of the high-frequency component signal, so that the second encoding means encodes the low-frequency component signal of the digital audio signal and the first suppression means is not operated.
An audio signal processing apparatus comprising the above means.