JP4464484B2 - Noise signal encoding apparatus and speech signal encoding apparatus

Info

Publication number: JP4464484B2
Application number: JP16854599A
Authority: JP
Inventors: 幸司吉田
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1999-06-15
Filing date: 1999-06-15
Publication date: 2010-05-19
Anticipated expiration: 2019-06-15
Also published as: EP1120775A1; EP1120775A4; WO2000077774A1; AU5103700A; CN1313983A; JP2000357000A

Abstract

Noise model storage section 302 stores information on a noise model capable of expressing the statistical characteristic quantities of an input noise signal. Using this stored information and the statistical characteristic quantities calculated for the input noise signal by noise signal analysis section 301, noise model variation detection section 303 detects whether the noise model parameters representing the input noise signal have changed, and noise model updating section 304 updates the noise model and outputs the updated model information. A noise signal coder with this configuration encodes the non-speech segments (segments containing only noise) of an input signal, or a noise signal separated from a speech signal, while a speech coder encodes the speech segments.

Description

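[0001]
[Technical field of the invention]
The present invention relates to a low-bit-rate speech signal coding apparatus used in applications such as mobile communication systems and voice recorders that encode and transmit speech signals.
[0002]
[Prior art]
In digital mobile communication and speech storage, speech coding apparatuses that compress speech information and encode it at a low bit rate are used to make efficient use of radio spectrum and storage media. Conventional examples include the CS-ACELP coding scheme of ITU-T Recommendation G.729 ("Coding of speech at 8kbit/s using conjugate-structure algebraic-code-excited linear-prediction(CS-ACELP)") and the CS-ACELP coding scheme with DTX (Discontinuous Transmission) control of ITU-T Recommendation G.729 Annex B ("A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70").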
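[0003]
FIG. 13 is a block diagram showing the configuration of a conventional CS-ACELP coding apparatus. In FIG. 13, LPC analyzer/quantizer 1 performs LPC (linear prediction) analysis and quantization on the input speech signal and outputs LPC coefficients and an LPC quantization code.
[0004]
The adaptive excitation signal and the fixed excitation signal taken from adaptive excitation codebook 2 and fixed excitation codebook 3 are each multiplied by a gain taken from gain codebook 4 and then added, and LPC synthesis filter 7 synthesizes speech from the sum. The error signal with respect to the input signal is weighted by perceptual weighting filter 9, and the adaptive excitation code, fixed excitation code, and gain code that minimize the weighted error are output as coded data together with the LPC quantization code. In FIG. 13, 5 denotes a multiplier, 6 an adder, and 8 a subtractor.
[0005]
FIG. 14 is a block diagram showing the configuration of a conventional CS-ACELP coding apparatus with DTX control. First, speech/non-speech identifier 11 determines whether the input signal is a speech segment or a non-speech segment (a segment containing only background noise). When speech/non-speech identifier 11 determines that the segment is speech, CS-ACELP speech coder 12 encodes the speech segment. CS-ACELP speech coder 12 has the configuration shown in FIG. 13.
[0006]
When speech/non-speech identifier 11 determines that the segment is non-speech, non-speech segment coder 13 performs the coding. Non-speech segment coder 13 calculates from the input signal the LPC coefficients, as in speech-segment coding, and the LPC prediction residual energy of the input signal, and outputs them as coded data for the non-speech segment.
[0007]
DTX control/multiplexer 14 selects the data to be transmitted from the outputs of speech/non-speech identifier 11, CS-ACELP speech coder 12, and non-speech segment coder 13, multiplexes it, and outputs it as transmission data.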
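[0008]
[Problems to be solved by the invention]
However, in the conventional CS-ACELP coder described above, the speech coder exploits redundancy specific to speech in order to encode at a low bit rate of 8 kbps. High-quality coding is therefore possible when a clean speech signal without superimposed background noise is input, but when a speech signal with ambient background noise superimposed on it is input, the quality of the decoded signal degrades when the background noise signal is encoded.
[0009]
In the conventional CS-ACELP coder with DTX control described above, only speech segments are encoded by the CS-ACELP coder, and non-speech segments (segments containing only noise) are encoded by a dedicated non-speech segment coder at a lower bit rate than the speech coder, which reduces the average transmitted bit rate. However, the non-speech segment coder uses the same signal model as the speech coder: the decoded signal is generated by driving an AR-type synthesis filter (LPC synthesis filter) with a noise signal in each short segment (about 10 to 50 ms). As with the conventional CS-ACELP coder, the quality of the decoded signal therefore degrades for speech signals with superimposed background noise.
[0010]
The present invention has been made in view of these points, and its object is to provide a speech signal coding apparatus and decoding apparatus that cause little degradation in decoded signal quality even for speech signals with superimposed background noise, and that also reduce the average bit rate required for transmission.
[0011]
[Means for solving the problems]
The gist of the present invention is to calculate statistical features of the input signal in non-speech segments (segments containing only noise), store information on a noise model capable of expressing the statistical features of the input noise signal, detect whether the noise model parameters representing the input noise signal have changed, and update the noise model. This keeps degradation of decoded signal quality small even for speech signals with superimposed background noise, and also reduces the average bit rate required for transmission.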
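[0012]
[Embodiments of the invention]
A noise signal coding apparatus according to a first aspect of the present invention comprises: analysis means for performing signal analysis on the noise signal of a speech signal containing a noise signal; storage means for storing information on a noise model representing the noise signal; detection means for detecting, based on the signal analysis result of the currently input noise signal, a change in the stored information on the noise model; and updating means for updating and outputting the stored information on the noise model when a change in the information on the noise model is detected. The analysis means extracts statistical features of the noise signal through the signal analysis, and the updating means updates the information on the noise model, which is information expressing the statistical features with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal.
[0013]
A speech signal coding apparatus according to a second aspect of the present invention comprises: speech/non-speech determination means for determining whether an input speech signal is a speech segment or a non-speech segment containing only a noise signal; speech coding means for performing speech coding on the input speech signal when the determination result is speech; the noise signal coding apparatus of the first aspect, which encodes the noise signal of the input signal when the determination result is non-speech; and multiplexing means for multiplexing the outputs of the speech/non-speech determination means, the speech coding means, and the noise signal coding apparatus.
[0014]
A speech signal coding apparatus according to a third aspect of the present invention comprises: speech/noise signal separation means for separating an input speech signal into a speech signal and a background noise signal superimposed on that speech signal; speech/non-speech determination means for determining, from the input speech signal or from the speech signal obtained by the speech/noise signal separation means, whether the segment is a speech segment or a non-speech segment containing only a noise signal; speech coding means for performing speech coding on the input speech signal when the determination result is speech; the noise signal coding apparatus of the first aspect, which encodes the background noise signal obtained by the speech/noise signal separation means; and multiplexing means for multiplexing the outputs of the speech/non-speech determination means, the speech coding means, and the noise signal coding apparatus.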
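[0015]
A speech signal coding apparatus according to a fourth aspect of the present invention comprises: analysis means for performing signal analysis on an input speech signal; speech model storage means for storing the speech feature patterns needed to determine whether the input speech signal is speech; noise model storage means for storing information on a noise model expressing the noise signal contained in the input speech signal; mode determination means for determining, using the outputs of the analysis means, the speech model storage means, and the noise model storage means, whether the input speech signal is a speech segment or a non-speech segment containing only a noise signal and, in the case of a non-speech segment, whether to update the noise model; speech coding means for performing speech coding on the input speech signal when the mode determination means determines a speech segment; noise model updating means for updating and outputting the noise model when the mode determination means determines a non-speech segment in which the noise model is to be updated; and multiplexing means for multiplexing the outputs of the speech coding means and the noise model updating means. The analysis means extracts statistical features of the input speech signal through the signal analysis, and the noise model updating means updates the information on the noise model, which is information expressing the statistical features with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal.
[0016]
A noise signal generation apparatus according to a fifth aspect of the present invention comprises: noise model updating means for updating a noise model when necessary, in accordance with noise model parameters encoded on the coding side for an input noise signal and a noise model update flag; noise model storage means for storing the updated information on the noise model using the output of the noise model updating means; and noise signal generation means for generating a noise signal from the information on the noise model stored in the noise model storage means. The noise model storage means stores, as the information on the noise model, information expressing statistical features with a statistical model, and the noise signal generation means stochastically outputs signal parameters based on the statistical model and generates a noise signal from the output signal parameters.
[0017]
A speech signal decoding apparatus according to a sixth aspect of the present invention comprises: demultiplexing means for receiving a signal containing speech data encoded on the coding side, noise model parameters, a speech/non-speech determination flag, and a noise model update flag, and separating the noise model parameters, the speech/non-speech determination flag, and the noise model update flag from the signal; speech decoding means for decoding the speech data when the speech/non-speech determination flag indicates a speech segment; the noise signal generation apparatus of the fifth aspect, which generates a noise signal from the noise model parameters and the noise model update flag when the speech/non-speech determination flag indicates a non-speech segment; and output switching means for switching, according to the speech/non-speech determination flag, between the decoded speech output from the speech decoding means and the noise signal output from the noise signal generation apparatus, and outputting the selected signal as the output signal.
[0018]
A speech signal decoding apparatus according to a seventh aspect of the present invention comprises: demultiplexing means for receiving a signal containing speech data encoded on the coding side, noise model parameters, a speech/non-speech determination flag, and a noise model update flag, and separating the noise model parameters, the speech/non-speech determination flag, and the noise model update flag from the signal; speech decoding means for decoding the speech data when the speech/non-speech determination flag indicates a speech segment; the noise signal generation apparatus of the fifth aspect, which generates a noise signal from the noise model parameters and the noise model update flag when the speech/non-speech determination flag indicates a non-speech segment; and speech/noise signal adding means for adding the decoded speech output from the speech decoding means and the noise signal output from the noise signal generation apparatus.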
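[0019]
A speech signal coding method according to an eighth aspect of the present invention comprises: a speech/non-speech determination step of determining whether an input speech signal is a speech segment or a non-speech segment containing only a noise signal; a speech coding step of performing speech coding on the input speech signal when the determination result is speech; a noise signal coding step of encoding the noise signal of the input signal when the determination result is non-speech; and a multiplexing step of multiplexing the outputs of the speech/non-speech determination step, the speech coding step, and the noise signal coding step. The noise signal coding step includes: an analysis step of performing signal analysis on the noise signal of a speech signal containing a noise signal; a storage step of storing information on a noise model representing the noise signal; a detection step of detecting, based on the signal analysis result of the currently input noise signal, a change in the stored information on the noise model; and an updating step of updating the stored information on the noise model by the amount of the change when a change in the information on the noise model is detected. The analysis step extracts statistical features of the noise signal through the signal analysis, and the updating step updates the information on the noise model, which is information expressing the statistical features with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal.
[0020]
A speech signal coding method according to a ninth aspect of the present invention comprises: a speech/noise signal separation step of separating an input speech signal into a speech signal and a background noise signal superimposed on that speech signal; a speech/non-speech determination step of determining, from the input speech signal or from the speech signal obtained in the speech/noise signal separation step, whether the segment is a speech segment or a non-speech segment containing only a noise signal; a speech coding step of performing speech coding on the input speech signal when the determination result is speech; a noise signal coding step of encoding the noise signal of the input signal when the determination result is non-speech and also encoding the background noise signal obtained in the speech/noise signal separation step; and a multiplexing step of multiplexing the outputs of the speech/non-speech determination step, the speech coding step, and the noise signal coding step. The noise signal coding step includes: an analysis step of performing signal analysis on the noise signal of a speech signal containing a noise signal; a storage step of storing information on a noise model representing the noise signal; a detection step of detecting, based on the signal analysis result of the currently input noise signal, a change in the stored information on the noise model; and an updating step of updating the stored information on the noise model by the amount of the change when a change in the information on the noise model is detected. The analysis step extracts statistical features of the noise signal through the signal analysis, and the updating step updates the information on the noise model, which is information expressing the statistical features with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal.
[0021]
A speech signal coding method according to a tenth aspect of the present invention comprises: an analysis step of performing signal analysis on an input speech signal; a speech model storage step of storing the speech feature patterns needed to determine whether the input speech signal is speech; a noise model storage step of storing information on a noise model expressing the noise signal contained in the input speech signal; a mode determination step of determining, using the outputs of the analysis step, the speech model storage step, and the noise model storage step, whether the input speech signal is a speech segment or a non-speech segment containing only a noise signal and, in the case of a non-speech segment, whether to update the noise model; a speech coding step of performing speech coding on the input speech signal when the mode determination step determines a speech segment; a noise model updating step of updating the noise model when the mode determination step determines a non-speech segment in which the noise model is to be updated; and a multiplexing step of multiplexing the outputs of the speech coding step and the noise model updating step. The analysis step extracts statistical features of the input speech signal through the signal analysis, and the noise model updating step updates the information on the noise model, which is information expressing the statistical features with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal.
[0022]
A recording medium according to an eleventh aspect of the present invention is a machine-readable medium recording a program that causes a computer to execute: a procedure of performing signal analysis on an input noise signal to extract statistical features of the noise signal; a procedure of storing, as information on a noise model, information expressing the statistical features of the input noise signal with a statistical model; a procedure of detecting a change in the noise model representing the input noise signal; and a procedure of updating, where necessary when a change in the information on the noise model is detected, the information on the noise model from which the decoding side can stochastically output, based on the statistical model, the signal parameters needed to generate a noise signal, and outputting the updated information on the noise model.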
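[0038]
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a radio communication apparatus equipped with the speech signal coding apparatus according to Embodiment 1 of the present invention.
[0039]
In this radio communication apparatus, on the transmitting side, speech is converted into an electrical analog signal by speech input apparatus 101, such as a microphone, and output to A/D converter 102. A/D converter 102 converts the analog speech signal into a digital speech signal and outputs it to speech coding section 103. Speech coding section 103 performs speech coding on the digital speech signal and outputs the coded information to modulation/demodulation section 104. Modulation/demodulation section 104 digitally modulates the coded speech signal and sends it to radio transmission section 105. Radio transmission section 105 applies predetermined radio transmission processing to the modulated signal, which is then transmitted via antenna 106.
[0040]
On the receiving side of the radio communication apparatus, the signal received by antenna 107 undergoes predetermined radio reception processing in radio reception section 108 and is sent to modulation/demodulation section 104. Modulation/demodulation section 104 demodulates the received signal and outputs the demodulated signal to speech decoding section 109. Speech decoding section 109 decodes the demodulated signal to obtain a digital decoded speech signal and outputs it to D/A converter 110. D/A converter 110 converts the digital decoded speech signal output from speech decoding section 109 into an analog decoded speech signal and outputs it to speech output apparatus 111, such as a loudspeaker. Finally, speech output apparatus 111 converts the electrical analog decoded speech signal into decoded speech and outputs it.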
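[0041]
Speech coding section 103 shown in FIG. 1 has the configuration shown in FIG. 2. FIG. 2 is a block diagram showing the configuration of the speech coding section according to Embodiment 1 of the present invention.
[0042]
Speech/non-speech identifier 201 determines whether the input speech signal is a speech segment or a non-speech segment (a segment containing only noise) and outputs the result to DTX control/multiplexer 204. Any speech/non-speech identifier may be used. In general, the determination uses the instantaneous values or the amounts of change of several parameters, such as the power, spectrum, and pitch period of the input signal.
[0043]
When the result from speech/non-speech identifier 201 is speech, speech coder 202 performs speech coding on the input speech signal in the speech segment, which contains both a speech signal and a noise signal, and outputs the coded data to DTX control/multiplexer 204. Speech coder 202 is a coder for speech segments, and any coder that encodes speech with high efficiency may be used.
[0044]
When the result from speech/non-speech identifier 201 is non-speech, noise signal coder 203 encodes the noise signal of the input signal in the non-speech segment, which contains only a noise signal, and outputs to DTX control/multiplexer 204 the information on the noise model expressing the input noise signal together with a flag indicating whether the noise model is to be updated. Finally, DTX control/multiplexer 204 uses the outputs of speech/non-speech identifier 201, speech coder 202, and noise signal coder 203 to control which information is to be transmitted, multiplexes the transmission information, and outputs it as transmission data.
[0045]
Noise signal coder 203 in FIG. 2 has the configuration shown in FIG. 3. FIG. 3 is a block diagram showing the configuration of the noise signal coder of the speech coding section according to Embodiment 1 of the present invention.
[0046]
Noise signal analysis section 301 performs signal analysis on the noise signal input in each fixed interval and calculates analysis parameters for the noise signal. The analysis parameters extracted are those needed to express the statistical features of the input signal, for example the short-time spectrum obtained by applying an FFT (Fast Fourier Transform) to a short-segment signal, the input power, and LPC spectral parameters.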
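The analysis step can be illustrated with a minimal sketch that computes two of the parameters named above, the input power and a short-time log power spectrum, for one frame. The function name `analyze_frame`, the Hann window, and the direct DFT (used here in place of an FFT to avoid external libraries) are assumptions made for illustration and are not taken from the patent.

```python
import cmath
import math

def analyze_frame(frame):
    """Return (power, log_spectrum) for one frame of samples.

    power: mean squared amplitude of the frame.
    log_spectrum: log power of DFT bins 0..N/2 after a Hann window.
    """
    n = len(frame)
    power = sum(x * x for x in frame) / n
    # Hann window to limit leakage between bins (an assumption, not from the source).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    log_spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT for clarity; a practical implementation would use an FFT.
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        log_spectrum.append(math.log(abs(acc) ** 2 + 1e-12))
    return power, log_spectrum

# Example: a 32-sample sinusoid centred on bin 4.
frame = [math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
power, log_spectrum = analyze_frame(frame)
peak_bin = max(range(len(log_spectrum)), key=log_spectrum.__getitem__)
```

Statistics such as the mean and variance of these per-frame spectra over time would then serve as the statistical features that the noise model expresses.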
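[0047]
Next, noise model variation detection section 303 detects whether the noise model parameters that should represent the currently input noise signal have changed from the noise model parameters held in noise model storage section 302.
[0048]
Here, the noise model parameters are information on a noise model capable of expressing the statistical features of the input noise signal, for example the information obtained when statistical features such as the mean spectrum and variance of the short-time spectrum are expressed with a statistical model such as an HMM.
[0049]
Noise model variation detection section 303 determines whether the analysis parameters obtained by noise signal analysis section 301 for the current input signal are plausible as an output of the stored noise model, which represents the preceding input signal (for an HMM, for example, whether the output probability of the analysis parameters for the current input signal is at or above a specified value). When it determines that the noise model parameters that should represent the currently input noise signal have changed from the stored noise model, it outputs to noise model updating section 304 a flag indicating whether the noise model is to be updated and the information to be updated (update information).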
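The plausibility test can be sketched as a likelihood threshold. The sketch below deliberately simplifies the HMM mentioned in the text to a single state with a diagonal Gaussian output distribution. The names `log_likelihood` and `model_changed` and the threshold value are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(features, mean, var):
    """Average per-dimension log density of features under a diagonal Gaussian."""
    total = 0.0
    for x, m, v in zip(features, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(features)

def model_changed(features, mean, var, threshold=-3.0):
    """True when the stored model no longer explains the current frame.

    threshold is a hypothetical tuning value, not a figure from the patent.
    """
    return log_likelihood(features, mean, var) < threshold

stored_mean = [0.0, 1.0, 2.0]
stored_var = [1.0, 1.0, 1.0]
same = model_changed([0.1, 0.9, 2.2], stored_mean, stored_var)
different = model_changed([5.0, 6.0, 7.0], stored_mean, stored_var)
```

With a multi-state HMM, the same comparison would be applied to the output probability of the observation given the model rather than to a single Gaussian density.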
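[0050]
The external update permission flag is a flag by which an external source indicates whether noise model updating is permitted. In the speech coding section of the present invention described later, updating of the noise model is prohibited when noise model parameters are not to be transmitted, for example while coded data for a speech segment is being transmitted.
[0051]
When the noise model update flag indicates an update, noise model updating section 304 outputs, as noise model update information, either the updated noise model parameters or only the difference from the noise model parameters previously stored in noise model storage section 302, and uses this output to update noise model storage section 302. When the noise model update flag indicates no update, no update is performed and no update information is output.
[0052]
Next, speech decoding section 109 shown in FIG. 1 has the configuration shown in FIG. 4. FIG. 4 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
[0053]
Demultiplexing/DTX controller 401 receives, as received data, the transmission data encoded from the input signal and transmitted by the coding side, and separates the received data into the items needed for speech decoding and noise generation: speech coded data or noise model parameters, the speech/non-speech determination flag, and the noise model update flag.
[0054]
When the speech/non-speech determination flag indicates a speech segment, speech decoder 402 decodes the speech coded data and outputs the decoded speech to output switch 404.
[0055]
When the speech/non-speech determination flag indicates a non-speech segment, noise signal generator 403 generates a noise signal from the noise model parameters and the noise model update flag and outputs the noise signal to output switch 404. Output switch 404 then switches between the output of speech decoder 402 and the output of noise signal generator 403 according to the speech/non-speech determination flag and outputs the result as the output signal.
[0056]
Noise signal generator 403 in FIG. 4 has the configuration shown in FIG. 5. FIG. 5 is a block diagram showing the configuration of the noise signal generator of the speech decoding apparatus according to Embodiment 1 of the present invention.
[0057]
The noise model update flag and the noise model parameters (when the model is updated) output from noise signal coder 203 shown in FIG. 3 are input to noise model updating section 501. When the noise model update flag indicates an update, noise model updating section 501 updates the noise model using the input noise model parameters and the previous noise model parameters held in noise model storage section 502, and the updated noise model parameters are newly stored in noise model storage section 502.
[0058]
Noise signal generation section 503 generates and outputs a noise signal based on the information in noise model storage section 502. The noise is generated, based on information modeled with statistical features as parameters, so that the generated noise signal is a plausible output of that model. For example, when an HMM is used as the statistical model, the signal parameters needed for generation (for example, the short-time spectrum) are output stochastically according to the state transition probabilities, the parameter output probabilities, and so on, and the noise signal is generated and output based on them.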
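As a rough illustration of stochastic generation from an HMM, the sketch below samples a state path from transition probabilities, draws one log power spectrum per frame from a per-state diagonal Gaussian, and turns each spectrum into samples using random phases. The model layout, the function names, and the random-phase synthesis are assumptions for illustration. The patent does not specify how the waveform is built from the generated parameters.

```python
import math
import random

def generate_spectra(transitions, means, variances, n_frames, rng, state=0):
    """Sample a state path and one log-spectrum vector per frame."""
    frames = []
    for _ in range(n_frames):
        mean, var = means[state], variances[state]
        # Draw the frame's parameters from the current state's output distribution.
        frames.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)])
        # Move to the next state according to the transition probabilities.
        state = rng.choices(range(len(transitions)), weights=transitions[state])[0]
    return frames

def synthesize_frame(log_spectrum, n, rng):
    """Turn a log power spectrum (bins 0..n/2) into n samples.

    Each bin becomes a cosine with a random phase; the amplitude scaling is
    simplified and is not an exact inverse DFT.
    """
    samples = [0.0] * n
    for k, log_p in enumerate(log_spectrum):
        amp = math.sqrt(math.exp(log_p))
        phase = rng.uniform(0, 2 * math.pi)
        for i in range(n):
            samples[i] += amp * math.cos(2 * math.pi * k * i / n + phase) / n
    return samples

rng = random.Random(0)  # fixed seed so the sketch is repeatable
transitions = [[0.9, 0.1], [0.2, 0.8]]
means = [[0.0] * 5, [1.0] * 5]
variances = [[0.1] * 5, [0.2] * 5]
frames = generate_spectra(transitions, means, variances, 4, rng)
noise = [s for f in frames for s in synthesize_frame(f, 8, rng)]
```

Because only the model parameters are needed to run this generator, the decoder can keep producing noise for as long as the stored model is valid, without receiving any per-frame waveform data.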
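[0059]
Next, the operation of the speech coding section and speech decoding section configured as above is described. FIG. 6 is a flow diagram showing the processing flow of the speech signal coding method according to Embodiment 1. In this method, the processing shown in FIG. 6 is repeated for each frame of a fixed short interval (for example, about 10 to 50 ms).
[0060]
First, in step (hereinafter abbreviated as ST) 101, a frame of the speech signal is input. Next, in ST102, speech/non-speech determination is performed on the input signal and the result is output. When the result is speech, speech coding is performed on the input speech signal in ST104 and the coded data is output.
[0061]
When the result in ST103 is non-speech, in ST105 the noise signal coder performs noise signal coding on the input signal and outputs the information on the noise model expressing the input noise signal together with a flag indicating whether the noise model is to be updated. The noise signal coding processing is described later.
[0062]
In ST106, the outputs obtained from the speech/non-speech determination, the speech coding, and the noise signal coding are used to control which information is to be transmitted and to multiplex the transmission information, and finally, in ST107, the result is output as transmission data.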
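The per-frame flow of FIG. 6 can be sketched as a single function. The packet layout, the function names, and the energy-threshold detector used in the example are hypothetical stand-ins. The patent leaves the speech/non-speech identifier and the speech coder open ("any may be used").

```python
def encode_frame(frame, is_speech, encode_speech, encode_noise):
    """One pass of the per-frame flow: decide, encode, multiplex."""
    voiced = is_speech(frame)                          # ST102: speech/non-speech decision
    packet = {"voiced": voiced}
    if voiced:
        packet["speech"] = encode_speech(frame)        # ST104: speech coding
    else:
        update_flag, model_info = encode_noise(frame)  # ST105: noise signal coding
        packet["update"] = update_flag
        if update_flag:
            packet["noise_model"] = model_info         # sent only when the model changed
    return packet                                      # ST106-ST107: multiplexed output

def energy_vad(frame, threshold=0.01):
    """Toy detector based on frame power only (an assumption for this example)."""
    return sum(x * x for x in frame) / len(frame) > threshold

loud = encode_frame([0.5, -0.5, 0.5, -0.5], energy_vad,
                    lambda f: list(f), lambda f: (True, {"power": 0.25}))
quiet = encode_frame([0.001, -0.001, 0.001, -0.001], energy_vad,
                     lambda f: list(f), lambda f: (True, {"power": 1e-6}))
```

Passing the three components in as callables mirrors the text's point that the identifier and the speech coder are interchangeable.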
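[0063]
FIG. 7 is a flow diagram showing the processing flow of the noise signal coding method within the speech signal coding method according to this embodiment. In this method, the processing shown in FIG. 7 is repeated for each frame of a fixed short interval (for example, about 10 to 50 ms).
[0064]
In ST201, a frame of the noise signal is input. Next, in ST202, signal analysis is performed on the frame of the noise signal and analysis parameters for the noise signal are calculated. In ST203, whether the noise model has changed is detected from the analysis parameters. When the noise model is determined to have changed, in ST205 a flag indicating whether the noise model is to be updated (update) and the information to be updated (update information) are output, and in ST206 noise model storage section 302 is updated using that output.
[0065]
When it is determined in ST204 that the noise model has not changed, only the flag indicating whether the noise model is to be updated (no update) is output in ST207. In ST203, when the separately input external update permission flag indicates that updating is not permitted, the model is treated as unchanged so that noise model parameters are not transmitted.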
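The flow of FIG. 7 can be sketched as a small class that holds the stored model, checks for a change, and emits an update only when one is detected and permitted. The class name, the mean-squared-distance change test (a simple substitute for the output-probability test), the threshold, and the choice to send the difference from the stored parameters are all assumptions for illustration. Replacing the stored mean with the current frame is likewise a simplification.

```python
class NoiseModelCoder:
    """Toy noise model coder: stores a mean feature vector and sends deltas."""

    def __init__(self, initial_mean, threshold=1.0):
        self.mean = list(initial_mean)   # stored noise model (storage section)
        self.threshold = threshold       # hypothetical change threshold

    def encode(self, features, update_allowed=True):
        """Return (update_flag, delta); delta is None when nothing is sent."""
        # ST203: compare the current analysis parameters with the stored model.
        dist = sum((x - m) ** 2 for x, m in zip(features, self.mean)) / len(features)
        if not update_allowed or dist <= self.threshold:
            return False, None                                  # ST207: flag only
        delta = [x - m for x, m in zip(features, self.mean)]
        self.mean = [m + d for m, d in zip(self.mean, delta)]   # ST206: update storage
        return True, delta                                      # ST205: flag and update info

coder = NoiseModelCoder([0.0, 0.0, 0.0])
r1 = coder.encode([0.1, 0.2, 0.1])                        # small change: nothing sent
r2 = coder.encode([3.0, 3.0, 3.0])                        # large change: delta sent
r3 = coder.encode([9.0, 9.0, 9.0], update_allowed=False)  # update prohibited externally
```

A decoder that applies the same delta to its own copy of the model stays in step with the coder, which is why only frames with a detected change cost any bits.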
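[0066]
Thus, according to the noise coding method of this embodiment, modeling the noise signal with a noise model that can express it through statistical features makes it possible to generate a decoded signal with little perceptual degradation for background noise signals. In addition, faithful coding of the input signal waveform is unnecessary, and information is transmitted only in segments where the noise model parameters corresponding to the input signal change, so highly efficient coding at a low bit rate is possible.
[0067]
Also, according to the speech signal coding method of this embodiment, speech segments are encoded by a speech coder that can encode speech signals with high quality, and non-speech segments are encoded by a noise signal coder that is highly efficient and causes little perceptual degradation, so high-quality, high-efficiency coding is possible even in background noise environments.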
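[0068]
(Embodiment 2)
FIG. 8 is a block diagram showing the configuration of the speech signal coding section according to Embodiment 2 of the present invention.
[0069]
In this speech coding section 103, speech/noise signal separator 801 separates the input speech signal into a speech signal and the background noise signal superimposed on it. Any speech/noise signal separator may be used. Possible separation methods include spectral subtraction, which subtracts the noise spectrum from the input signal in the frequency domain to separate the input into a noise-suppressed speech signal and a noise signal, and methods that separate speech and noise using input signals from multiple signal input devices.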
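Spectral subtraction, named above as one possible separation method, can be sketched as follows. This is a basic magnitude-subtraction variant with a zero floor that reuses the noisy phase. The function names, the direct DFT, and the assumption that a per-bin noise magnitude estimate `noise_mag` is already available are illustrative choices, not details from the patent.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)).real / n
            for i in range(n)]

def spectral_subtraction(frame, noise_mag):
    """Split a frame into (speech_estimate, noise_estimate).

    noise_mag holds an estimated noise magnitude for each DFT bin and should
    be symmetric (noise_mag[k] == noise_mag[n - k]) for a real-valued result.
    """
    spec = dft(frame)
    speech_spec = []
    for s, nm in zip(spec, noise_mag):
        mag = abs(s)
        clean = max(mag - nm, 0.0)             # subtract the noise magnitude, floor at zero
        scale = clean / mag if mag > 0 else 0.0
        speech_spec.append(s * scale)          # keep the phase of the noisy input
    speech = idft(speech_spec)
    noise = [x - y for x, y in zip(frame, speech)]  # the remainder is the noise estimate
    return speech, noise

frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
speech, noise = spectral_subtraction(frame, [0.05] * 8)
```

Defining the noise estimate as the input minus the speech estimate guarantees that the two outputs always sum back to the input frame.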
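[0070]
Next, speech/non-speech identifier 802 determines, from the separated speech signal obtained from speech/noise signal separator 801, whether the segment is a speech segment or a non-speech segment (a segment containing only noise), and outputs the result to speech coder 803 and DTX control/multiplexer 805. A configuration that makes the determination using the input signal before separation is also possible. Any speech/non-speech identifier may be used. In general, the determination uses the instantaneous values or the amounts of change of several parameters, such as the power, spectrum, and pitch period of the input signal.
[0071]
When the result from speech/non-speech identifier 802 is speech, speech coder 803 encodes the separated speech signal obtained from speech/noise signal separator 801, for speech segments only, and outputs the coded data to DTX control/multiplexer 805. Speech coder 803 is a coder for speech segments, and any coder that encodes speech with high efficiency may be used.
[0072]
Meanwhile, noise signal coder 804 encodes the separated noise signal obtained from speech/noise signal separator 801 over all segments, and outputs the information on the noise model expressing the input noise signal together with a flag indicating whether the noise model is to be updated. Noise signal coder 804 is the one shown in FIG. 3 and described in Embodiment 1.
[0073]
When the speech/non-speech determination result is speech, the speech/non-speech determination result flag input to noise signal coder 804 is used as the noise model update prohibition flag of noise signal coder 804, so that the model is not updated.
[0074]
Finally, DTX control/multiplexer 805 uses the outputs of speech/non-speech identifier 802, speech coder 803, and noise signal coder 804 to control which information is to be transmitted, multiplexes the transmission information, and outputs it as transmission data.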
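[0075]
FIG. 9 is a block diagram showing the configuration of the speech signal decoding apparatus according to Embodiment 2.
In the decoding apparatus shown in FIG. 9, demultiplexing/DTX controller 901 receives, as received data, the transmission data encoded from the input signal and transmitted by the coding side, and separates it into the items needed for speech decoding and noise generation: speech coded data or noise model parameters, the speech/non-speech determination flag, and the noise model update flag.
[0076]
Next, when the speech/non-speech determination flag indicates a speech segment, speech decoder 902 decodes the speech coded data and outputs the decoded speech to speech/noise signal adder 904.
[0077]
Meanwhile, noise signal generator 903 generates a noise signal from the noise model parameters and the noise model update flag and outputs the noise signal to speech/noise signal adder 904. Speech/noise signal adder 904 then adds the output of speech decoder 902 and the output of noise signal generator 903 to form the output signal.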
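The difference from the Embodiment 1 decoder is that the two outputs are summed rather than switched. A minimal sketch, assuming a hypothetical packet layout and assuming that the speech path contributes silence during non-speech segments (the patent does not state what the speech decoder outputs in that case):

```python
def decode_frame_add(packet, decode_speech, generate_noise, frame_len):
    """Sum decoded speech and generated noise for one frame."""
    if packet["voiced"]:
        speech = decode_speech(packet["speech"])
    else:
        speech = [0.0] * frame_len   # assumption: silent speech path in non-speech segments
    # Noise is generated in every frame from the current (possibly updated) model.
    noise = generate_noise(packet.get("noise_model"), frame_len)
    return [s + n for s, n in zip(speech, noise)]

def flat_noise(model, n):
    """Stand-in generator returning a constant; a real one would sample the noise model."""
    return [0.5] * n

voiced_out = decode_frame_add({"voiced": True, "speech": [1.0, 2.0]},
                              lambda data: list(data), flat_noise, 2)
unvoiced_out = decode_frame_add({"voiced": False}, lambda data: list(data), flat_noise, 2)
```

Summation suits this embodiment because the coder removed the background noise from the speech before coding it, so the decoder has to add the noise back during speech segments as well.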
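[0078]
Next, the processing flow of the speech signal coding method according to Embodiment 2 is described with reference to FIG. 10. In this method, the processing shown in FIG. 10 is repeated for each frame of a fixed short interval (for example, about 10 to 50 ms).
[0079]
First, in ST301, a frame of the input signal is input. Next, in ST302, the input speech signal is separated into a speech signal and the background noise signal superimposed on it. In ST303, speech/non-speech determination is performed on the input signal or on the separated speech signal obtained in ST302, and the result is output (ST304).
[0080]
When the result is speech, in ST305 the speech coder performs speech coding on the separated speech signal obtained in ST302 and outputs the coded data. Next, in ST306, the noise signal coder performs noise signal coding on the separated noise signal obtained in ST302 and outputs the information on the noise model expressing the input noise signal together with a flag indicating whether the noise model is to be updated.
[0081]
When the speech/non-speech determination result in ST303 is speech, the model is not updated in the noise signal coding performed in ST306. In ST307, the outputs obtained from the speech/non-speech determination, the speech coding, and the noise signal coding are used to control which information is to be transmitted and to multiplex the transmission information, and finally, in ST308, the result is output as transmission data.
[0082]
Thus, according to the speech signal coding apparatus of this embodiment, speech segments are encoded by a speech coder that can encode speech signals with high quality, and the noise signal is encoded by the noise signal coder of Embodiment 1, which is highly efficient and causes little perceptual degradation, so high-quality, high-efficiency coding is possible even in background noise environments. Furthermore, providing the speech/noise signal separator removes the superimposed background noise from the speech signal input to the speech coder, so speech segments can be encoded with higher quality or higher efficiency.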
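[0083]
(Embodiment 3)
FIG. 11 is a block diagram showing the configuration of the speech coding section according to Embodiment 3 of the present invention. The configuration of the decoding side in this embodiment is the same as that of the speech signal decoding apparatus shown in FIG. 4.
[0084]
Input signal analyzer 1101 performs signal analysis on the input signal received in each fixed interval and calculates analysis parameters for the input signal. The feature parameters extracted are those needed to express the statistical features of the input signal and those expressing speech-like characteristics. The parameters needed to express the statistical features include, for example, the short-time spectrum obtained by applying an FFT to a short-segment signal, the input power, and LPC spectral parameters. The parameters expressing speech-like characteristics include LPC parameters, input power, and pitch periodicity information.
[0085]
Next, mode determiner 1104 takes the analysis parameters obtained by input signal analyzer 1101 and, using the speech feature patterns held in speech model storage 1102 and the noise model parameters held in noise model storage 1103, determines whether the input signal is a speech segment or a non-speech segment (a segment containing only noise) and, in the case of a non-speech segment, whether to update the noise model and transmit the update information.
[0086]
Speech model storage 1102 holds speech feature patterns that are created and stored in advance. These patterns are, for example, information such as the distributions of LPC parameters, input signal power, and pitch periodicity information within speech segments. The noise model parameters are information on a noise model capable of expressing the statistical features of the input noise signal, for example the information obtained when statistical features such as the mean spectrum and variance of the short-time spectrum are expressed with a statistical model such as an HMM.
[0087]
It is then determined whether the statistical analysis parameters obtained by input signal analyzer 1101 for the current input signal are plausible as an output of the stored noise model, which represents the signal in the preceding noise segments (for an HMM, for example, whether the output probability of the analysis parameters for the current input signal is at or above a specified value), and whether the input is a speech segment is determined from the parameters expressing the speech characteristics of the input signal.
[0088]
When mode determiner 1104 determines a speech segment, speech coder 1105 performs speech coding on the input signal and outputs the coded data to DTX control/multiplexer 1107. When mode determiner 1104 determines a non-speech segment in which noise model update information is to be transmitted, noise model updater 1106 updates the noise model and outputs the updated information on the noise model to DTX control/multiplexer 1107.
[0089]
Finally, DTX control/multiplexer 1107 uses the outputs of the speech coder and noise model updater 1106 to control which information is to be transmitted, multiplexes the transmission information, and outputs the transmission data.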
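[0090]
Next, the processing flow of the speech signal coding method according to this embodiment is described with reference to FIG. 12. In this method, the processing shown in FIG. 12 is repeated for each frame of a fixed short interval (for example, about 10 to 50 ms).
[0091]
First, in ST401, a frame of the input signal is input. Next, in ST402, signal analysis is performed on the input signal received in each fixed interval, and the analysis parameters are calculated and output.
[0092]
In ST403, it is determined whether the currently input statistical analysis parameters are plausible as an output of the noise model held in noise model storage 1103 in FIG. 11, that is, whether they fit the model (ST404). If they do not fit, meaning that the current input signal cannot be expressed by the currently held noise model, processing proceeds to ST405, where whether the input is a speech segment is determined from the speech feature parameters obtained by analyzing the input signal. When a speech segment is determined, in ST406 the speech coder performs speech coding and outputs the coded data.
[0093]
When ST405 determines that the input is not a speech segment, in ST407 the noise model is updated and the updated information on the noise model is output. When ST403 determines that the current input can be expressed by the currently held noise model, no processing is performed and processing proceeds to the next step. In ST408, the outputs of the speech coder and the noise model updater are used to control which information is to be transmitted and to multiplex the transmission information, and in ST409 the transmission data is output.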
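The decision order of FIG. 12 can be summarized in a short sketch. The function name, the return labels, and the two boolean inputs are hypothetical. In practice the inputs would come from a noise-model fit test and a comparison with the stored speech feature patterns.

```python
def decide_mode(fits_noise_model, looks_like_speech):
    """Choose the action for one frame, following the order of ST403 to ST407."""
    if fits_noise_model:
        return "no_update"       # ST403/ST404: the stored noise model still explains the input
    if looks_like_speech:
        return "speech"          # ST405 then ST406: encode with the speech coder
    return "update_noise"        # ST405 then ST407: update and transmit the noise model

modes = [decide_mode(fit, speech)
         for fit in (True, False)
         for speech in (True, False)]
```

Checking the noise-model fit first means that a frame the stored model already explains is never sent to the speech coder, whatever the speech-likeness test would have said.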
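[0094]
Thus, according to the speech signal coding apparatus of this embodiment, providing a mode determiner makes it possible to base the determination on both changes in the statistical features of the input signal and the speech feature patterns. The mode determination is therefore more accurate, and quality degradation caused by determination errors can be suppressed.
[0095]
[Effects of the invention]
As described above, in the noise signal coding apparatus of the present invention, modeling the noise signal with a noise model that can express it through statistical features makes it possible to generate a decoded signal with little perceptual degradation for background noise signals. In addition, since faithful coding of the input signal waveform becomes unnecessary, information is transmitted only in segments where the noise model parameters corresponding to the input signal change, so highly efficient coding at a low bit rate is possible.
[0096]
Also, in the speech signal coding apparatus of the present invention, speech segments are encoded by a speech coder that can encode speech signals with high quality, and non-speech segments are encoded by the noise signal coder described above, which is highly efficient and causes little perceptual degradation, so high-quality, high-efficiency coding is possible even in background noise environments.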
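[Brief description of the drawings]
[FIG. 1] Block diagram showing the configuration of a radio communication system equipped with the speech signal coding apparatus and speech signal decoding apparatus according to an embodiment of the present invention
[FIG. 2] Block diagram showing the configuration of the speech signal coding apparatus according to Embodiment 1 of the present invention
[FIG. 3] Block diagram showing the configuration of the noise signal coding apparatus according to Embodiment 1 of the present invention
[FIG. 4] Block diagram showing the configuration of the speech signal decoding apparatus according to Embodiment 1 of the present invention
[FIG. 5] Block diagram showing the configuration of the noise signal generator in the speech signal decoding apparatus according to Embodiment 1 of the present invention
[FIG. 6] Flow diagram showing the processing flow of the speech signal coding method according to Embodiment 1 of the present invention
[FIG. 7] Flow diagram showing the processing flow of the noise signal coding method according to Embodiment 1 of the present invention
[FIG. 8] Block diagram showing the configuration of the speech signal coding apparatus according to Embodiment 2 of the present invention
[FIG. 9] Block diagram showing the configuration of the speech signal decoding apparatus according to Embodiment 2 of the present invention
[FIG. 10] Flow diagram showing the processing flow of the speech signal coding method according to Embodiment 2 of the present invention
[FIG. 11] Block diagram showing the configuration of the speech signal coding apparatus according to Embodiment 3 of the present invention
[FIG. 12] Flow diagram showing the processing flow of the speech signal coding method according to Embodiment 3 of the present invention
[FIG. 13] Block diagram showing the configuration of a conventional speech signal coding apparatus
[FIG. 14] Block diagram showing the configuration of a conventional speech signal coding apparatus
[Description of reference numerals]
201 Speech/non-speech identifier
202 Speech coder
203 Noise signal coder
204 DTX control/multiplexer
301 Noise signal analysis section
302, 502 Noise model storage section
303 Noise model variation detection section
304, 501 Noise model updating section
401 Demultiplexing/DTX controller
402 Speech decoder
403 Noise signal generator
404 Output switch
503 Noise signal generation section
[0001]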
BACKGROUND OF THE INVENTION
The present invention relates to a low bit rate audio signal encoding apparatus used for applications such as a mobile communication system and an audio recording apparatus that encode and transmit an audio signal.
[0002]
[Prior art]
In the fields of digital mobile communications and voice storage, voice coding apparatuses that compress voice information and code it at a low bit rate are used to make effective use of radio waves and storage media. Conventional examples include the CS-ACELP coding scheme of ITU-T Recommendation G.729 ("Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP)") and the CS-ACELP coding scheme with DTX (Discontinuous Transmission) control of ITU-T Recommendation G.729 Annex B ("A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70").
[0003]
FIG. 13 is a block diagram showing a configuration of a conventional CS-ACELP encoding system encoding apparatus. In FIG. 13, the LPC analysis / quantizer 1 performs LPC (linear prediction) analysis and quantization on the input speech signal, and outputs LPC coefficients and LPC quantized codes.
[0004]
Then, the adaptive excitation signal and the fixed excitation signal extracted from the adaptive excitation codebook 2 and the fixed excitation codebook 3 are multiplied by the gain extracted from the gain codebook 4 and added, and speech synthesis is performed by the LPC synthesis filter 7. The error signal with respect to the input signal is weighted by the auditory weighting filter 9, and the adaptive excitation code, fixed excitation code, and gain code that minimize the error after weighting are output as encoded data together with the LPC quantization code. In FIG. 13, 5 is a multiplier, 6 is an adder, and 8 is a subtractor.
[0005]
FIG. 14 is a block diagram showing the configuration of a conventional CS-ACELP encoding system encoding apparatus with DTX control. First, the voice / silence determination unit 11 determines whether the input signal is a voiced section or a silent section (section with only background noise). When the sound / silence determination unit 11 determines that there is sound, the CS-ACELP speech encoder 12 performs speech encoding of the sound section. The CS-ACELP speech encoder 12 has a configuration shown in FIG.
[0006]
On the other hand, when the sound / silence determination unit 11 determines that there is no sound, the silence section encoder 13 performs encoding. The silent section encoder 13 calculates LPC coefficients and LPC prediction residual energy of the input signal that are the same as those of the speech section from the input signal, and outputs them as encoded data of the silent section.
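The two quantities that the conventional silence-section encoder transmits, LPC coefficients and the LPC prediction residual energy, are commonly obtained together from the frame's autocorrelation with the Levinson-Durbin recursion. The sketch below shows that standard recursion for illustration. It is not taken from G.729 Annex B, and the function names and the sign convention (prediction error e[n] = x[n] + a[1]x[n-1] + ...) are choices made here.

```python
def autocorrelation(x, order):
    """Autocorrelation of x for lags 0..order."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x)))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Return (a, err): coefficients a[0..order] with a[0] = 1, and residual energy.

    Assumes r[0] > 0 and a valid autocorrelation sequence.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient for stage m
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)                # residual energy shrinks at each stage
    return a, err

# Example: a short decaying signal, analysed with a second-order predictor.
x = [1.0, 0.9, 0.81, 0.729, 0.6561, 0.59049]
coeffs, residual_energy = levinson_durbin(autocorrelation(x, 2), 2)
```

The residual energy is what remains after the predictor has removed the spectral envelope, which is why it can serve as a compact level parameter for the noise.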
[0007]
The DTX control / multiplexer 14 controls and multiplexes data to be transmitted as transmission data from the outputs of the voice / silence determination unit 11, the CS-ACELP speech encoder 12 and the silence segment encoder 13. Output as transmission data.
[0008]
[Problems to be solved by the invention]
However, in the conventional CS-ACELP encoder, the speech encoder exploits redundancy specific to speech in order to code at a low bit rate of 8 kbps. High-quality encoding is therefore possible when a clean speech signal without superimposed background noise is input, but when a speech signal with ambient background noise superimposed on it is input, the quality of the decoded signal deteriorates when the background noise signal is encoded.
[0009]
Further, in the conventional CS-ACELP encoder with DTX control, only the speech sections are encoded by the CS-ACELP encoder, and the silent sections (noise-only sections) are encoded by a dedicated silence section encoder at a lower bit rate than the speech encoder, which reduces the average bit rate for transmission. However, the silence section encoder uses the same signal model as the speech encoder: the decoded signal is generated by driving an AR-type synthesis filter (LPC synthesis filter) with a noise signal in each short section (about 10 to 50 ms). As with the conventional CS-ACELP encoder, the quality of the decoded signal therefore deteriorates for a speech signal on which background noise is superimposed.
[0010]
The present invention has been made in view of the above points, and its object is to provide a speech signal encoding apparatus and decoding apparatus that cause little degradation in the quality of the decoded signal even for a speech signal on which background noise is superimposed, and that also reduce the average bit rate necessary for transmission.
[0011]
[Means for Solving the Problems]
The essence of the present invention is to calculate statistical feature quantities for the input signal in a silent section (noise-only section), store information on a noise model that can express the statistical feature quantities of the input noise signal, detect whether the noise model parameters representing the input noise signal have changed, and update the noise model. This keeps degradation in the quality of the decoded signal small even for speech signals with superimposed background noise, and also reduces the average bit rate required for transmission.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
  The noise signal encoding apparatus according to the first aspect of the present invention comprises: analysis means for performing signal analysis on the noise signal of a speech signal including a noise signal; storage means for storing information on a noise model representing the noise signal; detection means for detecting, based on the signal analysis result of the currently input noise signal, a change in the stored information on the noise model; and updating means for updating and outputting the stored information on the noise model when a change in the information on the noise model is detected. The analysis means extracts a statistical feature quantity of the noise signal by the signal analysis, and the updating means updates the information on the noise model, which is information expressing the statistical feature quantity with a statistical model and from which the decoding side can stochastically output, based on the statistical model, the signal parameters necessary for generating a noise signal.
[0013]
  The speech signal encoding apparatus according to the second aspect of the present invention comprises: speech/silence determination means for determining whether an input speech signal is a speech section or a silent section including only a noise signal; speech encoding means for performing speech encoding on the input speech signal when the determination result is speech; the noise signal encoding apparatus according to the first aspect, which encodes the noise signal of the input signal when the determination result is silence; and multiplexing means for multiplexing the outputs of the speech/silence determination means, the speech encoding means, and the noise signal encoding apparatus.
[0014]
  The speech signal encoding apparatus according to the third aspect of the present invention comprises a speech / noise signal separating means for separating an input speech signal into a speech signal and a background noise signal superimposed on the speech signal, and the input speech A voice / silence determination means for determining whether the voice section is a voiced section or a silent section including only a noise signal from a signal or a voice signal obtained by the voice / noise signal separating means; and the input voice when the judgment result is voiced A speech encoding unit that performs speech encoding on a signal; a noise signal encoding device according to a first aspect that encodes a background noise signal obtained by the speech / noise signal separation unit; It adopts a configuration comprising: a determination unit, the speech encoding unit, and a multiplexing unit that multiplexes outputs from the noise signal encoding device.
[0015]
  The speech signal encoding apparatus according to the fourth aspect of the present invention comprises an analysis means for performing signal analysis on an input speech signal, and speech necessary for determining whether the input speech signal is a speech signal. Voice model storage means for storing the feature pattern, noise model storage means for storing information on a noise model representing a noise signal included in the input voice signal, the analysis means, the voice model storage means, and the noise model A mode determination means for determining whether the input speech signal is a sound section or a silence section including only a noise signal, and determining whether to update a noise model in the case of the silence section, using the output of the storage means A speech encoding means for performing speech encoding on the input speech signal when the mode determination means determines that it is a voiced section, and the mode determination means is a silent section and A noise model updating means for updating and outputting the noise model when it is determined to update the sound model; and a multiplexing means for multiplexing the output from the speech coding means and the noise model updating means. The analysis unit extracts a statistical feature amount related to the input speech signal by the signal analysis, and the noise model update unit is information representing the statistical feature amount in a statistical model. A configuration is adopted in which the information related to the noise model, which is information that can be output probabilistically based on the statistical model, is a signal parameter necessary for generating a noise signal.
[0016]
  The noise signal generation device according to the fifth aspect of the present invention updates the noise model when necessary in accordance with the noise model parameter and noise model update flag encoded on the input noise signal on the encoding side. A noise model update unit, a noise model storage unit that stores information about the updated noise model using the output of the noise model update unit, and a noise signal from the information about the noise model stored in the noise model storage unit Noise signal generating means for generating, wherein the noise model storage means stores information representing a statistical feature quantity in a statistical model as information relating to the noise model, and the noise signal generating means includes the statistical model The signal parameter is stochastically output based on the above, and a noise signal is generated from the output signal parameter.
[0017]
  The speech signal decoding apparatus according to the sixth aspect of the present invention comprises: separating means for receiving a signal including speech data encoded on the encoding side, a noise model parameter, a speech/silence determination flag, and a noise model update flag, and separating the noise model parameter, the speech/silence determination flag, and the noise model update flag from the signal; speech decoding means for performing speech decoding on the speech data when the speech/silence determination flag indicates a speech section; the noise signal generation device according to the fifth aspect, which generates a noise signal from the noise model parameter and the noise model update flag when the speech/silence determination flag indicates a silent section; and output switching means for switching, according to the speech/silence determination flag, between the decoded speech output from the speech decoding means and the noise signal output from the noise signal generation device, and outputting the selected signal as the output signal.
[0018]
  The speech signal decoding apparatus according to the seventh aspect of the present invention comprises: separating means for receiving a signal including speech data encoded on the encoding side, a noise model parameter, a speech/silence determination flag, and a noise model update flag, and separating the noise model parameter, the speech/silence determination flag, and the noise model update flag from the signal; speech decoding means for performing speech decoding on the speech data when the speech/silence determination flag indicates a speech section; the noise signal generation device according to the fifth aspect, which generates a noise signal from the noise model parameter and the noise model update flag when the speech/silence determination flag indicates a silent section; and speech/noise signal adding means for adding the decoded speech output from the speech decoding means and the noise signal output from the noise signal generation device.
[0019]
  The speech signal encoding method according to the eighth aspect of the present invention includes a speech / silence determination step for determining whether the input speech signal is a speech interval or a silence interval including only a noise signal, and the determination result is sound. A speech encoding step of performing speech encoding on the input speech signal in the case of a noise signal encoding step of encoding a noise signal on the input signal when a determination result is silent, A multiplexing step for multiplexing outputs in the voice / silence determination step, the voice encoding step, and the noise signal encoding step, and the noise signal encoding step includes a voice signal including a noise signal. An analysis step of performing signal analysis on the noise signal, a storage step of storing information on a noise model representing the noise signal, and a stored noise model based on a signal analysis result of the noise signal of the current input Of information A detection step of detecting the change, and an update step of updating the stored noise model information by the change amount of the change when a change in the information related to the noise model is detected, the analysis step comprising: A statistical feature amount relating to a noise signal is extracted by the signal analysis, and the updating step is information representing the statistical feature amount in a statistical model, and a signal parameter necessary for generating a noise signal on the decoding side. The information related to the noise model, which is information that can be output stochastically based on the statistical model, is updated.
[0020]
  The speech signal encoding method according to the ninth aspect of the present invention comprises: a speech/noise signal separation step of separating an input speech signal into a speech signal and a background noise signal superimposed on the speech signal; a voiced/silent determination step of determining, from the input speech signal or from the speech signal obtained in the speech/noise signal separation step, whether the section is a voiced section or a silent section containing only a noise signal; a speech encoding step of performing speech encoding on the input speech signal when the determination result is voiced; a noise signal encoding step of encoding a noise signal for the input signal when the determination result is silent, and of encoding the background noise signal obtained in the speech/noise signal separation step; and a multiplexing step of multiplexing the outputs of the voiced/silent determination step, the speech encoding step, and the noise signal encoding step. The noise signal encoding step includes: an analysis step of performing signal analysis on the noise signal of the speech signal including the noise signal; a storage step of storing information on a noise model representing the noise signal; a detection step of detecting a change in the stored noise model information based on the signal analysis result of the currently input noise signal; and an update step of updating the stored noise model information by the amount of the change when a change in the noise model information is detected. The analysis step extracts a statistical feature amount of the noise signal by the signal analysis. The update step updates the noise model information, which expresses the statistical feature amount with a statistical model and from which the signal parameters needed to generate a noise signal on the decoding side can be output stochastically based on that statistical model.
[0021]
  The speech signal encoding method according to the tenth aspect of the present invention comprises: an analysis step of performing signal analysis on an input speech signal; a speech model storage step of storing the speech feature patterns needed to determine whether the input speech signal is a speech signal; a noise model storage step of storing information on a noise model representing the noise signal included in the input speech signal; a mode determination step of using the outputs of the analysis step, the speech model storage step, and the noise model storage step to determine whether the input speech signal is a voiced section or a silent section containing only a noise signal and, in the case of a silent section, whether to update the noise model; a speech encoding step of performing speech encoding on the input speech signal when the mode determination step determines a voiced section; a noise model updating step of updating the noise model when the mode determination step determines a silent section in which the noise model is to be updated; and a multiplexing step of multiplexing the outputs of the speech encoding step and the noise model updating step. The analysis step extracts a statistical feature amount of the input speech signal by the signal analysis. The noise model updating step updates the noise model information, which expresses the statistical feature amount with a statistical model and from which the signal parameters needed to generate a noise signal on the decoding side can be output stochastically based on that statistical model.
[0022]
  According to an eleventh aspect of the present invention, there is provided a machine-readable recording medium on which is recorded a program for causing a computer to execute: a procedure of performing signal analysis on an input noise signal to extract a statistical feature amount of the noise signal; a procedure of storing, as information on a noise model, information that expresses the statistical feature amount of the input noise signal with a statistical model; a procedure of detecting a change in the noise model representing the input noise signal; and a procedure of, when a change in the noise model information is detected, updating the noise model information, from which the signal parameters needed to generate a noise signal on the decoding side can be output stochastically based on the statistical model, and outputting the updated noise model information.
[0038]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing a configuration of a radio communication apparatus provided with a speech signal encoding apparatus according to Embodiment 1 of the present invention.
[0039]
In this wireless communication device, on the transmission side, voice is converted into an electrical analog signal by a voice input device 101 such as a microphone and output to an A/D converter 102. The A/D converter 102 converts the analog speech signal into a digital speech signal and outputs it to the speech encoding unit 103. The speech encoding unit 103 performs speech encoding on the digital speech signal and outputs the encoded information to the modem unit 104. The modem unit 104 digitally modulates the encoded speech signal and sends it to the radio transmission unit 105. The radio transmission unit 105 performs predetermined radio transmission processing on the modulated signal, which is then transmitted via the antenna 106.
[0040]
On the reception side of the wireless communication device, a signal received by the antenna 107 undergoes predetermined radio reception processing in the radio reception unit 108 and is sent to the modem unit 104. The modem unit 104 demodulates the received signal and outputs the demodulated signal to the speech decoding unit 109. The speech decoding unit 109 decodes the demodulated signal to obtain a digital decoded speech signal and outputs it to the D/A converter 110. The D/A converter 110 converts the digital decoded speech signal into an analog decoded speech signal and outputs it to a voice output device 111 such as a speaker. Finally, the voice output device 111 converts the electrical analog decoded speech signal into decoded speech and outputs it.
[0041]
The speech encoding unit 103 shown in FIG. 1 has the configuration shown in FIG. 2. FIG. 2 is a block diagram showing the configuration of the speech encoding unit according to Embodiment 1 of the present invention.
[0042]
The voiced/silent determination unit 201 determines whether the input speech signal is a voiced section or a silent section (noise only), and outputs the determination result to the DTX control and multiplexer 204. The voiced/silent determination unit 201 may be of any type; the determination is generally made using the instantaneous values or the amounts of change of several parameters such as the power, spectrum, and pitch period of the input signal.
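
As an illustration only, the determination could be sketched with a single parameter, frame power, compared against a noise floor. The function names, the 6 dB margin, and the test signals below are assumptions made for this sketch; the text leaves the actual determination method open and mentions spectrum and pitch period as further parameters.

```python
import math

def frame_power_db(frame):
    """Mean power of one frame in dB (the floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def is_voiced(frame, noise_floor_db, margin_db=6.0):
    """Return True when frame power exceeds the noise floor by margin_db.

    margin_db is a hypothetical tuning value, not taken from the text.
    """
    return frame_power_db(frame) > noise_floor_db + margin_db

# Hypothetical 20 ms frames at 8 kHz: low-level noise and a louder 200 Hz tone.
quiet = [0.001 * ((-1) ** n) for n in range(160)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
floor_db = frame_power_db(quiet)
print(is_voiced(quiet, floor_db), is_voiced(loud, floor_db))  # prints: False True
```

A practical detector would track the noise floor over time and combine several parameters, as the paragraph above notes.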
[0043]
When the determination result of the voiced/silent determination unit 201 is voiced, the speech encoder 202 performs speech encoding on the input signal of the voiced section, which contains both the speech signal and the noise signal, and outputs the encoded data to the DTX control and multiplexer 204. The speech encoder 202 is an encoder for voiced sections and may be any encoder that encodes speech with high efficiency.
[0044]
On the other hand, when the determination result of the voiced/silent determination unit 201 is silent, the noise signal encoder 203 encodes the noise signal for the input signal of the silent section, which contains only the noise signal, and outputs information on the noise model representing the input noise signal and a flag indicating whether to update the noise model to the DTX control and multiplexer 204. Finally, the DTX control and multiplexer 204 uses the outputs of the voiced/silent determination unit 201, the speech encoder 202, and the noise signal encoder 203 to control which information is sent as transmission data, multiplexes the transmission information, and outputs it as transmission data.
[0045]
The noise signal encoder 203 in FIG. 2 has the configuration shown in FIG. 3. FIG. 3 is a block diagram showing the configuration of the noise signal encoder of the speech encoding unit according to Embodiment 1 of the present invention.
[0046]
The noise signal analysis unit 301 performs signal analysis on the noise signal input at fixed intervals and calculates analysis parameters for the noise signal. The analysis parameters to be extracted are those needed to represent statistical feature amounts of the input signal, for example a short-time spectrum obtained by FFT (Fast Fourier Transform) of the short-interval signal, the input power, and LPC spectral parameters.
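
A minimal sketch of such a per-frame analysis is shown below, computing the input power and a short-time magnitude spectrum. For self-containment it uses a direct DFT from the standard library rather than an FFT (the result is the same, only slower); `analyze_frame` and the 32-sample test frame are assumptions for illustration, and LPC analysis is omitted.

```python
import cmath
import math

def analyze_frame(frame):
    """Return (power, magnitude spectrum) for one frame using a direct DFT."""
    n = len(frame)
    power = sum(x * x for x in frame) / n
    spectrum = []
    for k in range(n // 2 + 1):  # bins 0 .. n/2 are sufficient for a real signal
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return power, spectrum

# Hypothetical test frame: a cosine that falls exactly on DFT bin 4.
frame = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
power, spectrum = analyze_frame(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(peak_bin)  # prints: 4
```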
[0047]
Next, the noise model change detection unit 303 detects whether or not the noise model parameter that should represent the currently input noise signal has changed from the noise model parameter held in the noise model storage unit 302.
[0048]
Here, the noise model parameter is information on a noise model that can express statistical feature amounts of the input noise signal. For example, it is the information obtained when statistical feature amounts such as the average spectrum or the variance of the short-time spectrum are expressed by a statistical model such as an HMM.
[0049]
The noise model change detection unit 303 then determines whether the analysis parameters for the current input signal, obtained by the noise signal analysis unit 301, are plausible as an output of the noise model stored as the model of the previous input signal (for example, in the case of an HMM, whether the output probability of the analysis parameters for the current input signal is equal to or higher than a specified value). When it determines that the noise model parameters that should represent the currently input noise signal have changed from the stored noise model, it outputs a flag indicating whether to update the noise model and the information to be updated (update information) to the noise model update unit 304.
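
The threshold test on the output probability might look like the following sketch. It assumes, purely for illustration, that the stored model is a single Gaussian with diagonal variance (a one-state stand-in for the HMM mentioned in the text); the function names and the threshold of -3.0 are hypothetical.

```python
import math

def log_likelihood(params, mean, var):
    """Average per-dimension Gaussian log-likelihood of params under (mean, var)."""
    total = 0.0
    for x, m, v in zip(params, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(params)

def model_changed(params, mean, var, threshold=-3.0):
    """True when the stored model no longer explains the current analysis parameters."""
    return log_likelihood(params, mean, var) < threshold

# Hypothetical stored model and two candidate analysis vectors.
stored_mean = [1.0, 2.0, 3.0]
stored_var = [0.25, 0.25, 0.25]
print(model_changed([1.0, 2.0, 3.0], stored_mean, stored_var))  # prints: False
print(model_changed([3.0, 4.0, 5.0], stored_mean, stored_var))  # prints: True
```

Working in the log domain keeps the comparison numerically stable when many spectral bins are scored together.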
[0050]
The external update permission flag indicates from outside whether updating the noise model is permitted. In the speech encoding unit of the present invention described later, updating of the noise model is not permitted when the noise model parameter is not transmitted, for example during a period in which encoded data of a voiced section is being transmitted.
[0051]
When the noise model update flag indicates an update, the noise model update unit 304 outputs, as noise model update information, either the updated noise model parameters or only the amount of change from the noise model parameters previously stored in the noise model storage unit 302, and the noise model storage unit 302 is updated using the output information. When the noise model update flag indicates no update, no update is performed and no update information is output.
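
The two forms of update information described here (the full updated parameters, or only the change amounts) can be sketched as follows. The dictionary layout and the names `make_update` and `apply_update` are assumptions for illustration, not a format defined by the text.

```python
def make_update(stored, new, send_delta=True):
    """Build update info: only the change amounts, or the full new parameters."""
    if send_delta:
        return {"kind": "delta", "values": [n - s for n, s in zip(new, stored)]}
    return {"kind": "full", "values": list(new)}

def apply_update(stored, update):
    """Apply update info to the stored parameters and return the new parameters."""
    if update["kind"] == "delta":
        return [s + d for s, d in zip(stored, update["values"])]
    return list(update["values"])

# Hypothetical two-element parameter vector.
stored = [1.0, 2.0]
new = [1.5, 1.0]
delta = make_update(stored, new)
print(delta["values"])               # prints: [0.5, -1.0]
print(apply_update(stored, delta))   # prints: [1.5, 1.0]
```

Because both sides apply the same update to the same stored parameters, an encoder and a decoder using this scheme stay synchronized as long as no update is lost.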
[0052]
Next, the speech decoding unit 109 shown in FIG. 1 has the configuration shown in FIG. 4. FIG. 4 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
[0053]
The separation and DTX controller 401 receives, as reception data, the transmission data that the encoding side encoded from the input signal and transmitted, and separates the reception data into the speech encoded data or noise model parameters, the voiced/silent determination flag, and the noise model update flag needed for speech decoding and noise generation.
[0054]
Next, when the sound / silence determination flag indicates a sound section, the speech decoder 402 performs speech decoding from the speech encoded data and outputs the decoded speech to the output switch 404.
[0055]
On the other hand, when the voiced/silent determination flag indicates a silent section, the noise signal generator 403 generates a noise signal from the noise model parameter and the noise model update flag and outputs the noise signal to the output switch 404. The output switch 404 then selects between the output of the speech decoder 402 and the output of the noise signal generator 403 according to the voiced/silent determination flag and outputs the selected signal as the output signal.
[0056]
The noise signal generator 403 in FIG. 4 has the configuration shown in FIG. 5. FIG. 5 is a block diagram showing the configuration of the noise signal generator of the speech decoding apparatus according to Embodiment 1 of the present invention.
[0057]
The noise model update flag and, when the model is updated, the noise model parameter output from the noise signal encoder 203 illustrated in FIG. 3 are input to the noise model update unit 501. When the noise model update flag indicates an update, the noise model update unit 501 updates the noise model using the input noise model parameter and the previous noise model parameter held in the noise model storage unit 502, and newly stores the updated noise model parameters in the noise model storage unit 502.
[0058]
The noise signal generation unit 503 generates and outputs a noise signal based on the information in the noise model storage unit 502. Because the model is parameterized by statistical feature amounts, the noise is generated so that the resulting signal is plausible as an output of the model. For example, when an HMM is used as the statistical model, the signal parameters needed for generation (for example, a short-time spectrum) are output stochastically according to the state transition probabilities, the parameter output probabilities, and so on, and a noise signal is generated from them and output.
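
Stochastic output from an HMM can be sketched as below: each frame draws one signal parameter from the current state's output distribution, then moves to the next state according to the transition probabilities. Reducing the parameter to a single scalar per frame (for example a gain), using Gaussian output distributions, and the names used are all simplifying assumptions for this sketch.

```python
import random

def generate_parameters(trans, means, stds, n_frames, seed=0):
    """Sample one scalar signal parameter per frame from a small HMM.

    trans[i][j] is the probability of moving from state i to state j;
    means[i] and stds[i] define the Gaussian output of state i.
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    state = 0
    out = []
    for _ in range(n_frames):
        out.append(rng.gauss(means[state], stds[state]))
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return out

# Hypothetical two-state model: a quiet state and a slightly louder state.
trans = [[0.9, 0.1], [0.2, 0.8]]
params = generate_parameters(trans, means=[1.0, 5.0], stds=[0.1, 0.5], n_frames=10)
print(len(params))  # prints: 10
```

In a fuller version each sampled parameter would be a short-time spectrum used to shape a random excitation into the output noise frame.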
[0059]
Next, operations of the speech encoding unit and speech decoding unit having the above-described configuration will be described. FIG. 6 is a flowchart showing a process flow of the audio signal encoding method according to the first embodiment. In this method, the processing shown in FIG. 6 is repeated for each frame of a certain short section (for example, about 10 to 50 ms).
[0060]
First, in step (hereinafter abbreviated as ST) 101, a speech signal is input in units of frames. Next, in ST102, voiced/silent determination is performed on the input signal and the determination result is output. If the determination result in ST103 is voiced, speech encoding is performed on the input speech signal in ST104 and the encoded data is output.
[0061]
On the other hand, when the determination result in ST103 is silent, noise signal encoding by the noise signal encoder is performed on the input signal in ST105, and information on the noise model representing the input noise signal and a flag indicating whether to update the noise model are output. The noise signal encoding process will be described later.
[0062]
In ST106, the outputs of the voiced/silent determination, the speech encoding, and the noise signal encoding are used to control which information is sent as transmission data and to multiplex the transmission information, and finally, in ST107, the result is output as transmission data.
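
The per-frame flow of ST101 to ST107 can be summarized in a short sketch. The packet layout (a dictionary) and the callables passed in are stand-ins invented for illustration; the actual detector, encoders, and multiplexed bit format are not specified at this level in the text.

```python
def encode_frame(frame, is_voiced, encode_speech, encode_noise):
    """One pass of ST101 to ST107 for a single frame; returns the multiplexed packet."""
    if is_voiced(frame):                                         # ST102, ST103
        return {"voiced": True, "speech": encode_speech(frame)}  # ST104
    update_flag, model_info = encode_noise(frame)                # ST105
    packet = {"voiced": False, "update": update_flag}            # ST106
    if update_flag:
        packet["noise_model"] = model_info  # sent only when the model changed
    return packet                                                # ST107

# Stand-in components for demonstration only.
voiced_packet = encode_frame([0.5, -0.5], lambda f: True,
                             lambda f: b"speech", lambda f: (False, None))
silent_packet = encode_frame([0.0, 0.0], lambda f: False,
                             lambda f: b"speech", lambda f: (True, [0.1]))
print(voiced_packet)  # prints: {'voiced': True, 'speech': b'speech'}
print(silent_packet)  # prints: {'voiced': False, 'update': True, 'noise_model': [0.1]}
```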
[0063]
FIG. 7 is a flowchart showing a process flow of the noise signal encoding method in the audio signal encoding method according to the present embodiment. In this method, the processing shown in FIG. 7 is repeatedly performed for each frame of a certain short section (for example, about 10 to 50 ms).
[0064]
In ST201, a noise signal is input in units of frames. Next, in ST202, signal analysis is performed on the frame of the noise signal and analysis parameters for the noise signal are calculated. In ST203, it is detected from the analysis parameters whether the noise model has changed. If it is determined in ST204 that the noise model has changed, a flag indicating whether to update the noise model (set to "update") and the information to be updated (update information) are output in ST205, and the noise model storage unit 302 is updated using the output information in ST206.
[0065]
On the other hand, if it is determined in ST204 that the noise model has not changed, only the flag indicating whether to update the noise model (set to "no update") is output in ST207. In ST203, when the external update permission flag, which is input separately from outside, indicates that updating is not permitted, the frame is treated as having no model change and the noise model parameter is not transmitted.
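
The branch structure of ST203 to ST207, including the effect of the external update permission flag, might be sketched as follows. The simple tolerance-based detector `has_changed` is a hypothetical placeholder for the model-based change detection, and the returned tuple layout is an assumption.

```python
def has_changed(params, stored, tol=0.5):
    """Placeholder change detector: any parameter moved by more than tol."""
    return max(abs(p - s) for p, s in zip(params, stored)) > tol

def encode_noise_frame(params, stored, update_permitted=True):
    """Return (update_flag, update_info, new_stored) for one frame."""
    if not update_permitted or not has_changed(params, stored):
        return False, None, stored        # ST207: flag only, nothing else sent
    new_stored = list(params)
    return True, new_stored, new_stored   # ST205 and ST206: send info, update storage

stored = [1.0, 1.0]
print(encode_noise_frame([1.1, 0.9], stored))                          # prints: (False, None, [1.0, 1.0])
print(encode_noise_frame([2.0, 1.0], stored))                          # prints: (True, [2.0, 1.0], [2.0, 1.0])
print(encode_noise_frame([2.0, 1.0], stored, update_permitted=False))  # prints: (False, None, [1.0, 1.0])
```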
[0066]
As described above, according to the noise signal encoding method of the present embodiment, the noise signal is modeled with a noise model that can be represented by statistical feature amounts, so that a decoded signal with little audible degradation of the background noise signal can be generated. In addition, the input signal waveform need not be encoded faithfully, and by transmitting only in sections where the noise model parameters corresponding to the input signal change, highly efficient encoding at a low bit rate is possible.
[0067]
Also, according to the speech signal encoding method of the present embodiment, voiced sections are encoded by a speech encoder that can encode the speech signal with high quality, and silent sections are encoded by a noise signal encoder that is highly efficient and causes little audible degradation, so that high-quality and highly efficient encoding is possible even in a background noise environment.
[0068]
(Embodiment 2)
FIG. 8 is a block diagram showing a configuration of an audio signal encoding unit according to Embodiment 2 of the present invention.
[0069]
In the speech encoding unit 103, the speech/noise signal separator 801 separates the input speech signal into a speech signal and a background noise signal superimposed on the speech signal. The speech/noise signal separator 801 may be of any type. Possible separation methods include the method called spectral subtraction, which subtracts the noise spectrum from the input signal in the frequency domain and thereby separates the input signal into a noise signal and a noise-suppressed speech signal, and methods that separate speech and noise using input signals from a plurality of signal input devices.
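
A bare-bones magnitude-domain spectral subtraction, operating on one frame's magnitude spectrum, could look like the sketch below. It assumes an estimate of the noise magnitude per bin is already available; the function name and the clamping floor are illustrative choices, and phase handling and resynthesis are left out.

```python
def spectral_subtract(mag, noise_mag, floor=0.0):
    """Split a magnitude spectrum into a speech part and a noise part.

    The estimated noise magnitude is subtracted from each bin and the result
    is clamped at floor; whatever was removed is returned as the noise part.
    """
    speech = [max(m - n, floor) for m, n in zip(mag, noise_mag)]
    noise = [m - s for m, s in zip(mag, speech)]
    return speech, noise

# Hypothetical three-bin spectrum with a flat noise estimate.
speech, noise = spectral_subtract([1.0, 5.0, 0.5], [1.0, 1.0, 1.0])
print(speech)  # prints: [0.0, 4.0, 0.0]
print(noise)   # prints: [1.0, 1.0, 0.5]
```

By construction the two parts sum back to the input magnitudes, which matches the idea of separating one input into a speech component and a noise component.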
[0070]
Next, the voiced/silent determination unit 802 determines, from the separated speech signal obtained from the speech/noise signal separator 801, whether the section is a voiced section or a silent section (noise only), and outputs the determination result to the speech encoder 803 and the DTX control and multiplexer 805. The determination may instead be made using the input signal before separation. The voiced/silent determination unit 802 may be of any type; the determination is generally made using the instantaneous values or the amounts of change of several parameters such as the power, spectrum, and pitch period of the input signal.
[0071]
If the determination result of the voiced/silent determination unit 802 is voiced, the speech encoder 803 performs speech encoding, only for the voiced section, on the separated speech signal obtained from the speech/noise signal separator 801 and outputs the encoded data to the DTX control and multiplexer 805. The speech encoder 803 is an encoder for voiced sections and may be any encoder that encodes speech with high efficiency.
[0072]
On the other hand, the noise signal encoder 804 encodes, over all sections, the separated noise signal obtained from the speech/noise signal separator 801, and outputs information on the noise model representing the input noise signal and a flag indicating whether to update the noise model. The noise signal encoder 804 has the configuration shown in FIG. 3 described in Embodiment 1.
[0073]
When the voiced/silent determination result is voiced, the determination result flag input to the noise signal encoder 804 serves as a flag that does not permit noise model updating, so the noise signal encoder 804 does not update the model.
[0074]
Finally, the DTX control / multiplexer 805 controls and transmits information to be transmitted as transmission data using the outputs from the voice / silence determination unit 802, the speech encoder 803, and the noise signal encoder 804. Information is multiplexed and output as transmission data.
[0075]
FIG. 9 is a block diagram showing a configuration of a speech signal decoding apparatus according to the second embodiment.
In the decoding apparatus shown in FIG. 9, the separation and DTX controller 901 receives, as reception data, the transmission data that the encoding side encoded from the input signal and transmitted, and separates it into the speech encoded data or noise model parameters, the voiced/silent determination flag, and the noise model update flag needed for speech decoding and noise generation.
[0076]
Next, when the sound / silence determination flag indicates a sound section, the speech decoder 902 performs speech decoding from the speech encoded data and outputs the decoded speech to the speech / noise signal adder 904.
[0077]
On the other hand, the noise signal generator 903 generates a noise signal from the noise model parameter and the noise model update flag, and outputs the noise signal to the voice / noise signal adder 904. Then, a voice / noise signal adder 904 adds the output of the voice decoder 902 and the output of the noise signal generator 903 to obtain an output signal.
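
The addition performed by the speech/noise signal adder 904 is a sample-by-sample sum. The sketch below also assumes, as an illustration, that no decoded speech is available in a silent frame so that the generated noise alone is output; the function name and this handling of silent frames are assumptions.

```python
def mix_frame(decoded_speech, noise, voiced):
    """Add generated noise to decoded speech; in silent frames output noise only."""
    if not voiced:
        return list(noise)
    return [s + n for s, n in zip(decoded_speech, noise)]

print(mix_frame([1.0, 2.0], [0.5, 0.5], voiced=True))   # prints: [1.5, 2.5]
print(mix_frame([], [0.5, 0.5], voiced=False))          # prints: [0.5, 0.5]
```

Unlike a configuration that switches between the two outputs, adding them keeps the background noise present underneath the decoded speech in voiced sections.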
[0078]
Next, with reference to FIG. 10, a processing flow of the audio signal encoding method according to Embodiment 2 will be described. In this method, it is assumed that the processing shown in FIG. 10 is repeated for each frame of a certain short section (for example, about 10 to 50 ms).
[0079]
First, in ST301, an input signal in units of frames is input. Next, in ST302, the input audio signal is separated into an audio signal and a background noise signal superimposed on the audio signal. Then, in ST303, sound / silence determination is performed on the input signal or the separated audio signal obtained in ST302, and the determination result is output (ST304).
[0080]
If the determination result is voiced, speech encoding by the speech encoder is performed in ST305 on the separated speech signal obtained in ST302, and the encoded data is output. Next, in ST306, noise signal encoding by the noise signal encoder is performed on the separated noise signal obtained in ST302, and information on the noise model representing the input noise signal and a flag indicating whether to update the noise model are output.
[0081]
When the sound / silence determination result in ST303 is sound, the model is not updated in the noise signal encoding process performed in ST306. In ST307, control of information to be transmitted as transmission data and multiplexing of transmission information are performed using outputs obtained as a result of the sound / silence determination, the voice encoding process, and the noise signal encoding process, Finally, it outputs as transmission data in ST308.
[0082]
As described above, according to the speech signal encoding apparatus of the present embodiment, voiced sections are encoded by a speech encoder that can encode the speech signal with high quality, and the noise signal is encoded by the noise signal encoder described in Embodiment 1, which is highly efficient and causes little audible degradation, so that high-quality and highly efficient encoding is possible even in a background noise environment. Furthermore, because the speech/noise signal separator removes the superimposed background noise from the speech signal input to the speech encoder, voiced sections can be encoded with higher quality or higher efficiency.
[0083]
(Embodiment 3)
FIG. 11 is a block diagram showing a configuration of a speech encoding unit according to Embodiment 3 of the present invention. The configuration on the decoding side in the present embodiment is the same as the configuration of the audio signal decoding apparatus shown in FIG.
[0084]
The input signal analyzer 1101 performs signal analysis on the input signal input every certain interval, and calculates analysis parameters for the input signal. The feature parameters to be extracted are a parameter necessary for representing a statistical feature quantity related to an input signal and a parameter representing a voice feature. As parameters necessary for representing a statistical feature amount, there are, for example, a short-time spectrum obtained by FFT on a short interval signal, input power, LPC spectrum parameters, and the like. Also, parameters representing voice characteristics include LPC parameters, input power, pitch periodicity information, and the like.
[0085]
Next, the mode determiner 1104 uses the analysis parameters obtained by the input signal analyzer 1101, the speech feature patterns held in the speech model storage unit 1102, and the noise model parameters held in the noise model storage unit 1103 to determine whether the input signal is a voiced section or a silent section (containing only noise) and, in the case of a silent section, whether to update the noise model and transmit update information.
[0086]
Here, the speech model storage unit 1102 holds speech feature patterns that are created and stored in advance. Examples of speech feature patterns include information such as the distributions of LPC parameters, input signal power, and pitch periodicity information in speech (voiced) sections. The noise model parameter is information on a noise model that can express statistical feature amounts of the input noise signal; for example, it is the information obtained when statistical feature amounts such as the average spectrum or the variance of the short-time spectrum are expressed by a statistical model such as an HMM.
[0087]
The mode determiner then determines whether the statistical analysis parameters for the current input signal, obtained by the input signal analyzer 1101, are plausible as an output of the noise model stored as the model of the signal in the previous noise section (for example, in the case of an HMM, whether the output probability of the analysis parameters for the current input signal is equal to or higher than a specified value), and determines from the parameters representing speech features of the input signal whether the section is a speech (voiced) section.
[0088]
If the mode determiner 1104 determines that the section is voiced, the speech encoder 1105 performs speech encoding on the input signal and outputs the encoded data to the DTX control and multiplexer 1107. On the other hand, if the mode determiner 1104 determines that the section is silent and that noise model update information is to be transmitted, the noise model updater 1106 updates the noise model and outputs information on the updated noise model to the DTX control and multiplexer 1107.
[0089]
Finally, the DTX control and multiplexer 1107 uses the outputs of the speech encoder 1105 and the noise model updater 1106 to control which information is sent as transmission data, multiplexes the transmission information, and outputs the transmission data.
[0090]
Next, with reference to FIG. 12, the flow of processing of the audio signal encoding method according to the present embodiment will be described. In this method, it is assumed that the processing shown in FIG. 12 is repeated for each frame of a certain short section (for example, about 10 to 50 ms).
[0091]
First, in ST401, an input signal in units of frames is input. Next, in ST402, signal analysis is performed on the input signal input every certain interval, and the analysis parameter is calculated and output.
[0092]
In ST403, it is determined whether the currently input statistical analysis parameters are plausible as an output of the noise model held in the noise model storage unit 1103 in FIG. 11 (ST404). If they are judged not to fit, that is, if the current input signal cannot be expressed by the noise model currently held, the process proceeds to ST405, where it is determined from the speech feature parameters obtained by analyzing the input signal whether the section is a speech (voiced) section. If it is determined to be a speech section, speech encoding by the speech encoder is performed in ST406 and the encoded data is output.
[0093]
On the other hand, when it is determined in ST405 that the section is not a speech section, the noise model is updated in ST407 and information on the updated noise model is output. If it is determined in ST403 that the current input can be expressed by the noise model currently held, the process proceeds to the next step without any further processing. Then, in ST408, the outputs of the speech encoder and the noise model updater are used to control which information is sent as transmission data and to multiplex the transmission information, and the transmission data is output in ST409.
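
The decision order of ST404 and ST405 can be condensed into a small sketch: the noise model check comes first, and the speech check is consulted only when the stored noise model no longer fits. The function name and the returned action labels are assumptions made for illustration.

```python
def decide_mode(fits_noise_model, looks_like_speech):
    """Map the two checks of ST404 and ST405 to a per-frame action."""
    if fits_noise_model:
        return "no_transmission"     # stored noise model still describes the input
    if looks_like_speech:
        return "encode_speech"       # ST406
    return "update_noise_model"      # ST407

print(decide_mode(True, False))   # prints: no_transmission
print(decide_mode(False, True))   # prints: encode_speech
print(decide_mode(False, False))  # prints: update_noise_model
```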
[0094]
As described above, according to the speech signal encoding apparatus of the present embodiment, providing the mode determiner allows the determination to be made using both the change in the statistical feature amounts of the input signal and the speech feature patterns. Therefore, more accurate mode determination can be performed and quality degradation due to determination errors can be suppressed.
[0095]
[Effects of the invention]
As described above, in the noise signal encoding device of the present invention, the noise signal is modeled with a noise model that can be represented by statistical feature amounts, so that a decoded signal with little audible degradation of the background noise signal can be generated. Further, since the input signal waveform need not be encoded faithfully, transmitting only in sections where the noise model parameters corresponding to the input signal change enables highly efficient encoding at a low bit rate.
[0096]
In the speech signal encoding device of the present invention, voiced sections are encoded by a speech encoder that can encode the speech signal with high quality, and silent sections are encoded by the noise signal encoder, which is highly efficient and causes little audible degradation, so that high-quality and highly efficient encoding is possible even in a background noise environment.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a radio communication system including an audio signal encoding device and an audio signal decoding device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of a speech signal encoding apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing a configuration of a noise signal encoding apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing a configuration of a speech signal decoding apparatus according to Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing a configuration of a noise signal generator in the speech signal decoding apparatus according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing a process flow of the speech signal encoding method according to Embodiment 1 of the present invention.
FIG. 7 is a flowchart showing a processing flow of a noise signal encoding method according to Embodiment 1 of the present invention.
FIG. 8 is a block diagram showing a configuration of a speech signal encoding apparatus according to Embodiment 2 of the present invention.
FIG. 9 is a block diagram showing a configuration of a speech signal decoding apparatus according to Embodiment 2 of the present invention.
FIG. 10 is a flowchart showing a processing flow of a speech signal encoding method according to Embodiment 2 of the present invention.
FIG. 11 is a block diagram showing a configuration of a speech signal encoding apparatus according to Embodiment 3 of the present invention.
FIG. 12 is a flowchart showing a processing flow of a speech signal encoding method according to Embodiment 3 of the present invention.
FIG. 13 is a block diagram showing a configuration of a conventional speech signal encoding device.
FIG. 14 is a block diagram showing a configuration of a conventional speech signal encoding device.
[Explanation of symbols]
201 Voiced/silent detector
202 Speech encoder
203 Noise signal encoder
204 DTX Control and Multiplexer
301 Noise signal analyzer
302, 502 Noise model storage unit
303 Noise model change detector
304, 501 Noise model update unit
401 Separation and DTX Controller
402 Speech decoder
403 Noise signal generator
404 Output switcher
503 Noise signal generator

Claims

A noise signal encoding apparatus comprising: analysis means for performing signal analysis on a noise signal of a speech signal including the noise signal; storage means for storing information on a noise model representing the noise signal; detection means for detecting a change in the stored information on the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating and outputting the stored information on the noise model when a change in the information on the noise model is detected,
wherein the analysis means extracts a statistical feature quantity related to the noise signal by the signal analysis, and the updating means updates the information on the noise model, which is information representing the statistical feature quantity by a statistical model and from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model.

The noise signal encoding apparatus according to claim 1, wherein the updating means updates the stored information on the noise model by the amount of the change and outputs the updated information.

A speech signal encoding apparatus comprising: voiced/silent determination means for determining whether an input speech signal is a voiced section or a silent section including only a noise signal; speech encoding means for performing speech encoding on the input speech signal when the determination result is voiced; the noise signal encoding apparatus according to claim 1, which encodes the noise signal of the input signal when the determination result is silent; and multiplexing means for multiplexing outputs from the voiced/silent determination means, the speech encoding means, and the noise signal encoding apparatus.

A speech signal encoding apparatus comprising: speech/noise signal separation means for separating an input speech signal into a speech signal and a background noise signal superimposed on the speech signal; voiced/silent determination means for determining, from the input speech signal or the speech signal obtained by the speech/noise signal separation means, whether the signal is a voiced section or a silent section including only a noise signal; speech encoding means for performing speech encoding on the input speech signal when the determination result is voiced; the noise signal encoding apparatus according to claim 1, which encodes the background noise signal obtained by the speech/noise signal separation means; and multiplexing means for multiplexing outputs from the voiced/silent determination means, the speech encoding means, and the noise signal encoding apparatus.

A speech signal encoding apparatus comprising: analysis means for performing signal analysis on an input speech signal; speech model storage means for storing speech feature patterns necessary for determining whether or not the input speech signal is a voiced signal; noise model storage means for storing information on a noise model representing a noise signal included in the input speech signal; mode determination means for determining, using outputs of the analysis means, the speech model storage means, and the noise model storage means, whether or not the input speech signal is a voiced section and, in the case of a silent section, whether or not to update the noise model; speech encoding means for performing speech encoding on the input speech signal when the mode determination means determines that the signal is a voiced section; noise model updating means for updating the noise model when the mode determination means determines that the signal is a silent section and that the noise model is to be updated; and multiplexing means for multiplexing outputs from the speech encoding means and the noise model updating means,
wherein the analysis means extracts a statistical feature quantity related to the input speech signal by the signal analysis, and the noise model updating means updates the information on the noise model, which is information representing the statistical feature quantity by a statistical model and from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model.

A base station apparatus comprising the speech signal encoding apparatus according to any one of claims 3 to 5.

A communication terminal apparatus comprising the speech signal encoding apparatus according to any one of claims 3 to 5.

A noise signal generating apparatus comprising: noise model updating means for updating a noise model when necessary in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing information on the updated noise model in accordance with the output of the noise model updating means; and noise signal generating means for generating a noise signal from the information on the noise model stored in the noise model storage means,
wherein the noise model storage means stores, as the information on the noise model, information representing a statistical feature quantity by a statistical model, and the noise signal generating means probabilistically outputs signal parameters based on the statistical model and generates the noise signal from the output signal parameters.
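For illustration only, the operation of such a noise signal generating apparatus can be sketched as follows. The statistical model assumed here (a Gaussian distribution over the frame level in dB), the white Gaussian excitation, and the names `NoiseModel` and `NoiseSignalGenerator` are hypothetical simplifications and do not limit or define the claimed configuration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoiseModel:
    mean_db: float  # mean frame level (hypothetical model parameter)
    std_db: float   # spread of the frame level (hypothetical model parameter)


class NoiseSignalGenerator:
    def __init__(self, model: NoiseModel, seed: Optional[int] = None):
        self.model = model              # noise model storage
        self.rng = random.Random(seed)

    def update(self, update_flag: bool, params: Optional[NoiseModel]) -> None:
        """Update the stored model only when the update flag is set."""
        if update_flag and params is not None:
            self.model = params

    def generate(self, frame_len: int = 160) -> List[float]:
        """Draw a signal parameter from the model, then synthesize noise."""
        level_db = self.rng.gauss(self.model.mean_db, self.model.std_db)
        gain = 10.0 ** (level_db / 20.0)
        return [gain * self.rng.gauss(0.0, 1.0) for _ in range(frame_len)]


generator = NoiseSignalGenerator(NoiseModel(mean_db=-20.0, std_db=0.0), seed=0)
frame = generator.generate(1000)
rms = math.sqrt(sum(x * x for x in frame) / len(frame))
```

In this sketch the generated frame has an RMS level near 0.1, which corresponds to the stored mean level of -20 dB; the waveform itself is not a copy of any input waveform.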

A speech signal decoding apparatus comprising: separation means for receiving a signal including speech data, a noise model parameter, a voiced/silent determination flag, and a noise model update flag encoded on the encoding side, and separating the noise model parameter, the voiced/silent determination flag, and the noise model update flag from the signal; speech decoding means for performing speech decoding on the speech data when the voiced/silent determination flag indicates a voiced section; the noise signal generating apparatus according to claim 8, which generates a noise signal from the noise model parameter and the noise model update flag when the voiced/silent determination flag indicates a silent section; and output switching means for selecting, in accordance with the voiced/silent determination flag, either the decoded speech output from the speech decoding means or the noise signal output from the noise signal generating apparatus, and outputting the selected signal as an output signal.

A speech signal decoding apparatus comprising: separation means for receiving a signal including speech data, a noise model parameter, a voiced/silent determination flag, and a noise model update flag encoded on the encoding side, and separating the noise model parameter, the voiced/silent determination flag, and the noise model update flag from the signal; speech decoding means for performing speech decoding on the speech data when the voiced/silent determination flag indicates a voiced section; the noise signal generating apparatus according to claim 8, which generates a noise signal from the noise model parameter and the noise model update flag when the voiced/silent determination flag indicates a silent section; and speech/noise signal adding means for adding the decoded speech output from the speech decoding means and the noise signal output from the noise signal generating apparatus.

A speech signal encoding method comprising: a voiced/silent determination step of determining whether an input speech signal is a voiced section or a silent section including only a noise signal; a speech encoding step of performing speech encoding on the input speech signal when the determination result is voiced; a noise signal encoding step of encoding the noise signal of the input signal when the determination result is silent; and a multiplexing step of multiplexing outputs of the voiced/silent determination step, the speech encoding step, and the noise signal encoding step,
wherein the noise signal encoding step includes: an analysis step of performing signal analysis on a noise signal of a speech signal including the noise signal; a storage step of storing information on a noise model representing the noise signal; a detection step of detecting a change in the stored information on the noise model based on the signal analysis result of the currently input noise signal; and an update step of updating, when a change in the information on the noise model is detected, the stored information on the noise model by the amount of the change, and
wherein the analysis step extracts a statistical feature quantity related to the noise signal by the signal analysis, and the update step updates the information on the noise model, which is information representing the statistical feature quantity by a statistical model and from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model.

A speech signal encoding method comprising: a speech/noise signal separation step of separating an input speech signal into a speech signal and a background noise signal superimposed on the speech signal; a voiced/silent determination step of determining, from the input speech signal or the speech signal obtained in the speech/noise signal separation step, whether the signal is a voiced section or a silent section including only a noise signal; a speech encoding step of performing speech encoding on the input speech signal when the determination result is voiced; a noise signal encoding step of encoding a noise signal for the input signal when the determination result is silent and encoding the background noise signal obtained in the speech/noise signal separation step; and a multiplexing step of multiplexing outputs of the voiced/silent determination step, the speech encoding step, and the noise signal encoding step,
wherein the noise signal encoding step includes: an analysis step of performing signal analysis on a noise signal of a speech signal including the noise signal; a storage step of storing information on a noise model representing the noise signal; a detection step of detecting a change in the stored information on the noise model based on the signal analysis result of the currently input noise signal; and an update step of updating, when a change in the information on the noise model is detected, the stored information on the noise model in accordance with the change, and
wherein the analysis step extracts a statistical feature quantity related to the noise signal by the signal analysis, and the update step updates the information on the noise model, which is information representing the statistical feature quantity by a statistical model and from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model.

A speech signal encoding method comprising: an analysis step of performing signal analysis on an input speech signal; a speech model storage step of storing speech feature patterns necessary for determining whether or not the input speech signal is a voiced signal; a noise model storage step of storing information on a noise model representing a noise signal included in the input speech signal; a mode determination step of determining, using outputs of the analysis step, the speech model storage step, and the noise model storage step, whether or not the input speech signal is a voiced section and, in the case of a silent section, whether or not to update the noise model; a speech encoding step of performing speech encoding on the input speech signal when the signal is determined to be a voiced section in the mode determination step; a noise model update step of updating the noise model when it is determined in the mode determination step that the signal is a silent section and that the noise model is to be updated; and a multiplexing step of multiplexing outputs of the speech encoding step and the noise model update step,
wherein the analysis step extracts a statistical feature quantity related to the input speech signal by the signal analysis, and the noise model update step updates the information on the noise model, which is information representing the statistical feature quantity by a statistical model and from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model.

A machine-readable recording medium storing a program for causing a computer to execute: a procedure of performing signal analysis on an input noise signal and extracting a statistical feature quantity related to the noise signal; a procedure of storing, as information on a noise model, information representing the statistical feature quantity for the input noise signal by a statistical model; a procedure of detecting a change in the noise model representing the input noise signal; and a procedure of, when a change in the information on the noise model is detected, updating as necessary the information on the noise model, from which signal parameters necessary for generating a noise signal on the decoding side can be output probabilistically based on the statistical model, and outputting the information on the updated noise model.