JP4004431B2

JP4004431B2 - Packet sending apparatus, index value calculation method and program for priority used in the same

Info

Publication number: JP4004431B2
Application number: JP2003119829A
Authority: JP
Inventors: 祐介日和▲崎▼; 丈太朗池戸; 徹森永; 岳至森; 大輔徳元
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-04-24
Filing date: 2003-04-24
Publication date: 2007-11-07
Anticipated expiration: 2023-04-24
Also published as: JP2004328354A

Description

【０００１】
【発明の属する技術分野】
この発明は、インターネットを始めとするパケット通信ネットワーク、特に今後普及するであろう音声動画データ統合ネットワークにおいて、ピーク伝送レートが高くバースト的にトラヒックが発生する一般的な「データ通信」と、遅延時間が品質劣化に直接結びついてしまう「音声通信」や「動画通信」を効率良く混在させることを可能とするパケット送出装置、これらに用いる優先度に関する指標値の算出方法及びプログラムに関する。
【０００２】
【従来の技術】
従来、音声動画通信などのリアルタイム（実時間）通信をインターネットで実現する際、一定の間隔でパケットが常に伝送される必要がある。そのようなデータは通常ＲＴＰ（Real Time Protocol）（例えばIETF-RFC1889：RTP：A Transport Protocol for Real-Time Applications,1996. Services Field（DS Field）in the IPv4 and IPv6 Headers,1998．参照）というプロトコル（通信手順）を用いて伝送されるが、このプロトコルはＵＤＰ（User Datagram Protocol）上に実装されているのでＴＣＰ（Transmission Control Protocol）パケットとは異なり再送が行なわれない。ネットワーク上で輻輳などの理由でパケットが破棄された場合（パケットロスが生じた場合）、受信側では音切れや画像の乱れが顕著に知覚されてしまう。
それに対して、ファイル転送やＷＷＷ（World Wide Web）などの従来からインターネットで使用されているサービスは、データのやりとりの時にのみバースト的にトラヒックが発生するイベント型の通信であり、ＴＣＰパケットとして実装されているため、もしパケット破棄が生じても再送を試みる仕組が備わっている。これはインターネットがベストエフォート型のネットワークであることに起因する。
【０００３】
こうして比較すると、常にデータの伝送されている必要があるリアルタイム通信とイベント型の通信は親和性が低く、お互いに阻害する立場にあることが分る。それを解決するため、与えられたネットワークやシステム資源（音声、映像など）を用いて最大の効果が得られるように、各メディアの品質（アプリケーション品質）を調整することを、インターネットで動的に制御する技術、いわゆるインターネットＱｏＳ（Quality of Service）制御技術として、DiffServ（非特許文献１，２参照）が注目されている。この手法は特にネットワークに入るパケットを予め優先度でクラス分けしておき、ネットワーク輻輳時に各ノードで優先度の低いパケットから破棄することによって、優先度の高いパケットの送信先への到達性を向上させるという仕組みである。例えば音声動画通信に優先度を高く設定すれば、ＲＴＰのパケットロスは起きにくくなり、データ通信に低い優先度を設定してＴＣＰの再送する機能を用いて全体的に安定した接続を実現することが可能となる。
【０００４】
優先度はＩＰ（Internet Protocol）パケットがＤＳドメインと呼ばれる単一の優先制御ポリシーを持つＩＰ伝送網に到着した場合、エッジノードと呼ばれるゲートウェイで優先度のラベリングがなされ、ＤＳドメイン内の各ノードでは、ゲートウェイで付けられた優先度で各パケットに対する輻輳時の処理がなされる。このとき優先度は従来においてはＴＯＳフィールドと呼ばれているヘッダ部分に書き込まれるが、エッジノードは送信元アドレス、送信先アドレス、ＩＰポート番号でそのパケットの重要度を判別する。
従来、この技術はフロー（一連の情報）毎に優先度が割当てられる仕組みとなっており、例えばファイル転送（例ＦＴＰ）やＷＷＷなどのサービスには別々の（well known）ポート番号が割当てられているため、サービスそのものを区別することが可能となっている。
【０００５】
また、従来の符号化技術としては、音声の有無に着目したＶＡＤ（Voice Activity Detection）（例えば3GPP：ETSI TS 146 032,“Digital cellular telecommunications system（Phase 2+）；Voice Activity Detection（VAD),2002参照）あるいはボイススイッチと呼ばれる機能があり、これを用いて伝送する必要がある有声部分と無声部分を区別することが可能であるが、条件によっては判別誤りによって品質が劣化することがある。
また、近年では音声信号ブロックを時間方向では２０ｍｓあるいはそれよりも小さい単位で処理を行い、周波数帯域や品質方向に符号化データを積み重ねることによってスケーラブル（あるいはエンベデッド）な、つまり各種周波数帯域又は／及び各種品質の符号を出力することができる音声符号化器が提案されている（例えば非特許文献３、４参照）。
また、各音声ブロックの優先度を演算する方法および装置を用いることによって聴感的な品質を落さずに効率良く音声パケット伝送を可能とする技術が提案されている（例えば非特許文献５参照）。
【０００６】
図１２に示すように広帯域音声信号は入力端子１１からの各サンプルがディジタル値とされた音声ディジタル信号（以下音声信号と記す）ｓ［ｎ］はこの種の一般的な符号化器と同様に５ミリ秒から２０ミリ秒の単位のフレームにフレーム分割部１２で分割され、（ｎは離散的時刻）つまりＮサンプルごと、例えば３２ｋＨｚサンプリングの音声信号であれば、Ｎ＝１６０サンプルからＮ＝６４０サンプルごとに分割される。更に帯域分割部１６で帯域通過フィルタを用いてＦ個の複数帯域に分割されブロックとされる。この帯域の分割方法は、音声信号ｓ［ｎ］が例えば１６ｋＨｚサンプリングであれば上下各４ｋＨｚ帯域（Ｆ＝２）に分割し、３２ｋＨｚサンプリングであればＦ＝３で０〜４ｋＨｚ帯域と、４ｋＨｚ〜８ｋＨｚ帯域と、８ｋＨｚ〜１６ｋＨｚ帯域というようにウェーブレットで分割しても良いし、Ｆ＝４で総て等間隔に各４ｋＨｚ帯域に分割しても良い。各フレームごとに帯域分割された各ブロックの音声信号は個々の符号化器で、固定時間長（フレーム）ごとに符号化される。このときの音声ブロック（パケット）の分割イメージを図１３に示す。図１３の例はＦ＝３でフレームごとに各帯域の信号がそれぞれブロック（パケット）とされ、フレームごとに３つのブロック（パケット）が生成されることになる。
図１２に示す例では音声信号を上、下２帯域に分割し、２つのブロックとした場合で、分離された低域音声信号ｓ１［ｎ］、高域音声信号ｓ２［ｎ］はそれぞれ低域符号化部１３_L、高域符号化部１３_Hで符号化される。また低域音声信号ｓ１［ｎ］、高域音声信号ｓ２［ｎ］はそれぞれ低域優先度決定部１４_L、高域優先度決定部１４_Hに入力され、ブロックごとのパケット優先度がそれぞれ決定される。
【０００７】
低域優先度決定部１４_L の具体例を図１４に示す。そのｉ番目のフレームの低域帯域のブロック（１，ｉ）の音声信号ｓ１［ｎ］の特徴量を、複数の説明変数生成部１４１_L ，１４２_L ，１４３_L でそれぞれ説明変数ｘ１［１，ｉ］，ｘ２［１，ｉ］，ｘ３［１，ｉ］として生成する。ｉ番目の低域帯域の処理ブロック（１，ｉ）の説明変数ｘｊ[１，ｉ］として、そのブロックの音声信号ｓ１［ｎ］を入力して、その絶対電力を説明変数生成部１４１_L で次式（１）を計算して求める。
ｘ１[１，ｉ］＝（１／Ｎ）Σ_n=1 ^Nｓ１［Ｎｉ＋ｎ］² （１）
あるいは、次式（２）に示すように絶対電力の対数表現としてｘ１[１，ｉ]を求める。
ｘ１[１，ｉ]＝log₁₀（（１／Ｎ）Σ_n=1 ^Nｓ１［Ｎｉ＋ｎ］²）（２）
説明変数生成部１４２_L では説明変数生成部１４１_Lよりの説明変数ｘ１［１，ｉ］と、前フレーム（ｉ−１）の低域ブロック（１，ｉ−１）の説明変数ｘ１［１，ｉ−１］を入力して現フレームの電力の前フレームの電力に対する比を次式（３）により計算して説明変数ｘ２［１，ｉ］を出力する。
ｘ２［１，ｉ］＝ｘ１［１，ｉ］／（ｘ１［１，ｉ−１］）（３）
前フレームのそのブロックの説明変数ｘ１［１，ｉ−１］を前フレームバッファ１４２ａに格納しておき、式（３）の計算を計算部１４２ｂで行い、現フレームのブロック（１，ｉ）の説明変数ｘ１［１，ｉ］で前フレームバッファ１４２ａに保持する説明変数を更新する。
【０００８】
更に説明変数生成部１４３_Lでは音声信号ｓ１［ｎ］を入力して、その自己相関関数（ρ［ｎ］）の最大値（周期性）を次式（４）により計算して説明変数ｘ３［１，ｉ］とする。
ｘ３［１，ｉ］＝ｍａｘ（ρ_i[ｋ］）（４）
ここで正規化された自己相関関数ρ［ｎ］は、次式（５）を用いて計算する。
ρ_i[ｋ］＝Σ_n=0 ^N（ｓ１［Ｎｉ＋ｎ］）（ｓ１［Ｎｉ＋ｎ＋ｋ］）／
Σ_n=0 ^N（ｓ１［Ｎｉ＋ｎ］）² （５）
ｋは１，２，…とし、ｋの最大値は音声信号ｓ［ｎ］のピッチ周期相当程度とする。この時、自己相関関数をアップサンプリングして、つまり補間してより正確な値を計算するようにした方が良い結果が得られる。
これら求めた説明変数ｘ１［１，ｉ］，ｘ２［１，ｉ］，ｘ３［１，ｉ］を指標値計算部１４４_Lで線形結合して指標値ｙ［１，ｉ］を求める。つまり例えば次式（６）、（７）を計算する。
ｙ［１，ｉ］＝α０＋Σ_j=1 ³αｊｘｊ［１，ｉ］＾（６）
ｘｊ［１，ｉ］＾は説明変数ｘｊの確率分布の平均を０、分散を１に正規化したもの、つまり次式（７）で求まる。
ｘｊ［１，ｉ］＾＝（ｘｊ［１，ｉ］−ｘｊ′）／γｊ（７）
ｘｊ′，γｊはそれぞれ説明変数ｘｊの平均値、標準偏差である。
【０００９】
これらの線形結合係数α０〜α３は重回帰分析（例えば奥野忠一他：多変量解析法（改訂版），日科技連，１９８１参照）を用いて事前に最適化した偏回帰係数値を用いる。例えば１つのパケット（ブロック）を消失させたときの受聴者が主観評価したＭＯＳ値をｙ［１，ｉ］′とした時、このｙ［１，ｉ］′と、式（６）により計算された指標値ｙ［１，ｉ］との誤差が最小となるように、最小自乗法を用いて、係数αｊを求める。α０はＭＯＳ値１〜５の平均値である。ここでＭＯＳ値１は「非常に悪い」、ＭＯＳ値５は「非常に良い」と対応する。
係数α０〜α３は、このように決められるから、αｊの絶対値が大きいことはその説明変数（特徴量）がパケット（ブロック）消失時の主観評価品質に大きく影響し、αｊの絶対値が小さければその説明変数（特徴量）はパケット（ブロック）消失時の主観評価品質への影響が比較的小さいことになる。つまり主観評価品質への影響度が大きい程、係数αｊが大きくなるようにαｊが決定されている。また指標値ｙ［１，ｉ］は複数の説明変数（特徴量）ｘ１［１，ｉ］〜ｘ３［１，ｉ］を係数α１〜α３を用いて線形結合させたものであるから、１つの説明変数（特徴量）のみにて、パケット（ブロック）消失の主観評価品質に与える影響の程度よりも、より正しく、影響の程度を示すことになる。主観評価品質に対して大きく影響を与えるブロック、この場合音声であるから聴感的に重要なものは指標値ｙ［１，ｉ］が小さくなり、重要でないものは指標値が大きくなる傾向になる。
【００１０】
図１４中の指標値計算部１４４_Lにおいて、各説明変数ｘ１〜ｘ３はそれぞれ正規化部１４４ａ１〜１４４ａ３で正規化され、正規化説明変数ｘ１＾〜ｘ３＾は乗算部１４４ｂ１〜１４４ｂ３で係数α１〜α３がそれぞれ乗算され、これら乗算結果と定数α０は加算部１４４ｃ１，１４４ｃ２により加算されて指標値ｙ［１，ｉ］が出力される。
こうして求められた指標値ｙ［１，ｉ］は、量子化部１４５_Lでスカラ量子化され、離散的な値、例えば０，１，…，７の何れかの値の優先度ｐ［１，ｉ］が出力される。つまり一般的に指標値の小さいブロック（パケット）は高優先度のものへ、大きいものは低優先度のものへと写像する。写像は以下のような関数で表わすことができる。
ｐ［１，ｉ］＝ｆ（ｙ［１，ｉ］）（８）
このとき用いる写像関数ｆ（ｙ）は、パケットを総優先度ステップ数に写像するスカラ量子化を用いればよい。このときの量子化のしきい値は、指標値ｙ［１，ｉ］を等確率で分割する方法や、指標値ｙ［１，ｉ］の範囲を等分割するなどの方法がある。
【００１１】
同様にして高域優先度決定部１４_Hで指標値
ｙ［２，ｉ］＝α０＋Σ_j=1 ⁴αｊｘｊ［２，ｉ］＾
ｘｊ［２，ｉ］＾＝（ｘｊ［２，ｉ］−ｘｊ［２］′）／γｊ［２］
が計算され、更に優先度ｐ［２，ｉ］＝ｆ₂(ｙ［２，ｉ］）が出力される。パケット送出部１５は、低域符号化部１３_Lよりの符号化符号Ｐ［１，ｉ］と優先度ｐ［１，ｉ］が１つのパケットとして、また符号化部１３_Hよりの符号化符号Ｐ［２，ｉ］と優先度ｐ［２，ｉ］が１つのパケットとして送出される。
なお一般にＦ個に帯域分割された場合、ｉ番目フレームｆ番目帯域のブロック（ｆ，ｉ）の指標値ｙ［ｆ，ｉ］は
ｙ［ｆ，ｉ］＝α０＋Σ_j=1 ³αｊｘｊ［ｆ，ｉ］＾
ｘｊ［ｆ，ｉ］＾＝（ｘｊ［ｆ，ｉ］−ｘｊ［ｆ］′）／γｊ［ｆ］
により計算され、優先度ｐ［ｆ，ｉ］はｆ_f(ｙ[ｆ，ｉ］)により求められる。
【００１２】
【非特許文献１】
IETF-RFC2474：Definition of the Differentiated Services Field（DS Field）in the IPv4 and IPv6 Headers,1998.
【非特許文献２】
IETF-RFC2475：An architecture for Differentiated Services,1998.
【非特許文献３】
森岳至他３名著「パケット通信向け低遅延広帯域音声符号化法の検討」電子情報通信学会２００３年春季全国大会予稿集第１分冊３２７−３２８頁
【非特許文献４】
池戸丈太朗他５名著「等間隔パルス列による雑音励振源符号帳」電子情報通信学会２００３年春季全国大会予稿集第１７３頁
【非特許文献５】
森永徹他２名著「時間および帯域分割された音声ブロックの聴覚的重要度について」音響学会２００３年春季全国大会予稿集第１７８頁
【００１３】
【発明が解決しようとする課題】
DiffServの問題点は、高優先度のトラヒックの割合いが多い状態で輻輳が起きた場合であり、そのような時には高優先度のパケットでも破棄されるという状態が起きてしまう。今後、音声や動画伝送などのリアルタイム性を要求されるトラヒックが増加すればそのような状態になることは容易に類推できる。こうしたとき、音声の場合、従来技術のＶＡＤやボイススイッチを用いると判別誤りが起きる可能性があるため、品質劣化が起きる可能性がある。また無声部分は一切伝送されないため、臨場感のある通信は望めない。
また従来のＶｏＩＰによる音声伝送では、通常２０ｍｓという時間ブロックごとに符号化処理などを行って１つのパケットとしてまとめて伝送する。２０ｍｓという時間は子音の音素をまるごと含むことができるほどの長さであり、そのパケット損失が発生した場合、音素そのものが損失してしまい会話が不明瞭となる。
また、先に述べた非特許文献５に示す技術では優先度が低いパケットを破棄しても聴感的な品質の劣化が少ないが、パケットにはヘッダ情報が付加されるため、１つのブロックを１パケットに対応づけて伝送することになり、効率が悪く、このヘッダのオーバーヘッドによるトラヒックの増加が問題になる。例えばＩＰネットワークはヘッダが非常に大きく、圧縮効率の良い符号化方式を用いると、１ブロックの情報の符号化符号に対して、オーバーヘッドになってしまう。具体的にはＩＰ＋ＵＤＰ＋ＲＴＰによるパケットの場合、ヘッダは合計４００バイト以上になり、一方、符号化符号は８kbit/sのビットレートとすると、２０ｍｓおきに出力される符号は２０バイトであり、ヘッダはペイロードの２０倍となる。
【００１４】
【課題を解決するための手段】
この発明のパケット送出方法の一面によれば、ディジタル信号をフレーム毎に分割し、前記フレーム毎のディジタル信号を複数の帯域のブロックに分割して帯域毎のブロックを生成し、前記各ブロック毎に、全帯域の絶対電力の総和に対する該ブロックの絶対電力の比を１つの特徴量として含む複数の特徴量を求め、前記各ブロック毎に、前記複数の特徴量の線形結合を該ブロックの優先度に関する指標値として求めて優先度に関する指標値を算出する。
【００１５】
この発明のパケット送出方法の他面によれば、ディジタル信号をフレーム毎に分割し、前記フレーム毎のディジタル信号を複数の品質に基づいたブロックに分割し、前記各ブロック毎に、該ブロックの入力信号と符号化誤差信号との比の対数値を１つの特徴量として含む複数の特徴量を求め、前記各ブロック毎に、前記複数の特徴量の線形結合を該ブロックの優先度に関する指標値として求めて優先度に関する指標値の算出する。
又、この発明のパケット送出方法の更なる他面によれば、ディジタル信号をフレーム毎に分割し、前記フレーム毎のディジタル信号を複数の帯域及び複数の品質のブロックに分割し、前記各ブロック毎に、全帯域の絶対電力の総和に対する該ブロックの絶対電力の比と、該ブロックの入力信号と符号化誤差信号との比の対数値と、をそれぞれ特徴量として含む複数の特徴量を求め、前記各ブロック毎に、前記複数の特徴量の線形結合を該ブロックの優先度に関する指標値として求めて優先度に関する指標値を算出する。
【００１６】
音声や動画が必ずしも聴感および視覚的に重要な情報を常に伝送しているわけではないことに着目した。例えば、音声の無音や高域部分は必ずしも常に伝送していなくとも品質への影響は少ない。また、動画でも画像状態にあまり変化がない場合、間の画像を間引いても視覚的な影響は少ない。つまり、この仕組みを更に効率良く音声動画通信で利用するために、ディジタル信号を品質や周波数帯域で分割したブロックごとに符号化し、つまりスケーラブル符号化し、各ブロックごとの符号化符号に優先度を付与し、その同じ優先度のついたブロックを１つのパケットに集約することによって伝送する。このとき、優先度は従来のようにフロー毎に一意に定まるのではなく、音声や動画像の状態に応じて動的に変化する。
前記優先度としては、ディジタル信号をフレームごとに分割し、その分割されたフレームごとのディジタル信号を符号化し、上記符号化に基づく特徴量又は／及び上記ディジタル信号の特徴量を説明変数として求め、上記説明変数の複数個を線形結合して指標値を求め、その指標値を量子化して優先度とすることが好ましい。
【００１７】
【発明の実施の形態】
ネットワーク的に遠距離にある端末間のパケット通信は基幹網を経由して行われる。ここでネットワーク的に遠距離とは、物理的な距離と対応されず、伝送される際にパケットが経由するノードの数が多いことである。
図１にこの発明が適用されるシステム構成例を示す。基幹網１００には他の通信網から通信を受信する入側装置（ゲートウェイ）１１０、他の通信網へ通信を送出する出側装置１２０，１３０が設けられている。他の通信網からの通信を受信し、他の通信網への通信を送出する両機能を備える装置（ゲートウェイ）もある。入側装置１１０は複数の端末２１０，２２０，２３０と接続することができ、出側装置１２０は端末２４０，２５０と接続することができ、出側装置１３０は端末２６０と、また他の基幹網１４０と接続することができる。
【００１８】
入側装置１１０では例えば図２に示すように端末２１０，２２０，２３０から受信したパケットは分配部１１１で優先度ごとに分配されて優先度別バッファ１１２₁，…，１１２₄に格納される。この例では優先度ｐが１〜４の場合である。これら分配された優先度ごとのパケットの複数個が１つのパケットに集約パケットとして、パケット構築部１１３₁，…，１１３₄で集約構築される。その際に、各集約パケットは、その元のパケットの少なくとも送信元情報と受信先（宛先）情報を含む個別ヘッダＨ_Pとそのパケットのペイロードとが組とされ、１つの集約パケットに、基幹網１００内の伝送に必要とする基幹網ヘッダＨ_Sが付けられる。
【００１９】
例えば図３Ａに示すように、端末２１０から端末２４０への優先度ｐが１〜４のパケットＰ_AD1 ，…，Ｐ_AD4 、端末２２０から端末２５０への優先度ｐが１〜４のパケットＰ_BE1 ，…，Ｐ_BE4 、端末２３０から端末２６０への優先度ｐが１〜４のパケットＰ_CF1 ，…，Ｐ_CF4 を受信したとする。優先度ｐが１のパケットを集約した集約パケットＳＰを構築するパケット構築部１１３₁ では、優先度別バッファ１１２₁ からパケットＰ_AD1 、パケットＰ_BE1 、パケットＰ_CF1 を取り出し、基幹網１００内の同一の出側装置１２０へ伝送させるものを集め、個別ヘッダ生成部１１３ａで端末２４０への受信先情報を含む個別ヘッダＨ_PDを生成し、この個別ヘッダＨ_PDと、パケットＰ_AD1 のペイロードとの組を作り、同様に端末２５０への受信先情報を含む個別ヘッダＨ_PEとパケットＰ_BE1 のペイロードとの組を作り、以下同様に図に示していないが、個別ヘッダとペイロードとの組を作る。
図３Ｂに示すように、これらの個別ヘッダとペイロードの組を集めて１つのペイロード（集約ペイロード）としてこれに対し、基幹網１００内を入側装置１１０から出側装置１２０への伝送に必要な基幹網ヘッダＨ_S1をＩＰヘッダ生成部１１３ｂで生成して付けて１つの集約パケットＳＰ_12,1を構築する。この基幹網ヘッダＨ_S1には優先度ｐが１であることを示す情報も含まれている。
【００２０】
他のパケット構築部１１３₂，…，１１３₄においても、それぞれ優先度ｐが２，…，４のパケットについて、各個別ヘッダＨ_Pとペイロードの組を集約し、基幹網ヘッダＨ_S1，Ｈ_S4をそれぞれ付けた集約パケットＳＰ_12,2，…，ＳＰ_12,4を構築する。
基幹網１００を入側装置１１０から出側装置１３０へ伝送させるパケットＰ_CF1，…，Ｐ_CF4なども、それぞれ同一優先度のものを集め、その個別ヘッダとペイロードの組を集約ペイロードとし、これに基幹網ヘッダＨ_Sを付けて集約パケットＳＰ_13,1などを構築する。これらの集約パケットＳＰは送信部１１４より基幹網１００内に送信される。個別ヘッダＨ_pは受信先情報と送信元情報よりなるため、６４バイト程度でよく、基幹網ヘッダＨ_sは前述したように例えば２００バイト以上であり、これと比較して個別ヘッダＨ_pの情報量は可成り小さく、ディジタル信号をその各ブロックごとの符号のパケットとして伝送する場合と比較してヘッダのオーバーヘッドによるトラヒックの増加は著しく改善されることになる。
更にこの実施形態では各端末から送信されるパケットＰ_AD，Ｐ_BE，Ｐ_CFなどはそれぞれディジタル信号がスケーラブル符号化され、所定区間ごとに同一優先度の信号ブロック符号をまとめて１つのパケットとしたものである。
【００２１】
例えば図４に示すようにフレーム分割された音声信号を、Ｆ個の帯域に帯域分割部１６で分割し、これら１〜Ｆ番目の帯域信号をそれぞれ符号化部１３₁〜１３_Fで符号化すると共に優先度決定部１４₁〜１４_Fでそれぞれ優先度を決定する。これら符号化符号Ｐ［１，ｉ］〜Ｐ［Ｆ，ｉ］と優先度ｐ［１，ｉ］〜ｐ［Ｆ，ｉ］をパケット集約部１９に供給し、所定フレーム数ごとに、同一優先度の符号をまとめて、１つのパケットとして、送出部１５より送出する。
入力音声信号ｓ［ｎ］を例えばウェーブレット分析を用いた０−４ｋＨｚ，４ｋＨｚ−８ｋＨｚ，８−１６ｋＨｚのＦ＝３帯域に分割し、５ｍｓで時間方向に分割し、時間２０ｍｓごとにパケット送出するものとする。各パケット送出番号ｔにおけるフレーム番号ｉ＝１，…，４とし、フレーム番号ｉの帯域番号ｆの信号ブロックの符号化符号をＰ［ｆ，ｉ］と、優先度をｐ［ｆ，ｉ］とそれぞれ表わす。各第ｔ番目の送出区間における各ブロックの符号Ｐ［ｆ，ｉ］と優先度ｐ［ｆ，ｉ］が図５Ａに示すようになった場合パケット集約部１９では図５Ｂに示すように、同じ優先度をもつブロックをそれぞれ集約してその優先度の情報を含むヘッダを持つ１つのパケットとする。この例では優先度ｐ＝４のブロック（１，２）及び（１，３）の符号Ｐ［１，２］，Ｐ［１，３］をまとめ、かつその各符号Ｐ［１，２］，Ｐ［１，３］の帯域−時間座標上の位置情報、つまり所定複数フレーム内のブロックの位置情報（１，２），（１，３）を優先度ｐ＝４の情報を含むヘッダを付けたパケットに組み込む。優先度ｐ＝３のパケットには符号Ｐ［２，２］，Ｐ［１，４］とその位置情報（２，２），（１，４）を組み込み、優先度３の情報を含むヘッダを付けて１つのパケットとする。以下同様に同一優先度の符号をまとめ、その位置情報と共に１つのパケットとして組み込む。
【００２２】
こうして同一の優先順位をもつ符号が集約されたパケットは、この例では２０ｍｓ毎にネットワークへと送出される。このとき、ネットワークの状況に応じて、優先度が低いパケットは品質への影響が少ないので、送出しなくても良い。また、ネットワークの各ノードにおいてトラフィックの混雑状況に応じて低い優先度のパケットは破棄されても通信品質への影響は最小限に留められる。
基幹網１００内を伝送され、出側装置１２０に受信された集約パケットＳＰは例えば図６に示すように、パケット分解部１２１でそのペイロードは各受信端末ごとに分解され、その分解されたペイロードをペイロードとし、これにヘッダＨを付けて各１つのパケットにパケット再構成部１２２で行って、対応端末へ送信部１２３により送信する。
例えば図３に示した例では、図７Ａに示す集約パケットＳＰ_12,1，…，ＳＰ_12,4が出側装置１２０に受信される。集約パケットＳＰ_12,1はそのペイロードはその各個別ヘッダＨ_PEに基づき、図７Ｂに示すように受信先情報が端末２４０に対する情報、つまり図３Ａ中のパケットＰ_AD1のペイロードと、受信先情報が端末２５０に対する情報、つまり図３Ａ中のパケットＰ_BE1のペイロードに分解され、端末２２０への情報に、その集約パケットＳＰ_12,1の優先度を含むヘッダを付けてパケットＰ_AD1を再構成し、また端末２３０への情報にヘッダを付けてパケットＰ_BE1を再構成する。
【００２３】
前述したように、この実施形態では、端末から送出されるパケットは同一優先度のブロック符号がまとめられたものである。従って受信端末において図８に示すようにパケット分解部２１でｔ番目の送出区間の全てのパケット、図５の場合は優先度ｐ＝１〜ｐ＝４の４つのパケットＰ［１，ｔ］〜Ｐ［４，ｔ］を図５に示した組み立てと逆の手順を経て帯域−時間座標上に再構成し、各帯域符号Ｐ［１，ｉ］〜Ｐ［Ｆ，ｉ］を復号化部２２₁〜２２_Fでそれぞれ帯域音声復号に復号する。このとき、受信側に到達しなかった低い優先度の符号がある場合は、基本的にはその符号に対する復号化部の動作を停止する。高優先度の符号が到達しない場合は、フレーム（ブロック）消失対策をブロック消失補償部２３₁〜２３_Fの対応する部分で行い、品質低下を避ける。このようにして復号され、必要に応じて消失補償された各帯域音声信号は帯域合成部２４で合成されて再生音声信号ｓ［ｎ］として出力される。なおパケット分解部２１よりブロック消失情報がブロック消失補償部２３₁〜２３_Fへ供給されている。このブロック消失補償は公知の技術により行えばよい。
【００２４】
変形例
図１２に示したように帯域分割してブロックとする場合、その説明変数として更に、４［１，ｉ］を加えてもよい。即ち図１４中に破線で示すように説明変数生成部１４６_Lでこの帯域の絶対電力ｘ１［ｆ，ｉ］と、他帯域の絶対電力とが入力されてこの帯域の絶対電力の総電力に対する比が次式（９）により計算され、説明変数ｘ４［ｆ，ｉ］として出力される。
ｘ４［ｆ，ｉ］＝ｘ１［ｆ，ｉ］／Σ_f=1 ^Fｘ１［ｆ，ｉ］（９）
図１４の例ではＦ＝２であるから、低域のｘ１［１，ｉ］と高域のｘ１［２，ｉ］により
ｘ４［１，ｉ］＝ｘ１［１，ｉ］／（ｘ１［１，ｉ］＋ｘ１［２，ｉ］）
が計算される。
指標値計算部１４４_Lで説明変数ｘ１［１，ｉ］，ｘ２［１，ｉ］，ｘ３［１，ｉ］，ｘ４［１，ｉ］が線形結合され、次式による指標値ｙ［１，ｉ］が計算され、更に量子化されて優先度ｐ［１，ｉ］が出力される。
ｙ［１，ｉ］＝α０＋Σ_j=1 ⁴αｊｘｊ［１，ｉ］＾
ｘｊ［１，ｉ］＾＝（ｘｊ［１，ｉ］−ｘｊ［１］′）／γｊ［１］
ブロック分割は品質に基づき行ってもよい。この場合の音声ブロック（パケット）の分割イメージは図１３中に括弧書きで品質ｑとフレームとの関係を示すようになる。またＱ＝２段構成の、一般的な固定処理時間単位で音声信号を符号化する場合の機能構成を図９に示す。
【００２５】
音声信号ｓ［ｎ］はフレーム分割部１２でフレーム単位で分割され、１段目符号化部１３₁でフレームごとに符号化されると共に１段目優先度決定部１４₁で優先度ｐ［１，ｉ］が決定される。１段目符号化部１３₁よりの符号化符号Ｐ［１，ｉ］は１段目復号化部１７₁で復号化され、この復号化信号が音声信号から減算部１８₁で差し引かれて、１段目の残差信号（符号化誤差信号）ｅ１［ｎ］が生成される。この残差信号は２段目符号化部１３₂でフレームごとに符号化されると共に２段目優先度決定部１４₂で優先度ｐ２［２，ｉ］が決定される。２段目符号化部１３₂よりの符号化符号Ｐ［２，ｉ］は２段目復号化部１７₂で復号化され、その復号化信号が、１段目の残差信号ｅ１［ｎ］から減算部１８₂で差し引かれて２段目残差信号ｅ２［ｎ］が生成される。
１段目優先度決定部１４₁の具体例を図１０に示す。図１４に示した優先度決定部１４_Lと同様に、絶対電力の説明変数ｘ１［１，ｉ］と前フレーム電力比の説明変数ｘ２［１，ｉ］と、自己相関関数最大値の説明変数ｘ３［１，ｉ］とがそれぞれ説明変数生成部１４１₁と１４２₁と１４３₁で生成される。更に説明変数生成部１４で符号Ｐ［１，ｉ］の品質、例えば信号に対する雑音比が説明変数ｘ５［１，ｉ］として生成される。即ち信号電力計算部１４７ａでＳ＝Σ_n=1 ^Nｓ［Ｎｉ＋ｎ］²が計算され、また雑音計算部１４７ｂでＥ＝Σ_n=1 ^Nｅ１［Ｎｉ＋ｎ］²が計算され、これらの比の対数log₁₀Ｅ／Ｓが対数割算部１４７ｃで計算され、その結果が説明変数ｘ５［１，ｉ］として出力される。
【００２６】
これら４個の説明変数は指標計算部１４４₁ で線形結合されて指標値ｙ［１，ｉ］が計算される。例えば先の場合と同様に正規化部１４４ａｊ（ｊ＝１，…，４）で説明変数ｘｊ［１，ｉ］がそれぞれ正規化され、その正規化値ｘｊ［１，ｉ］＾が線形結合ｙ［１，ｉ］＝α０＋Σ_j=1 ⁴αｊｘｊ［１，ｉ］＾，ｘｊ［１，ｉ］＾＝（ｘｊ［１，ｉ］−ｘｊ［１］′）γｊされる。この指標値ｙ［１，ｉ］は量子化部１４５₁ で量子化され、１段目優先度ｐ［１，ｉ］が出力される。
２段目優先度ｐ［２，ｉ］も同様に求められる。この場合は図１０中に括弧書きで示しているように、１段目残差信号ｅ１［ｎ］の代りに２段目残差信号ｅ２［ｎ］がそれぞれ入力され、これら信号に対して同様に処理され、２段目優先度ｐ［２，ｉ］が出力される。
【００２７】
パケット送出部１５（図４）では１段目符号Ｐ［１，ｉ］と優先度ｐ［１，ｉ］を１つのパケットとし、２段目符号Ｐ［２，ｉ］と優先度ｐ［２，ｉ］を１つのパケットとして出力する。
この説明変数ｘ５［ｑ，ｉ］（ｑ＝１，２，…，Ｑ）は、符号化に基づく特徴量といえる。これを求める計算式は一般的に示すと以下となる。
ｘ５［ｑ，ｉ］＝log₁₀(Σ_n=1 ^Nｅｑ［Ｎｉ＋ｎ］² ／Σ_n=1 ^Nｓ［Ｎｉ＋ｎ］²)
この場合の線形結合係数α５は−０．１程度が考えられる。ｑが大きいものは高品質の信号の再生には必要であるが、トラヒックが輻輳している状態では品質よりも伝送される情報の意味内容がより重要であるから、ｑが大きいパケットはｘ５［ｑ，ｉ］が小さくなり、かつα５が比較的小さいから優先度にあまり関与しないようになる。
【００２８】
一般的なスケーラブル複数帯域符号化器の場合は、説明変数ｘ１［ｆ，ｉ］，ｘ２［ｆ，ｉ］，ｘ３［ｆ，ｉ］，ｘ４［ｆ，ｉ］，ｘ５［ｑ，ｉ］を用いて指標値ｙ［ｆ，ｑ，ｉ］の演算を行なう。このときの音声ブロック（パケット）の分割イメージを図１１に示す。
つまり各種サンプリング周波数、各種サンプル量子化精度（振幅ビット数）の組合せをもつ各品質の音声信号に符号化する、いわゆるスケーラブル符号化の場合で、図１１はサンプリング周波数は３段階、量子化精度（品質）も３段階とした場合で周波数帯域がｆ＝１，ｆ＝２，ｆ＝３の３帯域に分割され、振幅ビット長がｑ＝１，ｑ＝２，ｑ＝３の３領域に分割され、互いに直交する周波数軸（帯域番号）と品質軸（振幅ビット分割番号）と時間軸（フレーム番号）で表わされていた３次元空間における１つの信号ブロック（パケット）として［ｆ，ｑ，ｉ］で識別される。
この場合の各説明変数はそれぞれ次式で求める。帯域ｆ、品質（ビット分割番号）ｑの音声信号をｓｆｑと表わす。
ｘ１［ｆ，ｑ，ｉ］＝（１／Ｎ）Σ_n=1 ^Nｓｆｑ［Ｎｉ＋ｎ］²
又はｘ１［ｆ，ｑ，ｉ］＝log₁₀（（１／Ｎ）Σ_n=1 ^Nｓｆｑ［Ｎｉ＋ｎ］²）
ｘ２［ｆ，ｑ，ｉ］＝ｘ１［ｆ，ｑ，ｉ］／ｘ１［ｆ，ｑ，ｉ−１］
ｘ３［ｆ，ｑ，ｉ］＝ｍａｘ（ρ_f,q,i[ｋ]）
ρ_f,q,i[ｋ]＝Σ_n=0 ^N（ｓｆｑ［Ｎｉ＋ｎ］）（ｓｆｑ［Ｎｉ＋ｎ＋ｋ］）
／Σ_n=0 ^N（ｓｆｑ［Ｎｉ＋ｎ］）²
ｘ４［ｆ，ｑ，ｉ］＝ｘ１［ｆ，ｑ，ｉ］／Σ_f=1 ^Fｘ１［ｆ，ｑ，ｉ］
ｘ５［ｆ，ｑ，ｉ］＝log₁₀(Σ_n=1 ^Nｅｆｑ［Ｎｉ＋ｎ］²／Σ_n=1 ^Nｓｆｑ［Ｎｉ＋ｎ］²)
指標値ｙ［ｆ，ｑ，ｉ］＝α０＋Σ_j=1 ⁵αｊｘｊ［ｆ，ｑ，ｉ］
優先度ｐ［ｆ，ｑ，ｉ］＝ｆ_f,q(ｙ［ｆ，ｑ，ｉ］)
【００２９】
このようにして決定された優先度ｐ［ｆ，ｑ，ｉ］と対応する符号化符号Ｐ［ｆ，ｑ，ｉ］とを１つの組とする。この場合も所定のフレーム数ごとに、同一優先度のブロック符号をその位置情報と共にまとめて１つのパケットとして送出する。
上述ではこの発明を音声信号に適用したが、音楽信号、映像信号にも適用できる。また符号化に基づく特徴量の説明変数としては次のものなども考えられる。例えば、予測符号化を用いた音声符号化器によっては語頭などのパケットが破棄されると、その後の音声品質（ＳＮ比）が著しく劣化する可能性がある。そのような破棄されることによって伝播するＳＮ比の劣化も説明変数ｘｊ（ｍ，ｉ）としてもよい。音声信号の特徴量の説明変数、符号化に基づく特徴量の説明変数の何れも上述した例に限らず、各種のものを使用することができる。
上述した端末、入側装置、出側装置はコンピュータにより機能させることもできる。その場合はその装置としてコンピュータを機能させるためのプログラムを、ＣＤ−ＲＯＭ、磁気ディスクなどの記録媒体から当該装置のコンピュータにインストールし、あるいは通信回線を介してダウンロードして実行させればよい。
【００３０】
【発明の効果】
この発明によればパケットに優先度を付けて伝送するため、ネットワークに輻輳が起きていない場合は通常の通信と同じように品質は最高品質で伝送が可能となる。これに対し、ネットワークに輻輳が起きた場合、優先度の低いパケットは破棄が起きるが品質の劣化は最低限に抑えることも可能である。こうすることによって全体的にネットワークの効率的な利用が可能となる。このような仕組には、狭帯域音声通信ではペイロードがヘッダより小さいためヘッダなどのオーバーヘッドをある程度考慮する必要があるが、ネットワーク側で集約や分配をヘッダのオーバヘッドによるトラヒックの増加が抑圧される。また、広帯域楽音通信や動画通信はパケットのペイロードの大きさがヘッダよりも大きくなるので、集約や分配の仕組を使わなくともネットワークを有効に使用することができる。
【図面の簡単な説明】
【図１】この発明が適用されるネットワークの例を示す図。
【図２】図１中の入側装置１１０の機能構成例を示す図。
【図３】図１中の入側装置１１０における複数の受信パケットと集約した集約パケットの例を示す図。
【図４】端末におけるパケット送出装置の機能構成例を示す図。
【図５】図４中のパケット集約部１９における集約の様子の例を示す図。
【図６】図１中の出側装置１２０の機能構成例を示す図。
【図７】出側装置における集約パケットと分解再生成されたパケットの例を示す図。
【図８】端末におけるパケット受信装置の機能構成例を示す図。
【図９】パケット送出装置の他の機能構成例を示すブロック図。
【図１０】図９中の１段目優先度決定部１４₁の具体的機能構成例を示すブロック図。
【図１１】品質−帯域−時間の３次元座標に信号をブロック分割する例を示す図。
【図１２】従来のパケット送出装置の機能構成例を示すブロック図。
【図１３】帯域−時間の２次元座標に信号をブロック分割する例を示す図。
【図１４】図１２中の低域優先度決定部１４_Lの具体的機能構成例を示すブロック図。

[0001]
[Technical field of the invention]
The present invention relates to a packet sending apparatus, and to a method and a program for calculating a priority-related index value used in it. They are intended for packet communication networks such as the Internet, and in particular for the integrated voice, video, and data networks expected to become widespread. In such networks they make it possible to mix two kinds of traffic efficiently: general "data communication", which has a high peak transmission rate and generates traffic in bursts, and "voice communication" and "video communication", in which delay leads directly to quality degradation.
[0002]
[Prior art]
Conventionally, realizing real-time communication such as voice and video communication over the Internet requires that packets be transmitted continuously at regular intervals. Such data is usually carried by a protocol (communication procedure) called RTP (Real-time Transport Protocol) (see, for example, IETF-RFC1889: RTP: A Transport Protocol for Real-Time Applications, 1996). Because this protocol is implemented on top of UDP (User Datagram Protocol), lost packets are not retransmitted, unlike TCP (Transmission Control Protocol) packets. When a packet is discarded in the network, for example because of congestion (that is, when packet loss occurs), the receiving side clearly perceives dropouts in the sound and disturbances in the image.
In contrast, services long used on the Internet, such as file transfer and the WWW (World Wide Web), are event-driven communications that generate traffic in bursts only while data is being exchanged. Because they are implemented over TCP, they have a mechanism that attempts retransmission when a packet is discarded. This stems from the Internet being a best-effort network.
[0003]
This comparison shows that real-time communication, which requires data to flow continuously, and event-driven communication have little affinity and tend to interfere with each other. To address this, DiffServ (see Non-Patent Documents 1 and 2) has attracted attention as an Internet QoS (Quality of Service) control technology, that is, a technology that dynamically controls on the Internet the adjustment of the quality of each medium (application quality) so that the greatest effect is obtained from the given network and system resources (voice, video, and so on). In this method, packets entering the network are classified by priority in advance, and during network congestion each node discards packets starting from the lowest priority, which improves the reachability of high-priority packets to their destinations. For example, assigning a high priority to voice and video communication makes RTP packet loss less likely, while assigning a low priority to data communication and relying on TCP retransmission makes an overall stable connection possible.
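The discard rule described above can be sketched as follows. This is a minimal illustration, not DiffServ itself: the queue model, the `drop_on_congestion` name, and the convention that a larger number means higher priority are all assumptions made for the example.

```python
from typing import List, Tuple

# (priority, label); ASSUMPTION: a larger number means a higher priority.
Packet = Tuple[int, str]

def drop_on_congestion(queue: List[Packet], capacity: int) -> List[Packet]:
    """Keep at most `capacity` packets, discarding the lowest-priority ones first."""
    if len(queue) <= capacity:
        return list(queue)
    # Rank packet positions by priority, highest first. sorted() is stable,
    # so packets of equal priority keep their arrival order.
    ranked = sorted(range(len(queue)), key=lambda i: -queue[i][0])
    keep = set(ranked[:capacity])
    # Return the survivors in their original arrival order.
    return [pkt for i, pkt in enumerate(queue) if i in keep]

# Hypothetical congested queue: two data packets and two voice packets.
queue = [(1, "ftp-1"), (3, "rtp-voice-1"), (1, "http-1"), (3, "rtp-voice-2")]
survivors = drop_on_congestion(queue, capacity=2)
```

With room for only two packets, the two low-priority data packets are dropped and would be recovered later by TCP retransmission, while both voice packets get through.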
[0004]
When an IP (Internet Protocol) packet arrives at a DS domain, that is, an IP transport network with a single priority control policy, a gateway called an edge node labels it with a priority. Each node inside the DS domain then handles the packet during congestion according to the priority assigned by the gateway. The priority is written into the header field conventionally called the TOS field, and the edge node judges the importance of the packet from its source address, destination address, and port number.
Conventionally, this technique assigns a priority to each flow (a series of related information). For example, services such as file transfer (e.g., FTP) and the WWW are assigned distinct well-known port numbers, so the services themselves can be distinguished.
[0005]
As a conventional coding technique, there is a function called VAD (Voice Activity Detection) or a voice switch, which focuses on the presence or absence of speech (see, for example, 3GPP: ETSI TS 146 032, "Digital cellular telecommunications system (Phase 2+); Voice Activity Detection (VAD)", 2002). It can distinguish the speech portions that need to be transmitted from the silent portions, but under some conditions detection errors degrade quality.
In recent years, speech coders have been proposed that process speech signal blocks in units of 20 ms or less in the time direction and stack coded data in the frequency-band or quality direction. This makes them scalable (or embedded), that is, able to output codes for various frequency bands and/or various quality levels (see, for example, Non-Patent Documents 3 and 4).
A technique has also been proposed that enables efficient voice packet transmission without degrading perceived quality by using a method and apparatus for calculating the priority of each speech block (see, for example, Non-Patent Document 5).
[0006]
As shown in FIG. 12, a wideband speech signal s[n] (n is discrete time), a digital speech signal in which each sample from input terminal 11 is a digital value (hereinafter simply the speech signal), is divided by frame division unit 12 into frames of 5 ms to 20 ms, as in typical coders of this kind. In other words, it is divided every N samples; for a speech signal sampled at 32 kHz, N ranges from 160 to 640 samples. Band division unit 16 then splits each frame into F bands using band-pass filters, and each band of each frame forms a block. If the speech signal s[n] is sampled at 16 kHz, for example, it is divided into a lower and an upper band of 4 kHz each (F = 2). If it is sampled at 32 kHz, it may be divided wavelet-style into F = 3 bands of 0 to 4 kHz, 4 kHz to 8 kHz, and 8 kHz to 16 kHz, or into F = 4 equal bands of 4 kHz each. The band-divided speech signal of each block is coded by its own coder for each fixed time length (frame). FIG. 13 illustrates how the speech is divided into blocks (packets) in this case. In the example of FIG. 13, F = 3, the signal of each band in each frame forms one block (packet), and three blocks (packets) are generated per frame.
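The frame sizes and band edges quoted above follow from simple arithmetic, sketched below. The function names are hypothetical, and the band edges are computed as ideal boundaries only; no actual band-pass or wavelet filtering is performed.

```python
from typing import List, Tuple

def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of samples N in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)

def equal_bands(sample_rate_hz: int, num_bands: int) -> List[Tuple[float, float]]:
    """Split 0..Nyquist into `num_bands` bands of equal width (edges in Hz)."""
    width = (sample_rate_hz / 2) / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]

def dyadic_bands(sample_rate_hz: int, num_bands: int) -> List[Tuple[float, float]]:
    """Wavelet-style split: the top band covers the upper half of the spectrum,
    the next band the upper half of the remainder, and so on."""
    nyquist = sample_rate_hz / 2
    edges = [0.0] + [nyquist / 2 ** (num_bands - 1 - k) for k in range(num_bands)]
    return list(zip(edges[:-1], edges[1:]))

n_short = samples_per_frame(32000, 5)   # 5 ms frame at 32 kHz
n_long = samples_per_frame(32000, 20)   # 20 ms frame at 32 kHz
two_bands = equal_bands(16000, 2)       # F = 2 at 16 kHz
three_bands = dyadic_bands(32000, 3)    # F = 3 at 32 kHz
```

At 32 kHz this gives N = 160 for 5 ms and N = 640 for 20 ms, and the F = 3 split yields the 0 to 4 kHz, 4 to 8 kHz, and 8 to 16 kHz bands named in the text.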
In the example of FIG. 12, the speech signal is split into two bands, upper and lower, giving two blocks. The separated low-band speech signal s1[n] and high-band speech signal s2[n] are coded by low-band coding unit 13_L and high-band coding unit 13_H, respectively. The two signals are also input to low-band priority determination unit 14_L and high-band priority determination unit 14_H, which determine the packet priority of each block.
[0007]
A specific example of low-band priority determination unit 14_L is shown in FIG. 14. Explanatory variable generation units 141_L, 142_L, and 143_L compute feature quantities of the speech signal s1[n] in the low-band block (1, i) of the i-th frame as explanatory variables x1[1, i], x2[1, i], and x3[1, i], respectively. Explanatory variable generation unit 141_L takes the speech signal s1[n] of the block and computes its absolute power by equation (1):

$$x_1[1,i] = \frac{1}{N}\sum_{n=1}^{N} s_1[Ni+n]^2 \tag{1}$$

Alternatively, x1[1, i] may be obtained as the logarithm of the absolute power, as in equation (2):

$$x_1[1,i] = \log_{10}\left(\frac{1}{N}\sum_{n=1}^{N} s_1[Ni+n]^2\right) \tag{2}$$

Explanatory variable generation unit 142_L receives x1[1, i] from unit 141_L together with the explanatory variable x1[1, i-1] of the low-band block (1, i-1) of the previous frame (i-1). It computes the ratio of the current frame's power to the previous frame's power by equation (3) and outputs it as x2[1, i]:

$$x_2[1,i] = \frac{x_1[1,i]}{x_1[1,i-1]} \tag{3}$$

The explanatory variable x1[1, i-1] of the previous frame's block is held in previous-frame buffer 142a. Calculation unit 142b evaluates equation (3), after which the buffer is updated with x1[1, i] of the current block (1, i).
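A minimal sketch of the three power features in equations (1) to (3) follows. The names, the `FLOOR` guard against log(0) and division by zero, and the value returned for the very first frame are assumptions for illustration; the source does not specify them.

```python
import math
from typing import Optional, Sequence, Tuple

FLOOR = 1e-12  # ASSUMPTION: guard against log(0) and division by zero

def mean_power(block: Sequence[float]) -> float:
    """Equation (1): mean of the squared samples of one block."""
    return sum(v * v for v in block) / len(block)

def log_power(block: Sequence[float]) -> float:
    """Equation (2): base-10 logarithm of the mean power."""
    return math.log10(max(mean_power(block), FLOOR))

class PowerRatio:
    """Equation (3), with one stored value playing the role of buffer 142a."""

    def __init__(self) -> None:
        self.previous: Optional[float] = None

    def update(self, block: Sequence[float]) -> Tuple[float, float]:
        x1 = mean_power(block)
        # ASSUMPTION: the first frame has no predecessor, so report a ratio of 1.0.
        x2 = 1.0 if self.previous is None else x1 / max(self.previous, FLOOR)
        self.previous = x1  # update the buffer with the current frame's power
        return x1, x2

tracker = PowerRatio()
first = tracker.update([1.0, -1.0, 1.0, -1.0])   # power 1.0, no previous frame
second = tracker.update([2.0, -2.0, 2.0, -2.0])  # power 4.0, four times the previous
```

Keeping the previous power inside the object mirrors the buffer-then-update order described for units 142a and 142b.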
[0008]
Explanatory variable generation unit 143_L takes the speech signal s1[n], computes the maximum of its normalized autocorrelation function ρ_i[k], which is a measure of periodicity, by equation (4), and outputs it as the explanatory variable x3[1, i]:

$$x_3[1,i] = \max_k \rho_i[k] \tag{4}$$

The normalized autocorrelation function ρ_i[k] is computed by equation (5):

$$\rho_i[k] = \frac{\sum_{n=0}^{N} s_1[Ni+n]\, s_1[Ni+n+k]}{\sum_{n=0}^{N} s_1[Ni+n]^2} \tag{5}$$

Here k = 1, 2, ..., and the maximum value of k is roughly the pitch period of the speech signal s[n]. Better results are obtained if the autocorrelation function is upsampled, that is, interpolated, so that a more accurate value is computed.
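The periodicity feature of equations (4) and (5) can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, the sums run over N samples (n = 0 to N-1) rather than the N+1 terms written in equation (5), the signal is assumed to extend at least `max_lag` samples beyond the frame, and the interpolation step mentioned in the text is omitted.

```python
from typing import Sequence

def max_normalized_autocorr(signal: Sequence[float], start: int,
                            frame_len: int, max_lag: int) -> float:
    """Largest normalized autocorrelation over lags 1..max_lag for one frame."""
    if start + frame_len + max_lag > len(signal):
        raise ValueError("signal too short for the requested frame and lag range")
    # Denominator of equation (5): energy of the frame.
    energy = sum(signal[start + n] ** 2 for n in range(frame_len))
    if energy == 0.0:
        return 0.0  # ASSUMPTION: treat an all-zero frame as non-periodic
    best = float("-inf")
    for k in range(1, max_lag + 1):
        # Numerator of equation (5) for lag k.
        corr = sum(signal[start + n] * signal[start + n + k] for n in range(frame_len))
        best = max(best, corr / energy)
    return best

# Toy signal with a period of 4 samples.
signal = [1.0, 0.0, -1.0, 0.0] * 10
x3 = max_normalized_autocorr(signal, start=0, frame_len=8, max_lag=6)
```

For this toy signal the maximum is reached at lag 4, the period, where the shifted frame matches the original exactly.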
Index value calculation unit 144_L linearly combines the explanatory variables x1[1, i], x2[1, i], and x3[1, i] obtained in this way to give the index value y[1, i], for example by equations (6) and (7):

$$y[1,i] = \alpha_0 + \sum_{j=1}^{3} \alpha_j\, \hat{x}_j[1,i] \tag{6}$$

Here xj[1, i]^ (written $\hat{x}_j[1,i]$ in the equations) is the explanatory variable xj normalized so that its distribution has mean 0 and variance 1:

$$\hat{x}_j[1,i] = \frac{x_j[1,i] - x'_j}{\gamma_j} \tag{7}$$

xj′ and γj are the mean and standard deviation of the explanatory variable xj, respectively.
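Equations (6) and (7) amount to a z-score normalization followed by a weighted sum. The sketch below shows that computation; the function name and all numeric values are made up for illustration and are not taken from the source.

```python
from typing import Sequence

def index_value(features: Sequence[float], means: Sequence[float],
                stds: Sequence[float], alphas: Sequence[float]) -> float:
    """Equation (6): alphas[0] + sum_j alphas[j] * normalized feature j.

    `alphas` holds alpha_0 followed by one coefficient per feature.
    """
    if not (len(features) == len(means) == len(stds) == len(alphas) - 1):
        raise ValueError("expected one mean, std, and coefficient per feature")
    y = alphas[0]
    for x, mean, std, alpha in zip(features, means, stds, alphas[1:]):
        x_hat = (x - mean) / std  # equation (7)
        y += alpha * x_hat
    return y

# Hypothetical numbers: three features, their statistics, and four coefficients.
y = index_value(features=[2.0, 1.0, 0.5],
                means=[1.0, 1.0, 0.0],
                stds=[2.0, 1.0, 0.5],
                alphas=[3.0, -0.4, 0.2, -0.6])
```

Here the normalized features are 0.5, 0.0, and 1.0, so y = 3.0 - 0.2 + 0.0 - 0.6 = 2.2.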
[0009]
The linear combination coefficients α0 to α3 are partial regression coefficients optimized in advance by multiple regression analysis (see, for example, Tadakazu Okuno et al.: Multivariate Analysis Methods (revised edition), Nikkagiren, 1981). For example, let y[1, i]′ be the MOS value given by listeners in a subjective evaluation when one packet (block) is lost. The coefficients αj are then obtained by the least-squares method so that the error between y[1, i]′ and the index value y[1, i] computed by equation (6) is minimized. α0 is the mean of the MOS values, which range from 1 to 5, where a MOS value of 1 corresponds to "very bad" and 5 to "very good".
Because the coefficients α0 to α3 are determined in this way, a large absolute value of αj means that the corresponding explanatory variable (feature quantity) strongly affects subjective quality when the packet (block) is lost, and a small absolute value means that its effect is comparatively small. In other words, the greater the influence of a feature on subjective quality, the larger the magnitude of αj. Furthermore, because the index value y[1, i] linearly combines several explanatory variables (feature quantities) x1[1, i] to x3[1, i] using the coefficients α1 to α3, it indicates the effect of packet (block) loss on subjective quality more accurately than any single explanatory variable would. Blocks that strongly affect subjective quality, which for speech means perceptually important blocks, tend to have a small index value y[1, i], and unimportant blocks tend to have a large one.
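The least-squares fit of the coefficients can be sketched with the normal equations, as below. This is a toy under stated assumptions, not the procedure used by the authors: the function name is hypothetical, the solver is a plain Gauss-Jordan elimination, and the data are synthetic values generated from known coefficients rather than real MOS scores.

```python
from typing import List, Sequence

def fit_coefficients(rows: Sequence[Sequence[float]],
                     targets: Sequence[float]) -> List[float]:
    """Least-squares fit of target ~ a0 + sum_j a_j * x_j; returns [a0, a1, ...]."""
    design = [[1.0, *row] for row in rows]  # prepend a constant column for a0
    m = len(design[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in design) for j in range(m)]
           + [sum(r[i] * t for r, t in zip(design, targets))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

# Synthetic data generated from known coefficients (3.0, 0.5, -0.2).
rows = [[-1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [2.0, 2.0], [-2.0, 1.0]]
targets = [3.0 + 0.5 * x1 - 0.2 * x2 for x1, x2 in rows]
alphas = fit_coefficients(rows, targets)
```

When every feature has zero mean over the training data, as the normalized variables do, the fitted intercept equals the mean of the targets, which agrees with the description of α0 as a mean MOS value.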
[0010]
In index value calculation unit 144_L of FIG. 14, the explanatory variables x1 to x3 are normalized by normalization units 144a1 to 144a3, and the normalized explanatory variables x1^ to x3^ are multiplied by the coefficients α1 to α3 in multiplication units 144b1 to 144b3. Adders 144c1 and 144c2 add these products and the constant α0 and output the index value y[1, i].
Quantization unit 145_L scalar-quantizes the index value y[1, i] and outputs a priority p[1, i] that takes one of a set of discrete values, for example 0, 1, ..., 7. In general, blocks (packets) with a small index value are mapped to a high priority and those with a large index value to a low priority. The mapping can be written as the function

$$p[1,i] = f(y[1,i]) \tag{8}$$

The mapping function f(y) can be a scalar quantizer that maps each packet onto one of the available priority levels. The quantization thresholds can be chosen, for example, so that the index values y[1, i] fall into each level with equal probability, or so that the range of y[1, i] is divided into equal intervals.
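Both threshold choices mentioned above can be sketched with a sorted threshold list and a binary search. The function names are hypothetical, the thresholds are estimated from a sample of index values, and the returned level numbers (0 for the smallest index values) are an assumption; how levels are numbered as priorities is left open here.

```python
from bisect import bisect_right
from typing import List, Sequence

def equal_range_thresholds(values: Sequence[float], levels: int) -> List[float]:
    """Split the observed range of index values into `levels` equal intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / levels
    return [lo + step * k for k in range(1, levels)]

def equal_probability_thresholds(values: Sequence[float], levels: int) -> List[float]:
    """Pick thresholds at sample quantiles so each level is about equally likely."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // levels] for k in range(1, levels)]

def quantize(y: float, thresholds: Sequence[float]) -> int:
    """Scalar quantizer: level 0 holds the smallest index values."""
    return bisect_right(thresholds, y)

# Hypothetical training sample of index values.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
by_range = equal_range_thresholds(sample, 4)        # [2.75, 4.5, 6.25]
by_prob = equal_probability_thresholds(sample, 4)   # [3.0, 5.0, 7.0]
levels = [quantize(y, by_prob) for y in sample]
```

With the equal-probability thresholds, the eight sample values fall two per level, which is the property that choice is meant to provide.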
[0011]
Similarly, high-band priority determination unit 14_H computes the index value

$$y[2,i] = \alpha_0 + \sum_{j=1}^{4} \alpha_j\, \hat{x}_j[2,i]$$

$$\hat{x}_j[2,i] = \frac{x_j[2,i] - x'_j[2]}{\gamma_j[2]}$$

and outputs the priority p[2, i] = f_2(y[2, i]). Packet sending unit 15 sends the code P[1, i] from low-band coding unit 13_L together with the priority p[1, i] as one packet, and the code P[2, i] from high-band coding unit 13_H together with the priority p[2, i] as another packet.
In general, when the signal is divided into F bands, the index value y[f, i] of block (f, i), the f-th band of the i-th frame, is computed by

$$y[f,i] = \alpha_0 + \sum_{j=1}^{3} \alpha_j\, \hat{x}_j[f,i]$$

$$\hat{x}_j[f,i] = \frac{x_j[f,i] - x'_j[f]}{\gamma_j[f]}$$

and the priority p[f, i] is obtained as f_f(y[f, i]).
[0012]
[Non-Patent Document 1]
IETF-RFC2474: Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, 1998.
[Non-Patent Document 2]
IETF-RFC2475: An Architecture for Differentiated Services, 1998.
[Non-Patent Document 3]
Takeshi Mori and three others, "A study of low-delay wideband speech coding for packet communications", Proceedings of the 2003 IEICE Spring National Convention, Vol. 1, pp. 327-328.
[Non-Patent Document 4]
Jotaro Ikedo and five others, "Noise excitation source codebook using equally spaced pulse trains", Proceedings of the 2003 IEICE Spring National Convention, p. 173.
[Non-Patent Document 5]
Toru Morinaga and two others, "On the auditory importance of speech blocks divided in time and frequency band", Proceedings of the 2003 Spring Meeting of the Acoustical Society of Japan, p. 178.
[0013]
[Problems to be solved by the invention]
The problem with DiffServ arises when congestion occurs while a large share of the traffic is high priority; in that situation even high-priority packets are discarded. As traffic that demands real-time delivery, such as voice and video transmission, increases, it is easy to foresee this situation occurring. For voice, using the conventional VAD or voice switch in such a case risks detection errors and therefore quality degradation. Moreover, because silent portions are not transmitted at all, communication with a sense of presence cannot be expected.
In conventional VoIP voice transmission, coding is usually performed on time blocks of 20 ms, and each block is sent as a single packet. An interval of 20 ms is long enough to contain an entire consonant phoneme, so when such a packet is lost the phoneme itself is lost and the conversation becomes unclear.
With the technique of Non-Patent Document 5 mentioned above, discarding low-priority packets causes little degradation in perceived quality. However, because each block is sent as its own packet and header information is added to every packet, the scheme is inefficient, and the increase in traffic due to header overhead becomes a problem. In an IP network, for example, headers are very large, so with a coding scheme of high compression efficiency the header becomes an overhead relative to the code for one block of information. Specifically, for a packet carried over IP + UDP + RTP the headers total 400 bytes or more, whereas at a bit rate of 8 kbit/s the code output every 20 ms is 20 bytes, so the header is 20 times the payload.
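The payload size and overhead ratio quoted above can be checked with a short calculation. The function names are hypothetical, and the 400-byte header figure is simply the value stated in the text, used as given.

```python
def payload_bytes(bit_rate_bps: float, frame_ms: float) -> float:
    """Bytes of coded data produced in one frame at the given bit rate."""
    return bit_rate_bps * frame_ms / 1000 / 8

def overhead_ratio(header_bytes: float, payload: float) -> float:
    """How many times larger the header is than the payload."""
    return header_bytes / payload

payload = payload_bytes(8000, 20)      # 8 kbit/s for 20 ms -> 160 bits -> 20 bytes
ratio = overhead_ratio(400, payload)   # header size as stated in the text
```

The same two functions show how the ratio falls when several blocks share one header, which is the motivation for aggregating blocks into a single packet.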
[0014]
[Means for Solving the Problems]
  According to one aspect of the packet transmission method of the present invention, a digital signal is divided into frames, the digital signal of each frame is divided into a plurality of bands to generate a block for each band, a plurality of feature amounts including an absolute power ratio as one feature amount are obtained for each block, and, for each block, a linear combination of the plurality of feature amounts is obtained as an index value related to the priority of the block. The index value related to the priority is calculated in this way.
[0015]
  According to another aspect of the packet transmission method of the present invention, the digital signal is divided into frames, the digital signal of each frame is divided into blocks based on a plurality of qualities, a plurality of feature amounts including, as one feature amount, the logarithm of the ratio between the input signal of the block and the encoding error signal are obtained for each block, and, for each block, a linear combination of the plurality of feature amounts is obtained as an index value related to the priority of the block. The index value related to the priority is calculated in this way.
  According to still another aspect of the packet transmission method of the present invention, the digital signal is divided into frames, the digital signal of each frame is divided into blocks of a plurality of bands and a plurality of qualities, a plurality of feature amounts including, as individual feature amounts, the ratio of the absolute power of the block to the sum of the absolute powers of all bands and the logarithm of the ratio between the input signal of the block and the encoding error signal are obtained for each block, and, for each block, a linear combination of the plurality of feature amounts is obtained as an index value related to the priority of the block. The index value related to the priority is calculated in this way.
[0016]
We focused on the fact that audio and video do not always carry auditorily or visually important information. For example, silent portions and high-frequency components of sound need not always be transmitted. Likewise, when the picture in a moving image changes little, thinning out the intermediate images has little visual effect. Therefore, to use this mechanism more efficiently in audio and video communication, the digital signal is encoded block by block, with blocks divided by quality and frequency band (that is, scalable encoding is performed), a priority is assigned to the code of each block, and blocks having the same priority are aggregated into one packet and transmitted. The priority is not fixed per flow as in the prior art but changes dynamically according to the state of the voice or moving image.
The priority is preferably obtained as follows: the digital signal is divided into frames, the digital signal of each frame is encoded, feature amounts based on the encoding and/or feature amounts of the digital signal are obtained as explanatory variables, an index value is obtained as a linear combination of these explanatory variables, and the index value is quantized to give the priority.
[0017]
[Detailed description of the invention]
Packet communication between terminals that are far apart in network terms is performed via a backbone network. Here, being far apart in network terms means that a packet passes through a large number of nodes in transit, which does not necessarily correspond to physical distance.
FIG. 1 shows an example of a system configuration to which the present invention is applied. The backbone network 100 is provided with an ingress device (gateway) 110 that receives communication from other communication networks, and egress devices 120 and 130 that send communication to other communication networks. Some devices (gateways) have both functions, receiving communication from another communication network and sending communication to it. The ingress device 110 can be connected to a plurality of terminals 210, 220, and 230; the egress device 120 can be connected to terminals 240 and 250; and the egress device 130 can be connected to the terminal 260 and to another backbone network 140.
[0018]
In the ingress device 110, as shown for example in FIG. 2, the packets received from the terminals 210, 220, and 230 are sorted by priority by the distribution unit 111 and stored in the priority buffers 112_1, ..., 112_4. In this example, the priority p ranges from 1 to 4. The packets sorted into each priority are aggregated into one packet, which is constructed by the corresponding packet construction unit 113_1, ..., 113_4. In doing so, each packet to be aggregated is given an individual header H_P containing at least the source information and the destination information of the original packet. The pairs of individual header and packet payload are combined into one aggregate packet, to which a backbone header H_S required for transmission within the backbone network 100 is attached.
[0019]
  For example, as shown in FIG. 3A, suppose that packets P_AD1, ..., P_AD4 with priorities p of 1 to 4 from the terminal 210 to the terminal 240, packets P_BE1, ..., P_BE4 with priorities p of 1 to 4 from the terminal 220 to the terminal 250, and packets P_CF1, ..., P_CF4 with priorities p of 1 to 4 from the terminal 230 to the terminal 260 are received. The packet construction unit 113_1, which builds the aggregate packet SP for priority p = 1, takes the packets P_AD1, P_BE1, and P_CF1 from the priority buffer 112_1 and collects those to be transmitted to the same egress device 120 in the backbone network 100. The individual header generation unit 113a creates an individual header H_PD containing the destination information for the terminal 240 and pairs it with the payload of the packet P_AD1. It likewise creates an individual header H_PE containing the destination information for the terminal 250 and pairs it with the payload of the packet P_BE1. Further pairs of individual header and payload, not shown in the figure, are created in the same way.
  As shown in FIG. 3B, these pairs of individual header and payload are collected into one payload (the aggregate payload). The IP header generation unit 113b generates and attaches the backbone header H_S1 required for transmission from the ingress device 110 to the egress device 120 in the backbone network 100, and one aggregate packet SP_12,1 is constructed. This backbone header H_S1 contains information indicating that the priority p is 1.
[0020]
The other packet construction units 113_2, ..., 113_4 likewise aggregate the pairs of individual header H_P and payload for the packets with priority p of 2, ..., 4 and construct the aggregate packets SP_12,2, ..., SP_12,4 with the backbone headers H_S2, ..., H_S4, respectively.
The packets P_CF1, ..., P_CF4, which travel through the backbone network 100 from the ingress device 110 to the egress device 130, are likewise collected by priority. The pairs of individual header and payload form an aggregate payload, to which a backbone header H_S is attached to construct the aggregate packet SP_13,1 and so on. These aggregate packets SP are transmitted from the transmitter 114 into the backbone network 100. The individual header H_P consists only of destination information and source information, so about 64 bytes may suffice, whereas the backbone header H_S is, for example, 200 bytes or more as described above. Because the individual header H_P is small by comparison, the increase in traffic due to header overhead is markedly reduced compared with transmitting the code of each block of the digital signal as its own packet.
Furthermore, in this embodiment, the packets P_AD, P_BE, P_CF, and so on transmitted from each terminal are obtained by scalably coding a digital signal and combining the codes of signal blocks having the same priority in each predetermined section into one packet.
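The aggregation performed in the ingress device can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `egress_of` routing table, and the representation of the individual header as a (source, destination) tuple are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: str          # sending terminal
    dst: str          # receiving terminal
    priority: int     # 1..4 in the example above
    payload: bytes


@dataclass
class AggregatePacket:
    egress: str       # backbone header: destination egress device
    priority: int     # backbone header: shared priority
    entries: list = field(default_factory=list)  # (individual header, payload)


def aggregate(packets, egress_of):
    """Group packets by (egress device, priority) into aggregate packets."""
    groups = {}
    for pkt in packets:
        key = (egress_of[pkt.dst], pkt.priority)
        if key not in groups:
            groups[key] = AggregatePacket(egress=key[0], priority=key[1])
        individual_header = (pkt.src, pkt.dst)   # source and destination only
        groups[key].entries.append((individual_header, pkt.payload))
    return groups


# Routing assumed from FIG. 1: terminals 240 and 250 sit behind egress 120,
# terminal 260 behind egress 130.
egress_of = {"240": "120", "250": "120", "260": "130"}
received = [
    Packet("210", "240", 1, b"AD1"),
    Packet("220", "250", 1, b"BE1"),
    Packet("230", "260", 1, b"CF1"),
    Packet("210", "240", 2, b"AD2"),
]
aggregates = aggregate(received, egress_of)
```

Here the two priority-1 packets bound for egress device 120 share one backbone header, while the packet for terminal 260 goes into a separate aggregate for egress device 130.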
[0021]
For example, as shown in FIG. 4, the audio signal divided into frames is divided into F bands by the band dividing unit 16. The first to F-th band signals are encoded by the encoding units 13_1 to 13_F, respectively, and their priorities are determined by the priority determination units 14_1 to 14_F. The resulting codes P[1, i] to P[F, i] and priorities p[1, i] to p[F, i] are supplied to the packet aggregating unit 19, where codes of the same priority are collected every predetermined number of frames and sent from the sending unit 15 as one packet.
Suppose that the input audio signal s[n] is divided into F = 3 bands of 0-4 kHz, 4-8 kHz, and 8-16 kHz using, for example, wavelet analysis, divided in the time direction into 5 ms frames, and transmitted in packets every 20 ms. Within each packet transmission section t, the frame number is i = 1, ..., 4; the code of the signal block with band number f and frame number i is denoted P[f, i], and its priority is denoted p[f, i]. When the codes P[f, i] and priorities p[f, i] of the blocks in the t-th transmission section are as shown in FIG. 5A, the packet aggregating unit 19 aggregates the blocks having the same priority into one packet whose header contains the priority, as shown in FIG. 5B. In this example, the codes P[1,2] and P[1,3] of the blocks (1,2) and (1,3) with priority p = 4 are put together and, along with their positions on the band-time coordinates, that is, the position information (1,2) and (1,3) of the blocks within the predetermined plurality of frames, are placed in one packet with a header containing the priority p = 4. The packet for priority p = 3 contains the codes P[2,2] and P[1,4] and their position information (2,2) and (1,4), with a header containing the priority 3. Codes of each remaining priority are likewise collected and placed in one packet together with their position information.
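The bundling of block codes by priority can be sketched as follows. Only the four blocks whose priorities are stated in the text are used; the remaining entries of FIG. 5A are not reproduced. The function name, the dictionary layout of a packet, and the byte strings standing in for codes are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_blocks(codes, priorities):
    """Bundle block codes of equal priority, keeping each block's (f, i)."""
    by_priority = defaultdict(list)
    for pos in sorted(codes):                  # pos = (band f, frame i)
        by_priority[priorities[pos]].append((pos, codes[pos]))
    return [
        {"priority": p, "blocks": blocks}
        for p, blocks in sorted(by_priority.items(), reverse=True)
    ]


codes = {(1, 2): b"P12", (1, 3): b"P13", (2, 2): b"P22", (1, 4): b"P14"}
priorities = {(1, 2): 4, (1, 3): 4, (2, 2): 3, (1, 4): 3}
packets = aggregate_blocks(codes, priorities)
```

Each resulting packet carries one priority value in its header and a list of (position, code) pairs, so the receiver can put every code back at its band-time coordinate.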
[0022]
In this example, packets in which codes having the same priority are aggregated are sent to the network every 20 ms. Depending on the state of the network, low-priority packets, which have little influence on quality, need not be transmitted at all. Further, even if low-priority packets are discarded at the nodes of the network because of traffic congestion, the influence on communication quality is kept to a minimum.
For example, as shown in FIG. 6, the aggregate packet SP transmitted through the backbone network 100 and received by the egress device 120 is decomposed by the packet decomposing unit 121 for each receiving terminal. A header H is added to each payload, and each packet is reconstructed by the packet reconstruction unit 122 and transmitted to the corresponding terminal by the transmission unit 123.
For example, in the example shown in FIG. 3, the aggregate packets SP_12,1, ..., SP_12,4 shown in FIG. 7A are received by the egress device 120. As shown in FIG. 7B, the payload of the aggregate packet SP_12,1 is decomposed according to its individual headers into the payload whose destination information designates the terminal 240, that is, the packet P_AD1 in FIG. 3, and the payload whose destination information designates the terminal 250, that is, the packet P_BE1 in FIG. 3. The packet P_AD1 is reconstructed by adding a header containing the priority of the aggregate packet SP_12,1 to the information to the terminal 220, and the packet P_BE1 is reconstructed by adding a header to the information to the terminal 230.
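The decomposition in the egress device can be sketched as the reverse of aggregation. The dictionary layout of the aggregate packet and the field names are hypothetical; the destinations 240 and 250 follow the routing of FIG. 3 as described for packets P_AD1 and P_BE1.

```python
def disaggregate(aggregate):
    """Rebuild one packet per (individual header, payload) pair."""
    rebuilt = []
    for (src, dst), payload in aggregate["entries"]:
        rebuilt.append({
            "src": src,
            "dst": dst,
            "priority": aggregate["priority"],   # copied from the backbone header
            "payload": payload,
        })
    return rebuilt


sp_12_1 = {
    "priority": 1,
    "entries": [(("210", "240"), b"AD1"), (("220", "250"), b"BE1")],
}
for pkt in disaggregate(sp_12_1):
    print(pkt["dst"], pkt["priority"], pkt["payload"])
```

Because the priority lives in the backbone header, every reconstructed packet inherits it without the individual headers having to carry it.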
[0023]
As described above, in this embodiment, each packet transmitted from a terminal is a collection of block codes having the same priority. Therefore, in the receiving terminal, as shown in FIG. 8, the packet disassembling unit 21 takes all packets of the t-th transmission section (in the case of FIG. 5, the four packets P[1, t] to P[4, t] with priorities p = 1 to p = 4) and rearranges the codes on the band-time coordinates by reversing the assembly procedure shown in FIG. 5. The band codes P[1, i] to P[F, i] are then decoded by the decoding units 22_1 to 22_F into the audio of each band. If a low-priority code has not reached the receiving side, the operation of the decoding unit for that code is basically stopped. If a high-priority code has not arrived, the block erasure compensation units 23_1 to 23_F apply block loss countermeasures to avoid quality degradation. The band audio signals decoded in this way, with erasure compensation applied as necessary, are synthesized by the band synthesizing unit 24 and output as the reproduced audio signal s[n]. The block erasure information is supplied from the packet disassembling unit 21 to the block erasure compensation units 23_1 to 23_F. This block erasure compensation may be performed by a known technique.
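The rearrangement of received codes onto the band-time coordinates can be sketched as follows. The packet layout (a priority plus a list of (position, code) pairs) and the use of `None` to mark a block that never arrived are assumptions for illustration; the decoding and erasure compensation themselves are not modeled.

```python
def rebuild_grid(packets, num_bands, num_frames):
    """Place received codes at their (f, i) positions; None marks a lost block."""
    grid = {
        (f, i): None
        for f in range(1, num_bands + 1)
        for i in range(1, num_frames + 1)
    }
    for pkt in packets:
        for pos, code in pkt["blocks"]:
            grid[pos] = code
    lost = sorted(pos for pos, code in grid.items() if code is None)
    return grid, lost


received = [
    {"priority": 4, "blocks": [((1, 2), b"P12"), ((1, 3), b"P13")]},
    # the priority-3 packet is assumed to have been dropped in the network
]
grid, lost = rebuild_grid(received, num_bands=3, num_frames=4)
```

The `lost` list plays the role of the block erasure information that the disassembling unit hands to the compensation stage.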
[0024]
Modified example
When the signal is divided into band blocks as shown in FIG. 12, x4[1, i] may be added as a further explanatory variable. That is, as shown by the broken line in FIG. 14, the absolute power x1[f, i] of this band and the absolute powers of the other bands are input, the ratio of the absolute power of this band to the total power is calculated by the following equation (9), and the result is used as the explanatory variable x4[f, i].
$$x_4[f, i] = x_1[f, i] / \sum_{f=1}^{F} x_1[f, i] \qquad (9)$$
In the example of FIG. 14, F = 2, so from x1[1, i] of the low band and x1[2, i] of the high band,
$$x_4[1, i] = x_1[1, i] / (x_1[1, i] + x_1[2, i])$$
is calculated.
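Equation (9) amounts to normalizing the band powers of one frame so that they sum to 1. A minimal Python sketch, with a hypothetical function name and an assumed convention of returning zeros for an all-silent frame (the text does not address that case):

```python
def band_power_ratio(band_powers):
    """x4 for every band of one frame: each band's share of the total power."""
    total = sum(band_powers)
    if total == 0:
        # Assumed convention for a silent frame; not specified in the text.
        return [0.0 for _ in band_powers]
    return [p / total for p in band_powers]


# F = 2 as in the FIG. 14 example: low-band and high-band absolute power.
x4_low, x4_high = band_power_ratio([3.0, 1.0])
```

With these sample powers the low band holds three quarters of the frame's power, so its x4 is 0.75.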
The index value calculation unit 144_L linearly combines the explanatory variables x1[1, i], x2[1, i], x3[1, i], and x4[1, i], calculates the index value y[1, i] according to the following equations, and then quantizes it to output the priority p[1, i].
$$y[1, i] = \alpha_0 + \sum_{j=1}^{4} \alpha_j \hat{x}_j[1, i]$$
$$\hat{x}_j[1, i] = (x_j[1, i] - x'_j[1]) / \gamma_j[1]$$
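The normalization, linear combination, and quantization can be sketched as follows. The coefficient values, the offsets standing in for x'_j, the scales standing in for γ_j, the quantization thresholds, and the convention that a larger index value maps to a larger priority number are all assumptions chosen for illustration.

```python
import bisect


def index_value(x, offsets, scales, alpha0, alphas):
    """y = alpha0 + sum_j alpha_j * (x_j - offset_j) / scale_j."""
    normalized = [(xj - oj) / sj for xj, oj, sj in zip(x, offsets, scales)]
    return alpha0 + sum(a * n for a, n in zip(alphas, normalized))


def quantize(y, thresholds):
    """Map an index value to a priority in 1..len(thresholds)+1."""
    return 1 + bisect.bisect_right(thresholds, y)


x = [2.0, 4.0, 1.0, 0.5]            # x1..x4 of one block (sample values)
offsets = [1.0, 2.0, 1.0, 0.5]      # assumed x'_j
scales = [1.0, 2.0, 1.0, 0.5]       # assumed gamma_j
y = index_value(x, offsets, scales, alpha0=0.5, alphas=[1.0, 2.0, 3.0, 4.0])
p = quantize(y, thresholds=[1.0, 2.0, 3.0])   # four priority levels
```

Normalizing first keeps features with very different ranges, such as a raw power and a ratio, from dominating the sum purely because of their scale.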
Block division may be performed based on quality. In this case, the audio block (packet) divided image shows the relationship between the quality q and the frame in parentheses in FIG. Further, FIG. 9 shows a functional configuration when a speech signal is encoded in a general fixed processing time unit with a Q = 2 stage configuration.
[0025]
The audio signal s[n] is divided into frames by the frame dividing unit 12, encoded frame by frame by the first-stage encoding unit 13_1, and its priority p[1, i] is determined by the first-stage priority determination unit 14_1. The code P[1, i] from the first-stage encoding unit 13_1 is decoded by the first-stage decoding unit 17_1, and the decoded signal is subtracted from the audio signal by the subtractor 18_1 to generate the first-stage residual signal (encoding error signal) e1[n]. This residual signal is encoded frame by frame by the second-stage encoding unit 13_2, and its priority p[2, i] is determined by the second-stage priority determination unit 14_2. The code P[2, i] from the second-stage encoding unit 13_2 is decoded by the second-stage decoding unit 17_2, and the decoded signal is subtracted from the first-stage residual signal e1[n] by the subtractor 18_2 to generate the second-stage residual signal e2[n].
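The two-stage structure can be sketched with a stand-in codec. The text does not specify the encoders, so a uniform quantizer is used here purely as an assumption: "encoding then decoding" is modeled as rounding each sample to a multiple of a step size, with a finer step in the second stage.

```python
def code_and_decode(frame, step):
    """Stand-in for an encoder/decoder pair: round samples to multiples of step."""
    return [step * round(s / step) for s in frame]


def two_stage(frame, step1, step2):
    """Return decoded stage signals and residuals e1, e2 for one frame."""
    d1 = code_and_decode(frame, step1)
    e1 = [s - d for s, d in zip(frame, d1)]     # first-stage residual
    d2 = code_and_decode(e1, step2)
    e2 = [e - d for e, d in zip(e1, d2)]        # second-stage residual
    return d1, e1, d2, e2


frame = [0.875, -0.375, 0.25]
d1, e1, d2, e2 = two_stage(frame, step1=0.5, step2=0.125)
```

The second stage only has to describe what the first stage missed, which is why a receiver that gets only the first-stage code can still reproduce a coarser version of the signal.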
A specific example of the first-stage priority determination unit 14_1 is shown in FIG. 10. As in the priority determination unit 14_L shown in FIG. 14, the explanatory variable x1[1, i] (absolute power), the explanatory variable x2[1, i] (power ratio to the previous frame), and the explanatory variable x3[1, i] (maximum value of the autocorrelation function) are generated by the explanatory variable generation units 141_1, 142_1, and 143_1, respectively. In addition, a further explanatory variable generation unit generates, as the explanatory variable x5[1, i], the quality of the code P[1, i], for example the ratio of noise to signal. That is, the signal power calculation unit 147a computes $S = \sum_{n=1}^{N} s[Ni+n]^2$, the noise calculation unit 147b computes $E = \sum_{n=1}^{N} e_1[Ni+n]^2$, and the logarithmic division unit 147c calculates the logarithm of their ratio, $\log_{10}(E/S)$, which is output as the explanatory variable x5[1, i].
[0026]
  These four explanatory variables are linearly combined by the index value calculation unit 144_1 to calculate the index value y[1, i]. For example, as in the previous case, the normalization units 144aj (j = 1, ..., 4) normalize the explanatory variables xj[1, i], and the normalized values are linearly combined as $y[1, i] = \alpha_0 + \sum_{j=1}^{4} \alpha_j \hat{x}_j[1, i]$, where $\hat{x}_j[1, i] = (x_j[1, i] - x'_j[1]) / \gamma_j[1]$. The index value y[1, i] is quantized by the quantization unit 145_1, and the first-stage priority p[1, i] is output.
  The second-stage priority p[2, i] is obtained in the same manner. In this case, as shown in parentheses in FIG. 10, the second-stage residual signal e2[n] is input in place of the first-stage residual signal e1[n], the signals are processed in the same way, and the second-stage priority p[2, i] is output.
[0027]
  The packet sending unit 15 (FIG. 4) outputs the first-stage code P[1, i] together with the priority p[1, i] as one packet, and the second-stage code P[2, i] together with the priority p[2, i] as another packet.
  The explanatory variable x5[q, i] (q = 1, 2, ..., Q) can be regarded as a feature quantity based on the encoding. The general formula for calculating it is as follows.
    $$x_5[q, i] = \log_{10}\left( \sum_{n=1}^{N} e_q[Ni+n]^2 / \sum_{n=1}^{N} s[Ni+n]^2 \right)$$
  In this case, the linear combination coefficient α5 can be about −0.1. Packets with a large q are needed to reproduce a high-quality signal. However, when traffic is congested, the semantic content of the transmitted information matters more than its quality. For a packet with a large q, x5[q, i] becomes small, and since α5 is also relatively small, this variable contributes little to the priority.
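The feature x5 is the base-10 logarithm of the residual energy divided by the signal energy over one frame. A minimal sketch, assuming 0-based Python lists that each hold exactly one frame of samples; the function name and the sample values are hypothetical, and a frame with zero energy in either signal is not handled.

```python
import math


def log_error_to_signal(error, signal):
    """x5: log10 of residual energy over input-signal energy for one frame."""
    e = sum(v * v for v in error)
    s = sum(v * v for v in signal)
    return math.log10(e / s)


signal = [1.0, 1.0, 1.0, 1.0]
e1 = [0.5, 0.5, 0.5, 0.5]        # first-stage residual (sample values)
e2 = [0.25, 0.25, 0.25, 0.25]    # second-stage residual, smaller by construction
x5_stage1 = log_error_to_signal(e1, signal)
x5_stage2 = log_error_to_signal(e2, signal)
alpha5 = -0.1                    # order of magnitude suggested in the text
contribution = alpha5 * x5_stage2
```

Since each further stage shrinks the residual, x5 decreases with q, in line with the remark above that x5[q, i] becomes small for large q.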
[0028]
In the case of a general scalable multi-band encoder, the explanatory variables x1[f, i], x2[f, i], x3[f, i], x4[f, i], and x5[q, i] are obtained and used to calculate the index value y[f, q, i]. FIG. 11 shows how the audio is divided into blocks (packets) in this case.
That is, FIG. 11 illustrates so-called scalable coding, which represents audio signals of various qualities as combinations of various sampling frequencies and various sample quantization accuracies (numbers of amplitude bits). In the figure, both the sampling frequency and the quantization accuracy (quality) are divided into three stages: the frequency band is divided into three bands f = 1, f = 2, and f = 3, and the amplitude bit length is divided into three regions q = 1, q = 2, and q = 3. Each signal block (packet) is denoted [f, q, i] in the three-dimensional space spanned by the mutually orthogonal frequency axis (band number), quality axis (amplitude bit division number), and time axis (frame number).
Each explanatory variable in this case is obtained by the following equations, where the audio signal of band f and quality (bit division number) q is denoted s_{fq}.
$$x_1[f, q, i] = \frac{1}{N} \sum_{n=1}^{N} s_{fq}[Ni+n]^2$$
or
$$x_1[f, q, i] = \log_{10}\left( \frac{1}{N} \sum_{n=1}^{N} s_{fq}[Ni+n]^2 \right)$$
$$x_2[f, q, i] = x_1[f, q, i] / x_1[f, q, i-1]$$
$$x_3[f, q, i] = \max_k \rho_{f,q,i}[k]$$
$$\rho_{f,q,i}[k] = \sum_{n=0}^{N} s_{fq}[Ni+n]\, s_{fq}[Ni+n+k] / \sum_{n=0}^{N} s_{fq}[Ni+n]^2$$
$$x_4[f, q, i] = x_1[f, q, i] / \sum_{f=1}^{F} x_1[f, q, i]$$
$$x_5[f, q, i] = \log_{10}\left( \sum_{n=1}^{N} e_{fq}[Ni+n]^2 / \sum_{n=1}^{N} s_{fq}[Ni+n]^2 \right)$$
Index value: $$y[f, q, i] = \alpha_0 + \sum_{j=1}^{5} \alpha_j x_j[f, q, i]$$
Priority: $$p[f, q, i] = f_{f,q}(y[f, q, i])$$
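The signal-based features x1, x2, and x3 can be sketched for a single band and quality as follows. The sketch uses 0-based indexing and sums over exactly `length` samples, which differs slightly from the summation limits printed above; the set of candidate lags k is not stated in the text and is an assumption, as are the function names.

```python
def mean_power(frame):
    """x1: mean squared amplitude of one block."""
    return sum(v * v for v in frame) / len(frame)


def power_ratio(current, previous):
    """x2: power of this frame relative to the previous frame."""
    return mean_power(current) / mean_power(previous)


def max_autocorrelation(signal, start, length, lags):
    """x3: largest normalized autocorrelation over the candidate lags."""
    energy = sum(signal[start + n] ** 2 for n in range(length))
    return max(
        sum(signal[start + n] * signal[start + n + k] for n in range(length)) / energy
        for k in lags
    )


signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # period-2 test signal
x1 = mean_power(signal[4:8])
x2 = power_ratio(signal[4:8], signal[0:4])
x3 = max_autocorrelation(signal, start=0, length=4, lags=[1, 2])
```

For this strictly periodic test signal the autocorrelation reaches its maximum at lag 2, the period, which is the kind of regularity x3 is meant to capture in voiced speech.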
[0029]
The priority p [f, q, i] determined in this way and the corresponding encoded code P [f, q, i] are taken as one set. Also in this case, block codes having the same priority are bundled together with the position information and transmitted as one packet every predetermined number of frames.
Although the present invention has been described above as applied to audio signals, it can also be applied to music signals and video signals. In addition, the following may be considered as explanatory variables for feature amounts based on encoding. For example, with some speech coders that use predictive coding, discarding a packet at, for example, the beginning of a word may significantly degrade the subsequent speech quality (S/N ratio). The S/N degradation that propagates from such a discard may also be used as an explanatory variable xj(m, i). The explanatory variables for feature amounts of the audio signal and for feature amounts based on the encoding are not limited to the examples described above, and various others can be used.
The terminals, ingress devices, and egress devices described above can each be implemented by a computer. In that case, a program for causing the computer to function as the device may be installed on the computer from a recording medium such as a CD-ROM or a magnetic disk, or downloaded via a communication line, and then executed.
[0030]
[Effect of the invention]
According to the present invention, packets are transmitted with priorities attached, so when the network is not congested, transmission is performed at the highest quality, as in normal communication. When congestion occurs in the network, low-priority packets are discarded, but quality degradation is kept to a minimum. The network as a whole can thus be used efficiently. With this mechanism, in narrowband voice communication the payload is smaller than the header, so header overhead must be taken into account to some extent; aggregation and distribution on the network side suppress the increase in traffic caused by this overhead. In broadband music communication and video communication, the packet payload is larger than the header, so the network can be used effectively without any aggregation or distribution mechanism.
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of a network to which the present invention is applied.
FIG. 2 is a diagram showing a functional configuration example of the ingress device 110 in FIG. 1.
FIG. 3 is a diagram showing an example of aggregate packets built from a plurality of received packets in the ingress device 110 in FIG. 1.
FIG. 4 is a diagram illustrating a functional configuration example of a packet transmission device in a terminal.
FIG. 5 is a diagram showing an example of aggregation in the packet aggregating unit 19 in FIG. 4.
FIG. 6 is a diagram showing a functional configuration example of the egress device 120 in FIG. 1.
FIG. 7 is a diagram showing an example of aggregate packets and of decomposed and reconstructed packets in the egress device.
FIG. 8 is a diagram illustrating a functional configuration example of a packet reception device in a terminal.
FIG. 9 is a block diagram showing another functional configuration example of the packet transmission device.
FIG. 10 is a block diagram showing a specific example of the functional configuration of the first-stage priority determination unit 14_1 in FIG. 9.
FIG. 11 is a diagram illustrating an example of dividing a signal into three-dimensional coordinates of quality-bandwidth-time.
FIG. 12 is a block diagram showing a functional configuration example of a conventional packet transmission device.
FIG. 13 is a diagram showing an example of dividing a signal into two-dimensional coordinates of band-time.
FIG. 14 is a block diagram showing a specific example of the functional configuration of the low-frequency priority determination unit 14_L in FIG. 12.

Claims

A frame dividing unit for dividing the digital signal into frames,
A band dividing unit that divides the digital signal for each frame into a plurality of bands to generate blocks for each band;
For each block, an encoding unit that generates a code obtained by encoding a signal of the block;
For each block, an explanatory variable generation unit that obtains a plurality of feature amounts including a ratio of the absolute power of the block to the total absolute power of all bands as one feature amount;
An index value calculation unit that obtains a linear combination of the plurality of feature quantities as an index value related to the priority of the block for each block;
A priority determination unit that quantizes the index value and obtains a priority for each block;
A packet sending unit for generating a packet in which codes of the same priority are collected for each predetermined number of frames;
A packet transmission device comprising the above elements.

A frame dividing unit for dividing the digital signal into frames,
Means for dividing the digital signal for each frame into blocks based on a plurality of qualities;
For each block, an encoding unit that generates a code obtained by encoding a signal of the block;
An explanatory variable generator for obtaining a plurality of feature quantities including a logarithmic value of the ratio between the input signal of the block and the encoding error signal as one feature quantity for each block;
For each block, an index value calculation unit for obtaining a linear combination of the plurality of feature quantities as an index value related to the priority of the block;
A priority determination unit that quantizes the index value and obtains a priority for each block;
A packet sending unit for generating a packet in which codes of the same priority are collected for each predetermined number of frames;
A packet transmission device comprising the above elements.

A frame dividing unit for dividing the digital signal into frames,
A band dividing unit for dividing the digital signal of the frame into blocks of a plurality of bands;
Means for dividing the signal of the block into blocks based on a plurality of qualities for each block of the band;
For each block of the quality of each band, an encoding unit that generates a code obtained by encoding the signal of the block;
For each quality block of each band, an explanatory variable generation unit that obtains a plurality of feature quantities including, as individual feature quantities, the ratio of the absolute power of the block to the sum of the absolute powers of all bands and the logarithm of the ratio between the input signal of the block and the encoding error signal;
An index value calculation unit for obtaining a linear combination of the plurality of feature quantities as an index value relating to the priority of the block for each block of the quality of each band;
For each block of the quality of each band, a priority determination unit that quantizes the index value to obtain a priority;
A packet sending unit that generates a packet in which codes of the same priority are collected for each block of the quality of each band;
A packet transmission device comprising the above elements.

A method for calculating an index value related to a priority determined by a packet transmission device,
Dividing the digital signal into frames,
Dividing the digital signal for each frame into a plurality of bands to generate a block;
For each of the blocks, obtaining a plurality of feature amounts including a ratio of the absolute power of the block to the sum of absolute powers of all bands as one feature amount;
Obtaining, for each block, a linear combination of the plurality of feature quantities as an index value relating to the priority of the block;
A method of calculating an index value related to priority, comprising the above steps.

A method for calculating an index value related to a priority determined by a packet transmission device,
Dividing the digital signal into frames,
Generating a block obtained by dividing the digital signal for each frame based on a plurality of qualities;
Obtaining a plurality of feature quantities including a logarithmic value of a ratio between the input signal of the block and the encoding error signal as one feature quantity for each block;
Obtaining, for each block, a linear combination of the plurality of feature quantities as an index value relating to the priority of the block;
A method of calculating an index value related to priority, comprising the above steps.

A method for calculating an index value related to a priority determined by a packet transmission device,
Dividing the digital signal into frames,
Generating a block obtained by dividing the digital signal for each frame based on a plurality of bands and a plurality of qualities;
Obtaining, for each block, a plurality of feature quantities including, as individual feature quantities, the ratio of the absolute power of the block to the sum of the absolute powers of all bands and the logarithm of the ratio between the input signal of the block and the encoding error signal;
Obtaining, for each block, a linear combination of the plurality of feature quantities as an index value relating to the priority of the block;
A method of calculating an index value related to priority, comprising the above steps.

A program for causing a computer to execute each step of the method for calculating an index value related to priority according to any one of claims 4 to 6.