JP4102223B2

JP4102223B2 - Data processing apparatus and data processing method

Info

Publication number: JP4102223B2
Application number: JP2003062713A
Authority: JP
Inventors: 孝雄山口; 稔栄藤; 博荒川
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-03-17
Filing date: 2003-03-10
Publication date: 2008-06-18
Anticipated expiration: 2018-03-16
Also published as: JP2004007461A

Description

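[0001]
[Technical Field of the Invention]
The present invention relates to a data processing apparatus and a data processing method in the fields of communication and broadcasting.
[0002]
[Prior Art]
Systems have been proposed that aim at video communication with a strong sense of presence. Such a system extracts, for example, an image of a person from the scene of the space where the user is. It then superimposes that image, a person image sent from the other party, and a pre-stored image of a virtual space that is displayed in common with the other party. The result gives the user the sense that the other party is actually present in front of them (Japanese Examined Patent Publication No. H4-24914).
[0003]
In particular, prior-art inventions have addressed methods of speeding up image composition and reducing the memory it requires (for example, Japanese Examined Patent Publication No. H5-46592: image composition apparatus).
[0004]
[Problems to Be Solved by the Invention]
This prior art proposed communication systems that compose two-dimensional still images and three-dimensional CG data. However, it did not concretely discuss, from the viewpoints listed below, how to realize a system that simultaneously composes and displays multiple moving images and audio streams.
[0005]
Specifically, no concrete discussion had been made from the following viewpoints:
(A1) transmission (communication and broadcasting) of images and audio, and its control, in an environment where data and control information are transmitted independently over multiple logical transmission paths built in software on one or more physical transmission paths. Here, control information is information for controlling processing on the terminal side, transmitted in packets separate from the data;
(A2) a method of dynamically changing the header information added to the image or audio data to be transmitted (corresponding to the data management information of the present invention);
(A3) a method of dynamically changing the header information added for transmission (corresponding to the transmission management information of the present invention);
(A4) a method of dynamically multiplexing and demultiplexing multiple logical transmission paths to transmit information;
(A5) a method of transmitting images and audio that takes into account the time needed to load and start programs and data; and
(A6) a method of transmitting images and audio that takes zapping into account.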
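[0006]
Two kinds of scheme have been proposed for dynamically adjusting the amount of data sent into the network. One changes the encoding method. The other discards data frame by frame according to the video frame type (秦泉寺 (Jinzenji) 浩史 and 田尻哲男, "A Study of a Distributed Adaptive VOD System," D-81, IEICE Systems Society Conference (1995)).
[0007]
For adjusting the processing load on the encoder side, a dynamic computation-scalable algorithm has been proposed. It delivers high-quality video under processing-time constraints (大迫史典, 矢島由幸, 小寺博, 渡辺裕, 島村和典, "Software Image Coding Using a Dynamic Computation-Scalable Algorithm," IEICE Transactions D-II, Vol. 80-D-2, No. 2, pp. 444-458 (1997)).
[0008]
The MPEG-1/MPEG-2 systems are an example of synchronized playback of video and audio.
[0009]
This prior art has the following problems.
(B1) In the conventional scheme, video is discarded according to its frame type, and the information it can handle is limited to a single stream. This makes it difficult to handle multiple video or audio streams. It is also difficult to reflect the editor's intent by preferentially playing important scene cuts in synchronization with the audio.
(B2) MPEG-1/MPEG-2 assume a hardware implementation, so the decoder is assumed to be able to decode the entire bit stream it is given. As a result, the behavior when the decoder's processing capacity is exceeded is left undefined.
[0010]
Conventional moving-picture transmission has used schemes such as H.261 (ITU-T Recommendation H.261, "Video codec for audiovisual services at p x 64 kbit/s"), which have so far been implemented in hardware. The hardware is designed with the upper limit of the required performance in mind, so decoding was never left unfinished at the end of the specified time.
[0011]
The specified time is the time needed to transmit the bit stream for one encoded image. If decoding takes longer, the excess becomes delay. As that delay accumulates, the end-to-end delay from sender to receiver grows until the system can no longer be used as a videophone. This situation must be avoided.
[0012]
In addition, if the other party generates a non-conforming bit stream, decoding may not finish within the specified time, and the moving picture then cannot be transmitted.
[0013]
These problems arise for audio data as well as for moving pictures.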
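[0014]
In recent years, however, the spread of the Internet and ISDN has provided a network environment for personal computers (PCs). Transmission speeds have increased, and transmitting moving pictures between PCs over networks has become possible. User demand for moving-picture transmission has grown markedly. Improved CPU performance has also made software decoding of moving pictures entirely feasible.
[0015]
The same software can run on PCs with different configurations, such as a different CPU, bus width, or the presence or absence of an accelerator. This makes it difficult to account for the upper limit of required performance in advance, and some images cannot be decoded within the specified time.
[0016]
In addition, if encoded moving-picture data exceeding the processing capacity of the receiving apparatus is transmitted, it cannot be decoded within the specified time.
[0017]
Problem (C1): decode images within the specified time and keep delay small.
[0018]
Solutions to problem C1 can leave further problems. For example, when moving pictures are input as waveform data, part of the transmitted bit stream goes unused, so the transmission path is used inefficiently. In addition, some coding schemes build the current decoded image from the previous one (for example, P-pictures). If the previous decoded image is not completely reconstructed, the resulting degradation in image quality propagates and grows over time.
[0019]
Problem (C2): the transmission path is used inefficiently, and image-quality degradation propagates.
[0020]
In a software implementation, the frame rate is set by the time one encoding pass takes. A user-specified frame rate that exceeds the computer's processing limit therefore could not be honored.
[0021]
Problem (C3): a user-specified frame rate that exceeds the computer's processing limit cannot be honored.
[0022]
An object of the present invention, in view of problems (B1) and (B2) of the second prior art described above, is to provide a data processing apparatus and a data processing method that solve at least one of those problems.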
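[0023]
[Means for Solving the Problems]
The present invention according to claim 1 is a data processing apparatus comprising receiving means and data processing means.
The receiving means receives a data sequence that includes:
(1) time-series data of audio or moving pictures;
(2) an inter-time-series priority indicating the processing priority among the time-series data; and
(3) a frame type indicating at least whether each frame constituting the moving-picture time-series data is an I frame or a P frame, together with an intra-time-series priority that indicates the processing priority of the frame and is distinct from the frame type.
The data processing means allocates processing capacity to each item of time-series data according to the inter-time-series priority. For each item of time-series data, it then processes the segmented data within that item while adaptively changing a threshold of the intra-time-series priority so that processing fits within the allocated capacity.
[0024]
The present invention according to claim 2 is a data processing method. It takes as input a data sequence that includes (1) time-series data of audio or moving pictures, (2) an inter-time-series priority indicating the processing priority among the time-series data, and (3) a frame type indicating at least whether each frame constituting the moving-picture time-series data is an I frame or a P frame, together with an intra-time-series priority that indicates the processing priority of the frame and is distinct from the frame type. The method allocates processing capacity to each item of time-series data according to the inter-time-series priority. For each item, it processes the segmented data within that item while adaptively changing a threshold of the intra-time-series priority so that processing fits within the allocated capacity.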
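The two-level processing in claims 1 and 2 can be sketched in a few lines of Python. The sketch assumes the following, none of which the claims specify:
- processing capacity is a single number, for example decode time per second;
- each stream's share is proportional to 1 / (1 + stream priority);
- each frame has a known processing cost.
Within each stream, the sketch loosens the intra-time-series threshold for as long as the selected frames still fit the allocated share.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    frame_type: str   # "I" or "P"; carried separately from the priority
    priority: int     # intra-time-series priority, smaller value = more important
    cost: float       # assumed processing cost of this frame


@dataclass
class Stream:
    name: str
    stream_priority: int  # inter-time-series priority, smaller value = more important
    frames: list = field(default_factory=list)


def allocate_capacity(streams, total_capacity):
    """Split total_capacity among streams by inter-time-series priority.

    The 1 / (1 + priority) weighting is an illustrative assumption.
    """
    weights = {s.name: 1.0 / (1 + s.stream_priority) for s in streams}
    total_weight = sum(weights.values())
    return {name: total_capacity * w / total_weight for name, w in weights.items()}


def select_frames(stream, budget):
    """Return the loosest intra-time-series threshold whose frames fit in budget."""
    chosen_threshold, selected = None, []
    for threshold in sorted({f.priority for f in stream.frames}):
        candidate = [f for f in stream.frames if f.priority <= threshold]
        if sum(f.cost for f in candidate) > budget:
            break  # loosening the threshold further would exceed the share
        chosen_threshold, selected = threshold, candidate
    return chosen_threshold, selected
```

Take a budget of 9 units, an audio stream with stream priority 0, and a video stream with stream priority 1. The audio stream receives 6 units and the video stream 3. Only the priority-0 I frame of the video then fits, so the video threshold settles at 0.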
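[0025]
[Description of Embodiments]
Embodiments of the present invention are described below with reference to the drawings.
[0026]
The embodiments described here mainly solve one of problems (A1) to (A6) described above.
[0027]
"Image" as used in the present invention includes both still and moving images. The target image may be a two-dimensional image such as computer graphics (CG), or three-dimensional image data built from wireframe models.
[0028]
FIG. 1 is a schematic configuration diagram of an image/audio transmitting and receiving apparatus according to an embodiment of the present invention.
[0029]
In the figure, the reception management unit 11 receives information and the transmission unit 13 transmits it. Both are means of carrying information, such as coaxial cable, CATV, a LAN, or a modem. The communication environment may let multiple logical transmission paths be used without regard to the multiplexing means, as on the Internet. It may instead require the multiplexing means to be taken into account, as with analog telephone lines or satellite broadcasting.
[0030]
Terminals may be connected bidirectionally, exchanging video and audio as in videophones and videoconferencing systems. They may also be connected in broadcast form, delivering video and audio over satellite broadcasting, CATV, or the Internet. The present invention takes these connection forms into account.
[0031]
The demultiplexing unit 12 in FIG. 1 analyzes received information and separates data from control information. Specifically, it separates the data from the transmission header added to it for sending, and separates the data-control header added to the data itself from the data content. The image decompression unit 14 decompresses received images. These may be compressed in a standardized moving-picture or still-picture format such as H.261, H.263, MPEG-1/2, or JPEG, or in some other format.
[0032]
The image decompression management unit 15 in FIG. 1 monitors the decompression state of images. For example, when the receive buffer is about to overflow, the unit can stop decompressing and drain the buffer without decoding it. It then resumes decompression once decompression is possible again.
[0033]
The image composition unit 16 in the same figure composes decompressed images. The composition method is defined by a script in a language such as JAVA, VRML, or MHEG. The script describes:
- the images and their structural information, namely display position and display time (which may include display duration);
- how the images are grouped;
- the display layer (depth) of each image;
- the object ID (the SSRC described later); and
- the relationships among these attributes.
Scripts describing the composition method are read from and written to the network or a local storage device.
[0034]
The output unit 17 is a display, printer, or similar device that outputs the composition result. The terminal control unit 18 controls each of these units. The configuration may decompress audio instead of images. In that case, the image decompression unit becomes an audio decompression unit, the image decompression management unit becomes an audio decompression management unit, and the image composition unit becomes an audio composition unit. The configuration may also decompress both images and audio, then compose and display them while keeping them synchronized in time.
[0035]
The apparatus can also transmit images and audio if it is given four further units: an image compression unit, an image compression management unit that manages it, an audio compression unit, and an audio compression management unit that manages it.
[0036]
FIG. 2 shows the reception management unit 11 and the demultiplexing unit 12.
[0037]
The reception management unit 11 of FIG. 1 comprises a data receiving unit 101 and a control information receiving unit 102. Unit 101 receives data, and unit 102 receives the control information used to control that data. The demultiplexing unit 12 comprises a transmission format storage unit 103 and a transmission information interpretation unit 104. Unit 103 stores the transmission structure used to interpret transmitted content (described in detail later). Unit 104 interprets transmitted content according to that stored structure. With this configuration, data and control information can be received independently. This makes it easy, for example, to delete or move received images and audio while reception is still in progress.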
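[0038]
As noted above, the reception management unit 11 may work in two kinds of communication environment. In the first (the Internet profile), multiple logical transmission paths can be used without regard to the multiplexing means, as on the Internet. In the second (the Raw profile), the multiplexing means must be taken into account, as with analog telephone lines or satellite broadcasting. In both cases, the user is assumed to see an environment that offers multiple logical transmission paths (logical channels). In a TCP/IP environment, these are commonly called "communication ports."
[0039]
As shown in FIG. 2, the reception management unit 11 is assumed to receive information over one or more kinds of data transmission path and one or more kinds of logical path for controlling the transmitted data. There may be several data paths and only one data-control path. Alternatively, as with RTP/RTCP, which H.323 also uses, each data transmission may have its own data-control path. For broadcasting over UDP, a single communication port (multicast address) may be used.
[0040]
FIG. 3 illustrates how images and audio are transmitted and controlled over multiple logical transmission paths. The transmitted data itself is called an ES (elementary stream). For images, an ES may be one frame of image information, or image information in units smaller than a frame, such as GOBs or macroblocks.
[0041]
For audio, an ES may be a fixed length chosen by the user. The data-control header added to the transmitted data is called AL (adaptation layer) information. AL information may include:
- whether the position is a start point from which the data can be processed;
- the playback time of the data; and
- the processing priority of the data.
The data management information of the present invention corresponds to the AL information. The ES and AL used in the present invention need not match their definitions in MPEG-1/2.
[0042]
Two kinds of information can indicate whether a position is a start point from which the data can be processed. The first is a random access flag. It indicates that the data can be read and reproduced on its own, independently of the data before and after it, as with an intra frame (I-picture) for images. The second is an access flag, which simply indicates that the data can be read from this point. For images, for example, it marks the start of a GOB or macroblock unit, so absence of the access flag means the position lies in the middle of the data. Both flags are not necessarily required to mark a processable start point.
[0043]
In real-time communication such as videoconferencing, omitting both flags sometimes causes no problem. To make editing easy, however, the random access flag is necessary. Whether flags are needed, and if so which ones, may be decided over the communication channel before data transfer.
[0044]
The playback-time information is used to synchronize images and audio during playback. In MPEG-1/2 it is called the PTS (presentation time stamp). Real-time communication such as videoconferencing usually does not consider time synchronization, so playback-time information is not always required. What is needed instead may be the time interval between encoded frames.
[0045]
Letting the receiving side adjust this interval prevents large fluctuations in the frame interval, but adjusting the playback interval may add delay. In some cases, therefore, even time information giving the encoding frame interval can be judged unnecessary.
[0046]
Before data transfer, the sender may decide over the communication channel what the playback-time information means. It may mean the PTS, it may mean the frame interval, or the playback time may not be added to the data at all. The sender then notifies the receiving terminal of this decision and transmits the data together with the data management information so decided.
[0047]
The processing priority helps when the load on the receiving terminal or the network prevents data from being processed or transmitted. That load can then be reduced by aborting the processing of the data or cancelling its transmission.
[0048]
This can be done in the image decompression management unit 15 at the receiving terminal, and by relay terminals, routers, and similar devices in the network. The priority may be expressed as a numerical value or as a flag. An offset to the processing priority may also be transmitted, either as control information or as data management information (AL information) together with the data. When the load on the receiving terminal or the network changes suddenly, the offset is added to the priorities assigned in advance to images and audio. This lets the priorities adapt dynamically to the operating state of the system.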
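[0049]
Information identifying whether scrambling is applied, whether copyright applies, and whether the data is an original or a copy may also be sent separately from the data, as control information together with the data identifier (SSRC). This makes descrambling and similar operations at relay nodes easy.
[0050]
The processing priority may be added per stream, where a stream is a set of video or audio frames, or per video or audio frame.
[0051]
The transmitting terminal apparatus includes priority adding means. For information encoded with a coding method such as H.263 or G.723, this means determines the processing priority under overload according to predetermined criteria and associates that priority with the encoded information (see FIG. 54).
[0052]
FIG. 54 illustrates the priority adding means 5201, which adds priorities to video and audio.
[0053]
As shown in the figure, priorities are added according to predetermined rules to each item of encoded video and audio data. That data is produced by the video encoding means 5202 and the audio encoding means 5203, respectively. The rules are stored in the priority addition rules 5204. Examples are that I frames (intra-coded video frames) receive a higher priority than P frames (inter-coded video frames), and that video receives a lower priority than audio. These rules may also be changed dynamically at the user's instruction.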
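The rule-based tagging performed by the priority adding means 5201 can be sketched as a lookup table. The numeric values below are an illustrative assumption that follows the two example rules above: audio ranks above video, and I frames rank above P frames, with 0 as the most important value. The rules can be changed dynamically by passing a different table.

```python
# Hypothetical rule table modeled on the example rules stored in the
# priority addition rules 5204 (0 = most important).
DEFAULT_RULES = {
    ("audio", None): 0,
    ("video", "I"): 1,
    ("video", "P"): 2,
}


def add_priorities(packets, rules=DEFAULT_RULES):
    """Return copies of the encoded packets with a 'priority' key added."""
    tagged = []
    for packet in packets:
        key = (packet["media"], packet.get("frame_type"))  # audio has no frame type
        tagged.append({**packet, "priority": rules[key]})
    return tagged
```

A user instruction to treat I frames as highly as audio, for example, amounts to calling `add_priorities(packets, {**DEFAULT_RULES, ("video", "I"): 0})`.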
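[0054]
Priorities are added to items such as scene changes and image frames or streams designated by the editor or user (for images), and voiced and silent intervals (for audio).
[0055]
Per-frame priorities for image or audio frames define the processing priority under overload. They can be added in two ways: in the communication header, or embedded at encoding time in the header of the encoded video or audio bit stream. The first way lets priority information be read without decoding. The second lets the bit stream be handled on its own, independently of the system.
[0056]
Suppose priority information is carried in the communication header and one image frame is split into several transmission packets. The frame may be an intra-coded I frame or an inter-coded P or B frame. For images, the priority is then added only to the communication header of the packet carrying the head of an image frame that can be accessed as independent information. If the priority is the same throughout a frame, the receiver can treat it as unchanged until the head of the next accessible image frame appears.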
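The head-of-frame rule in [0056] can be sketched as follows. The header is modeled as a Python dict, and the function names and the `max_payload` parameter are hypothetical. The sender puts the priority only in the first packet of each frame, and the receiver carries that priority forward until the next frame head appears.

```python
def packetize(frame: bytes, priority: int, max_payload: int):
    """Split one frame into packets; only the first header carries the priority."""
    packets = []
    for offset in range(0, len(frame), max_payload):
        header = {"priority": priority} if offset == 0 else {}
        packets.append((header, frame[offset:offset + max_payload]))
    return packets


def recover_priorities(packets):
    """Receiver side: reuse the last announced priority until a new frame head."""
    current = None
    priorities = []
    for header, _payload in packets:
        if "priority" in header:
            current = header["priority"]
        priorities.append(current)
    return priorities
```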
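[0057]
The range of values a priority can take may be made variable to suit the application and configured through control information. Time information, by analogy, may be expressed in 16 bits or in 32 bits.
[0058]
The receiving terminal apparatus (the decoding side) includes priority determining means. It decides how to process each item of received encoded information according to that item's overload priority (see FIG. 55).
[0059]
FIG. 55 illustrates the priority determining means 5301, which interprets the priorities added to video and audio and decides whether to decode them.
[0060]
As shown in the figure, there are two kinds of priority: one added to each video or audio stream, and one added to each video or audio frame. They may be used independently, or the frame priority may be used in association with the stream priority. The priority determining means 5301 decides which streams and frames to decode according to these priorities.
[0061]
Decoding uses two kinds of priority that set the processing order when the terminal is overloaded (FIG. 30):
- the stream priority (Stream Priority; inter-time-series priority), which defines the relative priority among bit streams such as video and audio; and
- the frame priority (Frame Priority; intra-time-series priority), which defines the relative priority among decoding units, such as video frames, within the same stream.
[0062]
The stream priority makes it possible to handle multiple video and audio streams. The frame priority makes it possible to give different priorities even to intra-coded video frames (I frames), according to scene changes or the editor's intent.
[0063]
Processing time can be managed at the level of the operating system (OS) by associating the stream priority with the processing time or priority that the OS allots to image or audio encoding or decoding. For example, Microsoft Windows 95/NT supports five OS-level priorities. If the encoding and decoding means are implemented in software as threads, the OS-level priority of each thread can be derived from the stream priority of the stream it processes.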
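The stream-priority-to-thread mapping in [0063] can be sketched as a pure function. The five levels below use the values of the Win32 relative thread-priority constants THREAD_PRIORITY_HIGHEST (2), THREAD_PRIORITY_ABOVE_NORMAL (1), THREAD_PRIORITY_NORMAL (0), THREAD_PRIORITY_BELOW_NORMAL (-1), and THREAD_PRIORITY_LOWEST (-2). The linear mapping is an assumption, and the sketch only computes a level; it does not call any OS API.

```python
# Relative thread-priority values of the Win32 constants
# THREAD_PRIORITY_HIGHEST (2) ... THREAD_PRIORITY_LOWEST (-2).
OS_LEVELS = [2, 1, 0, -1, -2]  # index 0 = most important


def os_level_for_stream(stream_priority: int, max_stream_priority: int) -> int:
    """Map a stream priority (0 = most important) linearly onto five OS levels."""
    if max_stream_priority <= 0:
        return OS_LEVELS[0]
    clamped = min(max(stream_priority, 0), max_stream_priority)
    index = round(clamped * (len(OS_LEVELS) - 1) / max_stream_priority)
    return OS_LEVELS[index]
```

Keeping the mapping separate from any thread call lets the same function serve both an encoder thread and a decoder thread for a given stream.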
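[0064]
The frame priority and stream priority described here can also be applied to transmission media and data recording media. For example, define the priority of a transmitted packet as the access unit priority (AccessUnitPriority). The packet's transmission priority, or the processing priority a terminal uses under overload, can then be derived from a relation such as AccessUnitPriority = StreamPriority - FramePriority.
[0065]
Floppy disks, optical disks, and the like can serve as data recording media. The medium is not limited to these; any medium that can record a program, such as an IC card or a ROM cassette, works in the same way. Image and audio relay devices that forward data, such as routers and gateways, may also be covered.
[0066]
As a concrete use of priorities, priority determining means is placed in the image decompression management unit 15 and the audio decompression management unit. When the receiving terminal is overloaded, this means sets the priority threshold for the encoded information to be processed. It compares either the display time (PTS) or the decoding time (DTS) with the time elapsed since processing began, and changes the threshold according to the result. The I-frame insertion interval and the granularity of the priorities may also be used when changing the threshold.
[0067]
In the example of FIG. 25(a), captured QCIF or CIF images are encoded with an H.263 encoder. Along with the encoded information, the encoder outputs:
- the decoding time (DTS);
- a time stamp giving the image display time (PTS);
- priority information giving the processing order under overload (CGD, Computational Graceful Degradation);
- the frame type; and
- a sequence number (SN).
[0068]
In the example of FIG. 25(b), audio recorded through a microphone is encoded with a G.721 encoder. Along with the encoded information, the encoder outputs the decoding time (DTS), a time stamp giving the audio playback time (PTS), priority information (CGD), and a sequence number (SN).
[0069]
At decoding time, as shown in FIG. 26, images and audio go to separate buffers. For each, the DTS (decoding time) is compared with the time elapsed since the current processing began. If the DTS is not behind, the image or audio is passed to its decoder (H.263 or G.721).
[0070]
FIG. 27 shows how the encoder adds overload priorities. I frames (intra-coded image frames) receive the high priorities "0" and "1", where a larger number means a lower priority. P frames receive priority "2", lower than any I frame. Because I frames have two priority levels, a heavily loaded decoding terminal can, for example, play back only the I frames with priority "0". The I-frame insertion interval must be adjusted to match the way priorities are assigned.
[0071]
FIG. 28 shows how the receiving terminal decides priorities under overload. A frame whose priority value is greater than CutOffPriority is discarded. CutOffPriority is initialized so that every image frame is processed. The largest priority value used for image frames is known in advance, because the transmitting side announces it to the receiving side when the terminals connect (step 101).
[0072]
The DTS is compared with the time elapsed since the current processing began. If the elapsed time is larger, decoding is falling behind, so the threshold CutOffPriority for images and audio is lowered to thin out processing (step 102). If the elapsed time is smaller, decoding is keeping up, so CutOffPriority is raised to let more images and audio be processed (step 103).
[0073]
If the immediately preceding image frame was a P frame and was skipped, the current frame is not processed. Otherwise, the priority offset is added to the priority of the image frame (or audio frame) and the sum is compared with the threshold. If the sum does not exceed the threshold, the data is passed to the decoder (step 104).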
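Steps 101 to 104 of FIG. 28 can be sketched as a small controller. The sketch makes three assumptions:
- the DTS and the elapsed time are both measured in seconds from the start of processing;
- the threshold moves by one level per frame;
- the skip rule of step 104 means that a P frame whose predecessor was skipped is skipped too, because its reference picture is missing.
A literal reading of step 104 would also skip I frames that follow a skipped P frame, so the sketch treats the third point as an interpretation rather than the stated rule.

```python
from dataclasses import dataclass


@dataclass
class CodedFrame:
    dts: float        # decoding time stamp, seconds from start of processing
    priority: int     # CGD priority, smaller value = more important
    frame_type: str   # "I" or "P"


class CutOffController:
    def __init__(self, max_priority: int, offset: int = 0, step: int = 1):
        # Step 101: start at the largest priority value announced at
        # connection time, so that every frame is initially processed.
        self.max_priority = max_priority
        self.cutoff = max_priority
        self.offset = offset      # priority offset (e.g. per stream or per machine)
        self.step = step          # assumed threshold change per frame
        self.prev_skipped = False

    def _update_threshold(self, dts: float, elapsed: float) -> None:
        if elapsed > dts:
            # Step 102: decoding is behind schedule, so thin out processing.
            self.cutoff = max(0, self.cutoff - self.step)
        else:
            # Step 103: decoding is keeping up, so admit more frames.
            self.cutoff = min(self.max_priority, self.cutoff + self.step)

    def should_decode(self, frame: CodedFrame, elapsed: float) -> bool:
        self._update_threshold(frame.dts, elapsed)
        # Step 104, as read in this sketch: a P frame whose predecessor was
        # skipped lacks its reference picture, so it is skipped as well.
        if frame.frame_type == "P" and self.prev_skipped:
            decode = False
        else:
            decode = frame.priority + self.offset <= self.cutoff
        self.prev_skipped = not decode
        return decode
```

Consider a run in which decoding falls behind for two frames and then catches up. The threshold drops from 3 to 1, which discards a priority-2 P frame. The next P frame is skipped as well. The following I frame resets the dependency chain and is decoded.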
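[0074]
The priority offset can be used in several ways. The performance of the machine can be measured in advance and the resulting offset sent to the receiving terminal; the user may also set it at the receiving terminal. The offset can also change the per-stream priorities of multiple video and sound streams. For example, raising the offset for the rearmost background thins out its processing.
[0075]
When multiple streams are involved, a priority may be added to each stream and used to decide whether to skip decoding of images and audio. In real-time communication, the H.263 TR (temporal reference) can be handled in the same way as the DTS. It then shows whether decoding at the terminal is ahead or behind, so the same skip processing described above can be applied.
[0076]
FIG. 29 shows how the priorities change over time when the algorithm of FIG. 28 is implemented.
[0077]
The figure shows the priority attached to each video frame. This priority decides whether a frame is decoded while the terminal is overloaded. A smaller value means a higher priority; in this example, 0 is the highest. With a threshold of 3, frames whose priority value is greater than 3 are discarded without decoding, and frames whose value is 3 or less are decoded. Selectively discarding frames by priority in this way keeps the terminal's load down. The threshold may also be set dynamically from the relationship between the current processing time and the decoding time (DTS) attached to each frame. The method applies to audio in the same way as to video frames.
[0078]
On a transmission path such as the Internet, encoded information lost in transit may need to be retransmitted. For this case, the reception management unit 11 is given a retransmission request priority determining unit. It sets a threshold on the priority of the encoded information for which retransmission is requested. The threshold is based on:
- the priority managed by the priority determining unit;
- the number of retransmissions;
- the loss rate;
- the insertion interval of intra-coded frames; and
- the granularity of the priorities (for example, five levels).
The receiving terminal can then request retransmission of only the images and audio it needs. If the number of retransmissions or the loss rate is high, retransmission must be limited to higher-priority information so that retransmissions and losses fall. Knowing the priority the priority determining unit uses also makes it possible to stop transmitting information that would not be processed anyway.
[0079]
On the transmitting terminal, the sender may fall behind its targets. Either the actual transfer rate exceeds the terminal's target rate, or writing encoded information into the transmit buffer lags behind. The lag is found by comparing the time elapsed since transfer began with the decoding or display time attached to the encoded information. In either case, the sender can thin out transmission using the overload priority that is attached to the encoded information for use by the receiver's priority determining unit. Images and audio can then be transmitted at the target rate. Giving the transmitting terminal the same overload skip function as the receiving terminal also keeps the transmitting terminal from breaking down under overload.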
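The sender-side thinning in [0079] can be sketched as a threshold search over one measurement window. The sketch assumes that the rate is checked over a fixed window of `window_s` seconds and that each frame's size is known; both are illustrative choices. It keeps the loosest priority threshold whose frames still fit the target bit rate.

```python
def thin_for_target_rate(frames, target_bps: float, window_s: float):
    """Pick the loosest priority threshold whose frames fit the target rate.

    frames: list of (priority, size_in_bytes); smaller priority = more important.
    Returns (threshold, kept_frames), or (None, []) if nothing fits.
    """
    budget_bits = target_bps * window_s
    for threshold in sorted({p for p, _ in frames}, reverse=True):
        kept = [(p, size) for p, size in frames if p <= threshold]
        if sum(8 * size for _, size in kept) <= budget_bits:
            return threshold, kept
    return None, []
```

At 16 kbit/s over a one-second window, for example, the priority-2 frames in the test data are dropped and only the priority-0 and priority-1 frames are sent.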
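[0080]
Sending only the AL information that is actually needed is effective on narrowband channels such as analog telephone lines, because it lets the amount of transmitted information be adjusted. One way to do this:
1. Before sending any data, the transmitting terminal decides which data management information it will add to the data.
2. It notifies the receiving terminal of that decision as control information (for example, "only the random access flag is used").
3. The receiving terminal uses this control information to rewrite the transmission-structure information held in the transmission format storage unit 103, which records which AL information is in use.
In this way, the AL information (data management information) used by the transmitting side can be reconfigured (see FIGS. 19 and 20).
[0081]
FIG. 4 illustrates a method of dynamically changing the header information added to image or audio data before transmission. In the example shown, the data to be transmitted (ES) is split into fragments. Each fragment receives a communication header containing three items, which correspond to the transmission management information of the present invention:
- a sequence number identifying the order of the data;
- a marker bit indicating whether the fragment is a start position from which the data can be processed; and
- a time stamp giving time information for the transfer of the fragment.
[0082]
As a concrete example, RTP (Real-time Transport Protocol, RFC 1889) uses the sequence number, marker bit, time stamp, object ID (called the SSRC), version number, and similar information as its communication header. The header can be extended, but these items are always present as fixed fields. Consider, however, an environment where several images and audio streams with different encodings are sent at once, and where real-time communication such as videophone is mixed with stored media such as video on demand. There, the communication header carries different meanings for different streams, and a means of telling them apart is needed.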
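The fixed RTP header fields listed in [0082] can be read with Python's standard `struct` module. The layout below is the 12-byte fixed header of RFC 1889, which is unchanged in its successor RFC 3550. CSRC entries and header extensions that follow the fixed part are not parsed in this sketch.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top 2 bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),       # the marker bit discussed in the text
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,                    # used as the object ID
    )
```

The fields this sketch returns are exactly those whose meaning may need to vary between streams: the marker bit and the time stamp.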
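[0083]
The time stamp illustrates this. For MPEG-1/2, as described above, it gives the PTS, the playback time. For H.261 and H.263, it gives the encoded time interval. To process H.263 in synchronization with audio, however, the receiver must be told that the time stamp is a PTS. The reason is that for H.263 the time stamp gives the interval between encoded frames, and RTP defines the time stamp of the first frame to be random.
[0084]
A flag indicating whether the time stamp is a PTS must therefore be added in one of two places:
(a) the communication header, which requires extending the communication header; or
(b) the payload header of H.263 or H.261, that is, the AL information, which requires extending the payload information.
[0085]
The RTP header carries a marker bit that indicates whether a data fragment is a start position from which processing is possible. As described above, the AL information may also need its own flags: an access flag, marking a point from which the data can be accessed, and a random access flag, marking that the data can be accessed randomly. Holding the same flags in both the communication header and the AL is inefficient. One option is therefore to let the flags already in the communication header stand in for the AL flags.
[0086]
(c) Two ways of doing this solve the problem. The first is to add a new flag to the communication header stating that the header's existing fields replace the AL flags, so that no flags are added to the AL. The second is to define the marker bit of the communication header as identical to the AL flag; this is expected to be faster to interpret than a flag held in the AL. In effect, the new flag states whether the marker bit has the same meaning as the AL flag. It could be carried by modifying the communication header or in its extension area.
[0087]
(d) Conversely, the marker bit of the communication header may be interpreted to mean that the AL contains at least one of the random access flag and the access flag. The version number of the communication header can signal that this interpretation differs from the conventional one. A simpler alternative is to provide the access flag and random access flag in only one place, either the communication header or the AL header. In the first case, both flags could be provided, but the communication header would need a new extension.
[0088]
The data-processing priority can be carried as AL information, as described above. If it is also placed in the communication header, devices in the network can make priority decisions without interpreting the data content. With IPv6, the priority can be added at a layer below RTP.
[0089]
A timer or counter giving the validity period of data processing can be added to the RTP communication header. It lets the receiver judge how some state carried by the arriving packets is changing. Suppose, for example, that the needed decoder software is held in slow storage. Knowing that the decoder will be needed, plus the timer or counter, tells the terminal when it will be needed. Depending on the application, the AL information then need not carry the timer, the counter, or the data-processing priority.
[0090]
FIGS. 5(a) and 5(b), and FIGS. 6(a) to 6(d), illustrate ways of adding AL information.
[0091]
AL may be added only at the head of the data to be transmitted, as in FIG. 5(a). It may instead be added to every fragment, after the data to be transmitted (ES) has been split into one or more fragments, as in FIG. 5(b). Sending the receiving terminal control information that says which of these is used lets the granularity of the transmitted information be chosen. Attaching AL to each fragment is effective when access delay matters.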
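The trade-off between FIGS. 5(a) and 5(b) can be shown with a receiver that joins partway through an ES. It can only start interpreting the data at a fragment that carries an AL. The sketch copies the same AL fields to every fragment for simplicity; a real per-fragment AL would carry per-fragment values such as an access flag. The field names follow FIG. 20, and the fragment size is an arbitrary assumption.

```python
def attach_al(es: bytes, al: dict, fragment_size: int, per_fragment: bool):
    """Split an ES and attach AL to the head only (FIG. 5(a)) or to every fragment (FIG. 5(b))."""
    packets = []
    for index, offset in enumerate(range(0, len(es), fragment_size)):
        # Simplification: every AL copy has the same fields as the head AL.
        header = dict(al) if per_fragment or index == 0 else None
        packets.append((header, es[offset:offset + fragment_size]))
    return packets


def first_usable_fragment(packets, join_at: int):
    """Index of the first fragment at or after join_at that carries an AL, or None."""
    for index in range(join_at, len(packets)):
        if packets[index][0] is not None:
            return index
    return None
```

With head-only AL, a receiver that joins at fragment 2 must wait for the next ES. With per-fragment AL, it can start at fragment 2 at once, at the cost of repeating the AL in every packet.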
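[0092]
As described above, the receiving side may need to be warned in advance that the data management information will be reconfigured, or that its placement within the data will change. This warning can be expressed with flags, counters, or timers and sent either as AL information or in the communication header. The receiving terminal can then adapt smoothly.
[0093]
The examples so far showed how to avoid duplication between the RTP header (or communication header) and the AL information, and how to extend the RTP communication header and the AL information. The present invention, however, need not use RTP. A proprietary communication header and AL information may, for example, be newly defined over UDP or TCP. The Internet profile may use RTP, but the Raw profile has no multifunctional header like RTP's. There are four ways of dividing information between the AL and the communication header (see FIGS. 6(a) to 6(d)).
[0094]
(1) Modify or extend the RTP header or the AL information so that the fields they already define do not overlap. In particular, the time stamp overlaps, while the timer, counter, and data-processing priority become extension information. Alternatively, neither the RTP header nor the AL is extended, and any overlap between them is simply ignored. These options correspond to what has been described so far. RTP is already partly in practical use in H.323, so extending it while keeping compatibility is effective (see FIG. 6(a)).
[0095]
(2) Without being bound to RTP, simplify the communication header (for example, to a sequence number only) and put the remaining information in the AL as multifunctional control information. If the items used in the AL can be configured before communication, a flexible transmission format can be defined (see FIG. 6(b)).
[0096]
(3) Without being bound to RTP, simplify the AL information (in the extreme case, to nothing) and put all control information in the communication header. The fields most often referenced in a communication header, namely the sequence number, time stamp, marker bit, payload type, and object ID, may be fixed. The data-processing priority and timer information then become extension information. An identifier indicates whether extension information is present, and the extension is read only when it is defined (see FIG. 6(c)).
[0097]
(4) Without being bound to RTP, simplify both the communication header and the AL information, and define a separate packet format alongside them. For example, the AL defines only the marker bit, time stamp, and object ID, and the communication header defines only the sequence number. The payload, data-processing priority, timer information, and so on are then defined and sent in a separate transmission packet (a second packet) (see FIG. 6(d)).
[0098]
As shown above, the application and the headers already attached to images and audio should both be taken into account. It is therefore desirable to let the communication header, the AL information, and the packet sent separately from the data (the second packet) be freely defined (customized) to suit the application.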
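[0099]
FIG. 7 illustrates a method of dynamically multiplexing and demultiplexing multiple logical transmission paths to transmit information. To save logical transmission paths, the transmission unit 13 is given an information multiplexing unit. Following user instructions or the number of available logical paths, this unit can start or stop multiplexing the logical paths that carry data or control information. The reception management unit 11 is given an information demultiplexing unit that separates the multiplexed information.
[0100]
In FIG. 7, the information multiplexing unit is called "GroupMUX." A multiplexing scheme such as H.223 can be used to implement it. GroupMUX may sit in the transmitting and receiving terminals, or in relay routers or terminals, which makes narrowband channels easier to support. If GroupMUX is implemented with H.223, interconnection with H.324 also becomes possible.
[0101]
The control information for the information multiplexing unit (multiplexing control information) should be extractable quickly. Sending it unmultiplexed over a separate logical path, instead of multiplexing it with the data, reduces the delay caused by multiplexing. The sender can also announce which of the two it uses: multiplexing the control information with the data, or sending it separately. The user can then choose between compatibility with conventional multiplexing and lower multiplexing delay. Multiplexing control information here means, for example, information describing how the information multiplexing unit is multiplexing each item of data.
[0102]
Several items can likewise be sent to the receiving terminal, either as control information or as data management information together with the data:
- notice of the start and end of multiplexing;
- the combination of logical paths to be multiplexed; and
- how the multiplexing control information itself will be transmitted.
Expressing these with flags, counters, or timers shortens setup on the receiving side. As noted above, the fields for these flags, counters, and timers may also be placed in the RTP header.
[0103]
When there are several multiplexing or demultiplexing units, the multiplexing control information can be sent together with an identifier of the unit it concerns. This tells the receiver which information multiplexing unit the control information applies to. The multiplexing pattern is one example of such control information. The identifiers can be chosen by the terminals using random numbers. For example, each terminal generates a random number within a range agreed between them, and the larger value becomes the identifier (identification number) of the information multiplexing unit.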
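The random-number agreement in [0103] can be sketched as two small functions. Each terminal draws a candidate in an agreed range and both adopt the larger value. The range 1 to 255 and the redraw-on-tie behavior are assumptions; the text does not say what happens when the two values are equal.

```python
import random


def draw_candidate(rng: random.Random, low: int = 1, high: int = 255) -> int:
    """Each terminal draws a candidate in an agreed range (range is assumed)."""
    return rng.randint(low, high)


def agree_mux_id(local: int, remote: int):
    """Both terminals adopt the larger candidate; on a tie they must redraw."""
    if local == remote:
        return None  # tie handling is an assumption, not stated in the text
    return max(local, remote)
```

Because `max` is symmetric, both terminals reach the same identifier without another round trip once they have exchanged candidates.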
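[0104]
Data multiplexed by the information multiplexing unit does not match any media type conventionally defined in RTP. A new RTP payload type should therefore be defined to mark a payload as multiplexed by the information multiplexing unit, that is, a new media type for H.223.
[0105]
Access to multiplexed data can be made faster by having the information multiplexing unit place control information before data information when it transmits or records them. This is expected to speed up analysis of the multiplexed information. Header analysis also becomes faster if the items in the data management information attached to control information are fixed, and if that information is multiplexed with a unique identifier pattern that never occurs in the data.
[0106]
FIG. 8 illustrates a procedure for transmitting broadcast programs. The receiver can identify which program the data on each of several transmission paths belongs to in either of two ways. The sender may transmit control information giving, as broadcast program information, the mapping between logical path identifiers and program identifiers. Alternatively, the sender may add the program identifier to the data as data management information (AL information). The data and control information can also be linked even when they travel on independent paths. The sender first sends the receiving terminal, as control information, the mapping between the data identifier (the SSRC in RTP) and the logical path identifier (for example, a LAN port number). It sends the corresponding data only after the receiving terminal confirms that it can receive it (Ack/Reject).
[0107]
Broadcasting without a return channel becomes possible by adding two items to each broadcast program or item of data: an identifier giving its transmission order, and a counter or timer giving the period during which it remains usable. When the expiry time approaches, the receiver starts playing the program or data even if some information is still missing. Broadcasting over a single communication port address (a multicast address), without separating control information from data, is also possible.
[0108]
Without a back channel, control information must be sent well ahead of the data so that the receiving terminal can learn the data's structure. Control information should generally travel over a reliable channel without packet loss. If an unreliable channel is used, control information carrying the same transmission sequence number must be repeated periodically. This applies to all control information, not only to control information about setup time.
[0109]
Data can be managed and transmitted flexibly in the following way. Before sending any data, the transmitting side selects which items that can serve as data management information it will use, such as the access flag, the random access flag, the playback time (PTS), and the data-processing priority. For these items it decides whether to send them as control information, together with the data identifier (SSRC), on a logical path separate from the data, or to send them with the data as data management information (AL information). It then notifies the receiving side of this decision as control information.
[0110]
Data can then be sent without adding anything to the AL. When image or audio data is carried over RTP, the existing payload definitions therefore need not be extended.
[0111]
FIGS. 9(a) and 9(b) show how images and audio can be transmitted with the time needed to load and start programs and data taken into account. This matters most for one-way systems without a return channel, such as satellite broadcasting or portable terminals, whose resources are limited and which use programs and data already stored on the receiving terminal. The required program (for example, H.263, MPEG-1/2, or audio decoder software) or data (for example, image or audio data) may sit in storage that is slow to read, such as a DVD, a hard disk, or a file server on the network. The receiving terminal can then be sent the following in advance:
- an identifier of the required program or data;
- the identifier of the transmitted stream (for example, the SSRC or LogicalChannelNumber); and
- a flag, counter (counting up or down), or timer for estimating when the receiving terminal will need the program or data.
These can be received as control information or as data management information with the data. Receiving them in advance shortens the setup time of the required program or data (FIG. 22).
[0112]
When the programs or data are themselves transmitted, the transmitting side can send the following with them:
- their storage destination at the receiving terminal (for example, hard disk or memory);
- the time needed to start or load them;
- how that time depends on terminal type or storage destination (for example, the relationship between CPU power, storage device, and average response time); and
- the order in which they will be used.
The receiving terminal can then schedule where each program or data item is stored and when it is read, so that it is ready when actually needed.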
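[0113]
FIGS. 10(a) and 10(b) illustrate how zapping (switching TV channels) can be handled.
[0114]
Unlike conventional satellite broadcasting, where the terminal only receives video, a receiving terminal that must execute a program faces a major problem in setup time, that is, the time to load and start the program. The same applies to portable terminals and other devices with limited resources.
[0115]
One solution, (a), applies when the programs or data needed by channels other than the one being watched sit in storage that is slow to read. The receiving terminal has a main viewing unit, which the user watches, and a sub-viewing unit, with which the terminal periodically views the other channels. The terminal receives, either as control information or as data management information (AL information) with the data, three items: an identifier of each required program or data item, a flag, counter, or timer for estimating when it will be needed, and its mapping to channels. Control information here means information for controlling terminal processing, sent in packets separate from the data. By preparing to load those programs and data in advance, the terminal can be expected to shorten its setup time.
[0116]
A second solution is to provide a broadcast channel that carries only index images of the programs broadcast on many channels. An index image here is a broadcast image obtained by periodically sampling the programs on those channels. Suppose the viewer switches programs and the required program or data sits in storage that is slow to read. The terminal first shows the viewer the index image of the chosen program, or shows that loading is in progress. Meanwhile it reads the required program or data from storage and resumes the chosen program once loading finishes. This prevents the screen from freezing during setup.
[0117]
A timer is a time value. It states, for example, how long from now the terminal will need the program that decodes the data stream coming from the sender. A counter may be any value giving a count in a basic time unit agreed between the transmitting and receiving terminals. A flag is sent, as notice, together with data or control information transmitted ahead of time by at least the setup time. Control information here means information for controlling terminal processing, sent in packets separate from the data. Timers and counters may be embedded in the data or sent as control information.
[0118]
Setup time can be determined in several ways. On a clock-based transmission path such as ISDN, the transmitting terminal can use a transmission serial number, which identifies the transmission order, as transmission management information. It uses this number to tell the receiving terminal when a program or data will be needed, sending it either with the data as data management information or as control information. The time at which setup will happen can then be predicted. Where transmission time fluctuates with jitter and delay, as on the Internet, the propagation delay can be estimated from jitter and delay measurements and added to the setup time. RTCP, the control protocol that accompanies RTP, already provides such measurements.
[0119]
FIGS. 11 to 24 show concrete examples of the protocols exchanged between terminals.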
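[0120]
The transmission format and procedures are written in ASN.1. The format extends ITU-T H.245. As FIG. 11 shows, image and audio objects may form a hierarchy. In this example, each object has two attributes: a broadcast program identifier (ProgramID) and an object ID (SSRC). The structure among images and the composition method are described in a scripting language such as Java or VRML.
[0121]
FIG. 11 shows an example of relationships between objects.
[0122]
In the figure, the objects are media such as video, audio, CG, and text, and they form a hierarchy. Each object has a program number ("ProgramID", corresponding to a TV channel) and an object identifier ("ObjectID"). When objects are carried over RTP (Real-time Transport Protocol, the media transport protocol used on the Internet), mapping the object identifier to the SSRC (synchronization source identifier) makes objects easy to identify. The structure among objects can be written in a description language such as JAVA or VRML.
[0123]
These objects can be transmitted in two ways. In the broadcast type, the transmitting terminal sends them one way. In the communication type, objects are exchanged between two terminals (terminal A and terminal B).
[0124]
On the Internet, for example, RTP can be used for transmission. Control information is carried on the channel that the videophone standards call LCN0. In this example, several transmission channels are used, but all of them are assigned the same program channel (ProgramID).
[0125]
FIG. 12 illustrates how a protocol realizing the functions of the present invention can be implemented. The description uses H.245, the control protocol of the videophone standards H.324 and H.323. The functions of the present invention are realized by extending H.245.
[0126]
The notation in this example is ASN.1, a protocol description notation. "TerminalCapabilitySet" expresses a terminal's capabilities. In this example, conventional H.245 is extended with a capability named "mpeg4Capability."
[0127]
In FIG. 13, "mpeg4Capability" specifies three maxima:
- "MaxNumberOfVideo", the number of videos the terminal can process simultaneously;
- "MaxNumberOfSounds", the number of audio streams; and
- "MaxNumberOfMux", the number of multiplexing functions the terminal can implement.
[0128]
In the figure, these are summarized as the maximum number of objects that can be processed ("NumberOfProcessObject"). The figure also includes a flag stating whether the communication header (labeled AL in the figure) can be changed; when it is true, the header can be changed. Terminals use "MPEG4Capability" to tell each other how many objects they can process. The receiving side replies to the sender of "MPEG4Capability" with "MPEG4CapabilityAck" if it can accept (process) the request, or "MPEG4CapabilityReject" otherwise.
[0129]
FIG. 14 shows the protocol description for using the GroupMUX introduced above. GroupMUX multiplexes several logical channels onto one transmission channel, here a LAN channel, so that they can share it. In this example, a multiplexing means (GroupMUX) is associated with a LAN (local area network) transmission channel ("LANPortNumber"). "GroupMuxID" identifies the multiplexing means. Terminals use "CreateGroupMux" to announce that they will use a multiplexing means. The receiving side replies to the sender of "CreateGroupMux" with "CreateGroupMuxAck" if it can accept (use) it, or "CreateGroupMuxReject" otherwise. The demultiplexing means, which performs the inverse operation, can be realized in the same way.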
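[0130]
FIG. 15 describes deleting a multiplexing means that has already been created.
[0131]
FIG. 16 describes the relationship between a LAN transmission channel and multiple logical channels.
[0132]
The LAN transmission channel is described by "LANPortNumber" and the logical channels by "LogicalPortNumber".
[0133]
In this example, up to 15 logical channels can be associated with one LAN transmission channel.
[0134]
In the figure, if only one MUX is available, GroupMuxID is unnecessary. If several MUXes are used, every H.223 command needs a GroupMuxID. A flag announcing the port mapping used between the multiplexing and demultiplexing means may also be provided. So may a command that selects whether control information is multiplexed too or sent over a separate logical path.
[0135]
FIGS. 14 to 16 use a LAN as the transmission channel, but schemes that do not use the Internet Protocol, such as H.223 or MPEG-2, may be used instead.
[0136]
In FIG. 17, "OpenLogicalChannel" is the protocol description that defines a transmission channel's attributes. In this example, H.245 is extended with a definition of "MPEG4LogicalChannelParameters."
[0137]
FIG. 18 shows a program number (corresponding to a TV channel) and a program name associated with a LAN transmission channel ("MPEG4LogicalChannelParameters").
[0138]
In the same figure, "BroadcastChannelProgram" describes how to send the mapping between LAN transmission channels and program numbers in broadcast form. In this example, mappings for up to 1023 transmission channels can be sent. Broadcasting is one-way from sender to receiver, so this information must be repeated periodically to cover losses in transit.
[0139]
FIG. 19 describes the attributes of the objects, such as video and audio, that are transmitted as a program ("MPEG4ObjectClassdefinition"). Object information ("ObjectStructureElement") is associated with the program identifier ("ProgramID"), for up to 1023 objects. The object information contains:
- the LAN transmission channel ("LANPortNumber");
- a flag stating whether scrambling is used ("ScrambleFlag");
- a field holding an offset that changes the processing priority when the terminal is overloaded ("CGDOffset"); and
- an identifier of the type of media transmitted, such as video or audio ("MediaType").
[0140]
In the example of FIG. 20, an AL is attached to manage the decoding of each ES. Here, an ES is defined as the data for one video frame, and the AL as the additional information needed to decode that frame. The AL defines:
(1) RandomAccessFlag, which states whether the frame can be reproduced on its own and is true for an intra-coded video frame;
(2) PresentationTimeStamp, the display time of the frame; and
(3) CGDPriority, the priority value used to decide processing order when the terminal is overloaded.
The figure shows these one-frame data strings carried over RTP (Real-time Transport Protocol, the protocol for carrying continuous media over the Internet). "ALReconfiguration" is the transmission expression that changes the maximum values the AL fields above can express.
[0141]
In this example, "RandomAccessFlagMaxBit" allows a representation of up to 2 bits. A value of 0 means RandomAccessFlag is not used; a value of 2 means its maximum value is 3.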
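The "MaxBit" rule in [0141] generalizes to every AL field: an n-bit field can hold values up to 2^n - 1, and a width of 0 disables the field. The sketch below packs the three FIG. 20 fields with reconfigurable widths. The specific widths (1, 16, and 2 bits) and the MSB-first packing order are illustrative assumptions.

```python
def max_value_for_bits(max_bit: int):
    """Largest value an AL field of max_bit bits can hold; 0 bits means unused."""
    if max_bit < 0:
        raise ValueError("bit width must be non-negative")
    return None if max_bit == 0 else (1 << max_bit) - 1


def pack_al(values: dict, widths: list):
    """Pack AL fields MSB-first into one integer, skipping 0-width fields.

    Returns (packed_value, total_bits).
    """
    packed, total_bits = 0, 0
    for name, width in widths:
        if width == 0:
            continue  # field disabled by reconfiguration
        value = values[name]
        if not 0 <= value <= max_value_for_bits(width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
        total_bits += width
    return packed, total_bits
```

Setting the RandomAccessFlag width to 0 through a reconfiguration message removes the field entirely and saves one bit per frame.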
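[0142]
The value may also be expressed with an exponent and a mantissa (for example, 3^6). When no value is set, the terminal may operate in a predetermined default state.
[0143]
In FIG. 21, "SetupRequest" is the transmission expression used to send setup time. It is sent before the program is transmitted. It associates the transmission channel number ("LogicalChannelNumber") with the ID of the program to execute ("excuteProgramNumber"), the ID of the data to use ("dataNumber"), and the ID of the command to execute ("executeCommandNumber"), and sends these to the receiving terminal. Alternatively, the transmission channel number may be associated with three other items:
- a flag that permits execution ("flag");
- a counter giving how many more SetupRequests must arrive before execution ("counter"); and
- a timer giving how much longer until execution ("timer").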
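The alternative "SetupRequest" form of [0143] can be sketched as a trigger check. The field names follow the figure, but the Python types, the meaning of `later_requests` (SetupRequests received after this one), and the `load_start_time` helper are hypothetical. The helper shows how a receiver could use the timer to begin loading a slow-to-read decoder early enough, in the spirit of [0111] and [0112].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetupRequest:
    logical_channel_number: int
    flag: bool = False              # execution permitted immediately
    counter: Optional[int] = None   # run after this many further SetupRequests
    timer: Optional[float] = None   # run this many seconds after receipt


def execution_due(req: SetupRequest, later_requests: int,
                  now: float, received_at: float) -> bool:
    """True once any of the flag, counter, or timer conditions is met."""
    if req.flag:
        return True
    if req.counter is not None and later_requests >= req.counter:
        return True
    if req.timer is not None and now - received_at >= req.timer:
        return True
    return False


def load_start_time(received_at: float, timer: float, load_time: float) -> float:
    """Latest time to begin loading a program so it is ready when the timer expires."""
    return received_at + timer - load_time
```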
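[0144]
Examples of requests that can be announced in advance include rewriting the AL information and reserving start-up time for a GroupMux.
[0145]
FIG. 22 illustrates the transmission expression ("ControlALdefinition") by which the transmitting terminal tells the receiving terminal whether each AL field described with FIG. 20 is used.
[0146]
In the figure, RandomAccessFlag is used if "RandomAccessFlagUse" is true, and not used otherwise. This AL change notice may be sent as control information on a channel separate from the data, or on the same channel as the data.
[0147]
Decoder programs are one example of programs to be executed. Setup requests can be used in both broadcasting and communication. Such a request also tells the receiving terminal which items are carried as control information and which as AL information. In the same way, it can tell the terminal which items go in the communication header, which in the AL information, and which in the control information.
[0148]
FIG. 23 shows a transmission expression that uses an information framework identifier ("headerID"). Transmitting and receiving terminals use it to change, according to the application, the structure of the header information they exchange: data management information, transmission management information, and control information.
[0149]
In the figure, "classESheader" uses the information framework identifier to tell the two terminals which structure is in use. It covers the data management information and transmission management information sent on the same channel as the data.
[0150]
For example, if "headerID" is 0, only the bufferSizeES field is used; if "headerID" is 1, the "reserved" field is used as well.
[0151]
A default identifier ("useHeaderExtension") indicates whether the default information framework is used. If "useHeaderExtension" is true, the fields inside the if statement are used. These structures are agreed in advance between the transmitting and receiving terminals. A configuration may use only one of the two identifiers, either the information framework identifier or the default identifier.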
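A receiver following FIG. 23 might read its header as sketched below. The field names come from the figure, but the bit assignment is an assumption: bit 7 of a fixed first byte acts as "useHeaderExtension" and bits 0 to 6 carry "headerID". The 16-bit and 8-bit field widths are assumptions as well. Reading both identifiers from a fixed position means the parser can always find them, even after the rest of the layout changes.

```python
import struct

# Hypothetical layouts: the field names come from FIG. 23, the widths
# ("H" = 16 bits, "B" = 8 bits) are assumptions for illustration.
DEFAULT_LAYOUT = [("bufferSizeES", "H")]
LAYOUTS = {
    0: [("bufferSizeES", "H")],
    1: [("bufferSizeES", "H"), ("reserved", "B")],
}


def parse_es_header(data: bytes) -> dict:
    """Parse a header whose first byte sits at a fixed position.

    Bit 7 of the first byte plays the role of useHeaderExtension and
    bits 0-6 carry headerID (this bit assignment is assumed).
    """
    first = data[0]
    use_extension = bool(first & 0x80)
    header_id = first & 0x7F
    layout = LAYOUTS[header_id] if use_extension else DEFAULT_LAYOUT
    fmt = "!" + "".join(code for _, code in layout)
    values = struct.unpack_from(fmt, data, 1)
    return {name: value for (name, _), value in zip(layout, values)}
```

When the flag bit is clear, the headerID bits are ignored and the pre-agreed default layout applies. No configuration message is needed in that case.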
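[0152]
In FIG. 24, "ALconfiguration" shows an example in which transmitting and receiving terminals change, according to the application, the structure of control information sent on a channel separate from the data. The information framework identifier and default identifier are used as in FIG. 23.
[0153]
The present invention has described concretely, from the following viewpoints, how to realize a system that composes and displays multiple moving images and audio streams at the same time.
[0154]
(1) Transmitting images and audio (communication and broadcasting) over multiple logical transmission paths, and controlling that transmission. In particular, methods of sending control information and data over separate, independent logical paths were described.
[0155]
(2) Dynamically changing the header information (AL information) attached to image and audio data to be transmitted.
[0156]
(3) Dynamically changing the communication header information attached for transmission.
[0157]
For (2) and (3), the description covered managing the information duplicated between the AL information and the communication header in a unified way, and sending AL information as control information.
[0158]
(4) Dynamically multiplexing and demultiplexing multiple logical transmission paths to transmit information.
[0159]
Ways to save transmission channels and to multiplex efficiently were described.
[0160]
(5) Transmitting images and audio with the time needed to load and start programs and data taken into account. Ways to shorten the apparent setup time for various functions and applications were described.
[0161]
(6) Transmitting images and audio in a way that handles zapping.
[0162]
The present invention is not limited to two-dimensional image composition. It may use a representation that combines two-dimensional and three-dimensional images. It may also include composition methods that join several images side by side, as in wide-field (panoramic) images.
[0163]
The communication forms covered by the present invention are not limited to wired bidirectional CATV and B-ISDN. For example, video and audio may travel from the center terminal to home terminals by radio (for example, in the VHF or UHF band) or by satellite broadcasting, while home terminals send information back to the center over analog telephone lines or N-ISDN. Video, audio, and data need not be multiplexed.
[0164]
Wireless links such as IrDA, PHS (Personal Handy-phone System), and wireless LAN may also be used. The terminal may be portable, such as a personal digital assistant, or desktop, such as a set-top box or a personal computer. Applications include videophones, multipoint surveillance systems, multimedia database retrieval systems, and games. The present invention covers not only receiving terminals but also the servers and relay equipment connected to them.
[0165]
The examples so far showed how to avoid duplication between the RTP (communication) header and the AL information, and how to extend the RTP communication header and the AL information. The present invention, however, need not use RTP. A proprietary communication header and AL information may, for example, be newly defined over UDP or TCP. The Internet profile may use RTP, but the Raw profile has no multifunctional header like RTP's. As described above, there are four ways of dividing information between the AL and the communication header.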
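[0166]
The transmitting and receiving terminals thus determine dynamically the framework of each kind of information they use: data management information, transmission management information, and control information. A framework fixes the order and bit width of the fields added. For example, the first field might be the random access flag as 1 bit, and the second the sequence number as 16 bits. Determining the framework dynamically lets it change with circumstances and suit the application and the transmission path.
[0167]
The frameworks may be those already shown in FIGS. 6(a) to 6(d). With RTP, the data management information (AL) may be the per-media header information (for H.263, for example, the H.263-specific video header or payload header). The transmission management information may be the RTP header, and the control information may be information that controls RTP, such as RTCP.
[0168]
Each of the three kinds of information can also carry a default identifier. The three kinds are data management information, transmission management information, and control information (information for controlling terminal processing, sent in packets separate from the data). The default identifier states whether the information is sent, received, and processed in a known framework agreed in advance between the terminals. It shows the receiver whether the framework has changed. The default identifier is set only when a change has been made, and the change (for example, shortening the time stamp from 32 bits to 16 bits) is then announced by the methods of FIGS. 19 and 20 above. When the framework has not changed, no configuration information needs to be sent.
[0169]
The framework of the data management information can be changed in two ways. In the first, the change is described in the data itself. The data contains a default identifier for the framework of its data management information, which must be written in a fixed area or position. The sender sets this identifier and then describes the changes to the framework.
[0170]
In the second, the change is described in control information (framework control information). The sender sets the default identifier in the control information and describes the new framework of the data management information. The receiving terminal is notified of the change and confirms it with ACK/Reject. Only then is data in the new framework transmitted. The frameworks of the transmission management information and of the control information itself can be changed in the same two ways (FIGS. 23 and 24).
[0171]
A more concrete example follows. MPEG-2 headers are fixed, but a default identifier can be placed in the program map table (defined in PSI), which links the video and audio streams of an MPEG-2 TS (transport stream). A configuration stream can also be defined that describes how the framework of the video and audio streams changes. If the default identifier is set, the receiver first interprets the configuration stream and then interprets the video and audio stream headers according to its contents. The configuration stream may have the content shown in FIGS. 23 and 24.
[0172]
In the embodiments above, the content relating to the transmission method and/or the structure of the transmitted data (transmission format information) of the present invention corresponds, for example, to the information framework.
[0173]
The embodiments above mainly describe sending the content to be changed, that is, the content relating to the transmission method and/or the structure of the transmitted data. The invention is not limited to this; sending only an identifier of that content is of course also possible.
In this case, the transmitting apparatus may be an image/audio transmitting apparatus such as the one shown in FIG. 52. It comprises:
(1) transmission means 5001, which sends as transmission format information either the content relating to the transmission method and/or the structure of the transmitted data, or an identifier of that content, on the same transmission path as the data or on a different one; and
(2) storage means 5002, which stores several kinds of such content together with their identifiers.
The identifier is included in at least one of the data management information, the transmission management information, and the information for controlling processing on the terminal side.
The receiving apparatus may be an image/audio receiving apparatus such as the one shown in FIG. 53. It comprises receiving means 5101, which receives the transmission format information sent by the image/audio transmitting apparatus above, and transmission information interpretation means 5102, which interprets it. The receiving apparatus may also have storage means 5103, which stores several kinds of content relating to the transmission method and/or the structure of the transmitted data together with their identifiers. When the transmission format information received is an identifier, the apparatus interprets it using the content held in the storage means.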
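[0174]
More specifically, the transmitting and receiving terminals agree on and prepare several information frameworks in advance. Information framework identifiers then distinguish these frameworks, and with them the different kinds of data management information, transmission management information, and control information (framework control information). The identifiers are sent with the data or as control information. Each kind of information can then be identified, and the framework of each can be chosen freely to suit the format of the media being sent and the bandwidth of the transmission path. The identifier of the present invention corresponds to this information framework identifier.
[0175]
The information framework identifiers and default identifiers are placed in a fixed-length area or at a fixed position in the transmitted information. The receiving terminal can therefore read and interpret them even after the framework has changed.
[0176]
In addition to the configurations of the embodiments above, a broadcast channel carrying only index images of the programs on many channels may be provided. If setting up the program or data needed after the viewer switches programs takes time, the index image of the chosen program is shown to the viewer first.
[0177]
As described above, according to the present invention, the transmitting and receiving terminals determine dynamically the framework of each kind of information they use: data management information, transmission management information, and control information. The framework can therefore change with circumstances and suit the application and the transmission path.
[0178]
Each of the data management information, transmission management information, and control information also carries a default identifier stating whether it is sent, received, and processed in a known framework agreed in advance between the terminals. This identifier shows whether the framework has changed. It is set, and the change announced, only when a change has been made. When the framework has not changed, no configuration information needs to be sent.
[0179]
Furthermore, the transmitting and receiving terminals agree on and prepare several information frameworks in advance. Information framework identifiers distinguishing the kinds of data management information, transmission management information, and control information are sent with the data or as control information. Each kind of information can then be identified, and its framework chosen freely to suit the format of the media being sent and the bandwidth of the transmission path.
[0180]
The information framework identifiers and default identifiers are placed in a fixed-length area or at a fixed position in the transmitted information. The receiving terminal can therefore read and interpret them even after the framework has changed.
[0181]
Further embodiments of the present invention are described below with reference to the drawings.
[0182]
The embodiments here mainly solve one of problems (B1) and (B2) described above.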
【０１８３】
本発明で使用する「画像」の意味は静止画と動画の両方を含む。また、対象とする画像は、コンピュータ・グラフィックス（ＣＧ）のような２次元画像とワイヤーフレーム・モデルから構成されるような３次元の画像データであってもよい。
【０１８４】
図３１は、本発明の実施の形態における画像符号化、画像復号化装置の概略構成図である。
【０１８５】
符号化された種々の情報を送信もしくは記録する送信管理部４０１１は、同軸ケーブル、ＣＡＴＶ、ＬＡＮ、モデム等の情報を伝送する手段である。画像符号化装置４１０１は、Ｈ．２６３、ＭＰＥＧ１／２、ＪＰＥＧ、あるいは、ハフマン符号化といった画像情報の符号化を行う画像符号部４０１２と、上記送信管理部４０１１とを具備する構成である。又、画像復号化装置４１０２は、符号化された種々の情報を受信する受信管理部４０１３と、その受信された種々の画像情報の復号を行う画像復号部４０１４と、復号された１つ以上の画像を合成する画像合成部４０１５と、画像を出力するディスプレイやプリンターなどから構成される出力部４０１６とを備えた構成である。
【０１８６】
図３２は、本発明の実施の形態における音声符号化、音声復号化装置の概略構成図である。
【０１８７】
音声符号化装置４２０１は、符号化された種々の情報を送信もしくは記録する送信管理部４０２１と、Ｇ．７２１、ＭＰＥＧ１オーディオといった音声情報の符号化を行う音声符号部４０２２とを具備する構成である。又、音声復号化装置４２０２は、符号化された種々の情報を受信する受信管理部４０２３と、前記種々の音声情報の復号を行う音声復号部４０２４と、復号された１つ以上の音声を合成する音声合成部４０２５と、音声を出力する出力部４０２６とを備えた構成である。
【０１８８】
音声や動画像の時系列データは、具体的には上記の各装置で、符号化、又は復号化される。
【０１８９】
図３１、図３２とも、通信環境としてはインターネットのように多重化の手段を意識せずに複数の論理的な伝送路が利用できる通信環境であってもよし、アナログ電話や衛星放送のように多重化手段を意識しなければならない通信環境であってもよい。また、端末の接続形態としては、ＴＶ電話やＴＶ会議システムのように端末間で双方向で映像や音声を送受信する形態や、衛星放送やＣＡＴＶ、インターネット上での放送型の映像や音声放送の形態が挙げられる。
【０１９０】
同様に、画像や音声の合成方法に関しては、ＪＡＶＡ、ＶＲＭＬ、ＭＨＥＧといったスクリプト言語で、画像・音声と画像・音声の構造情報（表示位置や表示時間）、画像・音声同士のグルーピングの方法、画像の表示のレイヤ（深さ）、そして、オブジェクトＩＤ（画像、音声といった個々のオブジェクトを識別するためのＩＤ）と、これらの属性の関係を記述することによって画像や音声の合成方法が定義できる。合成方法を記述したスクリプトはネットワークやローカルの記憶装置から得られる。
【０１９１】
尚、画像符号化装置、画像復号化装置、音声符号化装置、音声復号化装置を、それぞれ任意の個数で、任意の組み合わせで送受信の端末を構成してもよい。
【０１９２】
図３３（ａ）は、過負荷時の処理の優先度を管理する優先度付加部、優先度決定部について説明する図である。Ｈ．２６３やＧ．７２３などの符号化方法で、符号化された情報の過負荷時の処理の優先度を予め決められた基準で決定し、符号化された情報と決定された優先度を対応づける優先度付加部４０３１を画像符号化装置４１０１や音声符号化装置４２０１に備える。
【０１９３】
優先度の付加の基準は、たとえば、画像であればシーンチェンジ、編集者や利用者が指示した画像フレームやストリーム、音声であれば、有音区間と無音区間である。
【０１９４】
過負荷時の処理の優先度を定義する優先度の付加方法は、通信ヘッダへ付加する方法と符号化時にビデオやオーディオの符号化されるビットストリームのヘッダに埋め込む方法が考えられる。前者は、復号せずに優先度に関する情報が得ることが可能であり、後者はシステムに依存せずにビットストリーム単体で独立に扱うことが可能である。
【０１９５】
図３３（ｂ）に示したように、通信ヘッダに優先度情報を付加する場合、１つの画像フレーム（例たとえば、フレーム内符号化されたＩフレーム、フレーム間符号化されたＰ、Ｂフレーム）が複数個の送信パケットに分割される場合、画像であれば単独の情報としてアクセス可能な画像フレームの先頭部分を伝送する通信ヘッダのみに優先度を付加する（同一の画像フレーム内で優先度が等しい場合、次のアクセス可能な画像フレームの先頭が現れるまで、優先度は変わらないものとすればよい）。
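A receiver or relay node can recover the priority of every packet from this convention by carrying the last frame-head priority forward. The following is a minimal sketch. The `Packet` record, with its frame-start flag and optional priority field, is a hypothetical stand-in for the communication header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    frame_start: bool        # True if this packet carries the head of a frame
    priority: Optional[int]  # present only on frame-start packets (assumed)
    payload: bytes = b""


def effective_priorities(packets: List[Packet]) -> List[Optional[int]]:
    """Give each packet the priority of the most recent frame-start packet.
    Packets that arrive before any frame start get None."""
    current: Optional[int] = None
    result: List[Optional[int]] = []
    for pkt in packets:
        if pkt.frame_start:
            current = pkt.priority
        result.append(current)
    return result
```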
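[0196]
On the decoding side, the image decoding apparatus 4102 and the audio decoding apparatus 4202 include a priority determination unit 4032. It decides how to process the various received encoded information according to the information's priority under overload.
[0197]
Figs. 34 to 36 illustrate the granularity at which priority is added. Decoding uses two kinds of priority that determine the processing priority under overload at the terminal.
[0198]
Specifically, two priorities are defined (see Fig. 34):
- stream priority (StreamPriority; inter-time-series-data priority), which defines the processing priority under overload per bitstream, such as a video or audio stream;
- frame priority (FramePriority; intra-time-series-data priority), which defines the processing priority under overload per frame, such as a video frame within one stream.
[0199]
Stream priority makes it possible to handle multiple video and audio streams. Frame priority makes it possible to give different priorities even to intra-coded video frames (I frames), according to scene changes or the editor's intent.
[0200]
The value expressed by the stream priority can be treated either as a relative value or as an absolute value (see Figs. 35 and 36).
[0201]
Stream priority and frame priority are handled on the network by relay terminals such as routers and gateways, and at the endpoints by the transmitting and receiving terminals.
[0202]
Absolute and relative values can be expressed in two ways: the method shown in Fig. 35 and the method shown in Fig. 36.
[0203]
In Fig. 35, an absolute priority is a value, added by an editor or mechanically, that expresses the order in which image and audio streams are (or should be) processed under overload. It does not take actual fluctuations in network or terminal load into account. A relative priority is a value used to modify the absolute priority according to the load on the terminal or network.
[0204]
Managing relative and absolute priorities separately lets the transmitting side or a relay device change only the relative value as the network load fluctuates. The absolute priority originally attached to the image or audio stream is preserved, so the stream can still be recorded on a hard disk or VTR with that priority. With the absolute priority recorded, video and audio can later be played back unaffected by network load fluctuations. The relative and absolute priorities may also be sent over a control channel, independently of the data.
[0205]
Similarly, in Fig. 35, the frame priority, which defines the processing priority of frames under overload at a finer granularity than the stream priority, can be treated as either a relative or an absolute value. For example, the absolute frame priority is written in the encoded image information. To reflect network or terminal load, a relative frame priority, expressed with respect to that absolute priority, is written in the communication header of the packet that carries the encoded information. Priorities can thus follow network or terminal load at the frame level while the original priority is preserved.
[0206]
The relative priority may instead be sent over a control channel, independently of the data, together with a description of which frames it applies to, rather than in the communication header. This too allows recording to a hard disk or VTR with the original absolute priority of the image or audio stream preserved.
[0207]
On the other hand, in Fig. 35 the receiving terminal may play back the data as it arrives over the network without recording it. It then has no need to manage absolute and relative values separately. At both the frame and stream levels, the transmitting side may therefore combine the absolute and relative priority values before transmission and send only the resulting absolute value.
[0208]
In Fig. 36, the absolute priority is a value determined uniquely across frames from the relationship between StreamPriority and FramePriority. The relative priority is a value, added by an editor or mechanically, that expresses the order in which image and audio streams are (or should be) processed under overload. In the example of Fig. 36, a relative frame priority is attached to each frame of the video and audio streams, and a stream priority is attached to each stream.
[0209]
The absolute frame priority is the sum of the relative frame priority and the stream priority:
absolute frame priority = relative frame priority + stream priority.
The calculation may instead use subtraction, or multiplication by a constant.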
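The additive form of this relation is sketched below. Stream IDs and the dictionary layout are hypothetical. The sketch shows the separation discussed in [0212]: the stored relative frame priorities are never modified, so raising or lowering one stream's priority only requires recomputing the derived absolute values, not re-editing every frame.

```python
from typing import Dict, List


def absolute_frame_priorities(stream_priority: Dict[str, int],
                              relative_frame: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Absolute frame priority = relative frame priority + stream priority,
    computed per frame for every stream; the inputs are left untouched."""
    return {sid: [rel + stream_priority[sid] for rel in frames]
            for sid, frames in relative_frame.items()}
```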
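[0210]
The absolute frame priority is used mainly in the network. With an absolute representation, relay devices such as routers and gateways no longer have to combine StreamPriority and FramePriority to determine the priority of each frame. Using the absolute frame priority therefore makes it easy for relay devices to perform processing such as discarding frames.
[0211]
The relative frame priority, on the other hand, is expected to be used mainly in storage systems for recording and editing. Editing may involve handling several video and audio streams at once. In such cases, the load on the terminal or network may limit the number of video streams or frames that can be played back.
[0212]
In such a case, StreamPriority and FramePriority only need to be managed separately. An editor who wants a stream displayed preferentially, or a user who wants to watch a particular stream, then changes only that stream's StreamPriority. Unlike the absolute representation, this does not require recalculating every FramePriority. Absolute and relative representations should therefore be used according to the application.
[0213]
In addition, stating whether the stream priority value is to be used as a relative or an absolute value makes the priority representation effective both for transmission and for storage.
[0214]
In the example of Fig. 35, a flag or identifier accompanies the stream priority to indicate whether its value is absolute or relative. The frame priority needs no flag or identifier, because the relative value is written in the communication header and the absolute value is written in the encoded frame.
[0215]
In the example of Fig. 36, a flag or identifier indicates whether the frame priority is an absolute or a relative value. An absolute value has already been calculated from the stream priority and the relative frame priority, so relay devices and terminals do not calculate it again. If both terminals know the calculation formula, the receiving terminal can also recover the relative frame priority from the absolute frame priority and the stream priority. For example, the absolute priority of a transmitted packet (AccessUnitPriority) may be obtained from the relation
AccessUnitPriority = stream priority − frame priority.
Because the frame priority is subtracted from the stream priority in this case, it may also be called a subordinate priority.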
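The subtractive variant and its inversion are shown below, as a minimal sketch. The function names and the boolean `is_absolute` flag are hypothetical; they stand in for the flag or identifier described above. Since the formula is known to both ends, the receiver can obtain the relative (subordinate) frame priority whichever form was sent.

```python
def access_unit_priority(stream_priority: int, frame_priority: int) -> int:
    """AccessUnitPriority = stream priority - frame priority."""
    return stream_priority - frame_priority


def recover_frame_priority(access_unit_prio: int, stream_priority: int) -> int:
    """Invert the relation above: frame priority = stream - AccessUnitPriority."""
    return stream_priority - access_unit_prio


def frame_priority_from_header(value: int, is_absolute: bool, stream_priority: int) -> int:
    """Return the relative frame priority, given the header value and its flag."""
    return recover_frame_priority(value, stream_priority) if is_absolute else value
```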
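[0216]
Further, one or more stream priorities may be associated with the processing priority of data flowing through TCP/IP logical channels (LAN port numbers), and data processing may be managed on that basis.
[0217]
In addition, giving images and audio lower stream or frame priorities than text or control information can be expected to reduce the need for retransmission. Losing part of an image or audio stream often causes no problem.
[0218]
Fig. 37 illustrates a method of assigning priorities to multi-resolution image data.
[0219]
When one stream consists of two or more substreams, stream priorities can be added to the substreams. How the substreams are processed can then be defined by describing logical OR or logical AND relations between them at storage or transmission time.
[0220]
With wavelets, one video frame can be decomposed into several video frames of different resolutions. DCT-based coding can also decompose a frame into video frames of different resolutions, by coding the high-frequency and low-frequency components separately.
[0221]
The decomposed series of video frames forms several video streams, each with its own stream priority. In addition, the relationships among these streams are defined with AND (logical product) and OR (logical sum).

For example, suppose stream A has stream priority 5 and stream B has stream priority 10, where a smaller number means a higher priority. If stream data are discarded by priority alone, stream B is discarded. With a relation description, however, an AND relation can be defined so that stream B is transmitted and processed even though its priority is below the threshold priority.
[0222]
This allows related streams to be processed without being discarded. Conversely, an OR relation defines that the stream may be discarded. As before, discarding may be performed at the transmitting or receiving terminal or at a relay terminal.
[0223]
Another relation operator covers the case where the same video clip has been encoded into separate 24 kbps and 48 kbps streams and playing either one is sufficient. The relation description is then an exclusive OR (EX-OR).
[0224]
If the former has priority 10 and the latter priority 5, the user may play the latter according to the priorities, or may ignore the priorities and choose the former.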
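One possible reading of the AND, OR, and EX-OR relations as a discard rule is sketched below. The sketch rests on several assumptions:
- As in the example above, a smaller number means a higher priority.
- Streams whose priority is numerically above the threshold are discard candidates.
- EX-OR is resolved automatically by keeping the higher-priority member, although the text also lets the user choose.
The function name and data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_streams(priority: Dict[str, int], threshold: int,
                   and_pairs: List[Tuple[str, str]],
                   xor_groups: List[List[str]]) -> Set[str]:
    """Return the streams to keep (smaller number = higher priority)."""
    # Threshold rule: keep everything at or above the threshold priority.
    # OR-related streams need no extra handling: they stay discardable.
    kept = {s for s, p in priority.items() if p <= threshold}
    # AND rule: a stream related by AND to a kept stream is kept as well;
    # repeat until no further stream is added.
    changed = True
    while changed:
        changed = False
        for a, b in and_pairs:
            if (a in kept) != (b in kept):
                kept |= {a, b}
                changed = True
    # EX-OR rule: within each group, keep only the highest-priority member.
    for group in xor_groups:
        members = [s for s in group if s in kept]
        if len(members) > 1:
            best = min(members, key=lambda s: priority[s])
            kept -= set(members) - {best}
    return kept
```

With A = 5, B = 10, and a threshold of 7, B is discarded unless an AND relation ties it to A.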
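[0225]
Fig. 38 illustrates how the communication payload is constructed.
[0226]
When a stream consists of several substreams, transmission packets can be composed according to the stream priorities of the substreams, for example in descending order of priority. This makes discarding at the transmission-packet level easy. The same holds at a finer granularity: grouping the information of objects with high frame priority into one communication packet also makes discarding at the packet level easy.
[0227]
Mapping the slice structure of the image onto communication packets makes recovery from packet loss easy. When the slice structure of the moving image matches the packet structure, resync markers for resynchronization are unnecessary. If the two do not match, resync markers (marks that indicate where decoding can resume) must be added so that the decoder can resynchronize when information is lost, for example through packet loss.
[0228]
Correspondingly, stronger error protection may be applied to high-priority communication packets. The slice structure of an image refers to a unit of grouped image information, such as a GOB or an MB.
[0229]
Fig. 39 illustrates a method of mapping data onto the communication payload. The method of mapping streams and objects to communication packets is transmitted together with the control information or the data. Any data format can therefore be generated to suit the communication conditions and the application. For example, RTP (Real-time Transport Protocol) defines a payload for each coding method it handles, and the current RTP formats are fixed. For H.263, three data formats, Mode A to Mode C, are defined, as shown in the figure. No H.263 payload is defined for multi-resolution video formats.
[0230]
In the example in the figure, a Layer No. and the relation description described above (AND, OR) are added to the Mode A data format.
[0231]
Fig. 40 illustrates the correspondence among frame priority, stream priority, and communication packet priority.
[0232]
In this figure, the priority attached to communication packets on the transmission path is called the communication packet priority, and the stream priority and frame priority are mapped onto it.
[0233]
In communication over IP, the frame priority and stream priority attached to image and audio data must normally be mapped onto the priority of the lower-layer IP packets. The mapping is needed because image and audio data are split into IP packets for transmission. In the example in the figure, the stream priority takes values from 0 to 3 and the frame priority from 0 to 5, so priorities in the upper-layer data range from 0 to 15.
[0234]
In IPv6, the 4-bit priority values 0 to 7 are reserved for congestion-controlled traffic, and values 8 to 15 for real-time traffic or other traffic that is not congestion-controlled. Within that range, priority 15 is the highest and priority 8 the lowest. This is the priority at the IP packet level.
[0235]
For data transmission over IP, the upper-layer priorities 0 to 15 must therefore be mapped onto the IP priorities 8 to 15. The mapping may clip part of the upper-layer range, or it may use an evaluation function. The mapping between upper-layer and IP priorities is managed by relay nodes (routers, gateways, and so on) and by the transmitting and receiving terminals.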
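The two mapping styles can be sketched as follows, assuming that upper-layer value 0 is the highest priority (the convention used in the stream example of [0221]) while IPv6 priority 15 is the highest. Neither function is specified in the source. The linear scaling stands in for the "evaluation function" and is only one possible choice.

```python
def _check(upper: int) -> None:
    if not 0 <= upper <= 15:
        raise ValueError("upper-layer priority must be in 0..15")


def map_by_clipping(upper: int) -> int:
    """Clipping: upper 0..7 -> IPv6 15..8; every value from 7 upward maps to 8."""
    _check(upper)
    return 15 - min(upper, 7)


def map_linear(upper: int) -> int:
    """Evaluation-function style: scale upper 0..15 linearly onto IPv6 15..8."""
    _check(upper)
    return 15 - (upper * 7) // 15
```

Clipping preserves every distinction among the eight most important upper-layer levels and merges the rest. Linear scaling keeps the ordering across the whole range but merges neighbouring levels everywhere.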
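[0236]
The transmission means is not limited to IP. Transmission packets that carry a flag indicating whether they may be discarded, such as ATM cells or MPEG2 TS (transport stream) packets, may also be used.
[0237]
The frame priority and stream priority described so far can be applied to transmission media and to data recording media. A floppy disk, optical disc, or the like can serve as the data recording medium.
[0238]
The recording medium is not limited to these; any medium that can record a program, such as an IC card or ROM cassette, can be used in the same way. The invention may also be applied to image/audio relay devices, such as routers and gateways, that relay data.
[0239]
In addition, retransmission can be prioritized by choosing the time-series data to retransmit on the basis of StreamPriority (inter-time-series-data priority) and FramePriority (intra-time-series-data priority). For example, when the receiving terminal decodes according to the priority information, retransmission of streams and frames that are not being processed can be prevented.
[0240]
Apart from the priority currently being processed, the priority of the streams and frames to be retransmitted may also be determined from the ratio of retransmissions to successful transmissions.
[0241]
Likewise, the transmitting terminal can prioritize transmission by choosing the time-series data to send on the basis of StreamPriority and FramePriority. For example, if the priority of the streams and frames to be sent is determined from the average transfer rate or the number of retransmissions, video and audio can be transmitted adaptively even when the network is overloaded.
[0242]
The above embodiments are not limited to two-dimensional image compositing. A representation that combines two-dimensional and three-dimensional images may be used. The compositing method may also place several images side by side, as in a wide-field (panoramic) image.

The communication forms targeted by the present invention are not limited to wired bidirectional CATV and B-ISDN. For example, video and audio may travel from the center terminal to home terminals by radio (for example, in the VHF or UHF band) or by satellite broadcasting. Information from home terminals to the center terminal may travel over analog telephone lines or N-ISDN. Video, audio, and data need not necessarily be multiplexed. Wireless communication forms such as IrDA, PHS (Personal Handy-phone System), and wireless LAN may also be used.
[0243]
The target terminal may be a portable terminal such as a personal digital assistant, or a desktop terminal such as a set-top box or personal computer.
[0244]
As described above, the present invention makes it easy to handle multiple video streams and multiple audio streams, and to reflect the editor's intent by playing back important scene cuts preferentially in synchronization with audio.
[0245]
Embodiments of the present invention will be described below with reference to the drawings.
[0246]
The embodiments described here mainly solve one of the problems (C1) to (C3) described above.
[0247]
Fig. 41 shows the configuration of the transmitting apparatus according to the first embodiment. 2101 is an image input terminal; one image is, for example, 144 pixels high by 176 pixels wide. 2102 is a moving image encoding apparatus consisting of four components, 1021, 1022, 1023, and 1024 (see Recommendation H.261).
[0248]
The four components are as follows:
- 1021 is a switch that divides the input image into macroblocks (square regions of 16 × 16 pixels) and decides whether each block is to be intra- or inter-coded.
- 1022 is motion compensation means. It creates a motion-compensated image from the local decoded image computed from the previous encoding result, computes the difference between that image and the input image, and outputs the result per macroblock. Motion compensation comes in two kinds: half-pel motion compensation, which takes a long processing time, and full-pel motion compensation, which takes a short one.
- 1023 is orthogonal transform means that applies a DCT to each macroblock.
- 1024 is variable-length coding means that applies entropy coding to the DCT results and other coding information.
[0249]
2103 is counting means. It counts how many times each of the four components of the moving image encoding apparatus 2102 is executed and outputs the result to the conversion means for each input image. For the motion compensation means 1022, half-pel and full-pel executions are counted separately.
[0250]
2104 is conversion means that outputs a data sequence as shown in Fig. 42. 2105 is transmission means. It multiplexes the variable-length code from the moving image encoding apparatus 2102 with the data sequence from the conversion means 2104 into a single data sequence and outputs it to the data output terminal 2109.
[0251]
With the above configuration, the receiving apparatus can be told how many times each essential process (switch 1021, orthogonal transform means 1023, variable-length coding means 1024) and the non-essential process (motion compensation means 1022) were executed.
[0252]
Next, Fig. 48 is a flowchart of the transmission method according to the second embodiment.
[0253]
The operation of this embodiment resembles that of the first embodiment, so the corresponding elements are noted in parentheses. At 801 an image is input (image input terminal 2101), and at 802 it is divided into macroblocks. The conditional branch at 807 then repeats the processing from 803 to 806 until all macroblocks have been processed. Each of the processes 803 to 806 has its own counter variable, which is incremented by 1 whenever that process runs, so that its execution count is recorded.
[0254]
First, at 803, it is decided whether the current macroblock is to be intra- or inter-coded (switch 1021). If it is inter-coded, motion compensation is performed at 804 (motion compensation means 1022). The DCT and variable-length coding are then performed at 805 and 806 (orthogonal transform means 1023, variable-length coding means 1024). When all macroblocks have been processed (Yes at 807), the counter variables are read at 808. A data sequence as shown in Fig. 42 is generated from them, multiplexed with the code, and output. The processing from 801 to 808 is repeated as long as input images continue.
[0255]
With the above configuration, the execution count of each process can be transmitted.
[0256]
Next, Fig. 43 shows the configuration of the receiving apparatus according to the third embodiment.
[0257]
In the figure, 307 is an input terminal for the output of the transmitting apparatus of the first embodiment. 301 is receiving means that demultiplexes that output to extract the variable-length code and the data sequence and outputs them. It also measures the time taken to receive the data for one image and outputs that time.
[0258]
303 is a moving image decoding apparatus that takes the variable-length code as input. It consists of five components:
- 3031 is variable-length decoding means, which extracts the DCT coefficients and other coding information from the variable-length code.
- 3032 is inverse orthogonal transform means, which applies an inverse DCT to the DCT coefficients.
- 3033 is a switch that routes the output of each macroblock one way or the other according to the coding information indicating whether the macroblock was intra- or inter-coded.
- 3034 is motion compensation means. It creates a motion-compensated image from the previous decoded image and the motion coding information, adds the output of the inverse orthogonal transform means 3032, and outputs the result.
- 3035 is execution time measuring means. It measures and outputs the time from the input of the variable-length code to the decoding apparatus 303 until the image has been decoded and output.
302 is estimation means. It receives the execution count of each element (variable-length decoding means 3031, inverse orthogonal transform means 3032, switch 3033, motion compensation means 3034) from the data sequence supplied by the receiving means 301, and the execution time from the execution time measuring means 3035. From these it estimates the execution time of each element.
[0259]
Linear regression may be used for the estimation, for example, with the measured execution time as the dependent variable y and the execution counts of the elements as the explanatory variables x_i. The regression parameters a_i can then be regarded as the execution times of the elements. Linear regression, however, must store a large amount of past data and so consumes a lot of memory. If this is undesirable, the internal state can be estimated with a Kalman filter instead. The observation is then the measured execution time, the internal state variables are the execution times of the elements, and the observation matrix C consists of the execution counts of the elements, which change at every step. 304 is count reduction means. It changes the execution counts of the elements so as to reduce the number of half-pel motion compensations and increase the number of full-pel motion compensations by the same amount. This amount is calculated as follows.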
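The Kalman-filter alternative can be sketched as below. It assumes that the per-element times stay constant between frames, so no process noise is modelled, and that the measurement noise is a single scalar. The class name and its default variances are hypothetical. Each `update` call corresponds to one frame: C is the row of execution counts, and the observation is the measured total time.

```python
from typing import List


class ExecTimeEstimator:
    """Kalman-filter estimate of per-element execution times a_i.
    Observation model per frame: y = sum_i x_i * a_i + v, with C = [x_1 .. x_n]."""

    def __init__(self, n: int, init_time: float = 1.0,
                 init_var: float = 1e3, obs_var: float = 1.0) -> None:
        self.a = [init_time] * n                       # state estimate
        self.P = [[init_var if i == j else 0.0 for j in range(n)]
                  for i in range(n)]                   # state covariance
        self.r = obs_var                               # measurement noise variance

    def predict(self, counts: List[float]) -> float:
        """Predicted total execution time C a for the given counts."""
        return sum(c * a for c, a in zip(counts, self.a))

    def update(self, counts: List[float], measured: float) -> List[float]:
        n = len(self.a)
        pc = [sum(self.P[i][j] * counts[j] for j in range(n)) for i in range(n)]  # P C^T
        s = sum(counts[i] * pc[i] for i in range(n)) + self.r                    # C P C^T + R
        k = [v / s for v in pc]                                                   # Kalman gain
        innovation = measured - self.predict(counts)
        self.a = [a + ki * innovation for a, ki in zip(self.a, k)]
        # P <- P - K (C P); P is symmetric, so C P equals (P C^T)^T.
        self.P = [[self.P[i][j] - k[i] * pc[j] for j in range(n)] for i in range(n)]
        return self.a
```

Only an n × n covariance is kept, rather than the whole history of frames. This is the memory advantage the text cites over batch regression.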
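[0260]
First, the count reduction means receives the execution count and the estimated execution time of each element from the estimation means 302 and predicts the total execution time. If this prediction exceeds the time the receiving means 301 took to receive the data, full-pel motion compensations are increased and half-pel motion compensations decreased until it no longer does. 306 is the output terminal for the decoded image.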
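The count adjustment can be computed directly rather than step by step. The following is a minimal sketch. The dictionary keys, including `half` and `full` for the two kinds of motion compensation, are hypothetical names, and one half-pel execution is assumed to be traded for exactly one full-pel execution.

```python
import math
from typing import Dict


def reduce_half_pel(counts: Dict[str, int], times: Dict[str, float],
                    budget: float) -> Dict[str, int]:
    """Convert half-pel MCs into full-pel MCs, one for one, until the
    predicted time sum(counts[k] * times[k]) no longer exceeds budget."""
    predicted = sum(counts[k] * times[k] for k in counts)
    saving = times["half"] - times["full"]      # time saved per conversion
    out = dict(counts)
    if predicted <= budget or saving <= 0:
        return out
    moved = min(math.ceil((predicted - budget) / saving), counts["half"])
    out["half"] -= moved
    out["full"] += moved
    return out
```

If even converting every half-pel compensation is not enough, the sketch converts them all and stops. The remaining essential processes cannot be reduced this way.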
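[0261]
The motion compensation means 3034 may be instructed by the coding information to perform half-pel motion compensation after the permitted number of half-pel compensations has already been reached. In that case it rounds the half-pel motion to full-pel motion and performs full-pel motion compensation.
[0262]
In the first and third embodiments described above, the execution time of the decoding process is predicted from the estimated execution time of each element. The designated time is the time taken to receive the data for one image. If the prediction would exceed the designated time, the time-consuming half-pel motion compensation is replaced by full-pel motion compensation. This keeps the execution time within the designated time and solves problem (C1).
[0263]
The processing time of the IDCT in the receiving apparatus can also be reduced by not using the high-frequency components. That is, the low-frequency part of the IDCT may be treated as an essential process and the high-frequency part as a non-essential process, and the number of high-frequency computations in the IDCT may be reduced.
[0264]
Next, Fig. 49 is a flowchart of the receiving method according to the fourth embodiment.
[0265]
The operation of this embodiment resembles that of the third embodiment, so the corresponding elements are noted in parentheses. The steps are:
- 901: the variables a_i, representing the execution time of each element, are initialized (estimation means 302).
- 902: the multiplexed data is input and the time this takes is measured (receiving means 301).
- 903: the multiplexed data is separated into the variable-length code and the data sequence, which are output (receiving means 301).
- 904: the execution counts are extracted from the data sequence (Fig. 42) and set in x_i.
- 905: the actual execution counts are calculated from the execution times a_i and the execution counts x_i (count reduction means 304).
- 906 to 908: measurement of the decoding time starts at 906, the decoding routine described below runs at 907, and measurement ends at 908 (moving image decoding apparatus 303, execution time measuring means 3035).
- 909: the execution time of each element is estimated from the decoding time measured at 908 and the actual execution counts from 905, and a_i is updated (estimation means 302).
This processing is performed for each piece of input multiplexed data.
[0266]
In the decoding routine 907:
- 910: variable-length decoding is performed (variable-length decoding means 3031).
- 911: the inverse orthogonal transform is performed (inverse orthogonal transform means 3032).
- 912: the process branches on the intra/inter information extracted at 910 (switch 3033).
- 913: for inter-coded macroblocks, motion compensation is performed (motion compensation means 3034). Half-pel motion compensations are counted here, and once the count exceeds the actual execution count obtained at 905, full-pel motion compensation is used instead.
The routine ends once all macroblocks have been processed (step 914).
[0267]
In the second and fourth embodiments described above, the execution time of the decoding process is predicted from the estimated execution time of each element. If the prediction would exceed the designated time (the time taken to receive the data for one image), the time-consuming half-pel motion compensation is replaced by full-pel motion compensation. This keeps the execution time within the designated time and solves problem (C1).
[0268]
Next, Fig. 44 shows the configuration of the receiving apparatus according to the fifth embodiment.
[0269]
Most components of this embodiment are the same as those described in the third embodiment. Only two components are added and one is modified, so only these are described.
[0270]
402 is the estimation means 302 of the third embodiment, modified to output the execution time estimated for each element in addition to its output to the count reduction means 304. 408 is transmission means that generates a data sequence as shown in Fig. 45 from the execution times of the elements and outputs it. With execution times expressed in microseconds in 16 bits, values up to about 65 milliseconds can be represented, which should be sufficient. 409 is an output terminal that passes this data sequence to the transmission means.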
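The 16-bit microsecond encoding can be checked with Python's `struct` module. The exact layout of Fig. 45 is not reproduced in the text, so this sketch simply assumes one unsigned big-endian 16-bit field per element, in a fixed element order, and saturates values above 65,535 µs.

```python
import struct
from typing import List

MAX_US = 0xFFFF  # 65,535 microseconds, i.e. about 65.5 ms


def pack_exec_times(times_us: List[float]) -> bytes:
    """Pack each element's estimated execution time as an unsigned 16-bit
    big-endian count of microseconds, clamped to the range 0..65,535."""
    clipped = [min(max(int(round(t)), 0), MAX_US) for t in times_us]
    return struct.pack(f">{len(clipped)}H", *clipped)


def unpack_exec_times(data: bytes) -> List[int]:
    """Inverse of pack_exec_times: two bytes per element."""
    return list(struct.unpack(f">{len(data) // 2}H", data))
```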
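[0271]
The receiving method corresponding to this fifth embodiment may be obtained by adding, immediately after 808 in Fig. 48, a step that generates a data sequence as shown in Fig. 45.
[0272]
Next, Fig. 46 shows the configuration of the transmitting apparatus according to the sixth embodiment.
[0273]
Most components of this embodiment are the same as those described in the first embodiment. Only two components are added, so only these are described. 606 is an input terminal for the data sequence output by the receiving apparatus of the fifth embodiment. 607 is receiving means that receives this data sequence and outputs the execution time of each element. 608 is determination means that obtains the execution count of each element as follows:
1. All macroblocks in the image are first processed by the switch 1021, which gives the execution count of the switch 1021 at that point. The subsequent execution counts of the motion compensation means 1022, the orthogonal transform means 1023, and the variable-length coding means 1024 are uniquely determined by the results so far.
2. These execution counts and the execution times from the receiving means 607 are used to predict the decoding time on the receiving side. The predicted decoding time is the sum, over the elements, of each element's execution time multiplied by its execution count.
3. The rate controller or a similar unit specifies the amount of code to be generated for the current image (for example, 16 kbits). If the predicted decoding time is equal to or greater than the time needed to transmit that amount (for example, 250 msec at 64 kbit/s), full-pel motion compensations are increased and half-pel motion compensations decreased, so that the decoding time does not exceed the transmission time. Half-pel motion compensation takes longer to execute, so reducing its count reduces the execution time.
[0274]
The moving image encoding apparatus 2102 performs each process according to the execution counts specified by the determination means 608. For example, once the motion compensation means 1022 has performed the specified number of half-pel motion compensations, it performs only full-pel motion compensation from then on.
[0275]
The selection may also be arranged so that half-pel motion compensations are spread evenly over the image. For example, all macroblocks that require half-pel motion compensation are found first. The quotient of their number (for example, 12) divided by the permitted number of half-pel compensations (for example, 4) is calculated, giving 3. Half-pel motion compensation is then applied only to those macroblocks whose position, counted from 0 among the macroblocks that require it, is divisible by this quotient (0, 3, 6, 9).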
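The even-spreading rule is sketched below. The function name is hypothetical, and so is the `[:allowed]` cap. The cap handles the case the text does not cover, where the candidate count is not a multiple of the permitted count and the divisibility rule alone would select one extra macroblock.

```python
from typing import List


def spread_half_pel(candidates: List[int], allowed: int) -> List[int]:
    """candidates: indices of macroblocks requesting half-pel MC, in scan order.
    Keep the k-th candidate (counting from 0) when k % q == 0, with
    q = len(candidates) // allowed; the others fall back to full-pel MC."""
    if allowed <= 0:
        return []
    if allowed >= len(candidates):
        return list(candidates)
    q = len(candidates) // allowed
    return [mb for k, mb in enumerate(candidates) if k % q == 0][:allowed]
```

With 12 candidates and 4 permitted, q is 3 and the selected positions are 0, 3, 6, and 9, as in the example.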
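[0276]
In the fifth and sixth embodiments described above, the estimated execution time of each element is sent to the transmitting side, which predicts the decoding time. The designated time is the time that will be needed to receive the data for one image. The transmitting side replaces the time-consuming half-pel motion compensation with full-pel motion compensation so that the predicted decoding time does not exceed the designated time. As a result, the execution time stays within the designated time without discarding any half-pel motion compensation information from the transmitted coding information, which solves problem (C2).
[0277]
In the non-essential process, inter-macroblock coding may be divided into three kinds: ordinary motion compensation, 8×8 motion compensation, and overlapped motion compensation.
[0278]
Next, Fig. 50 is a flowchart of the transmission method according to the seventh embodiment.
[0279]
The operation of this embodiment resembles that of the sixth embodiment, so the corresponding elements are noted in parentheses. At 1001, initial values of the execution times of the processes are set. At 801 an image is input (input terminal 2101), and at 802 it is divided into macroblocks. At 1002, it is decided for every macroblock whether it is to be intra- or inter-coded (switch 1021). This determines the execution count of each process from 1005 to 806. At 1003, the actual execution counts are calculated from these counts and the execution time of each process (determination means 608).
[0280]
The conditional branch at 807 then repeats the processing from 1005 to 806 until all macroblocks have been processed.
[0281]
Each of the processes 1005 to 806 has its own counter variable, which is incremented by 1 whenever that process runs, so that its execution count is recorded. First, at 1005, the process branches on the decision made at 1002 (switch 1021). If the macroblock is inter-coded, motion compensation is performed at 804 (motion compensation means 1022). Half-pel motion compensations are counted here, and once the count exceeds the actual execution count obtained at 1003, full-pel motion compensation is used instead. The DCT and variable-length coding are then performed at 805 and 806 (orthogonal transform means 1023, variable-length coding means 1024). When all macroblocks have been processed (Yes at 807), the counter variables are read at 808. A data sequence as shown in Fig. 42 is generated from them, multiplexed with the code, and output. At 1004, the data sequence from the receiving side is received, and the execution time of each process is extracted from it and set.
[0282]
The processing from 801 to 1004 is repeated as long as input images continue.
[0283]
The same effect is obtained by the last paragraph of the description of the fifth embodiment (the receiving method corresponding to the fifth embodiment) together with the seventh embodiment. The estimated execution time of each element is sent to the transmitting side, which predicts the decoding time. Half-pel motion compensation is replaced by full-pel motion compensation so that this time does not exceed the time that will be needed to receive the data for one image (the designated time). As a result, the execution time stays within the designated time without discarding any half-pel motion compensation information from the transmitted coding information, which solves problem (C2).
[0284]
Next, Fig. 47 shows the configuration of the transmitting apparatus according to the eighth embodiment.
[0285]
Most components of this embodiment are the same as those described in the first embodiment. Only four components are added, so only these are described.
[0286]
The added components are:
- 7010, execution time measuring means. It measures and outputs the time from the input of an image to the encoding apparatus 2102 until the image has been encoded and the code output.
- 706, estimation means. It receives the execution count of each element (switch 1021, motion compensation means 1022, orthogonal transform means 1023, variable-length coding means 1024) from the data sequence of the counting means 2103, and the execution time from the execution time measuring means 7010. From these it estimates the execution time of each element, for example by the same method as the estimation means 302 of the third embodiment.
- 707, an input terminal through which the user enters a frame rate.
- 708, determination means that obtains the execution count of each element by the following procedure.
[0287]
First, all macroblocks in the image are processed by the switch 1021, which gives the execution count of the switch 1021 at that point. The subsequent execution counts of the motion compensation means 1022, the orthogonal transform means 1023, and the variable-length coding means 1024 are uniquely determined by the results so far. Next, the predicted encoding time is calculated as the sum, over the elements, of each execution count multiplied by the execution time estimated for that element by the estimation means 706. The time available for encoding one image is the reciprocal of the frame rate entered at 707. If the predicted encoding time is equal to or greater than this available time, full-pel motion compensations are increased and half-pel motion compensations decreased.
[0288]
This adjustment and the recalculation of the predicted encoding time are repeated until the predicted encoding time is equal to or less than the available time, which fixes the execution counts.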
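The iterative form of the procedure, with the per-frame budget taken as 1/frame rate, can be sketched as follows. The `half` and `full` keys are hypothetical names. Times are assumed to be in seconds, and each iteration trades one half-pel execution for one full-pel execution.

```python
from typing import Dict


def fit_to_frame_rate(counts: Dict[str, int], times: Dict[str, float],
                      frame_rate: float) -> Dict[str, int]:
    """Repeat: predict encoding time = sum(count * time); while it exceeds
    the per-frame budget 1 / frame_rate, move one MC from half-pel to full-pel."""
    budget = 1.0 / frame_rate
    out = dict(counts)

    def predicted() -> float:
        return sum(out[k] * times[k] for k in out)

    while predicted() > budget and out["half"] > 0:
        out["half"] -= 1
        out["full"] += 1
    return out
```

The loop mirrors the repeat-until-within-budget description. A direct calculation, like the one used on the receiving side, would give the same counts with fewer evaluations.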
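[0289]
The moving image encoding apparatus 2102 performs each process according to the execution counts specified by the determination means 708. For example, once the motion compensation means 1022 has performed the specified number of half-pel motion compensations, it performs only full-pel motion compensation from then on.
[0290]
The selection may also be arranged so that half-pel motion compensations are spread evenly over the image. For example, all macroblocks that require half-pel motion compensation are found first. The quotient of their number (for example, 12) divided by the permitted number of half-pel compensations (for example, 4) is calculated, giving 3. Half-pel motion compensation is then applied only to those macroblocks whose position, counted from 0 among the macroblocks that require it, is divisible by this quotient (0, 3, 6, 9).
[0291]
In the eighth embodiment described above, the execution time of each process is estimated, and the time required for encoding is predicted in advance from these estimates. The execution counts are then chosen so that the predicted encoding time does not exceed the time available for encoding one image, as set by the frame rate. This solves problem (C3).
[0292]
To detect motion vectors, the motion compensation means 1022 may use the full-search method. It examines every vector within ±15 pixels horizontally and vertically and finds the one with the smallest SAD (sum of absolute pixel differences). Another option is the three-step motion vector detection method (described in an annex of H.261). In this method, nine evenly spaced points are selected within the search range and the point with the smallest SAD is chosen. Nine points are then selected again in a narrower range around that point, and the point with the smallest SAD is chosen. Repeating this once more completes the three-step method.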
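A compact three-step search is sketched below in plain Python. The step schedule (4, 2, 1), which covers ±7 pixels, is the textbook form of the method and is an assumption here; the text gives only ±15 pixels for the full search. Frames are lists of rows. Candidate blocks that would leave the reference frame get an infinite SAD, so they are never chosen. Function names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> float:
    """SAD between the n x n block at (bx, by) in cur and the block displaced
    by (dx, dy) in ref; infinity if the displaced block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
        return float("inf")
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def three_step_search(cur: Frame, ref: Frame, bx: int, by: int, n: int = 16,
                      steps: Sequence[int] = (4, 2, 1)) -> Tuple[int, int]:
    """At each step, test the 9 points (centre plus 8 neighbours at distance
    `step`) around the current best vector and move to the one with minimum SAD."""
    best = (0, 0)
    for step in steps:
        candidates = [(best[0] + sx * step, best[1] + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        best = min(candidates, key=lambda v: sad(cur, ref, bx, by, v[0], v[1], n))
    return best
```

The search evaluates 27 SADs, against 961 for a ±15 full search. Unlike the full search, however, it can settle on a local minimum, which is why it trades quality for execution time.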
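[0293]
These two methods may be treated as non-essential processes. Their execution times are estimated, and the time required for encoding is predicted from these estimates. Where necessary, the full-search method is then run fewer times and the three-step method more times, so that the predicted execution time does not exceed the time specified by the user.
[0294]
Besides the three-step method, other methods may also be used alongside: a still simpler motion vector detection method with a fixed number of searches, or a method that always returns the (0, 0) motion vector.
[0295]
Next, Fig. 51 is a flowchart of the transmission method according to the ninth embodiment.
[0296]
The operation of this embodiment resembles that of the eighth embodiment, so the corresponding elements are noted in parentheses; see their descriptions for details of each step. The flow is also almost the same as that of the second embodiment, so only the differences are described.
[0297]
The differing steps are:
- 1101: initial values of the execution times of the processes are set in the variables a_i.
- 1102: the frame rate is input (input terminal 707).
- 1103: the actual execution counts are determined (determination means 708) from the frame rate entered at 1102, the execution times a_i, and the execution count of each process given by the intra/inter decisions at 1002.
- 1105 and 1106: the execution time of the encoding process is measured.
- 1104: the execution time of each process is estimated from the time measured at 1106 and the actual execution counts, and the variables a_i are updated (estimation means 706).
[0298]
In the ninth embodiment described above, the execution time of each process is estimated, and the time required for encoding is predicted in advance from these estimates. The execution counts are then chosen so that the predicted encoding time does not exceed the time available for encoding one image, as set by the frame rate. This solves problem (C3).
[0299]
In the second embodiment, when the data sequence is generated at 808, a 2-byte field may be added immediately after the start code shown in Fig. 42 to hold the code length in binary.
[0300]
Further, in the fourth embodiment, the code length may be read from this 2-byte field when the multiplexed data is input at 902. The code transmission time, obtained from this length and the code transmission rate, may then be used in calculating the execution counts at 905: half-pel motion compensations are reduced so that the code transmission time is not exceeded.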
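The 2-byte length field and the resulting time budget are sketched below. The start code value is a hypothetical placeholder, since the text does not give it. The length is also assumed to be counted in bytes; the text does not state the unit. With 2,000 bytes (16 kbits) at 64 kbit/s, the budget comes out at the 250 ms used in the sixth embodiment's example.

```python
import struct
from typing import Tuple

START_CODE = b"\x00\x00\x01\xb0"  # hypothetical placeholder value


def build_sequence(counts_payload: bytes, code_len: int) -> bytes:
    """Start code, then a 2-byte big-endian code length, then the count data."""
    if not 0 <= code_len <= 0xFFFF:
        raise ValueError("code length does not fit in 2 bytes")
    return START_CODE + struct.pack(">H", code_len) + counts_payload


def parse_code_length(seq: bytes) -> Tuple[int, bytes]:
    """Return (code length, remaining count data) from a received sequence."""
    if not seq.startswith(START_CODE):
        raise ValueError("missing start code")
    (code_len,) = struct.unpack_from(">H", seq, len(START_CODE))
    return code_len, seq[len(START_CODE) + 2:]


def transmission_time(code_len_bytes: int, rate_bps: float) -> float:
    """Seconds needed to send the code: the decoding-time budget."""
    return code_len_bytes * 8 / rate_bps
```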
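[0301]
In the first embodiment, when the data sequence is generated at 2104, a 2-byte field may be added immediately after the start code shown in Fig. 42 to hold the code length in binary.
[0302]
Further, in the third embodiment, the code length may be read from this 2-byte field when the multiplexed data is input at 301. The code transmission time, obtained from this length and the code transmission rate, may then be used in calculating the execution counts at 304: half-pel motion compensations are reduced so that the code transmission time is not exceeded.
[0303]
In the fourth embodiment, the actual number of half-pel motion compensations may be recorded immediately after 909 and its maximum value calculated. If this maximum is at or below a sufficiently small value (for example, 2 or 3), a data sequence indicating that half-pel motion compensation is not used (a data sequence consisting of a specific bit pattern) may be generated and transmitted. In the second embodiment, the transmitting side may then check immediately after 808 whether this data sequence has been received. If it has, the motion compensation at 804 may always be full-pel motion compensation.
[0304]
The same idea can be applied to processes other than motion compensation. For example, the processing time of the DCT can be reduced by not using the high-frequency components. In the receiving method, if the IDCT accounts for more than a certain fraction of the total execution time, a data sequence reporting this is sent to the transmitting side. On receiving it, the transmitting side may compute only the low-frequency components in the DCT and set all high-frequency components to 0.
[0305]
Although the embodiments have been described using images, each of the above methods may also be applied to data other than images, such as audio.
[0306]
In the third embodiment, the actual number of half-pel motion compensations at 3034 may likewise be recorded and its maximum value calculated. If this maximum is at or below a sufficiently small value (for example, 2 or 3), a data sequence indicating that half-pel motion compensation is not used (a data sequence consisting of a specific bit pattern) may be generated and transmitted. In the first embodiment, if this data sequence is received, the motion compensation at 1022 may always be full-pel motion compensation.
[0307]
The same idea can be applied to processes other than motion compensation. For example, the processing time of the DCT can be reduced by not using the high-frequency components. In the receiving method, if the IDCT accounts for more than a certain fraction of the total execution time, a data sequence reporting this is sent to the transmitting side.
[0308]
On receiving this data sequence, the transmitting side may compute only the low-frequency components in the DCT and set all high-frequency components to 0.
[0309]
Although the embodiments have been described using images, the above methods may also be applied to data other than images, such as audio.
[0310]
As is clear from the above description, in the first and third embodiments, for example, the execution time of the decoding process is predicted from the estimated execution time of each element. If the prediction would exceed the time taken to receive the data for one image (the designated time), the time-consuming half-pel motion compensation is replaced by full-pel motion compensation. This keeps the execution time within the designated time and solves problem (C1).
[0311]
In the fifth and seventh embodiments, for example, the estimated execution time of each element is sent to the transmitting side, which predicts the decoding time. Half-pel motion compensation is replaced by full-pel motion compensation so that this time does not exceed the time that will be needed to receive the data for one image (the designated time). As a result, the execution time stays within the designated time without discarding any half-pel motion compensation information from the transmitted coding information, which solves problem (C2).
[0312]
In the ninth embodiment, for example, the execution time of each process is estimated, and the time required for encoding is predicted in advance from these estimates. The execution counts are then chosen so that the predicted encoding time does not exceed the time available for encoding one image, as set by the frame rate. This solves problem (C3).
[0313]
The present invention thus realizes a function that degrades quality gracefully as the computational load increases (CGD: Computational Graceful Degradation), and its practical benefits are very great.
[0314]
A program may be written that causes a computer to execute all or some of the steps (or the operations of all or some of the means) of any of the above embodiments. This program may be recorded on a recording medium, such as a magnetic or optical recording medium, and a computer may use that medium to perform the same operations as described above.
[0315]
[Effect of the Invention]
As described above, the present invention makes it easy, for example, to handle multiple video streams and multiple audio streams, and to reflect the editor's intent by playing back important scene cuts preferentially in synchronization with audio.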
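[Brief Description of the Drawings]
[Fig. 1] Schematic configuration diagram of the image/audio transmitting and receiving apparatus according to an embodiment of the present invention
[Fig. 2] Diagram showing the reception management unit and the separation unit
[Fig. 3] Diagram showing a method of transmitting and controlling images and audio using a plurality of logical transmission paths
[Fig. 4] Diagram showing a method of dynamically changing the header information added to image and audio data to be transmitted
[Fig. 5] (a) to (b): Diagrams showing methods of adding AL information
[Fig. 6] (a) to (d): Diagrams showing examples of methods of adding AL information
[Fig. 7] Diagram showing a method of transmitting information by dynamically multiplexing and demultiplexing a plurality of logical transmission paths
[Fig. 8] Diagram showing the transmission procedure of a broadcast program
[Fig. 9] (a): Diagram showing a method of transmitting images and audio that takes into account the time required to load and start programs and data, when the programs and data are already in the receiving terminal
(b): Diagram showing a method of transmitting images and audio that takes into account the time required to load and start programs and data, when the programs and data are transmitted
[Fig. 10] (a) to (b): Diagrams showing how zapping is handled
[Fig. 11] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 12] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 13] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 14] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 15] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 16] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 17] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 18] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 19] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 20] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 21] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 22] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 23] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 24] Diagram showing a concrete example of the protocol actually exchanged between terminals
[Fig. 25] (a) to (b): Configuration diagrams of the CGD demonstration system of the present invention
[Fig. 26] Configuration diagram of the CGD demonstration system of the present invention
[Fig. 27] Diagram showing how the encoder adds priority for overload
[Fig. 28] Diagram describing how the receiving terminal determines priority under overload
[Fig. 29] Diagram showing changes in priority over time
[Fig. 30] Diagram showing stream priority and object priority
[Fig. 31] Schematic configuration diagram of the image encoding and image decoding apparatus according to an embodiment of the present invention
[Fig. 32] Schematic configuration diagram of the audio encoding and audio decoding apparatus according to an embodiment of the present invention
[Fig. 33] (a) to (b): Diagrams showing the priority adding unit and the priority determination unit that manage processing priority under overload
[Fig. 34] Diagram showing the granularity at which priority is added
[Fig. 35] Diagram showing the granularity at which priority is added
[Fig. 36] Diagram showing the granularity at which priority is added
[Fig. 37] Diagram showing a method of assigning priorities to multi-resolution image data
[Fig. 38] Diagram showing how the communication payload is constructed
[Fig. 39] Diagram showing a method of mapping data onto the communication payload
[Fig. 40] Diagram showing the correspondence among object priority, stream priority, and communication packet priority
[Fig. 41] Configuration diagram of the transmitting apparatus according to the first embodiment of the present invention
[Fig. 42] Explanatory diagram of the first embodiment
[Fig. 43] Configuration diagram of the receiving apparatus according to the third embodiment of the present invention
[Fig. 44] Configuration diagram of the receiving apparatus according to the fifth embodiment of the present invention
[Fig. 45] Explanatory diagram of the fifth embodiment
[Fig. 46] Configuration diagram of the transmitting apparatus according to the sixth embodiment of the present invention
[Fig. 47] Configuration diagram of the transmitting apparatus according to the eighth embodiment of the present invention
[Fig. 48] Flowchart of the transmission method according to the second embodiment of the present invention
[Fig. 49] Flowchart of the receiving method according to the fourth embodiment of the present invention
[Fig. 50] Flowchart of the transmission method according to the seventh embodiment of the present invention
[Fig. 51] Flowchart of the transmission method according to the ninth embodiment of the present invention
[Fig. 52] Configuration diagram showing an example of the image/audio transmitting apparatus of the present invention
[Fig. 53] Configuration diagram showing an example of the image/audio receiving apparatus of the present invention
[Fig. 54] Diagram illustrating the priority adding means of the image/audio transmitting apparatus of the present invention, which adds priorities to video and audio
[Fig. 55] Diagram illustrating the priority determination means of the image/audio receiving apparatus of the present invention, which interprets the priorities added to video and audio and decides whether to decode them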
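[Explanation of Reference Numerals]
11 Reception management unit
12 Separation unit
13 Transmission unit
14 Image decompression unit
15 Image decompression management unit
16 Image synthesis unit
17 Output unit
18 Terminal control unit
301 Receiving means
302 Estimation means
303 Moving image decoding apparatus
304 Count reduction means
306 Output terminal
307 Input terminal
3031 Variable-length decoding means
3032 Inverse orthogonal transform means
3033 Switch
3034 Motion compensation means
3035 Execution time measuring means
4011 Transmission management unit
4012 Image encoding unit
4013 Reception management unit
4014 Image decoding unit
4015 Image synthesis unit
4016 Output unit
4101 Image encoding apparatus
4102 Image decoding apparatus
[0001]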
BACKGROUND OF THE INVENTION
The present invention relates to a data processing apparatus and a data processing method in the fields of communication and broadcasting.
[0002]
[Prior art]
Conventionally, there has been a system that aims at realistic video communication. A person image is extracted, for example, from an image of the scene around the user. This image, a person image sent from the other party, and a pre-stored image of a virtual space displayed in common with the other party are then superimposed and displayed. The user thus feels that the other party is actually present in front of them (Japanese Examined Patent Publication No. 4-24914).
[0003]
In particular, prior-art inventions have addressed methods of speeding up image synthesis and of reducing memory use (for example, Japanese Examined Patent Publication No. 5-46592: Image synthesis apparatus).
[0004]
[Problems to be solved by the invention]
Such conventional techniques have proposed communication systems that use image synthesis to combine two-dimensional still images or three-dimensional CG data. However, there has been no specific discussion, from the following viewpoints, of how to build a system that synthesizes and displays multiple moving images and audio streams simultaneously.
[0005]
That is, there was no specific discussion from the following viewpoints:
(A1) transmission (communication and broadcasting) of images and audio, and its control method, in an environment where data and control information (information for controlling processing on the terminal side, sent in packets separate from the data) are transmitted independently over a plurality of logical transmission paths constructed in software on one or more physical transmission paths;
(A2) a method of dynamically changing the header information added to image and audio data to be transmitted (corresponding to the data management information of the present invention);
(A3) a method of dynamically changing the header information added for transmission (corresponding to the transmission management information of the present invention);
(A4) a method of transmitting information by dynamically multiplexing and demultiplexing a plurality of logical transmission paths;
(A5) a method of transmitting images and audio that takes into account the time needed to load and start up programs and data; and
(A6) a method of transmitting images and audio that takes zapping into account.
[0006]
On the other hand, two methods have been proposed for dynamically adjusting the amount of data sent to the network: changing the encoding method, and discarding data frame by frame according to the video frame type (Hiroshi Jinzenji, Tetsuo Tajiri, "A Study of a Distributed Adaptive VOD System," D-81, IEICE Systems Society Conference (1995)).
[0007]
As a method of adjusting the amount of processing on the encoder side, a dynamic complexity-scalable algorithm has been proposed that can provide high-quality video under processing-time constraints (Fumisuke Osako, Yoshiyuki Yajima, Hiroshi Kodera, Hiroshi Watanabe, Kazunori Shimamura: "Software Image Coding with a Dynamic Complexity-Scalable Algorithm," IEICE Transactions D-2, Vol. 80-D-2, No. 2, pp. 444-458 (1997)).
[0008]
An example of synchronized playback of moving images and audio is the MPEG1/MPEG2 system.
[0009]
These conventional techniques have the following problems.
(B1) In the conventional method of discarding video according to the video frame type, the granularity of information that can be handled is limited to a single stream. It is therefore difficult to handle multiple video streams and multiple audio streams, or to reflect the editor's intent by playing back important scene cuts preferentially in synchronization with audio.
(B2) Because MPEG1/MPEG2 assumes a hardware implementation, the decoder is assumed to be able to decode every bitstream it is given. As a result, what to do when the decoder's processing capacity is exceeded is left undefined.
[0010]
On the other hand, conventional video transmission using H.261 (ITU-T Recommendation H.261, "Video codec for audiovisual services at p x 64 kbit/s") and similar standards has been implemented in hardware. Because the upper limit of the required performance is taken into account when the hardware is designed, decoding never failed to finish within the designated time.
[0011]
Here, the designated time is the time required to transmit the bitstream obtained by encoding one image. If decoding cannot be completed within this time, the excess becomes delay. As this delay grows and accumulates, the delay between the transmitting and receiving sides increases and the system becomes unsuitable for use as a videophone. This situation must be avoided.
[0012]
Moving images also cannot be transmitted when the communication partner generates a non-standard bitstream that cannot be decoded within the designated time.
[0013]
This problem arises not only with moving images but also with audio data.
[0014]
In recent years, however, the network environment of personal computers (PCs) has developed with the spread of the Internet and ISDN. Transmission speeds have increased, and moving images can now be transmitted using PCs and networks. User demand for moving image transmission is also growing. In addition, improved CPU performance has made it possible to decode moving images adequately in software.
[0015]
However, the same software can run on personal computers with different configurations (CPU type, bus width, presence of accelerators, and so on). It is therefore difficult to take the upper limit of the required performance into account in advance, and images sometimes cannot be decoded within the designated time.
[0016]
In addition, when encoded moving-image data too long for the processing capability of the receiving apparatus is transmitted, decoding within the designated time becomes impossible.
[0017]
Problem (C1): images must be decoded within the designated time so that delay is kept small.
[0018]
Further, one way of solving problem C1 is, for example, to leave part of the transmitted bitstream unused when a moving image is input as waveform data. The effective utilization of the transmission path then remains poor. In addition, some coding methods generate the current decoded image from the previous decoded image (as with P pictures). Because the previous decoded image may not be completely restored, image quality degradation spreads over time.
[0019]
Problem (C2): the effective utilization of the transmission path is poor, and image quality degradation spreads.
[0020]
In addition, in a software implementation the frame rate is determined by the time one encoding pass takes. If the frame rate specified by the user exceeded the processing limit of the computer, the specification could not be met.
[0021]
Problem (C3): if the frame rate specified by the user exceeds the processing limit of the computer, the specification cannot be met.
[0022]
SUMMARY OF THE INVENTION An object of the present invention, in view of problems (B1) and (B2) of the second prior art, is to provide a data processing apparatus and a data processing method that solve at least one of these problems.
[0023]
[Means for Solving the Problems]
The present invention according to claim 1 is a data processing apparatus comprising receiving means and processing means.

The receiving means accepts a data sequence that includes:
(1) time-series data of audio or moving images;
(2) an inter-time-series-data priority indicating the processing priority among the time-series data; and
(3) a frame type indicating whether each frame of the moving-image time-series data is at least an I frame or a P frame, together with an intra-time-series-data priority, distinct from the frame type, indicating the processing priority of that frame.

The processing means allocates processing capacity to each time-series data according to the inter-time-series-data priority. It processes the divided data within each time-series data while adaptively changing a threshold on the intra-time-series-data priority, so that each time-series data stays within its allocated processing capacity.
[0024]
The present invention according to claim 2 is a data processing method comprising: inputting a data sequence that includes (1) time-series data of audio or moving images, (2) a priority between time-series data indicating the processing priority among the time-series data, and (3) a frame type indicating at least whether each frame constituting the moving-image time-series data is an I frame or a P frame, together with a priority within time-series data that is distinct from the frame type and indicates the processing priority of the frame; allocating processing capacity to each time-series data according to the priority between time-series data; and processing the divided data within each time-series data while adaptively changing the threshold of the priority within time-series data so that each time-series data stays within its allocated processing capacity.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0026]
The embodiments described here mainly address one or more of problems (A1) to (A6) described above.
[0027]
The “image” used in the present invention includes both still images and moving images. The target image may be three-dimensional image data composed of a two-dimensional image such as computer graphics (CG) and a wire frame model.
[0028]
FIG. 1 is a schematic configuration diagram of an audio / video transmission / reception apparatus according to an embodiment of the present invention.
[0029]
In the figure, the reception management unit 11, which receives information, and the transmission unit 13, which transmits information, are means for transmitting information over media such as coaxial cable, CATV, a LAN, or a modem. The communication environment may be one, such as the Internet, in which a plurality of logical transmission paths can be used without being aware of the multiplexing means, or one, such as analog telephone lines or satellite broadcasting, in which the multiplexing means must be taken into account.
[0030]
Terminal connection modes include a mode in which video and audio are transmitted and received bidirectionally between terminals, as in a videophone or video conference system, and a broadcast mode in which video and audio are broadcast over satellite broadcasting, CATV, or the Internet. The present invention considers both of these terminal connection modes.
[0031]
The separation unit 12 shown in FIG. 1 analyzes the received information and separates data from control information. Specifically, it separates the transmission header information added to the data for transmission from the data, or separates the data control header added to the data itself from the data. The image decompression unit 14 decompresses the received image. The received image may or may not be compressed in a standardized moving-image or still-image format such as H.261, H.263, MPEG1/2, or JPEG.
[0032]
The image expansion management unit 15 shown in FIG. 1 monitors the image expansion state. For example, by monitoring the expansion status, when the reception buffer is about to overflow, the buffer can be read out without expanding the images (an idle read), and image expansion can then be resumed from the point at which an image can again be read.
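The overflow handling above can be sketched as follows. This is a minimal illustration, not the patented implementation; it assumes the buffer holds data pieces each marked with a hypothetical `access_point` flag (true where expansion can restart) and a hypothetical high-water mark `high_water` that signals imminent overflow.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Piece:
    seq: int
    access_point: bool  # assumed: expansion can (re)start at this piece


def idle_read_on_overflow(buffer: deque, high_water: int) -> list:
    """Discard pieces without expanding them while the receive buffer is
    near overflow, then keep discarding until a piece at which expansion
    can resume reaches the head. Returns the discarded pieces."""
    discarded = []
    if len(buffer) < high_water:
        return discarded  # no overflow risk: leave the buffer untouched
    while buffer and (len(buffer) >= high_water or not buffer[0].access_point):
        discarded.append(buffer.popleft())
    return discarded


buf = deque(Piece(i, flag) for i, flag in enumerate([True, False, False, True, False, False]))
dropped = idle_read_on_overflow(buf, high_water=5)
print([p.seq for p in dropped], buf[0].seq)  # [0, 1, 2] 3
```

Stopping only at an access point avoids handing the decoder a piece that depends on data that was just thrown away.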
[0033]
In the figure, the image composition unit 16 composes the expanded images. The composition method can be defined by describing, in a scripting language such as JAVA, VRML, or MHEG, the images together with image structure information (display position and display time, which may include a display period), the grouping method among images, the image display layer (depth), an object ID (the SSRC described later), and the relationships among these attributes. The script describing the composition method is input from or output to a network or a local storage device.
[0034]
The output unit 17 is a display, printer, or similar device that outputs the image composition result, and the terminal control unit 18 controls these units. A configuration that expands audio instead of images can be handled in the same way (by replacing the image expansion unit with an audio expansion unit, the image expansion management unit with an audio expansion management unit, and the image composition unit with an audio composition unit), and both images and audio may be expanded, composed, and presented while maintaining temporal synchronization.
[0035]
Furthermore, by also providing an image compression unit that compresses images, an image compression management unit that manages it, an audio compression unit that compresses audio, and an audio compression management unit that manages it, images and audio can also be transmitted.
[0036]
FIG. 2 is a diagram illustrating the reception management unit 11 and the separation unit 12.
[0037]
The reception management unit 11 consists of a data receiving unit 101 that receives data and a control information receiving unit 102 that receives control information for controlling the data. The separation unit 12 shown in FIG. 1 consists of a transmission format storage unit 103 that stores the transmission structure used to interpret the transmitted content (described in detail later), and a transmission information interpretation unit 104 that interprets the transmitted content based on the transmission structure stored in the transmission format storage unit 103. With this configuration, data and control information can be received independently, so that, for example, a received image or sound can easily be deleted or moved while information is still being received.
[0038]
As described above, the communication environment targeted by the reception management unit 11 may be one, such as the Internet, in which a plurality of logical transmission paths can be used without being aware of the multiplexing means (Internet profile), or one, such as analog telephone lines or satellite broadcasting, in which the multiplexing means must be taken into account (Raw profile). In either case, from the user's point of view, a communication environment in which a plurality of logical transmission paths (logical channels) are available is assumed.
[0039]
Further, as shown in FIG. 2, the information received by the reception management unit 11 is assumed to arrive over one or more data transmission paths and one or more logical control transmission paths for controlling the transmitted data. A plurality of data transmission paths may be prepared with only a single data control path. Alternatively, as with RTP/RTCP, which is also used in H.323, a data control transmission path may be prepared for each data transmission path. Furthermore, for broadcasting over UDP, a communication form using a single communication port (multicast address) may be used.
[0040]
FIG. 3 illustrates a method of transmitting and controlling images and audio using a plurality of logical transmission paths. The data itself to be transmitted is called an ES (elementary stream). For images, an ES may be the image information of one frame or of a unit smaller than a frame, such as a GOB or macroblock.
[0041]
For audio, an ES may be of a fixed length determined by the user. The header information for data control added to the data to be transmitted is called AL (Adaptation Layer information). Examples of AL information include information indicating whether the data is at a start position where processing can begin, information indicating the reproduction time of the data, and information indicating the priority of data processing. The data management information of the present invention corresponds to AL information. Note that the ES and AL used in the present invention do not necessarily match the definitions in MPEG1/2.
[0042]
Specifically, there are two types of information indicating whether the data is at a start position where processing can begin. The first is a random access flag, which indicates that the image can be read and reproduced independently of the preceding and following data, as with an intra frame (I picture). The second is an access flag, defined simply as a flag indicating that the data can be read independently; for an image, for example, it indicates the head of an image in GOB or macroblock units. Therefore, data without an access flag lies in the middle of a unit. It is not always necessary to use both the random access flag and the access flag to indicate whether the data is at a processing start position.
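One way to picture these AL items is as a small record with optional fields. The sketch below is illustrative only; the class name `ALInfo` and the helper functions are hypothetical, and which fields are actually present would be negotiated before transfer as described in the following paragraphs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ALInfo:
    random_access: bool = False     # decodable independently of preceding/following data (e.g. I picture)
    access: bool = False            # head of an independently readable unit (e.g. GOB or macroblock)
    pts: Optional[int] = None       # reproduction time, only if negotiated
    priority: Optional[int] = None  # processing priority, only if negotiated


def is_unit_head(al: ALInfo) -> bool:
    """Without either flag, the piece lies in the middle of a unit."""
    return al.random_access or al.access


def can_join_here(al: ALInfo) -> bool:
    """A receiver starting (or an editor cutting) mid-stream needs random access."""
    return al.random_access
```

A GOB head, for instance, satisfies `is_unit_head` but not `can_join_here`, which mirrors the distinction between the two flags.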
[0043]
In real-time communication such as a video conference system, omitting both flags may cause no problem, whereas a random access flag is needed to make editing easy. Whether a flag is needed, and which one, may be decided via the communication path before data transfer.
[0044]
The information indicating the data reproduction time provides time synchronization when images and audio are reproduced, and is called the PTS (Presentation Time Stamp) in MPEG1/2. In real-time communication such as a video conference system, time synchronization is usually not considered, so information indicating the reproduction time is not necessarily required. What is needed instead may be the time interval between encoded frames.
[0045]
By adjusting the time interval on the receiving side, large fluctuations in the frame interval can be prevented, but adjusting the playback interval may cause a delay. Accordingly, it may be determined that time information indicating an encoding frame interval is not necessary.
[0046]
Whether the reproduction-time information denotes a PTS, denotes a frame interval, or is not added to the data at all may be decided before data transfer via the communication path and notified to the receiving terminal, after which the data is transmitted together with the agreed data management information.
[0047]
Information indicating the priority of data processing makes it possible to reduce the load on the receiving terminal and on the network: when data cannot be processed or transmitted because of the receiving terminal's load or the network load, processing or transmission of lower-priority data is canceled.
[0048]
This processing can be performed by the image expansion management unit 15 at the receiving terminal, or by a relay terminal or router in the network. The priority may be expressed as a number or as a flag. By transmitting an offset value for the priority, as control information or as data management information (AL information) together with the data, sudden changes in the load on the receiving terminal or on the network can be handled. Adding the offset value to the priorities assigned in advance to images and audio allows priorities to be set dynamically according to the operating state of the system.
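A minimal sketch of the offset mechanism, assuming the convention used later in this description (a smaller value means a higher priority) and a hypothetical lowest priority value of 7; the function names are illustrative only.

```python
def effective_priority(base: int, offset: int, lowest: int = 7) -> int:
    """Priority after applying a transmitted offset, clamped to [0, lowest].
    Smaller values mean higher priority; `lowest` is an assumed bound."""
    return max(0, min(lowest, base + offset))


def should_process(base: int, offset: int, threshold: int) -> bool:
    """Process (or forward) the data only if its effective priority is within the threshold."""
    return effective_priority(base, offset) <= threshold


# Under a sudden load increase, the sender raises the offset for a background
# stream so its frames fall outside the receiver's threshold.
print(should_process(2, 0, 3), should_process(2, 3, 3))  # True False
```

Because only the offset changes, the priorities fixed at encoding time never need to be rewritten in the stream itself.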
[0049]
Further, transmitting as control information, together with the data identifier (SSRC), information identifying whether the data is scrambled, whether it is copyrighted, and whether it is an original or a copy makes descrambling at relay nodes and similar operations easier.
[0050]
The information indicating the priority of data processing may be added in units of streams composed of a set of a plurality of video and audio frames, or may be added in units of video and audio frames.
[0051]
The transmitting terminal device is provided with priority adding means that determines, according to predetermined criteria, the processing priority under overload for information encoded by an encoding method such as H.263 or G.723, and associates the encoded information with the determined priority (see FIG. 54).
[0052]
FIG. 54 is a diagram for explaining priority adding means 5201 for adding priority to video and audio.
[0053]
That is, as shown in the figure, priorities are added to encoded video and audio data (produced by the video encoding unit 5202 and the audio encoding unit 5203, respectively) based on predetermined rules. The rules for adding priority are stored in the priority addition rule 5204. Examples are a rule that gives an I frame (an intra-frame encoded video frame) a higher priority than a P frame (an inter-frame encoded video frame), and a rule that gives video a lower priority than audio. These rules may also be changed dynamically according to user instructions.
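The rules stored in the priority addition rule 5204 could be represented as a simple lookup table. This sketch is an assumption for illustration: the table keys, the numeric values (smaller is higher priority), and `add_priority` are hypothetical, chosen only to satisfy the two example rules in the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the priority addition rule 5204:
# smaller value = higher priority under overload.
Rule = Dict[Tuple[str, Optional[str]], int]

DEFAULT_RULE: Rule = {
    ("audio", None): 0,  # audio ranks above video
    ("video", "I"): 1,   # intra-coded frames rank above ...
    ("video", "P"): 2,   # ... inter-coded frames
}


def add_priority(media: str, frame_type: Optional[str] = None, rule: Rule = DEFAULT_RULE) -> int:
    """Return the overload priority for one encoded unit according to `rule`."""
    key = (media, frame_type if media == "video" else None)
    return rule[key]


# A user instruction can swap in a different rule, e.g. favouring video over audio.
user_rule = {**DEFAULT_RULE, ("audio", None): 3}
```

Keeping the rule as data rather than code is what makes the dynamic, user-driven change mentioned above straightforward.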
[0054]
The units to which priority is added are, for example, scene changes in images, image frames or streams designated by an editor or user, and voiced and silent sections in audio.
[0055]
Two methods can be considered for adding the per-frame priority of images or audio that defines the processing order under overload: adding it to the communication header, and embedding it in the header of the encoded video or audio bitstream at encoding time. The former allows the priority to be obtained without decoding; the latter allows a single bitstream to be handled on its own, independently of the system.
[0056]
When priority information is added to the communication header and one image frame (for example, an intra-frame encoded I frame or an inter-frame encoded P or B frame) is divided into a plurality of transmission packets, the priority is added only to the communication header of the packet carrying the head of an image frame that can be accessed as a single unit of information. (If the priority is the same within an image frame, it does not change until the head of the next accessible image frame appears.)
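The following sketch shows one way to packetize a frame so that only the first packet carries the priority, and how a receiver can carry that priority forward to the remaining packets. The `Packet` layout, `max_payload`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int
    payload: bytes
    priority: Optional[int]  # carried only by the packet holding the frame head


def packetize(frame: bytes, priority: int, max_payload: int, first_seq: int) -> List[Packet]:
    """Split one frame into packets; only the first packet carries the priority."""
    packets = []
    for i, start in enumerate(range(0, len(frame), max_payload)):
        packets.append(Packet(first_seq + i, frame[start:start + max_payload],
                              priority if i == 0 else None))
    return packets


def priorities_at_receiver(packets: List[Packet], initial: int) -> List[int]:
    """A packet without a priority inherits the last one seen, since the
    priority is unchanged until the next accessible frame head."""
    current, result = initial, []
    for p in packets:
        if p.priority is not None:
            current = p.priority
        result.append(current)
    return result


stream = packetize(b"I" * 2500, 0, 1000, 0) + packetize(b"P" * 800, 2, 1000, 3)
print(priorities_at_receiver(stream, initial=0))  # [0, 0, 0, 2]
```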
[0057]
Depending on the application, the range of values used to express the priority (for example, whether time information is expressed in 16 or 32 bits) may be made variable so that it can be configured through control information.
[0058]
On the decoding side, the receiving terminal device is provided with priority determination means that determines how to process the various received encoded information according to its overload priority (see FIG. 55).
[0059]
FIG. 55 is a diagram for explaining priority determination means 5301 for interpreting the priority added to video and audio and determining whether or not decoding processing is possible.
[0060]
That is, as shown in the figure, there are two kinds of priority: one added to each video or audio stream, and one added to each video or audio frame. These priorities may be used independently, or the frame priority and stream priority may be used in combination. The priority determination means 5301 determines the streams and frames to be decoded according to these priorities.
[0061]
Decoding is performed using two types of priority that determine the processing order when the terminal is overloaded: the stream priority (Stream Priority), which defines the relative priority between bitstreams such as video and audio, and the frame priority (Frame Priority), which defines the relative priority between decoding units, such as video frames, within the same stream (FIG. 30).
[0062]
The stream priority makes it possible to handle multiple video and audio streams. The frame priority makes it possible to give different priorities even to intra-frame encoded video frames (I frames), according to scene changes in the video or the editor's intent.
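The two-level scheme, which is also what claims 1 and 2 describe, can be sketched as: share the processing capacity among streams by stream priority, then, within each stream, choose the largest frame-priority cut-off whose frames fit in that stream's share. The 1/(1 + p) weighting and the per-frame `cost` values are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    priority: int  # frame priority; smaller = more important
    cost: float    # assumed decoding cost in arbitrary units


def allocate(capacity: float, stream_priorities: Dict[str, int]) -> Dict[str, float]:
    """Split capacity among streams; a smaller stream priority gets a larger
    share. The 1 / (1 + p) weighting is an assumption for illustration."""
    weights = {s: 1.0 / (1 + p) for s, p in stream_priorities.items()}
    total = sum(weights.values())
    return {s: capacity * w / total for s, w in weights.items()}


def frame_threshold(frames: List[Frame], budget: float) -> int:
    """Largest frame-priority cut-off whose frames fit in the budget (-1 if none)."""
    best = -1
    for t in sorted({f.priority for f in frames}):
        load = sum(f.cost for f in frames if f.priority <= t)
        if load > budget:
            break
        best = t
    return best


shares = allocate(30.0, {"audio": 0, "video": 1})  # audio 20.0, video 10.0
video = [Frame(0, 4.0), Frame(1, 4.0), Frame(2, 4.0)]
print(frame_threshold(video, shares["video"]))     # 1: decode priorities 0 and 1
```

Recomputing `frame_threshold` as the measured load changes is one way to realize the adaptively changing threshold; the FIG. 28 procedure described below adjusts it incrementally instead.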
[0063]
By associating the stream priority with the time allocation or processing priority that the operating system (OS) gives to image or audio encoding and decoding processes, processing time can be managed at the OS level. For example, Microsoft Windows 95/NT can define five OS-level priorities. When the encoding and decoding means are implemented in software as threads, the OS-level priority assigned to each thread can be determined from the stream priority of the stream it processes.
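A sketch of mapping stream priorities onto five OS-level thread priorities. The level names below are placeholders rather than real OS constants, and the linear mapping is an assumption; an actual implementation would pass the chosen level to the platform's thread API.

```python
# Five OS-level priority classes, highest first. The names are placeholders
# standing in for whatever five levels the OS exposes to threads.
OS_LEVELS = ["highest", "above_normal", "normal", "below_normal", "lowest"]


def os_level_for_stream(stream_priority: int, max_stream_priority: int) -> str:
    """Map a stream priority (0 = most important) linearly onto the OS levels
    assigned to the thread that encodes or decodes that stream."""
    if not 0 <= stream_priority <= max_stream_priority:
        raise ValueError("stream priority out of range")
    if max_stream_priority == 0:
        return OS_LEVELS[0]
    index = stream_priority * (len(OS_LEVELS) - 1) // max_stream_priority
    return OS_LEVELS[index]


for p in (0, 4, 8):
    print(p, os_level_for_stream(p, 8))
```

With a maximum stream priority of 8, this prints "0 highest", "4 normal", and "8 lowest" on separate lines.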
[0064]
The frame priority and stream priority described here can also be applied to transmission media and data recording media. For example, when the priority of a transmitted packet is defined as an access unit priority (Access Unit Priority), the priority for packet transmission, or the processing priority when the terminal is overloaded, can be determined from a relational expression combining frame priority and stream priority, such as Access Unit Priority = Stream Priority - Frame Priority.
[0065]
The invention can also be implemented using a floppy disk, an optical disc, or the like as the data recording medium. The recording medium is not limited to these; any medium capable of recording a program, such as an IC card or a ROM cassette, can be used in the same way. Furthermore, image or audio relay devices that relay data, such as routers and gateways, may also be targeted.
[0066]
As a concrete use of the priority, priority determination means that determines the priority threshold of the encoded information to be processed when the receiving terminal is overloaded is provided in the image expansion management unit 15 or the audio expansion management unit. It changes the priority threshold according to the result of comparing the display time (PTS) with the time elapsed from the start of processing until now, or the decoding time (DTS) with the time elapsed from the start of processing until now. (The I-frame insertion interval and the priority granularity may also be consulted when changing the threshold.)
[0067]
In the example shown in FIG. 25(a), at encoding time, captured QCIF and CIF images are encoded by the encoder (H.263), which outputs, together with the encoded information, the decoding time (DTS), a time stamp (PTS) indicating when the image is to be displayed, priority information (CGD, Computational Graceful Degradation) indicating the processing order under overload, the frame type, and a sequence number (SN).
[0068]
In the example shown in FIG. 25(b), audio is likewise captured through a microphone and encoded by an encoder (G.721), which outputs, together with the encoded information, the decoding time (DTS), a time stamp (PTS) indicating when the audio is to be reproduced, priority information (CGD), and a sequence number (SN).
[0069]
At decoding time, as shown in FIG. 26, the image and audio are placed in separate buffers. For each, the DTS (decoding time) is compared with the time elapsed since the start of the current processing, and if the DTS is not behind, the image and audio are passed to their respective decoders (H.263, G.721).
[0070]
The example of FIG. 27 describes how the encoder adds the priority used under overload. For images, I frames (intra-coded image frames) are assigned priorities "0" and "1", the higher priorities (a larger number means a lower priority). P frames are assigned priority "2", lower than that of I frames. Because I frames are given two priority levels, when the load on the decoding terminal is high, only I frames with priority "0" can be reproduced. The I-frame insertion interval must be adjusted according to how the priorities are added.
[0071]
FIG. 28 describes the priority determination method at the receiving terminal under overload. Frames whose priority value exceeds the threshold CutOffPriority are discarded; initially, CutOffPriority is set so that all image frames are processed. The maximum priority value added to image frames can be known in advance if the transmitting side notifies the receiving side when the terminals connect (step 101).
[0072]
The DTS is compared with the time elapsed since the start of the current processing. If the elapsed time is greater (decoding is falling behind), the threshold CutOffPriority is lowered to reduce the number of images and audio frames to be processed; if the elapsed time is shorter (decoding is on time), CutOffPriority is raised to increase the number of images and audio frames that can be processed (step 103).
[0073]
If the current frame is a P frame and the preceding image frame was skipped, the frame is not processed. Otherwise, the priority offset is added to the priority of the image frame (or audio frame) and the result is compared with the priority threshold; if it does not exceed the threshold, the data is passed to the decoder (step 104).
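Steps 101, 103, and 104 can be sketched as a small stateful scheduler. This is an illustration under stated assumptions, not the procedure of FIG. 28 verbatim: the threshold moves by one level per frame, it is initialized to the maximum priority notified at connection so that every frame initially passes, and a smaller value means a higher priority. `CutOffScheduler` and `Frame` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dts: int         # decoding time stamp, ms from start of processing
    priority: int    # overload priority; smaller value = more important
    frame_type: str  # "I" or "P"


class CutOffScheduler:
    def __init__(self, max_priority: int, offset: int = 0):
        self.max_priority = max_priority  # notified by the sender at connection (step 101)
        self.cutoff = max_priority        # CutOffPriority: initially every frame passes
        self.offset = offset              # priority offset (per stream or per machine)
        self.prev_skipped = False

    def _adjust(self, dts: int, elapsed: int) -> None:
        # Step 103: late -> lower the threshold; on time -> raise it (one level per frame, assumed).
        if elapsed > dts:
            self.cutoff = max(0, self.cutoff - 1)
        else:
            self.cutoff = min(self.max_priority, self.cutoff + 1)

    def decide(self, frame: Frame, elapsed: int) -> bool:
        """Return True if the frame should be passed to the decoder (step 104)."""
        self._adjust(frame.dts, elapsed)
        if frame.frame_type == "P" and self.prev_skipped:
            return False  # its reference was skipped, so skip it too
        decode = frame.priority + self.offset <= self.cutoff
        self.prev_skipped = not decode
        return decode


sched = CutOffScheduler(max_priority=2)
trace = [(Frame(0, 0, "I"), 0), (Frame(40, 2, "P"), 50), (Frame(80, 2, "P"), 70),
         (Frame(120, 1, "I"), 200), (Frame(160, 2, "P"), 150)]
print([sched.decide(f, t) for f, t in trace])  # [True, False, False, True, True]
```

In the trace, the second frame is late and falls above the lowered threshold; the third P frame is dropped because its reference was skipped, and decoding recovers at the next I frame.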
[0074]
The priority offset can be used, for example, by checking the machine performance in advance and notifying the receiving terminal of a suitable offset (the user may also specify the offset at the receiving terminal), or to change priorities per stream among multiple video and audio streams (for example, increasing the offset for the rearmost background stream to thin out its processing).
[0075]
When multiple streams are handled, a priority may be added to each stream to determine whether to skip image or audio decoding. In addition, by treating the H.263 TR (temporal reference) in the same way as the DTS, it is possible to determine whether decoding at the terminal is ahead or behind, and the same skip processing described above can be realized.
[0076]
FIG. 29 shows how the priority varies over time when the algorithm of FIG. 28 is applied.
[0077]
The figure shows the change in the priority added to video frames. This priority, added to each frame, is used to determine whether the frame is decoded when the terminal is overloaded; the smaller the value, the higher the priority, so 0 is the highest priority in the example shown. With a priority threshold of 3, frames with a priority value greater than 3 are discarded without being decoded, and frames with a value of 3 or less are decoded. Selectively discarding frames by priority reduces the load on the terminal. The priority threshold may be determined dynamically from the relationship between the current processing time and the decoding time (DTS) added to each frame. The same method applies equally to audio frames.
[0078]
When a transmission path such as the Internet is used and encoded information lost in transit must be retransmitted, the reception management unit 11 is provided with retransmission request priority determination means that determines the priority threshold of the encoded information to be retransmitted. This threshold can be determined from the priority managed by the priority determination means, the number of retransmissions, the information loss rate, the insertion interval of intra-coded frames, the priority granularity (for example, five priority levels), and so on, so that retransmission is requested only for the images and audio the receiving terminal actually needs. If the number of retransmissions and the loss rate are high, the priority required for retransmission must be raised so that retransmissions and losses are reduced. Furthermore, knowing the priority used by the priority determination means makes it possible to avoid transmitting information that will not be processed.
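The text lists the inputs to the retransmission threshold but gives no formula, so the rule below is purely a hypothetical example: start from the decoder's current CutOffPriority (so frames that would be discarded anyway are never requested) and tighten by one level each when the loss rate or the retransmission count exceeds an assumed limit. The limits and function names are illustrative.

```python
from typing import List, Tuple


def retransmit_threshold(cutoff: int, loss_rate: float, retries: int,
                         loss_limit: float = 0.05, retry_limit: int = 2) -> int:
    """Priority threshold for retransmission requests. Starts from the
    decoder's current CutOffPriority (frames the decoder would drop are never
    requested) and tightens by one level each when the loss rate or the
    retransmission count is high. The limits are assumed values."""
    threshold = cutoff
    if loss_rate > loss_limit:
        threshold -= 1
    if retries > retry_limit:
        threshold -= 1
    return max(0, threshold)


def packets_to_request(lost: List[Tuple[int, int]], cutoff: int,
                       loss_rate: float, retries: int) -> List[int]:
    """`lost` holds (sequence number, priority) pairs of missing packets."""
    t = retransmit_threshold(cutoff, loss_rate, retries)
    return [seq for seq, prio in lost if prio <= t]


print(packets_to_request([(5, 0), (6, 2), (7, 4)], cutoff=3, loss_rate=0.10, retries=0))  # [5, 6]
```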
[0079]
At the transmitting terminal, when the actual transfer rate exceeds the terminal's target transfer rate, or when writing encoded information to the transmission buffer lags behind the decoding or display time added to that information (compared with the time elapsed since the start of the transfer process), transmission is thinned out using the overload priority added to the encoded information, the same priority used by the priority determination means of the receiving terminal. This allows images and audio to be transmitted at the target rate. In addition, introducing at the transmitting terminal the overload skip function performed at the receiving terminal can prevent failures caused by overload of the transmitting terminal.
[0080]
Transmitting only as much of the AL information described above as is actually needed is effective because it lets the amount of transmitted information be adjusted for narrowband paths such as analog telephone lines. As an implementation, the transmitting terminal decides before data transmission which data management information to add to the data itself, and notifies the receiving terminal, as control information, of the data management information to be used (for example, that only the random access flag is used). Based on the received control information, the receiving terminal rewrites the transmission structure information stored in the transmission format storage unit 103 (which indicates which AL information is used), so that the AL information (data management information) used on the receiving side can be rearranged (see FIGS. 19 to 20).
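The negotiated AL layout can be sketched with Python's `struct` module. The field catalogue, field widths, and the `TransmissionFormat` class are hypothetical stand-ins for the transmission structure held in the transmission format storage unit 103; only the idea that both ends pack and parse the AL using the same announced field list is taken from the text.

```python
import struct
from typing import Dict, List

# Hypothetical AL field catalogue: name -> struct format code (network byte order).
AL_FIELDS = {"random_access": "B", "access": "B", "pts": "I", "priority": "B"}


class TransmissionFormat:
    """Stands in for the transmission format storage unit 103: records which
    AL fields are in use and packs/unpacks AL headers accordingly."""

    def __init__(self, fields: List[str]):
        self.update(fields)

    def update(self, fields: List[str]) -> None:
        # Called when control information announces a new AL layout.
        unknown = [f for f in fields if f not in AL_FIELDS]
        if unknown:
            raise ValueError(f"unknown AL fields: {unknown}")
        self.fields = list(fields)
        self.fmt = "!" + "".join(AL_FIELDS[f] for f in self.fields)

    def pack(self, values: Dict[str, int]) -> bytes:
        return struct.pack(self.fmt, *(values[f] for f in self.fields))

    def unpack(self, data: bytes) -> Dict[str, int]:
        return dict(zip(self.fields, struct.unpack(self.fmt, data)))


fmt = TransmissionFormat(["random_access"])              # narrowband: one-byte AL
print(len(fmt.pack({"random_access": 1, "pts": 3000})))  # 1
fmt.update(["random_access", "pts", "priority"])         # richer AL after renegotiation
print(fmt.unpack(fmt.pack({"random_access": 1, "pts": 3000, "priority": 2})))
```

Fields not in the agreed list, such as `pts` in the first call, are simply not sent, which is where the bandwidth saving on a narrowband path comes from.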
[0081]
FIG. 4 illustrates a method for dynamically changing the header information added to image or audio data to be transmitted. In the example, the data to be transmitted (ES) is decomposed into data pieces, and each piece is given, in the form of a communication header corresponding to the transmission management information of the present invention, identification information (a sequence number) indicating the order of the data, information (a marker bit) indicating whether the piece is at a position where processing can start, and time information (a time stamp) related to the transfer of the piece.
[0082]
As a specific example, RTP (Real-time Transport Protocol, RFC 1889) uses information such as the sequence number, marker bit, time stamp, object ID (called the SSRC), and version number as its communication header. Although the header items can be extended, the items above are always present as fixed items. However, in a communication environment where several differently encoded images and audio streams are transmitted simultaneously, and real-time communication such as a videophone is mixed with stored media such as video on demand, the meaning of the communication header differs between them, so a means of identifying the meaning is needed.
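For reference, the 12-byte RTP fixed header (version, padding, extension, CSRC count, marker bit, payload type, sequence number, time stamp, SSRC) can be packed and parsed as below. This sketch omits the CSRC list and header extensions, and the payload type value 96 is chosen arbitrarily for the example.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int, marker: bool = False) -> bytes:
    """Build the 12-byte RTP fixed header (no padding, no extension, no CSRCs)."""
    byte0 = RTP_VERSION << 6                            # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


hdr = pack_rtp_header(seq=1, timestamp=90000, ssrc=0x1234ABCD, payload_type=96, marker=True)
print(len(hdr), unpack_rtp_header(hdr)["marker"])  # 12 True
```

The header has no field that says whether the time stamp is a PTS or a frame interval, which is the identification problem discussed in the next paragraphs.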
[0083]
For example, in MPEG1/2 the time stamp indicates the PTS, the reproduction time described above, whereas in H.261 and H.263 it indicates the encoding time interval. However, when H.263 is to be processed in synchronization with audio, it must be indicated that the time stamp is PTS information, because in H.263 the time stamp indicates the interval between encoded frames and RTP defines the time stamp of the first frame to be random.
[0084]
Therefore, it is necessary either (a) to add a flag indicating whether the time stamp is a PTS as communication header information (which requires extending the communication header), or (b) to add it as header information (that is, AL information) of the H.263 or H.261 payload (which requires extending the payload information).
[0085]
As RTP header information, a marker bit indicating whether the data piece is at a position where processing can start is added. The AL may also need an access flag indicating that the data can be accessed and a random access flag indicating that the data can be accessed randomly. Since carrying these flags redundantly in both headers is inefficient, the AL flag may be replaced by a flag provided in the communication header.
[0086]
(c) The problem can be solved either by providing in the communication header a new flag indicating that, instead of a flag being added to the AL, the flag in the communication header substitutes for the AL flag, or by defining the marker bit of the communication header to have the same meaning as the AL flag (which can be expected to be interpreted faster than if the flag were carried in the AL). That is, the flag indicates whether the marker bit has the same meaning as the AL flag. In this case, the communication header may be modified, or the flag may be described in an extension area.
[0087]
Conversely, (d) the marker bit of the communication header may be interpreted to mean that at least one of a random access flag and an access flag is present in the AL. In this case, the version number of the communication header can indicate that the interpretation differs from the conventional one. A simpler alternative is to provide the access flag or random access flag only in either the communication header or the AL header, which simplifies processing (in the former case, both flags may be provided, but the communication header must be newly extended).
[0088]
Although it has been described that information indicating the priority of data processing is added as AL information, adding the data processing priority to the communication header allows the processing priority to be determined even within the network, without interpreting the data contents. With IPv6, the priority can be added at a layer below RTP.
[0089]
By adding to the RTP communication header a timer or counter indicating the valid period of data processing, it becomes possible to determine when a particular state of the transmitted packets will change. For example, if the required decoder software is stored in a storage device with slow access, the information that a decoder is required, combined with a timer or counter, makes it possible to determine when the decoder will be needed. In this case, depending on the purpose, the timer, counter, and data processing priority need not be included in the AL information.
[0090]
FIGS. 5(a) to 5(b) and FIGS. 6(a) to 6(d) illustrate methods of adding AL information.
[0091]
By sending the receiving terminal control information indicating whether the AL is added only to the head of the data to be transmitted, as shown in FIG. 5(a), or to each data piece after one or more data (ES) to be transmitted are decomposed into data pieces, as shown in FIG. 5(b), the handling granularity of the transmitted information can be selected. Attaching the AL to subdivided data is effective where access delay is a problem.
[0092]
As described above, to notify the receiving terminal in advance that the data management information on the receiving side will be rearranged, or that the way data management information is attached to the data will change, the notification can be carried as AL information or in the communication header using an expression such as a flag, counter, or timer, allowing the receiving terminal to handle the change smoothly.
[0093]
In the above examples, methods for avoiding duplication between the RTP header (or communication header) and the AL information, and methods for extending the RTP communication header and the AL information, have been described. However, the present invention does not necessarily require RTP. For example, a proprietary communication header or AL information may be newly defined on top of UDP or TCP. The Internet profile sometimes uses RTP, but the Raw profile does not define a multifunctional header such as that of RTP. There are four approaches to the relationship between the AL information and the communication header (see FIGS. 6A to 6D).
[0094]
(1) Modify and extend the RTP header information or the AL information so that header information already assigned is not duplicated between RTP and AL (in particular, the time stamp information is duplicated, while the timer, the counter, and the data processing priority become extension information). Alternatively, the RTP header may be left unextended and the AL information allowed to duplicate information in RTP without taking this into account. These approaches correspond to what has been described so far. Since RTP is already partly in practical use in H.323, extending RTP while maintaining compatibility is effective (see FIG. 6A).
[0095]
(2) Independently of RTP, the communication header is simplified (for example, to a sequence number only), and the remaining information is included in the AL information as multifunctional control information. In addition, allowing the items used in the AL information to be set variably before communication makes it possible to define a flexible transmission format (see FIG. 6B).
[0096]
(3) Independently of RTP, the AL information is simplified (in the extreme case, nothing is added as AL), and all control information is carried in the communication header. Items likely to be referenced frequently in the communication header, namely the sequence number, time stamp, marker bit, payload type, and object ID, are set as fixed header information, while the data processing priority and the timer information are set as extension information. An identifier indicating whether extension information is present may be provided, so that the extension information is referenced only when it is defined (see FIG. 6C).
[0097]
(4) Independently of RTP, both the communication header and the AL information are simplified, and the remaining information is transmitted in a format defined as a separate packet. For example, the AL information defines only a marker bit, a time stamp, and an object ID, and the communication header defines only a sequence number; the payload information, data processing priority, timer information, and so on are then defined and transmitted in a separate transmission packet (second packet) (see FIG. 6D).
[0098]
As described above, taking into account the application and the header information already attached to the image or audio, the communication header, the AL information, and the packet transmitted separately from the data (second packet) can be freely defined (customized) according to the application.
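Approach (3) can be sketched in Python with the standard `struct` module. The field widths below are assumptions chosen for illustration, since the specification does not fix them; what the sketch shows is that frequently referenced items sit in a fixed header, while the priority and timer are parsed only when an extension-present identifier is set.

```python
import struct
from typing import Optional, Tuple

# Hypothetical layout for approach (3): all control information in the communication header.
# Fixed part (network byte order): seq u16, timestamp u32, marker(1 bit)+payload type(7 bits) u8,
# object ID u16, extension-present flag u8. Extension part: priority u8, timer u16.
FIXED = struct.Struct("!HIBHB")
EXT = struct.Struct("!BH")

def pack_header(seq: int, ts: int, marker: bool, ptype: int, obj_id: int,
                ext: Optional[Tuple[int, int]] = None) -> bytes:
    mpt = (int(marker) << 7) | (ptype & 0x7F)
    head = FIXED.pack(seq, ts, mpt, obj_id, 1 if ext else 0)
    return head + (EXT.pack(*ext) if ext else b"")

def unpack_header(buf: bytes) -> dict:
    seq, ts, mpt, obj_id, has_ext = FIXED.unpack_from(buf)
    hdr = {"seq": seq, "ts": ts, "marker": bool(mpt >> 7), "ptype": mpt & 0x7F, "obj_id": obj_id}
    if has_ext:  # extension information is read only when the identifier says it exists
        hdr["priority"], hdr["timer"] = EXT.unpack_from(buf, FIXED.size)
    return hdr
```

A header without extension information occupies 10 bytes in this layout; adding the priority and timer makes it 13.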
[0099]
FIG. 7 is a diagram for explaining a method of transmitting information by dynamically multiplexing and demultiplexing a plurality of logical transmission paths. To save logical transmission paths, the transmission unit 13 is provided with an information multiplexing unit, which can start and stop the multiplexing of logical transmission paths carrying a plurality of data or control information streams according to a user instruction or the number of logical transmission paths, and with an information demultiplexing unit, which separates the multiplexed information.
[0100]
In FIG. 7, the information multiplexing unit is called “Group MUX”. A multiplexing scheme such as H.223 may be used. The Group MUX may be provided at the transmitting and receiving terminals; alternatively, it may be provided at a relay router or terminal to cope with a narrowband communication path. If the Group MUX is realized with H.223, interconnection with H.324 becomes possible.
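A minimal Python sketch of the Group MUX idea follows. It assumes a simple framing in which each unit is tagged with its logical channel number and channels are interleaved round-robin; the actual scheme is left open above (H.223, for example), so the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class GroupMux:
    """Sketch of a Group MUX: several logical channels share one transmission channel.

    Each multiplexed unit is tagged with its logical channel number so the
    demultiplexer can route it back. The framing is a hypothetical simplification.
    """

    def __init__(self) -> None:
        self.active = False
        self.queues: Dict[int, List[bytes]] = defaultdict(list)

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False

    def submit(self, logical_ch: int, data: bytes) -> None:
        self.queues[logical_ch].append(data)

    def multiplex(self) -> List[Tuple[int, bytes]]:
        """Interleave queued units round-robin into one output sequence."""
        if not self.active:
            raise RuntimeError("multiplexing has not been started")
        out: List[Tuple[int, bytes]] = []
        while any(self.queues.values()):
            for ch in sorted(self.queues):
                if self.queues[ch]:
                    out.append((ch, self.queues[ch].pop(0)))
        return out

def demultiplex(stream: List[Tuple[int, bytes]]) -> Dict[int, List[bytes]]:
    """Reverse operation: route each unit back to its logical channel."""
    channels: Dict[int, List[bytes]] = defaultdict(list)
    for ch, data in stream:
        channels[ch].append(data)
    return dict(channels)
```

The explicit `start`/`stop` calls mirror the ability to begin and end multiplexing on demand rather than fixing it for the whole session.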
[0101]
To extract the control information relating to the information multiplexing unit (multiplexing control information) quickly, the multiplexing control information is not multiplexed with the data by the information multiplexing unit but is transmitted on a separate logical transmission path, which reduces the delay caused by multiplexing. In addition, the receiving side is notified whether the multiplexing control information is multiplexed with the data, or transmitted unmultiplexed on a separate logical transmission path; this lets the user choose between maintaining consistency with conventional multiplexing and reducing the delay caused by multiplexing. Here, the multiplexing control information is information indicating the content of the multiplexing, such as how the information multiplexing unit multiplexes each piece of data.
[0102]
Similarly, at least the information notifying the start and end of multiplexing, the information notifying the combination of logical transmission paths to be multiplexed, and the notification of how the multiplexing control information is transmitted are sent to the receiving terminal, expressed as a flag, a counter, a timer, or the like, either as control information or as data management information together with the data. This shortens the setup time on the receiving side. As described above, fields representing flags, counters, and timers may also be provided in the RTP transmission header.
[0103]
When there are a plurality of information multiplexing units and information demultiplexing units, transmitting the control information (multiplexing control information) together with an identifier of the information multiplexing unit and information demultiplexing unit makes it possible to tell which information multiplexing unit the control information relates to. An example of multiplexing control information is a multiplexing pattern. The identifier of an information multiplexing unit or information demultiplexing unit can be generated by agreement between the terminals using random numbers. For example, each terminal may generate a random number within a range agreed between the transmitting and receiving terminals, and the larger value may be used as the identifier (identification number) of the information multiplexing unit.
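The random-number agreement can be sketched as follows. The 0 to 255 range and the redraw-on-tie rule are assumptions for illustration; because both terminals apply the same rule to the same pair of numbers, each arrives at the same identifier without further negotiation.

```python
import random
from typing import Optional

def propose_id(low: int, high: int, rng: random.Random) -> int:
    """Each terminal draws a random number in the range agreed with its peer."""
    return rng.randint(low, high)

def agree_mux_id(local: int, remote: int) -> Optional[int]:
    """Adopt the larger of the two draws as the multiplexing unit identifier.

    A tie yields no winner and the terminals draw again
    (tie handling is an assumption; it is not specified above).
    """
    if local == remote:
        return None
    return max(local, remote)

rng_a, rng_b = random.Random(1), random.Random(2)
mux_id = None
while mux_id is None:
    mux_id = agree_mux_id(propose_id(0, 255, rng_a), propose_id(0, 255, rng_b))
print(0 <= mux_id <= 255)  # True
```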
[0104]
Since data multiplexed by the information multiplexing unit differs from the media types conventionally defined in RTP, information indicating that the data has been multiplexed by the information multiplexing unit is defined in the RTP payload type (as a newly defined media type, for example H.223).
[0105]
As a way to improve access speed for multiplexed data, arranging the information transmitted or recorded by the information multiplexing unit in the order of control information first and data second can be expected to speed up analysis of the multiplexed information. In addition, fixing the items described in the data management information attached to the control information, and multiplexing them with an identifier (unique pattern) that distinguishes them from the data, allows the header information to be analyzed quickly.
[0106]
FIG. 8 is a diagram for explaining the transmission procedure for a broadcast program. Either by transmitting, as broadcast program information in the control information, the correspondence between logical transmission path identifiers and broadcast program identifiers, or by adding the broadcast program identifier to the data as data management information (AL information), it is possible to identify which program the data transmitted over the plurality of transmission paths belongs to. In addition, the relationship between the data identifier (SSRC in RTP) and the logical transmission path identifier (for example, a LAN port number) is sent to the receiving terminal as control information, and the corresponding data is transmitted only after the receiving terminal has confirmed whether it can receive it (Ack/Reject). In this way, the correspondence between control information and data is maintained even when they are transmitted over independent transmission paths.
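The Ack/Reject exchange that binds a data identifier to a logical transmission path can be sketched on the receiving side as follows. The `open_ports` capability check and the method names are hypothetical; the sender would transmit data for an SSRC only after receiving "Ack".

```python
from typing import Dict, Set

class Receiver:
    """Receiving terminal that confirms SSRC-to-port bindings before data flows.

    `open_ports` models which logical transmission paths (e.g. LAN port numbers)
    this terminal can receive on; it is a hypothetical capability check.
    """

    def __init__(self, open_ports: Set[int]) -> None:
        self.open_ports = open_ports
        self.bindings: Dict[int, int] = {}  # SSRC -> port

    def on_control(self, ssrc: int, port: int) -> str:
        """Handle a control message announcing which port will carry an SSRC."""
        if port in self.open_ports:
            self.bindings[ssrc] = port
            return "Ack"
        return "Reject"

    def on_data(self, ssrc: int, port: int, payload: bytes) -> bool:
        """Accept data only if it arrives on the port announced for its SSRC."""
        return self.bindings.get(ssrc) == port
```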
[0107]
An identifier indicating the transmission order of a broadcast program or data, combined with counter or timer information indicating the expiration time until which the broadcast program or data can be used, is added to the broadcast program or data and transmitted. This makes broadcasting without a return channel possible (when the expiration time is about to be reached, playback of the broadcast program information and data starts even if some information is missing). It is also conceivable to broadcast over a single communication port address (multicast address) without separating control information and data.
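The receiver-side rule described above (start playback when everything has arrived, or when the expiration time is reached even with gaps) can be sketched as a small buffer. The field names, and the representation of the deadline as an absolute time derived from the received counter or timer, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BroadcastBuffer:
    """Receiver-side buffer for a broadcast without a return channel.

    Units carry a transmission-order index; `expires_at` is the deadline
    derived from the counter or timer sent with them (hypothetical names).
    """
    expected: int        # number of units that make up the program
    expires_at: float    # time by which playback must start
    received: List[int] = field(default_factory=list)

    def add(self, order_index: int) -> None:
        if order_index not in self.received:
            self.received.append(order_index)

    def should_start(self, now: float) -> bool:
        complete = len(self.received) >= self.expected
        # Start when everything has arrived, or when the deadline is reached even if
        # some units are still missing (there is no return channel to request them).
        return complete or now >= self.expires_at

    def missing(self) -> List[int]:
        return [i for i in range(self.expected) if i not in self.received]
```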
[0108]
In communication without a back channel, the control information must be transmitted sufficiently ahead of the data so that the receiving terminal can learn the structure information of the data. Control information should generally be sent on a reliable transmission channel without packet loss; when a channel with low reliability is used, control information carrying the same transmission sequence number must be retransmitted periodically. This applies not only to control information relating to setup time.
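Periodic repetition over an unreliable one-way channel can be sketched as a sender that re-sends every control message with its original sequence number, and a receiver that keeps the first copy of each number. The number of rounds and the message contents are illustrative assumptions.

```python
from typing import Dict, Iterator, List, Tuple

def repeat_schedule(messages: List[Tuple[int, str]], rounds: int) -> Iterator[Tuple[int, str]]:
    """Sender: re-send every control message (same sequence number) in each round."""
    for _ in range(rounds):
        yield from messages

class ControlReceiver:
    """Receiver: keep the first copy of each sequence number and ignore repeats."""

    def __init__(self) -> None:
        self.seen: Dict[int, str] = {}

    def on_message(self, seq: int, body: str) -> bool:
        if seq in self.seen:
            return False  # duplicate caused by periodic repetition
        self.seen[seq] = body
        return True
```

Because each copy carries the same sequence number, a copy lost in one round is simply recovered from a later round, and extra copies are discarded without side effects.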
[0109]
In addition, the items that can be added as data management information (for example, an access flag, a random access flag, a data playback time (PTS), data processing priority information, etc.) are selected and sent as control information together with the data identifier (SSRC). The transmitting side also decides whether the data management information (AL information) is transmitted together with the data or on a logical transmission path different from that of the data, and notifies the receiving side of this decision as control information. This enables flexible data management and transmission.
[0110]
As a result, data can be transmitted without adding any information to the AL, so when image or audio data is transmitted using RTP, there is no need to extend previously defined payload formats.
[0111]
FIGS. 9A to 9B are diagrams illustrating a method of transmitting images and audio that takes into account the time needed to read and start up programs and data. Consider in particular cases with no return channel, such as satellite broadcasting, or with limited terminal resources, such as portable terminals, where the required programs (for example, H.263, MPEG-1/2, or audio decoder software) and data (for example, image and audio data) reside in the receiving terminal on a storage device that takes a long time to read (for example, a DVD, a hard disk, or a network file server). In such cases, the receiving terminal receives, as control information or as data management information together with the data, an identifier of the program or data needed in advance, the identifier of the stream to be transmitted (for example, the SSRC or Logical Channel Number), and a flag, counter (counting up or down), timer, or similar expression for estimating when the program or data will be needed at the receiving terminal. This shortens the setup time of the required programs and data (FIG. 22).
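The receiving terminal's use of such an advance notice can be sketched as follows: convert the timer or counter into the time at which the item is needed, then begin reading from slow storage early enough to finish by then. The `NeedNotice` structure, the assumption that the counter counts down remaining basic time units of length `unit`, and the known `load_time` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NeedNotice:
    """Advance notice that a program or data item will be needed (hypothetical encoding)."""
    item_id: str                    # identifier of the required program or data
    stream_id: int                  # e.g. SSRC or logical channel number of the stream
    timer: Optional[float] = None   # seconds from reception until the item is needed
    counter: Optional[int] = None   # alternatively, remaining basic time units

def need_time(notice: NeedNotice, received_at: float, unit: float) -> float:
    """Absolute time at which the item is needed, from either the timer or the counter."""
    if notice.timer is not None:
        return received_at + notice.timer
    if notice.counter is not None:
        return received_at + notice.counter * unit
    raise ValueError("notice carries neither a timer nor a counter")

def load_start_time(notice: NeedNotice, received_at: float, unit: float, load_time: float) -> float:
    """Begin reading from slow storage early enough to be ready on time (never in the past)."""
    return max(received_at, need_time(notice, received_at, unit) - load_time)
```

For example, a notice received at t = 100 s with a 5 s timer, for a decoder that takes 2 s to load, leads to loading starting at t = 103 s.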
[0112]
Conversely, when a program or data is transmitted, the transmitting side sends it together with information indicating the storage destination of the program or data at the receiving terminal (for example, hard disk or memory), the time required for startup or reading, the correspondence between the terminal type or storage destination and the time required for startup or reading (for example, the relationship between CPU power, storage device, and average response time), and the order of use. This allows the storage destination and reading time of programs and data to be scheduled according to when they are actually needed at the receiving terminal.
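One possible scheduling policy, given the correspondence between storage destinations and average response times, is sketched below: place each item on the slowest storage that can still deliver it before it is needed, keeping fast storage free for items needed sooner. This policy and the storage names are assumptions for illustration; the specification only states that such scheduling becomes possible.

```python
from typing import Dict, Optional

def choose_storage(response_times: Dict[str, float], lead_time: float) -> Optional[str]:
    """Pick the slowest storage whose average read/start-up time fits within `lead_time`.

    `response_times` maps a storage destination (e.g. "memory", "hard_disk") to its
    average response time on this terminal. Returns None if no destination is fast enough.
    """
    usable = {name: t for name, t in response_times.items() if t <= lead_time}
    if not usable:
        return None
    return max(usable, key=usable.get)
```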
[0113]
FIGS. 10A to 10B are diagrams for explaining a method for handling zapping (switching TV channels).
[0114]
Unlike conventional satellite broadcasting, which only receives video, when a program must be executed on the receiving terminal, the setup time needed to read or start that program becomes a major problem. The same applies when available resources are limited, as in a portable terminal.
[0115]
One solution is as follows. When the programs and data needed by programs other than the one being viewed reside on a storage device that takes a long time to read, the receiving terminal is provided with (a) a main viewing section for the program the user is watching and a sub-viewing section that periodically receives programs other than the one being viewed. The receiving terminal receives the correspondence between programs and an identifier of the program or data needed in advance, together with a flag, counter, timer, or similar information for estimating when it will be needed at the receiving terminal, either as control information (information for controlling terminal processing, transmitted in a packet separate from the data) or as data management information (AL information) together with the data. By preparing the reading of programs and data ahead of time, the setup time at the receiving terminal can be expected to be shortened.
[0116]
A second solution is to provide a broadcast channel that broadcasts only headline images of the images broadcast on a plurality of channels. When the programs and data needed by the program the viewer switches to reside on a storage device that takes a long time to read, the headline image of the desired program is selected and presented to the viewer, or an indication that loading is in progress is displayed, while the necessary programs and data are read from the storage device. Once reading is complete, the program the viewer wants to watch is resumed, which prevents the screen from freezing during setup. A headline image here means a broadcast image obtained by sampling programs periodically broadcast on a plurality of channels.
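The second solution can be sketched as a channel-switch handler that plays immediately when the needed decoder is already loaded, and otherwise shows the headline image while the decoder is read from slow storage. The class, the mapping from channel to decoder name, and the returned strings are hypothetical stand-ins for real display actions.

```python
from typing import Dict, Set

class ZappingReceiver:
    """Sketch of channel switching that hides decoder setup time.

    `decoders` maps each channel to the program (decoder) it needs; `ready` is the
    set of programs already loaded into fast memory. All names are hypothetical.
    """

    def __init__(self, decoders: Dict[int, str], ready: Set[str]) -> None:
        self.decoders = decoders
        self.ready = ready
        self.loading: Set[str] = set()

    def switch_to(self, channel: int) -> str:
        program = self.decoders[channel]
        if program in self.ready:
            return f"play channel {channel}"
        # Decoder still on slow storage: start loading and show the headline image meanwhile.
        self.loading.add(program)
        return f"show headline image of channel {channel}"

    def on_loaded(self, program: str) -> None:
        """Called when reading from storage completes; the program can now be resumed."""
        self.loading.discard(program)
        self.ready.add(program)
```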
[0117]
The timer is a time expression indicating, for example, how long from the present the program needed to decode the data stream sent from the transmitting side will be needed. The counter may be information indicating a number of basic time units agreed between the transmitting and receiving terminals. The flag is transmitted together with data or control information (information for controlling terminal processing, transmitted in a packet separate from the data) sent ahead of the time needed for setup. The timer and the counter may each be embedded in the data or transmitted as control information.
[0118]
Further, as a method for determining the setup time when, for example, a clock-based transmission line such as ISDN is used, the transmitting terminal can indicate when a program or data will be needed at the receiving terminal by using a transmission serial number, which identifies the transmission order, as transmission management information. Sending it to the receiving terminal as data management information with the data, or as control information, makes it possible to predict the setup time. When the transmission time fluctuates due to jitter or delay, as on the Internet, the transmission delay is estimated from the jitter and delay time using means already implemented in RTCP (the RTP Control Protocol) and added to the setup time.
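The jitter estimate referred to above can be computed with the interarrival jitter formula defined for RTCP in RFC 3550 (section 6.4.1): J(i) = J(i-1) + (|D(i-1,i)| - J(i-1))/16, where D is the change in transit time between consecutive packets. How the result is folded into the setup time (here, mean delay plus `k` times the jitter) is an assumption, not something the specification prescribes.

```python
from typing import List, Tuple

def interarrival_jitter(samples: List[Tuple[float, float]]) -> float:
    """Interarrival jitter estimate as in RTCP (RFC 3550, section 6.4.1).

    `samples` are (send_timestamp, arrival_time) pairs in the same time units.
    """
    jitter = 0.0
    for (s_prev, r_prev), (s_cur, r_cur) in zip(samples, samples[1:]):
        d = (r_cur - r_prev) - (s_cur - s_prev)  # change in transit time
        jitter += (abs(d) - jitter) / 16.0
    return jitter

def setup_lead_time(load_time: float, mean_delay: float, jitter: float, k: float = 2.0) -> float:
    """Setup time padded with network delay and a jitter margin (k is an assumed safety factor)."""
    return load_time + mean_delay + k * jitter
```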
[0119]
FIGS. 11 to 24 are diagrams illustrating specific examples of protocols actually exchanged between terminals.
[0120]
The transmission format and transmission procedure are described in ASN.1. This transmission format is an extension based on ITU-T H.245. As shown in FIG. 11, image and audio objects may have a hierarchical structure. In this example, each object has as attributes the identifier of the broadcast program (ProgramID) and the object ID (SSRC), and the structure information between images and the composition method are described in a script language such as Java or VRML.
[0121]
FIG. 11 is a diagram illustrating an example of a relationship between objects.
[0122]
In the figure, an object is a medium such as video, audio, CG, or text. In this example, the objects have a hierarchical structure. Each object has a program number (“Program ID”, corresponding to a TV channel) and an object identifier (“Object ID”) that identifies the object. When each object is transmitted using RTP (Real-time Transport Protocol), mapping the object identifier to the SSRC (synchronization source identifier) makes the object easy to identify. The structure between objects can be described in a description language such as Java or VRML.
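The object hierarchy, and the mapping from object identifier to SSRC, can be sketched with a small tree that is flattened into an index so that an incoming RTP packet's SSRC locates its object directly. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MediaObject:
    """One node in the object hierarchy (video, audio, CG, text, ...)."""
    program_id: int   # corresponds to a TV channel
    object_id: int    # carried as the RTP SSRC when the object is sent over RTP
    media: str
    children: List["MediaObject"] = field(default_factory=list)

def index_by_ssrc(root: MediaObject) -> Dict[int, MediaObject]:
    """Flatten the hierarchy so that an RTP packet's SSRC identifies its object."""
    index: Dict[int, MediaObject] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        index[node.object_id] = node
        stack.extend(node.children)
    return index
```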
[0123]
There are two possible ways of transmitting these objects. One is the broadcast type, in which the transmitting terminal transmits unilaterally. The other is the communication type, in which objects are exchanged between transmitting and receiving terminals (terminal A and terminal B).
[0124]
For example, on the Internet, RTP can be used as the transmission method. Control information is transmitted on the transmission channel called LCN0 in the videophone standard. In this example, a plurality of transmission channels are used, but the same program channel (Program ID) is assigned to all of them.
[0125]
FIG. 12 is a diagram for explaining how a protocol realizing the functions of the present invention is implemented. Here, the transmission protocol H.245 used in the videophone standards (H.324, H.323) is described; the functions of the present invention are realized by extending H.245.
[0126]
The description method shown in the figure is the protocol notation called ASN.1. “Terminal Capability Set” expresses the capabilities of the terminal. In the example shown in the figure, it is defined as an extension of H.245.
[0127]
In FIG. 13, “mpeg4 Capability” indicates the maximum number of videos (“Max Number Of Video”) and the maximum number of audio streams (“Max Number Of Sounds”) that the terminal can process simultaneously, and the maximum number of multiplexing functions the terminal can realize (“Max Number Of Mux”).
[0128]
In the figure, these are collectively expressed as the maximum number of objects that can be processed (“Number Of Process Object”). A flag indicating whether the communication header (denoted AL in the figure) can be changed is also written; when this value is true, the communication header can be changed. When terminals notify each other of the number of objects they can process using “MPEG4 Capability”, the notified side returns “MPEG4 Capability Ack” to the terminal that sent “MPEG4 Capability” if it can accept (process) it, and “MPEG4 Capability Reject” otherwise.
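The capability exchange can be sketched as a reply function on the notified side. The field names follow the figure's labels, but the Python encoding is hypothetical, and computing “Number Of Process Object” as the sum of the video and audio maxima is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Mpeg4Capability:
    """Fields named after the extended H.245 capability of FIG. 13 (encoding is hypothetical)."""
    max_video: int
    max_sounds: int
    max_mux: int
    al_changeable: bool = False

    @property
    def number_of_process_object(self) -> int:
        # Summing video and audio maxima is an assumption about the collective value.
        return self.max_video + self.max_sounds

def respond(local: Mpeg4Capability, requested_objects: int) -> str:
    """Reply to a peer's "MPEG4 Capability" proposal."""
    if requested_objects <= local.number_of_process_object:
        return "MPEG4 Capability Ack"
    return "MPEG4 Capability Reject"
```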
[0129]
FIG. 14 shows how to describe the protocol for using the Group MUX described above to multiplex a plurality of logical channels onto one transmission channel (in this example, a LAN transmission channel) so that they can share it. In the example of the figure, the multiplexing means (Group MUX) is associated with a LAN (local area network) transmission channel (“LAN Port Number”). “Group MuxID” is an identifier for identifying the multiplexing means. When terminals use “Create Group Mux” to notify each other that multiplexing means will be used, the notified side returns “Create Group Mux Ack” to the terminal that sent “Create Group Mux” if it can accept (use) it, and “Create Group Mux Reject” otherwise. Demultiplexing means, which perform the reverse operation of the multiplexing means, can be realized in a similar way.
[0130]
FIG. 15 describes the case where an already created multiplexing means is deleted.
[0131]
FIG. 16 describes the relationship between a LAN transmission channel and a plurality of logical channels.
[0132]
A LAN transmission channel is described as “LAN Port Number”, and a plurality of logical channels are described as “Logical Port Number”.
[0133]
In the example of the figure, a maximum of 15 logical channels can be associated with one LAN transmission channel.
[0134]
In the figure, when only one MUX can be used, the Group Mux ID is not needed. When a plurality of MUXes are used, a Group Mux ID is required for each H.223 command. A flag may also be provided for notifying the correspondence between the ports used by the multiplexing and demultiplexing means, as well as a command for selecting whether control information is multiplexed or transmitted over a separate logical transmission path.
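The receiver's handling of “Create Group Mux” and deletion requests can be sketched as a registry keyed by Group Mux ID, enforcing the limit of 15 logical channels per LAN transmission channel from the FIG. 16 example. The Python API and the rejection rules (duplicate ID or too many logical ports) are assumptions.

```python
from typing import Dict, List

MAX_LOGICAL_PER_LAN = 15  # limit given in the FIG. 16 example

class MuxRegistry:
    """Receiver-side bookkeeping for Create/Delete Group Mux requests (hypothetical API)."""

    def __init__(self) -> None:
        self.muxes: Dict[int, Dict[str, object]] = {}  # Group Mux ID -> attributes

    def create(self, group_mux_id: int, lan_port: int, logical_ports: List[int]) -> str:
        """Handle "Create Group Mux": reject a duplicate ID or too many logical channels."""
        if group_mux_id in self.muxes or len(logical_ports) > MAX_LOGICAL_PER_LAN:
            return "Create Group Mux Reject"
        self.muxes[group_mux_id] = {"lan_port": lan_port, "logical_ports": list(logical_ports)}
        return "Create Group Mux Ack"

    def delete(self, group_mux_id: int) -> bool:
        """Remove an already created multiplexing means (FIG. 15); False if it does not exist."""
        return self.muxes.pop(group_mux_id, None) is not None
```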
[0135]
In the description of FIGS. 14 to 16, the transmission channel is a LAN, but a system that does not use the Internet protocol, such as H.223 or MPEG-2, may also be used.
[0136]
In FIG. 17, “Open Logical Channel” is a protocol description defining the attributes of a transmission channel. In the example of the figure, “MPEG4 Logical Channel Parameters” is defined as an extension of the H.245 protocol.
[0137]
FIG. 18 shows that a program number (corresponding to a TV channel) and a program name are associated with a LAN transmission channel (“MPEG4 Logical Channel Parameters”).
[0138]
Also in the figure, “Broadcast Channel Program” is the description used when the correspondence between LAN transmission channels and program numbers is transmitted in broadcast form. In this example, the correspondence for up to 1023 transmission channels can be sent. Since broadcasting is one-way from the transmitting side to the receiving side, this information must be transmitted periodically to allow for losses in transit.
[0139]
In FIG. 19, the attributes of objects (for example, video and audio) transmitted as a program are described (“MPEG4 Object Class definition”). Object information (“Object Structure Element”) is associated with a program identifier (“Program ID”); up to 1023 objects can be associated. The object information describes the LAN transmission channel (“LAN Port Number”), a flag indicating whether scrambling is used (“Scramble Flag”), a field defining an offset value (“CGD Offset”) for changing the processing priority when the terminal is overloaded, and an identifier (“Media Type”) identifying the type of media (video, audio, etc.) being transmitted.
[0140]
In the example of FIG. 20, the AL is defined as the additional information needed to manage the decoding of an ES (defined here as the data string corresponding to one video frame). The AL information consists of (1) Random Access Flag (a flag indicating whether the frame can be played back independently; true for an intra-coded video frame), (2) Presentation Time Stamp (the display time of the frame), and (3) CGD Priority (a priority value used to determine the processing priority when the terminal is overloaded). An example is shown in which the data string for one frame is transmitted using RTP (Real-time Transport Protocol, a protocol for transmitting continuous media over the Internet). “AL Reconfiguration” is a transmission expression for changing the maximum value that each of the above AL fields can represent.
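The role of CGD Priority under overload can be sketched as selecting the most important frames across all objects when only `budget` frames can be decoded. Treating higher values as more important, and applying each object's “CGD Offset” from FIG. 19 by adding it to the priority of that object's frames, are assumptions; the specification does not define the arithmetic.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AlFrame:
    """AL fields of FIG. 20 for one video frame (ES)."""
    random_access: bool   # True for an intra-coded frame
    pts: int              # presentation time stamp
    cgd_priority: int     # higher value = more important (assumed convention)

def select_for_decoding(frames: List[Tuple[int, AlFrame]], budget: int) -> List[AlFrame]:
    """Under overload, decode only the `budget` most important frames across objects.

    `frames` holds (cgd_offset of the owning object, frame) pairs. The kept frames
    are returned in presentation order.
    """
    ranked = sorted(frames, key=lambda p: p[1].cgd_priority + p[0], reverse=True)
    return sorted((f for _, f in ranked[:budget]), key=lambda f: f.pts)
```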
[0141]
In the example of the figure, up to 2 bits can be used for the Random Access Flag (“Random Access Flag Max Bit”). For example, if the value is 0, the Random Access Flag is not used; if it is 2, the maximum value is 3.
[0142]
Alternatively, the value may be expressed as a base and an exponent (for example, 3^6). When no value is set, operation may follow a default state.
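The relationship between a field's bit width and the largest value it can carry is 2^n - 1, which gives 3 for the 2-bit example above. A minimal sketch follows; the 1-bit default applied when no width is set is a hypothetical choice.

```python
from typing import Optional

def max_representable(bits: Optional[int], default_bits: int = 1) -> Optional[int]:
    """Largest value an AL field can carry after "AL Reconfiguration".

    bits == 0 means the field is not used (returns None); bits is None means the
    width was not set, so an assumed default width applies.
    """
    if bits is None:
        bits = default_bits
    if bits == 0:
        return None
    return (1 << bits) - 1

def from_base_exponent(base: int, exponent: int) -> int:
    """Alternative notation such as 3^6 for large maxima."""
    return base ** exponent
```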
[0143]
In FIG. 21, “Setup Request” is a transmission expression for conveying setup time. Before the program is sent, “Setup Request” is sent to the receiving terminal, associating the transmission channel number (“Logical Channel Number”), the ID of the program to be executed (“execute Program Number”), the ID of the data to be used (“data Number”), and the ID of the command to be executed (“execute CommandNumber”). As an alternative representation, the transmission channel number may be associated with an execution permission flag (“flag”), a counter (“counter”) indicating after how many received Setup Requests execution should occur, or a timer value (“timer”) indicating how much time remains before execution.
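The alternative flag, counter, and timer representations can be sketched as a receiver-side decision made each time a Setup Request arrives. The field names follow the figure's labels, while the Python types, units, and the `elapsed` argument (time since the first request) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SetupRequest:
    """Fields modelled on FIG. 21 (types and units are assumptions)."""
    logical_channel_number: int
    execute_program_number: int
    data_number: int
    execute_command_number: int
    flag: bool = False              # execution permitted immediately
    counter: Optional[int] = None   # execute after this many Setup Requests are received
    timer: Optional[float] = None   # execute once this much time has elapsed

class SetupScheduler:
    def __init__(self) -> None:
        self.received = 0

    def should_execute(self, req: SetupRequest, elapsed: float) -> bool:
        """Call once per received Setup Request; `elapsed` is time since the first one."""
        self.received += 1
        if req.flag:
            return True
        if req.counter is not None and self.received >= req.counter:
            return True
        return req.timer is not None and elapsed >= req.timer
```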
[0144]
Examples of requests scheduled in this way include rewriting AL information and reserving the start-up time of a Group Mux.
[0145]
FIG. 22 is a diagram for explaining a transmission expression (“Control AL definition”) for notifying the receiving terminal whether each AL field described in FIG. 20 is used.
[0146]
In the figure, if “Random Access Flag Use” is true, the Random Access Flag is used; otherwise it is not. This AL change notification may be transmitted as control information on a transmission channel different from that of the data, or together with the data on the same transmission channel.
[0147]
An example of a program to be executed is a decoder program. The Setup Request can be used for both broadcasting and communication. The above request instructs the receiving terminal which items are used as control information and which as AL information. Likewise, the receiving terminal can be instructed which items are used in the communication header, which as AL information, and which as control information.
[0148]
FIG. 23 shows an example of a transmission expression for changing, between the transmitting and receiving terminals, the structure of the header information (data management information, transmission management information, control information) using an information framework identifier (“header ID”).
[0149]
In the figure, “class ES header” uses the information framework identifier to distinguish, between the transmitting and receiving terminals, the structure of the data management information and transmission management information transmitted on the same transmission channel as the data.
[0150]
For example, if the “header ID” value is 0, only the buffer size ES item is used, and if the “header ID” value is 1, the “reserved” item is used.
[0151]
Further, a default identifier (“use Header Extension”) determines whether the default information framework format is used. If “use Header Extension” is true, the items inside the if statement are used. This structure information is negotiated in advance between the transmitting and receiving terminals. Either the information framework identifier or the default identifier may be used on its own.
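Parsing an ES header whose layout is selected by “header ID” and extended when “use Header Extension” is true can be sketched as follows. The byte widths and the name of the extension item are hypothetical; only the two header ID cases (buffer size ES for 0, reserved for 1) come from the example above.

```python
from typing import Dict, List, Tuple

# Hypothetical information frameworks selected by "header ID": ordered (field, byte width).
FRAMEWORKS: Dict[int, List[Tuple[str, int]]] = {
    0: [("buffer_size_es", 4)],
    1: [("reserved", 2)],
}
EXTENSION = [("extension_value", 2)]  # items inside the "if (use Header Extension)" branch

def parse_es_header(buf: bytes, header_id: int, use_header_extension: bool) -> Dict[str, int]:
    """Read big-endian fields in the order given by the negotiated framework."""
    layout = FRAMEWORKS[header_id] + (EXTENSION if use_header_extension else [])
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = int.from_bytes(buf[pos:pos + width], "big")
        pos += width
    return fields
```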
[0152]
In FIG. 24, “AL configuration” shows an example in which the structure of control information transmitted on a transmission channel different from that of the data is changed between the transmitting and receiving terminals according to the application. The information framework identifier and the default identifier are used in the same way as described above for “class ES header”.
[0153]
In the present invention, methods for realizing a system that simultaneously composes and displays a plurality of moving images and audio streams have been described concretely from the following viewpoints.
[0154]
(1) A method of transmitting images and audio (by communication and broadcasting) using a plurality of logical transmission paths, and of controlling that transmission. In particular, a method was described in which control information and data are transmitted independently over separate logical transmission paths.
[0155]
(2) A method for dynamically changing header information (AL information) added to image or audio data to be transmitted.
[0156]
(3) A method for dynamically changing the communication header information added for transmission.
[0157]
Specifically, for (2) and (3), a method of managing in an integrated way the information that overlaps between the AL information and the communication header, and a method of transmitting AL information as control information, were described.
[0158]
(4) A method of transmitting information by dynamically multiplexing and demultiplexing a plurality of logical transmission paths.
[0159]
Methods for saving transmission path channels and for realizing efficient multiplexing were described.
[0160]
(5) A method of transmitting images and audio that takes into account the time needed to read and start up programs and data. Methods of reducing the apparent setup time for various functions and applications were described.
[0161]
(6) A method of transmitting images and audio that accommodates zapping.
[0162]
The present invention is not limited to two-dimensional image composition. An expression format combining two-dimensional and three-dimensional images may be used, and an image composition method that joins a plurality of adjacent images, such as a wide-field (panoramic) image, may also be included.
[0163]
Further, the communication forms targeted by the present invention are not limited to wired bidirectional CATV and B-ISDN. For example, video and audio may be transmitted from the center-side terminal to the home-side terminal by radio (for example, the VHF or UHF band) or by satellite broadcasting, and information may be transmitted from the home-side terminal to the center-side terminal over an analog telephone line or N-ISDN (video, audio, and data need not be multiplexed).
[0164]
Communication forms using radio, such as IrDA, PHS (Personal Handy-phone System), and wireless LAN, may also be used. The target terminal may be a portable terminal such as a personal digital assistant, or a desktop terminal such as a set-top box or personal computer. Application fields include videophones, multipoint monitoring systems, multimedia database search systems, and games. The present invention covers not only receiving terminals but also servers and relay devices connected to receiving terminals.
[0165]
Further, in the above examples, methods for avoiding duplication between the RTP (communication) header and the AL information, and methods for extending the RTP communication header and the AL information, have been described. However, the present invention does not necessarily require RTP. For example, a proprietary communication header or AL information may be newly defined on top of UDP or TCP. The Internet profile sometimes uses RTP, but the Raw profile does not define a multifunctional header such as that of RTP. As described above, there are four approaches to the relationship between the AL information and the communication header.
[0166]
As described above, the data management information, transmission management information, and control information used by the transmitting and receiving terminals are each given an information framework, that is, the order of the items added and their bit widths (for example, a 1-bit random access flag assigned first, followed by a 16-bit sequence number). Because this information framework is determined dynamically, it can be changed according to the situation, such as the application and the transmission path.
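A dynamically determined information framework can be represented as an ordered list of (item name, bit width) pairs and used to pack and unpack headers bit by bit. The sketch below uses the 1-bit random access flag and 16-bit sequence number from the example; the padding of the packed result to a whole byte is an assumption.

```python
from typing import Dict, List, Tuple

Framework = List[Tuple[str, int]]  # ordered (item name, bit width)

def pack(values: Dict[str, int], framework: Framework) -> bytes:
    """Pack items MSB-first in framework order, zero-padding to a whole byte."""
    acc, nbits = 0, 0
    for name, bits in framework:
        v = values[name]
        if v >= 1 << bits:
            raise ValueError(f"{name}={v} does not fit in {bits} bits")
        acc = (acc << bits) | v
        nbits += bits
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack(buf: bytes, framework: Framework) -> Dict[str, int]:
    """Inverse of pack(): drop the padding, then peel items off from the last one."""
    total = sum(bits for _, bits in framework)
    acc = int.from_bytes(buf, "big") >> (len(buf) * 8 - total)
    out: Dict[str, int] = {}
    for name, bits in reversed(framework):
        out[name] = acc & ((1 << bits) - 1)
        acc >>= bits
    return out
```

Changing the framework (reordering items or altering a width) changes only the list passed in; the packing code itself stays the same, which is what allows the framework to be switched per application or transmission path.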
[0167]
Note that the framework of each kind of information may be the one already shown in FIGS. 6A to 6D. In the case of RTP, the data management information (AL) may be the header information for each medium (for example, for H.263, the H.263-specific video header information and payload header information), the transmission management information may be the RTP header information, and the control information may be information for controlling RTP, such as RTCP.
[0168]
In addition, a default identifier, indicating whether information is transmitted, received, and processed according to a known information framework agreed in advance between the transmitting and receiving terminals, can be provided in the data management information, the transmission management information, and the control information (information transmitted in a packet separate from the data to control processing on the terminal side). This identifier shows whether the information framework has been changed. The default identifier is set, and the change (for example, shortening the time stamp from 32 bits to 16 bits) is notified by the method shown in FIGS. 19 to 20, only when a change is made; when the information framework is unchanged, no configuration information needs to be sent.
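The default-identifier rule can be sketched as below: change information is attached only when the framework differs from the one agreed in advance. The dictionary representation and the names `DEFAULT_WIDTHS`, `build_header`, and `resolve_framework` are assumptions for illustration; the 32-bit to 16-bit time stamp change is the example from the text.

```python
# Framework agreed in advance by both terminals (field -> bit width); hypothetical values.
DEFAULT_WIDTHS = {"sequence_number": 16, "timestamp": 32}


def build_header(widths: dict[str, int]) -> dict:
    """Sender: set the default identifier and list changes only if the framework changed."""
    changes = {k: v for k, v in widths.items() if DEFAULT_WIDTHS.get(k) != v}
    if not changes:
        return {"default_id": 0}               # known framework: no configuration info sent
    return {"default_id": 1, "changes": changes}


def resolve_framework(header: dict) -> dict[str, int]:
    """Receiver: start from the default framework and apply changes if the identifier is set."""
    widths = dict(DEFAULT_WIDTHS)
    if header["default_id"] == 1:
        widths.update(header["changes"])
    return widths
```

For example, `build_header({"sequence_number": 16, "timestamp": 16})` yields `{"default_id": 1, "changes": {"timestamp": 16}}`, while the unchanged framework yields only `{"default_id": 0}`.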
[0169]
For example, two methods can be considered for changing the information framework of the data management information. In the first, the change is described within the data itself: the default identifier for the data management information framework, which must be written in a fixed area or position of the data, is set, and the changes to the information framework are described after it.
[0170]
In the second, the change is described in the control information (information framework control information). To change the framework of the data management information, the default identifier provided in the control information is set and the new contents of the data management information framework are described; the receiving terminal is notified of the change and replies with ACK or Reject, and data is transmitted once the change of the information framework has been confirmed. The same two methods can be used when the framework of the transmission management information or of the control information itself is changed (FIGS. 23 to 24).
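The second method, a change proposed over the control information and confirmed by ACK/Reject before data uses it, can be sketched as a small in-process simulation. The classes `Sender` and `Receiver`, the acceptance rule (the receiver accepts only fields it understands), and the field names are hypothetical; a real system would exchange these messages over a control channel.

```python
class Receiver:
    def __init__(self, supported_fields: set[str]):
        self.supported = set(supported_fields)
        self.framework = {"sequence_number": 16, "timestamp": 32}

    def on_framework_change(self, proposal: dict[str, int]) -> str:
        """Control-channel handler: ACK only if every proposed field is understood."""
        if set(proposal) <= self.supported:
            self.framework = dict(proposal)
            return "ACK"
        return "Reject"


class Sender:
    def __init__(self):
        self.framework = {"sequence_number": 16, "timestamp": 32}

    def change_framework(self, receiver: Receiver, proposal: dict[str, int]) -> str:
        reply = receiver.on_framework_change(proposal)   # sent as control information
        if reply == "ACK":
            # Data is sent with the new framework only after the change is confirmed.
            self.framework = dict(proposal)
        return reply
```

On Reject, both sides keep the framework they were already using, so data in flight stays interpretable.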
[0171]
As a more specific example, the header information of MPEG-2 is fixed, but a configuration stream describing how to change the information framework of the video and audio streams can be defined in the program map table (defined in the PSI) that relates the video streams and audio streams of an MPEG-2 TS (transport stream). If the default identifier is set, the configuration stream is interpreted first, and the headers of the video and audio streams are then interpreted according to its contents. The configuration stream may have the contents shown in FIGS.
[0172]
The contents (transmission format information) relating to the transmission method and / or the structure of data to be transmitted according to the present invention correspond to, for example, an information framework in the above embodiment.
[0173]
The above embodiment has mainly described transmitting the contents relating to the transmission method and/or the structure of the data to be transmitted. However, the present invention is not limited to this; a configuration in which only an identifier of those contents is transmitted is of course also possible. In this case, the transmitting apparatus may be, for example, as shown in FIG. 52, an image/audio transmitting apparatus comprising (1) transmission means 5001 that transmits, as transmission format information, the contents relating to the transmission method and/or the structure of the data to be transmitted, or an identifier indicating those contents, over the same transmission path as the data or over a different one, and (2) storage means 5002 that stores plural types of such contents together with their identifiers, the identifier being included in at least one of the data management information, the transmission management information, and the information for controlling processing on the terminal side. The receiving apparatus may be, for example, as shown in FIG. 53, an image/audio receiving apparatus comprising receiving means 5101 that receives the transmission format information sent from the image/audio transmitting apparatus, and transmission information interpreting means 5102 that interprets the received transmission format information. The image/audio receiving apparatus may further comprise storage means 5103 that stores plural types of contents relating to the transmission method and/or the structure of the data to be transmitted, together with their identifiers; when an identifier is received as the transmission format information, the contents stored in the storage means are used to interpret it.
[0174]
More specifically, plural information frameworks are determined and prepared in advance at the transmitting and receiving terminals, and an information framework identifier that distinguishes among the plural types of data management information, transmission management information, and control information (information framework control information) is transmitted together with the data or as control information. This makes it possible to identify each type of data management information, transmission management information, and control information, and to choose the information framework of each freely according to the type of media to be transmitted and the bandwidth of the transmission path. The identifier of the present invention corresponds to this information framework identifier.
[0175]
Because these information framework identifiers and default identifiers are placed in a predetermined fixed-length area or position of the transmitted information, the receiving terminal can read and interpret them even after the information framework has been changed.
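A framework identifier at a fixed position, combined with a table of frameworks stored in advance at both terminals, can be sketched as follows. The one-byte identifier at offset 0, the two `struct` layouts in `FRAMEWORKS`, and the functions `encode`/`decode` are assumptions chosen for illustration.

```python
import struct

# Hypothetical table held in advance by both terminals: identifier -> header layout.
FRAMEWORKS = {
    0: ">HI",   # 16-bit sequence number, 32-bit time stamp
    1: ">HH",   # 16-bit sequence number, 16-bit time stamp
}


def encode(framework_id: int, seq: int, ts: int, payload: bytes) -> bytes:
    # The identifier always occupies the first byte, whatever layout follows it.
    return bytes([framework_id]) + struct.pack(FRAMEWORKS[framework_id], seq, ts) + payload


def decode(packet: bytes) -> tuple[int, int, int, bytes]:
    framework_id = packet[0]                 # readable before the layout is known
    fmt = FRAMEWORKS[framework_id]           # look up the framework stored in advance
    size = struct.calcsize(fmt)
    seq, ts = struct.unpack(fmt, packet[1:1 + size])
    return framework_id, seq, ts, packet[1 + size:]
```

Only the identifier travels with each packet; the layout itself never has to be retransmitted as long as both sides hold the same table.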
[0176]
In addition to the configuration described in the above embodiment, a broadcast channel may be provided that carries only the headline images of the programs broadcast on a plurality of channels. When the viewer switches programs and setting up the program or its data takes time, the headline image of the selected program can first be presented to the viewer from this channel.
[0177]
As described above, according to the present invention, the information framework of the data management information, transmission management information, and control information used by the transmitting and receiving terminals is determined dynamically, so the framework can be changed to suit the situation, the application, and the transmission path.
[0178]
In addition, each of the data management information, the transmission management information, and the control information carries a default identifier indicating whether the information is transmitted, received, and processed according to a known information framework agreed in advance between the transmitting and receiving terminals, so it is possible to tell whether the information framework has been changed. Because the default identifier is set and the changes are notified only when a change has been made, no configuration information needs to be sent when the framework is unchanged.
[0179]
In addition, plural information frameworks are determined and prepared in advance at the transmitting and receiving terminals, and information framework identifiers identifying the plural types of data management information, transmission management information, and control information are transmitted together with the data or as control information. This makes it possible to identify the plural types of each kind of information and to choose the information framework of each freely according to the type of media to be transmitted and the bandwidth of the transmission path.
[0180]
Because these information framework identifiers and default identifiers are added to a predetermined fixed-length area or position of the transmitted information, the receiving terminal can read and interpret them even if the information framework has been changed.
[0181]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0182]
The embodiment described here mainly solves either of the problems (B1) and (B2) described above.
[0183]
The term “image” in the present invention covers both still images and moving images. The target image may also be a two-dimensional image such as computer graphics (CG), or three-dimensional image data composed of a wire-frame model or the like.
[0184]
FIG. 31 is a schematic configuration diagram of an image encoding / decoding device according to an embodiment of the present invention.
[0185]
The image encoding device 4101 comprises an image encoding unit 4012 that encodes image information using a scheme such as H.263, MPEG-1/2, JPEG, or Huffman coding, and a transmission management unit 4011 that transmits or records the various encoded information; the transmission management unit 4011 is a means of transmitting information, such as a coaxial cable, CATV, a LAN, or a modem. The image decoding device 4102 comprises a reception management unit 4013 that receives the various encoded information, an image decoding unit 4014 that decodes the received image information, an image synthesizing unit 4015 that synthesizes one or more decoded images, and an output unit 4016, such as a display or printer, that outputs the images.
[0186]
FIG. 32 is a schematic configuration diagram of a speech encoding / decoding device according to an embodiment of the present invention.
[0187]
The audio encoding device 4201 comprises a transmission management unit 4021 that transmits or records the various encoded information, and an audio encoding unit 4022 that encodes audio information using a scheme such as G.721 or MPEG-1 Audio. The audio decoding device 4202 comprises a reception management unit 4023 that receives the various encoded information, an audio decoding unit 4024 that decodes the audio information, an audio synthesizing unit 4025 that synthesizes one or more decoded audio signals, and an output unit 4026 that outputs the audio.
[0188]
Specifically, the time-series data of voice and moving images is encoded or decoded by each of the above devices.
[0189]
The environment of both FIG. 31 and FIG. 32 may be one, such as the Internet, in which a plurality of logical transmission paths can be used without being aware of the multiplexing means, or one, such as analog telephone lines and satellite broadcasting, in which the multiplexing means must be taken into account. Terminal connection modes include bidirectional transmission and reception of video and audio between terminals, as in videophones and video conference systems, and broadcast-type distribution of video and audio over satellite broadcasting, CATV, and the Internet.
[0190]
Similarly, a method of synthesizing images and audio can be defined by describing, in a script language such as JAVA, VRML, or MHEG, the images and audio, their structure information (display position and display time), how they are grouped, the image display layer (depth), the object ID (an ID identifying individual objects such as images and sounds), and the relationships among these attributes. The script describing the synthesis method can be obtained from a network or a local storage device.
[0191]
Note that a transmitting / receiving terminal may be configured with an arbitrary number and combination of image encoding devices, image decoding devices, audio encoding devices, and audio decoding devices.
[0192]
FIG. 33(a) shows a priority adding unit and a priority determining unit that manage processing priorities during overload. A priority adding unit 4031, which determines according to a predetermined criterion the processing priority during overload of information encoded by a method such as H.263 or G.723 and associates the determined priority with the encoded information, is provided in the image encoding device 4101 and the audio encoding device 4201.
[0193]
Criteria for adding priority include, for example, scene changes and image frames or streams designated by an editor or user for images, and voiced and silent sections for audio.
[0194]
The priority that defines processing precedence during overload can be added either to the communication header or, during encoding, embedded in the header of the encoded video or audio bitstream. With the former, the priority can be obtained without decoding; with the latter, a single bitstream carries its own priority independently of the system.
[0195]
As shown in FIG. 33(b), when priority information is added to the communication header and one image frame (for example, an intra-frame coded I frame or an inter-frame coded P or B frame) is divided into a plurality of transmission packets, the priority is added only to the communication header of the packet carrying the head of the image frame, the unit that can be accessed as a single piece of information (if the priority is the same within an image frame, it need not change until the head of the next accessible image frame appears).
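Adding the priority only to the packet that carries the head of a frame, with later fragments inheriting it, can be sketched as follows. The packet dictionary, the MTU-based fragmentation, and the names `packetize` and `effective_priorities` are illustrative assumptions, not a packet format from the patent.

```python
def packetize(frames: list[tuple[int, bytes]], mtu: int) -> list[dict]:
    """frames: (frame_priority, encoded frame). Only the head fragment carries the priority."""
    packets = []
    for priority, data in frames:
        for offset in range(0, len(data), mtu):
            head = offset == 0
            packets.append({
                "head": head,
                "priority": priority if head else None,
                "chunk": data[offset:offset + mtu],
            })
    return packets


def effective_priorities(packets: list[dict]) -> list[int]:
    """Receiver or relay: each fragment inherits the priority of the latest head packet."""
    current, out = None, []
    for p in packets:
        if p["head"]:
            current = p["priority"]
        out.append(current)
    return out
```

A 10-byte frame with an MTU of 4 produces three packets, and only the first of them carries the priority field.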
[0196]
Also, in the decoding apparatus, a priority determination unit 4032 that determines a processing method according to the priority at the time of overload of various received encoded information is provided in the image decoding apparatus 4102 and the audio decoding apparatus 4202. Prepare.
[0197]
34 to 36 are diagrams for explaining the granularity to which the priority is added. Decoding processing is performed using two types of priorities that determine the priority of processing when the terminal is overloaded.
[0198]
That is, a stream priority (Stream Priority), which defines the processing priority during overload in units of bitstreams such as video and audio, and a frame priority (Frame Priority; priority within time-series data), which defines the processing priority during overload in units of frames such as the video frames within the same stream, are defined (see FIG. 34).
[0199]
The former stream priority allows handling of multiple videos and audios. With the latter frame priority, different priorities can be added to the same intra-frame encoded video frame (I frame) according to the scene change of the video and the intention of the editor.
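How a priority determination unit might use the two levels under overload can be sketched as a selection that ranks frames first by stream priority and then by frame priority. The fixed per-interval capacity, the tuple layout, and the function name `select_for_decoding` are assumptions; smaller numbers are taken to mean higher priority, as in the stream A/stream B example of this description.

```python
def select_for_decoding(frames: list[tuple], capacity: int) -> list[tuple]:
    """
    frames: (stream_id, stream_priority, frame_priority, frame_no) tuples.
    Keeps the `capacity` most important frames, ranked by stream priority and
    then frame priority (smaller = more important), in their original order.
    """
    ranked = sorted(range(len(frames)), key=lambda i: (frames[i][1], frames[i][2]))
    keep = set(ranked[:capacity])
    return [f for i, f in enumerate(frames) if i in keep]
```

Because stream priority is compared first, an entire low-priority stream gives way before any frame of a higher-priority stream is dropped.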
[0200]
The meaning of the value expressed by the stream priority can be considered as a relative value or as an absolute value (see FIGS. 35 and 36).
[0201]
Stream priority and frame priority can be handled on the network by relay devices such as routers and gateways, and at the terminals by the transmitting and receiving terminals.
[0202]
There are two ways to express absolute and relative values: the method shown in FIG. 35 and the method shown in FIG. 36.
[0203]
In FIG. 35, the absolute priority is a value, added by an editor or mechanically, that indicates the order in which an image stream or audio stream is (or should be) processed during overload; it does not take actual network or terminal load fluctuations into account. The relative priority is a value that modifies the absolute priority according to the load on the terminal or the network.
[0204]
Managing priority separately as a relative value and an absolute value allows the transmitting side or a relay device to change only the relative value in response to fluctuations in network load and the like, while the image and audio streams can still be recorded to a hard disk or VTR with the absolute priority originally added to them. If the absolute priority values are recorded in this way, video and audio can be played back without being affected by network load fluctuations. The relative and absolute priorities may also be transmitted over the control channel independently of the data.
[0205]
Similarly, in FIG. 35, the frame priority, which defines the processing priority of frames during overload at a finer granularity than the stream priority, can be handled either as a relative priority value or as an absolute priority value. For example, the absolute frame priority is described in the encoded image information, and to reflect fluctuations in network or terminal load, a frame priority relative to that absolute priority is described in the communication header of the packet carrying the encoded information. Priority can thus be adjusted to network and terminal load while the original priority is preserved at the frame level.
[0206]
The relative priority may be transmitted by describing the correspondence with the frame on the control channel independently of the data instead of the communication header. As a result, recording to a hard disk or VTR is possible while retaining the absolute priority originally added to the image or audio stream.
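The separation in FIG. 35, with an absolute value fixed in the encoded frame and a load-dependent relative value in the communication header, can be sketched as below. Treating the relative value as an additive offset is an assumption (the text does not fix the combination rule), and `Packet`, `relay_adjust`, and `record` are hypothetical names; smaller numbers mean higher priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    absolute: int       # written in the encoded frame by the editor or encoder; never changed
    relative: int = 0   # carried in the communication header; adjusted for load

    def effective(self) -> int:
        # Assumption: the relative value is an additive offset on the absolute priority.
        return self.absolute + self.relative


def relay_adjust(pkt: Packet, congestion_offset: int) -> Packet:
    """A sender or relay changes only the relative part, e.g. demoting a packet under load."""
    return Packet(pkt.absolute, pkt.relative + congestion_offset)


def record(pkt: Packet) -> int:
    """Storage keeps the original absolute priority and drops the load-dependent part."""
    return pkt.absolute
```

However many relays adjust the offset, `record` still returns the priority the editor assigned.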
[0207]
On the other hand, in FIG. 35, when the receiving terminal plays back data as it arrives over the network without recording it, the receiving terminal does not need to manage the absolute and relative values separately. In that case, at both the frame and stream levels, the transmitting side may combine the absolute and relative priority values before transmission and send only the resulting absolute value.
[0208]
In FIG. 36, the absolute value priority is a value uniquely determined between frames obtained from the relationship between the Stream Priority and the Frame Priority. The relative value priority is a value indicating the order in which an image stream or an audio stream added by an editor or a machine is processed (or should be processed) in the event of an overload. In the example of FIG. 36, the frame priority (relative value) of each stream of video and audio and the stream priority are added to each stream.
[0209]
The absolute frame priority is obtained as the sum of the relative frame priority and the stream priority (that is, absolute frame priority = relative frame priority + stream priority). Instead of addition, the calculation may use subtraction or multiplication by a constant.
[0210]
The absolute frame priority is used mainly in the network, because with absolute values a relay device such as a router or gateway does not need to combine the Stream Priority and the Frame Priority to determine each frame's priority. Using the absolute frame priority therefore simplifies processing such as frame discarding in relay devices.
[0211]
On the other hand, the relative frame priority can be expected to be applied mainly to a storage system for recording and editing. In editing, a plurality of video and audio streams may be handled simultaneously. In such a case, there may be a limit to the number of video streams and frames that can be reproduced due to the load on the terminal or the network.
[0212]
In such a case, if the Stream Priority and the Frame Priority are managed separately, then when, for example, the editor wants a stream displayed preferentially or the user changes the Stream Priority of a stream he or she wants to see, only the Stream Priority needs to change; unlike the absolute representation, there is no need to recalculate every Frame Priority. The absolute and relative representations should therefore be used according to the application.
[0213]
In addition, by describing whether the stream priority value is used as a relative value or an absolute value, it is possible to express an effective priority during transmission and storage.
[0214]
In the example of FIG. 35, a flag or identifier expressing whether the value expressed by the stream priority is an absolute value or a relative value is provided in association with the stream priority. In the case of frame priority, since a relative value is described in the communication header and an absolute value is described in the encoded frame, no flag or identifier is necessary.
[0215]
In the example of FIG. 36, a flag or identifier indicating whether the frame priority is an absolute or a relative value is provided. An absolute value is a priority already calculated from the stream priority and the relative frame priority, so relay devices and terminals do not perform the calculation. When the calculation formula is known between the terminals, the receiving terminal can also work back from the absolute frame priority and the stream priority to the relative frame priority. For example, the absolute priority of a transmitted packet (Access Unit Priority) may be obtained from the relation “Access Unit Priority = Stream Priority − Frame Priority”. In this case, because the frame priority is subtracted from the stream priority, it may be called a subordinate priority.
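The FIG. 36 scheme, with the additive formula given in the text, an absolute/relative flag, and the reverse calculation at a receiver that knows the formula, can be sketched as follows. The header dictionary and the function names are hypothetical; the formula is the one stated above (absolute frame priority = relative frame priority + stream priority).

```python
def absolute_frame_priority(stream_priority: int, relative_frame_priority: int) -> int:
    # Formula from the text: absolute = relative frame priority + stream priority.
    return relative_frame_priority + stream_priority


def relative_from_absolute(absolute: int, stream_priority: int) -> int:
    # Reverse calculation, possible when both terminals know the formula.
    return absolute - stream_priority


def priority_for_relay(header: dict) -> int:
    """header: {'is_absolute': bool, 'frame_priority': int, 'stream_priority': int}."""
    if header["is_absolute"]:
        return header["frame_priority"]      # already combined: no calculation needed
    return absolute_frame_priority(header["stream_priority"], header["frame_priority"])
```

In an editing system that stores relative values, raising one stream's priority changes only `stream_priority`; the stored relative frame priorities stay untouched, which is the benefit described for the relative representation.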
[0216]
Further, the data processing may be managed by associating one or more stream priorities with the processing priority of data flowing through the TCP / IP logical channel (LAN port number).
[0217]
In addition, for images and audio, it can be expected that the need for retransmission processing can be reduced by assigning a lower stream priority or frame priority than text or control information. This is because even if a part of the image or sound is lost, there is often no problem.
[0218]
FIG. 37 is a diagram for explaining a method for assigning priorities to multi-resolution image data.
[0219]
When one stream consists of two or more substreams, how the substreams are processed during storage or transmission can be defined by adding a stream priority to each substream and describing logical sum (OR) or logical product (AND) relations among them.
[0220]
In the case of the wavelet, it is possible to decompose one video frame into a plurality of video frames having different resolutions. Also, in the DCT-based encoding method, it is possible to decompose into video frames having different resolutions by dividing and encoding into a high frequency component and a low frequency component.
[0221]
Multiple video streams, each composed of a series of decomposed video frames, are given stream priorities, and in addition, relations between the video streams are defined using AND (logical product) and OR (logical sum). Suppose stream A has a stream priority of 5 and stream B a stream priority of 10 (smaller numbers mean higher priority). If stream data is discarded purely by priority, stream B is discarded; by describing the relation between the streams, however, AND can be defined to mean that stream B is transmitted and processed without being discarded even if its priority is below the threshold.
[0222]
As a result, related streams can be processed without being discarded. In the case of OR, conversely, it is defined that it can be discarded. As before, the discarding process may be performed at the transmission / reception terminal or the relay terminal.
[0223]
Another relation operator is exclusive OR (EX-OR): when the same video clip is encoded into separate 24 kbps and 48 kbps streams, only one of the two needs to be reproduced.
[0224]
If the former has priority 10 and the latter priority 5, the latter may be reproduced according to the priority, or the user may choose which one to reproduce without following the priority.
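The AND, OR, and EX-OR relations above can be sketched as a discard decision over a set of streams, using the numbers from the examples (stream A at 5, stream B at 10; 24 kbps at 10, 48 kbps at 5). The threshold rule (keep if priority is numerically at or below the threshold), the group representation, and the name `select_streams` are assumptions for illustration.

```python
def select_streams(priorities: dict[str, int], threshold: int,
                   and_groups=(), xor_groups=()) -> set[str]:
    """
    priorities: stream -> priority, smaller number = higher priority.
    Streams with no relation behave as OR: kept only if priority <= threshold.
    AND group: if any member survives the threshold, every member is kept.
    EX-OR group: exactly one member is kept, the one with the best priority.
    """
    kept = {s for s, p in priorities.items() if p <= threshold}
    for group in and_groups:
        if kept & set(group):
            kept |= set(group)
    for group in xor_groups:
        best = min(group, key=lambda s: priorities[s])
        kept -= set(group)
        kept.add(best)
    return kept
```

The EX-OR branch follows the priority; a user override, also allowed by the text, would simply replace `best` with the user's choice.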
[0225]
FIG. 38 is a diagram for explaining a communication payload configuration method.
[0226]
When data consists of a plurality of substreams, building transmission packets according to the stream priorities added to the substreams, for example in descending order of priority, makes discarding at the transmission packet level easy. Likewise, at a finer granularity, gathering the information of objects with high frame priority into one communication packet also makes discarding at the communication packet level easy.
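Ordering packets by substream priority, so that discarding reduces to cutting the tail, can be sketched as below. Keeping each packet to a single substream (so its priority is unambiguous), the fixed payload size, and the names `build_packets` and `drop_to_fit` are assumptions; smaller numbers mean higher priority.

```python
def build_packets(substreams: list[tuple[int, bytes]], payload_size: int) -> list[tuple[int, bytes]]:
    """substreams: (priority, data). Returns (priority, payload) packets, highest priority first."""
    packets = []
    for prio, data in sorted(substreams, key=lambda s: s[0]):
        # Each packet carries data from one substream only, so its priority is unambiguous.
        for off in range(0, len(data), payload_size):
            packets.append((prio, data[off:off + payload_size]))
    return packets


def drop_to_fit(packets: list[tuple[int, bytes]], max_packets: int) -> list[tuple[int, bytes]]:
    # Packets are already in priority order, so discarding means keeping a prefix.
    return packets[:max_packets]
```

With a base layer, a middle layer, and an enhancement layer, dropping the last packets removes enhancement data first.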
[0227]
Note that it is easy to recover when a packet is dropped by associating the slice structure of an image with a communication packet. That is, by associating the slice structure of a moving image with the structure of a packet, a resync marker for resynchronization becomes unnecessary. If the slice structure does not match the structure of the communication packet, it is necessary to add a resync marker (a mark for informing the position to return) so that resynchronization can be performed when information is lost due to a packet drop or the like.
[0228]
Along with this, stronger error protection may be applied to high-priority communication packets. The slice structure of an image refers to units of image information such as GOBs or MBs.
[0229]
FIG. 39 illustrates a method of associating data with a communication payload. By transmitting the method of associating a stream or object with communication packets together with the control information or data, an arbitrary data format can be generated according to the communication conditions and the application. For example, in RTP (Real-time Transport Protocol), an RTP payload format is defined for each encoding to be carried, and the current RTP formats are fixed. For H.263, three data formats, Mode A to Mode C, are defined as shown in FIG. No communication payload targeting multi-resolution video formats is defined for H.263.
[0230]
In the example of FIG. And the above-described relationship description (AND, OR) are defined in addition to the data format of Mode A.
[0231]
FIG. 40 is a diagram for explaining the correspondence between the frame priority, the stream priority, and the communication packet priority.
[0232]
This figure shows an example in which the priority added to a communication packet on the transmission path is the communication packet priority, and the stream priority and frame priority are mapped to it.
[0233]
Normally, in communication using IP, the frame priority and stream priority added to image and audio data must be associated with the priority of the lower-layer IP packets when the data is transmitted, because image and audio data are divided into IP packets for transmission. In the example shown in the figure, the stream priority takes values from 0 to 3 and the frame priority from 0 to 5, so the upper-layer data can take priorities from 0 to 15.
[0234]
In IPv6, priority values 0 to 7 of the 4-bit priority field are reserved for congestion-controlled traffic, and values 8 to 15 for real-time traffic or other traffic not subject to congestion control. Priority 15 is the highest and priority 8 the lowest. This is a priority at the IP packet level.
[0235]
When data is transmitted over IP, the upper-layer priorities 0 to 15 must be associated with the lower-layer IP priorities 8 to 15. The association may clip part of the upper-layer priority range or may use an evaluation function. The correspondence between the upper-layer data priorities and the lower-layer IP priorities is managed by relay nodes (such as routers and gateways) and by the transmitting and receiving terminals.
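The two kinds of association mentioned, clipping and an evaluation function, can be sketched as below. Assumptions: upper-layer value 0 is the most important (matching the smaller-is-higher convention used for stream priorities in this description), IPv6 priority 15 is the highest as stated above, and the specific linear function is only one hypothetical choice.

```python
def to_ipv6_clip(upper: int) -> int:
    """Clipping: upper 0..7 map one-to-one onto IPv6 15..8; anything less important becomes 8."""
    if not 0 <= upper <= 15:
        raise ValueError("upper-layer priority must be in 0..15")
    return max(8, 15 - upper)


def to_ipv6_linear(upper: int) -> int:
    """Evaluation function: compress 0..15 evenly onto 15..8 (two upper levels per IP level)."""
    if not 0 <= upper <= 15:
        raise ValueError("upper-layer priority must be in 0..15")
    return 15 - upper // 2
```

Clipping preserves fine distinctions among the most important data and merges the rest, whereas the linear function keeps coarse distinctions across the whole range.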
[0236]
Note that the transmission means is not limited to IP; transmission packets that carry a flag indicating whether they may be discarded, such as ATM cells or MPEG-2 TS (transport stream) packets, may also be targeted.
[0237]
The frame priority and stream priority described so far can be applied to transmission media and data recording media. A floppy disk, an optical disk, or the like can be used as the data recording medium.
[0238]
The recording medium is not limited to these; any medium capable of recording a program, such as an IC card or a ROM cassette, can be used in the same way. Image/audio relay devices that relay data, such as routers and gateways, may also be targeted.
[0239]
In addition, determining the time-series data to be retransmitted from the Stream Priority (priority between time-series data) and Frame Priority (priority within time-series data) enables preferential retransmission. For example, when the receiving terminal decodes based on the priority information, it can avoid requesting retransmission of streams and frames that will not be processed.
[0240]
In addition to the priority currently being processed, the streams or frames to be retransmitted preferentially may be determined from the relationship between the number of retransmissions and the number of successful transmissions.
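Receiver-side selection of what to request again can be sketched as follows: lost frames ranked worse than what is currently being decoded are skipped, and frames that have used up a retry budget are given up on. The lexicographic (stream, frame) threshold, the `max_attempts` retry budget, and the name `frames_to_request` are hypothetical stand-ins for the criteria named above; smaller numbers mean higher priority.

```python
def frames_to_request(lost: list[tuple[int, int, str]],
                      threshold: tuple[int, int],
                      attempts: dict[str, int],
                      max_attempts: int = 3) -> list[str]:
    """
    lost: (stream_priority, frame_priority, frame_id) for frames that did not arrive.
    threshold: worst (stream, frame) priority pair currently being decoded; anything
    ranked worse would not be processed, so requesting it again is pointless.
    attempts: retransmissions already requested per frame (hypothetical retry budget).
    Returns frame ids to request, most important first.
    """
    wanted = [f for f in lost
              if (f[0], f[1]) <= threshold and attempts.get(f[2], 0) < max_attempts]
    return [f[2] for f in sorted(wanted)]
```

The same ranking could be reused on the transmitting side to decide which streams and frames to send when the network is overloaded.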
[0241]
Similarly, the transmitting terminal can perform preferential transmission by determining the time-series data to be transmitted from the Stream Priority (priority between time-series data) and the Frame Priority (priority within time-series data). For example, by determining the priority of the streams and frames to transmit based on the average transfer rate and the number of retransmissions, video and audio can be transmitted adaptively even when the network is overloaded.
[0242]
The above embodiment is not limited to two-dimensional image composition. An expression format combining two-dimensional and three-dimensional images may be used, and an image composition method that joins a plurality of adjacent images, such as a wide-field (panoramic) image, may be included. Further, the communication form targeted by the present invention is not limited to wired bidirectional CATV and B-ISDN. For example, video and audio may be transmitted from the center-side terminal to the home-side terminal by radio waves (for example, the VHF or UHF band) or satellite broadcasting, while information is transmitted from the home-side terminal to the center-side terminal over an analog telephone line or N-ISDN (video, audio, and data need not be multiplexed). A communication form using radio, such as IrDA, PHS (Personal Handy-phone System), or a wireless LAN, may also be used.
[0243]
Furthermore, the target terminal may be a portable terminal such as a portable information terminal, or a desktop terminal such as a set-top box or a personal computer.
[0244]
As described above, according to the present invention, it becomes easy to handle a plurality of video streams and audio streams and, reflecting the editor's intention, to reproduce important scene cuts preferentially in synchronization with the audio.
[0245]
Embodiments of the present invention will be described below with reference to the drawings.
[0246]
The embodiments described here mainly address one of the problems (C1) to (C3) described above.
[0247]
FIG. 41 shows the configuration of the transmission apparatus according to the first embodiment. Reference numeral 2101 denotes an image input terminal; the input image is, for example, 144 pixels vertically by 176 pixels horizontally. Reference numeral 2102 denotes a moving image encoding apparatus, which includes four components 1021, 1022, 1023, and 1024 (see Recommendation H.261).
[0248]
Reference numeral 1021 denotes a switcher that divides the input image into macroblocks (square areas of 16 pixels vertically by 16 pixels horizontally) and determines, for each block, whether to encode it as intra or inter. Reference numeral 1022 denotes a motion compensation means that creates a motion-compensated image from the locally decoded image, which can be computed from the encoding results so far, calculates the difference between this image and the input image, and outputs the result in units of macroblocks. Two types of motion compensation are available: half-pel motion compensation, which has a long processing time, and full-pel motion compensation, which has a short processing time. Reference numeral 1023 denotes orthogonal transform means that performs a DCT on each macroblock, and 1024 denotes variable length coding means that entropy-codes the DCT result and the other encoded information.
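For the example image size given above (176 × 144 pixels, i.e. QCIF), division into 16 × 16 macroblocks yields 11 × 9 = 99 macroblocks, which bounds the per-image execution count of each component. A small sketch, with the hypothetical function name `macroblock_origins`:

```python
def macroblock_origins(width=176, height=144, mb_size=16):
    """Top-left (x, y) corner of every macroblock, in raster-scan order."""
    if width % mb_size or height % mb_size:
        raise ValueError("image size must be a multiple of the macroblock size")
    return [(x, y)
            for y in range(0, height, mb_size)
            for x in range(0, width, mb_size)]


origins = macroblock_origins()
print(len(origins))  # 99
```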
[0249]
Reference numeral 2103 denotes counting means, which counts, for each input image, the number of executions of each of the four components of moving image encoding apparatus 2102 and outputs the result to the conversion means. For motion compensation means 1022, executions are counted separately for half-pel and full-pel motion compensation.
[0250]
Reference numeral 2104 denotes conversion means, which outputs a data string as shown in FIG. Reference numeral 2105 denotes transmission means, which multiplexes the variable-length code from moving image encoding apparatus 2102 and the data string from conversion means 2104 into a single data string and outputs it to data output terminal 2109.
[0251]
With the configuration described above, the number of executions of essential processing (switcher 1021, orthogonal transform unit 1023, variable length encoding unit 1024) and non-essential processing (motion compensation unit 1022) can be transmitted to the receiving apparatus.
[0252]
Next, FIG. 48 is a flowchart of the transmission method according to the second embodiment.
[0253]
Since the operation of this embodiment is similar to that of the first embodiment, the corresponding elements are noted in parentheses. At 801, an image is input (image input terminal 2101), and at 802, the image is divided into macroblocks. Steps 803 to 806 are then repeated until the conditional branch at 807 determines that all macroblocks have been processed. To record how many times each of steps 803 to 806 is executed, a dedicated variable is incremented by one each time the corresponding step is executed.
[0254]
First, at 803, it is determined whether the macroblock to be processed is to be encoded as intra or inter (switcher 1021). For inter, motion compensation is performed at 804 (motion compensation means 1022). DCT and variable-length coding are then performed at 805 and 806 (orthogonal transform means 1023 and variable length coding means 1024). When all macroblocks have been processed (Yes at 807), at 808 the variables holding the execution count of each process are read to generate a data string as shown in FIG., and this data string and the code are multiplexed and output. Steps 801 to 808 are repeated as long as input images continue.
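The counting in steps 803 to 808 can be sketched as follows. This is a minimal illustration under stated assumptions: the per-macroblock intra/inter decision and motion compensation type are supplied as input (the hypothetical `decisions` list) rather than computed, the actual DCT and coding are omitted, and the element names used as dictionary keys are invented here.

```python
def count_executions(decisions):
    """decisions: one (mode, mc_type) pair per macroblock, where mode is
    "intra" or "inter" and mc_type is "half" or "full" for inter blocks."""
    counts = {"switch": 0, "half_pel_mc": 0, "full_pel_mc": 0, "dct": 0, "vlc": 0}
    for mode, mc_type in decisions:
        counts["switch"] += 1                 # 803: intra/inter decision
        if mode == "inter":
            counts[mc_type + "_pel_mc"] += 1  # 804: half- or full-pel motion compensation
        counts["dct"] += 1                    # 805: DCT
        counts["vlc"] += 1                    # 806: variable-length coding
    return counts                             # 808: basis of the per-image data string
```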
[0255]
With the above configuration, the number of executions of each process can be transmitted.
[0256]
Next, FIG. 43 shows a configuration of a receiving apparatus according to the third embodiment.
[0257]
In the figure, reference numeral 307 denotes an input terminal that receives the output of the transmission apparatus of the first embodiment, and 301 denotes receiving means that demultiplexes this output into the variable-length code and the data string and outputs them; at the same time, it measures and outputs the time required to receive one picture's worth of data.
[0258]
Reference numeral 303 denotes a moving picture decoding apparatus that receives the variable-length code and consists of five components. 3031 is variable length decoding means that extracts the DCT coefficients and other encoded information from the variable-length code; 3032 is inverse orthogonal transform means that performs an inverse DCT on the DCT coefficients; and 3033 is a switch that, for each macroblock, routes the output to one of two paths according to the encoded information indicating whether the macroblock was intra- or inter-coded. 3034 is motion compensation means that creates a motion-compensated image from the previous decoded image and the motion information, adds the output of inverse orthogonal transform means 3032 to it, and outputs the result. 3035 is execution time measuring means that measures and outputs the execution time from the input of the variable-length code to decoding apparatus 303 until decoding is complete and the image is output. Reference numeral 302 denotes estimation means that estimates the execution time of each element (variable length decoding means 3031, inverse orthogonal transform means 3032, switch 3033, and motion compensation means 3034) from the number of executions of each element, taken from the data string supplied by receiving means 301, and the execution time supplied by execution time measuring means 3035.
[0259]
If linear regression is used as the estimation method, for example, the measured execution time can be taken as the objective variable y and the number of executions of each element as the explanatory variables x_i, so that y ≈ Σ_i a_i x_i; the regression parameter a_i can then be regarded as the execution time of each element. Linear regression, however, requires accumulating a sufficient amount of past data and therefore consumes a lot of memory. If this is undesirable, the internal state variables may instead be estimated with a Kalman filter. In that case, the observed value is the execution time, the execution times of the elements are the internal state variables, and the observation matrix C changes at every step according to the number of executions of each element. Reference numeral 304 denotes number-of-executions reduction means, which changes the number of executions of each element so as to reduce the number of executions of half-pel motion compensation and increase the number of executions of full-pel motion compensation by a corresponding number. This number is calculated as follows.
[0260]
First, the number of executions and the estimated execution time of each element are received from estimation means 302, and the decoding execution time is predicted. If this predicted time exceeds the time required to receive the data at receiving means 301, the number of executions of full-pel motion compensation is increased and that of half-pel motion compensation is decreased until the predicted time no longer exceeds it. Reference numeral 306 denotes a decoded image output terminal.
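The estimation by means 302 and the count reduction by means 304 can be sketched together as below. This is a minimal sketch under assumptions not stated in the text: the per-element execution times are modelled as a random walk with process noise `q` and observation noise `r` (values chosen arbitrarily), the observation for each picture is the measured decoding time y ≈ Σ_i a_i x_i so that the observation matrix is the row of execution counts x, and elements are identified by hypothetical dictionary keys. `ExecTimeKalman` and `limit_half_pel` are illustrative names, not part of the original description.

```python
class ExecTimeKalman:
    """Kalman filter whose state is the execution time a_i of each element."""

    def __init__(self, n, p0=1.0, q=1e-6, r=1e-4):
        self.n, self.q, self.r = n, q, r
        self.a = [0.0] * n                                   # state estimate
        self.P = [[p0 if i == j else 0.0 for j in range(n)]  # state covariance
                  for i in range(n)]

    def update(self, x, y):
        """x: execution counts of this picture (observation row), y: measured time."""
        n = self.n
        for i in range(n):                                   # predict: P <- P + qI
            self.P[i][i] += self.q
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(x[i] * Px[i] for i in range(n)) + self.r     # innovation variance
        k = [v / s for v in Px]                              # Kalman gain
        err = y - sum(x[i] * self.a[i] for i in range(n))    # innovation
        self.a = [self.a[i] + k[i] * err for i in range(n)]
        # P <- P - K x^T P; P is symmetric, so x^T P equals (P x)^T
        self.P = [[self.P[i][j] - k[i] * Px[j] for j in range(n)]
                  for i in range(n)]
        return self.a


def limit_half_pel(counts, times, budget):
    """Swap half-pel for full-pel motion compensation until the predicted time fits."""
    c = dict(counts)

    def predicted():
        return sum(times[e] * c[e] for e in c)

    while predicted() > budget and c["half_pel_mc"] > 0:
        c["half_pel_mc"] -= 1
        c["full_pel_mc"] += 1
    return c
```

In use, the estimate `a` (mapped back to element names) would be passed as `times` to `limit_half_pel`, together with the execution counts from the received data string and the measured reception time as `budget`. The Kalman form keeps only the current estimate and covariance, which is the memory advantage over batch regression mentioned above.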
[0261]
Even when the encoded information instructs motion compensation means 3034 to perform half-pel motion compensation, if the predetermined number of half-pel motion compensations has already been exceeded, the half-pel motion vector is rounded to full-pel precision and full-pel motion compensation is executed instead.
[0262]
According to the first and third embodiments described above, the execution time of the decoding process is predicted from the estimated execution time of each element, and if the prediction exceeds the time required to receive one picture's worth of data (the specified time), half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation. This prevents the execution time from exceeding the specified time and solves problem (C1).
[0263]
Note that the IDCT processing time in the receiving apparatus can also be reduced by not using the high-frequency components in the IDCT calculation. That is, the calculation of the low-frequency components may be regarded as essential processing and the calculation of the high-frequency components as non-essential processing, and the number of high-frequency calculations in the IDCT may be reduced.
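A minimal sketch of this idea, assuming an orthonormal 8 × 8 IDCT written in the direct (non-fast) form for clarity: only coefficients whose horizontal and vertical frequency indices are both below `keep` are used, so `keep=8` is the full transform and, for example, `keep=4` evaluates 16 of the 64 basis terms per pixel. The function name and the `keep` parameter are illustrative; a practical decoder would use a fast separable IDCT.

```python
import math

N = 8


def _c(k):
    """Normalization factor of the orthonormal DCT basis."""
    return math.sqrt(0.5) if k == 0 else 1.0


def idct2_8x8(coef, keep=N):
    """Inverse 8x8 DCT of coef[v][u] using only terms with u < keep and v < keep."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(keep):          # vertical frequencies (low ones only)
                for u in range(keep):      # horizontal frequencies (low ones only)
                    s += (_c(u) * _c(v) * coef[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = s / 4
    return out
```

When the discarded high-frequency coefficients are zero, the reduced transform gives the same result as the full one; otherwise it yields a low-pass approximation, which is the quality loss traded for the shorter execution time.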
[0264]
Next, FIG. 49 is a flowchart of the reception method according to the fourth embodiment.
[0265]
Since the operation of this embodiment is similar to that of the third embodiment, the corresponding elements are noted in parentheses. At 901, the variables a_i expressing the execution time of each element are initialized (estimation means 302). At 902, the multiplexed data is input and the time required to receive it is measured (receiving means 301). At 903, the multiplexed data is separated into the variable-length code and the data string, which are output (receiving means 301). At 904, the numbers of executions are extracted from the data string (FIG. 2) and set to x_i. At 905, the actual number of executions is calculated from the execution time a_i and the number of executions x_i of each element (number reduction means 304). At 906, measurement of the execution time of the decoding process is started; at 907, the decoding routine described below is run; and at 908, the measurement ends (moving picture decoding apparatus 303, execution time measuring means 3035). At 909, the execution time of each element is estimated from the decoding execution time measured at 908 and the actual number of executions of each element obtained at 905, and a_i is updated (estimation means 302). The above processing is executed for each input of multiplexed data.
[0266]
In decoding routine 907, variable-length decoding is performed at 910 (variable length decoding means 3031) and inverse orthogonal transformation at 911 (inverse orthogonal transform means 3032), and at 912 the process branches according to the intra/inter information extracted at 910 (switch 3033). For inter, motion compensation is performed at 913 (motion compensation means 3034). At 913, the number of half-pel motion compensations is counted, and once it exceeds the actual number of executions obtained at 905, half-pel motion compensation is replaced with full-pel motion compensation. When all macroblocks have been processed (914), the routine ends.
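The substitution at step 913 can be sketched as below, assuming that motion vectors are carried in half-pel units (an odd component means half-pel motion compensation is required) and that, when a half-pel vector is forced to full-pel precision, the half is simply dropped, i.e. rounded toward zero; the text does not specify the rounding rule. Variable-length decoding and the inverse DCT (910, 911) are omitted, and the dictionary layout of each macroblock and the function names are hypothetical.

```python
def round_toward_zero(v_half):
    """Convert a component in half-pel units to full-pel units, dropping the half."""
    return int(v_half / 2)  # int() truncates toward zero


def decode_picture(macroblocks, half_pel_limit):
    """Return the motion compensation actually applied to each macroblock."""
    half_used = 0
    applied = []
    for mb in macroblocks:
        # 910 (VLD) and 911 (inverse DCT) would run here; omitted in this sketch.
        if mb["mode"] == "intra":                 # 912: intra/inter branch
            applied.append(("intra", None))
            continue
        mvx, mvy = mb["mv"]                       # motion vector in half-pel units
        needs_half = mvx % 2 != 0 or mvy % 2 != 0
        if needs_half and half_used < half_pel_limit:
            half_used += 1                        # 913: counted half-pel MC
            applied.append(("half", (mvx / 2, mvy / 2)))
        else:                                     # full-pel MC, possibly substituted
            applied.append(("full", (round_toward_zero(mvx), round_toward_zero(mvy))))
    return applied
```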
[0267]
According to the second and fourth embodiments described above, the execution time of the decoding process is predicted from the estimated execution time of each element, and if the prediction exceeds the time required to receive one picture's worth of data (the specified time), half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation. This prevents the execution time from exceeding the specified time and solves problem (C1).
[0268]
Next, FIG. 44 shows a configuration of a receiving apparatus according to the fifth embodiment.
[0269]
Most of the constituent elements of this embodiment are the same as those described in the second embodiment; only the two added elements and the one modified element are described.
[0270]
Reference numeral 402 denotes estimation means, modified from estimation unit 302 described in the second embodiment so that the estimated execution time of each element is also output separately from the output to number limiting unit 304. Reference numeral 408 denotes transmission means that generates a data string as shown in FIG. 45 from the execution time of each element and outputs it. If each execution time is expressed as a 16-bit value in units of microseconds, times up to 65,535 µs (about 65 milliseconds) can be represented. Reference numeral 409 denotes an output terminal that sends this data string to the transmission side.
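The 16-bit microsecond representation mentioned above can be sketched with Python's `struct` module. The field order, big-endian byte order, absence of any header, and clamping of values above 65,535 µs are assumptions, since the layout of FIG. 45 is not reproduced here; the function names are hypothetical.

```python
import struct


def pack_exec_times(times_s):
    """Encode per-element execution times (seconds) as unsigned 16-bit microseconds."""
    fields = [min(65535, max(0, round(t * 1e6))) for t in times_s]  # clamp to 16 bits
    return struct.pack(f">{len(fields)}H", *fields)


def unpack_exec_times(data):
    """Decode the 16-bit microsecond fields back into seconds."""
    n = len(data) // 2
    return [us / 1e6 for us in struct.unpack(f">{n}H", data)]
```

For example, packing 12 µs, 0.5 ms, and 100 ms gives the fields 12, 500, and 65535, showing that anything beyond about 65.5 ms saturates.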
[0271]
Also, the reception method corresponding to the fifth embodiment may be obtained by adding a step of generating a data string as shown in FIG. 45 immediately after 808 in FIG.
[0272]
Next, FIG. 46 shows a configuration of a transmission apparatus according to the sixth embodiment.
[0273]
Most of the constituent elements of this embodiment are the same as those described in the first embodiment; only the two added elements are described. Reference numeral 606 denotes an input terminal that receives the data string output by the receiving apparatus of the third embodiment, and 607 denotes receiving means that receives this data string and outputs the execution time of each element. Reference numeral 608 denotes determination means that obtains the number of executions of each element, as follows. First, switcher 1021 processes all macroblocks in the image, which gives the number of executions of switcher 1021. The numbers of executions of motion compensation means 1022, orthogonal transform means 1023, and variable length coding means 1024 that follow are then uniquely determined by the processing results up to this point. Using these numbers of executions and the execution times from receiving means 607, the execution time required for decoding on the receiving side is predicted as the sum, over all elements, of the product of each element's execution time and number of executions. If this predicted decoding time is equal to or longer than the time required to transmit the amount of code to be generated for the current image, as specified by the rate controller or the like (for example, 250 msec for 16 kbits at a transmission rate of 64 kbit/sec), the number of executions of full-pel motion compensation is increased and that of half-pel motion compensation is decreased so that the predicted decoding time does not exceed the transmission time. (Replacing half-pel with full-pel motion compensation shortens the execution time.)
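Because each replacement of a half-pel by a full-pel motion compensation shortens the predicted decoding time by the same amount (the difference between the two estimated execution times), the increase/decrease described above can also be computed in closed form. The sketch below works in integer microseconds to avoid rounding error; `base_us` is assumed to hold the predicted time of all elements other than the motion compensations being split, `n_mc` the number of macroblocks whose motion compensation is split between half-pel and full-pel, and all names are hypothetical. With 16 kbits of code at 64 kbit/s the budget is 250,000 µs (250 msec), as in the example above.

```python
def transmission_budget_us(code_bits, rate_bps):
    """Time needed to transmit the code for the current image, in microseconds."""
    return code_bits * 1_000_000 // rate_bps


def half_pel_allowance(n_mc, base_us, t_half_us, t_full_us, budget_us):
    """Largest h <= n_mc with base_us + h*t_half_us + (n_mc - h)*t_full_us <= budget_us."""
    slack = budget_us - base_us - n_mc * t_full_us
    if slack < 0:
        return 0      # even all-full-pel motion compensation does not fit
    if t_half_us <= t_full_us:
        return n_mc   # substitution would save nothing
    return min(n_mc, slack // (t_half_us - t_full_us))
```

For instance, with 40 candidate macroblocks, 180 ms for the other elements, and 3 ms versus 1 ms per half-pel and full-pel compensation, 15 half-pel compensations fit exactly in the 250 ms budget.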
[0274]
The moving image encoding apparatus 2102 performs each process based on the number of executions designated by the determining unit 608. For example, if the motion compensator 1022 completes execution of half-pel motion compensation for the designated number of half-pel motion compensations, then it performs only full-pel motion compensation.
[0275]
Further, the selection may be arranged so that half-pel motion compensation is distributed uniformly over the image. For example, all macroblocks that require half-pel motion compensation are first identified, and the quotient (3) of their number (for example, 12) divided by the permitted number of half-pel motion compensations (for example, 4) is obtained. Half-pel motion compensation may then be performed only on those macroblocks, among the ones requiring it, whose index is divisible by this quotient (0, 3, 6, 9).
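The uniform selection can be sketched as follows, interpreting (0, 3, 6, 9) as positions within the list of macroblocks that require half-pel motion compensation. Capping the result at the permitted number, for lists whose length is not a multiple of it, is an added assumption, and the function name is hypothetical.

```python
def spread_half_pel(candidates, allowed):
    """candidates: macroblocks needing half-pel MC, in raster order.
    Keep every q-th one, q = len(candidates) // allowed, at most `allowed` of them."""
    if allowed <= 0 or not candidates:
        return []
    if allowed >= len(candidates):
        return list(candidates)
    q = len(candidates) // allowed
    return [mb for k, mb in enumerate(candidates) if k % q == 0][:allowed]
```

With 12 candidates and 4 permitted compensations this selects positions 0, 3, 6, and 9; with 13 candidates the quotient is still 3, and the cap drops position 12.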
[0276]
According to the fifth and sixth embodiments described above, the estimated execution time of each element is transmitted to the transmission side, where the execution time of the decoding process is predicted, and half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation so that the predicted time does not exceed the time that would be required to receive one picture's worth of data (the specified time). As a result, none of the transmitted half-pel motion compensation information has to be discarded at the receiver, the execution time does not exceed the specified time, and problem (C2) can be solved.
[0277]
In the non-essential processing, motion compensation for inter macroblocks may be divided into three types: normal motion compensation, 8 × 8 motion compensation, and overlapped motion compensation.
[0278]
Next, FIG. 50 is a flowchart of the transmission method according to the seventh embodiment.
[0279]
Since the operation of this embodiment is similar to that of the sixth embodiment, the corresponding elements are noted in parentheses. At 1001, initial values of the execution time of each process are set. At 801, an image is input (input terminal 2101), and the image is divided into macroblocks. At 1002, it is determined for every macroblock whether it is to be encoded as intra or inter (switcher 1021). This fixes the number of executions of each process from 1005 to 806. At 1003, the actual number of executions is calculated from these numbers of executions and the execution time of each process (determination means 608).
[0280]
Thereafter, the processing from 1005 to 806 is repeated until the processing for all the macroblocks is completed by the conditional branch of 807.
[0281]
Each time one of the processes from 1005 to 806 is executed, the corresponding variable is incremented by one so that the number of executions can be recorded. First, at 1005, the process branches according to the determination result of 1002 (switcher 1021). For inter, motion compensation is performed at 804 (motion compensation means 1022). Here the number of half-pel motion compensations is counted, and once it exceeds the actual number of executions obtained at 1003, full-pel motion compensation is executed instead of half-pel motion compensation. DCT and variable-length coding are then performed at 805 and 806 (orthogonal transform means 1023 and variable length coding means 1024). When all macroblocks have been processed (Yes at 807), at 808 the variables holding the execution count of each process are read to generate a data string as shown in FIG., and this data string and the code are multiplexed and output. At 1004, the data string from the receiving side is received, and the execution time of each process is extracted from it and set.
[0282]
The above processing from 801 to 1004 is repeatedly executed as long as the input image continues.
[0283]
According to the seventh embodiment and the paragraph beginning “Also, the reception method corresponding to the fifth embodiment” at the end of the description of the fifth embodiment, the estimated execution time of each element is transmitted to the transmission side, where the execution time of the decoding process is predicted, and half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation so that the predicted time does not exceed the time that would be required to receive one picture's worth of data (the specified time). As a result, none of the transmitted half-pel motion compensation information has to be discarded at the receiver, the execution time does not exceed the specified time, and problem (C2) can be solved.
[0284]
Next, FIG. 47 shows a configuration of a transmission apparatus according to the eighth embodiment.
[0285]
Most of the components of this embodiment are the same as those described in the first embodiment; only the four added components are described.
[0286]
Reference numeral 7010 denotes execution time measuring means that measures and outputs the execution time from the input of an image to encoding apparatus 2102 until encoding is complete and the code is output. Reference numeral 706 denotes estimation means that estimates the execution time of each element (switcher 1021, motion compensation means 1022, orthogonal transform means 1023, and variable length coding means 1024) from the number of executions of each element, taken from the data string from counting means 2103, and the execution time from execution time measuring means 7010. The estimation method may be the same as that described for estimation means 302 of the second embodiment. Reference numeral 707 denotes an input terminal through which the user inputs a frame rate value, and 708 denotes determination means that obtains the number of executions of each element, as follows.
[0287]
First, switcher 1021 processes all macroblocks in the image, which gives the number of executions of switcher 1021. The numbers of executions of motion compensation means 1022, orthogonal transform means 1023, and variable length coding means 1024 that follow are then uniquely determined by the processing results up to this point. Next, the predicted encoding time is calculated as the sum, over all elements, of the product of each element's number of executions and its estimated execution time from estimation means 706. If the predicted encoding time is equal to or longer than the time available for encoding one image, obtained as the reciprocal of the frame rate input at 707, the number of executions of full-pel motion compensation is increased and that of half-pel motion compensation is decreased.
[0288]
The number of executions is determined by repeating this increase / decrease processing and calculation of the predicted encoding time until the predicted encoding time becomes equal to or shorter than the usable time.
[0289]
Moving image encoding apparatus 2102 performs each process according to the number of executions designated by determination means 708. For example, once motion compensation means 1022 has executed the designated number of half-pel motion compensations, it performs only full-pel motion compensation from then on.
[0290]
Further, the selection may be arranged so that half-pel motion compensation is distributed uniformly over the image. For example, all macroblocks that require half-pel motion compensation are first identified, and the quotient (3) of their number (for example, 12) divided by the permitted number of half-pel motion compensations (for example, 4) is obtained. Half-pel motion compensation may then be performed only on those macroblocks, among the ones requiring it, whose index is divisible by this quotient (0, 3, 6, 9).
[0291]
According to the eighth embodiment described above, the execution time of each process is estimated, the execution time required for encoding is predicted in advance from these estimates, and the number of executions is determined so that the predicted encoding time does not exceed the time available for encoding one image, as determined from the frame rate. This solves problem (C3).
[0292]
As motion vector detection methods for motion compensation means 1022, there is the full-search method, which selects, from all vectors within 15 pixels to the left, right, top, and bottom, the vector with the smallest SAD (sum of absolute pixel-by-pixel differences), and there is also the three-step motion vector detection method (described in an annex of H.261). In the three-step method, nine points in a fixed arrangement within the search range are examined and the one with the smallest SAD is selected. Next, nine points are selected again within a narrower range around this point, and the one with the smallest SAD is selected. The three-step method then executes this process once more.
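The two detection methods can be sketched as follows. The three-step search here uses the common step sizes 4, 2, 1 (which reach ±7 pixels, narrower than the ±15-pixel full search); the step sizes, the 8 × 8 block used in the example, and the assumption that every candidate lies inside the reference frame are choices made for this sketch, and the function names are hypothetical. Both functions also return how many SAD evaluations they performed, which is the cost difference that motivates treating them as alternative non-essential processes.

```python
def sad(ref, cur, bx, by, dx, dy):
    """Sum of absolute differences between block cur and ref displaced by (dx, dy)."""
    n = len(cur)
    return sum(abs(ref[by + j + dy][bx + i + dx] - cur[j][i])
               for j in range(n) for i in range(n))


def full_search(ref, cur, bx, by, rng=15):
    """Exhaustive search over every vector within +/- rng pixels."""
    best, best_v, evals = None, (0, 0), 0
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            s = sad(ref, cur, bx, by, dx, dy)
            evals += 1
            if best is None or s < best:
                best, best_v = s, (dx, dy)
    return best_v, evals


def three_step_search(ref, cur, bx, by, steps=(4, 2, 1)):
    """Examine the 8 neighbours at each step size around the current best point."""
    cx, cy = 0, 0
    best = sad(ref, cur, bx, by, 0, 0)
    evals = 1
    for step in steps:
        nx, ny = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre SAD is already known
                s = sad(ref, cur, bx, by, cx + dx, cy + dy)
                evals += 1
                if s < best:
                    best, nx, ny = s, cx + dx, cy + dy
        cx, cy = nx, ny
    return (cx, cy), evals
```

On a 40 × 40 test frame whose pixel values are 1000·x + y, full search evaluates 31 × 31 = 961 candidates, while the three-step search evaluates 25; the three-step method is not guaranteed to find the global minimum on less regular images.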
[0293]
Treating these two methods as non-essential processing, the execution time of each may be estimated and the execution time required for encoding predicted from these estimates; where necessary, the number of executions of the full-search method is then reduced, and that of the three-step method increased, so that the predicted execution time does not exceed the user-specified time.
[0294]
In addition to the three-step method, a motion vector detection method with a fixed number of searches that simplifies the processing further, or a method that always returns the (0, 0) motion vector, may be used in combination.
[0295]
Next, FIG. 51 is a flowchart of the transmission method according to the ninth embodiment.
[0296]
Since the operation of this embodiment is similar to that of the eighth embodiment, the corresponding elements are noted in parentheses; for the detailed operation of each step, refer to the description of the corresponding element. Since the flow is also almost the same as that of the second embodiment, only the differences are described.
[0297]
In 1101, the initial value of the execution time of each process is set in the variable a_i. Also, the frame rate is input at 1102 (input terminal 707). 1103 determines the actual number of executions from the frame rate at 1102, the execution time a_i of each process, and the number of executions of each process obtained from the intra / inter determination result at 1002 (decision unit 708). 1105 and 1106 are for measuring the execution time of the encoding process. 1104 estimates the execution time of each process from the execution time in 1106 and the actual number of executions of each process, and updates the variable a_i (estimating means 706).
[0298]
According to the ninth embodiment described above, the execution time of each process is estimated, the execution time required for encoding is predicted in advance from these estimates, and the number of executions is determined so that the predicted encoding time does not exceed the time available for encoding one image, as determined from the frame rate. This solves problem (C3).
[0299]
In the second embodiment, when the data string is generated at 808, a 2-byte area may be added immediately after the start code shown in FIG. 2, and the code length may be written there in binary.
[0300]
Further, in the fourth embodiment, when the multiplexed data is input at 902, the code length may be extracted from this 2-byte area, and the code transmission time obtained from the code length and the code transmission speed may be used in calculating the number of executions at 905 (reducing the number of executions of half-pel motion compensation so that the decoding time does not exceed the code transmission time).
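The 2-byte code-length field and the resulting code transmission time can be sketched as follows. The start code value, the use of bits as the unit of the code length, and big-endian byte order are assumptions (the actual start code is the one defined in FIG. 2); the function names are hypothetical.

```python
import struct

START_CODE = b"\x00\x00\x01"  # placeholder only, not the start code of FIG. 2


def add_code_length(data_string, code_len_bits, start_code=START_CODE):
    """Insert a 2-byte code length immediately after the leading start code."""
    if not data_string.startswith(start_code):
        raise ValueError("data string must begin with the start code")
    head = len(start_code)
    return data_string[:head] + struct.pack(">H", code_len_bits) + data_string[head:]


def code_transmission_time_s(data_string, rate_bps, start_code=START_CODE):
    """Read the code length back and convert it to a transmission time in seconds."""
    (code_len_bits,) = struct.unpack_from(">H", data_string, len(start_code))
    return code_len_bits / rate_bps
```

With a code length of 16,000 bits and a speed of 64 kbit/s, the receiver obtains a code transmission time of 0.25 s, which then serves as the upper bound on the predicted decoding time. A 2-byte field limits the code length to 65,535 units per image.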
[0301]
In the first embodiment, when the data string is generated at 2104, a 2-byte area may be added immediately after the start code shown in FIG. 2, and the code length may be written there in binary.
[0302]
Further, in the third embodiment, when the multiplexed data is input at 301, the code length may be extracted from this 2-byte area, and the code transmission time obtained from the code length and the code transmission speed may be used in calculating the number of executions at 304 (reducing the number of executions of half-pel motion compensation so that the decoding time does not exceed the code transmission time).
[0303]
In the fourth embodiment, the actual number of executions of half-pel motion compensation may be recorded immediately after 909 and its maximum value calculated; if this maximum is below a sufficiently small value (for example, 2 or 3), a data string (consisting of a specific bit pattern) indicating that half-pel motion compensation is not used may be generated and transmitted. Furthermore, in the second embodiment, whether this data string has been received may be checked immediately after 808, and if a data string indicating that half-pel motion compensation is not used has been received, the motion compensation processing at 804 may always be full-pel motion compensation.
[0304]
Furthermore, this idea can be applied to processing other than motion compensation. For example, the processing time of the DCT calculation can be reduced by not using its high-frequency components. That is, when the proportion of the total execution time taken by the IDCT calculation in the reception method exceeds a certain value, a data string indicating this is transmitted to the transmission side. On receiving this data string, the transmission side may calculate only the low-frequency components in the DCT and set all high-frequency components to zero.
[0305]
Furthermore, although the embodiments have been described using images, the above methods may also be applied to audio.
[0306]
Further, in the third embodiment, the actual number of executions of half-pel motion compensation at 3034 may be recorded and its maximum value calculated; if this maximum is below a sufficiently small value (for example, 2 or 3), a data string (consisting of a specific bit pattern) indicating that half-pel motion compensation is not used may be generated and transmitted. Furthermore, in the first embodiment, when a data string indicating that half-pel motion compensation is not used is received, the motion compensation processing at 1022 may always be full-pel motion compensation.
[0307]
Furthermore, this idea can be applied to processing other than motion compensation. For example, the processing time of the DCT calculation can be reduced by not using its high-frequency components. That is, when the proportion of the total execution time taken by the IDCT calculation in the reception method exceeds a certain value, a data string indicating this is transmitted to the transmission side.
[0308]
On the transmission side, when this data string is received, only the low frequency component may be calculated in the DCT calculation, and all the high frequency components may be set to zero.
[0309]
Furthermore, although the embodiment has been described using images, the above method may also be applied to audio.
[0310]
As is clear from the above description, according to, for example, the first and third embodiments, the execution time of the decoding process is predicted from the estimated execution time of each element, and if the prediction exceeds the time required to receive one picture's worth of data (the specified time), half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation. This prevents the execution time from exceeding the specified time and solves problem (C1).
[0311]
Further, according to, for example, the fifth and seventh embodiments, the estimated execution time of each element is transmitted to the transmission side, where the execution time of the decoding process is predicted, and half-pel motion compensation, which has a long execution time, is replaced with full-pel motion compensation so that the predicted time does not exceed the time that would be required to receive one picture's worth of data (the specified time). As a result, none of the transmitted half-pel motion compensation information has to be discarded at the receiver, the execution time does not exceed the specified time, and problem (C2) can be solved.
[0312]
According to, for example, the ninth embodiment, the execution time of each process is estimated, the execution time required for encoding is predicted in advance from these estimates, and the number of executions is determined so that the predicted encoding time does not exceed the time available for encoding one image, as determined from the frame rate. This solves problem (C3).
[0313]
As described above, according to the present invention, it is possible to realize a function that lowers quality gradually even when the computational load increases (CGD: Computational Graceful Degradation), and the practical benefits of implementing it are considerable.
[0314]
Further, a recording medium such as a magnetic or optical recording medium may be produced on which is recorded a program that causes a computer to execute all or some of the steps (or the operations of all or some of the means) described in any of the above embodiments, and a computer using that recording medium may perform the same operations as described above.
[0315]
【Effect of the Invention】
As described above, the present invention makes it easy, for example, to handle a plurality of video streams and a plurality of audio streams and, reflecting the editor's intention, to play back important scene cuts with emphasis in synchronization with the audio.
[Brief description of the drawings]
FIG. 1 is a schematic configuration diagram of an image/audio transmitting and receiving apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a reception management unit and a separation unit.
FIG. 3 is a diagram showing a method of transmitting and controlling images and audio using a plurality of logical transmission paths.
FIG. 4 is a diagram showing a method for dynamically changing header information added to image or audio data to be transmitted.
FIGS. 5A and 5B are diagrams showing a method of adding AL information.
FIGS. 6A to 6D are diagrams illustrating examples of a method of adding AL information.
FIG. 7 is a diagram showing a method for dynamically multiplexing and separating a plurality of logical transmission paths to transmit information.
FIG. 8 is a diagram showing a procedure for transmitting a broadcast program.
FIG. 9A is a diagram showing a method for transmitting images or audio that takes into account program and data loading and start-up time when the program and data are already in the receiving terminal.
FIG. 9B is a diagram showing a method for transmitting images or audio that takes into account program and data loading and start-up time when the program and data are transmitted.
FIGS. 10A and 10B are diagrams showing a method for dealing with zapping.
FIG. 11 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 12 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 13 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 14 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 15 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 16 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 17 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 18 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 19 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 20 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 21 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 22 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 23 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIG. 24 is a diagram showing a specific example of a protocol actually transmitted and received between terminals.
FIGS. 25A and 25B are configuration diagrams of a CGD demo system according to the present invention.
FIG. 26 is a configuration diagram of a CGD demo system according to the present invention.
FIG. 27 is a diagram showing a method for adding priorities when an encoder is overloaded.
FIG. 28 is a diagram describing a method for determining priority at a receiving terminal during overload.
FIG. 29 is a diagram showing how priority changes over time.
FIG. 30 is a diagram showing stream priority and object priority.
FIG. 31 is a schematic configuration diagram of an image encoding and decoding apparatus according to an embodiment of the present invention.
FIG. 32 is a schematic configuration diagram of a speech encoding and decoding apparatus according to an embodiment of the present invention.
FIGS. 33A and 33B are diagrams showing a priority adding unit and a priority determining unit that manage the priority of processing during overload.
FIG. 34 is a diagram showing the granularity at which priority is added.
FIG. 35 is a diagram showing the granularity at which priority is added.
FIG. 36 is a diagram showing the granularity at which priority is added.
FIG. 37 is a diagram showing a method for assigning priorities to multi-resolution image data.
FIG. 38 is a diagram showing a method for configuring a communication payload.
FIG. 39 is a diagram showing a method for associating data with a communication payload.
FIG. 40 is a diagram showing the correspondence among object priority, stream priority, and communication packet priority.
FIG. 41 is a configuration diagram of a transmitting apparatus according to the first embodiment of the present invention.
FIG. 42 is an explanatory diagram of the first embodiment.
FIG. 43 is a configuration diagram of a receiving apparatus according to the third embodiment of the present invention.
FIG. 44 is a block diagram of a receiving device according to the fifth embodiment of the present invention.
FIG. 45 is an explanatory diagram of the fifth embodiment.
FIG. 46 is a block diagram of a transmitting apparatus according to the sixth embodiment of the present invention.
FIG. 47 is a block diagram of a transmitting apparatus according to the eighth embodiment of the present invention.
FIG. 48 is a flowchart of a transmission method according to the second embodiment of the present invention.
FIG. 49 is a flowchart of a receiving method according to the fourth embodiment of the present invention.
FIG. 50 is a flowchart of a transmission method according to the seventh embodiment of the present invention.
FIG. 51 is a flowchart of a transmission method according to the ninth embodiment of the present invention.
FIG. 52 is a block diagram showing an example of an image/audio transmitting apparatus of the present invention.
FIG. 53 is a block diagram showing an example of an image/audio receiving apparatus of the present invention.
FIG. 54 is a diagram for explaining the priority adding means, in the image/audio transmitting apparatus of the present invention, that adds priority to video and audio.
FIG. 55 is a diagram for explaining the priority determining means, in the image/audio receiving apparatus of the present invention, that interprets the priority added to video and audio and determines whether decoding is possible.
[Explanation of symbols]
11 Reception management unit
12 Separation unit
13 Transmission unit
14 Image decompression unit
15 Image decompression management unit
16 Image composition unit
17 Output unit
18 Terminal control unit
301 Receiving means
302 Estimating means
303 Moving picture decoding apparatus
304 Time reduction means
306 Output terminal
307 Input terminal
3031 Variable length decoding means
3032 Inverse orthogonal transform means
3033 Selector
3034 Motion compensation means
3035 Execution time measurement means
4011 Transmission management unit
4012 Image encoding unit
4013 Reception management unit
4014 Image decoding unit
4015 Image composition unit
4016 Output unit
4101 Image encoding apparatus
4102 Image decoding apparatus

Claims

A data processing apparatus comprising: receiving means for receiving a data series that includes (1) time-series data of encoded audio or moving images, and decoding time information describing a decoding time for each frame of the time-series data, (2) a priority between the time-series data, indicating the priority of processing among the time-series data, and (3) a frame type indicating at least whether each frame constituting the moving-image time-series data is an I frame or a P frame, and a priority within the time-series data, separate from the frame type, indicating the priority of decoding that frame;
data processing means for decoding the time-series data frame by frame; and
priority determining means for comparing the elapsed time since the data processing means started the decoding process with the decoding time indicated by the decoding time information, and for setting the threshold of the priority within the time-series data lower than the current threshold when the elapsed time is greater than the decoding time, and higher than the current threshold when the elapsed time is smaller than the decoding time;
wherein the data processing means allocates its processing capacity to each item of time-series data by associating the priority between the time-series data with the processing priority set for the data processing means,
when, as a result of the comparison by the priority determining means, the elapsed time is greater than the decoding time and the threshold of the priority within the time-series data is set lower than the current threshold, decodes the frames whose priority within the time-series data satisfies the lowered threshold, and
when the elapsed time is smaller than the decoding time and the threshold of the priority within the time-series data is set higher than the current threshold, decodes the frames whose priority within the time-series data satisfies the raised threshold, thereby decoding the time-series data frame by frame within the allocated processing capacity.
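The threshold control recited in the claim above can be sketched as follows. The text does not define when a priority "satisfies" the threshold; this sketch assumes a frame is decoded when its priority value is at most the threshold, so that lowering the threshold while behind schedule lets fewer frames through. The `Frame` class, the step size, and all function names are hypothetical, and the elapsed times are passed in rather than read from a clock.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_type: str     # "I" or "P"
    priority: int       # priority within the time-series data (separate from type)
    decode_time: float  # from the decoding time information, in seconds


def update_threshold(threshold: int, elapsed: float, decode_time: float,
                     step: int = 1) -> int:
    """Lower the threshold when behind schedule and raise it when ahead."""
    if elapsed > decode_time:
        return threshold - step
    if elapsed < decode_time:
        return threshold + step
    return threshold


def satisfies(frame: Frame, threshold: int) -> bool:
    # Assumption: smaller priority values are more important, so a frame
    # passes when its priority does not exceed the threshold.
    return frame.priority <= threshold


def select_frames(frames, elapsed_times, threshold):
    """Return the frames chosen for decoding and the final threshold."""
    decoded = []
    for frame, elapsed in zip(frames, elapsed_times):
        threshold = update_threshold(threshold, elapsed, frame.decode_time)
        if satisfies(frame, threshold):
            decoded.append(frame)
    return decoded, threshold
```

A practical decoder would presumably clamp the threshold to the range of priority values actually in use; the sketch omits this for brevity.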

A data processing method comprising: inputting a data series that includes (1) time-series data of encoded audio or moving images, and decoding time information describing a decoding time for each frame of the time-series data, (2) a priority between the time-series data, indicating the priority of processing among the time-series data, and (3) a frame type indicating at least whether each frame constituting the moving-image time-series data is an I frame or a P frame, and a priority within the time-series data, separate from the frame type, indicating the priority of decoding that frame;
decoding the time-series data frame by frame;
comparing the elapsed time since the start of the decoding process with the decoding time indicated by the decoding time information, and setting the threshold of the priority within the time-series data lower than the current threshold when the elapsed time is greater than the decoding time, and higher than the current threshold when the elapsed time is smaller than the decoding time;
allocating the processing capacity of data processing means that processes the time-series data to each item of time-series data by associating the priority between the time-series data with the processing priority set for the data processing means;
when, as a result of the comparison, the elapsed time is greater than the decoding time and the threshold of the priority within the time-series data is set lower than the current threshold, decoding the frames whose priority within the time-series data satisfies the lowered threshold; and
when the elapsed time is smaller than the decoding time and the threshold of the priority within the time-series data is set higher than the current threshold, decoding the frames whose priority within the time-series data satisfies the raised threshold, thereby decoding the time-series data frame by frame within the allocated processing capacity.
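The allocation of processing capacity by the priority between the time-series data can be illustrated with a strict-priority scheme. This is an assumption: the claims state only that this priority is associated with the processing priority set for the data processing means. Here a smaller value marks a more important stream, each stream receives as much of the remaining capacity as it demands, in priority order, and all names are hypothetical.

```python
from typing import Dict


def allocate_capacity(demands: Dict[str, float],
                      stream_priority: Dict[str, int],
                      capacity: float) -> Dict[str, float]:
    """Share processing capacity among streams by their priority between streams.

    demands: processing time each stream requests per period (hypothetical)
    stream_priority: smaller value = more important stream (assumption)
    capacity: processing time available per period
    """
    allocation = {}
    remaining = capacity
    # Serve streams strictly in order of importance.
    for name in sorted(demands, key=lambda s: stream_priority[s]):
        share = min(demands[name], remaining)
        allocation[name] = share
        remaining -= share
    return allocation
```

With three streams demanding 0.2, 0.6, and 0.5 of a unit capacity, for example, the two most important streams are served in full and the least important one receives the remaining 0.2; within that share, its frames would then be selected by the threshold control.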