JP4018335B2

JP4018335B2 - Image decoding apparatus and image decoding method

Info

Publication number: JP4018335B2
Application number: JP2000385941A
Authority: JP
Inventors: 正大平; 充前田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-01-05
Filing date: 2000-12-19
Publication date: 2007-12-05
Anticipated expiration: 2020-12-19
Also published as: JP2001258004A

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像を入力して符号化する画像符号化装置とその方法、及びその符号化されたコードを復号する画像復号装置とその方法に関するものである。
【０００２】
【従来の技術】
従来、画像の符号化方式として、フレーム内符号化方式であるMotion JPEGやDigital Video等の符号化方式や、フレーム間予測符号化を用いたＨ．２６１，Ｈ．２６３，ＭＰＥＧ−１，ＭＰＥＧ−２等の符号化方式が知られている。これらの符号化方式は、ＩＳＯ（International Organization for Standardization：国際標準化機構）やＩＴＵ（International Telecommunication Union：国際電気通信連合）によって国際標準化されている。フレーム内符号化方式はフレーム単位で独立に符号化を行うもので、フレームの管理がしやすいため、動画像の編集や特殊再生が必要な装置に最適である。また、フレーム間予測符号化方式は、フレーム間での画像データの差分に基づくフレーム間予測を用いるため、符号化効率が高いという特徴を持っている。
【０００３】
更に、コンピュータ、放送、通信など多くの領域で利用できる、汎用的な次世代マルチメディア符号化規格としてＭＰＥＧ−４の国際標準化作業が進められている。
【０００４】
このようなディジタル符号化規格の普及に伴い、コンテンツ業界からは著作権保護の問題が強く提起されるようになってきた。即ち、著作権が保護されることが十分に保証されていない規格に対しては、安心してコンテンツを提供することができない、という問題が生じている。
【０００５】
このためＭＰＥＧ−４では、ＩＰＭＰ(Intellectual Property Management and Protection)の技術が導入され、著作権を保護するために画像の再生を中断したり再開したりする機能が検討されている。この方式では、著作権を保護する必要のあるフレームを再生しないことにより、著作権保護を実現している。
【０００６】
その一方、画像にスクランブルを施すことで、視聴者に対しある程度の概略を認識できる程度の画像を提供する方式やサービスが開始されている。具体的には、テレビジョン信号における任意の走査線や画素を置換することにより実現している。また出力される再生画像を再生装置で変換する方法もある。
【０００７】
またスケーラビリティの機能も検討されており、画像時間及び空間解像度を複数のレベルで持ちながら画像の符号化や復号を行う方法もある。
【０００８】
【発明が解決しようとする課題】
よって、以下のような問題が発生する。
▲１▼従来のＩＰＭＰ技術では、著作権を保護したい画像に対して、復号を停止したり画像の再生を止めてしまうため、視聴者に対して全く情報を提供できない。このことは、その映像等を視聴する権利を持たない視聴者に対して、そのコンテンツ（例えば画像）の情報が全く提供できないことを意味する。本来、コンテンツの提供者側は、ビジネスとしてより多くの視聴者にコンテンツを広めたいと考えており、そのためには、視聴する権利を持たない視聴者に対しても、ある程度のコンテンツの情報を提供する必要がある。
▲２▼また前述の一連の画像符号化方式において、従来のスクランブルをこのビットストリーム全体にかけた場合、スクランブルを解除できない復号器を持つ視聴者或は視聴する権利を持たない視聴者は正常な復号ができないため、全く映像を認識することができないことになる。
▲３▼更に、一連の画像符号化方式は、画像の空間及び時間方向の相関を利用して高い符号化効率を実現しているが、符号化時の入力画像に従来のスクランブルを施すと、画像の空間及び時間方向の相関が無くなってしまい、符号化効率を著しく低下させてしまう。
▲４▼更に、ビットストリームの一部に対してスクランブルをかけたとしても、フレーム間予測符号化を用いた動画像符号化方式の再生画像では、あるフレームにおける歪みは次のフレームへ伝搬して次第に蓄積することになる。このため、歪みは定常的に発生しないことになり、復号側で再生画像を見た場合、スクランブルのための歪みか、或は別の誤動作の症状かの判別がしにくくなる。
▲５▼また近年、画像の符号化・復号装置の処理は複雑化しており、ソフトウェアによる符号化及び復号を想定する場合もでてきた。このような場合、画像符号化・復号処理以外でスクランブル処理の負荷が大きいと、装置全体としての性能が低下するという問題がある。
【０００９】
本発明は上記従来例に鑑みてなされたもので、画像符号化時に著作権を保護すべき対象のビットストリームの一部にスクランブルをかけ、符号化効率を下げることなく画像を符号化できる画像符号化装置及びその方法を提供することを目的とする。
【００１０】
また本発明の目的は、視聴権利を有する視聴者の画像復号装置では正常な再生を行い、視聴権利を持たない視聴者の画像復号装置では画像のおおよその概観を認識できる程度の再生を行うことができるように画像を符号化できる画像符号化装置及びその方法を提供することにある。
【００１１】
また本発明の目的は、上記画像符号化装置により符号化された画像を復号して再生できる画像復号装置を提供することにある。
【００１２】
【課題を解決するための手段】
上記目的を達成するための本発明の一態様による画像復号装置は以下の構成を備える。すなわち、
符号化されたビットストリームを入力し、前記ビットストリームを、知的財産を保護するための保護コードと、１つ或は複数の拡張レイヤと、前記拡張レイヤよりも解像度の低い基本レイヤとに分配する分配手段と、
外部から認証用データを入力する認証用データ入力手段と、
前記認証用データと前記保護コードとの整合を調べる認証手段と、
前記１つ又は複数の拡張レイヤのスクランブルを解除するスクランブル解除手段と、
前記分配手段によって分配された前記拡張レイヤ、または、前記スクランブル解除手段によってスクランブルを解除された拡張レイヤを復号する拡張レイヤ復号手段と、
前記分配手段によって分配された基本レイヤを復号する基本レイヤ復号手段と、
前記認証手段による認証結果が一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっている場合には、前記スクランブル解除手段によって解除された拡張レイヤを復号した画像を選択し、
前記認証手段による認証結果が不一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっている場合、もしくは、前記認証手段による認証結果が一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっていない場合には、前記拡張レイヤを復号した画像を選択し、
前記認証手段による認証結果が不一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっていない場合は、前記基本レイヤを復号した画像を選択する選択手段と、
前記選択手段によって選択された画像を出力する画像出力手段とを備える。
【００１３】
また、上記目的を達成するための本発明の他の態様による画像復号方法は、
符号化されたビットストリームを入力し、前記ビットストリームを、知的財産を保護するための保護コードと、１つ或は複数の拡張レイヤと、前記拡張レイヤよりも解像度の低い基本レイヤとに分配する分配工程と、
外部から認証用データを入力する認証用データ入力工程と、
前記認証用データと前記保護コードとの整合を調べる認証工程と、
前記１つ又は複数の拡張レイヤのスクランブルを解除するスクランブル解除工程と、
前記分配工程によって分配された前記拡張レイヤ、または、前記スクランブル解除工程によってスクランブルを解除された拡張レイヤを復号する拡張レイヤ復号工程と、
前記分配工程によって分配された基本レイヤを復号する基本レイヤ復号工程と、
前記認証工程による認証結果が一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっている場合には、前記スクランブル解除工程によって解除された拡張レイヤを復号した画像を選択し、
前記認証工程による認証結果が不一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっている場合、もしくは、前記認証工程による認証結果が一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっていない場合には、前記拡張レイヤを復号した画像を出力し、
前記認証工程による認証結果が不一致であり、かつ前記拡張レイヤにスクランブル処理が掛かっていない場合は、前記基本レイヤを復号した画像を選択する選択工程と、
前記選択工程で選択された画像を出力する画像出力工程とを備える。
【００１６】
【発明の実施の形態】
以下、添付図面を参照して本発明の好適な実施の形態を詳細に説明する。
【００１７】
［実施の形態１］
図１は、本発明の実施の形態１に係る動画像符号化装置の構成を示すブロック図、図３は、この符号化装置により符号化された符号を復号する動画像復号装置の構成を示すブロック図である。本実施の形態１では、ＭＰＥＧ−４符号化方式において、空間スケーラビリティの機能を有し、その拡張レイヤ６００１に対してＤＣＴ係数を符号化したハフマンコードの符号ビットを反転することでスクランブルをかける場合について説明する。尚、ＭＰＥＧ−４符号化方式の詳細についてはＩＳＯ／ＩＥＣ勧告書を参照されたい。
【００１８】
図１において、１００はフレームメモリ（ＦＭ）で、１フレーム分の入力画像データを格納し、符号化単位であるマクロブロックとして出力する。ここでマクロブロックは、輝度が１６×１６画素、色差Ｃｂ、Ｃｒとも８×８画素であり、輝度は４ブロック、色差は各１ブロックとなる。１０１はＤＣＴ器で、８×８画素（ブロック）単位の２次元の離散コサイン変換（ＤＣＴ、Discrete Cosine Transform）を行って、入力されたマクロブロックをブロック毎に順次変換してＤＣＴ係数を出力する。１０２は量子化器で、ブロック毎のＤＣＴ係数に対して順次量子化を行って、その量子化代表値を出力する。１０３は逆量子化器で、量子化された量子化代表値をＤＣＴ係数として出力する。１０４は逆ＤＣＴ器で、逆量子化されたＤＣＴ係数を元の画像データに変換する。１０５は局部復号画像を格納するフレームメモリである。１０６は動き補償器で、フレームメモリ１００からの入力画像データと、フレームメモリ１０５、及び後述するアップサンプリング器３０１からの局部復号画像データを入力し、マクロブロック毎に動きベクトルの検出を行って予測画像を出力する。
【００１９】
１０７は可変長符号化器で、量子化代表値に対してハフマン符号化を行ってハフマンコードを出力する。１０８はＤＣＴ符号反転器で、可変長符号化器１０７からのハフマンコードを符号反転する。ＭＰＥＧ−４におけるハフマンコードの符号ビットは、ビット列の末尾の１ビットであり、正の場合は“０”、負の場合は“１”である。よって、この符号ビットを反転することが、ＤＣＴ符号反転器１０８の処理となる。尚、ＤＣＴ係数のブロック内をジグザグスキャン順序で連続した列をＤＣＴ[ｉ]（ｉ＝０〜６３）とした場合、本実施の形態１で反転するハフマンコードは、ｉ＝３〜６３とする。
【００２０】
１０９はセレクタで、外部から入力するスクランブルON/OFFフラグに応じて、可変長符号化器１０７からの出力と、ＤＣＴ符号反転器１０８から出力のいずれか一方を選択している。１１０は多重化器で、セレクタ１０９から出力されたハフマンコード、外部から入力するスクランブルON/OFFフラグ、及びＩＰ符号化器１１１から出力されるＩＰ符号化コードを、ユーザデータとして多重化し、ビットストリームとして出力する。このＩＰ符号化器１１１は、画像の著作権ＩＰ(Intellectual Property)を保護するための情報を外部から入力し、ＩＰ符号化コードを出力する。本実施の形態１では、このＩＰをパスワードとする。
【００２１】
次に図１におけるレイヤ間の構成について記述する。
【００２２】
３００はダウンサンプリング器で、入力画像をダウンサンプリングする。本実施の形態１では、このダウンサンプリング器３００におけるダウンサンプリングレートを“１／２”とする。３０１はアップサンプリング器で、後述するフレームメモリ２０５の局部復号画像をアップサンプリングする。本実施の形態１では、このアップサンプリング器３０１におけるアップサンプリングレートを“２”とする。３０２は多重化器で、空間スケーラビリティにおける拡張レイヤ６００１と基本レイヤ６０００のビットストリームを多重化している。
【００２３】
次に図１の基本レイヤ６０００の構成について説明する。
【００２４】
基本レイヤ６０００では、入力をダウンサンプリング器３００の出力としている点を除けば、同名の各機能ブロックは、前述の拡張レイヤ６００１と同様である。２００はフレームメモリで、入力画像１フレーム分を格納し、符号化単位であるマクロブロックとして出力する。２０１はＤＣＴ器で、８×８画素（ブロック）単位の２次元の離散コサイン変換を行う。２０２は量子化器で、ブロック毎に量子化を行って量子化代表値を出力している。２０３は逆量子化器で、量子化代表値をＤＣＴ係数として出力する。２０４は逆ＤＣＴ器で、ＤＣＴ係数に対して逆ＤＣＴを実行して画像データに変換している。２０５はフレームメモリで、局部復号画像を格納している。２０６は動き補償器で、フレームメモリ２００からの入力画像と、フレームメモリ２０５からの局部復号画像とを入力し、マクロブロック毎に動きベクトルの検出を行って予測画像を出力する。２０７は可変長符号化器で、量子化器２０２から出力される量子化代表値に対してハフマン符号化を行ってハフマンコードを出力する。２０８は多重化器で、可変長符号化器２０７からのハフマンコードを多重化してビットストリームとして出力する。
【００２５】
まず図１の上部拡張レイヤ６００１の動作について記述する。
【００２６】
以降、本実施の形態１では、フレーム内符号化をＩ−ＶＯＰ(Video Object Plane)符号化モード、１つの予測画像から予測するフレーム間予測符号化をＰ−ＶＯＰ符号化モード、２つの予測画像から予測するフレーム間予測符号化をＢ−ＶＯＰ符号化モードとする。
【００２７】
フレームメモリ１００は、入力画像を符号化単位であるマクロブロックに変換して出力する。このフレームメモリ１００から出力された画像データから、減算器により動き補償器１０６からの予測画像データが減算され、予測誤差画像データとしてＤＣＴ器１０１に入力される。ＤＣＴ器１０１は、入力されたマクロブロックの予測誤差を、各ブロック毎にＤＣＴ係数に変換する。量子化器１０２は、このＤＣＴ係数をブロック毎に所望の量子化代表値として出力する。この量子化代表値は、逆量子化器１０３、逆ＤＣＴ器１０４を介し予測誤差画像データとして復号される。この予測誤差画像データは、加算器により、動き補償器１０６からの予測画像データと加算された後、フレームメモリ１０５へ局部復号画像データとして格納される。なお、この動き補償器１０６は、外部から指定されたフレーム毎の符号化モード１に応じて予測を行って予測画像データを出力するものとする。
【００２８】
また量子化代表値を入力する可変長符号化器１０７は、その量子化代表値をハフマン符号化してハフマンコードを出力する。セレクタ１０９は、一方の端子(a)にこのハフマンコードを直接入力するとともに、他方の端子(b)にはＤＣＴ符号反転器１０８により符号ビットが反転された（スクランブルされた）ハフマンコードを入力する。このセレクタ１０９は、外部から入力するスクランブルON/OFFフラグに応じて、スクランブルON/OFFフラグがオフの場合は端子（ａ）、即ち、可変長符号化器１０７の出力を選択し、スクランブルON/OFFフラグがオンの場合は、端子（ｂ）、即ち、符号ビットを反転したハフマンコードを選択して出力する。多重化器１１０は、セレクタ１０９の出力、スクランブルON/OFFフラグ、及びＩＰ符号化器１１１から出力されるＩＰ符号データを多重化して出力する。
【００２９】
次に図１に示す基本レイヤ６０００の動作について説明する。
【００３０】
ダウンサンプリング器３００によりダウンサンプルされた画像はフレームメモリ２００へ入力されて記憶される。ＤＣＴ器２０１、量子化器２０２、逆量子化器２０３、逆ＤＣＴ器２０４、フレームメモリ２０５、フレーム単位の符号化モード２を入力する動き補償器２０６、及び可変長符号化器２０７は、前述の拡張レイヤ６００１における対応する部分と同様に動作する。また多重化器２０８は、可変長符号化器２０７の出力を多重化する。
【００３１】
次に、これら基本レイヤ６０００と拡張レイヤ６００１との間の動作についてアップサンプリング器３０１を含めて記述する。
【００３２】
図２は、本実施の形態１に係る空間スケーラビリティによる基本レイヤと拡張レイヤとの各ＶＯＰの関係を説明する図である。
【００３３】
入力画像の先頭フレームでは、まずダウンサンプリング器３００で入力画像データをダウンサンプリングした後、基本レイヤにてＩ−ＶＯＰ（フレーム内）符号化を行い、フレームメモリ２０５に局部復号画像を格納する。
【００３４】
また拡張レイヤ６００１は、フレームメモリ２０５の画像をアップサンプリング器３０１でアップサンプリングした後、このアップサンプリング器３０１の出力を参照画像として動き補償器１０６に入力し、Ｐ−ＶＯＰ（１つの予測画像から予測するフレーム間予測）符号化する。
【００３５】
次に２番目の第２フレームは、基本レイヤ６０００にて、第１フレームの符号化時にフレームメモリ２０５に格納された局部復号画像を参照してＰ−ＶＯＰ符号化される。一方、拡張レイヤ６００１では、第１フレームの符号化時にフレームメモリ１０５に格納された局部復号画像と、フレームメモリ２０５の画像データをアップサンプリング器３０１でアップサンプリングしたデータとを動き補償器１０６に入力し、Ｂ−ＶＯＰ（２つの予測画像から予測するフレーム間予測）符号化する。
【００３６】
第３フレームも第２フレームと同様に行い、これ以降はこれら３つのフレーム分の動作を繰り返す。尚、図２において、「Ｉ」はＩ−ＶＯＰ（フレーム内）符号化を、「Ｐ」はＰ−ＶＯＰ符号化を、そして「Ｂ」はＢ−ＶＯＰ符号化をそれぞれ示す。
【００３７】
図３は、本実施の形態１に係る画像復号装置の構成を示すブロック図である。まず図３における拡張レイヤ７００１の構成について記述する。
【００３８】
１０００は分配器で、拡張レイヤ７００１に入力されるビットストリームから、ハフマンコード、符号化モード１、スクランブルON/OFFフラグ、及びＩＰ符号化コードにそれぞれ分配する。１０１０はＩＰ復号器で、分配器１０００からのＩＰ符号化コードをＩＰに復号する。１０１１はＩＰ認証器で、ＩＰ復号器で復号されたＩＰと、外部から入力した認証用ＩＰとの整合を調べて認証を行う。１０１２はセキュリティ制御器で、分配器１０００からのスクランブルON/OFFフラグ、ＩＰ認証器１０１１からの認証結果に基づいて、後述するセレクタ１００２及びセレクタ１００９を制御する。
【００３９】
１００１はＤＣＴ符号反転器で、ハフマンコードのＤＣＴ係数を符号反転する。セレクタ１００２は、セキュリティ制御器１０１２からの選択信号により、分配器１０００の出力（ａ入力）とＤＣＴ符号反転器１００１の出力（ｂ入力）のいずれか一方を選択して出力する。１００３及び１００６は可変長復号器で、ハフマンコードを量子化代表値へ変換している。１００４及び１００７は逆量子化器で、量子化代表値をＤＣＴ係数として出力する。１００５及び１００８は逆ＤＣＴ器で、ＤＣＴ係数を画像へ変換している。１０１４は局部復号画像を格納するフレームメモリである。１０１３は動き補償器で、フレームメモリ１０１４からの出力、及び後述するアップサンプリング器１３０１からの局部復号画像を参照し、マクロブロック毎に動き補償を行って予測画像を出力している。セレクタ１００９は、セキュリティ制御器１０１２により、逆ＤＣＴ器１００５からの出力（ａ入力）、及び逆ＤＣＴ器１００８からの出力（ｂ入力）のいずれかを選択して出力する。
【００４０】
次に、図３における基本レイヤと拡張レイヤ間の構成について記述する。
【００４１】
１３００は分配器で、入力ビットストリームを基本レイヤと拡張レイヤとに分配する。１３０１はアップサンプリング器で、後述するフレームメモリ１１０５からの局部復号画像を入力してアップサンプリングする。セレクタ１３０２は、セキュリティ制御器１０１２からの選択信号に基づいて、拡張レイヤ７００１からの入力（ａ入力）と基本レイヤ７０００からの入力（ｂ入力）のいずれかを選択する。
【００４２】
次に図３における基本レイヤ７０００の構成について記述する。
【００４３】
１１００は分配器で、基本レイヤのビットストリームを入力してハフマンコードと符号化モード２とに分配し、ハフマンコードを可変長復号器１１０１に、符号化モード２を動き補償器１１０４に出力している。可変長復号器１１０１は、ハフマンコードを量子化代表値へ変換する。１１０２は逆量子化器で、可変長復号された量子化代表値をＤＣＴ係数として出力する。１１０３は逆ＤＣＴ器で、ＤＣＴ係数を元の画像データに変換している。フレームメモリ１１０５は、局部復号された画像データを格納する。１１０４は動き補償器で、フレームメモリ１１０５からの局部復号画像データを入力し、マクロブロック毎に動き補償を行って予測画像を出力する。
【００４４】
以上の構成に基づく動作を説明する。
【００４５】
符号化された入力ビットストリームは、分配器１３００により拡張レイヤと基本レイヤとに分配される。基本レイヤ７０００では、分配器１１００によりハフマンコード、及び符号化モード２に分配され、ハフマンコードは可変長復号器１１０１、逆量子化器１１０２、逆ＤＣＴ器１１０３を介して画像データに復号され、Ｉ−ＶＯＰ（フレーム内）符号化の場合は、直接、フレームメモリ１１０５へ局部復号画像データが格納され、これとともにセレクタ１３０２のｂ入力に供給される。またＰ−ＶＯＰ（フレーム間予測）符号化の場合は、逆ＤＣＴ器１１０３の出力に、動き補償器１１０４の出力した予測画像データを加算し、その後、フレームメモリ１１０５へ格納するとともに、セレクタ１３０２のｂ入力に供給する。
【００４６】
一方、拡張レイヤ７００１では、分配器１０００によりハフマンコード、スクランブルON/OFFフラグ、ＩＰ、及び符号化モード１に分配される。ＤＣＴ符号反転器１００１は、符号反転したハフマンコードを出力する。
【００４７】
これ以降では、正常なハフマンコードを正常コード、符号反転したハフマンコードを反転コードと称す。尚、ＤＣＴ係数のブロック内をジグザグスキャン順序で連続した列をＤＣＴ[ｉ]（ｉ＝０〜６３）とした場合、本実施の形態１で反転するハフマンコードは、符号化器に対応してｉ＝３〜６３とする。
【００４８】
本実施の形態１における空間スケーラビリティによる基本レイヤと拡張レイヤとの各ＶＯＰの関係は、前述した図１の符号化器における場合と同様である。
【００４９】
（Ａ）まずスクランブルオンで、且つＩＰ認証結果がＯＫの場合について説明する。
セレクタ１００２はａ入力に符号反転されたハフマンコード、ｂ入力に符号反転器１００１で符号が元に戻された正常コードを入力する。そして、スクランブルオンのため、セレクタ１００２はｂ入力の正常コードを選択する。可変長復号器１００６、逆量子化器１００７、逆ＤＣＴ器１００８は、その正常コードを入力し、その結果として正常な予測誤差画像データを出力する。
【００５０】
次に動き補償器１０１３では、符号化モード１に応じてフレームメモリ１０１４の出力、アップサンプリング器１３０１の出力から予測画像を出力する。逆ＤＣＴ器１００８からの予測誤差画像と動き補償器１０１３からの予測画像データとを加算し、正常な画像データをセレクタ１００９のｂ入力に供給し、これと同時に、その正常な画像データをフレームメモリ１０１４に格納する。セレクタ１００９は、セキュリティ制御器１０１２からの選択信号によりｂ入力を選択して出力し、またセレクタ１３０２は、拡張レイヤ７００１の出力であるａ入力を選択する。その結果、本実施の形態１に係る画像復号装置は、高空間解像度の画像を再生することができる。
【００５１】
（Ｂ）スクランブルオンで、且つ、ＩＰ認証結果がＮＯの場合について記述する。
この場合、スクランブルオンのため、セレクタ１００２はｂの符号が元に戻された正常なコードを選択する。可変長復号器１００３、逆量子化器１００４、逆ＤＣＴ器１００５は、その符号反転したコードを入力し、その復号画像を出力する。
【００５２】
次に動き補償器１０１３では、符号化モード１に応じて、フレームメモリ１０１４の出力とアップサンプリング器１３０１の出力から予測画像を出力する。逆ＤＣＴ器１００５からの予測画像は、動き補償器１０１３からの予測画像データと加算され、セレクタ１００９のａへ入力される。セレクタ１００９は、セキュリティ制御器１０１２からの制御データによりａ入力を選択して出力する。セレクタ１３０２は、セキュリティ制御器１０１２からの制御データにより、拡張レイヤａを選択する。その結果、本実施の形態１に係る復号装置は、スクランブルによる歪みのある画像を再生することになる。
【００５３】
（Ｃ）スクランブルオフで、且つ、ＩＰ認証結果がＯＫの場合について記述する。
スクランブルオフのため、セレクタ１００２はａ入力を選択して正常コードを出力する。可変長復号器１００６、逆量子化器１００７、逆ＤＣＴ器１００８は、この正常コードを入力して正常な画像を出力する。このとき入力ビットストリームはスクランブルが施されていないため、セレクタ１００９のａ，ｂ入力とも同一の正常画像データを入力する。次にセレクタ１００９は、ａ入力もしくはｂ入力を選択して出力する。セレクタ１３０２は拡張レイヤ７００１の出力であるａ入力を選択する。その結果、高空間解像度画像を再生することができる。
【００５４】
（Ｄ）スクランブルオフで、且つ、ＩＰ認証結果がＮＯの場合について説明する。
セキュリティ制御器１０１２では認証結果がＮＯのため、セレクタ１３０２は基本レイヤ７０００の出力であるｂ入力を選択して出力する。その結果、本実施の形態１に係る画像復号装置は、基本レイヤのフレーム内符号化された画像及びＰ−ＶＯＰ（フレーム間予測）符号化された画像だけを復号して再生するため、低空間解像度の画像を再生することになる。
【００５５】
図４は、本実施の形態１に係るセキュリティ制御器１０１２によるセレクタ１３０２、セレクタ１００９、セレクタ１００２の選択状態、及び再生画像との関係について示す図である。
【００５６】
ここでは、スクランブルON/OFFフラグとＩＰ認証結果とに基づいて３つのセレクタ１００２，１００９，１３０２を制御し、その結果、３種類の再生画像の状態（高解像度、低解像度、歪み有り）を生成することが可能である。
【００５７】
［多レイヤ空間スケーラビリティ構成］
図５は、前述の図１及び図３における空間スケーラビリティの機能を持つ符号化及び復号装置を多レイヤ化した構成を示す図である。なお、ここでレイヤ数は任意であり、図では「ｎ」としている。
【００５８】
図において、６０００は図１の基本レイヤ部、６００１は図１の拡張レイヤ部に相当している。また７０００は図３の基本レイヤ部、７００１は図３の拡張レイヤ部に相当している。８０００，８００１はダウンサンプリング器、８００２，８００３は対応するアップサンプリング器である。これらはレイヤ数に応じたサンプリングレートを設定することができる。但し、各レイヤとサンプリングレートは対応していなければならない。例えば、ダウンサンプリングレートを“１／２”とし、アップサンプリングレートを“２”とする。８００４は多重化器で、多レイヤ部からのビットストリームを１つに多重化している。８００５は分離器で、１つのビットストリームをレイヤ毎に分離している。８００６はセレクタで、各拡張レイヤの認証結果に従って再生するレイヤを選択する。
【００５９】
［多レイヤ空間スケーラビリティ動作］
図５の符号化装置では、各拡張レイヤ毎に著作権情報及びスクランブルON/OFFを指定する。また復号装置では、これらスクランブルON/OFF、著作権の認証結果に従った解像度の画像を復号して再生する。但し、ある拡張レイヤにスクランブルを施す場合は、その上位レイヤにおいてスクランブルが施されていなくてはならない。
【００６０】
尚、本実施の形態１の各機能をソフトウェアにより実現してもかまわない。
【００６１】
図６は本実施の形態１に係る画像の符号化処理を示すフローチャート、図７はその復号化処理を示すフローチャートである。
【００６２】
まず、図６の符号化処理を説明する。この処理は１フレームの画像データが入力されることにより開始され、まずステップＳ１で、フレーム数をカウントするカウンタｆrと、カウンタｎの値を共に“０”にセットする。尚、これらカウンタは、前述の図１における符号化モード１，２の信号があればなくてもよい。次にステップＳ２に進み、フレームカウンタｆrを＋１する。そしてステップＳ３で、フレームカウンタｆrの値が「３ｎ＋１」かどうか、即ち、図２のフレーム番号が“１”、“４”、“７”…“３ｎ＋１”かどうかを調べ、そうであって基本レイヤの場合はステップＳ４でｎを+１した後にステップＳ５に，拡張レイヤの場合はステップＳ４でｎを+１した後にステップＳ６に進む。ステップＳ５では、フレーム内符号化(I-VOP)を実行する。ステップＳ６では、ステップＳ５で符号化された符号をもとに１つの予測画像から予測するフレーム間予測符号化(P-VOP)を実行して、後述するステップＳ１０に進む。一方、ステップＳ５で符号化されたフレームの画像データは、ステップＳ７で多重化されて出力ビットストリームとして出力される。そして処理をステップＳ２へ戻す。
【００６３】
一方ステップＳ３で、フレームカウンタｆrの値が「３ｎ＋１」でなく、それが基本レイヤである場合、即ち、基本レイヤにおいてフレーム間予測符号化されるフレームの場合はステップＳ８に進む。ステップＳ８では、１つの予測画像から予測するフレーム間予測符号化(P-VOP)を実行し、処理はステップＳ７に進む。一方、拡張レイヤにおいてフレーム間予測符号化されるフレームの場合はステップＳ９において、２つの予測画像から予測するフレーム間予測符号化(B-VOP)を実行する。次にステップＳ１０で、スクランブルON/OFFフラグがオンかどうかを調べ、オンであればステップＳ１１に進み、ハフマンコードの符号ビットを反転させる。但し、ここではＤＣＴ係数列（ｉ＝０〜６３）のうちの反転するハフマンコードはｉ＝３〜６３のＤＣＴ係数とする。こうしてステップＳ１１を実行した後、或はステップＳ１０でスクランブルがオフの場合はステップＳ１２に進み、ＩＰを符号化したＩＰ符号、スクランブルON/OFFフラグ、符号化モード(I-VOP,P-VOP,B-VOP)及びフレーム間予測符号化されたコードを多重化し、ステップＳ７で、基本レイヤにおいて符号化されたコードと多重化して出力される。そして、処理をステップＳ２へ戻す。
【００６４】
図７は、本実施の形態１に係る画像復号処理を示すフローチャートである。
【００６５】
この処理は、前述の図１の符号化装置により符号化されたコードストリームを入力することにより開始され、まずステップＳ２１で、入力したビットストリームを基本レイヤと拡張レイヤとに分配する。基本レイヤのコードは、ステップＳ２２で、可変長復号、逆量子化、逆ＤＣＴ、更には動き補償による予測符号の復号処理を実行する。
【００６６】
一方、拡張レイヤの場合はステップＳ２３で、スクランブルON/OFFフラグがオンかどうかを調べ、オンであればステップＳ２４に進み、ＩＰ認証がＯＫかどうかをみる。ＯＫであれば、例えば著作権などの視聴許可を得ているユーザであるためステップＳ２５に進み、スクランブルを解除するためにハフマンコードの符号を反転する。そしてステップＳ２６に進み、可変長復号、逆量子化、逆ＤＣＴ、及び動き補償器によるＰ−ＶＯＰ及びＢ−ＶＯＰ復号処理を実行する。そしてステップＳ２７に進み、その復号されて再生された画像データを出力して表示する。この場合は高解像度の画像を表示することができる。
【００６７】
又、ステップＳ２４で、ＩＰ認証がＯＫでないときはステップＳ２６に進み、スクランブルがかかっている画像を復号して、ステップＳ２７で再生する。この場合には、スクランブルによる歪みのある画像が再生される。
【００６８】
一方、ステップＳ２３で、スクランブルフラグがオフの場合はステップＳ２９に進み、ＩＰ認証がＯＫかどうかをみる。ＯＫであれば、ステップＳ２６に進み、そのスクランブルがかかっていない画像を復号して再生する。
【００６９】
またステップＳ２８で、ＩＰ認証がＯＫでない場合はステップＳ２２に進み、Ｉ−ＶＯＰ復号或はＰ−ＶＯＰ復号による復号処理を実行し、ステップＳ２７で画像の再生（低解像度での再生）を行う。こうしてステップＳ２８で、受信画像の復号処理が全て終了するまで、上述の処理を繰り返し実行する。
【００７０】
尚、上述の実施の形態１では、ハフマンコードのスクランブル対象を前述のようにｉ＝３〜６３としたが、他の範囲でもかまわない。このように範囲を設定することにより、スクランブルの歪みを調整することができる。
【００７１】
また本実施の形態は、色信号構成４２０の場合であるが、その他の色信号構成でもかまわない。さらに図２のフレームモードの構成は別の組み合わせでもかまわない。
【００７２】
また本実施の形態ではダウンサンプリングを“１／２”、アップサンプリングレートを“２”としたが、これらは対応していればどの値でもかまわない。
【００７３】
以上説明したように本実施の形態１によれば、空間スケーラビリティの機能を持つ画像符号化器及び復号器において、拡張レイヤに対してスクランブルを施すことにより、必要に応じて復号器側で歪みを発生させ、動画像の著作権を保護することができる。
【００７４】
またスクランブルをビットストリーム全体ではなく、ハフマンコードの符号ビットに対して行うことで、装置の処理の負荷は少なくて済む。
【００７５】
またハフマンコードは、ブロック毎に符号反転するため、スクランブルのかかったビットストリームを直接復号すると歪みはブロック内に閉じたものとなる。本実施の形態１の場合、正常な画像よりもブロック歪みやモスキート歪みと言われる量子化歪みを多く発生した画像を再生することになり、その結果、スクランブルを解除できない視聴者は画像の概観を認識するに留まることになる。
【００７６】
また動画像復号装置において、局部復号画像を格納するフレームメモリ１０１４に対して、可変長復号器、逆量子化器、逆ＤＣＴ器を２系統備えており、フレームメモリ１０１４に、可変長復号器１００６、逆量子化器１００７、逆ＤＣＴ器１００８の経路の正常な復号画像が格納されるため、他方の可変長復号器１００３、逆量子化器１００４、逆ＤＣＴ器１００５の経路から生じるスクランブルによる歪みがフレーム毎に蓄積することを防止することができる。
【００７７】
また３以上のレイヤ数においても実現することが可能である。
【００７８】
［実施の形態２］
図８は、本発明の実施の形態２に係る動画像符号化装置の構成を示すブロック図であり、図１１はこれに対応する動画像復号装置の構成を示すブロック図である。本実施の形態２では、ＭＰＥＧ−２符号化方式において、時間スケーラビリティの機能を有し、その拡張レイヤに対してＤＣＴ係数を符号化したハフマンコードの符号ビットを反転することでスクランブルをかける場合について説明する。尚、ＭＰＥＧ−２符号化方式の詳細についてはＩＳＯ／ＩＥＣ勧告書を参照されたい。
【００７９】
図８は、本発明の実施の形態２に係る動画像符号化装置の構成を示すブロック図である。尚、図８において、前述の実施の形態１と同様の構成要素については同一番号を付し、その詳細な説明は省略する。よって本実施の形態２では、前述の実施の形態１と異なる点について詳しく説明する。
【００８０】
７００は分配器で、入力画像を拡張レイヤと基本レイヤへ、各フレーム毎に割り当て、それと同時にオーダリング（フレーム並び替え）を行う。７０２は動き補償器で、フレームメモリ１００からの入力画像と、基本レイヤのフレームメモリ２０５からの局部復号画像データを入力し、マクロブロック毎に動きベクトル検出を行って予測画像を出力する。７０１は多重化器で、時間スケーラビリティにおける拡張レイヤと基本レイヤのビットストリームを多重化している。
【００８１】
以上の構成に基づく動作を説明する。
【００８２】
本実施の形態２では、フレーム内符号化をI-Picture符号化モード、１つの予測画像から予測するフレーム間予測符号化をP-Picture符号化モード、２つの予測画像から予測するフレーム間予測符号化をB-Picture符号化モードと称す。
【００８３】
入力画像は、分配器７００によりフレーム毎に、外部から入力された符号化モード１に従って基本レイヤと拡張レイヤへ分配される。
【００８４】
図９及び図１０は、本実施の形態２における時間スケーラビリティによる基本レイヤと拡張レイヤとの各フレームの関係について示す図である。ここで図９は入力画像の順序を示し、図１０は分配器７００によりフレームの順序が並び替えられた後のフレームの構成を示す図である。ここで、入力画像は図９に示す順序で各レイヤへ割り当てられる。
【００８５】
入力画像の先頭フレームを、まず基本レイヤに入力してI-Picture符号化を行い、フレームメモリ２０５に局部復号画像を格納する。次に第２フレームを基本レイヤに入力し、フレームメモリ２０５の局部復号画像を参照してP-Picture符号化を行う。次に第３フレームを拡張レイヤに入力し、フレームメモリ１００において入力画像をマクロブロックへ変換する。
【００８６】
動き補償器７０２は、フレームメモリ１００の出力を入力し、既に基本レイヤのフレームメモリ２０５に格納された第１と第２フレームの２つのフレームの参照画像から動き補償を行う。つまり、B-Picture符号化を行う。次の第４フレームも第３フレームと同様にB-Picture符号化を行い、それ以降はフレームモードに従って動作を繰り返す。
【００８７】
次に図１１に示す動画像復号装置について説明する。この図１１における構成要素のうち、前述の実施の形態１と同様の構成要素については同一番号を付してその詳細な説明は省略する。よって本実施の形態２では、前述の実施の形態１と異なる点について記述する。
【００８８】
２０００は分配器で、入力ビットストリームを基本レイヤと拡張レイヤとに分配する。２００２は動き補償器で、基本レイヤ内のフレームメモリ１１０５からの局部復号画像データを入力し、マクロブロック毎に予測画像を出力する。２００１はセレクタで、分配器１０００からの符号化モード１に応じて、拡張レイヤの出力（ａ入力）と基本レイヤの出力（ｂ入力）を選択するセレクタであり、同時にリオーダリング（フレーム並び替え）を行い、画像を表示順序に並び替える。
【００８９】
以上の構成において、入力ビットストリームは分配器２０００により拡張レイヤと基本レイヤに分配される。本実施の形態における基本レイヤと拡張レイヤとの各フレームの関係については図１０と同様である。
【００９０】
まず第１から第２フレームでは、基本レイヤにビットストリームを入力し、第１フレームをI-Picture、第２フレームをP-Pictureとして復号する。そして第３フレームでは、拡張レイヤにビットストリームを入力し、動き補償器２００２にて既に基本レイヤのフレームメモリ１１０５に格納されている第１と第２フレームの２つの画像を参照しB-Pictureとして復号する。また第４フレームでは、拡張レイヤにビットストリームを入力し、第３フレームと同様の復号を行う。以降はフレームモードに従い動作を繰り返す。
【００９１】
セレクタ２００１は、分配器１０００からの符号化モード１により、拡張レイヤ（ａ入力）と基本レイヤ（ｂ入力）を選択し、さらにリオーダリングを行い復号画像を出力する。
【００９２】
前述した図４において、セレクタ１３０２をセレクタ２００１とすれば各セレクタと再生画像との関係は図４と同じになる。但し、この場合の再生画像の解像度は時間周波数に対するものとなる。つまり、低解像度画像とは基本レイヤのみの画像を示し、高解像度画像とは拡張レイヤを含めた画像を示す。
【００９３】
以上説明したように本実施の形態２によれば、時間スケーラビリティの機能を持つ画像符号化器及び復号器において、拡張レイヤに対してスクランブルを施すことで、必要に応じて復号器側でフレーム毎に歪みを発生させ、動画像の著作権を保護することができる。
【００９４】
またスクランブルをビットストリーム全体ではなくハフマンコードの符号ビットに対して行うことで、装置の処理の負荷が少なくて済む。
【００９５】
またスクランブルは、拡張レイヤに対してのみ行われるため、両レイヤを連続して再生し、且つ、スクランブルの歪みのある拡張レイヤの画像を再生する場合には、歪みのない画像と、歪みのある画像が交互に再生されることになる。その結果、スクランブルを解除できない視聴者は画像の概観を認識するに留まることになる。
【００９６】
また空間スケーラビリティと同様に、３以上のレイヤ数においても実現することが可能である。
【００９７】
尚、この実施の形態２に係る処理も前述の実施の形態１と同様にソフトウェアにより処理することが可能である。その場合の処理の流れを示すフローチャートは、前述の実施の形態１と基本的に同様であるため、その説明を省略する。
【００９８】
［実施の形態３］
図１２は、本発明の実施の形態３に係る画像符号化装置の構成を示すブロック図であり、図１３は、これに対応する画像復号装置の構成を示すブロック図である。本実施の形態３では、フレーム内符号化方式において、ＭＰＥＧ符号化方式の空間スケーラビリティに類する機能を有し、その拡張レイヤに対してＤＣＴ係数を符号化したハフマンコードの符号ビットを反転することでスクランブルをかける場合について説明する。
【００９９】
図１２において、前述の実施の形態１と同様の構成要素については同一番号を付し、その詳細な説明は省略する。よって本実施の形態３では実施の形態１と異なる点について記述する。
【０１００】
３０００は分離器で、可変長符号化器２０７の出力からブロック内の位置に対応してＤＣＴの低周波成分と高周波成分を分離する。３００１は多重化器で、分離器３０００で分離した各レイヤのコードを多重化する。
【０１０１】
尚、ＤＣＴ係数のブロック内をジグザグスキャン順序で連続した列をＤＣＴ[ｉ]（ｉ＝０〜６３）とした場合、本実施の形態３の分離器３０００で拡張レイヤへ出力するハフマンコードは、ｉ＝３〜６３とする。
【０１０２】
図１２において、入力画像はフレームメモリ２００においてマクロブロック化され、ＤＣＴ器２０１、量子化器２０２、可変長符号化器２０７を介して可変長符号化コードとなる。この符号化コードは分離器３０００によりブロック内の成分毎に分離され、低周波成分は多重化器３００１へ、その他の成分はＤＣＴ符号反転器１０８、セレクタ１０９へ出力される。
【０１０３】
図１３の画像復号装置において、前述の実施の形態１と同様の構成要素については同一番号を付してその詳細な説明は省略する。よって本実施の形態３では、実施の形態１と異なる点について記述する。
【０１０４】
４０００は分離器で、入力ビットストリームを拡張レイヤと基本レイヤへ分離する。４００１は多重化器で、両レイヤのコードを多重化する。４００２はセレクタで、セキュリティ制御器１０１２からの選択信号により、黒画像（ａ入力）もしくは逆ＤＣＴ器１１０３の出力を選択する。
【０１０５】
以上の構成に基づく復号装置の動作を説明する。
【０１０６】
まず入力したビットストリームは分離器４０００において拡張レイヤと基本レイヤへ分離される。これ以降では、正常なハフマンコードを正常コード、符号反転したハフマンコードを反転コードと称す。尚、ＤＣＴ係数のブロック内をジグザグスキャン順序で連続した列をＤＣＴ[ｉ]（ｉ＝０〜６３）とした場合、本実施の形態３の分離器４０００で拡張レイヤへ出力するハフマンコードは、ｉ＝３〜６３とする。
【０１０７】
（Ａ）スクランブルオンで、且つ、ＩＰ認証結果がＯＫの場合について説明する。
セキュリティ制御器１０１２では認証結果がＯＫのため、セレクタ１００２はｂ入力の正常コードを選択する。多重化器４００１は、分離器４０００からの低周波成分のコードと、セレクタ１００２からの高周波成分のコードとを多重化する。可変長復号器１１０１、逆量子化器１１０２、逆ＤＣＴ器１１０３は、正常コードを入力する。その結果、正常な画像を復号することができる。ここでセレクタ４００２は、セキュリティ制御器１０１２からの選択信号によりｂ入力を選択して出力する。その結果、本復号装置は正常な画像（高解像度）を再生する。
【０１０８】
（Ｂ）スクランブルオンで、且つ、ＩＰ認証結果がＮＯの場合について記述する。
セキュリティ制御器１０１２では認証結果がＮＯのため、セレクタ１００２はａ入力の反転コードを選択する。このため高周波成分は反転コードとなったまま逆ＤＣＴ器１１０３から出力される。セレクタ４００２はセキュリティ制御器１０１２からの選択信号によりｂ入力を選択して出力する。その結果、スクランブルによる歪みのある画像が再生される。
【０１０９】
（Ｃ）スクランブルオフで、且つ、ＩＰ認証結果がＯＫの場合について記述する。
セキュリティ制御器１０１２では認証結果がＯＫのため、セレクタ１００２はａ入力を選択し正常コードを出力する。このため全ての周波数成分は正常となり逆ＤＣＴ器１１０３から出力される。セレクタ４００２はセキュリティ制御器１０１２からの選択信号によりｂ入力を選択して出力する。その結果、正常な画像を再生することができる。
【０１１０】
（Ｄ）スクランブルオフで、且つ、ＩＰ認証結果がＮＯの場合について記述する。
セキュリティ制御器１０１２では認証結果がＮＯのため、セレクタ４００２は黒画像（ａ入力）を選択して出力する。その結果、黒画像を再生する。
【０１１１】
図１４は、本実施の形態３におけるセキュリティ制御器１０１２によるセレクタ４００２、セレクタ１００２の選択状態、及び再生画像との関係について示す図である。
【０１１２】
このように、スクランブルON/OFFフラグとＩＰ認証結果により、２つのセレクタ１００２，４００２を制御し、その結果として３種類の再生画像の状態を生成することが可能である。
【０１１３】
以上説明したように本実施の形態３によれば、フレーム内符号化方式のため、静止画像及び動画像符号化・復号装置で実現可能である。
【０１１４】
空間スケーラビリティの機能を持つ画像符号化器及び復号器において、拡張レイヤに対してスクランブルを施すことで、必要に応じて復号器側で歪みを発生させ、画像の著作権を保護することができる。
【０１１５】
またスクランブルをビットストリーム全体ではなくハフマンコードの符号ビットに対して行うことで、装置の処理の負荷は少なくて済む。
【０１１６】
またハフマンコードはブロック毎に符号反転するため、スクランブルのかかったビットストリームを直接復号すると歪みはブロック内に閉じたものとなる。本実施の形態の場合、正常な画像よりもブロック歪みやモスキート歪みと言われる量子化歪みを多く発生した画像を再生することになり、結果としてスクランブルを解除できない視聴者は画像の概観を認識するに留まることになる。
【０１１７】
なお本発明は、複数の機器（例えばホストコンピュータ、インタフェイス機器、リーダ、プリンタなど）から構成されるシステムに適用しても、一つの機器からなる装置（例えば、複写機、ファクシミリ装置など）に適用してもよい。
【０１１８】
また本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはCPUやMPU）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても達成される。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム(OS)などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれる。
【０１１９】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるCPUなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれる。
【０１２０】
以上説明したように本実施の形態によれば、スケーラビリティの機能を持つ画像符号化装置及び復号装置において、ビットストリーム内にＩＰ符号化コード、及び著作権を保護するための付加情報を設けることにより、所望の画像の一部に対してコンテンツ提供者の著作権を守ることが可能となる。
【０１２１】
また本実施の形態によれば、ビットストリームの一部の限定されたコードに対してスクランブルを行うため、符号化コード全体を対象とするよりもその処理の負荷は少なくて済む。またその歪みはブロック内に閉じたものとなるため、スクランブルを解除できない視聴者は画像の概観を認識するという状態を作ることが可能となる。
【０１２２】
【発明の効果】
以上説明したように本発明によれば、画像符号化時に著作権を保護すべき対象のビットストリームの一部にスクランブルをかけ、符号化効率を下げることなく画像を符号化できる。
【０１２３】
また本発明によれば、視聴権利を有する視聴者の画像復号装置では正常な再生を行い、視聴権利を持たない視聴者の画像復号装置では画像のおおよその概観を認識できる程度の再生を行うことができるように画像を符号化できるという効果がある。
【０１２４】
また、本発明によれば、上記画像符号化装置により符号化された画像を復号し、正規の視聴権利を有するか否かの認証結果とスクランブルの有無の組合せにより、３種類の画像（高解像度、低解像度、歪み有り）を再生できるという効果がある。
【図面の簡単な説明】
【図１】本発明の実施の形態１に係る画像符号化装置の主要構成を示すブロック図である。
【図２】本実施の形態１に係る空間スケーラビリティによる基本レイヤと拡張レイヤとの各ＶＯＰの関係を示す図である。
【図３】本発明の実施の形態１に係る画像復号化装置の主要構成を示すブロック図である。
【図４】本発明の実施の形態１におけるセキュリティ制御器の入出力と再生画像の関係とを説明する図である。
【図５】本発明の実施の形態１に係る多レイヤの空間スケーラビリティの構成を示す図である。
【図６】本発明の実施の形態１に係る画像符号化処理を説明するフローチャートである。
【図７】本発明の実施の形態１に係る画像復号処理を示すフローチャートである。
【図８】本発明の実施の形態２に係る画像符号化装置の主要構成を示すブロック図である。
【図９】本実施の形態２に係る時間スケーラビリティによる基本レイヤと拡張レイヤとにおけるフレーム表示順序の関係を示す図である。
【図１０】本実施の形態２に係る時間スケーラビリティによる基本レイヤと拡張レイヤとにおけるフレームの符号化順序の関係を示す図である。
【図１１】本発明の実施の形態２に係る画像復号装置の主要構成を示すブロック図である。
【図１２】本発明の実施の形態３に係る画像符号化装置の主要部の構成を示すブロック図である。
【図１３】本実施の形態３に係る画像復号装置の主要構成を示すブロック図である。
【図１４】本実施の形態３におけるセキュリティ制御器の入出力と再生画像の関係を示す図である。
[0001]
[Technical Field of the Invention]
The present invention relates to an image encoding apparatus and method for inputting and encoding a moving image, and an image decoding apparatus and method for decoding the encoded code.
[0002]
[Prior art]
Conventionally known image coding schemes include intra-frame coding schemes such as Motion JPEG and Digital Video, and schemes that use inter-frame predictive coding, such as H.261, H.263, MPEG-1, and MPEG-2. These coding schemes have been internationally standardized by ISO (International Organization for Standardization) and ITU (International Telecommunication Union). Intra-frame coding encodes each frame independently, which makes frames easy to manage, so it is best suited to apparatuses that require moving-image editing or special playback. Inter-frame predictive coding, by contrast, uses inter-frame prediction based on differences in image data between frames, and is therefore characterized by high coding efficiency.
[0003]
Furthermore, international standardization work is in progress on MPEG-4 as a general-purpose, next-generation multimedia coding standard that can be used in many fields such as computers, broadcasting, and communications.
[0004]
With the spread of such digital coding standards, the content industry has strongly raised the problem of copyright protection. That is, content cannot be provided with confidence under a standard that does not sufficiently guarantee that copyright will be protected.
[0005]
For this reason, IPMP (Intellectual Property Management and Protection) technology has been introduced in MPEG-4, and a function that interrupts and resumes image reproduction in order to protect copyright is being studied. This scheme protects copyright by not reproducing the frames that require copyright protection.
[0006]
Meanwhile, schemes and services have been launched that scramble images so that viewers are given images in which only a rough outline can be recognized. Specifically, this is achieved by replacing arbitrary scanning lines or pixels in the television signal. There is also a method in which the reproduction apparatus converts the reproduced image to be output.
[0007]
Scalability functions are also being studied, and there are methods that encode and decode images while holding multiple levels of temporal and spatial resolution.
[0008]
[Problems to be solved by the invention]
This gives rise to the following problems.
(1) With conventional IPMP technology, decoding or reproduction is stopped for an image whose copyright is to be protected, so no information at all can be provided to the viewer. This means that no information about the content (for example, an image) can be provided to a viewer who does not have the right to view it. Content providers naturally want, as a business, to spread their content to as many viewers as possible, and to that end some information about the content needs to be provided even to viewers who do not have the right to view it.
(2) In the series of image coding schemes described above, if conventional scrambling is applied to the entire bitstream, a viewer whose decoder cannot descramble, or a viewer who does not have the right to view, cannot decode normally and therefore cannot recognize the video at all.
(3) Furthermore, this series of image coding schemes achieves high coding efficiency by exploiting the spatial and temporal correlation of images, but if conventional scrambling is applied to the input image at encoding time, that spatial and temporal correlation is lost and coding efficiency drops significantly.
(4) Furthermore, even if only a part of the bitstream is scrambled, in an image reproduced by a moving-image coding scheme that uses inter-frame predictive coding, distortion in one frame propagates to the next frame and gradually accumulates. The distortion therefore does not occur in a steady manner, and when the reproduced image is viewed on the decoding side it is difficult to tell whether the distortion is due to scrambling or is a symptom of some other malfunction.
(5) In recent years the processing of image encoding and decoding apparatuses has become more complex, and encoding and decoding in software are sometimes envisaged. In such cases, if the load of scramble processing, which is separate from the image encoding and decoding processing, is large, the performance of the apparatus as a whole decreases.
[0009]
The present invention has been made in view of the above conventional examples, and an object thereof is to provide an image encoding apparatus and method that can scramble a part of the bitstream whose copyright is to be protected at image encoding time and encode the image without lowering coding efficiency.
[0010]
Another object of the present invention is to provide an image encoding apparatus and method that can encode an image so that the image decoding apparatus of a viewer who has viewing rights performs normal reproduction, while the image decoding apparatus of a viewer who does not have viewing rights performs reproduction only to the extent that a rough overview of the image can be recognized.
[0011]
It is another object of the present invention to provide an image decoding apparatus that can decode and reproduce an image encoded by the image encoding apparatus.
[0012]
[Means for Solving the Problems]
  In order to achieve the above object, an image decoding apparatus according to one aspect of the present invention has the following configuration. That is, the apparatus comprises:
  distribution means for receiving an encoded bitstream and distributing the bitstream into a protection code for protecting intellectual property, one or more enhancement layers, and a base layer having a lower resolution than the enhancement layers;
  authentication data input means for inputting authentication data from the outside;
  authentication means for checking consistency between the authentication data and the protection code;
  descrambling means for descrambling the one or more enhancement layers;
  enhancement layer decoding means for decoding the enhancement layer distributed by the distribution means, or the enhancement layer descrambled by the descrambling means;
  base layer decoding means for decoding the base layer distributed by the distribution means;
  selection means which, when the authentication result of the authentication means is a match and the enhancement layer is scrambled, selects the image obtained by decoding the enhancement layer descrambled by the descrambling means,
  when the authentication result of the authentication means is a mismatch and the enhancement layer is scrambled, or when the authentication result of the authentication means is a match and the enhancement layer is not scrambled, selects the image obtained by decoding the enhancement layer, and
  when the authentication result of the authentication means is a mismatch and the enhancement layer is not scrambled, selects the image obtained by decoding the base layer; and
  image output means for outputting the image selected by the selection means.
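The four cases handled by the selection means can be summarized as a small decision function. The following Python sketch is only an illustration under stated assumptions: the names `Output` and `select_output` are hypothetical and do not appear in the patent, and the booleans `auth_ok` and `scrambled` stand in for the authentication result and for whether the enhancement layer is scrambled.

```python
from enum import Enum


class Output(Enum):
    """Which decoded image is passed to the image output (hypothetical names)."""
    DESCRAMBLED_ENHANCEMENT = "enhancement layer, decoded after descrambling"
    ENHANCEMENT_AS_RECEIVED = "enhancement layer, decoded as received"
    BASE = "base layer"


def select_output(auth_ok: bool, scrambled: bool) -> Output:
    """Map (authentication result, scramble state) to the selected image."""
    if scrambled:
        # Match: descramble first. Mismatch: decode the still-scrambled
        # layer, which yields a distorted picture.
        if auth_ok:
            return Output.DESCRAMBLED_ENHANCEMENT
        return Output.ENHANCEMENT_AS_RECEIVED
    # Not scrambled. Match: the enhancement layer is used as received.
    # Mismatch: only the lower-resolution base layer is shown.
    if auth_ok:
        return Output.ENHANCEMENT_AS_RECEIVED
    return Output.BASE


if __name__ == "__main__":
    for auth_ok in (True, False):
        for scrambled in (True, False):
            print(auth_ok, scrambled, select_output(auth_ok, scrambled).value)
```

Note that an unauthenticated viewer never receives a blank result in this scheme: the outcome is either a distorted enhancement-layer picture or the base-layer picture.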
[0013]
  An image decoding method according to another aspect of the present invention for achieving the above object comprises:
  a distribution step of receiving an encoded bitstream and distributing the bitstream into a protection code for protecting intellectual property, one or more enhancement layers, and a base layer having a lower resolution than the enhancement layers;
  an authentication data input step of inputting authentication data from the outside;
  an authentication step of checking consistency between the authentication data and the protection code;
  a descrambling step of descrambling the one or more enhancement layers;
  an enhancement layer decoding step of decoding the enhancement layer distributed in the distribution step, or the enhancement layer descrambled in the descrambling step;
  a base layer decoding step of decoding the base layer distributed in the distribution step;
  a selection step which, when the authentication result of the authentication step is a match and the enhancement layer is scrambled, selects the image obtained by decoding the enhancement layer descrambled in the descrambling step,
  when the authentication result of the authentication step is a mismatch and the enhancement layer is scrambled, or when the authentication result of the authentication step is a match and the enhancement layer is not scrambled, outputs the image obtained by decoding the enhancement layer, and
  when the authentication result of the authentication step is a mismatch and the enhancement layer is not scrambled, selects the image obtained by decoding the base layer; and
  an image output step of outputting the image selected in the selection step.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
Preferred embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
[0017]
[Embodiment 1]
FIG. 1 is a block diagram showing the configuration of a video encoding device according to Embodiment 1 of the present invention, and FIG. 3 is a block diagram showing the configuration of a video decoding device that decodes the code produced by this encoding device. The first embodiment describes a case where the MPEG-4 encoding system is given a spatial scalability function and the enhancement layer 6001 is scrambled by inverting the sign bit of the Huffman code obtained by encoding the DCT coefficients. Refer to the ISO/IEC recommendation for details of the MPEG-4 encoding method.
[0018]
In FIG. 1, reference numeral 100 denotes a frame memory (FM), which stores one frame of input image data and outputs it in units of macroblocks, the coding unit. Here, a macroblock consists of 16 × 16 luminance pixels and 8 × 8 pixels for each of the color differences Cb and Cr, that is, four luminance blocks and one block for each color difference. Reference numeral 101 denotes a DCT device, which performs a two-dimensional discrete cosine transform (DCT) in units of 8 × 8 pixels (blocks), sequentially transforms each block of the input macroblock, and outputs DCT coefficients. Reference numeral 102 denotes a quantizer that sequentially quantizes the DCT coefficients for each block and outputs quantized representative values. Reference numeral 103 denotes an inverse quantizer that converts the quantized representative values back into DCT coefficients. Reference numeral 104 denotes an inverse DCT unit that converts the inversely quantized DCT coefficients back into image data. Reference numeral 105 denotes a frame memory for storing locally decoded images. A motion compensator 106 receives input image data from the frame memory 100 and locally decoded image data from the frame memory 105 and from an upsampling unit 301 described later, performs motion vector detection for each macroblock, and outputs a predicted image.
[0019]
A variable length encoder 107 performs Huffman coding on the quantized representative values and outputs Huffman codes. Reference numeral 108 denotes a DCT code inverter that inverts the sign bit of the Huffman code from the variable length encoder 107. The sign bit of the Huffman code in MPEG-4 is the last bit of the bit string, which is “0” for positive and “1” for negative. Inverting this sign bit is therefore the processing performed by the DCT code inverter 108. Note that when DCT[i] (i = 0 to 63) denotes the sequence of coefficients in a DCT coefficient block in zigzag scan order, the Huffman codes inverted in the first embodiment are those for i = 3 to 63.
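The sign-bit inversion performed by the DCT code inverter 108 can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the embodiment: each Huffman code is modeled as a string of "0"/"1" characters paired with the zigzag index i of the coefficient it encodes, and the names `flip_sign_bit` and `scramble_block` are hypothetical. The only facts taken from the description are that the sign bit is the last bit of the code and that the codes for i = 3 to 63 are inverted.

```python
SCRAMBLE_START = 3   # first zigzag index that is scrambled (from the description)
SCRAMBLE_END = 63    # last zigzag index that is scrambled


def flip_sign_bit(code: str) -> str:
    """Invert the last bit of a Huffman code given as a '0'/'1' string."""
    last = "1" if code[-1] == "0" else "0"
    return code[:-1] + last


def scramble_block(codes):
    """Flip the sign bit of every code whose zigzag index is in 3..63.

    `codes` is a list of (zigzag_index, code_string) pairs; this pairing is
    an assumption made for illustration only.
    """
    return [
        (i, flip_sign_bit(c) if SCRAMBLE_START <= i <= SCRAMBLE_END else c)
        for i, c in codes
    ]


# Hypothetical code strings, not taken from any MPEG-4 table.
block = [(0, "1100"), (1, "0101"), (5, "1110"), (40, "011")]
scrambled = scramble_block(block)
restored = scramble_block(scrambled)  # the same operation removes the scramble
```

Because flipping a bit twice restores it, the same routine can serve on the decoder side to release the scramble, which matches the role of the DCT code inverter 1001.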
[0020]
Reference numeral 109 denotes a selector that selects either the output from the variable length encoder 107 or the output from the DCT code inverter 108 in accordance with a scramble ON / OFF flag input from the outside. A multiplexer 110 multiplexes the Huffman code output from the selector 109, the scramble ON / OFF flag input from the outside, and the IP encoded code output from the IP encoder 111 as user data, and outputs the result as a bit stream. The IP encoder 111 receives, from the outside, information for protecting the copyright (IP: Intellectual Property) of the image and outputs an IP encoded code. In the first embodiment, this IP is a password.
[0021]
Next, the configuration between layers in FIG. 1 will be described.
[0022]
Reference numeral 300 denotes a downsampler, which downsamples an input image. In the first embodiment, the downsampling rate in the downsampler 300 is set to “1/2”. An upsampling unit 301 upsamples a locally decoded image in a frame memory 205 described later. In the first embodiment, the upsampling rate in the upsampler 301 is set to “2”. A multiplexer 302 multiplexes the bit streams of the enhancement layer 6001 and the base layer 6000 in spatial scalability.
[0023]
Next, the configuration of the base layer 6000 in FIG. 1 will be described.
[0024]
In the base layer 6000, each functional block is the same as the block of the same name in the above-described enhancement layer 6001, except that the input is the output of the downsampler 300. Reference numeral 200 denotes a frame memory, which stores one frame of an input image and outputs it in units of macroblocks, the coding unit. A DCT unit 201 performs a two-dimensional discrete cosine transform in units of 8 × 8 pixels (blocks). Reference numeral 202 denotes a quantizer that performs quantization for each block and outputs quantized representative values. Reference numeral 203 denotes an inverse quantizer that converts the quantized representative values back into DCT coefficients. Reference numeral 204 denotes an inverse DCT device that performs an inverse DCT on the DCT coefficients to convert them into image data. A frame memory 205 stores a locally decoded image. A motion compensator 206 receives an input image from the frame memory 200 and a locally decoded image from the frame memory 205, detects a motion vector for each macroblock, and outputs a predicted image. A variable length encoder 207 performs Huffman coding on the quantized representative values output from the quantizer 202 and outputs Huffman codes. A multiplexer 208 multiplexes the Huffman codes from the variable length encoder 207 and outputs them as a bit stream.
[0025]
First, the operation of the enhancement layer 6001 in FIG. 1 will be described.
[0026]
Hereinafter, in the first embodiment, intra-frame coding is referred to as the I-VOP (Video Object Plane) coding mode, inter-frame prediction coding that predicts from one predicted image as the P-VOP coding mode, and inter-frame prediction coding that predicts from two predicted images as the B-VOP coding mode.
[0027]
The frame memory 100 converts the input image into a macro block which is a coding unit and outputs it. Prediction image data from the motion compensator 106 is subtracted from the image data output from the frame memory 100 by a subtracter and input to the DCT unit 101 as prediction error image data. The DCT unit 101 converts the input macroblock prediction error into DCT coefficients for each block. The quantizer 102 outputs the DCT coefficient as a desired quantization representative value for each block. The quantized representative value is decoded as prediction error image data via the inverse quantizer 103 and the inverse DCT unit 104. The prediction error image data is added to the prediction image data from the motion compensator 106 by an adder, and then stored in the frame memory 105 as locally decoded image data. Note that the motion compensator 106 performs prediction according to the encoding mode 1 for each frame designated from the outside and outputs predicted image data.
[0028]
The variable length encoder 107, which receives the quantized representative values, encodes them by Huffman coding and outputs Huffman codes. The selector 109 receives this Huffman code directly at one terminal (a), and the Huffman code whose sign bit has been inverted (scrambled) by the DCT code inverter 108 at the other terminal (b). In accordance with the scramble ON / OFF flag input from the outside, the selector 109 selects the terminal (a), that is, the output of the variable length encoder 107, when the flag is off, and selects the terminal (b), that is, the Huffman code with the sign bit inverted, when the flag is on, and outputs the selected code. The multiplexer 110 multiplexes and outputs the output of the selector 109, the scramble ON / OFF flag, and the IP encoded code output from the IP encoder 111.
[0029]
Next, the operation of the base layer 6000 shown in FIG. 1 will be described.
[0030]
The image downsampled by the downsampler 300 is input to the frame memory 200 and stored therein. The DCT unit 201, the quantizer 202, the inverse quantizer 203, the inverse DCT unit 204, the frame memory 205, the motion compensator 206, which receives the frame-by-frame encoding mode 2, and the variable length encoder 207 operate in the same manner as the corresponding parts of the enhancement layer 6001 described above. The multiplexer 208 multiplexes the output of the variable length encoder 207.
[0031]
Next, the operation between the base layer 6000 and the enhancement layer 6001 will be described including the upsampler 301.
[0032]
FIG. 2 is a diagram for explaining the relationship of each VOP between the base layer and the enhancement layer based on the spatial scalability according to the first embodiment.
[0033]
In the first frame of the input image, first, the downsampler 300 downsamples the input image data, performs I-VOP (intraframe) encoding in the base layer, and stores the locally decoded image in the frame memory 205.
[0034]
The enhancement layer 6001 upsamples the image in the frame memory 205 with the upsampler 301, inputs the output of the upsampler 301 to the motion compensator 106 as a reference image, and performs P-VOP (inter-frame prediction from one predicted image) encoding.
[0035]
Next, the second frame is P-VOP encoded by the base layer 6000 with reference to the locally decoded image stored in the frame memory 205 when the first frame was encoded. In the enhancement layer 6001, on the other hand, the locally decoded image stored in the frame memory 105 when the first frame was encoded and the data obtained by upsampling the image data in the frame memory 205 with the upsampler 301 are input to the motion compensator 106, and B-VOP (inter-frame prediction from two predicted images) encoding is performed.
[0036]
The third frame is processed in the same manner as the second frame, and thereafter the operation for these three frames is repeated. In FIG. 2, “I” indicates I-VOP (intra-frame) encoding, “P” indicates P-VOP encoding, and “B” indicates B-VOP encoding.
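The three-frame cycle described for the two layers can be summarized in a short sketch. The function name `vop_types` is hypothetical, and the rule is inferred only from the text: frames 1, 4, 7, ... are I-VOP in the base layer and P-VOP in the enhancement layer, and all other frames are P-VOP in the base layer and B-VOP in the enhancement layer.

```python
def vop_types(frame_number: int):
    """Return (base_layer_type, enhancement_layer_type) for a 1-based frame number."""
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    if (frame_number - 1) % 3 == 0:      # frames 1, 4, 7, ...
        return ("I", "P")
    return ("P", "B")


# Coding types of the first six frames in both layers.
schedule = [vop_types(fr) for fr in range(1, 7)]
```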
[0037]
FIG. 3 is a block diagram showing a configuration of the image decoding apparatus according to the first embodiment. First, the configuration of the enhancement layer 7001 in FIG. 3 will be described.
[0038]
Reference numeral 1000 denotes a distributor, which distributes the Huffman code, the encoding mode 1, the scramble ON / OFF flag, and the IP encoded code from the bit stream input to the enhancement layer 7001. An IP decoder 1010 decodes the IP encoded code from the distributor 1000 into IP. Reference numeral 1011 denotes an IP authenticator, which performs authentication by checking the match between the IP decrypted by the IP decoder and the authentication IP input from the outside. A security controller 1012 controls a selector 1002 and a selector 1009 described later based on a scramble ON / OFF flag from the distributor 1000 and an authentication result from the IP authenticator 1011.
[0039]
Reference numeral 1001 denotes a DCT code inverter that reverses the sign of the DCT coefficient of the Huffman code. The selector 1002 selects and outputs either the output of the distributor 1000 (a input) or the output of the DCT code inverter 1001 (b input) according to the selection signal from the security controller 1012. Reference numerals 1003 and 1006 denote variable length decoders that convert the Huffman codes into quantized representative values. Reference numerals 1004 and 1007 denote inverse quantizers that output quantized representative values as DCT coefficients. Reference numerals 1005 and 1008 denote inverse DCT units which convert DCT coefficients into images. Reference numeral 1014 denotes a frame memory for storing locally decoded images. A motion compensator 1013 refers to an output from the frame memory 1014 and a locally decoded image from an upsampler 1301 described later, and performs motion compensation for each macroblock to output a predicted image. The selector 1009 selects and outputs either the output (a input) from the inverse DCT device 1005 or the output (b input) from the inverse DCT device 1008 by the security controller 1012.
[0040]
Next, the configuration between the base layer and the enhancement layer in FIG. 3 will be described.
[0041]
Reference numeral 1300 denotes a distributor that distributes an input bit stream to a base layer and an enhancement layer. Reference numeral 1301 denotes an upsampling device that inputs a locally decoded image from a frame memory 1105 described later and upsamples it. The selector 1302 selects either the input from the enhancement layer 7001 (a input) or the input from the base layer 7000 (b input) based on the selection signal from the security controller 1012.
[0042]
Next, the configuration of the base layer 7000 in FIG. 3 will be described.
[0043]
Reference numeral 1100 denotes a distributor, which receives the base layer bit stream, separates it into the Huffman code and the encoding mode 2, and outputs the Huffman code to the variable length decoder 1101 and the encoding mode 2 to the motion compensator 1104. The variable length decoder 1101 converts the Huffman code into quantized representative values. Reference numeral 1102 denotes an inverse quantizer that converts the variable-length-decoded quantized representative values into DCT coefficients. Reference numeral 1103 denotes an inverse DCT device, which converts the DCT coefficients back into image data. A frame memory 1105 stores locally decoded image data. Reference numeral 1104 denotes a motion compensator, which receives locally decoded image data from the frame memory 1105, performs motion compensation for each macroblock, and outputs a predicted image.
[0044]
An operation based on the above configuration will be described.
[0045]
The encoded input bit stream is distributed by the distributor 1300 to the enhancement layer and the base layer. In the base layer 7000, the distributor 1100 separates the Huffman code and the encoding mode 2, and the Huffman code is decoded into image data via the variable length decoder 1101, the inverse quantizer 1102, and the inverse DCT unit 1103. In the case of I-VOP (intra-frame) encoding, the locally decoded image data is stored directly in the frame memory 1105 and is also supplied to the b input of the selector 1302. In the case of P-VOP (inter-frame prediction) encoding, the predicted image data output from the motion compensator 1104 is added to the output of the inverse DCT unit 1103, and the result is stored in the frame memory 1105 and supplied to the b input of the selector 1302.
[0046]
On the other hand, in the enhancement layer 7001, the distributor 1000 distributes the Huffman code, the scramble ON / OFF flag, the IP, and the encoding mode 1. The DCT code inverter 1001 outputs the Huffman code with the sign inverted.
[0047]
Hereinafter, a normal Huffman code is referred to as a normal code, and a Huffman code whose sign is inverted is referred to as an inverted code. When DCT[i] (i = 0 to 63) denotes the sequence of coefficients in a DCT coefficient block in zigzag scan order, the Huffman codes inverted in the first embodiment are those for i = 3 to 63, matching the encoder.
[0048]
The relationship between the VOPs of the base layer and the enhancement layer due to spatial scalability in the first embodiment is the same as that in the encoder in FIG. 1 described above.
[0049]
(A) First, the case where the scramble is on and the IP authentication result is OK will be described.
The selector 1002 receives the Huffman code whose sign is inverted at its a input, and the normal code whose sign has been restored by the DCT code inverter 1001 at its b input. Because the scramble is on, the selector 1002 selects the normal code at the b input. The variable length decoder 1006, the inverse quantizer 1007, and the inverse DCT unit 1008 receive the normal code and output normal prediction error image data as a result.
[0050]
Next, the motion compensator 1013 outputs a predicted image from the output of the frame memory 1014 and the output of the upsampler 1301 in accordance with the encoding mode 1. The prediction error image from the inverse DCT unit 1008 and the predicted image data from the motion compensator 1013 are added, and the resulting normal image data is supplied to the b input of the selector 1009. At the same time, the normal image data is stored in the frame memory 1014. The selector 1009 selects and outputs the b input according to the selection signal from the security controller 1012, and the selector 1302 selects the a input, which is the output of the enhancement layer 7001. As a result, the image decoding apparatus according to the first embodiment can reproduce a high spatial resolution image.
[0051]
(B) A case where the scramble is ON and the IP authentication result is NO will be described.
In this case, because the scramble is on, the selector 1002 selects the normal code at the b input, whose sign has been restored. The variable length decoder 1003, the inverse quantizer 1004, and the inverse DCT device 1005 receive the sign-inverted code and output the decoded image.
[0052]
Next, the motion compensator 1013 outputs a predicted image from the output of the frame memory 1014 and the output of the upsampler 1301 in accordance with the encoding mode 1. The output of the inverse DCT device 1005 is added to the predicted image data from the motion compensator 1013 and supplied to the a input of the selector 1009. The selector 1009 selects and outputs the a input based on the control data from the security controller 1012. The selector 1302 selects the a input, which is the output of the enhancement layer, based on the control data from the security controller 1012. As a result, the decoding apparatus according to the first embodiment reproduces an image having distortion due to scrambling.
[0053]
(C) A case where the scramble is off and the IP authentication result is OK will be described.
Because the scramble is off, the selector 1002 selects the a input and outputs a normal code. The variable length decoder 1006, the inverse quantizer 1007, and the inverse DCT device 1008 receive this normal code and output a normal image. At this time, since the input bit stream is not scrambled, the same normal image data is supplied to both the a and b inputs of the selector 1009. The selector 1009 therefore selects and outputs either the a input or the b input. The selector 1302 selects the a input, which is the output of the enhancement layer 7001. As a result, a high spatial resolution image can be reproduced.
[0054]
(D) A case where the scramble is off and the IP authentication result is NO will be described.
Since the IP authentication result is NO, the security controller 1012 causes the selector 1302 to select and output the b input, which is the output of the base layer 7000. As a result, the image decoding apparatus according to the first embodiment decodes and reproduces only the intra-frame encoded images and the P-VOP (inter-frame prediction) encoded images of the base layer, so a low-resolution image is reproduced.
[0055]
FIG. 4 is a diagram showing the relationship between the selection states of the selectors 1302, 1009, and 1002 set by the security controller 1012 and the reproduced image according to the first embodiment.
[0056]
Here, the three selectors 1002, 1009, and 1302 are controlled based on the scramble ON / OFF flag and the IP authentication result, and as a result, three types of reproduced image states (high resolution, low resolution, and distorted) can be produced.
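The control performed by the security controller 1012 in cases (A) to (D) can be written as a small decision function. This is a sketch reconstructed from the text of those cases, not a copy of FIG. 4; the function name `select_paths` and the use of `None` for a selector whose state the text does not specify are assumptions. In case (C) the text allows either input of the selector 1009, and "a" is chosen here arbitrarily.

```python
def select_paths(scramble_on: bool, auth_ok: bool):
    """Return (selector_1002, selector_1009, selector_1302, image_state).

    Each selector value is "a", "b", or None when the text of cases (A)-(D)
    does not state a selection for it.
    """
    if scramble_on and auth_ok:          # case (A): descrambled enhancement layer
        return ("b", "b", "a", "high resolution")
    if scramble_on and not auth_ok:      # case (B): scrambled path is displayed
        return ("b", "a", "a", "distorted")
    if not scramble_on and auth_ok:      # case (C): a and b of 1009 carry the same image
        return ("a", "a", "a", "high resolution")
    return (None, None, "b", "low resolution")   # case (D): base layer only


# Enumerate the four combinations of flag and authentication result.
states = {
    (s, a): select_paths(s, a)[3]
    for s in (True, False)
    for a in (True, False)
}
```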
[0057]
[Multi-layer spatial scalability configuration]
FIG. 5 is a diagram showing a multi-layered configuration of the encoding and decoding apparatus having the spatial scalability function in FIGS. 1 and 3 described above. Here, the number of layers is arbitrary, and is “n” in the figure.
[0058]
In the figure, 6000 and 6001 correspond to the base layer portion and the enhancement layer portion of FIG. 1, and 7000 and 7001 correspond to the base layer portion and the enhancement layer portion of FIG. 3. Reference numerals 8000 and 8001 denote downsamplers, and 8002 and 8003 denote the corresponding upsamplers. Their sampling rates can be set according to the number of layers, but the downsampling rate and the upsampling rate of each layer must correspond to each other. For example, the downsampling rate is “1/2” and the upsampling rate is “2”. Reference numeral 8004 denotes a multiplexer that multiplexes the bit streams from the multiple layers into one. Reference numeral 8005 denotes a separator that separates one bit stream into the bit streams of the individual layers. Reference numeral 8006 denotes a selector that selects the layer to be reproduced according to the authentication result of each enhancement layer.
[0059]
[Multi-layer spatial scalability operation]
In the encoding apparatus of FIG. 5, copyright information and scramble ON / OFF are designated for each enhancement layer. The decoding device then decodes and reproduces an image at a resolution determined by the scramble ON / OFF setting and the copyright authentication result. However, when a certain enhancement layer is scrambled, the layers above it must also be scrambled.
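The constraint on the per-layer scramble settings can be checked with a few lines of code. This is only an illustrative sketch: the function name `scramble_flags_valid` and the representation of the settings as a list of booleans, ordered from the lowest to the highest enhancement layer, are assumptions.

```python
def scramble_flags_valid(flags):
    """Check scramble flags ordered from the lowest to the highest enhancement layer.

    Once a layer is scrambled, every higher layer must be scrambled too.
    """
    seen_scrambled = False
    for flag in flags:
        if seen_scrambled and not flag:
            return False          # an unscrambled layer sits above a scrambled one
        seen_scrambled = seen_scrambled or flag
    return True


ok = scramble_flags_valid([False, True, True])    # valid configuration
bad = scramble_flags_valid([True, False, True])   # violates the constraint
```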
[0060]
Each function of the first embodiment may be realized by software.
[0061]
FIG. 6 is a flowchart showing an image encoding process according to the first embodiment, and FIG. 7 is a flowchart showing the decoding process.
[0062]
First, the encoding process of FIG. 6 will be described. This process is started by inputting one frame of image data. First, in step S1, the counter fr, which counts the number of frames, and the counter n are both set to “0”. These counters are not needed when the signals of the encoding modes 1 and 2 described above are provided. In step S2, the frame counter fr is incremented by one. In step S3, it is checked whether the value of the frame counter fr is “3n + 1”, that is, whether the frame number in FIG. 2 is “1”, “4”, “7”, ..., “3n + 1”. If it is, n is incremented by 1 in step S4, and the process proceeds to step S5. In step S5, intra-frame coding (I-VOP) is executed. In step S6, inter-frame predictive coding (P-VOP), which predicts from one predicted image based on the code encoded in step S5, is executed, and the process proceeds to step S10 described later. Meanwhile, the frame image data encoded in step S5 is multiplexed in step S7 and output as an output bit stream. Then, the process returns to step S2.
[0063]
On the other hand, if it is determined in step S3 that the value of the frame counter fr is not “3n + 1” and the layer is the base layer, that is, if the frame is to be inter-frame predictively encoded in the base layer, the process proceeds to step S8. In step S8, inter-frame predictive coding (P-VOP), which predicts from one predicted image, is executed, and the process proceeds to step S7. In the case of a frame to be inter-frame predictively encoded in the enhancement layer, inter-frame predictive coding (B-VOP), which predicts from two predicted images, is executed in step S9. Next, in step S10, it is checked whether the scramble ON / OFF flag is on. If it is on, the process proceeds to step S11, where the sign bit of the Huffman code is inverted. Here, of the DCT coefficient sequence (i = 0 to 63), the Huffman codes to be inverted are those of the DCT coefficients i = 3 to 63. After step S11 is executed, or if the scramble is off in step S10, the process proceeds to step S12, where the IP code obtained by encoding the IP, the scramble ON / OFF flag, the encoding mode (I-VOP, P-VOP, B-VOP), and the inter-frame predictively encoded code are multiplexed. In step S7, the result is multiplexed with the code encoded in the base layer and output. Then, the process returns to step S2.
[0064]
FIG. 7 is a flowchart showing an image decoding process according to the first embodiment.
[0065]
This process is started by inputting the code stream encoded by the encoding apparatus of FIG. 1 described above. First, in step S21, the input bit stream is distributed to the base layer and the enhancement layer. In step S22, the base layer code is subjected to variable-length decoding, inverse quantization, inverse DCT, and prediction code decoding processing by motion compensation.
[0066]
On the other hand, in the case of the enhancement layer, it is checked in step S23 whether or not the scramble ON / OFF flag is on. If it is on, the process proceeds to step S24 to check whether or not the IP authentication is OK. If OK, for example, since the user has permission to view such as copyright, the process proceeds to step S25, and the sign of the Huffman code is inverted to release the scramble. In step S26, variable length decoding, inverse quantization, inverse DCT, and P-VOP and B-VOP decoding processing by a motion compensator are executed. In step S27, the decoded and reproduced image data is output and displayed. In this case, a high resolution image can be displayed.
[0067]
If it is determined in step S24 that the IP authentication is not OK, the process proceeds to step S26, where the scrambled image is decoded and reproduced in step S27. In this case, an image having distortion due to scramble is reproduced.
[0068]
On the other hand, if the scramble flag is off in step S23, the process proceeds to step S29 to check whether the IP authentication is OK. If OK, the process proceeds to step S26, and the unscrambled image is decoded and reproduced.
[0069]
If it is determined in step S29 that the IP authentication is not OK, the process proceeds to step S22, where I-VOP decoding or P-VOP decoding is executed, and the image is reproduced (at a low resolution) in step S27. In step S28, the above-described processing is repeated until the decoding of all received images is completed.
[0070]
In the first embodiment described above, the scramble target of the Huffman code is set to i = 3 to 63 as described above, but other ranges may be used. By setting the range in this way, the scramble distortion can be adjusted.
[0071]
In addition, the present embodiment uses the 4:2:0 color signal configuration, but other color signal configurations may be used. Further, the frame mode configuration in FIG.
[0072]
In this embodiment, the downsampling is set to “1/2” and the upsampling rate is set to “2”. However, these values may be any values as long as they correspond.
[0073]
As described above, according to the first embodiment, in an image encoder and decoder having a spatial scalability function, scrambling the enhancement layer makes it possible to generate distortion on the decoder side as necessary and to protect the copyright of the moving image.
[0074]
Further, by performing scrambling on the sign bit of the Huffman code instead of the entire bit stream, the processing load on the apparatus can be reduced.
[0075]
Further, since the sign bits of the Huffman codes are inverted block by block, the distortion is confined within each block when the scrambled bit stream is decoded as it is. In the first embodiment, an image is reproduced in which quantization distortion known as block distortion or mosquito distortion occurs more strongly than in a normal image, and as a result, a viewer who cannot release the scramble can recognize only the outline of the image.
[0076]
In addition, the moving image decoding apparatus is provided with two systems of variable length decoders, inverse quantizers, and inverse DCT units for the frame memory 1014 that stores locally decoded images. Since the frame memory 1014 stores the normal decoded image from the path of the variable length decoder 1006, the inverse quantizer 1007, and the inverse DCT unit 1008, the distortion due to scrambling generated in the other path of the variable length decoder 1003, the inverse quantizer 1004, and the inverse DCT unit 1005 can be prevented from accumulating from frame to frame.
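The benefit of keeping the normal decoded image in the frame memory can be illustrated with a toy numerical model. Everything in this sketch is a simplifying assumption: a frame is reduced to a single number, each frame is predicted from the previous content of the reference memory, and scrambling is modeled by flipping the sign of the whole residual rather than of selected DCT coefficients. The function name `decode` and the sample values are hypothetical.

```python
def decode(residuals, first_frame, two_paths: bool):
    """Return the values displayed to a viewer who failed authentication.

    Each frame is predicted from the previous content of the reference memory.
    """
    reference = first_frame
    displayed = []
    for r in residuals:
        shown = reference + (-r)          # scrambled path (1003 to 1005)
        clean = reference + r             # descrambled path (1006 to 1008)
        displayed.append(shown)
        # With two paths the memory keeps the clean image; with a single
        # path the distorted image would become the next reference.
        reference = clean if two_paths else shown
    return displayed


residuals = [2, 2, 2, 2]
true_frames = [12, 14, 16, 18]            # 10 plus the running sum of residuals
err_two = [abs(d - t) for d, t in zip(decode(residuals, 10, True), true_frames)]
err_one = [abs(d - t) for d, t in zip(decode(residuals, 10, False), true_frames)]
```

In this model the error of the two-path decoder stays constant from frame to frame, while the error of a single-path decoder grows with every frame, which is the accumulation that the second decoding path avoids.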
[0077]
The first embodiment can also be realized with three or more layers.
[0078]
[Embodiment 2]
FIG. 8 is a block diagram showing the configuration of the video encoding apparatus according to Embodiment 2 of the present invention, and FIG. 11 is a block diagram showing the configuration of the corresponding video decoding apparatus. The second embodiment describes a case where the MPEG-2 encoding method is given a temporal scalability function and the enhancement layer is scrambled by inverting the sign bit of the Huffman code obtained by encoding the DCT coefficients. Refer to the ISO/IEC recommendation for details of the MPEG-2 encoding method.
[0079]
FIG. 8 is a block diagram showing the configuration of the moving picture coding apparatus according to Embodiment 2 of the present invention. In FIG. 8, the same components as those in the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted. Therefore, in the second embodiment, differences from the first embodiment will be described in detail.
[0080]
A distributor 700 assigns an input image to the enhancement layer and the base layer for each frame, and at the same time performs ordering (frame rearrangement). A motion compensator 702 receives an input image from the frame memory 100 and local decoded image data from the frame memory 205 of the base layer, performs motion vector detection for each macroblock, and outputs a predicted image. A multiplexer 701 multiplexes the bit stream of the enhancement layer and the base layer in time scalability.
[0081]
An operation based on the above configuration will be described.
[0082]
In the second embodiment, intra-frame coding is referred to as the I-Picture coding mode, inter-frame prediction coding that predicts from one predicted image as the P-Picture coding mode, and inter-frame prediction coding that predicts from two predicted images as the B-Picture coding mode.
[0083]
The input image is distributed by the distributor 700 to the base layer and the enhancement layer for each frame according to the encoding mode 1 input from the outside.
[0084]
FIG. 9 and FIG. 10 are diagrams illustrating the relationship between the frames of the base layer and the enhancement layer based on temporal scalability in the second embodiment. Here, FIG. 9 shows the order of the input images, and FIG. 10 is a diagram showing the configuration of the frames after the order of the frames is rearranged by the distributor 700. Here, the input image is assigned to each layer in the order shown in FIG.
[0085]
The first frame of the input image is first input to the base layer and I-Picture encoded, and the locally decoded image is stored in the frame memory 205. Next, the second frame is input to the base layer and P-Picture encoded with reference to the locally decoded image stored in the frame memory 205. Next, the third frame is input to the enhancement layer, and the input image is converted into macroblocks in the frame memory 100.
[0086]
The motion compensator 702 receives the output of the frame memory 100 and performs motion compensation from the reference images of the first and second frames already stored in the frame memory 205 of the base layer. That is, B-Picture encoding is performed. The next fourth frame is B-Picture encoded in the same manner as the third frame, and thereafter the operation is repeated according to the frame mode.
[0087]
Next, the video decoding device shown in FIG. 11 will be described. Among the constituent elements in FIG. 11, the same constituent elements as those in the first embodiment are given the same reference numerals, and detailed description thereof is omitted. Therefore, the second embodiment will describe points that are different from the first embodiment.
[0088]
Reference numeral 2000 denotes a distributor that distributes an input bit stream to a base layer and an enhancement layer. Reference numeral 2002 denotes a motion compensator, which receives locally decoded image data from the frame memory 1105 in the base layer and outputs a predicted image for each macroblock. Reference numeral 2001 denotes a selector that selects between the enhancement layer output (a input) and the base layer output (b input) in accordance with the encoding mode 1 from the distributor 1000, and at the same time performs reordering (frame rearrangement) to rearrange the images into display order.
[0089]
In the above configuration, the input bit stream is distributed by the distributor 2000 to the enhancement layer and the base layer. The relationship of each frame between the base layer and the enhancement layer in the present embodiment is the same as in FIG.
[0090]
First, for the first and second frames, a bit stream is input to the base layer; the first frame is decoded as an I-Picture and the second frame is decoded as a P-Picture. For the third frame, a bit stream is input to the enhancement layer, and the frame is decoded as a B-Picture, with the motion compensator 2002 referring to the two images of the first and second frames already stored in the frame memory 1105 of the base layer. For the fourth frame, a bit stream is input to the enhancement layer, and decoding similar to that of the third frame is performed. Thereafter, the operation is repeated according to the frame mode.
[0091]
The selector 2001 selects an enhancement layer (a input) and a base layer (b input) according to the encoding mode 1 from the distributor 1000, performs reordering, and outputs a decoded image.
[0092]
In FIG. 4 described above, if the selector 1302 is read as the selector 2001, the relationship between each selector and the reproduced image is the same as in FIG. However, the resolution of the reproduced image in this case refers to temporal resolution. That is, the low-resolution image is an image of the base layer only, and the high-resolution image is an image that includes the enhancement layer.
[0093]
As described above, according to the second embodiment, in an image encoder and decoder having a temporal scalability function, scrambling the enhancement layer allows the decoder side to generate distortion in each frame as necessary, so that the copyright of the moving image can be protected.
[0094]
Further, by performing scrambling on the sign bit of the Huffman code instead of the entire bit stream, the processing load on the apparatus can be reduced.
[0095]
In addition, since scrambling is performed only on the enhancement layer, when both layers are reproduced continuously and the enhancement layer images carry scramble distortion, undistorted images and distorted images are reproduced alternately. As a result, viewers who cannot unscramble can recognize only the general appearance of the image.
[0096]
In addition, as with spatial scalability, this embodiment can also be realized with three or more layers.
[0097]
Note that the processing according to the second embodiment can also be implemented in software, as in the first embodiment. The flowchart showing the processing flow in this case is basically the same as that of the first embodiment, and the description thereof is omitted.
[0098]
[Embodiment 3]
FIG. 12 is a block diagram showing a configuration of an image encoding device according to Embodiment 3 of the present invention, and FIG. 13 is a block diagram showing a configuration of the corresponding image decoding device. The third embodiment describes a case where an intra-frame coding method is given a function similar to the spatial scalability of the MPEG coding method, and scrambling is applied by inverting, for the enhancement layer, the sign bit of the Huffman code obtained by coding the DCT coefficients.
[0099]
In FIG. 12, the same components as those in the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted. Accordingly, only the differences from the first embodiment are described for the third embodiment.
[0100]
Reference numeral 3000 denotes a separator that separates the output of the variable length encoder 207 into DCT low-frequency components and high-frequency components according to coefficient position within the block. Reference numeral 3001 denotes a multiplexer that multiplexes the codes of each layer separated by the separator 3000.
[0101]
Let DCT[i] (i = 0 to 63) denote the sequence of coefficients in a DCT coefficient block in zigzag scan order. The Huffman codes output to the enhancement layer by the separator 3000 according to the third embodiment are those for i = 3 to 63.
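The split performed by the separator 3000 can be sketched as follows. This is a simplified illustration that treats a block as a list of 64 items, one per zigzag position, whereas actual Huffman codes for DCT coefficients are typically run-length based. That positions i = 0 to 2 form the base (low-frequency) layer is inferred from the enhancement range i = 3 to 63 stated above; the names `separate`, `multiplex`, and `ENHANCEMENT_START` are hypothetical.

```python
ENHANCEMENT_START = 3   # enhancement layer carries zigzag positions 3..63
BLOCK_SIZE = 64         # coefficients per 8x8 DCT block


def separate(zigzag_block):
    """Split one zigzag-ordered block into (low, high) frequency parts."""
    if len(zigzag_block) != BLOCK_SIZE:
        raise ValueError("expected 64 zigzag-ordered entries")
    low = list(zigzag_block[:ENHANCEMENT_START])    # base layer: i = 0..2
    high = list(zigzag_block[ENHANCEMENT_START:])   # enhancement: i = 3..63
    return low, high


def multiplex(low, high):
    """Recombine the two layers into one zigzag-ordered block."""
    return list(low) + list(high)


block = list(range(BLOCK_SIZE))   # stand-in values, one per zigzag position
low, high = separate(block)
restored = multiplex(low, high)
```

Because the split is purely positional, multiplexing the two parts in order restores the original block, which is the role the multiplexer plays on the decoding side.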
[0102]
In FIG. 12, an input image is divided into macroblocks in the frame memory 200 and is converted into variable-length codes via the DCT unit 201, the quantizer 202, and the variable-length encoder 207. These codes are separated by the separator 3000 according to the component within the block; the low-frequency components are output to the multiplexer 3001, and the other components are output to the DCT code inverter 108 and the selector 109.
[0103]
In the image decoding apparatus in FIG. 13, the same components as those in the first embodiment are given the same reference numerals, and detailed descriptions thereof are omitted. Accordingly, only the points that differ from the first embodiment are described for the third embodiment.
[0104]
Reference numeral 4000 denotes a separator that separates an input bit stream into an enhancement layer and a base layer. Reference numeral 4001 denotes a multiplexer that multiplexes the codes of both layers. Reference numeral 4002 denotes a selector that selects either a black image (a input) or the output of the inverse DCT unit 1103 (b input) according to a selection signal from the security controller 1012.
[0105]
The operation of the decoding device based on the above configuration will be described.
[0106]
First, the input bit stream is separated into an enhancement layer and a base layer by the separator 4000. Hereinafter, an unmodified Huffman code is referred to as a normal code, and a Huffman code whose sign bit is inverted is referred to as an inverted code. Note that, letting DCT[i] (i = 0 to 63) denote the sequence of coefficients in a DCT coefficient block in zigzag scan order, the Huffman codes output to the enhancement layer by the separator 4000 of the third embodiment are those for i = 3 to 63.
[0107]
(A) A case where the scramble is ON and the IP authentication result is OK will be described.
Since the authentication result in the security controller 1012 is OK, the selector 1002 selects the b input, which carries the normal code. The multiplexer 4001 multiplexes the low-frequency component code from the separator 4000 and the high-frequency component code from the selector 1002. The variable length decoder 1101, the inverse quantizer 1102, and the inverse DCT unit 1103 therefore receive normal codes, so a normal image can be decoded. Here, the selector 4002 selects and outputs the b input according to the selection signal from the security controller 1012. As a result, the decoding device reproduces a normal (high-resolution) image.
[0108]
(B) A case where the scramble is ON and the IP authentication result is NO will be described.
Since the authentication result in the security controller 1012 is NO, the selector 1002 selects the a input, which carries the inverted code. The high-frequency components are therefore decoded in their inverted form and output from the inverse DCT unit 1103. The selector 4002 selects and outputs the b input according to the selection signal from the security controller 1012. As a result, an image distorted by the scramble is reproduced.
[0109]
(C) A case where the scramble is OFF and the IP authentication result is OK will be described.
Since the authentication result in the security controller 1012 is OK, the selector 1002 selects the a input and outputs a normal code. All frequency components are therefore normal when output from the inverse DCT unit 1103. The selector 4002 selects and outputs the b input according to the selection signal from the security controller 1012. As a result, a normal image can be reproduced.
[0110]
(D) A case where the scramble is OFF and the IP authentication result is NO will be described.
Since the authentication result in the security controller 1012 is NO, the selector 4002 selects and outputs the black image (a input). As a result, a black image is reproduced.
[0111]
FIG. 14 is a diagram illustrating the relationship between the selection state of the selector 4002 and the selector 1002 by the security controller 1012 and the playback image according to the third embodiment.
[0112]
As described above, it is possible to control the two selectors 1002 and 4002 based on the scramble ON / OFF flag and the IP authentication result, and as a result, three types of reproduced image states can be generated.
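Cases (A) to (D) above amount to a small decision table, which can be sketched as follows. The function name `control` and the `Selection` record are hypothetical. The setting of the selector 1002 in case (D) is not stated in the description, so it is left as `None` here rather than guessed.

```python
from typing import NamedTuple, Optional


class Selection(NamedTuple):
    selector_1002: Optional[str]  # "a", "b", or None where the text does not say
    selector_4002: str            # "a" = black image, "b" = inverse DCT output
    image: str                    # resulting reproduced image state


def control(scramble_on: bool, auth_ok: bool) -> Selection:
    """Map the scramble flag and IP authentication result to selector states."""
    if scramble_on and auth_ok:
        # (A) b input of selector 1002 carries the normal code.
        return Selection("b", "b", "normal")
    if scramble_on and not auth_ok:
        # (B) a input carries the inverted code, so the image is distorted.
        return Selection("a", "b", "distorted")
    if auth_ok:
        # (C) no scramble: the a input already carries the normal code.
        return Selection("a", "b", "normal")
    # (D) no scramble and no viewing right: output the black image.
    return Selection(None, "a", "black")


states = {control(s, a).image for s in (True, False) for a in (True, False)}
```

Enumerating all four input combinations yields exactly three distinct image states (normal, distorted, black), consistent with the statement above.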
[0113]
As described above, according to the third embodiment, because an intra-frame coding method is used, the invention can be realized in coding and decoding devices for both still images and moving images.
[0114]
In an image encoder and decoder having a spatial scalability function, by performing scrambling on the enhancement layer, distortion can be generated on the decoder side as needed, and the copyright of the image can be protected.
[0115]
Further, by performing scrambling on the sign bit of the Huffman code instead of the entire bit stream, the processing load on the apparatus can be reduced.
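Why sign-bit scrambling is cheap can be seen in a minimal sketch. It assumes each Huffman code is held as a bit string whose final bit is the sign bit; this layout is an assumption made for illustration and is not specified in this section. The function name `invert_sign_bits` is hypothetical.

```python
def invert_sign_bits(codes):
    """Flip the last bit (assumed to be the sign bit) of each code string."""
    scrambled = []
    for code in codes:
        if not code or set(code) - {"0", "1"}:
            raise ValueError(f"not a non-empty bit string: {code!r}")
        flipped = "1" if code[-1] == "0" else "0"
        scrambled.append(code[:-1] + flipped)   # only one bit changes per code
    return scrambled


codes = ["110", "01001", "1111"]           # made-up example codes
scrambled = invert_sign_bits(codes)        # scramble
restored = invert_sign_bits(scrambled)     # descramble is the same operation
```

Only one bit per code is touched, every code keeps its length so the bit stream stays parseable by an ordinary decoder, and applying the same inversion again restores the original codes.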
[0116]
Further, since the Huffman code is inverted block by block, when the scrambled bit stream is decoded as it is, the distortion is confined within each block. In this embodiment, an image is reproduced in which quantization distortion known as block distortion or mosquito distortion appears more strongly than in a normal image, and as a result, a viewer who cannot unscramble can recognize no more than the general appearance of the image.
[0117]
Note that the present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).
[0118]
The object of the present invention is also achieved by supplying a storage medium (or recording medium) on which the program code of software that realizes the functions of the above-described embodiments is recorded to a system or apparatus, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. The functions of the above-described embodiments are realized not only when the computer executes the program code it has read; the invention also includes a case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0119]
Furthermore, the invention includes a case where, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or a function expansion unit connected to the computer, a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0120]
As described above, according to the present embodiment, in an image encoding device and decoding device having a scalability function, an IP encoding code and additional information for protecting copyright are provided in the bit stream, so that the copyright of the content provider can be protected for a desired part of the image.
[0121]
Further, according to the present embodiment, since only a limited portion of the codes in the bit stream is scrambled, the processing load is lower than when the entire encoded code is scrambled. Further, since the distortion is confined within each block, it is possible to create a state in which a viewer who cannot unscramble can recognize only the general appearance of the image.
[0122]
[Effect of the invention]
As described above, according to the present invention, it is possible to scramble, at the time of image encoding, a part of the bit stream that is to be protected by copyright, and to encode an image without lowering the encoding efficiency.
[0123]
In addition, according to the present invention, an image can be encoded so that it is reproduced normally in the image decoding device of a viewer who has the viewing right, and is reproduced so that only an approximate outline of the image can be recognized in the image decoding device of a viewer who does not have the viewing right.
[0124]
According to the present invention, the image encoded by the image encoding device is decoded, and three types of images (high resolution, low resolution, and distorted) can be reproduced according to the combination of the authentication result of whether or not the user has the right to view and the presence or absence of scrambling.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a main configuration of an image coding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a relationship of each VOP between a base layer and an enhancement layer based on spatial scalability according to the first embodiment.
FIG. 3 is a block diagram showing a main configuration of an image decoding apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a diagram for explaining the relationship between the input / output of the security controller and the reproduced image in the first embodiment of the present invention.
FIG. 5 is a diagram showing a multi-layer spatial scalability configuration according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart illustrating an image encoding process according to Embodiment 1 of the present invention.
FIG. 7 is a flowchart showing an image decoding process according to the first embodiment of the present invention.
FIG. 8 is a block diagram showing the main configuration of an image coding apparatus according to Embodiment 2 of the present invention.
FIG. 9 is a diagram illustrating the relationship of frames in display order between a base layer and an enhancement layer based on temporal scalability according to the second embodiment.
FIG. 10 is a diagram illustrating the relationship of frames in encoding order between a base layer and an enhancement layer based on temporal scalability according to the second embodiment.
FIG. 11 is a block diagram showing the main configuration of an image decoding apparatus according to Embodiment 2 of the present invention.
FIG. 12 is a block diagram showing a configuration of a main part of an image coding apparatus according to Embodiment 3 of the present invention.
FIG. 13 is a block diagram showing the main configuration of an image decoding apparatus according to the third embodiment.
FIG. 14 is a diagram illustrating a relationship between input / output of a security controller and a reproduced image according to the third embodiment.

Claims

An image decoding apparatus comprising:
distribution means for inputting an encoded bitstream and distributing the bitstream into a protection code for protecting intellectual property, one or more enhancement layers, and a base layer having a lower resolution than the enhancement layer;
authentication data input means for inputting authentication data from the outside;
authentication means for checking consistency between the authentication data and the protection code;
descrambling means for descrambling the one or more enhancement layers;
enhancement layer decoding means for decoding the enhancement layer distributed by the distribution means or the enhancement layer descrambled by the descrambling means;
base layer decoding means for decoding the base layer distributed by the distribution means;
selection means that, when the authentication result by the authentication means is a match and the enhancement layer is scrambled, selects an image obtained by decoding the enhancement layer descrambled by the descrambling means,
when the authentication result by the authentication means is a mismatch and the enhancement layer is scrambled, or when the authentication result by the authentication means is a match and the enhancement layer is not scrambled, selects an image obtained by decoding the enhancement layer,
and, when the authentication result by the authentication means is a mismatch and the enhancement layer is not scrambled, selects an image obtained by decoding the base layer; and
image output means for outputting the image selected by the selection means.

The image decoding apparatus according to claim 1, wherein the image output means outputs an image frame assigned to a reproduction order for each layer.

The image decoding apparatus according to claim 1, wherein the descrambling means replaces the bitstream with another code having the same code length.

The image decoding apparatus according to any one of claims 1 to 3, wherein the base layer decoding means or the enhancement layer decoding means comprises:
variable-length decoding means for variable-length decoding the encoded code for each block;
inverse quantization means for inversely quantizing the decoding result of the variable-length decoding means; and
inverse orthogonal transform means for performing an inverse orthogonal transform on the inverse quantization result obtained by the inverse quantization means.

An image decoding method comprising:
a distribution step of inputting an encoded bitstream and distributing the bitstream into a protection code for protecting intellectual property, one or more enhancement layers, and a base layer having a lower resolution than the enhancement layer;
an authentication data input step of inputting authentication data from the outside;
an authentication step of checking consistency between the authentication data and the protection code;
a descrambling step of descrambling the one or more enhancement layers;
an enhancement layer decoding step of decoding the enhancement layer distributed in the distribution step or the enhancement layer descrambled in the descrambling step;
a base layer decoding step of decoding the base layer distributed in the distribution step;
a selection step of, when the authentication result of the authentication step is a match and the enhancement layer is scrambled, selecting an image obtained by decoding the enhancement layer descrambled in the descrambling step,
when the authentication result of the authentication step is a mismatch and the enhancement layer is scrambled, or when the authentication result of the authentication step is a match and the enhancement layer is not scrambled, selecting an image obtained by decoding the enhancement layer,
and, when the authentication result of the authentication step is a mismatch and the enhancement layer is not scrambled, selecting an image obtained by decoding the base layer; and
an image output step of outputting the image selected in the selection step.

A computer-readable storage medium storing a program for causing a computer to execute the method according to claim 5.