JP4349542B2

JP4349542B2 - Device for detecting telop area in moving image

Info

Publication number: JP4349542B2
Application number: JP2000248794A
Authority: JP
Inventors: 晴久加藤; 康之中島
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2000-08-18
Filing date: 2000-08-18
Publication date: 2009-10-21
Anticipated expiration: 2020-08-18
Also published as: JP2002064748A

Description

【０００１】
【発明の属する技術分野】
この発明は動画像内のテロップ領域検出装置に関し、特に、圧縮符号化データそのものまたはその一部だけを復号した情報から、高速かつ高精度にテロップ領域を抽出できる動画像内のテロップ領域検出装置に関する。
【０００２】
【従来の技術】
従来のテロップ領域を検出する方式（以下、第１の検出方式）、特にニュース映像からのテロップ検出方式として、時間的な輝度分布差分値をテロップの検出判定に使用する方式がある。また、出現位置の局所性、規則的な配置などの幾何的な性質からテロップを求める方式、エッジ分布の偏向や色の類似性など文字としての特徴量からテロップを求める方式が報告されている。さらに、これら特徴量を解析する方法として、ニューラルネットワークや遺伝アルゴリズムを取り入れたテロップ領域検出方法などが提案されている。これらの方式は、動画像における各画素の輝度値を対象に様々なアルゴリズムを適用してテロップ領域を抽出する方式である。
【０００３】
また、他のテロップ検出方式では画素そのものを利用するのではなく、圧縮符号化された動画像の符号化データそのものを利用する方式が提案されている。この方式は圧縮の際に求められる各種のパラメータや符号化データを直接操作することでテロップ領域の検出処理を達成する。
【０００４】
従来の動き予測誤差の時間的変化に注目した方式（以下、第２の検出方式）は、画像内の符号単位ブロックについて動き予測情報の時間変動を観測し、テロップの瞬時的な出現を検出する。必要最小限の情報のみを取捨選択しテロップ領域の抽出を行う。このため、抽出される解像度は動き予測情報を持つ各圧縮方式の符号化単位ブロックの大きさに設定される。
【０００５】
さらに、符号単位ブロックの符号化モードに着目した方式（以下、第３の検出方式）は、静止したテロップの特徴と符号化モードの相関から、符号化モードに応じて計数カウンタを増減させる。閾値以上の領域をテロップ領域候補と認識し、更に形状判断することで、テロップとして抽出する。なお、該第３の検出方式を開示した文献として、例えば、電子情報通信学会論文誌 D-II, Vol.J81-D-IINo.8,PP1847-1855,1998 年8 月「ＭＰＥＧ符号化映像からの高速テロップ領域検出法」がある。
【０００６】
【発明が解決しようとする課題】
前記第１の検出方式を圧縮符号化された動画像データに適用するには、符号化データを一旦復号する必要があり、圧縮符号化データを画素領域の情報に戻さねばテロップ検出処理を適用できない。よって、第１の検出方式は、例えば図１０のブロック図で示される構成により実現される。
【０００７】
図１０の可変長復号部５１は圧縮された画像符号化データを入力とし、可変長復号、逆量子化や係数範囲制限等の復号処理を行う。可変長復号部５１からは復号されたフレームやブロックの符号化モード情報、動き予測情報、動き予測誤差情報等が出力される。画像生成部５２の入力は可変長復号部５１からの予測誤差情報であり、逆変換を経て１フレームの画像データを生成する。また、動き補償部５２は動き予測の参照フレームを保持している画像メモリ５３から動き予測情報を使ってブロックを抽出し、完全な１フレームを構成するために画像生成部５２の出力画像データと加算する。この画像はテロップ検出部５４へ入力されるとともに、次フレーム以降の参照画像となるフレームならば画像メモリ５３に蓄積される。これら一連の復号処理の後、テロップ検出部５４にて、初めて画素領域でのテロップ領域検出処理が施される。テロップ検出部５４による検出結果は、画像表示部５５に送られる。
【０００８】
この第１の検出方式では、圧縮データの復号処理および復号された画像を用いたテロップ領域検出処理に、大きな計算コストがかかるという問題がある。
【０００９】
一方、前記第２、第３の検出方式は圧縮符号化データそのものを利用するので、復号処理過程が省略でき検出処理も高速に実行できる。しかし、実際の動画像では、パン、チルトなどのカメラワークや、ワイプ、ディゾルブなどの撮影後に編集された映像効果などの要因によって、動き予測誤差情報の変化が激しくなり、テロップの出現との判別が難しい。特にシーンチェンジにおいてはこの影響が大きく、第２の検出方式ではシーンチェンジ後のフレームをテロップ領域と誤認識するなど、検出精度が劣るという問題がある。
【００１０】
また、第３の検出方式はテロップ領域外に動きベクトルが多数存在する場合に検出率が高くなる特性を持つ。しかし、低解像度の映像では動きベクトルが相対的に小さくなり、符号化モードの分布も異なってくる。また、ニュース映像等はカメラが固定されている場合が多く、背景に動きベクトルが存在しないような場合、テロップ以外の領域をテロップ領域と誤検出する恐れがある。
【００１１】
本発明の目的は、前述した従来技術の問題点を解消し、圧縮された符号化データそのもの、またはその一部だけを復号した情報からテロップの出現を高速かつ高精度に検出できるテロップ領域検出装置、およびフレーム内でのテロップ位置を抽出できるテロップ領域検出装置を提供することにある。
【００１２】
前述の目的を達成するために、本発明は、圧縮された動画像のデータを入力とし、該動画像のデータにテロップ領域情報を付加して出力する動画像内のテロップ領域検出装置において、前記圧縮された動画像のデータを可変長復号する可変長復号部と、該可変長復号部で復号された現在と一つ前のフレーム内符号化画像とを比較して変化が認められた領域について該変化が収束するか否かを検知し、収束すると検知された場合にテロップ候補の位置情報を出力する時間変移判定部と、前記フレーム内符号化画像間に存在する前記可変長復号部で復号されたフレーム間符号化画像において、前記テロップ候補の位置情報に該当するブロックの符号化モードの種類からテロップに相応しい符号化モードをもつブロックを前記フレーム内符号化画像上から抜き出し、そのブロックの位置情報を出力するテロップ位置判定部と、前記フレーム間符号化画像の前記テロップ候補のブロックに関して、前記可変長復号部から出力される双方向予測画像群の動き予測情報の参照方向の時間的変化からテロップの出現フレームを検出する出現フレーム判定部とを具備した点に特徴がある。
【００１３】
この特徴によれば、テロップ領域の検出過程を段階的にしたので、高速でかつ高精度なテロップ領域検出処理を行うことができるようになる。
【００１４】
【発明の実施の形態】
以下に、本発明を、図面を参照して説明する。図１に本発明の一実施形態の構成を示すブロック図を表す。なお、この実施形態は入力動画像の符号化方式に国際標準であるMPEG-1ビデオ(ISO/IEC11172-2)を使用しているが、本発明はこれに限定されるものではない。
【００１５】
システム全体の入力として、圧縮符号化された動画像の符号化データが与えられる。符号化データは可変長復号部１により必要な情報だけが部分的に復号され、該復号された情報Ａ，Ｂ，Ｃは、それぞれ時間変移判定部２、テロップ位置判定部３、および出現フレーム判定部４に送られる。ここに、前記情報Ａは、（ｎ−Ｎ）フレーム〜（ｎ＋ｍＮ）フレーム間のフレーム内符号化画像（Ｉピクチャ）の符号化情報、Ｂは、ｎフレーム〜（ｎ＋ｍＮ）フレーム間のＰ、Ｂピクチャの符号化モード情報、Ｃは、（ｎ−Ｎ）フレーム〜ｎフレーム間のＢピクチャの動き予測情報である。ここで、定数ＮはＧＯＰ内のピクチャ数を表し、フレーム内符号化画像の出現間隔を意味するパラメータである。また、ｎ，ｍは任意の正の整数である。
【００１６】
時間変移判定部２は複数のフレーム内符号化画像の符号化情報Ａをもとに時間的推移の状態を検討する。そして、該検討の上で、テロップ領域候補となる領域をＩピクチャ上で設定し、この領域の位置情報Ｄをテロップ位置判定部３へ出力する。
【００１７】
テロップ位置判定部３では、時間変移判定部２からの位置情報Ｄをもとにテロップ判定対象のブロックを決定する。同時に、可変長復号部１からは対象ブロックの符号化モード情報Ｂを入力し、判定対象ブロック毎に符号化モードの選定状況を把握する。これらの結果を基に、テロップに相応しい符号化モードを持つブロックをＩピクチャ上から抜き出し、そのブロックの位置情報Ｅを出現フレーム判定部４へ送る。
【００１８】
出現フレーム判定部４ではテロップ位置判定部３から入力されたブロックの位置情報Ｅをもとに検出対象ブロックを決定し、同時に可変長復号部１から動き予測情報Ｃを受け取る。動き予測情報Ｃをもとに、どのフレームからテロップが出現したかのフレーム判定処理を行う。
【００１９】
この判定結果Ｆは、検出結果表示部または記録部（図示されていない）へ出力される。検出結果表示部または記録部は検出結果を出力し、要求があれば抽出領域の映像を部分的に復号し提示する。
【００２０】
次に、前記時間変移判定部２の機能を、図２のフローチャートを参照して、詳細に説明する。時間変移判定部２は４つの処理を行う。この４つの処理は、ステップＳ１のライン変動判定処理と、ステップＳ２のブロック変動判定処理と、ステップＳ３の収束判定処理と、ステップＳ４の形状整形処理である。
【００２１】
前記ライン変動判定処理（ステップＳ１）は、入力してくる２つのフレーム内(Intra) 符号化画像Ｉn-N 、Ｉn の符号化データに対して処理を行う。ここで、定数ＮはＧＯＰ内のピクチャ数を表し、フレーム内符号化画像の出現間隔を意味するパラメータである。ライン変動判定処理は、時間経過によって変化が生じた領域をブロックの１ライン単位で抽出する。ステップＳ１の処理で変動領域と判定された領域の位置情報は、ステップＳ２のブロック変動判定処理の入力となる。該ブロック変動判定処理ではブロック単位で変動領域を判定し、該抽出ブロックはステップＳ３の収束判定処理に送られる。
【００２２】
該収束判定処理は、指定された領域に対して、Ｉn 以降のフレームＩn+mN（０＜ｍ＜ｑ）全てにおいて変化が収束する領域を抽出し、該領域をテロップ領域候補とする。ここで、定数ｑは収束判定対象となるフレーム数を表し、１以上の値を取る。ステップＳ４の形状整形処理はテロップ領域候補を受け取り、テロップの大きさを考慮して孤立した小領域を排除する。次に膨張収縮処理にてテロップ領域候補の欠損部を補い、次以降の判定処理のためテロップ領域候補を整形する。以上で、前記時間変移判定部２の処理は終了する。
【００２３】
該時間変移判定部２は、現在のフレーム内符号化画像と一つ前のフレーム内符号化画像との比較によって変化が認められた領域に着目して、その領域のみ未来のフレーム内符号化画像において変化が収束するか否かを検討する。具体的な時間変化の発生判定には、ＤＣＴ係数のＤＣ成分とＡＣ成分を個別に判定基準として用いる。
【００２４】
以下に、前記ステップＳ１〜Ｓ３の処理を、図３〜図５を参照して、より詳細に説明する。
【００２５】
まず、図３を参照して、前記図２のステップＳ１のライン変動判定処理を説明する。図３(a) は、ＤＣ成分によるライン変動判定処理の詳細を示すフローチャートである。ライン変動判定処理には、フレームＩn と１ＧＯＰ前のフレームＩn-N のＤＣＴ係数ＤＣ成分情報が入力される。初めにテロップは画面に対して水平または垂直に現れると仮定して、ＤＣ成分はステップＳ１０の処理にて縦横それぞれ１ライン単位で読み込まれる。例えば、図３(b) に示されているように、フレームＩn と１ＧＯＰ前のフレームＩn-N の各ブロックのＤＣＴ係数のＤＣ成分が縦横１ライン単位で読み込まれる。
【００２６】
ステップＳ１１では、該ＤＣ成分の大まかな変化を捉えるため、粗く量子化した輝度ヒストグラムを生成する。ステップＳ１２では、過去のフレームＩn-N において同位置ラインのヒストグラムとの差分絶対値和を求める。ステップＳ１３では、閾値による判定を行い、閾値以上の差分値を持つラインはステップＳ１４で１ライン全体をテロップ領域候補とする。そうでなければ、ステップＳ１５で１ライン全体を非テロップ領域とする。
【００２７】
ステップＳ１６ではライン毎の処理が全ブロックについて全て終了したか否かを判断し、終了していなければステップＳ１０に戻り、次のラインについてステップＳ１０〜Ｓ１５の一連の処理を繰り返す。ステップＳ１６の判断が肯定の場合には、ステップＳ１７に移る。
【００２８】
ステップＳ１７はテロップ領域候補となったラインの本数を計数し、テロップ領域候補がフレームの大部分を占める場合はテロップ以外の原因による輝度変化として、全ブロックを非テロップ領域とした上で現フレームの検出処理を終了する。そうでなければ、抽出ブロックの位置情報を出力し、ＤＣ成分による時間変動判定処理を終了する。
【００２９】
次に、前記ステップＳ２のブロック変動判定処理を、図４を参照して説明する。図４(a) は、ＡＣ成分によるブロック変動判定処理のフローチャートを表す。該変動判定処理には、フレームＩn とフレームＩn-N のＤＣＴ係数ＡＣ成分情報が入力され、ブロック単位で処理する。文字と背景が織り成すエッジ領域はＤＣＴ係数ＡＣ成分の多寡に対応するので、ブロックの部分和の変化が空間的、時間的ともに閾値を超えるブロックをテロップ領域とする。
【００３０】
ステップＳ１９は、対象ブロックが前記ステップＳ１のライン変動判定処理でテロップ領域候補と判定された領域であるか調べる。テロップ領域候補であれば処理を続行し、そうでなければ非テロップとした上で該ブロックに対する判定を終了する。ステップＳ２０は、テロップ領域候補に対してＡＣ成分の絶対値部分和を計算する。
【００３１】
図４(b) は変動判定に利用する係数範囲についての一例を表す。ここでは、ＤＣＴ係数ＡＣ成分について、ジグザグスキャンオーダーでＡＣ低周波成分９個の絶対値部分和による判定を使用している。
【００３２】
ステップＳ２１では、過去のフレーム（Ｉn-N フレーム）における同位置ブロックの、ＡＣ低周波成分９個の絶対値部分和との差分を計算する。ステップＳ２２では、閾値による判定を行い、閾値以上の差分値を持つブロックはステップＳ２３でテロップ領域候補とする。そうでなければ、ステップＳ２４で非テロップ領域候補とする。ステップＳ２５は全てのブロックに対して処理が完了したかを判断し、完了していなければ、ステップＳ１９に戻って、ステップＳ１９〜Ｓ２４の一連の処理を繰り返す。ステップＳ２５で、全ブロックの処理が完了したと判定されると、ＡＣ成分による変動判定処理は終了する。
【００３３】
次に、前記ステップＳ３の収束判定処理の詳細を、図５を参照して説明する。図５(a) は、該収束判定処理のフローチャートを表す。変動領域はテロップの出現である可能性が高いが、移動する物体やカメラワークによる変動である可能性も否めない。テロップは出現過渡期においては時間的な輝度変動が激しいが、定常状態では逆に輝度変化がほとんど生じない。よって、該収束判定処理には、前記ステップＳ１、Ｓ２で抽出したテロップ領域候補のうち、テロップ出現以外の要因による変動領域を除去するため、テロップの位置に対する定常性を利用する。
【００３４】
具体的には、画面全体に対しＤＣ成分の時間的変化を判断基準にシーンチェンジ等が無いことを確認した上で、テロップ領域候補に対し静止したテロップのエッジの方向と位置が同一であることを利用する。エッジの一致性には、ＡＣ成分の部分和を利用したクラス分類を用いる。例えば、図４(b) に示した９つのＡＣ成分を更に縦（垂直）、横（水平）、対角要素の３つの部分に分割したとき、図５(b) で示すように４つのブロックから、合計で１２個のクラスを形成することができる。
【００３５】
ステップＳ３の収束判定処理には、フレームＩn からフレームＩn+qNまでのＤＣＴ係数ＡＣ成分情報が入力される。ステップＳ２６では、対象ブロックがテロップ領域候補であるか否かを調べる。テロップ領域候補であれば、処理を続行し、そうでなければ、非テロップ領域とした上で該ブロックの収束判定処理を終了する。ステップＳ２７は、フレームＩn における対象ブロックのエッジクラスを決定する。例えば、図５(b) の前記１２個のクラスから、最大部分和をもつエッジクラスを求める。ステップＳ２８では、フレームＩn+mN（０＜ｍ＜ｑ）における同位置ブロックの同エッジクラスを決定する。一般にフレーム内符号化画像は１２〜１５フレーム間隔で配置されることが多いため、３０ｆｐｓならば、およそ０．５秒間隔で配置されていることになる。２秒以上テロップが提示されていると仮定すれば、ｑの値は４程度まで設定できる。
【００３６】
ステップＳ２９では、前記ステップＳ２７とＳ２８で求められたエッジクラスの部分和が一致するかを判定する。一致する場合は、ステップＳ３０で該ブロックをテロップ領域候補とする。そうでなければ、ステップＳ３１で非テロップ領域候補とする。ステップＳ３２は全てのブロックに対して処理が完了したかを判断し、完了していなければ、ステップＳ２６に戻り、該ステップＳ２６〜Ｓ３１の一連の処理を繰り返す。全ブロックの処理が完了していれば、クラス分類による収束判定処理を終了する。
【００３７】
以上の時間変移判定処理により、テロップ領域候補となる領域がＩピクチャ上で設定されたことになる。
【００３８】
次に、図６を参照して、前記符号化モード情報によるテロップ位置判定部３（図１参照）の機能を説明する。テロップ位置判定部３には、時間変移判定部２にて抽出されたテロップ領域候補情報が入力される。同時に可変長復号部１からはフレームＩn とフレームＩn+mNの間に存在するフレーム（Ｐ，Ｂピクチャ）の符号化モード情報が入力される。複数のフレームに渡って、同位置に存在するブロック群を１単位として処理する。符号化モード情報によるテロップの検証は、前述の検出処理で抽出されたテロップ領域候補に限定して行う。
【００３９】
ステップＳ３３は対象ブロックが時間変移判定部２でテロップ領域候補と判定されているか否かを判断する。テロップ領域候補であれば処理を続行し、そうでなければ非テロップとした上で該ブロックに対する判定を終了する。ステップＳ３４は符号化モード情報と動き予測情報の参照するフレーム間距離から計数カウンタを生成する。ステップＳ３５はステップＳ３４で計数されたカウンタに対して、閾値による判定を行う。閾値以上を持つカウンタを形成したブロック群はステップＳ３６にてテロップ領域候補とする。そうでなければ、ステップＳ３７で非テロップ領域候補とする。ステップＳ３８は全てのブロックに対して処理が完了したかを判断し、完了していなければ、ステップＳ３３に戻って、ステップＳ３３〜Ｓ３７の一連の処理を繰り返す。全ブロックの処理が完了していれば、ステップＳ３９の処理に移る。ステップＳ３９はテロップ領域候補に対して形状整形処理を行う。処理内容は形状整形処理部（前記ステップＳ４）と同一である。以上で符号化モード情報による判定処理を終了する。
【００４０】
入力情報の一つである符号化モード情報には、フレーム符号化情報とブロック符号化情報を用いる。フレーム符号化情報には次の３種類が存在する。
(1) フレーム内符号化画像（Ｉピクチャ）
(2) 順方向予測画像（Ｐピクチャ）
(3) 双方向予測画像（Ｂピクチャ）
【００４１】
ブロック符号化モードには、フレーム内符号化ブロック（Intra ）とフレーム間符号化ブロック（Inter ）がある。さらに、フレーム間符号化ブロックには動き補償と符号化の有無から次に示す４種類が存在する。
(1) 動き補償符号化ブロック（MC coded）
(2) フレーム差分符号化ブロック（no MC coded ）
(3) 動き補償ブロック（MC no coded ）
(4) スキップト・ブロック（Skip）
ただし、動き予測には順方向、逆方向、両方向の３種類が存在する。
【００４２】
このとき、静止したテロップの特徴と、対応するブロックの符号化モードとの間に高い相関が存在する。例えば、静止したテロップには動き予測情報が存在しないか、若しくはその大きさが０に近い。また、テロップを構成する文字列の特徴として境界部のエッジを挙げているように、複雑なテクスチャが存在するため動き予測誤差情報が省略されることは少ない。
【００４３】
よって、上記の特徴を備えるno MC coded 符号化モードは、テロップである可能性が最も高い。しかし、実映像の符号化過程を考慮するとき、符号化器の精度が向上するほど、静止領域には動き予測情報が与えられない。つまりテロップの有無に関わらず、動き予測情報を持たないモードが選択されることが多くなる。また、動き予測の参照フレームが近いほど移動している領域でも見かけの動きが小さいため、動き予測情報が割り当てられないことがある。
【００４４】
この問題を解決する方法として、符号化モードによる判定を行う際、時間的距離の概念を導入し、符号化モードの信頼性情報として利用する。参照フレームが近い場合は動き予測情報が存在しなくても、それがテロップである可能性を保証するものではないので符号化モード情報の信頼度は低く設定する。逆に動き予測情報が存在するならば、明確な移動物体が存在するものとして非テロップ領域候補としての信頼度は高くする。一方、参照フレームが遠く離れている場合は逆の設定を用意する。すなわち、動き予測情報が存在しない場合は完全に静止した領域と判断できるので、テロップ領域候補としての符号化モード情報の信頼度を高く設定する。また、動き予測情報が存在しても、非テロップ領域候補としての符号化モード情報の信頼度は低く設定する。
【００４５】
この時間的距離を信頼性情報とした符号化モード情報による計数法（前記ステップＳ３４）の一例を、図７のフローチャートを参照して説明する。入力される情報は同位置のブロック群の符号化モード情報である。
【００４６】
ステップＳ４０では、動き予測情報が参照するフレームまでの距離を算出する。ただし、予測方式が両方向予測のときは距離の近い方を採用する。ステップＳ４１はブロックの符号化モード情報を用いて判定を下す。ブロックの符号化タイプが動き予測情報を持つMC coded、MC no coded ならば、ステップＳ４２にてステップＳ４０で求めた参照フレームまでの時間的距離に反比例した数を減算する。一方、動き予測情報を持たないIntra 、動き予測情報の大きさが０であるno MC coded 、またはSkipならば、ステップＳ４３で時間的距離に比例した数を加算する。ステップＳ４４は同位置に存在するブロックに対して処理がすべて終了したかを判断し、終了していなければ次のブロックについてステップＳ４０から一連の操作を繰り返す。そうでなければカウンタの計数を終了する。
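The weighted counting of paragraph 【００４６】 (steps S40 to S44) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `gain` factor, the string labels for the coding modes, and the input layout (one `(mode, distances)` pair per co-located block) are assumptions made for the example. The text only states that the subtracted amount is inversely proportional, and the added amount proportional, to the temporal distance.

```python
# Modes that carry motion prediction information (counted against a telop).
MOTION_MODES = {"MC coded", "MC no coded"}
# Modes without motion information or with zero motion (counted for a telop).
STATIC_MODES = {"Intra", "no MC coded", "Skip"}


def weighted_mode_count(observations, gain=1.0):
    """Accumulate the counter for one group of co-located blocks.

    observations: list of (mode, distances); distances holds the distance in
    frames to each referenced frame (two entries for bidirectional prediction).
    gain: hypothetical proportionality constant, not given in the text.
    """
    counter = 0.0
    for mode, distances in observations:
        # S40: for bidirectional prediction, use the nearer reference frame.
        distance = min(distances)
        if mode in MOTION_MODES:
            # S42: subtract an amount inversely proportional to the distance.
            counter -= gain / distance
        elif mode in STATIC_MODES:
            # S43: add an amount proportional to the distance.
            counter += gain * distance
        else:
            raise ValueError(f"unknown coding mode: {mode}")
    return counter
```

A block group whose counter reaches the threshold of step S35 would remain a telop area candidate; the threshold itself is not specified in the text.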
【００４７】
以上のテロップ位置判定処理により、テロップに相応しい符号化モードを持つブロックをＩピクチャ上から抜き出すことができる。
【００４８】
次に、前記出現フレーム判定部４の動作を説明する。動き予測情報によるテロップの検証は、前述の検出処理で抽出されたテロップ領域候補に限定して行う。ここでは、ブロックの符号化モードと動きベクトルの時間的参照方向を利用する。テロップが出現するとき、ＩまたはＰピクチャに区切られた連続するＢピクチャ（以下、これをＢピクチャ群という）のテロップ領域には次に挙げる性質が現れる。
(1) Ｂピクチャ群に両方向予測が存在しない。
(2) 出現フレームがＩ、又はＰピクチャのとき、Ｂピクチャ群に順方向動きベクトルのみ存在する。
(3) 出現フレームがＢピクチャのとき、Ｂピクチャ群に逆方向動きベクトルも存在する。
(4) 出現フレームがＢピクチャのとき、Ｂピクチャ群に順逆方向の切り替わりは一度だけ存在する。
(5) テロップ出現後は動きベクトルを持たない。
【００４９】
上記の性質の理解を容易にするために、図９を示す。同図(a) 、(b) から、Ｂピクチャ、Ｉピクチャ、またはＰピクチャにテロップが出現する時には、前記(1) のようにＢピクチャ群に両方向予測が存在しないことは明らかである。また、テロップの出現フレームがＩ、又はＰピクチャのときには、同図(b) から、前記(2) のようにＢピクチャ群に順方向動きベクトルのみが存在することは明らかである。また、テロップ出現フレームがＢピクチャのときには、同図(a) から、Ｂピクチャ群は前記(3) 、(4) の性質を有することは明らかである。なお、同図(c) に示されているように、Ｂピクチャ群に両方向予測が存在する場合には、Ｉ，ＰおよびＢピクチャ群のいずれにもテロップは出現しない。
【００５０】
したがって、Ｂピクチャ群の動き予測情報からテロップの出現フレームを上記の条件(1) 〜(5) を満たすフレームに絞り込む。なお、図９におけるＢピクチャ群の両側が、共にＰピクチャであることもありうる。
【００５１】
具体的には、時間変移判定部２で複数のフレーム内符号化画像にテロップの出現が検知されたとき、出現フレーム判定部４はＧＯＰ内部のＢピクチャ群について上記の特性（性質）を検証する。初めに、Ｂピクチャ群毎に、個々のブロックに対して動き予測情報の時間的参照方向を調べる。Ｂピクチャ群が順方向予測のみで構成される場合（前記性質２）は、直後のＩまたはＰピクチャにテロップが出現したものと判断する。Ｂピクチャ群が逆方向を含む場合、又は順方向から逆方向への変化が一度だけ存在する場合（前記性質３、４）は、逆方向が始まったＢピクチャにテロップが出現したと判断する。これら以外の場合は、この連続するＢピクチャ群にはテロップの出現はないと判断し、次のＢピクチャ群について判定処理を続ける。これにより、テロップの出現フレームはフレーム単位で検出することが可能となる。
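The per-block decision of paragraph 【００５１】 can be sketched as follows. This is a hedged illustration under assumptions: the reference direction of each B picture in a B picture group is given as one of the hypothetical labels `"forward"`, `"backward"`, `"bidirectional"`, or `"none"`, and the return convention (an index into the group, or the group length for the following I or P picture) is invented for the example.

```python
def appearance_in_group(directions):
    """Locate the telop appearance for one block within one B picture group.

    directions: reference direction of the block in each consecutive B picture.
    Returns the index of the B picture where the telop appears,
    len(directions) if it appears in the following I or P picture,
    or None if this group shows no appearance.
    """
    # Property (1): bidirectional prediction (or anything else) rules it out.
    if not directions or any(d not in ("forward", "backward") for d in directions):
        return None
    # Property (2): forward only, so the telop appears in the next I/P picture.
    if "backward" not in directions:
        return len(directions)
    # Properties (3) and (4): at most one switch, from forward to backward.
    first_backward = directions.index("backward")
    if all(d == "backward" for d in directions[first_backward:]):
        return first_backward
    return None
```

The check on motion vector length described in paragraph 【００５２】 is left out here and would be applied before this function is called.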
【００５２】
ただし、静止したテロップを仮定しているため、動き予測情報自体の長さはほぼ０であるブロックに限定する。動き予測情報が有意な長さを持つブロックは参照方向の如何に関わらず、テロップ領域候補から外す。同様に、テロップ出現判定後のＧＯＰに対しても、同位置のブロック毎に、動き予測情報の長さを検証する。長さが十分０に近くなければ、そのブロックはテロップ領域候補から除外する。
【００５３】
図８に動き予測情報による出現フレーム判定部４の動作のフローチャートを示す。テロップ位置判定部３からはテロップ領域候補の位置情報が入力される。同時に可変長復号部１からはフレームＩn-N とフレームＩn 間にあるフレームの動き予測情報が入力される。判定はＢピクチャ群の個々のブロックを対象とする。
【００５４】
ステップＳ４５では、対象ブロックがテロップ位置判定部３でフレームＩn においてテロップ領域候補と判定されているか否かを判断する。テロップ領域候補であれば処理を続行し、そうでなければ該ブロック群に対する判定を終了する。ステップＳ４６は前述したテロップ出現に伴うＢピクチャ群の特性を検証する（前記性質(1) 〜(5) 、および図９参照）。ステップＳ４７ではステップＳ４６の出力が上記の性質を満たしているものであるか否かを判断し、満たしているならば、ステップＳ４８にて逆方向ベクトルが出現したＢピクチャフレーム，あるいはＩまたはＰピクチャフレームをテロップの出現フレームとして出力する。そうでなければ、ステップＳ４９にて該ブロックを非テロップとし、該Ｂピクチャ群にはテロップが出現していないと判断する。ステップＳ５０はＢピクチャ群の全てのブロックに対して判定処理が完了したかを判断する。終了していなければ、ステップＳ４５に戻って、次のブロックに対して前記ステップＳ４５〜Ｓ４９の一連の処理を繰り返す。そうでなければ判定処理を続ける。
【００５５】
ステップＳ５１はＧＯＰ内部のすべてのＢピクチャ群に対して判定処理が完了したかを判断する。終了していなければ次のＢピクチャ群について、ステップＳ４５〜Ｓ５０の一連の処理を繰り返す。そうでなければ、処理を終了する。なお、ステップＳ４８で、テロップ開始フレームが検出されなかった場合には、全ブロックを非テロップ領域とし、出現フレーム判定部４の処理を終了する。この場合には、該出現フレーム判定部４は、その後、次のＧＯＰ内部のすべてのＢピクチャ群に対して、図８の判定処理を再度行う。
【００５６】
上記の説明から明らかなように、本発明によれば、以下の特徴(1) 〜(5) を提供することができる。
(1) テロップ領域の検出過程を段階的にするようにしたので、高速な処理と高精度な処理を両立させることができる。
(2) 時間的な変動判定とそれに続く収束判定とを行うようにしたので、不要な変動領域を排除して、テロップ領域の検出処理をすることができるようになる。
(3) 有意な動き予測情報を備えるブロックを検出対象から除外することができるようになる。
(4) 符号化モード情報の信頼性を考慮して、重み付け計数による検出判定を行うことができるようになる。
(5) 動き予測情報を利用して、１フレーム単位でのテロップ検出解像度を達成できるようになる。
【００５７】
【発明の効果】
以上の説明から明らかなように、本発明によれば、圧縮符号化された動画像データを部分的に復号することに加え、１０数フレームの間隔をおいたフレーム（例えば、フレーム内符号化画像）を対象とした検出をまず行い、次いで１フレーム単位での検出を行うというように、テロップ開始フレームの検出処理を階層的にしたので、従来の画素領域の検出方式（前記第１の検出方式）は無論のこと、符号データ領域での検出方式（前記第２、第３の検出方式）と比較しても処理コストを抑えることが可能となる。つまり、本発明では、テロップ検出判定の適用範囲を必要最小限に抑えることができるため、圧縮符号化データ上でのテロップ領域抽出方式の処理量の低減および高速性を更に向上することが可能となる。
【００５８】
また、本発明は、テロップの出現に伴う前兆（変動）と出現後の定常性（収束性）の２性質をそれぞれ異なる判別法で判定するようにしたので、第２、第３の検出方式と比較してはるかに優れた検出精度を達成することが可能となる。
【００５９】
また、本発明は、符号化モード情報を利用してテロップ領域候補となるブロックの精度を高め、動き予測情報を利用してテロップ開始フレームを求めるようにしたので、テロップ検出解像度を高めることが可能になる。
【図面の簡単な説明】
【図１】本発明の一実施形態の概略の構成を示すブロック図である。
【図２】図１の時間変移判定部の動作を示すフローチャートである。
【図３】図２のライン変動判定処理（ステップＳ１）の詳細を示すフローチャートおよび説明図である。
【図４】図２のブロック変動判定処理（ステップＳ２）の詳細を示すフローチャートおよび説明図である。
【図５】図２の収束判定処理（ステップＳ３）の詳細を示すフローチャートおよび説明図である。
【図６】図１のテロップ位置判定部の動作を示すフローチャートである。
【図７】図６の重み付け符号化モード計数処理（ステップＳ３４）の詳細を示すフローチャートである。
【図８】図１の出現フレーム判定部の動作を示すフローチャートである。
【図９】図８の出現予兆の検証処理（ステップＳ４６）の説明図である。
【図１０】従来の第１の検出方式の構成を示すブロック図である。
【符号の説明】
１…可変長復号部、２…時間変移判定部、３…テロップ位置判定部、４…出現フレーム判定部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a device for detecting telop areas in a moving image, and more particularly to such a device that can extract telop areas at high speed and with high accuracy from the compressed encoded data itself or from information obtained by decoding only a part of it.
[0002]
[Prior art]
As a conventional method for detecting telop areas (hereinafter, the first detection method), particularly for detecting telops in news video, there is a method that uses temporal differences in the luminance distribution to decide whether a telop is present. Methods have also been reported that find telops from geometric properties, such as the locality of their position and their regular arrangement, and from character-like features, such as bias in the edge distribution and color similarity. Furthermore, telop area detection methods that use neural networks or genetic algorithms to analyze these features have been proposed. All of these methods extract telop areas by applying various algorithms to the luminance value of each pixel in the moving image.
[0003]
Other telop detection methods have been proposed that use the encoded data of a compression-encoded moving image rather than the pixels themselves. These methods perform telop area detection by operating directly on the parameters and encoded data obtained during compression.
[0004]
A conventional method that focuses on temporal changes in the motion prediction error (hereinafter, the second detection method) observes the temporal variation of motion prediction information for each coding unit block in the image and detects the sudden appearance of a telop. Only the minimum necessary information is selected to extract the telop area. Consequently, the resolution of the extracted area is the size of the coding unit block that carries motion prediction information in the compression scheme in use.
[0005]
Furthermore, a method that focuses on the coding mode of each coding unit block (hereinafter, the third detection method) increments or decrements a counter according to the coding mode, based on the correlation between the characteristics of a stationary telop and the coding mode. Areas whose count is at or above a threshold are recognized as telop area candidates and are then extracted as telops after a shape check. The third detection method is disclosed in, for example, "High-speed telop area detection method from MPEG coded video", IEICE Transactions D-II, Vol. J81-D-II, No. 8, pp. 1847-1855, August 1998.
[0006]
[Problems to be solved by the invention]
To apply the first detection method to compression-encoded moving image data, the encoded data must first be decoded; telop detection cannot be applied until the compressed encoded data has been converted back into pixel-domain information. The first detection method is therefore realized by, for example, the configuration shown in the block diagram of FIG. 10.
[0007]
The variable length decoding unit 51 in FIG. 10 receives compressed image encoded data and performs decoding processing such as variable length decoding, inverse quantization, and coefficient range limiting. It outputs the coding mode information of the decoded frames and blocks, motion prediction information, motion prediction error information, and the like. The image generation unit 52 receives the prediction error information from the variable length decoding unit 51 and generates one frame of image data through an inverse transform. The motion compensation unit 52 uses the motion prediction information to extract blocks from the image memory 53, which holds the reference frames for motion prediction, and adds them to the output image data of the image generation unit 52 to form one complete frame. This image is input to the telop detection unit 54 and, if it serves as a reference image for subsequent frames, is also stored in the image memory 53. Only after this series of decoding steps does the telop detection unit 54 perform telop area detection in the pixel domain. The detection result of the telop detection unit 54 is sent to the image display unit 55.
[0008]
In the first detection method, there is a problem that a large calculation cost is required for the decoding process of the compressed data and the telop area detection process using the decoded image.
[0009]
In contrast, the second and third detection methods use the compressed encoded data itself, so the decoding process can be omitted and detection runs at high speed. In real video, however, camera work such as panning and tilting, and effects added in editing such as wipes and dissolves, cause large changes in the motion prediction error information, and these are difficult to distinguish from the appearance of a telop. The effect is especially strong at scene changes, and the second detection method suffers from poor detection accuracy, for example by misrecognizing the frame after a scene change as a telop area.
[0010]
Further, the third detection method has a characteristic that the detection rate becomes high when a large number of motion vectors exist outside the telop area. However, in a low-resolution video, the motion vector is relatively small, and the distribution of the encoding mode is also different. Also, news videos and the like often have a fixed camera, and if there is no motion vector in the background, there is a risk of misdetecting an area other than the telop as a telop area.
[0011]
An object of the present invention is to solve the above problems of the prior art and to provide a telop area detection device that can detect the appearance of a telop at high speed and with high accuracy from the compressed encoded data itself or from information obtained by decoding only a part of it, as well as a telop area detection device that can extract the position of a telop within a frame.
[0012]
To achieve the above object, the present invention is a device for detecting telop areas in a moving image that receives compressed moving image data as input, adds telop area information to the moving image data, and outputs it, the device comprising: a variable length decoding unit that performs variable length decoding on the compressed moving image data; a time transition determination unit that compares the current and the immediately preceding intra-frame encoded images decoded by the variable length decoding unit, detects whether the change converges for each area in which a change is found, and outputs position information of telop candidates when convergence is detected; a telop position determination unit that, in the inter-frame encoded images decoded by the variable length decoding unit that lie between the intra-frame encoded images, uses the coding mode types of the blocks corresponding to the telop candidate position information to extract from the intra-frame encoded image the blocks having a coding mode suitable for a telop, and outputs the position information of those blocks; and an appearance frame determination unit that, for the telop candidate blocks of the inter-frame encoded images, detects the frame in which the telop appears from the temporal change in the reference direction of the motion prediction information of the bidirectionally predicted picture group output from the variable length decoding unit.
[0013]
According to this feature, because the telop area detection process is carried out in stages, telop area detection can be performed at high speed and with high accuracy.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
The present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. This embodiment uses the international standard MPEG-1 Video (ISO/IEC 11172-2) as the encoding method of the input moving image, but the present invention is not limited to this.
[0015]
The input to the overall system is the encoded data of a compression-encoded moving image. The variable length decoding unit 1 partially decodes only the necessary information from the encoded data, and the decoded information A, B, and C is sent to the time transition determination unit 2, the telop position determination unit 3, and the appearance frame determination unit 4, respectively. Here, information A is the encoded information of the intra-frame encoded images (I pictures) between frame (n−N) and frame (n+mN), B is the coding mode information of the P and B pictures between frame n and frame (n+mN), and C is the motion prediction information of the B pictures between frame (n−N) and frame n. The constant N is the number of pictures in a GOP, that is, the interval at which intra-frame encoded images appear. The values n and m are arbitrary positive integers.
[0016]
The time transition determination unit 2 examines the state of temporal transition based on the encoding information A of a plurality of intra-frame encoded images. Then, on the basis of the examination, an area that is a telop area candidate is set on the I picture, and position information D of this area is output to the telop position determination unit 3.
[0017]
The telop position determination unit 3 determines a telop determination target block based on the position information D from the time transition determination unit 2. At the same time, the encoding mode information B of the target block is input from the variable length decoding unit 1, and the selection status of the encoding mode is grasped for each determination target block. Based on these results, a block having an encoding mode suitable for the telop is extracted from the I picture, and the position information E of the block is sent to the appearance frame determination unit 4.
[0018]
The appearance frame determination unit 4 determines a detection target block based on the block position information E input from the telop position determination unit 3 and simultaneously receives the motion prediction information C from the variable length decoding unit 1. Based on the motion prediction information C, frame determination processing is performed to determine from which frame a telop has appeared.
[0019]
The determination result F is output to a detection result display unit or a recording unit (not shown). The detection result display unit or the recording unit outputs the detection result and, if requested, partially decodes and presents the video in the extraction area.
[0020]
Next, the function of the time transition determination unit 2 will be described in detail with reference to the flowchart of FIG. The time transition determination unit 2 performs four processes. These four processes are a line fluctuation determination process in step S1, a block fluctuation determination process in step S2, a convergence determination process in step S3, and a shape shaping process in step S4.
[0021]
The line fluctuation determination process (step S1) operates on the encoded data of the two input intra-frame (Intra) encoded images I(n-N) and I(n). Here, the constant N is the number of pictures in a GOP, that is, the interval at which intra-frame encoded images appear. The line fluctuation determination process extracts areas that have changed over time, one line of blocks at a time. The position information of the areas determined to be fluctuation areas in step S1 is input to the block fluctuation determination process of step S2. The block fluctuation determination process determines fluctuation areas block by block, and the extracted blocks are passed to the convergence determination process of step S3.
[0022]
The convergence determination process extracts, from the designated areas, those areas in which the change has converged in all of the frames I(n+mN) (0 < m < q) following I(n), and sets them as telop area candidates. Here, the constant q is the number of frames subject to convergence determination and takes a value of 1 or more. The shape shaping process of step S4 receives the telop area candidates and removes isolated small areas in consideration of the size of a telop. It then fills in missing parts of the telop area candidates by dilation and erosion, shaping the candidates for the subsequent determination processes. This completes the processing of the time transition determination unit 2.
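The shape shaping of step S4 can be sketched as follows. This is a minimal illustration under assumptions the text does not specify: the candidates are held in a binary block mask, regions are 4-connected, the structuring element is 3x3, and `min_blocks` is a hypothetical size threshold. Erosion treats positions outside the frame as background, so candidates touching the frame border shrink in this sketch.

```python
from collections import deque


def remove_small_regions(mask, min_blocks):
    """Drop 4-connected candidate regions smaller than min_blocks blocks."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_blocks:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out


def dilate(mask):
    """A block becomes 1 if any block in its 3x3 neighbourhood is 1."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))) else 0
             for x in range(w)] for y in range(h)]


def erode(mask):
    """A block stays 1 only if its whole 3x3 neighbourhood is inside and 1."""
    h, w = len(mask), len(mask[0])

    def full(y, x):
        return all(0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                   for ny in (y - 1, y, y + 1) for nx in (x - 1, x, x + 1))

    return [[1 if full(y, x) else 0 for x in range(w)] for y in range(h)]


def shape_candidates(mask, min_blocks=3):
    """Remove isolated small regions, then close holes (dilate, then erode)."""
    return erode(dilate(remove_small_regions(mask, min_blocks)))
```

Applied to a ring of candidate blocks with one missing block in the middle and one isolated block elsewhere, this removes the isolated block and fills the hole.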
[0023]
The time transition determination unit 2 focuses on areas in which a change is found by comparing the current intra-frame encoded image with the preceding one, and examines, for those areas only, whether the change converges in the subsequent intra-frame encoded images. To determine whether a temporal change has occurred, the DC component and the AC components of the DCT coefficients are used separately as criteria.
[0024]
Steps S1 to S3 are described in more detail below with reference to FIGS. 3 to 5.
[0025]
First, the line fluctuation determination process of step S1 in FIG. 2 is described with reference to FIG. 3. FIG. 3(a) is a flowchart showing the details of the line fluctuation determination process using the DC component. The DC components of the DCT coefficients of frame I(n) and of frame I(n-N), one GOP earlier, are input to this process. On the assumption that a telop appears horizontally or vertically on the screen, the DC components are read one line at a time, both vertically and horizontally, in step S10. For example, as shown in FIG. 3(b), the DC component of the DCT coefficients of each block of frame I(n) and of frame I(n-N) is read in units of one vertical or horizontal line.
[0026]
In step S11, a coarsely quantized luminance histogram is generated to capture rough changes in the DC component. In step S12, the sum of absolute differences from the histogram of the line at the same position in the past frame I(n-N) is computed. In step S13, the result is compared with a threshold; for a line whose difference is equal to or greater than the threshold, the entire line is set as a telop area candidate in step S14. Otherwise, the entire line is set as a non-telop area in step S15.
[0027]
In step S16, it is determined whether or not the processing for each line has been completed for all blocks. If not, the process returns to step S10, and the series of processing of steps S10 to S15 is repeated for the next line. If the determination in step S16 is affirmative, the process proceeds to step S17.
[0028]
In step S17, the number of lines that became telop area candidates is counted. If the telop area candidates occupy most of the frame, the change is treated as a luminance change caused by something other than a telop, all blocks are set as non-telop areas, and detection for the current frame ends. Otherwise, the position information of the extracted blocks is output and the temporal fluctuation determination process using the DC component ends.
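Steps S10 to S17 can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that the DC values are non-negative and below `max_value`, that a block becomes a candidate when it lies on any flagged row or column, and that "most of the frame" means more than `max_fraction` of all lines; the bin count and both thresholds are hypothetical parameters.

```python
def coarse_histogram(values, bins=8, max_value=256):
    """S11: coarsely quantized histogram of the DC values on one line."""
    hist = [0] * bins
    width = max_value / bins
    for v in values:
        hist[min(int(v / width), bins - 1)] += 1
    return hist


def line_variation(dc_prev, dc_cur, threshold, max_fraction=0.5, bins=8):
    """Return a rows x cols mask of telop area candidates.

    dc_prev, dc_cur: 2-D lists of per-block DC values for I(n-N) and I(n).
    """
    rows, cols = len(dc_cur), len(dc_cur[0])

    def changed(prev_line, cur_line):
        # S12/S13: sum of absolute histogram differences against a threshold.
        hp = coarse_histogram(prev_line, bins)
        hc = coarse_histogram(cur_line, bins)
        return sum(abs(a - b) for a, b in zip(hp, hc)) >= threshold

    row_flags = [changed(dc_prev[r], dc_cur[r]) for r in range(rows)]
    col_flags = [changed([dc_prev[r][c] for r in range(rows)],
                         [dc_cur[r][c] for r in range(rows)])
                 for c in range(cols)]

    # S17: too many changed lines suggests a scene change, not a telop.
    if sum(row_flags) + sum(col_flags) > max_fraction * (rows + cols):
        return [[False] * cols for _ in range(rows)]
    return [[row_flags[r] or col_flags[c] for c in range(cols)]
            for r in range(rows)]
```

With a bright band appearing on a single block row, only that row is flagged; when every line changes, the whole frame is rejected.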
[0029]
Next, the block fluctuation determination process of step S2 is described with reference to FIG. 4. FIG. 4(a) shows a flowchart of the block fluctuation determination process using the AC components. The AC components of the DCT coefficients of frame I(n) and frame I(n-N) are input to this process and handled block by block. Because the edge regions formed by characters against the background correspond to large DCT AC components, blocks whose change in partial sum exceeds a threshold both spatially and temporally are treated as telop areas.
[0030]
In step S19, it is checked whether the target block belongs to an area determined to be a telop area candidate in the line fluctuation determination process of step S1. If it is a telop area candidate, processing continues. Otherwise, the block is set as non-telop and its determination ends. In step S20, the partial sum of the absolute values of the AC components is calculated for the telop area candidate.
[0031]
FIG. 4B shows an example of the coefficient range used for variation determination. Here, for the DCT coefficient AC component, the determination based on the absolute value partial sum of nine AC low-frequency components in a zigzag scan order is used.
[0032]
In step S21, the difference from the absolute-value partial sum of the nine low-frequency AC components of the block at the same position in the past frame I(n-N) is calculated. In step S22, the difference is compared with a threshold, and a block whose difference is equal to or greater than the threshold is set as a telop area candidate in step S23. Otherwise, it is set as a non-telop area candidate in step S24. In step S25, it is determined whether all blocks have been processed. If not, the process returns to step S19 and the series of steps S19 to S24 is repeated. When step S25 determines that all blocks have been processed, the fluctuation determination process using the AC components ends.
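Steps S19 to S25 can be sketched as follows. This is a hedged illustration: it assumes the nine coefficients are the first nine AC positions of the standard zigzag scan of an 8x8 block, that the temporal difference is taken as an absolute value, and that blocks are stored in dictionaries keyed by a hypothetical block position. The threshold value is not given in the text.

```python
# First nine AC positions (row, col) in zigzag scan order of an 8x8 DCT block.
ZIGZAG_AC9 = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2),
              (0, 3), (1, 2), (2, 1), (3, 0)]


def ac_partial_sum(block):
    """S20: sum of absolute values of the nine low-frequency AC coefficients."""
    return sum(abs(block[r][c]) for r, c in ZIGZAG_AC9)


def block_variation(prev_blocks, cur_blocks, candidates, threshold):
    """Return {position: bool} marking blocks that remain telop candidates.

    prev_blocks, cur_blocks: {position: 8x8 coefficient block} for
    I(n-N) and I(n); candidates: {position: bool} from the line stage.
    """
    result = {}
    for pos, is_candidate in candidates.items():
        if not is_candidate:
            # S19: blocks rejected by the line stage stay non-telop.
            result[pos] = False
            continue
        # S21: temporal difference of the partial sums (absolute value assumed).
        diff = abs(ac_partial_sum(cur_blocks[pos]) - ac_partial_sum(prev_blocks[pos]))
        # S22 to S24: threshold decision.
        result[pos] = diff >= threshold
    return result
```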
[0033]
Next, the details of the convergence determination process of step S3 are described with reference to FIG. 5. FIG. 5(a) shows a flowchart of the convergence determination process. A fluctuation area is likely to be the appearance of a telop, but it may also be caused by a moving object or by camera work. A telop shows strong temporal luminance fluctuation while it is appearing, but almost no luminance change once it has reached its steady state. The convergence determination process therefore uses the stationarity of the telop position to remove, from the telop area candidates extracted in steps S1 and S2, the fluctuation areas caused by factors other than the appearance of a telop.
[0034]
Specifically, after confirming from the temporal change of the DC component over the entire screen that there is no scene change or the like, the fact that the direction and position of the edges of a stationary telop remain the same within the telop area candidate is used. For edge matching, class classification using partial sums of AC components is used. For example, when the nine AC components shown in FIG. 4(b) are further divided into the three groups of vertical, horizontal, and diagonal elements, four blocks are obtained for each group as shown in FIG. 5(b), so that a total of 12 classes can be formed.
[0035]
DCT coefficient AC component information from frame In to frame In+qN is input to the convergence determination process in step S3. In step S26, it is checked whether or not the target block is a telop area candidate. If it is a telop area candidate, the process continues. Otherwise, the block is set as a non-telop area and the convergence determination process for the block ends. In step S27, the edge class of the target block in frame In is determined. For example, the edge class having the maximum partial sum is selected from the 12 classes shown in FIG. 5(b). In step S28, the edge class of the block at the same position in frame In+mN (0 < m < q) is determined in the same way. In general, intra-frame encoded images are often arranged at intervals of 12 to 15 frames, which at 30 fps corresponds to intervals of approximately 0.5 seconds. Assuming that a telop is presented for 2 seconds or more, the value of q can be set to about 4.
[0036]
In step S29, it is determined whether or not the partial sums of the edge classes obtained in steps S27 and S28 match. If they match, the block is set as a telop area candidate in step S30. Otherwise, it is determined as a non-telop area candidate in step S31. In step S32, it is determined whether or not the processing has been completed for all the blocks. If not, the processing returns to step S26, and the series of processing in steps S26 to S31 is repeated. If the processing of all blocks is completed, the convergence determination process by class classification is terminated.
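The class-based convergence check of steps S27 to S31 can be illustrated with the sketch below. The 12 classes of FIG. 5(b) are not reproduced in the text, so the sketch takes the class definitions as a parameter and the example uses three hypothetical classes only. The names `edge_class`, `has_converged`, and `EXAMPLE_MASKS` are assumptions, and "match" is interpreted here as the same class having the maximum partial sum in every frame.

```python
# Hypothetical class definitions: each class is a list of (row, col)
# coefficient positions. The actual 12 classes are those of FIG. 5(b).
EXAMPLE_MASKS = {
    "horizontal_freq": [(0, 1), (0, 2), (0, 3)],
    "vertical_freq": [(1, 0), (2, 0), (3, 0)],
    "diagonal": [(1, 1), (1, 2), (2, 1)],
}


def edge_class(block, class_masks):
    """Steps S27/S28: return the class with the largest partial absolute sum."""
    sums = {name: sum(abs(block[r][c]) for r, c in positions)
            for name, positions in class_masks.items()}
    return max(sums, key=sums.get)


def has_converged(blocks_over_time, class_masks):
    """Steps S29 to S31 for one block position.

    blocks_over_time[0] is the block in frame In; the remaining entries are
    the same-position blocks in frames In+mN for 0 < m < q.
    """
    reference = edge_class(blocks_over_time[0], class_masks)
    return all(edge_class(b, class_masks) == reference
               for b in blocks_over_time[1:])
```

With I pictures roughly 0.5 seconds apart and q set to about 4, `blocks_over_time` would hold the block from frame In followed by three later I pictures.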
[0037]
As a result of the above time transition determination processing, a region that is a telop region candidate is set on the I picture.
[0038]
Next, the function of the telop position determination unit 3 (see FIG. 1), which is based on the coding mode information, will be described with reference to FIG. 6. The telop position determination unit 3 receives the telop area candidate information extracted by the time transition determination unit 2. At the same time, the encoding mode information of the frames (P and B pictures) existing between frame In and frame In+mN is input from the variable length decoding unit 1. The group of blocks existing at the same position over a plurality of frames is processed as a unit. The verification of the telop using the encoding mode information is limited to the telop area candidates extracted by the above-described detection process.
[0039]
In step S33, it is determined whether or not the target block has been determined to be a telop area candidate by the time transition determination unit 2. If it is a telop area candidate, the processing continues. Otherwise, the block is determined to be non-telop and the determination for the block ends. In step S34, a counter is computed from the encoding mode information and the inter-frame distance to the frame referenced by the motion prediction information. In step S35, the counter computed in step S34 is compared with a threshold. A block group whose counter is equal to or greater than the threshold is set as a telop area candidate in step S36. Otherwise, it is set as a non-telop area candidate in step S37. In step S38, it is determined whether or not the processing has been completed for all the blocks. If not, the process returns to step S33, and the series of processing in steps S33 to S37 is repeated. If the processing of all the blocks has been completed, the process proceeds to step S39. In step S39, shape shaping processing is performed on the telop area candidates. The processing content is the same as that of the shape shaping processing (step S4). This completes the determination process using the encoding mode information.
[0040]
Frame coding information and block coding information are used as the coding mode information, which is one of the inputs. The frame coding information is one of the following three types.
(1) Intra-frame coded image (I picture)
(2) Forward prediction image (P picture)
(3) Bidirectional prediction image (B picture)
[0041]
The block coding mode includes an intra-frame coding block (Intra) and an inter-frame coding block (Inter). Furthermore, there are the following four types of inter-frame coding blocks depending on the presence or absence of motion compensation and coding.
(1) Motion compensated coding block (MC coded)
(2) Frame differential coding block (no MC coded)
(3) Motion compensation block (MC no coded)
(4) Skipped block (Skip)
Note that there are three types of motion prediction: forward, reverse, and bidirectional.
[0042]
Here, there is a high correlation between the features of a stationary telop and the coding mode of the corresponding block. For example, in a stationary telop, motion prediction information either does not exist or has a magnitude close to zero. In addition, because the character strings constituting a telop are characterized by boundary edges and therefore have a complex texture, the motion prediction error information is rarely omitted.
[0043]
Therefore, a block in the no MC coded mode, which has the above features, is the most likely to be a telop. However, in the encoding of real video, as the accuracy of encoders improves, motion prediction information is no longer assigned to still regions. That is, a mode without motion prediction information is often selected regardless of whether a telop is present. In addition, the closer the motion prediction reference frame is, the smaller the apparent motion becomes, so motion prediction information may not be assigned even to a region that is actually moving.
[0044]
To solve this problem, the concept of temporal distance is introduced and used as reliability information for the coding mode when a determination is made based on the coding mode. If the reference frame is close, the absence of motion prediction information does not guarantee that the block is a telop, so the reliability of the coding mode information as a telop area candidate is set low. Conversely, if motion prediction information exists, a clearly moving object is present, so the reliability as a non-telop area candidate is set high. If the reference frame is far away, the opposite settings are used. That is, when there is no motion prediction information, the area can be judged to be completely stationary, so the reliability of the encoding mode information as a telop area candidate is set high. Even if motion prediction information exists, the reliability of the coding mode information as a non-telop area candidate is set low.
[0045]
An example of the counting method (step S34), which uses the encoding mode information with the temporal distance as reliability information, will be described with reference to the flowchart of FIG. 7. The input information is the encoding mode information of the block group at the same position.
[0046]
In step S40, the distance to the frame referenced by the motion prediction information is calculated. When the prediction method is bidirectional prediction, the closer of the two reference frames is adopted. In step S41, a determination is made using the coding mode information of the block. If the block coding type is MC coded or MC no coded, both of which have motion prediction information, a number inversely proportional to the temporal distance to the reference frame obtained in step S40 is subtracted from the counter in step S42. On the other hand, if the block is Intra, which has no motion prediction information, no MC coded, whose motion prediction information has a magnitude of 0, or Skip, a number proportional to the temporal distance is added to the counter in step S43. In step S44, it is determined whether or not the processing has been completed for all blocks existing at the same position. If not, the series of operations from step S40 is repeated for the next block. Otherwise, the counting ends.
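The weighted counting of steps S40 to S44 can be sketched as follows. The function name, the `(mode, reference_distances)` tuple layout, and the `gain` parameter are assumptions made for illustration. In particular, the description does not say how a temporal distance is obtained for Intra blocks, so the sketch simply expects the caller to supply one.

```python
# Block coding modes, using the labels from the description.
MOVING_MODES = {"MC coded", "MC no coded"}           # have motion prediction info
STATIONARY_MODES = {"Intra", "no MC coded", "Skip"}  # none, or magnitude 0


def weighted_mode_count(blocks, gain=1.0):
    """Counter for one block position across the P/B frames (step S34).

    blocks is a sequence of (mode, reference_distances) tuples, one per frame.
    reference_distances lists the positive temporal distance to each
    referenced frame (two entries for bidirectional prediction).
    """
    counter = 0.0
    for mode, reference_distances in blocks:
        # S40: for bidirectional prediction the closer reference is adopted.
        distance = min(reference_distances)
        if mode in MOVING_MODES:
            # S42: subtract a value inversely proportional to the distance.
            counter -= gain / distance
        elif mode in STATIONARY_MODES:
            # S43: add a value proportional to the distance.
            counter += gain * distance
        # Any other label leaves the counter unchanged (assumption).
    return counter
```

The block group would then be kept as a telop area candidate in step S36 when the returned counter is equal to or greater than the threshold of step S35.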
[0047]
By the above telop position determination process, a block having an encoding mode suitable for the telop can be extracted from the I picture.
[0048]
Next, the operation of the appearance frame determination unit 4 will be described. The verification of the telop using the motion prediction information is limited to the telop area candidates extracted by the above-described detection process. Here, the coding mode of the block and the temporal reference direction of the motion vector are used. When a telop appears, the following properties are observed in the telop area of a run of consecutive B pictures (hereinafter referred to as a B picture group) delimited by I or P pictures.
(1) There is no bidirectional prediction in the B picture group.
(2) When the appearance frame is an I or P picture, only the forward motion vector exists in the B picture group.
(3) When the appearance frame is a B picture, a backward motion vector also exists in the B picture group.
(4) When the appearance frame is a B picture, there is only one forward / reverse switching in the B picture group.
(5) No motion vector exists after the telop appears.
[0049]
FIG. 9 is provided to facilitate understanding of the above properties. From FIGS. 7A and 7B, it is clear that when a telop appears in a B picture, I picture, or P picture, there is no bidirectional prediction in the B picture group, as stated in (1). In addition, when the telop appearance frame is an I or P picture, it is clear from FIG. 4B that only forward motion vectors exist in the B picture group, as stated in (2). When the telop appearance frame is a B picture, it is clear from FIG. 5A that the B picture group has properties (3) and (4). As shown in FIG. 5C, when bidirectional prediction exists in the B picture group, no telop appears in any of the I, P, and B pictures.
[0050]
Therefore, the appearance frames of the telop are narrowed down to the frames satisfying the above conditions (1) to (5) from the motion prediction information of the B picture group. Note that both sides of the B picture group in FIG. 9 may both be P pictures.
[0051]
Specifically, when the appearance of a telop is detected across a plurality of intra-frame encoded images by the time transition determination unit 2, the appearance frame determination unit 4 verifies the above properties for the B picture groups inside the GOP. First, for each B picture group, the temporal reference direction of the motion prediction information is examined for each block. When the B picture group consists only of forward prediction (property (2)), it is determined that a telop has appeared in the immediately following I or P picture. When the B picture group includes the reverse direction, or when the change from the forward direction to the reverse direction occurs only once (properties (3) and (4)), it is determined that a telop has appeared in the B picture in which the reverse direction starts. In other cases, it is determined that no telop appears in that B picture group, and the determination process continues with the next B picture group. In this way, the appearance frame of the telop can be detected in units of frames.
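The per-block decision described above for one B picture group can be sketched as follows. The function name, the string labels for the reference directions, and the return convention are assumptions. A group consisting only of reverse-direction blocks is treated here as a telop appearing in the first B picture, which is one reading of properties (3) and (4).

```python
def telop_appearance_in_b_group(directions):
    """Locate the telop appearance frame for one block in one B picture group.

    directions lists the reference direction of the block in each B picture,
    in display order, as "forward", "backward", or "bidirectional".

    Returns the index of the B picture in which the telop appears,
    len(directions) if it appears in the immediately following I or P
    picture, or None if no telop appears in this group.
    """
    if not directions or "bidirectional" in directions:
        return None  # property (1) is violated
    if all(d == "forward" for d in directions):
        return len(directions)  # property (2): the following I or P picture
    if "backward" in directions:
        first_backward = directions.index("backward")
        before = directions[:first_backward]
        after = directions[first_backward:]
        # Properties (3) and (4): at most one switch, from forward to backward.
        if (all(d == "forward" for d in before)
                and all(d == "backward" for d in after)):
            return first_backward
    return None
```

For example, a group of three B pictures with directions forward, backward, backward yields index 1, the B picture in which the reverse direction starts.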
[0052]
However, since a stationary telop is assumed, only blocks whose motion prediction information has a length of almost zero are considered. Blocks with significant motion prediction information are excluded from the telop area candidates regardless of the reference direction. Similarly, for the GOPs after the telop appearance has been determined, the length of the motion prediction information is verified for each block at the same position. If the length is not sufficiently close to 0, the block is excluded from the telop area candidates.
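The near-zero check on the motion prediction information can be sketched as follows. The function name and the `tolerance` parameter are hypothetical, since the description gives no numerical definition of "almost zero" and no unit for the vector length.

```python
import math


def is_stationary(motion_vectors, tolerance=1.0):
    """Return True if every motion vector of the block group is nearly zero.

    motion_vectors is a sequence of (dx, dy) pairs for the same-position
    block over the frames being verified. tolerance is an assumed bound on
    the vector length, in the same unit as dx and dy.
    """
    return all(math.hypot(dx, dy) <= tolerance for dx, dy in motion_vectors)
```

A block group for which this check fails would be excluded from the telop area candidates regardless of the reference direction.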
[0053]
FIG. 8 shows a flowchart of the operation of the appearance frame determination unit 4 based on motion prediction information. The position information of the telop area candidate is input from the telop position determination unit 3. At the same time, the motion prediction information of the frame between the frame In-N and the frame In is input from the variable length decoding unit 1. The determination is made on individual blocks of the B picture group.
[0054]
In step S45, it is determined whether or not the target block has been determined to be a telop area candidate in frame In by the telop position determination unit 3. If it is a telop area candidate, the process continues. Otherwise, the determination for the block group ends. In step S46, the properties of the B picture group that accompany the appearance of a telop, as described above, are verified (see properties (1) to (5) and FIG. 9). In step S47, it is determined whether or not the output of step S46 satisfies these properties. If so, in step S48 the B picture frame in which the backward vector appears, or the I or P picture frame, is output as the telop appearance frame. Otherwise, in step S49, the block is set as non-telop, and it is determined that no telop appears in the B picture group. In step S50, it is determined whether the determination process has been completed for all blocks in the B picture group. If not, the process returns to step S45, and the series of processes in steps S45 to S49 is repeated for the next block. Otherwise, the determination process continues.
[0055]
In step S51, it is determined whether the determination process has been completed for all the B picture groups in the GOP. If not completed, the series of processing in steps S45 to S50 is repeated for the next B picture group. Otherwise, the process ends. If no telop start frame is detected in step S48, all blocks are set as non-telop areas, and the process of the appearance frame determination unit 4 ends. In this case, the appearance frame determination unit 4 then performs the determination process of FIG. 8 again for all the B picture groups in the next GOP.
[0056]
As is apparent from the above description, according to the present invention, the following features (1) to (5) can be provided.
(1) Since the telop area detection process is made stepwise, both high-speed processing and high-precision processing can be achieved.
(2) Since the temporal variation determination and the subsequent convergence determination are performed, an unnecessary variation region can be eliminated and the telop region can be detected.
(3) Blocks having significant motion prediction information can be excluded from detection targets.
(4) In consideration of the reliability of the coding mode information, detection determination by weighted counting can be performed.
(5) Using the motion prediction information, the telop detection resolution can be achieved in units of one frame.
[0057]
[Effect of the invention]
As is apparent from the above description, according to the present invention, the compression-coded moving image data is only partially decoded, and the detection of the telop start frame is made hierarchical: detection is first performed on frames spaced 10 or more frames apart (for example, intra-frame encoded images) and is then performed in units of one frame. The processing cost can therefore be reduced not only in comparison with the conventional pixel-domain detection method (the first detection method described above) but also in comparison with the detection methods in the coded-data domain (the second and third detection methods). In other words, because the present invention minimizes the range over which the telop detection determination is applied, the processing amount of telop area extraction on compression-coded data is further reduced and the processing speed is further increased.
[0058]
Further, according to the present invention, the two characteristics of a telop, namely the precursor (variation) associated with its appearance and the continuity (convergence) after its appearance, are determined by different discrimination methods, so that much better detection accuracy can be achieved.
[0059]
In the present invention, the encoding mode information is used to improve the accuracy of the blocks that are telop area candidates, and the motion prediction information is used to obtain the telop start frame, so that the telop detection resolution can be increased.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation of a time transition determination unit in FIG. 1;
FIG. 3 is a flowchart and an explanatory diagram showing details of the line fluctuation determination process (step S1) in FIG. 2;
FIG. 4 is a flowchart and an explanatory diagram showing details of a block fluctuation determination process (step S2) in FIG. 2;
FIG. 5 is a flowchart and an explanatory diagram showing details of convergence determination processing (step S3) in FIG. 2;
FIG. 6 is a flowchart showing the operation of the telop position determination unit in FIG. 1;
FIG. 7 is a flowchart showing details of the weighted encoding mode counting process (step S34) of FIG. 6;
FIG. 8 is a flowchart showing the operation of the appearance frame determination unit in FIG. 1;
FIG. 9 is an explanatory diagram of the appearance predictor verification process (step S46) of FIG. 8;
FIG. 10 is a block diagram showing a configuration of a conventional first detection method.
[Explanation of symbols]
1 ... Variable length decoding unit, 2 ... Time transition determination unit, 3 ... Telop position determination unit, 4 ... Appearance frame determination unit.

Claims

A telop area detection device in a moving image that receives compressed moving image data as input, adds telop area information to the moving image data, and outputs it, the device comprising:
a variable length decoding unit that performs variable length decoding of the compressed moving image data;
a time transition determination unit that compares the current and previous intra-frame encoded images decoded by the variable length decoding unit and, when it detects that the change has converged in an area where a change was recognized, outputs position information of telop candidates;
a telop position determination unit that, in the inter-frame encoded images decoded by the variable length decoding unit and existing between the intra-frame encoded images, identifies from the type of encoding mode of the blocks corresponding to the position information of the telop candidates the blocks having an encoding mode suitable for a telop, extracts those blocks from the intra-frame encoded image, and outputs position information of the blocks; and
an appearance frame determination unit that detects the appearance frame of the telop, for the telop candidate blocks of the inter-frame encoded images, from the temporal change in the reference direction of the motion prediction information of the bidirectional prediction image group output from the variable length decoding unit.

The telop area detection device in a moving image according to claim 1,
The time transition determination unit uses a difference value of motion prediction error information between frames as a criterion for determining the appearance of a telop.

The telop area detection device in a moving image according to claim 2,
The time transition determination unit uses a histogram difference based on a DCT coefficient DC component of motion prediction error information as a difference value between frames used for variation determination.

The telop area detection device in a moving image according to claim 2 or 3,
The time transition determination unit uses a histogram difference of motion prediction error information for each line of blocks, in both the vertical and horizontal directions.

The telop area detection device in a moving image according to claim 2,
The time transition determination unit uses a difference based on a partial absolute value sum of DCT coefficient AC components of motion prediction error information as a difference value between frames used for variation determination.

The telop area detection device in a moving image according to claim 1,
The telop area detection device, wherein the time transition determination unit includes means for confirming a steady state after the telop appears.

The telop area detection device in a moving image according to claim 6,
The time transition determination unit uses identity determination by class classification for confirmation of the steady state.

The telop area detection device in a moving image according to claim 6 or 7,
The telop area detection device, wherein the time transition determination unit forms a class used for grasping a steady state based on uneven distribution of coefficient distribution of motion prediction error information.

The telop area detection device in a moving image according to any one of claims 6, 7, and 8,
The time transition determination unit uses, for the formation of the classes used for convergence determination, the edge direction identified from the three elements, vertical, horizontal, and diagonal, of the DCT coefficient AC components of the motion prediction error information.

The telop area detection device in a moving image according to claim 1,
The telop position determination unit uses a zero-approximation determination based on the magnitude of the motion prediction information as a criterion for determining a telop.

The telop area detection device in a moving image according to claim 1,
The telop position determination unit uses a temporal distance to a reference frame of motion prediction information for determination of reliability with respect to encoding mode information.

The telop area detection device in a moving image according to claim 11,
wherein a weighting coefficient counter is used for determining the reliability of the encoding mode information.

The telop area detection device in a moving image according to claim 12,
wherein the weighting coefficient of the weighting coefficient counter is made proportional to the temporal distance to the frame referenced by the motion prediction information.

The telop area detection device in a moving image according to claim 1,
The appearance frame determination unit uses, as a criterion for determining the appearance frame of a telop, the absence of bidirectional motion prediction information in the telop area in the consecutive bidirectional prediction images.

The telop area detection device in a moving image according to claim 1,
The appearance frame determination unit uses, as a criterion for determining the appearance frame of a telop, the fact that, when the telop appears in an intra-frame encoded image or a unidirectional prediction image, only forward motion prediction information exists in the telop area in the immediately preceding consecutive bidirectional prediction images.

The telop area detection device in a moving image according to claim 1,
The appearance frame determination unit uses, as a criterion for determining the appearance frame of a telop, the fact that, when the telop appears in a bidirectional prediction image, backward motion prediction information also exists in the telop area in the consecutive bidirectional prediction images.

The telop area detection device in a moving image according to claim 1,
The appearance frame determination unit uses, as a criterion for determining the appearance frame of a telop, the fact that, when the telop appears in a bidirectional prediction image, the forward/reverse direction of the motion prediction information switches only once in the telop area in the consecutive bidirectional prediction images.