JP3569391B2

JP3569391B2 - Character appearance frame extraction device

Info

Publication number: JP3569391B2
Application number: JP20445496A
Authority: JP
Inventors: 秀豪桑野; 正治倉掛
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-08-02
Filing date: 1996-08-02
Publication date: 2004-09-22
Anticipated expiration: 2016-08-02
Also published as: JPH1049682A

Description

[0001]
[Technical field of the invention]
The present invention relates to a character appearance frame extraction device for extracting a frame including a character from a plurality of frames constituting a moving image.
[0002]
[Prior art]
In recent years, much research has been conducted on techniques for extracting frames that contain characters from the plurality of frames constituting a moving image. In one proposed method, the area of the high-luminance-contrast portion of each frame is obtained, for example from the number of pixels with high edge strength, and the frame at the time when this area value increases sharply along the time axis is taken as the character appearance frame. References [1], [2], and [3] below relate to this method.
[0003]
Reference [4] below, although not aimed at extracting character appearance frames, proposes the inter-frame luminance histogram difference value as a quantity for measuring the temporal change in the area of the high-contrast portion of each frame constituting a moving image. It requires less computation than counting the pixels with high edge strength as described above.
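As a rough illustration of the quantity described in this paragraph, the sketch below computes a luminance histogram per frame and sums the absolute bin differences between two frames. The exact formulation of reference [4] is not reproduced here; the function names, the 16-bin quantization, and the sum-of-absolute-differences measure are assumptions made for illustration only.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit luminance values (0-255); assumed layout


def luminance_histogram(frame: Frame, bins: int = 16) -> List[int]:
    """Count pixels per luminance bin (the bin count is an assumed parameter)."""
    hist = [0] * bins
    for row in frame:
        for y in row:
            hist[y * bins // 256] += 1
    return hist


def histogram_difference(prev: Frame, curr: Frame, bins: int = 16) -> int:
    """Sum of absolute bin differences between two frames (assumed measure)."""
    h_prev = luminance_histogram(prev, bins)
    h_curr = luminance_histogram(curr, bins)
    return sum(abs(a - b) for a, b in zip(h_prev, h_curr))


# A dark frame, and the same frame with two bright "character" pixels added.
dark = [[0, 0, 0, 0], [0, 0, 0, 0]]
with_text = [[0, 0, 255, 255], [0, 0, 0, 0]]
difference = histogram_difference(dark, with_text)
```

Each frame is visited once to build its histogram, after which only `bins` subtractions are needed, which is consistent with the paragraph's point that this quantity is cheap to compute.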
[0004]
A method specialized for extracting character appearance frames from news video has also been proposed (reference [5]). It exploits the fact that telop (caption) characters are displayed in frames where a scene change occurs: studio scenes are selected from among the scene change frames, and those frames are extracted as character appearance frames.
[0005]
(References)
[1] Nakajima, Hori, Shiobara: "Creation of a moving image summary by extracting keyword images", IPSJ National Convention, 1F-10, (1994, second half).
[2] Nemoto, Hanya, Miyauchi: "Retrieval of Material Video by Recognition of Telop", IEICE National Convention, D-427, (1994, spring).
[3] Takano, Nakamura: "Telop Detection by H.261 Code Handling", Proc. of the Institute of Image Electronics Engineers of Japan study meeting, 94-06-04, pp. 13-16, (1994-06).
[4] Nagasaka, Tanaka: "Automatic Indexing Method and Object Searching Method for Color Video Images", Trans. IPSJ, Vol. 33, No. 4, pp. 543-550, (1992-04).
[5] Mogi, Ariki: "Indexing Articles Based on Character Recognition in News Videos", IEICE Technical Report, IE95-153, PRU95-240, pp. 33-40, (1996-03).
[0006]
[Problems to be solved by the invention]
Among the conventional methods, those based on the temporal change in the area of the high-luminance-contrast portion do not consider temporal changes in the spatial position or shape of that portion within the frame. Consequently, character appearance frames cannot be extracted stably when the area of the high-contrast portion changes little at the moment of character appearance, for example when different characters appear in succession without even a single intervening frame.
[0007]
The method that uses scene change frames has a similar problem. When the color and luminance within the frame change little over time, for example when different characters appear in succession without even a single intervening frame, the scene change cannot be detected, and so extraction of the character appearance frame fails.
[0008]
An object of the present invention is to reduce the character appearance frame extraction failures that were a problem in the conventional methods and to improve the accuracy with which frames in which characters appear are extracted.
[0009]
[Means for Solving the Problems]
In order to achieve the above object, the present invention is mainly characterized by including the following means.
[0010]
1. Means for inputting and storing a moving image frame by frame,
2. Means for providing a plurality of layers for frame division and dividing each frame input using means 1 above, layer by layer, into partial rectangular areas whose size differs from layer to layer,
3. Means for designating the character appearance detection processing target time,
4. Means that, once the frame corresponding to the time designated using means 3 above and the frames corresponding to one or more times separated from that time, forward and backward along the time axis, by one or more preset time intervals have been divided into partial rectangular areas using means 2 above, checks whether the frame at the time one unit time before the character appearance detection processing target time is a character appearance frame, and decides to proceed to means 5 if it is not, or to means 6 if it is,
5. Means for judging, by a predetermined character appearance detection method, whether a character appearance has occurred in the frame at the character appearance detection processing target time,
6. Means for determining, for each pair of corresponding partial rectangular areas in the frame at the character appearance detection processing target time and the character appearance frame at the time one unit time earlier, the number of pixels whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold,
7. Means for determining whether the frame at the character appearance detection processing target time satisfies the condition that there exists a partial rectangular area in which the number of pixels obtained using means 6 above is smaller than a preset number threshold, and, if the condition is satisfied, judging the frame to be a partial scene change occurrence frame,
8. Means that, when the frame at the character appearance detection processing target time has been judged to be a partial scene change occurrence frame using means 7 above, determines, between that frame and the frames corresponding to one or more times separated from it toward later times by one or more preset time intervals, the number of pixels whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold, in the partial rectangular areas that satisfied the partial scene change condition in means 7 above,
9. Means for determining whether the frame at the character appearance detection processing target time satisfies the condition that the numbers of pixels calculated using means 8 above are all equal to or greater than a preset number threshold, and, if the condition is satisfied, judging the frame to be a character appearance frame,
10. Means for outputting the frames judged to be character appearance frames using means 5 and means 9 above.
[0011]
The operation of the present invention is as follows.
Consider the case where different characters with similar area values appear in succession in the video without even a single intervening frame. For example, in a news video, a headline summarizing a news item is displayed at the start of the item and is followed, with no intervening frame, by telop characters that explain the news in detail, differ in content from the headline, and have about the same number and size of characters. In such a case, the difference in the area of the high-contrast portion between the frame in which the first characters appeared and the frame in which the next characters appeared is not necessarily large. However, because the positions of the pixels that make up the characters usually differ between the two frames, the number of pixels with a small luminance difference value between corresponding pixels becomes small.
[0012]
In addition, for several frames after the frame in which characters appear, the characters are displayed stationary at the same position, so the positions of the pixels that make up the characters do not change. Therefore, when the frame at the previous time is a character appearance frame, a frame is first extracted as a partial scene change occurrence frame, reflecting the characteristics of character appearance, if the number of pixels whose luminance difference value from the corresponding pixel of the character appearance frame one unit time earlier is smaller than a preset difference value threshold is itself smaller than a preset number threshold. If, in addition, the number of pixels whose luminance difference value between corresponding pixels is smaller than the preset difference value threshold is equal to or greater than the preset number threshold for every one of the one or more later frames compared with that frame, the frame is extracted as a character appearance frame. This prevents the missed extraction of character appearance frames, which was a problem of the conventional methods, even when characters appear in succession.
[0013]
Because the corresponding-pixel luminance difference value calculation can detect partial scene changes as described above, it can be used not only to suppress missed extraction of character appearance frames in the case of continuous character appearance, which was a problem of the conventional methods, but also to extract the character appearance frame when characters first appear following frames in which no characters are displayed (this case is called "first character appearance", in contrast to the "continuous character appearance" above).
[0014]
However, when first character appearance frames are extracted using the corresponding-pixel luminance difference value, the difference must be calculated for every pixel in the target partial rectangular areas between the frame at the character appearance detection processing target time and the frame one unit time earlier. Compared in terms of the number of difference calculations, the inter-frame luminance histogram difference value proposed in conventional methods such as reference [4] requires less processing. Moreover, among the character appearance patterns in video, "first character appearance" normally occurs more often than "continuous character appearance", so using the corresponding-pixel luminance difference value for the "first character appearance" pattern as well would increase the amount of processing.
[0015]
Therefore, by the fourth means recited in the claims, character appearance in the frame at the character appearance detection processing target time is judged using the quantity used in the conventional methods for the "first character appearance" pattern, and using the corresponding-pixel luminance difference value for the "continuous character appearance" pattern. In this way, character appearance frames can be detected efficiently in both the "first character appearance" and the "continuous character appearance" cases.
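The branching described in this paragraph can be sketched as a loop over frame times. This is only an illustrative outline under stated assumptions: the three callables stand in for the conventional detector (means 5), the partial scene change test (means 6 and 7), and the check against later frames (means 8 and 9), and all names are hypothetical.

```python
from typing import Callable, List


def extract_character_frames(
    num_frames: int,
    first_detector: Callable[[int], bool],        # stand-in for means 5
    partial_scene_change: Callable[[int], bool],  # stand-in for means 6 and 7
    stable_afterwards: Callable[[int], bool],     # stand-in for means 8 and 9
) -> List[int]:
    """Return the frame times judged to be character appearance frames."""
    flags: List[bool] = []     # per-frame character appearance flag
    extracted: List[int] = []  # frames judged by means 5 or means 9
    for t in range(num_frames):
        previous_has_characters = t > 0 and flags[t - 1]
        if not previous_has_characters:
            # "First character appearance": use the cheaper conventional quantity.
            appeared = first_detector(t)
            if appeared:
                extracted.append(t)
        elif not partial_scene_change(t):
            # Characters simply continue from the previous frame.
            appeared = True
        else:
            # "Continuous character appearance": confirm against later frames.
            appeared = stable_afterwards(t)
            if appeared:
                extracted.append(t)
        flags.append(appeared)
    return extracted


# Stub detectors: characters first appear at t=1 and are replaced at t=3.
result = extract_character_frames(
    6, lambda t: t == 1, lambda t: t == 3, lambda t: True
)
```

The pixel-by-pixel comparison is reached only when the previous frame is already flagged, which is the efficiency argument made in the text.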
[0016]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described in detail with reference to FIG. 1.
FIG. 1 is a block diagram showing a configuration example of an apparatus for implementing the present invention. 1 is a moving image input storage unit, 2 is a partial rectangular area dividing unit, 3 is a character appearance detection processing target time designation unit, 4 is a character appearance detection processing start control unit, 5 is a first character appearance determination unit, 6 is a first corresponding pixel luminance difference value calculation unit, 7 is a partial scene change determination unit, 8 is a second corresponding pixel luminance difference value calculation unit, 9 is a second character appearance determination unit, and 10 is an output unit for the character appearance frame extraction processing result.
[0017]
The moving image input storage unit 1 inputs a plurality of frames constituting a moving image and stores them in a memory.
The partial rectangular area dividing unit 2 divides each frame of the moving image stored by the moving image input storage unit 1 into partial rectangular areas using a plurality of preset frame division methods (see FIG. 2). FIG. 2 shows an example of frame division in which a plurality of layers are provided and the frame is divided into partial rectangular areas of a different size in each layer.
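A minimal sketch of such a layered division follows. FIG. 2 is not reproduced in this text, so the grid sizes (1x1, 2x2, 4x4), the `(top, left, height, width)` rectangle layout, and the function names are assumptions chosen only to illustrate the idea of several layers with different area sizes.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, height, width); assumed layout


def split_level(height: int, width: int, rows: int, cols: int) -> List[Rect]:
    """Divide a height x width frame into a rows x cols grid of rectangles."""
    rects: List[Rect] = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            rects.append((top, left, bottom - top, right - left))
    return rects


def split_hierarchy(
    height: int,
    width: int,
    grids: Sequence[Tuple[int, int]] = ((1, 1), (2, 2), (4, 4)),  # assumed layers
) -> List[List[Rect]]:
    """Return one list of partial rectangular areas per layer."""
    return [split_level(height, width, rows, cols) for rows, cols in grids]


levels = split_hierarchy(240, 320)
```

With these assumed grids, a 240x320 frame yields layers of 1, 4, and 16 areas, each layer covering the whole frame.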
[0018]
The character appearance detection processing target time designation unit 3 sets, by a predetermined method, the time of the frame on which character appearance detection processing is to be performed, choosing from among the frames that have been divided into partial rectangular areas by the partial rectangular area dividing unit 2 and have not yet undergone character appearance detection processing. The times may be chosen, for example, in chronological order.
[0019]
The character appearance detection processing start control unit 4 operates once the frame corresponding to the time designated by the character appearance detection processing target time designation unit 3, together with the frames corresponding to one or more times separated from that time, forward and backward along the time axis, by one or more preset time intervals, has been divided into partial rectangular areas by the partial rectangular area dividing unit 2. It determines, from the flag information recording the character appearance detection result for the frame one unit time before the target time, whether that earlier frame is a character appearance frame. If it is not, the unit decides that the first character appearance determination unit 5 is to start character appearance detection processing on the frame at the target time. If it is, the processing of the first corresponding pixel luminance difference value calculation unit 6 is performed.
[0020]
The first character appearance determination unit 5 determines whether the character appearance detection processing target frame is a character appearance frame using a predetermined method (for example, the method described in reference [4]). If the frame is determined to be a character appearance frame, "character appearance" is attached to it as flag information. If not, "character non-appearance" is attached as flag information, and processing returns to the character appearance detection processing target time designation unit 3.
[0021]
The first corresponding pixel luminance difference value calculation unit 6 determines, for each pair of corresponding partial rectangular areas in the character appearance detection processing target frame and the character appearance frame one unit time earlier, the number of pixels whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold (see FIG. 3).
[0022]
FIG. 3 shows an example of the function of the first corresponding pixel luminance difference value calculation unit 6: luminance difference values are calculated between corresponding pixels in corresponding partial rectangular areas of the frame at the character appearance detection processing target time and the frame one unit time before it. In the example of FIG. 3, the unit determines how many of the 48 pixels constituting the partial rectangular area have a luminance difference value smaller than the predetermined threshold.
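The count described for FIG. 3 might look as follows for a single 48-pixel area. The 6x8 area shape, the threshold value of 30, the use of the absolute difference, and the names are assumptions for illustration; the patent fixes only the idea of counting pixels whose luminance difference is below a preset threshold.

```python
from typing import List, Tuple

Frame = List[List[int]]           # rows of luminance values; assumed layout
Rect = Tuple[int, int, int, int]  # (top, left, height, width); assumed layout


def count_similar_pixels(
    prev: Frame, curr: Frame, rect: Rect, diff_threshold: int
) -> int:
    """Count pixels in rect whose absolute luminance difference between the
    two frames is smaller than diff_threshold."""
    top, left, height, width = rect
    count = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            if abs(curr[y][x] - prev[y][x]) < diff_threshold:
                count += 1
    return count


# A 6x8 (48-pixel) area: the first row changes from dark to bright.
previous_frame = [[0] * 8 for _ in range(6)]
current_frame = [[200] * 8] + [[0] * 8 for _ in range(5)]
similar = count_similar_pixels(previous_frame, current_frame, (0, 0, 6, 8), 30)
```

Here 8 of the 48 pixels changed strongly, so 40 pixels are counted as similar.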
[0023]
The partial scene change determination unit 7 determines, for each partial rectangular area of the character appearance detection processing target frame, whether the count obtained by the first corresponding pixel luminance difference value calculation unit 6 (the number of pixels whose luminance difference value from the corresponding pixel of the character appearance frame at the previous time is smaller than the preset difference value threshold) is smaller than a preset number threshold. If the frame has a partial rectangular area that satisfies this condition, the frame is determined to be a partial scene change occurrence frame.
[0024]
If the pixel counts obtained by the first corresponding pixel luminance difference value calculation unit 6 do not satisfy this condition, the frame is determined to be one in which the characters continue to appear from the previous time; the character appearance flag information is attached to the frame, and processing returns to the character appearance detection processing target time designation unit 3.
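The decision made by the partial scene change determination unit can be sketched as a filter over per-area counts. The dictionary keyed by rectangle, the number threshold of 24, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, height, width); assumed layout


def changed_areas(similar_counts: Dict[Rect, int], count_threshold: int) -> List[Rect]:
    """Areas whose count of similar pixels is below the number threshold."""
    return [rect for rect, count in similar_counts.items() if count < count_threshold]


def is_partial_scene_change(similar_counts: Dict[Rect, int], count_threshold: int) -> bool:
    """A frame is a partial scene change frame if at least one area changed."""
    return bool(changed_areas(similar_counts, count_threshold))


# Two 48-pixel areas: the first is unchanged, the second mostly changed.
counts = {(0, 0, 6, 8): 48, (0, 8, 6, 8): 10}
areas = changed_areas(counts, 24)
```

Keeping the list of changed areas, rather than only a yes/no answer, matters because the later comparison with subsequent frames is restricted to exactly those areas.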
[0025]
The second corresponding pixel luminance difference value calculation unit 8 determines, between the character appearance detection processing target frame determined to be a partial scene change occurrence frame by the partial scene change determination unit 7 and the frames at one or more times separated from it toward later times by one or more preset time intervals, the number of pixels in corresponding partial rectangular areas whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold (see FIG. 4).
[0026]
FIG. 4 shows an example of the function of the second corresponding pixel luminance difference value calculation unit 8: luminance difference values are calculated between corresponding pixels in corresponding partial rectangular areas of the frame at the character appearance detection processing target time and the frames at one or more later times. In the example of FIG. 4, six later frames are used, and for each of these six frames the unit determines how many of the 48 pixels constituting the partial rectangular area have a luminance difference value smaller than the predetermined threshold.
[0027]
The second character appearance determination unit 9 considers the partial rectangular areas that satisfied the partial scene change condition in the frame determined to be a partial scene change occurrence frame by the partial scene change determination unit 7. If the counts obtained for those areas by the second corresponding pixel luminance difference value calculation unit 8 (the number of pixels with a small luminance difference value between corresponding pixels, for each of the one or more frames after the character appearance detection processing target time) are all equal to or greater than a preset number threshold, the frame is determined to be a character appearance frame, and "character appearance" is attached to it as flag information.
[0028]
Conversely, if even one of the one or more pixel counts calculated by the second corresponding pixel luminance difference value calculation unit 8 is smaller than the preset number threshold, the frame is determined to be a character non-appearance frame; the character non-appearance flag information is attached to it, and processing returns to the character appearance detection processing target time designation unit 3.
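The confirmation step performed by units 8 and 9 can be sketched as follows: every changed area must stay nearly unchanged in every later frame examined. The thresholds, the six identical later frames, and all names are assumptions for illustration, and the helper `count_similar_pixels` is a hypothetical stand-in for the per-area count.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # rows of luminance values; assumed layout
Rect = Tuple[int, int, int, int]  # (top, left, height, width); assumed layout


def count_similar_pixels(a: Frame, b: Frame, rect: Rect, diff_threshold: int) -> int:
    """Count pixels in rect whose absolute luminance difference is below the threshold."""
    top, left, height, width = rect
    return sum(
        1
        for y in range(top, top + height)
        for x in range(left, left + width)
        if abs(a[y][x] - b[y][x]) < diff_threshold
    )


def confirm_character_appearance(
    target: Frame,
    later_frames: Sequence[Frame],
    changed_rects: Sequence[Rect],
    diff_threshold: int,
    count_threshold: int,
) -> bool:
    """True only if every changed area stays stable in every later frame."""
    if not later_frames or not changed_rects:
        raise ValueError("at least one later frame and one changed area are required")
    for later in later_frames:
        for rect in changed_rects:
            if count_similar_pixels(target, later, rect, diff_threshold) < count_threshold:
                return False  # one count below the threshold is enough to reject
    return True


# Target frame with a bright row; six later frames keep it, as in a static caption.
target_frame = [[200] * 8] + [[0] * 8 for _ in range(5)]
static_later = [[row[:] for row in target_frame] for _ in range(6)]
confirmed = confirm_character_appearance(target_frame, static_later, [(0, 0, 6, 8)], 30, 44)
```

Returning as soon as one count falls below the threshold mirrors the rule that a single failing comparison marks the frame as a character non-appearance frame.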
[0029]
The output unit 10 outputs a character appearance frame extraction result obtained by the processing of the first character appearance determination unit 5 and the second character appearance determination unit 9.
In the above description, the first corresponding pixel luminance difference value calculation unit 6 and the second corresponding pixel luminance difference value calculation unit 8 determine the number of pixels whose luminance difference value between corresponding pixels is smaller than a predetermined threshold. It goes without saying that determining instead the number of pixels whose luminance difference value is equal to or greater than the threshold, and changing the comparison between that number and the number threshold accordingly, is substantially equivalent and allows the present invention to be carried out in the same way.
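The equivalence stated above can be written out explicitly. The symbols are introduced here only for illustration and do not appear in the original: $N$ is the number of pixels in a partial rectangular area, $\theta_d$ the difference value threshold, $\theta_n$ the number threshold, and $n_{<}$, $n_{\ge}$ the numbers of pixels whose luminance difference value is below, or at least, $\theta_d$.

```latex
\begin{align*}
n_{<} + n_{\ge} &= N \\
n_{<} < \theta_n &\iff n_{\ge} > N - \theta_n
  && \text{(partial scene change condition)} \\
n_{<} \ge \theta_n &\iff n_{\ge} \le N - \theta_n
  && \text{(stability condition for later frames)}
\end{align*}
```

For instance, with the 48-pixel area of FIG. 3 and an assumed number threshold $\theta_n = 12$, requiring fewer than 12 similar pixels is the same as requiring more than 36 dissimilar pixels.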
[0030]
[Effects of the invention]
As described above, according to the present invention, the corresponding-pixel luminance difference values are calculated between a character appearance frame extracted from the video by a predetermined method and the frame one unit time after it. This makes it possible to detect the continuous appearance of characters, which was difficult with the conventional methods, so character appearance frames can be extracted with higher accuracy than before.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of an apparatus that implements the present invention.
FIG. 2 is a diagram illustrating an example of a function of a partial rectangular area dividing unit.
FIG. 3 is a diagram illustrating an example of a function of a first corresponding pixel luminance difference value calculation unit.
FIG. 4 is a diagram illustrating an example of a function of a second corresponding pixel luminance difference value calculation unit.
[Explanation of symbols]
1 Moving image input storage unit
2 Partial rectangular area dividing unit
3 Character appearance detection processing target time designation unit
4 Character appearance detection processing start control unit
5 First character appearance determination unit
6 First corresponding pixel luminance difference value calculation unit
7 Partial scene change determination unit
8 Second corresponding pixel luminance difference value calculation unit
9 Second character appearance determination unit
10 Output unit for the character appearance frame extraction processing result

Claims

A character appearance frame extraction device for extracting frames containing characters from a plurality of frames constituting a moving image, comprising:
first means for inputting and storing the moving image frame by frame;
second means for providing a plurality of layers for frame division and dividing each frame input using the first means, layer by layer, into partial rectangular areas whose size differs from layer to layer;
third means for designating a character appearance detection processing target time;
fourth means for deciding to proceed to the fifth means if the frame at the time one unit time before the character appearance detection processing target time designated using the third means is not a character appearance frame, and deciding to proceed to the sixth means if the frame one unit time before that time is a character appearance frame;
fifth means for judging, by a predetermined character appearance detection method, whether a character appearance has occurred in the frame at the character appearance detection processing target time;
sixth means for determining, for each pair of corresponding partial rectangular areas in the frame at the character appearance detection processing target time and the character appearance frame at the time one unit time earlier, the number of pixels whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold;
seventh means for determining whether the frame at the character appearance detection processing target time satisfies the condition that there exists a partial rectangular area in which the number of pixels obtained using the sixth means is smaller than a preset number threshold, and, if the condition is satisfied, judging the frame to be a partial scene change occurrence frame;
eighth means for determining, when the frame at the character appearance detection processing target time has been judged to be a partial scene change occurrence frame using the seventh means, between that frame and the frames corresponding to one or more times separated from it toward later times by one or more preset time intervals, the number of pixels whose luminance difference value between corresponding pixels is smaller than a preset difference value threshold, in the partial rectangular areas that satisfied the partial scene change condition in the seventh means;
ninth means for determining whether the frame at the character appearance detection processing target time satisfies the condition that the numbers of pixels calculated using the eighth means are all equal to or greater than a preset number threshold, and, if the condition is satisfied, judging the frame to be a character appearance frame; and
means for outputting the frames judged to be character appearance frames using the fifth means and the ninth means.