JP2003298981A

JP2003298981A - Digest image generating apparatus, digest image generating method, digest image generating program, and computer-readable storage medium for storing the digest image generating program

Info

Publication number: JP2003298981A
Application number: JP2002101196A
Authority: JP
Inventors: Shozo Hirose; 尚三廣瀬; Mitsuhiro Hori; 充宏堀; Shingo Fujiwara; 紳吾藤原; Motohide Yasukawa; 元英安川; Masahiro Inui; 昌弘乾
Original assignee: Ogis Ri Co Ltd
Current assignee: Ogis Ri Co Ltd
Priority date: 2002-04-03
Filing date: 2002-04-03
Publication date: 2003-10-17

Abstract

PROBLEM TO BE SOLVED: To provide a digest image generating apparatus and method capable of generating an appropriate digest image according to a category. SOLUTION: A plurality of feature extraction sections 14, 16, 18, 20, 22 extract a plurality of mutually different kinds of features, such as scene changes, camera motion, the luminance and color of a designated area, telops, and sound, from an original image. A scene extraction condition corresponding to a selected category is read from a library 26, a playlist listing feature information of scenes whose features satisfy the condition is produced, and particular scenes are selected from the original image according to the playlist to generate a digest image. COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[Field of the Invention] The present invention relates to a summary image creating apparatus, a summary image creating method, a summary image creating program, and a computer-readable storage medium storing the summary image creating program, and more particularly to such an apparatus, method, program, and storage medium for creating a summary image from a moving image.

[0001]

[Prior Art] Japanese National Publication of International Patent Application No. H10-507554, Japanese Unexamined Patent Application Publication No. H10-232884, and Japanese Unexamined Patent Application Publication No. H11-75146 disclose apparatuses that create a summary image from a moving image so that the outline of a recorded television program, movie, or the like can be checked in a short time.

[0002]

[Problems to Be Solved by the Invention] However, these apparatuses sometimes cannot create an appropriate summary image, depending on the category of the program, such as news, music, or sumo. The above-mentioned Japanese Unexamined Patent Application Publications No. H10-232884 and No. H11-75146 suggest creating an appropriate summary image according to the category, but they describe no specific method for doing so.

[0003] An object of the present invention is to provide a summary image creating apparatus, a summary image creating method, a summary image creating program, and a computer-readable storage medium storing the summary image creating program, each capable of creating an appropriate summary image according to a category.

[0004]

[Means for Solving the Problems] A summary image creating apparatus according to the present invention creates a summary image from a moving image and includes one or more feature extracting means and a scene extracting means. The one or more feature extracting means extract one or more types of features from the video or audio signal of the moving image. The scene extracting means refers to the features extracted by the feature extracting means and extracts, from the video or audio signal, scenes having features specific to the category of the moving image.

[0005] A summary image creating method according to the present invention creates a summary image from a moving image and includes a feature extraction step of extracting one or more types of features from the video or audio signal of the moving image, and a scene extraction step of referring to the extracted features and extracting, from the video or audio signal, scenes having features specific to the category of the moving image.

[0006] A summary image creating program according to the present invention causes a computer to execute the above steps.

[0007] A computer-readable storage medium according to the present invention stores the above summary image creating program.

[0008] With this summary image creating apparatus, summary image creating method, summary image creating program, and computer-readable storage medium storing the summary image creating program, one or more types of features are extracted from the video or audio signal of the moving image, and scenes having features specific to the category of the moving image are extracted by referring to the extracted features.

[0009] As a result, according to the present invention, an appropriate summary image can be created according to the category even when the categories of moving images vary.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. Identical or corresponding parts in the drawings are given the same reference numerals, and their description is not repeated.

[0011] 1. Configuration
FIG. 1 is a block diagram showing the configuration of a summary image creating apparatus according to an embodiment of the present invention. Referring to FIG. 1, the summary image creating apparatus creates a summary image from moving images stored in a data library 10. Specifically, the apparatus includes: an MPEG parsing unit 12 that decomposes an MPEG-format video stream into a plurality of layers; a scene change feature extraction unit 14 that extracts scene change feature data, related to scene changes, from the video stream; a camera motion feature extraction unit 16 that extracts camera motion feature data, related to camera motion, from the video stream; a designated area feature extraction unit 18 that extracts designated area feature data, related to the luminance or color of a designated area within a frame, from the video stream; a telop feature extraction unit 20 that extracts telop feature data, related to telops, from the video stream; an audio feature extraction unit 22 that extracts audio feature data, related to audio, from the video stream; and a scene extraction unit 24 that refers to the features extracted by the feature extraction units 14, 16, 18, 20, 22 and extracts scenes having features specific to the category of the moving image from the video stream and the audio stream.

[0012] The scene extraction unit 24 includes: a condition library 26 that stores a plurality of scene extraction conditions corresponding to a plurality of categories; a scene identification information creation unit 28 that creates, according to the scene extraction condition corresponding to the category selected from among the plurality of categories, the scene identification information needed to identify the scenes to be extracted; a playlist library 30 that stores a playlist listing the created scene identification information; and a scene selection unit 32 that selects, from the video stream and the audio stream, the scenes identified by the scene identification information in the playlist.

[0013] 2. Operation
Next, the operation of this summary image creating apparatus is described with reference to the flowchart shown in FIG. 2.

[0014] The data library 10 stores, as moving images, digital data of recorded television programs such as news, music, baseball, drama, and sumo. In practice, the data library 10 is formed on a DVD (digital versatile disc) or a computer hard disk.

[0015] First, content from the data library 10 is input to the summary image creating apparatus (S1). An encoder (not shown) converts the input content into an MPEG-format video stream (S2) and into a WAVE-format audio stream. The video stream is input to the MPEG parsing unit 12 and decomposed into a plurality of layers, such as I frames and P frames (S3). The I frames are input to the scene change feature extraction unit 14, the designated area feature extraction unit 18, and the telop feature extraction unit 20 (S4). The P frames are input to the camera motion feature extraction unit 16 and the telop feature extraction unit 20 (S4). The audio stream is input to the audio feature extraction unit 22 (S4). These feature extraction units then extract a plurality of mutually different types of features from the video stream and the audio stream (S5). The operation of each feature extraction unit is described below.

[0016] (1) Scene change feature extraction
The scene change feature extraction unit 14 creates a luminance histogram based on the DCT (discrete cosine transform) coefficients of the I frames. Specifically, from the 64 DCT coefficients of each block of 8 dots × 8 dots, it extracts the DC component, which represents the average luminance of the block, and creates a histogram of luminance within each frame. It then calculates the difference between the luminance histogram of each frame and that of the next frame, and calculates the variance of that difference in the time direction. This variance is the scene change feature data, and it tends to be large when the scene changes significantly.
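The histogram-difference-variance computation described in (1) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes the per-block DC values have already been decoded and scaled to 0-255, and the bin count, the sum-of-absolute-differences measure, the sliding window length, and all function names are assumptions introduced here.

```python
from statistics import pvariance


def luminance_histogram(dc_values, bins=16, max_value=256):
    """Histogram of per-block DC (average luminance) values for one frame."""
    hist = [0] * bins
    for v in dc_values:
        idx = min(int(v * bins / max_value), bins - 1)
        hist[idx] += 1
    return hist


def histogram_difference(h1, h2):
    """Sum of absolute bin differences between two histograms (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def scene_change_feature(frames_dc, window=3):
    """Variance over time (sliding window) of consecutive-frame histogram differences."""
    hists = [luminance_histogram(f) for f in frames_dc]
    diffs = [histogram_difference(a, b) for a, b in zip(hists, hists[1:])]
    return [pvariance(diffs[i:i + window])
            for i in range(len(diffs) - window + 1)]


# Four dark frames followed by four bright frames: the feature rises
# only in the windows that contain the cut.
features = scene_change_feature([[10] * 4] * 4 + [[200] * 4] * 4)
```

A caller would compare each value against a threshold of its own choosing to decide where a scene change occurs.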

[0017] (2) Camera motion feature extraction
The camera motion feature extraction unit 16 applies an affine transformation to the motion vectors of the P frames to calculate camera motion feature data. From this camera motion feature data, the movement (motion) of the camera that captured the image can be determined: whether the camera is stationary, panning or tilting up, down, left, or right, zooming in, zooming out, and so on.
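One way to picture how motion vectors can yield a camera-motion label is a least-squares fit of a reduced motion model. The sketch below is an assumption-laden illustration, not the method of the patent: it fits only a zoom factor and a translation (a subset of a full affine model), it treats each motion vector as the displacement of image content between frames so that outward flow means zooming in, and the thresholds and function names are hypothetical.

```python
def fit_zoom_pan(vectors):
    """Least-squares fit of vx = s*(x - cx) + tx, vy = s*(y - cy) + ty.

    vectors: list of (x, y, vx, vy), one entry per macroblock.
    (cx, cy) is the centroid of the macroblock positions.
    """
    n = len(vectors)
    cx = sum(v[0] for v in vectors) / n
    cy = sum(v[1] for v in vectors) / n
    tx = sum(v[2] for v in vectors) / n
    ty = sum(v[3] for v in vectors) / n
    num = sum((x - cx) * vx + (y - cy) * vy for x, y, vx, vy in vectors)
    den = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y, _, _ in vectors)
    s = num / den if den else 0.0
    return s, tx, ty


def classify_motion(s, tx, ty, zoom_eps=0.01, pan_eps=0.5):
    """Map fitted parameters to a coarse label (thresholds are assumptions)."""
    if abs(s) > zoom_eps:
        return "zoom in" if s > 0 else "zoom out"
    if abs(tx) > pan_eps or abs(ty) > pan_eps:
        return "pan/tilt"
    return "static"


grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
zoom = fit_zoom_pan([(x, y, 0.1 * x, 0.1 * y) for x, y in grid])
pan = fit_zoom_pan([(x, y, 2.0, 0.0) for x, y in grid])
still = fit_zoom_pan([(x, y, 0.0, 0.0) for x, y in grid])
```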

[0018] (3) Designated area feature extraction
Based on the DCT coefficients of the I frames, the designated area feature extraction unit 18 calculates designated area feature data that is useful for determining whether the luminance or color values in one or more areas designated in advance within the frame fall within a predetermined range.

[0019] FIG. 3 shows an example of a sumo bout screen. For the sumo category, such bout screens need to be included in the summary image. In a sumo bout screen, the spectator seats typically occupy roughly the upper three fifths of the screen and the ring (dohyo) roughly the lower two fifths. Two designated areas 34 and 36 are therefore set in the upper and lower parts of the frame. In a bout screen, the upper designated area 34 tends to be dark because it shows the spectator seats, and the lower designated area 36 tends to be bright because it shows the ring. Accordingly, if at least 80% of the pixels in the upper designated area 34 have low luminance and at least 90% of the pixels in the lower designated area 36 have high luminance, the screen can be judged to be a bout screen.

[0020] FIG. 4 is a flowchart showing the specific processing for designated area feature extraction.

[0021] First, it is determined whether the check described below has been performed for all pixels in the frame (S10). If not all pixels have been checked, the position of a pixel in the image and its luminance value are input (S11).

[0022] Next, it is determined whether the check described below has been performed for all designated areas in the frame (the two designated areas 34 and 36 in the example of FIG. 3) (S12). If not all designated areas have been checked, it is checked whether the position of the input pixel lies within the designated area (S13). If it does not, the process returns to step S12; if it does, the process proceeds to step S15 (S14).

[0023] If the position of the input pixel lies within the designated area, it is checked whether the luminance value of that pixel is within a predetermined range (S15). For the screen shown in FIG. 3, the predetermined range for the upper designated area 34 is centered on a low luminance value typical of spectator seats, and the predetermined range for the lower designated area 36 is centered on a high luminance value typical of the ring. If the luminance value is not within the predetermined range, the process returns to step S12; if it is, the process proceeds to step S17.

[0024] The pixel count is then incremented by one (S17). In this way, the number of pixels within the designated area whose luminance values fall within the predetermined range is counted.

[0025] If it is determined in step S12 that the above check has been completed for all designated areas, the process returns to step S10. If it is then determined in step S10 that the above check has been completed for all pixels in the frame, the process proceeds to step S18.

[0026] Next, for each designated area, the ratio of the number of pixels counted above to the total number of pixels in the designated area is calculated (S18). It is then determined whether the calculated ratio is equal to or greater than a predetermined threshold (S19). If it is, the flag assigned to that designated area is set to "1" (S20); otherwise the flag is set to "0" (S21).
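Steps S10 to S21 can be sketched as below. This is a minimal illustration under stated assumptions: the frame is a decoded 2-D list of luminance values, the loop runs per area rather than per pixel (which yields the same counts), and the luminance ranges (0-80 for "dark", 170-255 for "bright") and the function names are hypothetical. Only the 80% and 90% ratios and the three-fifths split come from the description of FIG. 3.

```python
def region_flag(frame, region, lum_range, ratio_threshold):
    """Return 1 if enough pixels inside `region` have luminance in `lum_range`.

    frame: 2-D list of luminance values, indexed frame[y][x].
    region: (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = region
    lo, hi = lum_range
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if lo <= frame[y][x] <= hi:   # S15: luminance within range?
                count += 1                # S17: count the pixel
    total = (x1 - x0) * (y1 - y0)
    ratio = count / total                 # S18: ratio for this area
    return 1 if ratio >= ratio_threshold else 0  # S19 to S21


def is_bout_frame(frame, width, height):
    """Dark upper three fifths and bright lower two fifths (ranges assumed)."""
    split = height * 3 // 5
    upper = region_flag(frame, (0, 0, width, split), (0, 80), 0.80)
    lower = region_flag(frame, (0, split, width, height), (170, 255), 0.90)
    return upper == 1 and lower == 1


bout = [[30] * 10 for _ in range(6)] + [[220] * 10 for _ in range(4)]
gray = [[128] * 10 for _ in range(10)]
```

Here `is_bout_frame(bout, 10, 10)` is true and `is_bout_frame(gray, 10, 10)` is false.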

[0027] Next, it is determined whether the number of consecutive frames with flag = 1 is equal to or greater than a predetermined threshold (S22). If it is, the frame is judged to be the start frame of a scene to be extracted (S23). If it is not, it is determined whether the number of consecutive frames with flag = 0 is equal to or greater than a predetermined threshold (S24). If it is, the frame is judged to be the end frame of the scene to be extracted (S25).
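The run-length test of steps S22 to S25 could look like the sketch below. The run thresholds are hypothetical, and two choices are assumptions of this sketch rather than statements of the patent: the start is back-dated to the first frame of the qualifying run, and the end is placed at the last frame before the qualifying run of zeros.

```python
def find_scenes(flags, min_on=3, min_off=3):
    """Turn per-frame 0/1 flags into (start_frame, end_frame) pairs."""
    scenes = []
    start = None
    run_on = run_off = 0
    for i, flag in enumerate(flags):
        if flag:
            run_on += 1
            run_off = 0
        else:
            run_off += 1
            run_on = 0
        if start is None and run_on >= min_on:            # S22 -> S23
            start = i - min_on + 1
        elif start is not None and run_off >= min_off:    # S24 -> S25
            scenes.append((start, i - min_off))
            start = None
    if start is not None:
        # Stream ended while a scene was still open.
        scenes.append((start, len(flags) - 1))
    return scenes


# The single 0 at index 5 is too short to end the scene.
scenes = find_scenes([0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0])
```

Requiring a run of frames on both edges keeps a single misclassified frame from splitting or spuriously starting a scene.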

[0028] Although luminance information is used in the above example, color information may be used together with, or instead of, the luminance information.

[0029] (4) Telop feature extraction
Based on the DCT coefficients of the I frames and the motion vectors of the P frames, the telop feature extraction unit 20 calculates telop feature data that is useful for determining whether a telop (on-screen caption) is present on the screen.

[0030] FIG. 5 is a flowchart showing the specific operation for telop feature extraction.

[0031] First, it is determined whether the processing described below has been performed for all macroblocks in the frame (S30). As shown in FIG. 6, in MPEG one image consists of a plurality of macroblocks, each of 16 dots × 16 dots.

[0032] If not all macroblocks have been processed, the DCT coefficients of a macroblock are input from the I frame (S31). As shown in FIG. 6, each macroblock consists of four blocks, each of 8 dots × 8 dots. In MPEG, the actual color is obtained by decoding the 64 (= 8 × 8) DCT coefficients in each block.

[0033] Next, for each block, the variance of the 63 DCT coefficients other than the DC coefficient (which is assigned to the dot in the upper left corner of the block shown in FIG. 6) is calculated (S32). This DCT coefficient variance is calculated for each of the four blocks.

[0034] Next, the maximum of the four calculated variances is selected (S33). If that maximum is greater than a predetermined threshold, the flag assigned to the macroblock is set (S34). Because telop portions usually show characters in one color or a few colors, the maximum variance calculated here tends to be large.
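Steps S32 to S34 amount to a per-macroblock texture test, sketched below. The sketch assumes each 8 × 8 block arrives as a flat list of 64 dequantized DCT coefficients with the DC term at index 0; the threshold value and the function names are hypothetical.

```python
from statistics import pvariance


def ac_variance(block):
    """Variance of the 63 AC coefficients of one block (DC term at index 0)."""
    return pvariance(block[1:])


def macroblock_flag(blocks, threshold):
    """S33/S34: flag the macroblock if its largest per-block AC variance is high."""
    return 1 if max(ac_variance(b) for b in blocks) > threshold else 0


flat = [100] + [0] * 63                     # uniform block: no AC energy
textured = [100] + [50, -50] * 31 + [50]    # strong AC energy, e.g. character edges
```

With these inputs, `macroblock_flag([flat, flat, flat, textured], 1000)` returns 1 and `macroblock_flag([flat] * 4, 1000)` returns 0.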

[0035] If it is determined in step S30 that the above processing has been completed for all macroblocks, the ratio of flagged macroblocks within a designated area is calculated (S35). Because telops are usually displayed in the lower center of the screen, the designated area is set around that position.

[0036] Next, if the ratio of flags is greater than a predetermined threshold, the flag assigned to the designated area is set (S36).

[0037] Next, the variance of the X coordinates (horizontal screen direction) of the flagged macroblocks is calculated (S37). Because telops usually extend horizontally across the screen, the X-coordinate variance calculated here tends to be large.

[0038] Next, the variance of the calculated variance in the time direction is calculated, and it is checked whether it is stable (S38). Because a telop usually stays on screen for a certain period of time, the variance in the time direction calculated here tends to be stable.

[0039] Next, it is determined whether the flag of the designated area is set and the variance in the time direction is stable (S39). If the flag is set and the variance is stable, the frame is judged to be the start frame of the telop and is held (S40). Otherwise, it is determined whether a start frame is being held (S41). If a start frame is being held, the frame is judged to be the end frame of the telop, and the start frame and the end frame are output (S42).
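The frame-level logic of steps S35 to S42 might be sketched as follows. Several details are assumptions made for illustration only: macroblock coordinates are integer grid positions, "stable" is interpreted as the variance of the last few X-variance values staying below a hypothetical threshold, and the held start frame is cleared once an interval has been output.

```python
from statistics import pvariance


def frame_telop_stats(flagged, region, ratio_threshold):
    """S35 to S37 for one frame.

    flagged: set of (mx, my) macroblock coordinates whose flag is set.
    region: (mx0, my0, mx1, my1) in macroblock units, upper bounds exclusive.
    Returns (region_hit, x_variance).
    """
    mx0, my0, mx1, my1 = region
    inside = [p for p in flagged if mx0 <= p[0] < mx1 and my0 <= p[1] < my1]
    total = (mx1 - mx0) * (my1 - my0)
    region_hit = len(inside) / total > ratio_threshold   # S35, S36
    xs = [mx for mx, _ in flagged]
    x_var = pvariance(xs) if xs else 0.0                 # S37
    return region_hit, x_var


def telop_intervals(stats, window=3, stability_threshold=1.0):
    """S38 to S42: stats is a per-frame list of (region_hit, x_variance)."""
    intervals = []
    start = None
    for i, (hit, _) in enumerate(stats):
        recent = [xv for _, xv in stats[max(0, i - window + 1):i + 1]]
        stable = len(recent) == window and pvariance(recent) < stability_threshold
        if hit and stable:                 # S39
            if start is None:
                start = i                  # S40: hold the start frame
        elif start is not None:            # S41
            intervals.append((start, i))   # S42: output start and end
            start = None
    return intervals


row = {(x, 8) for x in range(2, 10)}       # one horizontal row of flagged blocks
hit, x_var = frame_telop_stats(row, (0, 7, 12, 9), 0.25)
found = telop_intervals([(False, 0.0)] * 2 + [(True, 20.0)] * 5 + [(False, 0.0)] * 2)
```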

[0040] (5) Audio feature extraction
The audio feature extraction unit 22 calculates audio feature data that is useful for identifying the type of audio (conversation, music, and so on), such as the frequency of the audio obtained from the audio stream and its variance, and the amplitude and its variance.
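As a rough illustration of "frequency and its variance, amplitude and its variance", the sketch below computes per-window RMS amplitude and a zero-crossing frequency estimate, then their mean and variance across windows. The zero-crossing estimator, the window length, and the function names are assumptions of this sketch; the patent does not say how frequency is measured.

```python
import math
from statistics import mean, pvariance


def window_features(samples, sample_rate):
    """RMS amplitude and zero-crossing frequency estimate for one window."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    freq = crossings * sample_rate / (2 * len(samples))
    return rms, freq


def audio_features(samples, sample_rate, window=1024):
    """Mean and variance of amplitude and frequency over consecutive windows."""
    if len(samples) < window:
        raise ValueError("need at least one full window of samples")
    feats = [window_features(samples[i:i + window], sample_rate)
             for i in range(0, len(samples) - window + 1, window)]
    amps = [a for a, _ in feats]
    freqs = [f for _, f in feats]
    return {
        "amp_mean": mean(amps), "amp_var": pvariance(amps),
        "freq_mean": mean(freqs), "freq_var": pvariance(freqs),
    }


# A steady 440 Hz tone: stable amplitude and frequency, hence small variances.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(8192)]
tone_features = audio_features(tone, rate)
```

Under this sketch, speech would be expected to show larger amplitude and frequency variances than a steady tone, which is the kind of difference a scene extraction condition could test.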

[0041] As a result of the feature extraction described above, the feature extraction units 14, 16, 18, 20, 22 output their respective feature data, as shown in FIGS. 1 and 2 (S6).

[0042] Next, the category of the moving image is selected according to a user operation (S7).

[0043] FIG. 7 shows the contents of the condition library 26. The condition library 26 stores a plurality of scene extraction conditions corresponding to a plurality of categories. For the news category, for example, a summary image corresponding to the news headlines needs to be created. In this example, the stored scene extraction conditions are: for the audio feature data, "the audio contains a human voice"; for the scene change feature data, "there is no scene change"; for the camera motion feature data, "there is no motion"; and for the telop feature data, "a telop appears in the lower part of the screen in the latter half of a given segment".

[0044] The scene extraction condition corresponding to the category selected in step S7 is read from this condition library 26.
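A condition library of this kind can be pictured as a mapping from category to a set of required feature states. The sketch below is hypothetical in its key names, its boolean encoding, and its function names; only the four news conditions themselves are taken from the description of FIG. 7.

```python
CONDITION_LIBRARY = {
    "news": {
        "has_speech": True,          # audio contains a human voice
        "scene_change": False,       # no scene change
        "camera_moving": False,      # no camera motion
        "telop_lower_late": True,    # telop in lower screen, latter half of segment
    },
}


def load_conditions(category, library=CONDITION_LIBRARY):
    """Read the scene extraction condition for the selected category."""
    try:
        return library[category]
    except KeyError:
        raise ValueError(f"no scene extraction condition for {category!r}") from None


def satisfies(segment, conditions):
    """True if every required feature state matches the segment's features."""
    return all(segment.get(key) == value for key, value in conditions.items())


headline = {"has_speech": True, "scene_change": False,
            "camera_moving": False, "telop_lower_late": True}
field_report = dict(headline, camera_moving=True)
news = load_conditions("news")
```

With these inputs, `satisfies(headline, news)` is true and `satisfies(field_report, news)` is false.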

[0045] Next, based on the feature extraction data supplied by the feature extraction units 14, 16, 18, 20, 22, the scene identification information creation unit 28 creates the scene identification information needed to identify the scenes that satisfy the scene extraction condition read from the condition library 26. The scene identification information can be, for example, the time from the first frame to the frame in question or the sequence number of the frame. The created scene identification information is recorded sequentially in the playlist library 30, which completes a playlist identifying the scenes to be included in the summary image. The playlist contains no image content; it is an index file recording pointers to the frames that make up the summary image.

[0046] When the summary image is actually created, the playlist is output from the playlist library 30, and the scene selection unit 32 selects specific scenes from the moving image according to this playlist (S9).
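The playlist as an index without image content, and the later selection of frames from it, can be sketched as follows. The entry layout, the default frame rate, and the function names are assumptions; the patent states only that time offsets from the first frame or frame sequence numbers can serve as scene identification information.

```python
def build_playlist(matching_scenes, fps=29.97):
    """Create index entries for scenes given as (start_frame, end_frame) pairs."""
    return [
        {
            "start_frame": start,
            "end_frame": end,
            "start_time": round(start / fps, 3),   # seconds from the first frame
            "end_time": round(end / fps, 3),
        }
        for start, end in matching_scenes
    ]


def select_frames(playlist):
    """Expand the playlist into the frame numbers that make up the summary."""
    frames = []
    for entry in playlist:
        frames.extend(range(entry["start_frame"], entry["end_frame"] + 1))
    return frames


playlist = build_playlist([(30, 59), (300, 314)], fps=30)
summary_frames = select_frames(playlist)
```

Because the playlist holds only frame indexes, it can be stored and reused without copying any video data.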

[0047] 3. Other embodiments
In the embodiment described above, the category is selected according to a user operation. Because the user usually knows the category of the moving image, having the user select the category yields a more appropriate summary image. However, when a single moving image contains images of several categories, the user has to select a category for each of them, which is cumbersome.

[0048] Step S71 shown in FIG. 8 may therefore be used in place of step S7 shown in FIG. 2. In step S71, the category is selected automatically.

[0049] FIG. 9 shows a method of selecting the category automatically. First, a plurality of types of features are extracted from the moving image by the same method as above. Because this moving image contains images of several categories, features specific to each category are extracted. Each extracted feature has a plurality of parameters, such as a time record and parameters specific to that feature extraction; if there are N parameters, the feature can be expressed as an N-dimensional feature vector. One or more template feature vectors per category are prepared in advance from feature extraction results obtained from past programs and the like in various categories, and are stored in a database.

[0050] A feature vector is created for a given segment of an arbitrary moving image, its distance to or correlation coefficient with each feature vector in the database is computed, and the database feature vector with a short inter-vector distance or the highest correlation coefficient is selected. If the category of the selected feature vector is, for example, news, the category of that segment can be determined to be news.
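The template matching described above reduces to a nearest-neighbor lookup, sketched below with Euclidean distance. The template values, the vector layout, and the function names are made up for illustration; a correlation coefficient could be substituted for the distance, as the text allows.

```python
import math


def euclidean(a, b):
    """Distance between two N-dimensional feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_category(segment_vector, templates):
    """Return the category of the closest template.

    templates: list of (category, vector); a category may appear more than once.
    """
    category, _ = min(templates, key=lambda t: euclidean(segment_vector, t[1]))
    return category


# Hypothetical 4-dimensional templates.
TEMPLATES = [
    ("news", [1.0, 0.0, 0.0, 1.0]),
    ("sumo", [0.0, 1.0, 1.0, 0.0]),
    ("sumo", [0.2, 0.8, 1.0, 0.1]),
]
chosen = select_category([0.9, 0.1, 0.2, 0.8], TEMPLATES)
```

Here `chosen` is "news", because the first template is the closest to the segment vector.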

[0051] According to the embodiments described above, an appropriate summary image can be created according to the category in most cases. However, even within the same category, the extracted features vary from program to program, so an appropriate summary image sometimes cannot be created.

[0052] Therefore, as shown in FIG. 10, the feature extraction conditions of each feature extraction unit may be adjusted (S50), features may be extracted under the adjusted feature extraction conditions (S4), the scene extraction conditions may then be adjusted (S51), and scene identification information may be created under the adjusted scene extraction conditions (S8). The circled numbers in FIG. 10 indicate the order of processing.

[0053] To adjust the feature extraction conditions automatically, the difference between the feature vector extracted under the initial feature extraction conditions and one or more feature vectors in the database that are close to it or highly correlated with it is computed, and the feature extraction conditions are fine-tuned so that this difference becomes smaller.

[0054] Next, features are extracted using the fine-tuned feature extraction conditions, a feature vector is obtained, and the database feature vector that is closest or most highly correlated is selected. A slight difference remains at this point; it is stored as a feature vector difference. If the selected feature vector belongs to, for example, news, scene identification information is created using the scene extraction conditions for news, with each parameter of the scene extraction conditions fine-tuned using predefined adjustment coefficients according to the previously obtained feature vector difference.

[0055] The functions in the block diagram shown in FIG. 1 and the steps in the flowcharts shown in FIGS. 2, 4, 5, 8, and 10 can be realized by causing a computer to execute a predetermined program. This program can be distributed on a computer-readable recording medium such as a CD-ROM, and it can also be distributed over the Internet or the like.

[0056] Although embodiments of the present invention have been described above, these embodiments are merely examples for carrying out the present invention. The present invention is therefore not limited to the embodiments described above and can be carried out by modifying them as appropriate without departing from its spirit.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a summary image creating device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the image creating device shown in FIG. 1.

FIG. 3 is a diagram showing an example of a screen on which designated areas are set by the designated area feature extraction unit in FIG. 1.

FIG. 4 is a flowchart showing the operation of the designated area feature extraction unit in FIG. 1.

FIG. 5 is a flowchart showing the operation of the telop feature extraction unit in FIG. 1.

FIG. 6 is a diagram showing the structure of an MPEG macroblock used in the telop feature extraction shown in FIG. 5.

FIG. 7 is a diagram showing the specific contents of the condition library in FIG. 1.

FIG. 8 is a flowchart showing the operation of a summary screen creating device according to another embodiment of the present invention.

FIG. 9 is a diagram showing the method of automatic category selection in FIG. 8.

FIG. 10 is a flowchart showing the operation of a summary screen creating device according to yet another embodiment of the present invention.

[Explanation of symbols]

10 data library
14 scene change feature extraction unit
16 camera motion feature extraction unit
18 designated area feature extraction unit
20 telop feature extraction unit
22 audio feature extraction unit
24 scene extraction unit
26 condition library
28 scene identification information creation unit
30 playlist library
32 scene selection unit
34, 36 designated areas

Continued from front page
(72) Inventor: Shingo Fujiwara, c/o OGIS-RI Co., Ltd., 2-37 Chiyozaki 3-chome Minami, Nishi-ku, Osaka
(72) Inventor: Motohide Yasukawa, c/o OGIS-RI Co., Ltd., 2-37 Chiyozaki 3-chome Minami, Nishi-ku, Osaka
(72) Inventor: Masahiro Inui, c/o OGIS-RI Co., Ltd., 2-37 Chiyozaki 3-chome Minami, Nishi-ku, Osaka
F-term (reference): 5B075 ND12 NS01; 5C052 AA01 AC08 CC11 DD04; 5C053 FA06 FA14 FA23 GA11 GB37 HA30

Claims

[Claims]

1. A summary image creating device for creating a summary image from a moving image, comprising: one or more feature extracting means for extracting one or more types of features from a video or audio signal of the moving image; and scene extracting means for extracting, from the video or audio signal, a scene having a feature peculiar to the category of the moving image with reference to the features extracted by the feature extracting means.

2. The summary image creating device according to claim 1, wherein one of the feature extracting means is a scene change feature extracting means for extracting a feature associated with a scene change from the video signal.

3. The summary image creating device according to claim 1, wherein one of the feature extracting means is a camera motion feature extracting means for extracting a feature related to camera motion from the video signal.

4. The summary image creating device according to claim 1, wherein one of the feature extracting means is a designated area feature extracting means for extracting, from the video signal, a feature related to the luminance or color of a designated area in a frame.

5. The summary image creating device according to claim 1, wherein one of the feature extracting means is a telop feature extracting means for extracting a feature associated with a telop from the video signal.

6. The summary image creating device according to claim 1, wherein one of the feature extracting means is an audio feature extracting means for extracting a feature related to audio from the audio signal.

7. The summary image creating device according to claim 1, wherein the scene extracting means comprises: scene specifying information creating means for creating scene specifying information necessary for specifying a scene to be extracted; and scene selecting means for selecting, from the video or audio signal, the scene specified by the scene specifying information.

8. The summary image creating device according to claim 7, wherein the scene extracting means further comprises storage means for storing a plurality of scene extraction conditions in association with a plurality of categories, and the scene specifying information creating means creates the scene specifying information in accordance with the scene extraction condition corresponding to a category selected from among the plurality of categories.

9. The summary image creating device according to claim 8, wherein the scene extracting means further comprises category selecting means for selecting one of the plurality of categories based on the features extracted by the feature extracting means.

10. The summary image creating device according to claim 1, further comprising feature extraction condition adjusting means for adjusting, based on the features extracted by the feature extracting means, a feature extraction condition applied when the feature extracting means extracts the features.

11. The summary image creating device according to claim 8, further comprising scene extraction condition adjusting means for adjusting the scene extraction condition based on the features extracted by the feature extracting means.

12. A summary image creating method for creating a summary image from a moving image, comprising: a feature extracting step of extracting a plurality of different features from a video or audio signal of the moving image; and a scene extracting step of extracting, from the video or audio signal, a scene having a feature peculiar to the category of the moving image with reference to the extracted features.

13. The summary image creating method according to claim 12, wherein the feature extracting step extracts a feature associated with a scene change from the video signal.

14. The summary image creating method according to claim 12, wherein the feature extracting step extracts a feature related to camera motion from the video signal.

15. The summary image creating method according to claim 12, wherein the feature extracting step extracts, from the video signal, a feature related to the luminance or color of a designated area in a frame.

16. The summary image creating method according to claim 12, wherein the feature extracting step extracts a feature associated with a telop from the video signal.

17. The summary image creating method according to claim 12, wherein the feature extracting step extracts a feature related to audio from the audio signal.

18. The summary image creating method according to claim 12, wherein the scene extracting step comprises: a scene specifying information creating step of creating scene specifying information necessary for specifying a scene to be extracted; and a scene selecting step of selecting, from the video or audio signal, the scene specified by the scene specifying information.

19. The summary image creating method according to claim 18, wherein a plurality of scene extraction conditions are stored in association with a plurality of categories, and the scene specifying information creating step creates the scene specifying information in accordance with the scene extraction condition corresponding to a category selected from among the plurality of categories.

20. The summary image creating method according to claim 19, wherein the scene extracting step further comprises a category selecting step of selecting one of the plurality of categories based on the extracted features.

21. The summary image creating method according to claim 12, further comprising a feature extraction condition adjusting step of adjusting, based on the extracted features, a feature extraction condition applied when the features are extracted.

22. The summary image creating method according to claim 19, further comprising a scene extraction condition adjusting step of adjusting the scene extraction condition based on the extracted features.

23. A summary image creating program for causing a computer to execute the steps according to any one of claims 12 to 22.

24. A computer-readable storage medium storing the summary image creating program according to claim 23.