JP4369308B2

JP4369308B2 - Representative image selection device, representative image selection method, and representative image selection program

Info

Publication number: JP4369308B2
Application number: JP2004172035A
Authority: JP
Inventors: 恭子数藤; 理絵山田; 裕子高橋; 賢一荒川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-06-10
Filing date: 2004-06-10
Publication date: 2009-11-18
Anticipated expiration: 2024-06-10
Also published as: JP2005352718A

Description

The present invention relates to a technique for attaching, to a moving image of an object captured by a camera, an index for making use of that moving image. In particular, it relates to a representative image selection device, a representative image selection method, and a representative image selection program that determine a target object in the moving image, select the frame in which the target object and the background can be separated best, and use that frame as the representative image.

Conventional techniques for indexing video according to the application, such as video search, editing, and quick viewing, include the following.

Patent Document 1 below, "Method and apparatus for creating a list of representative images of video," describes a technique for selecting representative images of a video and listing the duration and location of the scene that each representative image belongs to. Representative images are selected manually or automatically; in the automatic case, either frames at regular intervals are used or scene changes are detected. Even when scene changes are detected automatically, however, the frame at the moment of the change does not necessarily show the target object clearly, and therefore is not necessarily an appropriate representative image.

The technique described in Patent Document 2, "Shot Detection Method and Representative Image Recording / Display Device," likewise detects shot changes, groups the video into suitable units, and extracts a representative image for each unit. Images are selected on the basis of zoom, pan, color histogram, frequency, and the like. Scene changes can be detected, but the frames at those changes are not always suitable as representative images.

The technique described in Patent Document 3, "Representative image display method, representative image display device and moving image search device using this device," extracts, for a given scene, the contour of a moving object from each frame in the scene and uses a composite image in which these contours are superimposed as the representative image of the scene. However, it cannot do what the present invention described below aims at: evaluating the suitability of each frame as a representative image, or giving each frame of the video an index for selecting a representative image according to the user's intended use.

Non-Patent Document 1 and Non-Patent Document 2 below are contour extraction techniques. They can be applied to the target object contour extraction unit of the present invention described later, but they cannot evaluate the suitability of each frame as a representative image or give an index to each frame of the video.

Patent Document 1: Japanese Patent Laid-Open No. 7-79404
Patent Document 2: Japanese Patent Laid-Open No. 7-236115
Patent Document 3: Japanese Patent Laid-Open No. 9-214866
Non-Patent Document 1: 坂上, 山本, "A Dynamic Net Model 'Active Net' and Its Application to Region Extraction," Journal of the Institute of Television Engineers of Japan, 1991, vol. 45, no. 10, pp. 1155-1163
Non-Patent Document 2: 境田, 苗村, 金次, "Extraction of Moving Image Objects Using Both the Background Difference Method and Region Growing by Spatio-Temporal Watershed," IEICE Transactions DII, 2001, vol. J84-DII, no. 12, pp. 2541-2555

As described above, the conventional methods for selecting a representative image from a moving image detect scene changes and use the frame at the scene change as the representative image. However, the frame at a scene boundary defined by changes in camera pan, zoom, color, and so on does not necessarily contain the subject of greatest interest in that scene.

A representative image should be chosen so that, even when viewed on its own, it serves as a digest from which the content of the whole video can be grasped. If the original video was shot of some target object, the representative image needs to contain that object. Moreover, which of the frames containing the target object should be the representative image depends on the subsequent use, so it is desirable that the user can specify how the representative image is selected.

The present invention has been made in view of these circumstances. Its object is to make it possible to select, as the representative image, the frame in which the target object appears most clearly and without blur, and to enable selection of a representative image suited to the user's intended use.

To achieve the above object, the present invention takes a moving image as input and, for each frame, divides a region containing the target object region, which consists of one or more moving regions, into small regions. For each pixel in a small region, feature quantities including at least a motion-based feature quantity are extracted, and the extracted feature quantities are used to classify each pixel in the small region into one of two classes: the target object region, or the background region, which is everything else. Then, between adjacent small regions, classes whose feature distributions have similar shapes are treated as the same class, and the class boundaries are joined across small regions to extract the contour of the target object region. For the target object region and the background region extracted in this way, the mean of the feature distribution over all pixels in each region is computed, and the distance between the two means is used as the score indicating the suitability of each frame as a representative image.

Further, in addition to the distance, one or more of the following may be obtained as suitability scores: the degree of coincidence between the shape of the target object region and a predetermined template; the size of the target object region; a value obtained by comparing the contour length of the target object region with a predetermined reference value; or a value obtained by comparing the contour shape of the target object region with a predetermined reference value. The suitability scores, together with information on how each score was calculated, can then be attached to each frame of the moving image as an index, and the indexed moving image can be output.

The contour of the target object region is extracted as follows. First, each frame of the input moving image is divided into small regions containing part or all of the one or more target objects. Then one or more feature quantities are extracted for each pixel in each small region. Examples of feature quantities are luminance, color, and optical flow. When extracting feature quantities, for example, motion-based feature quantities are extracted using the frame being processed and its neighboring frames, or non-motion-based feature quantities are extracted using the frame being processed alone, or both are extracted.

For each small region, all or some of the feature quantities are used to classify each pixel into one of two regions, and the classification chosen is the one that gives the highest likelihood, where the likelihood indicates the degree of separation of the classification. Next, for adjacent small regions, the feature quantities are used to establish correspondences between regions assigned to the same class, and on that basis regions of the same class are joined together, so that the contour of the target object is extracted and the object is separated from the background.

With the present invention, the original image of the frame in which the target object can be separated most cleanly from the background is used as the representative image. As a result, a frame in which the target object is sharp, rather than one in which it is blurred by motion, can be selected as the representative image. A representative image obtained in this way is easy to use because the target object appears clearly.

In addition, the contour extraction method is applicable when both the foreground and the background are moving, so it can also be applied to images shot while moving with a portable camera or mobile terminal, in which both the target object and the background move.

The device of the present invention can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided over a network.

According to the present invention, a target object is extracted from the video, the suitability of each frame in the video as a representative image is evaluated by criteria based on the target object, and the suitability can be attached to the frame as an index. This makes it possible to select a representative image suited to the user's intended use.

[First Embodiment]
First, the first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 shows a configuration example of the representative image selection device, and FIG. 2 is a flowchart of its processing. The representative image selection device 1 includes a moving image input unit 11, a target object contour extraction unit 12, a representative image suitability calculation unit 13, an index assigning unit 14, a moving image output unit 15, and a contour evaluation criterion storage unit 16. Reference numeral 160 denotes the contour evaluation criterion stored in the contour evaluation criterion storage unit 16.

The moving image input unit 11 receives a moving image (step S1 in FIG. 2) and sends it frame by frame, in order, to the target object contour extraction unit 12. Because the subsequent processing is performed frame by frame, the input may be a moving image shot with a video camera or, for example, a Motion JPEG moving image shot with the continuous shooting function of a digital camera.

The target object contour extraction unit 12 extracts the contour of the target object for each frame of a time series of images such as that shown in FIG. 1(A) (step S2 in FIG. 2). By default, the target object region is the moving region. The moving region can be detected by, for example, local template matching between adjacent frames (see, for example, "Basics of Digital Image Input," Corona Publishing, Chapter 3, p. 137).
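The local template matching between adjacent frames mentioned above can be sketched as simple block matching. This is only an illustration under stated assumptions: frames are small grayscale grids given as lists of rows, and the function name `block_motion`, the block size, the search range, and the sum-of-absolute-differences cost are all choices made here, not details taken from the cited reference.

```python
def block_motion(prev, curr, block=2, search=1):
    """Estimate a displacement (dy, dx) for every block of `prev`.

    Each block is compared with shifted positions in `curr` using the sum of
    absolute differences (SAD). Ties are broken in favour of the smallest
    displacement, so a static block reports (0, 0).
    """
    h, w = len(prev), len(prev[0])
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    # Skip candidate positions that fall outside the frame.
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    sad = sum(abs(prev[by + i][bx + j] - curr[y0 + i][x0 + j])
                              for i in range(block) for j in range(block))
                    key = (sad, abs(dy) + abs(dx))
                    if best is None or key < best[0]:
                        best = (key, (dy, dx))
            motion[(by, bx)] = best[1]
    return motion


# A bright 2x2 object moves one pixel to the right between two frames.
prev_frame = [[9, 9, 0, 0, 0, 0],
              [9, 9, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0]]
curr_frame = [[0, 9, 9, 0, 0, 0],
              [0, 9, 9, 0, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0]]
motion = block_motion(prev_frame, curr_frame)
```

Blocks with a non-zero displacement are candidates for the initial moving region. Uniform background next to the object can also report a displacement, so a practical detector would add a texture or confidence check.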

With this region as the initial value of the target object region and the remaining region as the initial value of the background region, the contour of the target object region is extracted. Contour extraction techniques include, for example, the method using an active contour model described in Non-Patent Document 1 above.

The contour evaluation criterion storage unit 16 stores the contour evaluation criterion 160. The contour evaluation criterion 160 is prepared in advance to evaluate whether each frame can serve as a representative image, and is applied to the output of the target object contour extraction unit 12. The contour evaluation criterion 160 may also be supplied from an external device.

The representative image suitability calculation unit 13 calculates the suitability of each frame as a representative image from the output of the target object contour extraction unit 12 and the contour evaluation criterion 160 (step S3 in FIG. 2).

For example, the representative image suitability calculation unit 13 takes as input the final likelihood calculated by the target object contour extraction unit 12 (the goodness of fit of the feature distributions of the target object and the background) and a data array that associates the coordinates of each pixel, a flag indicating whether the pixel belongs to the background region or the target object region, the coordinates of the pixels on the contour, and the feature quantities calculated during region division. Based on the contour evaluation criterion 160, it calculates the suitability of each frame as a representative image as a score. The suitability indicates closeness to an ideal representative image, and the score is the value obtained by substituting the feature quantities into an evaluation function based on the distance between the image's feature quantities and a template.

The index assigning unit 14 attaches an index to the image of each frame (step S4 in FIG. 2), for example by writing the image of each frame received from the representative image suitability calculation unit 13, the contour of the target object extracted from that frame, the degree of separation of that contour from the background, the evaluation method used to evaluate the contour, and the resulting score into a data structure such as that shown in FIG. 3, in which the evaluation method and the score serve as the index. The moving image output unit 15 outputs the indexed moving image (step S5 in FIG. 2).

Because an index is attached to the image of each frame, the user can, for example, retrieve as representative images only those frames that scored highly under a desired evaluation method, shown hatched in FIG. 1(B).
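A per-frame index and a query over it might look like the following sketch. The layout of FIG. 3 is not reproduced in the text, so the record fields, the class name `FrameIndex`, the method names such as `"separation"` and `"shape"`, and the threshold-based query are all hypothetical and serve only to illustrate the idea of retrieving frames by evaluation method and score.

```python
from dataclasses import dataclass, field


@dataclass
class FrameIndex:
    """Hypothetical index record attached to one frame."""
    frame_no: int
    separation: float                           # separation of the contour from the background
    scores: dict = field(default_factory=dict)  # evaluation method -> score


def select_representatives(index, method, threshold):
    """Return the frame numbers whose score under `method` is at least `threshold`."""
    return [entry.frame_no for entry in index
            if entry.scores.get(method, float("-inf")) >= threshold]


index = [
    FrameIndex(0, 0.8, {"separation": 0.8, "shape": 0.4}),
    FrameIndex(1, 2.1, {"separation": 2.1, "shape": 0.9}),
    FrameIndex(2, 1.7, {"separation": 1.7}),
]
by_separation = select_representatives(index, "separation", 1.5)
by_shape = select_representatives(index, "shape", 0.5)
```

Frames that were never scored under the requested method are simply skipped, which lets different frames carry different subsets of evaluation results.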

The index assigning unit 14 may also write the image of each frame received from the representative image suitability calculation unit 13, the contour of the target object extracted from that frame, the degree of separation of that contour from the background, the evaluation method used to evaluate the contour and the resulting score, and the representative image selection condition entered through the representative image condition input unit described later, into a data structure such as that shown in FIG. 11, in which the evaluation method, the score, and the condition serve as the index.

An example of the processing of the representative image suitability calculation unit 13 using the contour evaluation criterion 160 is described with reference to FIGS. 4 and 5. Evaluation methods based on the contour evaluation criterion 160 include: (a) evaluating the shape and size of the target object region; (b) quantitatively evaluating the properties of the contour line; (c) evaluating the degree of separation between the target object region and the background region, the change of that degree of separation over time, and whether the overall likelihood output by the target object contour extraction unit 12 (the goodness of fit of the feature distributions of the target object and the background) is at or above a given value; and (d) combining two or more of these methods.

(a) Evaluation of the shape and size of the target object region
FIGS. 4(A) and 4(B) illustrate an example of evaluating the shape and size of the target object region output by the target object contour extraction unit 12. As shown in these figures, a binary image in which the background region is "0" and the target object region is "1" is created from the output of the target object contour extraction unit 12, namely the array that associates the coordinates of each pixel with a flag indicating whether the pixel belongs to the background region or the target object region, and template matching is performed (see, for example, "Introduction to Computer Image Processing," Soken Publishing, Chapter 5, p. 149). A binary template in which the inside of the shape is "1" and the background is "0" is superimposed on the binary image while its size and position are varied, and the degree of coincidence is defined as the ratio of the overlapping area to the area inside the shape at the point where the overlap is largest.
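The degree of coincidence described above can be sketched as follows. For brevity this sketch varies only the position of the template, not its size, and the function name `coincidence` and the list-of-rows mask representation are assumptions made here.

```python
def coincidence(mask, template):
    """Slide a binary template over a binary mask and return the best
    ratio of overlapping area to the area inside the template shape."""
    mh, mw = len(mask), len(mask[0])
    th, tw = len(template), len(template[0])
    shape_area = sum(map(sum, template))
    best_overlap = 0
    for oy in range(mh - th + 1):
        for ox in range(mw - tw + 1):
            # Count pixels that are "1" in both the mask and the template.
            overlap = sum(mask[oy + y][ox + x] & template[y][x]
                          for y in range(th) for x in range(tw))
            best_overlap = max(best_overlap, overlap)
    return best_overlap / shape_area


# Target object mask: a 2x2 square in the middle of a 4x4 frame.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
square2 = [[1, 1], [1, 1]]
square3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

To follow the text more closely, the same function would be called for several scaled versions of the template, keeping the size that gives the largest overlap.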

The score may be this degree of coincidence and the template size at that point, or a value calculated from them by a predetermined function. An evaluation function for other features, such as the unevenness of the contour, may also be used (see, for example, "Introduction to Computer Image Processing," Soken Publishing, Chapter 3, Section 3.4, "Figure Shape Features"). The method is not limited to these examples; any technique that can quantitatively evaluate the shape and size of the target object region may be used.

(b) Evaluation of the properties of the contour line
FIG. 4(C) illustrates an example of evaluating the properties of the contour line. The contour length, whether the contour is a closed curve, the positions of its end points, the smoothness of the contour, and so on are evaluated. The contour length is obtained by boundary tracking or a similar method (see, for example, "Image Processing Engineering: Fundamentals," Kyoritsu Shuppan, Chapter 7, p. 126).

Whether the contour is a closed curve is determined by whether its start and end points coincide. The positions of the end points are determined, as shown in FIG. 4(C), by tracing the pixels on the boundary (contour line) between the target object region and the background region and checking whether any of them takes the maximum or minimum x or y value of the image area. The smoothness of the contour is obtained with an evaluation function that uses the first or second differences of the contour.
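The contour checks above can be sketched for a contour given as an ordered list of (x, y) points. The function names and the particular smoothness measure (mean squared second difference, where a lower value means a smoother contour) are assumptions made for illustration; the text only states that first or second differences are used.

```python
import math


def is_closed(contour):
    """A contour is treated as closed when its start and end points coincide."""
    return len(contour) > 2 and contour[0] == contour[-1]


def touches_border(contour, width, height):
    """True if any contour pixel lies on the minimum or maximum x or y of the image."""
    return any(x in (0, width - 1) or y in (0, height - 1) for x, y in contour)


def contour_length(contour):
    """Sum of distances between consecutive contour points."""
    return sum(math.dist(p, q) for p, q in zip(contour, contour[1:]))


def roughness(contour):
    """Mean squared second difference along the contour (0.0 for a straight line)."""
    terms = [(a[0] - 2 * b[0] + c[0]) ** 2 + (a[1] - 2 * b[1] + c[1]) ** 2
             for a, b, c in zip(contour, contour[1:], contour[2:])]
    return sum(terms) / len(terms) if terms else 0.0


square = [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)]   # closed, inside a 5x5 image
cut_off = [(0, 2), (1, 2), (2, 2)]                  # open, starts on the left border
```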

A score is assigned according to the result of comparing these values with reference values for the respective items. A suitable function that takes each item as a parameter is prepared in advance and used to calculate the score. The method is not limited to these examples; any technique that can quantitatively evaluate the properties of the contour line may be used.

(c) Evaluation of the degree of separation between the target object region and the background region, its change over time, and whether the overall likelihood output by the target object contour extraction unit 12 is at or above a given value
FIG. 5 illustrates an example of this evaluation. The degree of separation between the target object region and the background region, the change of the degree of separation over time, and whether the goodness of fit of the feature distributions of the target object and the background is at or above a given value are evaluated. Here, the degree of separation indicates how much the distribution of feature quantities of pixels belonging to the target object region overlaps, in feature space, with the distribution of feature quantities of pixels belonging to the background region: when the distributions overlap heavily the degree of separation is low, and when they overlap little it is high. The score is, for example, this degree of separation between the feature distributions of the target object region and the background region. In the example shown in FIG. 5(B), the mean of the feature distribution of the pixels in the background region and the mean of the feature distribution of the pixels in the target object region of the image shown in FIG. 5(A) are obtained, and the distance between the two means is used as the degree of separation. The Mahalanobis distance (see, for example, "Easy-to-Understand Pattern Recognition," Ohmsha, pp. 121-122) may also be used as the degree of separation.
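The distance between the two class means can be sketched as below. Each pixel is represented by a feature vector; the Euclidean distance, the function names, and the two-component example features are assumptions made for illustration.

```python
import math


def mean_vector(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def separation(object_features, background_features):
    """Euclidean distance between the mean feature vectors of the two regions."""
    return math.dist(mean_vector(object_features),
                     mean_vector(background_features))


def best_frame(frames):
    """Return the position of the frame with the largest separation.

    `frames` is a list of (object_features, background_features) pairs.
    """
    scores = [separation(obj, bg) for obj, bg in frames]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-pixel features: (luminance, motion magnitude).
blurred = ([[120, 5], [140, 5]], [[90, 1], [110, 1]])
sharp = ([[200, 10], [220, 12]], [[20, 0], [40, 2]])
```

A frame in which the object is blurred by motion tends to have object and background features that are closer together, so it scores lower than a sharp frame under this measure.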

It is also possible to evaluate whether the goodness of fit (likelihood) of the feature distributions of the target object and the background is at or above a given value, or to prepare a function that takes these quantities as input. How the likelihood changes over time may also be evaluated: when the likelihood changes abruptly between frames, for example from low to high or from high to low, the change is detected. The score may be the amount of change in the likelihood, or a function that takes it as input may be prepared. The method is not limited to these examples; any technique that can quantitatively evaluate the degree of separation between the target object region and the background region, or its change over time, may be used.
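Detecting an abrupt change in likelihood between consecutive frames can be sketched as a simple difference test. The function name, the use of a fixed jump threshold, and the sample values are assumptions for illustration.

```python
def abrupt_changes(likelihoods, min_jump):
    """Return (frame_index, change) for every frame whose likelihood differs
    from that of the previous frame by at least `min_jump` in absolute value."""
    return [(i, curr - prev)
            for i, (prev, curr) in enumerate(zip(likelihoods, likelihoods[1:]), start=1)
            if abs(curr - prev) >= min_jump]


# Hypothetical per-frame log-likelihood values.
per_frame = [-120, -118, -40, -42, -110]
changes = abrupt_changes(per_frame, 50)
```

The sign of the reported change distinguishes a low-to-high transition from a high-to-low one, and the magnitude can be used directly as the score.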

(d) Combination of the above evaluations
In the embodiments of the present invention, two or more of the evaluations exemplified above may be combined. The user can select which criteria to use according to the subsequent use.

[Second Embodiment]
Next, the second embodiment of the present invention is described with reference to FIG. 6. FIG. 6 shows a configuration example of the representative image selection device according to the second embodiment. The representative image selection device 2 is the representative image selection device 1 shown in FIG. 1 with a target object instruction unit 21 added. Apart from the target object instruction unit 21, its configuration is the same as that of the representative image selection device 1 shown in FIG. 1.

The target object instruction unit 21 is a means by which the user specifies the target object region. In the first embodiment, an example was described in which the target object contour extraction unit 12 by default treats all moving regions as the target object. In the second embodiment, as shown in FIG. 6, the target object instruction unit 21 is additionally provided with several methods of determining the target object; the user specifies one of them through the target object instruction unit 21, and the target object is then determined by image processing.

Examples of the specification method include: (1) selecting moving regions whose area is at or above a given value; (2) selecting regions whose motion vector has an absolute value at or above a given value; (3) performing template matching on the moving regions with a template registered in advance and selecting regions with high correlation; and (4) having the user select the target object region with a mouse or the like through a GUI (Graphical User Interface).
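Methods (1) and (2) above amount to filtering candidate moving regions by threshold. The sketch below assumes each candidate region is described by a dictionary with an `area` in pixels and a `motion` vector; the record layout and the function name are hypothetical.

```python
import math


def select_regions(regions, min_area=None, min_speed=None):
    """Keep the moving regions that satisfy every threshold that was given."""
    chosen = []
    for region in regions:
        if min_area is not None and region["area"] < min_area:
            continue
        # The absolute value of the motion vector is its Euclidean norm.
        if min_speed is not None and math.hypot(*region["motion"]) < min_speed:
            continue
        chosen.append(region)
    return chosen


candidates = [
    {"id": "a", "area": 500, "motion": (3, 4)},
    {"id": "b", "area": 40, "motion": (6, 8)},
    {"id": "c", "area": 900, "motion": (0, 1)},
]
large = [r["id"] for r in select_regions(candidates, min_area=100)]
fast = [r["id"] for r in select_regions(candidates, min_speed=5)]
both = [r["id"] for r in select_regions(candidates, min_area=100, min_speed=5)]
```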

[Third Embodiment]
Next, the third embodiment of the present invention is described. FIG. 7 shows a configuration example of the target object contour extraction unit according to the third embodiment of the present invention. The third embodiment is an example that uses the target object contour extraction unit 12 shown in FIG. 6 as one example of the target object contour extraction unit 12 in the representative image selection devices 1 and 2 of the first and second embodiments.

The target object contour extraction unit 12 consists of a small region setting unit 121, a feature vector extraction unit 122, a small region dividing unit 123, and a region extraction unit 124, and obtains the contour of the target object for each frame of the time series of images received from the moving image input unit 11.

First, the small region setting unit 121 divides a region that contains the target object into a plurality of small regions. For example, when an instruction has been entered through the target object instruction unit 21, one or more small regions are set so as to cover the whole of the specified region. For example, as shown in FIG. 8, the image area is divided into N (N ≥ 2) small regions, and information pairing pixel coordinates with pixel values is retained for each small region.
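Dividing a region into N small regions might be sketched as a regular grid over a bounding box. The grid layout is an assumption, since the arrangement in FIG. 8 is not reproduced in the text, and the function name and the (x, y, width, height) tile format are chosen here for illustration.

```python
def tile_regions(x0, y0, x1, y1, rows, cols):
    """Split the box [x0, x1) x [y0, y1) into rows * cols tiles.

    Each tile is returned as (x, y, width, height). Integer division spreads
    any remainder so that the tiles cover the box exactly, without overlap.
    """
    tiles = []
    for r in range(rows):
        ya = y0 + (y1 - y0) * r // rows
        yb = y0 + (y1 - y0) * (r + 1) // rows
        for c in range(cols):
            xa = x0 + (x1 - x0) * c // cols
            xb = x0 + (x1 - x0) * (c + 1) // cols
            tiles.append((xa, ya, xb - xa, yb - ya))
    return tiles


tiles = tile_regions(0, 0, 10, 6, rows=2, cols=3)   # N = 6 small regions
```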

The feature vector extraction unit 122 obtains feature quantities for the pixels in each small region. The feature quantities, such as luminance, color, and optical flow, are obtained for every pixel, and information pairing each pixel's coordinates with its feature quantities is retained.

Next, the small region dividing unit 123 obtains, from the feature distributions of the target object region and the background region within each small region, the probability that each pixel belongs to each region. It then re-divides the small region into target object region and background region so as to give the best fit, that is, so that the likelihood is maximized, and calculates the likelihood at that point.

The well-known general definition of likelihood is briefly as follows. When the probability density function is given by $f(y, \theta)$, then for observed values $y_1, y_2, \ldots, y_n$ of the random variables $Y_1, Y_2, \ldots, Y_n$, the quantity
$$L = \prod_i f(y_i, \theta)$$
regarded as a function of $\theta$ is called the likelihood function (see "Speech Recognition by Probability Models", The Institute of Electronics, Information and Communication Engineers, p. 51). The various values that $L$ can take are called likelihoods. In other words, the likelihood is the probability of obtaining the observed values $y_1, y_2, \ldots, y_n$ when the probability model $f(y, \theta)$ is assumed, and it evaluates how well the observed values fit that model.
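
As an illustration of this definition, the following sketch evaluates the logarithm of $L$ for a Gaussian model, where $\theta$ is the pair (mean, variance). Working with the logarithm is a common numerical convenience and is an assumption here, as are the function names.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density f(y, theta) of a 1-D Gaussian with theta = (mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(observations, mean, var):
    """log L = sum_i log f(y_i, theta), i.e. the log of the product above."""
    return sum(math.log(gaussian_pdf(y, mean, var)) for y in observations)


# Usage: observations near 10 fit a model centred at 10 better than one at 12.
observations = [9.8, 10.1, 10.3, 9.9]
good_fit = log_likelihood(observations, mean=10.0, var=0.04)
poor_fit = log_likelihood(observations, mean=12.0, var=0.04)
```

A larger value means the observations fit the assumed model better, which is the sense in which the likelihood is used as a goodness-of-fit measure.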

In this embodiment, the target object region and the background region are separated accurately by repeating the process of re-dividing each small region so that this likelihood is maximized. The repetition continues until a predetermined number of iterations is reached or the likelihood becomes equal to or greater than a predetermined value.

The processing of the small region dividing unit 123 is described in more detail. The unit receives from the feature vector extraction unit 122 information pairing pixel coordinates with feature quantities for each small region $C_i$ ($1 \le i \le N$), and for each small region it uses each of the feature quantities ($F_1$ to $F_M$) to divide the pixels into two classes. Specifically, for each feature quantity it calculates which class each pixel belongs to, the probability that the pixel belongs to that class, and the likelihood of the whole small region when classification is performed with that feature quantity (a value indicating the degree of class separation; a higher value means that the feature distribution fits the model better). It then holds the pixel coordinate and feature information together with these results: the estimated class of each pixel, the probability of belonging to that class, and the likelihood of the whole small region.

The procedure for dividing a small region $C_i$ into two classes, region $Z_1$ and region $Z_2$, is as follows. The division is based on the feature quantities of the pixels. Within a small region, the feature quantity is roughly uniform in the object part and in the background part respectively, but it is normally considered to be distributed with some variance around a mean value, as in a Gaussian distribution.

A statistical method that estimates this distribution is effective for region division. For example, assuming a Gaussian distribution with the mean and variance as parameters, maximum likelihood estimation is performed to estimate the shape of the region division and the parameter values. For the plurality of small regions that contain the boundary between the object candidate region F′ and the background candidate region B′, estimation of parameters such as the mean and variance of the feature quantities and calculation of the likelihood are repeated, and the shape of the region division is finally estimated.

As the algorithm, for example, the EM algorithm is used, which repeats parameter estimation (the E (Expectation) step) and maximum likelihood estimation (the M (Maximization) step) (reference: 情報処理 (Information Processing), vol. 37, no. 1, pp. 43-46, Jan. 1996).

Like the steepest descent method, the EM algorithm is an iterative search algorithm that improves the solution step by step. It is used to determine the regions accurately and to divide the small region $C_i$ into two classes, region $Z_1$ and region $Z_2$. In the E step, parameters are estimated in parallel for all feature quantities; in the M step, maximum likelihood estimation is performed using the sum of the likelihoods of all feature quantities (the sum of the $Q_i(F_m)$ given below) as the evaluation function.
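
The two-class division can be sketched with a small EM loop. This is a minimal illustration under stated assumptions, not the patented procedure: it uses a single scalar feature per pixel, a two-component Gaussian mixture, and the textbook form of EM in which the E step computes class membership probabilities and the M step re-estimates the mean, variance, and weight of each class. The function name `em_two_classes`, the initialisation, and the stopping tolerance are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_classes(values, max_iter=50, tol=1e-6):
    """Fit a two-component 1-D Gaussian mixture to the feature values.

    Returns (labels, means, variances, log_likelihood).
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                          # crude initialisation (assumption)
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E step: posterior probability of each class for each pixel.
        resp, ll = [], 0.0
        for v in values:
            joint = [weights[c] * gaussian_pdf(v, means[c], variances[c]) for c in (0, 1)]
            total = max(joint[0] + joint[1], 1e-300)   # guard against underflow
            ll += math.log(total)
            resp.append((joint[0] / total, joint[1] / total))
        if ll - prev_ll < tol:                # likelihood no longer improves
            break
        prev_ll = ll
        # M step: re-estimate the parameters that maximise the likelihood.
        for c in (0, 1):
            n_c = max(sum(r[c] for r in resp), 1e-12)
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / n_c
            var = sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / n_c
            variances[c] = max(var, 1e-6)
            weights[c] = n_c / len(values)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, means, variances, ll


# Usage: luminance-like values from a dark object and a bright background.
values = [10.0, 11.0, 9.0, 10.5, 50.0, 52.0, 49.0, 51.0]
labels, means, variances, ll = em_two_classes(values)
```

The loop stops either after `max_iter` iterations or when the log-likelihood stops improving, which mirrors the two stopping conditions mentioned in the text (an iteration limit or a likelihood threshold) only loosely.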

For example, the result of dividing the i-th small region using the M feature quantities $F_1$ to $F_M$ is obtained as the following set of values:
(A) the feature quantities $F_1$ to $F_M$ computed for the k-th pixel: $F_{1k}$ to $F_{Mk}$
(B) information indicating whether feature quantity m classified the k-th pixel into region $Z_{i1}$ or region $Z_{i2}$: $Z_{mk}$
(C) the probability that the k-th pixel belongs to its assigned region $Z_{mk}$: $P_{mk}(Z_{mk})$
(D) the likelihood over the whole small region $C_i$ for each feature quantity: $Q_i(F_1)$ to $Q_i(F_M)$
where $Q_i(F_m) = \sum_k \log P_{mk}(Z_{mk})$.
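
The quantity $Q_i(F_m)$ is simple to compute once the per-pixel class probabilities are known. The sketch below assumes the probabilities are already available as plain lists; the feature names and function names are hypothetical.

```python
import math


def region_log_likelihood(class_probabilities):
    """Q_i(F_m): sum over pixels k of log P_mk(Z_mk)."""
    return sum(math.log(p) for p in class_probabilities)


def total_score(per_feature_probabilities):
    """Sum of Q_i(F_m) over all feature quantities m."""
    return sum(region_log_likelihood(ps) for ps in per_feature_probabilities.values())


# Usage: a feature that separates the classes confidently scores higher.
confident = region_log_likelihood([0.90, 0.95, 0.99])
uncertain = region_log_likelihood([0.60, 0.55, 0.70])
combined = total_score({"luminance": [1.0, 1.0], "optical_flow": [0.5]})
```

Because each probability is at most 1, every $Q_i(F_m)$ is at most 0, and values closer to 0 indicate a cleaner separation of the two classes.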

Finally, the region extraction unit 124 integrates the target object region and background region contained in one small region with those contained in an adjacent small region, by regarding regions whose feature distributions are close as the same region. Here, regions are said to be integrated when, for two adjacent small regions, the boundary lines are joined on the shared side and the areas on the same side of the joined boundary line are determined to belong to the same class (here, one of two classes: object region or background region).

By integrating the small regions as described below, the object region and the background region of the entire image are obtained. The integration process is described concretely. As the basic rule of integration, for two adjacent small regions $C_i$ and $C_j$, the correspondence between the two classes $Z_{i1}$/$Z_{i2}$ in $C_i$ and the two classes $Z_{j1}$/$Z_{j2}$ in $C_j$ is determined. The boundary lines separating the classes are then joined without contradiction.

In the integration process, the distributions of the feature quantities (for example, their mean values) are first used to determine the correspondence between the two classes $Z_{i1}$/$Z_{i2}$ in the small region $C_i$ and the two classes $Z_{j1}$/$Z_{j2}$ in the small region $C_j$, as shown in FIG. 9(A). For example, it is determined that $Z_{i1}$ and $Z_{j1}$ belong to the same class and that $Z_{i2}$ and $Z_{j2}$ belong to the same class.

As shown in the upper parts of FIGS. 9(A) and 9(B), suppose that the small region $C_i$ is divided into classes $Z_{i1}$ and $Z_{i2}$, and that the adjacent small region $C_j$ is divided into classes $Z_{j1}$ and $Z_{j2}$. Based on the feature distribution of each class in each small region, shown in the lower parts of FIGS. 9(A) and 9(B), the classes whose distribution shapes are closer are labeled as the same class. Then, as shown in FIG. 9(C), the boundary lines separating the classes are joined without contradiction. The closeness of feature distributions can be judged by comparing statistics such as mean values, or by a measure such as the Mahalanobis distance.
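
The class correspondence between two adjacent small regions can be sketched as follows. This illustration assumes a single scalar feature and compares class means only, which is one of the options the text names; the function `match_classes` and the dictionary layout are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def match_classes(region_i, region_j):
    """Pair the two classes of region_i with the two classes of region_j.

    Each argument maps a class name to the feature values of its pixels.
    The pairing with the smaller total distance between class means wins.
    """
    i1, i2 = sorted(region_i)
    j1, j2 = sorted(region_j)
    mi = {name: mean(vals) for name, vals in region_i.items()}
    mj = {name: mean(vals) for name, vals in region_j.items()}
    straight = abs(mi[i1] - mj[j1]) + abs(mi[i2] - mj[j2])
    crossed = abs(mi[i1] - mj[j2]) + abs(mi[i2] - mj[j1])
    if straight <= crossed:
        return {i1: j1, i2: j2}
    return {i1: j2, i2: j1}


# Usage: the bright class in C_i corresponds to the bright class in C_j.
c_i = {"Z_i1": [200, 210, 205], "Z_i2": [30, 35, 40]}
c_j = {"Z_j1": [28, 33], "Z_j2": [198, 207]}
pairing = match_classes(c_i, c_j)
```

Here `pairing` links `Z_i1` to `Z_j2` and `Z_i2` to `Z_j1`, since those pairs have the closest means. A Mahalanobis distance could replace the absolute difference of means when the class variances differ noticeably.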

As a result of the above processing, the target object contour extraction unit 12 outputs the time-series images, the contour information associated with each frame, and the degree of separation of the target object from the background obtained during contour extraction.

[Fourth Embodiment]
Next, the fourth embodiment of the present invention is described with reference to FIG. 10. The representative image selection device 3 shown in FIG. 10 adds a representative image condition input unit 31 and a representative image condition determination unit 32 to the configuration of the representative image selection device 2 shown in FIG. 6. The components other than the representative image condition input unit 31 and the representative image condition determination unit 32 are the same as those of the representative image selection device 2.

The representative image condition input unit 31 is a means by which the user selects and inputs conditions for choosing a representative image that are separate from the contour-based evaluation criterion given by the contour evaluation criterion 160. Examples of such selection conditions include a specified background color and whether the background contains a specific object (a person, a face, text, an on-screen caption, and so on). The user can select one or more of these conditions.

The representative image condition determination unit 32 performs condition determination on the frames input from the moving image input unit 11, according to the conditions input through the representative image condition input unit 31. It has a determination function corresponding to each condition offered by the representative image condition input unit 31 and uses these functions to perform the determination.

For example, as the determination process for the condition of selecting a specific background color, a color histogram is computed and compared with the color histogram of the specified background color. When the determination by the representative image condition determination unit 32 detects a selection condition such as "the background color is blue" for the current frame of interest, the unit notifies the index assigning unit 14 of that condition. As the determination process for the condition that a specific object is included, the representative image condition determination unit 32 performs matching against a template of the specific object. When this detects a selection condition such as "a face is included" for the current frame of interest, the unit notifies the index assigning unit 14 of that condition.
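
A background-color check of this kind can be sketched with a hue histogram. The text only says that color histograms are compared; the use of hue values in degrees, histogram intersection as the similarity measure, the bin count, and the threshold are all assumptions made for illustration, and the function names are hypothetical.

```python
def hue_histogram(hues, bins=12):
    """Normalised histogram of hue values given in degrees."""
    if not hues:
        raise ValueError("at least one hue value is required")
    counts = [0] * bins
    for h in hues:
        index = min(int((h % 360) // (360 / bins)), bins - 1)
        counts[index] += 1
    return [c / len(hues) for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalised histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def background_matches(frame_hues, reference_hues, threshold=0.7):
    """True when the frame's background hues resemble the reference color."""
    similarity = histogram_intersection(
        hue_histogram(frame_hues), hue_histogram(reference_hues)
    )
    return similarity >= threshold


# Usage: a mostly blue background matches a blue reference, a red one does not.
blue_reference = [230, 235, 240, 245]
mostly_blue = [232, 233, 241, 246, 100]
red = [5, 10, 350, 355]
is_blue = background_matches(mostly_blue, blue_reference)
is_red_blue = background_matches(red, blue_reference)
```

In a full system the hue values would be taken only from pixels classified as background, so that the target object's own color does not affect the decision.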

This condition determination may be performed on every frame of the input moving image, or only on frames whose score calculated by the representative image fitness calculation unit 13 is equal to or greater than a predetermined value.

The index assigning unit 14 writes the following items into a data structure such as that shown in FIG. 11, which uses the evaluation method, the score, and the condition as the index: the image of each frame input from the representative image fitness calculation unit 13, the contour of the target object extracted from that frame, the degree of separation of that contour from the background, the evaluation method used to evaluate the contour and the resulting score, and the condition for selecting a representative image input through the representative image condition input unit 31.
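
One possible in-memory form of such an index is sketched below. FIG. 11 is not reproduced in this text, so the record layout, the field names, and the helper `select_representative` are hypothetical; only the kinds of information stored follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FrameIndex:
    frame_number: int
    contour: list                 # contour points as (x, y) tuples
    separation: float             # degree of separation from the background
    evaluation_method: str        # name of the contour evaluation used
    score: float                  # fitness score under that evaluation
    conditions: set = field(default_factory=set)   # detected selection conditions


def select_representative(entries, method, required_conditions=frozenset()):
    """Return the best-scoring frame for a method that meets all conditions."""
    candidates = [
        e for e in entries
        if e.evaluation_method == method and required_conditions <= e.conditions
    ]
    return max(candidates, key=lambda e: e.score, default=None)


# Usage: pick the best frame by size-based score among blue-background frames.
entries = [
    FrameIndex(0, [(1, 1), (2, 1)], 0.4, "size", 0.55, {"background:blue"}),
    FrameIndex(1, [(1, 1), (3, 1)], 0.7, "size", 0.90, set()),
    FrameIndex(2, [(1, 2), (3, 2)], 0.6, "size", 0.75, {"background:blue", "face"}),
]
best = select_representative(entries, "size", {"background:blue"})
```

Frame 1 has the highest score but lacks the required condition, so frame 2 is returned.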

[Fifth Embodiment]
Next, the fifth embodiment of the present invention is described with reference to FIG. 12. The representative image selection device 4 shown in FIG. 12 adds a scene separation unit 41 and a scene combining unit 42 to the configuration of the representative image selection device 1 of FIG. 1.

The components other than the scene separation unit 41 and the scene combining unit 42 are the same as those of the representative image selection device 1. The scene separation unit 41 takes as input a long video such as that shown in FIG. 13(A) and divides it into short scenes such as scene 1 and scene 2 shown in FIG. 13(B). It can be implemented with a conventional method for detecting scenes in video (for example, the methods described in Patent Documents 1, 2, and 3 above). Alternatively, the video may be divided by another method, such as splitting it at fixed time intervals.

When the index assigning unit 14 has assigned indexes to the frames contained in scene 1 and scene 2, and the frames shown hatched in FIG. 13(C) have been chosen as the representative images of the respective scenes, the scene combining unit 42 joins scene 1 and scene 2, each now with its representative image selected, in their original time order, as shown in FIG. 13(D).
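
The flow of splitting, selecting per scene, and recombining can be sketched as follows. The sketch assumes fixed-length scenes (the simple alternative mentioned above rather than a scene-change detector) and assumes that a fitness score is already available for every frame; the function name is hypothetical.

```python
def representatives_per_scene(frame_scores, scene_length):
    """Split frames into fixed-length scenes and pick the best frame in each.

    Returns (scene_number, frame_number) pairs in the original time order.
    """
    if scene_length < 1:
        raise ValueError("scene_length must be at least 1")
    result = []
    starts = range(0, len(frame_scores), scene_length)
    for scene_number, start in enumerate(starts):
        scene = frame_scores[start:start + scene_length]
        best_offset = max(range(len(scene)), key=scene.__getitem__)
        result.append((scene_number, start + best_offset))
    return result


# Usage: seven frames, three frames per scene.
scores = [0.1, 0.9, 0.3, 0.2, 0.4, 0.8, 0.5]
representatives = representatives_per_scene(scores, scene_length=3)
```

Because the scenes are processed in order and frame numbers are kept relative to the original video, the result is already in the original time order, which corresponds to the recombination step.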

FIG. 1 is a diagram showing a configuration example of a representative image selection device.
FIG. 2 is a processing flowchart of the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of the data structure of an index.
FIG. 4 is a diagram showing an example of the processing of the representative image fitness calculation unit using a contour evaluation criterion.
FIG. 5 is a diagram showing an example of the processing of the representative image fitness calculation unit using a contour evaluation criterion.
FIG. 6 is a diagram showing a configuration example of a representative image selection device.
FIG. 7 is a diagram showing a configuration example of the target object contour extraction unit.
FIG. 8 is a diagram explaining image division by the small region setting unit.
FIG. 9 is a diagram explaining the integration process of the small regions.
FIG. 10 is a diagram showing a configuration example of a representative image selection device.
FIG. 11 is a diagram showing an example of the data structure of an index.
FIG. 12 is a diagram showing a configuration example of a representative image selection device.
FIG. 13 is a diagram explaining the fifth embodiment of the present invention.

Explanation of symbols

1, 2, 3, 4 Representative image selection device
11 Moving image input unit
12 Target object contour extraction unit
13 Representative image fitness calculation unit
14 Index assigning unit
15 Moving image output unit
16 Contour evaluation criterion storage unit
21 Target object instruction unit
31 Representative image condition input unit
32 Representative image condition determination unit
41 Scene separation unit
42 Scene combining unit
121 Small region setting unit
122 Feature vector extraction unit
123 Small region dividing unit
124 Region extraction unit
160 Contour evaluation criterion

Claims

A representative image selection device for selecting a representative image from a moving image, comprising:
moving image input means for inputting a moving image;
small region setting means for dividing, in each frame of the moving image, a region including a target object region, which is one or more moving regions, into small regions;
feature vector extraction means for extracting, for each pixel in the small regions, feature quantities including at least a motion-based feature quantity;
small region dividing means for classifying each pixel in each small region, using the feature quantities, into either a class of the target object region or a class of the background region, which is the region other than the target object region;
region extraction means for extracting the contour of the target object region by treating classes whose feature distributions are close as the same class between adjacent small regions and joining the class boundaries between the small regions; and
representative image fitness calculation means for obtaining the mean of the feature distribution over all pixels in each of the target object region and the background region extracted by the region extraction means, and using the distance between the two means as a score indicating the fitness of each frame as a representative image.

The representative image selection device according to claim 1,
further comprising target object instruction means for designating, by input to the small region setting means, the target object region, which is one or more moving regions.

The representative image selection device according to claim 1 or 2,
wherein the representative image fitness calculation means
obtains, as the fitness score, in addition to the distance, one or more of: the degree of coincidence between the shape of the target object region extracted by the region extraction means and a predetermined template; a value obtained by comparing the size of the target object region or the length of its contour with a predetermined reference value; and a value obtained by comparing the shape of the contour of the target object region with a predetermined reference value;
the device further comprising index assigning means for assigning to each frame, as an index, information on the fitness score and on the method used to calculate the score; and
moving image output means for outputting the indexed moving image.

The representative image selection device according to claim 3, further comprising:
representative image condition input means for inputting a condition for selecting a representative image; and
representative image condition determination means for performing condition determination on the moving image input by the moving image input means, using the input condition for selecting a representative image;
wherein the index assigning means receives the result of the condition determination by the representative image condition determination means, and assigns to the frames of the moving image an index including information on the fitness of each frame as a representative image and on the condition for selecting the representative image.

The representative image selection device according to claim 3 or 4,
further comprising scene separation means for dividing the moving image into a plurality of scenes,
wherein the index assigning means assigns the index to the frames in each of the divided scenes.

A representative image selection method in which a computer selects a representative image from a moving image, comprising:
a moving image input step of inputting a moving image;
a small region setting step of dividing, in each frame of the moving image, a region including a target object region, which is one or more moving regions, into small regions;
a feature vector extraction step of extracting, for each pixel in the small regions, feature quantities including at least a motion-based feature quantity;
a small region dividing step of classifying each pixel in each small region, using the feature quantities, into either a class of the target object region or a class of the background region, which is the region other than the target object region;
a region extraction step of extracting the contour of the target object region by treating classes whose feature distributions are close as the same class between adjacent small regions and joining the class boundaries between the small regions; and
a representative image fitness calculation step of obtaining the mean of the feature distribution over all pixels in each of the target object region and the background region extracted in the region extraction step, and using the distance between the two means as a score indicating the fitness of each frame as a representative image.

The representative image selection method according to claim 6,
further comprising a target object instruction step of designating, by input to the small region setting step, the target object region, which is one or more moving regions.

The representative image selection method according to claim 6 or 7,
wherein the representative image fitness calculation step
obtains, as the fitness score, in addition to the distance, one or more of: the degree of coincidence between the shape of the target object region extracted in the region extraction step and a predetermined template; a value obtained by comparing the size of the target object region or the length of its contour with a predetermined reference value; and a value obtained by comparing the shape of the contour of the target object region with a predetermined reference value;
the method further comprising an index assigning step of assigning to each frame, as an index, information on the fitness score and on the method used to calculate the score; and
a moving image output step of outputting the indexed moving image.

The representative image selection method according to claim 8, further comprising:
a representative image condition input step of inputting a condition for selecting a representative image; and
a representative image condition determination step of performing condition determination on the moving image input in the moving image input step, using the input condition for selecting a representative image;
wherein the index assigning step receives the result of the condition determination from the representative image condition determination step, and assigns to the frames of the moving image an index including information on the fitness of each frame as a representative image and on the condition for selecting the representative image.

The representative image selection method according to claim 8 or 9,
further comprising a scene separation step of dividing the moving image into a plurality of scenes,
wherein the index assigning step assigns the index to the frames in each of the divided scenes.

A representative image selection program for causing a computer to execute the representative image selection method according to any one of claims 6 to 10.