JP4909315B2

JP4909315B2 - Video processing apparatus and method, program, and computer-readable recording medium

Info

Publication number: JP4909315B2
Application number: JP2008148219A
Authority: JP
Inventors: 陽介鳥井; 隆佐藤; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-06-05
Filing date: 2008-06-05
Publication date: 2012-04-04
Anticipated expiration: 2028-06-05
Also published as: JP2009296344A

Description

The present invention relates to a video processing apparatus and method, a program, and a computer-readable recording medium, and more particularly to a video processing apparatus and method, a program, and a computer-readable recording medium for generating a representative image that expresses the motion in a video.

Many techniques exist for extracting, from a video, representative images that express its content. Some examples follow.

(1) In a first conventional technique, thumbnails representing changes of scene are detected by finding points where the content change between adjacent frame images of the video drops from a large value to a small one (see, for example, Patent Document 1).

(2) A second conventional technique plots the total amount of motion vectors in the video as a time series and adopts its local minima as representative images. The aim is to extract impressive scenes and thumbnails that mark turning points in camera or actor movement. As a side effect, relatively sharp images are extracted (see, for example, Non-Patent Document 1).

(3) In a third conventional technique, motion vectors are used to separate the video into moving objects and background. Frame images in which a moving object appears large (moving-object close-up frames) are detected, and adjacent such frames are merged to obtain video sections (moving-object close-up shots). In addition, follow shots, in which the camera operator focuses on and follows a moving object, are detected (see, for example, Non-Patent Document 2).

Patent Document 1: Japanese Patent No. 3194837
Non-Patent Document 1: Wayne Wolf, "Key Frame Selection by Motion Analysis," IEEE ICASSP 96, pp. 1228, 1996.
Non-Patent Document 2: Torii et al., "Moving-Object Up-Shot / Follow-Shot Detection Using Video Motion," MIRU2005 Proc., pp. 24-31, Jul. 2005.

However, the conventional techniques above have the following problems.

The first conventional technique relies mainly on changes in color features. Because these are sensitive to brightness and similar factors, a very large number of thumbnails is detected for a single video, which makes the result hard to browse. More generally, thumbnail extraction based on color-feature changes has difficulty separating subject from background, and therefore cannot extract thumbnails that take the subject, the background, and their respective motions into account. For example, it cannot extract thumbnails that follow the movement of a subject filmed with a fixed camera (hereinafter, problem a).

In the second conventional technique, the total motion of the whole frame cannot tell whether the motion comes from the subject or from the camera, so thumbnail extraction is sometimes ineffective. For example, in snowboarding footage a jump is a highlight, but it cannot be detected as a local minimum of the total motion (hereinafter, problem b).

The third conventional technique can detect scenes in which a moving object appears large, but cannot select effective thumbnails from among them. In addition, its follow-shot detection uses local motion amounts computed frame by frame, which leads to over-detection (hereinafter, problem c).

The present invention was made in view of the above, and its object is to provide a video processing apparatus and method, a program, and a computer-readable recording medium capable of extracting, from the sections of a video in which a subject (moving object) is present, both images in which the motion of the subject or background is emphasized and images in which the subject or background is clearly captured.

A further object is to provide a video processing apparatus and method, a program, and a computer-readable recording medium capable of relatively stably extracting shots in which the camera follows a subject, and of extracting thumbnails of the subject of interest.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (Claim 1) is a video processing method for extracting, from a video, a representative image representing the video content, the method performing:
a video input step (step 1) in which video input means inputs a video and stores it in video frame storage means;
a subject detection step (step 3) in which subject detection means acquires video frames from the video frame storage means, detects subjects in the video frames, associates them across frames (step 2), and stores them together with the subject positions in time-series subject list storage means; and
a thumbnail image extraction step (step 4) in which thumbnail image extraction means calculates the motion amount of the subject stored in the time-series subject list storage means, takes the times of the local maxima and minima of the calculated motion amount as extraction times, and, in the neighborhood of each extraction time, extracts a thumbnail image based on an evaluation value defined as a weighted sum of the reciprocal of the distance between the screen center and the subject position and the proportion of the screen occupied by the subject area.
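To make the thumbnail image extraction step more concrete, here is a minimal Python sketch under stated assumptions, not the patented implementation. It treats the strict local maxima and minima of a per-frame subject motion series as extraction times and, within a window of `radius` frames around each one, keeps the frame with the highest evaluation value `w_dist / d + w_area * (subject area / screen area)`, where `d` is the distance from the subject position to the screen center. The function names, `radius`, the weights `w_dist` and `w_area`, and the small `eps` that keeps `1/d` finite for a perfectly centered subject are all hypothetical; the patent gives no values for them.

```python
import math
from typing import List, Sequence, Tuple


def local_extrema(values: Sequence[float]) -> List[int]:
    """Indices of strict interior local maxima and minima of a 1-D series."""
    found = []
    for t in range(1, len(values) - 1):
        prev, cur, nxt = values[t - 1], values[t], values[t + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            found.append(t)
    return found


def evaluation_value(
    subject_xy: Tuple[float, float],
    subject_area: float,
    frame_size: Tuple[int, int],
    w_dist: float = 1.0,
    w_area: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """w_dist / (distance to screen center) + w_area * (subject area / screen area).

    `eps` is an assumption that keeps the reciprocal finite for a centered subject.
    """
    width, height = frame_size
    dist = math.hypot(subject_xy[0] - width / 2.0, subject_xy[1] - height / 2.0)
    return w_dist / (dist + eps) + w_area * subject_area / float(width * height)


def select_thumbnails(
    motion: Sequence[float],
    positions: Sequence[Tuple[float, float]],
    areas: Sequence[float],
    frame_size: Tuple[int, int],
    radius: int = 2,
) -> List[int]:
    """At each motion extremum t, keep the best-scoring frame in [t - radius, t + radius]."""
    picked: List[int] = []
    last = len(motion) - 1
    for t in local_extrema(motion):
        window = range(max(0, t - radius), min(last, t + radius) + 1)
        best = max(window, key=lambda k: evaluation_value(positions[k], areas[k], frame_size))
        if best not in picked:  # neighboring extrema may select the same frame
            picked.append(best)
    return picked
```

Because the reciprocal term grows without bound near the center, the two weights would in practice need to be tuned so that neither term swamps the other.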

The present invention (Claim 2) is a video processing method for extracting, from a video, a representative image representing the video content, the method performing:
a video input step in which video input means inputs a video and stores it in video frame storage means;
a subject detection step in which subject detection means acquires video frames from the video frame storage means, detects subjects in the video frames, associates them across frames, and stores them together with the subject positions in time-series subject list storage means;
an image unit correspondence step in which image unit correspondence means calculates inter-frame motion vectors for each image unit and stores background information in the time-series subject list storage means; and
a thumbnail image extraction step in which thumbnail image extraction means uses the information stored in the time-series subject list storage means to calculate the motion amount of the background, takes the times of the local maxima and minima of the calculated motion amount as extraction times, and, in the neighborhood of each extraction time, extracts a thumbnail image based on an evaluation value defined as a weighted sum of the reciprocal of the distance between the screen center and the subject position and the proportion of the screen occupied by the subject area.
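Claim 2 does not specify how the background motion amount is derived from the stored records. One plausible reading, assumed here purely for illustration, is the mean magnitude of the motion vectors of all image units classified as "background" in a frame; applying the same function to "subject" units yields a subject motion series. The record layout `(classification, (dx, dy))` and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# (classification, motion vector (dx, dy)) for one image unit in one frame
Record = Tuple[str, Tuple[float, float]]


def motion_amount(records: Iterable[Record], label: str) -> float:
    """Mean motion-vector magnitude over image units classified as `label`.

    Returns 0.0 when no image unit in the frame carries that label.
    """
    mags = [math.hypot(dx, dy) for cls, (dx, dy) in records if cls == label]
    return sum(mags) / len(mags) if mags else 0.0


def motion_series(frames: Sequence[Iterable[Record]], label: str) -> List[float]:
    """Per-frame motion amount for `label`, e.g. "background" or "subject"."""
    return [motion_amount(recs, label) for recs in frames]
```

The resulting per-frame series is the signal over which the local maxima and minima would then be taken.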

Further, in the present invention (Claim 3), the thumbnail image extraction step
uses the information stored in the time-series subject list storage means to calculate the motion amounts of both the subject and the background, takes the times of the local maxima and minima of one of the two (the subject motion amount or the background motion amount), and, in the neighborhood of each such time, sets as an extraction time a time determined by adding a condition on the other motion amount.

In the present invention (Claim 4), the thumbnail image extraction step
uses the information stored in the time-series subject list storage means to calculate the motion amounts of both the subject and the background, and calculates the average subject motion amount between two consecutive local minima of the background motion amount. When a local minimum of the subject motion amount exists in the neighborhood before or after a local maximum of the background motion amount lying between those two minima, and that local minimum is smaller than the average subject motion amount, the time of that local minimum is set as an extraction time.
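The rule in Claim 4 can be sketched as follows. This is an illustrative reading, not code from the patent: the neighborhood "before or after" a background maximum is modeled as a hypothetical window of `radius` frames, the subject average is taken over the frames from one background minimum to the next inclusive (the claim does not say whether the endpoints count), and extrema are strict local extrema. All names are hypothetical.

```python
from typing import List, Sequence


def _minima(v: Sequence[float]) -> List[int]:
    """Indices of strict interior local minima."""
    return [t for t in range(1, len(v) - 1) if v[t] < v[t - 1] and v[t] < v[t + 1]]


def _maxima(v: Sequence[float]) -> List[int]:
    """Indices of strict interior local maxima."""
    return [t for t in range(1, len(v) - 1) if v[t] > v[t - 1] and v[t] > v[t + 1]]


def claim4_extraction_times(
    subj: Sequence[float], bg: Sequence[float], radius: int = 2
) -> List[int]:
    """Extraction times under one reading of Claim 4.

    For each pair of consecutive background minima (m1, m2):
      1. average the subject motion over frames m1..m2 (inclusive, assumed);
      2. for each background maximum p with m1 < p < m2, look for subject
         minima within p - radius .. p + radius;
      3. keep a subject minimum only if it is below that average.
    """
    subj_min = set(_minima(subj))
    bg_max = _maxima(bg)
    bg_min = _minima(bg)
    times = set()
    for m1, m2 in zip(bg_min, bg_min[1:]):
        span = subj[m1:m2 + 1]
        mean = sum(span) / len(span)
        for p in bg_max:
            if not m1 < p < m2:
                continue
            for s in range(p - radius, p + radius + 1):
                if s in subj_min and subj[s] < mean:
                    times.add(s)
    return sorted(times)
```

Intuitively, this picks moments where the camera is panning fastest while the subject itself is nearly still in the frame, which is typical of a follow shot.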

FIG. 2 is a diagram of the principle configuration of the present invention.

The present invention (Claim 5) is a video processing apparatus for extracting, from a video, a representative image representing the video content, comprising:
video input means 110 for inputting a video and storing it in video frame storage means;
subject detection means 140 for acquiring video frames from the video frame storage means 102, detecting subjects in the video frames, associating them across frames, and storing them together with the subject positions in time-series subject list storage means; and
thumbnail image extraction means 150 for calculating the motion amount of the subject stored in the time-series subject list storage means 104, taking the times of the local maxima and minima of the calculated motion amount as extraction times, and, in the neighborhood of each extraction time, extracting a thumbnail image based on an evaluation value defined as a weighted sum of the reciprocal of the distance between the screen center and the subject position and the proportion of the screen occupied by the subject area.

The present invention (Claim 6) is a video processing apparatus for extracting, from a video, a representative image representing the video content, comprising:
video input means for inputting a video and storing it in video frame storage means;
subject detection means for acquiring video frames from the video frame storage means, detecting subjects in the video frames, associating them across frames, and storing them together with the subject positions in time-series subject list storage means;
image unit correspondence means for calculating inter-frame motion vectors for each image unit and storing background information in the time-series subject list storage means; and
thumbnail image extraction means for using the information stored in the time-series subject list storage means to calculate the motion amount of the background, taking the times of the local maxima and minima of the calculated motion amount as extraction times, and, in the neighborhood of each extraction time, extracting a thumbnail image based on an evaluation value defined as a weighted sum of the reciprocal of the distance between the screen center and the subject position and the proportion of the screen occupied by the subject area.

Further, in the present invention (Claim 7), the thumbnail image extraction means
uses the information stored in the time-series subject list storage means to calculate the motion amounts of both the subject and the background, takes the times of the local maxima and minima of one of the two (the subject motion amount or the background motion amount), and, in the neighborhood of each such time, sets as an extraction time a time determined by adding a condition on the other motion amount.

Further, in the present invention (Claim 8), the thumbnail image extraction means
uses the information stored in the time-series subject list storage means to calculate the motion amounts of both the subject and the background, and calculates the average subject motion amount between two consecutive local minima of the background motion amount. When a local minimum of the subject motion amount exists in the neighborhood before or after a local maximum of the background motion amount lying between those two minima, and that local minimum is smaller than the average subject motion amount, the time of that local minimum is set as an extraction time.

The present invention (Claim 9) is a video processing program for causing a computer to function as each means constituting the video processing apparatus according to any one of Claims 5 to 8.

The present invention (Claim 10) is a computer-readable recording medium storing the video processing program according to Claim 9.

As described above, according to the present invention, rather than presenting thumbnails for every shot, subject detection is performed; attention is paid, for example, to the places where a subject is detected, and images are extracted and presented from there. This avoids presenting meaningless frame images and presents only frame images that effectively represent the action, which solves problem a above.

Further, instead of using the motion of the whole frame, the present invention detects the subject of interest and uses the motion of the subject region separately from the remaining motion (background motion), extracting thumbnails that express the motion worth attention. In addition, by presenting thumbnails not only at local minima of motion but also at local maxima, the viewer can see the action in progress. This solves problem b above.

Furthermore, by considering not only the size of the moving object but also the motion of the moving object and of the background, thumbnail images that express the movement of the subject and the flow of the scene are extracted. Detecting follow shots with temporal continuity taken into account also solves problem c above.

As described above, the present invention separates the subject from the background and uses motion features and image features for each separately. This makes it possible to distinguish and extract, from the sections of a video in which a subject (moving object) is present, images in which the motion of the subject or background is emphasized and images in which the subject or background is clearly captured. For example, alternately displaying the images at the local maxima and minima of motion produces a slide show that conveys a movement, such as the subject's action or a shift of the camera's viewpoint, together with its progression. In addition, shots in which the camera follows a subject can be extracted relatively stably, and thumbnails of the subject of interest can be extracted.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows the configuration of a video processing apparatus according to an embodiment of the present invention.

The video processing apparatus shown in the figure comprises a video input unit 110, a cut point detection unit 120, an image unit correspondence unit 130, a subject detection unit 140, a thumbnail image extraction unit 150, an output unit 160, a video storage unit 101, a frame buffer 102, a subject video section list storage unit 103, a time-series subject list storage unit 104, and a thumbnail time list storage unit 105.

The video storage unit 101, frame buffer 102, subject video section list storage unit 103, time-series subject list storage unit 104, and thumbnail time list storage unit 105 are storage media such as hard disk devices or memory.

The frame buffer 102 stores frame numbers (video times) and image data. The subject video section list storage unit 103 stores start and end times. The time-series subject list storage unit 104 stores subject IDs, image unit coordinates (sets), classifications, and weights (evaluation values such as area). The thumbnail time list storage unit 105 stores video frame numbers.

The video input unit 110 inputs video data from the video storage unit 101 and outputs, in order, the frame images at designated frame numbers (video times) or the frame images as they are read. In doing so, it acquires the video time information and the frame image corresponding to that video time. It is desirable to store input video frames in the frame buffer 102 so that video data at a specified time can be acquired efficiently. The input video may also be deinterlaced. In what follows, a frame image and its video time are always input and output as a pair.

The cut point detection unit 120 determines whether a cut point, i.e., a discontinuity, exists between the frame time currently being processed and the frame time processed previously. For this, the method described in Japanese Patent No. 2839132, for example, can be used. Motion detection is unreliable at video discontinuities such as cut points, so each video process is performed per continuous video section between cut points (called a shot), and processing across cut points is avoided. This unit can be omitted if the video to be processed is known to contain no cut points, or if cut points are known to cause no processing problems.
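Per-shot processing can be illustrated with a small helper that turns detected cut points into shot intervals, so that motion is only ever estimated between frames of the same shot. It assumes, hypothetically, that each cut point is given as the index of the first frame of a new shot; the cut detection itself (for example the method of Japanese Patent No. 2839132) is not reproduced here, and the function name is illustrative.

```python
from typing import Iterable, List, Tuple


def split_into_shots(num_frames: int, cut_points: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first, last) frame indices of each shot.

    A cut point c means frames c - 1 and c belong to different shots, so
    no motion should be estimated between them. Out-of-range and duplicate
    cut points are ignored.
    """
    bounds = sorted({c for c in cut_points if 0 < c < num_frames})
    starts = [0] + bounds
    ends = [b - 1 for b in bounds] + [num_frames - 1]
    return list(zip(starts, ends))
```

For example, a 10-frame video with cuts at frames 4 and 7 yields the shots (0, 3), (4, 6), and (7, 9).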

The subject detection unit 140 processes the frame image input from the video input unit 110, determines whether a subject (moving object) is present in the image, and stores that information in the time-series subject list storage unit 104. This can be realized by detecting subjects with an object detection (recognition) technique such as face detection, with an object detection method such as background subtraction, or with the feature-point-based moving-object detection method described in Non-Patent Document 2. For each detected subject (moving object), the coordinates (set) at which it is present are stored in the time-series subject list storage unit 104 together with a subject ID assigned to that subject. The coordinates stored by the subject detection unit 140 are those of the image units used to detect the subject and its motion, such as image blocks, pixel sets, feature points, or image templates.

If the subject detection method above internally computes a motion vector for each image unit and the motion of each subject (its inter-frame association), and background motion is not used, the image unit correspondence unit 130 described below can be omitted. Examples of subject detection methods that internally compute the inter-frame correspondence (motion) of subjects include template matching, various object recognition techniques, and motion-based subject detection such as that of Non-Patent Document 2. Background subtraction is an example of a subject detection method that cannot detect subject motion. The following describes the case of a method that can detect subject motion; however, the same processing is possible with a method that cannot, provided the inter-frame association of subjects is performed separately.

Various methods of inter-frame association are conceivable. For example, a provisional number may be recorded as the subject ID, and the image unit correspondence unit 130 may assign the same ID as in the previous frame when the overlap area with a subject region of the previous frame, as a proportion of the area of that previous-frame region, is at least a certain value. Alternatively, the motion vectors between the previous and current frames may be calculated, the destination of each previous-frame subject region estimated as, for example, the average of the motion vectors calculated within that region, and that region's ID assigned to each subject detected in the current frame whose overlap ratio with the moved region is at least a certain value.
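The first association method described above (overlap area divided by the area of the previous-frame region, compared with a threshold) might be sketched as below for rectangular subject regions. The rectangle format, the greedy best-match policy, the threshold of 0.5, and all names are assumptions for illustration; the motion-compensated variant is not shown.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def area(r: Rect) -> int:
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def overlap(a: Rect, b: Rect) -> int:
    """Area of the intersection of two rectangles (0 if disjoint)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def associate(
    prev: Dict[int, Rect], curr: List[Rect], next_id: Iterator[int], threshold: float = 0.5
) -> Dict[int, Rect]:
    """Assign IDs to current-frame regions.

    A current region inherits the ID of the unclaimed previous-frame region
    with the largest overlap / (previous region area), provided that ratio
    is at least `threshold`; otherwise it receives a fresh ID from `next_id`.
    """
    out: Dict[int, Rect] = {}
    for r in curr:
        best_id, best_ratio = None, threshold
        for pid, pr in prev.items():
            pa = area(pr)
            ratio = overlap(r, pr) / pa if pa else 0.0
            if ratio >= best_ratio and pid not in out:
                best_id, best_ratio = pid, ratio
        out[best_id if best_id is not None else next(next_id)] = r
    return out


if __name__ == "__main__":
    prev = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
    curr = [(2, 0, 12, 10), (100, 100, 110, 110)]
    # Fresh IDs start above every existing ID.
    print(associate(prev, curr, count(3)))  # {1: (2, 0, 12, 10), 3: (100, 100, 110, 110)}
```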

The time-series subject list storage unit 104 stores a list for each processed frame image. Each record consists of a subject ID, image unit coordinates (set), a classification, and a weight (an evaluation value such as area). The "subject ID" is a number assigned to each subject or image unit and kept consistent over time. The time-series subject list storage unit 104 stores the position of each subject as an image unit or a set of image units. What is stored as "image unit coordinates (set)" depends on the detection unit used for subject detection: for subject detection and tracking with rectangular image units, the upper-left and lower-right coordinates of the rectangle; for pixel-level object extraction and tracking, the set of pixels forming the object; and for feature-point-based subject detection and tracking, the set of feature point coordinates. The "classification" stores the type of the detected subject. Basically "subject" or "background" is stored, but depending on the subject detection means used, a semantic class such as text (captions), animal, person, or face may also be assigned. The "weight" stores the area (number of pixels) of the detected subject, or a weight representing the region dominated by each feature point (Non-Patent Document 2). The confidence of the detection may also be stored as part of the weight.
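One possible in-memory layout for the time-series subject list is sketched below with Python dataclasses. The fields mirror the four record items described above (subject ID, image unit coordinates, classification, weight); the class names, the choice of a point list to hold rectangles, pixel sets, or feature points alike, and the `track` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SubjectRecord:
    """One record of the time-series subject list (field layout assumed)."""
    subject_id: int
    coords: List[Tuple[int, int]]    # rectangle corners, pixel set, or feature points
    classification: str = "subject"  # "subject", "background", or e.g. "face"
    weight: float = 0.0              # e.g. area in pixels or detection confidence


@dataclass
class TimeSeriesSubjectList:
    """Records grouped by frame number."""
    frames: Dict[int, List[SubjectRecord]] = field(default_factory=dict)

    def add(self, frame_no: int, rec: SubjectRecord) -> None:
        self.frames.setdefault(frame_no, []).append(rec)

    def track(self, subject_id: int) -> List[Tuple[int, SubjectRecord]]:
        """All (frame number, record) pairs for one subject ID, in frame order."""
        return [(f, r) for f in sorted(self.frames)
                for r in self.frames[f] if r.subject_id == subject_id]
```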

The image unit correspondence unit 130 is required when thumbnails are generated using the motion vectors of the region other than the subject (the background). When background motion vectors have not already been estimated, it estimates the motion vector of each image unit and stores it in the time-series subject list storage unit 104, setting the "classification" field to a class distinct from subjects, such as a value representing "background".

The background motion vectors can be calculated, for example, by inputting frame images at multiple times and matching image units between two of those frames. Various techniques can be used, such as conventional block matching or the feature-point tracking technique described in Non-Patent Document 2. The unit region (image unit) for motion vector calculation is, for example, an image block in block matching and a feature point in feature-point tracking.
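As an example of the conventional block matching mentioned above, the following exhaustive-search sketch estimates one motion vector per non-overlapping `b` x `b` block by minimizing the sum of absolute differences (SAD). Frames are assumed to be grayscale images given as nested lists; the block size, search range, and function names are illustrative choices rather than anything specified in the patent.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale, indexed [y][x]


def sad(prev: Image, curr: Image, x: int, y: int, dx: int, dy: int, b: int) -> int:
    """SAD between the b x b block of `prev` at (x, y) and the block of
    `curr` displaced by (dx, dy)."""
    return sum(abs(prev[y + j][x + i] - curr[y + dy + j][x + dx + i])
               for j in range(b) for i in range(b))


def block_motion(
    prev: Image, curr: Image, b: int = 4, search: int = 2
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Exhaustive block matching: best (dx, dy) for each block origin of `prev`."""
    h, w = len(prev), len(prev[0])
    vectors: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for y in range(0, h - b + 1, b):
        for x in range(0, w - b + 1, b):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    if not (0 <= x + dx <= w - b and 0 <= y + dy <= h - b):
                        continue  # displaced block would leave the frame
                    cost = sad(prev, curr, x, y, dx, dy, b)
                    if best is None or cost < best[0]:
                        best = (cost, (dx, dy))
            vectors[(x, y)] = best[1]  # (0, 0) is always valid, so best is set
    return vectors
```

The vectors of blocks lying outside the detected subject regions would then be the candidates recorded with the "background" classification.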

The thumbnail image extraction unit 150 operates when the subject detection unit 140 determines that a subject has disappeared, or when the end of a shot is reached while the subject is still detected. Using the information accumulated in the time-series subject list storage unit 104, it extracts thumbnails from that subject video section and stores their video frame numbers or times in the thumbnail time list storage unit 105.
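The trigger condition for the thumbnail image extraction unit 150 amounts to splitting a shot into runs of frames in which a subject is present. The sketch below assumes a per-frame presence flag within one shot; the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def subject_sections(present: Sequence[bool]) -> List[Tuple[int, int]]:
    """Split one shot into (start, end) runs of frames where a subject is present.

    A run closes either when the subject disappears or when the shot ends
    while the subject is still detected; each closed run is a subject video
    section from which thumbnails would then be extracted.
    """
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, p in enumerate(present):
        if p and start is None:
            start = t                        # subject appears
        elif not p and start is not None:
            sections.append((start, t - 1))  # subject disappeared
            start = None
    if start is not None:
        sections.append((start, len(present) - 1))  # shot ended with subject present
    return sections
```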

The operation of the above configuration is detailed below, divided into:
(1) processing in which the subject detection unit 140 can associate frames and thumbnails are extracted using only the motion of the subject; and
(2) processing in which thumbnails are extracted using the motion of both the subject and the background.

FIG. 4 shows process (1), and FIG. 5 shows process (2).

First, process (1) is described. This process corresponds to the case where the apparatus does not include the image unit correspondence unit 130.

FIG. 4 is a flowchart of the operation (using image-based subject detection) in an embodiment of the present invention.

Step 101) Video input update procedure:
The video input unit 110 reads the next frame to be processed into the frame buffer 102 or similar, making it available to each unit (the cut point detection unit 120, image unit correspondence unit 130, subject detection unit 140, and thumbnail image extraction unit 150). When processing starts for the first time, the number of frames needed by the cut point detection unit 120 and by the subject detection procedure is read into, for example, the frame buffer 102; for subject detection using feature-point tracking, this is two frames.

Step 102) Cut point detection procedure:
The cut point detection unit 120 determines whether there is a cut point between the frames read for processing. If there is, processing returns to the video input update procedure (step 101) so that these frames are skipped; otherwise, it proceeds to the subject detection procedure (step 103).

Step 103) Subject detection procedure:
The subject detection unit 140 determines whether a subject is detected, for example from motion using feature points as image units, stores the result in the time-series subject list storage unit 104, and returns to step 102.

Step 104) Thumbnail detection procedure:
The thumbnail image extraction unit 150 selects the frame images to be extracted as thumbnails based on the information stored in the time-series subject list storage unit 104 in step 103, and stores the video time and related data of each selected image in the thumbnail time list storage unit 105. The detailed procedure is described later with reference to FIG. 6.

Step 105) Determine whether the end of the video has been reached. Various methods are possible, such as attempting to read the next frame and checking whether the read succeeds. If the end has been reached, processing proceeds to step 106; otherwise, it returns to step 101.

Step 106) The output unit 160 performs the termination procedure, such as closing the required output files (including the thumbnail time list storage unit 105) and releasing memory.
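The control flow of steps 101 to 106 can be summarised in a short Python sketch. It is not the patent's implementation: frames are treated as opaque values, and `is_cut`, `detect_subject` and `extract_thumbnails` are hypothetical callbacks standing in for the cut point detection unit 120, the subject detection unit 140 and the thumbnail image extraction unit 150. Following the description of unit 150, extraction is triggered when the subject disappears, when a cut (shot end) is reached while the subject is present, or at the end of the video.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

F = TypeVar("F")  # whatever representation a frame has in a given system


def process_video(
    frames: Iterable[F],
    is_cut: Callable[[F, F], bool],
    detect_subject: Callable[[F, F], bool],
    extract_thumbnails: Callable[[List[int]], List[int]],
) -> List[int]:
    """Return the frame indices chosen as thumbnails (hypothetical sketch of steps 101-106)."""
    thumbnails: List[int] = []      # stands in for thumbnail time list storage 105
    subject_frames: List[int] = []  # stands in for time-series subject list storage 104
    prev: Optional[F] = None
    for idx, frame in enumerate(frames):          # step 101; the loop ends at step 105
        if prev is not None:
            if is_cut(prev, frame):               # step 102: shot boundary
                if subject_frames:                # subject still present at shot end
                    thumbnails += extract_thumbnails(subject_frames)  # step 104
                    subject_frames = []
            elif detect_subject(prev, frame):     # step 103: subject present
                subject_frames.append(idx)
            elif subject_frames:                  # subject has just disappeared
                thumbnails += extract_thumbnails(subject_frames)      # step 104
                subject_frames = []
        prev = frame
    if subject_frames:                            # end of video reached with subject present
        thumbnails += extract_thumbnails(subject_frames)
    return thumbnails                             # step 106: hand results to the output unit
```

For example, with frames given as plain numbers, a cut rule `abs(a - b) > 20`, a subject rule `b > 0` and an extractor that returns the middle index of each section, `process_video([0, 1, 2, 3, 0, 0, 50, 51, 52], ...)` yields `[2, 8]`.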

Next, process (2) is described, that is, the processing when the apparatus configuration includes the image unit correspondence unit 130.

FIG. 5 is a flowchart of the operation in one embodiment of the present invention (using image-based subject detection). Steps identical to those in FIG. 4 carry the same step numbers, and their description is omitted.

The operation shown in FIG. 5 adds an image unit correspondence procedure (step 201) to the operation of FIG. 4. Specifically, the image unit correspondence unit 130 establishes the inter-frame correspondence of each image unit and stores it in the time-series subject list storage unit 104. As described above, this correspondence can be obtained by block matching or by feature point tracking. The step is needed when the subject detection procedure (step 103) does not itself associate subjects across frames or compute motion vectors (image unit correspondence).
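The image unit correspondence of step 201 can be realised by block matching, one of the two options the text names. Below is a minimal exhaustive-search sketch; the sum of absolute differences (SAD) criterion, the block size and the search radius are assumptions, and images are plain lists of grayscale rows rather than any particular library's type.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def match_block(prev: Image, cur: Image, top: int, left: int,
                size: int, radius: int) -> Tuple[int, int]:
    """Return the (dy, dx) displacement of one block that minimises SAD."""
    h, w = len(cur), len(cur[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue  # candidate block would fall outside the current frame
            sad = sum(abs(prev[top + i][left + j] - cur[y0 + i][x0 + j])
                      for i in range(size) for j in range(size))
            if sad < best_sad:  # strict '<' keeps the first best candidate on ties
                best, best_sad = (dy, dx), sad
    return best


def block_motion_field(prev: Image, cur: Image, size: int = 4,
                       radius: int = 2) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Motion vector per non-overlapping block, keyed by the block's top-left corner."""
    field = {}
    for top in range(0, len(prev) - size + 1, size):
        for left in range(0, len(prev[0]) - size + 1, size):
            field[(top, left)] = match_block(prev, cur, top, left, size, radius)
    return field
```

The resulting per-block vectors play the role of image unit correspondences; a feature-point tracker would produce the same kind of data keyed by point instead of by block.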

<Thumbnail extraction method: using subject motion>
Next, the thumbnail extraction method using subject motion in step 104 above is described.

FIG. 6 is a flowchart of the thumbnail image extraction procedure in one embodiment of the present invention.

Step 1041) Subject/background motion amount calculation procedure:
The thumbnail image extraction unit 150 uses the information in the time-series subject list storage unit 104 to calculate the motion amounts of the subject and the background. For example, with the method of Non-Patent Document 2 described above, the image unit coordinates are stored as a set of feature points. A motion vector is taken as the inter-frame difference of corresponding feature points, and, for example, the median of the absolute values of the motion vectors of all subject feature points is calculated as the subject motion amount. When background motion is also used as a thumbnail extraction condition, the median of the absolute motion vector values of the background feature points is likewise calculated as the background motion amount. This calculation is performed between all frames in the shot. Preprocessing such as smoothing is then applied to the resulting motion graph as needed, yielding graphs of subject and background motion. FIG. 7 shows an example graph created for a snowboarding video. If a different method is used in the subject detection unit 140, for example an object recognition method that uses rectangles as image units, the same processing can be performed by using the vector difference between the centroid coordinates of the subject-region rectangles as the subject motion amount. Although the subject and background motion amounts are calculated here as medians, other calculation methods may also be used.
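Step 1041 reduces each frame pair to a single motion amount: the median magnitude of the motion vectors of the tracked points. A sketch, assuming feature points are given as hypothetical dictionaries that map a track ID to an (x, y) position per frame, and using a centred moving average as the optional smoothing (the text does not fix a smoothing method):

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def motion_amount(prev_pts: Dict[int, Point], cur_pts: Dict[int, Point]) -> float:
    """Median motion-vector magnitude over points tracked in both frames."""
    common = prev_pts.keys() & cur_pts.keys()
    mags = [math.dist(prev_pts[k], cur_pts[k]) for k in common]
    return median(mags) if mags else 0.0  # no shared points: treat as no motion


def motion_curve(tracks: List[Dict[int, Point]]) -> List[float]:
    """Motion amount for each consecutive frame pair of a shot."""
    return [motion_amount(a, b) for a, b in zip(tracks, tracks[1:])]


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centred moving average; the window shrinks at the curve's ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out
```

Calling `motion_curve` on subject points gives the subject curve; calling it on background points gives the background curve.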

Step 1042) Thumbnail extraction condition detection procedure:
Thumbnails are detected by taking into account the maxima and minima of each motion graph obtained in the subject/background motion amount calculation procedure (step 1041), together with the position and size of the subject. An example of this detection follows. First, for each motion amount calculated in that procedure, the video times at which it reaches a maximum or minimum are selected. Next, within a certain range before and after each of those times, the video time that best satisfies the conditions on subject position and size is selected. As evaluation criteria for subject position and size, one can use, for example, the reciprocal of the distance between the vertical line through the center of the screen and the subject's center-of-gravity coordinates (a fixed constant when the distance is 0), multiplied by the fraction of the screen occupied by the subject (larger is better). The time that best satisfies the position and size conditions can then be found by using, as the selection criterion, an evaluation value formed as a weighted sum of the position and size criteria.
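The selection in step 1042 can be sketched in two parts: finding the extrema of a motion curve, and choosing, within a window around each extremum, the frame with the best subject composition. The text mentions both a product and a weighted sum of the two criteria; this sketch uses the weighted sum that the claims name. The window size, the weights and the constant used when the subject sits exactly on the centre line are hypothetical parameters.

```python
from typing import List, Optional, Sequence, Tuple


def local_extrema(curve: Sequence[float]) -> List[Tuple[int, str]]:
    """Strict interior local maxima ("max") and minima ("min") of a motion curve."""
    found = []
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]:
            found.append((i, "max"))
        elif curve[i] < curve[i - 1] and curve[i] < curve[i + 1]:
            found.append((i, "min"))
    return found


def composition_score(cx: float, area: float, width: int, height: int,
                      w_pos: float = 1.0, w_size: float = 1.0,
                      centre_value: float = 1.0) -> float:
    """Weighted sum of a position criterion and a size criterion (larger is better)."""
    dist = abs(cx - width / 2)            # distance from the vertical centre line
    pos = 1.0 / dist if dist > 0 else centre_value
    size = area / (width * height)        # fraction of the screen the subject covers
    return w_pos * pos + w_size * size


def pick_thumbnail(t: int, subjects: Sequence[Optional[Tuple[float, float]]],
                   width: int, height: int, window: int = 2) -> int:
    """Frame within +/- window of extremum t whose subject scores best.

    subjects[i] is (centroid_x, area) for frame i, or None if no subject is present.
    Falls back to t itself when no frame in the window contains a subject.
    """
    lo, hi = max(0, t - window), min(len(subjects) - 1, t + window)
    candidates = [i for i in range(lo, hi + 1) if subjects[i] is not None]
    return max(candidates,
               key=lambda i: composition_score(subjects[i][0], subjects[i][1], width, height),
               default=t)
```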

Alternatively, a single motion amount may be calculated without distinguishing between subject and background, and thumbnails detected at its maxima and minima. In this case, thumbnails representing how the screen content as a whole changes can be extracted. The evaluation value may also be computed by any other arithmetic.

Several images before and after a time selected near a motion maximum may also be extracted by any method (for example, a fixed number of images at equal intervals). This is needed for sports videos such as snowboarding, where motion, and hence change, within the video is intense, so that thumbnails allowing the change in motion to be examined in detail can be extracted.
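Extracting several frames at equal intervals around a motion maximum, as suggested for snowboarding footage, can be sketched as follows. The count and spacing are illustrative parameters, and picks are clipped to the shot's frame range, so fewer frames may be returned near its edges.

```python
from typing import List


def frames_around(t: int, count: int, step: int, last_frame: int) -> List[int]:
    """`count` frames spaced `step` apart and centred on t, clipped to [0, last_frame]."""
    half = count // 2
    picks = {min(max(t + (k - half) * step, 0), last_frame) for k in range(count)}
    return sorted(picks)  # clipping can merge picks, so duplicates are removed
```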

When selecting the video times at which maxima or minima occur, the selection may be narrowed by imposing conditions on the heights of the peaks and valleys. For example, an extremum is not selected if its difference from the preceding and following extrema is at most a fixed value. This makes it possible to select only thumbnails from moments when the subject, the background, or the whole screen moved substantially. Adjusting the peak and valley height conditions to the video being processed allows extraction of thumbnails that better express its content.
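The peak and valley height condition can be expressed as a filter over the time-ordered extrema. In this sketch an extremum is kept only if it differs from each neighbouring extremum by more than `min_height`; the text does not say whether one or both neighbours must differ, so requiring both is an assumption.

```python
from typing import List, Sequence, Tuple


def filter_by_height(curve: Sequence[float], extrema: Sequence[Tuple[int, str]],
                     min_height: float) -> List[Tuple[int, str]]:
    """Keep extrema whose value differs from every neighbouring extremum by > min_height.

    `extrema` must be in time order, e.g. [(frame_index, "max"), (frame_index, "min"), ...].
    An extremum with no neighbours has nothing to compare against and is dropped.
    """
    kept = []
    for j, (i, kind) in enumerate(extrema):
        neighbours = [extrema[k][0] for k in (j - 1, j + 1) if 0 <= k < len(extrema)]
        diffs = [abs(curve[i] - curve[n]) for n in neighbours]
        if diffs and min(diffs) > min_height:
            kept.append((i, kind))
    return kept
```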

If subject motion and background motion both show a sudden, impulse-like response with little difference between them, the frame is often not meaningful and need not be detected. The reason is explained later in the section "Thumbnail extraction using background motion".

Furthermore, when subject classes such as person or car are stored in the time-series subject list storage unit 104, the thumbnail selection above may be performed using only the motion amount of a specific subject. In this case, thumbnails focusing on changes in the subject of interest alone can be extracted.

Step 1043) Detection time storage procedure:
The thumbnail times detected in the thumbnail extraction condition detection procedure (step 1042) are stored in the thumbnail time list storage unit 105. Alternatively, the image at the corresponding time may be acquired via the video input unit 110 and stored in a file, memory, or the like.

<Thumbnail image extraction procedure: thumbnail extraction using background motion>
As an example, FIG. 8 shows the subject and background motion graphs created for a drama video in the subject/background motion calculation procedure (step 1041).

Setting the thumbnail extraction condition to the maxima and minima of background motion makes it possible to extract thumbnails that follow camera motion, for example thumbnails representing scene transitions when the camera is moved to switch viewpoints or to survey a landscape.

Viewpoint switching and landscape shooting can be distinguished as follows: a background motion peak is regarded as a "viewpoint switch" when the maxima of background motion and subject motion both rise suddenly above a threshold and the ratio between them is at most a fixed value; any other background motion is regarded as "landscape shooting". This assumes that, in a viewpoint switch, the camera is moved more quickly than in landscape shooting, because the scenes passed through while the camera moves are of no interest. Suddenness can be evaluated, for example, by treating a maximum that exceeds the median of the maxima by at least a set margin as "sudden" when its width at 50% of its height (FIG. 8(b)) is at most a fixed value.
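The viewpoint-switch versus landscape rule can be sketched as below. Suddenness is measured as the width of the background peak at 50% of its height. Comparing background and subject motion at the same frame, using the symmetric ratio `max/min`, and all thresholds are assumptions; the pre-selection of peaks that exceed the median maximum by a margin is omitted for brevity.

```python
from typing import Sequence


def half_max_width(curve: Sequence[float], peak: int) -> int:
    """Number of consecutive samples around `peak` that stay at or above half its value."""
    half = curve[peak] / 2
    left = right = peak
    while left > 0 and curve[left - 1] >= half:
        left -= 1
    while right < len(curve) - 1 and curve[right + 1] >= half:
        right += 1
    return right - left + 1


def classify_background_peak(bg: Sequence[float], subj: Sequence[float], peak: int,
                             motion_thr: float, max_ratio: float, max_width: int) -> str:
    """Label a background-motion maximum as a viewpoint switch or landscape shooting.

    motion_thr must be positive so that the ratio below never divides by zero.
    """
    sudden = half_max_width(bg, peak) <= max_width
    b, s = bg[peak], subj[peak]
    both_large = b >= motion_thr and s >= motion_thr
    similar = both_large and max(b, s) / min(b, s) <= max_ratio
    return "viewpoint_switch" if sudden and similar else "landscape"
```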

A section in which subject motion and background motion are nearly equal, as shown in FIG. 8(c), is a section in which the subject is shot in close-up (Non-Patent Document 2 uses this property to detect close-up shots by detecting abrupt changes in camerawork). When extracting background thumbnails, excessive detection can be prevented, for example, by removing such sections in advance. Such a section can be detected when the average, over a fixed number of consecutive extrema, of the absolute differences between the detected background and subject motion maxima and minima is at most a threshold, or by using the average absolute difference within the section instead of the maxima and minima.
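Close-up sections (FIG. 8(c)) can be flagged using the second variant described, the average absolute difference between background and subject motion. This sketch uses a centred moving window over per-frame differences; the window length and threshold are hypothetical tuning parameters.

```python
from typing import List, Sequence


def close_up_mask(bg: Sequence[float], subj: Sequence[float],
                  window: int, threshold: float) -> List[bool]:
    """True for frames whose windowed mean |background - subject| motion is <= threshold."""
    diff = [abs(b - s) for b, s in zip(bg, subj)]
    half = window // 2
    mask = []
    for i in range(len(diff)):
        seg = diff[max(0, i - half): i + half + 1]  # window shrinks at the curve's ends
        mask.append(sum(seg) / len(seg) <= threshold)
    return mask
```

Frames flagged `True` would be excluded before background-motion thumbnails are selected.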

<Follow shot detection>
By considering subject motion in addition to the background motion described above, the present invention can detect follow shots, in which the camera tracks a subject of interest. FIG. 9 shows an example graph.

When the subject of interest walks and the camera follows it, the apparent motion of the subject in the image becomes relatively small while camera motion occurs. By detecting this pattern, the thumbnail image extraction unit 150 can detect a follow shot. As a concrete example, a camerawork section is defined as the interval between consecutive minima of camera motion (FIG. 9(a)), and the mean subject motion within that section is computed. A follow shot is detected when a subject-motion minimum exists near a background maximum and that minimum is smaller than the section's mean subject motion.
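The follow-shot rule maps directly onto the two motion curves: camerawork sections run between consecutive background minima, and a subject-motion minimum near a background maximum that is below the section's mean subject motion marks a follow shot (the claims take that minimum as the extraction time). A sketch, where the neighbourhood size `near` is an assumed parameter and extrema are strict interior local extrema:

```python
from typing import List, Sequence


def _minima(c: Sequence[float]) -> List[int]:
    return [i for i in range(1, len(c) - 1) if c[i] < c[i - 1] and c[i] < c[i + 1]]


def _maxima(c: Sequence[float]) -> List[int]:
    return [i for i in range(1, len(c) - 1) if c[i] > c[i - 1] and c[i] > c[i + 1]]


def find_follow_shots(bg: Sequence[float], subj: Sequence[float], near: int = 2) -> List[int]:
    """Frame indices of subject-motion minima that indicate a follow shot."""
    bg_min, bg_max, subj_min = _minima(bg), _maxima(bg), _minima(subj)
    hits = set()
    for a, b in zip(bg_min, bg_min[1:]):             # one camerawork section [a, b]
        section_mean = sum(subj[a:b + 1]) / (b - a + 1)
        for p in (m for m in bg_max if a < m < b):   # background peaks inside the section
            for q in subj_min:
                if abs(q - p) <= near and subj[q] < section_mean:
                    hits.add(q)
    return sorted(hits)
```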

By extracting thumbnails before and after a detected follow shot in the same way as in the procedures "Thumbnail extraction method: using subject motion" and "Thumbnail image extraction procedure: thumbnail extraction using background motion" above, thumbnails containing the subject of interest can be extracted.

<Output unit: slideshow presentation method>
The output unit 160 extracts thumbnails from the frame buffer 102 based on the video frame numbers in the thumbnail time list storage unit 105. It is effective to present these thumbnails to the user by a method such as a slideshow that displays the images one after another.

When presenting the slideshow, for example, thumbnails at motion maxima are displayed for a shorter time than thumbnails at motion minima. Adding an effect such as a dissolve when switching between images produces a slideshow that emphasizes motion even more.
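The two presentation rules, shorter display for motion-maximum thumbnails and a dissolve between images, can be sketched as follows. The display durations are illustrative values, and the dissolve is assumed to be a linear cross-fade over flat pixel lists.

```python
from typing import List, Sequence, Tuple


def slideshow_plan(thumbs: Sequence[Tuple[int, str]], short_s: float = 0.5,
                   long_s: float = 2.0) -> List[Tuple[int, float]]:
    """(frame, display seconds) pairs: motion maxima are shown for less time than minima."""
    return [(frame, short_s if kind == "max" else long_s) for frame, kind in thumbs]


def dissolve(img_a: Sequence[float], img_b: Sequence[float], steps: int) -> List[List[int]]:
    """Intermediate frames of a linear cross-fade from img_a to img_b."""
    frames = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # blend weight strictly between 0 and 1
        frames.append([round((1 - t) * a + t * b) for a, b in zip(img_a, img_b)])
    return frames
```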

The thumbnail extraction method best suited to a slideshow depends on the target video. When subject motion is important, as in sports, thumbnail extraction using subject motion is suitable. When scene transitions are also important, as in dramas, thumbnails extracted by both the subject-motion method and the background-motion method are used. When the camera operator tends to keep following a subject of interest, as in personal videos, it is also effective to use the follow-shot thumbnails described above.

The operation of each component of the video processing apparatus shown in FIG. 3 can be implemented as a program and installed on a computer used as the video processing apparatus, or distributed via a network.

The program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to techniques for extracting, from a video, representative images that represent its content.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram of the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the video processing apparatus in one embodiment of the present invention.
FIG. 4 is a flowchart (part 1) of the operation in one embodiment of the present invention.
FIG. 5 is a flowchart (part 2) of the operation in one embodiment of the present invention.
FIG. 6 is a flowchart of the thumbnail image extraction procedure in one embodiment of the present invention.
FIG. 7 shows an example of extracted thumbnails (using subject motion) in one embodiment of the present invention.
FIG. 8 shows a subject/background motion amount graph and an example of extracted thumbnails (using background motion) in one embodiment of the present invention.
FIG. 9 shows a subject/background motion amount graph and an example of extracted thumbnails (using background and subject motion) in one embodiment of the present invention.

Explanation of symbols

101 Video storage unit
102 Video frame storage means, frame buffer
103 Subject video section list storage unit
104 Time-series subject list storage unit
105 Thumbnail time storage means, thumbnail time list storage unit
110 Video input means, video input unit
120 Cut point detection unit
130 Image unit correspondence unit
140 Subject detection means, subject detection unit
150 Thumbnail image extraction means, thumbnail image extraction unit
160 Output unit

Claims

A video processing method for extracting a representative image representing video content from a video,
A video input step in which video input means inputs video and stores it in video frame storage means;
A subject detection step in which subject detection means acquires a video frame from the video frame storage means, detects a subject in the video frame, associates the subject across frames, and stores it, together with its position, in time-series subject list storage means;
A thumbnail image extraction step in which thumbnail image extraction means calculates the amount of movement of the subject stored in the time-series subject list storage means, sets the time point of a maximum or minimum value of the calculated movement amount as an extraction time point, and extracts a thumbnail image based on an evaluation value that is the weighted sum of the reciprocal of the distance between the center of the screen and the subject position and the ratio of the subject area to the screen;
A video processing method characterized by performing the above steps.

A video processing method for extracting a representative image representing video content from a video,
A video input step in which video input means inputs video and stores it in video frame storage means;
A subject detection step in which subject detection means acquires a video frame from the video frame storage means, detects a subject in the video frame, associates the subject across frames, and stores it, together with its position, in time-series subject list storage means;
An image unit correspondence step in which image unit correspondence means calculates an inter-frame motion vector for each image unit and stores background information in the time-series subject list storage means;
A thumbnail image extraction step in which thumbnail image extraction means calculates the amount of movement of the background using the information stored in the time-series subject list storage means, sets the time point of a maximum or minimum value of the calculated movement amount as an extraction time point, and extracts a thumbnail image based on an evaluation value that is the weighted sum of the reciprocal of the distance between the center of the screen and the subject position and the ratio of the subject area to the screen;
A video processing method characterized by performing the above steps.

In the thumbnail image extraction step,
the amount of movement of both the subject and the background is calculated using the information stored in the time-series subject list storage means, and a time determined from the time point of a maximum or minimum value of either the subject movement amount or the background movement amount, with an added condition based on the movement amounts in the vicinity of that time point, is set as the extraction time point.
The video processing method according to claim 2.

In the thumbnail image extraction step,
the amount of movement of both the subject and the background is calculated using the information stored in the time-series subject list storage means, and the average subject movement amount between two consecutive minima of the background movement amount is calculated; when a minimum of the subject movement amount exists in the vicinity of a maximum of the background movement amount between those two consecutive minima, and that minimum is smaller than the average subject movement amount, the time point of that subject-movement minimum is set as the extraction time point.
The video processing method according to claim 2.

A video processing apparatus for extracting a representative image representing video content from video,
Video input means for inputting video and storing it in video frame storage means;
Subject detection means for acquiring a video frame from the video frame storage means, detecting a subject in the video frame, associating the subject across frames, and storing it, together with its position, in time-series subject list storage means;
Thumbnail image extraction means for calculating the amount of movement of the subject stored in the time-series subject list storage means, setting the time point of a maximum or minimum value of the calculated movement amount as an extraction time point, and extracting a thumbnail image in the periphery of the extraction time point based on an evaluation value that is the weighted sum of the reciprocal of the distance between the center of the screen and the subject position and the ratio of the subject area to the screen;
A video processing apparatus characterized by comprising the above means.

A video processing apparatus for extracting a representative image representing video content from video,
Video input means for inputting video and storing it in video frame storage means;
Subject detection means for acquiring a video frame from the video frame storage means, detecting a subject in the video frame, associating the subject across frames, and storing it, together with its position, in time-series subject list storage means;
An image unit correspondence unit that calculates an inter-frame motion vector for each image unit and stores background information in the time-series subject list storage means;
Thumbnail image extraction means for calculating the amount of movement of the background using the information stored in the time-series subject list storage means, setting the time point of a maximum or minimum value of the calculated movement amount as an extraction time point, and extracting a thumbnail image in the periphery of the extraction time point based on an evaluation value that is the weighted sum of the reciprocal of the distance between the center of the screen and the subject position and the ratio of the subject area to the screen;
A video processing apparatus characterized by comprising the above means.

The thumbnail image extraction means
calculates the amount of movement of both the subject and the background using the information stored in the time-series subject list storage means, and sets as the extraction time point a time determined from the time point of a maximum or minimum value of either the subject movement amount or the background movement amount, with an added condition based on the movement amounts in the vicinity of that time point.
The video processing apparatus according to claim 6.

The thumbnail image extraction means
calculates the amount of movement of both the subject and the background using the information stored in the time-series subject list storage means, calculates the average subject movement amount between two consecutive minima of the background movement amount, and, when a minimum of the subject movement amount exists in the vicinity of a maximum of the background movement amount between those two consecutive minima and that minimum is smaller than the average subject movement amount, sets the time point of that subject-movement minimum as the extraction time point.
The video processing apparatus according to claim 6.

A video processing program for causing a computer to function as each means constituting the video processing apparatus according to any one of claims 5 to 8.

A computer-readable recording medium storing the video processing program according to claim 9.