JP2012105205A

JP2012105205A - Key frame extractor, key frame extraction program, key frame extraction method, imaging apparatus, and server device

Info

Publication number: JP2012105205A
Application number: JP2010254049A
Authority: JP
Inventors: Takeshi Matsuo (松尾 武史)
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2010-11-12
Filing date: 2010-11-12
Publication date: 2012-05-31

Abstract

PROBLEM TO BE SOLVED: To extract a key frame from moving image data accurately and at low cost. SOLUTION: A key frame extractor includes: a scene analysis part 510 that analyzes moving image data and detects scenes; a key frame extraction part 520 that extracts, as a key frame, the frame image located at or near the temporal center of the plurality of frame images corresponding to the scene with the longest scene length among the scenes detected by the scene analysis part 510; a feature amount analysis part 530 that normalizes the key frame, extracts image feature amounts, executes clustering processing, and generates a feature amount histogram; and a classification processing part 540 that performs machine learning and classification based on the feature amount histogram.

Description

The present invention relates to a key frame extraction device, a key frame extraction program, a key frame extraction method, an imaging apparatus, and a server device.

A technique for extracting key frames from a video clip (moving image data) is known (see, for example, Patent Document 1). Patent Document 1 discloses a technique in which a camera is equipped with a camera motion sensor, global motion is calculated to form a plurality of video segments, each segment is labeled according to a set of camera motion classes, and key frame candidates are extracted from the labeled segments. The global motion is calculated from the camera work detected by the camera motion sensor or the camera work obtained from the video.

Patent Document 1: Published Japanese Translation of PCT International Application No. 2009-539273

However, the key frame candidate extraction method disclosed in Patent Document 1 requires the camera to be equipped with a camera motion sensor. The method also calculates global motion, which imposes a heavy computational load and therefore requires high-speed processing capability. In other words, implementing the method disclosed in that document is costly.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a key frame extraction device, a key frame extraction program, a key frame extraction method, an imaging apparatus, and a server device that extract key frames from moving image data accurately and at low cost, without requiring a special sensor or high computational capability.

[1] To solve the above problem, a key frame extraction device according to one aspect of the present invention includes: a scene analysis unit that analyzes moving image data to detect scenes; and a key frame extraction unit that extracts a key frame based on the scene length of a scene detected by the scene analysis unit and the temporal position of image data among the plurality of image data corresponding to that scene.
[2] To solve the above problem, a key frame extraction program according to one aspect of the present invention causes a computer to function as: a scene analysis unit that analyzes moving image data to detect scenes; and a key frame extraction unit that extracts a key frame based on the scene length of a scene detected by the scene analysis unit and the temporal position of image data among the plurality of image data corresponding to that scene.
[3] To solve the above problem, a key frame extraction method according to one aspect of the present invention includes: a scene analysis step in which a scene analysis unit analyzes moving image data to detect scenes; and a key frame extraction step in which a key frame extraction unit extracts a key frame based on the scene length of a scene detected by the scene analysis unit in the scene analysis step and the temporal position of image data among the plurality of image data corresponding to that scene.
[4] To solve the above problem, an imaging apparatus according to one aspect of the present invention includes: an imaging unit that captures images to generate moving image data; a scene analysis unit that analyzes the moving image data generated by the imaging unit to detect scenes; a key frame extraction unit that extracts a key frame based on the scene length of a scene detected by the scene analysis unit and the temporal position of image data among the plurality of image data corresponding to that scene; a thumbnail generation unit that generates reduced image data based on the key frame extracted by the key frame extraction unit; and a display unit that displays the reduced image data generated by the thumbnail generation unit.
[5] To solve the above problem, a server device according to one aspect of the present invention includes: a moving image data storage unit that stores moving image data; a scene analysis unit that analyzes the moving image data stored in the moving image data storage unit to detect scenes; a key frame extraction unit that extracts a key frame based on the scene length of a scene detected by the scene analysis unit and the temporal position of image data among the plurality of image data corresponding to that scene; and a thumbnail generation unit that generates reduced image data based on the key frame extracted by the key frame extraction unit and stores the reduced image data in the moving image data storage unit in association with the moving image data.

According to the present invention, key frames can be extracted from moving image data accurately and at low cost, without requiring a special sensor or high computational capability.

FIG. 1 is a block diagram showing the functional configuration of an imaging apparatus to which the key frame extraction device according to the first embodiment of the present invention is applied.
FIG. 2 is a block diagram showing the functional configuration of the key frame extraction device in the same embodiment.
FIG. 3 is a schematic diagram for explaining spatiotemporal image data in the same embodiment.
FIG. 4 is a diagram schematically showing how the key frame extraction unit extracts a key frame from normalized moving image data in the same embodiment.
FIG. 5 shows an example of a key frame image extracted by the key frame extraction unit in the same embodiment.
FIG. 6 is a flowchart showing the operating procedure of the key frame extraction device in the same embodiment.
FIG. 7 is a diagram schematically showing a list of compressed moving image data displayed on the display unit in the same embodiment.
FIG. 8 is a diagram schematically showing how the key frame extraction unit extracts three key frames from normalized moving image data in a first modification of the first embodiment.
FIG. 9 shows examples of the three key frame images extracted by the key frame extraction unit in the same modification.
FIG. 10 is a diagram schematically showing, for the same modification, the histogram of each of the three key frame images in FIG. 9(a) and the integrated histogram that the clustering processing unit generates by adding these three histograms class by class.
FIG. 11 is a diagram schematically showing a list of compressed moving image data displayed on the display unit in the same modification.
FIG. 12 is a diagram schematically showing how the key frame extraction unit extracts three key frames from normalized moving image data in a second modification of the first embodiment.
FIG. 13 shows examples of the three key frame images extracted by the key frame extraction unit in the same modification.
FIG. 14 is a diagram schematically showing, for the same modification, the histogram of each of the three key frame images in FIG. 13(a) and the integrated histogram that the clustering processing unit generates by adding these three histograms class by class.
FIG. 15 is a diagram schematically showing how the key frame extraction unit extracts nine key frames from normalized moving image data in a third modification of the first embodiment.
FIG. 16 is a block diagram showing the overall configuration of a network system including a server device to which the key frame extraction device according to the second embodiment of the present invention is applied.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of an imaging apparatus to which the key frame extraction device according to the first embodiment of the present invention is applied. As shown in the figure, the imaging apparatus 100 has a configuration in which an imaging unit 110, a control unit 190, an operation unit 180, an image processing unit 140, a display unit 150, a storage unit 160, a buffer memory unit 130, and a communication unit 170 are connected via a bus 300.
A storage medium 200 is detachably attached to the imaging apparatus 100. The storage medium 200 may instead be built into the imaging apparatus 100.

The imaging unit 110 is controlled by the control unit 190 based on imaging conditions (for example, aperture value and exposure value) set by the control unit 190, and generates image data by capturing the light flux arriving from the subject. In the present embodiment, the image data is frame image data. As described later, the imaging apparatus 100 handles still image data, which is photographic data, and moving image data, which is video data. Still image data consists of a single piece of image data, whereas moving image data consists of a plurality of pieces of image data. The imaging apparatus 100 also compresses still image data and moving image data and stores them in the storage medium 200. Still image data after compression is referred to as compressed still image data, and moving image data after compression as compressed moving image data. However, the imaging apparatus 100 may also store still image data in the storage medium 200 uncompressed.
The following description covers the case where the imaging apparatus 100 handles moving image data.
In the present embodiment, data obtained by decoding compressed moving image data is also referred to as moving image data.

The imaging unit 110 includes, as its functional configuration, an optical system 111, an image sensor 119, and an analog/digital (A/D) conversion unit 120.
The optical system 111 has a lens group including an objective lens and a converging lens, and condenses the light flux arriving from the subject to form an image on the imaging surface of the image sensor 119.
The image sensor 119 captures the subject image formed on the imaging surface by photoelectric conversion and generates an image signal, which is an analog signal. The image sensor 119 is realized by a solid-state image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
The A/D conversion unit 120 takes in the image signal generated by the image sensor 119 and converts it into image data, which is digital data.

The optical system 111 may be provided integrally with the imaging apparatus 100 or may be detachably attached to it.

The control unit 190 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The control unit 190 controls each unit via the bus 300 by loading a control program stored in the ROM into the RAM and executing its steps. For example, the control unit 190 causes the display unit 150 to display the image data output from the A/D conversion unit 120, or causes the image processing unit 140 to compress that image data and stores the result in the storage medium 200. The control unit 190 also reads compressed still image data and compressed moving image data stored in the storage medium 200 into the buffer memory unit 130, has the image processing unit 140 decode them, and displays them on the display unit 150.

The control unit 190 also includes, as its functional configuration, a key frame extraction device 500 and a thumbnail generation unit 550. The key frame extraction device 500 takes in moving image data that the control unit 190 has read from the storage medium 200 into the buffer memory unit 130 and that the image processing unit 140 has decoded. The key frame extraction device 500 then detects scenes in the moving image data by estimating the timings at which the scene changes (scene cut timings). Based on the estimated scene cut timings, it extracts a key frame and outputs information identifying that key frame (for example, the frame identification information described later). The key frame extraction device 500 also classifies the key frame and outputs classification data.

A scene in a captured moving image is, for example, a section of video in which the same subject is depicted with temporal continuity, and a scene often expresses the theme intended by the photographer. A boundary between two consecutive scenes (a scene cut) arises when the imaging apparatus 100 starts capturing from a stopped state, when special effect video such as a dissolve or wipe is inserted during shooting, or through camera work such as a quick pan or zoom of the imaging apparatus 100. A scene cut may also be introduced by editing after shooting.
A key frame is one or more pieces of image data representative of the moving image data. The key frame image is preferably an image that expresses the theme of the scene. The key frame extraction device 500 of the present embodiment extracts key frames from moving image data with high accuracy.

Based on the key frame extracted by the key frame extraction device 500, the thumbnail generation unit 550 generates thumbnail image data (reduced image data) by reducing the resolution of the image data corresponding to the key frame, and stores the thumbnail image data in the storage medium 200 in association with the moving image data that contains the key frame.
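The passage does not say how the resolution is reduced. As a minimal sketch, assuming a grayscale image held as a list of rows and an integer reduction factor, box-filter averaging could look like this; `make_thumbnail` and its parameters are hypothetical, not part of the patent.

```python
def make_thumbnail(image, factor):
    """Downscale a grayscale image (list of rows) by an integer factor.

    Each output pixel is the mean of a factor x factor block of input
    pixels (a box filter). Trailing rows/columns that do not fill a
    whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out_h = len(image) // factor
    out_w = len(image[0]) // factor if image else 0
    thumb = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += image[y][x]
            row.append(total / (factor * factor))  # block average
        thumb.append(row)
    return thumb
```

Integer-factor averaging keeps the sketch simple; a real device would more likely resample to a fixed thumbnail size.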

The operation unit 180 has operation keys such as a power switch, a shutter button, a cross key, an OK button, a cancel button, and a menu button. It generates operation key signals in response to operations of these keys by an operator such as the photographer and supplies them to the control unit 190.
The image processing unit 140 performs image processing on the image data stored in the buffer memory unit 130 based on the image processing conditions stored in the storage unit 160. The image processing conditions relate, for example, to image data compression, such as the designation of a compression format (for example, MPEG (Moving Picture Experts Group)-4, Motion JPEG (Joint Photographic Experts Group), or MPEG-2) and the degree of compression.
The buffer memory unit 130 is a storage unit that temporarily stores (buffers) data relating to various images, such as still image data, moving image data, compressed still image data, and compressed moving image data. The buffer memory unit 130 stores, for example, the image data output from the A/D conversion unit 120, various data that the control unit 190 exchanges with the storage medium 200 via the communication unit 170, and various data generated when the image processing unit 140 performs compression and decoding.

The display unit 150 is realized by, for example, a liquid crystal display device, and displays the image currently captured by the imaging unit 110, still image data or moving image data decoded by the image processing unit 140, and various menus.
The storage unit 160 stores information such as the imaging conditions referred to by the control unit 190 and the image processing conditions referred to by the image processing unit 140.
The communication unit 170 has a connection interface to which the storage medium 200 can be connected. When the storage medium 200 is connected to the connection interface, the communication unit 170, under the control of the control unit 190, writes, reads, or erases compressed still image data, compressed moving image data, and the like on the storage medium 200.

The storage medium 200 is realized by, for example, a memory card. The storage medium 200 stores compressed still image data and compressed moving image data, as well as thumbnail image data associated with moving image data.

Next, the configuration of the key frame extraction device 500 will be described. FIG. 2 is a block diagram showing the functional configuration of the key frame extraction device 500. As shown in the figure, the key frame extraction device 500 includes a scene analysis unit 510, a key frame extraction unit 520, a feature amount analysis unit 530, and a classification processing unit 540.
When the key frame extraction device 500 takes in moving image data that the control unit 190 has read from the storage medium 200 into the buffer memory unit 130 and that the image processing unit 140 has decoded, it supplies the moving image data to both the scene analysis unit 510 and the feature amount analysis unit 530.

The scene analysis unit 510 takes in and analyzes the supplied moving image data, and detects scenes by estimating scene cut timings. The scene analysis unit 510 includes, as its functional configuration, a spatiotemporal image generation unit 511 and a scene detection unit 512.

The spatiotemporal image generation unit 511 normalizes the moving image data according to predetermined attributes and then, from the normalized moving image data, generates spatiotemporal image data representing how the pixel values of the frame images change over time. The predetermined attributes are, for example, the aspect ratio, resolution, color space, and frame rate of the image data. The process by which the spatiotemporal image generation unit 511 generates spatiotemporal image data will be described in detail later.
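The details of spatiotemporal image generation are deferred in the text. One common construction, assumed here purely for illustration, takes the same horizontal scan line (slit) from every frame and stacks those lines in time order, so that row t of the result is the slit of frame t. `spatiotemporal_image` and its `band` parameter are hypothetical.

```python
def spatiotemporal_image(frames, band=1):
    """Stack a horizontal slit from every frame into an x-t image.

    frames: grayscale frames in time order, each a list of equal-length rows
    band:   number of rows around the vertical centre to average (assumed)
    Row t of the result holds the slit of frame t, so time runs downward
    and horizontal position runs across.
    """
    xt = []
    for frame in frames:
        h, w = len(frame), len(frame[0])
        top = max(0, h // 2 - band // 2)  # first row of the centred band
        rows = frame[top:top + band]
        xt.append([sum(r[x] for r in rows) / len(rows) for x in range(w)])
    return xt
```

In such an x-t image a steady shot shows up as smooth vertical streaks, while an abrupt change between consecutive rows is one possible cue for a scene cut.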

The scene detection unit 512 takes in the spatiotemporal image data generated by the spatiotemporal image generation unit 511 and detects scenes by analyzing that data to estimate scene cut timings. The scene detection unit 512 extracts the frame identification information of the frame images corresponding to all the estimated scene cut timings, associates this frame identification information with the normalized moving image data, and supplies both to the key frame extraction unit 520. Frame identification information identifies each frame image that makes up the normalized moving image data, in other words, each piece of image data that makes up the moving image data; it is, for example, time stamp information. Time stamp information is expressed, for example, as "hour:minute:second.frame number".
The process by which the scene detection unit 512 detects scenes will be described in detail later.
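Neither the cut criterion nor the time stamp layout beyond "hour:minute:second.frame number" is specified here. The sketch below assumes a cut wherever the mean absolute difference between consecutive rows of an x-t image (one row per frame) exceeds a threshold, and a constant integer frame rate (30 fps by default) for the time stamp; `detect_cuts`, `timestamp`, `threshold`, and `fps` are all hypothetical.

```python
def detect_cuts(xt, threshold):
    """Return indices of frames assumed to start a new scene.

    A cut is declared between frames t-1 and t when the mean absolute
    difference of their rows in the x-t image exceeds `threshold`
    (an assumed criterion). Frame 0 always starts the first scene.
    """
    if not xt:
        return []
    cuts = [0]
    for t in range(1, len(xt)):
        prev, cur = xt[t - 1], xt[t]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(t)
    return cuts


def timestamp(frame_index, fps=30):
    """Format a frame index as 'hour:minute:second.frame number'.

    Assumes a constant integer frame rate `fps`.
    """
    seconds, frame = divmod(frame_index, fps)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return f"{hour:02d}:{minute:02d}:{second:02d}.{frame:02d}"
```

For example, `[timestamp(t) for t in detect_cuts(xt, 10)]` would turn the detected cuts into frame identification strings of the kind described above.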

The key frame extraction unit 520 takes in the frame identification information and the normalized moving image data supplied from the scene detection unit 512 of the scene analysis unit 510, extracts one frame image from the normalized moving image data as the key frame, and supplies the frame identification information corresponding to the extracted frame image both to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

In the present embodiment, the key frame extraction unit 520 extracts key frames based on the following three assumptions about the characteristics of moving images.
Assumption 1: The composition changes little within a scene.
Assumption 2: Frame images near a scene cut are unlikely to express the theme strongly.
Assumption 3: The theme of a moving image is contained in a scene with a long scene length.

Assumptions 3 and 1 are based on the inference that, in a scene with a long scene length, the photographer maintained interest in the subject. Assumption 2 is based on the high likelihood that, near the end of a scene, the photographer's interest in the scene has waned or been lost, a special video effect such as a fade-out is in progress, or shooting is about to end. Assumption 2 is also based on the high likelihood that, just after a scene starts, the photographer is still deciding the composition of the new scene, a special video effect such as a fade-in is in progress, or shooting has only just begun.
The method by which the key frame extraction unit 520 extracts key frames will be described later.
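The abstract states that the key frame is the frame image at or near the temporal center of the scene with the longest scene length, which is consistent with Assumptions 1 to 3. A minimal sketch of that selection, assuming scenes are given as sorted scene-start frame indices beginning at 0 (`extract_key_frame` and its parameters are hypothetical names), is:

```python
def extract_key_frame(cuts, num_frames):
    """Return the frame at or next to the centre of the longest scene.

    cuts:       sorted start indices of the scenes, beginning with 0
    num_frames: total number of frames in the moving image
    """
    ends = cuts[1:] + [num_frames]
    scenes = list(zip(cuts, ends))  # (start, end) pairs, end exclusive
    start, end = max(scenes, key=lambda s: s[1] - s[0])  # first longest wins
    return start + (end - start) // 2  # temporal centre, rounded down
```

With `cuts = [0, 2]` and 10 frames, the scenes span frames 0 to 1 and 2 to 9, so the center of the longer scene, frame 6, is returned. Ties go to the earliest scene, an arbitrary choice in this sketch.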

The feature amount analysis unit 530 takes in the moving image data and the frame identification information supplied from the key frame extraction unit 520, extracts image feature amounts (hereinafter simply "feature amounts") from the frame image corresponding to the frame identification information, and executes clustering processing.
The feature amount analysis unit 530 includes, as its functional configuration, an image normalization unit 531, a feature amount extraction unit 532, and a clustering processing unit 533.

The image normalization unit 531 extracts the image data corresponding to the frame identification information from the moving image data and normalizes the extracted image data according to predetermined attributes to generate normalized image data. The predetermined attributes are, for example, the aspect ratio, resolution, and color space of the frame image.
Specifically, for example, the image normalization unit 531 first generates luminance image data by extracting only the luminance component from the key frame image data. Next, it trims the luminance image data so that the frame image has an aspect ratio of 4:3. For example, when trimming luminance image data with an aspect ratio such as 16:9, 21:9, or 1:1 to 4:3, rectangular portions at both horizontal ends of the untrimmed luminance image data are cut off so that the aspect ratio becomes 4:3. Alternatively, the aspect ratio may be matched by adding plain (for example, black or gray) rectangular frames to both horizontal ends of the untrimmed luminance image data. Finally, the image normalization unit 531 converts the trimmed luminance image data, by filtering or similar processing, into normalized image data with a resolution of 320 horizontal pixels by 240 vertical pixels.
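The three normalization steps can be sketched as follows. The luminance weights (ITU-R BT.601), the rule of cropping only when the frame is wider than 4:3 and padding with gray when it is narrower, and nearest-neighbour resampling in place of the unspecified "filtering processing" are all assumptions; `normalize_key_frame` is a hypothetical name.

```python
def normalize_key_frame(rgb, out_w=320, out_h=240, pad_value=128):
    """Luminance-only, 4:3, out_w x out_h version of an RGB key frame.

    rgb: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    """
    h = len(rgb)
    w = len(rgb[0])
    # 1. Keep only luminance (ITU-R BT.601 weights, an assumption).
    lum = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]
    # 2. Make the aspect ratio 4:3 by acting on both horizontal ends.
    target_w = round(h * 4 / 3)
    if w > target_w:  # wider than 4:3 (e.g. 16:9): cut both sides
        left = (w - target_w) // 2
        lum = [row[left:left + target_w] for row in lum]
    elif w < target_w:  # narrower than 4:3 (e.g. 1:1): pad both sides
        left = (target_w - w) // 2
        right = target_w - w - left
        lum = [[pad_value] * left + row + [pad_value] * right for row in lum]
    # 3. Resample to out_w x out_h (nearest neighbour, an assumption).
    return [[lum[y * h // out_h][x * target_w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

The defaults reproduce the 320 x 240 (4:3) target stated above; a production version would low-pass filter before downsampling to avoid aliasing.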

The image normalization unit 531 is provided because a difference in frame aspect ratio changes the density of subjects in the key frame, and a difference in resolution changes the amount of information the key frame carries; either would otherwise affect the evaluation of the key frame's feature amounts.

The feature amount extraction unit 532 takes in the normalized image data generated by the image normalization unit 531 and extracts feature amounts from it. For example, the feature amount extraction unit 532 examines the Gaussian distribution of luminance in each small region of multiple pixels in the normalized image data. It then obtains a SIFT (Scale-Invariant Feature Transform) feature amount for each feature point.
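A full SIFT implementation is beyond a short example, but its core building block can be sketched in plain Python: a magnitude-weighted histogram of gradient orientations over one image patch. This is a simplified, hypothetical illustration, not the patent's code. Standard SIFT adds further steps that are omitted here:

- difference-of-Gaussians keypoint detection;
- Gaussian weighting;
- orientation normalization;
- 4×4 cells of 8 bins each, giving a 128-dimensional descriptor.

```python
import math


def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations in one patch.

    One cell of a SIFT-like descriptor only; border pixels are skipped.
    Angles follow image coordinates (y grows downward).
    """
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)   # range 0 .. 2*pi
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist
```

For example, a patch whose luminance rises steadily left to right puts all of its weight into bin 0.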

The clustering processing unit 533 takes in the feature amounts of the normalized image data extracted by the feature amount extraction unit 532. It performs clustering (bag-of-words processing) on the per-feature-point feature amounts and generates a histogram of feature amounts (a feature amount histogram). For example, the clustering processing unit 533 uses the K-means method to classify the feature amounts of the normalized image data into K clusters (for example, K = 1000), from which it generates the feature amount histogram.
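The bag-of-words step can be sketched as plain Lloyd's k-means followed by nearest-centroid counting. This is a toy under stated assumptions, not the patent's implementation. It clusters 2-D points rather than 128-dimensional SIFT descriptors, with K = 2 instead of a value such as 1000. It also initializes centroids from the first K points, and learns the codebook from the same kind of descriptors that are later counted. All of these choices are illustrative.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(centroids, v):
    """Index of the closest centroid (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], v))


def kmeans(points, k, iters=20):
    """Lloyd's k-means; initial centroids are the first k points (assumed)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall into each cluster (visual word)."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist
```

Each key frame then becomes a fixed-length vector of K counts, whatever the number of feature points it contains.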

The classification processing unit 540 analyzes the feature amount histogram generated by the clustering processing unit 533. It classifies the normalized image data, that is, the key frame, and outputs classification data. For example, the classification processing unit 540 learns the classification of feature amounts by machine learning and outputs the classification result as classification data. The classification data is either identification information corresponding to the category, or a keyword determined in advance through learning, such as "tennis game", "cookie making", or "soccer". The classification processing unit 540 learns the classification of feature amounts using, for example, a support vector machine (SVM) or a neural network.
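As one concrete instance of the SVM option, the sketch below trains a binary linear SVM with Pegasos-style stochastic sub-gradient descent on histogram vectors. This is an illustrative assumption, not the patent's classifier; several choices are illustrative:

- the regularization constant;
- the fixed visiting order, used instead of random sampling;
- the appended bias feature.

Multiple keywords such as "tennis game" or "soccer" would need, for example, one such classifier per keyword (one-vs-rest).

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=10):
    """Pegasos-style linear SVM training.

    xs: feature vectors (e.g. feature amount histograms); ys: labels +1 / -1.
    A constant 1 is appended to each vector so the bias is learned as a weight.
    """
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            xa = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1 - eta * lam) * wi for wi in w]        # regularization shrink
            if margin < 1:                                # hinge-loss sub-gradient
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```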

Next, the processing by which the spatiotemporal image generation unit 511 generates spatiotemporal image data will be described. The steps are as follows:

1. The spatiotemporal image generation unit 511 thins out the image data of the moving image data. For example, it thins moving image data with a frame rate of 30 frames per second (fps) down to 2 fps. This thinning may be omitted.
2. It extracts only the luminance component from the thinned moving image data to generate luminance moving image data.
3. It trims each image of the luminance moving image data so that, for example, the frame aspect ratio is 4:3. This trimming is the same as the processing in the image normalization unit 531 described above.
4. It converts the trimmed luminance moving image data, by filtering or similar processing, into normalized moving image data with a resolution of, for example, 320 horizontal × 240 vertical pixels.
5. Finally, it generates spatiotemporal image data from the normalized moving image data.
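The thinning step can be sketched as time-based frame selection. This is a hypothetical helper, not from the source: each kept frame is the first one at or after the next 0.5 s tick, so 30 fps becomes 2 fps (every 15th frame). The small tolerance is an assumption that guards against floating-point rounding with non-integer rates such as 29.97 fps.

```python
def thin_frames(frames, src_fps=30.0, dst_fps=2.0):
    """Keep the first frame at or after each 1/dst_fps tick."""
    out, next_t = [], 0.0
    for i, frame in enumerate(frames):
        t = i / src_fps
        if t + 1e-9 >= next_t:
            out.append(frame)
            next_t += 1.0 / dst_fps
    return out
```

At these rates, 351 seconds of video (10,530 frames at 30 fps) thins to 702 frames, which matches the 702 frames in the FIG. 3 example.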

FIG. 3 is a schematic diagram for explaining the spatiotemporal image data. FIG. 3(a) schematically shows the frame structure of the normalized moving image data from which the spatiotemporal image data is generated. FIG. 3(b) schematically shows the data structure of the spatiotemporal image data generated by the spatiotemporal image generation unit 511 from the normalized moving image data shown in FIG. 3(a).
As shown in FIG. 3(a), the normalized moving image data has the following attributes:

- a frame aspect ratio of 4:3;
- a resolution of 320 horizontal × 240 vertical pixels;
- a frame rate of 2 frames per second.

The example shown has a playback time of 351 seconds. The horizontal lines in one frame image are denoted line 1, line 2, ..., line 30 from the top to the bottom of the frame image.

The spatiotemporal image generation unit 511 converts the normalized moving image data into spatiotemporal image data, as shown in FIG. 3(b). This is two-dimensional data of time (frame position) versus spatial coordinate. In other words, for each frame image of the normalized moving image data, the spatiotemporal image generation unit 511 arranges the pixel values from line 1 to line 30 in a single row along the spatial coordinate axis. The spatiotemporal image data obtained from the normalized moving image data shown in FIG. 3(a) is a single image of 702 frames along the time axis × 1200 pixels along the spatial coordinate axis.
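The figures above fit together arithmetically: 351 s × 2 fps = 702 time positions, and 1200 spatial values ÷ 30 lines = 40 values per line. The source does not say how a 320-pixel line becomes 40 values. The sketch below assumes 8×8 block averaging (320/8 = 40, 240/8 = 30), purely as a hypothetical way to reach those numbers, and then concatenates lines 1 to 30 of each frame into one spatial column.

```python
def block_average(frame, block=8):
    """Reduce a 320x240 luma frame to 40x30 by averaging 8x8 blocks (assumed)."""
    h, w = len(frame), len(frame[0])
    return [[sum(frame[y][x]
                 for y in range(by, by + block)
                 for x in range(bx, bx + block)) / block ** 2
             for bx in range(0, w, block)]
            for by in range(0, h, block)]


def spatiotemporal_image(frames):
    """Return st with st[s][t] = value at spatial index s in frame t."""
    columns = []
    for frame in frames:
        small = block_average(frame)
        # Lines 1..30, top to bottom, laid end to end along the spatial axis.
        columns.append([v for line in small for v in line])
    # Transpose so rows are spatial positions and columns are time (frames).
    return [list(row) for row in zip(*columns)]
```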

Next, the processing by which the scene detection unit 512 detects scenes will be described. The scene detection unit 512 detects image edges in the spatiotemporal image data, for example by calculating luminance gradients. For example, the scene detection unit 512 uses the Canny method to detect the image edges. The edges detected here include vertical, horizontal, and diagonal straight-line directions.
Next, from the detected edges, the scene detection unit 512 detects edges that form straight lines parallel to the spatial coordinate axis. For example, it executes a Hough transform to extract only the straight-line components parallel to the spatial coordinate axis.

Next, the scene detection unit 512 estimates, for each frame image, whether that frame image corresponds to a shot cut. Specifically, it calculates, for each frame image (each pixel along the time direction), the sum of the straight-line components in the spatial coordinate direction. If the sum is equal to or greater than a predetermined threshold, the frame image is estimated to be a shot cut. If the sum is less than the threshold, the frame image is estimated not to be a shot cut. Each estimated shot cut is a candidate scene cut timing.

The threshold can be set arbitrarily. For example, it can be defined as a fraction of the height of the spatiotemporal image data in the spatial coordinate direction, such as 1/60 of that height (20 pixels when the height is 1200 pixels).
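The shot-cut test reduces to a per-column sum and a comparison. The sketch below assumes the Canny and Hough stage (for example OpenCV's `cv2.Canny` and `cv2.HoughLines`, not shown) has already produced a binary map. In this hypothetical map, `line_map[s][t]` is 1 where a pixel lies on a line parallel to the spatial axis. The threshold is computed as height / 60, matching the 20-pixel example for a height of 1200.

```python
def shot_cut_candidates(line_map, divisor=60):
    """Frames whose spatial-axis line pixels reach height / divisor.

    line_map[s][t] is 1 for a pixel on a spatial-axis-parallel line, else 0.
    """
    height = len(line_map)
    threshold = height / divisor            # 1200 / 60 = 20 pixels
    n_frames = len(line_map[0])
    return [t for t in range(n_frames)
            if sum(line_map[s][t] for s in range(height)) >= threshold]
```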

Next, the scene detection unit 512 removes, from the straight-line components estimated to be shot cuts, cuts that are merely shots and not valid scenes. Such cuts include, for example, special-effect transitions such as dissolves and wipes. Specifically, suppose the time interval between two straight-line components adjacent in the time direction is shorter than a predetermined time. In that case, the scene detection unit 512 excludes at least one of those two components from the scene cut timing candidates. The predetermined time can be set arbitrarily and is, for example, 2 seconds (4 pixels in the time direction in FIG. 3(b)).
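The source only says "at least one" of two close candidates is dropped. The hypothetical sketch below makes one specific choice: it keeps the earlier cut and drops any later candidate closer than the minimum gap to the last kept cut. At 2 fps, 2 seconds corresponds to 4 frames (pixels) on the time axis.

```python
def drop_close_cuts(candidates, fps=2, min_seconds=2):
    """Remove candidates closer than min_seconds to the previously kept cut."""
    min_gap = fps * min_seconds          # 2 s at 2 fps = 4 frames (pixels)
    kept = []
    for t in sorted(candidates):
        if kept and t - kept[-1] < min_gap:
            continue                     # likely a dissolve, wipe, or similar
        kept.append(t)
    return kept
```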

Next, the scene detection unit 512 outputs the normalized moving image data together with the frame identification information of the frame images corresponding to all straight-line components that remain as scene cut timing candidates.

Next, the processing by which the key frame extraction unit 520 extracts a key frame will be described with reference to FIG. 4. FIG. 4 schematically shows how the key frame extraction unit 520 extracts a key frame from the normalized moving image data.

The key frame extraction unit 520 takes in the frame identification information and normalized moving image data output by the scene detection unit 512 of the scene analysis unit 510. It then calculates the scene length of each scene in the normalized moving image data. The scene length is a time length or a number of frames. The frame image corresponding to each piece of frame identification information is the first frame image of a scene, and the key frame extraction unit 520 calculates each scene length on that basis. For example, consider the scene structure shown in FIG. 4(a), in which each identified frame image is the first frame of a scene. The key frame extraction unit 520 counts the first scene (scene 1) through the fifth scene (scene 5) as 10, 6, 15, 12, and 4 frames, respectively.
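Scene lengths follow directly from the scene start positions. This is a minimal sketch assuming frame identification information is given as zero-based frame indices, with the first scene starting at frame 0. The start positions below are hypothetical values chosen to reproduce the FIG. 4(a) counts.

```python
def scene_lengths(scene_starts, total_frames):
    """Frames per scene, given the first frame index of every scene."""
    starts = sorted(scene_starts)
    ends = starts[1:] + [total_frames]   # each scene ends where the next begins
    return [end - start for start, end in zip(starts, ends)]


# Hypothetical starts giving the FIG. 4(a) counts of 10, 6, 15, 12, and 4 frames.
print(scene_lengths([0, 10, 16, 31, 43], 47))   # [10, 6, 15, 12, 4]
```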

Next, in accordance with Assumption 3 described above, the key frame extraction unit 520 detects the scene with the longest scene length (the largest number of frames). For example, by comparing the frame counts of the scenes, the key frame extraction unit 520 detects the third scene (scene 3: 15 frames) as the scene with the most frames, as shown in FIG. 4(b).
Next, the key frame extraction unit 520 examines the frame images of the normalized moving image data that correspond to the detected scene, and detects those whose luminance values fall within predetermined ranges. Specifically, it detects either or both of the following:

- low-luminance frame images, whose average luminance value is lower than a first threshold;
- high-luminance frame images, whose average luminance value is higher than a second threshold (second threshold > first threshold).

FIG. 4(b) shows an example in which four low- or high-luminance frame images are detected among the 15 frame images of the third scene (scene 3).
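Choosing the longest scene and screening out low- and high-luminance frames can be sketched as follows. This is a hypothetical illustration: the threshold values 30 and 225 (on an 8-bit scale) and the first-wins tie rule are assumptions, since the source leaves both thresholds unspecified.

```python
def longest_scene(lengths):
    """Index of the scene with the most frames (first one wins on ties)."""
    return max(range(len(lengths)), key=lambda i: lengths[i])


def mean_luma(frame):
    """Average luminance of a frame given as rows of luma values."""
    return sum(map(sum, frame)) / (len(frame) * len(frame[0]))


def usable_frames(frames, low=30, high=225):
    """Indices of frames that are neither low- nor high-luminance.

    A frame is low-luminance if its mean is below `low` (first threshold)
    and high-luminance if its mean is above `high` (second threshold).
    """
    return [i for i, f in enumerate(frames) if low <= mean_luma(f) <= high]
```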

Next, in accordance with Assumption 2 described above, the key frame extraction unit 520 considers the frame images that remain after the low-luminance frame images, the high-luminance frame images, or both are excluded. From these, it extracts as the key frame the one frame image located at or near the time-series center (for example, closest to the time-series center). Low- and high-luminance frame images are excluded for two reasons: they are generally hard to see, and the downstream feature amount analysis unit 530 has difficulty detecting feature amounts in them. FIGS. 4(c) and 4(d) show an example in which the key frame extraction unit 520 extracts, from the 11 remaining frame images, the sixth frame image from the left, which is at the time-series center, as the key frame.
Next, the key frame extraction unit 520 supplies the frame identification information corresponding to the extracted key frame to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.
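Picking the central remaining frame is a single index computation. In this minimal sketch, "center" means the middle of the remaining frames in time order, which matches the FIG. 4(c)/(d) example of the 6th of 11. Taking the earlier of the two middle frames when the count is even is an assumption. The remaining indices below are hypothetical: 15 frames with 4 removed.

```python
def central_frame(usable_indices):
    """Remaining frame index at (or just before) the time-series center."""
    ordered = sorted(usable_indices)
    return ordered[(len(ordered) - 1) // 2]


# 11 remaining frames: the 6th from the left (frame index 6) is selected.
print(central_frame([0, 1, 2, 4, 5, 6, 7, 9, 10, 12, 13]))   # 6
```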

FIG. 5 shows examples of key frame images extracted by the key frame extraction unit 520. FIG. 5(a) is a key frame image extracted by the key frame extraction device 500 from moving image data of a tennis game captured by the imaging device 100. This moving image data contains, as its longest scene, a scene in which two tennis players, one in the foreground and one in the background, are rallying, as shown in FIG. 5(a).
FIG. 5(b) is a key frame image extracted by the key frame extraction device 500 from moving image data of a girl making cookies captured by the imaging device 100. This moving image data contains, as its longest scene, a scene composed so that the girl's hands are outside the frame and cannot be seen, as shown in FIG. 5(b).

Next, the main operations of the key frame extraction device 500 according to this embodiment will be described. FIG. 6 is a flowchart showing the operating procedure of the key frame extraction device 500.
When the key frame extraction device 500 takes in moving image data, step S1 begins. The spatiotemporal image generation unit 511 of the scene analysis unit 510 thins out the image data of the moving image data. It then generates normalized moving image data by normalizing the thinned moving image data according to predetermined attributes. For example, as shown in FIG. 3(a), the spatiotemporal image generation unit 511 generates normalized moving image data expressed by luminance information, with a resolution of 320 horizontal × 240 vertical pixels and a frame rate of 2 frames per second.
Next, the spatiotemporal image generation unit 511 converts the normalized moving image data into time (frame position) versus spatial coordinate form. This produces spatiotemporal image data, which is two-dimensional data. For example, from the normalized moving image data shown in FIG. 3(a), the spatiotemporal image generation unit 511 generates spatiotemporal image data of 702 frames in the time direction × 1200 pixels in the spatial coordinate direction, as shown in FIG. 3(b).

In step S2, the scene detection unit 512 takes in the spatiotemporal image data generated by the spatiotemporal image generation unit 511. It detects scenes by analyzing the spatiotemporal image data to estimate scene cut timings. Specifically, the scene detection unit 512 detects image edges in the spatiotemporal image data, for example by calculating luminance gradients.
Next, from the detected edges, the scene detection unit 512 detects edges that form straight lines in the spatial coordinate direction.
Next, the scene detection unit 512 estimates, for each frame image, whether that frame image corresponds to a shot cut. Specifically, it calculates, for each frame image, the sum of the straight-line components in the spatial coordinate direction. If the sum is equal to or greater than a predetermined threshold, the frame image is estimated to be a shot cut; if the sum is less than the threshold, it is estimated not to be a shot cut.
Next, from the straight-line components estimated to be shot cuts (scene cut timing candidates), the scene detection unit 512 removes cuts that are merely shots and not valid scenes. Specifically, suppose the time interval between two straight-line components adjacent in the time direction is shorter than a predetermined time. In that case, the scene detection unit 512 excludes at least one of those two components from the scene cut timing candidates.
Next, the scene detection unit 512 takes the frame images corresponding to all straight-line components remaining as scene cut timing candidates. It associates their frame identification information with the normalized moving image data and supplies both to the key frame extraction unit 520.

Next, in step S3, the key frame extraction unit 520 takes in the frame identification information and normalized moving image data supplied from the scene detection unit 512. From the normalized moving image data, it extracts the frame identification information corresponding to the one frame image that becomes the key frame, and supplies it to the feature amount analysis unit 530. Specifically, the key frame extraction unit 520 calculates the scene length of each scene in the normalized moving image data based on the frame identification information and the normalized moving image data.
Next, the key frame extraction unit 520 detects the scene with the longest scene length.
Next, the key frame extraction unit 520 examines the frame images of the normalized moving image data that correspond to the detected scene, and detects those whose luminance values fall within predetermined ranges. Specifically, it detects low-luminance frame images (average luminance below the first threshold), high-luminance frame images (average luminance above the second threshold), or both.
Next, the key frame extraction unit 520 considers the frame images remaining after the low-luminance frame images, the high-luminance frame images, or both are excluded. From these, it extracts the one frame image at or near the time-series center as the key frame.
Next, the key frame extraction unit 520 supplies the frame identification information corresponding to the extracted key frame to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

Next, in step S4, the image normalization unit 531 of the feature amount analysis unit 530 takes in the moving image data and the frame identification information supplied from the key frame extraction unit 520. It extracts the image data corresponding to the frame identification information from the moving image data.
Next, the image normalization unit 531 generates normalized image data by normalizing the extracted image data according to predetermined attributes. For example, the image normalization unit 531 generates normalized image data expressed by luminance information with a resolution of 320 horizontal × 240 vertical pixels.
Next, in step S5, the feature amount extraction unit 532 takes in the normalized image data generated by the image normalization unit 531 and extracts feature amounts from it.
Next, in step S6, the clustering processing unit 533 performs clustering on the feature amounts of the normalized image data supplied from the feature amount extraction unit 532 to generate a feature amount histogram.

Next, in step S7, the classification processing unit 540 takes in the feature amount histogram generated by the clustering processing unit 533. It analyzes the histogram to classify the normalized image data, that is, the key frame, and outputs the classification result as classification data.

Next, the main operations of the imaging device 100 to which the key frame extraction device 500 is applied will be described. First comes the generation of thumbnail image data. Based on the key frame extraction result of the key frame extraction device 500, the imaging device 100 generates thumbnail image data of the key frame of compressed moving image data stored in the storage medium 200. The steps are as follows.
When the key frame extraction device 500 extracts a key frame and outputs the corresponding frame identification information, the thumbnail generation unit 550 of the control unit 190 takes in that frame identification information.
Next, the thumbnail generation unit 550 retrieves the image data corresponding to the frame identification information from the storage medium 200 or the buffer memory unit 130.
Next, the thumbnail generation unit 550 generates thumbnail image data by reducing the resolution of the retrieved image data.
Next, the thumbnail generation unit 550 stores the generated thumbnail image data in the storage medium 200 in association with the corresponding compressed moving image data. Alternatively, the thumbnail image data may be stored in the header portion of the corresponding compressed moving image data.
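Reducing resolution for a thumbnail can be as simple as box averaging. This hypothetical sketch is not the device's method. It assumes a single-channel frame given as rows of integers and a reduction factor of 4 (for example, 320×240 to 80×60). Real image data would typically be color and use a better resampling filter.

```python
def make_thumbnail(frame, factor=4):
    """Shrink a single-channel frame by averaging factor x factor blocks."""
    h, w = len(frame), len(frame[0])
    return [[sum(frame[y][x]
                 for y in range(by, by + factor)
                 for x in range(bx, bx + factor)) // (factor * factor)
             for bx in range(0, w - factor + 1, factor)]   # drop partial blocks
            for by in range(0, h - factor + 1, factor)]
```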

The thumbnail generation unit 550 may also take in the classification data output by the key frame extraction device 500. It may then store that classification data (for example, keywords such as "tennis game" or "cookie making") in the storage medium 200, together with the thumbnail image data, in association with the compressed moving image data.
The control unit 190 may also organize the compressed moving image data stored in the storage medium 200 (for example, by directory management) based on the classification data.

Next, the processing for displaying a list of compressed moving image data on the display unit 150 will be described. The list covers the compressed moving image data stored in the storage medium 200 that has associated thumbnail image data.
The operator operates the operation unit 180, and the operation unit 180 outputs an operation signal for displaying the list of compressed moving image data stored in the storage medium 200 on the display unit 150. The control unit 190 takes in that operation signal.
Next, the control unit 190 reads from the storage medium 200 the thumbnail image data associated with each piece of compressed moving image data, together with the file name of the electronic file storing it. It then displays them on the display unit 150.

FIG. 7 schematically shows a list of compressed moving image data displayed on the display unit 150. In the figure, the display screen 600 of the display unit 150 shows a list of three pieces of compressed moving image data. Each entry consists of a thumbnail image (601, 602, and 603) and the file name of the electronic file of the associated compressed moving image data ("20101103.mp4", "20101105.mp4", and "20101112.mp4", respectively). In the figure, a cursor 604 is displayed on thumbnail image 601. The cursor 604 can be moved among thumbnail images 601, 602, and 603 by operating the cross key of the operation unit 180. When the confirm button is operated, the control unit 190 selects the compressed moving image data associated with the thumbnail image data indicated by the cursor 604.

As described above, the key frame extraction device 500 according to the first embodiment of the present invention works as follows. The scene analysis unit 510 detects the scenes of the moving image data. The key frame extraction unit 520 then extracts, as the key frame, the one frame image at or near the time-series center of the scene with the longest scene length. This configuration satisfies two characteristics of moving images:

- "the theme of a video is contained in scenes with long scene lengths" (Assumption 3);
- "frame images near scene cuts are not images with strong thematic content" (Assumption 2).

As a result, the key frame extraction device 500 can extract key frames accurately. In particular, because it extracts key frames by exploiting these characteristics of moving images, its computational load is light and it can be implemented at low cost.

In the key frame extraction device 500, the feature amount analysis unit 530 also normalizes the image data of the key frame and extracts image feature amounts. It clusters these feature amounts to generate a feature amount histogram representing their characteristics. The key frame extraction device 500 further classifies the feature amount histogram by machine learning and outputs classification data. With this configuration, the key frame extraction device 500 can flexibly classify the image data of the extracted key frame according to the features of the image. Therefore, the imaging device 100 that includes the key frame extraction device 500 can use the key frame, the classification data, or both to manage its moving image data. The stored videos then become easy to browse visually and easy to search.

[First Modification of First Embodiment]
In the first embodiment described above, the key frame extraction unit 520 follows Assumptions 3 and 2 and extracts a single frame image as the key frame from the scene with the longest scene length in the normalized moving image data. The first modification describes an example in which the key frame extraction unit 520 extracts a plurality of frame images as key frames from that longest scene. In this modification, the key frame extraction unit 520 also takes Assumption 1 into account, in addition to Assumptions 3 and 2.

This modification differs from the first embodiment in three components: the key frame extraction unit 520, the feature amount analysis unit 530 (mainly its clustering processing unit 533), and the thumbnail generation unit 550. Because the overall functional configuration is the same as in the first embodiment, the block diagram is omitted and the description uses the same reference numerals as the first embodiment. Only the differences from the first embodiment are described.

The key frame extraction unit 520 receives the frame identification information and the normalized moving image data supplied from the scene detection unit 512 of the scene analysis unit 510. From the normalized moving image data it extracts N frame images (N is an integer of 2 or more) as key frames. It then supplies the frame identification information of the N extracted frame images both to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

The process by which the key frame extraction unit 520 extracts N key frames is described with reference to FIG. 8. FIG. 8 schematically shows the key frame extraction unit 520 extracting N (N = 3) key frames from the normalized moving image data. FIGS. 8A and 8B are the same as FIGS. 4A and 4B of the first embodiment, and the key frame extraction unit 520 processes those stages in the same way. The description therefore starts from the processing corresponding to FIG. 8C.

The key frame extraction unit 520 first excludes the low-luminance frame images, the high-luminance frame images, or both. From the frame images that remain, it extracts N frame images at equal intervals as key frames, in accordance with Assumptions 2 and 1 described above. Following Assumption 2, it does not extract the frame images at either end of the scene. Specifically, the key frame extraction unit 520 determines the interval between the extracted frame images by calculating Expression (1) or Expression (2). In these expressions, D is the interval between extracted frame images, F is the number of frame images in the scene, and N is the number of key frames to be extracted.

To keep the frame images at the two ends of the scene from being extracted, the key frame extraction unit 520 chooses between the two expressions based on divisibility. When the number F of frame images in the scene is not divisible by the number N of key frames to be extracted, it calculates D using Expression (1). When F is divisible by N, it calculates D using Expression (2).
If D is less than 2, the key frame extraction unit 520 appends the frame images of the scene with the next-longest scene length and calculates Expression (1) or Expression (2) again.

The key frame extraction unit 520 extracts the Dth frame image counted from one end of the frame image sequence, then the Dth frame image counted from that position, and so on. In the example of FIGS. 8C and 8D, the key frame extraction unit 520 obtains D = 3 from Expression (1). Counting from the start of the 11 frames, it extracts the third frame, the third frame after that, and the third frame after that again as key frames, that is, the 3rd, 6th, and 9th frames.
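The equal-interval rule can be sketched in Python as follows. Expressions (1) and (2) are not reproduced in this text, so the interval function is an assumption. D = floor(F/N) for Expression (1) matches both worked examples in this section (F = 11 gives D = 3, F = 7 gives D = 2). D = F/N - 1 is only a hypothetical stand-in for Expression (2); it still keeps the last frame out of the selection. The function names are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def key_frame_interval(num_frames: int, num_key_frames: int) -> int:
    """Interval D between extracted frames (F = num_frames, N = num_key_frames)."""
    if num_frames % num_key_frames != 0:
        # Assumed form of Expression (1): floor(F / N), consistent with the worked examples.
        return num_frames // num_key_frames
    # Hypothetical stand-in for Expression (2): F / N - 1, so the Nth pick is never the last frame.
    return num_frames // num_key_frames - 1


def extract_equal_interval(scenes_by_length: Sequence[Sequence[T]], num_key_frames: int) -> List[T]:
    """Pick N equally spaced frames from the longest scene, skipping both ends when D >= 2.

    scenes_by_length: frame lists with low/high-luminance frames already removed,
    ordered from the longest scene to the shortest.
    """
    frames = list(scenes_by_length[0])
    d = key_frame_interval(len(frames), num_key_frames)
    next_scene = 1
    # If D < 2, append the frames of the next-longest scene and recompute D.
    while d < 2 and next_scene < len(scenes_by_length):
        frames.extend(scenes_by_length[next_scene])
        next_scene += 1
        d = key_frame_interval(len(frames), num_key_frames)
    if d < 1:
        raise ValueError("too few frames for the requested number of key frames")
    # If no scenes remain and D is still 1, the first frame is included (case not covered by the text).
    # 1-based positions D, 2D, ..., ND; N*D < F, so the last frame is never chosen.
    return [frames[k * d - 1] for k in range(1, num_key_frames + 1)]
```

Consider the 11 remaining frames of FIG. 8C numbered 1 to 11. Under these assumptions, `extract_equal_interval([list(range(1, 12))], 3)` returns `[3, 6, 9]`.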

Next, the key frame extraction unit 520 supplies the frame identification information of the N extracted key frames to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

FIG. 9 shows examples of the images of multiple (N = 3) key frames extracted by the key frame extraction unit 520. FIG. 9A shows key frame images that the key frame extraction apparatus 500 extracted from moving image data of a tennis match captured by the imaging apparatus 100. As shown in FIG. 9A, the longest scene in this moving image data was shot from a roughly constant camera angle.
FIG. 9B shows key frame images that the key frame extraction apparatus 500 extracted from moving image data, captured by the imaging apparatus 100, of a girl making cookies. As shown in FIG. 9B, the longest scene in this moving image data was likewise shot from a roughly constant camera angle, with a composition that leaves the girl's hands outside the frame.
In each of FIGS. 9A and 9B, the three key frame images are arranged in time order from left to right.

The feature amount analysis unit 530 receives the moving image data and the N pieces of frame identification information supplied from the key frame extraction unit 520. For each piece of frame identification information, it extracts feature amounts from the corresponding frame image and performs clustering processing.
For each piece of frame identification information, the image normalization unit 531 extracts the corresponding image data from the moving image data. It normalizes that image data according to predetermined attributes to generate normalized image data. The detailed processing of the image normalization unit 531 is the same as in the first embodiment, so its description is omitted.
The feature amount extraction unit 532 receives the N pieces of normalized image data generated by the image normalization unit 531 and extracts feature amounts from each of them. The detailed processing of the feature amount extraction unit 532 is the same as in the first embodiment, so its description is omitted.

The clustering processing unit 533 receives the feature amounts of the N pieces of normalized image data extracted by the feature amount extraction unit 532. For each piece of normalized image data and each feature point, it performs feature amount clustering (bag-of-words processing) to generate a feature amount histogram. For example, the clustering processing unit 533 uses the k-means method to classify the feature amounts of the normalized image data into K clusters (for example, 1000) and generates the feature amount histogram from the result. The clustering processing unit 533 then adds the N generated feature amount histograms class by class to generate an integrated histogram. This integrated histogram represents the characteristics of the moving image in the scene containing the key frames.
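The bag-of-words step and the class-by-class addition can be sketched in plain Python as below. The sketch assumes the K cluster centres (codewords) were already learned by k-means from training descriptors, which the text does not detail here. Descriptors are treated as plain numeric vectors, and all function names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_cluster(descriptor: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to the descriptor."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, centroids[k])),
    )


def bow_histogram(descriptors: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Bag-of-words histogram: number of feature-point descriptors assigned to each cluster."""
    hist = [0] * len(centroids)
    for desc in descriptors:
        hist[nearest_cluster(desc, centroids)] += 1
    return hist


def integrated_histogram(histograms: Sequence[Sequence[int]]) -> List[int]:
    """Add the N per-key-frame histograms class by class (bin by bin)."""
    return [sum(bins) for bins in zip(*histograms)]
```

Bin-by-bin addition lets clusters shared by similar key frames reinforce one another (FIG. 10). It also lets dissimilar scenes each contribute their own peaks (FIG. 14).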

FIG. 10 schematically shows the feature amount histograms of the three key frame images in FIG. 9A. It also shows the integrated histogram that the clustering processing unit 533 generates by adding these three histograms class by class. As the figure shows, the image characteristics of the three key frames are relatively similar, so adding their feature amount histograms emphasizes the characteristic distribution. In other words, the theme of the moving image is emphasized.

The classification processing unit 540 analyzes the integrated histogram generated by the clustering processing unit 533 and classifies the normalized image data, that is, the key frames. The detailed processing of the classification processing unit 540 is the same as in the first embodiment, so its description is omitted.

Next, the main operations of the imaging apparatus 100 to which the key frame extraction apparatus 500 is applied are described. The first operation is generating thumbnail image data for the key frames of compressed moving image data stored in the storage medium 200. The imaging apparatus 100 does this based on the key frame extraction result of the key frame extraction apparatus 500.
When the key frame extraction apparatus 500 extracts N key frames and outputs the corresponding frame identification information, the thumbnail generation unit 550 of the control unit 190 receives the frame identification information of the N key frames.
Next, the thumbnail generation unit 550 reads the image data corresponding to the N pieces of frame identification information from the storage medium 200 or the buffer memory unit 130.
Next, the thumbnail generation unit 550 generates reduced-resolution thumbnail image data from each of the N pieces of image data.
Next, the thumbnail generation unit 550 stores the N pieces of generated thumbnail image data in the storage medium 200 in association with the corresponding compressed moving image data. Alternatively, the N pieces of thumbnail image data may be stored in the header of the corresponding compressed moving image data.
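The thumbnail steps can be roughly sketched as follows: the code downsizes each key frame by nearest-neighbour sampling and records the thumbnails under the movie's file name. Several details are assumptions for illustration: the resampling method, the image representation (a list of pixel rows), and the default 160 x 120 size. The dictionary is also only a stand-in for the association stored on the storage medium 200.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # row-major pixel values


def make_thumbnail(image: Image, thumb_w: int, thumb_h: int) -> Image:
    """Reduce resolution by nearest-neighbour sampling of the source pixels."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // thumb_h][x * src_w // thumb_w] for x in range(thumb_w)]
        for y in range(thumb_h)
    ]


def attach_thumbnails(
    catalog: Dict[str, List[Image]],
    movie_file: str,
    key_frame_images: Sequence[Image],
    size: Tuple[int, int] = (160, 120),
) -> Dict[str, List[Image]]:
    """Store the N thumbnails in association with the compressed movie's file name."""
    w, h = size
    catalog[movie_file] = [make_thumbnail(img, w, h) for img in key_frame_images]
    return catalog
```

A real device would more likely use a filtered downscale. Nearest-neighbour sampling is used here only to keep the sketch dependency-free.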

Next, the process of displaying a list of the compressed moving image data stored in the storage medium 200 on the display unit 150 is described. Each compressed moving image data item in the list is associated with N pieces of thumbnail image data.
The operator operates the operation unit 180, which then outputs an operation signal requesting that this list be displayed on the display unit 150. The control unit 190 receives that operation signal.
Next, the control unit 190 reads two items from the storage medium 200 for each compressed moving image data item: the N associated pieces of thumbnail image data, and the file name of the electronic file storing that item. It displays both on the display unit 150. The control unit 190 displays the N pieces of thumbnail image data on the display unit 150 by, for example, one of the following two methods.

The first method displays the N pieces of thumbnail image data side by side on the display unit 150, for example in time order.
The second method displays the N pieces of thumbnail image data at the same position on the display unit 150, switching from one to the next at a predetermined interval. The switching interval can be set arbitrarily, for example to 1 second.
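The second method amounts to choosing which of the N thumbnails to show based on the elapsed time. Below is a minimal sketch. It assumes a fixed switching interval and wrap-around after the last thumbnail; the text spells out neither detail. The function name is illustrative.

```python
def thumbnail_index(elapsed_s: float, num_thumbnails: int, interval_s: float = 1.0) -> int:
    """Index of the thumbnail shown at time elapsed_s when N thumbnails share one display slot."""
    if num_thumbnails <= 0 or interval_s <= 0:
        raise ValueError("need at least one thumbnail and a positive interval")
    # Advance one thumbnail per interval and wrap around after the Nth.
    return int(elapsed_s // interval_s) % num_thumbnails
```

With three thumbnails and the 1-second interval, the display shows thumbnail 0 at 0.5 s, thumbnail 1 at 1.5 s, and thumbnail 0 again at 3.2 s.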

FIG. 11 schematically shows lists of compressed moving image data displayed on the display unit 150. FIG. 11A is a display example of the first method, and FIG. 11B is a display example of the second method. In FIG. 11A, the display screen 610 of the display unit 150 shows a list of four compressed moving image data items, with the thumbnail images of each item arranged side by side. The top row of the list shows three thumbnail images 611a, 611b, and 611c. It also shows "20101103.mp4", the file name of the electronic file of the compressed moving image data associated with those thumbnails. A cursor 612 surrounds the thumbnail images 611a, 611b, and 611c, and can be moved up and down the display screen 610 with the cross key of the operation unit 180. When the confirmation button is pressed, the control unit 190 selects the compressed moving image data associated with the thumbnail image data indicated by the cursor 612.

In FIG. 11B, the display screen 620 of the display unit 150 shows a list of three compressed moving image data items. The list shows the time-division switching thumbnail images 621, 622, and 623. Next to them are the file names of the associated electronic files: "20101103.mp4" for 621, "20101105.mp4" for 622, and "20101112.mp4" for 623. Each of the time-division switching thumbnail images 621, 622, and 623 switches its thumbnail image data at a predetermined interval (for example, every second). In the figure, a cursor 624 is displayed on the time-division switching thumbnail image 621. The cursor 624 can be moved among the thumbnail images 621, 622, and 623 with the cross key of the operation unit 180. When the confirmation button is pressed, the control unit 190 selects the compressed moving image data associated with the time-division switching thumbnail image data indicated by the cursor 624.

As described above, in the key frame extraction apparatus 500 according to the first modification of the first embodiment of the present invention, the scene analysis unit 510 detects the scenes of the moving image data. The key frame extraction unit 520 then extracts multiple frame images as key frames from the scene with the longest scene length, excluding at least the frame images at both ends of the scene. With this configuration, the key frame extraction apparatus 500 satisfies "the theme of a moving image is contained in scenes with a long scene length" (Assumption 3) and "frame images near a scene cut are not strongly representative of the theme" (Assumption 2). It also satisfies "composition changes little within a scene" (Assumption 1). It can therefore accurately extract key frames that emphasize the theme more strongly.

[Second Modification of First Embodiment]
In the second modification of the first embodiment, the key frame extraction unit 520 selects multiple scenes of the normalized moving image data in descending order of scene length and extracts one frame image per scene as a key frame. In this modification, the key frame extraction unit 520 follows Assumptions 3 and 2 and, in particular, takes multiple themes into account.

This modification differs from the first modification only in the key frame extraction unit 520. Because the overall functional configuration is the same as in the first embodiment, the block diagram is omitted and the description uses the same reference numerals as the first embodiment. Only the differences from the first modification are described.

The key frame extraction unit 520 receives the frame identification information and the normalized moving image data supplied from the scene detection unit 512 of the scene analysis unit 510. It selects M scenes (M is an integer of 2 or more) of the normalized moving image data in descending order of scene length and extracts one frame image per scene as a key frame. It then supplies the frame identification information of the M extracted frame images both to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

The process by which the key frame extraction unit 520 extracts M key frames is described with reference to FIG. 12. FIG. 12 schematically shows the key frame extraction unit 520 extracting M (M = 3) key frames from the normalized moving image data. FIG. 12A is the same as FIG. 4A of the first embodiment, and the key frame extraction unit 520 processes that stage in the same way. The description therefore starts from the processing corresponding to FIG. 12B.

In accordance with Assumption 3, the key frame extraction unit 520 detects M scenes in descending order of scene length (number of frames). For example, it compares the number of frames in each scene. As shown in FIG. 12B, this yields, in descending order of frame count, the third scene (scene 3: 15 frames), the fourth scene (scene 4: 12 frames), and the first scene (scene 1: 10 frames).
Next, for each detected scene, the key frame extraction unit 520 examines the frame images of the normalized moving image data that correspond to that scene. It detects those whose luminance lies in predetermined ranges. Specifically, from the frame images of one detected scene, it detects low-luminance frame images, high-luminance frame images, or both. A low-luminance frame image has an average luminance below a first threshold. A high-luminance frame image has an average luminance above a second threshold (second threshold > first threshold). FIG. 12B shows an example in which four low- or high-luminance frame images are detected in the third scene (scene 3), five in the fourth scene (scene 4), and three in the first scene (scene 1).

Next, for each detected scene, the key frame extraction unit 520 excludes the low-luminance frame images, the high-luminance frame images, or both. From the remaining frame images, in accordance with Assumption 2, it extracts one frame image at or near the temporal center as the key frame (for example, the frame closest to the temporal center). FIGS. 12C and 12D show the key frames extracted in this example, each being the temporally central frame of its scene:
- the sixth frame image from the left for the third scene (scene 3);
- the fourth frame image from the left for the fourth scene (scene 4);
- the fourth frame image from the left for the first scene (scene 1).
Next, the key frame extraction unit 520 supplies the frame identification information of the M extracted key frames to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.
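The second modification's selection can be sketched in three steps. First, rank scenes by frame count. Second, drop frames whose average luminance is below the first threshold or above the second. Third, take the temporally central remaining frame. The sketch makes several illustrative assumptions:
- Each frame is a (frame id, average luminance) pair.
- Scenes are ranked by their length before luminance filtering, as in the FIG. 12 example.
- For an even number of remaining frames, the later of the two middle frames is taken.

The names are also illustrative.

```python
from typing import Dict, List, Tuple

Frame = Tuple[int, float]  # (frame id, average luminance)


def center_key_frames(
    scenes: Dict[int, List[Frame]], m: int, low: float, high: float
) -> List[int]:
    """One key frame per scene for the M longest scenes (second modification sketch)."""
    # Rank scene numbers by frame count, longest first, and keep the top M.
    ranked = sorted(scenes, key=lambda s: len(scenes[s]), reverse=True)[:m]
    key_frames: List[int] = []
    for scene_no in ranked:
        # Drop low-luminance (< low) and high-luminance (> high) frames.
        kept = [fid for fid, lum in scenes[scene_no] if low <= lum <= high]
        if not kept:
            continue  # nothing usable left in this scene
        # Temporal centre; for an even count this picks the later middle frame.
        key_frames.append(kept[len(kept) // 2])
    return key_frames
```

Apply this to the frame counts of FIG. 12B: 15, 12, and 10 frames, of which 4, 5, and 3 are excluded. The sketch then picks the 6th, 4th, and 4th remaining frames, matching FIGS. 12C and 12D.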

FIG. 13 shows examples of the images of multiple (M = 3) key frames extracted by the key frame extraction unit 520. FIG. 13A shows key frame images that the key frame extraction apparatus 500 extracted from moving image data of a tennis match captured by the imaging apparatus 100. As shown in FIG. 13A, the long scenes in this moving image data differ in composition and subject.
FIG. 13B shows key frame images that the key frame extraction apparatus 500 extracted from moving image data, captured by the imaging apparatus 100, of a girl making cookies. As shown in FIG. 13B, the long scenes in this moving image data likewise differ in composition and subject. The first embodiment and the first modification could detect only the scene whose composition leaves the girl's hands outside the frame. This modification also detects scenes with other compositions, such as a scene showing the cookies.
In each of FIGS. 13A and 13B, the three key frame images are arranged from left to right in descending order of scene length.

FIG. 14 schematically shows the feature amount histograms of the three key frame images in FIG. 13A. It also shows the integrated histogram that the clustering processing unit 533 generates by adding these three histograms class by class. As the figure shows, the image characteristics of the three key frames differ relatively widely, so adding their feature amount histograms yields a distribution that covers the characteristic distribution of each scene. In other words, all the themes of the moving image can be captured without omission.

As described above, in the key frame extraction apparatus 500 according to the second modification of the first embodiment of the present invention, the scene analysis unit 510 detects the scenes of the moving image data. The key frame extraction unit 520 then selects multiple scenes in descending order of scene length and extracts, from each scene, one frame image at or near its temporal center as a key frame. With this configuration, the key frame extraction apparatus 500 satisfies "the theme of a moving image is contained in scenes with a long scene length" (Assumption 3) and "frame images near a scene cut are not strongly representative of the theme" (Assumption 2). It can therefore accurately extract key frames that take multiple themes into account.

[Third Modification of First Embodiment]
In the third modification of the first embodiment, the key frame extraction unit 520 selects multiple scenes of the normalized moving image data in descending order of scene length and extracts multiple frame images per scene as key frames. In this modification, the key frame extraction unit 520 follows Assumptions 3 and 2 and, in particular, takes multiple themes into account.

This modification differs from the second modification only in the key frame extraction unit 520. Because the overall functional configuration is the same as in the first embodiment, the block diagram is omitted and the description uses the same reference numerals as the first embodiment. Only the differences from the second modification are described.

The key frame extraction unit 520 receives the frame identification information and the normalized moving image data supplied from the scene detection unit 512 of the scene analysis unit 510. It selects M scenes (M is an integer of 2 or more) of the normalized moving image data in descending order of scene length and extracts N frame images (N is an integer of 2 or more) per scene as key frames. It then supplies the frame identification information of the M×N extracted frame images both to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.

The process by which the key frame extraction unit 520 extracts M × N key frames is described with reference to FIG. 15. FIG. 15 schematically shows how the key frame extraction unit 520 extracts M × N key frames (here M = 3 and N = 3) from the normalized moving image data. Parts (a) and (b) of FIG. 15 are the same as FIGS. 12(a) and 12(b) in the second modification, and the corresponding processing by the key frame extraction unit 520 is also the same; therefore, only the processing from FIG. 15(c) onward is described.

For each detected scene, the key frame extraction unit 520 follows Assumption 2 and Assumption 1 described above and extracts N frame images at equal intervals as key frames from the frame images that remain after excluding the low-luminance frame images, the high-luminance frame images, or both. The method for extracting the N key frames is the same as in the first modification described above, so its description is omitted here. In FIGS. 15(c) and 15(d), for the third scene (scene 3), the key frame extraction unit 520 calculates D = 3 by Equation (1) and extracts as key frames the frame image at the 3rd position counted from one end of the 11 frames, the 3rd position counted from that frame, and the 3rd position counted from that frame again. For the fourth scene (scene 4), it calculates D = 2 by Equation (1) and extracts the frame images at the 2nd position counted from one end of the 7 frames, the 2nd position counted from that frame, and the 2nd position counted from that frame again. Likewise, for the first scene (scene 1), it calculates D = 2 by Equation (1) and extracts the frame images at the 2nd position counted from one end of the 7 frames, the 2nd position counted from that frame, and the 2nd position counted from that frame again.
Next, the key frame extraction unit 520 supplies the frame identification information corresponding to the extracted M × N key frames to the thumbnail generation unit 550 of the control unit 190 and to the feature amount analysis unit 530.
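Equation (1), which gives the interval D, is defined earlier in the document and is not reproduced in this section. The following sketch therefore substitutes a hypothetical formula, D = ⌊(L + 1)/(N + 1)⌋ for a scene of L usable frames, chosen only because it reproduces the values stated for FIG. 15 (D = 3 for 11 frames and D = 2 for 7 frames, with N = 3). Positions are 1-based and are assumed to be counted within the frames that remain after luminance-based exclusion; the function names are illustrative.

```python
from typing import List


def spacing(scene_len: int, n: int) -> int:
    # HYPOTHETICAL stand-in for Equation (1), which is not reproduced here.
    # floor((L + 1) / (N + 1)) gives D = 3 for L = 11 and D = 2 for L = 7 when N = 3.
    return (scene_len + 1) // (n + 1)


def equally_spaced_positions(scene_len: int, n: int) -> List[int]:
    """1-based positions D, 2D, ..., N*D of the N key frames within one scene."""
    d = spacing(scene_len, n)
    if d < 2:
        # With D = 1 the first frame of the scene would be chosen, contrary to Assumption 2.
        raise ValueError("scene too short to skip both ends with this spacing")
    return [k * d for k in range(1, n + 1)]
```

With this stand-in, an 11-frame scene yields key frames at positions 3, 6, and 9, and a 7-frame scene yields positions 2, 4, and 6, so neither end frame of either scene is selected.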

As described above, in the key frame extraction device 500 according to the third modification of the first embodiment of the present invention, the scene analysis unit 510 detects scenes in the moving image data, and the key frame extraction unit 520 selects a plurality of scenes in descending order of scene length and, for each scene, extracts a plurality of frame images as key frames without including the frame images corresponding to at least both ends of the scene. With this configuration, the key frame extraction device 500 satisfies the following assumptions about the characteristics of moving images: "the theme of a moving image is contained in scenes with long scene lengths" (Assumption 3), "frame images near a scene cut are not highly thematic images" (Assumption 2), and "the change in composition within a scene is small" (Assumption 1). It can therefore accurately extract key frames that place greater emphasis on the theme while taking a plurality of themes into account.

[Second Embodiment]
FIG. 16 is a block diagram showing the overall configuration of a network system including a server device to which the key frame extraction device according to the second embodiment of the present invention is applied. As shown in the figure, the network system 700 has a configuration in which a server device 560 and information communication terminals 570-1 and 570-2 are connected via a network 580.
In the network system 700, the information communication terminals 570-1 and 570-2, used by terminal users, download desired moving image data from the server device 560, which stores a large amount of moving image data, so that the terminal users can view or otherwise use it.

The server device 560 includes, as its functional configuration, a moving image data storage unit 561, a key frame extraction device 500, a thumbnail generation unit 562, and a communication processing unit 563.
The moving image data storage unit 561 is a storage device that stores moving image data and is realized by, for example, a magnetic hard disk device.
The key frame extraction device 500 is the device according to any one of the first embodiment and the first to third modifications described above.

The thumbnail generation unit 562 receives the frame identification information and the classification data output from the key frame extraction device 500. It extracts the image data corresponding to the frame identification information from the moving image data storage unit 561 and reduces the extracted image data to generate thumbnail image data. It then associates the generated thumbnail image data with the corresponding moving image data. Finally, it stores the moving image data, together with its associated thumbnail image data, in the moving image data storage unit 561, sorted into categories according to the classification data.
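The flow of the thumbnail generation unit 562 (look up the key frames, reduce them, associate them with the moving image data, and file the data by classification) can be sketched as follows. The in-memory `MovieStore` class, the nearest-neighbour reduction, the default reduction factor, and the example class label are all illustrative assumptions; the document does not specify how image data are reduced or how the storage unit 561 is organized.

```python
from collections import defaultdict
from typing import Dict, List

Image = List[List[int]]  # assumed: a grayscale frame as rows of pixel values


def reduce_image(img: Image, factor: int) -> Image:
    """Shrink a frame by keeping every `factor`-th pixel in each direction (nearest neighbour)."""
    return [row[::factor] for row in img[::factor]]


class MovieStore:
    """Hypothetical in-memory stand-in for the moving image data storage unit 561."""

    def __init__(self) -> None:
        self.frames: Dict[str, List[Image]] = {}       # movie id -> decoded frames
        self.thumbnails: Dict[str, List[Image]] = {}   # movie id -> associated thumbnails
        self.by_class: Dict[str, List[str]] = defaultdict(list)  # class label -> movie ids

    def attach_thumbnails(self, movie_id: str, key_frame_ids: List[int],
                          class_label: str, factor: int = 4) -> None:
        frames = self.frames[movie_id]
        # Reduce the key frames named by the frame identification information.
        self.thumbnails[movie_id] = [reduce_image(frames[i], factor) for i in key_frame_ids]
        # File the movie under the category given by the classification data.
        self.by_class[class_label].append(movie_id)
```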

The communication processing unit 563 includes a network interface function and controls communication with the information communication terminals 570-1 and 570-2 via the network 580. When the communication processing unit 563 receives a download request from the information communication terminal 570-1 or 570-2, it reads the requested moving image data, together with its associated thumbnail image data, from the moving image data storage unit 561 and transmits it to the requesting information communication terminal.

The information communication terminals 570-1 and 570-2 are devices capable of communicating with devices connected to the network 580, and are realized by, for example, a computer, a mobile phone, a smartphone, or a personal digital assistant (PDA).
The information communication terminals 570-1 and 570-2 request and obtain a list of moving image data from the server device 560 and display it on a display device (not shown). As in the first embodiment and its modifications described above, the displayed list shows thumbnail images for the moving image data.
The network 580 is a computer network such as the Internet or a LAN (Local Area Network).
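A hypothetical sketch of how the server side might answer the two requests described above (the list request from a terminal and the download request handled by the communication processing unit 563) is shown below. The `StoredMovie` and `CommunicationHandler` names, the use of the first thumbnail as each list entry's representative image, and the byte-string payloads are assumptions for illustration; network transport is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StoredMovie:
    data: bytes                                            # encoded moving image data (opaque here)
    thumbnails: List[bytes] = field(default_factory=list)  # associated thumbnail image data


class CommunicationHandler:
    """Hypothetical request handling for the communication processing unit 563."""

    def __init__(self, store: Dict[str, StoredMovie]) -> None:
        self.store = store

    def list_movies(self) -> List[Tuple[str, bytes]]:
        # One (movie id, representative thumbnail) pair per movie for the terminal's list view.
        return [(movie_id, m.thumbnails[0] if m.thumbnails else b"")
                for movie_id, m in sorted(self.store.items())]

    def download(self, movie_id: str) -> StoredMovie:
        # The moving image data is returned together with its associated thumbnails.
        if movie_id not in self.store:
            raise KeyError(f"unknown movie: {movie_id}")
        return self.store[movie_id]
```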

With the server device 560 according to the second embodiment of the present invention, the stored moving image data can be managed so that it is easy to browse visually and easy to search, by using the key frames, the classification data, or both, provided by the key frame extraction device 500.

In the first embodiment, the first to third modifications, and the second embodiment described above, the scene analysis unit 510 may be realized by applying a known scene detection technique. For example, the scene analysis unit 510 may detect scenes by a technique that detects the correlation between frames by pattern matching. Alternatively, the scene analysis unit 510 may detect scenes by extracting and comparing the spatial frequencies of all or part of each frame of the image data.
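As one simple instance of the correlation-based approach mentioned above, the following sketch compares consecutive frames using normalized (Pearson) correlation at zero displacement and starts a new scene wherever the correlation falls below a threshold. The threshold of 0.5, the flattened-pixel frame representation, and the function names are assumptions; the spatial-frequency alternative is not shown.

```python
import math
from typing import List, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equally sized frames given as flat pixel sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        # Uniform frames have no variance: treat identical ones as matching, others as not.
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(va * vb)


def detect_scenes(frames: List[Sequence[float]],
                  threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into [start, end) scenes wherever correlation drops below the threshold."""
    if not frames:
        return []
    cuts = [0]
    for i in range(1, len(frames)):
        if correlation(frames[i - 1], frames[i]) < threshold:
            cuts.append(i)  # frame i starts a new scene
    cuts.append(len(frames))
    return list(zip(cuts[:-1], cuts[1:]))
```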

Part of the functions of the key frame extraction device according to the embodiments described above may be realized by a computer. In that case, a key frame extraction program for realizing those functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into a computer system and executed. The "computer system" here includes an OS (Operating System) and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable recording medium such as a flexible disk, a magneto-optical disk, an optical disk, or a memory card, or to a storage device such as a magnetic hard disk built into the computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Although the embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and designs and the like within a scope that does not depart from the gist of the present invention are also included.

DESCRIPTION OF SYMBOLS
100 Imaging apparatus
110 Imaging unit
111 Optical system
119 Image sensor
120 Analog/digital (A/D) conversion unit
130 Buffer memory unit
140 Image processing unit
150 Display unit
160 Storage unit
170 Communication unit
180 Operation unit
190 Control unit
200 Storage medium
300 Bus
500 Key frame extraction device
510 Scene analysis unit
511 Spatio-temporal image generation unit
512 Scene detection unit
520 Key frame extraction unit
530 Feature amount analysis unit
531 Image normalization unit
532 Feature amount extraction unit
533 Clustering processing unit
540 Classification processing unit
550 Thumbnail generation unit
560 Server device
561 Moving image data storage unit
562 Thumbnail generation unit
563 Communication processing unit
570-1, 570-2 Information communication terminal
580 Network
700 Network system

Claims

A key frame extraction device comprising:
a scene analysis unit that analyzes moving image data and detects a scene; and
a key frame extraction unit that extracts a key frame based on the scene length of the scene detected by the scene analysis unit and the position, in the time direction, of image data within a plurality of image data corresponding to the scene.

The key frame extraction device according to claim 1, wherein the key frame extraction unit extracts, as a key frame, the image data at or near the temporal center of the plurality of image data corresponding to the scene having the longest scene length among the plurality of scenes detected by the scene analysis unit.
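A minimal sketch of selecting the temporally central frame of the longest scene follows, assuming scenes are given as [start, end) frame ranges. For scenes with an even number of frames the earlier of the two middle frames is taken; this tie-breaking rule, like the function name, is an assumption, since the claim only requires the center or near-center image data.

```python
from typing import List, Tuple


def center_key_frame(scenes: List[Tuple[int, int]]) -> int:
    """Frame index at (or next to) the temporal center of the longest [start, end) scene."""
    # max() returns the first maximal item, so the earliest of equally long scenes wins.
    start, end = max(scenes, key=lambda s: s[1] - s[0])
    # For an even number of frames, take the earlier of the two middle frames.
    return start + (end - start - 1) // 2
```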

The key frame extraction device, wherein the key frame extraction unit extracts, as key frames, a plurality of image data from the plurality of image data corresponding to the scene having the longest scene length, without including the image data corresponding to at least both ends of the scene.

The key frame extraction device according to claim 1, wherein the key frame extraction unit selects a plurality of scenes in descending order of scene length from among the plurality of scenes detected by the scene analysis unit and, for each selected scene, extracts as a key frame the image data at or near the temporal center of the plurality of image data corresponding to that scene.

The key frame extraction device, wherein the key frame extraction unit extracts, for each selected scene, a plurality of image data as key frames from the plurality of image data corresponding to that scene, without including the image data corresponding to at least both ends of the scene.

The key frame extraction device according to claim 1, wherein the key frame extraction unit extracts, from the plurality of image data corresponding to the scene, a plurality of image data whose luminance values fall within a predetermined range, and extracts key frames from the extracted plurality of image data.
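The luminance condition can be sketched as a simple filter whose output indices are the candidates from which key frames are then chosen. The document does not state how a frame's luminance value is computed or what the predetermined range is; this sketch assumes the mean of 8-bit luma samples and an arbitrary range of 30 to 225, both hypothetical.

```python
from typing import List, Sequence


def mean_luminance(frame: Sequence[float]) -> float:
    """Average luma of a frame given as a flat sequence of 8-bit samples (assumed representation)."""
    return sum(frame) / len(frame)


def frames_in_luminance_range(frames: List[Sequence[float]],
                              low: float = 30.0, high: float = 225.0) -> List[int]:
    """Indices of frames whose mean luminance lies in [low, high]; the range is an arbitrary example."""
    return [i for i, f in enumerate(frames) if low <= mean_luminance(f) <= high]
```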

The key frame extraction device according to claim 1, further comprising:
a feature amount extraction unit that extracts an image feature amount from the key frame extracted by the key frame extraction unit; and
a clustering processing unit that performs clustering processing based on the image feature amount extracted by the feature amount extraction unit and generates a feature amount histogram.
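The document does not name the image feature or the clustering algorithm. A common way to turn local image features into a histogram is the bag-of-visual-words approach, sketched below under that assumption: local feature vectors (taken as given) are clustered with plain k-means, and a key frame's histogram counts how many of its features fall into each cluster. The deterministic initialization from the first k points is a simplification, and all names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centers: Sequence[Vector]) -> int:
    """Index of the closest cluster center (first one wins on ties)."""
    return min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means, initialized with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers


def feature_histogram(features: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Count how many local features of one key frame fall into each cluster (class)."""
    hist = [0] * len(centers)
    for f in features:
        hist[nearest(f, centers)] += 1
    return hist
```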

The key frame extraction device according to claim 1, further comprising:
a feature amount extraction unit that extracts an image feature amount from each of a plurality of key frames extracted by the key frame extraction unit; and
a clustering processing unit that, based on the image feature amounts extracted by the feature amount extraction unit, performs clustering processing for each key frame to generate a plurality of feature amount histograms, and adds the plurality of feature amount histograms class by class to generate an integrated histogram.
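The integrated histogram in the preceding claim adds the per-key-frame histograms class by class. A minimal sketch, assuming every histogram is a list of counts over the same ordered set of classes (the function name is illustrative):

```python
from typing import List


def integrated_histogram(histograms: List[List[int]]) -> List[int]:
    """Add per-key-frame feature histograms bin by bin (class by class)."""
    if not histograms:
        return []
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all histograms must share the same classes")
    return [sum(h[c] for h in histograms) for c in range(n_bins)]
```

For example, histograms [3, 2, 0] and [1, 0, 4] from two key frames integrate to [4, 2, 4].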

A key frame extraction program for causing a computer to function as:
a scene analysis unit that analyzes moving image data and detects a scene; and
a key frame extraction unit that extracts a key frame based on the scene length of the scene detected by the scene analysis unit and the position, in the time direction, of image data within a plurality of image data corresponding to the scene.

A key frame extraction method comprising:
a scene analysis step in which a scene analysis unit analyzes moving image data and detects a scene; and
a key frame extraction step in which a key frame extraction unit extracts a key frame based on the scene length of the scene detected by the scene analysis unit in the scene analysis step and the position, in the time direction, of image data within a plurality of image data corresponding to the scene.

An imaging apparatus comprising:
an imaging unit that captures images and generates moving image data;
a scene analysis unit that analyzes the moving image data generated by the imaging unit and detects a scene;
a key frame extraction unit that extracts a key frame based on the scene length of the scene detected by the scene analysis unit and the position, in the time direction, of image data within a plurality of image data corresponding to the scene;
a thumbnail generation unit that generates reduced image data based on the key frame extracted by the key frame extraction unit; and
a display unit that displays the reduced image data generated by the thumbnail generation unit.

A server device comprising:
a moving image data storage unit that stores moving image data;
a scene analysis unit that analyzes the moving image data stored in the moving image data storage unit and detects a scene;
a key frame extraction unit that extracts a key frame based on the scene length of the scene detected by the scene analysis unit and the position, in the time direction, of image data within a plurality of image data corresponding to the scene; and
a thumbnail generation unit that generates reduced image data based on the key frame extracted by the key frame extraction unit and stores the reduced image data in the moving image data storage unit in association with the moving image data.