JP2014229092A

JP2014229092A - Image processing device, image processing method and program therefor

Info

Publication number: JP2014229092A
Application number: JP2013108657A
Authority: JP
Inventors: 武史松尾; Takeshi Matsuo
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2013-05-23
Filing date: 2013-05-23
Publication date: 2014-12-08

Abstract

PROBLEM TO BE SOLVED: To accurately extract important scenes from image data. SOLUTION: An image processing device comprises: a motion extraction unit that extracts motion information of an input image on the basis of the image; and a feature amount calculation unit that calculates, on the basis of the extracted motion information, a feature amount indicating an important scene in which a subject within the image is in motion. Here, the subject is the image of an image area whose motion is relatively large compared with the other image areas among a plurality of image areas within the image, and the motion information includes information indicating the direction of motion of the subject. The feature amount calculation unit calculates the feature amount on the basis of the direction of motion indicated by the motion information.

Description

The present invention relates to an image processing apparatus, an image processing method, and a program therefor.

Techniques are known for extracting specific scenes (for example, important scenes) from image data in order to perform highlight playback or digest playback of video clips (moving image data) or video data (see, for example, Patent Document 1). Patent Document 1 discloses a technique in which a camera is equipped with a camera motion sensor, global motion is calculated to form a plurality of video segments, each segment is labeled according to a series of camera motion classes, and candidate important scenes are extracted from the labeled segments. This global motion is a calculation of the camera work obtained from the camera motion sensor or from the video.

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2009-539273

However, the method for extracting candidate important scenes disclosed in Patent Document 1 requires the camera to be provided with a camera motion sensor.
The present invention has been made in view of the above circumstances, and its object is to provide an image processing apparatus, an image processing method, and a program therefor that accurately extract important scenes from image data without using a special sensor.

One aspect of the present invention is an image processing apparatus comprising: a motion extraction unit that extracts motion information of an input image based on the image; and a feature amount calculation unit that calculates, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.

Another aspect of the present invention is an image processing method comprising: a motion extraction procedure of extracting motion information of an input image based on the image; and a feature amount calculation procedure of calculating, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.

Another aspect of the present invention is a program for causing a computer included in an image processing apparatus to execute: a motion extraction procedure of extracting motion information of an input image based on the image; and a feature amount calculation procedure of calculating, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.

According to the present invention, important scenes can be accurately extracted from image data.

FIG. 1 is a schematic diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of the frame structure of a moving image of the present embodiment and of the direction of movement of a subject between frames.
FIG. 3 is a schematic diagram showing an example of a motion calculation result produced by the motion extraction unit of the present embodiment.
FIG. 4 is a flowchart showing an example of the operation of the image processing apparatus of the present embodiment.

[Embodiment]
Hereinafter, an embodiment of an image processing apparatus 10 according to the present invention will be described with reference to the drawings. FIG. 1 is a schematic diagram showing the configuration of the image processing apparatus 10 according to an embodiment of the present invention.

As shown in FIG. 1, the image processing apparatus 10 of the present embodiment is realized, for example, as a function of a personal computer, and outputs important scenes extracted from an input image (the image to be processed). That is, the image processing apparatus 10 extracts the important scenes contained in the image data to be processed. Here, an important scene is an image of a scene to which a viewer or the photographer of the image (hereinafter simply referred to as the viewer) pays attention. Because a viewer pays attention to the parts of an image that change over time, that is, the parts that move, the image processing apparatus 10 extracts the moving part of the image as the subject of the image, and extracts the scenes in which this subject moves in a characteristic way as important scenes. In other words, an important scene is an image of a scene, within the image to be processed, in which the subject is moving.

The image processing apparatus 10 is not limited to being realized as a function of a personal computer. For example, the image processing apparatus 10 may be built into a mobile phone or a camera. The term "image" here is not necessarily limited to a moving image, but the following description uses an example in which the input image (the image to be processed) is a moving image.

The configuration of the image processing apparatus 10 is described below.
The image processing apparatus 10 includes a feature amount calculation unit 100, a discriminator generation unit 200, an important scene extraction unit 300, and a storage unit 400.
The storage unit 400 stores a discriminator (identification information) that identifies important scenes. The discriminator is information for identifying an important scene based on a feature amount indicating an important scene (hereinafter also referred to as the important scene feature amount h*, read "h-asterisk"). The important scene feature amount h* is information indicating the direction of movement of the subject between certain frames of the moving image to be processed. The direction of movement of the subject between frames of the moving image is described with reference to FIG. 2.

FIG. 2 is a schematic diagram showing an example of the frame structure of a moving image of the present embodiment and of the direction of movement of a subject between frames. In the following, an XY orthogonal coordinate system is set with the upper-left vertex of each frame as the origin, and the description refers to this coordinate system. In this XY orthogonal coordinate system, the horizontal direction of each frame is the X axis and the vertical direction of each frame is the Y axis. As shown in FIG. 2(a), the moving image of the present embodiment includes a plurality of frames (for example, frames 1 to n) arranged in time series. Each frame is composed of a plurality of pixels (for example, 1920 × 1080 pixels).

The motion of the image between frames can be obtained by comparing the images of a plurality of frames (for example, two frames). As a specific example, consider the case shown in FIG. 2(b), in which frame 1 and frame 4 each contain an image of a soccer ball. In frame 1, the image of the soccer ball is displayed at position Y1. In frame 4, the image of the soccer ball is displayed at position Y2. Comparing the images of these two frames (frame 1 and frame 4) shows that the image of the soccer ball has moved from position Y1 to position Y2, that is, in the -Y direction. In this way, the motion of the soccer ball image can be obtained by comparing the images of two frames. The motion of an image is also referred to as the optical flow OF. In the example of FIG. 2(b), the optical flow OF1 of the soccer ball can be obtained by comparing the images of the two frames. The optical flow OF has, for each pixel of the frame, two values representing the motion of the image within the frame: a component in the width direction of the frame (X-axis direction) and a component in the height direction (Y-axis direction).

Returning to FIG. 1, the description of the image processing apparatus 10 continues. The image processing apparatus 10 extracts important scenes through two processes: a training process and an identification process. The training process calculates the important scene feature amount h* of an input image (for example, a moving image) and stores a discriminator (identification information) based on the calculated important scene feature amount h* in the storage unit 400. The image input in the training process is a training image. The image input in the identification process is the image to be processed. The training process is described first.

In the training process, a plurality of training images are input to the image processing apparatus 10 for each moving image category. The image processing apparatus 10 calculates the important scene feature amount h* for each of the input training images. Here, a moving image category is information that classifies the content of a moving image based on the characteristics of the direction of the optical flow OF. More specifically, a moving image category is information that classifies the content of a moving image based on the degree of variation in the direction of the optical flow OF. For example, when the direction of the optical flow OF is classified into eight directions, the degree of variation in the direction is the degree of variation among the frequencies of a histogram of the optical flow OF of a given moving image whose classes (bins) are those eight directions. As specific examples, the moving image categories include "full view image of the stadium during a soccer game" and "close-up image of an individual soccer technique".
The characteristic of the optical flow OF in the moving image category "full view image of the stadium during a soccer game" is, for example, the movement of the soccer ball toward the left or right goal. In this case, the images of the players move less than the image of the soccer ball. A viewer watching this moving image pays attention to the image of the soccer ball. That is, the viewer recognizes the image area whose motion is relatively large as the subject and pays attention to the image of that subject. In other words, the subject here is the image of an image area whose motion is relatively large compared with the other image areas among a plurality of image areas in the image. In the following description, the subject is also referred to as the foreground. In this case, the subject of interest (the foreground) is the soccer ball. That is, in this example, movement of the subject of interest in the left-right direction of the screen (X-axis direction) is the characteristic of the optical flow OF of the category "full view image of the stadium during a soccer game". Put differently, the characteristic of the optical flow OF of this category is that the main direction of the optical flow OF is the left-right direction of the screen (X-axis direction).

The characteristic of the optical flow OF in the moving image category "close-up image of an individual soccer technique" is, for example, the up-and-down movement of a soccer ball that is being juggled (kept in the air). In this case, the image of the person juggling the ball moves less than the image of the soccer ball. A viewer watching this moving image pays attention to the image of the soccer ball. Therefore, in this case as well, the subject of interest, that is, the foreground, is the soccer ball. Movement of the subject of interest (the foreground) in the up-down direction of the screen (Y-axis direction) is the characteristic of the optical flow OF of the category "close-up image of an individual soccer technique".

[Configuration of Image Processing Apparatus 10 (Training Process)]
Next, the configuration of the image processing apparatus 10 in the training process is described. In the training process, the feature amount calculation unit 100 and the discriminator generation unit 200 generate a discriminator from the training images. The feature amount calculation unit 100 includes an optical flow extraction unit 110, a foreground optical flow extraction unit 120, and an important scene feature amount calculation unit 130. The feature amount calculation unit 100 extracts motion information of an input image based on the image, and a discriminator for the image is generated based on the extracted motion information.

Specifically, the optical flow extraction unit 110 extracts the optical flow OF of the input training image. When training image data is input, the optical flow extraction unit 110 samples the training image at a predetermined time interval. The predetermined time interval is, for example, every three frames, as shown in FIG. 2. In this case, when training image data is input, the optical flow extraction unit 110 samples the training image every three frames.
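The sampling step can be sketched as follows. This is only an illustration: the function names `sample_frames` and `frame_pairs`, and the use of a plain list of frame numbers in place of image data, are assumptions and do not appear in the source.

```python
def sample_frames(frames, step=3):
    """Keep every `step`-th frame, starting with the first one."""
    return frames[::step]


def frame_pairs(sampled):
    """Pair each sampled frame with the next sampled frame."""
    return list(zip(sampled, sampled[1:]))


# Frames are stood in for by their frame numbers (1-based, as in FIG. 2).
frames = list(range(1, 11))          # frames 1..10
sampled = sample_frames(frames, 3)   # frames 1, 4, 7, 10
pairs = frame_pairs(sampled)         # (1, 4), (4, 7), (7, 10)
```

With a step of three, frame 1 is paired with frame 4, which matches the soccer ball example of FIG. 2(b).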

Next, the optical flow extraction unit 110 calculates the optical flow OF of the training image using the following equation (1), based on two consecutive sampled frames.
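Equation (1) is not reproduced in this text. From the definitions of (x, y), v_x and v_y that the text gives for it, a plausible form is the per-pixel flow vector below. This is an assumed reconstruction, not the published equation.

```latex
\mathbf{v}(x, y) = \bigl( v_x(x, y),\; v_y(x, y) \bigr) \tag{1}
```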

Here, (x, y) in equation (1) represents the pixel coordinates within the frame for which the optical flow OF is calculated. The terms vx(x, y) and vy(x, y) in equation (1) represent the optical flow components in the width direction (X-axis direction) and the height direction (Y-axis direction) of the frame, respectively. The x and y attached to vx and vy are written as subscripts in equation (1) and in the equations below. That is, the optical flow extraction unit 110 extracts the motion information of the image for each predetermined pixel of the image. The predetermined pixels may be every pixel in the frame, or pixels taken at a certain pixel interval L (for example, L = 5 pixels). The following description covers the case where the optical flow extraction unit 110 calculates the optical flow OF for every pixel in the frame.
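The two choices of "predetermined pixels" can be illustrated with a short sketch. The function name `flow_sample_points` is hypothetical, and the sketch assumes that the interval L applies along both axes and that the grid starts at the origin; the source does not specify either detail.

```python
def flow_sample_points(width, height, interval=1):
    """Return the (x, y) pixel coordinates at which flow is evaluated.

    interval=1 gives every pixel; interval=5 gives every fifth pixel
    along both axes (the L = 5 example in the text).
    """
    return [(x, y)
            for y in range(0, height, interval)
            for x in range(0, width, interval)]


dense = flow_sample_points(20, 10)        # every pixel: 200 points
sparse = flow_sample_points(20, 10, 5)    # every 5th pixel: 4 * 2 = 8 points
```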

The foreground optical flow extraction unit 120 extracts the foreground optical flow FOF from the optical flow OF of the training image extracted by the optical flow extraction unit 110. Specifically, the foreground optical flow extraction unit 120 calculates, for each frame, the mean value v̄ (v-bar) and the standard deviation σ (sigma) of the optical flow OF using equations (2) and (3). Here, N is the number of optical flows OF calculated for each frame.
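Equations (2) and (3) are not reproduced in this text. Assuming they are the ordinary mean and standard deviation taken over the N flow magnitudes v(x, y) of one frame, they would read as follows. Both the use of magnitudes and the 1/N normalization (rather than 1/(N-1)) are assumptions.

```latex
\begin{align}
\bar{v} &= \frac{1}{N} \sum_{(x, y)} v(x, y) \tag{2} \\
\sigma &= \sqrt{ \frac{1}{N} \sum_{(x, y)} \bigl( v(x, y) - \bar{v} \bigr)^{2} } \tag{3}
\end{align}
```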

Next, the foreground optical flow extraction unit 120 extracts the foreground optical flow FOF based on the calculated mean value v̄ (v-bar) and standard deviation σ (sigma) of the optical flow OF. Specifically, the foreground optical flow extraction unit 120 calculates the foreground optical flow FOF based on the residual (v(x, y) − v̄), that is, the magnitude v(x, y) of the optical flow OF minus the mean value v̄. For example, if this residual is larger than the standard deviation σ, the foreground optical flow extraction unit 120 takes that optical flow OF (v(x, y)) as the foreground optical flow FOF (v*(x, y), read "v-asterisk of x, y") (see equation (4)). The asterisk (*) attached to v is written as a superscript in equation (4) and in the equations below.
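The thresholding step can be sketched in a few lines. Equation (4) itself is not reproduced in this text, so the sketch rests on assumptions: the flow is held in a dictionary mapping pixel coordinates to (vx, vy), the statistics are taken over flow magnitudes with a population standard deviation, the signed residual is compared with σ, and non-foreground vectors are set to zero. The name `foreground_flow` is hypothetical.

```python
import math


def foreground_flow(flow):
    """Zero out flow vectors whose magnitude is not an outlier.

    `flow` maps (x, y) -> (vx, vy). A vector is kept as foreground when
    its magnitude exceeds the frame mean by more than one standard
    deviation; all other vectors are replaced by (0.0, 0.0).
    """
    mags = {p: math.hypot(vx, vy) for p, (vx, vy) in flow.items()}
    n = len(mags)
    mean = sum(mags.values()) / n
    sigma = math.sqrt(sum((m - mean) ** 2 for m in mags.values()) / n)
    return {p: (flow[p] if mags[p] - mean > sigma else (0.0, 0.0))
            for p in flow}


# Nine nearly static pixels and one fast-moving pixel (the "ball").
flow = {(x, 0): (0.1, 0.0) for x in range(9)}
flow[(9, 0)] = (0.0, -5.0)
fof = foreground_flow(flow)
```

In this toy frame only the fast-moving pixel survives as foreground; the nine slow pixels fall below the mean and are zeroed.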

Next, the foreground optical flow extraction unit 120 quantizes the foreground optical flow FOF (v*(x, y)) into a plurality of directions and calculates the frequency of each direction. Specifically, the foreground optical flow extraction unit 120 quantizes the direction of each nonzero foreground optical flow FOF (v*(x, y)) into eight directions and calculates a direction histogram h (the frequency of each direction) (see equations (5) and (6)). Here, the direction θ(x, y) is expressed in radians.
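A minimal sketch of the eight-direction quantization follows. Equations (5) and (6) are not reproduced in this text, so the binning convention is an assumption: the angle is taken with `math.atan2` in image coordinates, and bin k is centred on k·π/4. The name `direction_histogram` is hypothetical.

```python
import math


def direction_histogram(fof, bins=8):
    """Count nonzero flow vectors per quantized direction.

    Bin k is centred on the angle k * 2*pi/bins, measured with atan2
    in image coordinates (X to the right, Y downward).
    """
    hist = [0] * bins
    width = 2 * math.pi / bins
    for vx, vy in fof.values():
        if vx == 0 and vy == 0:
            continue                      # background: not counted
        theta = math.atan2(vy, vx) % (2 * math.pi)   # 0 <= theta <= 2*pi
        k = int((theta + width / 2) // width) % bins
        hist[k] += 1
    return hist


fof = {(0, 0): (1.0, 0.0),    # +X            -> bin 0
       (1, 0): (0.0, -2.0),   # -Y (upward)   -> bin 6
       (2, 0): (0.0, -3.0),   # -Y (upward)   -> bin 6
       (3, 0): (0.0, 0.0)}    # zero vector, skipped
h = direction_histogram(fof)  # [1, 0, 0, 0, 0, 0, 2, 0]
```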

That is, in the feature amount calculation unit 100, the optical flow extraction unit 110 extracts motion information (the optical flow OF) of an input image based on the image, and the foreground optical flow extraction unit 120 extracts, based on the extracted motion information, motion information indicating the motion of the subject in the image (the foreground optical flow FOF).

The direction histogram h also represents the image area whose motion is relatively large within a frame. That is, by calculating the direction histogram h, the foreground optical flow extraction unit 120 can extract the image area whose motion is relatively large within a frame as the foreground (subject).

Next, the important scene feature amount calculation unit 130 takes the component with the highest frequency in the direction histogram h calculated by equations (5) and (6) as the first component and cyclically reorders the remaining components after it; the result, h* (h-asterisk), is calculated as the important scene feature amount. That is, the important scene feature amount calculation unit 130 calculates the feature amount indicating an important scene based on the extracted motion information indicating the motion of the subject. Here, the important scene feature amount calculation unit 130 calculates the feature amount indicating an important scene based on the direction of motion that appears most frequently among the directions of motion indicated by the motion information extracted for each predetermined pixel. For example, if the third component h3 (k = 3) of the direction histogram h is the largest in a certain frame, the important scene feature amount h* of this frame is given by the following equation (7).
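The cyclic reordering can be sketched as a list rotation. Equation (7) is not reproduced in this text; the sketch assumes that the remaining components follow in increasing index order and wrap around, so that for a largest component h3 the result is (h3, h4, ..., h8, h1, h2). The function name `important_scene_feature` and the tie-breaking rule are also assumptions.

```python
def important_scene_feature(h):
    """Rotate h so that its largest component comes first.

    The order of the remaining components is preserved cyclically.
    Ties are broken by taking the first maximum (an assumption).
    """
    k = h.index(max(h))
    return h[k:] + h[:k]


h = [2, 0, 9, 1, 0, 0, 3, 0]          # h1..h8, with h3 the largest
h_star = important_scene_feature(h)   # [9, 1, 0, 0, 3, 0, 2, 0]
```

Under this assumption, two histograms that are rotations of each other (same spread of directions around a different dominant direction) map to the same h*.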

The important scene feature amount calculation unit 130 calculates the important scene feature amount h* for every sampled frame. It then associates the calculated important scene feature amount h* with image category data indicating the category of the image and stores them in the storage unit 400. That is, the important scene feature amount calculation unit 130 calculates the important scene feature amount h* based on the extracted motion information. More precisely, it calculates the important scene feature amount h* based on information indicating the category (type) of the input image and on the extracted optical flow OF (motion information).

As described above, the important scene feature amount calculation unit 130 calculates the important scene feature amount h* based on the direction of movement of the subject indicated by the foreground optical flow FOF. That is, the important scene feature amount calculation unit 130 calculates the important scene feature amount h* based on the direction of motion indicated by the motion information.

The important scene feature amount calculation unit 130 also calculates the important scene feature amount h* based on the direction histogram h. The direction histogram h is information indicating how frequently each direction of motion appears in the foreground optical flow FOF (motion information) extracted for each predetermined pixel. That is, the important scene feature amount calculation unit 130 calculates the important scene feature amount h* based on the appearance frequency of the directions of motion indicated by the motion information extracted for each predetermined pixel.

The discriminator generation unit 200 generates a discriminator (identification information) that identifies the important scenes of an image. Specifically, the discriminator generation unit 200 generates the discriminator based on the important scene feature amount h* calculated by the important scene feature amount calculation unit 130, input important scene data, and input image category data. The important scene data is information indicating which of the scenes of the training image input to the feature amount calculation unit 100 are important scenes. That is, the important scene data indicates the correct scenes that the discriminator should identify as important scenes among the scenes of the training image. The image category data is data indicating the category of the training image data input to the feature amount calculation unit 100.

The discriminator generation unit 200 generates the discriminator using, for example, a known machine learning technique (for example, an identification method using bag-of-words and an SVM).

The discriminator generation unit 200 can also generate a discriminator for each image category. The important scene feature amount h* may differ from one image category to another. Therefore, preparing a discriminator for identifying important scenes for each image category makes it possible to provide discriminators that are adapted to each category and have high identification accuracy. That is, by preparing, for each image category, a discriminator generated from the important scene feature amounts h* of that category, the image processing apparatus 10 can identify important scenes accurately.

[Configuration of Image Processing Apparatus 10 (Identification Process)]
Next, the configuration of the image processing apparatus 10 in the identification process is described. In the identification process, the important scene extraction unit 300 extracts important scenes from the image to be processed, based on the important scene feature amounts h* of the training images stored in the storage unit 400 and on the image to be processed. The specific configuration of the important scene extraction unit 300 is described below.

The important scene extraction unit 300 includes a feature amount calculation unit 320, which corresponds to the feature amount calculation unit 100 described above, and an important scene determination unit 330. When image data to be processed is input, the feature amount calculation unit 320 extracts the foreground optical flow FOF (v*(x, y)) of the image to be processed in the same manner as the feature amount calculation unit 100.

The feature amount calculation unit 320 also calculates the important scene feature amount h* (h-asterisk) of the image to be processed, based on the extracted foreground optical flow FOF (v*(x, y)) of that image.

The important scene determination unit 330 determines the important scenes in the image to be processed, based on image category data indicating the category of the image to be processed, the discriminators stored in the storage unit 400 for each image category, and the important scene feature amount h* of the image to be processed calculated by the feature amount calculation unit 320. Specifically, the important scene determination unit 330 applies the important scene feature amount h* calculated by the feature amount calculation unit 320 to a discriminator stored in the storage unit 400, and thereby determines whether or not a scene of the input image to be processed is an important scene.

As described above, the important scene determination unit 330 determines important scenes based on the input image category data. Specifically, from among the classifiers stored in the storage unit 400, the important scene determination unit 330 reads out the classifier associated with the image category indicated by the input image category data. The important scene determination unit 330 can thus apply the important scene feature amount h* of the processing target image to a classifier that was prepared for that image category. In other words, because the important scene determination unit 330 can select a classifier suited to the category of the processing target image, it can determine important scenes accurately.
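The category-based selection described above can be sketched as a simple lookup followed by applying the selected classifier. This is only an illustrative sketch: the function name `judge_important_scene`, the dictionary layout, and the two stand-in classifiers are hypothetical, and the source does not specify how a classifier is represented.

```python
from typing import Callable, Dict, Sequence

# Assumption: a classifier is any callable mapping a feature vector h*
# to True (important scene) or False.
Classifier = Callable[[Sequence[float]], bool]


def judge_important_scene(
    h_star: Sequence[float],
    category: str,
    classifiers: Dict[str, Classifier],
) -> bool:
    """Read out the classifier stored for `category` and apply it to h*."""
    if category not in classifiers:
        raise KeyError(f"no classifier stored for category {category!r}")
    return classifiers[category](h_star)


# Hypothetical stand-ins for trained classifiers (one per image category).
classifiers: Dict[str, Classifier] = {
    "soccer": lambda h: max(h) >= 0.5,
    "dance": lambda h: sum(h) >= 1.5,
}

result = judge_important_scene([0.1, 0.7, 0.2], "soccer", classifiers)
```

Keeping one classifier per category means the same feature vector can be judged differently depending on the category supplied with the image.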

The important scene determination unit 330 outputs the images of the scenes determined to be important in this way as important scene image data.
So far, the basic configuration of the important scene extraction performed by the image processing apparatus 10 has been described. A more specific configuration of the important scene extraction performed by the image processing apparatus 10 is described below.

[More Specific Configuration of Image Processing Apparatus 10]
In the above description, the feature amount calculation unit 100 calculates the optical flow OF of the training image and the foreground optical flow FOF without changing the number of pixels of the input training image data, but the configuration is not limited to this. The feature amount calculation unit 100 may instead reduce the number of pixels of the training image data before calculating the optical flow OF and the foreground optical flow FOF. For example, when the input training image is 1920 × 1080 pixels, the optical flow extraction unit 110 of the feature amount calculation unit 100 resizes the training image to 320 × 240 pixels and then calculates the optical flow OF and the foreground optical flow FOF. If the aspect ratio of the input training image (for example, 16:9) differs from the aspect ratio of the resized image (4:3), the optical flow extraction unit 110 trims the image when changing its size. This reduces the amount of computation required to calculate the optical flow OF. It also reduces the influence of noise components when the input training image contains noise.
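The trimming and resizing step can be sketched as follows. This is a minimal sketch under stated assumptions: the source only says the image is trimmed when the aspect ratios differ, so the centered crop and the nearest-neighbor sampling used here are illustrative choices, and the function names are hypothetical.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Return (left, top, width, height) of the largest centered region
    of the source that has the destination's aspect ratio."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: trim left and right.
        crop_w = src_h * dst_w // dst_h
        crop_h = src_h
    else:
        # Source is taller than (or equal to) the target: trim top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h


def crop_and_resize(frame, dst_w, dst_h):
    """Crop a frame (list of pixel rows) to the target aspect ratio and
    downsample it with nearest-neighbor sampling."""
    src_h, src_w = len(frame), len(frame[0])
    left, top, crop_w, crop_h = center_crop_box(src_w, src_h, dst_w, dst_h)
    return [
        [frame[top + (j * crop_h) // dst_h][left + (i * crop_w) // dst_w]
         for i in range(dst_w)]
        for j in range(dst_h)
    ]


# For a 1920x1080 input and a 320x240 target, a 1440x1080 region is kept.
box = center_crop_box(1920, 1080, 320, 240)
```

With these numbers, 240 pixel columns are trimmed from each side before the image is scaled down, so the optical flow is computed on 76,800 pixels instead of about 2 million.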

In the above description, the feature amount calculation unit 100 calculates the optical flow OF of the training image and the foreground optical flow FOF for all pixels in each frame of the input training image, but the configuration is not limited to this. In a frame of a moving image, the region that viewers pay attention to is typically near the center of the frame, and the four corners may receive almost no attention. The feature amount calculation unit 100 therefore need not calculate the optical flow OF and the foreground optical flow FOF for pixels in the peripheral part of the frame; it is sufficient to calculate them for pixels in the central part of the frame. As a more specific example, the feature amount calculation unit 100 calculates the optical flow OF from the pixels (x, y) inside an ellipse EO with radius threshold α, that is, the pixels given by equation (8), where W and H are the width and height of the frame. In equation (8), 0 < α ≤ 1.
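Equation (8) itself is not reproduced in this text, so the following sketch assumes one natural reading: an ellipse centered on the frame whose semi-axes are αW/2 and αH/2. The function name `in_ellipse` and this exact form of the inequality are assumptions, not the patent's stated formula.

```python
def in_ellipse(x, y, width, height, alpha):
    """Return True if pixel (x, y) lies inside the centered ellipse whose
    semi-axes are alpha * width / 2 and alpha * height / 2.

    Assumed form of the test (not confirmed by the source):
        ((x - W/2) / (alpha*W/2))**2 + ((y - H/2) / (alpha*H/2))**2 <= 1
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    cx, cy = width / 2, height / 2
    a, b = alpha * width / 2, alpha * height / 2
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0


# Pixels of a 320x240 frame for which optical flow would be calculated.
W, H, ALPHA = 320, 240, 0.8
targets = [(x, y) for y in range(H) for x in range(W)
           if in_ellipse(x, y, W, H, ALPHA)]
```

Under this reading, α = 1 gives the ellipse inscribed in the frame, and smaller values of α shrink the region toward the center, which further reduces the number of pixels processed.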

FIG. 3 shows a specific example in which the feature amount calculation unit 100 calculates the optical flow OF of the training image and the foreground optical flow FOF using the pixels inside the ellipse EO given by equation (8) as the calculation target.
FIG. 3 is a schematic diagram showing an example of a motion calculation result by the feature amount calculation unit 100 of the present embodiment. The optical flow extraction unit 110 of the feature amount calculation unit 100 calculates the optical flow OF of the training image using, for example, the pixels inside the ellipse EO shown in FIG. 3(a) as the calculation target. The foreground optical flow extraction unit 120 of the feature amount calculation unit 100 calculates, for example, the motion of the image of the soccer ball included in the ellipse EO as the foreground optical flow FOF, as shown in FIG. 3(b). With this configuration, the feature amount calculation unit 100 can reduce the amount of computation required to calculate the information indicating image motion (the optical flow OF and the foreground optical flow FOF).

In the above description, the feature amount calculation unit 100 calculates the optical flow OF of the training image and the foreground optical flow FOF for all pixels in each frame of the input training image, but the configuration is not limited to this. The feature amount calculation unit 100 may calculate the optical flow OF and the foreground optical flow FOF only for pixels thinned out at a predetermined interval. As an example, the feature amount calculation unit 100 calculates the optical flow OF and the foreground optical flow FOF at every pixel interval L (for example, L = 5 pixels) (see equations (9) and (10)).
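The thinning can be illustrated by enumerating the sampled pixel positions. Equations (9) and (10) are not reproduced in this text, so the sketch assumes they describe a regular grid x = iL, y = jL starting at the top-left pixel; the function name `sample_points` and the grid origin are assumptions.

```python
def sample_points(width, height, step=5):
    """Return the (x, y) positions of a regular grid with spacing `step`.

    Assumption: the grid starts at pixel (0, 0), i.e. x = i * step and
    y = j * step for non-negative integers i and j.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)]


# With L = 5 on a 320x240 frame, 64 x 48 = 3072 positions remain,
# which is 1/25 of the 76,800 pixels in the frame.
points = sample_points(320, 240, step=5)
```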

This configuration also allows the feature amount calculation unit 100 to reduce the amount of computation required to calculate the optical flow OF.

Before calculating the mean v̄ (v-bar) and the standard deviation σ (sigma) of the optical flow OF by equations (2) and (3) described above, the optical flow extraction unit 110 may calculate the optical flow OF (v(x, y)) using a threshold v0 (v-zero) on the magnitude of optical flow regarded as noise and a threshold σ0 (sigma-zero) on the standard deviation of the foreground optical flow (see equation (11)).

As an example, the optical flow extraction unit 110 calculates the optical flow OF (v(x, y)) with the threshold v0 = 2 and the threshold σ0 = 2 pixels.

With this configuration, the optical flow extraction unit 110 can remove noise when calculating the optical flow OF and can reduce false detections of the foreground optical flow FOF.

The foreground optical flow extraction unit 120 has been described as calculating the foreground optical flow FOF (v*(x, y)) (v-asterisk of x, y) based on equation (4) described above, but the configuration is not limited to this. The foreground optical flow extraction unit 120 may be configured to calculate the foreground optical flow FOF using a threshold β (β ≥ 0), as shown in equation (12). The threshold β is, for example, β = 1.
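Equations (4) and (12) are not reproduced in this text, so the following is only one plausible reading, offered as a sketch: a flow vector is treated as foreground when its deviation from the mean flow exceeds β times the standard deviation. The function name `foreground_flow`, the use of vector deviations, and the definition of σ as the root mean square of those deviations are all assumptions.

```python
import math


def foreground_flow(flow, beta=1.0):
    """Keep flow vectors that deviate strongly from the mean flow.

    flow: list of (vx, vy) tuples, one per sampled pixel.
    Assumed rule (not confirmed by the source): a vector is foreground if
    |v - mean| > beta * sigma, where sigma is the RMS deviation.
    Background vectors are replaced by (0.0, 0.0).
    """
    if not flow:
        return []
    n = len(flow)
    mean_x = sum(vx for vx, _ in flow) / n
    mean_y = sum(vy for _, vy in flow) / n
    deviations = [math.hypot(vx - mean_x, vy - mean_y) for vx, vy in flow]
    sigma = math.sqrt(sum(d * d for d in deviations) / n)
    return [v if d > beta * sigma else (0.0, 0.0)
            for v, d in zip(flow, deviations)]


# A panning shot: nine background vectors move together, one object differs.
flow = [(4.0, 0.0)] * 9 + [(-5.0, 0.0)]
fof = foreground_flow(flow, beta=1.0)
```

Under this reading, a larger β keeps fewer vectors as foreground. Because the test is relative to the mean flow, a uniformly moving background is suppressed even when its motion is large in absolute terms.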

With this configuration, the foreground optical flow extraction unit 120 can adjust the foreground optical flow FOF that it calculates according to, for example, the category of the moving image or the length (duration) of the moving image scene.

The important scene determination unit 330 may also combine two or more important scenes into one important scene and output the resulting images. In this case, if the time interval between two adjacent important scenes is equal to or less than a certain time T (for example, T = 2 seconds), the important scene determination unit 330 may output the span from the start time of the earlier important scene to the end time of the later important scene as a single important scene. With this configuration, the important scene determination unit 330 can output important scenes without fragmenting them into short pieces.
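The merging rule above can be sketched as a single pass over scenes sorted by start time. The representation of a scene as a (start, end) pair in seconds and the function name `merge_scenes` are assumptions made for illustration.

```python
def merge_scenes(scenes, max_gap=2.0):
    """Merge important scenes whose gap is at most `max_gap` seconds.

    scenes: iterable of (start, end) times in seconds.
    Returns a list of merged (start, end) pairs sorted by start time.
    """
    merged = []
    for start, end in sorted(scenes):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap to the previous scene is short: extend the previous scene
            # from its start time to the later of the two end times.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# The first two scenes are 1.5 s apart and are merged; the third is kept.
scenes = [(0.0, 3.0), (4.5, 6.0), (10.0, 12.0)]
merged = merge_scenes(scenes, max_gap=2.0)
```

Because each merged scene is compared with the next one in turn, a chain of several closely spaced scenes collapses into one span.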

[Operation of Image Processing Apparatus 10]
Next, the operation of the image processing apparatus 10 of the present embodiment will be described with reference to FIG. 4.
FIG. 4 is a flowchart showing an example of the operation of the image processing apparatus 10 of the present embodiment. The operation in the training process is described first, followed by the operation in the identification process.

In the training process, the optical flow extraction unit 110 extracts the optical flow OF of the input training image (step S10).

Next, the foreground optical flow extraction unit 120 extracts the foreground optical flow FOF from the optical flow OF of the training image extracted by the optical flow extraction unit 110 (step S20).

Next, the foreground optical flow extraction unit 120 quantizes the foreground optical flow FOF extracted in step S20 into a plurality of directions (step S30).
Next, the foreground optical flow extraction unit 120 calculates the frequency of each direction for the foreground optical flow FOF quantized in step S30 (step S40). The direction histogram h is thereby calculated.
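Steps S30 and S40 can be sketched as binning each foreground flow vector by its angle and counting the vectors per bin. The number of directions is not stated in this section, so the default of 8 bins, the choice to center bin 0 on the positive x direction, and the function name `direction_histogram` are assumptions.

```python
import math


def direction_histogram(flow, bins=8):
    """Quantize flow vectors into `bins` directions and count each direction.

    flow: list of (vx, vy) tuples; zero vectors carry no direction and
    are skipped. Bin 0 is centered on the +x direction (an assumption).
    """
    hist = [0] * bins
    bin_width = 2 * math.pi / bins
    for vx, vy in flow:
        if vx == 0 and vy == 0:
            continue
        # Angle in [0, 2*pi), measured from the +x axis.
        angle = math.atan2(vy, vx) % (2 * math.pi)
        # Shift by half a bin so that each bin is centered on its direction.
        index = int((angle + bin_width / 2) // bin_width) % bins
        hist[index] += 1
    return hist


# Two vectors along +x, one along +y, one along -x, one zero vector.
h = direction_histogram([(1, 0), (1, 0), (0, 1), (-1, 0), (0, 0)])
```

The important scene feature amount h* of step S50 is derived from this histogram; that derivation is defined elsewhere in the description and is not reproduced in this sketch.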

Next, the important scene feature amount calculation unit 130 calculates the important scene feature amount h* (h-asterisk) based on the direction histogram h calculated in step S40 (step S50).

Next, the feature amount calculation unit 100 determines whether or not the important scene feature amount h* has been calculated for all training images (step S60). The training images include images of various categories. Because the feature amount calculation unit 100 calculates the important scene feature amount h* for each training image category, it calculates the important scene feature amount h* for each of the plurality of training images. If the feature amount calculation unit 100 determines that the important scene feature amount h* has been calculated for all training images (step S60: YES), it advances the process to step S70. If it determines that the important scene feature amount h* has not yet been calculated for all training images (step S60: NO), it returns the process to step S10 to calculate the important scene feature amount h* for the next training image.

Next, the classifier generation unit 200 generates classifiers based on the important scene feature amounts h* calculated for each image category in step S50, the input image category data, and the important scene data. The classifier generation unit 200 then associates the input image category data with the generated classifiers, stores them in the storage unit 400, and ends the training process (step S70).

Next, in the identification process, the feature amount calculation unit 320 calculates the important scene feature amount h* in the same manner as in steps S10 to S50 described above (steps S100 to S140).

Next, the important scene determination unit 330 extracts important scenes from the processing target image based on the important scene feature amount h* of the processing target image calculated in steps S100 to S140, the classifiers stored in the storage unit 400 in step S70, and the category of the input image, and then ends the process (step S150).

As described above, the image processing apparatus 10 of the present embodiment includes the feature amount calculation unit 100 and the classifier generation unit 200. The image processing apparatus 10 thereby extracts a subject included in the input image (for example, a moving image) and calculates the important scene feature amount h* based on the motion of that subject. The image processing apparatus 10 can therefore calculate the important scene feature amount h* without additional information such as audio recorded when the input image was captured or sensor information from the camera. In other words, the image processing apparatus 10 can accurately extract important scenes from image data without using a special sensor.

When a moving image is shot with the camera following a moving subject, the subject stays almost fixed in the frame while the background moves greatly. Even in this case, the image processing apparatus 10 of the present embodiment can calculate the foreground optical flow FOF without mistaking the greatly moving background for the subject (foreground).

In the embodiment described above, an example was given in which the image processing apparatus 10 extracts important scenes based on the category of the moving image, but the configuration is not limited to this. For example, because the image processing apparatus 10 extracts important scenes based on a general machine learning method, it can extract important scenes without calculating a feature amount for each moving image category. This simplifies the configuration of the image processing apparatus 10.

In the embodiment described above, an example was also given in which the image processing apparatus 10 associates the category of the input moving image with the calculated feature amount, but the configuration is not limited to this. For example, the image processing apparatus 10 may itself determine the category of the moving image based on a general machine learning method and associate the determined category with the calculated feature amount. Because this automates the determination of the moving image category, it makes the image processing apparatus 10 easier to operate.

Some functions of the image processing apparatus 10 described above may be realized by a computer. In this case, an image processing program for realizing those functions may be recorded on a computer-readable recording medium, and the functions may be realized by causing a computer system to read and execute the image processing program recorded on that medium. The "computer system" here includes an OS (Operating System) and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable recording medium such as a flexible disk, a magneto-optical disk, an optical disk, or a memory card, or to a storage device such as a magnetic hard disk built into the computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as volatile memory inside the computer system serving as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

The embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and also includes designs and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS: 10: image processing apparatus, 100: feature amount calculation unit, 110: optical flow extraction unit, 120: foreground optical flow extraction unit, 130: important scene feature amount calculation unit, 200: classifier generation unit, 300: important scene extraction unit, 320: feature amount calculation unit, 330: important scene determination unit, 400: storage unit

Claims

An image processing apparatus comprising:
a motion extraction unit that extracts motion information of an input image based on the image; and
a feature amount calculation unit that calculates, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.

The image processing apparatus according to claim 1, wherein
the motion information is information indicating motion of the subject in the image,
the motion extraction unit extracts, based on the input image, motion information indicating the motion of the subject in the image, and
the feature amount calculation unit calculates the feature amount indicating the important scene based on the extracted motion information indicating the motion of the subject.

The image processing apparatus according to claim 1, wherein the subject is the image of an image area whose motion is relatively large among a plurality of image areas in the image.

The image processing apparatus according to any one of claims 1 to 3, wherein
the motion information includes information indicating a direction of the motion of the subject, and
the feature amount calculation unit calculates the feature amount based on the direction of the motion indicated by the motion information.

The image processing apparatus according to claim 4, wherein
the motion extraction unit extracts, based on the input image, the motion information of the image for each predetermined pixel constituting the image, and
the feature amount calculation unit calculates the feature amount indicating the important scene based on an appearance frequency of the direction of the motion indicated by the motion information extracted for each predetermined pixel.

The image processing apparatus according to claim 5, wherein
the feature amount calculation unit calculates the feature amount indicating the important scene based on the direction of the motion having the highest appearance frequency among the directions of the motion indicated by the motion information extracted for each predetermined pixel.

The image processing apparatus according to claim 1, wherein
information indicating a type of the image is further input to the feature amount calculation unit, and
the feature amount calculation unit calculates the feature amount indicating the important scene based on the input information indicating the type of the image and the extracted motion information.

An image processing method comprising:
a motion extraction procedure of extracting motion information of an input image based on the image; and
a feature amount calculation procedure of calculating, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.

A program for causing a computer provided in an image processing apparatus to execute:
a motion extraction procedure of extracting motion information of an input image based on the image; and
a feature amount calculation procedure of calculating, based on the extracted motion information, a feature amount indicating an important scene in which a subject in the image is in motion.