JP5598159B2

JP5598159B2 - Image processing apparatus, imaging system, image processing method, and program

Info

Publication number: JP5598159B2
Application number: JP2010186026A
Authority: JP
Inventors: 武史松尾
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2010-08-23
Filing date: 2010-08-23
Publication date: 2014-10-01
Anticipated expiration: 2030-08-23
Also published as: JP2012043337A

Description

The present invention relates to an image processing apparatus, an imaging system, an image processing method, and a program.

Conventionally, methods are known for classifying an image, such as a photograph captured with a digital camera, by comparing its features with the features of learning images that were classified into categories in advance. For example, Patent Document 1 describes a method of classifying images by their image feature amounts.
Patent Document 1: JP 2009-009402 A

Moving image data captured by a digital video camera or a digital camera contains a variety of subjects, and the characteristics of the captured subjects change as the video progresses. It has therefore been difficult to automatically select a characteristic scene from moving image data.

To solve the above problem, a first aspect of the present invention provides an image processing apparatus that selects a feature scene from input moving image data, the apparatus comprising: a category specifying unit that analyzes the moving image data and specifies which of a plurality of predetermined categories for identifying image content the moving image data belongs to; and a selection unit that selects a feature scene included in the moving image data based on, among the time-series data of a plurality of feature amounts extracted from the moving image data, the time-series data of the feature amount associated with the category to which the moving image data belongs.

The above summary of the invention does not enumerate all of the necessary features of the present invention. Sub-combinations of these feature groups can also constitute inventions.

The drawings show, in order: a configuration example of the image processing apparatus 100 according to the present embodiment; the operation flow of the image processing apparatus 100; an example of the moving image data 210; an example of time-series data of a category feature amount corresponding to the moving image data 210; an example of time-series data of a scene feature amount corresponding to the moving image data 210; a modification of the image processing apparatus 100; and an example of the hardware configuration of a computer 1900 according to the present embodiment.

Hereinafter, the present invention will be described through embodiments, but the following embodiments do not limit the invention according to the claims. Moreover, not all combinations of the features described in the embodiments are essential to the solution provided by the invention.

FIG. 1 shows a configuration example of an image processing apparatus 100 according to the present embodiment. The image processing apparatus 100 selects a feature scene from input moving image data. The image processing apparatus 100 may select a feature scene from moving image data captured by a digital video camera, a digital camera, or the like and record it in association with the moving image data, and may display the feature scene of the moving image data as a thumbnail. In the present embodiment, a feature scene is an impressive scene, among the scenes captured in the moving image data, from which the content of the moving image data can be grasped. A feature scene may be a still image or a short piece of moving image data.

For example, a feature scene may be a scene judged likely to impress the user, such as a child running in moving image data of an athletic meet, a goal being scored in moving image data of a soccer game, or a family gathering at a tourist spot in moving image data of a family trip. The image processing apparatus 100 includes a category specifying unit 110, a selection unit 120, an extraction unit 130, and a processing unit 140.

The category specifying unit 110 analyzes the moving image data and specifies which of a plurality of predetermined categories for identifying image content the moving image data belongs to. The category specifying unit 110 may specify which of a plurality of categories, each identifying the kind of event at which a moving image was captured, the moving image data belongs to.

The category specifying unit 110 also identifies which of the predetermined categories the moving image data to be analyzed belongs to. For example, the category specifying unit 110 specifies whether the event the user is capturing in the moving image data, such as "cherry blossom viewing", "seaside", "festival", or "wedding", belongs to a predetermined category. The category specifying unit 110 has a plurality (n) of discriminators 115 and a specifier 117.

Each of the discriminators 115 determines, based on one or more category feature amounts extracted from the moving image data, whether the moving image data belongs to the category associated with that discriminator 115. A discriminator 115 may extract the feature amounts suited to its corresponding category among the predetermined categories. For example, the category specifying unit 110 has as many discriminators 115 as there are predetermined categories, and each discriminator 115 corresponds one-to-one to a category and extracts the feature amounts for that category. A discriminator 115 may extract one or more feature amounts for its one category. Here, a feature amount corresponding to a category is called a category feature amount.

The discriminator 115 may decompose the input moving image data into the still image data of the frames that form the moving image and extract category feature amounts from each still image. The discriminator 115 may extract the category feature amounts of one or more of these still images and use them as the category feature amounts of the input moving image data.

The discriminator 115 may use information corresponding to the number of edges contained in one or more still images as a category feature amount. Specifically, the discriminator 115 may divide one still image into a plurality of regions, each consisting of a predetermined number of pixels, and then sum the number of edges contained in some or all of those regions. An edge is a location where luminance or chromaticity changes by a predetermined amount or more between adjacent pixels.
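The edge-count feature described above could be sketched roughly as follows. This is only an illustration, assuming a grayscale image given as a list of rows of luminance values; the function names `count_edges` and `edge_feature`, the square block shape, and the use of luminance alone (rather than chromaticity) are assumptions, not details from the patent.

```python
def count_edges(gray, threshold):
    """Count adjacent-pixel pairs whose luminance differs by at least `threshold`."""
    edges = 0
    height, width = len(gray), len(gray[0])
    for y in range(height):
        for x in range(width):
            # Compare with the right-hand neighbour.
            if x + 1 < width and abs(gray[y][x] - gray[y][x + 1]) >= threshold:
                edges += 1
            # Compare with the neighbour below.
            if y + 1 < height and abs(gray[y][x] - gray[y + 1][x]) >= threshold:
                edges += 1
    return edges


def edge_feature(gray, block, threshold):
    """Sum edge counts over non-overlapping block x block regions.

    Edges that straddle a region boundary are not counted in this sketch.
    """
    total = 0
    for top in range(0, len(gray), block):
        for left in range(0, len(gray[0]), block):
            region = [row[left:left + block] for row in gray[top:top + block]]
            total += count_edges(region, threshold)
    return total
```

Restricting the outer loops to a subset of regions would correspond to summing over "some" rather than "all" of the regions.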

The discriminator 115 may also calculate a category feature amount by quantifying the color information of one or more still images. For example, the discriminator 115 converts the color space of one still image to the HSV color space and generates a histogram based on the occurrence frequency of each color in that color space. The discriminator 115 may use the numerical occurrence frequency of each color as a category feature amount.
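As a rough sketch of the color histogram feature, the following converts RGB pixels to HSV with Python's standard `colorsys` module and counts how often each hue bin occurs. Binning by hue only, the bin count, and the name `hue_histogram` are assumptions made for illustration; the patent does not say how the colors are quantized.

```python
import colorsys


def hue_histogram(pixels, bins):
    """Return the relative frequency of each hue bin for a list of (r, g, b) pixels.

    Each channel is expected in the range 0..255.
    """
    if not pixels:
        return [0.0] * bins
    counts = [0] * bins
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # hue lies in [0, 1]; clamp so that hue == 1.0 falls in the last bin.
        counts[min(int(hue * bins), bins - 1)] += 1
    return [count / len(pixels) for count in counts]
```

The resulting list can be used directly as components of a category feature amount vector.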

The discriminator 115 may also calculate a category feature amount by quantifying the luminance information of one or more still images. For example, the discriminator 115 uses the mean and variance of the luminance of one still image as category feature amounts. As another example, the discriminator 115 may convert the color space of one still image to the HSV color space and use the energy and variance in the converted color space as category feature amounts.
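The luminance mean and variance mentioned above can be computed as in the following minimal sketch. It assumes a grayscale image given as a list of rows and uses the population variance; the patent does not specify which variance definition is intended, and `luminance_stats` is a hypothetical name.

```python
def luminance_stats(gray):
    """Return (mean, population variance) of the luminance values of one image."""
    values = [value for row in gray for value in row]
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, variance
```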

The discriminator 115 may also calculate a category feature amount based on meta information included in one or more still images. Specifically, when the input moving image data was generated in conformance with MPEG-7, the information on edges, on color occurrence frequency, on luminance, and so on indicated in the video signal feature descriptors of the moving image data may be used as category feature amounts.

The discriminator 115 may extract category feature amounts from each of the still images forming the input moving image data, thereby obtaining time-series data of the category feature amounts. Alternatively, the discriminator 115 may extract category feature amounts from the still images of only a part of the moving image data.

The discriminator 115 specifies the category of each still image based on the category feature amounts extracted for that still image. The discriminator 115 may specify the category using a category feature amount vector whose components are a plurality of category feature amounts. For example, the discriminator 115 uses category feature amounts such as a value based on the chromaticity of the image and a value based on the luminance of the image as the vector components.

Each discriminator 115 may compare a category feature amount with a predetermined reference amount and determine, based on the difference between the two, whether the moving image data belongs to the category corresponding to that discriminator 115. The predetermined reference amount may be set for each category associated with a discriminator 115. When the discriminator 115 classifies using a category feature amount vector, it may compare that vector with a reference vector predetermined for its category to determine whether the moving image data belongs to the corresponding category. In this case, the reference vector may be a vector whose components are the reference amounts for the individual category feature amounts.

Each discriminator 115 may calculate the distance between the category feature amount vector and the reference vector and determine, based on the calculated distance, whether the data belongs to the category corresponding to that discriminator 115. For example, the discriminator 115 determines that an input still image belongs to the category associated with that discriminator 115 when the category feature amount vector lies within a predetermined distance of the reference vector.
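The distance test could look like the sketch below. The patent speaks only of a "distance"; Euclidean distance (via `math.dist`, available from Python 3.8) is an assumption here, as are the name `belongs_to_category` and the choice to return the distance alongside the decision so that it can be passed on to the specifier 117.

```python
import math


def belongs_to_category(feature_vector, reference_vector, max_distance):
    """Return (is_member, distance) for one still image and one category.

    The image is treated as a member when its feature vector lies within
    `max_distance` of the category's reference vector.
    """
    distance = math.dist(feature_vector, reference_vector)
    return distance <= max_distance, distance
```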

When the category feature amount vectors of only some of the still images forming the input moving image data lie within the predetermined distance of the reference vector, the discriminator 115 may determine that those still images belong to the category associated with that discriminator 115. When a discriminator 115 determines that data belongs to its category, it may send the determination result to the specifier 117. In addition to the determination result, each discriminator 115 may also send the distance from the reference vector to the specifier 117.

The specifier 117 specifies the category to which the input moving image data belongs based on the determination results received from the discriminators 115. For example, when the specifier 117 receives, for the plurality of still images forming the input moving image data, a determination result of membership from one discriminator 115, it specifies that the input moving image data belongs to the category associated with that discriminator 115.

When the specifier 117 receives a determination result of membership from one discriminator 115 for only some of the still images, it may assign that part of the input moving image data to the category associated with that discriminator 115. If, in this case, the specifier 117 also receives a determination result of membership from another discriminator 115 for other still images, it may divide the input moving image data into two or more regions and specify that each region belongs to the category associated with the discriminator 115 that sent the determination result for it.

When the specifier 117 receives determination results of membership from two or more discriminators 115 for the same still image, it may specify the category of that still image according to the distance from the reference vector reported by each discriminator 115. For example, the specifier 117 may choose the category associated with the discriminator 115 that reported the smallest distance from its reference vector. The specifier 117 may specify a category for each still image forming the input moving image data and from these specify the category of the moving image data. The specifier 117 sends the specified category information to the selection unit 120.
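A minimal sketch of this specifier logic follows. Picking the smallest distance per still image comes from the text; aggregating the per-image categories into one category for the whole video by majority vote is an assumption, since the patent does not state the aggregation rule. The names `frame_category` and `video_category` are hypothetical.

```python
from collections import Counter


def frame_category(results):
    """Pick the category with the smallest reported distance for one still image.

    `results` maps category name -> distance, and contains only the
    discriminators that reported membership. Returns None if it is empty.
    """
    if not results:
        return None
    return min(results, key=results.get)


def video_category(per_frame_results):
    """Majority vote over per-frame categories (assumed aggregation rule)."""
    labels = [frame_category(results) for results in per_frame_results]
    labels = [label for label in labels if label is not None]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]
```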

The selection unit 120 selects a feature scene included in the moving image data based on, among the time-series data of a plurality of feature amounts extracted from the input moving image data, the time-series data of the feature amount associated with the category to which the moving image data belongs. The selection unit 120 may select, as a feature scene, a portion of the moving image data in which the time-series data of the feature amount satisfies a predetermined condition.

When selecting a feature scene from moving image data belonging to a given category, the selection unit 120 may base the selection on time-series data of feature amounts that differ from the category feature amounts used to specify the category of the moving image data. This allows the selection unit 120 to select a feature scene of the moving image data based on features suited to the category to which the moving image data belongs. Here, a feature amount corresponding to a feature scene is called a scene feature amount. The selection unit 120 has feature detectors 125 and a feature selector 127.

A feature detector 125 is provided for each type of category and extracts one or more scene feature amounts suited to that category. The feature detector 125 may extract scene feature amounts corresponding to a predetermined feature scene.

For example, when the category specifying unit 110 specifies the moving image data as belonging to the "cherry blossom viewing" category, the feature detector 125 corresponding to that category extracts feature amounts suited to a feature scene such as "a scene in which many flowers are captured". In this case, the feature detector 125 may use "flower color", "flower shape", and the like as feature amounts. In the illustrated example, the selection unit 120 has n feature detectors 125, matching the n discriminators 115, and each outputs scene feature amounts for one of the n categories.

Like the discriminator 115, the feature detector 125 may extract scene feature amounts from each of the still images that form the input moving image. The feature detector 125 may instead extract the scene feature amounts of the still images in only a partial period of the moving image and output them as a time series.

The feature detector 125 may use information corresponding to the number of edges contained in one or more still images as a scene feature amount. The feature detector 125 may also calculate a scene feature amount by quantifying the color information of one or more still images, and may use the occurrence frequency of each color as a scene feature amount.

The feature detector 125 may also calculate a scene feature amount by quantifying the luminance information of one or more still images. As an example, the feature detector 125 may convert the color space of one still image to the HSV color space and use the energy and variance in the converted color space as scene feature amounts. The feature detector 125 may also calculate a scene feature amount based on meta information included in one or more still images.

The feature detector 125 may also extract the audio data that is output at the same time as each of one or more still images is displayed and calculate a scene feature amount from it. For example, the feature detector 125 may use the volume of the audio data output together with the display of one still image as a scene feature amount. Alternatively, the feature detector 125 may filter the audio data and use the volume of the high-frequency or low-frequency part as a scene feature amount.
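One way to turn audio into a per-frame volume series is sketched below. It assumes the audio is already available as a flat list of PCM sample values and that a fixed number of samples accompanies each video frame; measuring volume as the root mean square (RMS) is an assumption, since the patent does not define "volume". The name `frame_volumes` is hypothetical.

```python
import math


def frame_volumes(samples, samples_per_frame):
    """Return the RMS volume of the audio chunk that accompanies each frame."""
    volumes = []
    for start in range(0, len(samples), samples_per_frame):
        chunk = samples[start:start + samples_per_frame]
        volumes.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return volumes
```

Applying a high-pass or low-pass filter to `samples` before this step would give the filtered variant mentioned in the text.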

The feature detector 125 may also extract the audio data included in the moving image data, convert it to text data, and calculate a scene feature amount from the text. For example, the feature detector 125 converts what is spoken in the moving image data into text and calculates a feature amount based on the degree of matching with predetermined keywords.
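The keyword matching step might be sketched as follows, assuming the speech has already been converted to text by some separate recognizer (not shown). Counting whole-word, case-insensitive occurrences is an assumed definition of the "degree of matching"; `keyword_score` is a hypothetical name.

```python
def keyword_score(transcript, keywords):
    """Count how many times any of the keywords occurs as a word in the transcript."""
    words = transcript.lower().split()
    return sum(words.count(keyword.lower()) for keyword in keywords)
```

Computing this score over the transcript of each short time window would yield a time series comparable to the image-based scene feature amounts.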

The feature detector 125 detects a feature scene based on the scene feature amounts extracted for each still image. The feature detector 125 may detect a feature scene using a scene feature amount vector whose components are a plurality of scene feature amounts. For example, the feature detector 125 uses a value based on the change of a specific color within a specific region of the image, a value based on the change in image luminance, and the like as the vector components.

The feature detector 125 detects, as a feature scene, a portion of the moving image data in which the time-series data of the feature amount associated with the category of the moving image data satisfies a predetermined condition. For example, the feature detector 125 may detect, as a feature scene, the still image corresponding to the largest value in the time-series data of the extracted scene feature amount. Alternatively, the feature detector 125 may detect, as a feature scene, a plurality of still images within a predetermined number of frames of the largest value in the time-series data.
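Both variants, the single peak frame and the frames around the peak, can be sketched as below. Treating "within a predetermined number of frames" as a symmetric window of `radius` frames on either side of the peak is an assumption; the names `peak_frame` and `peak_window` are hypothetical.

```python
def peak_frame(series):
    """Return the index of the frame with the largest scene feature amount."""
    return max(range(len(series)), key=series.__getitem__)


def peak_window(series, radius):
    """Return the indices of the frames within `radius` frames of the peak."""
    peak = peak_frame(series)
    start = max(0, peak - radius)
    end = min(len(series), peak + radius + 1)
    return list(range(start, end))
```

Replacing `max` with `min` in `peak_frame` gives the variant that selects the smallest feature amount instead.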

Alternatively, the feature detector 125 may detect, as a feature scene, one or more still images whose value in the time-series data exceeds a predetermined reference value. If no still image exceeds the reference value, the feature detector 125 may lower the reference value and detect again, and it may keep lowering the reference value until a feature scene is detected.
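The threshold-lowering behaviour could be sketched like this. The fixed decrement `step` and the lower bound `floor` that stops the loop are assumptions added so that the sketch always terminates; the patent only says the reference value may be lowered until a feature scene is found. `frames_above` is a hypothetical name.

```python
def frames_above(series, threshold, step, floor=0.0):
    """Return (indices, threshold_used) for frames whose value exceeds the threshold.

    If no frame qualifies, the threshold is lowered by `step` and the search is
    repeated, stopping once frames are found or the threshold reaches `floor`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    while True:
        hits = [index for index, value in enumerate(series) if value > threshold]
        if hits or threshold <= floor:
            return hits, threshold
        threshold -= step
```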

Alternatively, the feature detector 125 may detect, as a feature scene, the still image corresponding to the smallest value in the time-series data of the selected scene feature amount. Alternatively, the feature detector 125 may detect, as a feature scene, a plurality of still images within a predetermined number of frames of the smallest value in the time-series data. Alternatively, the feature detector 125 may detect, as a feature scene, one or more still images whose value in the time-series data is below a predetermined reference value. The feature detector 125 sends information on the detected feature scene to the feature selector 127.

From the feature scene information received from the plurality of feature detectors 125, the feature selector 127 selects the information detected by the feature detector 125 that corresponds to the category specified by the specifier 117. When the specifier 117 has specified a plurality of categories, the feature selector 127 may select the feature scene information detected by the feature detector 125 corresponding to each specified category. The feature selector 127 sends the selected feature scene information to the extraction unit 130.
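In code, the feature selector amounts to a lookup keyed by category. The sketch below assumes every detector's output is stored in a dictionary from category name to a list of frame indices; that data layout and the name `select_feature_scenes` are assumptions for illustration.

```python
def select_feature_scenes(detected, specified_categories):
    """Keep only the detector outputs for the categories the specifier chose.

    `detected` maps category name -> feature scene info from that category's detector.
    """
    return {
        category: detected[category]
        for category in specified_categories
        if category in detected
    }
```

Because every detector runs regardless of the final category, the selection itself is cheap and can happen as soon as the category is known.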

The extraction unit 130 receives the feature scene information and extracts the feature scene included in the moving image data. The extraction unit 130 can thereby extract, from the input moving image data, a feature scene suited to the specified category. The extraction unit 130 may extract the feature scene by designating its position in the moving image data with a timestamp or with the position of the still image in the frame sequence. The extraction unit 130 may send the position of the feature scene in the moving image data to the processing unit 140.

Based on the feature scene position received from the extraction unit 130, the processing unit 140 processes the moving image data so as to embed, in the still image data at the position of the feature scene in the input moving image data, information indicating that it is a feature scene. The processing unit 140 outputs the moving image data carrying the feature scene information.

FIG. 2 shows the operation flow of the image processing apparatus 100 according to the present embodiment. The image processing apparatus 100 feeds the input moving image data to both the category specifying unit 110 and the selection unit 120 to calculate the category feature amounts and the scene feature amounts. The image processing apparatus 100 may execute the calculation of the category feature amounts and the calculation of the scene feature amounts in parallel.
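Running the two calculations side by side could be sketched with Python's standard `concurrent.futures` module, as below. The two callables stand in for the category feature and scene feature calculations and are placeholders; using a thread pool is an assumption, and the patent does not say how the parallelism is realized.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(frames, category_fn, scene_fn):
    """Run the category and scene feature calculations concurrently on the same frames."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        category_future = pool.submit(category_fn, frames)
        scene_future = pool.submit(scene_fn, frames)
        # result() blocks until each calculation has finished.
        return category_future.result(), scene_future.result()
```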

Each of the discriminators 115 of the category specifying unit 110 receives the moving image data and extracts, from the still images forming the moving image, the category feature amounts corresponding to that discriminator 115 (S500). Each discriminator 115 extracts from the still images one or more category feature amounts suited to one predetermined category, such as "cherry blossom viewing", "seaside", "festival", or "wedding".

Next, the category specifying unit 110 specifies the category of the input moving image data (S510). Based on the category feature amounts each has extracted, the discriminators 115 determine whether the still images belong to their corresponding categories. Each discriminator 115 may make this determination by comparing the category feature amounts of the still images with the reference amounts for its category.

For example, the discriminator 115 may extract category feature amounts in advance from a plurality of pieces of moving image data that should be assigned to its category, and tabulate the reference amounts and/or the range of the statistical distribution of those category feature amounts. The category of the input moving image data may then be determined based on whether its feature amounts fall within the tabulated distribution range. A discriminator 115 that has determined that an input still image belongs to its category sends the determination result to the specifier 117.
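Tabulating a distribution range from example videos might look like the following sketch. Defining the range as the mean plus or minus `k` population standard deviations is an assumption; the patent says only that a range of the statistical distribution is tabulated. The names `fit_range` and `in_range` are hypothetical.

```python
import statistics


def fit_range(training_values, k=2.0):
    """Return (low, high) bounds as mean +/- k standard deviations of the training values."""
    mean = statistics.fmean(training_values)
    spread = statistics.pstdev(training_values)
    return mean - k * spread, mean + k * spread


def in_range(value, bounds):
    """Check whether a feature amount falls inside the tabulated range."""
    low, high = bounds
    return low <= value <= high
```

For a feature amount vector, the same check would be applied per component, or replaced by a distance test against a reference vector built from the means.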

特定器１１７は、識別器１１５から受け取った判別結果に基づき、入力された動画像データのカテゴリを特定する。特定器１１７は、特定したカテゴリの情報を、特徴選択器に送る。 The identifier 117 identifies the category of the input moving image data based on the determination results received from the classifiers 115. The identifier 117 sends information on the identified category to the feature selector.

一方、選択部１２０は、動画像データに含まれる特徴シーンを選択する。選択部１２０の複数の特徴検出器１２５は、動画像データをそれぞれ入力して、動画を形成する複数の静止画像データから、複数の特徴検出器１２５に対応する複数のシーン特徴量をそれぞれ抽出する（Ｓ５２０）。複数の特徴検出器１２５のそれぞれは、複数の静止画像データから「花見」、「海辺」、「祭り」、「結婚式」等のカテゴリ特定部１１０が特定する予め定められた１つのカテゴリに応じた１以上のシーン特徴量を抽出する。 On the other hand, the selection unit 120 selects a feature scene included in the moving image data. The plurality of feature detectors 125 of the selection unit 120 respectively input moving image data, and extract a plurality of scene feature amounts corresponding to the plurality of feature detectors 125 from a plurality of still image data forming a moving image. (S520). Each of the plurality of feature detectors 125 corresponds to one predetermined category specified by the category specifying unit 110 such as “cherry blossom viewing”, “seaside”, “festival”, “wedding”, etc. from a plurality of still image data. One or more scene feature amounts are extracted.

次に、複数の特徴検出器１２５それぞれは、入力された動画像データの特徴シーンを選択すべく、複数のシーン特徴量の時系列データから特徴シーンをそれぞれ検出する。複数の特徴検出器１２５は、それぞれ検出した特徴シーンを特徴選択器１２７にそれぞれ送る。 Next, each of the plurality of feature detectors 125 detects a feature scene from time-series data of a plurality of scene feature amounts in order to select a feature scene of the input moving image data. The plurality of feature detectors 125 send the detected feature scenes to the feature selector 127, respectively.

特徴選択器１２７は、複数の特徴検出器１２５から受け取る複数の特徴シーンの情報のうち、特定器１１７が特定したカテゴリに対応する特徴シーンを検出する特徴検出器１２５が検出した特徴シーンの情報を選択する（Ｓ５３０）。例えば、入力した動画を特定器１１７が「海辺」のカテゴリに特定した場合、特徴選択器１２７は、入力した動画から「海辺」のシーン特徴量を抽出して「海辺」の特徴シーンを検出する特徴検出器１２５から受け取る特徴シーンの情報を選択する。特徴選択器１２７は、選択した特徴シーンの情報を抽出部１３０に送る。 From among the information on the plurality of feature scenes received from the plurality of feature detectors 125, the feature selector 127 selects the information on the feature scene detected by the feature detector 125 that detects feature scenes corresponding to the category identified by the identifier 117 (S530). For example, when the identifier 117 identifies the input moving image as belonging to the “seaside” category, the feature selector 127 selects the feature scene information received from the feature detector 125 that extracts “seaside” scene feature amounts from the input moving image and detects “seaside” feature scenes. The feature selector 127 sends the information on the selected feature scene to the extraction unit 130.
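
As a rough sketch of steps S520 and S530, the following shows every detector producing a result and the selector then keeping only the one for the identified category. The function names, the dictionary layout, and the use of the peak frame as the "feature scene" are assumptions made for illustration; the patent does not prescribe a data structure.

```python
def peak_frame(series):
    """Index of the frame with the largest scene feature amount."""
    return max(range(len(series)), key=lambda i: series[i])

def run_all_detectors(scene_series_by_category):
    """Stand-in for the feature detectors 125: each one processes the video (S520)."""
    return {cat: peak_frame(series) for cat, series in scene_series_by_category.items()}

def feature_selector(identified_category, detected):
    """Stand-in for the feature selector 127: keep the matching result only (S530)."""
    return detected[identified_category]

# Hypothetical scene feature time series, one per category-specific detector.
series = {
    "seaside": [0.2, 0.4, 0.9, 0.3],
    "wedding": [0.1, 0.1, 0.2, 0.1],
    "festival": [0.5, 0.3, 0.2, 0.1],
}
detected = run_all_detectors(series)
print(feature_selector("seaside", detected))
```

In this arrangement the detectors do not need to wait for the category result, at the cost of running detectors whose output is later discarded.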

抽出部１３０は、特徴シーンの情報を受け取り、入力された動画像データに含まれる特徴シーンを抽出する（Ｓ５４０）。抽出部１３０は、動画像データに含まれる特徴シーンの位置の情報を加工部１４０に送り、加工部１４０は、入力された動画像データに特徴シーンの情報を埋め込む。以上の本実施形態に係る画像処理装置１００の動作フローによって、画像処理装置１００は、動画像データの特徴シーンを抽出することができる。 The extraction unit 130 receives the feature scene information and extracts the feature scene included in the input moving image data (S540). The extraction unit 130 sends the information on the position of the feature scene included in the moving image data to the processing unit 140, and the processing unit 140 embeds the information on the feature scene in the input moving image data. With the operation flow of the image processing apparatus 100 according to the present embodiment, the image processing apparatus 100 can extract a feature scene of moving image data.

即ち、画像処理装置１００は、例えば、スポーツを撮影した動画像データの場合、観客の盛り上がりおよび選手の動き等を特徴量として特徴シーンを抽出すること、ペットを撮影した動画像データの場合は、飼い主の笑顔およびペットの仕草等を特徴量として特徴シーンを抽出すること等、カテゴリに応じた特徴量を使ってシーン群の中から特徴シーンを抽出することができる。したがって、画像処理装置１００は、動画像データから特徴的なシーンを自動的に選択することができる。 That is, for example, in the case of moving image data capturing sports, the image processing apparatus 100 extracts feature scenes using feature amounts such as spectator excitement and player movement, and in the case of moving image data capturing a pet, A feature scene can be extracted from a scene group using a feature amount corresponding to a category, such as extracting a feature scene using a smile of the owner and a pet gesture as a feature amount. Therefore, the image processing apparatus 100 can automatically select a characteristic scene from the moving image data.

ここで、抽出部１３０は、選択された特徴シーンを代表する静止画像を生成する静止画生成部であってもよい。例えば、抽出部１３０は、選択された特徴シーンが複数の静止画像の場合に、複数の静止画像のうちの１つの静止画像を抽出する。この場合、抽出部１３０は、複数の静止画像のうち、最も大きな特徴量または小さな特徴量に対応する静止画像を抽出してよい。抽出部１３０は、入力する動画像データが複数の場合、複数の動画像データに対応する特徴シーンを代表する静止画像を生成してよい。これによって、抽出部１３０は、複数の動画像データを代表する静止画像で、サムネイル表示することができる。 Here, the extraction unit 130 may be a still image generation unit that generates a still image representing the selected feature scene. For example, when the selected feature scene is a plurality of still images, the extraction unit 130 extracts one still image from the plurality of still images. In this case, the extraction unit 130 may extract a still image corresponding to the largest feature amount or the smallest feature amount among the plurality of still images. When there are a plurality of pieces of moving image data to be input, the extraction unit 130 may generate a still image that represents a feature scene corresponding to the plurality of pieces of moving image data. Accordingly, the extraction unit 130 can display thumbnails as still images representing a plurality of moving image data.

以上の実施例において、選択部１２０は、入力された動画像データに対して１つの特徴シーンを選択する例を説明したが、これに代えて選択部１２０は、複数の特徴シーンを選択してもよい。例えば、特徴選択器１２７は、シーン特徴量の時系列データの最大値に加えて、さらに極値を示す静止画像データを特徴シーンとして選択する。 In the above embodiment, an example was described in which the selection unit 120 selects one feature scene for the input moving image data; alternatively, the selection unit 120 may select a plurality of feature scenes. For example, the feature selector 127 selects, as feature scenes, the still image data at which the time-series data of the scene feature amount takes its maximum value and, in addition, the still image data at which it takes extreme values.

選択部１２０は、複数の特徴シーンを選択した場合、複数の特徴シーンのそれぞれに対してシーン特徴量に基づきランクをつけてよい。これによって、画像処理装置１００は、入力された動画像データから複数の特徴シーンをランク付けしつつ抽出することができる。抽出部１３０は、ランキングに応じたサムネイル表示をしてもよい。これによって、動画像データの複数の特徴的なシーンを自動的に選択してランキング表示することができる。 When selecting a plurality of feature scenes, the selection unit 120 may rank each of the plurality of feature scenes based on the scene feature amount. As a result, the image processing apparatus 100 can extract a plurality of feature scenes from the input moving image data while ranking them. The extraction unit 130 may display thumbnails according to the ranking. As a result, a plurality of characteristic scenes of moving image data can be automatically selected and displayed in ranking.
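
A minimal sketch of selecting and ranking several feature scenes is shown below. It assumes that "extreme values" means interior local maxima of the scene feature time series and that the rank is simply the feature amount in descending order; both are illustrative choices, and the helper names are hypothetical.

```python
def local_maxima(series):
    """Indices whose value is strictly greater than both neighbours.

    Assumption: endpoints are ignored and only maxima (not minima) are used.
    """
    return [
        i for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    ]

def ranked_feature_scenes(series, top_n=3):
    """Return up to top_n peak frame indices, highest scene feature amount first."""
    peaks = local_maxima(series)
    peaks.sort(key=lambda i: series[i], reverse=True)
    return peaks[:top_n]

# Hypothetical scene feature amounts for eight frames.
series = [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.7, 0.6]
print(ranked_feature_scenes(series))  # frame indices in ranking order
```

The returned order can be used directly as the thumbnail ranking order.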

図３は、本実施形態に係る動画像データ２１０の一例を示す。動画像データ２１０は、時系列に並んだ複数の静止画像データ２１５から形成されてよい。図中の例において、動画像データ２１０は、２４の静止画像データ２１５から形成されている。 FIG. 3 shows an example of the moving image data 210 according to the present embodiment. The moving image data 210 may be formed from a plurality of still image data 215 arranged in time series. In the example in the drawing, the moving image data 210 is formed from 24 still image data 215.

図４は、本実施形態に係る動画像データ２１０に対応する、カテゴリ特徴量の時系列データの一例を示す。図中の例は、３つの動画像データ２１０に対応する、カテゴリ特徴量の時系列データ３０２、時系列データ３０４、および時系列データ３０６を示す。動画像データの途中から異なるカテゴリの動画に場面展開されること等がなく、同一のカテゴリの映像が撮像されている場合、当該動画像データ２１０を形成する静止画像データ２１５であれば、カテゴリ特徴量は図中の例のように略一定の範囲内の値となる。 FIG. 4 shows an example of time-series data of category feature amounts corresponding to the moving image data 210 according to the present embodiment. The example in the figure shows time-series data 302, 304, and 306 of category feature amounts, corresponding to three pieces of moving image data 210. When video of a single category is captured throughout, with no change of scene to a different category partway through the moving image data, the category feature amounts of the still image data 215 forming that moving image data 210 stay within a substantially constant range, as in the example in the figure.

「海辺」のカテゴリを特定する識別器１１５は、予め定められた「海辺」の基準ベクトルとの距離によって、時系列データが「海辺」のカテゴリに属するか否かを特定してよい。例えば、「海辺」の動画像データであれば、海の色、砂浜の色、海岸を形成する岩の色等でカテゴリ特徴量を抽出する。ここで、予め、「海辺」にカテゴリ分けされるべき複数の映像からカテゴリ特徴量を抽出して、基準量およびカテゴリ特徴量の統計的な分布の範囲を集計しておいてよい。このように抽出したカテゴリ特徴量の分布から、予め定められた基準値の範囲内に含まれる時系列データを、「海辺」のカテゴリに特定してよい。 The classifier 115 that identifies the “seaside” category may determine whether or not time-series data belongs to the “seaside” category based on its distance from a predetermined “seaside” reference vector. For example, for “seaside” moving image data, category feature amounts are extracted from the color of the sea, the color of the sandy beach, the color of the rocks forming the coast, and the like. Here, category feature amounts may be extracted in advance from a plurality of videos that should be categorized as “seaside”, and a reference amount and the range of the statistical distribution of the category feature amounts may be compiled. Based on the distribution of category feature amounts extracted in this way, time-series data falling within a predetermined range of reference values may be identified as the “seaside” category.

ここで、予め、画像処理装置１００の製造元において、多くの「海辺」の動画像データから、機械学習または人手による最適化等によりカテゴリ特徴量とする特徴量を選択し、カテゴリ基準ベクトルを設定してもよい。また、画像処理装置１００が選択した特徴シーンおよびそのカテゴリを利用者が変更したこと応じて、学習によりユーザー最適化してもよい。 Here, the manufacturer of the image processing apparatus 100 may, in advance, select the feature amounts to be used as category feature amounts from a large amount of “seaside” moving image data by machine learning, manual optimization, or the like, and set the category reference vector. In addition, the apparatus may be optimized for the user by learning, in response to the user changing a feature scene selected by the image processing apparatus 100 and its category.

図中の例が、このようなカテゴリ特徴量で抽出する識別器１１５の抽出結果の場合、基準ベクトルに近い距離の範囲内に含まれる時系列データ３０２および３０４を、識別器１１５は、「海辺」のカテゴリに特定してよい。ここで、それぞれの時系列データの距離は静止画像データの数だけあるので、識別器１１５は、距離の平均値を用いて特定してよい。図中の例において、カテゴリ特徴量の時系列データは、動画像データ２１０を形成する静止画像データ２１５と同数のデータを抽出する例を説明したが、これに代えて、一定間隔で間引いてカテゴリ特徴量を抽出してもよい。これによって、カテゴリ特定部１１０は、カテゴリを特定する処理を高速に実行することができる。 When the example in the figure is the extraction result of a classifier 115 that extracts such category feature amounts, the classifier 115 may identify time-series data 302 and 304, which fall within a range of distances close to the reference vector, as the “seaside” category. Here, since each piece of time-series data yields as many distances as there are still image data, the classifier 115 may use the average of the distances for the identification. In the example in the figure, the time-series data of the category feature amount contains the same number of data points as the still image data 215 forming the moving image data 210; alternatively, category feature amounts may be extracted from frames thinned out at regular intervals. This allows the category specifying unit 110 to execute the category identification process faster.
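
The averaged-distance test and the frame thinning described above can be sketched as follows. The three-component colour vectors, the Euclidean metric, the threshold value, and the function names are assumptions for illustration; the patent specifies only a distance to a reference vector, averaging over frames, and optional sampling at regular intervals.

```python
import math

def mean_distance(frames, reference, step=1):
    """Average Euclidean distance from sampled frame vectors to the reference vector.

    step > 1 thins out the frames at a regular interval to reduce the work.
    """
    sampled = frames[::step]
    return sum(math.dist(f, reference) for f in sampled) / len(sampled)

def is_category(frames, reference, threshold, step=1):
    """True if the averaged distance is within the (hypothetical) threshold."""
    return mean_distance(frames, reference, step) <= threshold

# Hypothetical category feature vectors: (sea colour, sand colour, rock colour).
seaside_reference = (0.2, 0.5, 0.8)
seaside_video = [(0.2, 0.5, 0.8), (0.2, 0.5, 0.7), (0.3, 0.5, 0.8), (0.2, 0.6, 0.8)]
hanami_video = [(0.9, 0.6, 0.7)] * 4

print(is_category(seaside_video, seaside_reference, threshold=0.2))
print(is_category(hanami_video, seaside_reference, threshold=0.2))
print(is_category(seaside_video, seaside_reference, threshold=0.2, step=2))
```

Thinning with `step=2` halves the number of distance computations here while leaving the decision unchanged for this data.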

また、例えば、「花見」等のカテゴリであれば、「海辺」のカテゴリと同様に、動画像データに発生する色に着目して「花見」のカテゴリ特徴量を抽出することもある。しかしながら、「海辺」のカテゴリを特定する識別器１１５は、「海辺」の動画に発生する色等を特徴量として抽出するので、「花見」等の動画像データは、例えば、時系列データ３０６のように、「海辺」とは特定されない距離に分布する。これによって、識別器１１５は、カテゴリ特徴量の時系列データから予め定められたカテゴリを特定することができる。 Also, for a category such as “cherry blossom viewing”, the category feature amounts may, as with the “seaside” category, be extracted by focusing on the colors that appear in the moving image data. However, because the classifier 115 that identifies the “seaside” category extracts as feature amounts the colors and the like that appear in “seaside” videos, moving image data such as “cherry blossom viewing” is distributed, like time-series data 306 for example, at distances that are not identified as “seaside”. In this way, the classifier 115 can identify its predetermined category from the time-series data of category feature amounts.

図５は、本実施形態に係る動画像データ２１０に対応する、シーン特徴量の時系列データの一例を示す。時系列データ４０２、および時系列データ４０４は、図４の時系列データ３０２および時系列データ３０４に対応するシーン特徴量の時系列データを示す。シーン特徴量は、動画像データの中から特徴的なシーンを抽出するものなので、カテゴリ特徴量とは異なり時系列で略一定の値にはならない。 FIG. 5 shows an example of time-series data of scene feature amounts corresponding to the moving image data 210 according to the present embodiment. The time series data 402 and the time series data 404 indicate time series data of scene feature amounts corresponding to the time series data 302 and the time series data 304 in FIG. Since the scene feature amount extracts a characteristic scene from the moving image data, the scene feature amount does not have a substantially constant value in time series unlike the category feature amount.

ここで、シーン特徴量の時系列データは、カテゴリ特定部１１０が特定したカテゴリに対応づけられたシーン特徴量を抽出する特徴検出器１２５の抽出結果である。このような特徴検出器１２５は、例えば、「海辺」の特徴シーンを検出すべく、海の色が発生する面積、砂浜の色が発生する面積、岩の色が発生する面積、波の音の音量、子供の顔認識、子供の声の音量等のシーン特徴量を抽出する。ここで、予め、画像処理装置１００の製造元において、多くの「海辺」の動画像データから、機械学習または人手による最適化等によりシーン特徴量を選択してよい。 Here, the time-series data of scene feature amounts is the extraction result of the feature detector 125 that extracts the scene feature amounts associated with the category specified by the category specifying unit 110. To detect “seaside” feature scenes, for example, such a feature detector 125 extracts scene feature amounts such as the area in which the color of the sea appears, the area in which the color of the sandy beach appears, the area in which the color of the rocks appears, the volume of the sound of the waves, recognition of a child's face, and the volume of a child's voice. Here, the manufacturer of the image processing apparatus 100 may, in advance, select the scene feature amounts from a large amount of “seaside” moving image data by machine learning, manual optimization, or the like.

例えば、海辺の風景を撮像した場合、海の色が発生する面積、砂浜の色が発生する面積、および岩の色が発生する面積が変わらない場合があり、特徴検出器１２５は、抽出する特徴量としての数値は大きくても変化があまり見られない時系列データを検出する。そこで、例えば、特徴検出器１２５は、シーン特徴量として波の音の音量の特徴量を加えることで、図中の時系列データ４０２のように、時系列データが変化して特徴シーンを検出することができる。 For example, when a seaside landscape is captured, the area in which the color of the sea appears, the area in which the color of the sandy beach appears, and the area in which the color of the rocks appears may not change; in that case the feature detector 125 detects time-series data whose values are large but show little variation. Therefore, for example, by adding the volume of the sound of the waves as a scene feature amount, the feature detector 125 obtains time-series data that varies, like time-series data 402 in the figure, and can detect a feature scene.

ここで、波の音が大きくても、海が撮像されていなければ海の色が発生する面積による特徴量が減少するので、このようなシーンは抽出されない。したがって、特徴検出器１２５は、海が撮像され、かつ、大きな波が海辺に打ちつけられるダイナミックなシーンの静止画像データＡを、特徴シーンとして抽出することができる。また、静止画像データＡと、当該データの前後の予め定められた数のフレームとを特徴シーンとしてもよい。これによって、大きな波が海岸に打ち付けられるシーンを特徴シーンとして抽出することができる。 Here, even if the sound of the waves is loud, if the sea is not imaged, the feature amount due to the area where the color of the sea is generated decreases, so such a scene is not extracted. Therefore, the feature detector 125 can extract still image data A of a dynamic scene in which the sea is imaged and a large wave is hit against the seaside as a feature scene. Still image data A and a predetermined number of frames before and after the data may be used as the feature scene. As a result, a scene in which a large wave is hit against the coast can be extracted as a feature scene.
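
The interplay between the sea-color area and the wave volume can be illustrated with the sketch below. It assumes the two feature amounts are combined by multiplication, so that a loud frame without any sea scores zero; the patent does not state the combining rule, and `scene_score`, `feature_scene`, and `margin` are hypothetical names.

```python
def scene_score(sea_area, wave_volume):
    """Combined scene feature amount (assumed: product of the two components)."""
    return sea_area * wave_volume

def feature_scene(sea_areas, wave_volumes, margin=1):
    """Return the peak frame index and the (start, end) range around it.

    margin is the predetermined number of frames kept before and after the peak.
    """
    scores = [scene_score(a, v) for a, v in zip(sea_areas, wave_volumes)]
    peak = max(range(len(scores)), key=scores.__getitem__)
    start = max(0, peak - margin)
    end = min(len(scores) - 1, peak + margin)
    return peak, (start, end)

# Hypothetical per-frame values; frame 3 is the loudest but shows no sea.
sea_areas = [0.6, 0.6, 0.6, 0.0, 0.6]
wave_volumes = [0.2, 0.3, 0.8, 0.9, 0.2]
print(feature_scene(sea_areas, wave_volumes))
```

With these values the loudest frame is skipped and the frame where a large wave sound coincides with visible sea is chosen, which mirrors the behaviour described for still image data A.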

また、同じ「海辺」のカテゴリに特定された動画像データであっても、子供の顔を主に撮像した場合、海の色に着目したシーン特徴量は低く算出される。この場合であっても、特徴検出器１２５は、子供の顔認識、子供の声の音量等を含めてシーン特徴量を抽出することで、図中の時系列データ４０４にように、撮像された子供の特徴で特徴量が変化して特徴シーンを検出することができる。これによって、例えば海辺で遊ぶ子供が声を出して笑っているシーンである静止画像データＢを特徴シーンとして抽出することができる。 Also, even for moving image data identified as the same “seaside” category, when mainly a child's face is captured, scene feature amounts that focus on the color of the sea are calculated to be low. Even in this case, by extracting scene feature amounts that include recognition of the child's face, the volume of the child's voice, and the like, the feature detector 125 obtains feature amounts that vary with the features of the captured child, like time-series data 404 in the figure, and can detect a feature scene. As a result, still image data B, which is for example a scene in which a child playing at the seaside is laughing out loud, can be extracted as a feature scene.

図６は、本実施形態に係る画像処理装置１００の変形例を示す。本変形例の画像処理装置１００において、図１に示された本実施形態に係る画像処理装置１００の動作と略同一のものには同一の符号を付け、説明を省略する。画像処理装置１００は、入力された動画像データのカテゴリを特定した後に、特定したカテゴリに応じたシーン特徴量を抽出して特徴シーンを選択する。 FIG. 6 shows a modification of the image processing apparatus 100 according to the present embodiment. In the image processing apparatus 100 of this modification, elements whose operation is substantially the same as in the image processing apparatus 100 according to the present embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted. The image processing apparatus 100 specifies the category of the input moving image data and then extracts scene feature amounts corresponding to the specified category to select a feature scene.

カテゴリ特定部１１０の特定器１１７は、入力された動画像データのカテゴリを特定する。特定器１１７は、特定したカテゴリの情報と入力された動画像データとを選択部１２０の特徴選択器１２７に送る。特徴選択器１２７は、受け取ったカテゴリに応じたシーン特徴量を抽出する特徴検出器１２５を選択して、受け取った動画像データを入力させる。選択された特徴検出器１２５は、入力された動画像データからシーン特徴量の時系列データを抽出し、特徴シーンを検出する。特徴検出器１２５は、検出した特徴シーンの情報と入力された動画像データとを抽出部１３０に送る。 The identifier 117 of the category specifying unit 110 identifies the category of the input moving image data. The identifier 117 sends the information on the identified category and the input moving image data to the feature selector 127 of the selection unit 120. The feature selector 127 selects the feature detector 125 that extracts scene feature amounts corresponding to the received category and inputs the received moving image data to it. The selected feature detector 125 extracts time-series data of scene feature amounts from the input moving image data and detects a feature scene. The feature detector 125 sends the information on the detected feature scene and the input moving image data to the extraction unit 130.

抽出部１３０は、受け取った動画像データと特徴シーンの情報から、特徴シーンを抽出する。また、抽出部１３０は、受け取った動画像データから特徴シーンを抽出してサムネイル表示してよい。抽出部１３０は、入力された動画像データと特徴シーンの情報とを加工部１４０に送る。加工部１４０は、受け取った特徴シーンの情報を、受け取った動画像データに加工して埋め込む。加工部１４０は、加工した動画像データを出力する。 The extraction unit 130 extracts feature scenes from the received moving image data and feature scene information. The extraction unit 130 may extract feature scenes from the received moving image data and display them as thumbnails. The extraction unit 130 sends the input moving image data and feature scene information to the processing unit 140. The processing unit 140 processes and embeds the received feature scene information into the received moving image data. The processing unit 140 outputs the processed moving image data.

以上の本実施形態に係る画像処理装置１００の変形例によって、入力された動画像データの特徴シーンを選択することができる。また、本変形例は、入力された動画像データのカテゴリを特定してからシーン特徴量を抽出するので、特定したカテゴリに応じた特徴検出器１２５を適切に選択できるので、特定したカテゴリとは無関係の特徴検出器１２５を動作させることなく、特徴シーンを選択することができる。 The feature scene of the input moving image data can be selected by the modification of the image processing apparatus 100 according to the present embodiment. In addition, in this modified example, the scene feature amount is extracted after specifying the category of the input moving image data. Therefore, the feature detector 125 corresponding to the specified category can be appropriately selected. A feature scene can be selected without operating an irrelevant feature detector 125.
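
The benefit of this modification, running only the detector that matches the identified category, can be sketched as follows. The detector table, the `calls` log used to show which detectors actually ran, and the per-frame dictionaries are all hypothetical constructs for illustration.

```python
calls = []  # records which detectors actually ran (for demonstration only)

def make_detector(name):
    """Build a stand-in feature detector 125 for one category."""
    def detect(frames):
        calls.append(name)
        # Feature scene = frame with the largest scene feature amount for this category.
        return max(range(len(frames)), key=lambda i: frames[i][name])
    return detect

detectors = {name: make_detector(name) for name in ("seaside", "wedding")}

def select_scene(category, frames):
    """Stand-in for the feature selector 127 in the modification:
    dispatch the video to the one detector matching the identified category."""
    return detectors[category](frames)

# Hypothetical per-frame scene feature amounts for each category.
frames = [
    {"seaside": 0.2, "wedding": 0.9},
    {"seaside": 0.8, "wedding": 0.1},
]
print(select_scene("seaside", frames), calls)
```

Only the "seaside" detector appears in `calls`, so the work of the unrelated detectors is avoided; the trade-off is that detection cannot start until the category has been identified.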

以上の本実施形態にかかる画像処理装置１００は、動画像データが入力されて、特徴シーンを選択することを例に説明した。さらに、この画像処理装置１００は、撮像システムに搭載されてよい。撮像システムは、被写体を撮像した動画像データを生成する撮像部と、撮像部により撮像された動画像データから特徴シーンを選択する画像処理装置１００と、撮像部により撮像された動画像データに対応付けて、画像処理装置により選択された特徴シーンを表す情報を記録媒体に記録する記録部と、を備えてよい。これによって、撮像システムは、被写体を撮像しつつ、撮像した動画像データの特徴シーンを選択することができる。 The image processing apparatus 100 according to the present embodiment has been described by way of example in which moving image data is input and a feature scene is selected. Furthermore, the image processing apparatus 100 may be mounted on an imaging system. The imaging system corresponds to an imaging unit that generates moving image data obtained by imaging a subject, an image processing device 100 that selects a feature scene from moving image data captured by the imaging unit, and moving image data captured by the imaging unit. In addition, a recording unit that records information representing the feature scene selected by the image processing apparatus on a recording medium may be provided. Thus, the imaging system can select a feature scene of the captured moving image data while imaging the subject.

図７は、本実施形態に係るコンピュータ１９００のハードウェア構成の一例を示す。本実施形態に係るコンピュータ１９００は、ホスト・コントローラ２０８２により相互に接続されるＣＰＵ２０００、ＲＡＭ２０２０、グラフィック・コントローラ２０７５、及び表示装置２０８０を有するＣＰＵ周辺部と、入出力コントローラ２０８４によりホスト・コントローラ２０８２に接続される通信インターフェイス２０３０、ハードディスクドライブ２０４０、及びＤＶＤドライブ２０６０を有する入出力部と、入出力コントローラ２０８４に接続されるＲＯＭ２０１０、フレキシブルディスク・ドライブ２０５０、及び入出力チップ２０７０を有するレガシー入出力部と、を備える。 FIG. 7 shows an example of the hardware configuration of a computer 1900 according to the present embodiment. The computer 1900 according to the present embodiment includes: a CPU peripheral section having a CPU 2000, a RAM 2020, a graphic controller 2075, and a display device 2080, which are interconnected by a host controller 2082; an input/output section having a communication interface 2030, a hard disk drive 2040, and a DVD drive 2060, which are connected to the host controller 2082 by an input/output controller 2084; and a legacy input/output section having a ROM 2010, a flexible disk drive 2050, and an input/output chip 2070, which are connected to the input/output controller 2084.

ホスト・コントローラ２０８２は、ＲＡＭ２０２０と、高い転送レートでＲＡＭ２０２０をアクセスするＣＰＵ２０００及びグラフィック・コントローラ２０７５とを接続する。ＣＰＵ２０００は、ＲＯＭ２０１０及びＲＡＭ２０２０に格納されたプログラムに基づいて動作し、各部の制御を行う。グラフィック・コントローラ２０７５は、ＣＰＵ２０００等がＲＡＭ２０２０内に設けたフレーム・バッファ上に生成する画像データを取得し、表示装置２０８０上に表示させる。これに代えて、グラフィック・コントローラ２０７５は、ＣＰＵ２０００等が生成する画像データを格納するフレーム・バッファを、内部に含んでもよい。 The host controller 2082 connects the RAM 2020 to the CPU 2000 and the graphic controller 2075 that access the RAM 2020 at a high transfer rate. The CPU 2000 operates based on programs stored in the ROM 2010 and the RAM 2020 and controls each unit. The graphic controller 2075 acquires image data generated by the CPU 2000 or the like on a frame buffer provided in the RAM 2020 and displays it on the display device 2080. Instead of this, the graphic controller 2075 may include a frame buffer for storing image data generated by the CPU 2000 or the like.

入出力コントローラ２０８４は、ホスト・コントローラ２０８２と、比較的高速な入出力装置である通信インターフェイス２０３０、ハードディスクドライブ２０４０、ＤＶＤドライブ２０６０を接続する。通信インターフェイス２０３０は、ネットワークを介して他の装置と通信する。ハードディスクドライブ２０４０は、コンピュータ１９００内のＣＰＵ２０００が使用するプログラム及びデータを格納する。ＤＶＤドライブ２０６０は、ＤＶＤ−ＲＯＭ２０９５からプログラム又はデータを読み取り、ＲＡＭ２０２０を介してハードディスクドライブ２０４０に提供する。 The input / output controller 2084 connects the host controller 2082 to the communication interface 2030, the hard disk drive 2040, and the DVD drive 2060, which are relatively high-speed input / output devices. The communication interface 2030 communicates with other devices via a network. The hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900. The DVD drive 2060 reads a program or data from the DVD-ROM 2095 and provides it to the hard disk drive 2040 via the RAM 2020.

また、入出力コントローラ２０８４には、ＲＯＭ２０１０と、フレキシブルディスク・ドライブ２０５０、及び入出力チップ２０７０の比較的低速な入出力装置とが接続される。ＲＯＭ２０１０は、コンピュータ１９００が起動時に実行するブート・プログラム、及び／又は、コンピュータ１９００のハードウェアに依存するプログラム等を格納する。フレキシブルディスク・ドライブ２０５０は、フレキシブルディスク２０９０からプログラム又はデータを読み取り、ＲＡＭ２０２０を介してハードディスクドライブ２０４０に提供する。入出力チップ２０７０は、フレキシブルディスク・ドライブ２０５０を入出力コントローラ２０８４へと接続すると共に、例えばパラレル・ポート、シリアル・ポート、キーボード・ポート、マウス・ポート等を介して各種の入出力装置を入出力コントローラ２０８４へと接続する。 The input / output controller 2084 is connected to the ROM 2010, the flexible disk drive 2050, and the relatively low-speed input / output device of the input / output chip 2070. The ROM 2010 stores a boot program that the computer 1900 executes at startup and / or a program that depends on the hardware of the computer 1900. The flexible disk drive 2050 reads a program or data from the flexible disk 2090 and provides it to the hard disk drive 2040 via the RAM 2020. The input / output chip 2070 connects the flexible disk drive 2050 to the input / output controller 2084 and inputs / outputs various input / output devices via, for example, a parallel port, a serial port, a keyboard port, a mouse port, and the like. Connect to controller 2084.

ＲＡＭ２０２０を介してハードディスクドライブ２０４０に提供されるプログラムは、フレキシブルディスク２０９０、ＤＶＤ−ＲＯＭ２０９５、又はＩＣカード等の記録媒体に格納されて利用者によって提供される。プログラムは、記録媒体から読み出され、ＲＡＭ２０２０を介してコンピュータ１９００内のハードディスクドライブ２０４０にインストールされ、ＣＰＵ２０００において実行される。 A program provided to the hard disk drive 2040 via the RAM 2020 is stored in a recording medium such as the flexible disk 2090, the DVD-ROM 2095, or an IC card and provided by the user. The program is read from the recording medium, installed in the hard disk drive 2040 in the computer 1900 via the RAM 2020, and executed by the CPU 2000.

コンピュータ１９００にインストールされ、コンピュータ１９００を画像処理装置１００として機能させるプログラムは、カテゴリ特定モジュールと、選択モジュールと、抽出モジュールと、加工モジュールと、を備える。これらのプログラム又はモジュールは、ＣＰＵ２０００等に働きかけて、コンピュータ１９００を、カテゴリ特定部１１０、選択部１２０、抽出部１３０、加工部１４０としてそれぞれ機能させる。 A program installed on the computer 1900 and causing the computer 1900 to function as the image processing apparatus 100 includes a category specifying module, a selection module, an extraction module, and a processing module. These programs or modules work on the CPU 2000 or the like to cause the computer 1900 to function as the category specifying unit 110, the selection unit 120, the extraction unit 130, and the processing unit 140, respectively.

これらのプログラムに記述された情報処理は、コンピュータ１９００に読込まれることにより、ソフトウェアと上述した各種のハードウェア資源とが協働した具体的手段であるカテゴリ特定部１１０、選択部１２０、抽出部１３０、加工部１４０として機能する。そして、これらの具体的手段によって、本実施形態におけるコンピュータ１９００の使用目的に応じた情報の演算又は加工を実現することにより、使用目的に応じた特有の画像処理装置１００が構築される。 The information processing described in these programs, when read into the computer 1900, functions as the category specifying unit 110, the selection unit 120, the extraction unit 130, and the processing unit 140, which are concrete means in which the software and the various hardware resources described above cooperate. By using these concrete means to compute or process information in accordance with the intended use of the computer 1900 in the present embodiment, a distinctive image processing apparatus 100 suited to that intended use is constructed.

一例として、コンピュータ１９００と外部の装置等との間で通信を行う場合には、ＣＰＵ２０００は、ＲＡＭ２０２０上にロードされた通信プログラムを実行し、通信プログラムに記述された処理内容に基づいて、通信インターフェイス２０３０に対して通信処理を指示する。通信インターフェイス２０３０は、ＣＰＵ２０００の制御を受けて、ＲＡＭ２０２０、ハードディスクドライブ２０４０、フレキシブルディスク２０９０、又はＤＶＤ−ＲＯＭ２０９５等の記憶装置上に設けた送信バッファ領域等に記憶された送信データを読み出してネットワークへと送信し、もしくは、ネットワークから受信した受信データを記憶装置上に設けた受信バッファ領域等へと書き込む。このように、通信インターフェイス２０３０は、ＤＭＡ（ダイレクト・メモリ・アクセス）方式により記憶装置との間で送受信データを転送してもよく、これに代えて、ＣＰＵ２０００が転送元の記憶装置又は通信インターフェイス２０３０からデータを読み出し、転送先の通信インターフェイス２０３０又は記憶装置へとデータを書き込むことにより送受信データを転送してもよい。 As an example, when communication is performed between the computer 1900 and an external device or the like, the CPU 2000 executes a communication program loaded on the RAM 2020 and, based on the processing described in the communication program, instructs the communication interface 2030 to perform communication processing. Under the control of the CPU 2000, the communication interface 2030 reads transmission data stored in a transmission buffer area or the like provided on a storage device such as the RAM 2020, the hard disk drive 2040, the flexible disk 2090, or the DVD-ROM 2095 and transmits it to the network, or writes reception data received from the network to a reception buffer area or the like provided on the storage device. In this way, the communication interface 2030 may transfer transmission and reception data to and from the storage device by the DMA (direct memory access) method; alternatively, the CPU 2000 may transfer the data by reading it from the source storage device or communication interface 2030 and writing it to the destination communication interface 2030 or storage device.

また、ＣＰＵ２０００は、ハードディスクドライブ２０４０、ＤＶＤドライブ２０６０（ＤＶＤ−ＲＯＭ２０９５）、フレキシブルディスク・ドライブ２０５０（フレキシブルディスク２０９０）等の外部記憶装置に格納されたファイルまたはデータベース等の中から、全部または必要な部分をＤＭＡ転送等によりＲＡＭ２０２０へと読み込ませ、ＲＡＭ２０２０上のデータに対して各種の処理を行う。そして、ＣＰＵ２０００は、処理を終えたデータを、ＤＭＡ転送等により外部記憶装置へと書き戻す。このような処理において、ＲＡＭ２０２０は、外部記憶装置の内容を一時的に保持するものとみなせるから、本実施形態においてはＲＡＭ２０２０および外部記憶装置等をメモリ、記憶部、または記憶装置等と総称する。本実施形態における各種のプログラム、データ、テーブル、データベース等の各種の情報は、このような記憶装置上に格納されて、情報処理の対象となる。なお、ＣＰＵ２０００は、ＲＡＭ２０２０の一部をキャッシュメモリに保持し、キャッシュメモリ上で読み書きを行うこともできる。このような形態においても、キャッシュメモリはＲＡＭ２０２０の機能の一部を担うから、本実施形態においては、区別して示す場合を除き、キャッシュメモリもＲＡＭ２０２０、メモリ、及び／又は記憶装置に含まれるものとする。 In addition, the CPU 2000 causes all or the necessary portions of the files, databases, or the like stored in an external storage device such as the hard disk drive 2040, the DVD drive 2060 (DVD-ROM 2095), or the flexible disk drive 2050 (flexible disk 2090) to be read into the RAM 2020 by DMA transfer or the like, and performs various kinds of processing on the data in the RAM 2020. The CPU 2000 then writes the processed data back to the external storage device by DMA transfer or the like. In such processing, the RAM 2020 can be regarded as temporarily holding the contents of the external storage device, so in the present embodiment the RAM 2020, the external storage device, and the like are collectively referred to as a memory, a storage unit, a storage device, or the like. Various kinds of information in the present embodiment, such as programs, data, tables, and databases, are stored on such a storage device and are subject to information processing. Note that the CPU 2000 can also hold part of the contents of the RAM 2020 in a cache memory and read from and write to the cache memory. Even in such a form, the cache memory performs part of the function of the RAM 2020, so in the present embodiment the cache memory is also regarded as included in the RAM 2020, the memory, and/or the storage device, except where it is shown separately.

また、ＣＰＵ２０００は、ＲＡＭ２０２０から読み出したデータに対して、プログラムの命令列により指定された、本実施形態中に記載した各種の演算、情報の加工、条件判断、情報の検索・置換等を含む各種の処理を行い、ＲＡＭ２０２０へと書き戻す。例えば、ＣＰＵ２０００は、条件判断を行う場合においては、本実施形態において示した各種の変数が、他の変数または定数と比較して、大きい、小さい、以上、以下、等しい等の条件を満たすかどうかを判断し、条件が成立した場合（又は不成立であった場合）に、異なる命令列へと分岐し、またはサブルーチンを呼び出す。 In addition, the CPU 2000 performs, on data read from the RAM 2020, the various kinds of processing specified by the instruction sequence of the program, including the various computations, information processing, condition determinations, and information search and replacement described in the present embodiment, and writes the results back to the RAM 2020. For example, when performing a condition determination, the CPU 2000 determines whether the various variables shown in the present embodiment satisfy a condition such as being greater than, less than, greater than or equal to, less than or equal to, or equal to another variable or constant, and, when the condition is satisfied (or not satisfied), branches to a different instruction sequence or calls a subroutine.

また、ＣＰＵ２０００は、記憶装置内のファイルまたはデータベース等に格納された情報を検索することができる。例えば、第１属性の属性値に対し第２属性の属性値がそれぞれ対応付けられた複数のエントリが記憶装置に格納されている場合において、ＣＰＵ２０００は、記憶装置に格納されている複数のエントリの中から第１属性の属性値が指定された条件と一致するエントリを検索し、そのエントリに格納されている第２属性の属性値を読み出すことにより、所定の条件を満たす第１属性に対応付けられた第２属性の属性値を得ることができる。 Further, the CPU 2000 can search for information stored in a file or database in the storage device. For example, in the case where a plurality of entries in which the attribute value of the second attribute is associated with the attribute value of the first attribute are stored in the storage device, the CPU 2000 displays the plurality of entries stored in the storage device. The entry that matches the condition in which the attribute value of the first attribute is specified is retrieved, and the attribute value of the second attribute that is stored in the entry is read, thereby associating with the first attribute that satisfies the predetermined condition The attribute value of the specified second attribute can be obtained.

以上に示したプログラム又はモジュールは、外部の記録媒体に格納されてもよい。記録媒体としては、フレキシブルディスク２０９０、ＤＶＤ−ＲＯＭ２０９５の他に、ＤＶＤ又はＣＤ等の光学記録媒体、ＭＯ等の光磁気記録媒体、テープ媒体、ＩＣカード等の半導体メモリ等を用いることができる。また、専用通信ネットワーク又はインターネットに接続されたサーバシステムに設けたハードディスク又はＲＡＭ等の記憶装置を記録媒体として使用し、ネットワークを介してプログラムをコンピュータ１９００に提供してもよい。 The program or module shown above may be stored in an external recording medium. As the recording medium, in addition to the flexible disk 2090 and the DVD-ROM 2095, an optical recording medium such as DVD or CD, a magneto-optical recording medium such as MO, a tape medium, a semiconductor memory such as an IC card, and the like can be used. Further, a storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may be used as a recording medium, and the program may be provided to the computer 1900 via the network.

Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in that embodiment. It will be apparent to those skilled in the art that various modifications and improvements can be made to the embodiment. It is apparent from the claims that embodiments with such modifications or improvements can also fall within the technical scope of the present invention.

It should be noted that the processes, such as operations, procedures, steps, and stages, in the apparatus, system, program, and method shown in the claims, the description, and the drawings can be executed in any order, unless the order is explicitly indicated by expressions such as "before" or "prior to", or unless the output of an earlier process is used in a later process. Even where an operation flow in the claims, the description, or the drawings is described using terms such as "first" and "next" for convenience, this does not mean that the processes must be carried out in that order. According to the present application, the following configurations are also disclosed.
(Item 1)
An image processing apparatus that selects a feature scene from input moving image data, comprising:
a category specifying unit that analyzes the moving image data and specifies to which of a plurality of predetermined categories for identifying image content the moving image data belongs; and
a selection unit that selects a feature scene included in the moving image data based on the time-series data of the feature amount associated with the category to which the moving image data belongs, among the time-series data of a plurality of feature amounts extracted from the moving image data.
(Item 2)
The image processing apparatus according to item 1, wherein the image processing apparatus selects a feature scene from captured moving image data, and
the category specifying unit specifies to which of a plurality of categories, for identifying at what kind of event the moving image was captured, the moving image data belongs.
(Item 3)
The image processing apparatus according to item 1 or 2, wherein the category specifying unit specifies to which category the moving image data belongs based on one or more feature amounts extracted from the moving image data.
(Item 4)
The image processing apparatus according to item 3, wherein, when selecting a feature scene included in moving image data belonging to one category, the selection unit selects the feature scene based on time-series data of a feature amount different from the plurality of feature amounts used to specify the category of the moving image data.
(Item 5)
The image processing apparatus according to any one of items 1 to 4, wherein the selection unit selects, as a feature scene, a portion of the moving image data in which the time-series data of the feature amount associated with the category to which the moving image data belongs satisfies a predetermined condition.
(Item 6)
The image processing apparatus according to item 5, wherein, when a plurality of feature scenes are selected, the selection unit ranks each of the plurality of feature scenes based on a feature amount.
(Item 7)
The image processing apparatus according to any one of items 1 to 6, further comprising a still image generation unit that generates a still image representative of the selected feature scene.
(Item 8)
An imaging system that captures moving image data, comprising:
an imaging unit that generates moving image data by imaging a subject;
the image processing apparatus according to any one of claims 1 to 7, which selects a feature scene from the moving image data captured by the imaging unit; and
a recording unit that records, on a recording medium, information representing the feature scene selected by the image processing apparatus in association with the moving image data captured by the imaging unit.
(Item 9)
An image processing method for selecting a feature scene from input moving image data, comprising:
a category specifying step of analyzing the moving image data and specifying to which of a plurality of predetermined categories for identifying image content the moving image data belongs; and
a selection step of selecting a feature scene included in the moving image data based on the time-series data of the feature amount associated with the category to which the moving image data belongs, among the time-series data of a plurality of feature amounts extracted from the moving image data.
(Item 10)
A program for causing a computer to function as an image processing apparatus that selects a feature scene from input moving image data, the program causing the computer to function as:
a category specifying unit that analyzes the moving image data and specifies to which of a plurality of predetermined categories for identifying image content the moving image data belongs; and
a selection unit that selects a feature scene included in the moving image data based on the time-series data of the feature amount associated with the category to which the moving image data belongs, among the time-series data of a plurality of feature amounts extracted from the moving image data.
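
The selection, ranking, and still-image steps in the items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the "predetermined condition" is assumed to be a simple threshold, scenes are assumed to be ranked by their peak feature value, and the representative still image is assumed to be the frame with the highest value. All function names, the `CATEGORY_FEATURE` table, and the category and feature names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Scene = Tuple[int, int]  # [start, end) frame indices

# Hypothetical table associating each category with one feature amount.
CATEGORY_FEATURE: Dict[str, str] = {
    "sports day": "motion",
    "wedding": "white_area",
}


def select_scenes(series: Sequence[float], threshold: float) -> List[Scene]:
    """Return the frame ranges in which the feature value stays at or
    above the threshold (the assumed 'predetermined condition')."""
    scenes: List[Scene] = []
    start = None
    for i, value in enumerate(series):
        if value >= threshold and start is None:
            start = i  # a candidate scene begins
        elif value < threshold and start is not None:
            scenes.append((start, i))  # the scene ended at the previous frame
            start = None
    if start is not None:  # the scene runs to the end of the data
        scenes.append((start, len(series)))
    return scenes


def rank_scenes(series: Sequence[float], scenes: List[Scene]) -> List[Scene]:
    """Order scenes by their peak feature value, highest first (assumed ranking rule)."""
    return sorted(scenes, key=lambda s: max(series[s[0]:s[1]]), reverse=True)


def representative_frame(series: Sequence[float], scene: Scene) -> int:
    """Pick the frame index with the highest feature value inside the scene."""
    start, end = scene
    return max(range(start, end), key=lambda i: series[i])


def select_feature_scenes(
    category: str,
    feature_series: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Scene]:
    """Use only the time series associated with the specified category."""
    series = feature_series[CATEGORY_FEATURE[category]]
    return rank_scenes(series, select_scenes(series, threshold))
```

For example, calling `select_feature_scenes("sports day", {"motion": [...], "white_area": [...]}, 0.5)` would consult only the `motion` series, which reflects the idea that the feature amount used for scene selection depends on the category.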

100 image processing apparatus, 110 category specifying unit, 115 classifier, 117 specifier, 120 selection unit, 125 feature detector, 127 feature selector, 130 extraction unit, 140 processing unit, 210 moving image data, 215 still image data, 302, 304, 306, 402, 404 time-series data, 1900 computer, 2000 CPU, 2010 ROM, 2020 RAM, 2030 communication interface, 2040 hard disk drive, 2050 flexible disk drive, 2060 DVD drive, 2070 input/output chip, 2075 graphic controller, 2080 display device, 2082 host controller, 2084 input/output controller, 2090 flexible disk, 2095 DVD-ROM

Claims

An image processing apparatus that selects a feature scene from input moving image data, comprising:
a category specifying unit that analyzes the moving image data and specifies to which of a plurality of predetermined categories the moving image data belongs; and
a selection unit that selects a feature scene included in the moving image data based on the feature amount, among a plurality of feature amounts extracted from the moving image data, that is associated with the category to which the moving image data belongs,
wherein the feature amount associated with the category to which the moving image data belongs is a color.

The image processing apparatus according to claim 1, wherein the selection unit uses the occurrence frequency of the color in the moving image data as the feature amount associated with the category to which the moving image data belongs.

The image processing apparatus according to claim 1, wherein the selection unit uses the area of the color in the moving image data as the feature amount associated with the category to which the moving image data belongs.

The image processing apparatus according to any one of claims 1 to 3, wherein the image processing apparatus selects a feature scene from captured moving image data, and
the category specifying unit specifies to which of a plurality of categories, for identifying at what kind of event the moving image was captured, the moving image data belongs.

The image processing apparatus according to any one of claims 1 to 4, wherein the category specifying unit specifies the category to which the moving image data belongs based on one or more feature amounts extracted from the moving image data.

The image processing apparatus according to any one of claims 1 to 5, further comprising a still image generation unit that generates a still image representative of the selected feature scene.

An imaging system that captures moving image data, comprising:
an imaging unit that generates moving image data by imaging a subject;
the image processing apparatus according to any one of claims 1 to 6, which selects a feature scene from the moving image data captured by the imaging unit; and
a recording unit that records, on a recording medium, information representing the feature scene selected by the image processing apparatus in association with the moving image data captured by the imaging unit.

An image processing method for selecting a feature scene from input moving image data, comprising:
a category specifying step of analyzing the moving image data and specifying to which of a plurality of predetermined categories the moving image data belongs; and
a selection step of selecting a feature scene included in the moving image data based on the feature amount, among a plurality of feature amounts extracted from the moving image data, that is associated with the category to which the moving image data belongs,
wherein the feature amount associated with the category to which the moving image data belongs is a color.

A program for causing a computer to function as an image processing apparatus that selects a feature scene from input moving image data, the program causing the computer to function as:
a category specifying unit that analyzes the moving image data and specifies to which of a plurality of predetermined categories the moving image data belongs; and
a selection unit that selects a feature scene included in the moving image data based on the feature amount, among a plurality of feature amounts extracted from the moving image data, that is associated with the category to which the moving image data belongs,
wherein the feature amount associated with the category to which the moving image data belongs is a color.
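
As an illustration of a color-based feature amount, the following sketch computes, for each frame, the fraction of pixels that fall within a target color range, and then how often that color appears across frames. It rests on assumptions that the claims do not state: frames are taken to be nested sequences of RGB tuples, the color area is expressed as a pixel ratio, and the occurrence frequency is interpreted as the fraction of frames in which the color covers at least `min_ratio` of the frame. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0..255
Frame = Sequence[Sequence[Pixel]]  # rows of pixels


def is_target_color(pixel: Pixel, lower: Pixel, upper: Pixel) -> bool:
    """True if every channel lies within the inclusive [lower, upper] range."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))


def color_area_ratio(frame: Frame, lower: Pixel, upper: Pixel) -> float:
    """Fraction of the frame's pixels that fall within the target color range."""
    total = sum(len(row) for row in frame)
    if total == 0:
        return 0.0
    hits = sum(1 for row in frame for p in row if is_target_color(p, lower, upper))
    return hits / total


def color_area_series(frames: Sequence[Frame], lower: Pixel, upper: Pixel) -> List[float]:
    """Per-frame color area ratios, usable as a time series for scene selection."""
    return [color_area_ratio(f, lower, upper) for f in frames]


def color_occurrence_frequency(
    frames: Sequence[Frame], lower: Pixel, upper: Pixel, min_ratio: float
) -> float:
    """Fraction of frames in which the target color covers at least min_ratio
    of the frame (one assumed reading of 'occurrence frequency')."""
    if not frames:
        return 0.0
    series = color_area_series(frames, lower, upper)
    return sum(1 for r in series if r >= min_ratio) / len(frames)
```

A production implementation would more likely operate on decoded video frames in a perceptual color space such as HSV; plain RGB tuples are used here only to keep the sketch self-contained.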