JP2015032905A

JP2015032905A - Information processing device, information processing method, and program

Info

Publication number: JP2015032905A
Application number: JP2013159672A
Authority: JP
Inventors: 建志入江; Kenji Irie
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Priority date: 2013-07-31
Filing date: 2013-07-31
Publication date: 2015-02-16

Abstract

PROBLEM TO BE SOLVED: To provide a mechanism for improving convenience when searching for a moving image to be used as material in moving image editing. SOLUTION: An information processing device for generating a summarized moving image on the basis of a plurality of summarization target moving images acquires the summarization target moving images, extracts, from a moving image frame of the summarization target moving images, an object feature amount for identifying an object included in the moving image frame, and identifies the object included in the moving image frame on the basis of the extracted object feature amount.

Description

The present invention relates to a moving image editing apparatus, a control method thereof, and a program.

In recent years, with the spread of imaging devices, large numbers of moving images are being shot and stored, and it has become common to edit a plurality of moving images, join them together, and create a moving image that summarizes their content. During editing, however, the search for material moving images is almost always based on metadata such as dates and file names, which is very cumbersome for the user.

To address this problem, Patent Document 1 discloses a method for the case where a keyword search yields a plurality of candidate material moving images. Using a similarity calculated from the image feature amounts of the materials, the method gives higher priority to a material moving image that contains a scene highly similar to the scene adjacent to a given scene in the edited moving image. Displaying the candidates in order of priority makes it more efficient for the user to decide on a moving image.

Patent Document 2 discloses a method for improving user efficiency during search and editing by performing object recognition and character recognition on the images in a moving image and recording the recognition results as text information.

Patent Document 1: JP-A-2005-303840
Patent Document 2: JP-A-2007-082088

In Patent Document 1, the keywords that describe the content of a moving image and are used for searching must be registered in advance by the user, and this registration work places a heavy burden on the user.

In Patent Document 2, the method can be difficult to use depending on the accuracy of specific object recognition, and a search is not possible unless the user remembers the name under which an object was recognized. In addition, no function is provided for the user to correct incorrect recognition results.

Therefore, an object of the present invention is to provide a mechanism that improves convenience when searching for material moving images during moving image editing.

The present invention is an information processing apparatus that generates a summary moving image based on a plurality of summary target moving images, comprising: acquisition means for acquiring the summary target moving images; object feature amount extraction means for extracting, from a moving image frame of a summary target moving image acquired by the acquisition means, an object feature amount for identifying an object included in the moving image frame; and object identification means for identifying the object included in the moving image frame based on the object feature amount extracted by the object feature amount extraction means.

The present invention is also an information processing method in an information processing apparatus that generates a summary moving image based on a plurality of summary target moving images, comprising: an acquisition step in which acquisition means of the information processing apparatus acquires the summary target moving images; an object feature amount extraction step in which object feature amount extraction means of the information processing apparatus extracts, from a moving image frame of a summary target moving image acquired in the acquisition step, an object feature amount for identifying an object included in the moving image frame; and an object identification step in which object identification means of the information processing apparatus identifies the object included in the moving image frame based on the object feature amount extracted in the object feature amount extraction step.

The present invention is also a program executable in an information processing apparatus that generates a summary moving image based on a plurality of summary target moving images, the program causing the information processing apparatus to function as: acquisition means for acquiring the summary target moving images; object feature amount extraction means for extracting, from a moving image frame of a summary target moving image acquired by the acquisition means, an object feature amount for identifying an object included in the moving image frame; and object identification means for identifying the object included in the moving image frame based on the object feature amount extracted by the object feature amount extraction means.

According to the present invention, it is possible to provide a mechanism that improves convenience when searching for material moving images during moving image editing.

FIG. 1 is a diagram showing the configuration of the moving image editing system according to the embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of an information processing apparatus applicable to the user terminal 101 and the summary generation device 102 according to the embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of the functional blocks of the moving image editing system according to the embodiment of the present invention.
FIG. 4 is a flowchart showing an example of the procedure for registering search target images in the image search system according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the details of the moving image analysis processing in the moving image editing system according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the details of the object recognition processing in the moving image editing system according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the details of the object estimation processing in the moving image editing system according to the embodiment of the present invention.
FIG. 8 is an example of a formula for calculating the elapsed time between the estimation target moving image frame and the estimation moving image frame in S709.
FIG. 9 is an example of a formula for calculating the elapsed time between the estimation target moving image frame and the estimation moving image frame in S712.
FIG. 10 is an example of a formula for calculating the elapsed time between the estimation target moving image frame and the estimation moving image frame in S715.
FIG. 11 is a flowchart showing the procedure of the summary generation processing in the moving image editing system according to the embodiment of the present invention.
FIG. 12 is a flowchart showing the details of the summary candidate generation processing in the moving image editing system according to the embodiment of the present invention.
FIG. 13 is a flowchart showing the details of the summary weight vector generation processing in the moving image editing system according to the embodiment of the present invention.
FIG. 14 is an example of a formula for calculating the initial summary weight vector in S1302.
FIG. 15 is a diagram showing an example of the moving image database according to the embodiment of the present invention.
FIG. 16 is a diagram showing an example of the object recognition database according to the embodiment of the present invention.
FIG. 17 is a diagram showing an example of the display screen of the summary generation designation / summary target display unit of the user terminal of the moving image editing system according to the present invention.
FIG. 18 is a diagram showing an example of the display screen on which the user sets the conditions for summary moving image generation.
FIG. 19 is a diagram showing an example of the display screen of the summary candidate display / editing unit of the user terminal of the moving image editing system.
FIG. 20 is a diagram showing an example of the display screen on which the user edits a summary candidate moving image.

<First Embodiment>
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

First, the configuration of the moving image editing system according to the embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a diagram showing the configuration of the moving image editing system according to the embodiment of the present invention. In FIG. 1, one or more user terminals 101 and one summary generation device 102 are connected via a local area network (LAN) 103.

The user terminal 101 is an information processing apparatus used by a user who edits moving images. It has a function of sending moving image search, summary generation, and summary editing requests, and a function of receiving and displaying the results.

The summary generation device 102 stores a plurality of target moving images and has the following functions: accepting a search request from the user terminal 101, performing moving image search processing, and returning the search results; accepting a summary generation request from the user terminal 101, performing moving image summary generation processing, and returning the result; and accepting a summary editing request from the user terminal 101, processing it, and returning the editing result. It also has a function for inputting target moving images from outside. This concludes the description of the configuration of the moving image editing system according to the embodiment of the present invention shown in FIG. 1.

Next, an example of the hardware configuration of an information processing apparatus applicable to the user terminal 101 and the summary generation device 102 according to the embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the hardware configuration of an information processing apparatus applicable to the user terminal 101 and the summary generation device 102 according to the embodiment of the present invention.

In FIG. 2, reference numeral 201 denotes a CPU, which comprehensively controls the devices and controllers connected to the system bus 204. The ROM 202 or the external memory 211 stores a BIOS (Basic Input/Output System) and an operating system program (hereinafter, OS), which are control programs for the CPU 201, as well as the various programs, described later, that are needed to realize the functions executed by each server or PC.

Reference numeral 203 denotes a RAM, which functions as the main memory, work area, and the like of the CPU 201. The CPU 201 realizes various operations by loading the programs and other data needed for processing from the ROM 202 or the external memory 211 into the RAM 203 and executing the loaded programs.

Reference numeral 205 denotes an input controller, which controls input from a keyboard (KB) 209 and from a pointing device such as a mouse (not shown). Reference numeral 206 denotes a video controller, which controls display on a display device such as a CRT display (CRT) 210. Although FIG. 2 shows the CRT 210, the display device is not limited to a CRT and may be another display device such as a liquid crystal display. These are used by the administrator as needed.

Reference numeral 207 denotes a memory controller, which controls access to the external memory 211. Examples of the external memory 211 include an external storage device (hard disk (HD)) that stores a boot program, various applications, font data, user files, edit files, and various data, a flexible disk (FD), and a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

Reference numeral 208 denotes a communication I/F controller, which connects to and communicates with external devices via a network (for example, the LAN 103 shown in FIG. 1) and executes communication control processing on the network. For example, communication using TCP/IP is possible.

The CPU 201 enables display on the CRT 210 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also accepts user instructions given with a mouse cursor or the like (not shown) on the CRT 210.

The various programs, described later, that realize the present invention are recorded in the external memory 211 and are executed by the CPU 201 after being loaded into the RAM 203 as needed. The definition files and various information tables used when the programs are executed are also stored in the external memory 211 and are described in detail later. This concludes the description of an example of the hardware configuration, shown in FIG. 2, of an information processing apparatus applicable to the user terminal 101 and the summary generation device 102 according to the embodiment of the present invention.

Next, the configuration of the functional blocks of the moving image editing system according to the embodiment of the present invention will be described with reference to FIG. 3. FIG. 3 is a diagram showing the configuration of the functional blocks of the moving image editing system according to the embodiment of the present invention.

As described above with reference to FIG. 1, the moving image editing system according to the embodiment of the present invention consists of the user terminal 101, the summary generation device 102, and the image source 500, which are connected via a network so that they can communicate with one another. In this embodiment the user terminal and the summary generation device are described as separate terminals, as shown in FIG. 1 and FIG. 3, but the processing of this embodiment may instead be executed by a single terminal that has the functions of both the user terminal and the summary generation device.

The user terminal 101 is an information processing apparatus that sends a moving image search request to the summary generation device 102, receives and displays the search results, designates the moving images to be summarized, and sends a summary generation instruction. The user terminal 101 consists of a summary generation designation / summary target display unit 301 and a summary candidate display / editing unit 302.

The summary generation designation / summary target display unit 301 is a function processing unit that has: an input function of accepting from the user a query as a search request, a designation of the summary target moving images, a moving image metadata correction instruction, a designation of summary generation conditions, and a summary generation instruction; a function of transmitting the query and the instructions over the network to the summary candidate generation unit 406 of the summary generation device 102; a function of receiving the moving image search results returned from the summary generation device 102; and a function of displaying the search results.

The summary candidate display / editing unit 302 is a function processing unit that has a function of receiving the summary candidate results returned from the summary generation device 102, a function of displaying the summary candidate results, a function of instructing editing of the summary candidate results, and a function of instructing output of the summary moving image.

The summary generation device 102 is an information processing apparatus that receives a moving image search request from the user terminal 101 and executes the requested search processing on the stored moving images, receives a summary generation instruction and generates summary candidates, receives a summary moving image output instruction and outputs the summary moving image, and transmits the search result information and the generated summary candidates to the user terminal 101. The summary generation device 102 consists of a moving image registration unit 401, a moving image analysis unit 402, a feature amount extraction unit 403, an object recognition unit 404, an object estimation unit 405, a summary candidate generation unit 406, a moving image search unit 407, a metadata correction unit 408, a summary weight vector generation unit 409, a summary candidate result output unit 410, a moving image recommendation unit 411, a moving image database 412, and an object recognition database 413.

The moving image registration unit 401 is a function processing unit that registers the moving images to be processed in this system. It has a function of receiving or acquiring the target moving image data (group) from an actor external to the system, denoted as the moving image source 500, passing the moving image data (group) to the moving image analysis unit 402, and storing each item of the moving image data group in the moving image database 412.

The moving image analysis unit 402 is a function processing unit that has: a function of receiving the moving image data group from the moving image registration unit 401 and storing the position information and date information attached to each received moving image in the moving image database 412; a function of acquiring all moving image frame data, that is, image data, from each item of moving image data; a function of passing each item of image data to the feature amount extraction unit 403 and instructing image feature amount extraction processing; a function of passing each item of image data to the object recognition unit 404 and instructing object recognition processing; and a function of passing the received moving image data group to the object estimation unit 405 and instructing object estimation processing.

The feature amount extraction unit 403 is a function processing unit that receives image data from the moving image analysis unit 402, extracts a feature amount of the image data (for example, an RGB histogram), and stores the feature amount data in the moving image database 412.
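As an illustration of the RGB histogram mentioned as an example feature amount, the following is a minimal sketch. The function name `rgb_histogram`, the per-channel binning, and the normalization are assumptions made for this example; the source does not specify how the histogram is computed.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255


def rgb_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return concatenated, normalized R, G, and B histograms for one frame.

    Assumption: each channel is binned independently and every channel
    histogram sums to 1.0, so frames of different sizes stay comparable.
    """
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be between 1 and 256")
    hist = [0] * (3 * bins_per_channel)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bin_index = value * bins_per_channel // 256
            hist[channel * bins_per_channel + bin_index] += 1
        count += 1
    if count == 0:
        return [0.0] * len(hist)
    return [h / count for h in hist]


# Hypothetical 4-pixel frame: two red pixels, one blue, one green.
frame = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
feature = rgb_histogram(frame, bins_per_channel=2)
```

The resulting list is a multidimensional numerical vector of the kind that could be stored per frame in the moving image database.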

The object recognition unit 404 is a function processing unit that has: a function of receiving image data from the moving image analysis unit 402 and extracting from it feature amount data for specific object recognition, for example a Bag of Features feature amount calculated from local feature amounts such as SIFT (Scale-Invariant Feature Transform) feature amounts, which represent the intensity variation of a local region (for example, a set of local feature amounts is clustered in advance by the k-means method to find an arbitrary number of representative local feature amounts, and the feature amount expresses how often those representative local feature amounts appear in a single image); a function of performing specific object recognition processing by comparing the feature amount data with the object feature amounts stored in the object recognition database 413; a function of storing the specific object recognition result and the position information of the specific object in the moving image database 412; a function of performing general object recognition processing on the image data using the general object recognizers and general object names stored in the object recognition database 413; and a function of storing the general object recognition result in the moving image database 412.
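The Bag of Features encoding described above can be sketched as follows. This is an illustrative example only: it assumes the codebook of representative local feature amounts has already been obtained (for example by k-means clustering) and that the local descriptors of the image are given as plain numeric vectors. The names `bag_of_features` and `histogram_intersection` are hypothetical, and histogram intersection is one common comparison measure chosen for this example; the source does not state which measure is used.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the representative local feature closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_features(descriptors: Sequence[Vector],
                    codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of how often each codeword appears in one image."""
    hist = [0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 2-D descriptors and a 2-word codebook, for illustration only.
codebook = [(0.0, 0.0), (10.0, 10.0)]
query = bag_of_features([(1.0, 0.0), (9.0, 9.0), (11.0, 10.0), (0.0, 1.0)], codebook)
stored = bag_of_features([(1.0, 1.0)], codebook)
similarity = histogram_intersection(query, stored)
```

In a sketch like this, the stored object whose histogram is most similar to the query histogram would be taken as the recognized specific object, possibly subject to a threshold.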

The object estimation unit 405 is a function processing unit that has: a function of receiving the moving image data group from the moving image analysis unit 402 and, for each moving image whose general object recognition result is stored in the moving image database 412, estimating the specific object name for that general object recognition result by using the moving image data together with the specific object recognition results, moving image shooting dates and times, and moving image position information stored in the moving image database 412; and a function of storing the estimation result in the moving image database 412.

The summary candidate generation unit 406 is a function processing unit that has: a function of receiving a search query as a moving image search request, passing the search query to the moving image search unit 407, and instructing moving image search processing; a function of transmitting the moving image search results to the summary generation designation / summary target display unit 301 of the user terminal 101; a function of receiving a summary generation instruction, passing the summary target moving images and their metadata to the summary weight vector generation unit 409, and instructing summary weight vector generation processing; a function of generating summary candidates from the summary weight vector and the summary target moving images; a function of receiving a character string query as a metadata correction request, passing the character string query to the metadata correction unit 408, and instructing metadata correction; and a function of receiving a summary candidate output request and instructing the summary candidate result output unit 410 to output the generated summary candidates.

The moving image search unit 407 is a function processing unit that receives a moving image search query from the summary candidate generation unit 406, the summary weight vector generation unit 409, or the summary candidate result output unit 410, acquires, from the moving images stored in the moving image database 412, the moving image data and accompanying moving image metadata of the moving images that match the conditions of the search query, and returns the search results.

The metadata correction unit 408 is a function processing unit that receives a character string query as a metadata correction request from the summary candidate generation unit 406 and has a function of correcting, in accordance with the query, the moving image metadata stored in the moving image database 412.

The summary weight vector generation unit 409 is a function processing unit that receives the summary target moving images and their metadata from the summary candidate generation unit 406 and has a function of generating the summary weight vector, which is used to set the duration of each summary target moving image when the summary is generated.
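One way a weight vector could be turned into per-video durations is proportional allocation, sketched below. This is only an assumed reading for illustration: this passage does not give the formula (the initial weight vector is defined by the formula shown in a separate figure), and the function name `allocate_durations` and the `total_seconds` parameter are hypothetical.

```python
from typing import List, Sequence


def allocate_durations(weights: Sequence[float], total_seconds: float) -> List[float]:
    """Split a total summary length across videos in proportion to their weights.

    Assumption: weights are non-negative and the summary length is shared
    strictly proportionally; the actual rule is not stated in this passage.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [0.0] * len(weights)
    return [total_seconds * w / weight_sum for w in weights]


# Three summary target videos; the third is weighted twice as heavily.
durations = allocate_durations([1, 1, 2], total_seconds=60)
```

Under this assumption, raising the weight of one video lengthens its share of the summary and shortens the shares of the others.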

The summary candidate result output unit 410 is a function processing unit that has: a function of receiving the summary candidate results from the summary candidate generation unit 406 and transmitting them to the summary candidate display / editing unit 302 of the user terminal 101; a function of receiving a summary candidate result editing request from the summary candidate display / editing unit 302 and transmitting the corresponding editing result to the summary candidate display / editing unit 302; and a function of receiving a recommended moving image display request from the summary candidate display / editing unit 302, passing the request to the moving image recommendation unit 411, and instructing calculation of the recommended moving images.

The moving image recommendation unit 411 is a function processing unit that has a function of receiving a recommended moving image display request from the summary candidate result output unit 410, searching the moving image database 412 for moving images that match the request, and returning the search results to the summary candidate result output unit 410 as the recommended moving images.

The moving image database 412 is a storage area that stores the moving image data group serving as the summary target moving images, together with the metadata of each moving image and the metadata of each frame of each moving image, as illustrated in FIG. 15.
(Explanation of FIG. 15)

Here, an example of the moving image database 412 according to the embodiment of the present invention will be described with reference to FIG. 15.

In the moving image metadata storage table of FIG. 15, each row represents one item of moving image data stored in the moving image database 412. Together with the ID (identifier) of the moving image data (also called the moving image NO), the table stores the FPS (frames per second) of the moving image in the fps column, the number of frames of the moving image in the frame count column, time information indicating the shooting start date and time in the shooting date and time column, and position information indicating the latitude and longitude at the start of shooting in the moving image position information column.

In the moving image frame metadata storage table of FIG. 15, each row represents one frame of one moving image stored in the moving image database 412. Along with the ID (identifier) of the moving image frame, the frame No column stores the frame No, which indicates the position of the frame within the moving image. The moving image ID column stores the moving image ID (the moving image ID of the moving image metadata storage table above) indicating which moving image the frame belongs to. The image feature amount column stores an image feature amount acquired from the frame (for example, an RGB histogram, which represents color distribution information as a multidimensional numerical vector). The specific object name column stores the name of a specific object present in the frame, obtained by performing specific object recognition processing on the frame. The general object name column stores the name of a general object present in the frame, obtained by performing general object recognition processing on the frame. The object estimation result column stores the name of a specific object present in the frame, obtained by performing object estimation processing on the frame. The frame position information column stores the latitude and longitude indicating the position at which the frame was shot, acquired from the value of the frame's specific object name column. The specific object description information column stores information that describes the specific object, acquired from the value of the frame's specific object name column. The fuzzy search index column stores a word string, generated from the value of the frame's specific object description information column, that makes the moving image searchable even when the user does not remember the name of the specific object.
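The two tables of FIG. 15 can be pictured as two record types linked by the moving image ID. The following is a minimal sketch only: the class names, field names, and Python types are assumptions chosen for illustration, since the description names the columns but does not specify their storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude); representation is an assumption


@dataclass
class MovieMetadata:
    """One row of the moving image metadata storage table."""
    movie_id: int
    fps: float
    frame_count: int
    shot_at: datetime                  # shooting start date/time
    position: Optional[LatLon] = None  # position at the start of shooting, if known


@dataclass
class FrameMetadata:
    """One row of the moving image frame metadata storage table."""
    frame_id: int
    frame_no: int                      # position of the frame within the moving image
    movie_id: int                      # refers to MovieMetadata.movie_id
    image_feature: List[float] = field(default_factory=list)  # e.g. RGB histogram
    specific_object_name: Optional[str] = None
    general_object_name: Optional[str] = None
    object_estimation_result: Optional[str] = None
    frame_position: Optional[LatLon] = None
    specific_object_description: Optional[str] = None
    fuzzy_search_index: List[str] = field(default_factory=list)
```

Optional fields default to empty because, according to the registration procedure, they are filled in only when the corresponding recognition or estimation step succeeds.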

The object recognition database 413 is a storage area that stores the specific object names, object position information, and other data used for specific object recognition, general object recognition, and object estimation, as illustrated in FIG. 16.
(Explanation of FIG. 16)

Here, an example of the object recognition database 413 in the embodiment of the present invention will be described with reference to FIG. 16.

In the specific object management table of FIG. 16, each row represents the data of one specific object stored in the object recognition database 413. Along with the ID (identifier) of the specific object data (also called the specific object NO), the object feature amount column stores a feature amount, expressed as a multidimensional numerical vector, that identifies the specific object. The specific object name column stores a value representing the name of the specific object, and the specific object position information column stores the latitude and longitude at which the specific object is located.

Here, the object feature amount column holds, for example, a Bag of Features feature amount calculated from local feature amounts such as SIFT (Scale-Invariant Feature Transform) feature amounts, which characterize intensity changes in local regions. A Bag of Features feature amount is obtained, for example, by clustering a set of local feature amounts with the k-means method to find an arbitrary number of representative local feature amounts, and then expressing how often each representative local feature amount appears in a single image.
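The last stage of the Bag of Features computation can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the codebook (the representative local feature amounts) is assumed to have been found already by k-means, the local descriptors are assumed to have been extracted already (for example by SIFT), and the function name `bag_of_features` and the normalization to relative frequencies are choices made here.

```python
import math


def bag_of_features(descriptors, codebook):
    """Return the relative frequency of each codeword in one image.

    descriptors: local feature vectors extracted from a single image
    codebook:    representative local features (assumed precomputed by k-means)
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Assign the descriptor to its nearest codeword (Euclidean distance).
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(descriptor, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Two codewords; two descriptors fall near each of them.
codebook = [(0.0, 0.0), (10.0, 10.0)]
print(bag_of_features([(1, 1), (0, 1), (9, 9), (10, 11)], codebook))  # [0.5, 0.5]
```

The resulting vector has one element per codeword, which is what allows it to be stored as a fixed-length multidimensional numerical vector in the object feature amount column.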

In the general object management table of FIG. 16, each row represents the data of one general object stored in the object recognition database 413. Along with the ID (identifier) of the general object data (also called the general object NO), the general object detector output label column stores the numerical value that each general object detector outputs when it identifies an object, and the general object name column stores a value representing the name of the general object.

Here, a general object detector is built, for example, from the Bag of Features feature amount described above and a machine learning technique called a support vector machine (SVM). A support vector machine is a supervised learning technique that learns, in advance, a pattern for identifying correct data from correct and incorrect examples (training data). Accordingly, training data is prepared in advance for each general object, and the detector learns a pattern for deciding whether an input shows that object from the Bag of Features feature amounts extracted from the training data. When given a Bag of Features feature amount extracted from an image, the detector outputs the label ID associated with the identified object if an object is identified. The detectors are stored in the object recognition database 413.

In the object estimation table of FIG. 16, each row represents one item of object estimation data stored in the object recognition database 413. Along with the ID (identifier) of the object estimation data (also called the object estimation NO), the general object name column stores a value representing the name of a general object, the specific object name column stores a value representing the name of the specific object that is the estimation result, and the specific object position information column stores the latitude and longitude at which the specific object is located.
Returning to the description of FIG. 3.

The moving image source 500 is an external actor representing the origin (input source) of the moving images to be summarized in this moving image editing system. Examples include a user who provides moving image data directly and video input devices such as video cameras. This concludes the description of the functional block configuration of the moving image editing system in the embodiment of the present invention, shown in FIG. 3.

次に図４を参照して、本発明の実施形態における画像検索システムにおける検索対象画像の登録手順について説明する。図４は、本発明の実施形態における画像検索システムにおける検索対象画像の登録手順の一例を示すフローチャートである。 Next, with reference to FIG. 4, the registration procedure of the search target image in the image search system according to the embodiment of the present invention will be described. FIG. 4 is a flowchart illustrating an example of a procedure for registering a search target image in the image search system according to the embodiment of the present invention.

Although the moving image source 500 (external device) described below can take many forms, as explained above, the following description assumes that it is a user terminal, operated by a user of the system, that stores the group of moving images to be summarized.

ステップＳ４０１では、動画登録部４０１は、動画ソース５００で表わされるシステム利用者が操作する利用者端末から要約対象となる動画データ群を取得し、取得した動画データ群を動画データベース４１２に保存して、当該動画データ群を動画解析部４０２へ入力する。 In step S 401, the moving image registration unit 401 acquires a moving image data group to be summarized from a user terminal operated by a system user represented by the moving image source 500, and stores the acquired moving image data group in the moving image database 412. The moving image data group is input to the moving image analysis unit 402.

In step S402, the moving image analysis unit 402 performs moving image analysis processing on each item in the acquired moving image data group and registers the moving image metadata and moving image frame metadata used in moving image searches in the moving image database 412. As described above, the moving image metadata and moving image frame metadata are stored in the table structure illustrated in FIG. 15. The moving image analysis unit 402 then inputs the moving image data group to the object estimation unit 405. The details of the moving image analysis in step S402 are described later with reference to FIG. 5.

In step S403, the object estimation unit 405 performs object estimation processing on each item in the acquired moving image data group and registers the estimation results in the moving image frame metadata storage table of the moving image database 412. The details of the object estimation in step S403 are described later with reference to FIG. 7.

ステップＳ４０４では、動画解析部４０２は、動画データベース４１２の動画フレームメタデータ保存テーブルより特定物体名称を持っている動画フレームを取得し、取得した動画フレームに対する繰り返し処理を開始する。 In step S404, the moving image analysis unit 402 acquires a moving image frame having a specific object name from the moving image frame metadata storage table of the moving image database 412, and starts repetitive processing on the acquired moving image frame.

ステップＳ４０５では、動画解析部４０２は、処理中の動画フレームに対して、取得した特定物体名称より、該特定物体を説明する情報を取得し、動画データベース４１２の動画フレームメタデータ保存テーブルへ登録する。前記特定物体を説明する情報は、例えば、インターネット上にあるデータや、予め構築したデータベースから取得することが可能である。 In step S405, the moving image analysis unit 402 acquires information describing the specific object from the acquired specific object name for the moving image frame being processed, and registers it in the moving image frame metadata storage table of the moving image database 412. . The information describing the specific object can be acquired from, for example, data on the Internet or a database built in advance.

In step S406, so that a moving image containing the frame can be found even when the user does not remember the name of the specific object, the moving image analysis unit 402 performs, for example, morphological analysis on the specific object description information acquired in step S405 and registers only the nouns, as a fuzzy search index, in the moving image frame metadata storage table of the moving image database 412.

ステップＳ４０７では、未処理の動画フレームがある場合は、ステップＳ４０５に戻る。未処理の動画フレームがない場合は、処理を終了する。 In step S407, when there is an unprocessed moving image frame, the process returns to step S405. If there is no unprocessed moving image frame, the process ends.

以上の図４に示す処理により、要約対象動画について、当該要約対象動画に含まれる特定物体に関する情報を含むデータとして登録することが可能となる。具体的には、図１５に示す動画フレームメタデータ保存テーブルに示す情報を登録することが可能となる。 With the processing shown in FIG. 4 described above, it is possible to register the summary target moving image as data including information on a specific object included in the summary target moving image. Specifically, the information shown in the moving picture frame metadata storage table shown in FIG. 15 can be registered.

Next, the details of the moving image analysis processing in the moving image editing system in the embodiment of the present invention will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the details of the moving image analysis processing in the moving image editing system in the embodiment of the present invention.

ステップＳ５０１では、動画解析部４０２は、ステップＳ４０１で取得した動画データ群に対する繰り返し処理を開始する。 In step S501, the moving image analysis unit 402 starts an iterative process for the moving image data group acquired in step S401.

ステップＳ５０２では、動画解析部４０２は、動画に付帯するメタデータとして、撮影日時、位置情報（緯度・経度情報）を抽出し、動画データベース４１２の動画メタデータ保存テーブルへ登録する。 In step S 502, the moving image analysis unit 402 extracts shooting date / time and position information (latitude / longitude information) as metadata attached to the moving image, and registers the extracted information in the moving image metadata storage table of the moving image database 412.

ステップＳ５０３では、動画解析部４０２は、処理中の動画の各フレームに対する繰り返し処理を開始する。 In step S503, the moving image analysis unit 402 starts repetitive processing for each frame of the moving image being processed.

ステップＳ５０４では、動画解析部４０２は、処理中の動画フレームを特徴量抽出部４０３へ入力する。特徴量抽出部４０３は、前記取得した動画フレームに対し、画像特徴量の抽出処理を行い、抽出した特徴量を動画データベース４１２の動画フレームメタデータ保存テーブルへ登録する。ここで、画像特徴量とは、前述したように、例えば、各色の分布を表現するＲＧＢヒストグラムなどの、多次元数値ベクトルで表現される特徴量である。 In step S 504, the moving image analysis unit 402 inputs the moving image frame being processed to the feature amount extraction unit 403. The feature amount extraction unit 403 performs image feature amount extraction processing on the acquired moving image frame, and registers the extracted feature amount in the moving image frame metadata storage table of the moving image database 412. Here, as described above, the image feature amount is a feature amount expressed by a multidimensional numerical vector such as an RGB histogram expressing the distribution of each color.

In step S505, the moving image analysis unit 402 inputs the moving image frame being processed to the object recognition unit 404. The object recognition unit 404 performs object recognition processing on the acquired moving image frame and registers the recognition result in the moving image frame metadata storage table of the moving image database 412. The details of the object recognition in step S505 are described later with reference to FIG. 6.

ステップＳ５０６では、動画解析部４０２は、未処理の動画フレームがある場合は、ステップＳ５０４へ戻る。未処理の動画フレームがない場合は、ステップＳ５０７へ進む。 In step S506, when there is an unprocessed moving image frame, the moving image analysis unit 402 returns to step S504. If there is no unprocessed moving image frame, the process proceeds to step S507.

ステップＳ５０７では、動画解析部４０２は、未処理の動画がある場合は、ステップＳ５０２へ戻る。未処理の動画がない場合は、処理を終了する。 In step S507, the moving image analysis unit 402 returns to step S502 when there is an unprocessed moving image. If there is no unprocessed moving image, the process ends.

Next, the details of the object recognition processing in the moving image editing system in the embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the details of the object recognition processing in the moving image editing system in the embodiment of the present invention.

In step S601, the object recognition unit 404 extracts the object feature amount described above, that is, the Bag of Features feature amount, from the acquired moving image frame. It compares the extracted object feature amount with each object feature amount stored in the specific object management table of the object recognition database 413 and calculates the similarity to each specific object. The similarity is calculated, for example, as the Euclidean distance between the multidimensional numerical vectors, so a smaller value means a closer match. If the smallest of the calculated similarities is sufficiently small (for example, less than 0.01; this threshold may be set in advance or set each time), the object recognition unit 404 judges that the corresponding specific object has been recognized and registers the recognition result as the specific object name in the moving image frame metadata storage table of the moving image database 412.

認識しない場合は、登録処理を実行せず、次の処理（ステップＳ６０２）に移行する。 If not recognized, the registration process is not executed and the process proceeds to the next process (step S602).
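The matching rule of step S601 amounts to a nearest-neighbor search with a distance threshold. The sketch below illustrates that rule only. The function name, the `(name, feature)` pair representation of the specific object management table, and the sample data are assumptions made here. The 0.01 default is the example threshold given in the text.

```python
import math


def recognize_specific_object(frame_feature, specific_objects, threshold=0.01):
    """Return the name of the closest specific object, or None if none is close enough.

    specific_objects: iterable of (name, feature_vector) pairs, standing in for
    rows of the specific object management table (an assumed representation).
    """
    best_name, best_distance = None, math.inf
    for name, feature in specific_objects:
        # The "similarity" is the Euclidean distance, so smaller is better.
        distance = math.dist(frame_feature, feature)
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Recognition succeeds only when the smallest distance is below the threshold.
    return best_name if best_distance < threshold else None


table = [("Tower A", [0.2, 0.8]), ("Bridge B", [0.9, 0.1])]  # hypothetical rows
print(recognize_specific_object([0.2, 0.805], table))  # Tower A
print(recognize_specific_object([0.5, 0.5], table))    # None
```

Returning `None` corresponds to the case in which nothing is registered and processing moves on to step S602.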

ステップＳ６０２では、物体認識部４０４は、ステップＳ６０１の処理で取得した特定物体認識結果より、前述した物体認識データベース４１３の特定物体管理テーブルに保存されている特定物体位置情報を取得し、該位置情報を動画データベース４１２の動画フレームメタデータ保存テーブルへ登録する。 In step S602, the object recognition unit 404 acquires the specific object position information stored in the specific object management table of the object recognition database 413 described above from the specific object recognition result acquired in the process of step S601, and the position information Are registered in the moving image frame metadata storage table of the moving image database 412.

ステップＳ６０３では、物体認識部４０４は、ステップＳ６０１で動画フレームより抽出した物体特徴量を、前述した物体認識データベース４１３に保存されている一般物体検出器に入力する。物体認識部４０４は、前記一般物体検出器の出力と、物体認識データベース４１３の一般物体管理テーブルの各一般物体検出器出力ラベルとを比較し、該動画フレームに一般物体が存在するかを認識する。物体認識部４０４は、該認識結果を動画データベース４１２の動画フレームメタデータ保存テーブルへ登録する。 In step S603, the object recognition unit 404 inputs the object feature amount extracted from the moving image frame in step S601 to the general object detector stored in the object recognition database 413 described above. The object recognition unit 404 compares the output of the general object detector with each general object detector output label in the general object management table of the object recognition database 413, and recognizes whether a general object exists in the moving image frame. . The object recognition unit 404 registers the recognition result in the moving image frame metadata storage table of the moving image database 412.

以上の図５、図６のフローチャートに示す処理により、予め登録された情報に基づき、要約対象動画の各フレームに含まれる物体の名称等を特定することが可能となる。 By the processing shown in the flowcharts of FIGS. 5 and 6, it is possible to specify the names of objects included in the frames of the summary target moving image based on pre-registered information.

Next, the details of the object estimation processing in the moving image editing system in the embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the details of the object estimation processing in the moving image editing system in the embodiment of the present invention.

ステップＳ７０１では、物体推定部４０５は、取得した動画データ群に対する、物体推定の繰り返し処理を開始する。 In step S 701, the object estimation unit 405 starts object estimation repetition processing for the acquired moving image data group.

ステップＳ７０２では、物体推定部４０５は、当該動画（推定対象動画と呼ぶ）について、動画データベース４１２の、動画フレームメタデータ保存テーブルの一般物体名称が登録されているか否かを判断する。一般物体名称が登録されている場合は、ステップＳ７０３へ進む。登録されていない場合は、ステップＳ７１７へ進む。 In step S702, the object estimation unit 405 determines whether or not the general object name in the moving image frame metadata storage table of the moving image database 412 is registered for the moving image (referred to as an estimation target moving image). If the general object name is registered, the process proceeds to step S703. If not registered, the process proceeds to step S717.

ステップＳ７０３では、物体推定部４０５は、動画データベース４１２の、動画メタデータ保存テーブルに保存されている、推定対象動画のメタデータを取得する。 In step S703, the object estimation unit 405 acquires the metadata of the estimation target moving image stored in the moving image metadata storage table of the moving image database 412.

ステップＳ７０４では、物体推定部４０５は、ステップＳ７０３で取得した推定対象動画のメタデータの撮影日時について、当該撮影日時と同日に撮影された動画群を、動画データベース４１２より取得する。物体推定部４０５は、前記取得した動画群の中に、動画データベース４１２の、動画フレームメタデータ保存テーブルの、特定物体名称が登録されている動画が存在するか否かを判断する。特定物体名称が登録されている動画が存在する場合は、前記取得した動画群のうち、特定物体名称が登録されている動画群のみを一時記憶領域に保存し、ステップＳ７０５へ進む。存在しない場合は、ステップＳ７１７へ進む。 In step S 704, the object estimation unit 405 acquires, from the moving image database 412, a moving image group captured on the same date as the shooting date / time with respect to the shooting date / time of the metadata of the estimation target moving image acquired in step S 703. The object estimation unit 405 determines whether or not a moving image in which the specific object name is registered in the moving image frame metadata storage table of the moving image database 412 exists in the acquired moving image group. If there is a moving image in which the specific object name is registered, only the moving image group in which the specific object name is registered among the acquired moving image group is stored in the temporary storage area, and the process proceeds to step S705. If not, the process proceeds to step S717.

ステップＳ７０５では、物体推定部４０５は、推定対象動画について、動画データベース４１２の、動画フレームメタデータ保存テーブルの一般物体名称が格納されている動画フレーム（推定対象動画フレームと呼ぶ）のメタデータを取得する。 In step S705, the object estimation unit 405 acquires metadata of a moving image frame (referred to as an estimation target moving image frame) in which a general object name in the moving image frame metadata storage table of the moving image database 412 is stored for the estimation target moving image. To do.

In step S706, for the moving image group with registered specific object names that was stored in the temporary storage area in step S704, the object estimation unit 405 acquires the shooting date/time, fps, and number of frames stored in the moving image metadata storage table of the moving image database 412, and acquires, from the moving image frame metadata storage table, the frame No of each frame for which a specific object name is registered. Using these values together with the shooting date/time, fps, and number of frames of the estimation target moving image metadata acquired in step S703 and the frame No of the estimation target moving image frame metadata acquired in step S705, the object estimation unit 405 determines the moving image frame with a registered specific object name that was shot at the time closest to the shooting time of the estimation target moving image frame (called the estimation moving image frame).

ステップＳ７０７では、物体推定部４０５は、動画データベース４１２の動画メタデータ保存テーブルから、ステップＳ７０６で決定した推定用動画フレームが属する動画（推定用動画と呼ぶ）の、動画メタデータを取得し、動画フレームメタデータ保存テーブルから、推定用動画フレームのメタデータを取得する。 In step S707, the object estimation unit 405 acquires the moving image metadata of the moving image (referred to as the estimating moving image) to which the estimating moving image frame determined in step S706 belongs from the moving image metadata storage table of the moving image database 412. The metadata of the estimation video frame is acquired from the frame metadata storage table.

ステップＳ７０８では、物体推定部４０５は、ステップＳ７０３で取得した推定対象動画メタデータに、動画位置情報が登録されているか否かを判断する。動画位置情報が登録されている場合は、ステップＳ７１１へ進む。登録されていない場合は、ステップＳ７０９へ進む。 In step S708, the object estimation unit 405 determines whether moving image position information is registered in the estimation target moving image metadata acquired in step S703. If the moving image position information is registered, the process proceeds to step S711. If not registered, the process proceeds to step S709.

In step S709, the object estimation unit 405 calculates the elapsed time between the estimation target moving image frame and the estimation moving image frame using the shooting date/time and fps of the estimation target moving image acquired in step S703, the frame No of the estimation target moving image frame acquired in step S705, the shooting date/time and fps of the estimation moving image and the frame No of the estimation moving image frame acquired in step S707, and the equation shown in FIG. 8.
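The equation of FIG. 8 is not reproduced in this text, so the following is only one plausible reading of the calculation in step S709, built from the inputs the step lists. It assumes that the shooting time of a frame is the shooting start date/time of its moving image plus frame No divided by fps, with frame No counted from 0. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def frame_time(shot_at, fps, frame_no):
    """Estimated shooting time of one frame (assumes frame_no starts at 0)."""
    return shot_at + timedelta(seconds=frame_no / fps)


def elapsed_seconds(target, reference):
    """Absolute time difference between two frames, in seconds.

    target, reference: (shot_at, fps, frame_no) for the estimation target
    moving image frame and the estimation moving image frame respectively.
    """
    return abs((frame_time(*target) - frame_time(*reference)).total_seconds())


# Target frame: 30 s into a 30 fps moving image that started at 10:00:00.
target = (datetime(2013, 7, 31, 10, 0, 0), 30.0, 900)
# Reference frame: 10 s into a 25 fps moving image that started at 09:50:00.
reference = (datetime(2013, 7, 31, 9, 50, 0), 25.0, 250)
print(elapsed_seconds(target, reference))  # 620.0
```

Taking the absolute value reflects that the estimation moving image frame may have been shot either before or after the estimation target moving image frame.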

ステップＳ７１０では、物体推定部４０５は、ステップＳ７０７で取得した推定用動画フレームのフレーム位置情報が示す位置から、例えば、一般的な成人男性の歩行速度の時速４ｋｍで、ステップＳ７０９で計算した推定対象動画フレームと推定用動画フレーム間の経過時間を移動した場合の位置から、誤差１ｋｍ範囲内に存在する、物体認識データベース４１３の、物体推定用テーブルに保存されているレコードを取得する。（ただし、移動速度は、時速４ｋｍに限定されるものではなく、また、複数の移動速度について計算しても良い。また、誤差も、１ｋｍに限るものではなく、実施例に合わせて設定すれば良い。）物体推定部４０５は、前記取得したレコード群のうち、該レコードの一般物体名称と、ステップＳ７０５で取得した推定対象動画フレームの一般物体名称とが合致するレコードの特定物体名称を、物体推定結果として、推定対象動画フレームが示す動画データベース４１２の動画フレームメタデータ保存テーブルのレコードの、物体推定結果に登録する。 In step S710, the object estimation unit 405 calculates the estimation target calculated in step S709 from the position indicated by the frame position information of the estimation moving image frame acquired in step S707, for example, at a normal adult male walking speed of 4 km / h. A record stored in the object estimation table of the object recognition database 413 that is within an error of 1 km is acquired from the position when the elapsed time between the moving image frame and the estimating moving image frame is moved. (However, the moving speed is not limited to 4 km / h, and may be calculated for a plurality of moving speeds. Also, the error is not limited to 1 km, and may be set according to the embodiment. The object estimation unit 405 uses the specific object name of the record in which the general object name of the record matches the general object name of the estimation target moving image frame acquired in step S705 in the acquired record group. The estimation result is registered in the object estimation result of the record in the moving picture frame metadata storage table of the moving picture database 412 indicated by the estimation target moving picture frame.
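The candidate lookup of step S710 can be sketched as a radius search. The text does not say how the direction of travel is handled, so this sketch assumes that any record within the distance walked plus the error margin counts as a candidate. The haversine distance, the dictionary keys, and the function names are likewise assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_specific_objects(origin, elapsed_s, general_name, records,
                              speed_kmh=4.0, margin_km=1.0):
    """Names of specific objects reachable from origin whose general name matches.

    records: rows of the object estimation table as dicts with the assumed keys
    'general_name', 'specific_name', and 'position' (latitude, longitude).
    """
    reach_km = speed_kmh * elapsed_s / 3600.0 + margin_km
    return [r["specific_name"] for r in records
            if r["general_name"] == general_name
            and haversine_km(origin, r["position"]) <= reach_km]


records = [  # hypothetical rows
    {"general_name": "tower", "specific_name": "Tower A", "position": (35.01, 135.0)},
    {"general_name": "tower", "specific_name": "Tower B", "position": (35.10, 135.0)},
    {"general_name": "bridge", "specific_name": "Bridge C", "position": (35.005, 135.0)},
]
# 15 minutes at 4 km/h is 1 km, plus the 1 km margin: a 2 km search radius.
print(estimate_specific_objects((35.0, 135.0), 900, "tower", records))  # ['Tower A']
```

As the text notes, neither the 4 km/h speed nor the 1 km margin is fixed, which is why both are exposed as parameters here.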

ステップＳ７１１では、物体推定部４０５は、ステップＳ７０３で取得した推定対象動画の動画ＩＤと、ステップＳ７０７で取得した推定用動画の動画ＩＤが同じであるか否かを判断する。同じである場合は、ステップＳ７１４へ進む。同じでない場合は、ステップＳ７１２へ進む。 In step S711, the object estimation unit 405 determines whether the video ID of the estimation target video acquired in step S703 is the same as the video ID of the estimation video acquired in step S707. If they are the same, the process proceeds to step S714. If not, the process proceeds to step S712.

ステップＳ７１２では、物体推定部４０５は、ステップＳ７０３で取得した推定対象動画のｆｐｓと、ステップＳ７０５で取得した推定対象動画フレームのフレームＮｏと、図９で示される式を用いて、推定対象動画の開始フレームと推定対象動画フレーム間の経過時間を計算する。 In step S712, the object estimation unit 405 uses the fps of the estimation target moving image acquired in step S703, the frame number of the estimation target moving image frame acquired in step S705, and the equation shown in FIG. The elapsed time between the start frame and the estimation target moving image frame is calculated.

In step S713, the object estimation unit 405 takes the position indicated by the moving image position information of the estimation target moving image acquired in step S703 as the starting point, assumes movement at, for example, 4 km/h (a typical walking speed for an adult male) for the elapsed time between the start frame of the estimation target moving image and the estimation target moving image frame calculated in step S712, and acquires the records stored in the object estimation table of the object recognition database 413 that lie within an error of 1 km of the resulting position. (The movement speed is not limited to 4 km/h, and the calculation may be performed for several movement speeds. Likewise, the error is not limited to 1 km and may be set to suit the embodiment.) From the acquired records, the object estimation unit 405 takes the specific object name of the record whose general object name matches the general object name of the estimation target moving image frame acquired in step S705, and registers it as the object estimation result in the record of the moving image frame metadata storage table of the moving image database 412 that corresponds to the estimation target moving image frame.

ステップＳ７１４では、物体推定部４０５は、ステップＳ７０３で取得した推定対象動画の動画位置情報とｆｐｓと、ステップＳ７０７で取得した推定用動画フレームのフレーム位置情報とフレームＮｏと、を利用して、推定対象動画の撮影時の推定移動速度を計算する。 In step S714, the object estimation unit 405 performs estimation using the moving image position information and fps of the estimation target moving image acquired in step S703 and the frame position information and frame No. of the estimating moving image frame acquired in step S707. Calculate the estimated moving speed when shooting the target video.

In step S715, the object estimation unit 405 uses the frame No of the estimation target moving image frame acquired in step S705, the frame No of the estimation moving image frame acquired in step S707, and the equation shown in FIG. 10 to calculate two elapsed times: the time between the start frame of the estimation target moving image and the estimation target moving image frame, and the time between the estimation moving image frame and the estimation target moving image frame. It stores the smaller elapsed time and the corresponding position information in the temporary storage area: the moving image position information of the estimation target moving image if the former is smaller, or the frame position information of the estimation moving image frame if the latter is smaller.

ステップＳ７１６では、物体推定部４０５は、ステップＳ７１５で一時記憶領域に保存した位置情報が示す位置から、ステップＳ７１４で計算した移動速度で、ステップＳ７１５で一時記憶領域に保存した経過時間を移動した場合の位置から、誤差１ｋｍ範囲内に存在する、物体認識データベース４１３の、物体推定用テーブルに保存されているレコードを取得する。（ただし、誤差は、１ｋｍに限るものではなく、実施例に合わせて設定すれば良い。）物体推定部４０５は、前記取得したレコード群のうち、該レコードの一般物体名称と、ステップＳ７０５で取得した推定対象動画フレームの一般物体名称とが合致するレコードの特定物体名称を、物体推定結果として、推定対象動画フレームが示す動画データベース４１２の動画フレームメタデータ保存テーブルのレコードの、物体推定結果に登録する。 In step S716, the object estimation unit 405 acquires the records stored in the object estimation table of the object recognition database 413 that lie within a 1 km margin of error of the position reached by moving from the position indicated by the position information stored in the temporary storage area in step S715, at the moving speed calculated in step S714, for the elapsed time stored in the temporary storage area in step S715. (The margin of error is not limited to 1 km and may be set to suit the embodiment.) From the acquired records, the object estimation unit 405 takes the specific object name of each record whose general object name matches the general object name of the estimation target moving image frame acquired in step S705, and registers it as the object estimation result in the object estimation result field of the record for the estimation target moving image frame in the moving image frame metadata storage table of the moving image database 412.

ステップＳ７１６では、物体推定部４０５は、ステップＳ７１５で一時記憶領域に保存した位置情報から、ステップＳ７１５で一時記憶領域に保存した経過時間で、ステップＳ７１４で計算した移動速度によって移動可能な範囲内にある、物体認識データベース４１３の、物体推定用テーブルに保存されているレコードを取得する。物体推定部４０５は、前記取得したレコード群のうち、該レコードの一般物体名称と、ステップＳ７０５で取得した推定対象動画フレームの一般物体名称とが合致するレコードの特定物体名称を、物体推定結果として、推定対象動画フレームが示す動画データベース４１２の動画フレームメタデータ保存テーブルのレコードの、物体推定結果に登録する。 In step S716, the object estimation unit 405 falls within a range that can be moved according to the moving speed calculated in step S714, based on the elapsed time saved in the temporary storage area in step S715, from the position information saved in the temporary storage area in step S715. A record stored in an object estimation table of an object recognition database 413 is acquired. The object estimation unit 405 uses, as the object estimation result, the specific object name of the record in which the general object name of the record matches the general object name of the estimation target moving image frame acquired in step S705 in the acquired record group. The object estimation result of the record in the movie frame metadata storage table of the movie database 412 indicated by the estimation target movie frame is registered.
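
The record lookup of steps S713 and S716 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: because the direction of travel is not known, the reachable area is treated as a circle whose radius is the travelled distance plus the margin of error; the record layout (`general_name`, `specific_name`, `lat`, `lon`), the function names, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_specific_names(origin, speed_kmh, elapsed_s, general_name,
                            records, error_km=1.0):
    """Return specific object names of records that are reachable from
    `origin` and whose general object name matches `general_name`."""
    # Distance that can be covered in the elapsed time, plus the error margin.
    reach_km = speed_kmh * elapsed_s / 3600.0 + error_km
    names = []
    for rec in records:
        dist = haversine_km(origin[0], origin[1], rec["lat"], rec["lon"])
        if dist <= reach_km and rec["general_name"] == general_name:
            names.append(rec["specific_name"])
    return names


# Hypothetical object estimation table (coordinates are made up).
SAMPLE_RECORDS = [
    {"general_name": "building", "specific_name": "near hall", "lat": 0.0, "lon": 0.01},
    {"general_name": "building", "specific_name": "far hall", "lat": 0.0, "lon": 0.1},
    {"general_name": "ticket gate", "specific_name": "near gate", "lat": 0.0, "lon": 0.005},
]
```

Calling `estimate_specific_names((0.0, 0.0), 4.0, 900, "building", SAMPLE_RECORDS)` corresponds to the fixed walking speed of step S713; passing the speed estimated in step S714 corresponds to step S716. Returning every match keeps multiple estimation results available for later correction by the user.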

ステップＳ７１７では、物体推定部４０５は、未処理の動画がある場合は、ステップＳ７０２へ戻る。未処理の動画がない場合は、処理を終了する。 In step S717, when there is an unprocessed moving image, the object estimation unit 405 returns to step S702. If there is no unprocessed moving image, the process ends.

以上の図７のフローチャートで示す処理により、動画中に一般物体名称は特定されたものの、特定物体としては認識できなかった場合であっても、他の動画の情報に基づき、当該一般物体の具体的な名称等を特定することが可能となる。 Through the processing shown in the flowchart of FIG. 7 above, even when an object in a moving image has been assigned a general object name but could not be recognized as a specific object, a concrete name or the like for that general object can be identified based on information from other moving images.

例えば、図６に示す処理により、「建物」や「改札」として認識された物体について、他の動画の情報を用いることで、当該建物が「増上寺」であると推定したり、当該改札が「東京駅」の改札であると推定することが可能となる。 For example, for objects recognized as a "building" or a "ticket gate" by the processing shown in FIG. 6, information from other moving images can be used to estimate that the building is "Zojoji" or that the ticket gate is a ticket gate of "Tokyo Station".

次に、図１１を用いて、本発明の実施形態における動画編集システムにおける要約生成処理の手順について説明する。図１１は、本発明の実施形態における動画編集システムにおける要約生成処理の手順を示すフローチャートである。 Next, a summary generation processing procedure in the moving image editing system according to the embodiment of the present invention will be described with reference to FIG. FIG. 11 is a flowchart showing the procedure of the summary generation process in the moving image editing system according to the embodiment of the present invention.

ステップＳ１１０１では、要約生成指定・要約対象表示部３０１は、利用者による要約候補生成指示を検知した場合はステップＳ１１１０へ進み、検知していない場合はステップＳ１１０２へ進む。 In step S1101, the summary generation designation / summarization target display unit 301 proceeds to step S1110 if a summary candidate generation instruction from the user is detected, and proceeds to step S1102 if not detected.

（図１７の説明） (Explanation of FIG. 17)

ここで図１７を参照して、本発明における動画編集システムの利用者端末１０１の要約生成指定・要約対象表示部３０１における表示画面の一例について説明する。 Here, with reference to FIG. 17, an example of a display screen in the summary generation designation / summarization target display unit 301 of the user terminal 101 of the moving image editing system according to the present invention will be described.

１７０１は、利用者が要約動画の生成指示を、要約生成装置１０２に送信するためのボタンを表している。 Reference numeral 1701 denotes a button for the user to send a summary moving image generation instruction to the summary generation apparatus 102.

１７０２で指示される表示領域は、要約生成指示を送信する際の、要約の対象動画とする動画の一覧を、各動画の代表的な静止画像１枚で表示するための領域である。各動画の代表的な静止画像とは、例えば、動画の先頭フレームで表される画像であっても良いし、動画データベース４１２の動画フレームメタデータ保存テーブルから、利用者が一目見て動画の内容がわかるように、特定物体名称を持つフレームを選択しても良い。 The display area indicated by 1702 is an area that lists the moving images to be summarized when a summary generation instruction is transmitted, showing each moving image as one representative still image. The representative still image of a moving image may be, for example, the image of its first frame, or a frame having a specific object name may be selected from the moving image frame metadata storage table of the moving image database 412 so that the user can grasp the content of the moving image at a glance.

１７０３は、利用者が要約の対象とする動画候補を、撮影期間やキーワードの条件に基いて検索する指示を、要約生成装置１０２に送信するためのボタンを表している。 Reference numeral 1703 denotes a button for transmitting, to the summary generation apparatus 102, an instruction for a user to search for a moving image candidate to be summarized based on a shooting period or a keyword condition.

１７０４は、利用者が動画検索のために、動画の撮影時間を検索の条件として設定するための入力フィールドである。 Reference numeral 1704 denotes an input field for the user to set the video shooting time as a search condition for video search.

１７０５は、利用者が、動画に付帯するキーワード、例えば、動画データベース４１２の動画フレームメタデータ管理テーブルに保存されている、特定物体名称に合致する動画を、検索の条件として設定するための入力フィールドである。 Reference numeral 1705 denotes an input field for setting a keyword attached to a moving image, for example, a moving image matching a specific object name stored in the moving image frame metadata management table of the moving image database 412 as a search condition. It is.

１７０６で指示される表示領域は、利用者が１７０３のボタンを押下して動画検索を指示した時の、検索結果に含まれる各動画を、各動画の代表的な静止画像１枚で表示するための領域である。 The display area indicated by 1706 displays each moving image included in the search result when the user presses the button 1703 to instruct a moving image search as one representative still image of each moving image. It is an area.

１７０７は、検索結果の動画フレームが、動画データベース４１２の動画フレームメタデータ管理テーブルに保存されている、特定物体名称または物体推定結果を持つ場合、該特定物体名称または該物体推定結果を、該動画フレーム上に表示することで、利用者が、該動画の内容を一目見て把握できるようにしていることを表している。ここで、物体推定結果として、複数の推定結果を持っている場合、該推定結果を全て表示することで、利用者が、後述するメタデータ訂正処理を行うことにより、効率的に物体推定結果の訂正を行うことができる。 When the moving image frame of the search result has a specific object name or an object estimation result stored in the moving image frame metadata management table of the moving image database 412, 1707 indicates the specific object name or the object estimation result as the moving image By displaying on the frame, it indicates that the user can grasp the contents of the moving image at a glance. Here, when there are a plurality of estimation results as the object estimation results, by displaying all the estimation results, the user can efficiently perform the object estimation result by performing the metadata correction processing described later. Corrections can be made.

１７０８は、動画フレームが、動画データベース４１２の動画フレームメタデータ管理テーブルに保存されている、特定物体説明情報を持つ場合、該特定物体説明情報を、該動画フレーム上に表示することで、利用者が、該動画の内容と、該動画フレームに紐付けられている特定物体の内容を把握できるようにしていることを表している。 If the moving image frame has specific object description information stored in the moving image frame metadata management table of the moving image database 412, 1708 displays the specific object description information on the moving image frame so that the user Represents that the content of the moving image and the content of the specific object associated with the moving image frame can be grasped.

１７０７と１７０８により、利用者は、動画フレームに紐付けられている特定物体の内容をひと目で把握できるとともに、該特定物体が実際の該動画フレームに映っている物体と異なる場合には、即座に訂正しやすくなる。 With 1707 and 1708, the user can grasp at a glance the content of the specific object associated with a moving image frame, and can readily correct it on the spot when the specific object differs from the object actually shown in that frame.

１７０９は、利用者が、上述したように、メタデータを訂正、例えば、マウスで１７０７で示された領域をクリックして、正しい特定物体名称を入力するなどした後に、該訂正結果を要約生成装置１０２に送信するためのボタンである。 1709, after the user corrects the metadata as described above, for example, by clicking the area indicated by 1707 with the mouse and inputting the correct specific object name, the summary generation device displays the correction result. This is a button for transmitting to 102.

１７１０は、利用者が、１７０６の表示領域に示されている動画を１つ、あるいは複数、マウスで選択し、該選択動画を、要約対象動画に追加するためのボタンである。利用者が、動画を選択し、１７１０を押下すると、該動画は１７０２で指示される表示領域に追加される。 Reference numeral 1710 denotes a button for the user to select one or a plurality of moving images shown in the display area 1706 with the mouse and add the selected moving images to the summary target moving image. When the user selects a moving image and presses 1710, the moving image is added to the display area indicated by 1702.

なお、要約対象動画に追加する方法については、ボタン１７１０の押下に限らず、１７０６の表示領域に示されている動画を選択し、当該動画をドラッグし、１７０２の表示領域にドロップすることで追加するよう構成しても良い。 Note that the method of adding to the summary target moving images is not limited to pressing the button 1710; the system may also be configured so that a moving image shown in the display area 1706 is added by selecting it, dragging it, and dropping it in the display area 1702.

１７１１は、利用者が、要約を生成する際の条件を設定する際に押下するボタンである。当該ボタンを押下すると、図１８に示されるような画面が表示される。 Reference numeral 1711 denotes a button that the user presses when setting conditions for generating a summary. When the button is pressed, a screen as shown in FIG. 18 is displayed.

（図１８の説明） (Explanation of FIG. 18)

ここで図１８を参照して、前述した、１７１１のボタンを押下した際に表示される、利用者が要約動画生成の条件を設定するための表示画面の一例について説明する。 Here, with reference to FIG. 18, an example of a display screen that is displayed when the user presses the button 1711 described above and for the user to set conditions for generating a summary video will be described.

１８０１は、利用者が、要約動画に、優先して含まれてほしい動画を設定するために、動画フレームの持つ特定物体名称を指定するための入力フィールドである。 Reference numeral 1801 denotes an input field in which the user designates a specific object name held by moving image frames, in order to set the moving images that should be included preferentially in the summary moving image.

１８０２は、利用者が、生成される要約動画の再生時間を設定するための、入力フィールドである。 Reference numeral 1802 denotes an input field for the user to set the playback time of the generated summary video.

１８０３は、利用者が、要約動画生成の条件の設定を終了するためのボタンである。 Reference numeral 1803 denotes a button for the user to finish setting the conditions for generating the summary video.

以上、説明したように、利用者は、図１７に示される画面を利用して、動画の検索指示、要約対象動画の指定、メタデータの訂正指示、要約候補の生成指示を行うことができる。 As described above, the user can use the screen shown in FIG. 17 to issue a moving image search instruction, designate summary target moving images, issue a metadata correction instruction, and issue a summary candidate generation instruction.

図１１の説明に戻る。 Returning to the description of FIG. 11.

ステップＳ１１０２では、要約生成指定・要約対象表示部３０１は、利用者による動画検索指示を検知した場合は、ステップＳ１１０３に進み、検知していない場合は、ステップＳ１１０７へ進む。 In step S1102, the summary generation designation / summarization target display unit 301 proceeds to step S1103 if a moving image search instruction by the user is detected, and proceeds to step S1107 if not detected.

ステップＳ１１０３では、要約生成指定・要約対象表示部３０１は、前述した動画の撮影期間と動画に付帯するキーワードを、検索クエリーとして要約生成装置１０２へ送信する。 In step S 1103, the summary generation designation / summarization target display unit 301 transmits the above-described moving image shooting period and the keyword attached to the moving image to the summary generation device 102 as a search query.

ステップＳ１１０４では、要約生成装置１０２の、要約候補生成部４０６は、ステップＳ１１０３で送信された検索クエリーを受信し、該検索クエリーを動画検索部４０７へ入力し、動画検索処理を指示する。動画検索部４０７は、当該検索クエリーの、動画撮影期間に、動画データベース４１２の動画メタデータ保存テーブルの撮影日時が合致する動画と、当該検索クエリーのキーワードを、動画データベース４１２の動画フレームメタデータ保存テーブルの、特定物体名称または物体推定結果または曖昧検索インデックスに持つ動画を、動画検索結果として、要約候補生成部４０６へ応答する。 In step S1104, the summary candidate generation unit 406 of the summary generation apparatus 102 receives the search query transmitted in step S1103, inputs the search query to the moving image search unit 407, and instructs it to perform moving image search processing. The moving image search unit 407 returns to the summary candidate generation unit 406, as the moving image search result, the moving images whose shooting date and time in the moving image metadata storage table of the moving image database 412 falls within the shooting period of the search query, and the moving images that have the keyword of the search query as a specific object name, an object estimation result, or a fuzzy search index entry in the moving image frame metadata storage table of the moving image database 412.
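
The search of step S1104 might be sketched as follows. This is only an illustration: the text does not state how the period condition and the keyword condition are combined, so the sketch assumes that both must hold; the fuzzy search index is simplified to a list of exactly matched terms; and the field names (`video_id`, `shot_at`, `specific_name`, `estimates`, `fuzzy_index`) are hypothetical.

```python
from datetime import datetime


def search_videos(videos, frame_metadata, period_start: datetime,
                  period_end: datetime, keyword: str):
    """Return ids of videos shot within the period that also have a frame
    carrying `keyword` as specific name, estimate, or index term."""
    hits = []
    for video in videos:
        in_period = period_start <= video["shot_at"] <= period_end
        has_keyword = False
        for frame in frame_metadata:
            if frame["video_id"] != video["video_id"]:
                continue
            # Collect every searchable value attached to this frame.
            fields = [frame.get("specific_name")]
            fields += frame.get("estimates", [])
            fields += frame.get("fuzzy_index", [])
            if keyword in fields:
                has_keyword = True
                break
        if in_period and has_keyword:
            hits.append(video["video_id"])
    return hits
```

Searching the object estimation results as well as the specific object names means that frames labelled only by the estimation of FIG. 7 can still be found by keyword.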

ステップＳ１１０５では、要約候補生成部４０６は、ステップＳ１１０４で動画検索部４０７より応答された動画検索結果を、利用者端末１０１の、要約生成指定・要約対象表示部３０１へ送信する。 In step S 1105, the summary candidate generating unit 406 transmits the moving image search result responded by the moving image search unit 407 in step S 1104 to the summary generation designation / summarization target display unit 301 of the user terminal 101.

ステップＳ１１０６では、要約生成指定・要約対象表示部３０１は、受信した動画検索結果を表示する。利用者は、表示された検索結果から、要約対象として追加したい動画を選択し、要約対象動画に追加する。要約生成指定・要約対象表示部３０１は、追加された要約対象動画を、一時記憶領域に記録する。 In step S1106, the summary generation designation / summarization target display unit 301 displays the received moving image search result. The user selects a video to be added as a summary target from the displayed search results and adds it to the summary target video. The summary generation designation / summarization target display unit 301 records the added summary target moving image in the temporary storage area.

ステップＳ１１０７では、要約生成指定・要約対象表示部３０１は、利用者によるメタデータ訂正指示を検知した場合は、ステップＳ１１０８に進み、検知していない場合は、ステップＳ１１０１へ戻る。 In step S1107, the summary generation designation / summarization target display unit 301 proceeds to step S1108 when detecting a metadata correction instruction by the user, and returns to step S1101 when not detected.

ステップＳ１１０８では、要約生成指定・要約対象表示部３０１は、動画フレームＩＤと、前述した１７０７に入力されたメタデータ訂正結果を、メタデータ訂正クエリーとして要約生成装置１０２へ送信する。 In step S1108, the summary generation designation / summarization target display unit 301 transmits the moving image frame ID and the metadata correction result input in the above-described 1707 to the summary generation apparatus 102 as a metadata correction query.

ステップＳ１１０９では、要約生成装置１０２の要約候補生成部４０６は、ステップＳ１１０８で送信されたメタデータ訂正クエリーを受信し、該メタデータ訂正クエリーを、メタデータ訂正部４０８へ入力し、メタデータ訂正を指示する。メタデータ訂正部４０８は、当該メタデータ訂正クエリーの動画フレームＩＤが示す動画フレームに対して、動画データベース４１２の動画フレームメタデータ保存テーブルの特定物体名称に、当該メタデータ訂正クエリーのメタデータ訂正結果を登録する。 In step S1109, the summary candidate generation unit 406 of the summary generation apparatus 102 receives the metadata correction query transmitted in step S1108, inputs the metadata correction query to the metadata correction unit 408, and performs metadata correction. Instruct. The metadata correction unit 408 adds the metadata correction result of the metadata correction query to the specific object name in the moving image frame metadata storage table of the moving image database 412 for the moving image frame indicated by the moving image frame ID of the metadata correction query. Register.

ステップＳ１１１０では、要約生成指定・要約対象表示部３０１は、前述した１７０２で指示される表示領域の、対象動画群と、図１８で示される画面により設定された要約生成の条件を、要約生成クエリーとして、要約生成装置１０２へ送信する。 In step S1110, the summary generation designation / summarization target display unit 301 displays the target movie group in the display area designated by 1702 described above and the conditions for the summary generation set by the screen shown in FIG. To the summary generation apparatus 102.

ステップＳ１１１１では、要約生成装置１０２の、要約候補生成部４０６は、ステップＳ１１１０で送信された要約生成クエリーより、要約候補を生成する。要約候補生成部４０６は、当該要約候補結果を要約候補結果出力部４１０へ入力し、要約候補結果の送信を指示する。ステップＳ１１１１の要約候補生成の詳細処理は、図１２を用いて後述する。 In step S 1111, the summary candidate generation unit 406 of the summary generation apparatus 102 generates a summary candidate from the summary generation query transmitted in step S 1110. The summary candidate generation unit 406 inputs the summary candidate result to the summary candidate result output unit 410 and instructs transmission of the summary candidate result. Details of the summary candidate generation in step S1111 will be described later with reference to FIG.

ステップＳ１１１２では、要約候補結果出力部４１０は、要約候補結果を利用者端末１０１の、要約候補表示・編集部３０２へ送信する。 In step S1112, the summary candidate result output unit 410 transmits the summary candidate result to the summary candidate display / editing unit 302 of the user terminal 101.

ステップＳ１１１３では、要約候補表示・編集部３０２は、受信した要約候補結果を表示する。利用者は、表示された当該要約候補結果を確認する。 In step S1113, the summary candidate display / editing unit 302 displays the received summary candidate result. The user confirms the displayed summary candidate result.

（図１９の説明） (Explanation of FIG. 19)

ここで図１９を参照して、本発明における動画編集システムの利用者端末１０１の要約候補表示・編集部３０２における表示画面の一例について説明する。 Here, an example of a display screen in the summary candidate display / editing unit 302 of the user terminal 101 of the moving image editing system according to the present invention will be described with reference to FIG.

１９０１で指示される表示領域は、要約候補結果を、当該要約候補を構成する各動画の代表的な静止画像を、要約動画の時系列となるようにつなげて（タイムラインと呼ぶ）表示するための領域である。 The display area designated by 1901 displays the summary candidate result by connecting the representative still images of the moving images constituting the summary candidate in time series of the summary moving images (referred to as a timeline). It is an area.

１９０２は、利用者が、１９０１に表示される要約候補の編集を行うためのボタンである。 Reference numeral 1902 denotes a button for the user to edit the summary candidate displayed in 1901.

１９０３は、利用者が、１９０１に表示される要約候補を、最終的な要約動画として出力するためのボタンである。 Reference numeral 1903 denotes a button for the user to output the summary candidate displayed in 1901 as a final summary video.

（図２０の説明） (Explanation of FIG. 20)

ここで図２０を参照して、前述した、１９０２のボタンを押下した際に表示される、利用者が要約候補動画の編集を行うための表示画面の一例について説明する。 Here, with reference to FIG. 20, an example of a display screen that is displayed when the user presses the above-described button 1902 for the user to edit the summary candidate video will be described.

２００１は、編集中の要約候補を表示しているタイムラインである。 Reference numeral 2001 denotes a timeline displaying summary candidates being edited.

２００２は、利用者が、最終的な要約動画を出力するためのボタンである。 Reference numeral 2002 denotes a button for the user to output a final summary video.

２００３で指示される表示領域は、利用者が新たに要約動画に追加したい動画を、動画検索を行って表示するための領域である。利用者は、例えば、本領域に表示された動画を代表する静止画像を、マウスを利用してドラッグアンドドロップの操作を行い、２００１で指示されるタイムライン上の、動画を追加したい箇所へ移動することで、要約候補の編集処理を行うことができる。 The display area indicated in 2003 is an area for performing a video search and displaying a video that the user wants to newly add to the summary video. For example, the user performs a drag-and-drop operation using a mouse on a still image representing the moving image displayed in this area, and moves to a location on the timeline indicated in 2001 where the moving image is to be added. This makes it possible to perform summary candidate editing processing.

２００４は、利用者が、例えば、２００１で指示されるタイムライン上の静止画像をマウスでクリックした後に、その次の動画として、より自然につながるような素材動画の推薦結果の表示を指示するためのボタンである。推薦動画は、例えば、要約生成装置１０２が、動画の各フレームの画像特徴量を平均し（動画特徴量と呼ぶ）、利用者が選択した動画の動画特徴量との類似度（特徴量同士のユークリッド距離などにより計算される）を計算することによって行われる。要約生成装置１０２は、計算した類似度が小さい順に、例えば５個の動画を推薦結果として利用者端末１０１に送信する。 Reference numeral 2004 denotes a button with which the user, for example after clicking a still image on the timeline indicated by 2001, instructs display of recommended material moving images that would connect more naturally as the next moving image. To recommend moving images, the summary generation apparatus 102, for example, averages the image feature amounts of the frames of each moving image (the result is called the moving image feature amount) and calculates its similarity to the moving image feature amount of the moving image selected by the user (calculated, for example, as the Euclidean distance between the feature amounts). The summary generation apparatus 102 then transmits, for example, five moving images to the user terminal 101 as the recommendation result, in ascending order of the calculated value (that is, closest first when the Euclidean distance is used).
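
The recommendation described for 2004 might be sketched as follows, assuming that each frame's image feature amount is a fixed-length list of numbers and that the Euclidean distance is used. The function names and the dictionary layout of the candidates are hypothetical.

```python
import math


def video_feature(frame_features):
    """Average the per-frame feature vectors into one vector per video."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def recommend(selected_frames, candidates, k=5):
    """Return up to k candidate ids, closest video feature first.

    `candidates` maps a video id to that video's list of frame features.
    """
    target = video_feature(selected_frames)
    ranked = sorted(
        candidates.items(),
        key=lambda item: math.dist(target, video_feature(item[1])),
    )
    return [video_id for video_id, _ in ranked[:k]]
```

For instance, with a selected video whose averaged feature is `[1.0, 1.0]`, a candidate averaging `[1.0, 1.0]` would be ranked ahead of one averaging `[1.0, 2.0]`.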

２００５は、利用者が２００４のボタンを押下して、動画推薦結果の表示を指示した時の、推薦結果に含まれる動画を、各動画の代表的な静止画像１枚で表示するための領域である。 Reference numeral 2005 denotes an area for displaying the moving images included in the recommendation result, each as one representative still image, when the user presses the button 2004 to instruct display of the moving image recommendation result.

以上、説明したように、利用者は、図２０に示される画面を利用して、要約動画の素材となる動画の入れ替えや再生時間の変更、新たに検索した動画を追加するなどの操作により、要約候補動画の編集処理と要約動画の出力指示を行うことができる。 As described above, the user can use the screen shown in FIG. 20 to edit the summary candidate moving image, through operations such as replacing the material moving images, changing playback times, and adding newly searched moving images, and to instruct output of the summary moving image.

図１１の説明に戻る。 Returning to the description of FIG. 11.

ステップＳ１１１４では、要約候補表示・編集部３０２は、利用者による要約候補の修正を検知した場合は、ステップＳ１１１５へ進み、検知していない場合はステップＳ１１１６へ進む。 In step S1114, the summary candidate display / editing unit 302 proceeds to step S1115 if the correction of the summary candidate by the user is detected, and proceeds to step S1116 if not detected.

ステップＳ１１１５では、要約候補表示・編集部３０２は、前述したように、利用者による図２０で示される画面を利用した要約候補動画の編集処理を行う。 In step S1115, the summary candidate display / editing unit 302 performs a summary candidate moving image editing process using the screen shown in FIG. 20 by the user, as described above.

ステップＳ１１１６では、要約候補表示・編集部３０２は、利用者による要約動画出力指示を検知し、要約動画出力指示を要約生成装置１０２の要約候補結果出力部に送信する。 In step S 1116, summary candidate display / editing section 302 detects a summary video output instruction by the user and transmits a summary video output instruction to summary candidate result output section of summary generation apparatus 102.

ステップＳ１１１７では、要約候補結果出力部４１０は、ステップＳ１１１６で送信された要約動画出力指示により、最終的な要約動画を作成し、出力する。出力先は、例えば、要約生成装置１０２が備える外部記憶装置や、利用者端末１０１が備える外部記憶装置であってもよい。 In step S1117, the summary candidate result output unit 410 creates and outputs the final summary video in accordance with the summary video output instruction transmitted in step S1116. The output destination may be, for example, an external storage device included in the summary generation apparatus 102 or an external storage device included in the user terminal 101.

以上、図１１を用いて、本発明の実施形態における動画編集システムにおける要約生成処理の手順について説明した。 The summary generation process procedure in the moving image editing system according to the embodiment of the present invention has been described above with reference to FIG.

次に、図１２を用いて、本発明の実施形態における、動画編集システムにおける要約候補生成処理の詳細処理について説明する。図１２は、本発明の実施形態における、動画編集システムにおける要約候補生成処理の詳細処理を示すフローチャートである。 Next, detailed processing of the summary candidate generation processing in the moving image editing system in the embodiment of the present invention will be described using FIG. FIG. 12 is a flowchart showing a detailed process of the summary candidate generation process in the moving image editing system according to the embodiment of the present invention.

ステップＳ１２０１では、要約候補生成部４０６は、動画データベース４１２の動画メタデータ保存テーブルから、受信した要約対象動画群の動画メタデータを取得する。 In step S 1201, the summary candidate generating unit 406 acquires the received moving image metadata of the summary target moving image group from the moving image metadata storage table of the moving image database 412.

ステップＳ１２０２では、要約候補生成部４０６は、受信した要約生成クエリーより、要約生成の条件を取得する。 In step S1202, the summary candidate generation unit 406 obtains a summary generation condition from the received summary generation query.

ステップＳ１２０３では、要約候補生成部４０６は、ステップＳ１２０１で取得した要約対象動画群の動画メタデータと、ステップＳ１２０２で取得した要約生成の条件と、を要約重みベクトル生成部４０９へ入力し、要約候補を構成する、各要約対象動画の再生フレーム数を決定するための、要約重みベクトル生成処理を指示する。ステップＳ１２０３の要約重みベクトル生成の詳細処理は、図１３を用いて後述する。 In step S1203, the summary candidate generation unit 406 inputs the video metadata of the summary target video group acquired in step S1201 and the summary generation condition acquired in step S1202 to the summary weight vector generation unit 409, and the summary candidate The summarization weight vector generation process for determining the number of playback frames of each summarization target moving image is configured. Detailed processing for generating the summary weight vector in step S1203 will be described later with reference to FIG.

ステップＳ１２０４では、要約候補生成部４０６は、要約対象動画データ群に対する繰り返し処理を開始する。 In step S1204, the summary candidate generating unit 406 starts an iterative process for the summary target moving image data group.

ステップＳ１２０５では、要約候補生成部４０６は、ステップＳ１２０１で取得した要約対象動画のフレーム数と、ステップＳ１２０３で生成した要約重みベクトルの、要約対象動画に対応する重みより、当該要約動画の再生フレーム数を計算する。 In step S1205, the summary candidate generating unit 406 determines the number of playback frames of the summary video from the number of frames of the summary target video acquired in step S1201 and the weight corresponding to the summary target video of the summary weight vector generated in step S1203. Calculate

ステップＳ１２０６では、要約候補生成部４０６は、動画データベース４１２の動画フレームメタデータから、要約対象動画に該当する動画フレームのメタデータ群を取得する。要約候補生成部４０６は、前記取得した動画フレームメタデータ群のうち、特定物体名称または物体推定結果に、ステップＳ１２０２で取得した要約生成の条件の優先キーワードと合致する動画フレームが存在する場合、当該動画フレームを中間フレームとし、ステップＳ１２０５で計算した再生フレーム数を満たすように、当該要約動画の再生フレームＮｏ群を決定する。要約候補生成部４０６は、前記取得した動画フレームメタデータ群のうち、特定物体名称または物体推定結果に、ステップＳ１２０２で取得した要約生成の条件の優先キーワードと合致する動画フレームが存在しない場合、当該要約対象動画の開始フレームからステップＳ１２０５で計算した再生フレーム数を、当該要約動画の再生フレームＮｏ群として決定する。 In step S1206, the summary candidate generation unit 406 acquires, from the moving image frame metadata of the moving image database 412, the metadata of the moving image frames belonging to the summary target moving image. If the acquired frame metadata contains a moving image frame whose specific object name or object estimation result matches a priority keyword of the summary generation conditions acquired in step S1202, the summary candidate generation unit 406 takes that frame as the middle frame and determines the playback frame Nos. of the summary video so that the number of playback frames calculated in step S1205 is satisfied. If no such frame exists, the summary candidate generation unit 406 determines, as the playback frame Nos. of the summary video, the number of playback frames calculated in step S1205 counted from the start frame of the summary target moving image.
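
The frame selection of step S1206 might be sketched as follows. This is a minimal illustration: frame Nos. are assumed to start at 0, and clamping the window to the bounds of the moving image when the matching frame lies near either end is an assumption that the text does not describe. The function name is hypothetical.

```python
def playback_frames(total_frames, play_count, matched_frame=None):
    """Choose the frame Nos. to play from one summary target video.

    With a keyword-matching frame, the window is centred on it;
    otherwise the window starts at the first frame.
    """
    play_count = min(play_count, total_frames)
    if matched_frame is None:
        start = 0
    else:
        start = matched_frame - play_count // 2
        # Assumption: keep the window inside the video.
        start = max(0, min(start, total_frames - play_count))
    return range(start, start + play_count)
```

For example, `playback_frames(100, 10, 50)` yields frames 45 through 54, while `playback_frames(100, 10)` yields frames 0 through 9.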

ステップＳ１２０７では、要約候補生成部４０６は、未処理の動画がある場合は、ステップＳ１２０５へ戻る。未処理の動画がない場合は、ステップＳ１２０８へ進む。 In step S1207, if there is an unprocessed moving image, the summary candidate generating unit 406 returns to step S1205. If there is no unprocessed moving image, the process advances to step S1208.

ステップＳ１２０８では、要約候補生成部４０６は、ステップＳ１２０６で決定した各要約対象動画の再生フレームＮｏ群で構成される、要約候補を生成し、該要約候補結果を要約候補結果出力部４１０へ入力し、要約候補結果の送信を指示する。 In step S1208, the summary candidate generation unit 406 generates a summary candidate composed of the playback frame Nos. of each summary target moving image determined in step S1206, inputs the summary candidate result to the summary candidate result output unit 410, and instructs transmission of the summary candidate result.

次に、図１３を用いて、本発明の実施形態における、動画編集システムにおける要約重みベクトル生成処理の詳細処理について説明する。図１３は、本発明の実施形態における、動画編集システムにおける要約重みベクトル生成処理の詳細処理を示すフローチャートである。 Next, detailed processing of the summary weight vector generation processing in the moving image editing system in the embodiment of the present invention will be described using FIG. FIG. 13 is a flowchart showing detailed processing of summary weight vector generation processing in the moving image editing system in the embodiment of the present invention.

ステップＳ１３０１では、要約重みベクトル生成部４０９は、受信した要約生成の条件の、出力動画時間が設定されていれば、当該出力動画時間を再生フレーム数に変換し、ｘに代入する。設定されていなければ、受信した要約対象動画群の動画メタデータのフレーム数を合算し、合算したフレーム数を例えば１０で割ったフレーム数をｘに代入する。 In step S1301, the summary weight vector generation unit 409 converts the output movie time into the number of playback frames if the output movie time is set as the received summary generation condition, and substitutes it for x. If not set, the number of frames of the moving image metadata of the received summary target moving image group is added up, and the number of frames obtained by dividing the combined number of frames by 10, for example, is substituted for x.
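
The calculation of x in step S1301 might be sketched as follows. The conversion from output time to frames is assumed here to be seconds multiplied by a single fps value, which the text does not specify; the divisor 10 is the example value given above. The function name and parameters are hypothetical.

```python
def target_frame_count(output_seconds, fps, frame_counts, divisor=10):
    """Return x, the total number of playback frames of the summary.

    output_seconds: requested summary length, or None if not set.
    frame_counts:   frame counts of all summary target videos.
    """
    if output_seconds is not None:
        return int(output_seconds * fps)
    # No length requested: use a fixed fraction of all available frames.
    return sum(frame_counts) // divisor
```

For example, a requested length of 60 seconds at 30 fps gives x = 1800, and with no requested length two moving images of 1000 and 2000 frames give x = 300.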

ステップＳ１３０２では、要約重みベクトル生成部４０９は、受信した各要約対象動画群の動画メタデータの各フレーム数と、要約対象動画の数と、図１４で示される式を用いて、各要約対象動画に対応する重みからなる、初期要約重みベクトルを生成する。 In step S1302, the summary weight vector generating unit 409 uses each frame of the received video metadata of each summary target video group, the number of summary target videos, and the formula shown in FIG. An initial summary weight vector consisting of weights corresponding to is generated.

ステップＳ１３０３では、要約重みベクトル生成部４０９は、動画データベース４１２の動画フレームメタデータから、受信した要約対象動画群に該当する動画フレームのメタデータ群を取得する。 In step S 1303, the summary weight vector generation unit 409 acquires a moving image frame metadata group corresponding to the received summary target moving image group from the moving image frame metadata of the moving image database 412.

ステップＳ１３０４では、要約重みベクトル生成部４０９は、受信した要約生成の条件の、優先キーワードに対する繰り返し処理を開始する。 In step S1304, the summary weight vector generation unit 409 starts an iterative process for the priority keyword under the received summary generation condition.

ステップＳ１３０５では、要約重みベクトル生成部４０９は、ステップＳ１３０３で取得した要約対象動画群のフレームメタデータの特定物体名称または物体推定結果に、優先キーワードと合致するフレームが存在しない動画群のなかで、要約重みベクトルの重みが最大の動画を選び、ｔとする。 In step S1305, the summary weight vector generation unit 409 includes a moving image group in which a frame matching the priority keyword does not exist in the specific object name or the object estimation result of the frame metadata of the moving image group to be summarized acquired in step S1303. A moving picture with the maximum weight of the summary weight vector is selected and is set to t.

ステップＳ１３０６では、要約重みベクトル生成部４０９は、ステップＳ１３０５で選択した動画ｔの重みを１／２に更新し、更新した当該重みをｔｗとする。 In step S1306, the summary weight vector generation unit 409 updates the weight of the moving image t selected in step S1305 to ½, and sets the updated weight as tw.

ステップＳ１３０７では、要約重みベクトル生成部４０９は、ステップＳ１３０３で取得した要約対象動画群のフレームメタデータの特定物体名称または物体推定結果に、優先キーワードと合致するフレームが存在する動画の数を、ｎとする。 In step S1307, the summary weight vector generation unit 409 calculates the number of moving images in which a frame matching the priority keyword exists in the specific object name or the object estimation result of the frame metadata of the summary target moving image group acquired in step S1303. And

ステップＳ１３０８では、要約重みベクトル生成部４０９は、ステップＳ１３０３で取得した要約対象動画群のフレームメタデータの特定物体名称または物体推定結果に、優先キーワードと合致するフレームが存在する動画に対する繰り返し処理を開始する。 In step S1308, the summary weight vector generation unit 409 starts an iterative process for a moving image in which a frame matching the priority keyword exists in the specific object name or the object estimation result of the frame metadata of the summarizing target moving image group acquired in step S1303. To do.

In step S1309, the summary weight vector generation unit 409 adds tw/n to the weight of the target moving image to update that weight, and designates the updated weight as uw.

In step S1310, the summary weight vector generation unit 409 sets m to the number of frames recorded in the moving image metadata of the target moving image.

In step S1311, if uw is greater than m/x, the summary weight vector generation unit 409 proceeds to step S1312; otherwise, it proceeds to step S1315.

In step S1312, the summary weight vector generation unit 409 subtracts m/x from uw and designates the result as uw'.

In step S1313, the summary weight vector generation unit 409 adds uw' to the weight of the moving image t, thereby updating the weight of the moving image t.

In step S1314, the summary weight vector generation unit 409 updates the weight of the target moving image to m/x.

In step S1315, if an unprocessed moving image remains, the summary weight vector generation unit 409 returns to step S1309; if none remains, it proceeds to step S1316.

In step S1316, if an unprocessed priority keyword remains, the summary weight vector generation unit 409 returns to step S1305; if none remains, the processing ends.

As described above with reference to FIG. 13, the summary weight vector can be generated so that a summarization target moving image containing frames that match a priority keyword is given a larger number of playback frames.
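For illustration only, the weight redistribution of steps S1305 to S1316 could be sketched in Python as follows. The names `Video`, `labels`, and `redistribute_weights` are hypothetical and do not appear in the specification. The sketch assumes that updating the weight "to 1/2" in step S1306 means halving the current weight, treats x as a positive value passed in by the caller because it is defined outside this passage, and skips a keyword when no moving image matches it or when every moving image matches it, a case the steps do not address.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    name: str
    weight: float       # this video's entry in the summary weight vector
    frame_count: int    # m in step S1310
    # Hypothetical field: specific object names and object estimation
    # results found in the frame metadata of this video.
    labels: set = field(default_factory=set)


def redistribute_weights(videos, priority_keywords, x):
    """Shift weight toward videos that contain a priority keyword."""
    for keyword in priority_keywords:
        matching = [v for v in videos if keyword in v.labels]
        others = [v for v in videos if keyword not in v.labels]
        if not matching or not others:
            continue  # assumption: nothing to move for this keyword

        t = max(others, key=lambda v: v.weight)  # S1305
        t.weight /= 2                            # S1306 (assumed: halving)
        tw = t.weight
        n = len(matching)                        # S1307

        for video in matching:                   # S1308
            uw = video.weight + tw / n           # S1309
            video.weight = uw
            cap = video.frame_count / x          # S1310: m / x
            if uw > cap:                         # S1311
                t.weight += uw - cap             # S1312, S1313
                video.weight = cap               # S1314
    return videos
```

Under the halving assumption, the amount removed from t equals tw, which is exactly what the matching moving images receive in total, and any excess over m/x is handed back to t. Each pass therefore leaves the sum of the weights unchanged.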

As described above, according to the present invention, it is possible to identify what an object included in a moving image is and to store the moving image data together with the identification result (for example, the moving image data can be stored together with the information that a "building" appears in the moving image and that the building is "Zojoji"). With the moving image data stored in this way, a user creating a summary moving image can enter a keyword to create a summary moving image that includes scenes showing the object indicated by that keyword (for example, when the user specifies the keyword "Zojoji" and instructs generation of a summary moving image, a summary moving image including scenes in which "Zojoji" appears is generated).
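As a rough illustration of storing identification results with the moving image data and looking them up by keyword, a minimal Python sketch follows. The record layout `FrameMetadata`, its field names, and the function `videos_matching` are hypothetical, and the sketch assumes that a keyword matches only when it is exactly equal to a stored label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameMetadata:
    video_id: str
    frame_index: int
    general_object: str                  # e.g. "building"
    specific_name: Optional[str] = None  # e.g. "Zojoji"; None if not identified


def videos_matching(frames, keyword):
    """Return sorted IDs of videos with at least one frame labeled with keyword."""
    return sorted({
        f.video_id
        for f in frames
        # Assumption: exact string equality against either label.
        if keyword in (f.general_object, f.specific_name)
    })
```

A caller would pass the stored frame records together with the user's keyword, and could then restrict summary generation to the returned moving images.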

In this way, the processing shown in FIGS. 4 to 7 improves convenience when searching for material moving images during moving image editing, such as when creating a summarized moving image.

Furthermore, the processing shown in FIGS. 11 to 13 makes it possible to generate the summary moving image that the user wants.

Needless to say, the structure and contents of the various data described above are not limited to those shown, and may take various forms depending on the application and purpose.

The program according to the present invention is a program that causes a computer to execute the processing of FIGS. 11 to 13 and FIG. 17. The program according to the present invention may also be provided as a separate program for each of the processes in FIGS. 11 to 13 and FIG. 17.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which a program implementing the functions of the above-described embodiment is recorded, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program stored on the recording medium.

In this case, the program itself read from the recording medium implements the novel functions of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Examples of recording media that can be used to supply the program include a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, EEPROM, and silicon disk.

Needless to say, the present invention covers not only the case where the functions of the above-described embodiment are implemented by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiment are implemented by that processing.

Furthermore, it goes without saying that the present invention also covers the case where the program read from the recording medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are implemented by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable where it is achieved by supplying a program to a system or apparatus. In this case, by reading a recording medium that stores a program for achieving the present invention into the system or apparatus, the system or apparatus can enjoy the effects of the present invention.

Furthermore, by downloading and reading a program for achieving the present invention from a server, database, or the like on a network by means of a communication program, the system or apparatus can enjoy the effects of the present invention. Configurations that combine the above-described embodiments and their modifications are all included in the present invention.

User terminal 101
Summary generation device 102

Claims

An information processing apparatus for generating a summary video based on a plurality of summary target videos, the apparatus comprising:
acquisition means for acquiring the summary target videos;
object feature amount extraction means for extracting, from a moving image frame of a summary target video acquired by the acquisition means, an object feature amount for identifying an object included in the moving image frame; and
object specifying means for specifying the object included in the moving image frame based on the object feature amount extracted by the object feature amount extraction means.

The information processing apparatus according to claim 1, further comprising specific object management means for managing the name of a specific object and the object feature amount of the specific object in association with each other,
wherein the object specifying means specifies a specific object included in the moving image frame by comparing the object feature amount extracted by the object feature amount extraction means with the object feature amount managed by the specific object management means.

The information processing apparatus according to claim 1 or 2, wherein the object specifying means further specifies a general object included in the moving image frame using the feature amount extracted by the feature amount extraction means and learning data stored in advance.

The information processing apparatus according to claim 3, further comprising object estimation means for estimating the specified general object by using time information on when a moving image frame in which a specific object was specified by the object specifying means was captured, position information of the specific object, and time information on when a moving image frame in which a general object was specified by the object specifying means was captured.

The information processing apparatus according to item 1, further comprising summary video generation means for generating a summary video,
wherein the summary video generation means generates a summary video that displays the name of the specific object specified by the object specifying means together with the moving image frame that includes the specific object.

The information processing apparatus according to claim 1, further comprising:
second acquisition means for acquiring information relating to the object specified by the object specifying means;
analysis means for performing morphological analysis on the information acquired by the second acquisition means; and
object description information management means for managing the information acquired by the second acquisition means and the result of analysis by the analysis means in association with the moving image frame.

The information processing apparatus according to claim 6, further comprising:
search word accepting means for accepting a search keyword for searching for a summary target video; and
search means for searching for a summary target video by searching the name of the specific object, the information acquired by the second acquisition means, and the result of analysis by the analysis means, using the search keyword accepted by the search word accepting means.

The information processing apparatus according to claim 5, further comprising:
correction accepting means for accepting a correction instruction from the user for the name of the specific object; and
correction means for correcting the name of the specific object in accordance with the correction instruction accepted by the correction accepting means.

An information processing method in an information processing apparatus for generating a summary video based on a plurality of summary target videos, the method comprising:
an acquisition step in which acquisition means of the information processing apparatus acquires the summary target videos;
an object feature amount extraction step in which object feature amount extraction means of the information processing apparatus extracts, from a moving image frame of a summary target video acquired in the acquisition step, an object feature amount for identifying an object included in the moving image frame; and
an object specifying step of specifying the object included in the moving image frame based on the object feature amount extracted in the object feature amount extraction step.

A program executable in an information processing apparatus that generates a summary video based on a plurality of summary target videos, the program causing the information processing apparatus to function as:
acquisition means for acquiring the summary target videos;
object feature amount extraction means for extracting, from a moving image frame of a summary target video acquired by the acquisition means, an object feature amount for identifying an object included in the moving image frame; and
object specifying means for specifying the object included in the moving image frame based on the object feature amount extracted by the object feature amount extraction means.