JP2019092025A

JP2019092025A - Editing system

Info

Publication number: JP2019092025A
Application number: JP2017219011A
Authority: JP
Inventors: 治彦小島; Haruhiko Kojima
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2017-11-14
Filing date: 2017-11-14
Publication date: 2019-06-13
Anticipated expiration: 2037-11-14
Also published as: JP6934402B2

Abstract

PROBLEM TO BE SOLVED: To provide a technology that makes it easy to detect the face image of a search-target person in a program (in video data) so that editing such as mosaic processing can be performed effectively. SOLUTION: An editor has a similar face image detection device 16 read a face image file of the target performer (detection target face image). The similar face image detection device 16 compares the detection target face image with the face images in a face image storage server 127 and thereby retrieves the face images of scenes in which a performer with the same face appears, together with their time code information. The time code information of the detected appearance scenes is transferred to an editing device 14. The editor can thus produce a special feature program on the performer, or apply a mosaic effect to the performer's image, without the time and labor of searching for the scenes in which the performer appears. To preview a detected appearance scene without using the editing device 14, the video file in a low resolution server 126 is played back. SELECTED DRAWING: Figure 16

Description

The present invention relates to an editing system, for example one that, at a broadcast station storing video, detects the appearance scenes of a specific performer in past video and thereby assists editing for the production of a program such as a celebration or memorial program.

Conventionally, past video assets were recorded on VTR tapes, and enormous numbers of VTR tapes were kept in warehouses. Each VTR tape was labeled with a tape number, and the program name, performers, and a summary of the program content recorded on the tape were managed together with that tape number. When video of a specific performer was needed, the person in charge at the broadcast station therefore used this management information to identify the VTR tapes on which programs featuring that performer were recorded.

For example, Patent Document 1 proposes, as a related program editing technology, a technique that creates original material information data, which is information extracting the relationship between the original material for editing and the edited material, and that, when editing again, performs the editing using the edited material, the project data, and the original material information data.

Patent Document 1: JP 2012-34218 A

However, to identify which scenes of a VTR tape a performer appears in, it was conventionally necessary to play the tape on a VTR device and search for the performer's appearance scenes by eye. When an appearance scene was found, its time code information was noted down and used for editing. A new technique was therefore needed from the viewpoint of work efficiency and accuracy.

In recent years, practice has been shifting toward dubbing video assets from VTR tapes to magnetic media such as LTO tapes and optical media such as Blu-ray Disc (registered trademark) and storing them as video files on these media. Even so, finding appearance scenes still requires playing back and visually checking the video files on these media, so the same problem remains.

Furthermore, if a performer causes a problem just before broadcast, after editing of the program has been completed, and that performer can no longer be broadcast, re-editing is required either to apply a mosaic to the performer or to cut the performer's scenes. To find the appearance scenes for re-editing, the editor had to play back the edited video and locate them by eye. The same problem exists here as well.

The present invention has been made in view of this situation, and its object is to solve the above problems.

The present invention is an editing system provided with an editing device that edits a video file used for broadcasting. The editing system comprises: a face image storage server that acquires the face images of performers included in the video file and records each face image in association with the time code information of the appearance video of the corresponding performer; an appearance video detection unit that compares the face images recorded in the face image storage server with a search-target face image included in the video file of a specific program and detects appearance video in the specific program; and a similar face image detection device that, based on the appearance video detected by the appearance video detection unit, detects by similar face image search other appearance video in the specific program in which the person of the search-target face image appears, and notifies the editing device of the time code information of the detected appearance video in association with information on the searched performer. The editing device edits the video file of the specific program using the time code information.
When editing the video file, if the performer's video file is stored in the recording device processed by the similar face image detection device, the editing device may play back the video file in the recording device so that the appearance video can be checked on screen.
The editing device may also play back the detected appearance video using low-resolution video.
The face image storage server may store a detection-target face image in association with a face type, and the similar face image detection device may perform the similar face image search according to the face type.

According to the present invention, it is possible to provide a technique that makes it easy to detect the face image of a search-target person in a program (in video data) and allows editing such as mosaic processing to be performed effectively.

FIG. 1 is a block diagram showing a schematic configuration of a video editing system according to the embodiment.
FIG. 2 is a block diagram showing a schematic configuration of a recording device according to the embodiment.
FIG. 3 is a block diagram showing a schematic configuration of a similar face image detection device according to the embodiment.
FIG. 4 is a block diagram showing a schematic configuration of an editing device according to the embodiment.
FIG. 5 is a functional block diagram showing a schematic configuration of an automatic editing information creation device according to the embodiment.
FIG. 6 is a flowchart showing an example of editing processing according to the embodiment.
FIG. 7 is a flowchart showing an example of editing processing according to the embodiment.
FIG. 8 is a flowchart showing an example of editing processing according to the embodiment.
FIG. 9 is a flowchart showing an operation example of a processing target recognition unit according to the embodiment.
FIG. 10 is a diagram showing an example of a display method (display on a touch panel display) in the editing device according to the embodiment.
FIG. 11 is a diagram showing an example of the feature quantities of images that are candidates for a search key image according to the embodiment.
FIG. 12 is a flowchart showing a procedure for performing a similar person search (similar face image detection processing) according to the embodiment.
FIG. 13 is a diagram showing an example of a search screen usable in a similar face image search system according to the embodiment.
FIG. 14 is a diagram showing an example of a procedure for accumulating face images in a face image storage server according to the embodiment.
FIG. 15 is a diagram showing an example of a procedure for accumulating face images in the face image storage server according to the embodiment.
FIG. 16 is a diagram showing an example in which similar face images are detected from the face image storage server with the face image of a target performer as the detection target, according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The outline of the present embodiment is as follows.
(1) From the enormous volume of past video accumulated at the broadcast station, only the face images of as many performers as possible are cut out and stored in the face image storage server together with the time code information of their appearance scenes.
(2) The accumulated face images are compared with the face image of the target performer to detect that performer's appearance scenes.
(3) Using the face images of the detected appearance scenes, similar appearance scenes are narrowed down by similar face image detection processing.
(4) The time code information of the detected appearance scenes is passed to the editing machine, which makes it easier to produce a feature program on that performer.
(5) The detected appearance scenes are played back in a simple manner using low-resolution video.
(6) If a performer's video becomes unbroadcastable just before broadcast, the scenes in which that performer appears are identified and edited (mosaic, cut, and so on).
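Steps (1) to (4) of the outline can be pictured with the following minimal Python sketch. It is illustrative only: the `FaceRecord` type, the cosine-similarity comparison, the threshold of 0.9, and the sample feature vectors are all assumptions introduced here, not details taken from the specification.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class FaceRecord:
    """One face cut out of archived video, with where it appeared (hypothetical layout)."""
    program_id: str
    timecode: str      # "HH:MM:SS:FF"
    features: tuple    # feature vector of the face (assumed encoding)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_appearances(store, target_features, threshold=0.9):
    """Compare stored faces with the target face and return sorted
    (program_id, timecode) pairs that could be handed to the editing machine."""
    hits = [r for r in store
            if cosine_similarity(r.features, target_features) >= threshold]
    return sorted((r.program_id, r.timecode) for r in hits)


# Step (1): faces stored together with time code information (sample data).
store = [
    FaceRecord("prog-A", "00:01:10:00", (1.0, 0.0, 0.0)),
    FaceRecord("prog-A", "00:05:42:12", (0.98, 0.1, 0.0)),
    FaceRecord("prog-B", "00:00:30:05", (0.0, 1.0, 0.0)),
]

# Steps (2) to (4): match the target performer and collect time codes.
appearances = find_appearances(store, (1.0, 0.0, 0.0))
```

A real system would use a dedicated face recognition model and an indexed store; the linear scan here only shows how face images and time codes travel together.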

FIG. 1 is a block diagram showing a schematic configuration of a video editing system 1 according to the present embodiment. The video editing system 1 includes a camera 10, a capture device 11, a recording device 12 (video server), an automatic editing information creation device 13, an editing device 14, a management terminal 17, a transmission server 18, and a system control unit 15, which are connected by a network 2 such as a LAN line or a predetermined communication line. The system control unit 15 performs overall control of the entire video editing system 1; it may be configured as a standalone unit or may be incorporated in another device (such as the recording device 12 or the editing device 14).

The camera 10 applies digital conversion to an image captured by a CCD (Charge Coupled Device) element, a CMOS (Complementary Metal Oxide Semiconductor) element, or the like, and outputs the resulting image data (for example, material video data conforming to the HD-SDI standard) through the capture device 11 and the network 2 to the recording device 12. The recording device 12 (video server) stores it. The automatic editing information creation device 13 is connected to the recording device 12 via the network 2, and the material video data is input to the automatic editing information creation device 13 before being stored in the recording device 12. However, the material video data may also be input directly to the recording device 12 and stored there without passing through the automatic editing information creation device 13.

FIG. 2 is a block diagram showing a schematic configuration of the recording device 12. The recording device 12 has a recording function and a similar face image detection function (similar face image detection device 16), and a data storage function (121 to 127).

The data storage function of the recording device 12 will now be described. The recording device 12 includes a material video data unit 121 that records material video data, an edited video data unit 122 that records edited video data, an automatically edited video data unit 123 that records automatically edited video data, a final editing information unit 124 that records final editing information, an automatic editing information unit 125 that records automatic editing information, a low resolution server 126 that records low-resolution files, and a face image storage server 127 that records and accumulates the face images contained in video.

The low resolution server 126 is provided for the following reason. In general, the video files on the media 5 (optical media 5a, magnetic media 5b, VTR tape 5c) must be stored at high image quality, so the files are large. They therefore cannot be kept on HDD storage that is accessible at all times. A low-resolution file, by contrast, is small, so it can be kept on always-accessible HDD storage and previewed from there. For this reason, when the media 5 is dubbed, low-resolution video (a low-resolution file) is created at the same time and recorded in the low resolution server 126.

Next, the similar face image detection device 16, which implements the recording function and the similar face image detection function of the recording device 12, will be described with reference to FIG. 3.

FIG. 3 is a block diagram showing a schematic configuration of the similar face image detection device 16. The similar face image detection device 16 includes an image transmission/reception unit 210, an image recording unit 211, a playback control unit 212, a person area detection unit 213, a person feature quantity extraction unit 214, a person feature quantity recording unit 215, an attribute information recording unit 216, a request receiving unit 217, a similar person search unit 218, an appearance event search unit 219, a search result transmission unit 220, a keyword recording unit 110, and a keyword search unit 111.

The image transmission/reception unit 210 is a processing unit that inputs and outputs images from and to the outside of the device; it receives input image data from the camera 10 and other devices and transmits output image data to other devices (such as the editing device 14).

The image recording unit 211 writes input image data to a recording medium and reads output image data from the recording medium (in the case of the VTR tape 5c, it is connected to a media playback device 19). When writing, it records, in addition to the image data, an image ID (image identification information) that is used when the image data is read back. The playback control unit 212 controls video playback to the editing device 14.

The person area detection unit 213 performs person detection on the input image data using image recognition technology, determines whether a person is present in the image, and, if so, calculates the coordinates of the person's area. The person area detection unit 213 also identifies the area of the person's face, extracts a face image containing that area, and records it in the face image storage server 127.

The person feature quantity extraction unit 214 calculates feature quantities for the area detected by the person area detection unit 213 using image recognition technology. Examples of the person feature quantities calculated here include the shape and orientation of the person's outline, skin color, and gait (how the legs are moved, such as which leg is moved, how, and with what timing), as well as the shape and orientation of the outline of the face, which is a representative part for identifying a person, and the size, shape, and positional relationship of its main components such as the eyes, nose, and mouth. In the present embodiment, any type and number of feature quantities may be used. The person feature quantity extraction unit 214 can determine the face type (front, profile, oblique, rear view, smiling, angry, and so on) as one kind of feature quantity and can associate such a feature quantity with the detection-target face image.

The person feature quantity recording unit 215 writes the feature quantities calculated by the person feature quantity extraction unit 214 to a recording medium and reads them from it. The person feature quantities are associated with the face image when the face image extracted by the person area detection unit 213 is recorded in the face image storage server 127. A face image is associated with the person's name at a predetermined timing (by user input or by automatic assignment through similar face image search).
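As a rough illustration of how a stored face image might carry its feature quantities, face type, time code, and a name that is attached later, consider the sketch below. The class names `FaceType`, `StoredFace`, and `FaceImageStore` and their fields are hypothetical; the specification does not define a storage schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaceType(Enum):
    """Face types listed in the description (orientation and expression)."""
    FRONT = "front"
    PROFILE = "profile"
    OBLIQUE = "oblique"
    REAR = "rear"
    SMILING = "smiling"
    ANGRY = "angry"


@dataclass
class StoredFace:
    image_id: str
    timecode: str                       # time code of the appearance scene
    features: list                      # person feature quantities (assumed vector)
    face_type: FaceType
    person_name: Optional[str] = None   # attached later, by a user or automatically


class FaceImageStore:
    """In-memory stand-in for the face image storage server (assumption)."""

    def __init__(self):
        self._faces = {}

    def add(self, face: StoredFace) -> None:
        self._faces[face.image_id] = face

    def assign_name(self, image_id: str, name: str) -> None:
        """Associate a person's name with an already stored face image."""
        self._faces[image_id].person_name = name

    def by_face_type(self, face_type: FaceType) -> list:
        """Return stored faces of one face type, e.g. to restrict a search."""
        return [f for f in self._faces.values() if f.face_type is face_type]


face_store = FaceImageStore()
face_store.add(StoredFace("img-1", "00:00:10:00", [0.1, 0.2], FaceType.FRONT))
face_store.add(StoredFace("img-2", "00:00:12:00", [0.3, 0.1], FaceType.PROFILE))
face_store.assign_name("img-1", "Performer A")
```

Keeping the name optional mirrors the text: the face image is recorded first, and the name is associated at some later, predetermined timing.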

The recording medium for image data in the image recording unit 211 and the recording medium for person feature quantities in this processing unit may be the same or separate.

The attribute information recording unit 216 writes attribute information related to the image data to a recording medium and reads it from there. The attribute information is, for example, the capture time of the image or the imaging device number.

The request receiving unit 217 receives search requests and keyword assignment requests from the editing device 14. Search requests include similar face image search requests and appearance event search requests.

When the request received by the request receiving unit 217 is a similar person search request, the similar person search unit 218 performs a similar face image search.

When the request received by the request receiving unit 217 is an appearance event search request, the appearance event search unit 219 performs an appearance event search.

The search result transmission unit 220 transmits the similar person search results and appearance event search results obtained from the similar person search unit 218 and the appearance event search unit 219 to the editing device 14.

The keyword recording unit 110 writes keywords to a recording medium, and reads them from it, based on the keyword assignment request received by the request receiving unit 217.

When the search request data received by the request receiving unit 217 contains a keyword, the keyword search unit 111 performs a keyword search.
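The routing of requests among the search units described above could look roughly like the following sketch. The request layout (a dict with `kind` and `keyword` keys) and the choice to intersect keyword hits with the other results are assumptions made for illustration; the text only states which unit runs for which request.

```python
def make_dispatcher(similar_person_search, appearance_event_search, keyword_search):
    """Build a dispatch function from three search callables (all hypothetical)."""

    def dispatch(request: dict) -> list:
        kind = request.get("kind")
        if kind == "similar_person":
            results = similar_person_search(request)
        elif kind == "appearance_event":
            results = appearance_event_search(request)
        else:
            raise ValueError(f"unknown request kind: {kind!r}")

        # Assumption: when a keyword is present, keep only results that the
        # keyword search also returns.
        keyword = request.get("keyword")
        if keyword:
            allowed = set(keyword_search(keyword))
            results = [r for r in results if r in allowed]
        return results

    return dispatch


# Stub search functions standing in for the real search units.
dispatch = make_dispatcher(
    similar_person_search=lambda req: ["s1", "s2", "s3"],
    appearance_event_search=lambda req: ["e1"],
    keyword_search=lambda kw: ["s2", "e1"],
)
```

Passing the search functions in as parameters keeps the routing logic testable without any real image search behind it.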

Next, the editing device 14 (editing machine) will be described with reference to FIG. 4. FIG. 4 is a block diagram showing a schematic configuration of the editing device 14. The editing device 14 performs editing processing in which rendering and similar processing is actually applied to the material video data.

The editing device 14 includes an editing control unit (editing means) 141 having a processor that actually performs this editing work, a display unit 142 (display) that displays video based on the material video data and on the video data after it has been edited, an operation panel 143 (operation means) for selecting individual portions of the images and audio or for entering instructions, and a similar face image detection operation unit 103. The display unit 142 and the operation panel 143 may be provided as an integrated touch panel display 144.

The editing control unit 141 reads the material video data and the automatic editing information from the recording device 12 (automatic editing information unit 125), creates new video data (automatically edited video data) by editing the material video data based on the automatic editing information, and stores the automatically edited video data in the recording device 12 (automatically edited video data unit 123).

In the editing device 14, however, the user can check the images based on the automatically edited video data on the display unit 142 and then operate the operation panel 143 to instruct the editing control unit 141 to cancel the processing of any portion of the automatically edited video data that was processed but is judged inappropriate, so that this processing is undone. The material video data can also be referred to in this case.

Similarly, the editing control unit 141 can apply further processing to the automatically edited video data. The portion to be newly processed is specified by the user. Here too, the user can check the video based on the automatically edited video data on the display unit 142 and then perform this operation through the operation panel 143. These user operations produce final editing information in which the automatic editing information has been rewritten. The final editing information is reflected in the editing of the material video data and, as described later, is used to update the processing target information.

Similarly, the editing control unit 141 can read the material video data directly from the recording device 12, let the user check images based on it on the display unit 142, and then, through the user's operation of the operation panel 143, have the portions to be processed specified and rendering applied without using the automatic editing information. In this operation the user can apply rendering to the material video data independently of the automatic editing information.

In this way, the editing control unit 141 can have the recording device 12 record both the automatically edited video data, edited based on the automatic editing information, and the edited video data, obtained when the user edits the automatically edited video data or the material video data.

As its functional configuration, the similar face image detection operation unit 103 has the following processing units: a search request transmission unit 221, a search result receiving unit 222, a search result display unit 223, a playback image display unit 224, a screen operation detection unit 225, a keyword assignment request transmission unit 112, and a multiple search key selection unit 113.

The search request transmission unit 221 transmits search requests to the recording device 12. For a similar person search, the search request data contains, as the search key, the person's name, a search key image (in particular a face image), or its feature quantities. The search request data may also contain narrowing parameters.

The search result receiving unit 222 receives search results from the recording device 12 (similar face image detection device 16). The data received as a search result contains a set of images obtained by executing a similar person search or an appearance event search in the recording device 12 (similar face image detection device 16). Each image in the set is generated from the video recorded in the recording device 12 (similar face image detection device 16) by applying image size reduction or similar processing. Hereinafter, each such image is also called a "search result image", and the data transmitted and received as a search result is also called "search result data".

The search result display unit 223 displays the search results received by the search result receiving unit 222 on screen. An example of the displayed screen is described later.
The playback image display unit 224 displays the image data input from the recording device 12 (similar face image detection device 16) on screen as continuous moving images.
The screen operation detection unit 225 detects and acquires the content of operations performed by the user.
The keyword assignment request transmission unit 112 transmits keyword assignment requests to the recording device 12 (similar face image detection device 16).
When multiple search key image candidates have been selected, the multiple search key selection unit 113 performs processing to appropriately select a smaller number of search key images.
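One conceivable way to reduce many candidate search key images to a few is to pick candidates that are spread out in feature space. The sketch below uses greedy farthest-point selection over Euclidean distance; this technique, the function name `select_search_keys`, and the sample vectors are assumptions chosen for illustration and are not stated to be the selection method of the multiple search key selection unit 113.

```python
import math


def select_search_keys(candidates: dict, k: int) -> list:
    """Pick up to k image IDs whose feature vectors are far apart.

    candidates maps image_id -> feature vector (sequence of floats).
    Starts from the smallest image ID, then repeatedly adds the candidate
    whose nearest already-chosen key is farthest away.
    """
    if k <= 0 or not candidates:
        return []
    ids = sorted(candidates)
    chosen = [ids[0]]
    while len(chosen) < min(k, len(ids)):
        remaining = [i for i in ids if i not in chosen]
        best = max(
            remaining,
            key=lambda i: min(math.dist(candidates[i], candidates[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Two near-duplicate pairs: one key from each pair is enough.
candidate_features = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (5.0, 0.0),
    "d": (4.9, 0.0),
}
keys = select_search_keys(candidate_features, 2)
```

The intent is that near-duplicate candidates (such as consecutive frames of the same shot) do not each consume a search, while visually different views of the person are kept.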

FIG. 5 is a functional block diagram of the automatic editing information creation device 13. The automatic editing information creation device 13 includes a processing target recognition unit 131 and an information storage unit 132. The information storage unit 132 includes a final editing information unit 124, an automatic editing information unit 125, and a processing target information unit 128. The final editing information unit 124 and the automatic editing information unit 125 may be the same as those provided in the recording device 12 or may be provided separately.

The automatic editing information creation device 13 reads the material video data, and the processing target recognition unit 131 recognizes the portions to which rendering is to be applied. The processor in the processing target recognition unit 131 performs this recognition based on the processing target information stored in the information storage unit 132, and stores information on the portions to be processed and the processing to be applied (automatic editing information) in the recording device 12.

Specifically, the information on a portion to be processed that is contained in the automatic editing information includes the video frame position of the portion (time code information) and its coordinates in the picture or, when the processing target is audio, the range of audio sample positions, together with the content of the processing. When the target is video, the processing may be mosaic processing, blur processing, a video cut, or an increase or decrease in luminance; when the target is audio, it may be muting, volume adjustment, and so on. The reason a portion is targeted (for example, it falls under a broadcast prohibition or is the name of a specific company) is also included in the processing target information.
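The fields listed above can be pictured as a small record type. The following is only a sketch: the names `Process`, `EditEntry`, `start`, `end`, `region`, and `reason` are hypothetical, and copying the reason from the processing target information into each entry is an assumption made for illustration, not something the description specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Process(Enum):
    # Processing types named in the description.
    MOSAIC = "mosaic"
    BLUR = "blur"
    CUT = "cut"
    LUMINANCE = "luminance"
    MUTE = "mute"
    VOLUME = "volume"


VIDEO_PROCESSES = {Process.MOSAIC, Process.BLUR, Process.CUT, Process.LUMINANCE}


@dataclass(frozen=True)
class EditEntry:
    """One portion to be processed (hypothetical layout)."""
    start: str                 # time code (video) or sample position (audio)
    end: str
    process: Process
    reason: str                # assumption: copied from the processing target information
    region: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height; video only

    def is_video(self) -> bool:
        return self.process in VIDEO_PROCESSES


logo = EditEntry("00:01:10:00", "00:01:12:15", Process.MOSAIC,
                 "specific company name", (640, 360, 200, 80))
word = EditEntry("3360000", "3408000", Process.MUTE, "broadcast prohibition")
```

Keeping the region optional lets the same record describe both picture and audio targets.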

A plurality of sets of processing target information can be defined, for example according to the distribution destination (purpose) of the video. This makes it possible, for example, to process for one destination a portion that is left unprocessed for another, or to vary the processing applied according to the destination. In such cases, the processing target information to use is selected by the user.

As described later, when the material video data is finally edited, the portions to be processed and the content of the processing are corrected after being checked by the user. This final version of the editing information (final editing information), or information on the differences between the final editing information and the automatic editing information, is also stored in the information storage unit 132.

An operation example of the above configuration will now be described.
First, an example of editing processing is described with reference to FIGS. 6 to 10; next, similar person search processing (in particular, similar face detection processing) is described with reference to FIGS. 11 to 13; finally, a processing example in which the similar face detection processing is applied to the editing processing is described with reference to FIGS. 14 to 16.

図６は、システム制御部１５が行わせる具体的な動作を示すフローチャートの一例である。ここでは、単純化のために、編集装置１４を用いてユーザによって指定された処理は行われないものとする。また、図１において、素材映像データは自動編集情報作成装置１３を介してのみ記録装置１２に入力する（記憶される）ものとする。 FIG. 6 is an example of a flowchart showing a specific operation performed by the system control unit 15. Here, for the sake of simplicity, it is assumed that the processing specified by the user using the editing device 14 is not performed. Further, in FIG. 1, it is assumed that material video data is input (stored) in the recording device 12 only via the automatic editing information creation device 13.

まず、収録装置１１は、素材映像データを入手する（Ｓ１）。自動編集情報作成装置１３は、この素材映像データを入手し、素材映像データ中の画像において処理対象となる部分があるかを解析する（Ｓ２）。ここでは、処理対象認識部１３１が、情報記憶部１３２中の情報を参照し、素材映像データ中の画像において処理対象となる部分があるかを認識し、この部分が認識された場合には、この部分に対する処理も、情報記憶部１３２中の情報に基づき、決定する（Ｓ３）。これによって、自動編集情報が作成される。処理の対象となる部分が認識されなかった場合（Ｓ４のＮｏ）には、素材映像データがそのまま記録装置１２に記憶される（Ｓ５）。 First, the recording device 11 obtains material video data (S1). The automatic editing information creation apparatus 13 obtains the material video data, and analyzes whether there is a portion to be processed in the image in the material video data (S2). Here, the processing target recognition unit 131 refers to the information in the information storage unit 132 to recognize whether there is a portion to be processed in the image in the material video data, and when this portion is recognized, The processing for this portion is also determined based on the information in the information storage unit 132 (S3). This creates automatic editing information. If the part to be processed is not recognized (No in S4), the material video data is stored in the recording device 12 as it is (S5).

When a portion to be processed is recognized (Yes in S4), the system control unit 15 asks the user whether to store the material video data (S6). If it is not to be stored (No in S6), the editing device 14 is used, as described above, to create automatically edited video data by editing the material video data according to the automatic editing information (S7), and the automatically edited video data and the automatic editing information are stored in the recording device 12 (S8). In this case, the only video data stored in the recording device 12 is the automatically edited video data; if the material video data had already been stored in the recording device 12, it is replaced with the automatically edited video data.

When the material video data is to be stored (Yes in S6), the system control unit 15 stores the material video data and the automatic editing information in the recording device 12 (S9) and then asks the user whether to perform automatic editing (S10). If automatic editing is not performed (No in S10), the process ends. In this case, the recording device 12 holds the unedited material video data and the automatic editing information. No automatically edited video data exists at this point, but it can easily be created later using the editing device 14.

自動編集を行う場合（Ｓ１０のＹｅｓ）、システム制御部１５は、編集装置１４に自動編集済み映像データを作成させ（Ｓ１１）、これを記録装置１２に記憶させる（Ｓ１２）。この場合、記録装置１２には、元となった素材映像データ、自動編集情報、自動編集済み映像データの全てが記憶される。このため、例えば、上記のように複数の処理対象情報が設定された場合において、同一の素材映像データに対して他の処理対象情報を用いた処理を後で行うことが容易となる。 When the automatic editing is performed (Yes in S10), the system control unit 15 causes the editing device 14 to create automatically edited video data (S11), and stores the video data in the recording device 12 (S12). In this case, the recording device 12 stores all of the original material video data, the automatic editing information, and the automatically edited video data. Therefore, for example, when a plurality of pieces of processing target information are set as described above, it becomes easy to perform later processing using the other pieces of processing target information on the same material video data.
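The branches of FIG. 6 determine what ends up in the recording device 12. As a rough sketch, the function below returns that set of stored items for each combination of decisions; the function name `ingest` and the string labels are hypothetical, and the user prompts of S6 and S10 are reduced to boolean arguments.

```python
def ingest(has_target: bool, keep_material: bool, auto_edit_now: bool) -> set:
    """Return the items stored in the recording device after the FIG. 6 flow."""
    if not has_target:                               # S4: No
        return {"material"}                          # S5
    if not keep_material:                            # S6: No
        return {"auto_edited", "auto_edit_info"}     # S7, S8
    stored = {"material", "auto_edit_info"}          # S9
    if auto_edit_now:                                # S10: Yes
        stored.add("auto_edited")                    # S11, S12
    return stored
```

Only the last branch keeps all three items, which is the case that makes later re-processing with other processing target information straightforward.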

なお、記録装置１２が収録装置１１から素材映像データを直接受信してこれを記憶する場合には、上記のＳ６〜Ｓ８の工程は不要となる。ただし、自動編集済み映像データが記憶された（Ｓ１２）後に、素材映像データを削除してもよい。 When the recording device 12 directly receives the material video data from the recording device 11 and stores the data, the steps S6 to S8 become unnecessary. However, material video data may be deleted after the automatically edited video data is stored (S12).

The flowchart of FIG. 6 shows the operation of the system control unit 15 after material video data has been input. In other cases, a video distribution (output) request is made to the system control unit 15 while the material video data is already stored in the recording device 12, and video data obtained by editing the material video data is output in response.

図７は、こうした場合におけるシステム制御部１５の動作の一例を示すフローチャートである。ここでは、少なくとも素材映像データは記録装置１２に記憶されているものとする。 FIG. 7 is a flowchart showing an example of the operation of the system control unit 15 in such a case. Here, it is assumed that at least material video data is stored in the recording device 12.

まず、システム制御部１５は、配信の要求があった場合（Ｓ２１）、記録装置１２に自動編集済み映像データが記憶されているか否かを確認する（Ｓ２２）。自動編集済み映像データが記憶されていなかった場合（Ｓ２２のＮｏ）、自動編集情報が記憶されているか否かを確認する（Ｓ２３）。 First, when there is a distribution request (S21), the system control unit 15 checks whether the automatically edited video data is stored in the recording device 12 (S22). If the automatically edited video data is not stored (No in S22), it is checked whether the automatic editing information is stored (S23).

If the automatic editing information exists (Yes in S23), the system control unit 15 creates the automatically edited video data using the editing device 14 as described above and stores it in the recording device 12 (S24). If the automatic editing information does not exist (No in S23), the system control unit 15 creates it using the automatic editing information creation apparatus 13 (S25), then likewise creates the automatically edited video data using the editing device 14 and stores it in the recording device 12 (S24). Thus, when the automatically edited video data was not stored (No in S22), it is newly created and stored in the recording device 12.

When the automatically edited video data was already stored (Yes in S22), or when it has been newly created and stored as described above (S24), the system control unit 15 displays an image based on the automatically edited video data on the editing device 14 (display unit 142) (S26) and asks the user whether the video may be distributed with this content (S27).

If the video may be distributed as is (Yes in S27), the automatically edited video data is designated as the edited video data approved for distribution (S28). If the user wishes to change the content (No in S27), the system control unit 15 has the automatically edited video data edited further using the editing device 14 (S29), designates the result as the edited video data approved for distribution, and stores it in the recording device 12 (S30). At this time, the final editing information is also created and stored, as described above.

その後、システム制御部１５は、上記のように記録装置１２に記憶された編集済み映像データを配信させる（Ｓ３１）。 Thereafter, the system control unit 15 distributes the edited video data stored in the recording device 12 as described above (S31).
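The distribution flow of FIG. 7 (S21 to S31) amounts to "reuse what is stored, create what is missing, then ask the user". The sketch below models the recording device 12 as a plain dictionary and the devices as callables passed in; all names (`prepare_for_distribution`, `create_info`, `auto_edit`, `user_approves`, `manual_edit`, and the dictionary keys) are hypothetical, and storing the approved result under one key in both the S28 and S30 branches is a simplification.

```python
from typing import Callable, Dict


def prepare_for_distribution(
    store: Dict[str, object],
    create_info: Callable[[object], object],
    auto_edit: Callable[[object, object], object],
    user_approves: Callable[[object], bool],
    manual_edit: Callable[[object], object],
) -> object:
    """Return the video to distribute, filling in missing items on the way."""
    if "auto_edited" not in store:                                    # S22: No
        if "auto_edit_info" not in store:                             # S23: No
            store["auto_edit_info"] = create_info(store["material"])  # S25
        store["auto_edited"] = auto_edit(
            store["material"], store["auto_edit_info"])               # S24
    video = store["auto_edited"]
    if not user_approves(video):                                      # S26, S27: No
        video = manual_edit(video)                                    # S29
    store["approved"] = video                                         # S28 / S30
    return video                                                      # distributed in S31
```

For example, a store holding only material video data passes through S25, S24, and, if the user rejects the result, S29 before anything is distributed.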

また、自動編集情報が作成されてもこれを適用して自動編集済み映像データを作成するのには時間を要し、記録装置１２に様々な映像データを記憶させるのにも時間を要する。このため、配信において不要となることが明らかな映像データを記憶させない、作成しないことが好ましい。更に、処理の時間を短縮するために、ユーザが他の装置を用いて同時に映像を確認する場合もある。 Also, even if the automatic editing information is created, it takes time to apply it to create the automatically edited video data, and it also takes time to store various video data in the recording device 12. For this reason, it is preferable not to store or create video data that is clearly unnecessary in distribution. Furthermore, in order to reduce the processing time, the user may check the image simultaneously using another device.

図８は、こうした点を考慮したシステム制御部１５の動作を示すフローチャートの一例である。 FIG. 8 is an example of a flowchart showing the operation of the system control unit 15 in consideration of such a point.

ここでは、収録装置１１が素材映像データを入手したら（Ｓ４１）、この素材映像データをそのまま記録装置１２に記憶するか否かが判断される（Ｓ４２）。素材映像データの記憶が不要であると認識された場合（Ｓ４２のＮｏ）、前記の通りに自動編集処理が行われて自動編集済み映像データが作成され（Ｓ４３）、この自動編集済み映像データを配信用の映像データであるとして記録装置１２に記憶する（Ｓ４４）。この場合においては、記録装置１２に記録される映像データは自動編集済み映像データのみである。 Here, when the recording device 11 obtains material video data (S41), it is determined whether the material video data is to be stored as it is in the recording device 12 (S42). If it is recognized that the storage of the material video data is unnecessary (No at S42), the automatic editing process is performed as described above, and the automatically edited video data is created (S43). It is stored in the recording device 12 as video data for distribution (S44). In this case, the video data recorded in the recording device 12 is only the automatically edited video data.

If it is determined that the material video data is to be stored (Yes in S42), the material video data is stored in the recording device 12 (S45). The user is then asked whether the material video data should also be analyzed using another device (S46). If so (Yes in S46), the user analyzes the material video data using the other device (S47) and then starts the subsequent processing with the editing device 14. The result of this analysis can be used in the later decisions (S50, S56).

The user is then asked whether automatic editing should be performed immediately (S48). If not (No in S48), the automatic editing information creation apparatus 13 creates the automatic editing information (S49), after which the editing device 14 asks whether the content of this automatic editing information is acceptable (S50).

この問い合わせを行う際には、実際に自動編集済み映像データは作成されていないが、ユーザは、この自動編集情報に基づく編集後の内容を確認するために、前記の通り、ある一時点での静止画像を用いて、この確認をすることが可能である。 When this inquiry is made, the automatically edited video data is not actually created, but as described above, the user can confirm the content after editing based on the automatic editing information. It is possible to do this verification using a still image.

この内容を変更したい場合（Ｓ５０のＮｏ）、編集装置１４は、ユーザにその修正を行わせる（Ｓ５１）。その後、内容の変更がない場合（Ｓ５０のＹｅｓ）、そのままの自動編集情報に基づいて、素材映像データに対する実際の編集作業が行われた編集済み映像データが作成される（Ｓ５２）。この編集済み映像データが、配信用の映像データとして記録装置１２に記憶される（Ｓ５３）。この場合には、最終的に内容が確定するまで編集済み映像データは作成されない。 If it is desired to change this content (No in S50), the editing apparatus 14 causes the user to make the correction (S51). Thereafter, if there is no change in the content (Yes in S50), edited video data in which the actual editing work has been performed on the material video data is created based on the automatic editing information as it is (S52). The edited video data is stored in the recording device 12 as video data for distribution (S53). In this case, edited video data is not created until the content is finally determined.

自動編集を直ちに行う場合（Ｓ４８のＹｅｓ）、直ちに自動編集情報とこれに基づいた自動編集済み映像データが作成され（Ｓ５４）、自動編集済み映像データを表示部１４２で表示させる（Ｓ５５）。この場合には、ユーザは、自動編集済み映像データの全ての時点で、この編集内容が適正か否かを詳細に確認することができる（Ｓ５６）。 When the automatic editing is immediately performed (Yes in S48), the automatic editing information and the automatically edited video data based on the automatic editing information are created (S54), and the automatically edited video data is displayed on the display unit 142 (S55). In this case, the user can check in detail whether or not the edited content is appropriate at all time points of the automatically edited video data (S56).

If the user then wishes to correct the editing content (No in S56), the correction and confirmation are carried out in the same way as above (S57), new video data (edited video data) is created on the basis of the corrected editing information (S58), and this edited video data is stored in the recording device 12 as the video data for distribution (S59). The final editing information created at this time is stored as well.

If the editing based on the automatic editing information is judged to be appropriate (Yes in S56), the automatically edited video data that has already been created is stored in the recording device 12 as the video data for distribution (S60).

In the above operation, actual editing of the material video data is kept to the necessary minimum, which shortens the processing time, while the user can still reliably check whether the editing is appropriate and make corrections.

Next, the processing target information stored in the information storage unit 132 and used to recognize the portions to be processed in the material video data is described. Such portions include the time display mentioned above, the registration numbers of cars captured in the picture, company names, the faces of people captured in the picture, and so on. A time display or registration number can be recognized by pattern recognition of digits, a company name by pattern recognition of characters, and a face likewise by a pattern recognition technique.

As described above, in the editing device 14 the user can also designate the portions to be processed by operating the operation panel 143, after which final editing information reflecting this operation is created. The processing target recognition unit 131 can then read this final editing information and update (or create) the processing target information. In this case, the processing target recognition unit 131 functions as processing target information modifying means that updates the processing target information to more suitable content.

FIG. 9 is a diagram showing the flow of this operation in the processing target recognition unit 131.
First, the processing target information in its initial state (initial setting) is created by the user (P1). Here, for example, only items that are the minimum necessary as processing targets and are relatively easy to recognize are selected. The time display in the picture mentioned above is one such item. Using this processing target information, the video editing system 1 is used repeatedly as described above. In doing so, the user also performs editing in addition to, or instead of, the editing based on the automatic editing information, the final editing information ultimately applied to the material video data is created, and this final editing information is also stored in the information storage unit 132.

The processing target recognition unit 131 can therefore modify the processing target information that is stored in the information storage unit 132 and underlies the automatic editing information, by comparing the automatic editing information on which the automatically edited video data was based with the final editing information generated afterward. For example, suppose a character string in the picture was not a target in the automatic editing information because it was not included among the targets in the processing target information, but the user later designated it so that it became a target in the final editing information. The processing target information can then be modified to add this character string as a target. Conversely, suppose a character string was a target in the automatic editing information because it was included among the targets in the processing target information, but the user later cancelled the designation so that it was not a target in the final editing information. The processing target information can then be modified to remove this character string from the targets. The content of the processing specified in the processing target information (blur processing and so on) can be modified in the same way. With face recognition applied to the picture, for example, the same procedure also works when a specific person is the processing target.
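The add-and-remove rule just described is essentially a set difference taken in both directions. A minimal sketch, assuming each target can be represented by a hashable label such as a recognized character string (the function name `update_targets` and this representation are hypothetical):

```python
def update_targets(targets: set, auto_info: set, final_info: set) -> set:
    """Revise the processing targets from one automatic/final comparison."""
    added = final_info - auto_info     # designated by the user, missed automatically
    removed = auto_info - final_info   # processed automatically, cancelled by the user
    return (targets | added) - removed


current = {"12:00", "ACME"}
revised = update_targets(current, auto_info={"ACME"}, final_info={"Globex"})
```

Here "ACME" is dropped because the user cancelled it, "Globex" is added because the user designated it, and "12:00" is untouched because it did not appear in either list.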

Instead of relying on such a simple judgment of whether an item was selected as a target, the processing target recognition unit 131 can also modify the processing target information on the basis of statistical processing of a plurality of recorded pieces of final editing information. For example, each difference between the final editing information and the automatic editing information can be evaluated numerically, the sum of these values calculated as a score, and the processing target information modified on the basis of that score. For instance, the pieces of final editing information with a high score (a large difference) can be extracted, and items that are processing targets common to them but are not included in the processing target information can be newly incorporated into the processing target information.
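One way to read this score-based variant is sketched below. The description does not say how a difference is valued, so the per-difference weights, the `min_score` cut-off, and the `min_share` rule for what counts as "common" are all assumptions, as are the function names.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def difference_score(auto_info: Set[str], final_info: Set[str],
                     w_added: float = 1.0, w_removed: float = 1.0) -> float:
    """Sum a value for every difference between automatic and final edits."""
    return (w_added * len(final_info - auto_info)
            + w_removed * len(auto_info - final_info))


def new_common_targets(history: Iterable[Tuple[Set[str], Set[str]]],
                       targets: Set[str], min_score: float,
                       min_share: float) -> Set[str]:
    """Targets shared by high-score final edits but absent from `targets`."""
    large = [(a, f) for a, f in history if difference_score(a, f) >= min_score]
    if not large:
        return set()
    # Count how often each user-added target appears among the high-score edits.
    counts = Counter(t for a, f in large for t in f - a)
    return {t for t, n in counts.items()
            if n / len(large) >= min_share and t not in targets}


history = [({"A"}, {"A", "X", "Y"}), ({"A"}, {"A", "X", "Z"}), ({"A"}, {"A"})]
found = new_common_targets(history, targets={"A"}, min_score=2, min_share=1.0)
```

With these inputs only the first two sessions reach the score cut-off, and "X" is the only target the user added in both of them.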

Accordingly, in the flow of FIG. 9, the video editing system 1 is used with the processing target information in its initial state (P1), and the final editing information is created by user operation and stored in the information storage unit 132 (P2). The difference between the final editing information and the automatic editing information is then quantified and evaluated as described above (P3). On the basis of this value, a comprehensive analysis determines whether the current processing target information should be rewritten and, if so, how (P4), and the processing target information is finally updated (P5). As shown in FIG. 9, the final determination (P4) can take into account not only the difference between the final editing information and the automatic editing information but also the tendencies of individual users in their editing work on the editing device 14 (for example, one user edits heavily while another edits little), conditions on the picture that were added because of circumstances arising after the initial setting (P1), and so on.

このような処理対象情報の改変作業は、この映像編集システム１が使用されて最終編集情報が作成される度に繰り返してもよく、周期的に行ってもよい。また、上記の点数を用いる場合には、この点数の累積値に応じて行ってもよい。 Such modification work of the processing target information may be repeated every time the video editing system 1 is used and the final editing information is created, or may be performed periodically. Moreover, when using said score, you may carry out according to the cumulative value of this score.

このように、処理対象情報を、多数の最終編集情報を基にして改変する作業は、周知の機械学習手法（ディープラーニング）等を用いても行うことができる。前記のように、映像の配信先等に応じて複数の処理対象情報が設定される場合には、これらの作業も処理対象情報毎に行うことができる。 As described above, the operation of modifying the processing target information based on a large number of final editing information can also be performed using a well-known machine learning method (deep learning) or the like. As described above, in the case where a plurality of pieces of processing target information are set according to the delivery destination of the video, etc., these tasks can also be performed for each piece of processing target information.

The prompt and input that allow the user to enter an evaluation of the automatically edited video data can be provided through the display unit 142 and the operation panel 143 (touch panel display 144) of the editing device 14.

FIG. 10 is an example of such a display. Display K presents a description of the automatic editing information (each portion to be processed and the processing applied to it) and lets the user choose whether to apply each item, and the user enters an evaluation of this automatic editing information in the upper display L. Operating the lower display M then executes editing processing using the final editing information, that is, the automatic editing information as modified by the operations on display K.

素材映像データには様々な種類のものがあり、場合によっては、一般的ではない特殊部分に対して処理を施す場合もある。こうした場合においては、自動編集情報と最終編集情報の違いが大きくなった場合でも、この場合の最終編集情報は、一般的に用いられる処理対象情報の改変に用いないことが好ましい。図１０に示されたように、この場合の自動編集情報を評価の対象としないことを選択した場合には、このように特殊な場合の最終編集情報は処理対象情報の改変には使用されない。 There are various types of material video data, and in some cases, processing may be performed on an uncommon special part. In such a case, even if the difference between the automatic editing information and the final editing information becomes large, it is preferable that the final editing information in this case is not used to modify the generally used processing target information. As shown in FIG. 10, when it is selected that the automatic editing information in this case is not targeted for evaluation, the final editing information in such a special case is not used for modifying the processing object information.

このように、新たに作成された最終編集情報をフィードバックして処理対象情報を更新する方法として、上記の他にも、様々な手法が適用可能である。 As described above, various methods other than the above can be applied as a method of feeding back the newly created final editing information and updating the processing target information.

As another example, the faces of people captured in the picture are among the portions that can be processed, and the processing target recognition unit 131 can recognize faces in the picture. When several people are captured, the user may want to apply processing only to the face of one specific person or, conversely, to the faces of everyone except that person. In such a case, if human faces are set to the first level described above in the processing target information, the system can be configured, as with the broadcast-prohibited terms described earlier, to issue only a warning and, unless the warning is cleared, neither create the automatically edited video data nor distribute the material video data. The user can then operate the operation panel 143 to create final editing information in which, among all the captured faces, only the specific person's face is processed or, conversely, only that face is left unprocessed, create edited video data according to this final editing information, and have it distributed.

In the configuration described above, the automatic editing information creation apparatus 13, which includes the processing target recognition unit (processing target recognition means, processing target information modifying means) 131 and the information storage unit (information storage means) 132, and the editing device 14, which includes the editing control unit (editing means) 141, the display unit (display means) 142, and the operation panel (operation means) 143, are connected to the recording device 12 (video server), and the operations described above are performed. However, the specific device configuration is arbitrary, provided that processing target recognition means, processing target information modifying means, information storage means, editing means, display means, and the like having the same functions are provided for the material video data and the automatically edited video data, automatic editing information, final editing information, and so on can be created. In other words, how these means are distributed among the devices used is arbitrary, and all of them may be provided in a single device.

Next, similar person search processing (in particular, similar face detection processing) is described with reference to FIGS. 11 to 18. This processing is executed by the functions of the similar face image detection device 16 and the editing device 14 (in particular, the similar face image detection operation unit 103), and applies the technique disclosed in JP 2013-101431 A to face image recognition. The main parts of that disclosure are outlined below.

図１１（ａ）〜（ｇ）には、本実施例において、類似人物検索を実施する手順に沿って、検索キー画像の候補となった画像の特徴量を例示している。図１２には、類似人物検索（類似顔検出処理）を実施する手順を例示している。 FIGS. 11A to 11G illustrate feature amounts of images that have become candidates for search key images along the procedure of performing similar person search in the present embodiment. FIG. 12 exemplifies a procedure for performing similar person search (similar face detection processing).

First, in the search processing 6001 using the first key image, an initial search is performed with the first search key image selected by the user. Here, images whose feature amounts are close in distance to the feature amount of the image selected as the first search key image (in this example, the feature amount of the person in the image) are searched for through the similar person search unit 218 in the recording device 12, and as a result, for example, 10 images are retrieved.

In FIG. 11(a), the feature amount of the first search key image is indicated by "○". For ease of explanation, the feature amount is drawn in two dimensions here, but in practice an image feature amount often has a very large number of dimensions, for example several hundred.

Suppose that 3 of the 10 retrieved images show the same subject as the first search key image. In the processing 6002 of selecting the same person from the search results, those 3 target images are selected from the 10 search result images. Specifically, for example, the user operates the operation panel 143 of the editing device 14 or a mouse (not shown) to select the target images. Alternatively, a threshold may be set on the feature amount: if the distance between the feature amount of the first search key image and that of a search result image is at or below the threshold, the two are judged to show the same subject (the same person), and that search result image is selected automatically.
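The threshold-based automatic selection can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes Euclidean distance as the distance measure, and the names `euclidean` and `auto_select`, the toy 2-D features, and the threshold value 0.5 are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def auto_select(key_feature, results, threshold):
    """Return the ids of results whose feature lies within `threshold` of the key feature."""
    return [rid for rid, feat in results if euclidean(key_feature, feat) <= threshold]

# Toy 2-D features; real feature amounts would have hundreds of dimensions.
key = (0.0, 0.0)
results = [("r1", (0.1, 0.0)), ("r2", (3.0, 4.0)), ("r3", (0.0, 0.2))]
selected = auto_select(key, results, threshold=0.5)
print(selected)
```

Results that fall outside the threshold are simply left unselected, so the user could still add them manually.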

In FIG. 11(b), in addition to the content of FIG. 11(a), the feature amounts of the images selected by the processing 6002 of selecting the same person from the search results are indicated by "△". The images selected by this processing become candidates for new search key images.

If a search result image is one of the consecutive images forming a moving image, images of the same person are often also contained before and after that image in the moving image. In the processing 6003 of selecting the same person before and after a search result, images judged to show the same person as the search result image (that is, the same person as the search key image) are selected automatically, on the basis of the person's position, speed of movement, and the like, from the images contained in a time window of predetermined length before or after the search result image in the moving image from which it was extracted. The user may also be allowed to specify these images.

In FIG. 11(c), in addition to the content of FIG. 11(b), the feature amounts of the images selected by the processing 6003 of selecting the same person before and after a search result are indicated by "□". The images selected by this processing become candidates for new search key images.

In the image processing 6004 of adding a mask, an image in which a mask covering the nose and mouth has been added by image processing is generated from each person image made a new search key image candidate by the preceding processing, and is added to the new search key image candidates. Conversely, if the person in the image before processing is wearing a mask covering the nose and mouth, the image processing may remove the mask. Several types of mask image may be prepared.

In the image processing 6005 of adding sunglasses or glasses, an image in which sunglasses or glasses have been added by image processing is generated from each person image made a new search key image candidate by the preceding processing, and is added to the new search key image candidates. Conversely, if the person in the image before processing is wearing sunglasses or glasses, the image processing may remove them. Several types of sunglasses and glasses image may be prepared.

In the image processing 6006 of changing the person's orientation, an image in which the orientation of the person has been changed by image processing is generated from each person image made a new search key image candidate by the preceding processing, and is added to the new search key image candidates. Usually several orientations are generated, but a simple left-right flip may also be used.

In FIG. 11(d), in addition to the content of FIG. 11(c), the feature amounts of the images generated by the image processing 6004 of adding a mask, the image processing 6005 of adding sunglasses or glasses, and the image processing 6006 of changing the person's orientation are indicated by "×". The images generated by this processing are added as candidates for new search key images.

The image processing 6004 of adding a mask, the image processing 6005 of adding sunglasses or glasses, and the image processing 6006 of changing the person's orientation may be applied to any of the first search key image, the images resulting from the processing 6002 of selecting the same person from the search results, and the images resulting from the processing 6003 of selecting the same person before and after a search result. Any one of the three kinds of image processing, any two of them, or all three may be applied to a target image. Image processing other than the above, such as changing the brightness of the target image, may also be applied.
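Two of the simpler transformations mentioned above, the left-right flip and the brightness change, can be sketched as below. This is only an illustration under stated assumptions: images are modeled as 8-bit grayscale nested lists, and the function names and the brightness offsets of -40 and +40 are hypothetical. Adding or removing masks and glasses would need far more elaborate processing and is not attempted here.

```python
def flip_horizontal(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def adjust_brightness(image, delta):
    """Shift every pixel by `delta`, clamping to the 8-bit range 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def expand_candidates(image, do_flip=True, brightness_deltas=(-40, 40)):
    """Return the original image plus its derived search key image candidates."""
    candidates = [image]
    if do_flip:
        candidates.append(flip_horizontal(image))
    for delta in brightness_deltas:
        candidates.append(adjust_brightness(image, delta))
    return candidates

face = [[10, 20, 30],
        [40, 50, 250]]
for candidate in expand_candidates(face):
    print(candidate)
```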

Next, in the clustering processing 6007, the images made search key image candidates by the preceding processing 6001 to 6006 are clustered, and an image (or its feature amount) representing each cluster is obtained. A known technique such as the k-means method can be used for clustering. As the image representing a cluster, for example, the image closest to the mean of the feature amounts of the images in that cluster is used, and its feature amount becomes a new search key. Alternatively, the mean of the feature amounts of the images in the cluster may itself be used as the new search key.

FIG. 11(e) illustrates how the new search key image candidates obtained by the preceding processing 6001 to 6006 are divided into clusters by the clustering processing 6007, together with the feature amount of the image representing each cluster. In FIG. 11(e), three clusters are enclosed by frame lines, and the feature amounts P11, P12, and P13 of the images closest to the centroid of each cluster are selected as the feature amounts of the representative images.

In the search processing 6008 using representative search keys, a similar image search is performed with the feature amount of the image representing each cluster obtained by the clustering processing 6007 as a new search key, and the results are output.

In the example of FIG. 11(e), there are 29 images related to the first search key image (the images obtained by processing 6001 to 6006). Conventionally, the search would therefore be repeated 29 times, once with the feature amount of each image as a new search key. In this embodiment, the similar face image search is instead performed with the feature amounts of the three images representing the clusters obtained by the clustering processing 6007, so that three searches suffice while the feature amounts remain balanced. The number of clusters is three here, but it can be changed by a setting.
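The reduction from many candidates to a few representative search keys can be sketched with a plain k-means (Lloyd's algorithm) followed by picking, for each cluster, the candidate nearest the centroid. This is a hedged sketch: the initialization from the first k points, the iteration limit, the function names, and the toy 2-D data are assumptions made for illustration and are not taken from the embodiment.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; initial centroids are the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    return centroids, clusters

def representative_keys(points, k):
    """For each non-empty cluster, return the candidate closest to its centroid."""
    centroids, clusters = kmeans(points, k)
    return [min(c, key=lambda p: dist(p, m)) for m, c in zip(centroids, clusters) if c]

# Nine toy 2-D candidate features forming three groups.
candidates = [(0, 0), (10, 10), (20, 0), (0, 1), (1, 0),
              (10, 11), (11, 10), (20, 1), (21, 0)]
keys = representative_keys(candidates, k=3)
print(len(candidates), "candidates reduced to", len(keys), "search keys:", keys)
```

Returning an actual candidate rather than the centroid corresponds to the first option in the text; returning `centroids` directly would correspond to using the mean feature amount as the search key.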

Next, the screen of the similar face image detection operation unit 103 of the editing device 14 is described with reference to FIG. 13. FIG. 13 illustrates a search screen that can be used in the similar face image search system of this example.

The search screen has a playback image display area 3001, an image playback operation area 3003, a search key image specification area 3004, a search narrowing parameter specification area 3008, a search execution area 3017, and a search result display area 3020.

The playback image display area 3001 is an area that displays images recorded in the recording device 12 (or the similar face image detection device 16) as a moving image. The moving image 3002 in the playback image display area 3001 is the display of the images recorded in the recording device 12 as a moving image.

The image playback operation area 3003 is an area for controlling playback of the images recorded in the recording device 12. Each button in this area 3003 is assigned its own playback type. The figure shows an example in which rewind, reverse playback, stop, forward playback, and fast-forward are assigned in order from the left. When the user presses a button with the mouse 282, the moving image 3002 switches to the playback type assigned to that button.

The search key image specification area 3004 is an area for specifying and displaying the search key image. This area 3004 has a search key image 3005, a video designation button 3006, and a file designation button 3007.

The search key image 3005 is the image used as the first search key image for the similarity search. In the initial state, no search key image has been specified, so no image is displayed. When none is specified, this may be indicated, for example by displaying a separately prepared image that represents the unspecified state.

The video designation button 3006 is a button that, when pressed, specifies the image currently displayed in the playback image display area 3001 as the search key image 3005.

The file designation button 3007 is a button for specifying, as the search key image 3005, an image other than those recorded in the recording device 12, for example an image taken with a digital still camera or an image captured with a scanner. Pressing this button 3007 displays a dialog box for specifying such an image by file, in which the user can specify the desired image.

The search narrowing parameter specification area 3008 is an area for specifying the types of narrowing parameters used in a search and their values (ranges). This area 3008 has imaging device specification check boxes 3009, 3010, 3011, and 3012, time code specification check boxes 3013 and 3014, and time code specification fields 3015 and 3016.

The imaging device specification check boxes 3009, 3010, 3011, and 3012 specify the imaging devices (such as the camera 10) to be searched. When one of these check boxes is pressed, a check mark indicating that it has been selected is displayed. Pressing it again hides the mark, and each press toggles the mark between shown and hidden.

The time code specification check boxes 3013 and 3014 specify the time range to be searched. Their display behavior is the same as that of the other check boxes. When the time code specification check box 3013 is selected, a start time is applied to the time range. When it is not selected, no start time is applied, which means that the search range extends back to the image with the oldest time recorded in the recording device 12.

Similarly, when the time code specification check box 3014 is selected, an end time is applied to the time range. When it is not selected, no end time is applied, which means that the search range extends up to the image with the newest time recorded in the recording device 12.

The time code specification fields 3015 and 3016 are input fields for entering the start time and end time described above. In the initial state, the entire time span is searched, so the time code specification check boxes 3013 and 3014 are both unselected and the time code specification fields 3015 and 3016 are empty.
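The optional start and end bounds can be sketched as a filter in which an unchecked box corresponds to `None`. The sketch assumes time codes of the form HH:MM:SS:FF, a non-drop-frame rate of 30 frames per second, and inclusive bounds; none of these details, nor the function names, come from the text.

```python
def timecode_to_frames(tc, fps=30):
    """Convert 'HH:MM:SS:FF' to a frame count (non-drop-frame, assumed 30 fps)."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def in_range(tc, start=None, end=None, fps=30):
    """True if `tc` lies within the inclusive range; a bound of None means unbounded."""
    t = timecode_to_frames(tc, fps)
    if start is not None and t < timecode_to_frames(start, fps):
        return False
    if end is not None and t > timecode_to_frames(end, fps):
        return False
    return True

# Both check boxes off: every recorded image is searched.
print(in_range("11:00:00:00"))
# Only the start-time check box on.
print(in_range("11:00:00:00", start="12:30:20:17"))
```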

The search execution area 3017 is an area for instructing execution of a search. In addition to the similar person search button 3018 and the appearance event search button 3019, this area 3017 has a similar person search from results button 3300, a same scene check box 3201, a mask check box 3202, a sunglasses check box 3203, and a different angle check box 3204.

The similar person search button 3018 is a button for instructing execution of a similar person search with the search key image 3005 (the search processing 6001 using the first key image). If parameters have been specified in the search narrowing parameter specification area 3008, the similar person search is executed according to those parameters.

The appearance event search button 3019 is a button for instructing execution of an appearance event search. If parameters have been specified in the search narrowing parameter specification area 3008, the appearance event search is executed according to those parameters.

The search result display area 3020 is an area that displays the search results, shown as a list of search result images. In the initial state, nothing is displayed in the search result display area 3020.

Suppose that the user presses the video designation button 3006, presses the imaging device specification check boxes 3009, 3010, and 3012, presses the time code specification check boxes 3013 and 3014, and enters "15:30:20:17" and "12:30:20:17" in the time code specification fields 3015 and 3016, respectively.

As a result, as shown in FIG. 13, the image of the person "Person A" displayed in the moving image 3002 is specified as the search key image in the search key image 3005; three imaging devices 201, "Camera 1", "Camera 2", and "Camera 4", are specified as the search targets; and "from 15:30:20:17 to 12:30:20:17" is specified as the time range to be searched.

Suppose that the user then presses the similar person search button 3018. The search result display area 3020 then displays the results obtained by executing a similar person search with the search key image 3005. FIG. 13 shows an example of the search screen in this state. The search results are displayed as a list of search result images (in this example, search result images 3031 to 3141).

The search result images 3031 to 3141 are displayed in order of similarity to the search key image 3005, for example from left to right in the top row and then from left to right in the second row. In this display example, the search result image 3031 has the highest similarity to the search key image 3005 and the search result image 3141 has the lowest.

In the notation of the example in this figure, the circles and letters drawn on the search result images 3031 to 3141 in the search result display area 3020 are simplified representations of a person's face and name. For example, the search result image 3031 shows that the person "Person A" appears in it. In an actual system, real images are of course displayed in place of these simplified representations.

Around the search result image 3031 are a cue playback button 3032, a search key image designation button 3033, and a search target check box 3301. The same applies to the other search result images 3041 to 3141.

The cue playback button 3032 is a button for instructing the start of continuous moving image playback beginning at the search result image 3031. For example, when the cue playback button 3032 is pressed, the moving image 3002 switches to the search result image 3031, and the user can view the moving image starting from that image.

The search key image designation button 3033 is a button for specifying the search result image 3031 as a new search key image. For example, when the search key image designation button 3033 is pressed, the search result image 3031 is displayed as the search key image 3005. A new search can then be performed with the search result image 3031.

The search target check box 3301 is a check box for specifying the search result image 3031 as a new search key image (or a candidate for one) when the similar person search from results button 3300 is pressed. For example, by checking all the images of "Person A" that appear in the search results (in this example, search result images 3031 to 3061, 3081, 3091, 3121, and 3141) and pressing the similar person search from results button 3300, "Person A" can be searched for in a variety of patterns.

The similar person search from results button 3300 is a button for instructing execution of a further similar person search (the search processing 6008 using representative search keys) based on the results of the similar person search with the search key image 3005. In this further search, the images selected by the user (those whose search target check boxes are checked) from the display in the search result display area 3020 (the results of the search processing 6001 using the first key image) are used as new search key images (or candidates for them), and the similar person search is executed again.

The same scene check box 3201 specifies that the processing 6003 of selecting the same person before and after a search result is to be executed on the images selected by the user from the display in the search result display area 3020, and that the resulting images (preceding and following images showing the same person as in the target image) are to be added to the new search key image candidates.

The mask check box 3202 specifies that the image processing 6004 of adding a mask is to be executed on the images selected by the user from the display in the search result display area 3020, and that the resulting images (images in which a mask has been added to, or removed from, the person in the target image) are to be added to the new search key image candidates.

The sunglasses check box 3203 specifies that the image processing 6005 of adding sunglasses or glasses is to be executed on the images selected by the user from the display in the search result display area 3020, and that the resulting images (images in which sunglasses or the like have been added to, or removed from, the person in the target image) are to be added to the new search key image candidates.

The different angle check box 3204 specifies that the image processing 6006 of changing the person's orientation is to be executed on the images selected by the user from the display in the search result display area 3020, and that the resulting images (images in which the orientation of the person in the target image has been changed) are to be added to the new search key image candidates.

When the similar person search from results button 3300 is pressed with one or more of the check boxes 3201 to 3204 checked, the image processing corresponding to each checked check box is executed on each image selected by the user from the display in the search result display area 3020, and the images generated as a result are added to the new search key image candidates. The clustering processing 6007 is then executed on the new search key image candidates to obtain a search key image representing each cluster, and a similar image search is executed with the feature amount of each representative image as a search key.
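The way the checked boxes determine which processing runs on the selected images can be sketched as below. This is a hypothetical outline only: images are stood in for by string labels, the processing steps are placeholder functions, and the names `build_key_candidates` and `processors` are not from the text. Clustering and the final search would follow on the returned list.

```python
def build_key_candidates(selected_images, checked, processors):
    """Collect the selected images plus the outputs of every checked processing step.

    selected_images: images ticked in the result list
    checked:         set of check box names that are on
    processors:      dict mapping a check box name to a function image -> list of images
    """
    candidates = list(selected_images)
    for name, process in processors.items():
        if name in checked:
            for image in selected_images:
                candidates.extend(process(image))
    return candidates

# Placeholder steps that operate on string labels instead of real images.
processors = {
    "same_scene": lambda img: [img + "_prev", img + "_next"],
    "mask": lambda img: [img + "_mask"],
    "sunglasses": lambda img: [img + "_sunglasses"],
    "angle": lambda img: [img + "_flipped"],
}
candidates = build_key_candidates(["A1", "A2"], {"mask", "angle"}, processors)
print(candidates)
```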

As described above, the example has a configuration that includes search key determination means, which determines the feature amount of the image to be used as a search key on the basis of the feature amounts of a plurality of images that are search key candidates, and search means, which searches for images having feature amounts similar to the feature amount of the image determined as the search key by the search key determination means. In this configuration, the search key determination means clusters the feature amounts of the plurality of candidate images and, for each cluster, determines the feature amount of the image representing that cluster as a search key, and the search means performs a search with each of the per-cluster search keys determined by the search key determination means.

In this example, the search key determination means is realized by the function of the multiple search key selection unit 113 of the similar face image detection operation unit 103 of the editing device 14, and the search means is realized by the function of the similar person search unit 218 of the similar face image detection device 16. However, the search key determination means and the search means may be realized in other ways.

Next, an example in which the similar person search processing (similar face detection processing) described above is applied to editing is described with reference to FIGS. 14 to 16.

As described above, in the conventional flow for finding a performer's appearance scenes (appearance video), the person in charge (an editor or the like) searches for the performer's information on a management terminal, which displays a list of the programs in which the performer appears and the numbers of the VTR tapes on which those programs are recorded. The person in charge then takes the VTR tapes with the listed tape numbers from the shelves and plays them on a VTR player, visually checks the playback video to find the appearance scenes, and records the time code information of those scenes. This flow needed improvement in terms of work efficiency and accuracy. A technique based on the following flow is therefore introduced.

FIG. 14 shows the procedure for storing face images in the face image storage server 127 when the original video is recorded on media 5 (optical media 5a, magnetic media 5b, and VTR tape 5c). The procedure up to locating the media 5 is the same as before.

When the original video is recorded on optical media 5a or magnetic media 5b, the video file is read from the located media (optical media 5a or magnetic media 5b) and played back on the similar face image detection device 16. Using the similar person search technique described above, only the face portions are cut out of the playback video, and the cut-out face images are saved in the face image storage server 127 together with time code information.

The stored face images are not limited to one type (generally the frontal face). Multiple face types (frontal, profile, oblique, back of the head, smiling, angry, and so on) can be registered and saved as detection target face images, and each face image is recorded in association with its face type. Preparing multiple detection target face images, and in particular multiple face images of different types, makes it possible to detect video in which a specific performer appears more accurately, and also to detect a particularly desired situation within that performer's video (for example, when video of a smiling face is wanted). If the performer's name is known when a face image is stored, the name may also be registered. When multiple face images of the same performer are recorded in the face image storage server 127, a face image serving as a reference (reference face image) may be designated. The reference face image is not limited to one, but from the viewpoint of workability it may be set, for example, at one per face type or one per predetermined appearance period (for example, every five years).
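One possible shape for a stored entry is sketched below. The record layout is an assumption made for illustration: the field names, the `FaceRecord` class, the `find_records` helper, and the sample values are hypothetical and do not describe the actual schema of the face image storage server 127.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaceRecord:
    image_path: str                  # location of the cut-out face image
    timecode: str                    # time code of the source frame, e.g. "00:12:03:15"
    face_type: str                   # e.g. "front", "profile", "smiling"
    performer: Optional[str] = None  # registered only when the name is known
    is_reference: bool = False       # marks a reference face image

def find_records(records, performer, face_type=None):
    """Return records for a performer, optionally restricted to one face type."""
    return [r for r in records
            if r.performer == performer
            and (face_type is None or r.face_type == face_type)]

records = [
    FaceRecord("faces/0001.png", "00:12:03:15", "front", "Performer A", True),
    FaceRecord("faces/0002.png", "00:12:40:02", "smiling", "Performer A"),
    FaceRecord("faces/0003.png", "00:20:11:29", "front"),
]
smiling = find_records(records, "Performer A", face_type="smiling")
print([r.timecode for r in smiling])
```

Keeping the face type alongside each image is what would allow a request such as "smiling scenes only" to be answered without re-analyzing the video.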

When the original video is recorded on VTR tape 5c, the located VTR tape 5c is played on a VTR playback device (media playback device 19) and the output is captured by the similar face image detection device 16. As with optical media 5a and magnetic media 5b, the similar face image detection device 16 uses the similar person search technique to cut out only the face regions from the captured playback video, and stores each cut-out face image in the face image storage server 127 together with its time code information.

FIG. 15 shows the procedure for storing face images in the face image storage server 127 when the original video is recorded on the low resolution server 126.

When the original video is recorded on the low resolution server 126 and the person in charge searches for a performer's information on the management terminal 17, the programs in which the performer appears and the names of the video files on the low resolution server 126 that hold those programs are output. This information is passed as is, online (that is, via the network 2), to the similar face image detection device 16. The video file is then retrieved from the low resolution server 126 and played back by the similar face image detection device 16; only the face regions are cut out of the played-back video, and each cut-out face image is stored in the face image storage server 127 together with its time code information.

FIG. 16 shows similar face detection performed against the face image storage server 127 with the target performer's face image as the detection target.

The editor has the similar face image detection device 16 read the face image file of the target performer (the detection target face image). The detection target face image may be an image extracted as a representative face image from the video file being edited, a face image selected from those held in the face image storage server 127, or an image taken from the web. The similar face image detection device 16 compares the detection target face image with the face images in the face image storage server 127 and retrieves the face images of scenes in which a performer with the same face appears, together with their time code information.
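The comparison step can be pictured with a small sketch. The text does not state how two face images are compared, so the cosine similarity measure, the threshold value, and the function names below are assumptions chosen only for illustration; each stored face is represented as a hypothetical (feature vector, time code) pair.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_appearances(target, stored, threshold=0.9):
    """Return the time codes of stored faces that resemble the target face.

    target:    feature vector of the detection target face image
    stored:    iterable of (features, timecode) pairs
    threshold: assumed minimum similarity for a face to count as a match
    """
    hits = [tc for features, tc in stored
            if cosine_similarity(target, features) >= threshold]
    # Zero-padded "HH:MM:SS:FF" strings sort in chronological order.
    return sorted(hits)
```

The returned time codes are the only information an editing step needs in order to locate the matching scenes.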

The time code information of the appearance scenes detected here is passed to the editing device 14. The editor is spared the work of searching for scenes in which the target performer appears, and can produce a special feature program on the performer or apply a mosaic to the performer.
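One way the detected time codes might be prepared before being handed to an editing step is to merge nearby detections into continuous scenes. The sketch below is an assumption, not a format described in the text: it treats time codes as non-drop-frame "HH:MM:SS:FF" strings at 30 frames per second, and the gap tolerance and function names are hypothetical.

```python
def tc_to_frames(tc, fps=30):
    """Convert a non-drop-frame "HH:MM:SS:FF" time code to a frame count."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f


def frames_to_tc(n, fps=30):
    """Convert a frame count back to a "HH:MM:SS:FF" time code."""
    s, f = divmod(n, fps)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


def merge_into_scenes(timecodes, max_gap_frames=60, fps=30):
    """Group detections whose spacing is at most max_gap_frames into scenes.

    Returns a list of (start_timecode, end_timecode) pairs.
    """
    scenes = []
    for n in sorted(tc_to_frames(tc, fps) for tc in timecodes):
        if scenes and n - scenes[-1][1] <= max_gap_frames:
            scenes[-1][1] = n          # extend the current scene
        else:
            scenes.append([n, n])      # start a new scene
    return [(frames_to_tc(a, fps), frames_to_tc(b, fps)) for a, b in scenes]
```

Each resulting pair marks a span that could be cut, collected into a feature program, or covered with a mosaic.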

When the editor wants to preview the detected appearance scenes without using the editing device 14, the scenes can easily be previewed by playing back the video files on the low resolution server 126.

With this similar face detection processing, when, for example, scenes featuring a target performer must be found in a broadcast station's enormous archive of past video, the similar face image detection device 16 detects the appearance scenes automatically. The editor therefore no longer needs to watch the video on the media 5 (optical media 5a, magnetic media 5b, VTR tape 5c) closely. The editor can do other work in the meantime, which greatly improves the editor's efficiency.

Because the number of editing devices 14 is limited, when no editing device 14 is available the editor can search for the target performer's appearance scenes beforehand and preview them using the video files on the low resolution server 126, so that preparatory work can be done before editing.

If a performer causes a problem just before broadcast, after program editing has been completed, and that performer can no longer be broadcast, the technique described above makes it easy to find the performer's appearance scenes and either apply a mosaic to the performer or cut the scenes, which helps prevent complaints from sponsors and viewers.

In the processing described above, performers are detected in a broadcast station's past video. However, a performer's face changes over the decades after a video was recorded, so using the target performer's current face image as the detection target is likely to reduce detection accuracy. To address this, detection is first run with the current face image as the detection target. The detection target is then replaced with a face image obtained as a result of that detection (a past face image, detected with reduced accuracy), which is newly re-registered as the detection target face image, and similar face image detection is run again. This improves detection accuracy. In other words, improved accuracy can be expected from two-step detection (re-registering new information for the reference face (reference face image), then similar face image search).
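The two-step detection can be sketched as below. The text does not say which first-pass result is re-registered, so picking the oldest hit is an assumption made here for illustration, as are the Euclidean distance measure, the dictionary keys (`features`, `year`, `timecode`), and the function names.

```python
import math


def search(query, stored, max_dist):
    """Return stored records whose features lie within max_dist of the query."""
    return [rec for rec in stored
            if math.dist(query, rec["features"]) <= max_dist]


def two_step_search(current_face, stored, max_dist):
    """Search with the current face, re-register a past hit, search again.

    Returns the sorted time codes found by either pass.
    """
    first = search(current_face, stored, max_dist)
    if not first:
        return []
    # Assumption: re-register the oldest face found in the first pass
    # as the new detection target.
    new_target = min(first, key=lambda rec: rec["year"])
    second = search(new_target["features"], stored, max_dist)
    return sorted({rec["timecode"] for rec in first + second})
```

In this toy setting the second pass reaches older footage that lies too far from the current face to be matched directly.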

By passing the time code information of the detected appearance video to the editing machine, the editor can then apply a mosaic to the video in which the performer appears, or cut that video.

Face images whose appearance periods (shooting periods) are close to each other can be expected to show similar feature values, so by following similar feature values step by step, face images from a distant appearance period can also be detected. Likewise, when profile footage is wanted, re-registering a profile as the detection target face image and running similar face image detection makes it possible to narrow down the detected appearance scenes.
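Following similar feature values from one period to the next can be pictured as a repeated search in which every newly found face becomes a query in turn. This is only a sketch under assumptions: the breadth-first expansion, the Euclidean distance, the mapping from time code to feature vector, and the name `chain_search` are all hypothetical.

```python
import math
from collections import deque


def chain_search(start, stored, max_dist):
    """Collect faces reachable from start through a chain of similar faces.

    start:    feature vector of the initial detection target
    stored:   dict mapping time code -> feature vector
    max_dist: assumed maximum distance between two faces judged similar
    Returns the sorted time codes of every face reached.
    """
    found = set()
    queue = deque([start])
    while queue:
        query = queue.popleft()
        for tc, features in stored.items():
            if tc not in found and math.dist(query, features) <= max_dist:
                found.add(tc)
                queue.append(features)  # use this face as a later query
    return sorted(found)
```

A real system would need to guard against drifting onto a different person, for example by limiting the number of steps or tightening the distance.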

By passing the time code information of the detected appearance scenes to the editing machine, the editor can then create a special feature program that compiles only the scenes in which the performer appears.

At present, broadcast stations commonly record the edited video on optical media 5a and either play it on a playback device for broadcast output or ingest it from the optical media 5a into the transmission server 18 for broadcast output. Accordingly, the video file is taken from the optical media 5a and played back in the similar face image detection device 16 (similar face detection device); only the face regions are cut out of the played-back video and stored in the face image storage server 127 together with their time codes. By then running similar face detection with the target performer's face image as the detection target, the person in charge can find appearance scenes without viewing the video. Here, preparing multiple detection target face images, such as frontal, profile, and oblique faces, allows scenes in which the desired performer appears to be detected more accurately.

The present invention has been described above on the basis of an embodiment. The embodiment is illustrative; those skilled in the art will understand that various modifications of the combinations of its constituent elements are possible and that such modifications also fall within the scope of the present invention.

1 video editing system
2 network
5 media
5a optical media
5b magnetic media
5c VTR tape
10 camera
11 recording device (capture)
12 recording device (storage)
13 automatic editing information creation device
14 editing device
15 system control unit
16 similar face image detection device
17 management terminal
18 transmission server
19 media playback device
103 similar face image detection operation unit
110 keyword recording unit
111 keyword search unit
112 keyword assignment request transmission unit
113 multiple search key selection unit
121 material video data unit
122 edited video data unit
123 automatically edited video data unit
124 final editing information unit
125 automatic editing information unit
126 low resolution server
127 face image storage server
128 processing target information unit
131 processing target recognition unit
132 information storage unit
141 editing control unit
142 display unit
143 operation panel
144 touch panel display
210 image transmission/reception unit
211 image recording unit
212 playback control unit
213 person region detection unit
214 person feature extraction unit
215 person feature recording unit
216 attribute information recording unit
217 request reception unit
218 similar person search unit
219 appearance event search unit
220 search result transmission unit
221 search request transmission unit
222 search result reception unit
223 search result display unit
224 playback image display unit
225 screen operation detection unit

Claims

An editing system provided with an editing device for editing a video file used for broadcasting, comprising:
a face image storage server that acquires face images of performers included in the video file and records each face image in association with time code information of that performer's appearance video;
an appearance video detection unit that detects appearance video in a specific program by comparing the face images recorded in the face image storage server with a search target face image included in the video file of the specific program; and
a similar face image detection device that, based on the appearance video detected by the appearance video detection unit, detects by similar face image search other appearance video in the specific program in which the person of the search target face image appears, and notifies the editing device of the time code information in association with information on the performer who is the search target,
wherein the video file of the specific program is edited using the time code information.

The editing system according to claim 1, wherein, when editing the video file, if the video file of the performer is stored in a recording device that is a processing target of the similar face image detection device, the editing device plays back the video file in the recording device and displays the appearance video so that it can be checked.

The editing system according to claim 1 or 2, wherein the editing device plays back the detected appearance video using low resolution video.

The editing system according to any one of claims 1 to 3, wherein the face image storage server can store a detection target face image in association with its face type, and
the similar face image detection device performs the similar face image search according to the face type.