JP6934402B2

JP6934402B2 - Editing system

Info

Publication number: JP6934402B2
Application number: JP2017219011A
Authority: JP
Inventors: 治彦小島
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2017-11-14
Filing date: 2017-11-14
Publication date: 2021-09-15
Anticipated expiration: 2037-11-14
Also published as: JP2019092025A

Description

The present invention relates to an editing system, and more particularly to an editing system that assists editing for program production. For example, at a broadcasting station that accumulates video, when a celebration program, a memorial program, or the like is to be produced, the system detects the scenes in which a specific performer appears from past video.

Conventionally, past video assets were recorded on VTR tapes, and a huge number of VTR tapes were stored in warehouses. Each VTR tape was labeled with a tape number, and for each tape number, information such as the program name, the performers, and an outline of the program content recorded on the tape was managed. Therefore, when video of a specific performer was needed, the person in charge at the broadcasting station used this tape management information to identify the VTR tapes on which programs featuring that performer were recorded.

For example, Patent Document 1 proposes a related program editing technique in which original material information data, which describes the relationship between the original editing material and the edited material, is created, and re-editing is then performed using the edited material, the project data, and the original material information data.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2012-34218

However, conventionally, identifying the scenes on a VTR tape in which a performer appears required playing the tape on a VTR device and visually searching for the performer's appearance scenes. When an appearance scene was found, its time code information was written down by hand and used for editing. A new technique was therefore needed to improve work efficiency and accuracy.

In recent years, practice has been shifting toward dubbing video assets from VTR tapes onto magnetic media such as LTO tapes or optical media such as Blu-ray Discs (registered trademark) and storing them as video files on these media. However, finding appearance scenes still requires playing back and visually inspecting the video files on these media, so the same problem remains.

In addition, if, after editing of a program is complete, a performer causes a problem just before broadcast so that the performer can no longer be shown, the program must be re-edited either to apply a mosaic to the performer or to cut the appearance scenes. To find those scenes for re-editing, the editor had to play back the edited video and search for them visually. This case presents the same problem.

The present invention has been made in view of these circumstances, and an object of the present invention is to solve the above problems.

The present invention is an editing system including an editing device that edits video files used for broadcasting. The system includes: a face image storage server that acquires face images of performers contained in the video files and records each face image in association with time code information of that performer's appearance video; an appearance video detection unit that compares the face images recorded in the face image storage server with a face image to be searched for, contained in the video file of a specific program, and detects appearance video in the specific program; and a similar face image detection device that, based on the appearance video detected by the appearance video detection unit, detects appearance video outside the specific program in which the person of the searched-for face image appears, and notifies the editing device of the time code information of the detected appearance video in association with information on the searched-for performer. This detection uses a similar face image search that judges a face image in appearance video outside the specific program to show the same person as the face image of the person in the specific program when the distance between their feature amounts is smaller than a preset threshold. The editing device edits the video file of the specific program using the time code information.
In addition, when editing a video file, if the performer's video file is stored in the recording device processed by the similar face image detection device, the editing device may play back the video file on that recording device so that the appearance video can be checked visually.
The editing device may also play back the detected appearance video using low-resolution video.
Further, the face image storage server may store a face image to be detected in association with a face type, and the similar face image detection device may perform the similar face image search according to the face type.
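The threshold test described above can be illustrated with a minimal Python sketch. It assumes each stored face is a record holding a feature vector, a face type label, a program ID, and a time code, and it uses Euclidean distance; the patent specifies neither the distance metric nor the record layout, and the names `FaceRecord` and `find_same_person` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FaceRecord:
    # Hypothetical layout of one entry in the face image storage server.
    program_id: str
    timecode: str          # e.g. "01:02:03:04" (HH:MM:SS:FF)
    face_type: str         # e.g. "front", "profile"
    features: list[float]  # feature amounts extracted from the face image


def find_same_person(query, store, threshold, exclude_program, face_type=None):
    """Return records outside `exclude_program` judged to show the same person.

    A record matches when the Euclidean distance between its features and
    `query` is smaller than `threshold`. If `face_type` is given, only records
    of that face type are compared. Results are sorted closest first.
    """
    hits = []
    for rec in store:
        if rec.program_id == exclude_program:
            continue  # only appearances outside the specific program
        if face_type is not None and rec.face_type != face_type:
            continue
        d = math.dist(query, rec.features)  # raises if dimensions differ
        if d < threshold:
            hits.append((d, rec))
    hits.sort(key=lambda pair: pair[0])
    return [rec for _, rec in hits]
```

Passing `face_type` narrows the comparison to one face type, which is one possible reading of "searching according to the face type"; the patent does not say whether types are filtered or weighted.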

According to the present invention, it is possible to provide a technique that makes it easy to detect the face image of a person being searched for in a program (in video data) and to perform editing such as mosaic processing effectively.

FIG. 1 is a block diagram showing the schematic configuration of the video editing system according to the embodiment.
FIG. 2 is a block diagram showing the schematic configuration of the recording device according to the embodiment.
FIG. 3 is a block diagram showing the schematic configuration of the similar face image detection device according to the embodiment.
FIG. 4 is a block diagram showing the schematic configuration of the editing device according to the embodiment.
FIG. 5 is a functional block diagram showing the schematic configuration of the automatic editing information creation device according to the embodiment.
FIG. 6 is a flowchart showing an example of the editing process according to the embodiment.
FIG. 7 is a flowchart showing an example of the editing process according to the embodiment.
FIG. 8 is a flowchart showing an example of the editing process according to the embodiment.
FIG. 9 is a flowchart showing an operation example of the processing target recognition unit according to the embodiment.
FIG. 10 is a diagram showing an example of the display method (touch panel display) on the editing device according to the embodiment.
FIG. 11 is a diagram showing an example of the feature amounts of images that became search key image candidates according to the embodiment.
FIG. 12 is a flowchart showing the procedure for performing a similar person search (similar face image detection process) according to the embodiment.
FIG. 13 is a diagram showing an example of a search screen usable in the similar face image search system according to the embodiment.
FIG. 14 is a diagram showing an example procedure for storing face images in the face image storage server according to the embodiment.
FIG. 15 is a diagram showing an example procedure for storing face images in the face image storage server according to the embodiment.
FIG. 16 is a diagram showing an example in which similar face images are detected from the face image storage server using the face image of a target performer as the detection target according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The outline of this embodiment is as follows.
(1) From the huge volume of past video accumulated at the broadcasting station, cut out only the face images of as many performers as possible and store them in the face image storage server together with the time code information of the appearance scenes.
(2) Compare the stored face images with the face image of the target performer to detect that performer's appearance scenes.
(3) Using the face images from the detected appearance scenes, narrow down similar appearance scenes by the similar face image detection process.
(4) Pass the time code information of the detected appearance scenes to the editing machine, making it easy to produce a special program featuring that performer.
(5) Easily play back the detected appearance scenes using low-resolution video.
(6) If a performer's video can no longer be broadcast just before airtime, identify the scenes in which that performer appears and edit them (mosaic, cut, etc.).
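Step (4) hands time code information to the editing machine. A minimal sketch of one way to prepare it, assuming detections arrive as frame numbers at a fixed integer frame rate (non-drop-frame) and that detections separated by a small gap belong to the same scene; the gap tolerance, the frame rate, and the function names are hypothetical, not taken from the patent.

```python
def frames_to_timecode(frame, fps=30):
    """Convert a frame number to a non-drop-frame HH:MM:SS:FF string.

    Drop-frame time code (e.g. for 29.97 fps) is not handled here.
    """
    ff = frame % fps
    total_seconds = frame // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def merge_into_scenes(frames, max_gap=15):
    """Group detected frame numbers into (start, end) scenes.

    Detections at most `max_gap` frames apart are treated as one scene.
    """
    scenes = []
    for f in sorted(set(frames)):
        if scenes and f - scenes[-1][1] <= max_gap:
            scenes[-1][1] = f  # extend the current scene
        else:
            scenes.append([f, f])  # start a new scene
    return [(s, e) for s, e in scenes]


def scene_list_for_editor(frames, fps=30, max_gap=15):
    """Return (in, out) time code pairs for each detected scene."""
    return [(frames_to_timecode(s, fps), frames_to_timecode(e, fps))
            for s, e in merge_into_scenes(frames, max_gap)]
```

Merging before conversion keeps the list handed to the editor short: one in/out pair per scene rather than one entry per detected frame.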

FIG. 1 is a block diagram showing the schematic configuration of the video editing system 1 according to the present embodiment. The video editing system 1 includes a camera 10, a capture device 11, a recording device 12 (video server), an automatic editing information creation device 13, an editing device 14, a management terminal 17, a transmission (playout) server 18, and a system control unit 15, which are connected by a network 2 such as a LAN or another communication line. The system control unit 15 controls the entire video editing system 1 and may be configured as a standalone unit or incorporated into another device (such as the recording device 12 or the editing device 14).

The camera 10 digitizes images captured by a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor or the like and outputs the resulting image data (for example, material video data conforming to the HD-SDI standard) through the capture device 11 and over the network 2 to the recording device 12 (video server), which stores it. The automatic editing information creation device 13 is connected to the recording device 12 via the network 2; material video data is first input to the automatic editing information creation device 13 and then stored in the recording device 12. However, material video data may also be input directly to the recording device 12 and stored there without passing through the automatic editing information creation device 13.

FIG. 2 is a block diagram showing the schematic configuration of the recording device 12. The recording device 12 has a recording function, a similar face image detection function (similar face image detection device 16), and a data storage function (121 to 127).

The data storage function of the recording device 12 will now be described. The recording device 12 includes a material video data unit 121 that records material video data, an edited video data unit 122 that records edited video data, an automatically edited video data unit 123 that records automatically edited video data, a final editing information unit 124 that records final editing information, an automatic editing information unit 125 that records automatic editing information, a low-resolution server 126 that records low-resolution files, and a face image storage server 127 that records and accumulates the face images contained in the video.

The low-resolution server 126 is provided for the following reason. In general, video files on the media 5 (optical media 5a, magnetic media 5b, VTR tape 5c) must be stored at high image quality, so the files are large and cannot be kept on always-accessible HDD storage. Low-resolution files, by contrast, are small enough to be kept on always-accessible HDD storage for previewing. Therefore, when the media 5 is dubbed, a low-resolution video (low-resolution file) is created at the same time and recorded on the low-resolution server 126.

Next, the similar face image detection device 16, which implements the recording function and the similar face image detection function of the recording device 12, will be described with reference to FIG. 3.

FIG. 3 is a block diagram showing the schematic configuration of the similar face image detection device 16. The similar face image detection device 16 includes an image transmission/reception unit 210, an image recording unit 211, a playback control unit 212, a person area detection unit 213, a person feature amount extraction unit 214, a person feature amount recording unit 215, an attribute information recording unit 216, a request receiving unit 217, a similar person search unit 218, an appearance event search unit 219, a search result transmission unit 220, a keyword recording unit 110, and a keyword search unit 111.

The image transmission/reception unit 210 handles image input and output to and from outside the device: it receives input image data from the camera 10 and other devices and transmits output image data to other devices (such as the editing device 14).

The image recording unit 211 writes input image data to a recording medium and reads output image data from it (for the VTR tape 5c, it is connected to a media playback device 19). When writing, it records, along with the image data, an image ID (image identification information) that is used to read the image data back. The playback control unit 212 controls video playback to the editing device 14.

The person area detection unit 213 performs person detection on input image data using image recognition techniques, determines whether a person is present in the image, and, if so, calculates the coordinates of the person's area. The person area detection unit 213 also identifies the area of the person's face, extracts a face image containing that area, and records it in the face image storage server 127.
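The face extraction step above can be sketched as a crop around a detected bounding box. This assumes the detector (not shown) returns a half-open box `(x0, y0, x1, y1)` and that a small margin is added so the face image "includes" the face area; the margin and the names `expand_box` and `crop_face` are hypothetical, and frames are plain lists of pixel rows to keep the sketch dependency-free.

```python
def expand_box(box, margin, width, height):
    """Enlarge (x0, y0, x1, y1) by `margin` pixels, clamped to the frame.

    Coordinates are half-open: x0 <= x < x1, y0 <= y < y1.
    """
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))


def crop_face(frame, box, margin=0):
    """Cut the face region out of a frame given as a list of pixel rows."""
    height, width = len(frame), len(frame[0])
    x0, y0, x1, y1 = expand_box(box, margin, width, height)
    return [row[x0:x1] for row in frame[y0:y1]]
```

Clamping matters for faces near the picture edge, where an unclamped margin would index outside the frame.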

The person feature amount extraction unit 214 calculates feature amounts for the area detected by the person area detection unit 213 using image recognition techniques. Examples of person feature amounts include the shape and orientation of the person's outline, skin color, and gait (how the legs move: which leg, in what way, and with what timing), or, for the face as the most representative part for identifying a person, the shape and orientation of the face outline and the size, shape, and relative arrangement of main components such as the eyes, nose, and mouth. In this embodiment, any type and number of feature amounts may be used. The person feature amount extraction unit 214 can also determine the face type (frontal, profile, oblique, back of the head, smiling, angry, etc.) as a type of feature amount and associate it with the face image to be detected.

The person feature amount recording unit 215 writes the feature amounts calculated by the person feature amount extraction unit 214 to a recording medium and reads them back. The feature amounts are associated with the face image extracted by the person area detection unit 213 when that face image is recorded in the face image storage server 127. Each face image is associated with the person's name at a predetermined time, either by user input or automatically by a similar face image search.
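The automatic name assignment mentioned above could work by propagating names from labeled faces to unlabeled ones. The sketch below assumes a nearest-neighbor rule with a distance threshold and dict-based records with `"features"` and `"name"` keys; the patent only says names may be assigned "automatically by a similar face image search", so this rule and the name `auto_assign_names` are hypothetical.

```python
import math


def auto_assign_names(faces, threshold):
    """Propagate names from labeled to unlabeled face records.

    `faces` is a list of dicts with keys "features" (list of floats) and
    "name" (str or None). Each unlabeled face takes the name of its nearest
    originally labeled face if that distance is below `threshold`; otherwise
    its name stays None. Returns a new list; the input is not modified.
    """
    labeled = [f for f in faces if f["name"] is not None]
    result = []
    for face in faces:
        rec = dict(face)
        if rec["name"] is None and labeled:
            nearest = min(labeled,
                          key=lambda ref: math.dist(ref["features"], rec["features"]))
            if math.dist(nearest["features"], rec["features"]) < threshold:
                rec["name"] = nearest["name"]
        result.append(rec)
    return result
```

Names propagate only from faces that were labeled before the call, so one wrong match cannot cascade within a single pass.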

The recording medium used for image data by the image recording unit 211 and the recording medium used for person feature amounts by this unit may be the same or separate.

The attribute information recording unit 216 writes attribute information related to the image data to a recording medium and reads it back. Attribute information includes, for example, the image capture time and the imaging device number.

The request receiving unit 217 receives search requests and keyword assignment requests from the editing device 14. Search requests include similar face image search requests and appearance event search requests.

The similar person search unit 218 performs a similar face image search when the request received by the request receiving unit 217 is a similar person search request.

The appearance event search unit 219 performs an appearance event search when the request received by the request receiving unit 217 is an appearance event search request.

The search result transmission unit 220 transmits the similar person search results and appearance event search results obtained from the similar person search unit 218 and the appearance event search unit 219 to the editing device 14.

The keyword recording unit 110 writes keywords to a recording medium, and reads them back, based on keyword assignment requests received by the request receiving unit 217.

The keyword search unit 111 performs a keyword search when the search request data received by the request receiving unit 217 contains a keyword.
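The routing performed by the request receiving unit and the search units above can be summarized as a dispatch table. This is a minimal sketch under stated assumptions: requests are dicts with a `"type"` key, handlers are plain callables, and the type names and `handle_request` are hypothetical; the patent does not define a request format.

```python
def handle_request(request, handlers):
    """Route a request from the editing device to the matching search unit.

    `request` is a dict with a "type" key (e.g. "similar_person",
    "appearance_event", "keyword_assign"); `handlers` maps type names to
    callables. If a search request also carries a "keyword", the
    "keyword_search" handler runs too and its result is returned alongside.
    """
    kind = request["type"]
    if kind not in handlers:
        raise ValueError(f"unsupported request type: {kind}")
    result = {kind: handlers[kind](request)}
    is_search = kind != "keyword_assign"
    if is_search and request.get("keyword") and "keyword_search" in handlers:
        result["keyword_search"] = handlers["keyword_search"](request)
    return result
```

A table of callables keeps the receiving side independent of how each search is implemented, so a new search type only needs a new entry.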

Next, the editing device 14 (editing machine) will be described with reference to FIG. 4, a block diagram showing its schematic configuration. The editing device 14 performs the editing that actually applies rendering and similar processing to the material video data.

The editing device 14 includes an editing control unit (editing means) 141 with a processor that performs the actual editing work; a display unit 142 (display) that shows video based on the material video data and on the video data after editing; an operation panel 143 (operation means) for selecting parts of the images or audio or for entering instructions; and a similar face image detection operation unit 103. The display unit 142 and the operation panel 143 may be integrated as a touch panel display 144.

The editing control unit 141 reads the material video data and the automatic editing information described above from the recording device 12 (automatic editing information unit 125), creates new video data (automatically edited video data) by editing the material video data according to the automatic editing information, and stores the automatically edited video data in the recording device 12 (automatically edited video data unit 123).

However, on the editing device 14, the user can check images based on the automatically edited video data on the display unit 142 and then operate the operation panel 143 to instruct the editing control unit 141 to cancel the processing applied to any part of the automatically edited video data that the user judges inappropriate. The user can also refer to the material video data when doing so.

Similarly, the editing control unit 141 can apply additional processing to the automatically edited video data. The parts to be newly processed are specified by the user, who again checks video based on the automatically edited video data on the display unit 142 and then operates the operation panel 143. These user operations produce final editing information, which is the automatic editing information as rewritten by the user. The final editing information is reflected in the editing of the material video data and, as described later, is also used to update the processing target information.

Similarly, the editing control unit 141 can read material video data directly from the recording device 12, let the user check images based on it on the display unit 142, and let the user operate the operation panel 143 to specify the parts to be processed and apply rendering processing without using the automatic editing information. In this operation, the user can render the material video data independently of the automatic editing information.

In this way, the editing control unit 141 can have the recording device 12 record both the automatically edited video data produced from the automatic editing information and edited video data in which the user has edited the automatically edited video data or the material video data.

The similar face image detection operation unit 103 has, as its functional configuration, the following processing units: a search request transmission unit 221, a search result receiving unit 222, a search result display unit 223, a playback image display unit 224, a screen operation detection unit 225, a keyword assignment request transmission unit 112, and a multiple search key selection unit 113.

The search request transmission unit 221 transmits search requests to the recording device 12. For a similar person search, the search request data includes, as the search key, the person's name, a search key image (in particular a face image), or the feature amounts of that image. The search request data may also include narrowing parameters.
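The patent does not say what the narrowing parameters contain. One plausible reading, sketched here as an assumption, is that they restrict results by the attribute information described earlier (capture time and imaging device number); the result layout and the name `narrow_results` are hypothetical.

```python
from datetime import datetime


def narrow_results(results, start=None, end=None, device_ids=None):
    """Filter search results by hypothetical narrowing parameters.

    Each result is a dict carrying attribute information "captured_at"
    (datetime) and "device_id". A parameter left as None is not applied.
    """
    out = []
    for r in results:
        if start is not None and r["captured_at"] < start:
            continue
        if end is not None and r["captured_at"] > end:
            continue
        if device_ids is not None and r["device_id"] not in device_ids:
            continue
        out.append(r)
    return out
```

Treating None as "no constraint" lets the editing device send only the parameters the user actually set.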

The search result receiving unit 222 receives search results from the recording device 12 (similar face image detection device 16). The data received as search results includes the set of images obtained by the similar person search or appearance event search performed in the recording device 12 (similar face image detection device 16). Each image in the set is generated from video recorded in the recording device 12 (similar face image detection device 16) by image size reduction or similar processing. Hereinafter, these individual images are called "search result images", and the data sent and received as search results is called "search result data".

The search result display unit 223 displays the search results received by the search result receiving unit 222 on screen. An example screen is described later.
The playback image display unit 224 displays image data input from the recording device 12 (similar face image detection device 16) on screen as continuous moving images.
The screen operation detection unit 225 detects and acquires the user's operations.
The keyword assignment request transmission unit 112 transmits keyword assignment requests to the recording device 12 (similar face image detection device 16).
When multiple search key image candidates are selected, the multiple search key selection unit 113 appropriately selects a smaller number of search key images from among them.
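The patent does not state how unit 113 reduces the candidates. As an illustrative substitute, the sketch below uses greedy farthest-point selection, which keeps keys that are mutually dissimilar (for example, different face angles) and drops near-duplicates. The technique choice and the name `select_search_keys` are assumptions, not the patent's method.

```python
import math


def select_search_keys(candidates, k):
    """Pick up to k diverse feature vectors by greedy farthest-point selection.

    Starts from the first candidate, then repeatedly adds the candidate whose
    distance to its nearest already-selected key is largest. Stops early if
    every remaining candidate duplicates a selected key. Returns indices.
    """
    if not candidates or k <= 0:
        return []
    selected = [0]
    # nearest[i] = distance from candidate i to its closest selected key
    nearest = [math.dist(c, candidates[0]) for c in candidates]
    while len(selected) < min(k, len(candidates)):
        best = max(range(len(candidates)), key=lambda i: nearest[i])
        if nearest[best] == 0:
            break
        selected.append(best)
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], math.dist(c, candidates[best]))
    return selected
```

Fewer, more varied keys reduce the number of searches while still covering the different appearances of the person.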

FIG. 5 is a functional block diagram of the automatic editing information creation device 13. The automatic editing information creation device 13 includes a processing target recognition unit 131 and an information storage unit 132. The information storage unit 132 includes a final editing information unit 124, an automatic editing information unit 125, and a processing target information unit 128. The final editing information unit 124 and the automatic editing information unit 125 may be the same units as those in the recording device 12, or may be provided separately.

The automatic editing information creation device 13 reads the material video data, and the processing target recognition unit 131 identifies the parts to which rendering processing should be applied. The processor in the processing target recognition unit 131 makes this identification based on the processing target information stored in the information storage unit 132, and stores information about the parts to be processed and the processing to apply (automatic editing information) in the recording device 12.

Specifically, the automatic editing information describes each part to be processed by its video frame position (time code information) and its coordinates in the picture, or, when the target is audio, by the range of audio sample positions, together with the content of the processing. For video, the processing includes mosaic processing, blur processing, video cuts, and brightness increase or decrease; for audio, it includes muting and volume adjustment. The processing target information also records the reason for processing (for example, that the content is prohibited from broadcast or shows a specific company name).
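To make the fields above concrete, here is a minimal sketch of one automatic editing information entry together with a mosaic (pixelation) pass over a picture region. The record layout, the block size, the grayscale list-of-rows frame, and the names `EditEntry` and `apply_mosaic` are assumptions for illustration; the patent lists the fields but not a format or a mosaic algorithm.

```python
from dataclasses import dataclass


@dataclass
class EditEntry:
    # Hypothetical record for one item of automatic editing information.
    start_tc: str   # time code of the first frame to process
    end_tc: str     # time code of the last frame to process
    region: tuple   # (x0, y0, x1, y1) picture coordinates, half-open
    action: str     # e.g. "mosaic", "blur", "cut", "mute"
    reason: str     # e.g. "broadcast-prohibited", "company name"


def apply_mosaic(frame, region, block=8):
    """Pixelate `region` of a grayscale frame (list of rows) in place.

    Each block x block tile inside the region is replaced by the integer
    (floor) mean of its pixels; tiles at the region edge may be smaller.
    """
    x0, y0, x1, y1 = region
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            pixels = [frame[y][x] for y in ys for x in xs]
            mean = sum(pixels) // len(pixels)
            for y in ys:
                for x in xs:
                    frame[y][x] = mean
    return frame
```

In a real system the entry's time code range would select which frames `apply_mosaic` runs on; that loop is omitted here.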

Multiple sets of processing target information can be defined, for example one per video distribution destination (purpose). This makes it possible, for example, to treat a portion as a processing target for one destination but not for another, or to vary the processing applied depending on the destination. In that case, the user selects which processing target information to use.
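Selecting processing target information per destination can be sketched as a lookup of one rule set per destination. In this sketch the destination names, the category labels, and the `plan_edits` helper are all hypothetical; a rule value of `None` means "recognized but left unprocessed for this destination".

```python
# Hypothetical processing target information: one rule set per destination,
# mapping a recognized category to the operation to apply (None = leave as-is).
PROCESSING_TARGETS = {
    "terrestrial": {"time_display": "mosaic", "license_plate": "blur", "company_name": "blur"},
    "archive": {"time_display": None, "license_plate": "blur", "company_name": None},
}


def plan_edits(detections, destination):
    """Return (category, location, operation) for each detection to process.

    detections: list of (category, location) pairs from recognition
    destination: key into PROCESSING_TARGETS chosen by the user
    """
    rules = PROCESSING_TARGETS[destination]
    plan = []
    for category, location in detections:
        op = rules.get(category)
        if op is not None:
            plan.append((category, location, op))
    return plan
```

The same detections thus yield different edit plans depending on which destination's rules the user selects.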

As described later, when the material video data is finally edited, the portions to be processed and the processing applied to them are reviewed by the user and then corrected. This final editing information, or information on the differences between the final editing information and the automatic editing information, is also stored in the information storage unit 132.

An operation example of the above configuration is described below. First, an example of editing processing is described with reference to FIGS. 6 to 10; next, similar person search processing (in particular, similar face detection processing) is described with reference to FIGS. 11 to 13; and finally, an example in which similar face detection processing is applied to editing processing is described with reference to FIGS. 14 to 16.

FIG. 6 is a flowchart showing an example of the specific operations that the system control unit 15 causes to be performed. For simplicity, it is assumed here that no processing specified by the user via the editing device 14 is performed, and that, in FIG. 1, the material video data is input to (stored in) the recording device 12 only via the automatic editing information creation device 13.

First, the capture device 11 obtains the material video data (S1). The automatic editing information creation device 13 obtains this material video data and analyzes whether the images in it contain any portion to be processed (S2). Here, the processing target recognition unit 131 refers to the information in the information storage unit 132 to recognize whether the images contain a portion to be processed and, if such a portion is recognized, also determines the processing for it based on that information (S3). The automatic editing information is thereby created. If no portion to be processed is recognized (No in S4), the material video data is stored as-is in the recording device 12 (S5).

If a portion to be processed is recognized (Yes in S4), the system control unit 15 asks the user whether to store the material video data (S6). If it is not to be stored (No in S6), the editing device 14 is used, as described above, to edit the material video data based on the automatic editing information, creating automatically edited video data (S7), and the automatically edited video data and the automatic editing information are stored in the recording device 12 (S8). In this case, the only video data stored in the recording device 12 is the automatically edited video data; if the material video data had already been stored in the recording device 12, it is replaced with the automatically edited video data.

If the material video data is to be stored (Yes in S6), the system control unit 15 stores the material video data and the automatic editing information in the recording device 12 (S9) and then asks the user whether to perform automatic editing (S10). If automatic editing is not performed (No in S10), the process ends. In this case, the recording device 12 holds the unedited material video data and the automatic editing information. Although no automatically edited video data exists at this point, it can easily be created later using the editing device 14.

If automatic editing is performed (Yes in S10), the system control unit 15 causes the editing device 14 to create the automatically edited video data (S11) and stores it in the recording device 12 (S12). In this case, the recording device 12 holds the original material video data, the automatic editing information, and the automatically edited video data. Therefore, when multiple sets of processing target information have been defined as described above, it is easy to later process the same material video data using a different set of processing target information.
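The branching of FIG. 6 (S1 to S12) can be summarized as a small function that reports what ends up in the recording device 12. This is a sketch only: `fig6_flow` and the callbacks `recognize`, `apply_edits`, `ask_store_material`, and `ask_auto_edit` are hypothetical stand-ins for the recognition unit, the editing device, and the user prompts, and the stored items are modeled as dictionary keys.

```python
def fig6_flow(material, recognize, apply_edits, ask_store_material, ask_auto_edit):
    """Sketch of S1-S12; returns a dict of what the recording device holds."""
    storage = {}
    edit_info = recognize(material)              # S2-S3: automatic editing information
    if not edit_info:                            # S4 No: nothing to process
        storage["material"] = material           # S5
        return storage
    if not ask_store_material():                 # S6 No: keep only the edited result
        storage["auto_edited"] = apply_edits(material, edit_info)  # S7
        storage["edit_info"] = edit_info                           # S8
        return storage
    storage["material"] = material               # S9: material plus editing info
    storage["edit_info"] = edit_info
    if ask_auto_edit():                          # S10 Yes
        storage["auto_edited"] = apply_edits(material, edit_info)  # S11-S12
    return storage
```

Keeping the material and the editing information separate (the S9 path) is what allows a different set of processing target information to be applied to the same material later.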

When the recording device 12 receives the material video data directly from the capture device 11 and stores it, steps S6 to S8 above are unnecessary. However, the material video data may be deleted after the automatically edited video data has been stored (S12).

The flowchart of FIG. 6 shows the operation of the system control unit 15 after material video data has been input. Alternatively, the system control unit 15 may receive a video distribution (output) request while the material video data is already stored in the recording device 12, and output the video data obtained by editing the material video data in response to that request.

FIG. 7 is a flowchart showing an example of the operation of the system control unit 15 in this case. Here, it is assumed that at least the material video data is stored in the recording device 12.

First, when a distribution request is received (S21), the system control unit 15 checks whether automatically edited video data is stored in the recording device 12 (S22). If it is not stored (No in S22), the system control unit 15 checks whether automatic editing information is stored (S23).

If automatic editing information exists (Yes in S23), the system control unit 15 uses the editing device 14, as described above, to create automatically edited video data and stores it in the recording device 12 (S24). If automatic editing information does not exist (No in S23), the system control unit 15 creates it using the automatic editing information creation device 13 (S25), then likewise uses the editing device 14 to create automatically edited video data and stores it in the recording device 12 (S24). Thus, if no automatically edited video data was stored (No in S22), it is newly created and stored in the recording device 12.

When automatically edited video data is already stored (Yes in S22), or has been newly created and stored as described above (S24), the system control unit 15 displays an image based on it on the editing device 14 (display unit 142) (S26) and asks the user whether it may be distributed as-is (S27).

If it may be distributed as-is (Yes in S27), the automatically edited video data is designated as the edited video data approved for distribution (S28). If changes are desired (No in S27), the system control unit 15 has the automatically edited video data further edited using the editing device 14 (S29), designates the resulting video data as the edited video data approved for distribution, and stores it in the recording device 12 (S30). At this time, the final editing information is also created and stored, as described above.

The system control unit 15 then distributes the edited video data stored in the recording device 12 (S31).
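The distribution flow of FIG. 7 (S21 to S31) reuses whatever has already been stored and only creates what is missing. The sketch below assumes a dictionary `storage` standing in for the recording device 12 and hypothetical callbacks (`create_edit_info`, `apply_edits`, `review`, `further_edit`); it is an illustration of the ordering of checks, not the system's actual control code.

```python
def fig7_distribute(storage, material, create_edit_info, apply_edits, review, further_edit):
    """Sketch of S21-S31; returns the video data approved for distribution."""
    if "auto_edited" not in storage:                          # S22 No
        if "edit_info" not in storage:                        # S23 No
            storage["edit_info"] = create_edit_info(material) # S25
        storage["auto_edited"] = apply_edits(material, storage["edit_info"])  # S24
    candidate = storage["auto_edited"]                        # S26: shown to the user
    if review(candidate):                                     # S27 Yes
        approved = candidate                                  # S28
    else:                                                     # S27 No
        approved, final_info = further_edit(candidate)        # S29
        storage["final_edit_info"] = final_info               # S30 (with final editing info)
    storage["approved"] = approved
    return approved                                           # S31: distribute
```

Because each check short-circuits, repeated distribution requests for the same material do not regenerate the automatic editing information or the edited video.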

Even once the automatic editing information has been created, applying it to create automatically edited video data takes time, as does storing the various video data in the recording device 12. It is therefore preferable not to create or store video data that is clearly unnecessary for distribution. In addition, to shorten processing time, the user may review the video concurrently using another device.

FIG. 8 is a flowchart showing an example of the operation of the system control unit 15 that takes these points into account.

Here, when the capture device 11 obtains material video data (S41), it is determined whether to store the material video data as-is in the recording device 12 (S42). If storing it is judged unnecessary (No in S42), automatic editing is performed as described above to create automatically edited video data (S43), which is stored in the recording device 12 as the video data for distribution (S44). In this case, the only video data recorded in the recording device 12 is the automatically edited video data.

If the material video data is to be stored (Yes in S42), it is stored in the recording device 12 (S45). The user is then asked whether to also analyze the material video data using another device (S46). If so (Yes in S46), the user analyzes the material video data using the other device (S47) and can then start the subsequent processing using the editing device 14. The results of this analysis can be used in the determinations below (S50, S56).

Next, the user is asked whether to perform automatic editing immediately (S48). If not (No in S48), the automatic editing information creation device 13 creates the automatic editing information (S49), and the editing device 14 then asks whether the content of this automatic editing information is acceptable (S50).

At the time of this inquiry, no automatically edited video data has actually been created; however, as described above, the user can check what the edited result would look like under this automatic editing information by using a still image at a particular point in time.

If the user wants to change this content (No in S50), the editing device 14 lets the user make the corrections (S51). Once no further changes are needed (Yes in S50), edited video data is created by actually editing the material video data based on the automatic editing information as it then stands (S52), and this edited video data is stored in the recording device 12 as the video data for distribution (S53). In this case, edited video data is not created until the content has been finalized.

If automatic editing is performed immediately (Yes in S48), the automatic editing information and the automatically edited video data based on it are created at once (S54), and the automatically edited video data is displayed on the display unit 142 (S55). In this case, the user can check in detail whether the edits are appropriate at every point in the automatically edited video data (S56).

If the user then wants to correct the edits (No in S56), the corrections and checks are made in the same way as above (S57); new video data (edited video data) is then created again based on the corrected editing information (S58) and stored in the recording device 12 as the video data for distribution (S59). The final editing information created at this time is stored as well.

If the edits based on the automatic editing information are judged appropriate (Yes in S56), the already-created automatically edited video data is stored in the recording device 12 as the video data for distribution (S60).

In the above operation, actual editing of the material video data is kept to the minimum necessary, which shortens processing time, while the user still reliably checks whether the editing is appropriate and corrects it where needed.
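The time saving in FIG. 8 comes from how many times the material is actually rendered. The sketch below covers only the S48 branch (S49 to S60); the S42 and S46 branches are omitted, and every callback name (`create_edit_info`, `apply_edits`, `review_info`, `review_video`, `correct`) is hypothetical. It returns the distribution video together with a render count, to make the difference between the deferred and immediate paths visible.

```python
def fig8_flow(material, create_edit_info, apply_edits, review_info, review_video,
              correct, edit_now):
    """Sketch of S48-S60; returns (video for distribution, number of renders)."""
    renders = 0
    info = create_edit_info(material)                 # S49 / first half of S54
    if not edit_now:                                  # S48 No: defer rendering
        while not review_info(info):                  # S50: checked on still images
            info = correct(info)                      # S51
        renders += 1
        return apply_edits(material, info), renders   # S52-S53: render once
    video = apply_edits(material, info)               # S54: render immediately
    renders += 1
    if review_video(video):                           # S55-S56 Yes
        return video, renders                         # S60: reuse the render
    info = correct(info)                              # S57 (simplified to one pass)
    renders += 1
    return apply_edits(material, info), renders       # S58-S59: render again
```

On the deferred path the material is rendered exactly once regardless of how many corrections are made; on the immediate path a rejected result costs a second render.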

Next, the processing target information stored in the information storage unit 132 for recognizing the portions of the material video data to be processed is described. Such portions include the time display mentioned above, the license plate numbers of vehicles that appear in the video, company names, and the faces of people who appear in the video. Time displays and license plate numbers can be recognized by pattern recognition of digits, company names by pattern recognition of characters, and faces likewise by pattern recognition techniques.
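Once characters have been read from a frame (for example by OCR, which is assumed here and not specified in the text), digit and character patterns can be matched with regular expressions. The patterns below are illustrative assumptions only: a clock-style time display, the "12-34" serial part of a plate, and a caller-supplied list of company names. Real plate formats and on-screen time formats would need their own rules.

```python
import re

# Hypothetical patterns applied to OCR output; the text only says that digit
# and character pattern recognition is used, not which patterns.
TIME_DISPLAY = re.compile(r"\b([01]?\d|2[0-3]):[0-5]\d(?::[0-5]\d)?\b")
PLATE_SERIAL = re.compile(r"\b\d{1,2}-\d{2}\b")   # e.g. the "12-34" part of a plate


def find_targets(ocr_text, company_names):
    """Return (category, matched text) pairs found in OCR'd frame text."""
    hits = [("time_display", m.group(0)) for m in TIME_DISPLAY.finditer(ocr_text)]
    hits += [("license_plate", m.group(0)) for m in PLATE_SERIAL.finditer(ocr_text)]
    hits += [("company_name", name) for name in company_names if name in ocr_text]
    return hits
```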

As described above, with the editing device 14 the user can also designate portions to be processed by operating the operation panel 143, after which final editing information reflecting these operations is created. The processing target recognition unit 131 can then read this final editing information and update (or create) the processing target information. In this role, the processing target recognition unit 131 functions as a processing target information modification means that updates the processing target information toward more appropriate content.

FIG. 9 is a diagram showing the flow of this operation in the processing target recognition unit 131.
First, the initial (default) processing target information is created by the user (P1). Here, for example, only the minimum necessary targets that are also relatively easy to recognize are selected, such as the on-screen time display mentioned above. The video editing system 1 is then used repeatedly with this processing target information, as described above. During this use, the user also performs editing, either in addition to or instead of the editing based on the automatic editing information, and the final editing information actually applied to the material video data is created and also stored in the information storage unit 132.

Accordingly, by comparing the automatic editing information from which automatically edited video data was produced with the final editing information generated afterward, the processing target recognition unit 131 can modify the processing target information, stored in the information storage unit 132, on which the automatic editing information was based. For example, if a character string in an image was not a target in the processing target information, and so was not processed under the automatic editing information, but the user later designated it so that it was processed under the final editing information, the processing target information can be modified to add that character string as a target. Conversely, if a character string was a target in the processing target information, and so was processed under the automatic editing information, but the user later cancelled that designation so that it was not processed under the final editing information, the processing target information can be modified to remove that character string as a target. The type of processing in the processing target information (blurring, etc.) can be modified in the same way. With face recognition, for example, the same approach can be applied when specific people are the targets of processing.
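The add, remove, and change rules above can be expressed as a comparison of two mappings. In this sketch, which is an assumption rather than the unit's actual logic, the processing target information and both kinds of editing information are dictionaries from a recognized item (a character string or a person) to its operation; `update_targets` is a hypothetical name.

```python
def update_targets(targets, auto_info, final_info):
    """Update processing target info from one auto-vs-final comparison.

    targets: dict item -> operation (the processing target information)
    auto_info / final_info: dict item -> operation for one piece of material
    """
    updated = dict(targets)                # leave the caller's dict untouched
    for item, op in final_info.items():
        # User added the item, or changed its operation: adopt the user's choice.
        if auto_info.get(item) != op:
            updated[item] = op
    for item in auto_info:
        # User cancelled an automatically chosen item: stop targeting it.
        if item not in final_info:
            updated.pop(item, None)
    return updated
```

This is the "simple judgment" variant; it reacts to every single correction, which is why a statistical alternative can be preferable.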

Instead of this simple judgment of whether something was selected as a target, the processing target recognition unit 131 can also modify the processing target information based on statistical processing of multiple recorded sets of final editing information. For example, each difference between the final editing information and the automatic editing information can be evaluated numerically, the values totaled as a score, and the processing target information modified based on that score. For instance, the sets of final editing information with high scores (large differences) can be extracted, and targets that are common among them but not yet included in the processing target information can be newly added to it.
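A minimal version of this scoring, assuming equal weights for added and removed items and a half weight for a changed operation (all weights are illustrative assumptions), could look as follows. `records` is a hypothetical list of `(auto_info, final_info)` pairs, and `min_share` sets how common an item must be among the high-scoring records to be proposed.

```python
from collections import Counter


def diff_score(auto_info, final_info, weights=None):
    """Hypothetical score: weighted count of items added, removed or changed."""
    w = {"added": 1.0, "removed": 1.0, "changed": 0.5, **(weights or {})}
    score = 0.0
    for item in auto_info.keys() | final_info.keys():
        if item not in auto_info:
            score += w["added"]
        elif item not in final_info:
            score += w["removed"]
        elif auto_info[item] != final_info[item]:
            score += w["changed"]
    return score


def propose_new_targets(records, targets, score_threshold, min_share):
    """From high-scoring records, propose items common to many of them."""
    high = [final for auto, final in records if diff_score(auto, final) >= score_threshold]
    if not high:
        return set()
    counts = Counter(item for final in high for item in final)
    return {item for item, n in counts.items()
            if n / len(high) >= min_share and item not in targets}
```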

Accordingly, in the flow of FIG. 9, the video editing system 1 is used with the initial processing target information (P1); during use, final editing information is created through user operations and stored in the information storage unit 132 (P2). Then, as described above, the differences between the final editing information and the automatic editing information are quantified and evaluated (P3). Based on this value, a comprehensive analysis determines whether the current processing target information should be rewritten and, if so, how (P4), and the processing target information is finally updated (P5). As shown in FIG. 9, the final determination (P4) can take into account not only the differences between the final editing information and the automatic editing information, but also the editing tendencies of individual users of the editing device 14 (for example, some users edit a great deal and others very little) and conditions on images that were added because of circumstances arising after the initial setting (P1).

This modification of the processing target information may be repeated every time the video editing system 1 is used and final editing information is created, or it may be performed periodically. When the scores described above are used, it may instead be triggered according to the cumulative value of the scores.
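The three trigger policies can be sketched as one small class. Two assumptions should be noted: "periodically" is modeled here as every N final-editing records rather than by wall-clock time, and the accumulated score is reset after each update. `UpdateTrigger` and its parameters are hypothetical.

```python
class UpdateTrigger:
    """Hypothetical policy deciding when to run the P3-P5 update."""

    def __init__(self, mode, period=10, score_limit=5.0):
        self.mode = mode              # "every", "periodic" or "cumulative"
        self.period = period          # final-edit records between periodic updates
        self.score_limit = score_limit
        self.count = 0
        self.accumulated = 0.0

    def record(self, score):
        """Register one new final editing record; return True if an update is due."""
        self.count += 1
        self.accumulated += score
        if self.mode == "every":
            due = True
        elif self.mode == "periodic":
            due = self.count % self.period == 0
        else:                         # "cumulative": wait until enough difference builds up
            due = self.accumulated >= self.score_limit
        if due:
            self.accumulated = 0.0
        return due
```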

Modifying the processing target information based on a large amount of final editing information in this way can also be performed using well-known machine learning techniques (deep learning) and the like. When multiple sets of processing target information are defined, for example for different distribution destinations as described above, these operations can be performed for each set separately.

Prompts and inputs that allow the user to enter an evaluation of the automatically edited video data can be provided using the display unit 142 and the operation panel 143 (touch panel display 144) of the editing device 14.

FIG. 10 shows an example of such a display. Here, display K explains the automatic editing information (each portion to be processed and the processing to apply to it) and lets the user choose whether each item is applied, and in the upper display L the user enters an evaluation of the automatic editing information. Then, by operating the lower display M, the user executes editing with the final editing information, that is, the automatic editing information modified to reflect the choices made in display K.

Material video data varies widely, and in some cases processing is applied to unusual, special portions. In such cases, even if the difference between the automatic editing information and the final editing information is large, it is preferable not to use that final editing information to modify the generally used processing target information. As shown in FIG. 10, if the user chooses to exclude the automatic editing information in such a case from evaluation, the final editing information for that special case is not used to modify the processing target information.

Besides the methods above, various other methods can be applied to feed back newly created final editing information and update the processing target information.

The faces of people who appear in the video are another example of portions that may be processed, and the processing target recognition unit 131 can recognize faces in images. When several people appear, the user may want to apply processing only to the face of one specific person or, conversely, to the faces of everyone except that person. In such a case, the faces of people can be set to the first level described above in the processing target information, so that, as with the broadcast-prohibited terms described above, only a warning is issued and, until the warning is cleared, no automatically edited video data is created and the material video data is not distributed. The user can then operate the operation panel 143 to create final editing information that processes only the face of the specific person among all the faces that appear or, conversely, processes every face except that person's, create edited video data according to that final editing information, and then distribute it.
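After the warning is cleared, choosing which faces to process reduces to filtering recognized faces by identity. This sketch assumes face recognition has already labeled each face with an identity; `faces_to_process` and the `"only"`/`"except"` mode strings are hypothetical names for the two cases described above.

```python
def faces_to_process(faces, person, mode):
    """Choose face regions to process once the user has cleared the warning.

    faces: list of (identity, bounding box) pairs from face recognition
    person: the specific person the user picked
    mode: "only" processes just that person; "except" processes everyone else
    """
    if mode == "only":
        return [box for identity, box in faces if identity == person]
    if mode == "except":
        return [box for identity, box in faces if identity != person]
    raise ValueError(f"unknown mode: {mode!r}")
```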

In the configuration above, the automatic editing information creation device 13, which includes the processing target recognition unit (processing target recognition means, processing target information modification means) 131 and the information storage unit (information storage means) 132, and the editing device 14, which includes the editing control unit (editing means) 141, the display unit (display means) 142, and the operation panel (operating means) 143, are connected to the recording device 12 (video server) and perform the operations described above. However, the specific device configuration is arbitrary, as long as processing target recognition means, processing target information modification means, information storage means, editing means, display means, and the like with the same functions are provided for the material video data and can create the automatically edited video data, the automatic editing information, the final editing information, and so on. In other words, how these means are distributed among the devices used is arbitrary, and all of them may be provided in a single device.

Next, similar person search processing (in particular, similar face detection processing) is described with reference to FIGS. 11 to 18. This processing is performed by the functions of the similar face image detection device 16 and the editing device 14 (in particular, the similar face image detection operation unit 103), and applies the technique disclosed in Japanese Unexamined Patent Application Publication No. 2013-101431 to face image recognition. The main parts of that disclosure are illustrated below.

FIGS. 11(a) to 11(g) illustrate, following the steps of the similar person search in this embodiment, the feature amounts of the images that become candidates for the search key image. FIG. 12 illustrates the procedure for performing the similar person search (similar face detection processing).

First, in the search process 6001 using the first key image, an initial search is performed with the first search key image selected by the user. Here, images whose feature amounts are close to the feature amount of the image selected as the first search key image (in this example, the feature amount of the person in the image) are searched for via the similar person search unit 218 in the recording device 12; as a result, for example, 10 images are retrieved.

In FIG. 11(a), the feature value of the first search key image is indicated by "○". For clarity, the feature values are drawn in two dimensions here. In practice, image feature values often have a very large number of dimensions, for example several hundred.

Suppose now that 3 of the 10 retrieved images show the same target as the first search key image. In process 6002, which selects the same person from the search results, these 3 target images are selected from the 10 search result images. Specifically, the user selects them, for example, by operating the operation panel 143 of the editing device 14 or a mouse (not shown). Alternatively, a threshold may be set on the feature value. If the distance between the feature value of the first search key image and that of a search result image is less than or equal to the threshold, the two are judged to show the same target (the same person), and that search result image is selected automatically.
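The automatic alternative in process 6002 reduces to a distance threshold test. A minimal sketch follows; the threshold value, the Euclidean metric, and the name `auto_select_same_person` are illustrative assumptions rather than details given in the text.

```python
import math
from typing import Dict, List, Sequence


def auto_select_same_person(key: Sequence[float],
                            results: Dict[str, Sequence[float]],
                            threshold: float) -> List[str]:
    """Return IDs of results whose distance to the key is <= threshold.

    The threshold value and the Euclidean metric are illustrative assumptions.
    """
    return [image_id for image_id, feature in results.items()
            if math.dist(key, feature) <= threshold]


results = {"r1": (0.5, 0.0), "r2": (3.0, 4.0), "r3": (1.0, 0.0)}
print(auto_select_same_person((0.0, 0.0), results, threshold=1.0))  # ['r1', 'r3']
```

Choosing the threshold trades missed matches against false matches, which is presumably why the text keeps manual selection as the primary method.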

FIG. 11(b) shows the contents of FIG. 11(a) plus the feature values, indicated by "△", of the images selected by process 6002. The images selected in this way become candidates for new search key images.

If a search result image is one of the consecutive frames that form a moving image, the frames before and after it often show the same person as well. Process 6003 selects the same person before and after a search result. It looks at the frames within a predetermined time span before or after the search result image in the source video. Based on the person's position, speed of movement, and the like, it automatically selects the frames judged to show the same person as the search result image (that is, the same person as in the search key image). The user may also be allowed to specify these frames.
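The text does not say how "position and speed of movement" are combined in process 6003. One plausible reading is simple constant-velocity tracking of the face centre, sketched below. The per-frame detection dictionary, the pixel threshold `max_jump`, and the function name `track_same_person` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def track_same_person(detections: Dict[int, List[Point]], start_frame: int,
                      start_pos: Point, window: int, max_jump: float) -> List[int]:
    """Frames within +/- window of start_frame judged to show the same person.

    detections maps frame number -> face centre positions found in that frame.
    Each step predicts the next position from the last observed velocity and
    accepts the nearest detection if it lies within max_jump pixels.
    """
    selected: List[int] = []
    for step in (1, -1):  # walk forwards, then backwards
        pos, velocity = start_pos, (0.0, 0.0)
        for offset in range(1, window + 1):
            frame = start_frame + step * offset
            candidates = detections.get(frame, [])
            if not candidates:
                break  # track lost: no face detected in this frame
            predicted = (pos[0] + velocity[0], pos[1] + velocity[1])
            best = min(candidates, key=lambda p: math.dist(p, predicted))
            if math.dist(best, predicted) > max_jump:
                break  # nearest face is too far away to be the same person
            velocity = (best[0] - pos[0], best[1] - pos[1])
            pos = best
            selected.append(frame)
    return sorted(selected)


detections = {9: [(90.0, 50.0)], 10: [(100.0, 50.0)],
              11: [(110.0, 50.0), (300.0, 50.0)], 12: [(120.0, 50.0)],
              13: [(400.0, 400.0)]}
print(track_same_person(detections, 10, (100.0, 50.0), window=3, max_jump=20.0))
# [9, 11, 12]
```

Frame 13 is rejected because its only face is far from the predicted position, and the second face in frame 11 is ignored because a closer one matches the predicted path.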

FIG. 11(c) shows the contents of FIG. 11(b) plus the feature values, indicated by "□", of the images selected by process 6003. The images selected in this way become candidates for new search key images.

Image processing 6004 adds a mask. It takes the person images chosen as new search key image candidates in the preceding steps and uses image processing to add a mask covering the nose and mouth. The generated images are added to the candidates. Conversely, if the person in the original image is already wearing a mask over the nose and mouth, image processing that removes the mask may be performed instead. Several types of mask image may also be prepared.

Image processing 6005 adds sunglasses or glasses. It takes the person images chosen as new search key image candidates in the preceding steps and uses image processing to add sunglasses or glasses. The generated images are added to the candidates. Conversely, if the person in the original image is already wearing sunglasses or glasses, image processing that removes them may be performed instead. Several types of sunglasses and glasses images may also be prepared.

Image processing 6006 changes the orientation of the person. It takes the person images chosen as new search key image candidates in the preceding steps, uses image processing to change the person's orientation, and adds the generated images to the candidates. Usually several orientations are generated, but a simple left-right flip may also be used.
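The simplest variant mentioned for image processing 6006, a left-right flip, can be sketched as below. Representing an image as a list of pixel rows and the helper names `flip_horizontal` and `add_flipped_candidates` are assumptions for illustration; generating genuinely different head poses would need a far more elaborate model.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def flip_horizontal(image: Image) -> Image:
    """Mirror an image left to right."""
    return [list(reversed(row)) for row in image]


def add_flipped_candidates(candidates: List[Image]) -> List[Image]:
    """Return the original candidates followed by their mirrored versions."""
    return candidates + [flip_horizontal(img) for img in candidates]


face = [[1, 2, 3],
        [4, 5, 6]]
print(add_flipped_candidates([face]))
# [[[1, 2, 3], [4, 5, 6]], [[3, 2, 1], [6, 5, 4]]]
```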

FIG. 11(d) shows the contents of FIG. 11(c) plus the feature values, indicated by "×", of the images generated by image processing 6004 (adding a mask), 6005 (adding sunglasses or glasses), and 6006 (changing the orientation of the person). Images generated in this way are added as candidates for new search key images.

Image processing 6004, 6005, and 6006 may each be applied to any of the following: the first search key image, the images resulting from process 6002 (selecting the same person from the search results), and the images resulting from process 6003 (selecting the same person before and after a search result). A target image may receive any one of the three types of image processing, any two of them, or all three. Other image processing, such as changing the brightness of the target image, may also be applied.
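Applying "any one, any two, or all three" image processing steps amounts to enumerating every non-empty combination of transforms. The sketch below uses a flip and a brightness change (the extra processing the text mentions) as stand-ins, because mask and sunglasses synthesis cannot be shown in a few lines. The function names, the 8-bit clamp, and the order in which a combination is applied are assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, List

Image = List[List[int]]
Transform = Callable[[Image], Image]


def flip_horizontal(image: Image) -> Image:
    return [list(reversed(row)) for row in image]


def brighten(image: Image, delta: int = 40) -> Image:
    """Raise every pixel by delta, clamped to the 8-bit range 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def expand_candidates(sources: List[Image],
                      transforms: Dict[str, Transform]) -> List[Image]:
    """Apply every non-empty combination of transforms to every source image."""
    names = sorted(transforms)
    expanded: List[Image] = []
    for src in sources:
        for r in range(1, len(names) + 1):
            for combo in combinations(names, r):
                img = src
                for name in combo:  # apply the transforms in sorted-name order
                    img = transforms[name](img)
                expanded.append(img)
    return expanded


face = [[10, 250]]
variants = expand_candidates([face], {"flip": flip_horizontal, "brighten": brighten})
print(variants)  # [[[50, 255]], [[250, 10]], [[255, 50]]]
```

With three transforms, each source image yields 2^3 - 1 = 7 variants, which is one reason the later clustering step is useful for keeping the number of searches small.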

Next, clustering process 6007 clusters the images that became search key image candidates through processes 6001 to 6006 and obtains an image (or its feature value) representative of each cluster. A known technique such as the k-means method can be used for the clustering. The representative of each cluster can be, for example, the image whose feature value is closest to the mean of the feature values in that cluster. That image's feature value then becomes a new search key. Alternatively, the mean feature value of the cluster may be used directly as the new search key.
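A compact k-means with "member closest to the cluster mean" as the representative might look like the sketch below. The deterministic farthest-point seeding, the iteration cap, and all function names are assumptions made so the example is reproducible; the text only names k-means as one usable technique.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def mean(points: Sequence[Vector]) -> Vector:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_point_init(points: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at points[0], then add the farthest point."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points: Sequence[Vector], k: int, iterations: int = 50):
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def cluster_representatives(points: Sequence[Vector], k: int) -> List[int]:
    """Index of the member closest to each cluster's mean (one per cluster)."""
    labels, centroids = kmeans(points, k)
    reps = []
    for j, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return reps


pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
print(cluster_representatives(pts, 3))  # [0, 6, 1]
```

Returning `centroids[j]` itself instead of the nearest member corresponds to the alternative in which the cluster mean is used directly as the search key.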

FIG. 11(e) illustrates how clustering process 6007 divides the new search key image candidates obtained through processes 6001 to 6006 into clusters, together with the feature values of the images representing each cluster. In FIG. 11(e), three clusters are enclosed in frames. The feature values P11, P12, and P13 of the images closest to the centroids of the respective clusters are selected as the representative feature values.

Search process 6008, which uses the representative search keys, performs a similar image search with the feature values of the images representing the clusters from clustering process 6007 as new search keys, and outputs the results.

In the example of FIG. 11(e), there are 29 images related to the first search key image (the images obtained by processes 6001 to 6006). Conventionally, the search would be repeated 29 times, once with each of these feature values as a new search key. In this embodiment, the similar face image search instead uses the feature values of the three images representing the clusters obtained by clustering process 6007. Only three searches are therefore needed, while the feature values used remain balanced across the candidates. The number of clusters is three here, but it can be changed by a setting.
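Running one search per representative key still leaves the question of how the hit lists are combined, which the text does not specify. The sketch assumes that an image hit by several keys keeps its smallest distance, and that the merged list is sorted by that distance. The function name and the toy database are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def multi_key_search(keys: List[Sequence[float]],
                     database: Dict[str, Sequence[float]],
                     top_k: int = 10) -> List[Tuple[str, float]]:
    """Run one search per representative key and merge the hit lists.

    An image hit by several keys keeps its smallest distance; merging by
    minimum distance is an assumption, not specified in the text.
    """
    best: Dict[str, float] = {}
    for key in keys:  # one search per cluster, e.g. 3 instead of 29
        hits = sorted((math.dist(key, feat), image_id)
                      for image_id, feat in database.items())[:top_k]
        for distance, image_id in hits:
            if distance < best.get(image_id, math.inf):
                best[image_id] = distance
    return sorted(best.items(), key=lambda item: item[1])


db = {"a": (0.0, 1.0), "b": (10.0, 9.0), "c": (50.0, 50.0)}
print(multi_key_search([(0.0, 0.0), (10.0, 10.0)], db, top_k=1))
# [('a', 1.0), ('b', 1.0)]
```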

Next, the screen of the similar face image detection operation unit 103 of the editing device 14 will be described with reference to FIG. 13, which illustrates a search screen usable in the similar face image search system of this example.

The search screen has a reproduced image display area 3001, an image reproduction operation area 3003, a search key image designation area 3004, a search narrowing parameter designation area 3008, a search execution area 3017, and a search result display area 3020.

The reproduced image display area 3001 displays images recorded in the recording device 12 (or in the similar face image detection device 16) as a moving image. The moving image 3002 in this area is the recorded video displayed in this way.

The image reproduction operation area 3003 is used to control playback of images recorded in the recording device 12. Each button in this area is assigned its own playback type. In this figure, rewind, reverse playback, stop, forward playback, and fast forward are assigned from left to right. When the user presses a button with the mouse 282, the moving image 3002 switches to the playback type assigned to that button.

The search key image designation area 3004 is used to designate and display the search key image. It contains the search key image 3005, a video designation button 3006, and a file designation button 3007.

The search key image 3005 is the image used as the first search key image for the similarity search. In the initial state no search key image has been designated, so no image is displayed. Alternatively, the undesignated state may be indicated explicitly, for example by displaying a separately prepared image that represents it.

The video designation button 3006, when pressed, designates the image currently displayed in the reproduced image display area 3001 as the search key image 3005.

The file designation button 3007 designates, as the search key image 3005, an image other than those recorded in the recording device 12, such as a photograph taken with a digital still camera or an image captured with a scanner. Pressing this button opens a dialog box for specifying such an image file, in which the user can select the desired image.

The search narrowing parameter designation area 3008 is used to designate the types of narrowing parameters for the search and their values (ranges). It contains imaging device designation check boxes 3009, 3010, 3011, and 3012, time code designation check boxes 3013 and 3014, and time code designation fields 3015 and 3016.

The imaging device designation check boxes 3009, 3010, 3011, and 3012 designate the imaging devices (such as the camera 10) whose images are to be searched. Pressing one of these check boxes displays a check mark indicating that it is selected. Pressing it again hides the mark, so each press toggles the mark on and off.

The time code designation check boxes 3013 and 3014 designate the time range to be searched. They are displayed in the same way as the other check boxes. When check box 3013 is selected, a start time is applied to the time range. When it is not selected, no start time is applied, meaning the search range extends back to the oldest image recorded in the recording device 12.

Similarly, when check box 3014 is selected, an end time is applied to the time range. When it is not selected, no end time is applied, meaning the search range extends up to the newest image recorded in the recording device 12.

The time code designation fields 3015 and 3016 are input fields for the start time and end time values described above. In the initial state the entire time span is searched, so check boxes 3013 and 3014 are both unselected and fields 3015 and 3016 are blank.
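The narrowing parameters above (camera check boxes plus an optional start and end time code) can be modelled as a filter in which "unchecked" means "no restriction". The sketch assumes "HH:MM:SS:FF" time codes at 30 frames per second without drop-frame correction. The record type `RecordedFrame` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

FPS = 30  # assumption: 30 frames per second, non-drop-frame time code


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an "HH:MM:SS:FF" time code into an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


@dataclass
class RecordedFrame:
    camera: str
    timecode: str


def filter_frames(frames: Iterable[RecordedFrame],
                  cameras: Optional[Set[str]] = None,
                  start: Optional[str] = None,
                  end: Optional[str] = None) -> List[RecordedFrame]:
    """Keep frames from the checked cameras inside [start, end].

    None means "not checked": no camera restriction, or an open-ended range
    reaching the oldest (start) or newest (end) recorded image.
    """
    lo = timecode_to_frames(start) if start is not None else None
    hi = timecode_to_frames(end) if end is not None else None
    kept = []
    for f in frames:
        if cameras is not None and f.camera not in cameras:
            continue
        t = timecode_to_frames(f.timecode)
        if lo is not None and t < lo:
            continue
        if hi is not None and t > hi:
            continue
        kept.append(f)
    return kept


frames = [RecordedFrame("Camera 1", "12:00:00:00"),
          RecordedFrame("Camera 3", "13:00:00:00"),
          RecordedFrame("Camera 4", "16:00:00:00")]
picked = filter_frames(frames, cameras={"Camera 1", "Camera 2", "Camera 4"},
                       end="15:30:20:17")
print([f.camera for f in picked])  # ['Camera 1']
```

With everything left at its default, which mirrors the initial screen state, all frames pass.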

The search execution area 3017 is used to instruct execution of a search. In addition to a similar person search button 3018 and an appearance event search button 3019, it contains a "similar person search from results" button 3300, a same scene check box 3201, a mask check box 3202, a sunglasses check box 3203, and a different angle check box 3204.

The similar person search button 3018 instructs execution of a similar person search using the search key image 3005 (search process 6001 using the first key image). If parameters have been designated in the search narrowing parameter designation area 3008, the search is executed according to those parameters.

The appearance event search button 3019 instructs execution of an appearance event search. If parameters have been designated in the search narrowing parameter designation area 3008, the appearance event search is executed according to those parameters.

The search result display area 3020 displays the search results as a list of search result images. In the initial state, nothing is displayed in this area.

Suppose the user presses the video designation button 3006, presses the imaging device designation check boxes 3009, 3010, and 3012, presses the time code designation check boxes 3013 and 3014, and enters "15:30:20:17" and "12:30:20:17" in the time code designation fields 3015 and 3016, respectively.

As a result, as shown in FIG. 13, the image of the person "Mr. A" displayed in the moving image 3002 is designated as the search key image 3005. The three devices "Camera 1", "Camera 2", and "Camera 4" are designated as the imaging devices 201 to be searched, and "from 15:30:20:17 to 12:30:20:17" is designated as the time range to be searched.

Suppose the user then presses the similar person search button 3018. The search result display area 3020 then shows the results of a similar person search using the search key image 3005. FIG. 13 shows an example of the search screen in this state. The results are displayed as a list of search result images (in this example, search result images 3031 to 3141).

The search result images 3031 to 3141 are displayed in order of similarity to the search key image 3005, for example from left to right in the top row, then from left to right in the second row, and so on. In this display example, search result image 3031 has the highest similarity to the search key image 3005 and search result image 3141 has the lowest.
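The row-major, most-similar-first ordering can be expressed as a mapping from ranked results to grid cells. This is only an illustration. The `(image_id, similarity)` input format, the "higher means more similar" convention, and the name `layout_results` are assumptions.

```python
from typing import List, Tuple


def layout_results(results: List[Tuple[str, float]],
                   columns: int) -> List[Tuple[str, int, int]]:
    """Assign (row, column) grid cells in row-major order, most similar first.

    results holds (image_id, similarity) pairs; a higher value means more similar.
    """
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    return [(image_id, index // columns, index % columns)
            for index, (image_id, _) in enumerate(ranked)]


print(layout_results([("x", 0.2), ("y", 0.9), ("z", 0.5)], columns=2))
# [('y', 0, 0), ('z', 0, 1), ('x', 1, 0)]
```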

In the notation of this figure, the circles and letters drawn on the search result images 3031 to 3141 in the search result display area 3020 are simplified representations of a person's face and name. For example, search result image 3031 indicates that the person "Mr. A" appears in it. In an actual system, the real images are displayed in these places.

Around the search result image 3031 are a cue playback button 3032, a search key image designation button 3033, and a search target check box 3301. The same applies to the other search result images 3041 to 3141.

The cue playback button 3032 starts continuous playback of the moving image from the search result image 3031. For example, when the cue playback button 3032 is pressed, the moving image 3002 switches to the search result image 3031, and the user can watch the video starting from that image.

The search key image designation button 3033 designates the search result image 3031 as a new search key image. For example, when it is pressed, the search result image 3031 is displayed as the search key image 3005, so that a new search can be performed with it.

The search target check box 3301 marks the search result image 3031 as a new search key image (or a candidate for one) to be used when the "similar person search from results" button 3300 is pressed. For example, the user can check all the images of "Mr. A" that appear in the search results (in this example, search result images 3031 to 3061, 3081, 3091, 3121, and 3141) and press the button 3300. This makes it possible to search for "Mr. A" in a variety of appearances.

The "similar person search from results" button 3300 instructs execution of a second similar person search (search process 6008 using the representative search keys), based on the results of the similar person search with the search key image 3005. In this second search, the user selects images (by checking their search target check boxes) from the display in the search result display area 3020, which shows the results of search process 6001. These selected images are used as new search key images (or candidates for them), and the similar person search is executed again.

The same scene check box 3201 specifies that process 6003 (selecting the same person before and after a search result) is to be executed on the images the user selected in the search result display area 3020. The resulting images (the frames before and after each selected image that show the same person) are added to the new search key image candidates.

The mask check box 3202 specifies that image processing 6004 (adding a mask) is to be executed on the images the user selected in the search result display area 3020. The resulting images (with a mask added to the person in the selected image, or with the person's mask removed) are added to the new search key image candidates.

The sunglasses check box 3203 specifies that image processing 6005 (adding sunglasses or glasses) is to be executed on the images the user selected in the search result display area 3020. The resulting images (with sunglasses or the like added to the person in the selected image, or removed from that person) are added to the new search key image candidates.

The different angle check box 3204 specifies that image processing 6006 (changing the orientation of the person) is to be executed on the images the user selected in the search result display area 3020. The resulting images (with the orientation of the person in the selected image changed) are added to the new search key image candidates.

Suppose the "similar person search from results" button 3300 is pressed with one or more of the check boxes 3201 to 3204 checked. The image processing corresponding to each checked box is then executed on each image the user selected in the search result display area 3020, and the resulting images are added to the new search key image candidates. Clustering process 6007 is then executed on these candidates to obtain a search key image representing each cluster. Finally, a similar image search is executed with the feature value of each representative image as a search key.
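The sequence triggered by button 3300 (collect the selected images, expand them according to the checked boxes, cluster, then search once per cluster) can be outlined as a single handler with the individual stages passed in as functions. Everything here is hypothetical scaffolding. The handler name, the check-box keys, and the toy stand-ins in the usage only show the control flow and do not represent the system's actual modules.

```python
from typing import Any, Callable, Dict, List, Sequence


def research_from_results(selected: Sequence[Any],
                          checked: Sequence[str],
                          processors: Dict[str, Callable[[Any], List[Any]]],
                          extract: Callable[[Any], Any],
                          cluster_reps: Callable[[List[Any]], List[Any]],
                          search: Callable[[Any], List[Any]]) -> List[Any]:
    """Hypothetical handler for the "similar person search from results" button.

    selected:   images whose search target check boxes are ticked
    checked:    names of ticked check boxes, e.g. "same_scene", "mask"
    processors: check-box name -> function returning extra candidate images
    """
    # 1. The selected images themselves are search key candidates.
    candidates = list(selected)
    # 2. Each ticked check box contributes processed variants of every selection.
    for name in checked:
        for image in selected:
            candidates.extend(processors[name](image))
    # 3. Cluster the candidates' features and keep one key per cluster.
    keys = cluster_reps([extract(image) for image in candidates])
    # 4. One similar-image search per representative key.
    hits: List[Any] = []
    for key in keys:
        hits.extend(search(key))
    return hits


# Toy stand-ins: strings as "images", string length as the "feature".
print(research_from_results(
    selected=["a", "b"],
    checked=["mask"],
    processors={"mask": lambda img: [img + "_mask"]},
    extract=len,
    cluster_reps=lambda feats: sorted(set(feats)),
    search=lambda key: [f"hit{key}"],
))
# ['hit1', 'hit6']
```

Injecting the stages as callables keeps the outline independent of how features, clustering, and search are actually implemented.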

As described above, the example comprises a search key determination means and a search means. The search key determination means determines the feature value of the image to be used as a search key, based on the feature values of a plurality of candidate images. The search means searches for images whose feature values are similar to that of the image determined as the search key. In this configuration, the search key determination means clusters the feature values of the candidate images and, for each cluster, takes the feature value of the image representing that cluster as a search key. The search means then performs a search with each of these per-cluster search keys.

In this example, the search key determination means is implemented by the multiple search key selection unit 113 in the similar face image detection operation unit 103 of the editing device 14. The search means is implemented by the similar person search unit 218 of the similar face image detection device 16. These means may, however, be implemented in other ways.

Next, an example in which the similar person search process (similar face detection process) described above is applied to editing will be described with reference to FIGS. 14 to 16.

As described above, in the conventional workflow for finding a performer's appearance scenes (appearance footage), the person in charge (an editor or the like) searches for the performer's information on a management terminal. The terminal displays a list of the programs in which the performer appears and the numbers of the VTR tapes on which those programs are recorded. The person in charge then takes the VTR tapes with those numbers from the shelves and plays them on a VTR player. They watch the playback to find the appearance scenes and record the time code information of those scenes. This workflow needed improvement in both efficiency and accuracy, so the workflow described below is introduced.

FIG. 14 shows the procedure for storing face images in the face image storage server 127 when the original video is recorded on media 5 (optical media 5a, magnetic media 5b, and VTR tapes 5c). The procedure up to locating the media 5 is the same as before.

When the original video is recorded on optical media 5a or magnetic media 5b, the video file is retrieved from the located media and played back on the similar face image detection device 16. Using the similar person search technique described above, only the face regions are cut out of the played-back video. The cut-out face images are stored in the face image storage server 127 together with their time code information.

The stored face images are not limited to one type (generally a frontal face). Multiple face types (frontal, profile, oblique, back of the head, smiling, angry, and so on) can be registered and stored as face images to be detected, and each face image is recorded in association with its face type. Preparing several face images for detection, especially of different types, allows footage in which a specific performer appears to be detected more accurately. It also makes it possible to find particular situations within that performer's footage, for example when footage of the performer smiling is wanted. If the performer's name is known when the face images are stored, the name may also be registered. When several face images of the same performer are recorded in the face image storage server 127, a reference face image may be designated. There need not be only one reference face image. For workability, however, typically one is set for each face type, or one for each given period of appearances (for example, every five years).
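One way to picture the stored records is a face record carrying the time code, face type, optional performer name, and a reference flag, kept in a store that can be queried by performer and face type. The class and field names below are hypothetical, and the in-memory list only stands in for the face image storage server 127.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class FaceType(Enum):
    FRONT = "front"
    PROFILE = "profile"
    OBLIQUE = "oblique"
    BACK = "back"
    SMILING = "smiling"
    ANGRY = "angry"


@dataclass
class FaceRecord:
    image_path: str                  # cut-out face image
    timecode: str                    # where it appears in the source video
    face_type: FaceType
    performer: Optional[str] = None  # registered only if the name is known
    is_reference: bool = False       # reference face image flag


class FaceStore:
    """In-memory stand-in for the face image storage server."""

    def __init__(self) -> None:
        self._records: List[FaceRecord] = []

    def add(self, record: FaceRecord) -> None:
        self._records.append(record)

    def find(self, performer: str,
             face_type: Optional[FaceType] = None) -> List[FaceRecord]:
        """All faces of a performer, optionally restricted to one face type."""
        return [r for r in self._records
                if r.performer == performer
                and (face_type is None or r.face_type == face_type)]

    def references(self, performer: str) -> Dict[FaceType, List[FaceRecord]]:
        """Reference faces of a performer, grouped by face type."""
        grouped: Dict[FaceType, List[FaceRecord]] = {}
        for r in self.find(performer):
            if r.is_reference:
                grouped.setdefault(r.face_type, []).append(r)
        return grouped


store = FaceStore()
store.add(FaceRecord("a_front.png", "10:00:00:00", FaceType.FRONT, "A", True))
store.add(FaceRecord("a_smile.png", "10:05:00:00", FaceType.SMILING, "A"))
store.add(FaceRecord("unknown.png", "11:00:00:00", FaceType.PROFILE))
print([r.image_path for r in store.find("A", FaceType.SMILING)])  # ['a_smile.png']
```

Grouping references by face type mirrors the suggestion of one reference image per face type; a per-period grouping would key on a date field instead.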

When the original video is recorded on the VTR tape 5c, the located VTR tape 5c is played back on the VTR playback device (media playback device 19), and the playback signal is captured into the similar face image detection device 16. As with the optical media 5a and the magnetic media 5b, the similar face image detection device 16 uses the similar-person search technique to crop only the face regions from the captured video, and stores the cropped face images in the face image storage server 127 together with their time code information.

FIG. 15 shows the procedure for storing face images in the face image storage server 127 when the original video is recorded on the low-resolution server 126.

When the original video is recorded on the low-resolution server 126 and the person in charge searches for a performer's information on the management terminal 17, the programs in which the performer appears and the names of the corresponding video files on the low-resolution server 126 are output. This information is passed online as is, that is, over the network 2, to the similar face image detection device 16. The video files are then retrieved from the low-resolution server 126 and played back on the similar face image detection device 16, which crops only the face regions from the reproduced video and stores the cropped face images in the face image storage server 127 together with their time code information.

FIG. 16 illustrates similar-face detection against the face image storage server 127, using the face image of the target performer as the detection target.

The editor loads the face image file of the target performer (the detection-target face image) into the similar face image detection device 16. The detection-target face image may be a representative face image extracted from the video file to be edited, a face image selected from those stored in the face image storage server 127, or an image obtained from the web. The similar face image detection device 16 compares the detection-target face image with the face images in the face image storage server 127 and retrieves the face images of the scenes in which the performer with the same face appears, together with their time code information.
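Claim 1 states the matching rule: two faces are judged to be the same person when the distance between their feature values is below a preset threshold. A minimal sketch of that comparison follows, assuming each stored face already carries a numeric feature vector and using Euclidean distance (`math.dist`, Python 3.8+) as a stand-in for whatever metric the similar-person search actually uses. Feature extraction itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

# (face_id, time code, feature vector) kept alongside each cropped face
StoredFace = Tuple[str, str, Sequence[float]]


def search_similar_faces(query: Sequence[float], stored: List[StoredFace],
                         threshold: float) -> List[Tuple[str, str, float]]:
    """Return (face_id, time code, distance) for every stored face judged to be
    the same person as the query, i.e. whose feature distance is below threshold.
    Results are ordered nearest first."""
    hits = []
    for face_id, timecode, feature in stored:
        distance = math.dist(query, feature)  # Euclidean distance (assumed metric)
        if distance < threshold:
            hits.append((face_id, timecode, distance))
    hits.sort(key=lambda hit: hit[2])
    return hits
```

The returned time codes are what would be handed on to the editing device; the threshold trades missed appearances against false matches and would have to be tuned for the feature extractor in use.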

The time code information of the appearance scenes detected in this way is passed to the editing device 14. The editor no longer needs to search manually for the scenes in which the target performer appears, and can produce a special program featuring the performer or apply a mosaic to the performer.
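Before the time codes reach the editing device 14, individual face hits would usually need to be grouped into scenes. The patent does not describe this step; the following is a hypothetical sketch that merges detections lying no more than `max_gap` frames apart into inclusive (start, end) segments, on the assumption that faces are sampled or missed intermittently within one continuous appearance.

```python
from typing import Iterable, List, Tuple


def hits_to_segments(hit_frames: Iterable[int], max_gap: int) -> List[Tuple[int, int]]:
    """Merge frame indices where the performer was detected into inclusive
    (start, end) segments; hits at most max_gap frames apart share a segment."""
    segments: List[Tuple[int, int]] = []
    for frame in sorted(set(hit_frames)):
        if segments and frame - segments[-1][1] <= max_gap:
            # extend the current scene to cover this hit
            segments[-1] = (segments[-1][0], frame)
        else:
            # gap too large: start a new scene
            segments.append((frame, frame))
    return segments
```

With `max_gap=30`, hits at frames 10, 20, 30, 40, 200, 215 and 500 collapse into `[(10, 40), (200, 215), (500, 500)]`.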

If the editor wants to preview the detected appearance scenes without using the editing device 14, the preview can easily be done by playing back the corresponding video files on the low-resolution server 126.

With this similar-face detection processing, when a broadcasting station searches its vast archive of past video for scenes in which a target performer appears, the similar face image detection device 16 detects the appearance scenes automatically. As a result, the editor no longer needs to watch the video on the media 5 (optical media 5a, magnetic media 5b, VTR tape 5c) and can do other work in the meantime, which greatly improves the editor's efficiency.

Moreover, because the number of editing devices 14 is limited, when no editing device 14 is available the editor can search for the target performer's appearance scenes in advance and preview them using the video files on the low-resolution server 126, so that preparatory work can be done before editing.

In addition, if a performer causes a problem just before broadcast, after program editing has been completed, so that the performer can no longer be shown, the above technique makes it easy to find that performer's appearance scenes and either apply a mosaic to the performer or cut those scenes, which helps prevent complaints from sponsors and viewers.

The above processing detects performers in a broadcasting station's past video. However, a performer's face changes over the decades after the video was recorded, so using the target performer's current face image as the detection target is likely to reduce detection accuracy. To address this, the face image obtained from a first detection using the current face image (a past face image, found with reduced accuracy) is newly registered as the detection-target face image in place of the current one, and similar face images are detected again, which improves detection accuracy. In other words, two-step detection (re-registering the reference face image with the new information, then searching for similar face images) can be expected to improve detection accuracy.
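The two-step detection can be sketched as two calls to the same threshold search. Which detected face is re-registered is not specified; this sketch assumes the nearest first-step match is promoted to the new reference, uses Euclidean distance as an assumed metric, and the names `search` and `two_step_search` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

StoredFace = Tuple[str, Sequence[float]]  # (face_id, feature vector)


def search(query: Sequence[float], stored: List[StoredFace], threshold: float) -> List[str]:
    """IDs of stored faces whose feature distance to the query is below threshold, nearest first."""
    scored = sorted((math.dist(query, feature), face_id) for face_id, feature in stored)
    return [face_id for distance, face_id in scored if distance < threshold]


def two_step_search(current_face: Sequence[float], stored: List[StoredFace],
                    threshold: float) -> List[str]:
    """Step 1: search with the performer's current face.
    Step 2: re-register the nearest past match as the detection target and search again."""
    first = search(current_face, stored, threshold)
    if not first:
        return []
    features = dict(stored)
    new_reference = features[first[0]]  # assumption: nearest hit becomes the new reference
    return search(new_reference, stored, threshold)
```

The second search can reach faces that lay outside the threshold of the current face but close to the re-registered past face, which is the accuracy gain the two-step procedure aims at.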

By passing the time code information of the detected appearance video to the editing device, the editor can then apply a mosaic to the video in which the performer appears or cut that video.
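Cutting the performer's appearance video amounts to keeping the complement of the detected ranges. A hypothetical sketch, assuming half-open frame ranges [start, end) and a known program length in frames; overlapping or unsorted cut ranges are tolerated.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open frame range [start, end)


def keep_segments(duration: int, cuts: List[Range]) -> List[Range]:
    """Frame ranges to keep after removing every range in which the performer appears.

    Cuts are clipped to [0, duration) and may overlap or arrive unsorted."""
    kept: List[Range] = []
    pos = 0  # first frame not yet accounted for
    for start, end in sorted(cuts):
        start = min(max(start, 0), duration)
        end = min(max(end, 0), duration)
        if start > pos:
            kept.append((pos, start))
        pos = max(pos, end)
    if pos < duration:
        kept.append((pos, duration))
    return kept
```

For a 1000-frame program with cuts `[(100, 200), (150, 300), (800, 1000)]`, the kept ranges are `[(0, 100), (300, 800)]`.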

Furthermore, face images whose appearance (shooting) times are close are expected to show similar feature values, so by following chains of similar feature values, face images whose appearance times are far apart can also be detected. Likewise, when footage of the performer in profile is wanted, a profile face image is re-registered as the detection-target face image and similar face images are detected, which narrows down the appearance scenes further.
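One way to read "following chains of similar feature values" across appearance times is a transitive, breadth-first expansion: every face found becomes a new query, so a face from decades ago can be reached through faces from the intervening years. This is an interpretation, not the patent's stated algorithm; the store layout, the Euclidean metric, and the optional `face_type` filter (for the profile-only case) are assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

# face_id -> (shooting year, face type, feature vector)
Store = Dict[str, Tuple[int, str, Sequence[float]]]


def chain_search(seed: Sequence[float], store: Store, threshold: float,
                 face_type: Optional[str] = None) -> List[str]:
    """Expand matches transitively: each face found is used as a new query.
    Returns matching face IDs ordered newest shooting year first."""
    candidates = {fid: v for fid, v in store.items()
                  if face_type is None or v[1] == face_type}
    found: Set[str] = set()
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for fid, (_, _, feature) in candidates.items():
            if fid not in found and math.dist(query, feature) < threshold:
                found.add(fid)
                queue.append(feature)  # this face becomes the next query
    return sorted(found, key=lambda fid: -store[fid][0])
```

Because matches are chained, a single false match can pull in another person's faces, so the threshold for chained steps would likely need to be stricter than for a direct search.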

By passing the time code information of the detected appearance scenes to the editing device, the editor can then create a special program compiled solely from the scenes in which the performer appears.

At broadcasting stations today, the edited video is commonly recorded on optical media 5a and then either played out for broadcast on a playback device, or ingested from the optical media 5a into the transmission server 18 and broadcast from there. Therefore, the video file is retrieved from the optical media 5a and played back in the similar face image detection device 16, only the face regions are cropped from the reproduced video, and the cropped face images are stored together with their time codes in the face image storage server 127. By then running similar-face detection with the target performer's face image as the detection target, the person in charge can find appearance scenes without visually inspecting the video. Preparing multiple detection-target face images, such as frontal, profile, and oblique faces, allows scenes in which the desired performer appears to be detected more accurately.

The present invention has been described above on the basis of an embodiment. This embodiment is illustrative, and those skilled in the art will understand that various modifications to the combination of its components are possible and that such modifications also fall within the scope of the present invention.

1 Video editing system
2 Network
5 Media
5a Optical media
5b Magnetic media
5c VTR tape
10 Camera
11 Capture device
12 Recording device
13 Automatic editing information creation device
14 Editing device
15 System control unit
16 Similar face image detection device
17 Management terminal
18 Transmission server
19 Media playback device
103 Similar face image detection operation unit
110 Keyword recording unit
111 Keyword search unit
112 Keyword assignment request transmission unit
113 Multiple search key selection unit
121 Material video data unit
122 Edited video data unit
123 Automatically edited video data unit
124 Final editing information unit
125 Automatic editing information unit
126 Low-resolution server
127 Face image storage server
128 Processing target information unit
131 Processing target recognition unit
132 Information storage unit
141 Editing control unit
142 Display unit
143 Operation panel
144 Touch panel display
210 Image transmission/reception unit
211 Image recording unit
212 Playback control unit
213 Person area detection unit
214 Person feature extraction unit
215 Person feature recording unit
216 Attribute information recording unit
217 Request reception unit
218 Similar person search unit
219 Appearance event search unit
220 Search result transmission unit
221 Search request transmission unit
222 Search result reception unit
223 Search result display unit
224 Playback image display unit
225 Screen operation detection unit

Claims

An editing system comprising an editing device that edits video files used for broadcasting, the editing system further comprising:
a face image storage server that acquires face images of the performers included in the video files and records each face image in association with the time code information of that performer's appearance video;
a face image detection unit that compares the face images recorded in the face image storage server with a search-target face image included in the video file of a specific program, and detects the appearance video in the specific program; and
a similar face image detection device that, based on the appearance video detected by the face image detection unit, detects appearance video outside the specific program in which the person of the search-target face image appears, by a similar face image search that judges a face image in appearance video outside the specific program to show the same person as the person in the specific program when the distance between the feature values of the two face images is smaller than a preset threshold, and that associates the time code information of the detected appearance video with information on the search-target performer and notifies the editing device,
wherein the editing device edits the video file of the specific program using the time code information.

The editing system according to claim 1, wherein, when editing the video file, if the video file of the performer is stored in the recording device processed by the similar face image detection device, the editing device plays back the video file from the recording device so that the appearance video can be checked.

The editing system according to claim 1 or 2, wherein the editing device plays back the detected appearance video using low-resolution video.

The editing system according to any one of claims 1 to 3, wherein the face image storage server is capable of storing each detection-target face image in association with a face type, and the similar face image detection device performs the similar face image search according to the face type.