JP7340992B2 - Image management device and program

Info

Publication number: JP7340992B2
Application number: JP2019153510A
Authority: JP
Inventors: 秀樹吉岡; 和代細谷
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-08-26
Filing date: 2019-08-26
Publication date: 2023-09-08
Anticipated expiration: 2039-08-26
Also published as: JP2021033664A

Description

The present invention relates to an image management device and a program.

It is desirable to accumulate metadata about video content at low cost by automatically recognizing the faces that appear in the video.

For example, claim 1 of Patent Document 1 describes a face detection device that transmits face data to a face recognition device that recognizes faces. In that face detection device, a classification unit compares the feature data of the face images contained in the frames of a video and classifies the data person by person, so that the feature data of the same person forms one group.

Patent Document 1: JP 2017-182210 A

However, accumulating information on a large number of people across a wide range of video content requires a more accurate model for face recognition processing. In general, the more people the model must cover, the harder it becomes to distinguish, for example, similar faces correctly. If the accuracy of the face recognition model cannot be improved, the error rate of face recognition processing rises.

The present invention was made in recognition of the above problem. It aims to provide an image management device and a program that can accumulate feature information on the face images of many people, and accumulate identification results for faces detected in video content, while raising the accuracy of face recognition processing.

[1] To solve the above problem, an image management device according to one aspect of the present invention includes: a clustering unit that clusters face images extracted from video content based on image features; a temporary cluster storage unit that stores the image features of each cluster obtained by the clustering unit; a specific-content face image learning unit that performs machine learning based on information representing the relationship between the face images and the clusters, thereby generating a face recognition model for the video content that determines the cluster from a face image contained in the video content; a face recognition processing unit that recognizes the face images contained in the video content based on the generated face recognition model and outputs cluster information as the recognition result; a person database that stores image features of face images in association with clusters; and a person data registration unit that, for the clusters output from the face recognition processing unit, registers in the person database the image features of each cluster read from the temporary cluster storage unit.

[2] In one aspect of the present invention, in the above image management device, the clustering unit has a plurality of stage clustering units for performing clustering in a plurality of stages, and each stage clustering unit from the second stage onward clusters the face images of the clusters output by the immediately preceding stage clustering unit.

[3] In one aspect of the present invention, the above image management device further includes a name matching processing unit that merges a plurality of clusters into one cluster based on the image features registered in the person database.

[4] In one aspect of the present invention, in the above image management device, the person database stores a tag assigned to a cluster in association with that cluster, and the device further includes a tag setting unit that sets a new tag for a cluster that has no tag.

[5] In one aspect of the present invention, in the above image management device, the face recognition processing unit tracks the face images across frames within each time interval delimited by cut points, at which the amount of change in pixel values between frames of the video content peaks; detects errors in the cluster determination based on the tracking result; corrects the detected errors; and outputs cluster information as the recognition result.

[6] In one aspect of the present invention, in the above image management device, the person database further stores information associating the time intervals with the clusters.

[7] One aspect of the present invention is a program for causing a computer to function as an image management device that includes: a clustering unit that clusters face images extracted from video content based on image features; a temporary cluster storage unit that stores the image features of each cluster obtained by the clustering unit; a specific-content face image learning unit that performs machine learning based on information representing the relationship between the face images and the clusters, thereby generating a face recognition model for the video content that determines the cluster from a face image contained in the video content; a face recognition processing unit that recognizes the face images contained in the video content based on the generated face recognition model and outputs cluster information as the recognition result; a person database that stores image features of face images in association with clusters; and a person data registration unit that, for the clusters output from the face recognition processing unit, registers in the person database the image features of each cluster read from the temporary cluster storage unit.

According to the present invention, a large amount of information about the face images contained in video content can be accumulated while raising the recognition rate for faces in the video.

FIG. 1 is a block diagram showing the schematic functional configuration of an image management device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example data structure of the temporary cluster storage unit, which stores temporary cluster information, in the embodiment.
FIG. 3 is a schematic diagram outlining the processing of the cluster selection unit in the embodiment.
FIG. 4 is a schematic diagram showing an example structure of the person data (before name matching) held by the person database in the embodiment.
FIG. 5 is a schematic diagram showing an example structure of the person data (after name matching) held by the person database in the embodiment.
FIG. 6 is a schematic diagram showing an example structure of the person data (after a new tag is assigned) held by the person database in the embodiment.
FIG. 7 is a schematic diagram showing an example structure (alternative form) of the person data (before name matching) held by the person database in the embodiment.
FIG. 8 is a schematic diagram showing an example structure (alternative form) of the person data (after name matching) held by the person database in the embodiment.
FIG. 9 is a schematic diagram showing an example structure (alternative form) of the person data (after a new tag is assigned) held by the person database in the embodiment.
FIG. 10 is a schematic diagram outlining the two-stage clustering process in the embodiment.
FIG. 11 is a schematic diagram showing an example of face images classified by the two-stage clustering process in the embodiment.
FIG. 12 is a schematic diagram for explaining the processing of the cut-by-cut face recognition processing unit in the embodiment.
FIG. 13 is a schematic diagram showing an example structure of the appearance information in the embodiment.
FIG. 14 is a flowchart showing the processing procedure of the image management device in the embodiment.

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the schematic functional configuration of the image management device according to this embodiment. Reference numeral 1 denotes the image management device. As illustrated, the image management device 1 includes a video content acquisition unit 11, a face detection processing unit 12, a clustering unit 13, a temporary cluster management unit 16, a temporary cluster storage unit 17, a specific-content face image learning unit 18, a specific-content face recognition model 19, a cut-by-cut face recognition processing unit 20, a cluster selection unit 21, a person data registration unit 22, a person database 23, a performer data storage unit 24, and a tag setting unit 25. The clustering unit 13 includes a first clustering unit 14 and a second clustering unit 15.

Each of these functional units can be realized using, for example, electronic circuits. Each functional unit can also be realized with a computer and a program. When the image management device 1 is realized with computers, a single computer may provide all of its functions, or the functions may be distributed across multiple computers that can communicate with one another. Some of the functions of the image management device 1 may also be realized on a so-called cloud server. Each functional unit has storage means as needed, for example a semiconductor memory or a magnetic hard disk device. The function of each unit is described below.

The video content acquisition unit 11 acquires video content. The video content contains frame images in time series. For example, the video content acquisition unit 11 receives a broadcast signal and acquires the video content contained in it. The video content acquisition unit 11 may also acquire video content from an external distribution server device over a communication network, or read video content recorded on a recording medium such as a magnetic hard disk device or an optical disc.

The face detection processing unit 12 detects the face images contained in the frame images of the video content acquired by the video content acquisition unit 11. For example, the face detection processing unit 12 crops the region of each detected face image and passes it to the clustering unit 13. The face detection processing unit 12 may normalize the size (number of pixels vertically and horizontally) of the cropped face images as needed. Existing techniques can be used for the face detection itself. The face detection processing unit 12 detects faces in an image by referring to a model that represents face-likeness.

The clustering unit 13 performs clustering based on the features of the face images passed from the face detection processing unit 12. The features of a face image are based on, for example, the pattern in which pixel values are arranged within the face image or within a partial region of it, or the pattern in which the ranges that the pixel values fall into are arranged. The features reflect the outline of the face, the shapes of the parts of the face, their relative positions, the size of each part, the distribution of colors (pixel values), and other elements. The features can be expressed, for example, as a multidimensional vector. In this embodiment, clustering is performed in multiple stages. Specifically, the clustering unit 13 includes a first clustering unit 14 and a second clustering unit 15. The first clustering unit 14 first clusters the group of face images passed from the face detection processing unit 12, and the second clustering unit 15 then further clusters the image groups output by the first clustering unit 14. This multi-stage clustering removes noise contained in the clusters and so raises their accuracy. The number of clustering stages is not limited to two and may be one, or three or more. The clustering process is described in detail later.
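The two-stage flow can be pictured with a minimal sketch. The text here does not fix a clustering algorithm, so the greedy distance-threshold clustering, the `coarse` and `fine` thresholds, and the `min_size` rule for discarding small groups as noise are all illustrative assumptions, not the method of the embodiment.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def greedy_cluster(vectors, threshold):
    """Put each vector into the first cluster whose centroid lies within
    `threshold` (Euclidean distance); otherwise start a new cluster."""
    clusters = []
    for v in vectors:
        for members in clusters:
            if math.dist(v, centroid(members)) <= threshold:
                members.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def two_stage_cluster(vectors, coarse=1.0, fine=0.3, min_size=2):
    """Stage 1 groups coarsely; stage 2 re-clusters each stage-1 group with a
    tighter threshold and drops tiny groups (assumed here to be noise)."""
    result = []
    for group in greedy_cluster(vectors, coarse):        # first stage
        for sub in greedy_cluster(group, fine):          # second stage
            if len(sub) >= min_size:
                result.append(sub)
    return result

# Hypothetical 2-D feature vectors: [0.8, 0.0] joins the first coarse group
# but is separated out, and discarded, by the tighter second stage.
faces = [[0.0, 0.0], [0.1, 0.0], [0.05, 0.05], [0.8, 0.0], [5.0, 5.0], [5.1, 5.0]]
clusters = two_stage_cluster(faces)
```

Running both stages through the same function with different thresholds mirrors the remark that the two clustering units may be one module operated with different parameters.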

The first clustering unit 14 and the second clustering unit 15 may be realized by operating the same circuit or program module with different parameters.

The temporary cluster management unit 16 manages information on the clusters output from the clustering unit 13 (referred to here as "temporary clusters"). Specifically, the temporary cluster management unit 16 writes information identifying each temporary cluster, together with information on the features of each temporary cluster, to the temporary cluster storage unit 17. The feature information may include statistics of the features (for example, mean and variance).

The temporary cluster storage unit 17 stores the temporary cluster information described above. It stores the information identifying each temporary cluster in association with the feature information of that temporary cluster. The structure of the data stored in the temporary cluster storage unit 17 is described later with reference to another figure. Instead of holding the feature information of a temporary cluster, the temporary cluster storage unit 17 may store the group of face images belonging to the temporary cluster itself. The temporary cluster storage unit 17 may also store, for each temporary cluster, both its feature information and the group of face images belonging to it.

The specific-content face image learning unit 18 reads the temporary cluster identification information and the feature information from the temporary cluster storage unit 17 and performs machine learning with them as training data, thereby building a model that represents the relationship between temporary cluster identification information and image features. The model built here is a model for the specific content, that is, the video content currently being processed. The specific-content face image learning unit 18 performs machine learning using, for example, a neural network.
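The train-then-predict interface of such a per-content model can be sketched as follows. The text names a neural network as one example; to stay short and dependency-free, this sketch deliberately swaps in a nearest-centroid classifier, which is an assumption made for illustration and not the learning method of the embodiment. The function names `train` and `predict` are likewise hypothetical.

```python
import math

def train(samples):
    """Build a per-content model from (feature_vector, cluster_id) pairs.
    Stand-in model: one centroid per temporary cluster."""
    sums, counts = {}, {}
    for vec, cid in samples:
        acc = sums.setdefault(cid, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [x / counts[cid] for x in acc] for cid, acc in sums.items()}

def predict(model, vec):
    """Return the ID of the cluster whose centroid is closest to `vec`."""
    return min(model, key=lambda cid: math.dist(vec, model[cid]))

# Hypothetical training data drawn from temporary clusters "A" and "B".
model = train([([0.0, 0.0], "A"), ([0.0, 2.0], "A"), ([10.0, 10.0], "B")])
```

Because the model is trained only on the clusters found in one piece of content, it has to separate only the handful of people who appear in that content, which is the point of building it per content.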

The specific-content face recognition model 19 is the model built by the specific-content face image learning unit 18. When the model uses a neural network, the specific-content face recognition model 19 stores, specifically, the values of the computation parameters at each node of the neural network. As stated above, the specific-content face recognition model 19 is a model for the specific content, that is, the video content currently being processed.

The cut-by-cut face recognition processing unit 20 recognizes the faces contained in the video content passed from the video content acquisition unit 11, one cut at a time. A cut is the unit at which the camera shooting the video is switched or the scene changes. By referring to the specific-content face recognition model 19, the cut-by-cut face recognition processing unit 20 determines which category each face image contained in the video content belongs to, and outputs that category information in association with information identifying the cut.
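The summary of the invention describes cut points as the points where the change in pixel values between frames peaks, and describes correcting recognition errors by tracking a face within a cut. A minimal sketch of both ideas follows. The flat-list frame representation, the `min_change` threshold, and the majority vote used as the correction rule are assumptions chosen for illustration.

```python
from collections import Counter

def frame_diffs(frames):
    """Mean absolute pixel difference between consecutive frames.
    Each frame is a flat list of pixel values of equal length."""
    return [sum(abs(a - b) for a, b in zip(f0, f1)) / len(f0)
            for f0, f1 in zip(frames, frames[1:])]

def cut_points(diffs, min_change=30.0):
    """Return the index of the first frame of each new cut: local peaks of
    the inter-frame difference that also reach `min_change`."""
    cuts = []
    for i, d in enumerate(diffs):
        left = diffs[i - 1] if i > 0 else float("-inf")
        right = diffs[i + 1] if i + 1 < len(diffs) else float("-inf")
        if d >= min_change and d > left and d >= right:
            cuts.append(i + 1)
    return cuts

def correct_track(labels):
    """Per-frame cluster labels of one tracked face within a cut are replaced
    by the majority label, treating the minority labels as errors."""
    winner, _ = Counter(labels).most_common(1)[0]
    return [winner] * len(labels)

# Hypothetical 2-pixel frames: a large jump between frame 1 and frame 2.
frames = [[10, 10], [12, 10], [200, 200], [201, 199]]
diffs = frame_diffs(frames)
cuts = cut_points(diffs)
fixed = correct_track(["A", "A", "B", "A"])
```

Restricting tracking to one cut at a time keeps a track from spanning a camera switch, where the same screen position may suddenly show a different person.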

The cluster selection unit 21 selects, from the clusters (temporary clusters) registered in the temporary cluster storage unit 17, the clusters that the cut-by-cut face recognition processing unit 20 output as recognition results (referred to as "used clusters"), and outputs them. The cluster selection unit 21 passes the information on the selected clusters to the person data registration unit 22. The cluster information it passes includes at least the information identifying each cluster and the feature information of each cluster.

The person data registration unit 22 registers the cluster information passed from the cluster selection unit 21 in the person database 23.

The person database 23 is a database for accumulating information about the people who appear in video content. It stores, in association with one another, information identifying a cluster, the image feature information of the cluster, a tag assigned to the cluster (for example, a person's name), and, as needed, other attribute information of the cluster. The person database 23 may also hold the group of face images belonging to a cluster, in association with the information identifying that cluster. One concrete way to hold the face images in the person database 23, or in association with information in it, is as follows. The person database 23 holds a few (for example, two or three) representative face images in association with the information identifying the cluster. It also holds, in association with the cluster, information on the location where a larger group of face images is stored. This "location information" is, for example, a name identifying a folder in a file system, or an equivalent URL (uniform resource locator). In that folder or the like, the face image data may be kept as a group of image files in a format such as JPEG, or those image files may be kept inside a compressed file in a format such as ZIP.
When the person database 23 directly holds a few representative face images in this way, a user searching the person database 23 can view those face images quickly.
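One way to picture a single entry of the person database is the record below. The field names, the use of mean and variance as the stored feature information, and the example paths are hypothetical; only the kinds of information held (cluster identifier, feature information, tag, attributes, a few representative images, and a storage location for the full image group) come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PersonRecord:
    cluster_id: str                       # information identifying the cluster
    feature_mean: list                    # feature statistics (assumed form)
    feature_variance: list
    tag: Optional[str] = None             # e.g. a person's name; None until set
    attributes: dict = field(default_factory=dict)
    # A few representative face images kept directly in the database.
    representative_images: list = field(default_factory=list)
    # Folder name or URL where the larger image group (JPEG files or a ZIP) lives.
    image_store: Optional[str] = None

# Hypothetical entry for a newly registered, still untagged cluster.
record = PersonRecord(
    cluster_id="A",
    feature_mean=[0.1, 0.2],
    feature_variance=[0.01, 0.02],
    representative_images=["faces/A/001.jpg", "faces/A/002.jpg"],
    image_store="faces/A.zip",
)
```

Keeping only two or three images inline and pointing to the rest by location keeps database rows small while still letting a search result show faces immediately.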

The person database 23 may further accumulate appearance information. Appearance information indicates which person appeared in which scene (cut) of which video content. That is, appearance information associates information identifying the content, information identifying the scene (cut), and information identifying the performer (cluster) with one another.
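Appearance information is a three-way association, which the following sketch models as a list of tuples with one simple query. The identifiers such as `content-001` and the helper `cuts_featuring` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

class Appearance(NamedTuple):
    content_id: str   # which video content
    cut_id: int       # which scene (cut) within that content
    cluster_id: str   # which performer (cluster)

appearances = [
    Appearance("content-001", 1, "A"),
    Appearance("content-001", 1, "B"),
    Appearance("content-001", 2, "A"),
]

def cuts_featuring(rows, content_id, cluster_id):
    """Sorted cut IDs of `content_id` in which `cluster_id` appears."""
    return sorted({r.cut_id for r in rows
                   if r.content_id == content_id and r.cluster_id == cluster_id})
```

For example, `cuts_featuring(appearances, "content-001", "A")` lists the cuts in which cluster A appears.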

The person database 23 has a function for performing name matching. Name matching is the process of merging multiple clusters registered in the person database into a single cluster when they are in fact the same cluster (that is, when they represent the features of the same person). Name matching is typically performed when there are newly registered clusters and existing clusters, in order to merge the data that represents the same person across those clusters into one cluster. The person database 23 performs name matching on clusters based on, for example, the similarity of their features.
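Name matching by feature similarity might look like the sketch below. Cosine similarity between cluster mean vectors and the `threshold` of 0.95 are assumptions; the description says only that the similarity of features is used. The cluster names and vectors are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def name_match(existing, new, threshold=0.95):
    """Map each new cluster ID to the existing cluster it should be merged
    into. New clusters with no sufficiently similar match are left out."""
    merges = {}
    for new_id, new_vec in new.items():
        best = max(existing, key=lambda eid: cosine(new_vec, existing[eid]),
                   default=None)
        if best is not None and cosine(new_vec, existing[best]) >= threshold:
            merges[new_id] = best
    return merges

# Hypothetical mean feature vectors of existing and newly registered clusters.
existing = {"X": [1.0, 0.0], "Y": [0.0, 1.0]}
new = {"A": [0.99, 0.05], "B": [0.7, 0.7]}
merges = name_match(existing, new)
```

In this made-up data, cluster A is close enough to X to be merged into it, while B matches neither existing cluster and stays separate.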

The person database 23 is described in further detail later with reference to another figure.

The performer data storage unit 24 stores information on the people who appear in the video content processed by the image management device 1. For example, it stores the performers' names (personal names) for each piece of video content. This per-content performer data is not extracted from the video (images) of the content; it is supplied separately from outside.

The tag setting unit 25 sets tags on clusters in the person database 23 that have not yet been given a tag (a person's name or the like), referring to the name information stored in the performer data storage unit 24. The tag setting unit 25 may decide which tag to assign to which cluster based on user operations. That is, the tag setting unit 25 has a user interface. Through that user interface, the tag setting unit 25 presents the user with the list of performers stored in the performer data storage unit 24. It then sets the name of the performer selected by the user, or a performer name newly entered by the user, as the tag of a specific cluster in the person database 23.
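Stripped of its user interface, the tag-setting step reduces to three operations: find the untagged clusters, offer the performer list for the content, and store the chosen or newly typed name. The sketch below assumes dictionary-based stores and uses placeholder performer names; none of these names come from the source.

```python
def untagged(person_db):
    """IDs of clusters that do not yet have a tag."""
    return [cid for cid, rec in person_db.items() if rec["tag"] is None]

def candidates(performer_data, content_id):
    """Performer names to present to the user for one piece of content."""
    return list(performer_data.get(content_id, []))

def set_tag(person_db, cluster_id, name):
    """Set `name` (picked from the list or newly entered) as the tag."""
    if person_db[cluster_id]["tag"] is not None:
        raise ValueError(f"cluster {cluster_id} already has a tag")
    person_db[cluster_id]["tag"] = name

# Hypothetical stores; the performer names are placeholders.
person_db = {"A": {"tag": None}, "X": {"tag": "Performer 1"}}
performer_data = {"content-001": ["Performer 1", "Performer 2"]}

choices = candidates(performer_data, "content-001")
set_tag(person_db, "A", choices[1])   # the user picks the second name
```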

FIG. 2 is a schematic diagram showing an example data structure of the temporary cluster storage unit 17, which stores the temporary cluster information obtained as a result of processing by the clustering unit 13. As illustrated, the temporary cluster storage unit 17 stores, for example, data in table form. It stores temporary cluster identification information and feature information in association with each other. The temporary cluster identification information identifies each individual temporary cluster. When the temporary cluster storage unit 17 also holds the face image groups themselves, as mentioned above, it stores the temporary cluster identification information in association with the information on those face image groups. In the illustrated example, the temporary cluster identification information is, for example, "A", "B", "C", and so on. The feature information represents the image-related features of each cluster. The feature information may be data obtained by statistically processing the features (for example, the mean or variance of some quantity). In short, the temporary cluster storage unit 17 stores information representing the characteristics of each temporary cluster.
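A table of this kind can be sketched as a mapping from temporary cluster ID to per-dimension statistics. The description mentions mean and variance as examples of feature statistics; the choice of population variance, the `TempCluster` name, and the sample vectors are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance

@dataclass
class TempCluster:
    cluster_id: str   # temporary cluster identification information
    mean: list        # per-dimension mean of the member feature vectors
    variance: list    # per-dimension population variance

def summarize(cluster_id, vectors):
    """Reduce the member feature vectors of one cluster to statistics."""
    dims = list(zip(*vectors))            # transpose: one tuple per dimension
    return TempCluster(cluster_id,
                       [fmean(d) for d in dims],
                       [pvariance(d) for d in dims])

# Hypothetical store keyed by temporary cluster ID.
temp_store = {}
for cid, vectors in {"A": [[1.0, 2.0], [3.0, 2.0]],
                     "B": [[0.0, 0.0], [0.0, 4.0]]}.items():
    temp_store[cid] = summarize(cid, vectors)
```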

FIG. 3 is a schematic diagram outlining the processing performed by the cluster selection unit 21. As illustrated, for a given piece of content, the cluster selection unit 21 selects, from the set of extracted temporary clusters, the set of clusters that were actually used in the face recognition results produced by the cut-by-cut face recognition processing unit 20. The cluster selection unit 21 passes information on the selected set of clusters to the person data registration unit 22. In the illustrated example, the cluster selection unit 21 reads temporary clusters A, B, and C from the temporary cluster storage unit 17. The cluster selection unit 21 also receives, from the cut-by-cut face recognition processing unit 20, information on clusters A and B, which are the clusters extracted by performing face recognition processing on the video content. Then, of the temporary clusters A, B, and C read from the temporary cluster storage unit 17, the cluster selection unit 21 selects only the information on clusters A and B, the clusters passed from the cut-by-cut face recognition processing unit 20. The cluster selection unit 21 passes the information on the selected clusters A and B to the person data registration unit 22. In other words, in this example cluster C was extracted as a temporary cluster, but because cluster C was not recognized during face recognition, the cluster selection unit 21 discards it without selecting it. Note that a cluster output by the cut-by-cut face recognition processing unit 20 may be referred to as a "used cluster." A used cluster is a cluster that was detected as having been used in the video.
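The selection in FIG. 3 amounts to filtering the temporary clusters by the identifiers of the used clusters. A minimal sketch, in which the function name and the placeholder feature values are hypothetical:

```python
def select_used_clusters(temporary_clusters, used_ids):
    """Keep only the temporary clusters that face recognition actually output."""
    return {cid: info for cid, info in temporary_clusters.items() if cid in used_ids}


# Temporary clusters A, B, C read from storage (feature values are placeholders).
temporary = {"A": {"mean": [0.1]}, "B": {"mean": [0.5]}, "C": {"mean": [0.9]}}
used = {"A", "B"}  # clusters reported by cut-by-cut face recognition

selected = select_used_clusters(temporary, used)  # cluster C is discarded
```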

FIGS. 4, 5, and 6 are schematic diagrams showing configuration examples of the person data (cluster data) held by the person database 23. As illustrated, the person data is tabular data with the following items: cluster identification information, feature information, a tag, and attribute information (person attributes and the like).

FIG. 4 shows the person data in a state where new clusters have been registered and name matching between those new clusters and the existing clusters has not yet been performed. As illustrated, this person data contains A, B, X, and Y as cluster identification information. Of these, clusters X and Y are existing clusters, and clusters A and B are newly registered clusters. Every cluster has feature information. The feature information includes, for example, statistical information such as the mean and variance of the feature values. As described above, the person database 23 may also hold the face images themselves together with their feature data. The existing clusters X and Y have already been tagged. Here, a tag is the name of the person corresponding to a cluster. Clusters A and B have not yet been tagged.

FIG. 5 shows the person data after name matching has been performed on the state shown in FIG. 4. Because cluster A in FIG. 4 was merged into the existing cluster X by name matching, the data in FIG. 5 contains no row for cluster A. Cluster B was not merged into any existing cluster (there was no existing cluster to match it with), so its row remains in the data of FIG. 5. In this state, the tag in the row for cluster B is still unset.

FIG. 6 shows the person data after tagging has been performed on the state shown in FIG. 5. As described above, the tag setting unit 25 sets tags for new clusters.

As described above, through name matching (FIG. 5) and tag setting (FIG. 6), tag information (a person's name) is associated with newly registered clusters as well.
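The transitions from FIG. 4 to FIG. 5 to FIG. 6 can be sketched as follows. The text here does not state how two clusters are judged to be the same person, so this sketch assumes a Euclidean distance between cluster means and a hypothetical threshold; the row layout, the function `name_match`, and the tag strings are likewise illustrative only.

```python
import math


def name_match(table, new_ids, threshold):
    """Merge each new cluster into its nearest existing cluster if close enough."""
    for new_id in new_ids:
        new = table[new_id]
        existing = [(math.dist(new["mean"], row["mean"]), cid)
                    for cid, row in table.items() if cid not in new_ids]
        if not existing:
            continue
        distance, best = min(existing)
        if distance > threshold:
            continue  # no counterpart: the row stays and remains untagged
        old = table[best]
        total = old["count"] + new["count"]
        # Count-weighted mean of the two clusters (an assumed merge rule).
        old["mean"] = [(o * old["count"] + n * new["count"]) / total
                       for o, n in zip(old["mean"], new["mean"])]
        old["count"] = total
        del table[new_id]  # the merged cluster no longer has its own row


table = {  # state corresponding to FIG. 4
    "X": {"mean": [0.0, 0.0], "count": 3, "tag": "Person X"},
    "Y": {"mean": [10.0, 10.0], "count": 2, "tag": "Person Y"},
    "A": {"mean": [0.4, 0.0], "count": 1, "tag": None},
    "B": {"mean": [5.0, 5.0], "count": 4, "tag": None},
}
name_match(table, ["A", "B"], threshold=1.0)  # state corresponding to FIG. 5
after_matching = sorted(table)
untagged = [cid for cid, row in table.items() if row["tag"] is None]
table["B"]["tag"] = "Person B"                # tag setting, FIG. 6
```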

The data shown in FIGS. 4, 5, and 6 may also be organized in a different form.
FIGS. 7, 8, and 9 are schematic diagrams showing another configuration example of the person data (cluster data) held by the person database 23. In the configuration shown in these figures, data about tags (which correspond to people) and data about image clusters are stored in separate tables, and the two can be associated with each other.

FIG. 7 shows the person data in a state where new clusters have been registered and name matching between those new clusters and the existing clusters has not yet been performed. The illustrated data contains clusters A, B, X, and Y. Of these, clusters X and Y are existing clusters, and clusters A and B are newly registered clusters. Every cluster has feature information. In this form of data as well, the person database 23 may hold the face images themselves together with their feature data. Tag information is associated with each of clusters X and Y. The tag information comprises the tag itself (for example, a name) and attribute information (person attributes and the like). As one example, the association between tag information and cluster information is realized by having the tag information hold cluster identification information, as illustrated.

FIG. 8 shows the person data after name matching has been performed on the state shown in FIG. 7. The tag information that was associated only with cluster X in FIG. 7 is, as a result of name matching, also associated with cluster A in FIG. 8. Because there was no existing cluster to match cluster B with, no tag information is associated with cluster B in the data of FIG. 8 either. That is, in this state cluster B is still untagged. In the data representation shown in FIG. 8, clusters A and X are not merged as data in the table. In other words, the information of each of clusters A and X is preserved in the data shown in FIG. 8. This is where the data representation of FIG. 8 differs from that of FIG. 5.

FIG. 9 shows the person data after tagging has been performed on the state shown in FIG. 8. In FIG. 9, tag information is associated with cluster B as well. As described above, the tag setting unit 25 sets the tag for the new cluster (cluster B).

As described above, through name matching (FIG. 8) and tag setting (FIG. 9), tag information (a person's name) is associated with newly registered clusters as well.
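The two-table form of FIGS. 7 to 9 can be sketched as below: one table for clusters and one for tags, with each tag holding the identifiers of its clusters. The structure and the helper names `link` and `tag_of` are assumptions for illustration, as are the feature values and tag strings.

```python
# Cluster table: every cluster keeps its own row, even after name matching.
clusters = {
    "A": {"mean": [0.4, 0.0]},
    "B": {"mean": [5.0, 5.0]},
    "X": {"mean": [0.0, 0.0]},
    "Y": {"mean": [10.0, 10.0]},
}

# Tag table: each tag refers to clusters through their identifiers (FIG. 7).
tags = [
    {"name": "Person X", "attributes": {}, "cluster_ids": {"X"}},
    {"name": "Person Y", "attributes": {}, "cluster_ids": {"Y"}},
]


def link(tags, existing_id, new_id):
    """Name matching: attach new_id to the tag that already owns existing_id."""
    for tag in tags:
        if existing_id in tag["cluster_ids"]:
            tag["cluster_ids"].add(new_id)
            return tag
    raise KeyError(existing_id)


def tag_of(tags, cluster_id):
    """Return the tag name for a cluster, or None if it is still untagged."""
    for tag in tags:
        if cluster_id in tag["cluster_ids"]:
            return tag["name"]
    return None


link(tags, "X", "A")                      # state corresponding to FIG. 8
b_before = tag_of(tags, "B")              # cluster B is still untagged
tags.append({"name": "Person B", "attributes": {}, "cluster_ids": {"B"}})  # FIG. 9
```

Unlike the single-table sketch, no cluster row is deleted here, which matches the observation that the information of clusters A and X is preserved.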

Next, the characteristic processing of the image management device 1 is described in more detail.

FIG. 10 is a schematic diagram outlining the two-stage clustering performed by the clustering unit 13. As illustrated, the group of images to be clustered is classified into a plurality of clusters by first-stage clustering followed by second-stage clustering. The first clustering unit 14 performs the first-stage clustering, and the second clustering unit 15 performs the second-stage clustering. In the figure, reference numeral 301 denotes the image group before clustering. Reference numeral 302 denotes the result of the first-stage clustering, in which the images are classified into clusters 1, 2, and 3. Reference numeral 303 denotes the result of the second-stage clustering, in which the original cluster 1 is split into two clusters, clusters 1 and 4. The original clusters 2 and 3 each remain unchanged as clusters 2 and 3.

Both the first-stage and second-stage clustering use, for example, DBSCAN (density-based spatial clustering of applications with noise). The two stages use different parameters. The first stage uses a wide search range and classifies at a coarse granularity. The second stage uses a narrow search range and reclassifies, thereby improving accuracy. Performing clustering in two stages in this way reduces the amount of noise that gets mixed in. Noise here means that a cluster does not consist solely of face images of one specific person but also contains face images of other people. Improving the accuracy of the face image clusters in this way improves the accuracy of the face recognition model used for face recognition processing (the specific content face recognition model 19). That, in turn, improves the accuracy of the face recognition processing performed by the cut-by-cut face recognition processing unit 20.
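The two-stage idea can be sketched with a small pure-Python DBSCAN. This is an illustration under stated assumptions rather than the implementation of the embodiment: the specification only says that the search range is wide in the first stage and narrow in the second, so the parameter names `wide_eps`, `narrow_eps`, and `min_pts`, the rule for numbering split-off clusters, and the sample points are all hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (NOISE for outliers)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
        cluster += 1
    return labels


def two_stage(points, wide_eps, narrow_eps, min_pts):
    """Coarse clustering first, then re-cluster each cluster with a narrow range."""
    first = dbscan(points, wide_eps, min_pts)
    final = list(first)
    next_id = max(first, default=NOISE) + 1
    for cid in sorted(set(first) - {NOISE}):
        members = [i for i, label in enumerate(first) if label == cid]
        sub = dbscan([points[i] for i in members], narrow_eps, min_pts)
        mapping = {}
        for i, s in zip(members, sub):
            if s == NOISE:
                final[i] = NOISE
                continue
            if s not in mapping:
                if mapping:              # further sub-clusters get fresh IDs
                    mapping[s] = next_id
                    next_id += 1
                else:                    # the first sub-cluster keeps the old ID
                    mapping[s] = cid
            final[i] = mapping[s]
    return first, final


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),  # one person
          (0.5, 0.0), (0.5, 0.1),                                        # mixed-in faces
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]                            # another person
first, final = two_stage(points, wide_eps=0.6, narrow_eps=0.2, min_pts=2)
```

With these sample points the first stage groups the two mixed-in points with the first person, and the second stage splits them off as a new cluster, analogous to cluster 1 being split into clusters 1 and 4 in FIG. 10.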

FIG. 11 is a schematic diagram showing an example of face images classified by the two-stage clustering. In the figure, (A) shows the images belonging to cluster 1 obtained as a result of the first-stage clustering, (B) shows the images belonging to cluster 1 obtained as a result of the second-stage clustering, and (C) shows the images belonging to cluster 4 obtained as a result of the second-stage clustering. As described above (FIG. 10), cluster 1 of the first stage (A) is split into cluster 1 (B) and cluster 4 (C) in the second stage. In (A), images 311 to 315 are face images of the same person, while images 316 and 317 have been mixed into cluster 1 as noise. As a result of the second-stage clustering, cluster 1 in (B) contains images 311 to 315, and cluster 4 in (C) contains images 316 and 317. In other words, the second-stage clustering separates the noise images 316 and 317 from cluster 1 into cluster 4.

FIG. 12 is a schematic diagram for explaining the processing performed by the cut-by-cut face recognition processing unit 20. The figure contrasts the recognition results obtained when face recognition is performed per cut with those obtained when face recognition is performed at fixed time intervals (for example, every second). In the figure, (A) is the result of cut-by-cut face recognition (the method used by the cut-by-cut face recognition processing unit 20). (B) is shown for comparison and is the result of second-by-second face recognition (every second).

As already described, the cut-by-cut face recognition processing unit 20 divides a video into cuts and performs face recognition while tracking people within each cut. To detect cuts, the cut-by-cut face recognition processing unit 20 refers to, for example, the amount of change in the color histogram. Specifically, the cut-by-cut face recognition processing unit 20 calculates, for example, a color histogram for each frame of the video to be recognized. To do so, it determines, for each frame, the pixel frequency in each value range of each of the RGB primary colors. As one example, for a single frame, the cut-by-cut face recognition processing unit 20 counts the number of pixels whose R (red) channel value falls in each of four ranges: 0% or more and less than 25%, 25% or more and less than 50%, 50% or more and less than 75%, and 75% or more and 100% or less. It performs the same processing for the G (green) and B (blue) channels. The cut-by-cut face recognition processing unit 20 then computes, between adjacent frames, the amount of change over time in the per-frame color histograms obtained in this way. Points appear in the video at which this amount of change spikes briefly and distinctly, and the cut-by-cut face recognition processing unit 20 detects each such point as a cut point.
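The histogram-based cut detection described above can be sketched as follows. The four value ranges per channel follow the text. The frame representation (a non-empty list of 8-bit RGB tuples), the sum of absolute differences as the change measure, and the fixed `threshold` used to decide what counts as a spike are assumptions made for this sketch.

```python
def color_histogram(frame, bins=4):
    """Normalized pixel counts per value range for R, G, and B (3 * bins values)."""
    hist = [0] * (3 * bins)
    for pixel in frame:
        for channel, value in enumerate(pixel):
            index = min(value * bins // 256, bins - 1)  # 0-25%, 25-50%, ...
            hist[channel * bins + index] += 1
    return [count / len(frame) for count in hist]


def detect_cuts(frames, threshold):
    """Return indices of frames whose histogram differs sharply from the previous frame."""
    hists = [color_histogram(frame) for frame in frames]
    cuts = []
    for i in range(1, len(hists)):
        change = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if change > threshold:  # assumed rule for "the change spikes"
            cuts.append(i)
    return cuts


# Tiny synthetic frames: three dark frames followed by two bright ones.
dark = [(10, 20, 30)] * 4
bright = [(200, 220, 240)] * 4
frames = [dark, dark, dark, bright, bright]

cut_points = detect_cuts(frames, threshold=1.0)
```

A real detector would more likely compare each change with its local neighborhood rather than with a fixed threshold, since the text describes the spike as distinct relative to its surroundings.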

The cut-by-cut face recognition processing unit 20 recognizes the face images in each frame while tracking people, on the premise that people rarely change within a cut. If, for example, face recognition is performed at fixed time intervals (the comparison case; for example, at one-second intervals), changes in the orientation of a person's face or in lighting can cause recognition errors, so that a face image may be recognized as belonging to a different person. By tracking people within a cut, however, such recognition errors can be identified as noise. That is, the cut-by-cut face recognition processing unit 20 can exclude noise (recognition results in which face images that should belong to the same person are recognized as another person), and can thereby eliminate such recognition errors.

In FIG. 12, for the cut-by-cut face recognition in (A), the cut number, the time interval (whose length varies from cut to cut), and the recognized clusters are shown. For the second-by-second face recognition in (B), the time interval (whose length is, as an example, one second) and the recognized clusters are shown. For convenience, row numbers are shown near the center (between (A) and (B)). Time intervals are expressed as relative times, with the start of the content taken as zero.

In the illustrated example, cut number 98 of the cut-by-cut face recognition corresponds to the time interval from 0:30:01 to 0:30:11 and to rows 1 to 10 of the second-by-second face recognition. Cut number 99 corresponds to the time interval from 0:30:11 to 0:30:18 and to rows 11 to 17. Cut number 100 corresponds to the time interval from 0:30:18 to 0:30:27 and to rows 18 to 26. Cut number 101 corresponds to the time interval from 0:30:27 to 0:30:31 and to rows 27 to 30. Rows 1 to 30 of the second-by-second face recognition, in turn, correspond to 30 time intervals, from the one-second interval beginning at 0:30:01 to the one-second interval beginning at 0:30:30.

In the cut-by-cut face recognition performed by the cut-by-cut face recognition processing unit 20, two clusters, A and B, are output as the recognition result for cut number 98. No cluster is recognized for cut number 99. Only cluster A is output as the recognition result for cut number 100, and only cluster E is output as the recognition result for cut number 101. Note that the cut-by-cut face recognition processing unit 20 may output cluster information as the recognition result for a specific time span within a cut. For example, the cut-by-cut face recognition processing unit 20 outputs cluster A as the recognition result for the time interval from 0:30:01 to 0:30:07 within cut 98. It outputs cluster B as the recognition result for the time interval from 0:30:01 to 0:30:11 within cut 98 (cluster B is tracked continuously throughout that interval). It outputs cluster A as the recognition result for the time intervals from 0:30:18 to 0:30:21 and from 0:30:24 to 0:30:27 within cut 100. And it outputs cluster E as the recognition result for the time interval from 0:30:28 to 0:30:31 within cut 101.

When the second-by-second face recognition of (B) is applied to the same video content, the set of clusters recognized in each time interval is, setting misrecognition aside, a subset (possibly the empty set) of the cluster set obtained by cut-by-cut recognition for the corresponding cut. However, second-by-second face recognition can produce misrecognitions because people are not tracked within each cut. In the illustrated example, cluster C, which appears in the recognition results in rows 5 and 6, is a misrecognized cluster. Likewise, cluster D, which appears in the recognition result in row 24, is a misrecognized cluster.

As described above, in this embodiment the cut-by-cut face recognition processing unit 20 performs face recognition on a per-cut basis. That is, the cut-by-cut face recognition processing unit 20 performs face recognition while tracking people, on the premise that people do not change (or rarely change) within a cut. The cut-by-cut face recognition processing unit 20 thereby removes noise from the face recognition results and can reduce misrecognition. Put differently, for each time interval delimited by cut points, the cut-by-cut face recognition processing unit 20 tracks face images across frames, detects errors in cluster determination on the basis of the tracking results, and corrects the detected errors.
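One simple way to picture error detection and correction within a tracked face is a majority vote over the per-frame labels of a track. The specification does not say that a majority vote is used, so the rule below, together with the names `correct_track` and `recognize_cut` and the sample labels (loosely modeled on cut 98 of FIG. 12), should be read as assumptions.

```python
from collections import Counter


def correct_track(labels):
    """Return the majority cluster of one track and the frame indices that disagreed."""
    majority, _ = Counter(labels).most_common(1)[0]
    errors = [i for i, label in enumerate(labels) if label != majority]
    return majority, errors


def recognize_cut(tracks):
    """Clusters recognized in a cut: one corrected label per tracked face."""
    return sorted({correct_track(track)[0] for track in tracks if track})


# Per-frame labels for two faces tracked within one cut.
track_1 = ["A", "A", "A", "A", "C", "C", "A"]  # "C" is a misrecognition
track_2 = ["B"] * 10

majority, errors = correct_track(track_1)
cut_result = recognize_cut([track_1, track_2])
```

Because the label is decided per track rather than per sampled frame, the isolated "C" results are treated as noise and do not appear in the result for the cut.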

FIG. 13 is a schematic diagram showing a configuration example of the appearance information table held by the person database 23. As illustrated, the tabular data representing appearance information associates content identification information, time interval identification information, and performer identification information with one another. The content identification information is information for identifying a piece of video content. The time interval identification information is information that identifies a time interval within the content. Specifically, it is, for example, a number assigned to the time interval (such as the cut numbers shown in FIG. 12) or a pair consisting of the start time and end time of the time interval. The performer identification information is information for identifying a performer. It may be, for example, the cluster identification information shown in FIG. 4 and elsewhere, or a person's name (the tag shown in FIG. 4 and elsewhere). The appearance information in the illustrated example indicates that the performers corresponding to clusters A and B appeared in the video of the time interval identified as "Cut 98" in the content identified as "Content X". Accumulating such appearance information makes it possible to manage which performers appeared in which scenes of which video content across a large body of video content. In this way, the person database 23 of the image management device 1 can manage information on the performers who appeared in video content, in a form associated with, for example, cuts (scenes), on the basis of automatically recognized (identified) face images.
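The appearance information table and the lookups it enables can be sketched as below. The first two rows correspond to the example described for FIG. 13. The remaining rows are added for illustration from the cut-by-cut results described for FIG. 12, under the assumption that those cuts belong to the same content. The type and function names are hypothetical.

```python
from typing import NamedTuple


class Appearance(NamedTuple):
    content_id: str    # content identification information
    interval_id: str   # time interval identification information (here, a cut number)
    performer_id: str  # performer identification information (here, a cluster ID)


appearances = [
    Appearance("Content X", "Cut 98", "A"),
    Appearance("Content X", "Cut 98", "B"),
    Appearance("Content X", "Cut 100", "A"),  # illustrative extra rows
    Appearance("Content X", "Cut 101", "E"),
]


def intervals_for(rows, performer_id):
    """Which content and time interval did this performer appear in?"""
    return [(r.content_id, r.interval_id) for r in rows if r.performer_id == performer_id]


def performers_in(rows, content_id, interval_id):
    """Who appeared in this time interval of this content?"""
    return sorted(r.performer_id for r in rows
                  if r.content_id == content_id and r.interval_id == interval_id)


scenes_of_a = intervals_for(appearances, "A")
cast_of_98 = performers_in(appearances, "Content X", "Cut 98")
```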

FIG. 14 is a flowchart showing the processing procedure of the image management device 1. The operating procedure is described below with reference to this flowchart.

First, in step S11, the face detection processing unit 12 detects faces in the video content. The content subjected to face detection here is a specific piece of content. The face detection processing unit 12 passes the images of the regions containing the detected faces to the clustering unit 13.
Next, in step S12, the clustering unit 13 clusters the face images passed from the face detection processing unit 12. Specifically, as already described, the first clustering unit 14 performs the first-stage clustering and the second clustering unit 15 performs the second-stage clustering. That is, the clustering unit 13 performs two-stage clustering.

Next, in step S13, the temporary cluster management unit 16 receives the clustering results from the clustering unit 13. The temporary cluster management unit 16 then treats all of those clusters as "temporary clusters" and registers information on each temporary cluster in the temporary cluster storage unit 17. The information on a temporary cluster includes at least information identifying the temporary cluster and information on its features (image features). This feature information may be, for example, statistical information such as numerical values relating to the images.

Next, in step S14, the specific content face image learning unit 18 performs learning for each of the temporary clusters registered in step S13. Specifically, the specific content face image learning unit 18 performs learning using the face images narrowed down for each temporary cluster and builds a model for face recognition processing. The model holds feature information on the shape, color, size, and so on of the face as a whole or of the individual parts of the face. The model built in this step determines the cluster from a face image. The learning itself can be implemented with existing machine learning techniques. As one example, a neural network or the like can be used for the learning. The trained model obtained in this step is written to a storage medium as the specific content face recognition model 19.

Next, in step S15, the cut-by-cut face recognition processing unit 20 performs face recognition on the specific video content on a per-cut basis. The cut-by-cut face recognition processing unit 20 passes information on the set of clusters that constitutes the recognition result for each cut to the cluster selection unit 21. The clusters obtained in this step are the clusters used in the video and may be referred to as "used clusters."

Next, in step S16, the cluster selection unit 21 selects, from the temporary cluster information stored in the temporary cluster storage unit 17, only the information on the used clusters output by the cut-by-cut face recognition processing unit 20 in step S15. The cluster selection unit 21 passes the information on the selected clusters to the person data registration unit 22.
Next, in step S17, the person data registration unit 22 uses the data passed from the cluster selection unit 21 to register the used clusters in the person database 23.

Next, in step S18, the person database 23 performs name matching on the data it holds. That is, among the newly registered cluster data, the person database 23 consolidates those clusters that match an existing cluster (clusters that can be judged to be the same person). A specific example of name matching is as described with reference to FIGS. 4 and 5.

Through the series of processes described above, the image management device 1 can build a face recognition model for specific content, perform face recognition based on that model, and register the clusters (people) obtained as recognition results in the person database 23.

Note that at least some of the functions of the image management device in the embodiment described above can be realized by a computer. In that case, a program for realizing those functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROM, CD-ROM, DVD-ROM, and USB memory, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program temporarily and dynamically, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, as well as a medium that holds the program for a certain period of time, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Although an embodiment has been described above, the present invention can also be implemented in the following modified forms. For example, the cut-by-cut face recognition processing unit 20 was described as performing face recognition per cut; however, face recognition may instead be performed for time intervals defined by a unit other than cuts. As another example, name matching in the person database 23 may be performed on the basis of human judgment rather than automatically. As yet another example, the image management device 1 may be configured without the performer data storage unit 24. Furthermore, the image management device 1 may be configured without the tag setting unit 25.

The embodiment described above and its modifications can be summarized as follows.

The image management device 1 includes at least a clustering unit 13, a temporary cluster storage unit 17, a specific content face image learning unit 18, a face recognition processing unit (the cut unit face recognition processing unit 20), a person database 23, and a person data registration unit 22. The clustering unit 13 clusters face images extracted from video content based on image feature amounts. The temporary cluster storage unit 17 stores the image feature amounts for each cluster obtained through the processing of the clustering unit 13. The specific content face image learning unit 18 performs machine learning processing based on information representing the relationship between the face images and the clusters, and thereby generates a face recognition model for the video content, that is, a model for determining the cluster from a face image included in that specific video content. The face recognition processing unit performs recognition processing on face images included in the video content based on the generated face recognition model, and outputs cluster information as the recognition result. The person database 23 stores image feature amounts of face images and clusters in association with each other. For each cluster output from the face recognition processing unit (a used cluster), the person data registration unit 22 registers in the person database 23 the per-cluster image feature amount information read from the temporary cluster storage unit 17.

As a result, the face recognition processing unit can recognize faces, that is, determine the cluster corresponding to each face image, based on a model dedicated to the video content in question. Because a model specific to that video content is used, the number of target clusters (number of persons) is relatively limited, and the probability of misrecognition is lower. In other words, clusters are determined correctly with high accuracy.
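The flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: feature vectors are plain tuples, the clustering is a simple greedy threshold method, and a nearest-centroid lookup stands in for the machine learning step. All function names and the threshold value are hypothetical.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def cluster_faces(features, threshold):
    """Clustering unit (assumed method): put each feature into the first
    cluster whose centroid is within `threshold`, else open a new cluster."""
    clusters = []
    for feature in features:
        for members in clusters:
            if dist(feature, centroid(members)) <= threshold:
                members.append(feature)
                break
        else:
            clusters.append([feature])
    return clusters


def train_content_model(temp_store):
    """Stand-in for the learning unit: one centroid per temporary cluster."""
    return {cid: centroid(members) for cid, members in temp_store.items()}


def recognize(model, feature):
    """Face recognition: return the cluster whose centroid is nearest."""
    return min(model, key=lambda cid: dist(model[cid], feature))


def register_used_clusters(person_db, temp_store, used_clusters):
    """Person data registration: copy only the used clusters' features."""
    for cid in used_clusters:
        person_db[cid] = list(temp_store[cid])


# Face features extracted from one piece of video content (made-up values).
faces = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 0.9), (5.0, 5.0)]
temp_store = dict(enumerate(cluster_faces(faces, threshold=0.5)))
model = train_content_model(temp_store)

# Only clusters actually output by recognition are registered.
used = {recognize(model, f) for f in [(0.05, 0.0), (1.0, 1.0)]}
person_db = {}
register_used_clusters(person_db, temp_store, used)
```

In this sketch the third temporary cluster is never output by recognition, so it stays in the temporary store and is not registered in the person database.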

The clustering unit 13 may have a plurality of stage clustering units for performing clustering processing in a plurality of stages. Specifically, the plurality of stage clustering units are the first clustering unit 14 and the second clustering unit 15 already described. Each stage clustering unit from the second stage onward (in this embodiment, the second clustering unit 15) performs clustering processing on the face images of the clusters output from the immediately preceding stage clustering unit (in this embodiment, the first clustering unit 14). Note that the number of clustering stages may be three or more.

As already explained, this makes it possible to improve the accuracy of clustering.
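One possible reading of the multi-stage arrangement is sketched below, assuming that each later stage re-clusters the members of every cluster produced by the preceding stage with a tighter threshold. The greedy method, the thresholds, and the function names are assumptions made for illustration; the text above does not fix the algorithm used in each stage.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def greedy_cluster(features, threshold):
    """One clustering stage (assumed method): join the first cluster whose
    centroid is within `threshold`, otherwise start a new cluster."""
    clusters = []
    for feature in features:
        for members in clusters:
            if dist(feature, centroid(members)) <= threshold:
                members.append(feature)
                break
        else:
            clusters.append([feature])
    return clusters


def multi_stage_cluster(features, thresholds):
    """Run one stage per threshold; every stage after the first works on
    the clusters output by the stage before it."""
    clusters = [list(features)]
    for threshold in thresholds:
        refined = []
        for members in clusters:
            refined.extend(greedy_cluster(members, threshold))
        clusters = refined
    return clusters


# One-dimensional made-up features: a coarse stage, then a finer stage.
features = [(0.0,), (0.2,), (0.9,), (1.0,), (5.0,), (5.1,)]
stage1 = multi_stage_cluster(features, [2.0])
stage2 = multi_stage_cluster(features, [2.0, 0.3])
```

Passing a longer list of thresholds gives three or more stages without changing the code.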

The image management device 1 may further include a name matching processing unit (not shown). The name matching processing unit may be, for example, a function provided within the person database 23. The name matching processing unit integrates a plurality of clusters into one cluster (performs name matching) based on the image feature amounts registered in the person database 23.

This makes it easy to integrate a newly registered cluster with an existing cluster.
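As a rough illustration of name matching, the sketch below merges clusters in the person database whose centroids lie within a distance threshold of each other. The centroid comparison, the threshold, and the names used are assumptions for this example; the description only states that integration is based on the registered image feature amounts.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def merge_similar_clusters(person_db, threshold):
    """Return a new database in which each cluster is folded into the first
    already-kept cluster whose centroid is within `threshold`."""
    merged = {}
    for cluster_id, features in person_db.items():
        target = next(
            (kept_id for kept_id, kept in merged.items()
             if dist(centroid(kept), centroid(features)) <= threshold),
            None,
        )
        if target is None:
            merged[cluster_id] = list(features)
        else:
            merged[target].extend(features)
    return merged


# "c3" is a newly registered cluster that matches the existing "c1".
person_db = {
    "c1": [(0.0, 0.0), (0.2, 0.0)],
    "c2": [(10.0, 10.0)],
    "c3": [(0.1, 0.1)],
}
merged_db = merge_similar_clusters(person_db, threshold=0.5)
```

The input database is left unchanged, which keeps the merge easy to review when integration is confirmed by a person rather than applied automatically.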

As already explained, the person database 23 may store tags assigned to clusters in association with those clusters. A tag represents, for example, a person's name. A tag setting unit 25 may also be provided. The tag setting unit 25 sets a new tag for each cluster for which no tag has been set.

This makes it possible to manage a newly registered cluster in association with a person's name or the like.
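The tag setting behavior can be pictured with a small sketch. It assumes that tags are held as a mapping from cluster identifier to a name (or `None` when unset) and that proposed names come from some external source such as an operator; both the data layout and the function name are hypothetical.

```python
def set_missing_tags(tags, proposals):
    """Fill in a tag only for clusters that have none.

    tags:      cluster id -> tag string, or None when no tag is set
    proposals: cluster id -> proposed tag (assumed to come from elsewhere)
    Existing tags are never overwritten.
    """
    updated = dict(tags)
    for cluster_id, tag in tags.items():
        if tag is None and cluster_id in proposals:
            updated[cluster_id] = proposals[cluster_id]
    return updated


tags = {"c1": "Person A", "c2": None, "c3": None}
proposals = {"c1": "Someone else", "c2": "Person B"}
new_tags = set_missing_tags(tags, proposals)
```

Here "c1" keeps its existing tag, "c2" receives the proposed one, and "c3" stays untagged because no proposal exists for it.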

The face recognition processing unit may perform face recognition processing for each time interval delimited by cut points. A cut point is a point at which the amount of change in pixel values between frames of the video content (for example, the change in an aggregate evaluation value of the pixel values over the entire frame) reaches a peak. The face recognition processing unit may track face images between frames within each time interval delimited by cut points, detect errors in cluster determination based on the tracking results, correct the detected errors, and output cluster information as the recognition result. Face recognition in units of cuts, and error detection and correction using assumptions that hold within a cut, are as described with reference to FIG. 12.
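The following sketch illustrates the two ideas in this paragraph under simplifying assumptions. Each frame is reduced to a single aggregate score, a cut point is taken to be a local peak of the frame-to-frame change that also exceeds a minimum size, and errors within a tracked face are corrected by majority vote on the assumption that one track inside a cut shows one person. The minimum-change parameter, the majority rule, and all names are assumptions for illustration and are not taken from FIG. 12.

```python
from collections import Counter


def find_cut_points(frame_scores, min_change):
    """Return frame indices where a cut begins.

    frame_scores: one aggregate pixel value per frame (assumed input).
    A cut is reported where the change from the previous frame is a local
    peak and is at least `min_change`.
    """
    changes = [abs(b - a) for a, b in zip(frame_scores, frame_scores[1:])]
    cuts = []
    for i, change in enumerate(changes):
        left = changes[i - 1] if i > 0 else 0
        right = changes[i + 1] if i + 1 < len(changes) else 0
        if change >= min_change and change > left and change >= right:
            cuts.append(i + 1)  # the new cut starts at frame i + 1
    return cuts


def split_by_cuts(n_frames, cuts):
    """Turn cut points into half-open frame intervals (start, end)."""
    bounds = [0, *cuts, n_frames]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if s < e]


def correct_track(labels):
    """Majority-vote correction for one tracked face within one cut.

    Returns the corrected labels and the positions that were changed.
    """
    majority, _ = Counter(labels).most_common(1)[0]
    errors = [i for i, label in enumerate(labels) if label != majority]
    return [majority] * len(labels), errors


scores = [10, 11, 10, 50, 51, 50, 49]
cuts = find_cut_points(scores, min_change=20)
intervals = split_by_cuts(len(scores), cuts)
corrected, errors = correct_track(["A", "A", "B", "A"])
```

A track whose labels disagree is thus reported as containing an error at the minority positions, and the whole track is output with a single cluster label.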

The person database 23 may further store information associating the time intervals with clusters (appearance information). This makes it easy to manage which person appeared in which time interval.
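Appearance information of this kind might be held as simple interval and cluster pairs, as in the sketch below. The record layout (frame-index intervals paired with cluster identifiers) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_appearance_index(records):
    """Group time intervals by cluster.

    records: iterable of ((start_frame, end_frame), cluster_id) pairs.
    Returns cluster_id -> list of intervals in which that cluster appears.
    """
    index = defaultdict(list)
    for interval, cluster_id in records:
        index[cluster_id].append(interval)
    return dict(index)


def clusters_in_interval(records, interval):
    """Return the sorted cluster ids recorded for one time interval."""
    return sorted({cid for iv, cid in records if iv == interval})


records = [
    ((0, 3), "c1"),
    ((0, 3), "c2"),
    ((3, 7), "c1"),
]
appearances = build_appearance_index(records)
first_cut_cast = clusters_in_interval(records, (0, 3))
```

Both directions of lookup are then available: the intervals for a given person, and the persons for a given interval.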

Although the embodiment of the present invention has been described above in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and designs and the like within a scope that does not depart from the gist of the invention are also included.

The present invention can be used, for example, in businesses that manage or distribute video content (including broadcast programs). However, the scope of use of the present invention is not limited to the examples given here.

1 Image management device
11 Video content acquisition unit
12 Face detection processing unit
13 Clustering unit
14 First clustering unit
15 Second clustering unit
16 Temporary cluster management unit
17 Temporary cluster storage unit
18 Specific content face image learning unit
19 Specific content face recognition model
20 Cut unit face recognition processing unit
21 Cluster selection unit
22 Person data registration unit
23 Person database
24 Performer data storage unit
25 Tag setting unit

Claims

1. An image management device comprising:
a clustering unit that clusters face images extracted from video content based on image feature amounts;
a temporary cluster storage unit that stores the image feature amount for each cluster obtained by the processing of the clustering unit;
a specific content face image learning unit that generates a face recognition model for the video content, for determining the cluster based on a face image included in the video content, by performing machine learning processing based on information representing the relationship between the face images and the clusters;
a face recognition processing unit that performs recognition processing on a face image included in the video content based on the generated face recognition model and outputs cluster information as a recognition result;
a person database that stores image feature amounts of face images and clusters in association with each other; and
a person data registration unit that registers, in the person database, the image feature amount for each cluster read from the temporary cluster storage unit, for the cluster output from the face recognition processing unit.

2. The image management device according to claim 1, wherein
the clustering unit has a plurality of stage clustering units for performing clustering processing in a plurality of stages, and each stage clustering unit from the second stage onward performs clustering processing on the face images of the clusters output from the immediately preceding stage clustering unit.

3. The image management device according to claim 1 or 2, further comprising
a name matching processing unit that integrates a plurality of clusters into one cluster based on the image feature amounts registered in the person database.

4. The image management device according to any one of claims 1 to 3, wherein
the person database stores tags assigned to clusters in association with the clusters,
the image management device further comprising a tag setting unit that performs processing to set a new tag for a cluster for which no tag is set.

5. The image management device according to any one of claims 1 to 4, wherein
the face recognition processing unit tracks the face image between frames for each time interval delimited by cut points, each being a point at which the amount of change in pixel value between frames of the video content peaks, detects an error in the determination of the cluster based on the tracking result, corrects the detected error, and outputs cluster information as the recognition result.

6. The image management device according to claim 5, wherein
the person database further stores information associating the time interval with the cluster.

7. A program for causing a computer to function as an image management device comprising:
a clustering unit that clusters face images extracted from video content based on image feature amounts;
a temporary cluster storage unit that stores the image feature amount for each cluster obtained by the processing of the clustering unit;
a specific content face image learning unit that generates a face recognition model for the video content, for determining the cluster based on a face image included in the video content, by performing machine learning processing based on information representing the relationship between the face images and the clusters;
a face recognition processing unit that performs recognition processing on a face image included in the video content based on the generated face recognition model and outputs cluster information as a recognition result;
a person database that stores image feature amounts of face images and clusters in association with each other; and
a person data registration unit that registers, in the person database, the image feature amount for each cluster read from the temporary cluster storage unit, for the cluster output from the face recognition processing unit.