JP2006331271A

JP2006331271A - Representative image extraction apparatus and program

Info

Publication number: JP2006331271A
Application number: JP2005157089A
Authority: JP
Inventors: Atsushi Matsui; 淳松井; Clippingdale Simon; クリピングデルサイモン; Takahiro Mochizuki; 貴裕望月; Makoto Tadenuma; 眞蓼沼
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2005-05-30
Filing date: 2005-05-30
Publication date: 2006-12-07

Abstract

PROBLEM TO BE SOLVED: To extract a representative image from moving images with high accuracy.

SOLUTION: A representative image extraction apparatus for extracting the representative image from an inputted moving image comprises: a representative image determination means for determining a representative candidate image from a prescribed section included in the moving image; a face image recognition means for recognizing a face image by collating the representative candidate image with a plurality of face templates having at least a preliminarily set face image and personal identification information corresponding to the face image; and a representative image decision means for searching for a representative image on the basis of a collated result corresponding to the personal identification information obtained by the face image recognition means and then outputting the representative image found.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a representative image extraction apparatus and a representative image extraction program, and more particularly to an apparatus and a program for extracting a representative image from a moving image with high accuracy.

In recent years, in addition to conventional terrestrial television broadcasting, diverse broadcast networks such as BS digital broadcasting and cable television have become widespread. Viewers have a wider choice of programs, and the number of programs they want to watch tends to grow as well. As a result, viewers need a great deal of time to watch all of those programs.

For this reason, there are services that extract representative images so that viewers can grasp the rough content of a broadcast program, use the extracted images to generate a time-shortened summary program, and provide it to viewers.

As one approach to extracting such representative frames, face recognition, speech recognition, motion recognition, facial expression recognition, and object recognition in video, as well as techniques that combine them, are effective, and a method for recognizing images of persons exists for this purpose (see, for example, Non-Patent Document 1).

The technique disclosed in Non-Patent Document 1 determines a representative image from given candidate images using a weighted linear sum of the outputs of a plurality of representative-face determiners. Each determiner uses a different feature designed for judging a representative face. As a concrete example, a set of five features has been proposed: face orientation, size, brightness, amount of movement, and eye detection result.
Kosuke Hirasawa et al., "Development of Best Shot Face Image Recording System", The Institute of Electronics, Information and Communication Engineers, PRMU2004-103, HIP2004-43 (2004-11)

In a method that uses multiple determiners to decide the representative image, as described above, the result of integrating the outputs of individually optimized determiners is used. However, the evaluation value (feature) of each determiner is set by the system designer, based on subjective judgment, from properties of the target image (face image) that are thought to characterize a representative image. Consequently, the selected representative image is not necessarily one that is advantageous for identifying an individual in face image recognition.

Moreover, even when the properties (features) of the representative image assumed by the system designer reflect well the characteristics of a face image that is advantageous for identifying an individual, internal variables (parameters) such as thresholds must be tuned in advance for each determiner, which makes the system inconvenient to operate.

Furthermore, there is no criterion for the number of discriminators used in face image recognition. In general, the error on training data prepared in advance for parameter tuning improves as the number of discriminators (features) increases. However, adding discriminators to improve accuracy on the training data carries an inherent risk: overfitting to the training data can reduce the discrimination ability (robustness) for unseen data.

The present invention has been made in view of the problems described above, and its object is to provide a representative image extraction apparatus and a representative image extraction program for extracting a representative image from a moving image with high accuracy.

To solve the above problems, the present invention employs means having the following features.

The invention described in claim 1 is a representative image extraction apparatus that extracts a representative image from an input moving image, comprising: a representative image deciding unit that decides representative candidate images from a predetermined section included in the moving image; a face image recognition unit that recognizes a face image by collating the representative candidate images with a plurality of face templates, each having at least a preset face image and personal identification information corresponding to that face image; and a representative image determination unit that searches for a representative image based on the collation result corresponding to the personal identification information obtained by the face image recognition unit, and outputs the representative image found.

According to the invention of claim 1, a representative image can be extracted from a moving image with high accuracy. This makes it possible, for example, to summarize information about the content of a huge volume of video (moving images), in particular information about the persons shown, using representative still images. Intuitive and simple browsing of moving images can therefore be realized.

The invention described in claim 2 is characterized in that the representative image determination unit comprises: a representative image search unit that searches, under a predetermined condition, for a representative image based on specific personal identification information from the results of collating the representative candidate images with the plurality of face templates; and a representative image output unit that extracts the representative image from the representative candidate images based on the image frame obtained by the representative image search unit or the appearance time information of that image frame.

According to the invention of claim 2, using the collation results of face recognition in the processing for determining the representative image simplifies the configuration of the entire apparatus and allows the representative image to be extracted with high accuracy. Robustness to varying, unknown input data can also be improved.

The invention described in claim 3 is characterized in that, for each image frame included in the representative candidate images, the representative image search unit searches for the image frame or the appearance time information based on the difference in score value between the matching score of one specific face template among the plurality of face templates and the matching scores of the face images other than that specific face template.

According to the invention of claim 3, a highly accurate representative image can be extracted quickly based on the difference in score values.

The invention described in claim 4 is characterized in that the representative image search unit searches for the image frame or the appearance time information for which the difference in score values is the maximum, or is equal to or greater than a preset threshold.

According to the invention of claim 4, the optimal representative image, or a plurality of optimal representative images, can be extracted according to the purpose of the representative image.

The invention described in claim 5 is characterized in that the representative image search unit normalizes the sum of the plurality of matching scores obtained for each face template to a constant value.

According to the invention of claim 5, when normalized values are used, the representative image can be extracted simply by looking at the maximum value of the matching score.

The invention described in claim 6 is a representative image extraction program for causing a computer to execute representative image extraction processing that extracts a representative image from an input moving image, the program causing the computer to execute: representative image deciding processing that decides representative candidate images from a predetermined section included in the moving image; face image recognition processing that recognizes a face image by collating the representative candidate images with a plurality of face templates, each having at least a preset face image and personal identification information corresponding to that face image; and representative image determination processing that searches for a representative image based on the collation result corresponding to the personal identification information obtained by the face image recognition processing, and outputs the representative image found.

According to the invention of claim 6, a representative image can be extracted from a moving image with high accuracy. In addition, by creating an executable program that extracts the representative image according to the present invention, representative image extraction can be realized at low cost without a special apparatus configuration. The processing can also be realized easily by installing the program.

According to the present invention, a representative image can be extracted from a moving image with high accuracy.

<Outline of the present invention>
The present invention also uses the parameters of the face recognition processing when determining the representative image, so that the evaluation criterion for the representative image is directly related to how favorable the conditions are for face recognition. This allows a representative image to be extracted from a moving image with high accuracy. In addition, sharing the representative image determination processing with the internal processing of face recognition simplifies the configuration of the entire apparatus.

<Embodiment>
A preferred embodiment of the representative image extraction apparatus and representative image extraction program of the present invention, which have the features described above, is described in detail below with reference to the drawings.

FIG. 1 shows a configuration example of the representative image extraction apparatus of the present invention. The representative image extraction apparatus 1 shown in FIG. 1 includes a representative candidate image determination unit 10, a face image recognition unit 20, and a representative image determination unit 30.

The representative candidate image determination unit 10 includes a representative candidate section determination unit 11 and a representative candidate image storage unit 12. The face image recognition unit 20 includes a face template storage unit 21 and a face template matching unit 22. The representative image determination unit 30 includes a representative image search unit 31 and a representative image output unit 32.

The representative candidate section determination unit 11 outputs, from the input image (moving image), a pair of a start point and an end point that defines the section of representative candidate images. It also outputs the representative candidate images delimited by that start point and end point (the sequence of consecutive still images from the start point to the end point) to the representative candidate image storage unit 12.

The representative candidate image storage unit 12 stores the consecutive still images in the representative candidate section determined by the representative candidate section determination unit 11, together with their time information in the video. It also outputs the stored representative candidate images to the face image recognition unit 20 and the representative image determination unit 30.

The face template storage unit 21 of the face image recognition unit 20 receives and stores preset face templates. That is, it stores face templates consisting of the information needed to identify each individual's face (a face image, personal identification information corresponding to the face image, and so on). It also outputs the stored face templates to the face template matching unit 22.

The face template matching unit 22 determines a representative face template from the representative candidate images obtained from the representative candidate image determination unit 10 and the plurality of face templates obtained from the face template storage unit 21. It also outputs two items to the representative image determination unit 30: a representative face identification number, which is the personal identification information corresponding to the representative face template, and the history of matching scores for all face templates (the matching score table), where a matching score is the similarity between a representative candidate image and a face template.

The representative image search unit 31 of the representative image determination unit 30 receives the representative face identification number and the matching score history from the face image recognition unit 20. Based on this information, it searches the representative candidate section for the image frame, or the time information of the image frame in the video, at which the difference between the matching score of the representative face template and all the matching scores of the other face templates is largest.
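The search described above can be illustrated with a minimal Python sketch. It assumes the matching score history is held as a mapping from template ID to a list of per-frame scores, and it reads "the difference from all the other templates' scores" as the margin over the strongest competing template at each frame. The data layout, that reading, and the function name `find_representative_frame` are assumptions made for illustration and are not taken from the disclosure.

```python
def find_representative_frame(score_table, representative_id):
    """Return (frame_index, margin) for the frame with the largest score margin.

    score_table: dict mapping template ID -> list of per-frame matching scores
                 (hypothetical layout of the matching score history).
    representative_id: ID of the representative face template.
    """
    rep_scores = score_table[representative_id]
    best_frame, best_margin = None, float("-inf")
    for frame, rep_score in enumerate(rep_scores):
        # Strongest competitor among all other templates at this frame
        # (assumed interpretation of "difference from all other scores").
        rival = max(
            (scores[frame] for tid, scores in score_table.items()
             if tid != representative_id),
            default=0.0,
        )
        margin = rep_score - rival
        if margin > best_margin:
            best_frame, best_margin = frame, margin
    return best_frame, best_margin


if __name__ == "__main__":
    table = {1: [0.2, 0.3, 0.1], 2: [0.5, 0.9, 0.6], 3: [0.4, 0.2, 0.5]}
    print(find_representative_frame(table, representative_id=2))
```

Using the margin over the best rival means a frame is preferred only when the representative template clearly beats every other template, which matches the stated aim of choosing a frame that favors personal identification.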

The representative image search unit 31 can also search for the representative image after normalizing to a constant value the sum of the plurality of matching scores obtained for the plurality of face templates registered in, for example, the face template matching unit 22. When it normalizes the sum of the matching scores in this way, it searches the representative candidate section for the time at which the matching score of the representative face template is largest (the representative image appearance time). With normalized values, the representative image can thus be extracted simply by looking at the maximum matching score.
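The normalization variant might look like the following Python sketch. It assumes that, at each frame, the scores of all templates are rescaled so that they sum to a constant (1.0 here); under that assumption a high normalized score for the representative template already implies low scores for the others, so a plain maximum suffices. The per-frame reading of the normalization and the names `normalize_scores` and `best_frame_after_normalization` are assumptions for illustration.

```python
def normalize_scores(score_table, total=1.0):
    """Rescale scores so that, at every frame, all templates sum to `total`."""
    n_frames = len(next(iter(score_table.values())))
    normalized = {tid: [0.0] * n_frames for tid in score_table}
    for frame in range(n_frames):
        frame_sum = sum(scores[frame] for scores in score_table.values())
        if frame_sum == 0:
            continue  # no template matched at all; leave zeros
        for tid, scores in score_table.items():
            normalized[tid][frame] = total * scores[frame] / frame_sum
    return normalized


def best_frame_after_normalization(score_table, representative_id):
    """Return the frame index where the normalized representative score peaks."""
    rep = normalize_scores(score_table)[representative_id]
    return max(range(len(rep)), key=rep.__getitem__)


if __name__ == "__main__":
    table = {1: [1.0, 1.0], 2: [3.0, 1.0]}
    print(best_frame_after_normalization(table, representative_id=2))
```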

The representative image search unit 31 outputs the time found (the representative image appearance time) to the representative image output unit 32. The search method above is an example that finds a single still image representing the content of a given sequence of moving images, but the present invention is not limited to this. For example, using a preset threshold as the criterion, a plurality of image frames whose score difference is equal to or greater than the threshold, or the times corresponding to those frames, may be output to the representative image output unit 32. This makes it possible to extract the optimal representative image, or a plurality of optimal representative images, according to the purpose of the representative image.
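The threshold-based variant can be sketched in a few lines of Python. The sketch assumes the per-frame score differences have already been computed and are paired with frame times; the function name `frames_above_threshold` and the argument layout are hypothetical.

```python
def frames_above_threshold(times, margins, threshold):
    """Return (time, margin) pairs whose score difference meets the threshold.

    times:   per-frame time stamps within the candidate section
    margins: per-frame score differences (representative minus others)
    """
    return [(t, m) for t, m in zip(times, margins) if m >= threshold]


if __name__ == "__main__":
    print(frames_above_threshold([0.0, 0.5, 1.0], [0.1, 0.6, 0.4], threshold=0.4))
```

Returning every qualifying frame, rather than only the maximum, corresponds to the case in which several representative images are wanted.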

The representative image output unit 32 retrieves, from the representative candidate images obtained from the representative candidate image determination unit 10, the image (representative image) corresponding to the representative image appearance time obtained from the representative image search unit 31, and outputs the retrieved representative image.

With the representative image extraction apparatus 1 described above, a representative image can be extracted from a moving image with high accuracy. Specifically, for the representative candidate images supplied by the representative candidate image determination unit 10, the history of matching scores calculated by the face template matching unit 22 for each face template is output to the representative image determination unit 30, and that information is used in the processing of the representative image search unit 31. This simplifies the entire apparatus to the minimum necessary configuration. Robustness to varying, unknown input data can also be improved. In other words, by using the history of the internal parameters of face image processing in deformable template matching, face image recognition can be realized with high accuracy.

Next, the specific processing performed by the representative candidate image determination unit 10, the face image recognition unit 20, and the representative image determination unit 30 is described.

<Representative candidate image determination unit 10>
The representative candidate image determination unit 10 estimates, from the input consecutive still images or moving image data, a section in which the same person is likely to appear, based on changes in texture, color, and so on, and outputs the start point and end point of that section. In particular, when the section of representative candidate images may be of fixed length, only the start point need be output, and the end point may be taken as the start time plus a fixed-length delay.
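The fixed-length case can be expressed as a small Python helper. The function name `candidate_section`, the use of seconds as the time unit, and the default delay of 2.0 are assumptions chosen only for illustration.

```python
def candidate_section(start_time, end_time=None, fixed_length=2.0):
    """Return the (start, end) pair for a representative candidate section.

    If no end point is supplied, the end is the start time plus a
    fixed-length delay (hypothetical default: 2.0 time units).
    """
    if end_time is None:
        end_time = start_time + fixed_length
    if end_time < start_time:
        raise ValueError("end point must not precede start point")
    return (start_time, end_time)


if __name__ == "__main__":
    print(candidate_section(10.0))        # end derived from the fixed delay
    print(candidate_section(10.0, 11.5))  # explicit end point
```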

The representative candidate image determination unit 10 required in this embodiment is not limited to any particular method, as long as it can estimate the section of the given images in which the same person appears. For example, as a method that estimates the start point and end point by finding the times at which the composition of the moving image changes, using color and texture information and audio volume information, the method described in Japanese Patent Application No. 2004-149972, "Summary program generation device and summary program generation program", filed by the present applicant, can be used.

That method generates a highly accurate summary program by efficiently extracting images and audio that make the content of the program easy to understand. Specifically, for example, drama video and audio are loaded into a computer, images are extracted based on the points where the video switches (cut points), the onset of spoken lines, and so on, and data consisting of the extracted images, frame information, and the like (image data structures) is accumulated. Next, the user deletes or modifies the recorded image data structures on a GUI (Graphical User Interface), and the image data structures matching the conditions set by the user are extracted.

The representative candidate images obtained in this way by the representative candidate section determination unit 11 are output to the representative candidate image storage unit 12. As a method other than the above, the start point and end point can be determined using, for example, the content description (metadata) of the broadcast video, which allows the representative candidate section to be extracted efficiently and with high accuracy.

<Face image recognition unit 20>
The face image recognition unit 20 calculates, for each item of pre-registered face image data (face template), the similarity (matching score) to the face image in the representative candidate images. The specific method for calculating the matching score is not particularly limited, as long as the score reflects how the state of the target image (face image) affects the difficulty of personal identification. For example, a method can be applied that calculates the matching score from spatial frequency components computed at feature points, such as characteristic parts of the facial organs (corners of the eyes, tip of the nose, etc.), and from the distortion energy of the graph connecting the feature points. This method is described in Japanese Patent Application No. 2003-388203, "Face image recognition device and face image recognition program", filed by the present applicant.

That method photographs a plurality of sample images showing the deformations expected of the object (face) in face image recognition, and extracts and uses statistical information (a variance-covariance matrix) on the feature points plotted on those sample images. That is, because facial expressions move the feature points in directions and to positions that are correlated to some extent, face image recognition is performed by applying this movement information to the feature points of the image in which a person is to be identified.

The sample images only need to cover the main deformation patterns that the object (face) can take, so it is not necessary to photograph the deformations of the registered object itself (for example, the registered person in a face-image personal authentication system). In other words, the images with different expressions given as deformation samples need not show the registered person. The method is applicable as long as the facial deformation patterns are similar, or the expressions in the sample images cover the expressions that the registered person can take.

Specifically, a local similarity (the similarity of spatial frequency components) is calculated at each of the feature points, only the top few values are summed, and that sum is defined as the matching score between each face template and the input face image. As a result, the loss of feature points with high local similarity caused by face orientation or occlusion, that is, the state of the face that affects the difficulty of personal identification, is reflected in the magnitude of the score.
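The top-few summation can be sketched in Python as follows. The sketch assumes the local similarities are already available as a list, with `None` standing in for a feature point that is occluded or otherwise not detected; the value of `top_k`, the `None` convention, and the function name `matching_score` are assumptions for illustration, and the spatial-frequency comparison itself is not modeled.

```python
import heapq


def matching_score(local_similarities, top_k=3):
    """Sum the top_k local similarities between a template and an input face.

    local_similarities: one value per feature point; None marks a feature
    point that is missing (hypothetical convention for occlusion).
    """
    visible = [s for s in local_similarities if s is not None]
    # With fewer than top_k visible points, the sum is over what remains,
    # so occlusion or a turned face lowers the score.
    return sum(heapq.nlargest(top_k, visible))


if __name__ == "__main__":
    print(matching_score([0.9, 0.2, None, 0.7, 0.8]))
    print(matching_score([0.9, None, None]))
```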

When the given images are consecutive still images, the matching score is integrated (accumulated) over all the given images for each of the face templates, and the face template that gives the maximum value is output as the representative face template.
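A minimal Python sketch of this selection step is shown below, assuming the same hypothetical table layout (template ID mapped to a list of per-frame scores) and a plain sum as the integration; the function name `representative_template` is illustrative.

```python
def representative_template(score_table):
    """Return (template_id, totals): the template with the largest summed score.

    score_table: dict mapping template ID -> list of per-frame matching scores.
    """
    totals = {tid: sum(scores) for tid, scores in score_table.items()}
    rep_id = max(totals, key=totals.get)
    return rep_id, totals


if __name__ == "__main__":
    table = {1: [0.2, 0.3], 2: [0.5, 0.9]}
    print(representative_template(table)[0])
```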

The face image recognition unit 20 of this embodiment is not limited to any particular method, as long as it can calculate a matching score for each face template and output, from the calculated results, the template and matching scores for the representative face image.

<Specific example of face template matching unit 22>
A specific example of the face template matching unit 22 in this embodiment is described with reference to the drawings. FIG. 2 shows an example of a registered face image used for template matching. FIG. 3 illustrates an example of how face recognition proceeds.

The face template storage unit 21 holds many template images (registered face images) such as that of the person 40 (ID = 2) shown in FIG. 2. For the person 40 shown in FIG. 2, information indicating who the person is, the angle in the image, the positional relationship of the feature points, and so on is stored in advance in association with the image.

As shown in FIG. 2, the face template matching unit 22 performs recognition by matching against the face templates in order to identify, for example, who the person 40 shown in image frames 0 to 20 [frame] stored by the representative candidate image storage unit 12 is.

When such processing is performed, as described above, matching is carried out based on the orientation of the face and the positional relationship of the points, obtained from the positions of, for example, up to nine preset feature points 42 (such as the eyes, the outer corners of the eyes, and the tip of the nose) in the face image within the face area 41 of the image. Feature points other than these nine, such as the end points of the mouth, may also be used. Furthermore, a predetermined number of the nine feature points that are visible in the image (for example, three or five) may be selected in descending order of their degree of matching with the face template, and the face image may be recognized based on the selected feature points.

Here, assuming that a face image such as that shown in FIG. 2 is stored as a face template in the face template storage means 21, the face template matching means 22 matches the template against each frame and creates a matching score for each frame image. The face template matching means 22 performs the same processing for the face images stored in the face template storage means 21 other than the image shown in FIG. 2, and outputs the resulting matching score table, together with the representative face identification number (personal identification information), to the representative image search means 31.

FIG. 4 is a diagram showing an example of the matching score table. As shown in FIG. 4, the matching score table 50 lists, for each face image (personal identification information: ID) stored in the face template storage means 21, the matching score for every frame.

In the output data shown in FIG. 4, "frame" indicates an image frame included in the video, specifically the time information corresponding to that frame. "iteration" indicates the relative time from the start point of the representative candidate section, "track_no" indicates the number identifying each face area, "track->age" indicates the length of the history of the detected face area, and "track->cfgroup_no" indicates the serial number of the feature point arrangement pattern on the representative candidate image used for matching.

"pose" indicates the orientation of the face template (for example, 0 for a frontal face, +0 to +90 when facing right, and -0 to -90 when facing left), "cen_x" indicates the center position (X coordinate) of the face area, "cen_y" indicates the center position (Y coordinate) of the face area, and "rad" indicates the size of the face area. Furthermore, "max_evid" indicates the matching score required in the present embodiment, and "max_evid_id" indicates the personal identification number serving as personal identification information.

The information included in the matching score table 50 is not limited to the above. The matching score table 50 also does not need to contain all of the information described above; it is sufficient that it contains at least the personal identification information, the matching score, and the appearance time information for the frame.
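As an illustration of the minimum content named above, one row of the matching score table could be modelled as below. This is only a sketch under assumptions: the class name `ScoreRow`, the helper `scores_for_id`, and the row-per-(frame, ID) layout are hypothetical, while the field names are borrowed from the labels of FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScoreRow:
    """One row holding only the three items the text says are required."""
    frame: int        # appearance time information of the image frame
    max_evid_id: int  # personal identification number
    max_evid: float   # matching score


def scores_for_id(rows: List[ScoreRow], person_id: int) -> List[Tuple[int, float]]:
    """Collect (frame, score) pairs for one person, ordered by frame."""
    return sorted((r.frame, r.max_evid) for r in rows if r.max_evid_id == person_id)


# Hypothetical rows for two registered persons.
rows = [
    ScoreRow(frame=16, max_evid_id=2, max_evid=0.9),
    ScoreRow(frame=0, max_evid_id=2, max_evid=0.4),
    ScoreRow(frame=0, max_evid_id=5, max_evid=0.3),
]
print(scores_for_id(rows, 2))
```

Optional fields such as "pose" or "rad" could be added to the same record without changing the lookup.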

<Representative Image Determination Means 30>
The representative image determination means 30 searches for the optimum representative image by comparing, for each personal identification number, the matching scores in the matching score table shown in FIG. 4.

FIG. 5 is a diagram showing an example of the result of the representative image search in the present embodiment. In the representative image determination means 30, the representative image search means 31 uses, as one method of searching for the representative image, a comparison between the matching score of one registered face image (ID) registered as a face template and the average of the matching scores of all the other registered images (IDs), and extracts the image frame at which the difference is largest as the representative image. Searching for the representative image based on the matching score values in this way allows a highly accurate representative image to be extracted quickly.

For example, when the registered face image with ID = 2 shown in FIG. 2 is compared with the other registered face image IDs, the difference in matching score becomes largest at the 16th frame from the preset start point, as shown in FIG. 5. In other words, the 16th frame is highly likely to show the person whose registered face image ID is 2.
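The search described above (target score minus the average score of all other registered images, maximized over frames) can be sketched as follows. The function name `best_frame`, the dictionary layout of the scores, and the sample values are assumptions made for illustration; they are not taken from FIG. 4 or FIG. 5.

```python
from typing import Dict, List, Tuple


def best_frame(scores: Dict[int, List[float]], target_id: int) -> Tuple[int, float]:
    """Return (frame index, difference) where the target template stands out most.

    For each frame, the difference is the target's matching score minus the
    mean matching score of every other template in that frame.
    """
    others = [s for template_id, s in scores.items() if template_id != target_id]
    if not others:
        raise ValueError("at least one other template is needed for comparison")
    target = scores[target_id]
    diffs = [
        t - sum(column) / len(column)      # target score minus mean of the rest
        for t, column in zip(target, zip(*others))
    ]
    frame = max(range(len(diffs)), key=diffs.__getitem__)
    return frame, diffs[frame]


# Hypothetical per-frame scores for three registered IDs.
scores = {
    1: [0.2, 0.3, 0.1],
    2: [0.5, 0.4, 0.9],
    3: [0.4, 0.3, 0.1],
}
frame, diff = best_frame(scores, target_id=2)
print(frame, diff)
```

In this toy data the third frame (index 2) gives the largest difference, which plays the role of the 16th frame in the example of FIG. 5.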

The representative image search means 31 therefore outputs the information "frame 16" to the representative image output means 32 as the appearance time information of the representative image. When the corresponding image frame is available, the representative image search means 31 may output that image frame directly.

The representative image output means 32 extracts the image of the 16th frame from the representative candidate images stored in the representative candidate image storage means 12 and, as in the example of the representative image output result shown in FIG. 6, outputs the 16th frame as the representative image for the registered face image with ID = 2. In this way, automatic extraction of an image (representative image) from a moving image can be achieved with high accuracy.

The present invention is not limited to the representative image search method described above. The description above shows a search method that uses one specific registered face image and all the other registered face images, but the search may instead use only the other registered face images ranked highest by matching score, up to a given rank (for example, the top five).

The representative image search means 31 described above is shown extracting the single representative image with the largest score difference, but the present invention is not limited to this. For example, in a matching score comparison such as that shown in FIG. 5, a threshold may be set in advance, and all image frames or time information for which the score difference is equal to or greater than the threshold may be extracted.
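The two variations just described (comparing against only the highest-scoring other templates, and keeping every frame whose difference reaches a threshold) might look like the sketch below. The names `score_differences` and `frames_above`, the parameter `top_k`, and the choice to rank the other templates separately in each frame are assumptions; the text does not specify how the ranking is performed.

```python
from typing import Dict, List, Optional


def score_differences(scores: Dict[int, List[float]], target_id: int,
                      top_k: Optional[int] = None) -> List[float]:
    """Per-frame difference between the target score and the mean rival score.

    If `top_k` is given, only the `top_k` highest rival scores in each frame
    are averaged (assumed interpretation of "top five" in the text).
    """
    diffs = []
    for f, target_score in enumerate(scores[target_id]):
        rivals = sorted((s[f] for tid, s in scores.items() if tid != target_id),
                        reverse=True)
        if top_k is not None:
            rivals = rivals[:top_k]
        if not rivals:
            raise ValueError("at least one other template is needed")
        diffs.append(target_score - sum(rivals) / len(rivals))
    return diffs


def frames_above(diffs: List[float], threshold: float) -> List[int]:
    """Return every frame index whose difference is at least the threshold."""
    return [f for f, d in enumerate(diffs) if d >= threshold]


# Hypothetical scores for four registered IDs over three frames.
scores = {
    1: [0.2, 0.3, 0.1],
    2: [0.5, 0.4, 0.9],
    3: [0.4, 0.3, 0.1],
    4: [0.0, 0.1, 0.1],
}
top1 = frames_above(score_differences(scores, 2, top_k=1), threshold=0.5)
all_rivals = frames_above(score_differences(scores, 2), threshold=0.25)
print(top1, all_rivals)
```

Restricting the comparison to the strongest rivals makes the criterion stricter, since low-scoring templates no longer pull the rival average down.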

By using the representative image extraction apparatus 1 described above, a representative image can be extracted from a moving image with high accuracy. As a result, information about the content of a vast amount of video (moving images), in particular information about the persons shown, can be summarized using representative still images, which enables intuitive and simple browsing of moving images.
Furthermore, by using parameters that reflect the face image recognition processing to determine the representative image, extraction of a representative image that is robust against variations in the input image can be achieved.

The representative image extraction apparatus 1 described above can extract representative images using a dedicated apparatus configuration. Alternatively, the representative image extraction processing of the present invention can be realized by generating an execution program that causes a computer to execute the processing and installing the program on, for example, a general-purpose personal computer or workstation.

<Hardware Configuration>
Here, an example of the hardware configuration of a computer capable of executing the representative image extraction processing of the present invention will be described with reference to the drawings. FIG. 7 is a diagram showing an example of a hardware configuration capable of realizing the representative image extraction processing of the present invention.

The computer main body in FIG. 7 includes an input device 61, an output device 62, a drive device 63, an auxiliary storage device 64, a memory device 65, a CPU (Central Processing Unit) 66 that performs various kinds of control, and a network connection device 67, which are connected to one another by a system bus B.

The input device 61 has a keyboard and a pointing device such as a mouse operated by the user, and receives various operation signals from the user, such as instructions to execute a program. The output device 62 has a display that shows the various windows, data, and the like needed to operate the computer main body that performs the processing of the present invention, and can display the progress and results of program execution under the control program of the CPU 66.

In the present invention, the execution program installed on the computer main body is provided by, for example, a recording medium 68 such as a CD-ROM. The recording medium 68 on which the program is recorded can be set in the drive device 63, and the execution program contained in the recording medium 68 is installed from the recording medium 68 into the auxiliary storage device 64 via the drive device 63.

The auxiliary storage device 64 is a storage means such as a hard disk. It stores the execution program of the present invention, the control programs provided in the computer, and the like, and can input and output them as necessary.

Based on a control program such as an OS (Operating System) and on the execution program read out and stored in the memory device 65, the CPU 66 controls the processing of the entire computer, including various computations and the input and output of data to and from each hardware component, and thereby realizes each process of the representative image extraction. Various data required during execution of the program can be obtained from the auxiliary storage device 64 and can also be stored there.

By connecting to a communication network or the like, the network connection device 67 can obtain the execution program from another terminal connected to the communication network, and can provide the execution results obtained by running the program, or the execution program of the present invention itself, to other terminals.

With the hardware configuration described above, the representative image extraction processing can be realized at low cost without requiring a special apparatus configuration. In addition, the representative image extraction processing can be realized easily by installing the program.

<Representative Image Extraction Processing Procedure>
Next, the processing procedure of the execution program will be described using a flowchart. FIG. 8 is a flowchart showing an example of the representative image extraction processing procedure of the present invention. First, a moving image from which a representative image is to be extracted is input (S01), and a representative candidate section from which representative candidate images are to be extracted is determined from the input moving image (S02). As described above, the representative candidate section is output as a pair consisting of the start point and the end point of the section, determined from the start and end of audio in the moving image, scene changes, and the like. In S02, the continuous still images within the representative candidate section are also extracted as representative candidate images, each associated with its time information in the moving image.

Next, the representative candidate images (group of continuous still images) obtained in S02 are matched against the face images of the face templates registered in advance (S03). In S03, as described above, the representative face template is determined from the representative candidate images and the face templates, and the personal identification information indicating the representative face template and the matching score table are generated.

Next, based on the personal identification information and the matching score table, a search is made for the image frame or time that maximizes the difference between the matching score of the representative face template and the matching scores of the other face templates, or of predetermined face templates (S04), and the image corresponding to that image frame or time is output as the representative image (S05).

It is then determined whether any unprocessed moving image remains (S06). If an unprocessed moving image remains (YES in S06), the procedure returns to S01, the unprocessed moving image is input, and the subsequent processing is performed. If no unprocessed moving image remains (NO in S06), the representative image extraction processing ends.
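The flow of steps S01 to S06 can be summarized by the control-flow sketch below. It is deliberately skeletal: the three callables passed in (`decide_candidates`, `match_templates`, `search_frame`) are hypothetical stand-ins for the processing of S02, S03, and S04, and the stub implementations exist only so the loop can be run; none of them reproduces the actual recognition processing.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ScoreTable = Dict[int, List[float]]  # template ID -> per-frame matching scores


def extract_representatives(
    videos: Sequence[Sequence[str]],
    decide_candidates: Callable[[Sequence[str]], List[str]],
    match_templates: Callable[[List[str]], Tuple[int, ScoreTable]],
    search_frame: Callable[[int, ScoreTable], int],
) -> List[str]:
    """Run S01-S06 for every input video and collect one representative each."""
    representatives = []
    for video in videos:                              # S01; loop end acts as S06
        candidates = decide_candidates(video)         # S02: candidate images
        rep_id, table = match_templates(candidates)   # S03: template matching
        frame = search_frame(rep_id, table)           # S04: search for best frame
        representatives.append(candidates[frame])     # S05: output the image
    return representatives


# Stub steps, for demonstration only.
def all_frames(video: Sequence[str]) -> List[str]:
    return list(video)


def fake_match(candidates: List[str]) -> Tuple[int, ScoreTable]:
    return 2, {2: [float(i) for i in range(len(candidates))]}


def argmax_frame(rep_id: int, table: ScoreTable) -> int:
    row = table[rep_id]
    return max(range(len(row)), key=row.__getitem__)


result = extract_representatives(
    [["a0", "a1", "a2"], ["b0", "b1"]], all_frames, fake_match, argmax_frame
)
print(result)
```

Passing the steps in as callables keeps the loop independent of how each step is realized, which mirrors the statement that the recognition technique itself is not restricted.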

As described above, the representative image extraction processing makes it possible to extract a representative image from a moving image with high accuracy. In addition, by generating an execution program that extracts representative images according to the present invention, the representative image extraction processing can be realized at low cost without requiring a special apparatus configuration, and it can be realized easily by installing the program.

As described above, according to the present invention, a representative image can be extracted from a moving image with high accuracy. For example, information about the content of a vast amount of video (moving images), in particular information about the persons shown, can be summarized using representative still images. Intuitive and simple browsing of moving images can therefore be realized.

In addition, by using the matching results of face recognition in the processing for representative image determination, the configuration of the entire apparatus can be simplified and the representative image can be extracted with high accuracy. Robustness against varying, unknown input data can also be improved.

In the embodiment described above, extraction of a face image has been described as an example of representative image extraction, but the present invention is not limited to this. For example, for sign language video, representative images can be extracted using the end points of the fingers as feature points. For animals, plants, buildings, and the like as well, representative images other than faces and persons can be extracted with high accuracy by applying the processing described above with predetermined parts as feature points.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

FIG. 1 is a diagram showing a configuration example of the representative image extraction apparatus of the present invention.
FIG. 2 is a diagram showing an example of a registered face image used for template matching.
FIG. 3 is a diagram showing an example for explaining how face recognition is performed.
FIG. 4 is a diagram showing an example of the matching score table.
FIG. 5 is a diagram showing an example of the result of the representative image search in the present embodiment.
FIG. 6 is a diagram showing an example of the output result of a representative image.
FIG. 7 is a diagram showing an example of a hardware configuration capable of realizing the representative image extraction processing of the present invention.
FIG. 8 is a flowchart showing an example of the representative image extraction processing procedure of the present invention.

Explanation of symbols

1 Representative image extraction apparatus
10 Representative candidate image determination means
11 Representative candidate section determination means
12 Representative candidate image storage means
20 Face image recognition means
21 Face template storage means
22 Face template matching means
30 Representative image determination means
31 Representative image search means
32 Representative image output means
40 Person
41 Face area
42 Feature point
50 Matching score table
61 Input device
62 Output device
63 Drive device
64 Auxiliary storage device
65 Memory device
66 CPU
67 Network connection device
68 Recording medium

Claims

1. A representative image extraction apparatus that extracts a representative image from an input moving image, comprising:
representative image determining means for determining a representative candidate image from a predetermined section included in the moving image;
face image recognition means for recognizing a face image by matching the representative candidate image against a plurality of face templates each having at least a preset face image and personal identification information corresponding to the face image; and
representative image determination means for searching for a representative image based on a matching result corresponding to the personal identification information obtained by the face image recognition means, and outputting the representative image found by the search.

2. The representative image extraction apparatus according to claim 1, wherein the representative image determination means includes:
representative image search means for searching, based on a predetermined condition, for a representative image based on specific personal identification information from the results of matching the representative candidate image against the plurality of face templates; and
representative image output means for extracting a representative image from the representative candidate image based on the image frame obtained by the representative image search means or the appearance time information of the image frame.

3. The representative image extraction apparatus according to claim 2, wherein the representative image search means
searches for the image frame or the appearance time information based on, for each image frame included in the representative candidate image, the difference in score value between the matching score for a specific face template among the plurality of face templates and the matching scores for face images other than the specific face template.

4. The representative image extraction apparatus according to claim 3, wherein the representative image search means
searches for the image frame or the appearance time information for which the difference in score values is maximum or is equal to or greater than a preset threshold.

5. The representative image extraction apparatus according to claim 3 or 4, wherein the representative image search means
normalizes the plurality of matching scores obtained for the respective face templates so that their sum is a constant value.

6. A representative image extraction program for causing a computer to execute representative image extraction processing that extracts a representative image from an input moving image, the program causing the computer to execute:
representative image determining processing for determining a representative candidate image from a predetermined section included in the moving image;
face image recognition processing for recognizing a face image by matching the representative candidate image against a plurality of face templates each having at least a preset face image and personal identification information corresponding to the face image; and
representative image determination processing for searching for a representative image based on a matching result corresponding to the personal identification information obtained by the face image recognition processing, and outputting the representative image found by the search.