JP4999589B2

JP4999589B2 - Image processing apparatus and method

Info

Publication number: JP4999589B2
Application number: JP2007193680A
Authority: JP
Inventors: 俊司藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2007-07-25
Filing date: 2007-07-25
Publication date: 2012-08-15
Anticipated expiration: 2027-07-25
Also published as: JP2009033351A

Description

The present invention relates to an image processing apparatus and method.

In recent years, video recording/reproducing apparatuses that record digital television broadcast programs, such as BS digital broadcasts or terrestrial digital broadcasts, on a disk medium such as a hard disk or an optical disk, and reproduce them from that medium, have become widespread. In addition, network-capable video recording/reproducing apparatuses that can transmit and receive moving image data to and from personal computers and digital home appliances on a LAN (Local Area Network) have been commercialized.

DLNA (Digital Living Network Alliance) is a framework aimed at sharing moving image data on a home network, and the number of products that comply with the implementation guidelines established by DLNA is increasing.

In recent years, as the capacity of hard disks has increased, it has become possible to record a large amount of television broadcast programs and moving image data. For example, a hard disk with a capacity of 250 GB (gigabytes) can record 100 hours or more of television broadcasts in the standard image quality mode and 200 hours or more in the long-time mode. As a result, the number of programs that can be stored in the apparatus becomes enormous, from several tens to more than a hundred. Patent Document 1 describes a technique for easily searching for a desired program among such a large number of recorded programs.
Patent Document 1: JP 2002-359803 A

Conventional search techniques search for a desired program based on character information, and they remain cumbersome. In addition, they cannot be applied to moving image data that a user has captured with a digital camera or a digital video camera unless the user enters character information for that data.

Therefore, there is still the problem that it takes a great deal of time and effort to find the content that the user wants to watch among a large amount of moving image data recorded on the hard disk.

An object of the present invention is to provide an image processing apparatus and method capable of quickly retrieving a desired moving image from a large number of moving images.

An image processing apparatus according to the present invention comprises: recording means for recording a moving image on a recording medium; means for detecting appearance periods of a plurality of specific objects included in the moving image and generating object-related information on the appearance period of each of the plurality of specific objects; acquisition means for acquiring an image; detection means for detecting a specific object from the image acquired by the acquisition means; and playlist generation means for generating, based on the object-related information and the detection result of the detection means, a playlist for selecting and reproducing, from the moving image recorded on the recording medium, the appearance period of the specific object detected by the detection means. When the detection means detects, from a one-screen image acquired by the acquisition means, a first specific object and a second specific object among the plurality of specific objects included in the moving image, the playlist generation means generates a first playlist for selecting and reproducing the period in which the first specific object and the second specific object appear together, a second playlist for selecting and reproducing the appearance period of the first specific object, and a third playlist for selecting and reproducing the appearance period of the second specific object.

An image processing method according to the present invention is a method for processing a moving image recorded on a recording medium, and comprises: a step of detecting appearance periods of a plurality of specific objects included in the moving image and generating object-related information on the appearance period of each of the plurality of specific objects; an acquisition step of acquiring an image; a detection step of detecting a specific object from the acquired image; and a step of generating, based on the object-related information and the detection result of the detection step, a playlist for selecting and reproducing, from the moving image recorded on the recording medium, the appearance period of the specific object detected in the detection step. When the detection step detects, from the acquired one-screen image, a first specific object and a second specific object among the plurality of specific objects included in the moving image, the step of generating the playlist generates a first playlist for selecting and reproducing the period in which the first specific object and the second specific object appear together, a second playlist for selecting and reproducing the appearance period of the first specific object, and a third playlist for selecting and reproducing the appearance period of the second specific object.

According to the present invention, even when a large number of moving images are stored on a recording medium, a moving image showing a specific object, for example a specific person, can be quickly offered to the user as a reproduction candidate. This allows the user to quickly reproduce content that the user is likely to want to view, and reduces the time and effort needed to find a desired moving image among a large number of moving images.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a schematic block diagram of an embodiment of a recording/reproducing apparatus according to the present invention, and FIG. 2 shows an example of its connection with peripheral devices.

A camera 12 serving as imaging means is connected to the recording/reproducing apparatus 10, which is one embodiment of the present invention, via a USB cable or an IEEE 1394 cable. In this embodiment, the camera 12 is a video camera that outputs a moving image, but it may be a digital still camera that outputs a one-shot captured image.

A video camera 16 is further connected to the recording/reproducing apparatus 10 via a LAN 14. In this embodiment, the moving image and audio output by the video camera 16 are supplied to the recording/reproducing apparatus 10 via the LAN 14 and recorded on the recording medium of the recording/reproducing apparatus 10. Unless specifically excluded, a moving image includes audio. The video/audio output of the recording/reproducing apparatus 10 is supplied to a video/audio monitor 18. The video/audio monitor 18 displays the reproduced video from the recording/reproducing apparatus 10 on the screen of a video display device 18a and outputs the reproduced audio from a speaker 18b.

The configuration and basic operation of the recording/reproducing apparatus 10 will now be described. A moving image from the video camera 16 is input to a communication processing device 20 of the recording/reproducing apparatus 10 via the LAN 14. The communication processing device 20 can receive data from the LAN 14 using a well-known protocol such as HTTP (Hyper Text Transfer Protocol) or FTP (File Transfer Protocol). It is assumed that the video camera 16 and the communication processing device 20 have each been assigned an appropriate IP address and that each knows the IP address of the other. For example, the functions of the well-known UPnP (Universal Plug and Play) standard are used for this purpose.

The recording/reproducing apparatus 10 includes a user interface (UI) 22 consisting of operation keys, operation buttons, a remote control device, or the like operated by the user. Using the user interface 22, the user can instruct the recording/reproducing apparatus 10 as to the operation mode and the start and end of recording and reproduction.

In the recording mode, the communication processing device 20 supplies the moving image from the video camera 16 to a recording processing device 24 and a face recognition processing device 26. When the video camera 16 compresses and encodes the moving image before outputting it to the LAN 14 for transmission, the recording/reproducing apparatus 10 obviously has a corresponding image decompression device (not shown).

In accordance with a recording start instruction, the recording processing device 24 records the moving image from the communication processing device 20 on a hard disk (HDD) 28, which is the recording medium. When a specific image compression method is used for recording moving images on the HDD 28, the recording processing device 24 includes an image compression device for that purpose.

The face recognition processing device 26 recognizes a person's face by image recognition in every frame image of the moving image from the communication processing device 20, or in frame images taken at a fixed frame period, and extracts a face feature amount (first object feature amount). This corresponds to extraction of a first object feature amount by a first object recognition process. For example, the face recognition processing device 26 recognizes faces at a period of 10 frames, or at a period of one fifth of the frame rate. The face recognition processing device 26 performs face recognition when the size of the face is equal to or larger than a predetermined size. Various known methods can be applied as the face recognition technique of the face recognition processing device 26. For example, the contour of the face may be detected by edge detection, and the positions of the eyes, nose, mouth, and so on may be extracted as feature amounts. The face recognition processing device 26 supplies the extracted face feature amount to a face discrimination processing device 30.

The face discrimination processing device 30 searches a face feature amount database 32 using the face feature amount from the face recognition processing device 26 as a key, and thereby discriminates the person. The face feature amount database 32 consists of records that associate a face feature amount with a person (in practice, with a face identifier that identifies the person). The face discrimination processing device 30 stores information indicating the person discriminated by the search of the face feature amount database 32, that is, a person identifier, in a management table 34. The face feature amount database 32 and the management table 34 may be stored on the HDD 28 or on a storage medium separate from the HDD 28.

Ultimately, the management table 34 stores, for each appearance of a person, information indicating which person is included in which period of the moving image that is supplied from the video camera 16 and recorded on the HDD 28. That is, the record in each row indicates the period during which the person identified by the face identifier in the face identifier field 42 appears in the moving image specified by the content name field 44. In this respect, the management table 34 is, so to speak, an object history table that stores the appearance history of the persons (objects) included in the moving images.

FIG. 3 shows an example of the structure of the management table 34. The management table 34 consists of a discrimination identifier field 40, a face identifier field 42, a content name field 44, a start time field 46, an end time field 48, and a time field 50. The management table 34 describes in which content, and in which period of that content, a specific object (in this embodiment, a person's face) appears in the moving images recorded on the HDD 28, and it corresponds to the object-related information in the claims. The function of the face discrimination processing device 30 that generates the management table 34 corresponds to the object-related information generation means in the claims.
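As an illustration only, one row of the management table 34 could be modeled as follows. This is a minimal sketch: the class name `AppearanceRecord`, the attribute names, the use of `timedelta` for time codes, and the identifier value "ID-001" are assumptions made for the example and do not appear in the specification. The time field 50 is computed from the other two time fields rather than stored.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class AppearanceRecord:
    """One row of the management table (fields 40 to 50 in FIG. 3)."""
    discrimination_id: str  # field 40: unique per appearance or reappearance (format assumed)
    face_id: str            # field 42: identifies the person
    content_name: str       # field 44: e.g. "MOVIE-1.MPG"
    start: timedelta        # field 46: time code at which the face appears
    end: timedelta          # field 48: time code at which it is no longer recognized

    @property
    def duration(self) -> timedelta:
        # field 50: end time minus start time
        return self.end - self.start


record = AppearanceRecord("ID-001", "F-01", "MOVIE-1.MPG",
                          timedelta(seconds=10), timedelta(seconds=45))
print(record.duration)  # 0:00:35
```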

The discrimination identifier in the discrimination identifier field 40 is a unique identifier generated by the face discrimination processing device 30 when a face is newly recognized in the face recognition process or when a face is recognized again. The face identifier in the face identifier field 42 is an identifier that uniquely specifies a face. When the face feature amount from the face recognition processing device 26 is already registered in the face feature amount database 32, the face discrimination processing device 30 reads the face identifier corresponding to that face feature amount from the face feature amount database 32 and stores it in the face identifier field 42.

The content name in the content name field 44 is a name that an application (not shown) assigns to the moving image data received from the video camera 16. That is, the face discrimination processing device 30 stores a content name, such as a file name, provided by the application (not shown) in the content name field 44 of the management table 34.

The start time field 46 stores the time code at which a face appears in a frame image for the first time or again, and the end time field 48 stores the time code at which that face is no longer recognized. The time code is the elapsed time measured from the first frame image of the moving image data, which is taken as 00:00:00, and its value increases in accordance with the frame rate of the moving image data. For example, when the frame rate is 30 frames/second and the face recognition process is performed every 6 frames, 5 frames are processed per second of video, so the time code increases by 1 second for every 5 processed frames. In other words, the person specified by the face identifier in the face identifier field 42 is included in the moving image being recorded from the start time in the start time field 46 until immediately before the end time in the end time field 48.
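The arithmetic in this example can be checked with a short calculation: at 30 frames/second with recognition performed every 6 frames, 30 / 6 = 5 frames are processed per second of video. The function name `time_code_seconds` and its parameters are hypothetical and are used here only to illustrate the relationship.

```python
def time_code_seconds(processed_index: int, frame_rate: int = 30, interval: int = 6) -> float:
    """Elapsed seconds at the n-th processed frame (0-based), where one
    frame out of every `interval` frames is passed to face recognition."""
    return processed_index * interval / frame_rate


print(time_code_seconds(5))   # 1.0
print(time_code_seconds(10))  # 2.0
```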

The time field 50 stores the length of the period during which the person specified by the face identifier field 42 appeared in the moving image, specifically, the result of subtracting the value of the start time field 46 from the value of the end time field 48.

The operation of recording a moving image from the video camera 16 on the HDD 28 has been described, but a one-shot image from the video camera 16, that is, a still image, can be recorded on the HDD 28 by the same process. Many recent video cameras can capture still images with a resolution high enough for them to serve as digital still cameras, so this still image capture function may be used.

The face recognition process and the face discrimination process have been described as being executed at the same time as the moving image is recorded, but they may instead be executed after recording of the moving image has finished, for example while the apparatus is in a standby state.

In the reproduction mode, the recording/reproducing apparatus 10 of this embodiment can reproduce any image (moving image or still image) recorded on the HDD 28. In particular, this embodiment can create a playlist of moving images in which a specific person appears and reproduce the images recorded on the HDD 28 in accordance with that playlist. That is, of the moving images recorded on the HDD 28, the portions designated in the playlist can be selected and reproduced. This operation mode is called the playlist creation mode.

In the playlist creation mode, the output image signal of the camera 12 is supplied to the face recognition processing device 26 via a digital interface 38. The digital interface 38 is, for example, an interface conforming to USB, IEEE 1394, or the like, or an image capture device that converts an analog image signal into a digital image signal. As in the recording mode, the face recognition processing device 26 recognizes the face of the person appearing in the image captured by the camera 12. When the camera 12 is a still camera, the face recognition processing device 26 recognizes the face from a one-shot captured image (still image) from the camera 12. The face recognition processing device 26 supplies the face feature amount (second object feature amount) extracted by face recognition to the face discrimination processing device 30. This corresponds to extraction of a second object feature amount by a second object recognition process.

The face discrimination processing device 30 searches the face feature amount database 32 using the face feature amount from the face recognition processing device 26 as a search key, and then searches the management table 34 using the face identifier of the person matched in that search as a search key. Finally, the face discrimination processing device 30 generates, from the management table 34, a playlist indicating the content in which the person captured by the camera 12 appears, and supplies the playlist to a reproduction processing device 36. That is, the face discrimination processing device 30 corresponds to the playlist generation means in the claims.

The reproduction processing device 36 refers to the playlist from the face discrimination processing device 30, notifies the user that the playlist exists, and reproduces the corresponding content from the HDD 28 either automatically or in accordance with the user's instruction. The reproduction processing device 36 supplies the reproduced video signal and the reproduced audio signal to the video/audio monitor 18.

FIG. 4 is a flowchart of the moving image recording and face recognition operations of this embodiment. An application (not shown) starts and begins transmitting moving image data from the video camera 16 to the recording/reproducing apparatus 10 using, for example, an HTTP POST request. The communication processing device 20 receives the moving image data from the video camera 16 (S1). At this time, the recording/reproducing apparatus 10 assigns a content name to the received data. In this embodiment, for example, moving image data is named MOVIE-X.MPG and still image data is named IMAGE-X.JPG, where X is a serial number.

As described above, the face recognition processing device 26 extracts frame images at predetermined intervals from the moving image data from the communication processing device 20 and performs the face recognition process (S2). When a face is recognized in the frame image (S3), the extracted face feature amount is supplied to the face discrimination processing device 30. When no face is recognized in the frame image (S3), the face recognition process ends.

The face discrimination processing device 30 checks whether the face feature amount from the face recognition processing device 26 is registered in the face feature amount database 32 (S4). When the extracted face feature amount is not registered in the face feature amount database 32 (S4), a new unique face identifier is generated and registered in the face feature amount database 32 together with the face feature amount (S5). When the extracted face feature amount is already registered in the face feature amount database 32 (S4), the face identifier corresponding to that face feature amount is acquired from the face feature amount database 32 (S6).
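Steps S4 to S6 amount to a look-up-or-register operation on the face feature amount database 32. The sketch below is illustrative only. The specification does not state how two feature amounts are compared, so the Euclidean distance with a fixed threshold, the class name `FaceFeatureDatabase`, and the numbering scheme used to form new identifiers such as "F-01" are all assumptions.

```python
import math


class FaceFeatureDatabase:
    """Maps face identifiers to feature vectors (hypothetical layout)."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold  # assumed matching tolerance
        self.records = {}           # face_id -> feature tuple

    def find(self, feature):
        """Return the closest registered face_id within the threshold, else None."""
        best_id, best_dist = None, self.threshold
        for face_id, stored in self.records.items():
            dist = math.dist(stored, feature)
            if dist <= best_dist:
                best_id, best_dist = face_id, dist
        return best_id

    def lookup_or_register(self, feature):
        face_id = self.find(feature)                   # S4: already registered?
        if face_id is None:                            # S5: register a new identifier
            face_id = f"F-{len(self.records) + 1:02d}"
            self.records[face_id] = tuple(feature)
        return face_id                                 # S6 when the face was found


db = FaceFeatureDatabase()
a = db.lookup_or_register((0.30, 0.50, 0.70))  # new face
b = db.lookup_or_register((0.31, 0.50, 0.70))  # close to the first face
c = db.lookup_or_register((0.90, 0.10, 0.20))  # a different face
print(a, b, c)  # F-01 F-01 F-02
```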

The generated face identifier (S5) or the acquired face identifier (S6) is recorded in the management table 34 (S7). When a face is newly recognized, a new record in which the discrimination identifier field 40, the face identifier field 42, the content name field 44, and the start time field 46 have been filled in is added to the management table 34. The time code at which the face is no longer recognized is later written into the end time field 48 of this new record, and the elapsed time from the start time to the end time is written into the time field 50. Each record in the management table 34 indicates a period during which the person identified by the face identifier in the face identifier field 42 appears in the moving image.

The above processing (S2 to S7) is repeated until reception of the moving image data is complete (S8). When reception of the moving image data is complete, the flow shown in FIG. 4 ends.
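The way rows are opened and closed during the S2 to S8 loop can be sketched as follows. This is an illustrative simplification: it assumes the recognition and discrimination steps have already reduced each processed frame to a set of face identifiers, it uses plain numbers of seconds as time codes and sequential integers as discrimination identifiers, and it closes any still-open row at the end of the content, which the specification does not spell out. The function name `build_management_table` is hypothetical.

```python
def build_management_table(content_name, samples, end_of_content):
    """samples: list of (time_code, set_of_face_ids), one per processed frame, in order."""
    table = []
    open_rows = {}  # face_id -> row whose end time is not yet known
    next_id = 1
    for time_code, faces in samples:
        # Close the row of every face that is no longer recognized.
        for face_id in list(open_rows):
            if face_id not in faces:
                row = open_rows.pop(face_id)
                row["end"] = time_code
                row["duration"] = time_code - row["start"]
        # Open a row for every face that appears for the first time or again.
        for face_id in sorted(faces):
            if face_id not in open_rows:
                row = {"discrimination_id": next_id, "face_id": face_id,
                       "content": content_name, "start": time_code,
                       "end": None, "duration": None}
                next_id += 1
                table.append(row)
                open_rows[face_id] = row
    # Assumption: faces still visible at the end are closed at the end of the content.
    for row in open_rows.values():
        row["end"] = end_of_content
        row["duration"] = end_of_content - row["start"]
    return table


samples = [(0, {"F-01"}), (1, {"F-01", "F-02"}), (2, {"F-02"}), (3, set()), (4, {"F-01"})]
table = build_management_table("MOVIE-1.MPG", samples, 5)
for row in table:
    print(row["discrimination_id"], row["face_id"], row["start"], row["end"], row["duration"])
```

In this run F-01 receives two rows, one for each separate appearance, which mirrors the statement that a new discrimination identifier is generated when a face is recognized again.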

FIGS. 5, 6 and 7 show examples of the contents of the management table 34 and how they change. FIG. 5 shows an example of the contents of the management table 34 when the recording/reproducing apparatus 10 has received and recorded moving image data (content name: MOVIE-1.MPG). FIG. 6 shows an example of the contents of the management table 34 when, following FIG. 5, still image data (content name: IMAGE-1.JPG) has been received from the video camera 16 and recorded. FIG. 7 shows an example of the contents of the management table 34 when, following FIG. 6, moving image data (content name: MOVIE-2.MPG) has been received from the video camera 16 and recorded. The face discrimination processing device 30 updates the management table 34 as shown in FIGS. 5, 6 and 7.

Next, the operation of the playlist creation mode is described, in which a person captured by the camera 12 is used as a search key to create a playlist of the images in which that person appears. FIG. 8 is a flowchart of the operation in the playlist creation mode.

An application (not shown) takes in the image data captured by the camera 12 (S11). The captured image data of the camera 12 is thereby supplied to the face recognition processing device 26 via the digital interface 38. When an IEEE 1394 interface is used as the digital interface 38, the AV/C protocol of IEEE 1394 can be used. As in the recording mode, the face recognition processing device 26 recognizes a person's face in the captured image data of the camera 12 (S12).

When a face is recognized (S13), the face recognition processing device 26 supplies the extracted face feature amount to the face discrimination processing device 30. When no face is recognized (S13), the face recognition processing device 26 ends the face recognition process, and the flow proceeds to S17.

The face discrimination processing device 30 searches the face feature amount database 32 using the face feature amount from the face recognition processing device 26 as a search key (S14). When the face feature amount extracted from the captured image data of the camera 12 is not registered in the face feature amount database 32 (S14), the face discrimination processing device 30 ends the face discrimination process, and the flow proceeds to S17.

When the face feature amount extracted from the captured image data of the camera 12 is already registered in the face feature amount database 32 (S14), the face discrimination processing device 30 reads the face identifier corresponding to that face feature amount from the face feature amount database 32. The face discrimination processing device 30 then searches the management table 34 using that face identifier as a search key, and generates from the management table 34 a playlist of the images that include this face identifier (S15).

For example, suppose that the management table 34 has the contents shown in FIG. 7 and that "F-01" is acquired as the face identifier in S14. In this case, the face discrimination processing device 30 filters the management table 34 shown in FIG. 7 by face identifier = "F-01" and generates the playlist shown in FIG. 9. That is, the records with face identifier = "F-01" are extracted from the management table 34.
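The filtering step S15 can be sketched as a simple selection over the rows of the management table. The rows below are hypothetical sample data and are not the contents of FIG. 7, and the function name `make_playlist` and the dictionary keys are assumptions made for the example.

```python
# Hypothetical management table rows (time codes in seconds).
management_table = [
    {"face_id": "F-01", "content": "MOVIE-1.MPG", "start": 0, "end": 20},
    {"face_id": "F-02", "content": "MOVIE-1.MPG", "start": 10, "end": 30},
    {"face_id": "F-01", "content": "IMAGE-1.JPG", "start": 0, "end": 0},
    {"face_id": "F-03", "content": "MOVIE-2.MPG", "start": 5, "end": 15},
    {"face_id": "F-01", "content": "MOVIE-2.MPG", "start": 40, "end": 55},
]


def make_playlist(table, face_id):
    """Keep only the rows for the given face identifier, in table order."""
    return [row for row in table if row["face_id"] == face_id]


playlist = make_playlist(management_table, "F-01")
for entry in playlist:
    print(entry["content"], entry["start"], entry["end"])
```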

The face discrimination processing device 30 notifies the reproduction processing device 36 of a playlist generation event together with the playlist. In response to this event, the reproduction processing device 36 notifies the user that the playlist exists (S16). Specifically, the reproduction processing device 36 displays the existence of the playlist on the screen of the video display device 18a, or announces it by voice from the speaker 18b. FIG. 10 shows a display example of the video display device 18a. In the example shown in FIG. 10, a "Play" button for reproducing the playlist is displayed on the video display device 18a. When the user instructs reproduction using the user interface 22, the reproduction processing device 36 starts reproducing the playlist in response to the instruction.

Steps S12 to S16 are repeated until capture of the video signal from the camera 12 ends, for example through a user instruction or a timeout (S17).

If the user gives no instruction for a certain period of time, the images listed in the playlist may be played back automatically in playlist order.

In this way, in this embodiment, a playlist of recorded images that include the person photographed by the camera 12 can be generated automatically, and the playlist can be executed either in response to a user instruction or automatically.

The embodiment described above records and performs face recognition on moving images sent via the LAN, but it is clear that a video signal broadcast on television can likewise be the target of recording and face recognition. The recording/reproducing apparatus 10 may also be integrated with the video/audio monitor 18.

A person's face has been used as the example key for creating a playlist, but the present invention is clearly applicable to other objects in general, for example specific animals such as dogs and cats, or specific landscapes. In this respect, the face recognition processing device 26 is an example of object recognition processing means.

In the flow shown in FIG. 8, a playlist is created for a single person photographed by the camera 12. By using a face recognition processing device 26 that operates at high speed, the faces of several people can be recognized within one screen. In that case, in addition to a playlist of images showing each of those people, a playlist of images showing all of them at the same time is created. FIG. 11 shows the operation flowchart for this case.

An application (not shown) captures the image data taken by the camera 12 (S21). The image data is thereby supplied to the face recognition processing device 26 via the digital interface 38. As in the recording mode, the face recognition processing device 26 recognizes human faces in the captured image data of the camera 12 (S22). When several people appear in the image, it recognizes the face of each of them.

For each recognized face, the face recognition processing device 26 supplies the extracted face feature amount to the face discrimination processing device 30 (S23), and the face discrimination processing device 30 checks whether that feature amount is already registered in the face feature amount database 32 (S24). When the face feature amounts are registered in the face feature amount database 32 (S24), the face discrimination processing device 30 reads the face identifier corresponding to each registered face feature amount from the face feature amount database 32.

The face discrimination processing device 30 searches the management table 34 using each face identifier read from the face feature amount database 32 as a search key, and creates a playlist of images for each of these face identifiers individually as well as a playlist of images that contain all of the face identifiers (S25).

For example, suppose that the management table 34 has the contents shown in FIG. 7 and that “F-01” and “F-02” are acquired as face identifiers in S24. In this case, the face discrimination processing device 30 creates the playlists shown in FIGS. 12 and 13 in addition to the playlist shown in FIG. 9 for face identifier = “F-01”. FIG. 12 shows the playlist for face identifier = “F-02”. FIG. 13 shows the playlist of periods that contain both “F-01” and “F-02” at the same time.

In the playlist shown in FIG. 13, the start time, end time, and duration define the periods in which the period showing face identifier = “F-01” overlaps the period showing face identifier = “F-02”. That is, the playlist shown in FIG. 13 corresponds to the logical product (AND) of the playlist shown in FIG. 9 and the playlist shown in FIG. 12.
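The logical product of two playlists can be sketched as an interval intersection carried out per content item. The following Python example is only an illustration under stated assumptions: the function name `intersect_playlists`, the tuple layout `(content_name, start, end)`, the times in seconds, and the sample entries are hypothetical and are not taken from the figures.

```python
from collections import defaultdict


def intersect_playlists(playlist_a, playlist_b):
    """Return the periods in which both playlists cover the same content.

    Each input entry is a (content_name, start, end) tuple; each output
    entry is (content_name, start, end, duration).
    """
    periods_b = defaultdict(list)
    for name, start, end in playlist_b:
        periods_b[name].append((start, end))

    overlaps = []
    for name, start_a, end_a in playlist_a:
        for start_b, end_b in periods_b[name]:
            start = max(start_a, start_b)  # later of the two start times
            end = min(end_a, end_b)        # earlier of the two end times
            if start < end:                # keep only non-empty overlaps
                overlaps.append((name, start, end, end - start))
    return sorted(overlaps)


# Made-up appearance periods, in seconds, for two face identifiers.
f01 = [("clip_a.mpg", 0, 90), ("clip_b.mpg", 30, 45)]
f02 = [("clip_a.mpg", 60, 150), ("clip_b.mpg", 50, 70)]

both = intersect_playlists(f01, f02)
print(both)
```

With these sample entries, only the part of `clip_a.mpg` in which the two appearance periods overlap survives; the two periods in `clip_b.mpg` do not overlap and therefore contribute nothing.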

The face discrimination processing device 30 notifies the playback processing device 36 of the playlist generation event together with the playlists. In response to this event, the playback processing device 36 notifies the user that the playlists exist (S26). Specifically, the playback processing device 36 displays the existence of the playlists on the screen of the video display device 18a, or announces it by voice from the speaker 18b. FIG. 14 shows a display example of the video display device 18a. A message 60 for the playlist shown in FIG. 9, a message 62 for the playlist shown in FIG. 12, and a message 64 for the playlist shown in FIG. 13 are displayed at the same time. The rest of the display is the same as in FIG. 10.

Steps S22 to S26 are repeated until capture of the video signal from the camera 12 ends, for example through a user instruction or a timeout (S27).

In this way, in this embodiment, a playlist of recorded images that simultaneously include several persons photographed together by the camera 12 can also be generated automatically. It is clear that taking the logical sum (OR) of the per-person playlists yields a playlist in which at least one of those persons always appears.
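The logical sum mentioned above can be sketched as a merge of overlapping periods within each content item. This Python example is an illustration only: the function name `union_playlists`, the tuple layout `(content_name, start, end)`, the times in seconds, and the sample entries are assumptions and do not come from the description or the figures.

```python
def union_playlists(*playlists):
    """Merge per-person playlists into periods showing at least one person.

    Each entry is a (content_name, start, end) tuple. Periods of the same
    content that overlap or touch are merged into a single period.
    """
    entries = sorted(entry for playlist in playlists for entry in playlist)
    merged = []
    for name, start, end in entries:
        if merged and merged[-1][0] == name and start <= merged[-1][2]:
            # Overlaps (or touches) the previous period of the same content.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (name, prev_start, max(prev_end, end))
        else:
            merged.append((name, start, end))
    return merged


# Made-up appearance periods, in seconds, for two face identifiers.
f01 = [("clip_a.mpg", 0, 90), ("clip_b.mpg", 30, 45)]
f02 = [("clip_a.mpg", 60, 150), ("clip_b.mpg", 50, 70)]

either = union_playlists(f01, f02)
print(either)
```

Here the two overlapping periods in `clip_a.mpg` collapse into one period from 0 to 150, while the two separate periods in `clip_b.mpg` stay distinct because nobody registered appears between them.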

FIG. 1 is a schematic block diagram of an embodiment of the present invention.
FIG. 2 is a diagram showing the connection of this embodiment to peripheral devices.
FIG. 3 is a diagram showing the structure of the management table 34.
FIG. 4 is an operation flowchart of this embodiment during moving image recording.
FIG. 5 is an example of the contents of the management table.
FIG. 6 is an example of the contents of the management table, continued from FIG. 5.
FIG. 7 is an example of the contents of the management table, continued from FIG. 6.
FIG. 8 is an operation flowchart of the playlist creation mode.
FIG. 9 is an example of a created playlist.
FIG. 10 is a display example of the playlist creation message in step S16.
FIG. 11 is an operation flowchart of the playlist creation mode for a plurality of persons photographed at the same time.
FIG. 12 is an example of a playlist generated by the flow shown in FIG. 11.
FIG. 13 is an example of a playlist, generated by the flow shown in FIG. 11, of images that include a plurality of persons at the same time.
FIG. 14 is a display example of the playlist creation message in step S26.

Explanation of symbols

10: Recording/reproducing apparatus
12: Camera
14: LAN
16: Video camera
18: Video/audio monitor
18a: Video display device
18b: Speaker
20: Communication processing device
22: User interface (UI)
24: Recording processing device
26: Face recognition processing device
28: Hard disk (HDD)
30: Face discrimination processing device
32: Face feature amount database
34: Management table
36: Playback processing device
38: Digital interface
40: Discrimination identifier field
42: Face identifier field
44: Content name field
46: Start time field
48: End time field
50: Time field

Claims

An image processing apparatus comprising:
recording means for recording a moving image on a recording medium;
means for detecting appearance periods of a plurality of specific objects included in the moving image and generating object-related information regarding the appearance period of each of the plurality of specific objects;
acquisition means for acquiring an image;
detection means for detecting a specific object from the image acquired by the acquisition means; and
playlist generation means for generating, based on the object-related information and the detection result of the detection means, a playlist for selecting and reproducing, from the moving image recorded on the recording medium, the appearance period of the specific object detected by the detection means,
wherein, when the detection means detects a first specific object and a second specific object, among the plurality of specific objects included in the moving image, from a one-screen image acquired by the acquisition means, the playlist generation means generates a first playlist for selecting and reproducing a period in which the first specific object and the second specific object appear together, a second playlist for selecting and reproducing the appearance period of the first specific object, and a third playlist for selecting and reproducing the appearance period of the second specific object.

The image processing apparatus according to claim 1, wherein the object-related information includes information on an appearance start time and an appearance stop time for each of the plurality of specific objects, and the playlist generation means generates the playlist based on the appearance start time and appearance stop time information, included in the object-related information, of the specific object detected by the detection means.

The image processing apparatus according to claim 1, wherein the detection means detects the specific objects included in the moving image when the moving image is recorded on the recording medium, and the means for generating the object-related information generates the object-related information based on the detection result of the specific objects included in the moving image recorded on the recording medium.

The image processing apparatus according to claim 1, further comprising means for generating feature amount information that includes the feature amount of each of the plurality of specific objects included in the moving image recorded on the recording medium and the identifiers of the plurality of specific objects,
wherein the means for generating the object-related information generates object-related information in which the identifiers of the plurality of specific objects are associated with information indicating the appearance period of each specific object, and
the playlist generation means detects the identifier of the specific object detected by the detection means based on the feature amount of that specific object and the feature amount information, and generates the playlist by detecting, from the object-related information, the appearance period corresponding to the detected identifier.

The image processing apparatus according to claim 1, further comprising face recognition means for recognizing a person's face as the specific object, wherein the means for generating the object-related information detects the appearance period of the specific object based on the result of face recognition performed on the moving image by the face recognition means, and the detection means detects the specific object included in the one-screen image acquired by the acquisition means based on the result of face recognition performed on that one-screen image by the face recognition means.

An image processing method for processing a moving image recorded on a recording medium, the method comprising:
a step of detecting appearance periods of a plurality of specific objects included in the moving image and generating object-related information regarding the appearance period of each of the plurality of specific objects;
an acquisition step of acquiring an image;
a detection step of detecting a specific object from the acquired image; and
a step of generating, based on the object-related information and the detection result of the detection step, a playlist for selecting and reproducing, from the moving image recorded on the recording medium, the appearance period of the specific object detected in the detection step,
wherein, when the detection step detects a first specific object and a second specific object, among the plurality of specific objects included in the moving image, from the acquired one-screen image, the step of generating the playlist generates a first playlist for selecting and reproducing a period in which the first specific object and the second specific object appear together, a second playlist for selecting and reproducing the appearance period of the first specific object, and a third playlist for selecting and reproducing the appearance period of the second specific object.