JP5317710B2

JP5317710B2 - Image processing apparatus, control method therefor, program, and recording medium

Info

Publication number: JP5317710B2
Application number: JP2009001731A
Authority: JP
Inventors: 利道工藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-01-07
Filing date: 2009-01-07
Publication date: 2013-10-16
Anticipated expiration: 2029-01-07
Also published as: JP2010161562A

Abstract

PROBLEM TO BE SOLVED: To record, highly reliably and with a small processing load, the position at which a face newly appears. SOLUTION: An image processing apparatus includes means for detecting a face from a moving image; tracking means for tracking the face of a main subject under tracking conditions that include conditions other than its position and size; holding means (S416) for holding information on the position and size of a detected face; first determination means (S411) for determining, when a face other than that of the main subject is newly detected, whether the face matches the position or size information held in the holding means; and recording control means (S407, S419) for performing control so as to record a face index specifying a frame included in a period in which a new face is determined to have been detected. For a face determined by the first determination means to match, the recording control means does not record a face index (S418) for a detection period if a face index has already been recorded for another detection period of that face. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an image processing apparatus that detects a person's face in a moving image and records information about the frames in which the face is detected, and to a control method, a program, and a recording medium therefor.

Conventionally, various GUIs (graphical user interfaces) have been proposed to improve operability in moving image playback.

Patent Literature 1 discloses a moving image playback device that can present the persons appearing in moving image content, and the positions at which they appear, to the user in an appropriate and easily understood manner. This playback device displays a list of face images indicating the positions at which a person's face is newly detected in the moving image, and plays back the section of the moving image corresponding to the face image that the user selects from the list.

FIG. 8 shows an example of a display that lists the face images detected in a moving image at playback time (hereinafter referred to as a face index display).

A display screen 801 is a screen on which the face index display is performed. It shows a representative image, supplementary information, a list of detected face images, and the like for the one moving image selected for face index display.

A representative image 802 is a representative image of the entire currently selected moving image, for example the first frame (or first I picture) of the moving image, or the frame containing the first detected face.

Scene information 803 is information attached to the moving image; the recording date and time, the recording duration, and the like are displayed.

Images 804 to 808 are each a reduced image of a frame containing a face newly detected in the moving image, and are displayed in chronological order on a timeline 810. When fewer than a predetermined number (here, 5) of faces are newly detected in the moving image, the remaining slots are left blank. When the number of newly detected faces is equal to or greater than the predetermined number, the user can move to the next page by a user operation.

A cursor 809 is used to select a desired image from the images displayed on the timeline 810 and is moved by user operation. Playback of the moving image can be started from the position of the image currently selected by the cursor. In FIG. 8, for example, the cursor 809 is on the image 806; when a playback instruction is received from the user in this state, playback of the moving image starts from the frame position of the image 806. This face index display is performed based on face index information generated when the moving image was shot. The face index information is recorded either as a file separate from the moving image file or as header information of the moving image file. It contains information specifying the frames of the images to be displayed on the timeline 810 (hereinafter referred to as face indexes). A face index is recorded as follows: at the time of shooting, the imaging apparatus performs face detection processing on each frame, and when it determines that a new face has appeared, it issues a face index recording request.

FIG. 9 shows the timing of the recording request for each face index included in the face index information used for the conventional face index display.

Periods 901, 903, 911 to 915, and 921 to 923 each indicate a period during which the face detection circuit of the imaging apparatus was detecting a face. The gaps between periods 912 and 913, between periods 913 and 914, and between periods 922 and 923 are each 4 seconds or longer, and all other gaps between periods are shorter than 4 seconds. Period 913 is shorter than 1 second. In this example, at most three persons are detected at the same time; for the following explanation they are referred to as the main subject, face ID1, and face ID2. The main subject is assumed to have been recognized as such by a predetermined algorithm. The face assigned as the main subject is tracked by luminance matching during period 902, so the faces detected in periods 901 and 903 can be regarded as the face of the same person even though face detection was interrupted once in between. Because luminance matching imposes a heavy processing load, however, it is performed only for the main subject. Faces other than the main subject are therefore not tracked, and the faces assigned face ID1 or face ID2 are not necessarily the same face once the detection period has been interrupted, even when the same ID is assigned. That is, the faces detected in periods 911 to 915, to which face ID1 is assigned, may be the faces of different persons or may be the face of the same person. Likewise, the faces detected in periods 921 to 923, to which face ID2 is assigned, may be the faces of different persons or may be the face of the same person. Faces with different face IDs, however, have periods in which they are detected at the same time. Because one person is never assigned more than one ID at the same time, faces detected simultaneously under different face IDs belong to different persons. In other words, a face ID assigned to a face other than the main subject is merely an identifier for distinguishing multiple faces detected at the same time. Once face detection is interrupted, the assigned face ID expires, and when a face is newly detected, it is assigned the smallest-numbered ID among the IDs that are not currently valid. This face ID management is performed using a face management table prepared in a temporary storage medium during shooting.
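The face ID rule described above (an ID expires when detection is interrupted, and a newly detected face receives the smallest-numbered ID not currently in use) can be sketched as follows. This is a minimal illustration only: the class name `FaceManagementTable`, its methods, and the limit of two non-main-subject IDs are assumptions made for the example and are not taken from the patent.

```python
class FaceManagementTable:
    """Hypothetical stand-in for the face management table kept in temporary storage."""

    def __init__(self, max_ids=2):
        self.max_ids = max_ids   # assumed number of non-main-subject IDs (face ID1, face ID2)
        self.active = set()      # face IDs currently assigned to a detected face

    def assign(self):
        """Return the smallest-numbered ID not currently in use, or None if all are taken."""
        for face_id in range(1, self.max_ids + 1):
            if face_id not in self.active:
                self.active.add(face_id)
                return face_id
        return None

    def release(self, face_id):
        """Expire an ID when detection of its face is interrupted."""
        self.active.discard(face_id)


table = FaceManagementTable()
first = table.assign()    # face ID 1
second = table.assign()   # face ID 2
table.release(first)      # detection of the face holding ID 1 is interrupted
reused = table.assign()   # ID 1 is handed out again, possibly to a different person
```

The reuse in the last line is the reason the text says that the same ID across interrupted detection periods does not imply the same person.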

Face index numbers 950 indicate the timing of the face index recording requests issued when faces are detected as shown in FIG. 9, together with the face index number of each face index. For the faces assigned face ID1 or face ID2, that is, faces other than the main subject, a face index recording request is issued every time such a face is newly detected. In the example of FIG. 9, eight face indexes with face index numbers 0 to 7 are thus recorded in the face index information, and in the face index display at playback time, the eight images corresponding to face index numbers 0 to 7 are displayed on the timeline.
Patent Literature 1: JP 2008-017041 A

In the example of FIG. 9, faces assigned the same ID may belong to the same person even when their detection periods are interrupted, yet a face index is recorded every time a face is newly detected, even if it is the same person's face. For example, even while the face of the same person A is being detected, a face index is recorded when person A looks down for a moment, so that face detection is briefly interrupted, and then immediately faces the camera again and is detected again. The detection start of period 912 (the timing at which recording of the face index with face index number 2 is requested) may be such a case, because detection was interrupted only briefly after period 911. From the viewer's point of view, however, this is not a timing at which person A newly appears, and a viewer is unlikely to want to start playback from it. In other words, with the conventional timing of face index recording requests, the face index display may fill up not with newly appearing persons but with the face of one person who simply moves their face a lot, which can impair searchability in the face index display.

In the example of FIG. 9, a face index is also recorded at the timing of period 913, in which a face was detected only momentarily, for less than 1 second. Suppose, for example, that the face detected in period 913 is the face of person B. Because of the face index for period 913 (face index number 3), an image showing the face of person B appears in the face index display. A viewer who sees this image selects the image for face index number 3, expecting to watch footage of person B, and starts playback from that position. In reality, however, person B appears for less than 1 second, so the viewer's wish to watch footage of person B is unlikely to be satisfied. Images for such face indexes were another factor impairing searchability in the face index display.

In addition, because the face of the main subject is tracked, it is rarely determined to have newly appeared. By comparison, faces other than the main subject are determined to have newly appeared more often, precisely because they are not tracked. As a result, even though the user is presumably most interested in the main subject, the face index display ends up containing many images of scenes in which faces other than the main subject appear.

In view of the above problems, an object of the present invention is to provide an imaging apparatus that records the position at which a face newly appears more reliably and with a small processing load.

To achieve the above object, an image processing apparatus according to claim 1 of the present invention comprises:
face detection means for detecting a face from each frame of a moving image; and
control means for performing control such that,
for a face determined to be a main subject among the faces detected by the face detection means, with respect to a continuous span of the moving image made up of a first period in which the main subject is detected by the face detection means, a second period that follows the point at which the main subject is no longer detected and that satisfies a first condition relating to the main subject, the first condition including a condition other than the position, size, and time of the face, and a third period in which the main subject is detected again by the face detection means, information specifying a frame of the first period is recorded as face index information and information specifying a frame of the third period is not recorded as the face index information, and
for a specific face other than the main subject among the faces detected by the face detection means, with respect to a continuous span of the moving image made up of a first period in which the specific face is detected by the face detection means, a second period within a predetermined time after the specific face is no longer detected, and a third period in which the face detection means detects a face satisfying a second condition relating to at least one of the position and size of the specific face detected in the first period, information specifying a frame of the first period is recorded as the face index information and information specifying a frame of the third period is not recorded as the face index information.
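The rule for faces other than the main subject can be illustrated with a short sketch. It is not the claimed implementation: the `Detection` fields, the 4-second gap, and the position and size tolerances are all hypothetical values chosen for the example, and the sketch requires both position and size to match, whereas the claim only requires a condition on at least one of them.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    start: float  # detection period start, in seconds
    end: float    # detection period end, in seconds
    x: float      # face position (pixels)
    y: float
    size: float   # face size (pixels)


def is_same_face(prev, new, max_gap=4.0, pos_tol=20.0, size_tol=0.2):
    """Assumed test: re-detected soon enough, and at a similar position and size."""
    gap = new.start - prev.end
    if gap < 0 or gap > max_gap:
        return False
    near = abs(new.x - prev.x) <= pos_tol and abs(new.y - prev.y) <= pos_tol
    similar = abs(new.size - prev.size) <= size_tol * prev.size
    return near and similar


def face_index_times(periods, **kw):
    """Return the start times of the periods for which a face index would be recorded."""
    recorded = []
    prev = None
    for p in sorted(periods, key=lambda d: d.start):
        if prev is None or not is_same_face(prev, p, **kw):
            recorded.append(p.start)  # treated as a newly appearing face
        prev = p                      # held position/size is refreshed either way
    return recorded


periods = [
    Detection(0.0, 5.0, 100, 100, 50),
    Detection(6.0, 10.0, 105, 98, 52),    # 1 s gap, same place and size: suppressed
    Detection(20.0, 25.0, 104, 99, 51),   # 10 s gap: recorded again
    Detection(26.0, 30.0, 300, 200, 50),  # 1 s gap but far away: recorded
]
index_times = face_index_times(periods)   # [0.0, 20.0, 26.0]
```

Only position, size, and elapsed time are compared here, which is much cheaper than the luminance matching reserved for the main subject.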

According to the present invention, the position at which a face newly appears can be recorded more reliably and with a small processing load.

Preferred embodiments of the present invention will be described below with reference to the drawings.

<Configuration of digital video camera>
FIG. 1 is a block diagram showing the configuration of a digital video camera as an example of the imaging apparatus of the present invention.

Reference numeral 101 denotes a lens unit, which consists of a fixed lens group for condensing light, a variable-magnification lens group, a diaphragm, and a correction lens group that both corrects the image-forming position displaced by movement of the variable-magnification lens group and performs focus adjustment. Reference numeral 102 denotes an image sensor, which converts an optical image into an electrical signal. Reference numeral 103 denotes a camera signal processing circuit, which performs image processing such as predetermined color conversion and reduction on the electrical signal generated by the image sensor 102 and outputs image data suited to the output destination.

Reference numeral 104 denotes a face detection circuit. The camera signal processing circuit 103 transmits image data suitable for face detection to the face detection circuit 104. The face detection circuit 104 performs predetermined processing on the input image data and then extracts candidate groups for eyes, nose, mouth, and ears by pattern matching. From the extracted eye candidates, it treats those satisfying preset conditions (for example, the distance between the two eyes and their tilt) as eye pairs, and narrows the eye candidate group down to candidates that form an eye pair. It then associates the narrowed-down eye candidates with the other parts (nose, mouth, ears) that form the corresponding face, and passes the result through a preset non-face condition filter to detect a face. The face detection circuit 104 outputs the face information according to the detection result and ends the processing. At this time, face detection information, that is, feature quantities such as the number, positions, and sizes of faces, is stored in a memory 109 described later.

By analyzing the image in this way, feature quantities of the image data can be extracted and subject information can be detected.
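The eye pair narrowing step can be pictured with a small sketch. The text only says that the preset conditions include, for example, the distance between the two eyes and their tilt; the function name `find_eye_pairs` and every threshold below are assumptions for illustration.

```python
import math
from itertools import combinations


def find_eye_pairs(eyes, min_dist=20.0, max_dist=80.0, max_tilt_deg=30.0):
    """Keep pairs of eye candidates whose spacing and tilt look plausible (assumed limits)."""
    pairs = []
    for (x1, y1), (x2, y2) in combinations(eyes, 2):
        dist = math.hypot(x2 - x1, y2 - y1)
        tilt = abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        tilt = min(tilt, 180.0 - tilt)  # tilt of the connecting line, independent of order
        if min_dist <= dist <= max_dist and tilt <= max_tilt_deg:
            pairs.append(((x1, y1), (x2, y2)))
    return pairs


# Three eye candidates as (x, y) pixel coordinates; only the first two form a pair.
candidates = [(100, 100), (150, 104), (400, 300)]
eye_pairs = find_eye_pairs(candidates)
```

In the circuit described above, the surviving pairs would then be associated with nose, mouth, and ear candidates and passed through the non-face condition filter, which this sketch does not model.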

Reference numeral 108 denotes a flash ROM, 109 a memory, 110 a camera control microcomputer, and 120 a bus. The blocks, including the camera signal processing circuit 103 and the face detection circuit 104, are connected via the bus 120. The flash ROM 108 is a rewritable nonvolatile memory that stores in advance the programs executed by the camera control microcomputer 110, various parameters, and the like. The memory 109 is a volatile memory used as a work area by the blocks connected via the bus 120, such as the camera control microcomputer 110.

The camera control microcomputer 110 controls the camera signal processing circuit 103, the face detection circuit 104, and the like, and performs the various controls on the camera side.

Camera functions can be operated so as to be optimal for a face: face AF, which adjusts focus on the face; face AE, which optimizes exposure according to the brightness of the face; and WB, which optimizes white balance according to the color of the face. These are performed under the control of the camera control microcomputer 110 based on the face detection information stored in the memory 109.

The camera control microcomputer 110 determines the main subject from among multiple pieces of face information. The determination is made by a predetermined algorithm from the detected pieces of face information, based on the position of each face, its size, and the like. Alternatively, a function that lets the user select any face may be provided, and the selected face may be used as the main subject.

The camera control microcomputer 110 performs tracking processing on the main subject.

The camera signal processing circuit 103 extracts the luminance component from the captured image and stores further reduced image data in the memory 109. The camera control microcomputer 110 cuts out the image data of the area determined to be the main subject and its surroundings, and tracks the main subject by comparing this data between frames. This is hereinafter referred to as luminance matching. Through this processing, the main subject is interpolated even when the face detection circuit 104 fails to detect it.

For example, even when the main subject looks down and the facial features cannot be detected, the main subject can still be recognized, and face AF and the like can continue to operate.
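Luminance matching between frames can be sketched as a small template search. The text does not state which similarity measure is used, so the sum of absolute differences (SAD) below is a swapped-in, commonly used choice, and the function names, search range, and toy frame are assumptions for illustration.

```python
def sad(frame, template, top, left):
    """Sum of absolute luminance differences between the template and a frame window."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track(frame, template, prev_top, prev_left, search=2):
    """Search around the previous position and return (score, top, left) of the best match."""
    h, w = len(template), len(template[0])
    best = None
    for top in range(max(0, prev_top - search), min(len(frame) - h, prev_top + search) + 1):
        for left in range(max(0, prev_left - search), min(len(frame[0]) - w, prev_left + search) + 1):
            score = sad(frame, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


# Toy 6x6 reduced luminance frame with the main subject's 2x2 patch at row 3, column 2.
template = [[200, 210], [220, 230]]
frame = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        frame[3 + r][2 + c] = template[r][c]

score, top, left = track(frame, template, prev_top=2, prev_left=1)
```

Even this tiny search evaluates every window in the search range for every frame, which gives a sense of why the processing is restricted to the main subject.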

Reference numeral 105 denotes a video signal processing circuit, which receives image data from the camera signal processing circuit 103 and from a compression/decompression circuit 106 described later, performs image processing suited to each destination block, and outputs the result. The video signal processing circuit 105 can also superimpose bitmaps, which is used to display the device status, warnings, various setting screens, and the like. It also has functions for reducing a playback image and holding it in memory, and for outputting that image as a display image. By reducing and displaying multiple frames, it can realize GUIs such as an index screen in which representative images of the scenes are arranged, and the face index display.

Reference numeral 106 denotes a compression/decompression circuit, which compresses image data using the MPEG2 (Moving Picture Experts Group phase 2) method and writes the compressed video data to a memory 114 via a bus 121, both described later. It also reads compressed video data stored in the memory 114, decodes it into image data, and outputs it to the video signal processing circuit 105.

MPEG2 video compression (a moving picture compression method) uses intra-frame coded pictures (I pictures) and inter-frame coded pictures, namely P (forward predictive) pictures and B (bidirectionally predictive) pictures. Of these, an I picture can be decoded from its own frame alone.

The MPEG2 system also provides a PTS (Presentation Time Stamp), which indicates playback time. By attaching a PTS to each frame, for example, the playback time of each frame can be determined during playback.

During compression, the compression/decompression circuit 106 allows a recorder control microcomputer 111, described later, to read at least the PTS of each I picture and information specifying where that I picture is located in the compressed video data. By accumulating and recording this information, the position of any I picture can be looked up, so any I picture can be played back.
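The lookup that this accumulated information makes possible can be sketched as a sorted table searched by PTS. The class `IPictureIndex`, the use of byte offsets as the position information, and the sample values are assumptions for illustration; the PTS values are written in the 90 kHz units used by MPEG2 systems.

```python
import bisect


class IPictureIndex:
    """Hypothetical table of (PTS, position) entries, one per I picture."""

    def __init__(self):
        self._pts = []      # PTS of each I picture, in increasing order
        self._offsets = []  # assumed position info: byte offset in the compressed data

    def add(self, pts, byte_offset):
        """Record one I picture; entries are assumed to arrive in increasing PTS order."""
        self._pts.append(pts)
        self._offsets.append(byte_offset)

    def seek(self, target_pts):
        """Return (pts, offset) of the last I picture at or before target_pts."""
        i = bisect.bisect_right(self._pts, target_pts) - 1
        if i < 0:
            raise ValueError("target precedes the first I picture")
        return self._pts[i], self._offsets[i]


index = IPictureIndex()
index.add(0, 0)
index.add(45000, 180000)   # 0.5 s at 90 kHz
index.add(90000, 365000)   # 1.0 s
entry = index.seek(60000)  # nearest I picture at or before about 0.67 s
```

Playback from a face index can then begin at the returned I picture, since an I picture can be decoded without reference to other frames.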

Reference numeral 111 denotes a recorder control microcomputer, which controls the recording, playback, and display of moving images, management information, and the like. Reference numeral 113 denotes a flash ROM, a rewritable nonvolatile memory that stores the programs executed by the recorder control microcomputer 111, various parameters, and the like. Reference numeral 114 denotes a memory, a volatile memory used as a work area by the recorder control microcomputer 111, the compression/decompression circuit 106, and the like. Reference numeral 121 denotes a bus. Reference numeral 115 denotes a memory card interface (card I/F), and 116 denotes a memory card. The card I/F 115 holds the memory card 116 and exchanges commands and data with it. The memory card 116 is a card-type memory such as an SD memory card, and serves as a recording medium on which the compressed video data generated by the compression/decompression circuit 106 is recorded as a moving image file in a predetermined computer-compatible format such as the FAT file system. Any recording medium capable of recording compressed video data may be used; it is not limited to a memory card and may be a DVD, a hard disk, or the like.

Reference numeral 107 denotes a liquid crystal panel, which receives image data from the video signal processing circuit 105 and displays it.

Although not shown, a microphone unit, a speaker, and an external output path are also provided for audio, which is compressed and multiplexed together with the images by the compression/decompression circuit 106.

Reference numeral 112 denotes an operation switch group with which the user enters operations. The operation switch group 112 also includes a switch for selecting among a camera mode mainly for shooting (including a still image shooting mode and a moving image shooting mode), a playback mode mainly for playback, and a power-off mode that turns the power off. The camera control microcomputer 110 and the recorder control microcomputer 111 accept user operations on the operation switch group 112 and perform various operations according to the accepted operations (accepting means).

The camera control microcomputer 110 and the recorder control microcomputer 111 communicate with each other and exchange the necessary information.

The camera control microcomputer 110 transmits to the recorder control microcomputer 111 information requesting the recording of a face index, described later. A face index is information for locating the place (frame) where a new face appears.

In response, the recorder control microcomputer 111 records face index information in the management information described later.

The digital video camera of the present invention can be switched at least between the moving image shooting mode and the playback mode by operating the operation switch group 112. In the moving image shooting mode, a moving image can be shot and recorded as a moving image file on the memory card 116. In the playback mode, a moving image file recorded on the memory card can be played back, and the face index display can also be performed.

<Moving image shooting mode processing>
FIG. 2 is a flowchart illustrating the moving image shooting mode processing of the present invention. The processing starts when the camera is switched to the moving image shooting mode by operating the operation switch group 112. It is realized by the camera control microcomputer 110 and the recorder control microcomputer 111 working together to load programs recorded in the memory 109 or the flash ROM 113 into the memory 109 or the memory 114 and execute them.

In step S201, a so-called through display is first performed, in which the image captured by the image sensor 102 is displayed on the liquid crystal panel 107 as a moving image in near real time.

In step S202, it is determined whether an instruction to start moving image shooting has been given with the shooting button included in the operation switch group 112. If it is determined that no such instruction has been given, the process proceeds to step S210; if it is determined that the instruction has been given, the process proceeds to step S203.

In step S203, moving image shooting and recording processing is performed. For example, when the frame rate is 30 fps, the image data (moving image frames) obtained by the image sensor 102 at 1/30-second intervals is temporarily stored in the buffer memory. The image data held in the buffer memory is then compressed by the compression/decompression circuit 106, and the compressed image data is stored in a write output buffer allocated in the memory 114. Once image data for a predetermined number of seconds has accumulated in the write output buffer, it is written sequentially from the write output buffer to the memory card 116. At the same time, information that will become the header information of the moving image data is recorded in the memory 114.

ステップＳ２０４では、顔インデックス記録要求処理を行う。これは、再生時の顔インデックス表示に用いる顔インデックス情報を記録するために、撮影中の動画で新たに顔が登場したフレームに関して顔インデックスを記録するよう要求する処理である。詳細は図４を用いて後述する。 In step S204, face index recording request processing is performed. This is a process for requesting to record a face index for a frame in which a new face appears in a moving image being shot in order to record face index information used for face index display during reproduction. Details will be described later with reference to FIG.

ステップＳ２０５では、顔インデックス情報生成処理を行う。これは、ステップＳ２０４での顔インデックス記録要求に基づいて、実際に記録される顔インデックス情報を生成する処理である。ここでは、現在行っている動画撮影が終了するまで、メモリ１１４に顔インデックス情報を保持し、順次更新していく。この処理の詳細は図５を用いて後述する。 In step S205, face index information generation processing is performed. This is a process of generating face index information to be actually recorded based on the face index recording request in step S204. Here, the face index information is held in the memory 114 and updated sequentially until the current moving image shooting is completed. Details of this processing will be described later with reference to FIG.

なお、ステップＳ２０３〜Ｓ２０５の処理は、実質的に並行して行われる。 Note that the processes in steps S203 to S205 are performed substantially in parallel.

続くステップＳ２０６では、ユーザによって操作スイッチ群１１２が操作されることにより、動作モードが動画撮影モードから他のモードへ切り替えられたか（変更されたか）否かを判定する。動画撮影モードから他のモードへ切り替えられたと判定するとステップＳ２０７へ進む。 In subsequent step S206, it is determined whether or not the operation mode has been switched from the moving image shooting mode to another mode by operating the operation switch group 112 by the user. If it is determined that the moving image shooting mode has been switched to another mode, the process proceeds to step S207.

In step S207, the moving image shooting end processing is started. In the end processing, the image data (moving image frames) temporarily stored in the buffer memory at this point is compressed by the compression/decompression circuit 106, and the compressed image data is stored in the write output buffer allocated in the memory 114. The compressed image data in the write output buffer is then written to the memory card 116. When all of the image data that was held in the buffer memory has been written to the memory card 116, the information recorded in the memory 114 in step S203 is written to the memory card 116 as header information. The moving image file on the memory card 116 is then closed, which completes the moving image file. In addition, a management information file for the generated moving image file is recorded. The management information file is described later with reference to FIG. 6. The face index information generated in step S205 is recorded in this management information file. The management information file is recorded so that it can be associated with the captured moving image file. For example, the association is made by giving the part of the management information file name before the extension the same name as the part of the moving image file name before the extension. When the end processing of step S207 has been performed, the moving image shooting mode processing ends.

一方、ステップＳ２０６で、動作モードが動画撮影モードから他のモードへ切り替えられていないと判定するとステップＳ２０８へ進む。ステップＳ２０８では操作スイッチ群１１２に含まれる撮影ボタンの操作により、動画撮影終了指示がされたか否かを判定する。動画撮影終了指示がされていないと判定するとステップＳ２０３へ戻り、動画の撮影を継続する。動画撮影終了指示がされたと判定するとステップＳ２０９へ進む。ステップＳ２０９では、動画撮影の終了処理を行う。この終了処理は前述したステップＳ２０７と同様の処理なので説明を省略する。ステップＳ２０９で終了処理を行うと、ステップＳ２１０へ進む。 On the other hand, if it is determined in step S206 that the operation mode has not been switched from the moving image shooting mode to another mode, the process proceeds to step S208. In step S208, it is determined whether or not a moving image shooting end instruction has been issued by operating a shooting button included in the operation switch group 112. If it is determined that the moving image shooting end instruction has not been given, the process returns to step S203 to continue shooting the moving image. If it is determined that the moving image shooting end instruction has been issued, the process proceeds to step S209. In step S209, a moving image shooting end process is performed. Since this end process is the same as the above-described step S207, its description is omitted. When the termination process is performed in step S209, the process proceeds to step S210.

ステップＳ２１０では、ユーザによって操作スイッチ群１１２が操作されることにより、動作モードが動画撮影モードから他のモードへ切り替えられたか（変更されたか）否かを判定する。他のモードへ切り替えられていないと判定するとステップＳ２０２に戻り、動画撮影モード処理を継続する。動画撮影モードから他のモードへ切り替えられた判定すると、動画撮影モード処理を終了する。 In step S210, it is determined whether the operation mode is switched (changed) from the moving image shooting mode to another mode by operating the operation switch group 112 by the user. If it is determined that the mode has not been switched to another mode, the process returns to step S202 to continue the moving image shooting mode process. If it is determined that the moving image shooting mode has been switched to another mode, the moving image shooting mode process is terminated.

<Face index recording request processing>
The face index recording request processing in step S204 of FIG. 2 is described in detail below. The face index recording request processing is performed using a face management table prepared in the memory 109.

FIG. 3 shows an example of the face management table used in the face index recording request processing. The face management table is information held in the memory 109 so that the face index recording request processing can be performed during moving image shooting and recording. In the face management table, a face ID is assigned to each face that is currently detected or that disappeared only recently after being detected, and an appearance time, a latest appearance time, a disappearance time, a position and size, and a requested flag are recorded for each face ID. The appearance time is time information identifying how far into the shooting the frame in which the face with that face ID was first detected lies. The latest appearance time is time information identifying how far into the shooting the first frame of the most recent period in which the face with that face ID was detected lies. The disappearance time is recorded when the face with that face ID is not currently detected, and is time information identifying how far into the shooting the first frame of the period of non-detection lies. The position and size is region information indicating where and at what size the face with that face ID was detected in the frame in which it was last detected; the coordinates within the frame and the vertical size x horizontal size of the detected face region are recorded. The requested flag indicates whether a face index recording request has already been made for the face with that face ID, and is 1 if a request has already been made.

In the following, a valid face ID is a face ID for which the appearance time, latest appearance time, and position and size are held in the face management table, that is, a face ID currently in use. An invalid face ID is a face ID for which the appearance time, latest appearance time, disappearance time, position and size, and requested flag have all been cleared, that is, a face ID that is not in use.
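The face management table and the valid/invalid distinction can be sketched as follows. This is only an illustrative model, not the layout used in the apparatus: the names `FaceEntry`, `new_table`, and `issue_face_id`, the table size `MAX_FACE_IDS`, and the representation of times as seconds since the start of shooting are all assumptions. Issuing the lowest-numbered invalid ID follows the rule stated later for step S416.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_FACE_IDS = 8  # hypothetical table size; the source does not state one


@dataclass
class FaceEntry:
    """One row of the face management table (field names are illustrative)."""
    appearance_time: Optional[float] = None         # first detection
    latest_appearance_time: Optional[float] = None  # start of latest detection period
    disappearance_time: Optional[float] = None      # start of current non-detection
    region: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height
    requested: bool = False                         # face index already requested

    def is_valid(self) -> bool:
        # Valid = appearance time, latest appearance time and region are held.
        return (self.appearance_time is not None
                and self.latest_appearance_time is not None
                and self.region is not None)

    def clear(self) -> None:
        # Clearing every field makes the face ID invalid (unused).
        self.appearance_time = None
        self.latest_appearance_time = None
        self.disappearance_time = None
        self.region = None
        self.requested = False


def new_table() -> List[FaceEntry]:
    """Initial state: every face ID is invalid."""
    return [FaceEntry() for _ in range(MAX_FACE_IDS)]


def issue_face_id(table: List[FaceEntry], now: float,
                  region: Tuple[int, int, int, int]) -> int:
    """Assign the lowest-numbered invalid face ID to a newly detected face."""
    for face_id, entry in enumerate(table):
        if not entry.is_valid():
            entry.clear()
            entry.appearance_time = now
            entry.latest_appearance_time = now  # equals the appearance time
            entry.region = region
            return face_id
    raise RuntimeError("face management table is full")
```

In this sketch the list index plays the role of the face ID, so an ID freed by `clear()` is reused by the next new face.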

図４に、前述の図２のステップＳ２０４における顔インデックス記録要求処理のフローチャートを示す。この顔インデックス記録要求処理は、カメラ制御用マイクロコンピュータ１１０が、フラッシュＲＯＭ１０８に記録されたプログラムを読み出し、作業用領域としてメモリ１０９に展開して実行することで実現する。 FIG. 4 shows a flowchart of the face index recording request process in step S204 of FIG. This face index recording request processing is realized by the camera control microcomputer 110 reading out a program recorded in the flash ROM 108, developing it in the memory 109 as a work area, and executing it.

When the face index recording request processing starts, the face management table is first initialized in step S401. The face management table is prepared in the memory 109 with all face IDs invalid (the state in which no face has been detected) and with the appearance time, latest appearance time, disappearance time, position and size, and requested flag all cleared.

ステップＳ４０２では、メモリ１０９に用意した内部変数フラグを初期化して０にセットする。 In step S402, the internal variable flag prepared in the memory 109 is initialized and set to zero.

In step S403, face information, which is the result of face detection by the face detection circuit 104, is acquired for one frame of the image captured by the image sensor 102. If this is the first frame after the start of the face index recording request processing, the main subject is determined from the acquired face information, and the information needed to track the main subject is recorded (held) in the memory 109. A new face ID is then issued for each detected face other than the main subject, and the information about each face is recorded in the face management table.

ステップＳ４０４では、主被写体が消失したか否かを判定する。主被写体は輝度マッチングによる追尾条件により追尾されているが、撮像素子１０２で撮像された画像を解析した結果、追尾条件を満たさなくなると主被写体は消失したと判定される。主被写体が消失したと判定するとステップＳ４０５へ進み、主被写体が消失していなければステップＳ４０６へ進む。 In step S404, it is determined whether or not the main subject has disappeared. Although the main subject is tracked according to the tracking condition based on the luminance matching, it is determined that the main subject has disappeared if the tracking condition is not satisfied as a result of analyzing the image captured by the image sensor 102. If it is determined that the main subject has disappeared, the process proceeds to step S405. If the main subject has not disappeared, the process proceeds to step S406.

ステップＳ４０５では、主被写体が消失したため、現在検出されているその他の顔から新たに主被写体を決定する。顔管理テーブルで管理されている有効な顔ＩＤのうち、所定のアルゴリズムにより主被写体としての顔を１つ決定し、この顔に関する情報で、メモリ１０９に記録されている主被写体の追尾に必要な情報を更新する。顔管理テーブルからは、主被写体に決定された顔に関する情報を全てクリアし、その顔ＩＤを無効とする。 In step S405, since the main subject has disappeared, a new main subject is determined from the other faces currently detected. Of the valid face IDs managed in the face management table, one face as a main subject is determined by a predetermined algorithm, and information relating to this face is necessary for tracking the main subject recorded in the memory 109. Update information. From the face management table, all information related to the face determined as the main subject is cleared, and the face ID is invalidated.

ステップＳ４０６では、ステップＳ４０３で取得した顔情報が、主被写体用要求条件に合致するか否かを判定する。この判定は、顔インデックスの記録要求をするべき主被写体の顔が新たに登場したか否かの判定である。主被写体は上述のように追尾処理が行われるため、主被写体の登場が理由で顔インデックスの記録要求をするべきか否かの判定は、後述する主被写体以外の顔に対する判定とは異なる条件によって行われる。主被写体用の要求条件に合致すると判定するとステップＳ４０７に進み、内部変数フラグを１にセットし、ステップＳ４０８へ進む。このように内部変数フラグを１にセットすることで、後述するステップＳ４２１で顔インデックスの記録要求が送信され、レコーダ制御用マイクロコンピュータ１１１によって顔インデックスが記録される（記録制御）。ステップＳ４０６で主被写体用の要求条件に合致しないと判定すると、内部変数フラグは０のままステップＳ４０８へ進む。 In step S406, it is determined whether or not the face information acquired in step S403 matches the main subject requirement condition. This determination is a determination as to whether or not a new face of the main subject for which a face index recording request is to be made has appeared. Since the tracking process is performed on the main subject as described above, the determination as to whether or not to make a face index recording request because of the appearance of the main subject depends on conditions different from the determination for a face other than the main subject described later. Done. If it is determined that the requirement for the main subject is met, the process proceeds to step S407, the internal variable flag is set to 1, and the process proceeds to step S408. By setting the internal variable flag to 1 in this way, a face index recording request is transmitted in step S421 described later, and the face index is recorded by the recorder control microcomputer 111 (recording control). If it is determined in step S406 that the required condition for the main subject is not met, the internal variable flag remains 0 and the process proceeds to step S408.

In step S408, based on the face information acquired in step S403, it is determined whether, among the valid face IDs in the face management table that have no disappearance time recorded, there is a face ID whose face is no longer detected in the current frame. A valid face ID with no disappearance time recorded is the face ID of a face that was detected up to the immediately preceding frame. If every such face is still detected, that is, if no face has disappeared, the process proceeds to step S410. If there is a face ID whose face is not detected, that is, if a face has disappeared from the current frame, the process proceeds to step S409.

In step S409, the time of the current frame is recorded in the face management table as the disappearance time of the face ID of the face that disappeared. A face detected in earlier frames may go undetected for a short while even though the person is in fact still being captured, for example because the person briefly turned away. For this reason, even when a face is no longer detected, its face ID is kept valid for a short time after the disappearance, and the disappearance time is recorded so that the face information is retained. After the disappearance time is recorded, the process proceeds to step S410.

ステップＳ４１０では、ステップＳ４０３で取得した顔情報より、顔管理テーブルの有効な顔ＩＤのうち、消失時間の記録されている顔ＩＤに該当する顔が現在のフレームにあるか否かを判定する（第１の判定手段）。この判定は、Ｓ４０３で取得した顔情報に含まれる顔が、消失時間の記録されている有効な顔ＩＤに関して記録されている位置・大きさのどちらか一方、あるいは双方の情報に所定の閾値で合致するか否かによって判定する。消失時間の記録されている顔ＩＤに該当する顔があると判定するとステップＳ４１１へ進み、該当する顔がないと判定するとステップＳ４１２に進む。 In step S410, it is determined from the face information acquired in step S403 whether or not a face corresponding to the face ID in which the disappearance time is recorded is present in the current frame among the valid face IDs in the face management table ( First determination means). In this determination, the face included in the face information acquired in S403 has a predetermined threshold value in one or both of the positions and sizes recorded for the effective face ID in which the disappearance time is recorded. Judgment is made based on whether or not they match. If it is determined that there is a face corresponding to the face ID in which the disappearance time is recorded, the process proceeds to step S411. If it is determined that there is no corresponding face, the process proceeds to step S412.

In step S411, the disappearance time of the matching face ID is cleared, and the time of the current frame is recorded as the latest appearance time. The face is thus regarded as the same face that was detected earlier and has returned after a brief disappearance. This suppresses problems such as face index recording requests being issued repeatedly for the face of a single person who simply moves a lot.

ステップＳ４１２では、顔管理テーブルの有効な顔ＩＤのうち、消失時間から４秒が経過した顔ＩＤがあるか否かを判定する。消失時間から４秒が経過した顔ＩＤがあると判定するとステップＳ４１３に進み、消失時間から４秒が経過した顔ＩＤが無いと判定するとステップＳ４１４に進む。 In step S412, it is determined whether there is a face ID for which 4 seconds have elapsed from the disappearance time among the valid face IDs in the face management table. If it is determined that there is a face ID for which 4 seconds have elapsed from the disappearance time, the process proceeds to step S413, and if it is determined that there is no face ID for which 4 seconds have elapsed from the disappearance time, the process proceeds to step S414.

In step S413, all information recorded for any face ID whose disappearance time is 4 or more seconds in the past is erased, making it an invalid face ID. The face ID is invalidated 4 seconds after the disappearance because, once 4 seconds have passed, the disappearance is unlikely to be due to a small movement such as turning the face away; it is more likely that the person whose face was detected has actually left the range being captured. The 4-second threshold is not essential: it may be set as appropriate according to how long a face must remain undetected before the person is considered to have left the captured range. After step S413, the process proceeds to step S414.

ステップＳ４１４では、ステップＳ４０３で取得した顔情報より、顔管理テーブルの有効な顔ＩＤの何れにも該当しない顔があるか否かを判定する。顔管理テーブルの有効な顔ＩＤの何れにも該当しない顔がある場合とは、現在顔管理テーブルで管理していない、新しい顔が検出された場合であり、この場合はステップＳ４１５へ進む。一方、ステップＳ４１４で偽と判定された場合は新しい顔が検出されたわけではないため、ステップＳ４１７へ進む。 In step S414, it is determined from the face information acquired in step S403 whether there is a face that does not correspond to any of the valid face IDs in the face management table. The case where there is a face that does not correspond to any of the valid face IDs in the face management table is a case where a new face that is not currently managed in the face management table is detected. In this case, the process proceeds to step S415. On the other hand, if it is determined to be false in step S414, a new face has not been detected, and the process proceeds to step S417.

In step S415, a predetermined algorithm determines whether the newly detected face becomes the main subject (third determination means). If it does, the face is not managed in the face management table; instead, the information needed to track the main subject is recorded (held) in the memory 109, and the process proceeds to step S417. If the newly detected face does not become the main subject, the process proceeds to step S416, where a new face ID is issued for the face and its appearance time, latest appearance time (equal to the appearance time), and position and size are recorded in the face management table. The new face is assigned the lowest-numbered invalid face ID, which thereby becomes a valid face ID. The process then proceeds to step S417.

In step S417, it is determined whether, among the valid face IDs in the face management table with no disappearance time recorded, there is a face ID for which 1 second has elapsed since the latest appearance time (second determination means). In other words, this determines whether there is a face that has been detected continuously for 1 second since its latest appearance. If there is no such face ID, the process proceeds to step S421; if there is, the process proceeds to step S418. Continuous detection for 1 second is required for two reasons: to prevent a face index recording request from being made for a face that was detected only momentarily, and to guarantee that the face actually appears in the I picture used for the face index display during playback. Making the request only for faces detected continuously for 1 second or more prevents face indexes from being recorded for faces with extremely short detection times and keeps the number of face indexes from growing more than necessary. The 1-second value is a design choice; any predetermined duration that is at least the I picture interval may be used.

In step S418, the face management table is consulted to determine whether the requested flag is set for the face ID of the face that has been detected continuously for 1 second since its latest appearance time. If the flag is set, a face index recording request was already made for this face during an appearance earlier than the latest one, so no request is made this time and the process proceeds to step S421. A face ID with the requested flag set necessarily has an appearance earlier than its latest appearance, so it is a face ID for which the determination in step S410 was true. If the requested flag is not set, the process proceeds to step S419.

ステップＳ４１９では、まったく新しい顔で、出現時間から１秒以上継続して検出され続けた顔があるため、内部変数フラグを１にセットする。このように内部変数フラグを１にセットすることで、後述するステップＳ４２１で顔インデックスの記録要求が送信され、レコーダ制御用マイクロコンピュータ１１１によって顔インデックスが記録される（記録制御）。そしてステップＳ４２０で顔管理テーブルの要求済みフラグを１とする。続いてステップＳ４２１へ進む。 In step S419, since there is a completely new face that has been detected continuously for more than 1 second from the appearance time, the internal variable flag is set to 1. By setting the internal variable flag to 1 in this way, a face index recording request is transmitted in step S421 described later, and the face index is recorded by the recorder control microcomputer 111 (recording control). In step S420, the requested flag in the face management table is set to 1. Then, it progresses to step S421.

ステップＳ４２１では、現在のフレームを特定する情報と、メモリ１０９の内部変数フラグの情報をレコーダ制御用マイクロコンピュータ１１１に送信する。この際、内部変数フラグが１であった場合は顔インデックス記録要求となる。ステップＳ４２１の処理を終了すると、ステップＳ４０２に戻って次のフレームについての処理を行う。 In step S421, information for specifying the current frame and information on the internal variable flag in the memory 109 are transmitted to the recorder control microcomputer 111. At this time, if the internal variable flag is 1, it is a face index recording request. When the process of step S421 is completed, the process returns to step S402 and the process for the next frame is performed.
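The per-frame handling of faces other than the main subject (steps S408 to S420) can be summarized in a short sketch. It is a simplification under stated assumptions: each face is represented by a hypothetical label that stands in for the position/size match of step S410, main-subject handling (S404 to S407, S415) is omitted, and `process_frame` and the dictionary keys are illustrative names. The 4-second and 1-second values are the ones given in the text.

```python
GRACE_SECONDS = 4.0        # S412: keep a vanished face this long
MIN_VISIBLE_SECONDS = 1.0  # S417: continuous detection required


def process_frame(table: dict, visible: set, now: float) -> bool:
    """Update the table for one frame; return True if a face index
    recording request should be raised for this frame."""
    # S408/S409: faces detected until the previous frame that are now missing.
    for key, entry in table.items():
        if entry["lost"] is None and key not in visible:
            entry["lost"] = now

    # S410/S411: a face within its grace period has come back.
    for key, entry in table.items():
        if entry["lost"] is not None and key in visible:
            entry["lost"] = None
            entry["latest"] = now

    # S412/S413: invalidate faces that have been missing for 4 seconds.
    expired = [k for k, e in table.items()
               if e["lost"] is not None and now - e["lost"] >= GRACE_SECONDS]
    for key in expired:
        del table[key]

    # S414/S416: register faces not yet managed in the table.
    for key in visible:
        if key not in table:
            table[key] = {"first": now, "latest": now,
                          "lost": None, "requested": False}

    # S417 to S420: request once per face, after 1 second of detection.
    request = False
    for entry in table.values():
        if (entry["lost"] is None
                and now - entry["latest"] >= MIN_VISIBLE_SECONDS
                and not entry["requested"]):
            entry["requested"] = True
            request = True
    return request
```

Called once per frame with the set of faces currently visible, the function returns True only on the frame at which a face first completes 1 second of continuous detection; a face that drops out for less than 4 seconds and returns does not trigger a second request.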

<Face index information generation processing>
FIG. 5 shows a flowchart of the face index information generation processing in step S205 of FIG. 2. This processing is realized by the recorder control microcomputer 111 reading out the program recorded in the flash ROM 113, loading it into the memory 114 as a work area, and executing it.

ステップＳ５０１ではまず、タイマーをスタートする。 In step S501, first, a timer is started.

ステップＳ５０２では、図４のステップＳ４２１で送信された情報を受信し、メモリ１１４に一時記憶する。ステップＳ５０３では、ステップＳ５０１でタイマーがスタートしてから０．５秒が経過したか否かを判定する。０．５秒経過していないと判定するとステップＳ５０２に戻り、更に情報を受信して一時記憶する。つまり、ステップＳ４２１で送信された情報はメモリ１１４に０．５秒期間分蓄積する。ステップＳ５０３で０．５秒が経過したと判定するとステップＳ５０４に進む。 In step S502, the information transmitted in step S421 of FIG. 4 is received and temporarily stored in the memory 114. In step S503, it is determined whether 0.5 second has elapsed since the timer was started in step S501. If it is determined that 0.5 seconds have not elapsed, the process returns to step S502 to further receive and temporarily store information. That is, the information transmitted in step S421 is accumulated in the memory 114 for a period of 0.5 seconds. If it is determined in step S503 that 0.5 second has elapsed, the process proceeds to step S504.

ステップＳ５０４では、メモリ１１４に蓄積された０．５秒期間分の情報の中に、顔インデックス記録要求があるか否かを判定する。顔インデックス記録要求があった場合にはステップＳ５０５へ進み、無かった場合にはステップＳ５０１に戻り、次の０．５秒期間を計測するためにタイマーをリセットして再スタートし、処理を繰り返す。 In step S504, it is determined whether or not there is a face index recording request in the information for 0.5 second period accumulated in the memory 114. If there is a face index recording request, the process proceeds to step S505. If not, the process returns to step S501, and the timer is reset and restarted to measure the next 0.5 second period, and the process is repeated.

In step S505, the information identifying the frame for which the face index recording request was made is looked up in the 0.5-second period of information accumulated in the memory 114. The PTS indicating the position of the I picture immediately preceding that frame is then recorded in the face index information prepared in the memory 114. To identify each face index, face index numbers are assigned in recording order and the face indexes are managed in time series. When step S505 is complete, the process returns to step S501, where the timer is reset and restarted to measure the next 0.5-second period, and the processing is repeated.
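The windowing of steps S501 to S505 can be sketched as below. Several details are assumptions rather than statements from the source: the I picture PTS values are taken to be available as a sorted list, an I picture located exactly at the requested frame is accepted as the "immediately preceding" one, only the first request in a 0.5-second window is used, and the names `preceding_i_picture` and `build_face_index` are illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def preceding_i_picture(i_picture_pts: List[int], frame_pts: int) -> int:
    """PTS of the last I picture at or before frame_pts (list must be sorted)."""
    pos = bisect_right(i_picture_pts, frame_pts)
    if pos == 0:
        raise ValueError("no I picture at or before the requested frame")
    return i_picture_pts[pos - 1]


def build_face_index(windows: List[List[Tuple[int, int]]],
                     i_picture_pts: List[int]) -> List[Dict[str, int]]:
    """Each window holds the (frame_pts, flag) pairs received in 0.5 seconds.

    A window containing a flag of 1 yields one face index entry that
    stores the PTS of the I picture preceding the requested frame.
    """
    face_index: List[Dict[str, int]] = []
    for window in windows:
        requested = [pts for pts, flag in window if flag == 1]
        if not requested:
            continue  # S504: no request in this window
        face_index.append({
            "number": len(face_index),  # assigned in recording order
            "pts": preceding_i_picture(i_picture_pts, requested[0]),
        })
    return face_index
```

Storing the PTS of an I picture rather than that of the requested frame itself means playback can start at a frame that decodes without reference to earlier frames.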

このように、撮影中にメモリ１１４に顔インデックス情報を生成、更新していき、図２のステップＳ２０７、Ｓ２０９で前述したように、撮影終了時点の顔インデックス情報をメモリカード１１６の管理情報ファイルに記録する。 In this way, face index information is generated and updated in the memory 114 during shooting, and the face index information at the end of shooting is stored in the management information file of the memory card 116 as described above in steps S207 and S209 in FIG. Record.

<Management information file>
FIG. 6 shows a configuration example of the management information file recorded on the memory card 116 in steps S207 and S209 of FIG. 2. The management information file 601 consists of basic information 602, a search table 603, a model information table 604, and face index information 605. The basic information 602 records basic information such as the compression method of the compressed video data, the frame rate, and the number of pixels. The search table 603 is needed for special playback such as fast-forward and for displaying the frame at a specified time. It records the IDs of all I pictures included in the moving image file, the PTS of each I picture, and information indicating the byte offset of each I picture from the beginning of the moving image file. If the moving image file is divided into packets, the table may also indicate which packet contains each I picture. With the search table 603, the position of an I picture in the moving image file can be determined from its PTS alone.

機種情報テーブル６０４はメーカーＩＤ、機種ＩＤが記録される領域であり、メーカー、製品ごとにユニークなＩＤが付与される。 The model information table 604 is an area in which a manufacturer ID and model ID are recorded, and a unique ID is assigned to each manufacturer and product.

顔インデックス情報６０５には、顔インデックス情報ＩＤ６０６、顔インデックス数６０７、各顔インデックス６０８が記録される。 In the face index information 605, a face index information ID 606, a face index number 607, and each face index 608 are recorded.

顔インデックス情報ＩＤ６０６は、この領域が顔インデックス情報を記録している部分であることを示す識別子が記録される。この識別子を認識できる再生装置であれば、この領域の情報を顔インデックス情報として利用することができ、顔インデックス表示を行うことができる。顔インデックス情報ＩＤを識別できない（このＩＤが顔インデックス情報を示すものであることを知らない）再生装置では、顔インデックス情報６０５は利用されない。 In the face index information ID 606, an identifier indicating that this area is a portion where face index information is recorded is recorded. If the playback apparatus can recognize this identifier, the information in this area can be used as face index information, and face index display can be performed. In a playback apparatus that cannot identify the face index information ID (it does not know that this ID indicates face index information), the face index information 605 is not used.

The number of face indexes 607 indicates how many face indexes are recorded in the face indexes 608; at most N face indexes are recorded, where N is a predetermined maximum value. The upper limit is set because an abnormal increase in face information is disadvantageous in terms of working memory capacity and search speed.

Each face index 608 is information specifying an image to be displayed in the face index display. For each face index, the face index number and the PTS temporarily stored in the memory 114 in step S605 of the face index information generation process of FIG. 5 described above are recorded. Using this PTS, the position in the moving image file of the I picture of each image to be displayed in the face index display can be found by searching the search table 603. This makes it possible to realize playback functions such as starting playback from the position of an image designated in the face index display (an image specified by a face index).
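The relationship between the face indexes, the upper limit N, and the search table can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names, the value chosen for N, and the in-memory layout are all hypothetical, since the source does not specify a binary format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_FACE_INDEXES = 8  # hypothetical value for the predetermined maximum N

@dataclass
class SearchEntry:
    picture_id: int   # ID of the I picture
    pts: int          # PTS of the I picture
    byte_offset: int  # position from the beginning of the moving image file

@dataclass
class FaceIndex:
    number: int       # face index number
    pts: int          # PTS of the frame to show in the face index display

@dataclass
class ManagementInfo:
    search_table: List[SearchEntry] = field(default_factory=list)
    face_indexes: List[FaceIndex] = field(default_factory=list)

    def add_face_index(self, pts: int) -> Optional[FaceIndex]:
        """Append a face index unless the upper limit N has been reached."""
        if len(self.face_indexes) >= MAX_FACE_INDEXES:
            return None
        entry = FaceIndex(number=len(self.face_indexes), pts=pts)
        self.face_indexes.append(entry)
        return entry

    def locate(self, face_index_number: int) -> Optional[int]:
        """Resolve a face index to a byte offset through the search table."""
        pts = self.face_indexes[face_index_number].pts
        for e in self.search_table:
            if e.pts == pts:
                return e.byte_offset
        return None

info = ManagementInfo(search_table=[SearchEntry(0, 0, 0),
                                    SearchEntry(1, 45000, 120000)])
first = info.add_face_index(45000)
```

Numbering the face indexes by their order of insertion is an assumption made to keep the sketch short.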

The part of the file name of this management information file that precedes the extension is made identical to the corresponding part of the file name of the associated moving image file, so that the two can be matched; the management information file is a separate file from the moving image file. Alternatively, the management information may be added as header information inside the moving image file instead of being kept as a separate file.

Using this management information file, the digital video camera of the present embodiment can perform the above-described face index display in the playback mode (display control). When face index display in the playback mode is instructed by operating the operation switch group 112, the digital video camera searches for the management information file whose file name, up to the extension, is the same as that of the moving image file targeted for face index display. Based on the face index information 605 included in the management information file that is found, the camera then performs control so that a face index display such as that of FIG. 8 is presented. The images 804 to 808 in FIG. 8 show the frames indicated by the PTSs of face index numbers 0 to 4 recorded in this management information file. When the user selects one of these images and a playback instruction from the user is accepted, playback of the moving image starts from the frame indicated by the PTS of the face index number corresponding to the selected image (playback control).
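The file name matching described above can be sketched as follows. The function name, the file names, and the extensions are hypothetical; the source only states that the part before the extension is shared between the two files.

```python
from pathlib import PurePath
from typing import Iterable, Optional

def find_management_file(video_name: str, names: Iterable[str]) -> Optional[str]:
    """Return the first name that shares the stem of `video_name` but has a
    different extension, or None if there is no such name."""
    video = PurePath(video_name)
    for name in names:
        candidate = PurePath(name)
        # Same part before the extension, but not the moving image file itself.
        if candidate.stem == video.stem and candidate.suffix != video.suffix:
            return name
    return None

# Hypothetical directory listing on the memory card.
listing = ["MOV002.INF", "MOV001.INF", "MOV001.MPG"]
match = find_management_file("MOV001.MPG", listing)
```

The sketch works on a list of names rather than a real directory so that it can run without a memory card; a real implementation would also have to decide how to handle several files sharing one stem.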

<Timing chart of face index recording requests when the present application is applied>
FIG. 7 shows the timing of the recording request for each face index included in the face index information according to the present invention when the same moving image as that described above with reference to FIG. 9 is shot.

The periods 901 to 903, 911 to 915, and 921 to 923 have the same meaning and the same timing as those described above with reference to FIG. 9, so their description is omitted.

The face index numbers 950 indicate the timing of the face index recording requests made by the face index recording request process described with reference to FIG. 4, together with the face index number of each face index. With the face index recording request process of the present application, four face index recording requests are made, as in the example of FIG. 7, and four face indexes with face index numbers 0 to 3 are recorded in the face index information. In the face index display during playback, four images corresponding to face index numbers 0 to 3 are then displayed on the timeline.

The timing of each face index recording request is described in detail below.

In the face detection of the periods 901 and 911, the main subject and the face with face ID 1 are newly detected and continue to be detected for 1 second or longer, so a face index is recorded with face index number 0.

In the face detection of the period 912, a face is detected less than 4 seconds after the detection of the period 911 was interrupted, and the same ID is assigned. That is, in step S410 of FIG. 4 the face is determined to correspond to a face ID that has a disappearance time (a face regarded as the same person as face ID 1, which was detected in the period 911 and then disappeared). Face ID 1 already issued a face index recording request in the period 911, so step S418 of FIG. 4 determines that the requested flag is set, and no new face index recording request is made. This avoids issuing a face index recording request when the face of the same person merely happened to go undetected for a short time, and keeps the number of face indexes from growing more than necessary.

In the face detection of the period 913, a face is detected 4 seconds or more after the detection of the period 912 was interrupted. Therefore, although the same ID, face ID 1, is assigned, face ID 1 has expired once, and the face detected in the period 913 has not been determined to be the same face as the one detected in the period 912. Nevertheless, because the period 913 is shorter than 1 second, the determination in step S417 of FIG. 4 never becomes Yes, and no new face index recording request is made. In this way, face index recording requests are omitted for face detection periods with extremely short detection times, which keeps the number of face indexes from growing more than necessary.

In the detection of the period 914, a face is detected continuously for 1 second or longer, 4 seconds or more after the detection of the period 913 was interrupted, so a face index is recorded with face index number 2.

In the face detection of the period 915, no face index recording request is made, for the same reason as in the face detection of the period 912.

In the period 921, a face to which face ID 2 is assigned is newly detected and is detected continuously for 1 second or longer, so a face index is recorded with face index number 1. In the face detection of the period 922, no face index recording request is made, for the same reason as in the face detection of the period 912. In the detection of the period 923, a face is detected continuously for 1 second or longer, 4 seconds or more after the detection of the period 922 was interrupted, so a face index is recorded with face index number 3.

In this way, in the example of FIG. 7 to which the present invention is applied, four face indexes with face index numbers 0 to 3 are recorded. The recorded moving image is the same as that described above with reference to FIG. 9, but the number of face indexes is smaller than in the conventional case described with reference to FIG. 9. Thus, according to the present invention, the number of face indexes can be kept from increasing unnecessarily without performing tracking processing, which has a high processing load, on faces other than the main subject.
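The decision rules walked through above for a face other than the main subject can be summarized in a minimal sketch. It follows only what the text states: a request needs 1 second or more of continuous detection, a gap shorter than 4 seconds keeps the face ID and its requested flag, and a gap of 4 seconds or more expires them. The function name, the representation of detection periods as (start, end) pairs in seconds, and the sample times are hypothetical. The position and size match is assumed to succeed, and the request is assumed to be issued at the moment the 1 second threshold is reached.

```python
from typing import List, Tuple

MIN_DETECTION_S = 1.0  # continuous detection needed before a request
EXPIRY_S = 4.0         # held face information expires after this gap

def recording_requests(periods: List[Tuple[float, float]]) -> List[float]:
    """Return the times at which face index recording requests are made for
    one face, given its detection periods in chronological order."""
    requests = []
    requested = False      # stands in for the requested flag
    previous_end = None
    for start, end in periods:
        # A gap of 4 seconds or more expires the face ID, so the face is
        # treated as newly detected and the flag is cleared.
        if previous_end is not None and start - previous_end >= EXPIRY_S:
            requested = False
        # Request only once per face ID, and only for long enough periods.
        if not requested and end - start >= MIN_DETECTION_S:
            requests.append(start + MIN_DETECTION_S)
            requested = True
        previous_end = end
    return requests

# Made-up times that mimic the pattern of periods 911 to 915 and 921 to 923.
face_id_1 = [(0, 5), (7, 10), (15, 15.5), (20, 25), (27, 30)]
face_id_2 = [(2, 6), (8, 12), (22, 27)]
all_requests = sorted(recording_requests(face_id_1) + recording_requests(face_id_2))
```

With these made-up times the sketch yields four requests in total, matching the count of face index numbers 0 to 3 in the example of FIG. 7; the actual durations in FIG. 7 are not given in the text.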

In the present embodiment, an example has been described in which the present invention is applied during moving image shooting mode processing, that is, at the time of image capture by an imaging apparatus capable of shooting moving images. However, the invention is not limited to the time of image capture and can also be applied at the time of playback. A moving image that has already been captured and recorded may be played back, face detection processing may be performed on each frame, processing corresponding to the recording request process of FIG. 4 and the face index information generation process of FIG. 5 may be carried out, and a management information file such as that of FIG. 6 may be created.

As described above, according to the present embodiment, the increase in face index information that would otherwise result from the absence of tracking processing for faces other than the main subject can be prevented by adding simple processing. That is, the position at which a face newly appears can be recorded at shooting time with a small processing load and with higher reliability, so that faces other than the main subject do not become more numerous than necessary in the face index display during playback.

Consequently, the number of thumbnails of faces other than the main subject can be appropriately limited in a GUI such as a timeline, and a function that is convenient for the user can be realized.

The processing of the above-described embodiment may also be realized by providing a system or apparatus with a storage medium on which the program code of software embodying each function is recorded. The functions of the above-described embodiment can then be realized by a computer (or a CPU or MPU) of the system or apparatus reading out and executing the program code stored in the storage medium. In this case, the program code read from the storage medium itself realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention. As a storage medium for supplying such program code, for example, a floppy (registered trademark) disk, a hard disk, an optical disk, or a magneto-optical disk can be used. Alternatively, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, a ROM, or the like can be used.

Moreover, the functions of the above-described embodiments are not realized only by the computer executing the program code it has read. The invention also covers the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

Furthermore, the program code read from the storage medium may be written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer. The invention also covers the case where a CPU or the like provided in the function expansion board or function expansion unit then performs part or all of the actual processing based on instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

A block diagram showing the configuration of a digital video camera as an embodiment of the present invention.
A flowchart explaining the moving image shooting mode processing in the present embodiment.
An example of the face management table used in the face index recording request process in the present embodiment.
A flowchart explaining the face index recording request process in the present embodiment.
A flowchart explaining the face index information generation process in the present embodiment.
An example of the management information file generated by the moving image shooting mode processing in the present embodiment.
A timing chart showing the timing of the face index recording requests made by the face index recording request process in the present embodiment.
A display example in which face images detected in a moving image are displayed as a list.
A timing chart showing the timing of conventional face index recording requests.

DESCRIPTION OF SYMBOLS
101 Lens unit
102 Image sensor
103 Camera signal processing
104 Face detection circuit
105 Video signal processing circuit
106 Compression/decompression circuit
107 Liquid crystal panel
108 Flash ROM
109 Memory
110 Microcomputer for camera control
111 Microcomputer for recorder control
112 Operation switch group
113 Flash ROM
114 Memory
115 Card I/F
116 Memory card
120 Bus
121 Bus

Claims

An image processing apparatus comprising:
face detection means for detecting a face from each frame of a moving image; and
control means for performing control such that, for a face determined to be a main subject among the faces detected by the face detection means, when a first period in which the main subject is detected by the face detection means, a second period that follows the first period and satisfies a first condition including a matching condition, relating to the main subject, other than face position, size, and time, and a third period in which the main subject is detected again by the face detection means are continuous in the moving image, information specifying a frame for the first period is recorded as face index information and information specifying a frame for the third period is not recorded as the face index information, and such that, for a specific face other than the main subject among the faces detected by the face detection means, when a first period in which the specific face is detected by the face detection means, a second period within a predetermined time after the specific face is no longer detected, and a third period in which a face satisfying a second condition relating to at least one of the position and the size of the specific face detected in the first period is detected by the face detection means are continuous in the moving image, information specifying a frame for the first period is recorded as the face index information and information specifying a frame for the third period is not recorded as the face index information.

The image processing apparatus according to claim 1, wherein the first condition includes matching by luminance relating to the main subject.

The image processing apparatus according to claim 2, wherein the control means does not perform matching by luminance in the second period for the specific face other than the main subject.

The image processing apparatus according to claim 1, wherein the control means records the face index information when the face detection means detects the specific face other than the main subject continuously for a specific time or longer, and does not record the face index information when the specific face is not detected continuously for the specific time or longer.

The image processing apparatus according to any one of claims 1 to 4, further comprising holding means for holding area information on at least one of the position and the size of the specific face as detected immediately before the specific face is no longer detected,
wherein the second condition is that a face matching the area information held in the holding means is detected within the predetermined time after the specific face is no longer detected.

An image processing apparatus in which the holding means erases the information held regarding a face when the predetermined time elapses without a face matching the held area information being detected.

The image processing apparatus according to claim 1, further comprising imaging means capable of shooting a moving image,
wherein the face detection means detects a face from each frame of the moving image being shot while the moving image is being shot by the imaging means.

The image processing apparatus according to claim 1, wherein the moving image is a moving image compressed by a moving image compression method including intra-frame encoded pictures and inter-frame encoded pictures, and
the control means performs control so as to record information specifying the position of an intra-frame encoded picture as the face index information.

The image processing apparatus according to claim 4, wherein the moving image is a moving image compressed by a moving image compression method including intra-frame encoded pictures and inter-frame encoded pictures,
the control means performs control so as to record information specifying the position of an intra-frame encoded picture as the face index information, and
the specific time is longer than the interval at which the intra-frame encoded pictures appear in the moving image.

The image processing apparatus according to claim 1, further comprising:
display control means for performing control so as to display the image of a frame specified by the face index information recorded by the control means;
accepting means for accepting, from a user, an operation of selecting one image from the images displayed by the display control means; and
reproduction control means for performing control so as to reproduce the moving image from the position of the frame of the selected image.

An image processing apparatus comprising:
face detection means for detecting a face from each frame of a moving image; and
control means for performing control such that, for a face determined to be a main subject among the faces detected by the face detection means, when a first period in which the main subject is detected by the face detection means, a second period that follows the first period and satisfies a first condition including a matching condition, relating to the main subject, other than face position, size, and time, and a third period in which the main subject is detected again by the face detection means are continuous in the moving image, the third period is treated not as a period in which the main subject has newly appeared but as part of a series of periods in which the main subject exists, and
such that, for a specific face other than the main subject among the faces detected by the face detection means, when a first period in which the specific face is detected by the face detection means, a second period within a predetermined time after the specific face is no longer detected, and a third period in which a face satisfying a second condition relating to at least one of the position and the size of the specific face detected in the first period is detected by the face detection means are continuous in the moving image, the third period is treated not as a period in which the specific subject has newly appeared but as part of a series of periods relating to the specific subject.

A control method for an image processing apparatus, comprising:
a face detection step of detecting a face from each frame of a moving image; and
a control step of performing control such that, for a face determined to be a main subject among the faces detected in the face detection step, when a first period in which the main subject is detected in the face detection step, a second period that follows the first period and satisfies a first condition including a matching condition, relating to the main subject, other than face position, size, and time, and a third period in which the main subject is detected again in the face detection step are continuous in the moving image, information specifying a frame for the first period is recorded as face index information and information specifying a frame for the third period is not recorded as the face index information, and
such that, for a specific face other than the main subject among the faces detected in the face detection step, when a first period in which the specific face is detected in the face detection step, a second period within a predetermined time after the specific face is no longer detected, and a third period in which a face satisfying a second condition relating to at least one of the position and the size of the specific face detected in the first period is detected in the face detection step are continuous in the moving image, information specifying a frame for the first period is recorded as the face index information and information specifying a frame for the third period is not recorded as the face index information.

A control method for an image processing apparatus, comprising:
a face detection step of detecting a face from each frame of a moving image; and
a control step of performing control such that, for a face determined to be a main subject among the faces detected in the face detection step, when a first period in which the main subject is detected in the face detection step, a second period that follows the first period and satisfies a first condition including a matching condition, relating to the main subject, other than face position, size, and time, and a third period in which the main subject is detected again in the face detection step are continuous in the moving image, the third period is treated not as a period in which the main subject has newly appeared but as part of a series of periods in which the main subject exists, and
such that, for a specific face other than the main subject among the faces detected in the face detection step, when a first period in which the specific face is detected in the face detection step, a second period within a predetermined time after the specific face is no longer detected, and a third period in which a face satisfying a second condition relating to at least one of the position and the size of the specific face detected in the first period is detected in the face detection step are continuous in the moving image, the third period is treated not as a period in which the specific subject has newly appeared but as part of a series of periods relating to the specific subject.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 11.

A computer-readable recording medium storing a program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 11.