JP2007299297A

JP2007299297A - Image composition device and control method thereof

Info

Publication number: JP2007299297A
Application number: JP2006128057A
Authority: JP
Inventors: Koichi Tanaka; 康一田中; Masaya Tamaru; 雅也田丸; Sumie Mikami; 澄重三上
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2006-05-02
Filing date: 2006-05-02
Publication date: 2007-11-15

Abstract

PROBLEM TO BE SOLVED: To obtain a subject image in which every person and target object is shown at its best shot when the subject image includes a plurality of persons or target object images.

SOLUTION: Three frames of subject images I1, I2 and I3 are obtained by consecutive photographing. Among face images 61, 71 and 81 of the same subject, the face image 71 with the best expression is selected. Similarly, the face image 62 with the best expression is selected among face images 62, 72 and 82 of the same subject, and the face image 83 with the best expression is selected among face images 63, 73 and 83 of the same subject. The selected face images 71, 62 and 83 are combined with a background image 101 of the subject image I2 of the second frame to obtain a composite image. As a result, the faces in the composite image mostly have good expressions.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an image composition device and a control method thereof.

When a subject contains many people, as in a group photograph, it is difficult to capture a good expression on every face, because someone is likely to have their eyes closed. One known approach to group photographs therefore images each person individually and combines the results (Patent Document 1). Imaging each person individually, however, does not guarantee a good image of each person.
Patent Document 1: JP 2001-251550 A

Another known approach shoots continuously, evaluates face orientation, eye closure, and the like for a specific person image in the resulting frames of subject images, and selects the subject image that contains the highly rated person image (Patent Documents 2 and 3).
Patent Documents 2 and 3: JP 2005-45600 A; JP 2005-45601 A

However, this merely selects the subject image that is the best shot of a single person.

An object of the present invention is to obtain a subject image in which every person and target image is at its best shot when the subject image includes a plurality of persons or target object images.

An image composition device according to a first invention comprises: an imaging device that images a subject and outputs image data representing the subject image; face detecting means for detecting two or more different face image portions in the subject image represented by the image data output from the imaging device; expression evaluation value calculating means for calculating an expression evaluation value for each of the two or more different face image portions detected by the face detecting means; control means for controlling the imaging device, the face detecting means, and the expression evaluation value calculating means so that the imaging by the imaging device, the detection by the face detecting means, and the calculation by the expression evaluation value calculating means are repeated a plurality of times in succession; face image determining means for performing, for each of the two or more different face image portions, a process of determining a face image portion with a high expression evaluation value from among the plural face image portions of the same face detected in the plural frames of subject images obtained under the control of the control means; and generating means for generating one frame of composite subject image by combining the two or more different face image portions determined by the face image determining means with a background image portion that excludes the face image portions.

The first invention also provides a control method suited to the above image composition device. In this method, an imaging device images a subject and outputs image data representing the subject image; face detecting means detects two or more different face image portions in the subject image represented by the image data output from the imaging device; expression evaluation value calculating means calculates an expression evaluation value for each of the two or more different face image portions detected by the face detecting means; control means controls the imaging device, the face detecting means, and the expression evaluation value calculating means so that the imaging by the imaging device, the detection by the face detecting means, and the calculation by the expression evaluation value calculating means are repeated a plurality of times in succession; face image determining means performs, for each of the two or more different face image portions, a process of determining a face image portion with a high expression evaluation value from among the plural face image portions of the same face detected in the plural frames of subject images obtained under the control of the control means; and generating means generates one frame of composite subject image by combining the two or more different face image portions determined by the face image determining means with a background image portion that excludes the face image portions.

According to the first invention, a subject is imaged and image data representing the subject image is obtained. Two or more different face image portions are detected in the subject image represented by the obtained image data, and an expression evaluation value is calculated for each of them. This imaging, face image portion detection, and expression evaluation value calculation are repeated a plurality of times. For each of the two or more face image portions, a face image portion with a high expression evaluation value is determined from among the plural face image portions of the same face detected in the plural frames of subject images. The determined face image portions and a background image portion that excludes the face image portions are combined to generate one frame of subject image.

According to the first invention, obtaining plural frames of subject images yields plural face image portions of the same face. Because the face image portion with a high expression evaluation value, determined from among those plural face image portions of the same face, is combined with the background image portion, every face image portion in the composite image has a high expression evaluation value. Even when the image contains several face image portions, all of them have a high expression evaluation value (for example, a good expression).
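The per-face selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `person_id` labels, the tuple layout, and the function name `pick_best_faces` are assumptions, and the sketch presumes that faces have already been matched across frames (for example, by their positions).

```python
from typing import Dict, List, Tuple

# One detected face: (person_id, expression_score). Both fields are
# hypothetical; the description does not define a data layout.
Face = Tuple[str, float]


def pick_best_faces(frames: List[List[Face]]) -> Dict[str, Tuple[int, float]]:
    """For each person, return (frame_index, score) of the highest-scoring face."""
    best: Dict[str, Tuple[int, float]] = {}
    for frame_index, faces in enumerate(frames):
        for person_id, score in faces:
            # Strict '>' keeps the earliest frame when two scores tie.
            if person_id not in best or score > best[person_id][1]:
                best[person_id] = (frame_index, score)
    return best


frames = [
    [("A", 0.9), ("B", 0.4)],  # frame 0
    [("A", 0.6), ("B", 0.8)],  # frame 1
]
best = pick_best_faces(frames)
```

Each selected face would then be pasted onto a single background image portion to form the composite frame.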

Designating means may further be provided for designating that the expression evaluation value calculating means calculate an evaluation value for one of a smiling face, an angry face, an eyes-closed face, and a crying face.

Warning means may further be provided for issuing a warning when, in the composite image generated by the generating means, the number of face image portions among the two or more different face image portions whose expression evaluation value is at or above a threshold is equal to or less than a predetermined value.
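The warning condition can be illustrated with a short sketch. The function name `should_warn` and its parameter names are hypothetical, and the "at or below" comparison follows the wording of this paragraph.

```python
from typing import Iterable


def should_warn(scores: Iterable[float], threshold: float, limit: int) -> bool:
    """True when the count of faces scoring >= threshold is at or below limit."""
    good_faces = sum(1 for score in scores if score >= threshold)
    return good_faces <= limit
```

For example, with scores `[0.9, 0.2, 0.3]`, a threshold of `0.5`, and a limit of `1`, only one face qualifies, so a warning would be raised.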

Shift correcting means may further be provided for correcting the shift of each of the two or more different face image portions determined by the face image determining means, based on the amount of relative shift between that face image portion and the background image portion, so that the relative shift with respect to the background image portion is eliminated. In this case, the generating means combines each face image portion corrected by the shift correcting means with the background image portion.
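As a rough sketch of the shift correction, suppose the relative shift of a face's source frame with respect to the background frame is already known as a pixel offset `(dx, dy)`. How that offset is measured is not specified here, so it is treated as a given input; the function name and tuple layout are assumptions.

```python
from typing import Tuple

Point = Tuple[int, int]


def correct_position(face_xy: Point, frame_shift_xy: Point) -> Point:
    """Move a face's top-left corner back by its frame's shift relative to the background."""
    x, y = face_xy
    dx, dy = frame_shift_xy
    return (x - dx, y - dy)


# A face found at (120, 80) in a frame shifted by (+3, -2) pixels
# relative to the background frame is pasted at (117, 82).
corrected = correct_position((120, 80), (3, -2))
```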

An image composition device according to a second invention comprises: an imaging device that images a subject and outputs image data representing the subject image; target image detecting means for detecting two or more different target image portions in the subject image represented by the image data output from the imaging device; pose evaluation value calculating means for calculating a pose evaluation value for each of the two or more different target image portions detected by the target image detecting means; control means for controlling the imaging device, the target image detecting means, and the pose evaluation value calculating means so that the imaging by the imaging device, the detection by the target image detecting means, and the calculation by the pose evaluation value calculating means are repeated a plurality of times in succession; target image determining means for performing, for each of the two or more different target image portions, a process of determining a target image portion with a high pose evaluation value from among the plural target image portions of the same target detected in the plural frames of subject images obtained under the control of the control means; and generating means for generating one frame of composite subject image by combining the two or more different target image portions determined by the target image determining means with a background image portion that excludes the target image portions.

The second invention also provides a method suited to the above image composition device. In this method, an imaging device images a subject and outputs image data representing the subject image; target image detecting means detects two or more different target image portions in the subject image represented by the image data output from the imaging device; pose evaluation value calculating means calculates a pose evaluation value for each of the two or more different target image portions detected by the target image detecting means; control means controls the imaging device, the target image detecting means, and the pose evaluation value calculating means so that the imaging by the imaging device, the detection by the target image detecting means, and the calculation by the pose evaluation value calculating means are repeated a plurality of times in succession; target image determining means performs, for each of the two or more different target image portions, a process of determining a target image portion with a high pose evaluation value from among the plural target image portions of the same target detected in the plural frames of subject images obtained under the control of the control means; and generating means generates one frame of composite subject image by combining the two or more different target image portions determined by the target image determining means with a background image portion that excludes the target image portions.

According to the second invention, a subject is imaged and image data representing the subject image is obtained. Two or more different target image portions are detected in the subject image represented by the obtained image data, and a pose evaluation value is calculated for each of them. This imaging, target image portion detection, and pose evaluation value calculation are repeated a plurality of times. For each of the two or more target image portions, a target image portion with a high pose evaluation value is determined from among the plural target image portions of the same target detected in the plural frames of subject images. The determined target image portions and a background image portion that excludes the target image portions are combined to generate one frame of subject image.

According to the second invention, obtaining plural frames of subject images yields plural target image portions of the same target. Because the target image portion with a high pose evaluation value, determined from among those plural target image portions of the same target, is combined with the background image portion, every target image portion in the composite image has a high pose evaluation value. Even when the image contains several target image portions, all of them have a high pose evaluation value (for example, all take substantially the same pose).

Designating means may further be provided for designating the type of pose evaluation value calculated by the pose evaluation value calculating means.

Warning means may further be provided for issuing a warning when, in the composite image generated by the generating means, the number of target image portions among the two or more different target image portions whose pose evaluation value is at or above a threshold is equal to or less than a predetermined number.

Shift correcting means may further be provided for correcting the shift of each of the two or more different target image portions determined by the target image determining means, based on the amount of relative shift between that target image portion and the background image portion, so that the relative shift with respect to the background image portion is eliminated. In this case, the generating means combines each target image portion corrected by the shift correcting means with the background image portion.

The generating (synthesizing) means performs the composition using, for example, the least blurred background image portion or the least noisy background image portion among the plural background image portions obtained from the plural frames of subject images.
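One way to pick the least blurred background is to compare a simple sharpness score across the candidates. The sketch below uses gradient energy (the sum of squared differences between neighbouring pixels) as that score. This metric is an assumption chosen for illustration; the description does not state how blur or noise is measured, and heavy noise would also raise this score.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def gradient_energy(img: Image) -> float:
    """Sum of squared differences between horizontal and vertical neighbours."""
    total = 0.0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += (row[x + 1] - value) ** 2
            if y + 1 < len(img):
                total += (img[y + 1][x] - value) ** 2
    return total


def least_blurred(backgrounds: List[Image]) -> int:
    """Index of the background with the highest gradient energy (first wins on ties)."""
    return max(range(len(backgrounds)), key=lambda i: gradient_energy(backgrounds[i]))


flat = [[5, 5], [5, 5]]       # no edges at all: treated as most blurred
sharp = [[0, 10], [10, 0]]    # strong edges
chosen_index = least_blurred([flat, sharp])
```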

FIG. 1 shows an embodiment of the present invention and is a block diagram of the electrical configuration of a digital still camera.

The overall operation of the digital still camera is supervised by a CPU 1.

The digital still camera includes an operation device 2 comprising various buttons and switches, such as a power button, a shutter release button, a mode setting dial, a menu button, an enter button, an OK button, and up, down, left, and right buttons. Operation signals output from the operation device 2 are input to the CPU 1. The modes that can be set with the mode setting dial include a normal imaging mode, a composite imaging mode, and a playback mode. The digital still camera is also provided with a strobe light emitting device 3 whose light emission is controlled by the CPU 1.

A lens driving circuit 4 controls the position of an imaging lens 7, and a diaphragm driving circuit 5 controls the aperture value of a diaphragm 8.

When an imaging mode is set (the following applies to both the normal imaging mode and the composite imaging mode), the light flux condensed by the imaging lens 7 passes through the diaphragm 8, an infrared cut filter 9, and an optical low-pass filter 10 and falls on the light receiving surface of a CCD 11, where a subject image is formed. The CCD 11 is driven by a drive circuit 6 and outputs a video signal representing the subject image. The video signal is input to an analog front end 12.

In the analog front end 12, the video signal undergoes predetermined analog signal processing such as CDS (correlated double sampling), signal amplification, and analog-to-digital conversion. The image data output from the analog front end 12 is supplied to a main memory 13 and stored there temporarily.

The image data is read from the main memory 13 and supplied to a display device 21. The subject image obtained by imaging is displayed on the display screen of the display device 21 (the so-called through image).

When the shutter release button is pressed halfway, the image data obtained by imaging is temporarily stored in the main memory 13 in the same way as described above. The image data is read from the main memory 13 and input to an integrating circuit 18, which integrates the luminance components of the image data. Data representing the resulting integrated value is input to the CPU 1, which determines the aperture value of the diaphragm 8 and the shutter speed of the CCD 11.

In the normal imaging mode, when the shutter release button is subsequently pressed fully, the image data obtained by imaging is temporarily stored in the main memory 13 as described above. The image data is read from the main memory 13 and input to a digital signal processing circuit 14, which generates luminance data and color difference data from the input image data. The generated luminance data and color difference data are compressed in a compression/decompression processing circuit 19, and the compressed data is recorded on a memory card by an external recording device 20.

In the composite imaging mode, plural frames are imaged and composition is performed using the resulting plural frames of subject images. When the shutter release button is pressed, imaging is therefore repeated for the predetermined number of frames entered by the user. The image data of the first frame obtained by imaging is supplied to the main memory 13 and temporarily stored in the same way as described above. The stored image data is input to a face detection/expression recognition circuit 15.

The face detection/expression recognition circuit 15 detects face image portions in the subject image represented by the input image data, recognizes the expression of each detected face image portion, and calculates an evaluation value (expression evaluation value). The expression data used to calculate the expression evaluation value is stored in a sub memory 17. This data represents expressions such as a smiling face, an angry face, an eyes-closed face, and a crying face. Each of these expressions has generalized features, and data representing those features is stored in the sub memory 17 as the expression data. A smiling face, for example, is characterized by the mouth opening sideways and by wrinkles that run from the nose past the outer corners of the lips. The features of expressions other than a smile can be defined in the same way. The expression evaluation value of a face is calculated using this expression data. Data representing facial features, such as the image portions of the eyes, nose, mouth, ears, and cheeks, is also stored in the sub memory 17, and this data is used to detect face image portions in the subject image.

Once face detection and expression recognition (calculation of the expression evaluation value) have been performed on one frame of the subject image, the next frame is imaged. Face detection and expression recognition are performed in the same way on the image data obtained by imaging the next frame. Imaging, face detection, and expression recognition are repeated until the predetermined number of frames entered by the user has been imaged. It is not essential, however, to finish face detection and expression recognition before imaging the next frame; face detection and expression recognition may instead be performed after all of the predetermined number of frames have been imaged.

When imaging, face detection, and expression recognition for the predetermined number of frames are complete, the face images to be combined are determined from among the many face images on the basis of their expression evaluation values. Image composition is then performed using the determined face images and other image portions. This image composition is described in detail later. The image data representing the composite image is compressed in the compression/decompression processing circuit 19, and the compressed composite image data is recorded on the memory card by the external recording device 20.

The digital still camera of this embodiment may also be provided with a person detection/pose recognition circuit 16. The person detection/pose recognition circuit 16 recognizes a person image (target image) and its pose in the subject image and calculates a pose evaluation value. As with the data used for face detection and expression recognition in the face detection/expression recognition circuit 15, the data for detecting person images and the data for recognizing poses and calculating pose evaluation values are stored in the sub memory 17.

When the playback mode is set, the image data recorded on the memory card is read by the external recording device 20 and input to the compression/decompression processing circuit 19, where it is decompressed. The decompressed image data is supplied to the display device 21, which displays the image represented by the image data recorded on the memory card.

FIGS. 2 and 3 are flowcharts showing the processing procedure of the composite imaging mode.

In the composite imaging mode, the same subject is shot continuously a plurality of times (at a shooting interval short enough that the resulting frames of subject images can be regarded as substantially identical), face images with the desired expression are found among the face images contained in the resulting frames, and those face images are combined into a single frame.

First, the number of frames for continuous shooting is entered (step 31) and a desired expression is selected (step 32). A desired number of people is also entered (step 33). In this embodiment, a warning is issued when fewer than the desired number of the face images in the composite image have the desired expression; the number entered here is that desired number. The number of frames, the desired expression, and the desired number of people are all entered by the user. For example, pressing the menu button displays a menu on the display screen, and the menu presents selection items for the number of continuous shooting frames, the expression, and the number of people. The desired number of frames, expression, and number of people are selected from these items with the up, down, left, and right buttons, and pressing the enter button confirms them. The available expressions include, for example, a good expression (such as a smile) and a bad expression (such as closed eyes), and the user selects the desired one. Here it is assumed that three frames have been entered as the number of continuous shots and that a good expression has been selected as the desired expression.

When the user presses the shutter release button, the shot count is initialized and continuous shooting starts (steps 34 and 35). When the subject image of the first frame has been obtained by shooting, face images are detected in that subject image.

FIG. 4 is an example of the subject image I1 obtained by shooting the first frame.

The subject image I1 of the first frame contains person images 51, 52, and 53. Face image detection performed on the subject image I1 detects the face image portions 61, 62, and 63 of the person images 51, 52, and 53. Of these, the face image portions 61 and 62 both have a good expression, and the face image portion 63 has a bad expression. A background image 91 forms the background surrounding the person images 51, 52, and 53.

Returning to FIG. 2, when face detection is complete, expression recognition and evaluation are performed on the detected face image portions and expression evaluation values are calculated (step 37). Because a good expression has been selected as the desired expression, face images with a good expression are recognized among the face image portions 61, 62, and 63 detected in the subject image I1 of the first frame, and the evaluation value of that good expression is calculated. As described above, expression recognition can be based on features such as how far the mouth is open and how the wrinkles form. When the expression evaluation values have been obtained for the face image portions of the first frame, the position of each face image portion and its expression evaluation value are stored.

ユーザによって入力された連続駒数の撮影が終了するまでステップ35から38の処理が繰り返される。連続駒数の撮影が終了していなければ（ステップ39でＮＯ），撮影駒数がインクレメントされ（ステップ40），２駒目の撮影が行われる（ステップ35）。 Steps 35 to 38 are repeated until shooting of the continuous frame number input by the user is completed. If the continuous frame number has not been shot (NO in step 39), the number of shot frames is incremented (step 40), and the second frame is shot (step 35).
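The repeated shoot, detect, evaluate and store cycle described above can be sketched as a simple loop. This is only an illustrative sketch: the function `run_burst` and the callables `capture`, `detect_faces` and `evaluate` are hypothetical stand-ins for the camera hardware and the face detection / expression recognition circuit, and the mapping of comments to step numbers follows the description only loosely.

```python
def run_burst(num_frames, capture, detect_faces, evaluate):
    """Shoot num_frames frames and store (frame_no, position, score) per face.

    capture, detect_faces and evaluate are hypothetical callables supplied
    by the caller; they are not part of the described device.
    """
    stored = []
    frame_no = 1  # shot frame count initialized before continuous shooting
    while True:
        image = capture(frame_no)                   # shoot one frame (step 35)
        for position, face in detect_faces(image):  # face detection
            score = evaluate(face)                  # expression evaluation (step 37)
            stored.append((frame_no, position, score))  # store position and score
        if frame_no >= num_frames:                  # all frames shot? (step 39)
            break
        frame_no += 1                               # increment frame count (step 40)
    return stored


# Usage with trivial stand-ins: one "face" per frame, score derived from the frame number.
log = run_burst(
    3,
    capture=lambda n: n,
    detect_faces=lambda image: [((0, 0), image * 10)],
    evaluate=lambda face: face + 1,
)
```

The stored positions are what later allow face image portions from different frames to be associated with the same person.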

図５は，二駒目の被写体像Ｉ２の一例である。 FIG. 5 is an example of the subject image I2 of the second frame.

二駒目の被写体像Ｉ２も一駒目の被写体像Ｉ１と同一の被写体のものであるから，一駒目の被写体像Ｉ１に含まれている人物像51，52および53が含まれている。二駒目の被写体像Ｉ２について顔検出処理が行われることにより，人物像51，52および53の顔画像部分71，72および73が検出される。二駒目の被写体像Ｉ２では，顔画像部分71，72および73のうち，顔画像部分71は良い表情であるが，顔画像部分72および73は悪い表情である。二駒目の被写体像Ｉ２にも背景画像101が含まれている。 Since the subject image I2 of the second frame is also the same subject as the subject image I1 of the first frame, person images 51, 52, and 53 included in the subject image I1 of the first frame are included. By performing face detection processing on the second subject image I2, face image portions 71, 72, and 73 of the person images 51, 52, and 53 are detected. In the subject image I2 of the second frame, of the face image portions 71, 72 and 73, the face image portion 71 has a good expression, but the face image portions 72 and 73 have a bad expression. The background image 101 is also included in the subject image I2 of the second frame.

さらに，三駒目の撮影が行われる。 In addition, the third frame is shot.

図６は，三駒目の被写体像Ｉ３の一例である。 FIG. 6 is an example of the subject image I3 of the third frame.

三駒目の被写体像Ｉ３も一駒目の被写体像Ｉ１および二駒目の被写体像Ｉ２と同一の被写体のものであるから，一駒目の被写体像Ｉ１および二駒目の被写体像Ｉ２にそれぞれ含まれている人物像と同じ人物像51，52および53が含まれている。三駒目の被写体像Ｉ３について顔検出処理が行われることにより，人物像51，52および53の顔画像部分81，82および83が検出される。三駒目の被写体像Ｉ３では，顔画像部分83は良い表情であるが，顔画像部分82は悪い表情である。三駒目の被写体像Ｉ３にも背景画像102が含まれている。三駒の被写体像Ｉ１，Ｉ２およびＩ３の背景画像91，101および102はほぼ同じものであるが，人物像と異なり符号を変えている。 The subject image I3 of the third frame is also of the same subject as the subject image I1 of the first frame and the subject image I2 of the second frame, so it includes the same person images 51, 52 and 53 as those included in the subject images I1 and I2. By performing face detection processing on the subject image I3 of the third frame, the face image portions 81, 82 and 83 of the person images 51, 52 and 53 are detected. In the subject image I3 of the third frame, the face image portion 83 has a good expression, but the face image portion 82 has a bad expression. The subject image I3 of the third frame also includes a background image 102. The background images 91, 101 and 102 of the three subject images I1, I2 and I3 are substantially the same but, unlike the person images, are given different reference numerals.

図２に戻って，連続撮影駒数分の撮影が終了すると（ステップ39でＹＥＳ），同一人物の複数の顔画像のうち表情評価値の高い顔画像が決定される（ステップ41）。３駒の被写体像Ｉ１，Ｉ２およびＩ３にはそれぞれ，人物像51，52および53が含まれている。３駒の被写体像Ｉ１，Ｉ２およびＩ３の人物像51は同一人物の画像であり，３駒の撮影が行われたことにより，同一人物の複数の顔画像61，71および81が得られている。これらの顔画像61，71および81のうち，二駒目の被写体像Ｉ２から得られた顔画像71の表情評価値が最も高かったものとする。すると，同一人物の複数の顔画像61，71および81のうち表情評価値の高い顔画像として顔画像71が決定される。 Returning to FIG. 2, when shooting of the specified number of consecutive frames is completed (YES in step 39), the face image having the highest expression evaluation value among the plurality of face images of the same person is determined (step 41). Each of the three subject images I1, I2 and I3 includes the person images 51, 52 and 53. The person images 51 in the three subject images I1, I2 and I3 are images of the same person, and shooting three frames has yielded a plurality of face images 61, 71 and 81 of that person. Assume that, of these face images 61, 71 and 81, the face image 71 obtained from the subject image I2 of the second frame has the highest expression evaluation value. The face image 71 is then determined as the face image having a high expression evaluation value among the face images 61, 71 and 81 of the same person.

同様にして，人物像52についての顔画像62，72および82のうち，一駒目の被写体像Ｉ１から得られた顔画像62の表情評価値が最も高かったものとすると，表情評価値の高い顔画像として顔画像62が決定される。また，人物像53についての顔画像63，73および83のうち，三駒目の被写体像Ｉ３から得られた顔画像83の表情評価値が最も高かったものとすると，表情評価値の高い顔画像として顔画像83が決定される。 Similarly, assuming that, of the face images 62, 72 and 82 for the person image 52, the face image 62 obtained from the subject image I1 of the first frame has the highest expression evaluation value, the face image 62 is determined as the face image having a high expression evaluation value. Likewise, assuming that, of the face images 63, 73 and 83 for the person image 53, the face image 83 obtained from the subject image I3 of the third frame has the highest expression evaluation value, the face image 83 is determined as the face image having a high expression evaluation value.
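The per-person selection of step 41 amounts to taking, for each person, the face image with the highest expression evaluation value across all frames. The following is a minimal sketch under stated assumptions: the function name `select_best_faces`, the record layout and the numeric scores are invented for illustration (the description gives no concrete values), and faces are assumed to have already been matched to persons, for example via their stored positions.

```python
def select_best_faces(records):
    """Return {person: (frame, face, score)} with the highest score per person.

    records: iterable of (frame_no, person_numeral, face_numeral, score).
    """
    best = {}
    for frame, person, face, score in records:
        if person not in best or score > best[person][2]:
            best[person] = (frame, face, score)
    return best


# Hypothetical scores chosen to match the good/bad expressions in the example.
records = [
    (1, 51, 61, 0.7), (1, 52, 62, 0.9), (1, 53, 63, 0.2),
    (2, 51, 71, 0.8), (2, 52, 72, 0.3), (2, 53, 73, 0.1),
    (3, 51, 81, 0.4), (3, 52, 82, 0.2), (3, 53, 83, 0.9),
]
best = select_best_faces(records)
chosen_faces = sorted(face for _, face, _ in best.values())
```

With these assumed scores the selection yields face images 71, 62 and 83, as in the embodiment.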

このようにして，顔画像が決定されると，被写体像の中から顔画像部分を除いた背景画像が決定される（ステップ42）。背景画像はあらかじめ決定されていてもよいし，得られた３駒の背景画像91，101，102の中から最もノイズの少ない背景画像または最もぶれの少ない背景画像が選択されてもよい。レベルが所定のしきい値を超えているような小さな領域が多数あればノイズが多いと考えられ，そのような小さな領域が少なければノイズが少ないと考えられる。また，高周波数成分の多い被写体像は最もぶれの少ないものと考えられる。この実施例においては，背景画像は，二駒目のものが用いられるものとする。 When the face images have been determined in this way, a background image, obtained by removing the face image portions from a subject image, is determined (step 42). The frame that supplies the background image may be fixed in advance, or the background image with the least noise or the least blur may be selected from the three background images 91, 101 and 102 obtained. An image containing many small regions whose level exceeds a predetermined threshold is considered noisy, and an image containing few such regions is considered to have little noise. A subject image with many high-frequency components is considered to have the least blur. In this embodiment, the background image of the second frame is used.
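One way to approximate "the background image with the most high-frequency content" is to sum squared differences between neighboring pixels and pick the image with the largest sum. This is only an illustrative proxy, not the measure used by the device; the names `high_freq_energy` and `pick_least_blurred` and the tiny grayscale arrays are assumptions. Note that noise also raises this measure, so a practical selection would need to weigh noise and blur together.

```python
def high_freq_energy(img):
    """Sum of squared differences between horizontally and vertically
    adjacent pixels; a crude stand-in for high-frequency content."""
    height, width = len(img), len(img[0])
    energy = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                energy += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < height:
                energy += (img[y + 1][x] - img[y][x]) ** 2
    return energy


def pick_least_blurred(backgrounds):
    """backgrounds: {numeral: 2-D list of gray levels}. Returns the key of
    the image with the largest high-frequency energy."""
    return max(backgrounds, key=lambda key: high_freq_energy(backgrounds[key]))


# Hypothetical 2x2 gray-level images keyed by the background numerals.
backgrounds = {
    91: [[100, 150], [150, 100]],   # softened edges
    101: [[0, 255], [255, 0]],      # crisp edges
    102: [[128, 128], [128, 128]],  # flat
}
selected = pick_least_blurred(backgrounds)
```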

図７は，背景画像101の一例である。このように背景画像101は，顔画像が除かれた部分のものとなる。 FIG. 7 is an example of the background image 101. As described above, the background image 101 is the portion from which the face image is removed.

図２に戻って，背景画像が決定されると，その決定された背景画像101と，決定された顔画像71，62および83との間のずれ量が算出される（ステップ43）。算出されたずれ量に応じて，背景画像101と一致するように顔画像71，62および83のずれが補正される（ステップ44）。 Returning to FIG. 2, when the background image has been determined, the amount of deviation between the determined background image 101 and each of the determined face images 71, 62 and 83 is calculated (step 43). In accordance with the calculated amounts of deviation, the deviation of the face images 71, 62 and 83 is corrected so that they coincide with the background image 101 (step 44).

図８は，背景画像と顔画像とのずれを示している。 FIG. 8 shows the deviation between the background image and the face image.

上述したように，３駒の被写体像Ｉ１，Ｉ２およびＩ３の撮影間隔は短いこととなるが，これらの３駒の被写体像Ｉ１，Ｉ２およびＩ３間でずれが生じることがある。たとえば，背景画像101は，二駒目の被写体像Ｉ２から得られたものであり，その二駒目の被写体像Ｉ２には顔画像部分71，72および73が含まれている。決定された顔画像71，62および83のうち，顔画像71は，二駒目の顔画像71であるから二駒目の背景画像101との間にずれは無い。顔画像62は，一駒目の顔画像62であるから，二駒目の背景画像101との間にΔ12の位置ずれがある。同様に，顔画像83は三駒目の顔画像83であるから，二駒目の背景画像101との間にΔ23の位置ずれがある。背景画像101の間の位置ずれが無くなるように，顔画像が補正される（合成位置の補正，大きさの調整など）。 As described above, the three subject images I1, I2 and I3 are shot at short intervals, but deviations may still arise among them. For example, the background image 101 is obtained from the subject image I2 of the second frame, and that subject image I2 includes the face image portions 71, 72 and 73. Of the determined face images 71, 62 and 83, the face image 71 comes from the second frame, so it has no deviation from the background image 101 of the second frame. The face image 62 comes from the first frame, so it has a positional deviation of Δ12 from the background image 101 of the second frame. Similarly, the face image 83 comes from the third frame, so it has a positional deviation of Δ23 from the background image 101 of the second frame. The face images are corrected (correction of the composition position, adjustment of size, and so on) so that their positional deviation from the background image 101 is eliminated.
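The positional part of this correction can be illustrated with a pure translation. The sketch below assumes, hypothetically, that the deviation (such as Δ12) is estimated from the position of one reference point visible in both the source frame and the background frame; the description does not say how the deviation is measured, and size adjustment is not modeled. All names and coordinates are illustrative.

```python
def frame_offset(ref_in_source, ref_in_background):
    """Translation (dx, dy) that maps source-frame coordinates onto the
    background frame, from one reference point seen in both frames."""
    return (ref_in_background[0] - ref_in_source[0],
            ref_in_background[1] - ref_in_source[1])


def corrected_position(face_position, offset):
    """Shift a face position taken from the source frame by the offset."""
    return (face_position[0] + offset[0], face_position[1] + offset[1])


# Hypothetical coordinates: a landmark sits at (40, 30) in frame 1 and at
# (43, 28) in frame 2, so frame-1 content must move by (+3, -2).
delta_12 = frame_offset((40, 30), (43, 28))
face_62_in_frame_2 = corrected_position((120, 60), delta_12)

# A face taken from the background's own frame needs no shift.
delta_22 = frame_offset((43, 28), (43, 28))
```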

再び，図２に戻って，顔画像が補正されると，補正された顔画像と背景画像とが合成されて合成画像が得られる（ステップ45）。 Returning again to FIG. 2, when the face image is corrected, the corrected face image and the background image are combined to obtain a combined image (step 45).

図９は，合成画像の生成の仕方を示している。 FIG. 9 shows how to generate a composite image.

上述のように，二駒目の顔画像71，一駒目の顔画像62および三駒目の顔画像83ならびに二駒目の背景画像101が合成されて合成画像Ｉ４が得られる。得られた合成画像Ｉ４は，良い表情の多い画像となっている。 As described above, the face image 71 of the second frame, the face image 62 of the first frame, the face image 83 of the third frame, and the background image 101 of the second frame are combined to obtain a combined image I4. The obtained composite image I4 is an image with many good expressions.
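The composition itself can be pictured as pasting each corrected face patch into a copy of the chosen background. The following sketch uses small nested lists as stand-ins for image data; `paste`, `compose` and the pixel values are illustrative assumptions, and real composition would also need blending at the patch borders, which is not shown.

```python
def paste(canvas, patch, top, left):
    """Copy patch into canvas with its upper-left corner at (top, left),
    ignoring any part that falls outside the canvas."""
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
                canvas[y][x] = value


def compose(background, placements):
    """Return a new image: background with every (patch, (top, left)) pasted in.
    The background itself is left unmodified."""
    canvas = [row[:] for row in background]
    for patch, (top, left) in placements:
        paste(canvas, patch, top, left)
    return canvas


# Hypothetical 3x4 background and two one-row "face" patches.
background = [[0, 0, 0, 0] for _ in range(3)]
composite = compose(background, [([[5, 5]], (0, 0)), ([[7]], (1, 2))])
```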

図２を参照して，合成画像の中の顔画像部分のうち，しきい値以上の表情評価値をもつ顔画像部分の数が，ステップ33で入力された人数以上かどうかが確認される（ステップ46）。人数以上でなければ（ステップ46でＮＯ），ユーザに取り直しを促すために警告が行われる（ステップ47）。人数以上であれば（ステップ46でＹＥＳ），ステップ47の処理はスキップされる。 Referring to FIG. 2, it is checked whether the number of face image portions in the composite image whose expression evaluation value is equal to or greater than a threshold is equal to or greater than the number of people input in step 33 (step 46). If it is less than that number (NO in step 46), a warning is issued to prompt the user to retake the picture (step 47). If it is equal to or greater than that number (YES in step 46), the processing of step 47 is skipped.
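The check of steps 46 and 47 reduces to counting the faces that reach the threshold and comparing the count with the number of people entered by the user. A minimal sketch, with the function name, scores and threshold chosen purely for illustration:

```python
def needs_retake_warning(scores, threshold, required_count):
    """True when fewer than required_count faces in the composite image
    have an expression evaluation value of at least threshold."""
    good_faces = sum(1 for score in scores if score >= threshold)
    return good_faces < required_count


# Hypothetical evaluation values for the three faces in the composite image.
all_good = needs_retake_warning([0.8, 0.9, 0.9], threshold=0.5, required_count=3)
one_bad = needs_retake_warning([0.8, 0.2, 0.9], threshold=0.5, required_count=3)
```

Here `all_good` is `False` (no warning) and `one_bad` is `True` (the user would be prompted to retake).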

得られた合成画像を表す画像データについて所定のディジタル信号処理が行われ（ステップ48），合成画像データがメモリ・カードに記録されることとなる（ステップ49）。 Predetermined digital signal processing is performed on the obtained image data representing the synthesized image (step 48), and the synthesized image data is recorded on the memory card (step 49).

上述の実施例においては，ずれ補正などが行われていたが，ずれ補正を無視できるような場合などには必ずしも行われなくとも良い。 In the above-described embodiment, deviation correction and the like are performed, but they need not be performed in cases such as when the deviation is small enough to be ignored.

図10から図16は，他の実施例を示すものである。上述の実施例においては，顔の表情評価値を利用して，表情の良い顔画像が多くなるようにしていたが，次に示す実施例では，同じポーズの人物像（対象画像）が多くなるようにするものである。 FIGS. 10 to 16 show another embodiment. In the above-described embodiment, expression evaluation values of faces are used so that the composite image contains as many face images with good expressions as possible. In the embodiment described next, the aim is instead to have as many person images (target images) as possible in the same pose.

図10および図11は合成撮像モードの処理手順を示すフローチャートである。 FIGS. 10 and 11 are flowcharts showing the processing procedure in the composite imaging mode.

上述と同様に，連続撮影駒数の入力が行われ（ステップ111），所望のポーズが選択される（ステップ112）。次に述べる実施例では所望のポーズは両手を上げているものであるが，その他のどのようなポーズでもよいのはいうまでもない。このポーズの選択も表情の選択と同様に，メニューを用いて選択される。所望の人数が入力されると（ステップ113），撮影駒数が初期化され，連続撮影が開始される（ステップ114）。この実施例においても３駒撮影されるものとすると，複数駒であれば何駒でもよいのはいうまでもない。 As in the above embodiment, the number of frames to be shot continuously is input (step 111), and a desired pose is selected (step 112). In the embodiment described below, the desired pose is one with both hands raised, but it goes without saying that any other pose may be used. Like the expression, the pose is selected using a menu. When the desired number of people is input (step 113), the shot frame count is initialized and continuous shooting starts (step 114). Three frames are assumed to be shot in this embodiment as well, but it goes without saying that any number of frames may be used as long as there is more than one.

一駒目の撮影が行われ，被写体像が得られる（ステップ115）。得られた被写体像の中から人物像が検出される（ステップ116）。検出された人物像のポーズ認識が行われ，ポーズ評価値が算出される（ステップ117）。検出された人物像の位置および算出されたポーズ評価値が記憶される（ステップ118）。一駒目の撮影が終了すると（ステップ119でＮＯ），撮影回数がインクレメントされ（ステップ120），二駒目の撮影および三駒目の撮影が行われる。二駒目の被写体像および三駒目の被写体像においても人物像の検出およびポーズ評価値の算出が行われる。 The first frame is shot and a subject image is obtained (step 115). Person images are detected in the obtained subject image (step 116). Pose recognition is performed on each detected person image, and a pose evaluation value is calculated (step 117). The position of each detected person image and the calculated pose evaluation value are stored (step 118). When shooting of the first frame ends (NO in step 119), the shot count is incremented (step 120), and the second and third frames are shot. Person image detection and pose evaluation value calculation are likewise performed on the subject images of the second and third frames.

図12は，一駒目の被写体像Ｉ11の一例である。 FIG. 12 shows an example of the subject image I11 for the first frame.

被写体像Ｉ11には，人物像131，132および133ならびに背景画像171が含まれている。人物像の検出処理が行われることにより，被写体像Ｉ11の中から人物像131，132および133の画像部分141，142および143が検出される。検出された画像部分141，142および143のポーズ評価値が算出される。 The subject image I11 includes person images 131, 132 and 133 and a background image 171. By performing the person image detection processing, the image portions 141, 142 and 143 of the person images 131, 132 and 133 are detected in the subject image I11. Pose evaluation values of the detected image portions 141, 142 and 143 are calculated.

図13は，二駒目の被写体像Ｉ12の一例である。 FIG. 13 shows an example of the subject image I12 of the second frame.

被写体像Ｉ12にも，人物像131，132および133ならびに背景画像172が含まれており，これらの画像部分151，152および153が検出される。検出された画像部分151，152および153のポーズ評価値が算出される。 The subject image I12 also includes the person images 131, 132 and 133 and a background image 172, and the image portions 151, 152 and 153 of these person images are detected. Pose evaluation values of the detected image portions 151, 152 and 153 are calculated.

図14は，三駒目の被写体像Ｉ13の一例である。 FIG. 14 shows an example of the subject image I13 for the third frame.

被写体像Ｉ13にも，人物像131，132および133ならびに背景画像173が含まれており，これらの画像部分161，162および163が検出される。検出された画像部分161，162および163のポーズ評価値が算出される。 The subject image I13 also includes the person images 131, 132 and 133 and a background image 173, and the image portions 161, 162 and 163 of these person images are detected. Pose evaluation values of the detected image portions 161, 162 and 163 are calculated.

図15は，背景画像の一例である。 FIG. 15 is an example of a background image.

この背景画像は，二駒目の被写体像Ｉ12から得られた背景画像172である。検出された画像部分151，152および153を除いた部分が背景画像172とされている。 This background image is the background image 172 obtained from the subject image I12 of the second frame. A portion excluding the detected image portions 151, 152, and 153 is a background image 172.

人物像131については，三駒目の被写体像Ｉ13の画像部分161が，人物像132については一駒目の被写体像Ｉ11の画像部分142が，人物像133については二駒目の被写体像Ｉ12の画像部分153が，それぞれポーズ評価値が高いものとする。 Assume that the pose evaluation value is highest for the image portion 161 of the third-frame subject image I13 in the case of the person image 131, for the image portion 142 of the first-frame subject image I11 in the case of the person image 132, and for the image portion 153 of the second-frame subject image I12 in the case of the person image 133.

図２に戻って，ユーザによって入力された連続撮影駒数分撮影が終了すると（ステップ119でＹＥＳ），同一の人物像のうちポーズ評価値の高い顔画像が決定される（ステップ121）。上述のように，画像部分161，142および153が，同一の人物像のうちポーズ評価値の高い顔画像として決定されることとなる。 Returning to FIG. 2, when shooting of the number of consecutive frames input by the user is completed (YES in step 119), the image having the highest pose evaluation value is determined for each person from among the images of that same person (step 121). As described above, the image portions 161, 142 and 153 are determined as the images having a high pose evaluation value among the images of the respective persons.

その後，背景画像が決定される（ステップ122）。決定された背景画像と決定された人物像とのずれ量が算出され（ステップ123），算出されたずれ量が無くなるように補正される（ステップ124）。補正された人物像と背景画像とが合成されて合成画像が得られる（ステップ125）。 Thereafter, a background image is determined (step 122). A shift amount between the determined background image and the determined person image is calculated (step 123), and correction is performed so that the calculated shift amount is eliminated (step 124). The corrected person image and the background image are combined to obtain a combined image (step 125).

図16は，合成画像の生成の仕方を示している。 FIG. 16 shows a method of generating a composite image.

上述のように，三駒目の被写体像Ｉ13の画像部分161，一駒目の被写体像Ｉ11の画像部分142および二駒目の被写体像Ｉ12の画像部分153ならびに背景画像172が合成されて一駒の合成画像Ｉ14が得られる。 As described above, the image portion 161 of the subject image I13 for the third frame, the image portion 142 of the subject image I11 for the first frame, the image portion 153 of the subject image I12 for the second frame, and the background image 172 are synthesized. A composite image I14 is obtained.

図11を参照して，得られた合成画像Ｉ14の中にしきい値以上のポーズ評価値をもつ画像部分が，入力された人数以上かどうかが確認される（ステップ126）。人数以上でなければ（ステップ126でＮＯ），撮影者に警告が行われる（ステップ127）。人数以上であればステップ127の処理はスキップされる。 Referring to FIG. 11, it is checked whether the number of image portions in the obtained composite image I14 whose pose evaluation value is equal to or greater than a threshold is equal to or greater than the input number of people (step 126). If it is less than that number (NO in step 126), a warning is issued to the photographer (step 127). If it is equal to or greater than that number, the processing of step 127 is skipped.

その後，ディジタル信号処理が行われ（ステップ128），合成画像データがメモリ・カードに記録される（ステップ129）。 Thereafter, digital signal processing is performed (step 128), and the composite image data is recorded on the memory card (step 129).

上述の実施例においては，人物像を検出し，同じポーズの人物像が多くなるように合成画像が生成されているが，人物像に限らず，動物の画像などその他の対象画像であってもよい。 In the above-described embodiment, person images are detected and a composite image is generated so that as many person images as possible are in the same pose. The target is not limited to person images, however, and may be another kind of target image, such as an image of an animal.

FIG. 1: ディジタル・スチル・カメラの電気的構成を示すブロック図である。 Block diagram showing the electrical configuration of a digital still camera.
FIG. 2: 合成撮像モードの処理手順を示すフローチャートである。 Flowchart showing the processing procedure in the composite imaging mode.
FIG. 3: 合成撮像モードの処理手順を示すフローチャートである。 Flowchart showing the processing procedure in the composite imaging mode.
FIG. 4: 被写体像の一例である。 Example of a subject image.
FIG. 5: 被写体像の一例である。 Example of a subject image.
FIG. 6: 被写体像の一例である。 Example of a subject image.
FIG. 7: 背景画像の一例である。 Example of a background image.
FIG. 8: 背景画像とのずれを示している。 Deviation from the background image.
FIG. 9: 画像合成の仕方を示している。 How the images are combined.
FIG. 10: 合成撮像モードの処理手順を示すフローチャートである。 Flowchart showing the processing procedure in the composite imaging mode.
FIG. 11: 合成撮像モードの処理手順を示すフローチャートである。 Flowchart showing the processing procedure in the composite imaging mode.
FIG. 12: 被写体像の一例である。 Example of a subject image.
FIG. 13: 被写体像の一例である。 Example of a subject image.
FIG. 14: 被写体像の一例である。 Example of a subject image.
FIG. 15: 背景画像の一例である。 Example of a background image.
FIG. 16: 画像合成の仕方を示している。 How the images are combined.

Explanation of symbols

１ＣＰＵ
11 ＣＣＤ
15 顔検出／表情認識回路
16 人物検出／ポーズ認識回路

1 CPU
11 CCD
15 Face detection / expression recognition circuit
16 Person detection / pose recognition circuit

Claims

1. An image synthesizing apparatus comprising:
an imaging device for imaging a subject and outputting image data representing a subject image;
face detection means for detecting two or more different face image portions from the subject image represented by the image data output from the imaging device;
facial expression evaluation value calculating means for calculating a facial expression evaluation value of each of the two or more different face image portions detected by the face detection means;
control means for controlling the imaging device, the face detection means, and the facial expression evaluation value calculating means so that imaging by the imaging device, detection processing by the face detection means, and calculation by the facial expression evaluation value calculating means are repeated a plurality of times in succession;
face image determining means for performing, for each of the two or more different face image portions, a process of determining a face image portion having a high facial expression evaluation value from among a plurality of face image portions of the same face detected in the plurality of subject images obtained under the control of the control means; and
generating means for generating a one-frame composite subject image by combining the two or more different face image portions determined by the face image determining means with a background image portion excluding the face image portions.

2. The image synthesizing apparatus according to claim 1, wherein the expression evaluation value calculating means further comprises a specifying means for specifying to calculate one evaluation value among a smile, an angry face, a staring face and a crying face.

3. The image synthesizing apparatus according to claim 1, further comprising warning means for issuing a warning when the number of face image portions whose facial expression evaluation value is equal to or greater than a threshold value, among the two or more different face image portions of the composite image generated by the generating means, is less than a predetermined number.

4. The image synthesizing apparatus according to claim 1, further comprising deviation correction means for correcting the deviation of each face image portion, based on the relative amount of deviation between the background image portion and each of the two or more different face image portions determined by the face image determining means, so that the relative deviation from the background image portion is eliminated,
wherein the generating means combines the face image portions corrected by the deviation correction means with the background image portion.

5. An image synthesizing apparatus comprising:
an imaging device for imaging a subject and outputting image data representing a subject image;
target image detection means for detecting two or more different target image portions from the subject image represented by the image data output from the imaging device;
pose evaluation value calculating means for calculating a pose evaluation value of each of the two or more different target image portions detected by the target image detection means;
control means for controlling the imaging device, the target image detection means, and the pose evaluation value calculating means so that imaging by the imaging device, detection processing by the target image detection means, and calculation by the pose evaluation value calculating means are repeated a plurality of times in succession;
target image determining means for performing, for each of the two or more different target image portions, a process of determining a target image portion having a high pose evaluation value from among a plurality of target image portions of the same target detected in the plurality of subject images obtained under the control of the control means; and
generating means for generating a one-frame composite subject image by combining the two or more different target image portions determined by the target image determining means with a background image portion excluding the target image portions.

6. The image synthesizing apparatus according to claim 5, further comprising designation means for designating a type of a pose evaluation value calculated by the pose evaluation value calculation means.

7. The image synthesizing apparatus according to claim 5, further comprising warning means for issuing a warning when the number of target image portions having a pose evaluation value equal to or greater than a threshold value, among the two or more different target image portions of the composite image generated by the generating means, is less than a predetermined number.

8. The image synthesizing apparatus according to claim 5, further comprising deviation correction means for correcting the deviation of each target image portion, based on the relative amount of deviation between the background image portion and each of the two or more different target image portions determined by the target image determining means, so that the relative deviation from the background image portion is eliminated,
wherein the generating means combines the target image portions corrected by the deviation correction means with the background image portion.

9. The image synthesizing apparatus according to claim 1 or 5, wherein the synthesizing means performs the synthesizing process using the least blurred background image portion or the least noisy background image portion among the plurality of background image portions obtained from the subject images of the plurality of frames.

10. A method for controlling an image synthesizing apparatus, wherein:
an imaging device images a subject and outputs image data representing a subject image;
face detection means detects two or more different face image portions from the subject image represented by the image data output from the imaging device;
facial expression evaluation value calculating means calculates a facial expression evaluation value of each of the two or more different face image portions detected by the face detection means;
control means controls the imaging device, the face detection means, and the facial expression evaluation value calculating means so that imaging by the imaging device, detection processing by the face detection means, and calculation by the facial expression evaluation value calculating means are repeated a plurality of times in succession;
face image determining means performs, for each of the two or more different face image portions, a process of determining a face image portion having a high facial expression evaluation value from among a plurality of face image portions of the same face detected in the plurality of subject images obtained under the control of the control means; and
generating means generates a one-frame composite subject image by combining the two or more different face image portions determined by the face image determining means with a background image portion excluding the face image portions.

11. A method for controlling an image synthesizing apparatus, wherein:
an imaging device images a subject and outputs image data representing a subject image;
target image detection means detects two or more different target image portions from the subject image represented by the image data output from the imaging device;
pose evaluation value calculating means calculates a pose evaluation value of each of the two or more different target image portions detected by the target image detection means;
control means controls the imaging device, the target image detection means, and the pose evaluation value calculating means so that imaging by the imaging device, detection processing by the target image detection means, and calculation by the pose evaluation value calculating means are repeated a plurality of times in succession;
target image determining means performs, for each of the two or more different target image portions, a process of determining a target image portion having a high pose evaluation value from among a plurality of target image portions of the same target detected in the plurality of subject images obtained under the control of the control means; and
generating means generates a one-frame composite subject image by combining the two or more different target image portions determined by the target image determining means with a background image portion excluding the target image portions.