JP2008219450A

JP2008219450A - Imaging device and control method thereof

Info

Publication number: JP2008219450A
Application number: JP2007053645A
Authority: JP
Inventors: Makoto Oishi; 誠大石
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2007-03-05
Filing date: 2007-03-05
Publication date: 2008-09-18

Abstract

PROBLEM TO BE SOLVED: To prevent unnecessary photographing operations while preserving the convenience of automatic photography triggered by voice.

SOLUTION: A digital camera 1 is provided with a person extracting means 41, a voice analyzing means 43, and a composition deciding means 44. The person extracting means 41 analyzes image data to extract an image area representing a person. The voice analyzing means 43 analyzes an input voice to detect a predetermined feature associated with the voice. The composition deciding means 44 decides whether the composition of the image data is favorable based upon the extraction result of the person extracting means 41 and the detection result of the voice analyzing means 43. The timing of recording the image data is determined based upon whether the composition is favorable, and a recording means is controlled so that the image data is recorded at the determined timing. Alternatively, the user is notified of the determined timing.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an imaging apparatus that performs automatic shooting triggered by voice, and to a method for controlling the imaging apparatus.

Digital cameras are usually equipped with a microphone for recording audio during movie shooting. A digital camera is known that identifies a specific voice phrase input from this microphone and, when that phrase is detected, automatically performs a shooting operation to acquire an image (see, for example, Patent Document 1).
Patent Document 1: JP 2006-184589 A

A digital camera that performs automatic shooting triggered by voice is convenient, but it may perform unnecessary operations in response to unrelated voices. For example, in crowded places such as tourist spots, the camera may react to the voice of an unrelated person nearby. Likewise, when a group photo is being taken, a shot may be captured before everyone is ready simply because someone called out a phrase such as "Say cheese."

An object of the present invention is to eliminate the inconvenience of unnecessary shooting operations while preserving the convenience of automatic shooting triggered by voice.

To achieve the above object, the present invention provides two types of imaging apparatus. Each includes an imaging means that photographs a scene and generates image data representing the scene, and a recording means that records the image data generated by the imaging means on a predetermined recording medium. Each also includes the person extraction means, voice analysis means, and composition determination means described next.

The person extraction means extracts an image area representing a person by analyzing the image data generated by the imaging means. For example, it searches for faces in the image data and outputs, as the extraction result, information indicating the number of faces detected, the position of each face, and the size of each face. In this case, it may also identify the expression of each detected face and additionally output information indicating the identified expression. It may also identify a gesture of a person in the image data and output information indicating the identified gesture as the extraction result.

The voice analysis means detects a predetermined feature of the voice by analyzing the input voice. Examples of such features include a predetermined change in volume, a predetermined voice phrase, and a feature registered in advance as characteristic of a particular person's voice. The composition determination means determines whether the composition of the image data is good or bad based on the extraction result of the person extraction means and the detection result of the voice analysis means.

The first imaging apparatus of the present invention includes, in addition to the imaging means, recording means, person extraction means, and composition determination means described above, a recording control means that determines the timing for recording image data based on the determination result of the composition determination means and controls the recording means so that the image data is recorded at the determined timing. With the first imaging apparatus, even if a voice that could trigger automatic shooting is uttered, automatic shooting is not performed unless the composition satisfies a predetermined condition, so there is no risk of unnecessary shooting in response to voice alone.

The second imaging apparatus of the present invention includes, in place of the recording control means of the first imaging apparatus, a notification means that determines the timing for recording image data based on the determination result of the composition determination means and announces the arrival of the determined timing. The second apparatus does not perform automatic shooting. However, when the composition satisfies a predetermined condition and a predetermined voice is uttered, it notifies the user that the time to press the shutter release button has come, so the user enjoys convenience comparable to automatic shooting. In addition, because the shooting operation is never performed automatically, the imaging apparatus does not perform unnecessary operations against the user's intention.

The recording means preferably records the extraction result of the person extraction means and the detection result of the voice analysis means on the recording medium together with the image data. This allows those results to be used when the image data recorded on the recording medium is later edited on a personal computer or similar device.

The first control method of the present invention causes an imaging apparatus to operate as the first imaging apparatus by controlling it according to the following procedure. First, an image area representing a person is extracted by analyzing the image data generated by the imaging means. In parallel, the input voice is analyzed to detect a predetermined feature of the voice. Next, whether the composition of the image data is good or bad is determined based on the person extraction result and the voice detection result. Then, the timing for recording the image data is determined based on that determination, and the recording means is controlled so that the image data is recorded at the determined timing.
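The procedure of the first control method can be sketched as follows. This is a minimal illustration only: the names `PersonResult`, `VoiceResult`, `composition_is_good`, and `should_record` are hypothetical, and the rule used for a "good" composition (at least one face in frame and the voice feature detected) is an assumption made for the sketch, not a rule stated in this passage.

```python
from dataclasses import dataclass


@dataclass
class PersonResult:
    face_count: int  # number of faces found in the current frame


@dataclass
class VoiceResult:
    feature_detected: bool  # True if the predetermined voice feature was heard


def composition_is_good(person: PersonResult, voice: VoiceResult) -> bool:
    # Assumed rule: a person must be in frame AND the voice feature must be present.
    return person.face_count > 0 and voice.feature_detected


def should_record(person: PersonResult, voice: VoiceResult) -> bool:
    # Recording is triggered only when the composition is judged good,
    # so a voice alone (with nobody in frame) never triggers a shot.
    return composition_is_good(person, voice)
```

Under this assumed rule, a trigger phrase heard while no face is in the frame does not cause a recording, which is the behavior the first apparatus is meant to guarantee.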

The second control method of the present invention causes an imaging apparatus to operate as the second imaging apparatus by controlling it according to the following procedure. First, an image area representing a person is extracted by analyzing the image data generated by the imaging means. In parallel, the input voice is analyzed to detect a predetermined feature of the voice. Next, whether the composition of the image data is good or bad is determined based on the person extraction result and the voice detection result. Then, by controlling the operation of a predetermined output means, the user is notified of the arrival of the determined timing.

As an embodiment of the method and apparatus of the present invention, a digital camera that selectively uses the first control method and the second control method is disclosed below. This digital camera has four operation modes: a normal shooting mode, an image playback mode, an automatic shooting mode, and a shooting assist mode. The digital camera set to the automatic shooting mode corresponds to the first imaging apparatus of the present invention, and the digital camera set to the shooting assist mode corresponds to the second imaging apparatus. However, the two apparatuses and control methods of the present invention can each be implemented independently and are not limited to the embodiment described below.

First, the configuration of the digital camera is described. FIGS. 1A and 1B show the appearance of the digital camera 1; FIG. 1A shows the front and FIG. 1B shows the back. As shown in these figures, the digital camera 1 includes an imaging lens 2, a shutter release button 3, a microphone 4, operation dials or buttons 5a to 5f, a monitor 6, and an LED lamp 9. The lower part of the digital camera 1 has a speaker 8 and an openable slot cover (not shown), and inside the slot cover is a card slot into which a memory card 7 is loaded.

FIG. 2 shows the internal configuration of the digital camera 1. As shown in this figure, the digital camera 1 includes an imaging unit consisting of the imaging lens 2, a lens driving unit 16, a diaphragm 13, a diaphragm driving unit 17, a CCD 14, and a timing generator (TG) 18. The imaging lens 2 is made up of several lenses with distinct functions, such as a focus lens for focusing on the subject and a zoom lens for realizing the zoom function. The lens driving unit 16 is a small motor such as a stepping motor, and adjusts the position of each of these lenses so that its distance from the CCD 14 suits the purpose. The diaphragm 13 consists of a plurality of diaphragm blades. The diaphragm driving unit 17 is a small motor such as a stepping motor, and adjusts the position of the diaphragm blades so that the aperture size suits the purpose. The CCD 14 is a CCD with 5 to 12 million pixels and a primary color filter, and discharges its accumulated charge in response to an instruction signal from the timing generator 18. The timing generator 18 sends signals to the CCD 14 so that charge accumulates in the CCD 14 only for the desired time, thereby adjusting the shutter speed.

The digital camera 1 also includes an A/D conversion unit 15 that converts the output signal of the CCD 14 into a digital signal, an image input control unit 23 that transfers the image data output by the A/D conversion unit 15 to other processing units via a system bus 24, and a memory 22 that temporarily stores the image data transferred from the image input control unit 23.

The digital camera 1 also includes a focus adjustment unit 20 that performs focusing by instructing the lens driving unit 16 to move the lens, and an exposure adjustment unit 21 that determines the aperture value and shutter speed and sends instruction signals to the diaphragm driving unit 17 and the timing generator 18. The digital camera 1 further includes an image processing unit 25 that applies image processing to the image data stored in the memory 22. The image processing unit 25 performs various finishing processes that improve the appearance of an image, such as color gradation correction and brightness correction to give the image natural color and brightness, correction of red eyes to black when the image data contains red eyes, and correction of the composition when the composition is poor. The image data processed by the image processing unit 25 is stored in the memory 22 again.

The digital camera 1 also includes a display control unit 26 that controls output of the image data stored in the memory 22 to the monitor 6. The display control unit 26 thins out the pixels of the image data stored in the memory 22 to a size suitable for display and then outputs the result to the monitor 6. The display control unit 26 also controls the display of setting screens for operating conditions and the like.
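The pixel thinning performed before display can be illustrated with a simple decimation sketch. The function name `thin_pixels`, the list-of-rows image representation, and the choice of a single integer step for both axes are assumptions made for illustration; the text does not say how the display control unit 26 actually thins the data.

```python
def thin_pixels(image, max_width, max_height):
    """Decimate a row-major image (a list of rows) to fit a display size.

    Keeps every `step`-th pixel in both directions, where `step` is the
    smallest integer that brings both dimensions within the limits.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Ceiling division without floats: -(-a // b) == ceil(a / b)
    step = max(1, -(-width // max_width), -(-height // max_height))
    return [row[::step] for row in image[::step]]
```

An image that already fits the display is returned at its original resolution because the step is never smaller than 1.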

The digital camera 1 also includes a recording/reading control unit 27 that controls writing of the image data stored in the memory 22 to the memory card 7 and loading of the image data recorded on the memory card 7 into the memory 22. Depending on the user's settings, the recording/reading control unit 27 records the image data acquired by shooting on the memory card 7 as an Exif (Exchangeable Image File Format) file, either as is or after compression encoding. Exif is a file format standard established by the Japan Electronic Industry Development Association (JEIDA). When an operation requesting playback of an image file recorded on the memory card 7 is performed, the image data contained in the Exif file is loaded into the memory 22. If the image data is compressed, it is decompressed before being loaded into the memory 22.

The digital camera 1 also includes an LED control unit 19 that controls turning the LED lamp 9 on and off. It further includes the microphone 4, the speaker 8, an A/D conversion unit 10, a D/A conversion unit 11, and an audio input/output control unit 12 that controls audio input and output. Audio data input from the microphone 4 and converted into digital data by the A/D conversion unit 10 is transferred by the audio input/output control unit 12 to the memory 22 via the system bus 24 and stored there. Audio data supplied to the audio input/output control unit 12 from any processing unit, or from the overall control unit described later, is converted by the D/A conversion unit 11 and then output to the speaker 8.

The digital camera 1 also includes a timing detection unit 28 that detects the timing at which an image should be acquired. The timing detection unit 28 analyzes the image data and audio data stored in the memory 22 and, when those data satisfy a predetermined condition, outputs a signal indicating that the timing for acquiring an image has arrived.

In addition, the digital camera 1 includes an overall control unit 30 consisting of a CPU (Central Processing Unit) 31, a RAM (Random Access Memory) 32 in which operation and control programs are stored, and an EEPROM (Electrically Erasable Programmable Read-Only Memory) 33 in which various setting values are stored. The CPU 31 of the overall control unit 30 refers to the setting values stored in the EEPROM 33 and, based on them, selects and executes a program stored in the RAM 32. The overall control unit 30 thus detects operation of the shutter release button 3 and the operation dials/buttons 5a to 5f, or receives the processing results of the individual processing units, and sends instruction signals specifying the processing to be executed to the LED control unit 19, the focus adjustment unit 20, the exposure adjustment unit 21, the image input control unit 23, the image processing unit 25, the display control unit 26, the recording/reading control unit 27, the timing detection unit 28, and the audio input/output control unit 12. The operation of the digital camera 1 is controlled in this way.

In the normal shooting mode, the automatic shooting mode, and the shooting assist mode, an image is acquired as the processing units perform focus adjustment, exposure control, flash control, image processing, recording, and so on under the control of the overall control unit 30. In the playback mode, an image recorded on the memory card 7 is output to the monitor 6 under the control of the overall control unit 30. In the setting mode, a setting screen is displayed on the monitor 6 under the control of the overall control unit 30, and operation inputs from the operation dials/buttons 5a to 5f are accepted. Information that the user selects on the setting screen with the operation dials/buttons 5a to 5f, or information imported from the memory card 7, is stored in the EEPROM 33.

The automatic shooting mode and the shooting assist mode are described further below. FIG. 3 is a flowchart outlining the operation of the digital camera 1 when set to the automatic shooting mode. When set to the automatic shooting mode, the digital camera 1 starts generating image data representing the scene captured through the lens (S101). It then determines the composition of the image represented by the generated image data (S102) and, if the composition is good, records the image on the memory card 7 regardless of whether the shutter release button 3 has been operated (S103). If the composition is poor, it proposes a better composition (S104) and controls the operation of the imaging unit, or has the image processing unit 25 perform predetermined processing, so that the image data generated in step S101 takes on the composition proposed in step S104 (S105). For example, when the main subject is too small, the imaging unit is made to zoom. When the main subject is placed off to one side, the image processing unit 25 is instructed to trim out only the area containing the subject and to move or enlarge it. Alternatively, if a subject that should be standing upright appears tilted, rotation processing is applied so that the subject becomes vertical.

The image data regenerated (S101) by the imaging unit or the image processing unit 25 is evaluated again in step S102. This processing is repeated until a mode switching operation is detected (S106).
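The S101 to S106 loop of the automatic shooting mode can be sketched as below. The function `auto_shooting_loop` and its callback parameters (`judge`, `propose`, `adjust`, `record`, `mode_switched`) are hypothetical stand-ins for the camera's processing units; checking the mode switch once per frame is also an assumption about the flowchart's structure.

```python
def auto_shooting_loop(frames, judge, propose, adjust, record, mode_switched):
    """Run the automatic shooting flow over a stream of frames.

    frames        -- iterable of generated image data            (S101)
    judge         -- returns True when the composition is good   (S102)
    record        -- stores a frame on the recording medium      (S103)
    propose       -- returns a better composition for a frame    (S104)
    adjust        -- applies the proposal (zoom, trim, rotate)   (S105)
    mode_switched -- returns True when the user leaves the mode  (S106)
    """
    for frame in frames:
        if judge(frame):
            # Good composition: record without waiting for the shutter button.
            record(frame)
        else:
            # Poor composition: propose and apply a correction, then re-evaluate
            # on the next frame.
            adjust(propose(frame))
        if mode_switched():
            break
```

Passing the processing steps as callbacks keeps the sketch independent of any particular judging or correction rule.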

FIG. 4 is a flowchart outlining the operation of the digital camera 1 when set to the shooting assist mode. When set to the shooting assist mode, the digital camera 1 starts generating image data representing the scene captured through the lens (S201). It then determines (evaluates) the composition of the image represented by the generated image data (S202).

If the composition is good, the arrival of the timing is announced (S203). FIGS. 5 and 6 show examples of this notification. FIG. 5 shows an example in which the arrival of the timing is announced by displaying on the monitor 6 a mark 34 prompting the user to press the shutter release button. Instead of the mark, a message such as "Now is your chance to shoot" may be displayed. FIG. 6 shows an example in which the arrival of the timing is announced by blinking the LED lamp 9. Announcing the timing by audio output from the speaker is also conceivable.

If the composition is poor, a better composition is proposed (S204). The proposed composition is then displayed on the monitor 6 (hereinafter referred to as assist display) to prompt the photographer to change how the camera is held or to operate a predetermined operation button (S205). FIGS. 7 and 8 show examples of assist display. FIG. 7 shows an example in which a preferable framing is suggested by superimposing a framing frame 35 on the image data as shot. FIG. 8 shows an example in which an image with preferable framing, generated by image processing, is displayed together with a mark 36 at the edge of the screen that suggests how to change the framing in order to acquire image data like the displayed image. A preferable framing may also be suggested by displaying, or outputting as audio, a message such as "Please zoom in." or "Please turn the camera slightly to the left." In the shooting assist mode, this processing is repeated until a mode switching operation is detected (S206).
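One pass through the shooting assist flow (S202 to S205) differs from the automatic mode only in what happens after the judgment: the camera tells the user what to do instead of acting itself. A minimal sketch, in which the function `assist_step`, the tuple return format, and the message strings are all hypothetical:

```python
def assist_step(frame, judge, propose):
    """Return the action the camera should present to the user for one frame."""
    if judge(frame):                      # S202: composition is good
        # S203: announce the timing (mark on the monitor, LED blink, or audio).
        return ("notify", "Press the shutter release button")
    # S204 and S205: propose a better composition and show it as assist display.
    return ("assist", propose(frame))
```

The caller would repeat this step for each new frame until a mode switching operation is detected (S206).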

Next, the composition determination processing of steps S102 and S202 and the composition proposal processing of steps S104 and S204 are described in more detail. Composition determination and composition proposal are performed by the timing detection unit 28 in FIG. 2. FIG. 9 shows the configuration of the timing detection unit 28. As shown in the figure, the timing detection unit 28 consists of a person extraction means 41, a non-person extraction means 42, a voice analysis means 43, a composition determination means 44, and a composition proposal means 45. The timing detection unit 28 may be a circuit built from an LSI that functions as the means 41 to 45, or a microcomputer incorporating a software program that executes the processing of the means 41 to 45.

The person extraction means 41 reads the image data stored in the memory 22 and searches it for person areas. In this embodiment, persons are detected by searching for faces. When the person extraction means 41 detects a face, it assigns an identifier such as a serial number and then calculates the area of the face region, the area of the region representing the whole body including the face (hereinafter, the whole-body region), and the centroid coordinates. For the centroid, it uses the centroid of the face region when the face area exceeds a predetermined value, and the centroid of the whole-body region when the face area is at or below that value. For example, when the face regions are relatively large, as in FIGS. 10A and 10B, the centroid of each face region is calculated. When the face regions are small, as in FIGS. 10C and 10D, the centroid of the whole-body region, indicated by a cross mark, is calculated. After searching the entire image, the person extraction means 41 stores, as information indicating the number, position, and size of the detected persons, the total number of detected persons, the extent and area of each person's face region, the extent and area of each whole-body region, and the centroid coordinates in a memory (not shown) provided in the timing detection unit 28.
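The choice between the face centroid and the whole-body centroid can be sketched as follows. The `Region` class, the assumption that regions are axis-aligned rectangles (so the centroid is the rectangle's center), and the function name `person_centroid` are illustrative assumptions; the text does not specify the region shape or the threshold value.

```python
from dataclasses import dataclass


@dataclass
class Region:
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def centroid(self) -> tuple:
        # Center of an axis-aligned rectangle (assumed region shape).
        return (self.left + self.width / 2, self.top + self.height / 2)


def person_centroid(face: Region, body: Region, face_area_threshold: float) -> tuple:
    # Face area above the threshold: use the face centroid.
    # Otherwise (at or below the threshold): use the whole-body centroid.
    if face.area > face_area_threshold:
        return face.centroid
    return body.centroid
```

With a large face, the position of the person is represented by the face itself; with a small face, the whole-body region gives a more stable position.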

Various methods for searching for and detecting faces are known, as introduced in JP 2001-51338 A, including methods that detect skin-colored areas as faces and methods that decide whether an area is a face based on the presence of parts with geometric features, such as hair, eyes, and mouth. Any known method can be used for the face detection processing executed by the person extraction means 41.

The person extraction means 41 then identifies the expression of each detected face. However, expressions are identified only when the expression identification function is set to ON in the detailed settings of the automatic shooting mode. Alternatively, expressions may be identified only when the size of the detected face is at or above a predetermined value. In this embodiment, the person extraction means 41 identifies four expressions: the laughing face, angry face, crying face, and surprised face illustrated in FIGS. 11A, 11B, 11C, and 11D. As the examples in the figures make clear, each of these expressions is characterized by how far the eyes and mouth are open and how far the eyebrows and corners of the mouth are raised, so the expression can be identified from the image features of those parts. Various methods for identifying expressions are known, including the one introduced in JP 2001-51338 A, and any known method can be used for the expression identification processing executed by the person extraction means 41. The person extraction means 41 stores the identified expression in the memory provided in the timing detection unit 28.

The person extraction means 41 further identifies gestures made by the person whose face was detected. Gestures are identified only when the gesture identification function is set to ON in the detailed settings of the automatic shooting mode. Alternatively, expressions may be identified when the size of the detected face is at or above a predetermined value, and gestures when it is at or below that value.

In the present embodiment, well-known gestures are registered in advance in the memory provided in the timing detection unit 28, as data representing the geometric features of each gesture. Registered gestures include, for example, the gesture of raising only the index and middle fingers illustrated in FIG. 12A (peace sign), the gesture of raising both arms illustrated in FIG. 12B (banzai, fist pump), the gesture of forming a circle with the thumb and index finger (OK or circle), and the gesture of raising only the thumb and pointing it upward (thumbs up). The person extraction means 41 compares the geometric features extracted from the area around the face in the image data read from the memory 22 with the registered data. When the extracted features match those of a registered gesture, the gesture name or a predetermined gesture identifier is stored in the memory provided in the timing detection unit 28.

Various methods for identifying gestures are also known, including the method introduced in Japanese Patent Laid-Open No. 2001-51338. Any known method can be used for the gesture identification processing executed by the person extraction means 41.

The person extraction means 41 then calculates the total area of the face regions. In the examples of FIGS. 10A to 10D, it calculates the total area of the regions shown as dotted frames in each figure. The total area of the whole-body regions may be calculated instead.

When the calculated total region area exceeds a predetermined threshold, the person extraction means 41 supplies the information stored in the memory of the timing detection unit 28 (the number of faces, the area of each face region, the area of each whole-body region, the center-of-gravity coordinates, the expressions, and the gestures) only to the composition determination means 44. When the calculated total region area is less than or equal to the threshold, it supplies the stored information to both the composition determination means 44 and the non-person extraction means 42.
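The routing rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `route_person_info` and the string labels for the receiving units are hypothetical.

```python
def route_person_info(region_areas, threshold):
    """Decide which units receive the stored person information.

    region_areas: areas of the detected face (or whole-body) regions.
    threshold:    the predetermined area threshold (value is an assumption).
    """
    total_area = sum(region_areas)
    if total_area > threshold:
        # People dominate the frame: skip non-person extraction.
        return ["composition_determination"]
    # People occupy little of the frame: also look for other subjects.
    return ["composition_determination", "non_person_extraction"]
```

Under this reading, non-person extraction runs only when the people occupy a small part of the frame.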

The non-person extraction means 42 extracts main subjects other than people from the image data. In the present embodiment, the non-person extraction means 42 reads the image data stored in the memory 22 and erases the person-region data from it, for example by replacing with 0 the pixel values of the regions corresponding to a person, including the face or the body. Suppose, for example, that the read image data contains persons 50a and 50b and a non-person subject 51, as illustrated in FIG. 13A, and that the person extraction means 41 has supplied information such as the center of gravity for the regions 52a and 52b enclosed by dotted frames in the figure. Erasing the pixel data inside the person regions 52a and 52b from this image data yields image data containing only the subject 51, as illustrated in FIG. 13B.
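The erasing step can be sketched as below, assuming a grayscale image held as a list of rows and person regions given as axis-aligned boxes. The box format `(top, left, bottom, right)` and the function name `erase_regions` are assumptions made for illustration.

```python
def erase_regions(image, regions):
    """Return a copy of `image` with every pixel inside each region set to 0.

    image:   list of rows, each a list of pixel values.
    regions: list of (top, left, bottom, right) boxes; bottom/right exclusive.
    """
    out = [row[:] for row in image]  # leave the caller's image untouched
    for top, left, bottom, right in regions:
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 0
    return out
```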

Next, the non-person extraction means 42 applies a high-pass filter to the image data from which the information of the person regions 52a and 52b has been erased. This yields an edge image 53 in which the edges of the subject 51 have been extracted, as illustrated in FIG. 13C. Because the edge image 53 consists of the contours of the non-person subjects contained in the image data, analyzing it makes it possible to identify the rough region 54 in which the subject 51 is located, as illustrated in FIG. 13D. The non-person extraction means 42 calculates the area and center-of-gravity coordinates of the identified region 54 and supplies them to the composition determination means 44.
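The sketch below illustrates one way to obtain an edge image and a rough subject region. The source does not specify the filter, so a 4-neighbour Laplacian kernel stands in for the high-pass filter, and the bounding box of the edge pixels stands in for region 54. Both choices, and all names, are assumptions.

```python
def laplacian_edges(image, threshold):
    """Mark pixels whose 4-neighbour Laplacian response exceeds `threshold`."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left as non-edges
        for x in range(1, w - 1):
            response = (4 * image[y][x] - image[y - 1][x] - image[y + 1][x]
                        - image[y][x - 1] - image[y][x + 1])
            if abs(response) > threshold:
                edges[y][x] = 1
    return edges


def rough_region(edges):
    """Bounding box, area and centre of all edge pixels, or None if no edges."""
    points = [(x, y) for y, row in enumerate(edges)
              for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    return {
        "box": (left, top, right, bottom),
        "area": (right - left + 1) * (bottom - top + 1),
        "centroid": ((left + right) / 2, (top + bottom) / 2),
    }
```

A bounding box is the crudest possible region estimate; an image with several separate subjects would need connected-component labelling instead.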

As a method of identifying non-person subject regions, a Fourier transform may be used to extract only specific frequency components (those corresponding to edges) instead of filtering with a high-pass filter. The main subject may also be extracted by analysis based on color information rather than frequency analysis. For example, a pixel value that represents a predetermined color is left unchanged, and a pixel value that represents any other color is replaced with 0 or 1. This divides the image into two regions and extracts the region containing subjects of the predetermined color, or the region containing subjects of other colors. In addition, objects that tend to be photographed together with people (for example, animals commonly kept as pets) can be identified with a learning-based discrimination algorithm such as the AdaBoost algorithm to generate data indicating the subject region.
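The color-based variant can be sketched as follows, assuming RGB pixels stored as tuples. The source says only "a predetermined color", so the per-channel tolerance test, the default values, and the name `color_mask` are assumptions.

```python
def color_mask(image, target, tolerance=30, fill=(0, 0, 0)):
    """Keep pixels close to `target`; replace every other pixel with `fill`.

    image:     list of rows of (r, g, b) tuples.
    target:    the predetermined colour as an (r, g, b) tuple.
    tolerance: maximum per-channel difference still treated as the colour.
    """
    def is_target(pixel):
        return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))

    return [[pixel if is_target(pixel) else fill for pixel in row]
            for row in image]
```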

Note that the sharpness of an image may depend on the shutter speed at which the image data was acquired, and the color of an image may depend on the photometric value, the aperture, and so on. Taking into account the adjustment values and setting values in effect when the image data was generated may therefore make it easier to identify the subject region during image analysis.

The voice analysis means 43 analyzes the sound input from the microphone 4 and detects the following sounds. First, the voice analysis means 43 continuously measures the volume of the sound input from the microphone 4 and compares it with a predetermined threshold. FIG. 14 shows a graph with time on the horizontal axis and volume on the vertical axis. The voice analysis means 43 detects the time point T in the example of FIG. 14, that is, the point at which the volume changes abruptly and exceeds the threshold Th. When shooting at a sporting event or a party, the moment a cheer goes up, such as a goal in a soccer match or a toast at a wedding, is often a photo opportunity. Detecting the point at which the volume changes abruptly therefore makes it possible to detect a photo opportunity. Alternatively, instead of detecting a change in volume, the means may simply detect that the volume exceeds the threshold Th, because the whole period during which cheering continues can be regarded as a photo opportunity. Conversely, when the photo opportunity arrives when things become quiet, for example when photographing a sleeping baby's face, the means may detect the point at which the volume falls below the threshold, or the state in which it remains below the threshold. Which of these is detected from the volume analysis can be changed by a setting.
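The four volume-based detection variants can be sketched on a sequence of sampled volume levels. In this sketch the "abrupt change" is modelled simply as a threshold crossing between two consecutive samples. That simplification, the mode names, and the function name are all assumptions.

```python
def detect_volume_events(volumes, threshold, mode="rise"):
    """Return sample indices that qualify as photo opportunities.

    mode "rise":  volume crosses from at-or-below to above the threshold
    mode "above": volume is above the threshold
    mode "fall":  volume crosses from at-or-above to below the threshold
    mode "below": volume is below the threshold
    """
    if mode == "rise":
        return [i for i in range(1, len(volumes))
                if volumes[i - 1] <= threshold < volumes[i]]
    if mode == "above":
        return [i for i, v in enumerate(volumes) if v > threshold]
    if mode == "fall":
        return [i for i in range(1, len(volumes))
                if volumes[i - 1] >= threshold > volumes[i]]
    if mode == "below":
        return [i for i, v in enumerate(volumes) if v < threshold]
    raise ValueError(f"unknown mode: {mode}")
```

The `mode` argument plays the role of the camera setting that selects which time point is detected.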

The voice analysis means 43 also identifies the phrase represented by the sound and compares it with specific phrases registered in advance. This registration data is stored in the memory provided in the timing detection unit 28. Registered phrases are those likely to be uttered in time with pressing the shutter release button, such as "Hai, chiizu" ("Say cheese") and "Kanpai" ("Cheers"). In the present embodiment, a voice can also be registered as registration data, and a voice and a phrase can be registered in association with each other. By matching against the registration data, the voice analysis means 43 can detect (a) the point at which a registered phrase is uttered, (b) the point at which a registered person speaks, or (c) the point at which a registered person utters a registered phrase. Which of (a), (b), and (c) is detected is determined by a setting in principle. However, processing different from the setting may be executed depending on what has been registered. For example, even if (c) is set as the point to be detected, the point (a) is detected if no voice has been registered.
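The selection among (a), (b), and (c), including the fallback from (c) to (a) when no voice is registered, can be sketched as follows. The function names and boolean inputs are hypothetical, and the phrase and speaker recognition that would produce those booleans is outside the sketch.

```python
def resolve_mode(configured_mode, voice_registered):
    """Apply the fallback: (c) needs a registered voice, otherwise use (a)."""
    if configured_mode == "c" and not voice_registered:
        return "a"
    return configured_mode


def is_trigger(mode, phrase_matched, speaker_matched):
    """Decide whether the current utterance counts as the detected time point.

    (a) a registered phrase was uttered
    (b) a registered person spoke
    (c) a registered person uttered a registered phrase
    """
    if mode == "a":
        return phrase_matched
    if mode == "b":
        return speaker_matched
    if mode == "c":
        return phrase_matched and speaker_matched
    raise ValueError(f"unknown mode: {mode}")
```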

Whether both the volume-based detection and the phrase-matching detection are executed, or only one of them, and if only one then which, depends on the settings.

Next, the processing of the composition determination means 44 will be described. As shown in FIG. 9, the composition determination means 44 is supplied with the image data read from the memory 22, the extraction results of the person extraction means 41 and the non-person extraction means 42, and the detection result of the voice analysis means 43. When no extraction or detection has taken place, a value indicating that there is no information to supply (for example, 0) is input.

FIG. 15 is a flowchart showing an example of the processing of the composition determination means 44. The composition determination means 44 is assumed to receive the following information: from the person extraction means 41, the extent and area of each person's face region, the extent and area of each whole-body region, each person's center-of-gravity coordinates, expression, and gesture; from the non-person extraction means 42, the extent, area, and center-of-gravity coordinates of each non-person subject; and from the voice analysis means 43, the sound detection result.

The composition determination means 44 first evaluates the balance of the arrangement of the subjects, including people (S301). If the person extraction means 41 has detected N people (N is an integer) and the non-person extraction means 42 has detected M subjects (M is an integer), the composition determination means 44 calculates the center of gravity of all N + M regions from the center-of-gravity coordinates of the individual person regions and subject regions. For the image illustrated in FIGS. 13A to 13D, for example, it calculates the coordinates of the center of gravity G of the three regions from the centers of gravity g1 and g2 of the person regions 52a and 52b and the center of gravity g3 of the subject region 54, as shown in FIG. 16. If the center of gravity G lies within a predetermined range 55 at the center of the image, the arrangement is judged to be well balanced; if it lies outside the range 55, the arrangement is judged to be poorly balanced.

When calculating the center of gravity of all N + M regions, the center of gravity of each region may first be weighted according to the area of the region. If larger regions are given larger weights, the overall center of gravity lies closer to the larger regions. In the example of FIG. 17, the center of gravity obtained with equal weights is the point GA outside the range 55, so the arrangement is judged to be poorly balanced. With the calculation that gives larger weights to larger regions, however, the center of gravity is the point GB inside the range 55, and the arrangement is judged to be well balanced.
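The balance evaluation can be sketched as below, covering both the equally weighted and the area-weighted center of gravity. The source does not give the size of the range 55, so the `central_fraction` parameter, the rectangular shape of the range, and the function names are assumptions.

```python
def overall_centroid(regions, weight_by_area=False):
    """Centre of gravity of all N + M regions.

    regions: list of (cx, cy, area) tuples, one per person or subject region.
    """
    weights = [area if weight_by_area else 1.0 for _, _, area in regions]
    total = sum(weights)
    gx = sum(w * cx for w, (cx, _, _) in zip(weights, regions)) / total
    gy = sum(w * cy for w, (_, cy, _) in zip(weights, regions)) / total
    return gx, gy


def is_balanced(centroid, image_size, central_fraction=0.3):
    """True if the centroid lies in a central rectangle of the image.

    central_fraction: width/height of the rectangle relative to the image.
    """
    width, height = image_size
    gx, gy = centroid
    return (abs(gx - width / 2) <= central_fraction * width / 2
            and abs(gy - height / 2) <= central_fraction * height / 2)
```

With two small regions on the left and one large region right of center, the unweighted centroid can fall outside the central range while the area-weighted one falls inside, which mirrors the GA versus GB situation of FIG. 17.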

For some subjects, the composition determination means 44 evaluates rotational blur as well as arrangement balance. Rotational blur here means that the orientation of a subject in the image differs from its orientation in the real world. For example, when a high-rise building that stands vertically appears at an angle in the image, rotational blur is judged to be present. In a configuration where the non-person extraction means 42 extracts subjects with a learning-based discrimination algorithm, the non-person extraction means 42 can determine not only the contour of a subject but also its type. For such subjects, information indicating the subject type is supplied from the non-person extraction means 42 to the composition determination means 44. When an extracted subject is something that is vertical or horizontal in the real world, such as a high-rise building or the horizon, the composition determination means 44 calculates the orientation of the extracted subject and determines whether rotational blur is present.

When the composition determination means 44 determines in step S302 that the arrangement is poorly balanced or that rotational blur is present, it outputs a determination result that the composition is poor (NG) (S306).

When the composition determination means 44 determines that the arrangement is well balanced and that there is no rotational blur, it determines, based on the expression information supplied from the person extraction means 41, whether the person's expression is a specific expression that warrants pressing the shutter release button (S303). Alternatively, it compares the information with the expression information supplied immediately before and determines whether the expression has changed. The expression determination may be performed only when the area of the face region is greater than or equal to a predetermined value. If the expression is a specific expression (or has changed), a determination result that the composition is good (OK) is output (S307).

If the expression is not a specific expression (or has not changed), the composition determination means 44 determines, based on the gesture information supplied from the person extraction means 41, whether the person is making a gesture that warrants pressing the shutter release button (S304). Alternatively, it compares the information with the gesture information supplied immediately before and determines whether the person's movement has changed. The gesture determination may be performed only when the area of the person region is greater than or equal to a predetermined value. If a specific gesture or movement is found, a determination result that the composition is good is output (S307).

If no specific gesture or movement is found, the composition determination means 44 determines, based on the information supplied from the voice analysis means 43, whether a specific sound has been detected (S305). If no specific sound has been detected, it outputs a determination result that the composition is poor (S306); if a specific sound has been detected, it outputs a determination result that the composition is good (S307).
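The decision flow of steps S301 to S307 can be summarized in a short sketch. The boolean inputs stand for the results of the individual evaluations described above, and the function name and the string return values are assumptions.

```python
def judge_composition(balanced, rotational_blur,
                      expression_ok, gesture_ok, sound_detected):
    """Follow the flow of FIG. 15 and return "OK" or "NG"."""
    # S301/S302: arrangement balance and rotational blur
    if not balanced or rotational_blur:
        return "NG"   # S306
    # S303: specific expression, or a change of expression
    if expression_ok:
        return "OK"   # S307
    # S304: specific gesture, or a change of movement
    if gesture_ok:
        return "OK"   # S307
    # S305: specific sound detected
    if sound_detected:
        return "OK"   # S307
    return "NG"       # S306
```

As the sketch makes explicit, a detected sound alone never yields "OK": the arrangement check has to pass first.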

When the composition determination means 44 outputs a determination result that the composition is good, the timing detection unit 28 sends that result to the overall control unit 30. On receiving the result, if the digital camera 1 is set to the automatic shooting mode, the overall control unit 30 instructs the recording/reading control unit 27 to record the image data stored in the memory 22 on the memory card 7. If the camera is set to the shooting assist mode, the overall control unit 30 instructs the display control unit 26 to display on the monitor a mark or message announcing the arrival of a photo opportunity (FIG. 5). Alternatively, it instructs the LED control unit 19 to turn on the LED 9 (FIG. 6).

In the present embodiment, the recording/reading control unit 27 records the information used for the composition determination on the memory card 7 as information attached to the image data. Specifically, the information is recorded in the tag portion of the Exif file. When the composition determination means 44 determines that the composition is poor, on the other hand, it supplies the information used for the composition determination to the composition proposal means 45. The composition proposal means 45 uses this information to execute the processing described below.

The composition proposal means 45 analyzes the information supplied from the composition determination means 44 and proposes a more preferable composition for an image whose composition was judged to be poor. Proposing a composition here means determining an arrangement of the subjects in the image that satisfies the criteria of the composition determination. The determined composition is output together with information on the processing to be executed to obtain an image with that composition. For example, when the regions 52a, 52b, and 54 extracted from the captured image are concentrated in the lower left as illustrated in FIG. 18A, the means proposes a composition in which the center of gravity G of the regions 52a, 52b, and 54 is placed at the center of the image, as illustrated in FIG. 18B. Alternatively, it proposes a composition in which the center of gravity G of the regions 52a, 52b, and 54 is placed at the center of the image and the subjects appear larger, as illustrated in FIG. 18C. It then outputs two kinds of information on the processing to be executed to obtain an image with that composition.

The first information output by the composition proposal means 45 is the information needed to convert the image data acquired by shooting into image data with a preferable composition by image processing. In the example of FIG. 18B, the trimming range (the thick frame in the figure) and the movement direction (movement vector) of the center of gravity G are output. In the example of FIG. 18C, information such as the trimming range, the movement direction of the center of gravity, and the enlargement ratio is output.

The second information output by the composition proposal means 45 is the information needed to acquire image data with a more preferable composition by shooting again. In the example of FIG. 18B, information indicating that the camera should be pointed further to the left is output. In the example of FIG. 18C, information indicating that the camera should be pointed further to the left is output together with the shooting magnification to be set.
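The two kinds of output can be sketched for the simple case of recentering the center of gravity G. The sketch assumes image coordinates with the origin at the top left and y increasing downward. The source describes only a leftward camera hint, so the vertical hints, the hint wording, and the function name are assumptions.

```python
def propose_recentering(centroid, image_size):
    """Return (movement_vector, camera_hints) that would centre G.

    movement_vector: (dx, dy) by which G must move in the image
                     (first information, used for image processing).
    camera_hints:    operations for re-shooting (second information).
    """
    width, height = image_size
    gx, gy = centroid
    dx, dy = width / 2 - gx, height / 2 - gy

    hints = []
    if dx > 0:        # subjects sit left of centre
        hints.append("point camera further left")
    elif dx < 0:
        hints.append("point camera further right")
    if dy < 0:        # subjects sit below centre (y grows downward)
        hints.append("point camera further down")
    elif dy > 0:
        hints.append("point camera further up")
    return (dx, dy), hints
```

For subjects concentrated in the lower left, as in FIG. 18A, the sketch yields a movement vector toward the upper right and a hint to point the camera further left, consistent with the FIG. 18B example.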

Rotational blur is another cause of a poor composition determination. In that case, the rotation direction and angle for correcting the tilt are output as the first information, and information indicating that the camera should be tilted to the left or right is output as the second information. When the cause of the poor composition determination is the expression, gesture, or sound, the image cannot be corrected by image processing, so information indicating that the cause is the expression, gesture, or sound is output.

The information output by the composition proposal means 45 is sent to the overall control unit 30. On receiving it, the overall control unit 30 determines whether the digital camera 1 is set to the automatic shooting mode or the shooting assist mode and performs the processing appropriate to that mode.

When the digital camera 1 is set to the automatic shooting mode, the overall control unit 30 instructs the image processing unit 25 to read the image data from the memory 22 and execute the image processing needed to improve the composition (trimming, translation, scaling, rotation, and so on). It also instructs the display control unit 26 to display the image data processed by the image processing unit 25 on the monitor 6. In addition, it instructs the recording/reading control unit 27 to record the image data processed by the image processing unit 25 on the memory card 7.

In the present embodiment, on receiving this instruction the display control unit 26 displays a selection screen, as illustrated in FIG. 19, that lets the user choose whether to record the image data with the composition as shot (the captured image), the image data with the proposed composition (the proposed image), or both. The recording/reading control unit 27 then records the image data selected on this screen on the memory card 7. Alternatively, the selection screen may be omitted, and the proposed image alone, or both the captured image and the proposed image, may be recorded unconditionally.

In the present embodiment, on receiving the above instruction the recording/reading control unit 27 records on the memory card 7, as information attached to the image data, the information that the composition determination means 44 used for the composition determination: the center-of-gravity coordinates of all N + M regions, the orientation of the subject for which rotational blur was evaluated, the expressions and gestures of the people, the detected sound, and so on. It also records, as information attached to the image data, the first information output by the composition proposal means 45, that is, the information needed to convert the image data acquired by shooting into image data with a preferable composition by image processing. Specifically, this information is recorded in the tag portion of the Exif file.

The information recorded in the tag portion of the Exif file can be used when editing the image on a personal computer. For example, given the captured image and the first information output by the composition proposal means 45, an image equivalent to the proposed image can be generated on the personal computer, so the proposed image need not be recorded and the size of the image file can be kept down. An image with a composition different from the one proposed by the composition proposal means 45 can also be generated by starting from the first information and making small adjustments.

When the digital camera 1 is set to the shooting assist mode, on the other hand, the overall control unit 30 instructs the image processing unit 25 to read the image data from the memory 22 and execute the image processing needed to improve the composition (trimming, translation, scaling, rotation, and so on). It also instructs the display control unit 26 to display the image data processed by the image processing unit 25 together with a mark or message generated from the second information output by the composition proposal means 45. This produces the assist display described with reference to FIGS. 7 and 8.

In the present embodiment, when the digital camera is set to the automatic shooting mode, image data is automatically recorded on the memory card when the composition of the captured image is good and a predetermined sound is detected. Because the digital camera does not respond to the predetermined sound alone, no shooting operation is performed before the camera is pointed at the subject, and no unnecessary shooting operation is triggered by the voice of someone who happens to be nearby. In other words, the inconvenience of unnecessary shooting is eliminated while the convenience of sound-triggered automatic shooting is preserved.

In the shooting assist mode, the photographer is notified of a photo opportunity, but this notification is likewise not issued merely because a predetermined sound has been detected. The photo opportunity is announced only when the composition of the captured image is good and the predetermined sound has been detected, so false notifications do not occur. In the shooting assist mode, the photographer can easily obtain a well-composed image simply by pressing the shutter release button at the notified timing, and thus enjoys the same convenience as automatic shooting.

In the above embodiment, subjects other than people are extracted by the non-person extraction means 42, but a configuration without the non-person extraction means 42 is also conceivable. In that case, the composition determination means determines the composition by evaluating the arrangement of the people alone. Alternatively, the arrangement need not be evaluated at all, and the composition may be judged good if a person is included and poor if not. Various other determination methods, information used for the determination, and determination criteria are conceivable, and the invention is not limited to the above embodiment. Similarly, although the composition proposal means 45 proposes a composition in the above embodiment, a configuration without the composition proposal means 45 is also conceivable.

Although the above embodiment has been described using still image shooting as an example, the present invention is also useful in other cases, such as when recording is to be started automatically in moving image shooting.

Diagram showing the appearance of the digital camera (front)
Diagram showing the appearance of the digital camera (back)
Diagram showing the internal configuration of the digital camera
Flowchart showing an outline of the operation of the digital camera (automatic shooting mode)
Flowchart showing an outline of the operation of the digital camera (shooting assist mode)
Diagram showing an example of timing notification
Diagram showing another example of timing notification
Diagram showing an example of the assist display
Diagram showing another example of the assist display
Diagram showing the configuration of the timing detection unit
Diagram for explaining the face detection processing
Diagram for explaining the face detection processing
Diagram for explaining the face detection processing
Diagram for explaining the face detection processing
Diagram for explaining the facial expression identification processing
Diagram for explaining the facial expression identification processing
Diagram for explaining the facial expression identification processing
Diagram for explaining the facial expression identification processing
Diagram for explaining the gesture identification processing
Diagram for explaining the gesture identification processing
Diagram for explaining the processing for extracting subjects other than persons
Diagram for explaining the processing for extracting subjects other than persons
Diagram for explaining the processing for extracting subjects other than persons
Diagram for explaining the processing for extracting subjects other than persons
Diagram showing an example of voice analysis
Flowchart showing an example of the composition determination processing
Diagram for explaining the composition determination processing
Diagram for explaining the composition determination processing
Diagram for explaining the composition proposal processing
Diagram for explaining the composition proposal processing
Diagram for explaining the composition proposal processing
Diagram showing an example of the recorded image selection screen

Explanation of symbols

1 digital camera, 2 lens, 3 shutter release button, 4 microphone,
5a to 5f operation dials/buttons, 6 monitor, 7 memory card,
8 speaker, 9 LED lamp

Claims

1. An imaging apparatus comprising imaging means that shoots a scene and generates image data representing the scene, and recording means that records the image data generated by the imaging means on a predetermined recording medium, the imaging apparatus further comprising:
person extracting means for extracting an image region representing a person by analyzing the image data generated by the imaging means;
voice analyzing means for detecting a predetermined feature of a voice by analyzing the input voice;
composition determining means for determining whether the composition of the image data is good or bad based on the extraction result of the person extracting means and the detection result of the voice analyzing means; and
recording control means for determining a timing for recording the image data based on the determination result of the composition determining means, and for controlling the recording means so that the image data is recorded at the determined timing.

2. An imaging apparatus comprising imaging means that shoots a scene and generates image data representing the scene, and recording means that records the image data generated by the imaging means on a predetermined recording medium, the imaging apparatus further comprising:
person extracting means for extracting an image region representing a person by analyzing the image data generated by the imaging means;
voice analyzing means for detecting a predetermined feature of a voice by analyzing the input voice;
composition determining means for determining whether the composition of the image data is good or bad based on the extraction result of the person extracting means and the detection result of the voice analyzing means; and
notification means for determining a timing for recording the image data based on the determination result of the composition determining means, and for notifying a user of the arrival of the determined timing.

3. The imaging apparatus according to claim 1, wherein the recording means records the extraction result of the person extracting means and the detection result of the voice analyzing means on the recording medium together with the image data.

4. The imaging apparatus according to claim 1, wherein the voice analyzing means detects a predetermined change in volume as the feature of the voice.

5. The imaging apparatus according to claim 1, wherein the voice analyzing means detects a predetermined voice phrase as the feature of the voice.

6. The imaging apparatus according to claim 1, wherein the voice analyzing means detects, as the feature of the voice, a feature registered in advance as a feature of the voice of a predetermined person.

7. The imaging apparatus according to claim 1, wherein the person extracting means searches for faces included in the image data and outputs, as the extraction result, information indicating the number of faces detected by the search, the position of each face, and the size of each face.

8. The imaging apparatus according to claim 7, wherein the person extracting means identifies the expressions of the faces detected by the search, and further outputs information indicating the identified expressions.

9. The imaging apparatus according to any one of claims 1 to 8, wherein the person extracting means identifies a gesture of a person included in the image data, and outputs information indicating the identified gesture as the extraction result.

10. A method for controlling an imaging apparatus comprising imaging means that captures a scene and generates image data representing the scene, and recording means that records the image data generated by the imaging means on a predetermined recording medium, the method comprising:
extracting an image region representing a person by analyzing the image data generated by the imaging means;
detecting a predetermined feature of a voice by analyzing the input voice;
determining whether the composition of the image data is good or bad based on the extraction result for the person and the detection result for the voice;
determining a timing for recording the image data based on the result of the determination; and
controlling the recording means so that the image data is recorded at the determined timing.

11. A method for controlling an imaging apparatus comprising imaging means that captures a scene and generates image data representing the scene, and recording means that records the image data generated by the imaging means on a predetermined recording medium, the method comprising:
extracting an image region representing a person by analyzing the image data generated by the imaging means;
detecting a predetermined feature of a voice by analyzing the input voice;
determining whether the composition of the image data is good or bad based on the extraction result for the person and the detection result for the voice; and
notifying a user of the arrival of a timing for recording the image data, the timing being determined based on the result of the determination, by controlling the operation of a predetermined output unit.