JP2011160044A

JP2011160044A - Imaging device

Info

Publication number: JP2011160044A
Application number: JP2010017984A
Authority: JP
Inventors: Hiroshi Toshimitsu; 洋利光; Makoto Yamanaka; 誠山中; Norikazu Tsunekawa; 法和恒川; Seiji Okada; 誠司岡田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2010-01-29
Filing date: 2010-01-29
Publication date: 2011-08-18

Abstract

PROBLEM TO BE SOLVED: To automatically reproduce imaging control that matches a user's preferences. SOLUTION: An object detection part detects each object on a target image while sorting it into one of a plurality of categories (persons, dogs, mountains, and the like). Every time a target image is shot, the category, size, and position of the in-focus object are learned in association with the combination of detected categories (for instance, a person and a mountain) of the plurality of objects on the target image. When the shutter button is pressed halfway after a certain amount of learning has been completed, the objects in the input image at that time are sorted, the learning content corresponding to the combination of detected categories for the input image is read from a learning memory, and automatic field angle adjustment and automatic focus adjustment are performed in accordance with the read content. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an imaging apparatus such as a digital camera.

Recent digital cameras are often provided with a function that automatically selects a shooting mode and then shoots a target image (see, for example, Patent Document 1). With this function, the camera automatically recognizes the subject and the shooting scene, selects the optimum shooting mode from among a plurality of shooting modes, and shoots the target image under the shooting conditions (focus, ISO sensitivity, signal processing, and so on) defined for the selected shooting mode. However, the shooting conditions defined for the selected shooting mode are preset by the camera manufacturer. For this reason, some users have had to perform manual operations after the camera control based on the selected shooting mode in order to set the shooting conditions they actually want.

As a way of addressing this problem, a method has been proposed in which the user selects the optimum processing by answering questions (see, for example, Patent Document 2). However, this method imposes the burden of the answering operations on the user.

A method has also been proposed in which subjects are finely classified and the shooting conditions are controlled according to the classification result (see, for example, Patent Document 3). With this method, however, a shooting mode is merely set individually for each specific subject, and the control cannot take into account which other subjects appear together with the subject of interest or in what kind of shooting scene it appears. For example, the optimum shooting conditions when a person is photographed with a dog may differ from those when a person is photographed with a mountain, and what counts as optimum varies from user to user. The above method cannot accommodate individual users' preferences.

Patent Document 1: JP 2003-344891 A
Patent Document 2: JP 2007-110619 A
Patent Document 3: JP 2007-74141 A

It is important to perform shooting control that follows the user's preferences without imposing any particular burden on the user.

Similarly, when acquiring an image accompanied by an acoustic signal, it would be beneficial to perform audio control that follows the user's preferences without imposing any particular burden on the user.

Accordingly, an object of the present invention is to provide an imaging apparatus capable of performing shooting control or audio control that follows the user's preferences without imposing any particular burden on the user.

An imaging apparatus according to the present invention has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject, and generates a target image from the output signal of the image sensor obtained when a predetermined operation is performed. A plurality of target images including first and second target images are generated by repeating the predetermined operation, the second target image being generated after the first target image. The imaging apparatus comprises: a subject detection unit that detects each subject present on an image based on the output signal of the image sensor while classifying it into one of a plurality of categories; a memory unit that takes, as a specific combination, the combination of categories detected by the subject detection unit for the plurality of subjects on the first target image, and stores learning information corresponding to a feature of the first target image or to a generation condition of the first target image in association with the specific combination; and a shooting control unit that takes, as an evaluation image, an image based on the output signal of the image sensor after the generation of the first target image and before the generation of the second target image, and generates the second target image using the learning information when the combination of categories detected by the subject detection unit for the plurality of subjects on the evaluation image matches the specific combination.

This makes it possible to generate a target image using learning information that reflects the user's preferences. In other words, shooting control that follows the user's preferences can be reproduced from the learning information without imposing any particular burden on the user.

Specifically, for example, the learning information is information corresponding to a feature of the first target image, and includes focus information indicating which category of subject, among the plurality of subjects on the first target image, is in focus.

As a result, the user's preferences regarding focus can be reproduced from the learning information.

Further, for example, the learning information also includes size information indicating the size of the in-focus subject on the first target image.

Further, for example, the learning information also includes position information indicating the position of the in-focus subject on the first target image.

As a result, the user's preferences regarding subject size and composition can be reproduced from the learning information.
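The learning information described above (focus, size, and position information stored against a combination of detected categories) can be pictured with a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `LearningInfo` and `LearningMemory`, the use of a normalized area fraction for size, and the normalized center coordinates for position are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class LearningInfo:
    focus_category: str            # category of the in-focus subject
    size: float                    # assumed: subject area / image area
    position: Tuple[float, float]  # assumed: normalized (x, y) center


class LearningMemory:
    """Maps a combination of detected categories to learning information."""

    def __init__(self) -> None:
        self._table: Dict[FrozenSet[str], LearningInfo] = {}

    def store(self, categories: Iterable[str], info: LearningInfo) -> None:
        # A frozenset makes the key independent of detection order.
        # Assumption: repeated categories (two persons) collapse to one.
        self._table[frozenset(categories)] = info

    def lookup(self, categories: Iterable[str]) -> Optional[LearningInfo]:
        # Returns None when this combination has not been learned.
        return self._table.get(frozenset(categories))


memory = LearningMemory()
memory.store(["person", "mountain"], LearningInfo("person", 0.10, (0.5, 0.6)))
hit = memory.lookup(["mountain", "person"])   # same combination, other order
miss = memory.lookup(["person", "dog"])       # combination never learned
```

Keying on the set of categories rather than on a single subject is what lets a person photographed with a mountain be treated differently from a person photographed with a dog.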

Further, for example, the imaging apparatus also includes an operation unit that accepts designation of a generation condition for the first target image, and generates the first target image according to the generation condition designated via the operation unit. In this case, the learning information is information corresponding to the generation condition of the first target image.

As a result, what the user designates via the operation unit can be stored as learning information, and shooting control that follows the user's preferences can thereafter be reproduced from the learning information.

Specifically, for example, when the combination of categories detected by the subject detection unit for the plurality of subjects on the evaluation image matches the specific combination, the shooting control unit does one of the following. Based on the learning information corresponding to the feature of the first target image, it performs focus control and zoom control for the second target image so that the second target image has a feature corresponding to the feature of the first target image. Alternatively, based on the learning information corresponding to the generation condition of the first target image, it generates the second target image under a generation condition corresponding to that generation condition.
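As a rough illustration of the focus and zoom control just described, the sketch below picks the subject to focus on and estimates a zoom ratio that would bring that subject to the learned size. It is a sketch under assumptions, not the method of the specification: the function name `plan_control`, the representation of size as a fraction of the image area, and the square-root relation between area and linear magnification are all choices made here for illustration.

```python
import math
from typing import Dict, Optional, Tuple


def plan_control(detections: Dict[str, float],
                 learned_focus_category: str,
                 learned_size: float) -> Optional[Tuple[str, float]]:
    """Return (category to focus on, linear zoom ratio), or None.

    detections maps each detected category to the fraction of the
    evaluation image its subject currently occupies (assumed format).
    """
    current = detections.get(learned_focus_category)
    if current is None or current <= 0.0:
        # The learned in-focus category is absent, so nothing to reproduce.
        return None
    # Area grows with the square of linear magnification, so the
    # zoom ratio is the square root of the area ratio.
    zoom_ratio = math.sqrt(learned_size / current)
    return learned_focus_category, zoom_ratio


# The person fills 2.5% of the frame now but 10% in the learned shot,
# so the sketch asks for roughly a 2x linear zoom.
plan = plan_control({"person": 0.025, "mountain": 0.40}, "person", 0.10)
```

A real camera would also clamp the ratio to the zoom range of the lens and account for the learned position, both of which are omitted here.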

Another imaging apparatus according to the present invention has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject, and generates a target image from the output signal of the image sensor obtained when a predetermined operation is performed. A plurality of target images including first and second target images are generated by repeating the predetermined operation, the second target image being generated after the first target image. The imaging apparatus comprises: a subject detection unit that detects a subject present on an image based on the output signal of the image sensor while classifying it into one of a plurality of categories; a scene determination unit that determines the shooting scene of an image based on the output signal of the image sensor by selecting it from among a plurality of registered scenes; a memory unit that takes, as a specific combination, the combination of the category detected by the subject detection unit for the subject on the first target image and the scene determined by the scene determination unit for the first target image, and stores learning information corresponding to a feature of the first target image or to a generation condition of the first target image in association with the specific combination; and a shooting control unit that takes, as an evaluation image, an image based on the output signal of the image sensor after the generation of the first target image and before the generation of the second target image, and generates the second target image using the learning information when the combination of the category detected by the subject detection unit for the subject on the evaluation image and the scene determined by the scene determination unit for the evaluation image matches the specific combination.
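In this second configuration the key that selects the learning information pairs a detected category with a determined scene. A minimal Python sketch of that keying follows. The scene names, category names, and stored values are hypothetical examples, and a plain dictionary stands in for the memory unit.

```python
from typing import Dict, Optional, Tuple

# Key: (detected subject category, determined scene).
# Value: learning information. All names and values are hypothetical.
learning_memory: Dict[Tuple[str, str], dict] = {}


def learn(category: str, scene: str, info: dict) -> None:
    """Store learning information for one category/scene pair."""
    learning_memory[(category, scene)] = info


def recall(category: str, scene: str) -> Optional[dict]:
    """Return the stored information, or None if the pair is unknown."""
    return learning_memory.get((category, scene))


# The same category can carry different preferences in different scenes.
learn("person", "night", {"focus_category": "person", "size": 0.20})
learn("person", "beach", {"focus_category": "person", "size": 0.05})
night_info = recall("person", "night")
unknown = recall("person", "snow")
```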

Still another imaging apparatus according to the present invention has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject, and a microphone unit made up of a plurality of microphones. When a predetermined operation is performed, the apparatus generates a target image from the output signal of the image sensor, generates a target acoustic signal based on the output acoustic signals of the plurality of microphones, and associates the target acoustic signal with the target image. A plurality of target images including first and second target images are generated by repeating the predetermined operation, the second target image being generated after the first target image. The imaging apparatus comprises: a subject detection unit that detects each subject present on an image based on the output signal of the image sensor while classifying it into one of a plurality of categories; a memory unit that takes, as a specific combination, the combination of categories detected by the subject detection unit for the plurality of subjects on the first target image, and stores learning information corresponding to a feature of the target acoustic signal associated with the first target image in association with the specific combination; and a target acoustic signal generation unit that takes, as an evaluation image, either an image based on the output signal of the image sensor after the generation of the first target image and before the generation of the second target image, or the second target image itself, and generates the target acoustic signal to be associated with the second target image using the learning information when the combination of categories detected by the subject detection unit for the plurality of subjects on the evaluation image matches the specific combination.

This makes it possible to generate the target acoustic signal using learning information that reflects the user's preferences. In other words, audio control that follows the user's preferences can be reproduced from the learning information without imposing any particular burden on the user.

According to the present invention, it is possible to provide an imaging apparatus capable of performing shooting control or audio control that follows the user's preferences without imposing any particular burden on the user.

The significance and effects of the present invention will become clearer from the description of the embodiment below. However, the following embodiment is merely one embodiment of the present invention, and the meanings of the terms of the present invention and of its constituent elements are not limited to those described in the following embodiment.

An overall block diagram of an imaging apparatus according to an embodiment of the present invention.
An internal configuration diagram of the imaging unit of FIG. 1.
A diagram showing how the operation in the special shooting mode according to the embodiment of the present invention is broadly divided into a learning stage operation and a control stage operation.
A block diagram of the parts particularly involved in the operation in the special shooting mode, according to a first example of the present invention.
A flowchart of the learning stage operation according to the first example of the present invention.
A diagram (a) showing an example of a target input image shot during the learning stage operation, and a diagram (b) showing the subject areas on that target input image.
A diagram showing the structure of the feature information according to the first example of the present invention.
A diagram showing how nine blocks are set within the entire image area of an input image.
A diagram showing the recorded contents of the learning memory when the control stage operation is executed, according to the first example of the present invention.
A flowchart of the control stage operation according to the first example of the present invention.
A diagram showing an evaluation image acquired when the control stage operation is executed, according to the first example of the present invention.
A diagram showing an example of an input image obtained after automatic angle-of-view adjustment, according to the first example of the present invention.
A diagram showing an input image assumed in a fourth example of the present invention.
A diagram showing the structure of the feature information according to the fourth example of the present invention.
A block diagram of the parts particularly involved in the operation in the special shooting mode, according to an eighth example of the present invention.
A flowchart of the learning stage operation according to the eighth example of the present invention.
A diagram showing the structure of the generation condition information according to the eighth example of the present invention.
A diagram showing the recorded contents of the learning memory when the control stage operation is executed, according to the eighth example of the present invention.
A flowchart of the control stage operation according to the eighth example of the present invention.
A diagram showing a scene determination unit according to a ninth example of the present invention.
An internal block diagram of the microphone unit of FIG. 1.
An external perspective view of the imaging apparatus of FIG. 1.
A diagram showing a target input image acquired in the learning stage operation, according to a tenth example of the present invention.
A diagram showing the structure of the sound control information according to the tenth example of the present invention.
A diagram showing the recorded contents of the learning memory when the control stage operation is executed, according to the tenth example of the present invention.
A flowchart of the control stage operation according to the tenth example of the present invention.
A diagram showing an evaluation image acquired when the control stage operation is executed, according to the tenth example of the present invention.
A diagram showing the state of the display screen according to an eleventh example of the present invention.
A diagram showing the state of the display screen according to a twelfth example of the present invention.
A diagram showing feature information based on actual shooting, according to a thirteenth example of the present invention.
A diagram showing artificially generated feature information, according to the thirteenth example of the present invention.

An embodiment of the present invention is described in detail below with reference to the drawings. In the drawings referred to, the same parts are given the same reference signs, and redundant description of the same parts is omitted in principle.

FIG. 1 is an overall block diagram of an imaging apparatus 1 according to an embodiment of the present invention. The imaging apparatus 1 has the parts referred to by reference signs 11 to 28. The imaging apparatus 1 is a digital video camera that can shoot moving images and still images, and can also shoot a still image while shooting a moving image. The parts in the imaging apparatus 1 exchange signals (data) with one another via a bus 24 or 25. The display unit 27 and/or the speaker 28 may also be regarded as being provided in an external device (not shown) of the imaging apparatus 1.

The imaging unit 11 shoots a subject using an image sensor. FIG. 2 is an internal configuration diagram of the imaging unit 11. The imaging unit 11 has an optical system 35, a diaphragm 32, an image sensor (solid-state image sensor) 33 formed of a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) image sensor or the like, and a driver 34 for driving and controlling the optical system 35 and the diaphragm 32. The optical system 35 is formed of a plurality of lenses including a zoom lens 30 for adjusting the angle of view of the imaging unit 11 and a focus lens 31 for focusing. The zoom lens 30 and the focus lens 31 are movable in the optical axis direction. The positions of the zoom lens 30 and the focus lens 31 in the optical system 35 and the opening of the diaphragm 32 are controlled based on a shooting control signal from the CPU 23.

The image sensor 33 is formed by arranging a plurality of light-receiving pixels in the horizontal and vertical directions. Each light-receiving pixel of the image sensor 33 photoelectrically converts the optical image of the subject incident through the optical system 35 and the diaphragm 32, and outputs the resulting electrical signal to the AFE (Analog Front End) 12.

The AFE 12 amplifies the analog signal output from the image sensor 33 (from each light-receiving pixel), converts the amplified analog signal into a digital signal, and outputs it to the image signal processing unit 13. The gain of the signal amplification in the AFE 12 is controlled by the CPU (Central Processing Unit) 23. The image signal processing unit 13 applies the necessary image processing to the image represented by the output signal of the AFE 12 and generates an image signal representing the processed image. The image signal includes, for example, a luminance signal and color difference signals. The microphone unit 14 converts the sound around the imaging apparatus 1 into an analog acoustic signal, and the acoustic signal processing unit 15 converts this analog acoustic signal into a digital acoustic signal.
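The two steps attributed to the AFE 12, amplification with a controllable gain followed by analog-to-digital conversion, can be sketched numerically as follows. This is a simplified model under assumptions that do not come from the specification: the 12-bit depth, the full-scale level of 1.0, and the function name `afe_convert` are all hypothetical.

```python
from typing import List, Sequence


def afe_convert(analog_samples: Sequence[float],
                gain: float,
                bits: int = 12,
                full_scale: float = 1.0) -> List[int]:
    """Amplify analog samples, then quantize them to integer codes.

    bits and full_scale are assumed values chosen for illustration.
    """
    max_code = (1 << bits) - 1
    codes = []
    for value in analog_samples:
        amplified = value * gain
        # Clip to the converter's input range before quantizing.
        clipped = min(max(amplified, 0.0), full_scale)
        codes.append(round(clipped / full_scale * max_code))
    return codes


# With a gain of 2.0, the last sample exceeds full scale and saturates.
codes = afe_convert([0.0, 0.1, 0.6], gain=2.0)
```

Raising the gain brightens dark scenes at the cost of earlier saturation, which is one reason the gain is left under CPU control.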

The compression processing unit 16 compresses the image signal from the image signal processing unit 13 and the acoustic signal from the acoustic signal processing unit 15 using a predetermined compression method. The internal memory 17 is formed of a DRAM (Dynamic Random Access Memory) or the like and temporarily stores various data. The external memory 18, which serves as a recording medium, is a nonvolatile memory such as a semiconductor memory or a magnetic disk, and can record various signals such as the image signal and the acoustic signal compressed by the compression processing unit 16.

The decompression processing unit 19 decompresses the compressed image signal and acoustic signal read from the external memory 18. The image signal decompressed by the decompression processing unit 19, or the image signal from the image signal processing unit 13, is sent via the display processing unit 20 to the display unit 27, which is formed of a liquid crystal display or the like, and is displayed as an image. The acoustic signal decompressed by the decompression processing unit 19 is sent via the acoustic signal output circuit 21 to the speaker 28 and output as sound.

The TG (timing generator) 22 generates timing control signals for controlling the timing of each operation in the imaging apparatus 1 as a whole, and supplies the generated timing control signals to the parts in the imaging apparatus 1. The timing control signals include a vertical synchronization signal Vsync and a horizontal synchronization signal Hsync. The CPU 23 comprehensively controls the operation of each part in the imaging apparatus 1. The operation unit 26 has a recording button 26a for instructing the start and end of moving image shooting and recording, a shutter button 26b for instructing still image shooting and recording, a zoom button 26c for designating the zoom magnification, and so on, and accepts various operations by the user. The content of an operation on the operation unit 26 is transmitted to the CPU 23.

The operation modes of the imaging apparatus 1 include a shooting mode, in which images (still images or moving images) can be shot and recorded, and a playback mode, in which images (still images or moving images) recorded in the external memory 18 are played back and displayed on the display unit 27. Transitions between the modes take place in response to operations on the operation unit 26. In the shooting mode, the imaging apparatus 1 can shoot the subject periodically at a predetermined frame period and sequentially acquire shot images of the subject.

In this specification, the image signal of an image may be referred to simply as the image. Because compression and decompression of the image signal and the acoustic signal are unrelated to the essence of the present invention, the following description ignores them unless they are specifically needed. Thus, for example, recording the compressed image signal of an image may be expressed simply as recording the image signal or recording the image (the same applies to acoustic signals). The size of an image or of an image area is also called the image size. The image size of an image or image area of interest can be expressed as the number of pixels forming that image or belonging to that image area. In this specification, recording a signal or information in a memory may also be expressed as storing it. In addition, a term may be abbreviated by giving only its reference sign or symbol. Thus, for example, the external memory 18 and the memory 18 refer to the same thing.

The imaging apparatus 1 is provided with a special function that extracts information about the preferences of the user, as the photographer, from past shooting results, manual operations, and the like, and that controls shooting based on that information so that a shooting result matching those preferences is reproduced at the present time. The operation mode of the imaging apparatus 1 in which this special function works, which is one kind of shooting mode, is called the special shooting mode.

As shown in FIG. 3, the operation in the special shooting mode is broadly divided into a learning stage operation, which learns the user's preferences, and a control stage operation, which controls shooting using the learning result. The control stage operation is executed after the learning stage operation. Even after the learning stage operation has once been completed and the control stage operation has become executable, learning of the user's preferences can continue and the learning result can be updated.
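The relationship between the two stages, with control becoming available after some learning while learning itself continues, can be sketched as a small state holder. The class name `SpecialShootingMode` and the shot-count threshold `required_shots` are hypothetical. The text says only that the control stage follows the learning stage and does not state how completion of learning is decided.

```python
class SpecialShootingMode:
    """Tracks whether the control stage operation may be executed."""

    def __init__(self, required_shots: int = 3) -> None:
        # required_shots is an assumed completion criterion.
        self.required_shots = required_shots
        self.shots_learned = 0

    def on_target_image_captured(self) -> None:
        # Learning continues on every shot, even after control is enabled,
        # so the learning result keeps being updated.
        self.shots_learned += 1

    @property
    def control_enabled(self) -> bool:
        return self.shots_learned >= self.required_shots


mode = SpecialShootingMode(required_shots=2)
before = mode.control_enabled      # no learning yet
mode.on_target_image_captured()
mode.on_target_image_captured()
after = mode.control_enabled       # threshold reached
mode.on_target_image_captured()    # learning still accumulates
```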

以下、特殊撮影モードに関連する、撮像装置１の動作及び構成例を第１〜第１３実施例として説明する。矛盾なき限り、或る実施例について記載した事項を他の実施例に適用することが可能であると共に、第１〜第１３実施例の内の複数の実施例を組み合わせて実施することも可能である。以下の説明は、特に記述なき限り、特殊撮影モードにおける撮像装置１の動作の説明である。 Hereinafter, operations and configuration examples of the imaging apparatus 1 related to the special shooting mode will be described as first to thirteenth embodiments. As long as there is no contradiction, matters described for one embodiment can be applied to the other embodiments, and two or more of the first to thirteenth embodiments can be combined. Unless otherwise specified, the following description concerns the operation of the imaging apparatus 1 in the special shooting mode.

＜＜第１実施例＞＞
第１実施例を説明する。図４は、特殊撮影モードの動作に特に関与する部位のブロック図である。被写体検出部５１、特徴情報生成部５２、メモリ制御部５３及び撮影制御部５５を、画像信号処理部１３によって、或いは、画像信号処理部１３とＣＰＵ２３の組み合わせによって実現することができる。学習メモリ５４は、内部メモリ１７に設けられたフラッシュメモリ等の不揮発性メモリから形成される。 << First Example >>
A first embodiment will be described. FIG. 4 is a block diagram of the parts particularly involved in the operation in the special shooting mode. The subject detection unit 51, the feature information generation unit 52, the memory control unit 53, and the imaging control unit 55 can be realized by the image signal processing unit 13 or by a combination of the image signal processing unit 13 and the CPU 23. The learning memory 54 is formed from a nonvolatile memory, such as a flash memory, provided in the internal memory 17.

１フレーム分のＡＦＥ１２の出力信号によって表される画像そのもの、或いは、その画像に対して所定の画像処理（デモザイキング処理やノイズ低減処理など）を施して得られる静止画像を入力画像と呼ぶ。更に、所定のシャッタ操作に従って得られた入力画像を特に対象入力画像と呼ぶ。シャッタ操作とは、シャッタボタン２６ｂを押下する操作である。但し、シャッタ操作は、シャッタボタン２６ｂを押下する操作以外の操作（例えば所定のタッチパネル操作）であっても良い。 An image itself represented by the output signal of the AFE 12 for one frame or a still image obtained by performing predetermined image processing (such as demosaicing processing or noise reduction processing) on the image is referred to as an input image. Further, an input image obtained according to a predetermined shutter operation is particularly called a target input image. The shutter operation is an operation of pressing the shutter button 26b. However, the shutter operation may be an operation other than the operation of pressing the shutter button 26b (for example, a predetermined touch panel operation).

［学習段階動作］
図５は、第１実施例に係る学習段階動作の手順を表すフローチャートであり、学習段階動作ではステップＳ１１〜Ｓ１４の各処理が実行される。ステップＳ１１では、ユーザがシャッタ操作を成すことにより対象入力画像が取得される。ステップＳ１２では、被写体検出部５１により、対象入力画像の被写体検出及び被写体のカテゴリ分類が行われる。ステップＳ１３では、特徴情報生成部５２により対象入力画像からユーザの嗜好性に関する情報とも言える特徴情報が抽出及び生成される。ステップＳ１４では、ステップＳ１２及びＳ１３の処理結果が学習メモリ５４の記録内容に反映される（単純には例えば、ステップＳ１３にて生成された特徴情報が学習メモリ５４にそのまま保存される）。学習メモリ５４に対する情報の記録制御はメモリ制御部５３によって行われる。図４に示される各部位の内、撮影制御部５５のみに関しては、制御段階動作に有益に機能する。以下、対象入力画像の具体例を挙げつつ、図４の各部位の動作を詳細に説明する。 [Learning stage operation]
FIG. 5 is a flowchart showing the procedure of the learning stage operation according to the first embodiment. In the learning stage operation, the processes of steps S11 to S14 are executed. In step S11, a target input image is acquired when the user performs a shutter operation. In step S12, the subject detection unit 51 performs subject detection and subject category classification on the target input image. In step S13, the feature information generation unit 52 extracts and generates, from the target input image, feature information that can be regarded as information on the user's preferences. In step S14, the processing results of steps S12 and S13 are reflected in the recorded contents of the learning memory 54 (in the simplest case, for example, the feature information generated in step S13 is saved in the learning memory 54 as it is). Recording of information in the learning memory 54 is controlled by the memory control unit 53. Of the parts shown in FIG. 4, the imaging control unit 55 is the only one whose role lies in the control stage operation. Hereinafter, the operation of each part in FIG. 4 will be described in detail using a specific example of the target input image.

図６（ａ）において、符号３００は、ステップＳ１１にて取得された対象入力画像の例を表している。人物及び山を被写体として撮影範囲内に含めた状態でシャッタ操作を成すことにより、対象入力画像３００が得られたものとする。図６（ａ）において、符号３０１及び３０２は対象入力画像３００の被写体を表している。被写体３０１及び３０２は、夫々、人物及び山である。 In FIG. 6A, reference numeral 300 represents an example of the target input image acquired in step S11. It is assumed that the target input image 300 is obtained by performing a shutter operation in a state where a person and a mountain are included in the shooting range as subjects. In FIG. 6A, reference numerals 301 and 302 represent subjects of the target input image 300. The subjects 301 and 302 are a person and a mountain, respectively.

被写体検出部５１及び特徴情報生成部５２には、対象入力画像を含む各入力画像の画像信号が入力される。被写体検出部５１は、入力画像の画像信号に基づき、入力画像上に存在する各被写体を複数のカテゴリの何れかに分類して検出する。即ち、入力画像上に存在する被写体ごとに、被写体が何れのカテゴリに属する被写体であるのかを検出する。カテゴリを種類とも読み替えることができる。上記複数のカテゴリは、被写体検出部５１に予め登録されたカテゴリであり、人物、犬、猫、鳥、自動車、山、海、空、花などを含む。入力画像の画像信号に基づいて入力画像上の各被写体のカテゴリを検出する処理（顔検出処理等）は公知であるため、ここでは詳細な説明を割愛する。 An image signal of each input image including the target input image is input to the subject detection unit 51 and the feature information generation unit 52. The subject detection unit 51 classifies and detects each subject present on the input image in any of a plurality of categories based on the image signal of the input image. In other words, for each subject existing on the input image, it is detected to which category the subject belongs. Categories can be read as types. The plurality of categories are categories registered in advance in the subject detection unit 51 and include persons, dogs, cats, birds, automobiles, mountains, sea, sky, flowers, and the like. Since the process (face detection process or the like) for detecting the category of each subject on the input image based on the image signal of the input image is known, a detailed description is omitted here.

対象入力画像３００の画像信号が被写体検出部５１に入力されると、被写体検出部５１は、対象入力画像３００の画像信号に基づき、対象入力画像３００から被写体３０１及び３０２を検出して被写体３０１及び３０２が存在する画像領域を夫々被写体領域３１１及び３１２として抽出すると共に、被写体３０１及び３０２が何れのカテゴリに分類される被写体であるのかを検出する（図６（ｂ）参照）。上述したように、被写体３０１及び３０２は夫々人物及び山であるため、被写体検出部５１は、被写体３０１及び３０２のカテゴリが夫々人物及び山であると検出する。 When the image signal of the target input image 300 is input to the subject detection unit 51, the subject detection unit 51 detects the subjects 301 and 302 from the target input image 300 based on that image signal, extracts the image regions in which the subjects 301 and 302 exist as subject regions 311 and 312, respectively, and detects the category into which each of the subjects 301 and 302 is classified (see FIG. 6B). As described above, the subjects 301 and 302 are a person and a mountain, respectively, so the subject detection unit 51 detects that the categories of the subjects 301 and 302 are person and mountain, respectively.

尚、被写体領域３１１は、被写体３０１としての人物の全体像が表れている画像領域であっても良いが、本例では、人物の顔の部分だけを含む画像領域が被写体領域３１１として抽出されるものとする（後述の他の実施例においても同様）。また、図６（ｂ）では、抽出される各被写体領域が矩形領域となっているが、各被写体領域は矩形領域以外であっても構わない（後述の他の実施例においても同様）。 Note that the subject region 311 may be an image region in which the entire image of the person as the subject 301 appears, but in this example, an image region including only the face portion of the person is extracted as the subject region 311. (The same applies to other examples described later). In FIG. 6B, each subject area to be extracted is a rectangular area, but each subject area may be other than the rectangular area (the same applies to other embodiments described later).

特徴情報生成部５２は、１枚の入力画像に対して、フォーカス情報、合焦被写体サイズ情報、合焦被写体位置情報を生成することができる。フォーカス情報は、入力画像上における合焦被写体のカテゴリを表す。合焦被写体とは、ピントが合っている被写体を指す。或る入力画像に関し、その入力画像の撮影時に撮像装置１の被写界深度内に位置している被写体は、ピントが合っている被写体に含まれる。合焦被写体サイズ情報（以下、サイズ情報と略記することがある）は、入力画像上における合焦被写体の大きさを表す。或る被写体の大きさとは、例えば、その被写体の被写体領域の画像サイズを指す。合焦被写体位置情報（以下、位置情報と略記することがある）は、入力画像上における合焦被写体の位置を表す。或る被写体の位置とは、厳密には例えば、その被写体の被写体領域の中心位置を指す。 The feature information generation unit 52 can generate focus information, in-focus subject size information, and in-focus subject position information for one input image. The focus information represents the category of the focused subject on the input image. The focused subject refers to a subject that is in focus. With respect to a certain input image, a subject positioned within the depth of field of the imaging device 1 when the input image is captured is included in the subject in focus. Focused subject size information (hereinafter sometimes abbreviated as size information) represents the size of the focused subject on the input image. The size of a certain subject refers to, for example, the image size of the subject area of the subject. Focused subject position information (hereinafter sometimes abbreviated as position information) represents the position of the focused subject on the input image. Strictly speaking, the position of a certain subject indicates, for example, the center position of the subject area of the subject.

或る入力画像に対する特徴情報は、その入力画像に対するフォーカス情報、合焦被写体サイズ情報及び合焦被写体位置情報から成り、該特徴情報に必要に応じて付加情報が付加される（図７参照）。付加情報も特徴情報の一部であると考えることも可能であるが、本例においては、付加情報は特徴情報の構成要素ではないと考える。付加情報に、入力画像の撮影時における時刻、天候の情報など、任意の情報を内包させることができる。メモリ制御部５３は、対象入力画像に対して被写体検出部５１により検出された複数の被写体のカテゴリの組み合わせに注目し、注目した組み合わせに対して対象入力画像に対する特徴情報を関連付けた上でそれらを学習メモリ５４に保存する。 The feature information for an input image consists of the focus information, the focused subject size information, and the focused subject position information for that input image, and additional information is attached to the feature information as necessary (see FIG. 7). The additional information could also be regarded as part of the feature information, but in this example it is not treated as a component of the feature information. The additional information can contain arbitrary information, such as the time at which the input image was shot and weather information. The memory control unit 53 takes the combination of the categories of the plurality of subjects detected by the subject detection unit 51 in the target input image, associates the feature information for the target input image with that combination, and saves them in the learning memory 54.

図７に、対象入力画像３００に対する、学習メモリ５４の保存内容を示す。今、対象入力画像３００上においてピントの合っている被写体が被写体３０１であって（被写体３０２にはピントが合っていない）、且つ、対象入力画像３００上における被写体領域３１１の画像サイズがＳＩＺＥ_A1であって、且つ、対象入力画像３００上における被写体領域３１１の中心位置がブロックＢＬ₅に属しているものとする。そうすると、対象入力画像３００に対するフォーカス情報は被写体３０１のカテゴリである人物であり、且つ、対象入力画像３００に対するサイズ情報及び位置情報は夫々ＳＩＺＥ_A1及びＢＬ₅とされる。フォーカス情報が人物であるとは、フォーカス情報によって指し示される、ピントの合っている被写体のカテゴリが人物である、ことを意味する。 FIG. 7 shows the contents saved in the learning memory 54 for the target input image 300. Assume that the subject in focus on the target input image 300 is the subject 301 (the subject 302 is not in focus), that the image size of the subject region 311 on the target input image 300 is SIZE_A1, and that the center position of the subject region 311 on the target input image 300 belongs to the block BL_5. Then the focus information for the target input image 300 is person, which is the category of the subject 301, and the size information and position information for the target input image 300 are SIZE_A1 and BL_5, respectively. Saying that the focus information is person means that the category of the in-focus subject indicated by the focus information is person.

ブロックＢＬ₅の意義について説明する。図８に、注目した１枚の入力画像を示す。注目した入力画像を水平及び垂直方向の夫々に沿って３等分することにより注目した入力画像の全体画像領域を９等分し、得られた９つの画像領域をブロックＢＬ₁〜ＢＬ₉と定義する。上述のブロックＢＬ₅は、ブロックＢＬ₁〜ＢＬ₉の１つである。尚、入力画像の分割数（即ち、上記ブロックの個数）としての９は勿論例示であり、それを９以外とすることもできるが、以下の説明では、それが９であるものとする。 The meaning of the block BL_5 will be described. FIG. 8 shows one input image of interest. The entire image area of the input image of interest is divided into nine equal parts by dividing it into three equal parts along each of the horizontal and vertical directions, and the nine resulting image areas are defined as blocks BL_1 to BL_9. The block BL_5 mentioned above is one of the blocks BL_1 to BL_9. The number of divisions of the input image (that is, the number of blocks), nine, is of course only an example and may be other than nine, but in the following description it is assumed to be nine.
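As a non-limiting illustration of the block division described above, the following Python sketch maps the center position of a subject region to a block number. The function name `block_index` and the row-major numbering from the top-left corner are assumptions made for this example; the text only states that BL_5 is one of nine equal blocks, and FIG. 8 is not reproduced here.

```python
def block_index(cx: float, cy: float, width: int, height: int,
                divisions: int = 3) -> int:
    """Return the 1-based block number containing the point (cx, cy).

    Assumption: blocks are numbered row by row from the top-left corner,
    so with divisions=3 the central block is number 5.
    """
    if not (0 <= cx < width and 0 <= cy < height):
        raise ValueError("center position lies outside the image")
    col = int(cx * divisions // width)    # 0 .. divisions-1, left to right
    row = int(cy * divisions // height)   # 0 .. divisions-1, top to bottom
    return row * divisions + col + 1


# A region centered in a 640x480 image falls in the central block.
center_block = block_index(320, 240, 640, 480)
```

Changing `divisions` gives a grid other than 3 x 3, which corresponds to the remark that the number of blocks need not be nine.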

特徴情報生成部５２は、対象入力画像３００のフォーカス情報、サイズ情報及び位置情報を含む特徴情報３０５を生成し、メモリ制御部５３は、この特徴情報３０５を、対象入力画像３００上の被写体のカテゴリの組み合わせに関連付けて学習メモリ５４に保存する。対象入力画像３００上の被写体のカテゴリの組み合わせとは、被写体３０１及び３０２のカテゴリの組み合わせ、即ち、「人物」と「山」の組み合わせである。以下、被写体のカテゴリの組み合わせを、カテゴリ組み合わせとも表記する。従って、「人物」と「山」の組み合わせは、カテゴリ組み合わせ「人物及び山」と表記される。 The feature information generation unit 52 generates feature information 305 containing the focus information, size information, and position information of the target input image 300, and the memory control unit 53 saves this feature information 305 in the learning memory 54 in association with the combination of the categories of the subjects on the target input image 300. The combination of the categories of the subjects on the target input image 300 is the combination of the categories of the subjects 301 and 302, that is, the combination of “person” and “mountain”. Hereinafter, a combination of subject categories is also written as a category combination. Accordingly, the combination of “person” and “mountain” is written as the category combination “person and mountain”.
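One possible way to picture the association between a category combination and its feature information is a mapping keyed by the set of detected categories. The sketch below is only an illustration: the names `FeatureInfo` and `store`, the use of a `frozenset` key (which ignores order and collapses repeated categories), and the numeric sizes are all assumptions of this example and are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureInfo:
    focus: str   # focus information: category of the focused subject
    size: int    # size information: pixel count of the focused subject region
    block: int   # position information: block number holding the region center


def store(memory, categories, info):
    """Save one element of feature information under its category combination."""
    key = frozenset(categories)   # "person and mountain" == "mountain and person"
    memory[key].append(info)
    return key


learning_memory = defaultdict(list)   # stands in for the learning memory 54
key_a = store(learning_memory, ["person", "mountain"], FeatureInfo("person", 1200, 5))
key_b = store(learning_memory, ["mountain", "person"], FeatureInfo("person", 1000, 5))
```

Both calls land under the same key, so the list for “person and mountain” grows by one entry per learned image.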

特徴情報生成部５２は、被写体領域３１１に対するＡＦ評価値と被写体領域３１２に対するＡＦ評価値とを比較し、前者が後者よりも大きければ被写体３０１を合焦被写体と判断する一方、後者が前者よりも大きければ被写体３０２を合焦被写体と判断して、その判断結果から対象入力画像３００のフォーカス情報を生成する。本例では、前者が後者よりも大きいために、被写体３０１が合焦被写体であると判断される。尚、被写体検出部５１により検出される、対象入力画像３００上の複数の被写体の中に、ピントが合っている被写体が必ず含まれているものとする（他の対象入力画像及び合焦被写体が議論される任意の入力画像についても同様）。 The feature information generation unit 52 compares the AF evaluation value for the subject region 311 with the AF evaluation value for the subject region 312. If the former is larger than the latter, it determines that the subject 301 is the focused subject; if the latter is larger than the former, it determines that the subject 302 is the focused subject. It then generates the focus information of the target input image 300 from the determination result. In this example, the former is larger than the latter, so the subject 301 is determined to be the focused subject. It is assumed that the plurality of subjects on the target input image 300 detected by the subject detection unit 51 always include a subject that is in focus (the same applies to other target input images and to any input image for which a focused subject is discussed).

ＡＦ評価値は、画像信号処理部１３内のＡＦ評価部（不図示）によって算出される。該ＡＦ評価部が特徴情報生成部５２に内在していると考えても構わない。ＡＦ評価部は、ＡＦ評価値算出の対象となる入力画像の全体画像領域を複数のＡＦ評価ブロックに分割し、ＡＦ評価ブロックごとに、ＡＦ評価ブロック内の画像のコントラスト量に応じたＡＦ評価値を算出する。撮像装置１は、このＡＦ評価値に基づき、コントラスト検出法によるオートフォーカス制御を実施することができる。或るＡＦ評価ブロックのＡＦ評価値は、そのＡＦ評価ブロック内の画像のコントラスト（換言すれば、エッジの強度）が大きいほど大きくなる。 The AF evaluation value is calculated by an AF evaluation unit (not shown) in the image signal processing unit 13. The AF evaluation unit may also be regarded as being included in the feature information generation unit 52. The AF evaluation unit divides the entire image area of the input image for which AF evaluation values are to be calculated into a plurality of AF evaluation blocks and calculates, for each AF evaluation block, an AF evaluation value corresponding to the amount of contrast of the image in that block. Based on these AF evaluation values, the imaging apparatus 1 can perform autofocus control by the contrast detection method. The AF evaluation value of an AF evaluation block increases as the contrast of the image in that block (in other words, the edge strength) increases.

対象入力画像３００に関しては、被写体領域３１１に属するＡＦ評価ブロックのＡＦ評価値の平均値を被写体領域３１１のＡＦ評価値として取り扱い、且つ、被写体領域３１２に属するＡＦ評価ブロックのＡＦ評価値の平均値を被写体領域３１２のＡＦ評価値として取り扱えば良い（各被写体領域に複数のＡＦ評価ブロックが属していると仮定）。或いは、被写体領域３１１及び３１２を第１及び第２のＡＦ評価ブロックとして取り扱って、夫々のＡＦ評価値を算出するようにしても良い。尚、或る入力画像に対して算出された複数のＡＦ評価値の内、最大のＡＦ評価値に対応するＡＦ評価ブロック、又は、所定の閾値以上のＡＦ評価値に対応するＡＦ評価ブロックが、合焦被写体の存在する画像領域であると判断することもできる。 For the target input image 300, the average of the AF evaluation values of the AF evaluation blocks belonging to the subject region 311 may be treated as the AF evaluation value of the subject region 311, and the average of the AF evaluation values of the AF evaluation blocks belonging to the subject region 312 may be treated as the AF evaluation value of the subject region 312 (assuming that a plurality of AF evaluation blocks belong to each subject region). Alternatively, the subject regions 311 and 312 may themselves be treated as first and second AF evaluation blocks, and an AF evaluation value may be calculated for each. It is also possible to determine that, among the plurality of AF evaluation values calculated for an input image, the AF evaluation block corresponding to the maximum AF evaluation value, or an AF evaluation block corresponding to an AF evaluation value equal to or greater than a predetermined threshold, is the image region in which the focused subject exists.
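The comparison of region-level AF evaluation values can be sketched as follows. This is a simplified illustration under stated assumptions: each AF evaluation block is represented by its center coordinates and an already computed AF evaluation value, a block is taken to belong to a region when its center lies inside the region, and the function names and sample numbers are hypothetical. How the AF evaluation value itself is computed from contrast is not modeled.

```python
from statistics import mean


def region_af_value(af_blocks, region):
    """Average the AF evaluation values of the blocks whose center is in region.

    af_blocks: list of (center_x, center_y, af_value) tuples.
    region:    (left, top, right, bottom) in pixels; right and bottom exclusive.
    """
    left, top, right, bottom = region
    values = [v for (x, y, v) in af_blocks
              if left <= x < right and top <= y < bottom]
    if not values:
        raise ValueError("no AF evaluation block belongs to this region")
    return mean(values)


def focused_category(af_blocks, subjects):
    """Return the category of the subject whose region has the largest AF value.

    subjects: list of (category, region) pairs from subject detection.
    """
    best = max(subjects, key=lambda s: region_af_value(af_blocks, s[1]))
    return best[0]


af_blocks = [(10, 10, 90), (20, 10, 70), (100, 100, 20), (120, 100, 40)]
subjects = [("person", (0, 0, 50, 50)), ("mountain", (90, 90, 150, 150))]
focus_information = focused_category(af_blocks, subjects)
```

With these sample values the person region averages 80 and the mountain region averages 30, so the focus information becomes "person".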

上述のようにして対象入力画像３００の合焦被写体が被写体３０１であると検出した後、特徴情報生成部５２は、被写体３０１の画像信号が存在する画像領域である被写体領域３１１の大きさ及び中心位置を検出することで、対象入力画像３００に対するサイズ情報（ＳＩＺＥ_A1）及び位置情報（ＢＬ₅）を生成する。尚、被写体領域３１１の大きさ及び中心位置の検出自体は被写体検出部５１にて行われる、と考えても良い。 After detecting, as described above, that the focused subject of the target input image 300 is the subject 301, the feature information generation unit 52 detects the size and center position of the subject region 311, which is the image region where the image signal of the subject 301 exists, and thereby generates the size information (SIZE_A1) and the position information (BL_5) for the target input image 300. The detection of the size and center position of the subject region 311 may itself be regarded as being performed by the subject detection unit 51.

学習段階動作では、ステップＳ１１〜Ｓ１４から成る一連の処理を繰り返し実行することで、ユーザの嗜好性を繰り返し学習する。１つのカテゴリ組み合わせについて、学習段階動作から制御段階動作へ移行するために必要な学習回数をＬ_NUMにて表す。Ｌ_NUMは１以上の整数に設定されるが、説明の具体化のため、Ｌ_NUM＝３である場合を考える。図９は、対象入力画像３００の取得後、対象入力画像３００ａ、３００ｂ、３３０、３３０ａ、３３０ｂ、３３１及び３３１ａ（全て不図示）が更に取得された後の、学習メモリ５４の記録内容を示している。対象入力画像３００ａ及び３００ｂの夫々のカテゴリ組み合わせは「人物及び山」であり、且つ、対象入力画像３３０、３３０ａ、３３０ｂの夫々のカテゴリ組み合わせは「犬及び海」であり、且つ、対象入力画像３３１及び３３１ａの夫々のカテゴリ組み合わせは「人及び海」であるとする。 In the learning stage operation, the user's preferences are learned repeatedly by repeatedly executing the series of processes consisting of steps S11 to S14. For one category combination, the number of learning iterations required to move from the learning stage operation to the control stage operation is denoted by L_NUM. L_NUM is set to an integer of 1 or more; to make the description concrete, consider the case where L_NUM = 3. FIG. 9 shows the recorded contents of the learning memory 54 after the target input image 300 has been acquired and the target input images 300a, 300b, 330, 330a, 330b, 331, and 331a (none of them shown) have further been acquired. The category combination of each of the target input images 300a and 300b is “person and mountain”, the category combination of each of the target input images 330, 330a, and 330b is “dog and sea”, and the category combination of each of the target input images 331 and 331a is “person and sea”.

特徴情報生成部５２は、対象入力画像３００の特徴情報３０５を生成する方法と同様の方法にて、対象入力画像３００ａの特徴情報３０５ａ及び対象入力画像３００ｂの特徴情報３０５ｂを生成する。メモリ制御部５３は、特徴情報３０５、３０５ａ及び３０５ｂが生成されると、それらをカテゴリ組み合わせ「人物及び山」に関連付けつつ学習メモリ５４に保存する。一方で、特徴情報生成部５２又はメモリ制御部５３は、特徴情報３０５、３０５ａ及び３０５ｂに一致又は類似する特徴情報Ｗ１_Aを統計学に基づいて作成し、特徴情報Ｗ１_Aもカテゴリ組み合わせ「人物及び山」に関連付けて学習メモリ５４に保存する。１枚１枚の対象入力画像の特徴情報を特に要素特徴情報とも呼び、複数枚の対象入力画像の特徴情報から統計学に基づき生成された特徴情報を特に総合特徴情報とも呼ぶ。本例において、特徴情報３０５、３０５ａ及び３０５ｂの夫々は要素特徴情報であり、特徴情報Ｗ１_Aは総合特徴情報である。 The feature information generation unit 52 generates the feature information 305a of the target input image 300a and the feature information 305b of the target input image 300b in the same way as it generates the feature information 305 of the target input image 300. When the feature information 305, 305a, and 305b has been generated, the memory control unit 53 saves it in the learning memory 54 in association with the category combination “person and mountain”. In addition, the feature information generation unit 52 or the memory control unit 53 statistically creates feature information W1_A that matches or resembles the feature information 305, 305a, and 305b, and saves the feature information W1_A in the learning memory 54, also in association with the category combination “person and mountain”. The feature information of each individual target input image is also called element feature information, and feature information generated statistically from the feature information of a plurality of target input images is also called comprehensive feature information. In this example, each of the feature information 305, 305a, and 305b is element feature information, and the feature information W1_A is comprehensive feature information.

図９の要素特徴情報３０５は、図７に示すそれと同じである。更に、図９に示す如く、要素特徴情報３０５ａにおけるフォーカス情報、サイズ情報及び位置情報が、夫々、人物、ＳＩＺＥ_A2及びＢＬ₅であって、要素特徴情報３０５ｂにおけるフォーカス情報、サイズ情報及び位置情報が、夫々、人物、ＳＩＺＥ_A3及びＢＬ₄である場合を考える。この場合、総合特徴情報Ｗ１_Aにおけるフォーカス情報、サイズ情報及び位置情報は、夫々、人物、ＳＩＺＥ_A及びＢＬ₅とされる。情報Ｗ１_Aに付随する付加情報には、カテゴリ組み合わせ「人物及び山」に対する学習回数が記録される。情報Ｗ１_Aは、３回分の学習結果に基づき、即ち特徴情報３０５、３０５ａ及び３０５ｂを元に生成される。従って、情報Ｗ１_Aの付加情報には、学習回数を表す数値として３が記録される。 The element feature information 305 in FIG. 9 is the same as that shown in FIG. 7. Further, as shown in FIG. 9, consider the case where the focus information, size information, and position information in the element feature information 305a are person, SIZE_A2, and BL_5, respectively, and the focus information, size information, and position information in the element feature information 305b are person, SIZE_A3, and BL_4, respectively. In this case, the focus information, size information, and position information in the comprehensive feature information W1_A are person, SIZE_A, and BL_5, respectively. The additional information accompanying the information W1_A records the number of learning iterations for the category combination “person and mountain”. The information W1_A is generated from three learning results, that is, from the feature information 305, 305a, and 305b. Accordingly, 3 is recorded in the additional information of the information W1_A as the value representing the number of learning iterations.

総合特徴情報Ｗ１_Aの元となる要素特徴情報のフォーカス情報の内、最も頻度の多いフォーカス情報を、総合特徴情報Ｗ１_Aのフォーカス情報とすることができる。図９の例では、要素特徴情報３０５、３０５ａ及び３０５ｂのフォーカス情報が全て「人物」であるため、総合特徴情報Ｗ１_Aのフォーカス情報も「人物」とされる。仮に、要素特徴情報３０５、３０５ａ及び３０５ｂのフォーカス情報の内、２つのみが「人物」であっても、総合特徴情報Ｗ１_Aのフォーカス情報は「人物」とされる。
総合特徴情報Ｗ１_Aにおけるサイズ情報ＳＩＺＥ_Aは、例えば、総合特徴情報Ｗ１_Aの元となる要素特徴情報のサイズ情報の平均（即ち、ＳＩＺＥ_A1〜ＳＩＺＥ_A3の平均）とされる。但し、フォーカス情報において総合特徴情報Ｗ１_Aと一致しない要素特徴情報のサイズ情報は、サイズ情報ＳＩＺＥ_Aに反映されないものとする。従って仮に、要素特徴情報３０５、３０５ａ及び３０５ｂのフォーカス情報が夫々「人物」、「人物」及び「山」であるならば、要素特徴情報３０５、３０５ａのサイズ情報の平均（即ち、ＳＩＺＥ_A1及びＳＩＺＥ_A2の平均）がサイズ情報ＳＩＺＥ_Aとなる。
総合特徴情報Ｗ１_Aの元となる要素特徴情報の位置情報の内、最も頻度の多い位置情報を、総合特徴情報Ｗ１_Aの位置情報とすることができる。但し、フォーカス情報において総合特徴情報Ｗ１_Aと一致しない要素特徴情報の位置情報は、総合特徴情報Ｗ１_Aの位置情報に反映されないものとする。図９の例では、要素特徴情報３０５、３０５ａ及び３０５ｂのフォーカス情報が全て「人物」であるため、要素特徴情報３０５、３０５ａ及び３０５ｂの位置情報の内、最も頻度が多い位置情報ＢＬ₅が総合特徴情報Ｗ１_Aの位置情報とされる。仮に、要素特徴情報３０５、３０５ａ及び３０５ｂのフォーカス情報が夫々「人物」、「山」及び「人物」であるならば、要素特徴情報３０５の位置情報ＢＬ₅又は要素特徴情報３０５ｂの位置情報ＢＬ₄が総合特徴情報Ｗ１_Aの位置情報とされる。この場合において、対象入力画像３００の撮影後に対象入力画像３００ｂが撮影されていたのならば、新しく撮影された方の対象入力画像３００ｂを優先し、要素特徴情報３０５ｂの位置情報ＢＬ₄を総合特徴情報Ｗ１_Aの位置情報にするようにしてもよい。 Among the focus information of the element feature information from which the comprehensive feature information W1_A is derived, the most frequent focus information can be used as the focus information of the comprehensive feature information W1_A. In the example of FIG. 9, the focus information of the element feature information 305, 305a, and 305b is all “person”, so the focus information of the comprehensive feature information W1_A is also “person”. Even if only two of the three items of focus information of the element feature information 305, 305a, and 305b were “person”, the focus information of the comprehensive feature information W1_A would still be “person”.
The size information SIZE_A in the comprehensive feature information W1_A is, for example, the average of the size information of the element feature information from which the comprehensive feature information W1_A is derived (that is, the average of SIZE_A1 to SIZE_A3). However, the size information of element feature information whose focus information does not match that of the comprehensive feature information W1_A is not reflected in the size information SIZE_A. Therefore, if the focus information of the element feature information 305, 305a, and 305b were “person”, “person”, and “mountain”, respectively, the average of the size information of the element feature information 305 and 305a (that is, the average of SIZE_A1 and SIZE_A2) would be the size information SIZE_A.
Among the position information of the element feature information from which the comprehensive feature information W1_A is derived, the most frequent position information can be used as the position information of the comprehensive feature information W1_A. However, the position information of element feature information whose focus information does not match that of the comprehensive feature information W1_A is not reflected in the position information of the comprehensive feature information W1_A. In the example of FIG. 9, the focus information of the element feature information 305, 305a, and 305b is all “person”, so the most frequent position information among them, BL_5, is used as the position information of the comprehensive feature information W1_A. If the focus information of the element feature information 305, 305a, and 305b were “person”, “mountain”, and “person”, respectively, either the position information BL_5 of the element feature information 305 or the position information BL_4 of the element feature information 305b would be used as the position information of the comprehensive feature information W1_A. In that case, if the target input image 300b was shot after the target input image 300, priority may be given to the more recently shot target input image 300b, and the position information BL_4 of the element feature information 305b may be used as the position information of the comprehensive feature information W1_A.
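The three aggregation rules above (most frequent focus information, average size over matching elements, most frequent position with newer images preferred on a tie) can be illustrated with the following Python sketch. The names `FeatureInfo`, `most_frequent`, and `aggregate` and the numeric sizes are hypothetical. Note one assumption beyond the text: the recency tie-break, which the text describes only for position information, is applied here to focus information as well.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FeatureInfo:
    focus: str    # category of the focused subject
    size: float   # image size of the focused subject region, in pixels
    block: int    # block number holding the region center


def most_frequent(values):
    """Return the most frequent value; on a tie, prefer the most recent one."""
    counts = Counter(values)
    top = max(counts.values())
    for value in reversed(values):      # walk from newest to oldest
        if counts[value] == top:
            return value


def aggregate(elements):
    """Build comprehensive feature information from element feature information.

    elements must be ordered from oldest to newest.
    """
    focus = most_frequent([e.focus for e in elements])
    # Elements whose focus information differs are excluded from size/position.
    matching = [e for e in elements if e.focus == focus]
    return FeatureInfo(
        focus=focus,
        size=mean([e.size for e in matching]),
        block=most_frequent([e.block for e in matching]),
    )


# Stand-ins for element feature information 305, 305a, 305b of FIG. 9.
w1_a = aggregate([
    FeatureInfo("person", 900, 5),
    FeatureInfo("person", 1100, 5),
    FeatureInfo("person", 1000, 4),
])
```

For the hypothetical case where the second element is focused on the mountain, the two remaining positions tie and the newer one (block 4) is chosen, matching the optional priority rule in the text.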

図９において、中央部分に示された特徴情報群は、上記の対象入力画像３３０、３３０ａ及び３３０ｂ（不図示）に基づく、カテゴリ組み合わせ「犬及び海」についての特徴情報群である。Ｗ１_Bは、上記の対象入力画像３３０、３３０ａ及び３３０ｂの要素特徴情報に基づく、カテゴリ組み合わせ「犬及び海」についての総合特徴情報である。総合特徴情報Ｗ１_Bも、総合特徴情報Ｗ１_Aと同様にして作成される。図９において、下方部分に示された特徴情報群は、上記の対象入力画像３３１及び３３１ａ（不図示）に基づく、カテゴリ組み合わせ「人及び海」についての特徴情報群である。図９に示す状態において、カテゴリ組み合わせ「人及び海」についての要素特徴情報は２つしかないため、カテゴリ組み合わせ「人及び海」については学習段階動作から制御段階動作へ移行することができないが、カテゴリ組み合わせ「人及び山」と「犬及び海」については学習回数が必要学習回数Ｌ_NUM（＝３）に達しているため制御段階動作を実行することができる。 In FIG. 9, the feature information group shown in the middle is the feature information group for the category combination “dog and sea”, based on the target input images 330, 330a, and 330b (not shown). W1_B is the comprehensive feature information for the category combination “dog and sea”, based on the element feature information of the target input images 330, 330a, and 330b. The comprehensive feature information W1_B is created in the same manner as the comprehensive feature information W1_A. In FIG. 9, the feature information group shown at the bottom is the feature information group for the category combination “person and sea”, based on the target input images 331 and 331a (not shown). In the state shown in FIG. 9, there are only two items of element feature information for the category combination “person and sea”, so for that combination the operation cannot move from the learning stage operation to the control stage operation. For the category combinations “person and mountain” and “dog and sea”, however, the number of learning iterations has reached the required number L_NUM (= 3), so the control stage operation can be executed.

対象入力画像３００、３００ａ及び３００ｂの撮影後、更に、カテゴリ組み合わせが「人及び山」となる対象入力画像３００ｃ（不図示）が撮影された場合には、その最新の対象入力画像３００ｃの特徴情報を用いて総合特徴情報Ｗ１_Aを更新すると良い。例えば、画像３００、３００ａ、３００ｂ及び３００ｃの要素特徴情報を元にして、或いは、画像３００ａ、３００ｂ及び３００ｃの要素特徴情報を元にして、総合特徴情報Ｗ１_Aを再作成することができる。更に或いは、対象入力画像３００ｃが撮影された時点では総合特徴情報Ｗ１_Aを更新せず、カテゴリ組み合わせが「人及び山」となる対象入力画像３００ｄ及び３００ｅが更に撮影された時点で、対象入力画像３００ｃ、３００ｄ及び３００ｅの要素特徴情報を元に総合特徴情報Ｗ１_Aを再作成するようにしてもよい。 If, after the target input images 300, 300a, and 300b have been shot, a further target input image 300c (not shown) whose category combination is “person and mountain” is shot, the comprehensive feature information W1_A is preferably updated using the feature information of this latest target input image 300c. For example, the comprehensive feature information W1_A can be recreated from the element feature information of the images 300, 300a, 300b, and 300c, or from the element feature information of the images 300a, 300b, and 300c. As a further alternative, the comprehensive feature information W1_A may be left unchanged when the target input image 300c is shot and recreated from the element feature information of the target input images 300c, 300d, and 300e once target input images 300d and 300e whose category combination is “person and mountain” have also been shot.
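The three update alternatives described above differ only in which element feature information is fed back into the aggregation. The following sketch shows that selection step alone, under the assumption that L_NUM = 3 and that the history for one category combination is kept oldest first. The strategy names "all", "latest", and "batch" are labels invented for this example.

```python
L_NUM = 3  # required number of learning iterations, as in the running example


def select_elements(history, strategy):
    """Choose which elements to recompute comprehensive feature information from.

    history:  element feature information for one category combination,
              ordered oldest to newest.
    strategy: "all"    -> use every element learned so far
              "latest" -> use only the newest L_NUM elements
              "batch"  -> recompute only once L_NUM new elements have accumulated
    Returns a list of elements, or None when no (re)computation is due.
    """
    if len(history) < L_NUM:
        return None                       # still in the learning stage
    if strategy == "all":
        return list(history)
    if strategy == "latest":
        return list(history[-L_NUM:])
    if strategy == "batch":
        if len(history) % L_NUM != 0:
            return None                   # wait for a full batch
        return list(history[-L_NUM:])
    raise ValueError(f"unknown strategy: {strategy}")


after_300c = ["300", "300a", "300b", "300c"]
after_300e = after_300c + ["300d", "300e"]
```

Here plain strings stand in for the element feature information of the correspondingly numbered images.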

［制御段階動作］
カテゴリ組み合わせ「人及び山」について、要素特徴情報３０５、３０５ａ及び３０５ｂに基づく図９の総合特徴情報Ｗ１_Aが学習メモリ５４に保存され、且つ、総合特徴情報Ｗ１_A以外の幾つかの総合特徴情報（情報Ｗ１_Bを含む）が学習メモリ５４に保存されている状態を、便宜上、図９の学習状態と呼ぶ。第１実施例では、以下、図９の学習状態の下における制御段階動作の説明を行う。図１０は、第１実施例に係る制御段階動作の手順を表すフローチャートであり、制御段階動作ではステップＳ２１〜Ｓ２８の各処理が実行される。 [Control stage operation]
For the category combination “person and mountain”, the state in which the comprehensive feature information W1_A of FIG. 9, based on the element feature information 305, 305a, and 305b, is saved in the learning memory 54, together with several items of comprehensive feature information other than W1_A (including the information W1_B), is called the learning state of FIG. 9 for convenience. In the first embodiment, the control stage operation under the learning state of FIG. 9 is described below. FIG. 10 is a flowchart showing the procedure of the control stage operation according to the first embodiment. In the control stage operation, the processes of steps S21 to S28 are executed.

まず、ステップＳ２１において、図１のＣＰＵ２３又は図４の撮影制御部５５によりシャッタボタン２６ｂが半押し状態となっているかが確認され、それが半押し状態になっている場合には、ステップＳ２１からステップＳ２２へ移行してステップＳ２２〜Ｓ２７の処理が順次実行される一方、それが半押し状態となっていない場合には、ステップＳ２１の確認動作が繰り返し実行される。シャッタボタン２６ｂは２段階の押下操作が可能となっており、ユーザがシャッタボタン２６ｂを軽く押すとシャッタボタン２６ｂの状態は半押し状態となり、その状態から更にシャッタボタン２６ｂを押し込むとシャッタボタン２６ｂの状態は全押し状態となる。シャッタボタン２６ｂの状態を全押し状態にする操作がシャッタ操作である。シャッタボタン２６ｂの状態を半押し状態にする操作を半押し操作と呼ぶ。 First, in step S21, the CPU 23 in FIG. 1 or the imaging control unit 55 in FIG. 4 checks whether the shutter button 26b is in the half-pressed state. If it is, the process moves from step S21 to step S22 and the processes of steps S22 to S27 are executed in order; if it is not, the check of step S21 is repeated. The shutter button 26b can be pressed in two stages: when the user presses the shutter button 26b lightly, it enters the half-pressed state, and when the user presses it further from that state, it enters the fully pressed state. The operation that puts the shutter button 26b into the fully pressed state is the shutter operation. The operation that puts the shutter button 26b into the half-pressed state is called the half-press operation.

尚、半押し操作が成されたか否かではなく、撮像装置１の筐体の静止状態（換言すれば動き状態）に基づいて、ステップＳ２１からステップＳ２２への移行可否を決定しても良い。即ち例えば、ステップＳ２１において、撮像装置１の筐体が一定時間継続して静止していると判断される場合に、ステップＳ２１からステップＳ２２への移行を実行するようにしても良い。撮像装置１の筐体の動きの大きさを表す動き量が一定時間継続して所定値以下であるとき、撮像装置１の筐体が一定時間継続して静止していると判断することができる。撮像装置１の筐体の動きを検出する動きセンサ（不図示）を撮像装置１に設けておけば、動きセンサの検出結果を用いて上記動き量を検出することができる。或いは、時間的に隣接して取得された入力画像間のオプティカルフローに基づいて上記動き量を検出することもできる。動きセンサは、例えば、撮像装置１の筐体の角速度を検出する角速度センサ、又は、撮像装置１の筐体の加速度を検出する加速度センサである。 Whether to move from step S21 to step S22 may also be decided based on the stationary state (in other words, the motion state) of the housing of the imaging apparatus 1 rather than on whether the half-press operation has been performed. That is, for example, the move from step S21 to step S22 may be made when it is determined in step S21 that the housing of the imaging apparatus 1 has remained stationary for a certain period of time. When the amount of motion representing the magnitude of the movement of the housing of the imaging apparatus 1 has remained at or below a predetermined value for a certain period of time, the housing can be determined to have remained stationary for that period. If the imaging apparatus 1 is provided with a motion sensor (not shown) that detects the movement of its housing, the amount of motion can be detected from the detection result of the motion sensor. Alternatively, the amount of motion can be detected from the optical flow between input images acquired adjacently in time. The motion sensor is, for example, an angular velocity sensor that detects the angular velocity of the housing of the imaging apparatus 1 or an acceleration sensor that detects the acceleration of the housing of the imaging apparatus 1.
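The stillness condition (motion amount at or below a predetermined value continuously for a certain time) might be checked as in the sketch below. It assumes timestamped motion-amount samples are already available, whether from a motion sensor or from optical flow; the function name, the sample format, and the numeric threshold and hold time are illustrative assumptions only.

```python
def is_stationary(samples, threshold, hold_time):
    """Decide whether the housing has stayed still long enough.

    samples:   list of (timestamp_in_seconds, motion_amount), oldest first.
    threshold: motion amounts at or below this value count as still.
    hold_time: required length of the unbroken still period, in seconds.
    """
    if not samples:
        return False
    latest_time = samples[-1][0]
    still_since = None
    # Walk backwards from the newest sample until motion exceeds the threshold.
    for timestamp, amount in reversed(samples):
        if amount > threshold:
            break
        still_since = timestamp
    return still_since is not None and latest_time - still_since >= hold_time


samples = [(0.0, 5.0), (0.5, 0.2), (1.0, 0.1), (1.5, 0.3)]
may_proceed_to_s22 = is_stationary(samples, threshold=0.5, hold_time=1.0)
```

In this sample sequence the housing has been still from 0.5 s to 1.5 s, so a one-second hold time is satisfied while a longer one would not be.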

In the shooting modes, including the special shooting mode, the subject is shot periodically at a predetermined frame period (for example, 1/60 second) so that input images are acquired one after another, and the acquired input images are displayed on the display unit 27 with successive updates (the same applies to the learning stage operation and to the other embodiments described later). In step S22, the subject detection unit 51 in FIG. 4 treats the latest input image as the evaluation image and performs the above-described subject detection and subject category classification on it. That is, based on the image signal of the evaluation image, each subject present on the evaluation image is detected and classified into one of a plurality of categories. The category combination for the evaluation image is thereby determined by the same method as described above.

In step S23, which follows step S22, the shooting control unit 55 in FIG. 4 reads from the learning memory 54 the comprehensive feature information whose category combination matches that of the evaluation image. For example, if the category combination of the evaluation image is "person and mountain", the comprehensive feature information W1_A is read, and if it is "dog and sea", the comprehensive feature information W1_B is read (see FIG. 9). In the following, the evaluation image is assumed to be the image 350 in FIG. 11. The image 350 contains the image signals of a subject 351, which is a person, and a subject 352, which is a mountain. The category combination of the evaluation image 350 is therefore assumed to have been determined as "person and mountain", so the comprehensive feature information W1_A is read in step S23. Hereinafter, the comprehensive feature information read in step S23 is also called the read feature information.
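The lookup in step S23 can be pictured as a dictionary keyed by an order-independent category combination. The sketch below is hypothetical: the dictionary layout, the field names, and every stored value other than the "person" focus of W1_A are placeholders chosen for illustration, not data from the patent.

```python
# Learning memory modelled as a dict keyed by an unordered category set.
# All values except the "person" focus of W1_A are illustrative placeholders.
learning_memory = {
    frozenset({"person", "mountain"}): {   # stands in for W1_A
        "name": "W1_A", "focus": "person", "size": 0.20, "position": 5,
    },
    frozenset({"dog", "sea"}): {           # stands in for W1_B
        "name": "W1_B", "focus": "dog", "size": 0.10, "position": 5,
    },
}


def read_feature_info(memory, detected_categories):
    """Return the feature record whose category combination matches the
    categories detected on the evaluation image, or None if none matches."""
    return memory.get(frozenset(detected_categories))
```

Using a `frozenset` as the key means "person and mountain" and "mountain and person" resolve to the same record, which is one plausible reading of a "category combination".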

In step S24, which follows step S23, the shooting control unit 55 sets as the main subject the subject indicated by the focus information of the read feature information (that is, the subject belonging to the category indicated by that focus information). Because the subject category indicated by the focus information of the comprehensive feature information W1_A, which is the read feature information here, is "person", the person is set as the main subject.

The image region 361 in FIG. 11 is the subject region in which the image signal of the subject 351 exists. The subject region 361 is detected from the image signal of the evaluation image 350 by the same method as that used to detect the subject region 311 in FIG. 6(b). The image size of the subject region 361 on the evaluation image 350 is denoted SIZE_A'. In step S25, which follows step S24, the shooting control unit 55 executes automatic angle-of-view adjustment, which may also be called automatic zoom control, based on SIZE_A, the size information of the read feature information, and on SIZE_A'.

In the automatic angle-of-view adjustment, the optical zoom magnification needed to change the image size representing the size of the main subject from SIZE_A' to SIZE_A is calculated as the target zoom magnification, and the actual optical zoom magnification is changed toward that target. If the optical zoom magnification is changed until it matches the target zoom magnification, the size of the main subject on the input image obtained after the change (the image size of the main subject's subject region) is ideally SIZE_A. The optical zoom magnification is changed by moving the zoom lens 30 in FIG. 2, and changing it changes the shooting angle of view of the imaging unit 11, which may also be called the angle of view of the input image. As a concrete example, if SIZE_A', expressed as the area of the main subject's subject region, is 1/4 of SIZE_A, the automatic angle-of-view adjustment doubles the optical zoom magnification (because doubling the optical zoom magnification quadruples the area of the main subject's subject region). The image 370 in FIG. 12 shows an input image obtained through an automatic angle-of-view adjustment that multiplies the optical zoom magnification by 1.5 relative to the magnification before the adjustment.
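Because the area of the subject region scales with the square of the zoom magnification, the target zoom magnification follows from the square root of the area ratio. The helper below is a minimal sketch of that arithmetic; the function name `target_zoom` and its parameter names are assumptions made for this example.

```python
import math


def target_zoom(current_zoom, size_now, size_learned):
    """Zoom magnification that would bring the main subject's region area
    from `size_now` (SIZE_A') to `size_learned` (SIZE_A).

    Area grows with the square of the magnification, hence the square root.
    """
    if size_now <= 0 or size_learned <= 0:
        raise ValueError("region areas must be positive")
    return current_zoom * math.sqrt(size_learned / size_now)
```

For the example in the text, a region that currently covers 1/4 of the learned area gives a factor of 2, so `target_zoom(1.0, 1.0, 4.0)` evaluates to 2.0.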

However, the automatic angle-of-view adjustment is performed on the premise that the position of the main subject does not leave the block indicated by the position information of the read feature information; if the position would leave that block, the adjustment is forcibly terminated immediately before that happens. Strictly speaking, the position of the main subject is, for example, the center position of the main subject's subject region. Suppose, for example, that the position of the main subject on the evaluation image 350 (the center of the subject region 361) belongs to block BL₅, that the optical zoom magnification when the evaluation image 350 was shot is ZF₁, and that SIZE_A > SIZE_A'. A target zoom magnification ZF₂ larger than ZF₁ is then set and the optical zoom magnification is increased toward ZF₂. If, during that increase, the position of the main subject on the latest input image is about to leave block BL₅, the increase is stopped immediately before it does and the optical zoom magnification is fixed. In that case, the optical zoom magnification after the automatic angle-of-view adjustment in step S25 is larger than ZF₁ but smaller than ZF₂.
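The early-termination rule can be illustrated with a small simulation. The sketch makes several assumptions that are not taken from the patent: the image is divided into a 3 x 3 grid of blocks numbered row by row from 1 (so that block 5 is the center block), zooming magnifies the image about its center, and the magnification is raised in small fixed steps. All names are hypothetical.

```python
def block_index(x, y, width, height, cols=3, rows=3):
    """1-based, row-major number of the grid block containing (x, y)."""
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    return row * cols + col + 1


def zoom_within_block(center, zf1, zf2, width, height, step=0.01):
    """Raise the zoom from zf1 toward zf2, stopping just before the
    subject center would leave the block it started in."""
    cx, cy = width / 2, height / 2
    start_block = block_index(center[0], center[1], width, height)
    zoom = zf1
    while zoom < zf2:
        nxt = min(zoom + step, zf2)
        scale = nxt / zf1
        # Predicted subject center after zooming about the image center.
        x = cx + (center[0] - cx) * scale
        y = cy + (center[1] - cy) * scale
        inside_image = 0 <= x < width and 0 <= y < height
        if not inside_image or block_index(x, y, width, height) != start_block:
            break  # the next step would push the subject out of its block
        zoom = nxt
    return zoom
```

With a 900 x 600 image and a subject centered at (500, 300), the subject sits 50 pixels right of the image center and reaches the block boundary at x = 600 when the magnification has tripled, so a request to go from 1.0 to 4.0 stops just short of 3.0.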

After the automatic angle-of-view adjustment in step S25 ends, in step S26 the shooting control unit 55 executes automatic focus adjustment so that the main subject is in focus. The automatic focus adjustment can be realized by autofocus control based on the contrast detection method using the AF evaluation value described above. That is, the main subject can be brought into focus by adjusting the position of the focus lens 31 in FIG. 2 so that the AF evaluation value of the main subject's subject region is maximized.
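Contrast-detection autofocus amounts to searching for the lens position that maximizes a sharpness score computed over the subject region. The sketch below uses an exhaustive scan and a simple stand-in for the AF evaluation value (the sum of absolute differences between horizontally adjacent pixels); the patent defines its own AF evaluation value elsewhere, so both the metric and the names `af_evaluation`, `focus_by_contrast`, and `capture_region` are assumptions.

```python
def af_evaluation(region):
    """Stand-in AF evaluation value: total absolute difference between
    horizontally adjacent pixels in a 2-D list of luminance values."""
    return sum(
        abs(row[i + 1] - row[i])
        for row in region
        for i in range(len(row) - 1)
    )


def focus_by_contrast(lens_positions, capture_region):
    """Return the lens position whose captured subject region scores the
    highest AF evaluation value.

    `capture_region(position)` is a hypothetical callback that returns the
    main subject's region as seen with the focus lens at `position`.
    """
    return max(lens_positions, key=lambda p: af_evaluation(capture_region(p)))
```

A real camera would typically hill-climb rather than scan every position, but the selection criterion is the same.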

Subject detection may be performed continuously on each input image acquired during the automatic focus adjustment, so that the subject region of the main subject whose AF evaluation value is to be maximized is detected from each of those input images in turn, and the results of that successive detection are used for the automatic focus adjustment. Alternatively, the position of the main subject on the current input image may be estimated from the optical zoom magnification at the time the evaluation image was shot, the current optical zoom magnification, and the position of the main subject on the evaluation image, and the result of that estimation may be used for the automatic focus adjustment.
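The second option, estimating the subject position from the change in zoom, can be sketched as below. It assumes that zooming magnifies the image about its center, which the patent does not state explicitly; the function and parameter names are likewise hypothetical.

```python
def estimate_position(pos_eval, zoom_eval, zoom_now, width, height):
    """Estimate the main subject's position on the current input image.

    `pos_eval` is its (x, y) position on the evaluation image, shot at
    magnification `zoom_eval`; `zoom_now` is the current magnification.
    Assumes the zoom is centered on the image center.
    """
    cx, cy = width / 2, height / 2
    scale = zoom_now / zoom_eval
    return (cx + (pos_eval[0] - cx) * scale,
            cy + (pos_eval[1] - cy) * scale)
```

For instance, a subject 50 pixels right of center at 1.0x is expected 100 pixels right of center at 2.0x.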

After the automatic focus adjustment in step S26, the process waits in step S27 for the shutter button 26b to enter the fully pressed state. When the fully pressed state is confirmed, the process moves from step S27 to step S28 and a new target input image is shot. The target input image obtained in step S28 is recorded in the external memory 18.

The above description assumes that the zoom magnification that determines the angle of view of the input image is the optical zoom magnification. However, the zoom magnification that determines the angle of view of the input image may instead be the electronic zoom magnification, with the automatic angle-of-view adjustment realized by electronic zoom. Likewise, the automatic angle-of-view adjustment may be realized by a combination of optical zoom and electronic zoom.

The feature information recorded in the learning memory 54 represents features of target input images shot in the past. The features of a target input image include a first feature, which indicates which category of subject among the plurality of subjects on the target input image is the focused subject (the subject that is in focus), and second and third features, which indicate the size and the position of the focused subject on the target input image. The first to third features are represented by the focus information, the size information, and the position information, respectively.

Recording the focus information in the learning memory 54 makes it possible to reproduce the user's preference regarding focus, and recording the size information and the position information in the learning memory 54 makes it possible to reproduce the user's preference regarding the size of the focused subject and the composition. That is, in the control stage operation, the feature information corresponding to the categories of the current subjects is read from the learning memory 54 and zoom control (automatic angle-of-view adjustment) and focus control (automatic focus adjustment) are performed, so that the angle-of-view, composition, and focus settings that suit the user's taste are reproduced automatically. This assists the user in setting an angle of view, composition, and focus that match the user's preference, and improves convenience for the user.

The second to seventh embodiments below describe modifications of the first embodiment that can be applied to the first embodiment. As long as no contradiction arises, the techniques described in the second to seventh embodiments can also be applied to embodiments other than the first embodiment.

<< Second Embodiment >>
A second embodiment will be described. The first embodiment assumes that the required number of learning times L_NUM is 3, but L_NUM may be 1. When L_NUM = 1, at the point in the learning stage operation where the target input image 300 in FIG. 6(a) has been acquired and the element feature information 305 (see FIGS. 7 and 9) has been generated, the element feature information 305 itself functions as the comprehensive feature information W1_A. However, to reproduce the user's preference appropriately in the control stage operation, it is desirable to set L_NUM to 2 or more.

<< Third Embodiment >>
A third embodiment will be described. When a person is present in the input image, personal recognition processing may be performed on that person and the category classification may be made according to its result. In the personal recognition processing, for example, a face dictionary database corresponding to the image signals of the face images of a plurality of registered persons is prepared in advance in the subject detection unit 51, and which of the registered persons the person on the input image is is recognized based on the face dictionary database and the image signal of the input image. If the person on the input image is recognized as the i-th registered person, the category of that person may be detected as "the i-th registered person" (i is an integer).

When the person on a first input image is the first registered person and the person on a second input image is the second registered person, the category of the person on the first input image and the category of the person on the second input image are judged to differ, and as a result the category combinations of the first and second input images are judged to differ. When i and j are different integers, the i-th registered person and the j-th registered person are different persons.

<< Fourth Embodiment >>
A fourth embodiment will be described. The input images assumed in the first embodiment contain only one person, but when a plurality of persons are present, they may be grouped together and classified into a single category.

For example, consider the case where the image 400 shown in FIG. 13 is input to the subject detection unit 51 as an input image. The input image 400 contains the image signals of subjects 401 and 402, which are two persons, and of a subject 403, which is a mountain, and the subject detection unit 51 extracts the subject regions 411 to 413 of the subjects 401 to 403 from the input image 400. The subject detection unit 51 classifies the subjects 401 and 402 together into the single category "two persons", and the feature information generation unit 52 can regard the category combination of the input image 400 as "two persons and mountain".

The feature information 405 in FIG. 14 is an example of the feature information generated for the input image 400. Assume that the focused subjects in the input image 400 are the subjects 401 and 402. Then the focus information in the element feature information 405 is "two persons"; the size information in the element feature information 405 is SIZE_D, the average of the sizes of the subjects 401 and 402 on the input image 400 (strictly, for example, the average of the image size of the subject region 411 and the image size of the subject region 412); and the position information in the element feature information 405 is the positions of the subjects 401 and 402 on the input image 400. Assume that the center positions of the subject regions 411 and 412 on the input image 400 lie in blocks BL₄ and BL₅ of the input image 400, respectively. Then, as shown in FIG. 14, the position information in the element feature information 405 is BL₄ and BL₅.
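The way the element feature information is assembled for a grouped category can be sketched as follows. The record layout, the function name `element_feature_info`, and the numeric region sizes in the usage note are illustrative assumptions; only the rule (average the region sizes, keep every block position) comes from the description above.

```python
def element_feature_info(focus_label, region_sizes, region_blocks):
    """Build an element feature record for a grouped focused subject.

    `region_sizes` are the image sizes of the individual subject regions
    and `region_blocks` the block numbers containing their centers.
    """
    if not region_sizes or len(region_sizes) != len(region_blocks):
        raise ValueError("need one size and one block per subject")
    return {
        "focus": focus_label,
        "size": sum(region_sizes) / len(region_sizes),  # SIZE_D: the mean
        "position": sorted(region_blocks),              # e.g. BL4 and BL5
    }
```

For two persons whose regions measure, say, 1200 and 1000 pixels and whose centers fall in blocks 4 and 5, the record holds a size of 1100.0 and the positions [4, 5].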

For simplicity of explanation, consider now the case where the feature information 405 itself functions as the comprehensive feature information for the category combination "two persons and mountain". In this case, when an evaluation image whose category combination is "two persons and mountain" is acquired during the control stage operation, the feature information 405 is read from the learning memory 54 as the read feature information (step S23 in FIG. 10); the two persons are set as the main subject based on the focus information of the feature information 405 (step S24); the automatic angle-of-view adjustment is performed so that the average size of the two persons on the input image after the adjustment (the average image size of the two persons' subject regions) becomes SIZE_D (step S25); and the automatic focus adjustment is performed so that the two persons are in focus (step S26). Thereafter, when the shutter button 26b is fully pressed (Y in step S27), a new target input image is shot and recorded (step S28).

However, during the automatic angle-of-view adjustment, it is checked whether the condition holds that the position of one person on the input image lies in block BL₄ and the position of the other person lies in block BL₅. When it is judged that this condition is about to stop holding, the change of the optical zoom magnification (or electronic zoom magnification) by the automatic angle-of-view adjustment is ended at that point, as described in the first embodiment.

<< Fifth Embodiment >>
A fifth embodiment will be described. The first embodiment (and the other embodiments described above) assumes that the number of categories forming a category combination is 2, but the number may be 3 or more. For example, when the number is 3 and a target input image containing a person, a dog, and a mountain as subjects is acquired during the learning stage operation, the category combination of that target input image is "person, dog, and mountain", and the feature information of that target input image is stored in the learning memory 54 in association with the category combination "person, dog, and mountain".

After the comprehensive feature information for the category combination "person, dog, and mountain" has been generated, when an input image containing a person, a dog, and a mountain as subjects is acquired as the evaluation image during the control stage operation, the comprehensive feature information for the category combination "person, dog, and mountain" is read as the read feature information and the processes from step S24 of FIG. 10 onward are executed.

<< Sixth Embodiment >>
A sixth embodiment will be described. Consider the case where the number of categories forming a category combination is 2 and, during the learning stage operation, the comprehensive feature information 420 for the category combination "person and dog", the comprehensive feature information 421 for the category combination "dog and mountain", and the comprehensive feature information 422 for the category combination "person and mountain" have been generated and stored in the learning memory 54 (the comprehensive feature information 420 to 422 is not shown in the drawings).

In this case, when an evaluation image (not shown) containing a person, a dog, and a mountain as subjects is acquired during the control stage operation, one of the comprehensive feature information 420 to 422 is read in step S23 of FIG. 10. At that time, it is advisable to refer to the number of learning times stored in the additional information of the comprehensive feature information 420 to 422 and to read, in step S23, the comprehensive feature information with the largest number of learning times.

When two or more of the comprehensive feature information 420 to 422 share the same number of learning times, it is advisable to read in step S23 the comprehensive feature information that was learned at the most recent learning time. To make this possible, a learning time should be attached to each piece of additional information in the learning memory 54. The learning time of a piece of feature information is the shooting time of the target input image from which that feature information was derived. For example, when the numbers of learning times of the comprehensive feature information 420 to 422 are 3, 4, and 4, respectively, the latest of the learning times of the four pieces of element feature information underlying the comprehensive feature information 421 is extracted as the first learning time, and the latest of the learning times of the four pieces of element feature information underlying the comprehensive feature information 422 is extracted as the second learning time. If the first learning time is later than the second learning time, the comprehensive feature information 421 is read in step S23; conversely, if the second learning time is later than the first learning time, the comprehensive feature information 422 is read in step S23.
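The selection rule of this embodiment (largest number of learning times first, most recent learning time as the tie-breaker) maps naturally onto a tuple-valued sort key. The sketch below is hypothetical: the `FeatureRecord` class, its field names, and the use of plain numbers as learning times are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRecord:
    name: str                       # e.g. "420", "421", "422"
    learning_times: list = field(default_factory=list)  # shooting times

    @property
    def learning_count(self):
        # One learning time is stored per underlying element feature record.
        return len(self.learning_times)


def select_record(candidates):
    """Pick the record with the most learning times; among ties, pick the
    one whose latest learning time is the most recent."""
    return max(candidates,
               key=lambda r: (r.learning_count, max(r.learning_times)))
```

With counts of 3, 4, and 4 for records 420 to 422, record 420 is eliminated on count, and the choice between 421 and 422 falls to whichever holds the later of the two latest learning times.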

<< Seventh Embodiment >>
A seventh embodiment will be described. In the first embodiment, the focus information, the size information, and the position information are included in the feature information, which enables the automatic angle-of-view adjustment and the automatic focus adjustment in the control stage operation.

However, the size information and the position information may be removed from the feature information so that the automatic angle-of-view adjustment is omitted in the control stage operation (see FIG. 10). When the size information and the position information are removed from the feature information, the automatic angle-of-view adjustment of step S25 is not executed in the control stage operation; the target input image is shot in step S28 after the automatic focus adjustment based on the focus information (step S26).

Alternatively, only the position information may be removed from the feature information. In the first embodiment, the automatic angle-of-view adjustment is performed on the premise that the position of the main subject does not leave the block indicated by the position information of the read feature information, and the adjustment is forcibly terminated immediately before the position would leave that block. When the position information is removed from the feature information, this forced termination no longer occurs. That is, when the position information is removed from the feature information, the automatic angle-of-view adjustment is performed so that the size of the main subject on the target input image of step S28 follows the size information of the read feature information, regardless of the position of the main subject on the input image.

<< Eighth Embodiment >>
An eighth embodiment will be described. The eighth embodiment assumes that the generation conditions of the target input images generated during the learning stage operation are specified by the user's manual operation. A manual operation is an operation performed by the user on the operation unit 26. When the display unit 27 has a touch panel function, the manual operation may be realized by a touch panel operation. In that case, the display unit 27 also functions as a second operation unit that accepts the specification of the generation conditions of the target input image.

The generation conditions of the target input image include any condition that changes the image quality of the target input image (more broadly, any condition that changes the image signal of the target input image). Depending on the specified generation conditions, settings such as whether camera shake correction is ON or OFF, the amplification factor of the AFE 12, and the content of the image processing applied to the output signal of the AFE 12 in the course of generating the target input image (such as sharpening processing and white balance adjustment processing) are determined. More specifically, the generation conditions of the target input image include a camera shake correction ON/OFF condition that specifies whether camera shake correction is performed when the target input image is generated, a sensitivity condition that specifies the sensitivity of the target input image, a sharpening condition that specifies the degree of sharpness of the target input image, and a white balance condition that specifies the white balance state of the target input image.

When camera shake correction is OFF, if the housing of the imaging device 1 moves during the exposure performed by the image sensor 33 to obtain the image signal of the target input image, the image on the image sensor 33 blurs and that blur is carried into the target input image. When camera shake correction is ON, blur that might otherwise be carried into the target input image is removed optically or electronically by a known method.
The sensitivity specified by the sensitivity condition is the ISO sensitivity. ISO sensitivity means the sensitivity defined by ISO (International Organization for Standardization), and the brightness (luminance level) of the target input image can be adjusted by adjusting the ISO sensitivity. In practice, the amplification degree of the signal amplification in the AFE 12 is determined according to the ISO sensitivity.
The target input image can be generated by applying sharpening processing to the image represented by the output signal of the AFE 12 itself. The sharpening condition specifies the content of the sharpening processing, including whether the sharpening processing is executed at all.
The white balance condition specifies, among other things, whether the target input image is generated using auto white balance control.

For simplicity, the following description of the eighth embodiment assumes that only the camera shake correction ON/OFF condition and the sensitivity condition are specified by manual operation.

FIG. 15 is a block diagram of the parts particularly involved in the operation of the special shooting mode according to the eighth embodiment. The generation condition information generation unit 56 generates the generation condition information described later. The generation condition information generation unit 56 can be realized by the image signal processing unit 13, or by a combination of the image signal processing unit 13 and the CPU 23.

[Learning stage operation]
FIG. 16 is a flowchart showing the procedure of the learning stage operation according to the eighth embodiment. In the learning stage operation, the processes of steps S51 to S54 are executed. In step S51, prior to the shutter operation, the imaging device 1 accepts the specification of the generation conditions of the target input image by the manual operation described above. When the user then performs the shutter operation in step S52, the target input image is generated under the specified generation conditions. After the target input image has been generated, in step S53 the subject detection unit 51 performs subject detection and subject category classification on the target input image. Finally, in step S54, generation condition information representing the generation conditions of the target input image is reflected in the recorded content of the learning memory 54, in association with the category combination of the target input image.

For example, consider a case in which, in step S51, the user's manual operation specifies that the ISO sensitivity should be set to “ISO100” and that camera shake correction should be ON, and the target input image 300 of FIG. 6(a) is generated in accordance with that specification. In this case, the generation condition information 505 shown in FIG. 17 is generated based on the content of the manual operation for the target input image 300, and the generation condition information 505 is stored in the learning memory 54 in association with the combination category “person and mountain” of the target input image 300. The method of detecting the combination category is as described in the first embodiment.

The generation condition information for a given input image consists of sensitivity information representing the sensitivity condition and camera shake correction information representing the camera shake correction ON/OFF condition; additional information as described above is attached to the generation condition information as necessary (see FIG. 17). Although the additional information could also be regarded as part of the generation condition information, in this example it is not regarded as a component of the generation condition information. In the generation condition information 505, the sensitivity information is “ISO100” and the camera shake correction information is “ON”.
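The storage performed in step S54 can be sketched as follows. This is only an illustrative sketch: the dictionary standing in for the learning memory 54, the function name `learn`, and the field names are hypothetical, and treating a category combination as an order-independent set is an assumption rather than something the description states.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the learning memory 54.
learning_memory = defaultdict(list)


def learn(categories, iso, stabilization_on):
    """Step S54 sketch: file one element record under its category combination."""
    # Assumption: "person and mountain" and "mountain and person" are the same key.
    key = frozenset(categories)
    learning_memory[key].append({"iso": iso, "stabilization_on": stabilization_on})
    # The learning count for this combination is what the additional information records.
    return len(learning_memory[key])


# Generation condition information 505: ISO100 with camera shake correction ON.
count = learn(["person", "mountain"], iso=100, stabilization_on=True)
```

Keying the records by category combination is what later allows the control stage to look up conditions from the detected categories alone.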

In the learning stage operation, the user's preference is learned repeatedly by repeatedly executing the series of processes consisting of steps S51 to S54. L_NUM can be any integer of 1 or more; here, L_NUM is assumed to be 3. The significance of L_NUM is as described above. FIG. 18 shows the recorded content of the learning memory 54 after the target input image 300 has been acquired and the above-described target input images 300a, 300b, 330, 330a, 330b, 331, and 331a (none of which are shown), which were the sources of the information in FIG. 9, have further been acquired.

The generation condition information generation unit 56 generates the generation condition information 505a of the target input image 300a and the generation condition information 505b of the target input image 300b based on the content of the manual operations for the target input images 300a and 300b, by the same method as that used to generate the generation condition information 505 of the target input image 300. When the generation condition information 505, 505a, and 505b has been generated, the memory control unit 53 stores it in the learning memory 54 in association with the category combination “person and mountain”. Meanwhile, the generation condition information generation unit 56 or the memory control unit 53 creates, on a statistical basis, generation condition information W2_A that matches or resembles the generation condition information 505, 505a, and 505b, and stores the generation condition information W2_A in the learning memory 54, likewise in association with the category combination “person and mountain”. The generation condition information of an individual target input image is specifically called element generation condition information, and generation condition information generated statistically from the generation condition information of a plurality of target input images is specifically called comprehensive generation condition information. In this example, each of the generation condition information 505, 505a, and 505b is element generation condition information, and the generation condition information W2_A is comprehensive generation condition information.

In the element generation condition information 505, 505a, and 505b, the sensitivity information is “ISO100”, “ISO100”, and “ISO200”, respectively, and the camera shake correction information is “ON”, “OFF”, and “ON”, respectively. The sensitivity information of the information W2_A can be generated from the sensitivity information of the information 505, 505a, and 505b by the same method as that used to generate the focus information of the comprehensive feature information from the focus information of a plurality of pieces of element feature information (the method described in the first embodiment; see FIG. 9), and the camera shake correction information of the information W2_A can likewise be generated from the camera shake correction information of the information 505, 505a, and 505b. That is, for example, the most frequent sensitivity information among the sensitivity information of the element generation condition information underlying the information W2_A can be used as the sensitivity information of the information W2_A (the same applies to the camera shake correction information). Accordingly, the sensitivity information and the camera shake correction information in the information W2_A are “ISO100” and “ON”, respectively. The additional information accompanying the information W2_A records the number of times learning has been performed for the category combination “person and mountain”. The information W2_A is generated based on three learning results, that is, based on the information 505, 505a, and 505b. Accordingly, 3 is recorded in the additional information of the information W2_A as the numerical value representing the learning count.
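The most-frequent-value rule described above can be illustrated with a short sketch. The class and function names are hypothetical, and the tie-breaking behavior (the first value seen wins) is an assumption, since the description does not say how ties are resolved.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationCondition:
    iso: int                # sensitivity information, e.g. 100 for "ISO100"
    stabilization_on: bool  # camera shake correction information (True = ON)


def most_frequent(values):
    """Return the most frequent value; on a tie, the value seen first (assumption)."""
    return Counter(values).most_common(1)[0][0]


def summarize(elements):
    """Build comprehensive information and the learning count from element records."""
    comprehensive = GenerationCondition(
        iso=most_frequent(e.iso for e in elements),
        stabilization_on=most_frequent(e.stabilization_on for e in elements),
    )
    return comprehensive, len(elements)


# Element generation condition information 505, 505a, and 505b from the text.
elements = [
    GenerationCondition(100, True),
    GenerationCondition(100, False),
    GenerationCondition(200, True),
]
w2_a, learning_count = summarize(elements)
```

With the three records from the text, `w2_a` holds ISO 100 with camera shake correction ON and `learning_count` is 3, matching the values given for the information W2_A.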

In FIG. 18, the generation condition information group shown in the central part is the group for the category combination “dog and sea”, and W2_B is the comprehensive generation condition information for the category combination “dog and sea”. The comprehensive generation condition information W2_B is created in the same manner as the comprehensive generation condition information W2_A. In FIG. 18, the generation condition information group shown in the lower part is the group for the category combination “person and sea”. In the state shown in FIG. 18, the control stage operation can be executed for the category combinations “person and mountain” and “dog and sea”, because their learning counts have reached the required learning count L_NUM (= 3).

As described in the first embodiment, if, after the target input images 300, 300a, and 300b have been captured, a further target input image 300c (not shown) whose category combination is “person and mountain” is captured, the comprehensive generation condition information W2_A is preferably updated using the generation condition information of that latest target input image 300c.

[Control stage operation]
For convenience, the state in which the comprehensive generation condition information W2_A of FIG. 18, based on the element generation condition information 505, 505a, and 505b, is stored in the learning memory 54 for the category combination “person and mountain”, and in which several pieces of comprehensive generation condition information other than W2_A (including the information W2_B) are also stored in the learning memory 54, is called the learning state of FIG. 18. In the eighth embodiment, the control stage operation under the learning state of FIG. 18 is described below. FIG. 19 is a flowchart showing the procedure of the control stage operation according to the eighth embodiment. In the control stage operation, the processes of steps S61 to S65 are executed.

The processing in steps S61 and S62 is the same as that in steps S21 and S22 of FIG. 10. Accordingly, when the shutter button 26b is half-pressed, or when the housing of the imaging device 1 is judged to have remained stationary for a certain period of time, the process moves from step S61 to step S62. The subject detection unit 51 then treats the latest input image as an evaluation image and, based on the image signal of the evaluation image, detects each subject present on the evaluation image while classifying it into one of a plurality of categories. The category combination for the evaluation image is thereby determined by the same method as described above.

In step S63 following step S62, the imaging control unit 55 of FIG. 4 reads from the learning memory 54 the comprehensive generation condition information of the category combination that matches the category combination of the evaluation image. For example, if the combination category of the evaluation image is “person and mountain”, the information W2_A is read, and if the combination category of the evaluation image is “dog and sea”, the information W2_B is read (see FIG. 18). In the following, the case where the evaluation image is the image 350 of FIG. 11 is considered. Since the combination category of the evaluation image 350 is “person and mountain”, the comprehensive generation condition information W2_A is read in step S63. Hereinafter, the comprehensive generation condition information read in step S63 is also called the read generation condition information.

After the reading process of step S63, the process waits in step S64 for the shutter button 26b to be fully pressed. When it is confirmed that the shutter button 26b has been fully pressed, the process moves from step S64 to step S65, a new target input image is captured, and the image signal of that target input image is recorded in the external memory 18. The generation conditions of the target input image acquired in step S65 follow the read generation condition information. That is, if the read generation condition information is the information W2_A, the ISO sensitivity of the target input image acquired in step S65 is set to “ISO100”, and the target input image of step S65 is captured with camera shake correction ON.
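The read-out of step S63, together with the requirement that the learning count reach L_NUM before the control stage operation can run, might be sketched as below. The dictionary layout, the function name, and the `None` return for combinations that are unknown or not yet sufficiently learned are illustrative assumptions. The learning count of 2 for “person and sea” is inferred from the two listed images 331 and 331a and is not stated explicitly.

```python
L_NUM = 3  # required learning count used in the example of the text


def read_generation_condition(memory, categories, l_num=L_NUM):
    """Step S63 sketch: return comprehensive information for a category combination.

    Returns None when the combination is unknown or has been learned fewer
    than l_num times (hypothetical convention for "control stage not available").
    """
    entry = memory.get(frozenset(categories))
    if entry is None or entry["learning_count"] < l_num:
        return None
    return entry["comprehensive"]


# Hypothetical snapshot of part of the learning state of FIG. 18.
memory = {
    frozenset({"person", "mountain"}): {
        "comprehensive": {"iso": 100, "stabilization_on": True},  # W2_A
        "learning_count": 3,
    },
    frozenset({"person", "sea"}): {
        "comprehensive": None,  # not yet learned often enough
        "learning_count": 2,    # inferred, see lead-in
    },
}

# Evaluation image 350 has the combination "person and mountain".
conditions = read_generation_condition(memory, ["person", "mountain"])
```

The returned conditions would then be applied when the target input image is captured in step S65.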

The generation condition information recorded in the learning memory 54 reflects the user's preference. In the control stage operation, generation conditions (ISO sensitivity and the like) suited to the user's preference are automatically reproduced by reading from the learning memory 54 the generation condition information corresponding to the categories of the current subjects. This supports the setting of generation conditions (ISO sensitivity and the like) suited to the user's preference and improves convenience for the user.

<< Ninth Embodiment >>
A ninth embodiment will be described. In each of the embodiments described above, a category combination is formed by associating a plurality of subjects with one another. However, scene determination may be used instead, and a category combination may be formed by associating a subject on the input image with the scene determined for the input image. This method is described concretely below.

In the ninth embodiment, the scene determination unit 58 shown in FIG. 20 is used. The scene determination unit 58 can be provided in the image signal processing unit 13 of FIG. 1. The scene determination unit 58 determines the shooting scene of an input image based on the image signal of the input image. This determination can be made for each input image. The shooting scene of the input image is determined using detection of the subjects of the input image, category classification of those subjects, analysis of the hue of the input image, estimation of the light source state of the subject at the time the input image was captured, and the like, and any known method (for example, the method described in JP-A-2009-71666) can be used for the determination.

A plurality of registered scenes are set in the scene determination unit 58 in advance. The registered scenes may include, for example, a portrait scene, which is a shooting scene focused on a person; a mountain scene, which is a shooting scene focused on a mountain; a sea scene, which is a shooting scene focused on the sea; a daytime scene, which represents a daytime shooting state; and a night view scene, which represents a night view shooting state. The scene determination unit 58 extracts feature data useful for scene determination from the image signal of the input image of interest, selects the shooting scene of that input image from among the plurality of registered scenes, and thereby determines the shooting scene of the input image of interest. The shooting scene determined by the scene determination unit 58 is called the determination scene.

A method of applying the scene determination unit 58 to the first or eighth embodiment will be described. Determination scene information representing the determination scene is passed to the feature information generation unit 52 of FIG. 4 or the generation condition information generation unit 56 of FIG. 15. The feature information generation unit 52 or the generation condition information generation unit 56 sets the category combination for an arbitrary input image based on the detection result of the subject detection unit 51 for that input image and the determination scene information. For example, for the target input image 300 of FIG. 6(a), when the subject 301 on the target input image 300 is detected by the subject detection unit 51 and the shooting scene of the target input image 300 is determined to be a mountain scene by the scene determination unit 58, the person category of the subject 301 is associated with the mountain scene, and the combination of “person” and “mountain scene” is set as the category combination of the target input image 300. The combination of “person” and “mountain scene” is written as the category combination “person and mountain scene”. Similarly, for example, when a subject in the dog category is detected from a certain target input image and the determination scene of that target input image is judged to be a sea scene, the category combination of that target input image is the category combination “dog and sea scene”.
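As a minimal sketch of how such a category combination might be represented, the following pairs the detected subject categories with the determination scene. The function names and the tuple representation are hypothetical, and the string labels simply mirror the notation used in the text.

```python
def category_combination(subject_categories, determination_scene):
    """Pair the detected subject categories with the determination scene."""
    # Sorting and de-duplicating makes the key independent of detection order (assumption).
    subjects = tuple(sorted(set(subject_categories)))
    return subjects, determination_scene


def describe(combination):
    """Render a combination in the text's notation, e.g. 'person and mountain scene'."""
    subjects, scene = combination
    return " and ".join(subjects + (f"{scene} scene",))


# Target input image 300: subject 301 is a person, determination scene is a mountain scene.
combo_300 = category_combination(["person"], "mountain")
label_300 = describe(combo_300)
```

Because the result is hashable, it can serve directly as a lookup key for the learned information.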

Only the method of setting the category combination differs from that of the first or eighth embodiment; the learning stage operation and the control stage operation when the scene determination unit 58 is applied to the first or eighth embodiment are the same as those of the first or eighth embodiment. That is, for example, the matters described in the first or eighth embodiment can be applied to the ninth embodiment after reading the words “mountain” and “sea” in the description of the first or eighth embodiment as “mountain scene” and “sea scene”, respectively, as appropriate (the same applies to the second to seventh embodiments).

When the first embodiment is applied, in the learning stage operation, feature information of each target input image is generated by repeatedly capturing target input images, and each piece of feature information is stored in the learning memory 54 in association with a category combination, which is a combination of a subject category and a determination scene. Once comprehensive feature information has been generated in the learning memory 54, the operation moves to the control stage operation. In the control stage operation (see FIG. 10), the category combination of the evaluation image (the combination of the subject category of the evaluation image and the determination scene of the evaluation image) is determined by classifying the subjects of the evaluation image into categories and performing the determination by the scene determination unit 58, and the comprehensive feature information corresponding to the determined category combination is read from the learning memory 54. Then, based on the read comprehensive feature information, automatic angle-of-view adjustment and automatic focus adjustment focused on the main subject are performed according to the method described in the first embodiment, after which a new target input image is captured.

When the eighth embodiment is applied, in the learning stage operation, generation condition information of each target input image is generated by repeatedly capturing target input images through manual operation, and each piece of generation condition information is stored in the learning memory 54 in association with a category combination, which is a combination of a subject category and a determination scene. Once comprehensive generation condition information has been generated in the learning memory 54, the operation moves to the control stage operation. In the control stage operation (see FIG. 19), the category combination of the evaluation image (the combination of the subject category of the evaluation image and the determination scene of the evaluation image) is determined by classifying the subjects of the evaluation image into categories and performing the determination by the scene determination unit 58, and the comprehensive generation condition information corresponding to the determined category combination is read from the learning memory 54. Then, according to the method described in the eighth embodiment, a new target input image is captured under the generation conditions specified in the read comprehensive generation condition information.

<< Tenth Embodiment >>
A tenth embodiment will be described. In the tenth embodiment, characteristic control is performed on the acoustic signal. The microphone unit 14 of FIG. 1 is formed from a plurality of microphones. Here, as shown in FIG. 21, the microphone unit 14 is assumed to consist of two microphones 14L and 14R. Although directional microphones can be employed as the microphones 14L and 14R, the microphones 14L and 14R are assumed here to be omnidirectional microphones having no directivity.

FIG. 22 is an external perspective view of the imaging device 1. The microphones 14L and 14R are installed at mutually different positions on the housing of the imaging device 1. As shown in FIG. 22, the microphone 14L is installed on the left side of the housing of the imaging device 1, and the microphone 14R is installed on the right side of the housing of the imaging device 1. As shown in FIG. 22, the direction in which subjects that can be photographed by the imaging unit 11 are present is defined as the front, and the opposite direction is defined as the rear. The front and the rear are directions along the optical axis of the imaging unit 11. Right and left mean right and left as seen when looking from the rear side toward the front side.

Each of the microphones 14L and 14R converts the sound it collects into an analog acoustic signal and outputs that signal. The acoustic signal processing unit 15 of FIG. 1 converts the analog acoustic signals output from the microphones 14L and 14R into digital acoustic signals. The digital acoustic signals obtained by this conversion from the output signals of the microphones 14L and 14R are called the left original signal and the right original signal, respectively. The acoustic signal processing unit 15 can generate an acoustic signal having directivity by applying known directivity control to the left original signal and the right original signal.

The imaging device 1 also has a function of generating a still image with sound. That is, when a target input image is captured in response to a shutter operation, an acoustic signal covering a certain period referenced to the shooting time of that target input image is generated as a target acoustic signal, and the target acoustic signal can be recorded in the external memory 18 together with the image signal of the target input image, in association with that image signal. The target acoustic signal is generated by a target acoustic signal generation unit (not shown) included in the acoustic signal processing unit 15 of FIG. 1. When reproduction of the target input image is instructed in the reproduction mode, the target input image is displayed on the display unit 27 and the target acoustic signal is reproduced as sound by the speaker 28.

Furthermore, when generating a still image with sound, the imaging device 1 can use directivity control to generate, as the target acoustic signal, an acoustic signal having directivity in the direction of an enhancement target subject on the target input image, and the user can specify which subject is to be the enhancement target subject using the operation unit 26 or the like. Directivity control for generating, as the target acoustic signal, an acoustic signal having directivity in the direction of the enhancement target subject on the target input image is specifically called specific direction enhancement control. The imaging device 1 can learn the content specified by the user, treating it as information on the user's preference regarding sound, and make use of it in subsequent shooting. A detailed way of realizing such a method is described below. In the tenth embodiment as well, the parts shown in FIG. 4 or FIG. 15 are used.

[Learning stage operation]
First, the learning stage operation will be described. In the learning stage operation, a target input image is captured in response to a shutter operation. Consider a case where the target input image 600 shown in FIG. 23 is obtained. The target input image 600 is an input image captured with the first and second registered persons described in the third embodiment included among the subjects, and the subjects 601 and 602 are the first and second registered persons on the target input image 600, respectively. The subject detection unit 51 detects the subjects 601 and 602 from the target input image 600, detects that the subjects 601 and 602 are persons, and sets mutually different subject regions 611 and 612. The subject regions 611 and 612 are the image regions in which the image signals of the subjects 601 and 602 are present, respectively. Furthermore, through the above-described personal recognition process based on the image signal of the target input image 600 (see the third embodiment), the subject detection unit 51 recognizes that the subjects 601 and 602 are the first and second registered persons, respectively.

In the subject detection unit 51, the first and second registered persons are regarded as subjects of mutually different categories. Suppose now that the user specifies, using the operation unit 26 or the like, that the first registered person should be the enhancement target subject. This specification can be made before the target input image 600 is captured or after it is captured. By this specification, the subject 601 is set as the enhancement target subject. The acoustic signal processing unit 15 sets an extraction period P₆₀₀ referenced to the shooting time of the target input image 600 and, from the left original signal and the right original signal in the extraction period P₆₀₀, generates by specific direction enhancement control an acoustic signal in which the component of the sound arriving from the subject 601 is emphasized, as the target acoustic signal SD₆₀₀. The target acoustic signal SD₆₀₀ is recorded in the external memory 18 together with the image signal of the target input image 600, in association with the target input image 600.

Strictly speaking, the shooting time of an input image refers to, for example, the start time, intermediate time, or end time of the exposure performed by the image sensor 33 to acquire the image signal of that input image. The extraction period referenced to a given shooting time is the period that starts Δt_A seconds before the shooting time and ends Δt_B seconds after it (Δt_A and Δt_B are predetermined positive values). If the left original signal and the right original signal for a certain length of time are kept recorded in the internal memory 17, the target acoustic signal SD₆₀₀ can be produced from the signals recorded in the internal memory 17 even if the enhancement target subject is specified after the target input image 600 has been captured (the same applies to other target acoustic signals).
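The extraction period and its use on buffered audio can be sketched as follows. The function names, the sample-rate parameter, and the representation of the internal memory 17 as a plain list with a known start time are assumptions made for illustration. Clipping the period to the buffered range is likewise an assumption.

```python
def extraction_period(shooting_time, dt_a, dt_b):
    """Return (start, end) in seconds: dt_a before to dt_b after the shooting time."""
    if dt_a <= 0 or dt_b <= 0:
        raise ValueError("dt_a and dt_b must be positive")
    return shooting_time - dt_a, shooting_time + dt_b


def slice_buffer(samples, buffer_start_time, sample_rate, period):
    """Cut the samples that fall inside the period out of a buffered signal."""
    start, end = period
    first = max(0, round((start - buffer_start_time) * sample_rate))
    last = min(len(samples), round((end - buffer_start_time) * sample_rate))
    return samples[first:last]


# Hypothetical numbers: 10 s of audio buffered at 10 samples per second from t = 0,
# shooting time t = 3 s, dt_a = 1 s, dt_b = 2 s.
buffered = list(range(100))
period = extraction_period(3.0, 1.0, 2.0)
segment = slice_buffer(buffered, 0.0, 10, period)
```

Keeping the buffer longer than Δt_A plus Δt_B is what allows the enhancement target subject to be chosen after the image has been captured.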

The acoustic signal processing unit 15 estimates the direction of the subject 601 as seen from the imaging apparatus 1 from the position of the subject region 611 on the target input image 600 and the focal length (the focal length of the imaging unit 11) at the shooting time of the target input image 600, and generates the target acoustic signal SD₆₀₀ so that the signal component of sound arriving from the estimated direction (that is, the signal component of sound arriving from the subject 601 as a sound source) is emphasized. The signal component of sound arriving from the estimated direction can be extracted as a necessary component from the left and right original signals in the extraction period P₆₀₀, and the extracted necessary component itself can be used as the target acoustic signal SD₆₀₀. Alternatively, the necessary component may be extracted from the left and right original signals in the extraction period P₆₀₀ while the signal components other than the necessary component are extracted from the same signals as an unnecessary component, and the target acoustic signal SD₆₀₀ may then be generated by weighted addition of the necessary and unnecessary components such that the mixing ratio of the necessary component is relatively large. That is, coefficients k_A and k_B satisfying 0 < k_B < k_A may be set, and the sum of the necessary component multiplied by k_A and the unnecessary component multiplied by k_B may be generated as the target acoustic signal SD₆₀₀.
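The two steps described above (estimating the subject direction, then mixing the necessary and unnecessary components) can be sketched as follows. The specification states only that the direction is estimated from the position of the subject region and the focal length; the pinhole-camera formula used here, the `sensor_width_mm` parameter, and the function names are assumptions made for illustration.

```python
import math


def estimate_azimuth_deg(region_center_x_px, image_width_px,
                         sensor_width_mm, focal_length_mm):
    """Horizontal angle of the subject relative to the optical axis.

    Assumption: simple pinhole model, angle = atan(offset on sensor / focal length).
    """
    offset_px = region_center_x_px - image_width_px / 2
    offset_mm = offset_px * sensor_width_mm / image_width_px
    return math.degrees(math.atan2(offset_mm, focal_length_mm))


def mix_components(necessary, unnecessary, k_a, k_b):
    """Weighted addition k_a * necessary + k_b * unnecessary, with 0 < k_b < k_a."""
    if not 0 < k_b < k_a:
        raise ValueError("coefficients must satisfy 0 < k_b < k_a")
    return [k_a * n + k_b * u for n, u in zip(necessary, unnecessary)]


# Usage: the necessary component dominates the mix because k_a > k_b.
mixed = mix_components([1.0, 2.0], [4.0, 4.0], k_a=1.0, k_b=0.25)
```

Setting k_b close to zero approaches the first variant in the text, where the necessary component alone is used as the target acoustic signal.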

Instead of the target acoustic signal SD₆₀₀, the acoustic signal processing unit 15 can also produce a target acoustic signal SD₆₀₀' from the left and right original signals in the extraction period P₆₀₀ by stereo control, which is a kind of directivity control. The target acoustic signal obtained by stereo control is a stereo signal consisting of an L signal and an R signal generated irrespective of the position of the emphasis target subject. The L signal and the R signal are directional acoustic signals whose directivity axes point in mutually different directions. When the signal SD₆₀₀' is generated, the signal SD₆₀₀', instead of the signal SD₆₀₀, is associated with the target input image 600 and recorded in the external memory 18 together with the image signal of the target input image 600.

The subject detection unit 51 sets the category combination “first and second registered persons” as the category combination of the target input image 600, and the memory control unit 53 records sound control information concerning the directivity of the target acoustic signal of the target input image 600 in the learning memory 54 in association with the category combination “first and second registered persons”. For example, the sound control information 605 shown in FIG. 24 is recorded in the learning memory 54 in association with the category combination “first and second registered persons” of the target input image 600. The sound control information is generated, for example, by a sound control information generation unit (not shown) residing in the CPU 23 of FIG. 1.

The sound control information for a given input image consists of control ON/OFF information indicating whether specific direction emphasis control is ON or OFF and emphasis target information indicating the emphasis target subject; additional information such as that described above is attached to the sound control information as necessary (see FIG. 24). The additional information could also be regarded as part of the sound control information, but in this example it is not treated as a component of the sound control information.

The control ON/OFF information is set to ON or OFF. Control ON/OFF information of ON means that the target acoustic signal for the corresponding target input image was generated using specific direction emphasis control, and control ON/OFF information of OFF means that the target acoustic signal for the corresponding target input image was generated using stereo control.
The emphasis target information indicates which category of subject on the target input image is the emphasis target subject, and holds meaningful data only when the control ON/OFF information is ON.
Assume that the target acoustic signal SD₆₀₀ has been generated for the target input image 600. Then, in the sound control information 605, the control ON/OFF information is “ON” and the emphasis target information is “first registered person”.

In the learning stage operation, the user's preferences are learned repeatedly by repeatedly capturing target input images. L_NUM can be any integer of 1 or more; here, L_NUM is assumed to be 3. The significance of L_NUM is as described above. FIG. 25 shows the recorded contents of the learning memory 54 after the target input image 600 has been acquired and the target input images 600a and 600b (neither shown) have further been acquired. Assume that each of the target input images 600a and 600b includes the first and second registered persons as subjects, and that the category combination of each of the target input images 600a and 600b is “first and second registered persons”. Assume further that, as with the target input image 600, a target acoustic signal has been generated for each of the target input images 600a and 600b.

The CPU 23 generates the sound control information 605a of the target input image 600a and the sound control information 605b of the target input image 600b in the same way as it generates the sound control information 605 of the target input image 600. When the sound control information 605, 605a, and 605b has been generated, the memory control unit 53 stores it in the learning memory 54 in association with the category combination “first and second registered persons”. In addition, the CPU 23 or the memory control unit 53 statistically creates sound control information W3_A that matches or resembles the sound control information 605, 605a, and 605b, and also stores the sound control information W3_A in the learning memory 54 in association with the category combination “first and second registered persons”. The sound control information of each individual target input image is specifically called element sound control information, and sound control information generated statistically from the sound control information of a plurality of target input images is specifically called comprehensive sound control information. In this example, each of the sound control information 605, 605a, and 605b is element sound control information, and the sound control information W3_A is comprehensive sound control information.

In the element sound control information 605, 605a, and 605b, the control ON/OFF information is “ON”, “ON”, and “OFF”, respectively, and the emphasis target information in the element sound control information 605 and 605a is “first registered person” in both cases. Using a method similar to the method of generating the focus information of the comprehensive feature information from the focus information of a plurality of pieces of element feature information (the method described in the first embodiment; see FIG. 9), the control ON/OFF information of the information W3_A can be generated from the control ON/OFF information of the information 605, 605a, and 605b, and the emphasis target information of the information W3_A can be generated from the emphasis target information of the information 605, 605a, and 605b. That is, for example, the most frequent emphasis target information among the emphasis target information of the element sound control information underlying the information W3_A can be used as the emphasis target information of the information W3_A (the same applies to the control ON/OFF information). Accordingly, the control ON/OFF information and the emphasis target information in the information W3_A are “ON” and “first registered person”, respectively. The additional information accompanying the information W3_A records the number of learnings for the category combination “first and second registered persons”. The information W3_A is generated from three learning results, namely the information 605, 605a, and 605b. Therefore, 3 is recorded in the additional information of the information W3_A as the number of learnings.
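The most-frequent-value rule described above can be sketched as follows. The class and function names are hypothetical, and two details are assumptions rather than statements from the specification: emphasis targets are counted only over elements whose control is ON (since the emphasis target information is meaningful only in that case), and ties are resolved in favor of the value seen first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoundControlInfo:
    control_on: bool                       # control ON/OFF information
    emphasis_target: Optional[str] = None  # meaningful only when control_on is True


def aggregate(elements):
    """Build comprehensive sound control information by majority vote.

    Returns (comprehensive info, learning count).
    """
    control_on = Counter(e.control_on for e in elements).most_common(1)[0][0]
    target = None
    if control_on:
        targets = Counter(e.emphasis_target for e in elements
                          if e.control_on and e.emphasis_target)
        if targets:
            target = targets.most_common(1)[0][0]
    return SoundControlInfo(control_on, target), len(elements)


# Usage: the three element records 605, 605a, 605b from the text.
elements = [
    SoundControlInfo(True, "first registered person"),   # 605
    SoundControlInfo(True, "first registered person"),   # 605a
    SoundControlInfo(False),                              # 605b
]
w3_a, learning_count = aggregate(elements)
```

With these inputs the result is ON with “first registered person” and a learning count of 3, matching the contents described for W3_A.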

As described in the first embodiment, if, after the target input images 600, 600a, and 600b have been captured, a further target input image 600c (not shown) whose category combination is “first and second registered persons” is generated together with a target acoustic signal, the comprehensive sound control information W3_A should be updated using the sound control information of that latest target input image 600c.

[Control stage operation]
The control stage operation is described under the following assumed state: for the category combination “first and second registered persons”, the comprehensive sound control information W3_A of FIG. 25, based on the element sound control information 605, 605a, and 605b, is stored in the learning memory 54, and several pieces of comprehensive sound control information other than W3_A are also stored in the learning memory 54. FIG. 26 is a flowchart showing the procedure of the control stage operation according to the tenth embodiment; in the control stage operation, the processes of steps S81 to S84 are executed.

When a shutter operation is performed in the control stage operation, in step S81 a new target input image is captured and the subject detection unit 51 performs detection processing on an evaluation image. The evaluation image here is normally the target input image captured in step S81. However, an input image obtained during the control stage operation and captured before the target input image of step S81 (for example, the input image captured immediately before the target input image of step S81) may be used as the evaluation image instead. Based on the image signal of the evaluation image, the subject detection unit 51 detects each subject present on the evaluation image, classifying it into one of a plurality of categories. The category combination of the evaluation image is thereby determined in the same way as described above.

In step S82 following step S81, the acoustic signal processing unit 15 reads from the learning memory 54 the comprehensive sound control information of the category combination that matches the category combination of the evaluation image. For example, if the category combination of the evaluation image is “first and second registered persons”, the information W3_A is read, and if the category combination of the evaluation image is “third and fourth registered persons”, the comprehensive sound control information corresponding to the category combination “third and fourth registered persons” is read. In the following, consider the case where the evaluation image is the image 650 of FIG. 27, and assume further that the evaluation image 650 is the target input image captured in step S81. The image 650 contains the image signals of a subject 651, who is the first registered person, and a subject 652, who is the second registered person. The category combination of the evaluation image 650 is therefore “first and second registered persons”, so the comprehensive sound control information W3_A is read in step S82. Hereinafter, the comprehensive sound control information read in step S82 is also called the read sound control information. The subject regions 661 and 662 on the image 650 of FIG. 27 are the image regions in which the image signals of the subjects 651 and 652, respectively, are present.

After the reading process of step S82, in step S83 the acoustic signal processing unit 15 sets an extraction period P₆₅₀ referenced to the shooting time of the target input image captured in step S81, and generates the target acoustic signal SD₆₅₀ to be associated with the target input image of step S81. The target acoustic signal SD₆₅₀ is generated by the method described above, but in accordance with the conditions specified in the read sound control information.

Since the read sound control information is here assumed to be the information W3_A (see FIG. 25), the target acoustic signal SD₆₅₀ is generated by treating the first registered person as the emphasis target subject and applying the above-described specific direction emphasis control to the left and right original signals in the extraction period P₆₅₀. When the target input image is the same as the evaluation image 650, the direction of the subject 651 as seen from the imaging apparatus 1 is estimated from the position of the subject region 661 on the target input image, which coincides with the evaluation image 650, and the focal length (the focal length of the imaging unit 11) at the shooting time of the target input image of step S81, and the target acoustic signal SD₆₅₀ is generated so that the signal component of sound arriving from the estimated direction (that is, the signal component of sound arriving from the subject 651 as a sound source) is emphasized. Contrary to the above assumption, if the control ON/OFF information in the read sound control information were “OFF”, the target acoustic signal SD₆₅₀ would be generated from the left and right original signals in the extraction period P₆₅₀ by stereo control.

Thereafter, in step S84, the image signal of the target input image captured in step S81 and the target acoustic signal generated in step S83 are associated with each other and recorded in the external memory 18.
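The decision made in steps S82 and S83 can be sketched as a lookup keyed by the category combination. The dictionary layout, the function name, and the mode labels are hypothetical. The specification does not say what happens when no comprehensive sound control information matches; falling back to stereo control in that case is an assumption of this sketch.

```python
def plan_sound_control(detected_categories, learning_memory):
    """Pick the sound control mode for a shot from the learned information.

    detected_categories: categories found on the evaluation image (step S81).
    learning_memory: maps a frozenset of categories to comprehensive sound
    control information, here a dict with "control_on" and "emphasis_target".
    """
    key = frozenset(detected_categories)   # order of detection does not matter
    info = learning_memory.get(key)        # step S82: read sound control information
    if info is None:
        # Assumption: nothing learned for this combination, so use stereo control.
        return ("stereo", None)
    if info["control_on"]:
        # Step S83 with specific direction emphasis control toward the learned target.
        return ("directional", info["emphasis_target"])
    return ("stereo", None)


# Usage: learning memory holding W3_A for "first and second registered persons".
memory = {
    frozenset({"first registered person", "second registered person"}): {
        "control_on": True,
        "emphasis_target": "first registered person",
    },
}
mode = plan_sound_control(
    ["second registered person", "first registered person"], memory)
```

Using a frozenset as the key reflects that a category combination is an unordered set of detected categories.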

The sound control information recorded in the learning memory 54 reflects the user's preferences regarding sound. In the control stage operation, sound control information corresponding to the categories of the current subjects is read from the learning memory 54, so that sound characteristics matching the user's preferences are reproduced automatically. This improves convenience for the user.

The specific operation example above assumes the use of personal recognition processing, but the same processing is possible without it. For example, if a person is designated as the emphasis target subject when the category combination of a target input image obtained during the learning stage operation is “person and dog”, sound control information containing control ON/OFF information of “ON” and emphasis target information of “person” is created and stored in the learning memory 54 in association with the category combination “person and dog”. After comprehensive sound control information for the category combination “person and dog” has been generated in the learning memory 54 through repeated capture of target input images, when a target input image (or evaluation image) whose category combination is “person and dog” is captured in step S81 of the control stage operation, the comprehensive sound control information corresponding to the category combination “person and dog” is read as the read sound control information. If the control ON/OFF information and the emphasis target information in that read sound control information are “ON” and “person”, respectively, the target acoustic signal of step S83 is generated using specific direction emphasis control with the person on the target input image of step S81 treated as the emphasis target subject.

The above method can also be applied to moving images. By detecting and categorizing the subjects on a target moving image based on its image signal, a category combination can be set for the target moving image just as for a target input image that is a still image. After comprehensive sound control information for the category combination “person and dog” has been generated in the learning memory 54 through repeated capture of target moving images, when a target moving image whose category combination is “person and dog” is captured in the control stage operation, the comprehensive sound control information corresponding to the category combination “person and dog” is read as the read sound control information. If the control ON/OFF information and the emphasis target information in that read sound control information are “ON” and “person”, respectively, the target acoustic signal is generated using specific direction emphasis control with the person on the target moving image captured during the control stage operation treated as the emphasis target subject. In this case, a target acoustic signal that appears to track the sound from the emphasis target subject is generated and associated with the target moving image.

The method described in the tenth embodiment can be implemented in combination with the methods described in the other embodiments.

<< Eleventh embodiment >>
An eleventh embodiment will be described. In the learning stage operation, the remaining number of learnings required to shift from the learning stage operation to the control stage operation may be presented to the user.

For example, in the first embodiment, if capturing m₁ more target input images whose category combination is “person and mountain” would allow the shift from the learning stage operation to the control stage operation, a first indicator showing m₁ should be displayed on the display unit 27 (m₁ is a natural number). When L_NUM = 3 and only the feature information 305 and 305a of FIG. 9 is recorded in the learning memory 54, m₁ = 1.
Similarly, in the eighth embodiment, if capturing m₂ more target input images whose category combination is “person and mountain” would allow the shift from the learning stage operation to the control stage operation, a second indicator showing m₂ should be displayed on the display unit 27 (m₂ is a natural number). When L_NUM = 3 and only the generation condition information 505 and 505a of FIG. 18 is recorded in the learning memory 54, m₂ = 1.
Similarly, in the tenth embodiment, if capturing m₃ more target input images whose category combination is “first and second registered persons” would allow the shift from the learning stage operation to the control stage operation, a third indicator showing m₃ should be displayed on the display unit 27 (m₃ is a natural number). When L_NUM = 3 and only the sound control information 605 and 605a of FIG. 25 is recorded in the learning memory 54, m₃ = 1.
The values of m₁ to m₃ can be determined easily by referring to the recorded contents of the learning memory 54.
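The remaining count shown by each indicator can be sketched as a subtraction. This assumes, consistently with the three examples above, that L_NUM is the number of element records required for a category combination before the control stage operation becomes available; the function names and the message wording are hypothetical.

```python
def remaining_shots(l_num, recorded_elements):
    """Number of further target input images needed before the control stage."""
    return max(0, l_num - recorded_elements)


def indicator_text(category_combination, remaining):
    """Hypothetical text form of the indicator for the display unit."""
    return f"{category_combination}: {remaining} more shot(s) until automatic control"


# Usage: L_NUM = 3 with two element records stored (e.g. 605 and 605a).
m3 = remaining_shots(3, 2)
text = indicator_text("first and second registered persons", m3)
```

Clamping at zero keeps the indicator sensible once more than L_NUM records have accumulated.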

The first indicator may be composed of characters as shown in FIG. 28(a), of graphics as shown in FIG. 28(b), or of a combination of characters and graphics. The same applies to the second and third indicators. FIGS. 28(a) and 28(b) show examples of the display screen with the first indicator displayed. Unless otherwise stated, the display screen refers to the display screen of the display unit 27. When the imaging apparatus 1 is configured so that it can carry out the operations of several of the above embodiments (for example, the first, eighth, and tenth embodiments), two or more of the first to third indicators may be displayed simultaneously.

<< Twelfth embodiment >>
A twelfth embodiment will be described. The twelfth embodiment describes a technique relating to manual operation that is applicable to the eighth embodiment.

For convenience of explanation, the twelfth embodiment assumes that each piece of generation condition information includes, in addition to sensitivity information and camera shake correction information, sharpening information representing a sharpening condition (the significance of the sharpening condition is described in the eighth embodiment).

By performing manual operations in the learning stage operation, the user can change all or some of the sensitivity condition, the sharpening condition, and the camera shake correction ON/OFF condition of each target input image from their initial conditions. When such changes are made, the memory control unit 53 of FIG. 15 records, in the additional information of the learning memory 54, which conditions were changed and the number of changes for each condition. Based on this additional information, the CPU 23 sets a condition that the user changes relatively frequently as a specific condition. Once the specific condition has been set, a user interface that makes the specific condition easy to change is provided.

A specific example will be described. Immediately after the learning stage operation starts, when an input image whose category combination is “person and mountain” is captured, the input image is simply displayed on the display unit 27, as shown in FIG. 29(a). If the user wishes to change the sensitivity condition in this state, the user must perform, in order, a first operation for displaying an operation menu in which the sensitivity condition, the sharpening condition, and the camera shake correction ON/OFF condition can be changed, a second operation for selecting the sensitivity condition from among the sensitivity condition, the sharpening condition, and the camera shake correction ON/OFF condition, and a third operation for specifying a concrete ISO sensitivity value. The series of operations consisting of the first to third operations is collectively called the basic operation.

Suppose that, in the learning stage operation, n target input images whose category combination is “person and mountain” are captured, and that a manual operation for changing the generation conditions of the target input image is performed at the time each of the n target input images is captured (n is an integer of 2 or more). Suppose, however, that these changes affect only the sensitivity condition and that the sharpening condition is constant across the n target input images. In this case, based on the additional information in the learning memory 54, the CPU 23 determines that the number of changes to the sensitivity condition (n) is greater than the number of changes to the sharpening condition and the camera shake correction ON/OFF condition (0), and consequently sets the sensitivity condition as the specific condition. After this setting, when an input image whose category combination is “person and mountain” is acquired in the control stage operation (or the learning stage operation), the imaging apparatus 1 displays an operation icon 721 on the display unit 27 together with the input image, as shown in FIG. 29(b) (for example, the operation icon 721 is superimposed on the input image). FIG. 29(b) shows the display screen with the operation icon 721 displayed. After selecting the operation icon 721 by operating the operation unit 26 or by a touch panel operation, the user can complete the designation of the sensitivity information by performing only the third operation. In other words, the first operation by the user becomes unnecessary, and the second operation is simpler than in the basic operation. Contrary to the above assumption, if the number of changes to the sharpening condition is also relatively large, an icon (not shown) for instructing a change of the sharpening condition is displayed alongside the operation icon 721.
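The selection of the specific condition from the recorded change counts can be sketched as follows. The specification says only that a condition changed “relatively frequently” becomes a specific condition; the fixed threshold `min_count` used here, like the function and key names, is an assumption for illustration.

```python
def specific_conditions(change_counts, min_count=2):
    """Return conditions changed at least min_count times, most changed first.

    change_counts: maps a condition name to the number of manual changes
    recorded in the additional information of the learning memory.
    """
    frequent = [name for name, count in change_counts.items() if count >= min_count]
    # sorted() is stable, so equally frequent conditions keep their recorded order.
    return sorted(frequent, key=lambda name: -change_counts[name])


# Usage: only the sensitivity condition was changed, for each of n = 4 images.
counts = {"sensitivity": 4, "sharpening": 0, "camera shake correction": 0}
shortcuts = specific_conditions(counts)
```

Each returned condition would correspond to one shortcut icon such as the operation icon 721; if the sharpening condition also crossed the threshold, it would appear as a second icon.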

When the user selects the operation icon 721, the CPU 23 extracts, from the sensitivity information in the learning memory 54, the ISO sensitivity values that the user sets relatively frequently, and causes the display unit 27 to present a display based on the extraction result. For example, if the ISO sensitivity at the time of capturing the n target input images was ISO 50 or ISO 100, then upon receiving an operation selecting the operation icon 721, the display screen of FIG. 29B is changed to the display screen of FIG. 29C. On the display screen of FIG. 29C, an icon 731 for “ISO 50” and an icon 732 for “ISO 100” are shown together with the current input image.

While the display of FIG. 29C is shown, the user can set the current ISO sensitivity to “ISO 50” simply by selecting the icon 731, or to “ISO 100” simply by selecting the icon 732. Because this setting is immediately reflected in the generation conditions of the input image, the user can easily check the result on the display screen. In the third operation of the basic operation, the desired ISO sensitivity must be selected from among three or more (about ten) ISO sensitivities, whereas a display such as that of FIG. 29C makes the third operation simpler than in the basic operation.

Furthermore, when the display of FIG. 29C is presented, the arrangement of the icons 731 and 732 on the display screen may be determined based on the sensitivity information in the learning memory 54 so that the icon of the more frequently set ISO sensitivity is placed higher on the display screen. For example, suppose that, among the n target input images, the ISO sensitivity was “ISO 50” for n₁ target input images and “ISO 100” for n₂ target input images (n₁ and n₂ are integers with n₁ + n₂ = n). If n₁ > n₂, the icon 731 is preferably displayed above the icon 732 on the display screen, as shown in FIG. 29C; conversely, if n₂ > n₁, the icon 732 is preferably displayed above the icon 731. This allows the more frequently set ISO sensitivity to be selected with fewer operations. It is assumed here that an icon placed higher on the display screen can be selected with fewer operations than an icon placed lower.
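The frequency-based ordering of the ISO icons can be sketched as follows. The function name `iso_icon_order`, the list-of-values representation of the stored sensitivity information, and the `max_icons` limit are assumptions made for illustration. The text does not specify how a tie (n₁ = n₂) is resolved; in this sketch the value encountered first in the history is listed first.

```python
from collections import Counter


def iso_icon_order(iso_history, max_icons=2):
    """Return ISO values ordered from top icon to bottom icon.

    iso_history: ISO sensitivity recorded for each learned capture
    (hypothetical representation of the stored sensitivity information).
    max_icons: assumed upper limit on the number of icons shown.
    """
    counts = Counter(iso_history)
    # most_common() sorts by descending count, so the most frequently
    # set ISO sensitivity ends up at the top of the screen.
    return [iso for iso, _ in counts.most_common(max_icons)]


# n1 = 2 captures at ISO 50 and n2 = 1 capture at ISO 100.
top_to_bottom = iso_icon_order([50, 100, 50])
print(top_to_bottom)
```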

<< Thirteenth embodiment >>
A thirteenth embodiment will be described. The feature information, generation condition information, and sound control information described above (see FIGS. 9, 18, and 25) can be collectively called learning information. Because the methods described so far generate learning information for each category combination, generating learning information for all category combinations takes a corresponding amount of time (a corresponding number of shots), and that time grows as the number of categories that can be classified and detected increases.

The thirteenth embodiment describes a technique for shortening the time (number of shots) needed to generate learning information. The technique described in the thirteenth embodiment can be applied to any of the embodiments described above. To simplify the description, the number of categories forming a category combination is assumed to be two, and a method of applying the technique of this embodiment to the feature information of the first embodiment is described.

Assume now that the comprehensive feature information 801 and 802 and the feature information 803 shown in FIGS. 30A to 30C are stored in the learning memory 54.

The comprehensive feature information 801 is the comprehensive feature information for the category combination “automobile and mountain”, and the comprehensive feature information 802 is that for the category combination “automobile and person”. In the comprehensive feature information 801, the focus information, size information, and position information are “automobile”, “SIZE₁”, and “BL₅”, respectively; in the comprehensive feature information 802, they are “automobile”, “SIZE₂”, and “BL₆”, respectively. The comprehensive feature information 801 is assumed to have been created from N₁ pieces of element feature information for the category combination “automobile and mountain”, and the comprehensive feature information 802 from N₂ pieces of element feature information for the category combination “automobile and person”. That is, the learning counts of the comprehensive feature information 801 and 802 are N₁ and N₂, respectively (N₁ and N₂ are natural numbers).

In the thirteenth embodiment, feature information for a single category is also generated by the feature information generation unit 52 (see FIG. 4) and stored in the learning memory 54. The learning information 803 in FIG. 30C is feature information for a single category. Suppose that, in the learning stage operation, the subject detection unit 51 and the feature information generation unit 52 detect only an automobile in a certain target input image, detect that the focused subject of that image is the automobile, detect that the size of the automobile on that image (the image size of the automobile's subject region) is SIZE₃, and detect that the position of the automobile on that image belongs to block BL₅. The feature information generation unit 52 then generates feature information 803 whose focus information, size information, and position information are “automobile”, “SIZE₃”, and “BL₅”, respectively, and the feature information 803 is stored in the learning memory 54 in association with the single category “automobile”.

In a state where only the information 801 to 803 is stored in the learning memory 54, the feature information generation unit 52 can generate, in a pseudo manner, feature information 804 for the category combination “person and mountain” based on the information 801 to 803. FIG. 31 shows the pseudo-generated feature information 804. It is generated as follows.

The feature information generation unit 52 first determines, based on the information 801 and 802, the priority order of the subjects to be brought into focus, and generates the focus information of the feature information 804 from the result. For the user whose shooting produced the information 801 to 803, it is inferred from the information 801 that focusing on the automobile is preferable to focusing on the mountain, and from the information 802 that focusing on the automobile is preferable to focusing on the person. Among the automobile, the person, and the mountain, the automobile therefore has the highest priority. The relative priority of the person and the mountain, on the other hand, cannot be inferred from the information 801 and 802 alone. When a person is compared with a non-person subject, however, focusing on the person is usually what is desired. Taking this into account, the feature information generation unit 52 sets the focus information of the feature information 804 to “person”. Contrary to the above assumption, if the focus information of the information 801 were “mountain”, then referring also to the information 802 would give the priority order “mountain > automobile > person”, and the focus information of the feature information 804 would be set to “mountain”. In the following, the focus information of the feature information 804 is assumed to be “person”.
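The priority reasoning above can be sketched as a small search over learned "focused category outranks the other category" relations. This is only an illustration under stated assumptions: the function name `infer_focus`, the dictionary representation of the learned information, and the behavior of returning `None` when no ordering is known and no person is involved are not taken from the embodiment.

```python
def infer_focus(learned, pair, default="person"):
    """Infer which category of `pair` should be focused.

    learned: dict mapping frozenset({a, b}) -> focused category
    (hypothetical representation of information such as 801 and 802).
    pair: the two categories of the combination without learned data.
    """
    # Each learned combination says "focused category outranks the other".
    beats = {}
    for combo, focus in learned.items():
        for other in combo - {focus}:
            beats.setdefault(focus, set()).add(other)

    def outranks(a, b, seen=()):
        # True if a chain of learned preferences leads from a to b.
        return any(
            n == b or (n not in seen and outranks(n, b, seen + (a,)))
            for n in beats.get(a, ())
        )

    a, b = pair
    if outranks(a, b):
        return a
    if outranks(b, a):
        return b
    # No learned ordering: fall back to the person-first heuristic.
    return default if default in pair else None


learned = {
    frozenset({"automobile", "mountain"}): "automobile",  # like 801
    frozenset({"automobile", "person"}): "automobile",    # like 802
}
print(infer_focus(learned, ("person", "mountain")))
```

If the first entry instead recorded “mountain” as the focused category, the chain mountain > automobile > person would be found and the sketch would return “mountain”, matching the alternative case described above.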

The size information SIZE₁ in the information 801 can also be regarded as the size of the automobile on the image relative to the size of the mountain. Similarly, the size information SIZE₂ in the information 802 can be regarded as the size of the automobile on the image relative to the size of the person. On this view, the size of the person relative to the size of the mountain can be estimated from the ratio of SIZE₁ to SIZE₂. In doing so, the size information SIZE₃ of the feature information 803 is used as the reference value for the size of the automobile. That is, the size information SIZE₄ obtained according to “SIZE₄ = (SIZE₁ / SIZE₂) × SIZE₃” can be used as the size information of the feature information 804.
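As a worked illustration of the relation SIZE₄ = (SIZE₁ / SIZE₂) × SIZE₃, the following minimal sketch evaluates it for made-up values. The function name `estimate_size4`, the numeric values, and the guard against a non-positive SIZE₂ are assumptions; the units are arbitrary because the embodiment does not fix them here.

```python
def estimate_size4(size1, size2, size3):
    """Evaluate SIZE4 = (SIZE1 / SIZE2) * SIZE3.

    size1: automobile size learned with "automobile and mountain" (801)
    size2: automobile size learned with "automobile and person" (802)
    size3: automobile size learned for the automobile alone (803)
    """
    if size2 <= 0:
        # Assumed guard: the ratio is undefined for a non-positive size.
        raise ValueError("size2 must be positive")
    return (size1 / size2) * size3


# Hypothetical values in arbitrary units.
size4 = estimate_size4(200, 400, 300)
print(size4)
```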

From the position information of the information 801 and 802, the user's preference for the placement of the focused subject can be inferred. If the position information is the same in the information 801 and 802, it suffices to include that same position information in the feature information 804; if it differs, the position information with the larger learning count is adopted as the position information of the feature information 804. In this example, N₁ > N₂ is assumed. The position information BL₅ of the information 801 is therefore assigned to the position information of the feature information 804 (if instead N₁ < N₂, the position information BL₆ of the information 802 would be assigned).
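The position rule can be sketched in a few lines. The function name `infer_position` and the string block labels are assumptions for illustration. The case N₁ = N₂ with differing positions is not addressed in the text; this sketch arbitrarily keeps the first position in that case.

```python
def infer_position(pos_a, count_a, pos_b, count_b):
    """Choose the position information for the pseudo-generated entry.

    pos_a, count_a: position block and learning count of one entry (801)
    pos_b, count_b: position block and learning count of the other (802)
    """
    if pos_a == pos_b:
        return pos_a
    # Differing positions: adopt the one backed by more learning.
    # A tie keeps pos_a, which is an arbitrary choice of this sketch.
    return pos_a if count_a >= count_b else pos_b


# N1 = 5 and N2 = 3 are hypothetical learning counts with N1 > N2.
print(infer_position("BL5", 5, "BL6", 3))
```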

After the feature information 804 is generated, it can be regarded as comprehensive feature information and the operation can shift from the learning stage operation to the control stage operation. If an input image whose category combination is “person and mountain” is acquired as the evaluation image in the control stage operation (FIG. 10), the feature information 804 can be read from the learning memory 54 as the read feature information and the operation described in the first embodiment can be performed.

Here, the learning count N₄ of the pseudo-generated feature information 804 is set, for convenience, so as to satisfy 0 < N₄ < 1 (for example, N₄ = 0.5). With this setting, when a person, a mountain, and an automobile are all included in the subject at the same time in the control stage operation, the feature information 804 is not read as the read feature information by the method of the sixth embodiment (in this case, since N₁ > N₂ > N₄, the feature information 801 is read as the read feature information by the method of the sixth embodiment).
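The effect of giving the pseudo-generated entry a fractional learning count can be illustrated with the sketch below. It assumes, based only on the parenthetical above, that the method of the sixth embodiment reads the stored entry with the highest learning count among the entries whose category combination is contained in the detected categories; the details of that method are not restated here, so this is an assumption. The function name `select_feature_info`, the dictionary layout, and the counts 5 and 3 are likewise hypothetical.

```python
def select_feature_info(detected, memory):
    """Pick the entry to read for the detected set of categories.

    detected: set of categories found on the evaluation image.
    memory: list of dicts with keys "id", "categories", and "count"
    (hypothetical layout of the learning memory).
    """
    # Keep entries whose category combination is fully present.
    candidates = [e for e in memory if e["categories"] <= detected]
    # Assumed rule: the entry with the highest learning count wins.
    return max(candidates, key=lambda e: e["count"], default=None)


memory = [
    {"id": 801, "categories": {"automobile", "mountain"}, "count": 5},
    {"id": 802, "categories": {"automobile", "person"}, "count": 3},
    {"id": 804, "categories": {"person", "mountain"}, "count": 0.5},
]
chosen = select_feature_info({"person", "mountain", "automobile"}, memory)
print(chosen["id"])
```

Because any entry learned from at least one real capture has a count of 1 or more, a pseudo-generated entry with 0 < N₄ < 1 loses to every such entry under this assumed rule, yet is still selected when it is the only candidate.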

After the feature information 804 is generated, if a target input image whose category combination is “person and mountain” is actually captured, feature information based on that target input image is generated. At this point, the feature information 804 can be discarded and the feature information based on the target input image stored in the learning memory 54 as the feature information of the category combination “person and mountain”. In this case, the learning count N₄ for the category combination “person and mountain” is changed to 1. Alternatively, the feature information for the category combination “person and mountain” may be re-created from the feature information based on the target input image and the feature information 804 (in other words, the feature information 804 may be corrected using the feature information based on the target input image). In this case, N₄ = 1 or 1 < N₄ < 2 is set.

A method of pseudo-generating feature information has been described; generation condition information and sound control information (see FIGS. 18 and 25) can be pseudo-generated in the same way.

<< Modifications, etc. >>
The specific numerical values given in the above description are merely examples and can, of course, be changed to various other values.

The imaging apparatus 1 in FIG. 1 can be implemented in hardware or in a combination of hardware and software. When the imaging apparatus 1 is implemented using software, a block diagram of a part realized in software represents a functional block diagram of that part. A function realized using software may be written as a program, and the function may be realized by executing that program on a program execution device (for example, a computer).

DESCRIPTION OF SYMBOLS
1 Imaging apparatus
11 Imaging unit
13 Image signal processing unit
14 Microphone unit
15 Acoustic signal processing unit
30 Zoom lens
31 Focus lens
32 Aperture
33 Image sensor
51 Subject detection unit
52 Feature information generation unit
53 Memory control unit
54 Learning memory
55 Shooting control unit
56 Generation condition information generation unit
58 Scene determination unit

Claims

An imaging apparatus that has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject and that generates a target image from the output signal of the image sensor obtained when a predetermined operation is performed, wherein
by repeating the predetermined operation, a plurality of target images including first and second target images are generated, the second target image being generated after the first target image, and
the imaging apparatus comprises:
a subject detection unit that detects each subject present on an image based on the output signal of the image sensor by classifying it into one of a plurality of categories;
a memory unit that treats the combination of detection categories of the subject detection unit for a plurality of subjects on the first target image as a specific combination and stores learning information according to characteristics of the first target image or generation conditions of the first target image in association with the specific combination; and
a shooting control unit that uses, as an evaluation image, an image based on the output signal of the image sensor after generation of the first target image and before generation of the second target image and that, when the combination of detection categories of the subject detection unit for a plurality of subjects on the evaluation image matches the specific combination, generates the second target image using the learning information.

The imaging apparatus according to claim 1, wherein the learning information is information according to the characteristics of the first target image and includes focus information indicating which category of subject among the plurality of subjects on the first target image is in focus.

The imaging apparatus according to claim 1, further comprising an operation unit that receives designation of a generation condition of the first target image, wherein
the first target image is generated according to the generation condition of the first target image designated via the operation unit, and
the learning information is information according to the generation condition of the first target image.

The imaging apparatus according to any one of claims 1 to 3, wherein, when the combination of detection categories of the subject detection unit for the plurality of subjects on the evaluation image matches the specific combination,
focus control and zoom control for the second target image are performed, based on the learning information according to the characteristics of the first target image, so that the second target image has characteristics according to the characteristics of the first target image, or
the second target image is generated under a generation condition according to the generation condition of the first target image, based on the learning information according to the generation condition of the first target image.

An imaging apparatus that has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject and that generates a target image from the output signal of the image sensor obtained when a predetermined operation is performed, wherein
by repeating the predetermined operation, a plurality of target images including first and second target images are generated, the second target image being generated after the first target image, and
the imaging apparatus comprises:
a subject detection unit that detects a subject present on an image based on the output signal of the image sensor by classifying it into one of a plurality of categories;
a scene determination unit that determines the shooting scene of an image based on the output signal of the image sensor by selecting it from a plurality of registered scenes;
a memory unit that treats the combination of the detection category of the subject detection unit for the subject on the first target image and the scene determined by the scene determination unit for the first target image as a specific combination and stores learning information according to characteristics of the first target image or generation conditions of the first target image in association with the specific combination; and
a shooting control unit that uses, as an evaluation image, an image based on the output signal of the image sensor after generation of the first target image and before generation of the second target image and that, when the combination of the detection category of the subject detection unit for the subject on the evaluation image and the scene determined by the scene determination unit for the evaluation image matches the specific combination, generates the second target image using the learning information.

An imaging apparatus that has an image sensor that outputs a signal obtained by photoelectrically converting an optical image of a subject and a microphone unit made up of a plurality of microphones, and that, when a predetermined operation is performed, generates a target image from the output signal of the image sensor, generates a target acoustic signal based on the output acoustic signals of the plurality of microphones, and associates the target acoustic signal with the target image, wherein
by repeating the predetermined operation, a plurality of target images including first and second target images are generated, the second target image being generated after the first target image, and
the imaging apparatus comprises:
a subject detection unit that detects each subject present on an image based on the output signal of the image sensor by classifying it into one of a plurality of categories;
a memory unit that treats the combination of detection categories of the subject detection unit for a plurality of subjects on the first target image as a specific combination and stores learning information according to characteristics of the target acoustic signal associated with the first target image in association with the specific combination; and
a target acoustic signal generation unit that uses, as an evaluation image, either an image based on the output signal of the image sensor after generation of the first target image and before generation of the second target image, or the second target image, and that, when the combination of detection categories of the subject detection unit for a plurality of subjects on the evaluation image matches the specific combination, generates the target acoustic signal to be associated with the second target image using the learning information.