JP2022048077A - Image processing apparatus and control method for the same

Info

Publication number: JP2022048077A
Application number: JP2021096752A
Authority: JP
Inventors: 雅人青葉; Masahito Aoba; 広一竹内; Koichi Takeuchi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-09-14
Filing date: 2021-06-09
Publication date: 2022-03-25

Abstract

PROBLEM TO BE SOLVED: To enable selection of a focus target area on which AF can be suitably executed.

SOLUTION: An image processing apparatus is operable to determine a focus target area of an image capturing apparatus. The image processing apparatus includes: obtainment means configured to obtain a first area to be a focus target in a first image captured by the image capturing apparatus at a first point in time; detection means configured to detect a second area to be a focus target candidate from a second image captured by the image capturing apparatus at a second point in time succeeding the first point in time; and determination means configured to determine, on the basis of the first area and the second image, a focus target area in the second image from among one or more partial areas in the second area.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for selecting an image region.

Cameras provide an autofocus (AF) function that focuses automatically during shooting. The area to be focused on at the time of shooting (hereinafter, the focus target area) can be selected manually by the user, for example with a touch panel, or automatically based on detection results such as face detection or object detection. However the focus target area is selected, its position and shape in the image may change as the object in the area, or the camera itself, moves. By tracking or continuously detecting the selected focus target area, AF can be continued on the area the user desires.

Patent Document 1 discloses a method that detects a pupil region and uses it as the focus target area for AF. Because this method uses a focus target area whose distance from the camera is constant, it can focus accurately.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-121860

However, the method described in Patent Document 1 requires detection of a specific part such as a pupil. It therefore cannot be applied when that part cannot be observed, for example because it is occluded by another object. Moreover, with that method it is difficult to focus accurately when the depth (distance from the camera) varies widely within the focus target area. It is therefore difficult to apply when a part of some size, such as a torso or an arm, is to be the focus target area.

The present invention has been made in view of these problems, and its object is to provide a technique that enables selection of a focus target area on which AF can be suitably executed.

To solve the above problems, the image processing apparatus according to the present invention has the following configuration. That is, an image processing apparatus that determines a focus target area of an image capturing apparatus comprises: acquisition means for acquiring a first region to be a focus target in a first image captured by the image capturing apparatus at a first time point; detection means for detecting a second region, which is a focus target candidate, from a second image captured by the image capturing apparatus at a second time point following the first time point; and determination means for determining, based on the first region and the second image, the focus target area in the second image from among one or more partial regions of the second region.

According to the present invention, it is possible to provide a technique that enables selection of a focus target area on which AF can be suitably executed.

A diagram showing the positional relationship between a camera and a subject.
A diagram showing an example of the configuration of the AF system in the first embodiment.
A flowchart explaining the processing executed by the AF system in the first embodiment.
A diagram explaining the comparison (S108) between the reference region and each partial region.
A diagram showing an example of detection of the head and torso of a human body.
A diagram showing the hardware configuration of the image processing apparatus.
A flowchart explaining the processing executed by the AF system in the second embodiment.
A diagram explaining the comparison between the reference region and the head partial region.
A diagram explaining the comparison between the reference region and the torso partial region.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, identical or similar components are given the same reference numbers, and duplicate explanations are omitted.

(First Embodiment)
As a first embodiment of the image processing apparatus according to the present invention, an autofocus (AF) system including a photographing device and an area selection device is described below as an example. In the first embodiment in particular, the AF system detects a human body based on an image acquired from the photographing device, and extracts the focus target area to be focused on from within the detected human body region.

<Device configuration>
FIG. 1 is a diagram showing the positional relationship between a photographing device (camera) and a subject. FIG. 1(a) shows, by way of example, a subject G-1 that is a human body in a standing position, and FIG. 1(b) shows, by way of example, a subject G-2 that is a human body lying on its back. The camera is located toward the lower left of the figure, and the dotted lines indicate distance from the camera (depth).

As shown in FIG. 1, the distance from the camera to each part of the body changes with the posture of the subject. For example, when the subject is standing, as with subject G-1, the distance from the camera does not vary greatly from part to part. On the other hand, when the subject is lying substantially parallel to the line of sight of the camera, as with subject G-2, the depth varies greatly depending on the body part. When the range of depth within the subject is equal to or wider than the depth of field of the image capturing apparatus, it is generally impossible to bring the entire subject into focus. As a result, it may become impossible to continue executing AF suitably. The first embodiment therefore describes an example in which suitable AF can be continued even when the range of depth within the subject is equal to or wider than the depth of field of the image capturing apparatus.
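The condition described above can be written as a simple check. The following is a minimal sketch, assuming per-part depths and the depth of field are both given in metres; the function name and the sample values are hypothetical and do not come from the publication.

```python
def exceeds_depth_of_field(part_depths_m, depth_of_field_m):
    """Return True when the depth range within the subject is equal to or
    wider than the depth of field, i.e. the whole subject cannot be in focus."""
    depth_range = max(part_depths_m) - min(part_depths_m)
    return depth_range >= depth_of_field_m


# Hypothetical depths for a standing subject (like G-1) and a subject lying
# along the camera's line of sight (like G-2).
standing = [3.0, 3.1, 3.2]
lying = [2.0, 2.8, 3.6]
print(exceeds_depth_of_field(standing, 0.5))
print(exceeds_depth_of_field(lying, 0.5))
```

When the check is true, focusing on the whole detected body is not possible, which is the case the partial-region selection in this embodiment is meant to handle.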

<Device configuration>
FIG. 2 is a diagram showing an example of the configuration of the AF system in the first embodiment. As shown in FIG. 2, the AF system includes a photographing device 10 and an area selection device 20.

The photographing device 10 is a camera device that captures the surrounding scene as an image. The photographing device 10 includes an image acquisition unit 11 and a distance measuring unit 12. Examples of the photographing device 10 include a digital single-lens reflex camera, a smartphone, a wearable camera, a network camera, and a web camera. It is not limited to these examples, however; any device that can capture the surrounding scene as an image may be used.

The image acquisition unit 11 uses an image sensor or the like to capture the scene around the photographing device 10 as an image, and outputs the image to the area selection device 20. The image acquired by the image acquisition unit 11 may be RAW data before demosaicing, or an image in which every pixel has RGB values as a result of demosaicing or the like. It may also be a live-view image.

The distance measuring unit 12 has a ranging function that measures depth information, which is the distance between the photographing device 10 and the subject, and outputs the measured depth information to the area selection device 20. The depth information can be associated with each pixel or each region of the image acquired by the image acquisition unit 11. Depth information here means any information that correlates with distance in space. For example, it may be the spatial distance itself, or a defocus amount obtained from a phase difference sensor that detects the phase difference of incident light. It may also be the amount of change in image contrast when the focal plane of the lens is moved.

The area selection device 20 (image processing apparatus) detects a human body region based on the image and depth information input from the photographing device 10. It then selects, from within the detected region, the focus target area to be focused on. The area selection device 20 includes a detection unit 21, a partial region extraction unit 22, a reference region acquisition unit 23, a comparison unit 24, and a selection unit 25. Although FIG. 2 shows the area selection device 20 as separate from the photographing device 10, the two may be configured as a single device. When they are separate, they may be connected by a wired or wireless communication function. Each functional unit of the area selection device 20 can also be realized by a central processing unit (CPU) executing a software program.

FIG. 6 is a diagram showing the hardware configuration of the information processing device. The CPU 1001 uses the RAM 1003 as work memory to read and execute the OS and other programs stored in the ROM 1002 and the storage device 1004. It controls each component connected to the system bus 1009 and performs the calculations and logical determinations for various processes. The processing executed by the CPU 1001 includes the information processing of the embodiments. The storage device 1004 is a hard disk drive, an external storage device, or the like, and stores the programs and various data for the information processing of the embodiments. The input unit 1005 comprises an image capturing device such as a camera and input devices such as buttons, a keyboard, and a touch panel for entering user instructions. The storage device 1004 is connected to the system bus 1009 via an interface such as SATA, and the input unit 1005 via a serial bus such as USB; the details are omitted. The communication I/F 1006 communicates wirelessly with external devices. The display unit 1007 is a display.

The detection unit 21 detects a human body region in the image and outputs it to the partial region extraction unit 22. The human body region may correspond to the whole body or to a specific part such as the face or torso. The detection method is not limited to any particular method. For example, an object detection technique such as that described in Joseph Redmon, Ali Farhadi, "YOLOv3: An Incremental Improvement", arXiv e-prints (2018) can be adapted. Detection may also be based on the contour shape of the head, limbs, and so on, or on motion information extracted from time-series images. Alternatively, the region may be detected from heat source information based on far infrared rays or the like.

The human body region to be detected may correspond to a predetermined part or to a part selected by the user. For example, a function for setting a part category such as face or torso may be provided, and the detection process corresponding to the category set by the user may be performed. Alternatively, the corresponding detection process may be set automatically based on, for example, information about the region the user selected on the touch panel.

When the detection unit 21 detects a plurality of human body regions, it may select and output one or more corresponding human body regions based on the reference region described later. For example, the selection may be based on the distance from the reference region or on image similarity.

The partial region extraction unit 22 extracts one or more partial regions satisfying a predetermined condition based on the human body region or the depth information, and outputs them to the comparison unit 24. Partial regions are extracted based on distance in image space and similarity of depth information.

For example, when depth information is input for each pixel in the human body region, a set of pixels that are close to one another in image space and have similar depth information may be extracted as a partial region. A closed region enclosed by a partial region may be merged into that partial region or extracted as a separate partial region. Thresholds may also be applied to the distance between pixels and to the area of a partial region.
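One way to group pixels that are adjacent in image space and similar in depth is region growing over 4-connected neighbours. The sketch below is only an illustration under that assumption; the function name, the `max_depth_gap` and `min_area` parameters, and the use of 4-connectivity are hypothetical choices, not details from the publication.

```python
from collections import deque


def extract_partial_regions(depth, mask, max_depth_gap=0.1, min_area=1):
    """Group masked pixels into 4-connected regions of similar depth.

    depth: 2D list of floats; mask: 2D list of bools (True = inside the
    detected human body region). Two neighbouring pixels join the same region
    when their depths differ by at most max_depth_gap. Regions smaller than
    min_area pixels are dropped.
    """
    height, width = len(depth), len(depth[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]
                            and abs(depth[ny][nx] - depth[cy][cx]) <= max_depth_gap):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                regions.append(region)
    return regions
```

Because similarity is tested between neighbours, a region can span a gradual depth ramp; a stricter variant would compare each pixel with the seed pixel or the running mean instead.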

As another method, when the distance measuring unit 12 measures depth information for specific ranging areas, those ranging areas within the human body region whose depth information has a variance at or below a threshold may be extracted as partial regions.
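The variance test can be sketched as follows, assuming each ranging area is given as a list of depth samples keyed by a label. The labels, the sample values, and the use of population variance are assumptions made for illustration.

```python
from statistics import pvariance


def select_flat_ranging_areas(area_depths, max_variance):
    """Return the labels of ranging areas whose depth variance is at or
    below max_variance; these become the partial regions."""
    return [label for label, depths in area_depths.items()
            if pvariance(depths) <= max_variance]


# Hypothetical ranging areas inside a detected human body region.
areas = {
    "head": [2.00, 2.01, 1.99],   # nearly constant depth
    "torso": [2.0, 2.5, 3.0],     # depth changes along the body
}
print(select_flat_ranging_areas(areas, max_variance=0.01))
```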

The reference region acquisition unit 23 acquires a reference region, which is the specific part targeted by AF, and outputs it to the comparison unit 24. The method of acquiring the reference region is not limited to any particular method. For example, when the photographer taps the camera's live-view screen, the tapped area may be used as the reference region. Alternatively, a face detection method may be used, for example, and the face region that is closer to the center of the screen and appears larger may be used as the reference region. When a detection method is used, it may be the same as that of the detection unit 21 or a different one. Here, the reference region (first region) is assumed to be held as the focus target area in the first image captured at the first time point, which is the start of AF. At the start of shooting, a predetermined area of the initial image (for example, the center of the screen) is set in advance as the reference region.

The comparison unit 24 compares the partial regions input from the partial region extraction unit 22 with the reference region input from the reference region acquisition unit 23, and outputs the comparison result to the selection unit 25.

The elements that the comparison unit 24 compares (hereinafter, comparison elements) include depth information and may comprise several different kinds of information. For example, they may include the relative position in image space between the reference region and a partial region, or the pixel values of the image corresponding to each region. When the relative position in image space is used, its magnitude may be normalized based on the depth information. For example, the relative position value may be reduced for a region whose distance from the camera is small and increased for a region whose distance is large.
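The depth-based normalization of the relative position can be illustrated by scaling the pixel offset in proportion to depth, so that near regions yield smaller values and far regions larger ones. This is a sketch under a pinhole-camera-style assumption; the proportional scaling, the function name, and the `unit_depth` parameter are hypothetical.

```python
import math


def normalized_offset(ref_center, part_center, part_depth, unit_depth=1.0):
    """Pixel distance between two region centers, scaled by depth.

    The same pixel offset counts for less when the region is near the camera
    and for more when it is far, relative to unit_depth.
    """
    dx = part_center[0] - ref_center[0]
    dy = part_center[1] - ref_center[1]
    return math.hypot(dx, dy) * (part_depth / unit_depth)


# The same 5-pixel offset at two different depths.
print(normalized_offset((0, 0), (3, 4), part_depth=0.5))
print(normalized_offset((0, 0), (3, 4), part_depth=2.0))
```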

The comparison method of the comparison unit 24 is not limited to any particular method. For example, the comparison elements of each region may be averaged and the difference between the averages output as the comparison result. When the partial region and the reference region have the same shape, the difference in the comparison elements at each corresponding pixel may be output as the comparison result. Alternatively, the distributions of depth information within the regions may be compared and an index such as the Kullback-Leibler divergence output as the comparison result.
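A distribution comparison with the Kullback-Leibler divergence might look like the sketch below. The histogram range, the bin count, and the small `eps` added to every bin to avoid division by zero are assumptions chosen for illustration.

```python
import math


def depth_histogram(depths, lo, hi, bins, eps=1e-6):
    """Normalized histogram of depth samples over [lo, hi); eps keeps every
    bin strictly positive so the divergence below stays finite."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for d in depths:
        index = min(bins - 1, max(0, int((d - lo) / width)))
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


reference = depth_histogram([2.0, 2.1, 2.2], 0.0, 4.0, 8)
near_part = depth_histogram([2.0, 2.1, 2.2], 0.0, 4.0, 8)
far_part = depth_histogram([3.5, 3.6, 3.7], 0.0, 4.0, 8)
print(kl_divergence(reference, near_part), kl_divergence(reference, far_part))
```

A smaller divergence indicates that the partial region's depth distribution is closer to that of the reference region. Note that the divergence is asymmetric, so the argument order matters.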

Furthermore, the comparison unit 24 may estimate, from the change over time in the depth information of the object of interest, how much the depth information has changed since the reference region was acquired, and compare the estimated depth information with the depth information of each partial region. In this case, more stable AF may be achieved even for a subject whose distance from the camera changes dynamically over time.
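As an illustration of estimating the depth change, the sketch below extrapolates linearly from the two most recent depth measurements of the object of interest. The constant-velocity assumption and all names are hypothetical; the publication does not specify the estimation method.

```python
def predict_depth(history, t_now):
    """Linearly extrapolate depth to time t_now.

    history: list of (time, depth) samples in time order, at least two entries.
    Assumes the subject moves at constant speed along the depth axis.
    """
    (t0, d0), (t1, d1) = history[-2], history[-1]
    velocity = (d1 - d0) / (t1 - t0)
    return d1 + velocity * (t_now - t1)


def closest_to_prediction(history, t_now, partial_depths):
    """Return the label of the partial region whose depth is nearest to the
    extrapolated depth of the reference region."""
    expected = predict_depth(history, t_now)
    return min(partial_depths, key=lambda label: abs(partial_depths[label] - expected))


# A subject approaching the camera by 0.5 m per second.
history = [(0.0, 5.0), (1.0, 4.5)]
print(closest_to_prediction(history, 2.0, {"a": 4.6, "b": 4.1}))
```

Without the extrapolation, region "a" would be nearest to the last measured depth of 4.5; with it, region "b" is nearest to the expected depth.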

The selection unit 25 selects the focus target area for the current input image based on the comparison results between the reference region and each partial region input from the comparison unit 24, and outputs the selected area to the photographing device 10. One example of how the selection unit 25 selects the focus target area is to select the partial region that the comparison result shows to be similar to the reference region (that is, whose depth differs relatively little from that of the reference region). With this method, AF can be continued on a region similar to the reference region.

Instead of using the comparison result alone, a priority for selection as the focus target area may be evaluated for each partial region, so that regions with higher priority are more likely to be selected. For example, when there are several partial regions whose depth differs from that of the reference region by less than a predetermined value, a larger area may be given a higher priority. As another example, a region closer to the center of the image may be given a higher priority. Furthermore, the selection unit 25 may determine the focus target area in the second image based on a plurality of criteria. For example, the plurality of criteria include the similarity between the first region and the second region, and respective criteria for the area of the one or more partial regions.

Furthermore, the selection criteria for the focus target area may be changed according to the time elapsed since the reference region was acquired. For example, when the elapsed time is short, priority may be given to the partial region that the comparison result shows to be most similar to the reference region; as the elapsed time grows, more weight may be given to priority criteria other than the comparison result, such as the area of the partial region. When the comparison result includes several kinds of elements, the similarity of each element may be weighted differently in the selection.
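The time-dependent weighting can be sketched as a score that blends depth similarity and area, with the similarity weight decaying as time passes. The exponential decay, the `time_constant_s` parameter, and the particular similarity formula are hypothetical choices for illustration only.

```python
import math


def selection_score(depth_diff, area, elapsed_s, max_area, time_constant_s=2.0):
    """Blend depth similarity and relative area into one score.

    Shortly after the reference region is acquired, similarity dominates;
    as elapsed_s grows, the area criterion takes over.
    """
    similarity_weight = math.exp(-elapsed_s / time_constant_s)
    similarity = 1.0 / (1.0 + depth_diff)   # 1.0 when depths match exactly
    relative_area = area / max_area
    return similarity_weight * similarity + (1.0 - similarity_weight) * relative_area


# Region "a" is similar in depth but small; region "b" is dissimilar but large.
candidates = {"a": (0.1, 100), "b": (1.0, 400)}
for elapsed in (0.0, 10.0):
    best = max(candidates,
               key=lambda k: selection_score(*candidates[k], elapsed, max_area=400))
    print(elapsed, best)
```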

<Operation of the device>
FIG. 3 is a flowchart explaining the processing executed by the AF system in the first embodiment. S101 to S111 each represent a specific process and are in principle executed in order. However, the AF system need not perform every process described in this flowchart, and the order of execution may change. Several processes may also be executed in parallel.

In step S101, the image acquisition unit 11 acquires the image (first image) at the AF start time (time t-1). For example, it acquires an RGB live-view image. In step S102, the distance measuring unit 12 measures the depth information at the AF start time. For example, when the distance measuring unit 12 includes a phase difference sensor, it measures the defocus amount. The measured depth information is subsequently acquired by the area selection device 20 (depth information acquisition).

In step S103, the reference region acquisition unit 23 acquires the reference region (first region) at the AF start time (first time point). That is, it acquires the area that was used as the focus target area at the AF start time (reference acquisition). For example, it acquires the area the user selected on the touch panel, or an automatically detected face or human body region. When the reference region is acquired through a detection process, the detection unit 21 or the like may be used. When there are several candidates for the reference region, the selection unit 25 or the like may be used to select one. When shooting continuously, the previous focus area may be acquired.

In step S104, the image acquisition unit 11 acquires the image (second image) at the time the focus target area is selected (time t, second time point). This time follows the AF start time. The first and second time points need not be consecutive; for example, the focus position may be changed at fixed time intervals. In step S105, the distance measuring unit 12 measures the depth information at the time the focus target area is selected. The measured depth information is subsequently acquired by the area selection device 20 (depth information acquisition).

In step S106, the detection unit 21 detects, from the image acquired in S104, a human body region (second region) that is a focus target candidate. For example, it detects a predetermined object to be focused on (a region such as a human face or whole body, an animal such as a dog or cat, a car, or a building). A region whose image features are similar to those of the reference region may be detected as the second region. For example, the second region may be detected using deep learning or semantic segmentation. A region whose depth is within a predetermined range may also be detected as the second region, based on the depth information corresponding to the image acquired in S104. For example, when there is only one subject, the foreground region (that is, a region showing similar depths) may be detected. In this case, S107 may be skipped. Then, in step S107, the partial region extraction unit 22 extracts one or more partial regions satisfying a predetermined condition from the human body region detected in S106. For example, partial regions are extracted based on the depth information; specifically, a partial region whose depth, as indicated by the depth information, falls within the range of the depth of field is extracted.

The processing of S107 is described with reference to FIG. 1. In S107, the criterion for extracting partial regions is changed based on the depth of field. For example, when subject G-1 (an upright human body) is detected, the distance from the camera within the human body region is nearly constant (for example, the calculated depth difference is 50 cm or less). In general, therefore, the entire human body region can be extracted as a single partial region. However, when the depth of field of the photographing device is shallow (for example, a few centimeters), partial regions such as the eyes and nose or the limbs, each consisting of points at similar distances from the photographing device, may be extracted separately.

One way to extract partial regions at similar distances from the camera is to use a clustering method such as K-means. Specifically, clusters of pixels with similar depth information are formed, and each cluster is extracted as a partial region. The distance between pixels in the image may be taken into account or ignored.
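A minimal K-means over depth values alone (ignoring pixel positions, which the text allows) could look like the following. The evenly spaced initial centers, the fixed iteration count, and the function name are assumptions made so the sketch stays deterministic and dependency-free.

```python
def kmeans_depth(depths, k, iterations=20):
    """Cluster scalar depth values into k groups with plain K-means.

    Returns (labels, centers). Initial centers are spread evenly between the
    minimum and maximum depth, which keeps the result deterministic.
    """
    lo, hi = min(depths), max(depths)
    if k == 1:
        centers = [sum(depths) / len(depths)]
    else:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(depths)
    for _ in range(iterations):
        # Assignment step: each depth goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(d - centers[c])) for d in depths]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [d for d, label in zip(depths, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


labels, centers = kmeans_depth([1.0, 1.1, 0.9, 3.0, 3.1, 2.9], k=2)
print(labels, centers)
```

Pixels sharing a label would then form one partial region. To take image-space distance into account, the pixel coordinates could be appended to each sample as extra dimensions.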

In step S108, the comparison unit 24 compares the reference region acquired in S103 with each partial region extracted in S107. For example, it compares the distributions of depth information in the reference region and in each partial region, or the image features of the reference region with those of each extracted partial region.

FIG. 4 is a diagram explaining the comparison between the reference region and each partial region in S108. FIG. 4(a) shows region G-4, the reference region acquired from the first image. FIG. 4(b) shows regions G-5a and G-5b, the partial regions extracted from the second image. In S108, for example, the mean depth may be computed from the depth information of the reference region and of each partial region, and the absolute value of the difference between the mean depth of the reference region and that of the partial region may be output as the comparison result. In the example of FIG. 4, region G-5a is closer to region G-4 than region G-5b is, so the comparison result reports a smaller depth difference for G-5a. When image features are used for the comparison, the similarity between the image features of the reference region and those of the extracted partial region is compared with a preset threshold. If the similarity is at or above the threshold, the regions are similar and are likely to be the same part. If it is below the threshold, the regions are not similar and are likely to be different parts.
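The mean-depth comparison described for S108 can be sketched directly. The depth samples below are invented for illustration; only the region labels G-5a and G-5b follow FIG. 4.

```python
from statistics import mean


def compare_mean_depth(reference_depths, partial_regions):
    """Return, for each partial region, the absolute difference between its
    mean depth and the mean depth of the reference region."""
    reference_mean = mean(reference_depths)
    return {label: abs(mean(depths) - reference_mean)
            for label, depths in partial_regions.items()}


# Hypothetical depth samples: the reference region G-4 and two partial regions.
g4 = [2.0, 2.2]
parts = {"G-5a": [2.0, 2.4], "G-5b": [3.9, 4.1]}
print(compare_mean_depth(g4, parts))
```

A smaller value means the partial region lies at a depth closer to that of the reference region.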

In step S109, the selection unit 25 selects a partial region based on the comparison result and determines it as the in-focus target region. Here, the partial region whose average depth is closest to the average depth of the reference region is determined as the in-focus target region. Instead of the average depth of a region, the depth at a representative position may be used. Alternatively, the partial region with the smallest depth difference (smaller than a predetermined value) may be determined as the in-focus target region. Choosing a region at a similar depth in this way also shortens the time needed to focus. As other examples of selection methods, the partial region most similar to the reference region may be selected, or the partial region with the largest area may be selected from among the partial regions whose difference from the reference region in the comparison result is at or below a threshold. A plurality of selection criteria may also be combined.
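Two of the selection rules in S109 can be sketched as follows. The `PartialRegion` class, its fields, and both function names are hypothetical and introduced only for illustration. The first function picks the partial region closest in average depth to the reference region. The second picks the largest-area region among those whose depth difference is at or below a threshold.

```python
from dataclasses import dataclass


@dataclass
class PartialRegion:
    name: str
    mean_depth: float  # hypothetical unit: meters
    area: int          # hypothetical unit: pixels


def select_closest_depth(reference_depth, regions):
    """Return the region whose average depth is closest to the reference."""
    return min(regions, key=lambda r: abs(r.mean_depth - reference_depth))


def select_largest_within(reference_depth, regions, max_diff):
    """Return the largest-area region whose depth difference is <= max_diff.

    Returns None when no region satisfies the threshold.
    """
    candidates = [
        r for r in regions if abs(r.mean_depth - reference_depth) <= max_diff
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.area)


# Hypothetical candidates; "G-5c" does not appear in the publication.
regions = [
    PartialRegion("G-5a", 2.1, 400),
    PartialRegion("G-5b", 3.5, 900),
    PartialRegion("G-5c", 2.3, 700),
]
```

The two rules can disagree: with a reference depth of 2.0, the first returns G-5a, while the second with a threshold of 0.5 returns the larger G-5c. That is one reason the text allows several criteria to be combined.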

In step S110, the area selection device 20 determines whether to continue the AF process. If the AF process is to continue, the process proceeds to S111; otherwise, the process ends. In step S111, the area selection device 20 sets the second image as the new first image. The process then returns to S103 and repeats.

Through the above processing, the region in the current image (time t) that corresponds to the reference region selected as the AF target immediately before (time t-1) can be suitably selected.

As described above, according to the first embodiment, information on the reference region is used to select, within the detected human body region, a partial region suitable for AF as the in-focus target region. In particular, a partial region with a smaller difference from the reference region (that is, a more similar one) is selected. This allows the AF system to suitably continue AF even when the depth varies widely within the detected region.

Although the above description covers the case of detecting a human body region, the technique may also be applied to other detection targets. For example, animals other than humans may be detected, or specific objects such as vehicles may be detected. Besides shooting with a digital camera, the technique can also be used in, for example, a system that changes the focus position after shooting through post-processing.

(Second Embodiment)
The second embodiment describes a mode that handles the case where the reference region and the in-focus target region belong to different part categories. In the following, an example is described in which the head and the torso are detected based on an image acquired from a camera or the like and, when the head is not detected, the in-focus target region is selected from the partial regions of the torso.

Here, the term "torso" refers to the trunk of a person, excluding the head. However, the torso may include the neck, limbs, and the like in addition to the trunk. It may also be a part of the trunk, such as the chest or abdomen, rather than the whole trunk.

The configuration of the AF system in the second embodiment is substantially the same as in the first embodiment (FIG. 2). However, because the operation of each functional unit differs from the first embodiment, the following description covers only the differences.

FIG. 7 is a flowchart illustrating the processing executed by the AF system in the second embodiment. S201 to S215 each denote a specific processing step.

In steps S201 and S202, the first image and its depth information are acquired. The processing in S201 and S202 is the same as in S101 and S102 of the first embodiment, so its description is omitted.

In step S203, the detection unit 21 detects the head region and the torso region of a human body from the first image and outputs them to the partial region extraction unit 22 and the reference region acquisition unit 23. The method of detecting the torso region is not limited to any particular method. For example, the torso may be detected directly using a semantic segmentation method, or it may be detected by using an object detection method to detect parts included in the torso, such as the shoulders and waist.

FIG. 5 is a diagram showing examples of human body detection. FIG. 5(a) shows an example in which the detection unit 21 has detected both the head and the torso. Regions G-8 and G-6a are the detected head and torso, respectively. FIG. 5(b), in contrast, shows an example in which the head is occluded by a structure (region G-9) and the detection unit 21 has detected only the torso G-6b.

As with regions G-6a and G-6b, the size and position of the portion detected as the torso need not correspond to the entire trunk. The region also need not be representable as a rectangle; it may have any shape, such as an ellipse or a polygon, or it may be expressed as a distribution.

In step S204, the reference region acquisition unit 23 acquires, from the first image, the reference region to be the AF target and outputs it to the comparison unit 24. As in S103 of the first embodiment, the position of the reference region is designated by the user or by using the detection result from the detection unit 21. The following description assumes that the head is preferentially selected as the reference region, but the present invention is not limited to head priority. For torso priority, read the following description with the head and the torso interchanged.

If the designated position is a head region, such as region G-8 in FIG. 5(a), that head region is used as the reference region. The head region acquired by the reference region acquisition unit 23 need not be the entire head as in region G-8; it may be a region of a part included in the head, such as the face. If the designated position is a torso region, such as region G-6a in FIG. 5(a), region G-8, the head region closest to the designated position, is selected as the reference region. If the designated position is a torso region, such as region G-6b in FIG. 5(b), and no corresponding head region exists, region G-6b at the designated position is selected as the reference region.
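The head-priority rule for choosing the reference region can be sketched as follows. The `Region` class, the use of region centers and Euclidean distance to decide which head is "closest", and the assumption that the caller passes only head regions that correspond to the designated torso are all illustrative choices, not details given in the publication.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Region:
    kind: str   # "head" or "torso"
    cx: float   # region center, x (hypothetical unit: pixels)
    cy: float   # region center, y


def choose_reference(designated, heads):
    """Pick the reference region with head priority.

    designated: the region at the designated position.
    heads: detected head regions assumed to correspond to that position.
    """
    if designated.kind == "head":
        return designated
    if not heads:
        # No corresponding head (for example, it is occluded): keep the torso.
        return designated
    # Torso designated: switch to the nearest detected head.
    return min(
        heads,
        key=lambda h: hypot(h.cx - designated.cx, h.cy - designated.cy),
    )


# Hypothetical layout loosely following FIG. 5(a).
head = Region("head", 100.0, 50.0)
torso = Region("torso", 100.0, 150.0)
other_head = Region("head", 400.0, 60.0)
```

For torso priority, the same function applies with the roles of the two kinds swapped.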

In the following, the case where a torso region is designated as the reference region is not illustrated, but the processing is the same as when a head region is designated. During continuous shooting, the in-focus target region selected at the previous time is used as the reference region.

In steps S205 and S206, the second image and its depth information are acquired. The processing in S205 and S206 is the same as in S104 and S105 of the first embodiment, so its description is omitted.

In step S207, the detection unit 21 detects the head region and the torso region from the second image. In step S208, the partial region extraction unit 22 extracts partial regions, based on the depth information, from each of the head region and the torso region detected in the second image. The specific extraction method has already been described in detail for S107 of the first embodiment and is omitted here.

In step S209, the comparison unit 24 compares the reference region acquired from the first image with the partial regions belonging to the head region detected in the second image, using the depth information of each image. The specific method of comparing the reference region and a partial region is the same as that described for S108 of the first embodiment.

In step S210, the selection unit 25 verifies the comparison result obtained in S209. Specifically, it determines whether a partial region of the head exists within a predetermined range of the second image. Here, the predetermined range is the tracking range in continuous shooting, and its size is not limited to any particular value. For example, since the tracking target is a human body, the size of the predetermined range is set from a plausible movement speed of a human body and the frame rate of continuous shooting. If a partial region of the head exists within the predetermined range, the process proceeds to S211; if not, the process proceeds to S212. A partial region of the head may be absent from the predetermined range when, for example, the head region is occluded as in FIG. 9(b) or head detection has failed.
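One way to derive the predetermined range from movement speed and frame rate is sketched below. The text does not give a formula, so the circular range, the `pixels_per_meter` scale factor, and both function names are assumptions made for illustration: the radius is taken as the distance a subject can move during one frame interval, converted to pixels.

```python
from math import hypot


def tracking_radius_px(max_speed_m_per_s, frame_rate_hz, pixels_per_meter):
    """Distance (in pixels) a subject can move between consecutive frames."""
    return max_speed_m_per_s / frame_rate_hz * pixels_per_meter


def head_in_range(reference_center, head_centers, radius_px):
    """True if any head partial region center lies within the tracking range."""
    rx, ry = reference_center
    return any(hypot(x - rx, y - ry) <= radius_px for x, y in head_centers)


# Hypothetical values: 3 m/s subject, 30 frames per second, 200 px per meter.
radius = tracking_radius_px(3.0, 30.0, 200.0)
```

If `head_in_range` returns True, the flow corresponds to proceeding to S211; otherwise it corresponds to proceeding to S212.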

In step S211, the selection unit 25 selects the in-focus target region. For example, by the same procedure as in S109 of the first embodiment, the in-focus target region is selected from the head partial regions G-11a and G-11b in the second image shown in FIG. 8(b), with respect to the reference region G-10 in the first image shown in FIG. 8(a). The process then proceeds to S214.

In step S212, the comparison unit 24 compares the reference region acquired from the first image with the partial regions belonging to the torso region detected in the second image, using the depth information of each image. The specific method of comparing the reference region and a partial region is the same as that described for S108 of the first embodiment. When the comparison is complete, the process proceeds to S213.

In step S213, the selection unit 25 determines the in-focus target region from the comparison result obtained in S212. For example, by the same procedure as in S109 of the first embodiment, the in-focus target region is selected from the torso partial regions G-7a and G-7b in the second image shown in FIG. 9(b), with respect to the reference region G-10 in the first image shown in FIG. 9(a). The process then proceeds to S214.

In step S214, the area selection device 20 determines whether to continue the AF process. If the AF process is to continue, the process proceeds to S215; otherwise, the process ends. In step S215, the area selection device 20 sets the second image as the new first image. The process then returns to S204 and repeats.
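The branch formed by steps S209 to S213 can be sketched as a head-first selection with a torso fallback. The function name, the representation of each partial region as a `(name, mean_depth)` tuple, the depth values, and the use of the closest-average-depth rule are assumptions made for the example. The head list is assumed to contain only partial regions that already passed the range check of S210.

```python
def select_focus_target(reference_depth, head_parts, torso_parts):
    """Select the in-focus target, preferring head partial regions.

    head_parts / torso_parts: lists of (name, mean_depth) tuples.
    Falls back to torso partial regions when no head partial region is
    available, and returns None when both lists are empty.
    """
    candidates = head_parts if head_parts else torso_parts
    if not candidates:
        return None
    # Same rule as in S109: closest average depth to the reference region.
    return min(candidates, key=lambda part: abs(part[1] - reference_depth))


# Hypothetical depths (meters) for the regions named in FIG. 8 and FIG. 9.
heads = [("G-11a", 2.1), ("G-11b", 3.0)]
torsos = [("G-7a", 2.2), ("G-7b", 3.1)]

with_head = select_focus_target(2.0, heads, torsos)   # head is available
occluded = select_focus_target(2.0, [], torsos)       # head is occluded
```

In a continuous-shooting loop, the selected region would then serve as the reference region for the next frame, as described for S215 and S204.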

As described above, according to the second embodiment, when the part that was the reference region in the preceding image cannot be detected in the image being processed because of occlusion or the like, a partial region suitable for AF within the detected human body region is selected as the in-focus target region. This allows the AF system to suitably continue AF even when occlusion or the like is present.

Although this embodiment describes the case where the detection unit 21 detects the torso region and the reference region acquisition unit 23 acquires a reference region based on the head region, the technique is also applicable to combinations other than the head and the torso. For example, the combination may be a face and a whole body, or a single person and a dense group of people. It may also be a vehicle license plate and the whole vehicle body.

(Other Examples)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Therefore, the claims are appended to make the scope of the invention public.

10 Imaging device; 11 Image acquisition unit; 12 Distance measurement unit; 20 Area selection device; 21 Detection unit; 22 Partial region extraction unit; 23 Reference region acquisition unit; 24 Comparison unit; 25 Selection unit

Claims

An image processing apparatus that determines an in-focus target region of an image pickup device, the image processing apparatus comprising:
an acquisition means for acquiring a first region to be focused on in a first image captured by the image pickup device at a first time point;
a detection means for detecting a second region as a candidate for focusing from a second image captured by the image pickup device at a second time point following the first time point; and
a determination means for determining an in-focus target region in the second image from among one or more partial regions of the second region, based on the first region and the second image.

The image processing apparatus according to claim 1, further comprising a depth information acquisition means for acquiring depth information for each region of an image captured by the image pickup device,
wherein the determination means determines the in-focus target region in the second image based on the depth information corresponding to the first region and the depth information corresponding to the second region.

The image processing apparatus according to claim 1 or 2, wherein the determination means determines the in-focus target region based on a difference in depth calculated from the depth information corresponding to the first region and the depth information corresponding to the second image.

The image processing apparatus according to claim 3, wherein the determination means determines, as the in-focus target region, a partial region among the one or more partial regions for which the difference between the depth of the first region and the depth of that partial region is smaller than a predetermined threshold.

The image processing apparatus according to claim 4, wherein, when there are a plurality of partial regions whose difference in depth from the first region is less than a predetermined value, the determination means determines, as the in-focus target region, a partial region having a relatively large area or a relatively small depth among the plurality of partial regions.

The image processing apparatus according to any one of claims 2 to 5, wherein the depth information is based on the distance between the image pickup device and the subject, or on the phase difference of incident light at the image sensor of the image pickup device.

The image processing apparatus according to any one of claims 1 to 6, wherein the detection means detects, as the second region, a region in the second image having image features similar to the image features extracted from the first region.

The image processing apparatus according to claim 7, wherein the acquisition means acquires a specific part of a person or an object as the first region, and
the detection means detects the same part as the specific part as the second region.

The image processing apparatus according to claim 8, wherein the acquisition means acquires a specific part of a person or an object as the first region, and
the detection means detects a part different from the specific part as the second region when the same part as the specific part cannot be detected.

The image processing apparatus according to any one of claims 1 to 9, further comprising an extraction means for extracting, from the second region, one or more partial regions satisfying a predetermined condition,
wherein the determination means determines the in-focus target region in the second image from among the one or more partial regions extracted by the extraction means.

The image processing apparatus according to claim 10, wherein the extraction means extracts, as the one or more partial regions, regions of the second region whose depth, as indicated by the depth information corresponding to the second image, falls within a predetermined range.

The image processing apparatus according to claim 10 or 11, wherein the extraction means changes the predetermined condition for the one or more partial regions based on the depth of field at the time the image pickup device captures the second image.

The image processing apparatus according to any one of claims 1 to 12, wherein the determination means determines the in-focus target region in the second image based on a plurality of criteria, and
the plurality of criteria include a criterion on the similarity between the first region and the second region and a criterion on the area of the one or more partial regions.

The image processing apparatus according to claim 13, wherein the determination means changes the plurality of criteria for the in-focus target region based on the elapsed time from the first time point.

The image processing apparatus according to any one of claims 1 to 14, wherein the second region is a region showing the head and/or the torso of a person.

The image processing apparatus according to any one of claims 1 to 15, wherein the determination means preferentially determines, as the in-focus target region in the second image, a partial region having an area larger than a predetermined value among the one or more partial regions.

A control method of an image processing apparatus that determines an in-focus target region of an image pickup device, the control method comprising:
an acquisition step of acquiring a first region to be focused on in a first image captured by the image pickup device at a first time point;
a detection step of detecting a second region as a candidate for focusing from a second image captured by the image pickup device at a second time point following the first time point; and
a determination step of determining an in-focus target region in the second image from among one or more partial regions of the second region, based on the first region and the second image.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 16.