WO2013005477A1 - Imaging device, three-dimensional image capturing method and program - Google Patents

Imaging device, three-dimensional image capturing method and program

Info

Publication number
WO2013005477A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
template image
subject
template
search
Prior art date
Application number
PCT/JP2012/062369
Other languages
French (fr)
Japanese (ja)
Inventor
矢作 宏一 (Yahagi Kōichi)
Original Assignee
FUJIFILM Corporation (富士フイルム株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIFILM Corporation
Publication of WO2013005477A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/133 Equalising the characteristics of different image components, e.g. their average brightness or colour balance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body

Definitions

  • the present invention relates to an imaging apparatus, a stereoscopic image imaging method, and a program, and more particularly to an imaging apparatus, a stereoscopic image imaging method, and a program capable of acquiring a plurality of viewpoint images obtained by capturing the same subject from a plurality of viewpoints.
  • Patent Document 1 describes that two images taken by a stereo camera are acquired and a head pose is detected from the images using a database created in advance.
  • In Patent Document 2, a plurality of images are acquired, an object is detected using distance information, and information other than the distance information, such as a grayscale image or a ternary edge image, is extracted as a template from the position of the object detected with the distance information; the object is then tracked using this template.
  • Patent Document 3 describes an invention for tracking a stationary subject through a time-series group of images. Specifically, a plurality of images are acquired, an initial tracking area (corresponding to a template) represented by boundary elements and image features is set in an image, a tracking area within the next frame is detected based on the initial tracking area, and a tracking area is further detected in the frame after that based on the detected tracking area.
  • In Patent Document 3, the tracking area is changed sequentially while the subject is tracked. The tracking area is changed by estimating the shape of the tracking area of the current frame based on the boundary elements, creating a plurality of candidates by deforming the tracking area of the previous frame based on the estimation result, and comparing and collating the plurality of candidates with the frame to be examined to determine the tracking area of the current frame.
  • JP 2009-169958 A; JP 11-252587 A; JP 2001-101419 A
  • In the invention described in Patent Document 1, a database used for detecting the head pose must be created in advance, so creating the database takes time. Further, the invention described in Patent Document 1 cannot detect a head that is not registered in the database; that is, when a photographer is photographing an arbitrary subject, the desire to track an object included in the image cannot be satisfied.
  • The invention described in Patent Document 2 has no problem with creating a database, but tracking is highly likely to fail when the orientation of the object to be detected changes.
  • The invention described in Patent Document 3 can solve the problem of losing sight of the tracking target when the orientation of the object to be detected changes. However, it assumes a stationary subject and cannot be applied when the subject moves, because in the case of a moving subject the shape of the tracking area of the current frame cannot be estimated based on the boundary elements.
  • The present invention has been made in view of such circumstances, and an object thereof is to provide an imaging apparatus, a stereoscopic image capturing method, and a program that can reduce the possibility of losing sight of a subject during tracking even when the subject moves or changes orientation.
  • In order to achieve the above object, an imaging apparatus according to a first aspect of the present invention includes: a first imaging unit and a second imaging unit that acquire two viewpoint images obtained by capturing the same subject from two viewpoints; a first template image generation unit that generates a first template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging unit, and generates a second template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging unit; and a search unit that searches for the subject to be tracked, using the first template image and the second template image, from a viewpoint image captured by the first imaging unit at a time different from the viewpoint image on which the first template image is based.
  • According to this aspect, the first template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging unit, and the second template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging unit. The subject to be tracked is then searched for, using the first template image and the second template image, from a viewpoint image captured by the first imaging unit at a time different from the viewpoint image on which the first template image is based. Since template images taken from two different viewpoints are used as search keys, the possibility of losing sight of the subject during tracking can be reduced even when the subject moves or changes orientation.
  • The imaging apparatus according to a second aspect further includes a second template image generation unit that generates a combined template image from the first template image and the second template image by image combining processing, and the search unit searches for the subject to be tracked using the combined template image generated by the second template image generation unit.
  • According to this aspect, a combined template image is generated from the first template image and the second template image by image combining processing, and the subject to be tracked is searched for using this combined template image. This can further reduce the possibility of losing sight of the subject during tracking.
  • The combined template image is an image in an intermediate state between the first template image and the second template image.
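  • As a non-limiting illustration, the following Python sketch shows one way such an intermediate (combined) template could be produced from the first and second template images. The patent does not specify the combining algorithm; simple alpha blending, the function name, and the weights are all assumptions.

```python
import cv2


def make_combined_templates(template_a, template_b, weights=(0.25, 0.5, 0.75)):
    """Blend two template images into intermediate (combined) templates.

    The text only states that the combined template is 'an image in an
    intermediate state' between the two; weighted averaging is one
    plausible realization. Several weights yield the plurality of
    combined templates used in the fifth aspect.
    """
    h, w = template_a.shape[:2]
    # Resize template B to template A's size so the two can be blended.
    template_b = cv2.resize(template_b, (w, h))
    return [cv2.addWeighted(template_a, 1.0 - alpha, template_b, alpha, 0.0)
            for alpha in weights]
```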
  • the search means when a plurality of search results are obtained, obtains a template image from which the search results are obtained and a result obtained by searching using the template images. Is calculated for each of the plurality of search results, and the result searched using the template image with the highest similarity is calculated as the subject to be tracked.
  • the similarity between the template image obtained from the search result and the result searched using the template image is calculated. Then, the search result using the template image having the highest calculated similarity is set as the subject to be tracked. Thereby, the accuracy of subject tracking can be increased.
  • In the imaging apparatus according to a fourth aspect, when a plurality of search results are obtained and a search result has been obtained using the first template image, the search unit sets the search result obtained using the first template image as the subject to be tracked.
  • According to this aspect, the result searched using the first template image, that is, a subject similar to the subject extracted from the same viewpoint image, is set as the subject to be tracked. Thereby, the accuracy of subject tracking can be increased.
  • In the imaging apparatus according to a fifth aspect, the second template image generation unit generates a plurality of types of combined template images, and when a plurality of search results are obtained but no search result is obtained using the first template image, the search unit sets the result searched using the combined template image closest to the first template image among the plurality of types of combined template images as the subject to be tracked.
  • According to this aspect, when no result is obtained with the first template image, the result searched using the combined template image closest to the first template image among the plurality of types of combined template images is set as the subject to be tracked. This can reduce the possibility of losing sight of the subject during tracking while maintaining accuracy.
  • The imaging apparatus according to a sixth aspect further includes a third template image generation unit that, when the search unit cannot find the subject to be tracked, generates a third template image by extracting, from the viewpoint image captured by the second imaging unit, a region moved in the left-right direction by an arbitrary amount from the region extracted in generating the second template image; the search unit searches for the subject to be tracked using the generated third template image.
  • According to this aspect, when the subject to be tracked cannot be found, a region moved in the left-right direction by an arbitrary amount from the region extracted in generating the second template image is extracted from the viewpoint image captured by the second imaging unit to generate a third template image, and the subject to be tracked is searched for using the generated third template image.
  • In the imaging apparatus according to a seventh aspect, when the search unit cannot obtain a search result using the third template image, the third template image generation unit generates a fourth template image by extracting, from the viewpoint image captured by the second imaging unit, a region moved in the left-right direction by a predetermined amount from the region extracted in generating the third template image, and the search unit searches for the subject to be tracked using the generated fourth template image.
  • According to this aspect, a region moved in the left-right direction by a predetermined amount from the region extracted in generating the third template image is extracted from the viewpoint image captured by the second imaging unit to generate a fourth template image, and the subject to be tracked is searched for using the generated fourth template image.
  • In the imaging apparatus according to an eighth aspect, the third template image generation unit estimates the position of the subject to be tracked based on the two viewpoint images and determines the arbitrary amount from the estimated position. As a result, the template image needs to be created only once, and the time required for tracking the subject can be shortened. A sketch of this shifted-template idea follows.
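  • The shifted-template generation of the sixth to eighth aspects can be illustrated with the Python sketch below: the crop window used for the second template image is moved horizontally and re-extracted from the second viewpoint image, with the shift amount derived from the estimated subject position (disparity). All names and the clamping behavior are illustrative assumptions, not the patent's implementation.

```python
import numpy as np


def shifted_template(viewpoint_image, rect, shift_px):
    """Re-extract a template after moving the crop window horizontally.

    rect = (x, y, w, h) is the region used when the second template image
    was generated; shift_px is the horizontal offset (the 'arbitrary
    amount' of the sixth aspect, derived from disparity in the eighth).
    """
    x, y, w, h = rect
    img_h, img_w = viewpoint_image.shape[:2]
    x = int(np.clip(x + shift_px, 0, img_w - w))  # keep the crop inside the frame
    return viewpoint_image[y:y + h, x:x + w]


# If the search with the third template (shifted by +d) fails, a fourth
# template shifted by a further predetermined amount can be tried, e.g.:
#   t3 = shifted_template(right_image, rect, +d)
#   t4 = shifted_template(right_image, rect, +d + step)  # or -d, per design
```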
  • In the imaging apparatus according to a ninth aspect, the first imaging unit and the second imaging unit continuously acquire the two viewpoint images, and the search unit searches for the subject to be tracked, using the first template image and the second template image, from the viewpoint images subsequently captured by the first imaging unit.
  • The imaging apparatus according to a tenth aspect further includes at least one of: an automatic exposure control unit that performs automatic exposure control based on the tracking target subject found by the search unit; an automatic focus adjustment unit that adjusts the focus so that the tracking target subject found by the search unit is in focus; and a zoom control unit that adjusts the angle of view based on the tracking target subject found by the search unit.
  • automatic exposure control, automatic focus adjustment, and zoom control are performed based on the searched subject to be tracked. Thereby, appropriate control can be performed automatically.
  • An imaging apparatus according to an eleventh aspect includes: a first imaging unit and a second imaging unit that acquire two viewpoint images obtained by capturing the same subject from two viewpoints; a template image generation unit that generates a first template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging unit; a parallax acquisition unit that acquires parallax from the two viewpoint images; a fourth template image generation unit that generates a template image obtained by removing the background and foreground from the first template image based on the parallax acquired by the parallax acquisition unit; and a search unit that searches for the subject to be tracked, using the template image generated by the fourth template image generation unit, from a viewpoint image captured by the first imaging unit at a time different from the viewpoint image on which the first template image is based.
  • According to this aspect, the first template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging unit, the parallax is acquired from the two viewpoint images, and a template image is generated by removing the background and foreground from the first template image based on the parallax. The subject to be tracked is then searched for, using the generated template image, from a viewpoint image captured by the first imaging unit at a time different from the viewpoint image on which the first template image is based. This can reduce the possibility of losing sight of the subject even when the background changes as the subject moves or when a foreground object comes in front of the subject.
  • In the imaging apparatus according to a twelfth aspect, the search unit searches for the subject to be tracked using both the first template image generated by the template image generation unit and the template image generated by the fourth template image generation unit.
  • According to this aspect, the subject to be tracked is searched for using both the template image from which the background and foreground have not been removed and the template image from which they have been removed. This can further reduce the possibility of losing sight of the subject. A sketch of the removal step follows.
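  • A minimal Python sketch of the parallax-based removal follows, assuming a per-pixel disparity map aligned with the template region is available and that thresholding the disparity difference is an acceptable realization (the patent does not fix the exact criterion; names and the tolerance are assumptions).

```python
import numpy as np


def remove_background_foreground(template, disparity, subject_disparity, tol=2.0):
    """Zero out template pixels whose disparity differs from the subject's.

    Pixels much nearer (foreground) or farther (background) than the
    tracked subject are suppressed so they do not contribute to matching.
    """
    mask = np.abs(disparity - subject_disparity) <= tol
    out = template.copy()
    out[~mask] = 0  # suppress background/foreground pixels
    return out
```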
  • A stereoscopic image capturing method according to a thirteenth aspect includes: generating a first template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by a first imaging unit; generating a second template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by a second imaging unit; and searching for the subject to be tracked, using the first template image and the second template image, from a viewpoint image captured by the first imaging unit at a time different from the viewpoint image on which the first template image is based.
  • A program according to a fourteenth aspect causes a computer to execute: a step of acquiring, by a first imaging unit and a second imaging unit, two viewpoint images obtained by photographing the same subject from two viewpoints; a step of generating a first template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging unit, and generating a second template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging unit; and a step of searching for the subject to be tracked using the first template image and the second template image.
  • A computer-readable non-transitory medium on which this program is recorded is also included in the present invention.
  • According to the present invention, even when the subject moves or changes its orientation, the possibility of losing sight of the subject during tracking can be reduced.
  • FIG. 2 is a block diagram showing the electrical configuration of the compound-eye digital camera 1.
  • A diagram illustrating the positional relationship between a subject and the compound-eye digital camera 1.
  • A flowchart showing the flow of the subject tracking processing of the compound-eye digital camera 1; an example of a parallax image; a diagram for explaining the method of creating a template image from a parallax image; an example of a template image.
  • A flowchart showing the flow of the subject tracking processing of the compound-eye digital camera 3; diagrams for explaining the method of creating template images from parallax images; examples of template images.
  • A diagram illustrating the positional relationship between a subject and the compound-eye digital camera 1.
  • A flowchart showing the flow of the subject tracking processing of a modified example of the compound-eye digital camera 3; a diagram for explaining the method of creating a template image from a parallax image.
  • FIG. 1 is a schematic diagram of a compound-eye digital camera 1 having a stereoscopic image display device according to the present invention, where (a) is a front view and (b) is a rear view.
  • The compound-eye digital camera 1 is provided with a plurality of imaging systems (two are illustrated in FIG. 1) and can capture stereoscopic images of the same subject viewed from a plurality of viewpoints (two left and right viewpoints are illustrated in FIG. 1) as well as single-viewpoint images (two-dimensional images).
  • the compound-eye digital camera 1 is capable of recording and reproducing not only still images but also moving images and sounds.
  • The camera body 10 of the compound-eye digital camera 1 is formed in a substantially rectangular parallelepiped box shape; as shown in FIG. 1A, a barrier 11, a right imaging system 12, a left imaging system 13, a flash 14, and a microphone 15 are mainly provided on its front. A release switch 20 and a zoom button 21 are mainly provided on the upper surface of the camera body 10.
  • On the other hand, on the back of the camera body 10, as shown in FIG. 1B, a monitor 16, a mode button 22, a parallax adjustment button 23, a 2D/3D switching button 24, a MENU/OK button 25, a cross button 26, and a DISP/BACK button 27 are provided.
  • the barrier 11 is slidably mounted on the front surface of the camera body 10, and is switched between an open state and a closed state when the barrier 11 slides up and down. Normally, as shown by a dotted line in FIG. 1A, the barrier 11 is positioned at the upper end, that is, in a closed state, and the objective lenses 12a, 13a and the like are covered with the barrier 11. Thereby, damage of a lens etc. is prevented.
  • When the barrier 11 is slid to the lower end, that is, to the open state, the lenses disposed on the front surface of the camera body 10 are exposed (see the solid line in FIG. 1A). When a sensor (not shown) detects that the barrier 11 is open, the CPU 110 (see FIG. 2) turns the power on, making photographing possible.
  • the right imaging system 12 that captures an image for the right eye and the left imaging system 13 that captures an image for the left eye acquire two viewpoint images obtained by capturing the same subject from two viewpoints, as shown in FIG.
  • the right imaging system 12 and the left imaging system 13 are optical units including a photographing lens group having a bending optical system, diaphragm and mechanical shutters 12d and 13d, and imaging elements 122 and 123 (see FIG. 2).
  • The photographing lens groups of the right imaging system 12 and the left imaging system 13 mainly include objective lenses 12a and 13a that take in light from a subject, a prism (not shown) that bends the optical path incident from the objective lens substantially vertically, zoom lenses 12c and 13c (see FIG. 2), focus lenses 12b and 13b (see FIG. 2), and the like.
  • the flash 14 is composed of a xenon tube, and emits light as necessary when shooting a dark subject or when backlit.
  • the monitor 16 is a liquid crystal monitor capable of color display having a general aspect ratio of 4: 3, and can display both a stereoscopic image and a planar image. Although the detailed structure of the monitor 16 is not shown, the monitor 16 is a parallax barrier type 3D monitor having a parallax barrier display layer on the surface thereof. The monitor 16 is used as a user interface display panel when performing various setting operations, and is used as an electronic viewfinder during image capturing.
  • the monitor 16 can be switched between a mode for displaying a stereoscopic image (3D mode) and a mode for displaying a planar image (2D mode).
  • In the 3D mode, a parallax barrier having a pattern in which light-transmitting portions and light-shielding portions are alternately arranged at a predetermined pitch is generated on the parallax barrier display layer of the monitor 16, and strip-shaped image fragments representing the left and right images are alternately arranged and displayed on the image display surface below it.
  • In the 2D mode, or when the monitor is used as a user interface display panel, nothing is displayed on the parallax barrier display layer, and a single image is displayed as it is on the image display surface below it.
  • the monitor 16 is not limited to the parallax barrier type, and a lenticular method, an integral photography method using a microlens array sheet, a holography method using an interference phenomenon, or the like may be employed.
  • the monitor 16 is not limited to a liquid crystal monitor, and an organic EL or the like may be employed.
  • the release switch 20 is composed of a two-stage stroke switch composed of a so-called “half press” and “full press”.
  • When the release switch 20 is pressed halfway, shooting preparation processing, that is, AE (Automatic Exposure), AF (Auto Focus), and AWB (Automatic White Balance), is performed; when it is fully pressed, image capturing and recording processing is performed.
  • The zoom button 21 is used for zoom operations of the right imaging system 12 and the left imaging system 13, and includes a zoom tele button 21T for instructing zooming to the telephoto side and a zoom wide button 21W for instructing zooming to the wide-angle side.
  • the mode button 22 functions as shooting mode setting means for setting the shooting mode of the digital camera 1, and the shooting mode of the digital camera 1 is set to various modes depending on the setting position of the mode button 22.
  • the shooting mode is divided into a “moving image shooting mode” in which moving image shooting is performed and a “still image shooting mode” in which still image shooting is performed.
  • the parallax adjustment button 23 is a button for electronically adjusting the parallax at the time of stereoscopic image shooting. By pressing the right side of the parallax adjustment button 23, the parallax between the image captured by the right imaging system 12 and the image captured by the left imaging system 13 is increased by a predetermined distance, and the left side of the parallax adjustment button 23 is pressed. Thus, the parallax between the image captured by the right imaging system 12 and the image captured by the left imaging system 13 is reduced by a predetermined distance.
  • the 2D / 3D switching button 24 is a switch for instructing switching between a 2D shooting mode for shooting a single viewpoint image and a 3D shooting mode for shooting a multi-viewpoint image.
  • The MENU/OK button 25 is used for calling up various setting screens (menu screens) of the shooting and playback functions (MENU function), as well as for confirming selections and instructing execution of processing (OK function); all adjustment items of the compound-eye digital camera 1 are set with this button.
  • When the MENU/OK button 25 is pressed during shooting, a setting screen for adjusting image quality, such as exposure value, hue, ISO sensitivity, and the number of recorded pixels, is displayed on the monitor 16; when it is pressed during playback, a setting screen for erasing images is displayed on the monitor 16.
  • the compound-eye digital camera 1 operates according to the conditions set on this menu screen.
  • the cross button 26 is a button for setting, selecting, or zooming various menus, and is provided so that it can be pressed in four directions, up, down, left, and right.
  • A function corresponding to the setting state of the camera is assigned to the button in each direction. For example, at the time of shooting, a function for switching the macro function ON/OFF is assigned to the left button, and a function for switching the flash mode is assigned to the right button.
  • a function for changing the brightness of the monitor 16 is assigned to the upper button, and a function for switching ON / OFF of the self-timer and time is assigned to the lower button.
  • During playback, a frame advance function is assigned to the right button, and a frame return function is assigned to the left button.
  • a function for deleting an image being reproduced is assigned to the upper button.
  • a function for moving the cursor displayed on the monitor 16 in the direction of each button is assigned.
  • The DISP/BACK button 27 functions as a button for instructing display switching of the monitor 16; when the DISP/BACK button 27 is pressed during shooting, the display of the monitor 16 switches from ON, to framing guide display, to OFF. When the DISP/BACK button 27 is pressed during playback, the playback mode switches from normal playback, to playback without character display, to multi playback.
  • the DISP / BACK button 27 functions as a button for instructing to cancel the input operation or return to the previous operation state.
  • FIG. 2 is a block diagram showing the main internal configuration of the compound-eye digital camera 1.
  • The compound-eye digital camera 1 mainly includes a CPU 110, operation means 112 (the release switch 20, the MENU/OK button 25, the cross button 26, and the like), an SDRAM 114, a VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, a video encoder 134, a template image generation unit 135, a subject search unit 136, a media controller 137, sound input processing means 138, a recording medium 140, focus lens driving means 142 and 143, zoom lens driving means 144 and 145, aperture driving means 146 and 147, and timing generators (TG) 148 and 149.
  • the CPU 110 comprehensively controls the overall operation of the compound-eye digital camera 1.
  • CPU 110 issues a command to each block in response to an input from operation means 112.
  • the CPU 110 controls the operations of the right imaging system 12 and the left imaging system 13.
  • The right imaging system 12 and the left imaging system 13 basically operate in conjunction with each other, but can also be operated individually. Further, from the two pieces of image data obtained by the right imaging system 12 and the left imaging system 13, the CPU 110 generates display image data in which strip-shaped image fragments are arranged alternately for display on the monitor 16.
  • That is, a parallax barrier having a pattern in which light-transmitting portions and light-shielding portions are alternately arranged at a predetermined pitch is generated on the parallax barrier display layer, and stereoscopic viewing is enabled by alternately displaying strip-shaped image fragments representing the left and right images on the image display surface below it.
  • the SDRAM 114 stores firmware, which is a control program executed by the CPU 110, various data necessary for control, camera setting values, captured image data, and the like.
  • the VRAM 116 is used as a work area for the CPU 110 and also as a temporary storage area for image data.
  • the image sensors 122 and 123 are constituted by color CCDs provided with R, G, and B color filters in a predetermined color filter array (for example, honeycomb array, Bayer array).
  • The image sensors 122 and 123 receive the subject light imaged by the focus lenses 12b and 13b and the zoom lenses 12c and 13c; the light incident on the light-receiving surface is received by the photodiodes arranged there and converted into signal charges of an amount corresponding to the amount of incident light. The photocharge accumulation time of the photodiodes corresponds to the electronic shutter speed.
  • The CDS/AMPs 124 and 125 perform, on the image signals output from the image sensors 122 and 123, correlated double sampling processing (processing that obtains accurate pixel data by taking the difference between the feedthrough component level and the pixel signal component level contained in the output signal of each pixel of the image sensor, with the aim of reducing noise, particularly thermal noise, contained in the output signal of the image sensor), amplify the signals, and generate analog R, G, and B image signals.
  • the A / D converters 126 and 127 convert the R, G, and B analog image signals generated by the CDS / AMPs 124 and 125 into digital image signals.
  • the image input controller 128 has a built-in line buffer with a predetermined capacity, accumulates an image signal for one image output from the CDS / AMP / AD conversion means, and records it in the VRAM 116 in accordance with a command from the CPU 110.
  • The image signal processing means 130 includes a synchronization circuit (a processing circuit that converts the color signals into a simultaneous form by interpolating the spatial shifts of the color signals associated with the color filter array of the single CCD), a white balance correction circuit, a gamma correction circuit, a contour correction circuit, a luminance/color difference signal generation circuit, and the like, and, in accordance with commands from the CPU 110, performs the necessary signal processing on the input image signals to generate luminance data (Y data) and color difference data (Cr, Cb data).
  • the image data generated from the image signal output from the image sensor 122 is referred to as a right eye image B
  • the image data generated from the image signal output from the image sensor 123 is referred to as a left eye image A.
  • the left-eye image A and the right-eye image B (3D image data) processed by the image signal processing unit 130 are input to the VRAM 50.
  • the VRAM 50 includes an A area and a B area each storing 3D image data representing a 3D image for one frame.
  • 3D image data representing a 3D image for one frame is rewritten alternately in the A area and the B area.
  • The written 3D image data is read from whichever of the A area and the B area of the VRAM 50 is not currently being rewritten.
  • The stereoscopic image generation unit 133 processes the 3D image data read from the VRAM 50, or the uncompressed 3D image data read from the recording medium 140 and decompressed by the compression/decompression processing means 132, so that it can be displayed on the monitor 16. For example, in the case of a parallax barrier monitor, the stereoscopic image generation unit 133 divides the right-eye image B and the left-eye image A used for reproduction into strips and generates display image data in which the strip-shaped right-eye images B and left-eye images A are arranged alternately, as sketched below. The display image data is output from the stereoscopic image generation unit 133 to the monitor 16 via the video encoder 134.
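  • For illustration, a simplified Python sketch of this strip interleaving is shown below; in a real camera the strip width is dictated by the barrier pitch of the panel, and the function name is an assumption.

```python
import numpy as np


def interleave_strips(left_image, right_image, strip_width=1):
    """Arrange vertical strips of the left and right images alternately,
    in the manner described for the parallax barrier monitor."""
    assert left_image.shape == right_image.shape
    out = right_image.copy()
    w = left_image.shape[1]
    # Copy every other strip from the left image over the right image.
    for x in range(0, w, 2 * strip_width):
        out[:, x:x + strip_width] = left_image[:, x:x + strip_width]
    return out
```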
  • The video encoder 134 controls display on the monitor 16. That is, it converts the display image data generated by the stereoscopic image generation unit 133 into a video signal (for example, an NTSC, PAL, or SECAM signal) for display on the monitor 16, outputs it to the monitor 16, and also outputs predetermined character and graphic information to the monitor 16 as necessary.
  • the right-eye image B and the left-eye image A are stereoscopically displayed on the monitor 16.
  • By alternately rewriting the 3D image data representing one frame of a 3D image in the VRAM 50 and reading the written 3D image data from the area other than the one being rewritten, 3D images are displayed continuously in real time on the monitor 16 (display of a live view image, a so-called through image).
  • the left-eye image A and the right-eye image B processed by the image signal processing means 130 are input to the template image generation unit 135.
  • the template image generation unit 135 extracts a predetermined region (for example, a rectangle) including the subject Z to be tracked from each of the left eye image A and the right eye image B, and generates a template image. Details of the contents performed by the template image generation unit 135 will be described later.
  • the left eye image A and the right eye image B processed by the image signal processing means 130 are input to the subject searching unit 136.
  • the subject search unit 136 uses the template image generated by the template image generation unit 135 to search for a portion similar to the template image from at least one of the left-eye image A and the right-eye image B. Accordingly, the subject Z is searched from at least one of the left-eye image A and the right-eye image B. Details of the contents performed by the subject searching unit 136 will be described later.
  • At least one of the left-eye image A and the right-eye image B in which the subject Z has been found by the subject search unit 136 is input to the template image generation unit 135, and the template image generation unit 135 extracts a predetermined region including the subject Z to generate a template image.
  • the AF detection unit 118 calculates a physical quantity necessary for AF control from the input image signal so that the subject Z searched by the subject search unit 136 is in focus according to a command from the CPU 110.
  • the AF detection unit 118 includes a right imaging system AF control circuit that performs AF control based on the image signal input from the right imaging system 12, and a left imaging that performs AF control based on the image signal input from the left imaging system 13. And a system AF control circuit.
  • In the present embodiment, AF control is performed based on the contrast of the images obtained from the image sensors 122 and 123 (so-called contrast AF), and the AF detection means 118 calculates, from the input image signal, a focus evaluation value indicating the sharpness of the image.
  • The CPU 110 detects the position where the focus evaluation value calculated by the AF detection means 118 is maximized and moves the focus lens group to that position. That is, the focus lens group is moved from the closest range to infinity in predetermined steps, the focus evaluation value is obtained at each position, the position with the maximum focus evaluation value is set as the in-focus position, and the focus lens group is moved there.
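  • The contrast AF loop can be sketched in Python as follows; the sharpness metric and the capture_at callback are assumptions standing in for the AF detection means 118 and the lens drive, not the camera's actual implementation.

```python
import numpy as np


def focus_evaluation_value(gray_af_area):
    """Sharpness metric: integrate high-frequency components (here a simple
    horizontal difference; the real AF detection means uses a dedicated filter)."""
    diff = np.diff(gray_af_area.astype(np.float64), axis=1)
    return float(np.sum(diff * diff))


def contrast_af(capture_at, lens_positions):
    """Sweep the focus lens from the closest range to infinity in steps,
    evaluate sharpness at each position, and return the position with the
    maximum focus evaluation value (the in-focus position)."""
    best_pos, best_val = None, -1.0
    for pos in lens_positions:
        val = focus_evaluation_value(capture_at(pos))  # move lens, grab AF area
        if val > best_val:
            best_pos, best_val = pos, val
    return best_pos
```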
  • The focus lens driving means 142 and 143 move the focus lenses 12b and 13b in the optical axis direction in accordance with commands from the CPU 110, varying the focal position so that the subject Z found by the subject search unit 136 is in focus. The zoom lens driving means 144 and 145 move the zoom lenses 12c and 13c in the optical axis direction and change the focal length in accordance with commands from the CPU 110, issued either in response to an instruction from the photographer or so that the subject Z found by the subject search unit 136 has a predetermined size.
  • AE / AWB detection means 120 calculates a physical quantity necessary for AE control and AWB control from the input image signal in accordance with a command from CPU 110.
  • The AE/AWB detection means 120 divides one screen into a plurality of areas (for example, 16 × 16) and calculates an integrated value of the R, G, and B image signals for each divided area as a physical quantity necessary for AE control.
  • the AE / AWB detection unit 120 calculates an integrated value of R, G, and B image signals in a predetermined area including the subject Z searched by the subject search unit 136.
  • the CPU 110 detects the brightness of the subject (subject brightness) based on the integrated value obtained from the AE / AWB detection means 120, and calculates an exposure value (shooting EV value) suitable for shooting. Then, an aperture value and a shutter speed are determined from the calculated shooting EV value and a predetermined program diagram.
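  • As a hedged Python sketch of this AE computation: the block integration follows the 16 × 16 division described above, while the subject-area weighting and the mapping from brightness to an EV number are illustrative assumptions (the patent gives no formula).

```python
import numpy as np


def block_integrals(rgb_image, blocks=16):
    """Integrate the R, G, B signals in each of blocks x blocks areas
    (border pixels beyond an even multiple of the block size are ignored
    for brevity)."""
    h, w = rgb_image.shape[:2]
    bh, bw = h // blocks, w // blocks
    trimmed = rgb_image[:bh * blocks, :bw * blocks].astype(np.float64)
    return trimmed.reshape(blocks, bh, blocks, bw, 3).sum(axis=(1, 3))


def shooting_ev(integrals, area_weights):
    """Weighted subject brightness mapped to an EV-like number; the weights
    would emphasize the areas containing the tracked subject Z."""
    g = integrals[..., 1]                                 # G as a luminance proxy
    mean_level = float((g * area_weights).sum() / area_weights.sum())
    return np.log2(max(mean_level, 1e-6))                 # illustrative mapping
```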
  • the aperture driving means 146 and 147 adjust the amounts of light incident on the image sensors 122 and 123 by varying the apertures of the aperture / mechanical shutters 12d and 13d in accordance with a command from the CPU 110. Further, the aperture driving units 146 and 147 open / close the aperture / mechanical shutters 12d and 13d in accordance with a command from the CPU 110 to perform exposure / light shielding on the image sensors 122 and 123, respectively.
  • The AE/AWB detection means 120 also divides one screen into a plurality of areas (for example, 16 × 16) and calculates an average integrated value for each color of the R, G, and B image signals for each divided area as a physical quantity necessary for AWB control.
  • The AE/AWB detection means 120 calculates the average integrated value for each color of the R, G, and B image signals in a predetermined area including the subject Z found by the subject search unit 136.
  • The CPU 110 obtains the R/G and B/G ratios for each divided area from the obtained R, B, and G integrated values, and determines the light source type and the white balance correction values based on the distribution of the obtained R/G and B/G values.
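  • A gray-world style Python sketch of deriving white balance gains from the per-area R/G and B/G ratios is given below; the camera's actual light-source discrimination is more elaborate, so treat this as an assumption-laden illustration (an RGB channel order is assumed).

```python
import numpy as np


def awb_gains(integrals):
    """Estimate R and B gains that pull the average ratios toward
    R/G = B/G = 1 (neutral gray); integrals is the blocks x blocks x 3
    array produced by the AE sketch above."""
    r, g, b = integrals[..., 0], integrals[..., 1], integrals[..., 2]
    r_g = float(np.mean(r / np.maximum(g, 1e-6)))  # average R/G over the areas
    b_g = float(np.mean(b / np.maximum(g, 1e-6)))  # average B/G over the areas
    return 1.0 / max(r_g, 1e-6), 1.0, 1.0 / max(b_g, 1e-6)
```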
  • the compression / decompression processing unit 132 performs compression processing in a predetermined format on the input image data in accordance with a command from the CPU 110 to generate compressed image data. Further, in accordance with a command from the CPU 110, the input compressed image data is subjected to a decompression process in a predetermined format to generate uncompressed image data.
  • the media controller 137 records each image data compressed by the compression / decompression processing unit 132 on the recording medium 140.
  • The recording medium 140 is any of various recording media, such as an xD-Picture Card (registered trademark) or a semiconductor memory card represented by SmartMedia (registered trademark) detachably attached to the compound-eye digital camera 1, a portable small hard disk, a magnetic disk, an optical disk, or a magneto-optical disk.
  • the sound input processing means 138 receives an audio signal input to the microphone 15 and amplified by a stereo microphone amplifier (not shown), and performs an encoding process on the audio signal.
  • When the barrier 11 is slid from the closed state to the open state, the compound-eye digital camera 1 is powered on and activated in the shooting mode.
  • As the shooting mode, a 2D shooting mode for shooting a single-viewpoint image and a 3D shooting mode for shooting a stereoscopic image of the same subject viewed from two viewpoints can be set. In the 3D shooting mode, a stereoscopic image is shot with a predetermined parallax using the right imaging system 12 and the left imaging system 13.
  • The shooting mode is set by pressing the MENU/OK button 25 while the compound-eye digital camera 1 is driven in the shooting mode, selecting "shooting mode" with the cross button 26 or the like on the menu screen displayed on the monitor 16, and making the setting from the shooting mode menu screen displayed on the monitor 16.
  • (1) 2D shooting mode: The CPU 110 selects the right imaging system 12 or the left imaging system 13 (the left imaging system 13 in the present embodiment) and starts shooting for the shooting confirmation image with the imaging device 123 of the left imaging system 13. That is, images are continuously picked up by the image sensor 123, the image signals are continuously processed, and image data for the shooting confirmation image is generated.
  • the CPU 110 sets the monitor 16 in the 2D mode, sequentially adds the generated image data to the video encoder 134, converts it into a signal format for display, and outputs it to the monitor 16.
  • Thereby, the image captured by the image sensor 123 is displayed as a through image on the monitor 16.
  • The video encoder 134 is not always required, but the generated image data must be converted into a signal form that matches the input specifications of the monitor 16.
  • the user performs framing while viewing the shooting confirmation image displayed in three dimensions on the monitor 16, checks the subject to be shot, checks the image after shooting, and sets shooting conditions.
  • When the release switch 20 is half-pressed, an S1 ON signal is input to the CPU 110.
  • the CPU 110 detects this and performs AE metering and AF control.
  • the brightness of the subject is measured based on the integrated value of the image signal captured via the image sensor 123.
  • Based on this photometric value, the flash 14 is pre-fired as necessary, and the light emission amount of the flash 14 at the time of actual photographing is determined based on the reflected light.
  • When the release switch 20 is fully pressed, an S2 ON signal is input to the CPU 110.
  • The CPU 110 executes photographing and recording processing in response to the S2 ON signal.
  • The CPU 110 drives the aperture-mechanical shutter 13d via the aperture driving means 147 based on the aperture value determined from the photometric value, and controls the charge accumulation time in the image sensor 123 (the so-called electronic shutter) so as to achieve the shutter speed determined from the photometric value.
  • During AF control, the CPU 110 sequentially moves the focus lens to lens positions corresponding to distances from the closest range to infinity, acquires from the AF detection means 118, for each lens position, an evaluation value obtained by integrating the high-frequency components of the image signal in the AF area of the image captured through the image sensor 123, finds the lens position where the evaluation value peaks, and performs contrast AF by moving the focus lens to that position.
  • When the flash 14 is caused to emit light, it is fired based on the light emission amount obtained from the result of the pre-emission.
  • the subject light is incident on the light receiving surface of the image sensor 123 via the focus lens 13b, the zoom lens 13c, the diaphragm-mechanical shutter 13d, the infrared cut filter 46, the optical low-pass filter 48, and the like.
  • the signal charge accumulated in each photodiode of the image sensor 123 is read according to the timing signal applied from the TG 149, sequentially output from the image sensor 123 as a voltage signal (image signal), and input to the CDS / AMP 125.
  • the CDS / AMP 125 performs correlated double sampling processing on the CCD output signal based on the CDS pulse, and amplifies the image signal output from the CDS circuit by the imaging sensitivity setting gain applied from the CPU 110.
  • The analog image signals output from the CDS/AMP 125 are converted into digital image signals by the A/D converter 127, and the converted image signals (R, G, B RAW data) are transferred to the SDRAM 114 and temporarily stored there.
  • the R, G, B image signals read from the SDRAM 114 are input to the image signal processing means 130.
  • white balance adjustment is performed by applying digital gain to each of the R, G, and B image signals by the white balance adjustment circuit, and gradation conversion processing according to gamma characteristics is performed by the gamma correction circuit.
  • the synchronization circuit interpolates the spatial shift of the color signals associated with the color filter array of the single CCD and performs the synchronization process for matching the phases of the color signals.
  • the synchronized R, G, B image signals are further converted into a luminance signal Y and color difference signals Cr, Cb (YC signal) by a luminance / color difference data generation circuit and subjected to predetermined signal processing such as edge enhancement.
  • the YC signal processed by the image signal processing means 130 is stored in the SDRAM 114 again.
  • the YC signal stored in the SDRAM 114 as described above is compressed by the compression / expansion processing means 132 and recorded on the recording medium 140 via the media controller 137 as an image file of a predetermined format.
  • Still image data is stored in the recording medium 140 as an image file according to the Exif standard.
  • the Exif file has an area for storing main image data and an area for storing reduced image (thumbnail image) data.
  • A thumbnail image having a specified size (for example, 160 × 120 or 80 × 60 pixels) is generated from the main image data obtained by shooting, through pixel thinning and other necessary data processing.
  • the thumbnail image generated in this way is written in the Exif file together with the main image.
  • tag information such as shooting date / time, shooting conditions, and face detection information is attached to the Exif file.
  • the CPU 110 When the mode of the compound-eye digital camera 1 is set to the playback mode, the CPU 110 outputs a command to the media controller 137 to read out the image file recorded last on the recording medium 140.
  • the compressed image data of the read image file is added to the compression / decompression processing unit 132, decompressed to an uncompressed luminance / color difference signal, and output to the monitor 16 via the video encoder 134.
  • the image recorded on the recording medium 140 is reproduced and displayed on the monitor 16 (reproduction of one image).
  • As in the photography mode, in the 2D mode a planar image is displayed on the entire surface of the monitor 16 during playback.
  • (2) 3D shooting mode: Shooting for the shooting confirmation image is started by the image sensor 122 and the image sensor 123. That is, the image sensor 122 and the image sensor 123 continuously capture the right-eye image B and the left-eye image A at a predetermined frame rate, the image signals are continuously processed, and stereoscopic image data for the shooting confirmation image is generated.
  • the CPU 110 sets the monitor 16 to the 3D mode, and the generated image data is sequentially converted into a signal format for display by the video encoder 134 and is output to the monitor 16. As a result, stereoscopic image data for the shooting confirmation image is stereoscopically displayed on the monitor 16.
  • a process of tracking the subject Z with respect to the left-eye image A is performed in parallel with the stereoscopic image data for the shooting confirmation image being stereoscopically displayed on the monitor 16.
  • FIG. 4 is a flowchart showing a flow of processing for tracking the subject Z with respect to the image A for the left eye continuously photographed at a predetermined frame rate. This process is controlled by the CPU 110. A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • the left-eye image A taken immediately before the processing target frame, in this case, the left-eye image A0 (see FIG. 5) taken immediately before the process of tracking the subject Z is input to the template image generation unit 135.
  • the template image generation unit 135 generates a template image TA0 from the left eye image A0 (step S10). A process in which the template image generation unit 135 generates the template image TA0 will be described in detail.
  • the left-eye image A0 shown in FIG. 5 includes a subject Z (here, a human face) that is a tracking target.
  • the template image generation unit 135 extracts a region including the subject Z from the left-eye image A0 and having a size that allows the shape of the subject Z to be recognized.
  • Specifically, a rectangular area having a margin of several pixels around the contour of the subject Z is extracted from the left-eye image A0, and the template image generation unit 135 sets the extracted image as the template image TA0, as shown in the figure. Thereby, a template image is generated. In code, this extraction could look like the sketch below.
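  • The margin value and the function name in the following Python sketch are assumptions; the rectangle-plus-margin behavior is as described above.

```python
def extract_template(image, subject_rect, margin=4):
    """Crop a rectangular area with a margin of several pixels around the
    subject's bounding rectangle (x, y, w, h), clamped to the image; the
    cropped patch becomes the template image (e.g. TA0 or TB0)."""
    x, y, w, h = subject_rect
    img_h, img_w = image.shape[:2]
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1, y1 = min(x + w + margin, img_w), min(y + h + margin, img_h)
    return image[y0:y1, x0:x1]
```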
  • Possible methods of selecting the subject Z as the tracking target include automatically detecting the subject by face detection or the like, having the photographer select it via the operation means 112, and automatically detecting a subject equipped with a transmitter.
  • For example, face detection may be performed on each of the left-eye image A and the right-eye image B, and after confirming that the same face is detected in both, that face may be set as the subject Z.
  • the photographer may select the face via the operation unit 112.
  • The photographer may make the selection in each of the left-eye image A and the right-eye image B, or may select in only one of them, in which case the same subject may be detected in the other image by corresponding-point detection.
  • the right-eye image B photographed immediately before the frame to be processed, in this case, the right-eye image B0 (see FIG. 5) photographed immediately before the process of tracking the subject Z is input to the template image generation unit 135.
  • the template image generation unit 135 generates a template image TB0 from the right eye image B0 by the same method as in step S10 (step S12).
  • The right-eye image B may not include the subject Z to be tracked (for example, when the subject is hidden by another subject). In the present embodiment, a template image is generated from images as close as possible in shooting timing. Therefore, when the template image TA0 can be generated from the left-eye image A0 but the subject Z is not included in the right-eye image B0, if the subject Z is included in the right-eye image captured immediately before the right-eye image B0, the template image generated from that image may be used as the template image TB0. If the subject Z is not included in the right-eye image captured immediately before the right-eye image B0 either, the template image TB0 may be generated from the right-eye image captured before that.
  • Next, photographing and processing of the i-th image is started (step S14; i is a positive integer).
  • Since the optical axis 13L of the left imaging system 13 and the optical axis 12L of the right imaging system 12 do not coincide, the result of photographing the subject Z with the left imaging system 13 differs from the result of photographing the subject Z with the right imaging system 12. That is, as shown in FIG. 6, the orientation of the subject Z included in the left-eye image A differs from the orientation of the subject Z included in the right-eye image B. Therefore, since the subject Z included in the template image TAi and the subject Z included in the template image TBi differ, the left-eye image Ai is searched using both the template images TA(i-1) and TB(i-1).
  • For the search, a pattern matching method such as template matching can be used, but the method is not limited to this; one possible realization is sketched below.
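  • As one concrete, non-limiting realization of this search, the following Python sketch runs normalized cross-correlation template matching with both templates; OpenCV's matchTemplate is used purely for illustration.

```python
import cv2


def search_subject(frame, template_a, template_b):
    """Search the left-eye image Ai with both TA(i-1) and TB(i-1)."""
    results = []
    for name, tmpl in (("TA", template_a), ("TB", template_b)):
        res = cv2.matchTemplate(frame, tmpl, cv2.TM_CCOEFF_NORMED)
        _, score, _, top_left = cv2.minMaxLoc(res)  # best match and its score
        results.append((name, score, top_left, tmpl.shape[1::-1]))
    return results  # [(template id, similarity, (x, y), (w, h)), ...]
```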
  • the subject search unit 136 determines whether or not the search for the left-eye image Ai has been completed (step S22). If the search has not been completed (NO in step S22), step S22 is performed again.
  • That is, a portion similar to the template image TA(i-1) or TB(i-1) is set as the position of the subject Z. In the case shown in FIG. 8A1, the search result is the area surrounded by the dotted line regardless of whether the template image TA0 or TB0 is used for the search, and the position surrounded by the dotted line is the position of the subject Z.
  • the subject search unit 136 inputs the search result and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S32.
  • When a plurality of search results are obtained, the subject search unit 136 calculates the similarity between the result of searching the left-eye image Ai using the template image TA(i-1) and the template image TA(i-1), and also calculates the similarity between the result of searching the left-eye image Ai using the template image TB(i-1) and the template image TB(i-1).
  • The subject search unit 136 then determines whether the similarity between the result of searching the left-eye image Ai using the template image TA(i-1) and the template image TA(i-1) is higher than the similarity between the result of searching the left-eye image Ai using the template image TB(i-1) and the template image TB(i-1) (step S26).
  • If the similarity for the template image TA(i-1) is higher (YES in step S26), the subject search unit 136 sets the result of searching the left-eye image Ai using the template image TA(i-1), that is, the portion similar to the template image TA(i-1), as the position of the subject Z (step S28).
  • the subject search unit 136 inputs the position of the subject Z and the left eye image Ai to the template image generation unit 135.
  • If the similarity for the template image TA(i-1) is not higher (NO in step S26), the subject search unit 136 sets the result of searching the left-eye image Ai using the template image TB(i-1), that is, the portion similar to the template image TB(i-1), as the position of the subject Z (step S30).
  • the subject search unit 136 inputs the position of the subject Z and the left eye image Ai to the template image generation unit 135.
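  • The selection among multiple search results (steps S26 to S30) can be summarized with the short Python sketch below, using the result list from the search sketch above. The detection threshold is an assumption (the patent gives no value), and the TA-first tie-break mirrors the preference for the template from the same viewpoint.

```python
def choose_subject_position(results, threshold=0.6):
    """Pick the position of subject Z from [(id, score, (x, y), (w, h))]."""
    hits = [r for r in results if r[1] >= threshold]
    if not hits:
        return None  # subject lost; fall back (e.g. to a shifted template)
    # Higher similarity first; TA(i-1) before TB(i-1) on equal similarity.
    hits.sort(key=lambda r: (-r[1], r[0] != "TA"))
    _, _, (x, y), (w, h) = hits[0]
    return x, y, w, h  # position of subject Z in the left-eye image Ai
```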
  • The template image generation unit 135 extracts, from the left-eye image Ai, a region that includes the subject Z and has a size that allows the shape of the subject Z to be recognized, and generates a template image TAi (step S32). As in step S10, a rectangular area with a margin of several pixels around the contour of the subject Z is extracted and used as the template image. In the case shown in FIG. 8A1, the dotted-line portion is generated as the template image TA1.
  • the template image generation unit 135 generates a template image TBi from the right-eye image Bi by the same method as in step S12 (step S34).
  • If the subject Z is not included in the right-eye image Bi, the template image TBi may be generated from the right-eye image that was captured at the closest timing before the right-eye image Bi and that includes the subject Z.
  • In the manner described above, the subject Z is tracked in the left-eye image A.
  • During tracking, the user performs framing while viewing the shooting confirmation image displayed in three dimensions on the monitor 16, checks the subject to be shot, checks the image after shooting, and sets the shooting conditions. The zoom may be optimized based on the subject Z simultaneously with the tracking of the subject Z: the CPU 110 moves the zoom lenses 12c and 13c in the optical axis direction via the zoom lens driving units 144 and 145 so that the subject Z is rendered at a predetermined size. This lets the photographer recognize what the tracking target is, and, since the tracking target is an important subject for the photographer, an image in which that important subject is easy to see can be taken.
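  • The patent does not specify the zoom control law; the following sketch assumes a simple proportional rule with a dead band that chooses a zoom direction from the ratio of the subject's rendered size to the target size. All names are hypothetical:

```python
def zoom_step_toward_target(subject_height_px: float,
                            target_height_px: float,
                            deadband: float = 0.1) -> int:
    """Return -1 (zoom out), 0 (hold) or +1 (zoom in) so the tracked
    subject Z is rendered at roughly the predetermined size."""
    ratio = subject_height_px / target_height_px
    if ratio < 1.0 - deadband:
        return +1   # subject too small: move zoom lenses toward tele
    if ratio > 1.0 + deadband:
        return -1   # subject too large: move zoom lenses toward wide
    return 0
```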
  • When an S1 ON signal is input to the CPU 110, the CPU 110 detects this, ends the subject tracking process described above, and performs AE metering and AF control. AE metering and AF control are performed by the left imaging system 13, which performed the tracking of the subject Z, so that exposure and focus are optimized based on the tracked subject Z. In other words, AE photometry is performed so that the subject Z has an appropriate exposure, and AF processing is performed so that the subject Z is in focus. Since AE metering and AF control are the same as in the 2D shooting mode, their detailed description is omitted.
  • When an S2 ON signal is input to the CPU 110, the CPU 110 executes photographing and recording processing in response. The processing for generating the image data captured by the right imaging system 12 and the left imaging system 13 is the same as in the 2D shooting mode and is therefore omitted. Two pieces of compressed image data are generated by the same method as in the 2D shooting mode, associated with each other, and stored on the recording medium 140 as one file. The MP format or the like can be used as the storage format.
  • When the mode of the compound-eye digital camera 1 is set to the playback mode, the CPU 110 outputs a command to the media controller 137 to read the image file recorded last on the recording medium 140. The compressed image data of the read image file is passed to the compression/decompression processing unit 132, decompressed into an uncompressed luminance/color-difference signal, converted into a stereoscopic image by the stereoscopic image generation unit 133, and output to the monitor 16 via the video encoder 134. In this way, the image recorded on the recording medium 140 is reproduced and displayed on the monitor 16 (single-image playback).
  • Frame advance is performed by operating the left and right keys of the cross button 26. When the right key of the cross button 26 is pressed, the next image file is read from the recording medium 140 and reproduced on the monitor 16; when the left key is pressed, the previous image file is read and reproduced. While confirming an image reproduced on the monitor 16, the user can erase the image recorded on the recording medium 140 as necessary; the image is erased by pressing the MENU/OK button 25 while it is displayed.
  • As described above, the subject is tracked using images of the subject with different orientations as keys, so even if the subject moves or its orientation changes, the possibility of losing the subject during tracking can be reduced. Moreover, since the subject tracking process uses template images extracted from parallax images, accurate tracking can be performed, and as a result the amount of calculation can be reduced.
  • In the present embodiment, the subject tracking process is performed on the left-eye image A, but it may instead be performed on the right-eye image B, or on both the left-eye image A and the right-eye image B. When the tracking process is performed on the right-eye image B, it proceeds in the same manner as described above: the right-eye image Bi is searched with both template images TA(i-1) and TB(i-1); if the results are the same, that position is set as the position of the subject Z, and if they differ, the result with the higher similarity is set as the position of the subject Z.
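  • A minimal sketch of this dual-template search, using OpenCV template matching as a stand-in for the unspecified search method; the TM_CCOEFF_NORMED score plays the role of the similarity here:

```python
import cv2

def search_with_two_templates(image, template_a, template_b):
    """Search one viewpoint image with both templates TA(i-1) and TB(i-1).

    Returns the agreed position if both searches land on the same spot,
    otherwise the position whose match score (similarity) is higher.
    """
    results = []
    for templ in (template_a, template_b):
        res = cv2.matchTemplate(image, templ, cv2.TM_CCOEFF_NORMED)
        _, score, _, top_left = cv2.minMaxLoc(res)
        results.append((score, top_left))
    (score_a, pos_a), (score_b, pos_b) = results
    if pos_a == pos_b:          # results agree: that position is subject Z
        return pos_a
    return pos_a if score_a >= score_b else pos_b  # higher similarity wins
```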
  • The first embodiment of the present invention performs subject tracking by searching for the subject in at least one of the right-eye image and the left-eye image, using a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image; however, the method of performing subject tracking is not limited to this. The second embodiment performs subject tracking by searching for the subject in at least one of the right-eye image and the left-eye image using, in addition to those two template images, composite template images generated from them by image composition processing.
  • The compound-eye digital camera 2 of the second embodiment will now be described. Parts that are the same as in the first embodiment are denoted by the same reference numerals, and their description is omitted.
  • FIG. 9 is a block diagram showing the main internal configuration of the compound-eye digital camera 2.
  • The compound-eye digital camera 2 mainly includes a CPU 110, operation means (release switch 20, MENU/OK button 25, cross button 26, etc.) 112, SDRAM 114, VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, video encoder 134, template image generation unit 135, subject search unit 136, media controller 137, sound input processing unit 138, composite template image generation unit 139, recording medium 140, focus lens driving units 142 and 143, zoom lens driving units 144 and 145, aperture driving units 146 and 147, and timing generators (TG) 148 and 149.
  • The composite template image generation unit 139 generates composite template images, by image composition processing, from a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image.
  • FIG. 10 is a schematic diagram illustrating a method in which the composite template image generation unit 139 generates a composite template image.
  • The composite template image generation unit 139 first extracts feature points from the template image TA0 generated by extracting a part of the left-eye image A0. The feature points are, for example, points (pixels) with strong signal gradients in a plurality of directions, and can be extracted using the Harris method, the Shi-Tomasi method, or the like. The composite template image generation unit 139 then extracts corresponding points from the template image TB0 generated by extracting a part of the right-eye image B0; a corresponding point is the point that corresponds to a feature point extracted from the template image TA0.
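  • One plausible realization of these two steps (the patent names only the Harris and Shi-Tomasi detectors): detect feature points in TA0, then locate the corresponding points in TB0 with pyramidal Lucas-Kanade optical flow. The use of optical flow for the correspondences is an assumption:

```python
import cv2

def feature_and_corresponding_points(template_a, template_b, max_corners=50):
    """Feature points in TA0 and their corresponding points in TB0."""
    gray_a = cv2.cvtColor(template_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(template_b, cv2.COLOR_BGR2GRAY)
    # Shi-Tomasi corners; pass useHarrisDetector=True for the Harris method.
    pts_a = cv2.goodFeaturesToTrack(gray_a, maxCorners=max_corners,
                                    qualityLevel=0.01, minDistance=5)
    if pts_a is None:
        return None, None
    # Corresponding points located by pyramidal Lucas-Kanade optical flow.
    pts_b, status, _err = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, pts_a, None)
    ok = status.ravel() == 1
    return pts_a[ok], pts_b[ok]
```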
  • The composite template image generation unit 139 aligns the feature points extracted from the template image TA0 with the corresponding points extracted from the template image TB0, and then generates intermediate-state images of the template images TA0 and TB0, that is, composite template images TM0-1 to TM0-5, by image composition processing. In the present embodiment the composite template images are created using a technique called morphing, which expresses the gradual deformation from one object into another, but the image composition processing is not limited to this.
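  • Full morphing warps geometry as well as intensity; as a reduced sketch, assuming the two templates have already been aligned and resized to the same shape, a cross-dissolve produces the five intermediate images TM0-1 to TM0-5:

```python
import cv2

def intermediate_templates(template_a, template_b, steps=5):
    """Cross-dissolve stand-in for the morphing step (geometry warp omitted).

    template_a, template_b: aligned, same-size templates TA0 and TB0.
    Returns [TM0-1, ..., TM0-5] for steps=5.
    """
    out = []
    for k in range(1, steps + 1):
        w = k / (steps + 1)   # blend weight moves from TA0 toward TB0
        out.append(cv2.addWeighted(template_a, 1.0 - w, template_b, w, 0))
    return out
```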
  • A composite template image may also be generated after some image processing is applied to the template images TA(i-1) and TB(i-1). For example, the composite template images TM(i-1)-1 to TM(i-1)-5 may be generated after the luminance is corrected so that the average luminance of TA(i-1) and the average luminance of TB(i-1) become the same value, or after the color balance of TA(i-1) and the color balance of TB(i-1) are corrected to the same value. Furthermore, the feature points extracted from TA(i-1) and the corresponding points extracted from TB(i-1) may be aligned, and TA(i-1) resized to the same size as TB(i-1), before the composite template images TM(i-1)-1 to TM(i-1)-5 are generated.
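  • A sketch of the average-luminance correction mentioned above, under the assumption that a global gain per template is sufficient (gamma and color balance are ignored):

```python
import numpy as np

def equalize_mean_luminance(template_a, template_b):
    """Scale both templates so their average luminance becomes the same value."""
    a = template_a.astype(np.float32)
    b = template_b.astype(np.float32)
    target = (a.mean() + b.mean()) / 2.0
    a *= target / max(a.mean(), 1e-6)
    b *= target / max(b.mean(), 1e-6)
    return a.clip(0, 255).astype(np.uint8), b.clip(0, 255).astype(np.uint8)
```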
  • FIG. 11 is a flowchart showing a flow of processing for tracking the subject Z with respect to the image A for the left eye continuously photographed at a predetermined frame rate. This process is controlled by the CPU 110. A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • The left-eye image A taken immediately before the processing target frame, in this case the left-eye image A0 (see FIG. 5) taken immediately before the process of tracking the subject Z starts, is input to the template image generation unit 135. The template image generation unit 135 generates a template image TA0 by extracting from the left-eye image A0 a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized (step S10). Likewise, the right-eye image B photographed immediately before the processing target frame, in this case the right-eye image B0 (see FIG. 5) photographed immediately before the tracking process starts, or a right-eye image B whose photographing timing is as close as possible to the processing target frame, is input to the template image generation unit 135, and the template image generation unit 135 generates a template image TB0 from the right-eye image B0 by the same method as in step S10 (step S12).
  • In step S14, photographing and processing of the first image is started; hereinafter, i denotes a positive integer (the frame number).
  • The subject search unit 136 searches the left-eye image Ai using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5, looking for portions of the left-eye image Ai similar to them (step S42).
  • The subject search unit 136 determines whether the search of the left-eye image Ai has been completed (step S44); if not (NO in step S44), step S44 is performed again. When the search is complete (YES in step S44), the subject search unit 136 determines whether searches with a plurality of templates succeeded, that is, whether search results were obtained with a plurality of templates (step S46). If searches with a plurality of templates did not succeed (NO in step S46), only one search result for the subject Z was obtained, so the subject search unit 136 inputs that search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S54.
  • If searches with a plurality of templates succeeded (YES in step S46), search results were obtained with a plurality of templates, so for each template image that produced a result the subject search unit 136 calculates the similarity between that template image and its search result. The subject search unit 136 then determines whether a plurality of results share the highest similarity (step S48).
  • For the similarity calculation, a known method can be employed, such as the difference between feature values or the least-squares method on a feature space (a weighted space is also possible).
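  • The similarity measure is left open; the following sketch implements one reading of the least-squares option over an optionally weighted feature space, with pixel values standing in for the feature values:

```python
import numpy as np

def similarity(template, patch, weights=None):
    """Higher is more similar: negative (weighted) mean squared error
    between the template and the candidate patch."""
    t = template.astype(np.float32).ravel()
    p = patch.astype(np.float32).ravel()
    d = t - p
    if weights is not None:                 # weighted feature space
        d = d * np.asarray(weights, np.float32).ravel()
    return -float(np.dot(d, d) / d.size)
```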
  • If a plurality of results share the highest similarity (YES in step S48), the subject search unit 136 sets as the position of the subject Z the portion similar to the template image TA(i-1) or, failing that, to the composite template image closest to TA(i-1) (step S50).
  • Step S50 will be described concretely. If the similarity between the result of searching the left-eye image Ai with TA(i-1) and TA(i-1) itself is among the multiple highest similarities, the result searched with TA(i-1) is set as the position of the subject Z; the subject search unit 136 inputs the position of the subject Z and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S32. If it is not among them but the similarity between the result of searching the left-eye image Ai with the composite template image TM0-1 and TM0-1 itself is, the portion similar to TM0-1 is set as the position of the subject Z. If that similarity is not among the multiple highest similarities either, but the similarity between the result of searching with the composite template image TM0-2 and TM0-2 itself is, the portion similar to TM0-2 is set as the position of the subject Z. In this way, even when the most accurate detection result cannot be singled out, the possibility of losing the subject during tracking can be reduced while a certain level of accuracy is maintained.
  • If a plurality of results do not share the highest similarity (NO in step S48), the subject search unit 136 sets the search result with the highest similarity as the position of the subject Z (step S52) and inputs the position of the subject Z and the left-eye image Ai to the template image generation unit 135.
  • The template image generation unit 135 extracts from the left-eye image Ai a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S54).
  • The template image generation unit 135 generates a template image TBi from the right-eye image Bi by the same method as in step S12 (step S34).
  • As described above, the possibility of losing the subject during tracking can be reduced even when the subject moves or its orientation changes.
  • the subject tracking process is performed on the left-eye image A.
  • the subject tracking process may be performed on the right-eye image B, or the subject tracking process may be performed on both the left-eye image A and the right-eye image B. Processing may be performed.
  • the right eye image Bi is converted into the template image TB (i-1) and the combined template image TM (i-1) -1 to the same method as shown in FIG. Search with TM (i-1) -5.
  • In the present embodiment, when a plurality of results share the highest similarity (YES in step S48), the portion similar to the template image TA(i-1) or to the composite template image closest to TA(i-1) is set as the position of the subject Z (step S50); alternatively, when searches with a plurality of templates succeed (YES in step S46), that portion may be set as the position of the subject Z without calculating the similarity at all.
  • In the present embodiment, the composite template images TM(i-1)-1 to TM(i-1)-5 are generated from the template images TA(i-1) and TB(i-1) by image composition processing, but the method of generating a composite template is not limited to this. For example, after the feature points and corresponding points of TA(i-1) and TB(i-1) are aligned, the difference between the two may be extracted and the pixels that differ in luminance or hue masked, so that a template containing only the portion common to TA(i-1) and TB(i-1) is generated as the composite template.
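  • A sketch of this masking variant, assuming TA(i-1) and TB(i-1) are already aligned and equal-sized; the luminance and hue thresholds are illustrative, and hue wrap-around is ignored for brevity:

```python
import cv2
import numpy as np

def common_part_template(template_a, template_b, lum_thresh=20, hue_thresh=10):
    """Mask out pixels whose luminance or hue differs between the two
    aligned templates, keeping only their common part as the composite."""
    hsv_a = cv2.cvtColor(template_a, cv2.COLOR_BGR2HSV)
    hsv_b = cv2.cvtColor(template_b, cv2.COLOR_BGR2HSV)
    dv = cv2.absdiff(hsv_a[..., 2], hsv_b[..., 2])   # luminance difference
    dh = cv2.absdiff(hsv_a[..., 0], hsv_b[..., 0])   # hue difference
    common = (dv < lum_thresh) & (dh < hue_thresh)
    out = template_a.copy()
    out[~common] = 0                                  # differing pixels masked
    return out
```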
  • The composite template image to be generated is not limited to an image in an intermediate state between the template image TA(i-1) and the template image TB(i-1), that is, to an image generated by interpolation. An image outside TA(i-1) or TB(i-1) (that is, an image of the subject as seen from a viewpoint outside the right imaging system 12 or the left imaging system 13) may also be generated as a composite template image by extrapolation.
  • In the present embodiment, the left-eye image Ai is searched using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5, looking for portions similar to them (step S42), but the left-eye image Ai may instead be searched using only the composite template images TM(i-1)-1 to TM(i-1)-5. However, since a composite template image may be coarse, it is desirable to search with both the template images and the composite template images where possible. The left-eye image Ai may also be searched using the template images TA(i-1) and TB(i-1) together with the composite template images TM(i-1)-1 to TM(i-1)-5; in this case the processing takes longer, but the subject can be tracked reliably.
  • In the present embodiment, the left-eye image Ai is searched using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5 from the start, but the left-eye image Ai may first be searched using the template images TA(i-1) and TB(i-1), with the composite template images used for the search only when the subject is lost. This variation is described below with reference to FIG. 12.
  • FIG. 12 is a flowchart showing a flow of processing for tracking the subject Z with respect to the image A for the left eye continuously photographed at a predetermined frame rate. This process is controlled by the CPU 110. A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • Steps S10 to S22 are the same as in the first embodiment, so the description starts from step S60.
  • The subject search unit 136 determines whether the target was lost in the search of step S20, that is, whether a portion similar to the template images TA(i-1) and TB(i-1) was found as a result of searching the left-eye image Ai with them (step S60). If the target has not been lost (NO in step S60) and the search of the left-eye image Ai is complete (YES in step S22), the subject search unit 136 determines whether the result of searching the left-eye image Ai with TA(i-1) and the result of searching it with TB(i-1) are the same (step S24). Steps S24 to S30 are the same as in the first embodiment, and their description is omitted. Thereafter, the process proceeds to step S74.
  • If the target has been lost (YES in step S60), the composite template image generation unit 139 generates the composite template images TM(i-1)-1 to TM(i-1)-5 from the template images TA(i-1) and TB(i-1) (step S40). The subject search unit 136 then searches the left-eye image Ai using the composite template images TM(i-1)-1 to TM(i-1)-5, looking for portions similar to them (step S62).
  • The subject search unit 136 determines whether the search of the left-eye image Ai has ended (step S64); if not (NO in step S64), step S64 is performed again. When it has ended (YES in step S64), the subject search unit 136 determines whether searches with a plurality of templates succeeded, that is, whether search results were obtained with a plurality of templates (step S66). If not (NO in step S66), there is only one search result for the subject Z, so the subject search unit 136 inputs that search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S74. If searches with a plurality of templates succeeded (YES in step S66), results were obtained with a plurality of templates, so for each template image that produced a result the subject search unit 136 calculates the similarity between that template image and its search result, and then determines whether a plurality of results share the highest similarity (step S68). If so (YES in step S68), the subject search unit 136 sets the portion similar to the composite template image closest to the template image TA(i-1) as the position of the subject Z (step S70). If not (NO in step S68), the subject search unit 136 sets the search result with the highest similarity as the position of the subject Z (step S72). The subject search unit 136 inputs the position of the subject Z set in step S70 or S72 and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S74.
  • The template image generation unit 135 extracts from the left-eye image Ai, based on the position of the subject Z set in steps S22, S28, S30, S62, S70, and S72, a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S74). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image.
  • As described above, tracking the subject with a plurality of templates representing different orientations as keys further reduces the possibility of losing the subject during tracking, even when the subject moves or its orientation changes. Moreover, since the composite template images are created, and a search with them performed, only when the search with the ordinary template images fails, no unnecessary processing is carried out and the processing time can be shortened.
  • The first embodiment of the present invention performs subject tracking by searching for the subject in at least one of the right-eye image and the left-eye image, using a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image; however, subject tracking may fail when the entire tracking target is hidden by an obstruction. The third embodiment addresses this case.
  • The compound-eye digital camera 3 according to the third embodiment is described below. Parts that are the same as in the first embodiment are denoted by the same reference numerals, and their description is omitted.
  • FIG. 13 is a block diagram showing the main internal configuration of the compound-eye digital camera 3.
  • The compound-eye digital camera 3 mainly includes a CPU 110, operation means (release switch 20, MENU/OK button 25, cross button 26, etc.) 112, SDRAM 114, VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, video encoder 134, template image generation unit 135, subject search unit 136, media controller 137, sound input processing unit 138, recording medium 140, template image regeneration unit 141, focus lens driving units 142 and 143, zoom lens driving units 144 and 145, aperture driving units 146 and 147, and timing generators (TG) 148 and 149.
  • When the tracking target subject is lost in one of the left-eye image A and the right-eye image B, the template image regeneration unit 141 generates, from the image in which the tracking target subject was still found, a template image for searching the image in which it was lost. That is, based on the search result in the image where the tracking target was found, a part of that image is extracted to generate a new template image. The subject search unit 136 uses the template image generated by the template image regeneration unit 141 to search the image in which the tracking target was lost. Details of the processing of the template image regeneration unit 141 are described later.
  • FIG. 14 is a flowchart showing a flow of processing for tracking the subject Z with respect to the image A for the left eye continuously photographed at a predetermined frame rate. This process is controlled by the CPU 110. A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • The left-eye image A taken immediately before the processing target frame, in this case the left-eye image A0 (see FIG. 15) taken immediately before the process of tracking the subject Z starts, is input to the template image generation unit 135. The template image generation unit 135 generates a template image TA0 (see FIG. 16) by extracting from the left-eye image A0 a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized (step S10). Likewise, the right-eye image B photographed immediately before the processing target frame, in this case the right-eye image B0 (see FIG. 15) photographed immediately before the tracking process starts, or a right-eye image B whose photographing timing is as close as possible to the processing target frame, is input to the template image generation unit 135, and the template image generation unit 135 generates a template image TB0 (see FIG. 16) from the right-eye image B0 by the same method as in step S10 (step S12).
  • In step S14, photographing and processing of the first image is started; hereinafter, i denotes a positive integer (the frame number).
  • If the search of the left-eye image Ai succeeds, the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S100. If the search fails, it is conceivable that the tracking target subject Z is hidden by an obstruction (here, a car C, another subject located in front of the subject Z) and cannot be seen.
  • In that case, the subject search unit 136 acquires the generated template image TB(i-1) from the template image generation unit 135. Since the right-eye image Bi acquired in step S16 has been input to the template image generation unit 135, the subject search unit 136 also acquires the right-eye image Bi from the template image generation unit 135 and searches it with TB(i-1). The subject search unit 136 then determines whether the search of the right-eye image Bi succeeded (step S86). If it failed, the subject search unit 136 performs tracking error processing (step S101); as the tracking error processing, for example, a message indicating a tracking error may be displayed on the monitor 16, but the processing is not limited to this.
  • If the search of the right-eye image Bi succeeds, the subject search unit 136 inputs the search result to the template image regeneration unit 141, and the template image regeneration unit 141 extracts a part of the right-eye image Bi to generate a template image TBi-L1 (step S88). The position of the template image TBi-L1 is the position estimated, from the positional relationship between the subject and the compound-eye digital camera, to be close to the position of the subject Z in the left-eye image A.
  • Suppose, for example, that the tracking target subject Z is found in step S84. The template image regeneration unit 141 sets a predetermined region including the subject Z as the template image TB1 (see the dotted lines in FIGS. 17 and 18); like the template image TB0, TB1 is extracted as a rectangular region with a margin of several pixels around the contour of the subject Z. When the subject Z is hidden by an obstruction, the obstruction is located on the left side of the subject Z in the viewpoint image to the right of the viewpoint being tracked. The template image regeneration unit 141 therefore extracts, from the right-eye image B1, the region moved left from the position of the template image TB1 by the width of TB1, and generates a template image TB1-L1 (see the dotted lines in FIGS. 17 and 18). The size of TB1-L1 is the same as that of TB1.
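  • A sketch of this regeneration rule: given the box where the subject was found in the right-eye image, take equally sized boxes shifted left by one and two template widths. The clipping at the image border is an added assumption:

```python
import numpy as np

def regenerate_shifted_templates(image_b: np.ndarray, bbox, shifts=(1, 2)):
    """Yield TBi-L1, TBi-L2, ...: boxes moved left of the found subject
    by one, two, ... template widths, clipped to the image."""
    x, y, w, h = bbox
    for n in shifts:
        x_shift = max(0, x - n * w)   # same size as TBi, shifted left
        yield image_b[y:y + h, x_shift:x_shift + w].copy()
```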
  • The subject search unit 136 searches the left-eye image Ai using the template image TBi-L1 generated by the template image regeneration unit 141 in step S88 (step S90), and determines whether the search succeeded (step S92). If it succeeded (YES in step S92), the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S100.
  • If the search of the left-eye image Ai fails (NO in step S92), the search result is input to the template image regeneration unit 141, which, in the same manner as in step S88, extracts the region moved left from the position of the template image TBi-L1 by the width of TBi-L1 and generates a template image TBi-L2 (step S94). In the example above, a template image TB1-L2 (see the dotted line in FIG. 17) is generated from the right-eye image B1.
  • The subject search unit 136 searches the left-eye image Ai using the template image TBi-L2 generated by the template image regeneration unit 141 in step S94 (step S96), and determines whether the search succeeded (step S98). If it failed (NO in step S98), the subject search unit 136 performs tracking error processing (step S101). If it succeeded (YES in step S98), the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135, and the process proceeds to step S100.
  • If the search succeeded, the template image generation unit 135 extracts from the left-eye image Ai, based on the position of the subject Z set in step S80, a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S100). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image. If it was determined in step S82 that the search failed, the template image generation unit 135 sets the template image TA(i-1) used in step S80 as the template image TAi (step S100); that is, the template image in which the tracking target subject Z was last found continues to be used for the next frame. In this way, even when the subject Z is hidden behind the car C in front and cannot be seen (see FIG. 17), the subject Z can be found again once the car C moves and the subject Z becomes visible.
  • As described above, when the subject is hidden, a search is performed at the position where the tracking target is estimated to be, based on the positional relationship between the subject and the compound-eye digital camera, so the possibility of losing sight of the subject can be reduced.
  • In the present embodiment, the template image TA(i-1) is used to search the left-eye image Ai (step S80), and only if this fails is the left-eye image Ai searched with the template image TBi-L1 generated by the template image regeneration unit 141; instead of step S80, however, the left-eye image Ai may be searched using the template images TA(i-1) and TB(i-1). In that case, the left-eye image Ai is searched with TA(i-1) and TB(i-1), and if this fails it is searched with the template image TBi-L1 generated by the template image regeneration unit 141. The next left-eye image Ai+1 may then again be searched using the template images TA(i-1) and TB(i-1).
  • In the present embodiment, the tracking error processing (step S102) is performed when the subject is lost; instead, when the subject is lost, the process may proceed to step S36 without performing the tracking error processing, and the next frame may be processed.
  • FIG. 20 is a flowchart showing the flow of processing for tracking the subject Z with respect to the right-eye image B continuously photographed at a predetermined frame rate.
  • This process is controlled by the CPU 110.
  • A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • Steps that are the same as those already described are denoted by the same reference numerals, and their description is omitted.
  • The subject search unit 136 acquires the generated template image TB(i-1) from the template image generation unit 135, searches the right-eye image Bi using TB(i-1), and looks for a portion similar to TB(i-1) in the right-eye image Bi (step S102). The subject search unit 136 then determines whether the search of the right-eye image Bi succeeded (step S104). If it succeeded (YES in step S104), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135, and the process proceeds to step S122. If it failed, the subject search unit 136 acquires the generated template image TA(i-1) and the left-eye image Ai from the template image generation unit 135, searches the left-eye image Ai using TA(i-1), and looks for a portion similar to TA(i-1) in the left-eye image Ai (step S106).
  • The subject search unit 136 then determines whether the search of the left-eye image Ai succeeded (step S108). If it failed (NO in step S108), the subject search unit 136 performs tracking error processing (step S101).
  • If the search of the left-eye image Ai succeeded (YES in step S108), the subject search unit 136 inputs the search result to the template image regeneration unit 141, and the template image regeneration unit 141 extracts a part of the left-eye image Ai to generate a template image TAi-R1 (step S110). The position of the template image TAi-R1 is the position estimated, from the positional relationship between the subject and the compound-eye digital camera, to be close to the position of the subject Z in the right-eye image B. When the tracking target subject Z is hidden by an obstruction, the obstruction is located on the right side of the subject Z in the viewpoint image to the left of the viewpoint being tracked; that is, when the subject Z is hidden in the right-eye image B, the obstruction lies to the right of the subject Z in the left-eye image A. The template image regeneration unit 141 therefore extracts, from the left-eye image Ai, the region moved right from the position of the template image TAi by the width of TAi, and generates the template image TAi-R1. The size of TAi-R1 is the same as that of TAi.
  • The subject search unit 136 searches the right-eye image Bi using the template image TAi-R1 generated by the template image regeneration unit 141 in step S110 (step S112), and determines whether the search succeeded (step S114). If it succeeded (YES in step S114), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135, and the process proceeds to step S122. If it failed (NO in step S114), the search result is input to the template image regeneration unit 141, which, in the same manner as in step S110, extracts the region moved right from the position of the template image TAi-R1 by the width of TAi-R1 and generates a template image TAi-R2 (step S116). The subject search unit 136 searches the right-eye image Bi using the template image TAi-R2 generated in step S116 (step S118), and determines whether the search succeeded (step S120). If it failed (NO in step S120), the subject search unit 136 performs tracking error processing (step S101); if it succeeded (YES in step S120), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135, and the process proceeds to step S122.
  • The template image generation unit 135 extracts from the right-eye image Bi, based on the position of the subject Z set in step S102, a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TBi (step S122). If it was determined in step S104 that the search failed, the template image generation unit 135 sets the template image TB(i-1) used in step S102 as the template image TBi (step S122).
  • As described above, performing a search at the position where the tracking target is estimated to be, based on the positional relationship between the subject and the compound-eye digital camera, further reduces the possibility of losing sight of the subject.
  • In the present embodiment, the template image regeneration unit 141 extracts the region moved left from the position of the template image TBi by the width of TBi to generate the template image TBi-L1, but the method of determining the position of TBi-L1 is not limited to this. The region moved left by half the width of TBi may be used as the position of TBi-L1; a region near TBi that is covered by a change in luminance or color may be used; or a region with many edges and contours may be used as the position of TBi-L1. The position of the subject Z may also be estimated from the left-eye image A and the right-eye image B and used as the position of TBi-L1.
  • An example of a method for estimating the position of the subject Z is as follows. The position (x0+dx, y0+dy) of the template image TA0 in the left-eye image A0 is calculated with reference to the position (x0, y0) of the template image TB0 in the right-eye image B0. This gives the difference (dx, dy) between the position of TB0 in the right-eye image B0 and the position of TA0 in the left-eye image A0. Accordingly, if the position of the template image TB1 in the right-eye image B1 is (x1, y1), the position of the tracking target subject Z in the left-eye image A1 can be estimated as (x1+dx, y1+dy), and a template image may be generated at that position. As a result, the template image regeneration unit 141 needs to create a template image only once, and the time required to find the subject again when it has been lost can be shortened.
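  • A sketch of the offset bookkeeping described above, where positions are template top-left corners:

```python
def estimate_position_in_left(pos_b0, pos_a0, pos_b1):
    """Estimate where subject Z sits in left-eye image A1.

    pos_b0: (x0, y0)         -- TB0 in right-eye image B0
    pos_a0: (x0+dx, y0+dy)   -- TA0 in left-eye image A0
    pos_b1: (x1, y1)         -- TB1 in right-eye image B1
    """
    dx = pos_a0[0] - pos_b0[0]
    dy = pos_a0[1] - pos_b0[1]
    return (pos_b1[0] + dx, pos_b1[1] + dy)   # (x1+dx, y1+dy)
```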
  • In the present embodiment, the template image regeneration unit 141 generates the template images TBi-L1 and TBi-L2 and performs error processing when the search with TBi-L2 fails, but the number of template images generated by the template image regeneration unit 141 is not limited to two and can be set arbitrarily.
  • The first embodiment of the present invention performs subject tracking by searching for the subject in at least one of the right-eye image and the left-eye image, using a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image; however, subject tracking may fail because of the background or foreground included in a template image. The fourth embodiment therefore removes the background and foreground from the template images generated by extracting parts of the right-eye and left-eye images, and performs subject tracking using the result.
  • The compound-eye digital camera according to the fourth embodiment is described below. Parts that are the same as in the first embodiment are denoted by the same reference numerals, and their description is omitted.
  • FIG. 23 is a block diagram showing the main internal configuration of the compound-eye digital camera of the fourth embodiment.
  • This camera mainly includes a CPU 110, operation means (release switch 20, MENU/OK button 25, cross button 26, etc.) 112, SDRAM 114, VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing unit 130, compression/decompression processing unit 132, stereoscopic image generation unit 133, video encoder 134, subject search unit 136, media controller 137, sound input processing means 138, recording medium 140, focus lens driving means 142 and 143, zoom lens driving means 144 and 145, aperture driving means 146 and 147, timing generators (TG) 148 and 149, and a template image generation unit 150.
  • The template image generation unit 150 extracts a predetermined region (for example, a rectangle) including the tracking target subject Z from each of the left-eye image A and the right-eye image B, and generates template images. The template image generation unit 150 also generates template images with the background and foreground removed. Its processing is described in detail later.
  • FIG. 26 is a flowchart showing a flow of processing for tracking the subject Z with respect to the image A for the left eye continuously photographed at a predetermined frame rate. This process is controlled by the CPU 110. A program for causing the CPU 110 to execute this imaging process is stored in a program storage unit in the CPU 110.
  • The left-eye image A taken immediately before the processing target frame, in this case the left-eye image A0 taken immediately before the process of tracking the subject Z starts, is input to the template image generation unit 150. Likewise, the right-eye image B taken immediately before the processing target frame, in this case the right-eye image B0, is input to the template image generation unit 150.
  • The template image generation unit 150 generates parallax maps PA0 and PB0 (see FIG. 25) from the left-eye image A0 and the right-eye image B0 (step S130). A parallax map represents the amount of displacement between the left-eye image A and the right-eye image B; from it, the distance of each subject included in the image becomes clear. In FIG. 25, near distances are rendered with high density and far distances with low density. Various known methods can be used to generate the parallax map.
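  • The patent leaves the generation method open ("various known methods"); one common choice is OpenCV's block-matching stereo, sketched here for a rectified pair:

```python
import cv2

def parallax_map(left_bgr, right_bgr, num_disp=64, block=15):
    """Dense disparity (parallax) map from a rectified stereo pair.

    StereoBM returns fixed-point disparities scaled by 16.
    """
    gl = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    gr = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    stereo = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    return stereo.compute(gl, gr).astype(float) / 16.0
```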
  • The template image generation unit 150 extracts, from the left-eye image A captured immediately before the processing target frame (here, the left-eye image A0), a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TA0 (step S10).
  • The template image generation unit 150 also extracts from the parallax map PA0 the same area as that of the template image TA0 in the left-eye image A0, and generates a template parallax map TPA0. It then marks as invalid the regions whose parallax is about 10 pixels or more away from the parallax of the subject Z on the far side, that is, the background, and the regions about 10 pixels or more away on the near side, that is, the foreground (step S132). In the case of the template image TA0 in FIG. 26, part of a car is included as background; no foreground is included here, but a tree, a telephone pole, or the like could appear as foreground. The value of ±10 pixels follows from the idea that the parallax within the subject Z is small while the parallax between the foreground or background and the subject Z is large; as long as this idea is respected, the threshold is not limited to ±10 pixels.
  • The template image generation unit 150 then generates a template image SA0 by removing the background and foreground from the template image TA0 (step S134). In step S134, the template image generation unit 150 generates, from the template parallax map TPA0, mask data that masks the invalid regions set in step S132, that is, everything outside the range of parallax within ±10 pixels of the parallax at the approximate center of TPA0. Using the template image TA0 and this mask data, it removes the background and foreground from TA0 and generates the template image SA0.
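  • A sketch of steps S132 and S134 under the assumptions above: the subject's parallax is read from the center of the template parallax map, and pixels outside the ±10-pixel parallax band are masked out:

```python
import numpy as np

def remove_background_foreground(template, template_parallax, band=10):
    """Build SA0 from TA0: keep only pixels whose parallax is within
    `band` pixels of the subject's parallax (approximate center of TPA0)."""
    h, w = template_parallax.shape
    subject_par = template_parallax[h // 2, w // 2]
    valid = np.abs(template_parallax - subject_par) < band
    out = template.copy()
    out[~valid] = 0        # background and foreground pixels masked
    return out
```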
  • In step S14, photographing and processing of the first image is started; hereinafter, i denotes a positive integer (the frame number).
  • The left imaging system 13 captures the left-eye image Ai (step S16). Since the subject tracking process is performed on the left-eye image Ai, the captured left-eye image Ai is input to the subject search unit 136. At the same time, the left-eye image Ai is input to the video encoder 134, sequentially converted into a display signal format, and output to the monitor 16.
  • The right imaging system 12 captures the right-eye image Bi (step S18). Since the subject tracking process is not performed on the right-eye image Bi, the captured right-eye image Bi is input to the template image generation unit 135. At the same time, the right-eye image Bi is input to the video encoder 134, sequentially converted into a display signal format, and output to the monitor 16.
  • The subject search unit 136 determines whether the search of the left-eye image Ai has been completed (step S138); if not (NO in step S138), step S138 is performed again. When the search is complete (YES in step S138), the subject search unit 136 determines whether the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching it with the background- and foreground-removed template image SA(i-1) are the same (step S140). If they are the same (YES in step S140), the portion similar to TA(i-1) and SA(i-1) is set as the position of the subject Z, and the subject search unit 136 inputs the search result and the left-eye image Ai to the template image generation unit 135; the process then proceeds to step S148.
  • If the results differ (NO in step S140), the subject search unit 136 calculates the similarity between the result of searching the left-eye image Ai with TA(i-1) and TA(i-1) itself, and the similarity between the result of searching with the background- and foreground-removed template image SA(i-1) and SA(i-1) itself, and determines whether the former is higher than the latter (step S142). If the similarity for TA(i-1) is higher (YES in step S142), the result of searching the left-eye image Ai with TA(i-1), that is, the portion similar to TA(i-1), is set as the position of the subject Z (step S144); otherwise (NO in step S142), the result of searching with SA(i-1), that is, the portion similar to the background- and foreground-removed template image SA(i-1), is set as the position of the subject Z (step S146). In either case, the subject search unit 136 inputs the position of the subject Z and the left-eye image Ai to the template image generation unit 135.
  • The template image generation unit 135 extracts from the left-eye image Ai a region that includes the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S148). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image.
  • In the present embodiment, the left-eye image Ai is searched using the template image TA(i-1) and the background- and foreground-removed template image SA(i-1), but it may be searched using only the background- and foreground-removed template image SA(i-1). Alternatively, the left-eye image Ai may first be searched using the template images TA(i-1) and TB(i-1), and only if this search fails may it be searched using the background- and foreground-removed template image SA(i-1). The left-eye image Ai may also be searched using a template image obtained by removing the background and foreground from the template image TB(i-1).
  • In the embodiments above, the case of capturing a live view image was described as an example, but the present invention can also be applied to movie recording. The only difference between live view shooting and movie shooting is that, while the continuously captured right-eye image B and left-eye image A are not recorded in the case of a live view image, in the case of movie shooting they are recorded on the recording medium 54. Since the process of recording the continuously captured right-eye image B data and left-eye image A data on the recording medium 54 is well known, its description is omitted.
  • In the embodiments above, the case of capturing two viewpoint images, the left-eye image A and the right-eye image B, was described as an example, but the present invention can also be applied when three or more viewpoint images are captured. In that case, the subject tracking process may be performed on at least one of the three or more viewpoint images. If subject tracking is performed on all viewpoint images, it becomes possible to optimize not only focus, zoom, and exposure but also the display, for example by overlaying a frame on the tracked subject Z or highlighting it. Since various known methods can be used for frame display and highlighting, their description is omitted.
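  • A sketch of the frame-overlay display, assuming the tracked position and template size are known:

```python
import cv2

def draw_tracking_frame(view, top_left, size, color=(0, 255, 0)):
    """Overlay a rectangle on the tracked subject Z in one viewpoint image."""
    x, y = top_left
    w, h = size
    cv2.rectangle(view, (x, y), (x + w, y + h), color, thickness=2)
    return view
```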
  • In the embodiments above, subject tracking is performed while the live view image is being captured, and AE metering and AF control are performed on the tracked subject when the S1 ON signal is input; that is, subject tracking is performed during live view shooting, and AE metering and AF control for the still image to be captured are then performed on the tracked subject. AE metering and AF control may instead be performed continuously on the tracked subject. Similarly, the frame display and highlighting may be performed continuously while the live view image is captured, and a subject may be searched for in a still-image viewpoint image using a template image extracted from a live-view viewpoint image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A region including a subject (Z) and having a size at which the shape of the subject (Z) can be identified is extracted from a left-eye image (A (i-1)) to create a template image (TA (i-1)) (step S10), and a predetermined region is extracted from a right-eye image (B (i-1)) by the same method to create a template image (TB (i-1)) (step S12). The created template images (TA (i-1), TB (i-1)) are used to search a left-eye image (Ai) for the subject (step S20). If the result of searching the left-eye image (Ai) by means of the template image (TA (i-1)) is not the same as the result of searching the left-eye image (Ai) by means of the template image (TB (i-1)), the result having the highest degree of similarity is employed (steps S24 to S30).

Description

Imaging apparatus, stereoscopic image imaging method, and program
The present invention relates to an imaging apparatus, a stereoscopic image imaging method, and a program, and more particularly to an imaging apparatus, a stereoscopic image imaging method, and a program capable of acquiring a plurality of viewpoint images obtained by capturing the same subject from a plurality of viewpoints.
Acquiring a plurality of images and detecting or tracking a subject included in them is commonly done. For example, Patent Document 1 describes acquiring two images taken by a stereo camera and detecting a head pose from the images using a database created in advance.
Patent Document 2 describes acquiring a plurality of images, detecting an object using distance information, extracting as a template information other than the distance information, such as a grayscale image or a ternary edge image, from the position of the object detected using the distance information, and tracking the object using this template.
Patent Document 3 is an invention for tracking a stationary subject in a time-series group of images. Specifically, a plurality of images are acquired, an initial tracking area (corresponding to a template) represented by boundary elements and image features is set in an image, a tracking area is detected in the next frame based on the initial tracking area, and a further tracking area is detected in the frame after that based on the detected one. That is, in the invention of Patent Document 3, the tracking area is updated successively as the subject is tracked: the shape of the current frame's tracking area is estimated from the boundary elements, a plurality of candidates is created by deforming the previous frame's tracking area according to the estimate, and the tracking area of the current frame is determined by comparing these candidates with the frame under examination.
Patent Document 1: JP 2009-169958 A
Patent Document 2: JP 11-252587 A
Patent Document 3: JP 2001-101419 A
However, the invention described in Patent Document 1 requires that the database used for detecting the head pose be created in advance, which takes time and effort. Moreover, the invention described in Patent Document 1 cannot detect a head that is not registered in the database; that is, when a photographer is shooting an arbitrary subject, it cannot satisfy the desire to track an object included in that image.
The invention described in Patent Document 2 has no such database-related problem. However, tracking is highly likely to fail when, for example, the orientation of the object to be detected changes.
The invention described in Patent Document 3 can solve the problem of losing sight of the tracking target when, for example, the orientation of the object changes. However, the invention described in Patent Document 3 targets a stationary subject and cannot be applied when the subject moves, because for a moving subject the shape of the tracking area in the current frame cannot be estimated from the boundary elements.
The present invention has been made in view of these circumstances, and an object thereof is to provide an imaging apparatus, a stereoscopic image capturing method, and a program that can reduce the possibility of losing sight of a subject during tracking even when the subject moves or changes orientation.
To achieve the above object, an imaging apparatus according to one aspect of the present invention comprises: a first imaging means and a second imaging means for acquiring two viewpoint images obtained by photographing the same subject from two viewpoints; a first template image generation means for extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means to generate a first template image, and extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging means to generate a second template image; and a search means for searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
According to this aspect of the present invention, a first template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means, and a second template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging means. Then, the subject to be tracked is searched for, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based. In other words, the subject is tracked using images of the subject seen from different directions as keys. Consequently, even when the subject moves or changes orientation, the possibility of losing sight of it during tracking is reduced. Moreover, because the subject tracking processing uses template images extracted from the parallax images, the tracking is accurate, and as a result the amount of computation can be reduced.
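By way of illustration only, the dual-template search described above can be sketched as follows. This is a minimal sketch assuming OpenCV-style normalized template matching; the function names, window parameters, and choice of matching score are illustrative assumptions, not the implementation of the embodiments.

    import cv2

    def make_template(image, x, y, w, h):
        # Extract a partial region just large enough for the subject's
        # shape to be identified (illustrative cropping only).
        return image[y:y + h, x:x + w]

    def search_with_templates(frame, templates):
        # Match each template against the new frame from the first imaging
        # means and keep the location with the highest similarity score.
        best_score, best_loc = -1.0, None
        for tpl in templates:
            result = cv2.matchTemplate(frame, tpl, cv2.TM_CCOEFF_NORMED)
            _, score, _, loc = cv2.minMaxLoc(result)
            if score > best_score:
                best_score, best_loc = score, loc
        return best_loc, best_score

    # template_a comes from the left-eye image and template_b from the
    # right-eye image, so the two keys show the subject from slightly
    # different directions:
    # loc, score = search_with_templates(next_left_frame,
    #                                    [template_a, template_b])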
In an imaging apparatus according to another aspect of the present invention, a second template image generation means is provided that, when the search means fails to find the subject to be tracked, generates a composite template image from the first template image and the second template image by image synthesis processing; when the composite template image is generated by the second template image generation means, the search means searches for the subject to be tracked using the composite template image.
According to this aspect, when the subject to be tracked is not found by the search means, a composite template image is generated from the first template image and the second template image by image synthesis processing, and the subject to be tracked is searched for using this composite template image. This further reduces the possibility of losing sight of the subject during tracking.
In an imaging apparatus according to still another aspect of the present invention, the composite template image is an image in an intermediate state between the first template image and the second template image.
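As a hedged illustration only, such an intermediate-state composite could be approximated by weighted blending of the two templates, assuming both have been cropped to the same size; the actual image synthesis processing of the embodiments may be more elaborate.

    import cv2

    def composite_template(tpl_a, tpl_b, alpha=0.5):
        # Blend the first and second template images; alpha = 0.5 yields
        # an image roughly halfway between the two viewpoints.
        return cv2.addWeighted(tpl_a, alpha, tpl_b, 1.0 - alpha, 0)

    # Several intermediate states, as used by the aspect that generates a
    # plurality of composite templates, can be produced by varying alpha:
    # candidates = [composite_template(a, b, w) for w in (0.75, 0.5, 0.25)]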
In an imaging apparatus according to still another aspect of the present invention, when a plurality of search results are obtained, the search means calculates, for each of the plurality of search results, the degree of similarity between the template image that yielded the result and the result found using that template image, and takes the result found using the template image with the highest calculated similarity as the subject to be tracked.
According to this aspect, when a plurality of search results are obtained, the degree of similarity between each template image that yielded a result and the result found using it is calculated, and the result found using the template image with the highest calculated similarity is taken as the subject to be tracked. This increases the accuracy of subject tracking.
In an imaging apparatus according to still another aspect of the present invention, when a plurality of search results are obtained and a result has been obtained using the first template image, the search means takes the result found using the first template image as the subject to be tracked. This increases the accuracy of subject tracking.
According to this aspect, when a search result is obtained using the first template image, the result found using the first template image is taken as the subject to be tracked. In other words, a subject similar to the subject extracted from the same viewpoint image becomes the subject to be tracked, which increases the accuracy of subject tracking.
In an imaging apparatus according to still another aspect of the present invention, the second template image generation means generates a plurality of types of composite template images, and when a plurality of search results are obtained and no result has been obtained using the first template image, the search means takes as the subject to be tracked the result found using the composite template image closest to the first template image among the plurality of types of generated composite template images.
According to this aspect, when no search result is obtained using the first template image, the result found using the composite template image closest to the first template image among the plurality of types of generated composite template images is taken as the subject to be tracked. This reduces the possibility of losing sight of the subject during tracking while maintaining accuracy.
In an imaging apparatus according to still another aspect of the present invention, a third template image generation means is provided that, when the search means fails to find the subject to be tracked, extracts from the viewpoint image captured by the second imaging means a region shifted in the left-right direction by an arbitrary amount from the region extracted when generating the second template image, to generate a third template image; the search means searches for the subject to be tracked using the generated third template image.
According to this aspect, when the subject to be tracked is not found, a region shifted in the left-right direction by an arbitrary amount from the region extracted when generating the second template image is extracted from the viewpoint image captured by the second imaging means to generate a third template image, and the subject to be tracked is searched for using this third template image. As a result, even when the subject to be tracked is blocked by another subject, the possibility of losing sight of it can be reduced.
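Purely as a sketch (NumPy-style slicing; the clamping policy at the image border is an assumption), the third template can be cut from the second viewpoint image with its extraction window shifted horizontally:

    def shifted_template(image, x, y, w, h, shift):
        # Move the extraction window left or right by `shift` pixels,
        # clamp it to the image border, and crop the shifted region.
        x2 = max(0, min(image.shape[1] - w, x + shift))
        return image[y:y + h, x2:x2 + w]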
In an imaging apparatus according to still another aspect of the present invention, when no search result is obtained by the search means using the third template image, the third template image generation means extracts, from the viewpoint image captured by the second imaging means, a region shifted in the left-right direction by a predetermined amount from the region extracted when generating the third template image, to generate a fourth template image, and the search means searches for the subject to be tracked using the generated fourth template image.
According to this aspect, when no search result is obtained using the third template image, a region shifted in the left-right direction by a predetermined amount from the region extracted when generating the third template image is extracted from the viewpoint image captured by the second imaging means to generate a fourth template image, and the subject to be tracked is searched for using this fourth template image. This further reduces the possibility of losing sight of the subject.
In an imaging apparatus according to still another aspect of the present invention, the third template image generation means estimates, based on the two viewpoint images, the position where the subject to be tracked is likely to be, and determines the arbitrary amount accordingly. As a result, the template image only needs to be created once, and the time required for tracking the subject can be shortened.
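One plausible way to derive that amount, sketched here under the assumption that the subject's horizontal disparity between the two viewpoint images approximates the required shift (the matching-based estimate below is an illustrative choice, not the method prescribed by the embodiments):

    import cv2

    def estimate_shift(other_view, template, x_known):
        # Locate the subject in the other viewpoint image and use the
        # horizontal offset from its known position as the shift amount.
        result = cv2.matchTemplate(other_view, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (x_found, _) = cv2.minMaxLoc(result)
        return x_found - x_known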
In an imaging apparatus according to still another aspect of the present invention, the first imaging means and the second imaging means continuously acquire the two viewpoint images, and when the search means has found the subject to be tracked using a template image generated by the third template image generation means, it searches the viewpoint images subsequently captured by the first imaging means using the first template image and the second template image.
According to this aspect, when the subject to be tracked is not found in the first search but is found in the second or a later search, the first template image and the second template image are used for the viewpoint images subsequently captured by the first imaging means. As a result, when the subject to be tracked has been hidden behind an obstruction in front of it, the subject can be searched for accurately once the obstruction moves and the subject becomes visible again.
An imaging apparatus according to still another aspect of the present invention comprises an automatic exposure control means for performing automatic exposure control based on the subject to be tracked found by the search means, an automatic focus adjustment means for adjusting the focus so that the subject to be tracked found by the search means is in focus, or a zoom control means for adjusting the angle of view based on the subject to be tracked found by the search means.
According to this aspect, automatic exposure control, automatic focus adjustment, or zoom control is performed based on the found subject to be tracked, so that appropriate control can be performed automatically.
An imaging apparatus according to still another aspect of the present invention comprises: a first imaging means and a second imaging means for acquiring two viewpoint images obtained by photographing the same subject from two viewpoints; a template image generation means for generating a first template image by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means; a parallax acquisition means for acquiring parallax from the two viewpoint images; a fourth template image generation means for generating a template image in which the background and foreground have been removed from the first template image based on the parallax acquired by the parallax acquisition means; and a search means for searching for the subject to be tracked, using the template image generated by the fourth template image generation means, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
According to this aspect, a first template image is generated by extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means, parallax is acquired from the two viewpoint images, and a template image in which the background and foreground have been removed from the first template image is generated based on this parallax. Then, the subject to be tracked is searched for, using the generated template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based. As a result, even when the background changes because the subject has moved, or when a foreground object enters in front of the subject, the possibility of losing sight of the subject can be reduced.
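For illustration, a minimal sketch of such background and foreground removal, assuming a per-pixel disparity map aligned with the template region is available and treating the tolerance value as a hypothetical tuning parameter:

    import numpy as np

    def strip_back_and_foreground(template, disparity, subject_disparity, tol=2.0):
        # Keep only pixels whose disparity is close to the subject's;
        # background (farther) and foreground (nearer) pixels are zeroed
        # so they no longer influence the matching score.
        mask = np.abs(disparity - subject_disparity) <= tol
        if template.ndim == 3:
            mask = mask[..., None]
        return template * mask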
In an imaging apparatus according to still another aspect of the present invention, the search means searches for the subject to be tracked using the first template image generated by the template image generation means and the template image generated by the fourth template image generation means.
According to this aspect, the subject to be tracked is searched for using both the template image from which the background and foreground have not been removed and the template image from which they have been removed. This further reduces the possibility of losing sight of the subject.
A stereoscopic image capturing method according to still another aspect of the present invention includes: a step of acquiring, with a first imaging means and a second imaging means, two viewpoint images obtained by photographing the same subject from two viewpoints; a step of extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means to generate a first template image, and extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging means to generate a second template image; and a step of searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
A program according to still another aspect of the present invention causes an arithmetic device to execute: a step of acquiring, with a first imaging means and a second imaging means, two viewpoint images obtained by photographing the same subject from two viewpoints; a step of extracting a partial region including the subject to be tracked from the viewpoint image captured by the first imaging means to generate a first template image, and extracting a partial region including the subject to be tracked from the viewpoint image captured by the second imaging means to generate a second template image; and a step of searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based. A non-transitory computer-readable medium on which this program is recorded is also included in the present invention.
According to the present invention, even when the subject moves or changes orientation, the possibility of losing sight of the subject during tracking can be reduced.
FIG. 1 is a schematic view of a compound-eye digital camera 1 according to a first embodiment of the present invention, in which (a) is a front view and (b) is a rear view.
FIG. 2 is a block diagram showing the electrical configuration of the compound-eye digital camera 1.
FIG. 3 is a diagram showing the positional relationship between a subject and the compound-eye digital camera 1.
FIG. 4 is a flowchart showing the flow of subject tracking processing of the compound-eye digital camera 1.
FIG. 5 is an example of parallax images.
FIG. 6 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 7 is an example of a template image.
FIG. 8 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 9 is a block diagram showing the electrical configuration of a compound-eye digital camera 2 according to a second embodiment of the present invention.
FIG. 10 is a diagram for explaining a method of generating a template image by image synthesis processing.
FIG. 11 is a flowchart showing the flow of subject tracking processing of the compound-eye digital camera 2.
FIG. 12 is a flowchart showing the flow of subject tracking processing in a modification of the compound-eye digital camera 2.
FIG. 13 is a block diagram showing the electrical configuration of a compound-eye digital camera 3 according to a third embodiment of the present invention.
FIG. 14 is a flowchart showing the flow of subject tracking processing of the compound-eye digital camera 3.
FIG. 15 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 16 is an example of a template image.
FIG. 17 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 18 is an example of a template image.
FIG. 19 is a diagram showing the positional relationship between a subject and the compound-eye digital camera 1.
FIG. 20 is a flowchart showing the flow of subject tracking processing of a modification of the compound-eye digital camera 3.
FIG. 21 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 22 is a diagram for explaining a method of creating a template image from parallax images.
FIG. 23 is a block diagram showing the electrical configuration of a compound-eye digital camera 4 according to a fourth embodiment of the present invention.
FIG. 24 is a flowchart showing the flow of subject tracking processing of a modification of the compound-eye digital camera 4.
FIG. 25 is an example of a parallax map.
FIG. 26 is a diagram for explaining a method of creating a template image.
Hereinafter, the best mode for carrying out an imaging apparatus, a stereoscopic image capturing method, and a program according to the present invention will be described in detail with reference to the accompanying drawings.
<First Embodiment>
FIG. 1 is a schematic view of a compound-eye digital camera 1 having a stereoscopic image display device according to the present invention, in which (a) is a front view and (b) is a rear view. The compound-eye digital camera 1 is provided with a plurality of imaging systems (two are illustrated in FIG. 1) and can shoot stereoscopic images of the same subject viewed from a plurality of viewpoints (two viewpoints, left and right, are illustrated in FIG. 1) as well as single viewpoint images (two-dimensional images). The compound-eye digital camera 1 can also record and reproduce not only still images but also moving images and sound.
The camera body 10 of the compound-eye digital camera 1 is formed in a substantially rectangular parallelepiped box shape. As shown in FIG. 1(a), a barrier 11, a right imaging system 12, a left imaging system 13, a flash 14, and a microphone 15 are mainly provided on its front surface, and a release switch 20 and a zoom button 21 are mainly provided on the upper surface of the camera body 10.
On the other hand, as shown in FIG. 1(b), a monitor 16, a mode button 22, a parallax adjustment button 23, a 2D/3D switching button 24, a MENU/OK button 25, a cross button 26, and a DISP/BACK button 27 are provided on the back of the camera body 10.
The barrier 11 is slidably mounted on the front surface of the camera body 10 and is switched between an open state and a closed state by sliding up and down. Normally, as indicated by the dotted line in FIG. 1(a), the barrier 11 is positioned at the upper end, that is, in the closed state, and the objective lenses 12a and 13a and the like are covered by the barrier 11, which protects the lenses and other parts from damage. When the barrier 11 is slid to the lower end, that is, to the open state (see the solid line in FIG. 1(a)), the lenses and the like disposed on the front surface of the camera body 10 are exposed. When a sensor (not shown) recognizes that the barrier 11 is in the open state, the power is turned on by the CPU 110 (see FIG. 2) and photographing becomes possible.
The right imaging system 12, which captures an image for the right eye, and the left imaging system 13, which captures an image for the left eye, acquire two viewpoint images obtained by photographing the same subject from two viewpoints, as shown in FIG. 3. The right imaging system 12 and the left imaging system 13 are optical units including photographing lens groups having bending optical systems, diaphragm-combined mechanical shutters 12d and 13d, and image sensors 122 and 123 (see FIG. 2). The photographing lens groups of the right imaging system 12 and the left imaging system 13 are mainly composed of objective lenses 12a and 13a that take in light from the subject, prisms (not shown) that bend the optical path incident from the objective lenses substantially vertically, zoom lenses 12c and 13c (see FIG. 2), focus lenses 12b and 13b (see FIG. 2), and the like.
The flash 14 is composed of a xenon tube and emits light as necessary, for example when photographing a dark subject or when shooting against backlight.
The monitor 16 is a liquid crystal monitor capable of color display with a common 4:3 aspect ratio, and can display both stereoscopic images and planar images. Although its detailed structure is not illustrated, the monitor 16 is a parallax barrier type 3D monitor having a parallax barrier display layer on its surface. The monitor 16 is used as a user interface display panel when performing various setting operations, and as an electronic viewfinder during image capturing.
The monitor 16 can be switched between a mode for displaying stereoscopic images (3D mode) and a mode for displaying planar images (2D mode). In the 3D mode, a parallax barrier consisting of a pattern in which light transmitting portions and light shielding portions are alternately arranged at a predetermined pitch is generated on the parallax barrier display layer of the monitor 16, and strip-shaped image fragments representing the left and right images are alternately arranged and displayed on the image display surface below it. When used in the 2D mode or as a user interface display panel, nothing is displayed on the parallax barrier display layer, and a single image is displayed as it is on the image display surface below.
The monitor 16 is not limited to the parallax barrier type; a lenticular method, an integral photography method using a microlens array sheet, a holography method using an interference phenomenon, or the like may be adopted. The monitor 16 is also not limited to a liquid crystal monitor; an organic EL display or the like may be adopted.
The release switch 20 is a two-stage stroke switch with a so-called "half press" and "full press". During still image shooting (for example, when the still image shooting mode is selected with the mode button 22 or from the menu), half-pressing the release switch 20 causes the compound-eye digital camera 1 to perform shooting preparation processing, that is, AE (Automatic Exposure), AF (Auto Focus), and AWB (Automatic White Balance), and fully pressing it causes the camera to perform image capturing and recording processing. During movie shooting (for example, when the movie shooting mode is selected with the mode button 22 or from the menu), fully pressing the release switch 20 starts movie shooting, and fully pressing it again ends shooting.
The zoom button 21 is used for zoom operations of the right imaging system 12 and the left imaging system 13, and consists of a zoom tele button 21T for instructing zooming to the telephoto side and a zoom wide button 21W for instructing zooming to the wide-angle side.
The mode button 22 functions as shooting mode setting means for setting the shooting mode of the digital camera 1; depending on its setting position, the shooting mode of the digital camera 1 is set to one of various modes. The shooting modes are divided into a "movie shooting mode" for shooting moving images and a "still image shooting mode" for shooting still images. The "still image shooting mode" includes, for example, an "auto shooting mode" in which the aperture, shutter speed, and the like are set automatically by the digital camera 1, a "face extraction shooting mode" in which a person's face is extracted for shooting, a "sports shooting mode" suitable for photographing moving bodies, a "landscape shooting mode" suitable for photographing scenery, a "night scene shooting mode" suitable for photographing evening and night scenes, an "aperture priority shooting mode" in which the user sets the aperture value and the digital camera 1 automatically sets the shutter speed, a "shutter speed priority shooting mode" in which the user sets the shutter speed and the digital camera 1 automatically sets the aperture value, and a "manual shooting mode" in which the user sets the aperture, shutter speed, and the like.
The parallax adjustment button 23 is a button for electronically adjusting the parallax during stereoscopic image shooting. Pressing the right side of the parallax adjustment button 23 increases the parallax between the image captured by the right imaging system 12 and the image captured by the left imaging system 13 by a predetermined amount, and pressing its left side decreases that parallax by a predetermined amount.
The 2D/3D switching button 24 is a switch for instructing switching between a 2D shooting mode for shooting single viewpoint images and a 3D shooting mode for shooting multi-viewpoint images.
The MENU/OK button 25 is used for calling up various setting screens (menu screens) for the shooting and playback functions (MENU function), as well as for confirming a selection and instructing execution of processing (OK function); all adjustment items of the compound-eye digital camera 1 are set through it. When the MENU/OK button 25 is pressed during shooting, a setting screen for image quality adjustments such as exposure value, hue, ISO sensitivity, and the number of recorded pixels is displayed on the monitor 16; when it is pressed during playback, a setting screen for operations such as erasing images is displayed on the monitor 16. The compound-eye digital camera 1 operates according to the conditions set on this menu screen.
The cross button 26 is a button for setting and selecting various menus or for zooming, and can be pressed in four directions: up, down, left, and right. A function corresponding to the camera's setting state is assigned to the button in each direction. For example, during shooting, a function for switching the macro function ON/OFF is assigned to the left button, and a function for switching the flash mode is assigned to the right button. A function for changing the brightness of the monitor 16 is assigned to the up button, and a function for switching the self-timer ON/OFF and its duration is assigned to the down button. During playback, a frame advance function is assigned to the right button and a frame return function to the left button, and a function for deleting the image being played back is assigned to the up button. During various settings, a function for moving the cursor displayed on the monitor 16 in the direction of each button is assigned.
The DISP/BACK button 27 functions as a button for instructing display switching of the monitor 16. When the DISP/BACK button 27 is pressed during shooting, the display on the monitor 16 is switched in the order ON, framing guide display, OFF. When it is pressed during playback, the display is switched in the order normal playback, playback without character display, multi playback. The DISP/BACK button 27 also functions as a button for canceling an input operation or for instructing a return to the previous operation state.
FIG. 2 is a block diagram showing the main internal configuration of the compound-eye digital camera 1. The compound-eye digital camera 1 mainly comprises a CPU 110, operation means 112 (the release switch 20, the MENU/OK button 25, the cross button 26, and the like), an SDRAM 114, a VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, a video encoder 134, a template image generation unit 135, a subject search unit 136, a media controller 137, sound input processing means 138, a recording medium 140, focus lens driving means 142 and 143, zoom lens driving means 144 and 145, aperture driving means 146 and 147, and timing generators (TG) 148 and 149.
The CPU 110 comprehensively controls the overall operation of the compound-eye digital camera 1. The CPU 110 issues commands to each block in response to inputs from the operation means 112, and controls the operation of the right imaging system 12 and the left imaging system 13. The right imaging system 12 and the left imaging system 13 basically operate in conjunction with each other, but can also be operated individually. The CPU 110 also turns the two sets of image data obtained by the right imaging system 12 and the left imaging system 13 into strip-shaped image fragments and generates display image data in which these are displayed alternately on the monitor 16. When displaying in the 3D mode, a parallax barrier consisting of a pattern in which light transmitting portions and light shielding portions are alternately arranged at a predetermined pitch is generated on the parallax barrier display layer, and strip-shaped image fragments representing the left and right images are alternately arranged and displayed on the image display surface below it, thereby enabling stereoscopic viewing.
The SDRAM 114 stores the firmware, which is the control program executed by the CPU 110, various data necessary for control, camera setting values, captured image data, and the like.
The VRAM 116 is used as a work area for the CPU 110 and as a temporary storage area for image data.
The image sensors 122 and 123 are color CCDs provided with R, G, and B color filters in a predetermined color filter array (for example, a honeycomb array or a Bayer array). The image sensors 122 and 123 receive the subject light formed by the focus lenses 12b and 13b, the zoom lenses 12c and 13c, and the like; the light incident on each light receiving surface is converted by the photodiodes arrayed on that surface into an amount of signal charge corresponding to the amount of incident light. In the photocharge accumulation and transfer operations of the image sensors 122 and 123, the electronic shutter speed (photocharge accumulation time) is determined based on charge discharge pulses input from the TGs 148 and 149, respectively.
That is, while charge discharge pulses are being input to the image sensors 122 and 123, charges are discharged without being accumulated. When the charge discharge pulses stop being input, the charges are no longer discharged, so charge accumulation, that is, exposure, starts in the image sensors 122 and 123. The imaging signals acquired by the image sensors 122 and 123 are output to the CDS/AMPs 124 and 125 based on drive pulses given from the TGs 148 and 149, respectively.
The CDS/AMPs 124 and 125 perform correlated double sampling processing on the image signals output from the image sensors 122 and 123 (processing that obtains accurate pixel data by taking the difference between the feedthrough component level and the pixel signal component level contained in the output signal of each pixel of the image sensor, with the aim of reducing noise, particularly thermal noise, contained in the output signals), and amplify the signals to generate analog R, G, and B image signals.
The A/D converters 126 and 127 convert the analog R, G, and B image signals generated by the CDS/AMPs 124 and 125 into digital image signals.
The image input controller 128 has a built-in line buffer of a predetermined capacity and, in accordance with commands from the CPU 110, accumulates the image signal for one image output from the CDS/AMP and A/D conversion means and records it in the VRAM 116.
The image signal processing means 130 includes a synchronization circuit (a processing circuit that interpolates the spatial displacement of the color signals caused by the color filter array of the single-chip CCD and converts the color signals into a synchronized form), a white balance correction circuit, a gamma correction circuit, a contour correction circuit, a luminance/color difference signal generation circuit, and the like. In accordance with commands from the CPU 110, it applies the required signal processing to the input image signals to generate image data (YUV data) consisting of luminance data (Y data) and color difference data (Cr, Cb data). Hereinafter, the image data generated from the image signal output from the image sensor 122 is referred to as the right-eye image B, and the image data generated from the image signal output from the image sensor 123 is referred to as the left-eye image A.
The left-eye image A and the right-eye image B (3D image data) processed by the image signal processing means 130 are input to the VRAM 50. The VRAM 50 includes an A area and a B area, each of which stores 3D image data representing one frame of a 3D image. In the VRAM 50, the 3D image data representing one frame of the 3D image is rewritten alternately in the A area and the B area, and the written 3D image data is read from whichever of the two areas is not currently being rewritten.
The stereoscopic image generation unit 133 processes the 3D image data read from the VRAM 50, or the uncompressed 3D image data read from the recording medium 140 and generated by the compression/decompression processing means 132, so that it can be displayed stereoscopically on the monitor 16. For example, in the case of a parallax barrier monitor, the stereoscopic image generation unit 133 divides each of the right-eye image B and the left-eye image A used for reproduction into strips, and generates display image data in which the strip-shaped right-eye image B and left-eye image A are arranged alternately. The display image data is output from the stereoscopic image generation unit 133 to the monitor 16 via the video encoder 134.
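For example, the column-wise interleaving of the two images for a parallax barrier could be sketched as below (NumPy; this assumes equal-sized left and right images and a one-pixel strip pitch, whereas the actual strip width depends on the barrier pitch):

    import numpy as np

    def interleave_for_barrier(left, right):
        # Alternate vertical strips of the left- and right-eye images so
        # the barrier steers each strip to the corresponding eye.
        out = right.copy()
        out[:, ::2] = left[:, ::2]
        return out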
The video encoder 134 controls display on the monitor 16. That is, it converts the display image data and the like generated by the stereoscopic image generation unit 133 into a video signal for display on the monitor 16 (for example, an NTSC, PAL, or SECAM signal), outputs it to the monitor 16, and also outputs predetermined character and graphic information to the monitor 16 as necessary.
As a result, the right-eye image B and the left-eye image A are stereoscopically displayed on the monitor 16. When the 3D image data representing one frame of the 3D image is alternately rewritten in the VRAM 50 and the written 3D image data is read from the area other than the one being rewritten, 3D images are displayed continuously on the monitor 16 in real time (display of a live view image (through image)).
The left-eye image A and the right-eye image B processed by the image signal processing means 130 are also input to the template image generation unit 135. The template image generation unit 135 extracts a predetermined region (for example, a rectangle) including the subject Z to be tracked from each of the left-eye image A and the right-eye image B to generate template images. The details of the processing performed by the template image generation unit 135 will be described later.
Furthermore, the left-eye image A and the right-eye image B processed by the image signal processing means 130 are input to the subject search unit 136. The subject search unit 136 uses the template images generated by the template image generation unit 135 to search at least one of the left-eye image A and the right-eye image B for a portion similar to a template image. The subject Z is thereby searched for in at least one of the left-eye image A and the right-eye image B. The details of the processing performed by the subject search unit 136 will be described later.
At least one of the left-eye image A and the right-eye image B in which the subject Z has been found by the subject search unit 136 is input to the template image generation unit 135, which extracts a predetermined region including the subject Z and generates a template image.
The AF detection means 118 calculates, in accordance with commands from the CPU 110, the physical quantities necessary for AF control from the input image signals so that the subject Z found by the subject search unit 136 is brought into focus. The AF detection means 118 consists of a right imaging system AF control circuit that performs AF control based on the image signal input from the right imaging system 12, and a left imaging system AF control circuit that performs AF control based on the image signal input from the left imaging system 13. In the digital camera 1 of the present embodiment, AF control is performed based on the contrast of the images obtained from the image sensors 122 and 123 (so-called contrast AF), and the AF detection means 118 calculates a focus evaluation value indicating the sharpness of the image from the input image signal. The CPU 110 detects the position at which the focus evaluation value calculated by the AF detection means 118 is maximized and moves the focus lens group to that position. That is, the focus lens group is moved from the closest distance to infinity in predetermined steps, the focus evaluation value is acquired at each position, and the position with the maximum focus evaluation value is taken as the in-focus position, to which the focus lens group is moved.
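Schematically, this contrast AF can be sketched as follows, assuming a Laplacian-variance sharpness measure (one common focus evaluation value; the exact evaluation function is not specified here) and hypothetical capture_roi and move_lens interfaces standing in for the camera hardware:

    import cv2

    def focus_value(gray_roi):
        # Focus evaluation value: variance of the Laplacian, which grows
        # as the region around the tracked subject gets sharper.
        return cv2.Laplacian(gray_roi, cv2.CV_64F).var()

    def contrast_af(capture_roi, move_lens, positions):
        # Step the focus lens from near to infinity, score each position,
        # then drive the lens back to the maximum-score position.
        scores = []
        for p in positions:
            move_lens(p)                               # hypothetical driver
            scores.append(focus_value(capture_roi()))  # hypothetical capture
        best = positions[scores.index(max(scores))]
        move_lens(best)
        return best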
The focus lens driving means 142 and 143 move the focus lenses 12b and 13b in the optical axis direction in accordance with commands from the CPU 110, varying the focus position so that the subject Z found by the subject search unit 136 is in focus.
The zoom lens driving means 144 and 145 move the zoom lenses 12c and 13c in the optical axis direction in accordance with commands from the CPU 110, varying the focal length in response to the photographer's instructions or so that the subject Z found by the subject search unit 136 appears at a predetermined size.
The AE/AWB detection means 120 calculates, in accordance with commands from the CPU 110, the physical quantities necessary for AE control and AWB control from the input image signals. For example, as a physical quantity necessary for AE control, the AE/AWB detection means 120 divides one screen into a plurality of areas (for example, 16 x 16) and calculates the integrated value of the R, G, and B image signals for each divided area. Alternatively, the AE/AWB detection means 120 calculates the integrated value of the R, G, and B image signals in a predetermined area including the subject Z found by the subject search unit 136. The CPU 110 detects the brightness of the subject (subject luminance) based on the integrated values obtained from the AE/AWB detection means 120 and calculates an exposure value suitable for shooting (shooting EV value). It then determines the aperture value and shutter speed from the calculated shooting EV value and a predetermined program diagram.
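As an illustration of the integration step only (NumPy; the conversion of the integrated values into an EV, aperture, and shutter speed via the program diagram is omitted):

    import numpy as np

    def area_integrals(image, grid=16):
        # Divide the frame into grid x grid areas and integrate the R, G,
        # and B signals within each area for AE control.
        h, w = image.shape[:2]
        hs, ws = h // grid, w // grid
        cropped = image[:hs * grid, :ws * grid].astype(np.float64)
        blocks = cropped.reshape(grid, hs, grid, ws, -1)
        return blocks.sum(axis=(1, 3))  # shape: (grid, grid, channels)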
The aperture driving means 146 and 147 adjust the amounts of light incident on the image sensors 122 and 123 by varying the apertures of the diaphragm-combined mechanical shutters 12d and 13d in accordance with commands from the CPU 110. The aperture driving means 146 and 147 also open and close the diaphragm-combined mechanical shutters 12d and 13d in accordance with commands from the CPU 110 to expose and shield the image sensors 122 and 123, respectively.
As physical quantities necessary for AWB control, the AE/AWB detection unit 120 likewise divides one screen into a plurality of areas (for example, 16 × 16) and calculates the average integrated value of each color of the R, G, and B image signals for each divided area. Alternatively, the AE/AWB detection unit 120 calculates these per-color average integrated values over a predetermined area containing the subject Z found by the subject search unit 136. The CPU 110 obtains the R/G and B/G ratios for each divided area from the obtained R, B, and G integrated values, and determines the light source type based on, for example, the distribution of the obtained R/G and B/G values in the R/G-B/G color space. Then, in accordance with a white balance adjustment value suited to the determined light source type, the CPU 110 determines the gain values (white balance correction values) that the white balance adjustment circuit applies to the R, G, and B signals so that, for example, each ratio becomes approximately 1 (that is, the RGB integration ratio over one screen becomes R:G:B = 1:1:1).
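The gain computation at the end of this step can be sketched as follows, reusing the integrate_blocks helper above. Treating G as the reference channel is a common convention and an assumption here, and the light-source classification from the R/G-B/G distribution is omitted for brevity.

    def awb_gains(sums):
        """Derive per-channel gains so that the frame-wide integration
        ratio approaches R:G:B = 1:1:1, with G as the reference."""
        r = max(sums[..., 0].mean(), 1e-6)
        g = sums[..., 1].mean()
        b = max(sums[..., 2].mean(), 1e-6)
        return g / r, 1.0, g / b   # (gain_R, gain_G, gain_B)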
The compression/decompression processing unit 132 applies compression processing in a predetermined format to the input image data in accordance with a command from the CPU 110 to generate compressed image data. It also applies decompression processing in a predetermined format to input compressed image data in accordance with a command from the CPU 110 to generate uncompressed image data.

The media controller 137 records each piece of image data compressed by the compression/decompression processing unit 132 on the recording medium 140.

The recording medium 140 may be any of various recording media removable from the compound-eye digital camera 1, such as an xD-Picture Card (registered trademark), a semiconductor memory card typified by SmartMedia (registered trademark), a portable small hard disk, a magnetic disk, an optical disk, or a magneto-optical disk.

The sound input processing unit 138 receives an audio signal that has been input to the microphone 15 and amplified by a stereo microphone amplifier (not shown), and encodes the audio signal.

The operation of the compound-eye digital camera 1 configured as described above will now be explained.
When the barrier 11 is slid from the closed state to the open state, the compound-eye digital camera 1 is powered on and starts up in shooting mode. As shooting modes, a 2D mode and a 3D shooting mode for shooting a stereoscopic image of the same subject viewed from two viewpoints can be set. As the 3D mode, a 3D shooting mode in which the right imaging system 12 and the left imaging system 13 simultaneously shoot a stereoscopic image with a predetermined parallax can be set. The shooting mode is set from the shooting mode menu screen displayed on the monitor 16 by selecting "shooting mode" with the cross button 26 or the like on the menu screen that appears when the MENU/OK button 25 is pressed while the compound-eye digital camera 1 is operating in shooting mode.
(1) 2D shooting mode

The CPU 110 selects the right imaging system 12 or the left imaging system 13 (in this embodiment, the left imaging system 13) and starts shooting for the shooting confirmation image with the image sensor 123 of the left imaging system 13. That is, images are continuously captured by the image sensor 123, their image signals are continuously processed, and image data for the shooting confirmation image is generated.
The CPU 110 sets the monitor 16 to the 2D mode, sequentially supplies the generated image data to the video encoder 134, converts it into a signal format for display, and outputs it to the monitor 16. The image captured by the image sensor 123 is thereby displayed on the monitor 16. When the input of the monitor 16 accepts a digital signal, the video encoder 134 is unnecessary, but the data must still be converted into a signal form that matches the input specifications of the monitor 16.

The user frames the shot while viewing the shooting confirmation image displayed on the monitor 16, checks the subject to be shot, reviews the captured image, and sets the shooting conditions.
When the release switch 20 is half-pressed in the shooting standby state described above, an S1ON signal is input to the CPU 110. The CPU 110 detects this and performs AE photometry and AF control. During AE photometry, the brightness of the subject is measured based on the integrated values of the image signal captured via the image sensor 123. The measured value (photometric value) is used to determine the aperture value of the combined aperture/mechanical shutter 13d and the shutter speed for the actual shooting. At the same time, the CPU 110 determines from the detected subject luminance whether the flash 14 needs to fire. When it is determined that the flash 14 needs to fire, the flash 14 is pre-fired, and the emission amount of the flash 14 for the actual shooting is determined based on the reflected light.

When the release switch 20 is fully pressed, an S2ON signal is input to the CPU 110. In response to this S2ON signal, the CPU 110 executes the shooting and recording processing.

First, the CPU 110 drives the combined aperture/mechanical shutter 13d via the aperture driving unit 147 based on the aperture value determined from the photometric value, and controls the charge accumulation time of the image sensor 123 (a so-called electronic shutter) so as to achieve the shutter speed determined from the photometric value.

During AF control, the CPU 110 sequentially moves the focus lens to lens positions ranging from the closest distance to infinity, acquires from the AF detection unit 118 an evaluation value obtained by integrating the high-frequency components of the image signal in the AF area of the image captured via the image sensor 123 at each lens position, finds the lens position at which this evaluation value peaks, and performs contrast AF by moving the focus lens to that lens position.

When the flash 14 is to be fired, it is fired based on the emission amount of the flash 14 determined from the result of the pre-firing.

The subject light is incident on the light-receiving surface of the image sensor 123 via the focus lens 13b, the zoom lens 13c, the combined aperture/mechanical shutter 13d, the infrared cut filter 46, the optical low-pass filter 48, and so on.

The signal charges accumulated in the photodiodes of the image sensor 123 are read out in accordance with a timing signal applied from the TG 149, sequentially output from the image sensor 123 as a voltage signal (image signal), and input to the CDS/AMP 125.

The CDS/AMP 125 performs correlated double sampling on the CCD output signal based on a CDS pulse and amplifies the image signal output from the CDS circuit with a shooting sensitivity setting gain applied from the CPU 110.

The analog image signal output from the CDS/AMP 125 is converted into a digital image signal by the A/D converter 127, and the converted image signal (R, G, B RAW data) is transferred to the SDRAM 114 and temporarily stored there.

The R, G, and B image signals read out from the SDRAM 114 are input to the image signal processing unit 130. In the image signal processing unit 130, the white balance adjustment circuit performs white balance adjustment by applying a digital gain to each of the R, G, and B image signals, the gamma correction circuit performs tone conversion in accordance with the gamma characteristics, and the synchronization circuit performs synchronization processing that interpolates the spatial displacement of the color signals caused by the color filter array of the single-chip CCD and aligns the phases of the color signals. The synchronized R, G, and B image signals are further converted into a luminance signal Y and color-difference signals Cr and Cb (YC signal) by the luminance/color-difference data generation circuit and subjected to predetermined signal processing such as edge enhancement. The YC signal processed by the image signal processing unit 130 is stored in the SDRAM 114 again.
The YC signal stored in the SDRAM 114 as described above is compressed by the compression/decompression processing unit 132 and recorded on the recording medium 140 via the media controller 137 as an image file of a predetermined format. Still image data is stored on the recording medium 140 as an image file conforming to the Exif standard. An Exif file has an area for storing the data of the main image and an area for storing the data of a reduced image (thumbnail image). A thumbnail image of a specified size (for example, 160 × 120 or 80 × 60 pixels) is generated from the data of the main image obtained by shooting, through pixel decimation and other necessary data processing. The thumbnail image generated in this way is written into the Exif file together with the main image. Tag information such as the shooting date and time, the shooting conditions, and face detection information is also attached to the Exif file.
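Such a decimated thumbnail could be produced, for instance, with the Pillow library; the file path and the use of Pillow are illustrative assumptions, not part of the camera's firmware.

    from PIL import Image

    def make_exif_thumbnail(main_image_path, size=(160, 120)):
        """Decimate the main image down to the specified thumbnail size,
        preserving the aspect ratio."""
        img = Image.open(main_image_path)
        img.thumbnail(size)   # in-place reduction
        return img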
When the mode of the compound-eye digital camera 1 is set to playback mode, the CPU 110 outputs a command to the media controller 137 to read out the image file most recently recorded on the recording medium 140.

The compressed image data of the read image file is supplied to the compression/decompression processing unit 132, decompressed into an uncompressed luminance/color-difference signal, and output to the monitor 16 via the video encoder 134. The image recorded on the recording medium 140 is thereby reproduced and displayed on the monitor 16 (single-image playback). An image shot in the 2D shooting mode is displayed as a planar image over the entire monitor 16 in the 2D mode.
(2) 3D shooting mode

Shooting for the shooting confirmation image is started with the image sensor 122 and the image sensor 123. That is, the image sensors 122 and 123 continuously capture the right-eye image B and the left-eye image A at a predetermined frame rate, their image signals are continuously processed, and stereoscopic image data for the shooting confirmation image is generated. The CPU 110 sets the monitor 16 to the 3D mode, and the generated image data is sequentially converted into a signal format for display by the video encoder 134 and output to the monitor 16. The stereoscopic image data for the shooting confirmation image is thereby displayed stereoscopically on the monitor 16.
In this embodiment, in parallel with the stereoscopic display of the stereoscopic image data for the shooting confirmation image on the monitor 16, processing for tracking the subject Z is performed on the left-eye image A.

FIG. 4 is a flowchart showing the flow of the processing for tracking the subject Z in the left-eye images A continuously shot at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in a program storage section within the CPU 110.

The left-eye image A shot immediately before the frame to be processed, in this case the left-eye image A0 (see FIG. 5) shot immediately before the processing for tracking the subject Z, is input to the template image generation unit 135, and the template image generation unit 135 generates a template image TA0 from the left-eye image A0 (step S10). The processing by which the template image generation unit 135 generates the template image TA0 will now be described in detail.
The left-eye image A0 shown in FIG. 5 contains the subject Z to be tracked (here, a person's face). The template image generation unit 135 extracts from the left-eye image A0 a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized. In this embodiment, as shown by the dotted line in FIG. 6, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted from the left-eye image A0. The template image generation unit 135 then takes the extracted image as the template TA0 shown in FIG. 7. The template image is generated in this way.
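A sketch of this extraction step, assuming the subject's bounding box is already known from face detection or manual selection; the NumPy array layout and the four-pixel margin are assumptions for illustration.

    def extract_template(image, bbox, margin=4):
        """Cut a rectangular template around the subject's bounding box,
        padded by a few pixels and clamped to the image border."""
        x0, y0, x1, y1 = bbox
        h, w = image.shape[:2]
        x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
        x1, y1 = min(w, x1 + margin), min(h, y1 + margin)
        return image[y0:y1, x0:x1].copy()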
Possible methods of selecting the subject Z as the tracking target include automatically detecting the subject by face detection or the like, having the photographer select it via the operation unit 112, and automatically detecting a subject carrying a transmitter. When the subject is detected automatically by face detection, face detection is performed on each of the left-eye image A and the right-eye image B, and once it is confirmed that the same face has been detected in both the left-eye image A and the right-eye image B, that face is taken as the subject Z. When different faces are detected in the left-eye image A and the right-eye image B, the photographer may select the face via the operation unit 112. When the photographer makes the selection via the operation unit 112, the photographer may select the subject in each of the left-eye image A and the right-eye image B, or may select it in either the left-eye image A or the right-eye image B and have the same subject detected in the other image by corresponding point detection or the like.

The right-eye image B shot immediately before the frame to be processed, in this case the right-eye image B0 (see FIG. 5) shot immediately before the processing for tracking the subject Z, is input to the template image generation unit 135, and the template image generation unit 135 generates a template image TB0 from the right-eye image B0 by the same method as in step S10 (step S12).

When the left-eye image A is used as the reference in selecting the subject Z as the tracking target, the right-eye image B may not contain the tracking target Z (for example, when it is covered by another subject). In that case, the template image is generated from an image shot at as close a time as possible. For example, when the template image TA0 could be generated from the left-eye image A0 but the right-eye image B0 does not contain the subject Z, then if the right-eye image shot immediately before the right-eye image B0 contains the subject Z, the template image generated from that image may be used as the template image TB0; and if the right-eye image shot immediately before the right-eye image B0 does not contain the subject Z either, the template image TB0 may be generated from an earlier right-eye image that does contain the subject Z.

i is set to 1 (step S14); that is, the shooting and processing of the first image begins. Here, i is a positive integer.

The left imaging system 13 shoots the left-eye image Ai (currently A1, since i = 1; see FIG. 8) (step S16). Because subject tracking is performed on the left-eye image Ai, the shot left-eye image Ai is input to the subject search unit 136. At the same time, the left-eye image Ai is input to the video encoder 134, sequentially converted into a signal format for display, and output to the monitor 16.

The right imaging system 12 shoots the right-eye image Bi (currently B1, since i = 1; see FIG. 8) (step S18). Because subject tracking is not performed on the right-eye image Bi, the shot right-eye image Bi is input to the template image generation unit 135. At the same time, the right-eye image Bi is input to the video encoder 134, sequentially converted into a signal format for display, and output to the monitor 16.

The subject search unit 136 searches the left-eye image Ai using the generated template images TA(i-1) and TB(i-1), looking in the left-eye image Ai for portions similar to the template images TA(i-1) and TB(i-1) (step S20). Since i = 1 at this point, the left-eye image A1 is searched using the template images TA0 and TB0 generated in steps S10 and S12.

As shown in FIG. 3, because the left imaging system 13 and the right imaging system 12 shoot from different positions, the optical axis 13L of the left imaging system 13 and the optical axis 12L of the right imaging system 12 do not coincide, and the result of shooting the subject Z with the left imaging system 13 differs from the result of shooting the subject Z with the right imaging system 12. That is, as shown in FIG. 6, the orientation of the subject Z in the left-eye image A differs from the orientation of the subject Z in the right-eye image B. Since the subject Z in the template image TAi therefore differs from the subject Z in the template image TBi, the left-eye image Ai is searched with both of the template images TA(i-1) and TB(i-1).
As the method of searching the left-eye image Ai for portions similar to the template images TA(i-1) and TB(i-1), a pattern matching technique such as template matching can be used, although the method is not particularly limited to these.
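One common realization of template matching is normalized cross-correlation, sketched below with OpenCV; the document leaves the matching method open, so this specific choice is an assumption.

    import cv2

    def search_with_template(frame_gray, template_gray):
        """Normalized cross-correlation search over the whole frame;
        returns the best-match top-left corner and its score in [-1, 1]."""
        res = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        return max_loc, max_val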
The subject search unit 136 determines whether the search of the left-eye image Ai has finished (step S22); if it has not finished (NO in step S22), step S22 is performed again.

When the search of the left-eye image Ai has finished (YES in step S22), the subject search unit 136 determines whether the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching the left-eye image Ai with the template image TB(i-1) are the same (step S24). Since i = 1 at this point, it is determined whether the result of searching the left-eye image A1 with the template image TA0 and the result of searching the left-eye image A1 with the template image TB0 are the same.

When the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching the left-eye image Ai with the template image TB(i-1) are the same (YES in step S24), the portion similar to the template images TA(i-1) and TB(i-1) is taken as the position of the subject Z. When the left-eye image A1 shown in FIG. 8 is searched, the region enclosed by the dotted line in FIG. 8 (A1) is obtained as the search result whether the search uses the template image TA0 or TB0, so the subject search unit 136 takes the position enclosed by the dotted line in FIG. 8 (A1) as the position of the subject Z. The subject search unit 136 passes the search result and the left-eye image Ai to the template image generation unit 135. The processing then proceeds to step S32.
When the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching the left-eye image Ai with the template image TB(i-1) are not the same (NO in step S24), the subject search unit 136 calculates the similarity between the result of searching the left-eye image Ai with the template image TA(i-1) and the template image TA(i-1), and also calculates the similarity between the result of searching the left-eye image Ai with the template image TB(i-1) and the template image TB(i-1). A known method can be adopted for calculating the similarity, for example the difference between feature values, or the least-squares method in a feature space (a weighted space is also possible). The subject search unit 136 then determines whether the similarity between the TA(i-1) search result and the template image TA(i-1) is higher than the similarity between the TB(i-1) search result and the template image TB(i-1) (step S26).

When the similarity between the TA(i-1) search result and the template image TA(i-1) is higher than the similarity between the TB(i-1) search result and the template image TB(i-1) (YES in step S26), the subject search unit 136 takes the result of searching the left-eye image Ai with the template image TA(i-1), that is, the portion similar to the template image TA(i-1), as the position of the subject Z (step S28). The subject search unit 136 passes this position of the subject Z and the left-eye image Ai to the template image generation unit 135.

When the similarity between the TA(i-1) search result and the template image TA(i-1) is not higher than the similarity between the TB(i-1) search result and the template image TB(i-1) (NO in step S26), the subject search unit 136 takes the result of searching the left-eye image Ai with the template image TB(i-1), that is, the portion similar to the template image TB(i-1), as the position of the subject Z (step S30). The subject search unit 136 passes this position of the subject Z and the left-eye image Ai to the template image generation unit 135.
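Steps S24 through S30 amount to the following comparison, sketched here with the search_with_template helper above; its correlation score stands in for the similarity measure, which the document leaves unspecified.

    def locate_subject(frame_gray, tmpl_a, tmpl_b):
        """Search with both templates; if they agree, use that position
        (YES in S24); otherwise keep the result whose similarity to its
        own template is higher (S26 to S30)."""
        loc_a, score_a = search_with_template(frame_gray, tmpl_a)
        loc_b, score_b = search_with_template(frame_gray, tmpl_b)
        if loc_a == loc_b:
            return loc_a
        return loc_a if score_a > score_b else loc_b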
The subject Z is thus found in the left-eye image Ai. Based on the position of the subject Z set in step S24, S28, or S30, the template image generation unit 135 extracts from the left-eye image Ai a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates the template image TAi (step S32). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image. In the case shown in FIG. 8 (A1), the dotted-line portion is generated as the template image TA1.

The template image generation unit 135 generates the template image TBi from the right-eye image Bi by the same method as in step S12 (step S34). When the right-eye image Bi does not contain the tracking target Z, the template image TBi may be generated from a right-eye image that contains the subject Z and was shot at as close a time as possible before the right-eye image Bi.

Then i is incremented to i + 1 (step S36), the processing returns to step S16, and steps S16 to S36 are performed again. That is, when the shooting of the first image (i = 1) and the search for the subject have finished, the second image (i = 1 + 1 = 2) is processed; when the shooting of the second image (i = 2) and the search for the subject have finished, the third image (i = 2 + 1 = 3) is processed; and so on, with steps S16 to S36 repeated in sequence.

The subject Z is thereby searched for continuously in the left-eye images Ai (i = 1, 2, ...) shot continuously at the predetermined frame rate; that is, the subject Z is tracked in the left-eye image A among the shooting confirmation images displayed stereoscopically on the monitor 16.

The user frames the shot while viewing the shooting confirmation image displayed stereoscopically on the monitor 16, checks the subject to be shot, reviews the captured image, and sets the shooting conditions.

The zoom may also be optimized with reference to the subject Z at the same time as the subject Z is tracked. For example, the CPU 110 moves the zoom lenses 12c and 13c in the optical axis direction via the zoom lens driving units 144 and 145 so that the subject Z appears at a predetermined size. This allows the photographer to recognize what the tracking target is. Moreover, since the tracking target is an important subject for the photographer, an image in which the important subject is easy to see can be shot.

When the release switch 20 is half-pressed while the stereoscopic image data for the shooting confirmation image is being displayed stereoscopically on the monitor 16 (in the shooting standby state), an S1ON signal is input to the CPU 110. The CPU 110 detects this and ends the subject tracking processing shown in FIG. 4. The CPU 110 also performs AE photometry and AF control. In this embodiment, AE photometry and AF control are performed with the left imaging system 13, which performed the tracking processing for the subject Z. The exposure and focus are optimized with reference to the subject Z tracked by the subject tracking processing shown in FIG. 4. That is, AE photometry is performed so that the subject Z is properly exposed, and AF processing is performed so that the subject Z is in focus. Since the AE photometry and AF control are the same as in the 2D shooting mode, a detailed description is omitted.

When the release switch 20 is fully pressed, an S2ON signal is input to the CPU 110. In response to this S2ON signal, the CPU 110 executes the shooting and recording processing. Since the processing for generating the image data shot by each of the right imaging system 12 and the left imaging system 13 is the same as in the 2D shooting mode, a description is omitted.

From the two pieces of image data generated by the CDS/AMPs 124 and 125, two pieces of compressed image data are generated by the same method as in the 2D shooting mode. The two pieces of compressed image data are associated with each other and stored on the recording medium 140 as one file. The MP format or the like can be used as the storage format.

When the mode of the compound-eye digital camera 1 is set to playback mode, the CPU 110 outputs a command to the media controller 137 to read out the image file most recently recorded on the recording medium 140. The compressed image data of the read image file is supplied to the compression/decompression processing unit 132, decompressed into an uncompressed luminance/color-difference signal, converted into a stereoscopic image by the stereoscopic image generation unit 133, and then output to the monitor 16 via the video encoder 134. The image recorded on the recording medium 140 is thereby reproduced and displayed on the monitor 16 (single-image playback).

Frame-by-frame advance of the images is performed with the left and right keys of the cross button 26: when the right key of the cross button 26 is pressed, the next image file is read from the recording medium 140 and reproduced on the monitor 16, and when the left key of the cross button is pressed, the previous image file is read from the recording medium 140 and reproduced on the monitor 16.

While checking the image reproduced on the monitor 16, the user can erase images recorded on the recording medium 140 as necessary. An image is erased by pressing the MENU/OK button 25 while the image is reproduced and displayed on the monitor 16.

According to this embodiment, the subject is tracked using images of the subject in a plurality of different orientations as keys, so even when the subject moves or the orientation of the subject changes, the possibility of losing sight of the subject during tracking can be reduced. Moreover, because the subject tracking processing uses template images extracted from the parallax images, accurate tracking is possible; as a result, the amount of computation can be reduced.

In this embodiment, the subject tracking processing is performed on the left-eye image A, but it may instead be performed on the right-eye image B, or on both the left-eye image A and the right-eye image B. Subject tracking on the right-eye image B is performed by the same method as shown in FIG. 4: the right-eye image Bi is searched with both of the template images TA(i-1) and TB(i-1); if the results are the same, that position is set as the position of the subject Z, and if the results differ, the result with the higher similarity is set as the position of the subject Z.
<Second Embodiment>

In the first embodiment of the present invention, subject tracking is performed by searching for the subject in at least one of the right-eye image and the left-eye image using a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image; however, the method of performing subject tracking is not limited to this.

In the second embodiment of the present invention, subject tracking is performed by searching for the subject in at least one of the right-eye image and the left-eye image using a template image generated by extracting a part of the right-eye image, together with template images generated by image synthesis processing from a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image. The compound-eye digital camera 2 of the second embodiment is described below. Parts identical to those of the first embodiment are given the same reference numerals, and their description is omitted.
FIG. 9 is a block diagram showing the main internal configuration of the compound-eye digital camera 2. The compound-eye digital camera 2 mainly comprises a CPU 110, operation means (the release switch 20, the MENU/OK button 25, the cross button 26, and so on) 112, an SDRAM 114, a VRAM 116, an AF detection unit 118, an AE/AWB detection unit 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, an image signal processing unit 130, a compression/decompression processing unit 132, a stereoscopic image generation unit 133, a video encoder 134, a template image generation unit 135, a subject search unit 136, a media controller 137, a sound input processing unit 138, a composite template image generation unit 139, a recording medium 140, focus lens driving units 142 and 143, zoom lens driving units 144 and 145, aperture driving units 146 and 147, and timing generators (TG) 148 and 149.

The composite template image generation unit 139 generates composite template images by image synthesis processing from a template image generated by extracting a part of the right-eye image and a template image generated by extracting a part of the left-eye image. FIG. 10 is a schematic diagram showing how the composite template image generation unit 139 generates the composite template images.

First, the composite template image generation unit 139 extracts feature points from the template image TA0 generated by extracting a part of the left-eye image A0. The feature points are, for example, points (pixels) having strong signal gradients in a plurality of directions, and can be extracted using the Harris method, the Shi-Tomasi method, or the like.

Next, the composite template image generation unit 139 extracts corresponding points from the template image TB0 generated by extracting a part of the right-eye image B0. The corresponding points are the points that correspond to the feature points extracted from the template image TA0.
The composite template image generation unit 139 then aligns the feature points extracted from the template image TA0 with the corresponding points extracted from the template image TB0, and thereafter generates, by image synthesis processing, images in intermediate states between the template image TA0 and the template image TB0, namely the composite template images TM0-1 to TM0-5. In this embodiment, the composite template images are created using morphing, a technique that expresses how one object deforms into another, but the image synthesis processing is not limited to this.
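The following sketch generates five intermediate templates. As a simplification, it blends the two templates by cross-dissolve after resizing them to a common size; a full morph would additionally warp the geometry along the feature point and corresponding point pairs (which OpenCV can detect with, for example, cv2.goodFeaturesToTrack), so this is an illustrative approximation rather than the patented method itself.

    import cv2
    import numpy as np

    def composite_templates(tmpl_a, tmpl_b, n=5):
        """Generate n intermediate templates TM-1..TM-n between TA and TB.
        Cross-dissolve after resizing stands in for true morphing, which
        would also warp along the feature/corresponding points."""
        h = min(tmpl_a.shape[0], tmpl_b.shape[0])
        w = min(tmpl_a.shape[1], tmpl_b.shape[1])
        a = cv2.resize(tmpl_a, (w, h)).astype(np.float32)
        b = cv2.resize(tmpl_b, (w, h)).astype(np.float32)
        blends = []
        for k in range(1, n + 1):
            t = k / (n + 1)                 # blend weight, 0 < t < 1
            blends.append(((1 - t) * a + t * b).astype(np.uint8))
        return blends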
In this embodiment, five composite template images TM0-1 to TM0-5 are generated, but the number of composite templates generated is not limited to five.

The composite template images may also be generated after some image processing has been applied to the template images TA(i-1) and TB(i-1). For example, the composite template images TM(i-1)-1 to TM(i-1)-5 may be generated after the luminance has been corrected so that the average luminance of the template image TA(i-1) and the average luminance of the template image TB(i-1) take the same value, or after the color balance has been corrected so that the color balance of the template image TA(i-1) and the color balance of the template image TB(i-1) take the same value. The composite template images TM(i-1)-1 to TM(i-1)-5 may also be generated after the feature points extracted from the template image TA(i-1) have been aligned with the corresponding points extracted from the template image TB(i-1) and the template image TA(i-1) has been scaled to approximately the same size as the template image TB(i-1).
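The luminance equalization mentioned first could look like the following, assuming 8-bit grayscale templates and NumPy imported as np as in the earlier sketches.

    def match_mean_luminance(tmpl_a, tmpl_b):
        """Scale TB(i-1) so that its average luminance equals that of
        TA(i-1) before the composite templates are generated."""
        gain = tmpl_a.mean() / max(tmpl_b.mean(), 1e-6)
        scaled = tmpl_b.astype(np.float32) * gain
        return np.clip(scaled, 0, 255).astype(np.uint8)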
The operation of the compound-eye digital camera 2 configured as described above will now be explained. Since the only difference between the first embodiment and the second embodiment is the subject tracking processing in the 3D shooting mode, only the subject tracking processing is described, and the remaining description is omitted.

FIG. 11 is a flowchart showing the flow of the processing for tracking the subject Z in the left-eye images A continuously shot at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in a program storage section within the CPU 110.

The left-eye image A shot immediately before the frame to be processed, in this case the left-eye image A0 (see FIG. 5) shot immediately before the processing for tracking the subject Z, is input to the template image generation unit 135. The template image generation unit 135 extracts from the left-eye image A0 a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates the template image TA0 (step S10).

The right-eye image B shot immediately before the frame to be processed, in this case the right-eye image B0 (see FIG. 5) shot immediately before the processing for tracking the subject Z, or a right-eye image B shot at as close a time as possible to the frame to be processed, is input to the template image generation unit 135, and the template image generation unit 135 generates the template image TB0 from the right-eye image B0 by the same method as in step S10 (step S12).

i is set to 1 (step S14); that is, the shooting and processing of the first image begins. Here, i is a positive integer.

The left imaging system 13 shoots the left-eye image Ai (currently A1, since i = 1; see FIG. 8) (step S16). Because subject tracking is performed on the left-eye image Ai, the shot left-eye image Ai is input to the subject search unit 136. At the same time, the left-eye image Ai is input to the video encoder 134, sequentially converted into a signal format for display, and output to the monitor 16.

The right imaging system 12 shoots the right-eye image Bi (currently B1, since i = 1; see FIG. 8) (step S18). Because subject tracking is not performed on the right-eye image Bi, the shot right-eye image Bi is input to the template image generation unit 135. At the same time, the right-eye image Bi is input to the video encoder 134, sequentially converted into a signal format for display, and output to the monitor 16.

The composite template image generation unit 139 generates the composite template images TM(i-1)-1 to TM(i-1)-5 from the template images TA(i-1) and TB(i-1) (step S40). Since i = 1 at this point, the composite template images TM0-1 to TM0-5 are generated from the template images TA0 and TB0.

The subject search unit 136 searches the left-eye image Ai using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5, looking in the left-eye image Ai for portions similar to the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5 (step S42). Since i = 1 at this point, the left-eye image A1 is searched using the template image TA0 and the composite template images TM0-1 to TM0-5 generated in steps S10 and S40.

The subject search unit 136 determines whether the search of the left-eye image Ai has finished (step S44); if it has not finished (NO in step S44), step S44 is performed again.

When the search of the left-eye image Ai has finished (YES in step S44), the subject search unit 136 determines whether the search succeeded with a plurality of templates, that is, whether search results were obtained with a plurality of templates (step S46).

When the search did not succeed with a plurality of templates (NO in step S46), there is only one search result for the subject Z. The subject search unit 136 therefore passes the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135. The processing then proceeds to step S54.

When the search succeeded with a plurality of templates (YES in step S46), search results were obtained with a plurality of templates, so the subject search unit 136 calculates, for each template image (each search result), the similarity between the template image with which the search succeeded and the portion found using that template image. The subject search unit 136 then determines whether there are a plurality of results with the highest similarity (step S48). A known method can be adopted for calculating the similarity, for example the difference between feature values, or the least-squares method in a feature space (a weighted space is also possible).

When there are a plurality of results with the highest similarity (YES in step S48), the subject search unit 136 takes as the position of the subject Z the portion similar to the template image TA(i-1) or to the composite template image closest to the template image TA(i-1) (step S50).

Step S50 is described concretely as follows. When the similarity between the result of searching the left-eye image Ai with the template image TA(i-1) and the template image TA(i-1) is among the plurality of highest similarities, the result found using the template image TA(i-1) is taken as the position of the subject Z. This improves the accuracy of the subject tracking. The subject search unit 136 passes the position of the subject Z and the left-eye image Ai to the template image generation unit 135. The processing then proceeds to step S54.

When the similarity between the result of searching the left-eye image Ai with the template image TA(i-1) and the template image TA(i-1) is not among the plurality of highest similarities, the portion similar to the composite template image closest to the template image TA(i-1) among the composite template images TM(i-1)-1 to TM(i-1)-5 is taken as the position of the subject Z. When the composite template images TM0-1 to TM0-5 shown in FIG. 10 have been generated, the composite template image TM0-1 is closest to the template image TA0 and the composite template image TM0-5 is farthest from it. Therefore, when the similarity between the result of searching the left-eye image Ai with the composite template image TM0-1 and the composite template image TM0-1 is among the plurality of highest similarities, the portion similar to the composite template image TM0-1 is taken as the position of the subject Z. When the similarity for the composite template image TM0-1 is not among the plurality of highest similarities but the similarity between the result of searching the left-eye image Ai with the composite template image TM0-2 and the composite template image TM0-2 is, the portion similar to the composite template image TM0-2 is taken as the position of the subject Z. As a result, even when the most accurate detection result is not used, the possibility of losing sight of the subject during tracking can be reduced while a certain degree of accuracy is maintained.
When there are not a plurality of results with the highest similarity (NO in step S48), the subject search unit 136 takes the search result with the highest similarity as the position of the subject Z (step S52). The subject search unit 136 passes this position of the subject Z and the left-eye image Ai to the template image generation unit 135.
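Steps S46 through S52 can be condensed into a selection over the whole template set, again using the search_with_template helper from the first embodiment's sketch; the ordering convention of the template list is an assumption introduced here to express "closest to TA(i-1)".

    def locate_with_template_set(frame_gray, templates):
        """Search with TA(i-1) and all composite templates. A unique top
        score wins (S52); on a tie, the earliest entry wins (S50), assuming
        templates is ordered [TA(i-1), TM-1, ..., TM-n], i.e. from the
        template nearest TA(i-1) to the farthest."""
        results = [search_with_template(frame_gray, t) for t in templates]
        best = max(score for _, score in results)
        for loc, score in results:   # first match = closest to TA(i-1)
            if score == best:
                return loc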
 これにより、左目用画像Aiから被写体Zが探索される。テンプレート画像生成部135は、ステップS44、S50、S52で設定された被写体Zの位置に基づいて、左目用画像Aiから被写体Zを含み、被写体Zの形状が認識できる大きさの領域を抜き出して、テンプレート画像TAiを生成する(ステップS54)。 Thereby, the subject Z is searched from the left-eye image Ai. Based on the position of the subject Z set in steps S44, S50, and S52, the template image generation unit 135 extracts a region that includes the subject Z from the left-eye image Ai and has a size that allows the shape of the subject Z to be recognized. A template image TAi is generated (step S54).
 テンプレート画像生成部135は、ステップS12と同様の方法により、右目用画像Biからテンプレート画像TBiを生成する(ステップS34)。 The template image generation unit 135 generates a template image TBi from the right-eye image Bi by the same method as in step S12 (step S34).
 Thereafter, i is set to i+1 (step S36), the process returns to step S16, and the processing of steps S16 to S36 is performed again.
 According to the present embodiment, the subject is tracked using a plurality of templates showing the subject in different orientations as keys, so even when the subject moves or changes orientation, the possibility of losing the subject during tracking can be reduced.
 In the present embodiment the subject tracking processing was performed on the left-eye image A, but it may instead be performed on the right-eye image B, or on both the left-eye image A and the right-eye image B. When subject tracking is performed on the right-eye image B, it can be done by the same method as shown in FIG. 11, searching the right-eye image Bi with the template image TB(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5.
 Further, in the present embodiment, when the search with a plurality of templates succeeds (YES in step S46) and there are a plurality of results with the highest similarity (YES in step S48), the portion similar to the template image TA(i-1), or to the composite template image closest to it, was taken as the position of the subject Z (step S50). Alternatively, whenever the search with a plurality of templates succeeds (YES in step S46), the portion similar to the template image TA(i-1), or to the composite template image closest to it, may be taken as the position of the subject Z without calculating the similarities.
 Further, in the present embodiment, the composite template images TM(i-1)-1 to TM(i-1)-5 were generated by applying image composition processing to the template images TA(i-1) and TB(i-1), but the method of generating a composite template image is not limited to image composition processing. For example, after aligning the feature points and corresponding points of the template images TA(i-1) and TB(i-1), the difference between them may be extracted, pixels differing in luminance or hue may be masked, and a template consisting only of the portion common to TA(i-1) and TB(i-1) may be generated as the composite template. Also, the composite template images to be generated are not limited to images in intermediate states between the template image TA(i-1) and the template image TB(i-1), that is, to images generated by interpolation. For example, images outside the template images TA(i-1) and TB(i-1) (that is, images of the subject as seen from viewpoints outside the right imaging system 12 and the left imaging system 13) may be generated by extrapolation as composite template images.
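 The following sketch illustrates two of the generation strategies just described, interpolated blending and common-part masking, assuming the two templates are already aligned and of identical size; the blending weights and the difference threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def blend_templates(tpl_a, tpl_b, n=5):
    """Interpolated composite templates between TA and TB (sketch).

    Returns n intermediate images, ordered from closest-to-TA to closest-to-TB.
    """
    weights = np.linspace(1.0, 0.0, n + 2)[1:-1]  # exclude the endpoints TA, TB
    return [cv2.addWeighted(tpl_a, w, tpl_b, 1.0 - w, 0) for w in weights]

def common_part_template(tpl_a, tpl_b, thresh=30):
    """Composite template keeping only the part common to TA and TB (sketch).

    Pixels whose absolute difference exceeds `thresh` are masked to zero.
    """
    diff = cv2.absdiff(tpl_a, tpl_b)
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # a difference in any channel masks the pixel
    mask = (diff <= thresh).astype(np.uint8)
    if tpl_a.ndim == 3:
        mask = mask[:, :, None]
    return tpl_a * mask
```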
 Further, in the present embodiment, the left-eye image Ai was searched using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5, looking for portions similar to them (step S42); however, the left-eye image Ai may be searched using only the composite template images TM(i-1)-1 to TM(i-1)-5. Since a composite template image may be coarse, though, it is preferable to search with both the template image and the composite template images where possible. The left-eye image Ai may also be searched using the template images TA(i-1) and TB(i-1) together with the composite template images TM(i-1)-1 to TM(i-1)-5. In that case the processing takes more time, but the subject can be tracked reliably.
 <Modification of the Second Embodiment>
 In the second embodiment, the left-eye image Ai was searched using the template image TA(i-1) and the composite template images TM(i-1)-1 to TM(i-1)-5. Alternatively, as in the first embodiment, the left-eye image Ai may first be searched with the template images TA(i-1) and TB(i-1), and the composite template images may be used to search the left-eye image Ai only when the subject has been lost.
 FIG. 12 is a flowchart showing the flow of processing for tracking the subject Z in the left-eye images A continuously captured at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in the program storage section within the CPU 110.
 Steps S10 to S22 are the same as in the first embodiment, so the description starts from step S60.
 The subject search unit 136 determines whether the search result of step S20 is a target lost, that is, whether no portion similar to the template images TA(i-1) and TB(i-1) was found when the left-eye image Ai was searched with them (step S60).
 If the target has not been lost (NO in step S60), then when the search of the left-eye image Ai has finished (YES in step S22), the subject search unit 136 determines whether the result of searching the left-eye image Ai with TA(i-1) and the result of searching it with TB(i-1) are the same (step S24). Steps S24 to S30 are the same as in the first embodiment, so their description is omitted. Thereafter, the process proceeds to step S74.
 If the target has been lost (YES in step S60), the composite template image generation unit 139 generates the composite template images TM(i-1)-1 to TM(i-1)-5 from the template images TA(i-1) and TB(i-1) (step S40).
 The subject search unit 136 searches the left-eye image Ai using the composite template images TM(i-1)-1 to TM(i-1)-5, looking for portions similar to the composite template images TM(i-1)-1 to TM(i-1)-5 (step S62).
 The subject search unit 136 determines whether the search of the left-eye image Ai has finished (step S64); if it has not (NO in step S64), step S64 is performed again.
 When the search of the left-eye image Ai has finished (YES in step S64), the subject search unit 136 determines whether the search succeeded with a plurality of templates, that is, whether search results were obtained with a plurality of templates (step S66).
 If the search did not succeed with a plurality of templates (NO in step S66), there is only one search result for the subject Z. The subject search unit 136 therefore inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S74.
 If the search succeeded with a plurality of templates (YES in step S66), search results were obtained with a plurality of templates, so the subject search unit 136 calculates, for each template image (each search result), the similarity between the template image that yielded a successful search and the portion found with it. The subject search unit 136 then determines whether there are a plurality of results with the highest similarity (step S68).
 If there are a plurality of results with the highest similarity (YES in step S68), the subject search unit 136 takes the portion similar to the composite template image closest to the template image TA(i-1) as the position of the subject Z (step S70). If there are not (NO in step S68), the subject search unit 136 takes the search result with the highest similarity as the position of the subject Z (step S72). The subject search unit 136 inputs the position of the subject Z obtained in step S70 or S72 and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S74.
 Based on the position of the subject Z set in step S22, S28, S30, S62, S70 or S72, the template image generation unit 135 extracts from the left-eye image Ai a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S74). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image.
 The template image generation unit 135 generates a template image TBi from the right-eye image Bi by the same method as in step S12 (step S34). Thereafter, i is set to i+1 (step S36), the process returns to step S16, and the processing of steps S16 to S36 is performed again.
 According to this embodiment, the subject is tracked using a plurality of templates showing the subject in different orientations as keys, so the possibility of losing the subject during tracking can be further reduced even when the subject moves or changes orientation. In addition, since composite template images are created and used for searching only when the search with the template images fails, no unnecessary processing is performed and the processing time can be shortened.
 <Third Embodiment>
 In the first embodiment of the present invention, subject tracking was performed by searching at least one of the right-eye image and the left-eye image using a template image generated by extracting part of the right-eye image and a template image generated by extracting part of the left-eye image; however, subject tracking may fail when, for example, the entire subject to be tracked is hidden behind an obstruction.
 The third embodiment of the present invention is a mode for avoiding losing sight of the tracking target even when, for example, the entire subject to be tracked is hidden behind an obstruction. The compound-eye digital camera 3 of the third embodiment is described below. Parts identical to those of the first embodiment are given the same reference numerals and their description is omitted.
 FIG. 13 is a block diagram showing the main internal configuration of the compound-eye digital camera 3. The compound-eye digital camera 3 mainly comprises a CPU 110, operation means 112 (the release switch 20, MENU/OK button 25, cross button 26, etc.), an SDRAM 114, a VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, a video encoder 134, a template image generation unit 135, a subject search unit 136, a media controller 137, sound input processing means 138, a recording medium 140, a template image regeneration unit 141, focus lens driving means 142 and 143, zoom lens driving means 144 and 145, aperture driving means 146 and 147, and timing generators (TG) 148 and 149.
 When, as a result of searching with the template images generated by the template image generation unit 135, the tracking target is lost in one of the left-eye image A and the right-eye image B, the template image regeneration unit 141 generates, using the image in which the tracking target was found, a template image for searching the image in which the tracking target was lost. In the present embodiment, based on the search result of the image in which the tracking target was found, a part of that image is extracted to generate the template image. The subject search unit 136 uses the template image generated by the template image regeneration unit 141 to search the image in which the tracking target was lost. Details of the processing of the template image regeneration unit 141 are described later.
 The operation of the compound-eye digital camera 3 configured as described above will now be described. Since the only difference between the first embodiment and the third embodiment is the subject tracking processing in the 3D shooting mode, only the subject tracking processing is described and other description is omitted.
 FIG. 14 is a flowchart showing the flow of processing for tracking the subject Z in the left-eye images A continuously captured at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in the program storage section within the CPU 110.
 The left-eye image A captured immediately before the frame to be processed, in this case the left-eye image A0 captured immediately before the processing for tracking the subject Z (see FIG. 15), is input to the template image generation unit 135. The template image generation unit 135 extracts from the left-eye image A0 a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TA0 (see FIG. 16) (step S10).
 The right-eye image B captured immediately before the frame to be processed, in this case the right-eye image B0 captured immediately before the processing for tracking the subject Z (see FIG. 15), or a right-eye image B whose capture timing is as close as possible to the frame to be processed, is input to the template image generation unit 135, and the template image generation unit 135 generates a template image TB0 (see FIG. 16) from the right-eye image B0 by the same method as in step S10 (step S12).
 i is set to 1 (step S14); that is, capture and processing of the first image starts. Note that i is a positive integer.
 The left imaging system 13 captures a left-eye image Ai (A1 here, since i = 1; see FIG. 17) (step S16). Since subject tracking processing is performed on the left-eye image Ai, the captured left-eye image Ai is input to the subject search unit 136. At the same time, the left-eye image Ai is input to the video encoder 134, sequentially converted into a display signal format by the video encoder 134, and output to the monitor 16.
 The right imaging system 12 captures a right-eye image Bi (B1 here, since i = 1; see FIG. 17) (step S18). Since subject tracking processing is not performed on the right-eye image Bi, the captured right-eye image Bi is input to the template image generation unit 135. At the same time, the right-eye image Bi is input to the video encoder 134, sequentially converted into a display signal format by the video encoder 134, and output to the monitor 16.
 The subject search unit 136 acquires the generated template image TA(i-1) from the template image generation unit 135 and searches the left-eye image Ai with the template image TA(i-1), looking for a portion similar to the template image TA(i-1) (step S80). Since i = 1 here, the left-eye image A1 is searched using the template image TA0 generated in step S10.
 The subject search unit 136 determines whether the search of the left-eye image Ai succeeded (step S82). Since i = 1 here, it is determined whether a portion similar to the template image TA0 was found in the left-eye image A1.
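 The embodiment does not fix a criterion for judging search success; one common choice, shown here purely as an assumed sketch, is to threshold the best normalized matching score.

```python
import cv2

def search_succeeded(image, template, score_thresh=0.7):
    """Sketch of a success test for one template search (assumed criterion).

    Returns (success, top_left): success is True when the best normalized
    cross-correlation score reaches the illustrative threshold.
    """
    res = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    return max_val >= score_thresh, max_loc
```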
 If the search of the left-eye image Ai succeeded (YES in step S82), the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S100.
 If the search of the left-eye image Ai did not succeed (NO in step S82), a possible cause is that, as shown for example in FIG. 17, the tracking target subject Z is covered by an obstruction (here, a car C, another subject located in front of the subject Z) and cannot be seen. In this case, the subject search unit 136 acquires the generated template image TB(i-1) from the template image generation unit 135. Since the right-eye image Bi captured in step S18 has been input to the template image generation unit 135, the subject search unit 136 also acquires the right-eye image Bi from the template image generation unit 135. The subject search unit 136 then searches the right-eye image Bi with the template image TB(i-1), looking for a portion similar to the template image TB(i-1) (step S84). Since i = 1 here, it is determined whether a portion similar to the template image TB0 is found in the right-eye image B1.
 The subject search unit 136 determines whether the search of the right-eye image Bi succeeded (step S86).
 If the search of the right-eye image Bi failed (NO in step S86), the search as a whole has failed, so the subject search unit 136 performs tracking error processing (step S101). The tracking error processing may, for example, display a message on the monitor 16 indicating a tracking error, but is not limited to this.
 If the search of the right-eye image Bi succeeded (YES in step S86), the subject search unit 136 inputs the search result to the template image regeneration unit 141, and the template image regeneration unit 141 extracts a part of the right-eye image Bi to generate a template image TBi-L1 (step S88). The position of the template image TBi-L1 is a position estimated, from the positional relationship between the subject and the compound-eye digital camera, to be close to the position of the subject Z in the left-eye image A. The processing by which the template image regeneration unit 141 generates the template image TBi-L1 is described in detail below.
 In the right-eye image B1 shown in FIG. 17, the tracking target subject Z was found in step S84. First, the template image regeneration unit 141 takes a predetermined region containing this subject Z as a template image TB1 (dotted line in FIG. 17; see FIG. 18). Like the template image TB0, the template image TB1 is generated by extracting a rectangular region with a margin of several pixels around the contour of the subject Z.
 As shown in FIG. 19, when the tracking target subject Z is covered by an obstruction (here, the car C), in the image from the viewpoint to the right of the viewpoint used for tracking, the obstruction is located to the left of the subject Z. Here, since tracking is performed on the left-eye image A, the obstruction is located to the left of the subject Z in the right-eye image B. Therefore, for the right-eye image B1, the template image regeneration unit 141 extracts the region shifted to the left of the position of the template image TB1 by the width of the template image TB1, generating a template image TB1-L1 (dotted line in FIG. 17; see FIG. 18). The size of the template image TB1-L1 is the same as that of the template image TB1.
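 A minimal sketch of this shifted extraction, assuming the found template's region is given as a bounding box (x, y, w, h) on a NumPy array of the right-eye image; the behavior at the image edge is an illustrative assumption.

```python
def shifted_template(image, box, steps=1):
    """Extract a same-size region shifted left by `steps` template widths (sketch).

    With steps=1 this corresponds to TBi-L1, with steps=2 to TBi-L2.
    Returns None when the shifted region would leave the frame.
    """
    x, y, w, h = box
    x_shifted = x - steps * w
    if x_shifted < 0:
        return None  # the shifted region would fall outside the image
    return image[y:y + h, x_shifted:x_shifted + w].copy()
```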
 The subject search unit 136 searches the left-eye image Ai with the template image TBi-L1 generated by the template image regeneration unit 141 in step S88 (step S90).
 The subject search unit 136 determines whether the search of the left-eye image Ai succeeded (step S92).
 If the search of the left-eye image Ai succeeded (YES in step S92), the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S100.
 If the search of the left-eye image Ai failed (NO in step S92), the search result is input to the template image regeneration unit 141, and by the same method as in step S88 the template image regeneration unit 141 extracts the region shifted to the left of the position of the template image TBi-L1 by the width of the template image TBi-L1, generating a template image TBi-L2 (step S94). From the right-eye image B1 shown in FIG. 17, a template image TB1-L2 (see the dotted line in FIG. 17) is generated.
 The subject search unit 136 searches the left-eye image Ai with the template image TBi-L2 generated by the template image regeneration unit 141 in step S94 (step S96).
 The subject search unit 136 determines whether the search of the left-eye image Ai succeeded (step S98).
 If the search of the left-eye image Ai failed (NO in step S98), the search as a whole has failed, so the subject search unit 136 performs tracking error processing (step S101). If the search of the left-eye image Ai succeeded (YES in step S98), the subject search unit 136 inputs the search result (the position of the subject Z) and the left-eye image Ai to the template image generation unit 135. Thereafter, the process proceeds to step S100.
 If the search is judged successful in step S82, the template image generation unit 135 extracts, based on the position of the subject Z set in step S80, a region from the left-eye image Ai that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S100). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image.
 If the search is judged unsuccessful in step S82, the template image generation unit 135 takes the template image TA(i-1) used in the search of step S80 as the template image TAi (step S100). That is, when the search is judged unsuccessful in step S82, the template image with which the tracking target subject Z was last found continues to be used for the next frame. Thus, when the tracking target subject Z is covered by the car C in front of it and cannot be seen (see FIG. 17), the subject Z can be found again once the car C moves and the subject Z becomes visible.
 Thereafter, i is set to i+1 (step S36), the process returns to step S16, and the processing of steps S16 to S36 is performed again.
 According to the present embodiment, even when the tracking target is blocked by another subject, the position where the tracking target is likely to be is estimated from the positional relationship between the subject and the compound-eye digital camera and searched, so the possibility of losing sight of the subject can be reduced.
 In the present embodiment, the left-eye image Ai was searched with the template image TA(i-1) (step S80), and when that failed, the left-eye image Ai was searched with the template image TBi-L1 generated by the template image regeneration unit 141 (step S90). Instead of step S80, however, the left-eye image Ai may be searched using the template images TA(i-1) and TB(i-1). In that case, the left-eye image Ai is searched with the template images TA(i-1) and TB(i-1), and only when that fails is it searched with the template image TBi-L1 generated by the template image regeneration unit 141. That is, accurate tracking is attempted first, and only when accurate tracking is impossible is processing performed that is less accurate but reduces the possibility of losing sight of the subject. This provides subject tracking processing that is both accurate and unlikely to lose the subject. In this case, the left-eye image Ai+1 may then be searched using the template images TA(i-1) and TB(i-1).
 In the present embodiment, tracking error processing (step S101) was performed when the subject was lost; alternatively, the tracking error processing may be omitted when the subject is lost and the process may proceed to step S36, that is, to processing of the next frame.
 In the present embodiment, processing for tracking the subject Z in the left-eye images A continuously captured at a predetermined frame rate has been described, but the subject Z may instead be tracked in the right-eye images B, or in both the left-eye images A and the right-eye images B. FIG. 20 is a flowchart showing the flow of processing for tracking the subject Z in the right-eye images B continuously captured at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in the program storage section within the CPU 110. Parts of the processing of FIG. 20 identical to those of FIG. 14 are given the same reference numerals and their description is omitted.
 The subject search unit 136 acquires the generated template image TB(i-1) from the template image generation unit 135 and searches the right-eye image Bi with the template image TB(i-1), looking for a portion similar to the template image TB(i-1) (step S102).
 The subject search unit 136 determines whether the search of the right-eye image Bi succeeded (step S104). If it succeeded (YES in step S104), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135. Thereafter, the process proceeds to step S122.
 If the search of the right-eye image Bi did not succeed (NO in step S104), the subject search unit 136 acquires the generated template image TA(i-1) from the template image generation unit 135. The subject search unit 136 also acquires the left-eye image Ai from the template image generation unit 135. The subject search unit 136 then searches the left-eye image Ai with the template image TA(i-1), looking for a portion similar to the template image TA(i-1) (step S106).
 The subject search unit 136 determines whether the search of the left-eye image Ai succeeded (step S108).
 If the search of the left-eye image Ai failed (NO in step S108), the search as a whole has failed, so the subject search unit 136 performs tracking error processing (step S101).
 If the search of the left-eye image Ai succeeded (YES in step S108), the subject search unit 136 inputs the search result to the template image regeneration unit 141, and the template image regeneration unit 141 extracts a part of the left-eye image Ai to generate a template image TAi-R1 (step S110). The position of the template image TAi-R1 is a position estimated, from the positional relationship between the subject and the compound-eye digital camera, to be close to the position of the subject Z in the right-eye image B.
 When the tracking target subject Z is covered by an obstruction, in the image from the viewpoint to the left of the viewpoint used for tracking, the obstruction is located to the right of the subject Z. That is, when the subject Z is covered by an obstruction in the right-eye image B, the obstruction is located to the right of the subject Z in the left-eye image A. Therefore, for the left-eye image Ai, the template image regeneration unit 141 extracts the region shifted to the right of the position of the template image TAi by the width of the template image TAi, generating a template image TAi-R1. The size of the template image TAi-R1 is the same as that of the template image TAi.
 The subject search unit 136 searches the right-eye image Bi with the template image TAi-R1 generated by the template image regeneration unit 141 in step S110 (step S112). The subject search unit 136 then determines whether the search of the right-eye image Bi succeeded (step S114).
 If the search of the right-eye image Bi succeeded (YES in step S114), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135. Thereafter, the process proceeds to step S122.
 If the search of the right-eye image Bi failed (NO in step S114), the search result is input to the template image regeneration unit 141, and by the same method as in step S110 the template image regeneration unit 141 extracts the region shifted to the right of the position of the template image TAi-R1 by the width of the template image TAi-R1, generating a template image TAi-R2 (step S116).
 The subject search unit 136 searches the right-eye image Bi with the template image TAi-R2 generated by the template image regeneration unit 141 in step S116 (step S118). The subject search unit 136 then determines whether the search of the right-eye image Bi succeeded (step S120).
 If the search of the right-eye image Bi failed (NO in step S120), the search as a whole has failed, so the subject search unit 136 performs tracking error processing (step S101). If the search of the right-eye image Bi succeeded (YES in step S120), the subject search unit 136 inputs the search result (the position of the subject Z) and the right-eye image Bi to the template image generation unit 135. Thereafter, the process proceeds to step S122.
 If the search is judged successful in step S104, the template image generation unit 135 extracts, based on the position of the subject Z set in step S102, a region from the right-eye image Bi that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TBi (step S122). If the search is judged unsuccessful in step S104, the template image generation unit 135 takes the template image TB(i-1) used in the search of step S102 as the template image TBi (step S122).
 In this way, even when the tracking target is blocked by another subject, the position where the tracking target is likely to be is estimated from the positional relationship between the subject and the compound-eye digital camera and searched, so the possibility of losing sight of the subject can be further reduced.
 In the present embodiment, the template image regeneration unit 141 extracted the region shifted to the left of the position of the template image TBi by the width of the template image TBi to generate the template image TBi-L1, but the way of determining the position of the template image TBi-L1 is not limited to this. For example, the position of the template image TBi-L1 may be a region shifted to the left by half the width of the template image TBi, a region near the template image TBi with large changes in luminance or color, or a region with many edges and contours.
 Alternatively, the position of the subject Z may be estimated from the left-eye image A and the right-eye image B and used as the position of the template image TBi-L1. An example of a method of estimating the position of the subject Z is described with reference to FIGS. 21 and 22. When the tracking target subject Z cannot be found in the left-eye image A1 of FIG. 22, as shown in FIG. 21 the position (x0+dx, y0+dy) of the template image TA0 in the left-eye image A0 is calculated with the position (x0, y0) of the template image TB0 in the right-eye image B0 as a reference. The difference (dx, dy) between the position (x0, y0) of the template image TB0 in the right-eye image B0 and the position (x0+dx, y0+dy) of the template image TA0 in the left-eye image A0 is thereby obtained. Accordingly, if the position of the template image TB1 in the right-eye image B1 is (x1, y1), the position of the tracking target subject Z in the left-eye image A1 can be estimated as (x1+dx, y1+dy), and a template image may be generated at this position. As a result, the template image regeneration unit 141 needs to generate a template image only once, and the time required to track the subject after it is lost can be shortened.
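 A minimal sketch of this estimation, assuming the template positions are available as (x, y) top-left coordinates; the helper name is hypothetical.

```python
def predict_lost_position(tb0_pos, ta0_pos, tb1_pos):
    """Estimate the subject position in the lost view from the other view (sketch).

    (dx, dy) is the inter-view offset observed in the previous frame; it is
    applied to the position found in the surviving view in the current frame.
    """
    dx = ta0_pos[0] - tb0_pos[0]
    dy = ta0_pos[1] - tb0_pos[1]
    x1, y1 = tb1_pos
    return (x1 + dx, y1 + dy)  # estimated position of subject Z in A1
```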
 In the present embodiment, the template image regeneration unit 141 generated the template images TBi-L1 and TBi-L2 and performed error processing when the search with the template image TBi-L2 failed, but the number of template images generated by the template image regeneration unit 141 is not limited to two and may be set arbitrarily.
 <Fourth Embodiment>
 In the first embodiment of the present invention, subject tracking was performed by searching at least one of the right-eye image and the left-eye image using a template image generated by extracting part of the right-eye image and a template image generated by extracting part of the left-eye image; however, subject tracking may fail because of the background, foreground, or the like included in the template image.
 In the fourth embodiment of the present invention, the background and foreground are removed from the template images generated by extracting parts of the right-eye and left-eye images, and subject tracking is performed using the result. The compound-eye digital camera 4 of the fourth embodiment is described below. Parts identical to those of the first embodiment are given the same reference numerals and their description is omitted.
 FIG. 23 is a block diagram showing the main internal configuration of the compound-eye digital camera 4. The compound-eye digital camera 4 mainly comprises a CPU 110, operation means 112 (the release switch 20, MENU/OK button 25, cross button 26, etc.), an SDRAM 114, a VRAM 116, AF detection means 118, AE/AWB detection means 120, image sensors 122 and 123, CDS/AMPs 124 and 125, A/D converters 126 and 127, an image input controller 128, image signal processing means 130, compression/decompression processing means 132, a stereoscopic image generation unit 133, a video encoder 134, a subject search unit 136, a media controller 137, sound input processing means 138, a recording medium 140, focus lens driving means 142 and 143, zoom lens driving means 144 and 145, aperture driving means 146 and 147, timing generators (TG) 148 and 149, and a template image generation unit 150.
 The template image generation unit 150 extracts a predetermined region (for example, a rectangle) containing the tracking target subject Z from each of the left-eye image A and the right-eye image B to generate template images. The template image generation unit 150 also generates template images from which the background and foreground have been removed. The processing performed by the template image generation unit 150 is described in detail later.
 The operation of the compound-eye digital camera 4 configured as described above will now be described. Since the only difference between the first embodiment and the fourth embodiment is the subject tracking processing in the 3D shooting mode, only the subject tracking processing is described and other description is omitted.
 FIG. 24 is a flowchart showing the flow of processing for tracking the subject Z in the left-eye images A continuously captured at a predetermined frame rate. This processing is controlled by the CPU 110. A program for causing the CPU 110 to execute this processing is stored in the program storage section within the CPU 110.
 The left-eye image A captured immediately before the frame to be processed, in this case the left-eye image A0 captured immediately before the processing for tracking the subject Z, is input to the template image generation unit 150. The right-eye image B captured immediately before the frame to be processed, in this case the right-eye image B0 captured immediately before the processing for tracking the subject Z, is also input to the template image generation unit 150. The template image generation unit 150 generates parallax maps PA0 and PB0 (see FIG. 25) from the left-eye image A0 and the right-eye image B0 (step S130). A parallax map represents the amount of displacement between the left-eye image A and the right-eye image B; by referring to the parallax map, the distance of each subject contained in the image becomes clear. In FIG. 25, distance is represented by density: high density for near distances and low density for far distances. Various known methods can be used to generate the parallax map.
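 As one example of the known methods mentioned here, the following hedged sketch computes a dense disparity map with OpenCV's semi-global matching; the parameter values are illustrative assumptions, and the input pair is assumed to be rectified grayscale images.

```python
import cv2

def parallax_map(left_gray, right_gray):
    """Compute a disparity (parallax) map from a rectified stereo pair (sketch)."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,  # must be a multiple of 16
        blockSize=9,
    )
    disp = sgbm.compute(left_gray, right_gray)
    return disp.astype("float32") / 16.0  # SGBM returns fixed-point (1/16 px) values
```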
 The template image generation unit 150 extracts from the left-eye image A captured immediately before the frame to be processed (here, the left-eye image A0) a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TA0 (step S10).
 As shown in FIG. 25, the template image generation unit 150 extracts from the parallax map PA0 the same region as that of the template image TA0 in the left-eye image A0, generating a template parallax map TPA0. The template image generation unit 150 then sets as invalid regions the background, namely regions whose parallax is 10 or more pixels farther than the parallax at the approximate center of the template parallax map TPA0 (that is, the parallax of the subject Z), and the foreground, namely regions whose parallax is 10 or more pixels nearer (step S132). In the case of the template image TA0 in FIG. 26, part of the car is included as background. The template image TA0 in FIG. 26 contains no foreground, but examples of foreground include trees, utility poles, and the like. The value of ±10 pixels was chosen on the consideration that parallax within the subject Z is small while the parallax between the foreground/background and the subject Z is large; as long as this purpose is not departed from, the value is not limited to ±10 pixels.
 The template image generation unit 150 generates a template image SA0 by removing the background and foreground from the template image TA0 (step S134). The processing of step S134 is described below.
 As shown in FIG. 26, the template image generation unit 150 generates, from the template parallax map TPA0, mask data that masks the invalid regions set in step S132, that is, everything outside the parallax range of ±10 pixels around the parallax at the approximate center of the template parallax map TPA0.
 Then, as shown in FIG. 26, using the template image TA0 and the mask data, the template image generation unit 150 removes the background and foreground from the template image TA0 and generates a background/foreground-removed template image SA0.
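 A minimal sketch of steps S132 to S134, assuming the template parallax map is available as a per-pixel disparity array; the ±10-pixel band follows the text above, while sampling the subject disparity at the map center and the helper name are illustrative assumptions.

```python
import numpy as np

def remove_background_foreground(template, template_disp, band=10.0):
    """Mask pixels whose disparity differs from the subject's by more than `band` (sketch).

    The subject disparity is sampled at the approximate center of the
    template parallax map, as described for TPA0.
    """
    h, w = template_disp.shape
    center_disp = template_disp[h // 2, w // 2]
    mask = np.abs(template_disp - center_disp) <= band  # True inside the subject
    out = template.copy()
    out[~mask] = 0  # invalid (background/foreground) pixels are zeroed
    return out
```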
 i is set to 1 (step S14); that is, capture and processing of the first image starts. Note that i is a positive integer.
 The left imaging system 13 captures a left-eye image Ai (step S16). Since subject tracking processing is performed on the left-eye image Ai, the captured left-eye image Ai is input to the subject search unit 136. At the same time, the left-eye image Ai is input to the video encoder 134, sequentially converted into a display signal format by the video encoder 134, and output to the monitor 16.
 The right imaging system 12 captures a right-eye image Bi (step S18). Since subject tracking processing is not performed on the right-eye image Bi, the captured right-eye image Bi is input to the template image generation unit 150. At the same time, the right-eye image Bi is input to the video encoder 134, sequentially converted into a display signal format by the video encoder 134, and output to the monitor 16.
 The subject search unit 136 searches the left-eye image Ai using the generated template image TA(i-1) and the background/foreground-removed template image SA(i-1), looking for portions similar to the template image TA(i-1) and the background/foreground-removed template image SA(i-1) (step S136). Since i = 1 here, the left-eye image A1 is searched using the template image TA0 generated in step S10 and the background/foreground-removed template image SA0 generated in step S134.
 The subject search unit 136 determines whether the search of the left-eye image Ai has finished (step S138); if it has not (NO in step S138), step S138 is performed again.
 When the search of the left-eye image Ai has finished (YES in step S138), the subject search unit 136 determines whether the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching it with the background/foreground-removed template image SA(i-1) are the same (step S140).
 If the result of searching the left-eye image Ai with the template image TA(i-1) and the result of searching it with the background/foreground-removed template image SA(i-1) are the same (YES in step S140), the portion similar to the template image TA(i-1) and the background/foreground-removed template image SA(i-1) is taken as the position of the subject Z. The subject search unit 136 inputs the search result and the left-eye image Ai to the template image generation unit 150. Thereafter, the process proceeds to step S148.
 If the two search results are not the same (NO in step S140), the subject search unit 136 calculates the similarity between the result of searching the left-eye image Ai with the template image TA(i-1) and the template image TA(i-1) itself, and likewise the similarity between the result of searching with the background- and foreground-removed template image SA(i-1) and that template image. Known methods can be used to calculate the similarity, for example the difference between feature values, or the least-squares method in a feature space (a weighted space is also possible). The subject search unit 136 then determines whether the similarity obtained with the template image TA(i-1) is higher than the similarity obtained with the background- and foreground-removed template image SA(i-1) (step S142).
 If the similarity obtained with the template image TA(i-1) is higher (YES in step S142), the subject search unit 136 takes the result of searching the left-eye image Ai with the template image TA(i-1), that is, the portion similar to the template image TA(i-1), as the position of the subject Z (step S144). The subject search unit 136 inputs this position of the subject Z and the left-eye image Ai to the template image generation unit 135.
 If the similarity obtained with the template image TA(i-1) is not higher (NO in step S142), the subject search unit 136 takes the result of searching the left-eye image Ai with the background- and foreground-removed template image SA(i-1), that is, the portion similar to the background- and foreground-removed template image SA(i-1), as the position of the subject Z (step S146). The subject search unit 136 inputs this position of the subject Z and the left-eye image Ai to the template image generation unit 135.
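 The branch in steps S140 to S146 can be summarized as follows; this is a hypothetical sketch built on the search routine above, not the patented implementation itself.

```python
def decide_subject_position(results):
    """Choose the position of subject Z (steps S140 to S146).

    `results` maps "TA"/"SA" to (location, similarity), as returned
    by search_with_templates above.
    """
    loc_ta, sim_ta = results["TA"]
    loc_sa, sim_sa = results["SA"]
    if loc_ta == loc_sa:
        # Step S140 YES: both templates found the same portion.
        return loc_ta
    # Step S142: otherwise adopt the result whose template matched
    # its found portion more closely (steps S144 / S146).
    return loc_ta if sim_ta > sim_sa else loc_sa
```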
 In this way, the subject Z is located in the left-eye image Ai. Based on the position of the subject Z set in steps S136, S144, or S146, the template image generation unit 135 extracts from the left-eye image Ai a region that contains the subject Z and is large enough for the shape of the subject Z to be recognized, and generates a template image TAi (step S148). As in step S10, a rectangular region with a margin of several pixels around the contour of the subject Z is extracted and used as the template image. In the case shown in FIG. 8A1, the dotted-line portion is generated as the template image TA1.
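 The extraction in step S148 amounts to enlarging the subject's bounding box by a few pixels and clipping it to the image, as in this sketch. The bounding box input and the margin value are assumptions; the specification only states "a margin of several pixels".

```python
def make_template(image, bbox, margin=4):
    """Extract a new template TAi around subject Z (step S148).

    bbox = (x, y, w, h) is the subject's bounding box in `image`;
    `margin` stands in for the "margin of several pixels" around
    the contour, and its value here is an assumption.
    """
    img_h, img_w = image.shape[:2]
    x, y, w, h = bbox
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1, y1 = min(x + w + margin, img_w), min(y + h + margin, img_h)
    return image[y0:y1, x0:x1].copy()
```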
 Thereafter, i is set to i + 1 (step S36), the process returns to step S16, and the processing of steps S16 to S36 is performed again.
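 Putting the steps together, the per-frame loop of steps S14 to S36 has roughly the following shape. The function names refer to the sketches above and are illustrative; regeneration of the background- and foreground-removed template SA from parallax is omitted.

```python
def tracking_loop(left_frames, right_frames, ta0, sa0):
    """One possible shape for the per-frame loop (steps S14 to S36).

    left_frames / right_frames yield successive left- and right-eye
    viewpoint images; ta0 / sa0 are the initial templates from steps
    S10 and S12.
    """
    ta, sa = ta0, sa0
    for image_ai, _image_bi in zip(left_frames, right_frames):  # S16, S18
        results = search_with_templates(image_ai, ta, sa)       # S136
        x, y = decide_subject_position(results)                 # S140-S146
        h, w = ta.shape[:2]        # assume the subject keeps its size
        ta = make_template(image_ai, (x, y, w, h))              # S148
```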
 According to the present embodiment, even when the background changes because the subject has moved, or a foreground object passes in front of the subject, the possibility of losing sight of the subject can be reduced.
 In the present embodiment, the left-eye image Ai is searched in step S136 using both the template image TA(i-1) and the background- and foreground-removed template image SA(i-1); however, the left-eye image Ai may instead be searched using only the background- and foreground-removed template image SA(i-1). To reduce the possibility of losing sight of the subject, it is nevertheless desirable to search using both the template image TA(i-1) and the background- and foreground-removed template image SA(i-1).
 Also, in the present embodiment, the left-eye image Ai is searched from the outset using the template image TA(i-1) and the background- and foreground-removed template image SA(i-1); alternatively, the left-eye image Ai may first be searched using the template images TA(i-1) and TB(i-1), and only if that search fails may it be searched using the background- and foreground-removed template image SA(i-1). Further, the left-eye image Ai may be searched using a template image obtained by removing the background and foreground from the template image TB(i-1).
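 This fallback order might be sketched as below. The acceptance threshold and the single-template `match` helper are assumptions introduced for the sketch; the specification does not define how a failed search is detected.

```python
import cv2

def match(image, tmpl):
    """Single-template search returning best location and score."""
    score_map = cv2.matchTemplate(image, tmpl, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(score_map)
    return max_loc, max_val

def search_with_fallback(image_ai, ta, tb, sa, threshold=0.6):
    """Variant order: try TA(i-1) and TB(i-1) first; fall back to the
    background- and foreground-removed SA(i-1) only if both fail.
    `threshold` is an assumed acceptance score, not from the
    specification.
    """
    for tmpl in (ta, tb):
        loc, score = match(image_ai, tmpl)
        if score >= threshold:
            return loc  # search succeeded with an unmodified template
    loc, score = match(image_ai, sa)
    return loc if score >= threshold else None
```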
 The first to fourth embodiments have been described taking the capture of live-view images as an example, but they can also be applied whenever the right-eye image B and the left-eye image A are acquired continuously, for example during movie recording. The only difference between live-view capture and movie recording is that, while the continuously captured right-eye images B and left-eye images A are not recorded in the live-view case, in movie recording they are recorded on the recording medium 54. Since the process of recording the continuously captured right-eye image B data and left-eye image A data on the recording medium 54 is already known, its description is omitted.
 The first to fourth embodiments have also been described for the case where two viewpoint images, the left-eye image A and the right-eye image B, are captured, but they are equally applicable when three or more viewpoint images are captured. In this case as well, subject tracking may be performed on at least one of the three or more viewpoint images. When subject tracking is performed on all viewpoint images, it becomes possible to optimize not only focus, zoom, and exposure but also the display, for example by superimposing a frame on the tracked subject Z or by highlighting the tracked subject Z. Since various known methods are available for displaying the frame and for highlighting, their description is omitted.
 Furthermore, in the first to fourth embodiments, subject tracking is performed while live-view images are captured, and when the S1 ON signal is input, AE metering and AF control are performed on the tracked subject; that is, subject tracking is performed during live-view capture, and AE metering and AF control for the still image to be captured afterwards are performed on the tracked subject. Alternatively, AE metering and AF control may be performed continuously on the tracked subject during live-view capture. The frame display and highlighting may likewise be performed continuously during live-view capture. A subject may also be searched for in the still-image viewpoint image using a template image extracted from a live-view viewpoint image.
 1: compound-eye digital camera, 10: camera body, 11: barrier, 12: right imaging system, 13: left imaging system, 14: flash, 15: microphone, 16: monitor, 20: release switch, 21: zoom button, 22: mode button, 23: parallax adjustment button, 24: 2D/3D switching button, 25: MENU/OK button, 26: cross button, 27: DISP/BACK button, 110: CPU, 112: operation means, 114: SDRAM, 116: VRAM, 118: AF detection circuit, 120: AE/AWB detection means, 122, 123: imaging elements, 124, 125: CDS/AMP, 126, 127: A/D converters, 128: image input controller, 130: image signal processing means, 132: compression/expansion processing means, 133: stereoscopic image generation unit, 134: video encoder, 135, 150: template image generation means, 136: subject search unit, 137: media controller, 138: sound input processing means, 139: composite template image generation unit, 140: recording medium, 141: template image regeneration unit, 142, 143: focus lens driving means, 144, 145: zoom lens driving means, 146, 147: aperture driving means, 148, 149: timing generator (TG)

Claims (16)

  1.  An imaging device comprising:
     first imaging means and second imaging means for acquiring two viewpoint images of the same subject photographed from two viewpoints;
     first template image generation means for generating a first template image by extracting, from the viewpoint image captured by the first imaging means, a partial region containing a subject to be tracked, and generating a second template image by extracting, from the viewpoint image captured by the second imaging means, a partial region containing the subject to be tracked; and
     search means for searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
  2.  The imaging device according to claim 1, further comprising second template image generation means for generating, when the search means does not find the subject to be tracked, a composite template image from the first template image and the second template image by image composition processing,
     wherein, when the composite template image has been generated by the second template image generation means, the search means searches for the subject to be tracked using the composite template image.
  3.  The imaging device according to claim 2, wherein the composite template image is an image in an intermediate state between the first template image and the second template image.
  4.  The imaging device according to claim 2 or 3, wherein, when a plurality of search results are obtained, the search means calculates, for each of the plurality of search results, the similarity between the template image from which that search result was obtained and the result found using that template image, and takes the result found using the template image with the highest calculated similarity as the subject to be tracked.
  5.  The imaging device according to claim 2, 3 or 4, wherein, when a plurality of search results are obtained and a search result has been obtained using the first template image, the search means takes the result found using the first template image as the subject to be tracked.
  6.  The imaging device according to claim 2, 3 or 4, wherein the second template image generation means generates a plurality of types of the composite template image, and
     wherein, when a plurality of search results are obtained and no search result has been obtained using the first template image, the search means takes as the subject to be tracked the result found using the composite template image, among the plurality of generated composite template images, that is closest to the first template image.
  7.  The imaging device according to claim 1, further comprising third template image generation means for generating, when the search means does not find the subject to be tracked, a third template image by extracting, from the viewpoint image captured by the second imaging means, a region shifted in the left-right direction by an arbitrary amount from the region extracted when the second template image was generated,
     wherein the search means searches for the subject to be tracked using the generated third template image.
  8.  The imaging device according to claim 7, wherein, when the search means obtains no search result using the third template image, the third template image generation means generates a fourth template image by extracting, from the viewpoint image captured by the second imaging means, a region shifted in the left-right direction by a predetermined amount from the region extracted when the third template image was generated, and
     the search means searches for the subject to be tracked using the generated fourth template image.
  9.  The imaging device according to claim 7 or 8, wherein the third template image generation means determines the arbitrary amount by estimating, based on the two viewpoint images, the position where the subject to be tracked is likely to be.
  10.  The imaging device according to claim 7, 8 or 9, wherein the first imaging means and the second imaging means continuously acquire the two viewpoint images, and
     wherein, once the search means has found the subject to be tracked using a template image generated by the third template image generation means, the search means searches viewpoint images subsequently captured by the first imaging means using the first template image and the second template image.
  11.  The imaging device according to any one of claims 1 to 10, further comprising automatic exposure control means for performing automatic exposure control based on the subject to be tracked found by the search means, automatic focus adjustment means for adjusting the focus so that the subject to be tracked found by the search means is in focus, or zoom control means for adjusting the angle of view based on the subject to be tracked found by the search means.
  12.  An imaging device comprising:
     first imaging means and second imaging means for acquiring two viewpoint images of the same subject photographed from two viewpoints;
     template image generation means for generating a first template image by extracting, from the viewpoint image captured by the first imaging means, a partial region containing a subject to be tracked;
     parallax acquisition means for acquiring parallax from the two viewpoint images;
     fourth template image generation means for generating, based on the parallax acquired by the parallax acquisition means, a template image in which the background and foreground have been removed from the first template image; and
     search means for searching for the subject to be tracked, using the template image generated by the fourth template image generation means, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
  13.  The imaging device according to claim 12, wherein the search means searches for the subject to be tracked using the first template image generated by the template image generation means and the template image generated by the fourth template image generation means.
  14.  A stereoscopic image capturing method comprising the steps of:
     acquiring, with first imaging means and second imaging means, two viewpoint images of the same subject photographed from two viewpoints;
     generating a first template image by extracting, from the viewpoint image captured by the first imaging means, a partial region containing a subject to be tracked, and generating a second template image by extracting, from the viewpoint image captured by the second imaging means, a partial region containing the subject to be tracked; and
     searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
  15.  A program causing an arithmetic device to execute the steps of:
     acquiring, with first imaging means and second imaging means, two viewpoint images of the same subject photographed from two viewpoints;
     generating a first template image by extracting, from the viewpoint image captured by the first imaging means, a partial region containing a subject to be tracked, and generating a second template image by extracting, from the viewpoint image captured by the second imaging means, a partial region containing the subject to be tracked; and
     searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
  16.  A computer-readable recording medium on which is recorded a program causing an arithmetic device to execute the steps of:
     acquiring, with first imaging means and second imaging means, two viewpoint images of the same subject photographed from two viewpoints;
     generating a first template image by extracting, from the viewpoint image captured by the first imaging means, a partial region containing a subject to be tracked, and generating a second template image by extracting, from the viewpoint image captured by the second imaging means, a partial region containing the subject to be tracked; and
     searching for the subject to be tracked, using the first template image and the second template image, in a viewpoint image captured by the first imaging means at a time different from the viewpoint image on which the first template image is based.
PCT/JP2012/062369 2011-07-05 2012-05-15 Imaging device, three-dimensional image capturing method and program WO2013005477A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011149440 2011-07-05
JP2011-149440 2011-07-05

Publications (1)

Publication Number Publication Date
WO2013005477A1 true WO2013005477A1 (en) 2013-01-10

Family

ID=47436835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/062369 WO2013005477A1 (en) 2011-07-05 2012-05-15 Imaging device, three-dimensional image capturing method and program

Country Status (1)

Country Link
WO (1) WO2013005477A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002269570A (en) * 2001-03-09 2002-09-20 Toyota Motor Corp Recognition system for surrounding
JP2008059148A (en) * 2006-08-30 2008-03-13 Fujifilm Corp Image processing apparatus

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103595916A (en) * 2013-11-11 2014-02-19 南京邮电大学 Double-camera target tracking system and implementation method thereof
JP2017028655A (en) * 2015-07-28 2017-02-02 日本電気株式会社 Tracking system, tracking method and tracking program
JP2020141426A (en) * 2020-06-15 2020-09-03 日本電気株式会社 Tracking system, tracking method and tracking program
JP7001125B2 (en) 2020-06-15 2022-01-19 日本電気株式会社 Tracking system, tracking method and tracking program

Similar Documents

Publication Title
JP4783465B1 (en) Imaging device and display device
US9077976B2 (en) Single-eye stereoscopic image capturing device
US20110018970A1 (en) Compound-eye imaging apparatus
JP5269252B2 (en) Monocular stereoscopic imaging device
US9258545B2 (en) Stereoscopic imaging apparatus
US20130113892A1 (en) Three-dimensional image display device, three-dimensional image display method and recording medium
JP5415170B2 (en) Compound eye imaging device
US8823778B2 (en) Imaging device and imaging method
JP4763827B2 (en) Stereoscopic image display device, compound eye imaging device, and stereoscopic image display program
JP5231771B2 (en) Stereo imaging device
JP4533735B2 (en) Stereo imaging device
US20110050856A1 (en) Stereoscopic imaging apparatus
JP2011075675A (en) Compound-eye imaging apparatus
JP2011022501A (en) Compound-eye imaging apparatus
JP2009128969A (en) Imaging device and method, and program
JP2007225897A (en) Focusing position determination device and method
WO2013005477A1 (en) Imaging device, three-dimensional image capturing method and program
JP2007279333A (en) Device and method for deciding focusing position
JP2012028871A (en) Stereoscopic image display device, stereoscopic image photographing device, stereoscopic image display method, and stereoscopic image display program
JP2010200024A (en) Three-dimensional image display device and three-dimensional image display method
JP4874923B2 (en) Image recording apparatus and image recording method
JP5307189B2 (en) Stereoscopic image display device, compound eye imaging device, and stereoscopic image display program
JP5087027B2 (en) Compound eye imaging device
JP2011259405A (en) Imaging device and imaging method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12807374; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 12807374; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)