CN112040214A - Double-camera three-dimensional imaging system and processing method - Google Patents
- Publication number
- CN112040214A (application CN201910481518.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- camera
- camera lens
- lens
- resolution
- Prior art date
- Legal status (an assumption, not a legal conclusion; Google has not performed a legal analysis)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The invention discloses a self-adaptive three-dimensional imaging system. A light field imaging section comprises a first imaging section, a first camera lens, and a second camera or camera lens. The first camera lens and the second camera or camera lens are located at the rear and the front of the lens section, respectively; an entrance pupil plane and matching device lies between them, and an internal reflection unit is formed between the first camera lens and the entrance pupil plane and matching device. A high-resolution imaging section further comprises a second imaging section and a third camera lens. The light field imaging section and the high-resolution imaging section are configured such that the third camera lens obtains a second image whose viewing direction coincides with that of the front view among the plurality of secondary images, and the plurality of secondary images and the second image are output simultaneously. The invention can acquire accurate depth information as well as high-resolution video for analyzing the three-dimensional image.
Description
Technical Field
The invention belongs to the field of stereo imaging, and particularly relates to a double-camera three-dimensional stereo imaging system and a processing method based on a light field technology.
Background
There are many existing designs for cameras used to capture three-dimensional images and videos. The most common solution arranges two camera modules of the same specification a fixed distance apart, for example about 60 to 65 mm, in a linear layout, so as to simulate the stereoscopic vision of human eyes. The image sensors of the two camera modules each record the two-dimensional images or videos they capture; software processing can then build a depth map from the two-dimensional images or videos and convert them into three-dimensional images or videos. Another solution is to shoot directly with a stereo camera. Two image sensors arranged in the camera body record two-dimensional images or videos from the two lens groups of the camera lens, and the camera's accompanying system and software then synthesize the two-dimensional images into a three-dimensional image, or the two-dimensional videos into a three-dimensional video. However, both schemes may fail to synchronize the two two-dimensional images or videos, and the quality of the resulting three-dimensional images and videos may be degraded by external factors such as the lighting conditions of the environment.
More advanced image capture devices, such as light field cameras (also known as plenoptic cameras), use a microlens-array lens to capture a light field image of a scene in a single shot, extract depth information of the scene by computation to create a depth map, and convert the two-dimensional image into a three-dimensional image. However, the main disadvantages of such light field imaging devices are a significant drop in image resolution, a small parallax angle, and poor suitability for capturing video. A recent design adds a reflection unit to capture multi-angle images of the target object; because the parallax angle is larger, the processed images yield clearer depth maps and three-dimensional images, and the approach is also suitable for shooting video, but this attempt still does not solve the problem of resolution loss.
Disclosure of Invention
The invention aims to provide a double-camera three-dimensional stereo imaging system and a processing method for improving the resolution of a three-dimensional video. The imaging system has wide application in a variety of fields, such as medical, biotech research, industrial equipment manufacturing, semiconductor product quality verification, and the like. The invention can acquire accurate depth information and also can acquire high-quality video to analyze the three-dimensional image.
The invention provides a double-camera three-dimensional imaging system, comprising: a light field imaging section that obtains a first image, and a high-resolution imaging section that obtains a second image. The light field imaging section includes a first imaging section, a first camera lens, and a second camera or camera lens. The first camera lens and the second camera or camera lens are located at the rear and the front of the lens section, respectively. An entrance pupil plane and matching device is located between the first camera lens and the second camera or camera lens and can adapt to different focal lengths of the second camera or camera lens. An internal reflection unit is formed between the first camera lens and the entrance pupil plane and matching device; the internal reflection unit decomposes and refracts the captured first image into a plurality of secondary images with different angular offsets. The high-resolution imaging section further comprises a second imaging section and a third camera lens. A central axis adjusting device can adjust the twin lenses of the first camera lens and the second camera or camera lens and the single lens of the third camera lens so that the axes of the twin lenses and the single lens are parallel. The light field imaging section and the high-resolution imaging section are configured such that the third camera lens obtains a second image whose viewing direction coincides with that of the front view among the plurality of secondary images, and the plurality of secondary images and the second image are output simultaneously.
In one aspect of the invention, the light field image pick-up section is located as close as possible to the high resolution image pick-up section and both are centered on the same vertical plane.
In one aspect of the invention, the plurality of secondary images with different angular offsets have offsets in the range of 10 to 20 degrees.
In one aspect of the invention, the angular offset of the front view among the plurality of secondary images is 0 degrees. The first imaging section further includes a first image sensor and a fly-eye lens that captures the first image; the fly-eye lens transmits the captured first image to the first image sensor. The second imaging section further includes a second image sensor; the second image obtained by the third camera lens is transmitted to the second image sensor.
In one aspect of the invention, the fly-eye lens is a plurality of micro-lens arrays, and the radius, thickness and array pitch of each micro-lens are related to the size of the first image sensor.
In one aspect of the present invention, the first camera lens and the second camera or camera lens have adjustable apertures and focal lengths, the second camera or camera lens and the third camera lens are interchangeable lenses, and the aperture of the second camera or camera lens is larger than the size of the internal reflection unit.
In one aspect of the invention, the entrance pupil plane and the matching means are pupil lenses having a diameter larger than the diameter of the internal reflection unit and allowing the incident light rays of the light field image to be refracted in the internal reflection unit.
In one aspect of the present invention, each of the secondary images has a subtle difference of a scene, and the size of the internal reflection unit and the focal length of each of the secondary images are calculated based on the following equations (1) and (2):
wherein FOV is the field of view of the second camera or camera lens;
n is the refractive index of the internal reflection unit;
r is the number of internal reflections;
Z is the size of the internal reflection unit;
f_lens is the focal length of the second camera or camera lens;
f_sub is the focal length of the secondary image.
The invention also provides a processing method for double-camera three-dimensional imaging, comprising the following steps: obtaining original depth map data of a first image with the light field imaging section; correcting the original depth map data; obtaining an interpolated high-resolution depth map using an edge-directed or directional rendering method; and simultaneously obtaining a second image with the high-resolution imaging section, and correcting the original depth map data of the first image using a data model with the second image as reference data, until an optimal interpolated high-resolution depth map is obtained.
The three-dimensional imaging system and processing method provided by the invention can deliver two-dimensional and three-dimensional videos of higher resolution, while the added cost is very limited compared with fitting a light field camera with a high-resolution image sensor. In addition, since the system of the invention does not affect the function of the light field camera portion, the information obtained by the light field camera itself can still be used to calculate object depth and build a depth map.
Drawings
In order to more clearly illustrate the technical solution in the embodiments of the present invention, the drawings required by the embodiments are briefly described below. The drawings in the following description are only examples of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a perspective view of a three-dimensional imaging system of the present invention.
Fig. 2 is a block diagram of a three-dimensional imaging system of the present invention.
Fig. 3 is a schematic diagram of a three-dimensional imaging system of the present invention acquiring a first image 120.
Fig. 4 is a schematic diagram of the three-dimensional imaging system according to the present invention after normalizing the acquired first image 120.
Fig. 5 is a flow chart of the three-dimensional imaging system of the present invention processing the second image 130.
FIG. 6 is a flow chart of obtaining an image of a target by the three-dimensional imaging system of the present invention.
Detailed Description
Specific embodiments of the present invention will now be described with reference to the accompanying drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this description is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the detailed description of the embodiments illustrated in the accompanying drawings is not intended to limit the invention.
Fig. 1 is a perspective view of a three-dimensional imaging system of the present invention. The three-dimensional imaging system is composed of a light field imaging section 100 for obtaining a first image 120 (not shown in Fig. 1) and a high-resolution imaging section 140 for obtaining a second image 130 (not shown in Fig. 1). The light field imaging section 100 may be the light field camera of Chinese patent application 201711080588.6, which includes a first imaging section 110, a first camera lens 101, and a second camera or camera lens 103. The first camera lens 101 is a rear camera lens with adjustable aperture and focal length. The second camera or camera lens 103 is a front camera or camera lens whose focal length can be adjusted. Between the first camera lens 101 and the second camera or camera lens 103 is an entrance pupil plane and matching device 109, which may be a pupil lens, with an internal reflection unit 102 between the pupil lens 109 and the first camera lens 101. The high-resolution imaging section 140 and the light field imaging section 100 are fixed together as one body; the high-resolution imaging section 140 includes a second imaging section 116, and the lens central axis 112a (see Fig. 2) of its third camera lens 117 is held parallel to the lens central axis 112b (see Fig. 2) of the first camera lens 101 and the second camera or camera lens 103 in the light field imaging section 100 by the central axis adjusting device 118.
Fig. 2 is a block diagram of a dual-camera three-dimensional imaging system of the present invention. The light field imaging section 100 comprises a first imaging section 110 and a lens section 111. The first imaging section 110 includes a first image sensor 104 and a fly-eye lens 105; the first image sensor 104 is an image sensor with higher imaging quality. The fly-eye lens 105 is formed from a combination of lenslets that capture image information, such as light field information, from different angles, thereby extracting three-dimensional information to identify a particular object. The fly-eye lens 105 consists of a microlens array designed to produce a depth map in addition to capturing a light field image. Since the fly-eye lens 105 feeds the first image sensor 104, its parameters are tied to those of the sensor. For example, each microlens of the fly-eye lens 105 has a radius of 0.5 mm and a thickness of 0.9 micrometers, and the array pitch of the microlenses is 60 micrometers. The fly-eye lens scales in size with the first image sensor 104. In one embodiment, an Advanced Photo System type-C (APS-C) image sensor of 25 mm × 17 mm is used; in another embodiment, a full-frame image sensor of 37 mm × 25 mm is used.
The lens section 111 is detachably connected to the first imaging section 110. The pupil lens 109 may be a single lens that acts as a condenser and compresses the information received through the second camera or camera lens 103. An imaging process is performed at the second camera or camera lens 103, and the imaging angle changes when the second camera or camera lens 103 is exchanged. The first camera lens 101 is a short-focus or macro lens fixed to a housing (not shown in Fig. 2); the design of the first camera lens 101 determines the size of the imaging system of the present invention. A secondary imaging process is performed at the first camera lens 101. The entrance pupil plane and matching device 109 is designed to correct the light. Between the entrance pupil plane and matching device 109 and the first camera lens 101 is the internal reflection unit 102, which decomposes and reflects the captured image into multi-angle images, that is, independent secondary images with different angular offsets. The internal reflection unit 102 is designed to provide a plurality of virtual images at different viewing angles. Its size and proportions determine the number of reflections and the scale of the reflected images, yielding images at different angles. Each reflection produces a secondary image with subtle differences in the scene, the target image being slightly offset. The size of the internal reflection unit 102 and the focal length of each secondary image may be calculated based on the following equations (1) and (2):
wherein FOV is the field of view of the second camera or camera lens;
n is the refractive index of the internal reflection unit;
r is the number of internal reflections;
X, Y, Z are the width, height, and length of the internal reflection unit, respectively;
f_lens is the focal length of the second camera or camera lens;
f_sub is the focal length of the secondary image.
The internal reflection unit 102 may match the size of the first image sensor 104; in one embodiment it is 24 mm (width) × 36 mm (height) × 95 mm (length), that is, a ratio of about 2:3:8. The pupil lens 109 matches the size of the secondary image to the size of the internal reflection unit 102 so that light reflects correctly inside the unit. To achieve this, the diameter of the pupil lens 109 should be larger than the internal reflection unit 102. In one embodiment the pupil lens 109 is approximately 50 mm in diameter with a 50 mm focal length. The second camera or camera lens 103 can be replaced by any camera or camera lens, as long as its aperture is larger than the size of the internal reflection unit 102.
The high-resolution imaging section 140 includes the second imaging section 116, the second image sensor 119, and the third camera lens 117. The adjusting device 118, which aligns the central axes of the twin lenses (the first camera lens 101 and the second camera or camera lens 103) and the single lens (the third camera lens 117), is located outside the high-resolution imaging section 140 and is independent of both the light field imaging section 100 and the high-resolution imaging section 140, as long as adjusting it can make the axis 112b of the first camera lens 101 and second camera or camera lens 103 parallel to the axis 112a of the third camera lens 117. The second image sensor 119 may be of the same or a different specification from the first image sensor 104, but its resolution should be at least 1/9 of that of the first image sensor 104 in order to improve the light field video resolution of the three-dimensional imaging system of the present invention.
Fig. 3 is a schematic diagram of a first image 120 obtained by the light field imaging section 100 of the three-dimensional imaging system of the present invention. The internal reflection unit 102 in the light field imaging section 100 decomposes the captured first image 120, i.e., the light field image or video frame, and reflects it into a plurality of secondary images or video frames with different angular offsets, e.g., 9 secondary images or video frames, which are acquired by the first image sensor 104 of the first imaging section 110 through the fly-eye lens. The secondary image in the middle of the 9 is the front view of the captured scene, and the other 8 are secondary images or video frames shifted by specific angles. Each image or video frame has on average only 1/9 or less of the resolution of the first image sensor 104. Before the scene depth map is generated, the 9 secondary images or video frames are segmented and each secondary image is preprocessed.
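The segmentation step can be sketched as cutting the captured sensor frame into equal tiles. The assumption that the nine sub-views land on the sensor in a simple 3 × 3 grid is illustrative; the patent does not state the exact layout.

```python
import numpy as np

def split_secondary_images(frame, rows=3, cols=3):
    """Cut one captured light-field frame into a rows x cols grid of
    secondary images. Models the 9 sub-views produced by the internal
    reflection unit as equal tiles on a single sensor (an assumption)."""
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

# A 4K sensor frame (3840 x 2160) yields nine 1280 x 720 secondary images.
frame = np.zeros((2160, 3840), dtype=np.uint8)
tiles = split_secondary_images(frame)
```

With a 3840 × 2160 frame, each of the nine tiles is 1280 × 720, matching the 1/9-resolution figure in the text.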
Fig. 4 is a schematic diagram of the first image 120 obtained by the light field imaging section 100 of the three-dimensional imaging system according to the present invention after normalization processing. Each secondary image is normalized by the following equation (3):
wherein I_n (n = 1, 2, …, 9) represents the image before normalization; I′_n (n = 1, 2, …, 9) represents the normalized image; Mirror(I_m, left/right/up/down) (m = 1, 2, 3, 4) represents mirror-flipping the image left, right, up, or down; and Rotate(I_k, π) (k = 6, …, 9) represents image rotation.
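A rough sketch of this normalization step follows. Which sub-view needs which flip depends on the reflection geometry; the mapping below (edge views mirrored, corner views rotated by π) is an illustrative assumption, not the patent's exact equation (3).

```python
import numpy as np

def normalize_secondary(img, index):
    """Undo the mirroring/rotation introduced by the internal reflections.
    index 0 is the unchanged central front view; the index-to-transform
    mapping here is assumed for illustration."""
    if index == 0:          # central front view: no change
        return img
    if index in (1, 2):     # horizontally mirrored edge views
        return np.fliplr(img)
    if index in (3, 4):     # vertically mirrored edge views
        return np.flipud(img)
    # remaining corner views: rotated by pi (180 degrees)
    return np.rot90(img, 2)
```

Applying the same transform twice restores the original, which is the usual sanity check for mirror/rotation normalization.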
After normalization, the shift of the scene between the secondary images can be identified, each secondary image being a separate original compound-eye image. The next step is preprocessing by image processing techniques, including but not limited to image noise removal, followed by decoding with synthetic aperture techniques to obtain the light field information in the original compound-eye image, and generation of in-focus secondary images by digital refocusing. The synthetic aperture image can be digitally refocused using the following principles:
L′(u, v, x′, y′) = L(u, v, kx′ + (1 − k)u, ky′ + (1 − k)v)    (6)
I′(x′, y′) = ∫∫ L(u, v, kx′ + (1 − k)u, ky′ + (1 − k)v) du dv    (7)
wherein I and I' represent the coordinate systems of the primary and secondary imaging planes;
l and L' represent the energy of the primary and secondary imaging planes.
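Equations (6) and (7) amount to the familiar shift-and-add refocusing scheme: each sub-view is shifted in proportion to its aperture coordinate and the views are averaged. The sketch below assumes integer-pixel shifts and per-view aperture offsets, which the patent does not spell out.

```python
import numpy as np

def refocus(subviews, offsets, k):
    """Shift-and-add digital refocusing over sub-aperture images.

    subviews: list of 2-D images L(u, v, x, y), one per aperture sample.
    offsets:  assumed per-view (du, dv) aperture coordinates relative to
              the central view.
    k:        refocus parameter from equations (6)-(7); each view is
              shifted by (1 - k) * (du, dv) before averaging over (u, v).
    """
    acc = np.zeros_like(subviews[0], dtype=np.float64)
    for img, (du, dv) in zip(subviews, offsets):
        sy = int(round((1 - k) * dv))
        sx = int(round((1 - k) * du))
        acc += np.roll(img.astype(np.float64), (sy, sx), axis=(0, 1))
    return acc / len(subviews)
```

At k = 1 the shifts vanish and the result is simply the average of the sub-views; varying k moves the synthetic focal plane.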
After the 9 in-focus secondary images are acquired, two of them can be selected, a disparity map computed with a binocular stereo matching algorithm, and a lower-resolution scene depth map then established. Semi-Global Matching is one of the commonly used binocular stereo matching algorithms; it provides good disparity results at a practical computation speed. It works by establishing a cost function associated with the disparity map:
E(D) = Σ_p { C(p, D_p) + Σ_{q∈N_p} P1 · T[|D_p − D_q| = 1] + Σ_{q∈N_p} P2 · T[|D_p − D_q| > 1] }

wherein D represents the disparity map;
p and q are pixels in the image;
C(p, D_p) is the cost of assigning the current pixel p the disparity value D_p;
N_p represents the pixels adjacent to pixel p, typically 8;
P1 and P2 are penalty coefficients: P1 applies when the disparity difference between pixel p and an adjacent pixel equals 1, and P2 when the difference is greater than 1;
T[·] is a function that returns 1 if its argument is true and 0 otherwise.
For each pixel, the matching costs along the 8 directions are aggregated for each candidate disparity, and the disparity with the lowest accumulated cost is taken as that pixel's final disparity. Computing each pixel in turn yields the whole disparity map.
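A stripped-down stand-in for this matching step is shown below: it builds an absolute-difference cost volume and takes the per-pixel minimum (winner-takes-all). Full Semi-Global Matching would additionally aggregate costs along the 8 paths with the P1/P2 penalties before the argmin; that aggregation is omitted here for brevity.

```python
import numpy as np

def disparity_wta(left, right, max_disp):
    """Winner-takes-all disparity from an absolute-difference cost volume.
    A simplified sketch of the matching step; SGM adds path-wise cost
    aggregation with penalties P1/P2 before selecting the minimum."""
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # cost of matching left pixel (y, x) against right pixel (y, x - d)
        cost[d, :, d:] = np.abs(left[:, d:].astype(np.float64)
                                - right[:, :w - d].astype(np.float64))
    return np.argmin(cost, axis=0)
```

On a synthetic pair where the right view is the left view shifted by two pixels, the recovered disparity is 2 everywhere in the valid region.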
Finally, the disparity value can be converted to a depth value with the following formula:

d_p = f · b / D_p

wherein d_p represents the depth value of a pixel;
f is the normalized focal length;
b is the baseline distance between the two secondary images;
D_p represents the disparity value of the current pixel.
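This is the standard pinhole-stereo relation between disparity and depth; as a one-line helper (variable names mirror the definitions above, the numeric values in the comment are illustrative):

```python
def disparity_to_depth(disparity, f, b):
    """Standard stereo relation d_p = f * b / D_p.
    f: normalized focal length; b: baseline between the two secondary
    images; disparity: nonzero disparity value of the current pixel."""
    return f * b / disparity

# e.g. a disparity of 8 pixels with f = 4 and b = 2 gives a depth of 1.0
```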
The second image 130 obtained by the second imaging section 116 is a 2D image or video frame that fully captures the information of the object, so its resolution is not reduced; for the same reason, the second image 130 does not need to be normalized or refocused.
For example, suppose the first image sensor 104 and the second image sensor 119 are both 4K image sensors, i.e., each has 3840 × 2160 pixels. Assuming the 9 secondary images or video frames are the same size, each secondary image has on average only 1/9 or less of the sensor's resolution, e.g., 1280 × 720 pixels per secondary image; a scene depth map and light field video generated directly from them would be similarly limited to 1280 × 720 pixels. Therefore, a 1280 × 720 scene depth map is created from the secondary images, and the resolution of the depth map is then raised to 3840 × 2160 with an edge-directed interpolation algorithm, using the 3840 × 2160 high-resolution second image 130 obtained by the second image sensor 119 as reference. The formula used for edge-directed interpolation is as follows:
wherein m and n index the low-resolution and high-resolution image grids before and after interpolation;
y[n] represents the depth map generated after interpolation;
S and R represent the data model of the second image 130 and the operator of the edge-directed rendering step, respectively;
λ is the gain of the correction process;
k is the iteration index.
Because the high-resolution second image 130 serves as the basis, the accuracy of the correction step of the interpolation calculation is sufficient for generating a 3D image. The high-resolution second image 130, i.e., the 2D image or video frame, is then combined with the resolution-enhanced depth map to generate a high-resolution 2D+Z 3D image or video format, which is output to the display; in this way the resolution of the light field image or light field video can be raised substantially, to 3840 × 2160 pixels.
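One common way to realize image-guided, edge-aware depth upsampling of this kind is joint bilateral upsampling: each high-resolution pixel averages nearby depth samples, weighted by how similar the guide image's intensities are, so depth edges snap to intensity edges. This is offered as an illustrative sketch, not the patent's exact iterative formula; the `sigma_r` range-kernel width is an assumed parameter.

```python
import numpy as np

def upsample_depth(depth_lr, guide_hr, scale, sigma_r=10.0):
    """Edge-aware upsampling of a low-res depth map guided by a
    high-res 2D image (joint-bilateral-style sketch)."""
    h, w = guide_hr.shape
    # nearest-neighbor upsample as the starting estimate
    depth_nn = np.repeat(np.repeat(depth_lr, scale, axis=0),
                         scale, axis=1)[:h, :w]
    out = np.empty_like(depth_nn, dtype=np.float64)
    r = scale  # spatial window radius
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            patch = depth_nn[y0:y1, x0:x1]
            diff = (guide_hr[y0:y1, x0:x1].astype(np.float64)
                    - float(guide_hr[y, x]))
            wgt = np.exp(-(diff ** 2) / (2 * sigma_r ** 2))
            out[y, x] = np.sum(wgt * patch) / np.sum(wgt)
    return out
```

On real data the intensity weights keep depth values from bleeding across object boundaries, which is the property the edge-directed correction step relies on.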
Fig. 5 is a flow chart of the three-dimensional imaging system of the present invention processing a target image. In step 501, original depth map data of the first image 120 is obtained by the light field imaging section 100; in step 502, the original depth map data is corrected; in step 503, an edge-directed or directional rendering method is applied, and in step 504 an interpolated high-resolution depth map is obtained. In step 505, the second image obtained in step 506 is used as reference data and compared against a data model of the second image; the original depth map data of the first image 120 is corrected until the best interpolated high-resolution depth map is obtained.
Fig. 6 is a flow chart of obtaining an image of a target with the three-dimensional imaging system of the present invention. In step 601, a first image 120 comprising 9 secondary images or video frames is acquired by the first image sensor 104; in step 602, the 9 secondary images or video frames are segmented and normalized; in step 603, image noise removal is performed on each secondary image; in step 604, the light field information in the 9 secondary images is decoded using a synthetic aperture technique and in-focus images are then generated using digital refocusing; in step 605, a lower-resolution scene depth map is created from the 9 in-focus secondary images. In step 608, a high-resolution 2D image or video frame, the second image, is obtained by the second image sensor; in step 606, with the second image as reference, the resolution of the depth map is raised using an edge-directed interpolation algorithm; in step 607, the enhanced-resolution depth map and the second image are combined to generate a high-resolution 3D image or video frame.
The above description is only intended to illustrate the present invention, and any person skilled in the art may modify and vary the above embodiments without departing from the spirit and scope of the invention. The scope of protection is therefore defined by the appended claims. The invention has been explained above with reference to examples; however, other embodiments than those described are equally possible within the scope of this disclosure, and the different features and steps of the invention may be combined in ways other than those described. More generally, those of ordinary skill in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are exemplary, and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings of the present invention are used.
Claims (18)
1. A double-camera three-dimensional stereoscopic imaging system, comprising:
a light field imaging section that obtains a first image and a high resolution imaging section that obtains a second image;
wherein the light field imaging section includes a first imaging section, a first camera lens, and a second camera or camera lens; the first camera lens and the second camera or camera lens are located at the rear and the front of the lens section, respectively; an entrance pupil plane and matching device is located between the first camera lens and the second camera or camera lens and can adapt to different focal lengths of the second camera or camera lens; an internal reflection unit is formed between the first camera lens and said entrance pupil plane and matching device, and decomposes and refracts the captured first image into a plurality of secondary images with different angular offsets; the high-resolution imaging section further comprises a second imaging section and a third camera lens; and a central axis adjusting device can adjust the twin lenses of the first camera lens and the second camera or camera lens and the single lens of the third camera lens so that the axes of the twin lenses and the single lens are parallel;
the light field imaging section and the high-resolution imaging section are configured such that the third camera lens obtains a second image whose viewing direction coincides with that of the front view among the plurality of secondary images, and the plurality of secondary images and the second image are output simultaneously.
2. The system of claim 1, wherein the light field imaging section is as close as possible to the high resolution imaging section and both are centered on the same vertical plane.
3. The system of claim 1, wherein the plurality of secondary images having different angular offsets have an angular offset range of 10-20 degrees.
4. The system of claim 3, wherein the angular offset of the front view among the plurality of secondary images is 0 degrees.
5. The system of any of claims 1-4,
the first imaging section further includes a first image sensor and a fly-eye lens that captures the first image, the fly-eye lens transmitting the captured first image to the first image sensor; and
the second imaging section further includes a second image sensor, the second image obtained by the third camera lens being transmitted to the second image sensor.
6. The system of claim 5,
the fly-eye lens is an array of micro-lenses, and the radius, thickness and array pitch of each micro-lens are related to the size of the first image sensor.
7. The system of any of claims 1-4, 6,
the first and second camera lenses have adjustable apertures and focal lengths, the second and third camera lenses are interchangeable lenses, and the aperture of the second video camera or camera lens is larger than the size of the internal reflection unit.
8. The system according to any of claims 1-4, 6,
the entrance pupil plane and matching device is a pupil lens having a diameter larger than that of the internal reflection unit, allowing incident light rays of the light field image to be refracted within the internal reflection unit.
9. The system according to any of claims 1-4, 6,
each of the secondary images has a subtle difference in scene, and the size of the internal reflection unit and the focal length of each secondary image are calculated based on the following equations (1) and (2):
wherein FOV is the field of view of the second video camera or camera lens;
n is the refractive index of the internal reflection unit;
r is the number of internal reflections;
z is the size of the internal reflection unit;
f_lens is the focal length of the second video camera or camera lens;
f_sub is the focal length of the secondary image.
10. A processing method for dual-camera three-dimensional imaging, comprising the following steps:
obtaining original depth map data of a first image by a light field imaging section;
correcting the original depth map data;
obtaining an interpolated high-resolution depth map using an edge-directed or directional rendering method;
and simultaneously obtaining a second image using a high-resolution imaging section, and correcting the original depth map data of the first image using a data model with the second image as reference data, until an optimal interpolated high-resolution depth map is obtained.
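The four steps above can be sketched as a single refinement loop. The sketch below is a hypothetical Python illustration: the function name, the nearest-neighbour upsampling and the edge-weighted smoothing correction are all assumptions standing in for the patent's unspecified data model and edge-directed interpolation.

```python
import numpy as np

def refine_depth_map(raw_depth, reference_image, gain=0.1, iterations=10):
    """Iteratively refine a low-resolution light-field depth map using a
    high-resolution 2D reference image (hypothetical sketch of claim 10)."""
    # 1. Correct raw depth data (placeholder correction: clamp invalid values).
    depth = np.nan_to_num(raw_depth, nan=0.0)

    # 2. Upsample to the reference resolution (nearest-neighbour stands in
    #    for the patent's edge-directed interpolation).
    h, w = reference_image.shape[:2]
    ys = np.arange(h) * depth.shape[0] // h
    xs = np.arange(w) * depth.shape[1] // w
    hi_depth = depth[np.ix_(ys, xs)].astype(float)

    # 3. Iterative correction using the reference image as guidance:
    #    nudge depth toward a local average, weighted down near image edges.
    gray = reference_image if reference_image.ndim == 2 else reference_image.mean(axis=2)
    edge_weight = np.exp(-np.abs(np.gradient(gray.astype(float))[0]))
    for _ in range(iterations):
        smoothed = 0.25 * (np.roll(hi_depth, 1, 0) + np.roll(hi_depth, -1, 0)
                           + np.roll(hi_depth, 1, 1) + np.roll(hi_depth, -1, 1))
        hi_depth = hi_depth + gain * edge_weight * (smoothed - hi_depth)
    return hi_depth
```

In a real system the stopping criterion ("until an optimal depth map is obtained") would replace the fixed iteration count.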
11. The processing method of claim 10, wherein
the light field imaging section includes a first imaging section, a first camera lens and a second video camera or camera lens; the first camera lens and the second video camera or camera lens are positioned at the rear and the front of the lens portion, respectively; an entrance pupil plane and matching device is positioned between the first camera lens and the second video camera or camera lens, and can be adapted to different focal lengths of the second video camera or camera lens; an internal reflection unit is formed between the first camera lens and the entrance pupil plane and matching device, and is used for decomposing and refracting the captured first image into a plurality of secondary images with different angular offsets; the high-resolution imaging section further comprises a second imaging section and a third camera lens; a central axis adjusting device can adjust the dual lenses, namely the first camera lens and the second video camera or camera lens, and the single lens, namely the third camera lens, so that the axes of the dual lenses and the single lens are parallel; the light field imaging section and the high-resolution imaging section are configured such that the third camera lens obtains a second image coinciding with the front-view vertical direction among the plurality of secondary images, and the plurality of secondary images and the second image are output simultaneously.
12. The processing method of claim 11, wherein
the first image consists of 9 secondary images or video pictures acquired by the first image sensor;
the 9 secondary images or video pictures are divided and each secondary image is normalized;
image noise is removed from each secondary image; the light field information carried by the 9 secondary images is decoded using a synthetic aperture technique, and quasi-focus images are generated using digital refocusing;
a lower-resolution scene depth map is established from the 9 focused secondary images; combined with the second image, which is a high-resolution 2D image or video picture obtained by the second image sensor,
the resolution of the depth map is increased using an edge-directed interpolation algorithm with the second image as reference; and the enhanced-resolution depth map and the second image are combined to produce a high-resolution 3D image or video picture.
13. The processing method of claim 12, wherein each secondary image is normalized by the following equation:
wherein I_n (n = 1, 2, ..., 9) represents the image before normalization; I'_n (n = 1, 2, ..., 9) represents the normalized image; Mirror(I_m, left/right/up/down) (m = 1, 2, 3, 4) represents mirror-flipping the image left, right, up or down; Rotate(I_k, π) (k = 6, ..., 9) represents image rotation by π.
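Under the assumption of a 3x3 fly-eye layout (centre view untouched, edge views mirrored, corner views rotated by π, matching the index ranges m = 1..4 and k = 6..9), the normalization could be sketched with NumPy as follows. The mapping of each index to a particular flip direction is an assumption; the patent text only gives the index ranges.

```python
import numpy as np

def normalize_subimages(subimages):
    """Normalize the 9 secondary images of an assumed 3x3 fly-eye array:
    four edge views are mirror-flipped, four corner views are rotated by
    180 degrees, and the centre view passes through unchanged."""
    out = list(subimages)
    # Mirror(I_m, ...) for m = 1..4: flip the edge views.
    out[0] = np.flip(subimages[0], axis=1)  # assumed left  -> horizontal flip
    out[1] = np.flip(subimages[1], axis=1)  # assumed right -> horizontal flip
    out[2] = np.flip(subimages[2], axis=0)  # assumed up    -> vertical flip
    out[3] = np.flip(subimages[3], axis=0)  # assumed down  -> vertical flip
    # Centre view (index 5 in the patent's 1-based numbering) is unchanged.
    # Rotate(I_k, pi) for k = 6..9: rotate the corner views by 180 degrees.
    for k in (5, 6, 7, 8):
        out[k] = np.rot90(subimages[k], 2)
    return out
```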
14. The processing method of claim 12, wherein the digital refocusing is performed on the synthetic aperture image using the following principles:
L′(u,v,x′,y′)=L(u,v,kx′+(1-k)u,ky′+(1-k)v) (6)
I′(x′,y′)=∫∫L(u,v,kx′+(1-k)u,ky′+(1-k)v)dudv (7)
wherein I and I' represent the coordinate systems of the primary and secondary imaging planes;
l and L' represent the energy of the primary and secondary imaging planes.
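Equations (6) and (7) describe the classic shift-and-add refocusing of a 4D light field: each sub-aperture view is resampled at x = k·x' + (1 − k)·u and the result is integrated over (u, v). A discrete NumPy sketch follows; the nearest-neighbour resampling and the centring of the angular coordinates are implementation assumptions.

```python
import numpy as np

def refocus(lightfield, k):
    """Shift-and-add digital refocusing of a 4D light field of shape
    (U, V, H, W); k is the refocus parameter of equations (6)-(7)."""
    U, V, H, W = lightfield.shape
    out = np.zeros((H, W))
    xp, yp = np.meshgrid(np.arange(W), np.arange(H))  # (x', y') grid
    for u in range(U):
        for v in range(V):
            # Sample coordinates from eq. (6); angular coordinates are
            # centred so that k = 1 reproduces the original focal plane.
            uc, vc = u - (U - 1) / 2, v - (V - 1) / 2
            x = np.clip(np.round(k * xp + (1 - k) * uc).astype(int), 0, W - 1)
            y = np.clip(np.round(k * yp + (1 - k) * vc).astype(int), 0, H - 1)
            out += lightfield[u, v, y, x]
    return out / (U * V)  # discrete double integral over (u, v)
```

Sweeping k produces the stack of quasi-focus images the method uses for depth estimation.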
15. The processing method of claim 12, wherein a disparity map is calculated using a binocular stereo matching algorithm, and a lower-resolution scene depth map is then created from it.
16. The processing method of claim 15, wherein the binocular stereo matching algorithm is the Semi-Global Matching (SGM) algorithm, which sets the following cost function associated with the disparity map:

E(D) = Σ_p ( C(p, D_p) + Σ_{q∈N_p} P1·T[|D_p − D_q| = 1] + Σ_{q∈N_p} P2·T[|D_p − D_q| > 1] )
wherein D represents the disparity map;
p and q are pixels in the image;
C(p, D_p) represents the cost value of pixel p when its disparity value is D_p;
N_p represents the pixels adjacent to pixel p, typically 8;
P1 and P2 are penalty coefficients: P1 applies when the disparity difference between pixel p and an adjacent pixel equals 1, and P2 applies when the difference is greater than 1;
T[·] is a function that returns 1 if its argument is true and 0 otherwise;
for each pixel of the image, the minimum cost value at each candidate disparity is calculated and the cost values from the 8 directions are accumulated; computing this pixel by pixel yields the whole disparity map, with the disparity of lowest accumulated cost taken as the final disparity value of the pixel.
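The 8-direction accumulation is built from a 1D recurrence along each scan direction. The sketch below shows that recurrence for one direction only; the penalty values and the min-normalization term follow the standard semi-global matching formulation and are not taken from the patent text.

```python
import numpy as np

def aggregate_1d(cost, P1=10, P2=120):
    """Aggregate matching costs along one scan direction (left-to-right),
    the core recurrence of semi-global matching. `cost` has shape
    (H, W, D): per-pixel cost for each candidate disparity. The full SGM
    repeats this for 8 directions and sums the results."""
    H, W, D = cost.shape
    L = np.empty_like(cost, dtype=float)
    L[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                           # (H, D)
        best_prev = prev.min(axis=1, keepdims=True)
        # Four transition cases: same disparity, +/-1 (penalty P1),
        # any larger jump (penalty P2).
        same = prev
        plus = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D] + P1
        minus = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:] + P1
        jump = best_prev + P2
        # Subtracting best_prev keeps the accumulated values bounded.
        L[:, x] = cost[:, x] + np.minimum(np.minimum(same, plus),
                                          np.minimum(minus, jump)) - best_prev
    return L
```

After summing the eight directional aggregates, the final disparity per pixel is the argmin over the disparity axis.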
17. The processing method of claim 12, wherein the conversion from disparity value to depth value may use the following formula:

d_p = f·b / D_p

wherein d_p represents the depth value of a certain pixel;
f is the normalized focal length;
b is the baseline distance between the two secondary images;
D_p represents the disparity value of the current pixel.
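The disparity-to-depth conversion described here is a small elementwise operation; a NumPy helper could look like the following (masking zero disparity to a depth of 0 is an added assumption, since the triangulation relation diverges there):

```python
import numpy as np

def disparity_to_depth(disparity, f, b):
    """Convert a disparity map to depth via d_p = f * b / D_p, the
    standard triangulation relation the claim describes."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = f * b / disparity[valid]  # larger disparity -> nearer point
    return depth
```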
18. The processing method of claim 10, wherein the edge-directed interpolation uses a formula in which:
m and n index the low-resolution and high-resolution image grids before and after interpolation;
y[n] represents the depth map generated after interpolation;
S and R represent the data model of the second image and the operator of the edge-directed rendering step, respectively;
λ is the gain of the correction process; and
k is the iteration index.
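Since the exact operators S and R are not reproduced here, the iteration can only be sketched under assumptions: below, S is taken as an edge weight derived from the high-resolution guide image and R as a local averaging residual, both hypothetical stand-ins for the patent's operators.

```python
import numpy as np

def iterative_upsample(depth_lr, guide, lam=0.2, iters=20):
    """Hedged sketch of the claim-18 iteration: start from a plain
    upsampling y[n] of the low-resolution depth map, then repeatedly
    apply a correction y <- y + lam * R(...) guided by image edges."""
    h, w = guide.shape
    ys = np.arange(h) * depth_lr.shape[0] // h
    xs = np.arange(w) * depth_lr.shape[1] // w
    y = depth_lr[np.ix_(ys, xs)].astype(float)   # initial interpolation y^0[n]
    gy, gx = np.gradient(guide.astype(float))
    edge = np.exp(-(gx**2 + gy**2))              # S stand-in: low weight at edges
    for _ in range(iters):                       # k: iteration index
        neigh = 0.25 * (np.roll(y, 1, 0) + np.roll(y, -1, 0)
                        + np.roll(y, 1, 1) + np.roll(y, -1, 1))
        y = y + lam * edge * (neigh - y)         # gain lam scales the correction
    return y
```

The edge weight keeps the smoothing from crossing guide-image edges, which is the intuition behind edge-directed depth upsampling.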
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910481518.4A CN112040214A (en) | 2019-06-04 | 2019-06-04 | Double-camera three-dimensional imaging system and processing method |
PCT/CN2020/079099 WO2020244273A1 (en) | 2019-06-04 | 2020-03-13 | Dual camera three-dimensional stereoscopic imaging system and processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112040214A true CN112040214A (en) | 2020-12-04 |
Family
ID=73576536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910481518.4A Pending CN112040214A (en) | 2019-06-04 | 2019-06-04 | Double-camera three-dimensional imaging system and processing method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112040214A (en) |
WO (1) | WO2020244273A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102595171A (en) * | 2012-02-03 | 2012-07-18 | 浙江工商大学 | Imaging method and imaging system of dynamic optical fields of multichannel space-time coding apertures |
CN102663712A (en) * | 2012-04-16 | 2012-09-12 | 天津大学 | Depth calculation imaging method based on flight time TOF camera |
CN106780383A (en) * | 2016-12-13 | 2017-05-31 | 长春理工大学 | The depth image enhancement method of TOF camera |
CN107689050A (en) * | 2017-08-15 | 2018-02-13 | 武汉科技大学 | A kind of depth image top sampling method based on Color Image Edge guiding |
CN107991838A (en) * | 2017-11-06 | 2018-05-04 | 万维科研有限公司 | Self-adaptation three-dimensional stereo imaging system |
CN108805921A (en) * | 2018-04-09 | 2018-11-13 | 深圳奥比中光科技有限公司 | Image-taking system and method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150381965A1 (en) * | 2014-06-27 | 2015-12-31 | Qualcomm Incorporated | Systems and methods for depth map extraction using a hybrid algorithm |
CN106651938B (en) * | 2017-01-17 | 2019-09-17 | 湖南优象科技有限公司 | A kind of depth map Enhancement Method merging high-resolution colour picture |
WO2019127192A1 (en) * | 2017-12-28 | 2019-07-04 | 深圳市大疆创新科技有限公司 | Image processing method and apparatus |
- 2019-06-04: Chinese patent application CN201910481518.4A filed (publication CN112040214A, status: Pending)
- 2020-03-13: PCT application PCT/CN2020/079099 filed (publication WO2020244273A1)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114404084A (en) * | 2022-01-21 | 2022-04-29 | 北京大学口腔医学院 | Scanning device and scanning method |
CN114404084B (en) * | 2022-01-21 | 2024-08-02 | 北京大学口腔医学院 | Scanning device and scanning method |
Also Published As
Publication number | Publication date |
---|---|
WO2020244273A1 (en) | 2020-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2020-12-04