WO2021184341A1 - Autofocus method and camera system thereof - Google Patents

Info

Publication number
WO2021184341A1
Authority
WO
WIPO (PCT)
Prior art keywords
roi
view
camera
focus
image data
Prior art date
Application number
PCT/CN2020/080375
Other languages
French (fr)
Inventor
Xuyang FENG
Original Assignee
SZ DJI Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co., Ltd. filed Critical SZ DJI Technology Co., Ltd.
Priority to PCT/CN2020/080375 priority Critical patent/WO2021184341A1/en
Priority to CN202080098778.4A priority patent/CN115299031A/en
Publication of WO2021184341A1 publication Critical patent/WO2021184341A1/en

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • G02B7/285Systems for automatic generation of focusing signals including two or more different focus detection devices, e.g. both an active and a passive focus detecting device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/45Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/633Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
    • H04N23/635Region indicators; Field of view indicators
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • H04N23/675Focus control based on electronic image sensor signals comprising setting of focusing regions
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • G02B7/36Systems for automatic generation of focusing signals using image sharpness techniques, e.g. image processing techniques for generating autofocus signals

Definitions

  • the present disclosure generally relates to systems and methods for focusing a camera based on image data obtained from one or more other cameras.
  • the depth of field (DOF) for cameras sometimes can be shallow.
  • the DOF of a camera’s lens corresponds to a distance (focus depth) between the nearest and farthest objects that can remain in sufficient focus within the view of the camera.
  • an object in the camera’s DOF is generally in focus and clearer to observe than objects outside of its DOF.
  • the shallow DOF formed by a large aperture lens can make the scenes more visually pleasing and may contribute to what makes the images appear more “movie-like. ”
  • the camera operator for example, may control a camera’s shallow DOF to blur the background scenery and focus on an actor in the foreground.
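To make the aperture/focal-length trade-off concrete, the following minimal Python sketch (not part of the disclosure) computes an approximate DOF from the standard hyperfocal-distance formulas; the lens values are hypothetical examples chosen to contrast a shallow and a deep DOF.

```python
def depth_of_field(focal_length_mm, f_number, subject_dist_mm, coc_mm=0.03):
    """Approximate near limit, far limit, and total DOF using standard thin-lens formulas.

    coc_mm is the circle of confusion (0.03 mm is a common full-frame value).
    """
    # Hyperfocal distance: focusing here makes everything beyond half of it acceptably sharp.
    hyperfocal = focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm
    near = (subject_dist_mm * (hyperfocal - focal_length_mm)) / (
        hyperfocal + subject_dist_mm - 2 * focal_length_mm)
    if subject_dist_mm >= hyperfocal:
        return near, float("inf"), float("inf")
    far = (subject_dist_mm * (hyperfocal - focal_length_mm)) / (hyperfocal - subject_dist_mm)
    return near, far, far - near

# Large aperture (small f-number) and long focal length -> shallow DOF:
print(depth_of_field(85.0, 1.8, 3000.0))   # ~ (2936 mm, 3067 mm, 131 mm): about 0.13 m in focus
# Small aperture and short focal length -> deep DOF:
print(depth_of_field(24.0, 8.0, 3000.0))   # ~ (1339 mm, inf, inf): sharp from ~1.3 m to infinity
```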
  • a system for causing a camera to focus.
  • the system may include one or more processors; and memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to determine a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identify, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and cause the second camera to focus on the second ROI in the second view.
  • the method may comprise determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and causing the second camera to focus on the second ROI in the second view.
  • the method also may comprise updating a focus of a camera configured to continuously capture a second view based on the determined region of interest.
  • a non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, cause the processor to perform operations comprising determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and causing the second camera to focus on the second ROI in the second view.
  • FIG. 1A shows an exemplary system for autofocus using a deep depth-of-field sensor in accordance with certain embodiments of the present disclosure.
  • FIG. 1B shows an exemplary system for autofocusing for videography, in accordance with some embodiments of the present disclosure.
  • FIG. 2A shows an exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
  • FIG. 2B shows another exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
  • FIGS. 3A and 3B show exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with certain embodiments of the present disclosure.
  • FIG. 4 shows a schematic diagram of an exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
  • FIGS. 5A-5D illustrate exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with certain embodiments of the present disclosure.
  • FIG. 6 shows a flow diagram of an exemplary autofocusing process in accordance with certain embodiments of the present disclosure.
  • FIGS. 7A-7C show an exemplary autofocusing system in accordance with some embodiments of the present disclosure.
  • FIGS. 8A and 8B show another exemplary autofocusing process in accordance with certain embodiments of the present disclosure.
  • FIG. 9 shows a flow diagram of an autofocusing process with guidance of a deep depth of field (DOF) camera, in accordance with embodiments of the present disclosure.
  • FIG. 10 shows a flow diagram of an autofocusing process with guidance of a deep DOF camera, in accordance with embodiments of the present disclosure.
  • FIG. 11 shows a flow diagram of an autofocusing process with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
  • FIG. 12 shows a flow diagram of an autofocusing process with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
  • FIG. 13 illustrates a diagram for determining a distance to an object in an overlapping region of a plurality of auxiliary cameras, in accordance with some embodiments of the present disclosure.
  • Cameras with focus adjustment can provide images with higher visual quality, for example, to identify specific objects and/or people, or even just their faces or eyes, in views captured by the camera.
  • With focus adjustment (e.g., autofocusing or assisted-focusing), different visual effects can also be provided to the captured images in accordance with a user’s preference (e.g., “movie-like” videos, portrait images, etc.).
  • a “view” refers to any static or dynamic shot, scene, image, image frame (e.g., in a video) , or picture that may be captured by an imaging device, such as a camera.
  • a camera’s field of view (FOV) refers to an angular area over which a view can be captured by the camera.
  • users may have to use manual focusing to obtain a desired effect, because autofocusing has not been widely applied and existing autofocusing technology may not satisfy the user’s need for fast, accurate, and low-cost focus adjustment. Accordingly, improved systems and methods for adjusting focus are desirable for video camera technology (e.g., videography).
  • a focus follower may have a marker associated with a respective focus for each person in the shot, allowing for a smooth and quick transition between them. If the transition is not fast or accurate enough, one of the subjects may be out of focus during a portion of the scene when they should be the focus of the scene.
  • a device particularly for measuring a distance from a camera to an object may be used, and the camera may be automatically focused on the object based on the measured distance.
  • the distance measurement device may, for example, include an infrared light or laser emitter and a light sensor that senses the reflected infrared light or laser.
  • the time of flight i.e., from the time the light is emitted by the emitter until the time the reflected light is sensed by the sensor, can be used to determine the distance between the camera and the object in the video image.
  • Some distance measurement devices may utilize ultrasound radiation instead of light.
  • a controller such as a computer in the camera can send signals to motors that drive and move a lens or lenses to achieve focus on the object.
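As a rough sketch of the time-of-flight computation described above (the helper hooks `emit_pulse`, `wait_for_reflection`, and `drive_focus_motor` are hypothetical names used only for illustration, not identifiers from the disclosure):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def distance_from_time_of_flight(round_trip_time_s: float) -> float:
    # The pulse travels to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

def focus_on_nearest_object(emit_pulse, wait_for_reflection, drive_focus_motor):
    """Hypothetical control flow: measure the distance, then command the focus motor."""
    t_start = emit_pulse()           # timestamp when the IR/laser pulse is emitted
    t_echo = wait_for_reflection()   # timestamp when the reflection is sensed
    distance_m = distance_from_time_of_flight(t_echo - t_start)
    drive_focus_motor(target_distance_m=distance_m)
    return distance_m
```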
  • some cameras can employ a phase detection method to adjust focus.
  • a mirror can reflect the image of the object onto two phase sensors, and a computer compares the two reflected images sensed by the phase sensors. In this case, focus can occur when the two reflected images are identical.
  • contrast detection involving finding a position of one or more camera lenses that provides the best contrast between consecutive images captured by those one or more lenses, may be used for the autofocusing system.
  • the camera takes images of an object, and a computer associated with the camera analyzes the images and compares contrasts between consecutive images. Increased contrast between consecutive images suggests the lenses are moving in the correct direction for improving focus.
  • the position of the lenses that generates consecutive images with the highest contrast is considered to provide the optimal focus.
  • contrast detection requires analysis of many images as the lenses move back and forth and therefore can be time-consuming.
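A minimal sketch of the contrast-detection idea, assuming hypothetical `set_lens_position` and `capture_frame` camera hooks; a real implementation would hill-climb rather than exhaustively sweep, which is exactly why the method can be slow:

```python
import numpy as np

def sharpness(image: np.ndarray) -> float:
    """Simple contrast metric: variance of horizontal intensity differences."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(np.float64)
    return float(np.var(np.diff(gray, axis=1)))

def contrast_detect_focus(set_lens_position, capture_frame, positions):
    """Try candidate lens positions and keep the one giving the highest contrast."""
    best_pos, best_score = None, -1.0
    for pos in positions:
        set_lens_position(pos)
        score = sharpness(capture_frame())
        if score > best_score:
            best_pos, best_score = pos, score
    set_lens_position(best_pos)
    return best_pos
```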
  • Distance measurement methods may take much less time, but they can only determine the distance from the camera to the closest object in the view and may not work when it is desired to take a picture focused on an object farther away in the view.
  • the phase detection method can achieve focus with precision quickly but can require complex and expensive construction of the camera since the camera must include multiple autofocus sensors, each having its own lens and photodetector.
  • the number of autofocus sensors can limit the number of areas to focus on in the view. Two autofocus sensors, for example, can only focus the camera on one part of the image. But, raising the number of focus points can further raise the price of the camera.
  • the autofocusing methods may be combined, such as including the distance measurement method or phase detection method as a first step to quickly adjust the camera roughly in a desired area of focus, followed by a contrast detection method to fine-tune the camera’s focus.
  • these autofocusing methods may work well when taking static pictures, but not so well in moving environments, where objects at different distances move with time. Especially when shooting a video, the camera must adjust and track its focus in real time in response to moving objects. Accordingly, there exists a need for fast, accurate, and inexpensive autofocusing and focus-tracking technology adapted for various environments.
  • The present disclosure provides systems and methods that can quickly and automatically switch focus between one or more objects or areas in a view captured by a camera (e.g., with a smaller DOF), for example, when shooting a video.
  • the systems obtain image data from one or more cameras (e.g., with larger DOFs) that can collectively provide guidance based on deep DOF views for capturing a shallow DOF view by the camera with the smaller DOF, without significantly increasing the cost of the system.
  • FIG. 1A illustrates an exemplary system 100, also referred to herein as autofocusing system 100 or camera system 100, for autofocus using a deep depth-of-field sensor guide that may be used in accordance with certain disclosed embodiments.
  • Autofocusing system 100 includes one or more processors 102 connected to a main camera 104, one or more auxiliary cameras 106 and inputs and outputs 108.
  • One or more processors 102 are configured to receive inputs from inputs and outputs 108 and provide instructions (e.g., in the form of signals or commands) to main camera 104 and/or one or more auxiliary cameras 106.
  • one or more processors 102 may be configured to produce outputs based on information received from main camera 104 and one or more auxiliary cameras 106.
  • one or more processors 102 may be configured to receive information from one or more auxiliary cameras 106 and provide instructions to main camera 104. In some embodiments, one or more processors 102 may be configured to receive information from main camera 104 and provide instructions to one or more auxiliary cameras 106. For example, one or more processors 102 may receive information from one or more auxiliary cameras 106 indicating that main camera 104 should change its focus point, for example, to focus on a different object or area in a view captured by the main camera. In such exemplary embodiments, one or more processors 102 sends instructions to main camera 104 causing main camera 104 to change its focus point in accordance with the information that the processor received from one or more auxiliary cameras 106.
  • inputs and outputs 108 are configured to receive inputs corresponding with a view that a person, such as a movie director, wants to capture using system 100.
  • One or more processors 102 may process these inputs and send instructions to both main camera 104 and one or more auxiliary cameras 106 according to the inputs received.
  • FIG. 1B shows an exemplary system 120 for adjusting focus of videography, in accordance with some embodiments of the present disclosure.
  • system 120 can include a module of a system integrated with one or more cameras (e.g., main camera 104, and one or more auxiliary cameras 106 of FIG. 1A) , or a computing system communicatively coupled to a device integrated with the one or more cameras.
  • system 120 can also include a cloud server or a mobile device (e.g., a user device 708 of FIGs. 7A-7C, or a user device 800 of FIG. 8A) configured to process data received from the one or more cameras and/or generate instructions to adjust one or more parameters of the one or more cameras respectively.
  • system 120 is a portion of or includes one or more modules of system 100 of FIG. 1A.
  • processors 122 of system 120 correspond to one or more processors 102 of system 100.
  • Input device (s) 126 and output device (s) 128 correspond to inputs and outputs 108 of system 100.
  • system 120 includes one or more processors 122 for executing modules, programs and/or instructions stored in a memory 140 and thereby performing predefined operations, one or more network or other communications interfaces 130, and one or more communication buses 132 for interconnecting these components.
  • System 120 may also include a user interface 124 including one or more input devices 126 (e.g., a keyboard, a mouse, a touchscreen, a microphone, and/or a camera, etc. ) and one or more output devices 128 (e.g., a display for displaying a graphical user interface (GUI) , and/or a speaker etc. ) .
  • Processors 122 may be any suitable hardware processor, such as an image processor, an image processing engine, an image-processing chip, a graphics-processor (GPU) , a microprocessor, a micro-controller, a central processing unit (CPU) , a network processor (NP) , a digital signal processor (DSP) , an application specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , or another programmable logic device, discrete gate or transistor logic device, discrete hardware component.
  • Memory 140 may include high-speed random access memory, such as DRAM, SRAM, or other random access solid state memory devices.
  • memory 140 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • memory 140 includes one or more storage devices remotely located from processor (s) 122.
  • Memory 140 or alternatively one or more storage devices (e.g., one or more nonvolatile storage devices) within memory 140, includes a non-transitory computer readable storage medium.
  • memory 140 or the computer readable storage medium of memory 140 stores one or more computer program instructions (e.g., modules) 146, and a database 170, or a subset thereof that cause a processor, e.g., processor (s) 122, to perform one or more steps of a process as discussed more fully below with reference to FIGs. 9, 10, 11, and 12.
  • Memory 140 may also store image data captured by one or more cameras, (e.g., main camera 104 and one or more auxiliary cameras 106 of FIG. 1A, one or more cameras in FIGs. 2A-2B, and camera system 400 in FIG. 4, etc. ) for processing by processor 122.
  • Memory 140 may further store operations instructions for controlling the one or more cameras as discussed in the present disclosure.
  • memory 140 of system 120 includes an operating system 142 that includes procedures for handling various basic system services and for performing hardware dependent tasks.
  • Memory 140 further includes a network communications module 144 that is used for connecting system 120 to other electronic devices via communication network interfaces 130 and one or more communication networks (wired or wireless) , including but not limited to the Internet, other wide area networks, local area networks, and metropolitan area networks.
  • FIG. 2A illustrates an exemplary camera system 200 including one or more sensors, such as one or more deep DOF sensors (e.g., auxiliary camera (s) 204) , for guiding a shallow DOF camera (e.g., a main camera 202) for autofocusing that may be used in accordance with some disclosed embodiments.
  • system 200 or 205 in FIGs. 2A and 2B respectively, may embody system 100 in FIG. 1A.
  • system 120 of FIG. 1B may be integrated into (e.g., included as a component of) camera system 200 or 205 shown in FIGs. 2A and 2B.
  • system 120 of FIG. 1B may be communicatively coupled to camera system 200 or 205 as shown in FIGs. 2A and 2B.
  • system 120 of FIG. 1B may be a cloud server, a user’s mobile device, or any other suitable apparatus that can communicate with camera system 200 or 205 shown in FIGs. 2A and 2B for exchanging image data captured by camera system 200 or 205 and/or instructions to control parameters of camera system 200 or 205.
  • camera system 200 includes main camera 202 and auxiliary camera 204.
  • main camera 202 may include a camera with a relatively shallow DOF that is configured to capture objects in a view (e.g., including one or more images of a scene) within a relatively small range of distance in focus.
  • shallow DOF cameras may capture images that appear to be isolated from their environment, and can be used in portrait work, macro photography, and sports photography, etc.
  • the shallow DOF of main camera 202 may be provided by a lens assembly with a large aperture, a long focal length, and/or a large sensor size.
  • auxiliary camera 204 may include a deep DOF sensor with a focus range that covers a large distance range front-to-back (e.g., from several meters in front of the focal plane to nearly infinity behind) , capturing objects within a large range of landscape view with acceptable visual clarity.
  • the deep DOF of auxiliary camera 204 may be provided by a lens assembly with a small aperture, a short focal length, and/or a small sensor size.
  • auxiliary camera 204 may be configured to capture a view of the scene that may include, or otherwise may at least partially overlap with, the view captured by main camera 202.
  • the deep DOF sensor (e.g., auxiliary camera 204) may be configured to determine one or more regions of interest (ROIs) to guide the shallow DOF camera (e.g., main camera 202) to focus on a region corresponding to the one or more determined ROIs.
  • the arrangement of main camera 202 and auxiliary camera 204 in system 200 shown in FIG. 2A is only an example for illustrative purposes, and is not intended to limit the scope of the present disclosure.
  • main camera 202 and auxiliary camera 204 may be arranged in any suitable configuration (e.g., main camera 202 placed on the right of auxiliary camera 204, main camera 202 and auxiliary camera 204 placed in left-right, front-back, and/or up-down arrangements) that may provide sufficient function as discussed in the present disclosure.
  • FIG. 2B illustrates another exemplary camera system 205 that may use one or more sensors, such as one or more deep DOF sensors, for guiding a shallow DOF camera to achieve autofocus in accordance with some disclosed embodiments.
  • main camera 206 may support the replacement of lenses with different focal lengths, for example a wide-angle lens (e.g., a short focal length and a wide FOV) and/or a telephoto lens (e.g., a long-focus lens).
  • auxiliary cameras 208 and 210 are used to increase the resolution of the view being captured without increasing the resolution of either of the individual cameras 208 and 210. Therefore, there may be several auxiliary cameras 208 and 210 having different focal lengths.
  • the exemplary auxiliary camera A 208 may be configured with a focal length corresponding to a wide-angle shot while the exemplary auxiliary camera B 210 may be configured with a focal length corresponding to a telephoto shot. In such an exemplary embodiment, this configuration allows for a deep focus shot having both a foreground object and a background object simultaneously in focus, similar to the effect of a split focus diopter.
  • one or more processors 102 and/or 122 may calculate a depth map in areas where the views from the auxiliary cameras 208 and 210 overlap. This depth map allows one or more processors 102 and/or 122 to measure a distance between an object in the view and main camera 206 and then use this information to set a ROI for main camera 206 to use in controlling its focus.
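One common way to compute such a depth map in the overlapping region is the standard rectified-stereo relation depth = f * B / d; the sketch below is illustrative only and is not asserted to be the exact method of the disclosure:

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard rectified-stereo relation: depth = f * B / d.

    focal_length_px: focal length expressed in pixels,
    baseline_m: distance between the two auxiliary cameras,
    disparity_px: horizontal shift of the same object between the two views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Example: f = 1400 px, baseline = 5 cm, disparity = 35 px -> depth = 2.0 m
print(depth_from_disparity(1400.0, 0.05, 35.0))
```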
  • the arrangement of main camera 206 and auxiliary cameras 208 and 210 in system 205 shown in FIG. 2B is an example for illustrative purposes, and is not intended to limit the scope of the present disclosure.
  • main camera 206 and auxiliary cameras 208 and 210 may be arranged in any suitable configuration (e.g., main camera 206 placed on the left or right of both auxiliary cameras 208 and 210, main camera 206 and auxiliary cameras 208 and 210 placed in left-right, front-back, and/or up-down arrangements) that may provide sufficient function as discussed in the present disclosure.
  • FIGS. 3A and 3B show exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with some embodiments of the present disclosure.
  • In FIGS. 3A and 3B, a focus depth 306 of a first camera (such as a main camera as discussed in the present disclosure) and a focus depth 308 of a second camera (such as an auxiliary camera as discussed herein) are illustrated.
  • a first person 310 is located within focus depth 306 of the main camera and a second person 312 is within focus depth 308 of the auxiliary camera but beyond focus depth 306 of the main camera.
  • the auxiliary camera could capture the activity of second person 312, and the related image data can be used to calculate information associated with the second person’s position, such as positional data in the real space, or positional data relative to the view captured by the second camera.
  • images or videos with cinema-like focus on different subjects in casual videography can be captured by a shallow DOF camera with accurate and distinct focuses on respective subjects (e.g., people and/or objects) .
  • Video images can be shot with a main camera with a large FOV and a shallow DOF.
  • Auxiliary camera (s) with deep DOF can be used to guide /assist the main camera to focus on other ROIs identified by auxiliary camera (s) .
  • an autofocusing system 302 (e.g., system 100 of FIG. 1A or system 120 of FIG. 1 B) generates control signals comprising instructions causing the first camera (e.g., main camera) to update its depth of focus 306 and to focus on second person 312. Therefore, information that the autofocus system 302 received from the second camera (e.g., auxiliary camera) with a larger (deeper) DOF 308 may be used to control the first camera (e.g., main camera) to adjust its focus from first person 310 to second person 312 automatically.
  • the first camera (e.g., main camera) may be driven by a motor, such as a voice coil actuator, and its focus changing rate could be controlled to provide a smooth visual effect (e.g., to transition from a first focused region to a second focused region).
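A minimal sketch of rate-limiting such a focus pull so the transition looks smooth; the hooks (`set_focus_position`, `sleep`) and the units are hypothetical, and `max_step` is assumed to be positive:

```python
def smooth_focus_transition(set_focus_position, start, target, max_step, frame_period_s, sleep):
    """Move the focus position toward `target` in rate-limited per-frame steps."""
    position = start
    while abs(target - position) > 1e-9:
        delta = target - position
        # Limit how far the focus may move per frame for a smooth "focus pull".
        step = max(-max_step, min(max_step, delta))
        position = target if abs(delta) <= max_step else position + step
        set_focus_position(position)
        sleep(frame_period_s)
    return position
```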
  • autofocusing system 302 is configured to perform one or more steps of a process as discussed more fully below with reference to FIGs. 6, 9, 10, 11, and 12.
  • FIG. 4 illustrates an exemplary camera system 400 (e.g., similar to camera system 100 of FIG. 1 A, including one or more components of system 120 of FIG. 1B, or similar to camera system 200 of FIG. 2A) for one or more sensors, such as one or more deep DOF sensors (e.g., auxiliary cameras as disclosed herein) , to guide a shallow DOF camera (e.g., main camera as disclosed herein) for autofocusing in accordance with some disclosed embodiments.
  • an auxiliary camera 404 e.g., auxiliary camera 204, 208, 210, 106) may be embedded in or attached to a main camera 402 (e.g., main camera 202, 206, 104) .
  • auxiliary camera 404 may be a stand-alone device that is configured to coordinate with main camera 402.
  • main camera 402 and auxiliary cameras 404 can have various arrangements, e.g., relative positions to each other, distance therebetween, etc., that may provide sufficient function consistent with the present disclosure.
  • FIGS. 5A and 5B illustrate exemplary focusing on objects in a view by a main camera and auxiliary camera in accordance with some disclosed embodiments.
  • As shown in FIG. 5A, a main camera (e.g., camera 202, 206, 104) may be configured for shallow DOF videography, in which a region associated with a person 504 is in focus, whereas the rest of the view, including a person 502, appears blurred (e.g., out of focus, unrecognizable).
  • As shown in FIG. 5B, an auxiliary camera (e.g., camera 204, 208, 210, 106) may be configured for large DOF videography, in which most or substantially all regions within the picture are not blurry and are recognizable, with a larger area in focus at any given time (e.g., compared to the view of the main camera).
  • both people 502 and 504 are recognizable in the view, and one object may be on the focal point. This allows the auxiliary camera to determine a ROI to switch the focus of the main camera to.
  • the main camera may not be able to identify and focus on person 502 based on the view of the main camera alone, because person 502 is too blurry in the shallow DOF view of the main camera, lacking sufficient image information of person 502 to be captured and recognized by the main camera, and to be used for adjusting the focus of the main camera to focus on person 502.
  • the deep DOF view of the auxiliary camera provides sufficient image information of person 502 (e.g., when person 502 is in focus in the view of the auxiliary camera, or when person 502 is determined to be the ROI in other ways, as discussed in the present disclosure) to guide the main camera in focusing on the corresponding object in its shallow DOF view.
  • one or more processors communicatively coupled to the auxiliary camera may determine that the focus or the ROI should be on person 502 and then guide the main camera to switch the main camera’s focus from a focus point associated with person 504 to a focus associated with the region of person 502.
  • FIGS. 5C and 5D illustrate exemplary focusing for a main camera (e.g., 202, 206, 104) and auxiliary camera (e.g., 204, 208, 210, 106) in accordance with certain disclosed embodiments.
  • a main camera may have a shallow DOF with only a small portion of a view (e.g., the camera in the foreground) in focus, and the rest of the view blurry.
  • an auxiliary camera may have a large DOF with substantially all or nearly all of a view in focus or recognizable with sufficient image information.
  • this difference allows one or more processors (e.g., 102 or 122) communicatively coupled to the auxiliary camera to guide the adjustment of the focus of the main camera.
  • FIG. 6 illustrates a flowchart for an exemplary autofocusing process for one or more sensors, such as one or more deep DOF cameras, to guide a shallow DOF camera for autofocusing in accordance with some disclosed embodiments.
  • the steps in this exemplary flowchart may be applied to the auxiliary camera (s) and the main camera in a camera system 100 or system 120.
  • the steps of the flowchart for the auxiliary camera (s) include facial recognition (step 602) for selecting a focus target (step 604) , converting a target frame projection (step 606) , and following a region of interest (step 608) .
  • the steps of the exemplary flowchart for the main camera include configuring focusing speed (step 618) , updating a focus region of interest (step 612) , determining whether a target is new (step 614) , initializing continuous auto focus (step 616) , and updating continuous autofocus (step 620) .
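The main-camera side of this flow (steps 612-620) can be summarized by the following skeleton; every object and function name here is an illustrative assumption, not an identifier from the disclosure:

```python
def main_camera_loop(projected_rois, camera):
    """Skeleton of the main-camera side of FIG. 6 (steps 612-620), one pass per frame.

    `projected_rois` yields the ROI projected from the auxiliary camera for each frame;
    `camera` is a hypothetical controller object.
    """
    current_target = None
    for roi in projected_rois:              # one iteration per image frame
        if roi is None:                     # nothing to focus on in this frame
            continue
        camera.update_focus_roi(roi)        # step 612: update the focus region of interest
        if roi != current_target:           # step 614: is this a new target?
            camera.init_continuous_autofocus(roi)               # step 616
            camera.set_focus_speed(camera.pick_focus_speed())   # step 618
            current_target = roi
        camera.update_continuous_autofocus()                    # step 620
```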
  • one or more processors 102 connected to auxiliary camera (s) 106 perform facial recognition (e.g., by executing instructions stored in face recognition module 154) .
  • the one or more processors may apply any suitable facial recognition algorithms or models to image data received from at least one auxiliary camera to determine whether there is any human face in the image and, if so, who the person is and/or where such person is located in a view captured by the at least one auxiliary camera.
  • textures, shapes, and other facial features may be retrieved from the image data and analyzed to determine the pattern, identity, location, and/or other characteristics of the identified human faces in the view.
  • recognition of a human face, facial expressions, or one or more objects in the view can be based on artificial intelligence, such as a convolutional neural network (CNN) such as GoogleNet, Alex-Net, LeNet, ResNet, neural networks with Gabor filters, neural networks in conjunction with Hidden Markov Models, fuzzy neural networks, etc.
  • Some other facial recognition algorithms or models may include, but are not limited to, template matching, support vector machines (SVM) , principal component analysis (PCA) , discrete cosine transform (DCT) , linear discriminant analysis (LDA) , locality preserving projections (LPP) , the hidden Markov model, the multilinear subspace learning using tensor representation, and the neuronal motivated dynamic link matching.
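As one concrete, merely illustrative choice among the detectors listed above, an OpenCV Haar cascade can return candidate face regions; the disclosure does not mandate any particular library or model:

```python
import cv2

def detect_face_rois(image_bgr):
    """Return candidate face ROIs as (x, y, w, h) boxes using an OpenCV Haar cascade."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(box) for box in faces]
```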
  • one or more processors 102 connected to the auxiliary camera 106 select a focusing target (e.g., a region of interest (ROI) ) .
  • the one or more processors may select at least one person or an object identified from the one or more images of the view from step 602 and select one of them.
  • the focusing target (e.g., ROI) may be determined based on someone talking in the scene.
  • one or more processors 102 or 122 may use data received from the auxiliary camera to determine that someone’s lips are moving (e.g., recognized using a facial recognition algorithm or other suitable algorithm or model) .
  • one or more processors 102 or 122 may select the person whose lips are moving as the focus target (e.g., the ROI) .
  • the determination may be related to a motion detected in the view, such as a person moving in the scene or conducting some other action in the scene.
  • one or more processors 102 or 122 may rely on a user input to determine which person, people, or other object (s) in the one or more images of the view should be selected as the focus target (e.g., ROI) .
  • one or more processors 102 connected to auxiliary camera 106 determine a target frame projection (e.g., between a view captured by the auxiliary camera and a view captured by the main camera) .
  • processors 102 or 122 may, upon selecting the focus target, determine a region in main camera 104 (e.g., a ROI in the view of the main camera) that corresponds to the region in auxiliary camera 106 associated with the focus target (e.g., the identified ROI in the view of the auxiliary camera) .
  • A target frame projection may be determined, e.g., to project the location of a focus object in a first view captured by one or more auxiliary cameras to a corresponding location or region in a second view that may be captured using the main camera.
  • main camera 104 and auxiliary camera 106 may be configured to capture different images encompassing different but overlapping areas of a target scene.
  • a region in the view in main camera 104 thus may be determined to correspond to the same region in the view of auxiliary camera 106.
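One plausible realization of the target frame projection, assuming matched feature points are available in the overlapping region of the two views (at least four correspondences), is a homography estimated with OpenCV; this sketch is illustrative and not asserted to be the disclosed method:

```python
import numpy as np
import cv2

def project_roi(aux_points, main_points, roi_corners_aux):
    """Map ROI corner points from the auxiliary view into the main view.

    aux_points / main_points: matched (x, y) correspondences in the overlapping region,
    used to estimate a homography; roi_corners_aux: ROI corners in the auxiliary view.
    """
    src = np.asarray(aux_points, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(main_points, dtype=np.float32).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    corners = np.asarray(roi_corners_aux, dtype=np.float32).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(corners, H)
    return projected.reshape(-1, 2)
```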
  • one or more processors 102 connected to auxiliary camera 106 follow the target region of interest (e.g., the ROI) .
  • the processors 102 or 122 after selecting the focus target (e.g., the ROI) in step 604, determine based on data received from auxiliary camera 106, a region of interest corresponding with the focus target.
  • an auxiliary camera may capture and provide data continuously (which may include an acceptable amount of discontinuous transmissions) to the one or more processors.
  • the one or more processors may determine a new region of interest corresponding to the focus target (e.g., target tracking to maintain the person or object within the view of the camera) .
  • In step 610, one or more processors 102 connected to the auxiliary camera 106 (or processors 122 in communication with the auxiliary camera(s)) move on to a next image frame (view).
  • the one or more processors 102 or 122 may repeat this process for the next image frames by returning to step 602, as shown in FIG. 6.
  • one or more processors 102 connected to main camera 104 update the focus region of interest of the main camera. For example, one or more processors 102 or 122 may determine from the target frame projection from step 606, a region of interest in main camera 104 that corresponds with the selected focus target from step 604.
  • In step 614, one or more processors 102 connected to main camera 104 determine whether the region of interest contains a new target (e.g., relative to the target currently in focus for the main camera). For example, one or more processors 102 or 122 determine whether the region of interest updated based on the converted target frame projection from step 606 contains a new target object to focus on or whether the region of interest corresponds with a previous target object. One or more processors 102 or 122 may receive data from main camera 104 that indicates a region where the main camera’s focus is currently set. If the region where the focus is currently set is different from the updated region of interest, one or more processors 102 or 122 may determine that there is a new target. Conversely, if the region where the focus is currently set is the same as the updated region of interest, one or more processors 102 or 122 may determine that there is not a new target.
  • In step 616, if one or more processors 102 or 122 determine that there is a new target, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) initialize a continuous auto focus, which also may be substantially continuous so as to provide an acceptable amount of time that the main camera can hold its focus on a target object. For example, if the data received by one or more processors 102 or 122 from the main camera indicates that the region where the focus is currently set is different from the updated region of interest, one or more processors 102 initialize continuous auto focus for the updated region of interest.
  • one or more processors 102 connected to main camera 104 configure a focusing speed for the main camera.
  • one or more processors 102 or 122 may configure a focusing speed based on information about main camera 104 and data from main camera 104 relating to the environment.
  • focusing speed may differ for different cameras and in different environments, such as in well-lit versus dimly-lit environments.
  • One or more processors 102 or 122 may therefore determine the optimal focusing speed based on the related camera parameters.
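A toy heuristic for such a focusing-speed configuration, with arbitrary brightness thresholds (the disclosure does not specify particular values or criteria):

```python
def pick_focus_speed(mean_scene_brightness: float, max_motor_speed: float) -> float:
    """Illustrative heuristic: focus more slowly in dim scenes where the contrast
    signal tends to be noisier, and faster in well-lit scenes.

    mean_scene_brightness is assumed to be on a 0-255 scale; thresholds are arbitrary.
    """
    if mean_scene_brightness < 40:     # dimly-lit scene
        return 0.3 * max_motor_speed
    if mean_scene_brightness < 120:    # moderately lit scene
        return 0.6 * max_motor_speed
    return max_motor_speed             # well-lit scene
```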
  • In step 620, if one or more processors 102 or 122 determine that there is not a new target, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) update the continuous auto focus. Alternatively, if one or more processors 102 or 122 determine that there is a new target, one or more processors 102 or 122 update the continuous auto focus after initializing the continuous autofocus.
  • one or more processors 102 connected to main camera 104 move on to the next image frame. For example, after updating the focus region of interest, determining whether there is a new target, initializing continuous auto focus, configuring a focusing speed, and updating continuous auto focus, one or more processors 102 or 122 repeat this process for the next image frame (views) , e.g., of a captured video such as a movie, by returning to step 612 as shown in FIG. 6.
  • FIGS. 7A-7C show an exemplary autofocusing system 700 in accordance with some embodiments of the present disclosure.
  • a wider FOV 714 of a first camera (e.g., an auxiliary camera, such as auxiliary camera 106, 204, 208, 210, or 404) overlaps with or contains a narrower FOV 716 of a second camera (e.g., a main camera, such as main camera 104, 202, 206, or 402).
  • objects 702, 704, and 706 may be located or may move into FOV 716, causing their image (s) to be captured by the second camera.
  • view 716 captured by the main camera can also be displayed on a user interface 710 and/or 712 of a user device 708.
  • a first region 710 (e.g., a main display area) may display view 716 captured by the main camera, such as a region of interest (ROI) (e.g., including person 702 currently in focus) within view 716 of the main camera.
  • a second region 712 of the display area may show one or more icons representing objects (e.g., objects 704, 706) and/or people (e.g., person 702) that can be captured in view 714 of the auxiliary camera.
  • the icons displayed in second region 712 correspond to objects and/or people that are one or more ROIs within view 714 of the auxiliary camera.
  • a user associated with user device 708 provides a user input, such as selecting, e.g., via an interaction between the user’s hand 718 and an icon or other indicator (e.g., associated with one of person 702 and objects 704 and 706 within FOV 714 of the first camera) on the display screen (e.g., a touch screen, in second region 712 of the display screen) or via another suitable selection mechanism (e.g., audio command, eye-gaze tracking, mouse clicking, etc. ) .
  • autofocusing system 700 (e.g., similar to system 100, 200, 205, or including one or more modules of system 120 as discussed with reference to FIG. 1B) may instruct or otherwise control the auxiliary camera to adjust its FOV 714 (e.g., adjusting focal length to focus on, or shifting its ROI to, the selected object, e.g., a tree 704).
  • Autofocusing system 700 may also instruct or otherwise control the main camera to adjust its FOV 716 to focus on or include the selected object, so FOV 714 and/or FOV 716 encompass tree 704 associated with the object that the user selected on the user interface 712.
  • user interface 710 (e.g., the main display area 710) may then be updated to display tree 704 in response to the user selection in field of view 716 (e.g., as the current ROI) of main camera.
  • because the auxiliary camera also shifted its FOV 714 to place the selected tree 704 in approximately the center of FOV 714, object 706 is outside of FOV 714.
  • one or more processors of system 700 cause removal of object 706 from the second display area 712.
  • FIGS. 8A and 8B show an exemplary autofocusing process in accordance with some embodiments of the present disclosure.
  • a user associated with a user device 800 may use one or more user interfaces 804 and 806 to indicate a sequence of icons or other indicators of objects for the second camera (e.g., the main camera) to focus on.
  • the sequence may correspond to an order in which certain objects can be focused on by a camera.
  • the user may indicate that a second camera in the user device, or in communication with the user device, can focus on a person 810 first, a tree 812 second, and a car 814 third. Accordingly, as shown in the example of FIG. 8B, an autofocusing system 808 may cause a second camera to focus first on the person 810, then on the tree 812, and finally on the car 814, e.g., in the same order that the user indicated on one or more of the user interfaces 804 and 806 in this example.
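A minimal sketch of stepping through such a user-specified focus order; `focus_on_target` and `hold` are hypothetical hooks into the autofocusing system, and the dwell time is an arbitrary example:

```python
def follow_focus_sequence(targets, focus_on_target, hold, hold_time_s=2.0):
    """Focus on each user-selected target in the order given (e.g., person, tree, car)."""
    for target in targets:
        focus_on_target(target)   # e.g., switch the main camera's ROI to this target
        hold(hold_time_s)         # keep focus before transitioning to the next target

# Example order matching FIGs. 8A-8B (labels are illustrative, not identifiers from the patent):
# follow_focus_sequence(["person_810", "tree_812", "car_814"], focus_on_target, hold)
```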
  • a resolution of an auxiliary camera may be relatively low so that images captured by the auxiliary camera can be processed relatively quickly by a convolutional neural network or another type of machine-learning based accelerator, for example in autofocusing system 100 or system 120 communicatively coupled to the camera system, causing a main camera to automatically adjust its focus.
  • the autofocusing system may be triggered to adjust the focus of the main camera based on a view captured by an auxiliary camera in accordance with a user command, for example, entered on a user interface of a user device.
  • FIG. 9 shows a flow diagram of an autofocusing process 900 with guidance of a deep depth of field (DOF) camera, in accordance with embodiments of the present disclosure.
  • process 900 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
  • Process 900 may be used for various types of videography, cinematography, photography, and other suitable image capturing processes performed by one or more cameras (e.g., imaging sensors) .
  • process 900 is performed by a camera system (e.g., system 100, 200, 205, or 400) that is integrated with a first camera (e.g., auxiliary camera 106, 204, 208, 210, or 404) and a second camera (e.g., main camera 104, 202, 206, or 402) .
  • process 900 is performed by any of the systems noted above, (e.g., system 120) that is operably coupled to (e.g., connected to, or in communication with) the first and second cameras.
  • the first camera is configured to continuously capture a first view (e.g., FOV 714), and the second camera is configured to continuously capture a second view (e.g., FOV 716).
  • the first camera has a first DOF and the second camera has a second DOF smaller than the first DOF (e.g., DOF 306 of the main camera is smaller than DOF 308 of the auxiliary camera); the first DOF may at least partially overlap with the second DOF (e.g., DOF 306 may be included within DOF 308).
  • the first camera has a first FOV and the second camera has a second FOV smaller than the first FOV (e.g., FOV 716 of the main camera is smaller than FOV 714 of the auxiliary camera) .
  • the first FOV may at least partially overlap with the second FOV (e.g., FOV 716 of the main camera may be included within FOV 714 of the auxiliary camera) .
  • a first region of interest (ROI) in the first view of a scene captured by the first camera is determined (e.g., by system 100 or system 120, such as by an ROI determination module 150 of system 120) .
  • the first ROI is determined based on first image data associated with the first view captured by and obtained from the first camera (e.g., by an image obtaining and processing module 148 of system 120) .
  • the first image data associated with the first view is processed to identify the first ROI as a region that is in focus or is acceptably sharp in the first view of the auxiliary camera.
  • the first image data associated with the first view is processed to identify the first ROI as representing a face using a facial recognition algorithm (e.g., step 602, FIG. 6; by face recognition module 154 of system 120) .
  • the first image data associated with the first view is processed to identify an object (e.g., tree 704, car 706, or a building that can be recognized as ROIs and registered by the system) in the first ROI using an object recognition algorithm (e.g., by an object recognition module 156 of system 120) .
  • the first image data associated with the first view is processed using a machine learning algorithm to identify the first ROI.
  • a machine learning model may be trained using image data that has been marked to be associated with various objects, people, facial expressions, mouth movements, body gestures, motions, etc. (e.g., stored in machine learning data 172 of system 120) . Such machine learning model may then be used to identify an object, a person, a motion of an object or a person, a facial expression, a mouth movement (e.g., character speaking) , and/or body gestures.
  • the first image data associated with the first view is processed to identify a plurality of ROIs, such as objects 704 and 706, and person 702.
  • the plurality of ROIs are in focus in the first view of the auxiliary camera.
  • the first ROI may be selected from the plurality of ROIs.
  • the identified plurality of ROIs may be presented on a graphical user interface (e.g., region 712 on the display of user device 708) .
  • a user input, such as a finger contact with a touch screen (e.g., indicated by hand 718 in FIGs. 7A-7C), an audio command, or an eye-gaze, may be detected (e.g., by user interface 124 on the display) to indicate a selection of the first ROI from the plurality of ROIs (e.g., selection of the icon corresponding to tree 704 from region 712).
  • the first ROI is determined as a desired region to focus using a machine learning algorithm (e.g., based on user’s previous selection data, and/or any other types of user data (e.g., stored in machine learning data 172) that can be used to train a machine learning model to predict user’s future selection) .
  • a second ROI is identified in a second view of the scene captured by a second camera (e.g., main camera 104, 202, 206, or 402) that corresponds to the first ROI.
  • parameters associated with the first ROI (e.g., location coordinates of a plurality of points in the first ROI in the real space or in the captured view) may be used to determine location information (e.g., in real space or in the captured view) of the second ROI in the second view (e.g., for identifying or defining the second ROI).
  • the second camera is caused to focus on the second ROI in the second view (e.g., by focus adjustment module 152) .
  • the focusing process may be conducted automatically.
  • a distance between a lens assembly and an image sensor of the second camera can be adjusted to cause the second camera to focus on the second ROI (e.g., based on the determined location information of the second ROI in step 920) .
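The textbook thin-lens relation illustrates why the lens-to-sensor distance must change with subject distance; this formula is background optics offered for context, not a formula recited in the disclosure:

```python
def lens_to_sensor_distance(focal_length_mm: float, object_distance_mm: float) -> float:
    """Thin-lens approximation: 1/f = 1/d_object + 1/d_image, solved for d_image."""
    if object_distance_mm <= focal_length_mm:
        raise ValueError("object must be farther away than the focal length")
    return (focal_length_mm * object_distance_mm) / (object_distance_mm - focal_length_mm)

# Example: a 50 mm lens focused on a subject 2 m away needs the sensor ~51.3 mm behind the lens.
print(lens_to_sensor_distance(50.0, 2000.0))
```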
  • a focus from a previous ROI of the second camera may be switched to the second ROI in the second view (e.g., based on the determined location information of the second ROI in step 920) .
  • FIG. 10 shows a flow diagram of an autofocusing process 1000 with guidance of a deep DOF camera, in accordance with embodiments of the present disclosure.
  • process 1000 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
  • process 1000 can be performed by any camera system or a system operably coupled to one or more cameras with a similar configuration as discussed with reference to process 900 in FIG. 9. For the sake of brevity, similar features or steps are not repeated here.
  • a first region of interest (ROI) in a first view of a scene captured by a first camera is determined (e.g., by system 100 or system 120, such as an ROI determination module 150 of system 120) .
  • the first ROI is determined based on first image data associated with the first view that is captured by and obtained from the first camera (e.g., by image obtaining and processing module 148 of system 120) .
  • the first camera may be configured to continuously capture the first view of the scene.
  • the first camera may be associated with a first DOF.
  • a second camera (e.g., main camera 104, 202, 206, or 402) is caused to focus on a second ROI in a second view corresponding to the determined first ROI.
  • the second camera may be configured to continuously capture the second view of the scene.
  • the second camera may be associated with a second DOF smaller than the first DOF.
  • the focus of the second camera may be adjusted based on information associated with the first ROI (e.g., location information of the first ROI in the real space or in the first view) .
  • information of the second ROI may not be identified prior to causing the second camera to focus on the second ROI, and the second ROI in the second view may be identified as a result of adjusting one or more parameters (e.g., stored in camera parameters 174) of the second camera to focus on a region corresponding to the first ROI as the second ROI.
  • the second camera may be caused to focus on the second ROI by an adjustment of a distance between a lens assembly and an image sensor of the second camera.
  • One or more parameters of the second camera (e.g., stored in camera parameters 174), including but not limited to focal length, aperture, ISO sensitivity, relative distance and/or position between the second camera and the identified first ROI location, etc., may be adjusted to cause the second camera to focus on the second ROI.
  • one or more parameters of the second camera may be caused to be adjusted in accordance with a predetermined relationship of one or more parameters between the first camera and the second camera (e.g., stored in camera parameters 174) .
  • a relationship between focal lengths and/or aperture of the first camera and the second camera may be predetermined.
  • In accordance with a first parameter (e.g., a first focal length) of the first camera, a second parameter (e.g., a second focal length) of the second camera may be adjusted according to the predetermined relationship so as to cause the second camera to focus on a region corresponding to the first ROI.
  • one or more parameters of the second camera may be caused to be adjusted in accordance with one or more characteristics associated with the first ROI (e.g., location /position coordinates of the first ROI may be used to determine the adjustment of the parameters such as focal length of the second camera) .
  • the second camera may be caused to switch from a currently focused ROI to another region in the second view to be designated as the second ROI in accordance with one or more characteristics associated with the first ROI. For example, coordinates of the first ROI may be used to switch focus of the second camera (e.g., with or without adjusting the focal length of the second camera) to a region corresponding to the first ROI in the second view as the second ROI.
  • FIG. 11 shows a flow diagram of an autofocusing process 1100 with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
  • Process 1100 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
  • process 1100 is performed by a camera system (e.g., system 100, 200, 205, or 400) that is integrated with a main camera and a plurality of auxiliary cameras, or a system (e.g., system 120) operably coupled to (e.g., in connection with, or in communication with) the main camera and the plurality of auxiliary cameras.
  • the plurality of auxiliary cameras may include a first auxiliary camera configured to capture a first view of a scene and associated with a first focal length range, and a second auxiliary camera configured to capture a second view of the scene and associated with a second focal length range that is different than the first focal length range.
  • a third camera e.g., the main camera 104, 202, 206, or 402 may be configured to capture a third view of the scene and associated with a third focal length range.
  • a view is selected (e.g., by view selection module 158) between the first view of the first auxiliary camera and the second view of the second auxiliary camera by comparing the third focal length range of the main camera with the first focal length range and the second focal length range. For example, a view may be selected in accordance with a determination that one auxiliary camera has the first or second focal length range that at least partially overlaps with the third focal length range of the main camera.
  • the view may be selected to be associated with a camera between the first and second cameras that has a focal length range at least partially overlapped with the third focal length range of the third camera. In some embodiments, the view may be selected to be associated with a camera between the first and second cameras that includes a lens of a substantially similar type to a lens included in the third camera. For example, if the third camera is currently using a wide-angle lens, then the one of the first and second auxiliary cameras with a wide-angle lens may be selected. In another example, if the third camera is currently using a telephoto lens, then the one of the first and second auxiliary cameras with a telephoto lens may be selected. In some embodiments, the view may be selected to be associated with a camera between the first and second cameras that has a FOV at least partially overlapped with a FOV of the third camera.
  • a first region of interest (ROI) in the selected view is determined based on the image data associated with the selected view (e.g., image data captured by and obtained from the corresponding auxiliary camera) .
  • the image data associated with the selected view may be processed to identify the first ROI as a region in focus in the selected view.
  • the image data associated with the selected view may be processed using a facial recognition algorithm to identify the first ROI as representing a face (e.g., similar to the facial recognition process discussed above) .
  • the image data associated with the selected view may be processed using an object recognition algorithm to identify an object as the first ROI as discussed above.
  • the image data associated with the selected view may be processed using a machine learning algorithm to identify the first ROI in the selected view as discussed above. In some embodiments, the image data associated with the selected view may be processed to identify a plurality of ROIs as discussed herein. The first ROI may then be selected from the plurality of ROIs using any other suitable method as discussed with reference to process 900.
  • the third camera is caused to focus on a second ROI in the third view corresponding to the first ROI.
  • the second ROI in the third view may be first identified corresponding to the first ROI in the selected view.
  • one or more parameters such as a distance between a lens assembly and an image sensor of the third camera may be adjusted according to the identified second ROI in the third view.
  • the first ROI may be first projected from the selected view to the second ROI in the third view based on position information or any other information associated with the first ROI in the real space or in the selected view, and the respective parameters associated with the selected camera and the third main camera (e.g., stored in camera parameters 174) .
  • the third camera may then be caused to focus on the second ROI projected in the third view (e.g., by adjusting parameters, such as focal length, aperture, FOV, etc., of the third camera) .
  • one or more parameters of the third main camera may be adjusted based on information associated with the first ROI in the selected view, such that the third camera can focus on a region in the third view corresponding to the first ROI, which can be designated as the second ROI in the third view.
  • one or more parameters, such as a distance between a lens assembly and an image sensor, of the third camera may be adjusted in accordance with a predetermined relationship of one or more parameters between the third camera and a camera associated with the selected view (e.g., as discussed with regard to process 1000 of FIG. 10) .
  • one or more parameters, such as a distance between a lens assembly and an image sensor, of the third camera may be adjusted in accordance with one or more characteristics associated with the first ROI (e.g., position information of the first ROI as discussed herein) .
  • the third camera may be caused to switch focus from a currently focused ROI to another region in the third view to become the second ROI in accordance with one or more characteristics associated with the first ROI.
  • FIG. 12 shows a flow diagram of an autofocusing process 1200 with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
  • Process 1200 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
  • process 1200 is performed by a camera system (e.g., system 100, 200, 205, or 400) that is integrated with a main camera and a plurality of auxiliary cameras (e.g., auxiliary camera 106, 204, 208, 210, or 404), or a system (e.g., system 120) operably coupled to (e.g., in connection with, or in communication with) the main camera and the plurality of auxiliary cameras.
  • a third camera (e.g., the main camera 104, 202, 206, or 402) may be configured to capture a third view of the scene.
  • the first camera may have a first DOF
  • the second camera may have a second DOF that at least partially overlaps with the first DOF.
  • the third camera may have a third DOF that is smaller than the first DOF or the second DOF.
  • the first camera may have a first FOV
  • the second camera may have a second FOV that at least partially overlaps with the first FOV.
  • a first region of interest is determined in an overlapping region between the first view captured by the first camera and the second view captured by the second camera.
  • the first image data and the second image data associated with the overlapping region between the first view and the second view are processed using a facial recognition algorithm to identify the first ROI as representing a face as disclosed herein.
  • the first image data and the second image data associated with the overlapping region are processed using an object recognition algorithm to identify the object corresponding to the first ROI as disclosed herein.
  • the first image data and the second image data associated with the overlapping region are processed using a machine learning algorithm to identify the first ROI as disclosed herein.
  • a distance of an object corresponding to the first ROI may be determined (e.g., by distance determination module 160) based on first image data associated with the first view obtained from the first camera and second image data associated with the second view obtained from the second camera.
  • the distance of the object (e.g., a depth) may be determined based on a disparity value associated with the two corresponding images (e.g., stereoscopic images) captured by the first and second cameras.
  • FIG. 13 illustrates a diagram for determining a distance to an object in an overlapping region of a plurality of auxiliary cameras, in accordance with some embodiments of the present disclosure.
  • the optical centers of the first and second cameras are at O and O’ respectively.
  • a point X in FIG. 13 represents an object in an overlapping region between the first view of the first camera at O and the second view of the second camera at O’.
  • f indicates the focal lengths of the first and second cameras that capture the first and second images including the point X in the real space.
  • a distance between the first camera at O and the second camera at O’ is L.
  • x represents the point corresponding to the real point X captured on the 2D image plane of the first camera
  • x’ represents the point corresponding to the real point X captured on the 2D image plane of the second camera.
  • the depth or distance D of point X is determined by D = f × L / disparity, where the disparity (x − x’) represents the difference in image location of an object or a point captured by the two cameras (see the sketch following this list).
  • a third camera configured to capture a third view of the scene is caused to focus on a second ROI in the third view corresponding to the first ROI, based on the distance of the object determined in step 1220.
  • the second ROI may be identified in the third view based on the determined distance of the object (e.g., distance D as shown in FIG. 13) .
  • one or more parameters of the third camera may be adjusted to focus on the second ROI. For example, a distance between a lens assembly and an image sensor of the third camera may be adjusted according to the distance D to focus on the object (e.g., point X in FIG. 13) .
  • a focus of the third camera may be switched from a previously focused region to the identified second ROI in the third view.
  • one or more parameters of the third camera may be adjusted in accordance with the determined distance D of the object (e.g., without having to identify the second ROI first) .
  • focus from a current ROI may be switched to a region in the third view in accordance with the determined distance D of the object, and such region may be designated as the second ROI in the third view.
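The bullets above (FIG. 13) define the quantities used to triangulate the distance D. The following Python sketch is illustrative only and not part of the disclosure; it assumes two rectified, parallel cameras with equal focal length f (in pixels) and baseline L, as in FIG. 13.

```python
# Illustrative only: depth of a single matched point from two rectified views,
# using the FIG. 13 quantities (focal length f, baseline L, image points x and x').

def depth_from_disparity(f_pixels: float, baseline: float, x: float, x_prime: float) -> float:
    """Return the distance D to the real point X, where x and x' are the
    horizontal image coordinates of X in the first and second views."""
    disparity = x - x_prime              # difference in image location of the same point
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f_pixels * baseline / disparity   # D = f * L / disparity

# Example: f = 1400 px, L = 0.05 m, disparity = 20 px  ->  D = 3.5 m
print(depth_from_disparity(1400.0, 0.05, 640.0, 620.0))
```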

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Optics & Photonics (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)

Abstract

Systems and methods for causing a camera to focus, the method including: determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and causing the second camera to focus on the second ROI in the second view.

Description

AUTOFOCUS METHOD AND CAMERA SYSTEM THEREOF
Copyright Notice
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TECHNICAL FIELD
The present disclosure generally relates to systems and methods for focusing a camera based on image data obtained from one or more other cameras.
BACKGROUND
In the field of camera technology, for example as it relates to capturing images (e.g., casual videography, cinematography, photography, etc. ) , the depth of field (DOF) for cameras sometimes can be shallow. The DOF of a camera’s lens corresponds to a distance (focus depth) between the nearest and farthest objects that can remain in sufficient focus within the view of the camera. For example, an object in the camera’s DOF is generally in focus and clearer to observe than objects outside of its DOF. The shallow DOF formed by a large aperture lens can make the scenes more visually pleasing and may contribute to what makes the images appear more “movie-like. ” The camera operator, for example, may control a camera’s shallow DOF to blur the background scenery and focus on an actor in the foreground. However, fast, accurate, intuitive, and inexpensive systems and methods for autofocusing when shooting images of a scene are lacking.
SUMMARY
Consistent with embodiments of the present disclosure, a system is provided for causing a camera to focus. In some embodiments, the system may include one or more processors; and memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to determine a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identify, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and cause the second camera to focus on the second ROI in the second view.
There is also provided a computer-implemented method for autofocusing a shallow depth of field (DOF) camera based on image data obtained from one or more deep DOF cameras. In some embodiments the method may comprise determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and causing the second camera to focus on the second ROI in the second view. The method also may comprise updating a focus of a camera configured to continuously capture a second view based on the determined region of interest.
There is further provided a non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, cause the processor to perform operations comprising determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera; identifying, in accordance with the first ROI, a second  ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and causing the second camera to focus on the second ROI in the second view.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed. Other features of the present invention will become apparent by a review of the specification, claims, and appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A shows an exemplary system for autofocus using a deep depth-of-field sensor in accordance with certain embodiments of the present disclosure.
FIG. 1B shows an exemplary system for autofocusing for videography, in accordance with some embodiments of the present disclosure.
FIG. 2A shows an exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
FIG. 2B shows another exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
FIGS. 3A and 3B show exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with certain embodiments of the present disclosure.
FIG. 4 shows a schematic diagram of an exemplary camera system that may be configured in accordance with certain embodiments of the present disclosure.
FIGS. 5A-5D illustrate exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with certain embodiments of the present disclosure.
FIG. 6 shows a flow diagram of an exemplary autofocusing process in accordance with certain embodiments of the present disclosure.
FIGS. 7A-7C show an exemplary autofocusing system in accordance with some embodiments of the present disclosure.
FIGS. 8A and 8B show another exemplary autofocusing process in accordance with certain embodiments of the present disclosure.
FIG. 9 shows a flow diagram of an autofocusing process with guidance of a deep depth of field (DOF) camera, in accordance with embodiments of the present disclosure.
FIG. 10 shows a flow diagram of an autofocusing process with guidance of a deep DOF camera, in accordance with embodiments of the present disclosure.
FIG. 11 shows a flow diagram of an autofocusing process with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
FIG. 12 shows a flow diagram of an autofocusing process with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure.
FIG. 13 illustrates a diagram for determining a distance to an object in an overlapping region of a plurality of auxiliary cameras, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers refer to the same or similar parts. While several illustrative embodiments are described herein, modifications, adaptations and other implementations are possible. For example, substitutions, additions or modifications may be made to the components illustrated in the drawings. Accordingly, the following detailed description is not limited to the disclosed embodiments and examples. Instead, the proper scope is defined by the appended claims.
Cameras with focus adjustment (e.g., autofocusing, or assisted-focusing) systems can provide images with higher visual quality, for example, to identify specific objects and/or people, or even just their faces or eyes, in views captured by the camera. By adjusting focus of a camera, different visual effects can also be provided to the captured images in accordance with a user’s preference (e.g., “movie-like” videos, portrait images, etc. ) . As used herein, a “view” refers to any static or dynamic shot, scene, image, image frame (e.g., in a video) , or picture that may be captured by an imaging device, such as a camera. A camera’s field of view (FOV) refers to an angular area over which a view can be captured by the camera. For some cameras, however, users may have to use manual focusing to obtain a desired effect, because the autofocusing has not been widely applied, and the existing autofocusing technology may not satisfy the user’s need for fast, accurate, and low-cost focus adjustment. Accordingly, improved systems and methods for adjusting focus are desirable for video camera technology (e.g., videography) .
There are several problems relating to autofocusing for cameras. For example, some cameras rely on a shallow depth of field (DOF) to provide background imagery that produces a movie-like visual effect. Because of this, an optimal focus of video cameras often cannot be easily recognized, and it is also difficult to achieve a smooth target focus transition from one region of a shot to another. Further, current autofocus systems may focus on the wrong regions or objects captured by a video camera and may not adjust quickly enough to sudden changes in the scene. To achieve a focus transition using a video camera, a focus follower may be used. However, focus followers are usually operated by a focus puller, a person whose job is to manually adjust the focus of the video camera.
As an example, many movies include scenes where two or more people are talking to each other, one person in the foreground and the others in the background. A focus follower may  have a marker associated with a respective focus for each person in the shot, allowing for a smooth and quick transition between them. If the transition is not fast or accurate enough, one of the subjects may be out of focus during a portion of the scene when they should be the focus of the scene.
In another example, a device particularly for measuring a distance from a camera to an object may be used, and the camera may be automatically focused on the object based on the measured distance. The distance measurement device may, for example, include an infrared light or laser emitter and a light sensor that senses the reflected infrared light or laser. The time of flight, i.e., from the time the light is emitted by the emitter until the time the reflected light is sensed by the sensor, can be used to determine the distance between the camera and the object in the video image. Some distance measurement devices may utilize ultrasound radiation instead of light. With the measured distance, a controller (such as a computer) in the camera can send signals to motors that drive and move a lens or lenses to achieve focus on the object.
In yet another example, some cameras can employ a phase detection method to adjust focus. A mirror can reflect the image of the object onto two phase sensors, and a computer compares the two reflected images sensed by the phase sensors. In this case, focus can occur when the two reflected images are identical.
In yet another example, contrast detection, involving finding a position of one or more camera lenses that provides the best contrast between consecutive images captured by those one or more lenses, may be used for the autofocusing system. As the one or more lenses (or groups of lenses) move, thereby changing their focus, the camera takes images of an object, and a computer associated with the camera analyzes the images and compares contrasts between consecutive images. Increased contrast between consecutive images suggests the lenses are  moving in the correct direction for improving focus. The position of the lenses that generates consecutive images with the highest contrast is considered to provide the optimal focus.
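For illustration only, the contrast-detection procedure described above can be sketched as a simple search over lens positions: step the lens across a range of positions, score each captured frame's sharpness, and keep the position with the highest score. In the sketch below, capture_frame(position) is a hypothetical hook returning a grayscale frame at a given lens position; it is not part of any real camera API.

```python
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    # Variance of a discrete Laplacian response: higher values mean more contrast/detail.
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4.0 * gray)
    return float(lap.var())

def contrast_detection_af(capture_frame, lens_positions):
    best_pos, best_score = None, -1.0
    for pos in lens_positions:                       # sweep the lens back and forth
        score = sharpness(capture_frame(pos).astype(np.float64))
        if score > best_score:                       # contrast still improving
            best_pos, best_score = pos, score
    return best_pos                                  # position giving the highest contrast
```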
Various methods for autofocusing have advantages and disadvantages. For example, contrast detection requires analysis of many images as the lenses move back and forth and therefore can be time-consuming. Distance measurement methods may take much less time, but the distance measurement can only determine the distance from the camera to the closest object in the view and does not work when it is desired to take a picture focused on an object farther in the view. The phase detection method can achieve focus with precision quickly but can require complex and expensive construction of the camera since the camera must include multiple autofocus sensors, each having its own lens and photodetector. In addition, the number of autofocus sensors can limit the number of areas to focus on in the view. Two autofocus sensors, for example, can only focus the camera on one part of the image. But raising the number of focus points can further raise the price of the camera.
The autofocusing methods may be combined, such as including the distance measurement method or phase detection method as a first step to quickly adjust the camera roughly in a desired area of focus, followed by a contrast detection method to fine-tune the camera’s focus. However, these autofocusing methods may work well when taking static pictures, but not so well in moving environments, where objects at different distances move with time. Especially when shooting a video, the camera must adjust and track its focus in real time in response to moving objects. Accordingly, there exists a need for fast, accurate, and inexpensive autofocusing and focus-tracking technology adapted for various environments.
Consistent with some exemplary embodiments of the present disclosure, there are provided systems and methods that can quickly and automatically switch focus between one or  more objects or areas in a view captured by a camera (e.g., with a smaller DOF) , for example, when shooting a video. The systems obtain image data from one or more cameras (e.g., with larger DOFs) that can collectively provide guidance based on deep DOF views for capturing a shallow DOF view by the camera with the smaller DOF, without significantly increasing the cost of the system.
FIG. 1A illustrates an exemplary system 100, also referred to herein as autofocusing system 100 or camera system 100, for autofocus using a deep depth-of-field sensor guide that may be used in accordance with certain disclosed embodiments. Autofocusing system 100 includes one or more processors 102 connected to a main camera 104, one or more auxiliary cameras 106 and inputs and outputs 108. One or more processors 102 are configured to receive inputs from inputs and outputs 108 and provide instructions (e.g., in the form of signals or commands) to main camera 104 and/or one or more auxiliary cameras 106. In some embodiments, for example, one or more processors 102 may be configured to produce outputs based on information received from main camera 104 and one or more auxiliary cameras 106. In some embodiments, one or more processors 102 may be configured to receive information from one or more auxiliary cameras 106 and provide instructions to main camera 104. In some embodiments, one or more processors 102 may be configured to receive information from main camera 104 and provide instructions to one or more auxiliary cameras 106. For example, one or more processors 102 may receive information from one or more auxiliary cameras 106 indicating that main camera 104 should change its focus point, for example, to focus on a different object or area in a view captured by the main camera. In such exemplary embodiments, one or more processors 102 sends instructions to main camera 104 causing main camera 104 to change its  focus point in accordance with the information that the processor received from one or more auxiliary cameras 106.
In some embodiments, inputs and outputs 108 are configured to receive inputs corresponding with a view that a person, such as a movie director, wants to capture using system 100. One or more processors 102 may process these inputs and send instructions to both main camera 104 and one or more auxiliary cameras 106 according to the inputs received.
FIG. 1B shows an exemplary system 120 for adjusting focus for videography, in accordance with some embodiments of the present disclosure. In some embodiments, system 120 can include a module of a system integrated with one or more cameras (e.g., main camera 104, and one or more auxiliary cameras 106 of FIG. 1A) , or a computing system communicatively coupled to a device integrated with the one or more cameras. In some other embodiments, system 120 can also include a cloud server or a mobile device (e.g., a user device 708 of FIGs. 7A-7C, or a user device 800 of FIG. 8A) configured to process data received from the one or more cameras and/or generate instructions to adjust one or more parameters of the one or more cameras respectively. In some embodiments, system 120 is a portion of or includes one or more modules of system 100 of FIG. 1A. For example, one or more processors 122 of system 120 correspond to one or more processors 102 of system 100. Input device (s) 126 and output device (s) 128 correspond to inputs and outputs 108 of system 100.
As shown in FIG. 1B, system 120 includes one or more processors 122 for executing modules, programs and/or instructions stored in a memory 140 and thereby performing predefined operations, one or more network or other communications interfaces 130, and one or more communication buses 132 for interconnecting these components. System 120 may also include a user interface 124 including one or more input devices 126 (e.g., a keyboard, a mouse,  a touchscreen, a microphone, and/or a camera, etc. ) and one or more output devices 128 (e.g., a display for displaying a graphical user interface (GUI) , and/or a speaker etc. ) .
Processors 122 may be any suitable hardware processor, such as an image processor, an image processing engine, an image-processing chip, a graphics-processor (GPU) , a microprocessor, a micro-controller, a central processing unit (CPU) , a network processor (NP) , a digital signal processor (DSP) , an application specific integrated circuit (ASIC) , a field-programmable gate array (FPGA) , or another programmable logic device, discrete gate or transistor logic device, discrete hardware component.
Memory 140 may include high-speed random access memory, such as DRAM, SRAM, or other random access solid state memory devices. In some implementations, memory 140 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, memory 140 includes one or more storage devices remotely located from processor (s) 122. Memory 140, or alternatively one or more storage devices (e.g., one or more nonvolatile storage devices) within memory 140, includes a non-transitory computer readable storage medium. In some implementations, memory 140 or the computer readable storage medium of memory 140 stores one or more computer program instructions (e.g., modules) 146, and a database 170, or a subset thereof that cause a processor, e.g., processor (s) 122, to perform one or more steps of a process as discussed more fully below with reference to FIGs. 9, 10, 11, and 12. Memory 140 may also store image data captured by one or more cameras, (e.g., main camera 104 and one or more auxiliary cameras 106 of FIG. 1A, one or more cameras in FIGs. 2A-2B, and camera system 400 in FIG. 4, etc. ) for processing by processor 122. Memory 140  may further store operations instructions for controlling the one or more cameras as discussed in the present disclosure.
In some embodiments, memory 140 of system 120 includes an operating system 142 that includes procedures for handling various basic system services and for performing hardware dependent tasks. Memory 140 further includes a network communications module 144 that is used for connecting system 120 to other electronic devices via communication network interfaces 130 and one or more communication networks (wired or wireless) , including but not limited to the Internet, other wide area networks, local area networks, and metropolitan area networks.
FIG. 2A illustrates an exemplary camera system 200 including one or more sensors, such as one or more deep DOF sensors (e.g., auxiliary camera (s) 204) , for guiding a shallow DOF camera (e.g., a main camera 202) for autofocusing that may be used in accordance with some disclosed embodiments. In some embodiments,  system  200 or 205 in FIGs. 2A and 2B, respectively, may embody system 100 in FIG. 1A. In some embodiments, system 120 of FIG. 1B may be integrated into (e.g., included as a component of)  camera system  200 or 205 shown in FIGs. 2A and 2B. In some other embodiments, system 120 of FIG. 1B may be communicatively coupled to  camera system  200 or 205 as shown in FIGs. 2A and 2B. In yet some other embodiments, system 120 of FIG. 1B may be a cloud server, a user’s mobile device, or any other suitable apparatus that can communicate with  camera system  200 or 205 shown in FIGs. 2A and 2B for exchanging image data captured by  camera system  200 or 205 and/or instructions to control parameters of  camera system  200 or 205.
As shown in FIG. 2A, camera system 200 includes main camera 202 and auxiliary camera 204. In some embodiments, main camera 202 may include a camera with a relatively  shallow DOF that is configured to capture objects in a view (e.g., including one or more images of a scene) within a relatively small range of distance in focus. For example, shallow DOF cameras may capture images that appear to be isolated from their environment, and can be used in portrait work, macro photography, and sports photography, etc. In some embodiments, the shallow DOF of main camera 202 may be provided by a lens assembly with a large aperture, a long focal length, and/or a large sensor size. In some embodiments, auxiliary camera 204 may include a deep DOF sensor with a focus range that covers a large distance range front-to-back (e.g., from several meters in front of the focal plane to nearly infinity behind) , capturing objects within a large range of landscape view with acceptable visual clarity. In some embodiments, the deep DOF of auxiliary camera 204 may be provided by a lens assembly with a small aperture, a short focal length, and/or a small sensor size.
In some embodiments, auxiliary camera 204 may be configured to capture a view of the scene that may include, or otherwise may at least partially overlap with, the view captured by main camera 202. The deep DOF sensor (e.g., auxiliary camera 204) may be configured to determine one or more regions of interest (ROIs) to guide the shallow DOF camera (e.g., main camera 202) to focus on a region corresponding to the one or more determined ROIs.
It is appreciated that the configuration of main camera 202 and auxiliary camera 204 in system 200 shown in FIG. 2A is only an example for illustrative purpose, and is not intended to limit the scope of the present disclosure. For example, main camera 202 and auxiliary camera 204 may be arranged in any suitable configuration (e.g., main camera 202 placed on the right of auxiliary camera 204, main camera 202 and auxiliary camera 204 placed in left-right, front-back, and/or up-down arrangements) that may provide sufficient function as discussed in the present disclosure.
FIG. 2B illustrates another exemplary camera system 205 that may use one or more sensors, such as one or more deep DOF sensors, for guiding a shallow DOF camera to achieve autofocus in accordance with some disclosed embodiments. In some embodiments, as shown in FIG. 2B, there may be two or more auxiliary cameras 208 and 210 in addition to main camera 206. In some embodiments, main camera 206 may support the replacement of lenses with different focal lengths, for example a wide angle lens (e.g., a short focal length and a wide FOV) and/or a telephoto lens (e.g., a long-focus lens) . Auxiliary cameras 208 and 210 are used to increase the resolution of the view being captured without increasing the resolution of either of the individual cameras 208 and 210. Therefore, there may be several auxiliary cameras 208 and 210 having different focal lengths. For example, the exemplary auxiliary camera A 208 may be configured with a focal length corresponding to a wide-angle shot while the exemplary auxiliary camera B 210 may be configured with a focal length corresponding to a telephoto shot. In such an exemplary embodiment, this configuration allows for a deep focus shot having both a foreground object and a background object simultaneously in focus, similar to the effect of a split focus diopter.
Furthermore, when there are two or more  auxiliary cameras  208 and 210, in some embodiments one or more processors 102 and/or 122 may calculate a depth map in areas where the views from the  auxiliary cameras  208 and 210 overlap. This depth map allows one or more processors 102 and/or 122 to measure a distance between an object in the view and main camera 206 and then use this information to set a ROI for main camera 206 to use in controlling its focus.
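A possible way to compute such a depth map (an implementation choice, not something specified by the disclosure) is block matching over the overlapping region of the two auxiliary views, for example with OpenCV:

```python
import cv2
import numpy as np

def depth_map(left_gray: np.ndarray, right_gray: np.ndarray,
              f_pixels: float, baseline_m: float) -> np.ndarray:
    """left_gray/right_gray: rectified 8-bit grayscale frames from the two auxiliary cameras."""
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0  # fixed-point -> pixels
    disparity[disparity <= 0] = np.nan            # pixels with no valid match / outside the overlap
    return f_pixels * baseline_m / disparity      # per-pixel distance in meters

# The distance used to set the main camera's ROI could then be, e.g., the median
# depth inside the object's bounding box within the overlapping region.
```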
It is appreciated that the configuration of main camera 206 and  auxiliary cameras  208 and 210 in system 205 shown in FIG. 2B is an example for illustrative purpose, and is not intended to limit the scope of the present disclosure. For example, main camera 206 and  auxiliary cameras  208 and 210 may be arranged in any suitable configuration (e.g., main camera 206 placed on the left or right of both  auxiliary cameras  208 and 210, main camera 206 and  auxiliary cameras  208 and 210 placed in left-right, front-back, and/or up-down arrangements) that may provide sufficient function as discussed in the present disclosure.
FIGS. 3A and 3B show exemplary focusing on objects in a view by a main camera and an auxiliary camera in accordance with some embodiments of the present disclosure. As FIGS. 3A and 3B show, a focus depth 306 of a first camera (such as a main camera as discussed in the present disclosure) may be contained within a focus depth 308 of a second camera (such as an auxiliary camera as discussed herein) . For example, a first person 310 is located within focus depth 306 of the main camera and a second person 312 is within focus depth 308 of the auxiliary camera but beyond focus depth 306 of the main camera. In this example, the auxiliary camera could capture the activity of second person 312, and the related image data can be used to calculate information associated with the second person’s position, such as positional data in the real space, or positional data relative to the view captured by the second camera. In the present embodiment, images or videos with cinema-like focus on different subjects in casual videography can be captured by a shallow DOF camera with accurate and distinct focuses on respective subjects (e.g., people and/or objects) . Such video images can be shot with a main camera with a large FOV and a shallow DOF. Auxiliary camera (s) with deep DOF can be used to guide /assist the main camera to focus on other ROIs identified by auxiliary camera (s) .
As shown in FIG. 3B, an autofocusing system 302 (e.g., system 100 of FIG. 1A or system 120 of FIG. 1 B) generates control signals comprising instructions causing the first camera (e.g., main camera) to update its depth of focus 306 and to focus on second person 312. Therefore, information that the autofocus system 302 received from the second camera (e.g.,  auxiliary camera) with a larger (deeper) DOF 308 may be used to control the first camera (e.g., main camera) to adjust its focus from first person 310 to second person 312 automatically. In some embodiments, the first camera may be driven by a motor, such as a voice coil actuator, and its focus changing rate could be controlled to provide a smooth visual effect (e.g., to transition from a first focused region to a second focused region) . In some implementations, autofocusing system 302 is configured to perform one or more steps of a process as discussed more fully below with reference to FIGs. 6, 9, 10, 11, and 12.
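As a hedged sketch of the controlled focus-changing rate mentioned above, the transition can be rate-limited so the focus moves by at most a small step per video frame. Here set_lens_position() is a hypothetical actuator hook (e.g., driving a voice coil motor), not a real driver API.

```python
def step_focus(current: float, target: float, max_step: float) -> float:
    delta = target - current
    if abs(delta) <= max_step:
        return target
    return current + max_step if delta > 0 else current - max_step

def transition_focus(set_lens_position, current: float, target: float,
                     max_step_per_frame: float = 0.02) -> float:
    # Move toward the new focus target one bounded step per frame for a smooth visual effect.
    while current != target:
        current = step_focus(current, target, max_step_per_frame)
        set_lens_position(current)
    return current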
FIG. 4 illustrates an exemplary camera system 400 (e.g., similar to camera system 100 of FIG. 1 A, including one or more components of system 120 of FIG. 1B, or similar to camera system 200 of FIG. 2A) for one or more sensors, such as one or more deep DOF sensors (e.g., auxiliary cameras as disclosed herein) , to guide a shallow DOF camera (e.g., main camera as disclosed herein) for autofocusing in accordance with some disclosed embodiments. In some embodiments, as shown in FIG. 4, an auxiliary camera 404 (e.g.,  auxiliary camera  204, 208, 210, 106) may be embedded in or attached to a main camera 402 (e.g.,  main camera  202, 206, 104) . In other embodiments, auxiliary camera 404 may be a stand-alone device that is configured to coordinate with main camera 402. In some embodiments, main camera 402 and auxiliary cameras 404 can have various arrangements, e.g., relative positions to each other, distance therebetween, etc., that may provide sufficient function consistent with the present disclosure.
FIGS. 5A and 5B illustrate exemplary focusing on objects in a view by a main camera and auxiliary camera in accordance with some disclosed embodiments. As shown in FIG. 5A, a main camera (e.g., camera 202, 206, 104) may be configured for shallow DOF videography, in which a region associated with a person 504 is in focus, whereas the rest of the view, including a person 502, appears blurred (e.g., out of focus, unrecognizable) . In contrast, as shown in FIG. 5B, an auxiliary camera (e.g., camera 204, 208, 210, 106) may be configured for large DOF videography in which most or substantially all regions within the picture are not blurry and are recognizable, and with a larger area in focus at any given time (e.g., compared to the view of the main camera) . For example, as shown in FIG. 5B, both people 502 and 504 are recognizable in the view, and one object may be at the focal point. This allows the auxiliary camera to determine a ROI to switch the focus of the main camera to. For example, the main camera may not be able to identify and focus on person 502 based on the view of the main camera alone, because person 502 is too blurry in the shallow DOF view of the main camera, lacking sufficient image information of person 502 to be captured and recognized by the main camera, and to be used for adjusting the focus of the main camera to focus on person 502. On the other hand, the deep DOF view of the auxiliary camera (s) provides sufficient image information of person 502 (e.g., when person 502 is in focus in the view of the auxiliary camera, or when person 502 is determined to be the ROI in other ways, as discussed in the present disclosure) to guide the main camera in focusing on the corresponding object in its shallow DOF view. For example, if the action in a scene or other factors as discussed herein require the main camera to focus on person 502 (e.g., a character in a scene) instead of person 504 (e.g., a different character in the same scene) , one or more processors (e.g., 102 or 122) communicatively coupled to the auxiliary camera may determine that the focus or the ROI should be on character 502 and then guide the main camera to switch the main camera’s focus from a focus point associated with person 504 to a focus associated with the region of person 502.
FIGS. 5C and 5D illustrate exemplary focusing for a main camera (e.g., 202, 206, 104) and auxiliary camera (e.g., 204, 208, 210, 106) in accordance with certain disclosed embodiments. As shown in FIG. 5C, a main camera may have a shallow DOF with only a small  portion of a view (e.g., the camera in the foreground) in focus, and the rest of the view blurry. In contrast, as shown in FIG. 5D, an auxiliary camera may have a large DOF with substantially all or nearly all of a view in focus or recognizable with sufficient image information. In accordance with some embodiments, this difference allows one or more processors (e.g., 102 or 122) communicatively coupled to the auxiliary camera to guide the adjustment of the focus of the main camera.
FIG. 6 illustrates a flowchart for an exemplary autofocusing process for one or more sensors, such as one or more deep DOF cameras, to guide a shallow DOF camera for autofocusing in accordance with some disclosed embodiments. The steps in this exemplary flowchart may be applied to the auxiliary camera (s) and the main camera in a camera system 100 or system 120. In some embodiments, the steps of the flowchart for the auxiliary camera (s) include facial recognition (step 602) for selecting a focus target (step 604) , converting a target frame projection (step 606) , and following a region of interest (step 608) . In some embodiments, the steps of the exemplary flowchart for the main camera include configuring focusing speed (step 618) , updating a focus region of interest (step 612) , determining whether a target is new (step 614) , initializing continuous auto focus (step 616) , and updating continuous autofocus (step 620) .
In step 602, one or more processors 102 connected to auxiliary camera (s) 106 (or one or more processors 122 in communication with the auxiliary camera (s) ) perform facial recognition (e.g., by executing instructions stored in face recognition module 154) . For example, the one or more processors may apply any suitable facial recognition algorithms or models to image data received from at least one auxiliary camera to determine whether there is any human face in the image and, if so, who the person is and/or where such person is located in a view captured by the at least one auxiliary camera. For example, textures, shapes, and other facial features may be retrieved from the image data and analyzed to determine the pattern, identity, location, and/or other characteristics of the identified human faces in the view. In some embodiments, recognition of a human face, facial expressions, or one or more objects in the view can be based on artificial intelligence, such as a convolutional neural network (CNN) such as GoogleNet, Alex-Net, LeNet, ResNet, neural networks with Gabor filters, neural networks in conjunction with Hidden Markov Models, fuzzy neural networks, etc. Some other facial recognition algorithms or models may include, but are not limited to, template matching, support vector machines (SVM) , principal component analysis (PCA) , discrete cosine transform (DCT) , linear discriminant analysis (LDA) , locality preserving projections (LPP) , the hidden Markov model, the multilinear subspace learning using tensor representation, and the neuronal motivated dynamic link matching. One of ordinary skill in the art would appreciate that there are many facial recognition systems or object recognition systems that may be used to find people or other objects in one or more images of a camera view in accordance with the disclosed embodiments.
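One concrete way to realize the face-detection portion of step 602 on auxiliary-camera image data (an assumption for illustration; the disclosure does not mandate a particular detector) is OpenCV's bundled Haar-cascade face detector:

```python
import cv2

def detect_faces(bgr_frame):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    # Each detection is an (x, y, w, h) rectangle in the auxiliary camera's image coordinates.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```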
In step 604, one or more processors 102 connected to the auxiliary camera 106 (or processors 122 in communication with the auxiliary camera (s) ) select a focusing target (e.g., a region of interest (ROI) ) . For example, the one or more processors may consider the at least one person or object identified from the one or more images of the view in step 602 and select one of them as the focusing target.
In some embodiments, the focusing target (e.g., ROI) may be determined based on someone talking in the scene. For example, one or  more processors  102 or 122 may use data received from the auxiliary camera to determine that someone’s lips are moving (e.g., recognized using a facial recognition algorithm or other suitable algorithm or model) . In response to  determining that someone’s lips are moving, one or  more processors  102 or 122 may select the person whose lips are moving as the focus target (e.g., the ROI) . In other embodiments, the determination may be related to a motion detected in the view, such as a person moving in the scene or conducting some other action in the scene.
In some embodiments, one or  more processors  102 or 122 may rely on a user input to determine which person, people, or other object (s) in the one or more images of the view should be selected as the focus target (e.g., ROI) .
In step 606, one or more processors 102 connected to auxiliary camera 106 (or processors 122 in communication with the auxiliary camera (s) ) determine a target frame projection (e.g., between a view captured by the auxiliary camera and a view captured by the main camera) . For example, one or  more processors  102 or 122 may, upon selecting the focus target, determine a region in main camera 104 (e.g., a ROI in the view of the main camera) that corresponds to the region in auxiliary camera 106 associated with the focus target (e.g., the identified ROI in the view of the auxiliary camera) . One of ordinary skill in the art would recognize that there are many methods for performing such a target frame projection, e.g., to project the location of a focus object in a first view captured by one or more auxiliary cameras to a corresponding location or region in a second view that may be captured using the main camera.
In some embodiments, as shown in FIGS. 5C and 5D, main camera 104 and auxiliary camera 106 may be configured to capture different images encompassing different but overlapping areas of a target scene. A region in the view in main camera 104 thus may be determined to correspond to the same region in the view of auxiliary camera 106.
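A minimal sketch of one way to perform the target frame projection of step 606 is given below. It assumes a homography H, calibrated offline for the rigidly mounted camera pair, that maps auxiliary-view pixels to main-view pixels; this planar approximation is an assumption for illustration, not the claimed method.

```python
import numpy as np

def project_roi(roi_xywh, H: np.ndarray):
    """Map an (x, y, w, h) ROI from the auxiliary view into the main view."""
    x, y, w, h = roi_xywh
    corners = np.array([[x, y, 1.0], [x + w, y, 1.0],
                        [x, y + h, 1.0], [x + w, y + h, 1.0]]).T   # 3 x 4 homogeneous corners
    mapped = H @ corners
    mapped = mapped[:2] / mapped[2]                 # back to pixel coordinates
    x0, y0 = mapped.min(axis=1)
    x1, y1 = mapped.max(axis=1)
    return float(x0), float(y0), float(x1 - x0), float(y1 - y0)    # ROI in the main camera's view
```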
In step 608, one or more processors 102 connected to auxiliary camera 106 (or processors 122 in communication with the auxiliary camera (s) ) follow the target region of  interest (e.g., the ROI) . For example, one or  more processors  102 or 122, after selecting the focus target (e.g., the ROI) in step 604, determine based on data received from auxiliary camera 106, a region of interest corresponding with the focus target. In the disclosed embodiments, an auxiliary camera may capture and provide data continuously (which may include an acceptable amount of discontinuous transmissions) to the one or more processors. As a focus target moves, the one or more processors may determine a new region of interest corresponding to the focus target (e.g., target tracking to maintain the person or object within the view of the camera) .
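Step 608 can be illustrated with an off-the-shelf single-object tracker. The sketch below assumes OpenCV's CSRT tracker (available in builds that include the contrib tracking module) and a hypothetical frames generator yielding successive auxiliary-camera frames:

```python
import cv2

def follow_roi(frames, initial_roi_xywh):
    """Yield the ROI tracked across successive auxiliary-camera frames."""
    tracker = cv2.TrackerCSRT_create()
    first_frame = next(frames)
    tracker.init(first_frame, tuple(int(v) for v in initial_roi_xywh))
    for frame in frames:
        ok, roi = tracker.update(frame)
        if ok:
            yield roi        # updated ROI, to be projected into the main camera's view
```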
In step 610, one or more processors 102 connected to the auxiliary camera 106 (or processors 122 in communication with the auxiliary camera (s) ) move on to a next image frame (view) . For example, after performing facial recognition, selecting the focus target, determining the target frame projection to identify a region of interest for a view of the main camera, and following the target region of interest for a given image frame, the one or  more processors  102 or 122 may repeat this process for the next image frames by returning to step 602, as shown in FIG. 6.
In step 612, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) update the focus region of interest of the main camera. For example, one or  more processors  102 or 122 may determine from the target frame projection from step 606, a region of interest in main camera 104 that corresponds with the selected focus target from step 604.
In step 614, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) determine whether the region of interest contains a new target (e.g., relative to the target currently in focus for the main camera) . For example, one or more processors 102 or 122 determine whether the region of interest updated based on the converted target frame projection from step 606 contains a new target object to focus on or whether the region of interest corresponds with a previous target object. One or more processors 102 or 122 may receive data from main camera 104 that indicates a region where the main camera’s focus is currently set. If the region where the focus is currently set is different from the updated region of interest, one or more processors 102 or 122 may determine that there is a new target. Conversely, if the region where the focus is currently set is the same as the updated region of interest, one or more processors 102 or 122 may determine that there is not a new target.
In step 616, if one or  more processors  102 or 122 determine that there is a new target, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) initialize a continuous auto focus, which also may be substantially continuous so as to provide an acceptable amount of time that the main camera can hold its focus on a target object. For example, if the data received by one or  more processors  102 or 122 from the main camera indicates that the region where the focus is currently set is different from the updated region of interest, one or more processors 102 initialize continuous auto focus for the updated region of interest.
In step 618, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) configure a focusing speed for the main camera. For example, one or  more processors  102 or 122 may configure a focusing speed based on information about main camera 104 and data from main camera 104 relating to the environment. One of ordinary skill in the art would recognize that focusing speed may differ for different cameras and in different environments, such as in well-lit versus dimly-lit environments. One or  more processors  102 or 122 may therefore determine the optimal focusing speed based on the related camera parameters.
In step 620, if one or  more processors  102 or 122 determine that there is not a new target, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) update the continuous auto focus. Alternatively, if one or  more processors  102 or 122 determine that there is a new target, one or  more processors  102 or 122 update the continuous auto focus after initializing the continuous autofocus.
In step 622, one or more processors 102 connected to main camera 104 (or processors 122 in communication with the main camera) move on to the next image frame. For example, after updating the focus region of interest, determining whether there is a new target, initializing continuous auto focus, configuring a focusing speed, and updating continuous auto focus, one or  more processors  102 or 122 repeat this process for the next image frame (views) , e.g., of a captured video such as a movie, by returning to step 612 as shown in FIG. 6.
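The main-camera side of FIG. 6 (steps 612-622) can be condensed into the loop below. This is only a structural sketch: get_projected_roi(), rois_differ(), and the camera methods are hypothetical hooks standing in for the operations described above.

```python
def main_camera_loop(camera, get_projected_roi, rois_differ):
    current_roi = None
    while True:
        roi = get_projected_roi()                                   # step 612: update focus ROI
        if current_roi is None or rois_differ(roi, current_roi):    # step 614: new target?
            camera.set_focus_speed(camera.estimate_focus_speed())   # step 618: configure focusing speed
            camera.init_continuous_af(roi)                          # step 616: initialize continuous AF
            current_roi = roi
        camera.update_continuous_af(current_roi)                    # step 620: update continuous AF
        camera.next_frame()                                         # step 622: next image frame
```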
FIGS. 7A-7C show an exemplary autofocusing system 700 in accordance with some embodiments of the present disclosure. As shown in FIG. 7A, a wider FOV 714 of a first camera (e.g., an auxiliary camera, such as  auxiliary camera  106, 204, 208, 210, or 404) overlaps with or contains a narrower FOV 716 of a second camera (e.g., a main camera, such as  main camera  104, 202, 206, or 402) . In some embodiments, objects 702, 704, and 706 may be located or may move into FOV 716, causing their image (s) to be captured by the second camera. In some embodiments, view 716 captured by the main camera (e.g., person 702) can also be displayed on a user interface 710 and/or 712 of a user device 708. As shown in FIG. 7A, a first region 710 (e.g., a main display area) of a display area of user device 708 displays view 716 captured by the main camera, such as a region of interest (ROI) (e.g., including person 702 currently in focus) within view 716 of the main camera. A second region 712 of the display area may show one or more icons representing objects (e.g., objects 704, 706) and/or people (e.g., person 702) that can  be captured in view 714 of the auxiliary camera. In some embodiments, the icons displayed in second region 712 correspond to objects and/or people that are one or more ROIs within view 714 of the auxiliary camera.
In the example of FIG. 7B, a user associated with user device 708 provides a user input, such as selecting, e.g., via an interaction between the user’s hand 718 and an icon or other indicator (e.g., associated with one of person 702 and  objects  704 and 706 within FOV 714 of the first camera) on the display screen (e.g., a touch screen, in second region 712 of the display screen) or via another suitable selection mechanism (e.g., audio command, eye-gaze tracking, mouse clicking, etc. ) . In response to the user selection, autofocusing system 700 (e.g., similar to  system  100, 200, 205, or including one or more modules of system 120 as discussed with reference to FIG. 1B) may instruct or otherwise control the auxiliary camera to adjust its FOV 714 (e.g., adjusting focal length to focus on or shifting its ROI to the selected object, e.g., a tree 704) . Autofocusing system 700 may also instruct or otherwise control the main camera to adjust its FOV 716 to focus on or include the selected object, so FOV 714 and/or FOV 716 encompass tree 704 associated with the object that the user selected on the user interface 712.
Accordingly, as shown in FIG. 7C, user interface 710 (e.g., the main display area 710) may then be updated to display tree 704 in response to the user selection in field of view 716 (e.g., as the current ROI) of main camera. Further, as the auxiliary camera also shifted its FOV 714 to place the selected tree 704 in approximately the center of FOV 714, object 706 is outside of the FOV 714. In response, one or more processors of system 700 cause removal of object 706 from the second display area 712.
FIGS. 8A and 8B show an exemplary autofocusing process in accordance with some embodiments of the present disclosure. In the example of FIG. 8A, a user associated with a user  device 800 may use one or  more user interfaces  804 and 806 to indicate a sequence of icons or other indicators of objects for the second camera (e.g., the main camera) to focus on. The sequence may correspond to an order in which certain objects can be focused on by a camera. For example, the user may indicate that a second camera in the user device, or in communication with the user device, can focus on a person 810 first, a tree 812 second, and a car 814 third. Accordingly, as shown in the example of FIG. 8B, an autofocusing system 808 (e.g., similar to  system  100, 200, 205, or including one or more modules of system 120 as discussed in FIG. 1B) may cause a second camera to focus first on the person 810, then on the tree 812, and finally on the car 814, e.g., in the same order that the user indicated on one or more of the  user interfaces  804 and 806 in this example.
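A minimal sketch of the user-ordered focus sequence described above; focus_on() is a hypothetical hook into the autofocus pipeline (for example, one performing a smooth transition as in FIG. 3B), and the target names are placeholders for the user's selections.

```python
def run_focus_sequence(focus_on, targets=("person_810", "tree_812", "car_814")):
    # Focus on each user-selected target in the order indicated on the user interface.
    for target in targets:
        focus_on(target)
```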
In some embodiments, a resolution of an auxiliary camera may be relatively low so that images captured by the auxiliary camera can be processed relatively quickly by a convolutional neural network or another type of machine-learning based accelerator, for example in autofocusing system 100 or system 120 communicatively coupled to the camera system, causing a main camera to automatically adjust its focus. In some embodiments, the autofocusing system may be triggered to adjust the focus of the main camera based on a view captured by an auxiliary camera in accordance with a user command, for example, entered on a user interface of a user device.
FIG. 9 shows a flow diagram of an autofocusing process 900 with guidance of a deep depth of field (DOF) camera, in accordance with embodiments of the present disclosure. For purposes of explanation and without limitation, process 900 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A,  system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B. Process 900 may be used for various types of videography, cinematography, photography, and other suitable image capturing processes performed by one or more cameras (e.g., imaging sensors) .
In some embodiments, process 900 is performed by a camera system (e.g.,  system  100, 200, 205, or 400) that is integrated with a first camera (e.g.,  auxiliary camera  106, 204, 208, 210, or 404) and a second camera (e.g.,  main camera  104, 202, 206, or 402) . In some embodiments, process 900 is performed by any of the systems noted above, (e.g., system 120) that is operably coupled to (e.g., connected to, or in communication with) the first and second cameras. In some embodiments, the first camera is configured to continuously capture a first view (e.g., FOV 714) , and the second camera is configured to continuously capture a second view (e.g., FOV 716) . In some embodiments, the first camera has a first DOF, and the second camera has a second DOF smaller than the first DOF (e.g., DOF 306 of the main camera is smaller than DOF 308 of the auxiliary camera) . In some embodiments, the first DOF may at least partially overlap with the second DOF (e.g., DOF 306 may be included within DOF 308) . In some embodiments, the first camera has a first FOV and the second camera has a second FOV smaller than the first FOV (e.g., FOV 716 of the main camera is smaller than FOV 714 of the auxiliary camera) . In some embodiments, the first FOV may at least partially overlap with the second FOV (e.g., FOV 716 of the main camera may be included within FOV 714 of the auxiliary camera) .
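As background only, the relationship between focal length, aperture, and depth of field that makes a short-focal-length auxiliary camera "deep DOF" can be illustrated with the standard approximate DOF formulas (hyperfocal distance H ≈ f²/(N·c), near/far limits ≈ H·u/(H ± u)); the numeric focal lengths, f-numbers, circles of confusion, and subject distance below are hypothetical values chosen for illustration and are not taken from the disclosure.

def hyperfocal(f_mm, f_number, coc_mm):
    """Approximate hyperfocal distance H ~= f^2 / (N * c), in millimetres."""
    return (f_mm ** 2) / (f_number * coc_mm)

def dof_limits(f_mm, f_number, coc_mm, subject_mm):
    """Approximate near and far limits of acceptable sharpness for a subject distance u.

    Uses D_near ~= H*u/(H + u) and D_far ~= H*u/(H - u); the far limit becomes
    infinite once the subject is at or beyond the hyperfocal distance.
    """
    H = hyperfocal(f_mm, f_number, coc_mm)
    u = subject_mm
    near = H * u / (H + u)
    far = float("inf") if u >= H else H * u / (H - u)
    return near, far

# Hypothetical example: a short-lens auxiliary camera vs. a longer-lens main camera,
# both focused at 3 m; the auxiliary camera's DOF is far deeper than the main camera's.
aux_near, aux_far = dof_limits(f_mm=4.0, f_number=2.0, coc_mm=0.003, subject_mm=3000.0)
main_near, main_far = dof_limits(f_mm=50.0, f_number=2.0, coc_mm=0.03, subject_mm=3000.0)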
In step 910, a first region of interest (ROI) in the first view of a scene captured by the first camera (e.g.,  auxiliary camera  106, 204, 208, 210, or 404) is determined (e.g., by system 100 or system 120, such as by an ROI determination module 150 of system 120) . In some embodiments, the first ROI is determined based on first image data associated with the first view captured by and obtained from the first camera (e.g., by an image obtaining and processing module 148 of system 120) .
In some embodiments, the first image data associated with the first view is processed to identify the first ROI as a region that is in focus or acceptably sharp in the first view of the auxiliary camera. In some embodiments, the first image data associated with the first view is processed to identify the first ROI as representing a face using a facial recognition algorithm (e.g., step 602, FIG. 6; by face recognition module 154 of system 120) . In some embodiments, the first image data associated with the first view is processed to identify an object (e.g., tree 704, car 706, or a building that can be recognized as an ROI and registered by the system) in the first ROI using an object recognition algorithm (e.g., by an object recognition module 156 of system 120) . In some embodiments, the first image data associated with the first view is processed using a machine learning algorithm to identify the first ROI. For example, a machine learning model may be trained using image data that has been labeled as being associated with various objects, people, facial expressions, mouth movements, body gestures, motions, etc. (e.g., stored in machine learning data 172 of system 120) . Such a machine learning model may then be used to identify an object, a person, a motion of an object or a person, a facial expression, a mouth movement (e.g., a character speaking) , and/or body gestures.
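A minimal sketch, assuming the OpenCV library is available, of how a face region in the first image data could be identified as a candidate first ROI; the disclosure does not prescribe a particular facial recognition algorithm, so the Haar-cascade detector used here is only an illustrative stand-in for face recognition module 154.

import cv2

def detect_face_roi(first_image_bgr):
    """Return the largest detected face as an (x, y, w, h) box in the first view, or None."""
    gray = cv2.cvtColor(first_image_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Take the largest detected face as the candidate first ROI.
    return max(faces, key=lambda box: box[2] * box[3])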
In some embodiments, the first image data associated with the first view is processed to identify a plurality of ROIs, such as objects 704 and 706 and person 702. For example, the plurality of ROIs are in focus in the first view of the auxiliary camera. In some embodiments, the first ROI may be selected from the plurality of ROIs. For example, the identified plurality of ROIs may be presented on a graphical user interface (e.g., region 712 on the display of user device 708) . A user input indicating a selection of the first ROI from the plurality of ROIs (e.g., selection of the icon corresponding to tree 704 from region 712) may then be received (e.g., detected by user interface 124 on the display) , such as a finger contact with a touch screen (e.g., indicated by hand 718 in FIG. 7B) , an audio command, or an eye-gaze. In some embodiments, the first ROI is determined as a desired region to focus on using a machine learning algorithm (e.g., based on the user's previous selection data and/or any other types of user data (e.g., stored in machine learning data 172) that can be used to train a machine learning model to predict the user's future selections) .
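The following sketch illustrates, under the assumption that each candidate ROI is represented as an (x, y, w, h) bounding box in display coordinates, one hypothetical way a touch input in region 712 could be resolved to the first ROI; the function name and the fallback rule are illustrative assumptions, not part of the disclosure.

def select_roi_from_touch(candidate_rois, touch_xy):
    """Return the candidate ROI whose box contains (or is closest to) the touch point.

    candidate_rois -- list of (x, y, w, h) boxes for the identified plurality of ROIs
    touch_xy       -- (tx, ty) coordinates of the user's touch on the display
    """
    tx, ty = touch_xy
    for (x, y, w, h) in candidate_rois:
        if x <= tx <= x + w and y <= ty <= y + h:
            return (x, y, w, h)

    # No box contains the touch point: fall back to the box with the nearest centre.
    def centre_distance_sq(box):
        x, y, w, h = box
        cx, cy = x + w / 2.0, y + h / 2.0
        return (cx - tx) ** 2 + (cy - ty) ** 2

    return min(candidate_rois, key=centre_distance_sq)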
In step 920, in accordance with the first ROI, a second ROI is identified in a second view of the scene captured by a second camera (e.g.,  main camera  104, 202, 206, or 402) that corresponds to the first ROI. In some embodiments, parameters associated with the first ROI (e.g., location coordinates of a plurality of points in the first ROI in the real space or in the captured view) may be translated (e.g., taking into consideration the lens parameters and relative lens locations of the first and second cameras) into location information (e.g., in real space or in the captured view) associated with the second ROI in the second view (e.g., for identifying or defining the second ROI) .
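As a rough sketch of the translation in step 920, the mapping below assumes the two views are concentric and related only by a magnification (zoom_ratio), which is a strong simplification; a practical implementation would instead use the calibrated intrinsic and extrinsic parameters of the first and second cameras. All names below are hypothetical.

def map_roi_between_views(roi_aux, aux_size, main_size, zoom_ratio):
    """Map an (x, y, w, h) ROI from the wide first view into the narrower second view.

    Assumes the views share an optical centre and differ only by a magnification
    zoom_ratio (second-camera focal length / first-camera focal length), so the
    mapping reduces to a centred crop-and-scale.
    """
    aux_w, aux_h = aux_size
    main_w, main_h = main_size
    x, y, w, h = roi_aux
    # Offset of the ROI centre from the first-view centre, then magnify into the second view.
    cx = (x + w / 2.0) - aux_w / 2.0
    cy = (y + h / 2.0) - aux_h / 2.0
    cx_m = cx * zoom_ratio * main_w / aux_w + main_w / 2.0
    cy_m = cy * zoom_ratio * main_h / aux_h + main_h / 2.0
    w_m = w * zoom_ratio * main_w / aux_w
    h_m = h * zoom_ratio * main_h / aux_h
    return (cx_m - w_m / 2.0, cy_m - h_m / 2.0, w_m, h_m)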
In step 930, the second camera is caused to focus on the second ROI in the second view (e.g., by focus adjustment module 152) . In some embodiments, the focusing process may be conducted automatically. In some embodiments, a distance between a lens assembly and an image sensor of the second camera can be adjusted to cause the second camera to focus on the  second ROI (e.g., based on the determined location information of the second ROI in step 920) . In some embodiments, a focus from a previous ROI of the second camera may be switched to the second ROI in the second view (e.g., based on the determined location information of the second ROI in step 920) .
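One hedged illustration of the lens-to-sensor adjustment in step 930 is the thin-lens relation 1/f = 1/u + 1/v: given a focal length f and an object distance u derived from the second ROI, the image distance v toward which the lens-to-sensor distance is driven can be computed as below. Real focus mechanisms and compound lenses are more involved; this is only a first-order sketch.

def image_distance_for_focus(focal_length_mm, object_distance_mm):
    """Thin-lens image distance v satisfying 1/f = 1/u + 1/v.

    The lens-to-sensor distance is driven toward v to bring an object at distance u
    into focus; u must exceed the focal length for a real image to form.
    """
    f, u = focal_length_mm, object_distance_mm
    if u <= f:
        raise ValueError("object distance must exceed the focal length")
    return (u * f) / (u - f)

# Hypothetical example: a 50 mm lens focusing on an object 2 m away needs v of about 51.3 mm.
v = image_distance_for_focus(50.0, 2000.0)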
FIG. 10 shows a flow diagram of an autofocusing process 1000 with guidance of a deep DOF camera, in accordance with embodiments of the present disclosure. In some embodiments, process 1000 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B. It is appreciated that process 1000 can be performed by any camera system, or any system operably coupled to one or more cameras, having a configuration similar to that discussed with reference to process 900 in FIG. 9. For the sake of brevity, similar features or steps are not repeated here.
In step 1010, a first region of interest (ROI) in a first view of a scene captured by a first camera (e.g.,  auxiliary camera  106, 204, 208, 210, or 404) is determined (e.g., by system 100 or system 120, such as an ROI determination module 150 of system 120) . In some embodiments, the first ROI is determined based on first image data associated with the first view that is captured by and obtained from the first camera (e.g., by image obtaining and processing module 148 of system 120) . The first camera may be configured to continuously capture the first view of the scene. The first camera may be associated with a first DOF.
In step 1020, a second camera (e.g.,  main camera  104, 202, 206, or 402) is caused to focus on a second ROI in a second view corresponding to the determined first ROI. The second camera may be configured to continuously capture the second view of the scene. The second camera may be associated with a second DOF smaller than the first DOF. In some embodiments, the focus of the second camera may be adjusted based on information associated with the first ROI (e.g., location information of the first ROI in the real space or in the first view) . Unlike in process 900, in process 1000 information of the second ROI may not be identified prior to causing the second camera to focus on the second ROI; rather, the second ROI in the second view may be identified as a result of adjusting one or more parameters (e.g., stored in camera parameters 174) of the second camera so that the second camera focuses on a region corresponding to the first ROI, which region is then taken as the second ROI.
In some embodiments, the second camera may be caused to focus on the second ROI by an adjustment of a distance between a lens assembly and an image sensor of the second camera. One or more parameters (e.g., stored in camera parameters 174, including but not limited to, focal length, aperture, ISO sensitivity, relative distance and/or position between the second camera and the identified first ROI location, etc. ) associated with the second camera may be caused to be adjusted based on the information associated with the identified first ROI.
In some embodiments, one or more parameters of the second camera, e.g., a distance between the lens assembly and the image sensor of the second camera, may be caused to be adjusted in accordance with a predetermined relationship of one or more parameters between the first camera and the second camera (e.g., stored in camera parameters 174) . For example, a relationship between focal lengths and/or aperture of the first camera and the second camera may be predetermined. When a first parameter (e.g., a first focal length) of the first camera is  determined based on the first ROI, a second parameter (e.g., a second focal length) of the second camera may be adjusted according to the predetermined relationship so as to cause the second camera to focus on a region corresponding to the first ROI.
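A minimal sketch, assuming the predetermined relationship is stored as a pre-measured calibration table of corresponding focus settings (a hypothetical representation not specified by the disclosure), of mapping a first-camera parameter to a second-camera parameter by interpolation.

from bisect import bisect_left

def map_focus_setting(first_setting, calibration_pairs):
    """Map a first-camera focus setting to a second-camera setting via a calibration table.

    calibration_pairs -- sorted list of (first_setting, second_setting) samples embodying
                         the predetermined relationship between the two cameras; values
                         between samples are linearly interpolated.
    """
    firsts = [p[0] for p in calibration_pairs]
    seconds = [p[1] for p in calibration_pairs]
    if first_setting <= firsts[0]:
        return seconds[0]
    if first_setting >= firsts[-1]:
        return seconds[-1]
    i = bisect_left(firsts, first_setting)
    x0, x1 = firsts[i - 1], firsts[i]
    y0, y1 = seconds[i - 1], seconds[i]
    t = (first_setting - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)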
In some embodiments, one or more parameters of the second camera, e.g., the distance between the lens assembly and the image sensor of the second camera, may be caused to be adjusted in accordance with one or more characteristics associated with the first ROI (e.g., location/position coordinates of the first ROI may be used to determine the adjustment of parameters such as the focal length of the second camera) . In some embodiments, the second camera may be caused to switch from a currently focused ROI to another region in the second view, to be designated as the second ROI, in accordance with one or more characteristics associated with the first ROI. For example, coordinates of the first ROI may be used to switch the focus of the second camera (e.g., with or without adjusting the focal length of the second camera) to a region in the second view corresponding to the first ROI, which is then taken as the second ROI.
FIG. 11 shows a flow diagram of an autofocusing process 1100 with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure. Process 1100 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B, system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
In some embodiments, process 1100 is performed by a camera system (e.g.,  system  100, 200, 205, or 400) that is integrated with a main camera and a plurality of auxiliary cameras,  or a system (e.g., system 120) operably coupled to (e.g., in connection with, or in communication with) the main camera and the plurality of auxiliary cameras. In some embodiments, the plurality of auxiliary cameras (e.g.,  auxiliary camera  106, 204, 208, 210, or 404) may include a first auxiliary camera configured to capture a first view of a scene and associated with a first focal length range, and a second auxiliary camera configured to capture a second view of the scene and associated with a second focal length range that is different than the first focal length range. In some embodiments, a third camera (e.g., the  main camera  104, 202, 206, or 402) may be configured to capture a third view of the scene and associated with a third focal length range.
In step 1110, a view is selected (e.g., by view selection module 158) between the first view of the first auxiliary camera and the second view of the second auxiliary camera by comparing the third focal length range of the main camera with the first focal length range and the second focal length range. For example, a view may be selected in accordance with a determination that one auxiliary camera has the first or second focal length range that at least partially overlaps with the third focal length range of the main camera.
In some embodiments, the view may be selected to be associated with a camera between the first and second cameras that has a focal length range at least partially overlapped with the third focal length range of the third camera. In some embodiments, the view may be selected to be associated with a camera between the first and second cameras that includes a lens of a substantially similar type to a lens included in the third camera. For example, if the third camera is currently using a wide-angle lens, then the one of the first and second auxiliary cameras with a wide-angle lens may be selected. In another example, if the third camera is currently using a telephoto lens, then the one of the first and second auxiliary cameras with a telephoto lens may be selected. In some embodiments, the view may be selected to be associated with a camera between the first and second cameras that has a FOV at least partially overlapped with a FOV of the third camera.
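The following sketch shows one hypothetical way the view selection of step 1110 could be made by comparing focal length ranges; the overlap criterion and the example ranges are illustrative assumptions only.

def select_auxiliary_view(main_range, aux_ranges):
    """Pick the auxiliary view whose focal length range best overlaps the main camera's.

    main_range -- (low, high) focal length range of the third (main) camera
    aux_ranges -- dict mapping an auxiliary-view identifier to its (low, high) range
    Returns the identifier with the largest overlap, or None if nothing overlaps.
    """
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    best = max(aux_ranges, key=lambda name: overlap(main_range, aux_ranges[name]))
    return best if overlap(main_range, aux_ranges[best]) > 0.0 else None

# Hypothetical example: a main camera at 24-70 mm selects the wide auxiliary view here.
selected = select_auxiliary_view((24.0, 70.0),
                                 {"wide_aux": (16.0, 35.0), "tele_aux": (70.0, 200.0)})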
In step 1120, a first region of interest (ROI) in the selected view is determined based on the image data associated with the selected view (e.g., image data captured by and obtained from the corresponding auxiliary camera) . In some embodiments, the image data associated with the selected view may be processed to identify the first ROI as a region in focus in the selected view. In some embodiments, the image data associated with the selected view may be processed using a facial recognition algorithm to identify the first ROI as representing a face (e.g., similar to the facial recognition process discussed above) . In some embodiments, the image data associated with the selected view may be processed using an object recognition algorithm to identify an object as the first ROI as discussed above. In some embodiments, the image data associated with the selected view may be processed using a machine learning algorithm to identify the first ROI in the selected view as discussed above. In some embodiments, the image data associated with the selected view may be processed to identify a plurality of ROIs as discussed herein. The first ROI may then be selected from the plurality of ROIs using any suitable method as discussed with reference to process 900.
In step 1130, the third camera is caused to focus on a second ROI in the third view corresponding to the first ROI. In some embodiments, the second ROI in the third view may be first identified corresponding to the first ROI in the selected view. Then, one or more parameters, such as a distance between a lens assembly and an image sensor of the third camera may be adjusted according to the identified second ROI in the third view. For example, as discussed with regard to process 900, the first ROI may be first projected from the selected view to the second ROI in the third view based on position information or any other information associated with the  first ROI in the real space or in the selected view, and the respective parameters associated with the selected camera and the third main camera (e.g., stored in camera parameters 174) . The third camera may then be caused to focus on the second ROI projected in the third view (e.g., by adjusting parameters, such as focal length, aperture, FOV, etc., of the third camera) .
In some embodiments, as discussed with regard to process 1000, one or more parameters of the third main camera may be adjusted based on information associated with the first ROI in the selected view, such that the third camera can focus on a region in the third view corresponding to the first ROI, which can be designated as the second ROI in the third view. In some embodiments, one or more parameters, such as a distance between a lens assembly and an image sensor, of the third camera may be adjusted in accordance with a predetermined relationship of one or more parameters between the third camera and a camera associated with the selected view (e.g., as discussed with regard to process 1000 of FIG. 10) . In some embodiments, one or more parameters, such as a distance between a lens assembly and an image sensor, of the third camera may be adjusted in accordance with one or more characteristics associated with the first ROI (e.g., position information of the first ROI as discussed herein) . In some embodiments, the third camera may be caused to switch focus from a currently focused ROI to another region in the third view to become the second ROI in accordance with one or more characteristics associated with the first ROI.
FIG. 12 shows a flow diagram of an autofocusing process 1200 with guidance of one or more deep DOF cameras, in accordance with embodiments of the present disclosure. Process 1200 may be performed by system 100 including one or more processors 102 as shown in FIG. 1A, system 120 including one or more modules 146 and database 170 of system 120 as shown in FIG. 1B, system 200 of FIG. 2A, system 205 of FIG. 2B, system 302 of FIGs. 3A and 3B,  system 400 of FIG. 4, one or more components of user device 708 of FIGs. 7A-7C, one or more components of system 700 of FIGs. 7A-7C, one or more components of user device 800 of FIG. 8A, or one or more components of system 808 of FIG. 8B.
In some embodiments, process 1200 is performed by a camera system (e.g.,  system  100, 200, 205, or 400) that is integrated with a main camera and a plurality of auxiliary cameras, or a system (e.g., system 120) operably coupled to (e.g., in connection with, or in communication with) the main camera and the plurality of auxiliary cameras. In some embodiments, the plurality of auxiliary cameras (e.g.,  auxiliary camera  106, 204, 208, 210, or 404) may include a first camera configured to capture a first view of a scene and a second camera configured to capture a second view of the scene. In some embodiments, a third camera (e.g., the  main camera  104, 202, 206, or 402) may be configured to capture a third view of the scene. The first camera may have a first DOF, and the second camera may have a second DOF that at least partially overlaps with the first DOF. The third camera may have a third DOF that is smaller than the first DOF or the second DOF. The first camera may have a first FOV, and the second camera may have a second FOV that at least partially overlaps with the first FOV.
In step 1210, a first region of interest (ROI) is determined in an overlapping region between the first view captured by the first camera and the second view captured by the second camera. In some embodiments, the first image data and the second image data associated with the overlapping region between the first view and the second view are processed using a facial recognition algorithm to identify the first ROI as representing a face as disclosed herein. In some embodiments, the first image data and the second image data associated with the overlapping region are processed using an object recognition algorithm to identify the object corresponding to the first ROI as disclosed herein. In some embodiments, the first image data  and the second image data associated with the overlapping region are processed using a machine learning algorithm to identify the first ROI as disclosed herein.
In step 1220, a distance of an object corresponding to the first ROI (e.g., located within the first ROI) may be determined (e.g., by distance determination module 160) based on first image data associated with the first view obtained from the first camera and second image data associated with the second view obtained from the second camera. In some embodiments, the distance of the object (e.g., a depth) may be determined based on a disparity value associated with the two corresponding images (e.g., stereoscopic images) captured by the first and second cameras.
FIG. 13 illustrates a diagram for determining a distance to an object in an overlapping region of a plurality of auxiliary cameras, in accordance with some embodiments of the present disclosure. As shown in FIG. 13, the optical centers of the first and second cameras (two auxiliary cameras) are at O and O' respectively. A point X in FIG. 13 represents an object in an overlapping region between the first view of the first camera at O and the second view of the second camera at O'. In some embodiments, f indicates the focal length (assumed to be the same) of the first and second cameras that capture the first and second images including the point X in the real space. A distance between the first camera at O and the second camera at O' is L. In FIG. 13, x represents the point corresponding to the real point X captured on the 2D image plane of the first camera, and x' represents the point corresponding to the real point X captured on the 2D image plane of the second camera. The depth or distance D of point X is determined by:
D = (f × L) / disparity
where the disparity represents the difference in image location of an object or point as captured by the two cameras (e.g., disparity = x - x' in FIG. 13) .
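A minimal numerical sketch of the relation above; the example values of f, L, and the image coordinates are hypothetical and chosen only to show the arithmetic.

def depth_from_disparity(focal_length_px, baseline, x_first, x_second):
    """Depth of a point seen by both auxiliary cameras, D = (f * L) / disparity.

    focal_length_px   -- focal length f of the two cameras, expressed in pixels
    baseline          -- distance L between the optical centres O and O'
    x_first, x_second -- image coordinates x and x' of the point in the two views
    """
    disparity = x_first - x_second
    if disparity == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline / disparity

# Hypothetical example: f = 800 px, L = 0.1 m, disparity of 8 px gives a depth of 10 m.
D = depth_from_disparity(800.0, 0.1, 412.0, 404.0)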
In step 1230, a third camera configured to capture a third view of the scene is caused to focus on a second ROI in the third view corresponding to the first ROI, based on the distance of the object determined in step 1220. In some embodiments, the second ROI may be identified in the third view based on the determined distance of the object (e.g., distance D as shown in FIG. 13) . Accordingly, one or more parameters of the third camera may be adjusted to focus on the second ROI. For example, a distance between a lens assembly and an image sensor of the third camera may be adjusted according to the distance D to focus on the object (e.g., point X in FIG. 13) . In another example, the focus of the third camera may be switched from a previously focused region to the identified second ROI in the third view.
In some other embodiments, one or more parameters of the third camera, e.g., a distance between a lens assembly and an image sensor, may be adjusted in accordance with the determined distance D of the object (e.g., without having to identify the second ROI first) . In some embodiments, focus from a current ROI may be switched to a region in the third view in accordance with the determined distance D of the object, and such region may be designated as the second ROI in the third view.
It is to be understood that the disclosed embodiments are not necessarily limited in their application to the details of construction and the arrangement of the components set forth in the following description and/or illustrated in the drawings and/or the examples. The disclosed embodiments are capable of variations, or of being practiced or carried out in various ways. It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed devices and systems. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed devices and  systems. It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (162)

  1. A system, comprising:
    one or more processors; and
    memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to:
    determine a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera;
    identify, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and
    cause the second camera to focus on the second ROI in the second view.
  2. The system of claim 1, wherein the first and second cameras are integrated in the system.
  3. The system of claim 2, wherein the first camera is configured to continuously capture the first view.
  4. The system of claim 2, wherein the second camera is configured to continuously capture the second view.
  5. The system of claim 2, wherein the first camera has a first depth of field (DOF) and the second camera has a second DOF smaller than the first DOF.
  6. The system of claim 5, wherein the first DOF overlaps with the second DOF.
  7. The system of claim 2, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV.
  8. The system of claim 7, wherein the first FOV overlaps with the second FOV.
  9. The system of claim 1, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  10. The system of claim 1, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  11. The system of claim 1, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  12. The system of claim 1, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  13. The system of claim 1, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  14. The system of claim 13, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  15. The system of claim 13, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI from the plurality of ROIs as a desired region to focus.
  16. The system of claim 13, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  17. The system of claim 1, wherein identifying the second ROI in the second view comprises:
    translating the first ROI in the first view to the second ROI in the second view.
  18. The system of claim 1, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  19. The system of claim 1, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view.
  20. A method, comprising:
    determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera;
    identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and
    causing the second camera to focus on the second ROI in the second view.
  21. The method of claim 20, wherein the first camera is configured to continuously capture the first view, and the second camera is configured to continuously capture the second view.
  22. The method of claim 20, wherein the first camera has a first depth of field (DOF) and the second camera has a second DOF smaller than the first DOF, the first DOF being overlapped with the second DOF.
  23. The method of claim 20, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV, the first FOV being overlapped with the second FOV.
  24. The method of claim 20, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  25. The method of claim 20, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  26. The method of claim 20, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  27. The method of claim 20, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  28. The method of claim 20, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  29. The method of claim 28, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  30. The method of claim 28, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI from the plurality of ROIs as a desired region to focus.
  31. The method of claim 28, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  32. The method of claim 20, wherein identifying the second ROI in the second view comprises:
    translating the first ROI in the first view to the second ROI in the second view.
  33. The method of claim 20, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  34. The method of claim 20, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view.
  35. A non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, cause the processor to perform operations comprising:
    determining a first region of interest (ROI) in a first view of a scene captured by a first camera, the first ROI determined based on first image data associated with the first view obtained from the first camera;
    identifying, in accordance with the first ROI, a second ROI in a second view of the scene captured by a second camera, the second ROI corresponding to the first ROI; and
    causing the second camera to focus on the second ROI in the second view.
  36. The non-transitory computer-readable medium of claim 35, wherein the first camera is configured to continuously capture the first view, and the second camera is configured to continuously capture the second view.
  37. The non-transitory computer-readable medium of claim 35, wherein the first camera has a first depth of field (DOF) and the second camera has a second DOF smaller than the first DOF, the first DOF being overlapped with the second DOF.
  38. The non-transitory computer-readable medium of claim 35, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV, the first FOV being overlapped with the second FOV.
  39. The non-transitory computer-readable medium of claim 35, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  40. The non-transitory computer-readable medium of claim 35, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  41. The non-transitory computer-readable medium of claim 35, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  42. The non-transitory computer-readable medium of claim 35, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  43. The non-transitory computer-readable medium of claim 35, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  44. The non-transitory computer-readable medium of claim 43, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  45. The non-transitory computer-readable medium of claim 43, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI from the plurality of ROIs as a desired region to focus.
  46. The non-transitory computer-readable medium of claim 43, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  47. The non-transitory computer-readable medium of claim 35, wherein identifying the second ROI in the second view comprises:
    translating the first ROI in the first view to the second ROI in the second view.
  48. The non-transitory computer-readable medium of claim 35, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  49. The non-transitory computer-readable medium of claim 35, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view.
  50. A system, comprising:
    one or more processors; and
    memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to:
    determine a first region of interest (ROI) in a first view based on first image data associated with the first view obtained from a first camera that is configured to continuously capture the first view of a scene, the first camera associated with a first depth of field (DOF) ; and
    cause a second camera to focus on a second ROI in a second view corresponding to the determined first ROI, wherein the second camera is configured to continuously capture the second view of the scene, and wherein the second camera is associated with a second DOF smaller than the first DOF.
  51. The system of claim 50, wherein the first and second cameras are integrated in the system.
  52. The system of claim 51, wherein the first DOF at least partially overlaps with the second DOF.
  53. The system of claim 51, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV.
  54. The system of claim 53, wherein the first FOV overlaps with the second FOV.
  55. The system of claim 50, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  56. The system of claim 50, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  57. The system of claim 50, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  58. The system of claim 50, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  59. The system of claim 50, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  60. The system of claim 59, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  61. The system of claim 59, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI as a desired region to focus.
  62. The system of claim 59, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  63. The system of claim 50, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  64. The system of claim 63, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with a predetermined relationship of one or more parameters between the first camera and the second camera.
  65. The system of claim 63, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with one or more characteristics associated with the first ROI.
  66. The system of claim 50, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view in accordance with one or more characteristics associated with the first ROI.
  67. A method, comprising:
    determining a first region of interest (ROI) in a first view based on first image data associated with the first view obtained from a first camera that is configured to continuously capture the first view of a scene, the first camera associated with a first depth of field (DOF) ; and
    causing a second camera to focus on a second ROI in a second view corresponding to the determined first ROI, wherein the second camera is configured to continuously capture the second view of the scene, and wherein the second camera is associated with a second DOF smaller than the first DOF.
  68. The method of claim 67, wherein the first DOF at least partially overlaps with the second DOF.
  69. The method of claim 67, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV, the first FOV being overlapped with the second FOV.
  70. The method of claim 67, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  71. The method of claim 67, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  72. The method of claim 67, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  73. The method of claim 67, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  74. The method of claim 67, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  75. The method of claim 74, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  76. The method of claim 74, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI as a desired region to focus.
  77. The method of claim 74, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  78. The method of claim 67, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  79. The method of claim 78, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with a predetermined relationship of one or more parameters between the first camera and the second camera.
  80. The method of claim 78, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with one or more characteristics associated with the first ROI.
  81. The method of claim 67, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view in accordance with one or more characteristics associated with the first ROI.
  82. A non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, cause the processor to perform operations comprising:
    determining a first region of interest (ROI) in a first view based on first image data associated with the first view obtained from a first camera that is configured to continuously capture the first view of a scene, the first camera associated with a first depth of field (DOF) ; and
    causing a second camera to focus on a second ROI in a second view corresponding to the determined first ROI, wherein the second camera is configured to continuously capture the second view of the scene, and wherein the second camera is associated with a second DOF smaller than the first DOF.
  83. The non-transitory computer-readable medium of claim 82, wherein the first DOF at least partially overlaps with the second DOF.
  84. The non-transitory computer-readable medium of claim 82, wherein the first camera has a first field of view (FOV) and the second camera has a second FOV smaller than the first FOV, the first FOV being overlapped with the second FOV.
  85. The non-transitory computer-readable medium of claim 82, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify the first ROI as a region in focus in the first view.
  86. The non-transitory computer-readable medium of claim 82, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a facial recognition algorithm to identify the first ROI as representing a face.
  87. The non-transitory computer-readable medium of claim 82, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using an object recognition algorithm to identify an object as the first ROI.
  88. The non-transitory computer-readable medium of claim 82, wherein determining the first ROI comprises:
    processing the first image data associated with the first view using a machine learning algorithm to identify the first ROI in the first view.
  89. The non-transitory computer-readable medium of claim 82, wherein determining the first ROI comprises:
    processing the first image data associated with the first view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  90. The non-transitory computer-readable medium of claim 89, wherein processing the first image data associated with the first view to identify a plurality of ROIs comprises:
    determining that the plurality of ROIs are in focus in the first view.
  91. The non-transitory computer-readable medium of claim 89, wherein selecting the first ROI from the plurality of ROIs comprises:
    presenting the plurality of ROIs on a graphical user interface; and
    receiving a user input indicative of a selection of the first ROI as a desired region to focus.
  92. The non-transitory computer-readable medium of claim 89, wherein selecting the first ROI from the plurality of ROIs comprises:
    determining the first ROI as a desired region to focus using a machine learning algorithm.
  93. The non-transitory computer-readable medium of claim 82, wherein causing the second camera to focus on the second ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the second camera.
  94. The non-transitory computer-readable medium of claim 93, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with a predetermined relationship of one or more parameters between the first camera and the second camera.
  95. The non-transitory computer-readable medium of claim 93, wherein the distance between the lens assembly and the image sensor of the second camera is caused to be adjusted in accordance with one or more characteristics associated with the first ROI.
  96. The non-transitory computer-readable medium of claim 82, wherein causing the second camera to focus on the second ROI comprises:
    switching focus from a third ROI to the second ROI in the second view in accordance with one or more characteristics associated with the first ROI.
  97. A system, comprising:
    one or more processors operably coupled to:
    a first camera configured to capture a first view of a scene, the first camera associated with a first focal length range;
    a second camera configured to capture a second view of the scene, the second camera associated with a second focal length range that is different than the first focal length range; and
    a third camera configured to capture a third view of the scene, the third camera associated with a third focal length range; and
    memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to:
    select a view between the first view and the second view by comparing the third focal length range with the first focal length range and the second focal length range;
    determine a first region of interest (ROI) in the selected view based on image data associated with the selected view; and
    cause the third camera to focus on a second ROI in the third view corresponding to the first ROI.
  98. The system of claim 97, wherein at least one of the first, second, and third cameras is integrated in the system.
  99. The system of claim 97, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a focal length range at least partially overlapped with the third focal length range of the third camera.
  100. The system of claim 97, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that includes a lens of a substantially similar type to a lens included in the third camera.
  101. The system of claim 97, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a field of view (FOV) at least partially overlapped with a FOV of the third camera.
  102. The system of claim 97, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify the first ROI as a region in focus in the selected view.
  103. The system of claim 97, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a facial recognition algorithm to identify the first ROI as representing a face.
  104. The system of claim 97, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using an object recognition algorithm to identify an object as the first ROI.
  105. The system of claim 97, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a machine learning algorithm to identify the first ROI in the selected view.
  106. The system of claim 97, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  107. The system of claim 97, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    identifying the second ROI in the third view corresponding to the first ROI; and
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera according to the second ROI in the third view.
  108. The system of claim 97, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with a predetermined relationship of one or more parameters between the third camera and a camera associated with the selected view.
  109. The system of claim 97, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with one or more characteristics associated with the first ROI.
  110. The system of claim 97, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    switching focus from a third ROI to the second ROI in the third view.
  111. A method, comprising:
    in a system including one or more processors operably coupled to:
    a first camera configured to capture a first view of a scene, the first camera associated with a first focal length range;
    a second camera configured to capture a second view of the scene, the second camera associated with a second focal length range that is different than the first focal length range; and
    a third camera configured to capture a third view of the scene, the third camera associated with a third focal length range; and
    memory coupled to the one or more processors and storing instructions comprising:
    selecting a view between the first view and the second view by comparing the third focal length range with the first focal length range and the second focal length range;
    determining a first region of interest (ROI) in the selected view based on image data associated with the selected view; and
    causing the third camera to focus on a second ROI in the third view corresponding to the first ROI.
  112. The method of claim 111, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a focal length range at least partially overlapped with the third focal length range of the third camera.
  113. The method of claim 111, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that includes a lens of a substantially similar type to a lens included in the third camera.
  114. The method of claim 111, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a field of view (FOV) at least partially overlapped with a FOV of the third camera.
  115. The method of claim 111, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify the first ROI as a region in focus in the selected view.
  116. The method of claim 111, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a facial recognition algorithm to identify the first ROI as representing a face.
  117. The method of claim 111, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using an object recognition algorithm to identify an object as the first ROI.
  118. The method of claim 111, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a machine learning algorithm to identify the first ROI in the selected view.
  119. The method of claim 111, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  120. The method of claim 111, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    identifying the second ROI in the third view corresponding to the first ROI; and
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera according to the second ROI in the third view.
  121. The method of claim 111, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with a predetermined relationship of one or more parameters between the third camera and a camera associated with the selected view.
  122. The system of claim 111, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with one or more characteristics associated with the first ROI.
  123. The system of claim 111, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    switching focus from a third ROI to the second ROI in the third view.
  124. A non-transitory computer-readable medium with instructions stored therein that, when executed by a processor that is operably coupled to a first camera configured to capture a first view of a scene, the first camera associated with a first focal length range, a second camera configured to capture a second view of the scene, the second camera associated with a second focal length range that is different than the first focal length range, and a third camera configured to capture a third view of the scene, the third camera associated with a third focal length range, cause the processor to perform operations comprising:
    selecting a view between the first view and the second view by comparing the third focal length range with the first focal length range and the second focal length range;
    determining a first region of interest (ROI) in the selected view based on image data associated with the selected view; and
    causing the third camera to focus on a second ROI in the third view corresponding to the first ROI.
  125. The non-transitory computer-readable medium of claim 124, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a focal length range at least partially overlapped with the third focal length range of the third camera.
  126. The non-transitory computer-readable medium of claim 124, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that includes a lens of a substantially similar type to a lens included in the third camera.
  127. The non-transitory computer-readable medium of claim 124, wherein selecting a view between the first view and the second view comprises:
    selecting the view associated with a camera between the first and second cameras that has a field of view (FOV) at least partially overlapped with a FOV of the third camera.
  128. The non-transitory computer-readable medium of claim 124, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify the first ROI as a region in focus in the selected view.
  129. The non-transitory computer-readable medium of claim 124, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a facial recognition algorithm to identify the first ROI as representing a face.
  130. The non-transitory computer-readable medium of claim 124, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using an object recognition algorithm to identify an object as the first ROI.
  131. The non-transitory computer-readable medium of claim 124, wherein determining the first ROI comprises:
    processing the image data associated with the selected view using a machine learning algorithm to identify the first ROI in the selected view.
  132. The non-transitory computer-readable medium of claim 124, wherein determining the first ROI comprises:
    processing the image data associated with the selected view to identify a plurality of ROIs; and
    selecting the first ROI from the plurality of ROIs.
  133. The non-transitory computer-readable medium of claim 124, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    identifying the second ROI in the third view corresponding to the first ROI; and
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera according to the second ROI in the third view.
  134. The non-transitory computer-readable medium of claim 124, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with a predetermined relationship of one or more parameters between the third camera and a camera associated with the selected view.
  135. The non-transitory computer-readable medium of claim 124, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with one or more characteristics associated with the first ROI.
  136. The non-transitory computer-readable medium of claim 124, wherein causing the third camera to focus on the second ROI in the third view corresponding to the first ROI comprises:
    switching focus from a third ROI to the second ROI in the third view.
  137. A system, comprising:
    one or more processors; and
    memory coupled to the one or more processors and storing instructions that, when executed by the one or more processors, cause the system to:
    determine a first region of interest (ROI) in an overlapping region between a first view of a scene captured by a first camera and a second view of the scene captured by a second camera;
    determine a distance of an object corresponding to the first ROI based on first image data associated with the first view obtained from the first camera and second image data associated with the second view obtained from the second camera; and
    cause a third camera configured to capture a third view of the scene to focus on a second ROI in the third view corresponding to the first ROI based on the determined distance of the object.
  138. The system of claim 137, wherein at least one of the first, second, and third cameras is integrated in the system.
  139. The system of claim 137, wherein the first camera has a first depth of field (DOF), and the second camera has a second DOF that at least partially overlaps with the first DOF.
  140. The system of claim 139, wherein the third camera has a third DOF that is smaller than the first DOF or the second DOF.
  141. The system of claim 137, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a facial recognition algorithm to identify the first ROI as representing a face.
  142. The system of claim 137, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using an object recognition algorithm to identify the object corresponding to the first ROI.
  143. The system of claim 137, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a machine learning algorithm to identify the first ROI.
  144. The system of claim 137, wherein causing the third camera to focus on the second ROI in the third view further comprises:
    identifying the second ROI in the third view based on the determined distance of the object.
  145. The system of claim 137, wherein causing the third camera to focus on the second ROI in the third view comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with the determined distance of the object.
  146. The system of claim 137, wherein causing the third camera to focus on the second ROI in the third view comprises:
    switching focus from a third ROI to the second ROI in the third view in accordance with the determined distance of the object.
  147. A method, comprising:
    determining a first region of interest (ROI) in an overlapping region between a first view of a scene captured by a first camera and a second view of the scene captured by a second camera;
    determining a distance of an object corresponding to the first ROI based on first image data associated with the first view obtained from the first camera and second image data associated with the second view obtained from the second camera; and
    causing a third camera configured to capture a third view of the scene to focus on a second ROI in the third view corresponding to the first ROI based on the determined distance of the object.
  148. The method of claim 147, wherein the first camera has a first depth of field (DOF), the second camera has a second DOF that at least partially overlaps with the first DOF, and the third camera has a third DOF that is smaller than the first DOF or the second DOF.
  149. The method of claim 147, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a facial recognition algorithm to identify the first ROI as representing a face.
  150. The method of claim 147, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using an object recognition algorithm to identify the object corresponding to the first ROI.
  151. The method of claim 147, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a machine learning algorithm to identify the first ROI.
  152. The method of claim 147, wherein causing the third camera to focus on the second ROI in the third view further comprises:
    identifying the second ROI in the third view based on the determined distance of the object.
  153. The method of claim 147, wherein causing the third camera to focus on the second ROI in the third view comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with the determined distance of the object.
  154. The method of claim 147, wherein causing the third camera to focus on the second ROI in the third view comprises:
    switching focus from a third ROI to the second ROI in the third view in accordance with the determined distance of the object.
  155. A non-transitory computer-readable medium with instructions stored therein that, when executed by a processor, cause the processor to perform operations comprising:
    determining a first region of interest (ROI) in an overlapping region between a first view of a scene captured by a first camera and a second view of the scene captured by a second camera;
    determining a distance of an object corresponding to the first ROI based on first image data associated with the first view obtained from the first camera and second image data associated with the second view obtained from the second camera; and
    causing a third camera configured to capture a third view of the scene to focus on a second ROI in the third view corresponding to the first ROI based on the determined distance of the object.
  156. The non-transitory computer-readable medium of claim 155, wherein the first camera has a first depth of field (DOF), the second camera has a second DOF that at least partially overlaps with the first DOF, and the third camera has a third DOF that is smaller than the first DOF or the second DOF.
  157. The non-transitory computer-readable medium of claim 155, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a facial recognition algorithm to identify the first ROI as representing a face.
  158. The non-transitory computer-readable medium of claim 155, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using an object recognition algorithm to identify the object corresponding to the first ROI.
  159. The non-transitory computer-readable medium of claim 155, wherein determining the first ROI comprises:
    processing the first image data and the second image data associated with the overlapping region using a machine learning algorithm to identify the first ROI.
  160. The non-transitory computer-readable medium of claim 155, wherein causing the third camera to focus on the second ROI in the third view further comprises:
    identifying the second ROI in the third view based on the determined distance of the object.
  161. The non-transitory computer-readable medium of claim 155, wherein causing the third camera to focus on the second ROI in the third view comprises:
    causing adjustment of a distance between a lens assembly and an image sensor of the third camera in accordance with the determined distance of the object.
  162. The non-transitory computer-readable medium of claim 155, wherein causing the third camera to focus on the second ROI in the third view comprises:
    switching focus from a third ROI to the second ROI in the third view in accordance with the determined distance of the object.
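
Illustrative sketch (not part of the claims or the original description): claims 111-136 recite selecting a reference view by comparing focal length ranges, determining a region of interest (ROI) in that view, and causing the third camera to focus on a corresponding ROI. The short Python sketch below shows one possible reading of that flow; the Camera and ROI classes, the interval-overlap test, and the map_roi_to_third_view transform are assumptions introduced here purely for illustration.

from dataclasses import dataclass
from typing import Tuple


@dataclass
class Camera:
    name: str
    focal_length_range: Tuple[float, float]  # (min, max) focal length in millimetres


@dataclass
class ROI:
    x: float        # top-left corner, normalized [0, 1] view coordinates
    y: float
    width: float
    height: float


def ranges_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    # Two closed intervals overlap when each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def select_reference_view(first: Camera, second: Camera, third: Camera) -> Camera:
    # Pick whichever of the first/second cameras has a focal length range that
    # at least partially overlaps the third (e.g. telephoto) camera's range.
    for candidate in (first, second):
        if ranges_overlap(candidate.focal_length_range, third.focal_length_range):
            return candidate

    # Fallback: choose the candidate whose range lies closest to the third camera's.
    def gap(cam: Camera) -> float:
        lo, hi = cam.focal_length_range
        t_lo, t_hi = third.focal_length_range
        return max(t_lo - hi, lo - t_hi, 0.0)

    return min((first, second), key=gap)


def map_roi_to_third_view(roi: ROI, scale: float, dx: float, dy: float) -> ROI:
    # Placeholder mapping from the selected view into the third view; a real
    # system would apply a calibrated transform between the two cameras.
    return ROI(roi.x * scale + dx, roi.y * scale + dy, roi.width * scale, roi.height * scale)


# Example: a 24-70 mm main camera overlaps a 70-200 mm telephoto at 70 mm,
# so its view is selected as the reference in which the first ROI is detected.
wide = Camera("wide", (16.0, 35.0))
main = Camera("main", (24.0, 70.0))
tele = Camera("tele", (70.0, 200.0))
reference = select_reference_view(wide, main, tele)
assert reference is main
face_roi = ROI(0.40, 0.30, 0.10, 0.15)                    # e.g. from a face detector on the reference view
tele_roi = map_roi_to_third_view(face_roi, 2.5, -0.75, -0.45)  # second ROI, used to drive the tele focus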
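Claims 137-162 instead derive an object distance from two overlapping views and focus the third camera according to that distance. Below is a minimal sketch under stated assumptions (a rectified stereo pair, a pinhole camera model, and thin-lens focusing); the function names and example numbers are illustrative only and are not taken from the publication.

def object_distance_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    # Rectified stereo triangulation: Z = f * B / d, with f in pixels, B in metres,
    # and d the horizontal shift (in pixels) of the ROI between the two views.
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive for a finite object distance")
    return focal_px * baseline_m / disparity_px


def lens_to_sensor_distance(focal_length_m: float, object_distance_m: float) -> float:
    # Thin-lens equation 1/f = 1/d_o + 1/d_i solved for the image distance d_i,
    # i.e. the lens-to-sensor spacing that brings the object into focus.
    if object_distance_m <= focal_length_m:
        raise ValueError("object must be beyond the focal length to form a real image")
    return focal_length_m * object_distance_m / (object_distance_m - focal_length_m)


# Example: f = 1400 px, baseline = 2 cm, ROI disparity = 14 px  ->  object at 2 m;
# a 50 mm telephoto lens then needs d_i of about 51.3 mm to focus at that distance.
z = object_distance_from_disparity(focal_px=1400.0, baseline_m=0.02, disparity_px=14.0)
d_i = lens_to_sensor_distance(focal_length_m=0.050, object_distance_m=z)
print(f"object distance: {z:.2f} m, lens-to-sensor distance: {d_i * 1000:.2f} mm")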
PCT/CN2020/080375 2020-03-20 2020-03-20 Autofocus method and camera system thereof WO2021184341A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/080375 WO2021184341A1 (en) 2020-03-20 2020-03-20 Autofocus method and camera system thereof
CN202080098778.4A CN115299031A (en) 2020-03-20 2020-03-20 Automatic focusing method and camera system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080375 WO2021184341A1 (en) 2020-03-20 2020-03-20 Autofocus method and camera system thereof

Publications (1)

Publication Number Publication Date
WO2021184341A1 (en) 2021-09-23

Family

ID=77769973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080375 WO2021184341A1 (en) 2020-03-20 2020-03-20 Autofocus method and camera system thereof

Country Status (2)

Country Link
CN (1) CN115299031A (en)
WO (1) WO2021184341A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8045046B1 (en) * 2010-04-13 2011-10-25 Sony Corporation Four-dimensional polynomial model for depth estimation based on two-picture matching
US9544574B2 (en) * 2013-12-06 2017-01-10 Google Inc. Selecting camera pairs for stereoscopic imaging
US9219860B1 (en) * 2014-12-17 2015-12-22 Ic Real Tech, Llc Hybrid panoramic optical device with embedded PTZ components

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130155183A1 (en) * 2011-12-14 2013-06-20 Electronics And Telecommunications Research Institute Multi image supply system and multi image input device thereof
CN105765967A * 2013-09-30 2016-07-13 Google Inc. Using second camera to adjust settings of first camera
CN105765969A * 2015-03-27 2016-07-13 SZ DJI Technology Co., Ltd. Image processing method, device, and equipment, and image-shooting system
US20160295097A1 (en) * 2015-03-31 2016-10-06 Qualcomm Incorporated Dual camera autofocus
CN109922251A * 2017-12-12 2019-06-21 Huawei Technologies Co., Ltd. The method, apparatus and system quickly captured
CN110493527A * 2019-09-24 2019-11-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Main body focusing method, device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023126560A1 (en) * 2022-01-03 2023-07-06 Varjo Technologies Oy Optical focus adjustment
WO2024050280A1 (en) * 2022-08-29 2024-03-07 Sony Interactive Entertainment Inc. Dual camera tracking system
CN116723264A * 2022-10-31 2023-09-08 Honor Device Co., Ltd. Method, apparatus and storage medium for determining target location information
CN116723264B * 2022-10-31 2024-05-24 Honor Device Co., Ltd. Method, apparatus and storage medium for determining target location information

Also Published As

Publication number Publication date
CN115299031A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
WO2021184341A1 (en) Autofocus method and camera system thereof
US9456141B2 (en) Light-field based autofocus
US8995785B2 (en) Light-field processing and analysis, camera control, and user interfaces and interaction on light-field capture devices
KR101533686B1 (en) Apparatus and method for tracking gaze, recording medium for performing the method
US20150103184A1 (en) Method and system for visual tracking of a subject for automatic metering using a mobile device
CN110377148B (en) Computer readable medium, method of training object detection algorithm, and training apparatus
TWI471677B (en) Auto focus method and auto focus apparatus
JP2015522959A (en) Systems, methods, and media for providing interactive refocusing in images
GB2407635A (en) Control of camera field of view with user hand gestures recognition
JP2007129480A (en) Imaging device
US11573627B2 (en) Method of controlling device and electronic device
CN109451240B (en) Focusing method, focusing device, computer equipment and readable storage medium
CN103765374A (en) Interactive screen viewing
CN113545030B (en) Method, user equipment and system for automatically generating full-focus image through mobile camera
CN114690900A (en) Input identification method, equipment and storage medium in virtual scene
KR20220058593A (en) Systems and methods for acquiring smart panoramic images
CN106922181B (en) Direction-aware autofocus
CN114363522A (en) Photographing method and related device
KR20210133674A (en) Augmented reality device and method for controlling the same
Kim et al. Gaze estimation using a webcam for region of interest detection
KR20210150881A (en) Electronic apparatus and operaintg method thereof
JP4568024B2 (en) Eye movement measuring device and eye movement measuring program
US11949976B2 (en) Systems and methods for obtaining a smart panoramic image
TW202011154A (en) Method and apparatus for pre-load display of object information
CN115393182A (en) Image processing method, device, processor, terminal and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20925164

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20925164

Country of ref document: EP

Kind code of ref document: A1