WO2024058790A1

WO2024058790A1 - System and method for autofocus in mobile photography

Info

Publication number: WO2024058790A1
Application number: PCT/US2022/043844
Authority: WO
Inventors: Yi Fan; Hsilin Huang
Original assignee: Zeku, Inc.
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2024-03-21

Abstract

Systems and methods of the present disclosure provide solutions of performing camera autofocus to images that are obtained by a mobile terminal for mobile photography. In an embodiment, a method is performed by a mobile terminal and includes receiving an image via a first camera. The method also includes displaying the received image on a screen. The method further includes obtaining a facial image of a mobile user. The method also includes performing a facial landmark detection on the facial image. The method further includes performing at least one location detection, wherein the location detection comprises gaze detection on the mobile user, object detection on the displayed image, and saliency detection on the displayed image. The method also includes determining a focus location on the displayed image based on the at least one performed location detection. The method further includes applying autofocus on the focus location on the displayed image.

Description

SYSTEM AND METHOD FOR AUTOFOCUS IN MOBILE PHOTOGRAPHY

Technical Field

[0001] The present invention relates to an mobile photography, and more particularly, to a method of autofocus to mobile photography.

Background

[0002] Autofocus (AF) is a feature in most modern digital cameras that automates the focusing process of photos by automatically adjusting the focal length and focus settings of the camera without any input from the photographer. Autofocus typically works in realtime to allow the photographer to focus on one or more particular subjects, regions, or objects in view of the camera lens before taking a photo.

[0003] Dedicated cameras (mirrorless or not), smart phone cameras and tablet cameras all have some type of autofocus feature using an active, passive, or hybrid AF method. Regardless of the AF method used, an autofocus system relies on one or more sensors to detect the subjects, regions, or objects in a photo and determine the correct focus that is applied to the photo. However, not all cameras available in the market today have an "eye control" AF feature, where the autofocus system determines where or what the photographer is looking at through the camera's lens and applies autofocus to a particular point in a photo based on that determination.

[0004] Currently, only a few brands of dedicated cameras have the eye control AF feature. For a camera to have the eye control AF feature, the camera must contain a viewfinder. The viewfinder is necessary to enable the eye control capabilities of the eye control autofocus feature. The camera also requires 8 Light-Emitting Diodes (LEDs) that are used in conjunction with the viewfinder. The LEDs emit different wavelengths of infrared light in the viewfinder. Lastly, a camera must also have a pixel scanner. When a photographer, or user of a camera, places one or their eyes to the viewfinder, the pixel scanner will acquire images of the eye.

[0005] The combination of the viewfinder, LEDs, and pixel scanner enable the camera system to use the images of the photographer's eye to understand the position of the photographer's eye and the direction the eye is looking in. The camera's autofocus system will then combine the information on the photographer's eye with different forms of detection and tracking to automatically focus on one or more particular subjects, regions or objects in the image seen through the camera lens.

[0006] These cameras with the eye control AF feature are typically more complicated and costly than cameras without, due to the need of a viewfinder, LEDs and a pixel scanner. Currently, cameras in mobile phones and tablets do not have an eye focus autofocus feature since these products are unable to contain a viewfinder, LEDs, and a pixel scanner due to its much smaller size.

[0007] With the high demand and use of mobile phones and tablets across the globe, along with the ever increasing use of social media, mobile photography is becoming, or has already become, the main form of photography. Having the eye control autofocus feature in mobile phones and tablets will only further increase the activity of mobile photography by increasing the ease, efficiency and quality of photos taken by mobile users of all levels in photography. The eye control autofocus feature will also increase the likelihood that photos taken using a camera in a mobile phone or tablet will accurately focus on the subjects, regions, and/or objects that the photographer aims to focus on, especially when taking photos in quick session. Therefore, solutions are disclosed herein to implement the eye control autofocus feature into mobile phones and tablets for mobile photography. Brief Description of the Drawings

[0008] The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or exemplary embodiments. These illustrative examples are mentioned not to limit or define the disclosure, but to provide examples to aid understanding thereof. Additional examples are discussed in the Detailed Description, and further description is provided there.

[0009] FIGS. 1 illustrates an example computing system, within or otherwise associated with a mobile terminal, for performing camera autofocus to images during mobile photography.

[0010] FIG. 2 illustrates an example of a process for performing camera autofocus to images during mobile photography according to various embodiments of the present disclosure.

[0011] FIG. 3 illustrates an example diagram of a process for performing camera autofocus to images during mobile photography according to various embodiments of the present disclosure.

[0012] FIG. 4 illustrates a computing component that includes one or more hardware processors and machine-readable storage media storing a set of machine-readable/machine- executable instructions that, when executed, cause the one or more hardware processors to perform an illustrative method for performing camera autofocus to images during mobile photography, according to various embodiments of the present disclosure.

[0013] FIG. 5 illustrates a block diagram of an example computer system in which various embodiments of the present disclosure may be implemented.

[0014] The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed. Summary

[0015] Various embodiments of the present disclosure can include computing systems, methods, and non-transitory computer readable media configured to execute instructions that, when executed by one or more processors, cause a computing system to perform the actions.

[0016] In an embodiment, one general aspect includes a method for mobile autofocus. The method is performed by a mobile terminal and includes receiving, via a first camera, an image. The method also includes displaying the received image on a screen of the mobile terminal. In addition, the method also includes obtaining a facial image of a user of the mobile terminal. Further, the method also includes performing a facial landmark detection on the obtained facial image. The method also includes performing at least one location detection. In addition, the method also includes determining a focus location on the displayed image based on the at least one performed location detection. Further, the method also includes applying autofocus on the focus location on the displayed image. Other embodiments of this aspect include corresponding computing systems, apparatus, and computer programs recorded on one or more computing storage devices, each configured to perform the actions of the method.

[0017] In an embodiment, another general aspect includes a system that further includes one or more processors and a memory. The one or more processors and the memory in combination are operable to implement a method. The method includes receiving, via a first camera, an image. The method also includes displaying the received image on a screen of the mobile terminal. In addition, the method also includes obtaining a facial image of a user of the mobile terminal. Further, the method also includes performing a facial landmark detection on the obtained facial image. The method also includes performing at least one location detection. In addition, the method also includes determining a focus location on the displayed image based on the at least one performed location detection. Further, the method also includes applying autofocus on the focus location on the displayed image. [0018] In an embodiment, another general aspect includes a computer-program product that further includes a non-transitory computer-usable medium having computer- readable program code embodied therein. The computer-readable program code is adapted to be executed to implement a method. The method is performed by a mobile terminal and includes receiving, via a first camera, an image. The method also includes displaying the received image on a screen of the mobile terminal. In addition, the method also includes obtaining a facial image of a user of the mobile terminal. Further, the method also includes performing a facial landmark detection on the obtained facial image. The method also includes performing at least one location detection. In addition, the method also includes determining a focus location on the displayed image based on the at least one performed location detection. Further, the method also includes applying autofocus on the focus location on the displayed image.

[0019] In some embodiments, the image received via the first camera is a live image. [0020] In some embodiments, the facial image is obtained via the first camera.

[0021] In some embodiments, the facial image is obtained via a second camera.

[0022] In some embodiments, the facial landmark detection comprises scanning the facial image, determining one or more facial features in the facial image according to the scan, and producing one or more facial landmark locations of the one or more determined facial features in the facial image.

[0023] In some embodiments, the one or more facial landmark locations comprise facial coordinates and cropped images of the one or more determined facial features.

[0024] In some embodiments, determining the one or more facial features in the facial image is based on pre-stored facial images.

[0025] In some embodiments, producing the one or more facial landmark locations is based on pre-stored algorithm.

[0026] In some embodiments, the location detection comprises gaze detection on the mobile user, object detection on the displayed image, and saliency detection on the displayed image. [0027] In some embodiments, the gaze detection comprises determining one or more screen coordinates on the screen of the mobile terminal based on the performed facial landmark detection and gaze algorithm, wherein the one or more screen coordinates indicate one or more locations where the mobile user is looking on the screen.

[0028] In some embodiments, the object detection comprises determining one or more objects in the displayed image according to an object algorithm.

[0029] In some embodiments, the saliency detection comprises determining one or more objects in the displayed image according to a saliency algorithm.

[0030] Performing camera autofocus to images, such as the eye control autofocus feature, during mobile photography will provide greater efficiency and quality when taking photos, while also saving the photographer time with adjusting the settings of the images before a photo is taken. By automatically adjusting the focal length, and focus settings of the camera without any input from the mobile photographer, the mobile photographer will be able to more efficiently aim and take photos without a concern for low quality photos. The autofocus of the camera will also ensure that the photos taken are focused on the subjects, regions, and/or objects in the photo that the mobile photographer desires to focus on and highlight in the photo. Another opportunity allowed to the mobile photographer is the ability to take photos more rapidly in succession, and take high quality and focused photos while the photographer and/or the objects in the photo are in motion.

[0031] These and other features of the computing systems, methods, and non- transitory computer readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. [0032] These illustrative embodiments are mentioned not to limit or define the disclosure, but to provide examples to aid understanding thereof. Additional embodiments are discussed in the Detailed Description, and further description is provided there.

Detailed Description

[0033] As described above, the autofocus feature, such as the eye control AF feature, has been developed to provide a focusing process of photos by automatically adjusting the focal length and focus settings of the camera without any input from the photographer. With the autofocus feature working in real-time, it may allow the photographer to more easily and effectively take photos in succession and under various circumstances without any issues of decreasing the quality of such photos due to a lack of focus. The photographer may also take photos of a plurality of subjects, regions, and/or objects in one image with greater efficiency since the autofocus feature will be able to determine which object(s) in the image to focus on without any input from the photographer.

[0034] Accordingly, the present application provides solutions that implement the autofocus feature, such as the eye control AF feature, on mobile phones and tablets. Examples described herein implement a computing component within a mobile terminal or device that performs autofocus on an image obtained from a camera of the mobile terminal, under some conditions or scenarios. First, the computing component within a mobile terminal may receive an image via one of the mobile terminal's cameras. The computing component may display the image received via the camera on a screen of the mobile terminal for the user of the mobile terminal to see. As the user is looking at the image displayed on the screen of the mobile terminal, the computing component may obtain a facial image of the user using one of the mobile terminal's cameras. The computer component may then perform a facial landmark detection on the facial image of the user to determine facial features of the user based on the facial image. The computing component may further perform one or more steps to determine and apply autofocus to a location on the image. [0035] FIG. 1 illustrates an example of a computing component 110 which may be internal to or otherwise associated within a device 150. In some examples, the device 150 may include, but is not limited to, a mobile terminal including a laptop, smart phone, tablet or any mobile device equipped with at least one camera and at least one screen. The device 150 may include a front-facing camera 160 and/or a rear-facing camera 170. The device 150 may include a front-facing screen 180. The computing component 110 may perform one or more available detections to determine particular subjects, regions, and/or objects (hereinafter "objects") in an image that autofocus should be applied to. As an example, the objects may include, but are not limited to, persons, animals, plants, structures, buildings, vehicles, and any other items existing in the world. The computing component 110 may include one or more hardware processors and logic 130 that implements instructions to carry out the functions of the computing component 110, for example, receiving an image via a camera, displaying the received image on a screen of the device 150, obtaining a facial image of the user of the device 150, performing a facial landmark detection on the facial image, performing at least one location detection, determining a focus location on the image, and/or applying autofocus on the focus location on the image. As an example, autofocus may include the eye control AF feature that is used to determine subjects, regions, objects and/or points in an image that the photographer, such as the user of a mobile terminal camera or tablet camera, is looking at. The computing component 110 may store, in a database 120, details regarding scenarios or conditions in which some location detections are performed, algorithms, and images to use to determine facial features of a user. Some of the scenarios or conditions will be illustrated in the subsequent FIGS.

[0036] FIG. 2 illustrates an example scenario in which the computing component 110 may selectively perform one location detection to determine a focus location on an image, for example, in order to perform autofocus on the image at the focus location on the image. In some embodiments, the process 200 can be executed, for example, by the computing component 110 of FIG. 1. In other embodiments, the computing component 200 may be implemented as the computing component 110 of FIG. l. The computing component 200 may be, for example, the computing system 300 of FIG. 3, 400 of FIG. 4, and 500 of FIG. 5. The computing component 200 may include a server.

[0037] At block 210, the computing component 110 receives, via a camera of the device 150, an image. A device 150 may include one or more cameras. The camera(s) of a device 150 may include a front facing camera 160 and/or a rear facing camera 170. When a user of the device 150 wants to use the device 150 to take photos, the user may open a camera application in the device 150. Upon opening the camera application, the user may select to use one of various cameras within device 150. In an example, the user may select to use the front camera 160. In another example, the user may select to use the rear camera 170. Once a camera is selected, the device 150 may receive an image from the selected camera. The received image may represent and contain all of the objects seen in real-time that are in view of the selected camera's lens. Objects may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The received image may change as the scene and objects in the selected camera's view changes. The scene and objects in the selected camera's view may change if objects in the real-world and/or the device 150 move positions. The received image may change if the selected camera changes from a first camera to a second camera. The received image may represent the image that can be taken as a photo by the selected camera.

[0038] At block 212, the computing component 110 displays the received image on a screen 180 of the device 150. The device 150 may include one or more screens, including screen 180. Screen 180 may display various types of media, including photos, videos, games, and other media applications. Screen 180 may be a touch screen. Screen 180 may include digital buttons that allow interaction for a user of device 150. A user of device 150 may interact with screen 180 to perform various functions provided in device 150. The image received via the selected camera may be displayed on the screen 180. The displayed image may represent and show all of the objects seen in real-time that are in view of and received from the selected camera's lens. The displayed image may be a direct representation of the objects a person, such as the user, may see through their own eyes, but from the perspective of a selected camera's lens of device 150. The displayed image may allow the user to see and acknowledge the scene and objects that are in view of and directly received from the selected camera's lens. Objects may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The displayed image may change to directly reflect and represent the image received from the selected camera's view. The received and displayed image may represent the image that can be taken as a photo by the selected camera.

[0039] At block 214, the computing component 110 obtains a facial image of the user of the device 150. A the user of the device 150 may be looking towards the device 150 as the user is using a camera of device 150 to obtain and display an image onto the screen 180. As the user is looking toward the device 150, the user may be performing one or more actions on the device 150, including viewing the image received from the selected camera as it is displayed on the screen 180. While an image received from a selected camera of device 150 is displayed on the screen 180, the computing component 110 may instruct the device 150 to scan and obtain a facial image of the user. The user's facial image may be scanned and obtained from a camera of the device 150 that the user's face is in view of. The camera in view of the user's face may be the front facing camera 160 or the rear facing camera 170. The camera in view of the user's face may be the same as or different from the camera selected to obtain and display the image on the screen 180. In an example, the camera in view of the user's face and the camera selected to obtain and display the image on the screen 180 is the front facing camera 160. In another example, the camera in view of the user's face is the front facing camera 160 and the camera selected to obtain and display the image on the screen 180 is the rear facing camera 170. The computing component 110 may scan the user's face using one of the cameras of device 150 to obtain a facial image either simultaneously or consecutively after receiving an image from one of the cameras of device 150. The computing component 110 may scan the user's face using one of the cameras of device 150 to obtain a facial image either simultaneously or consecutively after displaying the received image on the screen 180. [0040] At block 216, the computing component 110 performs a facial landmark detection on the facial image of the user of the device 150. After scanning and obtaining a facial image of the user's face, a facial landmark detection may be performed on the facial image. The facial landmark detection may be performed using one or more algorithms stored in the database 120. First, the facial landmark detection may scan the facial image. Scanning the facial image may include locating the face in the facial image and defining the face shape. Scanning the facial image may also include locating one or more facial features of the facial image. Upon locating the facial features, each facial feature may be determined according to their respective location on the face. Examples of facial features may include the tip of the nose, the corners of the eyes, the corners of the eyebrows, the corners of the mouth, and eye pupils. Upon determining the facial features on the facial image, one or more facial landmark locations may be produced. The facial landmark locations may be indicators of where the facial features are located on the facial image. Each facial landmark location may include coordinates and cropped images of a respective facial feature on the facial image.

[0041] In some embodiments, a determination of facial features may be performed based on pre-stored facial images. The pre-stored facial images may be stored in the database 120 of the computing component 110. The pre-stored facial images may include numerous images of faces of different individuals. Each pre-stored facial image may include facial features that are labeled and identified. Determining the one or more facial features of a facial image may include comparing the facial image with one or more pre-stored facial images. The obtained facial image may be compared to one or more pre-stored facial images to determine the one or more facial features of the obtained facial image. Determining the one or more facial features of the obtained facial image may be based on similarities with pre-stored facial images and the locations of particular facial features from similar pre-stored facial images. Also the shapes of facial features in pre-stored facial images may be used to determine the corresponding facial features in the obtained facial image.

[0042] In other embodiments, a determination of facial features may be performed based on one or more algorithms. The one or more algorithms may be pre-stored in the database 120 of the computing component 110 of the device 150. The one or more algorithms may include a plurality of equations and methods to determine facial features on a facial image. In other embodiments, the facial features of a facial image may be determined based on Machine Learning (ML) and/or Artificial Intelligence (Al). ML and/or Al may be used to identify a facial image according to previously obtained facial images. If a facial image matches a previously obtained facial image, the ML and/or Al may use the facial features of the previously obtained facial image. The more facial images of a particular user is obtained, the more quickly the ML and/or Al may be able to determine the facial features. The ML and/or Al may learn from previous sessions and previously obtained facial images to more quickly and efficiently determine facial features and facial landmark locations when performing the facial landmark detection.

[0043] In some embodiments, coordinates of the facial landmark locations may include xy coordinates according to an xy graph orientated on the facial image. The xy graph may be oriented on the facial image according to the location of the camera used to obtain the facial image. The location of the camera may be the origin (0, 0) of the xy graph. In other embodiments, the coordinates of the facial landmark locations may include xyz coordinates according to an xyz graph oriented on the facial image. The xyz graph may be oriented on the facial image according to the location of the camera used to obtain the facial image. The location of the camera may be the origin (0, 0, 0) of the xyz graph. Many variations are possible. In some embodiments, the coordinates of a particular facial landmark location will be associated with a portion of a cropped image of a particular determined facial feature. Each cropped image of a particular determined facial feature may include a one or more facial landmark location coordinates. In an example, one set of facial landmark coordinates may be used to position the center location of a respective facial feature. In another example, a plurality of facial landmark location coordinates may be used to represent, display and position the entire shape or image of each facial feature to match the respective cropped image. [0044] In some embodiments, facial landmark locations are produced from the determined facial features based on pre-stored algorithm. The pre-stored algorithm may include a plurality of equations and methods of producing coordinates of each facial landmark location. The pre-stored algorithm may be stored in the database 120 of the computing system 110 of the device 150. The pre-stored algorithm may be able to determine one or more of coordinates for each determined facial feature according to the orientation of the facial image to the location of the camera used to obtain the facial image. In an example, one set of facial landmark coordinates may be used to position the center location of a respective facial feature. In another example, a plurality of facial landmark location coordinates may be used to represent, display and position the entire shape or image of each facial feature to match the respective cropped image.

[0045] At block 218, the computing component 110 performs at least one location detection. A location detection may include a gaze detection, an object detection and a saliency detection. Each type of location detection may be a part of the eye control AF feature that is used to determine the subject(s), region(s), object(s) and/or points in an image that may be the main focus of an image. The gaze detection may be performed on the user of the device 150. The object detection may be performed on the image displayed on the screen 180. The saliency detection may be performed on the image displayed on the screen 180. One of, or a combination of, the gaze detection, object detection, and/or saliency detection may be performed. The type(s) and number of location detections performed may be based on one or more factors. The one or more factors may include the color(s) of the displayed image, the composition of the displayed image, the exposure of the displayed image, environmental elements, and settings chosen by the user of the device 150. The more detections that are performed may provide a greater likelihood of identifying one or more important objects in the displayed image. The one or more important objects may be objects that the user of the device 150 wishes to focus on and highlight in a particular image.

[0046] In some embodiments, gaze detection may be performed on the user of the device 150 at the time an image is received from a camera and displayed on screen 180 of the device 150. The gaze detection may be used to predict and determine the user's gaze location on the screen 180. Gaze detection may include determining one or more screen coordinates on the screen 180 that is displaying the received image. The one or more screen coordinates may indicate one or more locations on the screen 180 where the user is looking at. The one or more locations on the screen 180 may be where one or more objects are located in the displayed image. The one or more objects located in the displayed image may be where the user of the device 150 is looking at and wishes to focus on in the image. If the gaze detection is performed and fails to determine at least one screen coordinates on the screen 180, then the computing component 110 may perform at least one of the other location detections.

[0047] In other embodiments, the one or more screen coordinates may be determined based on the performed facial landmark detection and gaze algorithm. The gaze algorithm may be pre-stored in the database 120 of the computing component 110 of the device 150. The gaze detection may use the cropped images and facial coordinates of the facial landmark locations from the facial landmark detection, along with the gaze algorithm, to determine the screen coordinates. The cropped images and facial coordinates may provide locations of the eye pupils of the user of the device 150 with respect to the camera of the device 150 that is being used to obtain the facial image of the user. The gaze algorithm may include equations and methods for determining the locations of the cropped images and facial coordinates with respect to the screen 180 displaying the image. The gaze algorithm may include equations and methods for determining the orientation, angle(s) and direction(s) of the user's eyes. The gaze algorithm may include equations and methods for using the orientation, angle(s) and direction(s) of the user's eyes to determine the location(s) on the screen 180 of the device 150 where the user is looking at. The displayed image on the screen 180 may represent the image that the user of the device 150 wishes to take a photo of.

[0048] As an example, screen 180 of device 150 may be displaying an image of a three (3) turtles swimming in a pond. One turtle may be located in the middle of the image, one turtle may be located in the bottom right corner of the image, and one turtle may be located in the top right corner of the image. The image may be obtained from the rear facing camera 170 of the device 150, and the user of the device 150 may be in view of the front facing camera 160 while looking at the screen 180 to view the displayed image. The front facing camera 160 may obtain a facial image of the user while the user is looking at the screen 180. The gaze algorithm may determine the distance and depth of the eyes of the user with respect to the front facing camera 160. The gaze algorithm may use the distance and depth of the eyes with respect to the front facing camera 160 to determine a distance and depth of the eyes with respect to the screen 180 displaying the image. The gaze algorithm may determine the orientation, angle(s) and direction(s) of the eyes according to the obtained facial image. The gaze algorithm may use the distance and depth of the eyes with respect to the screen 180, along with the orientation, angle(s) and direction(s) of the eyes, to determine that the user is looking at the bottom right corner of the screen 180, where one of the three turtles is displayed in the image.

[0049] In some embodiments, the screen coordinates may be based on a location on the screen 180 displaying the image with respect to the camera of the device 150 that the user has selected to obtain the facial image of the user. As an example, screen coordinates may include xy coordinates according to an xy graph orientated on the screen 180 displaying the image. The xy graph may be oriented on the screen 180 according to the location of the camera used to obtain the facial image of the user. The location of the camera may be the origin (0, 0) of the xy graph. As another example, the screen coordinates may include xyz coordinates according to an xyz graph oriented on the screen 180 displaying the image. The xyz graph may be oriented on the screen 180 according to the location of the camera used to obtain the facial image of the user. The location of the camera may be the origin (0, 0, 0) of the xyz graph. Many variations are possible.

[0050] In some embodiments, object detection may be performed on the displayed image received via one of the cameras of the device 150. Object detection may be used to recognize and detect different objects present in an image. Object detection may label and classify each object that is detected in an image. In some embodiments, object detection may include determining one or more objects in the displayed image according to an object algorithm. The object algorithm may include ML algorithms. The object algorithm may include equations and methods to recognize, detect, label and classify any and all objects that are present in an image. The object algorithm may place a box around each object detected in an image. A different color box may be used for each classification of objects. Objects in an image may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The object algorithm may produce coordinates for each object that is determined to be present in the displayed image. The coordinates for each object may indicate the location of the respective object in the displayed image. If the object detection is performed and fails to determine at least one object on the screen 180, then the computing component 110 may perform at least one of the other location detections.

[0051] In some embodiments, saliency detection may be performed on the displayed image received via one of the cameras of the device 150. Saliency detection may be used to detect objects present in an image. The objects detected may include objects that are considered the most important or draws the most attention in the image. In some embodiments, saliency detection may include determining one or more objects in the displayed image according to a saliency algorithm. The saliency algorithm may include ML algorithms. The saliency algorithm may include equations and methods to determine and detect objects in an image that may draw a user's attention. The saliency algorithm may produce coordinates for each object that is determined to be present in the displayed image. The coordinates for each object may indicate the location of the respective object in the displayed image. If the saliency detection is performed and fails to determine at least one object on the screen 180, then the computing component 110 may perform at least one of the other location detections.

[0052] At block 220, the computing component 110 determines a focus location on the displayed image. Performing at least one location detection of gaze detection, object detection and/or saliency detection may determine one or more locations on the displayed image that display points of interest. In some embodiments, the one or more locations may include one or more objects in the image that display high levels of activity and interest in the image. The one or more locations may include objects that draw the a high degree of attention in the image. In other embodiments, the one or more locations may include one or more objects in the image that the user of the mobile terminal is looking at.

[0053] Amongst the one or more locations determined by the one or more location detections, a particular location may be the main focal point of the image. In some embodiments, the particular location may include one or more objects in the image that is the center of interest or activity in the image. The center of interest or activity may include objects that draw the most attention in the image. In other embodiments, the particular location may include one or more objects in the image that the user of the mobile terminal is looking at. The particular location on the displayed image may be the position on the displayed image that the user is focusing on.

[0054] A location on the displayed image may be represented by coordinates on the displayed image. As an example, location coordinates may include xy coordinates according to an xy graph orientated on the displayed image. The xy graph may be oriented on the displayed image with the origin (0, 0) of the xy graph being at the center of the displayed image. As another example, the screen coordinates may include xyz coordinates according to an xyz graph oriented on the displayed. The xyz graph may be oriented on the displayed image with the origin (0, 0, 0) of the xyz graph being at the center of the displayed image. Many variations are possible.

[0055] As an example, the one or more screen coordinates that are determined by the gaze detection may be used to determine one or more locations on the image. The screen coordinates may indicate where on the screen 180 the user is looking at. The screen 180 may display an image containing one or more objects. The user may be looking at a particular location on the screen 180, with particular screen coordinates, that correspond to a particular location on the image where a particular object(s) is being displayed. The screen coordinates determined by the gaze detection may be used to determine a particular location on the image that directly correspond to the screen coordinates based on how the image is displayed and oriented on the screen. [0056] In some embodiments, when a location detection is performed and fails to help determine one or more locations on the displayed image that are important, then the computing component 110 may perform at least one of the other location detections. The type(s) and number of location detections to be further performed may be based on one or more factors.

[0057] In some embodiments, when more than one location detection, of gaze detection, object detection, and saliency detection, is performed, there may be different locations on the image that are determined to be important. When different locations are determined to be important from different location detections, then the computing component 110 may default to select a location determined from a particular location detection. The defaulted location detection that may be selected based on one or more factors. The one or more factors may include the color(s) of the displayed image, the composition of the displayed image, the exposure of the displayed image, environmental elements, and settings chosen by the user of the device 150. The location that is selected may be used as the position on the displayed image that autofocus will be applied to.

[0058] In some embodiments, when more than one location detection, of gaze detection, object detection, and saliency detection, is performed, there may be one or more locations on the displayed image that are determined from more than one location detection. Locations that have been determined by more than one location detection may represent locations with a higher degree of importance with respect to the focus of the image. In an example, all three location detections of gaze detection, object detection, and saliency detection are performed. One location of (5, 6) on the displayed image is determined based on the gaze detection. Three locations of (7, -10), (-3, 8) and (5, 6) on the displayed image are determined based on the object detection. Two locations of (-3, 8) and (5, 6) on the displayed image are determined based on the saliency detection. With the location of (5, 6) being determined by all three location detections, the location of (5, 6) will have a highest degree of importance amongst the determined locations. The location of (-3, 8) will have the second highest degree of importance and the location of (7, -10) will have the third highest degree of importance.

[0059] At block 222, the computing component 110 applies autofocus on the focus location on the displayed image. Autofocus may include automatically adjusting the focal length and focus settings of the camera without any input from the photographer. Autofocus may occur in real-time and focus on one or more objects in a particular location in the image. Autofocus may be applied to a location on a displayed image on the screen 180 of the device 150. The focus location may indicate a position in the displayed image that includes one or more objects in the image where the mobile user is looking at. The focus location may indicate a position in the image that includes one or more objects that are the focal point of the image. The focus location may indicate a position in the image that includes one or more objects that attract the most attention in the image. Autofocus may be applied to the determined location on the displayed image that represents the location with the highest degree of importance compared to the other determined locations on the displayed image. The location with the highest degree of importance may be considered the focus location. In some embodiments, the determined location with the highest degree of importance may be the location that was determined based on the most number of location detections. In other embodiments, the determined location with the highest degree of importance may be the location that is selected based on default settings. In other embodiments, the determined location with the highest degree of importance may be the location that is selected based on one or more factors. Many variations are possible.

[0060] For simplicity of description, the process 200 is described as being performed with respect to a single received image. It should be appreciated that, in a typical embodiment, the computing component 110 may manage a plurality of images in short succession of one another. For example, in some embodiments, the computing component 110 can perform many, if not all, of the steps in process 200 on a plurality of images as the images change. [0061] FIG. 3 illustrates an example diagram of a scenario in which the computing component 110 may perform each location detection of gaze detection, object detection, and saliency detection to determine one or more locations on an image, for example, in order to perform autofocus on the image at a focus location on the image. The steps of process 300 is similar to the steps of process 200. The computing component 300 may be implemented as the computing component 110 of FIG. 1. The computing component 300 may be, for example, the computing system 200 of FIG. 2, 400 of FIG. 4, and 500 of FIG. 5. The computing component 300 may include a server.

[0062] Step 310 of process 300 is similar to block 210 of process 200. At step 310, the computing component 110 captures, via a camera of the device 150, an image. A device 150 may include one or more cameras. The camera(s) of a device 150 may include a front facing camera 160 and/or a rear facing camera 170. When a user of the device 150 wants to use the device 150 to take photos, the user may open a camera application in the device 150. Upon opening the camera application, the user may select to use one of various cameras within device 150. In an example, the user may select to use the front camera 160. In another example, the user may select to use the rear camera 170. Once a camera is selected, the device 150 may capture an image from the selected camera. The captured image may represent and contain all of the objects seen in real-time that are in view of the selected camera's lens. Objects may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The captured image may change as the scene and objects in the selected camera's view changes. The scene and objects in the selected camera's view may change if objects in the real-world and/or the device 150 move positions. The captured image may change if the selected camera changes from a first camera to a second camera. The captured image may represent the image that can be taken as a photo by the selected camera.

[0063] Step 312 of process 300 is similar to block 212 of process 200. At step 312, the computing component 110 displays the captured image on a screen 180 of the device 150.

The device 150 may include one or more screens, including screen 180. Screen 180 may display various types of media, including photos, videos, games, and other media applications. Screen 180 may be a touch screen. Screen 180 may include digital buttons that allow interaction for a user of device 150. A user of device 150 may interact with screen 180 to perform various functions provided in device 150. The image captured via the selected camera may be displayed on the screen 180. The displayed image may be a live feed, and represent and show all of the objects seen in real-time that are in view of and captured from the selected camera's lens. The displayed image may be a direct representation of the objects a person, such as the user, may see through their own eyes, but from the perspective of a selected camera's lens of device 150. The displayed image may allow the user to see and acknowledge the scene and objects that are in view of and directly captured from the selected camera's lens. Objects may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The displayed image may change to directly reflect and represent the image captured from the selected camera's view. The received and displayed image may represent the image that can be taken as a photo by the selected camera.

[0064] At step 314 of process 300, the computing component 110 performs object detection and saliency detection. Object detection may be performed on the image captured via one of the cameras of the device 150. Object detection may be used to recognize and detect different objects present in an image. Object detection may label and classify each object that is detected in an image. In some embodiments, object detection may include determining one or more objects in the displayed image according to an object algorithm. The object algorithm may include ML algorithms. The object algorithm may include equations and methods to recognize, detect, label and classify any and all objects that are present in an image. The object algorithm may place a box around each object detected in an image. A different color box may be used for each classification of objects. Objects in an image may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The object algorithm may produce coordinates for each object that is detected in the displayed image. The coordinates for each object may indicate the location of the respective object in the displayed image. If the object detection is performed and fails to determine at least one object on the screen 180, then the computing component 110 may rely on at least one of the saliency detection, or the facial landmarks detection with the gaze prediction.

[0065] Saliency detection may be performed on the image captured via one of the cameras of the device 150. Saliency detection may be used to detect objects present in an image. The objects detected may include objects that are considered the most important or draws the most attention in the image. In some embodiments, saliency detection may include determining one or more objects in the displayed image according to a saliency algorithm. The saliency algorithm may include ML algorithms. The saliency algorithm may include equations and methods to determine and detect objects in an image that may draw a user's attention. The saliency algorithm may produce coordinates for each object that is detected in the displayed image. The coordinates for each object may indicate the location of the respective object in the displayed image. If the saliency detection is performed and fails to determine at least one object on the screen 180, then the computing component 110 may rely on at least one of the object detection, or the facial landmarks detection with the gaze prediction.

[0066] Step 316 of process 300 displays an example of object detection applied to an image. The image include two objects of animal foxes standing in a desert. The object detection has identified both objects of animal foxes and has placed a box around each object detected in the image. Both boxes placed in the image are the color yellow which is used to classify the objects as animals.

[0067] Step 318 of process 300 displays an example of saliency detection applied to an image. The image includes one object of a soccer player laying on a field of grass. The saliency detection has identified the soccer player in the image to be the focal point of the image or the object that attracts the most attention.

[0068] Step 320 of process 300 is similar to block 214 of process 200. At step 320, the computing component 110 obtains a facial image of the user of the device 150. A the user of the device 150 may be looking towards the device 150 as the user is using a camera of device 150 to obtain and display an image onto the screen 180. As the user is looking toward the device 150, the user may be performing one or more actions on the device 150, including viewing the image received from the selected camera as it is displayed on the screen 180. While an image received from a selected camera of device 150 is displayed on the screen 180, the computing component 110 may instruct the device 150 to scan and obtain a facial image of the user. The user's facial image may be scanned and obtained from a camera of the device 150 that the user's face is in view of. The camera in view of the user's face may be the front facing camera 160 or the rear facing camera 170. The camera in view of the user's face may be the same as or different from the camera selected to obtain and display the image on the screen 180. In an example, the camera in view of the user's face and the camera selected to obtain and display the image on the screen 180 is the front facing camera 160. In another example, the camera in view of the user's face is the front facing camera 160 and the camera selected to obtain and display the image on the screen 180 is the rear facing camera 170. The computing component 110 may scan the user's face using one of the cameras of device 150 to obtain a facial image either simultaneously or consecutively after receiving an image from one of the cameras of device 150. The computing component 110 may scan the user's face using one of the cameras of device 150 to obtain a facial image either simultaneously or consecutively after displaying the received image on the screen 180.

[0069] Step 322 of process 300 is similar to block 216 of process 200. At step 322, the computing component 110 performs a facial landmark detection on the facial image of the user of the device 150. After scanning and obtaining a facial image of the user's face, a facial landmark detection may be performed on the facial image. The facial landmark detection may be performed using one or more algorithms stored in the database 120. First, the facial landmark detection may scan the facial image. Scanning the facial image may include locating the face in the facial image and defining the face shape. Scanning the facial image may also include locating one or more facial features of the facial image. Upon locating the facial features, each facial feature may be determined according to their respective location on the face. Examples of facial features may include the tip of the nose, the corners of the eyes, the corners of the eyebrows, the corners of the mouth, and eye pupils.

[0070] In some embodiments, a determination of facial features may be performed based on pre-stored facial images. The pre-stored facial images may be stored in the database 120 of the computing component 110. The pre-stored facial images may include numerous images of faces of different individuals. Each pre-stored facial image may include facial features that are labeled and identified. Determining the one or more facial features of a facial image may include comparing the facial image with one or more pre-stored facial images. The obtained facial image may be compared to one or more pre-stored facial images to determine the one or more facial features of the obtained facial image. Determining the one or more facial features of the obtained facial image may be based on similarities with pre-stored facial images and the locations of particular facial features from similar pre-stored facial images. Also the shapes of facial features in pre-stored facial images may be used to determine the corresponding facial features in the obtained facial image.

[0071] In other embodiments, a determination of facial features may be performed based on one or more algorithms. The one or more algorithms may be pre-stored in the database 120 of the computing component 110 of the device 150. The one or more algorithms may include a plurality of equations and methods to determine facial features on a facial image. In other embodiments, the facial features of a facial image may be determined based on ML and/or Al. ML and/or Al may be used to identify a facial image according to previously obtained facial images. If a facial image matches a previously obtained facial image, the ML and/or Al may use the facial features of the previously obtained facial image. The more facial images of a particular user is obtained, the more quickly the ML and/or Al may be able to determine the facial features. The ML and/or Al may learn from previous sessions and previously obtained facial images to more quickly and efficiently determine facial features and facial landmark locations when performing the facial landmark detection.

[0072] In some embodiments, coordinates of the facial landmark locations may include xy coordinates according to an xy graph orientated on the facial image. The xy graph may be oriented on the facial image according to the location of the camera used to obtain the facial image. The location of the camera may be the origin (0, 0) of the xy graph. In other embodiments, the coordinates of the facial landmark locations may include xyz coordinates according to an xyz graph oriented on the facial image. The xyz graph may be oriented on the facial image according to the location of the camera used to obtain the facial image. The location of the camera may be the origin (0, 0, 0) of the xyz graph. Many variations are possible. In some embodiments, the coordinates of a particular facial landmark location will be associated with a portion of a cropped image of a particular determined facial feature. Each cropped image of a particular determined facial feature may include a one or more facial landmark location coordinates. In an example, one set of facial landmark coordinates may be used to position the center location of a respective facial feature. In another example, a plurality of facial landmark location coordinates may be used to represent, display and position the entire shape or image of each facial feature to match the respective cropped image.

[0073] At step 324 of process 300, the computing component 110 determines the one or more facial landmark locations. Upon determining the facial features on the facial image, one or more facial landmark locations may be produced. The facial landmark locations may be indicators of where the facial features are located on the facial image. Each facial landmark location may include coordinates and cropped images of a respective facial feature on the facial image.

[0074] In some embodiments, facial landmark locations are produced from the determined facial features based on pre-stored algorithm. The pre-stored algorithm may include a plurality of equations and methods of producing coordinates of each facial landmark location. The pre-stored algorithm may be stored in the database 120 of the computing system 110 of the device 150. The pre-stored algorithm may be able to determine one or more of coordinates for each determined facial feature according to the orientation of the facial image to the location of the camera used to obtain the facial image. In an example, one set of facial landmark coordinates may be used to position the center location of a respective facial feature. In another example, a plurality of facial landmark location coordinates may be used to represent, display and position the entire shape or image of each facial feature to match the respective cropped image.

[0075] At step 326 of process 300, the computing component 110 performs gaze prediction. The gaze prediction may be performed on the user of the device 150. The gaze prediction may be based on the facial landmark detection and facial landmark locations produced. Gaze prediction may be performed on the user of the device 150 at the time an image is captured from a camera and displayed on screen 180 of the device 150. The gaze prediction may be used to detect and determine the user's gaze location on the screen 180. Gaze prediction may include determining one or more screen coordinates on the screen 180 that is displaying the captured image. The one or more screen coordinates may indicate one or more gaze locations on the screen 180 where the user is looking at. The one or more gaze locations on the screen 180 may indicate where the user is looking or gazing at. The one or more gaze locations may indicate where one or more objects are located in the displayed image. The one or more objects located in the displayed image may be where the user of the device 150 is looking at and wishes to focus on in the image. If the gaze prediction is performed and fails to determine at least one screen coordinates on the screen 180, then the computing component 110 may rely on at least one of the object or saliency detections.

[0076] In some embodiments, the one or more screen coordinates may be determined based on the performed facial landmarks detection and gaze algorithm. The gaze algorithm may be pre-stored in the database 120 of the computing component 110 of the device 150. The gaze prediction may use the cropped images and facial coordinates of the facial landmark locations from the facial landmarks detection, along with the gaze algorithm, to determine the screen coordinates. The cropped images and facial coordinates may provide locations of the eye pupils of the user of the device 150 with respect to the camera of the device 150 that is being used to obtain the facial image of the user. The gaze algorithm may include equations and methods for determining the locations of the cropped images and facial coordinates with respect to the screen 180 displaying the image. The gaze algorithm may include equations and methods for determining the orientation, angle(s) and direction(s) of the user's eyes. The gaze algorithm may include equations and methods for using the orientation, angle(s) and direction(s) of the user's eyes to determine the location(s) on the screen 180 of the device 150 where the user is looking at. The displayed image on the screen 180 may represent the image that the user of the device 150 wishes to take a photo of.

[0077] In some embodiments, the screen coordinates may be based on a location on the screen 180 displaying the image with respect to the camera of the device 150 that the user has selected to obtain the facial image of the user. As an example, screen coordinates may include xy coordinates according to an xy graph orientated on the screen 180 displaying the image. The xy graph may be oriented on the screen 180 according to the location of the camera used to obtain the facial image of the user. The location of the camera may be the origin (0, 0) of the xy graph. As another example, the screen coordinates may include xyz coordinates according to an xyz graph oriented on the screen 180 displaying the image. The xyz graph may be oriented on the screen 180 according to the location of the camera used to obtain the facial image of the user. The location of the camera may be the origin (0, 0, 0) of the xyz graph. Many variations are possible.

[0078] Step 328 of process 300 displays an example of determining screen coordinates based on the gaze prediction. The example image at step 328 displays an xy graph oriented on the screen 180 of the device 150. The xy graph on the screen 180 includes a screen coordinate that may indicate a position on the screen where the user is gazing or looking at.

[0079] Step 330 of process 300 is similar to block 220 of process 200. At step 330, the computing component 110 combines the results of the object detection, the saliency detection, the facial landmarks detection and the gaze prediction to determine a focus location on the displayed image. The combined results of the may include one or more locations on the displayed image that have been determined to display points of interest in the displayed image. In some embodiments, the one or more locations may include one or more objects in the image that display high levels of activity and interest in the image. The one or more locations may include objects that draw the a high degree of attention in the image. In other embodiments, the one or more locations may include one or more objects in the image that the user of the mobile terminal is looking at.

[0080] Amongst the one or more locations determined, a particular location may be considered as the focus location. The focus location may be the position that is the main focal point of the image. The location on the displayed image may indicate the position on the displayed image that should be focused on. In some embodiments, the main focal location may include one or more objects in the image that is the center of interest or activity in the image. The center of interest or activity may include objects that draw the most attention in the image. In other embodiments, the focal location may include one or more objects in the image that the user of the mobile terminal is looking at. The focus location on the displayed image may be the position on the displayed image that the user is focusing on.

[0081] A location on the displayed image may be represented by coordinates on the displayed image. As an example, location coordinates may include xy coordinates according to an xy graph orientated on the displayed image. The xy graph may be oriented on the displayed image with the origin (0, 0) of the xy graph being at the center of the displayed image. As another example, the screen coordinates may include xyz coordinates according to an xyz graph oriented on the displayed. The xyz graph may be oriented on the displayed image with the origin (0, 0, 0) of the xyz graph being at the center of the displayed image. Many variations are possible.

[0082] In some embodiments, after the object detection, saliency detection, facial landmark detection and gaze prediction are performed, there may be different locations on the image that are determined to be important. When different locations are determined to be important from different detections, then the computing component 110 may default to select a location determined from a particular location detection. The defaulted location detection that may be selected based on one or more factors. The location selected from the particular location detection may be selected based on one or more factors. The one or more factors may include the color(s) of the displayed image, the composition of the displayed image, the exposure of the displayed image, environmental elements, and settings chosen by the user of the device 150..

[0083] In other embodiments, after the object detection, saliency detection, facial landmark detection and gaze prediction are performed, there may be one or more locations on the displayed image that are determined from more than one detection. Locations that have been determined by more than one detection may indicate a higher degree of importance with respect to the focus of the image. In an example, after performing the object detection, saliency detection, facial landmarks detection and gaze prediction, a plurality of locations are determined. One location of (9, -2) on the displayed image is determined based on the gaze prediction and facial landmarks detection. Three locations of (-1, 6), (9, -2) and (4, -4) on the displayed image are determined based on the object detection. Three locations of (-3, 8), (5, 6), and (9, -2) on the displayed image are determined based on the saliency detection. With the location of (9, -2) being determined by all three detections, the location of (9, -2) will have a highest degree of importance amongst the determined locations.

[0084] Step 332 of process 300 is similar to block 222 of process 200. At step 332, the computing component 110 applies autofocus on the focus location on the displayed image. Autofocus may include automatically adjusting the focal length and focus settings of the camera without any input from the photographer. Autofocus may occur in real-time and focus on one or more objects in the one or more locations on the image. Autofocus may be applied to a particular location on a displayed image on the screen 180 of the device 150. The particular location may indicate a position on the displayed image that is the main focus of attention. The particular location may include one or more objects in the image where the mobile user is looking at. The particular location may include one or more objects in the image that are the focal point of the image. The particular location may include one or more objects in the image that attract the most attention in the image. Autofocus may be applied to the location on the displayed image that has the highest degree of importance compared to the other determined locations on the displayed image. In some embodiments, the determined location with the highest degree of importance may be the location on the displayed image that should be focused on. The determined location with the highest degree of importance may be the location that was determined from the most number of location detections. In other embodiments, the determined location with the highest degree of importance may be the location that is selected based on default settings. In other embodiments, the determined location with the highest degree of importance may be the location that is selected based on one or more factors. Many variations are possible.

[0085] For simplicity of description, the process 300 is described as performing all detection methods to a single captured image. It should be appreciated that, in a typical embodiment, the computing component 110 may manage a plurality of images in short succession of one another. For example, in some embodiments, the computing component 110 can perform many, if not all, of the steps in process 300 on a plurality of images as the images change.

[0086] FIG. 4 illustrates a computing component 400 that includes one or more hardware processors 402 and machine-readable storage media 404 storing a set of machine- readable/machine-executable instructions that, when executed, cause the one or more hardware processors 402 to perform an illustrative method of applying autofocus on an image by a mobile terminal, according to various embodiments of the present disclosure. It should be appreciated that there can be additional, fewer, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various examples discussed herein unless otherwise stated. The computing component 400 may be implemented as the computing component 110 of FIG. 1. The computing component 400 may be, for example, the computing system 200 of FIG. 2, 300 of FIG. 3, and 500 of FIG. 5. The computing component 400 may include a server. The hardware processors 402 may include, for example, the processor(s) 504 of FIG. 5 or any other processing unit described herein. The machine- readable storage media 404 may include the main memory 506, the read-only memory (ROM) 508, the storage 510 of FIG. 5, and/or any other suitable machine-readable storage media described herein. [0087] At step 406, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to receive an image via a camera of a mobile terminal. In some embodiments, the mobile terminal may include one camera. The camera of the mobile terminal may be a front facing camera ora rear facing camera. In other embodiments, the mobile terminal may include more than one camera. The more than one cameras in the mobile terminal may include a front facing camera and a rear facing camera. The user may use a camera of the mobile terminal by opening a camera application in the mobile terminal. The mobile terminal may receive an image via the front facing camera or the rear facing camera. The mobile terminal may receive an image via a first camera through the first camera's lens. The received image may include one or more objects that are in view of the first camera's lens. Objects may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. In some embodiments, the image being received via the first camera of the mobile terminal is a live image, wherein the live image includes objects that are seen and/or moving in real-time. The image being received via the first camera of the mobile terminal may change when the mobile terminal is moved. The image being received via the first camera of the mobile terminal may change as objects in the camera's view change. The image being received may change if the first camera being used to capture the image is switched to a second camera of the mobile terminal. Many variations are possible.

[0088] At step 408, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to display the received image on a screen of the mobile terminal. In some embodiments, the mobile terminal has at least one screen. A screen of the mobile terminal may be a touch screen. A screen of the mobile terminal may include digital buttons that may be selected by a user of the mobile terminal. A screen of the mobile terminal may be used to display other applications of the mobile terminal. A user of the mobile terminal may interact with a screen of the mobile terminal to perform various functions. An image received via a first camera of the mobile terminal may be displayed on at least one screen of the mobile terminal. The displayed image may include objects that are in view of one of a plurality of cameras of the mobile terminal. Objects in an image may include persons, animals, plants, structures, buildings, vehicles, and any other objects or items existing in the world. The displayed image may be live where the objects included in the displayed image represent objects seen in realtime. Objects in the displayed image may change as the mobile terminal is moved. Objects in the displayed image may change as the objects in the camera's view change. Objects in the displayed image may change as the camera being used to obtain the image is switched from a first camera to a second camera of the mobile terminal. Many variations are possible.

[0089] At step 410, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to obtain a facial image of a user of the mobile terminal. In some embodiments, a user of the mobile terminal is a person. The user may be using the mobile terminal to take a photo using a first camera of the mobile terminal. A camera of the mobile terminal that is in view of the user's face may be used to obtain a facial image of the user. The camera used to obtain the facial image of the user may be the first camera or a second camera of the mobile terminal. In an example, the user's face may be in view of a first camera of the mobile terminal while the user is taking a photo via a second camera of the mobile terminal. In another example, the user's face may be in view of a first camera, which is the same camera of the mobile terminal that the user is using to take a photo with. In some embodiments, the mobile terminal may obtain a facial image of the user via one of the cameras of the mobile terminal at the same time as when the mobile terminal is receiving an image from the first camera of the mobile terminal. In other embodiments, the mobile terminal may obtain a facial image of the user via one of the cameras of the mobile terminal at the same time as when the mobile terminal is displaying the received image on the screen of the mobile terminal. In other embodiments, the mobile terminal may obtain a facial image of the user via one of the cameras of the mobile terminal after the mobile terminal has received an image from the first camera of the mobile terminal. In other embodiments, the mobile terminal may obtain a facial image of the user via one of the cameras of the mobile terminal after the mobile terminal has displayed the received image on the screen of the mobile terminal. Many variations are possible.

[0090] At step 412, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to perform a facial landmark detection on the obtained facial image. In some embodiments, a facial landmark detection may be performed on the facial image of the user that is obtained via a camera of the mobile terminal. The obtained facial image of the user may be scanned. The scan on the facial image may include locating the face in the facial image and defining the face shape. One or more facial features on the facial image may be determined according to the scan. Examples of facial features may include the tip of the nose, the corners of the eyes, the corners of the eyebrows, the corners of the mouth, and eye pupils. The one or more facial features that are determined may be used to produce one or more facial landmark locations. The facial landmark locations may include coordinates and cropped images of the one or more determined facial features. Each facial landmark location may indicate the location of a respective facial feature of the user according to the facial image that was obtained.

[0091] At step 414, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to perform at least one location detection. In some embodiments, the location detection may include a gaze detection on the mobile user, an object detection on the displayed image, and a saliency detection on the displayed image. One of, or a combination of, the gaze detection, object detection, and/or saliency detection may be performed. The type(s) and number of location detections performed may be based on one or more factors. The factors may include color, composition, exposure, environmental, and settings chosen by the mobile user.

[0092] At step 416, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to determine a focus location on the displayed image based on the at least one performed location detection. In some embodiments, performing at least one location detection of gaze detection, object detection and/or saliency detection may determine a focus location on the displayed image that is the main focal point of the image. In some embodiments, the focus location may include one or more objects in the image that is the center of interest or activity in the image. The center of interest or activity may include objects that draw the most attention in the image. In other embodiments, the focus location may include one or more objects in the image that the user of the mobile terminal is looking at.

[0093] At step 418, the hardware processor(s) 402 may execute the machine- readable/machine-executable instructions stored in the machine-readable storage media 404 to apply autofocus on the focus location on the displayed image. In some embodiments, autofocus may be applied to a focus location on an image displayed on a screen of a mobile terminal. The focus location may indicate one or more objects in the image where the mobile user is looking at. The focus location may include one or more objects in the image that are the focal point of the image. The focus location may include one or more objects in the image that attract the most attention in the image. Autofocus may be applied to the one or more objects indicated by the focus location. Autofocus may include automatically adjusting the focal length and focus settings of the camera without any input from the photographer. Autofocus may occur in real-time and focus on one or more objects in the one or more locations on the image.

[0094] Subsequently, the hardware processor(s) 402 may obtain receive subsequent images via a camera of the mobile terminal and repeat the aforementioned steps for each of the subsequent images received, until a camera of the mobile terminal is no longer being used by the user.

[0095] FIG. 5 illustrates a block diagram of an example computer system 500 in which various embodiments of the present disclosure may be implemented. The computer system 500 can include a bus 502 or other communication mechanism for communicating information, one or more hardware processors 504 coupled with the bus 502 for processing information. The hardware processor(s) 504 may be, for example, one or more general purpose microprocessors. The computer system 500 may be an embodiment of a video encoding module, video decoding module, video encoder, video decoder, or similar device. [0096] The computer system 500 can also include a main memory 506, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to the bus 502 for storing information and instructions to be executed by the hardware processor(s) 504. The main memory 506 may also be used for storing temporary variables or other intermediate information during execution of instructions by the hardware processor(s) 504. Such instructions, when stored in a storage media accessible to the hardware processor(s) 504, render the computer system 500 into a special-purpose machine that can be customized to perform the operations specified in the instructions.

[0097] The computer system 500 can further include a read only memory (ROM) 508 or other static storage device coupled to the bus 502 for storing static information and instructions for the hardware processor(s) 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., can be provided and coupled to the bus 502 for storing information and instructions.

[0098] Computer system 500 can further include at least one network interface 512, such as a network interface controller module (NIC), network adapter, or the like, or a combination thereof, coupled to the bus 502 for connecting the computer system 500 to at least one network.

[0099] In general, the word "component," "modules," "engine," "system," "database," and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software component or module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts. Software components configured for execution on computing devices, such as the computing system 500, may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of an executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.

[00100]The computer system 500 may implement the techniques or technology described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system 500 that causes or programs the computer system 500 to be a special-purpose machine. According to one or more embodiments, the techniques described herein are performed by the computer system 500 in response to the hardware processor(s) 504 executing one or more sequences of one or more instructions contained in the main memory 506. Such instructions may be read into the main memory 506 from another storage medium, such as the storage device 510. Execution of the sequences of instructions contained in the main memory 506 can cause the hardware processor(s) 504 to perform process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

[00101] The term "non-transitory media," and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. The non-volatile media can include, for example, optical or magnetic disks, such as the storage device 510. The volatile media can include dynamic memory, such as the main memory 506. Common forms of the non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD- ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, an NVRAM, any other memory chip or cartridge, and networked versions of the same.

[00102] Non-transitory media is distinct from but may be used in conjunction with transmission media. The transmission media can participate in transferring information between the non-transitory media. For example, the transmission media can include coaxial cables, copper wire and fiber optics, including the wires that comprise the bus 502. The transmission media can also take a form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

[00103] The computer system 500 also includes a network interface 512 coupled to bus 502. Network interface 512 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, network interface 512 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 512 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicated with a WAN). Wireless links may also be implemented. In any such implementation, network interface 512 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

[00104] A network link typically provides data communication through one or more networks to other data devices. For example, a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet." Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link and through network interface 512, which carry the digital data to and from computer system 500, are example forms of transmission media. [00105] The computer system 500 can send messages and receive data, including program code, through the network(s), network link and network interface 512. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the network interface 512.

[00106] The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.

[00107] Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware. The one or more computer systems or computer processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The various features and processes described above may be used independently of one another, or may be combined in various ways. Different combinations and sub-combinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate, or may be performed in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The performance of certain of the operations or processes may be distributed among computer systems or computers processors, not only residing within a single machine, but deployed across a number of machines.

[00108] As used herein, a circuit might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit. In implementation, the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality. Where a circuit is implemented in whole or in part using software, such software can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto, such as computer system 500.

[00109] As used herein, the term "or" may be construed in either an inclusive or exclusive sense. Moreover, the description of resources, operations, or structures in the singular shall not be read to exclude the plural. Conditional language, such as, among others, "can," "could," "might," or "may," unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps.

[OOllOjTerms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. Adjectives such as "conventional," "traditional," "normal," "standard," "known," and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. The presence of broadening words and phrases such as "one or more," "at least," "but not limited to" or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

Claims

Claims What is claimed is:

1. A computer-implemented method for mobile autofocus, the method comprising, by a mobile terminal: receiving, via a first camera, an image; displaying the received image on a screen of the mobile terminal; obtaining a facial image of a mobile user of the mobile terminal; performing a facial landmark detection on the obtained facial image; performing at least one location detection; determining a focus location on the displayed image based on the at least one performed location detection; and applying autofocus on the focus location on the displayed image.

2. The computer-implemented method of claim 1, wherein the image is a live image.

3. The computer-implemented method of claim 1, wherein the facial image is obtained via the first camera.

4. The computer-implemented method of claim 1, wherein the facial image is obtained via a second camera.

5. The computer-implemented method of claim 1, wherein the facial landmark detection comprises: scanning the facial image; determining one or more facial features in the facial image according to the scan; and producing one or more facial landmark locations of the one or more determined facial features in the facial image, wherein the one or more facial landmark locations comprise facial coordinates and cropped images of the one or more determined facial features.

6. The computer-implemented method of claim 5, wherein the determining the one or more facial features in the facial image is based on pre-stored facial images.

7. The computer-implemented method of claim 5, wherein the producing the one or more facial landmark locations is based on pre-stored algorithm.

8. The computer-implemented method of claim 1, wherein the location detection comprises gaze detection on the mobile user, object detection on the displayed image, and saliency detection on the displayed image.

9. The computer-implemented method of claim 8, wherein the gaze detection comprises: determining one or more screen coordinates on the screen of the mobile terminal based on the performed facial landmark detection and gaze algorithm, wherein the one or more screen coordinates indicate one or more locations where the mobile user is looking on the screen.

10. The computer-implemented method of claim 8, wherein the object detection comprises: determining one or more objects in the displayed image according to an object algorithm.

11. The computer-implemented method of claim 8, wherein the saliency detection comprises: determining one or more objects in the displayed image according to a saliency algorithm.

12. A computing system within or associated with a mobile terminal, the computing system comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to: receive, via a first camera, an image; display the received image on a screen of the mobile terminal; obtain a facial image of a mobile user of the mobile terminal; perform a facial landmark detection on the obtained facial image; perform at least one location detection; determine a focus location on the displayed image based on the at least one performed location detection; and apply autofocus on the focus location on the displayed image.

13. The computing system of claim 12, wherein the facial image is obtained via the first camera.

14. The computing system of claim 12, wherein the facial image is obtained via a second camera.

15. The computing system of claim 12, wherein the facial landmark detection comprises: scan the facial image; determine one or more facial features in the facial image according to the scan; and produce one or more facial landmark locations of the one or more determined facial features in the facial image, wherein the one or more facial landmark locations comprise facial coordinates and cropped images of the one or more determined facial features.

16. The computing system of claim 12, wherein the location detection comprises gaze detection on the mobile user, object detection on the displayed image, and saliency detection on the displayed image.

17. The computing system of claim 16, wherein the gaze detection comprises: determine one or more screen coordinates on the screen of the mobile terminal based on the performed facial landmark detection and gaze algorithm, wherein the one or more screen coordinates indicate one or more locations where the mobile user is looking on the screen.

18. The computing system of claim 16, wherein the object detection comprises: determine one or more objects in the displayed image according to an object algorithm.

19. The computing system of claim 16, wherein the saliency detection comprises: determine one or more objects in the displayed image according to a saliency algorithm.

20. A non-transitory storage medium storing instructions that, when executed by at least one processor of a computing system, cause the computing system to perform a method comprising: receiving, via a first camera, an image; displaying the received image on a screen of the mobile terminal; obtaining a facial image of a mobile user of the mobile terminal; performing a facial landmark detection on the obtained facial image; performing at least one location detection; determining a focus locations on the displayed image based on the at least one performed location detection; and applying autofocus on the focus location on the displayed image.