CN115701122A - Automatic focusing method and device, storage medium and electronic equipment


Info

Publication number: CN115701122A
Application number: CN202110808937.1A
Authority: CN (China)
Prior art keywords: image, camera, target, original image, current frame
Other languages: Chinese (zh)
Inventor: 刘钦 (Liu Qin)
Assignee (current and original): Beijing Jigan Technology Co ltd
Legal status: Pending

Abstract

The application relates to the technical field of computer vision, and provides an automatic focusing method and device, a storage medium and electronic equipment. The automatic focusing method comprises the following steps: determining the position of a target object from an original image generated by an image sensor by using a target tracking algorithm; performing a focusing transformation on a target image in the original image, and displaying the transformed target image. The target image is a partial image of the original image that contains the target object, and the focusing transformation is an affine transformation that highlights the target image. As long as the target object remains within the range of the original image, the method automatically locks onto the position of the target object and displays it prominently on the screen, which effectively mitigates the problem that the target object easily leaves the display range of the screen and becomes difficult for the user to observe continuously. Moreover, the automatic focusing realized by the method does not require the camera to have a target tracking function of its own, which helps reduce implementation cost.

Description

Automatic focusing method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of computer vision, in particular to an automatic focusing method and device, a storage medium and electronic equipment.
Background
When a user takes a photo (or records a video) with a smartphone, the user often enlarges the image in the camera preview interface by sliding the screen or pinch-zooming with two fingers, so as to observe a target object of interest in the picture (e.g., a person, an animal, a landscape, etc.) more closely. However, enlarging the screen content is equivalent to narrowing the Field of View (FOV) of the camera, which makes it easier for the target object to leave the display range of the screen due to a slight shake of the phone or a small movement of the object, after which the user must repeatedly adjust the position of the phone to capture the target object on the screen again.
At present, many cameras provide Optical Image Stabilization (OIS) or Electronic Image Stabilization (EIS), but these functions only reduce the perceived shake of the screen content; they do not substantially improve the problem that the target object easily leaves the display range of the screen. It therefore remains difficult for the user to keep the target object locked on the screen for extended observation, which is unfavorable for photographing (or recording) the target object.
Disclosure of Invention
An object of the present invention is to provide an auto-focusing method and apparatus, a storage medium, and an electronic device, so as to solve the above-mentioned technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides an auto-focusing method, including: determining the position of a target object from an original image generated by an image sensor by using a target tracking algorithm; carrying out focusing transformation on a target image in the original image, and displaying the transformed target image; the target image is a local image of the original image, the local image comprises the target object, and the focusing transformation is affine transformation capable of highlighting the target image.
The method first uses a target tracking algorithm to automatically find the position of the target object in an original image acquired by a camera, and then automatically presents the target image containing the target object in a prominent way through a focusing transformation (such as magnification), so that the user's attention is drawn to the target object; that is, automatic focusing on the target object is realized.
In other words, as long as the target object is still within the range of the original image (this range is no smaller than the display range of the screen; in particular, when the user zooms in for observation, it is necessarily larger than the display range of the screen), the method can automatically lock onto the position of the target object and display it prominently on the screen. The method therefore effectively mitigates the problem that the target object easily leaves the display range of the screen and is difficult for the user to observe (or photograph) continuously, helps the user efficiently take high-quality photos (or record high-quality videos), and improves the photographing (or recording) experience.
Moreover, the automatic focusing realized by the method does not require the camera itself to have a target tracking function (for example, a pan-tilt mount that rotates the camera to follow a target object); an ordinary fixed camera and the imaging of its image sensor suffice. This helps reduce implementation cost and widens the applicable scenarios of the scheme.
In an implementation manner of the first aspect, the target image is obtained by expanding a region occupied by the target object in the original image.
The region occupied by the target object in the original image could be used directly as the target image, but in a target image obtained this way the proportion of foreground (the target object) is too large and the proportion of background too small, making it difficult to convey the surroundings of the target object, which does not match the viewing habits of most users. Therefore, in the above implementation, the region occupied by the target object in the original image is appropriately expanded to form the target image.
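As an illustration only (the patent text does not prescribe a formula for the expansion), a minimal sketch of expanding the tracked bounding box by a margin on each side, clamped to the original image, might look like this in Python; the function name and the margin value are assumptions:

```python
def expand_bbox(x, y, w, h, img_w, img_h, margin=0.5):
    """Expand a tracked box (x, y, w, h) by `margin` of its size on each
    side, clamped to the original image, to form the target image region."""
    dx, dy = w * margin / 2, h * margin / 2
    x0, y0 = max(0, int(x - dx)), max(0, int(y - dy))
    x1, y1 = min(img_w, int(x + w + dx)), min(img_h, int(y + h + dy))
    return x0, y0, x1 - x0, y1 - y0
```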
In an implementation form of the first aspect, the focus transform comprises a magnification and/or a translation.
The zoom-in operation allows the user to observe (or photograph) the details of the target object more clearly (especially after switching cameras), and therefore qualifies as a focusing transformation. The panning operation moves the target image to a position more suitable for observation (e.g., the center of the screen), and therefore also qualifies. Of course, the two can be combined: for example, moving the target image to the center of the screen while enlarging it to fill or nearly fill the whole screen.
In an implementation manner of the first aspect, if the focus transformation includes zooming in, the performing focus transformation on the target image in the original image includes: calculating an expected magnification according to the size of the original image and the size of the target image, and amplifying the target image in the original image according to the expected magnification; if the focus transformation includes translation, performing focus transformation on the target image in the original image, including: and calculating an expected translation amount according to the central position of the original image and the central position of the target object, and translating the target image in the original image according to the expected translation amount.
In the above implementation manner, an expected magnification and/or an expected translation amount may be calculated first, and the target image is then magnified and/or translated accordingly, achieving targeted magnification and/or translation of the target image. The expected magnification and expected translation amount can be regarded as the currently estimated end values that the magnification and translation of the target image should eventually reach.
For example, the expected magnification may be the size of the original image divided by the size of the target image, i.e., the target image is to be enlarged to the size of the original image; since the original image fills the whole screen as far as possible when displayed, a target image enlarged by the expected magnification will likewise fill the whole screen as far as possible, which is convenient for the user to observe (or photograph) the target object. Similarly, the expected translation amount may be the center position of the original image minus the center position of the target object (the result is a vector); since the center of the original image generally lands at the center of the screen when displayed, the center of a target image translated by the expected translation amount will generally land at the center of the screen, which is also convenient for the user to observe (or photograph) the target object.
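A minimal sketch of this computation, assuming sizes are compared per axis and the smaller ratio is kept so the enlarged target image still fits the frame (the names and the `min` choice are illustrative, not prescribed by the text):

```python
def desired_params(orig_w, orig_h, tgt_w, tgt_h, obj_cx, obj_cy):
    """Expected magnification: original image size / target image size.
    Expected translation: original image center minus target object center."""
    expected_scale = min(orig_w / tgt_w, orig_h / tgt_h)
    expected_shift = (orig_w / 2 - obj_cx, orig_h / 2 - obj_cy)  # a vector
    return expected_scale, expected_shift
```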
In an implementation manner of the first aspect, the calculating an expected magnification according to a size of the original image and a size of the target image, and magnifying the target image according to the expected magnification includes: calculating the expected magnification according to the size of the current frame original image and the size of the target image in the current frame original image; calculating the current frame amplification factor according to the expected amplification factor and the previous frame amplification factor, and amplifying a target image in the current frame original image according to the current frame amplification factor; the previous frame magnification is a magnification for amplifying a target image in a previous frame original image, and the current frame magnification is located between the previous frame magnification and the expected magnification; the calculating an expected translation amount according to the center position of the original image and the center position of the target object, and translating the target image according to the expected translation amount includes: calculating the expected translation amount according to the central position of the current frame original image and the central position of the target object in the current frame original image; calculating the translation amount of the current frame according to the expected translation amount and the translation amount of the previous frame, and translating the target image in the original image of the current frame according to the translation amount of the current frame; wherein the current frame translation amount is located between the previous frame translation amount and the expected translation amount.
The target image in the current frame original image could be enlarged into place in one step according to the expected magnification, and likewise translated into place in one step according to the expected translation amount. However, doing so makes the screen content change abruptly, which seriously harms the user's photographing (or recording) experience. Therefore, in the above implementation, zooming and panning are performed gradually (this may be called smooth zooming and smooth panning): the magnification and translation amount applied to each frame of the original image approach the expected magnification and expected translation amount step by step (the expected values may themselves change during this process), so the screen content changes continuously, which helps improve the photographing (or recording) experience.
In one implementation manner of the first aspect, calculating the current frame magnification according to the expected magnification and the previous frame magnification includes: calculating a magnification increment according to the expected magnification, the previous frame magnification, and the remaining number of magnification frames, where the remaining number of magnification frames represents the number of original image frames still to be passed through before the current frame magnification reaches the expected magnification; and superposing the magnification increment on the previous frame magnification to obtain the current frame magnification. Calculating the current frame translation amount according to the expected translation amount and the previous frame translation amount includes: calculating a translation increment according to the expected translation amount, the previous frame translation amount, and the remaining number of translation frames, where the remaining number of translation frames represents the number of original image frames still to be passed through before the current frame translation amount reaches the expected translation amount; and superposing the translation increment on the previous frame translation amount to obtain the current frame translation amount.
In the above implementation, the magnification of each frame is obtained by superposing an increment on the magnification of the previous frame, so the magnification changes gradually and a smooth zoom effect is achieved. The translation can be analyzed similarly.
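A sketch of the per-frame smoothing under the simplest assumption, a linear schedule in which each increment is the remaining gap divided by the remaining number of frames (all names are illustrative):

```python
def smooth_step(prev_scale, expected_scale, prev_shift, expected_shift,
                frames_left):
    """Advance the magnification and translation one frame toward the
    expected values; each current-frame value lies between the previous
    frame's value and the expected value."""
    cur_scale = prev_scale + (expected_scale - prev_scale) / frames_left
    cur_shift = tuple(p + (e - p) / frames_left
                      for p, e in zip(prev_shift, expected_shift))
    return cur_scale, cur_shift
```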
In an implementation manner of the first aspect, the image sensor is a photosensitive element of a camera, the number of the cameras is N, N is greater than or equal to 2, and the field angles of the 1st to Nth cameras increase in order; performing focus transformation on the target image in the original image includes: judging whether a switching condition for switching from the current camera to the next camera is met according to the original image corresponding to the current camera and/or the original image corresponding to the next camera; the current camera is the ith camera, 1 ≤ i ≤ N, the next camera is a camera adjacent to the current camera in the ordering of field angles, and the original image corresponding to a camera is the original image generated by that camera's image sensor; if the switching condition is not met, performing focus transformation on a target image in the original image corresponding to the current camera, and if the switching condition is met, performing focus transformation on a target image in the original image corresponding to the next camera; wherein the focus transformation includes magnification.
During automatic focusing, the original images used for the focusing transformation can thus be switched between different cameras according to the switching condition (the focusing transformation continues uninterrupted after the switch) to improve the focusing effect. For example, whenever the original image corresponding to the camera with the smaller field angle can serve the focusing transformation, it is preferred; the original image corresponding to the camera with the larger field angle is used only when the camera with the smaller field angle cannot effectively capture the target object. This arrangement presents the target object as clearly as possible to the user while still ensuring that it can be automatically focused over a larger range.
In an implementation manner of the first aspect, if the determination result does not satisfy the switching condition, performing focus transformation on a target image in the original image corresponding to the current camera, and if the determination result satisfies the switching condition, performing focus transformation on a target image in the original image corresponding to the next camera includes: if the judgment result does not meet the switching condition, performing focusing transformation on a target image in the current frame original image acquired by the current camera according to the current frame transformation parameter corresponding to the current camera, and if the judgment result meets the switching condition, performing focusing transformation on a target image in the current frame original image acquired by the next camera according to the current frame transformation parameter corresponding to the next camera; the current frame conversion parameters corresponding to the current camera are calculated according to the current frame original image corresponding to the current camera, the current frame conversion parameters corresponding to the next camera are calculated according to the current frame original image corresponding to the next camera, and the current frame conversion parameters comprise current frame amplification factors or current frame amplification factors and current frame translation amount.
When there are multiple cameras, target tracking can be performed simultaneously in the original images corresponding to several of them (not necessarily all), and the parameters of the focus transformation (such as the current frame magnification and current frame translation amount) can be calculated for each camera from its tracking result. This not only helps quickly judge whether the switching condition is met (some switching conditions use these parameters), but also lets the focus transformation continue immediately with the switched-to camera's parameters when a camera switch occurs.
In an implementation manner of the first aspect, if the field angle of the current camera is larger than the field angle of the next camera, the switching condition includes: the current frame magnification corresponding to the current camera is not less than the camera magnification ratio, and the position of the target object does not exceed the boundary of the current frame original image corresponding to the next camera. If the field angle of the current camera is smaller than the field angle of the next camera, the switching condition includes: the current frame magnification corresponding to the next camera is smaller than the camera magnification ratio, or the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera. Here the camera magnification ratio is the ratio of the intrinsic magnification of the camera with the smaller field angle to that of the camera with the larger field angle.
If the focusing transformation initially operates on the original image corresponding to the camera with the larger field angle (possibly including transformations other than magnification), then once the current frame magnification is not less than the camera magnification ratio and the original image corresponding to the smaller-field-angle camera can also capture the target object, the transformation can switch to that image and continue enlarging, which helps display the details of the target object more clearly.
Conversely, if the focusing transformation initially operates on the original image corresponding to the camera with the smaller field angle, then once the current frame magnification falls below the camera magnification ratio, or the original image corresponding to the smaller-field-angle camera can no longer capture the target object, the transformation can switch to the original image corresponding to the larger-field-angle camera and continue. Although the rendering of the target object's details is reduced, the target object can at least remain automatically focused over a larger range.
Both of the above switching conditions follow the switching principle mentioned earlier: whenever the original image corresponding to the smaller-field-angle camera can serve the focusing transformation, prefer it; use the original image corresponding to the larger-field-angle camera only when the smaller-field-angle camera cannot effectively capture the target object (for example, the target object is out of its image range, or occupies too large a share of the image).
In an implementation manner of the first aspect, if the field angle of the current camera is greater than the field angle of the next camera, determining whether the position of the target object exceeds the boundary of the current frame original image corresponding to the next camera according to the current frame translation amount corresponding to the next camera; and if the field angle of the current camera is smaller than that of the next camera, judging whether the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera according to the current frame translation amount corresponding to the current camera.
In this implementation, whether the target object exceeds the boundary of the original image is judged from the current frame translation amount, which is a simple and efficient check. Of course, other implementations may judge differently, for example from the relationship between the region occupied by the target object in the original image (obtainable by target tracking) and the boundary of the original image.
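A sketch combining the two checks above, assuming `tele` denotes the smaller-field-angle camera and `wide` the larger one, and assuming the in-frame test is implemented elsewhere (for example via the current frame translation amount, as just described); all names are illustrative:

```python
def should_switch(cur_is_wide, wide_scale, tele_scale, mag_ratio,
                  object_in_tele_frame):
    """mag_ratio: intrinsic magnification of the tele (smaller-FOV)
    camera divided by that of the wide (larger-FOV) camera."""
    if cur_is_wide:
        # Wide -> tele: zoomed in far enough AND the tele camera
        # can still capture the target object.
        return wide_scale >= mag_ratio and object_in_tele_frame
    # Tele -> wide: zoomed back out OR the object leaves the tele frame.
    return tele_scale < mag_ratio or not object_in_tele_frame
```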
In an implementation manner of the first aspect, performing the focus transformation on a target image in the original image corresponding to the next camera when the switching condition is met may include: if the judgment at the current moment meets the switching condition, continuing to evaluate the switching condition over a subsequent period according to the original image corresponding to the current camera and/or the original image corresponding to the next camera, and only if every judgment obtained within that period meets the switching condition, performing the focus transformation on the target image in the original image corresponding to the next camera.
This covers two camera switching schemes: switching immediately once the switching condition is met, or switching only after the condition has held for a period of time. The former has simpler logic and higher switching efficiency; the latter is slightly more complex but helps avoid frequent camera switching (for example, when the target object moves back and forth across the boundary of the original image corresponding to the smaller-field-angle camera).
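The second scheme amounts to debouncing the switch decision. A minimal sketch, assuming the condition must hold for a fixed number of consecutive frames (the count of 15 is an assumption) before the switch is committed:

```python
class SwitchDebouncer:
    """Commit a camera switch only after the switching condition has
    held for `hold_frames` consecutive frames."""

    def __init__(self, hold_frames=15):
        self.hold_frames = hold_frames
        self.count = 0

    def update(self, condition_met):
        self.count = self.count + 1 if condition_met else 0
        return self.count >= self.hold_frames  # True: perform the switch now
```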
In one implementation of the first aspect, before the determining the position of the target object from the raw image generated by the image sensor using the target tracking algorithm, the method further comprises: determining the object of interest detected in the initial image as the target object, or determining the target object according to a target selection operation for the initial image; and the initial image is the first frame original image processed by the target tracking algorithm.
In this implementation, the target object can be determined from the initial image automatically by an algorithm, which simplifies the user's operations; it can also be determined from the initial image by manual selection, which offers more flexibility. Providing multiple ways of selecting the target object helps fully meet the user's need to observe (or photograph) different target objects.
In an implementation manner of the first aspect, determining the target object according to a target selection operation for the initial image includes: determining an object of interest within a delineated region as the target object according to a region delineation operation on the initial image; or determining an object of interest containing a selected focus point as the target object according to a focus-point selection operation on the initial image.
In this implementation, the user can determine the target object either by delineating a region in the initial image or by selecting a point in it. The operation is flexible and meets the user's need to observe (or photograph) different target objects.
In one implementation manner of the first aspect, the determining, by using a target tracking algorithm, a position of a target object from a raw image generated by an image sensor includes: determining the positions of a plurality of target objects from an original image generated by an image sensor by using a target tracking algorithm; the performing focus transformation on the target image in the original image and displaying the transformed target image includes: carrying out focusing transformation on a selected target image of the original image, and displaying the transformed selected target image; or, carrying out focusing transformation on a target image in the original image, and displaying a selected target image after transformation; the selected target image is a target image which contains the selected target object and is in the target images corresponding to the target objects.
In this implementation, multiple target objects in the original image are tracked simultaneously, but either only the target object selected by the user undergoes the focusing transformation and is displayed, or all target objects undergo the focusing transformation and only the one selected by the user is displayed; either way, multi-target automatic focusing is supported.
The timing of the user's selection is very flexible: it can happen before target tracking starts, after it starts, or both. In particular, the user may dynamically change the selected target object. For example, the user may first select target object X for automatic focusing and later reselect target object Y; after the change, X may no longer be auto-focused at all, or X may continue to be auto-focused while its result is no longer displayed on the screen (the result for Y is displayed instead).
In one implementation form of the first aspect, the method further comprises: saving the image displayed on the display screen as a photographing result or a recorded video frame; wherein the image displayed on the display screen includes the transformed target image.
After automatic focusing is enabled, the image displayed on the display screen always includes the transformed target image (for example, the target object is always displayed enlarged and centered), so the resulting photo or recorded video stays focused on the important target in the shooting scene, which improves the quality of the photo (or video).
In a second aspect, an embodiment of the present application provides an auto-focusing apparatus, including: the target tracking module is used for determining the position of a target object from an original image generated by the image sensor by utilizing a target tracking algorithm; the focusing conversion module is used for carrying out focusing conversion on a target image in the original image and displaying the converted target image; the target image is a local image of the original image, the local image comprises the target object, and the focus transformation is affine transformation capable of highlighting the target image.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory for storing computer program instructions; a processor configured to read and execute the computer program instructions to perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
In one implementation manner of the fourth aspect, the electronic device further includes: a camera, including a lens and an image sensor that generates the original image based on light passing through the lens; and a display screen for displaying the transformed target image.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a possible structure of an electronic device provided in an embodiment of the present application;
Fig. 2 shows a possible flow of an auto-focusing method provided by an embodiment of the present application;
Fig. 3 illustrates the basic principle of an auto-focusing method provided by an embodiment of the present application;
Fig. 4(A) and Fig. 4(B) show a scenario of determining a camera switching condition;
Fig. 5 illustrates one way of switching the auto-focusing object;
Fig. 6 shows a possible structure of an auto-focusing apparatus provided in an embodiment of the present application.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has advanced significantly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. It is a comprehensive discipline involving many technical categories such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, as an important branch of artificial intelligence concerned with letting machines recognize the world, generally includes technologies such as face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, smart medical care, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile Internet, live webcasts, beautification, medical beauty, and intelligent temperature measurement.
The embodiments of the present application also belong to the field of computer vision technology, and these embodiments will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Fig. 1 shows a possible structure of an electronic device 100 provided in an embodiment of the present application. Referring to fig. 1, the electronic device 100 includes: a processor 110, a memory 120, a camera 130, and a display screen 140, which are interconnected and communicate with each other via a communication bus 150 and/or other forms of connection mechanism (not shown).
The processor 110 includes one or more (only one is shown), which may be an integrated circuit chip having signal processing capability. The Processor 110 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP), or other conventional processors; the Processor may also be a dedicated Processor, including a Graphics Processing Unit (GPU), a Neural-Network Processing Unit (NPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, and a discrete hardware component. Also, when there are a plurality of processors 110, some of them may be general-purpose processors, and the other may be special-purpose processors.
The Memory 120 includes one or more (only one is shown in the figure), which may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 110, as well as possibly other components, may access, read, and/or write data to the memory 120. In particular, one or more computer program instructions may be stored in the memory 120, and may be read and executed by the processor 110 to implement the auto-focusing method provided by the embodiments of the present application.
The camera 130 includes one or more cameras (only one is shown) for acquiring images. A camera includes a lens and an image sensor (e.g., a CCD or CMOS sensor); light from outside the electronic device 100 passes through the lens and forms an image on the image sensor, which constitutes image acquisition. Photographing, video recording, and previewing before photographing or recording all require image acquisition. If the camera 130 includes multiple cameras, they may have different field angles (or different focal lengths) for shooting tasks in different scenes, and can be switched between. In addition, the lenses of different cameras are independent, while the image sensors may be independent or shared; hereinafter, the case of independent image sensors is mainly taken as an example, which can be simply understood as one camera corresponding to one image sensor. The camera 130 may be a black-and-white camera, a color camera, an infrared camera, or the like.
The display screen 140 includes one or more (only one shown) for displaying images or other content desired to be displayed by the electronic device 100. The Display screen 140 may be, but is not limited to, a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) Display screen, and the like.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that electronic device 100 may include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1. For example, the electronic device 100 may only include the processor 110 and the memory 120, the camera 130 is a separate image capturing device, and the display screen 140 is a separate image displaying device, which is not a component of the electronic device 100, but is connected to the electronic device 100 by wire and/or wirelessly for data interaction.
Further, the components shown in fig. 1 may be implemented in hardware, software, or a combination thereof. The electronic device 100 may be a physical device, such as a mobile phone, a tablet computer, a smart wearable device, a digital camera, a digital video camera, a PC, a notebook computer, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 100 is not limited to a single device, and may be a combination of a plurality of devices or a cluster including a large number of devices. Hereinafter, the electronic device 100 is mainly exemplified as a mobile phone.
Fig. 2 illustrates a possible flow of an auto-focusing method provided by an embodiment of the present application, which may be, but is not limited to being, performed by an electronic device (e.g., the electronic device 100 in fig. 1). Referring to fig. 2, the method includes:
step S210: the position of the target object is determined from the raw image generated by the image sensor using a target tracking algorithm.
The original image generally refers to an image generated by an image sensor of a camera that has not been processed by the auto-focusing method proposed in the present application; this does not mean the image has undergone no processing at all (brightness adjustment, noise reduction, and the like may have been applied). The target object is not limited in the present application and may be, for example, a human face, a whole human body, an animal or plant, a vehicle, a landscape, or the like. There may be one or more target objects in the original image; hereinafter, the case of a single target object is mainly taken as an example, with the case of multiple target objects described where appropriate.
The camera acquiring the original image may use a fixed-focus lens or a zoom lens. Hereinafter, a fixed-focus lens is mainly taken as an example: on mobile devices such as mobile phones the camera volume is limited, so it is generally difficult to fit a complex optical zoom structure, and a scheme combining a fixed-focus lens with digital zoom is usually adopted. A fixed-focus lens cannot enlarge the picture through lens adjustment, but digital zoom can enlarge a local area of the original image with an upsampling algorithm and then display it on the screen, achieving an effect similar to optical zoom (magnification).
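For illustration, a minimal digital zoom along these lines crops a region of interest and upsamples it to the display size. OpenCV is assumed here purely as an example library; bilinear interpolation is one possible upsampling choice:

```python
import cv2

def digital_zoom(raw, x, y, w, h, out_w, out_h):
    """Crop the local area (x, y, w, h) from the original image and
    upsample it to the display size, imitating optical zoom on a
    fixed-focus lens."""
    roi = raw[y:y + h, x:x + w]
    return cv2.resize(roi, (out_w, out_h), interpolation=cv2.INTER_LINEAR)
```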
It should be noted that the image sensor does not generate original images only after the user issues a photographing or recording instruction; it generates them whenever the camera is on. For example, after the user opens the camera APP, a preview picture is displayed, and that preview also comes from original images generated by the image sensor.
In addition, as pointed out when introducing fig. 1, the electronic device may have multiple cameras; the auto-focusing method in the embodiments of the present application can be applied both to single-camera and multi-camera scenarios.
The target tracking algorithm is an algorithm capable of tracking a specific object across a sequence of image frames: it takes the frame sequence as input and outputs the position of the specific object in each frame. Target tracking algorithms include conventional algorithms (e.g., particle filtering, Kalman filtering, optical flow) and algorithms based on machine learning or deep learning (e.g., TLD, DeepSRDCF, Adaboost); the present application does not limit the type of target tracking algorithm. In step S210, the image frame sequence is the sequence of original images, and the specific object is the target object.
Target tracking algorithms are implemented in many ways. Some can automatically detect the target object in the original image and start tracking it; others cannot detect the target automatically, so when such an algorithm starts it must be given the initial image (the first frame of original image processed by the algorithm) together with the position of the target object in that image (the initial position). The algorithm then tracks the target object starting from the initial position; subsequently only the original images need to be fed in, and the position of the target object need not be supplied again.
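As a hedged illustration of this init-then-update pattern, one of OpenCV's contrib trackers could be used (the CSRT tracker is an assumption, not something the text prescribes; depending on the OpenCV version the factory may live under `cv2.legacy`). Here `initial_image`, `initial_bbox`, and `next_raw_frame` are assumed to come from the camera pipeline:

```python
import cv2

tracker = cv2.TrackerCSRT_create()         # from opencv-contrib-python
tracker.init(initial_image, initial_bbox)  # initial_bbox: (x, y, w, h)

# Afterwards only original images are fed in; the position is not
# supplied again.
ok, bbox = tracker.update(next_raw_frame)  # ok is False if the target is lost
```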
For a target tracking algorithm that must be provided with an initial position, the initial position of the target object may be described in, but is not limited to, one of the following forms:
1. an image block in the initial image, the image block comprising a target object: for example, the image block may be a rectangular block.
2. Coordinates in the initial image: for example, the four vertex coordinates of a rectangle; as another example, the coordinates of the top-left vertex of a rectangle combined with the rectangle's width and height.
3. Mask image: for example, in a binary image, the pixel value is 0 or 255, and the set of pixels with the value of 255 corresponds to the position of the target object in the initial image.
Note that the target object is identified by its position: without a definite position, it is unclear which object in the picture is meant. Determining the initial position of the target object is therefore equivalent to determining which object the target tracking algorithm will track, or, viewed across the whole scheme, which object the auto-focusing method will focus on.
The target object may be determined in the initial image in, but not limited to, one of the following ways:
mode (1): the object of interest detected in the initial image is determined as the target object.
For example, a human face (an object of interest) in an initial image is automatically detected by using a neural network model trained in advance, and the obtained human face is used as a target object. If there are multiple faces in the initial image, there may be multiple faces detected, and one of the faces may be automatically selected as the target object according to a certain rule. For example, the rule may be: the face with the largest size, the face with the most centered position, the face with the best shooting angle and the like. Of course, the user may also be given the option to select a face as the target object, for example, all detected faces are displayed on a display screen, and the user selects a face by touching the display screen or clicking a button (virtual or physical).
Further, if the scheme supports setting multiple target objects, multiple faces may also be selected as target objects in the above example.
Mode (2): the target object is determined according to a target selection operation for the initial image.
In mode (1), although some human intervention is possible, the target object is determined from the initial image mainly by an algorithm, which simplifies the user's operations to some extent. However, the object selected by the algorithm is not necessarily the one the user actually cares about; for example, the user may care about a smaller face in the picture while the algorithm selects the largest one. Mode (2) returns the choice to the user and determines the target object from the user's selection operation (called the target selection operation), which is more flexible and can fully meet the user's need to observe different target objects. Two cases are listed below:
case (2.1): the user defines a region from the initial image, and takes the object of interest in the region as the target object.
The "delineation" here may be considered as one of the target selection operations. For example, in the case that the display screen is a touch screen and the object of interest is a human face, the user may draw a rectangular frame on the screen by using a finger, and directly use the rectangular frame as the initial position of the human face, which is equivalent to that the human face is present in the rectangular frame drawn by the user by default. For another example, a rectangular frame drawn on the screen by the user only indicates that the face is located in the rectangular frame, and at this time, the face may be further detected from the rectangular frame by using an algorithm, and the detected face is taken as the target object. For another example, all faces in the initial image may be detected by an algorithm in advance, and then, based on a rectangular frame drawn by the user, the face located within the rectangular frame may be used as a target object selected by the user.
Case (2.2): the user selects an in-focus point from the initial image, and an object of interest including the in-focus point is determined as a target object.
The "selection focus" here may be regarded as one of the target selection operations. For example, in the case where the display screen is a touch screen, a user clicks a certain point on the screen with a finger, which is equivalent to determining the clicked point as an in-focus point. For example, in the case that the display screen is a touch screen and the object of interest is a face, a rectangular frame with a preset size and with an opposite focus as a center may be used as an initial position of the face, which is equivalent to that the face exists in the rectangular frame by default; for another example, a human face including the focus in the initial image may be used as the target object. The face in the initial image may be detected by an algorithm before the determination of the focus point, or may be detected by an algorithm from an area around the focus point after the determination of the focus point.
It is to be understood that the target selection operation is not limited to the above two kinds, and as cases (2.1) and (2.2) show, even the same kind of target selection operation admits different ways of determining the target object, so the flexibility is considerable.
Further, if the scheme supports setting multiple target objects, each target object may be selected in one of the above ways, and different target objects may even be selected in different ways.
Step S210 may be executed at any time after the target tracking algorithm starts running: the original image generated by the image sensor at or before that time is fed to the algorithm, which outputs the position of the target object in that image, for example as a rectangular frame. (As described above, the initial position of the target object may be an input to, rather than an output of, the target tracking algorithm.)
Step S220: performing a focusing transformation on the target image in the original image, and displaying the transformed target image.
The target image is a partial image of the original image (for example, a rectangular region whose area is smaller than that of the original image), and this partial image contains the target object obtained by tracking in step S210; the selection of the target image is described later. The focusing transformation broadly refers to an affine transformation capable of highlighting the target image. Mathematically, an affine transformation is a linear transformation on a vector space followed by a translation, and it can be realized by a series of atomic transformations, or combinations thereof, including scaling (Scale), translation (Translation), flipping (Flip), rotation (Rotation), and shearing (Shear). The transformed target image may be displayed on a display screen, for example, in the preview interface of the camera APP on a mobile phone.
The "highlighting" in the definition of the focus transformation is understood to mean that the display mode of the transformed target image is more prominent than that before the transformation, so that the user's attention can be focused on the target object more, i.e. the focusing of the target object is achieved, and the focusing process is automatically performed, and is also called as autofocusing. The focus transformation may specifically comprise a magnification and/or a translation, for example. In accordance with the definition of "highlight" above, the zoom-in operation allows the user to more clearly observe the details of the target object, and thus belongs to a focus transformation; the panning operation does not mean to move the target image randomly, but means to move the target image to a position more suitable for the user to observe (such as the center of the screen), and thus belongs to a focusing transformation. As explained above, a single fixed-focus lens cannot enlarge a picture through lens adjustment, and therefore digital zooming technology is used for enlarging a target image in a focusing transformation.
The focus transform may involve transform parameters such as magnification (including the desired magnification, current frame magnification, etc., mentioned later), amount of translation (including the desired amount of translation, current frame amount of translation, etc., mentioned later).
It should be noted that although the focusing transformation may be regarded as a transformation of the target image, this does not mean it operates only on the target image. For example, when the focusing transformation includes magnification and translation, the target image need not be cropped out of the original image and then magnified and translated (although that is possible); instead, the original image can be magnified and translated as a whole, so that the target image is magnified and translated along with it. A related example is given later.
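A sketch of this whole-image variant, assuming OpenCV: the 2x3 affine matrix below first translates the picture by the translation amount (centering the target) and then scales it about the image center, so the target image is transformed along with the rest of the frame (all names are illustrative):

```python
import numpy as np
import cv2

def apply_focus_transform(raw, scale, shift):
    """Translate the whole original image by `shift`, then scale it by
    `scale` about its center; the target image is enlarged and centered
    along with it."""
    h, w = raw.shape[:2]
    cx, cy = w / 2, h / 2
    tx, ty = shift
    # p' = scale * ((p + t) - c) + c, written as a 2x3 affine matrix.
    M = np.float32([[scale, 0, scale * tx + (1 - scale) * cx],
                    [0, scale, scale * ty + (1 - scale) * cy]])
    return cv2.warpAffine(raw, M, (w, h))
```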
It was mentioned above that the transformed target image can be displayed on the screen, but this does not mean only the target image is displayed. For example, if the target image has been enlarged somewhat but not enough to fill the whole screen, the part of the original image surrounding the target image may also be shown (that part may undergo the focusing transformation together with the target image); only once the target image is enlarged to fill the whole screen is the enlarged target image shown alone. Of course, it is also possible to display only the target image and fill the remaining area (for example, with black).
Furthermore, whether the original image itself is displayed on the screen is optional. For example, the original image may be displayed first when the camera APP is opened, after which the complete original image is no longer shown and the transformed target image is shown instead (possibly, as described above, together with the part of the original image beyond the target image); or the transformed target image may be displayed directly after the camera APP is opened, without showing the original image; or the original image may always be displayed after the camera APP is opened, with the enlarged target image shown in a small window on the screen; and so on.
Regarding display, the scheme of the present application is independent of the resolution of the display screen: the resolution of the target image may be smaller than, equal to, or larger than that of the display screen, and when the transformed target image is displayed it can be adapted to the screen resolution.
To explain the effect of the auto-focusing method, consider a period of time in which the image sensor of the camera generates multiple frames of original images. First, the target tracking algorithm is run on each frame to obtain the position of the target object in it; then the focusing transformation is applied to the target image containing the target object in each frame; finally, each transformed target image is displayed. Thus, as long as the target object remains within the shooting range of the camera (i.e., within the original image) and the target tracking algorithm works normally, the position of the target object stays locked by the tracking algorithm and the target image stays highlighted on the display screen through the focusing transformation, whether the target moves or the user's hand shakes. The method therefore markedly mitigates the problem that the target object easily leaves the display range of the screen and is hard for the user to observe (or photograph) continuously, helps the user efficiently take high-quality photos (or record high-quality videos), and improves the user's photographing (or recording) experience. Referring to fig. 3, the upper part shows an original image in which a human body is the target object and the target image is a rectangular region containing it; in the original image the target image is small and off-center, so directly displaying the original image would make it inconvenient for the user to observe (or photograph) the details of the target object. The lower part of fig. 3 shows the target image after the focusing transformation: once magnified, the details of the human body become clear and its position is centered, so the details of the target object can be displayed on the screen for the user to observe (or photograph).
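Putting the pieces together, the per-frame pipeline just described might look like the following sketch, reusing the illustrative helpers from the earlier sketches (`expand_bbox`, `desired_params`, `smooth_step`, `apply_focus_transform`, and the `tracker` object); `camera_frames` and `display` stand for whatever supplies sensor frames and shows an image, and the `frames_left` schedule is an assumption:

```python
scale, shift = 1.0, (0.0, 0.0)             # current transform state
for raw in camera_frames():                # original images from the sensor
    ok, (x, y, w, h) = tracker.update(raw)     # step S210: track the target
    if not ok:
        display(raw)                       # target lost: fall back to raw
        continue
    img_h, img_w = raw.shape[:2]
    ex, ey, tw, th = expand_bbox(x, y, w, h, img_w, img_h)
    exp_scale, exp_shift = desired_params(img_w, img_h, tw, th,
                                          x + w / 2, y + h / 2)
    scale, shift = smooth_step(scale, exp_scale, shift, exp_shift,
                               frames_left=10)
    display(apply_focus_transform(raw, scale, shift))  # step S220
```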
It should be noted that "high-quality photo" and "high-quality video" above do not mean that attributes such as sharpness, brightness, and color are necessarily improved after auto focusing is adopted, but rather that the display screen stays focused on the target object (for example, the target object is kept enlarged and centered), so that the user can photograph (or record) the target object from a relatively good angle and thereby obtain a photo (or video) whose content is more valuable.
The timing at which the auto-focusing method is performed is relatively flexible. Still taking a focus transformation that includes enlarging as an example, the method can be applied before the user starts to photograph or record: the target image is continuously moved from its original position in the original image toward the center of the screen and enlarged until it fills or nearly fills the whole screen for preview, so that the user can continuously observe the details of the target object in a suitable position and shoot at the right moment to obtain a high-quality photo (or video). The method can also be applied after the user starts recording: the target image is continuously moved from its original position in the original image toward the center of the screen and enlarged until it fills or nearly fills the whole screen for recording, so that the target object in the recorded video always sits in a suitable position and shows rich details, giving a high-quality video. In addition, the auto focusing realized by the method takes place within the shooting range of the camera and does not require the camera itself to have target tracking capability (for example, rotating to follow the target object under the control of a pan-tilt head, a capability mobile phone cameras generally lack); an ordinary fixed camera suffices, so the method has low implementation cost and rich application scenarios.
It should be understood that the implementation of the auto-focusing method does not necessarily involve a user; the scenes above all involve a user only to facilitate understanding of the solution, and the auto-focusing method is fully applicable to automatic shooting scenarios without user intervention.
In some implementations, the image displayed on the display screen may also be saved as a photographing result or as a recorded video frame. As mentioned above, the image displayed on the display screen includes at least the transformed target image but may also include other content, such as the part of the original image surrounding the target image. If the displayed image is taken as the photographing result, it can be saved as a photo; if taken as the video recording result, it can be saved as a video frame, and a number of consecutive video frames constitute the recorded video. The user's photographing (or recording) action can be triggered by clicking a (virtual or physical) button, by voice control, and so on, and photographing (or recording) may begin either before or after auto focusing starts.
Consider a focus transformation that includes enlarging and translating. For example, when a user records a video of a running child, the above method can keep capturing a picture in which the child is enlarged and centered in the image; likewise, when a user on a train records a distant mountain peak, the method can keep capturing a picture in which the peak is enlarged and centered in the image. Such video is clearly what the user desires. By contrast, in the prior art, once the picture is enlarged to record the child, movement of the child or shaking of the user's hand easily makes the child leave the display range of the screen, so that video frames without the child are recorded; similarly, once the picture is enlarged to record the peak, movement of the train or shaking of the user's hand easily makes the peak leave the display range of the screen, so that video frames without the peak are recorded.
Next, on the basis of the above embodiments, the determination of the target image in step S220 is further described.
In step S210, the position of the target object in the original image is obtained, i.e., the area (e.g., a rectangular area) occupied by the target object in the original image, so in some implementations this area can directly serve as the target image. However, in a target image obtained this way the proportion of foreground (the target object) is too large and the proportion of background too small, making it hard to convey the environment the target object is in, which does not match the viewing (or shooting) habits of typical users. Therefore, in other implementations, the region occupied by the target object in the original image may be appropriately expanded to serve as the target image. The manner of expansion is not limited: for example, the size of the original region (for a rectangular region, size can be understood as side length) may be multiplied by a preset proportion (greater than 1), or increased by a number of pixels, and so on.
Referring to fig. 3, the region occupied by the target object hugs the edges of the human body, while the target image expands that region to include some of the image content near the human body. The expansion ratio view_scale = W2/W3 is a preset value, such as 1.2 or 1.5, where W2 and W3 are the size of the target image and the size of the human body region, respectively.
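As an illustration, this expansion can be written as follows; a minimal sketch assuming a rectangular region expanded symmetrically about its center (clipping to the original image boundary omitted):

def expand_region(cx, cy, w3, h3, view_scale=1.2):
    # (cx, cy): center of the region occupied by the target object;
    # (w3, h3): width and height of that region; view_scale: preset ratio (> 1).
    w2, h2 = view_scale * w3, view_scale * h3   # W2 = view_scale * W3
    return (cx - w2 / 2, cy - h2 / 2, w2, h2)   # top-left corner and size of the target image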
It should be emphasized that what step S220 finally obtains is the transformed target image, and the position of the target image needs to be determined beforehand in order to obtain it, but this "determining" operation is not necessarily explicit. As an example, suppose the target image needs to be enlarged and translated. One way is to cut the target image out of the original image and then enlarge and translate it, which requires explicitly determining the position of the target image; another way is to enlarge and translate the original image as a whole, whereby the target image, being part of the original image, is enlarged and translated along with it. An example of directly enlarging and translating the original image is given later.
Next, how to enlarge and translate the target image is described on the basis of the above embodiments. Magnification and translation are two separate types of focus transformation; they may be performed individually or used in combination.
A. Magnifying a target image
In some implementations, the expected magnification may be calculated first, and the target image then magnified according to it, achieving targeted magnification of the target image. The expected magnification can be regarded as the currently estimated ideal magnification of the target image, at which the user can more ideally observe (or photograph) the details of the target object.
For example, the expected magnification may be calculated from the size of the original image and the size of the target image, e.g., by taking the ratio of the former to the latter as the expected magnification (or fluctuating appropriately around that ratio). Since the original image fills the whole screen as far as possible when displayed, a target image enlarged by the expected magnification will likewise fill the whole screen as far as possible when displayed, which is convenient for the user to observe (or photograph) the target object.
Referring to fig. 3, the expected magnification zoom_scale = W1/W2 = W1/(view_scale × W3), where W1, W2, and W3 are the sizes of the original image, the target image, and the human body region, respectively, and view_scale is the expansion ratio described above. view_scale may be a preset value, and for a particular image sensor W1 is also fixed, but W3 may change over time. If the target image is enlarged zoom_scale times, its size exactly matches that of the original image (assuming the aspect ratio of the target image also matches the original image), i.e., it fills the whole screen.
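The corresponding computation, purely as a sketch (sizes measured along the same dimension, aspect ratios assumed equal):

def expected_magnification(w1, w3, view_scale=1.2):
    # zoom_scale = W1 / W2 = W1 / (view_scale * W3)
    return w1 / (view_scale * w3)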
Considering the case of multiple frames of original images, an expected magnification can be calculated correspondingly after target tracking is performed on each frame (the calculation is as described above). Taking the current frame original image as an example, there are different ways to magnify the target image in it according to the expected magnification:
a1: the expected magnification is directly calculated to be magnified in place at one time, but the practice may cause the screen content to be magnified suddenly, and the photographing (or video recording) experience of the user is seriously influenced. For example, when a user just turns on a camera APP on a mobile phone, an original image just acquired by a camera is displayed at this time, it is assumed that a magnification is 1x, an expected magnification calculated according to the original image is 2.5x, a screen image is directly jumped from 1x to 2.5x according to the method of A1, and the user feels obtrusive, but the method logic is simple and the calculation amount is small.
A2: and performing smooth amplification for multiple times, namely calculating the amplification factor of the current frame according to the expected amplification factor (of the current frame) and the amplification factor of the previous frame, and amplifying the target image in the original image of the current frame according to the amplification factor of the current frame. The specific calculation formula of the current frame magnification is not limited, but it should be satisfied that the current frame magnification is between the previous frame magnification and the expected magnification (may be equal to the expected magnification, but cannot be equal to the previous frame magnification). Thus, after several frames of amplification, the current frame magnification will gradually approach or even reach the desired magnification. For example, the current frame magnification may be calculated using the following formula:
cur_frame_scale=pre_frame_scale+(zoom_scale-pre_frame_scale)/smooth_residual_number
The variables in the formula have the following meanings:
pre_frame_scale: the previous frame magnification, i.e., the current frame magnification calculated when the previous original image was the current frame;
cur_frame_scale: the current frame magnification; if the current frame original image is the first original image collected after auto focusing starts, it may be left unmagnified: for example, when the user has just opened the camera APP on the mobile phone, cur_frame_scale may directly take the camera's inherent magnification 1x;
zoom_scale: the expected magnification calculated from the current frame original image, as defined above;
smooth_residual_number: the number of remaining magnification frames, i.e., the number of frames remaining for smoothly magnifying the target image to zoom_scale times. A total number of frames N (N > 1) must first be set for the target image to go from the initial magnification (i.e., the first calculated cur_frame_scale, e.g., the 1x mentioned above) to the expected magnification; smooth_residual_number = N when the first frame's target image is magnified, smooth_residual_number = N - 1 for the second frame, and so on, until smooth_residual_number = 1 for the last frame.
It can be seen that the scheme described by the above formula essentially enlarges the target image gradually over N frames until the magnification reaches zoom_scale. It should be noted that during this smooth magnification the zoom_scale value may change (because the value of W3 may change), so the change of each frame's magnification relative to the previous frame's is not necessarily completely uniform, but the smoothness of the magnification process is still guaranteed.
Here Δscale = (zoom_scale - pre_frame_scale)/smooth_residual_number represents the magnification increment of the current frame magnification relative to the previous frame magnification; Δscale may be positive or negative. The calculation of Δscale is not limited to the above formula and may use other formulas based on zoom_scale, pre_frame_scale, and smooth_residual_number, for example adding coefficients to the existing formula so that Δscale gradually decreases as smooth_residual_number increases.
Assuming zoom_scale remains unchanged, the above formula for the current frame magnification is in fact a linear interpolation between the initial magnification and the expected magnification; obviously, it may be replaced by formulas using nonlinear interpolation.
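A sketch of scheme A2, assuming for illustration a fixed expected magnification of 2.5x and N = 10 frames (in practice zoom_scale is recalculated each frame as W3 changes):

def next_magnification(pre_frame_scale, zoom_scale, smooth_residual_number):
    # Interpolate one step from the previous frame magnification toward zoom_scale.
    return pre_frame_scale + (zoom_scale - pre_frame_scale) / smooth_residual_number

N, zoom_scale = 10, 2.5
cur_frame_scale = 1.0                        # initial magnification, e.g. 1x
for smooth_residual_number in range(N, 0, -1):
    cur_frame_scale = next_magnification(cur_frame_scale, zoom_scale, smooth_residual_number)
    # with a constant zoom_scale each step adds the same increment (here 0.15),
    # and cur_frame_scale reaches exactly 2.5 on the last frame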
For example, when a user has just opened the camera APP on a mobile phone and the original image just acquired by the camera is displayed, assume the magnification is 1x, the expected magnification calculated from the original image is 2.5x, and the expected magnification remains unchanged. By method A2 the screen may change from 1x to 1.1x, then from 1.1x to 1.2x, and so on until 2.5x; to the user the target image appears to be gradually enlarged, much as if the user were manually enlarging the target object on the screen, but the experience of method A2 is better. Its calculation amount, however, is larger than that of A1.
It should be noted that zoom_scale is not necessarily larger than the initial magnification of the target image, i.e., cur_frame_scale may be smaller than pre_frame_scale (in which case Δscale is negative), but this does not affect the correctness of the above formula. This also counts as smooth magnification, except that the current frame magnification gradually decreases until it reaches the expected magnification. The target image displayed on the screen is then still enlarged relative to the target image in the original image, but to the user the displayed target image appears to shrink gradually.
For example, after auto focusing the target image has been enlarged by 2.5x, i.e., the expected magnification has been reached. If afterwards the target object leaves the shooting range of the camera, or is occluded and can no longer be tracked, or the user turns off the auto-focusing function, the expected magnification can no longer be calculated by the above formula, and a default value such as 1x may be taken. The current frame magnification of the target image then needs to transition from 2.5x to 1x, and the procedure in A2 may continue to be used (of course, A1 may also be used).
For another example, after auto focusing the target image has been enlarged by 2.5x, i.e., the expected magnification has been reached; the user then moves the mobile phone closer to the target object, so the target object becomes larger in the original image (W3 increases) and the value of the expected magnification drops significantly, say to 1.5x. The current frame magnification of the target image then needs to transition from 2.5x to 1.5x, and the method in A2 may continue to be used (A1 may also be used, only with a worse user experience).
B. Translating the target image
In some implementations, the expected translation amount can be calculated first, and then the target image is translated according to the expected translation amount, so that the target image is translated in a targeted manner. The expected translation amount can be regarded as the currently estimated ideal translation amount of the target image, and after translating the target image by using the translation amount, the user can observe (or shoot) the target object at a more ideal position.
For example, the expected translation amount may be calculated from the center position of the original image and the center position of the target object (which can also be regarded as the center position of the target image), e.g., by taking the difference between the two as the expected translation amount (or fluctuating appropriately around that difference); the expected translation amount may be a two-dimensional vector (x direction and y direction). Since the center of the original image generally sits at the center of the screen when displayed, the center of a target image translated by the expected translation amount will likewise sit at the center of the screen, making it convenient for the user to observe (or shoot) the target object.
Referring to fig. 3, the expected translation amount shift = P2 - P1 (subtraction in both the x and y directions), where P1 and P2 are the center position of the target object and the center position of the original image, respectively; P2 may be considered fixed, but P1 may vary over time. After the target image is translated by this amount, its center coincides with the center of the original image, so the target image will be centered on the screen when displayed.
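A sketch of this computation (positions as (x, y) tuples; P1 and P2 as defined above):

def expected_shift(p1, p2):
    # shift = P2 - P1, computed componentwise in the x and y directions
    return (p2[0] - p1[0], p2[1] - p1[1])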
Considering the case of multiple frames of original images, an expected translation amount can be calculated correspondingly after target tracking is performed on each frame (the calculation is as described above). Taking the current frame original image as an example, there are different ways to translate the target image in it according to the expected translation amount:
b1: the translation amount is directly translated into position at one time according to the current calculation, but the operation may cause the screen content to move suddenly, and the photographing (or video recording) experience of the user is seriously influenced. For example, when a user just turns on a camera APP on a mobile phone, an original image just acquired by a camera is displayed at the moment, an expected translation amount calculated according to the original image is (100, -80), and a screen moves directly (100, -80) according to the method of B1, so that the user feels a sharp image, but the method logic is simple and the calculation amount is small.
B2: and performing smooth translation for multiple times, namely calculating the translation amount of the current frame according to the expected translation amount (of the current frame) and the translation amount of the previous frame, and translating the target image in the original image of the current frame according to the translation amount of the current frame. The specific calculation formula of the current frame translation amount is not limited, but it should be satisfied that the current frame translation amount is located between the previous frame translation amount and the expected translation amount (may be equal to the expected translation amount). Thus, after several frames of translation, the current frame translation will gradually approach or even reach the desired translation.
For example, the current frame translation amount may be calculated using the following formula:
cur_frame_shift=pre_frame_shift+(shift-pre_frame_shift)/smooth_residual_number
The variables in the formula have the following meanings:
pre_frame_shift: the previous frame translation amount, i.e., the current frame translation amount calculated when the previous original image was the current frame;
cur_frame_shift: the current frame translation amount; if the current frame original image is the first original image collected after auto focusing starts, no translation need be performed: for example, when the user has just opened the camera APP on the mobile phone, cur_frame_shift may directly take (0, 0);
shift: the expected translation amount calculated from the current frame original image, as defined above;
smooth_residual_number: the number of remaining translation frames, i.e., the number of frames remaining for smoothly translating the target image by shift. A total number of frames N (N > 1) must first be set for the target image to go from the initial translation amount (i.e., the first calculated cur_frame_shift, e.g., the (0, 0) mentioned above) to the expected translation amount; smooth_residual_number = N when the first frame's target image is translated, smooth_residual_number = N - 1 for the second frame, and so on, until smooth_residual_number = 1 for the last frame.
It can be seen that the scheme described by the above formula essentially translates the target image gradually over N frames until the translation amount reaches shift. Note that during this smooth translation the shift value may change (because the value of P1 may change), so the translation increment of each frame relative to the previous frame is not necessarily completely uniform, but the smoothness of the translation process is still guaranteed.
Here Δshift = (shift - pre_frame_shift)/smooth_residual_number represents the translation increment of the current frame translation amount relative to the previous frame translation amount. The calculation of Δshift is not limited to the above formula and may use other formulas based on shift, pre_frame_shift, and smooth_residual_number, for example adding coefficients to the existing formula so that Δshift gradually decreases as smooth_residual_number increases.
Assuming shift remains unchanged, the above formula for the current frame translation amount is in fact a linear interpolation between the initial translation amount and the expected translation amount; obviously, it may be replaced by formulas using nonlinear interpolation.
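Scheme B2 mirrors A2; a sketch operating componentwise on two-dimensional translation amounts:

def next_shift(pre_frame_shift, shift, smooth_residual_number):
    # Interpolate one step from the previous frame translation amount toward shift.
    return (pre_frame_shift[0] + (shift[0] - pre_frame_shift[0]) / smooth_residual_number,
            pre_frame_shift[1] + (shift[1] - pre_frame_shift[1]) / smooth_residual_number)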
For example, when the user has just opened the camera APP on the mobile phone and the original image just acquired by the camera is displayed, suppose the expected translation amount calculated from the original image is (100, -80) and remains unchanged. By method B2 the translation amount of the target image may change from (0, 0) to (10, -8), then from (10, -8) to (20, -16), and so on until it reaches (100, -80); to the user the target image appears to be gradually translated, as if the user were moving the mobile phone so that the target object sits at the center of the screen, which gives a better experience, but the calculation amount of B2 is larger than that of B1.
It should be noted that shift is not necessarily larger than the initial translation amount of the target image (the size of a translation amount can be understood as the magnitude of the vector), i.e., cur_frame_shift may also be smaller than pre_frame_shift, but this does not affect the correctness of the above formula. This also counts as smooth translation, except that the current frame translation amount gradually decreases until it reaches the expected translation amount. A decreasing translation amount means the target image moves back toward its position in the original image and is no longer at a position convenient for the user to observe, so the translation operation in this case does not strictly belong to the category of the focus transformation.
For example, after auto focusing the target image has been translated by (100, -80), i.e., the expected translation amount has been reached. If afterwards the target object leaves the shooting range of the camera, or is occluded and can no longer be tracked, or the user turns off the auto-focusing function, the expected translation amount can no longer be calculated by the above formula, and a default value such as (0, 0) may be taken. The current frame translation amount of the target image then needs to transition from (100, -80) to (0, 0), and the procedure in B2 may continue to be used (of course, B1 may also be used).
For another example, after auto focusing the target image has been translated by (100, -80), i.e., the expected translation amount has been reached; the user then adjusts the position of the mobile phone to move the target object toward the center of the screen, so the position of the target object in the original image changes (P1 approaches P2) and the expected translation amount drops significantly, say to (20, -5). The current frame translation amount of the target image then needs to transition from (100, -80) to (20, -5), and the method in B2 may be used (of course, B1 may also be used).
Taking the formulas in A2 and B2 as an example and considering a focus transformation that includes both magnification and translation, i.e., the target image in the current frame original image needs to be enlarged cur_frame_scale times and translated by cur_frame_shift, this can be implemented by matrix operations:
first, a 3 × 3 transform matrix is constructed:
Figure BDA0003167565530000291
where s is cur _ frame _ scale, (cx, cy) is cur _ frame _ shift (the shift amounts in the x-and y-directions, respectively).
Then, the following matrix operation is performed on the original image:
Figure BDA0003167565530000292
wherein, I input Representing the original image, (x, y) representing coordinates in the original image,
Figure BDA0003167565530000293
is represented by I input Pixel value at (x, y), I output Representing the transformed original image, (x ', y') representing coordinates in the transformed original image,
Figure BDA0003167565530000294
is represented by output The pixel value at (x ', y'). If (x ', y') is out of range of the original image, it is discarded.
After discarding some pixel values, I output The number of middle pixel values will be less than I input The number of intermediate pixel values, which can be filled by upsampling to obtain the image I to be finally displayed final ,I final Including the transformed target image.
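One possible concrete realization of the above matrix operation is sketched below with NumPy and OpenCV; the choice of library is an assumption of this sketch, not something prescribed by the present application. warpPerspective performs the mapping with interpolation, which also plays the role of the upsampling step that yields I_final:

import cv2
import numpy as np

def zoom_and_translate(image, s, cx, cy):
    # H_trans scales by s and translates by (cx, cy); pixels mapped outside
    # the original image range are simply absent from the output.
    h, w = image.shape[:2]
    H_trans = np.array([[s, 0, cx],
                        [0, s, cy],
                        [0, 0, 1]], dtype=np.float32)
    return cv2.warpPerspective(image, H_trans, (w, h))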
As explained earlier, enlarging and translating the target image through the matrix operation does not require explicitly determining the position of the target image; only the parameters (s, cx, cy) of the enlargement and translation need to be calculated from the size (view_scale × W3) and position (P1) of the target image, so the position of the target image can be considered implicitly determined.
In addition, directly enlarging and translating the original image with the matrix operation has the following advantage: when the magnification has not yet reached the expected magnification, the target image cannot effectively fill the whole screen, so the part of the original image surrounding the target image needs to be enlarged and displayed together with the target image in order to fill the whole screen (it is of course also possible to display only the target image, but the user's viewing experience may suffer). After the matrix operation, the above method discards the pixel values beyond the range of the original image, and the remaining pixel values are exactly those to be displayed, including both the pixel values of the target image and the pixel values surrounding the target image that need to be displayed.
The enlargement and translation above are performed automatically: the target object is continuously tracked and displayed to the user after being enlarged and translated. However, it is not excluded that in some implementations the user may also intervene in the auto-focusing process, illustrated below with magnification as the example:
for example, when a user has just opened the camera APP on a mobile phone and the original image just acquired by the camera is displayed, suppose the magnification is 1x and the expected magnification calculated from the original image is 2.5x, so the target image is being gradually enlarged from 1x toward 2.5x. If, when the target image has been enlarged to 1.5x, the user considers the target object clear enough, the enlargement process can be stopped by touching the display screen; the expected magnification is then locked at 1.5x and auto focusing continues on that basis. Alternatively, the auto-focusing function may be turned off, i.e., the magnification stays at 1.5x unless manually adjusted by the user.
For another example, if the magnification of the target image has reached the expected magnification of 2.5x but the user thinks the target object is still not clear enough, the user may adjust the magnification of the target image by touching the display screen; suppose the user adjusts it to 3.0x, whereupon the expected magnification is locked at 3.0x and auto focusing continues on that basis. Alternatively, the auto-focusing function may be turned off, i.e., the magnification stays at 3.0x unless manually adjusted by the user.
On the basis of the above embodiments, the following continues by describing auto focusing in the case of multiple cameras.
First, the concept of optical digital joint zoom is introduced, mainly taking a mobile phone as the example of the electronic device. The aforementioned digital zoom can locally enlarge an image by upsampling, but it cannot compensate for the resulting loss of image sharpness, which is especially noticeable at high zoom magnifications. To improve this, mobile phones are increasingly equipped with multiple cameras with different angles of view, e.g., one camera with FOV = 80° (a wide-angle lens, in a relative sense) and one with FOV = 40° (a telephoto lens, in a relative sense); if the sensor resolutions of the two cameras are the same, e.g., both 16 megapixels, the FOV = 40° camera will present a clearer image than the FOV = 80° camera.
Assume the FOV = 80° camera corresponds to magnification 1x and the FOV = 40° camera to magnification 2x. The process of optical digital joint zoom is then roughly as follows: after the user opens the camera APP, the preview interface displays the image acquired by the FOV = 80° camera, at magnification 1x. As the user zooms in on the screen picture, as long as the magnification has not reached 2x, the image displayed on the screen is the result of digitally zooming the image acquired by the FOV = 80° camera. When the magnification reaches 2x, the image acquired by the FOV = 40° camera is displayed on the screen; this is optical zoom. If the user continues to zoom in, i.e., to magnifications greater than 2x, digital zoom is performed on the image acquired by the FOV = 40° camera. Zooming out, and configurations with more cameras, are similar and are not analyzed again. Since this joint zoom process over multiple cameras combines optical zoom and digital zoom, it can be called "optical digital joint zoom".
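The selection logic of this two-camera example can be sketched as follows (the 2x threshold and camera names are taken from the example above and are illustrative only):

def select_source(magnification):
    # Returns which camera's image to use and the residual digital zoom factor.
    if magnification < 2.0:
        return ('wide', magnification)        # digital zoom on the FOV = 80° image
    return ('tele', magnification / 2.0)      # FOV = 40° image plus residual digital zoom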
Compared with pure digital zoom on a single camera, optical digital joint zoom can switch to a longer-focal-length camera for imaging at high zoom magnifications; especially for medium and long shots, its imaging quality is clearly higher than that of pure digital zoom.
One of the main problems faced by optical digital joint zoom is the transition of image content: different cameras shoot the scene from different angles and distances, and the images they acquire also differ in style, so when the cameras are switched the image content should be kept as free of abrupt changes as possible, to avoid giving the user a jarring impression.
Where multiple cameras exist and optical digital joint zoom is supported, the inventor's long-term research finds that users particularly like to enlarge distant views when previewing shooting content: for example, recording a friend running ahead; watching a distant stage through a mobile phone held up in a concert hall; photographing distant scenery with a mobile phone while sitting on a train; or photographing the summit of a snow mountain while standing at its foot, and so on. However, as described above, once the picture is enlarged, slight shaking of the mobile phone or small movements of the object easily cause the target object to leave the display range of the screen, which undoubtedly restricts the popularization and application of optical digital joint zoom. In the scheme described below, the auto-focusing method proposed in the present application is combined with the multi-camera scenario and the optical digital joint zoom algorithm to elevate the presentation of optical digital joint zoom. For an electronic device provided with N (N ≥ 2) cameras whose angles of view increase in order from the 1st camera to the Nth camera, performing the focus transformation on the target image in the original image in step S220 may further include:
judging, according to the original image corresponding to the current camera and/or the original image corresponding to the next camera, whether a switching condition for switching from the current camera to the next camera is met; if the switching condition is not met, performing the focus transformation on the target image in the original image corresponding to the current camera, and if it is met, performing the focus transformation on the target image in the original image corresponding to the next camera. The original image corresponding to a camera is the original image generated by that camera's image sensor.
For the sake of user experience, only switching between two cameras whose angles of view are adjacent in the size ordering is discussed here. For example, camera 1 has FOV = 80°, camera 2 FOV = 40°, and camera 3 FOV = 20°, with corresponding magnifications 1x, 2x, and 4x. If the initial magnification of the target image is 1x and the expected magnification is 5x, the auto-focusing process may involve switching from camera 1 to camera 2 and from camera 2 to camera 3 (in some cases the switching order may also be reversed), but not switching from camera 1 directly to camera 3, because switching the picture across a camera (across camera 2) may produce an abrupt change and degrade the user experience. Of course, once the expected magnification has been calculated, switching directly from camera 1 to camera 3 is also possible.
Since each camera switch occurs between two adjacent cameras, the two can be named the current camera and the next camera, respectively. The current camera is the camera whose corresponding image (original image or target image) is being displayed on the screen; the next camera is a camera whose corresponding image is not being displayed and is the candidate switching object. Camera switching specifically means switching the image displayed on the screen from an image acquired by the current camera to an image acquired by the next camera.
If the current camera is the i-th (1 ≤ i ≤ N) of the N cameras, the next camera has different possibilities: for i = 1, the next camera is the 2nd camera; for i = N, the next camera is the (N-1)-th camera; and for 2 ≤ i ≤ N-1, the next camera is the (i-1)-th or (i+1)-th camera. Continuing the three-camera example above: if the current camera is camera 1, the next camera is camera 2; if the current camera is camera 2, the next camera is camera 1 or camera 3.
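This neighbour rule can be written as a small sketch:

def next_camera_candidates(i, n):
    # Cameras are numbered 1..n in order of increasing angle of view.
    if i == 1:
        return [2]
    if i == n:
        return [n - 1]
    return [i - 1, i + 1]   # both neighbours are candidate switching objects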
When judging whether the switching condition is met, the current camera is necessarily on, but the next camera is not necessarily. In theory the next camera could be turned on only once the switching condition is met, to save power. However, turning a camera on also takes time, and doing so only when the switching condition is met may make the displayed picture discontinuous, so the next camera can instead be turned on in advance, ready for camera switching at any time. Of course, to save power, in some implementations the next camera need not be turned on together with the current camera but only when a switch is predicted to be possible: for example, switching from camera 1 to camera 2 becomes possible when the current frame magnification of the target image reaches 2x, so camera 2 need not be on while the current frame magnification is 1x, and can be turned on when the current frame magnification rises to 1.5x.
Hereinafter, the case where both cameras are already on when the switching condition is met is mainly taken as the example. Note that once the current camera and the next camera are both on, they should be considered to acquire images substantially synchronously.
The judgment of whether the switching condition is met can be performed after every frame of original image collected by the current camera, or only once every several frames. If the focus transformation is currently being performed on the target image in the original image corresponding to the current camera, it continues as long as the switching condition is not met; once the switching condition is met, the focus transformation is instead performed on the target image in the original image corresponding to the next camera. After the switch, the original current camera becomes the next camera and the original next camera becomes the current camera. As before the switch, the original current camera also need not be turned off (at least not immediately), in preparation for a reverse switch.
In addition, whether the original current camera needs to continue the focus transformation after the switch is optional. In principle it need not, because its image is no longer displayed, and stopping the focus transformation saves computation and device power; but schemes in which the focus transformation continues are not excluded, so that when a switch back occurs the transformation does not have to be restarted. A concrete example of the relationship between the switching condition and the original image corresponding to the current camera and/or the next camera is given later and is not set forth here.
The following briefly analyzes the significance of the auto-focusing method's support for multi-camera switching:
where there are multiple cameras with different angles of view shooting the same scene containing the same target object, the large-FOV camera (the camera with the larger angle of view, e.g., camera 1) has a larger shooting range and can therefore capture the target object over a larger range, but the target object occupies a smaller size in its original image, i.e., with poorer clarity; the small-FOV camera (e.g., camera 2) has a smaller shooting range and can capture the target object only over a smaller range, but the target object occupies a larger size in its original image, i.e., with higher clarity.
During auto focusing, the original images used for the focus transformation can be switched between different cameras according to the switching condition (the focus transformation process is not interrupted by the switch), thereby improving the effect of auto focusing. For example, the original image corresponding to the small-FOV camera is used for the focus transformation whenever possible, and the original image corresponding to the large-FOV camera is used only when the small-FOV camera cannot effectively capture the target object. This switching logic not only presents the target object to the user as clearly as possible, but also keeps the target object continuously auto-focused over a larger range: the shooting range of the small-FOV camera is smaller, so hand shake or movement of the target object relatively easily takes the target out of its range, whereas the shooting range of the large-FOV camera is larger and the target leaves it less easily.
As can be seen from the above, the optical digital joint zoom algorithm combined with auto focusing differs slightly from the original algorithm, because the display screen always presents the transformed target image (obtained by digital zooming) rather than the original image; the result of the optical zoom is therefore not used directly. Rather, the transformed target image is obtained from the original image, so the result of the optical zoom is used indirectly.
As mentioned above, the focus transformation may involve transformation parameters such as the current frame magnification and the current frame translation amount. The parameters calculated for different cameras differ; some can be converted between cameras (e.g., the current frame magnification) and some cannot (e.g., the current frame translation amount). Therefore, to ensure continuity of the focus transformation after a camera switch, a set of transformation parameters can be calculated for the current camera and for the next camera respectively, so that the parameters can be put into use immediately when the switch occurs. Of course, if no switch occurs, the parameters calculated for the next camera are not used, i.e., they appear redundant for the time being, but this degree of redundancy is acceptable given how simply the parameters are calculated.
It can be understood that the calculation of transformation parameters should wait until the corresponding camera has been started; for example, if the next camera has not yet been started when the current camera is started, only one set of transformation parameters needs to be calculated. Furthermore, the calculated transformation parameters may themselves be used in the switching condition, so that cases where calculated parameters go unused are avoided as much as possible, as detailed in the examples below.
Specifically, if the focus transformation includes magnification and translation, the transformation parameters (current frame magnification, current frame translation amount) may be calculated and camera switching performed in the following manner:
(a) Using a target tracking algorithm, determine the position of the target object in the current frame original image corresponding to the current camera and in that corresponding to the next camera, respectively. Obviously, it is assumed here that the next camera has already been started.
(b) Calculate the current frame magnification and the current frame translation amount corresponding to the current camera and to the next camera, according to the positions of the target object in their respective current frame original images. The calculation of the current frame magnification and translation amount is as described above and is not repeated (the current expected magnification and expected translation amount may need to be calculated first). Note that for the next camera, the calculated current frame magnification and translation amount are, after all, not used for actual magnification and translation before the switch occurs, so the calculation frequency of these two parameters can be reduced appropriately: for example, they can be calculated once every several frames rather than every frame, although calculating every frame merely costs slightly more computation.
(c) Judge whether the switching condition for switching from the current camera to the next camera is met; the transformation parameters calculated in the previous step may be used in this judgment, as detailed in the examples below. If the switching condition is not met, magnify and translate the target image in the current frame original image corresponding to the current camera by the current frame magnification and translation amount corresponding to the current camera; if it is met, switch to magnifying and translating the target image in the current frame original image corresponding to the next camera by the current frame magnification and translation amount corresponding to the next camera.
If the focus transformation includes magnification but not translation, the translation-related content in steps (a) to (c) is simply removed; focus transformations with other transformation parameters are handled similarly and are not specifically described.
It should be noted that when camera switching is considered, the formulas for cur_frame_scale and cur_frame_shift given above still hold, i.e., smoothness of magnification and translation is still guaranteed; only, before the switching condition is met, cur_frame_scale and cur_frame_shift are calculated from the target tracking result of the current camera, and after it is met, from the target tracking result of the next camera.
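Steps (a) to (c) can be summarized in the following sketch; track_target, frame_params, and switch_condition are hypothetical placeholders standing for the tracking algorithm, the parameter calculation described above, and the switching conditions described below, and state holds per-camera smoothing state (e.g., pre_frame_scale, pre_frame_shift, smooth_residual_number):

def dual_camera_step(cur_cam, next_cam, state):
    # (a) track the target in the current frame original image of both cameras
    raw_cur, raw_next = cur_cam.frame(), next_cam.frame()
    box_cur, box_next = track_target(raw_cur), track_target(raw_next)
    # (b) one set of transformation parameters per camera
    params_cur = frame_params(raw_cur, box_cur, state['cur'])
    params_next = frame_params(raw_next, box_next, state['next'])
    # (c) the focus transformation changes source only when the condition holds
    if switch_condition(params_cur, params_next, box_next):
        state['cur'], state['next'] = state['next'], state['cur']
        return raw_next, params_next
    return raw_cur, params_cur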
Even when different cameras shoot the same scene, parallax exists between the original images they obtain. To weaken the influence of parallax on camera switching, the original image corresponding to the current camera can be processed (e.g., rotated and translated) before the switch occurs, so that at the moment of switching the processed original image corresponding to the current camera and the original image corresponding to the next camera coincide exactly or differ only slightly; since the original image serving as the source of the focus transformation then barely changes across the switch, the transformed target image displayed on the screen naturally shows no abrupt change.
This processing of the original image can also be performed smoothly and in synchronization with the smooth magnification of the target image. Taking rotation as an example, suppose the angular difference between the current camera and the next camera (which may refer to the angle between their optical axes) is 5° and the target image takes 10 frames to go from 1x to 2x; then each frame may rotate the original image corresponding to the current camera by 5/10 = 0.5°, with the target image magnified on the rotated original image. By the 10th frame, the original image corresponding to the current camera, rotated by 5°, coincides exactly with the original image corresponding to the next camera (assuming the two cameras have the same sensor resolution), and seamless switching is possible. Furthermore, since affine transformation inherently includes rotation and can be expressed as a matrix, in implementations where the focus transformation is realized by matrix operations, rotation-related parameters may be added to the transformation matrix (such as H_trans above).
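Under the figures just assumed (a 5° total difference spread over 10 frames), the per-frame rotation can be composed into the transformation matrix; a sketch of the matrix construction only:

import numpy as np

def rotation_about(angle_deg, center):
    # 3x3 homogeneous rotation about 'center'; multiply with H_trans before warping.
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    cx, cy = center
    return np.array([[c, -s, cx - c * cx + s * cy],
                     [s,  c, cy - s * cx - c * cy],
                     [0,  0, 1]])

# frame k of 10 rotates the current camera's original image by k * 0.5 degrees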
Besides parallax, the images acquired by the two cameras may also differ in style (e.g., color temperature), which is determined by the characteristics of the cameras themselves; transition processing is therefore also required, and is not described in detail.
The following continues with possible switching conditions:
condition C1: if the field angle of the current camera is larger than that of the next camera, the switching condition comprises: the current frame magnification factor corresponding to the current camera is not less than the camera factor ratio, and the position of the target object does not exceed the boundary of the current frame original image corresponding to the next camera. The camera magnification ratio is defined as the ratio of the magnification of the current camera to the magnification of the next camera, and the magnification number of the camera here refers to the inherent magnification and can be obtained through calibration data of the camera. If the condition for the magnification in C1 is not satisfied, the condition C1 is not satisfied, and the condition for the position of the target object does not need to be further determined.
It is easy to see that condition C1 comprises two sub-conditions: the first relates to the original image corresponding to the current camera, the second to the original image corresponding to the next camera. It is not excluded that in some schemes C1 may be simplified to include only one of the two sub-conditions.
A brief explanation of the meaning of condition C1: it corresponds to switching from the large-FOV camera to the small-FOV camera. If the original image corresponding to the large-FOV camera is being enlarged at first (other focus transformations may be included), then once the current frame magnification is not less than the camera magnification ratio and the original image corresponding to the small-FOV camera can capture the target object, the transformation can switch to continuing the enlargement on the original image corresponding to the small-FOV camera, which helps display the details of the target object more clearly.
In condition C1, whether the position of the target object exceeds the boundary of the current frame original image acquired by the next camera can be judged through the current frame translation amount corresponding to the next camera: if that translation amount is not greater than the maximum translation amount in the same direction within the current frame original image corresponding to the next camera, the position of the target object is considered not to exceed the boundary of that image; otherwise, it is considered to exceed it.
If the focus transformation includes translation, the current frame translation amount corresponding to the next camera will have been calculated in step (b) above; if it does not include translation, this translation amount can still be calculated specifically for judging the switching condition.
The maximum translation amount within an original image is related to the size of the target image and the direction of translation and can be defined as follows: starting from the center position of the original image, translate the target image along the direction θ until it reaches the boundary of the original image (the boundary of the target image coincides with a segment of the boundary of the original image but does not exceed it); the resulting translation is the maximum translation amount. For example, for θ = 0° the maximum translation amount is (W1/2 - W2/2, 0), where W1 is the width of the original image and W2 the width of the target image; for θ = 90° it is (0, H1/2 - H2/2), where H1 is the height of the original image and H2 the height of the target image. When comparing the current frame translation amount with the maximum translation amount, the maximum translation amount in the same direction as the current frame translation amount is selected. In addition, for condition C1, the original image used to calculate the maximum translation amount is the current frame original image corresponding to the next camera.
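The comparison just described can be sketched as follows for the axis-aligned cases (θ = 0° and θ = 90° treated independently; a general direction θ would scale both components jointly):

def within_boundary(cur_frame_shift, w1, h1, w2, h2):
    # Maximum translation amounts along x and y, per the definition above.
    max_x = w1 / 2 - w2 / 2
    max_y = h1 / 2 - h2 / 2
    return abs(cur_frame_shift[0]) <= max_x and abs(cur_frame_shift[1]) <= max_y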
Of course, in some other implementations there are other ways to judge whether the position of the target object exceeds the boundary of the current frame original image acquired by the next camera:
for example, as can be seen from the above definition of the maximum translation amount, the critical case in the above judgment is the target image being translated to the boundary of the original image; this critical case may be replaced by the target object (which also corresponds to a rectangular box) being translated to the boundary of the original image, or the center position of the target object being translated to the boundary of the original image, and so on.
For another example, as long as the target tracking algorithm can still track the target object in the current frame original image acquired by the next camera, the position of the target object is considered to be beyond the boundary of the original image.
Condition C1 is explained with reference to figs. 4(A) and 4(B): camera 1 is the current camera, with FOV = 80° and magnification 1x; camera 2 is the next camera, with FOV = 40° and magnification 2x. Since the FOV of camera 2 is smaller, its shooting range is smaller than that of camera 1, i.e., the shooting range of the original image corresponding to camera 2 corresponds only to the dotted square within the original image corresponding to camera 1 (note that the resolution of the original image corresponding to camera 2 is not necessarily smaller than that of camera 1; it is merely drawn reduced in figs. 4(A) and 4(B)).
When a user just starts the camera APP, the camera 1 is adopted, the magnification of the target image is 1x, and the calculated expected magnification is 2.5x at this time. In the process of automatic focusing, if the magnification factor has not reached 2x, the camera 1 is continuously adopted, and if the magnification factor has reached 2x, whether the target object is located in the current frame original image corresponding to the camera 2 is further judged: fig. 4 (a) shows a case where the position of the target object is within the boundary of the original image of the current frame, and when the condition C1 is satisfied, the camera 2 is switched to continue to zoom in the target image; fig. 4 (B) shows a case where the position of the target object is outside the boundary of the original image of the current frame, and when the condition C1 is not satisfied, the enlargement of the target image is continued based on the camera 1.
Condition C2: if the field angle of the current camera is smaller than that of the next camera, the switching condition includes: the current frame magnification corresponding to the next camera is smaller than the camera multiple ratio, or the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera. The camera multiple ratio is defined as in condition C1. The current frame magnification in C2 refers to the magnification corresponding to the next camera, i.e. the large-FOV camera, consistent with C1; the current frame magnification corresponding to the current (small-FOV) camera could also be used, but a corresponding conversion would be required.
Evidently, condition C2 contains two sub-conditions: the first relates to the original image corresponding to the next camera, and the second to the original image corresponding to the current camera. This does not exclude schemes in which C2 is simplified to include only one of the two.
Briefly, the meaning of condition C2 is this: it covers the case of switching from the small-FOV camera to the large-FOV camera. Suppose the original image corresponding to the small-FOV camera is initially being enlarged (possibly with other focus transformations). When the current frame magnification falls below the camera multiple ratio, or the original image corresponding to the small-FOV camera can no longer capture the target object, the system may switch to the original image corresponding to the large-FOV camera and continue the enlargement (and any other focus transformations). Although less detail of the target object is then shown, the target object can at least be auto-focused over a larger range, preserving the auto-focusing function.
Under condition C2, whether the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera can be determined from the current frame translation amount corresponding to the current camera: if that translation amount is greater than the maximum translation amount in the same direction within the current frame original image corresponding to the current camera, the position of the target object is considered to have exceeded the boundary of that image; otherwise, it is considered not to exceed it.
If the focus transformation includes translation, the current frame translation amount corresponding to the current camera may already have been calculated in step (b) above; if not, it may still be calculated specifically for evaluating the switching condition. The definition of the maximum translation amount within the original image is the same as under condition C1 and is not repeated, except that for condition C2 the original image used to calculate it is the current frame original image corresponding to the current camera.
Of course, other implementations may judge in other ways whether the position of the target object exceeds the boundary of the current frame original image acquired by the current camera; the alternatives given for condition C1 apply and are not repeated.
Still describing condition C2 with reference to fig. 4(A) and fig. 4(B): the switching conditions are evaluated continuously. Suppose condition C1 is satisfied at some moment, the system switches from camera 1 to camera 2, and the target image has been enlarged to 2.5x; later, the target object leaves the shooting range of camera 2 because of its own movement. At that point condition C2 is satisfied, and the system switches from camera 2 back to camera 1 to continue enlarging the target image.
In one auto-focusing scheme, conditions C1 and C2 may both be implemented, or only one of them. For example, if only condition C1 is implemented, another condition may be adopted for the case of switching from the small-FOV camera to the large-FOV camera.
Both conditions C1 and C2 follow the switching principle mentioned earlier: use the original image corresponding to the small-FOV camera for the focus transformation whenever possible, and fall back to the original image corresponding to the large-FOV camera only when the small-FOV camera cannot effectively capture the target object (for example, the target object leaves the image range, or is oversized within it).
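Putting the two conditions together, a minimal sketch of the per-frame switching decision might look as follows. The attribute and parameter names (fov, zoom, magnification_of, target_within) are illustrative assumptions, and the camera multiple ratio is read here as the small-FOV camera's native zoom over the large-FOV camera's, which matches the switch at 2x in the 1x/2x example above.

```python
from dataclasses import dataclass

@dataclass
class Camera:
    fov: float   # field of view, in degrees
    zoom: float  # native magnification, e.g. 1.0 or 2.0

def should_switch(cur_cam, next_cam, magnification_of, target_within):
    """Evaluate switching conditions C1/C2 for the current frame.
    magnification_of(cam) returns the current frame magnification on that
    camera's scale; target_within(cam) reports whether the tracked target
    lies inside that camera's current frame original image."""
    if cur_cam.fov > next_cam.fov:
        # C1: large-FOV camera -> small-FOV camera.
        ratio = next_cam.zoom / cur_cam.zoom        # e.g. 2x / 1x = 2
        return (magnification_of(cur_cam) >= ratio
                and target_within(next_cam))
    # C2: small-FOV camera -> large-FOV camera.
    ratio = cur_cam.zoom / next_cam.zoom            # e.g. 2x / 1x = 2
    return (magnification_of(next_cam) < ratio
            or not target_within(cur_cam))
```

With camera 1 (FOV 80°, 1x) current and camera 2 (FOV 40°, 2x) next, C1 fires once the magnification reaches 2x and the target still lies inside camera 2's frame, as in fig. 4(A).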
Further, on the basis of the above embodiment, when the determination result satisfies the switching condition, the focus transformation of the target image in the original image corresponding to the next camera can be carried out in at least two ways:
Mode 1: if the determination made at the current moment satisfies the switching condition, immediately switch from performing the focus transformation on the target image in the original image corresponding to the current camera to performing it on the target image in the original image corresponding to the next camera. Mode 1 is thus a scheme that switches cameras as soon as the switching condition is found to be satisfied.
Mode 2: if the determination made at the current moment satisfies the switching condition, continue to evaluate the switching condition over a subsequent time period, based on the original image corresponding to the current camera and/or the original image corresponding to the next camera (naturally, the original images newly acquired within that period); if every determination made during the period satisfies the switching condition, switch from performing the focus transformation on the target image in the original image corresponding to the current camera to performing it on the target image in the original image corresponding to the next camera.
Mode 2 is thus a scheme that waits for a period after the switching condition is first satisfied and switches cameras only if the condition remains satisfied throughout. The time period in mode 2 can be expressed in different ways: as a duration, e.g. 2 s; or as a number of frames, e.g. 20 frames (meaning the acquisition of 20 original images).
Mode 1 has simpler logic and switches more efficiently; mode 2 is somewhat more complex logically, but helps avoid frequent camera switching.
For example, suppose the target object moves back and forth across the boundary of the original image corresponding to the small-FOV camera (e.g. around the position of the target object in fig. 4(B)). Condition C1 then holds one moment and condition C2 the next, so the large-FOV and small-FOV cameras would be switched repeatedly, which shortens their service life and may also destabilize the displayed picture. With the logic of mode 2, once condition C2 is satisfied the system does not switch to the large-FOV camera immediately but waits, say, 2 s; if within those 2 s the target object returns to the shooting range of the small-FOV camera, condition C2 no longer holds and no switch is needed.
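Counted in frames, mode 2 reduces to a small debouncing helper such as the sketch below; the class name and the 20-frame window (roughly 2 s at 10 fps) are illustrative assumptions.

```python
from collections import deque

class SwitchDebouncer:
    """Signal a camera switch only after the switching condition has held
    for an entire window of consecutive frames (mode 2)."""
    def __init__(self, window_frames=20):
        self.history = deque(maxlen=window_frames)

    def update(self, condition_met: bool) -> bool:
        """Call once per captured frame; returns True when it is time to switch."""
        self.history.append(condition_met)
        # A single frame where the condition fails (e.g. the target returns
        # to the small-FOV camera's range) restarts the whole waiting period.
        return len(self.history) == self.history.maxlen and all(self.history)
```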
Similarly, according to the earlier description, if the target object leaves the shooting range of the camera with the largest FOV, or suddenly can no longer be tracked, the magnification of the target image should be reduced, down to the original size (e.g. 1x) if necessary. To handle a target object that moves repeatedly across the boundary of the original image corresponding to the largest-FOV camera, or that is tracked only intermittently, the system may likewise wait a short time before deciding to reduce the magnification, so that the target image is not repeatedly shrunk and enlarged on the screen, which would harm the user experience.
Finally, on the basis of the above embodiments, auto-focusing of multiple target objects is explained. To support it, step S210 should track the multiple target objects simultaneously and determine the position of each target object in the original image separately. Step S220 may then adopt at least two schemes:
Scheme 1: perform the focus transformation on the selected target images of the original image, and display the transformed selected target images. A selected target image is a target image, among the target images corresponding to the multiple target objects, that contains a target object selected by the user.
For example, if there are 5 target objects in the original image and the user selects 3 of them, the target images corresponding to those 3 objects are focus-transformed and displayed based on the target tracking results. The display manner is not limited: the transformed images may be shown simultaneously, in turn, or under user control, with one shown while the other two are temporarily hidden, and so on.
Scheme 2: perform the focus transformation on the target images in the original image, and display only the transformed selected target images. The definition of the selected target image is the same as in scheme 1 and is not repeated.
For example, if there are 5 target objects in the original image and the user selects 3 of them, the focus transformation may be performed on the target images corresponding to all 5 objects based on the target tracking results, but only the transformed target images corresponding to the 3 selected objects are displayed; again, the display manner is not limited.
The timing of the user's selection is very flexible: it may happen before target tracking starts, after it starts, or both. In particular, the user may dynamically change the selected target object. For example, the user may first select target object X for auto-focusing and later reselect target object Y (before which Y may or may not already have been auto-focused, depending on the scheme); after the change, X may no longer be auto-focused, or X may continue to be auto-focused while only the auto-focusing result for Y, not that for X, is displayed on the screen.
Of the two schemes above, scheme 1 involves less computation (not every target object is auto-focused), while scheme 2 supports dynamic switching better (every target object is auto-focused, but not all results are displayed), as the sketch below illustrates.
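A minimal per-frame sketch of the two schemes, assuming a transform() helper that applies the focus transformation to one target's image region and a display() helper for the screen; both names are illustrative.

```python
def autofocus_frame(raw_image, tracked_targets, selected_targets,
                    transform, display, scheme=2):
    """Apply the focus transformation for multiple target objects under
    scheme 1 or scheme 2. transform(image, target) returns one target's
    transformed target image; display(images) puts them on the screen."""
    if scheme == 1:
        # Scheme 1: transform only the selected targets (less computation).
        transformed = {t: transform(raw_image, t) for t in selected_targets}
    else:
        # Scheme 2: transform every tracked target, so a change of selection
        # can be displayed without any gap in auto-focusing.
        transformed = {t: transform(raw_image, t) for t in tracked_targets}
    # Under both schemes, only the selection is shown on screen.
    display([transformed[t] for t in selected_targets])
    return transformed
```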
Fig. 5 illustrates one way of switching the auto-focus object. Referring to fig. 5, before the switch the auto-focus object displayed on the screen is the black body. When the user long-presses the screen, all target objects currently being auto-focused, 5 in total, appear at the top of the screen. The user selects the 2nd one, the gray body, whereupon the auto-focus object displayed on the screen becomes the gray body; the black body can also continue to be auto-focused, allowing the user to switch the auto-focus object back again later.
Fig. 6 shows a possible structure of an auto-focusing apparatus 300 provided in an embodiment of the present application.
Referring to fig. 6, the auto-focusing apparatus 300 includes:
a target tracking module 310 for determining a position of a target object from an original image generated by an image sensor using a target tracking algorithm;
a focus transformation module 320, configured to perform focus transformation on a target image in the original image and display a transformed target image; the target image is a local image of the original image, the local image comprises the target object, and the focus transformation is affine transformation capable of highlighting the target image.
In one implementation of the auto-focusing apparatus 300, the target image is obtained by expanding an area occupied by the target object in the original image.
In one implementation of the autofocus device 300, the focus transform includes a magnification and/or translation.
In one implementation of the auto-focusing apparatus 300, if the focus transformation includes zooming, the focus transformation module 320 performs the focus transformation on the target image in the original image, including: calculating an expected magnification according to the size of the original image and the size of the target image, and amplifying the target image in the original image according to the expected magnification; if the focus transformation includes translation, the focus transformation module 320 performs focus transformation on a target image in the original image, including: and calculating an expected translation amount according to the central position of the original image and the central position of the target object, and translating the target image in the original image according to the expected translation amount.
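As a concrete reading of these two calculations, the sketch below magnifies until the target image would fill the frame and translates the target object's center to the frame center; the exact formulas are assumptions for illustration, not quoted from this embodiment.

```python
def expected_parameters(orig_w, orig_h, tgt_w, tgt_h, tgt_cx, tgt_cy):
    """Expected magnification from the two image sizes, and expected
    translation from the two center positions."""
    # Largest uniform scale at which the target image still fits the frame.
    expected_magnification = min(orig_w / tgt_w, orig_h / tgt_h)
    # Vector that moves the target object's center onto the frame center.
    expected_translation = (orig_w / 2 - tgt_cx, orig_h / 2 - tgt_cy)
    return expected_magnification, expected_translation
```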
In one implementation of the auto-focusing apparatus 300, the focus transform module 320 calculates an expected magnification according to the size of the original image and the size of the target image, and magnifies the target image according to the expected magnification, including: calculating the expected magnification according to the size of the current frame original image and the size of the target image in the current frame original image; calculating the current frame amplification factor according to the expected amplification factor and the previous frame amplification factor, and amplifying a target image in the current frame original image according to the current frame amplification factor; the previous frame magnification is a magnification for amplifying a target image in a previous frame original image, and the current frame magnification is located between the previous frame magnification and the expected magnification; the focus transform module 320 calculates an expected translation amount according to the center position of the original image and the center position of the target object, and translates the target image according to the expected translation amount, including: calculating the expected translation amount according to the central position of the current frame original image and the central position of the target object in the current frame original image; calculating the translation amount of the current frame according to the expected translation amount and the translation amount of the previous frame, and translating a target image in the original image of the current frame according to the translation amount of the current frame; wherein the current frame translation amount is located between the previous frame translation amount and the expected translation amount.
In one implementation of the auto-focusing apparatus 300, the focus transform module 320 calculates the current frame magnification according to the expected magnification and the previous frame magnification, including: calculating the amplification increment according to the expected amplification, the previous frame amplification and the residual amplification frame number; wherein the remaining number of amplification frames represents: the frame number of the original image which needs to be passed when the magnification of the current frame reaches the expected magnification; superposing the amplification factor increment on the previous frame amplification factor to obtain the current frame amplification factor; the focus transform module 320 calculates the current frame translation amount according to the expected translation amount and the previous frame translation amount, including: calculating a translation increment according to the expected translation, the translation of the previous frame and the number of the residual translation frames; wherein the remaining translation frame number represents: the number of original image frames still needed to be experienced by the current frame translation amount when the current frame translation amount reaches the expected translation amount; and superposing the translation increment on the translation of the previous frame to obtain the translation of the current frame.
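The per-frame smoothing in both calculations reduces to the same one-line update: spread the gap between the previous value and the expected value over the remaining frames. A sketch, assuming even spreading (the embodiment only requires the current value to lie between the previous and expected values):

```python
def step_toward(previous, expected, remaining_frames):
    """One smoothing step: previous + increment, where the increment is the
    remaining gap divided by the number of frames still to be traversed.
    Use it for the magnification, and per component for the translation."""
    increment = (expected - previous) / remaining_frames
    return previous + increment

# e.g. magnification 1.0 toward an expected 2.5 over 10 remaining frames:
# step_toward(1.0, 2.5, 10) -> 1.15; repeating with 9, 8, ... frames left
# advances the magnification by 0.15 per frame until it reaches 2.5.
```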
In an implementation manner of the automatic focusing apparatus 300, the image sensor is a photosensitive element of a camera, the number of the cameras is N, N is greater than or equal to 2, and the field angles of the cameras increase sequentially from the 1st camera to the Nth camera, and the focus transformation module 320 performs focus transformation on a target image in the original image, including: judging whether a switching condition for switching from the current camera to the next camera is met or not according to the original image corresponding to the current camera and/or the original image corresponding to the next camera; the current camera is the ith camera, i is greater than or equal to 1 and less than or equal to N, the next camera is a camera adjacent to the current camera in the ordering of the field angles, and the original image corresponding to a camera is the original image generated by the image sensor of that camera; if the determination result does not satisfy the switching condition, the focus transformation module 320 performs focus transformation on a target image in the original image corresponding to the current camera, and if the determination result satisfies the switching condition, the focus transformation module 320 performs focus transformation on a target image in the original image corresponding to the next camera; wherein the focus transformation includes magnification.
In an implementation manner of the automatic focusing apparatus 300, if the determination result does not satisfy the switching condition, the focus transformation module 320 performs focus transformation on a target image in the original image corresponding to the current camera, and if the determination result satisfies the switching condition, the focus transformation module 320 performs focus transformation on a target image in the original image corresponding to the next camera, including: if the judgment result does not satisfy the switching condition, the focus conversion module 320 performs focus conversion on a target image in the current frame original image corresponding to the current camera according to the current frame conversion parameter corresponding to the current camera, and if the judgment result satisfies the switching condition, the focus conversion module 320 performs focus conversion on a target image in the current frame original image acquired by the next camera according to the current frame conversion parameter corresponding to the next camera; the current frame conversion parameters corresponding to the current camera are calculated according to the current frame original image corresponding to the current camera, the current frame conversion parameters corresponding to the next camera are calculated according to the current frame original image corresponding to the next camera, and the current frame conversion parameters comprise current frame amplification factors or current frame amplification factors and current frame translation amount.
In one implementation of the auto-focusing apparatus 300, if the field angle of the current camera is larger than the field angle of the next camera, the switching condition includes: the current frame magnification factor corresponding to the current camera is not less than the camera multiple ratio, and the position of the target object does not exceed the boundary of the current frame original image corresponding to the next camera; and/or, if the field angle of the current camera is smaller than the field angle of the next camera, the switching condition includes: the current frame magnification factor corresponding to the next camera is smaller than the camera multiple ratio, or the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera; the camera multiple ratio is the ratio of the amplification factor of the current camera to the amplification factor of the next camera.
In an implementation manner of the automatic focusing apparatus 300, if the field angle of the current camera is greater than the field angle of the next camera, the focus transformation module 320 determines whether the position of the target object exceeds the boundary of the current frame original image corresponding to the next camera according to the current frame translation amount corresponding to the next camera; if the field angle of the current camera is smaller than the field angle of the next camera, the focus transform module 320 determines whether the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera according to the current frame translation amount corresponding to the current camera.
In an implementation manner of the automatic focusing apparatus 300, if the determination result satisfies the switching condition, the performing, by the focus transformation module 320, focus transformation on a target image in an original image corresponding to the next camera includes: if the determination result at the current moment meets the switching condition, the focus transformation module 320 continues to determine the switching condition according to the original image corresponding to the current camera and/or the original image corresponding to the next camera within a later time period, and if the determination results obtained within the time period all meet the switching condition, the focus transformation module 320 performs focus transformation on the target image in the original image corresponding to the next camera.
In one implementation of the auto-focusing apparatus 300, the target tracking module 310 is further configured to: before determining the position of the target object from the original image generated by the image sensor using the target tracking algorithm, determine an object of interest detected in an initial image as the target object, or determine the target object according to a target selection operation for the initial image; the initial image is the first frame original image processed by the target tracking algorithm.
In one implementation of the auto-focusing apparatus 300, the target tracking module 310 determines the target object according to a target selection operation for the initial image, including: determining an object of interest within the delineated region as the target object according to a region delineation operation for the initial image; or, according to a focus selection operation for the initial image, determining an object of interest containing the focus point as the target object.
In one implementation of the auto-focusing apparatus 300, the target tracking module 310 determines the position of the target object from the raw image generated by the image sensor using a target tracking algorithm, including: determining the positions of a plurality of target objects from an original image generated by an image sensor by using a target tracking algorithm; the focus transform module 320 performs focus transform on a target image in the original image, and displays the transformed target image, including: carrying out focusing transformation on a selected target image of the original image, and displaying the transformed selected target image; or, carrying out focusing transformation on a target image in the original image, and displaying a selected target image after transformation; the selected target image is a target image which contains the selected target object and is in the target images corresponding to the target objects.
In one implementation of the autofocus device 300, the device further comprises: the shooting module is used for storing the image displayed on the display screen as a shooting result or a recorded video frame; wherein the image displayed on the display screen includes the transformed target image.
The implementation principle and technical effects of the auto-focusing apparatus 300 provided in this embodiment have been introduced in the foregoing method embodiments; for brevity, for any part of this apparatus embodiment not mentioned here, reference may be made to the corresponding content of the method embodiments.
An embodiment of the present application further provides a computer-readable storage medium storing computer program instructions which, when read and executed by a processor of a computer, perform the auto-focusing method provided by the embodiments of the present application. The computer-readable storage medium may be implemented as, but is not limited to, the memory 120 in the electronic device 100 of fig. 1.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (19)

1. An auto-focusing method, comprising:
determining the position of a target object from an original image generated by an image sensor by using a target tracking algorithm;
carrying out focusing transformation on a target image in the original image, and displaying the transformed target image; the target image is a local image of the original image, the local image comprises the target object, and the focusing transformation is affine transformation capable of highlighting the target image.
2. The auto-focusing method according to claim 1, wherein the target image is obtained by expanding an area occupied by the target object in the original image.
3. The auto-focusing method according to claim 1 or 2, characterized in that the focusing transformation comprises a magnification and/or a translation.
4. The auto-focusing method of claim 3, wherein if the focus transformation comprises zooming in, the performing the focus transformation on the target image in the original image comprises:
calculating an expected magnification according to the size of the original image and the size of the target image, and amplifying the target image in the original image according to the expected magnification;
if the focus transformation includes translation, performing focus transformation on the target image in the original image, including:
and calculating an expected translation amount according to the central position of the original image and the central position of the target object, and translating the target image in the original image according to the expected translation amount.
5. The auto-focusing method of claim 4, wherein the calculating a desired magnification according to the size of the original image and the size of the target image and magnifying the target image according to the desired magnification comprises:
calculating the expected magnification according to the size of the current frame original image and the size of the target image in the current frame original image;
calculating the current frame amplification factor according to the expected amplification factor and the previous frame amplification factor, and amplifying a target image in the current frame original image according to the current frame amplification factor; the previous frame magnification is a magnification for amplifying a target image in a previous frame original image, and the current frame magnification is located between the previous frame magnification and the expected magnification;
the calculating an expected translation amount according to the center position of the original image and the center position of the target object, and translating the target image according to the expected translation amount includes:
calculating the expected translation amount according to the central position of the current frame original image and the central position of the target object in the current frame original image;
calculating the translation amount of the current frame according to the expected translation amount and the translation amount of the previous frame, and translating the target image in the original image of the current frame according to the translation amount of the current frame; wherein the current frame translation amount is located between the previous frame translation amount and the expected translation amount.
6. The auto-focusing method of claim 5, wherein said calculating a current frame magnification from said desired magnification and a previous frame magnification comprises:
calculating the amplification increment according to the expected amplification, the previous frame amplification and the residual amplification frame number; wherein the remaining number of amplification frames represents: the frame number of the original image which needs to be passed when the magnification of the current frame reaches the expected magnification;
superposing the amplification factor increment on the previous frame amplification factor to obtain the current frame amplification factor;
the calculating the current frame translation amount according to the expected translation amount and the previous frame translation amount comprises:
calculating a translation increment according to the expected translation amount, the previous frame translation amount and the remaining translation frame number; wherein the remaining translation frame number represents: the number of original image frames still to be traversed before the current frame translation amount reaches the expected translation amount;
and superposing the translation increment on the translation of the previous frame to obtain the translation of the current frame.
7. The automatic focusing method according to any one of claims 1-6, wherein the image sensor is a photosensitive element of a camera, the number of the cameras is N, N is greater than or equal to 2, and the field angles of the cameras increase sequentially from the 1st camera to the Nth camera, and the performing of the focusing transformation on the target image in the original image comprises:
judging whether a switching condition for switching from the current camera to the next camera is met or not according to the original image corresponding to the current camera and/or the original image corresponding to the next camera; the current camera is the ith camera, i is more than or equal to 1 and less than or equal to N, the next camera is a camera adjacent to the current camera in the sequencing of the field angle, and the original image corresponding to the camera is the original image generated by the image sensor of the camera;
if the judgment result does not meet the switching condition, performing focusing transformation on a target image in the original image corresponding to the current camera, and if the judgment result meets the switching condition, performing focusing transformation on a target image in the original image corresponding to the next camera; wherein the focus transform comprises a magnification.
8. The automatic focusing method according to claim 7, wherein if the determination result does not satisfy the switching condition, performing focus transformation on a target image in the original image corresponding to the current camera, and if the determination result satisfies the switching condition, performing focus transformation on a target image in the original image corresponding to the next camera comprises:
if the judgment result does not meet the switching condition, performing focusing transformation on a target image in the current frame original image acquired by the current camera according to the current frame transformation parameter corresponding to the current camera, and if the judgment result meets the switching condition, performing focusing transformation on a target image in the current frame original image acquired by the next camera according to the current frame transformation parameter corresponding to the next camera;
the current frame conversion parameters corresponding to the current camera are calculated according to the current frame original image corresponding to the current camera, the current frame conversion parameters corresponding to the next camera are calculated according to the current frame original image corresponding to the next camera, and the current frame conversion parameters comprise current frame amplification factors or current frame amplification factors and current frame translation amount.
9. The auto-focusing method according to claim 8, wherein if the field angle of the current camera is larger than the field angle of the next camera, the switching condition includes: the current frame magnification factor corresponding to the current camera is not less than the camera multiple ratio, and the position of the target object does not exceed the boundary of the current frame original image corresponding to the next camera;
and/or,
if the field angle of the current camera is smaller than the field angle of the next camera, the switching condition includes: the current frame magnification factor corresponding to the next camera is smaller than the camera multiple ratio, or the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera;
the camera multiple ratio is the ratio of the amplification factor of the current camera to the amplification factor of the next camera.
10. The auto-focusing method according to claim 9, wherein if the field angle of the current camera is larger than the field angle of the next camera, it is determined whether the position of the target object exceeds the boundary of the current frame original image corresponding to the next camera according to the current frame translation amount corresponding to the next camera;
and if the field angle of the current camera is smaller than the field angle of the next camera, judging whether the position of the target object exceeds the boundary of the current frame original image corresponding to the current camera according to the current frame translation amount corresponding to the current camera.
11. The automatic focusing method according to any one of claims 7 to 10, wherein performing focus transformation on a target image in an original image corresponding to the next camera if the determination result satisfies the switching condition includes:
and if the judgment result at the current moment meets the switching condition, continuously judging the switching condition according to the original image corresponding to the current camera and/or the original image corresponding to the next camera in a later time period, and if the judgment results obtained in the time period all meet the switching condition, performing focusing transformation on a target image in the original image corresponding to the next camera.
12. The auto-focusing method according to any one of claims 1 to 11, wherein before said determining the position of the target object from the raw image generated by the image sensor using the target tracking algorithm, the method further comprises:
determining the object of interest detected in the initial image as the target object, or determining the target object according to a target selection operation for the initial image; and the initial image is the first frame original image processed by the target tracking algorithm.
13. The auto-focusing method of claim 12, wherein the determining the target object according to a target selection operation for an initial image comprises:
determining an object of interest within the delineated region as the target object according to a region delineation operation for the initial image; or,
according to a focus selection operation for the initial image, determining an object of interest containing the focus point as the target object.
14. The auto-focusing method of any one of claims 1 to 13, wherein determining the position of the target object from the raw image generated by the image sensor using a target tracking algorithm comprises:
determining the positions of a plurality of target objects from an original image generated by an image sensor by using a target tracking algorithm;
the performing focus transformation on the target image in the original image and displaying the transformed target image includes:
carrying out focusing transformation on a selected target image of the original image, and displaying the transformed selected target image; or,
carrying out focusing transformation on a target image in the original image, and displaying a selected target image after transformation;
the selected target image is a target image which contains the selected target object and is in the target images corresponding to the target objects.
15. The auto-focusing method according to any one of claims 1 to 14, characterized in that the method further comprises:
saving the image displayed on the display screen as a photographing result or a recorded video frame; wherein the image displayed on the display screen includes the transformed target image.
16. An auto-focusing device, comprising:
the target tracking module is used for determining the position of a target object from an original image generated by the image sensor by utilizing a target tracking algorithm;
the focusing conversion module is used for carrying out focusing conversion on a target image in the original image and displaying the converted target image; the target image is a local image of the original image, the local image comprises the target object, and the focus transformation is affine transformation capable of highlighting the target image.
17. A computer-readable storage medium having computer program instructions stored thereon, which when read and executed by a processor, perform the method of any one of claims 1-15.
18. An electronic device, comprising:
a memory for storing computer program instructions;
a processor for reading and executing the computer program instructions to perform the method of any of claims 1-15.
19. The electronic device of claim 18, wherein the device further comprises:
a camera comprising a lens and an image sensor to generate an original image based on light transmitted through the lens;
and the display screen is used for displaying the converted target image.