CN112565586A - Automatic focusing method and device - Google Patents


Info

Publication number
CN112565586A
Authority
CN
China
Prior art keywords
area
position information
target
objects
deep learning
Prior art date
Legal status
Pending
Application number
CN201910920070.1A
Other languages
Chinese (zh)
Inventor
董中要
Current Assignee
Beijing Anyun Century Technology Co Ltd
Original Assignee
Beijing Anyun Century Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Anyun Century Technology Co Ltd filed Critical Beijing Anyun Century Technology Co Ltd
Priority to CN201910920070.1A
Publication of CN112565586A


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals

Abstract

The invention discloses an auto-focusing method applied to an electronic device provided with one or more image acquisition units. The method comprises the following steps: acquiring an image of the current scene with the image acquisition unit to obtain at least one picture; performing target detection on the picture with a deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user of the electronic device; and, based on the position information of the target area, controlling the image acquisition unit to focus on the target area and take a photograph. The invention achieves automatic focusing for photographing and improves focusing speed. The invention further discloses an auto-focusing apparatus, an electronic device, and a computer-readable storage medium.

Description

Automatic focusing method and device
Technical Field
The present invention relates to the field of photographing technologies, and in particular, to an auto-focusing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, smartphones have become widely popularized; they offer rich functionality and are well loved by users. People use smartphones to make calls, send and receive text messages, browse the web, edit documents, listen to music, watch movies, shop online, trade stocks, take photos and record videos, chat, hail rides, book flight and train tickets, order takeout, play games, and transfer and manage money.
Photographing is an important function of the smartphone, and the quality of its photographing directly affects the smartphone's sales. Photographing technology has therefore long been a research and development focus of major handset manufacturers.
At present, there are two focusing methods for phone photography. One is manual focusing, in which the user taps a specific area of the screen. The other is automatic focusing, based either on the distance between the lens and the subject or on focus detection of a sharp image on the focusing screen. However, the existing manual method is inconvenient to operate and slow, and tap-to-focus is impossible when shooting with a selfie stick; the existing automatic method is not accurate enough and has large errors.
Disclosure of Invention
The embodiments of the present application provide an auto-focusing method and apparatus, an electronic device, and a computer-readable storage medium. They solve the technical problems of slow or inaccurate focusing in prior-art focusing methods, realize automatic focusing for photographing, and improve both focusing speed and focusing accuracy.
In a first aspect, the present application provides the following technical solutions through an embodiment of the present application:
An auto-focusing method applied to an electronic device provided with one or more image acquisition units, the method comprising the following steps:
acquiring an image of the current scene with the image acquisition unit to obtain at least one picture;
performing target detection on the picture with a deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user, the user corresponding to the electronic device;
and, based on the position information of the target area, controlling the image acquisition unit to focus on the target area and take a photograph.
Preferably, the performing target detection on the picture by using the deep learning model to obtain position information of a target region includes:
inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
acquiring position information, output by the deep learning model, of the region where a candidate target object is located;
and acquiring the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
Preferably, the obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located includes:
when the candidate target object is a single object, taking the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determining the target object from the plurality of objects and extracting the position information of the area where the target object is located.
Preferably, the determining the target object from the plurality of objects includes:
selecting an object with the largest area from the plurality of objects as the target object based on the area size of the region where each object in the plurality of objects is located; or
Selecting an object with the position closest to the central point of the picture from the plurality of objects as the target object based on the position information of the area where each object in the plurality of objects is located; or
Selecting, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each object, where the confidence is provided by the deep learning model and represents how reliably the model identified each object; or
Selecting an object with the highest type weight from the plurality of objects as the target object based on the type of each object in the plurality of objects, wherein the weights of different types of objects are different from each other; or
Randomly selecting one object from the plurality of objects as the target object.
Preferably, the training method of the deep learning model includes:
acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material comprises one or more objects, and each object is marked with corresponding name information and position information of a region where the object is located;
and training the plurality of data sets as training samples to obtain the deep learning model.
Preferably, before the performing target detection on the picture by using the deep learning model to obtain the position information of a target region, the method further includes:
acquiring a history picture shot by the user from the electronic equipment locally;
and fine-tuning the deep learning model based on the historical pictures.
Based on the same inventive concept, in a second aspect, the present application provides the following technical solutions through an embodiment of the present application:
an auto-focusing apparatus for use in an electronic device having one or more image capturing units, the apparatus comprising:
the acquisition module is used for acquiring an image of the current scene by using the image acquisition unit to obtain at least one picture;
the detection module is used for performing target detection on the picture with a deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user, the user corresponding to the electronic device;
and the focusing module is used for controlling the image acquisition unit to focus and photograph the target area based on the position information of the target area.
Preferably, the detection module includes:
the input sub-module is used for inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
the acquisition submodule is used for acquiring the position information of the region where the candidate target object output by the deep learning model is located;
and the obtaining submodule is used for obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
Preferably, the obtaining submodule is specifically configured to:
when the candidate target object is a single object, taking the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determining the target object from the plurality of objects and extracting the position information of the area where the target object is located.
Preferably, the obtaining submodule is specifically configured to:
selecting an object with the largest area from the plurality of objects as the target object based on the area size of the region where each object in the plurality of objects is located; or
Selecting an object with the position closest to the central point of the picture from the plurality of objects as the target object based on the position information of the area where each object in the plurality of objects is located; or
Selecting, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each object, where the confidence is provided by the deep learning model and represents how reliably the model identified each object; or
Selecting an object with the highest type weight from the plurality of objects as the target object based on the type of each object in the plurality of objects, wherein the weights of different types of objects are different from each other; or
Randomly selecting one object from the plurality of objects as the target object.
Preferably, the auto-focusing apparatus further includes:
the training module is used for training the deep learning model; wherein the training the deep learning model comprises: acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material comprises one or more objects, and each object is marked with corresponding name information and position information of a region where the object is located; and training the plurality of data sets as training samples to obtain the deep learning model.
Preferably, the auto-focusing apparatus further includes:
the acquisition module is used for locally acquiring historical pictures shot by the user from the electronic equipment before the pictures are subjected to target detection by using a deep learning model and position information of a target area is acquired;
and the fine tuning module is used for fine tuning the deep learning model based on the historical pictures.
Based on the same inventive concept, in a third aspect, the present application provides the following technical solutions through an embodiment of the present application:
an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method steps of any of the embodiments of the first aspect.
Based on the same inventive concept, in a fourth aspect, the present application provides the following technical solutions through an embodiment of the present application:
a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method steps of any of the embodiments of the first aspect.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
in an embodiment of the present application, an auto-focusing method is disclosed, applied to an electronic device having one or more image acquisition units. The method includes: acquiring an image of the current scene with the image acquisition unit to obtain at least one picture; performing target detection on the picture with a deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user of the electronic device; and, based on the position information of the target area, controlling the image acquisition unit to focus on the target area and take a photograph. Because the electronic device can determine a target area from the picture with the deep learning model and focus automatically, the technical problems of slow or inaccurate focusing in prior-art focusing methods are solved, automatic focusing for photographing is realized, and both focusing speed and focusing accuracy are improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an auto-focusing method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of an auto-focusing apparatus according to an embodiment of the present disclosure;
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an auto-focusing method and apparatus, an electronic device, and a computer-readable storage medium. They solve the technical problems of slow or inaccurate focusing in prior-art focusing methods, realize automatic focusing for photographing, and improve both focusing speed and focusing accuracy.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
An auto-focusing method applied to an electronic device provided with one or more image acquisition units, the method comprising the following steps: acquiring an image of the current scene with the image acquisition unit to obtain at least one picture; performing target detection on the picture with a deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user of the electronic device; and, based on the position information of the target area, controlling the image acquisition unit to focus on the target area and take a photograph. Because the electronic device can determine a target area from the picture with the deep learning model and focus automatically, the technical problems of slow or inaccurate focusing in prior-art focusing methods are solved, automatic focusing for photographing is realized, and both focusing speed and focusing accuracy are improved.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
First, note that the term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Example one
The embodiment provides an auto-focusing method, which is applied to electronic equipment, where the electronic equipment may be: a smart phone, or a tablet computer, or a digital camera, or a game console, or a smart television, etc. Here, the electronic device is not particularly limited in the embodiment as to what kind of device is. And, the electronic device has one or more image capturing units (i.e., cameras) for implementing photographing or video recording functions.
As shown in fig. 1, the auto-focusing method includes:
step S101: and acquiring an image of the current scene by using an image acquisition unit to obtain at least one picture.
In a specific implementation process, after it is detected that the user has started the photographing function, the image acquisition unit is controlled to acquire images, and the acquired images are displayed on the screen of the electronic device in real time for the user to preview; this is the framing process. After the user finishes framing, the posture of the electronic device becomes relatively fixed and the image acquired by the image acquisition unit stabilizes, no longer changing significantly. At this point at least one picture can be captured; this picture is the framing (viewfinder) picture.
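The patent does not specify how "the image tends to be stable" is detected; one common heuristic is to compare consecutive preview frames and treat a small mean pixel difference as stability. A minimal sketch under that assumption (frames flattened to lists of gray values; the function name and threshold are illustrative, not from the patent):

```python
def is_stable(prev_frame, frame, thresh=5.0):
    """Mean absolute pixel difference below thresh => scene is stable."""
    diffs = [abs(a - b) for a, b in zip(prev_frame, frame)]
    return sum(diffs) / len(diffs) < thresh

# Nearly identical frames are stable; very different ones are not.
print(is_stable([10, 10, 10], [11, 9, 12]))      # True
print(is_stable([0, 0, 0], [100, 100, 100]))     # False
```

In practice a real implementation would run on downsampled luma planes from the camera preview stream rather than full frames.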
Step S102: performing target detection on the picture with the deep learning model to obtain position information of a target area, where the target area is the area of the current scene that is of interest to the user, the user corresponding to the electronic device.
In a specific implementation, target detection may be performed on the picture obtained in step S101 (i.e., the framing picture) using a deep learning model (e.g., a neural network model). The purpose of target detection is to identify a target area in the picture and obtain its position information, where the target area is the area of the current scene that is of interest to the user. For example, if an object the user is interested in (hereinafter the "target object") is located in a certain area, the area where that object is located is the target area.
In a specific implementation process, the user refers to a user of the electronic device, and for example, a smart phone, the user generally refers to an owner of the smart phone.
As an alternative embodiment, step S102 includes:
inputting the picture into a deep learning model, wherein the deep learning model is specifically an SSD neural network model; acquiring position information of a region where a candidate target object is output by a deep learning model; and obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
In a specific implementation process, a pre-trained deep learning model may be obtained, here specifically an SSD neural network model. The picture obtained in step S101 is input into the SSD neural network model, which can identify the name and position of each object in the picture, thereby obtaining name information and position information for each object. These objects are the candidate target objects; a target object is then determined from among them, the area where the target object is located is the target area, and the position information of the target area is obtained.
SSD: short for Single Shot MultiBox Detector, a one-stage object detection algorithm.
Target detection (object detection): an algorithm for detecting target areas in an image.
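For illustration, the detector's output can be treated as a list of (name, box, confidence) triples; the sketch below (the function name and triple format are assumptions, not from the patent — a real SSD such as torchvision's returns boxes, labels, and scores in a similar form) converts the highest-confidence detection into (x, y, width, height) position information for the target area:

```python
def target_area_from_detections(detections):
    """Pick the highest-confidence detection and return its box as
    (x, y, width, height) position information, or None if empty."""
    if not detections:
        return None
    name, (x1, y1, x2, y2), conf = max(detections, key=lambda d: d[2])
    return (x1, y1, x2 - x1, y2 - y1)

# Hypothetical model output: two candidate target objects.
dets = [("bicycle",    (10, 20, 110, 220),  0.70),
        ("motorcycle", (150, 30, 400, 260), 0.85)]
print(target_area_from_detections(dets))  # (150, 30, 250, 230)
```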
As an alternative embodiment, the obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located includes:
when the candidate target object is an object, taking the position information of the area where the candidate target object is as the position information of the area where the target object is; or
When the candidate target objects are a plurality of objects, the target objects are determined from the plurality of objects, and the position information of the area where the target objects are located is extracted.
In the implementation process, when the deep learning model detects a single object (i.e., one candidate target object) in the picture, that object is the final target object. When the deep learning model detects a plurality of objects (i.e., multiple candidate target objects) in the picture, one of them must be determined as the target object.
As an alternative embodiment, determining the target object from the plurality of objects includes the following manners (1) to (5):
(1) Selecting, from the plurality of objects, the object with the largest area as the target object, based on the area of the region where each object is located.
In the implementation process, generally speaking, the more interested a user is in an object, the larger the area that object occupies in the viewfinder picture: it is the focus of the framing. Therefore, the area each object occupies in the picture (i.e., the viewfinder picture) can be calculated from the position information (i.e., coordinate information) of the region where it is located, and the object with the largest area taken as the target object. That object is the one the user is interested in, and its region is the region of interest (i.e., the target area).
(2) Selecting, from the plurality of objects, the object closest to the center point of the picture as the target object, based on the position information of the area where each object is located.
In practice, generally speaking, the more interested a user is in an object, the closer that object is to the center point of the viewfinder picture. Therefore, the center-point coordinates of each object in the picture (i.e., the viewfinder picture) can be calculated from the position information (i.e., coordinate information) of the area where it is located, and the object whose center point is closest to the picture's center point taken as the target object. That object is the one the user is interested in, and its area is the area of interest (i.e., the target area).
(3) Selecting, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each object, where the confidence is provided by the deep learning model and represents how reliably the model identified each object.
In a specific implementation process, when the deep learning model outputs the name information and position information of each object in the picture, it can also output a confidence for each object, representing how reliable the model's identification of that object is. For example, the model may recognize object A in the picture as a "bicycle" with 70% confidence (i.e., 70% certainty that object A is a bicycle) and object B as a "motorcycle" with 85% confidence (i.e., 85% certainty that object B is a motorcycle). In that case, object B may be selected as the target object.
(4) Selecting, from the plurality of objects, the object whose type has the highest weight as the target object, based on the type of each object, where different types of objects have different weights.
In a specific implementation process, different weights can be set for different types of objects, for example: weight of "person" > "animal" > "car" > "building". The weights can reflect the user's preferences; for example, a user may photograph people most often, animals next, then cars, and buildings least. In this way, when the deep learning model recognizes a plurality of objects, they can be sorted by type weight and the object with the highest weight selected as the target object. For example, if a "dog" and a "car" are recognized in the picture, the "dog" is taken as the target object because the weight of "animal" is greater than that of "car".
(5) Randomly selecting one object from the plurality of objects as the target object.
In the specific implementation process, manners (1) to (5) can also be used in combination, which is not described again here.
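The selection strategies above — largest area, nearest to the picture center, highest confidence, highest type weight, and random choice — can all be expressed as key functions over the candidate list. A hedged sketch (the function, field names, and example weights are assumptions for illustration, not from the patent):

```python
import random

# Example type weights: "person" > "animal" > "car" > "building".
TYPE_WEIGHT = {"person": 4, "animal": 3, "car": 2, "building": 1}

def box_area(c):
    x1, y1, x2, y2 = c["box"]
    return (x2 - x1) * (y2 - y1)

def center_distance(c, img_w, img_h):
    x1, y1, x2, y2 = c["box"]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - img_w / 2) ** 2 + (cy - img_h / 2) ** 2

def pick_target(cands, mode, img_w=0, img_h=0):
    """Each candidate is a dict with a 'box' (x1, y1, x2, y2),
    a model 'conf'idence, and an object 'type'."""
    if len(cands) == 1:           # single candidate: use it directly
        return cands[0]
    if mode == "area":            # largest area
        return max(cands, key=box_area)
    if mode == "center":          # closest to picture center
        return min(cands, key=lambda c: center_distance(c, img_w, img_h))
    if mode == "confidence":      # highest model confidence
        return max(cands, key=lambda c: c["conf"])
    if mode == "type":            # highest type weight
        return max(cands, key=lambda c: TYPE_WEIGHT.get(c["type"], 0))
    return random.choice(cands)   # random fallback

cands = [
    {"type": "car",    "box": (0, 0, 200, 100),     "conf": 0.9},
    {"type": "animal", "box": (300, 200, 380, 260), "conf": 0.7},
]
print(pick_target(cands, "type")["type"])        # animal
print(pick_target(cands, "confidence")["type"])  # car
```

Combining strategies, as the text suggests, would amount to composing these keys, e.g. breaking ties on area by confidence.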
After the target object is determined from the plurality of objects, the area where the target object is located is the target area interested by the user, and the position information of the target area is further obtained, so that automatic focusing can be achieved, and the focusing is faster and more accurate.
As an optional embodiment, the training method of the deep learning model includes: acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material comprises one or more objects, and each object is marked with corresponding name information and position information of a region where the object is located; and training a plurality of data sets as training samples to obtain a deep learning model.
In the specific implementation process, the deep learning model needs to be trained in advance. Specifically, the PASCAL VOC2012 data set may be used as the training sample and input into the deep learning model for training, finally yielding the deep learning model required in this embodiment.
The PASCAL VOC2012 data set currently contains 11,700 picture materials. Each picture material contains one or more objects covering many aspects of life, such as people, animals, plants, buildings, landscapes, vehicles, clothing, household goods, electronic products, cultural goods, medical goods, and the like. Each object is labeled with its name information and the position information of the region where it is located (i.e., its position within the picture material); in total, the data set labels about 27,000 objects (counting repeated occurrences of the same kind of object). For example, if a picture material contains an image of a bicycle, the position coordinates of the bicycle and the name "bicycle" are marked in the picture's attribute information.
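In PASCAL VOC, each picture material's labels are stored in a per-image XML file giving every object's name and bounding box. A minimal parsing sketch using only the standard library (the XML snippet is an illustrative example in VOC format, not an actual entry from the data set):

```python
import xml.etree.ElementTree as ET

VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>bicycle</name>
    <bndbox><xmin>34</xmin><ymin>50</ymin><xmax>200</xmax><ymax>310</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (name, (xmin, ymin, xmax, ymax)) labels."""
    root = ET.fromstring(xml_text)
    labels = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        labels.append((name, box))
    return labels

print(parse_voc(VOC_XML))  # [('bicycle', (34, 50, 200, 310))]
```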
Of course, other data sets may be used in addition to the PASCAL VOC2012 data set, and are not specifically limited herein.
In a specific implementation process, when the PASCAL VOC2012 data set is used as the training sample, the deep learning model learns the name information and position information of every object in every picture material. After training, when any picture is input into the model, it can identify what objects are in the picture (i.e., obtain their name information) and where they are located in the picture (i.e., obtain their position information).
In more detail, after a picture is input into the SSD neural network model, it undergoes operations such as convolution and pooling to obtain feature maps of different sizes; k candidate boxes of different aspect ratios are generated for each position on each feature map, forming a large number of candidate boxes (for example, 8,732 candidate boxes when k is 9); finally, the region where each object is located is determined by a non-maximum suppression algorithm, and the position information of that region is generated.
Non-maximum suppression algorithm: an algorithm that eliminates redundant candidate boxes and finds the best detection position for each object.
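A greedy non-maximum suppression pass keeps the highest-scoring candidate box and discards any remaining box that overlaps it beyond an intersection-over-union (IoU) threshold, then repeats. A pure-Python sketch for illustration (production detectors use optimized implementations; the 0.5 threshold is a common but arbitrary choice):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedily keep the highest-scoring boxes, suppressing overlaps.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]  (box 1 overlaps box 0 too much)
```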
As an alternative embodiment, before step S102, the method further includes:
acquiring a history picture shot by a user from the electronic equipment locally; and fine-tuning the deep learning model based on the historical pictures.
In particular implementations, before the deep learning model is used, it may also be fine-tuned based on historical photos stored locally on the electronic device. For example, historical photos taken by the user are obtained from the smartphone's album and input into the deep learning model to fine-tune it. In this way, the deep learning model can learn the user's photographing habits and preferences, for example whether the user prefers to photograph people or animals, cars or buildings.
After the deep learning model has been fine-tuned in this way, when target recognition is performed on the picture (i.e., the preview picture), objects the user likes are recognized more easily; such an object is taken as the target object, and the area where it is located is obtained as the target area. Recognition accuracy is higher, the area of the current scene that interests the user is identified more precisely, and focusing is faster and more accurate.
Step S103: controlling the image acquisition unit, based on the position information of the target area, to focus on the target area and take a photograph.
In a specific implementation, after the position information of the target area is obtained, the image acquisition unit (i.e., the camera) can be controlled, based on that position information, to focus on the target area. The purpose of focusing is to bring the focus of the image acquisition unit onto the target area of interest to the user, so that the photographed picture is clearer. Further, when the user is detected to trigger the shutter key, a photograph is taken and the resulting picture (i.e., the photo) is saved to an album or another folder.
In this way, the auto-focusing method of this embodiment determines a target region from the picture (i.e., the preview picture) using a deep learning model and focuses automatically. Compared with manual focusing in the prior art, focusing is faster, operation is simpler, and focusing is unaffected in scenes where a selfie stick is used. Compared with prior-art auto-focusing methods, this method automatically identifies the area of interest to the user in the current scene and focuses on it, so focusing is more accurate and more readily matches the user's intention.
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
in an embodiment of the present application, an auto-focusing method is disclosed, which is applied to an electronic device having one or more image acquisition units, and the method includes: acquiring an image of the current scene by using the image acquisition unit to obtain at least one picture; performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device; and controlling the image acquisition unit to focus on and photograph the target area based on the position information of the target area. Because the electronic device can determine a target area from the picture using the deep learning model and focus automatically, the technical problems of slow or inaccurate focusing in prior-art focusing methods are solved: photographing is focused automatically, focusing speed is increased, and focusing accuracy is improved.
Example two
Based on the same inventive concept, as shown in fig. 2, the present embodiment provides an auto-focusing apparatus 200, applied in an electronic device having one or more image capturing units, the apparatus comprising:
the acquisition module 201 is configured to acquire an image of a current scene by using the image acquisition unit to obtain at least one picture;
the detection module 202 is configured to perform target detection on the picture by using a deep learning model to obtain position information of a target area, where the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device;
and the focusing module 203 is configured to control the image acquisition unit to focus and photograph the target area based on the position information of the target area.
As an alternative embodiment, the detection module 202 includes:
the input sub-module is used for inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
the acquisition submodule is used for acquiring the position information of the region where the candidate target object output by the deep learning model is located;
and the obtaining submodule is used for obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
As an optional embodiment, the obtaining submodule is specifically configured to:
when the candidate target object is a single object, take the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determine the target object from the plurality of objects and extract the position information of the area where the target object is located.
As an optional embodiment, the obtaining submodule is specifically configured to:
select, from the plurality of objects, the object with the largest area as the target object, based on the area of the region where each of the plurality of objects is located; or
select, from the plurality of objects, the object closest to the center point of the picture as the target object, based on the position information of the region where each of the plurality of objects is located; or
select, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each of the plurality of objects, wherein the confidence is provided by the deep learning model and represents how reliably the model has identified each object; or
select, from the plurality of objects, the object whose type has the highest weight as the target object, based on the type of each of the plurality of objects, wherein different types of objects have different weights; or
randomly select one object from the plurality of objects as the target object.
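The five selection strategies above can be sketched as a single dispatch function. The object-dict layout ("box", "score", "label"), the strategy names, and normalized coordinates are illustrative assumptions, not a format defined by the patent.

```python
import math
import random

def pick_target(objects, strategy="area", picture_center=(0.5, 0.5),
                type_weights=None):
    """Choose one target object from the detector's candidates.
    Each object is a dict with 'box' (x1, y1, x2, y2, normalized),
    'score' (detector confidence) and 'label' (object type)."""
    def box_area(o):
        x1, y1, x2, y2 = o["box"]
        return (x2 - x1) * (y2 - y1)

    def center_distance(o):
        x1, y1, x2, y2 = o["box"]
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        return math.hypot(cx - picture_center[0], cy - picture_center[1])

    if strategy == "area":          # largest region wins
        return max(objects, key=box_area)
    if strategy == "center":        # closest to the picture center wins
        return min(objects, key=center_distance)
    if strategy == "confidence":    # most reliable detection wins
        return max(objects, key=lambda o: o["score"])
    if strategy == "type":          # highest-weighted object class wins
        return max(objects, key=lambda o: type_weights.get(o["label"], 0))
    return random.choice(objects)   # fallback: random pick
```

For example, with a large car near a corner and a small, high-confidence person at the center, the "area" strategy picks the car while "center" and "confidence" pick the person.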
As an alternative embodiment, the automatic focusing apparatus 200 further includes:
the training module is used for training the deep learning model, wherein the training comprises: acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material contains one or more objects, and each object is labeled with its corresponding name information and the position information of the region where it is located; and training with the plurality of data sets as training samples to obtain the deep learning model.
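The annotated picture material handled by the training module can be pictured as a simple record; the field names below are illustrative assumptions (real detection datasets such as PASCAL VOC carry the same name-plus-region information in XML annotations).

```python
# One labelled picture material: each object carries its name and the
# position (x1, y1, x2, y2, in pixels) of the region where it is located.
picture_material = {
    "image": "photo_0001.jpg",
    "objects": [
        {"name": "person", "region": (120, 40, 380, 600)},
        {"name": "dog", "region": (400, 420, 620, 610)},
    ],
}

def annotation_is_valid(material):
    """Check every labelled object has a name and a well-formed region."""
    return all(
        o["name"]
        and o["region"][0] < o["region"][2]
        and o["region"][1] < o["region"][3]
        for o in material["objects"]
    )

print(annotation_is_valid(picture_material))  # True
```

A training pipeline would iterate over many such records, feeding the images and their labelled regions to the detector as supervision.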
As an alternative embodiment, the automatic focusing apparatus 200 further includes:
the acquisition module is used for acquiring historical pictures taken by the user from local storage of the electronic device before target detection is performed on the picture by using the deep learning model to obtain the position information of a target area;
and the fine tuning module is used for fine tuning the deep learning model based on the historical pictures.
Since the apparatus described in this embodiment is the apparatus used to implement the auto-focusing method of the first embodiment, a person skilled in the art can understand the specific implementation of this apparatus and its various variations based on the auto-focusing method described above, and therefore how the apparatus implements that method is not described in detail here. Any apparatus used by a person skilled in the art to implement the auto-focusing method of the embodiments of the present application falls within the scope of the present application.
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
in an embodiment of the present application, an auto-focusing apparatus is disclosed, which is applied to an electronic device having one or more image acquisition units, and the apparatus includes: an acquisition module for acquiring an image of the current scene by using the image acquisition unit to obtain at least one picture; a detection module for performing target detection on the picture by using a deep learning model to obtain position information of a target area, where the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device; and a focusing module for controlling the image acquisition unit to focus on and photograph the target area based on the position information of the target area. Because the electronic device can determine a target area from the picture using the deep learning model and focus automatically, the technical problems of slow or inaccurate focusing in prior-art focusing methods are solved: photographing is focused automatically, focusing speed is increased, and focusing accuracy is improved.
EXAMPLE III
Based on the same inventive concept, as shown in fig. 3, the present embodiment provides an electronic device 300, which includes a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and executable on the processor 320, wherein the processor 320 executes the computer program 311 to implement the following method steps:
acquiring an image of a current scene by using the image acquisition unit to obtain at least one picture; performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device; and controlling the image acquisition unit to focus on and photograph the target area based on the position information of the target area.
In a specific implementation, when the processor 320 executes the program 311, any of the method steps of the first embodiment may also be implemented.
Example four
Based on the same inventive concept, as shown in fig. 4, the present embodiment provides a computer-readable storage medium 400, on which a computer program 411 is stored, the computer program 411 implementing the following steps when being executed by a processor:
acquiring an image of a current scene by using the image acquisition unit to obtain at least one picture; performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device; and controlling the image acquisition unit to focus on and photograph the target area based on the position information of the target area.
In a specific implementation, the computer program 411, when executed by a processor, may implement any of the method steps of the first embodiment.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following this detailed description are hereby expressly incorporated into it, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of an autofocus device, electronic device, and the like in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
A1. An auto-focusing method, applied to an electronic device having one or more image acquisition units, wherein the method comprises:
acquiring an image of a current scene by using the image acquisition unit to obtain at least one picture;
performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device;
and controlling the image acquisition unit to focus and photograph the target area based on the position information of the target area.
A2. The auto-focusing method of A1, wherein the performing target detection on the picture by using a deep learning model to obtain position information of a target area comprises:
inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
acquiring the position information, output by the deep learning model, of the region where a candidate target object is located;
and acquiring the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
A3. The auto-focusing method of A2, wherein the acquiring position information of the area where the target object is located based on the position information of the area where the candidate target object is located comprises:
when the candidate target object is a single object, taking the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determining the target object from the plurality of objects and extracting the position information of the area where the target object is located.
A4. The auto-focusing method of A3, wherein the determining the target object from the plurality of objects comprises:
selecting, from the plurality of objects, the object with the largest area as the target object, based on the area of the region where each of the plurality of objects is located; or
selecting, from the plurality of objects, the object closest to the center point of the picture as the target object, based on the position information of the region where each of the plurality of objects is located; or
selecting, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each of the plurality of objects, wherein the confidence is provided by the deep learning model and represents how reliably the model has identified each object; or
selecting, from the plurality of objects, the object whose type has the highest weight as the target object, based on the type of each of the plurality of objects, wherein different types of objects have different weights; or
randomly selecting one object from the plurality of objects as the target object.
A5. The auto-focusing method of A1, wherein the training method of the deep learning model comprises:
acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material contains one or more objects, and each object is labeled with its corresponding name information and the position information of the region where it is located;
and training with the plurality of data sets as training samples to obtain the deep learning model.
A6. The auto-focusing method of any one of A1 to A5, wherein before the performing target detection on the picture by using the deep learning model to obtain position information of a target area, the method further comprises:
acquiring historical pictures taken by the user from local storage of the electronic device;
and fine-tuning the deep learning model based on the historical pictures.
B7. An auto-focusing apparatus, applied to an electronic device having one or more image acquisition units, the apparatus comprising:
the acquisition module is used for acquiring an image of the current scene by using the image acquisition unit to obtain at least one picture;
the detection module is used for performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device;
and the focusing module is used for controlling the image acquisition unit to focus and photograph the target area based on the position information of the target area.
B8. The auto-focusing apparatus of B7, wherein the detection module comprises:
the input sub-module is used for inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
the acquisition submodule is used for acquiring the position information of the region where the candidate target object output by the deep learning model is located;
and the obtaining submodule is used for obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
B9. The auto-focusing apparatus of B8, wherein the obtaining submodule is specifically configured to:
when the candidate target object is a single object, take the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determine the target object from the plurality of objects and extract the position information of the area where the target object is located.
B10. The auto-focusing apparatus of B9, wherein the obtaining submodule is specifically configured to:
select, from the plurality of objects, the object with the largest area as the target object, based on the area of the region where each of the plurality of objects is located; or
select, from the plurality of objects, the object closest to the center point of the picture as the target object, based on the position information of the region where each of the plurality of objects is located; or
select, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each of the plurality of objects, wherein the confidence is provided by the deep learning model and represents how reliably the model has identified each object; or
select, from the plurality of objects, the object whose type has the highest weight as the target object, based on the type of each of the plurality of objects, wherein different types of objects have different weights; or
randomly select one object from the plurality of objects as the target object.
B11. The auto-focusing apparatus of B7, further comprising:
the training module is used for training the deep learning model, wherein the training comprises: acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material contains one or more objects, and each object is labeled with its corresponding name information and the position information of the region where it is located; and training with the plurality of data sets as training samples to obtain the deep learning model.
B12. The auto-focusing apparatus of any one of B7 to B11, further comprising:
the acquisition module is used for acquiring historical pictures taken by the user from local storage of the electronic device before target detection is performed on the picture by using the deep learning model to obtain the position information of a target area;
and the fine tuning module is used for fine tuning the deep learning model based on the historical pictures.
C13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method steps of any one of A1 to A6.
D14. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method steps of any one of A1 to A6.

Claims (10)

1. An auto-focusing method applied to an electronic device having one or more image capturing units, the method comprising:
acquiring an image of a current scene by using the image acquisition unit to obtain at least one picture;
performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device;
and controlling the image acquisition unit to focus and photograph the target area based on the position information of the target area.
2. The auto-focusing method of claim 1, wherein the performing target detection on the picture by using the deep learning model to obtain the position information of the target area comprises:
inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
acquiring the position information, output by the deep learning model, of the region where a candidate target object is located;
and acquiring the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
3. The auto-focusing method of claim 2, wherein the obtaining of the position information of the area where a target object is located based on the position information of the area where the candidate target object is located comprises:
when the candidate target object is a single object, taking the position information of the area where the candidate target object is located as the position information of the area where the target object is located; or
when the candidate target objects are a plurality of objects, determining the target object from the plurality of objects and extracting the position information of the area where the target object is located.
4. The auto-focusing method of claim 3, wherein said determining the target object from the plurality of objects comprises:
selecting, from the plurality of objects, the object with the largest area as the target object, based on the area of the region where each of the plurality of objects is located; or
selecting, from the plurality of objects, the object closest to the center point of the picture as the target object, based on the position information of the region where each of the plurality of objects is located; or
selecting, from the plurality of objects, the object with the highest confidence as the target object, based on the confidence of each of the plurality of objects, wherein the confidence is provided by the deep learning model and represents how reliably the model has identified each object; or
selecting, from the plurality of objects, the object whose type has the highest weight as the target object, based on the type of each of the plurality of objects, wherein different types of objects have different weights; or
randomly selecting one object from the plurality of objects as the target object.
5. The auto-focusing method of claim 1, wherein the training method of the deep learning model comprises:
acquiring a plurality of data sets, wherein each data set comprises a plurality of picture materials, each picture material contains one or more objects, and each object is labeled with its corresponding name information and the position information of the region where it is located;
and training with the plurality of data sets as training samples to obtain the deep learning model.
6. The auto-focusing method of any one of claims 1 to 5, wherein before the performing the target detection on the picture by using the deep learning model to obtain the position information of a target region, the method further comprises:
acquiring historical pictures taken by the user from local storage of the electronic device;
and fine-tuning the deep learning model based on the historical pictures.
7. An auto-focusing apparatus for use in an electronic device having one or more image capturing units, the apparatus comprising:
the acquisition module is used for acquiring an image of the current scene by using the image acquisition unit to obtain at least one picture;
the detection module is used for performing target detection on the picture by using a deep learning model to obtain position information of a target area, wherein the target area is an area of interest to the user in the current scene, and the user corresponds to the electronic device;
and the focusing module is used for controlling the image acquisition unit to focus and photograph the target area based on the position information of the target area.
8. The autofocus apparatus of claim 7, wherein the detection module comprises:
the input sub-module is used for inputting the picture into the deep learning model, wherein the deep learning model is a neural network model;
the acquisition submodule is used for acquiring the position information of the region where the candidate target object output by the deep learning model is located;
and the obtaining submodule is used for obtaining the position information of the area where the target object is located based on the position information of the area where the candidate target object is located, and taking the position information of the area where the target object is located as the position information of the target area.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is adapted to carry out the method steps of any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method steps of any of claims 1 to 6.
CN201910920070.1A 2019-09-26 2019-09-26 Automatic focusing method and device Pending CN112565586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910920070.1A CN112565586A (en) 2019-09-26 2019-09-26 Automatic focusing method and device

Publications (1)

Publication Number Publication Date
CN112565586A true CN112565586A (en) 2021-03-26

Family

ID=75030119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910920070.1A Pending CN112565586A (en) 2019-09-26 2019-09-26 Automatic focusing method and device

Country Status (1)

Country Link
CN (1) CN112565586A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106713761A (en) * 2017-01-11 2017-05-24 中控智慧科技股份有限公司 Image processing method and apparatus
CN108712609A (en) * 2018-05-17 2018-10-26 Oppo广东移动通信有限公司 Focusing process method, apparatus, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527100A (en) * 2022-10-09 2022-12-27 范孝徐 Identification analysis system and method for associated database
CN115527100B (en) * 2022-10-09 2023-05-23 广州佳禾科技股份有限公司 Identification analysis system and method for association database

Similar Documents

Publication Publication Date Title
US11483268B2 (en) Content navigation with automated curation
CN101910936B (en) Guided photography based on image capturing device rendered user recommendations
CN108401112B (en) Image processing method, device, terminal and storage medium
US20110314049A1 (en) Photography assistant and method for assisting a user in photographing landmarks and scenes
TW201011696A (en) Information registering device for detection, target sensing device, electronic equipment, control method of information registering device for detection, control method of target sensing device, information registering device for detection control progr
TWI586160B (en) Real time object scanning using a mobile phone and cloud-based visual search engine
US20140133764A1 (en) Automatic curation of digital images
US11514713B2 (en) Face quality of captured images
WO2015145769A1 (en) Imaging device, information processing device, photography assistance system, photography assistance program, and photography assistance method
CN110581950B (en) Camera, system and method for selecting camera settings
CN103945116A (en) Apparatus and method for processing image in mobile terminal having camera
CN108780568A (en) A kind of image processing method, device and aircraft
CN107203646A (en) A kind of intelligent social sharing method and device
JP6323548B2 (en) Imaging assistance system, imaging apparatus, information processing apparatus, imaging assistance program, and imaging assistance method
CN103353879B (en) Image processing method and apparatus
CN110047115B (en) Star image shooting method and device, computer equipment and storage medium
CN111314620B (en) Photographing method and apparatus
CN112565586A (en) Automatic focusing method and device
CN114697539A (en) Photographing recommendation method and device, electronic equipment and storage medium
CN112184722A (en) Image processing method, terminal and computer storage medium
CN116109922A (en) Bird recognition method, bird recognition apparatus, and bird recognition system
CN114677620A (en) Focusing method, electronic device and computer readable medium
CN113989387A (en) Camera shooting parameter adjusting method and device and electronic equipment
CN117459830B (en) Automatic zooming method and system for mobile equipment
JP2017184021A (en) Content providing device and content providing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination