CN115037869A - Automatic focusing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN115037869A
Authority
CN
China
Prior art keywords
image
frame image
target object
prediction
information
Prior art date
Legal status
Pending
Application number
CN202110246096.XA
Other languages
Chinese (zh)
Inventor
颜光宇
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110246096.XA
Publication of CN115037869A
Legal status: Pending

Abstract

The embodiment of the application discloses an automatic focusing method, an automatic focusing device, electronic equipment and a computer readable storage medium. The method comprises the following steps: determining first image area information corresponding to a target object in a current frame image; predicting first prediction region information of the target object in an Nth frame image after the current frame image according to the first image region information, wherein N is a positive integer; taking the first prediction area information as an observed value, processing the first image area information and the observed value through a filter, and predicting to obtain second prediction area information of the target object in an Nth frame image after the current frame image; and controlling a focusing module to perform focusing operation based on the second prediction area information. The automatic focusing method, the automatic focusing device, the electronic equipment and the computer readable storage medium can improve the accuracy of automatic focusing and improve the imaging effect of the target object.

Description

Automatic focusing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of image technologies, and in particular, to an auto-focusing method, an auto-focusing apparatus, an electronic device, and a computer-readable storage medium.
Background
When the image is collected by the camera device, focusing is required to be carried out firstly, so that the shot target object is ensured to be clearly imaged in the image. For a target object in a moving process, the camera device needs to track the target object in real time and determine the real object distance of the target object, so that the lens is moved to a focal plane with a clear target, and the moving target is guaranteed to be clearly focused.
However, in actual use of the camera device, in order to ensure the imaging effect after image acquisition of the target object, the acquired image usually needs to be fully processed first, including auto-focusing, auto-exposure, white balance processing, noise reduction, and the like. As a result, the tracking of the target object is delayed, and the tracking result of the camera device deviates from the real position of the target object, which causes inaccurate auto-focusing and reduces the imaging effect of the target object.
Disclosure of Invention
The embodiment of the application discloses an automatic focusing method, an automatic focusing device, electronic equipment and a computer readable storage medium, which can improve the accuracy of automatic focusing and improve the imaging effect of a target object.
The embodiment of the application discloses an automatic focusing method, which comprises the following steps:
determining first image area information corresponding to a target object in a current frame image;
predicting first prediction region information of the target object in an Nth frame image after the current frame image according to the first image region information, wherein N is a positive integer;
taking the first prediction area information as an observed value, processing the first image area information and the observed value through a filter, and predicting to obtain second prediction area information of the target object in an Nth frame image after the current frame image;
and controlling a focusing module to perform focusing operation based on the second prediction area information.
The embodiment of the application discloses an automatic focusing device, which includes:
the information determining module is used for determining first image area information corresponding to the target object in the current frame image;
a first prediction module, configured to predict, according to the first image region information, first prediction region information of the target object in an nth frame image after the current frame image, where N is a positive integer;
the second prediction module is used for processing the first image area information and the observation value through a filter by taking the first prediction area information as the observation value, and predicting to obtain second prediction area information of the target object in an Nth frame image after the current frame image;
and the control module is used for controlling the focusing module to execute focusing operation based on the second prediction area information.
The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor is enabled to realize the method.
An embodiment of the present application discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method as described above.
According to the automatic focusing method and device, the electronic device and the computer readable storage medium disclosed in the embodiments of the present application, first image area information corresponding to a target object in a current frame image is determined; first prediction area information of the target object in an Nth frame image after the current frame image is predicted according to the first image area information; the first prediction area information is taken as an observed value, the first image area information and the observed value are processed through a filter, and second prediction area information of the target object in the Nth frame image after the current frame image is predicted; and a focusing module is controlled to perform a focusing operation based on the second prediction area information. By predicting the focusing area of the target object in the Nth frame image after the current frame image and focusing based on that focusing area, the deviation between the tracking result and the real position of the target object is reduced, and the accuracy of automatic focusing is improved. In addition, the first prediction area information is used as the observed value of the filter for prediction, so that the accuracy of the prediction result is improved, and the imaging effect of the target object is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1A is a block diagram illustrating an embodiment of an electronic device in which target focus tracking is achieved in an ideal state;
FIG. 1B is a block diagram illustrating an embodiment of an electronic device implementing target focus tracking in an actual use state;
FIG. 2 is a block diagram of image processing circuitry in one embodiment;
FIG. 3 is a flow diagram of an auto focus method in one embodiment;
FIG. 4 is a flowchart of an auto-focusing method in another embodiment;
FIG. 5A is a diagram illustrating the determination of a target subject in one embodiment;
FIG. 5B is a diagram illustrating the determination of a target subject in another embodiment;
FIG. 6 is a flow diagram for obtaining first prediction region information in one embodiment;
FIG. 7 is a flow diagram for obtaining second prediction region information in one embodiment;
FIG. 8 is a diagram illustrating an embodiment of predicting an Nth frame image of a target object after a current frame image by using a filter;
FIG. 9 is a block diagram of an autofocus device in one embodiment;
fig. 10 is a block diagram of the electronic device in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the examples and figures of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, the first prediction region information may be referred to as second prediction region information, and similarly, the second prediction region information may be referred to as first prediction region information, without departing from the scope of the present application. Both the first prediction region information and the second prediction region information are predicted image region information, but they are not the same predicted image region information.
FIG. 1A is a block diagram illustrating an electronic device implementing target focus tracking in an ideal state, according to an embodiment. As shown in fig. 1A, the electronic device may include an image sensor 110, a target determination module 120, a tracking module 130, and a focusing module 140. The focus tracking refers to tracking and focusing a moving target, and the target determination module 120 may be configured to determine a target object to be tracked, during which the image sensor 110 may capture image data. The tracking module 130 may determine the position of the target object in each frame of image based on the image data captured by the image sensor 110, and track the target object. The focusing module 140 may perform focusing based on the tracking result of the tracking module 130, and may determine the real object distance of the target object by using the position of the target object output by the tracking module 130 in the image as a focusing area, so as to move the lens to a focal plane with a clear target, thereby ensuring that the target object is focused clearly.
In the focusing process under the ideal state, the focusing module 140 is disposed behind the tracking module 130, so that it can be ensured that the tracking result received by the focusing module 140 each time is the position of the target object in the current frame image, thereby ensuring accurate focusing. However, in an actual focus tracking process, in order to ensure the imaging effect, the image data captured by the image sensor 110 is usually required to be processed sufficiently, which may include but is not limited to auto-focusing, auto-exposure, white balance processing, noise reduction, and so on, and therefore, the tracking module 130 is disposed behind the focusing module 140.
Fig. 1B is a block diagram of an embodiment of an electronic device in an actual usage state to achieve target focus tracking. As shown in fig. 1B, after the tracking module 130 is disposed behind the focusing module 140, the focusing module 140 is located at a front position in the image capturing process, which may cause a tracking result output by the tracking module 130 to be fed back to the focusing module 140 in a delayed manner, that is, the tracking result output by the tracking module 130 is a tracking result of images of previous frames, rather than a tracking result of a target object in an image that needs to be focused by the focusing module 140. For a target object in a moving state, the delay feedback of the tracking module 130 may cause a deviation between a tracking result of the target object and a real position of the target object, so that a focusing area of the focusing module 140 is inaccurate, the target object is not clear in a collected image, and an imaging effect of the target object is reduced.
The embodiment of the application provides an automatic focusing method, an automatic focusing device, electronic equipment and a computer readable storage medium, which can predict a focusing area of a target object in an Nth frame image after a current frame image and focus based on the predicted focusing area, so that the deviation between a tracking result and a real position of the target object is reduced, and the accuracy of automatic focusing is improved.
The embodiment of the application provides electronic equipment. The electronic device includes therein an Image Processing circuit, which may be implemented using hardware and/or software components, and may include various Processing units defining an ISP (Image Signal Processing) pipeline. FIG. 2 is a block diagram of an image processing circuit in one embodiment. For ease of illustration, FIG. 2 illustrates only aspects of image processing techniques related to embodiments of the present application.
As shown in fig. 2, the image processing circuit includes an ISP processor 240 and control logic 250. The image data captured by the imaging device 210 is first processed by the ISP processor 240, and the ISP processor 240 analyzes the image data to capture image statistics that may be used to determine one or more control parameters of the imaging device 210. Imaging device 210 may include one or more lenses 212 and an image sensor 214. The image sensor 214 may include an array of color filters (e.g., Bayer filters), and the image sensor 214 may acquire the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that may be processed by the ISP processor 240. The attitude sensor 220 (e.g., a three-axis gyroscope, hall sensor, accelerometer, etc.) may provide parameters of the acquired image processing (e.g., anti-shake parameters) to the ISP processor 240 based on the type of interface of the attitude sensor 220. The attitude sensor 220 interface may employ an SMIA (Standard Mobile imaging architecture) interface, other serial or parallel camera interfaces, or a combination thereof.
It should be noted that, although only one imaging device 210 is shown in fig. 2, in the embodiment of the present application, at least two imaging devices 210 may be included, and each imaging device 210 may respectively correspond to one image sensor 214, or a plurality of imaging devices 210 may correspond to one image sensor 214, which is not limited herein. The operation of each image forming apparatus 210 can refer to the above description.
Further, image sensor 214 may also send raw image data to pose sensor 220, pose sensor 220 may provide raw image data to ISP processor 240 based on the type of interface of pose sensor 220, or pose sensor 220 may store raw image data in image memory 230.
The ISP processor 240 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 240 may perform one or more image processing operations on the raw image data, gathering statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
The ISP processor 240 may also receive image data from the image memory 230. For example, the gesture sensor 220 interface sends raw image data to the image memory 230, and the raw image data in the image memory 230 is then provided to the ISP processor 240 for processing. The image Memory 230 may be a portion of a Memory device, a storage device, or a separate dedicated Memory within an electronic device, and may include a DMA (Direct Memory Access) feature.
Upon receiving raw image data from the image sensor 214 interface or from the attitude sensor 220 interface or from the image memory 230, the ISP processor 240 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to image memory 230 for additional processing before being displayed. The ISP processor 240 receives the processed data from the image memory 230 and performs image data processing on the processed data in the raw domain and in the RGB and YCbCr color spaces. The image data processed by ISP processor 240 may be output to display 260 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the ISP processor 240 may also be sent to the image memory 230, and the display 260 may read image data from the image memory 230. In one embodiment, image memory 230 may be configured to implement one or more frame buffers.
The statistics determined by ISP processor 240 may be sent to control logic 250. For example, the statistical data may include image sensor 214 statistics such as vibration frequency of a gyroscope, auto exposure, auto white balance, auto focus, flicker detection, black level compensation, lens 212 shading correction, and the like. Control logic 250 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of imaging device 210 and ISP processor 240 based on the received statistical data. For example, the control parameters of the imaging device 210 may include attitude sensor 220 control parameters (e.g., gain, integration time of exposure control, anti-shake parameters, etc.), camera flash control parameters, camera anti-shake displacement parameters, lens 212 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as lens 212 shading correction parameters.
The auto-focusing method provided by the embodiment of the present application is exemplarily described with reference to the image processing circuit of fig. 2. The image sensor 214 in the image processing circuit captures the original image data of the first frame and then transmits the original image data to the ISP processor 240, and the ISP processor 240 can determine the target object contained in the original image data to obtain information such as the initial position and the occupied initial area size of the target object in the image of the first frame. The ISP processor 240 may send information such as the initial position and the occupied initial area size to the control logic 250, and the control logic 250 generates corresponding lens control parameters to control the lens of the imaging device 210 to move, so as to realize focusing on the target.
After the focusing of the imaging device 210 is completed, the image sensor 214 continues to capture raw image data and transmit the raw image data to the ISP processor 240, and the ISP processor 240 may process the raw image data to obtain a current frame image. The ISP processor 240 may determine first image area information corresponding to the target object in the current frame image, predict first prediction area information of the target object in an Nth frame image after the current frame image according to the first image area information, and process the first image area information and the observation value through a filter by using the first prediction area information as an observation value, so as to predict second prediction area information of the target object in the Nth frame image after the current frame image. The ISP processor 240 may send the second prediction region information to the control logic 250, and the control logic 250 determines a focusing region according to the second prediction region information and generates corresponding lens control parameters to control the lens to move, so that the imaging device 210 focuses on the focusing region, thereby implementing accurate focusing on the target object.
As shown in fig. 3, in an embodiment, an auto-focusing method is provided, which can be applied to the above-mentioned electronic devices. The electronic devices may include, but are not limited to, a mobile phone, a tablet computer, a smart wearable device, a vehicle-mounted terminal, a monitoring system, a notebook computer, a PC (Personal Computer), and the like; one or more cameras may be disposed in the electronic device, and the electronic device may also be connected to one or more external cameras. The auto-focusing method may include the following steps:
in step 310, the first image area information corresponding to the target object in the current frame image is determined.
The target object may refer to a target object of interest of the electronic device, that is, a main shooting object of the camera, and the target object may include, but is not limited to, various objects such as a person, an animal, and a non-living object, and the target object may be the entire object or only a part of the object, for example, the target object may be the entire person or a face of the person, and the target object may be the entire building or a gate or a floor of the building.
The electronic equipment can acquire images containing the target object acquired through the camera in real time and process each frame of recently acquired images to obtain first image area information corresponding to the target object in each frame of images. The current frame image may refer to an image that a processor of the electronic device has recently processed based on raw image data captured by an image sensor of the camera.
The electronic equipment can identify the current frame image, determine an image area occupied by the target object in the current frame image, and acquire first image area information corresponding to the image area. Alternatively, the first image area information may include at least one of position information and size information, wherein the position information may refer to position coordinates of the target object in the current frame image, and the size information may refer to an area size of the image area occupied by the target object in the current frame image.
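For illustration only, the first image area information can be thought of as a small record holding position and size; the field names below are assumptions made for this sketch and are not terms defined by the application.

from dataclasses import dataclass

@dataclass
class RegionInfo:
    # Illustrative container for the image area information of the target object:
    # (x, y) is the position of the region in the image, (w, h) its width and height.
    x: float
    y: float
    w: float
    h: float

# Example: first image area information of the target object in the current frame image.
first_region = RegionInfo(x=320.0, y=240.0, w=80.0, h=120.0)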
Step 320, predicting the first prediction area information of the target object in the Nth frame image after the current frame image according to the first image area information.
The electronic device can predict an image area occupied by the target object in an Nth frame image after the current frame image according to first image area information of the target object in the current frame image to obtain first prediction area information. The first prediction region information may include at least one of first prediction position information and first prediction size information.
N may be a positive integer, and in some embodiments, N may be set according to actual requirements, for example, 1, 2, 3, and so on. In some embodiments, N may be used to characterize the delay frame number, which may refer to the number of frames of delayed feedback from the tracking module 130 to the focusing module 140, i.e., the difference between the frame number corresponding to the tracking result fed back by the tracking module 130 and the frame number of the image that the focusing module 140 focuses based on that tracking result. For example, when the tracking module 130 identifies the 2nd frame image and feeds back the tracking result of the target object to the focusing module 140, the focusing module 140 focuses the 5th frame image based on the tracking result, so there is a delay frame number of 3 frames between the tracking module 130 and the focusing module 140.
Alternatively, the number of delay frames may be related to the performance of the processor: the faster the processing speed of the processor, the smaller the number of delay frames may be, and the slower the processing speed, the larger the number of delay frames may be. The delay frame number may be obtained through extensive testing before the electronic device leaves the factory, may be obtained by performing a field test each time the electronic device tracks a new target object, or may be retested by the electronic device at fixed intervals.
Taking the case where the electronic device performs a field test each time it tracks a new target object as an example, each time the electronic device needs to track a new target object, it may determine the target object to be tracked from an initial frame image and determine the area information of the target object in the initial frame image. The electronic device may control the focusing module to focus with the area information of the target object in the initial frame image as the focusing area, acquire a new image, perform area prediction based on the new image to obtain a first predicted focusing area, and control the focusing module to focus with the first predicted focusing area. The frame number of the image captured when the focusing module focuses using the area information of the target object in the initial frame image as the focusing area is recorded, the frame number of the image captured when the focusing module focuses using the first predicted focusing area is recorded, and the difference between the two recorded frame numbers is calculated; this difference is the delay frame number.
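A minimal sketch of the field test just described, assuming the device can record which frame index produced the tracking result and which frame index was focused using it; the numbers reuse the example from the description above.

def measure_delay_frames(tracked_frame_index: int, focused_frame_index: int) -> int:
    # Delay frame number N: difference between the frame whose tracking result was
    # fed back and the frame that the focusing module focused using that result.
    return focused_frame_index - tracked_frame_index

# Example consistent with the earlier description: the tracking result of the
# 2nd frame image is used when focusing the 5th frame image, so N = 3.
print(measure_delay_frames(2, 5))  # -> 3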
Step 330, taking the first prediction region information as an observed value, processing the first image region information and the observed value through a filter, and predicting to obtain second prediction region information of the target object in an Nth frame image after the current frame image.
In embodiments of the present application, the filter may comprise a Kalman filter, which is a data filtering model that uses a linear system state and can estimate the state of a dynamic system from a series of data in the presence of measurement noise based on known conditions. The Kalman filter may predict a state of the target object in the Nth frame image after the current frame image according to the first image region information, may use the first prediction region information as the observation value of the Kalman filter, and may correct the state predicted by the Kalman filter with the observation value, thereby obtaining the second prediction region information of the target object in the Nth frame image after the current frame image. Since both the real data of the target object in the current frame image (such as the first image area information) and the first prediction area information in the Nth frame image after the current frame image are used for the filter-based prediction, a more accurate prediction result can be obtained.
In the related art, the Kalman filter needs a measurement of the state at the target moment, and the estimated state is corrected by taking the actually measured value as the observation value. In the embodiment of the present application, the state of the target object in the Nth frame image after the current frame image cannot be measured yet at the current moment, so the first prediction region information is used as the observation value instead.
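As a hedged illustration of how an observation value corrects a predicted state, the sketch below applies the textbook Kalman correction step, with the observation z standing in for the first prediction region information; it is not the application's own correction model, and the matrices H and R are assumptions of this example.

import numpy as np

def kalman_correct(c_pred, P_pred, z, H, R):
    # Textbook Kalman correction: pull the predicted state c_pred toward the
    # observation z (here, the first prediction region information).
    S = H @ P_pred @ H.T + R                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    c_new = c_pred + K @ (z - H @ c_pred)        # corrected state (second prediction)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return c_new, P_new

# Toy 1-D example: the filter's own prediction is 136, the observation is 134;
# the corrected value lies between them, weighted by the assumed covariances.
c, P = kalman_correct(np.array([136.0]), np.array([[4.0]]),
                      z=np.array([134.0]), H=np.eye(1), R=np.array([[1.0]]))
print(c)  # -> [134.4]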
Step 340, controlling the focusing module to perform a focusing operation based on the second prediction region information.
The second prediction region information may include at least one of second prediction position information and second prediction size information, the focusing module may determine a focusing region according to the second prediction region information, may use the second prediction position information as a position of the focusing region, may use the second prediction size information as a size of the focusing region, and performs a focusing operation on the focusing region.
In some embodiments, the focusing module may perform the focusing operation in a focusing manner such as phase focusing or contrast focusing. In the phase focusing manner, the lens position of the camera can be adjusted according to the phase difference information of each pixel point included in the focusing area in the original image data captured by the image sensor, so that the phase difference of each pixel point included in the focusing area becomes 0 and the focusing area becomes a clear image area, thereby focusing on the focusing area. In the contrast focusing manner, the definition (sharpness) of the focusing area in the original image data captured by the image sensor can be calculated, and the lens position of the camera is adjusted based on the definition; the best lens position can be found by using a hill-climbing algorithm, and the definition of the focusing area corresponding to the best lens position is the highest, thereby focusing on the focusing area. It should be noted that the focusing module may also perform the focusing operation in other manners, which is not limited herein.
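The hill-climbing idea in the contrast focusing manner can be sketched as below, assuming a sharpness() callback that scores the focusing area at a given lens position; the simplified single-direction search is an assumption of this example, not the actual focusing module.

def contrast_autofocus(sharpness, lens_positions):
    # Walk through candidate lens positions and keep the one whose focusing area
    # has the highest sharpness; stop once the sharpness starts to drop.
    best_pos = lens_positions[0]
    best_score = sharpness(best_pos)
    for pos in lens_positions[1:]:
        score = sharpness(pos)
        if score > best_score:
            best_pos, best_score = pos, score
        else:
            break  # past the peak of the sharpness curve
    return best_pos

# Illustrative sharpness curve peaking at lens position 42.
print(contrast_autofocus(lambda p: -(p - 42) ** 2, list(range(100))))  # -> 42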
In the embodiment of the application, the first image area information corresponding to the target object in the current frame image is determined; the first prediction area information of the target object in the Nth frame image after the current frame image is predicted according to the first image area information; the first prediction area information is used as an observation value, the first image area information and the observation value are processed through a filter, and the second prediction area information of the target object in the Nth frame image after the current frame image is predicted; and the focusing module is controlled to perform the focusing operation based on the second prediction area information. By predicting the focusing area of the target object in the Nth frame image after the current frame image and focusing based on that focusing area, the deviation between the tracking result of the target object and its real position can be reduced, and the accuracy of automatic focusing is improved. Moreover, the first prediction region information is used as the observed value of the filter for prediction, so that the accuracy of the prediction result is improved, and the imaging effect of the target object is improved.
As shown in fig. 4, in an embodiment, another auto-focusing method is provided, which can be applied to the electronic device described above, and the auto-focusing method can include the following steps:
step 402, determining a target object from the initial frame image, and obtaining an initial image area of the target object in the initial frame image.
The initial frame image may refer to the image in which the electronic device obtains the target object to be tracked for the first time. The electronic device may determine the image position of the target object in the initial frame image and obtain the initial image region of the target object in the initial frame image based on that image position. Optionally, the image area occupied by the target object in the initial frame image may be identified, and the area of a circumscribed quadrangle (e.g., a rectangle, a square, etc.) of the occupied image area may be used as the initial image area; or the initial image area may be selected according to a set image area size with the image position of the target object in the initial frame image as the center.
In some embodiments, the manner in which the electronic device determines the target object from the initial frame image may include, but is not limited to, the following:
the method comprises the steps of carrying out subject recognition on an image acquired by a camera through a subject recognition model obtained through pre-training, determining a target object, and when a new target object is recognized by the subject recognition model, taking the image of the new target object as an initial frame image, wherein the electronic equipment can control the camera to focus on the new target object.
The subject recognition model may be trained according to a large number of sample images annotated with subject labels. Each training image may be labeled with the image position where the subject is located and the size of the occupied image area, and may further be labeled with the category of the subject; for example, if the subject in the sample image is a human face, it may be labeled as a face image, if the subject is a scene, it may be labeled as a scene image, and if the subject is food, it may be labeled as a food image, but the labels are not limited thereto. The training images can be input into a neural network for training, and the parameters of the neural network are continuously adjusted according to the recognition result output by the neural network and the labeling information carried by the sample image, so that the error between the recognition result output by the neural network and the labeling information is smaller than an error threshold, thereby obtaining the subject recognition model.
FIG. 5A is a diagram illustrating the determination of a target subject, under an embodiment. As shown in fig. 5A, the image in fig. 5A is input into the subject recognition model, and if the subject recognition model can recognize that the target object is a human face, the initial image area 510 can be determined according to a circumscribed quadrangle of an area occupied by the human face in the image. The target object is determined by using the main body recognition model obtained by machine learning, so that the recognition efficiency and accuracy can be improved.
In the second manner, the electronic device may display the image acquired by the camera, and when a target selection operation performed by the user on the displayed image is detected, the target object may be determined according to the target selection operation, and the image in which the target selection operation is detected may be used as the initial frame image. Optionally, the target selection operation may include, but is not limited to, a touch operation, a voice operation, a line-of-sight interaction operation, and the like.
Taking a touch operation as an example, the electronic device may obtain the touch coordinates of the touch operation on the screen and convert the touch coordinates into image coordinates in the displayed image, so that the target object can be determined according to the image coordinates. An image area where the target object is located may be selected according to the set image area size with the converted image coordinates as the center, and this image area is the initial image area. FIG. 5B is a diagram illustrating the determination of a target subject in another embodiment. As shown in fig. 5B, the electronic device may display the image in fig. 5B, and if the user touches the position of the flower in the image, the electronic device may select the initial image area 520 according to the set image area size, with the image coordinates obtained by converting the touch coordinates as the center.
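A small sketch of the coordinate conversion and initial-area selection described above, assuming the preview image is scaled uniformly to fill the screen; the scaling model and the numbers are illustrative assumptions.

def touch_to_image_coords(touch_x, touch_y, screen_w, screen_h, image_w, image_h):
    # Map touch coordinates on the screen to coordinates in the displayed image,
    # assuming the image is stretched to cover the whole screen.
    return touch_x * image_w / screen_w, touch_y * image_h / screen_h

def initial_area_from_center(cx, cy, area_w, area_h):
    # Select an image area of a set size centered on the converted image coordinates;
    # returned as (left, top, width, height).
    return cx - area_w / 2, cy - area_h / 2, area_w, area_h

cx, cy = touch_to_image_coords(540, 960, 1080, 1920, 4000, 3000)
print(initial_area_from_center(cx, cy, 200, 200))  # -> (1900.0, 1400.0, 200, 200)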
Taking voice operation as an example, the electronic device may receive voice information of a user, recognize the voice information to obtain voice content, and if the voice content includes main body information, may search and determine a target object in a displayed image according to the main body information. The subject information may include, but is not limited to, a category name of the target object, a position of the target object in the displayed image, and the like, for example, if "face" is included in the voice information, a face may be searched in the displayed image and the face may be determined as the target object, and if "animal" is included in the voice information, an animal may be searched in the displayed image and the searched animal may be determined as the target object, and the like.
Taking the line-of-sight interaction operation as an example, after the electronic device displays an image, it may acquire an eye image of the user through an eye tracking system, extract eye features from the eye image, analyze the eye features to obtain the gaze point of the user's eyes on the screen, and determine the target object according to the gaze point. An image area where the target object is located may be selected according to the set image area size with the gaze point as the center, and this image area is the initial image area. Determining the target object based on the user's target selection operation makes the determined target object better meet the user's requirements and improves user engagement.
It should be noted that other ways to determine the target object in the initial frame image may be adopted, and the method is not limited to the above-mentioned ways.
Step 404, matching the current frame image with the initial image area, and determining a target area with the highest similarity with the initial image area in the current frame image.
The electronic device may track the target object based on the initial image region of the target object in the initial frame image: it may match the current frame image with the initial image region and determine a target region in the current frame image that matches the initial image region. In some embodiments, the image features of the initial image region may be extracted; when the current frame image is matched with the initial image region, the image features of the current frame image may be extracted and compared with the image features of the initial image region, the image feature in the current frame image most similar to the image feature of the initial image region may be taken as the matched image feature, and the target region may be determined according to the image region in the current frame image that matches the initial image region.
In some embodiments, the current frame image and the initial image region may also be matched by using a correlation filtering method: the current frame image and the initial image region may be respectively converted into the frequency domain, and the correlation between the current frame image and the initial image region is calculated in the frequency domain, where the region with the highest correlation in the current frame image is the target region with the highest similarity to the initial image region. Further, a frequency-domain element-wise (dot-product) multiplication may be performed on the current frame image and the initial image region to obtain the correlation between different image regions of the current frame image and the initial image region. By replacing spatial-domain convolution with frequency-domain element-wise multiplication, the amount of calculation can be reduced and the data processing efficiency improved.
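A minimal numpy sketch of the frequency-domain matching idea: the spatial correlation between the current frame image and the initial image region is computed as an element-wise product of their Fourier transforms. The normalization, windowing and online updating of a real correlation-filter tracker are omitted.

import numpy as np

def correlation_match(frame, template):
    # Locate the region of `frame` most correlated with `template` (both 2-D
    # grayscale arrays); returns the (row, col) of the best match's top-left corner.
    th, tw = template.shape
    f = frame - frame.mean()               # zero-mean so brightness does not dominate
    t = np.zeros_like(f)
    t[:th, :tw] = template - template.mean()
    # Element-wise multiplication in the frequency domain replaces spatial correlation.
    corr = np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(t))).real
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(0)
frame = rng.standard_normal((120, 160))
template = frame[40:60, 70:100].copy()     # stands in for the initial image region
print(correlation_match(frame, template))  # best match at row 40, col 70 in this synthetic example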
In other embodiments, a twin (Siamese) neural network model may also be used to match the current frame image with the initial image region: the current frame image and the initial image region may be respectively input into two neural network branches with the same parameters, where one branch may output a first result for each image region based on the image features of different image regions of the current frame image, and the other branch may output a second result based on the initial image region, and the first results are compared with the second result. A smaller difference between a first result and the second result indicates more similar images, so the image region corresponding to the first result with the smallest difference from the second result may be selected as the target region with the highest similarity to the initial image region in the current frame image.
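The shared-weights comparison logic of the twin network can be caricatured as below: both branches apply the same parameters (here a single random linear projection standing in for a trained network, purely as an assumption), and the candidate region whose branch output is closest to that of the initial image region is chosen.

import numpy as np

rng = np.random.default_rng(1)
shared_weights = rng.standard_normal((64, 20 * 20))   # both branches share these parameters

def branch(patch):
    # Stand-in for one branch of the twin neural network model.
    return shared_weights @ patch.reshape(-1)

def best_matching_region(frame, init_region, size=20, stride=10):
    # Compare each candidate region's "first result" with the initial region's
    # "second result"; the smallest difference marks the target region.
    target = branch(init_region)
    best, best_diff = None, np.inf
    for r in range(0, frame.shape[0] - size + 1, stride):
        for c in range(0, frame.shape[1] - size + 1, stride):
            diff = np.linalg.norm(branch(frame[r:r + size, c:c + size]) - target)
            if diff < best_diff:
                best, best_diff = (r, c), diff
    return best

frame = rng.standard_normal((100, 100))
init_region = frame[30:50, 60:80].copy()
print(best_matching_region(frame, init_region))  # -> (30, 60)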
The method of determining the target area is not limited to the above-described methods, and may be determined by other methods, which is not limited in the present application.
And step 406, acquiring first image area information corresponding to the target object in the current frame image according to the target area.
After determining the target area with the highest similarity to the initial image area in the current frame image, the electronic device may use the position coordinates of the target area as the position information of the target object in the current frame image, where the position coordinates of the target area may be the center coordinates of the target area or corner coordinates selected from the target area (e.g., lower-left coordinates, upper-right coordinates, etc.), and may use the area size of the target area as the size information of the target object in the current frame image, so as to obtain the first image area information.
In the embodiment of the application, the first image area information of the target object in the current frame image is determined through the initial image area of the target object in the initial frame image, so the target object only needs to be determined in the initial frame image, and subject recognition does not need to be performed on every frame image, which can improve the image processing efficiency.
In step 408, the first prediction region information of the target object in the nth frame image after the current frame image is predicted according to the first image region information.
In one embodiment, as shown in FIG. 6, the step of predicting the first prediction region information of the target object in the Nth frame image after the current frame image according to the first image region information may include steps 602-604.
Step 602, calculating the corresponding change speed information of the target object in the current frame image according to the corresponding second image area information of the target object in the historical frame image and the first image area information.
The historical frame image is an image acquired by the camera before the current frame image; the historical frame image may be the frame one frame, two frames, three frames, or the like, before the current frame image. For example, if the current frame image is the 8th frame image, the 7th frame image, the 6th frame image, the 5th frame image, and the like may be used as historical frame images. The difference between the frame number of the historical frame image and that of the current frame image can be set according to actual requirements. Optionally, the electronic device may calculate, in combination with one or more historical frame images, the change speed information corresponding to the target object in the current frame image.
In some embodiments, the difference between the number of frames between the historical frame image and the current frame image has a positive correlation with a delay time, which can be used to characterize the time for the tracking module 130 to the focusing module 140 to delay the feedback.
As an embodiment, the delay time may be determined according to the time taken to predict the first prediction region information and the second prediction region information of the target object in the Nth frame image after the current frame image. The electronic device may be provided with a timer, and may record the start time at which the prediction of the first prediction region information is started and the feedback time at which the second prediction region information is fed back to the focusing module; the time difference between the feedback time and the start time is the time taken to predict the first prediction region information and the second prediction region information, and this time difference may be used as the delay time.
As another embodiment, the delay time may correspond to the number of delay frames, and the delay time may be determined according to the frame rate of the camera and the number of delay frames. Specifically, the capture duration of each frame of image acquired by the camera may be calculated according to the frame rate of the camera, and the product of the capture duration and the number of delay frames is taken as the delay time. Alternatively, the number of delay frames may be determined according to the time taken to predict the first prediction region information and the second prediction region information of the target object in the Nth frame image after the current frame image.
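A worked example of the relationship just described, with an assumed frame rate and delay frame number.

frame_rate = 30.0                               # frames per second (assumed)
delay_frames = 3                                # delay frame number N (assumed)
capture_duration = 1.0 / frame_rate             # duration of one captured frame, in seconds
delay_time = capture_duration * delay_frames    # product of capture duration and delay frames
print(round(delay_time, 3))                     # -> 0.1, i.e. about 0.1 s of delayed feedback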
Further, the difference between the frame numbers of the historical frame image and the current frame image may be proportional to the aforementioned delay frame number N, i.e., the difference between the frame numbers of the historical frame image and the current frame image is an integral multiple of N. The electronic device may first acquire the delay frame number N and then acquire the historical frame image according to the delay frame number N. For example, if the current frame image is the ith frame image, the historical frame image may be the (i-N)th frame image, the (i-2N)th frame image, etc., but is not limited thereto.
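One way the historical frame information could be kept and looked up once N is known is sketched below; the history container and its size are assumptions of this example.

from collections import deque

N = 3                                   # delay frame number (assumed)
history = deque(maxlen=2 * N + 1)       # region info of the most recent 2N+1 frames

def record(region_info):
    history.append(region_info)

def historical_region(frames_back):
    # frames_back = N returns the (i-N)th frame's info, frames_back = 2*N the (i-2N)th.
    return history[-1 - frames_back]

for i in range(7):                      # record frames 0..6; frame 6 is the current frame
    record({"frame": i})
print(historical_region(N))             # -> {'frame': 3}, i.e. the (i-N)th frame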
The second image region information may include at least one of position information and size information of the target object in the history frame image. The change speed information may include information such as a change speed, which may include at least one of a position change speed and a size change speed, and a change acceleration, which may include at least one of a position change acceleration and a size change acceleration.
Step 604, predicting first prediction region information of the target object in an nth frame image after the current frame image according to the change speed information.
In some embodiments, the position change speed of the target object in the current frame image may be determined according to the position information of the target object in the current frame image and the position information in the historical frame images. The position change acceleration of the target object in the current frame image can be determined according to the position change speed of the target object in the current frame image and the position change speed in the historical frame image.
Further, the delay frame number N may be taken as a unit of time. The position change speed of the target object in the current frame image may be the difference between the position information of the target object in the current frame image and the position information in the historical frame image, and the position change acceleration of the target object in the current frame image may be the difference between the position change speed of the target object in the current frame image and the position change speed in the historical frame image. The calculation of the position change speed and the position change acceleration of the target object in the current frame image can be represented by the following equations (1) to (4):
$\dot{x}_i = x_i - x_{i-N}$    (1);
$\dot{y}_i = y_i - y_{i-N}$    (2);
$\ddot{x}_i = \dot{x}_i - \dot{x}_{i-N}$    (3);
$\ddot{y}_i = \dot{y}_i - \dot{y}_{i-N}$    (4);
wherein $x_i$ represents the position coordinate of the target object in the X-axis direction in the ith frame image; $x_{i-N}$ represents the position coordinate of the target object in the X-axis direction in the (i-N)th frame image; $\dot{x}_i$ represents the position change speed of the target object in the X-axis direction in the ith frame image; $y_i$ represents the position coordinate of the target object in the Y-axis direction in the ith frame image; $y_{i-N}$ represents the position coordinate of the target object in the Y-axis direction in the (i-N)th frame image; $\dot{y}_i$ represents the position change speed of the target object in the Y-axis direction in the ith frame image; $\dot{x}_{i-N}$ represents the position change speed of the target object in the X-axis direction in the (i-N)th frame image; $\ddot{x}_i$ represents the position change acceleration of the target object in the X-axis direction in the ith frame image; $\dot{y}_{i-N}$ represents the position change speed of the target object in the Y-axis direction in the (i-N)th frame image; $\ddot{y}_i$ represents the position change acceleration of the target object in the Y-axis direction in the ith frame image. The X-axis direction and the Y-axis direction may be expressed in an image coordinate system or a pixel coordinate system of the ith frame image.
As a specific embodiment, the first predicted position coordinate of the target object in the X-axis direction in the Nth frame image after the current frame image may be predicted according to the position change speed and the position change acceleration of the target object in the X-axis direction in the current frame image; the first predicted position coordinate of the target object in the Y-axis direction in the Nth frame image after the current frame image may be predicted according to the position change speed and the position change acceleration of the target object in the Y-axis direction in the current frame image. The first predicted position information of the target object in the Nth frame image after the current frame image can be determined as shown in equations (5) to (6):
$\hat{x}_{i+N} = x_i + \dot{x}_i + \ddot{x}_i$    (5);
$\hat{y}_{i+N} = y_i + \dot{y}_i + \ddot{y}_i$    (6);
wherein $\hat{x}_{i+N}$ represents the first predicted position coordinate of the target object in the X-axis direction in the Nth frame image after the ith frame image (i.e., the (i+N)th frame image), and $\hat{y}_{i+N}$ represents the first predicted position coordinate of the target object in the Y-axis direction in the (i+N)th frame image. The first predicted position information may include $(\hat{x}_{i+N}, \hat{y}_{i+N})$.
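The position part of the extrapolation, as reconstructed in equations (1) to (6) above, can be transcribed directly; treating N frames as one unit of time and the exact extrapolation form are assumptions consistent with that reconstruction. The size prediction in equations (7) to (12) below follows the same pattern, with width and height in place of the coordinates.

def first_predicted_position(x_prev2, x_prev1, x_curr, y_prev2, y_prev1, y_curr):
    # *_prev2, *_prev1, *_curr: coordinates in the (i-2N)th, (i-N)th and ith frame images.
    vx = x_curr - x_prev1               # (1) position change speed, X direction
    vy = y_curr - y_prev1               # (2) position change speed, Y direction
    ax = vx - (x_prev1 - x_prev2)       # (3) position change acceleration, X direction
    ay = vy - (y_prev1 - y_prev2)       # (4) position change acceleration, Y direction
    x_pred = x_curr + vx + ax           # (5) first predicted X coordinate in frame i+N
    y_pred = y_curr + vy + ay           # (6) first predicted Y coordinate in frame i+N
    return x_pred, y_pred

# Target drifting right with a slight acceleration (illustrative numbers).
print(first_predicted_position(100, 110, 122, 200, 200, 200))  # -> (136, 200)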
In some embodiments, the size change speed of the target object in the current frame image may be determined according to the size information of the target object in the current frame image and the size information in the history frame image. The size change acceleration of the target object in the current frame image can be determined according to the size change speed of the target object in the current frame image and the size change speed in the historical frame image.
Further, the delay frame number N may be taken as a unit of time. The size change speed of the target object in the current frame image may be the difference between the size information of the target object in the current frame image and the size information in the historical frame image, and the size change acceleration of the target object in the current frame image may be the difference between the size change speed of the target object in the current frame image and the size change speed in the historical frame image. The size information may include the height and the width of the image area corresponding to the target object in the image, and the calculation of the size change speed and the size change acceleration of the target object in the current frame image can be represented by equations (7) to (10):
$\dot{w}_i = w_i - w_{i-N}$    (7);
$\dot{h}_i = h_i - h_{i-N}$    (8);
$\ddot{w}_i = \dot{w}_i - \dot{w}_{i-N}$    (9);
$\ddot{h}_i = \dot{h}_i - \dot{h}_{i-N}$    (10);
wherein $w_i$ represents the width of the image area corresponding to the target object in the ith frame image; $w_{i-N}$ represents the width of the image area corresponding to the target object in the (i-N)th frame image; $\dot{w}_i$ represents the width change speed of the image area corresponding to the target object in the ith frame image; $h_i$ represents the height of the image area corresponding to the target object in the ith frame image; $h_{i-N}$ represents the height of the image area corresponding to the target object in the (i-N)th frame image; $\dot{h}_i$ represents the height change speed of the image area corresponding to the target object in the ith frame image; $\dot{w}_{i-N}$ represents the width change speed of the image area corresponding to the target object in the (i-N)th frame image; $\ddot{w}_i$ represents the width change acceleration of the image area corresponding to the target object in the ith frame image; $\dot{h}_{i-N}$ represents the height change speed of the image area corresponding to the target object in the (i-N)th frame image; $\ddot{h}_i$ represents the height change acceleration of the image area corresponding to the target object in the ith frame image. Optionally, the width of the image area may be the side length of the image area parallel to the X-axis direction of the image coordinate system, and the height of the image area may be the side length of the image area parallel to the Y-axis direction of the image coordinate system.
As a specific embodiment, the first predicted width value of the image area corresponding to the target object in the Nth frame image after the current frame image may be predicted according to the width change speed and the width change acceleration of the image area corresponding to the target object in the current frame image; the first predicted height value of the image area corresponding to the target object in the Nth frame image after the current frame image may be predicted according to the height change speed and the height change acceleration of the image area corresponding to the target object in the current frame image. The first predicted size information of the target object in the Nth frame image after the current frame image can be determined as shown in equations (11) to (12):
$\hat{w}_{i+N} = w_i + \dot{w}_i + \ddot{w}_i$    (11);
$\hat{h}_{i+N} = h_i + \dot{h}_i + \ddot{h}_i$    (12);
wherein $\hat{w}_{i+N}$ represents the first predicted width value of the image area corresponding to the target object in the (i+N)th frame image, and $\hat{h}_{i+N}$ represents the first predicted height value of the image area corresponding to the target object in the (i+N)th frame image. The first predicted size information may include $\hat{w}_{i+N}$ and $\hat{h}_{i+N}$.
Step 410, taking the first prediction region information as an observation value, processing the first image region information and the observation value through a filter, and predicting to obtain second prediction region information of the target object in the Nth frame image after the current frame image.
In one embodiment, as shown in fig. 7, the step of processing the first image area information and the observation value by using the first prediction area information as the observation value and using a filter to predict the second prediction area information of the target object in the nth frame image after the current frame image may include steps 702 to 704.
Step 702, calculating the first image area information through the prediction model to obtain a predicted value of the filter.
The filter may include a prediction model and a correction model. In this embodiment, the filter may be a Kalman filter, which is an efficient recursive filter that infers the next state of the system from the predicted value, the observed value, and their associated noise. The prediction model can be used to predict the system state at the next moment, and the correction model can correct the predicted system state at the next moment according to the prediction noise and the observation noise and in combination with the observation value.
In one embodiment, the step of calculating the first image region information by the prediction model to obtain the filter prediction value may include: obtaining the corresponding change speed of the target object in the current frame image; calculating the first image area information and the change speed based on the prediction model to obtain the estimated state value corresponding to the target object at the next moment; and taking the estimated state value as the filter prediction value.
The prediction model may be constructed under the condition that the target object is assumed to change at a constant speed, and the state of the target object at the next time is estimated based on the corresponding change speed of the target object in the current frame image. In some embodiments, a current state value corresponding to the target object at the current time may be determined according to the first image region information and the change speed, and the current state value may be used to represent a system state of the target object in the current frame image, and the current state value is calculated through a state transition matrix of the prediction model to obtain an estimated state value corresponding to the target object at the next time. The number of delayed frames N can be set as a unit time, and the estimated state value corresponding to the target object at the next time can be used to describe the predicted state of the target object in the nth frame image after the current frame image.
As a specific embodiment, the estimated state value of the target object at the next time may be calculated according to the following equations (13) to (14):

c_{i+N} = A·c_i − B·u_{i+N} (13);

P_{i+N} = A·P_i·A^T + Q (14);

where A is the state transition matrix, B is the input control matrix, u_{i+N} is the external control quantity, P is the error matrix, and Q is the predicted noise covariance matrix. c_i represents the current state value (i.e., the state value corresponding to the i-th frame image), and c_{i+N} represents the estimated state value corresponding to the next time (i.e., the estimated state value corresponding to the (i+N)-th frame image).
In the embodiment of the present application, it is assumed that the target object moves at a uniform speed and is not disturbed, so the state transition matrix A is the constant-velocity transition matrix (each position or size component is propagated by its corresponding change speed over one unit time of N frames), the external control quantity u_{i+N} = 0, and the predicted noise covariance matrix may be taken as Q = q·I, where q is a debugging parameter and I is the identity matrix.
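A minimal Python sketch of the prediction step of equations (13) to (14), assuming a four-dimensional state (two observed components plus their change speeds), the constant-velocity transition matrix described above, and Q = q·I; the concrete names and values are illustrative.

```python
import numpy as np

def kalman_predict(c, P, q=1e-2):
    """Prediction step of the filter: propagate the state with a
    uniform-speed (constant-velocity) model, with u_{i+N} = 0."""
    # State transition matrix for a state [p1, p2, v1, v2], with the delay
    # of N frames taken as one unit of time.
    A = np.array([[1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.],
                  [0., 0., 0., 1.]])
    Q = q * np.eye(4)            # predicted noise covariance, q is a tuning knob
    c_pred = A @ c               # equation (13) with the control term B*u = 0
    P_pred = A @ P @ A.T + Q     # equation (14)
    return c_pred, P_pred

# Example: position state [x, y, vx, vy] of the target in the current frame.
c_i = np.array([320., 240., 4., -2.])
P_i = np.eye(4)
c_next, P_next = kalman_predict(c_i, P_i)
print(c_next)   # [324. 238.   4.  -2.]
```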
In some embodiments, the first image area information includes position information, the change speed includes a position change speed, and the filter prediction value includes filter prediction position information. The current state value of the target object, c_i, may consist of the position of the target object in the i-th frame image and its position change speed, and can be used to describe the position state of the target object in the i-th frame image. The estimated state value c_{i+N} at the next time can be obtained based on the current state value c_i, and c_{i+N} can be used to describe the predicted position state of the target object in the (i+N)-th frame image, i.e., serves as the filter prediction position information.
In some embodiments, the first image area information includes size information, the change speed includes a size change speed, and the filter prediction value includes filter prediction size information. The current state value of the target object, c_i, may consist of the width and height of the image area corresponding to the target object in the i-th frame image and their change speeds, and can be used to describe the size state of the image area corresponding to the target object in the i-th frame image. The estimated state value c_{i+N} at the next time can be obtained based on the current state value c_i, and c_{i+N} can be used to describe the estimated size state of the image area corresponding to the target object in the (i+N)-th frame image, i.e., serves as the filter prediction size information.
Step 704, using the first prediction region information as an observation value of the correction model, and correcting the predicted value of the filter in the correction model by using the observation value to obtain second prediction region information of the target object in an nth frame image after the current frame image.
In some embodiments, the first prediction region information may be input to the correction model as an observation value, the filter prediction value may be corrected by a noise matrix and the observation value of the correction model to obtain a corrected filter prediction value, and the second prediction region information may be obtained based on the corrected filter prediction value. Alternatively, since the filter prediction value is a state value, a value of the target dimension in the corrected filter prediction value may be taken as the second prediction region information of the target object in the nth frame image after the current frame image, for example, a value of the first two dimensions in the corrected filter prediction value may be taken as the second prediction region information of the target object in the nth frame image after the current frame image.
The noise matrix may include a measurement noise covariance matrix, a predicted noise covariance matrix, and the like. As a specific embodiment, the filter prediction value may be corrected according to the following equations (15) to (17):

K_{i+N} = P_{i+N}·H^T·(H·P_{i+N}·H^T + R)^{-1} (15);

ĉ_{i+N} = c_{i+N} + K_{i+N}·(z_{i+N} − H·c_{i+N}) (16);

P_{i+N} = (I − K_{i+N}·H)·P_{i+N} (17);

where K is the Kalman gain, z_{i+N} is the observation value corresponding to the estimated next time (which may be the first prediction region information in the embodiment of the present application), H denotes the observation matrix, R denotes the measurement noise covariance matrix, I denotes the identity matrix, and ĉ_{i+N} denotes the corrected filter prediction value.
In the embodiment of the present application, the observation matrix H selects, from the state value, the dimensions corresponding to the observed quantities (for example, its first two dimensions), and the measurement noise covariance matrix (measurement error matrix) may be taken as R = r·I, where r is a debugging parameter.
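A minimal Python sketch of the correction step of equations (15) to (17), assuming the observation matrix selects the first two state dimensions and R = r·I; the observation z is the first prediction region information, and the concrete names and values are illustrative.

```python
import numpy as np

def kalman_update(c_pred, P_pred, z, r=1e-1):
    """Correction step: fuse the predicted state with the observation z
    (the first prediction region information) via the Kalman gain."""
    H = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.]])          # observe the first two dimensions
    R = r * np.eye(2)                          # measurement noise, r is a tuning knob
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # equation (15)
    c_corr = c_pred + K @ (z - H @ c_pred)     # equation (16)
    P_corr = (np.eye(4) - K @ H) @ P_pred      # equation (17)
    return c_corr, P_corr

# Example: correct the predicted position state with the first predicted position.
c_pred = np.array([324., 238., 4., -2.])
P_pred = 1.01 * np.eye(4)
z = np.array([326., 236.])                     # first prediction (acceleration-based)
c_corr, _ = kalman_update(c_pred, P_pred, z)
print(c_corr[:2])                              # second predicted position information
```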
The current state value is calculated by using the state transition matrix to obtain the estimated state value corresponding to the target object at the next moment; meanwhile, the error matrix P at the next moment can be predicted, the Kalman gain K at the next moment is updated based on the predicted error matrix P, and the filter is iteratively updated in this manner.
In some embodiments, the filter prediction value includes filter prediction position information, and the first prediction region information may include first prediction position information; the observation value z_{i+N} corresponding to the next time may then be composed of the first predicted position information. The estimated state value c_{i+N} at the next time is obtained based on the current state value c_i of the target object, and c_{i+N} is corrected by using the observation value. Since the dimensions of the corrected filter prediction value are kept consistent with those of the current state value c_i, the values of its first two dimensions can be taken as the second predicted position information output by the Kalman filter.
In some embodiments, the filter prediction value includes filter prediction size information, and the first prediction region information may include first prediction size information; the observation value z_{i+N} corresponding to the next time may then be composed of the first prediction size information. The estimated state value c_{i+N} at the next time is obtained based on the current state value c_i of the target object, and c_{i+N} is corrected by using the observation value. Since the dimensions of the corrected filter prediction value are kept consistent with those of the current state value c_i, the values of its first two dimensions can be taken as the second predicted size information output by the Kalman filter. It is understood that the target dimension may be adjusted according to the matrix composition of the state values in actual use, and taking the first two dimensions is only an exemplary implementation and is not used to limit the embodiments of the present application.
Fig. 8 is a diagram illustrating prediction, by a filter, of the target object in an Nth frame image after the current frame image in one embodiment. As shown in fig. 8, the position information and the size information of the target object in the current frame image may be input to the Kalman prediction model, which obtains predicted position information based on the position information of the target object in the current frame image and obtains predicted size information based on the size information of the target object in the current frame image. The first predicted position information and the first predicted size information of the target object in the Nth frame image after the current frame image can be input to the Kalman correction model as observation values. The Kalman correction model can correct the position information predicted by the Kalman prediction model according to the first predicted position information to obtain the second predicted position information of the target object in the Nth frame image after the current frame image, and can correct the size information predicted by the Kalman prediction model according to the first predicted size information to obtain the second predicted size information of the target object in the Nth frame image after the current frame image.
In step 412, the focusing module is controlled to perform a focusing operation based on the second prediction area information.
The focusing module can determine a focusing area based on second predicted position information and second predicted size information predicted by the Kalman filter and execute focusing operation.
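As a rough sketch of how the focusing area could be assembled from the second predicted position and size, assuming the predicted position is the centre of the image area (the description also allows a corner coordinate) and clamping the region to the frame; all names are illustrative.

```python
def focus_region(cx, cy, w, h, frame_w, frame_h):
    """Build a focusing rectangle from the second predicted position (cx, cy)
    and size (w, h), clamped to the frame so the region stays valid."""
    left = min(max(cx - w / 2.0, 0.0), frame_w - 1)
    top = min(max(cy - h / 2.0, 0.0), frame_h - 1)
    right = min(max(cx + w / 2.0, 1.0), frame_w)
    bottom = min(max(cy + h / 2.0, 1.0), frame_h)
    return left, top, right - left, bottom - top  # (x, y, width, height)

# Example: second predicted centre (326, 236) and size (123, 164) in a 640x480 frame.
print(focus_region(326, 236, 123, 164, 640, 480))
```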
In some embodiments, N may also be a frame number greater than the delay frame number, and then when an nth frame image after the current frame image is acquired by the camera, the focusing module is controlled to perform the focusing operation based on the second prediction region information corresponding to the nth frame image after the current frame image, so that the focusing process is more flexible and accurate.
In the above embodiments, the position coordinates of the target object may be the center coordinates of the corresponding image area of the target object in the image, or may be the corner coordinates of the corresponding image area, and the like, which is not limited herein.
In the embodiment of the application, the position and the size of the target object in the Nth frame image after the current frame image are predicted based on a Kalman filtering method and by combining the speed and the acceleration of the position and size change, so that the focusing area of the focusing module is corrected, the target object can accurately fall into the focusing area in a scene in which the target object is in a motion state, the focusing accuracy is improved, the problem of unclear resolution of the target object in the motion process is optimized, and the imaging effect of the target object is improved.
In the prediction process of the target object, if the target is predicted only with the first derivative, that is, if the target motion velocity is assumed to be constant, the motion change perception is weak, and it is difficult to accurately predict the target of the motion change. In the embodiment of the application, the first prediction area information based on the variation acceleration is used as the observation value, the uniform velocity is used as the filter prediction value of the kalman filter, and the filter prediction value is corrected by combining the noise matrix, so that more accurate prediction information can be obtained, and the prediction accuracy is improved.
In addition, the target position and the size are respectively predicted in two dimensions, the position is accurately predicted, meanwhile, the target object can be always contained in the focusing area through the prediction of the scale change, the error of the focusing area of the target object is effectively reduced, and the focusing effect of the target object is optimized.
In other embodiments, besides the Kalman filtering method described above, other algorithms may be used to predict the region of the target object in the Nth frame image after the current frame image. For example, an optical flow method may be used, where optical flow refers to the instantaneous velocity of the target object at the pixel level in the imaging plane, and the optical flow method computes the optical flow from the rate of change of the gray level at corresponding pixel positions of the target object across two consecutive frame images. The optical flow method may include, but is not limited to, a gradient-based optical flow method, a matching-based optical flow method, a phase-based optical flow method, and the like.
In some embodiments, taking a gradient-based optical flow method as an example, corner points of the image area corresponding to the target object in the current frame image may be detected, a motion vector of each corner point (the motion vector may be used to describe a moving direction and a moving speed) may be calculated based on the spatial-consistency assumption and the least squares method, and the moving direction and moving speed corresponding to the target object may be obtained from the motion vectors of the corner points, so that the position information of the target object in the Nth frame image after the current frame image is predicted.
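A minimal sketch of such a gradient-based variant using OpenCV's corner detection and pyramidal Lucas-Kanade optical flow; the parameters, helper names, and the simple averaging of motion vectors are illustrative assumptions, not taken from the patent description.

```python
import cv2
import numpy as np

def predict_position_optical_flow(prev_gray, cur_gray, box, N):
    """Track corner points inside the target's box between two consecutive
    grayscale frames, then extrapolate the box centre N frames ahead."""
    x, y, w, h = box                          # integer (x, y, width, height)
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255

    # Corner points of the target's image area in the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    if pts is None:
        return x + w / 2.0, y + h / 2.0

    # Pyramidal Lucas-Kanade optical flow (a gradient-based method).
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good_old = pts[status.ravel() == 1].reshape(-1, 2)
    good_new = nxt[status.ravel() == 1].reshape(-1, 2)
    if len(good_new) == 0:
        return x + w / 2.0, y + h / 2.0

    # Average per-frame motion vector, extrapolated N frames ahead.
    dx, dy = (good_new - good_old).mean(axis=0)
    return x + w / 2.0 + dx * N, y + h / 2.0 + dy * N
```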
Alternatively, for predicting the size information of the target object in the Nth frame image after the current frame image, the depth information of the target object in the current frame image may be obtained. The depth information may be used to indicate the distance between the target object and the camera, and may be obtained by infrared ranging, laser ranging, binocular ranging, and the like, which is not limited herein. Depth change information can be calculated by using the depth information of the target object in a historical frame image and the depth information in the current frame image, the size change information of the target object in the Nth frame image after the current frame image is predicted according to the depth change information, and the predicted size information of the target object in the Nth frame image after the current frame image is then obtained based on the size change information.
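A minimal sketch of this depth-based size prediction, under the additional assumption (not stated in the description) that the image size of the target scales inversely with its depth, as in a pinhole camera model; names and the linear depth extrapolation are illustrative.

```python
def predict_size_from_depth(w_cur, h_cur, depth_hist, N):
    """Extrapolate the target's depth N frame steps ahead from its two most
    recent depth samples, then scale the current box size by the depth ratio."""
    d_prev, d_cur = depth_hist[-2], depth_hist[-1]
    d_pred = max(d_cur + (d_cur - d_prev) * N, 1e-3)   # linear depth extrapolation

    scale = d_cur / d_pred      # a closer target produces a larger image area
    return w_cur * scale, h_cur * scale

# Example: target approaching the camera (depth 2.0 m, then 1.8 m one step later).
print(predict_size_from_depth(120, 160, [2.0, 1.8], N=3))
```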
In the embodiment of the application, the position information and the size information of the target object in the Nth frame image after the current frame image are predicted, so that the focusing area in the Nth frame image after the current frame image can be accurately predicted, the focusing accuracy is improved, and the imaging effect of the target object is ensured.
As shown in fig. 9, in one embodiment, an auto-focusing apparatus 900 is provided, which can be applied to the electronic device. The auto-focusing apparatus 900 may include an information determining module 910, a first predicting module 920, a second predicting module 930, and a control module 940.
An information determining module 910, configured to determine first image region information corresponding to the target object in the current frame image.
The first prediction module 920 is configured to predict first prediction region information of the target object in an nth frame image after the current frame image according to the first image region information, where N is a positive integer.
And a second prediction module 930, configured to use the first prediction region information as an observation value, process the first image region information and the observation value through a filter, and predict second prediction region information of the target object in an nth frame image after the current frame image.
In one embodiment, N is a delay frame number determined based on the durations of the first prediction region information and the second prediction region information of the prediction target object in the nth frame image after the current frame image.
A control module 940 for controlling the focusing module to perform a focusing operation based on the second prediction region information.
In the embodiment of the application, the first image area information corresponding to the target object in the current frame image is determined, the first prediction area information of the target object in the nth frame image after the current frame image is predicted according to the first image area information, the first prediction area information is used as an observation value, the first image area information and the observation value are processed through a filter, the second prediction area information of the target object in the nth frame image after the current frame image is predicted, the focusing module is controlled to perform focusing operation based on the second prediction area information, the focusing area of the target object in the nth frame image after the current frame image is predicted, and focusing is performed based on the focusing area, so that the deviation between the tracking result of the target object and the real position can be reduced, and the accuracy of automatic focusing is improved. And the first prediction region information is used as the observed value of the filter for prediction, so that the accuracy of the prediction result is improved, and the imaging effect of the target object is improved.
In one embodiment, the auto-focusing apparatus 900 further includes an object determination module in addition to the information determination module 910, the first prediction module 920, the second prediction module 930, and the control module 940.
And the object determining module is used for determining the target object from the initial frame image and acquiring an initial image area of the target object in the initial frame image.
The information determining module 910 is further configured to match the current frame image with the initial image region, determine a target region with the highest similarity to the initial image region in the current frame image, and obtain first image region information corresponding to the target object in the current frame image according to the target region.
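A minimal sketch of such similarity matching using OpenCV template matching; the choice of normalized cross-correlation and all names are illustrative assumptions rather than the patent's specified matching method.

```python
import cv2

def locate_target(cur_frame_gray, init_region_gray):
    """Find the region of the current frame most similar to the initial
    image region of the target object (normalized cross-correlation)."""
    res = cv2.matchTemplate(cur_frame_gray, init_region_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    h, w = init_region_gray.shape[:2]
    x, y = max_loc
    # First image area information: position (top-left corner) and size.
    return (x, y, w, h), max_val
```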
In one embodiment, the first prediction module 920 includes a velocity calculation unit and a first prediction unit.
And the speed calculating unit is used for calculating the corresponding change speed information of the target object in the current frame image according to the second image area information and the first image area information of the target object in the historical frame image, and the historical frame image is an image acquired by a camera before the current frame image.
In one embodiment, the difference between the number of frames of the history frame image and the current frame image has a positive correlation with a delay time, and the delay time is determined according to the durations of the first prediction region information and the second prediction region information of the prediction target object in the nth frame image after the current frame image.
In one embodiment, the difference in frame number between the historical frame image and the current frame image is an integer multiple of N.
In one embodiment, the change speed includes a position change speed and a position change acceleration, and the first predicted area information includes first predicted position information; and/or
The change speed includes a size change speed and a size change acceleration, and the first prediction area information includes first prediction size information.
A first prediction unit for predicting first prediction region information of the target object in an nth frame image following the current frame image based on the change speed information.
In one embodiment, the filter includes a prediction model and a modification model.
The second prediction module 930 includes a second prediction unit and a modification unit.
And the second prediction unit is used for calculating the first image area information through the prediction model to obtain a predicted value of the filter.
In an embodiment, the second prediction unit is further configured to obtain a corresponding change speed of the target object in the current frame image, calculate the first image area information and the change speed based on the prediction model, obtain an estimated state value corresponding to the target object at the next time, and use the estimated state value as the predicted filter value.
In an embodiment, the second prediction unit is further configured to determine a current state value of the target object at the current time according to the first image area information and the change speed, and calculate the current state value through a state transition matrix of the prediction model to obtain an estimated state value of the target object at the next time.
In one embodiment, the first image area information includes position information, the change speed includes a position change speed, and the filter prediction value includes filter prediction position information; and/or
The first image area information includes size information, the change speed includes a size change speed, and the filter prediction value includes filter prediction size information.
And the correcting unit is used for correcting the predicted value of the filter in the correction model by using the observation value by taking the first prediction area information as the observation value of the correction model to obtain second prediction area information of the target object in an Nth frame image after the current frame image.
In an embodiment, the correcting unit is further configured to correct the filter predicted value by correcting the noise matrix and the observation value of the model to obtain a corrected filter predicted value, and take a value of a target dimension in the corrected filter predicted value as second prediction region information of the target object in an nth frame image after the current frame image.
In the embodiment of the application, based on a Kalman filtering method, the position and the size of a target object in an Nth frame image after a current frame image are predicted by combining the speed and the acceleration of position and size change, so that a focusing area of a focusing module is corrected, the target object can accurately fall into the focusing area when the target object is in a motion state scene, the focusing accuracy is improved, the problem of unclear target object in the motion process is optimized, and the imaging effect of the target object is improved.
In the prediction process of the target object, if the target is predicted only with the first derivative, that is, if the target motion velocity is assumed to be constant, the motion change perception is weak, and it is difficult to accurately predict the target of the motion change. In the embodiment of the application, the first prediction area information based on the variation acceleration is used as the observation value, the uniform velocity is used as the filter prediction value of the kalman filter, and the filter prediction value is corrected by combining the noise matrix, so that more accurate prediction information can be obtained, and the prediction accuracy is improved.
In addition, in the embodiment of the application, the target position and the target size are respectively predicted from two dimensions, and the target object can be always contained in the focusing area through prediction of scale change while the position prediction is determined to be accurate, so that the error of the focusing area of the target object is effectively reduced, and the focusing effect on the target object is optimized.
FIG. 10 is a block diagram showing the structure of an electronic apparatus according to an embodiment. As shown in fig. 10, electronic device 1000 may include one or more of the following components: a processor 1010, a memory 1020 coupled to the processor 1010, wherein the memory 1020 may store one or more computer programs that may be configured to be executed by the one or more processors 1010 to implement the methods as described in the various embodiments above.
Processor 1010 may include one or more processing cores. The processor 1010 interfaces with various components throughout the electronic device 1000 using various interfaces and circuitry, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and invoking data stored in the memory 1020. Alternatively, the processor 1010 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1010 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 1010, but may be implemented by a communication chip.
The Memory 1020 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 1020 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 1020 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may also store data created during use by the electronic device 1000, and the like.
It is understood that the electronic device 1000 may include more or less structural elements than those shown in the above structural block diagrams, for example, a power module, a physical button, a WiFi (Wireless Fidelity) module, a speaker, a bluetooth module, a sensor, etc., and is not limited herein.
The embodiment of the application discloses a computer readable storage medium, which stores a computer program, wherein the computer program realizes the method described in the above embodiment when being executed by a processor.
Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, and the computer program, when executed by a processor, implements the method as described in the embodiments above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a ROM, etc.
Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are exemplary embodiments in nature, and that acts and modules are not necessarily required to practice the invention.
In various embodiments of the present application, it should be understood that the size of the serial number of each process described above does not mean that the execution sequence is necessarily sequential, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer accessible memory. Based on such understanding, the technical solutions of the present application, which essentially or partly contribute to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a memory and includes several requests for causing a computer device (which may be a personal computer, a server, or a network device, etc., and may specifically be a processor in the computer device) to execute some or all of the steps of the above methods of the embodiments of the present application.
The following describes in detail an auto-focusing method, an auto-focusing device, an electronic device, and a computer-readable storage medium, which are disclosed in the embodiments of the present application, and specific examples are applied herein to explain the principles and embodiments of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core idea of the present application. Meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. An auto-focusing method, comprising:
determining first image area information corresponding to a target object in a current frame image;
predicting first prediction region information of the target object in an Nth frame image after the current frame image according to the first image region information, wherein N is a positive integer;
taking the first prediction region information as an observation value, processing the first image region information and the observation value through a filter, and predicting to obtain second prediction region information of the target object in an Nth frame image after the current frame image;
and controlling a focusing module to perform focusing operation based on the second prediction area information.
2. The method according to claim 1, wherein the predicting first prediction region information of the target object in an Nth frame image after the current frame image according to the first image region information comprises:
calculating corresponding change speed information of the target object in the current frame image according to corresponding second image area information of the target object in a historical frame image and the first image area information, wherein the historical frame image is an image acquired by a camera before the current frame image;
and predicting first prediction region information of the target object in an Nth frame image after the current frame image according to the change speed information.
3. The method according to claim 2, wherein a difference in frame number between the history frame image and the current frame image has a positive correlation with a delay time, the delay time being determined according to a duration of predicting the first prediction region information and the second prediction region information of the target object in an nth frame image after the current frame image.
4. The method of claim 3, wherein the difference in frame number between the historical frame image and the current frame image is an integer multiple of the N.
5. The method according to any one of claims 2 to 4, wherein the change speed includes a position change speed and a position change acceleration, and the first predicted area information includes first predicted position information; and/or
The change speed includes a size change speed and a size change acceleration, and the first prediction region information includes first prediction size information.
6. The method of claim 1, wherein the filter comprises a prediction model and a modification model; taking the first prediction area information as an observation value, processing the first image area information and the observation value through a filter, and predicting to obtain second prediction area information of the target object in an nth frame image after the current frame image, wherein the method comprises the following steps of:
calculating the first image area information through the prediction model to obtain a filter prediction value;
and taking the first prediction region information as an observation value of the correction model, and correcting the predicted value of the filter in the correction model by using the observation value to obtain second prediction region information of the target object in an Nth frame image after the current frame image.
7. The method of claim 6, wherein said calculating the first image region information via the prediction model to obtain a filter prediction value comprises:
acquiring the corresponding change speed of the target object in the current frame image;
and calculating the first image area information and the change speed based on the prediction model to obtain an estimated state value corresponding to the target object at the next moment, and taking the estimated state value as a predicted value of the filter.
8. The method according to claim 7, wherein the calculating the first image area information and the change speed based on the prediction model to obtain an estimated state value corresponding to the target object at the next time includes:
determining a current state value corresponding to the target object at the current moment according to the first image area information and the change speed;
and calculating the current state value through the state transition matrix of the prediction model to obtain an estimated state value corresponding to the target object at the next moment.
9. The method according to claim 7 or 8, wherein the first image area information comprises position information, the change speed comprises a position change speed, and the filter prediction value comprises filter prediction position information; and/or
The first image area information includes size information, the change speed includes a size change speed, and the filter prediction value includes filter prediction size information.
10. The method according to claim 6, wherein modifying the filter prediction value in the modification model by using the observation value to obtain second prediction region information of the target object in an nth frame image after the current frame image comprises:
correcting the predicted value of the filter through the noise matrix of the correction model and the observation value to obtain a corrected predicted value of the filter;
and taking the numerical value of the target dimension in the corrected predicted value of the filter as second predicted area information of the target object in an Nth frame image after the current frame image.
11. The method according to any one of claims 1 to 4, 6 to 8, and 10, wherein N is a delay frame number determined according to a time length of predicting the first prediction region information and the second prediction region information of the target object in an nth frame image after the current frame image.
12. The method according to claim 1, wherein before the determining the corresponding first image region information of the target object in the current frame image, the method further comprises:
determining a target object from an initial frame image, and acquiring an initial image area of the target object in the initial frame image;
the determining of the corresponding first image area information of the target object in the current frame image includes:
matching a current frame image with the initial image area, and determining a target area with the highest similarity with the initial image area in the current frame image;
and acquiring first image area information corresponding to the target object in the current frame image according to the target area.
13. An auto-focusing apparatus, comprising:
the information determining module is used for determining first image area information corresponding to the target object in the current frame image;
a first prediction module, configured to predict, according to the first image region information, first prediction region information of the target object in an nth frame image after the current frame image, where N is a positive integer;
the second prediction module is used for processing the first image area information and the observation value through a filter by taking the first prediction area information as the observation value, and predicting to obtain second prediction area information of the target object in an Nth frame image after the current frame image;
and the control module is used for controlling the focusing module to execute focusing operation based on the second prediction area information.
14. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that, when executed by the processor, causes the processor to carry out the method of any one of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 12.