CN110545384B - Focusing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN110545384B
Authority
CN
China
Prior art keywords
frame image
main body
current frame
image
subject
Prior art date
Legal status
Active
Application number
CN201910897424.5A
Other languages
Chinese (zh)
Other versions
CN110545384A
Inventor
贾玉虎
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910897424.5A
Publication of CN110545384A
Application granted
Publication of CN110545384B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • H04N23/61: Control of cameras or camera modules based on recognised objects
    • H04N23/64: Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • H04N23/67: Focus control based on electronic image sensor signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

The application relates to a focusing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a current frame image; performing subject detection on the current frame image to obtain a subject in the current frame image; determining the position of the subject in the current frame image; and focusing based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold. The method and apparatus, electronic device, and computer-readable storage medium can improve focusing accuracy.

Description

Focusing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a focusing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of imaging technology, people increasingly capture images and videos, and record all kinds of information, through image acquisition devices such as the camera on an electronic device. While acquiring an image, the camera generally needs to focus on the photographed object in order to obtain a clear image of it.
However, conventional focusing methods suffer from inaccurate focusing.
Disclosure of Invention
Embodiments of the application provide a focusing method and apparatus, an electronic device, and a computer-readable storage medium, which can improve focusing accuracy.
A focusing method, comprising:
acquiring a current frame image;
performing subject detection on the current frame image to obtain a subject in the current frame image;
determining the position of the subject in the current frame image;
and focusing based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold.
A focusing apparatus, comprising:
a current frame image acquisition module, configured to acquire a current frame image;
a subject detection module, configured to perform subject detection on the current frame image to obtain a subject in the current frame image;
a position determination module, configured to determine the position of the subject in the current frame image;
and a focusing module, configured to focus based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold.
An electronic device includes a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the focusing method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
With the focusing method and apparatus, electronic device, and computer-readable storage medium, a current frame image is acquired; subject detection is performed on the current frame image to obtain a subject in it; the position of the subject in the current frame image is determined; and when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold, the subject's position has not changed much between the two frames and the subject is in a stable state, so focusing based on the subject of the current frame image avoids focusing on a wrong object and improves focusing accuracy.
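To make the flow concrete, the following is a minimal Python sketch of this per-frame decision; the detection and focusing callables and the pixel threshold are hypothetical stand-ins, since the patent prescribes no API:

```python
import math

# Assumed stability threshold in pixels; the patent does not fix a value.
STABILITY_THRESHOLD = 10.0

def focusing_step(frame, prev_subject_pos, detect_subject, focus_at):
    """One iteration of the focusing decision described above.

    detect_subject(frame) -> (x, y): subject position in the frame
    (e.g., the subject's centroid), standing in for the subject
    recognition network. focus_at((x, y)) drives the lens toward that
    image position, standing in for the focusing module and control
    logic. Both callables are hypothetical.
    """
    cur_pos = detect_subject(frame)
    if prev_subject_pos is None:
        focus_at(cur_pos)              # first frame: nothing to compare against
    elif math.dist(cur_pos, prev_subject_pos) < STABILITY_THRESHOLD:
        focus_at(cur_pos)              # subject stable: focus on current frame's subject
    else:
        focus_at(prev_subject_pos)     # unstable: fall back to the previous frame's subject
    return cur_pos                     # becomes prev_subject_pos for the next frame
```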
Drawings
To describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image processing circuit in one embodiment;
FIG. 2 is a flow chart of a focusing method in one embodiment;
FIG. 3 is a flow diagram of steps in one embodiment for determining a location of a subject;
FIG. 4 is a diagram illustrating a conventional focusing technique according to one embodiment;
FIG. 5 is a flow diagram of the subject detection in one embodiment;
FIG. 6 is a flow chart of the subject detection in another embodiment;
FIG. 7 is a block diagram showing the structure of a focusing device in one embodiment;
FIG. 8 is a schematic diagram of the internal structure of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Embodiments of the application provide an electronic device. The electronic device includes an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in one embodiment. For convenience of explanation, FIG. 1 shows only the aspects of the image processing technology related to the embodiments of the present application.
As shown in FIG. 1, the image processing circuit includes an ISP processor 140 and control logic 150. Image data captured by the imaging device 110 is first processed by the ISP processor 140, which analyzes the image data to capture image statistics that may be used to determine and/or control one or more parameters of the imaging device 110. The imaging device 110 may include a camera having one or more lenses 112 and an image sensor 114. The image sensor 114 may include a color filter array (e.g., a Bayer filter); it may acquire the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that can be processed by the ISP processor 140. The attitude sensor 120 (e.g., a three-axis gyroscope, Hall sensor, or accelerometer) may provide image processing parameters (e.g., anti-shake parameters) to the ISP processor 140 based on the interface type of the attitude sensor 120. The attitude sensor 120 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the above.
In addition, the image sensor 114 may also send raw image data to the attitude sensor 120; the attitude sensor 120 may provide the raw image data to the ISP processor 140 based on its interface type, or store the raw image data in the image memory 130.
The ISP processor 140 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 140 may perform one or more image processing operations on the raw image data, gathering statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
The ISP processor 140 may also receive image data from the image memory 130. For example, the attitude sensor 120 interface sends raw image data to the image memory 130, and the raw image data in the image memory 130 is then provided to the ISP processor 140 for processing. The image Memory 130 may be a portion of a Memory device, a storage device, or a separate dedicated Memory within an electronic device, and may include a DMA (Direct Memory Access) feature.
Upon receiving raw image data from the image sensor 114 interface or from the attitude sensor 120 interface or from the image memory 130, the ISP processor 140 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to image memory 130 for additional processing before being displayed. ISP processor 140 receives processed data from image memory 130 and performs image data processing on the processed data in the raw domain and in the RGB and YCbCr color spaces. The image data processed by ISP processor 140 may be output to display 160 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the ISP processor 140 may also be sent to the image memory 130, and the display 160 may read image data from the image memory 130. In one embodiment, image memory 130 may be configured to implement one or more frame buffers.
The statistical data determined by the ISP processor 140 may be transmitted to the control logic 150 unit. For example, the statistical data may include image sensor 114 statistics such as gyroscope vibration frequency, auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, lens 112 shading correction, and the like. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of the imaging device 110 and control parameters of the ISP processor 140 based on the received statistical data. For example, the control parameters of the imaging device 110 may include attitude sensor 120 control parameters (e.g., gain, integration time of exposure control, anti-shake parameters, etc.), camera flash control parameters, camera anti-shake displacement parameters, lens 112 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as lens 112 shading correction parameters.
In one embodiment, the current frame image is acquired by the lens 112 and the image sensor 114 in the imaging device (camera) 110 and sent to the ISP processor 140. After receiving the current frame image, the ISP processor 140 performs subject detection on it to obtain the subject in the current frame image, determines the position of the subject in the current frame image, and acquires the position of the subject in the previous frame image. When the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than the threshold, the position of the subject in the current frame image is sent to the control logic 150.
After acquiring the position of the subject in the current frame image, the control logic 150 controls the lens 112 in the imaging device 110 to move, so as to focus on the position of the subject in the current frame image, thereby improving the accuracy of focusing.
When the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is greater than or equal to the threshold value, the position of the subject in the previous frame image is sent to the control logic 150.
After acquiring the position of the subject in the previous frame image, the control logic 150 controls the lens 112 in the imaging device 110 to move, so as to focus on the position of the subject in the previous frame image, thereby improving the accuracy of focusing when the position of the subject in the current frame image is unstable.
FIG. 2 is a flowchart of a focusing method in one embodiment. As shown in fig. 2, the focusing method includes steps 202 to 206.
Step 202, acquiring a current frame image.
The current frame image refers to an image acquired at the current time. The current frame image may be any one of an RGB (Red, Green, Blue) image, a grayscale image, a depth image, and the like.
In the embodiments provided by this application, the current frame image can be captured by an electronic device. The electronic device can be provided with one or more cameras, for example 1, 2, 3, or 5, without limitation. The form in which a camera is mounted on the electronic device is also not limited: it may, for example, be built into the electronic device or externally attached to it, and it may be a front camera or a rear camera.
The camera on the electronic device may be any type of camera. For example, the camera may be a color camera, a black and white camera, a depth camera, a telephoto camera, a wide angle camera, etc., without being limited thereto.
Correspondingly, and without limitation, a color camera acquires a color (RGB) image, a black-and-white camera acquires a grayscale image, a depth camera acquires a depth image, a telephoto camera acquires a telephoto image, and a wide-angle camera acquires a wide-angle image. The cameras in the electronic device may be of the same type or of different types. For example, all of them may be color cameras or black-and-white cameras; or one may be a telephoto camera while the others are wide-angle cameras; this is not limited here.
Specifically, the current time is obtained, and the camera is controlled to shoot according to the current time to obtain the current frame image.
Step 204, performing subject detection on the current frame image to obtain a subject in the current frame image.
Subject detection means automatically processing the region of interest in a scene while selectively ignoring the regions of no interest. The region of interest is referred to as the subject region.
The subject refers to various objects, such as people, flowers, cats, dogs, cows, blue sky, white clouds, backgrounds, and so on. The subject of the current frame image is whichever subject is required, and can be selected as needed.
In one embodiment, when a candidate subject is detected to exist in the current frame image, the candidate subject is taken as the subject of the current frame image. When it is detected that at least two candidate subjects exist in the current frame image, a subject in the current frame image is determined from the at least two candidate subjects.
In one embodiment, the area of each candidate subject is obtained, and the candidate subject with the largest area is taken as the subject in the current frame image.
Generally, the subject with the largest area is the subject the user intends to shoot. Taking the candidate subject with the largest area as the subject in the current frame image therefore improves the accuracy of subject detection.
In another embodiment, position information of each candidate subject is acquired, and the candidate subject closest to the center of the current frame image is taken as the subject in the current frame image.
Generally, the subject at the center position of the image is the subject photographed by the user. Therefore, the candidate subject closest to the center of the current frame image is taken as the subject in the current frame image, and the accuracy of subject detection can be improved.
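For illustration, the two selection rules can be sketched in Python as follows; the mask representation and helper names are assumptions, not from the patent:

```python
import numpy as np

def pick_largest(candidate_masks):
    """Rule 1: take the candidate subject with the largest area.

    candidate_masks: list of boolean arrays, one per detected candidate.
    """
    areas = [int(m.sum()) for m in candidate_masks]
    return candidate_masks[int(np.argmax(areas))]

def pick_nearest_center(candidate_masks, image_shape):
    """Rule 2: take the candidate whose centroid lies closest to the image center."""
    h, w = image_shape[:2]
    center = np.array([(h - 1) / 2.0, (w - 1) / 2.0])

    def centroid(mask):
        ys, xs = np.nonzero(mask)
        return np.array([ys.mean(), xs.mean()])

    dists = [np.linalg.norm(centroid(m) - center) for m in candidate_masks]
    return candidate_masks[int(np.argmin(dists))]
```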
Step 206, the position of the subject in the current frame image is determined.
The position of the subject in the current frame image may be represented by the position of the centroid of the subject, the center point of the subject, or the feature point of the subject, but is not limited thereto.
For example, the coordinates of the centroid of the subject are (50,60), the position of the subject in the current frame image may be represented as (50, 60); the coordinates of the center point of the subject are (20,35), the position of the subject in the current frame image can be represented as (20, 35); the feature point of the subject is the nose tip, the coordinates of the nose tip are (65, 50), and the position of the subject in the current frame image can be represented as (65, 50).
Step 208, focusing based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than the threshold.
The degree of difference measures how much the subject's position has changed between frames. When the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than the threshold, the two positions differ little, that is, the relative position of the subject and the camera remained stable while the two adjacent frames were captured. When that relative position is stable, the detected position of the subject in the current frame image is highly reliable, and focusing can be performed based on the subject of the current frame image. By judging the stability of the relative position of the subject and the camera, and focusing on the subject of the current frame only when it is stable, the subject can be focused accurately.
Specifically, the position of the subject of the previous frame of image is acquired; the position of the subject of the current frame image is compared with the position of the subject of the previous frame image.
Similarly, when the camera captured the previous frame image (that is, when the previous frame image was itself the current frame image), subject detection was performed on it and its subject was obtained. After the current frame image is captured, the camera captures the next frame image, which in turn becomes the current frame image, and the step of performing subject detection on the current frame image is executed again. In other words, in the chronological order of capture, the frame immediately preceding the image captured at the current time is the previous frame image.
In one embodiment, a distance value between the position of the subject of the current frame image and the position of the subject of the previous frame image may be obtained, and the distance value is taken as the degree of difference. For example, if the position of the subject of the current frame image is (50,60) and the position of the subject of the previous frame image is (55,60), the distance value can be calculated by the following formula:
distance = √((50 − 55)² + (60 − 60)²)
distance = √(25 + 0) = 5
Therefore, the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is 5.
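In Python, this degree of difference is a single Euclidean-distance call (values from the example above):

```python
import math

prev_pos = (55, 60)  # position of the subject in the previous frame image
cur_pos = (50, 60)   # position of the subject in the current frame image
degree_of_difference = math.dist(cur_pos, prev_pos)  # sqrt(25 + 0) = 5.0
```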
Focusing refers to the process of changing the object distance and image distance through the camera's focusing mechanism so that the photographed object forms a clear image. The focusing may be any of CAF (Continuous Auto Focus), PDAF (Phase Detection Auto Focus), TOF (Time of Flight, used for laser focusing), and the like.
When the difference degree between the position of the main body in the current frame image and the position of the main body in the previous frame image is smaller than the threshold value, focusing is performed based on the main body of the current frame image, namely focusing is performed to the position corresponding to the main body of the current frame image.
In one embodiment, the next frame image is acquired by the focused camera; the next frame image is taken as the current frame image, and the step of performing subject detection on the current frame image to obtain the subject in the current frame image is executed again. This cycle repeats, continuously yielding clearer images or video.
When the photographed object is small and a highly textured object exists in the background, the focusing area contains both foreground and background, and focus is easily pulled to the highly textured background. CAF gathers statistics on the FV (focus value) over the focusing area, and PDAF uses the PD (phase difference) value; if the position of the subject can be detected, the background is excluded from these statistics, and the subject can be focused accurately.
In another embodiment, subject detection on the current frame image may be performed in combination with AI (Artificial Intelligence), so that focusing is performed not on a generic region of interest but on the detected subject region.
With this focusing method, a current frame image is acquired; subject detection is performed on it to obtain a subject; the position of the subject in the current frame image is determined; and when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than the threshold, the subject's position has not changed much between the two frames and the subject is in a stable state, so focusing based on the subject of the current frame image avoids focusing on a wrong object and improves focusing accuracy.
In one embodiment, as shown in FIG. 3, determining the position of the subject in the current frame image includes:
Step 302, obtaining the position of each pixel point contained in the subject.
It can be understood that the subject is a region containing a plurality of pixel points. The position of each pixel point contained in the subject is obtained.
In one embodiment, obtaining the position of each pixel point included in the subject comprises: obtaining each candidate pixel point contained in the subject; filtering the candidate pixel points to obtain target pixel points; and obtaining the position of each target pixel point.
A candidate pixel point is a pixel point included in the subject in the current frame image. A target pixel point is a pixel point retained after filtering. The filtering may be Gaussian filtering, smoothing filtering, bilateral filtering, or the like, without limitation.
It can be understood that the detected subject of the current frame image may include some pixel points that do not belong to the subject object, or may include some noise. Filtering the pixel points contained in the subject removes noise points and pixel points that do not belong to the subject, which improves the accuracy of the determined subject position.
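One illustrative realization of this filtering step, using OpenCV; the patent permits Gaussian, smoothing, or bilateral filtering, and the kernel size and re-threshold value here are assumed:

```python
import cv2
import numpy as np

def filter_subject_pixels(candidate_mask: np.ndarray) -> np.ndarray:
    """Filter the candidate pixel points down to target pixel points.

    candidate_mask: uint8 array, 255 at candidate subject pixels and
    0 elsewhere. Gaussian filtering followed by re-thresholding
    suppresses isolated noise pixels.
    """
    blurred = cv2.GaussianBlur(candidate_mask, (5, 5), 0)
    _, target_mask = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)
    return target_mask
```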
Step 304, determining the centroid of the subject according to the positions of the pixel points; the centroid of the subject is used to represent the position of the subject.
The centroid is the center of mass. The position of each pixel point can be represented by coordinates. After the position coordinates of each pixel point are obtained, the position coordinates of the centroid of the subject can be determined from them. The centroid coordinates (X, Y) are calculated as X = sum_x / N and Y = sum_y / N, where sum_x is the sum of the abscissas (x coordinates) of the pixel points contained in the subject, sum_y is the sum of their ordinates (y coordinates), and N is the number of pixel points contained in the subject.
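A direct rendering of these centroid formulas, assuming the subject's target pixel points are given as a boolean mask:

```python
import numpy as np

def subject_centroid(subject_mask: np.ndarray):
    """Centroid (X, Y) of the subject: X = sum_x / N, Y = sum_y / N.

    subject_mask: boolean array, True at each (target) pixel point
    contained in the subject.
    """
    ys, xs = np.nonzero(subject_mask)  # coordinates of the N subject pixels
    n = xs.size
    return xs.sum() / n, ys.sum() / n  # (X, Y)
```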
With this focusing method, the positions of the pixel points contained in the subject are obtained and the centroid of the subject is determined from them; since the centroid is the center of mass of the subject region, it represents the position of the subject more accurately.
In one embodiment, the method further comprises: when the degree of difference between the position of the subject of the current frame image and the position of the subject of the previous frame image is greater than or equal to the threshold, focusing based on the subject of the previous frame image.
When the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is greater than or equal to the threshold, the two positions differ considerably, that is, the relative position of the subject and the camera was unstable while the two adjacent frames were captured. The subject may have moved, the camera may have moved, or both may have moved to different degrees, causing the degree of difference to reach or exceed the threshold.
Specifically, the position of the subject of the previous frame image is acquired and compared with the position of the subject of the current frame image to obtain the degree of difference between the two positions.
When the degree of difference is greater than or equal to the threshold, the position of the subject in the current frame is considered unreliable, so focusing is performed based on the subject of the previous frame image, which improves focusing accuracy in this case.
In one embodiment, the method further comprises: acquiring continuously captured original images; and color-coding each original image to obtain the images to be detected. Acquiring a current frame image then comprises: acquiring the current frame image from the images to be detected.
The original images are the raw images captured by the camera. An image to be detected is an image obtained by color-coding an original image, that is, by encoding its visual information in a chosen color representation.
Color-coding the original image can yield a YUV image to be detected, where "Y" represents luminance (luma), i.e., the grayscale value, and "U" and "V" represent chrominance (chroma), which describes the color and saturation of the image and specifies the color of each pixel. Color coding can raise the brightness of the original image, and noise reduction can be applied, producing a brighter and more accurate image to be detected.
Alternatively, the image to be detected obtained by color coding may be an RGB (Red, Green, Blue) image or another type of image; this is not limited here.
In one embodiment, the method further comprises: when focusing is completed, acquiring a next frame of image; performing main body detection on the next frame image to obtain a main body in the next frame image; determining the position of a subject in the next frame of image; when the difference degree between the position of the subject in the next frame image and the position of the subject in the current frame image is less than the threshold value, focusing is performed based on the subject of the next frame image.
When focusing based on the subject of the current frame image is completed, the lens has been moved to focus on the position corresponding to that subject, and the next frame image is acquired. Subject detection is then performed on the next frame image to obtain its subject; the position of the subject in the next frame image is determined; and when the degree of difference between the position of the subject in the next frame image and the position of the subject in the current frame image is less than the threshold, the lens is moved to focus on the position corresponding to the subject of the next frame image. Repeating this cycle keeps focusing continuously accurate, so accurate and clear preview images, videos, and the like are obtained continuously.
In one embodiment, as shown in FIG. 4, after the electronic device turns on the camera 402, the camera acquires a preview image. The camera 402 sends the preview image to the focusing module 404 at the front end and then to the subject recognition network 406 at the back end. After the focusing module 404 obtains the preview image, it may compute the focusing distance with a preset focusing algorithm and drive the camera 402 to focus. After the subject recognition network 406 acquires the preview image, it performs subject recognition on the image to obtain a subject and sends the corresponding subject information to the focusing module 404. It can be understood that the subject recognition network sits at the back end of the ISP flow, and subject recognition takes a certain amount of time. Therefore, by the time the focusing module 404 receives the subject information, a certain period, for example about 6 frames, has elapsed. When the photographed object is moving, focusing based on a subject detected that long ago may be inaccurate.
Therefore, the subject recognition network 406 can instead be placed at the front end of the ISP flow: the camera 402 sends the preview image to the subject recognition network 406 at the front end, the subject is detected by the subject recognition network 406 and then passed to the focusing module 404, and the focusing module 404 drives the camera to focus according to the subject. This reduces the delay of subject recognition and improves focusing accuracy.
In one embodiment, as shown in FIG. 5, performing subject detection on the current frame image to obtain the subject in the current frame image includes:
Step 502, generating a center weight map corresponding to the current frame image, where the weight values represented by the center weight map decrease gradually from the center to the edges.
The center weight map is a map recording a weight value for each pixel point of the current frame image. The recorded weight values decrease gradually from the center toward the four sides: the center weight is largest, and the weights shrink toward the edges. In other words, the center weight map represents weights that decrease gradually from the center pixel point of the current frame image to its edge pixel points.
The ISP processor or central processor may generate a corresponding center weight map according to the size of the current frame image. The center weight map may be generated using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
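A sketch of generating such a center weight map with a two-dimensional Gaussian follows; the falloff parameter sigma_scale is an assumed value, since the patent requires only that weights decrease from center to edge:

```python
import numpy as np

def center_weight_map(height: int, width: int, sigma_scale: float = 0.5) -> np.ndarray:
    """Center weight map from a two-dimensional Gaussian: weight 1.0
    at the image center, falling off gradually toward the edges."""
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-((xx / (sigma_scale * width)) ** 2
                    + (yy / (sigma_scale * height)) ** 2) / 2.0)
```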
Step 504, inputting the current frame image and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is obtained by training in advance on the image, depth map, center weight map, and corresponding labeled subject mask map of the same scene.
The subject detection model is obtained in advance by collecting a large amount of training data and feeding it into a subject detection model containing initial network weights. Each set of training data comprises an image, a center weight map, and a labeled subject mask map corresponding to the same scene. The image and the center weight map serve as inputs to the model being trained, and the labeled subject mask map serves as the ground truth that the trained model is expected to output. The subject mask map is an image filter template for identifying the subject in an image: it can occlude the other parts of the image and screen out the subject. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, and so on.
Specifically, the ISP processor or central processor may input the current frame image and the center weight map into the subject detection model and run detection to obtain the subject region confidence map. The subject region confidence map records, for each pixel point, the probability of belonging to each recognizable subject; for example, a given pixel point may have a probability of 0.8 of belonging to a person, 0.1 to a flower, and 0.1 to the background.
Step 506, determining the subject in the current frame image according to the subject region confidence map.
The subject refers to various objects, such as people, flowers, cats, dogs, cows, blue sky, white clouds, backgrounds, and so on. The subject of the image is whichever subject is required, and can be selected as needed.
Specifically, the ISP processor or central processor may take the subject with the highest or next-highest confidence in the subject region confidence map as the subject in the current frame image.
With the image processing method of this embodiment, after the center weight map corresponding to the current frame image is generated, the current frame image and the center weight map are input into the subject detection model to obtain a subject region confidence map, from which the subject in the current frame image is determined. The center weight map makes objects at the center of the image easier to detect, and the subject detection model, trained with images, center weight maps, subject mask maps, and the like, identifies the subject in the current frame image more accurately.
In one embodiment, as shown in FIG. 6, determining the subject in the current frame image according to the subject region confidence map includes:
Step 602, processing the subject region confidence map to obtain a subject mask map.
Specifically, the subject region confidence map contains scattered points of low confidence, so the ISP processor or central processor may filter it to obtain the subject mask map. The filtering removes the pixel points of the confidence map whose confidence values fall below a configured confidence threshold. The confidence threshold may be adaptive, fixed, or configured per region.
Step 604, detecting the current frame image, and determining a highlight area in the current frame image.
The highlight region is a region having a luminance value greater than a luminance threshold value.
Specifically, the ISP processor or central processor performs highlight detection on the current frame image, screens out target pixel points whose brightness values exceed a brightness threshold, and performs connected-component processing on those pixel points to obtain the highlight region.
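An illustrative OpenCV sketch of this step; the brightness threshold and the minimum component area are assumed values that the patent leaves unspecified:

```python
import cv2
import numpy as np

BRIGHTNESS_THRESHOLD = 220  # assumed value

def detect_highlight(gray: np.ndarray, min_area: int = 20) -> np.ndarray:
    """Highlight region: brightness threshold, then connected components.

    gray: uint8 single-channel image (e.g., the Y plane of a YUV frame).
    Returns a uint8 mask that is 255 inside the highlight region.
    """
    _, bright = cv2.threshold(gray, BRIGHTNESS_THRESHOLD, 255, cv2.THRESH_BINARY)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(bright, connectivity=8)
    highlight = np.zeros_like(gray)
    for i in range(1, num):                         # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:  # keep components of meaningful size
            highlight[labels == i] = 255
    return highlight
```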
Step 606, determining the highlight-eliminated subject of the current frame image according to the highlight region in the current frame image and the subject mask map.
Specifically, the ISP processor or central processor may perform a difference calculation or a logical AND calculation between the highlight region of the current frame image and the subject mask map to obtain the highlight-eliminated subject in the current frame image.
In this embodiment, filtering the subject region confidence map into a subject mask map improves the reliability of the confidence map; the current frame image is detected to obtain the highlight region, and the highlight region is then processed together with the subject mask map, yielding a subject with highlights eliminated.
In one embodiment, processing the subject region confidence map to obtain a subject mask map includes: performing adaptive confidence-threshold filtering on the subject region confidence map to obtain the subject mask map.
The adaptive confidence threshold is a confidence threshold that varies across the image, and it may be locally adaptive. A locally adaptive confidence threshold is a binarization confidence threshold determined at each pixel point from the pixel-value distribution of the pixel's neighborhood block: image regions with higher brightness are configured with a higher binarization confidence threshold, and regions with lower brightness with a lower one.
Optionally, the adaptive confidence threshold is configured as follows: when the brightness value of a pixel point is greater than a first brightness value, a first confidence threshold is configured; when the brightness value is less than a second brightness value, a second confidence threshold is configured; and when the brightness value is between the second brightness value and the first brightness value, a third confidence threshold is configured, where the second brightness value is less than or equal to the first brightness value, the second confidence threshold is less than the third confidence threshold, and the third confidence threshold is less than the first confidence threshold.
Optionally, the adaptive confidence threshold may instead be configured as: when the brightness value of a pixel point is greater than the first brightness value, the first confidence threshold is configured, and when the brightness value is less than or equal to the first brightness value, the second confidence threshold is configured, where the second confidence threshold is less than the first confidence threshold.
When adaptive confidence-threshold filtering is applied to the subject region confidence map, the confidence value of each pixel point is compared with its corresponding confidence threshold; if the confidence value is greater than or equal to the threshold, the pixel point is retained, and if it is smaller, the pixel point is removed.
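The three-level configuration can be sketched as follows; the concrete brightness and confidence values are assumptions, since the patent fixes only their ordering:

```python
import numpy as np

# Assumed example values; the patent requires only that
# BRIGHT_SECOND <= BRIGHT_FIRST and
# THRESH_SECOND < THRESH_THIRD < THRESH_FIRST.
BRIGHT_FIRST, BRIGHT_SECOND = 180, 80
THRESH_FIRST, THRESH_THIRD, THRESH_SECOND = 0.7, 0.5, 0.3

def adaptive_threshold_filter(confidence: np.ndarray, brightness: np.ndarray) -> np.ndarray:
    """Adaptive confidence-threshold filtering (three-level variant):
    choose a per-pixel threshold from the pixel's brightness, then keep
    pixels whose confidence reaches it. Returns a binary mask (1/0)."""
    thresholds = np.full(confidence.shape, THRESH_THIRD, dtype=np.float32)
    thresholds[brightness > BRIGHT_FIRST] = THRESH_FIRST
    thresholds[brightness < BRIGHT_SECOND] = THRESH_SECOND
    return (confidence >= thresholds).astype(np.uint8)
```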
In one embodiment, performing adaptive confidence-threshold filtering on the subject region confidence map to obtain the subject mask map includes:
performing adaptive confidence-threshold filtering on the subject region confidence map to obtain a binary mask map; and performing morphological processing and guided filtering on the binary mask map to obtain the subject mask map.
Specifically, after the ISP processor or central processor filters the subject region confidence map with the adaptive confidence threshold, the retained pixel points are represented by 1 and the removed pixel points by 0, yielding the binary mask map.
Morphological processing may include erosion and dilation. An erosion operation is first applied to the binary mask map, followed by a dilation operation, to remove noise; guided filtering is then applied to the morphologically processed binary mask map to perform edge filtering and obtain a subject mask map with extracted edges.
Morphological processing and guided filtering ensure that the resulting subject mask map contains few or no noise points and has softer edges.
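An illustrative OpenCV realization of this refinement; cv2.ximgproc.guidedFilter comes from the opencv-contrib package, and the kernel size, radius, and eps values are assumptions:

```python
import cv2
import numpy as np

def refine_binary_mask(binary_mask: np.ndarray, guide: np.ndarray) -> np.ndarray:
    """Erosion, then dilation, then guided filtering of the binary mask.

    binary_mask: uint8 mask (0/255) from adaptive threshold filtering.
    guide: the current frame image, used as the guidance image.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    cleaned = cv2.erode(binary_mask, kernel)   # remove small noise specks
    cleaned = cv2.dilate(cleaned, kernel)      # restore the subject's extent
    # Edge-preserving smoothing so the mask edges follow image edges.
    return cv2.ximgproc.guidedFilter(guide, cleaned, radius=8, eps=50.0)
```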
In one embodiment, determining the highlight-eliminated subject in the current frame image according to the highlight region in the current frame image and the subject mask map includes: performing difference processing between the highlight region in the current frame image and the subject mask map to obtain the highlight-eliminated subject.
Specifically, the ISP processor or central processor performs difference processing between the highlight region of the current frame image and the subject mask map, that is, subtracts the corresponding pixel values, to obtain the subject in the current frame image. Difference processing yields the highlight-free subject with a simple computation.
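A minimal sketch of this difference processing, assuming both the subject mask and the highlight region are given as binary masks:

```python
import numpy as np

def remove_highlight(subject_mask: np.ndarray, highlight: np.ndarray) -> np.ndarray:
    """Difference processing: subtract the highlight region from the
    subject mask, keeping subject pixels that lie outside the highlight.

    Both inputs are uint8 masks (0/255).
    """
    return np.where(highlight > 0, 0, subject_mask).astype(np.uint8)
```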
It should be understood that, although the steps in the flowcharts of FIGS. 2, 3, 5, and 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2, 3, 5, and 6 may comprise multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
FIG. 7 is a block diagram of a focusing apparatus according to an embodiment. As shown in FIG. 7, a focusing apparatus 700 is provided, including: a current frame image acquisition module 702, a subject detection module 704, a position determination module 706, and a focusing module 708, wherein:
the current frame image acquisition module 702 is configured to acquire a current frame image;
the subject detection module 704 is configured to perform subject detection on the current frame image to obtain a subject in the current frame image;
the position determination module 706 is configured to determine the position of the subject in the current frame image; and
the focusing module 708 is configured to focus based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold.
With the focusing apparatus, a current frame image is acquired; subject detection is performed on it to obtain a subject; the position of the subject in the current frame image is determined; and when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than the threshold, the subject's position has not changed much between the two frames and the subject is in a stable state, so focusing based on the subject of the current frame image avoids focusing on a wrong object and improves focusing accuracy.
In an embodiment, the position determination module 706 is further configured to obtain the position of each pixel point contained in the subject and to determine the centroid of the subject from those positions; the centroid of the subject is used to represent the position of the subject.
In one embodiment, the focusing module 708 is further configured to focus based on the subject of the previous frame image when the difference between the position of the subject of the current frame image and the position of the subject of the previous frame image is greater than or equal to a threshold.
In one embodiment, the focusing apparatus 700 further includes a shooting module configured to acquire continuously captured original images and to color-code each original image to obtain the images to be detected; acquiring a current frame image then comprises acquiring the current frame image from the images to be detected.
In one embodiment, the focusing apparatus 700 further includes a circulation module configured to: acquire a next frame image when focusing is completed; perform subject detection on the next frame image to obtain a subject in the next frame image; determine the position of the subject in the next frame image; and focus based on the subject of the next frame image when the degree of difference between the position of the subject in the next frame image and the position of the subject in the current frame image is less than the threshold.
In one embodiment, the subject detection module 704 is further configured to generate a center weight map corresponding to the current frame image, where the weight values represented by the center weight map decrease gradually from the center to the edges; input the current frame image and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is trained in advance on the image, center weight map, and corresponding labeled subject mask map of the same scene; and determine the subject in the current frame image according to the subject region confidence map.
In an embodiment, the subject detection module 704 is further configured to process the subject region confidence map to obtain a subject mask map; detect the current frame image to determine a highlight region in it; and determine the highlight-eliminated subject in the current frame image according to the highlight region in the current frame image and the subject mask map.
In an embodiment, the subject detection module 704 is further configured to perform an adaptive confidence threshold filtering process on the subject region confidence map to obtain a subject mask map.
The division of modules in the focusing apparatus above is only for illustration; in other embodiments, the focusing apparatus may be divided into different modules as needed to complete all or part of its functions.
FIG. 8 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in FIG. 8, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capabilities and supports the operation of the entire electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the focusing method provided in the embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
The implementation of each module in the focusing apparatus provided in the embodiments of the present application may be in the form of a computer program. The computer program may be run on a terminal or a server. The program modules constituted by the computer program may be stored on the memory of the terminal or the server. Which when executed by a processor, performs the steps of the method described in the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the focusing method.
A computer program product comprising instructions which, when run on a computer, cause the computer to perform a focusing method.
Any reference to memory, storage, a database, or another medium used in the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A focusing method, comprising:
acquiring a current frame image;
performing subject detection on the current frame image through a subject recognition network to obtain a subject in the current frame image, the subject recognition network being arranged at the front end of the ISP flow;
determining the position of the subject in the current frame image;
focusing based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold;
and focusing based on the subject of the previous frame image when the degree of difference between the position of the subject of the current frame image and the position of the subject of the previous frame image is greater than or equal to the threshold.
2. The method of claim 1, wherein determining the position of the subject in the current frame image comprises:
acquiring the position of each pixel point contained in the subject;
determining the centroid of the subject according to the positions of the pixel points, the centroid of the subject being used to represent the position of the subject.
3. The method of claim 1, further comprising:
acquiring the position of the subject in the previous frame image;
and comparing the position of the subject of the current frame image with the position of the subject in the previous frame image to obtain the degree of difference between the two positions.
4. The method of claim 1, further comprising:
acquiring continuously captured original images;
color-coding each original image to obtain images to be detected;
wherein the acquiring of the current frame image comprises:
acquiring the current frame image from the images to be detected.
5. The method of claim 1, further comprising:
acquiring a next frame image when focusing is completed;
performing subject detection on the next frame image to obtain a subject in the next frame image; determining the position of the subject in the next frame image; and focusing based on the subject of the next frame image when the degree of difference between the position of the subject in the next frame image and the position of the subject in the current frame image is less than a threshold.
6. The method of claim 1, wherein performing subject detection on the current frame image to obtain the subject in the current frame image comprises:
generating a center weight map corresponding to the current frame image, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;
inputting the current frame image and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is obtained by training in advance on images of a same scene together with their center weight maps and the corresponding labeled subject mask maps; and
determining the subject in the current frame image according to the subject region confidence map.
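The claim requires only that the weights fall off from the center toward the edges; a Gaussian falloff is one plausible construction, sketched below. Stacking this map with the frame as an extra input channel of the detection model is likewise an assumption.

    import numpy as np

    def center_weight_map(h, w, sigma=0.5):
        """H x W map whose values decay from the image center to the edges."""
        ys = np.linspace(-1.0, 1.0, h)[:, None]
        xs = np.linspace(-1.0, 1.0, w)[None, :]
        return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))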
7. The method of claim 6, wherein determining the subject in the current frame image according to the subject region confidence map comprises:
processing the subject region confidence map to obtain a subject mask map;
detecting the current frame image to determine a highlight region in the current frame image; and
determining the highlight-free subject in the current frame image according to the highlight region in the current frame image and the subject mask map.
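A minimal sketch of the highlight-removal step; the brightness threshold and the grayscale-based detection are assumptions, since the patent does not specify how highlight regions are found.

    import cv2
    import numpy as np

    def remove_highlights(frame_bgr, subject_mask, brightness_thresh=240):
        """Drop overexposed pixels from the subject mask."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        highlight = gray >= brightness_thresh          # candidate highlight region
        return subject_mask.astype(bool) & ~highlight  # highlight-free subject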
8. The method of claim 7, wherein processing the subject region confidence map to obtain the subject mask map comprises:
performing adaptive confidence threshold filtering on the subject region confidence map to obtain the subject mask map.
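The patent does not name a particular adaptive scheme; Otsu's method is used in this sketch purely as one way to pick a per-image confidence threshold.

    import cv2
    import numpy as np

    def adaptive_confidence_filter(confidence_map):
        """Binarize a [0, 1] confidence map with a per-image threshold."""
        conf8 = (np.clip(confidence_map, 0.0, 1.0) * 255).astype(np.uint8)
        _, mask = cv2.threshold(conf8, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return mask > 0  # boolean subject mask map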
9. A focusing apparatus, comprising:
a current frame image acquisition module, configured to acquire a current frame image;
a subject detection module, configured to perform subject detection on the current frame image through a subject recognition network to obtain a subject in the current frame image, wherein the subject recognition network is arranged at the front end of the ISP pipeline;
a position determination module, configured to determine the position of the subject in the current frame image; and
a focusing module, configured to focus based on the subject of the current frame image when the degree of difference between the position of the subject in the current frame image and the position of the subject in the previous frame image is less than a threshold, and to focus based on the subject of the previous frame image when the degree of difference is greater than or equal to the threshold.
10. An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the focusing method according to any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN201910897424.5A 2019-09-23 2019-09-23 Focusing method and device, electronic equipment and computer readable storage medium Active CN110545384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910897424.5A CN110545384B (en) 2019-09-23 2019-09-23 Focusing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110545384A (en) 2019-12-06
CN110545384B (en) 2021-06-08

Family

ID=68714188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910897424.5A Active CN110545384B (en) 2019-09-23 2019-09-23 Focusing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110545384B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101212571A (en) * 2006-12-27 2008-07-02 富士胶片株式会社 Image capturing apparatus and focusing method
CN103369227A (en) * 2012-03-26 2013-10-23 联想(北京)有限公司 Photographing method of moving object and electronic equipment
CN105225254A * 2015-09-25 2016-01-06 凌云光技术集团有限责任公司 Exposure method and system for automatically tracking a local target
CN107257433A (en) * 2017-06-16 2017-10-17 广东欧珀移动通信有限公司 Focusing method, device, terminal and computer-readable recording medium
CN108024065A * 2017-12-28 2018-05-11 努比亚技术有限公司 Terminal photographing method, terminal, and computer-readable recording medium
CN108496350A * 2017-09-27 2018-09-04 深圳市大疆创新科技有限公司 Focusing processing method and apparatus
US10129456B1 (en) * 2017-09-12 2018-11-13 Mando-Hella Electronics Corporation Apparatus for adjusting focus of camera and control method thereof
CN109327655A * 2017-07-31 2019-02-12 展讯通信(上海)有限公司 Continuous automatic focusing method and system
CN110248101A (en) * 2019-07-19 2019-09-17 Oppo广东移动通信有限公司 Focusing method and device, electronic equipment, computer readable storage medium
CN110248096A (en) * 2019-06-28 2019-09-17 Oppo广东移动通信有限公司 Focusing method and device, electronic equipment, computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6191160B2 (en) * 2012-07-12 2017-09-06 ノーリツプレシジョン株式会社 Image processing program and image processing apparatus
CN105635554B (en) * 2014-10-30 2018-09-11 展讯通信(上海)有限公司 Auto-focusing control method and device
CN105763787A * 2014-12-19 2016-07-13 索尼公司 Image forming method, device, and electronic device
CN106210495A (en) * 2015-05-06 2016-12-07 小米科技有限责任公司 Image capturing method and device
CN106454059B * 2015-07-24 2019-10-29 联想(北京)有限公司 Focusing implementation method, device, and electronic device
CN106803920B (en) * 2017-03-17 2020-07-10 广州视源电子科技股份有限公司 Image processing method and device and intelligent conference terminal

Similar Documents

Publication Publication Date Title
CN110149482B (en) Focusing method, focusing device, electronic equipment and computer readable storage medium
CN113766125B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110248096B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110572573B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110473185B (en) Image processing method and device, electronic equipment and computer readable storage medium
US20220166930A1 (en) Method and device for focusing on target subject, and electronic device
CN110349163B (en) Image processing method and device, electronic equipment and computer readable storage medium
EP3793188A1 (en) Image processing method, electronic device, and computer readable storage medium
CN110248101B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110660090B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110661977B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110191287B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN109712177B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109146906B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110866486B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110490196B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110881103B (en) Focusing control method and device, electronic equipment and computer readable storage medium
US11836903B2 (en) Subject recognition method, electronic device, and computer readable storage medium
CN110365897B (en) Image correction method and device, electronic equipment and computer readable storage medium
CN110650288B (en) Focusing control method and device, electronic equipment and computer readable storage medium
CN110399823B (en) Subject tracking method and apparatus, electronic device, and computer-readable storage medium
CN110689007B (en) Subject recognition method and device, electronic equipment and computer-readable storage medium
CN113298735A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110688926B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110545384B (en) Focusing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant