CN115601409A - Image processing method, image processing device, storage medium and electronic equipment


Info

Publication number: CN115601409A
Application number: CN202110778658.5A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, depth, map, depth image, main
Inventors: 向超, 陶伟森, 刘阳兴
Assignee (current and original): Wuhan TCL Group Industrial Research Institute Co Ltd
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd; priority to CN202110778658.5A
Legal status: Pending

Classifications

All classifications fall under G (Physics) > G06 (Computing; calculating or counting) > G06T (Image data processing or generation, in general):

    • G06T7/00 Image analysis > G06T7/50 Depth or shape recovery > G06T7/55 Depth or shape recovery from multiple images
    • G06T5/00 Image enhancement or restoration > G06T5/70 Denoising; Smoothing
    • G06T5/00 Image enhancement or restoration > G06T5/80 Geometric correction
    • G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/13 Edge detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10004 Still image; Photographic image
    • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, an image processing device, a storage medium and an electronic device. The method includes: acquiring a main image and a secondary image; determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image; performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image; and obtaining a target depth image according to the first depth image and the second depth image. By processing the binocular images, a depth image containing low-frequency information and a depth image containing high-frequency information can be obtained, so that more depth information of fine structures is retained after the two images are fused, greatly improving the accuracy of the generated depth image.

Description

Image processing method, image processing device, storage medium and electronic equipment
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a storage medium, and an electronic device.
Background
Each pixel in a depth image represents the distance from a point in physical space to the camera. In general, a disparity map between images of the same scene acquired by multiple cameras can be computed, and a depth map can be generated from it. For example, in a stereoscopic vision system composed of left and right binocular cameras, the left camera acquires a left image and the right camera acquires a right image; the disparity map between the two images is calculated and then converted into a depth image according to the principle of triangulation.
Time-of-flight (ToF) ranging and structured-light ranging are more commonly used in the prior art. ToF ranging continuously emits light pulses (typically invisible light) onto the object to be measured, receives the light reflected back from the object, and calculates the distance from the object to the camera by measuring the round-trip flight time of the light pulses. Structured-light ranging projects light with certain structural characteristics onto the photographed object through a near-infrared laser and collects it with a dedicated infrared camera; regions of the object at different depths produce different image phase information for such structured light, so an arithmetic unit can convert the change of the structure into depth information. However, the applicant finds that both methods not only have high power consumption and hardware cost, but also easily lose the depth information of fine structures and have low accuracy.
Disclosure of Invention
The application provides an image processing method, an image processing device, a storage medium and an electronic device, which can improve the accuracy of depth image generation.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring a main image and a secondary image;
the first determining module is used for determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
the second determining module is used for performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image;
and the fusion module is used for obtaining a target depth image according to the first depth image and the second depth image.
In a third aspect, an embodiment of the present application provides a storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform the above-mentioned image processing method.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a plurality of instructions, and the processor loads the instructions in the memory to perform the following steps:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
The image processing method provided by the embodiment of the application can acquire a main image and a secondary image; determine a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image; perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image; and obtain a target depth image according to the first depth image and the second depth image. By processing the binocular images, a depth image containing low-frequency information and a depth image containing high-frequency information can be obtained, so that more depth information of fine structures is retained after the two images are fused, greatly improving the accuracy of the generated depth image.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.
Fig. 2 is a scene schematic diagram of an image processing method according to an embodiment of the present application.
Fig. 3 is another schematic flow chart of an image processing method according to an embodiment of the present application.
Fig. 4 is a schematic view of another scene of an image processing method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 6 is another schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 8 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present application are described with reference to steps and symbols executed by one or more computers, unless indicated otherwise. These steps and operations, which are at times referred to as being computer-executed, involve the manipulation by the computer's processing unit of electrical signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the principles of the application are described in the foregoing terms, this is not meant to be limiting; those of ordinary skill in the art will appreciate that various steps and operations described below may also be implemented in hardware.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method provided by the embodiment of the application is applied to the electronic equipment, and the specific flow can be as follows:
and step 101, acquiring a main graph and a sub graph.
The main image and the secondary image can form a binocular image. Binocular imaging is based on the parallax principle: an imaging device acquires two images of the measured object from different positions, and the three-dimensional geometric information of the object can be obtained by calculating the positional deviation between corresponding points of the two images. The principle of binocular stereo vision mirrors human vision: the images obtained by the two eyes are fused and the differences between them are observed, which produces a distinct sense of depth; a correspondence is established between features, matching the image points produced by the same physical point in space across the different images. This difference is called disparity.
In an embodiment, the binocular images may be captured by a binocular camera. The binocular camera may be a mobile phone or tablet with a dual-camera module or a multi-camera module (for example, a triple- or quad-camera module); alternatively, two images with parallax may be acquired directly (for example, retrieved from a server or a database). The binocular image may consist of two color images, two grayscale images, or one color image and one grayscale image. For example, the binocular images include images captured by a left camera and a right camera. After the binocular images are acquired, the main image and the secondary image among them may be determined; for example, one image may be selected as the main image, and the other image is then the secondary image. The main image is captured by a first imager in the imaging module and the secondary image by a second imager: the first imager is the main imager, used to capture the main image, and the second imager is the secondary imager, used to capture the secondary image, which assists in computing the disparity information of the main image. The main image is the reference view used to generate the corresponding depth map in the subsequent binocular depth estimation step, where "corresponding" means that the imaging position of an object in the main image is exactly consistent with its position in the generated depth map. That is, after the binocular images are acquired, a target image may be selected from them to serve as the reference view for depth image generation during binocular depth estimation; this target image is taken as the main image, and the other image as the secondary image. The main image may be designated by a user or selected automatically by the electronic device, which is not further limited in the present application.
The image processing method provided by the embodiment of the application can be applied to electronic equipment; for example, the binocular camera may be the front dual camera or the rear dual camera of the electronic device. The binocular camera of the electronic device is started, the electronic device enters a photographing preview mode, and when a photographing instruction is received, the binocular camera captures the binocular images. The camera hardware generally includes five parts: a housing (with motor), a lens, an infrared filter, an image sensor (e.g., CCD or CMOS), and a flexible printed circuit board (FPCB). In the preview mode, while the preview image is displayed, the motor drives the lens to move, and the photographed object is imaged on the image sensor through the lens. The image sensor converts the optical signal into an electrical signal through photoelectric conversion and transmits it to the image processing circuit for subsequent processing. The image processing circuit may be implemented using hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing) pipeline.
Further, binocular correction can be performed on the binocular images after they are captured. Binocular correction processes the images of the same object captured by the two cameras so as to achieve epipolar alignment of the binocular images in multi-view geometry, i.e., the same object has the same size in both images and lies on the same horizontal line. In one embodiment, the following parameters are acquired for binocular correction: the intrinsic matrices and distortion coefficients of the left and right cameras, and the rotation matrix R of the right camera relative to the left camera. The rotation matrix R can be decomposed into two matrices R1 and R2, i.e., the left and right cameras each rotate half way to reach alignment. In this embodiment, binocular correction may include distortion removal and binocular parallel correction.
The intrinsic matrix and the distortion coefficients are used to eliminate image distortion (radial distortion is caused by the lens manufacturing process; tangential distortion is caused by mounting issues). The distortion-correction process is as follows: the source-image pixel coordinate system is converted into the camera coordinate system through the intrinsic matrix (compared with the image physical coordinate system, scaling and a Z axis are added); the camera coordinates of the image are corrected through the distortion coefficients; after correction, the camera coordinate system is converted back into the image pixel coordinate system through the intrinsic matrix; and the new image coordinates are assigned the pixel values of the source image coordinates.
Binocular parallel correction is performed on the camera coordinate systems corresponding to the images: during epipolar correction, after transforming to the camera coordinate systems, the coordinate systems of the left and right images undergo parallel epipolar correction through the rotation matrices R1 and R2, respectively. The full procedure is: convert the source-image pixel coordinate system into the camera coordinate system through the intrinsic matrix, bring the two images into the same coordinate system to facilitate subsequent distortion correction, perform parallel epipolar correction through the rotation matrices R1 and R2, correct the camera coordinates of the image through the distortion coefficients, convert the camera coordinate system back into the image pixel coordinate system through the intrinsic matrix, and assign the new image coordinates the pixel values of the source image coordinates. Referring to fig. 2, image (a) in fig. 2 is a binocular image captured by a binocular camera; image (b) is obtained after distortion removal; and binocular parallel correction then yields image (c), i.e., the corrected image in the embodiment of the present application.
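To make this flow concrete, here is a minimal Python/OpenCV sketch of the correction described above (not part of the original disclosure). It assumes the calibration results K1, D1, K2, D2, R and T are already available, and the helper name rectify_pair is illustrative:

    import cv2

    def rectify_pair(left, right, K1, D1, K2, D2, R, T):
        h, w = left.shape[:2]
        # Split the relative pose so each camera rotates "half way" (R1, R2)
        # and compute the projection matrices of the rectified pair.
        R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
        # Undistortion + rectification maps, then remap both views so that
        # epipolar lines become horizontal and aligned.
        m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
        m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
        left_rect = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
        right_rect = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)
        return left_rect, right_rect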
Step 102: determine a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image.
In an embodiment, the first depth image may be determined by performing binocular depth estimation on the main image and the secondary image of the binocular image; specifically, semi-global block matching (SGBM) may be used. The SGBM algorithm in OpenCV is a semi-global stereo matching algorithm whose matching quality is clearly better than that of local matching algorithms, but whose complexity is also far higher.
Specifically, the disparity map of the corrected image may be calculated by the SGBM algorithm. It should be noted that most unreliable disparity values in the disparity map are caused by occlusion or uneven illumination, so they may be filled in with nearby reliable disparity values. Disparity is measured in pixels, while depth is usually expressed in millimeters (mm). From the geometry of parallel binocular vision, the following conversion formula between disparity and depth is obtained:
depth = (f * baseline) / disp
In the above formula, depth denotes the depth value; f denotes the normalized focal length, i.e., fx in the intrinsic parameters; baseline is the distance between the optical centers of the two cameras, called the baseline distance; and disp is the disparity value. All quantities on the right-hand side are known, so the depth values can be computed, and the first depth image is then generated from them. It should be noted that the algorithms for binocular depth estimation in the embodiment of the present application include, but are not limited to, the SGBM algorithm described above, which is not elaborated further here.
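For illustration, a minimal sketch of this step with OpenCV's SGBM matcher and the conversion formula above. The matcher parameters are illustrative, not values from the patent; f (in pixels) and baseline are assumed to come from calibration:

    import cv2
    import numpy as np

    def estimate_depth_sgbm(left_gray, right_gray, f, baseline):
        sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
        # StereoSGBM returns fixed-point disparities scaled by 16.
        disp = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
        depth = np.zeros_like(disp)
        valid = disp > 0  # unreliable/occluded pixels stay at 0 here; the hole
        depth[valid] = (f * baseline) / disp[valid]  # filling above would fix them
        return depth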
In an embodiment, if the main image and the secondary image after binocular correction are already grayscale images, this step need not be executed; otherwise, color conversion can be performed according to the corresponding color-to-grayscale formula to obtain the binocular grayscale images. There are various color-conversion methods. For example, if the corrected binocular images are in RGB format, they may be converted into single-channel grayscale images according to the formula Y = 0.299*R + 0.587*G + 0.114*B: the R, G, and B values of each pixel are read in turn, the gray value is calculated (and rounded to an integer) and assigned to the corresponding position of the new image, and the conversion is complete once all pixels have been traversed.
In another embodiment, color conversion may be performed by the average-value method: for corrected binocular images in RGB format, each image may be converted into a single-channel grayscale image according to the formula Y = (R + G + B)/3, i.e., the average of the red, green, and blue values of each pixel is taken as its gray value, traversing all pixels.
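Both conversions can be sketched in a few lines of NumPy. This assumes an RGB uint8 input (note that OpenCV loads images as BGR by default, so the channel order would need swapping there):

    import numpy as np

    def to_gray_weighted(rgb):
        # Y = 0.299*R + 0.587*G + 0.114*B, per pixel
        r = rgb[..., 0].astype(np.float32)
        g = rgb[..., 1].astype(np.float32)
        b = rgb[..., 2].astype(np.float32)
        return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

    def to_gray_average(rgb):
        # Y = (R + G + B) / 3, per pixel
        return rgb.astype(np.float32).mean(axis=-1).astype(np.uint8)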
Step 103: perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map, and determine a second depth image.
Further, after color conversion is performed on both the main image and the secondary image of the binocular image to obtain the main image grayscale map and the secondary image grayscale map, binocular depth estimation may be performed on the two grayscale maps to determine a second depth image; the binocular depth estimation method may refer to step 102.
Step 104: obtain a target depth image according to the first depth image and the second depth image.
In an embodiment, obtaining the target depth image according to the first depth image and the second depth image may include:
determining an edge image of a main image in the binocular image, and adjusting the second depth image according to the edge image to obtain a third depth image;
and obtaining a target depth image according to the first depth image and the third depth image.
It should be noted that an image edge is a portion of the image where the local brightness changes significantly. The gray-level profile of such a region can generally be regarded as a step: a sharp change, within a small region, from one gray value to another gray value with a large difference. Most of the information in an image is concentrated at its edges; determining and extracting the edges is very important for recognizing and understanding the whole image scene, and edges are an important feature on which image segmentation depends.
In one embodiment, there are various methods for extracting the edge image. For example, edge detection may be performed with the Canny operator: the image is first smoothed with a Gaussian filter to remove noise, the gradient magnitude and direction are calculated using finite differences of first-order partial derivatives, non-maximum suppression is applied, and finally edges are linked using two thresholds. In other embodiments, the Laplacian operator may be used as the edge detection operator. It should be noted that the Laplacian is an isotropic second-order differential operator, suitable when only the position of the edge matters and the gray-level difference of the surrounding pixels is not considered. The Laplacian responds more strongly to isolated pixels than to edges or lines, and is therefore suitable only for noise-free images; in the presence of noise, low-pass filtering is required before detecting edges with it. In other embodiments, the edge image may also be extracted using the Sobel operator, the isotropic Sobel operator, the Roberts operator, the Prewitt operator, and the like.
In an embodiment, after the edge image of the main image in the corrected image is extracted, the edge image and the second depth image may be combined by a logical operation into a third depth image. In the edge image, the elements corresponding to edge positions of the original main image have the value 255, and the elements corresponding to non-edge positions have the value 0. Specifically, an element-by-element AND operation may be performed between the edge image and the second depth image to obtain the updated third depth image. For example, a logical AND of any element b in the second depth image and the element c at the same position in the edge image gives the element d at the corresponding position of the new third depth image, expressed by the following formula:
d = b, if c = 255; d = 0, if c = 0 (i.e., d = b AND c, element by element)
All elements of the image are traversed to obtain the updated third depth image.
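A minimal sketch of the edge extraction and the element-by-element update described above, assuming Canny thresholds of 50 and 150 (illustrative values; the patent does not specify them). cv2.Canny outputs 255 at edge pixels and 0 elsewhere, which matches the edge-map convention above:

    import cv2
    import numpy as np

    def edge_masked_depth(main_gray, second_depth):
        edges = cv2.Canny(main_gray, 50, 150)  # uint8 edge map: 255 at edges, else 0
        # d = b where c = 255, d = 0 where c = 0 (equivalent to a bitwise AND
        # when the depth map is 8-bit)
        return np.where(edges == 255, second_depth, 0)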
In this embodiment, the first depth image is a full-field depth image containing mostly low-frequency depth information, while the third depth image contains mostly high-frequency depth information; therefore, after the two images are fused in this step, the accuracy of the depth information of the fine-structure parts of the full-field scene is significantly enhanced.
As can be seen from the above, the image processing method provided in the embodiment of the present application can acquire a main image and a secondary image; determine a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image; perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image; and obtain a target depth image according to the first depth image and the second depth image. By processing the binocular images, a depth image containing low-frequency information and a depth image containing high-frequency information can be obtained, so that more depth information of fine structures is retained after the two images are fused, greatly improving the accuracy of the generated depth image.
The image processing method of the present application will be further described below on the basis of the methods described in the above embodiments. Referring to fig. 3, fig. 3 is another schematic flow chart diagram of an image processing method according to an embodiment of the present application, where the image processing method includes:
step 201, acquiring a binocular image, and performing binocular correction on the binocular image to obtain a corrected image.
In an embodiment, the binocular image may be acquired by a binocular camera under a natural or artificial light source; it may consist of two color images, two grayscale images, or one color image and one grayscale image.
Further, binocular correction is performed after the binocular images are acquired. Binocular correction processes the images of the same object captured by the two cameras so as to achieve epipolar alignment of the binocular images in multi-view geometry. This process only involves the intrinsic and extrinsic parameters of the binocular cameras, not the image content; after binocular correction, binocular matching is easier, so that depth can be estimated from disparity.
Step 202: perform color conversion on the corrected image to obtain a grayscale image.
Specifically, the binocular-corrected image may be converted to grayscale. This is not required if the corrected image is already a grayscale image; otherwise, color conversion is performed according to the corresponding color-to-grayscale formula to obtain the binocular grayscale images. For example, if the corrected binocular images are in RGB format, they may be converted into single-channel grayscale images according to the formula Y = 0.299*R + 0.587*G + 0.114*B: the R, G, and B values of each pixel are read in turn, the gray value is calculated (and rounded to an integer) and assigned to the corresponding position of the new image, and the grayscale image is obtained once all pixels have been traversed.
Step 203: perform down-sampling on the grayscale image, perform binocular depth estimation according to the down-sampled grayscale image, and determine an initial depth image.
In digital signal processing, down-sampling is a multirate signal-processing technique for reducing the sampling rate of a signal, generally used to reduce the data transmission rate or the data size. In the embodiment of the application, the grayscale image can be down-sampled before binocular depth estimation, which reduces the resolution of the binocular images during depth estimation: on the one hand, this markedly reduces computation time; on the other hand, smooth down-sampling removes noise to a certain extent, which benefits binocular depth estimation.
In an embodiment, the smooth down-sampling method may be, but is not limited to, a down-sampling method with an image-smoothing effect, such as bilinear interpolation or cubic interpolation. Bilinear interpolation, also called first-order interpolation, uses the correlation between the pixel to be computed and its 4 nearest-neighbor pixels in the source image, obtaining its value through two linear interpolations. Cubic interpolation, also called cubic convolution interpolation, uses the values of the 16 neighboring pixels of the pixel to be computed in the source image, i.e., a weighted average of those 16 pixels.
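A minimal sketch of the smooth down-sampling (and of the matching up-sampling used in step 204 below), assuming a fixed scale factor; in OpenCV, INTER_LINEAR is bilinear interpolation and INTER_CUBIC is the cubic convolution interpolation mentioned above:

    import cv2

    def downsample(gray, scale=0.5):
        # scale=0.5 is an assumed factor; the patent does not fix one
        return cv2.resize(gray, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_LINEAR)

    def upsample(depth, target_size):
        # target_size is (width, height) of the original corrected image
        return cv2.resize(depth, target_size, interpolation=cv2.INTER_CUBIC)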
In an embodiment, after the down-sampling processing is performed on the grayscale image, and before the binocular depth estimation is performed according to the down-sampled grayscale image, the method further includes:
and according to a preset contrast value and a preset local window size, carrying out contrast adjustment on the gray level image subjected to the down-sampling treatment to obtain a contrast adjustment image.
For example, after the grayscale image is down-sampled, local contrast enhancement may be performed on the low-resolution binocular single-channel grayscale images using contrast-limited adaptive histogram equalization (CLAHE). In this implementation, the optimal maximum contrast of the local contrast enhancement may be set to 6, and the optimal local window size to 11 pixels on each side.
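A minimal sketch of this step, with one caveat: OpenCV's CLAHE takes a tile-grid count rather than a pixel window size, so the 11-pixel window above is only approximated here by an assumed 11x11 grid; the clip limit of 6 follows the value given in the text:

    import cv2

    def enhance_contrast(gray_small):
        # gray_small: the down-sampled single-channel grayscale map (uint8)
        clahe = cv2.createCLAHE(clipLimit=6.0, tileGridSize=(11, 11))
        return clahe.apply(gray_small)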
After the local contrast adjustment is completed, binocular depth estimation may be performed on the grayscale images to determine an initial depth image. Specifically, the SGBM algorithm may calculate the disparity map of the corrected image; after the disparity map is calculated, hole regions of the disparity map are detected and filled, for example with the average of nearby reliable disparity values. Disparity is measured in pixels, while depth is usually expressed in millimeters (mm). From the geometry of parallel binocular vision, the following conversion formula between disparity and depth is obtained:
depth = (f * baseline) / disp
In the above formula, depth denotes the depth value; f denotes the normalized focal length, i.e., fx in the intrinsic parameters; baseline is the distance between the optical centers of the two cameras, called the baseline distance; and disp is the disparity value. All quantities on the right-hand side are known, so the depth values can be computed, and the initial depth image is then generated from them.
Step 204: perform up-sampling on the initial depth image to obtain a first depth image.
In this embodiment, the initial depth image obtained at this point has low resolution; therefore, smooth up-sampling may be performed on the initial depth image according to the parameters used during down-sampling, yielding a high-resolution first depth image. In another embodiment, a target resolution may be set for up-sampling the initial depth image; for example, the target resolution may be the resolution of the original high-resolution binocular image obtained after binocular correction. The smooth up-sampling method may be, but is not limited to, an interpolation method with an image-smoothing effect, such as bilinear interpolation or cubic interpolation.
Step 205: perform denoising on the main image in the corrected image and color conversion on the denoised main image to obtain a main image grayscale map; perform color conversion on the secondary image in the corrected image and denoising on the color-converted secondary image to obtain a secondary image grayscale map.
Further, denoising is performed on the main image of the binocular-corrected high-resolution binocular image to obtain a denoised main image. Here, the main image is the reference view used to generate the corresponding depth map in binocular depth estimation. The denoising method may be, but is not limited to, median filtering, weighted-least-squares filtering, or another denoising method.
After the main image of the corrected image is denoised, color conversion is performed on the denoised main image to obtain the main image grayscale map. Correspondingly, color conversion and denoising are performed on the secondary image (the image of the binocular pair other than the main image), where the chosen denoising method and parameter settings must be exactly consistent with those used for the main image, to obtain the secondary image grayscale map.
Step 206: perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map, and determine a second depth image.
In this embodiment, the binocular depth estimation method may be, but is not limited to, a depth estimation method with fast processing speed and accurate depth recovery for strongly textured regions, such as local block matching (BM). It should be noted that since the binocular images used for depth estimation in this step are the original high-resolution images without down-sampling, the resulting depth map retains the depth information of fine structures to the maximum extent; however, since a depth estimation method with fast processing speed but low overall accuracy (mainly in weakly textured regions) is adopted, only the finely structured (strongly textured) parts of the scene are guaranteed to recover accurate depth information.
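A minimal sketch of this high-resolution matching step with OpenCV's local block matcher; numDisparities and blockSize are illustrative values, not values from the patent:

    import cv2
    import numpy as np

    def bm_depth(main_gray, secondary_gray, f, baseline):
        # StereoBM expects 8-bit single-channel inputs (the grayscale maps)
        bm = cv2.StereoBM_create(numDisparities=128, blockSize=9)
        disp = bm.compute(main_gray, secondary_gray).astype(np.float32) / 16.0
        depth = np.zeros_like(disp)
        depth[disp > 0] = (f * baseline) / disp[disp > 0]
        return depth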
Step 207: extract the edge image of the denoised main image, and perform image-morphological dilation on the edge image according to preset dilation parameters.
An image edge is a portion of the image where the local brightness changes significantly. In one embodiment, the edge image may be extracted by various methods, for example edge detection with the Canny operator.
It should be noted that morphological operations act on an image based on shape: a structuring element is applied to the image to generate an output image, changing the shape of objects. For example, erosion "thins" objects and dilation "fattens" them. These operations are typically performed on binary images, similar to contour detection. Dilation expands the bright white areas of the image by adding pixels at the perceived boundaries of objects; erosion removes pixels along object boundaries and shrinks them. In the present embodiment, dilating the edge image enlarges the fine structures in it.
Specifically, dilation may be implemented as a convolution-like operation on the image with a kernel, which can have any shape and size but is usually square or circular. The kernel usually has an anchor point, usually at its center. As the kernel scans the image, the maximum pixel value in the overlapped region is computed and written to the position of the anchor point. This maximization grows the bright areas of the picture, which is why the operation is called dilation. In the present embodiment, the dilation radius is preferably 3 and the number of dilation iterations is preferably 1.
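A minimal sketch of the dilation step, assuming an elliptical structuring element (the patent gives the radius and iteration count, not the element shape); a radius of 3 corresponds to a 7x7 footprint (2*3+1):

    import cv2

    def dilate_edges(edges):
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
        return cv2.dilate(edges, kernel, iterations=1)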
Step 208: update the dilated edge image and the second depth image into a third depth image through an element-by-element logical operation.
In the edge image, the elements corresponding to edge positions of the original main image have the value 255, and the elements corresponding to non-edge positions have the value 0. Specifically, an element-by-element AND operation may be performed between the edge image and the second depth image to obtain the updated third depth image. For example, a logical AND of any element b in the second depth image and the element c at the same position in the edge image gives the element d at the corresponding position of the new third depth image, expressed by the following formula:
d = b, if c = 255; d = 0, if c = 0 (i.e., d = b AND c, element by element)
All elements of the image are traversed to obtain the updated third depth image.
Step 209: obtain a first pixel value and a second pixel value of each pair of corresponding pixels in the first depth image and the third depth image, and generate a target depth image according to the maximum of the first pixel value and the second pixel value.
Specifically, the first depth image and the third depth image may be fused by taking the pixel-wise maximum, yielding the fused target depth image. For any element a in the first depth image, with corresponding element d in the third depth image and corresponding element e in the fused target depth image, the following formula holds:
e = max(a, d)
All elements of the images are traversed to generate the target depth image.
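This fusion is a one-line element-wise maximum in NumPy; a minimal sketch:

    import numpy as np

    def fuse_depth(first_depth, third_depth):
        # e = max(a, d) for every pair of corresponding elements
        return np.maximum(first_depth, third_depth)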
Step 210: perform filtering on the target depth image according to the main image in the corrected image to obtain a filtered target depth image.
Specifically, the binocular-corrected main image can be used as a guide image for post-processing the fused target depth image, finally yielding a full-scene depth image containing accurate fine-structure depth information. The post-processing method includes, but is not limited to, depth-map post-processing methods such as joint bilateral filtering and fast bilateral solver filtering.
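A minimal sketch of the guided post-processing, using the joint bilateral filter from the opencv-contrib ximgproc module (an assumption on our part: the patent names the technique, not this API); the filter diameter and sigma values are illustrative:

    import cv2
    import numpy as np

    def refine_depth(main_rectified, target_depth):
        # guide: the binocular-corrected main image (8-bit); src: float32 depth
        return cv2.ximgproc.jointBilateralFilter(
            main_rectified, target_depth.astype(np.float32),
            d=9, sigmaColor=25, sigmaSpace=9)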
The first depth image is a full-field depth image containing mostly low-frequency depth information, and the third depth image contains mostly high-frequency depth information, so after the two images are fused in this step the accuracy of the depth information of fine-structure parts across the full field is significantly enhanced. Referring to fig. 4, the left image is the original image, the middle image is a depth image obtained by a prior-art method, and the right image is a depth image generated by the image processing method provided in the embodiment of the present application. The result shows that the fine-structure depth recovery provided by the present application not only accurately recovers the depth information of fine structures in the scene, but also meets the low-cost, low-power, and high-speed requirements of industrial application scenarios.
As can be seen from the above, the image processing method provided in the embodiment of the present application may: acquire a binocular image and perform binocular correction on it to obtain a corrected image; perform color conversion on the corrected image to obtain a grayscale image; down-sample the grayscale image, perform binocular depth estimation according to the down-sampled grayscale image, and determine an initial depth image; up-sample the initial depth image to obtain a first depth image; denoise the main image of the corrected image and color-convert the denoised main image to obtain a main image grayscale map; color-convert the secondary image of the corrected image and denoise it to obtain a secondary image grayscale map; perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image; extract the edge image of the denoised main image and apply morphological dilation to it according to preset dilation parameters; update the dilated edge image and the second depth image into a third depth image through an element-by-element logical operation; obtain the first and second pixel values of each pair of corresponding pixels in the first and third depth images and generate a target depth image from their maximum; and filter the target depth image according to the main image of the corrected image to obtain the filtered target depth image. By processing the binocular images, a depth image containing low-frequency information and a depth image containing high-frequency information can be obtained, so that more depth information of fine structures is retained after the two images are fused, greatly improving the accuracy of the generated depth image.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. Wherein the image processing apparatus 30 comprises:
an obtaining module 301, configured to acquire a main image and a secondary image;
a first determining module 302, configured to determine a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
a second determining module 303, configured to perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map, and determine a second depth image;
an adjusting module 304, configured to determine an edge image of the main image, and adjust the second depth image according to the edge image to obtain a third depth image;
a fusion module 305, configured to obtain a target depth image according to the first depth image and the third depth image.
In one embodiment, with continued reference to FIG. 6, the first determination module 302 may include:
a conversion submodule 3021, configured to perform binocular correction on the main image and the secondary image to obtain a corrected image, and determine a grayscale image corresponding to the corrected image;
the first processing submodule 3022 is configured to perform downsampling on the grayscale image, perform binocular depth estimation according to the downsampled grayscale image, and determine an initial depth image;
a second processing sub-module 3023, configured to perform upsampling on the initial depth image to obtain a first depth image.
In one embodiment, the fusion module 305 may include:
an obtaining sub-module 3051, configured to obtain a first element value and a second element value of corresponding elements in the first depth image and the third depth image, respectively;
the generation submodule 3052 is configured to obtain a maximum value of the first element value and the second element value, and generate a target depth image according to the maximum value.
As can be seen from the above description, the image processing apparatus 30 according to the embodiment of the present application may acquire a main image and a secondary image; determine a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image; perform binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image; determine an edge image of the main image and adjust the second depth image according to the edge image to obtain a third depth image; and obtain a target depth image according to the first depth image and the third depth image. By processing the binocular images, a depth image containing low-frequency information and a depth image containing high-frequency information can be obtained, so that more depth information of fine structures is retained after the two images are fused, greatly improving the accuracy of the generated depth image.
In the embodiment of the present application, the image processing apparatus and the image processing method in the foregoing embodiment belong to the same concept, and any method provided in the embodiment of the image processing method may be executed on the image processing apparatus, and a specific implementation process thereof is described in detail in the embodiment of the image processing method, and is not described herein again.
The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
Embodiments of the present application further provide a storage medium, on which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the above-mentioned image processing method.
The embodiment of the application also provides an electronic device, such as a tablet computer, a mobile phone and the like. The processor in the electronic device loads instructions corresponding to processes of one or more application programs into the memory according to the following steps, and the processor runs the application programs stored in the memory, so that various functions are realized:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 7, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device 400, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device 400 by running or loading a computer program stored in the memory 402 and calling data stored in the memory 402, and processes the data, thereby monitoring the electronic device 400 as a whole.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the computer programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, a computer program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
In this embodiment, the processor 401 in the electronic device 400 loads instructions corresponding to one or more processes of the computer program into the memory 402 according to the following steps, and the processor 401 runs the computer program stored in the memory 402, so as to implement various functions, as follows:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main image grayscale map corresponding to the main image, and a secondary image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main image grayscale map and the secondary image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
Referring to fig. 8, in some embodiments, the electronic device 400 may further include: a display 403, radio frequency circuitry 404, audio circuitry 405, and a power supply 406. The display 403, the rf circuit 404, the audio circuit 405, and the power source 406 are electrically connected to the processor 401.
The display 403 may be used to display information entered by or provided to the user as well as various graphical user interfaces, which may be made up of graphics, text, icons, video, and any combination thereof. The Display 403 may include a Display panel, and in some embodiments, the Display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The RF circuit 404 may be used to transmit and receive radio-frequency signals, establishing wireless communication with network devices or other electronic devices and exchanging signals with them. In general, the radio frequency circuit 404 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
The audio circuit 405 may be used to provide an audio interface between the user and the electronic device through a speaker and microphone. The audio circuit 405 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into an acoustic signal for output.
The power supply 406 may be used to supply power to the various components of the electronic device 400. In some embodiments, the power supply 406 may be logically coupled to the processor 401 via a power management system, so that charging, discharging, and power consumption are managed by the power management system. The power supply 406 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other such component.
Although not shown in fig. 8, the electronic device 400 may further include a camera, a Bluetooth module, and the like, which are not described in detail herein.
In the embodiment of the present application, the storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
It should be noted that, as will be understood by those skilled in the art, all or part of the flow of the image processing method of the embodiments of the present application may be implemented by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory of an electronic device, and executed by at least one processor in the electronic device, and its execution may include the flow of the embodiments of the image processing method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
In the image processing apparatus of the embodiments of the present application, the functional modules may be integrated into one processing chip, each module may exist separately in physical form, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The foregoing has described in detail an image processing method, an image processing apparatus, a storage medium, and an electronic device according to embodiments of the present application. Specific examples have been used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are intended only to help in understanding the method and its core ideas. Meanwhile, those skilled in the art may, following the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. An image processing method, characterized in that it comprises the steps of:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main-image grayscale map corresponding to the main image, and a secondary-image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main-image grayscale map and the secondary-image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
2. The image processing method according to claim 1, wherein the step of obtaining a target depth image according to the first depth image and the second depth image comprises:
determining an edge image of the main image, and adjusting the second depth image according to the edge image to obtain a third depth image;
and obtaining a target depth image according to the first depth image and the third depth image.
3. The image processing method according to claim 1, wherein the step of determining the first depth image corresponding to the main image and the secondary image comprises:
performing binocular correction on the main image and the secondary image to obtain a corrected image;
determining a grayscale image corresponding to the corrected image;
performing down-sampling processing on the grayscale image, and performing binocular depth estimation according to the down-sampled grayscale image to determine an initial depth image;
and performing up-sampling processing on the initial depth image to obtain the first depth image.
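The binocular correction step of this claim presupposes stereo calibration data. A minimal rectification sketch under that assumption might look as follows; K1, d1, K2, d2 (intrinsics and distortion of the two cameras) and R, T (their relative rotation and translation) are hypothetical calibration outputs, not values given by the patent:

```python
import cv2

def rectify_pair(img_main, img_sec, K1, d1, K2, d2, R, T):
    """Remap a stereo pair onto a common rectified plane so that epipolar
    lines become horizontal, which is what block matchers expect."""
    h, w = img_main.shape[:2]
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)
    map1a, map1b = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
    map2a, map2b = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_main = cv2.remap(img_main, map1a, map1b, cv2.INTER_LINEAR)
    rect_sec = cv2.remap(img_sec, map2a, map2b, cv2.INTER_LINEAR)
    return rect_main, rect_sec
```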
4. The image processing method according to claim 3, wherein after the down-sampling processing of the grayscale image and before the binocular depth estimation according to the down-sampled grayscale image, the method further comprises:
and performing contrast adjustment on the down-sampled grayscale image according to a preset contrast value and a preset local window size to obtain a contrast-adjusted image.
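One plausible realization of this contrast adjustment is contrast-limited adaptive histogram equalization, where the preset contrast value maps to the clip limit and the preset local window size maps to the tile grid; this correspondence and the values below are assumptions of the sketch, not definitions from the patent:

```python
import cv2

# Hypothetical input: the down-sampled grayscale image from claim 3.
gray_small = cv2.imread("gray_downsampled.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0,        # "preset contrast value"
                        tileGridSize=(8, 8))  # "preset local window size"
contrast_adjusted = clahe.apply(gray_small)
```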
5. The image processing method according to claim 1, wherein the main image is an image captured by a first imager in an imaging module, and the secondary image is an image captured by a second imager in the imaging module; the first imager is a main imager for capturing the main image, the second imager is a secondary imager for capturing the secondary image, and the secondary image is used to assist in calculating parallax information of the main image.
6. The image processing method according to claim 2, wherein the step of determining the main-image grayscale map corresponding to the main image and the secondary-image grayscale map corresponding to the secondary image comprises:
denoising the main image, and performing color conversion on the denoised main image to obtain the main-image grayscale map;
and performing color conversion on the secondary image, and denoising the color-converted secondary image to obtain the secondary-image grayscale map.
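Note the asymmetry in this claim: the main image is denoised before color conversion, while the secondary image is converted first and denoised afterwards. A sketch with an assumed non-local-means denoiser (the patent does not name one) might be:

```python
import cv2

main_img = cv2.imread("main.png")       # hypothetical file names
sec_img = cv2.imread("secondary.png")

# Main image: denoise in color space first, then convert to grayscale.
main_dn = cv2.fastNlMeansDenoisingColored(main_img, None, 3, 3, 7, 21)
gray_main = cv2.cvtColor(main_dn, cv2.COLOR_BGR2GRAY)

# Secondary image: convert to grayscale first, then denoise.
gray_sec = cv2.cvtColor(sec_img, cv2.COLOR_BGR2GRAY)
gray_sec = cv2.fastNlMeansDenoising(gray_sec, None, 3, 7, 21)
```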
7. The image processing method according to claim 6, wherein the step of determining an edge image of the main image and adjusting the second depth image according to the edge image to obtain a third depth image comprises:
extracting an edge image from the denoised main image;
performing image morphological dilation on the edge image according to a preset dilation parameter to obtain a dilated image;
and performing an element-by-element logical operation on the dilated image and the second depth image to obtain the third depth image.
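The claim does not specify which element-by-element logical operation is used; the sketch below assumes a bitwise AND that retains the second depth image's values at dilated edge pixels, with Canny edges and a 5x5 rectangular kernel standing in for the unspecified edge detector and preset dilation parameter:

```python
import cv2

gray_main = cv2.imread("gray_main.png", cv2.IMREAD_GRAYSCALE)        # denoised main image
second_depth = cv2.imread("second_depth.png", cv2.IMREAD_GRAYSCALE)  # second depth image

edges = cv2.Canny(gray_main, 50, 150)                       # edge image
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))  # preset dilation parameter
dilated = cv2.dilate(edges, kernel)                         # dilated image

# Assumed logical operation: keep depth only where the dilated edge mask is set.
third_depth = cv2.bitwise_and(second_depth, second_depth, mask=dilated)
```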
8. The image processing method according to claim 2, wherein the step of obtaining a target depth image according to the first depth image and the third depth image comprises:
acquiring, for each pair of corresponding pixels in the first depth image and the third depth image, a first pixel value and a second pixel value, respectively;
and taking the maximum of the first pixel value and the second pixel value, and generating the target depth image according to these maximum values.
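A toy worked example of this per-pixel maximum on 2x2 depth images:

```python
import numpy as np

first = np.array([[3, 7], [5, 2]], dtype=np.float32)  # first depth image
third = np.array([[4, 6], [1, 9]], dtype=np.float32)  # third depth image
target = np.maximum(first, third)
# target == [[4, 7], [5, 9]]: each output pixel is the larger input value.
```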
9. The image processing method according to claim 3, wherein after the target depth image is obtained according to the first depth image and the second depth image, the method further comprises:
and filtering the target depth image according to the main image in the corrected image to obtain a filtered target depth image.
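One common choice for image-guided filtering of a depth map is a guided filter with the main image as guide; the sketch below assumes the opencv-contrib-python package (for cv2.ximgproc) and illustrative radius/eps values, and is not asserted to be the patent's filter:

```python
import cv2

main_rect = cv2.imread("main_rectified.png")                   # main image of the corrected pair
target = cv2.imread("target_depth.png", cv2.IMREAD_GRAYSCALE)  # fused target depth image

# eps scales with the square of the intensity range (8-bit images here).
filtered = cv2.ximgproc.guidedFilter(guide=main_rect, src=target,
                                     radius=8, eps=100.0)
```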
10. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a main image and a secondary image;
a first determining module, configured to determine a first depth image corresponding to the main image and the secondary image, a main-image grayscale map corresponding to the main image, and a secondary-image grayscale map corresponding to the secondary image;
a second determining module, configured to perform binocular depth estimation according to the main-image grayscale map and the secondary-image grayscale map to determine a second depth image;
and a fusion module, configured to obtain a target depth image according to the first depth image and the second depth image.
11. A storage medium having stored thereon a computer program, characterized in that, when the computer program runs on a computer, it causes the computer to execute the image processing method according to any one of claims 1 to 9.
12. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions, wherein the instructions in the memory are loaded by the processor for performing the steps of:
acquiring a main image and a secondary image;
determining a first depth image corresponding to the main image and the secondary image, a main-image grayscale map corresponding to the main image, and a secondary-image grayscale map corresponding to the secondary image;
performing binocular depth estimation according to the main-image grayscale map and the secondary-image grayscale map to determine a second depth image;
and obtaining a target depth image according to the first depth image and the second depth image.
CN202110778658.5A 2021-07-09 2021-07-09 Image processing method, image processing device, storage medium and electronic equipment Pending CN115601409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110778658.5A CN115601409A (en) 2021-07-09 2021-07-09 Image processing method, image processing device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115601409A true CN115601409A (en) 2023-01-13

Family

ID=84841293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110778658.5A Pending CN115601409A (en) 2021-07-09 2021-07-09 Image processing method, image processing device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115601409A (en)

Similar Documents

Publication Publication Date Title
US10540806B2 (en) Systems and methods for depth-assisted perspective distortion correction
CN107370958B (en) Image blurs processing method, device and camera terminal
US9591237B2 (en) Automated generation of panning shots
US20180218485A1 (en) Method and apparatus for fusing plurality of depth images
US9762881B2 (en) Reducing disparity and depth ambiguity in three-dimensional (3D) images
US20140198977A1 (en) Enhancement of Stereo Depth Maps
JP6998388B2 (en) Methods and equipment for processing image property maps
US20110148868A1 (en) Apparatus and method for reconstructing three-dimensional face avatar through stereo vision and face detection
CN107316326B (en) Edge-based disparity map calculation method and device applied to binocular stereo vision
CN111091592B (en) Image processing method, image processing apparatus, electronic device, and readable storage medium
CN111402170B (en) Image enhancement method, device, terminal and computer readable storage medium
US10230935B2 (en) Method and a system for generating depth information associated with an image
CN106981078B (en) Sight line correction method and device, intelligent conference terminal and storage medium
CN113850367B (en) Network model training method, image processing method and related equipment thereof
CN109040596B (en) Method for adjusting camera, mobile terminal and storage medium
CN112368710B (en) Method for combining contents from multiple frames and electronic device thereof
WO2020057248A1 (en) Image denoising method and apparatus, and device and storage medium
US20160180514A1 (en) Image processing method and electronic device thereof
US20140192163A1 (en) Image pickup apparatus and integrated circuit therefor, image pickup method, image pickup program, and image pickup system
CN111031241B (en) Image processing method and device, terminal and computer readable storage medium
CN112529773B (en) QPD image post-processing method and QPD camera
CN110766729A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111161136A (en) Image blurring method, image blurring device, image blurring equipment and storage device
CN111105370A (en) Image processing method, image processing apparatus, electronic device, and readable storage medium
CN115601409A (en) Image processing method, image processing device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication