CN114255268A - Disparity map processing and deep learning model training method and related equipment

Disparity map processing and deep learning model training method and related equipment

Info

Publication number
CN114255268A
CN114255268A (application CN202011018154.5A)
Authority
CN
China
Prior art keywords
disparity map
sample
image pair
learning model
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011018154.5A
Other languages
Chinese (zh)
Inventor
孟俊彪
胡锦丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202011018154.5A
Publication of CN114255268A
Legal status: Pending

Classifications

    • G06T7/55: Depth or shape recovery from multiple images
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20228: Disparity calculation for image-based rendering


Abstract

The application discloses a disparity map processing and deep learning model training method and related equipment, belonging to the technical field of image processing. The method comprises the following steps: acquiring a first disparity map, wherein the first disparity map is determined according to an image pair to be processed; and inputting the first disparity map into a trained deep learning model for processing and outputting a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map. With the trained deep learning model of the application, a disparity map of high image quality can be obtained in a short time.

Description

Disparity map processing and deep learning model training method and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method for disparity map processing and deep learning model training and a related device.
Background
Dual-camera configurations are increasingly widespread in mobile phones and other terminals. At present, to improve the shooting effect, double-shot blurring (bokeh) can be used to narrow the depth of field and keep the subject in focus. To implement double-shot blurring, the disparity map of the main and secondary shot images must be calculated, depth estimation performed from the disparity map, and background blurring performed on the basis of the depth estimate.
However, owing to repeated texture, weak texture, overexposure, noise and the like, the image quality of a disparity map computed directly by a stereo matching algorithm rarely meets the requirement, and a filter is often needed for optimization; for example, a least squares filter, a joint bilateral filter or a median filter may be used. These filters suffer from long processing times, poor filtering effects and other problems.
Disclosure of Invention
The application provides a disparity map processing and deep learning model training method and related equipment, which are used for obtaining a disparity map with high image quality in a short time.
In a first aspect, a disparity map processing method is provided, including:
acquiring a first disparity map, wherein the first disparity map is determined according to the image pair to be processed;
and inputting the first disparity map into the trained deep learning model for processing, and outputting a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map.
In the application, after the first disparity map of the image pair to be processed is obtained, the first disparity map is input into the trained deep learning model, and then the second disparity map with higher image quality can be obtained in a shorter time.
In a second aspect, a deep learning model training method is provided, including:
obtaining a plurality of sample image pairs, wherein the image quality of a final disparity map in each sample image pair is higher than that of an initial disparity map;
using a plurality of initial disparity maps as input data in a training sample, and using a plurality of final disparity maps as sample marks in the training sample;
and training the untrained deep learning model by using the training sample to obtain the trained deep learning model.
In the present application, after a plurality of sample image pairs are acquired, the plurality of sample image pairs are determined as training samples, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map. And then, training the untrained deep learning model by using the training sample to obtain the trained deep learning model. The trained deep learning model can obtain a disparity map with higher image quality in a shorter time.
In a third aspect, a disparity map processing apparatus is provided, including:
the first disparity map acquisition module is used for acquiring a first disparity map, and the first disparity map is determined according to the image pair to be processed;
and the disparity map processing module is used for inputting the first disparity map into the trained deep learning model for processing and outputting a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map.
In a fourth aspect, a deep learning model training device is provided, which includes:
the second disparity map acquisition module is used for acquiring a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map;
the training sample acquisition module is used for taking the initial disparity maps as input data in a training sample and taking the final disparity maps as sample marks in the training sample;
and the training module is used for training the untrained deep learning model by using the training sample to obtain the trained deep learning model.
In a fifth aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when executed by the processor, the computer program implements the disparity map processing method described above.
In a sixth aspect, a computer device is provided, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, and which when executed by the processor implements the deep learning model training method described above.
In a seventh aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the disparity map processing method described above.
In an eighth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the deep learning model training method described above.
In a ninth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the disparity map processing method described above.
In a tenth aspect, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the deep learning model training method described above is provided.
It is understood that, for the beneficial effects of the third aspect, the fifth aspect, the seventh aspect and the ninth aspect, reference may be made to the description of the first aspect, and details are not described herein again. For the beneficial effects of the fourth aspect, the sixth aspect, the eighth aspect and the tenth aspect, reference may be made to the description related to the second aspect, and details are not repeated here.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a deep learning model training method provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a deep learning model provided in an embodiment of the present application;
fig. 3 is a flowchart of a disparity map processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a disparity map provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a deep learning model training apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a disparity map processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of another computer device provided in the embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that "a plurality" in this application means two or more. In the description of the present application, "/" denotes "or"; for example, A/B may denote A or B. "And/or" merely describes an association between objects and covers three cases: for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, to describe the technical solutions of the present application clearly, the terms "first", "second" and the like are used to distinguish identical or similar items with substantially the same functions and effects. Those skilled in the art will appreciate that these terms do not denote any order, quantity or importance.
Before explaining the embodiments of the present application in detail, an application scenario of the embodiments of the present application will be described.
To improve the shooting effect, double-shot blurring can be used to narrow the depth of field and keep the subject in focus. Double-shot blurring is implemented from the depth-of-field images of the main and secondary shot images. Currently, in the depth estimation process for the main and secondary shot images, the disparity map of the two images can be calculated through a stereo matching algorithm. By the optical principle, disparity is inversely proportional to the actual distance of the object, so a corresponding depth image can be obtained from the disparity map. However, owing to repeated texture, weak texture, overexposure, noise and the like, the image quality of a disparity map computed directly by a stereo matching algorithm rarely meets the requirement, and a filter is often needed for optimization; for example, a least squares filter, a joint bilateral filter or a median filter may be used. These filters suffer from long processing times, poor filtering effects and other problems.
Therefore, an embodiment of the application provides a disparity map processing method that obtains a high-quality disparity map of an image pair through a trained deep learning model. The trained deep learning model is obtained by determining an initial disparity map of a sample image pair and a final disparity map of higher image quality than the initial disparity map as the training sample, and then training an untrained deep learning model with that sample; as a result, the trained deep learning model can produce a disparity map of higher image quality in a shorter time.
The training process of the deep learning model will be described in detail below.
Fig. 1 is a flowchart of a deep learning model training method according to an embodiment of the present disclosure. Referring to fig. 1, the method comprises the steps of:
step 101: the server acquires a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map.
The two sample images in each of the plurality of sample image pairs are an initial disparity map and a final disparity map. The plurality of sample image pairs correspond one-to-one to a plurality of first image pairs, and the initial disparity map and the final disparity map in each sample image pair are both disparity maps of the corresponding first image pair.
The two images in a first image pair are acquired in the same scene at different angles; that is, the first image pair is obtained by shooting the same target with two different cameras. The first image pair may consist of images captured by two cameras of an electronic device, i.e., it may be a binocular image pair. For example, the first image pair may consist of an image captured by the highest-resolution camera among the plurality of cameras of a mobile phone and an image captured by one of the other cameras.
The disparity map of the first image pair takes one image of the pair as the reference image (i.e., the main image): its size is the size of the reference image, and each of its pixel values is a disparity value. The image quality of a disparity map may be measured by one or more of edge fineness, accuracy, noise, sharpness and the like.
The image quality of the final disparity map is higher than that of the initial disparity map, which means that the final disparity map is better than the initial disparity map in one or more aspects of edge fineness, accuracy, noise, sharpness and the like. Compared with the initial disparity map, the final disparity map can better meet the implementation requirement of double-shot blurring.
For any sample image pair among the plurality of sample image pairs, referred to below as sample image pair A, the server obtains the initial disparity map and the final disparity map in sample image pair A as follows.
The initial disparity map in sample image pair A can be calculated directly in a relatively simple manner. Optionally, the server may acquire a first image pair and determine the initial disparity map in sample image pair A according to the first image pair and a prestored stereo matching algorithm.
A stereo matching algorithm finds, for each pixel point in one image, the corresponding pixel point in the image from the other viewpoint, and calculates the disparity map of the two images, from which a depth image can be estimated. For example, the stereo matching algorithm may be a semi-global block matching (SGBM) algorithm, a block matching (BM) algorithm, a graph cuts (GC) algorithm, and the like, which is not limited in this embodiment.
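For illustration, the following is a minimal Python sketch of computing a disparity map with OpenCV's SGBM matcher, one of the algorithms named above. The file names and matcher parameters are illustrative assumptions, not values given in this application.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # reference (main) image
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # secondary image

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5)
# SGBM returns fixed-point disparities scaled by 16
disparity = matcher.compute(left, right).astype("float32") / 16.0
```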
In one possible case, the operation of the server determining the initial disparity map in sample image pair A according to the first image pair and the prestored stereo matching algorithm may be: the server directly obtains the disparity map of the first image pair through the prestored stereo matching algorithm and uses it as the initial disparity map in sample image pair A.
In another possible case, the operation of the server determining the initial disparity map in sample image pair A according to the first image pair and the prestored stereo matching algorithm may be: the server determines the initial disparity map in sample image pair A according to the first image pair and a multi-scale stereo matching algorithm.
A multi-scale stereo matching algorithm decomposes the images into multiple scales, obtains disparity maps through a stereo matching algorithm, and combines the information of the scales. A disparity map obtained in this way can combine the fineness of the large scale with the accuracy of the small scale.
The operation of the server determining the initial disparity map in sample image pair A according to the first image pair and the multi-scale stereo matching algorithm may be:
the server determines a third disparity map according to the first image pair and a prestored first stereo matching algorithm;
the server determines a fourth disparity map according to a second image pair and a prestored second stereo matching algorithm, wherein the second image pair is obtained by reducing the size of each image in the first image pair by a factor of k, and k is an integer greater than or equal to 1;
the server enlarges the size of the fourth disparity map by a factor of k to obtain a fifth disparity map;
and the server determines the initial disparity map in sample image pair A according to the pixel values of the pixel points in the third disparity map and the fifth disparity map.
When the server determines the third disparity map according to the first image pair and a prestored first stereo matching algorithm, the server can obtain the disparity map of the first image pair as the third disparity map through the first stereo matching algorithm.
When the server determines the fourth disparity map according to the second image pair and a pre-stored second stereo matching algorithm, the server may obtain the disparity map of the second image pair as the fourth disparity map through the second stereo matching algorithm. The first stereo matching algorithm and the second stereo matching algorithm may be the same or different.
k may be set in advance; for example, k may be 2, in which case each image in the second image pair is at 1/2 the size of the corresponding image in the first image pair.
Reducing the size of each image in the first image pair by a factor of k yields the second image pair, and enlarging the disparity map of the second image pair (i.e., the fourth disparity map) by a factor of k yields the fifth disparity map, so the disparity map of the first image pair (i.e., the third disparity map) and the fifth disparity map have the same size.
Since the third disparity map is the disparity map of the first image pair, while the fifth disparity map is the disparity map of the second image pair enlarged by a factor of k, the two are disparity maps at different scales; the initial disparity map in sample image pair A can therefore be determined by combining the pixel values of the pixel points in the third disparity map and the pixel values of the pixel points in the fifth disparity map.
Optionally, the operation of the server determining the initial disparity map in sample image pair A according to the pixel values of the pixel points in the third disparity map and the pixel values of the pixel points in the fifth disparity map may be:
the server determines the first pixel point at each position in the third disparity map and the second pixel point at the corresponding position in the fifth disparity map;
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is smaller than the reference difference, the server takes the pixel value of the first pixel point as the pixel value of the pixel point at that position in the initial disparity map in sample image pair A; or,
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is larger than the reference difference, the server takes the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map in sample image pair A; or,
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is equal to the reference difference, the server takes the pixel value of the first pixel point or the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map in sample image pair A.
For any position in the disparity map, the server determines the pixel value of the pixel point at that position in the initial disparity map in sample image pair A according to the pixel value of the first pixel point at that position in the third disparity map and the pixel value of the second pixel point at that position in the fifth disparity map.
The reference difference may be preset and may be set relatively large; for example, the reference difference may be 100, 120 or the like, which is not limited in this embodiment of the application.
If the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is smaller than the reference difference, the two values are close. In this case, the pixel value of the first pixel point in the third disparity map, which corresponds to the original-size image pair, can be taken as the pixel value of that pixel point in the initial disparity map in sample image pair A, so that the initial disparity map inherits the fineness of the large scale.
If the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is larger than the reference difference, the two values differ considerably. In this case, the pixel value of the second pixel point in the fifth disparity map, which corresponds to the reduced image pair, can be taken as the pixel value of that pixel point in the initial disparity map in sample image pair A, so that the initial disparity map inherits the accuracy of the small scale.
If the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is equal to the reference difference, the two values do not differ much, and taking either the pixel value of the first pixel point in the third disparity map or that of the second pixel point in the fifth disparity map makes little difference; either may therefore be used as the pixel value of that pixel point in the initial disparity map in sample image pair A.
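For illustration, the following is a sketch of the multi-scale scheme above with k = 2: compute the third disparity map at full size and the fourth at half size, enlarge the fourth to obtain the fifth, and select per pixel using the reference difference. The reference difference of 100 is one of the example values above; reusing the same matcher at both scales, and scaling the disparity values by k when enlarging, are simplifying assumptions the text leaves open.

```python
import cv2
import numpy as np

def multiscale_initial_disparity(left, right, matcher, ref_diff=100.0, k=2):
    d3 = matcher.compute(left, right).astype(np.float32) / 16.0  # third map
    small_l = cv2.resize(left, None, fx=1.0 / k, fy=1.0 / k)
    small_r = cv2.resize(right, None, fx=1.0 / k, fy=1.0 / k)
    d4 = matcher.compute(small_l, small_r).astype(np.float32) / 16.0
    # enlarge the fourth map by k; disparity values scale with image width
    d5 = cv2.resize(d4, (d3.shape[1], d3.shape[0])) * k          # fifth map
    # keep the fine full-scale value where the two maps agree, otherwise
    # fall back on the more robust small-scale value (equality: either works)
    return np.where(np.abs(d3 - d5) < ref_diff, d3, d5)
```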
The final disparity map in sample image pair A is obtained in a more elaborate manner; two possible approaches, each yielding a final disparity map with better image quality, are described below.
In a first possible manner, the server processes the initial disparity map in sample image pair A through an edge-preserving filtering algorithm to obtain the final disparity map in sample image pair A.
An overall smoother disparity map with finer edges can be obtained with the edge-preserving filtering algorithm. For example, the edge-preserving filtering algorithm may include one or more of a WLS (weighted least squares) filtering algorithm, an FBS (fast bilateral solver) algorithm, and the like, which is not limited in this embodiment of the application.
For example, if the edge-preserving filtering algorithm includes a WLS filtering algorithm and an FBS algorithm, the server may perform WLS filtering on the initial disparity map in sample image pair A to obtain a first disparity map, which is smoother overall while its edges are preserved. Then, the server performs FBS processing on the first disparity map according to the main image in the first image pair to obtain a second disparity map as the final disparity map in sample image pair A; the edges of this final disparity map are finer. The main image in the first image pair is the image shot by the highest-resolution camera among the plurality of cameras.
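For illustration, the following is a sketch of the WLS-then-FBS pipeline just described, assuming the ximgproc module from opencv-contrib-python; the lambda, sigma and confidence values are illustrative assumptions, not values fixed by this application.

```python
import cv2
import numpy as np

def edge_preserving_refine(init_disp, main_image):
    # stage 1: WLS filtering smooths flat regions while keeping edges
    wls = cv2.ximgproc.createDisparityWLSFilterGeneric(False)
    wls.setLambda(8000.0)
    wls.setSigmaColor(1.5)
    d16 = (init_disp * 16.0).astype(np.int16)   # WLS expects 16x fixed point
    first = wls.filter(d16, main_image).astype(np.float32) / 16.0
    # stage 2: the fast bilateral solver refines edges using the main image
    conf = np.full(first.shape, 255, np.uint8)  # uniform confidence map
    second = cv2.ximgproc.fastBilateralSolverFilter(main_image, first, conf)
    return second                               # candidate final disparity map
```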
In a second possible manner, the server processes the initial disparity map in sample image pair A through an edge-preserving filtering algorithm to obtain a second disparity map, and then refines the second disparity map according to a received image processing instruction to obtain the final disparity map in sample image pair A.
An overall smoother disparity map with finer edges can be obtained with the edge-preserving filtering algorithm. For example, the edge-preserving filtering algorithm may include one or more of a WLS filtering algorithm, an FBS algorithm, and the like, which is not limited in this embodiment.
For example, if the edge-preserving filtering algorithm includes a WLS filtering algorithm and an FBS algorithm, the server may perform WLS filtering on the initial disparity map in sample image pair A to obtain a first disparity map, which is smoother overall while its edges are preserved. The server then performs FBS processing on the first disparity map according to the main image in the first image pair to obtain a second disparity map. The main image in the first image pair is the image shot by the highest-resolution camera among the plurality of cameras.
The image processing instruction instructs how to process the second disparity map, and may instruct image smoothing or image sharpening. For example, the image processing instruction may instruct smoothing of the image subject and background and sharpening of the image edges. The image processing instruction can be triggered by a technician through operations such as clicking, sliding, voice or motion-sensing operations, which is not specifically limited in this embodiment of the application.
Alternatively, the technician may trigger the image processing instruction in image processing software (e.g., Photoshop) to process the second disparity map, so as to obtain a disparity map with better image quality (e.g., higher edge fineness) as the final disparity map.
Step 102: the server takes a plurality of initial disparity maps as input data in a training sample, and a plurality of final disparity maps as sample labels in the training sample.
The training sample is the sample used for model training and includes input data and sample labels. The training sample includes the plurality of sample image pairs: the initial disparity map in each sample image pair is input data, and the final disparity map in each sample image pair is a sample label. That is, the input data in the training sample are all the initial disparity maps in the plurality of sample image pairs, and the sample labels in the training sample are all the final disparity maps in the plurality of sample image pairs.
Step 103: and the server trains the untrained deep learning model by using the training sample to obtain the trained deep learning model.
The deep learning model in the embodiment of the application may include a plurality of network layers, and the plurality of network layers include an input layer, a plurality of hidden layers, and an output layer. The input layer is responsible for receiving input data; the output layer is responsible for outputting the processed data; the plurality of hidden layers are positioned between the input layer and the output layer and are responsible for processing data, and the plurality of hidden layers are invisible to the outside. For example, the deep learning model may be a convolutional neural network or the like.
A possible structure of the deep learning model is shown in fig. 2. The deep learning model comprises an input layer, a plurality of hidden layers and an output layer, where the hidden layers comprise, in order, a convolution layer, one or more hole (dilated) convolution layers (fig. 2 takes 4 hole convolution layers as an example), a deconvolution layer and a softmax layer, followed by the output layer.
The input layer is used for receiving input data. For example, the input layer may receive input data with a size of 640x480 and a channel number of 1, where 640 is the width, 480 is the height, and the channel number of the input data refers to the number of input maps.
The convolution layer is used for performing a convolution operation on the input data to obtain a plurality of first feature maps. For example, the convolution layer may convolve the input data (including but not limited to convolution with a 3x3 kernel and a stride of 2), perform batch normalization, and then apply an activation function (including but not limited to the leaky_relu activation function) to obtain first feature maps with a size of 320x240 and a channel number of 32, that is, 32 first feature maps of size 320x240.
The one or more hole convolution layers are used for performing hole convolution operations on the plurality of first feature maps to obtain a plurality of second feature maps. For example, to enlarge the receptive field and obtain global semantic information, each of the four hole convolution layers first performs a hole convolution on its input feature maps, then performs batch normalization, and then applies an activation function (including but not limited to the leaky_relu activation function), finally yielding second feature maps with a size of 320x240 and a channel number of 32, that is, 32 second feature maps of size 320x240. In one possible implementation, the first of the four hole convolution layers performs a hole convolution with a 3x3 kernel, a stride of 1 and a dilation rate of 1, followed by batch normalization and the leaky_relu activation function; the second hole convolution layer performs a hole convolution with a 3x3 kernel, a stride of 1 and a dilation rate of 2, followed by batch normalization and the leaky_relu activation function; the third hole convolution layer performs a hole convolution with a 3x3 kernel, a stride of 1 and a dilation rate of 4, followed by batch normalization and the leaky_relu activation function; and the fourth hole convolution layer performs a hole convolution with a 3x3 kernel, a stride of 1 and a dilation rate of 1, followed by batch normalization and the leaky_relu activation function.
The deconvolution layer is used for performing a deconvolution operation on the plurality of second feature maps to obtain n third feature maps. For example, the deconvolution layer may deconvolve the plurality of second feature maps with a 3x3 kernel and a stride of 2, and then apply an activation function (including but not limited to the leaky_relu activation function) to obtain third feature maps with a size of 640x480 and a channel number of 16, that is, 16 third feature maps of size 640x480.
The softmax layer is used for calculating a score for each of the n third feature maps to obtain n scores, where n is an integer greater than or equal to 4. When the softmax layer calculates the score of each of the n third feature maps, the softmax function may be used for each map. After the softmax layer calculates the score of each of the n third feature maps, n scores are obtained, which are equivalent to a matrix with 1 row and n columns. For example, for third feature maps of size 640x480 with 16 channels, the softmax layer performs a softmax operation over the 16 channels to calculate the score of each channel, obtaining 16 scores that correspond to a matrix with 1 row and 16 columns.
The output layer is used for transforming the n scores into a filter matrix with m rows and m columns, and multiplying the pixel value of each pixel point in the input data by the filter matrix one by one to obtain the processed data, where m is an integer and n is the square of m. When the output layer transforms the n scores into a filter matrix with m rows and m columns, the reshape function may be used. Then, for the pixel point at each position in the input data, the output layer multiplies the pixel value of that pixel point by the filter matrix to obtain the pixel value of the pixel point at that position in the processed data. In this way, the edge fineness of the data processed by the deep learning model can be improved. For example, after the softmax layer calculates the scores of the 16 channels, the output layer performs a reshape operation on the 16 scores to obtain a filter matrix with 4 rows and 4 columns, and then multiplies the pixel value of each pixel point in the input data by the filter matrix one by one to obtain the processed data.
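For illustration, the following is a minimal PyTorch sketch of the model described above, using the sizes given in the text: 640x480 single-channel input, a stride-2 convolution to 32 channels, four hole convolutions with dilation rates 1, 2, 4, 1, and a deconvolution back to 640x480 with 16 channels. The last step, pooling each of the 16 feature maps to one softmax score and applying the reshaped 4x4 filter matrix to the input as a convolution kernel, is one plausible reading of the output-layer description, not a detail fixed by this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(cin, cout, stride=1, dilation=1):
    # 3x3 convolution + batch normalization + leaky_relu activation
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=dilation,
                  dilation=dilation),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.1, inplace=True))

class DisparityRefineNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = conv_bn_act(1, 32, stride=2)        # 640x480 -> 320x240
        self.holes = nn.Sequential(                     # dilation 1, 2, 4, 1
            conv_bn_act(32, 32, dilation=1), conv_bn_act(32, 32, dilation=2),
            conv_bn_act(32, 32, dilation=4), conv_bn_act(32, 32, dilation=1))
        self.deconv = nn.Sequential(                    # 320x240 -> 640x480
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1),
            nn.LeakyReLU(0.1, inplace=True))

    def forward(self, x):                # x: (B, 1, 480, 640) disparity maps
        feats = self.deconv(self.holes(self.conv(x)))   # (B, 16, 480, 640)
        scores = F.softmax(feats.mean(dim=(2, 3)), dim=1)   # n = 16 scores
        kernel = scores.view(x.size(0), 1, 4, 4)        # m x m filter matrix
        xp = F.pad(x, (1, 2, 1, 2))      # pad for the even-sized 4x4 kernel
        out = F.conv2d(xp.view(1, -1, xp.size(2), xp.size(3)),
                       kernel, groups=x.size(0))        # per-sample filtering
        return out.view_as(x)
```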
The server trains the untrained deep learning model by using the training sample, and the operation of obtaining the trained deep learning model may be:
the server inputs the input data in the training sample into an untrained deep learning model for processing, and outputs the processed data;
the server determines a loss value between the processed data and the sample labels in the training sample through a prestored edge loss function;
and the server adjusts parameters in the untrained deep learning model according to the loss value to obtain the trained deep learning model.
The edge loss function is: Loss = L2(Ig, Io) + L2(Ig', Io')
where Loss is the loss value; Io is the processed data; Io' is the processed data minus first data, the first data being obtained by performing a Gaussian smoothing operation on the processed data; Ig is the sample label; Ig' is the sample label minus second data, the second data being obtained by performing a Gaussian smoothing operation on the sample label; and L2() is the L2 norm loss function.
In the edge loss function, Io' is the high-frequency information (i.e., the edges) of the processed data, and Ig' is the high-frequency information of the sample label. In this way, the L2 norm loss is computed between the processed data and the sample label, the L2 norm loss is computed between the high-frequency information of the processed data and that of the sample label, and the sum of the two losses is used as the loss value between the processed data and the sample label in the training sample, making the final loss value more accurate. Using this edge loss function therefore yields higher edge fineness in the data processed by the deep learning model.
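For illustration, the following is a sketch of the edge loss, using F.mse_loss as the L2 norm loss and a 5x5 Gaussian kernel for the smoothing operation; the kernel size and sigma are assumptions, as the text only specifies Gaussian smoothing.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=5, sigma=1.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def edge_loss(output, target, kernel=gaussian_kernel()):
    # Loss = L2(Ig, Io) + L2(Ig', Io'), with I' = I - gaussian_smooth(I)
    pad = kernel.size(-1) // 2
    def high_freq(img):
        smooth = F.conv2d(F.pad(img, [pad] * 4, mode='replicate'), kernel)
        return img - smooth            # high-frequency information (edges)
    return F.mse_loss(output, target) + F.mse_loss(high_freq(output),
                                                   high_freq(target))
```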
For the operation of the server adjusting the parameters in the untrained deep learning model according to the loss value, reference may be made to the related art; it is not described in detail in this embodiment of the application. For example, for any parameter in the untrained deep learning model, the server may obtain the partial derivative of the edge loss function with respect to that parameter according to the loss value, and subtract the product of the learning rate and this partial derivative from the parameter to obtain the adjusted parameter. The learning rate may be set in advance; for example, it may be 0.001, 0.000001 or the like.
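For illustration, the following sketch combines the two sketches above into one training step. Plain SGD with learning rate 0.001 mirrors the update rule just described (parameter minus learning rate times partial derivative); the optimizer choice is an assumption.

```python
import torch

model = DisparityRefineNet()                 # architecture sketch from above
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

def train_step(initial_maps, final_maps):
    # initial_maps: input data; final_maps: sample labels
    optimizer.zero_grad()
    loss = edge_loss(model(initial_maps), final_maps)
    loss.backward()        # partial derivatives of the edge loss function
    optimizer.step()       # parameter -= learning_rate * partial derivative
    return loss.item()
```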
In the embodiment of the application, the server acquires a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map. The server takes a plurality of initial disparity maps as input data in a training sample, and a plurality of final disparity maps as sample labels in the training sample. And then, the server trains the untrained deep learning model by using the training sample to obtain the trained deep learning model. The trained deep learning model can obtain a disparity map with higher image quality in a shorter time.
The following describes the process of using the deep learning model obtained by training in the embodiment of fig. 1.
Fig. 3 is a flowchart of a disparity map processing method according to an embodiment of the present application. Referring to fig. 3, the method includes the following steps.
Step 301: the terminal acquires a first disparity map, which is determined according to the image pair to be processed.
The two images in the image pair to be processed are two images of the same scene acquired at different angles; that is, the image pair to be processed is obtained by shooting the same target with two different cameras. The image pair may consist of images captured by two cameras of an electronic device, i.e., it may be a binocular image pair. For example, the pair may consist of an image captured by the highest-resolution camera among the plurality of cameras of a mobile phone and an image captured by one of the other cameras. Alternatively, the image pair may be one that requires double-shot blurring.
The first disparity map can be calculated directly in a relatively simple manner. Optionally, the operation of the terminal acquiring the first disparity map may be: the terminal determines the first disparity map according to the image pair to be processed and a prestored stereo matching algorithm.
A stereo matching algorithm finds, for each pixel point in one image, the corresponding pixel point in the image from the other viewpoint, and calculates the disparity map of the two images, from which a depth image can be estimated. For example, the stereo matching algorithm may be an SGBM algorithm, a BM algorithm, a GC algorithm, and the like, which is not limited in this embodiment.
The operation of the terminal determining the first disparity map according to the image pair to be processed and a prestored stereo matching algorithm is similar to the operation, in the embodiment of fig. 1, of the server determining the initial disparity map in sample image pair A according to the first image pair and a prestored stereo matching algorithm, and is not described again in this embodiment of the application.
Step 302: and the terminal inputs the first disparity map into the trained deep learning model for processing, and outputs a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map.
The trained deep learning model is obtained by training through the deep learning model training method described in the embodiment of fig. 1. That is, the trained deep learning model is obtained by training an untrained deep learning model using a training sample, where the training sample includes a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map.
In practical application, the terminal can calculate the first disparity map of the image pair in a simple manner, and then input the first disparity map into the trained deep learning model, so that the second disparity map with higher image quality can be obtained in a shorter time.
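For illustration, the following is an end-to-end sketch of this terminal-side flow: compute the first disparity map with SGBM, then refine it with the trained model. The model file name, its loading via TorchScript and the preprocessing are assumptions for illustration only.

```python
import cv2
import torch

model = torch.jit.load("disparity_refine.pt").eval()  # hypothetical model file

left = cv2.imread("main.png", cv2.IMREAD_GRAYSCALE)   # main camera image
right = cv2.imread("aux.png", cv2.IMREAD_GRAYSCALE)   # secondary camera image
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
first = sgbm.compute(left, right).astype("float32") / 16.0  # first disparity map

with torch.no_grad():
    inp = torch.from_numpy(first)[None, None]   # shape (1, 1, H, W)
    second = model(inp).squeeze().numpy()       # higher-quality second map
```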
For example, referring to fig. 4, a first disparity map of the image pair may be as shown in the left image in fig. 4; after the first disparity map is input into the trained deep learning model, a second disparity map as shown in the right image in fig. 4 can be obtained. It can be seen that after processing by the trained deep learning model, the edge errors of the disparity map are visibly reduced, and the processing takes only about 400 milliseconds on the mobile phone.
In the embodiment of the application, after the terminal acquires the first disparity map of the image pair to be processed, the first disparity map is input into the trained deep learning model, and then the second disparity map with higher image quality can be acquired in a shorter time.
Fig. 5 is a schematic structural diagram of a deep learning model training apparatus according to an embodiment of the present application. Referring to fig. 5, the apparatus includes a second disparity map obtaining module 501, a training sample obtaining module 502, and a training module 503, wherein:
a second disparity map obtaining module 501, configured to obtain a plurality of sample image pairs, where an image quality of a final disparity map in each sample image pair is higher than an image quality of an initial disparity map;
a training sample obtaining module 502, configured to use a plurality of initial disparity maps as input data in a training sample, and use a plurality of final disparity maps as sample labels in the training sample;
the training module 503 is configured to train the untrained deep learning model by using the training sample, so as to obtain a trained deep learning model.
Optionally, sample image pair A is any one of the plurality of sample image pairs, and the second disparity map obtaining module 501 is configured to:
acquiring a first image pair, wherein the first image pair is obtained by shooting the same target through two different cameras;
determining an initial disparity map in sample image pair A according to the first image pair and a prestored stereo matching algorithm.
Optionally, the second disparity map obtaining module 501 is configured to:
carrying out weighted least square filtering processing on the initial disparity map in the sample image pair A to obtain a first disparity map;
performing fast bilateral filtering processing on the first disparity map according to the images in the first image pair to obtain a second disparity map;
and refining the second disparity map according to a received image processing instruction to obtain a final disparity map in the sample image pair A, wherein the image processing instruction is used for instructing to perform image smoothing processing or image sharpening processing.
Optionally, the second disparity map obtaining module 501 is configured to:
determining a third disparity map according to the first image pair and a pre-stored first stereo matching algorithm;
determining a fourth disparity map according to a second image pair and a prestored second stereo matching algorithm, wherein the second image pair is obtained by reducing the size of each image in the first image pair by a factor of k, and k is an integer greater than or equal to 1;
enlarging the size of the fourth disparity map by a factor of k to obtain a fifth disparity map;
and determining the initial disparity map of the sample image pair A according to the pixel values of the pixel points in the third disparity map and the pixel values of the pixel points in the fifth disparity map.
Optionally, the second disparity map obtaining module 501 is configured to:
determining first pixel points at various positions in the third disparity map and second pixel points at corresponding positions in the fifth disparity map;
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is smaller than the reference difference, taking the pixel value of the first pixel point as the pixel value of the pixel point at that position in the initial disparity map of sample image pair A; or,
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is larger than the reference difference, taking the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map of sample image pair A; or,
if the difference between the pixel value of the first pixel point and the pixel value of the second pixel point is equal to the reference difference, taking the pixel value of the first pixel point or the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map of sample image pair A.
Optionally, the training module 503 is configured to:
inputting the input data in the training sample into an untrained deep learning model for processing, and outputting the processed data;
determining a loss value between the processed data and the sample labels in the training sample through a prestored edge loss function;
adjusting parameters in the untrained deep learning model according to the loss value to obtain a trained deep learning model;
wherein the edge loss function is: Loss = L2(Ig, Io) + L2(Ig', Io')
where Loss is the loss value; Io is the processed data; Io' is the processed data minus first data, the first data being obtained by performing a Gaussian smoothing operation on the processed data; Ig is the sample label; Ig' is the sample label minus second data, the second data being obtained by performing a Gaussian smoothing operation on the sample label; and L2() is the L2 norm loss function.
In the embodiment of the application, a plurality of sample image pairs are obtained, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map. The initial disparity maps are used as input data in a training sample, and the final disparity maps are used as sample labels in the training sample. And then, training the untrained deep learning model by using the training sample to obtain the trained deep learning model. The trained deep learning model can obtain a disparity map with higher image quality in a shorter time.
It should be noted that: in the deep learning model training apparatus provided in the above embodiment, only the division of the above functional modules is used for illustration when the deep learning model is trained, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions. In addition, the deep learning model training device provided by the above embodiment and the deep learning model training method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment, and is not described herein again.
Fig. 6 is a schematic structural diagram of a disparity map processing apparatus according to an embodiment of the present application. Referring to fig. 6, the apparatus includes a first disparity map obtaining module 601 and a disparity map processing module 602, wherein:
a first disparity map obtaining module 601, configured to obtain a first disparity map, where the first disparity map is determined according to a pair of images to be processed;
the disparity map processing module 602 is configured to input the first disparity map into the trained deep learning model for processing, and output a second disparity map, where an image quality of the second disparity map is higher than an image quality of the first disparity map.
Optionally, the first disparity map obtaining module 601 is configured to:
and determining a first disparity map according to the image pair to be processed and a prestored stereo matching algorithm.
In the embodiment of the application, after the first disparity map of the image pair to be processed is obtained, the first disparity map is input into the trained deep learning model, and then the second disparity map with higher image quality can be obtained in a shorter time.
Optionally, the trained deep learning model is obtained by training an untrained deep learning model using a training sample, where the training sample includes a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map.
It should be noted that: in the disparity map processing apparatus provided in the above embodiment, only the division of the above functional modules is illustrated in the disparity map processing, and in practical applications, the above functions may be distributed by different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions. In addition, the disparity map processing apparatus provided in the above embodiment and the disparity map processing method embodiment belong to the same concept, and specific implementation processes thereof are described in the method embodiment in detail and are not described herein again.
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 7, the computer device 7 includes: at least one processor 70 (only one processor is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the steps in the deep learning model training method in the above-described embodiments being implemented when the computer program 72 is executed by the processor 70.
The computer device 7 may be a single server or a server cluster composed of a plurality of servers. Those skilled in the art will appreciate that fig. 7 is merely an example of the computer device 7 and does not constitute a limitation of it; the device may include more or fewer components than shown, or combine certain components, or use different components, such as input/output devices, network access devices, etc.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may in some embodiments be an internal storage unit of the computer device 7, such as a hard disk or memory of the computer device 7. In other embodiments, the memory 71 may be an external storage device of the computer device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the computer device 7. The memory 71 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as the program code of a computer program, and may also be used to temporarily store data that has been output or is to be output.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 8 includes: at least one processor 80 (only one processor is shown in fig. 8), a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the steps in the disparity map processing method in the above-described embodiments being implemented when the computer program 82 is executed by the processor 80.
The computer device 8 may be a terminal such as a desktop computer, a notebook computer, or a palmtop computer. Those skilled in the art will appreciate that fig. 8 is merely an example of the computer device 8 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components, such as input/output devices and network access devices.
The processor 80 may be a CPU, another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 81 may, in some embodiments, be an internal storage unit of the computer device 8, such as a hard disk or memory of the computer device 8. In other embodiments, the memory 81 may be an external storage device of the computer device 8, such as a plug-in hard disk, an SMC, an SD card, or a flash card equipped on the computer device 8. Further, the memory 81 may include both an internal storage unit and an external storage device of the computer device 8. The memory 81 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of a computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of the deep learning model training method or the disparity map processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It is noted that the computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions, which may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the deep learning model training method or the disparity map processing method in the above embodiments.
The above description is not intended to limit the present application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present application.

Claims (13)

1. A disparity map processing method, comprising:
acquiring a first disparity map, wherein the first disparity map is determined according to an image pair to be processed;
and inputting the first disparity map into a trained deep learning model for processing, and outputting a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map.
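By way of illustration only (not part of the claims), a minimal inference sketch of claim 1 in PyTorch could look as follows; the TorchScript file refiner.pt, the single-channel tensor layout, and the absence of normalization are all assumptions, since the claim does not fix the model architecture:

    import numpy as np
    import torch

    model = torch.jit.load("refiner.pt").eval()  # hypothetical trained model

    def refine(first_disparity: np.ndarray) -> np.ndarray:
        # Shape the first disparity map as a 1x1xHxW float tensor.
        x = torch.from_numpy(first_disparity.astype(np.float32))[None, None]
        with torch.no_grad():
            y = model(x)  # second disparity map, expected higher quality
        return y[0, 0].numpy()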
2. The method of claim 1, wherein obtaining the first disparity map comprises:
and determining a first disparity map according to the image pair to be processed and a pre-stored stereo matching algorithm.
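By way of illustration only, one common realization of a "pre-stored stereo matching algorithm" is semi-global block matching as shipped in OpenCV; the parameter values below are illustrative assumptions, not mandated by the claim:

    import cv2

    def first_disparity(left_path, right_path):
        left = cv2.imread(left_path, cv2.IMREAD_GRAYSCALE)
        right = cv2.imread(right_path, cv2.IMREAD_GRAYSCALE)
        # Semi-global block matching; numDisparities must be a multiple of 16.
        sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                     blockSize=5)
        # StereoSGBM returns fixed-point disparities scaled by 16.
        return sgbm.compute(left, right).astype("float32") / 16.0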
3. The method according to claim 1 or 2, wherein the trained deep learning model is obtained by training an untrained deep learning model by using a training sample, the training sample comprises a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map.
4. A deep learning model training method is characterized by comprising the following steps:
obtaining a plurality of sample image pairs, wherein the image quality of a final disparity map in each sample image pair is higher than that of an initial disparity map;
using a plurality of initial disparity maps as input data in a training sample, and using a plurality of final disparity maps as sample labels in the training sample;
and training the untrained deep learning model by using the training sample to obtain the trained deep learning model.
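By way of illustration only, the three steps of claim 4 amount to ordinary supervised training; the sketch below assumes the DisparityPairDataset from earlier, and the optimizer, learning rate, and loss (plain MSE here; claim 9 uses an edge loss) are assumptions:

    import torch
    from torch.utils.data import DataLoader

    def train(model, dataset, epochs=10):
        loader = DataLoader(dataset, batch_size=8, shuffle=True)
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = torch.nn.MSELoss()
        for _ in range(epochs):
            for initial, final in loader:   # input data / sample labels
                opt.zero_grad()
                loss = loss_fn(model(initial), final)
                loss.backward()
                opt.step()
        return model  # the trained deep learning model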
5. The method of claim 4, wherein a sample image pair A is any one of the plurality of sample image pairs, and obtaining the initial disparity map in the sample image pair A comprises:
acquiring a first image pair, wherein the first image pair is obtained by capturing the same target with two different cameras;
determining an initial disparity map in the sample image pair A according to the first image pair and a pre-stored stereo matching algorithm.
6. The method of claim 5, wherein obtaining the final disparity map in the sample image pair A comprises:
performing weighted least square filtering processing on the initial disparity map in the sample image pair A to obtain a first disparity map;
according to the images in the first image pair, performing fast bilateral filtering on the first disparity map to obtain a second disparity map;
and refining the second disparity map according to a received image processing instruction to obtain a final disparity map in the sample image pair A, wherein the image processing instruction is used for indicating image smoothing processing or image sharpening processing.
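By way of illustration only, the three filtering steps of claim 6 could be sketched with OpenCV as follows; this assumes the opencv-contrib-python build (for cv2.ximgproc), uses cv2.bilateralFilter as a stand-in for the fast bilateral filtering of the claim, and all filter parameters are assumptions:

    import cv2
    import numpy as np

    def final_disparity(initial_disp, left_img, sharpen=False):
        # Step 1: weighted least squares filtering (generic WLS disparity
        # filter from opencv-contrib; expects a CV_16S disparity map).
        wls = cv2.ximgproc.createDisparityWLSFilterGeneric(False)
        wls.setLambda(8000.0)
        wls.setSigmaColor(1.5)
        first = wls.filter(initial_disp.astype(np.int16),
                           left_img).astype(np.float32)

        # Step 2: bilateral filtering guided by image content.
        second = cv2.bilateralFilter(first, d=9, sigmaColor=25, sigmaSpace=25)

        # Step 3: refine per the received instruction: smooth or sharpen.
        if sharpen:  # unsharp masking
            blur = cv2.GaussianBlur(second, (0, 0), 3)
            return cv2.addWeighted(second, 1.5, blur, -0.5, 0)
        return cv2.GaussianBlur(second, (5, 5), 0)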
7. The method of claim 6, wherein determining the initial disparity map in the sample image pair A according to the first image pair and a pre-stored stereo matching algorithm comprises:
determining a third disparity map according to the first image pair and a prestored first stereo matching algorithm;
determining a fourth disparity map according to a second image pair and a pre-stored second stereo matching algorithm, wherein the second image pair is obtained by reducing the size of each image in the first image pair by a factor of k, and k is an integer greater than or equal to 1;
enlarging the fourth disparity map by a factor of k to obtain a fifth disparity map;
and determining the initial disparity map in the sample image pair A according to the pixel values of the pixel points in the third disparity map and the pixel values of the pixel points in the fifth disparity map.
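By way of illustration only, the two-scale matching of claim 7 could be sketched as follows; the choice of SGBM and block matching (BM) as the first and second algorithms is an assumption, and note that enlarging a disparity map also implies multiplying its values by k, a detail the claim leaves implicit:

    import cv2

    def two_scale_disparities(left, right, k=2):
        # left/right: rectified 8-bit grayscale images.
        # Third disparity map: first algorithm (SGBM) at full resolution.
        sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                     blockSize=5)
        third = sgbm.compute(left, right).astype("float32") / 16.0

        # Second image pair: each image reduced by a factor of k.
        h, w = left.shape[:2]
        left_s = cv2.resize(left, (w // k, h // k))
        right_s = cv2.resize(right, (w // k, h // k))

        # Fourth disparity map: second algorithm (BM) at low resolution.
        bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)
        fourth = bm.compute(left_s, right_s).astype("float32") / 16.0

        # Fifth disparity map: enlarge back to full size (values scaled by k).
        fifth = cv2.resize(fourth, (w, h)) * k
        return third, fifth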
8. The method of claim 7, wherein determining the initial disparity map in the sample image pair A according to the pixel values of the pixels in the third disparity map and the pixel values of the pixels in the fifth disparity map comprises:
determining a first pixel point at each position in the third disparity map and a second pixel point at the corresponding position in the fifth disparity map;
if the difference value between the pixel value of the first pixel point and the pixel value of the second pixel point is smaller than the reference difference value, taking the pixel value of the first pixel point as the pixel value of the pixel point at that position in the initial disparity map in the sample image pair A; or,
if the difference value between the pixel value of the first pixel point and the pixel value of the second pixel point is larger than the reference difference value, taking the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map in the sample image pair A; or,
and if the difference value between the pixel value of the first pixel point and the pixel value of the second pixel point is equal to the reference difference value, taking the pixel value of the first pixel point or the pixel value of the second pixel point as the pixel value of the pixel point at that position in the initial disparity map in the sample image pair A.
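By way of illustration only, the per-pixel selection rule of claim 8 reduces to a single vectorized comparison; the reference difference value of 2.0 is an assumption, and the "equal" case, which the claim allows to go either way, is folded into the first branch here:

    import numpy as np

    def fuse(third, fifth, ref_diff=2.0):
        # Keep the third-map pixel where the two maps agree to within
        # ref_diff (including equality); otherwise take the fifth-map pixel.
        return np.where(np.abs(third - fifth) <= ref_diff, third, fifth)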
9. The method of any one of claims 4-8, wherein training the untrained deep learning model using the training sample to obtain the trained deep learning model comprises:
inputting the input data in the training sample into an untrained deep learning model for processing, and outputting the processed data;
determining a loss value between the processed data and a sample label in the training sample through a pre-stored edge loss function;
adjusting parameters in the untrained deep learning model according to the loss value to obtain a trained deep learning model;
wherein the edge loss function is: Loss = L2(I_g, I_o) + L2(I_g', I_o'),
where Loss is the loss value; I_o is the processed data; I_o' is the processed data minus first data, the first data being obtained by performing a Gaussian smoothing operation on the processed data; I_g is the sample label; I_g' is the sample label minus second data, the second data being obtained by performing a Gaussian smoothing operation on the sample label; and L2( ) is the L2 norm loss function.
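By way of illustration only, the edge loss function of claim 9 could be sketched in PyTorch as below; mean squared error stands in for the L2 norm loss, and the Gaussian kernel size and sigma are assumptions:

    import torch
    import torch.nn.functional as F

    def gaussian_smooth(x, sigma=1.0, ksize=5):
        # Separable Gaussian blur for an Nx1xHxW tensor
        # (disparity maps are single-channel).
        half = ksize // 2
        t = torch.arange(ksize, dtype=x.dtype, device=x.device) - half
        g = torch.exp(-(t ** 2) / (2 * sigma ** 2))
        g = g / g.sum()
        x = F.conv2d(x, g.view(1, 1, 1, ksize), padding=(0, half))
        return F.conv2d(x, g.view(1, 1, ksize, 1), padding=(half, 0))

    def edge_loss(i_o, i_g):
        # Loss = L2(Ig, Io) + L2(Ig', Io'), where I' = I - gaussian(I).
        l2 = F.mse_loss
        return l2(i_o, i_g) + l2(i_o - gaussian_smooth(i_o),
                                 i_g - gaussian_smooth(i_g))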
10. A disparity map processing apparatus, comprising:
the first disparity map acquisition module is used for acquiring a first disparity map, and the first disparity map is determined according to the image pair to be processed;
and the disparity map processing module is used for inputting the first disparity map into a trained deep learning model for processing and outputting a second disparity map, wherein the image quality of the second disparity map is higher than that of the first disparity map.
11. A deep learning model training device, comprising:
the second disparity map acquisition module is used for acquiring a plurality of sample image pairs, and the image quality of the final disparity map in each sample image pair is higher than that of the initial disparity map;
the training sample acquisition module is used for taking the plurality of initial disparity maps as input data in a training sample and taking the plurality of final disparity maps as sample labels in the training sample;
and the training module is used for training the untrained deep learning model by using the training sample to obtain the trained deep learning model.
12. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, which computer program, when executed by the processor, implements the method according to any of claims 1-3 or any of claims 4-9.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-3 or any one of claims 4-9.
CN202011018154.5A 2020-09-24 2020-09-24 Disparity map processing and deep learning model training method and related equipment Pending CN114255268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011018154.5A CN114255268A (en) 2020-09-24 2020-09-24 Disparity map processing and deep learning model training method and related equipment


Publications (1)

Publication Number Publication Date
CN114255268A true CN114255268A (en) 2022-03-29

Family

ID=80790084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011018154.5A Pending CN114255268A (en) 2020-09-24 2020-09-24 Disparity map processing and deep learning model training method and related equipment

Country Status (1)

Country Link
CN (1) CN114255268A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908170A (en) * 2022-11-04 2023-04-04 浙江华诺康科技有限公司 Binocular image noise reduction method and device, electronic device and storage medium
CN115908170B (en) * 2022-11-04 2023-11-21 浙江华诺康科技有限公司 Noise reduction method and device for binocular image, electronic device and storage medium

Similar Documents

Publication Publication Date Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
Chen et al. Accurate light field depth estimation with superpixel regularization over partially occluded regions
CN110176012B (en) Object segmentation method in image, pooling method, device and storage medium
Jeon et al. Accurate depth map estimation from a lenslet light field camera
CN111815755A (en) Method and device for determining shielded area of virtual object and terminal equipment
CN109784250B (en) Positioning method and device of automatic guide trolley
WO2015106700A1 (en) Method and apparatus for implementing image denoising
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
CN107644423B (en) Scene segmentation-based video data real-time processing method and device and computing equipment
CN111080654A (en) Image lesion region segmentation method and device and server
CN107578053B (en) Contour extraction method and device, computer device and readable storage medium
CN108960012B (en) Feature point detection method and device and electronic equipment
CN107480673B (en) Method and device for determining interest region in medical image and image editing system
CN111145196A (en) Image segmentation method and device and server
WO2019200785A1 (en) Fast hand tracking method, device, terminal, and storage medium
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
CN114255268A (en) Disparity map processing and deep learning model training method and related equipment
Hallek et al. Real-time stereo matching on CUDA using Fourier descriptors and dynamic programming
CN110047126B (en) Method, apparatus, electronic device, and computer-readable storage medium for rendering image
WO2021097595A1 (en) Method and apparatus for segmenting lesion area in image, and server
CN116977671A (en) Target tracking method, device, equipment and storage medium based on image space positioning
WO2019148894A1 (en) Method and device for tracking measurement offset by using image patches, and storage medium
CN115797533A (en) Model edge tracing method, device, equipment and storage medium
CN117252914A (en) Training method and device of depth estimation network, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination