CN111292425B - View synthesis method based on monocular and binocular mixed data set - Google Patents

View synthesis method based on monocular and binocular mixed data set

Info

Publication number
CN111292425B
Authority
CN
China
Prior art keywords
binocular
image
disparity
monocular
pseudo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010072802.9A
Other languages
Chinese (zh)
Other versions
CN111292425A (en)
Inventor
肖春霞
李文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010072802.9A priority Critical patent/CN111292425B/en
Publication of CN111292425A publication Critical patent/CN111292425A/en
Application granted granted Critical
Publication of CN111292425B publication Critical patent/CN111292425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/97 Determining parameters from multiple pictures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20228 Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a view synthesis method based on a mixed monocular and binocular data set. A disparity estimation network is first pre-trained on a small-scale set of rectified left-right binocular image pairs; the pre-trained network is then used to generate a right image and a disparity label for each image in a large-scale monocular image set, forming a large-scale set of pseudo-binocular image pairs; a second disparity estimation network is trained on the generated pseudo-binocular pairs; and view synthesis is finally performed with disparity-map-based rendering. The invention has the following advantages: a disparity estimation network is pre-trained from only a small-scale set of left-right binocular images; a large-scale pseudo-binocular data set with disparity labels is generated from a large-scale monocular picture set; a disparity estimation network is then trained on this self-generated "pseudo data set". Because the proposed method trains the disparity estimation network with a small-scale set of left-right binocular image pairs plus a large-scale monocular image set, the data set is much easier to construct, and factors such as illumination inconsistency, camera motion and object motion do not need to be considered for the monocular image set.

Description

View synthesis method based on monocular and binocular mixed data set
Technical Field
The invention belongs to the field of computer vision and image rendering, and relates to a view synthesis method based on deep learning, in particular to a view synthesis method based on a small-scale binocular training set.
Background
View synthesis techniques are required in many everyday applications, such as virtual image rendering in virtual reality, 3D display, and 2D-to-3D video conversion. Existing view synthesis methods are mainly based on deep learning: a convolutional neural network is used as the image processing model to extract image features and estimate the depth information of the scene, and depth-map-based rendering is then used to generate an image from a new viewpoint. However, most existing deep-learning-based methods rely on binocular or multi-view data sets, and the required data sets are large. Although some large-scale binocular image data sets and monocular video data sets are available for training, the scenes they contain are relatively simple and homogeneous, which limits the generalization of the trained models. On the one hand, constructing a binocular or multi-view data set that covers diverse scenes consumes a large amount of time, labor and equipment; by comparison, a monocular picture data set is much easier to construct, since diverse single pictures only need to be collected from the internet. On the other hand, monocular video data sets suffer from camera motion, moving objects in the scene and similar conditions that make model training harder, whereas training with a monocular picture data set avoids these problems.
Disclosure of Invention
The invention aims to overcome the defects of existing methods and provides a view synthesis method based on a mixed data set consisting of a small-scale set of left-right binocular picture pairs and a large-scale monocular picture set.
The technical problem of the invention is mainly solved by the following technical scheme, and the view synthesis method based on the monocular and binocular mixed data set comprises the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular image pairs;
step 3, using the model pre-trained in step 2, treating every picture in the monocular image set of the mixed data set as a "left picture" and estimating a "pseudo-disparity map" for each picture;
step 4, generating a corresponding "pseudo right picture" from each monocular image and its estimated "pseudo-disparity map" by disparity-map-based rendering;
step 5, forming a "pseudo-binocular" data set with disparity labels from the monocular image set and the "pseudo-disparity maps" and "pseudo right pictures" generated in steps 3 and 4;
step 6, retraining a binocular disparity estimation network with the "pseudo-binocular" data set generated in step 5;
step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for an input pair of left and right binocular test pictures, and rendering based on the disparity maps to generate new view synthesis results along the camera baseline between the left and right pictures.
Further, the data set constructed in step 1 is a mixed data set of a small-scale set of left-right binocular image pairs and a large-scale monocular image set, where the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10^2 pairs, and the large-scale monocular image set is an image set collected from the internet containing various indoor and outdoor scenes, on the order of 10^4 images.
Further, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images in step 2, the left image is used as the network input and the right image is used for supervision; the network outputs left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering is used to generate a right image and a left image respectively. The process can be expressed as:

(D_l, D_r) = N_g(I_l)

Î_r(i, j) = I_l(i, j + D_r(i, j))

Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
Further, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images in step 2, the real left and right images are used as bidirectional supervision. Taking the supervision on the left image as an example, the specific implementation is as follows:

Step 2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted combination of the SSIM loss and the L1 loss is computed:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss.

Step 2.2, the gradient of the generated left disparity map is constrained with an edge-aware gradient smoothing term, so that the generated disparity map is sufficiently smooth:

L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^{-|∂_x I_l(i,j)|} + |∂_y D_l(i,j)| · e^{-|∂_y I_l(i,j)|} ]

where ∂ denotes the partial derivative, e is the base of the natural logarithm, and |·| denotes the absolute value.

Step 2.3, a consistency constraint is imposed on the generated left and right disparity maps, so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i,j) - D_r(i, j - D_l(i,j))|

Step 2.4, exchanging the roles of the left and right images in the loss functions of steps 2.1, 2.2 and 2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:

L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where α_ap, α_ds and α_lr are weights controlling the ratio of the three losses. The network N_g is supervised and its gradients are updated by minimizing L_total.
Further, each picture in the monocular image set in step 3 is regarded as a "left image", and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture. The process can be expressed as:

D_m = N_g(I_m)

where I_m denotes a picture in the monocular data set and D_m denotes the "pseudo-disparity map" predicted by feeding the monocular data set into the network N_g pre-trained in step 2.
Further, in step 4 a "pseudo right picture" is generated from the monocular image set and the "pseudo-disparity map" generated in step 3 using disparity-map-based rendering. The process is defined as:

Î_r(i, j) = I_m(i, j + D_m(i, j))

where Î_r denotes the generated "pseudo right picture" and (i, j) denotes the pixel coordinates of the picture.
further, step 5 uses the monocular image set and the "pseudo-disparity map" and the "pseudo-right map" generated in steps 3 and 4 to form a "pseudo-binocular" data set with disparity labels:
Figure BDA0002377717610000039
the data set is used as a data set for network training in the subsequent step, and the subsequent training of the parallax estimation network is converted into a supervised training process.
Further, in step 6 a binocular disparity estimation network is retrained on the "pseudo-binocular" data set generated in step 5, with the "pseudo-disparity maps" in the "pseudo-binocular" data set used as the supervision signal. The specific implementation is as follows:

Step 6.1, the left and right images in the "pseudo-binocular" data set are input into the network, and a disparity map is estimated:

D = N_a(I_m, Î_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity map predicted by the network for the left and right views.

Step 6.2, the predicted disparity map D is compared with the "pseudo-disparity map" D_m in the "pseudo-binocular" data set, and the L1 loss is computed:

L_1 = (1/N) Σ_{i,j} |D(i,j) - D_m(i,j)|

The network N_a is supervised and its gradients are updated by minimizing L_1.
Further, in step 7 the binocular disparity estimation network trained in step 6 takes a real-world left-right binocular image pair as input, estimates its disparity map, and uses disparity-map-based rendering to generate a series of intermediate view results along the camera baseline between the left and right images. The process is implemented as follows:

Step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity map of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity map estimated for (I_l, I_r).

Step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map of the view at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] denotes the relative position of the target view with respect to the left image on the camera baseline of the left and right images; for example, α = 0.5 means the distance from that position to the left image is 0.5 times the camera distance between the left and right images.

Step 7.3, the disparity map at position α generated in step 7.2 and disparity-map-based rendering are used to generate the image at position α:

I_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
Compared with the prior art, the invention has the following advantages:
1. The invention trains a disparity estimation network from a small-scale binocular data set (on the order of 10^2 image pairs);
2. The invention generates a large-scale "pseudo-binocular data set" with disparity labels from a large-scale monocular data set;
3. The invention trains a disparity estimation network on the self-generated "pseudo data set";
4. The invention proposes training a disparity estimation network with a small-scale binocular set and a large-scale monocular data set; such a data set is much easier to construct, and factors such as illumination inconsistency, camera motion and object motion do not arise in the monocular picture set.
Drawings
Fig. 1 is a general flow chart of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in Fig. 1, a view synthesis method based on a small-scale left-right binocular training set and a large-scale monocular training set includes the following steps:
Step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set. The specific implementation is as follows:
A small-scale set of left-right binocular image pairs, on the order of 10^2 pairs, is constructed and stereo-rectified, and an image set containing various indoor and outdoor scenes is collected from the internet to build a large-scale monocular image set on the order of 10^4 images.
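As a concrete illustration of how such a mixed data set might be organized for training, a minimal PyTorch Dataset sketch is given below. The directory layout (paired left/ and right/ folders plus a folder of single pictures), the class name MixedStereoMonoDataset and the transform handling are assumptions introduced here, not part of the patent.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class MixedStereoMonoDataset(Dataset):
    """Sketch of the mixed data set: ~10^2 rectified stereo pairs plus ~10^4 single pictures.
    Assumes matching file names in <stereo_dir>/left and <stereo_dir>/right."""

    def __init__(self, stereo_dir=None, mono_dir=None, transform=None):
        self.stereo_pairs = []
        if stereo_dir is not None:
            names = sorted(os.listdir(os.path.join(stereo_dir, "left")))
            self.stereo_pairs = [(os.path.join(stereo_dir, "left", n),
                                  os.path.join(stereo_dir, "right", n)) for n in names]
        self.mono_images = []
        if mono_dir is not None:
            self.mono_images = [os.path.join(mono_dir, n) for n in sorted(os.listdir(mono_dir))]
        self.transform = transform

    def __len__(self):
        return len(self.stereo_pairs) + len(self.mono_images)

    def __getitem__(self, idx):
        if idx < len(self.stereo_pairs):
            l_path, r_path = self.stereo_pairs[idx]
            left = Image.open(l_path).convert("RGB")
            right = Image.open(r_path).convert("RGB")
            if self.transform:
                left, right = self.transform(left), self.transform(right)
            return {"left": left, "right": right, "is_stereo": True}
        # Monocular pictures are later treated as "left" images (step 3).
        img = Image.open(self.mono_images[idx - len(self.stereo_pairs)]).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return {"left": img, "is_stereo": False}
```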
Step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images, where the network uses the existing DispNet network structure. The specific implementation is as follows:
Step 2.1, the left image is used as the network input; the network outputs left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering is used to generate a right image and a left image respectively. The process can be expressed as:

(D_l, D_r) = N_g(I_l)

Î_r(i, j) = I_l(i, j + D_r(i, j))

Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
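The "rendering based on the disparity maps" used above amounts to horizontally resampling one image according to a disparity map. Below is a minimal sketch of such a warp with bilinear sampling, assuming PyTorch tensors of shape (B, 3, H, W) for images and (B, 1, H, W) for disparities in pixels, and the sign convention of the equations above; it illustrates the operation rather than reproducing the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_by_disparity(src, disp, direction=+1.0):
    """Resample `src` horizontally by `disp` (in pixels).

    With direction=+1, pixel (i, j) of the output samples column j + disp(i, j) of `src`,
    matching Î_r(i, j) = I_l(i, j + D_r(i, j)); use direction=-1 for the symmetric
    left-image reconstruction Î_l(i, j) = I_r(i, j - D_l(i, j)).
    """
    b, _, h, w = src.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=src.dtype, device=src.device),
        torch.arange(w, dtype=src.dtype, device=src.device),
        indexing="ij",
    )
    xs = xs.unsqueeze(0).expand(b, -1, -1) + direction * disp.squeeze(1)
    ys = ys.unsqueeze(0).expand(b, -1, -1)
    # Normalize to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * xs / (w - 1) - 1.0
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(src, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Example: render a "right" image from a left image and a right-view disparity map.
# left = torch.rand(1, 3, 192, 640); disp_r = torch.rand(1, 1, 192, 640) * 30
# right_hat = warp_by_disparity(left, disp_r, direction=+1.0)
```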
Step 2.2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the real left and right images are used as bidirectional supervision. Taking the supervision on the left image as an example, the specific implementation is as follows:

Step 2.2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted combination of the SSIM loss and the L1 loss is computed:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss; here α = 0.85.
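A compact sketch of the weighted SSIM + L1 appearance loss of step 2.2.1 follows. The 3x3 average-pooled SSIM and its constants c1 and c2 are common simplifications assumed here; only the weight α = 0.85 is taken from the text.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM map computed with 3x3 average pooling, clamped to [0, 1]."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp(num / den, 0, 1)

def appearance_loss(img, img_hat, alpha=0.85):
    """Weighted combination of the SSIM and L1 terms, averaged over all pixels."""
    ssim_term = (1.0 - ssim(img, img_hat)) / 2.0
    l1_term = torch.abs(img - img_hat)
    return (alpha * ssim_term + (1.0 - alpha) * l1_term).mean()
```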
Step 2.2.2, the gradient of the generated left disparity map is constrained with an edge-aware gradient smoothing term, so that the generated disparity map is sufficiently smooth:

L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^{-|∂_x I_l(i,j)|} + |∂_y D_l(i,j)| · e^{-|∂_y I_l(i,j)|} ]

where ∂ denotes the partial derivative, e is the base of the natural logarithm, and |·| denotes the absolute value.

Step 2.2.3, a consistency constraint is imposed on the generated left and right disparity maps, so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i,j) - D_r(i, j - D_l(i,j))|

Step 2.2.4, exchanging the roles of the left and right images in the loss functions of steps 2.2.1, 2.2.2 and 2.2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:

L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where α_ap, α_ds and α_lr are weights controlling the ratio of the three losses; here α_ap = 1, α_ds = 0.1 and α_lr = 1. The network N_g is supervised and its gradients are updated by minimizing L_total.
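The remaining loss terms of steps 2.2.2 to 2.2.4 can be sketched in the same style. The sketch below reuses warp_by_disparity and appearance_loss from the earlier sketches; the finite differences stand in for the partial derivatives in the formulas, and the sampling directions in the consistency term follow the sign convention assumed above.

```python
import torch

def smoothness_loss(disp, img):
    """Edge-aware smoothness: disparity gradients weighted by exp(-|image gradient|)."""
    dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
    dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
    dx_i = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), 1, keepdim=True)
    dy_i = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), 1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def lr_consistency_loss(disp_a, disp_b, direction):
    """|D_a(i, j) - D_b(i, j + direction * D_a(i, j))| averaged over all pixels."""
    disp_b_warped = warp_by_disparity(disp_b, disp_a, direction=direction)
    return torch.abs(disp_a - disp_b_warped).mean()

def total_loss(I_l, I_r, I_l_hat, I_r_hat, D_l, D_r, a_ap=1.0, a_ds=0.1, a_lr=1.0):
    """Overall objective: weighted sum of the left and right versions of the three terms."""
    ap = appearance_loss(I_l, I_l_hat) + appearance_loss(I_r, I_r_hat)
    ds = smoothness_loss(D_l, I_l) + smoothness_loss(D_r, I_r)
    lr = (lr_consistency_loss(D_l, D_r, direction=-1.0)
          + lr_consistency_loss(D_r, D_l, direction=+1.0))
    return a_ap * ap + a_ds * ds + a_lr * lr
```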
Step 3, each picture in the monocular image set of the mixed data set is regarded as a "left image", and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture. The process can be expressed as:

D_m = N_g(I_m)

where I_m denotes a picture in the monocular data set and D_m denotes the "pseudo-disparity map" predicted by feeding the monocular data set into the network N_g pre-trained in step 2.
Step 4, a "pseudo right picture" is generated from the monocular image set and the "pseudo-disparity map" generated in step 3 using disparity-map-based rendering. The process is defined as:

Î_r(i, j) = I_m(i, j + D_m(i, j))

where Î_r denotes the generated "pseudo right picture" and (i, j) denotes the pixel coordinates of the picture.
Step 5, the monocular image set together with the "pseudo-disparity maps" and "pseudo right pictures" generated in steps 3 and 4 forms a "pseudo-binocular" data set with disparity labels. The data set is specifically composed as:

S = { (I_m, Î_r, D_m) }

This data set is used for network training in the subsequent steps, which turns the subsequent training of the disparity estimation network into a supervised training process.
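Steps 3 to 5 reduce to a short labelling loop: each monocular picture is treated as a left image, a pseudo-disparity map is predicted with the pre-trained network, a pseudo right picture is rendered from it, and the triple is stored. The sketch below reuses warp_by_disparity from the step 2.1 sketch and assumes, for simplicity, that the pre-trained network returns a single disparity map per image and that a data loader yields batches of monocular images; writing the results to disk and other engineering details are omitted.

```python
import torch

@torch.no_grad()
def build_pseudo_binocular_dataset(net_g, mono_loader, device="cuda"):
    """Steps 3-5: produce (left image, pseudo right image, pseudo-disparity) triples."""
    net_g.to(device).eval()
    pseudo_dataset = []
    for img in mono_loader:                                   # img: (B, 3, H, W), treated as left images
        img = img.to(device)
        disp = net_g(img)                                     # step 3: pseudo-disparity map, (B, 1, H, W)
        right = warp_by_disparity(img, disp, direction=+1.0)  # step 4: pseudo right picture
        for i in range(img.shape[0]):                         # step 5: collect the labelled triples
            pseudo_dataset.append((img[i].cpu(), right[i].cpu(), disp[i].cpu()))
    return pseudo_dataset
```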
Step 6, a binocular disparity estimation network is retrained on the "pseudo-binocular" data set generated in step 5, with the "pseudo-disparity maps" in the "pseudo-binocular" data set used as the supervision signal. The specific implementation is as follows:

Step 6.1, the left and right images in the "pseudo-binocular" data set are input into the network, and a disparity map is estimated:

D = N_a(I_m, Î_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity map predicted by the network for the left and right views.

Step 6.2, the predicted disparity map D is compared with the "pseudo-disparity map" D_m in the "pseudo-binocular" data set, and the L1 loss is computed:

L_1 = (1/N) Σ_{i,j} |D(i,j) - D_m(i,j)|

The network N_a is supervised and its gradients are updated by minimizing L_1.
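Step 6 is ordinary supervised training with an L1 loss against the pseudo-disparity labels. A minimal training-loop sketch is shown below; the optimizer, learning rate, epoch count and the assumption that the binocular network N_a takes the left and right images as two arguments are illustrative choices, not details specified by the patent.

```python
import torch
import torch.nn.functional as F

def train_binocular_net(net_a, pseudo_loader, epochs=10, lr=1e-4, device="cuda"):
    """Step 6: supervise N_a with the pseudo-disparity maps via an L1 loss."""
    net_a.to(device).train()
    optimizer = torch.optim.Adam(net_a.parameters(), lr=lr)
    for epoch in range(epochs):
        for left, right, disp_label in pseudo_loader:
            left = left.to(device)
            right = right.to(device)
            disp_label = disp_label.to(device)
            disp_pred = net_a(left, right)              # step 6.1: predict the disparity map
            loss = F.l1_loss(disp_pred, disp_label)     # step 6.2: L1 loss against the pseudo label
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return net_a
```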
Step 7, the binocular disparity estimation network trained in step 6 takes a real-world left-right binocular image pair as input, estimates its disparity map, and uses disparity-map-based rendering to generate a series of intermediate view results along the camera baseline between the left and right images. The process is implemented as follows:

Step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity map of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity map estimated for (I_l, I_r).

Step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map of the view at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] denotes the relative position of the target view with respect to the left image on the camera baseline of the left and right images; for example, α = 0.5 means the distance from that position to the left image is 0.5 times the camera distance between the left and right images.

Step 7.3, the disparity map at position α generated in step 7.2 and disparity-map-based rendering are used to generate the image at position α:

I_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
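Step 7 scales the estimated disparity by α and renders each intermediate view with the same warping primitive. The sketch below reuses warp_by_disparity from the step 2.1 sketch; rendering the α view from the left image only and the chosen sampling direction follow the equations above and are otherwise assumptions.

```python
import torch

@torch.no_grad()
def synthesize_intermediate_views(net_a, img_l, img_r, alphas=(0.25, 0.5, 0.75)):
    """Step 7: estimate disparity for a real stereo pair and render views along the baseline."""
    net_a.eval()
    disp = net_a(img_l, img_r)                   # step 7.1: disparity of the input pair
    views = []
    for alpha in alphas:
        disp_alpha = alpha * disp                # step 7.2: disparity at position alpha
        view = warp_by_disparity(img_l, disp_alpha, direction=+1.0)  # step 7.3: render the view
        views.append(view)
    return views
```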
Compared with the prior art, the invention has the following advantages:
1. The invention trains a disparity estimation network from a small-scale binocular data set (on the order of 10^2 image pairs);
2. The invention generates a large-scale "pseudo-binocular data set" with disparity labels from a large-scale monocular data set;
3. The invention trains a disparity estimation network on the self-generated "pseudo data set";
4. The invention proposes training a disparity estimation network with a small-scale binocular set and a large-scale monocular data set; such a data set is much easier to construct, and factors such as illumination inconsistency, camera motion and object motion do not arise in the monocular picture set.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (9)

1. A view synthesis method based on a monocular and binocular mixed data set is characterized by comprising the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular image pairs;
step 3, using the model pre-trained in step 2, treating every picture in the monocular image set of the mixed data set as a "left picture" and estimating a "pseudo-disparity map" for each picture;
step 4, generating a corresponding "pseudo right picture" from each monocular image and its estimated "pseudo-disparity map" by disparity-map-based rendering;
step 5, forming a "pseudo-binocular" data set with disparity labels from the monocular image set and the "pseudo-disparity maps" and "pseudo right pictures" generated in steps 3 and 4;
step 6, retraining a binocular disparity estimation network with the "pseudo-binocular" data set generated in step 5;
step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for an input pair of left and right binocular test pictures, and rendering based on the disparity maps to generate new view synthesis results along the camera baseline between the left and right pictures.
2. The view synthesis method based on a monocular and binocular mixed data set according to claim 1, wherein: the data set constructed in step 1 is a mixed data set of a small-scale set of left-right binocular image pairs and a large-scale monocular image set, where the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10^2 pairs, and the large-scale monocular image set is an image set collected from the internet containing various indoor and outdoor scenes, on the order of 10^4 images.
3. The view synthesis method based on a monocular and binocular mixed data set according to claim 1, wherein: in step 2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the left image is used as the network input; the network outputs left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering is used to generate a right image and a left image respectively. The process is expressed as:

(D_l, D_r) = N_g(I_l)

Î_r(i, j) = I_l(i, j + D_r(i, j))

Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
4. The view synthesis method based on a monocular and binocular mixed data set according to claim 3, wherein: in step 2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the real left and right images are used as bidirectional supervision. Taking the supervision on the left image as an example, the specific implementation is as follows:

step 2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted combination of the SSIM loss and the L1 loss is computed:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss;

step 2.2, the gradient of the generated left disparity map is constrained with an edge-aware gradient smoothing term, so that the generated disparity map is sufficiently smooth:

L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^{-|∂_x I_l(i,j)|} + |∂_y D_l(i,j)| · e^{-|∂_y I_l(i,j)|} ]

where ∂ denotes the partial derivative, e is the base of the natural logarithm, and |·| denotes the absolute value;

step 2.3, a consistency constraint is imposed on the generated left and right disparity maps, so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i,j) - D_r(i, j - D_l(i,j))|

step 2.4, exchanging the roles of the left and right images in the loss functions of steps 2.1, 2.2 and 2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r; the overall loss function is:

L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where α_ap, α_ds and α_lr are weights controlling the ratio of the three losses; the network N_g is supervised and its gradients are updated by minimizing L_total.
5. The view synthesis method based on a monocular and binocular mixed data set according to claim 1, wherein: each picture in the monocular image set in step 3 is regarded as a "left image", and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture. The process is expressed as:

D_m = N_g(I_m)

where I_m denotes a picture in the monocular data set and D_m denotes the "pseudo-disparity map" predicted by feeding the monocular data set into the network N_g pre-trained in step 2.
6. The view synthesis method based on a monocular and binocular mixed data set according to claim 5, wherein: in step 4, a "pseudo right picture" is generated from the monocular image set and the "pseudo-disparity map" generated in step 3 using disparity-map-based rendering. The process is defined as:

Î_r(i, j) = I_m(i, j + D_m(i, j))

where Î_r denotes the generated "pseudo right picture" and (i, j) denotes the pixel coordinates of the picture.
7. The view synthesis method based on a monocular and binocular mixed data set according to claim 6, wherein: in step 5, the monocular image set together with the "pseudo-disparity maps" and "pseudo right pictures" generated in steps 3 and 4 forms a "pseudo-binocular" data set with disparity labels:

S = { (I_m, Î_r, D_m) }

This data set is used for network training in the subsequent steps, which turns the subsequent training of the disparity estimation network into a supervised training process.
8. The view synthesis method based on a monocular and binocular mixed data set according to claim 7, wherein: in step 6, a binocular disparity estimation network is retrained on the "pseudo-binocular" data set generated in step 5, with the "pseudo-disparity maps" in the "pseudo-binocular" data set used as the supervision signal; the specific implementation is as follows:

step 6.1, the left and right images in the "pseudo-binocular" data set are input into the network, and a disparity map is estimated:

D = N_a(I_m, Î_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity map predicted by the network for the left and right views;

step 6.2, the predicted disparity map D is compared with the "pseudo-disparity map" D_m in the "pseudo-binocular" data set, and the loss is computed:

L_1 = (1/N) Σ_{i,j} |D(i,j) - D_m(i,j)|

The network N_a is supervised and its gradients are updated by minimizing L_1.
9. The view synthesis method based on a monocular and binocular mixed data set according to claim 8, wherein: in step 7, the binocular disparity estimation network trained in step 6 takes a real-world left-right binocular image pair as input, estimates its disparity map, and uses disparity-map-based rendering to generate a series of intermediate view results along the camera baseline between the left and right images; the process is implemented as follows:

step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity map of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity map estimated for (I_l, I_r);

step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map of the view at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] denotes the relative position of the target view with respect to the left image on the camera baseline of the left and right images;

step 7.3, the disparity map at position α generated in step 7.2 and disparity-map-based rendering are used to generate the image at position α:

I_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
CN202010072802.9A 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set Active CN111292425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010072802.9A CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010072802.9A CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Publications (2)

Publication Number Publication Date
CN111292425A CN111292425A (en) 2020-06-16
CN111292425B true CN111292425B (en) 2022-02-01

Family

ID=71024323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010072802.9A Active CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Country Status (1)

Country Link
CN (1) CN111292425B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436264B (en) * 2021-08-25 2021-11-19 深圳市大道智创科技有限公司 Pose calculation method and system based on monocular and monocular hybrid positioning
TWI798094B (en) * 2022-05-24 2023-04-01 鴻海精密工業股份有限公司 Method and equipment for training depth estimation model and depth estimation
CN115909446B (en) * 2022-11-14 2023-07-18 华南理工大学 Binocular face living body discriminating method, device and storage medium
CN117372494B (en) * 2023-08-07 2024-09-03 合肥工业大学 Power grid operator parallax estimation and positioning method based on single-binocular vision cooperation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
CN109087346A (en) * 2018-09-21 2018-12-25 北京地平线机器人技术研发有限公司 Training method, training device and the electronic equipment of monocular depth model
CN110113595A (en) * 2019-05-08 2019-08-09 北京奇艺世纪科技有限公司 A kind of 2D video turns the method, apparatus and electronic equipment of 3D video
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931240B (en) * 2016-04-21 2018-10-19 西安交通大学 Three dimensional depth sensing device and method
CN106600583B (en) * 2016-12-07 2019-11-01 西安电子科技大学 Parallax picture capturing method based on end-to-end neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
CN109087346A (en) * 2018-09-21 2018-12-25 北京地平线机器人技术研发有限公司 Training method, training device and the electronic equipment of monocular depth model
CN110113595A (en) * 2019-05-08 2019-08-09 北京奇艺世纪科技有限公司 A kind of 2D video turns the method, apparatus and electronic equipment of 3D video
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Novel Monocular Disparity Estimation Network with Domain Transformation and Ambiguity Learning; Bello J et al.; 2019 IEEE International Conference on Image Processing (ICIP); 2019-09-25; pp. 474-478 *
Real-time monocular depth estimation based on LRSDR-Net; Zhang Zhetao et al.; Electronic Measurement Technology; 2019-10-31; Vol. 42, No. 19; pp. 164-169 *

Also Published As

Publication number Publication date
CN111292425A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN111292425B (en) View synthesis method based on monocular and binocular mixed data set
US11210803B2 (en) Method for 3D scene dense reconstruction based on monocular visual slam
CN108986136B (en) Binocular scene flow determination method and system based on semantic segmentation
CN110335343B (en) Human body three-dimensional reconstruction method and device based on RGBD single-view-angle image
CN108388882B (en) Gesture recognition method based on global-local RGB-D multi-mode
CN113393522B (en) 6D pose estimation method based on monocular RGB camera regression depth information
CN108876814B (en) Method for generating attitude flow image
CN110782490A (en) Video depth map estimation method and device with space-time consistency
CN108932725B (en) Scene flow estimation method based on convolutional neural network
CN103971366B (en) A kind of solid matching method being polymerize based on double weights
CN112308918B (en) Non-supervision monocular vision odometer method based on pose decoupling estimation
CN109758756B (en) Gymnastics video analysis method and system based on 3D camera
CN110910437B (en) Depth prediction method for complex indoor scene
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN108510520B (en) A kind of image processing method, device and AR equipment
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
CN113284173A (en) End-to-end scene flow and pose joint learning method based on pseudo laser radar
CN111860651A (en) Monocular vision-based semi-dense map construction method for mobile robot
Gao et al. Joint optimization of depth and ego-motion for intelligent autonomous vehicles
CN114693720A (en) Design method of monocular vision odometer based on unsupervised deep learning
CN111311664A (en) Joint unsupervised estimation method and system for depth, pose and scene stream
CN112686952A (en) Image optical flow computing system, method and application
CN117197388A (en) Live-action three-dimensional virtual reality scene construction method and system based on generation of antagonistic neural network and oblique photography
CN113034681B (en) Three-dimensional reconstruction method and device for spatial plane relation constraint
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant