CN113379821B - Stable monocular video depth estimation method based on deep learning - Google Patents

Stable monocular video depth estimation method based on deep learning

Info

Publication number
CN113379821B
Authority
CN
China
Prior art keywords
depth
view
depth estimation
estimation
camera pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110695235.7A
Other languages
Chinese (zh)
Other versions
CN113379821A (en)
Inventor
肖春霞
罗飞
魏林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110695235.7A priority Critical patent/CN113379821B/en
Publication of CN113379821A publication Critical patent/CN113379821A/en
Application granted granted Critical
Publication of CN113379821B publication Critical patent/CN113379821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a stable monocular video depth estimation method based on deep learning. The method trains the proposed model entirely on monocular video sequences; no Ground Truth depth maps or camera poses are required as supervision during training, making it a completely unsupervised method. Compared with existing monocular-video-based depth estimation, the method produces depth estimates that remain stable across consecutive video frames, without large inconsistencies between frames. In addition, a solution is provided for a long-standing difficulty in depth estimation, namely the depth estimation of moving objects.

Description

Stable monocular video depth estimation method based on deep learning
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a stable monocular video depth estimation method based on deep learning, which regresses stable depth maps over consecutive frames without using Ground Truth as supervision.
Background
Depth estimation is a fundamental task of computer vision that aims at estimating depth from 2D images. The input of this task is an RGB image and the output is a depth map. Depth here refers to the distance from an object to the camera's optical center, so the problem to be solved in depth estimation is to recover the distance from objects in the captured scene to the optical center. Depth estimation has broad application prospects and value: the currently popular field of autonomous driving [1], as well as conventional three-dimensional reconstruction, augmented reality, and similar applications, all rely on depth estimation techniques.
Methods of acquiring depth information include using depth sensors, such as LiDAR or ToF sensors, or solving for depth with a depth estimation algorithm. The problem with using sensors to obtain depth maps is that the equipment is expensive and acquisition costs are high, especially for high-precision depth data. In addition, the depth data collected by sensors is limited in range and affected by environmental factors. Therefore, if depth data can instead be obtained through a depth estimation algorithm, the cost of data acquisition can be greatly reduced, which facilitates the large-scale deployment of related applications.
Depth estimation algorithms can be divided into conventional methods and deep-learning-based methods. Conventional methods rely on accurate extraction and matching of image feature points, so in low-texture regions, or when occlusion and moving objects are present in the scene, the recovered depth is often unsatisfactory and the estimated depth map is usually sparse. Deep-learning-based methods can largely overcome these problems and estimate dense depth. Deep-learning-based depth estimation can further be classified into supervised and unsupervised depth estimation. Supervised depth estimation requires a large amount of real depth data as supervision during training, but collecting such data consumes considerable manpower and material resources and is often not cost-effective. Unsupervised depth estimation is therefore the current trend of research in this direction, and among all deep-learning-based depth estimation methods, those trained on monocular video sequences are the most promising because of the convenience of data acquisition. However, one outstanding problem of unsupervised depth estimation based on monocular video is that the estimated depth is not stable between consecutive video frames, and this urgently needs to be solved: stable depth estimation across a video sequence enables stable depth-based applications such as augmented reality and three-dimensional reconstruction.
Disclosure of Invention
In order to overcome the above defects, the invention provides a stable monocular video depth estimation method based on deep learning. It exploits the strong fitting capacity of convolutional neural networks and, by learning from a large amount of two-dimensional RGB image data in combination with the designed loss functions, finally learns the depth map corresponding to a two-dimensional RGB image. The method trains the proposed model entirely on monocular video sequences; no Ground Truth depth maps or camera poses are required as supervision during training, making it a completely unsupervised method.
Existing unsupervised depth estimation methods based on monocular video have made great progress in accuracy, and their results on individual images visualize well. However, experiments with existing monocular video depth estimation methods show that, although a single image can be estimated well, the results on consecutive video frames are unstable; this is the main problem the invention addresses. The innovations of this patent are, first, a temporal smoothing term for consecutive depth maps, from which a temporal smoothing loss is constructed to constrain the stability between consecutive depth maps, and second, a self-discovered mask used to handle the depth estimation of moving objects in the scene.
In order to achieve the above object, a stable monocular video depth estimation method based on deep learning is characterized in that:
firstly, inputting a single color picture into a depth estimation network for depth estimation, and then inputting two continuous video frames into a camera pose estimation network for relative camera pose estimation; reconstructing an image by combining depth information output by a depth estimation network and camera pose information output by a camera pose network; the two continuous video frames are both pictures involved in depth estimation;
the method comprises the following steps of constructing a loss function to solve the unstable problem of depth estimation of continuous video frames, and specifically defining the following steps:
L gs =|S a -S b |,
S a =median(D a ),
wherein L is gs Representing the time-sequence smoothing loss function term for two consecutive video frames I a 、I b ,D a 、D b Represents I a And I b Result of depth estimation of S a And S b Then it is the time sequence smoothing term of the consecutive video frames, and the mean represents the median operation.
Further, the view synthesis loss of dynamic objects is constrained to deal with the inaccurate depth estimation of dynamic objects that violate the static-scene assumption during view synthesis. The specific method is as follows:
for two adjacent views I_a and I_b, after obtaining the depth information output by the depth estimation network and the camera pose output by the camera pose estimation network, view synthesis is performed from view I_b to obtain the synthesized picture I_a' under the viewpoint of I_a; the gray-level difference between the original view and the synthesized view, i.e. the view synthesis loss P_{d,a→a'}, is then computed, and P_{d,b→b'} is obtained in the same way; a mask M is computed after the view synthesis loss from the previous frame to the next frame and the view synthesis loss from the next frame to the previous frame are obtained. When the depth map and camera pose output by the network are accurate, M is small; for regions with poor reconstruction, M is relatively large, and the learning weight of those regions is set correspondingly small. Accordingly, when computing the loss, such regions are penalized by assigning the inaccurate regions a smaller weight.
Preferably, the deep learning framework is PyTorch, version 1.0.1 or higher, and the network is built based on ResNet.
Further, after each training epoch is completed, the model trained up to the current epoch is tested, and the training effect of the current model is evaluated with the evaluation indexes; after training is fully completed, the overall training result is evaluated on the test set, and the model parameters are then adjusted and training continued to achieve the best result. The evaluation indexes mainly comprise the root mean square error, logarithmic root mean square error, absolute relative error, squared relative error and accuracy.
Preferably, the image is reconstructed with the following reconstruction function:
L_p = Σ_s || pe(I_t, I_{s→t}) ||_1,
I_{s→t} = I_s K T_{t→s} D_t K^{-1},
where I_t denotes the target video frame whose depth is to be estimated, and I_{s→t} denotes the synthesized target view frame obtained from the adjacent source video frame I_s by combining the camera intrinsics K, the camera pose T_{t→s} from the source view to the target view, and the target view depth map D_t; pe denotes the view synthesis loss. The specific view synthesis loss function is as follows:
pe(I_t, I_{s→t}) = α ||I_t - I_{s→t}||_1 + β (1 - SSIM(I_t, I_{s→t})) / 2,
where α and β are hyper-parameters, set here to 0.15 and 0.85 respectively, and SSIM is the structural similarity function.
The invention has the advantage that the proposed method performs depth estimation on monocular video while keeping the depth estimation results stable across consecutive video frames.
Drawings
FIG. 1 is a flow chart of a model of the present invention.
Detailed Description
For a further understanding of the present invention, its objects, technical solutions and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative only and not limiting of the invention.
Example 1
This embodiment of the deep-learning-based monocular video depth estimation method realizes stable depth estimation over consecutive video frames.
Fig. 1 is a flowchart of the model for deep-learning-based monocular video depth estimation according to this embodiment. The model mainly comprises two parts: a depth estimation network and a camera pose estimation network. First, a single color RGB picture is input into the depth estimation network for depth estimation, and two consecutive video frames are input into the camera pose estimation network for relative camera pose estimation, where the two consecutive video frames include the picture used for depth estimation. By combining the depth information output by the depth estimation network with the camera pose information output by the pose network, view synthesis, i.e. image reconstruction, can be performed; a loss function is constructed from it to supervise the training of the networks, and the parameters are then updated through back-propagation.
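As a non-limiting illustration, a minimal PyTorch sketch of the two networks in Fig. 1 is given below. The class names DepthNet and PoseNet, the ResNet-18 backbone taken from torchvision (0.13 or later for the weights argument), and the lightweight decoder are assumptions for illustration; they show only the overall input/output structure (one RGB frame mapped to a depth map, two concatenated frames mapped to a 6-DoF relative pose), not the exact architecture of the invention.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DepthNet(nn.Module):
    """Depth estimation network: ResNet-18 encoder plus a lightweight upsampling decoder."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # drop the average pooling and fully connected layers, keep the convolutional trunk
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # B x 512 x H/32 x W/32
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 1, 3, padding=1), nn.Softplus(),  # strictly positive depth
        )

    def forward(self, image):                      # image: B x 3 x H x W
        return self.decoder(self.encoder(image))   # depth: B x 1 x H x W

class PoseNet(nn.Module):
    """Camera pose network: two concatenated frames -> 6-DoF relative pose."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)
        resnet.conv1 = nn.Conv2d(6, 64, 7, stride=2, padding=3, bias=False)  # two RGB frames
        resnet.fc = nn.Linear(resnet.fc.in_features, 6)  # axis-angle rotation + translation
        self.net = resnet

    def forward(self, frame_a, frame_b):
        return self.net(torch.cat([frame_a, frame_b], dim=1))  # B x 6
```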
As shown in fig. 1, based on the depth estimation method in this embodiment, the specific implementation steps are as follows:
step S1: the deep learning framework adopted by the invention is a pyrrch, so the pyrrch environment needs to be configured before operation, and the version of the pyrrch is more than 1.0.1.
Step S2: preparation of the experimental data. The KITTI dataset is adopted, and the data needs to be processed into two parts before training. The first is the training set used to train the model; it usually contains a large number of pictures, which are resized to the same dimensions and may be grouped by scene. The second is the test set used to validate the model; it differs from the training set in the number of pictures, containing relatively little image data together with the corresponding real depth data. Static picture frames should be removed during this processing, because static frames do not satisfy the assumptions underlying view synthesis.
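The patent does not specify how static frames are detected; as an assumption for illustration, the sketch below drops a frame when its mean absolute intensity difference with the previously kept frame falls below a threshold (the threshold value 0.01 is itself an assumed default).

```python
import numpy as np
from PIL import Image

def is_static_pair(path_a, path_b, threshold=0.01):
    """Heuristic static-frame check: mean absolute grayscale difference between
    two consecutive frames, with pixel values scaled to [0, 1]."""
    a = np.asarray(Image.open(path_a).convert("L"), dtype=np.float32) / 255.0
    b = np.asarray(Image.open(path_b).convert("L"), dtype=np.float32) / 255.0
    return float(np.mean(np.abs(a - b))) < threshold

def filter_static_frames(frame_paths, threshold=0.01):
    """Keep the first frame, then drop frames nearly identical to the previously kept one."""
    kept = [frame_paths[0]]
    for path in frame_paths[1:]:
        if not is_static_pair(kept[-1], path, threshold):
            kept.append(path)
    return kept
```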
Step S3: training the model. The network model is built based on ResNet and can be trained on a server once the corresponding software environment has been configured. Different training configurations, such as different numbers of network layers, can be used; note that the deeper the network, the more computing resources are required, so the GPU memory of the server must be sufficient (the maximum memory used in the invention is 12 GB). After the image depth information and the camera pose information are obtained, view synthesis, i.e. image reconstruction, is carried out according to the image reconstruction function described below; the pixel differences between the reconstructed image and the original image are used to construct the loss function, and the model is trained under this supervision via back-propagation;
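The following sketch illustrates one optimisation step of the training described above, assuming that warp_fn, photometric_loss and temporal_smooth_loss are provided as separate routines (sketches of the losses are given later in this description); the function and argument names are illustrative, not part of the invention.

```python
import torch

def train_one_batch(depth_net, pose_net, optimizer, frame_t, frame_s, K, K_inv,
                    warp_fn, photometric_loss, temporal_smooth_loss):
    """One illustrative training step: predict depth and relative pose, synthesize the
    target view from the source frame, build the loss, and back-propagate."""
    depth_t = depth_net(frame_t)              # B x 1 x H x W
    depth_s = depth_net(frame_s)
    pose_ts = pose_net(frame_t, frame_s)      # B x 6, target -> source

    frame_s_to_t = warp_fn(frame_s, depth_t, pose_ts, K, K_inv)   # synthesized target view
    loss = photometric_loss(frame_t, frame_s_to_t) + temporal_smooth_loss(depth_t, depth_s)

    optimizer.zero_grad()
    loss.backward()                           # back-propagation updates both networks
    optimizer.step()
    return loss.item()
```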
and step S4: after each epoch training is completed, the model trained until the current epoch is tested, and the effect of the current model training is evaluated by combining the evaluation indexes. One epoch is used for training the model by using all the training set data, and the value of the epoch is set to be 150, namely 150 times of training are required;
step S5: after the complete training is completed, the complete model training result is evaluated by combining the test set, and then the parameters of the model are adjusted to continue training so as to achieve the best training result. The evaluation indexes mainly include 5 items, namely Root Mean Square Error (RMSE), log root mean square error (RMSE log), absolute relative error (abssel), square relative error (SqRel), and accuracy (% correct), namely, the depth data output by the network model and the real depth data are respectively calculated and compared. The adjustment of the model parameters of the network can be adjusted according to the speed of the training process, whether the loss function is reduced or not and the descending trend;
specifically, in order to solve the problem of unstable depth estimation of consecutive video frames, the patent proposes a timing sequence smoothing idea suitable for the situation, and proposes a new loss function term, which is defined as follows:
L gs =|S a -S b |,
S a =median(D a ),
for ensuring the stability of the depth estimation of successive video frames.
Wherein L is gs Representing the time-sequence smoothing loss function term for two consecutive video frames I a 、I b ,D a 、D b Represents I a And I b Result of depth estimation of S a And S b Then it is the time-sequence smoothing term of the consecutive video frames before and after, and mean represents the median-removing operation.
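A minimal PyTorch sketch of the temporal smoothing loss L_gs, computed directly from the definition above (the per-sample median of each depth map), is given for illustration:

```python
import torch

def temporal_smooth_loss(depth_a, depth_b):
    """L_gs = |median(D_a) - median(D_b)| for two consecutive depth maps of shape B x 1 x H x W."""
    s_a = depth_a.flatten(1).median(dim=1).values   # S_a, one median per sample
    s_b = depth_b.flatten(1).median(dim=1).values   # S_b
    return (s_a - s_b).abs().mean()
```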
The image reconstruction function is as follows:
L_p = Σ_s || pe(I_t, I_{s→t}) ||_1,
I_{s→t} = I_s K T_{t→s} D_t K^{-1},
where I_t denotes the target video frame whose depth is to be estimated, and I_{s→t} denotes the synthesized target view frame obtained from the adjacent source video frame I_s by combining the camera intrinsics K, the camera pose T_{t→s} from the source view to the target view, and the target view depth map D_t; pe denotes the view synthesis loss. The specific view synthesis loss function is as follows:
pe(I_t, I_{s→t}) = α ||I_t - I_{s→t}||_1 + β (1 - SSIM(I_t, I_{s→t})) / 2,
where α and β are hyper-parameters, set here to 0.15 and 0.85 respectively, and SSIM is the structural similarity function.
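A sketch of the view synthesis loss pe is given below, assuming the weighted L1 plus SSIM layout reconstructed above (the exact placement of α and β is an assumption, since the original formula is available only as an embedded image); the 3x3 average-pooled SSIM follows common practice in this family of losses.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified per-pixel SSIM over 3x3 neighbourhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def photometric_loss(target, synthesized, alpha=0.15, beta=0.85):
    """View synthesis loss pe: alpha * L1 term + beta * (1 - SSIM) / 2 (assumed weighting)."""
    l1 = (target - synthesized).abs().mean(1, keepdim=True)
    dssim = ((1.0 - ssim(target, synthesized)) / 2.0).mean(1, keepdim=True)
    return (alpha * l1 + beta * dssim).mean()
```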
In order to solve the problem that the depth estimation of dynamic objects violating the static-scene assumption is inaccurate during view synthesis, the invention proposes a mask based on the inconsistency between forward and backward view synthesis, and uses it to constrain the view synthesis loss of dynamic objects. The specific method is as follows:
P_{d,a→a'} = |I_a - I_{a'}|,
P_{d,b→b'} = |I_b - I_{b'}|,
[the formula for the mask M, computed from P_{d,a→a'} and P_{d,b→b'}, appears only as an embedded image in the patent record]
for two front and back adjacent views I a 、I b After obtaining the depth information output by the depth estimation network and the camera pose output by the camera pose estimation network, the depth information and the camera pose can be obtained by a view I b View synthesis is carried out to obtain a view I a Composite Picture I at View Angle a′ Then, the gray difference between the original view and the synthesized view, i.e. the view synthesis loss P, can be calculated daa′ P can also be obtained dbb′ . The mask M can be calculated after the view synthesis loss from the front frame to the rear frame and the view synthesis loss from the rear frame to the front frame are obtained, the value of M is as small as possible under the condition that the position and the posture of a depth map and a camera output by a network are accurate, and for an area with poor reconstruction effect, the value of M is relatively large, and the area with poor reconstruction effect is obtainedThe learning weight of the partial region should be set relatively small. Correspondingly, when calculating the loss, punishment should be carried out on the part of the area, and a smaller weight is set for the inaccurate area.
L_gs constrains the two adjacent depth maps globally; this embodiment also constrains the depth maps locally, with the following formula:
D′ = D / S,
[the per-pixel consistency loss, computed from D′_{t→s}(p) and D′_s(p), appears only as an embedded image in the patent record]
d ' is the depth map D ' from which the time sequence is smoothed ' t-s (p) is the depth map D 'smoothed by the de-temporal sequence of the target view' t Depth map at source view perspective, D ', synthesized in conjunction with camera pose' s (p) depth map D which is a source view s And (3) calculating the difference value of each point p in the depth map after sampling, and constraining the depth maps of the adjacent video frames pixel by pixel to ensure that the depth maps of the adjacent front and rear frames are consistent.
This embodiment provides a deep-learning-based monocular video depth estimation method in which, by exploiting the strong fitting capacity of convolutional neural networks and learning from a large amount of two-dimensional RGB image data combined with the designed loss functions, the depth map corresponding to a two-dimensional RGB picture is finally learned. The method resolves the instability of depth estimation over consecutive video frames in monocular video depth estimation.

Claims (7)

1. A stable monocular video depth estimation method based on deep learning is characterized in that:
firstly, inputting a single color picture into a depth estimation network for depth estimation, and then inputting two continuous video frames into a camera pose estimation network for relative camera pose estimation; reconstructing an image by combining depth information output by a depth estimation network and camera pose information output by a camera pose network; the two continuous video frames are both pictures involved in depth estimation;
the method comprises the following steps of constructing a loss function to solve the problem of unstable depth estimation of continuous video frames, and specifically defining the following steps:
L gs =|S a -S b |,
S a =median(D a ),
wherein L is gs Representing the time-sequence smoothing loss function term, for two consecutive video frames I a 、I b ,D a 、D b Represents I a And I b Result of depth estimation of S a And S b Then, the time sequence smoothing item of the front and back continuous video frames is used, and the mean represents the median operation;
L_gs constrains the two adjacent depth maps globally, and the depth maps are also constrained locally with the following loss:
D′ = D / S,
[the per-pixel consistency loss, computed from D′_{t→s}(p) and D′_s(p), appears only as an embedded image in the patent record]
where D′ is the depth map D normalized by the temporal smoothing term, D′_{t→s}(p) is the normalized depth map of the target view, D′_t, synthesized under the source view's viewpoint using the camera pose, and D′_s(p) is the normalized depth map of the source view, D′_s, after sampling; the difference at each point p of the depth maps is computed, constraining the depth maps of adjacent video frames pixel by pixel so that the depth maps of adjacent frames remain consistent.
2. The method of claim 1, wherein the method comprises:
the view synthesis loss of the dynamic object is constrained to deal with the problem that the depth estimation of the dynamic object violating the static scene assumption is inaccurate when view synthesis is performed, and the specific method is as follows:
for two front and back adjacent views I a 、I b From view I, after obtaining depth information output by the depth estimation network and camera pose output by the camera pose estimation network b View synthesis is performed to obtain a view I a Composite Picture I at View Angle a′ Then, the gray difference between the original view and the synthesized view, i.e. the view synthesis loss P, is calculated daa′ P can also be obtained dbb′ (ii) a And calculating to obtain the mask M after obtaining the view synthesis loss from the front frame to the rear frame and the view synthesis loss from the rear frame to the front frame.
3. The method of claim 2, wherein the method comprises: the deep learning framework adopted is PyTorch, and the PyTorch version is 1.0.1 or higher.
4. The method of claim 2, wherein the method comprises: the network is built based on ResNet.
5. The method of claim 2, wherein the method comprises: after each epoch of training is completed, testing the model trained up to the current epoch, and evaluating the training effect of the current model with the evaluation indexes; after training is fully completed, evaluating the overall model training result on the test set, and then adjusting the parameters of the model and continuing training to achieve the best training result.
6. The method for estimating depth of a stable monocular video based on deep learning according to claim 5, wherein: the evaluation indexes mainly comprise root mean square error, logarithmic root mean square error, absolute relative error, squared relative error and accuracy.
7. The method of claim 2, wherein the method comprises:
the image is reconstructed with the following reconstruction function:
L_p = Σ_s || pe(I_t, I_{s→t}) ||_1,
I_{s→t} = I_s K T_{t→s} D_t K^{-1},
where I_t denotes the target video frame whose depth is to be estimated, and I_{s→t} denotes the synthesized target view frame obtained from the adjacent source video frame I_s by combining the camera intrinsics K, the camera pose T_{t→s} from the source view to the target view, and the target view depth map D_t; pe denotes the view synthesis loss; the specific view synthesis loss function is as follows:
pe(I_t, I_{s→t}) = α ||I_t - I_{s→t}||_1 + β (1 - SSIM(I_t, I_{s→t})) / 2,
where α and β are hyper-parameters, set here to 0.15 and 0.85 respectively, and SSIM is the structural similarity function.
CN202110695235.7A 2021-06-23 2021-06-23 Stable monocular video depth estimation method based on deep learning Active CN113379821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695235.7A CN113379821B (en) 2021-06-23 2021-06-23 Stable monocular video depth estimation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110695235.7A CN113379821B (en) 2021-06-23 2021-06-23 Stable monocular video depth estimation method based on deep learning

Publications (2)

Publication Number Publication Date
CN113379821A CN113379821A (en) 2021-09-10
CN113379821B true CN113379821B (en) 2022-10-11

Family

ID=77578449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110695235.7A Active CN113379821B (en) 2021-06-23 2021-06-23 Stable monocular video depth estimation method based on deep learning

Country Status (1)

Country Link
CN (1) CN113379821B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986105A (en) * 2020-07-27 2020-11-24 成都考拉悠然科技有限公司 Video time sequence consistency enhancing method based on time domain denoising mask

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2424422B1 (en) * 2009-04-29 2019-08-14 Koninklijke Philips N.V. Real-time depth estimation from monocular endoscope images
US20150262412A1 (en) * 2014-03-17 2015-09-17 Qualcomm Incorporated Augmented reality lighting with dynamic geometry
CN108765479A (en) * 2018-04-04 2018-11-06 上海工程技术大学 Using deep learning to monocular view estimation of Depth optimization method in video sequence
CN111797855A (en) * 2019-04-09 2020-10-20 腾讯科技(深圳)有限公司 Image processing method, image processing device, model training method, model training device, medium and equipment
CN110443842B (en) * 2019-07-24 2022-02-15 大连理工大学 Depth map prediction method based on visual angle fusion
CN110610486B (en) * 2019-08-28 2022-07-19 清华大学 Monocular image depth estimation method and device
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986105A (en) * 2020-07-27 2020-11-24 成都考拉悠然科技有限公司 Video time sequence consistency enhancing method based on time domain denoising mask

Also Published As

Publication number Publication date
CN113379821A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
Aleotti et al. Generative adversarial networks for unsupervised monocular depth prediction
Li et al. PDR-Net: Perception-inspired single image dehazing network with refinement
CN110108258B (en) Monocular vision odometer positioning method
CN110910447B (en) Visual odometer method based on dynamic and static scene separation
CN111488865B (en) Image optimization method and device, computer storage medium and electronic equipment
CN110689008A (en) Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN104794737B (en) A kind of depth information Auxiliary Particle Filter tracking
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN102156995A (en) Video movement foreground dividing method in moving camera
CN107767358B (en) Method and device for determining ambiguity of object in image
CN110827320B (en) Target tracking method and device based on time sequence prediction
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
CN110910437A (en) Depth prediction method for complex indoor scene
Chen et al. A particle filtering framework for joint video tracking and pose estimation
CN114049434A (en) 3D modeling method and system based on full convolution neural network
Ubina et al. Intelligent underwater stereo camera design for fish metric estimation using reliable object matching
CN112686952A (en) Image optical flow computing system, method and application
CN114519772A (en) Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
CN113065506B (en) Human body posture recognition method and system
Ge et al. An improved U-net architecture for image dehazing
CN117523100A (en) Three-dimensional scene reconstruction method and device based on neural network and multi-view consistency
CN112785629A (en) Aurora motion characterization method based on unsupervised deep optical flow network
Khan et al. Towards monocular neural facial depth estimation: Past, present, and future

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant