WO2021027543A1 - Monocular image-based model training method and apparatus, and data processing device - Google Patents

Monocular image-based model training method and apparatus, and data processing device

Info

Publication number
WO2021027543A1
WO2021027543A1 (PCT/CN2020/104924)
Authority
WO
WIPO (PCT)
Prior art keywords
optical flow
image
training
training image
flow prediction
Prior art date
Application number
PCT/CN2020/104924
Other languages
English (en)
French (fr)
Inventor
刘鹏鹏 (Liu Pengpeng)
许佳 (Xu Jia)
Original Assignee
广州虎牙科技有限公司 (Guangzhou Huya Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州虎牙科技有限公司 (Guangzhou Huya Technology Co., Ltd.)
Priority to US17/629,521 priority Critical patent/US20220270354A1/en
Publication of WO2021027543A1 publication Critical patent/WO2021027543A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/18Image warping, e.g. rearranging pixels individually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • This application relates to the field of computer vision technology and specifically provides a monocular image-based model training method, apparatus, and data processing device.
  • Binocular image matching (stereo matching) is a computer vision problem widely used in fields such as 3D digital scene reconstruction and autonomous driving.
  • The goal of binocular image alignment is to predict pixel displacements, that is, the stereo disparity map between two binocular images.
  • When handling the binocular image alignment problem, a Convolutional Neural Network (CNN) model can be trained with a large number of samples, and the trained model is then used to perform binocular image alignment.
  • Because obtaining binocular training samples with correct labels is costly, synthetic simulation images can be used for training instead, but a model trained in this way recognizes real images poorly.
  • Alternatively, unlabeled binocular images can be used: the right image is warped to the left image according to the predicted disparity map, and the difference between the warped right image and the left image is then measured by the photometric loss.
  • However, this approach still requires a large number of rectified binocular images, so the training cost is relatively high.
  • The purpose of this application is to provide a monocular image-based model training method, apparatus, and data processing device that can realize self-supervised learning of binocular stereo matching without relying on rectified binocular image samples.
  • One model is used to predict both optical flow and stereo matching.
  • An embodiment of this application provides a monocular image-based model training method, applied to training an image matching model, the method comprising: acquiring a first training image and a second training image captured by a monocular image capture device at different points in time; obtaining a first optical flow prediction result from the first training image to the second training image according to the photometric loss between them; and performing proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
  • The trained image matching model is configured to perform binocular image alignment and optical flow prediction.
  • An embodiment of this application further provides a monocular image-based model training device, applied to training an image matching model, the device comprising:
  • an image acquisition unit configured to acquire a first training image and a second training image captured by a monocular image capture device at different points in time;
  • a first optical flow prediction module configured to obtain a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image; and
  • a second optical flow prediction module configured to perform proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
  • An embodiment of this application further provides a data processing device comprising a machine-readable storage medium and a processor; the machine-readable storage medium stores machine-executable instructions which, when executed by the processor, implement the monocular image-based model training method described above.
  • An embodiment of this application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the monocular image-based model training method described above.
  • Figure 1 is a schematic block diagram of a data processing device provided by an embodiment of this application.
  • Figure 2 is a schematic flowchart of the monocular image-based model training method provided by an embodiment of this application;
  • Figure 3 is the first schematic diagram of the binocular image alignment principle provided by an embodiment of this application.
  • Figure 4 is the second schematic diagram of the binocular image alignment principle provided by an embodiment of this application.
  • Figure 5 is a schematic diagram of the image matching model processing provided by an embodiment of this application.
  • Figure 6 is a comparison of optical flow prediction test results on the same data sets;
  • Figure 7 is a comparison of binocular image alignment test results on the same data sets;
  • Figure 8 is a schematic module diagram of the monocular image-based model training device provided by an embodiment of this application.
  • FIG. 1 is a schematic diagram of the hardware structure of a data processing device 100 according to an embodiment of the application.
  • the data processing device 100 may include a processor 130 and a machine-readable storage medium 120.
  • the processor 130 and the machine-readable storage medium 120 may communicate via a system bus.
  • The machine-readable storage medium 120 stores machine-executable instructions (such as the code instructions of the image model training device 110); by reading and executing the machine-executable instructions corresponding to the image model training logic in the machine-readable storage medium 120,
  • the processor 130 can execute the monocular image-based model training method described in this application.
  • The machine-readable storage medium 120 mentioned in this application may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data.
  • For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination of them.
  • FIG. 2 is a schematic flowchart of a monocular image-based model training method provided in an embodiment of this application. The steps of the method are described below by way of example.
  • Step 210: Acquire a first training image and a second training image captured by a monocular image capture device at different points in time.
  • Step 220: Obtain a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image.
  • Step 230: Using the first optical flow prediction result as a proxy label, perform proxy learning of optical flow prediction with the first training image and the second training image.
  • Binocular image alignment is generally the computer vision task of identifying the same object in two binocular images with horizontal disparity.
  • Optical flow prediction is a technique that determines the motion of the same object across different frames from pixel intensities, based on the assumptions of brightness constancy and spatial smoothness.
  • Proxy learning is a strategy that uses a constructed auxiliary task to guide learning on a target task.
  • Binocular image alignment and optical flow prediction can be regarded as one class of problem, namely the problem of matching corresponding pixels between images.
  • The main difference between the two is that binocular image alignment is a one-dimensional search problem:
  • on rectified binocular images, the corresponding pixels lie on the epipolar line.
  • Optical flow prediction has no such constraint and can be regarded as a two-dimensional search problem. Binocular image alignment can therefore be regarded as a special case of optical flow: if a pixel matching model is trained to perform well in two-dimensional scenes, it can also perform the pixel matching task well in one-dimensional scenes.
  • By performing step 210, the data processing device 100 can obtain two images captured by the monocular image capture device at different points in time as training samples for training the image matching model.
  • The left and right cameras of a binocular camera can capture images at the same time, and the relative positions of the two cameras are generally fixed. By this geometric property, during binocular image alignment, for a pixel on an epipolar line of the left image, the corresponding pixel is located on the matching epipolar line of the right image; that is, this is a one-dimensional image matching problem.
  • The projection of a point P in the three-dimensional scene onto the left image of the binocular pair is the pixel P_l,
  • and its projection onto the right image is the pixel P_r.
  • Once P_l is determined, the left epipolar line passes through the left epipole e_l with P_l on it, and the pixel P_r corresponding to P_l in the right image always lies on the right epipolar line, which passes through the right epipole e_r.
  • Here O_l and O_r are the centers of the left and right cameras, and e_l and e_r are the epipoles.
  • FIG. 4 shows an example of binocular stereo image rectification.
  • The left and right cameras are parallel and the epipolar lines are horizontal; that is, binocular image alignment amounts to finding matching pixels along a horizontal line.
  • Optical flow generally describes the dense motion between two adjacent frames.
  • The two images are taken at different times, and the camera position and pose may change between the two frames.
  • The scene for optical flow prediction may be rigid or non-rigid.
  • In a rigid scene, where no object moves and the differences between the images are caused only by camera motion, optical flow prediction also becomes a one-dimensional image matching problem along the epipolar line.
  • Binocular images are pictures taken at the same time from different viewpoints. The binocular image alignment problem can therefore be regarded as a rigid-scene problem in which the camera shoots from one position, moves to another position and shoots again, after which the optical flow between the two images is predicted.
  • Since estimating ego-motion would itself introduce additional error and scenes are not always rigid, the camera ego-motion problem can be ignored, and binocular image alignment can simply be treated as a special case of optical flow prediction.
  • That is, if the image matching model can achieve good optical flow prediction in two-dimensional space, it should also be able to achieve good binocular image alignment in one-dimensional space.
  • When the data processing device 100 performs step 220, during the optical flow prediction process it can warp the target image to the reference image according to the predicted optical flow and construct the photometric loss by measuring the difference between the warped target image and the reference image.
  • However, for occluded pixels the brightness constancy assumption no longer holds and the photometric loss would provide wrong supervision, so the occluded pixels can be determined in advance and excluded when using the photometric loss to predict optical flow.
  • If a pixel is visible in one frame but not visible in the other frame, the pixel is occluded.
  • Pixels may be occluded for several reasons, such as object motion or camera motion.
  • For example, in the first frame an object faces forward and the camera captures its front; in the second frame the object has rotated to face backward, so the camera can only capture its back. The front half of the object seen in the first frame is thus not visible in the second frame and is occluded.
  • The data processing device 100 can obtain an initial optical flow map and an initial confidence map from the first training image to the second training image according to the photometric loss between the first training image and the second training image,
  • and then obtain the first optical flow prediction result, with occluded pixels excluded, according to the initial optical flow map and the initial confidence map.
  • The initial optical flow map can indicate the displacement of each pixel between the first training image and the second training image; the first optical flow prediction result can indicate the displacement of the unoccluded pixels between the first training image and the second training image.
  • The initial confidence map can be configured to indicate the occlusion state of each pixel.
  • For example, the confidence of occluded pixels in the initial confidence map can be set to 0 and the confidence of unoccluded pixels to 1; the first optical flow prediction result is then obtained according to the initial optical flow map and the initial confidence map. Since occluded pixels have confidence 0, multiplying the initial optical flow map by the initial confidence map removes the data of the occluded pixels and yields a high-confidence optical flow map composed of the unoccluded pixels.
  • The data processing device 100 can process the initial optical flow map with forward-backward photometric detection and determine the confidence of each pixel from the photometric difference to obtain the confidence map.
  • The data processing device 100 can set the confidence of pixels whose photometric difference exceeds a preset threshold to 0, treating them as occluded pixels, and set the confidence of pixels whose photometric difference does not exceed the preset threshold to 1, treating them as unoccluded pixels.
  • When performing forward-backward photometric detection, the data processing device 100 can obtain, for a pixel p on the initial optical flow map from the first training image I_t to the second training image I_{t+1}, the forward optical flow F_{t→t+1}(p) and the backward optical flow F′_{t→t+1}(p), where F′_{t→t+1}(p) = F_{t+1→t}(p + F_{t→t+1}(p)) and F_{t+1→t} is the initial optical flow from the second training image to the first training image.
  • The data processing device 100 can obtain the confidence map M_{t→t+1}(p) of pixel p from the forward and backward optical flows of p according to the following formula:

    $$M_{t\to t+1}(p)=\begin{cases}1, & \left|F_{t\to t+1}(p)+F'_{t\to t+1}(p)\right|<\delta(p)\\ 0, & \text{otherwise,}\end{cases}$$

  • where p denotes a pixel and $\delta(p)=0.1\left(\left|F_{t\to t+1}(p)\right|+\left|F'_{t\to t+1}(p)\right|\right)+0.05$.
  • the data processing device 100 may also exchange the first training image and the second training image for training, so as to obtain a reverse optical flow image from the second training image to the first training image.
  • When the data processing device 100 performs step 220, it can perform the optical flow prediction from the first training image to the second training image according to a preset photometric loss function and smoothness loss function to obtain the first optical flow prediction result.
  • The photometric loss function L_p can be expressed as:

    $$L_p=\frac{\sum_p M_{t\to t+1}(p)\cdot\mathrm{Hamming}\!\left(\hat{I}_t(p),\hat{I}'_t(p)\right)}{\sum_p M_{t\to t+1}(p)},$$

  • where p denotes a pixel, $\hat{I}_t$ is the image obtained by applying the Census transform to the first training image I_t, $\hat{I}'_t$ is the image obtained by warping the Census-transformed second training image to $\hat{I}_t$ according to the forward optical flow, and Hamming(·) is the Hamming distance.
  • The smoothness loss function L_m can take the form:

    $$L_m=\frac{1}{N}\sum_p\left|\nabla F(p)\right|\cdot\left(e^{-\left|\nabla I(p)\right|}\right)^{T},$$

  • where I(p) is a pixel on the first training image or the second training image, N is the total number of pixels of the first training image or the second training image, ∇ denotes the gradient, T denotes transposition, and F(p) is a point on the optical flow map currently being processed.
  • A CNN can learn good optical flow prediction on the KITTI data sets even with only sparse correct labels. Therefore, in some embodiments, the data processing device 100 may first obtain sparse, high-confidence optical flow predictions by performing step 220 and then use them as proxy labels to guide the learning of image matching prediction.
  • The data processing device 100 may use the first optical flow prediction result as a proxy label and, using a preset proxy self-supervised loss function and smoothness loss function, perform the optical flow prediction from the first training image to the second training image.
  • The proxy self-supervised loss function L_s can take the form:

    $$L_s=\frac{\sum_p M_{py}(p)\cdot\left|F_{py}(p)-F(p)\right|}{\sum_p M_{py}(p)},$$

  • where p denotes a pixel, F_py is the initial optical flow map, M_py is the initial confidence map, and F is the optical flow map currently being processed.
  • Unlike in step 220, when the data processing device 100 performs step 230 it may no longer perform the occlusion-based culling that keeps only unoccluded pixels, so that the model can predict the optical flow of occluded regions.
  • The first training image and the second training image may first be given the same random preprocessing.
  • The preprocessing may be cropping the first training image and the second training image at the same position and with the same size, or applying the same random downsampling; in some other embodiments, the preprocessing may be both the same-position, same-size cropping and the same random downsampling. The data processing device 100 may then use the preprocessed first training image and second training image
  • to perform the training of step 230, which can simultaneously improve the accuracy of optical flow prediction for occluded and unoccluded points.
  • Alternatively, the first training image and the second training image may be randomly scaled by the same factor or rotated by the same angle, and the processed first training image and second training image are then used to perform the training of step 230.
  • The data processing device 100 may also obtain high-confidence optical flow predictions by other methods, for example, by using traditional methods to compute reliable disparity.
  • In some scenarios, what the model ultimately needs to perform is optical flow prediction. The data processing device 100 therefore obtains the optical flow prediction result and the confidence map through step 220, and in step 230 the high-confidence optical flow predictions are used as
  • proxy ground truth to guide the neural network in learning image matching; the whole training process can be completed within one model.
  • After proxy learning, the number of high-confidence pixels increases. Therefore, after the data processing device 100 performs step 230, the second optical flow prediction result obtained by proxy learning may be used for iterative training to improve the recognition capability of the image matching model.
  • the image matching model obtained through training by the method provided in the embodiments of the present application may be configured to perform optical flow prediction, or may be configured to perform binocular image alignment.
  • When the trained image matching model performs optical flow prediction, a first image I_t and a second image I_{t+1} captured at different points in time can be used as input, and the optical flow map from I_t to I_{t+1} is output.
  • When the trained image matching model is configured to align binocular images, the images I_l and I_r captured by the left and right cameras can be used as input, and the output from I_l to I_r is obtained:
  • the stereo disparity map is used as the matching result.
  • The Adam optimizer can be used to build the image matching model on the TensorFlow system, with the batch size of the model set to 4 and an initial learning rate of 1e-4 that is decayed by half every 60k iterations.
  • Standardized images can be used as input, and the data can be augmented by methods such as random cropping, scaling, or rotation.
  • The crop size can be set to [256, 640] pixels,
  • and the random scaling factor range can be set to [0.75, 1.25].
  • The photometric loss can first be applied to all pixels, and the image matching model is trained with this loss for 100k iterations from scratch. It should be noted that at the beginning it is not necessary to distinguish between high-confidence and low-confidence pixels, because directly applying the photometric loss only to high-confidence pixels may result in the trivial solution in which all pixels are considered low-confidence pixels.
  • Afterwards, the photometric loss function L_p and the smoothness loss function L_m are used for 400k iterations to train the image matching model.
  • Finally, the proxy self-supervised loss function L_s and the smoothness loss function L_m may be used for 400k iterations to train the image matching model.
  • Figure 6 shows the test results of optical flow prediction on the KITTI 2012 and KITTI 2015 data sets for other models and for the image matching model trained with the method provided in the embodiments of this application. As can be seen from Figure 6, the recognition ability of the image matching model trained with the monocular image-based model training method of this application (the "Our+proxy" entry) is significantly better than that of models trained with unsupervised methods such as MultiFrameOccFlow and DDFlow.
  • Figure 7 shows the test results of binocular image alignment on the KITTI 2012 and KITTI 2015 data sets for other models and for the image matching model trained with the method provided in the embodiments of this application.
  • As can be seen from Figure 7, the recognition ability of the image matching model trained with the monocular image-based model training method of this application (the "Our+proxy+ft" entry) is significantly better than that of models trained with other unsupervised methods.
  • An embodiment of this application further provides a monocular image-based model training device 110.
  • The device includes an image acquisition module 111, a first optical flow prediction module 112, and a second optical flow prediction module 113.
  • The image acquisition module 111 is configured to acquire the first training image and the second training image captured by a monocular image capture device at different points in time.
  • The first optical flow prediction module 112 is configured to obtain the first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image.
  • The second optical flow prediction module 113 is configured to perform proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
  • The monocular image-based model training method, device, and data processing equipment provided in this application treat binocular image matching as a special case of optical flow prediction and adopt proxy learning:
  • the first optical flow prediction result, obtained by using two monocular images captured at different points in time as training samples, is used as a proxy label configured to guide the model through a second round of optical flow prediction learning.
  • In this way, self-supervised learning of binocular stereo matching can be performed without relying on rectified binocular image samples, and the same model can be used to predict both optical flow and stereo matching.
  • Each block in a flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions configured to implement the specified logical function.
  • In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the drawings.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • The functional modules in the various embodiments of this application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part.
  • If the functions are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • The technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
  • The optical flow prediction results obtained by using two monocular images captured at different points in time as training samples are used as proxy labels to guide the model through a second round of optical flow prediction learning.
  • In this way, self-supervised learning of binocular stereo matching can be realized without relying on rectified binocular image samples, and the same model can be used to predict optical flow and stereo matching.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of this application provides a monocular image-based model training method, apparatus, and data processing device. The method first acquires a first training image and a second training image captured by a monocular image capture device at different points in time. A first optical flow prediction result from the first training image to the second training image is then obtained according to the photometric loss between the two images. The first optical flow prediction result is then used as a proxy label, and optical flow prediction training is performed with the first and second training images. By treating binocular image matching as a special case of optical flow prediction and adopting proxy learning, the first optical flow prediction result obtained from the two monocular images captured at different points in time is used as a proxy annotation to guide the model through a second round of optical flow prediction learning. In this way, self-supervised learning of binocular stereo matching can be realized without relying on rectified binocular image samples, and the same model can be used to predict optical flow and stereo matching.

Description

Monocular image-based model training method and apparatus, and data processing device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201910753810.7, filed with the Chinese Patent Office on August 15, 2019 and entitled "Monocular image-based model training method and apparatus, and data processing device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer vision technology and specifically provides a monocular image-based model training method, apparatus, and data processing device.
Background
Binocular image alignment (stereo matching) is a computer vision problem widely used in fields such as 3D digital scene reconstruction and autonomous driving. The goal of binocular image alignment is to predict pixel displacements, that is, the stereo disparity map between two binocular images.
When handling the binocular image alignment problem, a Convolutional Neural Network (CNN) model can be trained with a large number of samples, and the trained model is then used to perform binocular image alignment.
Because obtaining binocular training samples with correct labels is costly, some implementations train on synthetic simulation images instead, but models trained this way recognize real images poorly. Other implementations use unlabeled binocular images, warping the right image to the left image according to the predicted disparity map and then measuring the difference between the warped right image and the left image with a photometric loss; however, this approach still requires a large number of rectified binocular images, so the training cost is relatively high.
Summary
The purpose of this application is to provide a monocular image-based model training method, apparatus, and data processing device that can realize self-supervised learning of binocular stereo matching without relying on rectified binocular image samples, using the same model to predict both optical flow and stereo matching.
To achieve at least one of the above purposes, this application adopts the following technical solutions:
An embodiment of this application provides a monocular image-based model training method, applied to training an image matching model, the method comprising:
acquiring a first training image and a second training image captured by a monocular image capture device at different points in time;
obtaining a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image;
performing proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label; and
configuring the trained image matching model to perform binocular image alignment and optical flow prediction.
An embodiment of this application further provides a monocular image-based model training device, applied to training an image matching model, the device comprising:
an image acquisition unit configured to acquire a first training image and a second training image captured by a monocular image capture device at different points in time;
a first optical flow prediction module configured to obtain a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image; and
a second optical flow prediction module configured to perform proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
An embodiment of this application further provides a data processing device comprising a machine-readable storage medium and a processor; the machine-readable storage medium stores machine-executable instructions which, when executed by the processor, implement the monocular image-based model training method described above.
An embodiment of this application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the monocular image-based model training method described above.
Brief Description of the Drawings
Figure 1 is a schematic block diagram of a data processing device provided by an embodiment of this application;
Figure 2 is a schematic flowchart of the monocular image-based model training method provided by an embodiment of this application;
Figure 3 is the first schematic diagram of the binocular image alignment principle provided by an embodiment of this application;
Figure 4 is the second schematic diagram of the binocular image alignment principle provided by an embodiment of this application;
Figure 5 is a schematic diagram of the image matching model processing provided by an embodiment of this application;
Figure 6 is a comparison of optical flow prediction test results on the same data sets;
Figure 7 is a comparison of binocular image alignment test results on the same data sets;
Figure 8 is a schematic module diagram of the monocular image-based model training device provided by an embodiment of this application.
Detailed Description
To describe the purposes, technical solutions, and beneficial effects of the embodiments of this application more clearly, the technical solutions provided by the embodiments are described below by way of example with reference to the accompanying drawings.
Referring to Figure 1, Figure 1 is a schematic diagram of the hardware structure of a data processing device 100 provided by an embodiment of this application. In some embodiments, the data processing device 100 may include a processor 130 and a machine-readable storage medium 120, and the processor 130 and the machine-readable storage medium 120 can communicate via a system bus. The machine-readable storage medium 120 stores machine-executable instructions (such as the code instructions of the image model training device 110); by reading and executing the machine-executable instructions corresponding to the image model training logic in the machine-readable storage medium 120, the processor 130 can execute the monocular image-based model training method described in this application.
In some embodiments, the machine-readable storage medium 120 mentioned in this application may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination of them.
Referring to Figure 2, which is a schematic flowchart of a monocular image-based model training method provided by an embodiment of this application, the steps of the method are described below by way of example.
Step 210: Acquire a first training image and a second training image captured by a monocular image capture device at different points in time.
Step 220: Obtain a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image.
Step 230: Using the first optical flow prediction result as a proxy label, perform proxy learning of optical flow prediction with the first training image and the second training image.
In some embodiments, binocular image alignment is generally the computer vision task of identifying the same object in two binocular images with horizontal disparity.
Optical flow prediction is a technique that determines the motion of the same object across different frames from pixel intensities, based on the assumptions of brightness constancy and spatial smoothness.
Proxy learning is a strategy that uses a constructed auxiliary task to guide learning on a target task.
The inventors found that binocular image alignment and optical flow prediction can be regarded as one class of problem, namely the problem of matching corresponding pixels between images. The main difference between the two is that binocular image alignment is a one-dimensional search problem: on rectified binocular images, the corresponding pixels lie on the epipolar line. Optical flow prediction has no such constraint and can be regarded as a two-dimensional search problem. Binocular image alignment can therefore be regarded as a special case of optical flow: if a pixel matching model is trained to perform well in two-dimensional scenes, it can also perform the pixel matching task well in one-dimensional scenes.
Therefore, in some embodiments, by performing step 210, the data processing device 100 can obtain two images captured by the monocular image capture device at different points in time as training samples for training the image matching model.
For example, for binocular image alignment, the left and right cameras of a binocular camera can capture images at the same time, and the relative positions of the two cameras are generally fixed. By this geometric property, during binocular image alignment, for a pixel on an epipolar line of the left image, the corresponding pixel is located on the matching epipolar line of the right image; that is, this is a one-dimensional image matching problem.
Referring to Figure 3, the projection of a point P in the three-dimensional scene onto the left image of the binocular pair is the pixel P_l, and its projection onto the right image is the pixel P_r. Once P_l is determined, the left epipolar line passes through the left epipole e_l with P_l on it, and the pixel P_r corresponding to P_l in the right image always lies on the right epipolar line, which passes through the right epipole e_r. Here O_l and O_r are the centers of the left and right cameras, and e_l and e_r are the epipoles.
Referring to Figure 4, Figure 4 shows an example of binocular stereo image rectification: the left and right cameras are parallel and the epipolar lines are horizontal, so binocular image alignment amounts to finding matching pixels along a horizontal line.
In some embodiments, optical flow generally describes the dense motion between two adjacent frames. The two images are taken at different times, and the camera position and pose may change between the two frames. The scene for optical flow prediction may be rigid or non-rigid. In a rigid scene, where no object moves and the differences between the images are caused only by camera motion (rotation or translation), optical flow prediction also becomes a one-dimensional image matching problem along the epipolar line. Binocular images are pictures taken at the same time from different viewpoints, so the binocular image alignment problem can be regarded as a rigid-scene problem in which the camera shoots from one position, moves to another position and shoots again, after which the optical flow between the two images is predicted.
Since estimating ego-motion would itself introduce additional error and scenes are not always rigid, in some embodiments the camera ego-motion problem can be ignored, and binocular image alignment is simply treated as a special case of optical flow prediction. That is, if the image matching model can achieve good optical flow prediction in two-dimensional space, it should also be able to achieve good binocular image alignment in one-dimensional space.
Therefore, in some embodiments, when the data processing device 100 performs step 220, during the optical flow prediction process it can warp the target image to the reference image according to the predicted optical flow and construct the photometric loss by measuring the difference between the warped target image and the reference image. However, for pixels of objects occluded by the foreground, the brightness constancy assumption no longer holds, so the photometric loss would provide wrong training supervision for occluded pixels. For this reason, in some embodiments, the occluded pixels can be determined in advance and excluded when using the photometric loss to predict optical flow.
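To make the backward-warping operation concrete, the following is a minimal NumPy/SciPy sketch of warping a target image to the reference frame with a predicted flow field, assuming flow[..., 0] holds the horizontal and flow[..., 1] the vertical displacement; the function name and the bilinear sampling choice are illustrative assumptions, not part of the patent text.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_reference(target, flow):
    """Sample the target image at positions displaced by the predicted flow:
    output(p) = target(p + F(p)), using bilinear interpolation (order=1)."""
    h, w = flow.shape[:2]
    grid = np.mgrid[0:h, 0:w].astype(np.float64)
    ys, xs = grid[0], grid[1]
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])  # (2, H, W), row coords first
    if target.ndim == 2:  # grayscale image
        return map_coordinates(target, coords, order=1, mode="nearest")
    # warp each color channel independently
    return np.stack([map_coordinates(target[..., c], coords, order=1, mode="nearest")
                     for c in range(target.shape[-1])], axis=-1)
```

The photometric loss then compares warp_to_reference(I_{t+1}, F_{t→t+1}) against I_t pixel by pixel.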
It can be understood that if a pixel is visible in one frame but not visible in the other frame, the pixel is occluded. There are many possible causes, such as object motion or camera motion, any of which may lead to pixels being occluded. For example, in some possible application scenarios, an object faces forward in the first frame and the camera captures its front; in the second frame the object has rotated to face backward, so the camera can only capture its back. The front half of the object seen in the first frame is therefore not visible in the second frame, that is, it is occluded.
For example, in some embodiments, the data processing device 100 can obtain an initial optical flow map and an initial confidence map from the first training image to the second training image according to the photometric loss between the two images, and then obtain the first optical flow prediction result, with occluded pixels excluded, according to the initial optical flow map and the initial confidence map. The initial optical flow map can indicate the displacement of each pixel between the first training image and the second training image; the first optical flow prediction result can indicate the displacement of the unoccluded pixels between the first training image and the second training image.
In addition, the initial confidence map can be configured to indicate the occlusion state of each pixel; for example, the confidence of occluded pixels in the initial confidence map can be set to 0 and the confidence of unoccluded pixels to 1. The first optical flow prediction result is then obtained according to the initial optical flow map and the initial confidence map.
Since the confidence of occluded pixels is 0, multiplying the initial optical flow map by the initial confidence map removes the data of the occluded pixels from the initial optical flow map, yielding a high-confidence optical flow map composed of the unoccluded pixels.
Optionally, in some embodiments, the data processing device 100 can process the initial optical flow map with forward-backward photometric detection and determine the confidence of each pixel from the photometric difference to obtain the confidence map. The data processing device 100 can set the confidence of pixels whose photometric difference exceeds a preset threshold to 0, treating them as occluded pixels, and set the confidence of pixels whose photometric difference does not exceed the preset threshold to 1, treating them as unoccluded pixels.
In some embodiments, when performing forward-backward photometric detection, the data processing device 100 can obtain, for a pixel p on the initial optical flow map from the first training image I_t to the second training image I_{t+1}, the forward optical flow F_{t→t+1}(p) and the backward optical flow F′_{t→t+1}(p), where F′_{t→t+1}(p) = F_{t+1→t}(p + F_{t→t+1}(p)) and F_{t+1→t} is the initial optical flow from the second training image to the first training image.
The data processing device 100 can obtain the confidence map M_{t→t+1}(p) of pixel p from the forward and backward optical flows of p according to the following formula:

$$M_{t\to t+1}(p)=\begin{cases}1, & \left|F_{t\to t+1}(p)+F'_{t\to t+1}(p)\right|<\delta(p)\\ 0, & \text{otherwise,}\end{cases}$$

where p denotes a pixel and $\delta(p)=0.1\left(\left|F_{t\to t+1}(p)\right|+\left|F'_{t\to t+1}(p)\right|\right)+0.05$.
In addition, in some embodiments, the data processing device 100 can also swap the first training image and the second training image for training, so as to obtain the reverse optical flow map from the second training image to the first training image.
When performing step 220, the data processing device 100 can perform optical flow prediction from the first training image to the second training image according to a preset photometric loss function and smoothness loss function to obtain the first optical flow prediction result.
For example, the photometric loss function L_p can be expressed as:

$$L_p=\frac{\sum_p M_{t\to t+1}(p)\cdot\mathrm{Hamming}\!\left(\hat{I}_t(p),\hat{I}'_t(p)\right)}{\sum_p M_{t\to t+1}(p)},$$

where p denotes a pixel, $\hat{I}_t$ is the image obtained by applying the Census transform to the first training image I_t, $\hat{I}'_t$ is the warped image obtained by warping the Census-transformed second training image $\hat{I}_{t+1}$ to $\hat{I}_t$ according to the forward optical flow from the first training image to the second training image, and Hamming(·) is the Hamming distance.
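For reference, the following is a minimal NumPy sketch of the Census transform and the per-pixel Hamming distance that the photometric loss aggregates; the 7x7 window size and the function names are illustrative assumptions.

```python
import numpy as np

def census_transform(gray, window=7):
    """Census transform: each pixel becomes a bit vector of
    neighbour-greater-than-centre comparisons within the window."""
    r = window // 2
    padded = np.pad(gray, r, mode="edge")
    h, w = gray.shape
    bits = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            bits.append(padded[r + dy:r + dy + h, r + dx:r + dx + w] > gray)
    return np.stack(bits, axis=-1)  # (H, W, window*window - 1) boolean descriptors

def census_hamming(gray_a, gray_b, window=7):
    """Per-pixel Hamming distance between the Census descriptors of two images."""
    return (census_transform(gray_a, window) != census_transform(gray_b, window)).sum(axis=-1)
```

In the loss above, gray_a plays the role of the first training image and gray_b the role of the second training image aligned to it by the forward flow, and the distances are averaged over the pixels that the confidence map marks as unoccluded.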
The smoothness loss function L_m can take the form:

$$L_m=\frac{1}{N}\sum_p\left|\nabla F(p)\right|\cdot\left(e^{-\left|\nabla I(p)\right|}\right)^{T},$$

where I(p) is a pixel on the first training image or the second training image, N is the total number of pixels of the first training image or the second training image, ∇ denotes the gradient, T denotes transposition, and F(p) is a point on the optical flow map currently being processed.
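A minimal sketch of this edge-aware smoothness term follows, assuming first-order forward differences: flow gradients are down-weighted by exp(-|image gradient|), so the flow is encouraged to be smooth everywhere except across image edges.

```python
import numpy as np

def smoothness_loss(flow, image):
    """Edge-aware smoothness: mean of |flow gradient| * exp(-|image gradient|)
    over the vertical and horizontal directions."""
    if image.ndim == 2:            # promote grayscale to (H, W, 1)
        image = image[..., None]
    n = flow.shape[0] * flow.shape[1]
    loss = 0.0
    for axis in (0, 1):            # vertical and horizontal differences
        flow_grad = np.abs(np.diff(flow, axis=axis)).sum(axis=-1)
        image_grad = np.abs(np.diff(image, axis=axis)).mean(axis=-1)
        loss += (flow_grad * np.exp(-image_grad)).sum()
    return loss / n
```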
When performing step 220, the data processing device 100 can train the image matching model using L_p + λL_m as the loss function, where λ = 0.1.
Furthermore, regarding step 230 above: even with only sparse correct labels, a CNN can learn good optical flow prediction on the KITTI data sets. Therefore, in some embodiments, the data processing device 100 can first obtain sparse, high-confidence optical flow predictions by performing step 220, and then use them as proxy labels to guide the learning of image matching prediction.
Referring to Figure 5, in some embodiments, the data processing device 100 can use the first optical flow prediction result as a proxy label and, using a preset proxy self-supervised loss function and smoothness loss function, perform the optical flow prediction from the first training image to the second training image.
For example, the proxy self-supervised loss function L_s can take the form:

$$L_s=\frac{\sum_p M_{py}(p)\cdot\left|F_{py}(p)-F(p)\right|}{\sum_p M_{py}(p)},$$

where p denotes a pixel, F_py is the initial optical flow map, M_py is the initial confidence map, and F is the optical flow map currently being processed.
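A minimal sketch of this masked proxy loss follows, assuming an L1 penalty between the current prediction and the proxy flow; only pixels that the initial confidence map M_py marks as reliable contribute.

```python
import numpy as np

def proxy_self_supervised_loss(flow, proxy_flow, proxy_conf, eps=1e-8):
    """Average distance to the proxy flow over high-confidence pixels only."""
    err = np.linalg.norm(flow - proxy_flow, axis=-1)   # |F(p) - F_py(p)| per pixel
    return float((proxy_conf * err).sum() / (proxy_conf.sum() + eps))
```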
When performing step 230, the data processing device 100 can train the image matching model using L_s + λL_m as the loss function, where λ = 0.1.
It should be noted that, unlike the training process of step 220, when performing step 230 the data processing device 100 may no longer perform the occlusion-based culling that keeps only unoccluded pixels, so that the model can predict the optical flow of occluded regions.
Optionally, in some embodiments, when the data processing device 100 performs step 230, the first training image and the second training image may first be given the same random preprocessing. In some embodiments, the preprocessing may be cropping the first training image and the second training image at the same position and with the same size, or applying the same random downsampling; in some other embodiments, the preprocessing may be both the same-position, same-size cropping and the same random downsampling. The data processing device 100 can then perform the training of step 230 with the preprocessed first and second training images, which can simultaneously improve the accuracy of optical flow prediction for occluded and unoccluded points.
Optionally, in some embodiments, when the data processing device 100 performs step 230, the first training image and the second training image may instead first be randomly scaled by the same factor or randomly rotated by the same angle, and the training of step 230 is then performed with the processed first and second training images.
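The key point of this preprocessing is that both images of a pair receive identical random parameters, as in the paired-crop sketch below; the [256, 640] crop size follows the training settings described later, and the helper name is an illustrative assumption.

```python
import numpy as np

def paired_random_crop(img1, img2, size=(256, 640), rng=None):
    """Apply the SAME random crop to both training images of a pair."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img1.shape[:2]
    ch, cw = size
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    return img1[y:y + ch, x:x + cw], img2[y:y + ch, x:x + cw]
```

The same pattern applies to paired random downsampling, scaling by the same factor, or rotation by the same angle: draw the random parameters once and apply them to both images.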
It should be noted that in some other possible implementations of this application, the data processing device 100 can also obtain high-confidence optical flow predictions by other methods, for example, by using traditional methods to compute reliable disparity.
In some scenarios, what the model ultimately needs to perform is optical flow prediction. The data processing device 100 therefore obtains the optical flow prediction result and the confidence map through step 220 and then, when performing step 230, uses the high-confidence optical flow predictions as proxy ground truth to guide the neural network in learning image matching; the whole training process can be completed within one model.
In some embodiments, the number of high-confidence pixels will increase after proxy learning. Therefore, after performing step 230, the data processing device 100 can also use the second optical flow prediction result obtained by proxy learning for iterative training, so as to improve the recognition capability of the image matching model.
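The iteration described here is a self-training loop: each round's denser high-confidence predictions become the next round's proxy labels. A sketch of the control flow, where fit_with_proxy and predict_flow_and_confidence are hypothetical helpers standing in for the step-230 training pass and the step-220 inference pass:

```python
def iterative_proxy_training(fit_with_proxy, predict_flow_and_confidence,
                             image_pairs, rounds=2):
    """Self-training sketch: refresh the proxy labels after each round."""
    proxy_flow, proxy_conf = predict_flow_and_confidence(image_pairs)  # from step 220
    for _ in range(rounds):
        fit_with_proxy(image_pairs, proxy_flow, proxy_conf)            # step 230
        proxy_flow, proxy_conf = predict_flow_and_confidence(image_pairs)
    return proxy_flow, proxy_conf
```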
It should be noted that the image matching model trained by the method provided in the embodiments of this application can be configured either for optical flow prediction or for binocular image alignment. When the trained image matching model performs optical flow prediction, a first image I_t and a second image I_{t+1} captured at different points in time can be used as input, and the optical flow map from I_t to I_{t+1} is output. When the trained image matching model is configured for binocular image alignment, the images I_l and I_r captured by the left and right cameras of a binocular pair can be used as input, and the output stereo disparity map from I_l to I_r is obtained as the matching result.
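In code, the dual use is only a matter of what is fed to the network. A sketch, where model is the trained image matching network assumed to be callable on an image pair:

```python
def predict(model, img_a, img_b, stereo=False):
    """One trained model, two tasks.

    For optical flow, pass frames (I_t, I_t+1) taken at different times and
    use the full 2-D flow field. For a rectified stereo pair (I_l, I_r),
    matching is constrained to horizontal lines, so the horizontal flow
    component serves as the disparity map."""
    flow = model(img_a, img_b)                 # (H, W, 2) displacement field
    return flow[..., 0] if stereo else flow
```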
In some embodiments, the image matching model can be built on the TensorFlow system using the Adam optimizer, with the batch size of the model set to 4 and an initial learning rate of 1e-4 that is decayed by half every 60k iterations. During training, standardized images can be used as input, and the data can be augmented by methods such as random cropping, scaling, or rotation. For example, the crop size can be set to [256, 640] pixels, and the random scaling factor range can be set to [0.75, 1.25].
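These hyperparameters map directly onto standard TensorFlow/Keras primitives. A configuration sketch under the stated settings; the staircase decay is an assumption about how "decayed by half every 60k iterations" is realized:

```python
import tensorflow as tf

# Adam optimizer, batch size 4, initial learning rate 1e-4 halved every
# 60k iterations, as described in the text.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=60_000,
    decay_rate=0.5,
    staircase=True,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
BATCH_SIZE = 4
CROP_SIZE = (256, 640)       # random crop size in pixels
SCALE_RANGE = (0.75, 1.25)   # random scaling factor range
```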
In addition, when the data processing device 100 performs step 220, the photometric loss can first be applied to all pixels, and the image matching model is trained from scratch for 100k iterations with this loss. It should be noted that at the beginning it is not necessary to distinguish between high-confidence and low-confidence pixels, because directly applying the photometric loss only to high-confidence pixels may result in the trivial solution in which all pixels are considered low-confidence pixels. Afterwards, the photometric loss function L_p and the smoothness loss function L_m are used for 400k iterations to train the image matching model. When the data processing device 100 performs step 230, the proxy self-supervised loss function L_s and the smoothness loss function L_m can be used for 400k iterations to train the image matching model.
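The resulting three-stage schedule can be summarized as a loss selector over the global training step; the stage boundaries follow the iteration counts above, and the per-batch loss values are assumed to be precomputed scalars.

```python
def training_loss(step, loss_photo_all, loss_p, loss_m, loss_s, lam=0.1):
    """Select the loss for the current global step (sketch).

    Stage 0 (first 100k steps): photometric loss on all pixels, no occlusion mask.
    Stage 1 (next 400k steps, step 220): masked photometric + smoothness loss.
    Stage 2 (last 400k steps, step 230): proxy self-supervised + smoothness loss."""
    if step < 100_000:
        return loss_photo_all
    if step < 500_000:
        return loss_p + lam * loss_m
    return loss_s + lam * loss_m
```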
Figure 6 shows the test results of optical flow prediction on the KITTI 2012 and KITTI 2015 data sets for other models and for the image matching model trained with the method provided in the embodiments of this application. As can be seen from Figure 6, the recognition ability of the image matching model trained with the monocular image-based model training method of this application (the "Our+proxy" entry) is significantly better than that of models trained with unsupervised methods such as MultiFrameOccFlow and DDFlow.
Figure 7 shows the test results of binocular image alignment on the KITTI 2012 and KITTI 2015 data sets for other models and for the image matching model trained with the method provided in the embodiments of this application. As can be seen from Figure 7, the recognition ability of the image matching model trained with the monocular image-based model training method of this application (the "Our+proxy+ft" entry) is significantly better than that of models trained with other unsupervised methods.
Referring to Figure 8, an embodiment of this application further provides a monocular image-based model training device 110, which includes an image acquisition module 111, a first optical flow prediction module 112, and a second optical flow prediction module 113.
The image acquisition module 111 is configured to acquire the first training image and the second training image captured by a monocular image capture device at different points in time.
The first optical flow prediction module 112 is configured to obtain the first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image.
The second optical flow prediction module 113 is configured to perform proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
In summary, the monocular image-based model training method, apparatus, and image processing device provided by this application treat binocular image matching as a special case of optical flow prediction. Using proxy learning, the first optical flow prediction result obtained from two monocular images captured at different points in time as training samples serves as a proxy label configured to guide the model through a second round of optical flow prediction learning. In this way, self-supervised learning of binocular stereo matching can be achieved without relying on rectified binocular image samples, and the same model can be used to predict optical flow and stereo matching.
In the embodiments provided by this application, it should be understood that the disclosed device and method can also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of this application. Each block in a flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions configured to implement the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the various embodiments of this application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Industrial Applicability
By treating binocular image matching as a special case of optical flow prediction and adopting proxy learning, the optical flow prediction results obtained from two monocular images captured at different points in time as training samples are used as proxy labels to guide the model through a second round of optical flow prediction learning. In this way, self-supervised learning of binocular stereo matching can be realized without relying on rectified binocular image samples, and the same model can be used to predict optical flow and stereo matching.

Claims (16)

  1. A monocular image-based model training method, applied to training an image matching model, the method comprising:
    acquiring a first training image and a second training image captured by a monocular image capture device at different points in time;
    obtaining a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image;
    performing proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label; and
    configuring the trained image matching model to perform binocular image alignment and optical flow prediction.
  2. The method according to claim 1, further comprising:
    inputting binocular images to be processed into the trained image matching model; and
    obtaining the stereo disparity map output by the image matching model for the binocular images to be processed.
  3. The method according to claim 1, wherein the step of obtaining the first optical flow prediction result from the first training image to the second training image comprises:
    obtaining an initial optical flow map and an initial confidence map from the first training image to the second training image according to the photometric loss between the first training image and the second training image; and
    obtaining the first optical flow prediction result, with occluded pixels excluded, according to the initial optical flow map and the initial confidence map.
  4. The method according to claim 3, wherein the initial confidence map is obtained by:
    processing the initial optical flow map with forward-backward photometric detection and determining the confidence of each pixel from the photometric difference to obtain the confidence map;
    wherein the confidence of pixels whose photometric difference exceeds a preset threshold is set to 0, marking them as occluded pixels, and the confidence of pixels whose photometric difference does not exceed the preset threshold is set to 1, marking them as unoccluded pixels.
  5. The method according to claim 4, wherein processing the initial optical flow map with forward-backward photometric detection and determining the confidence of each pixel from the photometric difference to obtain the confidence map comprises:
    obtaining, for a pixel p on the initial optical flow map from the first training image I_t to the second training image I_{t+1}, the forward optical flow F_{t→t+1}(p) and the backward optical flow F′_{t→t+1}(p), where F′_{t→t+1}(p) = F_{t+1→t}(p + F_{t→t+1}(p)) and F_{t+1→t} is the initial optical flow from the second training image to the first training image; and
    obtaining the confidence map M_{t→t+1}(p) of pixel p from the forward and backward optical flows of p according to the following formula:

    $$M_{t\to t+1}(p)=\begin{cases}1, & \left|F_{t\to t+1}(p)+F'_{t\to t+1}(p)\right|<\delta(p)\\ 0, & \text{otherwise,}\end{cases}$$

    where $\delta(p)=0.1\left(\left|F_{t\to t+1}(p)\right|+\left|F'_{t\to t+1}(p)\right|\right)+0.05$.
  6. The method according to claim 5, wherein the step of obtaining the first optical flow prediction result according to the initial optical flow map and the initial confidence map comprises:
    performing optical flow prediction from the first training image to the second training image according to a preset photometric loss function and smoothness loss function to obtain the first optical flow prediction result.
  7. The method according to claim 6, wherein the photometric loss function L_p takes the form:

    $$L_p=\frac{\sum_p M_{t\to t+1}(p)\cdot\mathrm{Hamming}\!\left(\hat{I}_t(p),\hat{I}'_t(p)\right)}{\sum_p M_{t\to t+1}(p)},$$

    where $\hat{I}_t$ is the image obtained by applying the Census transform to the first training image I_t, $\hat{I}'_t$ is the warped image obtained by warping the Census-transformed second training image $\hat{I}_{t+1}$ to $\hat{I}_t$ according to the forward optical flow from the first training image to the second training image, and Hamming(·) is the Hamming distance.
  8. The method according to claim 6, wherein the smoothness loss function L_m takes the form:

    $$L_m=\frac{1}{N}\sum_p\left|\nabla F(p)\right|\cdot\left(e^{-\left|\nabla I(p)\right|}\right)^{T},$$

    where I(p) is a pixel on the first training image or the second training image, N is the total number of pixels of the first training image or the second training image, ∇ denotes the gradient, T denotes transposition, and F(p) is a point on the optical flow map currently being processed.
  9. The method according to claim 5, wherein the step of performing proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label, comprises:
    performing optical flow prediction from the first training image to the second training image using a preset proxy self-supervised loss function and smoothness loss function, with the first optical flow prediction result as the proxy label.
  10. The method according to claim 9, wherein the proxy self-supervised loss function L_s takes the form:

    $$L_s=\frac{\sum_p M_{py}(p)\cdot\left|F_{py}(p)-F(p)\right|}{\sum_p M_{py}(p)},$$

    where F_py is the initial optical flow map, M_py is the initial confidence map, and F is the optical flow map currently being processed.
  11. The method according to claim 9, wherein the step of performing optical flow prediction training from the first training image to the second training image using a preset proxy self-supervised loss function and smoothness loss function, with the first optical flow prediction result as the proxy label, comprises:
    applying the same preprocessing to the first training image and the second training image, wherein the preprocessing includes random cropping and/or random downsampling; and
    performing machine learning training of image element matching using the preprocessed first and second training images, with the first optical flow prediction result as the proxy label.
  12. The method according to claim 9, wherein the step of performing optical flow prediction training from the first training image to the second training image using a preset proxy self-supervised loss function and smoothness loss function, with the first optical flow prediction result as the proxy label, comprises:
    applying the same preprocessing to the first training image and the second training image, wherein the preprocessing includes random scaling by the same factor or random rotation by the same angle; and
    performing machine learning training of image element matching using the preprocessed first and second training images, with the first optical flow prediction result as the proxy label.
  13. The method according to claim 1, wherein after the step of performing proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label, the method further comprises:
    performing iterative training using the second optical flow prediction result obtained by the proxy learning.
  14. A monocular image-based model training device, applied to training an image matching model, the device comprising:
    an image acquisition unit configured to acquire a first training image and a second training image captured by a monocular image capture device at different points in time;
    a first optical flow prediction module configured to obtain a first optical flow prediction result from the first training image to the second training image according to the photometric loss between the first training image and the second training image; and
    a second optical flow prediction module configured to perform proxy learning of optical flow prediction using the first training image and the second training image, with the first optical flow prediction result as a proxy label.
  15. A data processing device, comprising a machine-readable storage medium and a processor, wherein the machine-readable storage medium stores machine-executable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 13.
  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 13.
PCT/CN2020/104924 2019-08-15 2020-07-27 Monocular image-based model training method and apparatus, and data processing device WO2021027543A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/629,521 US20220270354A1 (en) 2019-08-15 2020-07-27 Monocular image-based model training method and apparatus, and data processing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910753810.7 2019-08-15
CN201910753810.7A 2019-08-15 2019-08-15 Monocular image-based model training method and apparatus, and data processing device

Publications (1)

Publication Number Publication Date
WO2021027543A1 true WO2021027543A1 (zh) 2021-02-18

Family

ID=74570913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104924 WO2021027543A1 (zh) Monocular image-based model training method and apparatus, and data processing device

Country Status (3)

Country Link
US (1) US20220270354A1 (zh)
CN (1) CN112396074A (zh)
WO (1) WO2021027543A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966584B (zh) * 2021-02-26 2024-04-19 中国科学院上海微系统与信息技术研究所 Training method and apparatus for a motion perception model, electronic device, and storage medium
US11688090B2 (en) * 2021-03-16 2023-06-27 Toyota Research Institute, Inc. Shared median-scaling metric for multi-camera self-supervised depth evaluation
CN114005075B (zh) * 2021-12-30 2022-04-05 深圳佑驾创新科技有限公司 Method and apparatus for constructing an optical flow estimation model, and optical flow estimation method
CN117237800B (zh) * 2023-08-01 2024-06-14 广州智在信息科技有限公司 Artificial intelligence-based crop growth monitoring method and computer device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102219561B1 (ko) * 2018-11-23 2021-02-23 연세대학교 산학협력단 Stereo matching apparatus and method using unsupervised learning based on correspondence consistency
CN112396073A (zh) * 2019-08-15 2021-02-23 广州虎牙科技有限公司 Binocular image-based model training method and apparatus, and data processing device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3009789A1 (en) * 2013-06-11 2016-04-20 Yamaha Hatsudoki Kabushiki Kaisha Monocular-motion-stereo distance estimation method, and monocular-motion-stereo distance estimation apparatus
CN108028904A (zh) * 2015-09-09 2018-05-11 华为技术有限公司 Method and system for light field augmented reality/virtual reality on mobile devices
CN109903315A (zh) * 2019-03-08 2019-06-18 腾讯科技(深圳)有限公司 Method, apparatus, device, and readable storage medium for optical flow prediction
CN110111366A (zh) * 2019-05-06 2019-08-09 北京理工大学 End-to-end optical flow estimation method based on multi-level losses

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643406A (zh) * 2021-08-12 2021-11-12 北京的卢深视科技有限公司 Image generation method, electronic device, and computer-readable storage medium

Also Published As

Publication number Publication date
US20220270354A1 (en) 2022-08-25
CN112396074A (zh) 2021-02-23

Similar Documents

Publication Publication Date Title
WO2021027543A1 (zh) Monocular image-based model training method and apparatus, and data processing device
US11176381B2 (en) Video object segmentation by reference-guided mask propagation
CN109727288B (zh) 用于单目同时定位与地图构建的系统和方法
CN107274433B (zh) 基于深度学习的目标跟踪方法、装置及存储介质
US10395383B2 (en) Method, device and apparatus to estimate an ego-motion of a video apparatus in a SLAM type algorithm
US10334168B2 (en) Threshold determination in a RANSAC algorithm
JP5160643B2 (ja) 2次元画像からの3次元オブジェクト認識システム及び方法
US9898856B2 (en) Systems and methods for depth-assisted perspective distortion correction
US8433157B2 (en) System and method for three-dimensional object reconstruction from two-dimensional images
US9117310B2 (en) Virtual camera system
Vo et al. Spatiotemporal bundle adjustment for dynamic 3d reconstruction
WO2021027544A1 (zh) Binocular image-based model training method and apparatus, and data processing device
US11170202B2 (en) Apparatus and method for performing 3D estimation based on locally determined 3D information hypotheses
TW202117611A (zh) 電腦視覺訓練系統及訓練電腦視覺系統的方法
US11651581B2 (en) System and method for correspondence map determination
CN113711276A (zh) 尺度感知单目定位和地图构建
WO2019157922A1 (zh) Image processing method and apparatus, and AR device
Rozumnyi et al. Sub-frame appearance and 6d pose estimation of fast moving objects
KR20150097251A (ko) 다중 영상간 대응점을 이용한 카메라 정렬 방법
CN112270748A (zh) 基于图像的三维重建方法及装置
Yue et al. High-dimensional camera shake removal with given depth map
TWI823491B Optimization method and apparatus for a depth estimation model, electronic device, and storage medium
Ashar et al. Video Stabilization using RAFT-based Optical Flow
TWI814500B Method, apparatus, device, and storage medium for reducing depth estimation model errors
Agrawal et al. Robust ego-motion estimation and 3-D model refinement using surface parallax

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20851750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20851750

Country of ref document: EP

Kind code of ref document: A1