WO2024000728A1 - Monocular three-dimensional plane recovery method, device, and storage medium - Google Patents

Monocular three-dimensional plane recovery method, device, and storage medium Download PDF

Info

Publication number
WO2024000728A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
plane
input
predicted
monocular
Prior art date
Application number
PCT/CN2022/110039
Other languages
English (en)
French (fr)
Inventor
崔岩
常青玲
任飞
徐世廷
杨鑫
侯宇灿
Original Assignee
五邑大学
广东四维看看智能设备有限公司
中德(珠海)人工智能研究院有限公司
珠海市四维时代网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 五邑大学, 广东四维看看智能设备有限公司, 中德(珠海)人工智能研究院有限公司, 珠海市四维时代网络科技有限公司 filed Critical 五邑大学
Publication of WO2024000728A1 publication Critical patent/WO2024000728A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • the invention relates to the field of image data processing, and in particular to a monocular three-dimensional plane restoration method, equipment and storage medium.
  • Three-dimensional plane recovery requires segmenting the plane area of the scene from the image dimension, and at the same time estimating the plane parameters of the corresponding area. Based on the plane area and plane parameters, the three-dimensional plane recovery can be realized, and the predicted three-dimensional plane can be reconstructed.
  • in the related art, monocular 3D plane recovery focuses on reconstruction accuracy and strengthens the accuracy of the model structure by analyzing the edges of planar structures and the embedding of the scene.
  • however, it lacks the ability to identify small planar areas and is prone to errors during plane detection: pixel regions that occupy a small proportion of the image are easily lost, which degrades the accuracy of monocular three-dimensional plane recovery.
  • the present invention aims to solve at least one of the technical problems existing in the prior art.
  • the present invention provides a monocular three-dimensional plane restoration method, device and storage medium, which can extract features from the interior of the feature map, effectively improving the comprehensiveness of feature extraction and thereby the accuracy of monocular three-dimensional plane restoration.
  • a first embodiment of the present invention provides a monocular three-dimensional plane restoration method, including:
  • the first internal feature and the first associated feature are fused and input to the first decoder for decoding to obtain the prediction plane parameters and the prediction plane area;
  • the second internal feature and the second correlation feature are fused and then input to the second decoder for decoding to obtain a predicted non-planar area, where the predicted non-planar area is used to verify the predicted planar area;
  • Three-dimensional restoration is performed based on the plane parameters and plane area to obtain the predicted three-dimensional plane.
  • by setting an inner encoder and an outer encoder, the internal features of the image blocks in the corresponding feature map and the correlation features between the image blocks are extracted separately, and the internal features and correlation features are then fused and input to the decoder for decoding; this effectively improves the comprehensiveness of feature extraction, reduces the probability of losing image information, and thereby improves the accuracy of monocular three-dimensional plane recovery.
  • the predicted planar area can be verified against the predicted non-planar area; this check can further improve the robustness of monocular three-dimensional plane recovery.
  • multi-scale feature extraction is performed on the input image to obtain the first feature map and the second feature map at two scales, including:
  • the corresponding position information is embedded in the first extraction map and the second extraction map respectively to obtain the first feature map and the second feature map at two scales.
  • the first feature map is input into the first inner encoder and the first outer encoder respectively, and the first internal features of the first image blocks in the first feature map and the first correlation features between the first image blocks are extracted respectively, including:
  • each first image block is input to the first outer encoder, and the first correlation features between the first image blocks are extracted.
  • the second feature map is input into the second inner encoder and the second outer encoder respectively, and the second internal features of the second image blocks in the second feature map and the second correlation features between the second image blocks are extracted respectively, including:
  • each second image block is input to the second outer encoder, and the second correlation features between the second image blocks are extracted.
  • the first internal feature and the first associated feature are fused and then input to the first decoder for decoding to obtain prediction plane parameters and prediction plane areas, including:
  • the first fusion feature is input to the first decoder and the plane area and plane parameters are used as labels for decoding and classification to obtain predicted plane parameters and predicted plane areas.
  • the second internal feature and the second correlation feature are fused and then input to the second decoder for decoding to obtain a predicted non-planar area, including:
  • the second fusion feature is input to the second decoder to perform decoding and classification using the non-planar area as a label to obtain a predicted non-planar area.
  • the method further includes:
  • the weight of the first decoder is updated according to the predicted planar area, the predicted non-planar area and the loss function.
  • updating the weight of the first decoder according to the predicted non-planar area and the loss function includes:
  • the weight of the first decoder is updated according to the predicted planar area, the predicted non-planar area and the cross-entropy loss function, where the cross-entropy loss function is:
  • Y+ and Y- denote planar-area marked pixels and non-planar-area marked pixels, respectively
  • P_i denotes the probability that the i-th pixel belongs to the planar area
  • w is the ratio of planar-area pixel marks to non-planar-area pixel marks.
  • a second embodiment of the present invention provides an electronic device, including:
  • a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • the processor executes the computer program, the monocular three-dimensional plane recovery method of any one of the first aspects is implemented.
  • since the electronic device of the second-aspect embodiment applies the monocular three-dimensional plane restoration method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
  • computer-executable instructions are stored, and the computer-executable instructions are used to execute the monocular three-dimensional plane recovery method of any one of the first aspect.
  • since the computer storage medium of the third-aspect embodiment can perform the monocular three-dimensional plane recovery method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
  • Figure 1 is a main step diagram of the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 2 is a schematic diagram of step S100 in the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 3 is a schematic diagram of step S200 in the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 4 is a schematic diagram of step S300 in the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 5 is a schematic diagram of step S400 in the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 6 is a schematic diagram of step S500 in the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Figure 7 is a framework diagram of the network applied by the monocular three-dimensional plane restoration method according to the embodiment of the present invention.
  • Three-dimensional plane restoration and reconstruction technology is currently one of the mainstream research tasks in the field of computer vision.
  • Three-dimensional plane restoration of a single image requires segmenting the plane instance areas of the scene from the image dimensions, and at the same time estimating the plane parameters of each instance area.
  • the non-planar area will be represented by the depth estimated by the network model.
  • This technology has broad application prospects in fields such as virtual reality, augmented reality, and robotics.
  • the plane detection and restoration method of a single image requires simultaneous research on image depth, plane normals, plane segmentation, etc.
  • the traditional three-dimensional plane restoration and reconstruction method based on artificially extracted features only extracts the shallow texture information of the image and relies on the prior conditions of plane geometry, which has the disadvantage of weak generalization ability.
  • Real indoor scenes are very complex. Multiple shadows produced by complex light and various folding obstructions will affect the quality of plane restoration and reconstruction, making it difficult for traditional methods to cope with plane reconstruction tasks in complex indoor scenes.
  • Plane restoration and reconstruction is an important research direction in 3D reconstruction.
  • 3D reconstruction methods first generate point cloud data through 3D vision methods, then generate nonlinear scene surfaces by fitting relevant points, and then optimize the overall reconstruction through global reasoning.
  • segmented plane restoration and reconstruction combines the visual instance segmentation method to identify the plane area of the scene, using three parameters in the Cartesian coordinate system and a segmentation mask to represent the plane, which has better reconstruction accuracy and effect.
  • Segmented plane restoration and reconstruction is a multi-stage reconstruction method, and the accuracy of plane identification and parameter estimation will affect the results of the final model.
  • Three-dimensional plane recovery requires segmenting the plane area of the scene from the image dimension, and at the same time estimating the plane parameters of the corresponding area. Based on the plane area and plane parameters, the three-dimensional plane recovery can be realized, and the predicted three-dimensional plane can be reconstructed.
  • several plane recovery approaches exist: the end-to-end convolutional neural network architecture PlaneNet, which can infer a fixed number of plane instance masks and plane parameters from a single RGB image; learning in the depth modality with a loss directly induced by the planar structure, by predicting a fixed number of planes; improving the two-stage Mask R-CNN method by replacing object category classification with plane geometry prediction and then refining the plane segmentation mask with a convolutional neural network; predicting per-pixel plane parameters with an associative embedding method, training the network to map each pixel into an embedding space and then clustering the embedded pixels into plane instances; a plane refinement method constrained by the Manhattan-world assumption, which strengthens the refinement of plane parameters by restricting the geometric relationships between plane instances; a divide-and-conquer method that segments panorama planes along the horizontal and vertical directions and, given the difference in pixel distribution between panoramas and ordinary images, can effectively recover distorted plane instances; and the Transformer-based method PlaneTR, which effectively improves plane detection by adding plane-instance center and edge features.
  • monocular 3D plane restoration focuses on reconstruction accuracy and enhances the accuracy of the model structure by analyzing the edges of the plane structure and the embeddedness of the scene.
  • it lacks the ability to identify small plane areas and is prone to errors in the plane detection process. Losing a small proportion of pixel areas affects the accuracy of monocular three-dimensional plane recovery.
  • the encoder part of the Transformer module is applied to image block sequences for the image classification task, obtaining better results than the most advanced convolutional networks while using fewer computing resources.
  • the object detection problem is framed as a sequence-to-sequence prediction problem, predicting a set of objects interacting with a sequence of contextual features directly from the learned object query.
  • a new simple object detection paradigm is proposed that builds on the standard Transformer encoder-decoder architecture, which gets rid of many hand-designed components such as anchor generation and non-maximum suppression.
  • semantic segmentation is redefined as a sequence-to-sequence prediction task, and an encoder based purely on the self-attention mechanism is proposed, which eliminates the reliance on convolution operations and solves the problem of limited receptive fields.
  • a monocular three-dimensional plane restoration method includes at least some steps:
  • S600 Perform three-dimensional restoration according to the plane parameters and the plane area to obtain the predicted three-dimensional plane.
  • the comprehensiveness of the information obtained can be improved.
  • by setting up an inner encoder and an outer encoder, the internal features of the image blocks in the corresponding feature map and the correlation features between the image blocks are extracted respectively, and the fused internal and correlation features are then input to the decoder for decoding, which can effectively improve the comprehensiveness of feature extraction, reduce the probability of image information loss, and thus improve the accuracy of monocular three-dimensional plane recovery.
  • the predicted planar area can be verified against the predicted non-planar area, which can further improve the robustness of monocular three-dimensional plane recovery.
  • multi-scale feature extraction is performed on the input image to obtain the first feature map and the second feature map at two scales, including:
  • S120 Embed corresponding position information in the first extraction map and the second extraction map respectively to obtain the first feature map and the second feature map at two scales.
  • in step S110, multi-scale feature extraction is performed on the input image through the HRNet convolutional network to obtain the first extraction map and the second extraction map at two scales;
  • in step S120, corresponding position information is embedded into the first extraction map and the second extraction map through position embedding, and each map is then converted into tokens to obtain the first feature map and the second feature map at two scales.
  • the scale corresponding to the first feature map is HW/16
  • the scale corresponding to the second feature map is HW/32, where H and W represent the height and width of the input image respectively.
  • the input data is further encoded into subdivided patches through the attention mechanism, that is, subdivided image blocks.
  • Windows Multi-Head Self-Attention (W-MSA) is performed on the patch embeddings of the different feature maps, which can effectively reduce the amount of computation; tokens from different stages of the vision transformer are combined into image-like representations at different resolutions, and a convolutional decoder gradually combines them into a full-resolution prediction.
  • multi-scale dense vision transformers avoid feature loss caused by downsampling operations after image patch embedding calculations, providing more refined and globally consistent predictions.
  • in step S200, the first feature map is input into the first inner encoder and the first outer encoder respectively, and the first internal features of the first image blocks in the first feature map and the first correlation features between the first image blocks are extracted respectively, including:
  • the loss of a small pixel plane area can be effectively avoided.
  • in step S300, the second feature map is input into the second inner encoder and the second outer encoder respectively, and the second internal features of the second image blocks in the second feature map and the second correlation features between the second image blocks are extracted respectively, including:
  • the loss of a small pixel plane area can be effectively avoided.
  • step S400 the first internal feature and the first associated feature are fused and then input to the first decoder for decoding to obtain the prediction plane parameters and prediction plane area, including:
  • the comprehensiveness of feature extraction can be effectively improved, thereby improving the accuracy of the final three-dimensional plane restoration.
  • step S500 the second internal feature and the second correlation feature are fused and then input to the second decoder for decoding to obtain the predicted non-planar area, including:
  • the comprehensiveness of feature extraction can be effectively improved, thereby improving the accuracy of the final three-dimensional plane restoration.
  • after step S500, in which the second internal feature and the second correlation feature are fused and input to the second decoder for decoding to obtain the predicted non-planar area, the method further includes:
  • the weight of the first decoder is updated according to the predicted planar area, the predicted non-planar area and the loss function.
  • the first decoder is iteratively updated through the loss function, which can effectively improve the accuracy of plane area prediction during three-dimensional plane restoration.
  • the performance of the overall network is dynamically updated, which can improve the detection accuracy and robustness under scene changes.
  • the weight of the first decoder is updated according to the predicted planar area, the predicted non-planar area and the loss function, specifically as follows:
  • the weight of the first decoder is updated according to the predicted planar area, the predicted non-planar area and the cross-entropy loss function, where the cross-entropy loss function is:
  • Y+ and Y- denote planar-area marked pixels and non-planar-area marked pixels, respectively
  • P_i denotes the probability that the i-th pixel belongs to the planar area
  • w is the ratio of planar-area pixel marks to non-planar-area pixel marks.
  • Mutual information is a measure of the degree of dependence between two random variables based on Shannon entropy, which can capture the nonlinear statistical correlation between variables.
  • the mutual information between X and Z can be understood as the reduction in uncertainty about X given Z:
  • H(X) is the Shannon entropy
  • H(X|Z) is the conditional entropy of X given Z
  • P_XZ is the joint probability distribution of the two variables, and P_X and P_Z are the respective marginal distributions
  • the mutual information is equivalent to the KL (Kullback-Leibler) divergence between P_XZ and the product of P_X and P_Z:
  • mutual information is commonly used in unsupervised representation learning networks, but it is difficult to estimate, and estimating it as a bijective function can lead to suboptimal representations that are irrelevant to downstream tasks; while a highly nonlinear evaluation framework may lead to better downstream performance, it defeats the purpose of learning effective, transferable data representations.
  • the knowledge distillation framework based on mutual information defines the mutual information as the difference between the entropy of the teacher model and the entropy of the teacher model conditioned on the student model; by maximizing the mutual information between the teacher and student networks, the student model learns the feature distribution of the teacher model.
  • the present invention enhances feature expression by maximizing the mutual information between the plane features of the two scale network branches.
  • the two network branches of different scales correspond to the first decoder and the second decoder respectively, and are used to detect the predicted planar area S_P and the predicted non-planar area S'_{N-P}; in the ideal case, the predicted planar area and the predicted non-planar area are inverses of each other:
  • the last inequality reflects the non-negativity of the KL divergence D_KL.
  • the frame diagram of the network applied by the monocular three-dimensional plane restoration method in the embodiment of the present invention is shown in Figure 7.
  • after the backbone network extracts features to obtain feature maps of size 12×16 and of size 6×8, the 12×16 map is input through POS (Position Embedding) to the first inner and outer encoders, and the 6×8 map is input through POS to the second inner and outer encoders.
  • the loss function uses the mutual information loss function.
  • the second embodiment of the present invention also provides an electronic device.
  • the electronic device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • the processor and memory may be connected via a bus or other means.
  • memory can be used to store non-transitory software programs and non-transitory computer executable programs.
  • the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • the memory may optionally include memory located remotely from the processor, and the remote memory may be connected to the processor via a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the non-transient software programs and instructions required to implement the monocular three-dimensional plane restoration method in the above-mentioned embodiment of the first aspect are stored in the memory.
  • the monocular three-dimensional plane restoration method in the above-mentioned embodiment is executed, for example, by executing the above-described method steps S100 to S600, method steps S110 to S120, method steps S210 to S230, method steps S310 to S330, method steps S410 to S420, and method steps S510 to S520.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separate, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • a third embodiment of the present invention provides a computer-readable storage medium that stores computer-executable instructions; when the computer-executable instructions are executed by a processor or a controller, for example by a processor in the above-mentioned device embodiment, they can cause the processor to perform the monocular three-dimensional plane restoration method in the above embodiment, for example, to perform the above-described method steps S100 to S600, method steps S110 to S120, method steps S210 to S230, method steps S310 to S330, method steps S410 to S420, and method steps S510 to S520.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or may Any other medium used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Abstract

A monocular three-dimensional plane recovery method, device, and storage medium. The method includes: performing multi-scale feature extraction on an input image to obtain a first feature map and a second feature map at two scales (S100); inputting the first feature map into a first inner encoder and a first outer encoder, respectively, to extract first internal features of first image blocks in the first feature map and first correlation features between the first image blocks (S200); fusing the first internal features and the first correlation features and inputting them to a first decoder for decoding, to obtain predicted plane parameters and a predicted planar area (S400); and performing three-dimensional recovery according to the plane parameters and the planar area to obtain a predicted three-dimensional plane (S600). By separately extracting the internal features of the image blocks in the corresponding feature map and the correlation features between the image blocks, and then fusing the features and inputting them to a decoder for decoding, the comprehensiveness of feature extraction can be effectively improved and the accuracy of monocular three-dimensional plane recovery can be improved.

Description

Monocular three-dimensional plane recovery method, device, and storage medium. Technical Field
The present invention relates to the field of image data processing, and in particular to a monocular three-dimensional plane recovery method, a device, and a storage medium.
Background
Three-dimensional plane recovery requires segmenting the planar areas of a scene from the image dimension while estimating the plane parameters of the corresponding areas; based on the planar areas and the plane parameters, three-dimensional plane recovery can be achieved and a predicted three-dimensional plane can be reconstructed.
In the related art, monocular three-dimensional plane recovery focuses on reconstruction accuracy and strengthens the accuracy of the model structure by analyzing the edges of planar structures and the embedding of the scene, but it lacks the ability to identify small planar areas: pixel regions that occupy a small proportion of the image are easily lost during plane detection, which degrades the accuracy of monocular three-dimensional plane recovery.
Summary of the Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the present invention provides a monocular three-dimensional plane recovery method, a device, and a storage medium, which can extract features from the interior of a feature map, effectively improving the comprehensiveness of feature extraction and thereby the accuracy of monocular three-dimensional plane recovery.
An embodiment of the first aspect of the present invention provides a monocular three-dimensional plane recovery method, including:
performing multi-scale feature extraction on an input image to obtain a first feature map and a second feature map at two scales;
inputting the first feature map into a first inner encoder and a first outer encoder, respectively, to extract first internal features of first image blocks in the first feature map and first correlation features between the first image blocks;
inputting the second feature map into a second inner encoder and a second outer encoder, respectively, to extract second internal features of second image blocks in the second feature map and second correlation features between the second image blocks;
fusing the first internal features and the first correlation features and inputting them to a first decoder for decoding, to obtain predicted plane parameters and a predicted planar area;
fusing the second internal features and the second correlation features and inputting them to a second decoder for decoding, to obtain a predicted non-planar area, where the predicted non-planar area is used to verify the predicted planar area;
performing three-dimensional recovery according to the plane parameters and the planar area to obtain a predicted three-dimensional plane.
The above embodiments of the present invention have at least the following beneficial effects: by providing an inner encoder and an outer encoder, the internal features of the image blocks in the corresponding feature map and the correlation features between the image blocks are extracted separately, and the internal features and correlation features are then fused and input to a decoder for decoding, which effectively improves the comprehensiveness of feature extraction and reduces the probability of losing image information, thereby improving the accuracy of monocular three-dimensional plane recovery; in addition, the predicted planar area can be verified against the predicted non-planar area, which further improves the robustness of monocular three-dimensional plane recovery.
According to some embodiments of the first aspect of the present invention, performing multi-scale feature extraction on the input image to obtain the first feature map and the second feature map at two scales includes:
performing multi-scale feature extraction on the input image to obtain a first extraction map and a second extraction map at two scales;
embedding corresponding position information into the first extraction map and the second extraction map, respectively, to obtain the first feature map and the second feature map at two scales.
According to some embodiments of the first aspect of the present invention, inputting the first feature map into the first inner encoder and the first outer encoder, respectively, to extract the first internal features of the first image blocks in the first feature map and the first correlation features between the first image blocks includes:
splitting the first feature map into multiple first image blocks;
inputting each first image block into the first inner encoder to extract the first internal features of each first image block;
inputting each first image block into the first outer encoder to extract the first correlation features between the first image blocks.
According to some embodiments of the first aspect of the present invention, inputting the second feature map into the second inner encoder and the second outer encoder, respectively, to extract the second internal features of the second image blocks in the second feature map and the second correlation features between the second image blocks includes:
splitting the second feature map into multiple second image blocks;
inputting each second image block into the second inner encoder to extract the second internal features of each second image block;
inputting each second image block into the second outer encoder to extract the second correlation features between the second image blocks.
According to some embodiments of the first aspect of the present invention, fusing the first internal features and the first correlation features and inputting them to the first decoder for decoding, to obtain the predicted plane parameters and the predicted planar area, includes:
element-wise adding the first internal features and the first correlation features to obtain first fused features;
inputting the first fused features into the first decoder for decoding and classification with planar areas and plane parameters as labels, to obtain the predicted plane parameters and the predicted planar area.
According to some embodiments of the first aspect of the present invention, fusing the second internal features and the second correlation features and inputting them to the second decoder for decoding, to obtain the predicted non-planar area, includes:
element-wise adding the second internal features and the second correlation features to obtain second fused features;
inputting the second fused features into the second decoder for decoding and classification with non-planar areas as labels, to obtain the predicted non-planar area.
According to some embodiments of the first aspect of the present invention, after fusing the second internal features and the second correlation features and inputting them to the second decoder for decoding to obtain the predicted non-planar area, the method further includes:
updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a loss function.
According to some embodiments of the first aspect of the present invention, updating the weights of the first decoder according to the predicted non-planar area and the loss function includes:
updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a cross-entropy loss function, where the cross-entropy loss function is:
L = -w Σ_{i∈Y+} log P_i - Σ_{i∈Y-} log(1 - P_i)
where Y+ and Y- denote planar-area marked pixels and non-planar-area marked pixels, respectively, P_i denotes the probability that the i-th pixel belongs to the planar area, 1 - P_i denotes the probability that the i-th pixel belongs to the non-planar area, and w is the ratio of planar-area pixel marks to non-planar-area pixel marks.
An embodiment of the second aspect of the present invention provides an electronic device, including:
a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the monocular three-dimensional plane recovery method of any one of the first aspect.
Since the electronic device of the second-aspect embodiment applies the monocular three-dimensional plane recovery method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
An embodiment of the third aspect of the present invention provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the monocular three-dimensional plane recovery method of any one of the first aspect.
Since the computer storage medium of the third-aspect embodiment can execute the monocular three-dimensional plane recovery method of any one of the first aspect, it has all the beneficial effects of the first aspect of the present invention.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments in conjunction with the following drawings, in which:
Figure 1 is a diagram of the main steps of the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 2 is a schematic diagram of step S100 in the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 3 is a schematic diagram of step S200 in the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 4 is a schematic diagram of step S300 in the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 5 is a schematic diagram of step S400 in the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 6 is a schematic diagram of step S500 in the monocular three-dimensional plane recovery method according to an embodiment of the present invention;
Figure 7 is a framework diagram of the network applied by the monocular three-dimensional plane recovery method according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, unless otherwise expressly limited, words such as "provide", "install", and "connect" should be understood broadly, and those skilled in the art can reasonably determine the specific meanings of these words in the present invention in light of the specific content of the technical solution. In the description of the present invention, "several" means one or more and "multiple" means two or more; "greater than", "less than", "exceeding", and the like are understood to exclude the stated number, while "above", "below", "within", and the like are understood to include the stated number. In addition, features defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, unless otherwise stated, "multiple" means two or more.
With the development of deep learning, the field of computer vision has attracted the attention of more and more researchers. Three-dimensional plane recovery and reconstruction is currently one of the mainstream research tasks in computer vision. Three-dimensional plane recovery from a single image requires segmenting the plane instance areas of the scene from the image dimension while estimating the plane parameters of each instance area; in general, non-planar areas are represented by the depth estimated by the network model. This technology has broad application prospects in fields such as virtual reality, augmented reality, and robotics.
Plane detection and recovery from a single image requires simultaneous study of image depth, plane normals, plane segmentation, and related problems. Traditional three-dimensional plane recovery and reconstruction methods based on hand-crafted features extract only the shallow texture information of the image and rely on prior conditions of planar geometry, so their generalization ability is weak. Real indoor scenes are very complex: multiple shadows produced by complex lighting and various folds and occlusions degrade the quality of plane recovery and reconstruction, making it difficult for traditional methods to cope with plane reconstruction in complex indoor scenes. Plane recovery and reconstruction is an important research direction in 3D reconstruction. Most current 3D reconstruction methods first generate point cloud data through 3D vision methods, then fit the relevant points to generate nonlinear scene surfaces, and then optimize the overall reconstructed model through global reasoning; piecewise plane recovery and reconstruction instead combines visual instance segmentation to identify the planar areas of the scene and represents a plane with three parameters in the Cartesian coordinate system and a segmentation mask, achieving better reconstruction accuracy and results. Piecewise plane recovery and reconstruction is a multi-stage reconstruction method, and the accuracy of both plane identification and parameter estimation affects the final model.
Three-dimensional plane recovery requires segmenting the planar areas of the scene from the image dimension while estimating the plane parameters of the corresponding areas; based on the planar areas and the plane parameters, three-dimensional plane recovery can be achieved and a predicted three-dimensional plane can be reconstructed.
Several plane recovery approaches exist: the end-to-end convolutional neural network architecture PlaneNet, which can infer a fixed number of plane instance masks and plane parameters from a single RGB image; learning in the depth modality with a loss directly induced by the planar structure, by predicting a fixed number of planes; improving the two-stage Mask R-CNN method by replacing object category classification with plane geometry prediction and then refining the plane segmentation mask with a convolutional neural network; predicting per-pixel plane parameters with an associative embedding method, training the network to map each pixel into an embedding space and then clustering the embedded pixels into plane instances; a plane refinement method constrained by the Manhattan-world assumption, which strengthens the refinement of plane parameters by restricting the geometric relationships between plane instances; a divide-and-conquer method that segments panorama planes along the horizontal and vertical directions and, given the difference in pixel distribution between panoramas and ordinary images, can effectively recover distorted plane instances; and the Transformer-based method PlaneTR, which effectively improves the efficiency of plane detection by adding plane-instance center and edge features.
In the related art, monocular three-dimensional plane recovery focuses on reconstruction accuracy and strengthens the accuracy of the model structure by analyzing the edges of planar structures and the embedding of the scene, but it lacks the ability to identify small planar areas: pixel regions that occupy a small proportion of the image are easily lost during plane detection, which degrades the accuracy of monocular three-dimensional plane recovery.
On this basis, to obtain better results with fewer computing resources, the encoder part of the Transformer module is applied to image block sequences for the image classification task, achieving better results than state-of-the-art convolutional networks while using fewer computing resources. The object detection problem can be framed as a sequence-to-sequence prediction problem that predicts, directly from learned object queries, a set of objects interacting with a sequence of context features; this yields a new, simple object detection paradigm built on the standard Transformer encoder-decoder architecture, which dispenses with many hand-designed components such as anchor generation and non-maximum suppression. To address the suboptimal representation learning caused by convolutional networks' limited ability to learn from low-level feature tensors, semantic segmentation is redefined as a sequence-to-sequence prediction task, and an encoder based purely on the self-attention mechanism is proposed, which eliminates the reliance on convolution operations and solves the problem of limited receptive fields.
A monocular three-dimensional plane recovery method, device, and storage medium according to the present invention are described below with reference to Figures 1 to 7; they can extract features from the interior of a feature map, effectively improving the comprehensiveness of feature extraction and thereby the accuracy of monocular three-dimensional plane recovery.
Referring to Figure 1, a monocular three-dimensional plane recovery method according to an embodiment of the first aspect of the present invention includes at least the following steps:
S100: performing multi-scale feature extraction on an input image to obtain a first feature map and a second feature map at two scales;
S200: inputting the first feature map into a first inner encoder and a first outer encoder, respectively, to extract first internal features of first image blocks in the first feature map and first correlation features between the first image blocks;
S300: inputting the second feature map into a second inner encoder and a second outer encoder, respectively, to extract second internal features of second image blocks in the second feature map and second correlation features between the second image blocks;
S400: fusing the first internal features and the first correlation features and inputting them to a first decoder for decoding, to obtain predicted plane parameters and a predicted planar area;
S500: fusing the second internal features and the second correlation features and inputting them to a second decoder for decoding, to obtain a predicted non-planar area, where the predicted non-planar area is used to verify the predicted planar area;
S600: performing three-dimensional recovery according to the plane parameters and the planar area to obtain a predicted three-dimensional plane.
Performing multi-scale feature extraction on the input image improves the comprehensiveness of the information obtained. By providing an inner encoder and an outer encoder, the internal features of the image blocks in the corresponding feature map and the correlation features between the image blocks are extracted separately, and the internal features and correlation features are then fused and input to a decoder for decoding, which effectively improves the comprehensiveness of feature extraction and reduces the probability of losing image information, thereby improving the accuracy of monocular three-dimensional plane recovery; in addition, the predicted planar area can be verified against the predicted non-planar area, which further improves the robustness of monocular three-dimensional plane recovery.
It can be understood that, referring to Figure 2, step S100, performing multi-scale feature extraction on the input image to obtain the first feature map and the second feature map at two scales, includes:
S110: performing multi-scale feature extraction on the input image to obtain a first extraction map and a second extraction map at two scales;
S120: embedding corresponding position information into the first extraction map and the second extraction map, respectively, to obtain the first feature map and the second feature map at two scales.
Specifically, in step S110, multi-scale feature extraction is performed on the input image through an HRNet convolutional network to obtain the first extraction map and the second extraction map at two scales;
In step S120, corresponding position information is embedded into the first extraction map and the second extraction map through position embedding, and each map is converted into tokens to obtain the first feature map and the second feature map at two scales.
It should be noted that the scale corresponding to the first feature map is HW/16, and the scale corresponding to the second feature map is HW/32, where H and W denote the height and width of the input image, respectively.
To capture more detail, the input data is further encoded into subdivided patches, i.e., subdivided image blocks, through the attention mechanism. By dividing the feature map into multiple disjoint regions and performing Windows Multi-Head Self-Attention (W-MSA) on the patch embeddings of the different feature maps, the amount of computation is effectively reduced; tokens from different stages of the vision transformer are combined into image-like representations at different resolutions, and a convolutional decoder gradually combines them into a full-resolution prediction. Compared with fully convolutional networks, the multi-scale dense vision transformer avoids the feature loss caused by downsampling after the patch-embedding computation, providing finer and more globally consistent predictions.
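To make step S120 and the two scales concrete, the following PyTorch sketch projects a backbone's stride-16 and stride-32 feature maps into token sequences and adds learnable position embeddings (the POS step). It is a minimal sketch, not the patented implementation: the 1×1 projections, the 192-dimensional embedding, and the 192×256 input size (which yields the 12×16 and 6×8 grids mentioned below) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoScaleTokenizer(nn.Module):
    """Minimal sketch of the two-scale tokenization (step S120): project the
    backbone's stride-16 and stride-32 feature maps to an embedding width and
    add learnable position embeddings ("POS"). The 1x1 projections, dim=192,
    and the 192x256 input size are illustrative assumptions."""

    def __init__(self, in_ch=256, dim=192, h=192, w=256):
        super().__init__()
        self.proj16 = nn.Conv2d(in_ch, dim, kernel_size=1)
        self.proj32 = nn.Conv2d(in_ch, dim, kernel_size=1)
        n16 = (h // 16) * (w // 16)  # HW/16 scale -> 12x16 = 192 tokens here
        n32 = (h // 32) * (w // 32)  # HW/32 scale -> 6x8 = 48 tokens here
        self.pos16 = nn.Parameter(torch.zeros(1, n16, dim))
        self.pos32 = nn.Parameter(torch.zeros(1, n32, dim))

    def forward(self, f16, f32):
        # f16: (B, C, H/16, W/16); f32: (B, C, H/32, W/32)
        t16 = self.proj16(f16).flatten(2).transpose(1, 2) + self.pos16
        t32 = self.proj32(f32).flatten(2).transpose(1, 2) + self.pos32
        return t16, t32  # token sequences for the first / second feature maps
```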
It can be understood that, referring to Figure 3, step S200, inputting the first feature map into the first inner encoder and the first outer encoder, respectively, to extract the first internal features of the first image blocks in the first feature map and the first correlation features between the first image blocks, includes:
S210: splitting the first feature map into multiple first image blocks;
S220: inputting each first image block into the first inner encoder to extract the first internal features of each first image block;
S230: inputting each first image block into the first outer encoder to extract the first correlation features between the first image blocks, where the first correlation features characterize the relationships between the image blocks.
By splitting the first feature map into multiple first image blocks, the loss of pixel plane areas that occupy a small proportion of the image can be effectively avoided.
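One plausible reading of the inner/outer encoder pair is a Transformer-in-Transformer arrangement: one encoder attends within each image block to produce the internal features, the other attends across block tokens to produce the correlation features. The sketch below follows that reading; the layer depth, head count, and the linear pooling that summarizes each block are illustrative choices not fixed by the text.

```python
import torch
import torch.nn as nn

class InnerOuterEncoder(nn.Module):
    """Sketch of one inner/outer encoder pair: the inner encoder attends
    within each image block (internal features), the outer encoder attends
    across blocks (correlation features). Depth, head count, and the linear
    pooling that summarizes each block are illustrative assumptions."""

    def __init__(self, dim=192, pixels_per_block=16, heads=4, depth=2):
        super().__init__()
        inner = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        outer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.inner = nn.TransformerEncoder(inner, depth)  # within-block attention
        self.outer = nn.TransformerEncoder(outer, depth)  # between-block attention
        self.pool = nn.Linear(pixels_per_block * dim, dim)

    def forward(self, blocks, tokens):
        # blocks: (B, N, P, D) pixel-level embeddings inside each of N blocks
        # tokens: (B, N, D) one embedding per block
        b, n, p, d = blocks.shape
        inner_feat = self.inner(blocks.reshape(b * n, p, d))
        internal = self.pool(inner_feat.reshape(b, n, p * d))  # internal features
        correlation = self.outer(tokens)                       # correlation features
        return internal, correlation
```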
It can be understood that, referring to Figure 4, step S300, inputting the second feature map into the second inner encoder and the second outer encoder, respectively, to extract the second internal features of the second image blocks in the second feature map and the second correlation features between the second image blocks, includes:
S310: splitting the second feature map into multiple second image blocks;
S320: inputting each second image block into the second inner encoder to extract the second internal features of each second image block;
S330: inputting each second image block into the second outer encoder to extract the second correlation features between the second image blocks, where the second correlation features characterize the relationships between the image blocks.
By splitting the second feature map into multiple second image blocks, the loss of pixel plane areas that occupy a small proportion of the image can be effectively avoided.
It can be understood that, referring to Figure 5, step S400, fusing the first internal features and the first correlation features and inputting them to the first decoder for decoding, to obtain the predicted plane parameters and the predicted planar area, includes:
S410: element-wise adding the first internal features and the first correlation features to obtain first fused features;
S420: inputting the first fused features into the first decoder for decoding and classification with planar areas and plane parameters as labels, to obtain the predicted plane parameters and the predicted planar area.
Fusing the first internal features and the first correlation features effectively improves the comprehensiveness of feature extraction, thereby improving the accuracy of the final three-dimensional plane recovery.
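A minimal sketch of steps S410 and S420 under the element-wise-addition fusion described above: the fused tokens pass through a small decoder and two heads, one classifying the planar area and one regressing the three Cartesian plane parameters. The decoder internals are not specified in the text, so the modules here are assumptions.

```python
import torch
import torch.nn as nn

class PlaneDecoderHead(nn.Module):
    """Sketch of the first decoder's input side: fuse internal and correlation
    features by element-wise addition (S410), then decode and classify with
    planar-area and plane-parameter labels (S420). The single Transformer
    layer and the two linear heads are assumptions; the text does not fix
    the decoder internals."""

    def __init__(self, dim=192, num_classes=2):
        super().__init__()
        self.decode = nn.TransformerEncoderLayer(dim, 4, dim * 4, batch_first=True)
        self.mask_head = nn.Linear(dim, num_classes)  # planar-area classification
        self.param_head = nn.Linear(dim, 3)           # three Cartesian plane parameters

    def forward(self, internal, correlation):
        fused = internal + correlation  # element-wise feature fusion
        x = self.decode(fused)
        return self.mask_head(x), self.param_head(x)
```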
It can be understood that, referring to Figure 6, step S500, fusing the second internal features and the second correlation features and inputting them to the second decoder for decoding, to obtain the predicted non-planar area, includes:
S510: element-wise adding the second internal features and the second correlation features to obtain second fused features;
S520: inputting the second fused features into the second decoder for decoding and classification with non-planar areas as labels, to obtain the predicted non-planar area.
Fusing the second internal features and the second correlation features effectively improves the comprehensiveness of feature extraction, thereby improving the accuracy of the final three-dimensional plane recovery.
When the scene changes, the detection accuracy of three-dimensional plane recovery in the related art is clearly insufficient and the robustness is low.
On this basis, to improve the detection accuracy and robustness under scene changes, it can be understood that, after step S500 fuses the second internal features and the second correlation features and inputs them to the second decoder for decoding to obtain the predicted non-planar area, the method further includes:
updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a loss function.
In the process of three-dimensional plane recovery, iteratively updating the first decoder through the loss function can effectively improve the accuracy of planar-area prediction; the performance of the overall network is dynamically updated, improving the detection accuracy and robustness under scene changes.
It can be understood that updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and the loss function is, specifically:
updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a cross-entropy loss function, where the cross-entropy loss function is:
L = -w Σ_{i∈Y+} log P_i - Σ_{i∈Y-} log(1 - P_i)
where Y+ and Y- denote planar-area marked pixels and non-planar-area marked pixels, respectively, P_i denotes the probability that the i-th pixel belongs to the planar area, 1 - P_i denotes the probability that the i-th pixel belongs to the non-planar area, and w is the ratio of planar-area pixel marks to non-planar-area pixel marks. The first decoder and the second decoder differ in their scales and in the definition of the positive and negative labels (the positive and negative labels being planar-area labels and non-planar-area labels); the planar area is ultimately optimized through variational information.
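Under our reading of the reconstructed formula above, the loss can be computed as in the sketch below; the direction of the ratio w (planar over non-planar pixel counts) follows the text's wording and is flagged as an assumption in the code.

```python
import torch

def balanced_plane_bce(p, planar_mask, eps=1e-6):
    """Sketch of the reconstructed cross-entropy loss above. p holds the
    predicted probability that each pixel is planar; planar_mask is 1 on Y+
    (planar marks) and 0 on Y- (non-planar marks). Following the text's
    wording, w is the ratio of planar to non-planar pixel marks; the
    direction of this ratio is our assumption."""
    pos = planar_mask.bool()
    w = pos.sum().float() / (~pos).sum().clamp(min=1).float()
    loss_pos = -torch.log(p[pos] + eps).sum()        # planar-area term, weighted by w
    loss_neg = -torch.log(1 - p[~pos] + eps).sum()   # non-planar-area term
    return w * loss_pos + loss_neg
```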
Mutual information is a measure, based on Shannon entropy, of the degree of dependence between two random variables, and it can capture nonlinear statistical correlations between variables. The mutual information between X and Z can be understood as the reduction in uncertainty about X given Z:
I(X; Z) = H(X) - H(X|Z)
where H(X) is the Shannon entropy, H(X|Z) is the conditional entropy of X given Z, P_XZ is the joint probability distribution of the two variables, and P_X and P_Z are the respective marginal distributions. Meanwhile, the mutual information is equivalent to the KL (Kullback-Leibler) divergence between P_XZ and the product of P_X and P_Z:
I(X; Z) = D_KL(P_XZ ‖ P_X ⊗ P_Z)
The greater the divergence between the joint probability P_XZ and the product of the marginals P_X ⊗ P_Z, the stronger the dependence between X and Z; accordingly, for two completely independent variables the mutual information is zero. Mutual information is commonly used in unsupervised representation learning networks, but it is difficult to estimate, and estimating it as a bijective function may lead to suboptimal representations that are irrelevant to downstream tasks; a highly nonlinear evaluation framework may bring better downstream performance, but it defeats the purpose of learning effective, transferable data representations. A knowledge distillation framework based on mutual information defines the mutual information as the difference between the entropy of the teacher model and the entropy of the teacher model conditioned on the student model; by maximizing the mutual information between the teacher and student networks, the student model learns the feature distribution of the teacher model.
On this basis, the present invention enhances feature expression by maximizing the mutual information between the plane features of the two scale network branches. In the PlaneMT network framework, the two network branches of different scales correspond to the first decoder and the second decoder, respectively, and are used to detect the predicted planar area S_P and the predicted non-planar area S'_{N-P}; in the ideal case, the predicted planar area and the predicted non-planar area are inverses (complements) of each other:
S'_P := ¬ S'_{N-P}
Thus, the predicted planar-area variables S_P and S'_P output by the two network branches serve as the variational information measure for information maximization:
max I(S_P; S'_P)
Since mutual information is difficult to compute, a variational lower bound is introduced for each mutual-information term I(X; Z), using a variational Gaussian q(x|z) to approximate p(x|z):
I(X; Z) = H(X) + E_{p(x,z)}[log p(x|z)] = H(X) + E_{p(x,z)}[log q(x|z)] + E_{p(z)}[D_KL(p(x|z) ‖ q(x|z))] ≥ H(X) + E_{p(x,z)}[log q(x|z)]
The last inequality reflects the non-negativity of the KL divergence D_KL.
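A sketch of how this bound is typically optimized in practice: q(x|z) is modeled as a Gaussian whose mean is predicted from z, and maximizing E[log q(x|z)] tightens the bound because H(X) is constant with respect to q. The mean network and the shared diagonal variance below are assumptions; the text only fixes that q is a variational Gaussian.

```python
import torch
import torch.nn as nn

class GaussianVariationalMI(nn.Module):
    """Sketch of training with the variational bound above: q(x|z) is a
    Gaussian whose mean is predicted from z, and maximizing E[log q(x|z)]
    tightens the bound because H(X) does not depend on q. The two-layer mean
    network and the shared diagonal log-variance are assumptions; x and z are
    assumed already aligned to a common shape."""

    def __init__(self, dim=192):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.log_var = nn.Parameter(torch.zeros(dim))

    def forward(self, x, z):
        # x, z: (..., D) plane features from the two scale branches
        mu = self.mu(z)
        # Negative Gaussian log-likelihood (constant term dropped); minimizing
        # it maximizes E[log q(x|z)], i.e. the mutual-information lower bound.
        nll = 0.5 * (((x - mu) ** 2) / self.log_var.exp() + self.log_var)
        return nll.mean()
```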
A framework diagram of the network applied by the monocular three-dimensional plane recovery method of the embodiment of the present invention is shown in Figure 7: after the backbone network extracts features to obtain feature maps of size 12×16 and of size 6×8, the 12×16 map is input through POS (Position Embedding) into the first inner and outer encoders, and the 6×8 map is input through POS into the second inner and outer encoders; the loss function is the mutual-information loss function.
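Putting the pieces together, a hedged wiring of the Figure 7 framework might look as follows, reusing the illustrative classes sketched earlier in this description; none of this is the authors' released code.

```python
import torch
import torch.nn as nn

# Hedged wiring of the Figure 7 framework, reusing the illustrative classes
# sketched earlier in this description (they must be defined in scope). This
# is not the authors' released code; the pixel-level block tensors are assumed
# to come from the patch splits of steps S210/S310.
class PlaneMTSketch(nn.Module):
    def __init__(self, dim=192):
        super().__init__()
        self.tokenizer = TwoScaleTokenizer(dim=dim)  # POS embedding at two scales
        self.enc1 = InnerOuterEncoder(dim=dim)       # branch for the 12x16 map
        self.enc2 = InnerOuterEncoder(dim=dim)       # branch for the 6x8 map
        self.dec1 = PlaneDecoderHead(dim=dim)        # planar area + plane parameters
        self.dec2 = PlaneDecoderHead(dim=dim)        # non-planar area
        self.mi_loss = GaussianVariationalMI(dim=dim)  # mutual-information loss

    def forward(self, f16, f32, blocks16, blocks32):
        t16, t32 = self.tokenizer(f16, f32)
        in1, corr1 = self.enc1(blocks16, t16)
        in2, corr2 = self.enc2(blocks32, t32)
        mask1, params = self.dec1(in1, corr1)  # predicted planar area / parameters
        mask2, _ = self.dec2(in2, corr2)       # predicted non-planar area
        return mask1, params, mask2
```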
In addition, an embodiment of the second aspect of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor.
The processor and the memory may be connected via a bus or by other means.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random-access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the monocular three-dimensional plane recovery method of the embodiment of the first aspect are stored in the memory and, when executed by the processor, perform the monocular three-dimensional plane recovery method of the above embodiment, for example, performing the above-described method steps S100 to S600, S110 to S120, S210 to S230, S310 to S330, S410 to S420, and S510 to S520.
The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the third aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by a processor in the above device embodiment, cause the processor to perform the monocular three-dimensional plane recovery method of the above embodiment, for example, performing the above-described method steps S100 to S600, S110 to S120, S210 to S230, S310 to S330, S410 to S420, and S510 to S520.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (10)

  1. A monocular three-dimensional plane recovery method, characterized by comprising:
    performing multi-scale feature extraction on an input image to obtain a first feature map and a second feature map at two scales;
    inputting the first feature map into a first inner encoder and a first outer encoder, respectively, to extract first internal features of first image blocks in the first feature map and first correlation features between the first image blocks;
    inputting the second feature map into a second inner encoder and a second outer encoder, respectively, to extract second internal features of second image blocks in the second feature map and second correlation features between the second image blocks;
    fusing the first internal features and the first correlation features and inputting them to a first decoder for decoding, to obtain predicted plane parameters and a predicted planar area;
    fusing the second internal features and the second correlation features and inputting them to a second decoder for decoding, to obtain a predicted non-planar area, wherein the predicted non-planar area is used to verify the predicted planar area;
    performing three-dimensional recovery according to the plane parameters and the planar area to obtain a predicted three-dimensional plane.
  2. The monocular three-dimensional plane recovery method according to claim 1, characterized in that performing multi-scale feature extraction on the input image to obtain the first feature map and the second feature map at two scales comprises:
    performing multi-scale feature extraction on the input image to obtain a first extraction map and a second extraction map at two scales;
    embedding corresponding position information into the first extraction map and the second extraction map, respectively, to obtain the first feature map and the second feature map at two scales.
  3. The monocular three-dimensional plane recovery method according to claim 1, characterized in that inputting the first feature map into the first inner encoder and the first outer encoder, respectively, to extract the first internal features of the first image blocks in the first feature map and the first correlation features between the first image blocks comprises:
    splitting the first feature map into multiple first image blocks;
    inputting each first image block into the first inner encoder to extract the first internal features of each first image block;
    inputting each first image block into the first outer encoder to extract the first correlation features between the first image blocks.
  4. The monocular three-dimensional plane recovery method according to claim 1, characterized in that inputting the second feature map into the second inner encoder and the second outer encoder, respectively, to extract the second internal features of the second image blocks in the second feature map and the second correlation features between the second image blocks comprises:
    splitting the second feature map into multiple second image blocks;
    inputting each second image block into the second inner encoder to extract the second internal features of each second image block;
    inputting each second image block into the second outer encoder to extract the second correlation features between the second image blocks.
  5. The monocular three-dimensional plane recovery method according to claim 1, characterized in that fusing the first internal features and the first correlation features and inputting them to the first decoder for decoding, to obtain the predicted plane parameters and the predicted planar area, comprises:
    element-wise adding the first internal features and the first correlation features to obtain first fused features;
    inputting the first fused features into the first decoder for decoding and classification with planar areas and plane parameters as labels, to obtain the predicted plane parameters and the predicted planar area.
  6. The monocular three-dimensional plane recovery method according to claim 1, characterized in that fusing the second internal features and the second correlation features and inputting them to the second decoder for decoding, to obtain the predicted non-planar area, comprises:
    element-wise adding the second internal features and the second correlation features to obtain second fused features;
    inputting the second fused features into the second decoder for decoding and classification with non-planar areas as labels, to obtain the predicted non-planar area.
  7. The monocular three-dimensional plane recovery method according to claim 1, characterized in that, after fusing the second internal features and the second correlation features and inputting them to the second decoder for decoding to obtain the predicted non-planar area, the method further comprises:
    updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a loss function.
  8. The monocular three-dimensional plane recovery method according to claim 7, characterized in that updating the weights of the first decoder according to the predicted non-planar area and the loss function comprises:
    updating the weights of the first decoder according to the predicted planar area, the predicted non-planar area, and a cross-entropy loss function, wherein the cross-entropy loss function is:
    L = -w Σ_{i∈Y+} log P_i - Σ_{i∈Y-} log(1 - P_i)
    where Y+ and Y- denote planar-area marked pixels and non-planar-area marked pixels, respectively, P_i denotes the probability that the i-th pixel belongs to the planar area, 1 - P_i denotes the probability that the i-th pixel belongs to the non-planar area, and w is the ratio of planar-area pixel marks to non-planar-area pixel marks.
  9. An electronic device, characterized by comprising:
    a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the monocular three-dimensional plane recovery method according to any one of claims 1 to 8.
  10. A computer storage medium, characterized by storing computer-executable instructions, the computer-executable instructions being used to execute the monocular three-dimensional plane recovery method according to any one of claims 1 to 8.
PCT/CN2022/110039 2022-06-28 2022-08-03 Monocular three-dimensional plane recovery method, device, and storage medium WO2024000728A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210739676.7A CN115115691A (zh) 2022-06-28 2022-06-28 Monocular three-dimensional plane recovery method, device, and storage medium
CN202210739676.7 2022-06-28

Publications (1)

Publication Number Publication Date
WO2024000728A1 true WO2024000728A1 (zh) 2024-01-04

Family

ID=83330200

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110039 WO2024000728A1 (zh) 2022-06-28 2022-08-03 Monocular three-dimensional plane recovery method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115115691A (zh)
WO (1) WO2024000728A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150805A1 (en) * 2019-11-14 2021-05-20 Qualcomm Incorporated Layout estimation using planes
CN111414923A (zh) * 2020-03-05 2020-07-14 南昌航空大学 Indoor scene three-dimensional reconstruction method and system based on a single RGB image
CN112001960A (zh) * 2020-08-25 2020-11-27 中国人民解放军91550部队 Monocular image depth estimation method based on a multi-scale residual pyramid attention network model
CN112990299A (zh) * 2021-03-11 2021-06-18 五邑大学 Depth map acquisition method based on multi-scale features, electronic device, and storage medium
CN113850900A (zh) * 2021-05-27 2021-12-28 北京大学 Method and system for recovering depth maps from image and geometric cues in three-dimensional reconstruction
CN113610912A (zh) * 2021-08-13 2021-11-05 中国矿业大学 Monocular depth estimation system and method for low-resolution images in three-dimensional scene reconstruction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG, XIN ET AL.: "Monocular Depth Estimation Based on Multi-Scale Depth Map Fusion", IEEE ACCESS, vol. 9, 28 April 2021 (2021-04-28), pages 67696 - 67705, XP011854229, ISSN: 2169-3536, DOI: 10.1109/ACCESS.2021.3076346 *

Also Published As

Publication number Publication date
CN115115691A (zh) 2022-09-27

Similar Documents

Publication Publication Date Title
CN108764048B (zh) Face keypoint detection method and device
CN109145759B (zh) Vehicle attribute recognition method, apparatus, server and storage medium
US8620026B2 (en) Video-based detection of multiple object types under varying poses
Yin et al. FD-SSD: An improved SSD object detection algorithm based on feature fusion and dilated convolution
Matzen et al. Nyc3dcars: A dataset of 3d vehicles in geographic context
US10726599B2 (en) Realistic augmentation of images and videos with graphics
Ma et al. A real-time crack detection algorithm for pavement based on CNN with multiple feature layers
WO2023082784A1 (zh) Person re-identification method and device based on local feature attention
CN110827312B (zh) Learning method based on a collaborative visual attention neural network
CN113076871A (zh) Automatic fish school detection method based on target occlusion compensation
CN113609896A (zh) Object-level remote sensing change detection method and system based on dual correlation attention
CN109522807B (zh) Satellite image recognition system and method based on self-generated features, and electronic device
CN112418216A (zh) Text detection method for complex natural scene images
WO2023212997A1 (zh) Knowledge-distillation-based neural network training method, device, and storage medium
WO2022218396A1 (zh) Image processing method and apparatus, and computer-readable storage medium
CN116453121B (zh) Training method and apparatus for a lane line recognition model
CN111739144A (zh) Simultaneous localization and mapping method and apparatus based on deep feature optical flow
CN110675421A (zh) Depth image co-segmentation method based on a small number of annotated bounding boxes
CN113850136A (zh) Vehicle orientation recognition method and system based on YOLOv5 and BCNN
CN111104941B (zh) Image orientation correction method, apparatus, and electronic device
CN114519819B (zh) Remote sensing image object detection method based on global context awareness
CN104463962A (zh) Three-dimensional scene reconstruction method based on GPS-information video
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
CN113570615A (zh) Deep-learning-based image processing method, electronic device, and storage medium
CN117253044A (zh) Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22948822

Country of ref document: EP

Kind code of ref document: A1