WO2022100379A1 - Object attitude estimation method and system based on image and three-dimensional model, and medium


Info

Publication number
WO2022100379A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
view
features
dimensional model
Application number
PCT/CN2021/124660
Other languages
French (fr)
Chinese (zh)
Inventor
张健驰 (Zhang Jianchi)
贾奎 (Jia Kui)
陈轲 (Chen Ke)
Original Assignee
华南理工大学 (South China University of Technology)
Application filed by 华南理工大学 (South China University of Technology)
Publication of WO2022100379A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • The present invention relates to the technical field of intelligent information processing, and in particular to an object pose estimation method, system and medium based on an image and a three-dimensional model.
  • Object pose estimation technology estimates the category, three-dimensional displacement and three-dimensional orientation of a target object in a scene. This technology can greatly enhance scene understanding for VR systems, vehicles and robots, and is important for applications such as augmented reality, autonomous driving and robotic manipulation. Object pose estimation can therefore be regarded as an important breakthrough in the transition of manufacturing from traditional to intelligent production.
  • The purpose of the present invention is to provide an object pose estimation method, system and medium based on images and three-dimensional models.
  • An object pose estimation method based on images and three-dimensional models comprises the following steps:
  • The object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The object pose estimation method also includes a step of constructing the object pose estimation model, specifically:
  • The multi-view feature extraction network is used to extract and save the features of each image in the multi-view image dataset, forming the multi-view image feature database of the three-dimensional model;
  • The view feature mapping network is trained with the multi-view image feature database and the training set image feature database to obtain the object pose estimation model.
  • The standard format is one of a point cloud format, a voxel format or a mesh format.
  • The multi-view image dataset includes X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y viewing angles;
  • The multi-view image feature database includes the X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
  • The multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained through multiple rounds of iterative training with a gradient optimization algorithm.
  • The multi-view feature extraction network is composed of 14 cascaded layers of different types of deep convolutional neural network;
  • the 1st layer is the input layer;
  • the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers;
  • the 4th, 7th, 10th and 13th layers are pooling layers;
  • the 14th layer is the output layer;
  • the output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.
  • The view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and each per-view image feature among the three-dimensional model multi-view image features.
  • The view feature mapping network uses a cross-entropy function as its loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
  • An object pose estimation system based on images and three-dimensional models includes:
  • a data acquisition module, used to acquire image data of the target object;
  • a pose estimation module, used to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;
  • the object pose estimation model being a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The present invention does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not need to retrain the entire network when the target object is changed, thereby improving the generalization, accuracy and recognition speed of object pose estimation technology.
  • FIG. 1 is a flowchart of the steps of a method for estimating an object pose based on a picture and a three-dimensional model in an embodiment of the present invention;
  • FIG. 2 is a structural diagram of a multi-view feature extraction network in an embodiment of the present invention.
  • FIG. 3 is a structural diagram of a view feature mapping network in an embodiment of the present invention.
  • In the description of the invention, orientation descriptions, such as the orientations or positional relationships indicated by up, down, front, rear, left and right, are based on the orientations or positional relationships shown in the drawings.
  • These orientation descriptions are used only to facilitate and simplify the description of the present invention; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
  • "Several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are described, they serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.
  • This embodiment provides an object pose estimation method based on pictures and three-dimensional models, including the following steps:
  • 3D model acquisition and model rendering: obtain the three-dimensional model of the target object in a standard format, such as a point cloud, voxel or mesh format, and render the three-dimensional model of each object from multiple viewing angles to obtain the two-dimensional image corresponding to each object's three-dimensional model under the different viewing angles, forming the multi-view image dataset of the three-dimensional model.
  • The multi-view feature extraction network is composed of 14 cascaded layers of different types: the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the last layer is the output layer. The output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of its input feature map. At test time, the output 32×32 feature map is stretched into a 1024-dimensional vector, i.e. the image feature.
  • The feature-image reconstruction network is composed of 14 cascaded layers of different types: the 1st layer is the image feature input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are deconvolution layers whose input and output dimensions are the same; the 4th, 7th, 10th and 13th layers are deconvolution layers whose output feature width and height are respectively twice the width and twice the height of the input feature; and the last layer is the output layer.
  • The training objective function is -L(F) + ||I_k - f_β(F_k)||², where -L(F) denotes the feature difference loss and ||I_k - f_β(F_k)||² denotes the image reconstruction loss.
  • Training uses the Adam optimization method as its strategy, with the learning rate parameter initialized to 0.01 and the momentum initialized to 0.95.
  • In the testing stage, the parameters of the multi-view feature extraction network are no longer changed, and the feature-image reconstruction network no longer needs to be cascaded.
  • Feature database construction: after the multi-view feature extraction network has been trained, it has the ability to map images to features, so the features of each image in the multi-view image dataset of the three-dimensional model can be computed and saved as a feature template library, together with the features of each image in the training set. Specifically, each image is resized to 512×512, subjected to per-sample mean subtraction and data normalization, and fed into the multi-view feature extraction network to compute a 32×32 feature map, which is then stretched into a 1024-dimensional feature, i.e. the desired feature.
  • The three-dimensional model multi-view image feature database consists of X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
  • The three-dimensional model multi-view image feature database can be used as a template: it suffices to be able to find, from the image features, the most similar viewing angle in the database; taking that viewing angle as the pose of the target object then completes the object pose estimation.
  • The task of view feature mapping network training is to train a convolutional neural network that takes as input the three-dimensional model multi-view image feature database and the features, extracted by the multi-view feature extraction network, of the image whose object pose is to be estimated, and outputs the pose of the target object contained in the image.
  • The view feature mapping network uses the three-dimensional model multi-view image feature database, with the cross-entropy function as the loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
  • The specific training steps are: 1) initialize the network parameters with the Xavier initialization method; 2) input the three-dimensional model multi-view image feature database and the image features, and compute the similarity between the image features and the image features under each viewing angle in the database; 3) compute the cross-entropy loss of the network from the ground-truth image feature similarities; 4) back-propagate with the Adam-based gradient optimization method to update the view feature mapping network parameters; 5) switch to another training set image, and repeat steps 2-4 until the cross-entropy loss falls below a certain threshold.
  • The advantage of the view feature mapping network is that replacing the three-dimensional model multi-view image feature database or replacing the image features does not affect the accuracy of the view feature mapping network.
  • Pose estimation can be performed as follows: 1) obtain the three-dimensional model of the object and render it from multiple viewing angles with the aforementioned three-dimensional model rendering method; 2) extract features with the trained multi-view feature extraction network to construct the multi-view image feature database of the three-dimensional model; 3) perform feature extraction on the image captured by the camera with the trained multi-view feature extraction network to obtain the features of the image; 4) with the trained view feature mapping network, take the multi-view image feature database of the three-dimensional model and the image features to be estimated as input, and output the viewing angle corresponding to the most similar feature in the database; 5) take the viewing angle computed in the previous step as the pose of the target object.
  • This embodiment also provides an object pose estimation system based on images and three-dimensional models, including:
  • a data acquisition module, used to acquire image data of the target object;
  • a pose estimation module, used to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;
  • the object pose estimation model being a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The object pose estimation system based on images and three-dimensional models of this embodiment can execute the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
  • Embodiments of the present application further disclose a computer program product or computer program; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • A processor of a computer device can read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method shown in FIG. 1.
  • This embodiment also provides a storage medium storing instructions or a program for executing the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
  • The functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams.
  • Two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved.
  • The embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein; alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of larger operations are performed independently.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • a "computer-readable medium” can be any device that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or apparatus.
  • computer readable media include the following: electrical connections with one or more wiring (electronic devices), portable computer disk cartridges (magnetic devices), random access memory (RAM), Read Only Memory (ROM), Erasable Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer readable medium may even be paper or other suitable medium on which the program may be printed, as the paper or other medium may be optically scanned, for example, followed by editing, interpretation, or other suitable medium as necessary process to obtain the program electronically and then store it in computer memory.
  • Various parts of the present invention may be implemented in hardware, software, firmware or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, implementation may use any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An object pose estimation method and system based on an image and a three-dimensional model, and a medium. The method comprises the following steps: acquiring image data of a target object; performing feature extraction on the image data; mapping the extracted features with an object pose estimation model; and acquiring the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object. The object pose estimation model is a convolutional neural network that maps the image features of the target object to the three-dimensional model multi-view feature with the highest similarity. The method does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not require retraining the entire network when the target object is changed, so the generalizability, precision and recognition speed of object pose estimation are improved. The method can be widely applied in the technical field of intelligent information processing.

Description

Object pose estimation method, system and medium based on image and three-dimensional model

Technical Field

The present invention relates to the technical field of intelligent information processing, and in particular to an object pose estimation method, system and medium based on an image and a three-dimensional model.

Background Art

Object pose estimation technology estimates the category, three-dimensional displacement and three-dimensional orientation of a target object in a scene. This technology can greatly enhance scene understanding for VR systems, vehicles and robots, and is important for applications such as augmented reality, autonomous driving and robotic manipulation. Object pose estimation can therefore be regarded as an important breakthrough in the transition of manufacturing from traditional to intelligent production.

Although the emergence of deep learning has brought great progress to the field of object pose estimation, current mainstream deep networks either use only some key points of the target object's three-dimensional model to assist estimation, or exploit the three-dimensional model only indirectly through the loss function via backpropagation; the various kinds of information contained in the three-dimensional model are not fully utilized. This leads to several problems in existing methods: replacing the target model requires retraining the entire network, severe occlusion cannot be handled well, training results deviate substantially from test results, and so on. Judging from the development of object pose estimation methods in recent years, existing methods still lack a good solution for improving the generalization, accuracy and recognition speed of object pose estimation technology.

Summary of the Invention

In order to solve, at least to a certain extent, one of the technical problems existing in the prior art, the purpose of the present invention is to provide an object pose estimation method, system and medium based on images and three-dimensional models.

The technical solution adopted by the present invention is:
An object pose estimation method based on images and three-dimensional models, comprising the following steps:

acquiring image data of a target object;

performing feature extraction on the image data, mapping the extracted features with an object pose estimation model, and taking the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
Further, the object pose estimation method also includes a step of constructing the object pose estimation model, specifically:

obtaining the three-dimensional model of the target object in a standard format;

rendering the three-dimensional model from multiple viewing angles to obtain the two-dimensional images corresponding to the three-dimensional model under the different viewing angles, forming a multi-view image dataset of the three-dimensional model;

obtaining a training set and using the training set to train a multi-view feature extraction network;

using the multi-view feature extraction network to extract and save the features of each image in the multi-view image dataset, forming the multi-view image feature database of the three-dimensional model;

using the multi-view feature extraction network to extract and save the features of each image in the training set, forming the training set image feature database;

training the view feature mapping network with the multi-view image feature database and the training set image feature database to obtain the object pose estimation model.
Further, the standard format is one of a point cloud format, a voxel format or a mesh format.

Further, the multi-view image dataset includes X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y viewing angles;

the multi-view image feature database includes the X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle of the three-dimensional model, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.

Further, the multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained through multiple rounds of iterative training with a gradient optimization algorithm.
Further, the multi-view feature extraction network is composed of 14 cascaded layers of different types of deep convolutional neural network;

the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the 14th layer is the output layer;

the output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.

Further, the view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and each per-view image feature among the three-dimensional model multi-view image features.

Further, the view feature mapping network uses a cross-entropy function as its loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
Another technical solution adopted by the present invention is:

An object pose estimation system based on images and three-dimensional models, comprising:

a data acquisition module, used to acquire image data of the target object;

a pose estimation module, used to perform feature extraction on the image data, map the extracted features with the object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.

Another technical solution adopted by the present invention is:

A storage medium storing processor-executable instructions which, when executed by a processor, are used to execute the above object pose estimation method based on images and three-dimensional models.

The beneficial effects of the present invention are: the present invention does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not need to retrain the entire network when the target object is changed, thereby improving the generalization, accuracy and recognition speed of object pose estimation technology.
Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings of the embodiments of the present invention or of the related prior art are introduced below. It should be understood that the drawings introduced below are only intended to conveniently and clearly describe some embodiments of the technical solutions of the present invention; those skilled in the art can obtain other drawings from these drawings without creative work.

FIG. 1 is a flowchart of the steps of an object pose estimation method based on a picture and a three-dimensional model in an embodiment of the present invention;

FIG. 2 is a structural diagram of the multi-view feature extraction network in an embodiment of the present invention;

FIG. 3 is a structural diagram of the view feature mapping network in an embodiment of the present invention.
Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present invention, and should not be construed as limiting the present invention. The step numbers in the following embodiments are set only for convenience of description; they impose no limitation on the order of the steps, and the execution order of the steps in the embodiments can be adaptively adjusted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that orientation descriptions, such as the orientations or positional relationships indicated by up, down, front, rear, left and right, are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are described, they serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present invention, unless otherwise expressly defined, words such as "arranged", "installed" and "connected" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of these words in the present invention in light of the specific content of the technical solution.
As shown in FIG. 1, this embodiment provides an object pose estimation method based on pictures and three-dimensional models, including the following steps:
S1. 3D model acquisition and model rendering: obtain the three-dimensional model of the target object in a standard format, such as a point cloud, voxel or mesh format, and render the three-dimensional model of each object from multiple viewing angles to obtain the two-dimensional image corresponding to each object's three-dimensional model under the different viewing angles, forming the multi-view image dataset of the three-dimensional model.
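As an illustration of this rendering step, the following sketch places a virtual camera at several viewing angles around a mesh and renders one image per angle. This is only one plausible realization: the use of the trimesh and pyrender libraries, the orbit radius, the number of views and the camera parameters are assumptions, not details fixed by this embodiment.

```python
import numpy as np
import trimesh
import pyrender

def render_multiview(mesh_path, n_views=24, size=512):
    """Render a mesh from n_views azimuth angles; returns one RGB image per view."""
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path, force='mesh'))
    scene = pyrender.Scene()
    scene.add(mesh)
    camera = pyrender.PerspectiveCamera(yfov=np.pi / 3.0)
    light = pyrender.DirectionalLight(intensity=3.0)
    renderer = pyrender.OffscreenRenderer(size, size)
    images = []
    for k in range(n_views):
        a = 2 * np.pi * k / n_views
        # Orbit pose: camera at radius 2, looking at the origin.
        pose = np.array([
            [ np.cos(a), 0.0, np.sin(a), 2.0 * np.sin(a)],
            [ 0.0,       1.0, 0.0,       0.0            ],
            [-np.sin(a), 0.0, np.cos(a), 2.0 * np.cos(a)],
            [ 0.0,       0.0, 0.0,       1.0            ],
        ])
        cam_node = scene.add(camera, pose=pose)
        light_node = scene.add(light, pose=pose)
        color, _ = renderer.render(scene)
        images.append(color)
        scene.remove_node(cam_node)
        scene.remove_node(light_node)
    renderer.delete()
    return images
```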
S2. Multi-view feature extraction network training: in order to make the image features extracted by the multi-view feature extraction network better suited to the object pose estimation task, during training the multi-view feature extraction network is cascaded with a feature-image reconstruction network and the two are trained together for parameter optimization. The cascaded overall network is expressed as Î_k = f_β(f_θ(I_k)), where Î_k denotes the reconstructed image, F_k = f_θ(I_k) is the mapping from image I_k to feature F_k, θ denotes the parameters of the multi-view feature extraction network, f_β denotes the feature-image reconstruction network, and β denotes the parameters of the feature-image reconstruction network.
The multi-view feature extraction network is composed of 14 cascaded layers of different types: the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the last layer is the output layer. The output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of its input feature map. At test time, the output 32×32 feature map is stretched into a 1024-dimensional vector, i.e. the image feature.
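A minimal sketch of one plausible PyTorch realization of this 14-layer extractor follows. The text fixes only the layer types, the dimension-preserving convolutions and the four halving poolings (512 to 32 after four halvings); the 3x3 kernels, the channel width and the single-channel output map are assumptions made so that the flattened output is exactly 1024-dimensional.

```python
import torch
import torch.nn as nn

class MultiViewFeatureExtractor(nn.Module):
    """14 layers: input, 8 dimension-preserving convs, 4 halving pools, output."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 3                            # layer 1: RGB input
        for _ in range(4):                               # conv, conv, pool, four times
            layers += [
                nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                         # halves width and height
            ]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.out = nn.Conv2d(channels, 1, 1)             # layer 14: 1-channel output map

    def forward(self, x):                                # x: (B, 3, 512, 512)
        fmap = self.out(self.body(x))                    # (B, 1, 32, 32)
        return fmap.flatten(1)                           # (B, 1024) image feature
```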
The feature-image reconstruction network is composed of 14 cascaded layers of different types: the 1st layer is the image feature input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are deconvolution layers whose input and output dimensions are the same; the 4th, 7th, 10th and 13th layers are deconvolution layers whose output feature width and height are respectively twice the width and twice the height of the input feature; and the last layer is the output layer.
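Correspondingly, a sketch of the feature-image reconstruction network, mirroring the extractor above under the same assumptions: stride-1 transposed convolutions keep dimensions, and stride-2 transposed convolutions double the width and height.

```python
import torch.nn as nn

class FeatureImageReconstructor(nn.Module):
    """14 layers: feature input, 8 dimension-preserving deconvs, 4 doubling deconvs, output."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 1                            # layer 1: 1x32x32 feature input
        for _ in range(4):
            layers += [
                nn.ConvTranspose2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(channels, channels, 2, stride=2),  # doubles W and H
            ]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.out = nn.ConvTranspose2d(channels, 3, 3, padding=1)      # layer 14: RGB image

    def forward(self, f):                                # f: (B, 1024) feature vector
        x = f.view(-1, 1, 32, 32)
        return self.out(self.body(x))                    # (B, 3, 512, 512)
```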
In order to maximize the feature difference and minimize the reconstruction error, the training objective function is -L(F) + ||I_k - f_β(F_k)||², where -L(F) denotes the feature difference loss and ||I_k - f_β(F_k)||² denotes the image reconstruction loss. Training uses the Adam optimization method as its strategy, with the learning rate parameter initialized to 0.01 and the momentum initialized to 0.95.
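One training step under these settings might look as follows. The concrete form of the feature difference loss L(F) is not specified in the text, so the batch feature variance is used here purely as a stand-in; mapping the stated momentum 0.95 to Adam's beta1 is likewise an interpretation.

```python
import torch

extractor = MultiViewFeatureExtractor()
reconstructor = FeatureImageReconstructor()
params = list(extractor.parameters()) + list(reconstructor.parameters())
optimizer = torch.optim.Adam(params, lr=0.01, betas=(0.95, 0.999))

def train_step(images):                       # images: (B, 3, 512, 512)
    feats = extractor(images)                 # F_k = f_theta(I_k), shape (B, 1024)
    recon = reconstructor(feats)              # f_beta(F_k), shape (B, 3, 512, 512)
    # -L(F): encourage the features of different views to differ
    # (stand-in: negative feature variance across the batch).
    feature_diff_loss = -feats.var(dim=0).mean()
    recon_loss = ((images - recon) ** 2).mean()   # ||I_k - f_beta(F_k)||^2
    loss = feature_diff_loss + recon_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```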
In the testing or practical stage, the parameters of the multi-view feature extraction network no longer change, and the feature-image reconstruction network no longer needs to be cascaded.

S3. Feature database construction: after the multi-view feature extraction network has been trained, it has the ability to map images to features, so the features of each image in the multi-view image dataset of the three-dimensional model can be computed and saved as a feature template library, together with the features of each image in the training set. Specifically, each image is resized to 512×512, subjected to per-sample mean subtraction and data normalization, and fed into the multi-view feature extraction network to compute a 32×32 feature map, which is then stretched into a 1024-dimensional feature, i.e. the desired feature.

The result of the database construction step is the three-dimensional model multi-view image feature database, consisting of X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
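A sketch of this database construction step follows; the exact per-sample mean subtraction and normalization scheme is not pinned down in the text, so the simplest whole-image version is used here as an assumption.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess(image):
    """Resize to 512x512, per-sample mean subtraction, data normalization."""
    x = torch.as_tensor(np.asarray(image), dtype=torch.float32)
    x = x.permute(2, 0, 1).unsqueeze(0)                   # (1, 3, H, W)
    x = F.interpolate(x, size=(512, 512), mode='bilinear', align_corners=False)
    x = x - x.mean()                                      # per-sample mean subtraction
    return x / (x.std() + 1e-8)                           # normalization

@torch.no_grad()
def build_feature_database(model_views, extractor):
    """model_views: {model_id: [(view, image), ...]} -> {model_id: [(view, feature), ...]}."""
    database = {}
    for model_id, views in model_views.items():
        database[model_id] = [(view, extractor(preprocess(img)).squeeze(0))  # (1024,)
                              for view, img in views]
    return database
```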
S4. View feature mapping network training.

After the required three-dimensional model multi-view image feature database has been extracted with the multi-view feature extraction network, the database can be used as a template: it suffices to be able to find, from the image features, the most similar viewing angle in the database; taking that viewing angle as the pose of the target object then completes the object pose estimation.

The task of view feature mapping network training is therefore to train a convolutional neural network that takes as input the three-dimensional model multi-view image feature database and the features, extracted by the multi-view feature extraction network, of the image whose object pose is to be estimated, and outputs the pose of the target object contained in the image.

The view feature mapping network uses the three-dimensional model multi-view image feature database, with the cross-entropy function as the loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.

The specific training steps are: 1) initialize the network parameters with the Xavier initialization method; 2) input the three-dimensional model multi-view image feature database and the image features, and compute the similarity between the image features and the image features under each viewing angle in the database; 3) compute the cross-entropy loss of the network from the ground-truth image feature similarities; 4) back-propagate with the Adam-based gradient optimization method to update the view feature mapping network parameters; 5) switch to another training set image, and repeat steps 2-4 until the cross-entropy loss falls below a certain threshold. A sketch of these steps is given below.
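These steps could be realized as in the following sketch. The interface mapping_net(templates, query), returning one similarity logit per viewing angle, anticipates the network sketched after steps C1-C6 further down; the loss threshold and the Adam settings reused from above are assumptions.

```python
import torch
import torch.nn as nn

def xavier_init(module):
    if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)            # step 1: Xavier initialization
        nn.init.zeros_(module.bias)

def train_mapping_network(mapping_net, loader, templates, threshold=0.05):
    """templates: (Y, 1024) view features; loader yields (query_feature, true_view_index)."""
    mapping_net.apply(xavier_init)                        # step 1
    optimizer = torch.optim.Adam(mapping_net.parameters(), lr=0.01, betas=(0.95, 0.999))
    criterion = nn.CrossEntropyLoss()
    loss_val = float('inf')
    while loss_val > threshold:                           # step 5: repeat until below threshold
        for query, true_view in loader:                   # steps 2-4, one training image at a time
            logits = mapping_net(templates, query)        # step 2: similarity to each view
            loss = criterion(logits.unsqueeze(0),         # step 3: cross-entropy loss
                             true_view.view(1))
            optimizer.zero_grad()
            loss.backward()                               # step 4: backpropagation with Adam
            optimizer.step()
            loss_val = loss.item()
    return mapping_net
```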
The advantage of the view feature mapping network is that replacing the three-dimensional model multi-view image feature database or replacing the image features does not affect the accuracy of the view feature mapping network.

S5. Use the trained networks for object pose estimation.

After the preceding steps, the multi-view feature extraction network and the view feature mapping network have both been trained; all network parameters are fixed and can be used for object pose estimation.

Specifically, for an object whose pose needs to be estimated, pose estimation can be performed as follows: 1) obtain the three-dimensional model of the object and render it from multiple viewing angles with the aforementioned three-dimensional model rendering method; 2) extract features with the trained multi-view feature extraction network to construct the multi-view image feature database of this three-dimensional model; 3) perform feature extraction on the image captured by the camera with the trained multi-view feature extraction network to obtain the features of that image; 4) with the trained view feature mapping network, take the multi-view image feature database of the three-dimensional model and the image features to be estimated as input, and output the viewing angle corresponding to the feature in the multi-view image feature database most similar to the image features; 5) take the viewing angle computed in the previous step as the pose of the target object. A sketch of this procedure follows.
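Putting the pieces together, the full estimation procedure for a new object might read as follows; this reuses the hypothetical helpers sketched above and is not a verbatim implementation of the embodiment.

```python
import torch

@torch.no_grad()
def estimate_pose(mesh_path, camera_image, extractor, mapping_net, n_views=24):
    # 1) render the object's 3D model from multiple viewing angles
    rendered = render_multiview(mesh_path, n_views=n_views)
    # 2) build the multi-view image feature database (the template)
    templates = torch.stack([extractor(preprocess(img)).squeeze(0) for img in rendered])
    # 3) extract the features of the camera image with the same network
    query = extractor(preprocess(camera_image)).squeeze(0)
    # 4) map: similarity between the query feature and every view feature
    logits = mapping_net(templates, query)
    # 5) the viewing angle of the most similar feature is the estimated pose
    return int(logits.argmax())                # index of the best viewing angle
```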
The structure of the multi-view feature extraction network in the above embodiment is shown in FIG. 2. It mainly works as follows.

In the training phase, steps A1-A5 are performed:

A1. Read the image rendered from the three-dimensional model at a certain viewing angle into memory;

A2. Use the convolutional layers and pooling layers to perform feature convolution and pooling on the image;

A3. Use the deconvolution layers to deconvolve the view features;

A4. Reconstruct the image from the features obtained by deconvolution;

A5. Compute the loss from the reconstructed image and update the network parameters.

In the testing phase, steps B1-B2 are performed:

B1. Read the image rendered from the three-dimensional model at a certain viewing angle, or the image captured by the camera, into memory;

B2. Use the convolutional layers and pooling layers to perform feature convolution and pooling on the image, and stretch the output feature into a feature vector.
The structure of the view feature mapping network in the above embodiment is shown in FIG. 3. Its main working steps are:

C1. Read the multi-view feature database corresponding to the three-dimensional model into memory;

C2. Read the features of the RGB image extracted by the multi-view feature extraction network into memory;

C3. Fuse the two obtained kinds of features by feature concatenation;

C4. Perform convolution on the fused features;

C5. From the results of the convolution, compute the similarity between the RGB image feature and each image feature in the multi-view feature database corresponding to the three-dimensional model;

C6. Find the viewing angle corresponding to the feature in the multi-view feature database with the highest similarity to the RGB image feature, and take that viewing angle as the pose of the target object.
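A minimal sketch of a network following steps C1-C6 is given below. FIG. 3 leaves the layer details open, so the 1-D convolution over the concatenated feature pair and the channel counts are assumptions; only the overall flow (concatenate, convolve, score, argmax) follows the steps above.

```python
import torch
import torch.nn as nn

class ViewFeatureMappingNetwork(nn.Module):
    """Scores a query image feature against every template view feature (C3-C5)."""
    def __init__(self, dim=1024):
        super().__init__()
        self.conv = nn.Sequential(                        # C4: convolution on fused features
            nn.Conv1d(2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(32, 1, 3, padding=1),
        )
        self.score = nn.Linear(dim, 1)                    # C5: reduce to one similarity value

    def forward(self, templates, query):
        # templates: (Y, dim) view feature database (C1); query: (dim,) image feature (C2)
        y = templates.shape[0]
        pair = torch.stack([templates, query.expand(y, -1)], dim=1)   # C3: (Y, 2, dim)
        fused = self.conv(pair).squeeze(1)                # (Y, dim)
        return self.score(fused).squeeze(-1)              # (Y,) similarities; C6 takes argmax
```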
This embodiment also provides an object pose estimation system based on images and three-dimensional models, including:

a data acquisition module, used to acquire image data of the target object;

a pose estimation module, used to perform feature extraction on the image data, map the extracted features with the object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.

The object pose estimation system based on images and three-dimensional models of this embodiment can execute the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
Embodiments of the present application further disclose a computer program product or computer program; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method shown in FIG. 1.

This embodiment also provides a storage medium storing instructions or a program capable of executing the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
In some alternative implementations, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein; alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It can also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the modules will be within the routine skill of the engineer. Accordingly, those skilled in the art can, using ordinary skill, implement the invention as set forth in the claims without undue experimentation. It can also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the present invention, which is determined by the appended claims and their full scope of equivalents.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device. For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, an instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). Moreover, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically by, for example, optically scanning the paper or other medium, then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then storing it in a computer memory.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination thereof, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
In the above description of this specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.
The above is a detailed description of the preferred embodiments of the present invention, but the present invention is not limited to the above embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims (10)

  1. A method for estimating the pose of an object based on an image and a three-dimensional model, comprising the following steps:
    acquiring image data of a target object;
    performing feature extraction on the image data, mapping the extracted features with an object pose estimation model, and taking the view corresponding to the feature with the highest similarity as the estimated pose of the target object;
    wherein the object pose estimation model is a convolutional neural network for mapping the image features of the target object to the most similar multi-view features of the three-dimensional model.
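Read as a procedure, claim 1 reduces to a nearest-view lookup in feature space. The sketch below assumes PyTorch and invents all names (`extractor`, `mapper`, `view_bank`, `view_poses`); the dot-product similarity is likewise an assumption, since the claim only requires per-view similarity scores.

```python
import torch

@torch.no_grad()
def estimate_pose(image, extractor, mapper, view_bank, view_poses):
    """Sketch of claim 1: map an image feature onto the most similar
    pre-computed multi-view feature and return that view's pose.

    Assumed shapes: `view_bank` is (Y, 1024) features for Y rendered
    views; `view_poses` is a length-Y list of the corresponding poses.
    """
    feat = extractor(image.unsqueeze(0))    # (1, 1024) image feature
    scores = mapper(feat) @ view_bank.t()   # (1, Y) per-view similarity
    best = scores.argmax(dim=1).item()      # index of the most similar view
    return view_poses[best]                 # estimated object pose
```

Here `view_bank` and `view_poses` would come from the offline construction step described in claim 2.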
  2. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 1, wherein the method further comprises a step of constructing the object pose estimation model, specifically:
    obtaining a three-dimensional model of the target object in a standard format;
    rendering the three-dimensional model under multiple views to obtain two-dimensional images of the three-dimensional model under the different views, which constitute a multi-view image dataset of the three-dimensional model;
    obtaining a training set, and training a multi-view feature extraction network with the training set;
    extracting and saving the features of every image in the multi-view image dataset with the multi-view feature extraction network, to form a multi-view image feature database of the three-dimensional model;
    extracting and saving the features of every image in the training set with the multi-view feature extraction network, to form a training-set image feature database;
    training a view feature mapping network with the multi-view image feature database and the training-set image feature database, to obtain the object pose estimation model.
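Condensed into Python, the construction step of claim 2 might read as below. Every callable passed in (`render_view`, `train_extractor`, `train_mapper`) is a hypothetical stand-in; the claim does not fix a renderer, an optimizer, or a particular training procedure.

```python
import torch

def build_pose_model(mesh, views, train_images, train_labels,
                     render_view, train_extractor, train_mapper):
    """Sketch of claim 2's offline pipeline (all injected callables
    are placeholders for components the claim leaves unspecified)."""
    # 1. Render the 3D model from every view to get 2D images.
    rendered = [render_view(mesh, v) for v in views]
    # 2. Train the multi-view feature extraction network on the training set.
    extractor = train_extractor(train_images, train_labels)
    extractor.eval()
    with torch.no_grad():
        # 3. Multi-view image feature database: one feature per rendered view.
        view_bank = torch.stack([extractor(img.unsqueeze(0)).squeeze(0)
                                 for img in rendered])
        # 4. Training-set image feature database.
        train_feats = torch.stack([extractor(img.unsqueeze(0)).squeeze(0)
                                   for img in train_images])
    # 5. Train the view feature mapping network on the two databases.
    mapper = train_mapper(train_feats, view_bank, train_labels)
    return extractor, mapper, view_bank
```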
  3. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the standard format is one of a point cloud format, a voxel format, or a mesh format.
  4. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the multi-view image dataset comprises X three-dimensional models {M1, M2, M3, ..., MX}, the Y views {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y views;
    the multi-view image feature database comprises the X three-dimensional models {M1, M2, M3, ..., MX}, the Y views {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, via the multi-view feature extraction network, from the two-dimensional image of the three-dimensional model under each view, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
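The two databases of claim 4 share an X-models-by-Y-views layout; a small container type makes the shape concrete. The field and method names below are invented for illustration; only the X-by-Y structure and the 1024-dimensional feature vectors come from the claim.

```python
from dataclasses import dataclass
import torch

@dataclass
class MultiViewFeatureDB:
    """One record per 3D model: Y views and one 1024-d feature per view."""
    model_ids: list[str]     # X model identifiers {M1 ... MX}
    views: torch.Tensor      # (Y, pose_dim) view parameters {V1 ... VY}
    features: torch.Tensor   # (X, Y, 1024) view features {F1 ... FY}

    def feature(self, model_idx: int, view_idx: int) -> torch.Tensor:
        return self.features[model_idx, view_idx]  # a single 1024-d vector
```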
  5. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained by multiple rounds of iterative training with a gradient-based optimization algorithm.
  6. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 5, wherein the multi-view feature extraction network is a cascade of 14 deep convolutional neural network layers of different types;
    wherein layer 1 is the input layer; layers 2, 3, 5, 6, 8, 9, 11, and 12 are convolutional layers; layers 4, 7, 10, and 13 are pooling layers; and layer 14 is the output layer;
    the output dimensions of every convolutional layer equal its input dimensions, and the width and height of the feature map output by every pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.
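A PyTorch rendering of this cascade might look as follows; it is a sketch, not the patented network. The channel widths, the 3x3 kernels with padding 1 (chosen so each convolutional layer's output dimensions equal its input dimensions, as the claim requires), the 2x2 max pooling (halving width and height), and the 1024-dimensional output chosen to match claim 4 are all assumptions; the claim fixes only the layer types, their order, and the dimension constraints.

```python
import torch
import torch.nn as nn

class MultiViewFeatureNet(nn.Module):
    """Sketch of the 14-layer cascade of claim 6 (widths are assumed)."""

    def __init__(self, in_channels=3, width=64, feat_dim=1024):
        super().__init__()

        def conv(c_in, c_out):
            # 3x3 conv with padding 1: output spatial dims == input dims
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                 nn.ReLU(inplace=True))

        self.body = nn.Sequential(
            # layer 1 (input layer) is the image tensor itself
            conv(in_channels, width), conv(width, width),            # layers 2-3
            nn.MaxPool2d(2),                                         # layer 4: W/2, H/2
            conv(width, 2 * width), conv(2 * width, 2 * width),      # layers 5-6
            nn.MaxPool2d(2),                                         # layer 7
            conv(2 * width, 4 * width), conv(4 * width, 4 * width),  # layers 8-9
            nn.MaxPool2d(2),                                         # layer 10
            conv(4 * width, 8 * width), conv(8 * width, 8 * width),  # layers 11-12
            nn.MaxPool2d(2),                                         # layer 13
        )
        # layer 14 (output layer): pool down to a single 1024-d feature vector
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(8 * width, feat_dim))

    def forward(self, x):
        return self.head(self.body(x))
```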
  7. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and the image features of each view among the multi-view image features of the three-dimensional model.
  8. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 7, wherein the view feature mapping network uses a cross-entropy function as its loss function, and its parameters are optimized with minimization of the mapping error as the optimization objective.
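A minimal sketch of this training objective, assuming the mapping error is scored as cross-entropy over per-view similarities and that similarity is a dot product (the claim specifies the loss function but not the similarity computation):

```python
import torch.nn.functional as F

def mapping_loss(mapper, img_feats, view_feats, gt_view):
    """Cross-entropy mapping loss per claim 8 (shapes are assumptions).

    img_feats:  (B, 1024) training-set image features
    view_feats: (Y, 1024) multi-view image feature database
    gt_view:    (B,) index of each image's ground-truth view
    """
    mapped = mapper(img_feats)               # (B, 1024) mapped features
    logits = mapped @ view_feats.t()         # (B, Y) per-view similarity
    return F.cross_entropy(logits, gt_view)  # minimized during training
```

Minimizing this loss pushes each mapped image feature toward the feature of its true view, which is the same "highest similarity" criterion applied at inference in claim 1.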
  9. A system for estimating the pose of an object based on an image and a three-dimensional model, comprising:
    a data acquisition module, configured to acquire image data of a target object;
    a pose estimation module, configured to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the view corresponding to the feature with the highest similarity as the estimated pose of the target object;
    wherein the object pose estimation model is a convolutional neural network for mapping the image features of the target object to the most similar multi-view features of the three-dimensional model.
  10. A storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, is used to perform the method for estimating the pose of an object based on an image and a three-dimensional model according to any one of claims 1-8.
PCT/CN2021/124660 2020-11-16 2021-10-19 Object attitude estimation method and system based on image and three-dimensional model, and medium WO2022100379A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011278095.5A CN112381879A (en) 2020-11-16 2020-11-16 Object posture estimation method, system and medium based on image and three-dimensional model
CN202011278095.5 2020-11-16

Publications (1)

Publication Number Publication Date
WO2022100379A1

Family

ID=74584723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124660 WO2022100379A1 (en) 2020-11-16 2021-10-19 Object attitude estimation method and system based on image and three-dimensional model, and medium

Country Status (2)

Country Link
CN (1) CN112381879A (en)
WO (1) WO2022100379A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821145A (en) * 2022-06-28 2022-07-29 山东百盟信息技术有限公司 Incomplete multi-view image data clustering method based on data restoration
CN115219492A (en) * 2022-05-25 2022-10-21 中国科学院自动化研究所 Appearance image acquisition method and device for three-dimensional object
CN116168137A (en) * 2023-04-21 2023-05-26 湖南马栏山视频先进技术研究院有限公司 New view angle synthesis method, device and memory based on nerve radiation field
CN116643648A (en) * 2023-04-13 2023-08-25 中国兵器装备集团自动化研究所有限公司 Three-dimensional scene matching interaction method, device, equipment and storage medium
CN116822341A (en) * 2023-06-12 2023-09-29 华中科技大学 Defect prediction method and system based on three-dimensional casting model feature extraction
CN117315152A (en) * 2023-09-27 2023-12-29 杭州一隅千象科技有限公司 Binocular stereoscopic imaging method and binocular stereoscopic imaging system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381879A (en) * 2020-11-16 2021-02-19 华南理工大学 Object posture estimation method, system and medium based on image and three-dimensional model
CN113409290B (en) * 2021-06-29 2023-12-15 北京兆维电子(集团)有限责任公司 Method and device for detecting appearance defects of liquid crystal display, and storage medium
CN113643366B (en) * 2021-07-12 2024-03-05 中国科学院自动化研究所 Multi-view three-dimensional object attitude estimation method and device
CN115115780B (en) * 2022-06-29 2024-07-09 聚好看科技股份有限公司 Three-dimensional reconstruction method and system based on multi-view RGBD camera
CN115223023B (en) * 2022-09-16 2022-12-20 杭州得闻天下数字文化科技有限公司 Human body contour estimation method and device based on stereoscopic vision and deep neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
US20200184668A1 (en) * 2018-12-05 2020-06-11 Qualcomm Incorporated Systems and methods for three-dimensional pose determination
CN112381879A (en) * 2020-11-16 2021-02-19 华南理工大学 Object posture estimation method, system and medium based on image and three-dimensional model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017156243A1 (en) * 2016-03-11 2017-09-14 Siemens Aktiengesellschaft Deep-learning based feature mining for 2.5d sensing image search
CN109063301B (en) * 2018-07-24 2023-06-16 杭州师范大学 Single image indoor object attitude estimation method based on thermodynamic diagram
CN109934847B (en) * 2019-03-06 2020-05-22 视辰信息科技(上海)有限公司 Method and device for estimating posture of weak texture three-dimensional object

Also Published As

Publication number Publication date
CN112381879A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
WO2022100379A1 (en) Object attitude estimation method and system based on image and three-dimensional model, and medium
JP6807471B2 (en) Semantic segmentation model training methods and equipment, electronics, and storage media
CN111079532A (en) Video content description method based on text self-encoder
CN110633628B (en) RGB image scene three-dimensional model reconstruction method based on artificial neural network
CN107229757A (en) The video retrieval method encoded based on deep learning and Hash
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN110309835B (en) Image local feature extraction method and device
CN112862949A (en) Object 3D shape reconstruction method based on multiple views
CN115147599A (en) Object six-degree-of-freedom pose estimation method for multi-geometric feature learning of occlusion and truncation scenes
Stekovic et al. General 3d room layout from a single view by render-and-compare
Chang et al. Candidate-based matching of 3-D point clouds with axially switching pose estimation
Wu et al. Sc-wls: Towards interpretable feed-forward camera re-localization
CN111241326B (en) Image visual relationship indication positioning method based on attention pyramid graph network
CN116662600A (en) Visual positioning method based on lightweight structured line map
Kim et al. Self-supervised keypoint detection based on multi-layer random forest regressor
CN110633706B (en) Semantic segmentation method based on pyramid network
CN113592015B (en) Method and device for positioning and training feature matching network
Basak et al. Monocular depth estimation using encoder-decoder architecture and transfer learning from single RGB image
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
Huang et al. A stereo matching algorithm based on the improved PSMNet
WO2024060839A1 (en) Object operation method and apparatus, computer device, and computer storage medium
CN117173445A (en) Hypergraph convolution network and contrast learning multi-view three-dimensional object classification method
Phalak et al. DeepPerimeter: Indoor boundary estimation from posed monocular sequences
CN115423927A (en) ViT-based multi-view 3D reconstruction method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21890906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21890906

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.09.2023)