WO2022100379A1 - Object attitude estimation method and system based on image and three-dimensional model, and medium


Info

Publication number
WO2022100379A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
view
features
dimensional model
Application number
PCT/CN2021/124660
Other languages
French (fr)
Chinese (zh)
Inventor
张健驰 (Zhang Jianchi)
贾奎 (Jia Kui)
陈轲 (Chen Ke)
Original Assignee
华南理工大学 (South China University of Technology)
Application filed by 华南理工大学 (South China University of Technology)
Publication of WO2022100379A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • The present invention relates to the technical field of intelligent information processing, and in particular to an object pose estimation method, system and medium based on an image and a three-dimensional model.
  • Object pose estimation technology estimates the category, three-dimensional displacement and three-dimensional orientation of a target object in a scene. This technology can greatly enhance scene understanding for VR systems, vehicles and robots, and is important for applications such as augmented reality, autonomous driving and robotic manipulation. Object pose estimation can therefore be regarded as an important breakthrough in the transition of manufacturing from traditional to intelligent production.
  • The purpose of the present invention is to provide an object pose estimation method, system and medium based on images and three-dimensional models.
  • An object pose estimation method based on images and three-dimensional models comprises the following steps:
  • The object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The object pose estimation method also includes a step of constructing the object pose estimation model, specifically:
  • The multi-view feature extraction network is used to extract and save the features of each image in the multi-view image dataset, forming the multi-view image feature database of the three-dimensional model;
  • The view feature mapping network is trained with the multi-view image feature database and the training set image feature database to obtain the object pose estimation model.
  • The standard format is one of a point cloud format, a voxel format or a mesh format.
  • The multi-view image dataset includes X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y viewing angles;
  • The multi-view image feature database includes the X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
  • The multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained through multiple rounds of iterative training with a gradient optimization algorithm.
  • The multi-view feature extraction network is composed of 14 cascaded layers of different types of deep convolutional neural network;
  • the 1st layer is the input layer;
  • the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers;
  • the 4th, 7th, 10th and 13th layers are pooling layers;
  • the 14th layer is the output layer;
  • the output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.
  • The view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and each per-view image feature among the three-dimensional model multi-view image features.
  • The view feature mapping network uses a cross-entropy function as its loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
  • An object pose estimation system based on images and three-dimensional models includes:
  • a data acquisition module, used to acquire image data of the target object;
  • a pose estimation module, used to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;
  • the object pose estimation model being a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The present invention does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not need to retrain the entire network when the target object is changed, thereby improving the generalization, accuracy and recognition speed of object pose estimation technology.
  • FIG. 1 is a flowchart of the steps of a method for estimating an object pose based on a picture and a three-dimensional model in an embodiment of the present invention;
  • FIG. 2 is a structural diagram of a multi-view feature extraction network in an embodiment of the present invention.
  • FIG. 3 is a structural diagram of a view feature mapping network in an embodiment of the present invention.
  • In the description of the invention, orientation descriptions, such as the orientations or positional relationships indicated by up, down, front, rear, left and right, are based on the orientations or positional relationships shown in the drawings.
  • These orientation descriptions are used only to facilitate and simplify the description of the present invention; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
  • "Several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are described, they serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.
  • This embodiment provides an object pose estimation method based on pictures and three-dimensional models, including the following steps:
  • 3D model acquisition and model rendering: obtain the three-dimensional model of the target object in a standard format, such as a point cloud, voxel or mesh format, and render the three-dimensional model of each object from multiple viewing angles to obtain the two-dimensional image corresponding to each object's three-dimensional model under the different viewing angles, forming the multi-view image dataset of the three-dimensional model.
  • The multi-view feature extraction network is composed of 14 cascaded layers of different types: the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the last layer is the output layer. The output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of its input feature map. At test time, the output 32×32 feature map is stretched into a 1024-dimensional vector, i.e. the image feature.
  • The feature-image reconstruction network is composed of 14 cascaded layers of different types: the 1st layer is the image feature input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are deconvolution layers whose input and output dimensions are the same; the 4th, 7th, 10th and 13th layers are deconvolution layers whose output feature width and height are respectively twice the width and twice the height of the input feature; and the last layer is the output layer.
  • The training objective function is -L(F) + ||I_k - f_β(F_k)||², where -L(F) denotes the feature difference loss and ||I_k - f_β(F_k)||² denotes the image reconstruction loss.
  • Training uses the Adam optimization method as its strategy, with the learning rate parameter initialized to 0.01 and the momentum initialized to 0.95.
  • In the testing stage, the parameters of the multi-view feature extraction network are no longer changed, and the feature-image reconstruction network no longer needs to be cascaded.
  • Feature database construction: after the multi-view feature extraction network has been trained, it has the ability to map images to features, so the features of each image in the multi-view image dataset of the three-dimensional model can be computed and saved as a feature template library, together with the features of each image in the training set. Specifically, each image is resized to 512×512, subjected to per-sample mean subtraction and data normalization, and fed into the multi-view feature extraction network to compute a 32×32 feature map, which is then stretched into a 1024-dimensional feature, i.e. the desired feature.
  • The three-dimensional model multi-view image feature database consists of X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
  • The three-dimensional model multi-view image feature database can be used as a template: it suffices to be able to find, from the image features, the most similar viewing angle in the database; taking that viewing angle as the pose of the target object then completes the object pose estimation.
  • The task of view feature mapping network training is to train a convolutional neural network that takes as input the three-dimensional model multi-view image feature database and the features, extracted by the multi-view feature extraction network, of the image whose object pose is to be estimated, and outputs the pose of the target object contained in the image.
  • The view feature mapping network uses the three-dimensional model multi-view image feature database, with the cross-entropy function as the loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
  • The specific training steps are: 1) initialize the network parameters with the Xavier initialization method; 2) input the three-dimensional model multi-view image feature database and the image features, and compute the similarity between the image features and the image features under each viewing angle in the database; 3) compute the cross-entropy loss of the network from the ground-truth image feature similarities; 4) back-propagate with the Adam-based gradient optimization method to update the view feature mapping network parameters; 5) switch to another training set image, and repeat steps 2-4 until the cross-entropy loss falls below a certain threshold.
  • The advantage of the view feature mapping network is that replacing the three-dimensional model multi-view image feature database or replacing the image features does not affect the accuracy of the view feature mapping network.
  • Pose estimation can be performed as follows: 1) obtain the three-dimensional model of the object and render it from multiple viewing angles with the aforementioned three-dimensional model rendering method; 2) extract features with the trained multi-view feature extraction network to construct the multi-view image feature database of the three-dimensional model; 3) perform feature extraction on the image captured by the camera with the trained multi-view feature extraction network to obtain the features of the image; 4) with the trained view feature mapping network, take the multi-view image feature database of the three-dimensional model and the image features to be estimated as input, and output the viewing angle corresponding to the most similar feature in the database; 5) take the viewing angle computed in the previous step as the pose of the target object.
  • This embodiment also provides an object pose estimation system based on images and three-dimensional models, including:
  • a data acquisition module, used to acquire image data of the target object;
  • a pose estimation module, used to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;
  • the object pose estimation model being a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
  • The object pose estimation system based on images and three-dimensional models of this embodiment can execute the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
  • Embodiments of the present application further disclose a computer program product or computer program; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • A processor of a computer device can read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method shown in FIG. 1.
  • This embodiment also provides a storage medium storing instructions or a program for executing the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
  • The functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams.
  • Two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved.
  • The embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein; alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of larger operations are performed independently.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • a "computer-readable medium” can be any device that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or apparatus.
  • computer readable media include the following: electrical connections with one or more wiring (electronic devices), portable computer disk cartridges (magnetic devices), random access memory (RAM), Read Only Memory (ROM), Erasable Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer readable medium may even be paper or other suitable medium on which the program may be printed, as the paper or other medium may be optically scanned, for example, followed by editing, interpretation, or other suitable medium as necessary process to obtain the program electronically and then store it in computer memory.
  • Various parts of the present invention may be implemented in hardware, software, firmware or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, implementation may use any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An object pose estimation method and system based on an image and a three-dimensional model, and a medium. The method comprises the following steps: acquiring image data of a target object; performing feature extraction on the image data; mapping the extracted features with an object pose estimation model; and acquiring the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object. The object pose estimation model is a convolutional neural network that maps the image features of the target object to the three-dimensional model multi-view feature with the highest similarity. The method does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not require retraining the entire network when the target object is changed, so the generalizability, precision and recognition speed of object pose estimation are improved. The method can be widely applied in the technical field of intelligent information processing.

Description

Object pose estimation method, system and medium based on image and three-dimensional model

Technical Field

The present invention relates to the technical field of intelligent information processing, and in particular to an object pose estimation method, system and medium based on an image and a three-dimensional model.

Background Art

Object pose estimation technology estimates the category, three-dimensional displacement and three-dimensional orientation of a target object in a scene. This technology can greatly enhance scene understanding for VR systems, vehicles and robots, and is important for applications such as augmented reality, autonomous driving and robotic manipulation. Object pose estimation can therefore be regarded as an important breakthrough in the transition of manufacturing from traditional to intelligent production.

Although the emergence of deep learning has brought great progress to the field of object pose estimation, current mainstream deep networks either use only some key points of the target object's three-dimensional model to assist estimation, or exploit the three-dimensional model only indirectly through the loss function via backpropagation; the various kinds of information contained in the three-dimensional model are not fully utilized. This leads to several problems in existing methods: replacing the target model requires retraining the entire network, severe occlusion cannot be handled well, training results deviate substantially from test results, and so on. Judging from the development of object pose estimation methods in recent years, existing methods still lack a good solution for improving the generalization, accuracy and recognition speed of object pose estimation technology.

Summary of the Invention

In order to solve, at least to a certain extent, one of the technical problems existing in the prior art, the purpose of the present invention is to provide an object pose estimation method, system and medium based on images and three-dimensional models.

The technical solution adopted by the present invention is:
An object pose estimation method based on images and three-dimensional models, comprising the following steps:

acquiring image data of a target object;

performing feature extraction on the image data, mapping the extracted features with an object pose estimation model, and taking the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.
Further, the object pose estimation method also includes a step of constructing the object pose estimation model, specifically:

obtaining the three-dimensional model of the target object in a standard format;

rendering the three-dimensional model from multiple viewing angles to obtain the two-dimensional images corresponding to the three-dimensional model under the different viewing angles, forming a multi-view image dataset of the three-dimensional model;

obtaining a training set and using the training set to train a multi-view feature extraction network;

using the multi-view feature extraction network to extract and save the features of each image in the multi-view image dataset, forming the multi-view image feature database of the three-dimensional model;

using the multi-view feature extraction network to extract and save the features of each image in the training set, forming the training set image feature database;

training the view feature mapping network with the multi-view image feature database and the training set image feature database to obtain the object pose estimation model.
Further, the standard format is one of a point cloud format, a voxel format or a mesh format.

Further, the multi-view image dataset includes X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y viewing angles;

the multi-view image feature database includes the X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle of the three-dimensional model, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.

Further, the multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained through multiple rounds of iterative training with a gradient optimization algorithm.
Further, the multi-view feature extraction network is composed of 14 cascaded layers of different types of deep convolutional neural network;

the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the 14th layer is the output layer;

the output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.

Further, the view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and each per-view image feature among the three-dimensional model multi-view image features.

Further, the view feature mapping network uses a cross-entropy function as its loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.
Another technical solution adopted by the present invention is:

An object pose estimation system based on images and three-dimensional models, comprising:

a data acquisition module, used to acquire image data of the target object;

a pose estimation module, used to perform feature extraction on the image data, map the extracted features with the object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.

Another technical solution adopted by the present invention is:

A storage medium storing processor-executable instructions which, when executed by a processor, are used to execute the above object pose estimation method based on images and three-dimensional models.

The beneficial effects of the present invention are: the present invention does not require depth images, makes fuller use of the three-dimensional model of the target object, can better handle occluded target objects, and does not need to retrain the entire network when the target object is changed, thereby improving the generalization, accuracy and recognition speed of object pose estimation technology.
Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings of the embodiments of the present invention or of the related prior art are introduced below. It should be understood that the drawings introduced below are only intended to conveniently and clearly describe some embodiments of the technical solutions of the present invention; those skilled in the art can obtain other drawings from these drawings without creative work.

FIG. 1 is a flowchart of the steps of an object pose estimation method based on a picture and a three-dimensional model in an embodiment of the present invention;

FIG. 2 is a structural diagram of the multi-view feature extraction network in an embodiment of the present invention;

FIG. 3 is a structural diagram of the view feature mapping network in an embodiment of the present invention.
Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present invention, and should not be construed as limiting the present invention. The step numbers in the following embodiments are set only for convenience of description; they impose no limitation on the order of the steps, and the execution order of the steps in the embodiments can be adaptively adjusted according to the understanding of those skilled in the art.

In the description of the present invention, it should be understood that orientation descriptions, such as the orientations or positional relationships indicated by up, down, front, rear, left and right, are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are described, they serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present invention, unless otherwise expressly defined, words such as "arranged", "installed" and "connected" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of these words in the present invention in light of the specific content of the technical solution.
As shown in FIG. 1, this embodiment provides an object pose estimation method based on pictures and three-dimensional models, including the following steps:
S1. 3D model acquisition and model rendering: obtain the three-dimensional model of the target object in a standard format, such as a point cloud, voxel or mesh format, and render the three-dimensional model of each object from multiple viewing angles to obtain the two-dimensional image corresponding to each object's three-dimensional model under the different viewing angles, forming the multi-view image dataset of the three-dimensional model.
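As an illustration of this rendering step, the following sketch places a virtual camera at several viewing angles around a mesh and renders one image per angle. This is only one plausible realization: the use of the trimesh and pyrender libraries, the orbit radius, the number of views and the camera parameters are assumptions, not details fixed by this embodiment.

```python
import numpy as np
import trimesh
import pyrender

def render_multiview(mesh_path, n_views=24, size=512):
    """Render a mesh from n_views azimuth angles; returns one RGB image per view."""
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path, force='mesh'))
    scene = pyrender.Scene()
    scene.add(mesh)
    camera = pyrender.PerspectiveCamera(yfov=np.pi / 3.0)
    light = pyrender.DirectionalLight(intensity=3.0)
    renderer = pyrender.OffscreenRenderer(size, size)
    images = []
    for k in range(n_views):
        a = 2 * np.pi * k / n_views
        # Orbit pose: camera at radius 2, looking at the origin.
        pose = np.array([
            [ np.cos(a), 0.0, np.sin(a), 2.0 * np.sin(a)],
            [ 0.0,       1.0, 0.0,       0.0            ],
            [-np.sin(a), 0.0, np.cos(a), 2.0 * np.cos(a)],
            [ 0.0,       0.0, 0.0,       1.0            ],
        ])
        cam_node = scene.add(camera, pose=pose)
        light_node = scene.add(light, pose=pose)
        color, _ = renderer.render(scene)
        images.append(color)
        scene.remove_node(cam_node)
        scene.remove_node(light_node)
    renderer.delete()
    return images
```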
S2. Multi-view feature extraction network training: in order to make the image features extracted by the multi-view feature extraction network better suited to the object pose estimation task, during training the multi-view feature extraction network is cascaded with a feature-image reconstruction network and the two are trained together for parameter optimization. The cascaded overall network is expressed as Î_k = f_β(f_θ(I_k)), where Î_k denotes the reconstructed image, F_k = f_θ(I_k) is the mapping from image I_k to feature F_k, θ denotes the parameters of the multi-view feature extraction network, f_β denotes the feature-image reconstruction network, and β denotes the parameters of the feature-image reconstruction network.
The multi-view feature extraction network is composed of 14 cascaded layers of different types: the 1st layer is the input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are convolutional layers; the 4th, 7th, 10th and 13th layers are pooling layers; and the last layer is the output layer. The output dimensions of all convolutional layers equal their input dimensions, and the width and height of the feature map output by each pooling layer are respectively half the width and half the height of its input feature map. At test time, the output 32×32 feature map is stretched into a 1024-dimensional vector, i.e. the image feature.
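A minimal sketch of one plausible PyTorch realization of this 14-layer extractor follows. The text fixes only the layer types, the dimension-preserving convolutions and the four halving poolings (512 to 32 after four halvings); the 3x3 kernels, the channel width and the single-channel output map are assumptions made so that the flattened output is exactly 1024-dimensional.

```python
import torch
import torch.nn as nn

class MultiViewFeatureExtractor(nn.Module):
    """14 layers: input, 8 dimension-preserving convs, 4 halving pools, output."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 3                            # layer 1: RGB input
        for _ in range(4):                               # conv, conv, pool, four times
            layers += [
                nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                         # halves width and height
            ]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.out = nn.Conv2d(channels, 1, 1)             # layer 14: 1-channel output map

    def forward(self, x):                                # x: (B, 3, 512, 512)
        fmap = self.out(self.body(x))                    # (B, 1, 32, 32)
        return fmap.flatten(1)                           # (B, 1024) image feature
```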
The feature-image reconstruction network is composed of 14 cascaded layers of different types: the 1st layer is the image feature input layer; the 2nd, 3rd, 5th, 6th, 8th, 9th, 11th and 12th layers are deconvolution layers whose input and output dimensions are the same; the 4th, 7th, 10th and 13th layers are deconvolution layers whose output feature width and height are respectively twice the width and twice the height of the input feature; and the last layer is the output layer.
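Correspondingly, a sketch of the feature-image reconstruction network, mirroring the extractor above under the same assumptions: stride-1 transposed convolutions keep dimensions, and stride-2 transposed convolutions double the width and height.

```python
import torch.nn as nn

class FeatureImageReconstructor(nn.Module):
    """14 layers: feature input, 8 dimension-preserving deconvs, 4 doubling deconvs, output."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 1                            # layer 1: 1x32x32 feature input
        for _ in range(4):
            layers += [
                nn.ConvTranspose2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(channels, channels, 2, stride=2),  # doubles W and H
            ]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.out = nn.ConvTranspose2d(channels, 3, 3, padding=1)      # layer 14: RGB image

    def forward(self, f):                                # f: (B, 1024) feature vector
        x = f.view(-1, 1, 32, 32)
        return self.out(self.body(x))                    # (B, 3, 512, 512)
```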
In order to maximize the feature difference and minimize the reconstruction error, the training objective function is -L(F) + ||I_k - f_β(F_k)||², where -L(F) denotes the feature difference loss and ||I_k - f_β(F_k)||² denotes the image reconstruction loss. Training uses the Adam optimization method as its strategy, with the learning rate parameter initialized to 0.01 and the momentum initialized to 0.95.
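One training step under these settings might look as follows. The concrete form of the feature difference loss L(F) is not specified in the text, so the batch feature variance is used here purely as a stand-in; mapping the stated momentum 0.95 to Adam's beta1 is likewise an interpretation.

```python
import torch

extractor = MultiViewFeatureExtractor()
reconstructor = FeatureImageReconstructor()
params = list(extractor.parameters()) + list(reconstructor.parameters())
optimizer = torch.optim.Adam(params, lr=0.01, betas=(0.95, 0.999))

def train_step(images):                       # images: (B, 3, 512, 512)
    feats = extractor(images)                 # F_k = f_theta(I_k), shape (B, 1024)
    recon = reconstructor(feats)              # f_beta(F_k), shape (B, 3, 512, 512)
    # -L(F): encourage the features of different views to differ
    # (stand-in: negative feature variance across the batch).
    feature_diff_loss = -feats.var(dim=0).mean()
    recon_loss = ((images - recon) ** 2).mean()   # ||I_k - f_beta(F_k)||^2
    loss = feature_diff_loss + recon_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```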
In the testing or practical stage, the parameters of the multi-view feature extraction network no longer change, and the feature-image reconstruction network no longer needs to be cascaded.

S3. Feature database construction: after the multi-view feature extraction network has been trained, it has the ability to map images to features, so the features of each image in the multi-view image dataset of the three-dimensional model can be computed and saved as a feature template library, together with the features of each image in the training set. Specifically, each image is resized to 512×512, subjected to per-sample mean subtraction and data normalization, and fed into the multi-view feature extraction network to compute a 32×32 feature map, which is then stretched into a 1024-dimensional feature, i.e. the desired feature.

The result of the database construction step is the three-dimensional model multi-view image feature database, consisting of X three-dimensional models {M1, M2, M3, ..., MX}, the Y viewing angles {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, through the multi-view feature extraction network, of the two-dimensional image under each viewing angle, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
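A sketch of this database construction step follows; the exact per-sample mean subtraction and normalization scheme is not pinned down in the text, so the simplest whole-image version is used here as an assumption.

```python
import numpy as np
import torch
import torch.nn.functional as F

def preprocess(image):
    """Resize to 512x512, per-sample mean subtraction, data normalization."""
    x = torch.as_tensor(np.asarray(image), dtype=torch.float32)
    x = x.permute(2, 0, 1).unsqueeze(0)                   # (1, 3, H, W)
    x = F.interpolate(x, size=(512, 512), mode='bilinear', align_corners=False)
    x = x - x.mean()                                      # per-sample mean subtraction
    return x / (x.std() + 1e-8)                           # normalization

@torch.no_grad()
def build_feature_database(model_views, extractor):
    """model_views: {model_id: [(view, image), ...]} -> {model_id: [(view, feature), ...]}."""
    database = {}
    for model_id, views in model_views.items():
        database[model_id] = [(view, extractor(preprocess(img)).squeeze(0))  # (1024,)
                              for view, img in views]
    return database
```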
S4. View feature mapping network training.

After the required three-dimensional model multi-view image feature database has been extracted with the multi-view feature extraction network, the database can be used as a template: it suffices to be able to find, from the image features, the most similar viewing angle in the database; taking that viewing angle as the pose of the target object then completes the object pose estimation.

The task of view feature mapping network training is therefore to train a convolutional neural network that takes as input the three-dimensional model multi-view image feature database and the features, extracted by the multi-view feature extraction network, of the image whose object pose is to be estimated, and outputs the pose of the target object contained in the image.

The view feature mapping network uses the three-dimensional model multi-view image feature database, with the cross-entropy function as the loss function, and optimizes the parameters of the view feature mapping network with minimization of the mapping error as the optimization goal.

The specific training steps are: 1) initialize the network parameters with the Xavier initialization method; 2) input the three-dimensional model multi-view image feature database and the image features, and compute the similarity between the image features and the image features under each viewing angle in the database; 3) compute the cross-entropy loss of the network from the ground-truth image feature similarities; 4) back-propagate with the Adam-based gradient optimization method to update the view feature mapping network parameters; 5) switch to another training set image, and repeat steps 2-4 until the cross-entropy loss falls below a certain threshold. A sketch of these steps is given below.
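These steps could be realized as in the following sketch. The interface mapping_net(templates, query), returning one similarity logit per viewing angle, anticipates the network sketched after steps C1-C6 further down; the loss threshold and the Adam settings reused from above are assumptions.

```python
import torch
import torch.nn as nn

def xavier_init(module):
    if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)            # step 1: Xavier initialization
        nn.init.zeros_(module.bias)

def train_mapping_network(mapping_net, loader, templates, threshold=0.05):
    """templates: (Y, 1024) view features; loader yields (query_feature, true_view_index)."""
    mapping_net.apply(xavier_init)                        # step 1
    optimizer = torch.optim.Adam(mapping_net.parameters(), lr=0.01, betas=(0.95, 0.999))
    criterion = nn.CrossEntropyLoss()
    loss_val = float('inf')
    while loss_val > threshold:                           # step 5: repeat until below threshold
        for query, true_view in loader:                   # steps 2-4, one training image at a time
            logits = mapping_net(templates, query)        # step 2: similarity to each view
            loss = criterion(logits.unsqueeze(0),         # step 3: cross-entropy loss
                             true_view.view(1))
            optimizer.zero_grad()
            loss.backward()                               # step 4: backpropagation with Adam
            optimizer.step()
            loss_val = loss.item()
    return mapping_net
```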
The advantage of the view feature mapping network is that replacing the three-dimensional model multi-view image feature database or replacing the image features does not affect the accuracy of the view feature mapping network.

S5. Use the trained networks for object pose estimation.

After the preceding steps, the multi-view feature extraction network and the view feature mapping network have both been trained; all network parameters are fixed and can be used for object pose estimation.

Specifically, for an object whose pose needs to be estimated, pose estimation can be performed as follows: 1) obtain the three-dimensional model of the object and render it from multiple viewing angles with the aforementioned three-dimensional model rendering method; 2) extract features with the trained multi-view feature extraction network to construct the multi-view image feature database of this three-dimensional model; 3) perform feature extraction on the image captured by the camera with the trained multi-view feature extraction network to obtain the features of that image; 4) with the trained view feature mapping network, take the multi-view image feature database of the three-dimensional model and the image features to be estimated as input, and output the viewing angle corresponding to the feature in the multi-view image feature database most similar to the image features; 5) take the viewing angle computed in the previous step as the pose of the target object. A sketch of this procedure follows.
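Putting the pieces together, the full estimation procedure for a new object might read as follows; this reuses the hypothetical helpers sketched above and is not a verbatim implementation of the embodiment.

```python
import torch

@torch.no_grad()
def estimate_pose(mesh_path, camera_image, extractor, mapping_net, n_views=24):
    # 1) render the object's 3D model from multiple viewing angles
    rendered = render_multiview(mesh_path, n_views=n_views)
    # 2) build the multi-view image feature database (the template)
    templates = torch.stack([extractor(preprocess(img)).squeeze(0) for img in rendered])
    # 3) extract the features of the camera image with the same network
    query = extractor(preprocess(camera_image)).squeeze(0)
    # 4) map: similarity between the query feature and every view feature
    logits = mapping_net(templates, query)
    # 5) the viewing angle of the most similar feature is the estimated pose
    return int(logits.argmax())                # index of the best viewing angle
```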
The structure of the multi-view feature extraction network in the above embodiment is shown in FIG. 2. It mainly works as follows.

In the training phase, steps A1-A5 are performed:

A1. Read the image rendered from the three-dimensional model at a certain viewing angle into memory;

A2. Use the convolutional layers and pooling layers to perform feature convolution and pooling on the image;

A3. Use the deconvolution layers to deconvolve the view features;

A4. Reconstruct the image from the features obtained by deconvolution;

A5. Compute the loss from the reconstructed image and update the network parameters.

In the testing phase, steps B1-B2 are performed:

B1. Read the image rendered from the three-dimensional model at a certain viewing angle, or the image captured by the camera, into memory;

B2. Use the convolutional layers and pooling layers to perform feature convolution and pooling on the image, and stretch the output feature into a feature vector.
The structure of the view feature mapping network in the above embodiment is shown in FIG. 3. Its main working steps are:

C1. Read the multi-view feature database corresponding to the three-dimensional model into memory;

C2. Read the features of the RGB image extracted by the multi-view feature extraction network into memory;

C3. Fuse the two obtained kinds of features by feature concatenation;

C4. Perform convolution on the fused features;

C5. From the results of the convolution, compute the similarity between the RGB image feature and each image feature in the multi-view feature database corresponding to the three-dimensional model;

C6. Find the viewing angle corresponding to the feature in the multi-view feature database with the highest similarity to the RGB image feature, and take that viewing angle as the pose of the target object.
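A minimal sketch of a network following steps C1-C6 is given below. FIG. 3 leaves the layer details open, so the 1-D convolution over the concatenated feature pair and the channel counts are assumptions; only the overall flow (concatenate, convolve, score, argmax) follows the steps above.

```python
import torch
import torch.nn as nn

class ViewFeatureMappingNetwork(nn.Module):
    """Scores a query image feature against every template view feature (C3-C5)."""
    def __init__(self, dim=1024):
        super().__init__()
        self.conv = nn.Sequential(                        # C4: convolution on fused features
            nn.Conv1d(2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(32, 1, 3, padding=1),
        )
        self.score = nn.Linear(dim, 1)                    # C5: reduce to one similarity value

    def forward(self, templates, query):
        # templates: (Y, dim) view feature database (C1); query: (dim,) image feature (C2)
        y = templates.shape[0]
        pair = torch.stack([templates, query.expand(y, -1)], dim=1)   # C3: (Y, 2, dim)
        fused = self.conv(pair).squeeze(1)                # (Y, dim)
        return self.score(fused).squeeze(-1)              # (Y,) similarities; C6 takes argmax
```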
This embodiment also provides an object pose estimation system based on images and three-dimensional models, including:

a data acquisition module, used to acquire image data of the target object;

a pose estimation module, used to perform feature extraction on the image data, map the extracted features with the object pose estimation model, and take the viewing angle corresponding to the feature with the highest similarity as the estimated pose of the target object;

wherein the object pose estimation model is a convolutional neural network used to map the image features of the target object to the three-dimensional model multi-view feature with the highest similarity.

The object pose estimation system based on images and three-dimensional models of this embodiment can execute the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention, can execute any combination of the implementation steps of the method embodiments, and has the corresponding functions and beneficial effects of the method.
Embodiments of the present application further disclose a computer program product or computer program; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method shown in FIG. 1.

This embodiment also provides a storage medium storing instructions or a program capable of executing the object pose estimation method based on images and three-dimensional models provided by the method embodiments of the present invention; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
In some alternative implementations, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein; alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It can also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the modules will be within the routine skill of the engineer. Accordingly, those skilled in the art can, using ordinary skill, implement the invention as set forth in the claims without undue experimentation. It can also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the present invention, which is determined by the appended claims and their full scope of equivalents.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device. For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, an instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). Moreover, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically by, for example, optically scanning the paper or other medium, then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then storing it in a computer memory.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination thereof, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
In the above description of this specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.
The above is a detailed description of the preferred embodiments of the present invention, but the present invention is not limited to the above embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims (10)

  1. A method for estimating the pose of an object based on an image and a three-dimensional model, comprising the following steps:
    acquiring image data of a target object;
    performing feature extraction on the image data, mapping the extracted features with an object pose estimation model, and taking the view corresponding to the feature with the highest similarity as the estimated pose of the target object;
    wherein the object pose estimation model is a convolutional neural network for mapping the image features of the target object to the most similar multi-view features of the three-dimensional model.
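Read as a procedure, claim 1 reduces to a nearest-view lookup in feature space. The sketch below assumes PyTorch and invents all names (`extractor`, `mapper`, `view_bank`, `view_poses`); the dot-product similarity is likewise an assumption, since the claim only requires per-view similarity scores.

```python
import torch

@torch.no_grad()
def estimate_pose(image, extractor, mapper, view_bank, view_poses):
    """Sketch of claim 1: map an image feature onto the most similar
    pre-computed multi-view feature and return that view's pose.

    Assumed shapes: `view_bank` is (Y, 1024) features for Y rendered
    views; `view_poses` is a length-Y list of the corresponding poses.
    """
    feat = extractor(image.unsqueeze(0))    # (1, 1024) image feature
    scores = mapper(feat) @ view_bank.t()   # (1, Y) per-view similarity
    best = scores.argmax(dim=1).item()      # index of the most similar view
    return view_poses[best]                 # estimated object pose
```

Here `view_bank` and `view_poses` would come from the offline construction step described in claim 2.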
  2. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 1, wherein the method further comprises a step of constructing the object pose estimation model, specifically:
    obtaining a three-dimensional model of the target object in a standard format;
    rendering the three-dimensional model under multiple views to obtain two-dimensional images of the three-dimensional model under the different views, which constitute a multi-view image dataset of the three-dimensional model;
    obtaining a training set, and training a multi-view feature extraction network with the training set;
    extracting and saving the features of every image in the multi-view image dataset with the multi-view feature extraction network, to form a multi-view image feature database of the three-dimensional model;
    extracting and saving the features of every image in the training set with the multi-view feature extraction network, to form a training-set image feature database;
    training a view feature mapping network with the multi-view image feature database and the training-set image feature database, to obtain the object pose estimation model.
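Condensed into Python, the construction step of claim 2 might read as below. Every callable passed in (`render_view`, `train_extractor`, `train_mapper`) is a hypothetical stand-in; the claim does not fix a renderer, an optimizer, or a particular training procedure.

```python
import torch

def build_pose_model(mesh, views, train_images, train_labels,
                     render_view, train_extractor, train_mapper):
    """Sketch of claim 2's offline pipeline (all injected callables
    are placeholders for components the claim leaves unspecified)."""
    # 1. Render the 3D model from every view to get 2D images.
    rendered = [render_view(mesh, v) for v in views]
    # 2. Train the multi-view feature extraction network on the training set.
    extractor = train_extractor(train_images, train_labels)
    extractor.eval()
    with torch.no_grad():
        # 3. Multi-view image feature database: one feature per rendered view.
        view_bank = torch.stack([extractor(img.unsqueeze(0)).squeeze(0)
                                 for img in rendered])
        # 4. Training-set image feature database.
        train_feats = torch.stack([extractor(img.unsqueeze(0)).squeeze(0)
                                   for img in train_images])
    # 5. Train the view feature mapping network on the two databases.
    mapper = train_mapper(train_feats, view_bank, train_labels)
    return extractor, mapper, view_bank
```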
  3. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the standard format is one of a point cloud format, a voxel format, or a mesh format.
  4. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the multi-view image dataset comprises X three-dimensional models {M1, M2, M3, ..., MX}, the Y views {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the two-dimensional images {I1, I2, I3, ..., IY} rendered from each three-dimensional model under the Y views;
    the multi-view image feature database comprises the X three-dimensional models {M1, M2, M3, ..., MX}, the Y views {V1, V2, V3, ..., VY} corresponding to each three-dimensional model, and the view features {F1, F2, F3, ..., FY} obtained by feature extraction, via the multi-view feature extraction network, from the two-dimensional image of the three-dimensional model under each view, where Fi is a 1024-dimensional feature vector and 1 ≤ i ≤ Y.
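The two databases of claim 4 share an X-models-by-Y-views layout; a small container type makes the shape concrete. The field and method names below are invented for illustration; only the X-by-Y structure and the 1024-dimensional feature vectors come from the claim.

```python
from dataclasses import dataclass
import torch

@dataclass
class MultiViewFeatureDB:
    """One record per 3D model: Y views and one 1024-d feature per view."""
    model_ids: list[str]     # X model identifiers {M1 ... MX}
    views: torch.Tensor      # (Y, pose_dim) view parameters {V1 ... VY}
    features: torch.Tensor   # (X, Y, 1024) view features {F1 ... FY}

    def feature(self, model_idx: int, view_idx: int) -> torch.Tensor:
        return self.features[model_idx, view_idx]  # a single 1024-d vector
```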
  5. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the multi-view feature extraction network is composed of cascaded convolutional neural networks and is obtained by multiple rounds of iterative training with a gradient-based optimization algorithm.
  6. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 5, wherein the multi-view feature extraction network is a cascade of 14 deep convolutional neural network layers of different types;
    wherein layer 1 is the input layer; layers 2, 3, 5, 6, 8, 9, 11, and 12 are convolutional layers; layers 4, 7, 10, and 13 are pooling layers; and layer 14 is the output layer;
    the output dimensions of every convolutional layer equal its input dimensions, and the width and height of the feature map output by every pooling layer are respectively half the width and half the height of the feature map input to that pooling layer.
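A PyTorch rendering of this cascade might look as follows; it is a sketch, not the patented network. The channel widths, the 3x3 kernels with padding 1 (chosen so each convolutional layer's output dimensions equal its input dimensions, as the claim requires), the 2x2 max pooling (halving width and height), and the 1024-dimensional output chosen to match claim 4 are all assumptions; the claim fixes only the layer types, their order, and the dimension constraints.

```python
import torch
import torch.nn as nn

class MultiViewFeatureNet(nn.Module):
    """Sketch of the 14-layer cascade of claim 6 (widths are assumed)."""

    def __init__(self, in_channels=3, width=64, feat_dim=1024):
        super().__init__()

        def conv(c_in, c_out):
            # 3x3 conv with padding 1: output spatial dims == input dims
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                 nn.ReLU(inplace=True))

        self.body = nn.Sequential(
            # layer 1 (input layer) is the image tensor itself
            conv(in_channels, width), conv(width, width),            # layers 2-3
            nn.MaxPool2d(2),                                         # layer 4: W/2, H/2
            conv(width, 2 * width), conv(2 * width, 2 * width),      # layers 5-6
            nn.MaxPool2d(2),                                         # layer 7
            conv(2 * width, 4 * width), conv(4 * width, 4 * width),  # layers 8-9
            nn.MaxPool2d(2),                                         # layer 10
            conv(4 * width, 8 * width), conv(8 * width, 8 * width),  # layers 11-12
            nn.MaxPool2d(2),                                         # layer 13
        )
        # layer 14 (output layer): pool down to a single 1024-d feature vector
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(8 * width, feat_dim))

    def forward(self, x):
        return self.head(self.body(x))
```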
  7. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 2, wherein the view feature mapping network takes the multi-view image feature database and the image features of the target object as input, and outputs the similarity between the image features and the image features of each view among the multi-view image features of the three-dimensional model.
  8. The method for estimating the pose of an object based on an image and a three-dimensional model according to claim 7, wherein the view feature mapping network uses a cross-entropy function as its loss function, and its parameters are optimized with minimization of the mapping error as the optimization objective.
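A minimal sketch of this training objective, assuming the mapping error is scored as cross-entropy over per-view similarities and that similarity is a dot product (the claim specifies the loss function but not the similarity computation):

```python
import torch.nn.functional as F

def mapping_loss(mapper, img_feats, view_feats, gt_view):
    """Cross-entropy mapping loss per claim 8 (shapes are assumptions).

    img_feats:  (B, 1024) training-set image features
    view_feats: (Y, 1024) multi-view image feature database
    gt_view:    (B,) index of each image's ground-truth view
    """
    mapped = mapper(img_feats)               # (B, 1024) mapped features
    logits = mapped @ view_feats.t()         # (B, Y) per-view similarity
    return F.cross_entropy(logits, gt_view)  # minimized during training
```

Minimizing this loss pushes each mapped image feature toward the feature of its true view, which is the same "highest similarity" criterion applied at inference in claim 1.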
  9. A system for estimating the pose of an object based on an image and a three-dimensional model, comprising:
    a data acquisition module, configured to acquire image data of a target object;
    a pose estimation module, configured to perform feature extraction on the image data, map the extracted features with an object pose estimation model, and take the view corresponding to the feature with the highest similarity as the estimated pose of the target object;
    wherein the object pose estimation model is a convolutional neural network for mapping the image features of the target object to the most similar multi-view features of the three-dimensional model.
  10. A storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, is used to perform the method for estimating the pose of an object based on an image and a three-dimensional model according to any one of claims 1-8.
PCT/CN2021/124660 2020-11-16 2021-10-19 Object attitude estimation method and system based on image and three-dimensional model, and medium WO2022100379A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011278095.5A CN112381879A (en) 2020-11-16 2020-11-16 Object posture estimation method, system and medium based on image and three-dimensional model
CN202011278095.5 2020-11-16

Publications (1)

Publication Number Publication Date
WO2022100379A1

Family

ID=74584723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124660 WO2022100379A1 (en) 2020-11-16 2021-10-19 Object attitude estimation method and system based on image and three-dimensional model, and medium

Country Status (2)

Country Link
CN (1) CN112381879A (en)
WO (1) WO2022100379A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821145A (en) * 2022-06-28 2022-07-29 山东百盟信息技术有限公司 Incomplete multi-view image data clustering method based on data restoration
CN115219492A (en) * 2022-05-25 2022-10-21 中国科学院自动化研究所 Appearance image acquisition method and device for three-dimensional object
CN116168137A (en) * 2023-04-21 2023-05-26 湖南马栏山视频先进技术研究院有限公司 New view angle synthesis method, device and memory based on nerve radiation field
CN116643648A (en) * 2023-04-13 2023-08-25 中国兵器装备集团自动化研究所有限公司 Three-dimensional scene matching interaction method, device, equipment and storage medium
CN116822341A (en) * 2023-06-12 2023-09-29 华中科技大学 Defect prediction method and system based on three-dimensional casting model feature extraction
CN117315152A (en) * 2023-09-27 2023-12-29 杭州一隅千象科技有限公司 Binocular stereoscopic imaging method and binocular stereoscopic imaging system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381879A (en) * 2020-11-16 2021-02-19 华南理工大学 Object posture estimation method, system and medium based on image and three-dimensional model
CN113409290B (en) * 2021-06-29 2023-12-15 北京兆维电子(集团)有限责任公司 Method and device for detecting appearance defects of liquid crystal display, and storage medium
CN113643366B (en) * 2021-07-12 2024-03-05 中国科学院自动化研究所 Multi-view three-dimensional object attitude estimation method and device
CN115115780B (en) * 2022-06-29 2024-07-09 聚好看科技股份有限公司 Three-dimensional reconstruction method and system based on multi-view RGBD camera
CN115223023B (en) * 2022-09-16 2022-12-20 杭州得闻天下数字文化科技有限公司 Human body contour estimation method and device based on stereoscopic vision and deep neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
US20200184668A1 (en) * 2018-12-05 2020-06-11 Qualcomm Incorporated Systems and methods for three-dimensional pose determination
CN112381879A (en) * 2020-11-16 2021-02-19 华南理工大学 Object posture estimation method, system and medium based on image and three-dimensional model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017156243A1 (en) * 2016-03-11 2017-09-14 Siemens Aktiengesellschaft Deep-learning based feature mining for 2.5d sensing image search
CN109063301B (en) * 2018-07-24 2023-06-16 杭州师范大学 Single image indoor object attitude estimation method based on thermodynamic diagram
CN109934847B (en) * 2019-03-06 2020-05-22 视辰信息科技(上海)有限公司 Method and device for estimating posture of weak texture three-dimensional object

Also Published As

Publication number Publication date
CN112381879A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
WO2022100379A1 (en) Object attitude estimation method and system based on image and three-dimensional model, and medium
JP6807471B2 (en) Semantic segmentation model training methods and equipment, electronics, and storage media
CN111079532A (en) Video content description method based on text self-encoder
CN110633628B (en) RGB image scene three-dimensional model reconstruction method based on artificial neural network
CN107229757A (en) The video retrieval method encoded based on deep learning and Hash
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN110309835B (en) Image local feature extraction method and device
CN112862949A (en) Object 3D shape reconstruction method based on multiple views
CN115147599A (en) Object six-degree-of-freedom pose estimation method for multi-geometric feature learning of occlusion and truncation scenes
Stekovic et al. General 3d room layout from a single view by render-and-compare
Chang et al. Candidate-based matching of 3-D point clouds with axially switching pose estimation
Wu et al. Sc-wls: Towards interpretable feed-forward camera re-localization
CN111241326B (en) Image visual relationship indication positioning method based on attention pyramid graph network
CN116662600A (en) Visual positioning method based on lightweight structured line map
Kim et al. Self-supervised keypoint detection based on multi-layer random forest regressor
CN110633706B (en) Semantic segmentation method based on pyramid network
CN113592015B (en) Method and device for positioning and training feature matching network
Basak et al. Monocular depth estimation using encoder-decoder architecture and transfer learning from single RGB image
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
Huang et al. A stereo matching algorithm based on the improved PSMNet
WO2024060839A1 (en) Object operation method and apparatus, computer device, and computer storage medium
CN117173445A (en) Hypergraph convolution network and contrast learning multi-view three-dimensional object classification method
Phalak et al. DeepPerimeter: Indoor boundary estimation from posed monocular sequences
CN115423927A (en) ViT-based multi-view 3D reconstruction method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21890906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21890906

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.09.2023)