WO2023034548A1 - Monocular pose estimation and correction - Google Patents
- Publication number
- WO2023034548A1 (application PCT/US2022/042416)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- source
- pose
- map
- source image
- Prior art date
Classifications
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T7/10—Segmentation; Edge detection
- G06T7/50—Depth or shape recovery
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06V20/95—Pattern authentication; Markers therefor; Forgery detection
Definitions
- Various embodiments of a Pose Correction Engine provide significant improvements and advantages over conventional systems by preprocessing user source images for use in determining whether the objects portrayed in those source images are fraudulent or counterfeit.
- the Engine provides a user(s) the flexibility to capture source images of a physical object in various predefined poses without requiring the user to capture source images that portray the physical object in perfect alignment with the various predefined poses.
- the user may upload the source images to the Engine and the Engine outputs pose corrected images based on the user’s source images.
- the Engine outputs pose corrected images that represent the source images as though the user successfully captured views of the physical object in perfect alignment with predefined poses.
- Various embodiments of an apparatus, methods, systems and computer program products described herein are directed to a Pose Correction Engine (“Engine”).
- the Engine generates a reference image of the object of interest.
- the reference image portrays the object of interest (“object”) oriented according to a first pose.
- the Engine receives a source image of an instance of the object.
- the source image portrays the instance of the object oriented according to a variation of the first pose.
- the Engine determines a difference between the first pose of the reference image and the variation of the first pose of the source image.
- the Engine identifies, based on the determined difference, one or more portions of a three-dimensional (3D) map of a shape of the object obscured by the variation of the first pose portrayed in the source image.
- 3D three-dimensional
- the Engine generates a pose corrected image of the instance of the object that portrays at least a portion of the source image and at least the identified portion of the 3D map of the shape of the object. It is understood that, in various embodiments, an object of interest can be any type of physical object.
- a user may seek to determine whether a particular shoe is authentic.
- the user captures one or more source images of the shoe, wherein each respective image portrays a particular perspective view of the shoe.
- the user uploads the one or more source images to the Engine.
- the Engine pre-processes the one or more source images in preparation for authentication processing of the particular shoe.
- the Engine accesses a reference image of the particular shoe, but the reference image may portray the particular shoe according to a specific pose (i.e. position and orientation).
- a source image may also portray the particular shoe according to a pose that is nearly similar to the specific pose of the reference image. Stated differently, the pose of the source image may not be a perfect match to the specific pose of the reference image.
- the Engine pre-processes the source image and the reference image according to a segmentation phase, a depth estimation phase, a scaling phase, and a registration phase in order to generate a pose corrected image.
- the pose corrected image output by the Engine represents a version of the source image that portrays the particular shoe according to the specific pose of the reference image.
- the Engine generates and stores a plurality of reference images for one or more types of objects of interest. For example, for a particular type of shoe, the Engine generates multiple reference images of that particular type of shoe, wherein each respective image portrays that particular type of shoe in a different pose (i.e. in a different position and orientation).
- the Engine trains a machine learning network(s) on the reference images during a training phase.
- the training phase includes a feedback propagation loop.
- the scaling phase implemented by the Engine includes applying one or more scaling factors to a depth map image based on the reference image.
- the registration phase implemented by the Engine generates one or more translation parameters and one or more rotation parameters.
- the Engine applies the translation parameters and the rotation parameters to the source image to generate the pose corrected image.
- a user may place a physical object in a lightbox that situates the physical object in a particular predefined pose.
- the lightbox may include one or more apertures for predefined fixed camera lens positions.
- the user may capture one or more source images with the cameras associated with the lightbox.
- the Engine may pre-process a particular source image captured at the lightbox to output a corresponding pose corrected image.
- the Engine may further utilize data from the registration phase of that particular source image in order to generate respective pose corrected images of the other source images captured at the lightbox.
- Various embodiments include a module(s) and/or one or more functionalities to redact privacy information/data, to encrypt information/data and to anonymize data to ensure the confidentiality and security of user and platform information/data as well as compliance with data privacy law(s) and regulations in the United States and/or international jurisdictions.
- FIG. 1A is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 1B is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIGS. 2A and 2B are each a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 2C is a diagram illustrating an exemplary method that may be performed in some embodiments.
- FIG. 2D is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 2E is a diagram illustrating an exemplary method that may be performed in some embodiments.
- FIG. 3 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 4 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 5 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 6 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIGS. 7A , 7B and 7C are each a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 8 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- FIG. 9 is a diagram illustrating an exemplary environment in which some embodiments may operate.
- steps of the exemplary methods set forth in this exemplary patent can be performed in different orders than the order presented in this specification. Furthermore, some steps of the exemplary methods may be performed in parallel rather than being performed sequentially. Also, the steps of the exemplary methods may be performed in a network environment in which some steps are performed by different computers in the networked environment.
- a computer system may include a processor, a memory, and a non-transitory computer-readable medium.
- the memory and non-transitory medium may store instructions for performing methods and steps described herein.
- Computer system 101 may comprise, for example, a smartphone, smart device, smart watch, tablet, desktop computer, laptop computer, notebook, server, or any other processing system.
- the computer system 101 is mobile so that it fits in a form factor that may be carried by a user.
- the computer system 101 is stationary.
- the computer system 101 may include a CPU 102 and memory 103.
- the computer system 101 may include internal or external peripherals such as a microphone 104 and speakers 105.
- the computer system may also include authentication application 110, which may comprise an image capture system 111 and a user interface (UI) system 112.
- UI user interface
- the image capture system may correspond to functionality for displaying one or more graphical overlays via the UI 112.
- the computer system 101 may be connected to a network 150.
- the network 150 may comprise, for example, a local network, intranet, wide-area network, the Internet, wireless network, wired network, Wi-Fi, Bluetooth, a network of networks, or other networks.
- Network 150 may connect a number of computer systems to allow inter-device communications.
- Server 120 may be connected to computer system 101 over the network 150.
- the server 115 may comprise a pose correction engine 120.
- the environment 100 may be a cloud computing environment that includes remote servers or remote storage systems.
- Cloud computing refers to pooled network resources that can be quickly provisioned so as to allow for easy scalability. Cloud computing can be used to provide software-as-a-service, platform-as-a-service, infrastructure-as-a-service, and similar features.
- a user may store a file in the “cloud,” which means that the file is stored on a remote network resource though the actual hardware storing the file may be opaque to the user.
- FIG. 1B illustrates a block diagram of an example system 120 for a Pose Correction Engine.
- the system 120 may communicate with a user device 140 that sends one or more source images.
- the Difference Determination module 121 of the system 120 may perform functionality as illustrated in FIGS. 2A, 2B, 2C, 2D, 2E, 3, 4, 5, 6, 7A, 7B, 7C and/or 8.
- the Difference Determination module 121 may perform functionality related to determining a difference between a reference image(s) and a source image(s), wherein the reference image portrays an object according to a first pose and the source image portrays the object according to a second pose that is a variation of the first pose
- the Identification module 122 of the system 120 may perform functionality illustrated in FIGS. 2A, 2B, 2C, 2D, 2E, 3, 4, 5, 6, 7A, 7B, 7C and/or 8.
- the Identification module 122 may perform functionality related to identifying, based on a determined difference, one or more portions of a three-dimensional (3D) map of a shape of the object obscured by the variation of the first pose portrayed in the source image.
- the Pose Correction module 123 of the system 120 may perform functionality as illustrated in FIGS. 2A, 2B, 2C, 2D, 2E, 3, 4, 5, 6, 7A, 7B, 7C and/or 8.
- the Pose Correction module 123 may perform functionality related to generating a pose corrected image of the object that portrays at least a portion of the source image and at least one identified obscured portion of the shape of the object.
- the Machine Learning Network(s) module 124 of the system 120 may perform functionality as illustrated in FIGS. 2A, 2B, 2C, 2D, 2E, 3, 4, 5, 6, 7A, 7B, 7C and/or 8. In some embodiments, the Machine Learning Network(s) module 124 may perform functionality related to training, updating and implementing one or more types of machine learning networks.
- the Synthetic Image Generation module 125 of the system 120 may perform functionality as illustrated in FIGS. 2A, 2B, 2C, 2D, 2E, 3, 4, 5, 6, 7A, 7B, 7C and/or 8. In some embodiments, the Synthetic Image Generation module 125 may perform functionality related to synthetically generating multiple reference images of particular types of objects.
- One or more software modules of the Engine may be implemented on a computer system associated with a particular end user (“user”). As shown in flowchart 200 of FIG. 2A, the Engine may present one or more graphical overlays in a user interface to the user. (Act 202) A set of graphical overlays may correspond to a predefined pose for a particular type of physical object, such as a shoe. It is understood that there may be a plurality of graphical overlay sets for a particular type of object, wherein each distinct set of graphical overlays represents a particular single predefined pose for the same type of object.
- the Engine may have access to a reference image(s) that represents the same type of object according to each predefined pose.
- a first type of shoe may have at least a first set of graphical overlays for a first predefined pose and also a second set of graphical overlays for a second predefined pose.
- a first reference image portrays the same type of shoe according to the first predefined pose and a second reference image portrays the same type of shoe according to the second predefined pose.
- the user attempts to match camera views of the object with the displayed graphical overlays.
- one or more graphical overlays may be presented on a user interface as visual guides.
- the graphical overlays represent target display locations of various portions of a physical instance of a particular type of shoe viewed via a camera functionality.
- the display graphical overlays act as visual guides to assist the user to generate a source image that portrays an instance of the shoe in a pose that matches the predefined pose that corresponds with the displayed graphical overlays.
- the user captures, via a camera associated with the computer system, one or more source images portraying the object according to various object pose variations.
- the source image may portray the shoe according to a pose that is not an exact match to the predefined pose that corresponds with the displayed graphical overlays.
- the pose portrayed in the source image will inevitably be a variation of the predefined pose.
- the user sends the source image(s) to the Engine for preprocessing in order to generate a pose-corrected image.
- the pose-corrected image portrays content from the source image as though the user perfectly aligned the graphical overlays with portions of the shoe to create a source image that perfectly matched the predefined pose.
- the Engine receives one or more source images for preprocessing.
- the source images may each portray the same particular shoe, but each respective source image may portray that same particular shoe according to a different perspective view. It is understood that a perspective view may be based on a particular position and orientation of the shoe, whereby each particular position and orientation constitutes a pose.
- the identified reference images correspond to predefined poses that are similar to the poses portrayed in the received source images.
- the Engine identifies various reference images of the same particular shoe, wherein each respective reference image portrays that same particular shoe according to various predefined poses.
- a source image may include metadata that identifies a set of graphical overlays that were displayed to the user when the source image was captured.
- the Engine may identify a reference image that is associated with the identified set of graphical overlays.
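As an illustrative sketch of such a metadata-driven lookup (the `overlay_set_id` key, registry layout, and file names below are assumptions for illustration, not taken from the specification), the selection of a reference image may resemble:

```python
# Hypothetical sketch: selecting a reference image from source-image
# metadata. The "overlay_set_id" key and the registry layout are
# illustrative assumptions.

def select_reference_image(source_metadata, reference_registry):
    """Return the reference image associated with the graphical-overlay
    set that was displayed when the source image was captured."""
    overlay_set_id = source_metadata["overlay_set_id"]
    if overlay_set_id not in reference_registry:
        raise KeyError("no reference image for overlay set " + overlay_set_id)
    return reference_registry[overlay_set_id]

registry = {
    "shoe_model_A_pose_1": "ref_pose_1.png",
    "shoe_model_A_pose_2": "ref_pose_2.png",
}
metadata = {"overlay_set_id": "shoe_model_A_pose_2"}
reference = select_reference_image(metadata, registry)
```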
- the Engine preprocesses one or more of the source images according to various phases in order to output a respective pose corrected image for each preprocessed source image.
- the Engine receives one or more source images 210.
- Each source image 210 portrays an object according to a variation of a predefined pose associated with a reference image 212.
- the Engine identifies respective reference images 212, wherein each identified reference image 212 portrays the same type of object in the predefined pose.
- the Engine determines the difference between the paired images 210, 212 and generates a pose corrected image 214.
- the preprocessing of a source image(s) 210 by the Engine may include one or more preprocessing phases.
- Upon receipt of a source image(s) 210, the Engine inputs the source image 210 and a corresponding reference image 212 into a segmentation phase 218.
- the segmentation phase 218 outputs a segmented mask source image and a segmented mask reference image.
- the Engine inputs the segmented mask source image and the segmented mask reference image into a depth estimation phase 220.
- the depth estimation phase 220 outputs a depth map source image and a depth map reference image.
- the Engine applies one or more scaling factors 222 to the depth map reference image and generates a scaled depth map reference image.
- the Engine inputs the scaled depth map reference image and the depth map source image into a registration phase 224.
- the registration phase 224 returns as output 226 a pose corrected image.
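The flow through phases 218, 220, 222 and 224 can be sketched as follows. Each phase below is a trivial stand-in for the trained networks described herein; only the data flow between phases is illustrated, and every function body is an illustrative assumption.

```python
import numpy as np

def segment(image):
    # Stand-in for the encoder/decoder segmentation network (218):
    # threshold the image to separate foreground from background.
    return (image > 0.5).astype(float)

def estimate_depth(mask):
    # Stand-in for the depth estimation network (220): assign a dummy
    # constant depth to every foreground pixel.
    return mask * 2.0

def scale_depth(depth_map, scale_factor):
    # Scaling phase (222): apply a scaling factor to the reference depth map.
    return depth_map * scale_factor

def register(depth_src, depth_ref):
    # Stand-in for the registration phase (224): a real implementation
    # would align 3D projections of the two depth maps and return the
    # estimated rotation and translation parameters.
    return {"translation": np.zeros(3), "rotation": np.eye(3)}

def preprocess(source_img, reference_img, scale_factor=1.0):
    src_mask, ref_mask = segment(source_img), segment(reference_img)
    src_depth, ref_depth = estimate_depth(src_mask), estimate_depth(ref_mask)
    ref_depth_scaled = scale_depth(ref_depth, scale_factor)
    return register(src_depth, ref_depth_scaled)

params = preprocess(np.random.rand(8, 8), np.random.rand(8, 8))
```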
- the Engine receives a source image 232 that portrays a shoe according to a variation of a first pose.
- the Engine identifies a reference image 234 that portrays the same type of shoe (i.e. model, brand) according to the first pose.
- the Engine sends the images 232, 234 through pre-processing phases 218, 220, 222, 224, 226.
- the Engine returns a pose corrected image 236.
- the pose corrected image 236 includes content from the received source image 232 and further includes content that portrays various parts of the shoe obscured by the pose variation of the source image 232.
- the Engine generates a reference image of an object of interest.
- the reference image portrays the object oriented according to a first pose.
- the Engine generates a plurality of reference images of the object, wherein each respective reference image portrays the object according to a different pose.
- the Engine synthetically generates a plurality of reference images and/or generates reference images based on 3D scans of physical instances of the object(s).
- the Engine receives a source image of an instance of the object.
- the source image portrays the instance of the object oriented according to a variation of the first pose.
- the Engine identifies a reference image that corresponds with the received source image. For example, if the source image portrays a specific shoe model manufactured by a particular shoe company, the Engine identifies a set of reference images that portrays the same specific shoe model. Further, the Engine accesses the identified set of reference images and selects a reference image that portrays the shoe according to a predefined pose, whereby the source image was captured during display of graphical overlays intended to guide the user to physically orient a camera to create a perspective view of the physical shoe according to the predefined pose.
- the Engine determines a difference between the first pose of the reference image and the variation of the first pose of the source image. (Act 246) The Engine identifies, based on the determined difference, one or more portions of a three-dimensional (3D) map of a shape of the object obscured by the variation of the first pose portrayed in the source image. (Act 248) The Engine generates segmented mask images for both the source image(s) and the corresponding identified reference image(s). The Engine further generates depth map images based on the segmented mask images.
- the Engine generates a pose corrected image of the instance of the object that portrays at least a portion of the source image and at least the identified portion of the 3D map of the shape of the object. (Act 250) The Engine implements a global and local registration phase, as described herein, to generate a pose corrected image for each received source image.
- various embodiments described herein include the collection and generation of various portions of training data.
- the Engine collects a plurality of 3D models for various types of objects.
- the Engine receives respective 3D models for various types of shoes.
- the respective 3D models include a 3D model for multiple differing types (i.e. models) of shoes offered by a plurality of shoe manufacturing companies.
- a model may be, for example, a type of shoe that is referenced by a unique SKU identifier.
- the Engine receives scans of physical objects. (Act 304) For example, a scan of a physical instance of a particular type of shoe.
- the Engine receives a plurality of scans whereby each scan may represent a different type of shoe. Each scan may further be translated into a corresponding 3D shoe model.
- the Engine receives the input data (i.e. the 3D models, the plurality of scans) and processes the input data in order to generate various images of object poses. (Act 306) For example, given one or more 3D models and/or scans of a particular model of a shoe, the Engine synthetically generates one or more additional reference images of the same shoe model oriented according to different poses. It is understood that a pose of an object portrays that object according to a perspective view defined according to a particular orientation of the object in 3D space on an x, y and z axis.
- the Engine synthetically generates multiple reference images of a particular type of athletic shoe whereby each respective reference image portrays a representation of the same type of athletic shoe viewed and oriented according to a different pose.
- the Engine may synthetically generate a plurality of reference images for a particular type of shoe, and further generate a plurality of reference images for another particular type of shoe.
- the Engine may implement this process over any number of shoes. It is understood that the various embodiments described herein are not limited to objects being only different types of shoes. Instead, an object can be any type of physical item.
- the Engine synthetically generates the various reference images from the perspective of an emulated camera that acts as a reference point in 3D space with regard to each pose.
- the Engine may generate a first reference image of a first type of shoe.
- the Engine may modify and/or manipulate the pose of the first reference image of the first type of shoe to generate additional reference images of the first type of shoe situated according to different poses.
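A minimal sketch of generating multiple synthetic views from an emulated camera, assuming a toy point model and an arbitrary focal length (both are illustrative assumptions, not taken from the specification):

```python
import numpy as np

def rotation_y(theta):
    # Rotation matrix about the y axis, used to vary the object's pose.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(points_3d, focal_length=500.0, camera_z=5.0):
    """Pinhole projection of Nx3 points onto the image plane of an
    emulated camera placed camera_z units in front of the model."""
    z = points_3d[:, 2] + camera_z
    u = focal_length * points_3d[:, 0] / z
    v = focal_length * points_3d[:, 1] / z
    return np.stack([u, v], axis=1)

# A toy stand-in model: the eight corner points of a unit cube.
model = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                 dtype=float)

# One synthetic reference view per pose (here: rotations about y).
views = [project(model @ rotation_y(t).T) for t in (0.0, 0.3, 0.6)]
```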
- the Engine feeds one or more reference images into a machine learning network(s) 308 as training data.
- the reference image further includes depth values for each pixel of the corresponding reference image.
- the pixel depth value represents a distance from a particular pixel in a reference image to a position of an emulated camera from which the perspective view of the object’s pose in the reference image is based.
- a pixel depth value represents a distance between a particular pixel in the reference image and the emulated camera’s placement in 3D space with regard to an orientation (i.e. pose) of an object portrayed in that reference image and defined according to an x, y and z axis.
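The pixel depth values described above can be sketched as Euclidean distances from per-pixel 3D positions to the emulated camera's position; the point grid and camera placement below are illustrative assumptions.

```python
import numpy as np

def pixel_depths(points_3d, camera_pos):
    """points_3d: HxWx3 array of per-pixel 3D positions.
    Returns an HxW depth map of distances to camera_pos."""
    return np.linalg.norm(points_3d - camera_pos, axis=-1)

# Emulated camera placed 5 units from the origin along the z axis;
# every pixel's 3D position is at the origin in this toy example.
camera = np.array([0.0, 0.0, -5.0])
points = np.zeros((2, 2, 3))
depth_map = pixel_depths(points, camera)
```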
- the machine learning network 308 outputs various predicted segmented mask images 310 and various predicted depth mask images 314 as described in various embodiments herein.
- the training phase further includes implementation of a back propagation algorithm 312 that includes the feedback of a loss function.
- For a predicted segmented mask image output during the training phase from an input training reference image, the feedback loss function provides an indication of a measure of the classification error between portions of the predicted segmented mask image that portray the object and portions that portray the background surrounding the object.
- the measure of the classification error may be determined by comparing the predicted segmented mask image with classification ground truth provided by the input training reference image.
- the feedback loss function compares the predicted depth with ground truth depth of the input training reference image.
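A hedged sketch of such a feedback loss, assuming binary cross-entropy for the segmentation classification error and mean absolute error for the depth comparison (the specification does not name the exact loss functions, so both choices are illustrative):

```python
import numpy as np

def segmentation_loss(pred_mask, true_mask, eps=1e-7):
    # Binary cross-entropy between the predicted mask and the
    # classification ground truth from the training reference image.
    p = np.clip(pred_mask, eps, 1 - eps)
    return -np.mean(true_mask * np.log(p) + (1 - true_mask) * np.log(1 - p))

def depth_loss(pred_depth, true_depth):
    # Mean absolute error against the ground-truth depth.
    return np.mean(np.abs(pred_depth - true_depth))

def total_loss(pred_mask, true_mask, pred_depth, true_depth, weight=1.0):
    # Combined loss fed back through the back propagation algorithm.
    return segmentation_loss(pred_mask, true_mask) + weight * depth_loss(pred_depth, true_depth)
```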
- the segmentation phase 218 implemented by the Engine receives an input source image 402 and feeds the input source image 402 into an encoder/decoder network.
- the segmentation phase 218 outputs a segmented mask image 404 based on the source image 402.
- the segmented mask image 404 is based on content of the input source image 402, whereby the encoder/decoder network removes one or more sections of background content in the input source image 402 that does not include content that corresponds to the object of interest, such as a shoe.
- the Engine identifies a corresponding reference image 406 upon receipt of the input source image 402.
- the corresponding reference image thereby includes content that portrays the same type of object in a predefined pose whereas a variation of that pose is portrayed in the input source image 402.
- the segmentation phase 218 further generates a segmented mask image 408 based on the reference image 406.
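The effect of the segmented mask, zeroing background pixels so only content corresponding to the object of interest remains, can be sketched as follows; here the mask is supplied directly rather than predicted by the encoder/decoder network.

```python
import numpy as np

def apply_mask(image, mask):
    """image: HxWxC array; mask: HxW binary array (1 = object pixel).
    Returns the image with all background pixels zeroed."""
    return image * mask[..., None]

image = np.ones((2, 2, 3))                      # toy all-white image
mask = np.array([[1, 0], [0, 1]], dtype=float)  # diagonal "object" pixels
segmented = apply_mask(image, mask)
```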
- the depth estimation phase 220 implemented by the Engine receives a segmented mask source image 502 output from the segmentation phase 218 and inputs the segmented mask source image 502 into a depth estimation network.
- the depth estimation phase 220 outputs a depth map source image 504 based on the input segmented mask source image 502.
- the depth estimation network determines a distance of an emulated camera from each pixel in the input segmented mask image 502 to generate a depth map source image 504 that represents a predicted shape of the object portrayed in the input segmented mask source image 502.
- the Engine further inputs the corresponding segmented mask source image 506 into the depth estimation network.
- the depth estimation network outputs a depth map for the reference image 508 as well, whereby the depth map for the reference image 508 represents a predicted shape of the object portrayed in the reference image.
- the Engine identifies one or more scaling factors 604 associated with the source image 402.
- one or more of the scaling factors 604 may be: a focal length associated with a camera that captured the source image, a type of light sensor associated with the camera that captured the source image, and a pre-defined size measurement related to an instance of the object portrayed in the source image.
- the scaling phase 222 applies the scaling factors 604 to the depth map reference image 602 in order to modify and scale the image 602 in order to generate a scaled depth map reference image 606.
- By generating the scaled depth map reference image, the Engine prepares a reference image scaled to include one or more characteristics of an image captured by the same camera that captured the source image currently undergoing processing for pose correction.
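A sketch of the scaling phase under the standard pinhole relation, assuming a known physical size of the object and its apparent size in pixels; the exact formula and all values below are illustrative assumptions, not taken from the specification.

```python
import numpy as np

def scale_factor(known_size_mm, apparent_size_px, focal_length_px):
    # Pinhole relation: depth ~ focal_length * known_size / apparent_size.
    return focal_length_px * known_size_mm / apparent_size_px

def scale_depth_map(depth_map_ref, factor):
    # Apply the scaling factor to the reference depth map (image 602).
    return depth_map_ref * factor

factor = scale_factor(known_size_mm=300.0, apparent_size_px=600.0,
                      focal_length_px=1000.0)
scaled = scale_depth_map(np.ones((2, 2)), factor)
```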
- a depth map source image 702 and a scaled depth map reference image 704 are received as input.
- the Engine determines differences between the images 702, 704 by aligning respective portions of the images 702, 704.
- the Engine identifies occurrences of misalignment between the respective image portions the Engine attempted to align.
- the Engine generates one or more pose correction parameters 706.
- the pose correction parameters 706 may be one or more translation parameters and one or more rotation parameters.
- the Engine applies the pose correction parameters 706 to the corresponding original source image to generate a pose corrected image 214.
- Application of the pose correction parameters 706 emulates graphically rotating various portions of the object portrayed in the source image to generate additional object portions incorporated with the content from the source image. The incorporated additional object portions create a visual appearance of the object in the source image as precisely positioned in a particular predefined pose.
- the incorporated additional object portions represent segments or areas of the objects that were not portrayed in the source image (i.e. obscured) but should have been if the object portrayed in the source image had been precisely positioned according to the particular predefined pose.
- the Engine predicts a source 3D map of the shape of the object based on a depth map source image 710 and predicts a reference 3D map of the object based on the scaled-depth map reference image 712.
- the Engine projects each of the source 3D map and the reference 3D map into a 3D space and aligns the projected source and reference 3D maps to identify one or more differences.
- the Engine implements an iterative closest point algorithm to align the projected source and reference 3D maps.
- the Engine may implement a SIFT (Scale Invariant Feature Transform) algorithm for alignment.
- a misalignment may be, for example, one or more respective differences between a section(s) of the source 3D map and a corresponding section(s) of the reference 3D map.
- the Engine further identifies misalignment due to a conflict of colors present in various sections of the source 3D map and the reference 3D map that do not overlap.
- the Engine generates source point cloud data 714 based on the depth map source image 710 and generates reference point cloud data 716 based on the scaled-depth map reference image 712. It is understood that the point cloud data 714, 716 may be based on a segmentation(s) representative of an object portrayed in the depth map images 710, 712.
- the Engine attempts to align the point cloud data 714, 716 to identify differences between the point cloud data 714, 716.
- the Engine identifies misalignment 718 between the point cloud data 714, 716.
- identified misalignment 718 may be respective portions of the point cloud data 714 that do not overlap with respective portions of the point cloud data 716. Such identified misalignments 718 (i.e. respective portions without overlap) are identified by the Engine as differences for which the pose correction parameters 706 are generated to correct.
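A minimal sketch of flagging non-overlapping portions: mark each source point whose nearest reference point lies beyond a distance threshold. The threshold value and function name are assumptions for illustration; the patent does not specify a particular overlap test:

```python
import numpy as np

def misaligned_points(source_pts, reference_pts, threshold=0.05):
    """Return a boolean mask over source_pts marking points whose nearest
    reference point is farther than `threshold` -- a simple stand-in for the
    'portions without overlap' that the Engine flags as misalignment 718.
    The threshold value is an assumption, not taken from the patent."""
    # Brute-force nearest-neighbour distances (adequate for small clouds;
    # a k-d tree would be preferred at scale).
    diffs = source_pts[:, None, :] - reference_pts[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return nearest > threshold
```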
- the Engine may receive a depth map source image 732 and a scaled depth map reference image 736 and may further segment out a particular predefined portion of the object portrayed in the images 732, 736. For example, if the object is a shoe, a heel tab shoe portion may be predefined as a registration portion to be analyzed during the registration phase 224.
- the Engine generates a segmentation mask 734, 738 of the heel tab from both images 732, 736.
- the Engine attempts to align heel tab segmentation masks 734, 738 in order to identify differences between the heel tab segmentation masks 734, 738.
- the Engine generates pose correction parameters 706 for adjusting a pose of the heel tab in the source segmentation mask 734 for alignment with the pose of the heel tab portrayed in the reference segmentation mask 738.
- the Engine further applies the pose correction parameters 706 generated for the heel tab in the source segmentation mask 734 to other various sections of the corresponding source image in order to generate the pose corrected image 214.
- the registration phase 224 aligns two or more images taken at different positions.
- the registration phase 224 aims to geometrically align the source image (portraying an object according to a variation of a predefined pose) with a reference image (portraying the same type of object precisely according to the predefined pose).
- the Engine calculates depth for both the source and reference images using a monocular depth estimation network.
- the predicted depth is obtained for the source image as well as the reference image.
- depth for the pose of the reference image may be obtained by a depth sensor, or a computer-generated 3D model may be used.
- depth is based on a distance value for each pixel in an image, where the distance value represents a measurement of distance between the respective pixel and a predefined position and orientation of a simulated camera represented as being external to the image.
- the Engine reprojects the predicted depths to obtain 3D point clouds for each of the source image and the reference image.
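The reprojection step above is standard pinhole back-projection. A sketch, assuming known camera intrinsics (fx, fy, cx, cy) — the patent does not state how the intrinsics are obtained:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 3) point cloud via the pinhole
    model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
    Pixels with zero depth (e.g. outside the segmentation mask) are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

Applying this to the predicted source depth and the scaled reference depth yields the two 3D point clouds that the registration algorithms then align.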
- the Engine executes one or more registration algorithms to compute a transformation matrix to align the 3D point clouds.
- the Engine may perform a two-step registration, which includes global registration followed by a local registration.
- Global registration methods include algorithms that do not require a proper (or adequate) initialization. Such global registration methods produce alignment results with a higher degree of error and are used as initialization for local methods.
- Local registration methods use the initial alignment and produce alignment results with a lower degree of error.
- the Engine implements global registration according to one or more of the following algorithms: Scale Invariant Feature Transform, Fast Point Feature Histogram, Random Sample Consensus.
- the Engine implements one or more of the following algorithms: iterative closest point (ICP) and colored iterative closest point.
- the Engine implements the ICP algorithm during local registration by, for each point in the source point cloud, matching a closest point in the reference point cloud.
- the Engine estimates a combination of rotation and translation parameters that minimizes a mean square point-to-point distance metric, optimally (i.e. best) aligning each source point to the corresponding matching reference point identified in the previous closest-point matching stage.
- the Engine transforms the source points according to a transformation based on the rotation and translation parameters.
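The three ICP steps described above (match closest points, estimate rotation and translation minimizing the mean square point-to-point distance, transform the source points) can be sketched as follows. This is a minimal point-to-point ICP using the standard SVD (Kabsch) solution, not the patent's own implementation:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src -> dst for
    already-matched point pairs (Kabsch / SVD solution)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, reference, iters=20):
    """Point-to-point ICP: repeatedly match each source point to its closest
    reference point, solve for R and t, and transform the source points."""
    src = source.copy()
    for _ in range(iters):
        # Closest-point matching (brute force for clarity).
        d = ((src[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        matched = reference[d.argmin(axis=1)]
        # Estimate and apply the aligning rotation and translation.
        R, t = best_rigid_transform(src, matched)
        src = src @ R.T + t
    return src
```

The returned rotation and translation at each step correspond to the pose correction parameters 706; in practice the global registration result would seed the initial alignment.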
- the Engine implements the Color ICP algorithm during local registration.
- the Engine implements the Color ICP algorithm by performing segmentation of the reference image to extract one or more segments and modifies the RGB (red-green-blue) values of each pixel in the segment(s) based on an average RGB value of pixels in the source image.
- the Engine utilizes a deep colorization network.
- FIG. 8 illustrates a diagram 800 of a lightbox.
- a lightbox may be a physical structure in which any type of physical object may be placed according to various predefined physical positions and orientations.
- a shoe may be situated in a fixed position on a platform 814 within the lightbox.
- the lightbox may have multiple image capture devices (e.g., image capture devices 802, 804, 806, 808, 810, 812).
- a source image of the shoe may be captured by a particular image capture device 802. Since the relative position of all other image capture devices 804, 806, 808, 810, 812 is known, the Engine can implement pose correction of source images related to various predefined poses that are different than a particular predefined pose that corresponds with a single source image, such as a single source image captured by a particular image capture device 802. Stated differently, a first source image may portray the shoe according to a first predefined pose and a plurality of other source images may portray the same shoe according to various different predefined poses.
- during the registration phase, the Engine generates respective translation and rotation parameters for a transformation to generate a pose corrected image for the first source image.
- the Engine need not generate additional translation and rotation parameters for transforming each of the other source images. Rather, the Engine utilizes the translation and rotation parameters generated for the first source image to further generate respective translation and rotation parameters for each of the other source images based on each source image's fixed position.
- the image capture device 802 for the first source image may be defined as a reference device having reference coordinates based on the known position of the device 802 at the lightbox with respect to the shoe on the platform 814 and the other devices 804, 806, 808, 810, 812.
- a particular different device 804 may be defined as having relative coordinates with respect to the reference coordinates of the reference device 802.
- the Engine generates a reference transformation to generate a pose corrected image for the first source image from the reference device 802.
- the Engine maps the reference transformation used for the first source image to a relative transformation that corresponds with images from the different device 804.
- the Engine calculates a change of orientation from a center location on the reference device 802 to a center location on the different device 804.
- the Engine adjusts the reference transformation for pose correction to account for the change of orientation from the center device location of the reference device 802 to the center device location of the different device 804.
- the resulting adjusted reference transformation thereby represents pose correction parameters for images captured by the different device 804, generated without having to execute a global and local registration process on an image captured by the different device.
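One way to express this reuse is to conjugate the reference correction with the known extrinsic between the two devices. The patent only states qualitatively that the known relative device positions allow the correction to be transferred; the conjugation form below, and all function names, are assumptions for illustration:

```python
import numpy as np

def make_T(R, t):
    """Pack a rotation R (3x3) and translation t (3,) into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transfer_correction(T_ref_cam, E_ref_to_other):
    """Express a pose-correction transform computed in the reference device's
    frame (T_ref_cam) in another device's frame, given the known 4x4 extrinsic
    E_ref_to_other mapping reference-device coordinates to the other device's
    coordinates. The correction is conjugated by the extrinsic."""
    return E_ref_to_other @ T_ref_cam @ np.linalg.inv(E_ref_to_other)
```

With this form, setting the extrinsic to the identity recovers the original correction, matching the intuition that the reference device needs no adjustment.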
- various embodiments of the Engine described herein may use any suitable machine learning training techniques to train the machine learning network 130 for each sensor, including, but not limited to a neural net based algorithm, such as Artificial Neural Network, Deep Learning; a robust linear regression algorithm, such as Random Sample Consensus, Huber Regression, or Theil-Sen Estimator; a kernel based approach like a Support Vector Machine and Kernel Ridge Regression; a tree-based algorithm, such as Classification and Regression Tree, Random Forest, Extra Tree, Gradient Boost Machine, or Alternating Model Tree; Naive Bayes Classifier; and other suitable machine learning algorithms.
- FIG. 9 illustrates an example machine of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
- the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, and/or the Internet.
- the machine may operate in the capacity of a server or a client machine in client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, or as a server or a client machine in a cloud computing infrastructure or environment.
- the machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, a switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- machine shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
- the example computer system 900 includes a processing device 902, a main memory 904 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 906 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 918, which communicate with each other via a bus 930.
- Processing device 902 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 902 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 902 is configured to execute instructions 926 for performing the operations and steps discussed herein.
- the computer system 900 may further include a network interface device 908 to communicate over the network 920.
- the computer system 900 also may include a video display unit 910 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 912 (e.g., a keyboard), a cursor control device 914 (e.g., a mouse), a graphics processing unit 922, a signal generation device 916 (e.g., a speaker), a video processing unit 928, and an audio processing unit 932.
- the data storage device 918 may include a machine-readable storage medium 924 (also known as a computer-readable medium) on which is stored one or more sets of instructions or software 926 embodying any one or more of the methodologies or functions described herein.
- the instructions 926 may also reside, completely or at least partially, within the main memory 904 and/or within the processing device 902 during execution thereof by the computer system 900, the main memory 904 and the processing device 902 also constituting machine-readable storage media.
- the instructions 926 include instructions to implement functionality corresponding to the components of a device to perform the disclosure herein.
- while the machine-readable storage medium 924 is shown in an example implementation to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “machine-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
- the term “machine-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media.
- the present disclosure also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the intended purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- the present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure.
- a machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer).
- a machine- readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as a read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280062892.0A CN117980950A (en) | 2021-09-02 | 2022-09-02 | Monocular pose estimation and correction |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN202121039730 | 2021-09-02 | ||
IN202121039730 | 2021-09-02 | ||
US17/503,549 | 2021-10-18 | ||
US17/503,549 US11430152B1 (en) | 2021-10-18 | 2021-10-18 | Monocular pose estimation and correction |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023034548A1 true WO2023034548A1 (en) | 2023-03-09 |
Family
ID=85411554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/042416 WO2023034548A1 (en) | 2021-09-02 | 2022-09-02 | Monocular pose estimation and correction |
Country Status (2)
Country | Link |
---|---|
US (1) | US20230143551A1 (en) |
WO (1) | WO2023034548A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170154426A1 (en) * | 2015-11-29 | 2017-06-01 | Seedonk, Inc. | Image analysis and orientation correction for target object detection and validation |
US20210065432A1 (en) * | 2019-03-19 | 2021-03-04 | Sony Interactive Entertainment Inc. | Method and system for generating an image of a subject in a scene |
2022
- 2022-07-24 US US17/871,958 patent/US20230143551A1/en active Pending
- 2022-09-02 WO PCT/US2022/042416 patent/WO2023034548A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
1 January 2011, SPRINGER , article RICHARD SZELISKI: "Computer Vision Algorithms and Applications", pages: 185 - 196, XP055423450 * |
ALCORN MICHAEL A.; LI QI; GONG ZHITAO; WANG CHENGFEI; MAI LONG; KU WEI-SHINN; NGUYEN ANH: "Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 15 June 2019 (2019-06-15), pages 4840 - 4849, XP033687299, DOI: 10.1109/CVPR.2019.00498 * |
ZHANG, XIAOZHENG ET AL.: "Face recognition across pose: A review", PATTERN RECOGNITION, vol. 42, no. 11, 2009, pages 2876 - 2896, XP026250877, DOI: 10.1016/j.patcog.2009.04.017 * |
Also Published As
Publication number | Publication date |
---|---|
US20230143551A1 (en) | 2023-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10163003B2 (en) | Recognizing combinations of body shape, pose, and clothing in three-dimensional input images | |
CN108256479B (en) | Face tracking method and device | |
EP3086282B1 (en) | Image processing device and image processing method | |
CN108734185B (en) | Image verification method and device | |
US11380017B2 (en) | Dual-view angle image calibration method and apparatus, storage medium and electronic device | |
US20130215113A1 (en) | Systems and methods for animating the faces of 3d characters using images of human faces | |
US11151583B2 (en) | Shoe authentication device and authentication process | |
Rana et al. | Learning-based tone mapping operator for efficient image matching | |
US9135712B2 (en) | Image recognition system in a cloud environment | |
US20220335682A1 (en) | Generating physically-based material maps | |
WO2019228144A1 (en) | Image processing method and device | |
Wang et al. | Facial expression-aware face frontalization | |
US10529085B2 (en) | Hardware disparity evaluation for stereo matching | |
US11430152B1 (en) | Monocular pose estimation and correction | |
JP5500404B1 (en) | Image processing apparatus and program thereof | |
US20230143551A1 (en) | Pose estimation and correction | |
US20220207917A1 (en) | Facial expression image processing method and apparatus, and electronic device | |
WO2021220688A1 (en) | Reinforcement learning model for labeling spatial relationships between images | |
JPWO2014092193A1 (en) | Image processing device, image processing method, image processing program, program, and mobile terminal device with camera | |
TWI776668B (en) | Image processing method and image processing system | |
US20240177414A1 (en) | 3d generation of diverse categories and scenes | |
US11361447B2 (en) | Image cropping using pre-generated metadata | |
CN109559343B (en) | Image processing method and device for container | |
Polevoi et al. | White Balance Correction for Detecting Holograms in Color Images of Black-and-White Photographs | |
Huanca Marin et al. | FindImplant: An Online Application for Visualizing the Dental Implants from X-Ray Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22865589 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280062892.0 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022865589 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022865589 Country of ref document: EP Effective date: 20240402 |