WO2025193512A1 - Single shot 3d modelling from 2d image - Google Patents
Info
- Publication number
- WO2025193512A1 (PCT/US2025/018753)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- product
- model
- mesh model
- representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three-dimensional [3D] modelling for computer graphics
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/087—Inventory or stock management, e.g. order filling, procurement or balancing against orders
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional [3D] objects
- G06V20/647—Three-dimensional [3D] objects by matching two-dimensional images to three-dimensional objects
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07G—REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
- G07G1/00—Cash registers
- G07G1/0036—Checkout procedures
- G07G1/0045—Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader
- G07G1/0054—Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader with control of supplementary check-parameters, e.g. weight or number of articles
- G07G1/0063—Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader with control of supplementary check-parameters, e.g. weight or number of articles with means for detecting the geometric dimensions of the article of which the code is read, such as its size or height, for the verification of the registration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
Disclosed herein are a method and process to generate a 3D model of an object based on a one-shot approach using a single 2D image of the object. The process comprises the steps of classifying the image into object shapes, segmenting the object outline from the image along with its texture information, generating a 3D mesh model relevant to the classified shape, and rendering the texture onto the 3D model. The model may then be manipulated to create and export different images at different viewpoints of the object.
Description
SINGLE SHOT 3D MODELLING FROM 2D IMAGE
Related Applications
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/565,242, filed March 14, 2024, the contents of which are incorporated herein in their entirety.
Background
[0002] In a retail setting, it is desirable to be able to use computer vision methods to detect and identify products to aid in management of a retail establishment. For example, computer vision may be used to detect and identify products for various tasks, such as tracking product inventory, determining out-of-stock products, determining misplaced products, and verifying proper checkout of the products, particularly when the store uses self-checkout stations.
[0003] To this end, numerous computer vision methods have been developed and many real-world applications based on those computer vision methods perform at a satisfactory level. Currently, various visual sensors (e.g., fixed cameras, robots, drones, and mobile phones) have been deployed in retail stores, enabling the application of advanced technologies to ease shopping and store management tasks.
[0004] The identification of products is often used in downstream tasks, such as inventory or self-checkout verification. However, products can appear in arbitrary poses in a real-world retail scene, especially when the image is taken by a camera not facing straight towards the shelf or when the product is being handled at a self-checkout station. For example, images could show products at arbitrary angles, rotated, crumpled (e.g., in the case of products packaged in bags, such as potato chips), color-jittered, over-exposed, under-exposed, etc. Because of these difficulties, images collected in a retail setting may not be able to be matched to a pristine image of the product, such as an image provided by the manufacturer. Therefore, these images may not be accurately identified for the downstream tasks.
Summary of the Invention
[0005] To address the issues identified above, disclosed herein is a novel system and method for generating 3D models and diverse data of different viewpoints from a single 2D image based on a one-shot approach. This method streamlines the creation of rich datasets that empower machine learning models with a comprehensive understanding of product shapes and textures. The single-shot methodology is invaluable during the training phase of a machine learning model, as it ensures the model learns from a varied array of product representations, enhancing its capacity for accurate and nuanced product matching. In inference scenarios, the trained machine learning models exhibit exceptional accuracy in swiftly and precisely identifying products based on their visual characteristics. The methodology disclosed herein optimizes both the training and inference stages, making product matching more efficient and reliable than prior art systems, thus representing a significant improvement.
Brief Description of the Drawings
[0006] By way of example, specific exemplary embodiments of the disclosed system and method will now be described, with reference to the accompanying drawings, in which:
[0007] FIG. 1 is a flowchart showing the processes for building and using a 3D model from a single 2D image of the product.
[0008] FIG. 2 is a data flow diagram showing the various components of the system and data flowing therebetween.
[0009] FIG. 3 shows examples of initial 2D images of various products.
[0010] FIG. 4 shows an example of shape classification.
[0011] FIG. 5 shows examples of 3D representations of 2D images generated by the disclosed method.
[0012] FIG. 6 shows several views of a 3D representation of a product using the same image for the front and back of the 3D representation.
Detailed Description
[0013] The disclosed processes are explained herein in the context of use in a retail grocery establishment to generate multiple views of various products. However, as would be understood by one of skill in the art, the processes may be used to generate multiple views of any objects for any purpose.
[0014] Disclosed herein is a method and process to generate 3D models of products based on a one-shot approach using a single 2D image of the product. As a high-level description, the process comprises the steps of 1) loading the image of the product; 2) classifying the image into product shapes; 3) segmenting the product outline from the image along with the texture information; 4) loading the 3D mesh model relevant to the predicted shape; 5) rendering the texture information onto the 3D model; and 6) exporting different images at different viewpoints of the 3D product model.
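For orientation, the six steps can be read as a single pipeline. The following is a minimal Python sketch of that flow; every function name here (classify_shape, segment_and_sample, extract_texture, make_mesh, render_texture, export_viewpoints) is a placeholder of our own, not a name from the patent, and each is sketched in the sections that follow.

```python
import cv2  # OpenCV, used here only to load the image

def single_shot_3d(image_path: str) -> None:
    shape = classify_shape(image_path)          # steps 1-2: load image, classify shape
    img = cv2.imread(image_path)
    dims, mask, bbox = segment_and_sample(img)  # step 3a: outline + dimensional sampling
    texture = extract_texture(img, mask, bbox)  # step 3b: texture information
    mesh = make_mesh(shape, dims)               # step 4: mesh for the predicted shape
    mesh = render_texture(mesh, texture)        # step 5: render texture onto the mesh
    export_viewpoints(mesh)                     # step 6: export images from viewpoints
```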
[0015] A flowchart of the process is shown in FIG. 1. In a first step of the process, 2D images 101 of various products are obtained, examples of which are shown in FIG. 3. In one embodiment, a standard image of the product is retrieved. The standard image may be provided, for example, by the manufacturer of the product or captured using any camera. The standard image is preferably a pristine frontal image showing an unobstructed and unobscured view of the primary face of the product.
[0016] Following the input of image 101, the system employs advanced image classification techniques to categorize the product within the image into specific shapes. This crucial classification forms the basis for subsequent decisions, guiding the system to the appropriate 3D mesh model that best corresponds to the identified product shape. The standard image 101 of a product is classified into one of several distinct shapes at step 102 of the process. Examples of shape classification are shown in FIG. 4. The classification may be performed using a trained machine learning classifier 202, shown in FIG. 2. In one embodiment, the trained classifier may be, for example, ResNet-34, which is a classification model structured as a 34-layer convolutional neural network. In other embodiments, any type or architecture of classifier may be used.
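As a concrete illustration, below is a minimal sketch of such a classifier using the off-the-shelf ResNet-34 from torchvision, with its final layer swapped for the six grocery shape classes discussed in the next paragraph. The preprocessing sizes and the class list are illustrative assumptions; in practice the replaced head would be fine-tuned on labeled product images before use.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

SHAPE_CLASSES = ["bag", "bottle", "box", "candy bag", "packet", "unknown"]

# Start from an ImageNet-pretrained ResNet-34 and swap in a 6-way head;
# fine-tuned weights would be loaded here in a real system.
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(SHAPE_CLASSES))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_shape(image_path: str) -> str:
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return SHAPE_CLASSES[logits.argmax(dim=1).item()]
```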
[0017] The product shape classification 203 resulting from the input of standard image 101 to classifier 202 may depend on the particular type of objects being classified. For example, in the context of a grocery retail establishment, products may be classified into one of {bag, bottle, box, candy bag, packet, unknown}. In other contexts, the classifications are likely to be different.
[0018] The next step involves the segmentation of the product outline from the 2D image. This process is fundamental for isolating the contours of the product and for providing the foundation for texture extraction. Based on the shape classification, the product is segmented from image 101 at step 104 using segmenter 204. Segmenter 204 first obtains an outline of the product from image 101. The contours of the outline of the product are dimensionally sampled to obtain dimensional data (e.g., height, width, etc.), depending on the classified shape of the product. For example, for a product classified as a bottle, the dimensions may be an aspect ratio and a list of dimensions of various portions of the bottle. The outline is then isolated from image 101 by removing white space or background from image 101. The isolated view of the product may be saved in a separate file. The segmentation and dimensional sampling may be performed by a neural network trained to identify regions in an image and classify them into different classes, or by any other means.
[0019] Once the segmenting and dimensional sampling of the product are complete, texture information 206 regarding the product is extracted. The texture information may be, for example, the pixel-scale structure perceived in an image, based on the spatial arrangement of colors or intensities.
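Under a masked-crop reading of texture information 206 (the product's own pixels with the background removed), extraction can be as simple as the following sketch, which returns an RGBA image whose alpha channel comes from the segmentation mask. The RGBA convention is our assumption; it simply keeps the background transparent for the texturing step.

```python
import cv2
import numpy as np
from PIL import Image

def extract_texture(img: np.ndarray, mask: np.ndarray, bbox) -> Image.Image:
    """Cut out the product's pixels as an RGBA texture; background turns transparent."""
    x, y, w, h = bbox
    rgb = cv2.cvtColor(img[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
    rgba = cv2.merge([*cv2.split(rgb), mask[y:y + h, x:x + w]])
    return Image.fromarray(rgba)
```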
[0020] The system then dynamically generates the relevant 3D mesh model, setting the stage for the subsequent transformation steps. In step 106 of the process, a 3D mesh model 207 of the product is generated, based on the dimensional data 205 and, in some embodiments, the shape classification 203. In one embodiment, a Unity® engine may be used to generate the 3D mesh model 207. In other embodiments, any other means may be used to generate the 3D mesh model 207.
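The patent names a Unity® engine as one option; as a library-agnostic stand-in, the sketch below builds a primitive with the open-source trimesh package, choosing the primitive by shape class and scaling it from the sampled dimensions. The depth ratios are pure assumptions, since a single frontal image carries no depth information.

```python
import trimesh

def make_mesh(shape: str, dims: dict) -> trimesh.Trimesh:
    """Pick a primitive mesh for the predicted shape, scaled by sampled dimensions."""
    w, h = float(dims["width_px"]), float(dims["height_px"])
    if shape == "bottle":
        return trimesh.creation.cylinder(radius=w / 2.0, height=h, sections=48)
    if shape == "box":
        # No depth cue in a single frontal image, so depth is an assumed ratio
        return trimesh.creation.box(extents=[w, h, 0.4 * w])
    # bag / candy bag / packet / unknown: a thin slab as a neutral default
    return trimesh.creation.box(extents=[w, h, 0.15 * w])
```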
[0021] In a final step 108 in the creation of the 3D model, the extracted texture information 206, which encapsulates the visual characteristics of the product, is seamlessly rendered onto the 3D mesh model 207. This fusion of shape and texture culminates in a vivid and realistic 3D representation 208 of the product. Examples of 3D representations and the corresponding 2D images from which they were generated by the disclosed method are shown in FIG. 5. In one embodiment, for convenience, the 3D model may show the primary face of the product on both the front and back sides of the 3D model, as shown in FIG. 6. In other embodiments, multiple 2D images of the product may be captured and texture information extracted therefrom for rendering on other sides of the 3D model (e.g., all 6 sides of a box).
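A minimal way to realize this step, including the front-and-back convention of FIG. 6, is to texture two parallel quads sized from the mesh's bounding box. This flat-card simplification, and the mirrored back-face UVs, are our assumptions; a production system would UV-unwrap the full primitive.

```python
import numpy as np
import trimesh
from PIL import Image

def render_texture(mesh: trimesh.Trimesh, texture: Image.Image) -> trimesh.Trimesh:
    """Map the primary-face texture onto front and back faces, as in FIG. 6."""
    (x0, y0, z0), (x1, y1, z1) = mesh.bounds
    v = np.array([[x0, y0, z1], [x1, y0, z1], [x1, y1, z1], [x0, y1, z1],   # front
                  [x0, y0, z0], [x1, y0, z0], [x1, y1, z0], [x0, y1, z0]],  # back
                 dtype=float)
    f = np.array([[0, 1, 2], [0, 2, 3],        # front quad, normal +z
                  [5, 4, 7], [5, 7, 6]])       # back quad, normal -z
    uv = np.array([[0, 0], [1, 0], [1, 1], [0, 1],        # front
                   [1, 0], [0, 0], [0, 1], [1, 1]],       # back, mirrored in u
                  dtype=float)
    visual = trimesh.visual.TextureVisuals(uv=uv, image=texture)
    return trimesh.Trimesh(vertices=v, faces=f, visual=visual, process=False)
```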
[0022] In some embodiments, the system may export a series of images from various viewpoints. To offer a comprehensive view of the created 3D representation 208, the system exports, at step 112, a series of images 113 from various viewpoints. These exported images 113 showcase the product's intricacies, allowing for a thorough examination of its form and texture. This holistic approach not only simplifies the 3D modeling process but also ensures that the resultant model faithfully captures the visual essence of the product, making it an invaluable tool for diverse applications across industries. Once the 3D representation of the product 208 has been created, it can be spatially and visually manipulated at step 110 of the process, in one embodiment based on user input 111, or, in other embodiments, based on a standard set of manipulations. For example, 3D representation 208 may be rotated in 3 dimensions, illumination may be changed, certain portions of the product may be obscured, etc. For each manipulation, a 2D image may be captured at 112, thus producing a set of pose-altered images of the product 113.
[0023] The pose-altered images may become part of an enriched dataset used to train a neural network, for example a product detector. Features may be extracted from each pose-altered image and used as part of the enriched training dataset.
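As a trivial illustration of folding the exported views into such a dataset, the sketch below appends each pose-altered image path with its product label to a CSV manifest; the file names and manifest format are illustrative only, not from the patent.

```python
import csv
import glob

def append_to_manifest(product_id: str, view_dir: str,
                       out_csv: str = "train_manifest.csv") -> None:
    """Register each exported pose-altered view under the product's label."""
    rows = [(path, product_id) for path in sorted(glob.glob(f"{view_dir}/view_*.png"))]
    with open(out_csv, "a", newline="") as f:
        csv.writer(f).writerows(rows)
```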
[0024] The disclosed method for one-shot 3D model generation has proven instrumental in producing diverse datasets essential for both training and inference processes, particularly in the context of product matching. By leveraging a single 2D image, the disclosed methods seamlessly transform visual representations of products into comprehensive 3D models, enriching datasets with a nuanced understanding of product shapes and textures.
[0025] During the training phase, the generated diverse data becomes a valuable asset. The annotated datasets, comprising a myriad of product shapes and their corresponding 3D models, serve as a robust foundation for training machine learning models. The utilization of varied viewpoints and intricate details in the dataset ensures that the models gain a holistic understanding of the visual nuances associated with different products. This, in turn, enhances the model's capacity for accurate and efficient product matching during the inference phase.
[0026] In inference scenarios, the disclosed method excels in matching and identifying products based on their visual characteristics. The trained models, honed on the diverse dataset, showcase a remarkable ability to discern similarities and differences between products, facilitating precise product matching with a high level of accuracy. Whether it's for cataloging, retail applications, or other use cases requiring product identification, the disclosed method provides a robust and versatile solution.
[0027] The integration of the one-shot 3D model generation methodology has not only optimized the training process by diversifying datasets but has also elevated the performance of inference models, making product matching in our application more accurate and reliable.
[0028] In various embodiments of the invention, the system upon which the processes and methods are implemented may consist of software implementing the processes and executing on a general purpose computing device. The software may include a trained shape classifier 202 capable of determining the shape of individual products in an image. The software may further include a trained segmenter 204 for performing segmentation and dimensional sampling. The software may further include 3D model generator 206.
Claims
1. A method for generating a 3D model comprising: obtaining a 2D image of an object; classifying the object in the image; segmenting the object from the image; dimensionally sampling the segmented image of the object; extracting texture information from the segmented image; generating a 3D mesh model based at least on the dimensional sampling; and rendering the texture information onto the 3D mesh model to create a 3D representation of the object.
2. The method of claim 1 wherein classifying the object comprises: inputting the 2D image to a classifier trained to recognize a plurality of classes of objects.
3. The method of claim 2 wherein the objects are grocery items.
4. The method of claim 3 wherein the classifier classifies the grocery items into one of a bag object, a bottle object, a box object, a candy bag object, a packet object or an unknown other type of object.
5. The method of claim 1 wherein the 3D mesh model of the object is generated by a trained neural network based on an input of at least the dimensional samples.
6. The method of claim 5 wherein the 3D mesh model is also dependent on an input of the classification of the object.
7. The method of claim 1 wherein the texture information represents a primary face of the object.
8. The method of claim 7 wherein the texture information representing the primary face of the object is rendered on opposite sides of the 3D mesh model.
9. The method of claim 1 further comprising: manipulating the 3D representation to provide multiple different views of the object.
10. The method of claim 9 wherein the 3D representation is manipulated based on user input.
11. The method of claim 9 further comprising: capturing a 2D image of each manipulation of the 3D representation.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463565242P | 2024-03-14 | 2024-03-14 | |
| US63/565,242 | 2024-03-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025193512A1 (en) | 2025-09-18 |
Family
ID=97064393
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/018753 (WO2025193512A1, pending) | Single shot 3d modelling from 2d image | 2024-03-14 | 2025-03-06 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025193512A1 (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170032568A1 (en) * | 2013-12-10 | 2017-02-02 | Google Inc. | Methods and Systems for Providing a Preloader Animation for Image Viewers |
| US20200356813A1 (en) * | 2016-10-05 | 2020-11-12 | Digimarc Corporation | Image processing arrangements |
| US20180114368A1 (en) * | 2016-10-25 | 2018-04-26 | Adobe Systems Incorporated | Three-dimensional model manipulation and rendering |
| US20220068007A1 (en) * | 2020-09-02 | 2022-03-03 | Roblox Corporation | 3d asset generation from 2d images |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25765181; Country of ref document: EP; Kind code of ref document: A1 |