WO2019064266A1 - Data set creation for deep neural network - Google Patents

Data set creation for deep neural network

Info

Publication number
WO2019064266A1
Authority
WO
WIPO (PCT)
Prior art keywords
video footage
cameras
neural network
data set
video
Prior art date
Application number
PCT/IB2018/057568
Other languages
French (fr)
Inventor
Jegor LEVKOVSKIY
Enrico PANDIAN
Original Assignee
Checkout Technologies Srl
Priority date
Filing date
Publication date
Application filed by Checkout Technologies Srl filed Critical Checkout Technologies Srl
Publication of WO2019064266A1 publication Critical patent/WO2019064266A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B17/00Details of cameras or camera bodies; Accessories therefor
    • G03B17/56Accessories
    • G03B17/561Support related camera accessories
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus and method are disclosed for data set creation suited for the training of a deep neural network. The aim of the proposed system is to simplify and automate the routine of data set creation. The data set, in its final structure, is generated in an appropriate format for training, validating and testing the deep neural network. A physical item is the input to the system; it is processed by the video footage device, which produces as output a set of images sufficient for 3D model reconstruction of the item. Later processing involves human interaction through a web interface that permits generating the final metadata files with a description of the item's characteristics. The web application is equipped with a neural network that facilitates bounding-box selection and background subtraction in the images. Metadata files may be exported in various human-readable or binary formats, or streamed into an appropriate RDBMS or NoSQL storage system.

Description

DATA SET CREATION FOR DEEP NEURAL NETWORK
BACKGROUND OF THE INVENTION
The present invention relates generally to deep neural networks and more particularly, to a method and apparatus for automating the routine of data set creation.
Deep learning is the application of artificial neural networks with more than one hidden layer to learning tasks. Deep learning architectures such as deep neural networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to, and in some cases superior to, human experts.
The quality of the data set is essential for the training of deep learning networks. One of the hardest problems to solve in deep learning has nothing to do with neural nets: it is the problem of getting the right data in the right format. Deep learning models need a good training set to work properly. Collecting and constructing the training set takes time and domain-specific knowledge of where and how to gather relevant information. The problem of data is well described by the renowned scientist Andrew Ng in the following rocket analogy:
"I think Al is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won't make it to orbit. If you have a tiny engine and a ton of fuel, you can't even lift off. To build a rocket you need a huge engine and a lot of fuel. The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms. " Andrew Ng
An incorrectly constructed data set can lead to a poorly performing network despite the model's potential. The table below summarizes some deep learning models and their data sets:

             VGGNet                Pascal VOC              COCO
Used for     Image recognition     Image recognition       Image recognition
                                   and segmentation        and segmentation
Input        Images                Images                  Images
Output       1000 categories       20 object classes       80 object categories
Data size    1.2M images with      9,993 segmented         2.5M segmented
             assigned category     images                  object instances
Quoting the article "Microsoft COCO: Common Objects in Context":
"Segmenting 2,500,000 object instances is an extremely time consuming task requiring over 22 worker hours per 1,000 segmentations."
Hence, in recent times there is a pressing need to develop intelligent semi-automatic systems that will aid humans in the creation of data sets suited for the training of deep neural networks.
SUMMARY
The present invention relates to a video footage device consisting of a set of cameras supported by a camera support. The cameras have high resolution and can be of different types, such as fisheye, linear and depth cameras. A rotation plate is provided: a disk put in motion by a stepper motor and fully controlled remotely through a web interface; the rotation velocity, direction and angle are controlled parameters. The rotating plate is configured to expose the product in front of the cameras from different angles and distances, and may consist of multiple rotating plates of different diameters stacked one over the other.
In an embodiment, the rotation plate may be constructed of a transparent material, suited for footage of the bottom.
In an embodiment, an illumination device is provided, composed of a set of lamps.
In an embodiment, the device is situated within a box and the background of the box can be exchanged. In an embodiment, the background can be of different colors and may or may not have patterns on it.
In an embodiment, the device is configured so that when it starts rotating a video stream is sent remotely, while a copy of the video is stored on the local device.
In an embodiment, right after the video footage cycle of a product has finished, another product can be placed to be scanned by the device.
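As an illustration only, the remotely controlled rotation parameters described above could be modeled as follows; the class and function names, and the figure of 200 steps per revolution, are assumptions for the sketch, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class RotationCommand:
    """Parameters the web interface exposes for the rotating plate."""
    velocity_deg_s: float   # rotation velocity, degrees per second
    direction: int          # +1 clockwise, -1 counter-clockwise
    angle_deg: float        # total sweep angle for one footage cycle

def steps_for_command(cmd: RotationCommand, steps_per_rev: int = 200) -> int:
    """Convert a commanded sweep angle into signed stepper-motor steps."""
    steps = round(abs(cmd.angle_deg) / 360.0 * steps_per_rev)
    return cmd.direction * steps

cmd = RotationCommand(velocity_deg_s=10.0, direction=1, angle_deg=360.0)
print(steps_for_command(cmd))  # one full revolution -> 200 steps
```

In a real controller these parameters would be sent to the stepper driver; here the sketch only shows the angle-to-steps conversion.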
The invention further relates to a 3D model reconstruction method on the output videos from the device described above, wherein the output videos are used as input for an algorithm able to create a 3D model of the product.
In an embodiment, the 3D model is then used for data augmentation, in which one or more 3D models of products are processed at once by an algorithm to produce 2D images with many variations in scale and/or occlusion and/or translation and/or rotation and/or illumination and/or other conditions.
In an embodiment, high-resolution 2D images are generated, and they form the dataset with the annotations necessary for the deep neural network.
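A minimal sketch of the kind of 2D augmentation described above, assuming NumPy and a grayscale rendered view of the product; the function names and the particular transformations shown (translation and illumination gain only) are illustrative, not the patent's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(view: np.ndarray, canvas: int = 64):
    """Place a rendered 2D view onto a canvas with a random translation
    and illumination change; return the image plus its annotation."""
    h, w = view.shape[:2]
    out = np.zeros((canvas, canvas), dtype=np.float32)
    # random translation keeping the object fully inside the canvas
    y = int(rng.integers(0, canvas - h + 1))
    x = int(rng.integers(0, canvas - w + 1))
    # random illumination gain, clipped back to valid intensity range
    gain = rng.uniform(0.7, 1.3)
    out[y:y + h, x:x + w] = np.clip(view * gain, 0.0, 1.0)
    # bounding-box annotation of the kind a detector data set needs
    ann = {"bbox": (x, y, w, h), "label": "product"}
    return out, ann

view = np.ones((16, 16), dtype=np.float32)  # stand-in for a rendered 3D view
img, ann = augment(view)
```

Each call produces one annotated 2D image; looping over random seeds, scales and occlusion masks would generate the full augmented data set.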
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 shows the video footage device scheme. Two views are presented: the front view and the top view. The video footage device is composed of the rotating plate and four cameras. The cameras are placed in the vertical plane as shown in the figure.
Camera 1 (horizontal): acquires video stream in horizontal plane, 0°
Camera 2 (oblique): acquires video stream in plane rotated -45°
Camera 3 (top): acquires video stream in plane rotated -90°
Camera 4 (bottom): acquires video stream in plane rotated 90°
DESCRIPTION
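The four camera orientations listed above can be expressed as rotation matrices about the horizontal axis; this sketch is one interpretation of the figure, not part of the disclosure:

```python
import math

def camera_rotation(angle_deg: float):
    """3x3 rotation matrix for a camera tilted by angle_deg in the
    vertical plane (rotation about the horizontal x-axis), row-major."""
    a = math.radians(angle_deg)
    return [
        [1.0, 0.0, 0.0],
        [0.0, math.cos(a), -math.sin(a)],
        [0.0, math.sin(a), math.cos(a)],
    ]

# the four camera poses from FIG. 1
poses = {name: camera_rotation(angle) for name, angle in
         [("horizontal", 0), ("oblique", -45), ("top", -90), ("bottom", 90)]}
```

Combined with the known plate rotation angle, such matrices give the relative pose (translation and rotation) needed for multi-view 3D reconstruction.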
Accurate three-dimensional shape reconstruction of objects using a video footage device can be achieved by using 3D imaging techniques along with the relative pose (i.e., translation and rotation) of the object. For this purpose, a novel device and the corresponding methods are proposed to construct a data set suited for the training of a deep neural network.
The data set creation is performed in the following steps:
Step 1. Video capturing
Step 2. Video processing
Step 3. Object extraction from video
Step 4. 3D object model reconstruction
Step 5. Metadata export
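The five steps above can be sketched as a pipeline; every helper function here is a hypothetical placeholder, since the patent does not specify the underlying algorithms:

```python
import json
from pathlib import Path

def create_data_set(product_id: str, videos: list) -> dict:
    """Sketch of the five-step pipeline; each helper below stands in for
    the processing stage of the same name."""
    frames = [f for v in videos for f in decode_video(v)]   # Steps 1-2
    crops = [extract_object(f) for f in frames]             # Step 3
    model_3d = reconstruct_3d(crops)                        # Step 4
    metadata = {"product": product_id,                      # Step 5
                "n_frames": len(frames),
                "model": model_3d}
    return metadata

# placeholder stage implementations so the sketch runs end to end
def decode_video(path):
    return [f"{path}:frame{i}" for i in range(3)]

def extract_object(frame):
    return f"crop({frame})"

def reconstruct_3d(crops):
    return {"n_views": len(crops)}

meta = create_data_set("sku-001", [Path("cam1.mp4"), Path("cam2.mp4")])
print(json.dumps(meta))
```

In the disclosed system, Step 5 would serialize such metadata to human-readable or binary formats, or stream it into an RDBMS or NoSQL store.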

Claims

1. A video footage device, consisting of a set of cameras supported by a camera support; the cameras have high resolution and can be of different types, such as fisheye, linear and depth cameras; a rotation plate is provided, which is a disk put in motion by a stepper motor and fully controlled remotely through a web interface; the rotation velocity, direction and angle are controlled parameters; the rotating plate is configured to expose the product in front of the cameras from different angles and distances; the rotating plate may consist of multiple rotating plates of different diameters stacked one over the other.
2. The video footage device according to claim 1, wherein the rotation plate may be constructed of a transparent material, suited for footage of the bottom.
3. The video footage device according to one or more of the preceding claims, wherein an illumination device is provided, composed of a set of lamps.
4. The video footage device according to one or more of the preceding claims, wherein the device is situated within a box and the background of the box can be exchanged.
5. The video footage device according to claim 4, wherein the background can be of different colors and may or may not have patterns on it.
6. The video footage device according to one or more of the preceding claims, wherein the device is configured so that when it starts rotating a video stream is sent remotely, while a copy of the video is stored on the local device.
7. The video footage device according to one or more of the preceding claims, wherein, right after the video footage cycle of a product has finished, another product can be placed to be scanned by the device.
8. A 3D model reconstruction method on the output videos from the device of claim 1, characterized in that the output videos are used as input for an algorithm able to create a 3D model of the product.
9. The method according to claim 8, wherein the 3D model is then used for data augmentation, in which one or more 3D models of products are processed at once by an algorithm to produce 2D images with many variations in scale and/or occlusion and/or translation and/or rotation and/or illumination and/or other conditions.
10. The method according to claim 8 or 9, wherein high-resolution 2D images are generated, and they form the dataset with the annotations necessary for the deep neural network.
PCT/IB2018/057568 2017-09-28 2018-09-28 Data set creation for deep neural network WO2019064266A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762564965P 2017-09-28 2017-09-28
US62/564,965 2017-09-28

Publications (1)

Publication Number Publication Date
WO2019064266A1 2019-04-04

Family

ID=64109930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/057568 WO2019064266A1 (en) 2017-09-28 2018-09-28 Data set creation for deep neural network

Country Status (1)

Country Link
WO (1) WO2019064266A1 (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020159628A1 (en) * 2001-04-26 2002-10-31 Mitsubishi Electric Research Laboratories, Inc Image-based 3D digitizer
US20030038801A1 (en) * 2001-08-24 2003-02-27 Sanyo Electric Co., Ltd. Three dimensional modeling apparatus
US8462206B1 (en) * 2010-02-25 2013-06-11 Amazon Technologies, Inc. Image acquisition system
US20140354769A1 (en) * 2012-01-26 2014-12-04 Meditory Llc Device and methods for fabricating a two-dimensional image of a three-dimensional object
US20150334302A1 (en) * 2014-05-17 2015-11-19 Sheenwill International (Hong Kong) Limited Object Exterior Photographing System
US20150365636A1 (en) * 2014-06-12 2015-12-17 Dealermade Vehicle photo studio
US20170193680A1 (en) * 2016-01-04 2017-07-06 Kla-Tencor Corporation Generating high resolution images from low resolution images for semiconductor applications

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191620A (en) * 2020-01-03 2020-05-22 西安电子科技大学 Method for constructing human-object interaction detection data set
CN111191620B (en) * 2020-01-03 2022-03-22 西安电子科技大学 Method for constructing human-object interaction detection data set

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18797100; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18797100; Country of ref document: EP; Kind code of ref document: A1)