WO2019064266A1 - Data set creation for deep neural network - Google Patents

Data set creation for deep neural network

Info

Publication number
WO2019064266A1
Authority
WO
WIPO (PCT)
Prior art keywords
video footage
cameras
neural network
data set
video
Prior art date
Application number
PCT/IB2018/057568
Other languages
French (fr)
Inventor
Jegor LEVKOVSKIY
Enrico PANDIAN
Original Assignee
Checkout Technologies Srl
Priority date
Filing date
Publication date
Application filed by Checkout Technologies Srl filed Critical Checkout Technologies Srl
Publication of WO2019064266A1 publication Critical patent/WO2019064266A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B17/00Details of cameras or camera bodies; Accessories therefor
    • G03B17/56Accessories
    • G03B17/561Support related camera accessories
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus and method are disclosed for data set creation suited for the training of a deep neural network. The aim of the proposed system is to simplify and automate the routine of data set creation. The data set, in its final structure, is generated in an appropriate format for training, validating and testing the deep neural network. A physical item is the input to the system; it is processed by the video footage device, which produces as output a set of images sufficient for 3D model reconstruction of the item. Later processing involves human interaction through a web interface that permits generating the final metadata files with a description of the item's characteristics. The web application is equipped with a neural network that facilitates bounding-box selection and background subtraction in the images. Metadata files may be exported in various human-readable or binary formats, or streamed into an appropriate RDBMS or NoSQL storage system.

Description

DATA SET CREATION FOR DEEP NEURAL NETWORK
BACKGROUND OF THE INVENTION
The present invention relates generally to deep neural networks and more particularly, to a method and apparatus for automating the routine of data set creation.
Deep learning is the application of artificial neural networks with more than one hidden layer to learning tasks. Deep learning architectures such as deep neural networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation and bioinformatics, where they have produced results comparable to, and in some cases superior to, human experts.
The quality of the data set is essential for the training of deep learning networks. One of the hardest problems to solve in deep learning has nothing to do with neural nets: it is the problem of getting the right data in the right format. Deep learning models need a good training set to work properly. Collecting and constructing the training set takes time and domain-specific knowledge of where and how to gather relevant information. The problem of data is well described by the renowned scientist Andrew Ng in the following rocket analogy:
"I think Al is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won't make it to orbit. If you have a tiny engine and a ton of fuel, you can't even lift off. To build a rocket you need a huge engine and a lot of fuel. The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms. " Andrew Ng
An incorrectly constructed data set can lead to a poorly performing network despite the model's potential. The table below summarizes some deep learning models and their data sets:

             VGGNet                Pascal VOC              COCO
Used for     Image recognition     Image recognition       Image recognition
                                   and segmentation        and segmentation
Input        Images                Images                  Images
Output       1000 categories       20 object classes       80 object categories
Data size    1.2M images with      9,993 segmented         2.5M segmented
             assigned category     images                  object instances
Quoting the article "Microsoft COCO: Common Objects in Context":
"Segmenting 2,500,000 object instances is an extremely time consuming task requiring over 22 worker hours per 1,000 segmentations."
Hence, in recent times there is a pressing need to develop intelligent semi-automatic systems that will aid humans in the creation of data sets suited for the training of deep neural networks.
SUMMARY
The present invention relates to a video footage device consisting of a set of cameras supported by a camera support. The cameras have high resolution and can be of different types, such as fisheye, linear and depth cameras. A rotation plate is provided: a disk put in motion by a stepper motor and fully controlled remotely through a web interface; the rotation velocity, direction and angle are controlled parameters. The rotating plate is configured to expose the product in front of the cameras from different angles and distances, and may consist of multiple rotating plates of different diameters stacked one over the other.
In an embodiment, the rotation plate may be constructed of a transparent material, suited for footage of the bottom.
In an embodiment, an illumination device is provided, composed of a set of lamps.
In an embodiment, the device is situated within a box and the background of the box can be exchanged. In an embodiment, the background can be of different colors and may or may not have patterns on it.
In an embodiment, the device is configured so that when it starts rotating a video stream is sent remotely, while a copy of the video is stored on the local device.
In an embodiment, right after the video footage cycle of a product has finished, another product can be placed to be scanned by the device.
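As an illustration only, the remotely controlled rotation parameters described above could be modeled as follows; the class and function names, and the figure of 200 steps per revolution, are assumptions for the sketch, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class RotationCommand:
    """Parameters the web interface exposes for the rotating plate."""
    velocity_deg_s: float   # rotation velocity, degrees per second
    direction: int          # +1 clockwise, -1 counter-clockwise
    angle_deg: float        # total sweep angle for one footage cycle

def steps_for_command(cmd: RotationCommand, steps_per_rev: int = 200) -> int:
    """Convert a commanded sweep angle into signed stepper-motor steps."""
    steps = round(abs(cmd.angle_deg) / 360.0 * steps_per_rev)
    return cmd.direction * steps

cmd = RotationCommand(velocity_deg_s=10.0, direction=1, angle_deg=360.0)
print(steps_for_command(cmd))  # one full revolution -> 200 steps
```

In a real controller these parameters would be sent to the stepper driver; here the sketch only shows the angle-to-steps conversion.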
The invention further relates to a 3D model reconstruction method on the output videos from the device described above, wherein the output videos are used as input for an algorithm able to create a 3D model of the product.
In an embodiment, the 3D model is then used for data augmentation, in which one or more 3D models of products are processed at once by an algorithm to produce 2D images with many variations in scale and/or occlusion and/or translation and/or rotation and/or illumination and/or other conditions.
In an embodiment, high-resolution 2D images are generated, and they form the dataset with the annotations necessary for the deep neural network.
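A minimal sketch of the kind of 2D augmentation described above, assuming NumPy and a grayscale rendered view of the product; the function names and the particular transformations shown (translation and illumination gain only) are illustrative, not the patent's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(view: np.ndarray, canvas: int = 64):
    """Place a rendered 2D view onto a canvas with a random translation
    and illumination change; return the image plus its annotation."""
    h, w = view.shape[:2]
    out = np.zeros((canvas, canvas), dtype=np.float32)
    # random translation keeping the object fully inside the canvas
    y = int(rng.integers(0, canvas - h + 1))
    x = int(rng.integers(0, canvas - w + 1))
    # random illumination gain, clipped back to valid intensity range
    gain = rng.uniform(0.7, 1.3)
    out[y:y + h, x:x + w] = np.clip(view * gain, 0.0, 1.0)
    # bounding-box annotation of the kind a detector data set needs
    ann = {"bbox": (x, y, w, h), "label": "product"}
    return out, ann

view = np.ones((16, 16), dtype=np.float32)  # stand-in for a rendered 3D view
img, ann = augment(view)
```

Each call produces one annotated 2D image; looping over random seeds, scales and occlusion masks would generate the full augmented data set.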
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 shows the video footage device scheme. Two views are presented: the front view and the top view. The video footage device is composed of the rotating plate and four cameras. The cameras are placed in the vertical plane as shown in the figure.
Camera 1 (horizontal): acquires video stream in horizontal plane, 0°
Camera 2 (oblique): acquires video stream in plane rotated -45°
Camera 3 (top): acquires video stream in plane rotated -90°
Camera 4 (bottom): acquires video stream in plane rotated 90°
DESCRIPTION
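The four camera orientations listed above can be expressed as rotation matrices about the horizontal axis; this sketch is one interpretation of the figure, not part of the disclosure:

```python
import math

def camera_rotation(angle_deg: float):
    """3x3 rotation matrix for a camera tilted by angle_deg in the
    vertical plane (rotation about the horizontal x-axis), row-major."""
    a = math.radians(angle_deg)
    return [
        [1.0, 0.0, 0.0],
        [0.0, math.cos(a), -math.sin(a)],
        [0.0, math.sin(a), math.cos(a)],
    ]

# the four camera poses from FIG. 1
poses = {name: camera_rotation(angle) for name, angle in
         [("horizontal", 0), ("oblique", -45), ("top", -90), ("bottom", 90)]}
```

Combined with the known plate rotation angle, such matrices give the relative pose (translation and rotation) needed for multi-view 3D reconstruction.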
Accurate three-dimensional shape reconstruction of objects using a video footage device can be achieved by using 3D imaging techniques along with the relative pose (i.e., translation and rotation) of the object. For this purpose, a novel device and the corresponding methods are proposed to construct a data set suited for the training of a deep neural network.
The data set creation is performed in the following steps:
Step 1. Video capturing
Step 2. Video processing
Step 3. Object extraction from video
Step 4. 3D object model reconstruction
Step 5. Metadata export
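The five steps above can be sketched as a pipeline; every helper function here is a hypothetical placeholder, since the patent does not specify the underlying algorithms:

```python
import json
from pathlib import Path

def create_data_set(product_id: str, videos: list) -> dict:
    """Sketch of the five-step pipeline; each helper below stands in for
    the processing stage of the same name."""
    frames = [f for v in videos for f in decode_video(v)]   # Steps 1-2
    crops = [extract_object(f) for f in frames]             # Step 3
    model_3d = reconstruct_3d(crops)                        # Step 4
    metadata = {"product": product_id,                      # Step 5
                "n_frames": len(frames),
                "model": model_3d}
    return metadata

# placeholder stage implementations so the sketch runs end to end
def decode_video(path):
    return [f"{path}:frame{i}" for i in range(3)]

def extract_object(frame):
    return f"crop({frame})"

def reconstruct_3d(crops):
    return {"n_views": len(crops)}

meta = create_data_set("sku-001", [Path("cam1.mp4"), Path("cam2.mp4")])
print(json.dumps(meta))
```

In the disclosed system, Step 5 would serialize such metadata to human-readable or binary formats, or stream it into an RDBMS or NoSQL store.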

Claims

1. A video footage device, consisting of a set of cameras supported by a camera support; the cameras have high resolution and can be of different types, such as fisheye, linear and depth cameras; a rotation plate is provided, which is a disk put in motion by a stepper motor and fully controlled remotely through a web interface; the rotation velocity, direction and angle are controlled parameters; the rotating plate is configured to expose the product in front of the cameras from different angles and distances; the rotating plate may consist of multiple rotating plates of different diameters stacked one over the other.
2. The video footage device according to claim 1, wherein the rotation plate may be constructed of a transparent material, suited for footage of the bottom.
3. The video footage device according to one or more of the preceding claims, wherein an illumination device is provided, composed of a set of lamps.
4. The video footage device according to one or more of the preceding claims, wherein the device is situated within a box and the background of the box can be exchanged.
5. The video footage device according to claim 4, wherein the background can be of different colors and may or may not have patterns on it.
6. The video footage device according to one or more of the preceding claims, wherein the device is configured so that when it starts rotating a video stream is sent remotely, while a copy of the video is stored on the local device.
7. The video footage device according to one or more of the preceding claims, wherein, right after the video footage cycle of a product has finished, another product can be placed to be scanned by the device.
8. A 3D model reconstruction method on the output videos from the device of claim 1, characterized in that the output videos are used as input for an algorithm able to create a 3D model of the product.
9. The method according to claim 8, wherein the 3D model is then used for data augmentation, in which one or more 3D models of products are processed at once by an algorithm to produce 2D images with many variations in scale and/or occlusion and/or translation and/or rotation and/or illumination and/or other conditions.
10. The method according to claim 8 or 9, wherein high-resolution 2D images are generated, and they form the dataset with the annotations necessary for the deep neural network.
PCT/IB2018/057568 2017-09-28 2018-09-28 Data set creation for deep neural network WO2019064266A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762564965P 2017-09-28 2017-09-28
US62/564,965 2017-09-28

Publications (1)

Publication Number Publication Date
WO2019064266A1 2019-04-04

Family

ID=64109930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/057568 WO2019064266A1 (en) 2017-09-28 2018-09-28 Data set creation for deep neural network

Country Status (1)

Country Link
WO (1) WO2019064266A1 (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020159628A1 (en) * 2001-04-26 2002-10-31 Mitsubishi Electric Research Laboratories, Inc Image-based 3D digitizer
US20030038801A1 (en) * 2001-08-24 2003-02-27 Sanyo Electric Co., Ltd. Three dimensional modeling apparatus
US8462206B1 (en) * 2010-02-25 2013-06-11 Amazon Technologies, Inc. Image acquisition system
US20140354769A1 (en) * 2012-01-26 2014-12-04 Meditory Llc Device and methods for fabricating a two-dimensional image of a three-dimensional object
US20150334302A1 (en) * 2014-05-17 2015-11-19 Sheenwill International (Hong Kong) Limited Object Exterior Photographing System
US20150365636A1 (en) * 2014-06-12 2015-12-17 Dealermade Vehicle photo studio
US20170193680A1 (en) * 2016-01-04 2017-07-06 Kla-Tencor Corporation Generating high resolution images from low resolution images for semiconductor applications

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191620A (en) * 2020-01-03 2020-05-22 西安电子科技大学 Method for constructing human-object interaction detection data set
CN111191620B (en) * 2020-01-03 2022-03-22 西安电子科技大学 Method for constructing human-object interaction detection data set

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18797100; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18797100; Country of ref document: EP; Kind code of ref document: A1)