WO2022184265A1 - Method for providing a three-dimensional model of a three-dimensional object suitable for transmission to a terminal device - Google Patents

Method for providing a three-dimensional model of a three-dimensional object suitable for transmission to a terminal device

Info

Publication number
WO2022184265A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
image data
digital image
dimensional model
terminal device
Application number
PCT/EP2021/055571
Other languages
French (fr)
Inventor
Christian Pflaum
Sebastian Kirsch
Alexander Pflaum
Original Assignee
M2P Gmbh
Application filed by M2P Gmbh filed Critical M2P Gmbh
Priority to PCT/EP2021/055571 priority Critical patent/WO2022184265A1/en
Publication of WO2022184265A1 publication Critical patent/WO2022184265A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for providing a three-dimensional model of a three-dimensional object for a terminal device, wherein the method comprises the following steps: receiving two-dimensional digital image data of the three-dimensional object; extraction of predetermined data and/or properties from the digital image data using a convolutional neural network, CNN; generating a three-dimensional model from the two-dimensional image data based on the extracted data and/or properties using a Generative Adversarial Network, GAN, or a Variational Autoencoder, VAE; and sending the three-dimensional model of the three-dimensional object to a terminal device.

Description

METHOD FOR PROVIDING A THREE-DIMENSIONAL MODEL OF A THREE-DIMENSIONAL OBJECT SUITABLE FOR TRANSMISSION TO A TERMINAL DEVICE
The present disclosure relates to methods and apparatuses for providing a three-dimensional model of a three-dimensional object for a terminal device.
In prior art systems, such as online photo services, the user edits the photos either after uploading them in the browser or with software from the manufacturer. However, existing photo editors are often overly complex and not optimized for smartphones. In such systems, little or no user interaction may take place; the selectable processes are limited or require too much user interaction. Photo products such as framed prints or cups, in particular, often serve decorative purposes. Accordingly, it is especially appealing for consumers to preview the products in real size and form in the environment they are supposed to decorate. As of now, the consumer has no technical way to see the product in the desired environment other than ordering the actual physical product. If the product does not fit the consumer's aesthetic taste, he may not have a legal basis for returning the product, since it has been custom-made.
It is an object of the present invention to overcome the above-mentioned problems of the related art. In particular, it is an object of the present invention to provide methods and apparatuses that provide a three-dimensional model of a three-dimensional object for a terminal device. In particular, an object of the present invention is to provide said methods and processes in a highly automated fashion for increased user experience.
The above-mentioned objects are achieved with the features of the independent claims. Dependent claims define preferred embodiments of the invention.
In particular, the present disclosure relates to a method for providing a three-dimensional model of a three-dimensional object for a terminal device, wherein the method comprises the following steps: receiving two-dimensional digital image data of the three-dimensional object; extraction of predetermined data and/or properties from the digital image data using a convolutional neural network (CNN); generating a three-dimensional model from the two-dimensional image data based on the extracted data and/or properties using a Generative Adversarial Network (GAN) or a Variational Autoencoder (VAE); and sending the three-dimensional model of the three-dimensional object to a terminal device.
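By way of illustration only, the following Python sketch shows how the four claimed steps could be orchestrated on a backend. All helper implementations, function names and data shapes below are placeholders assumed for this sketch and are not taken from the disclosure.

```python
"""Minimal end-to-end sketch of the claimed pipeline (steps S41-S44).

All helper implementations are placeholders, not the patented models.
"""
from dataclasses import dataclass

import numpy as np


@dataclass
class ExtractionResult:
    object_class: str       # e.g. "picture_frame" or "cup"
    mask: np.ndarray        # binary segmentation mask of the detected object


def extract_with_cnn(image: np.ndarray) -> ExtractionResult:
    # Placeholder for the CNN of step S42: here, simply treat non-background
    # (non-zero) pixels as the object mask.
    mask = (image.sum(axis=-1) > 0).astype(np.uint8)
    return ExtractionResult(object_class="unknown", mask=mask)


def generate_3d_with_gan(extraction: ExtractionResult, resolution: int = 32) -> np.ndarray:
    # Placeholder for the GAN/VAE of step S43: return an empty voxel grid
    # where a trained generator would output the continuous voxel model.
    return np.zeros((resolution, resolution, resolution), dtype=np.float32)


def send_to_terminal(terminal_id: str, voxels: np.ndarray) -> dict:
    # Placeholder for step S44: in practice, the voxel/mesh data would be
    # serialized (e.g. as glTF/USDZ) and transmitted to the terminal device.
    return {"terminal": terminal_id, "payload_shape": voxels.shape}


def provide_3d_model(image: np.ndarray, terminal_id: str) -> dict:
    """S41: receive image -> S42: extract -> S43: generate -> S44: send."""
    extraction = extract_with_cnn(image)
    voxels = generate_3d_with_gan(extraction)
    return send_to_terminal(terminal_id, voxels)


if __name__ == "__main__":
    dummy_image = np.zeros((64, 64, 3), dtype=np.uint8)
    dummy_image[16:48, 16:48] = 255  # a white square standing in for an object
    print(provide_3d_model(dummy_image, terminal_id="device-20"))
```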
Various embodiments may preferably implement the following features.
The terminal device may preferably be a mobile device, such as a smartphone, a laptop or a tablet computer. However, the terminal device may also be a desktop computer or a similar device.
The above method may preferably be executed on a backend device.
Preferably, the generated three-dimensional model is suitable and intended to be displayed in space on a terminal device by means of an augmented reality framework.
The generated three-dimensional model may be rendered through any software that supports augmented reality rendering, such as web browsers, mobile apps, desktop apps, and chat interfaces.
Transmission of the three-dimensional model to the terminal device may happen in the context of any interaction that requires communication between the terminal device and the backend device, such as chatbot conversations through a chat interface or the download of remote content through a web or mobile application.
The predetermined data and/or properties preferably relate to a class of the object and/or a geometry of the object and/or predetermined areas of the object and/or a printable area of the object.
The method preferably further comprises a step of receiving a digital image, wherein, based on the extracted printable area of the object, the digital image is placed on the printable area, and the three-dimensional model of the object together with the digital image placed on it is subsequently generated.

Preferably, the step of receiving the two-dimensional digital image data further comprises receiving information on the three-dimensional size of the object, wherein the method further comprises a step of adapting the three-dimensional model using the information on the three-dimensional size of the object after the generation of the three-dimensional model.
The receipt of the information on the three-dimensional size of the object preferably comprises a manual input by a user and/or an automatic receipt from an existing database and/or from an extraction of metadata from the two-dimensional digital image data.
The method preferably further comprises a step for storing the extracted data and/or properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model in a database.
The method preferably further comprises a step of storing a unique ID relating the respective extracted data and/or the properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model with each other.
Preferably, receiving the two-dimensional digital image data comprises: receiving image data provided by a user or receiving the image data from an existing database.
Preferably, the extraction of predetermined data and/or properties from the digital image data using the CNN comprises: detection of the object in the digital image data; determination of end points and/or corner points of the object in the digital image data; and extraction of the object from the digital image data based on the detection of the object in the digital image data and the determination of end points and/or corner points of the object in the digital image data.
Preferably, generating a three-dimensional model comprises: training of the GAN or VAE to generate three-dimensional models from two-dimensional image data prior to the generation of the three-dimensional model based on the received two-dimensional digital image data, wherein the trained GAN or VAE is used for said generation.
The method preferably further comprises a step of sending the two-dimensional digital image data to the terminal device prior to sending the three-dimensional model of the three-dimensional object to the terminal device.
Preferably, sending the three-dimensional model of the three-dimensional object to the terminal device is triggered by the terminal device, e.g. a user input on the terminal device and a subsequent message about the user input.
The present disclosure also relates to a method for displaying a three-dimensional model in space on a terminal device by means of an augmented reality framework, the method comprising the following steps: the steps according to the method as described above, receiving the three-dimensional model on a terminal device, and displaying the three-dimensional model in space on the terminal device using an augmented reality framework.
The present disclosure also relates to a data processing apparatus comprising a processor configured to perform the steps of the method as described above.
The exemplary embodiments disclosed herein are directed to providing features that will become readily apparent by reference to the following description when taken in conjunction with the accompanying drawings. In accordance with various embodiments, exemplary systems, methods, devices and computer program products are disclosed herein. It is understood, however, that these embodiments are presented by way of example and not limitation, and it will be apparent to those of ordinary skill in the art who read the present disclosure that various modifications to the disclosed embodiments can be made while remaining within the scope of the present disclosure.
Thus, the present disclosure is not limited to the exemplary embodiments and applications described and illustrated herein. Additionally, the specific order and/or hierarchy of steps in the methods disclosed herein are merely exemplary approaches. Based upon design preferences, the specific order or hierarchy of steps of the disclosed methods or processes can be re-arranged while remaining within the scope of the present disclosure. Thus, those of ordinary skill in the art will understand that the methods and techniques disclosed herein present various steps or acts in a sample order, and the present disclosure is not limited to the specific order or hierarchy presented unless expressly stated otherwise.
The above and other aspects and their implementations are described in greater detail in the drawings, the descriptions, and the claims.
Fig. 1 is a schematic illustration of a method according to an embodiment of the present disclosure.
Fig. 2a is a schematic illustration of the extraction of predetermined data and/or properties from the digital image data using a convolutional neural network according to an embodiment of the present disclosure.
Fig. 2b is another schematic illustration of the extraction of predetermined data and/or properties from the digital image data using a convolutional neural network according to an embodiment of the present disclosure.
Fig. 3 is a schematic illustration of generating a three-dimensional model from the two-dimensional image data based on the extracted data and/or properties using a Generative Adversarial Network according to an embodiment of the present disclosure.
Fig. 4 is a flow chart of a method according to an embodiment of the present disclosure.
Fig. 1a schematically shows the procedure of an embodiment of the present disclosure. In particular, the user uploads an image I via a web browser, a smartphone app or a desktop app (here a desktop computer 10) that contains one or more objects to be extracted. Alternatively, the image can also be retrieved from an existing third-party database. The image I is transmitted to a cloud service C1 via an API. On the cloud service C1, the image is stored in an image database D.
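A minimal sketch of such an upload API for the cloud service C1, assuming Flask, a filesystem store standing in for the image database D, and route and field names that are not specified in the disclosure:

```python
# Sketch of an upload API for cloud service C1 (names and storage are assumed).
import uuid
from pathlib import Path

from flask import Flask, jsonify, request

app = Flask(__name__)
IMAGE_DB = Path("image_db")          # stand-in for the image database D
IMAGE_DB.mkdir(exist_ok=True)


@app.post("/images")
def upload_image():
    file = request.files.get("image")
    if file is None:
        return jsonify({"error": "no image provided"}), 400
    image_id = str(uuid.uuid4())     # unique ID relating all later artifacts
    file.save(str(IMAGE_DB / f"{image_id}_original.png"))
    # Here the image would be forwarded to cloud service C2 (the CNN).
    return jsonify({"id": image_id}), 201


if __name__ == "__main__":
    app.run(port=8000)
```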
The image I is then forwarded to another cloud service C2 that hosts the Convolutional Neural Network (CNN).
It should be noted that the term "user", as used herein, refers to any person making 2D images of objects available to the service. This could be a partner merchant, an end-user uploading his own images of objects, etc. The term "customer", as used herein, refers to someone who ultimately accesses the final 3D model on his/her device (e.g. mobile phone 20) as described in more detail below.
Next, a state-of-the-art convolutional neural network architecture for image segmentation, such as U-Net, DeepLabV3, etc., is used as the base model. These models are pre-trained to extract a wide variety of objects from images (often thousands of object classes). Yet they are unlikely to have been trained to perfectly extract specific objects such as image frames or cups.

To achieve the desired accuracy, the CNN needs to be further trained on a set of images relating to the business domain, i.e. the specific use case, such as image frames, cups, etc. Ideally, further training the network on a few hundred to a few thousand images will sufficiently increase extraction accuracy.

Otherwise, parts of the network need to be re-trained and the output layer may have to be replaced. To benefit from the information the network has already learned during pre-training, the lower layers, which learn abstract features and contours, are preferably "frozen". The upper layers are re-trained on the new images. Ultimately, a new output layer might be added that only distinguishes between a set of relevant items, such as various image frames.
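A sketch of this fine-tuning strategy, assuming torchvision's pre-trained DeepLabV3 as the base model; the number of classes and the decision to freeze the entire backbone are assumptions of the sketch:

```python
# Fine-tuning sketch: freeze the pre-trained backbone, replace the output head.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

NUM_CLASSES = 3  # assumed: background, picture frame, cup

# Downloads pre-trained weights on first use.
model = deeplabv3_resnet50(weights="DEFAULT")

# Freeze the lower layers that already learned abstract features and contours.
for param in model.backbone.parameters():
    param.requires_grad = False

# Replace the output layer so it only distinguishes the relevant item classes.
model.classifier = DeepLabHead(2048, NUM_CLASSES)

# Only the new head (and anything left unfrozen) is optimized.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One illustrative training step on dummy data.
images = torch.randn(2, 3, 256, 256)
targets = torch.randint(0, NUM_CLASSES, (2, 256, 256))
out = model(images)["out"]                       # (N, NUM_CLASSES, H, W)
loss = torch.nn.functional.cross_entropy(out, targets)
loss.backward()
optimizer.step()
```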
That is, with the exception of the output layer, the architecture is unlikely to change from the existing state-of-the-art network. The training data will likely differ, and some network parameters, such as the learning rate, might be adjusted in view of the specific use case.

An example of the extraction by the CNN is illustrated in Figs. 2a and 2b. Fig. 2a shows an example of an extracted object 30, i.e. a picture frame. In addition, the CNN is used to extract, e.g., an area within the detected picture frame 30 that can be used to place/print a picture thereon.
Fig. 2b shows another example of an extracted object 30, i.e. a cup. In addition, the CNN is used to extract, e.g. an area on the detected cup that can be used to place/print a picture thereon.
It will be understood by a person skilled in the art that any use of the CNN requires a respective training (as outlined above) to be able to reliably extract any such features from the image I.
Next, the extracted 2D object O2D will be saved in the image database D next to the existing image I saved in a previous step. In addition, the extracted object O2D is transmitted to another cloud service C3, which hosts the Generative Adversarial Network (GAN) or the Variational Autoencoder (VAE) network.
In the first iterations, the CNN will specialize in certain object classes (e.g., extracting picture frames from an image). The number of classes will be gradually expanded. To do this, the network may 1.) detect the existence of an object of the corresponding class in the image, 2.) determine the coordinates of the vertices of the object in the image, and 3.) extract the object from the image.
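A sketch of steps 2.) and 3.), assuming the CNN returns a binary segmentation mask; for simplicity only the axis-aligned extreme points of the mask are used as corner points:

```python
# Sketch: derive corner coordinates from a CNN mask and crop the object (steps 2 and 3).
import numpy as np


def extract_object(image: np.ndarray, mask: np.ndarray):
    """Return the corner points and the cropped, background-free object."""
    ys, xs = np.nonzero(mask)                    # pixels classified as the object
    if ys.size == 0:
        raise ValueError("no object of the target class detected")  # step 1 failed
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    cropped = image[y0:y1 + 1, x0:x1 + 1].copy()
    cropped[mask[y0:y1 + 1, x0:x1 + 1] == 0] = 0  # blank out background pixels
    return corners, cropped


if __name__ == "__main__":
    img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    msk = np.zeros((64, 64), dtype=np.uint8)
    msk[10:40, 20:50] = 1                         # pretend the CNN found an object here
    corners, obj_2d = extract_object(img, msk)
    print(corners, obj_2d.shape)
```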
However, extracting further properties explicitly with the CNN, e.g. properties useful for determining the real-world size of the object, involves relatively high effort (e.g. detection of the relevant image environment and inference of the size from it). Instead, features such as the size ratio may be included in the 2D -> 3D learning process of the GAN or VAE network. That is, a GAN may receive, for example, a 2D image portion of a cup of arbitrary size and generate from it a 3D cup of a standard size and shape that it has learned during training. For 3D models that have a high variance in sizes, the desired size dimensions may be provided manually with the original object image. The 2D object image is translated into a standard-size 3D model through machine learning. After the machine learning process is complete, and independently of it, the object is scaled to the desired dimensions.
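A sketch of the subsequent scaling step, assuming the generated model is available as a mesh vertex array and that the user-provided dimensions are given per axis in metres:

```python
# Sketch: scale a standard-size generated model to user-provided real-world dimensions.
import numpy as np


def scale_to_dimensions(vertices: np.ndarray, target_size_m: tuple) -> np.ndarray:
    """Scale mesh vertices so their bounding box matches target (x, y, z) in metres."""
    extents = vertices.max(axis=0) - vertices.min(axis=0)    # current bounding box
    factors = np.asarray(target_size_m) / extents             # per-axis scale factors
    return (vertices - vertices.min(axis=0)) * factors         # anchored at the origin


if __name__ == "__main__":
    cube = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
    # e.g. a cup roughly 8 cm x 8 cm x 10 cm, provided manually with the image
    print(scale_to_dimensions(cube, (0.08, 0.08, 0.10)).max(axis=0))
```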
Next, the GAN or VAE in C3 is used to generate a 3D object O3D from the provided 2D object O2D.
In general, the GAN consists of two networks, the generator G and the discriminator D. GAN training and GAN inference are described in the following with reference to Fig. 3.
GAN Training
The generator G takes a random noise vector z as input and applies the transformation G(z). Typically, z has 200 or more dimensions, with each variable drawn from a Gaussian distribution P(z) with a mean of zero and a standard deviation of one. Each dimension can be thought of as a parameter that describes a feature of the 3D model. Through training, the generator learns to map points in the 200-dimensional latent space to output specific 3D objects. For example, one dimension might map to the color of the desired object's frame. Due to the randomness of z, the color is likely to differ in each iteration and G learns to create variations of the same object.
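A sketch of such a generator in PyTorch, mapping a 200-dimensional Gaussian latent vector to a continuous voxel grid via 3D transposed convolutions; the layer sizes and the 32^3 output resolution are assumptions loosely following the 3D-GAN setup of Wu et al.:

```python
# Sketch of a voxel generator G(z): 200-dim Gaussian latent -> continuous voxel grid.
import torch
from torch import nn

LATENT_DIM = 200  # each dimension can be read as one latent feature of the 3D model


class VoxelGenerator(nn.Module):
    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            # project z to a small 4x4x4 feature volume, then upsample to 32^3
            nn.ConvTranspose3d(latent_dim, 256, kernel_size=4, stride=1),
            nn.BatchNorm3d(256), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # continuous occupancy values X_c in (0, 1)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z.view(z.size(0), -1, 1, 1, 1))


if __name__ == "__main__":
    G = VoxelGenerator()
    z = torch.randn(2, LATENT_DIM)      # z ~ N(0, 1) in each dimension
    x_c = G(z)                          # continuous voxel representation X_c
    print(x_c.shape)                    # torch.Size([2, 1, 32, 32, 32])
```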
The output of G(z) is a continuous voxel representation Xc of the desired 3D object.
The 3D voxel representation is discretized to Xd since standard graphics rendering engines only accept discrete inputs.
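A sketch of this discretization, assuming a 0.5 occupancy threshold and, optionally, a marching-cubes surface extraction for standard rendering engines:

```python
# Sketch: discretize the continuous voxel grid X_c and extract a render-ready surface.
import numpy as np
from skimage import measure  # pip install scikit-image

x_c = np.random.rand(32, 32, 32).astype(np.float32)   # stand-in for the generator output
x_d = (x_c > 0.5).astype(np.uint8)                     # discrete voxels X_d for the renderer

# Optional: turn the occupancy grid into a triangle mesh for standard engines.
vertices, faces, normals, _ = measure.marching_cubes(x_c, level=0.5)
print(x_d.sum(), vertices.shape, faces.shape)
```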
Xd is fed to a standard graphics rendering engine R such as OpenGL, which is specialized in rendering 2D representations of 3D data. The 3D voxel representation is thus converted into a 2D image Y of the object. Instead of using a standard graphics rendering engine, training an additional neural rendering engine may be necessary. The reason is that standard engines only process discrete voxel values, but GAN training requires continuous values.
Several images are generated and collected in a set YG.
The discriminator D is trained on a set of original images of the desired object Yo and a set of generated images YG. Training continues until D has learned the function D(x) that distinguishes between fake images from YG and real images from Yo with reasonable accuracy (preferably >80%). The discriminator tries to maximize the probability of correctly identifying fake images produced by G and real images.

The generator G in turn attempts to minimize the probability of the discriminator correctly distinguishing between fake images and real images by generating ever more realistic 3D objects from the latent space vector z that are subsequently rendered to photorealistic 2D images YG by the graphics rendering engine R. Success is measured through a minimax probability loss function L that quantifies the accuracy of G and D (often cross-entropy or L2 loss is used).
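In standard textbook notation (not quoted from the source), this cross-entropy minimax objective can be written as

\[
\min_G \max_D \; \mathbb{E}_{y \sim Y_O}\left[\log D(y)\right] + \mathbb{E}_{z \sim P(z)}\left[\log\left(1 - D(R(G(z)))\right)\right],
\]

where R denotes the rendering engine that turns the generated voxel representation into a 2D image.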
The process is repeated iteratively until an equilibrium between G and D is achieved. G should now achieve the desired performance in generating 3D models from 2D photos.
The discriminator is discarded and the generator can be used in production.
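A compressed sketch of this training loop with binary cross-entropy losses; the tiny fully-connected G and D and the mean-projection stand-in for the rendering engine R are simplifications made only to keep the example self-contained and runnable:

```python
# Compressed sketch of the adversarial training loop (G vs. D) with BCE losses.
import torch
from torch import nn

LATENT_DIM, VOX, IMG = 200, 16, 16 * 16

G = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                  nn.Linear(256, VOX ** 3), nn.Sigmoid())        # z -> continuous voxels X_c
D = nn.Sequential(nn.Linear(IMG, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))                              # 2D image -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()


def render(voxels: torch.Tensor) -> torch.Tensor:
    """Crude differentiable stand-in for the rendering engine R: project along one axis."""
    return voxels.view(-1, VOX, VOX, VOX).mean(dim=1).view(-1, IMG)


real_images = torch.rand(64, IMG)          # stand-in for the set Yo of original images

for step in range(200):
    z = torch.randn(8, LATENT_DIM)
    fake_images = render(G(z))             # the set YG of generated, rendered images

    # Discriminator step: maximize accuracy on real (Yo) vs. fake (YG) images.
    idx = torch.randint(0, real_images.size(0), (8,))
    d_loss = bce(D(real_images[idx]), torch.ones(8, 1)) + \
             bce(D(fake_images.detach()), torch.zeros(8, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: minimize the probability of D spotting the fakes.
    g_loss = bce(D(render(G(torch.randn(8, LATENT_DIM)))), torch.ones(8, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# After training, D is discarded and G alone is used for inference.
```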
GAN Inference
The generator G takes a random noise vector z with the same properties as used during training as input and applies the transformation G(z).
G returns a 3D voxel representation O3D. More information, in particular on GANs, can be found, e.g., in Jiajun Wu et al.: "Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling", 29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain; Lilian Weng: "From GAN to WGAN", arXiv:1904.08994v1 [cs.LG], 18 Apr 2019; and Sebastian Lunz et al.: "Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data", arXiv:2002.12674v1 [cs.CV], 28 Feb 2020. All of these are incorporated by reference herein.
Alternatively, VAE may be used instead of GAN. The VAE learns the translation from the 2D representation of an object to the 3D representation of the same object, and then applies it to 2D representations to reconstruct 3D models. Learning this translation requires a large set of 2D -> 3D object pairs. GANs require a much smaller amount of 3D training data (<100), because a "real" object can be reused multiple times in pairing with generated objects.
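A sketch of the VAE alternative, assuming a flat encoder/decoder and paired 2D -> 3D training data; the layer sizes are illustrative only:

```python
# Sketch of the VAE alternative: encode a 2D image, decode a 3D voxel grid.
import torch
import torch.nn.functional as F
from torch import nn

IMG, VOX, LATENT = 64 * 64, 32, 200


class Image2VoxelVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(IMG, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, LATENT)
        self.fc_logvar = nn.Linear(512, LATENT)
        self.decoder = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(),
                                     nn.Linear(512, VOX ** 3), nn.Sigmoid())

    def forward(self, image_2d: torch.Tensor):
        h = self.encoder(image_2d.flatten(1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        voxels = self.decoder(z).view(-1, VOX, VOX, VOX)
        return voxels, mu, logvar


def vae_loss(pred_voxels, target_voxels, mu, logvar):
    recon = F.binary_cross_entropy(pred_voxels, target_voxels, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl


if __name__ == "__main__":
    model = Image2VoxelVAE()
    images = torch.rand(4, 64, 64)                                # 2D object images
    targets = torch.randint(0, 2, (4, VOX, VOX, VOX)).float()     # paired 3D ground truth
    out, mu, logvar = model(images)
    print(vae_loss(out, targets, mu, logvar).item())
```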
Next, the generated object O3D is saved to the image database D next to the existing 2D image of the same object and the original image uploaded by the user. D now includes the following entry schema:
(Table of the source document, Figure imgf000012_0001: entry schema relating the unique ID, the original image I, the extracted 2D object O2D and the generated 3D object O3D.)
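A sketch of how such an entry schema could be laid out in a relational table; the column names are assumptions of this sketch:

```python
# Sketch of the database entry schema relating all artifacts via one unique ID.
import sqlite3

conn = sqlite3.connect(":memory:")           # stand-in for the image database D
conn.execute(
    """
    CREATE TABLE objects (
        id             TEXT PRIMARY KEY,      -- unique ID shared by all artifacts
        original_path  TEXT NOT NULL,         -- uploaded image I
        object_2d_path TEXT,                  -- extracted 2D object O2D
        model_3d_path  TEXT,                  -- generated 3D model O3D
        object_class   TEXT                   -- e.g. 'picture_frame', 'cup'
    )
    """
)
conn.execute(
    "INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
    ("a1b2c3", "I.png", "O2D.png", "O3D.glb", "cup"),
)
print(conn.execute("SELECT * FROM objects").fetchall())
```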
The original image I or the extracted 2D object is displayed in the app or in the browser on the customer's device 20. Once the customer selects the desired 2D representation and switches to the augmented reality view on his/her device 20, the 3D object is retrieved from the database D via the unique ID and downloaded to the customer's device 20.
Finally, the 3D object is rendered in original size via the customer's camera using an augmented reality framework that is supported by the customer's device 20.
All of the above-mentioned cloud services C1, C2, D, and C3 may be implemented as (or on) one backend (device). The backend may be a single unit or device, but can also be several interconnected units or devices. The several interconnected units or devices may be connected by a wired or wireless connection. The several interconnected units or devices may be located in one place or distributed in different places. The units or devices may be implemented as hardware and/or software.
Fig. 4 is a flow chart of a method according to an embodiment of the present disclosure. In particular, Fig. 4 shows a method for providing a three-dimensional model of a three-dimensional object for a terminal device.
In an embodiment, the method according to Fig. 4 comprises the following steps:
Step S41: receiving two-dimensional digital image data of the three-dimensional object.
Step S42: extraction of predetermined data and/or properties from the digital image data using a convolutional neural network (CNN).
Step S43: generating a three-dimensional model from the two-dimensional image data based on the extracted data and/or properties using a Generative Adversarial Network (GAN) or a Variational Autoencoder (VAE).
Step S44: sending the three-dimensional model of the three-dimensional object to a terminal device.
In an embodiment, the generated three-dimensional model is suitable and intended to be displayed in space on a terminal device by means of an augmented reality framework.
In an embodiment the predetermined data and/or properties preferably relate to a class of the object and/or a geometry of the object and/or predetermined areas of the object and/or a printable area of the object.
In an embodiment, the method further comprises a step of receiving a digital image, wherein, based on the extracted printable area of the object, the digital image is placed on the printable area, and the three-dimensional model of the object together with the digital image placed on it is subsequently generated.

In an embodiment, the step of receiving the two-dimensional digital image data further comprises receiving information on the three-dimensional size of the object, wherein the method further comprises a step of adapting the three-dimensional model using the information on the three-dimensional size of the object after the generation of the three-dimensional model.
In an embodiment, the receipt of the information on the three-dimensional size of the object preferably comprises a manual input by a user and/or an automatic receipt from an existing database and/or from an extraction of metadata from the two-dimensional digital image data.
In an embodiment, the method further comprises a step for storing the extracted data and/or properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model in a database.
In an embodiment, the CNN is limited to the recognition and extraction of objects of certain object classes. The CNN may implicitly learn properties of the object that could possibly be transferred to the GAN or VAE network. The CNN would then be "connected" directly to the VAE or GAN.
In an embodiment, the method further comprises a step of storing a unique ID relating the respective extracted data and/or the properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model with each other.
In an embodiment, receiving the two-dimensional digital image data comprises: receiving image data provided by a user or receiving the image data from an existing database.
In an embodiment, the extraction of predetermined data and/or properties from the digital image data using the CNN comprises: detection of the object in the digital image data; determination of end points and/or corner points of the object in the digital image data; and extraction of the object from the digital image data based on the detection of the object in the digital image data and the determination of end points and/or corner points of the object in the digital image data.
In an embodiment, generating a three-dimensional model comprises: training of the GAN or VAE to generate three-dimensional models from two-dimensional image data prior to the generation of the three-dimensional model based on the received two-dimensional digital image data, wherein the trained GAN or VAE is used for said generation.
In an embodiment, the method further comprises a step of sending the two-dimensional digital image data to the terminal device prior to sending the three-dimensional model of the three-dimensional object to the terminal device.
In an embodiment, sending the three-dimensional model of the three-dimensional object to the terminal device is triggered by the terminal device, e.g. a user input on the terminal device and a subsequent message about the user input.
According to an embodiment, the method further comprises receiving the three-dimensional model on a terminal device, and displaying the three-dimensional model in space on the terminal device using an augmented reality framework.
In an embodiment, the method comprises a first step of uploading a product image (e.g. a picture frame, a cup, a T-shirt, etc.) by a user; a second step of extracting the product from the uploaded image by a CNN and optionally identifying a printable area on the extracted product; a third, optional step of projecting a picture (provided by a customer) onto the identified printable area; a fourth step of generating a 3D-GAN model of the extracted product; and a fifth step of providing the 3D-GAN model to a customer's mobile device, in particular to an augmented reality framework, e.g. ARKit or ARCore, which may display the 3D-GAN model in the space in which the user is currently present.
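A sketch of the optional third step, assuming the printable area is approximated by an axis-aligned bounding box and using a simple paste with Pillow; a real projection onto a curved surface such as a cup would require a perspective or cylindrical warp:

```python
# Sketch: place a customer picture onto the printable area found on the product image.
from PIL import Image


def place_on_printable_area(product: Image.Image, picture: Image.Image,
                            printable_box: tuple) -> Image.Image:
    """printable_box is (left, upper, right, lower) as identified by the CNN."""
    left, upper, right, lower = printable_box
    fitted = picture.resize((right - left, lower - upper))
    composed = product.copy()
    composed.paste(fitted, (left, upper))
    return composed


if __name__ == "__main__":
    product_img = Image.new("RGB", (400, 300), "white")     # stand-in for the extracted product
    customer_pic = Image.new("RGB", (1200, 900), "orange")  # stand-in for the customer's photo
    result = place_on_printable_area(product_img, customer_pic, (120, 60, 280, 240))
    result.save("preview.png")
```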
It will be understood by a person skilled in the art that the training of neural networks is a highly empirical process. Factors such as the number of neuron layers or the properties that go into the training data can only be finally determined in the training process. Therefore, the information in the research papers referenced above should be viewed primarily as heuristics or starting points for experiments.
For example: 1.) architecture of the GAN model: the research on GANs on which the present disclosure is based proposes several modifications and additions; e.g., training a neural rendering engine and including it in the model architecture is described. However, it is possible that the architecture may work without this step. 2.) structure of the networks and training data: one of the research papers referenced above proposes the use of a 200-dimensional vector whose entries are randomized samples of a normal distribution. This can be used as a starting point for the present disclosure, but in the end the vector may have perhaps 250 dimensions and the samples may come from a different probability distribution.
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not by way of limitation. Likewise, the various diagrams may depict an example architecture or configuration, which are provided to enable persons of ordinary skill in the art to understand exemplary features and functions of the present disclosure. Such persons would understand, however, that the present disclosure is not restricted to the illustrated example architectures or configurations, but can be implemented using a variety of alternative architectures and configurations. Additionally, as would be understood by persons of ordinary skill in the art, one or more features of one embodiment can be combined with one or more features of another embodiment described herein. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments.
It is also understood that any reference to an element herein using a designation such as "first," "second," and so forth does not generally limit the quantity or order of those elements. Rather, these designations can be used herein as a convenient means of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements can be employed, or that the first element must precede the second element in some manner. Additionally, a person having ordinary skill in the art would understand that information and signals can be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits and symbols, for example, which may be referenced in the above description can be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
A skilled person would further appreciate that any of the various illustrative logical blocks, units, processors, means, circuits, methods and functions described in connection with the aspects disclosed herein can be implemented by electronic hardware (e.g., a digital implementation, an analog implementation, or a combination of the two), firmware, various forms of program or design code incorporating instructions (which can be referred to herein, for convenience, as "software" or a "software unit"), or any combination of these techniques.
To clearly illustrate this interchangeability of hardware, firmware and software, various illustrative components, blocks, units, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware, firmware or software, or a combination of these techniques, depends upon the particular application and design constraints imposed on the overall system. Skilled artisans can implement the described functionality in various ways for each particular application, but such implementation decisions do not cause a departure from the scope of the present disclosure. In accordance with various embodiments, a processor, device, component, circuit, structure, machine, unit, etc. can be configured to perform one or more of the functions described herein. The term "configured to" or "configured for" as used herein with respect to a specified operation or function refers to a processor, device, component, circuit, structure, machine, unit, etc. that is physically constructed, programmed and/or arranged to perform the specified operation or function.
Furthermore, a skilled person would understand that various illustrative logical blocks, units, devices, components and circuits described herein can be implemented within or performed by an integrated circuit (IC) that can include a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, or any combination thereof. The logical blocks, units, and circuits can further include antennas and/or transceivers to communicate with various components within the network or within the device. A general purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, or state machine. A processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other suitable configuration to perform the functions described herein. If implemented in software, the functions can be stored as one or more instructions or code on a computer-readable medium. Thus, the steps of a method or algorithm disclosed herein can be implemented as software stored on a computer-readable medium.
Computer-readable media includes both computer storage media and communication media, including any medium that can be enabled to transfer a computer program or code from one place to another. A storage medium can be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In this document, the term "unit" refers to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purposes of discussion, the various units are described as discrete units; however, as would be apparent to one of ordinary skill in the art, two or more units may be combined to form a single unit that performs the associated functions according to embodiments of the present disclosure.
Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the present disclosure. It will be appreciated that, for clarity purposes, the above description has described embodiments of the present disclosure with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processing logic elements or domains may be used without detracting from the present disclosure. For example, functionality illustrated to be performed by separate processing logic elements, or controllers, may be performed by the same processing logic element, or controller. Hence, references to specific functional units are only references to a suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.
Various modifications to the implementations described in this disclosure will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other implementations without departing from the scope of this disclosure. Thus, the disclosure is not intended to be limited to the implementations shown herein, but is to be accorded the widest scope consistent with the novel features and principles disclosed herein, as recited in the claims below.

Claims

1. A method for providing a three-dimensional model of a three-dimensional object for a terminal device, wherein the method comprises the following steps: receiving two-dimensional digital image data of the three-dimensional object; extraction of predetermined data and/or properties from the digital image data using a convolutional neural network, CNN; generating a three-dimensional model from the two-dimensional image data based on the extracted data and/or properties using a Generative Adversarial Network, GAN or a Variational Autoencoder, VAE; and sending the three-dimensional model of the three-dimensional object to a terminal device.
2. The method according to claim 1, wherein sending the three-dimensional model of the three-dimensional object to a terminal device comprises a communication between the terminal device and a backend device, preferably chatbot conversations through a chat interface or download of remote content through a web or mobile application.
3. The method according to claim 1 or 2, wherein the generated three-dimensional model is suitable and intended to be displayed in space on a terminal device by means of an augmented reality framework.
4. The method according to any one of claims 1 to 3, wherein the predetermined data and/or properties relate to a class of the object and/or a geometry of the object and/or predetermined areas of the object and/or a printable area of the object.
5. The method according to claim 4, wherein the method further comprises a step of receiving a digital image, and wherein, based on the extraction of the printable area of the object, the digital image is placed on the printable area, and the three-dimensional model of the object together with the digital image placed on the object is subsequently generated.
6. The method according to any one of claims 1 to 5, wherein the step of receiving the two-dimensional digital image data further comprises receiving information on the three-dimensional size of the object, wherein the method further comprises a step of adapting the three-dimensional model using the information on the three-dimensional size of the object after the generation of the three-dimensional model, and preferably wherein the receipt of the information on the three-dimensional size of the object comprises a manual input by a user and/or an automatic receipt from an existing database and/or from an extraction of metadata from the two-dimensional digital image data.
7. The method according to any one of claims 1 to 6, wherein the method further comprises a step for storing the extracted data and/or properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model in a database.
8. The method according to claim 7, wherein the method further comprises a step of storing a unique ID relating the respective extracted data and/or the properties of the object and/or the received two-dimensional digital image data and/or the generated three-dimensional model with each other.
9. The method according to any one of claims 1 to 8, wherein receiving the two-dimensional digital image data comprises: receiving image data provided by a user or receiving the image data from an existing database.
10. The method according to any one of claims 1 to 9, wherein the extraction of predetermined data and/or properties from the digital image data using the CNN comprises: detection of the object in the digital image data; determination of end points and/or corner points of the object in the digital image data; and extraction of the object from the digital image data based on the detection of the object in the digital image data and the determination of end points and/or corner points of the object in the digital image data.
11. The method according to any one of claims 1 to 10, wherein generating a three-dimensional model comprises: training of the GAN or VAE to generate three-dimensional models from two-dimensional image data prior to the generation of the three-dimensional model based on the received two-dimensional digital image data, wherein the trained GAN or VAE is used for said generation.
12. The method according to any one of claims 1 to 11, wherein the method further comprises a step of sending the two-dimensional digital image data to the terminal device prior to sending the three-dimensional model of the three-dimensional object to the terminal device.
13. The method according to claim 12, wherein sending the three-dimensional model of the three-dimensional object to the terminal device is triggered by the terminal device.
14. A method for displaying a three-dimensional model in space on a terminal device by means of an augmented reality framework, the method comprising the following steps: the steps according to the method of one of claims 1 to 11, receiving the three-dimensional model on a terminal device, and displaying the three-dimensional model in space on the terminal device using an augmented reality framework.
15. A data processing apparatus comprising a processor configured to perform the steps of the method according to any one of claims 1 to 14.

Non-Patent Citations

JIAJUN WU ET AL.: "Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling", 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 2016.
LIANG-CHIEH CHEN ET AL.: "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, April 2018, pages 834-848, DOI: 10.1109/TPAMI.2017.2699184.
LILIAN WENG: "From GAN to WGAN", arXiv:1904.08994v1 [cs.LG], 18 April 2019.
SAHU, CHANDAN K. ET AL.: "Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: a review", International Journal of Production Research, vol. 59, no. 16, 28 December 2020, pages 4903-4959, DOI: 10.1080/00207543.2020.1859636.
SEBASTIAN LUNZ ET AL.: "Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data", arXiv:2002.12674v1 [cs.CV], 28 February 2020.