WO2022075533A1

WO2022075533A1 - Method of on-device generation and supplying wallpaper stream and computing device implementing the same

Info

Publication number: WO2022075533A1
Application number: PCT/KR2021/000224
Authority: WO
Inventors: Roman Evgenievich SUVOROV; Elizaveta Mikhailovna LOGACHEVA; Victor Sergeevich LEMPITSKY; Anton Evgenievich MASHIKHIN; Oleg Igorevich KHOMENKO
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2020-10-07
Filing date: 2021-01-08
Publication date: 2022-04-14
Also published as: RU2768551C1

Abstract

A method of on-device generation and supplying a computing device with wallpaper stream is disclosed. The method comprise: generating (S105), at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and setting (S110), at the computing device, the at least one first wallpaper as the wallpaper of the computing device.

Description

METHOD OF ON-DEVICE GENERATION AND SUPPLYING WALLPAPER STREAM AND COMPUTING DEVICE IMPLEMENTING THE SAME

The present invention relates to the field of artificial intelligence in general, and specifically, to a method of generating and supplying a wallpaper stream on a computing device using a deep generative neural network and to the computing device implementing the method.

Wallpapers are a big part of user experience for a variety of devices, including smartphones, smart TVs, laptops and so on. Currently, to obtain new aesthetically pleasant wallpapers regularly users may subscribe to online updates, i.e. services (like Unsplash) that regularly send new wallpapers to the user device via Internet connection. Disadvantages of such an approach comprise at least the need of Internet connection, and traffic and bandwidth consumption.

Generative neural networks are now capable of synthesizing highly realistic looking 2D images, 3D images and videos. Such networks can therefore be trained to generate realistic aesthetically pleasant wallpaper images. Once trained, many of these models can generate infinite number of highly diverse wallpapers (the most popular class of these models, namely generative adversarial networks) by taking a random high-dimensional vector as an input and generating a peculiar image for such vector. Plugging in a new vector would result in a substantially different image.

Proposed herein is the alternative invention that allows regularly update the wallpapers of the user device without using the Internet. Provided according to the first aspect of the present disclosure is a method of on-device generation and supplying a computing device with wallpaper stream, the method including: generating, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and setting, at the computing device, the at least one first wallpaper as the wallpaper of the computing device. Since the plausible wallpaper is synthesized on the computing device itself, i.e. not downloaded from Internet, the above disadvantages are eliminated and the claimed invention makes it possible to reduce/avoid traffic and bandwidth consumption that was needed in the prior art for downloading.

Provided according to the second aspect of the present disclosure is a computing device including a processor and a storage device storing the trained deep generative neural network for on-device generation and supplying a wallpaper stream by performing the method according to the first aspect, when the trained deep generative neural network is executed by the processor.

Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein.

Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein.

Figure 3 illustrates a structural diagram of a computing device according to an embodiment of the invention disclosed herein.

In the following description, unless otherwise described, the same reference numerals are used for the same elements when they are depicted in different drawings, and overlapping description thereof may be omitted.

Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein. The method comprise: generating S105, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance. Implied by said generating is artificial synthesis of a wallpaper with the deep generative neural network. Depending on a type of content in the collection of high-quality images/videos, which was used during the training phase of the deep generative neural network, the deep generative neural network, when trained with such a collection, is configured to generate wallpapers of similar content type. For example, when the collection of high-quality images includes landscape images, the deep generative neural network, when trained with such a collection, will be configured to generate landscape wallpapers and so on. However, the present invention should not be limited with landscape wallpapers, because a collection of high-quality images/videos of any other type of content may be used at the training phase of the deep generative neural network. Furthermore, if a collection of high-quality videos was used during the training phase of the deep generative neural network, the deep generative neural network, when trained with such a collection, will be configured to generate video wallpapers of a content type corresponding to a content type of the training collection of high-quality videos. Implied by the wallpaper stream is one or more images that could be static, dynamic or interactive and/or one or more videos. When the deep generative neural network is trained, the deep generative neural network including weighting factors and other parameters may be uploaded to the computing device in advance, i.e. before the inference phase. The deep generative neural network may be stored in a storage device such as a memory of the computing device. When the at least one first wallpaper is generated by the deep generative neural network, the method comprises the step of setting S110, at the computing device, the at least one first wallpaper as the wallpaper of the computing device. The generated wallpaper may be for any kind of user interface, for example, the generated wallpaper may be the wallpaper for the main desktop, for the lock screen, for the empty page of a browser and so on without limitation. The generated wallpaper may be used as a screensaver of the computing device.

As illustrated on Figure 1 the method further comprises the step of determining S115, whether a condition is met or not. This conditioning is used to determine whether it is necessary to update the first wallpaper with a second wallpaper. The condition includes, but is not limited to, one or more of the following: (i) a user input is received at the computing device, wherein the user input may represent whether a wallpaper currently set as the wallpaper of the computing device is disliked by a user of the computing device or not; (ii) a preconfigured period of time is elapsed; (iii) GPS location of the computing device is changed, wherein the GPS location of the computing device may be registered by a GPS unit comprised in the computing device. Based on dislikes the method may be adapted to generate another wallpaper if the currently set wallpaper is disliked by the user. From such like/dislike information, the system can learn to generate wallpapers that the user will like. If it is determined that the condition is met (i.e. YES at step S115) then the method updates the wallpaper currently set as the wallpaper of the computing device by performing the following steps: generating S120, at the computing device, at least one second wallpaper of the wallpaper stream using the deep generative neural network, setting S125, at the computing device, the at least one second wallpaper as the wallpaper of the computing device. The described wallpaper update may be performed automatically in the background. As an example, the user can enable a feature which generates a new wallpaper every morning. The terms ≪first≫ and ≪second≫ when applied to the term ≪wallpaper≫ are used to differentiate between different wallpapers and should not be construed as the terms representing any ordinal relationship between said wallpapers or between the steps of the method. The at least one second wallpaper differs from the at least one first wallpaper. In an alternative embodiment (not illustrated) of the method, the step of determining S115, whether a condition is met or not, may be performed before generating and setting each subsequent wallpaper, including the case where the step of determining S115 is performed before the steps S105 and S110 explained above. Thus, it should be clear that the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.

Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein. The method embodiment illustrated on Figure 2 differs from the method embodiment illustrated on Figure 1 in that it further comprises the steps of individualizing S95 the deep generative neural network for a user of the computing device using a random input as a parameter of the deep generative neural network thereby ensuring that the deep generative neural network is configured to generate unique wallpaper for the user of the computing device, and customizing S100 the deep generative neural network for the user of the computing device thereby ensuring that a wallpaper generated by the deep generative neural network is customized for the user. The steps S105-S125 illustrated on Figure 2 may be similar to the S105-S125 illustrated on Figure 1 except that the at least one first wallpaper and the at least one second wallpaper generated and set by the embodiment of Figure 2 are individualized and customized. Therefore, every user gets her/his own unique wallpaper every time.

The customization S100 may be based, but without the limitation, on one or more of the following customization parameters: one or more user preferences, one or more user inputs, one or more settings of the computing device, current time of the day, a current season of the year, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device, current weather and weather forecast, location and colors of icons and widgets on the device screen. Current location and colors of icons and widgets on the device screen may be detected and used in the disclosed method to synthesize such a wallpaper that is not mixed with the icons and widgets. In this case, current location and colors of icons and widgets on the device screen may be input to the deep generative neural network before generating/synthesizing a wallpaper as the corresponding parameters to effect an output of the deep generative neural network correspondingly. The method may further include (not illustrated) the steps of analyzing the user gallery content, inferring that the user is fond of mountain photography, and adapting the deep generative neural network to generate more images/videos/interactive wallpapers of mountains. Despite the fact that Figure 2 shows that the steps S95 and S100 are performed before the steps S105 and S110, it should not be construed as a limitation, because, if necessary, the steps may be ordered differently, for example, one or both of the steps S95 and S100 may be performed before generating and setting each subsequent wallpaper, including the case where one or both of the steps S95 and S100 is performed before the steps S120 and S125 explained above. Thus, it should be clear that the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.

The deep generative neural network is trained using adversarial learning process alongside with one or more discriminator networks. The training is performed on a high-end computer or a computational cluster on a large dataset of wallpaper-quality images and/or videos. The deep generative neural network may have one or more of the following variables: vectorial variables, latent variables shaped as a two-dimensional matrix or a set of two-dimensional matrices. In an embodiment, latent variables may be drawn from unit normal distributions. The customization may be accomplished by a separate encoder network that is trained to map the customization parameters to the parameters of the latent space distributions such as the mean and the covariance of the Gaussian normal distribution, from which the latent variables are drawn for the deep generative neural network. When user-sensitive information such as, for example, one or more user preferences, one or more user inputs, one or more settings of the computing device, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device is processed as the customization parameter(s), it should be clear that such user-sensitive information is not compromised, because the entire processing of said user-sensitive information is carried out by a processor of the user`s computing device and by the separate encoder network and the deep generative neural network, which are stored in the storage device of the user`s computing device. In other words, advantageously, the user-sensitive information does not leave the computing device for processing.

In an alternative embodiment (not illustrated) of the method the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and restyling the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the restyled image as the at least one first wallpaper. In still another embodiment (not illustrated) of the method the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and animating the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the animated image as the at least one first wallpaper. Because there are no constraints on the bandwidth and acquiring the content is essentially free, dynamic wallpapers (videos at high resolution) can be generated with an appropriate model. Apart from dynamic wallpapers, interactive wallpapers can be generated. A smartphone wallpaper can change appearance based on user swipes, phone tilts or some interface events. As an example, swiping to a different tab in Android screen can change something in an image (e.g. move clouds in the image in the swipe direction). According to another alternative embodiment of the disclosed method the deep generative neural network may be adapted to generate not only realistic and plausible images but also hyper-realistic images that can have e.g. exaggerated features, such as extremely saturated sunset colors, exaggerated geometric proportions of objects (trees, buildings) etc. Some users may like and prefer such hyper-realistic wallpapers. Most generative models (e.g. adversarially trained generative neural networks) can allow each user to set her/his own preferred tradeoff between realism and hyperrealism. In still another embodiment of the method the step of generating S105, S120 at least one wallpaper further comprises the step of applying super resolution to the synthesized image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the image having super-resolution as the at least one first wallpaper. Particular techniques of restyling an image, applying super-resolution or hyper-resolution to an image are know in the art.

Figure 3 illustrates a structural diagram of a computing device 200 according to an embodiment of the invention disclosed herein. The computing device (200) comprises a processor 205 and a storage device 210. The processor 205 is configured to perform processing and computing tasks related to the operation of the computing device and to the operation on the method disclosed herein. The storage device stores the trained deep generative neural network 210.1 for on-device generation and supplying a wallpaper stream by performing the disclosed, when the trained deep generative neural network 210.1 is executed by the processor 205. The storage device 210 may also store processor-executable instructions to cause the processor to perform any one or more of the above described method steps. The processor 205 and the storage device 210 may be operatively interconnected. The processor 205 and the storage device 210 may be also connected with other components (not illustrated) of the computing device. The other component may include, but without the limitation, one or more of a display, a touchscreen, a keyboard, a keypad, a communication unit, a speaker, a microphone, a camera, Bluetooth-unit, NFC (Near-Field Communication) unit, RF(Radio-Frequency) unit, GPS unit, I/O means, as well as necessary wirings and interconnections and so on. The processor 205 may be implemented, but without the limitation, a general-purpose processor, an application-specific integrated circuit (ASIC), a user-programmable gate array (FPGA), or a system-on-chip (SoC). The storage device 210 may include, but without the limitation, RAM, ROM and so on. Thus, the computing device may be, but without the limitation, the user`s computing device being one of a smartphone, a tablet, a laptop, a notebook, smart TV, in-vehicle infotainment system and so on.

Any piece of the following information should not be construed as a limitation on the present disclosure. Instead, the following information is provided below to enable the skilled person to practice the embodiments disclosed herein and to prove that the disclosure is sufficient. Any particular values of any parameters specified below are to be construed non-limiting.

Model architecture. The architecture of the model may be based on StyleGAN. The model outputs images of resolution 256 Х 256 (or 512 Х 512) and has four sets of latent variables:

- a vector

, which encodes colors and the general scene layout;

- a vector

, which encodes global lighting (e.g. time of day);

- a set

of square matrices

, which encode shapes and details of static objects at N = 7 different resolutions between 4 Х 4 and 256 Х 256 ( N = 8 for 512 Х 512);

- a set

of square matrices

, which encode shapes and details of dynamic objects at the corresponding resolutions.

The generator has two components: the multilayer perceptron M and the convolutional generator G. The perceptron M takes the concatenated vector

and transforms it to the style vector

. The convolutional generator G has N = 7 (or 8) blocks. Within each block, a convolution is followed by two elementwise additions of two tensors obtained from

and

by a learnable per-channel scaling. Finally, the AdaIN transform is applied using per-channel scales and biases obtained from

using learnable linear transform. Within each block, this sequence of steps is repeated twice followed by upsampling and convolution layers.

Below, the following set of input latent variables will be referred to

as original inputs (or original latents). As in StyleGAN, the convolutional generator may use separate

vectors at each of the resolution (style mixing). The set of all style vectors will be referred to as

. Finally, the set of all spatial random inputs of the generator will be denoted as

.

Learning the model. The model is trained from two sources of data, the dataset of static scenery images

and the dataset of timelapse scenery videos

. It is relatively easy to collect a large static dataset, while with authors best efforts a few hundreds of videos that do not cover all the diversity of landscapes were collected. Thus, both sources of data may be utilized in order to build a model having improved performance. To do that, the proposed generative model (the deep generative neural network) is trained in an adversarial way with two different discriminators.

The static discriminator

has the same architecture and design choices as in StyleGAN. It observes images

from as real, while the fake samples are generated by the model. The pairwise discriminator

looks at pairs of images. It duplicates the architecture of

except first convolutional block that is applied separately to each frame. A real pair of images is obtained by sampling a video from

, and then sampling two random frames (arbitrary far from each other) from it. A fake pair is obtained by sampling common static latents

and

, and then individual dynamic latents

,

, and

,

. The two images are then obtained as

and

. All samples are drawn from unit normal distributions.

The model is trained within GAN approach with non-saturating loss with R1 regularization. During each update of the generator, a batch of fake images to which the static discriminator is applied is sampled or a batch of image pairs to which the pairwise discriminator is applied is sampled. The proportions of the static discriminator and the pairwise discriminator are selected from 0.5/0.5 to 0.9/0.1 respectively over each resolution transition phase and then kept fixed at 0.1. This helps the generator to learn disentangle static and dynamic latents early for each resolution and prevents the pairwise generator from overfitting to relatively small video dataset used for training.

During learning, the objective of the pairwise discriminator is to focus on the inconsistencies within each pair, and the objective of the static discriminator is to focus on visual quality. Furthermore, since the pairwise discriminator only sees real frames sampled from a limited number of videos, it may prone overfit to this limited set and effectively stop contributing to the learning process (while the static discriminator, which observes more diverse set of scenes, keeps improving the diversity of the model). It turns out both problems (focus on image quality rather than pairwise consistency, overfitting to limited diversity of videos) can be solved with a simple method. This method consists in that the fake set of frames are augmented with pairs of crops taken from same video frame, but from different locations. Since these crops have the same visual quality as the images in real frames, and since they come from the same videos as images within real pairs, the pairwise discriminator effectively stops paying attention to image quality, cannot simply overfit to the statistics of scenes in the video dataset, and has to focus on finding pairwise inconsistencies within fake pairs. This crop sampling method may be used to improve the quality of the model significantly.

Sampling videos from the model. The model does not attempt to learn full temporal dynamics of videos, and instead focuses on pairwise consistency of frames that are generated when the dynamic latent variables are resampled. In particular, the pairwise discriminator in the model does not sample real frames sequentially. The sampling procedure for fake pairs does not try to generate adjacent frames either. One of the reasons why the described learning phase does not attempt to learn continuity, is because the training dataset contains videos of widely-varying temporal rates, making the notion of temporal adjacency for a pair of frames effectively meaningless.

Because of this, the disclosed generation process is agnostic to a model of motion. The generator is forced to produce plausible frames regardless of

and

changes. In the experiments it was found that a simple model of motion described below is enough to produce compelling videos. Specifically, to sample a video, it is possible to sample a single static vector

from the unit normal distribution and then interpolate the dynamic latent vector between two unit normally distributed samples

and

. For the spatial maps, again it is possible to sample

and

from a unit normal distribution and then warp the

tensor continuously using a homography transform parameterized by displacements of two upper corners and two points at the horizon. The direction of the homogrpahy is sampled randomly, speed was chosen to match the average speed of clouds in the training dataset. The homography is flipped vertically for positions below the horizon to mimic the reflection process. To obtain

, it is possible to make a composition of

identical transforms and then apply it to

. As the latent variables are interpolated/warped, the latent variables are passed through the trained model to obtain the smooth videos. It is necessary to note that the described model requires no image-specific user input.

Animating Real Scenery Images with the Model. Inference. To animate a given scenery image

, a set of latent variables that produce such an image within the generator is inferred. Extended latents

and

are looked for, so that

. After that, the same procedure as described above to animate the given image may be applied.

The latent space of the generator is highly redundant, and to obtain good animation, it has to be ensured that the latent variables come roughly from the same distribution as during the training of the model (most importantly,

should belong to the output manifold of

). Without such prior, the latent variables that generate good reconstruction might still result in implausible animation (or lack of it). Therefore inference may be performed using the following three-step procedure:

1. Step 1: predicting a set of style vectors

using a feedforward encoder network E. The encoder has ResNet-152 architecture and is trained on 200000 synthetic images with mean absolute error loss.

is predicted by two-layer perceptron with ReLU from the concatenation of features from several levels of ResNet, aggregated by global average pooling.

2. Step 2: starting from

and zero

, all latents are optimized to improve reconstruction error. In addition, the deviation of

from the predicted

(with coefficient 0.01) and the deviation of

from zero (by reducing learning rate) are penalized. Optimization is performed for up to 500 steps with Adam and large initial learning rate (0.1), which is halved each time the loss does not improve for 20 iterations. A variant of the method, which was evaluated separately, uses a binary segmentation mask obtained by ADE20k-pretrained segmentation network. The mask identifies dynamic (sky+water) and remaining (static) parts of the scene. In this variant,

(respectively

) are kept at zero for dynamic (respectively, static) parts of the image.

3. Step 3: freezing latents and fine-tuning the weights of

to further drive down the reconstruction error. This step is needed since even after optimization, the gap between the reconstruction and the input image remains. During this fine-tuning, the combination of the per-pixel mean absolute error and the perceptual loss are minimized, with much larger (10Х) weight for the latter. 500 steps with ADAM and lr = 0,001 are performed.

Lighting manipulation. During training of the model,

is used to map

to

.

is resampled in order to take into account variations of lighting, weather changes, etc. and to have

describe only static attributes (land, buildings, horizon shape, etc.). To change lighting in a real image, one has to change

and then use MLP to obtain new styles

. The described inference procedure, however, outputs

and it has been found that it is very difficult to invert

and obtain

.

To tackle this problem, a separate neural network,

, is trained to approximate local dynamics of

. Let

and

, A is optimized as follows:

, where

is coefficient of interpolation between

and

. Thus, c = 0 corresponds to

, so

; c = 1 corresponds to

, so .

.

This is implemented by the combination of L1-loss

and relative direction loss

. The total optimization criterion is

.

is trained with ADAM until convergence. At test time, the network

makes it possible to sample a random target

and update

towards it by increasing the interpolation coefficient

as the animation progresses.

Super Resolution (SR). As the models are trained at medium resolution (e.g. 256Х256), it becomes possible to bring fine details from a given image that is subject to animation through a separate super-resolution procedure. The main idea of the super resolution approach is to borrow as much as possible from the original high-res image (which is downsampled for animation via

). To achieve that, the animation is super-resolved and blended with the original image using a standard image super resolution approach. It is possible to use ESRGANx4 trained on a dedicated dataset that is created as follows. To obtain the (hi-res, low-res) pair, a frame I is taken from the video dataset as a hi-res image, the frame is then downsampled and the first two steps of inference are run and an (imperfect) low-res image is obtained. Thus, the network is trained on a more complex task than super resolution.

After obtaining the super-resolved video, dynamic parts (sky and water) are transferred from it to the final result. The static parts are obtained by running the guided filter on the super-resolved frames while using the input high-res image as a guide. Such procedure effectively transfers high-res details from the input, while retaining the lighting change induced by lighting manipulation.

At least one of the plurality of modules, units, components, steps, sub-steps may be implemented through an AI model. A function associated with AI may be performed through the non-volatile memory, the volatile memory, and the processor. The processor may include one or a plurality of processors. At this time, one or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU). The one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning. Here, being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI model of a desired characteristic is made. The learning may be performed in a device itself in which AI according to an embodiment is performed, and/or may be implemented through a separate server/system.

The AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and performs a layer operation through calculation of a previous layer and an operation of a plurality of weights. Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), restricted Boltzmann Machine (RBM), deep belief network (DBN), bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), and deep Q-networks. The learning algorithm is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.

It should be expressly understood that not all technical effects mentioned herein need to be enjoyed in each and every embodiment of the present technology. For example, embodiments of the present technology may be implemented without the user enjoying some of these technical effects, while other embodiments may be implemented with the user enjoying other technical effects or none at all.

Modifications and improvements to the above-described implementations of the present technology may become apparent to those skilled in the art. The foregoing description is intended to be exemplary rather than limiting. The scope of the present technology is therefore intended to be limited solely by the scope of the appended claims.

While the above-described implementations have been described and shown with reference to particular steps performed in a particular order, it will be understood that these steps may be combined, sub-divided, or reordered without departing from the teachings of the present technology. Accordingly, the order and grouping of the steps is not a limitation of the present technology.

Claims

A method of on-device generation and supplying a computing device with wallpaper stream, comprising:

generating (S105), at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and

setting (S110), at the computing device, the at least one first wallpaper as the wallpaper of the computing device.
The method of claim 1, further comprising the following steps performed upon a condition is met (S115):

generating (S120), at the computing device, at least one second wallpaper of the wallpaper stream using the deep generative neural network, wherein the at least one second wallpaper differs from the at least one first wallpaper;

setting (S125), at the computing device, the at least one second wallpaper as the wallpaper of the computing device.
The method of claim 2, wherein the condition includes one or more of the following conditions:

a user input is received at the computing device, wherein the user input represents whether a wallpaper currently set as the wallpaper of the computing device is disliked by a user of the computing device or not;

a preconfigured period of time is elapsed;

GPS location of the computing device is changed, wherein the GPS location of the computing device is registerable by a GPS unit comprised in the computing device.
The method of claim 1, further comprising one or both of the following steps:

individualizing (S95) the deep generative neural network for a user of the computing device using a random input as a parameter of the deep generative neural network thereby ensuring that the deep generative neural network is configured to generate unique wallpaper for the user of the computing device;

customizing (S100) the deep generative neural network for the user of the computing device thereby ensuring that a wallpaper generated by the deep generative neural network is customized for the user, the customization is based on one or more of the following customization parameters: one or more user preferences, one or more user inputs, one or more settings of the computing device, current time of the day, a current season of the year, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device, current weather and weather forecast, location and colors of icons and widgets on the device screen.
The method of claim 1 or 2, wherein the wallpaper generated by the deep generative neural network is static, dynamic or interactive.
The method of claim 1, wherein the deep generative neural network has one or more of the following variables: vectorial variables, latent variables shaped as a two-dimensional matrix or a set of two-dimensional matrices.
The method of claim 6, wherein latent variables are drawn from unit normal distributions.
The method of any one of claims 4 or 6, wherein the customization is accomplished by a separate encoder network that is trained to map the customization parameters to the parameters of the latent space distributions such as the mean and the covariance of the Gaussian normal distribution, from which latent variables are drawn.
The method of claim 1 or 2, wherein the step of generating at least one wallpaper further comprises the steps of synthesizing an image and restyling the image, and

the step of setting the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the restyled image as the at least one first wallpaper.
The method of claim 1 or 2, wherein the step of generating at least one wallpaper further comprises the steps of synthesizing an image and animating the image, and

the step of setting the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the animated image as the at least one first wallpaper.
The method of claim 6, wherein the deep generative neural network is trained using adversarial learning process alongside with one or more discriminator networks.
A computing device (200) comprising a processor (205) and a storage device (210) storing the trained deep generative neural network (210.1) for on-device generation and supplying a wallpaper stream by performing the method of any one of claims 1-11, when the trained deep generative neural network (210.1) is executed by the processor (205).
The computing device (200) of claim 12, wherein the computing device (200) comprises the user-computing device being one of the following: a smartphone, a tablet, a laptop, a notebook, smart TV.