WO2022075533A1 - Method of on-device generation and supplying wallpaper stream and computing device implementing the same - Google Patents

Method of on-device generation and supplying wallpaper stream and computing device implementing the same Download PDF

Info

Publication number
WO2022075533A1
Authority
WO
WIPO (PCT)
Prior art keywords
wallpaper
computing device
neural network
generative neural
user
Prior art date
Application number
PCT/KR2021/000224
Other languages
French (fr)
Inventor
Roman Evgenievich SUVOROV
Elizaveta Mikhailovna LOGACHEVA
Victor Sergeevich LEMPITSKY
Anton Evgenievich MASHIKHIN
Oleg Igorevich KHOMENKO
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2022075533A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/001Texturing; Colouring; Generation of texture or colour

Definitions

  • the present invention relates to the field of artificial intelligence in general, and specifically, to a method of generating and supplying a wallpaper stream on a computing device using a deep generative neural network and to the computing device implementing the method.
  • Wallpapers are a big part of user experience for a variety of devices, including smartphones, smart TVs, laptops and so on.
  • users may subscribe to online updates, i.e. services (like Unsplash) that regularly send new wallpapers to the user device via an Internet connection.
  • Disadvantages of such an approach comprise at least the need for an Internet connection and the associated traffic and bandwidth consumption.
  • Generative neural networks are now capable of synthesizing highly realistic-looking 2D images, 3D images and videos. Such networks can therefore be trained to generate realistic, aesthetically pleasant wallpaper images. Once trained, many of these models (most notably generative adversarial networks, the most popular class of such models) can generate an infinite number of highly diverse wallpapers by taking a random high-dimensional vector as an input and generating a distinct image for that vector. Plugging in a new vector results in a substantially different image.
  • Proposed herein is an alternative that allows the wallpapers of the user device to be updated regularly without using the Internet.
  • a method of on-device generation and supplying a computing device with wallpaper stream including: generating, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and setting, at the computing device, the at least one first wallpaper as the wallpaper of the computing device. Since a plausible wallpaper is synthesized on the computing device itself, i.e. not downloaded from the Internet, the above disadvantages are eliminated and the claimed invention makes it possible to reduce/avoid the traffic and bandwidth consumption that was needed in the prior art for downloading.
  • a computing device including a processor and a storage device storing the trained deep generative neural network for on-device generation and supplying a wallpaper stream by performing the method according to the first aspect, when the trained deep generative neural network is executed by the processor.
  • Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein.
  • Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein.
  • Figure 3 illustrates a structural diagram of a computing device according to an embodiment of the invention disclosed herein.
  • Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein.
  • the method comprises: generating S105, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance. Implied by said generating is artificial synthesis of a wallpaper with the deep generative neural network.
  • the deep generative neural network, when trained with such a collection, is configured to generate wallpapers of a similar content type.
  • the deep generative neural network, when trained with such a collection, will be configured to generate landscape wallpapers and so on.
  • the present invention should not be limited to landscape wallpapers, because a collection of high-quality images/videos of any other type of content may be used at the training phase of the deep generative neural network.
  • the deep generative neural network, when trained with such a collection, will be configured to generate video wallpapers of a content type corresponding to the content type of the training collection of high-quality videos. Implied by the wallpaper stream is one or more images that could be static, dynamic or interactive and/or one or more videos.
  • the deep generative neural network including weighting factors and other parameters may be uploaded to the computing device in advance, i.e. before the inference phase.
  • the deep generative neural network may be stored in a storage device such as a memory of the computing device.
  • the method comprises the step of setting S110, at the computing device, the at least one first wallpaper as the wallpaper of the computing device.
  • the generated wallpaper may be for any kind of user interface, for example, the generated wallpaper may be the wallpaper for the main desktop, for the lock screen, for the empty page of a browser and so on without limitation.
  • the generated wallpaper may be used as a screensaver of the computing device.
  • the method further comprises the step of determining S115, whether a condition is met or not.
  • This conditioning is used to determine whether it is necessary to update the first wallpaper with a second wallpaper.
  • the condition includes, but is not limited to, one or more of the following: (i) a user input is received at the computing device, wherein the user input may represent whether a wallpaper currently set as the wallpaper of the computing device is disliked by a user of the computing device or not; (ii) a preconfigured period of time has elapsed; (iii) the GPS location of the computing device has changed, wherein the GPS location of the computing device may be registered by a GPS unit comprised in the computing device.
  • the method may be adapted to generate another wallpaper if the currently set wallpaper is disliked by the user. From such like/dislike information, the system can learn to generate wallpapers that the user will like. If it is determined that the condition is met (i.e. YES at step S115), then the method updates the wallpaper currently set as the wallpaper of the computing device by performing the following steps: generating S120, at the computing device, at least one second wallpaper of the wallpaper stream using the deep generative neural network, and setting S125, at the computing device, the at least one second wallpaper as the wallpaper of the computing device.
  • the described wallpaper update may be performed automatically in the background. As an example, the user can enable a feature which generates a new wallpaper every morning.
  • the terms "first" and "second", when applied to the term "wallpaper", are used to differentiate between different wallpapers and should not be construed as terms representing any ordinal relationship between said wallpapers or between the steps of the method.
  • the at least one second wallpaper differs from the at least one first wallpaper.
  • the step of determining S115, whether a condition is met or not may be performed before generating and setting each subsequent wallpaper, including the case where the step of determining S115 is performed before the steps S105 and S110 explained above.
  • the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.
  • Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein.
  • the method embodiment illustrated on Figure 2 differs from the method embodiment illustrated on Figure 1 in that it further comprises the steps of individualizing S95 the deep generative neural network for a user of the computing device using a random input as a parameter of the deep generative neural network thereby ensuring that the deep generative neural network is configured to generate unique wallpaper for the user of the computing device, and customizing S100 the deep generative neural network for the user of the computing device thereby ensuring that a wallpaper generated by the deep generative neural network is customized for the user.
  • the steps S105-S125 illustrated on Figure 2 may be similar to the steps S105-S125 illustrated on Figure 1, except that the at least one first wallpaper and the at least one second wallpaper generated and set by the embodiment of Figure 2 are individualized and customized. Therefore, every user gets her/his own unique wallpaper every time.
  • the customization S100 may be based, but without the limitation, on one or more of the following customization parameters: one or more user preferences, one or more user inputs, one or more settings of the computing device, current time of the day, a current season of the year, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device, current weather and weather forecast, location and colors of icons and widgets on the device screen. Current location and colors of icons and widgets on the device screen may be detected and used in the disclosed method to synthesize such a wallpaper that is not mixed with the icons and widgets.
  • the current location and colors of icons and widgets on the device screen may be input to the deep generative neural network before generating/synthesizing a wallpaper as the corresponding parameters to influence the output of the deep generative neural network accordingly.
  • the method may further include (not illustrated) the steps of analyzing the user gallery content, inferring that the user is fond of mountain photography, and adapting the deep generative neural network to generate more images/videos/interactive wallpapers of mountains.
  • Figure 2 shows that the steps S95 and S100 are performed before the steps S105 and S110, it should not be construed as a limitation, because, if necessary, the steps may be ordered differently, for example, one or both of the steps S95 and S100 may be performed before generating and setting each subsequent wallpaper, including the case where one or both of the steps S95 and S100 is performed before the steps S120 and S125 explained above.
  • the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.
  • the deep generative neural network is trained using an adversarial learning process together with one or more discriminator networks.
  • the training is performed on a high-end computer or a computational cluster on a large dataset of wallpaper-quality images and/or videos.
  • the deep generative neural network may have one or more of the following variables: vectorial variables, latent variables shaped as a two-dimensional matrix or a set of two-dimensional matrices.
  • latent variables may be drawn from unit normal distributions.
  • the customization may be accomplished by a separate encoder network that is trained to map the customization parameters to the parameters of the latent space distributions such as the mean and the covariance of the Gaussian normal distribution, from which the latent variables are drawn for the deep generative neural network.
  • user-sensitive information such as, for example, one or more user preferences, one or more user inputs, one or more settings of the computing device, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device is processed as the customization parameter(s)
  • user-sensitive information is not compromised, because the entire processing of said user-sensitive information is carried out by a processor of the user's computing device and by the separate encoder network and the deep generative neural network, which are stored in the storage device of the user's computing device.
  • the user-sensitive information does not leave the computing device for processing.
  • the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and restyling the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the restyled image as the at least one first wallpaper.
  • the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and animating the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the animated image as the at least one first wallpaper.
  • dynamic wallpapers (videos at high resolution) can be generated.
  • interactive wallpapers can be generated as well.
  • a smartphone wallpaper can change appearance based on user swipes, phone tilts or some interface events.
  • swiping to a different tab in Android screen can change something in an image (e.g. move clouds in the image in the swipe direction).
  • the deep generative neural network may be adapted to generate not only realistic and plausible images but also hyper-realistic images that can have e.g. exaggerated features, such as extremely saturated sunset colors, exaggerated geometric proportions of objects (trees, buildings) etc.
  • the step of generating S105, S120 at least one wallpaper further comprises the step of applying super resolution to the synthesized image
  • the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the image having super-resolution as the at least one first wallpaper.
  • Particular techniques of restyling an image and applying super-resolution or hyper-resolution to an image are known in the art.
  • FIG. 3 illustrates a structural diagram of a computing device 200 according to an embodiment of the invention disclosed herein.
  • the computing device (200) comprises a processor 205 and a storage device 210.
  • the processor 205 is configured to perform processing and computing tasks related to the operation of the computing device and to the operation on the method disclosed herein.
  • the storage device stores the trained deep generative neural network 210.1 for on-device generation and supplying a wallpaper stream by performing the disclosed, when the trained deep generative neural network 210.1 is executed by the processor 205.
  • the storage device 210 may also store processor-executable instructions to cause the processor to perform any one or more of the above described method steps.
  • the processor 205 and the storage device 210 may be operatively interconnected.
  • the processor 205 and the storage device 210 may be also connected with other components (not illustrated) of the computing device.
  • the other component may include, but without the limitation, one or more of a display, a touchscreen, a keyboard, a keypad, a communication unit, a speaker, a microphone, a camera, Bluetooth-unit, NFC (Near-Field Communication) unit, RF(Radio-Frequency) unit, GPS unit, I/O means, as well as necessary wirings and interconnections and so on.
  • the processor 205 may be implemented as, but without limitation, a general-purpose processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a system-on-chip (SoC).
  • ASIC application-specific integrated circuit
  • FPGA field-programmable gate array
  • SoC system-on-chip
  • the storage device 210 may include, but without the limitation, RAM, ROM and so on.
  • the computing device may be, but without limitation, the user's computing device, being one of a smartphone, a tablet, a laptop, a notebook, a smart TV, an in-vehicle infotainment system and so on.
  • Model architecture: the architecture of the model may be based on StyleGAN.
  • the model outputs images of resolution 256 × 256 (or 512 × 512) and has four sets of latent variables: a static vector z_static, a dynamic vector z_dynamic, and two sets of spatial maps N_static and N_dynamic.
  • the generator has two components: the multilayer perceptron M and the convolutional generator G.
  • the perceptron M takes the concatenated vector (z_static, z_dynamic) and transforms it to the style vector w.
  • the convolutional generator may use separate w vectors at each resolution (style mixing).
  • the set of all style vectors will be referred to as W.
  • the set of all spatial random inputs of the generator will be denoted as N_sp.
  • the model is trained from two sources of data: the dataset of static scenery images D_img and the dataset of timelapse scenery videos D_vid. It is relatively easy to collect a large static dataset, while with the authors' best efforts only a few hundred videos, which do not cover all the diversity of landscapes, were collected. Thus, both sources of data may be utilized in order to build a model having improved performance. To do that, the proposed generative model (the deep generative neural network) is trained in an adversarial way with two different discriminators.
  • the static discriminator D_static has the same architecture and design choices as in StyleGAN. It observes images from D_img as real, while the fake samples are generated by the model.
  • the pairwise discriminator D_pair looks at pairs of images. It duplicates the architecture of D_static except for the first convolutional block, which is applied separately to each frame.
  • a real pair of images is obtained by sampling a video from D_vid, and then sampling two random frames (arbitrarily far from each other) from it.
  • a fake pair is obtained by sampling common static latents z_static and N_static, and then individual dynamic latents z_dynamic^1, N_dynamic^1 and z_dynamic^2, N_dynamic^2. The two images are then obtained as G(M(z_static, z_dynamic^1), N_static, N_dynamic^1) and G(M(z_static, z_dynamic^2), N_static, N_dynamic^2). All samples are drawn from unit normal distributions.
  • the model is trained within the GAN approach with a non-saturating loss and R1 regularization.
  • during each update of the generator, either a batch of fake images to which the static discriminator is applied is sampled, or a batch of image pairs to which the pairwise discriminator is applied is sampled.
  • the proportions of the static discriminator and the pairwise discriminator are varied from 0.5/0.5 to 0.9/0.1 respectively over each resolution transition phase, and then the pairwise proportion is kept fixed at 0.1. This helps the generator to learn to disentangle static and dynamic latents early for each resolution and prevents the pairwise discriminator from overfitting to the relatively small video dataset used for training.
  • the objective of the pairwise discriminator is to focus on the inconsistencies within each pair, and the objective of the static discriminator is to focus on visual quality. Furthermore, since the pairwise discriminator only sees real frames sampled from a limited number of videos, it may be prone to overfitting to this limited set and effectively stop contributing to the learning process (while the static discriminator, which observes a more diverse set of scenes, keeps improving the diversity of the model). It turns out both problems (focus on image quality rather than pairwise consistency, and overfitting to the limited diversity of videos) can be solved with a simple method: the fake set of pairs is augmented with pairs of crops taken from the same video frame, but from different locations.
  • when sampling videos from the model, the model does not attempt to learn the full temporal dynamics of videos, and instead focuses on pairwise consistency of frames that are generated when the dynamic latent variables are resampled.
  • the pairwise discriminator in the model does not sample real frames sequentially.
  • the sampling procedure for fake pairs does not try to generate adjacent frames either.
  • One of the reasons why the described learning phase does not attempt to learn continuity is that the training dataset contains videos of widely varying temporal rates, making the notion of temporal adjacency for a pair of frames effectively meaningless.
  • the disclosed generation process is agnostic to a model of motion.
  • the generator is forced to produce plausible frames regardless of changes in z_dynamic and N_dynamic.
  • a simple model of motion described below is enough to produce compelling videos.
  • to sample a video, it is possible to sample a single static vector z_static from the unit normal distribution and then interpolate the dynamic latent vector between two unit normally distributed samples z_dynamic^1 and z_dynamic^2.
  • For the spatial maps, it is again possible to sample N_static and N_dynamic^1 from a unit normal distribution and then warp the N_dynamic tensor continuously using a homography transform parameterized by displacements of the two upper corners and two points at the horizon. The direction of the homography is sampled randomly, and the speed was chosen to match the average speed of clouds in the training dataset.
  • the homography is flipped vertically for positions below the horizon to mimic the reflection process. To obtain the dynamic maps at frame t, it is possible to compose t identical transforms and then apply the composition to N_dynamic^1. As the latent variables are interpolated/warped, they are passed through the trained model to obtain the smooth videos. It is necessary to note that the described model requires no image-specific user input.
  • the latent space of the generator is highly redundant, and to obtain a good animation, it has to be ensured that the latent variables come roughly from the same distribution as during the training of the model (most importantly, the extended styles W* should belong to the output manifold of M). Without such a prior, latent variables that give a good reconstruction might still result in an implausible animation (or a lack of it). Therefore, inference may be performed using the following three-step procedure:
  • Step 1: predicting a set of style vectors W_pred using a feedforward encoder network E.
  • the encoder has a ResNet-152 architecture and is trained on 200000 synthetic images with a mean absolute error loss. W_pred is predicted by a two-layer perceptron with ReLU from the concatenation of features from several levels of the ResNet, aggregated by global average pooling.
  • Step 2: starting from W_pred and zero spatial maps N_sp, all latents are optimized to reduce the reconstruction error. In addition, the deviation of W* from the predicted W_pred (with coefficient 0.01) and the deviation of N_sp* from zero (by reducing the learning rate) are penalized. Optimization is performed for up to 500 steps with Adam and a large initial learning rate (0.1), which is halved each time the loss does not improve for 20 iterations.
  • a variant of the method, which was evaluated separately, uses a binary segmentation mask obtained by an ADE20k-pretrained segmentation network. The mask identifies the dynamic (sky+water) and the remaining (static) parts of the scene. In this variant, the static (respectively, dynamic) spatial maps are kept at zero for the dynamic (respectively, static) parts of the image.
  • Lighting manipulation: during training of the model, the perceptron M is used to map (z_static, z_dynamic) to w. z_dynamic is resampled in order to take into account variations of lighting, weather changes, etc., and to have z_static describe only static attributes (land, buildings, horizon shape, etc.). To change lighting in a real image, one has to change z_dynamic and then use the MLP M to obtain new styles w.
  • however, the described inference procedure outputs the extended styles W*, and it has been found that it is very difficult to invert M and obtain the original latents (z_static, z_dynamic).
  • SR Super Resolution
  • medium resolution, e.g. 256 × 256
  • the main idea of the super resolution approach is to borrow as much as possible from the original high-res image (which is downsampled for animation).
  • the animation is super-resolved and blended with the original image using a standard image super resolution approach. It is possible to use ESRGANx4 trained on a dedicated dataset that is created as follows.
  • a frame I is taken from the video dataset as a hi-res image, the frame is then downsampled and the first two steps of inference are run and an (imperfect) low-res image is obtained.
  • the network is trained on a more complex task than super resolution.
  • At least one of the plurality of modules, units, components, steps, sub-steps may be implemented through an AI model.
  • a function associated with AI may be performed through the non-volatile memory, the volatile memory, and the processor.
  • the processor may include one or a plurality of processors.
  • one or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU).
  • CPU central processing unit
  • AP application processor
  • GPU graphics-only processing unit
  • VPU visual processing unit
  • NPU neural processing unit
  • the one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory.
  • the predefined operating rule or artificial intelligence model is provided through training or learning.
  • being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI model of a desired characteristic is made.
  • the learning may be performed in a device itself in which AI according to an embodiment is performed, and/or may be implemented through a separate server/system.
  • the AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values and performs a layer operation by applying its plurality of weight values to the calculation result of the previous layer.
  • Examples of neural networks include, but are not limited to, convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), restricted Boltzmann Machine (RBM), deep belief network (DBN), bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), and deep Q-networks.
  • the learning algorithm is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction.
  • Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A method of on-device generation and supplying a computing device with wallpaper stream is disclosed. The method comprises: generating (S105), at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and setting (S110), at the computing device, the at least one first wallpaper as the wallpaper of the computing device.

Description

METHOD OF ON-DEVICE GENERATION AND SUPPLYING WALLPAPER STREAM AND COMPUTING DEVICE IMPLEMENTING THE SAME
The present invention relates to the field of artificial intelligence in general, and specifically, to a method of generating and supplying a wallpaper stream on a computing device using a deep generative neural network and to the computing device implementing the method.
Wallpapers are a big part of the user experience for a variety of devices, including smartphones, smart TVs, laptops and so on. Currently, to obtain new aesthetically pleasant wallpapers regularly, users may subscribe to online updates, i.e. services (like Unsplash) that regularly send new wallpapers to the user device via an Internet connection. Disadvantages of such an approach comprise at least the need for an Internet connection and the associated traffic and bandwidth consumption.
Generative neural networks are now capable of synthesizing highly realistic-looking 2D images, 3D images and videos. Such networks can therefore be trained to generate realistic, aesthetically pleasant wallpaper images. Once trained, many of these models (most notably generative adversarial networks, the most popular class of such models) can generate an infinite number of highly diverse wallpapers by taking a random high-dimensional vector as an input and generating a distinct image for that vector. Plugging in a new vector results in a substantially different image.
Proposed herein is an alternative that allows the wallpapers of the user device to be updated regularly without using the Internet. Provided according to the first aspect of the present disclosure is a method of on-device generation and supplying a computing device with wallpaper stream, the method including: generating, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and setting, at the computing device, the at least one first wallpaper as the wallpaper of the computing device. Since a plausible wallpaper is synthesized on the computing device itself, i.e. not downloaded from the Internet, the above disadvantages are eliminated and the claimed invention makes it possible to reduce/avoid the traffic and bandwidth consumption that was needed in the prior art for downloading.
Provided according to the second aspect of the present disclosure is a computing device including a processor and a storage device storing the trained deep generative neural network for on-device generation and supplying a wallpaper stream by performing the method according to the first aspect, when the trained deep generative neural network is executed by the processor.
Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein.
Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein.
Figure 3 illustrates a structural diagram of a computing device according to an embodiment of the invention disclosed herein.
In the following description, unless otherwise described, the same reference numerals are used for the same elements when they are depicted in different drawings, and overlapping description thereof may be omitted.
Figure 1 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to an embodiment of the invention disclosed herein. The method comprises: generating S105, at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance. Implied by said generating is artificial synthesis of a wallpaper with the deep generative neural network. Depending on the type of content in the collection of high-quality images/videos, which was used during the training phase of the deep generative neural network, the deep generative neural network, when trained with such a collection, is configured to generate wallpapers of a similar content type. For example, when the collection of high-quality images includes landscape images, the deep generative neural network, when trained with such a collection, will be configured to generate landscape wallpapers and so on. However, the present invention should not be limited to landscape wallpapers, because a collection of high-quality images/videos of any other type of content may be used at the training phase of the deep generative neural network. Furthermore, if a collection of high-quality videos was used during the training phase of the deep generative neural network, the deep generative neural network, when trained with such a collection, will be configured to generate video wallpapers of a content type corresponding to the content type of the training collection of high-quality videos. Implied by the wallpaper stream is one or more images that could be static, dynamic or interactive and/or one or more videos. When the deep generative neural network is trained, the deep generative neural network including weighting factors and other parameters may be uploaded to the computing device in advance, i.e. before the inference phase. The deep generative neural network may be stored in a storage device such as a memory of the computing device. When the at least one first wallpaper is generated by the deep generative neural network, the method comprises the step of setting S110, at the computing device, the at least one first wallpaper as the wallpaper of the computing device. The generated wallpaper may be for any kind of user interface; for example, the generated wallpaper may be the wallpaper for the main desktop, for the lock screen, for the empty page of a browser and so on without limitation. The generated wallpaper may also be used as a screensaver of the computing device.
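For illustration only, steps S105 and S110 can be sketched in Python as follows. This is a minimal sketch assuming the trained generator has been exported to the device as a TorchScript module (here named generator.pt) and that set_wallpaper() stands in for the platform-specific wallpaper API; these names, shapes and the latent size are illustrative, not part of the patent.

```python
# Minimal sketch of steps S105/S110: generate a wallpaper on-device and set it.
# "generator.pt" and set_wallpaper() are illustrative placeholders.
import torch
from PIL import Image

LATENT_DIM = 512  # illustrative latent size

def generate_wallpaper(generator: torch.jit.ScriptModule,
                       latent_dim: int = LATENT_DIM) -> Image.Image:
    """S105: synthesize one wallpaper from a random latent vector."""
    z = torch.randn(1, latent_dim)              # random high-dimensional input
    with torch.no_grad():
        img = generator(z)                      # (1, 3, H, W) in [-1, 1]
    img = (img.clamp(-1, 1) + 1) / 2            # rescale to [0, 1]
    array = (img[0].permute(1, 2, 0) * 255).byte().cpu().numpy()
    return Image.fromarray(array)

def set_wallpaper(image: Image.Image, path: str = "wallpaper.png") -> str:
    """S110: hand the generated image to the platform wallpaper service.
    Here we only persist it to disk; a real device would call the OS API."""
    image.save(path)
    return path

if __name__ == "__main__":
    generator = torch.jit.load("generator.pt")  # uploaded to the device in advance
    set_wallpaper(generate_wallpaper(generator))
```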
As illustrated on Figure 1, the method further comprises the step of determining S115 whether a condition is met or not. This condition is used to determine whether it is necessary to update the first wallpaper with a second wallpaper. The condition includes, but is not limited to, one or more of the following: (i) a user input is received at the computing device, wherein the user input may represent whether a wallpaper currently set as the wallpaper of the computing device is disliked by a user of the computing device or not; (ii) a preconfigured period of time has elapsed; (iii) the GPS location of the computing device has changed, wherein the GPS location of the computing device may be registered by a GPS unit comprised in the computing device. Based on dislikes, the method may be adapted to generate another wallpaper if the currently set wallpaper is disliked by the user. From such like/dislike information, the system can learn to generate wallpapers that the user will like. If it is determined that the condition is met (i.e. YES at step S115), then the method updates the wallpaper currently set as the wallpaper of the computing device by performing the following steps: generating S120, at the computing device, at least one second wallpaper of the wallpaper stream using the deep generative neural network, and setting S125, at the computing device, the at least one second wallpaper as the wallpaper of the computing device. The described wallpaper update may be performed automatically in the background. As an example, the user can enable a feature which generates a new wallpaper every morning. The terms "first" and "second", when applied to the term "wallpaper", are used to differentiate between different wallpapers and should not be construed as representing any ordinal relationship between said wallpapers or between the steps of the method. The at least one second wallpaper differs from the at least one first wallpaper. In an alternative embodiment (not illustrated) of the method, the step of determining S115 whether a condition is met or not may be performed before generating and setting each subsequent wallpaper, including the case where the step of determining S115 is performed before the steps S105 and S110 explained above. Thus, it should be clear that the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.
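A hedged sketch of the S115 condition check is given below; the dislike flag, timer period and GPS movement threshold are mocked illustrative inputs rather than parameters defined by the patent.

```python
# Illustrative sketch of the S115 condition check that triggers regeneration
# (S120/S125). Trigger sources (dislike button, timer, GPS) are mocked.
import time
from dataclasses import dataclass

@dataclass
class UpdateState:
    last_update: float
    last_location: tuple  # (lat, lon)

def condition_met(state: UpdateState,
                  disliked: bool,
                  now: float,
                  location: tuple,
                  period_s: float = 24 * 3600,
                  min_move_deg: float = 0.5) -> bool:
    """S115: update if (i) the user disliked the current wallpaper,
    (ii) the preconfigured period elapsed, or (iii) the GPS location changed."""
    moved = (abs(location[0] - state.last_location[0]) > min_move_deg or
             abs(location[1] - state.last_location[1]) > min_move_deg)
    return disliked or (now - state.last_update) >= period_s or moved

# Usage inside the device's background service loop (values are mocked):
state = UpdateState(last_update=time.time(), last_location=(59.93, 30.36))
if condition_met(state, disliked=False, now=time.time(), location=(59.93, 30.36)):
    pass  # S120/S125: generate_wallpaper(...) and set_wallpaper(...) as above
```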
Figure 2 illustrates a flowchart of a method of on-device generation and supplying a computing device with wallpaper stream according to another embodiment of the invention disclosed herein. The method embodiment illustrated on Figure 2 differs from the method embodiment illustrated on Figure 1 in that it further comprises the steps of individualizing S95 the deep generative neural network for a user of the computing device using a random input as a parameter of the deep generative neural network, thereby ensuring that the deep generative neural network is configured to generate a unique wallpaper for the user of the computing device, and customizing S100 the deep generative neural network for the user of the computing device, thereby ensuring that a wallpaper generated by the deep generative neural network is customized for the user. The steps S105-S125 illustrated on Figure 2 may be similar to the steps S105-S125 illustrated on Figure 1, except that the at least one first wallpaper and the at least one second wallpaper generated and set by the embodiment of Figure 2 are individualized and customized. Therefore, every user gets her/his own unique wallpaper every time.
The customization S100 may be based, but without limitation, on one or more of the following customization parameters: one or more user preferences, one or more user inputs, one or more settings of the computing device, the current time of the day, the current season of the year, the current GPS location of the computing device, the content of the user gallery currently stored on the computing device, the content of the browser history currently stored on the computing device, the current weather and weather forecast, and the location and colors of icons and widgets on the device screen. The current location and colors of icons and widgets on the device screen may be detected and used in the disclosed method to synthesize a wallpaper that does not visually blend with the icons and widgets. In this case, the current location and colors of icons and widgets on the device screen may be input to the deep generative neural network before generating/synthesizing a wallpaper as the corresponding parameters to influence the output of the deep generative neural network accordingly. The method may further include (not illustrated) the steps of analyzing the user gallery content, inferring that the user is fond of mountain photography, and adapting the deep generative neural network to generate more images/videos/interactive wallpapers of mountains. Although Figure 2 shows the steps S95 and S100 being performed before the steps S105 and S110, this should not be construed as a limitation, because, if necessary, the steps may be ordered differently; for example, one or both of the steps S95 and S100 may be performed before generating and setting each subsequent wallpaper, including the case where one or both of the steps S95 and S100 is performed before the steps S120 and S125 explained above. Thus, it should be clear that the reference numerals are used just to simplify the illustration and should not be construed as representing any ordinal relationship between the steps of the method.
The deep generative neural network is trained using an adversarial learning process together with one or more discriminator networks. The training is performed on a high-end computer or a computational cluster on a large dataset of wallpaper-quality images and/or videos. The deep generative neural network may have one or more of the following variables: vectorial variables, and latent variables shaped as a two-dimensional matrix or a set of two-dimensional matrices. In an embodiment, latent variables may be drawn from unit normal distributions. The customization may be accomplished by a separate encoder network that is trained to map the customization parameters to the parameters of the latent space distributions, such as the mean and the covariance of the Gaussian distribution from which the latent variables are drawn for the deep generative neural network. When user-sensitive information such as, for example, one or more user preferences, one or more user inputs, one or more settings of the computing device, the current GPS location of the computing device, the content of the user gallery currently stored on the computing device, or the content of the browser history currently stored on the computing device is processed as the customization parameter(s), it should be clear that such user-sensitive information is not compromised, because the entire processing of said user-sensitive information is carried out by a processor of the user's computing device and by the separate encoder network and the deep generative neural network, which are stored in the storage device of the user's computing device. In other words, advantageously, the user-sensitive information does not leave the computing device for processing.
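As an illustration of the customization path, the sketch below assumes a small encoder network that maps a handful of customization features to the mean and (diagonal) standard deviation of the Gaussian from which the generator latent is drawn; the feature encoding, layer sizes and example values are assumptions, and all processing stays on the device.

```python
# Hedged sketch: map customization parameters (time of day, season, weather,
# dominant icon hue, GPS, ...) to latent distribution parameters, then sample.
import torch
import torch.nn as nn

LATENT_DIM = 512
NUM_CUSTOM_FEATURES = 8   # e.g. hour, season, weather code, icon hue, lat, lon, ...

class CustomizationEncoder(nn.Module):
    def __init__(self, in_dim=NUM_CUSTOM_FEATURES, latent_dim=LATENT_DIM):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 2 * latent_dim))

    def forward(self, features: torch.Tensor):
        mean, log_std = self.body(features).chunk(2, dim=-1)
        return mean, log_std.exp()

encoder = CustomizationEncoder()
# Mocked on-device feature vector; never leaves the device.
features = torch.tensor([[7.0, 1.0, 3.0, 0.6, 0.2, 0.1, 55.7, 37.6]])
mean, std = encoder(features)
z = mean + std * torch.randn_like(std)   # customized latent fed to the generator
```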
In an alternative embodiment (not illustrated) of the method, the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and restyling the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the restyled image as the at least one first wallpaper. In still another embodiment (not illustrated) of the method, the step of generating S105, S120 at least one wallpaper further comprises the steps of synthesizing an image and animating the image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the animated image as the at least one first wallpaper. Because there are no constraints on the bandwidth and acquiring the content is essentially free, dynamic wallpapers (videos at high resolution) can be generated with an appropriate model. Apart from dynamic wallpapers, interactive wallpapers can be generated. A smartphone wallpaper can change appearance based on user swipes, phone tilts or some interface events. As an example, swiping to a different tab in the Android screen can change something in an image (e.g. move clouds in the image in the swipe direction). According to another alternative embodiment of the disclosed method, the deep generative neural network may be adapted to generate not only realistic and plausible images but also hyper-realistic images that can have, e.g., exaggerated features, such as extremely saturated sunset colors, exaggerated geometric proportions of objects (trees, buildings), etc. Some users may like and prefer such hyper-realistic wallpapers. Most generative models (e.g. adversarially trained generative neural networks) can allow each user to set her/his own preferred tradeoff between realism and hyperrealism. In still another embodiment of the method, the step of generating S105, S120 at least one wallpaper further comprises the step of applying super resolution to the synthesized image, and the step of setting S110, S125 the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the image having super-resolution as the at least one first wallpaper. Particular techniques of restyling an image and applying super-resolution or hyper-resolution to an image are known in the art.
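The optional super-resolution step could, for instance, be applied as in the following short sketch, assuming any off-the-shelf 4x super-resolution network (e.g. an ESRGAN-style model, mentioned later in this document) has been exported as a TorchScript module named sr_model.pt; both the file name and the 4x factor are assumptions.

```python
# Sketch of the optional post-processing step: upscale the synthesized image
# with a 4x super-resolution network before setting it as the wallpaper.
import torch

def upscale(image: torch.Tensor, sr_model: torch.jit.ScriptModule) -> torch.Tensor:
    """image: (1, 3, 256, 256) in [0, 1]; returns (1, 3, 1024, 1024)."""
    with torch.no_grad():
        return sr_model(image).clamp(0, 1)

# Usage: sr_model = torch.jit.load("sr_model.pt"); hi_res = upscale(low_res, sr_model)
```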
Figure 3 illustrates a structural diagram of a computing device 200 according to an embodiment of the invention disclosed herein. The computing device 200 comprises a processor 205 and a storage device 210. The processor 205 is configured to perform processing and computing tasks related to the operation of the computing device and to the operation of the method disclosed herein. The storage device stores the trained deep generative neural network 210.1 for on-device generation and supplying a wallpaper stream by performing the disclosed method, when the trained deep generative neural network 210.1 is executed by the processor 205. The storage device 210 may also store processor-executable instructions to cause the processor to perform any one or more of the above-described method steps. The processor 205 and the storage device 210 may be operatively interconnected. The processor 205 and the storage device 210 may also be connected with other components (not illustrated) of the computing device. The other components may include, without limitation, one or more of a display, a touchscreen, a keyboard, a keypad, a communication unit, a speaker, a microphone, a camera, a Bluetooth unit, an NFC (Near-Field Communication) unit, an RF (Radio-Frequency) unit, a GPS unit, I/O means, as well as necessary wirings and interconnections and so on. The processor 205 may be implemented as, without limitation, a general-purpose processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a system-on-chip (SoC). The storage device 210 may include, without limitation, RAM, ROM and so on. Thus, the computing device may be, without limitation, the user's computing device, being one of a smartphone, a tablet, a laptop, a notebook, a smart TV, an in-vehicle infotainment system and so on.
None of the following information should be construed as a limitation on the present disclosure. Instead, the following information is provided to enable the skilled person to practice the embodiments disclosed herein and to demonstrate that the disclosure is sufficient. Any particular values of any parameters specified below are to be construed as non-limiting.
Model architecture. The architecture of the model may be based on StyleGAN. The model outputs images of resolution 256 × 256 (or 512 × 512) and has four sets of latent variables:
- a vector z_static, which encodes colors and the general scene layout;
- a vector z_dynamic, which encodes global lighting (e.g. time of day);
- a set N_static of square matrices n_static_i, which encode shapes and details of static objects at N = 7 different resolutions between 4 × 4 and 256 × 256 (N = 8 for 512 × 512);
- a set N_dynamic of square matrices n_dynamic_i, which encode shapes and details of dynamic objects at the corresponding resolutions.
The generator has two components: the multilayer perceptron M and the convolutional generator G. The perceptron M takes the concatenated vector (z_static, z_dynamic) and transforms it to the style vector w. The convolutional generator G has N = 7 (or 8) blocks. Within each block, a convolution is followed by two elementwise additions of two tensors obtained from n_static_i and n_dynamic_i by a learnable per-channel scaling. Finally, the AdaIN transform is applied using per-channel scales and biases obtained from w using a learnable linear transform. Within each block, this sequence of steps is repeated twice, followed by upsampling and convolution layers.
Below, the set of input latent variables (z_static, z_dynamic, N_static, N_dynamic) will be referred to as original inputs (or original latents). As in StyleGAN, the convolutional generator may use separate w vectors at each resolution (style mixing). The set of all style vectors will be referred to as W. Finally, the set of all spatial random inputs of the generator will be denoted as N_sp.
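The block structure just described can be sketched in PyTorch as follows. This is an illustrative reading of the text (convolution, two per-channel-scaled additions of the static and dynamic spatial inputs, then AdaIN driven by w); the channel widths, class names and the single repetition of the sequence are simplifications, not the patent's exact architecture.

```python
# Illustrative sketch of one generator block: conv, two scaled noise additions
# (static and dynamic spatial inputs), then AdaIN driven by the style vector w.
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Adds a single-channel spatial map with a learnable per-channel scale."""
    def __init__(self, channels):
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x, noise):
        return x + self.scale * noise

class AdaIN(nn.Module):
    """Per-channel scale/bias computed from the style vector w by a linear map."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, w):
        scale, bias = self.affine(w).chunk(2, dim=1)
        scale = scale[:, :, None, None]
        bias = bias[:, :, None, None]
        return (1 + scale) * self.norm(x) + bias

class GeneratorBlock(nn.Module):
    def __init__(self, channels=64, style_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.inject_static = NoiseInjection(channels)
        self.inject_dynamic = NoiseInjection(channels)
        self.adain = AdaIN(channels, style_dim)

    def forward(self, x, w, n_static, n_dynamic):
        x = self.conv(x)
        x = self.inject_static(x, n_static)      # shapes/details of static objects
        x = self.inject_dynamic(x, n_dynamic)    # shapes/details of dynamic objects
        return self.adain(x, w)

# One 16x16 block at an illustrative channel width of 64:
block = GeneratorBlock()
x = torch.randn(1, 64, 16, 16)
w = torch.randn(1, 512)
n_s, n_d = torch.randn(1, 1, 16, 16), torch.randn(1, 1, 16, 16)
out = block(x, w, n_s, n_d)   # (1, 64, 16, 16)
```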
Learning the model. The model is trained from two sources of data: the dataset of static scenery images D_img and the dataset of timelapse scenery videos D_vid. It is relatively easy to collect a large static dataset, while with the authors' best efforts only a few hundred videos, which do not cover all the diversity of landscapes, were collected. Thus, both sources of data may be utilized in order to build a model having improved performance. To do that, the proposed generative model (the deep generative neural network) is trained in an adversarial way with two different discriminators.
The static discriminator D_static has the same architecture and design choices as in StyleGAN. It observes images from D_img as real, while the fake samples are generated by the model. The pairwise discriminator D_pair looks at pairs of images. It duplicates the architecture of D_static except for the first convolutional block, which is applied separately to each frame. A real pair of images is obtained by sampling a video from D_vid and then sampling two random frames (arbitrarily far from each other) from it. A fake pair is obtained by sampling common static latents z_static and N_static, and then individual dynamic latents z_dynamic^1, N_dynamic^1 and z_dynamic^2, N_dynamic^2. The two images are then obtained as G(M(z_static, z_dynamic^1), N_static, N_dynamic^1) and G(M(z_static, z_dynamic^2), N_static, N_dynamic^2). All samples are drawn from unit normal distributions.
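A fake pair for the pairwise discriminator can be formed as in the following sketch: the two frames share z_static and N_static and differ only in the dynamic latents. The generator call here is a placeholder for the full G(M(·), ·) mapping, and the latent sizes and resolution list are assumptions.

```python
# Hedged sketch of fake-pair sampling for the pairwise discriminator:
# shared static latents, independently resampled dynamic latents per frame.
import torch

def sample_fake_pair(generator, latent_dim=512,
                     resolutions=(4, 8, 16, 32, 64, 128, 256)):
    z_static = torch.randn(1, latent_dim)
    n_static = [torch.randn(1, 1, r, r) for r in resolutions]
    frames = []
    for _ in range(2):                               # two frames of the "same scene"
        z_dynamic = torch.randn(1, latent_dim)       # resampled per frame
        n_dynamic = [torch.randn(1, 1, r, r) for r in resolutions]
        frames.append(generator(z_static, z_dynamic, n_static, n_dynamic))
    return torch.cat(frames, dim=1)                  # channel-stacked pair for D_pair
```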
The model is trained within the GAN approach with a non-saturating loss and R1 regularization. During each update of the generator, either a batch of fake images to which the static discriminator is applied is sampled, or a batch of image pairs to which the pairwise discriminator is applied is sampled. The proportions of the static discriminator and the pairwise discriminator are varied from 0.5/0.5 to 0.9/0.1 respectively over each resolution transition phase, and then the pairwise proportion is kept fixed at 0.1. This helps the generator to learn to disentangle static and dynamic latents early for each resolution and prevents the pairwise discriminator from overfitting to the relatively small video dataset used for training.
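For reference, the non-saturating GAN loss and the R1 penalty mentioned above are standard and can be written as in the sketch below; this is the textbook formulation, not code taken from the patent.

```python
# Standard non-saturating GAN losses with R1 regularization on real images.
import torch
import torch.nn.functional as F

def generator_loss(fake_logits):
    # non-saturating generator objective: maximize log D(G(z))
    return F.softplus(-fake_logits).mean()

def discriminator_loss(real_logits, fake_logits):
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()

def r1_penalty(real_images, real_logits, gamma=10.0):
    # real_images must have requires_grad=True before the discriminator pass
    grads, = torch.autograd.grad(outputs=real_logits.sum(),
                                 inputs=real_images, create_graph=True)
    return 0.5 * gamma * grads.pow(2).flatten(1).sum(1).mean()
```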
During learning, the objective of the pairwise discriminator is to focus on the inconsistencies within each pair, and the objective of the static discriminator is to focus on visual quality. Furthermore, since the pairwise discriminator only sees real frames sampled from a limited number of videos, it may be prone to overfitting to this limited set and effectively stop contributing to the learning process (while the static discriminator, which observes a more diverse set of scenes, keeps improving the diversity of the model). It turns out both problems (focus on image quality rather than pairwise consistency, and overfitting to the limited diversity of videos) can be solved with a simple method. This method consists in augmenting the fake set of pairs with pairs of crops taken from the same video frame, but from different locations. Since these crops have the same visual quality as the images in real pairs, and since they come from the same videos as the images within real pairs, the pairwise discriminator effectively stops paying attention to image quality, cannot simply overfit to the statistics of scenes in the video dataset, and has to focus on finding pairwise inconsistencies within fake pairs. This crop sampling method may be used to improve the quality of the model significantly.
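The crop augmentation can be sketched as follows: two crops from the same real video frame, taken at different locations, are packed as an additional "fake" pair for the pairwise discriminator. The crop size and the random source are illustrative.

```python
# Sketch of the same-frame crop augmentation for the pairwise discriminator.
import torch

def same_frame_crop_pair(frame: torch.Tensor, crop: int = 256) -> torch.Tensor:
    """frame: (3, H, W) real video frame; returns a (6, crop, crop) 'fake' pair."""
    _, h, w = frame.shape
    crops = []
    for _ in range(2):
        top = torch.randint(0, h - crop + 1, (1,)).item()
        left = torch.randint(0, w - crop + 1, (1,)).item()
        crops.append(frame[:, top:top + crop, left:left + crop])
    return torch.cat(crops, dim=0)

pair = same_frame_crop_pair(torch.rand(3, 360, 640))   # mocked frame
```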
Sampling videos from the model. The model does not attempt to learn the full temporal dynamics of videos, and instead focuses on pairwise consistency of frames that are generated when the dynamic latent variables are resampled. In particular, the pairwise discriminator in the model does not sample real frames sequentially. The sampling procedure for fake pairs does not try to generate adjacent frames either. One of the reasons why the described learning phase does not attempt to learn continuity is that the training dataset contains videos of widely varying temporal rates, making the notion of temporal adjacency for a pair of frames effectively meaningless.
Because of this, the disclosed generation process is agnostic to the model of motion: the generator is forced to produce plausible frames regardless of how the dynamic latents $z^{dyn}$ and $M^{dyn}$ change. In the experiments it was found that the simple model of motion described below is enough to produce compelling videos. Specifically, to sample a video, it is possible to sample a single static vector $z^{st}$ from the unit normal distribution and then interpolate the dynamic latent vector between two unit normally distributed samples $z^{dyn}_{1}$ and $z^{dyn}_{2}$. For the spatial maps, it is again possible to sample $M^{st}$ and an initial $M^{dyn}_{0}$ from a unit normal distribution and then warp the $M^{dyn}$ tensor continuously using a homography transform parameterized by the displacements of the two upper corners and of two points at the horizon. The direction of the homography is sampled randomly, and its speed is chosen to match the average speed of clouds in the training dataset. The homography is flipped vertically for positions below the horizon to mimic the reflection process. To obtain $M^{dyn}_{t}$ for frame $t$, it is possible to compose $t$ identical transforms and then apply the composition to $M^{dyn}_{0}$. As the latent variables are interpolated/warped, they are passed through the trained model to obtain smooth videos. It should be noted that the described model requires no image-specific user input.
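A sketch of this motion model is given below. The homography is built with OpenCV from displacements of the two upper corners and of two points on an assumed horizon line; the displacement magnitudes, the horizon position, and the latent shapes are illustrative, and the vertical flip of the homography below the horizon (for reflections) is omitted for brevity.

```python
import numpy as np
import cv2

def make_homography(size, horizon_y, dx=2.0, dy=0.0):
    """Homography defined by displacing the two top corners and two horizon points."""
    s = float(size)
    src = np.float32([[0, 0], [s - 1, 0], [0, horizon_y], [s - 1, horizon_y]])
    # The horizon points move more slowly than the top corners (illustrative choice).
    dst = src + np.float32([[dx, dy], [dx, dy], [0.3 * dx, dy], [0.3 * dx, dy]])
    return cv2.getPerspectiveTransform(src, dst)

def warp_dynamic_map(m_dyn0, H, t):
    """Apply the composition of t identical homographies to the dynamic spatial map."""
    Ht = np.linalg.matrix_power(H, t)
    h, w = m_dyn0.shape[:2]
    return cv2.warpPerspective(m_dyn0, Ht, (w, h))

if __name__ == "__main__":
    size, horizon_y, frames = 256, 96, 10
    H = make_homography(size, horizon_y)
    m_dyn0 = np.random.randn(size, size).astype(np.float32)   # initial dynamic spatial map
    z_dyn1, z_dyn2 = np.random.randn(512), np.random.randn(512)
    for t in range(frames):
        alpha = t / (frames - 1)
        z_dyn_t = (1 - alpha) * z_dyn1 + alpha * z_dyn2       # interpolate dynamic vector
        m_dyn_t = warp_dynamic_map(m_dyn0, H, t)              # warp dynamic spatial map
        # frame_t = G(z_st, z_dyn_t, m_st, m_dyn_t)           # pass through the generator
```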
Animating Real Scenery Images with the Model. Inference. To animate a given scenery image $x$, a set of latent variables that produces this image within the generator is inferred: extended latents $w^{+}$ and noise maps $M^{+}$ are sought such that $G(w^{+}, M^{+}) \approx x$. After that, the same procedure as described above may be applied to animate the given image.
The latent space of the generator is highly redundant, and to obtain a good animation it has to be ensured that the latent variables come roughly from the same distribution as during the training of the model (most importantly, $w^{+}$ should belong to the output manifold of the mapping MLP $F$ that produces the styles during training). Without such a prior, latent variables that yield a good reconstruction might still result in an implausible animation (or in no animation at all). Therefore, inference may be performed using the following three-step procedure:
1. Step 1: predicting a set of style vectors $w^{+}$ using a feedforward encoder network $E$. The encoder has a ResNet-152 architecture and is trained on 200,000 synthetic images with a mean absolute error loss. $w^{+}$ is predicted by a two-layer perceptron with ReLU from the concatenation of features from several levels of the ResNet, aggregated by global average pooling.
2. Step 2: starting from the predicted $w^{+}$ and zero noise maps $M^{+}$, all latents are optimized to improve the reconstruction error. In addition, the deviation of $w^{+}$ from the encoder prediction is penalized (with coefficient 0.01), and the deviation of $M^{+}$ from zero is discouraged by reducing its learning rate. Optimization is performed for up to 500 steps with Adam and a large initial learning rate (0.1), which is halved each time the loss does not improve for 20 iterations. A variant of the method, which was evaluated separately, uses a binary segmentation mask obtained by an ADE20k-pretrained segmentation network. The mask separates the dynamic (sky and water) and the remaining (static) parts of the scene. In this variant, the static noise maps (respectively, the dynamic noise maps) are kept at zero for the dynamic (respectively, static) parts of the image.
3. Step 3: freezing the latents and fine-tuning the weights of the generator $G$ to further drive down the reconstruction error. This step is needed since, even after optimization, a gap between the reconstruction and the input image remains. During this fine-tuning, a combination of the per-pixel mean absolute error and the perceptual loss is minimized, with a much larger (10×) weight for the latter. 500 steps with Adam and lr = 0.001 are performed.
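The latent-optimization stage (Step 2) can be sketched as follows. The encoder E, the generator G, the latent shapes, and the exact form of the regularization are placeholder assumptions; only the optimizer settings (Adam, initial learning rate 0.1 halved after 20 non-improving iterations, at most 500 steps, coefficient 0.01 on the deviation from the encoder prediction) follow the description above. The Step 3 fine-tuning of the generator weights with the L1 and perceptual losses is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def invert_image(G, E, x, max_steps=500, patience=20, w_coef=0.01):
    """Fit extended latents (w, m) so that G(w, m) reconstructs the image x."""
    w_pred = E(x).detach()                                    # Step 1: encoder prediction of styles
    w = w_pred.clone().requires_grad_(True)
    m = torch.zeros(x.shape[0], 1, x.shape[-2], x.shape[-1], requires_grad=True)  # noise maps start at zero
    opt = torch.optim.Adam([
        {"params": [w], "lr": 0.1},
        {"params": [m], "lr": 0.01},                          # smaller lr keeps the maps close to zero
    ])
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        opt.zero_grad()
        loss = F.l1_loss(G(w, m), x) + w_coef * F.mse_loss(w, w_pred)
        loss.backward()
        opt.step()
        if loss.item() < best - 1e-6:
            best, stale = loss.item(), 0
        else:
            stale += 1
            if stale >= patience:                             # halve the learning rate on a plateau
                for group in opt.param_groups:
                    group["lr"] *= 0.5
                stale = 0
    return w.detach(), m.detach()
```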
Lighting manipulation. During training of the model, a mapping MLP $F$ is used to map the latent codes $(z^{st}, z^{dyn})$ to the styles $w$. The dynamic code $z^{dyn}$ is resampled in order to take into account variations of lighting, weather changes, etc., so that $z^{st}$ describes only static attributes (land, buildings, horizon shape, etc.). To change the lighting in a real image, one therefore has to change $z^{dyn}$ and then use the MLP $F$ to obtain new styles $w$. The described inference procedure, however, outputs $w^{+}$, and it has been found to be very difficult to invert $F$ and recover the corresponding latent codes.
To tackle this problem, a separate neural network $A$ is trained to approximate the local dynamics of $F$. Let $w_{1} = F(z^{st}, z^{dyn}_{1})$ and $w_{2} = F(z^{st}, z^{dyn}_{2})$; $A$ is optimized so that $A(w_{1}, z^{dyn}_{2}, c) \approx (1 - c)\,w_{1} + c\,w_{2}$, where $c$ is the coefficient of interpolation between $z^{dyn}_{1}$ and $z^{dyn}_{2}$. Thus, $c = 0$ corresponds to $z^{dyn}_{1}$, so the output of $A$ should equal $w_{1}$; $c = 1$ corresponds to $z^{dyn}_{2}$, so the output of $A$ should equal $w_{2}$.
This is implemented by a combination of an L1 loss between the output of $A$ and the interpolation target and a relative direction loss; the total optimization criterion combines the two terms. $A$ is trained with Adam until convergence. At test time, the network $A$ makes it possible to sample a random target dynamic code $z^{dyn}_{2}$ and to update $w^{+}$ towards it by increasing the interpolation coefficient $c$ as the animation progresses.
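One possible way to train such a network $A$ is sketched below. The mapping network F_mlp, the module A, all tensor shapes, and in particular the cosine form of the relative direction loss are assumptions made only to illustrate the idea of approximating the linear interpolation between the two style vectors.

```python
import torch
import torch.nn.functional as F

def a_step(A, F_mlp, opt_A, batch=16, dim=512):
    """One training step for the dynamics-approximation network A."""
    z_st = torch.randn(batch, dim)
    z_dyn1, z_dyn2 = torch.randn(batch, dim), torch.randn(batch, dim)
    c = torch.rand(batch, 1)                                  # interpolation coefficient in [0, 1]
    with torch.no_grad():
        w1 = F_mlp(z_st, z_dyn1)                              # styles at the start point
        w2 = F_mlp(z_st, z_dyn2)                              # styles at the target point
        w_target = (1 - c) * w1 + c * w2                      # linear interpolation target
    w_pred = A(w1, z_dyn2, c)
    l1 = F.l1_loss(w_pred, w_target)
    # Assumed direction term: the predicted update should point from w1 towards w2.
    dir_loss = 1 - F.cosine_similarity(w_pred - w1, w2 - w1, dim=-1).mean()
    loss = l1 + dir_loss
    opt_A.zero_grad()
    loss.backward()
    opt_A.step()
    return loss.item()
```

At test time one would keep the inferred $w^{+}$ fixed as the start point, sample a single target dynamic code, and call the trained network with a gradually increasing coefficient $c$ to relight the scene smoothly.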
Super Resolution (SR). As the models are trained at medium resolution (e.g. 256×256), fine details can be brought back from the given image that is subject to animation through a separate super-resolution procedure. The main idea of the super-resolution approach is to borrow as much as possible from the original high-resolution image (which is downsampled to the generator resolution for animation). To achieve that, the animation is super-resolved and blended with the original image using a standard image super-resolution approach. It is possible to use ESRGANx4 trained on a dedicated dataset that is created as follows. To obtain a (high-res, low-res) pair, a frame I is taken from the video dataset as the high-res image; the frame is then downsampled, the first two steps of inference are run, and an (imperfect) low-res reconstruction is obtained. Thus, the network is trained on a task that is more complex than plain super resolution.
After obtaining the super-resolved video, the dynamic parts (sky and water) are transferred from it to the final result. The static parts are obtained by running a guided filter on the super-resolved frames while using the input high-resolution image as the guide. This procedure effectively transfers high-resolution details from the input while retaining the lighting change induced by the lighting manipulation.
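The final composition can be sketched as follows. The segmentation mask and the ×4 super-resolver are assumed to be given, the filter radius and eps are illustrative, and the guided filter used here comes from the opencv-contrib package (cv2.ximgproc).

```python
import cv2
import numpy as np

def compose_frame(sr_frame, hires_input, dynamic_mask, radius=8, eps=(0.01 * 255) ** 2):
    """Blend a super-resolved animation frame with the original high-res image.

    sr_frame     : super-resolved animation frame, HxWx3 uint8
    hires_input  : original high-resolution input image, HxWx3 uint8
    dynamic_mask : HxW float array, 1 for sky/water, 0 for static parts
    """
    # Static regions: guided filtering of the super-resolved frame with the input
    # image as the guide transfers fine detail while keeping the relit appearance.
    static_part = cv2.ximgproc.guidedFilter(hires_input, sr_frame, radius, eps)
    m = dynamic_mask[..., None].astype(np.float32)
    out = m * sr_frame.astype(np.float32) + (1 - m) * static_part.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```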
At least one of the plurality of modules, units, components, steps, sub-steps may be implemented through an AI model. A function associated with AI may be performed through the non-volatile memory, the volatile memory, and the processor. The processor may include one or a plurality of processors. At this time, one or a plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU). The one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning. Here, being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI model of a desired characteristic is made. The learning may be performed in a device itself in which AI according to an embodiment is performed, and/or may be implemented through a separate server/system.
The AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values and performs its layer operation on the output of the previous layer using the plurality of weights. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), and deep Q-networks. The learning algorithm is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
It should be expressly understood that not all technical effects mentioned herein need to be enjoyed in each and every embodiment of the present technology. For example, embodiments of the present technology may be implemented without the user enjoying some of these technical effects, while other embodiments may be implemented with the user enjoying other technical effects or none at all.
Modifications and improvements to the above-described implementations of the present technology may become apparent to those skilled in the art. The foregoing description is intended to be exemplary rather than limiting. The scope of the present technology is therefore intended to be limited solely by the scope of the appended claims.
While the above-described implementations have been described and shown with reference to particular steps performed in a particular order, it will be understood that these steps may be combined, sub-divided, or reordered without departing from the teachings of the present technology. Accordingly, the order and grouping of the steps is not a limitation of the present technology.

Claims (13)

  1. A method of on-device generation and supplying a computing device with wallpaper stream, comprising:
    generating (S105), at the computing device, at least one first wallpaper of the wallpaper stream using a deep generative neural network, wherein the deep generative neural network is trained on a collection of high-quality images/videos and uploaded to the computing device in advance; and
    setting (S110), at the computing device, the at least one first wallpaper as the wallpaper of the computing device.
  2. The method of claim 1, further comprising the following steps performed when a condition is met (S115):
    generating (S120), at the computing device, at least one second wallpaper of the wallpaper stream using the deep generative neural network, wherein the at least one second wallpaper differs from the at least one first wallpaper;
    setting (S125), at the computing device, the at least one second wallpaper as the wallpaper of the computing device.
  3. The method of claim 2, wherein the condition includes one or more of the following conditions:
    a user input is received at the computing device, wherein the user input represents whether a wallpaper currently set as the wallpaper of the computing device is disliked by a user of the computing device or not;
    a preconfigured period of time has elapsed;
    GPS location of the computing device is changed, wherein the GPS location of the computing device is registerable by a GPS unit comprised in the computing device.
  4. The method of claim 1, further comprising one or both of the following steps:
    individualizing (S95) the deep generative neural network for a user of the computing device using a random input as a parameter of the deep generative neural network thereby ensuring that the deep generative neural network is configured to generate unique wallpaper for the user of the computing device;
    customizing (S100) the deep generative neural network for the user of the computing device thereby ensuring that a wallpaper generated by the deep generative neural network is customized for the user, the customization is based on one or more of the following customization parameters: one or more user preferences, one or more user inputs, one or more settings of the computing device, current time of the day, a current season of the year, current GPS location of the computing device, content of the user gallery currently stored on the computing device, content of browser history currently stored on the computing device, current weather and weather forecast, location and colors of icons and widgets on the device screen.
  5. The method of claim 1 or 2, wherein the wallpaper generated by the deep generative neural network is static, dynamic or interactive.
  6. The method of claim 1, wherein the deep generative neural network has one or more of the following variables: vectorial variables, latent variables shaped as a two-dimensional matrix or a set of two-dimensional matrices.
  7. The method of claim 6, wherein latent variables are drawn from unit normal distributions.
  8. The method of any one of claims 4 or 6, wherein the customization is accomplished by a separate encoder network that is trained to map the customization parameters to the parameters of the latent space distributions such as the mean and the covariance of the Gaussian normal distribution, from which latent variables are drawn.
  9. The method of claim 1 or 2, wherein the step of generating at least one wallpaper further comprises the steps of synthesizing an image and restyling the image, and
    the step of setting the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the restyled image as the at least one first wallpaper.
  10. The method of claim 1 or 2, wherein the step of generating at least one wallpaper further comprises the steps of synthesizing an image and animating the image, and
    the step of setting the at least one wallpaper as the wallpaper of the computing device further comprises the step of setting the animated image as the at least one first wallpaper.
  11. The method of claim 6, wherein the deep generative neural network is trained using an adversarial learning process together with one or more discriminator networks.
  12. A computing device (200) comprising a processor (205) and a storage device (210) storing the trained deep generative neural network (210.1) for on-device generation and supplying a wallpaper stream by performing the method of any one of claims 1-11, when the trained deep generative neural network (210.1) is executed by the processor (205).
  13. The computing device (200) of claim 12, wherein the computing device (200) comprises the user-computing device being one of the following: a smartphone, a tablet, a laptop, a notebook, or a smart TV.
PCT/KR2021/000224 2020-10-07 2021-01-08 Method of on-device generation and supplying wallpaper stream and computing device implementing the same WO2022075533A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2020133033A RU2768551C1 (en) 2020-10-07 2020-10-07 Method for local generation and representation of wallpaper stream and computer implementing it
RU2020133033 2020-10-07

Publications (1)

Publication Number Publication Date
WO2022075533A1 true WO2022075533A1 (en) 2022-04-14

Family

ID=80819470

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/000224 WO2022075533A1 (en) 2020-10-07 2021-01-08 Method of on-device generation and supplying wallpaper stream and computing device implementing the same

Country Status (2)

Country Link
RU (1) RU2768551C1 (en)
WO (1) WO2022075533A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012330A1 (en) * 2015-07-15 2018-01-11 Fyusion, Inc Dynamic Multi-View Interactive Digital Media Representation Lock Screen
US20180075490A1 (en) * 2016-09-09 2018-03-15 Sony Corporation System and method for providing recommendation on an electronic device based on emotional state detection
US20180150203A1 (en) * 2006-12-05 2018-05-31 At&T Mobility Ii Llc Home screen user interface for electronic device display
US20190251721A1 (en) * 2018-02-15 2019-08-15 Microsoft Technology Licensing, Llc Controllable conditional image generation
US20190336724A1 (en) * 2016-12-29 2019-11-07 Huawei Technologies Co., Ltd. Method and Apparatus for Adjusting User Emotion

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6202083B1 (en) * 1998-05-18 2001-03-13 Micron Electronics, Inc. Method for updating wallpaper for computer display
JP2004357176A (en) * 2003-05-30 2004-12-16 V-Cube Inc Wallpaper image creation system
CA2780765A1 (en) * 2009-11-13 2011-05-19 Google Inc. Live wallpaper
KR101994295B1 (en) * 2012-08-08 2019-06-28 삼성전자주식회사 Terminal and method for generating live image in terminal
US20150205498A1 (en) * 2014-01-17 2015-07-23 Southern Telecom Inc. Automatic wallpaper image changer for a computing device
CN106354385B (en) * 2016-08-26 2020-03-13 Oppo广东移动通信有限公司 Image processing method and device and terminal equipment
CN107817999A (en) * 2016-08-31 2018-03-20 上海卓易科技股份有限公司 The generation method and terminal of a kind of dynamic wallpaper
US10867214B2 (en) * 2018-02-14 2020-12-15 Nvidia Corporation Generation of synthetic images for training a neural network model

Also Published As

Publication number Publication date
RU2768551C1 (en) 2022-03-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21877788; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 21877788; Country of ref document: EP; Kind code of ref document: A1