US20180068463A1 - Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures


Info

Publication number
US20180068463A1
Authority
US
United States
Prior art keywords
image
loss function
pixel
source
style
Prior art date
Legal status
Granted
Application number
US15/694,677
Other versions
US9922432B1
Inventor
Eric Andrew Risser
Current Assignee
Artomatix Ltd
Original Assignee
Artomatix Ltd
Priority date
Filing date
Publication date
Application filed by Artomatix Ltd
Priority to US15/694,677
Assigned to Artomatix Ltd. Assignors: RISSER, Eric Andrew
Priority to US15/876,011 (US10424087B2)
Publication of US20180068463A1
Application granted
Publication of US9922432B1
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/001 Texturing; Colouring; Generation of texture or colour
    • G06T 7/00 Image analysis
    • G06T 7/40 Analysis of texture
    • G06T 7/41 Analysis of texture based on statistical description of texture
    • G06T 7/45 Analysis of texture based on statistical description of texture using co-occurrence matrix computation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 5/004
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G06T 5/73 Deblurring; Sharpening
    • G06T 5/75 Unsharp masking

Definitions

  • This invention generally relates to image synthesis and more specifically relates to image synthesis using convolutional neural networks based upon exemplar images.
  • processes for providing CNN-based image synthesis may be performed by a server system.
  • the processes may be performed by a “cloud” server system.
  • the processes may be performed on a user device.
  • One embodiment is a system for generating a synthesized image including desired content presented in a desired style. The system includes one or more processors and memory readable by the one or more processors.
  • the system in accordance with some embodiments of the invention includes instructions stored in the memory that, when read by the one or more processors, direct the one or more processors to receive a source content image that includes desired content for a synthesized image, receive a source style image that includes a desired texture for the synthesized image, determine a localized loss function for a pixel in at least one of the source content image and the source style image, and generate the synthesized image by optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image, wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
  • the localized loss function is represented by a Gram matrix.
  • the localized loss function is represented by a covariance matrix.
  • the localized loss function is determined using a Convolutional Neural Network (CNN).
  • the optimizing is performed by back propagation through the CNN.
  • the localized loss function is determined for a pixel in the source style image.
  • the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to receive a mask that identifies a plurality of regions of the source style image, determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask, determine a localized loss function for the one of the plurality of regions from the group of pixels included in the one of the plurality of regions, and associate the localized loss function with the pixel.
  • the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to group the pixels of the source style image into a plurality of cells determined by a grid applied to the source style image, determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel, and associate the determined localized loss function of the one of the plurality of cells with the pixel.
  • the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to determine a group of neighbor pixels for a pixel in the source content image, determine a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel, and determine a local loss function for the group of corresponding pixels.
  • the localized loss function is determined for a pixel in the source content image.
  • the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to receive a mask that identifies a plurality of regions of the source content image, determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask, determine a localized loss function for the one of the plurality of regions from the group of pixels included in the one of the plurality of regions, and associate the localized loss function with the pixel.
  • the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to group the pixels of the source content image into a plurality of cells determined by a grid applied to the source content image, determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel, and associate the determined localized loss function of the one of the plurality of cells with the pixel.
  • the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to determine a global content loss function for the source content image from the pixels of the source content image, determine a weight for the pixel indicating a contribution to a structure in the source content image, and apply the weight to the global content loss function to determine the localized loss function for the pixel.
  • the weight is determined based upon a Laplacian pyramid of black and white versions of the source content image.
  • a localized loss function is determined for a pixel in the source content image and a corresponding pixel in the source style image.
  • the optimization uses the localized loss function for the pixel in the source content image as the content loss function and the localized loss function of the pixel in the source style image as the style loss function.
  • pixels in the synthesized image begin as white noise.
  • each pixel in the synthesized image begins with a value equal to a pixel value of a corresponding pixel in the source content image.
  • the optimizing is performed to minimize a loss function that includes the content loss function, a style loss function, and a histogram loss function.
  • a method for performing style transfer in an image synthesis system, where a synthesized image is generated with content from a source content image and texture from a source style image, includes receiving a source content image that includes desired content for a synthesized image in the image synthesis system, receiving a source style image that includes a desired texture for the synthesized image in the image synthesis system, determining a localized loss function for a pixel in at least one of the source content image and the source style image using the image synthesis system, and generating the synthesized image using the image synthesis system by optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image, wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
  • the localized loss function is represented by a Gram matrix.
  • the determining of a localized loss function for a pixel in the source style image includes receiving a mask that identifies a plurality of regions of the source style image in the image synthesis system, determining a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask using the image synthesis system, determining a localized loss function for the one of the plurality of regions from the group of pixels included in the one of the plurality of regions using the image synthesis system, and associating the localized loss function with the pixel using the image synthesis system.
  • the determining a localized loss function for a pixel in the source style image comprises grouping the pixels of the source style image into a plurality of cells determined by a grid applied to the source style image using the image synthesis system, determining a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel using the image synthesis system, and associating the determined localized loss function of the one of the plurality of cells with the pixel using the image synthesis system.
  • determining of a localized loss function for a pixel in the source style image includes determining a group of neighbor pixels for a pixel in the source content image using the image synthesis system, determining a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel using the image synthesis system, and determining a local loss function for the group of corresponding pixels using the image synthesis system.
  • FIG. 1 is an illustration of various devices that may perform one or more processes to provide Convolutional Neural Network (CNN) based image synthesis in accordance with an embodiment of the invention.
  • FIG. 2 is an illustration of components of a processing system in a device that executes one or more processes to provide CNN-based image synthesis using localized loss functions in accordance with an embodiment of the invention.
  • FIG. 3 is an illustration of images showing the instability in a Gram matrix.
  • FIG. 4 is an illustration of images showing a comparison of results of texture synthesis performed in accordance with various embodiments of the invention with and without the use of pyramids.
  • FIG. 5 is an illustration of a flow diagram of a process for providing CNN-based image synthesis that performs style transfer using localized loss functions in accordance with an embodiment of the invention.
  • FIG. 6 is an illustration of two input images and a resulting image from a style transfer process of the two input images using localized style loss functions in accordance with an embodiment of the invention.
  • FIG. 7 is an illustration of a flow diagram of a process for generating region-based loss functions in accordance with an embodiment of the invention.
  • FIG. 8 is an illustration of conceptual images showing masks of regions for two input images used in a style transfer process using region-based loss functions in accordance with an embodiment of the invention.
  • FIG. 9 is an illustration of conceptual images of cells in two input images in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 10 is an illustration of a flow diagram of a process for generating localized loss functions in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 11 is an illustration of a comparison of similar cells in two input images in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 12 is an illustration of a comparison of similar pixels in two input images in a style transfer process using a per pixel loss transfer in accordance with an embodiment of the invention.
  • FIG. 13 is an illustration of a process for generating localized style loss functions for a style transfer process using per pixel loss transfer in accordance with an embodiment of the invention.
  • FIG. 14 is an illustration of two input images that provide an example of the operation of a style transfer process using a global content loss function in accordance with an embodiment of the invention.
  • FIG. 15 is an illustration of a resulting image from the style transfer from the two input images of FIG. 14 performed by a style transfer process using global content loss in accordance with an embodiment of the invention.
  • FIG. 16 is an illustration of a Laplacian Pyramid of images derived from a content source image from FIG. 14 used in a style transfer process using local content loss in accordance with an embodiment of the invention.
  • FIGS. 17 and 18 are illustrations of images produced by style transfer processes using global loss functions in accordance with certain embodiments of this invention.
  • FIG. 19 is an illustration of images generated by a style transfer process using localized content loss functions in accordance with an embodiment of the invention.
  • FIG. 20 is an illustration of a flow diagram of a process for determining localized loss using masks in accordance with an embodiment of the invention.
  • FIG. 21 is an illustration of images synthesized in accordance with some embodiments of the invention and images generated using other processes.
  • FIG. 22 is an illustration of images of masks used in an aging process in accordance with an embodiment of the invention.
  • FIG. 23 is an illustration of a synthesis order in a multiscale pyramid framework in accordance with an embodiment of the invention.
  • FIG. 24 is an illustration of a texture mapped model and components used to form the texture mapped model using a filter process in accordance with an embodiment of the invention.
  • FIG. 25 is an illustration of a texture and the texture applied to a surface of a mesh by a filter process in accordance with an embodiment of the invention.
  • processes for providing CNN-based image synthesis may be performed by a server system.
  • the processes may be performed by a “cloud” server system.
  • the processes may be performed on a user device.
  • the loss functions may be modeled using Gram matrices. In a number of embodiments, the loss functions may be modeled using covariance matrices. In accordance with several embodiments, the total loss may further include mean activation or histogram loss.
  • a source content image including desired structures for a synthesized image and a source style image, including a desired texture for the synthesized image
  • a CNN may be used to determine localized loss functions for groups of pixels in the source content and/or source style images.
  • the localized content and/or localized style loss functions may be used to generate a synthesized image that includes the content from the source content image and the texture from the source style image.
  • an optimization process may be performed to optimize pixels in a synthesized image using the localized content loss function of a corresponding pixel from the source content image and/or the localized style loss function of a corresponding pixel from the source style image.
  • the optimization may be an iterative optimization that is performed by back propagation through a CNN, or through a purely feed-forward process.
  • a specific pyramid-stack hybrid CNN architecture based on some combination of pooling, strided convolution and dilated convolution is used for image synthesis.
  • the specific CNN architecture utilized in image synthesis is largely dependent upon the requirements of a given application.
  • the CNN-based image synthesis processes may perform aging of an image.
  • CNN-based image synthesis processes may be used to perform continuous weathering by continually modifying the parametric model.
  • the CNN-based image synthesis processes may be used to perform weathering by controlling the weathering through a “painting by numbers” process.
  • CNN-based image synthesis processes may be used to perform continuous multiscale aging.
  • CNN-based image synthesis processes may be used to perform aging by transferring weathering patterns from external exemplars.
  • CNN-based image synthesis processes may combine optimization and feedforward parametric texture synthesis for fast high-resolution synthesis.
  • CNN-based image synthesis processes may be used to perform single image super resolution (SISR) for rendering.
  • CNN-based image synthesis processes may combine parametric and non-parametric (non-CNN) synthesis within a pyramid framework.
  • dilated convolution neural networks can be utilized to synthesize image hybrids.
  • Image hybridization involves starting from a set of several source images within a category and mixing them together in a way that produces a new member of that category.
  • image hybridization is performed using either an optimization or feedforward based synthesis strategy.
  • a key aspect of the image hybridization is to generate new activations at different levels of the network which combine the activation features extracted from the input images into new hybrid configurations.
  • Processes in accordance with many embodiments of the invention integrate an on-model synthesis approach into the CNN approach.
  • the goal of processes in accordance with some embodiments of the invention is to provide an on-model texture synthesis scheme that allows the user to supply a fully textured model as the input exemplar instead of just a texture, and apply that texture from the model onto a different untextured model.
  • the processes produce textures that conform to geometric shapes and the feature contents of that texture are guided by the underlying shape itself. This results in image synthesis that can be applied on top of already textured meshes, and can also produce appearance transfer from one textured mesh onto another.
  • a specific class of artificial neural networks that can be referred to as Condensed Feature Extraction Networks are generated from CNNs trained to perform image classification.
  • Systems and methods in accordance with many embodiments of the invention generate Condensed Feature Extraction Networks by utilizing an artificial neural network with a specific number of neurons to learn a network that approximates the intermediate neural activations of a different network with a larger number (or the same number) of artificial neurons.
  • the artificial neural network that is utilized to train a Condensed Feature Extraction Network is a CNN.
  • the computation required to generate outputs from the Condensed Feature Extraction Network for a set of input images is reduced relative to the CNN used to train the Condensed Feature Extraction Networks.
  • Network 100 includes a communications network 160 .
  • the communications network 160 is a network such as the Internet that allows devices connected to the network 160 to communicate with other connected devices.
  • Server systems 110 , 140 , and 170 are connected to the network 160 .
  • Each of the server systems 110 , 140 , and 170 may be a group of one or more server computer systems communicatively connected to one another via internal networks that execute processes that provide cloud services to users over the network 160 .
  • cloud services are one or more applications that are executed by one or more server systems to provide data and/or executable applications to devices over a network.
  • the server systems 110 , 140 , and 170 are shown each having three servers in the internal network. However, the server systems 110 , 140 and 170 may include any number of servers and any additional number of server systems may be connected to the network 160 to provide cloud services including (but not limited to) virtualized server systems.
  • processes for providing CNN-based image synthesis processes and/or systems may be provided by one or more software applications executing on a single server system and/or a group of server systems communicating over network 160 .
  • the personal devices 180 are shown as desktop computers that are connected via a conventional “wired” connection to the network 160 .
  • the personal device 180 may be a desktop computer, a laptop computer, a smart television, an entertainment gaming console, or any other device that connects to the network 160 via a “wired” or “wireless” network connection.
  • the mobile device 120 connects to network 160 using a wireless connection.
  • a wireless connection is a connection that may use Radio Frequency (RF) signals, Infrared (IR) signals, or any other form of wireless signaling to connect to the network 160 .
  • the mobile device 120 is a mobile telephone.
  • mobile device 120 may be a mobile phone, a Personal Digital Assistant (PDA), a tablet, a smartphone, or any other type of device that connects to network 160 via wireless connection in accordance with various other embodiments of the invention.
  • the processes for providing CNN-based image synthesis may be performed by the user device.
  • an application being executed by the user device may capture or obtain the two or more input images and transmit the captured image(s) to a server system that performs the processes for providing CNN-based image synthesis.
  • the user device may include a camera or some other image capture system that captures the image.
  • the specific computing system(s) used to capture images and/or process images to perform CNN-based image synthesis is largely dependent upon the requirements of a given application and should not be considered as limited to any specific computing system implementation. Computing systems and processes for performing CNN-based image synthesis are discussed further below.
  • the processing device 200 includes a processor 205 , a non-volatile memory 210 , and a volatile memory 215 .
  • the processor 205 may be a processor, microprocessor, controller or a combination of processors, microprocessors and/or controllers that perform instructions stored in the volatile memory 215 and/or the non-volatile memory 210 to manipulate data stored in the memory.
  • the non-volatile memory 210 can store the processor instructions utilized to configure the processing system 200 to perform processes including processes in accordance with particular embodiments of the invention and/or data for the processes being utilized.
  • the processing system software and/or firmware can be stored in any of a variety of non-transient computer readable media appropriate to a specific application.
  • a network interface is a device that allows processing system 200 to transmit and receive data over a network based upon the instructions performed by processor 205 . Although an example of processing system 200 is illustrated in FIG. 2 , any of a variety of processing systems in the various devices may be configured to provide the methods and systems in accordance with various embodiments of the invention.
  • CNNs can be powerful tools for synthesizing similar but different versions of an image or transferring the style of one image onto the content of another image. Recently, compelling results have been achieved through parametric modeling of the image statistics using a deep CNN.
  • An example CNN used for image style transfer is described by Leon Gatys in a paper entitled "Image Style Transfer Using Convolutional Neural Networks," which may be obtained at www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf.
  • CNN-based image synthesis processes are particularly suited for performing texture synthesis and style transfer.
  • CNN-based image synthesis systems may perform texture synthesis in the following manner:
  • a CNN image synthesis system receives an input source texture, S, and synthesizes an output texture, O.
  • S and O are passed through a CNN such as VGG that generates feature maps for the activations of the first L convolutional layers of the CNN.
  • the activations of the first L convolutional layers are denoted as S1 . . . SL and O1 . . . OL.
  • a Gram loss over the layers, which preserves some properties of the input texture by means of a Gram matrix, can be expressed as:
  • αl are user parameters that weight terms in the loss,
  • Fi,j refers to feature i's pixel j within the feature map.
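The equation itself does not survive in this text. A conventional reconstruction, consistent with the per-layer weights αl and the feature-map indexing Fi,j defined above (the exact normalization used in the original may differ), is:

$$\mathcal{L}_{gram}=\sum_{l=1}^{L}\alpha_{l}\,\bigl\|G(S^{l})-G(O^{l})\bigr\|_{F}^{2},\qquad G(F)_{i_1,i_2}=\sum_{j}F_{i_1,j}\,F_{i_2,j}$$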
  • the synthesized output image O is initialized with white noise and is then optimized by applying gradient descent to equation (1). Specifically, the gradient of equation (1) with respect to the output image O is computed via backpropagation.
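As a concrete sketch of this optimization loop (not the reference implementation from this disclosure), the code below assumes a hypothetical vgg_features helper that returns each of the first L layers' activations reshaped to (channels, pixels); PyTorch is used for the backpropagation step.

```python
import torch

def gram(feats):
    # feats: (channels, pixels) activation map of one layer
    return feats @ feats.t() / feats.shape[1]

def gram_loss(out_feats, src_feats, weights):
    # Weighted Frobenius distance between Gram matrices, one term per layer.
    return sum(w * ((gram(o) - gram(s)) ** 2).sum()
               for w, o, s in zip(weights, out_feats, src_feats))

def synthesize(source, vgg_features, weights, steps=500, lr=1.0):
    # Start the output image O as white noise and descend the Gram loss by backpropagation.
    output = torch.randn_like(source, requires_grad=True)
    with torch.no_grad():
        src_feats = vgg_features(source)          # S_1 ... S_L, fixed targets
    optimizer = torch.optim.LBFGS([output], lr=lr, max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = gram_loss(vgg_features(output), src_feats, weights)
        loss.backward()                            # gradient of equation (1) w.r.t. the output image
        return loss

    optimizer.step(closure)
    return output.detach()
```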
  • CNN-based image synthesis processes that perform style transfer synthesis operate in a similar manner to the texture synthesis process described above.
  • a CNN-based image synthesis system receives a content image, C, and a style image, S, that are used to generate a styled image, O. All three images are passed through a CNN, such as VGG, that gives activations for the first L convolutional layers denoted as C1 . . . CL, S1 . . . SL, and O1 . . . OL.
  • the total style transfer loss combines the loss for the style image (the Gram loss) and the loss for the content image (the content loss):
  • the content loss is a feature distance between content and output that attempts to make output and content look similar:
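Neither equation survives in this text; standard reconstructions consistent with the definitions above (the per-layer content weights βl are assumed by analogy with the weights used elsewhere in this description) are:

$$\mathcal{L}_{content}=\sum_{l=1}^{L}\beta_{l}\,\bigl\|O^{l}-C^{l}\bigr\|_{F}^{2},\qquad\mathcal{L}_{transfer}=\mathcal{L}_{gram}+\mathcal{L}_{content}$$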
  • the output image O is initialized with white noise and optimized using a gradient descent.
  • CNN-based image synthesis processes performing style transfer may use an iterative optimization process to cause the white noise image of the synthesized image to incrementally begin to resemble some user-specified combination of the source content and style images.
  • a CNN backpropagation training procedure may be used as the iterative optimization process to turn the white noise or content image into an image that combines features of the content and style images.
  • the iterative optimization process can be directed by a loss function (equation 4) that the backpropagation training procedure is trying to minimize.
  • the loss function is calculated as the difference between parametric models encoding the style of a style image and the image being synthesized.
  • a content loss can be included as well, where the content loss is some distance metric between raw neural activations calculated for the content image and the image being synthesized.
  • if a style loss is used without content loss and the image being synthesized starts from noise, the resulting operation is texture synthesis. If a style loss is used without content loss and the image being synthesized starts from the content image, then the resulting operation is style transfer. If both style and content loss are used, then the operation will always be style transfer.
  • image processing applications including, but not limited to, image hybridization, super-resolution upscaling and time-varying weathering, could be achieved using the same CNN framework but using different loss functions.
  • CNN-based image synthesis processes in accordance with certain embodiments of the invention may use loss functions to direct the optimization process in various synthesis processes that may be performed.
  • CNN-based image synthesis processes in accordance with particular embodiments of this invention use a collection of stable loss functions for the CNN-based image synthesis to achieve various results.
  • CNN-based image synthesis processes use multiple stable loss functions for texture synthesis and style transfer.
  • multiple stable loss functions for the style transfer including loss functions for the style and content are addressed separately below.
  • A problem that can be experienced when using Gram matrices as loss functions in style transfer is that the results are often unstable.
  • the cause of the instability is illustrated in FIG. 3 .
  • Gram matrices do not match image intensities. Instead, Gram matrices match feature activations, i.e. feature maps, after applying the activation functions but the same argument applies: activation maps with quite different means and variances can still have the same Gram matrix.
  • a Gram matrix is statistically related to neither the mean nor covariance matrices. Instead, a Gram matrix is related to a matrix of non-central second moments.
  • a feature activation map, F with m features, is used as an example.
  • feature map activations are simply referred to as “features,” such that a “feature” refers to the result of applying an activation function.
  • the statistics of the features in the feature map F can be summarized by using an m dimensional random variable X to model the probability distribution of a given m-tuple of features.
  • the random vector of features X can be related to the feature map F.
  • the Gram matrix G(F) may be normalized by the number of samples n to obtain a sample estimator for the second non-central mixed moments E[XX^T].
  • the terms (normalized) "Gram matrix" and E[XX^T] may be used interchangeably in the following discussion even though one is actually a sampled estimator of the other.
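Written out (a reconstruction of the relationship referred to below as equation (5); the original's exact notation is not preserved here), the normalized Gram matrix of a feature map F with n pixel samples estimates the non-central second moment, which decomposes into the covariance Σ(X) and mean μ of the feature distribution X:

$$\hat{G}(F)=\tfrac{1}{n}\,F F^{\top}\;\approx\;E\!\left[XX^{\top}\right]=\Sigma(X)+\mu\,\mu^{\top}$$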
  • in the case of a single feature, equation (5) becomes G = μ² + σ², where μ is the mean and σ is the standard deviation of feature X.
  • given a feature map F1 for the input source image and a feature map F2 for the synthesized output image that have respective feature distributions X1, X2, means μ1, μ2, and standard deviations σ1, σ2, the feature maps will have the same Gram matrix if the following condition (equation (6)) holds: μ1² + σ1² = μ2² + σ2².
  • an input feature map F1 has an input feature random vector X1, a mean μ1, and a covariance matrix Σ(X1).
  • the Gram matrices of X1 and X2 are set equal to one another using equation (5) above to obtain a system of equations relating the two distributions.
  • the variances of the output random feature activation vector X2 may be constrained along the main diagonal of its covariance matrix so that the variances are equal to a set of "target" output image feature activation variances.
  • the remaining unknown variables in the transformation matrix A and vector b may be determined using closed form solutions of the resulting quadratic equations.
  • these equations are often long and computationally intensive to solve. The analysis does show, however, that there are more unknowns than equations.
  • CNN-based image synthesis processes use a covariance matrix instead of a Gram matrix to guide the synthesis process.
  • Covariance matrices are similar to Gram matrices but do not share the same limitation.
  • covariance matrices explicitly preserve statistical moments of various orders in the parametric model. By this we explicitly refer to the mean of all feature vectors as the first order moment and to the co-activations of feature vectors centered around their mean as second order moments.
  • This new parametric model allows the covariance loss and mean loss to drive the synthesis. This can make the combined loss for texture synthesis:
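The combined loss does not survive in this text; a plausible reconstruction that pairs a covariance term with a mean term, using per-layer weights in the style of the other losses in this description (the symbols and normalization here are assumptions), is:

$$\mathcal{L}_{texture}=\sum_{l=1}^{L}\alpha_{l}\,\bigl\|\Sigma(O^{l})-\Sigma(S^{l})\bigr\|_{F}^{2}+\sum_{l=1}^{L}\gamma_{l}\,\bigl\|\mu(O^{l})-\mu(S^{l})\bigr\|_{2}^{2}$$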
  • the replacement of the Gram matrix with a Covariance matrix may improve but does not decisively solve the stability issues inherent in texture synthesis and/or style transfer.
  • the covariance matrix may be a powerful method for describing image style in a stable form when the texture being parameterized is highly stochastic and can be represented as a single cluster in feature space. A remaining problem is that many textures and most natural images contain multiple clusters. In other words, these textures or natural images contain a combination of multiple distinct textures.
  • in that case a covariance matrix may exhibit the same unstable behavior as a Gram matrix. The reason for the unstable behavior is that centering on the mean of multiple clusters leaves every individual cluster off-center, and the clusters will therefore not exhibit stable mathematical properties.
  • Although CNN-based image synthesis processes that use covariance loss and/or mean loss in accordance with various embodiments of the invention are described above, other processes that use covariance loss and mean loss in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • the multiple cluster problem may be dealt with by using an automatic clustering process on the feature vectors to identify different textures in an image.
  • the clustering process could transform the image so that each cluster is centered around its mean.
  • an automatic clustering process may introduce a number of additional problems. For example, if different linear transforms are applied to different regions of the image in a discrete way, seam lines may appear along the borders between different texture clusters. To deal with these seams, processes in accordance with many embodiments of the invention interpolate the transform between clusters. The interpolation may be more difficult than simply adding a histogram loss that has been shown to solve the same problem as discussed above.
  • the instability of Gram or Covariance matrices is addressed by explicitly preserving statistical moments of various orders in the activations of a texture.
  • an entire histogram of feature activations is preserved.
  • systems and processes in accordance with a number of embodiments augment synthesis loss with m additional histogram losses, one for each feature in each feature map.
  • systems and processes in accordance with several embodiments incorporate a total variation loss that may improve smoothness in the synthesized image.
  • the combined loss for texture synthesis in CNN-based image synthesis processes in accordance with some embodiments of the invention is:
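The equation is missing here; a reconstruction consistent with the surrounding description (a Gram or covariance term, per-feature histogram terms, and a total variation term, with any relative weights assumed to be folded into each term) is:

$$\mathcal{L}=\mathcal{L}_{gram}+\mathcal{L}_{hist}+\mathcal{L}_{tv}$$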
  • CNN-based image synthesis processes in accordance with many embodiments of the invention use loss based on histogram matching.
  • the synthesized layer-wise feature activations are transformed so that their histograms match the corresponding histograms of the input source texture image S.
  • the transformation can be performed once for each histogram loss encountered during backpropagation.
  • CNN-based image synthesis processes in accordance with a number of embodiments of the invention use an ordinary histogram matching technique to remap the synthesized output activation to match the input source image activations.
  • Oij represents the output activations for a convolutional layer i and feature j, and O′ij represents the remapped activations.
  • the technique may compute a normalized histogram for the output activations Oij and match it to the normalized histogram for the activations of the input source image S to obtain the remapped activations O′ij.
  • this technique is then repeated for each feature in the feature map to determine a Frobenius norm distance between Oij and O′ij.
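A minimal NumPy sketch of this per-feature histogram remapping (ordinary histogram matching via cumulative histograms and interpolation; not the implementation described in this disclosure) might look like:

```python
import numpy as np

def match_histogram(output_act, source_act, bins=256):
    """Remap one feature's output activation map so its histogram matches
    the histogram of the corresponding source-texture activation map."""
    o = output_act.ravel()
    s = source_act.ravel()

    # Normalized cumulative histograms of the output and source activations.
    o_hist, o_edges = np.histogram(o, bins=bins)
    s_hist, s_edges = np.histogram(s, bins=bins)
    o_cdf = np.cumsum(o_hist) / o.size
    s_cdf = np.cumsum(s_hist) / s.size

    # For each output value: find its quantile, then the source value at that quantile.
    o_quantiles = np.interp(o, o_edges[1:], o_cdf)
    remapped = np.interp(o_quantiles, s_cdf, s_edges[1:])
    return remapped.reshape(output_act.shape)

def histogram_remap_all(output_acts, source_acts):
    # Repeat the matching independently for each feature in the feature map.
    return np.stack([match_histogram(o, s) for o, s in zip(output_acts, source_acts)])
```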
  • the loss of the histograms may be expressed as:
  • Ol is the activation map for layer l, R(Ol) is the histogram-remapped activation map, and γl is a user weight parameter that controls the strength of the loss.
  • because R(Ol) has zero gradient almost everywhere, it can be treated as a constant for the gradient operator.
  • the gradient of equation (10) can be computed by realizing R(Ol) into a temporary array O′l and computing the Frobenius norm loss between Ol and O′l.
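The histogram loss referred to as equation (10) does not survive in this text; a reconstruction consistent with the symbols just defined is:

$$\mathcal{L}_{hist}=\sum_{l=1}^{L}\gamma_{l}\,\bigl\|O^{l}-R(O^{l})\bigr\|_{F}^{2}$$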
  • Although CNN-based image synthesis processes that use histogram loss in accordance with various embodiments of the invention are described above, other processes that provide histogram loss in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Style transfer is a broadening of texture synthesis.
  • texture synthesis an input texture is statistically resynthesized.
  • Style transfer is similar with the additional constraint that the synthesized image O should not deviate too much from a content image C.
  • CNN-based image synthesis processes that perform style transfer in accordance with various embodiments of the invention include both a per-pixel content loss and a histogram loss in the parametric synthesis equation, such that the overall loss becomes:
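The overall loss does not survive here; a reconstruction consistent with the description (a per-pixel content term added to the texture terms above) is:

$$\mathcal{L}_{total}=\mathcal{L}_{content}+\mathcal{L}_{gram}+\mathcal{L}_{hist}+\mathcal{L}_{tv}$$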
  • Although CNN-based image synthesis processes that use histogram loss to perform style transfer in accordance with various embodiments of the invention are described above, other processes that use histogram loss to perform style transfer in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes in accordance with some embodiments automatically determine parameters for the processes performed.
  • the parameters may include, but are not limited to, the coefficients αl in the Gram/covariance loss of equation (1), βl in the content loss of equation (4), γl in the histogram/mean loss, and the weight that is multiplied against the total variation loss.
  • Automatic tuning processes in accordance with many embodiments of the invention are inspired by batch normalization that tunes hyper-parameters during a training process to reduce extreme values of gradients.
  • the parameters may also be dynamically adjusted during the optimization process.
  • the dynamic tuning can be performed with the aid of gradient information.
  • different loss terms Li may be encountered. Each loss term Li has an associated parameter ci that needs to be determined (ci is one of the parameters αl, βl, γl, and the total variation weight).
  • a backpropagated gradient gi may first be calculated from the current loss term as if ci were 1.
  • magnitude thresholds of 1 can be used for all parameters except for the coefficient αl of the Gram/covariance loss, which may have a magnitude threshold of 100 in accordance with several embodiments. As can readily be appreciated, magnitude thresholds and/or other constraints can be specified as appropriate to the requirements of a given application.
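As a sketch of this dynamic tuning idea (the threshold values follow the text; the gradient bookkeeping and function names are assumptions, not code from this disclosure):

```python
import numpy as np

def tune_coefficient(grad_with_unit_coeff, threshold=1.0):
    """Pick the coefficient c_i for one loss term so that the backpropagated
    gradient it produces stays below a magnitude threshold.

    grad_with_unit_coeff: gradient g_i computed as if c_i were 1.
    """
    peak = np.max(np.abs(grad_with_unit_coeff))
    if peak <= threshold:
        return 1.0                      # gradient already within bounds
    return threshold / peak             # scale the term down to the threshold

# Example: Gram/covariance terms use a larger threshold (100) than the others (1).
# gram_grad, hist_grad = ...           # gradients of each loss term w.r.t. the image
# alpha_l = tune_coefficient(gram_grad, threshold=100.0)
# gamma_l = tune_coefficient(hist_grad, threshold=1.0)
```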
  • Although CNN-based image synthesis processes that perform automatic tuning in accordance with various embodiments of the invention are described above, other processes that provide automatic tuning in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes in accordance with certain embodiments support manual and automatic control maps that were previously used in non-parametric approaches. To achieve this, processes in accordance with many embodiments perform a coarse-to-fine synthesis using image pyramids. In accordance with a number of embodiments, a ratio of two is used between successive image widths in the pyramid. A comparison of results of texture synthesis performed in accordance with various embodiments of the invention with and without the use of pyramids is shown in FIG. 4.
  • images 401 and 402 are the style images and images 410 - 411 and 420 - 421 are the content images. Images 410 and 411 were generated without the use of pyramids and images 420 and 421 were generated with the use of pyramids.
  • Images 410 and 420 show that pyramids blend coarse scale style features with content features better.
  • Images 411 and 421 may show that pyramids transfer coarse scale features better and reduce CNN noise artifacts.
  • Images 412 and 422 are magnified from images 411 and 421 , respectively, and may show noise artifacts (in image 412 ) and better transfer of coarse-scale features (in image 422 ).
  • a process for providing CNN-based image synthesis that performs style transfer using localized loss functions in accordance with an embodiment of the invention is shown in FIG. 5 .
  • a source content image and a source style image are received ( 505 , 510 ).
  • the source content image includes the structures that are to be included in a synthesized image and the source style image includes a texture that is to be applied to the synthesized image.
  • the process 500 determines localized content loss functions for groups of pixels in the source content image ( 515 ) and/or localized style loss functions for groups of pixels in the source style image ( 520 ).
  • the localized content loss functions and/or localized style loss functions may be generated using a CNN.
  • Process 500 performs an optimization process using the localized content loss functions and/or localized style loss functions to cause the pixels in the synthesized image to form an image with a desired amount of content from the content source image and a desired amount of texture from the source style image ( 525 ).
  • the optimization process may be an iterative optimization process that is performed until a desired result is achieved.
  • the iterative optimization process may be performed by backpropagation through a CNN. Iterative optimization processes in accordance with various embodiments of the invention are described in more detail below.
  • Style loss functions reproduce the textural component of the style image.
  • a global style loss function may be transformed into a stationary representation (i.e. the representation is a culmination of the local patches of texture independent of the location of each local patch in the image).
  • a global style loss function approach may generate the global style loss function by applying the source style image to a CNN, gathering all activations for a layer in a CNN and building a parametric model from the gathered activations of the layer. An optimization process may then be used to cause the loss function of one image to appear statistically similar to the loss function of another image by minimizing the error distance between the parametric model of the loss functions of the two images (which act as a statistical fingerprint that is being matched).
  • a style transfer approach using a global style loss function may lead to unimpressive results as shown by the images illustrated in FIG. 6 .
  • Brad Pitt's image 601 is matched to an image of Picasso's self-portrait 602 .
  • the overall style of the painting in the image 602 is transferred including the individual brush strokes, the general color palette and the overall look. This makes the image 603 look similar to the image 602 .
  • the individual features that compose the face are not transferred.
  • Brad Pitt's eyes, nose and mouth do not look like Picasso's corresponding features.
  • a collection of strategies designed to transform the style of an image locally rather than globally may be performed.
  • the various strategies used in the various embodiments involve a similar core idea of using a collection of parametric models representing local loss functions in either one or both of the source content image and the source style image, as opposed to using a single parametric model for each image.
  • Each of the parametric models of an image summarizes specific features in the image and is distinct and/or unique from the other parametric models in the collection of matrices for the image.
  • the application of local loss functions may vary greatly between the various embodiments, depending on a desired degree of locality.
  • each of the models may represent very large regions of the image in accordance with some embodiments when it is desired to have very little locality.
  • the models may each represent smaller groups down to an individual pixel in accordance with particular embodiments of the invention where a very high degree of locality is desired.
  • Region-based style transfer may be used in accordance with some embodiments of the invention.
  • a process for generating a region-based loss function in accordance with an embodiment of the invention is shown in FIG. 7 .
  • a process 700 may generate a mask with one or more regions for both of the source content and source style images ( 710 ).
  • the regions may be determined by a user and received as a manual input of the user into the system.
  • processes may generate the regions of the mask through a neighbor matching process and/or other similar process for structure identification.
  • the process 700 applies the masks to each image and determines a region of the mask associated with each pixel in each of the images ( 715 ).
  • the process 700 assigns each pixel to the region determined to be associated with the pixel ( 720 ).
  • the process 700 then generates parametric models for each of the identified regions of the masks from the pixels associated with the regions ( 725 ) and may add the generated parametric model for each region to an array of matrices stored in memory.
  • the mask value of each pixel may be used to index the pixel into the proper parametric model in the array for use in the style transfer process described above.
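A compact sketch of this bookkeeping (NumPy; the channels-by-height-by-width layout of the activations and the choice of a Gram matrix as the parametric model are assumptions made for illustration):

```python
import numpy as np

def region_parametric_models(features, mask):
    """Build one Gram-matrix parametric model per mask region.

    features: (channels, height, width) activations for one CNN layer.
    mask:     (height, width) integer region label for every pixel.
    Returns a dict region_label -> normalized Gram matrix, so each pixel can be
    indexed into the proper model through its mask value.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)          # one feature vector per pixel
    labels = mask.ravel()

    models = {}
    for region in np.unique(labels):
        cols = flat[:, labels == region]       # feature vectors of pixels in this region
        models[region] = cols @ cols.T / cols.shape[1]
    return models
```

During synthesis, the mask value of an output pixel selects which of these models its style loss is measured against, following the indexing described above.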
  • Images that illustrate a region-based style transfer process performed on the images of Brad Pitt and Picasso's self-portrait 601 and 602 in accordance with an embodiment of the invention are shown in FIG. 8 .
  • region-based style transfer distinct features of an image can be clustered together.
  • in image 802, Picasso's portrait from image 602 is segmented into a few distinct regions.
  • the eyes may be one region, with the lips, nose, hair, skin, shirt and background each being in their own regions.
  • a mask 802 may be applied over image 602 to identify the region that contains each pixel in the image 602 .
  • a mask shown in image 801 may be applied to the image 601 of Brad Pitt to identify the pixels that belong to each of the identified regions of the image 601 .
  • a uniform segment style transfer process may be performed.
  • the images are divided into uniform segments.
  • the images of Brad Pitt and Picasso's self-portrait divided into uniform segments in accordance with an embodiment of the invention that uses uniform segments are shown in FIG. 9 .
  • a process for performing uniform style transfer in accordance with an embodiment of the invention is shown in FIG. 10 .
  • as shown by images 901 and 902 of FIG. 9, a process 1000 of FIG. 10 divides each image into a grid of regions or cells (1005).
  • images 901 and 902 are divided into grids of 8×8 cells in the illustrated embodiment.
  • each cell is associated with an individual parametric model of a localized loss function ( 1010 ).
  • the generated parametric models can be added ( 1015 ) to an array of models for each image.
  • an individual parametric model may be associated ( 1020 ) with groups of cells that are determined by similarity of the cells or by some other manner.
  • the parametric models may be used as a descriptor for nearest neighbor matching of the pixels in associated cell(s).
  • the nearest neighbor matching binds cells together so that each cell in the content image is optimized to more closely resemble the cell in the style image that is determined to most closely approximate the cell, as shown in FIG. 11.
  • a cell 1101 in the image 901 is optimized to a cell 1102 in the image 902 .
  • one or more cells in the style image that most closely approximate a cell in the content image may be identified by determining the cell(s) in the style image that has a minimum L2 distance between its parametric model and the parametric model of the cell in the content image.
  • the optimizing processes for all of the cells in the content image are performed in parallel.
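A sketch of the cell-matching step (assuming each cell's parametric model has already been flattened to a vector, for example its Gram or covariance matrix; the array layout and names are illustrative):

```python
import numpy as np

def match_cells(content_models, style_models):
    """For each content-image cell, find the style-image cell whose parametric
    model is closest in L2 distance.

    content_models: (n_content_cells, d) flattened models, one row per cell.
    style_models:   (n_style_cells, d) flattened models.
    Returns an array mapping content cell index -> best matching style cell index.
    """
    # Pairwise squared L2 distances between every content-cell and style-cell model.
    diffs = content_models[:, None, :] - style_models[None, :, :]
    dists = np.einsum('ijk,ijk->ij', diffs, diffs)
    return dists.argmin(axis=1)
```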
  • in the per pixel approach, each pixel is treated as its own local region. Images that show the association between pixels in a source content image and a source style image in accordance with a certain embodiment of this invention are shown in FIG. 12.
  • a parametric model of a localized style loss function is generated for each pixel in the content image cell 1201 and the style image cell 1202 .
  • a process for generating localized style loss functions in accordance with an embodiment of the invention is shown in FIG. 13 .
  • the process 1300 includes gathering a neighborhood of pixels surrounding each particular pixel in the source content image ( 1305 ).
  • a group of pixels in the source style image that are associated with the neighbor pixels of each particular pixel is determined ( 1310 ).
  • a pre-computed nearest neighbor set may be used to associate each pixel in the content image with a pixel in the source style image.
  • the group of pixels in the source style image associated with the neighborhood of each particular pixel is used to generate the parametric model of the localized style loss function that the particular pixel is optimized toward ( 1315 ).
  • regional segment style transfer is simple and fast compared to the other transfer strategies.
  • the regional segment style transfer can be imprecise, whether a human or a CNN is used to determine how the parametric models are generated.
  • the cell transfer can differ from the regional transfer in that many more parametric models are generated and the matrices themselves are used to determine the correspondence of features.
  • the per pixel approach is typically the most precise and the slowest possible transfer strategy because the amount of computations needed is increased by generation of a parametric model for each particular pixel from patches of pixels around the particular pixel.
  • content loss can be used for the transfer process instead of style loss.
  • style loss attempts to be stationary and content loss does not.
  • in this context, a non-stationary loss is one in which the location of a pixel is the main factor influencing what the pixel should look like.
  • the content loss function can be simple in accordance with some embodiments, in that the L2 (or Euclidean) distance is summed for each pixel in the synthesized image to each pixel at the same location in the content image.
  • the goal of content loss is to reproduce the “structure” of the content image (image 1401 of FIG. 14 showing the Golden Gate Bridge) while allowing the nonstructural aspects of the image to mutate towards resembling the style image (image 1402 of FIG. 14 showing Starry Night).
  • a problem with using a global content loss in a style transfer process may be that all of the regions of the content image may not be equally as important in terms of key shapes and structures in the image.
  • in image 1501 of the Golden Gate Bridge, the low-importance content features, including the low frequency sky and ocean, are given a high enough content loss to overpower the style contribution and stop large, swirly clouds and stars from forming.
  • meanwhile, the high-importance content features, including the bridge, are largely distorted by the style image. This makes the high-importance content features lose fine scale qualities such as the cable stretching from tower to tower.
  • the tower in the background is more distorted than the larger tower in the foreground because the tower in the background is smaller in terms of image size.
  • the tower in the background is not less important than the tower in the foreground in terms of content as the tower is a key structure in the image.
  • Style transfer processes that use localized content loss functions in accordance with some embodiments of the invention may provide weights to each pixel based on the amount that the pixel contributes to a key shape or structure in the image.
  • content can be a poorly defined concept with respect to art, as content is subjective and can be subject to personal interpretation.
  • the process for localizing content loss in accordance with some embodiments of the invention is based on the following observations about “content.” For the purposes of determining the contribution of a pixel to content, one may observe that flat, low frequency regions of an image generally do not contribute to the content of the image (for purposes of human perception) while high frequency regions generally are important contributors to the content.
  • style transfer processes in accordance with many embodiments of the invention may use a Laplacian Pyramid of black and white versions of the content image (Images 1601 - 1604 in FIG. 16 ) to determine content loss weights for each pixel in the image being synthesized where the high frequency pixels (whiter pixels) have a higher influence on content than low frequency pixels (darker pixels).
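One way to realize these per-pixel content weights (a sketch that uses Gaussian blurring from SciPy as the pyramid's low-pass step; the exact pyramid construction and normalization in the described embodiments may differ):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def content_weights(gray, levels=4, sigma=1.0):
    """Per-pixel content-loss weights from a Laplacian pyramid of a grayscale image.

    High-frequency (whiter) pyramid responses mark structurally important pixels
    and receive larger weights; flat, low-frequency regions receive small weights.
    """
    weights = np.zeros_like(gray, dtype=np.float64)
    current = gray.astype(np.float64)
    for _ in range(levels):
        blurred = gaussian_filter(current, sigma)
        laplacian = np.abs(current - blurred)          # band-pass detail at this scale
        # Upsample the detail band back to the original resolution and accumulate.
        scale = (gray.shape[0] / laplacian.shape[0], gray.shape[1] / laplacian.shape[1])
        weights += zoom(laplacian, scale, order=1)
        current = blurred[::2, ::2]                    # downsample for the next scale
    return weights / (weights.max() + 1e-8)            # normalize to [0, 1]
```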
  • CNNs trained on image classification tend to learn kernels at the deeper levels of the network that recognize shapes which are structurally meaningful to humans. Therefore, the magnitude of feature vectors produced from the content image deep in the network can also be used as a scaling factor for the content loss itself.
  • An image 1701 shows an image generated using global style loss, and an image 1702 shows an image generated using global style and content loss; both start from noise and use the respective global loss functions to generate the final image.
  • the image 1701 illustrates global style loss with no content loss producing a “texturized” version of the style image (Starry Night).
  • Image 1702 introduces global content loss to the texturized version, and the texturized version of Starry Night is reshaped into the form of the Golden Gate Bridge, but with the flaws identified above.
  • The difference between the use of a global content loss function and the use of localized content loss functions determined using a Laplacian Pyramid in accordance with a certain embodiment of the invention is shown in FIG. 18 .
  • An image 1801 is the same as the image 1702 and introduces global content loss to the texturized version of the image, and an image 1802 introduces local content loss based upon a Laplacian Pyramid to the texturized version instead of the global content loss.
  • the features in the image 1802 emerge (i.e. the bridge and the land) while the rest of the image reproduces the texture of Starry Night more accurately.
  • noise does not have to be the starting point in some embodiments of this invention.
  • the logic of starting from noise may be that noise often produces a slightly different version of the transfer each time.
  • CNN backpropagation may be used to provide a style transfer process using global and/or local content loss.
  • the use of CNN backpropagation can allow the image to be thought of as a point in a super-high dimensional space (a dimension for each color channel in each pixel of the image).
  • the optimization process is a gradient descent optimization that pulls an image at that point through the image's high dimensional space toward a new point that is within a small region of the high dimensional space that is considered “good output.”
  • the force that pulls the image may be the combined loss function for style and content; the optimization moves towards a local minimum of that function, which depends on where in this space the noise starting point lies.
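  • The optimization described above can be sketched as follows, assuming a PyTorch-style framework; combined_loss is a placeholder for the weighted sum of the style and content losses and is not defined by the patent text:
```python
import torch

def synthesize(image_init, combined_loss, steps=200, lr=1.0):
    """Treat the image itself as a point in a high-dimensional space (one
    dimension per color channel of every pixel) and pull it toward a local
    minimum of the combined style + content loss via backpropagation."""
    image = image_init.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([image], lr=lr, max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = combined_loss(image)   # style loss + (localized) content loss
        loss.backward()
        return loss

    optimizer.step(closure)
    return image.detach()
```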
  • the optimization process may be started from the content image instead of noise in accordance with a number of embodiments.
  • the use of the content image to start may not offer an advantage because the content loss may begin from an optimal position and play a more aggressive “tug of war” against the style loss resulting in an image that has more content and less style.
  • the two loss functions are often fighting against each other during the entire process, and that may return a less than pleasing result.
  • the image 1901 was generated by starting at the content image of the Golden Gate Bridge and then optimizing using only style loss so that the image mutated to better resemble “Starry Night” until the process reached a local minimum. This produces better results than previously known style transfer processes. However, the results may be improved by re-introducing localized content loss instead of global content loss that results in image 1902 .
  • This approach addresses the problem of removing content loss completely by trying to reach a local minimum in the optimization process that does not cause key structures (e.g. the cables on the bridge and the tower in the background) to be mutated too much by the style loss and lose the distinguishing characteristics of these structures.
  • the mutation of structurally important aspects of the content too far in the style direction may be reduced leading to an optimization process that reaches a more desirable local minimum.
  • Localized style and content loss are also applicable within a feedforward texture synthesis and style transfer algorithm and are not limited to an optimization framework using backpropagation.
  • CNN-based image synthesis processes separate multiple textures in an image into multiple models to determine localized loss.
  • processes in accordance with many embodiments receive an index mask for the source texture or style image and an index mask for the synthesized image.
  • each mask is input by a user.
  • Each mask may include M indices. This may sometimes be referred to as a “painting by numbers” process.
  • a process for determining localized loss using masks in accordance with an embodiment of the invention is shown in FIG. 20 .
  • a process 2000 applies the mask for the source image to the source image to determine the pixels that belong to each of the M indices ( 2005 ) and applies the mask for the synthesized image to the synthesized image to determine the pixels that belong to each of the M indices of the synthesized mask.
  • a parametric model is generated for each of the M indices of the source style mask from the pixels that belong to each of the M indices ( 2010 ).
  • the indices of the synthesized output may be tracked though an image pyramid for coarse-to-fine synthesis ( 2015 ). During synthesis, the previous losses are modified to be spatially varying ( 2020 ).
  • spatially varying Gram/Covariance matrix and histogram losses may be imposed where the style Gram/Covariance matrices and histograms vary spatially based on the output index for the current pixel.
  • Histogram matching is then performed ( 2025 ).
  • the histogram matching may be performed separately in each of the M regions defined by the indexed masks.
  • Blending of adjacent regions may then be performed ( 2030 ).
  • the blending of adjacent regions can be automatically performed during backpropagation.
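  • A simplified sketch of building one parametric (Gram-matrix) model per mask index, as used for the spatially varying losses above, is given below; the tensor shapes and the omission of the histogram terms are assumptions made for brevity:
```python
import torch

def gram(features):
    """Gram matrix of a C x N block of feature vectors."""
    return features @ features.t() / features.shape[1]

def indexed_gram_models(feature_map, index_mask, num_indices):
    """feature_map: C x H x W activations of the source style image.
    index_mask: H x W integer mask with values in [0, num_indices).
    Returns a dict index -> Gram matrix; during synthesis, the style loss
    for an output pixel uses the model of that pixel's mask index."""
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    mask_flat = index_mask.reshape(h * w)
    models = {}
    for m in range(num_indices):
        cols = flat[:, mask_flat == m]
        if cols.shape[1] > 0:
            models[m] = gram(cols)
    return models
```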
  • FIG. 21 shows images synthesized in accordance with particular embodiments of the invention and images generated using other processes.
  • images 2101 show an example of controllable parametric neural texture synthesis.
  • Original images are on the left, synthesis results on the right; corresponding masks are above each image.
  • Rows of images 2105 , 2110 and 2115 are examples of portrait style transfer using painting by numbers.
  • Rows of images 2110 and 2115 show style transfer results for an embodiment of the invention on the far right as compared to images generated by another process in the middle.
  • the images can show that processes in accordance with some embodiments of the invention may preserve fine-scale artistic texture better. However, processes in accordance with certain embodiments of the invention may also transfer a bit more of the person's “identity,” primarily due to hair and eye color changes.
  • the CNN used may be a VGG-19 network pre-trained on the ImageNet dataset.
  • layers relu 1_1 (Rectified Linear Unit 1_1), relu 2_1, relu 3_1 and relu 4_1 may be used for Gram losses.
  • the histogram losses may be computed only at layers relu 1_1 and relu 4_1 in a number of embodiments.
  • Content loss is computed only at relu 4_1 in accordance with several embodiments.
  • total variation smoothing may only be performed on the first convolutional layer to smooth out noise that results from the optimization process.
  • the images are synthesized in a multi-resolution process using an image pyramid.
  • the process begins at the bottom of the pyramid that can be initialized to white noise, and after each level is finished synthesizing, a bi-linear interpolation is used to upsample to the next level of the pyramid.
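  • The coarse-to-fine loop might be organized as in the sketch below; synthesize_level stands in for the per-resolution optimization and is an assumed callable, not something defined in the text:
```python
import torch
import torch.nn.functional as F

def multires_synthesis(sizes, synthesize_level):
    """sizes: list of (H, W) pairs from the smallest pyramid level to the
    largest. The bottom level starts from white noise; each finished level
    is bilinearly upsampled to seed the next one."""
    h0, w0 = sizes[0]
    image = torch.rand(1, 3, h0, w0)
    for h, w in sizes:
        image = F.interpolate(image, size=(h, w), mode='bilinear',
                              align_corners=False)
        image = synthesize_level(image)   # run the optimization at this size
    return image
```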
  • While CNN-based image synthesis systems in accordance with various embodiments of the invention are described above, other configurations of the CNN-based systems that add, modify and/or remove portions of the CNN in accordance with various embodiments of the invention are possible.
  • the apparent realism and/or quality of a synthesized image can be improved by applying synthetic weathering.
  • Textures that display the characteristics of some weathering processes may incorporate a collection of multiple textures consolidated into one weathered texture.
  • CNN-based image synthesis processes in accordance with some embodiments of the invention may provide a new approach for controlling the synthesis of these complex textures without having to separate the textures into different parametric models. This may be achieved by directly controlling the synthesis process by strategically waxing and waning specific parameters in the model to create new outputs that express different ratios of desired features to control the appearance of age for certain textures.
  • a separate but entangled problem to controlling age appearance during synthesis is first identifying which features in the input exemplar image display characteristics of age and to what degree.
  • user-created masks that delineate feature age may be received and used to identify the features.
  • Processes in accordance with many embodiments may use an automatic clustering approach to segregate different textures.
  • Still other processes in accordance with a number of embodiments may use a patch-based method that uses the average feature distance between a patch and its K nearest neighbors as a metric for “rarity” that may be interpreted as age. This method is based on the assumption that features created by the weathering process are highly random and have a low chance of finding a perfect match caused by the same process.
  • a CNN may be trained to learn and identify weathered features for a multitude of weathering types.
  • CNN-based image synthesis processes in accordance with particular embodiments of the invention can extract a parametric model for each region.
  • the desired age can be produced as a linear combination of the separate models.
  • weathering may just be an interpolation between a Young and Old parametric model as follows:
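  • The interpolation referred to above does not appear in this excerpt; based on the surrounding description of age as a linear combination of the separate models, a plausible reconstruction (not verbatim from the disclosure) is M(t) = (1 − t)·M_Y + t·M_O with t ∈ [0, 1], where M_Y and M_O are the Young and Old parametric model statistics (e.g. Gram/covariance matrices and histograms) and t is the desired age.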
  • processes in accordance with many embodiments may introduce a “transition” parametric model built from the bordering pixels between young and old regions.
  • processes in accordance with a number of embodiments of the invention may dynamically generate masks for each layer of the network corresponding to the receptive field. Examples of a mask are shown in FIG. 22 where black is used to delineate the young model, white for the old model and grey for the transition model.
  • (a) indicates an input texture image
  • (b) indicates a mask delineating young and old textures
  • (c)-(f) indicate masks generated for different receptive fields, measured in terms of relu layers of the network.
  • the aging process in accordance with some embodiments then may become a two-step process where, first, Young to Transition is synthesized and then Transition to Old is synthesized.
  • This strategy works for textures that completely change from one material to a completely different material as the textures age.
  • weathering often damages or deforms a young material rather than transforming it into a completely different material (e.g. scratching, cracking, peeling). Therefore, it is typical that a young model should not contain old features, but the old model should contain young features.
  • the old and transition regions may be combined into a single combined parametric model.
  • Aging processes in accordance with many embodiments of the invention may use a simple strategy for determining whether the transition and old models should be combined or not.
  • the strategy is based upon the observation that, when generating the transition masks as shown in FIG. 22 , the transition region becomes larger for deeper layers of the network. Therefore, if at some layer in the network the transition region completely replaces either a young or an old region, the processes assign that region to the transition model at all layers of the network.
  • the transition region can effectively “annex” other features if the features are too small to justify having their own parametric model.
  • each parametric model can have an age assigned to it between 0 and 1.
  • a list of N parametric models is sorted by age value from smallest to largest giving N−1 pairs of models to linearly interpolate between. These interpolations are sequentially chained such that the youngest model is the Y model and the next youngest is the O model. Once the old texture is fully synthesized, set the Y model to the O model and replace the O model with the next youngest model. The process may then iterate until all of the parametric models have been processed.
  • all N parametric models may be combined in parallel. This results in a single parametric model that is a combination of an arbitrary number of models in any proportion.
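  • A small sketch of blending parametric models by age, under the assumption that each model is a dictionary of per-layer statistics with an associated age value, is shown below:
```python
def blend_models(model_a, model_b, t):
    """Linearly interpolate two parametric models (dicts of arrays/tensors,
    e.g. Gram matrices per layer) with weight t in [0, 1]."""
    return {k: (1.0 - t) * model_a[k] + t * model_b[k] for k in model_a}

def age_sequence(models_by_age, steps_per_pair):
    """Sequentially chain N models sorted by age: interpolate Y -> O, then
    promote O to Y and continue with the next model, yielding one blended
    model per animation step."""
    models = sorted(models_by_age, key=lambda m: m["age"])
    for young, old in zip(models, models[1:]):
        for s in range(steps_per_pair):
            t = s / max(steps_per_pair - 1, 1)
            yield blend_models(young["stats"], old["stats"], t)
```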
  • While CNN-based image synthesis processes that perform aging in accordance with various embodiments of the invention are described above, other processes that perform aging in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes in accordance with particular embodiments of the invention may use this approach to synthesize smoothly varying animations from one age to another. Since a new parametric model for texture or “style” is introduced, and the optimization process starts from a prior model that represents the desired content, this process can be considered a special type of style transfer process where the style transfer process is a useful way to frame the problem.
  • these processes may introduce a new content loss strategy that applies different targets to local regions.
  • These processes may begin by first synthesizing a full image for each of the two parametric models to be used as multiple content images. For each local point in the synthesis image, the processes may dynamically choose which content loss to apply based on a “parametric heat map.” To generate the parametric heat map, the mean of a parametric model is subtracted from each pixel's feature vector, and the co-activations are used to form a covariance matrix for that individual feature.
  • this may be performed at the rectified linear units of layer 4 (relu_4) of the VGG-19 network.
  • the L2 distance between this covariance matrix and the covariance matrix component of the young and old parametric models is found for each pixel.
  • the parametric model that has the lowest error can be used to compute the content loss for the pixel using the corresponding content image.
  • processes in accordance with a few embodiments implement this approach by generating a new single content image by choosing pixels from the different content images using the lowest error mask.
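  • The per-pixel heat-map comparison described above could be sketched as follows; treating each mean-subtracted feature vector's outer product as a one-sample covariance estimate is an interpretation of the text, not a quoted implementation:
```python
import numpy as np

def parametric_heat_map(features, mean, cov_young, cov_old):
    """features: H x W x C activations (e.g. relu4_1); mean: C vector;
    cov_young/cov_old: C x C covariance components of the two models.
    Returns a boolean map that is True where the young model is closer."""
    h, w, c = features.shape
    centered = features.reshape(-1, c) - mean
    young_closer = np.empty(h * w, dtype=bool)
    for i, f in enumerate(centered):
        local_cov = np.outer(f, f)                       # per-pixel covariance
        d_young = np.sum((local_cov - cov_young) ** 2)   # L2 distance to young
        d_old = np.sum((local_cov - cov_old) ** 2)       # L2 distance to old
        young_closer[i] = d_young < d_old
    return young_closer.reshape(h, w)
```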
  • the specific approach that is pursued is typically dependent upon the requirements of a given application.
  • While CNN-based image synthesis processes that control weathering through “painting by numbers” in accordance with various embodiments of the invention are described above, other processes that control weathering through “painting by numbers” in accordance with certain embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes re-purpose style transfer to generate a continuous and progressive aging/de-aging process in a multiscale pyramid framework.
  • Style transfer may be considered an extension to texture synthesis in which a second content image is introduced to guide the optimization process.
  • Processes in accordance with many embodiments use the same concept to synthesize time sequences in a multiscale pyramid framework. These processes may bootstrap the animation by synthesizing the first frame in the sequence using the strategy described above. After the first frame is generated, subsequent frames can be created by using the frame before as a prior frame. As such, at any given point in time, two image pyramids are stored in memory, the pyramid for the previous frame and the pyramid for the current frame being synthesized. The synthesis order is illustrated in FIG. 23 .
  • processes in accordance with a number of embodiments may store an optimizer state for each pyramid level.
  • the base of the pyramid may use white noise as a prior frame to start the synthesis and then each subsequent pyramid level starts from the final result of the previous level that is bi-linearly re-sized to the correct resolution.
  • a new image pyramid may be synthesized.
  • the first level of the new pyramid uses the first level of the previous frame as a prior image.
  • the same layer from the previous frame is used as a prior image, and a content loss is introduced by re-sizing the previous layer in the same frame; this content image can be seen as a blurry version of the desired result.
  • This process is conceptually illustrated in FIG. 23 where image 5 is synthesized using image 2 as a prior and image 4 is re-sized and used as a content image to guide the process.
  • CNN-based image synthesis processes in accordance with some embodiments achieve the same benefits as synthesizing a single image using the pyramid strategy.
  • the fidelity of larger structures may be improved, noise artifacts may be reduced and synthesis speed may be improved.
  • While CNN-based image synthesis processes that perform continuous multiscale aging in accordance with various embodiments of the invention are described above, other processes that perform continuous multiscale aging in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes in accordance with particular embodiments can accomplish this using the heat-map approach presented in the previous section on continuous weathering.
  • Processes performing weather transfer keep a separate L1 distance score for each parametric model.
  • these processes may discriminate on a pixel-by-pixel basis to determine the pixels in a weathered region that contribute to the actual age artifacts and to what degree.
  • Given a region of image W with age features as well as transition features and the resulting parametric model, the goal is to remove any features that are not the desired “aged” features and replace these features in the correct proportion with the target parametric model of C.
  • processes in accordance with many embodiments normalize an L1 distance to each parametric model between 0 and 1 and invert the result so that a region in the synthesized image that strongly matches with a parametric model will receive a score close to 1 and regions that do not match receive a score closer to 0.
  • processes in accordance with a number of embodiments compute a mean activation of the model (note, co-activations are not used for this process as the features become very difficult to de-tangle).
  • the processes may multiply the mean activation by the local L1 distance to that parametric model and subtract it from the activations at this target pixel to remove those features in their correct proportion from the neural activations of the pixels.
  • the processes may take the mean activations from the new parametric model from image C and multiply them by the same L1 distance to determine an activation value.
  • the activation value is then added to the target pixel in W to replace the young features in the original image with young features from a new image where the weathered features are being introduced.
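  • A minimal sketch of the removal-and-replacement step, assuming the per-pixel match scores have already been normalized and inverted as described above, is given below; the array names are illustrative only:
```python
import numpy as np

def swap_young_features(activ_w, score_young, mean_young_w, mean_young_c):
    """activ_w: H x W x C activations of image W; score_young: H x W scores
    close to 1 where a pixel strongly matches W's young model;
    mean_young_w / mean_young_c: C mean activations of the young models
    extracted from W and from C respectively."""
    s = score_young[..., None]            # H x W x 1, broadcast over channels
    out = activ_w - s * mean_young_w      # remove W's young features in proportion
    out = out + s * mean_young_c          # inject C's young features in proportion
    return out
```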
  • the processes can now perform weathering on image W using the processes described above.
  • While CNN-based image synthesis processes that transfer weathered patterns from external exemplars in accordance with various embodiments of the invention are described above, other processes that transfer weathered patterns in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Neural network-based texture synthesis processes can be grouped into two categories based on their underlying algorithmic strategy. These include optimization-based and feedforward-based approaches. Optimization-based approaches often produce superior quality but may be computationally expensive. This often makes these processes impractical for real world applications. Feedforward-based approaches were developed as a fast alternative to optimization. This is achieved by moving the computational burden to the training phase rather than at run time. While being fast, feedforward approaches are typically poor in quality and inflexible. The first feedforward approach baked the transformation for a single texture into each network. Later, several methods introduced the idea that multiple texture transformations could be baked into a single network. One such method introduced the idea of interpolating between these styles by matching the statistics of deep neural activations from some content image to those of a style image.
  • CNN-based image synthesis processes in accordance with some embodiments of the invention use a coarse-to-fine multiscale synthesis strategy for neural texture synthesis. These processes can achieve significant speedup over previous optimization methods by performing a majority of iterations on small images early in the process; the further the processes move up the pyramid, the fewer iterations are used to maintain the already established structure.
  • the use of multiscale pyramid based synthesis is not only computationally cheaper as the processes move up the pyramid, but the problem formulation actually changes. Rather than performing texture synthesis or style transfer, the problem changes to Single Image Super Resolution (SISR) that takes an additional parametric texture model to help guide the up-resolution process.
  • CNN-based image synthesis processes in accordance with many embodiments of the invention may utilize the optimization-based approach described above up until an arbitrary threshold (for example, around a 512×512 pixel image size, which varies depending upon the requirements of a given application) and then switch to an arbitrary feedforward approach utilizing VGG encoding/decoding with activation transforms along the way.
  • Switching synthesis algorithms as the processes move up the pyramid can have additional benefits beyond speed.
  • Some CNN-based texture synthesis processes are only capable of generating RGB color textures, a standard that has been obsolete in the video game and movie industries for nearly 20 years. Color textures have been replaced by “Materials” which consist of several maps encoding the fine scale geometry of the surface as well as parameters that direct how light interacts with each pixel.
  • the encoder/decoder process in accordance with a number of embodiments can both up-resolution the previous synthesis level and decode the entire material. While it may be possible to train a new auto-encoder to process color images along with normal maps, roughness maps, etc., this would have to be done for every possible combination of maps, which may be costly and awkward. The approach described here may provide a more flexible and elegant solution.
  • a method in accordance with several embodiments of the invention generates arbitrary material formats applied to any synthesis operation including, but not limited to, texture synthesis, time-varying weathering, style transfer, hybridization and super resolution.
  • This synthesis strategy involves using some color texture generated using another process as input.
  • an exemplar material is given as input, where this material contains at least one map that is similar in appearance and purpose as the input color map.
  • the input color map is then used as a guide to direct the synthesis of the full material. This is done through a nearest neighbor search where a pixel/patch is found in one of the maps in the material that is similar to a pixel/patch in the input color image.
  • the pointer map resulting from the nearest neighbor search directs how to re-arrange all maps within the material and then each can be synthesized using this same new guiding structure.
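  • A brute-force sketch of the guided nearest-neighbor step is shown below; real implementations would use patches and an acceleration structure, and the per-pixel search here is only meant to illustrate the pointer-map idea:
```python
import numpy as np

def guided_material_synthesis(guide_color, exemplar_color, exemplar_maps):
    """For every pixel of the synthesized color guide, find the most similar
    pixel in the exemplar's color map, then reuse that pointer to pull the
    corresponding texels from every other map of the material."""
    h, w, _ = guide_color.shape
    ex_flat = exemplar_color.reshape(-1, exemplar_color.shape[-1]).astype(np.float64)
    pointers = np.empty((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            d = np.sum((ex_flat - guide_color[y, x]) ** 2, axis=1)
            pointers[y, x] = np.argmin(d)
    out = {}
    for name, m in exemplar_maps.items():
        flat = m.reshape(-1, m.shape[-1])
        out[name] = flat[pointers].reshape(h, w, m.shape[-1])
    return out
```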
  • While CNN-based image synthesis processes that combine optimization and feed forward processes in accordance with various embodiments of the invention are described above, other processes that combine optimization and feed forward processes in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • a style transfer process used to perform SISR may have applications for rendering.
  • SISR is an ill-posed problem where many high-resolution images can be downsampled to the same low-resolution result. This one-to-many inversion is especially bad at reproducing texture because the inversion tends toward an average of all of the possible higher resolution images.
  • the latest trend in SISR is to train very deep (i.e. many layered) artificial neural networks on a large dataset using adversarial training. The high capacity of the deep network in conjunction with the adversarial training is meant to help reduce the loss of texture features.
  • Processes in accordance with some embodiments perform a video up-resolution strategy where the video content is rendered at a low resolution (LR). From the LR source, the processes cluster frames together based on their feature statistics. The mean frame from each cluster is determined and rendered at high resolution (HR). The processes then perform the same guided LR to HR synthesis as proposed for video streaming, with the one important difference that in video streaming the HR statistics for each frame are known whereas for rendering similar HR statistics are shared across multiple frames.
  • While CNN-based image synthesis processes that perform SISR for rendering in accordance with various embodiments of the invention are described above, other processes that perform SISR for rendering in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • CNN-based image synthesis processes in accordance with many embodiments of the invention can use a nearest neighbor search between patches in the synthesized result to determine the most similar patches in the input exemplar in order to create a bridge between parametric CNN-based texture synthesis frameworks and many established non-parametric texture synthesis methods that do not require a neural network to operate.
  • the ability to tether a neural network approach on low-resolution images with non-neural network based methods higher in the synthesis pyramid can represent a “best of both worlds” solution between the two strategies.
  • CNN-based approaches, especially parametric methods, may be better at producing creative new features at the cost of speed, memory and image quality (these methods may contain many noise artifacts).
  • Non-parametric models that do not rely on neural networks tend to shuffle around patches directly from the input exemplar. As such, these approaches exhibit the inverse of these behaviors. They are fast, low memory approaches that largely match the fine details of the input. However, they are not as powerful at creating new shapes and features.
  • a combination of Pooling Layers, Strided Convolution Layers and Dilated Convolution Layers are used to arrange neurons into a hierarchical multiscale relationship.
  • image synthesis algorithms utilize pooling layers and sometimes strided convolution layers in order to form an image pyramid structure within the neural network architecture.
  • Typically, only one such strategy is used throughout the network architecture.
  • Dilated convolution is a similar concept to image stacks, first introduced for the purposes of image processing using signal and statistics based methods and later adapted for texture synthesis (Sylvain Lefebvre and Hugues Hoppe. 2005. Parallel controllable texture synthesis. ACM Trans. Graph. 24, 3 (July 2005), 777-786—the disclosure of which related to dilated convolutional networks is hereby incorporated by reference herein in its entirety).
  • the image stack is a collection of image pyramids sampled from a single image at regular translation intervals. Image stacks were developed to address the problem that the image pyramid data structure leads to discretization errors, e.g. the same input image when translated could lead to very different downsampled results.
  • the image stack is effectively a translation invariant alternative to image pyramids. It also follows that other types of symmetry transformations could also lead to similar discretization artifacts, e.g. two otherwise identical images would produce very different averages at coarser levels of an image pyramid.
  • each level of the pyramid is typically half the resolution in each dimension as the previous level.
  • In an image stack, by contrast, each level is the same resolution as the previous level.
  • In an image pyramid, samples are typically averaged together or combined in some way using a convolution kernel with a stride of 1, used uniformly at every pyramid level.
  • In an image stack, samples are typically averaged together or combined in some way using a convolution kernel with a stride of 2^level, where level is the number of scale factors relative to the original image. This can be thought of as analogous to a downscaling transition in an image pyramid.
  • image pyramids both downsample and subsample an image. While downsampling is a desirable operation, subsampling rarely is. Image stacks get around this problem by downsampling without subsampling.
  • Previous image synthesis methods using a Convolutional Neural Network have used some kind of pooling or strided convolution strategy; thus, they typically go through some form of subsampling operation followed by a supersampling operation.
  • the feedforward pass is a subsampling pyramid operation as higher order features are extracted deeper in the network, then the generation of the new image is a supersampling process through backpropagation as gradients traverse upwards through the network to the higher resolution shallow layers.
  • many approaches are typically some form of auto-encoder or cascading image pyramid, both of which utilize some form of subsampled data and attempt to supersample it during the synthesis process.
  • network architectures designed for image synthesis which may rely on pooling or strided convolution to do downsampling, can be improved by using a dilated convolution architecture instead. Therefore, the system in accordance with several embodiments of the invention makes use of dilated alternatives to such network architectures, as dilation is often a superior form of downsampling for image synthesis operations.
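  • The difference between the two downsampling strategies can be illustrated with a short PyTorch sketch (an assumed framework, not part of the disclosure): strided convolution subsamples the image, while dilated convolution grows the receptive field but keeps a 1-to-1 mapping with input pixels:
```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)

# Pyramid-style downsampling: spatial resolution is halved (subsampled).
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
print(strided(x).shape)   # torch.Size([1, 64, 64, 64])

# Stack-style dilated convolution: the receptive field doubles, but the
# resolution (and the pixel-wise correspondence to the input) is preserved.
dilated = nn.Conv2d(64, 64, kernel_size=3, stride=1, dilation=2, padding=2)
print(dilated(x).shape)   # torch.Size([1, 64, 128, 128])
```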
  • Where a synthesis strategy also relies on the use of image pyramids (typically Gaussian pyramids) of the input data for additional multiscale synthesis, the system in accordance with some embodiments of the invention uses an image stack (typically a Gaussian stack) to replace the image pyramid.
  • the dilated network strategy is particularly well suited for auto-encoders where the decoder network is a layer-wise inverter of the encoder network (i.e. each layer in the encoder has a “mirror” layer in the decoder which inverts that layer as accurately as possible).
  • This particular network architecture is desirable for fast image synthesis because the encoder side of the network can distill an image into its most meaningful features, which can be modified in some way by another algorithm (e.g. including, but not limited to, a whitening transform, histogram match, or nearest neighbor search). The newly updated values can then be inverted by the decoder in order to produce a new image.
  • This synthesis strategy is attractive because it's much faster and more memory efficient than an optimization based approach.
  • inverting a network that includes pooling is very difficult (historically, the literature has used a pre-trained VGG network as the encoder). Inverting pooling layers typically leads to blurring or other such supersampling artifacts.
  • Systems in accordance with many embodiments of the invention implement a dilated architecture as an alternative, which is easy and more accurate to invert on a layer-by-layer basis.
  • a whitening transform, multiscale NNS search or histogram matching algorithm can continue to be applied to features at each network layer as they progress through the decoder.
  • processes in accordance with some embodiments of the invention combine pooling or strided convolution layers at the shallow end of a convolutional neural network architecture with dilated convolution layers deeper in the network.
  • This “hybrid” network architecture exhibits the properties of a pyramid network up to a specific depth in the network and then switches to the properties of a dilated stack. From a memory viewpoint, this is attractive because large images quickly get condensed into much smaller images for the majority of the network. This is also a good compromise from an image processing viewpoint because the deeper layers of the network encode the complex shapes and patterns and thus need the highest resolution. Shallow layers of the network only encode simple shapes and textures and don't require the same degree of network capacity.
  • This new network architecture can be visualized as an image stack with a pyramid sitting on top of it.
  • While a pyramid-stack hybrid convolutional neural network architecture based on some combination of pooling, strided convolution and dilated convolution is used for image synthesis in a number of examples discussed above, the pyramid-stack hybrid may be modified in a variety of ways, including (but not limited to) adding, removing, and/or combining components of the stack.
  • CNN based image synthesis is an ideal approach for hybridizing images due to the CNN's ability to learn and identify complex shapes and structures across multiple scales. Unlike other image synthesis methods which can be improved by dilated convolution, but do not require it, hybridization is likely to produce poor results and artifacts if deeper layers of the CNN are subsampled. Therefore, the input images can be passed in a feedforward manner through a dilated convolutional neural network to produce deep neural activations that have a 1-to-1 mapping with input image pixels.
  • the system in accordance with many embodiments of the invention avoids quantization and maintains the maximum number of features to compare, drastically increasing the chance of finding good matches.
  • To bootstrap the synthesis process, one of the input images is chosen at random and passed feedforward through a network.
  • synthesis itself does not strictly require a dilated architecture. Dilated versus strided convolution have their own benefits and weaknesses and are compared below. The important thing to note is that the same convolution kernels used for extracting exemplar features typically must also be used during synthesis.
  • a dilated architecture can be thought of as a collection of pyramid architectures, so the same set of kernels can be used in either strategy.
  • Many of the examples described herein refer to VGG-19 feature kernels pre-trained on the ImageNet dataset, however one skilled in the art will recognize that convolutional kernels from any network architecture trained on any dataset may be applied in accordance with various embodiments of the invention.
  • Hybridization, unlike other image synthesis operations, is a non-parametric process that relies on finding similar matching features between input and output images and building new images by re-combining sets of exemplar features into new configurations, while tracking the global feature error between the new features being mixed and the original input features that they were derived from.
  • hybridization can be performed in either an optimization or feedforward based synthesis strategy.
  • the key aspect of image hybridization is to algorithmically generate new activations at different levels of the network which combine the activation features extracted from the input images into new hybrid configurations. Before we describe how these new hybrid configurations are generated, we'll identify how they are used to synthesize new images.
  • When performing optimization based synthesis, an input image (typically noise, but it could be anything) is iteratively updated to minimize some loss function.
  • the “hybrid loss” function is the summed L2 distance between each pixel in the current image being synthesized and the hybrid activation maps at a given layer. This is the same strategy as the “content loss” described above, however, whereas the content loss was taken directly from an input image, the “hybrid loss” is a new activation map that is generated by recombining activation features taken from different input images. In the original image synthesis work, content loss is only used at RELU4_1, so that it does not overpower style loss at shallow layers of the network.
  • Hybridization in accordance with a number of embodiments of the invention incorporates a style loss in order to perform style transfer combined with hybridization all in one operation.
  • the basic hybridization algorithm assumes that there is no style loss. Therefore, hybrid loss can be used at multiple layers in the network.
  • Feedforward networks on the other hand do not perform an optimization process turning one image into another, instead they transform an image into a new image. Therefore, using the dilated auto-encoder network described above, the encoder portion is run on all input images, their features are hybridized in the middle of the network using another process, and then this hybridized set of activation values are inverted by the decoder. Note that in both optimization and feedforward synthesis, the results of hybridizing deep features in the network can be passed up to shallow layers and then become further hybridized through another hybridization step.
  • the neural network in accordance with some embodiments of the invention is a feature descriptor and converts a neighborhood of raw pixel values into a single point in a high dimensional feature space.
  • the creation of hybrid activations for a layer can be explained.
  • the goal is to traverse every pixel location in the layer and replace the current feature vector at that location with a new feature vector taken from some other pixel location in that layer or from some pixel location taken from another input image's neural activations at that layer.
  • This can be done through a two-step process where the process introduces randomness or “jitter” and then “corrects” any artifacts or broken structures caused by the jitter.
  • the process optionally pre-computes k-nearest neighbors between each input image and every other input image as a part of an acceleration strategy.
  • the process in accordance with many embodiments of the invention gathers k nearest neighbors from the input exemplars.
  • the process divides up the k samples equally across all the exemplars.
  • the distance metric used for these KNNs is the L2 distance between feature vectors at the neural network layer of interest. This is equivalent to transforming all of the image data into points in high dimensional feature space.
  • the process in accordance with some embodiments of the invention gathers the cluster of exemplar feature points surrounding it, such that the process samples the same number of points from each exemplar.
  • the next step is to sort these K nearest neighbors from smallest to largest distance.
  • the one parameter exposed to the user is a “jitter slider” that goes from 0-1 (or an equivalent linear or non-linear range), where 0 should produce one of the original inputs and 1 should be the maximum hybridization and mutation. Therefore, the 0-1 range is mapped to the min distance and max distance of the closest to farthest neighbors.
  • the process in accordance with many embodiments of the invention gathers the K nearest neighbors with distances less than the jitter value and randomly selects one of them to update the synthesis patch with. This is akin to constraining noise. Instead of starting from noise and trying to recover structure from it (which is very difficult), the process in accordance with a number of embodiments starts from pure structure (i.e. the features of one of the input images) and adds constrained randomness to that structure.
  • the process in accordance with several embodiments of the invention adds noise or randomness in “feature space” rather than color space or image space as is typical for these types of algorithms.
  • This strategy adds noise in feature space, which essentially allows the process to randomize the image in a way that preserves the important structures of the image. This operation can be performed at one or more convolution layers within the CNN.
  • the second step “correction” then “fixes” the image so that it maintains statistical similarity to the exemplar input. For each n×n neighborhood of synthesized neural activation vectors (where n×n could be any size, including 1×1, e.g. a single element vector), correction seeks out the neighborhood of neural activation vectors in any exemplar that has the lowest L2 distance. The current synthesis neural activation vector is then replaced with that closest exemplar neural activation vector.
  • the correction scheme is based on coherence (Ashikhmin, Michael. “Synthesizing natural textures.” Proceedings of the 2001 Symposium on Interactive 3D Graphics).
  • the process in accordance with various embodiments of the invention can either perform a nearest neighbor search from the synthesis layer to the exemplar layers during runtime of the algorithm, or could pre-compute a candidate list of k-nearest neighbors from every exemplar feature to every other k exemplar feature. Then, during synthesis, each activation vector also maintains a pointer to the exemplar activation vector that it is mimicking.
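  • The jitter and correction passes might be sketched as follows, operating on flattened activation vectors; using globally normalized distances and single-vector (1×1) neighborhoods is a simplifying assumption:
```python
import numpy as np

def jitter(synth_feats, knn_candidates, knn_dists, jitter_amount, rng):
    """For every location, randomly swap the current feature vector for one
    of its precomputed nearest neighbors whose normalized distance is below
    the user's jitter value in [0, 1]."""
    out = synth_feats.copy()
    d = (knn_dists - knn_dists.min()) / (knn_dists.max() - knn_dists.min() + 1e-8)
    for i in range(synth_feats.shape[0]):
        allowed = np.where(d[i] <= jitter_amount)[0]
        if allowed.size:
            out[i] = knn_candidates[i, rng.choice(allowed)]
    return out

def correct(synth_feats, exemplar_feats):
    """Replace every synthesized feature vector with the exemplar feature
    vector that has the smallest L2 distance to it."""
    out = np.empty_like(synth_feats)
    for i, f in enumerate(synth_feats):
        idx = np.argmin(np.sum((exemplar_feats - f) ** 2, axis=1))
        out[i] = exemplar_feats[idx]
    return out
```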
  • the synthesis process for an optimization based algorithm in accordance with some embodiments of the invention runs E input images through a dilated version of a CNN, resulting in a set of activation vectors for specific layers of interest (in the case of VGG-19, these are RELU1_1, RELU2_1, RELU3_1 and RELU4_1).
  • the synthesis process runs a randomly selected input image through either a dilated or un-dilated version of the CNN to produce the starting point for the hybrid activations.
  • the process runs a jitter pass and runs the standard neural optimization based synthesis method starting from some prior (typically noise) for several iterations of backpropagation until the prior has turned into a manifestation of the jittered activations at the deep layer.
  • the process then runs a correction pass on the activations at the coarsest layer in the network (for VGG-19, this is RELU4_1), thus producing the hybrid activations for that layer.
  • the process runs the standard neural optimization based synthesis method again for several iterations of backpropagation until the prior has turned into a manifestation of the hybrid activations at the deep layer.
  • the process moves to the next most shallow layer of interest in the network (e.g. RELU3_1 for VGG-19) and repeats the process, jitter and correct in order to find new hybrid activations for that layer to use as the target for hybrid loss and reruns the optimization process now only going to that layer and no farther down the network. Repeat this process until the shallowest layer of interest is optimized.
  • the synthesis process for feedforward networks in accordance with a number of embodiments of the invention runs all inputs through the encoder, producing the deep neural activations.
  • the process runs the jitter pass on one of the exemplars in order to randomize the features.
  • the process samples a neighborhood of activation vectors (at least 3×3) around each activation vector and performs the correction phase of the algorithm.
  • the jitter and correction phase can either use pre-computed nearest neighbor sets or run a full nearest neighbor search during the algorithm.
  • the process continues through the decoder, inverting the new hybrid layer. This process can be repeated for each layer moving through the decoder or only run at target layers. This is a tradeoff between algorithm speed and the scale at which features are hybridized. Optimization based synthesis is slower than feedforward, however it achieves superior quality.
  • a 3D model is typically “texture mapped”.
  • “texture mapped” means an image is wrapped over the surface of the 3D shape as shown in FIG. 24 .
  • 3D models typically contain UV coordinates at each vertex which define the 2D parameterization of the 3D surface.
  • the left image displays the underlying geometry of the mesh 2401
  • the middle image shows the geometry with a texture mapped over the mesh 2402
  • the image on the right shows what that texture 2403 looks like as a 2D mapping of a 3D surface.
  • The process of synthesizing texture maps directly on a model is referred to herein as “on-model synthesis.”
  • Processes in accordance with many embodiments of the invention integrate an on-model synthesis approach into the CNN approach. To do so, these processes have to spread out atlas maps and build a gutter space of pointers re-directing to neighboring charts.
  • the CNN based synthesis approach in accordance with many embodiments of the invention relies on the process of convolution in which each pixel of the synthesis kernel is filtered based on a neighborhood of its surrounding pixels.
  • On-model synthesis introduces two complications on top of the standard synthesis approach in image space:
  • A flow field over a 3D model is generated using the model's curvature properties along with user guidance. That flow field can then be projected as a 2D vector field in the parameterized texture space.
  • This flow field typically contains both directional components as well as scale components along each axis. Rather than convolving the neural network along the image x and y axis unit vectors globally, each pixel now has its own local coordinate frame and scale.
  • UV texture space is typically broken up into a set of “charts” where each chart covers a relatively flat portion of the model. This adds another level of complication because texture colors that are coherent along the surface of the model are not coherent in texture space where we perform our convolutions.
  • the process in accordance with many embodiments of the invention adds a gutter space of a few pixels in radius around each chart. These gutter pixels store pointers to other charts in texture space that encode coherent pixels along the model's surface. This additional pointer buffer is referred to as a “jump map”.
  • When performing convolution, rather than sampling directly from the image, the process in accordance with a number of embodiments first samples from the jump map, which points to the image pixel that should be sampled. Because texture space might have tightly packed charts, as a pre-process, the process in accordance with some embodiments spreads out the charts so that there is a gutter space of at least two pixels around each chart at the coarsest synthesis pyramid level, plus however many pooling layers are passed through in the CNN. Note that when using dilated convolution, the gutter space typically must be two to the power of the number of dilated convolutions.
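  • A sketch of how convolution taps might be redirected through the jump map is given below; it assumes the gutter construction guarantees that every redirected coordinate is valid, so bounds checks are omitted:
```python
import numpy as np

def sample_through_jump_map(activations, jump_map, y, x):
    """activations: H x W x C map in UV texture space; jump_map: H x W x 2
    integer map where interior pixels point to themselves and gutter pixels
    point to the coherent pixel on a neighboring chart."""
    jy, jx = jump_map[y, x]
    return activations[jy, jx]

def gather_neighborhood(activations, jump_map, y, x, radius=1):
    """Collect an n x n neighborhood for a convolution tap, redirecting every
    sample through the jump map so the kernel stays coherent across seams."""
    taps = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            taps.append(sample_through_jump_map(activations, jump_map, y + dy, x + dx))
    return np.stack(taps)
```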
  • Processes in accordance with some of these embodiments introduce an underlying vector field that frames the local orientation around each pixel.
  • the vector field directs the local orientation of the convolution.
  • these processes can bi-linearly interpolate sampling of neural activations from the previous layer.
  • the gutter space of pointers redirects to another atlas chart.
  • inverse mapping can be used in a manner similar to what is described above with respect to convolution. This allows these processes to perform CNN image synthesis directly in UV space for on-model synthesis.
  • the algorithm described is designed to use a rectangular texture and a single model (with no relationship between the two) as input and synthesize a new image which maps into the model's unwrapped texture space, as shown in FIG. 25 , where texture 2501 is wrapped around mesh 2502 .
  • the input is still a rectangular image and the output uses the mesh as a canvas on which to paint over.
  • a pre-textured mesh is given as input and the textures already parameterized into some UV space are used as the source data to feed an image synthesis process.
  • Processes in accordance with some embodiments of the invention follow a similar approach. These processes take this concept a step further and produce textures that conform to geometric shapes and the feature contents of that texture are guided by the underlying shape itself. This results in image synthesis that can be applied on top of already textured meshes, and can also produce appearance transfer from one textured mesh onto another.
  • the goal of processes in accordance with some embodiments of the invention is to go one step further and provide an on-model texture synthesis scheme that allows the user to supply a fully textured model as the input exemplar (for example texture mapped mesh ( 2402 )) instead of just a texture ( 2403 ), and apply that texture from the model onto a different untextured model.
  • a fully textured model as the input exemplar (for example texture mapped mesh ( 2402 )) instead of just a texture ( 2403 ), and apply that texture from the model onto a different untextured model.
  • the advantage to this approach is that a lot of useful information is represented by a textured mesh, including (but not limited to) the relationship between varying texture features and the underlying geometric shape on which they would typically exist. Texture and shape are often not independent. Instead, texture and shape are related.
  • processes in accordance with some embodiments of the invention can provide artists with more powerful and convenient tools.
  • the first is that deep neural activation features and their resulting parametric models for UV mapped textures should be calculated using the same vector field and jump map approach proposed above for the purposes of synthesis.
  • the second is to find a shape descriptor that is both effective as well as compatible with an image descriptor maintained by the system and the image based GPU accelerated framework upon which the system is built.
  • a key insight is that geometric shape information can be projected onto an image (i.e. a regular grid) and the shape descriptor is able to work by sampling patches from this grid in order to maintain compatibility with the GPU framework. Because it is desirable that geometric neighborhoods correspond to texture neighborhoods, it makes sense that the geometric projection into image space should match the texture unwrapping. The only issue is that texture information can map to multiple portions of a single mesh. As such, processes in accordance with some embodiments of the invention utilize a texture parameterization that provides a 1-to-1 mapping between points on a model and pixels in a texture image. This amounts to simply making copies of charts or chart regions that are pointed to from multiple polygons so that each polygon maps to its own region in texture space.
  • any arbitrary shape description ranging from point location in 3D space to more sophisticated descriptors, can be fed into a CNN framework in order to learn local shape features using a CNN training process.
  • One such training approach could be mesh categorization, however, other training approaches such as mesh compression, feature clustering or upres could also be viable training strategies for learning meaningful shape features.
  • a learning strategy for condensing networks that have been trained for purposes other than image synthesis allows for the production of new networks that are more efficient at extracting image features used for image synthesis.
  • VGG-19 pre-trained for image classification is used as a high quality, learned image descriptor for extracting out meaningful image features for the purposes of image synthesis.
  • Many networks designed for classification have been designed for a different and more difficult problem than texture feature extraction and often require more capacity than is needed for feature extraction.
  • VGG for example, is computationally expensive to run, which can result in small images, long wait times and a reliance on expensive hardware.
  • One of the benefits of systems in accordance with various embodiments of the invention is to improve memory/speed performance, without sacrificing synthesis quality.
  • VGG or some other network architecture trained on classification can be of interest on the basis that the kernels that were produced as a byproduct of the learning process for image classification can be useful in image synthesis.
  • For image synthesis, not all activation maps produced by a network are needed, only a small subset of those feature maps. As such, there are layers in the network that are not used directly for image synthesis; rather, blocks of layers are run between layers of interest.
  • the number of hidden layers in a previously trained CNN can be reduced and/or the capacity of those hidden layers can be reduced.
  • the simplest strategy is to train a new network on image classification.
  • the learning strategy in accordance with many embodiments of the invention uses the activation maps produced by VGG (or some other artificial neural network) as the ground truth (since they will produce good synthesis results) and a network is trained to try and reproduce the input/output pairings using fewer neurons than VGG.
  • systems and methods in accordance with many embodiments of the invention utilize an artificial neural network with a specific number of neurons to learn a network that approximates the intermediate neural activations of a different network with a larger number (or the same number) of artificial neurons for the purposes of efficient image synthesis.
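  • A distillation-style training loop consistent with this description is sketched below; the teacher_layers callable (returning the teacher network's activations of interest) and the training hyperparameters are assumptions:
```python
import torch
import torch.nn as nn

def train_condensed_extractor(student, teacher_layers, images, epochs=10, lr=1e-3):
    """Train a smaller `student` network to reproduce the activations of a
    pre-trained teacher (e.g. VGG-19) at the layers actually used for
    synthesis, so the student can replace the teacher at run time."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for img in images:
            with torch.no_grad():
                targets = teacher_layers(img)          # ground-truth activations
            preds = student(img)                       # student's approximations
            loss = sum(mse(p, t) for p, t in zip(preds, targets))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```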


Abstract

Systems and methods for providing convolutional neural network based image synthesis using localized loss functions are disclosed. A first image including desired content and a second image including a desired style are received. The images are analyzed to determine a local loss function. The first and second images are merged using the local loss function to generate an image that includes the desired content presented in the desired style. Similar processes can also be utilized to generate image hybrids and to perform on-model texture synthesis. In a number of embodiments, Condensed Feature Extraction Networks are also generated using a convolutional neural network previously trained to perform image classification, where the Condensed Feature Extraction Networks approximate intermediate neural activations of the convolutional neural network utilized during training.

Description

    CROSS REFERENCED APPLICATION
  • This application claims priority to U.S. Provisional Application Ser. No. 62/383,283, filed Sep. 2, 2016, U.S. Provisional Application Ser. No. 62/451,580, filed Jan. 27, 2017, and U.S. Provisional Application Ser. No. 62/531,778, filed Jul. 12, 2017. The contents of each of these applications are hereby incorporated by reference as if set forth herewith.
  • FIELD OF THE INVENTION
  • This invention generally relates to image synthesis and more specifically relates to image synthesis using convolutional neural networks based upon exemplar images.
  • BACKGROUND
  • With the growth and development of creative projects in a variety of digital spaces (including, but not limited to, virtual reality, digital art, as well as various industrial applications), the ability to create and design new works based on the combination of various existing sources has become an area of interest. However, the actual synthesis of such sources is a hard problem that raises a variety of difficulties.
  • SUMMARY OF THE INVENTION
  • Systems and methods for providing convolutional neural network based image synthesis are disclosed. In many embodiments, processes for providing CNN-based image synthesis may be performed by a server system. In accordance with several embodiments, the processes may be performed by a “cloud” server system. In still further embodiments, the processes may be performed on a user device.
  • One embodiment is a system for generating a synthesized image including desired content presented in a desired style. The system includes one or more processors and memory readable by the one or more processors. The system in accordance with some embodiments of the invention includes instructions stored in the memory that, when read by the one or more processors, direct the one or more processors to receive a source content image that includes desired content for a synthesized image, receive a source style image that includes a desired texture for the synthesized image, determine a localized loss function for a pixel in at least one of the source content image and the source style image, and generate the synthesized image by optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image, wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
  • In another embodiment, the localized loss function is represented by a Gram matrix.
  • In a further embodiment, the localized loss function is represented by a covariance matrix.
  • In still another embodiment, the localized loss function is determined using a Convolutional Neural Network (CNN).
  • In a still further embodiment, the optimizing is performed by back propagation through the CNN.
  • In yet another embodiment, the localized loss function is determined for a pixel in the source style image.
  • In a yet further embodiment, the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to receive a mask that identifies a plurality of regions of the source style image, determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask, determine a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions, and associate the localized loss function with the pixel.
  • In another additional embodiment, the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to group the pixels of the source style image into a plurality of cells determined by a grid applied to the source style image, determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel, and associate the determined localized loss function of the one of the plurality of cells with the pixel.
  • In a further additional embodiment, the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to determine a group of neighbor pixels for a pixel in the source content image, determine a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel, and determine a local loss function for the group of corresponding pixels.
  • In another embodiment again, the localized loss function is determined for a pixel in the source content image.
  • In a further embodiment again, the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to receive a mask that identifies regions of the source content image, determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask, determine a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions, and associate the localized loss function with the pixel.
  • In still yet another embodiment, the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to group the pixels of the source content image into a plurality of cells determined by a grid applied to the source content image, determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel, and associate the determined localized loss function of the one of the plurality of cells with the pixel.
  • In a still yet further embodiment, the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to determine a global content loss function for the source content image from the pixels of the source content image, determine a weight for the pixel indicating a contribution to a structure in the source content image, and apply the weight to the global content loss function to determine the localized loss function for the pixel.
  • In still another additional embodiment, the weight is determined based upon a Laplacian pyramid of black and white versions of the source content image.
  • In a still further additional embodiment, a localized loss function is determined for a pixel in the source content image and a corresponding pixel in the source style image.
  • In still another embodiment again, the optimization uses the localized loss function for the pixel in the source content image as the content loss function and the localized loss function of the pixel in the source style image as the style loss function.
  • In a still further embodiment again, pixels in the synthesized image begin as white noise.
  • In yet another additional embodiment, each pixel in the synthesized image begins with a value equal to a pixel value of a corresponding pixel in the source content image.
  • In a yet further additional embodiment, the optimizing is performed to minimize a loss function that includes the content loss function, a style loss function, and a histogram loss function.
  • In yet another embodiment again, a method for performing style transfer in an image synthesis system, where a synthesized image is generated with content from a source content image and texture from a source style image, includes receiving a source content image that includes desired content for a synthesized image in the image synthesis system, receiving a source style image that includes a desired texture for the synthesized image in the image synthesis system, determining a localized loss function for a pixel in at least one of the source content image and the source style image using the image synthesis system, and generating the synthesized image using the image synthesis system by optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image, wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
  • In a yet further embodiment again, the localized loss function is represented by a Gram matrix.
  • In another additional embodiment again, the determining of a localized loss function for a pixel in the source style image includes receiving a mask that identifies a plurality of regions of the source style image in the image synthesis system, determining a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask using the image synthesis system, determining a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions using the image synthesis system, and associating the localized loss function with the pixel using the image synthesis system.
  • In a further additional embodiment again, the determining a localized loss function for a pixel in the source style image comprises grouping the pixels of the source style image into a plurality of cells determined by a grid applied to the source style image using the image synthesis system, determining a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel using the image synthesis system, and associating the determined localized loss function of the one of the plurality of cells with the pixel using the image synthesis system.
  • In still yet another additional embodiment, determining of a localized loss function for a pixel in the source style image includes determining a group of neighbor pixels for a pixel in the source content image using the image synthesis system, determining a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel using the image synthesis system, and determining a local loss function for the group of corresponding pixels using the image synthesis system.
  • Additional embodiments and features are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the specification or may be learned by the practice of the invention. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, which form a part of this disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
  • The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.
  • FIG. 1 is an illustration of various devices that may perform one or more processes to provide Convolutional Neural Network (CNN) based image synthesis in accordance with an embodiment of the invention.
  • FIG. 2 is an illustration of components of a processing system in a device that executes one or more processes to provide CNN-based image synthesis using localized loss functions in accordance with an embodiment of the invention.
  • FIG. 3 is an illustration of images showing the instability in a Gram matrix.
  • FIG. 4 is an illustration of images showing a comparison of results of texture synthesis performed in accordance with various embodiments of the invention with and without the use of pyramids.
  • FIG. 5 is an illustration of a flow diagram of a process for providing CNN-based image synthesis that performs style transfer using localized loss functions in accordance with an embodiment of the invention.
  • FIG. 6 is an illustration of two input images and a resulting image from a style transfer process of the two input images using localized style loss functions in accordance with an embodiment of the invention.
  • FIG. 7 is an illustration of a flow diagram of a process for generating region-based loss functions in accordance with an embodiment of the invention.
  • FIG. 8 is an illustration of conceptual images showing masks of regions for two input images used in a style transfer process using region-based loss functions in accordance with an embodiment of the invention.
  • FIG. 9 is an illustration of conceptual images of cells in two input images in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 10 is an illustration of a flow diagram of a process for generating localized loss functions in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 11 is an illustration of a comparison of similar cells in two input images in a style transfer process using uniform regions in accordance with an embodiment of the invention.
  • FIG. 12 is an illustration of a comparison of similar pixels in two input images in a style transfer process using a per pixel loss transfer in accordance with an embodiment of the invention.
  • FIG. 13 is an illustration of a process for generating localized style loss functions for a style transfer process using per pixel loss transfer in accordance with an embodiment of the invention.
  • FIG. 14 is an illustration of two input images that provide an example of the operation of a style transfer process using a global content loss function in accordance with an embodiment of the invention.
  • FIG. 15 is an illustration of a resulting image from the style transfer from the two input images of FIG. 14 performed by a style transfer process using global content loss in accordance with an embodiment of the invention.
  • FIG. 16 is an illustration of a Laplacian Pyramid of images derived from a content source image from FIG. 14 used in a style transfer process using local content loss in accordance with an embodiment of the invention.
  • FIGS. 17 and 18 are illustrations of images produced by style transfer processes using global loss functions in accordance with certain embodiments of this invention.
  • FIG. 19 is an illustration of images generated by a style transfer process using localized content loss functions in accordance with an embodiment of the invention.
  • FIG. 20 is an illustration of a flow diagram of a process for determining localized loss using masks in accordance with an embodiment of the invention.
  • FIG. 21 is an illustration of images synthesized in accordance with some embodiments of the invention and images generated using other processes.
  • FIG. 22 is an illustration of images of masks used in an aging process in accordance with an embodiment of the invention.
  • FIG. 23 is an illustration of a synthesis order in a multiscale pyramid framework in accordance with an embodiment of the invention.
  • FIG. 24 is an illustration of a textured mapped model and components used to form the textured mapped model using a filter process in accordance with an embodiment of the invention.
  • FIG. 25 is an illustration of a texture and the texture applied to a surface of a mesh by a filter process in accordance with an embodiment of the invention.
  • DETAILED DISCUSSION
  • Turning now to the drawings, systems and methods for providing Convolutional Neural Network (CNN) based image synthesis in accordance with some embodiments of the invention are described. In many embodiments, processes for providing CNN-based image synthesis may be performed by a server system. In accordance with several embodiments, the processes may be performed by a “cloud” server system. In still further embodiments, the processes may be performed on a user device.
  • In accordance with many embodiments, the loss functions may be modeled using Gram matrices. In a number of embodiments, the loss functions may be modeled using covariance matrices. In accordance with several embodiments, the total loss may further include mean activation or histogram loss.
  • In accordance with sundry embodiments, a source content image, including desired structures for a synthesized image and a source style image, including a desired texture for the synthesized image, are received. A CNN may be used to determine localized loss functions for groups of pixels in the source content and/or source style images. The localized content and/or localized style loss functions may be used to generate a synthesized image that includes the content from the source content image and the texture from the source style image. In accordance with many embodiments, an optimization process may be performed to optimize pixels in a synthesized image using the localized content loss function of a corresponding pixel from the source content image and/or the localized style loss function of a corresponding pixel from the source style image. In accordance with a number of embodiments, the optimization may be an iterative optimization that is performed by back propagation through a CNN, or through a purely feed-forward process. In a number of embodiments, a specific pyramid-stack hybrid CNN architecture based on some combination of pooling, strided convolution and dilated convolution is used for image synthesis. As can readily be appreciated, the specific CNN architecture utilized in image synthesis is largely dependent upon the requirements of a given application.
  • In accordance with certain embodiments, the CNN-based image synthesis processes may perform aging of an image. In accordance with many embodiments, CNN-based image synthesis processes may be used to perform continuous weathering by continually modifying the parametric model. In accordance with a number of embodiments, the CNN-based image synthesis processes may be used to perform weathering by controlling the weathering through a “painting by numbers” process. In accordance with several embodiments, CNN-based image synthesis processes may be used to perform continuous multiscale aging. In accordance with many embodiments, CNN-based image synthesis processes may be used to perform aging by transferring weathering patterns from external exemplars.
  • In accordance with sundry embodiments, CNN-based image synthesis processes may combine optimization and feedforward parametric texture synthesis for fast high-resolution synthesis. In accordance with many embodiments, CNN-based image synthesis processes may be used to perform single image super resolution (SISR) for rendering. In accordance with a number of embodiments, the CNN-based image synthesis processes may combine parametric and non-parametric (non-CNN) synthesis within a pyramid framework.
  • In several embodiments, dilated convolution neural networks can be utilized to synthesize image hybrids. Image hybridization involves starting from a set of several source images within a category and mixing them together in a way that produces a new member of that category. In a number of embodiments, image hybridization is performed using either an optimization or feedforward based synthesis strategy. In either case, a key aspect of the image hybridization is to generate new activations at different levels of the network which combine the activation features extracted from the input images into new hybrid configurations.
  • Processes in accordance with many embodiments of the invention integrate an on-model synthesis approach into the CNN approach. The goal of processes in accordance with some embodiments of the invention is to provide an on-model texture synthesis scheme that allows the user to supply a fully textured model as the input exemplar instead of just a texture, and apply that texture from the model onto a different untextured model. In many embodiments, the processes produce textures that conform to geometric shapes and the feature contents of that texture are guided by the underlying shape itself. This results in image synthesis that can be applied on top of already textured meshes, and can also produce appearance transfer from one textured mesh onto another.
  • In a number of embodiments, a specific class of artificial neural networks that can be referred to as Condensed Feature Extraction Networks are generated from CNNs trained to perform image classification. Systems and methods in accordance with many embodiments of the invention generate Condensed Feature Extraction Networks by utilizing an artificial neural network with a specific number of neurons to learn a network that approximates the intermediate neural activations of a different network with a larger number (or the same number) of artificial neurons. In several embodiments, the artificial neural network that is utilized to train a Condensed Feature Extraction Network is a CNN. In certain embodiments, the computation required to generate outputs from the Condensed Feature Extraction Network for a set of input images is reduced relative to the CNN used to train the Condensed Feature Extraction Networks.
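  • For illustration only, a minimal sketch of training a smaller network to approximate the intermediate activations of a larger, previously trained network is shown below, assuming PyTorch. The two small convolutional stacks, their shapes, and the toy training loop are illustrative placeholders (the "teacher" stands in for a pre-trained feature extractor such as VGG), not the networks described herein.

```python
import torch
import torch.nn as nn

# Teacher: a stand-in for a pre-trained feature extractor such as VGG,
# truncated at the layer of interest and frozen.
teacher = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student ("Condensed Feature Extraction Network"): fewer channels,
# trained to reproduce the teacher's activation maps.
student = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for step in range(100):                      # toy loop over random placeholder images
    images = torch.rand(8, 3, 64, 64)
    with torch.no_grad():
        target = teacher(images)             # teacher activations serve as ground truth
    loss = criterion(student(images), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```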
  • Systems and methods for providing CNN-based image synthesis are described in more detail below.
  • Systems for Providing Convolutional Neural Network Based Image Synthesis
  • A system that provides CNN-based image synthesis in accordance with some embodiments of the invention is shown in FIG. 1. Network 100 includes a communications network 160. The communications network 160 is a network such as the Internet that allows devices connected to the network 160 to communicate with other connected devices. Server systems 110, 140, and 170 are connected to the network 160. Each of the server systems 110, 140, and 170 may be a group of one or more server computer systems communicatively connected to one another via internal networks that execute processes that provide cloud services to users over the network 160. For purposes of this discussion, cloud services are one or more applications that are executed by one or more server systems to provide data and/or executable applications to devices over a network. The server systems 110, 140, and 170 are shown each having three servers in the internal network. However, the server systems 110, 140 and 170 may include any number of servers and any additional number of server systems may be connected to the network 160 to provide cloud services including (but not limited to) virtualized server systems. In accordance with various embodiments of this invention, processes for providing CNN-based image synthesis processes and/or systems may be provided by one or more software applications executing on a single server system and/or a group of server systems communicating over network 160.
  • Users may use personal devices 180 and 120 that connect to the network 160 to perform processes for providing CNN-based image synthesis in accordance with various embodiments of the invention. In the illustrated embodiment, the personal devices 180 are shown as desktop computers that are connected via a conventional “wired” connection to the network 160. However, the personal device 180 may be a desktop computer, a laptop computer, a smart television, an entertainment gaming console, or any other device that connects to the network 160 via a “wired” or “wireless” network connection. The mobile device 120 connects to network 160 using a wireless connection. A wireless connection is a connection that may use Radio Frequency (RF) signals, Infrared (IR) signals, or any other form of wireless signaling to connect to the network 160. In FIG. 1, the mobile device 120 is a mobile telephone. However, mobile device 120 may be a mobile phone, a Personal Digital Assistant (PDA), a tablet, a smartphone, or any other type of device that connects to network 160 via wireless connection in accordance with various other embodiments of the invention. In accordance with some embodiments of the invention, the processes for providing CNN-based image synthesis may be performed by the user device. In several other embodiments, an application being executed by the user device may capture or obtain the two or more input images and transmit the captured image(s) to a server system that performs the processes for providing CNN-based image synthesis. In accordance with a number of embodiments where one or more of the images is captured by the user device, the user device may include a camera or some other image capture system that captures the image.
  • The specific computing system(s) used to capture images and/or process images to perform CNN-based image synthesis is largely dependent upon the requirements of a given application and should not be considered as limited to any specific computing system(s) implementation(s). Computing systems and processes for performing CNN-based image synthesis are discussed further below.
  • Example of Processing Systems
  • An example of a processing system in a device that executes instructions to perform processes that provide CNN-based image synthesis in accordance with an embodiment of the invention is shown in FIG. 2. One may recognize that a particular processing system may include other components that are omitted for brevity without departing from various embodiments of the invention. The processing device 200 includes a processor 205, a non-volatile memory 210, and a volatile memory 215. The processor 205 may be a processor, microprocessor, controller or a combination of processors, microprocessors and/or controllers that perform instructions stored in the volatile memory 215 and/or the non-volatile memory 210 to manipulate data stored in the memory. The non-volatile memory 210 can store the processor instructions utilized to configure the processing system 200 to perform processes including processes in accordance with particular embodiments of the invention and/or data for the processes being utilized. In accordance with some embodiments, the processing system software and/or firmware can be stored in any of a variety of non-transient computer readable media appropriate to a specific application. A network interface is a device that allows processing system 200 to transmit and receive data over a network based upon the instructions performed by processor 205. Although an example of processing system 200 is illustrated in FIG. 2, any of a variety of processing systems in the various devices may be configured to provide the methods and systems in accordance with various embodiments of the invention.
  • Convolutional Neural Network Based Image Synthesis
  • CNNs can be powerful tools for synthesizing similar but different versions of an image or transferring the style of one image onto the content of another image. Recently, compelling results have been achieved through parametric modeling of the image statistics using a deep CNN. An example CNN used for image style transfer is described by Leon Gatys in a paper entitled "Image Style Transfer Using Convolutional Neural Networks," which may be obtained at www.cv-foundation.org/openacess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf. CNN-based image synthesis processes are particularly well suited to performing texture synthesis and style transfer.
  • In particular, CNN-based image synthesis systems may perform texture synthesis in the following manner: A CNN image synthesis system receives an input source texture, S, and synthesizes an output texture, O. S and O are passed through a CNN such as VGG that generates feature maps for the activations of the first L convolutional layers of the CNN. For purposes of this discussion, the activations of the first L convolutional layers are denoted as S1 . . . SL and O1 . . . OL. A loss \(\mathcal{L}_{\text{gram}}\) over the layers, which preserves some properties of the input texture by means of a Gram matrix, can be expressed as:
  • \(\mathcal{L}_{\text{gram}} = \sum_{l=1}^{L} \frac{\alpha_l}{|S_l|^2} \left\| G(S_l) - G(O_l) \right\|_F^2\)  (1)
  • where \(\alpha_l\) are user parameters that weight terms in the loss, \(|\cdot|\) is the number of elements in a tensor, \(\|\cdot\|_F\) is the Frobenius norm, and the Gram matrix G(F) is defined over any feature map F as an \(N_l \times N_l\) matrix of inner products between pairs of features:
  • \(G_{ij}(F) = \sum_k F_{ik} F_{jk}\)  (2)
  • where \(F_{ij}\) refers to feature i's pixel j within the feature map. The synthesized output image O is initialized with white noise and is then optimized by applying gradient descent to equation (1). Specifically, the gradient of equation (1) with respect to the output image O is computed via backpropagation.
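  • As a non-authoritative illustration of equations (1) and (2), the NumPy sketch below computes Gram matrices and the layer-weighted Frobenius loss over activation maps that are assumed to have already been extracted; the array shapes and weights are placeholder assumptions for the example.

```python
import numpy as np

def gram(F):
    """Gram matrix of a feature map F with shape (num_features, num_pixels)."""
    return F @ F.T                      # equation (2): G_ij = sum_k F_ik F_jk

def gram_loss(S_feats, O_feats, alphas):
    """Equation (1): weighted Frobenius distance between Gram matrices, per layer."""
    loss = 0.0
    for S_l, O_l, a_l in zip(S_feats, O_feats, alphas):
        loss += a_l / (S_l.size ** 2) * np.sum((gram(S_l) - gram(O_l)) ** 2)
    return loss

# toy example: two layers of fabricated activations for source S and output O
S_feats = [np.random.rand(64, 32 * 32), np.random.rand(128, 16 * 16)]
O_feats = [np.random.rand(64, 32 * 32), np.random.rand(128, 16 * 16)]
print(gram_loss(S_feats, O_feats, alphas=[1.0, 1.0]))
```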
  • CNN-based image synthesis processes that perform style transfer operate in a similar manner to the texture synthesis process described above. However, a CNN-based image synthesis system receives a content image, C, and a style image, S, that are used to generate a styled image, O. All three images are passed through a CNN, such as VGG, that gives activations for the first L convolutional layers, denoted as C1 . . . CL, S1 . . . SL, and O1 . . . OL. The total style transfer loss combines the losses for the style image (\(\mathcal{L}_{\text{gram}}\)) and the content image (\(\mathcal{L}_{\text{content}}\)):

  • \(\mathcal{L}_{\text{transfer}} = \mathcal{L}_{\text{gram}} + \mathcal{L}_{\text{content}}\)  (3)
  • The content loss is a feature distance between content and output that attempts to make output and content look similar:
  • \(\mathcal{L}_{\text{content}} = \sum_{l=1}^{L} \frac{\beta_l}{|C_l|} \left\| C_l - O_l \right\|_F^2\)  (4)
  • where \(\beta_l\) are user weight parameters. The output image O is initialized with white noise and optimized using gradient descent.
  • As such, CNN-based image synthesis processes performing style transfer may use an iterative optimization process to cause the white noise image of the synthesized image to incrementally begin to resemble some user-specified combination of the source content and style images.
  • In accordance with many embodiments of the invention, a CNN backpropagation training procedure may be used as the iterative optimization process to turn the white noise or content image into an image that combines features of the content and style images. During backpropagation training procedures performed in accordance with a number of embodiments, the iterative optimization process can be directed by a loss function (equation 4) that the backpropagation training procedure is trying to minimize. In accordance with several embodiments, the loss function is calculated as the difference between parametric models encoding the style of a style image and the image being synthesized. In addition, in some embodiments of this invention, a content loss can be included as well, where the content loss is some distance metric between raw neural activations calculated for the content image and the image being synthesized. If a style loss is used without a content loss and the image being synthesized starts from noise, then the resulting operation is texture synthesis. If a style loss is used without content loss and the image being synthesized starts from the content image, then the resulting operation is style transfer. If both style and content loss are used then the operation will always be style transfer.
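  • The iterative optimization described above can be sketched as follows in PyTorch; this is illustrative only, with a single randomly initialized convolutional layer standing in for the pre-trained CNN, random placeholder tensors standing in for the source images, and an arbitrary learning rate and iteration count.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pre-trained feature extractor such as VGG (weights are random here).
extractor = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()).eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def gram(feat):
    c = feat.shape[1]
    f = feat.reshape(c, -1)
    return f @ f.t()

content = torch.rand(1, 3, 64, 64)            # placeholder source content image
style = torch.rand(1, 3, 64, 64)               # placeholder source style image

with torch.no_grad():
    C_feat = extractor(content)                # content activations (cf. equation (4))
    G_style = gram(extractor(style))           # style Gram matrix (cf. equation (1))

O = content.clone().requires_grad_(True)       # start from the content image (or white noise)
opt = torch.optim.Adam([O], lr=0.02)

for step in range(200):
    O_feat = extractor(O)
    style_loss = ((gram(O_feat) - G_style) ** 2).sum() / O_feat.numel() ** 2
    content_loss = ((O_feat - C_feat) ** 2).sum() / O_feat.numel()
    loss = style_loss + content_loss           # combined transfer loss (cf. equation (3))
    opt.zero_grad()
    loss.backward()                            # gradient via backpropagation through the CNN
    opt.step()
```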
  • In accordance with various other embodiments of the invention, other image processing applications including, but not limited to, image hybridization, super-resolution upscaling and time-varying weathering, could be achieved using the same CNN framework but using different loss functions.
  • CNN-based image synthesis processes in accordance with certain embodiments of the invention may use loss functions to direct the optimization process in various synthesis processes that may be performed. However, CNN-based image synthesis processes in accordance with particular embodiments of this invention use a collection of stable loss functions for the CNN-based image synthesis to achieve various results. In accordance with some embodiments, CNN-based image synthesis processes use multiple stable loss functions for texture synthesis and style transfer. Thus, stable loss functions for style transfer, including loss functions for style and for content, are addressed separately below.
  • Problems with the Use of Gram Matrices
  • A problem that can be experienced when using Gram matrices as loss functions in style transfer is that the results are often unstable. The cause of the instability is illustrated in FIG. 3. In FIG. 3, an input image 301 has a uniform distribution of intensities with a mean of μ1 = 1/√2 ≈ 0.707 and a standard deviation of σ1 = 0. An output image 302 has a non-uniform distribution with a mean of μ2 = ½ and a standard deviation of σ2 = ½. If interpreted as the activation of a feature map with one feature, these two distributions have equivalent non-central second moments of ½, and therefore equal Gram matrices. The problem is that many different distributions result in an equivalent Gram matrix. In practice, Gram matrices are not matched on image intensities; they match feature activations, i.e. feature maps after applying the activation functions, but the same argument applies: activation maps with quite different means and variances can still have the same Gram matrix.
  • The problem arises because a Gram matrix is statistically related to neither the mean nor covariance matrices. Instead, a Gram matrix is related to a matrix of non-central second moments. To show this, a feature activation map, F, with m features, is used as an example. For brevity, “feature map activations” are simply referred to as “features,” such that a “feature” refers to the result of applying an activation function. The statistics of the features in the feature map F can be summarized by using an m dimensional random variable X to model the probability distribution of a given m-tuple of features. The random vector of features X can be related to the feature map F. For example, the Gram matrix, G(F), may be normalized by the number of samples n to obtain a sample estimator for the second non-central mixed moments E[XXT]. As such, the terms (normalized) “Gram matrix” and E[XXT] may be used interchangeably in the following discussion even though one is actually a sampled estimator of the other.
  • In the following argument, \(\tfrac{1}{n} G(F) = E[XX^T]\), and the mean feature is defined as \(\mu = E[X]\). By a general property of covariance matrices, \(\Sigma(X) = E[XX^T] - \mu\mu^T\), where \(\Sigma\) indicates a covariance matrix. After rearranging, the following equation results:

  • \(E[XX^T] = \Sigma(X) + \mu\mu^T\)  (5)
  • In the case where there is only one feature, m = 1, equation (5) becomes:
  • \(\tfrac{1}{n} G(F) = E[X^2] = \sigma^2 + \mu^2 = \|(\sigma, \mu)\|^2\)  (6)
  • where \(\sigma\) is the standard deviation of the feature X. For a feature map F1 for the input source image and a feature map F2 for the synthesized output image, with respective feature distributions X1, X2, means \(\mu_1, \mu_2\), and standard deviations \(\sigma_1, \sigma_2\), the feature maps will have the same Gram matrix if the following condition from equation (6) holds:

  • \(\|(\sigma_1, \mu_1)\| = \|(\sigma_2, \mu_2)\|\)  (7)
  • As such, an infinite number of 1D feature maps with different variances but equal Gram matrices may be created. This is not optimal for image synthesis. Specifically, this means that even if a Gram matrix is held constant, the variance \(\sigma_2^2\) of the synthesized texture map can freely change (with corresponding changes to the mean \(\mu_2\), based on equation (7)). Conversely, the mean \(\mu_2\) of the synthesized texture map can freely change (with corresponding changes to the variance \(\sigma_2^2\)). This property often leads to the instabilities. For simplicity, it can be assumed that a CNN is flexible enough to generate any distribution of output image features. To generate an output texture with a different variance (e.g. \(\sigma_2 \gg \sigma_1\)) but an equal Gram matrix, equation (6) can be solved for \(\mu_2\) to obtain \(\mu_2 = \sqrt{\sigma_1^2 + \mu_1^2 - \sigma_2^2}\). This is shown in FIG. 3, where the distribution X1 of the image 301 has \(\mu_1 = 1/\sqrt{2}\) and \(\sigma_1 = 0\), and the image 302 has the larger standard deviation \(\sigma_2 = \tfrac{1}{2}\).
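  • The figures above can be verified numerically. The short NumPy check below (illustrative only) builds a constant feature map with μ = 1/√2, σ = 0 and a binary feature map with μ ≈ ½, σ ≈ ½, and shows that their normalized Gram values (equation (6)) nearly coincide at ½ despite very different means and variances.

```python
import numpy as np

n = 100000
F1 = np.full((1, n), 1 / np.sqrt(2))              # constant map: mu = 0.707, sigma = 0
F2 = np.random.choice([0.0, 1.0], size=(1, n))    # binary map: mu ~ 0.5, sigma ~ 0.5

normalized_gram = lambda F: (F @ F.T) / F.shape[1]  # (1/n) G(F), equation (6)
print(normalized_gram(F1)[0, 0], normalized_gram(F2)[0, 0])  # both approximately 0.5
```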
  • In the multidimensional case where m > 1, if there is no correlation between features, the situation reduces to m separate instances of the previous 1D scenario. Thus, while maintaining the same Gram matrix, all of the variances can be changed, as long as a corresponding change to the mean is made. This can lead to instabilities in variance or mean. However, in the multidimensional scenario, there are typically correlations between features. The following example illustrates this point. In this example, an input feature map F1 has an input feature random vector X1, a mean \(\mu\), and covariance matrix \(\Sigma(X_1)\). To generate a set of output feature random vectors X2 with an equal Gram matrix but different variance, an affine transformation is applied to the input random feature vector X1 to obtain a transformed random vector of output feature activations \(X_2 = AX_1 + b\), where A is an m×m matrix and b is an m-vector. The Gram matrices of X1 and X2 are set equal to one another using equation (5) above to obtain:

  • \(E[X_2 X_2^T] = A\Sigma(X_1)A^T + (A\mu + b)(A\mu + b)^T = E[X_1 X_1^T] = \Sigma(X_1) + \mu\mu^T\)  (8)
  • The variances of the output random feature activation vector X2 may be constrained along the main diagonal of its covariance matrix so that the variances are equal to a set of "target" output image feature activation variances. The remaining unknown variables in the transformation matrix A and the vector b may then be determined using closed-form solutions of the resulting quadratic equations. These equations are often long and computationally intensive to solve. Nevertheless, they show that there are more unknowns than equations: equation (8) imposes m(m+3)/2 constraints (m(m+1)/2 constraints from the upper half of the symmetric matrix, plus m constraints for the known output feature variances), while A and b together contain m² + m unknowns, which exceeds m(m+3)/2 whenever m > 1. Thus, it is possible to generate an output distribution X2 with different variances than the input feature distribution X1, but with the same Gram matrix.
  • Covariance and Mean Loss
  • In accordance with sundry embodiments of the invention, due to the Gram matrix stability issues identified above, CNN-based image synthesis processes use a covariance matrix instead of a Gram matrix to guide the synthesis process. Covariance matrices are similar to Gram matrices but do not share the same limitation. By subtracting off the mean activation before computing inner products, covariance matrices explicitly preserve statistical moments of various orders in the parametric model. By this we explicitly refer to the mean of all feature vectors as the first order moment and to the co-activations of feature vectors centered around their mean as second order moments.
  • Replacing Gram matrices with Covariance matrices can stabilize the synthesis process for some textures, however, subtracting the mean can affect the desired features to be reproduced during the synthesis process in undesirable ways. To counteract this effect, we introduce an additional loss term for mean activations.
  • This new parametric model allows the covariance loss and mean loss to drive the synthesis. This makes the combined loss for texture synthesis:

  • \(\mathcal{L}_{\text{texture}} = \mathcal{L}_{\text{covariance}} + \mathcal{L}_{\text{mean}} + \mathcal{L}_{\text{tv}}\)  (12)
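  • A minimal NumPy sketch of this parametric model is shown below for illustration; it assumes a single layer, unit weights, random placeholder activation arrays, and omits the total variation term.

```python
import numpy as np

def covariance(F):
    """Covariance matrix of a feature map F with shape (num_features, num_pixels)."""
    mu = F.mean(axis=1, keepdims=True)
    centered = F - mu                          # subtract the mean activation before inner products
    return centered @ centered.T / F.shape[1]

def covariance_and_mean_loss(S, O):
    """Simplified single-layer form of L_covariance + L_mean from equation (12)."""
    cov_term = np.sum((covariance(S) - covariance(O)) ** 2)
    mean_term = np.sum((S.mean(axis=1) - O.mean(axis=1)) ** 2)
    return cov_term + mean_term

S = np.random.rand(64, 1024)   # placeholder source activations
O = np.random.rand(64, 1024)   # placeholder output activations
print(covariance_and_mean_loss(S, O))
```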
  • The replacement of the Gram matrix with a covariance matrix may improve, but does not decisively solve, the stability issues inherent in texture synthesis and/or style transfer. The covariance matrix can be a powerful method for describing image style in a stable form when the texture being parameterized is highly stochastic and can be represented as a single cluster in feature space. A remaining problem is that many textures and most natural images contain multiple clusters. In other words, these textures or natural images contain a combination of multiple distinct textures. When an input texture exhibits multiple feature space clusters and the cluster centers are far apart from each other, a covariance matrix may exhibit the same unstable behavior as a Gram matrix. The reason for the unstable behavior is that centering on the mean of multiple clusters ensures that every individual cluster will be off-center and, thus, will not exhibit stable mathematical properties.
  • Although CNN-based image synthesis processes that use covariance loss and/or mean loss in accordance with various embodiments of the invention are described above, other processes that use covariance loss and mean loss in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Histogram Losses
  • Due to the multiple cluster problem, processes in accordance with a number of embodiments of the invention may use Histogram loss as described above to achieve consistently high-quality results. In accordance with several embodiments, the multiple cluster problem may be dealt with by using an automatic clustering process on the feature vectors to identify different textures in an image. The clustering process could transform the image so that each cluster is centered around its mean. However, the use of an automatic clustering process may introduce a number of additional problems. For example, if different linear transforms are applied to different regions of the image in a discrete way, seam lines may appear along the borders between different texture clusters. To deal with these seams, processes in accordance with many embodiments of the invention interpolate the transform between clusters. The interpolation may be more difficult than simply adding a histogram loss that has been shown to solve the same problem as discussed above.
  • In accordance with particular embodiments of the invention, the instability of Gram or Covariance matrices is addressed by explicitly preserving statistical moments of various orders in the activations of a texture. In accordance with many embodiments, an entire histogram of feature activations is preserved. More specifically, systems and processes in accordance with a number of embodiments augment synthesis loss with m additional histogram losses, one for each feature in each feature map. Additionally, systems and processes in accordance with several embodiments incorporate a total variation loss that may improve smoothness in the synthesized image.
  • As such, the combined loss for texture synthesis in CNN-based image synthesis processes in accordance with some embodiments of the invention is:

  • \(\mathcal{L}_{\text{texture (ours)}} = \mathcal{L}_{\text{gram}} + \mathcal{L}_{\text{histogram}} + \mathcal{L}_{\text{tv}}\)  (9)
  • where \(\mathcal{L}_{\text{gram}}\) can be interchanged with \(\mathcal{L}_{\text{covariance}}\) arbitrarily. However, it can be slightly subtle to develop a suitable histogram loss. For example, a naive approach of directly placing an L2 loss between histograms of the input source texture image S and output synthesized image O has zero gradient almost everywhere and does not contribute to the optimization process.
  • As such, CNN-based image synthesis processes in accordance with many embodiments of the invention use loss based on histogram matching. To do so, the synthesized layer-wise feature activations are transformed so that their histograms match the corresponding histograms of the input source texture image S. The transformation can be performed once for each histogram loss encountered during backpropagation.
  • To do so, CNN-based image synthesis processes in accordance with a number of embodiments of the invention use an ordinary histogram matching technique to remap the synthesized output activations to match the input source image activations. In such a technique, Oij represents the output activations for a convolutional layer i and feature j, and O′ij represents the remapped activations. The technique may compute a normalized histogram for the output activations Oij and match it to the normalized histogram for the activations of the input source image S to obtain the remapped activations O′ij. This technique is then repeated for each feature in the feature map to determine a Frobenius norm distance between Oij and O′ij. The loss of the histograms may be expressed as:
  • \(\mathcal{L}_{\text{histogram}} = \sum_{l=1}^{L} \gamma_l \left\| O_l - R(O_l) \right\|_F^2\)  (10)
  • where \(O_l\) is the activation map for layer l, \(R(O_l)\) is the histogram-remapped activation map, and \(\gamma_l\) is a user weight parameter that controls the strength of the loss. As \(R(O_l)\) has zero gradient almost everywhere, it can be treated as a constant for the gradient operator. Thus, the gradient of equation (10) can be computed by realizing \(R(O_l)\) into a temporary array \(O'_l\) and computing the Frobenius norm loss between \(O_l\) and \(O'_l\).
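  • The remapping and the loss of equation (10) can be sketched as follows (illustrative only); the sketch assumes the source and output activation maps have equal numbers of pixels per feature and uses a simple rank-based remapping in place of a full normalized-histogram match.

```python
import numpy as np

def histogram_remap(o, s):
    """Remap the values of o so its empirical histogram matches that of s."""
    order = np.argsort(o)
    remapped = np.empty_like(o)
    remapped[order] = np.sort(s)        # assign s's sorted values to o's rank positions
    return remapped

def histogram_loss(O_feats, S_feats, gammas):
    """Equation (10): Frobenius distance between O_l and its remapped version R(O_l)."""
    loss = 0.0
    for O_l, S_l, g_l in zip(O_feats, S_feats, gammas):
        R = np.stack([histogram_remap(o, s) for o, s in zip(O_l, S_l)])  # remap per feature
        loss += g_l * np.sum((O_l - R) ** 2)
    return loss

O_feats = [np.random.rand(64, 1024)]   # placeholder output activations, one layer
S_feats = [np.random.rand(64, 1024)]   # placeholder source activations, one layer
print(histogram_loss(O_feats, S_feats, gammas=[1.0]))
```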
  • Although CNN-based image synthesis processes that use histogram loss in accordance with various embodiments of the invention are described above, other processes that provide histogram loss in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Application of Histogram Loss to Style Transfers
  • Style transfer is a broadening of texture synthesis. In texture synthesis, an input texture is statistically resynthesized. Style transfer is similar with the additional constraint that the synthesized image O should not deviate too much from a content image C. To do so, CNN-based image synthesis processes that perform style transfer in accordance with sundry embodiments of the invention include both a per-pixel content loss and a histogram loss in the parametric synthesis equation such that the overall loss becomes:

  • \(\mathcal{L}_{\text{transfer (ours)}} = \mathcal{L}_{\text{gram}} + \mathcal{L}_{\text{histogram}} + \mathcal{L}_{\text{content}} + \mathcal{L}_{\text{tv}}\)  (11)
  • where \(\mathcal{L}_{\text{gram}}\) is interchangeable with \(\mathcal{L}_{\text{covariance}}\) for the purposes of our algorithm.
  • Although CNN-based image synthesis processes that use histogram loss to perform style transfer in accordance with various embodiments of the invention are described above, other processes that use histogram loss to perform style transfer in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Automatic Tuning of Parameters
  • CNN-based image synthesis processes in accordance with some embodiments automatically determine parameters for the processes performed. The parameters may include, but are not limited to, the coefficients \(\alpha_l\) in the Gram/Covariance loss of equation (1), \(\beta_l\) in the content loss of equation (4), \(\gamma_l\) in the histogram/mean loss of equation (10), and \(\omega\), which is multiplied against the total variation loss.
  • Automatic tuning processes in accordance with many embodiments of the invention are inspired by batch normalization that tunes hyper-parameters during a training process to reduce extreme values of gradients. The parameters may also be dynamically adjusted during the optimization process. In accordance with a number of embodiments, the dynamic tuning can be performed with the aid of gradient information. During backpropagation, different loss terms Li, may be encountered. Each loss term Li has an associated parameter ci that needs to be determined (ci is one of the parameters αl, βl, γl, and ω). A backpropagated gradient gi may first be calculated from the current loss term as if ci were 1. However, if the magnitude of gi exceeds a constant magnitude threshold Ti, then the gradient gi may be normalized so that its length is equal to Ti. Magnitude thresholds of 1 can be used for all parameters except for the coefficient αl of the Gram/Covariance loss, which may have a magnitude threshold of 100 in accordance with several embodiments. As can readily be appreciated, magnitude thresholds and/or other constraints can be specified as appropriate to the requirements of a given application.
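  • The gradient-normalization idea can be sketched as follows (illustrative only); the threshold values follow the description above, but the loss-term names and the random gradient arrays are placeholder assumptions.

```python
import numpy as np

def normalize_gradient(g, threshold):
    """Rescale a loss term's backpropagated gradient so its norm does not exceed the threshold."""
    norm = np.linalg.norm(g)
    if norm > threshold:
        g = g * (threshold / norm)
    return g

# Thresholds per loss term: 1 for most terms, 100 for the Gram/Covariance coefficient.
thresholds = {"gram": 100.0, "content": 1.0, "histogram": 1.0, "tv": 1.0}

# Placeholder gradients, each computed as if its coefficient c_i were 1.
grads = {name: np.random.randn(1000) * 5 for name in thresholds}
total_gradient = sum(normalize_gradient(g, thresholds[name]) for name, g in grads.items())
```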
  • Although CNN-based image synthesis processes that perform automatic tuning in accordance with various embodiments of the invention are described above, other processes that provide automatic tuning in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Multiscale Pyramid Based Synthesis
  • CNN-based image synthesis processes in accordance with certain embodiments include manual and automatic control maps that were previously used in non-parametric approaches. To achieve this, processes in accordance with many embodiments perform a coarse-to-fine synthesis using image pyramids. In accordance with a number of embodiments, a ratio of two is used between successive image widths in the pyramid. A comparison of results of texture synthesis performed in accordance with various embodiments of the invention with and without the use of pyramids is shown in FIG. 4. In FIG. 4, images 401 and 402 are the style images, and images 410-411 and 420-421 are the synthesized results: images 410 and 411 were generated without the use of pyramids, and images 420 and 421 were generated with the use of pyramids. Images 410 and 420 show that pyramids blend coarse scale style features with content features better. Images 411 and 421 may show that pyramids transfer coarse scale features better and reduce CNN noise artifacts. Images 412 and 422 are magnified from images 411 and 421, respectively, and may show noise artifacts (in image 412) and better transfer of coarse-scale features (in image 422).
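  • A coarse-to-fine loop of this kind might be sketched as follows (illustrative only): `synthesize` is a placeholder for the optimization described earlier, the factor-of-two schedule follows the description above, and a simple nearest-neighbor resize keeps the example dependency-free.

```python
import numpy as np

def resize(img, h, w):
    """Nearest-neighbor resize; adequate for a sketch with no external dependencies."""
    rows = (np.arange(h) * img.shape[0] / h).astype(int)
    cols = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[rows][:, cols]

def synthesize(style, init):
    """Placeholder for the optimization loop described earlier."""
    return init

def coarse_to_fine(style, out_size, levels=3):
    h, w = out_size
    # Start at the coarsest level; each successive level doubles the resolution.
    result = np.random.rand(h >> (levels - 1), w >> (levels - 1), 3)
    for level in reversed(range(levels)):
        lh, lw = h >> level, w >> level
        style_l = resize(style, lh, lw)
        result = synthesize(style_l, resize(result, lh, lw))  # upsampled result seeds the next level
    return result

out = coarse_to_fine(np.random.rand(256, 256, 3), (256, 256))
```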
  • Processes for Providing Convolutional Neural Network Based Image Synthesis Using Localized Loss Functions
  • A process for providing CNN-based image synthesis that performs style transfer using localized loss functions in accordance with an embodiment of the invention is shown in FIG. 5. In process 500, a source content image and a source style image are received (505, 510). The source content image includes the structures that are to be included in a synthesized image, and the source style image includes a texture that is to be applied to the synthesized image. The process 500 determines localized content loss functions for groups of pixels in the source content image (515) and/or localized style loss functions for groups of pixels in the source style image (520). In accordance with some embodiments, the localized content loss functions and/or localized style loss functions may be generated using a CNN. The determinations of the localized content and/or style loss functions based upon various groupings of the pixels in accordance with various embodiments of the invention are described below. Process 500 performs an optimization process using the localized content loss functions and/or localized style loss functions to cause the pixels in the synthesized image to form an image with a desired amount of content from the source content image and a desired amount of texture from the source style image (525). In accordance with certain embodiments, the optimization process may be an iterative optimization process that is performed until a desired result is achieved. In accordance with many embodiments, the iterative optimization process may be performed by backpropagation through a CNN. Iterative optimization processes in accordance with various embodiments of the invention are described in more detail below.
  • Although processes for providing convolutional neural network based image synthesis by performing style transfer using localized loss functions in accordance with various embodiments of the invention are discussed above, other processes may be modified by adding, removing, and/or combining steps of the described processes as necessitated by system and/or process requirements in accordance with various embodiments of the inventions.
  • Localizing Style Losses
  • Style loss functions reproduce the textural component of the style image. Thus, a global style loss function may be transformed into a stationary representation (i.e. the representation is an aggregation of the local patches of texture, independent of the location of each local patch in the image). A global style loss approach may generate the global style loss function by applying the source style image to a CNN, gathering all activations for a layer in the CNN, and building a parametric model from the gathered activations of that layer. An optimization process may then be used to cause one image to appear statistically similar to another image by minimizing the error distance between the parametric models of the two images (which act as a statistical fingerprint that is being matched).
  • A style transfer approach using a global style loss function may lead to unimpressive results as shown by the images illustrated in FIG. 6. In FIG. 6, Brad Pitt's image 601 is matched to an image of Picasso's self-portrait 602. In the resulting image 603, the overall style of the painting in the image 602 is transferred including the individual brush strokes, the general color palette and the overall look. This makes the image 603 look similar to the image 602. However, the individual features that compose the face are not transferred. Thus, Brad Pitt's eyes, nose and mouth do not look like Picasso's corresponding features.
  • In accordance with sundry embodiments of the invention, a collection of strategies designed to transform the style of an image locally rather than globally may be performed. The various strategies used in the various embodiments share a core idea: using a collection of parametric models representing local loss functions in either one or both of the source content image and the source style image, as opposed to using a single parametric model for each image. Each of the parametric models of an image summarizes specific features in the image and is distinct from the other parametric models in the collection for the image. The application of local loss functions may vary greatly between the various embodiments, depending on the desired degree of locality. Thus, each of the models may represent very large regions of the image in accordance with some embodiments when it is desired to have very little locality. Alternatively, the models may each represent smaller groups, down to an individual pixel, in accordance with particular embodiments of the invention where a very high degree of locality is desired.
  • Region-Based Style Transfers
  • Region-based style transfer may be used in accordance with some embodiments of the invention. A process for generating a region-based loss function in accordance with an embodiment of the invention is shown in FIG. 7. A process 700 may generate a mask with one or more regions for both of the source content and source style images (710). In accordance with various embodiments, the regions may be determined by a user and received as a manual input of the user into the system. In accordance with many embodiments, processes may generate the regions of the mask through a neighbor matching process and/or other similar process for structure identification. The process 700 applies the masks to each image and determines a region of the mask associated with each pixel in each of the images (715). The process 700 assigns each pixel to the region determined to be associated with the pixel (720). The process 700 then generates parametric models for each of the identified regions of the masks from the pixels associated with the regions (725) and may add the generated parametric model for each region to an array of matrices stored in memory.
  • In accordance with many embodiments, the mask value of each pixel may be used to index the pixel into the proper parametric model in the array for use in the style transfer process described above. Images that illustrate a region-based style transfer process performed on the images of Brad Pitt and Picasso's self-portrait 601 and 602 in accordance with an embodiment of the invention are shown in FIG. 8. In region-based style transfer, distinct features of an image can be clustered together. As shown by image 802, Picasso's portrait from image 602 is segmented into a few distinct regions. For example, the eyes may be one region, with the lips, nose, hair, skin, shirt, and background each being in their own regions. A mask 802 may be applied over image 602 to identify the region that contains each pixel in the image 602. Likewise, a mask shown in image 801 may be applied to the image 601 of Brad Pitt to identify the pixels that belong to each of the identified regions of the image 601.
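  • For illustration, a NumPy sketch of building one parametric model per mask region is shown below; a Gram matrix is used as the model here (a covariance matrix could be substituted), and the activation shapes and the random integer mask are assumptions made for the example.

```python
import numpy as np

def region_models(features, mask):
    """Build one normalized Gram matrix per mask region.

    features: (num_features, H, W) activation map
    mask:     (H, W) integer region labels
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    labels = mask.reshape(-1)
    models = {}
    for region in np.unique(labels):
        F = flat[:, labels == region]           # pixels belonging to this region
        models[region] = F @ F.T / F.shape[1]   # region's parametric model
    return models

features = np.random.rand(64, 32, 32)
mask = (np.random.rand(32, 32) * 4).astype(int)  # four hypothetical regions (e.g. eyes, skin, hair, background)
models = region_models(features, mask)           # each pixel's mask value indexes its model
```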
  • Although processes that provide region-based loss functions in accordance with various embodiments of the invention are described above, other processes that provide region-based loss functions in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Uniform Segments Style Transfer
  • In accordance with a number of embodiments of the invention, a uniform segment style transfer process may be performed. In uniform segment style transfer, the images are divided into uniform segments. The images of Brad Pitt and Picasso's self-portrait divided into uniform segments in accordance with an embodiment of the invention that uses uniform segments are shown in FIG. 9. A process for performing uniform segment style transfer in accordance with an embodiment of the invention is shown in FIG. 10. As shown in images 901 and 902 of FIG. 9, a process 1000 of FIG. 10 divides each image into a grid of regions or cells (1005). In particular, images 901 and 902 are divided into grids of 8×8 cells in the illustrated embodiment. However, grids with any number of cells, for example 16×16 or 20×20 grids, may be used in accordance with various other embodiments of the invention. Each cell is associated with an individual parametric model of a localized loss function (1010). In many embodiments, the generated parametric models can be added (1015) to an array of models for each image. In accordance with some embodiments, an individual parametric model may be associated (1020) with a group of cells, where the group is determined by the similarity of the cells or in some other manner.
  • After the parametric models are generated, the parametric models may be used as descriptors for nearest neighbor matching of the pixels in the associated cell(s). The nearest neighbor matching binds cells together so that each cell in the content image is optimized to more closely resemble the cell in the style image that is determined to most closely approximate it, as shown in FIG. 11. In FIG. 11, a cell 1101 in the image 901 is optimized toward a cell 1102 in the image 902. In accordance with some embodiments, the one or more cells in the style image that most closely approximate a cell in the content image may be identified by determining the cell(s) in the style image that have the minimum L2 distance between their parametric models and the parametric model of the cell in the content image. In accordance with certain embodiments, the optimizing processes for all of the cells in the content image are performed in parallel.
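  • The following NumPy sketch illustrates one possible formulation of the per-cell models and the L2-based cell matching described above; the function names, the assumption that the grid divides the feature map evenly, and the flattened Gram representation are all illustrative assumptions.

    import numpy as np

    def cell_gram_models(features, grid=8):
        # Split a (C, H, W) feature map into grid x grid cells and return one
        # normalized Gram matrix per cell, flattened for distance computations.
        C, H, W = features.shape
        hs, ws = H // grid, W // grid               # assumes H and W are divisible by grid
        models = np.zeros((grid * grid, C * C))
        for i in range(grid):
            for j in range(grid):
                cell = features[:, i * hs:(i + 1) * hs, j * ws:(j + 1) * ws].reshape(C, -1)
                models[i * grid + j] = (cell @ cell.T / cell.shape[1]).ravel()
        return models

    def match_cells(content_models, style_models):
        # For each content cell, find the style cell whose parametric model has
        # the minimum L2 distance; returns an array of style-cell indices.
        matches = np.empty(len(content_models), dtype=int)
        for i, cm in enumerate(content_models):
            d = np.sum((style_models - cm) ** 2, axis=1)
            matches[i] = int(np.argmin(d))
        return matches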
  • Although processes that provide uniform segment style transfer in accordance with various embodiments of the invention are described above, other processes that provide segment-based style transfer in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Per Pixel Style Transfer
  • In accordance with particular embodiments of the invention, each pixel may be treated as its own local region. Images that show the association between pixels in a source content image and a source style image in accordance with a certain embodiment of this invention are shown in FIG. 12. In accordance with many embodiments, a parametric model of a localized style loss function is generated for each pixel in the content image cell 1201 and the style image cell 1202. A process for generating localized style loss functions in accordance with an embodiment of the invention is shown in FIG. 13. The process 1300 includes gathering a neighborhood of pixels surrounding each particular pixel in the source content image (1305). A group of pixels in the source style image that are associated with the neighbor pixels of each particular pixel is determined (1310). In accordance with various embodiments, a pre-computed nearest neighbor set may be used to associate each pixel in the content image with a pixel in the source style image. The group of pixels in the source style image associated with the neighborhood of each particular pixel is used to generate the parametric model of the localized style loss function that the particular pixel is optimized toward (1315).
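  • A sketch of the localized model for a single content pixel is shown below, assuming NumPy, a pre-computed nearest-neighbor map from content pixels to style pixels, and a small square neighborhood; the function name, the neighborhood radius and the Gram-matrix summary are illustrative assumptions. Building one such model per content pixel is the most precise and most expensive variant, as discussed below.

    import numpy as np

    def local_style_model(style_feats, nn_map, y, x, radius=2):
        # style_feats: (C, Hs, Ws) style feature map.
        # nn_map:      (Hc, Wc, 2) pre-computed nearest-neighbor (row, col) coordinates
        #              mapping each content pixel to a pixel in the style feature map.
        # Gathers the style pixels matched to the neighborhood around content pixel
        # (y, x) and summarizes them as a C x C Gram matrix.
        Hc, Wc = nn_map.shape[:2]
        vecs = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                ny = min(max(y + dy, 0), Hc - 1)
                nx = min(max(x + dx, 0), Wc - 1)
                sy, sx = nn_map[ny, nx]
                vecs.append(style_feats[:, sy, sx])
        V = np.stack(vecs, axis=1)                   # (C, neighborhood size)
        return V @ V.T / V.shape[1]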
  • Although processes that provide per pixel-based style transfer in accordance with various embodiments of the invention are described above, other processes that provide per pixel-based style transfer in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • From the above descriptions of the various transfer strategies, one may see that region-based style transfer is simple and fast compared to the other transfer strategies. However, region-based style transfer can be imprecise, whether a human or a CNN is used to determine how the parametric models are generated. The cell-based transfer differs from the region-based transfer in that many more parametric models are generated and the matrices themselves are used to determine the correspondence of features. The per pixel approach is typically the most precise and the slowest transfer strategy, because the number of computations is increased by the generation of a parametric model for each particular pixel from patches of pixels around the particular pixel. The increase in computation stems from the fact that, in the other approaches, each of the pixels in an image contributes to one parametric model, whereas in the per pixel approach each pixel contributes to the parametric model of each of its neighboring pixels. Furthermore, it is noted that various other embodiments may determine local regions in other manners to refine the trade-off between speed and/or memory and accuracy.
  • Localizing for Content Loss
  • In accordance with particular embodiments, the localization described above can be applied to the content loss used in the transfer process rather than to the style loss. The difference is that style loss attempts to be stationary and content loss does not. For purposes of this discussion, stationary means that the location of a pixel is not a factor influencing what the pixel should look like; only the overall statistics of the image matter. The content loss function can be simple in accordance with some embodiments in that the L2 (or Euclidean) distance is summed between each pixel in the synthesized image and the pixel at the same location in the content image.
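  • A minimal sketch of this simple content loss, assuming NumPy arrays of identical shape for the synthesized and content images (the function name is illustrative), is shown below.

    import numpy as np

    def content_loss(synthesized, content):
        # Non-stationary content loss: squared differences between each pixel of the
        # synthesized image and the pixel at the same location in the content image.
        diff = synthesized.astype(np.float64) - content.astype(np.float64)
        return np.sum(diff ** 2)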
  • The goal of content loss, on the other hand, is to reproduce the "structure" of the content image (image 1401 of FIG. 14, showing the Golden Gate Bridge) while allowing the nonstructural aspects of the image to mutate towards resembling the style image (image 1402 of FIG. 14, showing Starry Night). In known transfer processes, it may be assumed that the deeper layers of the network represent the higher order image features, so that by maintaining a low error between content and synthesis deeper in the network, while allowing the style to be optimized at the shallow layers, a balance is reached between style and content. However, these processes do not regularly provide compelling results.
  • A problem with using a global content loss in a style transfer process may be that not all regions of the content image are equally important in terms of key shapes and structures in the image. For instance, in image 1501 of FIG. 15 (the Golden Gate Bridge), the low-importance content features, including the low frequency sky and ocean, are given a high enough content loss to overpower the style contribution and prevent large, swirly clouds and stars from forming. At the same time, the high-importance content features, including the bridge, are heavily distorted by the style image, which causes them to lose fine-scale qualities such as the cables stretching from tower to tower. Furthermore, the tower in the background is more distorted than the larger tower in the foreground simply because the tower in the background is smaller in terms of image size. However, the tower in the background is no less important than the tower in the foreground in terms of content, as it is a key structure in the image.
  • Style transfer processes that use localized content loss functions in accordance with some embodiments of the invention may provide a weight for each pixel based on the amount that the pixel contributes to a key shape or structure in the image. However, "content" can be a poorly defined concept with respect to art, as "content" is subjective and open to personal interpretation. As such, the process for localizing content loss in accordance with some embodiments of the invention is based on the following observations about "content." For the purposes of determining the contribution of a pixel to content, one may observe that flat, low frequency regions of an image generally do not contribute to the content of the image (for purposes of human perception), while high frequency regions generally are important contributors to the content. Therefore, style transfer processes in accordance with many embodiments of the invention may use a Laplacian Pyramid of a black and white version of the content image (images 1601-1604 in FIG. 16) to determine content loss weights for each pixel in the image being synthesized, where high frequency pixels (whiter pixels) have a higher influence on content than low frequency pixels (darker pixels).
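  • One possible way to derive such per-pixel weights from a Laplacian-style decomposition of a black and white content image is sketched below, assuming NumPy and SciPy; the blur sigma, the number of levels and the normalization are illustrative assumptions. The resulting weight map may multiply the per-pixel content loss so that high frequency pixels dominate.

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def content_weights(gray, levels=4):
        # gray: (H, W) black and white content image.
        # Accumulates the absolute high-frequency bands of a Laplacian-style pyramid,
        # upsampled back to full resolution, so flat regions receive small weights.
        weights = np.zeros_like(gray, dtype=np.float64)
        current = gray.astype(np.float64)
        for _ in range(levels):
            blurred = gaussian_filter(current, sigma=2.0)
            band = np.abs(current - blurred)                         # high-frequency band
            scale = (gray.shape[0] / band.shape[0], gray.shape[1] / band.shape[1])
            weights += zoom(band, scale, order=1)                    # back to full resolution
            current = blurred[::2, ::2]                              # next (coarser) level
        return weights / (weights.max() + 1e-8)                      # normalize to [0, 1]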
  • Alternatively, convolutional neural networks (CNNs) trained on image classification tend to learn kernels at the deeper levels of the network that recognize shapes which are structurally meaningful to humans. Therefore, the magnitude of feature vectors produced from the content image deep in the network can also be used as a scaling factor for the content loss itself.
  • The difference between the use of global style loss and global content loss is shown by the images in FIG. 17. Image 1701 shows an image generated using only global style loss and image 1702 shows an image generated using global content loss as well; both start from noise and use the respective global loss functions to generate the final image. Image 1701 illustrates global style loss with no content loss producing a "texturized" version of the style image (Starry Night). Image 1702, on the other hand, introduces global content loss, and the texturized version of Starry Night is reshaped into the form of the Golden Gate Bridge but with the flaws identified above.
  • The difference between the use of a global content loss function and the use of localized content loss functions determined using a Laplacian Pyramid in accordance with a certain embodiment of the invention is shown in FIG. 18. Image 1801 is the same as image 1702 and introduces global content loss to the texturized version of the image, while image 1802 instead introduces local content loss based upon a Laplacian Pyramid. In image 1802, the key features (i.e. the bridge and the land) emerge while the rest of the image reproduces the texture of Starry Night more accurately.
  • Although previous processes may start from noise, noise does not have to be the starting point in some embodiments of this invention. The logic of starting from noise may be that noise often produces a slightly different version of the transfer each time.
  • In accordance with a number of embodiments of the invention, CNN backpropagation may be used to provide a style transfer process using global and/or local content loss. With CNN backpropagation, the image can be thought of as a point in a super-high dimensional space (one dimension for each color channel of each pixel of the image). The optimization process is a gradient descent optimization that pulls the image at that point through this high dimensional space toward a new point lying within a small region of the space that is considered "good output." The force that pulls the image is the combined loss function for style and content, and the optimization moves toward a local minimum of that function that depends on where in this space the starting noise lies. Alternatively, the optimization process may be started from the content image instead of noise in accordance with a number of embodiments. Using the standard approach, starting from the content image may not offer an advantage because the content loss begins from an optimal position and plays a more aggressive "tug of war" against the style loss, resulting in an image that has more content and less style. In summary, when starting from noise, both loss functions (content and style) are initially very bad and move together toward a mutually optimal choice for most of the process, whereas, when starting from the content image, the two loss functions fight against each other during the entire process, which may return a less than pleasing result.
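  • A minimal sketch of such a backpropagation-driven optimization is shown below, assuming PyTorch and treating the style and content losses as opaque callables that internally run the image through the CNN; the function names, weights and the choice of the L-BFGS optimizer are assumptions made for this illustration, not a description of any specific embodiment.

    import torch

    def synthesize(init_image, style_loss_fn, content_loss_fn,
                   style_weight=1.0, content_weight=1.0, steps=500):
        # The image itself is the variable being optimized; backpropagation pulls it
        # toward a local minimum of the combined style + content loss.
        # init_image may be noise or the content image, as discussed above.
        img = init_image.clone().requires_grad_(True)
        opt = torch.optim.LBFGS([img], max_iter=steps)

        def closure():
            opt.zero_grad()
            loss = (style_weight * style_loss_fn(img)
                    + content_weight * content_loss_fn(img))
            loss.backward()
            return loss

        opt.step(closure)
        return img.detach()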
  • The use of content loss in style transfer processes in accordance with particular embodiments of the invention has certain advantages that known style transfer processes have previously overlooked. If the content loss is removed from the function, leaving only style loss, a texturized version of the style image is generated (as shown in image 1901 of FIG. 19), with the subtle difference that the content image (i.e. the starting point in high dimensional space) exists in a local minimum within the space of all possible texturized versions of the style image. As such, starting at the content image and removing the content loss causes the process to start at the point in space that is the content image and move away from that point toward a point that is pure style; however, the process arrives at a local minimum that represents both style and content in a much better localized ratio than is typically produced by known processes, as shown in image 1901 of FIG. 19.
  • The image 1901 was generated by starting at the content image of the Golden Gate Bridge and then optimizing using only style loss, so that the image mutated to better resemble "Starry Night" until the process reached a local minimum. This produces better results than previously known style transfer processes. However, the results may be improved by re-introducing localized content loss instead of global content loss, resulting in image 1902. This addresses the problem with removing content loss completely: the optimization should reach a local minimum that does not allow key structures (e.g. the cables on the bridge and the tower in the background) to be mutated so much by the style loss that they lose their distinguishing characteristics. By re-introducing local content loss in accordance with some embodiments of the invention, the mutation of structurally important aspects of the content too far in the style direction may be reduced, leading to an optimization process that reaches a more desirable local minimum.
  • Localized style and content loss are also applicable within a feedforward texture synthesis and style transfer algorithm and are not limited to an optimization framework using backpropagation.
  • Although processes that provide style transfer using global and/or local content loss in accordance with various embodiments of the invention are described above, other processes that provide style transfer using global and/or local content loss in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Localized Style Loss for Artistic Controls
  • In accordance with various embodiments of the invention, CNN-based image synthesis processes separate multiple textures in an image into multiple models to determine localized loss. To do so, processes in accordance with many embodiments receive an index mask for the source texture or style image and an index mask for the synthesized image. In accordance with a number of embodiments, each mask is input by a user. Each mask may include M indices. This may sometimes be referred to as a "painting by numbers" process.
  • A process for determining localized loss using masks in accordance with an embodiment of the invention is shown in FIG. 20. A process 2000 applies the mask for the source image to the source image to determine the pixels that belong to each of the M indices, and applies the mask for the synthesized image to the synthesized image to determine the pixels that belong to each of the M indices of the synthesized mask (2005). A parametric model is generated for each of the M indices of the source style mask from the pixels that belong to that index (2010). The indices of the synthesized output may be tracked through an image pyramid for coarse-to-fine synthesis (2015). During synthesis, the previous losses are modified to be spatially varying (2020). In accordance with many embodiments, spatially varying Gram/Covariance matrix and histogram losses may be imposed, where the style Gram/Covariance matrices and histograms vary spatially based on the output index for the current pixel. Histogram matching is then performed (2025). In accordance with several embodiments, the histogram matching may be performed separately in each of the M regions defined by the indexed masks. Blending of adjacent regions may then be performed (2030). In accordance with a number of embodiments, the blending of adjacent regions can be automatically performed during backpropagation.
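  • A sketch of the per-region histogram matching step (2025) is shown below, assuming NumPy, single-channel activation maps and a simple percentile-mapping formulation of histogram matching; the function names and the exact matching scheme are illustrative assumptions rather than the specific implementation of any embodiment.

    import numpy as np

    def match_histogram(source, target):
        # Remap the 1-D array 'source' so that its value distribution follows 'target'.
        s_sorted = np.sort(source)
        ranks = np.searchsorted(s_sorted, source, side='left') / max(len(source) - 1, 1)
        t_sorted = np.sort(target)
        idx = (ranks * (len(t_sorted) - 1)).astype(int)
        return t_sorted[idx]

    def region_histogram_match(synth_acts, synth_mask, style_acts, style_mask, num_indices):
        # Histogram matching performed separately inside each of the M indexed regions:
        # output activations with index m are remapped toward the distribution of
        # style activations carrying the same index.
        out = synth_acts.copy()
        for m in range(num_indices):
            s_sel = synth_mask == m
            t_sel = style_mask == m
            if s_sel.any() and t_sel.any():
                out[s_sel] = match_histogram(synth_acts[s_sel], style_acts[t_sel])
        return out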
  • In accordance with certain embodiments of CNN-based image synthesis, it is important to note that the style and content source images often contain sets of textures that are semantically similar and should be transferred to each other. An example of this is shown in FIG. 21. FIG. 21 also shows images synthesized in accordance with particular embodiments of the invention and images generated using other processes. In FIG. 21, images 2101 show an example of controllable parametric neural texture synthesis. Original images are on the left, synthesis results on the right; corresponding masks are above each image. Rows of images 2105, 2110 and 2115 are examples of portrait style transfer using painting by numbers. Rows of images 2110 and 2115 show style transfer results for an embodiment of the invention on the far right as compared to images generated by another process in the middle. The images show that processes in accordance with some embodiments of the invention may preserve fine-scale artistic texture better. However, processes in accordance with certain embodiments of the invention may also transfer a bit more of the person's "identity," primarily due to hair and eye color changes.
  • Implementation Details of CNN Based Image Synthesis Systems that Use Histograms
  • In accordance with various embodiments of the invention, the CNN used may be a VGG-19 network pre-trained on the ImageNet dataset. In accordance with many embodiments, the rectified linear unit (relu) layers relu 1_1, relu 2_1, relu 3_1 and relu 4_1 may be used for Gram losses. The histogram losses may be computed only at layers relu 1_1 and relu 4_1 in a number of embodiments. Content loss is computed only at relu 4_1 in accordance with several embodiments. Furthermore, total variation smoothing may be performed only on the first convolutional layer to smooth out noise that results from the optimization process.
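  • For reference, the layer configuration described above may be summarized as follows; the layer names follow the common VGG-19 naming convention and the variable names are merely illustrative.

    # Layers used for each loss term (illustrative constants, not a normative API):
    GRAM_LOSS_LAYERS = ["relu1_1", "relu2_1", "relu3_1", "relu4_1"]
    HISTOGRAM_LOSS_LAYERS = ["relu1_1", "relu4_1"]
    CONTENT_LOSS_LAYERS = ["relu4_1"]
    TOTAL_VARIATION_LAYERS = ["conv1_1"]   # smoothing applied only at the first convolutional layer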
  • In accordance with particular embodiments, the images are synthesized in a multi-resolution process using an image pyramid. During synthesis, the process begins at the bottom of the pyramid that can be initialized to white noise, and after each level is finished synthesizing, a bi-linear interpolation is used to upsample to the next level of the pyramid.
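  • A minimal sketch of this multi-resolution loop is shown below, assuming PyTorch tensors and a placeholder synthesize_level callable that runs the per-level optimization (for example, the backpropagation loop sketched earlier); the function names and the halving schedule are assumptions made for this illustration.

    import torch
    import torch.nn.functional as F

    def multiscale_synthesis(target_size, levels, synthesize_level):
        # Coarse-to-fine synthesis: start from white noise at the smallest pyramid level,
        # synthesize, then bilinearly upsample the result to seed the next level.
        h, w = target_size
        sizes = [(max(h >> k, 1), max(w >> k, 1)) for k in reversed(range(levels))]
        img = torch.rand(1, 3, *sizes[0])                    # white-noise initialization
        for level, size in enumerate(sizes):
            if img.shape[-2:] != size:
                img = F.interpolate(img, size=size, mode='bilinear', align_corners=False)
            img = synthesize_level(img, level)
        return img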
  • Although CNN-based image synthesis systems in accordance with various embodiments of the invention are described above, other configurations of the CNN-based systems that add, modify and/or remove portions of the CNN in accordance with various embodiments of the invention are possible.
  • Controlling Age Appearance within Parametric Models
  • The apparent realism and/or quality of a synthesized image can be improved by applying synthetic weathering. Textures that display the characteristics of some weathering processes may incorporate a collection of multiple textures consolidated into one weathered texture. As such, CNN-based image synthesis processes in accordance with some embodiments of the invention may provide a new approach for controlling the synthesis of these complex textures without having to separate the textures into different parametric models. This may be achieved by directly controlling the synthesis process by strategically waxing and waning specific parameters in the model to create new outputs that express different ratios of desired features to control the appearance of age for certain textures.
  • Identifying Age Features within Exemplar Images
  • A problem that is separate from, but entangled with, controlling age appearance during synthesis is first identifying which features in the input exemplar image display characteristics of age and to what degree. In accordance with various embodiments, user-created masks that delineate feature age may be received and used to identify the features. Processes in accordance with many embodiments may use an automatic clustering approach to segregate different textures. Still other processes in accordance with a number of embodiments may use a patch-based method that uses the average feature distance between a patch and its K nearest neighbors as a metric for "rarity" that may be interpreted as age. This method is based on the assumption that features created by the weathering process are highly random and have a low chance of finding a perfect match produced by the same process. However, there may be a limitation in the patch-based approach when a texture has rare features that are not a product of age, such as knots in wood. In accordance with several embodiments, a CNN may be trained to learn and identify weathered features for a multitude of weathering types.
  • Processes for Controlling Aging
  • Once weathered regions in an image have been identified, CNN-based image synthesis processes in accordance with particular embodiments of the invention can extract a parametric model for each region. The desired age can be produced as a linear combination of the separate models. In the simplest case, weathering may just be an interpolation between a Young and Old parametric model as follows:

  • P_age = Y*(1 − age) + O*age  (13)
  • This naive approach cannot generate fully young or fully old textures due to the large receptive fields common in CNN architectures mixing the two parametric models. To circumvent this problem, processes in accordance with many embodiments may introduce a “transition” parametric model built from the bordering pixels between young and old regions. To do so, processes in accordance with a number of embodiments of the invention may dynamically generate masks for each layer of the network corresponding to the receptive field. Examples of a mask are shown in FIG. 22 where black is used to delineate the young model, white for the old model and grey for the transition model. With specific regard to FIG. 22, (a) indicates an input texture image, (b) indicates a mask delineating young and old textures, (c)-(f) indicate different receptive fields measured in terms of layers of rectified linear units for the Gram losses in texture synthesis and style transfer. The aging process in accordance with some embodiments then may become a two-step process where, first, Young to Transition is synthesized and then Transition to Old is synthesized. This strategy works for textures that completely change from one material to a completely different material as the textures age. However, weathering often damages or deforms a young material rather than transforming it into a completely different material (e.g. scratching, cracking, peeling). Therefore, it is typical that a young model should not contain old features, but the old model should contain young features. In this scenario, the old and transition regions may be combined into a single combined parametric model.
  • Aging processes in accordance with many embodiments of the invention may use a simple strategy for determining whether the transition and old models should be combined or not. The strategy is based upon the observation that, when generating the transition masks as shown in FIG. 22, the transition region becomes larger for deeper layers of the network. Therefore, if at some layer in the network the transition region completely replaces either a young or an old region, the processes assign that region to the transition model at all layers of the network. Thus, the transition region can effectively "annex" other features if the features are too small to justify having their own parametric model.
  • Given the introduction of a third "transition" model, equation (13) above no longer suffices. In addition to the three-model system, processes in accordance with a number of embodiments may extend the algorithm to account for an arbitrary number of parametric models for complex aging scenarios. When more than two age models are present, there are two equally appropriate synthesis strategies. In both strategies, each parametric model can have an age assigned to it between 0 and 1.
  • In the first strategy used by processes in accordance with a number of embodiments of the invention, a list of N parametric models is sorted by age value from smallest to largest, giving N−1 pairs of models to linearly interpolate between. These interpolations are sequentially chained such that the youngest model is the Y model and the next youngest is the O model. Once the older texture is fully synthesized, the Y model is set to the O model and the O model is replaced with the next youngest model. The process may then iterate until all of the parametric models have been processed.
  • In the second strategy used by a few embodiments, all N parametric models may be combined in parallel. This results in a single parametric model that is a combination of an arbitrary number of models in any proportion.
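  • A sketch of how age-indexed parametric models might be combined under the two strategies above is given below, assuming NumPy arrays of identical shape for the models; the function names, the reduction of the chained strategy to a single static age, and the assumption that the parallel weights sum to 1 are simplifications made for this illustration.

    import numpy as np

    def interpolate_models(young, old, age):
        # Equation (13): linear interpolation between a Young and an Old parametric
        # model (e.g., Gram matrices of the same shape) for an age in [0, 1].
        return young * (1.0 - age) + old * age

    def chained_age_model(models_with_ages, age):
        # First strategy (simplified to a single requested age): sort the N models by
        # age and interpolate within the pair of neighboring models that brackets it.
        models_with_ages = sorted(models_with_ages, key=lambda p: p[0])
        if age <= models_with_ages[0][0]:
            return models_with_ages[0][1]
        for (a0, m0), (a1, m1) in zip(models_with_ages, models_with_ages[1:]):
            if a0 <= age <= a1:
                t = (age - a0) / (a1 - a0) if a1 > a0 else 0.0
                return interpolate_models(m0, m1, t)
        return models_with_ages[-1][1]

    def blended_age_model(models, weights):
        # Second strategy: combine an arbitrary number of models in parallel,
        # in any proportion (weights assumed to sum to 1).
        return sum(w * m for w, m in zip(weights, models))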
  • Although CNN-based image synthesis processes that perform aging in accordance with various embodiments of the invention are described above, other processes that perform aging in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Continuous Weathering
  • Once a new texture is synthesized at a starting age, it can be a simple process to continue the optimization while iteratively replacing the combined parametric model to reflect a new age. CNN-based image synthesis processes in accordance with particular embodiments of the invention may use this approach to synthesize smoothly varying animations from one age to another. Since a new parametric model for texture or "style" is introduced, and the optimization process starts from a prior model that represents the desired content, this process can be considered a special type of style transfer, which is a useful way to frame the problem.
  • One complication to note is that changing the parametric model can cause a subtle mutation of features that should otherwise be static. While the ultimate goal is to replace young features with old ones, it is also possible for the young features to mutate into new but equally young-looking features due to the parametric model being in flux. In addition, old features may continue to optimize once they are synthesized, breaking the illusion of an aging process. In order to stabilize non-transitional features, processes in accordance with many embodiments introduce a new component, a multi-target localized content loss.
  • In the most basic case, there are only young and old regions in a texture. In order to avoid the continuous mutation effect, these processes may introduce a new content loss strategy that applies different targets to local regions. These processes may begin by first synthesizing a full image for each of the two parametric models, to be used as multiple content images. For each local point in the synthesis image, the processes may dynamically choose which content loss to apply based on a "parametric heat map." To generate the parametric heat map, the mean of a parametric model is subtracted from the feature vector of each pixel, and the co-activations of the centered feature vector are formed into a covariance matrix for that individual pixel. In accordance with a number of embodiments, this may be performed on the rectified linear unit activations used for the Gram losses in texture synthesis and style transfer at layer 4 (relu_4) of the VGG-19 network. Next, the L2 distance between this covariance matrix and the covariance matrix component of the young and old parametric models is found for each pixel. The parametric model that has the lowest error can be used to compute the content loss for the pixel using the corresponding content image. Alternatively, processes in accordance with a few embodiments implement this approach by generating a new single content image by choosing pixels from the different content images using the lowest error mask. As can readily be appreciated, the specific approach that is pursued is typically dependent upon the requirements of a given application.
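  • A sketch of the parametric heat map selection between two content targets is shown below, assuming NumPy and activations taken from a deep layer such as relu_4; the per-pixel outer-product formulation, the choice of which mean to subtract, and the function name are illustrative assumptions. The resulting boolean map may then be used to select, per pixel, which of the two pre-synthesized content images supplies the content-loss target.

    import numpy as np

    def parametric_heat_map(features, young_cov, old_cov, mean_vec):
        # features: (C, H, W) deep activations of the synthesis image.
        # mean_vec: (C,) mean used to center each pixel's feature vector.
        # Returns a boolean (H, W) map: True where the young model matches better.
        C, H, W = features.shape
        centered = features.reshape(C, -1) - mean_vec[:, None]
        heat = np.zeros(H * W, dtype=bool)
        for i in range(H * W):
            pixel_cov = np.outer(centered[:, i], centered[:, i])   # per-pixel co-activations
            d_young = np.sum((pixel_cov - young_cov) ** 2)
            d_old = np.sum((pixel_cov - old_cov) ** 2)
            heat[i] = d_young <= d_old
        return heat.reshape(H, W)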
  • Controlling Weathering Through “Painting by Numbers”
  • Continuously updating a single parametric model can lead to natural weathering patterns. However, it is difficult to manually control this process. As an alternative weathering approach, processes in accordance with various embodiments extend the painting by numbers strategy presented above, in which masks are directly drawn or procedurally generated to direct young, old and transition textures to different regions of the synthesis image over time. The ability to procedurally control a weathering process may be important for many textures where different regions can be affected by different environmental conditions in a 3D scene.
  • Although CNN-based image synthesis processes that control weathering through “painting by numbers” in accordance with various embodiments of the invention are described above, other processes that control weathering through “painting by numbers” in accordance with certain embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Continuous Multiscale Aging
  • In accordance with various embodiments, CNN-based image synthesis processes re-purpose style transfer to generate a continuous and progressive aging/de-aging process in a multiscale pyramid framework. Style transfer may be considered an extension to texture synthesis in which a second content image is introduced to guide the optimization process. Processes in accordance with many embodiments use the same concept to synthesize time sequences in a multiscale pyramid framework. These processes may bootstrap the animation by synthesizing the first frame in the sequence using the strategy described above. After the first frame is generated, subsequent frames can be created by using the frame before as a prior frame. As such, at any given point in time, two image pyramids are stored in memory, the pyramid for the previous frame and the pyramid for the current frame being synthesized. The synthesis order is illustrated in FIG. 23. As the multiple image sizes may be synthesized in parallel, processes in accordance with a number of embodiments may store an optimizer state for each pyramid level. When synthesizing the first frame in the sequence, the base of the pyramid may use white noise as a prior frame to start the synthesis, and then each subsequent pyramid level starts from the final result of the previous level, bi-linearly re-sized to the correct resolution.
  • For all subsequent frames synthesized, a new image pyramid may be synthesized. In accordance with a number of embodiments, the first level of the new pyramid uses the first level of the previous frame as a prior image. For higher levels in the pyramid, the same level from the previous frame is used as a prior image and a content loss is introduced by re-sizing the previous level in the same frame; this content image can be seen as a blurry version of the desired result. This process is conceptually illustrated in FIG. 23, where image 5 is synthesized using image 2 as a prior and image 4 is re-sized and used as a content image to guide the process. By synthesizing a sequence of images in this manner, CNN-based image synthesis processes in accordance with some embodiments achieve the same benefits as synthesizing a single image using the pyramid strategy. In addition, the fidelity of larger structures may be improved, noise artifacts may be reduced and synthesis speed may be improved.
  • Although CNN-based image synthesis processes that perform continuous multiscale aging in accordance with various embodiments of the invention are described above, other processes that perform continuous multiscale aging in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Transferring Weathered Patterns from External Exemplars
  • Arguably, it is sometimes more useful to transfer the weathering effects of one material "W" onto another material "C." This is a more difficult problem, as the parameters within W that represent aged features must be completely extracted and isolated from the parameters that encode the underlying texture. The only way to do this accurately is to give every pixel a high fidelity age score. While other systems may be able to accomplish this through a nearest neighbor search, this approach is too coarse and approximate to accurately transfer weathering. Also, some of these other processes may rank pixels based on how unique the pixels are within the image. To transfer weathering patterns in accordance with several embodiments of the invention, pixels may also need to be ranked based on which distinct features they contribute to and by how much.
  • CNN-based image synthesis processes in accordance with particular embodiments can accomplish this using the heat-map approach presented in the previous section on continuous weathering. However, rather than finding the best match among all parametric models, processes performing weathering transfer keep a separate L1 distance score for each parametric model. Thus, these processes may discriminate on a pixel-by-pixel basis to determine the pixels in a weathered region that contribute to the actual age artifacts and to what degree. Given a region of image W with age features as well as transition features and the resulting parametric model, the goal is to remove any features that are not the desired "aged" features and replace these features in the correct proportion with the target parametric model of C.
  • To do so, processes in accordance with many embodiments normalize the L1 distance to each parametric model between 0 and 1 and invert the result so that a region in the synthesized image that strongly matches a parametric model receives a score close to 1 and regions that do not match receive a score closer to 0. For a parametric model that should be removed from the mixture, processes in accordance with a number of embodiments compute a mean activation of the model (note, co-activations are not used for this process as the features become very difficult to de-tangle). For each pixel, the processes may multiply the mean activation by the local L1 score for that parametric model and subtract it from the activations at this target pixel to remove those features in their correct proportion from the neural activations of the pixel. The processes may then take the mean activations from the new parametric model from image C and multiply them by the same L1 score to determine an activation value. The activation value is then added to the target pixel in W to replace the young features in the original image with young features from the new image in which the weathered features are being introduced. After the texture has been updated, the processes can perform weathering on image W using the processes described above.
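  • A sketch of this per-pixel feature swap is shown below, assuming NumPy, a pre-computed normalized and inverted L1 match score per pixel, and mean activation vectors for the model being removed and the model being introduced; all names are illustrative.

    import numpy as np

    def swap_young_features(activations, match_score, mean_remove, mean_insert):
        # activations: (C, H, W) neural activations of image W.
        # match_score: (H, W) normalized, inverted L1 score in [0, 1]
        #              (1 = strong match with the model being removed).
        # mean_remove: (C,) mean activation of the model being removed (young model of W).
        # mean_insert: (C,) mean activation of the replacement model (young model of C).
        out = activations - mean_remove[:, None, None] * match_score[None, :, :]
        out = out + mean_insert[:, None, None] * match_score[None, :, :]
        return out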
  • Although CNN-based image synthesis processes that transfer weathered patterns from external exemplars in accordance with various embodiments of the invention are described above, other processes that transfer weathered patterns in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Combining Optimization and Feedforward Parametric Texture Synthesis for Fast, High Resolution Syntheses
  • Neural network-based texture synthesis processes can be grouped into two categories based on their underlying algorithmic strategy. These include optimization-based and feedforward-based approaches. Optimization-based approaches often produce superior quality but may be computationally expensive. This often makes these processes impractical for real world applications. Feedforward-based approaches were developed as a fast alternative to optimization. This is achieved by moving the computational burden to the training phase rather than at run time. While being fast, feedforward approaches are typically poor in quality and inflexible. The first feedforward approach baked the transformation for a single texture into each network. Later, several methods introduced the idea that multiple texture transformations could be baked into a single network. One such method introduced the idea of interpolating between these styles by matching the statistics of deep neural activations from some content image to those of a style image. In addition to baking style transformations into a network, another strategy has been to train an auto-encoder that uses the standard fixed pre-trained VGG network as an encoder with a decoder that is trained to invert VGG. Style transfer can be achieved by directly modifying the activation values produced by the encoder so that they better match the style image. One approach is to replace each feature vector from the content with its nearest neighbor in the style. Impressive results can be achieved by transforming the content activations to better mimic the style activations through a whitening color transform (WCT), which is a linear transformation that is capable of matching covariance statistics. While these methods have greatly improved the flexibility and quality of feedforward methods, they can still be inferior to optimization.
  • CNN-based image synthesis processes in accordance with some embodiments of the invention use a coarse-to-fine multiscale synthesis strategy for neural texture synthesis. These processes can achieve significant speedups over previous optimization methods by performing a majority of iterations on small images early in the process; the further the processes move up the pyramid, the fewer iterations are used, since only the already established structure needs to be maintained. Multiscale pyramid based synthesis is not only computationally cheaper as the processes move up the pyramid; the problem formulation actually changes. Rather than performing texture synthesis or style transfer, the problem becomes Single Image Super Resolution (SISR) that takes an additional parametric texture model to help guide the up-resolution process.
  • As such, CNN-based image synthesis processes in accordance with many embodiments of the invention may utilize the optimization-based approach described above up until an arbitrary threshold (for example, around a 512×512 pixel image size, which varies depending upon the requirements of a given application) and then switch to an arbitrary feedforward approach utilizing VGG encoding/decoding with activation transforms along the way. Switching synthesis algorithms as the processes move up the pyramid can have additional benefits beyond speed. Some CNN-based texture synthesis processes are only capable of generating RGB color textures, a standard that has been obsolete in the video game and movie industries for nearly 20 years. Color textures have been replaced by "Materials," which consist of several maps encoding the fine scale geometry of the surface as well as parameters that direct how light interacts with each pixel. By utilizing a nearest neighbor search from the previous pyramid level to the most similar feature in the style exemplar, the encoder/decoder process in accordance with a number of embodiments can both increase the resolution of the previous synthesis level and decode the entire material. While it may be possible to train a new auto-encoder to process color images along with normal maps, roughness maps, etc., this would have to be done for every possible combination of maps, which may be costly and awkward. The approach described here may provide a more flexible and elegant solution.
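  • A sketch of the hybrid control flow is shown below, with placeholder callables standing in for the optimization-based and feedforward synthesis back-ends; the 512 pixel switch-over value and the function names are illustrative assumptions.

    def hybrid_pyramid_synthesis(sizes, optimize_level, feedforward_level, switch_size=512):
        # sizes: list of (h, w) pyramid level resolutions, coarse to fine.
        # Runs optimization-based synthesis while the working resolution is small and
        # switches to a feedforward encoder/decoder pass above the threshold.
        img = None
        for h, w in sizes:
            if max(h, w) <= switch_size:
                img = optimize_level(img, (h, w))       # slow, high-quality optimization
            else:
                img = feedforward_level(img, (h, w))    # fast feedforward up-resolution
        return img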
  • Extending this concept to the arbitrary case, a method in accordance with several embodiments of the invention generates arbitrary material formats applied to any synthesis operation including, but not limited to, texture synthesis, time-varying weathering, style transfer, hybridization and super resolution. This synthesis strategy involves using some color texture generated using another process as input. In addition, an exemplar material is given as input, where this material contains at least one map that is similar in appearance and purpose as the input color map. The input color map is then used as a guide to direct the synthesis of the full material. This is done through a nearest neighbor search where a pixel/patch is found in one of the maps in the material that is similar to a pixel/patch in the input color image. The pointer map resulting from the nearest neighbor search directs how to re-arrange all maps within the material and then each can be synthesized using this same new guiding structure.
  • These processes may be additionally attractive for style transfer because they are fully feedforward and the full resolution image can easily be broken up into deterministic windows that can be synthesized separately and stitched back together. This allows processes in accordance with particular embodiments to synthesize arbitrarily sized images with minimal engineering.
  • Although CNN-based image synthesis processes that combine optimization and feed forward processes in accordance with various embodiments of the invention are described above, other processes that combine optimization and feed forward processes in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • SISR For Renderings
  • In accordance with certain embodiments, a style transfer process used to perform SISR may have applications for rendering. SISR is an ill-posed problem in which many high-resolution images can be downsampled to the same low-resolution result. This one-to-many inversion is especially bad at reproducing texture because it tends to invert to an average of all the possible higher resolution images. The latest trend in SISR is to train very deep (i.e. many layered) artificial neural networks on a large dataset using adversarial training. The high capacity of the deep network in conjunction with the adversarial training is meant to help reduce the loss of texture features.
  • Recent advances in rendering algorithms as well as advances in high-resolution displays have resulted in an associated rise in rendering costs. The problem is that there are more pixels and each pixel is more expensive to render. In particular, the recent jump from 1080p to 4k rendering has left many animation houses incapable of meeting market needs. For small studios, the costs may be prohibitive. For large and famous studios, rendering at 4k may also be challenging.
  • However, when rendering a movie or other video content, frames typically do not change significantly from one frame to the next. Therefore, it can be assumed that the parametric model or high-resolution texture statistics extracted from one frame are probably also appropriate for guiding similar but slightly different frames. Processes in accordance with some embodiments perform a video up-resolution strategy where the video content is rendered at a low resolution (LR). From the LR source, the processes cluster frames together based on their feature statistics. The mean frame from each cluster is determined and rendered at high resolution (HR). The processes then perform the same guided LR to HR synthesis as proposed for video streaming, with the one important difference that in video streaming the HR statistics for each frame are known whereas for rendering similar HR statistics are shared across multiple frames.
  • Although CNN-based image synthesis processes that perform SISR for rendering in accordance with various embodiments of the invention are described above, other processes that perform SISR for rendering in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Combining Parametric and Non-Parametric-Non-CNN Synthesis within Pyramid Frameworks
  • Based on the processes described above, CNN-based image synthesis processes in accordance with many embodiments of the invention can use a nearest neighbor search between patches in the synthesized result and the most similar patches in the input exemplar in order to create a bridge between parametric CNN-based texture synthesis frameworks and many established non-parametric texture synthesis methods that do not require a neural network to operate. The ability to tether a neural network approach on low-resolution images to non-neural network based methods higher in the synthesis pyramid can represent a "best of both worlds" solution between the two strategies. CNN-based approaches, especially parametric methods, may be better at producing creative new features at the cost of speed, memory and image quality (these methods may contain many noise artifacts). Non-parametric models that do not rely on neural networks tend to shuffle around patches directly from the input exemplar. As such, these approaches exhibit the inverse of these behaviors: they are fast, low memory approaches that largely match the fine details of the input. However, they are not as powerful at creating new shapes and features.
  • Although processes that combine parametric and non-parametric-non-CNN synthesis within a pyramid framework in accordance with various embodiments of the invention are described above, other processes that combine parametric and non-parametric-non-CNN synthesis within a pyramid framework in accordance with various other embodiments of the invention that add, combine and/or remove steps as necessitated by the requirements of particular systems and/or processes are possible.
  • Although the present invention has been described in certain specific aspects, many additional modifications and variations would be apparent to those skilled in the art. It is, therefore, to be understood that the present invention may be practiced otherwise than specifically described, including various changes in the implementation without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive.
  • Dilated Network Architecture
  • The examples disclosed above have largely been agnostic to network architecture and are transversal to many Convolutional Neural Network (CNN) architectures. However, in addition to the examples above, the system in accordance with several embodiments of the invention implements a new network architecture for image synthesis that is particularly well suited to the problem.
  • In accordance with some embodiments of the invention, a combination of Pooling Layers, Strided Convolution Layers and Dilated Convolution Layers are used to arrange neurons into a hierarchical multiscale relationship. Typically, image synthesis algorithms utilize pooling layers and sometimes strided convolution layers in order to form an image pyramid structure within the neural network architecture. Typically, only one such strategy is used throughout the network architecture. Recent work in image segmentation has achieved performance improvements utilizing a dilated convolutional network (https://arxiv.org/pdf/1511.07122.pdf—the disclosure of which related to dilated convolutional networks is hereby incorporated by reference herein in its entirety), with a follow-up work showing that dilated convolution can also be used to learn image filters (http://vladlen.info/papers/fast-image-processing-with-supplement.pdf—the disclosure of which related to dilated convolutional networks is hereby incorporated by reference herein in its entirety).
  • Dilated convolution is a similar concept to image stacks, first introduced for the purposes of image processing using signal and statistics based methods and later adapted for texture synthesis (Sylvain Lefebvre and Hugues Hoppe. 2005. Parallel controllable texture synthesis. ACM Trans. Graph. 24, 3 (July 2005), 777-786—the disclosure of which related to dilated convolutional networks is hereby incorporated by reference herein in its entirety). The image stack is a collection of image pyramids sampled from a single image at regular translation intervals. Image stacks were developed to address the problem that the image pyramid data structure leads to discretization errors, e.g. the same input image, when translated, could lead to very different downsampled results. The image stack is effectively a translation invariant alternative to image pyramids. It also follows that other types of symmetry transformations could lead to similar discretization artifacts, e.g. two otherwise identical images that differ only by such a transformation would produce very different averages at coarser levels of an image pyramid.
  • Other solutions to this problem include, but are not limited to, using a collection of image pyramids at different translations of an input image. An image stack can extend this concept to the most extreme conclusion where every possible image pyramid configuration is extracted from the original image, and then packed together efficiently so that all redundant data is removed. The efficient packing can be achieved using a “quadtree pyramid,” which was originally developed to accelerate an Approximate Nearest Neighbor (ANN) search utilizing KD-Trees. However, an image stack re-orders the data into a tightly packed image where features remain coherent across scale levels. The same data structure was actually first introduced a few years earlier by Masaki Kawase during a Game Developers Conference lecture for the purpose of blurring an image (slides published online by GDC, https://www.gamedev.net/forums/topic/615132-dof-basic-steps/Slides 15-18—the disclosure of which related to dilated convolutional networks is hereby incorporated by reference herein in its entirety).
  • In an image pyramid structure, each level of the pyramid is typically half the resolution in each dimension as the previous level. In an image stack structure, each level is the same resolution as the previous level. In an image pyramid, samples are typically averaged together or combined in some way using a convolution kernel with a stride of 1, used uniformly at every pyramid level. In an image stack, samples are typically averaged together or combined in some way using a convolution kernel with a stride of 2^level, where level is the number of scale factors relative to the original image. This can be thought of as analogous to a downscaling transition in an image pyramid. In summary, image pyramids both downsample and subsample an image. While downsampling is a desirable operation, subsampling rarely is. Image stacks get around this problem by downsampling without subsampling.
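  • The relationship between the two hierarchies can be illustrated with two PyTorch convolution layers, one strided (pyramid-like, subsampling) and one dilated by 2^level (stack-like, resolution preserving); the channel counts and kernel size here are arbitrary illustrative choices.

    import torch
    import torch.nn as nn

    # Pyramid-style downsampling: each level halves the spatial resolution (subsampling).
    pyramid_level = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)

    # Stack-style downsampling: resolution is preserved and the kernel is dilated by
    # 2**level instead, the convolutional analogue of an image stack level.
    level = 3
    stack_level = nn.Conv2d(64, 64, kernel_size=3, stride=1,
                            dilation=2 ** level, padding=2 ** level)

    x = torch.randn(1, 64, 128, 128)
    print(pyramid_level(x).shape)   # spatial size halved: (1, 64, 64, 64)
    print(stack_level(x).shape)     # spatial size preserved: (1, 64, 128, 128)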
  • The same data structures and hierarchical relationships used for an image pyramid and an image stack can also be used for convolutional network architectures that utilize pooling layers or strided convolution as well as dilated convolution. As such, the advantages/disadvantages of using a stack versus a pyramid in an image processing framework are transversal and carry over to a convolutional neural network architecture. Replacing pooling or strided convolution with a dilated convolution will often yield superior image synthesis results.
  • Previous image synthesis methods using a Convolutional Neural Network have used some kind of pooling or strided convolution strategy; thus they typically go through some form of subsampling operation followed by a supersampling operation. In the optimization based framework, the feedforward pass is a subsampling pyramid operation as higher order features are extracted deeper in the network; the generation of the new image is then a supersampling process through backpropagation, as gradients traverse upwards through the network to the higher resolution shallow layers. In a feedforward architecture, many approaches are typically some form of auto-encoder or cascading image pyramid, both of which utilize some form of subsampled data and attempt to supersample it during the synthesis process. In many embodiments of the invention, network architectures designed for image synthesis, which may rely on pooling or strided convolution to do downsampling, can be improved by using a dilated convolution architecture instead. Therefore, the system in accordance with several embodiments of the invention makes use of dilated alternatives to such network architectures, as dilation is often a superior form of downsampling for image synthesis operations. In addition, where a synthesis strategy also relies on the use of image pyramids (typically Gaussian pyramids) of the input data for additional multiscale synthesis, the system in accordance with some embodiments of the invention uses an image stack (typically a Gaussian stack) to replace the image pyramid.
  • The dilated network strategy is particularly well suited for auto-encoders where the decoder network is a layer-wise inverter of the encoder network (i.e. each layer in the encoder has a "mirror" layer in the decoder which inverts that layer as accurately as possible). This particular network architecture is desirable for fast image synthesis because the encoder side of the network can distill an image into its most meaningful features, which can be modified in some way by another algorithm (e.g. including, but not limited to, a whitening transform, histogram match, or nearest neighbor search). The newly updated values can then be inverted by the decoder in order to produce a new image. This synthesis strategy is attractive because it is much faster and more memory efficient than an optimization based approach. It is, however, very difficult to implement because inverting a network that includes pooling is very difficult (historically, the literature has used a pre-trained VGG network as the encoder). Inverting pooling layers typically leads to blurring or other such supersampling artifacts. Systems in accordance with many embodiments of the invention implement a dilated architecture as an alternative, which is easier and more accurate to invert on a layer-by-layer basis. In many embodiments, a whitening transform, multiscale nearest neighbor search or histogram matching algorithm can continue to be applied to features at each network layer as they progress through the decoder.
  • Note that dilation shares the same drawback as image stacks: it is a memory inefficient way to encode data, especially deeper in the network. Previous texture synthesis approaches utilizing image stacks were generally limited to 256×256 pixels in resolution due to the high memory demands. However, conventional images may have much higher pixel resolutions (e.g., up to or exceeding an 8192×8192 pixel resolution). The size of these images can make the image stack representation too memory demanding.
  • Instead, processes in accordance with some embodiments of the invention combine pooling or strided convolution layers at the shallow end of a convolutional neural network architecture with dilated convolution layers deeper in the network. This “hybrid” network architecture exhibits the properties of a pyramid network up to a specific depth in the network and then switches to the properties of a dilated stack. From a memory viewpoint, this is attractive because large images quickly get condensed into much smaller images for the majority of the network. This is also a good compromise from an image processing viewpoint because the deeper layers of the network encode the complex shapes and patterns and thus need the highest resolution. Shallow layers of the network only encode simple shapes and textures and don't require the same degree of network capacity. This new network architecture can be visualized as an image stack with a pyramid sitting on top of it.
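  • A minimal PyTorch sketch of such a pyramid-stack hybrid is shown below, with strided convolutions at the shallow end and dilated convolutions deeper in the network; the channel counts, depths and dilation rates are illustrative assumptions rather than a prescribed architecture.

    import torch.nn as nn

    hybrid = nn.Sequential(
        # Pyramid part: strided convolutions condense the image early on.
        nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        # Stack part: dilated convolutions grow the receptive field without
        # any further subsampling.
        nn.Conv2d(128, 256, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
        nn.Conv2d(256, 256, kernel_size=3, dilation=4, padding=4), nn.ReLU(),
    )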
  • Although a specific pyramid-stack hybrid convolutional neural network architecture based on some combination of pooling, strided convolution and dilated convolution is used for image synthesis in a number of examples discussed above, in several embodiments of the invention, the pyramid-stack hybrid may be modified in a variety of ways, including (but not limited to) adding, removing, and/or combining components of the stack.
  • Image Hybridization
  • Starting from a set of several source images within the same category, systems and methods in accordance with many embodiments of the invention can hybridize, or mix, them together in a way that produces a new member of that category. This is follow-up work to Risser, Eric, et al. "Synthesizing structured image hybrids." ACM Transactions on Graphics (TOG), Vol. 29, No. 4, ACM, 2010, the disclosure of which is hereby incorporated by reference in its entirety. While this work is based on the same theories as the original approach, this version of image hybridization is re-designed from the ground up to utilize convolutional neural networks both for image description and as the machinery for performing synthesis. The three key observations for image hybridization are:
  • (1) instead of taking a small sample from an infinite plane of texture and synthesizing more of that infinite plane, grab a small sample across an infinite set within a “category” and synthesize more instances of that category,
  • (2) structure exists across scales and should be observed across scales, and
  • (3) manifesting structure from noise is difficult, so don't break structure in the first place.
  • CNN based image synthesis is an ideal approach for hybridizing images due to the CNN's ability to learn and identify complex shapes and structures across multiple scales. Unlike other image synthesis methods which can be improved by dilated convolution, but do not require it, hybridization is likely to produce poor results and artifacts if deeper layers of the CNN are subsampled. Therefore, the input images can be passed in a feedforward manner through a dilated convolutional neural network to produce deep neural activations that have a 1-to-1 mapping with input image pixels.
  • The logic behind using a stack rather than a pyramid is that, in coarse-to-fine synthesis, the texture built at the coarse level largely determines whether a local or global minimum will be found. When a sub-optimal choice is made at the coarse level of the synthesis pyramid, synthesis gets stuck in a local minimum, so care must be taken when synthesizing at the coarse level in order to achieve a good result. In stochastic texture synthesis there are many local minima that are all very similar to the global minimum, so this is less of a concern. In image hybridization, however, most local minima are significantly worse than the global minimum and the small set of local minima surrounding it. This difference corresponds to breaking versus preserving the global structure of the image.
  • The reason why stacks typically preserve global structure better than pyramids is very simple. Imagine the same picture of a face in image A and image B, where the only difference is that image B is translated along the x axis. When turning A and B into pyramids, the right eye in image A is subsampled into a single pixel at a coarse level. The same eye in image B, however, is spread across two pixels that also contain features surrounding the eye, because the translated position drastically changes the quantization in the pyramid. When synthesizing the texture at a coarse level, there are no good nearest neighbor matches between images A and B: the center of the eye is represented by a single pixel in image A, but image B only offers the right and left halves of the eye as candidates to be linked to. While it is theoretically possible to recover the eye feature at finer levels, this is a much harder problem and in practice rarely happens. By using a stack, the system in accordance with many embodiments of the invention avoids quantization and maintains the maximum number of features to compare, drastically increasing the chance of finding good matches. To bootstrap the synthesis process, one of the input images is chosen at random and passed feedforward through a network. Unlike the first step, which extracts feature vectors for each exemplar, synthesis itself does not strictly require a dilated architecture; dilated and strided convolution each have their own benefits and weaknesses, which are compared below. The important thing to note is that the same convolution kernels used for extracting exemplar features typically must also be used during synthesis. Luckily, a dilated architecture can be thought of as a collection of pyramid architectures, so the same set of kernels can be used in either strategy. Many of the examples described herein refer to VGG-19 feature kernels pre-trained on the ImageNet dataset; however, one skilled in the art will recognize that convolutional kernels from any network architecture trained on any dataset may be applied in accordance with various embodiments of the invention.
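  • The pyramid-versus-stack distinction can be made concrete with a short sketch, assuming PyTorch and torchvision; the kernel sizes and level count are arbitrary choices. A Gaussian pyramid blurs and subsamples, so a feature such as the eye in the example above may collapse onto one pixel in one image and straddle two pixels in the other, whereas a Gaussian stack only blurs and therefore keeps every candidate feature location available for matching.

```python
import torch
import torchvision.transforms.functional as TF

# Hedged sketch contrasting a Gaussian pyramid (blur + subsample, features get
# quantized into fewer pixels) with a Gaussian stack (blur only, every level
# keeps the full pixel grid so no feature is ever collapsed onto a neighbour).
def gaussian_pyramid(img, levels=4):
    out = [img]
    for _ in range(levels - 1):
        blurred = TF.gaussian_blur(out[-1], kernel_size=5)
        out.append(blurred[..., ::2, ::2])        # subsample: resolution halves
    return out

def gaussian_stack(img, levels=4):
    out = [img]
    for i in range(1, levels):
        out.append(TF.gaussian_blur(img, kernel_size=4 * 2 ** i + 1))  # wider blur, same size
    return out

img = torch.rand(3, 256, 256)
print([t.shape[-1] for t in gaussian_pyramid(img)])  # [256, 128, 64, 32]
print([t.shape[-1] for t in gaussian_stack(img)])    # [256, 256, 256, 256]
```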
  • Hybridization, unlike other image synthesis operations, is a non-parametric process. It relies on finding similar matching features between the input and output images and building new images by re-combining sets of exemplar features into new configurations, while tracking the global feature error between these newly mixed features and the original input features from which they were derived. Note that hybridization can be performed in either an optimization or feedforward based synthesis strategy. In either case, the key aspect of image hybridization is to algorithmically generate new activations at different levels of the network which combine the activation features extracted from the input images into new hybrid configurations. Before describing how these new hybrid configurations are generated, it is helpful to describe how they are used to synthesize new images.
  • When performing optimization based synthesis, an input image (typically noise, but it could be anything) is iteratively updated to minimize some loss function. The "hybrid loss" function is the summed L2 distance between the activations of the current image being synthesized and the hybrid activation map at a given layer. This is the same strategy as the "content loss" described above; however, whereas the content loss is taken directly from an input image, the "hybrid loss" targets a new activation map that is generated by recombining activation features taken from different input images. In the original image synthesis work, content loss is only used at RELU4_1 so that it does not overpower the style loss at shallow layers of the network. Hybridization in accordance with a number of embodiments of the invention incorporates a style loss in order to perform style transfer combined with hybridization in one operation. Alternatively, in several embodiments, the basic hybridization algorithm assumes that there is no style loss; hybrid loss can therefore be used at multiple layers in the network. Feedforward networks, on the other hand, do not perform an optimization process turning one image into another; instead they transform an image into a new image in a single pass. Using the dilated auto-encoder network described above, the encoder portion is run on all input images, their features are hybridized in the middle of the network by another process, and this hybridized set of activation values is then inverted by the decoder. Note that in both optimization and feedforward synthesis, the results of hybridizing deep features in the network can be passed up to shallow layers and further hybridized through another hybridization step.
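  • A hedged sketch of the hybrid loss itself, assuming PyTorch, is given below; the tensor shapes are illustrative. It is simply the summed L2 distance between the activations of the image being synthesized and a fixed hybrid activation map at one layer, and its gradient is what backpropagation pushes into the image being optimized.

```python
import torch

def hybrid_loss(current_activations, hybrid_activations):
    """Hedged sketch: summed L2 distance between the activations of the image
    currently being synthesized and a fixed hybrid activation map at one layer.
    Both tensors are assumed to have shape (channels, height, width)."""
    return ((current_activations - hybrid_activations) ** 2).sum()

# In an optimization setting the synthesized image is updated by gradient descent
# on this loss (optionally summed over several layers and combined with a style loss).
cur = torch.randn(256, 64, 64, requires_grad=True)
target = torch.randn(256, 64, 64)
loss = hybrid_loss(cur, target)
loss.backward()
```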
  • In order to describe the process in which activation maps are reconfigured to produce hybrid activation maps, it is helpful to first introduce the concept of feature space. The idea behind feature space is that similar visual features in the image will map to points in feature space that are close to each other in terms of Euclidean distance, while very different looking features will be very far away from each other. The neural network in accordance with some embodiments of the invention is a feature descriptor and converts a neighborhood of raw pixel values into a single point in a high dimensional feature space.
  • Given this background, the creation of hybrid activations for a layer can be explained. Given the layer activations for one of the random input images, the goal is to traverse every pixel location in the layer and replace the current feature vector at that location with a new feature vector taken from some other pixel location in that layer or from some pixel location taken from another input image's neural activations at that layer. This can be done through a two-step process where the process introduces randomness or “jitter” and then “corrects” any artifacts or broken structures caused by the jitter. In certain embodiments, the process optionally pre-computes k-nearest neighbors between each input image and every other input image as a part of an acceleration strategy.
  • In the next phase, for each pixel in the image being synthesized, the process in accordance with many embodiments of the invention gathers k nearest neighbors from the input exemplars. In certain embodiments, the process divides up the k samples equally across all the exemplars. In a number of embodiments, the distance metric used for these KNNs is the L2 distance between feature vectors at the neural network layer of interest. This is equivalent to transforming all of the image data into points in high dimensional feature space. Around each synthesis feature point, the process in accordance with some embodiments of the invention gathers the cluster of exemplar feature points surrounding it, such that the process samples the same number of points from each exemplar. The next step is to sort these K nearest neighbors from smallest to largest distance.
  • In certain embodiments, the one parameter exposed to the user is a "jitter slider" that ranges from 0 to 1 (or an equivalent linear or non-linear range), where 0 reproduces one of the original inputs and 1 produces the maximum hybridization and mutation. The 0-1 range is therefore mapped onto the distances of the closest and farthest of the K nearest neighbors. Depending on the jitter setting, the process in accordance with many embodiments of the invention gathers the nearest neighbors whose distances fall below the threshold corresponding to the jitter value and randomly selects one of them to update the synthesis patch with. This is akin to constraining noise. Instead of starting from noise and trying to recover structure from it (which is very difficult), the process in accordance with a number of embodiments starts from pure structure (i.e. the input) and adds noise strategically and intelligently, so as not to break the structure to a degree from which it cannot be recovered. To this end, the process in accordance with several embodiments of the invention adds noise or randomness in "feature space" rather than in color space or image space, as is typical for these types of algorithms. Adding noise in feature space essentially allows the process to randomize the image in a way that preserves its important structures. This operation can be performed at one or more convolution layers within the CNN.
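  • The jitter pass might be sketched as follows, assuming PyTorch; `jitter_features`, its tensor layouts, and the pooling of all exemplar features into a single candidate set (rather than sampling an equal share of neighbors per exemplar) are simplifying assumptions for illustration only.

```python
import torch

def jitter_features(synth_feats, exemplar_feats, jitter, k=8):
    """Hedged sketch of the "jitter" pass.
    synth_feats:    (N, C) feature vectors of the layer being hybridized
    exemplar_feats: list of (M_i, C) feature tensors, one per input exemplar
    jitter:         user slider in [0, 1]; 0 keeps the closest match, 1 allows
                    any of the k nearest exemplar features to be swapped in."""
    candidates = torch.cat(exemplar_feats, dim=0)          # pool all exemplar features
    out = synth_feats.clone()
    for i, f in enumerate(synth_feats):
        d = torch.norm(candidates - f, dim=1)              # L2 distance in feature space
        dists, idx = torch.topk(d, k, largest=False)       # k nearest exemplar features
        # map the 0-1 slider onto the [min, max] distance of those neighbours
        threshold = dists.min() + jitter * (dists.max() - dists.min())
        allowed = idx[dists <= threshold]
        j = torch.randint(len(allowed), (1,)).item()       # random neighbour within range
        out[i] = candidates[allowed[j]]
    return out
```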
  • After jitter is used to modify the details of the new image being synthesized, the second step, "correction", then "fixes" the image so that it maintains statistical similarity to the exemplar input. For each n×n neighborhood of synthesized neural activation vectors (where n×n could be any size, including 1×1, i.e. a single activation vector), correction seeks out the neighborhood of neural activation vectors in any exemplar that has the lowest L2 distance. The current synthesis neural activation vector is then replaced with that closest exemplar neural activation vector. The correction scheme is based on coherence (Ashikhmin, Michael. "Synthesizing natural textures." Proceedings of the 2001 symposium on Interactive 3D graphics. ACM, 2001, the relevant disclosure from which is hereby incorporated by reference in its entirety), which observes that nearest neighbor selection is not always the best method for finding perceptual similarity. Rather, coherence, or the relationship between neighboring pixels, plays a large role in structure and perceptual similarity. Therefore, the process in accordance with many embodiments of the invention introduces a bias so that exemplar activation vectors that form a coherent patch with the surrounding synthesis activation vectors are given a reduction in L2 distance. This incentivizes the formation of coherent patches from the exemplar.
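  • A simplified sketch of the correction pass with a coherence bias, assuming PyTorch, a single exemplar, and 1×1 neighborhoods, is shown below; `correct_features`, the raster-order scan, and the single left-neighbor coherence candidate are illustrative simplifications of the full coherence scheme rather than the specific implementation.

```python
import torch

def correct_features(synth, exemplar, coherence_bonus=0.9):
    """Hedged sketch of the "correction" pass on a (C, H, W) activation map with a
    single (C, H, W) exemplar. Each synthesis vector is replaced by its nearest
    exemplar vector; the candidate that would extend a coherent patch (the exemplar
    pixel just to the right of the one chosen for the left neighbour) has its
    distance scaled down by `coherence_bonus`. Row-end wrap-around is ignored."""
    C, H, W = synth.shape
    ex = exemplar.reshape(C, -1).t()                 # (H*W, C) exemplar vectors
    src = torch.full((H, W), -1, dtype=torch.long)   # exemplar index chosen per pixel
    out = synth.clone()
    for y in range(H):
        for x in range(W):
            d = torch.norm(ex - synth[:, y, x], dim=1)
            if x > 0:                                # coherence bias from the left neighbour
                cand = int(src[y, x - 1]) + 1
                if cand < H * W:
                    d[cand] *= coherence_bonus
            best = torch.argmin(d)
            src[y, x] = best
            out[:, y, x] = ex[best]
    return out
```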
  • For both jitter and correction, the process in accordance with various embodiments of the invention can either perform a nearest neighbor search from the synthesis layer to the exemplar layers during runtime of the algorithm, or could pre-compute a candidate list of k-nearest neighbors from every exemplar feature to every other k exemplar feature. Then, during synthesis, each activation vector also maintains a pointer to the exemplar activation vector that it is mimicking.
  • Whether pre-computing nearest neighbors or finding them at runtime, a nearest neighbor searching algorithm that is designed with neural networks in mind is needed. To this end, several embodiments in accordance with the invention use a nearest neighbor algorithm as described in U.S. Provisional Application 62/528,372, entitled "Systems and Methods for Providing Convolutional Neural Network Based Non-Parametric Texture Synthesis in Graphic Objects," filed on Jul. 3, 2017, the disclosure of which is incorporated herein by reference in its entirety.
  • The synthesis process for an optimization based algorithm in accordance with some embodiments of the invention runs E input images through a dilated version of a CNN, resulting in a set of activation vectors for specific layers of interest (for VGG-19, these are RELU1_1, RELU2_1, RELU3_1 and RELU4_1). The synthesis process runs a randomly selected input image through either a dilated or un-dilated version of the CNN to produce the starting point for the hybrid activations. The process runs a jitter pass and then runs the standard neural optimization based synthesis method, starting from some prior (typically noise), for several iterations of backpropagation until the prior has turned into a manifestation of the jittered activations at the deep layer. The process then runs a correction pass on the activations at the coarsest layer in the network (for VGG-19, this is RELU4_1), thus producing the hybrid activations for that layer. The process runs the standard neural optimization based synthesis method again for several iterations of backpropagation until the prior has turned into a manifestation of the hybrid activations at the deep layer. Once the current level has converged, the process moves to the next most shallow layer of interest in the network (e.g. RELU3_1 for VGG-19) and repeats the jitter and correction steps to find new hybrid activations for that layer to use as the target for hybrid loss, then reruns the optimization process, now only descending to that layer and no farther down the network. This process repeats until the shallowest layer of interest has been optimized.
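  • The outer coarse-to-fine structure of this optimization loop might look like the following self-contained sketch, assuming PyTorch; the tiny `features` network stands in for VGG-19, the random `hybrid_targets` stand in for the jittered and corrected activation maps described above, and the layer indices and iteration counts are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of the coarse-to-fine optimization loop: the synthesized image is
# optimized against a pre-computed hybrid activation map at the deepest layer of
# interest first, then at a shallower layer. A tiny stand-in network replaces VGG-19
# and random tensors replace the jitter+correct output, purely for illustration.
features = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # stands in for a shallow layer of interest
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # stands in for a deep layer of interest
)
layer_ids = [3, 1]                                # deepest layer of interest first
hybrid_targets = {3: torch.randn(1, 32, 64, 64),  # placeholders for jitter+correct output
                  1: torch.randn(1, 16, 64, 64)}

image = torch.randn(1, 3, 64, 64, requires_grad=True)   # the prior being optimized
optimizer = torch.optim.Adam([image], lr=0.05)
for layer in layer_ids:
    for _ in range(100):                          # backpropagation iterations per layer
        optimizer.zero_grad()
        x = image
        for i, m in enumerate(features):          # only descend as far as this layer
            x = m(x)
            if i == layer:
                break
        loss = ((x - hybrid_targets[layer]) ** 2).sum()  # hybrid loss at this layer
        loss.backward()
        optimizer.step()
```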
  • The synthesis process for feedforward networks in accordance with a number of embodiments of the invention runs all inputs through the encoder, producing the deep neural activations. Before running the decoder, the process runs the jitter pass on one of the exemplars in order to randomize the features. To correct, the process samples a neighborhood of activation vectors (at least 3×3) around each activation vector and performs the correction phase of the algorithm. The jitter and correction phases can either use pre-computed nearest neighbor sets or run a full nearest neighbor search during the algorithm. Once correction is finished, the process continues through the decoder, inverting the new hybrid layer. This process can be repeated for each layer moving through the decoder or only run at target layers; the choice is a tradeoff between algorithm speed and the scale at which features are hybridized. Optimization based synthesis is slower than feedforward synthesis, but it achieves superior quality.
  • On Model Image Synthesis Using Convolutional Neural Networks
  • In computer graphics, a 3D model is typically "texture mapped". For purposes of this discussion, "texture mapped" means an image is wrapped over the surface of the 3D shape, as shown in FIG. 24. 3D models typically contain UV coordinates at each vertex, which define the 2D parameterization of the 3D surface. In FIG. 24, the left image displays the underlying geometry of the mesh 2401, the middle image shows the geometry with a texture mapped over the mesh 2402, and the image on the right shows what that texture 2403 looks like as a 2D mapping of the 3D surface. Synthesizing such texture maps is referred to as "on-model synthesis."
  • Processes in accordance with many embodiments of the invention integrate an on-model synthesis approach into the CNN approach. To do so, these processes have to spread out atlas maps and build a gutter space of pointers re-directing to neighboring charts.
  • The CNN based synthesis approach in accordance with many embodiments of the invention relies on the process of convolution in which each pixel of the synthesis kernel is filtered based on a neighborhood of its surrounding pixels. On-model synthesis introduces two complications on top of the standard synthesis approach in image space:
  • (1) A flow field over a 3D model is generated using its curvature properties along with user guidance. That flow field can then be projected as a 2D vector field in the parameterized texture space. The flow field typically contains both directional and scale components along each axis. Rather than convolving the neural network along the image x and y axis unit vectors globally, each pixel now has its own local coordinate frame and scale.
  • (2) Because an arbitrary 3D surface cannot be mapped to a single plane, UV texture space is typically broken up into a set of “charts” where each chart covers a relatively flat portion of the model. This adds another level of complication because texture colors that are coherent along the surface of the model are not coherent in texture space where we perform our convolutions. To accommodate this, the process in accordance with many embodiments of the invention adds a gutter space of a few pixels in radius around each chart. These gutter pixels store pointers to other charts in texture space that encode coherent pixels along the model's surface. This additional pointer buffer is referred to as a “jump map”. When performing convolution, rather than sampling directly from the image, the process in accordance with a number of embodiments first samples from the jump map which points to the image pixel that should be sampled. Because texture space might have tightly packed charts, as a pre-process, the process in accordance with some embodiments spreads out the charts so that there is a gutter space of at least two pixels around each chart at the coarsest synthesis pyramid level plus however many pooling layers are passed through in the CNN. Note that when using dilated convolution, the gutter space typically must be two to the power of the number of dilated convolutions.
  • Processes in accordance with some of these embodiments introduce an underlying vector field that frames the local orientation around each pixel. As CNNs work by performing convolution across an image, the vector field directs the local orientation of the convolution. Thus, these processes can bi-linearly interpolate sampling of neural activations from the previous layer. Where the convolution kernel extends beyond the scope of an atlas chart, the gutter space of pointers redirects to another atlas chart. During the back-propagation phase of the process, inverse mapping can be used in a manner similar to what is described above with respect to convolution. This allows these processes to perform CNN image synthesis directly in UV space for on-model synthesis.
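  • One convolution tap of this jump-map and vector-field sampling scheme might be sketched as follows, assuming PyTorch; `sample_via_jump_map`, the (H, W, 2) jump-map layout, and the assumption that all redirected positions stay in bounds (guaranteed in practice by the gutter space) are illustrative choices rather than the specific implementation. The usage example below builds an identity jump map, i.e. every pixel points to itself.

```python
import torch

def sample_via_jump_map(feats, jump_map, y, x, dy, dx, frame):
    """Hedged sketch of one convolution tap for on-model synthesis.
    feats:    (C, H, W) activations in UV texture space
    jump_map: (H, W, 2) per-pixel redirection; gutter pixels point into a
              neighbouring chart, interior pixels point to themselves
    (dy, dx): kernel offset from the centre pixel (y, x)
    frame:    (2, 2) local orientation/scale at (y, x) from the projected flow field."""
    offset = frame @ torch.tensor([float(dy), float(dx)])   # rotate/scale the tap offset
    ty, tx = y + offset[0], x + offset[1]
    y0, x0 = int(torch.floor(ty)), int(torch.floor(tx))     # bilinear interpolation corners
    wy, wx = float(ty) - y0, float(tx) - x0
    val = 0.0
    for (yy, xx, w) in [(y0, x0, (1 - wy) * (1 - wx)), (y0, x0 + 1, (1 - wy) * wx),
                        (y0 + 1, x0, wy * (1 - wx)), (y0 + 1, x0 + 1, wy * wx)]:
        ry, rx = jump_map[yy, xx]            # follow the pointer before reading the image
        val = val + w * feats[:, int(ry), int(rx)]
    return val

# Usage: identity jump map (no chart boundaries crossed) and an axis-aligned frame.
C, H, W = 8, 16, 16
feats = torch.randn(C, H, W)
jump = torch.stack(torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij"), dim=-1)
tap = sample_via_jump_map(feats, jump, y=5, x=5, dy=1, dx=0, frame=torch.eye(2))
```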
  • Thus far, the algorithm described is designed to take a rectangular texture and a single model (with no relationship between the two) as input and synthesize a new image which maps into the model's unwrapped texture space, as shown in FIG. 25, where texture 2501 is wrapped around mesh 2502. In this regard, the input is still a rectangular image and the output uses the mesh as a canvas on which to paint. In many instances, a pre-textured mesh is given as input, and the textures already parameterized into some UV space are used as the source data to feed an image synthesis process.
  • Processes in accordance with some embodiments of the invention follow a similar approach. These processes take this concept a step further and produce textures that conform to geometric shapes and the feature contents of that texture are guided by the underlying shape itself. This results in image synthesis that can be applied on top of already textured meshes, and can also produce appearance transfer from one textured mesh onto another.
  • The goal of processes in accordance with some embodiments of the invention is to go one step further and provide an on-model texture synthesis scheme that allows the user to supply a fully textured model as the input exemplar (for example, the texture mapped mesh (2402)) instead of just a texture (2403), and to apply that texture from the model onto a different, untextured model. The advantage of this approach is that a great deal of useful information is represented by a textured mesh, including (but not limited to) the relationship between varying texture features and the underlying geometric shape on which they would typically exist. Texture and shape are often not independent; rather, they are related. Thus, by learning or associating the relationships between a texture and the shape to which the texture is applied, processes in accordance with some embodiments of the invention can provide artists with more powerful and convenient tools.
  • There are two key ideas behind this approach in accordance with some embodiments of the invention. The first is that deep neural activation features, and their resulting parametric models for UV mapped textures, should be calculated using the same vector field and jump map approach proposed above for the purposes of synthesis. The second is to find a shape descriptor that is both effective and compatible with an image descriptor maintained by the system and with the image based GPU accelerated framework upon which the system is built.
  • A key insight is that geometric shape information can be projected onto an image (i.e. a regular grid), and the shape descriptor is able to work by sampling patches from this grid in order to maintain compatibility with the GPU framework. Because it is desirable that geometric neighborhoods correspond to texture neighborhoods, it makes sense that the geometric projection into image space should match the texture unwrapping. The only issue is that texture information can map to multiple portions of a single mesh. As such, processes in accordance with some embodiments of the invention utilize a texture parameterization that provides a 1-to-1 mapping between points on a model and pixels in a texture image. This amounts to simply making copies of charts or chart regions that are pointed to from multiple polygons, so that each polygon maps to its own region in texture space. Once each point on the 3D surface of a mesh points to a unique pixel in image space, any arbitrary shape description, ranging from point location in 3D space to more sophisticated descriptors, can be fed into a CNN framework in order to learn local shape features using a CNN training process. One such training approach could be mesh categorization; however, other training approaches such as mesh compression, feature clustering or upscaling ("upres") could also be viable training strategies for learning meaningful shape features.
  • Condensed Feature Extraction Networks
  • In several embodiments, a learning strategy for condensing networks that have been trained for purposes other than image synthesis allows for the production of new networks that are more efficient at extracting the image features used for image synthesis. Typically, VGG-19 pre-trained for image classification is used as a high quality, learned image descriptor for extracting meaningful image features for the purposes of image synthesis. Networks designed for classification, however, address a different and more difficult problem than texture feature extraction and often have more capacity than feature extraction requires. VGG, for example, is computationally expensive to run, which can result in small images, long wait times and a reliance on expensive hardware. One of the benefits of systems in accordance with various embodiments of the invention is improved memory and speed performance without sacrificing synthesis quality.
  • Again, VGG or some other network architecture trained on classification is of interest for image synthesis because the kernels produced as a byproduct of the classification learning process are useful for image synthesis. Not all activation maps produced by such a network are needed for image synthesis, only a small subset of those feature maps. As such, there are layers in the network that are not used directly for image synthesis; rather, blocks of layers are simply run between the layers of interest. In many embodiments, the number of hidden layers in a previously trained CNN can be reduced and/or the capacity of those hidden layers can be reduced. The simplest strategy would be to train a new, smaller network on image classification. Unfortunately, if a new set of classification kernels is learned on a lower capacity network, there may not be enough capacity to perform classification as well as VGG, and if the classification is worse, the resulting kernels may not perform feature extraction as well either.
  • Rather than learning a leaner feature extraction network by reproducing the classification learning process, the learning strategy in accordance with many embodiments of the invention uses the activation maps produced by VGG (or some other artificial neural network) as the ground truth (since they are known to produce good synthesis results), and a network is trained to reproduce these input/output pairings using fewer neurons than VGG.
  • The assumption behind this condensed network learning strategy is that a reduction in network capacity has a larger effect on classification performance than it does on feature extraction for the purposes of image synthesis. For example, if a small network approximates a VGG convolutional block with 95% accuracy using 25% of VGG's neurons, that error would be enough to dramatically affect classification results. However, for the purposes of image synthesis, 95% accuracy would still find very good image features and synthesis quality would not be noticeably affected. Stated another way, systems and methods in accordance with many embodiments of the invention utilize an artificial neural network with a specific number of neurons to learn a network that approximates the intermediate neural activations of a different network with a larger (or the same) number of artificial neurons for the purposes of efficient image synthesis.
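  • A hedged sketch of this condensation (distillation-style) training loop, assuming PyTorch and torchvision, is shown below; the student's width, the teacher layer range, and the use of random images in place of a real training set are illustrative assumptions. In practice the teacher would load pre-trained classification weights; `weights=None` is used here only so the sketch runs offline.

```python
import torch
import torch.nn as nn
import torchvision

# Hedged sketch of the condensation idea: a small "student" block is trained to
# reproduce the conv1_1..relu2_1 activations of VGG-19 (the "teacher") using far
# fewer channels. Layer indices and channel counts are illustrative only.
teacher = torchvision.models.vgg19(weights=None).features[:7].eval()  # through relu2_1
for p in teacher.parameters():
    p.requires_grad_(False)

student = nn.Sequential(                      # roughly a quarter of the teacher's width
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 128, 3, padding=1), nn.ReLU(inplace=True),  # match teacher's 128 channels
)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):                          # in practice: loop over a large image dataset
    batch = torch.rand(4, 3, 128, 128)
    with torch.no_grad():
        target = teacher(batch)               # ground-truth activations
    loss = nn.functional.mse_loss(student(batch), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```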
  • Although various processes for generating a geometric description map for convolutional neural network based texture synthesis are discussed above, many different systems and methods can be implemented in accordance with various embodiments of the invention. It is therefore to be understood that the present invention may be practiced in ways other than specifically described, without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.

Claims (30)

What is claimed is:
1. A system for generating a synthesized image including desired content presented in a desired style comprising:
one or more processors;
memory readable by the one or more processors; and
instructions stored in the memory that when read by the one or more processors direct the one or more processors to:
receive a source content image that includes desired content for a synthesized image,
receive a source style image that includes a desired texture for the synthesized image,
determine a localized loss function for a pixel in at least one of the source content image and the source style image, and
generate the synthesized image by:
optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
2. The system of claim 1, wherein the localized loss function is represented by a Gram matrix.
3. The system of claim 1, wherein the localized loss function is represented by a covariance matrix.
4. The system of claim 1, wherein the localized loss function is determined using a Convolutional Neural Network (CNN).
5. The system of claim 4, wherein the optimizing is performed by back propagation through the CNN.
6. The system of claim 1, wherein the localized loss function is determined for a pixel in the source style image.
7. The system of claim 6, wherein the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to:
receive a mask that identifies regions of the source style image;
determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask;
determine a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions; and
associate the localized loss function with the pixel.
8. The system of claim 6, wherein the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to:
group the pixels of the source style image into a plurality of cells determined by a grid applied to the source style image;
determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel; and
associate the determined localized loss function of the one of the plurality of cells with the pixel.
9. The system of claim 6, wherein the instructions to determine a localized loss function for a pixel in the source style image direct the one or more processors to:
determine a group of neighbor pixels for a pixel in the source content image;
determine a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel; and
determine a local loss function for the group of corresponding pixels.
10. The system of claim 1, wherein the localized loss function is determined for a pixel in the source content image.
11. The system of claim 10, wherein the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to:
receive a mask that identifies regions of the source content image;
determine a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask;
determine a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions; and
associate the localized loss function with the pixel.
12. The system of claim 10, wherein the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to:
group the pixels of the source content image into a plurality of cells determined by a grid applied to the source content image;
determine a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel; and
associate the determined localized loss function of the one of the plurality of cells with the pixel.
13. The system of claim 10, wherein the instructions to determine a localized loss function for a pixel in the source content image direct the one or more processors to:
determine a global content loss function for the source content image from the pixels of the source content image;
determine a weight for the pixel indicating a contribution to a structure in the source content image; and
apply the weight to the global content loss function to determine the localized loss function for the pixel.
14. The system of claim 13, wherein the weight is determined based upon a Laplacian pyramid of black and white versions of the source content image.
15. The system of claim 10, wherein a localized loss function is determined for a pixel in the source content image and a corresponding pixel in the source style image.
16. The system of claim 15, wherein the optimization uses the localized loss function for the pixel in the source content image as the content loss function and the localized loss function of the pixel in the source style image as the style loss function.
17. The system of claim 1, wherein pixels in the synthesized image begin as white noise.
18. The system of claim 1, wherein each pixel in the synthesized image begins with a value equal to a pixel value of a corresponding pixel in the source content image.
19. The system of claim 1, wherein the optimizing is performed to minimize a loss function that includes the content loss function, a style loss function, and a histogram loss function.
20. A method for performing style transfer in an image synthesis system where a synthesized image is generated with content from a source content image and texture from a source style image, the method comprising:
receiving a source content image that includes desired content for a synthesized image in the image synthesis system;
receiving a source style image that includes a desired texture for the synthesized image in the image synthesis system;
determining a localized loss function for a pixel in at least one of the source content image and the source style image using the image synthesis system; and
generating the synthesized image using the image synthesis system by optimizing a value of a pixel in the synthesized image to a content loss function of a corresponding pixel in the source content image and a style loss function of a corresponding pixel in the source style image wherein at least one of the corresponding pixels is the pixel that has a determined localized loss function and one of the content loss function and the style loss function is the determined localized loss function.
21. The method of claim 20, wherein the localized loss function is represented by one of a Gram matrix and a covariance matrix.
22. The method of claim 20, wherein the localized loss function is determined by the image synthesis system using a Convolutional Neural Network (CNN), wherein the optimizing is performed by the image synthesis system using back propagation through the CNN.
23. The method of claim 20, wherein the determining of a localized loss function for a pixel in at least one of the source content image and the source style image comprises:
receiving a mask that identifies regions of at least one of the source content image and the source style image using the image synthesis system;
determining a group of pixels including the pixel that are included in one of the plurality of regions identified by the mask using the image synthesis system;
determining a localized loss function for the one of the plurality of regions from the groups of pixels included in the one of the plurality of regions using the image synthesis system; and
associating the localized loss function with the pixel using the image synthesis system.
24. The method of claim 20, wherein the determining of a localized loss function for a pixel in at least one of the source style image and the source content image comprises:
grouping the pixels of at least one of the source content image and the source style image into a plurality of cells determined by a grid applied to the source style image using the image synthesis system;
determining a localized loss function for the one of the plurality of cells that has a group of pixels that include the pixel using the image synthesis system; and
associating the determined localized loss function of the one of the plurality of cells with the pixel using the image synthesis system.
25. The method of claim 20, wherein the determining of a localized loss function for a pixel in at least one of the source style image and the source content image comprises:
determining a group of neighbor pixels for a pixel in the source content image using the image synthesis system;
determining a group of corresponding pixels in the source style image associated with the group of neighbor pixels in the source content image wherein each of the group of corresponding pixels corresponds to one of the group of neighbor pixels and includes the pixel using the image synthesis system; and
determining a local loss function for the group of corresponding pixels using the image synthesis system.
26. The method of claim 20, wherein the determining of a localized loss function for a pixel in at least one of the source style image and the source content image comprises:
determining a global content loss function for the source content image from the pixels of the source content image using the image synthesis system;
determining a weight for the pixel indicating a contribution to a structure in the source content image using the image synthesis system; and
applying the weight to the global content loss function to determine the localized loss function for the pixel using the image synthesis system.
27. The method of claim 26, wherein the weight is determined based upon a Laplacian Pyramid of black and white versions of the source content image.
28. The method of claim 20, wherein a first localized loss function is determined for a pixel in the source content image and a second localized loss function is determined for a corresponding pixel in the source style image.
29. The method of claim 28, wherein the optimizing uses the first localized loss function for the pixel in the source content image as the content loss function and the second localized loss function of the pixel in the source style image as the style loss function.
30. The method of claim 20, wherein the optimizing is performed to minimize a loss function that includes at least one of the content loss function, a style loss function, and a histogram loss function.
US15/694,677 2016-09-02 2017-09-01 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures Active US9922432B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/694,677 US9922432B1 (en) 2016-09-02 2017-09-01 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
US15/876,011 US10424087B2 (en) 2016-09-02 2018-01-19 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662383283P 2016-09-02 2016-09-02
US201762451580P 2017-01-27 2017-01-27
US201762531778P 2017-07-12 2017-07-12
US15/694,677 US9922432B1 (en) 2016-09-02 2017-09-01 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/876,011 Continuation US10424087B2 (en) 2016-09-02 2018-01-19 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Publications (2)

Publication Number Publication Date
US20180068463A1 true US20180068463A1 (en) 2018-03-08
US9922432B1 US9922432B1 (en) 2018-03-20

Family

ID=60009665

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/694,677 Active US9922432B1 (en) 2016-09-02 2017-09-01 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
US15/876,011 Active US10424087B2 (en) 2016-09-02 2018-01-19 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/876,011 Active US10424087B2 (en) 2016-09-02 2018-01-19 Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Country Status (3)

Country Link
US (2) US9922432B1 (en)
EP (1) EP3507773A1 (en)
WO (1) WO2018042388A1 (en)

Cited By (96)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180144506A1 (en) * 2016-11-18 2018-05-24 Samsung Electronics Co., Ltd. Texture processing method and device
US20180260668A1 (en) * 2017-03-10 2018-09-13 Adobe Systems Incorporated Harmonizing composite images using deep learning
CN108596830A (en) * 2018-04-28 2018-09-28 国信优易数据有限公司 A kind of image Style Transfer model training method and image Style Transfer method
CN108805803A (en) * 2018-06-13 2018-11-13 衡阳师范学院 A kind of portrait style moving method based on semantic segmentation Yu depth convolutional neural networks
CN108846440A (en) * 2018-06-20 2018-11-20 腾讯科技(深圳)有限公司 Image processing method and device, computer-readable medium and electronic equipment
CN108846793A (en) * 2018-05-25 2018-11-20 深圳市商汤科技有限公司 Image processing method and terminal device based on image style transformation model
US20180342084A1 (en) * 2017-05-23 2018-11-29 Preferred Networks, Inc. Method and apparatus for automatic line drawing coloring and graphical user interface thereof
US20180357800A1 (en) * 2017-06-09 2018-12-13 Adobe Systems Incorporated Multimodal style-transfer network for applying style features from multi-resolution style exemplars to input images
CN109064434A (en) * 2018-06-28 2018-12-21 广州视源电子科技股份有限公司 Method, apparatus, storage medium and the computer equipment of image enhancement
US20190026870A1 (en) * 2017-07-19 2019-01-24 Petuum Inc. Real-time Intelligent Image Manipulation System
CN109300170A (en) * 2018-10-18 2019-02-01 云南大学 Portrait photo shadow transmission method
CN109389556A (en) * 2018-09-21 2019-02-26 五邑大学 The multiple dimensioned empty convolutional neural networks ultra-resolution ratio reconstructing method of one kind and device
CN109522939A (en) * 2018-10-26 2019-03-26 平安科技(深圳)有限公司 Image classification method, terminal device and computer readable storage medium
CN109559276A (en) * 2018-11-14 2019-04-02 武汉大学 A kind of image super-resolution rebuilding method based on reference-free quality evaluation and characteristic statistics
CN109583509A (en) * 2018-12-12 2019-04-05 南京旷云科技有限公司 Data creation method, device and electronic equipment
CN109639710A (en) * 2018-12-29 2019-04-16 浙江工业大学 A kind of network attack defence method based on dual training
US20190122394A1 (en) * 2017-10-19 2019-04-25 Fujitsu Limited Image processing apparatus and image processing method
CN109711892A (en) * 2018-12-28 2019-05-03 浙江百应科技有限公司 The method for automatically generating client's label during Intelligent voice dialog
US20190163978A1 (en) * 2017-11-30 2019-05-30 Nvidia Corporation Budget-aware method for detecting activity in video
US10311326B2 (en) * 2017-03-31 2019-06-04 Qualcomm Incorporated Systems and methods for improved image textures
CN109886207A (en) * 2019-02-25 2019-06-14 上海交通大学 Wide-area monitoring systems and method based on image Style Transfer
CN109919829A (en) * 2019-01-17 2019-06-21 北京达佳互联信息技术有限公司 Image Style Transfer method, apparatus and computer readable storage medium
CN110059707A (en) * 2019-04-25 2019-07-26 北京小米移动软件有限公司 Optimization method, device and the equipment of image characteristic point
CN110148424A (en) * 2019-05-08 2019-08-20 北京达佳互联信息技术有限公司 Method of speech processing, device, electronic equipment and storage medium
US20190259134A1 (en) * 2018-02-20 2019-08-22 Element Ai Inc. Training method for convolutional neural networks for use in artistic style transfers for video
EP3540695A1 (en) * 2018-03-13 2019-09-18 InterDigital CE Patent Holdings Method for transfer of a style of a reference visual object to another visual object, and corresponding electronic device, computer readable program products and computer readable storage medium
US10424087B2 (en) 2016-09-02 2019-09-24 Artomatix Ltd. Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
CN110287849A (en) * 2019-06-20 2019-09-27 北京工业大学 A kind of lightweight depth network image object detection method suitable for raspberry pie
US10467820B2 (en) * 2018-01-24 2019-11-05 Google Llc Image style transfer for three-dimensional models
US10482639B2 (en) * 2017-02-21 2019-11-19 Adobe Inc. Deep high-resolution style synthesis
CN110516803A (en) * 2018-05-21 2019-11-29 畅想科技有限公司 Traditional computer vision algorithm is embodied as neural network
CN110533741A (en) * 2019-08-08 2019-12-03 天津工业大学 A kind of camouflage pattern design method rapidly adapting to battlefield variation
CN110717955A (en) * 2019-09-29 2020-01-21 武汉极意网络科技有限公司 Method, device and equipment for updating gallery and storage medium
CN110796080A (en) * 2019-10-29 2020-02-14 重庆大学 Multi-pose pedestrian image synthesis algorithm based on generation of countermeasure network
CN110889318A (en) * 2018-09-05 2020-03-17 斯特拉德视觉公司 Lane detection method and apparatus using CNN
EP3629296A1 (en) * 2018-09-28 2020-04-01 Samsung Electronics Co., Ltd. Display apparatus control method and display apparatus using the same
US10643092B2 (en) * 2018-06-21 2020-05-05 International Business Machines Corporation Segmenting irregular shapes in images using deep region growing with an image pyramid
CN111147443A (en) * 2019-11-18 2020-05-12 四川大学 Unified quantification method for network threat attack characteristics based on style migration
CN111160279A (en) * 2019-12-31 2020-05-15 武汉星巡智能科技有限公司 Method, apparatus, device and medium for generating target recognition model using small sample
CN111178507A (en) * 2019-12-26 2020-05-19 集奥聚合(北京)人工智能科技有限公司 Atlas convolution neural network data processing method and device
CN111223039A (en) * 2020-01-08 2020-06-02 广东博智林机器人有限公司 Image style conversion method and device, electronic equipment and storage medium
CN111275126A (en) * 2020-02-12 2020-06-12 武汉轻工大学 Sample data set generation method, device, equipment and storage medium
CN111325237A (en) * 2020-01-21 2020-06-23 中国科学院深圳先进技术研究院 Image identification method based on attention interaction mechanism
CN111325232A (en) * 2018-12-13 2020-06-23 财团法人工业技术研究院 Training method of phase image generator and training method of phase image classifier
CN111340745A (en) * 2020-03-27 2020-06-26 成都安易迅科技有限公司 Image generation method and device, storage medium and electronic equipment
WO2020140421A1 (en) * 2019-01-03 2020-07-09 Boe Technology Group Co., Ltd. Computer-implemented method of training convolutional neural network, convolutional neural network, computer-implemented method using convolutional neural network, apparatus for training convolutional neural network, and computer-program product
CN111428562A (en) * 2020-02-24 2020-07-17 天津师范大学 Pedestrian re-identification method based on component guide graph convolution network
CN111507902A (en) * 2020-04-15 2020-08-07 京东城市(北京)数字科技有限公司 High-resolution image acquisition method and device
US10748232B2 (en) 2018-06-08 2020-08-18 Digimarc Corporation Generating signal bearing art using stipple, voronoi and delaunay methods and reading same
US10769764B2 (en) * 2019-02-08 2020-09-08 Adobe Inc. Hierarchical scale matching and patch estimation for image style transfer with arbitrary resolution
US10776923B2 (en) 2018-06-21 2020-09-15 International Business Machines Corporation Segmenting irregular shapes in images using deep region growing
US10776982B2 (en) 2017-07-03 2020-09-15 Artomatix Ltd. Systems and methods for providing non-parametric texture synthesis of arbitrary shape and/or material data in a unified framework
CN111738295A (en) * 2020-05-22 2020-10-02 南通大学 Image segmentation method and storage medium
CN111768335A (en) * 2020-07-02 2020-10-13 北京工商大学 CNN-based user interactive image local clothing style migration method
WO2020238120A1 (en) * 2019-05-30 2020-12-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. System and method for single-modal or multi-modal style transfer and system for random stylization using the same
CN112070010A (en) * 2020-09-08 2020-12-11 长沙理工大学 Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
US10872392B2 (en) 2017-11-07 2020-12-22 Digimarc Corporation Generating artistic designs encoded with robust, machine-readable data
CN112132167A (en) * 2019-06-24 2020-12-25 商汤集团有限公司 Image generation and neural network training method, apparatus, device, and medium
US10896307B2 (en) 2017-11-07 2021-01-19 Digimarc Corporation Generating and reading optical codes with variable density to adapt for visual quality and reliability
US10916001B2 (en) * 2016-11-28 2021-02-09 Adobe Inc. Facilitating sketch to painting transformations
CN112336342A (en) * 2020-10-29 2021-02-09 深圳市优必选科技股份有限公司 Hand key point detection method and device and terminal equipment
WO2021041772A1 (en) * 2019-08-30 2021-03-04 The Research Foundation For The State University Of New York Dilated convolutional neural network system and method for positron emission tomography (pet) image denoising
CN112561792A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Image style migration method and device, electronic equipment and storage medium
CN112673643A (en) * 2019-09-19 2021-04-16 海信视像科技股份有限公司 Image quality circuit, image processing apparatus, and signal feature detection method
WO2021075758A1 (en) * 2019-10-15 2021-04-22 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
CN112819686A (en) * 2020-08-18 2021-05-18 腾讯科技(深圳)有限公司 Image style processing method and device based on artificial intelligence and electronic equipment
US20210150685A1 (en) * 2017-10-30 2021-05-20 Shanghai Cambricon Information Technology Co., Ltd. Information processing method and terminal device
US20210182624A1 (en) * 2018-08-31 2021-06-17 Snap Inc. Generative neural network distillation
JP2021516834A (en) * 2018-05-02 2021-07-08 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Generation of newborn simulation images
US11080834B2 (en) * 2019-12-26 2021-08-03 Ping An Technology (Shenzhen) Co., Ltd. Image processing method and electronic device
US11113578B1 (en) * 2020-04-13 2021-09-07 Adobe, Inc. Learned model-based image rendering
US11126915B2 (en) * 2018-10-15 2021-09-21 Sony Corporation Information processing apparatus and information processing method for volume data visualization
US20210304487A1 (en) * 2020-03-30 2021-09-30 Brother Kogyo Kabushiki Kaisha Storage medium storing program, training method of machine learning model, and image generating apparatus
US11145042B2 (en) 2019-11-12 2021-10-12 Palo Alto Research Center Incorporated Using convolutional neural network style transfer to automate graphic design creation
US11216150B2 (en) * 2019-06-28 2022-01-04 Wen-Chieh Geoffrey Lee Pervasive 3D graphical user interface with vector field functionality
US11238623B2 (en) 2017-05-01 2022-02-01 Preferred Networks, Inc. Automatic line drawing coloring program, automatic line drawing coloring apparatus, and graphical user interface program
CN114127788A (en) * 2019-04-29 2022-03-01 荻蒲仁德科技 Systems and methods for lossy image and video compression and/or transmission using a meta-network or a neural network
US20220076375A1 (en) * 2019-05-22 2022-03-10 Samsung Electronics Co., Ltd. Image processing apparatus and image processing method thereof
US20220139036A1 (en) * 2020-11-05 2022-05-05 Fyusion, Inc. Deferred neural rendering for view extrapolation
CN114493994A (en) * 2022-01-13 2022-05-13 南京市测绘勘察研究院股份有限公司 Ancient painting style migration method for three-dimensional scene
US11366981B1 (en) * 2019-12-03 2022-06-21 Apple Inc. Data augmentation for local feature detector and descriptor learning using appearance transform
US11393427B2 (en) * 2016-09-07 2022-07-19 Samsung Electronics Co., Ltd. Image processing apparatus and recording medium
US11436780B2 (en) * 2018-05-24 2022-09-06 Warner Bros. Entertainment Inc. Matching mouth shape and movement in digital video to alternative audio
US11461639B2 (en) * 2017-08-29 2022-10-04 Beijing Boe Technology Development Co., Ltd. Image processing method, image processing device, and training method of neural network
US20220351479A1 (en) * 2021-04-29 2022-11-03 Square Enix Co., Ltd. Style transfer program and style transfer method
US11546634B2 (en) * 2018-08-03 2023-01-03 V-Nova International Limited Upsampling for signal enhancement coding
US11574198B2 (en) * 2019-12-12 2023-02-07 Samsung Electronics Co., Ltd. Apparatus and method with neural network implementation of domain adaptation
US20230074420A1 (en) * 2021-09-07 2023-03-09 Nvidia Corporation Transferring geometric and texture styles in 3d asset rendering using neural networks
US20230082561A1 (en) * 2020-03-02 2023-03-16 Lg Electronics Inc. Image encoding/decoding method and device for performing feature quantization/de-quantization, and recording medium for storing bitstream
US20230087476A1 (en) * 2021-09-17 2023-03-23 Kwai Inc. Methods and apparatuses for photorealistic rendering of images using machine learning
US11694083B2 (en) * 2017-10-15 2023-07-04 Alethio Co. Signal translation system and signal translation method
US11704765B2 (en) * 2017-12-08 2023-07-18 Digimarc Corporation Artwork generated to convey digital messages, and methods/apparatuses for generating such artwork
CN116452895A (en) * 2023-06-13 2023-07-18 中国科学技术大学 Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
US11748932B2 (en) * 2020-04-27 2023-09-05 Microsoft Technology Licensing, Llc Controllable image generation
WO2024063811A1 (en) * 2022-09-22 2024-03-28 Tencent America LLC Multiple attribute maps merging
CN117953361A (en) * 2024-03-27 2024-04-30 西北工业大学青岛研究院 Underwater fish shoal small target steady counting method based on density map

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3526770B1 (en) 2016-10-21 2020-04-15 Google LLC Stylizing input images
EP3419286A1 (en) * 2017-06-23 2018-12-26 Koninklijke Philips N.V. Processing of 3d image information based on texture maps and meshes
US10810467B2 (en) * 2017-11-17 2020-10-20 Hong Kong Applied Science and Technology Research Institute Company Limited Flexible integrating recognition and semantic processing
WO2019113471A1 (en) 2017-12-08 2019-06-13 Digimarc Corporation Artwork generated to convey digital messages, and methods/apparatuses for generating such artwork
US10706503B2 (en) * 2018-03-13 2020-07-07 Disney Enterprises, Inc. Image processing using a convolutional neural network
CN108711137B (en) * 2018-05-18 2020-08-18 西安交通大学 Image color expression mode migration method based on deep convolutional neural network
CN108665067B (en) * 2018-05-29 2020-05-29 北京大学 Compression method and system for frequent transmission of deep neural network
KR102096388B1 (en) * 2018-06-05 2020-04-06 네이버 주식회사 Optimization for dnn conposition with real-time inference in mobile environment
CN108924528B (en) * 2018-06-06 2020-07-28 浙江大学 Binocular stylized real-time rendering method based on deep learning
EP3791316A1 (en) * 2018-06-13 2021-03-17 Siemens Healthcare GmbH Localization and classification of abnormalities in medical images
CN109101806A (en) * 2018-08-17 2018-12-28 浙江捷尚视觉科技股份有限公司 A kind of privacy portrait data mask method based on Style Transfer
US10789769B2 (en) 2018-09-05 2020-09-29 Cyberlink Corp. Systems and methods for image style transfer utilizing image mask pre-processing
US10964100B2 (en) * 2018-09-10 2021-03-30 Adobe Inc. Data-driven modeling of advanced paint appearance
CN109410127B (en) * 2018-09-17 2020-09-01 西安电子科技大学 Image denoising method based on deep learning and multi-scale image enhancement
CN109285112A (en) * 2018-09-25 2019-01-29 京东方科技集团股份有限公司 Image processing method neural network based, image processing apparatus
CN110956575B (en) 2018-09-26 2022-04-12 京东方科技集团股份有限公司 Method and device for converting image style and convolution neural network processor
CN109166087A (en) * 2018-09-29 2019-01-08 上海联影医疗科技有限公司 Style conversion method, device, medical supply, image system and the storage medium of medical image
WO2020073758A1 (en) * 2018-10-10 2020-04-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for training machine learning model, apparatus for video style transfer
CN109377537B (en) * 2018-10-18 2020-11-06 云南大学 Style transfer method for heavy color painting
US10922573B2 (en) * 2018-10-22 2021-02-16 Future Health Works Ltd. Computer based object detection within a video or image
WO2020087173A1 (en) * 2018-11-01 2020-05-07 Element Ai Inc. Automatically applying style characteristics to images
CN109697690A (en) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image Style Transfer method and system
CN109712081B (en) * 2018-11-14 2021-01-29 浙江大学 Semantic style migration method and system fusing depth features
KR20200063289A (en) * 2018-11-16 2020-06-05 삼성전자주식회사 Image processing apparatus and operating method for the same
CN109492735B (en) * 2018-11-23 2020-06-09 清华大学 Two-dimensional code generation method and computer-readable storage medium
CN109859096A (en) * 2018-12-28 2019-06-07 北京达佳互联信息技术有限公司 Image Style Transfer method, apparatus, electronic equipment and storage medium
CN111583165B (en) * 2019-02-19 2023-08-08 京东方科技集团股份有限公司 Image processing method, device, equipment and storage medium
US10839517B2 (en) 2019-02-21 2020-11-17 Sony Corporation Multiple neural networks-based object segmentation in a sequence of color image frames
CN109894383B (en) * 2019-02-21 2021-04-23 南方科技大学 Article sorting method and device, storage medium and electronic equipment
US11074733B2 (en) 2019-03-15 2021-07-27 Neocortext, Inc. Face-swapping apparatus and method
CN110084775B (en) * 2019-05-09 2021-11-26 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
US11580673B1 (en) * 2019-06-04 2023-02-14 Duke University Methods, systems, and computer readable media for mask embedding for realistic high-resolution image synthesis
CN110399924B (en) * 2019-07-26 2021-09-07 北京小米移动软件有限公司 Image processing method, device and medium
US10593021B1 (en) * 2019-09-11 2020-03-17 Inception Institute of Artificial Intelligence, Ltd. Motion deblurring using neural network architectures
KR102248150B1 (en) * 2019-09-27 2021-05-04 영남대학교 산학협력단 Total style transfer with a single feed-forward network
US11514292B2 (en) 2019-12-30 2022-11-29 International Business Machines Corporation Grad neural networks for unstructured data
CN111260593B (en) * 2020-01-14 2023-03-14 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113139909B (en) * 2020-01-19 2022-08-02 杭州喔影网络科技有限公司 Image enhancement method based on deep learning
US11429771B2 (en) * 2020-01-31 2022-08-30 Fotonation Limited Hardware-implemented argmax layer
CN113327190A (en) 2020-02-28 2021-08-31 阿里巴巴集团控股有限公司 Image and data processing method and device
CN111626994B (en) * 2020-05-18 2023-06-06 江苏远望仪器集团有限公司 Equipment fault defect diagnosis method based on improved U-Net neural network
CN112200247B (en) * 2020-10-12 2021-07-02 西安泽塔云科技股份有限公司 Image processing system and method based on multi-dimensional image mapping
US20220156415A1 (en) * 2020-11-13 2022-05-19 Autodesk, Inc. Techniques for generating subjective style comparison metrics for b-reps of 3d cad objects
CN113111791B (en) * 2021-04-16 2024-04-09 深圳市格灵人工智能与机器人研究院有限公司 Image filter conversion network training method and computer readable storage medium
CN113325376A (en) * 2021-05-27 2021-08-31 重庆邮电大学 Method for correcting electromagnetic cross coupling error of phase control array under color noise
US11704891B1 (en) 2021-12-29 2023-07-18 Insight Direct Usa, Inc. Dynamically configured extraction, preprocessing, and publishing of a region of interest that is a subset of streaming video data
US11509836B1 (en) 2021-12-29 2022-11-22 Insight Direct Usa, Inc. Dynamically configured processing of a region of interest dependent upon published video data selected by a runtime configuration file
US11778167B1 (en) 2022-07-26 2023-10-03 Insight Direct Usa, Inc. Method and system for preprocessing optimization of streaming video data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5487610B2 (en) * 2008-12-18 2014-05-07 ソニー株式会社 Image processing apparatus and method, and program
US8896622B2 (en) * 2009-09-04 2014-11-25 Adobe Systems Incorporated Methods and apparatus for marker-based stylistic rendering
GB201212518D0 (en) * 2012-07-13 2012-08-29 Deepmind Technologies Ltd Method and apparatus for image searching
US9576351B1 (en) * 2015-11-19 2017-02-21 Adobe Systems Incorporated Style transfer for headshot portraits
WO2018042388A1 (en) 2016-09-02 2018-03-08 Artomatix Ltd. Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
WO2019008519A1 (en) 2017-07-03 2019-01-10 Artomatix Ltd. Systems and methods for providing non-parametric texture synthesis of arbitrary shape and/or material data in a unified framework

Also Published As

Publication number Publication date
EP3507773A1 (en) 2019-07-10
WO2018042388A1 (en) 2018-03-08
US9922432B1 (en) 2018-03-20
US10424087B2 (en) 2019-09-24
US20180144509A1 (en) 2018-05-24

Similar Documents

Publication Publication Date Title
US10424087B2 (en) Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
Tewari et al. Fml: Face model learning from videos
Wang et al. Multimodal transfer: A hierarchical deep convolutional neural network for fast artistic style transfer
Zhang et al. Multimodal style transfer via graph cuts
Zhang et al. Age progression/regression by conditional adversarial autoencoder
Simo-Serra et al. Learning to simplify: fully convolutional networks for rough sketch cleanup
Shocher et al. Ingan: Capturing and remapping the "dna" of a natural image
US8411948B2 (en) Up-sampling binary images for segmentation
CN118172460A (en) Semantic image synthesis for generating substantially realistic images using neural networks
WO2021027759A1 (en) Facial image processing
CN111986075B (en) Style migration method for target edge clarification
Liu et al. Structure-guided arbitrary style transfer for artistic image and video
JP2024500896A (en) Methods, systems and methods for generating 3D head deformation models
Thasarathan et al. Automatic temporally coherent video colorization
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
US20240169661A1 (en) Uv mapping on 3d objects with the use of artificial intelligence
De Souza et al. A review on generative adversarial networks for image generation
CN116997933A (en) Method and system for constructing facial position map
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
Guo et al. Attribute-controlled face photo synthesis from simple line drawing
US20230319223A1 (en) Method and system for deep learning based face swapping with multiple encoders
Zhao et al. Purifying naturalistic images through a real-time style transfer semantics network
CN116030181A (en) 3D virtual image generation method and device
CN113034560A (en) Non-uniform texture migration method and device
Šoberl Mixed reality and deep learning: Augmenting visual information using generative adversarial networks

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.)

AS Assignment

Owner name: ARTOMATIX LTD., IRELAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RISSER, ERIC ANDREW;REEL/FRAME:043565/0783

Effective date: 20170908

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL)

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4