US20180374249A1 - Synthesizing Images of Clothing on Models - Google Patents
- Publication number
- US20180374249A1 (application US 15/634,171)
- Authority
- US
- United States
- Prior art keywords
- image
- garment
- network
- model
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G06K9/6256—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/16—Cloth
Definitions
- The invention relates to automatic image generation. More specifically, the invention relates to methods for creating synthetic images based on a training set of image pairs. Once trained, an embodiment can accept a new image that is like one member of an image pair, and create a synthetic image representing what the other image of the pair might look like.
- Neural networks are computing systems that are inspired by biological processes. In particular, they comprise a plurality of modules that are designed to operate similarly to the way biological neurons are thought to function. Neurons are often modeled as multiple-input, single-output thresholding accumulators, which “fire” if enough of the inputs are active enough. (The output of one model neuron may be connected to the inputs of a number of subsequent neuron models, or even to previous neuron models in a feedback loop.) Although the function of a single neuron model is quite simple, a large number of such models operating simultaneously, when configured or “trained” properly, can produce surprisingly good results on problems that are difficult to address with traditional data processing or programming techniques.
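- The thresholding-accumulator neuron model described above can be sketched in a few lines of Python (an editorial illustration, not part of the patent's disclosure):

```python
def neuron(inputs, weights, threshold):
    """A simple model neuron: a multiple-input, single-output
    thresholding accumulator that "fires" (outputs 1) only when the
    weighted sum of its inputs reaches the threshold."""
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# The neuron fires only when enough inputs are active enough:
fired = neuron([1, 1, 0], [0.5, 0.5, 0.5], 1.0)      # 0.5 + 0.5 >= 1.0 -> 1
quiet = neuron([1, 0, 0], [0.5, 0.5, 0.5], 1.0)      # 0.5 < 1.0       -> 0
```

In a real network, the hard threshold is usually replaced by a smooth activation function so the weights can be trained by gradient descent.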
- One common neural-network task is image recognition: a plurality of neurons are arranged in a pyramid-like hierarchy, with an array of input neurons for each pixel of an image, followed by one or more decreasing-sized layers of neurons, and where an output neuron is designated to indicate whether an input image has a characteristic of interest.
- Such a network can be “trained” by exposure to a set of images in which the characteristic of interest is present or absent. For example, the characteristic may be whether a penguin is present in the image. Once trained, the network may be able to determine fairly accurately whether a penguin is depicted in a new image that was not part of the training set.
- Neural networks can also be used to generate or synthesize new information based on prior training and a random seed. For example, biophysicist Dr. Mike Tyka has experimented with training neural networks to generate images that resemble artistic portraits of human faces, although there is no actual subject involved, and any resemblance to an individual is purely coincidental.
- Although neural networks are not infallible (recognizers may misidentify a target, and generative networks may construct output that is not useable for the intended purpose), they are often adequate for practical use in applications where deterministic methods are too slow, too complex or too expensive. Thus, neural networks can fill a gap between rote mechanical methods that are easy to implement on a computer, and labor-intensive methods that depend on human judgment to achieve best results.
- A neural network is trained with pairs of images, where one image of a pair shows a garment, and the other image shows a model wearing the garment. Then, a new image is presented to the trained network, and a synthetic image that might be the new image's pair is automatically generated. The synthetic image is displayed to a user.
- FIG. 1 is a flow chart outlining a method according to an embodiment of the invention.
- FIG. 2 is a simplified representation of data flow through a neural network implementing an embodiment of the invention.
- FIG. 3 shows a range of synthetic images created as a “model pose” image control parameter is varied.
- FIG. 4 is a flow chart outlining a more comprehensive application of an embodiment.
- FIG. 5 is a flow chart outlining another application of an embodiment.
- Embodiments of the invention use neural networks to synthesize images of models wearing garments from less-expensive images of the garments alone.
- Beneficially, the synthesis process exposes controls that can be used to adjust characteristics of the synthesized image. For example, the skin tone, body shape, pose and other characteristics of the model in the synthesized image may be varied over a useful range. This permits a user (e.g., a customer) to adjust a synthetic “as-worn” image to resemble the user more closely.
- Thus, in addition to reducing the cost of producing representative photos, an embodiment allows a user to get a better idea of how a particular garment may appear when worn by the user. This improved representation may improve a retailer's chance of making a sale (or of avoiding the return of a garment that, upon trying on, the purchaser did not like).
- FIG. 1 shows an overview of a central portion of an embodiment's operations.
- The method begins by initializing a neural network (100).
- This network is trained using pairs of images (110): one image of each pair shows a garment (e.g., a garment laid out flat on a neutral surface); and the other image of each pair shows a model wearing the garment.
- Once training is complete, useful parameters from the “Z vector” (described below) are identified (120).
- Now, in typical use, a garment image is provided to the trained network (130).
- The garment image need not be (and preferably is not) one of the training images, but instead is an image without a corresponding mate.
- The network synthesizes an image of this garment on a model, based on the network's training and parameters of the Z vector (140).
- The synthetic image is displayed to the user (150).
- The user may adjust a Z vector parameter (160), after which a new image is synthesized (140) and displayed (150).
- The adjustment and re-synthesis may be repeated as often as desired to produce a variety of synthetic images of a model wearing the garment.
- The Z vector parameters may control characteristics such as the model's skin tone, body shape and size, pose or position, and even accessories (e.g., shoe style, handbags, glasses, scarves or jewelry), so the user may be able to control the synthesis process to produce images that more closely resemble what the garment would look like if worn by the user.
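- The synthesize/adjust/redisplay loop of steps 130-160 can be sketched in Python. This is a hypothetical illustration: `synthesize` is a stub standing in for the trained generative network, and the parameter names and Z-vector indices are invented for the example.

```python
# Hypothetical mapping from user-facing controls to Z-vector indices.
NAMED_PARAMS = {"skin_tone": 0, "pose": 1, "shoe_style": 2}

def synthesize(z_vector):
    # Stub: a real trained network would emit a garment-on-model image here.
    return {"image_from_z": list(z_vector)}

def adjust(z_vector, param_name, value):
    """Step 160: change one named Z-vector element; the caller then
    re-synthesizes (step 140) and redisplays (step 150)."""
    z = list(z_vector)
    z[NAMED_PARAMS[param_name]] = value
    return z

z = [0.0, 0.0, 0.0]
image = synthesize(z)          # steps 130-150: first synthetic image
z = adjust(z, "pose", 0.7)     # step 160: user moves the "pose" control
image = synthesize(z)          # step 140 again: new image from adjusted Z
```

The loop may repeat as often as the user likes, each adjustment yielding a new synthetic image.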
- FIG. 2 shows a conceptual representation of a Generative Adversarial Network, which is one type of neural network that can be used in an embodiment of the invention.
- Information can generally be understood as passing from left to right across the drawing.
- An image of a garment 210 is communicated to an input layer 221 of a first neural network 220 .
- The input layer may have approximately as many input elements as there are pixels in the input image (i.e., many more elements than are shown in this drawing).
- Or, an input layer may comprise elements for each color channel of each input pixel (e.g., red, green and blue elements for each pixel).
- The input layer 221 is connected to an intermediate layer 223 by a network of variable connection weights 222, and the intermediate layer 223 is connected to an output layer 225 by a similar network 224. The connections are shown all-to-all, but each connection may be associated with a variable weight, so some connections may be effectively absent (weight=0).
- In addition to layer-to-layer connections, an embodiment may use feedback connections 226, which may go to an immediately preceding layer, or to an earlier layer.
- Each neural network may have more than the three layers shown here. In a preferred embodiment, about seven (7) layers are used.
- Note that each layer of neural network 220 (traveling left to right) is smaller than the preceding layer.
- Thus, the network can be thought of as performing a kind of compression, resulting in Z-Vector 250, which in many embodiments is a vector of real numbers comprising the outputs of the last layer of network 220.
- Z-Vector 250 is used as an input to a second neural network 230 .
- This network is similar in structure to network 220 (it comprises layers of model neurons 231, 233 and 235, interconnected by variable-weight connections 232, 234), but the number of elements in each layer increases in the direction of data flow.
- Like network 220, network 230 may include feedback connections (e.g., 236) which carry data in the “opposite” direction.
- The output of network 230, once the GAN has been trained, is a synthetic image 240 that represents what garment 210 may look like when worn by a model.
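- The shrinking-then-growing layer structure of networks 220 and 230 can be sketched with NumPy. This is an editorial sketch with untrained random weights and toy sizes (the patent describes 192×256×3 inputs and about seven layers per network); it shows only the shapes, not the training.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One fully-connected layer with random (untrained) weights."""
    w = rng.standard_normal((x.size, n_out)) * 0.1
    return np.tanh(x @ w)

# "Compression" network 220: each layer is smaller than the one before,
# ending in the Z-vector (sizes scaled down for illustration).
image = rng.standard_normal(24 * 32 * 3)   # flattened toy garment image
h1 = layer(image, 128)
z_vector = layer(h1, 16)                   # Z-Vector 250: a short real vector

# "Generative" network 230 mirrors 220: each layer is larger than the
# one before, growing from the Z-vector back to image size.
h2 = layer(z_vector, 128)
synthetic_image = layer(h2, 24 * 32 * 3)   # synthetic image 240
```

The Z-vector is thus a bottleneck: everything the generator knows about the input garment must either pass through these few numbers or travel over a skip connection, as described below.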
- In addition to the various intra-network connections, an embodiment may use inter-network connections 227, 228 or 237.
- These typically connect layers of similar depth (227, 237), but provisions may be made for connections between different depths (228).
- (Here, “depth” means “levels away from the input of left-hand network 220, or levels away from the output of right-hand network 230,” in recognition of the fact that the networks are to some degree mirror images of each other.)
- These connections may be referred to as “skip connections.”
- Conceptually, a skip connection provides a simple, direct way for the networks to pass through information that does not affect the “compression” and “synthesis” operations very much.
- For example, garment color is largely orthogonal to how fabric tends to drape and hang on a figure: a red dress and a blue dress of equivalent construction would look about the same on a model, except for the color itself. Thus, a skip connection can carry the color of the sample image directly through to the output network, rather than relying on the input network to recognize the color and communicate it to the generative network through the Z vector.
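- One common way to realize a skip connection (a sketch, not necessarily the patent's implementation) is to concatenate the same-depth encoder layer's output onto the generator layer's input, so features like color bypass the Z-vector bottleneck entirely:

```python
import numpy as np

def decode_with_skip(z_vector, skipped_features):
    """A generative-layer input built from both the Z-vector and
    features carried directly from a same-depth encoder layer (a skip
    connection), so information like garment color need not squeeze
    through the Z-vector."""
    return np.concatenate([z_vector, skipped_features])

encoder_layer_out = np.array([0.9, 0.1, 0.1])  # e.g., a dominant-red color cue
z = np.array([0.2, 0.5])
decoder_input = decode_with_skip(z, encoder_layer_out)
```

The generator's weights then decide how much of the skipped signal to use; an unhelpful skip can simply be down-weighted during training.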
- Returning to the idea that network 220 performs something like compression, some of the Z-Vector elements encode characteristics that the GAN has learned to recognize (and to use in image synthesis) after processing the training image pairs.
- For example, one Z-Vector element may control model skin tone.
- Another element may control model pose (including whether the model is facing the camera or turned to one side, or the position of the arms or legs).
- An element may control the style of shoe depicted in the synthesized image; these may be similar to one of the styles learned from the training images, but the output-image shoes are not simply copied from one of the training-image models' legs.
- Instead, the generative network 230 constructs an image, including shoes or other accessories, that looks like it might have been seen among the training images, if the input image 210 had had a corresponding model-garment companion image.
- Critically, adjusting the Z-Vector elements can cause the generative network to make a new image with a different skin tone, model pose, or shoe style.
- Z-Vector elements may also control details such as dress length, sleeve length, or collar style. That is, the synthesized image may show ways in which the garment might be altered, as well as ways in which the model might be altered.
- In one exemplary embodiment, a Z-Vector element controls model rotation, so even though the input dress image is shown head-on, a range of plausible dress-on-model images can be produced, with the model turning from left to right across the range.
- An example of this is shown in FIG. 3 . All 25 images were synthesized from a single image of a black dress by a network trained with a variety of dress/model pairs of photos.
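- A sweep like FIG. 3's can be produced by varying a single Z-vector element over a range while holding the rest fixed. In this sketch the element index and vector size are invented for illustration; each resulting vector would be fed to the generative network to render one image.

```python
import numpy as np

ROTATION_INDEX = 3  # hypothetical index of the "model rotation" element

def sweep_rotation(base_z, n_images=25):
    """Vary one Z-vector element over a range while holding the other
    elements fixed, as in FIG. 3's left-to-right model-rotation sweep."""
    variants = []
    for value in np.linspace(-1.0, 1.0, n_images):
        z = np.array(base_z, dtype=float)
        z[ROTATION_INDEX] = value
        variants.append(z)
    return variants  # each z would be fed to the generative network

variants = sweep_rotation(np.zeros(8))
```

Because only one element changes, the garment, skin tone, and accessories stay constant across the 25 synthesized images; only the pose rotates.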
- Although a Generative Adversarial Network (“GAN”) is the architecture depicted and described with reference to FIG. 2, other network architectures may also be suitable, including Recurrent Neural Networks (RNNs), Recurrent Inference Machines (RIMs), and Variational Autoencoders (VAEs).
- In FIG. 2, each node at a particular level was drawn with connections to all nodes at the next level, and a few inter-level connections were shown.
- In some embodiments, a neural network may be fully connected: the weighted output of each node may form part of the input signal to every other node (even including the node itself).
- Such a network is very difficult to depict in a two-dimensional figure, but it is a common topology, known to practitioners of ordinary skill.
- Another alternative is a convolutional neural network. This topology is more space-efficient than a fully-connected network, so it can often operate on larger input images (and produce higher-resolution output images).
- Convolutional networks are known to those of ordinary skill in the art, and can be used to good effect by adhering to the principles and approaches described herein.
- Input images may be about 192×256 pixels, with three color channels for each pixel (red, green & blue).
- The generative network may mirror the input network, starting with the Z vector and emitting a 192×256 synthetic color image.
- Not all elements of the Z vector correspond to a recognizable characteristic such as “skin tone,” “dress length,” “shoe style” or “model pose.”
- The effects of individual vector elements (and of subsets of the vector) may be determined empirically, or by providing additional information about training image pairs (e.g., by tagging the images with characteristic descriptions) and propagating the additional information through the network during training.
- Statistical techniques such as Principal Component Analysis (“PCA”) may also help identify combinations of Z-vector elements that correspond to recognizable characteristics.
- FIG. 4 outlines a method used by a complete application built around an image synthesizer according to an embodiment of the invention.
- a system operator initializes and trains a neural network as described above, using a training set of image pairs where one member of each pair depicts a garment, and the other member of each pair depicts a model wearing the garment ( 410 ).
- Next, a database of garment images is populated (420). These images are similar to the first images of the training set, and may even include the training images. They are, for example, images of garments offered for sale by the system operator.
- When a customer visits the operator's system (e.g., when she accesses an e-commerce web site), she may search or browse the catalog of garments using any suitable prior-art method (430).
- Garments may be grouped and presented by color, style, weight, designer, size, price, or any other desired arrangement.
- When the customer selects a garment, the system synthesizes and displays an image of the garment on a model (450).
- The user may be offered an array of controls connected to suitable elements of the Z-vector, and she may adjust those parameters as desired (460).
- When a parameter is adjusted, the system synthesizes and displays a new image of the garment on the model (450).
- If the user decides not to purchase this garment (470), for example by clicking a “Back” button or returning to a list of search results, she may continue to view other garments for sale (430). If the user decides to purchase the garment (480), then information about the selected garment (e.g., a SKU) is passed to a prior-art order-fulfillment process for subsequent activity (490).
- An embodiment of the invention may combine a garment-selection control with controls for other aspects of image generation (skin tone, pose, body shape, accessories, etc.). Then, by manipulating individual controls of this plurality of controls, the user can change the garment (leaving the skin tone, pose, and accessories alone), or switch among accessories (leaving skin tone, pose and garment alone).
- This embodiment permits quick, self-directed comparisons among complete “outfits” or “looks,” a capability that is currently provided at significant expense by human fashion coordinators, and consequently mostly unavailable to shoppers of ordinary or modest means.
- FIG. 5 outlines another application of an image synthesis network as described above.
- the method begins by initializing and training an image synthesis network using a training set of image pairs ( 500 ).
- the system acquires a garment image ( 510 ), synthesizes an image of the garment on a model ( 520 ), and stores the synthesized image ( 530 ). If there are more garment images to process ( 540 ), these steps are repeated, resulting in a library of synthesized images showing a variety of garments of which images were acquired and processed.
- the image synthesis may use randomly-chosen Z-vector parameters, which will yield a variety of different model skin tones, shapes, poses, and accessories in the synthesized image libraries.
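- The batch-library loop of steps 510-540 with randomly chosen Z-vectors can be sketched as follows. `synthesize` is a stub for the trained network, and the Z-vector size and file names are invented for the example.

```python
import random

def synthesize(garment_image, z_vector):
    # Stub for the trained generative network of FIG. 5 (step 520).
    return {"garment": garment_image, "z": z_vector}

def build_library(garment_images, z_size=8, seed=42):
    """Steps 510-540: for each acquired garment image, draw a random
    Z-vector (varying model skin tone, pose, accessories, etc.) and
    store the synthesized garment-on-model image (step 530)."""
    rng = random.Random(seed)  # seeded for reproducible layouts
    library = []
    for garment in garment_images:
        z = [rng.uniform(-1.0, 1.0) for _ in range(z_size)]
        library.append(synthesize(garment, z))
    return library

library = build_library(["dress_001.jpg", "dress_002.jpg"])
```

Each entry gets a different random Z-vector, so the resulting catalog or web pages show a natural variety of models rather than one repeated figure.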
- synthesized images from the library may be incorporated into a catalog layout ( 560 ) and printed ( 570 ); or a plurality of static web pages comprising one or more synthetic images may be generated ( 580 ), and those web pages may be served to visitors to a web site ( 590 ). This process may reduce the cost to produce a catalog of product images or a website displaying many garments.
- The method may be integrated with a prior-art garment-processing sequence, such as that of a consignment dealer that receives many heterogeneous garments from a variety of manufacturers.
- These garments may be passed through a steam chamber to dissipate storage or transport odors and to remove wrinkles.
- the garments may be placed on a mannequin during this process, and an image of the freshly-steamed garment may be automatically captured at the end. This image may be delivered to the process outlined in FIG. 5 , as the “acquired image” at 510 .
- Multiple views of the garment may be acquired, and may permit the synthesis of a greater variety of “garment on model” images, provided that the neural network has been trained with a correspondingly greater variety of image pairs.
- An embodiment of the invention may be a machine-readable medium, including without limitation a non-transient machine-readable medium, having stored thereon data and instructions to cause a programmable processor to perform operations as described above.
- Alternatively, the operations might be performed by specific hardware components that contain hardwired logic, or by any combination of programmed computer components and custom hardware components.
- Instructions for a programmable processor may be stored in a form that is directly executable by the processor (“object” or “executable” form), or the instructions may be stored in a human-readable text form called “source code” that can be automatically processed by a development tool commonly known as a “compiler” to produce executable code. Instructions may also be specified as a difference or “delta” from a predetermined version of a basic source code. The delta (also called a “patch”) can be used to prepare instructions to implement an embodiment of the invention, starting with a commonly-available source code package that does not contain an embodiment.
- The instructions for a programmable processor may be treated as data and used to modulate a carrier signal, which can subsequently be sent to a remote receiver, where the signal is demodulated to recover the instructions, and the instructions are executed to implement the methods of an embodiment at the remote receiver.
- Such modulation and transmission are known as “serving” the instructions, while receiving and demodulating are often called “downloading.” Thus, one embodiment “serves” (i.e., encodes and sends) the instructions of an embodiment to a client, often over a distributed data network like the Internet.
- The instructions thus transmitted can be saved on a hard disk or other data storage device at the receiver to create another embodiment of the invention, meeting the description of a non-transient machine-readable medium storing data and instructions to perform some of the operations discussed above. Compiling (if necessary) and executing such an embodiment at the receiver may result in the receiver performing operations according to a third embodiment.
- The present invention also relates to apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer-readable storage medium, including without limitation any type of disk (floppy disks, optical disks, compact disc read-only memories (“CD-ROMs”), and magneto-optical disks), read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (“EPROMs”), electrically-erasable read-only memories (“EEPROMs”), magnetic or optical cards, or any other type of media suitable for storing computer instructions.
Abstract
- A neural network is trained with pairs of images, where one image of a pair shows a garment, and the other image shows a model wearing the garment. Then, a new image is presented to the trained network, and a synthetic image that might be the new image's pair is automatically generated. The synthetic image is displayed to a user.
Description
- This is an original U.S. patent application.
- Most people wear clothes, and many people choose and purchase some of their clothes from vendors who do not (or cannot) offer the customer an opportunity to try on the garment before purchase. Catalog and online retailers (among others) often go to great lengths to produce photos of their garments worn by representative models and presented in common situations; these photos are important to convey the impression of the fit and appearance of a garment to a customer who cannot see it in person.
- Embodiments of the invention use neural networks to synthesize images of models wearing garments from less-expensive images of the garments alone. Beneficially, the synthesis process exposes controls that can be used to adjust characteristics of the synthesized image. For example, the skin tone, body shape, pose and other characteristics of the model in the synthesized image may be varied over a useful range. This permits a user (e.g., a customer) to adjust a synthetic “as-worn” image to resemble the user more closely. Thus, in addition to reducing the cost of producing representative photos, an embodiment allows a user to get a better idea of how a particular garment may appear when worn by the user. This improved representation may improve a retailer's chance of making a sale (or of avoiding the return of a garment that, upon trying on, the purchaser did not like).
-
FIG. 1 shows an overview of a central portion of an embodiment's operations. The method begins by initializing a neural network (100). This network is trained using pairs of images (110): one image of each pair shows a garment (e.g., a garment laid out flat on a neutral surface); and the other image of each pair shows a model wearing the garment. Once training is complete, useful parameters from the “Z vector” (described below) are identified (120). - Now, in typical use, a garment image is provided to the trained network (130). The garment image need not be (and preferably is not) one of the training images, but instead is an image without a corresponding mate. The network synthesizes an image of this garment on a model, based on the network's training and parameters of the Z vector (140). The synthetic image is displayed to the user (150). The user may adjust a Z vector parameter (160) and a new image is synthesized (140) and displayed (150). The adjustment and re-synthesizing may be repeated as often as desired to produce a variety of synthetic images of a model wearing the garment.
- The Z vector parameters may control characteristics such as the model's skin tone, body shape and size, pose or position, and even accessories (e.g., shoe style, handbags, glasses, scarves or jewelry), so the user may be able to control the synthesis process to produce images that more closely resemble what the garment would look like if worn by the user.
-
FIG. 2 shows a conceptual representation of a Generative Adversarial Network, which is one type of neural network that can be used in an embodiment of the invention. Information can generally be understood as passing from left to right across the drawing. An image of a garment 210 is communicated to an input layer 221 of a first neural network 220. The input layer may have approximately as many input elements as there are pixels in the input image (i.e., many more elements than are shown in this drawing). Or, an input layer may comprise elements for each color channel of each input pixel (e.g., red, green and blue elements for each pixel). The input layer 221 is connected to an intermediate layer 223 by a network of variable connection weights 222, and the intermediate layer 223 is connected to an output layer 225 by a similar network 224. The connections are shown all-to-all, but each connection may be associated with a variable weight, so some connections may be effectively absent (weight=0). In addition to layer-to-layer connections, an embodiment may use feedback connections 226, which may go to an immediately preceding layer, or to an earlier layer. Each neural network may have more than the three layers shown here. In a preferred embodiment, about seven (7) layers are used.
- Note that each layer of neural network 220 (traveling left to right) is smaller than the preceding layer. Thus, the network can be thought of as performing a kind of compression, resulting in Z-Vector 250, which in many embodiments is a vector of real numbers that comprises the outputs of the last layer of network 220.
- Z-Vector 250 is used as an input to a second
neural network 230. This network is similar in structure to network 220 (it comprises layers of model neurons interconnected by variable weight connections 232, 234), but the number of elements in each layer increases (in the direction of data flow). Like network 220, network 230 may include feedback connections (e.g., 236) which carry data in the “opposite” direction. The output of network 230, once the GAN has been trained, is a synthetic image 240 that represents what garment 210 may look like when worn by a model.
- In addition to the various intra-network connections, an embodiment may use inter-network connections joining corresponding layers of the two networks (e.g., a layer a given number of levels away from the input of left-hand network 220 may connect to the layer the same number of levels away from the output of right-hand network 230, in recognition of the fact that the networks are to some degree mirror images of each other). These connections may be referred to as “skip connections.” Conceptually, a skip connection provides a simple, direct way for the networks to pass through information that does not affect the “compression” and “synthesis” operations very much. For example, garment color is largely orthogonal to questions of how fabric tends to drape and hang on a figure: a red dress and a blue dress of equivalent construction would look about the same on a model, except for the color itself. Thus, a skip connection can carry the color of the sample image directly through to the output network, rather than relying on the input network to recognize the color and communicate it to the generative network through the Z vector.
- Returning to the idea that
network 220 performs something like compression, it turns out that some of the Z-Vector elements encode characteristics that the GAN has learned to recognize (and to use in image synthesis) after processing the training image pairs. For example, one Z-Vector element may control model skin tone. Another element may control model pose (including whether the model is facing the camera or turned to one side, or the position of the arms or legs). An element may control the style of shoe depicted in the synthesized image; these may be similar to one of the styles learned from the training images, but the output-image shoes are not simply copied from one of the training-image models. Instead, the generative network 230 constructs an image, including shoes or other accessories, that looks like it might have been seen among the training images, if the input image 210 had had a corresponding model-garment companion image. And critically, adjusting the Z-Vector elements can cause the generative network to make a new image with a different skin tone, model pose, or shoe style. Z-Vector elements may also control details such as dress length, sleeve length, or collar style. That is, the synthesized image may show ways in which the garment might be altered, as well as ways in which the model might be altered. In one exemplary embodiment, a Z-Vector element controls model rotation, so even though the input dress image is shown head-on, a range of plausible dress-on-model images can be produced, with the model turning from left to right across the range. An example of this is shown in FIG. 3. All 25 images were synthesized from a single image of a black dress by a network trained with a variety of dress/model photo pairs.
- Although the Generative Adversarial Network (“GAN”) depicted and described with reference to
FIG. 2 is one type of neural network that is well-suited for use in an embodiment, other known types are also usable. For example, network configurations known in the art as Recurrent Neural Networks (“RNNs”), Recurrent Inference Machines (“RIMs”), and Variational Autoencoders (“VAEs”) can all be trained with pairs of images as described above, can generate new synthetic images that could plausibly be the pair-mate of a new input image, and can expose quantitative control parameters that can be varied proportionally to adjust features or characteristics of the synthetic images. The foregoing aspects of a neural network are important for use in an embodiment of the invention, namely: -
- Trainable with pairs of images;
- Generates a synthetic image from a new input image, where the synthetic image resembles a “pair” image corresponding to the new input image (i.e., the new input image and the synthetic image could plausibly have been a pair in the training set); and
- Exposes quantitative control parameters that can be manipulated to change characteristics of the synthetic image, including characteristics such as:
-
Model characteristics | Garment characteristics
---|---
Model skin tone | Garment length
Model pose | Garment sleeve style
Model body weight, shape | Garment collar style
Accessories | Garment fit tightness
- One feature of a neural network suitable for use in an embodiment was mentioned briefly above, but bears further discussion. In
FIG. 2 , each node at a particular level was drawn with connections to all nodes at the next level. In addition, a few inter-level connections were shown. However, a neural network may be fully-connected: the weighted output of each node may form part of the input signal to every other node (even including the node itself). Such a network is very difficult to depict in a two-dimensional figure, but it is a common topology, known to practitioners of ordinary skill. Another alternative is a convolutional neural network. This topology is more space-efficient than a fully-connected network, so it can often operate on larger input images (and produce higher-resolution output images). Again, convolutional networks are known to those of ordinary skill in the art, and can be used to good effect by adhering to the principles and approaches described herein. - In an exemplary embodiment of the invention using a Generative Adversarial Network, input images may be about 192×256 pixels, with three color channels for each pixel (red, green & blue). Thus, the input layer comprises 192×256×3=147,456 neuron model elements. Subsequent layers decrease by a factor of two, and the Z vector ends up as 512×4×3=6,144 scalars. The generative network may mirror the input network, starting with the Z vector and emitting a 192×256 synthetic color image.
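The dimension bookkeeping in the exemplary embodiment above can be checked with a short script. The input and Z-vector sizes are taken from the text; the strict halving schedule between intermediate layers is an assumption for illustration.

```python
# Layer-size arithmetic for the exemplary 192x256 RGB embodiment.
# The halving schedule for intermediate layers is an illustrative assumption.
def layer_sizes(width=192, height=256, channels=3, z_size=512 * 4 * 3):
    """Return encoder layer sizes, halving until reaching the Z-vector size."""
    sizes = [width * height * channels]   # input layer: one element per channel per pixel
    while sizes[-1] // 2 >= z_size:
        sizes.append(sizes[-1] // 2)      # each subsequent layer halves
    sizes.append(z_size)                  # final "compressed" Z vector
    return sizes

sizes = layer_sizes()
print(sizes[0])    # 147456 input elements (192 x 256 x 3)
print(sizes[-1])   # 6144-element Z vector (512 x 4 x 3)
```

The generative network would mirror this list in reverse, growing from the Z vector back to a full-resolution synthetic image.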
- Not all elements of the Z vector correspond to a recognizable characteristic such as “skin tone,” “dress length,” “shoe style” or “model pose.” The effects of individual vector elements (and of subsets of the vector) may be determined empirically, or by providing additional information about training image pairs (e.g., by tagging the images with characteristic descriptions) and propagating the additional information through the network during training.
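Determining an element's effect empirically can be as simple as perturbing it and observing which outputs change. In this sketch a random linear map stands in for the trained generative network; the vector and image sizes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(scale=0.1, size=(16, 64))   # stub mapping: 16-element Z -> 64-"pixel" image

def synthesize(z):
    return np.tanh(z @ W)                  # stand-in for the trained generative network

# Perturb element 5 and see which output "pixels" respond.
z = rng.normal(size=16)
base = synthesize(z)
z_perturbed = z.copy()
z_perturbed[5] += 0.5
delta = np.abs(synthesize(z_perturbed) - base)
affected = np.flatnonzero(delta > 1e-9)
print(affected.size)                       # count of outputs influenced by element 5
```

Repeating this probe for each element (or for tagged training characteristics) yields the empirical characterization described above.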
- One favorable approach to determining Z-Vector component effects is to perform Principal Component Analysis (“PCA”) on the Z Vector, to identify a smaller vector Z′ whose components are largely linearly independent. The elements of Z′ may be tested to determine their effects, and elements that affect characteristics of interest may be exposed to the user to control synthetic image generation.
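A minimal PCA of this kind can be performed via SVD on mean-centered Z vectors. Here random data stands in for Z vectors collected from many garment encodings; the sample count, vector size, and 95% variance cutoff are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 64))        # stand-in for Z vectors collected from many garments

Zc = Z - Z.mean(axis=0)                # mean-center before PCA
U, S, Vt = np.linalg.svd(Zc, full_matrices=False)

# Project onto the top components covering ~95% of variance -> smaller Z'.
explained = (S ** 2) / (S ** 2).sum()
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
Z_prime = Zc @ Vt[:k].T

print(Z_prime.shape)                   # (1000, k) with k <= 64
```

Because principal-component scores are uncorrelated by construction, the columns of Z′ can be varied one at a time, matching the "largely linearly independent" property sought above.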
-
FIG. 4 outlines a method used by a complete application built around an image synthesizer according to an embodiment of the invention. To begin, a system operator initializes and trains a neural network as described above, using a training set of image pairs where one member of each pair depicts a garment, and the other member of each pair depicts a model wearing the garment (410). Next, a database of garment images is populated (420). These images are similar to the first images of the training set, and may even include the training images. These are, for example, images of garments offered for sale by the system operator. - When a customer visits the operator's system (e.g., when she accesses an e-commerce web site), she may search or browse the catalog of garments using any suitable prior-art method (430). For example, garments may be grouped and presented by color, style, weight, designer, size, price, or any other desired arrangement. When the user selects a garment (440), the system synthesizes and displays an image of the garment on a model (450). The user may be offered an array of controls which are connected to suitable elements of the Z-vector, and she may adjust those parameters as desired (460). When a parameter is adjusted, the system synthesizes and displays a new image of the garment on the model (450).
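The select/synthesize/adjust loop (steps 440-460) reduces to a simple session sketch. The function names, the two-element control vector, and the string "images" are all illustrative stand-ins, not details from the patent.

```python
# Sketch of the FIG. 4 selection/adjust loop with stub functions.
def synthesize(garment_id, z):
    """Stand-in for the trained network's garment-on-model output."""
    return f"image({garment_id}, {tuple(z)})"

def session(garment_id, adjustments):
    z = [0.0, 0.0]                               # exposed Z-vector controls (toy)
    shown = [synthesize(garment_id, z)]          # step 450: initial image
    for index, value in adjustments:             # step 460: user adjusts a control
        z[index] = value
        shown.append(synthesize(garment_id, z))  # step 450 again: re-synthesize
    return shown

images = session("SKU-123", [(0, 0.3), (1, -0.2)])
print(len(images))  # 3: the initial image plus one per adjustment
```

Each adjustment produces a distinct image while the garment selection stays fixed, mirroring the loop between steps 450 and 460.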
- If the user decides not to purchase this garment (470), for example by clicking a “Back” button or returning to a list of search results, she may continue to view other garments for sale (430). If the user decides to purchase the garment (480), then information about the selected garment (e.g., a SKU) is passed to a prior-art order-fulfillment process for subsequent activity (490).
- An embodiment of the invention may combine a garment selection control with controls for other aspects of image generation (skin tone, pose, body shape, accessories, etc.). Then, by manipulating individual controls of this plurality of controls, the user can change the garment (leaving the skin tone, pose, and accessories alone), or switch among accessories (leaving skin tone, pose and garment alone). This embodiment permits quick, self-directed comparisons among complete “outfits” or “looks,” a capability that is currently provided at significant expense by human fashion coordinators, and is consequently mostly unavailable to shoppers of ordinary or modest means.
-
FIG. 5 outlines another application of an image synthesis network as described above. As before, the method begins by initializing and training an image synthesis network using a training set of image pairs (500). Next, the system acquires a garment image (510), synthesizes an image of the garment on a model (520), and stores the synthesized image (530). If there are more garment images to process (540), these steps are repeated, resulting in a library of synthesized images covering the garments whose images were acquired and processed. The image synthesis may use randomly-chosen Z-vector parameters, which will yield a variety of different model skin tones, shapes, poses, and accessories in the synthesized images.
- When there are no more garment images to process (550), synthesized images from the library may be incorporated into a catalog layout (560) and printed (570); or a plurality of static web pages comprising one or more synthetic images may be generated (580), and those web pages may be served to visitors to a web site (590). This process may reduce the cost of producing a printed catalog of product images or a website displaying many garments.
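The batch pipeline of FIG. 5 reduces to a loop like the following. The stub synthesize function, toy image sizes, and Z-vector range are assumptions for illustration; only the acquire/synthesize/store flow is from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def synthesize(garment, z):
    """Stub for the trained network: garment image + Z vector -> model image."""
    return np.clip(garment.mean() + z, 0.0, 1.0)

garments = [rng.random((8, 8)) for _ in range(5)]   # acquired garment images (510), toy-sized
library = []
for garment in garments:                            # loop over steps 510-540
    z = rng.uniform(-0.5, 0.5, size=16)             # randomly chosen Z-vector parameters
    library.append(synthesize(garment, z))          # synthesize (520) and store (530)

print(len(library))                                 # one stored image per acquired garment
```

The random Z draw is what produces the variety of model appearances across the stored library before the catalog or web pages are generated.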
- It is appreciated that the method may be integrated with a prior-art garment processing sequence, such as that of a consignment dealer that receives many heterogeneous garments from a variety of manufacturers. These garments may be passed through a steam chamber to dissipate storage or transport odors and to remove wrinkles. The garments may be placed on a mannequin during this process, and an image of the freshly-steamed garment may be automatically captured at the end. This image may be delivered to the process outlined in
FIG. 5 , as the “acquired image” at 510. Multiple views of the garment may be acquired, and may permit the synthesis of a greater variety of “garment on model” images, provided that the neural network has been trained with a correspondingly greater variety of image pairs. - An embodiment of the invention may be a machine-readable medium, including without limitation a non-transient machine-readable medium, having stored thereon data and instructions to cause a programmable processor to perform operations as described above. In other embodiments, the operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmed computer components and custom hardware components.
- Instructions for a programmable processor may be stored in a form that is directly executable by the processor (“object” or “executable” form), or the instructions may be stored in a human-readable text form called “source code” that can be automatically processed by a development tool commonly known as a “compiler” to produce executable code. Instructions may also be specified as a difference or “delta” from a predetermined version of a basic source code. The delta (also called a “patch”) can be used to prepare instructions to implement an embodiment of the invention, starting with a commonly-available source code package that does not contain an embodiment.
- In some embodiments, the instructions for a programmable processor may be treated as data and used to modulate a carrier signal, which can subsequently be sent to a remote receiver, where the signal is demodulated to recover the instructions, and the instructions are executed to implement the methods of an embodiment at the remote receiver. In the vernacular, such modulation and transmission are known as “serving” the instructions, while receiving and demodulating are often called “downloading.” In other words, one embodiment “serves” (i.e., encodes and sends) the instructions of an embodiment to a client, often over a distributed data network like the Internet. The instructions thus transmitted can be saved on a hard disk or other data storage device at the receiver to create another embodiment of the invention, meeting the description of a non-transient machine-readable medium storing data and instructions to perform some of the operations discussed above. Compiling (if necessary) and executing such an embodiment at the receiver may result in the receiver performing operations according to a third embodiment.
- In the preceding description, numerous details were set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some of these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
- Some portions of the detailed descriptions may have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the preceding discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, including without limitation any type of disk including floppy disks, optical disks, compact disc read-only memory (“CD-ROM”), and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (“EPROMs”), electrically-erasable programmable read-only memories (“EEPROMs”), magnetic or optical cards, or any type of media suitable for storing computer instructions.
- The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform some method steps. The required structure for a variety of these systems will be recited in the claims below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
- The applications of the present invention have been described largely by reference to specific examples and in terms of particular allocations of functionality to certain hardware and/or software components. However, those of skill in the art will recognize that configurable, synthetic images of models wearing garments based on images of the garments alone can also be produced by software and hardware that distribute the functions of embodiments of this invention differently than herein described. Such variations and implementations are understood to be captured according to the following claims.
Claims (15)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/634,171 US10304227B2 (en) | 2017-06-27 | 2017-06-27 | Synthesizing images of clothing on models |
US16/422,278 US10755479B2 (en) | 2017-06-27 | 2019-05-24 | Systems and methods for synthesizing images of apparel ensembles on models |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/634,171 US10304227B2 (en) | 2017-06-27 | 2017-06-27 | Synthesizing images of clothing on models |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/422,278 Continuation-In-Part US10755479B2 (en) | 2017-06-27 | 2019-05-24 | Systems and methods for synthesizing images of apparel ensembles on models |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180374249A1 true US20180374249A1 (en) | 2018-12-27 |
US10304227B2 US10304227B2 (en) | 2019-05-28 |
Family
ID=64693443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/634,171 Active 2037-08-02 US10304227B2 (en) | 2017-06-27 | 2017-06-27 | Synthesizing images of clothing on models |
Country Status (1)
Country | Link |
---|---|
US (1) | US10304227B2 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10325442B2 (en) | 2016-10-12 | 2019-06-18 | Uber Technologies, Inc. | Facilitating direct rider driver pairing for mass egress areas |
US10355788B2 (en) | 2017-01-06 | 2019-07-16 | Uber Technologies, Inc. | Method and system for ultrasonic proximity service |
CA3082886A1 (en) * | 2017-11-02 | 2019-05-09 | Measur3D, Llc | Clothing model generation and display system |
US10820650B2 (en) * | 2018-02-27 | 2020-11-03 | Levi Strauss & Co. | Surface projection for apparel in an apparel design system |
US20210073886A1 (en) | 2019-08-29 | 2021-03-11 | Levi Strauss & Co. | Digital Showroom with Virtual Previews of Garments and Finishes |
US11544884B2 (en) * | 2020-12-11 | 2023-01-03 | Snap Inc. | Virtual clothing try-on |
US11636654B2 (en) | 2021-05-19 | 2023-04-25 | Snap Inc. | AR-based connected portal shopping |
US11580592B2 (en) | 2021-05-19 | 2023-02-14 | Snap Inc. | Customized virtual store |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5680528A (en) * | 1994-05-24 | 1997-10-21 | Korszun; Henry A. | Digital dressing room |
GB0101371D0 (en) * | 2001-01-19 | 2001-03-07 | Virtual Mirrors Ltd | Production and visualisation of garments |
JP4473754B2 (en) * | 2005-03-11 | 2010-06-02 | 株式会社東芝 | Virtual fitting device |
US7657340B2 (en) * | 2006-01-31 | 2010-02-02 | Dragon & Phoenix Software, Inc. | System, apparatus and method for facilitating pattern-based clothing design activities |
US9959453B2 (en) * | 2010-03-28 | 2018-05-01 | AR (ES) Technologies Ltd. | Methods and systems for three-dimensional rendering of a virtual augmented replica of a product image merged with a model image of a human-body feature |
US20110298897A1 (en) * | 2010-06-08 | 2011-12-08 | Iva Sareen | System and method for 3d virtual try-on of apparel on an avatar |
US9147207B2 (en) * | 2012-07-09 | 2015-09-29 | Stylewhile Oy | System and method for generating image data for on-line shopping |
US9760792B2 (en) * | 2015-03-20 | 2017-09-12 | Netra, Inc. | Object detection and classification |
US20170278135A1 (en) * | 2016-02-18 | 2017-09-28 | Fitroom, Inc. | Image recognition artificial intelligence system for ecommerce |
US10290136B2 (en) | 2016-08-10 | 2019-05-14 | Zeekit Online Shopping Ltd | Processing user selectable product images and facilitating visualization-assisted coordinated product transactions |
US10019655B2 (en) * | 2016-08-31 | 2018-07-10 | Adobe Systems Incorporated | Deep-learning network architecture for object detection |
-
2017
- 2017-06-27 US US15/634,171 patent/US10304227B2/en active Active
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190347831A1 (en) * | 2018-05-08 | 2019-11-14 | Myntra Designs Private Limited | Generative system and method for creating fashion products |
US11055880B2 (en) * | 2018-05-08 | 2021-07-06 | Myntra Designs Private Limited | Generative system and method for creating fashion products |
US10783398B1 (en) * | 2018-10-22 | 2020-09-22 | Shutterstock, Inc. | Image editor including localized editing based on generative adversarial networks |
US10949706B2 (en) | 2019-01-16 | 2021-03-16 | Microsoft Technology Licensing, Llc | Finding complementary digital images using a conditional generative adversarial network |
WO2020149960A1 (en) * | 2019-01-16 | 2020-07-23 | Microsoft Technology Licensing, Llc | Finding complementary digital images using a conditional generative adversarial network |
US11073975B1 (en) * | 2019-03-29 | 2021-07-27 | Shutterstock, Inc. | Synthetic image generation in response to user creation of image |
CN110021051A (en) * | 2019-04-01 | 2019-07-16 | 浙江大学 | One kind passing through text Conrad object image generation method based on confrontation network is generated |
CN110021051B (en) * | 2019-04-01 | 2020-12-15 | 浙江大学 | Human image generation method based on generation of confrontation network through text guidance |
CN110348330A (en) * | 2019-06-24 | 2019-10-18 | 电子科技大学 | Human face posture virtual view generation method based on VAE-ACGAN |
CN111339918A (en) * | 2020-02-24 | 2020-06-26 | 深圳市商汤科技有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN111709874A (en) * | 2020-06-16 | 2020-09-25 | 北京百度网讯科技有限公司 | Image adjusting method and device, electronic equipment and storage medium |
CN112100908A (en) * | 2020-08-31 | 2020-12-18 | 西安工程大学 | Garment design method for generating confrontation network based on multi-condition deep convolution |
CN112991160A (en) * | 2021-05-07 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
US20230086880A1 (en) * | 2021-09-20 | 2023-03-23 | Revery.ai, Inc. | Controllable image-based virtual try-on system |
Also Published As
Publication number | Publication date |
---|---|
US10304227B2 (en) | 2019-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10304227B2 (en) | Synthesizing images of clothing on models | |
US10755479B2 (en) | Systems and methods for synthesizing images of apparel ensembles on models | |
US10922898B2 (en) | Resolving virtual apparel simulation errors | |
US20170004567A1 (en) | System and method for providing modular online product selection, visualization and design services | |
CN110084874A (en) | For the image Style Transfer of threedimensional model | |
JP7489921B2 (en) | Apparatus and method for generating image/text-based designs | |
JP2016502713A (en) | Clothing matching system and method | |
Jiang et al. | Deep learning for fashion style generation | |
CN111325226A (en) | Information presentation method and device | |
US20210182950A1 (en) | System and method for transforming images of retail items | |
CN108109049A (en) | Clothing matching Forecasting Methodology, device, computer equipment and storage medium | |
US20230086880A1 (en) | Controllable image-based virtual try-on system | |
US20240037869A1 (en) | Systems and methods for using machine learning models to effect virtual try-on and styling on actual users | |
KR20230153451A (en) | An attempt using inverse GANs | |
US11188790B1 (en) | Generation of synthetic datasets for machine learning models | |
Prakash et al. | Virtual fashion mirror | |
WO2020242718A1 (en) | Systems and methods for synthesizing images of apparel ensembles on models | |
Li et al. | POVNet: Image-based virtual try-on through accurate warping and residual | |
JP7507791B2 (en) | SYSTEM AND METHOD FOR SYNTHESISING APPAREL ENSEMBLES FOR MODELS - Patent application | |
CN107257988A (en) | Dress ornament analogue means and method | |
Pei | The effective communication system using 3D scanning for mass customized design | |
KR20200071196A (en) | A virtual fitting system based on face recognition | |
US11804023B1 (en) | Systems and methods for providing a virtual dressing room and a virtual stylist | |
KR102652199B1 (en) | System and method for generating video content based on pose estimation | |
Pandey et al. | CLOTON: A GAN based approach for Clothing Try-On |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MAD STREET DEN, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:COLBERT, MARCUS C., DR.;REEL/FRAME:042826/0129 Effective date: 20170625 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: SILICON VALLEY BANK, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:MAD STREET DEN INC.;REEL/FRAME:061008/0164 Effective date: 20220829 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |