NZ736107A - Virtual trying-on experience - Google Patents
Virtual trying-on experience
- Publication number: NZ736107A
- Authority: NZ (New Zealand)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/344—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/16—Cloth
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2004—Aligning objects, relative positioning of parts
Abstract
There is provided a method of providing a virtual trying-on experience to a user comprising: extracting at least one image from a video including a plurality of video frames of a user in different orientations to provide at least one extracted image; acquiring 3D models of an item to be tried on the user and of a generic representation of a human; combining the acquired 3D models with the at least one extracted image as the background; and generating an output image representative of the virtual trying-on experience. There is also provided apparatus to carry out the methods.
Description
Virtual Trying-On Experience
Field of the invention
Embodiments of the invention relate to a computer implemented method for providing a
visual representation of an item being tried on a user.
Summary
The present invention provides a method of providing a virtual trying on experience to a user
as described in the accompanying claims.
In a particular aspect, the present invention provides a method of providing a virtual trying
on experience to a user comprising:
extracting at least one image from a video including a plurality of video frames of a
user in different orientations to provide at least one extracted image;
acquiring a 3D model of an item to be tried on the user and a 3D model of a generic
representation of a human; and
combining the acquired 3D models with at least one extracted image, having the at
least one extracted image as a background, to generate an output image representative of the
virtual trying-on experience; wherein each of the 3D models comprises an origin point, and
the combining the acquired 3D models with the at least one extracted image comprises
aligning the origin points of each of the 3D models in 3D space.
In another particular aspect, the present invention provides a method of providing a virtual
trying on experience for a user, comprising:
receiving a plurality of video frames of a user’s head in different orientations to
provide captured oriented user images;
identifying an origin point on a 3D model of a generic user;
identifying an origin point on a 3D model of a user-selected item to be tried on;
aligning the origin points of the 3D model of the generic user and the 3D model of the
item to be tried on;
combining each captured oriented user image as a background with a rendered
representation of the user-selected item to be tried on based on the aligned 3D model of the
generic user and the 3D model of the item to provide a series of combined images
representative of the virtual trying on experience; and
displaying the series of combined images.
Specific examples of the invention are set forth in the dependent claims.
These and other aspects of the invention will be apparent from and elucidated with reference
to the examples described hereinafter.
Brief description of the drawings
Further details, aspects and embodiments of the invention will be described, by way of
example only, with reference to the drawings. In the drawings, like reference numbers are used to
identify like or functionally similar elements. Elements in the figures are illustrated for simplicity
and clarity and have not necessarily been drawn to scale.
Figure 1 shows an example method of providing a virtual trying on experience to a user
according to an example embodiment of the invention;
Figure 2 shows first and second more detailed portions of the method of Figure 1, according
to an example embodiment of the invention;
Figure 3 shows a third more detailed portion of the method of Figure 1, according to an
example embodiment of the invention;
Figure 4 shows a high level diagram of the face tracking method, according to an example
embodiment of the invention;
Figure 5 shows how the method retrieves faces in video sequences, according to an example
embodiment of the invention;
Figure 6 shows a detected face, according to an example embodiment of the invention;
Figure 7 shows detected features of a face, according to an example embodiment of the
invention;
Figure 8 shows a pre-processing phase of the method that has the objective to find the most
reliable frame containing a face from the video sequence, according to an example embodiment of
the invention;
Figure 9 shows an optional face model building phase of the method that serves to construct
a suitable face model representation, according to an example embodiment of the invention;
Figure 10 shows a processed video frame along with its corresponding (e.g. generic) 3D
model of a head, according to an example embodiment of the invention;
Figure 11 shows a sequential face tracking portion of the disclosed method, according to an
example embodiment of the invention;
Figure 12 shows an exemplary embodiment of computer hardware on which the disclosed
method may be run;
Figure 13 shows another exemplary embodiment of computer hardware on which the
disclosed method may be run.
Detailed description
Because the illustrated embodiments of the present invention may for the most part be
implemented using electronic components and circuits known to those skilled in the art, details will
not be explained in any greater extent than that considered necessary to illustrate the invention to a
person skilled in the relevant art. This is done for the understanding and appreciation of the
underlying concepts of the present invention, without unduly obfuscating or distracting from the
teachings of the present invention.
Examples provide a method, apparatus and system for generating “a virtual try-on
experience” of an item on a user, such as a pair of spectacles/glasses being tried on a user’s head.
The virtual try-on experience may be displayed on a computer display, for example on a
smartphone or tablet screen. Examples also provide a computer program (or “app”) comprising
instructions, which when executed by one or more processors, carry out the disclosed methods. The
disclosed virtual try on experience methods and apparatuses allow a user to see what a selected
item would look like on their person, typically their head. Whilst the following has been cast in
terms of trying on glasses on a human head, similar methods may also be used to virtually try on
any other readily 3D model-able items that may be worn or attached to another object, especially a
human object, including, but not limited to: earrings, tattoos, shoes, makeup, and the like.
Examples may use one or more generic 3D models of a human head, together with one or
more 3D models of the item(s) to be tried on, for example models of selected pairs of glasses. The
one or more generic 3D models of a human head may include a female generic head and a male
generic head. In some embodiments, different body shape generic head 3D models may be
provided and selected between to be used in the generation of the “virtual try-on experience”. For
example, the different body shape generic heads may comprise different widths and/or heights of
heads, or hat sizes.
According to some examples, the 3D models (of both the generic human heads and/or the
items to be placed on the head) may be placed into a 3D space by reference to an origin. The origin
of the 3D models may be defined as a location in the 3D space at which the coordinates of each 3D
model is to be referenced from, in order to locate any given portion of the 3D model. The origin of
each model may correspond to one another, and to a specified nominally universal location, such as
the location of a bridge of the nose. Thus, the origins of the 3D models may be mutually co-located
in the 3D space, together with a corresponding location of the item to be virtually tried on, so that
they may be naturally/suitably aligned. There may also be provided one or more attachment points
for the item being tried on to the 3D model of a generic human head. In the trying on of glasses
example, these may be, for example, where the arms of the glasses rest on a human ear.
The origin is not in itself a point in the model. It is merely a location by which points in the
3D models (both of the generic human head, but also of any item being tried on, such as glasses)
may be referenced and suitably aligned. This is to say, examples may place both 3D models (i.e.
the selected generic human head + item being tried on) into the same 3D space in a suitable (i.e.
realistic) alignment by reference to the respective origins. The 3D model of the head may not be
made visible, but only used for occlusion or other calculations of the 3D model of the glasses. The
combined generic head (invisible) and glasses 3D models (suitably occluded) can then be placed on
a background comprising an extracted image of the user taken from a video, so that the overall
combination of the rendered 3D model of the glasses and the extracted video gives the impression
of the glasses being worn by the user. This combination process, as well as the occlusion
calculations using the “invisible” generic human head, may be repeated for a number of extracted
images at different nominal rotations.
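By way of illustration only, the following minimal Python sketch shows the kind of origin-based alignment described above. The vertex arrays, origin coordinates and anchor point are hypothetical placeholders, not data from this disclosure:

```python
import numpy as np

def place_at_anchor(vertices, model_origin, world_anchor):
    """Translate a mesh so its reference origin lands on the shared world anchor."""
    return vertices - model_origin + world_anchor

# Shared anchor in the 3D scene, e.g. where the bridge of the nose should sit.
anchor = np.zeros(3)

# Hypothetical (N x 3) vertex arrays and per-model origin points.
head_vertices = np.random.rand(500, 3)         # generic head mesh
head_origin = np.array([0.10, 0.20, 0.05])     # bridge-of-nose point in the head model

glasses_vertices = np.random.rand(200, 3)      # glasses mesh
glasses_origin = np.array([0.00, 0.01, 0.02])  # bridge point in the glasses model

head_aligned = place_at_anchor(head_vertices, head_origin, anchor)
glasses_aligned = place_at_anchor(glasses_vertices, glasses_origin, anchor)

# Adjusting the fit (step 170 in Figure 1) is then just an extra offset on one model,
# i.e. moving the glasses origin out of alignment with the head origin.
glasses_adjusted = glasses_aligned + np.array([0.0, -0.005, 0.0])
```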
By using pre-defined generic 3D head models, examples do not need to generate a 3D model
of a user’s head, and therefore reduce the processing overhead requirements. However, the utility
of the examples is not materially affected, as key issues pertaining to the virtual try on experience
are maintained, such as occlusion of portions of the glasses by head extremities (e.g. eyes, nose,
etc.), during rotation, as discussed in more detail below.
Examples map the 3D models in the 3D space onto suitably captured and arranged images of
the actual user of the system. This mapping process may include trying to find images of a user’s
head having pre-defined angles of view matching predetermined angles. This matching may
comprise determining, for a captured head rotation video, a predetermined number of angles of
head between the two maximum angles of head rotation contained within the captured head
rotation video. In such a way, examples enable use of the specific captured head rotation video,
regardless of whether or not a pre-determined preferable maximum of head rotation has occurred
(i.e. these examples would not require the user to re-capture a new video because the user had not
turned their head sufficiently in the original capturing of their head rotation). Thus, examples are
more efficient than the prior art that requires a minimum head rotation.
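As a hedged illustration of this angle determination, the sketch below spreads a fixed number of target angles evenly between the two measured maxima; the sign convention and the example degree values are assumptions:

```python
import numpy as np

def target_angles(max_left_deg, max_right_deg, n_frames=9):
    """Spread n_frames target head yaw angles evenly across the measured sweep."""
    # Convention assumed here: left rotations negative, right rotations positive.
    return np.linspace(-max_left_deg, max_right_deg, n_frames)

# A user who only turned 35 degrees left and 45 degrees right still yields 9 targets,
# without any re-capture being required.
print(target_angles(35.0, 45.0))
```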
In examples, by establishing angles of images based on maximum angle of user head rotation
in a captured video (and therefore under the direct control of the user), the viewing angle(s) may be
user-determined. This enables the system to portray the generated 3D try on experience in a way
particularly desirable to the user, as opposed to only being portrayed in a specific, pre-determined
manner that the user must abide by in order for the system to work. Thus, examples are more
“natural” to use than the prior art.
There now follows a detailed description of an exemplary embodiment of the present
invention, in particular an embodiment in the form of a software application (often simply referred
to as an “app”) used on a smartphone device. The example software application is in the form of a
virtualized method for a human user to try on glasses including a face tracking portion described in
more detail below, where face tracking is used in an application according to examples to
‘recognize’ a user’s face (i.e. compute a user’s face pose).
Examples of the disclosed method may include extracting a still image(s) of a user (or just the
user’s head portion) from a captured video of the user. A movement and/or orientation of the user’s
head, i.e. position and viewing direction, may be determined from the extracted still image(s). The
image of the user may be used as a background image for a 3D space including 3D models of the
item, such as glasses, to be virtually tried on, thereby creating the appearance of the item being
tried on the user’s actual visible head. A 3D model of a generic head, i.e. not of the actual user, may
also be placed into the 3D space, overlying the background image of the user. In this way, the
generic human head model may be used as a mask, to allow suitable occlusion culling (i.e. hidden
surface determination) to be carried out on the 3D model of the item being tried on, in relation to
the user’s head. Use of a generic human head model provides higher processing efficiency/speed,
without significantly reducing efficacy of the end result.
An origin of the 3D model of a generic human head may be located at a pre-determined point
in the model, for example, corresponding to a bridge of a nose in the model. Other locations and
numbers of reference points may be used instead. A position at which the 3D model is located
within the 3D space may also be set with reference to the origin of the model, i.e. by specifying the
location of the origin of the 3D model within the 3D space. The orientation of the 3D model may
correspond to the determined viewing direction of the user.
A 3D model of the selected item to be tried on, for example the selected pair of glasses, may
be placed into the 3D space. An orientation of the glasses model may correspond to the viewing
direction of the user. An origin of the 3D glasses model may be provided and located at a point
corresponding to the same point as the 3D model of the generic human head, for example also
being at a bridge of a nose in the glasses model. A position at which the 3D glasses model is
located within the 3D space may be set with reference to the origin of the glasses 3D model, i.e. by
specifying the location of the origin of the 3D model within the 3D space. The origin of the 3D
model of the glasses may be set so that the glasses substantially align to the normal wearing
position on the 3D model of the human head.
An image of the glasses located on the user’s head may then be generated based on the 3D
models of the glasses and generic head (which may be used to mask portions of the glasses model
which should not be visible and to generate shadow) and the background image of the user.
The position of the glasses relative to the head may be altered by moving the location of the
3D glasses model in the 3D space, i.e. by setting a different location of an origin of the model, or
by moving the origin of the 3D glasses model out of alignment with the origin of the 3D model of a
generic human head.
The example application also may include video capture, which may refer to capturing a
video of the user’s head and splitting that video up into a plurality of video frames. In some
examples, the video capture may occur outside of the device displaying the visualization. Each
video frame may therefore comprise an image extracted from a video capture device or a video
sequence captured by that or another video capture device. Examples may include one or more 3D
models, where a 3D model is a 3D representation of an object. In specific examples, the 3D models
may be of a generic human head and of an item to be visualized upon the head, such as a pair of
glasses. A 3D model as used herein may comprise a data set including one or more of: a set of
locations in a 3D space defining the item being modelled, a set of data representing a texture or
material of the item (or portion thereof) in the model, a mesh of data points defining the object, an
origin, or reference point for the model, and other data useful in defining the physical item about
which the 3D model relates. Examples may also use a scene, where the scene may contain one or
more models, and including, for example, all the meshes for the 3D models used to visualize the
glasses on a user’s head. Other data sets that may also be used in some examples include: a
material data set describing how a 3D model should be rendered, often based upon textures, a mesh
data set that may be the technical 3D representation of the 3D model, and a texture data set that may
include a graphic file that may be applied to a 3D model in order to give it a texture and/or a color.
Data sets that may be used in some embodiments may include CSV (for Comma Separated
Values), which is an exchange format used in software such as Excel™; JSON (for JavaScript
Object Notation), which is an exchange format used mainly on the Web; and metrics, which are a
way to record, for example, the usage of the application. Other data sets are also envisaged for use
in examples, and the invention is not so limited.
Example embodiments may comprise code portions or software modules including, but not
limited to: code portions provided by or through a Software Development Kit (SDK) of the target
Operating System (OS), operable to enable execution of the application on that target OS, for
example portions provided in the iOS SDK environment, XCode®; 3D model rendering, lighting
and shadowing code portions (for example, for applying the glasses on the user’s face); face tracking
code portions; and metric provision code portions.
The software application comprises three core actions: video recording of the user’s face
with face-tracking; 3D model download and interpretation/representation of the 3D models (of
generic user head and glasses being visualized on the user’s head); and display of the combination
of the 3D models and recorded video imagery. Examples may also include cloud / web enabled
services catalog handling, thereby enabling onward use of the visualization by the user, for example
for providing the selected glasses to the user for real-world trying on and/or sale.
Figure 1 shows an example method 100 of providing a virtual try on experience for glasses
on a user’s head.
The method starts by capturing video 110 of the user’s head rotating. However, due to the
beneficial aspects of the disclosed examples (in particular, the freedom to use any form/extent of
head rotation), in the alternative, a previously captured video may be used instead.
The method then extracts images 120, for later processing, as disclosed in more detail below.
From the extracted images, the method determines the object (in this example, the user’s head)
movement in the extracted images 130. Next, 3D models of the items (i.e. glasses) to be placed,
and a 3D model of a generic human head on which to place the item models, are acquired 140,
either from local storage (e.g. in the case of the generic human head model) or from a remote data
repository (e.g. in the case of the item/glasses, as this may be a new model). More detailed
description of these processes 130 and 140 is disclosed below with reference to Figure 2.
The 3D models are combined with one another and the extracted images (as background) at
step 150. Then, an image of the visual representation of the object (user’s head) with the item
(glasses) thereon can be generated 160. This is described in more detail with respect to Figure
3, below.
Optionally, the location of the items with respect to the object may be adjusted 170, typically
according to user input. This step may occur after display of the image, as a result of the user
desiring a slightly different output image.
Figure 2 shows a more detailed view 200 of a portion of the method, in particular, the object
movement determination step 130 and 3D model acquisition step 140.
The object movement determination step 130 may be broken down into sub-steps in which
a maximum rotation of the object (i.e. head) in a first direction (e.g. to the left) is determined 132,
then the maximum rotation in the second direction (e.g. to the right) may then be determined 134;
finally, for this portion of the method, output values may be provided 136 indicative of the
maximum rotation of the head in both first and second directions, for use in the subsequent
processing of the extracted images and/or 3D models for placement within the 3D space relating to
each extracted image. In some examples, the different steps noted above in respect of the object
movement determination may only be optional.
The 3D model acquisition step 140 may be broken down into sub-steps in which a 3D
model of a generic head is acquired 142, optionally including a selection step 144 of one 3D
model of a generic human head out of a number of acquired 3D generic models of a human head
(e.g. choosing between a male or female generic head 3D model). The choice of generic head
model may be under direct user control, or by automated selection, as described in more detail
below. Next, the 3D models of the item(s) to be placed on the head, e.g. glasses, may then be
acquired 146. Whilst the two acquisition steps 142 and 146 may be carried out either way round, it
is advantageous to choose the generic human head first, because this may allow the choice of 3D
models of the items to be placed to be filtered so that only applicable models are available for
subsequent acquisition. For example, choosing a female generic human head 3D model can filter
out all male glasses.
Figure 3 shows a more detailed view 300 of the image generation step 160 of Figure 1.
The image generation step 160 may start by setting an extracted image as the background
162 to the visual representation of the item being tried on the user’s head. Then, the face
tracking data (i.e. detected movement, such as the extent of rotation values discussed above, at step
136) may be used to align the 3D models of the generic human head and the 3D model of the
glasses to the extracted image used as background 164 (the 3D models may already have been
aligned to one another, for example using their origins, or that alignment can be carried out at this
point as well, instead).
Hidden surface detection calculations (i.e. occlusion calculations) 166 may be carried out on
the 3D model of the glasses, using the 3D model of the generic head, so that any parts of the
glasses that should not be visible in the context of the particular extracted image in use at this point
in time may be left out of the overall end 3D rendering of the combined scene (comprising
extracted image background, and 3D model of glasses “on top”). The combined scene may then be
output as a rendered image 168. The process may repeat for a number of different extracted images,
each depicting a different rotation of the user’s head in space.
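The depth-mask style of composition described for step 166 might be sketched as follows; the array names, and the idea of having per-pixel depth buffers for the rendered glasses and the invisible head, are assumptions for illustration only:

```python
import numpy as np

def composite_frame(background, glasses_rgb, glasses_depth, glasses_mask, head_depth):
    """Overlay rendered glasses on an extracted frame, hidden where the head occludes them.

    background:    H x W x 3 extracted video frame
    glasses_rgb:   H x W x 3 rendered glasses colour
    glasses_depth: H x W depth of rendered glasses fragments
    glasses_mask:  H x W bool, True where glasses pixels were rendered
    head_depth:    H x W depth of the invisible generic head
    """
    # Hidden surface test: a glasses pixel survives only if it sits in front
    # of the generic head at that pixel.
    visible = glasses_mask & (glasses_depth < head_depth)
    out = background.copy()
    out[visible] = glasses_rgb[visible]
    return out
```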
The extracted images used above may be taken from a video recording of the user’s face,
which may be carried out with a face tracking portion of the example method. This allows the user
to record a video of themselves, so that the virtual glasses can be shown as they would look on their
actual person. This is achieved in multiple steps. First the application records a video capture of the
user's head. Then the application will intelligently split this video into frames and send these to the
face tracking library module. The face tracking library module may then return the location results
for each frame (i.e. where the user’s face is in the frame and/or 3D space/world, related to a
coordinate system (CS) that is linked to the camera). These results may be used to position the 3D
glasses on the user’s face virtually.
The face recording may be approximately 8 seconds long, and may be captured in high
resolution video.
There is now described in more detail an exemplary chain of production describing how the
virtual glasses are suitably rendered on the captured video of the user’s head.
Video recording and face-tracking:
When starting the application, the application may prompt the user to record a video of their
head turning in a non-predefined, i.e. user-controllable, substantially horizontal sweep of the user’s
head. The camera is typically located ahead of the user’s face, when the user’s head is at the
central point of the overall sweep, such that the entirety of the user’s head is visible in the frame of
the video. However, in other examples, the camera may not be so aligned. The user has to move his
head left and right to give the best results possible. The location of the head in the sweep may be
assessed by the face tracking module prior to capture of the video for use in the method, such that
the user may be prompted to re-align their head before capture. In this way, the user may be
suitably prompted so that only a single video capture is necessary, which ultimately provides a
better user experience. However, in some examples, the method captures the video as is provided
by the user, and carries on without requiring a second video capture.
When the video is recorded (and, optionally, the user is happy with it), the video may then be
processed through the following steps.
Video split
The captured video is to be interpreted by the face-tracking process carried out by the face-
tracking module. However, to aid this, the captured video of the user’s head may be sampled, so
that only a sub-set of the captured video images are used in the later processing steps. This may
result in faster and/or more efficient processing, which in turn may also allow the example
application to be performed by lesser processing resources or at greater energy efficiency.
One exemplary way to provide this sampling of the captured video images is to split the
video into comprehensible frames. Initially, this splitting action may involve the video that is
recorded at a higher initial capture rate (e.g. of 30 frames per second, at 8 seconds total length, that
gives a total of 240 video frames), but only selecting or further processing a predetermined or user
definable number of those frames. For example, the splitting process may select every third frame of
the originally captured video, which in the above example provides 80 output frames for subsequent
processing, at a rate of 10 frames per second. Thus the processing load is now approximately
33% of the original processing load.
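A minimal OpenCV sketch of this split, assuming a hypothetical recording file name and the every-third-frame rate given above:

```python
import cv2

def split_video(path, keep_every=3):
    """Keep every Nth frame, e.g. 240 frames at 30 fps -> 80 frames at 10 fps."""
    capture = cv2.VideoCapture(path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break                       # end of the recording
        if index % keep_every == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

frames = split_video("head_turn.mp4")   # hypothetical 8 second capture
print(len(frames), "frames kept for face tracking")
```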
Face-tracking process
The sub-selected 80 video frames (i.e. 80 distinct images) are then sent to the face-tracking
module for analysis, as described in more detail below with respect to figures 3 to 10. By the end of
the face-tracking process, the application may have 80 sets of data: one for each sub-selected video
frame. These sets of data contain, for each video frame, the position and orientation of the face.
Face-tracking data selection
It may be unnecessary for the application to process all 80 sets of data at this point, so the
application may include a step of selecting a predefined number of best frames offered by the
results returned by the face-tracking module. For example, the 9 best frames may be selected, based
upon the face orientation, thereby covering all the angles of the face as it turns from left to right (or
vice versa).
The selection may be made as follows: for frame 1 (left most), the face may be turned 35
degrees to the left; for frame 2, the face may be turned 28 degrees to the left; for frame 3, the face
may be turned 20 degrees to the left; for frame 4, the face may be turned 10 degrees to the left; for
frame 5, the face may be centered; for frame 6, the face may be turned 10 degrees to the right; for
frame 7, the face may be turned 20 degrees to the right; for frame 8, the face may be turned 28
degrees to the right; for frame 9, the face may be turned 35 degrees to the right. Other specific
angles selected for each of the selection of best frames may be used, and may also be defined by
the user instead.
In some examples, non-linear/contiguous capture of images/frames of the head in the 3D
space may be used. This is to say, in these alternative examples, the user’s head may pass through
any given target angle more than once during a recording. For example, if one degree left of centre
were a target angle and the recording starts from a straight ahead position, then the head being
captured passes through this one degree left of centre angle twice: once en route *to* the left-most
position and once more after rebound *from* the left-most position. Thus, in these examples, the
method has the option to decide which of the different instances is the best version of the angle to
use for actual display to the user. Thus, the images actually used to display to the user may not all
be contiguous/sequential in time.
In an alternative example, instead of selecting best frames for further processing according to
pre-defined angles (which assumes a pre-defined head sweep, e.g. a 180 degree sweep, with 90
degree (left and right) max turn from a central dead-ahead position), the method may instead use any
arbitrary user-provided turn of head, and determine the actual maximum turn in each direction, and
then split that determined actual head turn into a target number of ‘best frames’. This process
may also take into account a lack of symmetry of the overall head turn (i.e. more turn to the left
than right, or vice versa). For example, the actual head turn may be, in actual fact, 35 degrees left
and 45 degrees right: therefore, a total of 70 degrees, which in turn may be then split into 9 frames
at 7.77 degrees each (or simply 3 on the left, one central, and 4 on the right), as sketched below.
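One way to realise this arbitrary-sweep selection is sketched below, assuming the face-tracking module has already returned one yaw angle per frame; the names and sample data are illustrative only:

```python
import numpy as np

def select_best_frames(yaw_per_frame, n_best=9):
    """Pick, for each evenly spaced target angle, the frame whose yaw is closest."""
    yaws = np.asarray(yaw_per_frame, dtype=float)
    # Targets span the actual measured sweep, however asymmetric it is.
    targets = np.linspace(yaws.min(), yaws.max(), n_best)
    return [int(np.abs(yaws - t).argmin()) for t in targets]

# Hypothetical tracker output: a sweep to 35 degrees left then 45 degrees right.
yaws = np.concatenate([np.linspace(0, -35, 30), np.linspace(-35, 45, 50)])
print(select_best_frames(yaws))  # indices of the 9 frames to keep
```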
By the end of the face-tracking data selection portion of the overall method, the application
may have selected 9 frames and associated sets of data. In some examples, if the application was
W0 2016/142668
not able to select a suitable number of “best” frames, the user’s video may be rejected, and the user
may be kindly asked to take a new head turning video. For example, if the left-most frame does not
offer a face turned at least 20 degrees to the left, or the right most frame does not offer a face
turned at least 20 degrees to the right, the user's video will be rejected.
Face-tracking process end
When the application has the requisite number (e.g. 9) of best frames, the respective best frame
images and data sets are saved within the application data storage location. These may then be used
in a later state, with the 3D models, which may also be stored in the application data storage
location, or another memory location in the device carrying out the example application, or even in
a networked location, such as a central cloud storage repository.
3D chain of production and process
All the best frames are produced within the application, following 3D modeling techniques
known in the art. For example, the application may start from the captured High Resolution, High
polygon models (e.g. of the glasses (or other product) to be tried on). Since the application has to
run on mobile devices, these 3D models may be reworked in order to adapt to the low computation
power and low memory offered by the mobile devices, for example to reduce the number of
polygons in each of the models.
Then, the application can work on the textures. The textures may be images and, if not
reworked, may overflow the device memory and lead to application crashes. For this application,
there may be two sets of textures generated each time: one for a first type of device (e.g. a mobile
device such as a smartphone, using iOS, where the textures used may be smaller, and hence more
suited for a 3G connection) and one for a second type of device, such as a portable device like a
tablet (i.e. using textures that may be more suited for a physically larger screen, and/or higher rate
wifi connection). Once the number of polygons has been reduced and/or the textures have been
treated according to the desirable advantages of the target execution environment, the final 3D models
may be exported, for example in a mesh format. The 3D models may be exported in any suitable
3D model data format, and the invention is not so limited. An example of a suitable data format is
the Ogre3D format.
3D models in the cloud
The 3D models may be located in a central data repository, e.g. on a server, and may be
optionally compressed, for example, archived in a ZIP format. When compression is used to store
the 3D model data, in order to reduce data storage and transmission requirements, then the
application may include respective decompression modules.
In order to get the 3D models of the glasses (and generic heads), the application may
download them from the server and unzip them. When that is done, the application can pass the 3D
models to the rendering engine.
3D Rendering Engine
The 3D rendering engine used in this application takes a 3D model, and passes all the
rendered files along with the face tracking data sets and the respective video frames from the video
to the graphics/display engine. The 3D graphics engine may render the end image according to the
process as described in relation to Figure 3.
Thus, in the example discussed above, using 9 extracted images, the rendering engine may
do the following steps to create an image of the user wearing the virtual glasses: 1) open the 3D
files and interpret them to create a 3D representation (e.g. the 3D glasses); 2) for each of the 9
frames used in the app: apply the video frame in the background (so the user’s face is in the
background) and then display the 3D glasses in front of the background; using the face tracking
data set (face position and orientation), the engine will position the 3D models exactly on the user’s
face; 3) a “screenshot” of the 3D frames placed on the background will be taken; 4) the 9
screenshots are then displayed to the user.
3D Rendering Process end
Using inbuilt swipe gestures of the target OS, the user may now “browse” through the
rendered screenshots for each frame, in which the rendered glasses give the illusion of being on the user’s
face.
Web services, cloud and catalogs
The catalog containing all the frames is downloaded by the application from a static URL on
the server. The catalog will allow the application to know where to look for 3D glasses and when to
display them. This catalog will for example describe all the frames for the “Designer” category, so
the application can fetch the corresponding 3D files. The catalog may use a CSV format for the
data exchange.
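As a small illustration of such a catalog, the CSV below is entirely hypothetical (column names and URLs are invented for the sketch), parsed with the Python standard library:

```python
import csv
import io

# Hypothetical catalog rows: category, frame (model) name, URL of the zipped 3D files.
catalog_csv = """category,frame_name,model_url
Designer,aviator_x,https://example.com/models/aviator_x.zip
Designer,round_r2,https://example.com/models/round_r2.zip
Classic,wayfarer_c,https://example.com/models/wayfarer_c.zip
"""

designer = [row for row in csv.DictReader(io.StringIO(catalog_csv))
            if row["category"] == "Designer"]
for row in designer:
    print(row["frame_name"], "->", row["model_url"])
```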
As described above, example applications include processes to: carry out video recording,
processing and face tracking data extraction; download 3D models from a server; and interpreting and
adjusting those models using face-tracking data. The downloading of the 3D models may
comprise downloading a catalog of different useable 3D models of the items to be shown (e.g.
glasses), or different generic human head 3D models.
Face tracking process
The following describes the face-tracking algorithms, as used in offline (i.e. non real-time)
application scenarios, such as when the disclosed example methods, systems and devices detect and
track human faces on a pre-recorded video sequence. This example discloses use of the following
terms/notations: a frame is an image extracted from a video captured by a video capture device or a
previously captured input video sequence; a face model is a 3D mesh that represents a face; a key
point (also named interest point) is a point that corresponds to an interesting location in the image
because of its neighborhood variations; a pose is a vector composed of a rotation and a translation
to describe rigid affine transformations in space.
Figure 4 shows a high level diagram of the face tracking method. A set of input images 402
are used by face tracking module 410 to provide an output set of vectors 402, which may be
referred to as “pose vectors”.
Figure 5 details how the software retrieves faces in video sequences. The face-tracking
process may include a face-tracking engine that may be decomposed into four main phases: (1)
pre-processing the (pre-recorded) video sequence 510; (2) finding the frame containing the
most “reliable” face 520; (3) optionally, building a 2.5D face model
corresponding to the current user’s face, or choosing a generic model of a human
head most applicable to the captured user head image 530; and (4) tracking the face model
sequentially using the (part or whole) video sequence 540.
(1) Pre-processing phase
Figure 8 shows a pre-processing phase of the method 800 that has the objective to find the
most reliable frame containing a face from the video sequence. This phase is decomposed into 3
main sub-steps:
- (a) Face detection step 810 (and figure 6) which includes detecting the presence of a face in
each video frame. When a face is found, its position is estimated.
- (b) Non-rigid face detection step 830 (and figure 7) which includes discovering face
features positions (e.g. eyes, nose, mouth, etc).
- (c) Retrieving the video frame containing the most reliable face image 870 out of a number
of candidates 850.
The face detection step (a) 810 may discover faces in the video frames using a sliding
windows technique. This technique includes comparing each part of the frame using pyramidal
images techniques and finding if a part of the frame is similar to a face signature. Face signature(s)
is stored in a file or a data structure and is named a classifier. To learn the classifier, thousands of
previously known face images may have been processed. The face detection reiterates 820 until a
suitable face is found.
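OpenCV’s Haar cascade detector is one well-known classifier of exactly this sliding-window, pyramidal kind, so a sketch of step (a) might look as follows; the frame file name is hypothetical, and this is not necessarily the specific classifier contemplated here:

```python
import cv2

# A pre-trained face classifier shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame_0001.png")               # hypothetical extracted frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# scaleFactor drives the pyramidal image search; minNeighbors filters weak hits.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print("face found at", (x, y), "with size", (w, h))
```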
The Non-rigid face detection step (b) is more complex since it tries to detect elements of the
face (also called face features, or landmarks). This non-rigid face detection step may take
advantage of the fact that a face has been correctly detected in step (a). Then face detection is
refined to detect face elements, for example using face detection techniques known in the art. As
in (a), a signature of face elements has been learnt using hundreds of face representations. This step
(b) is then able to compute a 2D shape that corresponds to the face features (see an illustration in
figure 7).
W0 2016/142668
Steps (a) and (b) may be repeated on all or on a subset of the captured frames that comprises
the video sequence being assessed. The number of frames processed depends on the total number
of frames of the video sequence, or the sub-selection of video frames used. These may be based
upon, for example, the processing capacity of the system (e.g. processor, memory, etc.), or on the
time the user is (or is deemed to be) willing to wait for results to appear.
(c) If steps (a) and (b) have succeeded for at least one frame, then step (c) is processed to
find the frame in the video sequence that contains the most reliable face. The notion of a reliable
face can be defined as follows (a sketch of such a check is given after the list):
- find the candidate frames with facing orientation, i.e. faces that look toward the camera,
using a threshold value on the angle (e.g. less than a few radians), and
- amongst these candidate frames, find the frame containing a face not too far and not too
close to the camera using two threshold values as well.
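A minimal sketch of such a reliability check, with entirely assumed threshold values (none are specified above):

```python
def is_reliable_face(yaw_deg, face_width_px,
                     max_yaw_deg=15.0, min_width_px=120, max_width_px=400):
    """True if the face roughly looks toward the camera and is neither too far nor too close."""
    facing = abs(yaw_deg) < max_yaw_deg                       # orientation threshold
    well_sized = min_width_px < face_width_px < max_width_px  # distance proxy thresholds
    return facing and well_sized
```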
Once the frame(s) with the most reliable face is found in the video sequence, the face-
tracking algorithm changes state and tries to construct a face model representation, or choose a most
appropriate generic head model for use, or simply uses a standard generic model without any
selection thereof 890.
(2) Building 3D face model phase
Figure 9 shows the optional face model building phase of the method 900 that serves to
construct a suitable face model representation, i.e. building an approximate geometry of the face
along with a textured signature of the face and corresponding keypoints. In some examples, this
generated 3D model is referred to as a keyframe. The approximate geometry of the face may instead be
taken from a predetermined generic 3D model of a human face.
The keyframe may be constructed using the most reliable frame of the video sequence. This
phase is decomposed in following steps:
- (a) Creating a 3D model/mesh of the face 910 using the position of the face and the non-
rigid face shape built during phase (1).
- (b) Finding keypoints on the face image 920 and re-projecting them on the 3D mesh to find
their 3D positions.
- (c) Saving a 2D image of the face by cropping the face available in the most reliable frame.
In respect of step (a), the position of the face elements may be used to create the 3D model
of the face. These face elements may give essential information about the deformation of the face.
A mean (i.e. average) 3D face model available statically is then deformed using these 2D face
elements. This face model may then be positioned and oriented according to the camera position.
This may be done by optimizing an energy function that is expressed using the image position of
face elements and their corresponding 3D position on the model.
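OpenCV’s solvePnP performs this kind of reprojection-energy minimisation; the sketch below uses invented face-element coordinates and an assumed camera matrix purely for illustration:

```python
import numpy as np
import cv2

# Hypothetical 3D face elements on the deformed mean model (model units)...
model_points = np.array([[0.0, 0.0, 0.0],     # nose bridge
                         [0.0, -3.0, -1.0],   # nose tip
                         [-2.5, 1.5, -1.0],   # left eye corner
                         [2.5, 1.5, -1.0],    # right eye corner
                         [0.0, -6.0, -1.5]])  # chin
# ...and their detected 2D positions in the most reliable frame (pixels).
image_points = np.array([[320.0, 240.0], [322.0, 300.0], [270.0, 210.0],
                         [370.0, 212.0], [323.0, 380.0]])

f = 800.0  # assumed focal length in pixels
camera_matrix = np.array([[f, 0.0, 320.0], [0.0, f, 240.0], [0.0, 0.0, 1.0]])

# solvePnP minimises the reprojection error between 2D and 3D correspondences,
# yielding the face pose (rotation and translation) relative to the camera.
ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, None)
print("rotation:", rvec.ravel(), "translation:", tvec.ravel())
```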
In respect of step (b), keypoints (sometimes referred to as interest points or corner points) may
be computed on the face image using the most reliable frame. In some examples, a keypoint can be
detected at a specific image location if the neighboring pixel intensities are varying substantially
in both horizontal and vertical directions.
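This intensity-variation criterion is what Harris-style corner detectors compute; a sketch with an assumed image file:

```python
import cv2

face = cv2.imread("reliable_face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical face crop
# Harris corners respond where intensity varies in both horizontal and vertical directions.
corners = cv2.goodFeaturesToTrack(face, maxCorners=100, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True, k=0.04)
print(corners.reshape(-1, 2)[:5])  # first few keypoint locations
```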
In respect of step (c), along with the 3D model and keypoints, the face representation (an
image of the face) may also be memorized (i.e. saved) so that the process can match its appearance
in the remaining frames of the video capture.
Steps (a), (b) and (c) aim to construct a keyframe of the face. This keyframe is used to track
the face of the user in the remaining video frames.
(3) Tracking the face sequentially phase (see Figure 11).
Once the face model of the user has been reconstructed, or generic model chosen, the
remaining video frames may be processed with the objective to track the face sequentially.
Assuming that the face's appearance in contiguous video frames is similar helps the described
method track the face frame after frame. This is because the portion of image around each
keypoint(s) does not change too much from one frame to another, therefore comparing/matching
keypoint(s) (in fact neighbouring image appearance) is easier. Any suitable technique to track the
face sequentially known in the art may be used. For example, as described in “Stable Real-Time 3D
Tracking using Online and Offline Information” by L. Vacchetti, V. Lepetit and P. Fua, where
the keyframe may be used to match keypoints computed in earlier described face model building
phase and keypoints computed in each video frame. The pose of the face (i.e. its position and
orientation) may then be computed for each new frame using an optimization technique.
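A hedged sketch of one such frame-to-frame scheme, combining pyramidal Lucas-Kanade optical flow with a pose re-estimation; this is a generic stand-in, not the Vacchetti et al. algorithm itself, and all names are illustrative:

```python
import cv2
import numpy as np

LK_PARAMS = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def track_step(prev_gray, next_gray, prev_pts_2d, pts_3d, camera_matrix):
    """Follow keyframe keypoints into the next frame, then re-estimate the face pose.

    prev_pts_2d must be float32 of shape (N, 1, 2), as calcOpticalFlowPyrLK expects;
    pts_3d holds the corresponding (N, 3) positions on the face model.
    """
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts_2d, None, **LK_PARAMS)
    good = status.ravel() == 1                  # keep only keypoints found again
    ok, rvec, tvec = cv2.solvePnP(pts_3d[good], next_pts[good], camera_matrix, None)
    pose = (rvec, tvec) if ok else None         # the returned "face pose"
    return pose, next_pts[good], pts_3d[good]
```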
Figure 10 shows a processed video frame along with its corresponding (e.g. generic) 3D
model of a head.
When the video sequence is completed, face poses (and, in some examples, the
corresponding generic human face model) are sent to the 3D rendering engine, so that the rendering
module can use this information to display virtual objects on top of the video sequence. This
process is shown in Figure 11, and includes tracking sequentially the face model using the
keyframe 1110, and returning face poses when available 1130, via iterative process 1120, whilst
frames are available for processing, until no more frames are available for processing.
The invention may be implemented as a computer program for running on a computer
system, said computer system comprising at least one processor, where the computer program
includes executable code portions for execution by the said at least one processor, in order for the
computer system to perform any method according to the described examples. The computer
system may be a programmable apparatus, such as, but not limited to, a personal computer, tablet or
smartphone apparatus.
Figure 12 shows an exemplary generic embodiment of such a computer system 1200
comprising one or more processor(s) 1240, system control logic 1220 coupled with at least one of
the processor(s) 1240, system memory 1210 coupled with system control logic 1220, non-volatile
memory (NVM)/storage 1230 coupled with system control logic 1220, and a network interface
1260 coupled with system control logic 1220. The system control logic 1220 may also be coupled
to Input/Output devices 1250.
Processor(s) 1240 may include one or more single-core or multi-core processors.
Processor(s) 1240 may include any combination of general-purpose processors and dedicated
processors (e.g., graphics processors, application processors, etc.). Processors 1240 may be
operable to carry out the above described methods, using suitable instructions or programs (i.e.
operate via use of processor, or other logic, instructions). The instructions may be stored in system
memory 1210, as glasses visualisation application 1205, or additionally or alternatively may be
stored in (NVM)/storage 1230, as NVM glasses visualisation application portion 1235, to thereby
instruct the one or more processors 1240 to carry out the virtual trying on experience methods
described herein. The system memory 1210 may also include 3D model data 1215, whilst NVM
storage 1230 may include 3D model Data 1237. These may serve to store 3D models of the items
to be placed, such as glasses, and one or more generic 3D models of a human head.
System control logic 1220 for one embodiment may include any suitable interface
controllers to provide for any suitable interface to at least one of the processor(s) 1240 and/or to
any suitable device or component in communication with system control logic 1220.
System control logic 1220 for one embodiment may include one or more memory
controller(s) (not shown) to provide an interface to system memory 1210. System memory 1210
may be used to load and store data and/or instructions, for example, for system 1200. System
memory 1210 for one embodiment may include any suitable volatile memory, such as suitable
dynamic random access memory (DRAM), for example.
NVM/storage 1230 may include one or more tangible, non-transitory computer-readable
media used to store data and/or instructions, for example. NVM/storage 1230 may include any
suitable non-volatile memory, such as flash memory, for example, and/or may include any suitable
non-volatile storage device(s), such as one or more hard disk drive(s) (HDD(s)), one or more
compact disk (CD) drive(s), and/or one or more digital versatile disk (DVD) drive(s), for example.
The NVM/storage 1230 may include a storage resource physically part of a device on which
the system 1200 is installed or it may be accessible by, but not necessarily a part of, the device.
For example, the NVM/storage 1230 may be accessed over a network via the network interface
1260.
System memory 1210 and NVM/storage 1230 may respectively include, in particular,
temporal and persistent copies of, for example, the instructions of the glasses visualisation
application 1205 and 1235, respectively.
Network interface 1260 may provide a radio interface for system 1200 to communicate over
one or more network(s) (e.g. wireless communication network) and/or with any other suitable
device.
Figure 13 shows a more specific example device to carry out the disclosed virtual trying on
experience method, in particular a smartphone embodiment 1300, where the method is carried out
by an “app” downloaded to the smartphone 1300 via antenna 1310, to be run on a computer system
1200 (as per figure 12) within the smartphone 1300. The smartphone 1300 further includes a
display and/or touch screen display 1320 for displaying the virtual try-on experience image formed
according to the above described examples. The smartphone 1300 may optionally also include a set
of dedicated input devices, such as keyboard 1320, especially when a touchscreen display is not
provided.
A computer program may be formed of a list of executable instructions such as a particular
application program and/or an operating system. The computer program may for example include
one or more of: a subroutine, a function, a procedure, an object method, an object implementation,
an executable application (“app”), an applet, a servlet, a source code portion, an object code
portion, a shared library/dynamic load library and/or any other sequence of instructions designed
for execution on a suitable computer system.
The computer program may be stored internally on a computer readable storage medium or
transmitted to the computer system via a computer readable transmission medium. All or some of
the computer program may be provided on computer readable media permanently, removably or
remotely coupled to the programmable apparatus, such as an information processing system. The
computer readable media may include, for example and without limitation, any one or more of the
following: magnetic storage media including disk and tape storage media; optical storage media
such as compact disk media (e.g., CD-ROM, CD-R, Blu-Ray®, etc.), digital video disk storage
media (DVD, DVD-R, DVD-RW, etc.) or high density optical media (e.g. Blu-Ray®, etc.); non-
volatile memory storage media including semiconductor-based memory units such as FLASH
memory, EEPROM, EPROM, ROM; ferromagnetic digital memories, MRAM; volatile storage
media including registers, buffers or caches, main memory, RAM, DRAM, DDR RAM, etc.; and
data transmission media including computer networks, point-to-point telecommunication
equipment, and carrier wave transmission media, and the like. Embodiments of the invention may
include tangible and non-tangible embodiments, transitory and non-transitory embodiments and are
not limited to any specific form of computer readable media used.
W0 2016/142668
A computer process typically includes an executing (running) program or portion of a
program, current program values and state information, and the resources used by the operating
system to manage the execution of the process. An operating system (OS) is the software that
manages the sharing of the resources of a computer and provides programmers with an interface
used to access those resources. An operating system processes system data and user input, and
responds by allocating and managing tasks and internal system resources as a service to users and
programs of the system.
The computer system may for instance include at least one processing unit, associated
memory and a number of input/output (I/O) devices. When executing the computer program, the
computer system processes information according to the computer program and produces resultant
output information via I/O devices.
In the foregoing specification, the invention has been described with reference to specific
examples of embodiments of the invention. It will, however, be evident that various modifications
and changes may be made therein without departing from the broader scope of the invention as set
forth in the appended claims.
Those skilled in the art will recognize that the boundaries between logic blocks are merely
illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose
an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is
to be understood that the architectures depicted herein are merely exemplary, and that in fact many
other architectures can be implemented which achieve the same functionality.
Any arrangement of components to achieve the same functionality is effectively "associated"
such that the desired functionality is achieved. Hence, any two components herein combined to
achieve a particular functionality can be seen as "associated with" each other such that the desired
functionality is achieved, irrespective of architectures or intermedial components. Likewise, any
two components so associated can also be viewed as being "operably connected," or "operably
coupled," to each other to achieve the desired functionality.
Furthermore, those skilled in the art will recognize that boundaries between the above
described operations are merely illustrative. The multiple operations may be combined into a single
operation, a single operation may be distributed in additional operations and operations may be
executed at least partially overlapping in time. Moreover, alternative embodiments may include
multiple instances of a particular operation, and the order of operations may be altered in various
other embodiments.
Also for example, the examples, or portions thereof, may be implemented as software or code
representations of physical circuitry or of logical representations convertible into physical circuitry,
such as in a hardware description language of any appropriate type.
Also, the invention is not limited to physical devices or units implemented in non-
programmable hardware but can also be applied in programmable devices or units able to perform
the desired device functions by operating in accordance with suitable program code, such as
mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital
assistants, electronic games, automotive and other embedded systems, cell phones and various
other wireless devices, commonly denoted in this application as ‘computer systems’.
However, other modifications, variations and alternatives are also possible. The
specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a
restrictive sense.
In the claims, any reference signs placed between parentheses shall not be construed as
limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps
than those listed in a claim. Furthermore, the terms “a” or “an,” as used herein, are defined as one
or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in
the claims should not be construed to imply that the introduction of another claim element by the
indefinite articles "a" or "an" limits any particular claim containing such introduced claim element
to inventions containing only one such element, even when the same claim includes the
introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an." The
same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and
“second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these
terms are not necessarily intended to indicate temporal or other prioritization of such elements. The
mere fact that certain measures are recited in mutually different claims does not indicate that a
combination of these measures cannot be used to advantage.
Examples provide a method of providing a virtual trying on experience to a user comprising
extracting at least one image from a video including a plurality of video frames of a user in
different orientations to provide at least one extracted image, determining user movement in the at
least one extracted image, acquiring 3D models of an item to be tried on the user and a generic
representation of a human, combining the acquired 3D models and at least one extracted image as
the background, and generating an output image representative of the virtual trying-on experience.
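The specification does not mandate any particular implementation of this combining step. Purely as an illustration, using hypothetical helper names and a toy orthographic projection, a minimal NumPy sketch of the idea might look as follows, with the extracted frame kept as the background and the item model drawn over it after both 3D models receive the same orientation:

```python
import numpy as np

def combine(extracted_image, head_pts, item_pts, user_yaw_rad):
    """Toy sketch: orient both 3D models to the user's estimated yaw and
    paint the item model over the extracted frame (the background)."""
    c, s = np.cos(user_yaw_rad), np.sin(user_yaw_rad)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # yaw rotation

    head = head_pts @ R.T  # same transform for both models; a real renderer
    item = item_pts @ R.T  # would also use `head` for occlusion tests

    out = extracted_image.copy()
    h, w = out.shape[:2]
    # Toy orthographic projection: scale model units to pixels and mark the
    # item's vertices as dark dots; real code would rasterise textured meshes.
    for x, y in item[:, :2] * 100 + np.array([w / 2.0, h / 2.0]):
        if 0 <= int(y) < h and 0 <= int(x) < w:
            out[int(y), int(x)] = 0
    return out
```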
In some examples, the determining user movement in the at least one extracted image further
comprises determining a maximum angle of rotation of the user in a first direction.
In some examples, the determining user movement in the at least one extracted image further
comprises determining a maximum angle of rotation of the user in a second direction.
In some examples, the determining user movement in the at least one extracted image further
comprises outputting a value indicative of the determined maximum angle of rotation of the user in
the first or second directions.
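As a rough illustration, if a yaw estimate is available per frame (the specification leaves the estimation method open), the maximum rotation in each direction is simply the extreme of those estimates; the helper below is a hypothetical sketch, not the claimed method:

```python
def max_rotation(yaw_per_frame_deg):
    """Extremes of per-frame yaw estimates: positive values are taken as the
    first direction of rotation, negative values as the second."""
    first = max((a for a in yaw_per_frame_deg if a > 0), default=0.0)
    second = -min((a for a in yaw_per_frame_deg if a < 0), default=0.0)
    return first, second
```

For example, max_rotation([-30, -10, 0, 12, 25]) returns (25.0, 30.0).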
In some examples, the acquiring 3D models of an item to be tried on the user and a generic
representation of a human further comprises selecting a one of a plurality of 3D models of available
generic humans.
In some examples, the method further comprises determining an origin point in each of the 3D
models used, wherein the respective origin point in each 3D model is placed to allow alignment of
the 3D models with one another.
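The placement of the origin points is left to the implementer. One plausible convention, sketched below with hypothetical names, is to translate each model so that its designated origin point sits at the world origin, after which the models are aligned by construction:

```python
import numpy as np

def place_at_origin(vertices, origin_point):
    """Translate a model so that its designated origin point lands at
    (0, 0, 0); two models treated this way coincide at their anchors."""
    return np.asarray(vertices, dtype=float) - np.asarray(origin_point, dtype=float)

# For glasses, the origin might sit at the nose bridge of the frame, and the
# head model's origin where the bridge should rest, so that alignment places
# the glasses on the face.
```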
In some examples, the method further comprises determining an orientation of the user in the at
least one extracted image and corresponding the orientation of the 3D models in a 3D space
according to the determined orientation of the user.
In some examples, the method further comprises adjusting an origin of at least one 3D model.
In some examples, the method further comprises aligning the origins of the 3D models.
In some examples, the method further comprises dividing the maximum on ofthe user in
first and second directions into a predetermined number of set angles, and extracting as many
images as there are determined number of set angles.
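As a sketch of how such a division might be implemented (the frame-selection strategy is not fixed by the specification), one can space target angles evenly between the two measured extremes and pick, for each target, the frame whose estimated yaw is nearest:

```python
import numpy as np

def frames_at_set_angles(yaw_per_frame_deg, num_angles, max_first, max_second):
    """Index of the closest-yaw frame for each of `num_angles` target yaws
    evenly spaced between the two measured maximum rotations."""
    targets = np.linspace(-max_second, max_first, num_angles)
    yaws = np.asarray(yaw_per_frame_deg, dtype=float)
    return [int(np.abs(yaws - t).argmin()) for t in targets]
```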
In some examples, the method further comprises adjusting respective positions of the 3D
models and the background according to user input.
In some examples, the method further comprises capturing the rotation of the user using a
video capture device.
In some examples, determining user movement comprises determining movement of a user’s
head.
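Head movement can be estimated in many ways; as one hypothetical example, a crude yaw proxy can be derived from the horizontal positions of three facial landmarks produced by any face detector:

```python
def head_yaw_from_landmarks(left_eye_x, right_eye_x, nose_x):
    """Rough yaw proxy: offset of the nose from the midpoint of the eyes,
    normalised by the inter-eye distance. Zero is frontal; the sign gives
    the direction of rotation."""
    mid = 0.5 * (left_eye_x + right_eye_x)
    return (nose_x - mid) / max(abs(right_eye_x - left_eye_x), 1e-6)
```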
There is also provided a method of providing a virtual trying on experience for a user,
comprising receiving a plurality of video frames of a user’s head in different orientations to provide
captured oriented user images, identifying origin reference points on the captured oriented user
images, identifying an origin on a 3D model of a generic user, identifying an origin reference point
on a 3D model of a user-selected item to be tried on, aligning the reference points of the selected
captured oriented user images, the 3D model of a generic user and the 3D model of an item to be
tried on, combining the captured oriented user images with a generated representation of the user-
selected item to be tried on to provide a combined image and displaying the combined image.
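Assuming the `combine` sketch given earlier in this section, producing the series of combined images then reduces to one call per captured orientation; again, this is only an illustrative reading of the method:

```python
import numpy as np

def combined_series(frames, yaw_per_frame_rad, head_pts, item_pts):
    """One combined image per captured oriented user image, reusing the
    `combine` sketch defined above."""
    return [combine(frame, head_pts, item_pts, yaw)
            for frame, yaw in zip(frames, yaw_per_frame_rad)]
```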
In some examples, the receiving a plurality of video frames of a user’s head in different
orientations to provide captured oriented user images further comprises selecting only a subset of
all the captured video frames to use in the subsequent processing of the captured oriented user
images.
In some examples, the selecting only a subset is a pre-determined subset, or user-selectable.
In some examples, the method further comprises identifying one or more attachment points of
the item to the user.
In some examples, the method further comprises rotating or translating the attachment points in
the 3D space to re-align the item to the user in a user-specified way.
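One hypothetical way to realise such a re-alignment is to rotate the item about the centroid of its attachment points (for glasses, e.g. the nose bridge and temple tips) and then translate it by the user-specified offset, leaving the head model untouched:

```python
import numpy as np

def realign_item(item_pts, attachment_pts, yaw_adjust_rad=0.0,
                 translate=(0.0, 0.0, 0.0)):
    """Rotate the item about the centroid of its attachment points, then
    translate it, so a user nudge re-seats the item on the head."""
    pts = np.asarray(item_pts, dtype=float)
    pivot = np.asarray(attachment_pts, dtype=float).mean(axis=0)
    c, s = np.cos(yaw_adjust_rad), np.sin(yaw_adjust_rad)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (pts - pivot) @ R.T + pivot + np.asarray(translate, dtype=float)
```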
In some examples, the providing a virtual trying on experience for a user comprises generating
a visual representation of a user trying on an item, and wherein the trying on of an item on a user
comprises trying on an item on a user’s head. In some examples, the item being tried on is a pair of
glasses.
There is also provided a method of providing a virtual trying on experience for a user
comprising generating a visual representation of a user trying on an item from at least one 3D
model of an item to be tried on, at least one 3D generic model of a human head and at least one
extracted image of the user’s head.
Unless otherwise stated as incompatible, or the physics or otherwise of the embodiments
prevent such a combination, the features of the following claims may be integrated together in any
suitable and beneficial arrangement. This is to say that the combination of features is not limited by
the claims’ specific form, particularly the form of the dependent claims, such as claim numbering
and the like.
Claims (22)
1. A method of providing a virtual trying on experience to a user comprising:
extracting at least one image from a video including a plurality of video frames of a user in different orientations to provide at least one extracted image;
acquiring a 3D model of an item to be tried on the user and a 3D model of a generic representation of a human; and
combining the acquired 3D models with at least one extracted image, having the at least one extracted image as a background, to generate an output image representative of the virtual trying-on experience;
wherein each of the 3D models comprises an origin point, and the combining the acquired 3D models with the at least one extracted image comprises aligning the origin points of each of the 3D models in 3D space.
2. The method of claim 1, comprising determining user movement in the at least one extracted image.
3. The method of claim 2, wherein determining user movement in the at least one extracted image further comprises determining a maximum angle of rotation of the user in a first direction.
4. The method of claim 2 or 3, wherein determining user movement in the at least one extracted image further comprises determining a maximum angle of rotation of the user in a second direction.
5. The method of claim 3 or 4, wherein determining user movement in the at least one extracted image further comprises outputting a value indicative of the determined maximum angle of rotation of the user in the first direction when dependent on claim 3 or the second direction when dependent on claim 4.
6. The method of any one of the preceding claims, wherein the acquiring the 3D model of an item to be tried on the user and the 3D model of a generic representation of a human further comprises selecting a one of a plurality of 3D models of generic humans.
7. The method of any one of the preceding claims, further comprising determining an orientation of the user in the at least one extracted image and corresponding the orientation of the 3D models in a 3D space according to the determined orientation of the user.
8. The method of claim 1, further comprising adjusting the origin point of at least one of the 3D models.
9. The method of any one of claims 3 to 5 or any one of the claims dependent thereon, further comprising dividing the maximum rotation of the user in the first or the second direction into a predetermined number of set angles, and extracting as many images as the determined number of set angles.
10. The method of any one of the preceding claims, further comprising adjusting respective positions of the 3D models and the background according to user input.
11. The method of any one of the preceding claims, further comprising capturing the rotation of the user using a video capture device.
12. The method of any one of the preceding claims, wherein determining user movement comprises determining movement of a user’s head.
13. A method of providing a virtual trying on experience for a user, comprising:
receiving a plurality of video frames of a user’s head in different orientations to provide captured oriented user images;
identifying an origin point on a 3D model of a generic user;
identifying an origin point on a 3D model of a user-selected item to be tried on;
aligning the origin point of the 3D model of the generic user and the 3D model of an item to be tried on;
combining each captured oriented user image as a background with a generated representation of the user-selected item to be tried on based on the aligned 3D model of the generic user and the 3D model of the item to provide a series of combined images representative of the virtual trying on experience; and
displaying the series of combined images.
14. The method of claim 13, wherein receiving a plurality of video frames of a user’s head in different orientations to provide captured oriented user images further comprises selecting only a subset of all the captured video frames to use in the subsequent processing of the captured oriented user images.
15. The method of claim 14, wherein the selecting only a subset is a pre-determined subset, or user-selectable.
16. The method of any one of claims 13 to 15, further comprising identifying one or more attachment points of the item to the user.
17. The method of claim 16 wherein the method further comprises rotating or translating the attachment points in the 3D space to re-align the item to the user in a user-specified way.
18. The method of any one of claims 13 to 17, wherein providing a virtual trying on experience for a user comprises generating a visual representation of a user trying on an item, and wherein the trying on of an item on a user comprises trying on an item on a user’s head.
19. The method of claim 18 wherein the item is a pair of glasses.
20. A computer readable medium comprising instructions, which, when executed by one or more processors, result in the one or more processors carrying out the method of any one of the preceding claims.
21. A computer system arranged to carry out any one of the preceding method claims or provide instructions to carry out any one of the preceding method claims.
22. The method of claim 1 or 13, substantially as herein described with reference to any one of the Examples and/or
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1503831.8 | 2015-03-06 | ||
GB1503831.8A GB2536060B (en) | 2015-03-06 | 2015-03-06 | Virtual trying-on experience |
PCT/GB2016/050596 WO2016142668A1 (en) | 2015-03-06 | 2016-03-07 | Virtual trying-on experience |
Publications (2)
Publication Number | Publication Date |
---|---|
NZ736107A true NZ736107A (en) | 2021-08-27 |
NZ736107B2 NZ736107B2 (en) | 2021-11-30 |
Also Published As
Publication number | Publication date |
---|---|
AU2016230943A1 (en) | 2017-10-26 |
GB2536060B (en) | 2019-10-16 |
GB2536060A (en) | 2016-09-07 |
AU2016230943B2 (en) | 2021-03-25 |
WO2016142668A1 (en) | 2016-09-15 |
EP3266000A1 (en) | 2018-01-10 |
GB201503831D0 (en) | 2015-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10924676B2 (en) | Real-time visual effects for a live camera view | |
US11481869B2 (en) | Cross-domain image translation | |
KR102304124B1 (en) | Method and appartus for learning-based generating 3d model | |
WO2020029554A1 (en) | Augmented reality multi-plane model animation interaction method and device, apparatus, and storage medium | |
US11138306B2 (en) | Physics-based CAPTCHA | |
KR102433857B1 (en) | Device and method for creating dynamic virtual content in mixed reality | |
US11276238B2 (en) | Method, apparatus and electronic device for generating a three-dimensional effect based on a face | |
US10394221B2 (en) | 3D printing using 3D video data | |
JP2021524628A (en) | Lighting estimation | |
AU2016230943B2 (en) | Virtual trying-on experience | |
WO2022182369A1 (en) | Method and system providing temporary texture application to enhance 3d modeling | |
KR20230162107A (en) | Facial synthesis for head rotations in augmented reality content | |
CN115035224A (en) | Method and apparatus for image processing and reconstructed image generation | |
CN112862981B (en) | Method and apparatus for presenting a virtual representation, computer device and storage medium | |
NZ736107B2 (en) | Virtual trying-on experience | |
US10825258B1 (en) | Systems and methods for graph-based design of augmented-reality effects | |
CN110827411B (en) | Method, device, equipment and storage medium for displaying augmented reality model of self-adaptive environment | |
JP7556839B2 (en) | DEVICE AND METHOD FOR GENERATING DYNAMIC VIRTUAL CONTENT IN MIXED REALITY - Patent application | |
Munir et al. | 3D Single Image Face Reconstruction Approaches With Deep Neural Networks | |
CN118279372A (en) | Face key point detection method and electronic equipment | |
Jiao et al. | NEHand: Enhancing Hand Pose Estimation in the Wild through Synthetic and Motion Capture Datasets | |
Nguyen et al. | Fast and automatic 3D full head synthesis using iPhone | |
Cheng et al. | Object-level Data Augmentation for Visual 3D Object Detection in Autonomous Driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PSEA | Patent sealed | |
20220212 | RENW | Renewal (renewal fees accepted) | PATENT RENEWED FOR 1 YEAR UNTIL 07 MAR 2023 BY IP CENTRUM LIMITED |
20230201 | RENW | Renewal (renewal fees accepted) | PATENT RENEWED FOR 1 YEAR UNTIL 07 MAR 2024 BY IP CENTRUM LIMITED |
20240213 | RENW | Renewal (renewal fees accepted) | PATENT RENEWED FOR 1 YEAR UNTIL 07 MAR 2025 BY IP CENTRUM LIMITED |