WO2016142668A1 - Virtual trying-on experience - Google Patents

Virtual trying-on experience

Info

Publication number
WO2016142668A1
WO2016142668A1 (Application No. PCT/GB2016/050596)
Authority
WO
WIPO (PCT)
Prior art keywords
user
model
face
models
item
Prior art date
Application number
PCT/GB2016/050596
Other languages
English (en)
Inventor
David Mark GROVES
Jerome BOISSON
Original Assignee
Specsavers Optical Group Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Specsavers Optical Group Limited filed Critical Specsavers Optical Group Limited
Priority to AU2016230943A priority Critical patent/AU2016230943B2/en
Priority to NZ736107A priority patent/NZ736107B2/en
Priority to EP16710283.9A priority patent/EP3266000A1/fr
Publication of WO2016142668A1 publication Critical patent/WO2016142668A1/fr

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0641Shopping interfaces
    • G06Q30/0643Graphical representation of items or shoppers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/16Cloth
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2004Aligning objects, relative positioning of parts

Definitions

  • Embodiments of the invention relate to a computer implemented method for providing a visual representation of an item being tried on a user.
  • the present invention provides a method of providing a virtual trying on experience to a user as described in the accompanying claims.
  • Figure 1 shows an example method of providing a virtual trying on experience to a user according to an example embodiment of the invention
  • Figure 2 shows first and second more detailed portions of the method of Figure 1, according to an example embodiment of the invention
  • Figure 3 shows a third more detailed portion of the method of Figure 1, according to an example embodiment of the invention.
  • Figure 4 shows a high level diagram of the face tracking method, according to an example embodiment of the invention.
  • Figure 5 shows how the method retrieves faces in video sequences, according to an example embodiment of the invention
  • Figure 6 shows a detected face, according to an example embodiment of the invention
  • Figure 7 shows detected features of a face, according to an example embodiment of the invention
  • Figure 8 shows a pre-processing phase of the method that has the objective to find the most reliable frame containing a face from the video sequence, according to an example embodiment of the invention
  • Figure 9 shows an optional face model building phase of the method that serves to construct a suitable face model representation, according to an example embodiment of the invention
  • Figure 10 shows a processed video frame along with its corresponding (e.g. generic) 3D model of a head, according to an example embodiment of the invention
  • Figure 11 shows a sequential face tracking portion of the disclosed method, according to an example embodiment of the invention.
  • Figure 12 shows an exemplary embodiment of computer hardware on which the disclosed method may be run
  • Figure 13 shows another exemplary embodiment of computer hardware on which the disclosed method may be run.

Detailed description
  • Examples provide a method, apparatus and system for generating "a virtual try-on experience" of an item on a user, such as a pair of spectacles/glasses being tried on a user's head.
  • the virtual try-on experience may be displayed on a computer display, for example on a smartphone or tablet screen.
  • Examples also provide a computer program (or "app") comprising instructions, which when executed by one or more processors, carry out the disclosed methods.
  • the disclosed virtual try on experience methods and apparatuses allow a user to see what a selected item would look like on their person, typically their head.
  • Examples may use one or more generic 3D models of a human head, together with one or more 3D models of the item(s) to be tried on, for example models of selected pairs of glasses.
  • the one or more generic 3D models of a human head may include a female generic head and a male generic head.
  • generic head 3D models of different body shapes may be provided, and one may be selected for use in the generation of the "virtual try-on experience".
  • the different body shape generic heads may comprise different widths and/or heights of heads, or hat sizes.
  • the 3D models may be placed into a 3D space by reference to an origin.
  • the origin of the 3D models may be defined as a location in the 3D space from which the coordinates of each 3D model are referenced, in order to locate any given portion of the 3D model.
  • the origin of each model may correspond to one another, and to a specified nominally universal location, such as the location of a bridge of the nose.
  • the origins of the 3D models may be readily co-located in the 3D space, together with a corresponding location of the item to be virtually tried on, so that they may be naturally/suitably aligned.
  • the origin is not in itself a point in the model. It is merely a location by which points in the 3D models (both of the generic human head and of any item being tried on, such as glasses) may be referenced and suitably aligned. This is to say, examples may place both 3D models (i.e. the selected generic human head + item being tried on) into the same 3D space in a suitable (i.e. realistic) alignment by reference to the respective origins.
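As a minimal illustration of this origin-based alignment (a Python sketch using numpy; the vertex arrays, origin coordinates and function name are assumptions for the sketch, not data from the patent):

```python
import numpy as np

def place_in_scene(vertices, model_origin, world_origin):
    """Translate a model so that its own origin lands on the chosen world position."""
    return vertices - model_origin + world_origin

# Stand-in meshes: a generic head and a pair of glasses, each with an origin defined
# at the bridge of the nose in its own local coordinate system.
head_vertices = np.random.rand(1000, 3)
head_origin = np.array([0.0, 0.0, 0.05])
glasses_vertices = np.random.rand(500, 3)
glasses_origin = np.array([0.0, -0.01, 0.0])

world_origin = np.array([0.0, 1.6, 0.0])   # where the bridge of the nose sits in the 3D space

head_in_scene = place_in_scene(head_vertices, head_origin, world_origin)
glasses_in_scene = place_in_scene(glasses_vertices, glasses_origin, world_origin)
# Because both models are referenced to the same anatomical point, the glasses land in
# the normal wearing position on the head without any per-model adjustment.
```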
  • the 3D model of the head may not be made visible, but only used for occlusion or other calculations of the 3D model of the glasses.
  • the combined generic head (invisible) and glasses 3D models (suitably occluded) can then be placed on a background comprising an extracted image of the user taken from a video, so that the overall combination of the rendered 3D model of the glasses and the extracted video gives the impression of the glasses being worn by the user.
  • This combination process, as well as the occlusion calculations using the "invisible" generic human head, may be repeated for a number of extracted images at different nominal rotations.
  • examples do not need to generate a 3D model of a user's head, and therefore reduce the processing overhead requirements.
  • the utility of the examples is not materially affected as key issues pertaining to the virtual try on experience are maintained, such as occlusion of portions of the glasses by head extremities (e.g. eyes, nose, etc), during rotation, as discussed in more detail below.
  • Examples map the 3D models in the 3D space onto suitably captured and arranged images of the actual user of the system.
  • This mapping process may include finding images of a user's head whose angles of view match predetermined angles.
  • This matching may comprise determining, for a captured head rotation video, a predetermined number of angles of head between the two maximum angles of head rotation contained within the captured head rotation video.
  • examples enable use of the specific captured head rotation video, regardless of whether or not a pre-determined preferable maximum of head rotation has occurred (i.e. these examples would not require the user to re-capture a new video because the user had not turned their head sufficiently in the original capturing of their head rotation).
  • examples are more efficient than the prior art that requires a minimum head rotation.
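A minimal sketch of this idea (Python, numpy assumed; the yaw values and function name are illustrative): the target viewing angles are spread between whatever maximum rotations the user actually produced, so no fixed minimum sweep is required.

```python
import numpy as np

def target_angles(frame_yaws_deg, n_views=9):
    """Spread n_views target angles evenly between the measured extremes of head rotation."""
    max_left = min(frame_yaws_deg)    # most negative yaw, e.g. -35 degrees
    max_right = max(frame_yaws_deg)   # most positive yaw, e.g. +45 degrees
    return np.linspace(max_left, max_right, n_views)

tracked_yaws = [-3, -12, -27, -35, -20, 0, 18, 33, 45, 30, 10]   # hypothetical per-frame yaws
print(target_angles(tracked_yaws))   # 9 angles spanning the rotation the user actually gave
```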
  • the viewing angle(s) may be user-determined. This enables the system to portray the generated 3D try on experience in a way particularly desirable to the user, as opposed to only being portrayed in a specific, pre-determined manner that the user must abide by in order for the system to work. Thus, examples are more "natural" to use than the prior art.
  • the example software application is in the form of a virtualized method for a human user to try on glasses including a face tracking portion described in more detail below, where face tracking is used in an application according to examples to 'recognize' a user's face (i.e. compute a user's head/face pose).
  • Examples of the disclosed method may include extracting a still image(s) of a user (or just user's head portion) from a captured video of the user.
  • a movement and/or orientation of the user's head, i.e. position and viewing direction, may be determined from the extracted still image(s).
  • the image of the user may be used as a background image for a 3D space including 3D models of the item, such as glasses, to be virtually tried on, thereby creating the appearance of the item being tried on the user's actual captured head.
  • a 3D model of a generic head, i.e. not of the actual user may also be placed into the 3D space, overlying the background image of the user.
  • the generic human head model may be used as a mask, to allow suitable occlusion culling (i.e. hidden surface determination) to be carried out on the 3D model of the item being tried on, in relation to the user's head.
  • Use of a generic human head model provides higher processing efficiency/speed, without significantly reducing efficacy of the end result.
  • An origin of the 3D model of a generic human head may be located at a pre-determined point in the model, for example, corresponding to a bridge of a nose in the model. Other locations and numbers of reference points may be used instead.
  • a position at which the 3D model is located within the 3D space may also be set with reference to the origin of the model, i.e. by specifying the location of the origin of the 3D model within the 3D space.
  • the orientation of the 3D model may correspond to the determined viewing direction of the user.
  • a 3D model of the selected item to be tried on, for example the selected pair of glasses, may be placed into the 3D space.
  • An orientation of the glasses model may correspond to the viewing direction of the user.
  • An origin of the 3D glasses model may be provided and located at a point corresponding to the same point as the 3D model of the generic human head, for example also being at a bridge of a nose in the glasses model.
  • a position at which the 3D glasses model is located within the 3D space may be set with reference to the origin of the glasses 3D model, i.e. by specifying the location of the origin of the 3D model within the 3D space.
  • the origin of the 3D model of the glasses may be set so that the glasses substantially align to the normal wearing position on the 3D model of the human head.
  • An image of the glasses located on the user's head may then be generated based on the 3D models of the glasses and generic head (which may be used to mask portions of the glasses model which should not be visible and to generate shadow) and the background image of the user
  • the position of the glasses relative to the head may be altered by moving the location of the 3D glasses model in the 3D space, i.e. by setting a different location of an origin of the model, or by moving the origin of the 3D glasses model out of alignment with the origin of the 3D model of a generic human head.
  • the example application also may include video capture, which may refer to capturing a video of the user's head and splitting that video up into a plurality of video frames.
  • video capture may occur outside of the device displaying the visualization.
  • Each video frame may therefore comprise an image extracted from a video capture device or a video sequence captured by that or another video capture device.
  • Examples may include one or more 3D models, where a 3D model is a 3D representation of an object.
  • the 3D models may be of a generic human head and of an item to be visualized upon the head, such as a pair of glasses.
  • a 3D model as used herein may comprise a data set including one or more of: a set of locations in a 3D space defining the item being modelled; a set of data representing a texture or material of the item (or portion thereof) in the model; a mesh of data points defining the object; an origin, or reference point for the model, and other data useful in defining the physical item about which the 3D model relates. Examples may also use a scene, where the scene may contain one or more models, and including, for example, all the meshes for the 3D models used to visualize the glasses on a user's head.
  • Other data sets that may also be used in some examples include: a material data set describing how a 3D model should be rendered, often based upon textures; a mesh data set that may be the technical 3D representation of the 3D model; a texture data set that may include a graphic file that may be applied to a 3D model in order to give it a texture and/or a color.
  • Data sets that may be used in some embodiments may include CSV (for Comma Separated
  • Example embodiments may comprise code portions or software modules including, but not limited to: code portions provided by or through a Software Development Kit (SDK) of the target Operating System (OS), operable to enable execution of the application on that target OS, for example portions provided in the iOS SDK environment, XCode®; 3D model rendering, lighting and shadowing code portions (for example, for applying the glasses on the user's face); face tracking code portions; and metric provision code portions.
  • The software application comprises three core actions: video recording of the user's face with face-tracking; 3D model download and interpretation/representation of the 3D models (of the generic user head and the glasses being visualized on the user's head); and display of the combination of the 3D models and recorded video imagery. Examples may also include cloud / web enabled services catalog handling.
  • Figure 1 shows an example method 100 of providing a virtual try on experience for glasses on a user's head.
  • the method starts by capturing video 110 of the user's head rotating.
  • a previously captured video may be used instead.
  • the method then extracts images 120, for later processing, as disclosed in more detail below. From the extracted images, the method determines the object (in this example, the user's head) movement in the extracted images 130. Next, 3D models of the items (i.e. glasses) to be placed, and a 3D model of a generic human head on which to place the item models, are acquired 140, either from local storage (e.g. in the case of the generic human head model) or from a remote data repository (e.g. in the case of the item/glasses, as this may be a new model). More detailed descriptions of these processes 130 and 140 are given below with reference to Figure 2.
  • the 3D models are combined with one another and the extracted images (as background) at step 150. Then, an image of the visual representation of the object (user's head) with the item (glasses) thereon can be generated 160. This is described in more detail with respect to Figure 3, below.
  • the location of the items with respect to the object may be adjusted 170, typically according to user input. This step may occur after display of the image, as a result of the user desiring a slightly different output image.
  • Figure 2 shows a more detailed view 200 of a portion of the method, in particular, the object movement determination step 130 and 3D model acquisition step 140.
  • the object movement determination step 130 may be broken down into sub-steps in which a maximum rotation of the object (i.e. head) in a first direction (e.g. to the left) is determined 132, and then the maximum rotation in the second direction (e.g. to the right) may be determined 134. Finally, for this portion of the method, output values may be provided 136 indicative of the maximum rotation of the head in both first and second directions, for use in the subsequent processing of the extracted images and/or 3D models for placement within the 3D space relating to each extracted image.
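A minimal sketch of sub-steps 132 to 136 (Python, numpy assumed), under the assumption that the face tracker supplies one pose per extracted image, here a 3x3 rotation matrix plus a translation vector; the helper names are illustrative rather than taken from the patent.

```python
import numpy as np

def yaw_degrees(rotation):
    """Recover head yaw (rotation about the vertical axis) from a 3x3 rotation matrix."""
    return np.degrees(np.arctan2(rotation[0, 2], rotation[2, 2]))

def max_rotations(poses):
    """Steps 132/134: maximum rotation in each direction; step 136: output both values."""
    yaws = [yaw_degrees(rotation) for rotation, _translation in poses]
    return min(yaws), max(yaws)   # furthest turn one way (negative) and the other (positive)

poses = [(np.eye(3), np.zeros(3)),                                             # facing the camera
         (np.array([[0.94, 0.0, 0.34], [0.0, 1.0, 0.0], [-0.34, 0.0, 0.94]]), np.zeros(3))]
print(max_rotations(poses))   # approximately (0.0, 19.9)
```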
  • the different steps noted above in respect of the object movement determination may be optional.
  • the 3D model acquisition step 140 may be broken down into sub-steps in which a 3D model of a generic head is acquired 142, optionally including a selection step 144 of one 3D model of a generic human head out of a number of acquired generic 3D models of a human head (e.g. choosing between a male or female generic head 3D model).
  • the choice of generic head model may be under direct User control, or by automated selection, as described in more detail below.
  • the 3D models of the item(s) to be placed on the head e.g. glasses may then be acquired 146.
  • Although the two acquisition steps 142 and 146 may be carried out in either order, it is advantageous to choose the generic human head first, because this may allow the choice of 3D models of the items to be placed to be filtered so that only applicable models are available for subsequent acquisition. For example, choosing a female generic human head 3D model can filter out all male glasses.
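A minimal sketch of this filtering step (Python; the catalogue fields and example entries are assumptions for illustration only):

```python
catalogue = [
    {"id": "frame_001", "name": "Aviator", "target": "male"},
    {"id": "frame_002", "name": "Cat-eye", "target": "female"},
    {"id": "frame_003", "name": "Round",   "target": "unisex"},
]

def applicable_items(catalogue, selected_head):
    """Keep only items whose target matches the selected generic head (or is unisex)."""
    return [item for item in catalogue if item["target"] in (selected_head, "unisex")]

print(applicable_items(catalogue, "female"))   # male-only frames are filtered out
```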
  • Figure 3 shows a more detailed view 300 of the image generation step 160 of Figure 1.
  • the image generation step 160 may start by applying an extracted image as the background 162 to the visual representation of the item being tried on the user's head. Then the face tracking data (i.e. detected movement, such as the extent-of-rotation values discussed above, at step 136) may be used to align the 3D models of the generic human head and the 3D model of the glasses to the extracted image used as background 164 (the 3D models may already have been aligned to one another, for example using their origins, or that alignment can be carried out at this point as well, instead).
  • Hidden surface detection calculations (i.e. occlusion calculations) 166 may be carried out on the 3D model of the glasses, using the 3D model of the generic head, so that any parts of the glasses that should not be visible in the context of the particular extracted image in use at this point in time may be left out of the overall end 3D rendering of the combined scene (comprising extracted image background, and 3D model of glasses "on top").
  • the combined scene may then be output as a rendered image 168.
  • the process may repeat for a number of different extracted images, each depicting a different rotation of the user's head in space.
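The sketch below (Python with numpy; the image sizes, depth values and regions are placeholders) illustrates the occlusion idea per pixel: the generic head contributes only depth, never colour, so parts of the glasses lying behind it are dropped before the remainder is composited onto the extracted frame used as background.

```python
import numpy as np

H, W = 480, 640
background = np.zeros((H, W, 3), dtype=np.uint8)       # extracted video frame of the user
glasses_rgb = np.zeros((H, W, 3), dtype=np.uint8)      # rasterised colour of the glasses model
glasses_depth = np.full((H, W), np.inf)                # per-pixel depth of the glasses
head_depth = np.full((H, W), np.inf)                   # per-pixel depth of the invisible head

# Pretend the glasses cross a region where the head is closer to the camera (e.g. an arm
# of the glasses passing behind the ear): that part must be hidden.
glasses_rgb[200:220, 200:400] = (200, 30, 30)
glasses_depth[200:220, 200:400] = 1.0
head_depth[200:220, 300:400] = 0.5                     # head in front of the glasses here

visible = glasses_depth < head_depth                   # hidden-surface test against the head only
composite = background.copy()
composite[visible] = glasses_rgb[visible]              # the head itself is never drawn
```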
  • the extracted images used above may be taken from a video recording of the user's face, which may be carried out with a face tracking portion of the example method. This allows the user to record a video of themselves, so that the virtual glasses can be shown as they would look on their actual person. This is achieved in multiple steps. First, the application records a video capture of the user's head. Then the application intelligently splits this video into frames and sends these to the face tracking library module. The face tracking library module may then return the location results for each frame (i.e. where the user's face is in the frame and/or the 3D space/world, relative to a coordinate system (CS) that is linked to the camera). These results may be used to position the 3D glasses virtually on the user's face.
  • the face recording may be approximately 8 seconds long, and may be captured in high resolution video.
  • Video recording and face-tracking
  • the application may prompt the user to record a video of their head turning in a non-predefined, i.e. user-controllable, substantially horizontal sweep of the user's head.
  • the camera is typically located dead-ahead of the user's face, when the user's head is at the central point of the overall sweep, such that the entirety of the user's head is visible in the frame of the video. However, in other examples, the camera may not be so aligned.
  • the user has to move his head left and right to give the best results possible.
  • the location of the head in the sweep may be assessed by the face tracking module prior to capture of the video for use in the method, such that the user may be prompted to re-align their head before capture.
  • the method captures the video as is provided by the user, and carries on without requiring a second video capture.
  • the video may then be processed through the following steps.
  • the captured video is to be interpreted by the face-tracking process carried out by the face- tracking module.
  • the captured video of the user's head may be sampled, so that only a sub-set of the captured video images are used in the later processing steps. This may result in faster and/or more efficient processing, which in turn may also allow the example application to be performed by lesser processing resources or at greater energy efficiency.
  • One exemplary way to provide this sampling of the captured video images is to split the video into comprehensible frames.
  • this splitting action may involve recording the video at a higher initial capture rate (e.g. 30 frames per second over a total length of 8 seconds, giving a total of 240 video frames), but only selecting or further processing a pre-determined or user-definable number of those frames.
  • the splitting process may select every third frame of the originally captured video, which in the above example provides 80 output frames for subsequent processing, at a rate of 10 frames per second.
  • the processing load is now approximately 33% of the original processing load.
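A minimal sketch of this sub-sampling (Python with OpenCV; the file name is illustrative): an 8-second, 30 fps capture of roughly 240 frames is reduced to roughly 80 frames at an effective 10 frames per second by keeping every third frame.

```python
import cv2

def subsample_frames(video_path, step=3):
    frames = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break                      # end of the recorded sequence
        if index % step == 0:
            frames.append(frame)       # keep every `step`-th frame for face tracking
        index += 1
    capture.release()
    return frames

frames = subsample_frames("head_turn.mp4")
print(len(frames), "frames passed to the face-tracking module")
```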
  • the sub-selected 80 video frames are then sent to the Face-tracking module for analysis, as described in more detail below with respect to figures 3 to 10.
  • the application may have 80 sets of data: one for each sub-selected video frame. These sets of data contain, for each video frame, the position and orientation of the face.
  • the application may include a step of selecting a pre-defined number of best frames offered by the results returned by the face-tracking module. For example, the 9 best frames may be selected, based upon the face orientation, thereby covering all the angles of the face as it turns from left to right (or vice versa).
  • the selection may be made as follows: for frame 1 (left most), the face may be turned 35 degrees to the left; for frame 2, the face may be turned 28 degrees to the left; for frame 3, the face may be turned 20 degrees to the left; for frame 4, the face may be turned 10 degrees to the left; for frame 5, the face may be centered; for frame 6, the face may be turned 10 degrees to the right; for frame 7, the face may be turned 20 degrees to the right; for frame 8, the face may be turned 28 degrees to the right; for frame 9, the face may be turned 35 degrees to the right.
  • Other specific angles selected for each of the selection of best frames may be used, and may also be defined by the user instead.
  • non-linear/contiguous capture of images/frames of the head in the 3D space may be used.
  • the user's head may pass through any given target angle more than once during a recording. For example, if one degree left of centre were a target angle and the recording starts from a straight-ahead position, then the head being captured passes through this one-degree-left-of-centre angle twice: once en route to the left-most position and once more after rebounding from the left-most position.
  • the method has the option to decide which of the different instances is the best version of the angle to use for actual display to the user.
  • the images actually used to display to the user may not all be contiguous/sequential in time.
  • the method may instead use any arbitrary user-provided turn of head, determine the actual maximum turn in each direction, and then split that determined actual head turn into a discrete number of 'best frames'.
  • This process may also take into account a lack of symmetry of the overall head turn (i.e. more turn to the left than to the right, or vice versa).
  • the actual head turn may be, in actual fact, 35 degrees left and 45 degrees right, i.e. a total of 80 degrees, which may then be split into 9 frames at approximately 8.9 degrees each, or apportioned to reflect the asymmetry (for example, 3 on the left, one central, and 5 on the right).
  • the application may have selected 9 frames and associated sets of data.
  • the captured video may be rejected, and the user may be kindly asked to record a new head-turning video. For example, if the leftmost frame does not offer a face turned at least 20 degrees to the left, or the rightmost frame does not offer a face turned at least 20 degrees to the right, the user's video will be rejected.
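A minimal sketch of the best-frame selection and the rejection rule (Python; the data layout, a list of (frame, yaw) pairs, the sign convention and the target angles reused from the example above are illustrative assumptions):

```python
def select_best_frames(tracked, targets=(-35, -28, -20, -10, 0, 10, 20, 28, 35)):
    """tracked: list of (frame, yaw_degrees) pairs; returns the frame closest to each target."""
    return [min(tracked, key=lambda pair: abs(pair[1] - target)) for target in targets]

def sufficient_rotation(tracked, minimum=20):
    """Reject the capture if the head did not turn at least `minimum` degrees each way."""
    yaws = [yaw for _frame, yaw in tracked]
    return min(yaws) <= -minimum and max(yaws) >= minimum

tracked = [(f"frame_{i:03d}", yaw) for i, yaw in enumerate(range(-34, 42, 2))]
if sufficient_rotation(tracked):
    best_frames = select_best_frames(tracked)
else:
    print("please record a new head-turning video")
```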
  • the respective best frame images and data sets are saved within the application data storage location. These may then be used at a later stage, with the 3D models, which may also be stored in the application data storage location, or in another memory location in the device carrying out the example application, or even in a networked location, such as a central cloud storage repository.
  • the application may start from the captured high-definition, high-polygon models (e.g. of the glasses (or other product) to be tried on). Since the application has to run on mobile devices, these 3D models may be reworked in order to adapt to the low calculation power and low memory offered by the mobile devices, for example to reduce the number of polygons in each of the models.
  • the textures may be images and, if not reworked, may overflow the device memory and lead to application crashes. Different texture sets may therefore be prepared: for a first type of device, e.g. a mobile device such as a smartphone using iOS, smaller textures may be used, which are also more suited to a 3G connection; for a second type of device, such as a portable device like a tablet, textures may be used that are more suited to a physically larger screen and/or a higher-rate wifi connection.
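A small sketch of choosing between the reworked texture sets per device class (Python; the sizes and keys are illustrative assumptions, not values from the patent):

```python
TEXTURE_SETS = {
    "phone":  {"max_texture_px": 512,  "suited_to": "3G"},     # smaller textures for smartphones
    "tablet": {"max_texture_px": 2048, "suited_to": "wifi"},   # larger textures for tablets
}

def texture_set_for(device_class):
    return TEXTURE_SETS.get(device_class, TEXTURE_SETS["phone"])   # default to the lighter set

print(texture_set_for("tablet"))
```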
  • the final 3D models may be exported, for example in a mesh format.
  • the 3D models may be exported in any suitable 3D model data format, and the invention is not so limited.
  • An example of a suitable data format is the Ogre3D format.
  • the 3D models may be located in a central data repository, e.g. on a server, and may be optionally compressed, for example, archived in a ZIP format.
  • the application may include respective decompression modules.
  • the application may download them from the server and unzip them. When that is done, the application can pass the 3D models to the rendering engine.
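A minimal sketch of the download-and-unzip step (Python standard library only; the URL and destination path are purely illustrative):

```python
import io
import urllib.request
import zipfile

def download_model(url, destination):
    """Fetch a zipped 3D model archive and unpack it for the rendering engine."""
    with urllib.request.urlopen(url) as response:
        archive = zipfile.ZipFile(io.BytesIO(response.read()))
    archive.extractall(destination)      # unpacked mesh / material / texture files
    return archive.namelist()            # file names to hand over to the renderer

model_files = download_model("https://example.com/models/frame_001.zip", "./models/frame_001")
```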
  • 3D Rendering Engine
  • the 3D rendering engine used in this application takes a 3D model and passes the rendered files, along with the face tracking data sets and the respective video frames from the video, to the graphics/display engine.
  • the 3D graphics engine may render the end image according to the process as described in relation to Figure 3.
  • the rendering engine may do the following steps to create an image of the user wearing the virtual glasses: 1) open the 3D files and interpret them to create a 3D representation (e.g. the 3D glasses); 2) for each of the 9 frames used in the app: apply the video frame in the background (so the user's face is in the background) and then display the 3D glasses in front of the background; using the face tracking data set (face position and orientation), the engine will position the 3D models exactly on the user's face; 3) a "screenshot" of the 3D frames placed on the background will be taken; 4) the 9 screenshots are then displayed to the user.
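The loop below is a runnable but heavily simplified sketch of that four-step sequence (Python with numpy). The "models" are reduced to 2D points and the "poses" to yaw angles plus a face centre, so it only illustrates the control flow, not the patent's actual rendering engine.

```python
import numpy as np

def apply_pose(points, yaw_deg, centre):
    """Rotate the simplified glasses points by the tracked yaw and move them onto the face."""
    angle = np.radians(yaw_deg)
    rot = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    return points @ rot.T + centre

def render_try_on(best_frames, face_data, glasses_points):
    screenshots = []
    for frame, (yaw, centre) in zip(best_frames, face_data):       # one pass per selected frame
        image = frame.copy()                                       # 2) video frame as background
        for x, y in apply_pose(glasses_points, yaw, centre).astype(int):
            if 0 <= y < image.shape[0] and 0 <= x < image.shape[1]:
                image[y, x] = (0, 0, 0)                            # draw the posed glasses on top
        screenshots.append(image)                                  # 3) "screenshot" of the result
    return screenshots                                             # 4) nine images to browse

frames = [np.full((240, 320, 3), 255, dtype=np.uint8) for _ in range(9)]
face_data = [(yaw, np.array([160, 120])) for yaw in (-35, -28, -20, -10, 0, 10, 20, 28, 35)]
glasses_points = np.stack([np.linspace(-40, 40, 81), np.zeros(81)], axis=1)
screenshots = render_try_on(frames, face_data, glasses_points)
```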
  • the user may now "browse" through the rendered screenshots for each frame, in which the rendered glasses appear to be on the user's face.
  • the catalog containing all the frames is downloaded by the application from a static URL on the server.
  • the catalog will allow the application to know where to look for 3D glasses and when to display them. This catalog will for example describe all the frames for the "Designer" category, so the application can fetch the corresponding 3D files.
  • the catalog may use a CSV format for the data storage.
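A minimal sketch of fetching and reading such a catalog (Python standard library; the URL and the column names are illustrative assumptions):

```python
import csv
import io
import urllib.request

CATALOG_URL = "https://example.com/catalog/frames.csv"    # hypothetical static URL

def load_catalog(url):
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))         # e.g. columns: id, category, zip_url

catalog = load_catalog(CATALOG_URL)
designer_frames = [row for row in catalog if row["category"] == "Designer"]
```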
  • example applications include processes to: carry out video recording, processing and face tracking data extraction; and download 3D models from a server, interpreting and adjusting those models according to the face-tracking data.
  • the downloading of the 3D models may comprise downloading a catalog of different useable 3D models of the items to be shown (e.g. glasses), or different generic human head 3D models.
  • a frame is an image extracted from a video captured by a video capture device or a previously captured input video sequence
  • a face model is a 3D mesh that represents a face
  • a key point (also named interest point) is an image location whose neighbouring pixel intensities vary substantially in both horizontal and vertical directions
  • a pose is a vector composed of a rotation and a translation to describe rigid affine transformations in space.
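As a small illustration (Python with numpy), such a pose, a rotation plus a translation, can conveniently be stored as a single 4x4 rigid-transform matrix; the helper name is illustrative.

```python
import numpy as np

def make_pose(rotation_3x3, translation_3):
    pose = np.eye(4)
    pose[:3, :3] = rotation_3x3     # orientation of the face
    pose[:3, 3] = translation_3     # position of the face relative to the camera
    return pose

identity_pose = make_pose(np.eye(3), np.zeros(3))   # face centred and looking at the camera
```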
  • Figure 4 shows a high level diagram of the face tracking method.
  • a set of input images 402 is used by the face tracking module 410 to provide an output set of vectors 402, which may be referred to as "pose vectors".
  • the face-tracking process may include a face-tracking engine that may be decomposed into the following main phases: (1) pre-processing the (pre-recorded) video sequence 510, in order to find the frame containing the most "reliable" face 520; (2) optionally, building a 2.5D face model corresponding to the current user's face, or choosing the generic model of a human head most applicable to the captured user head image 530; and (3) tracking the face model sequentially using the (part or whole) video sequence 540.
  • FIG. 8 shows a pre-processing phase of the method 800 whose objective is to find the most reliable frame containing a face from the video sequence. This phase is decomposed into 3 main sub-steps:
  • Face detection step 810 (and figure 6) which includes detecting the presence of a face in each video frame. When a face is found, its position is calculated.
  • Non-rigid face detection step 830, which includes discovering the positions of face features (e.g. eyes, nose, mouth, etc).
  • the face detection step (a) 810 may discover faces in the video frames using a sliding-window technique. This technique includes comparing each part of the frame, using pyramidal image techniques, to find whether a part of the frame is similar to a face signature. The face signature(s) are stored in a file or a data structure named a classifier. To learn the classifier, thousands of previously known face images may have been processed. The face detection reiterates 820 until a suitable face is output.
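A readily available stand-in for such a learnt classifier with a sliding-window, pyramidal search is OpenCV's Haar cascade face detector, sketched below (Python; the frame file name is illustrative, and this is not necessarily the classifier used by the patent):

```python
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
classifier = cv2.CascadeClassifier(cascade_path)      # the stored "face signature" / classifier

frame = cv2.imread("frame_000.png")
if frame is not None:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale slides a window over an image pyramid and keeps face-like regions.
    faces = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        print("face found at", (x, y), "size", (w, h))
```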
  • the Non-rigid face detection step (b) is more complex since it tries to detect elements of the face (also called face features, or landmarks).
  • This non-rigid face detection step may take advantage of the fact that a face has been correctly detected in step (a). The face detection is then refined to detect face elements, for example using face detection techniques known in the art. As in (a), a signature of face elements has been learnt using hundreds of face representations.
  • This step (b) is then able to compute a 2D shape that corresponds to the face features (see an illustration in figure 7). Steps (a) and (b) may be repeated on all or on a subset of the captured frames that comprises the video sequence being assessed.
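For illustration, a pre-trained landmark detector such as dlib's 68-point shape predictor can play the role of this non-rigid step (Python; the model file must be obtained separately and the frame path is illustrative; the patent does not mandate this particular library):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("most_reliable_frame.png")
if frame is not None:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for rect in detector(gray):                          # step (a): rigid face detection
        shape = predictor(gray, rect)                    # step (b): eyes, nose, mouth positions
        landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
        print(len(landmarks), "face feature points found")
```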
  • the number of frames processed depends on the total number of frames of the video sequence, or the sub-selection of video frames used. These may be based upon, for example, the processing capacity of the system (e.g. processor, memory, etc), or on the time the user is (or deemed to be) willing to wait before results to appear.
  • step (c): if steps (a) and (b) have succeeded for at least one frame, then this step is processed to find the frame in the video sequence that contains the most reliable face. The notion of a reliable face can be defined as follows:
  • the face-tracking algorithm changes state and tries to construct a face model representation, or choose a most appropriate generic head model for use, or simply uses a standard generic model without any selection thereof 890.
  • Figure 9 shows the optional face model building phase of the method 900 that serves to construct a suitable face model representation, i.e. building an approximate geometry of the face along with a textured signature of the face and corresponding keypoints.
  • this textured 3D model is referred to as a keyframe.
  • the approximate geometry of the face may instead be taken from a pre-determined generic 3D model of a human face.
  • the keyframe may be constructed using the most reliable frame of the video sequence. This phase is decomposed into the following steps:
  • step (c) Saving a 2D image of the face by cropping the face available in the most reliable frame.
  • the position of the face elements may be used to create the 3D model of the face. These face elements may give essential information about the deformation of the face. A mean (i.e. average) 3D face model available statically is then deformed using these 2D face elements. This face model may then be positioned and oriented according to the camera position. This may be done by optimizing an energy function that is expressed using the image position of face elements and their corresponding 3D position on the model.
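One common way to realise this position-and-orientation step, standing in here for the energy minimisation described above, is a perspective-n-point solve between a few 3D points on the mean face model and their detected 2D image positions (Python with OpenCV; the model coordinates, image points and camera intrinsics below are illustrative numbers only):

```python
import cv2
import numpy as np

# Rough 3D positions of a few face elements on a mean face model (arbitrary model units).
model_points = np.array([
    [0.0,     0.0,    0.0],     # nose tip
    [0.0,  -330.0,  -65.0],     # chin
    [-225.0, 170.0, -135.0],    # left eye outer corner
    [225.0,  170.0, -135.0],    # right eye outer corner
    [-150.0, -150.0, -125.0],   # left mouth corner
    [150.0, -150.0, -125.0],    # right mouth corner
], dtype=np.float64)

# Matching 2D detections in the most reliable frame (pixels, illustrative).
image_points = np.array([[320, 240], [330, 380], [250, 170], [390, 165], [270, 310], [370, 305]],
                        dtype=np.float64)

focal, cx, cy = 640.0, 320.0, 240.0
camera_matrix = np.array([[focal, 0, cx], [0, focal, cy], [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))                       # assume no lens distortion for the sketch

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
if ok:
    print("face orientation (Rodrigues vector):", rvec.ravel())
    print("face position relative to the camera:", tvec.ravel())
```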
  • keypoints (referred to sometimes as interest points or corner points) may then be detected in the face image.
  • a keypoint can be detected at a specific image location if the neighboring pixel intensities vary substantially in both horizontal and vertical directions.
  • the face representation (an image of the face) may also be memorized (i.e. saved) so that the process can match its appearance in the remaining frames of the video capture.
  • Steps (a), (b) and (c) aim to construct a keyframe of the face. This keyframe is used to track the face of the user in the remaining video frames.
  • the remaining video frames may be processed with the objective of tracking the face sequentially. Assuming that the face's appearance in contiguous video frames is similar helps the described method track the face frame after frame. This is because the portion of the image around each keypoint does not change too much from one frame to another, therefore comparing/matching keypoints (in fact, their neighbouring image appearance) is easier. Any suitable technique known in the art to track the face sequentially may be used, for example as described in "Stable Real-Time 3D Tracking using Online and Offline Information" by L. Vacchetti, V. Lepetit and P. Fua.
  • the keyframe may be used to match keypoints computed in earlier described face model building phase and keypoints computed in each video frame.
  • from these keypoint matches, the pose of the face (i.e. its position and orientation) may be computed for each processed video frame.
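A minimal sketch of that matching step (Python with OpenCV): keypoints stored with the keyframe are matched against keypoints found in a new frame; in the full method the matched points, whose 3D positions on the face model are known from the keyframe, would then feed a pose solver like the one sketched earlier. ORB features and the synthetic images are purely illustrative; the patent does not prescribe a specific keypoint type.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_to_keyframe(keyframe_gray, frame_gray, keep=50):
    kp1, des1 = orb.detectAndCompute(keyframe_gray, None)   # keypoints memorised with the keyframe
    kp2, des2 = orb.detectAndCompute(frame_gray, None)      # keypoints in the current frame
    if des1 is None or des2 is None:
        return []
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches[:keep]]

# Synthetic example: the "current frame" is a slightly shifted copy of the keyframe, so the
# neighbourhood of each keypoint barely changes, which is what makes matching easy.
keyframe = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
frame = np.roll(keyframe, 3, axis=1)
print(len(match_to_keyframe(keyframe, frame)), "keypoint matches")
```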
  • Figure 10 shows a processed video frame along with its corresponding (e.g. generic) 3D model of a head.
  • the face poses (and, in some examples, the corresponding generic human face model) are returned so that the rendering module can use this information to display virtual objects on top of the video sequence.
  • This process is shown in Figure 11, and includes tracking sequentially the Face model using the keyframe 1110, and returning Face poses when available 1130, via iterative process 1120 whilst frames are available for processing, until no more frames are available for processing.
  • the invention may be implemented as a computer program for running on a computer system, said computer system comprising at least one processor, where the computer program includes executable code portions for execution by the said at least one processor, in order for the computer system to perform any method according to the described examples.
  • the computer system may be a programmable apparatus, such as, but not limited to a personal computer, tablet or smartphone apparatus.
  • Figure 12 shows an exemplary generic embodiment of such a computer system 1200 comprising one or more processor(s) 1240, system control logic 1220 coupled with at least one of the processor(s) 1240, system memory 1210 coupled with system control logic 1220, non-volatile memory (NVM)/storage 1230 coupled with system control logic 1220, and a network interface 1260 coupled with system control logic 1220.
  • the system control logic 1220 may also be coupled to Input/Output devices 1250.
  • Processor(s) 1240 may include one or more single-core or multi-core processors.
  • Processor(s) 1240 may include any combination of general-purpose processors and dedicated processors (e.g., graphics processors, application processors, etc.).
  • Processors 1240 may be operable to carry out the above described methods, using suitable instructions or programs (i.e. operate via use of processor, or other logic, instructions).
  • the instructions may be stored in system memory 1210, as glasses visualisation application 1205, or additionally or alternatively may be stored in (NVM)/storage 1230, as NVM glasses visualisation application portion 1235, to thereby instruct the one or more processors 1240 to carry out the virtual trying on experience methods described herein.
  • the system memory 1210 may also include 3D model data 1215, whilst NVM storage 1230 may include 3D model Data 1237. These may serve to store 3D models of the items to be placed, such as glasses, and one or more generic 3D models of a human head.
  • System control logic 1220 may include any suitable interface controllers to provide for any suitable interface to at least one of the processor(s) 1240 and/or to any suitable device or component in communication with system control logic 1220.
  • System control logic 1220 may include one or more memory controller(s) (not shown) to provide an interface to system memory 1210.
  • System memory 1210 may be used to load and store data and/or instructions, for example, for system 1200.
  • System memory 1210 for one embodiment may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM), for example.
  • NVM/storage 1230 may include one or more tangible, non-transitory computer-readable media used to store data and/or instructions, for example.
  • NVM/storage 1230 may include any suitable non-volatile memory, such as flash memory, for example, and/or may include any suitable non-volatile storage device(s), such as one or more hard disk drive(s) (HDD(s)), one or more compact disk (CD) drive(s), and/or one or more digital versatile disk (DVD) drive(s), for example.
  • the NVM/storage 1230 may include a storage resource physically part of a device on which the system 1200 is installed or it may be accessible by, but not necessarily a part of, the device.
  • the NVM/storage 1230 may be accessed over a network via the network interface 1260.
  • System memory 1210 and NVM/storage 1230 may include, in particular, temporary and persistent copies, respectively, of the instruction memory portions holding the glasses visualisation application 1205 and 1235.
  • Network interface 1260 may provide a radio interface for system 1200 to communicate over one or more network(s) (e.g. wireless communication network) and/or with any other suitable device.
  • Figure 13 shows a more specific example device for carrying out the disclosed virtual trying-on experience method, in particular a smartphone embodiment 1300, where the method is carried out by an "app" downloaded to the smartphone 1300 via antenna 1310, to be run on a computer system 1200 (as per Figure 12) within the smartphone 1300.
  • the smartphone 1300 further includes a display and/or touch screen display 1320 for displaying the virtual try-on experience image formed according to the above described examples.
  • the smartphone 1300 may optionally also include a set of dedicated input devices, such as keyboard 1320, especially when a touchscreen display is not provided.
  • a computer program may be formed of a list of executable instructions such as a particular application program and/or an operating system.
  • the computer program may for example include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application ("app"), an applet, a servlet, a source code portion, an object code portion, a shared library/dynamic load library and/or any other sequence of instructions designed for execution on a suitable computer system.
  • the computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to the programmable apparatus, such as an information processing system.
  • the computer readable media may include, for example and without limitation, any one or more of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, Blu-Ray®, etc.), digital video disk storage media (DVD, DVD-R, DVD-RW, etc.) or other high density optical media;
  • Nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, DRAM, DDR RAM etc.; and data transmission media including computer networks, point-to-point telecommunication equipment, and carrier wave transmission media, and the like.
  • Embodiments of the invention may include tangible and non-tangible embodiments, transitory and non-transitory embodiments and are not limited to any specific form of computer readable media used.
  • a computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process.
  • An operating system is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources.
  • An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
  • the computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices.
  • the computer system processes information according to the computer program and produces resultant output information via I/O devices.
  • logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements.
  • architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality.
  • any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved.
  • any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components.
  • any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.
  • the examples, or portions thereof may be implemented as soft or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
  • the invention is not limited to physical devices or units implemented in nonprogrammable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as 'computer systems'.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word 'comprising' does not exclude the presence of other elements or steps than those listed in a claim.
  • the terms "a" or "an", as used herein, are defined as one or more than one.
  • Examples provide a method of providing a virtual trying on experience to a user comprising extracting at least one image from a video including a plurality of video frames of a user in different orientations to provide at least one extracted image, determining user movement in the at least one extracted image, acquiring 3D models of an item to be tried on the user and a generic representation of a human, combining the acquired 3D models and at least one extracted image as the background, and generating an output image representative of the virtual trying-on experience.
  • the determining user movement in the at least one extracted image further comprises determining a maximum angle of rotation of the user in a first direction.
  • the determining user movement in the at least one extracted image further comprises determining a maximum angle of rotation of the user in a second direction.
  • the determining user movement in the at least one extracted image further comprises outputting a value indicative of the determined maximum angle of rotation of the user in the first or second directions.
  • the acquiring 3D models of an item to be tried on the user and a generic representation of a human further comprises selecting one of a plurality of 3D models of available generic humans.
  • the method further comprises determining an origin point in each of the 3D models used, wherein the respective origin point in each 3D model is placed to allow alignment of the 3D models with one another.
  • the method further comprises determining an orientation of the user in the at least one extracted image and corresponding the orientation of the 3D models in a 3D space according to the determined orientation of the user.
  • the method further comprises adjusting an origin of at least one 3D model.
  • the method further comprises aligning the origins of the 3D models.
  • the method further comprises dividing the maximum rotation of the user in first and second directions into a predetermined number of set angles, and extracting as many images as the determined number of set angles.
  • the method further comprises adjusting respective positions of the 3D models and the background according to user input.
  • the method further comprises capturing the rotation of the user using a video capture device.
  • in the method, the determining user movement comprises determining movement of a user's head.
  • a method of providing a virtual trying on experience for a user comprising receiving a plurality of video frames of a user's head in different orientations to provide captured oriented user images, identifying origin reference points on the captured oriented user images, identifying an origin on a 3D model of a generic user, identifying an origin reference point on a 3D model of a user-selected item to be tried on, aligning the reference points of the selected captured oriented user images, the 3D model of a generic user and the 3D model of the item to be tried on, combining the captured oriented user images with a generated representation of the user-selected item to be tried on to provide a combined image, and displaying the combined image.
  • the receiving a plurality of video frames of a user's head in different orientations to provide captured oriented user images further comprises selecting only a subset of all the captured video frames to use in the subsequent processing of the captured oriented user images.
  • the selected subset may be pre-determined, or may be user-selectable.
  • the method further comprises identifying one or more attachment points of the item to the user.
  • the method further comprises rotating or translating the attachment points in the 3D space to re-align the item to the user in a user specified way.
  • the providing a virtual trying on experience for a user comprises generating a visual representation of a user trying on an item, and wherein the trying on of an item on a user comprises trying on an item on a user's head.
  • the item being tried on is a pair of glasses.
  • a method of providing a virtual trying on experience for a user comprising generating a visual representation of a user trying on an item from at least one 3D model of an item to be tried on, at least one 3D generic model of a human head and at least one extracted image of the user head.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method of providing a virtual trying-on experience to a user is disclosed, comprising extracting at least one image from a video comprising a plurality of video frames of a user in different orientations to provide at least one extracted image, acquiring 3D models of an item to be tried on the user and a generic representation of a human, combining the acquired 3D models and at least one extracted image as the background, and generating an output image representative of the virtual trying-on experience. Apparatus for carrying out the methods is also disclosed.
PCT/GB2016/050596 2015-03-06 2016-03-07 Virtual trying-on experience WO2016142668A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2016230943A AU2016230943B2 (en) 2015-03-06 2016-03-07 Virtual trying-on experience
NZ736107A NZ736107B2 (en) 2015-03-06 2016-03-07 Virtual trying-on experience
EP16710283.9A EP3266000A1 (fr) 2015-03-06 2016-03-07 Virtual trying-on experience

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1503831.8 2015-03-06
GB1503831.8A GB2536060B (en) 2015-03-06 2015-03-06 Virtual trying-on experience

Publications (1)

Publication Number Publication Date
WO2016142668A1 true WO2016142668A1 (fr) 2016-09-15

Family

ID=52998515

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2016/050596 WO2016142668A1 (fr) 2015-03-06 2016-03-07 Virtual trying-on experience

Country Status (4)

Country Link
EP (1) EP3266000A1 (fr)
AU (1) AU2016230943B2 (fr)
GB (1) GB2536060B (fr)
WO (1) WO2016142668A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10685457B2 (en) 2018-11-15 2020-06-16 Vision Service Plan Systems and methods for visualizing eyewear on a user

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4006628A1 (fr) * 2020-11-27 2022-06-01 Fielmann Ventures GmbH Computer-implemented method for providing and positioning spectacles and for centring the spectacle lenses

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6144388A (en) * 1998-03-06 2000-11-07 Bornstein; Raanan Process for displaying articles of clothing on an image of a person
EP2113881A1 (fr) * 2008-04-29 2009-11-04 Holiton Limited Image producing method and device
US8708494B1 (en) * 2012-01-30 2014-04-29 Ditto Technologies, Inc. Displaying glasses with recorded images
US20150055085A1 (en) * 2013-08-22 2015-02-26 Bespoke, Inc. Method and system to create products

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8733936B1 (en) * 2012-01-30 2014-05-27 Ditto Technologies, Inc. Fitting glasses frames to a user
US20130335416A1 (en) * 2012-05-23 2013-12-19 1-800 Contacts, Inc. Systems and methods for generating a 3-d model of a virtual try-on product
US9286715B2 (en) * 2012-05-23 2016-03-15 Glasses.Com Inc. Systems and methods for adjusting a virtual try-on
TW201445457A (zh) * 2013-05-29 2014-12-01 Univ Ming Chuan Virtual glasses try-on method and device thereof
CN103400119B (zh) * 2013-07-31 2017-02-15 徐坚 Interactive display method for hybrid display glasses based on face recognition technology

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6144388A (en) * 1998-03-06 2000-11-07 Bornstein; Raanan Process for displaying articles of clothing on an image of a person
EP2113881A1 (fr) * 2008-04-29 2009-11-04 Holiton Limited Image producing method and device
US8708494B1 (en) * 2012-01-30 2014-04-29 Ditto Technologies, Inc. Displaying glasses with recorded images
US20150055085A1 (en) * 2013-08-22 2015-02-26 Bespoke, Inc. Method and system to create products

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10685457B2 (en) 2018-11-15 2020-06-16 Vision Service Plan Systems and methods for visualizing eyewear on a user

Also Published As

Publication number Publication date
GB201503831D0 (en) 2015-04-22
NZ736107A (en) 2021-08-27
GB2536060A (en) 2016-09-07
EP3266000A1 (fr) 2018-01-10
AU2016230943A1 (en) 2017-10-26
AU2016230943B2 (en) 2021-03-25
GB2536060B (en) 2019-10-16

Similar Documents

Publication Publication Date Title
CN111243093B (zh) Three-dimensional face mesh generation method, apparatus, device and storage medium
US20200013212A1 (en) Facial image replacement using 3-dimensional modelling techniques
WO2020029554A1 (fr) Augmented reality multi-plane model animation interaction method and device, apparatus and storage medium
KR102304124B1 (ko) Learning-based 3D model generation apparatus and method
US20220044352A1 (en) Cross-domain image translation
JP7386812B2 (ja) Lighting estimation
US20180276882A1 (en) Systems and methods for augmented reality art creation
CN108986016B (zh) Image beautification method, apparatus and electronic device
US11276238B2 (en) Method, apparatus and electronic device for generating a three-dimensional effect based on a face
US11138306B2 (en) Physics-based CAPTCHA
KR102433857B1 (ko) Device and method for generating dynamic virtual contents in mixed reality
US20230394740A1 (en) Method and system providing temporary texture application to enhance 3d modeling
KR20230162107A (ko) Face synthesis for head rotations in augmented reality content
CN112308977A (zh) Video processing method, video processing apparatus and storage medium
AU2016230943B2 (en) Virtual trying-on experience
CN113178017A (zh) AR data display method and apparatus, electronic device and storage medium
CN113240789A (zh) Virtual object construction method and apparatus
CN116342831A (zh) Three-dimensional scene reconstruction method and apparatus, computer device and storage medium
CN108989681A (zh) Panoramic image generation method and apparatus
CN110827411B (zh) Environment-adaptive augmented reality model display method, apparatus, device and storage medium
NZ736107B2 (en) Virtual trying-on experience
US10825258B1 (en) Systems and methods for graph-based design of augmented-reality effects
Cheng et al. Object-level Data Augmentation for Visual 3D Object Detection in Autonomous Driving
CN111445573A (zh) Human hand modeling method, system, chip, electronic device and medium
Nguyen et al. Fast and automatic 3D full head synthesis using iPhone

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16710283

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2016710283

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2016230943

Country of ref document: AU

Date of ref document: 20160307

Kind code of ref document: A