SG181178A1 - A practical virtual fitting room for retail stores - Google Patents

A practical virtual fitting room for retail stores

Info

Publication number
SG181178A1
SG181178A1
Authority
SG
Singapore
Prior art keywords
user
garment
image
garments
silhouette
Prior art date
Application number
SG2010081750A
Inventor
Chin Huat Cheng
Original Assignee
Chin Huat Cheng
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chin Huat Cheng filed Critical Chin Huat Cheng
Priority to SG2010081750A priority Critical patent/SG181178A1/en
Publication of SG181178A1 publication Critical patent/SG181178A1/en


Abstract

A method and apparatus for generating and using a virtual fitting room suited to retail stores is disclosed. The virtual fitting room apparatus does not require a dedicated facility to operate. There is no need for an advance scan of the user prior to use. During use, a user does not need to hold or wear any contraptions. This approach allows any shopper to quickly preview garments on his or her body before trying on shortlisted garments in a real fitting room. The present invention also discloses a hands-free interface for garment navigation and selection. (Refer to FIG. 1)

Description

A Practical Virtual Fitting Room for Retail Stores
Works Cited

Binford, T. O., & Baker, H. H. (1981). Depth from edge and intensity based stereo. International Joint Conference on Artificial Intelligence (pp. 631-636). Vancouver.
Cone, D. (1998). Patent No. 5,850,222. United States of America.
Eisert, P. (2008). Virtual Clothing. Retrieved from Fraunhofer Heinrich Hertz Institute Web site: http://www.hhi.fraunhofer.de/en/departments/image-processing/computer-vision-graphics/3d-virtual-and-augmented-environments/virtual-mirror/virtual-clothing/
Eyemagnet. (n.d.). Motion Detection. Retrieved April 11, 2010, from Eyemagnet Ltd. Web site: http://www.eyemagnet.com/
Feld, A., Nevo, N., & Cegla, E. (2006). Patent No. 7,149,665. United States of America.
Furutanisangyou Co. Ltd. (2009). Magic Mirror. Retrieved April 11, 2010, from Furutanisangyou Co. Ltd. Web site: http://www.furutani-sangyou.co.jp/service/mahou-no-kagami.html
Gazzuolo, E. B. (2003). Patent No. 6,546,309. United States of America.
Hartley, R. I. (1999). Theory and Practice of Projective Rectification. International Journal of Computer Vision, 35, 115-127.
Hearst Communications, Inc. (2010). Virtual Dressing Room. Retrieved April 11, 2010, from Seventeen: http://www.seventeen.com/fashion/virtual-dressing-room
Moreno, F. J. (2009). Patent No. 20090115777. United States of America.
Okada, R., & Kondo, N. (2008). Patent No. 7,433,753. United States of America.
Straka, M., & Ruther, M. (n.d.). NARKISSOS: The Virtual Dressing Room. Retrieved April 11, 2010, from Robot Vision Lab Web site: http://rvlab.icg.tugraz.at/
Vassilev, T. I., Lambros, C. Y., & Spanlang, B. (2005). Patent No. 20050052461. United States of America.
Vock, C. A. (2003). Patent No. 200301011105. United States of America.
Zugara. (2009, November 16). Fashionista AR Dressing Room from Zugara. Retrieved from The Future Digital Life Web site: http://thomaskcarpenter.com/2009/11/16/fashionista-ar-dressing-room-zugara/
BACKGROUND OF THE INVENTION
[0001] The present invention relates to a method and apparatus for generating and using a virtual fitting room (VFR) suited to retail stores. The virtual fitting room apparatus does not require a dedicated facility to operate. There is no need for an advance scan of the user prior to use. During use, a user does not need to hold or wear any contraptions. This approach allows any shopper to quickly preview garments on his or her body before trying on shortlisted garments in a real fitting room. The present invention also discloses a hands-free interface for garment navigation and selection.
DESCRIPTION OF THE RELATED ART
[0002] An early attempt at creating VFRs, also known as virtual dressing rooms or virtual changing rooms, rendered two-dimensional garments on virtual mannequins. This requires manual inputs that describe the body shape of the user, as disclosed by Cone, 1998. The use of three-dimensional (3D) mannequins is disclosed by Gazzuolo, 2003, and Feld et al., 2006. A method of creating virtual mannequins based on accurate 3D representations of the user is disclosed by Vassilev et al., 2005. These virtual garments and mannequins are rendered inside a synthetic world without any real-time immersion experience for the user.
[0003] Okada & Kondo, 2008, disclosed an invention that replaces 3D visualisation with augmented reality (AR): the garment is superimposed onto an image of the user from a live video source. This provides a high degree of realism, as if the user were wearing the garment in the real world. However, a great deal of complexity and a priori knowledge of the user is required. A 3D model of the user and several different postures of the user must be scanned in advance.
[0004] An early adoption of AR visualisation is disclosed by Vock, 2003. Since then, several companies and online shopping websites have offered VFRs based on AR. Examples include Eyemagnet and Hearst Communications, Inc., 2010. These are more cost-effective than the 3D methods described so far. They also do not require any knowledge about the user in advance. Their main shortcoming is that no automatic fitting of the garment on the user is available.
[0005] In the application of VFRs for retail stores, a method that involves an elaborate scanning process of the shopper in advance is impractical. On the other hand, the use of a simpler apparatus with no automatic garment-fitting suffers from a lack of realism and ease of use.
[0006] The document by Vock, 2003, discloses the triangulation of edge contours in images obtained from a plurality of cameras. This is used to estimate the size of the user for garment-fitting. Automatic garment-fitting is disclosed by Furutanisangyou Co. Ltd., 2009, and Moreno, 2009, using an apparatus that obtains the user's silhouette in real-time by chroma keying. This requires a uniformly coloured background so that the apparatus can trace the silhouette of the user.
Another example of a dedicated facility is a room installed with a plurality of cameras positioned around the user where his or her 3D representation is scanned during use. This is disclosed by
Straka & Ruther in a project that uses as many as 15 cameras.
[0007] In an environment where shoppers may be constantly moving in the background or where the scene is cluttered, the triangulation of simple edge contours may not be robust enough to filter out background noise. The use of a dedicated facility makes this uncertainty irrelevant but incurs the penalty of additional setup effort and costs. Such a facility is a concern for smaller retail stores and also where the rent of retail space is high.
[0008] Another method for automatic garment-fitting uses physical patterns to track the user.
One such apparatus based on this method is disclosed by Zugara, 2009, where the user holds up a pattern during use. Another apparatus disclosed by Eisert, 2008, employs a pattern that is worn by the user. Holding up a pattern while trying on garments is unnatural and degrades the immersion experience, while wearing it is an inconvenient chore that defeats the purpose of trying on garments virtually.
[0009] Some of the prior art described involves a non-contact human-computer interface.
Contact-based interfaces such as touchscreens may transmit diseases through indirect touch. Non-contact methods are not only hygienic; they make the apparatus more modern and appealing to use. Several such prior-art apparatus recognise upright hand gestures for garment selection. This may be extended to the manipulation of how the garment is displayed.
Prolonged use, however, may induce hand fatigue, especially if the garment collection is large and/or garment-fitting is a manual process.
SUMMARY OF THE INVENTION
[0010] In accordance with the present invention, there is provided a VFR apparatus comprising:
[0011] a stereo or binocular camera to obtain an image pair of a user who tries on clothing;
[0012] a storage unit to store garment data;
[0013] a spatial transform unit for performing different spatial transformations of the image pair;
[0014] a depth mapping unit for generating depth maps of the observed scene;
[0015] a silhouette extraction unit for extracting the silhouette of the user;
[0016] a garment unit for interpreting the user selection of garments and the fitting of the selected garment to the body of the user; and
[0017] an augmented reality unit for superimposing images of garments on the user and rendering a virtual clothing rack for garment navigation.
[0018] Also in accordance with the present invention, there is provided a computer-readable medium storing a VFR computer program for causing a computer to execute instructions to perform steps of:
[0019] inputting the image pair of a user;
[0020] spatially transforming each image of the image pair;
[0021] generating depth maps from the transformed image pairs;
[0022] estimating the silhouette of the user from the depth maps;
[0023] estimating the intra-scene location of the user for garment selection;
[0024] making reference to garment data stored in the storage unit;
[0025] geometrically transforming images of garments to fit the silhouette of the user by using the garment data stored in the storage unit;
[0026] superimposing transformed images of garments on the user; and
[0027] superimposing a virtual clothing rack for navigation through the stored garments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] FIG. 1 is a block diagram of a VFR apparatus according to an embodiment of the invention; and
[0029] FIG. 2 is a block diagram of a VFR apparatus for acquiring and generating the user's silhouette in a modified manner.
DESCRIPTION OF THE PREFERRED EMBODIMENT
[0030] Embodiments consistent with the present invention relate to a VFR apparatus for obtaining images of how a user will look in a particular garment, using images of the user, without the user having to change his/her actual clothing. A VFR apparatus first estimates in real-time the silhouette of a user who tries on garments, based on depth maps computed from an image pair. The apparatus then uses one of the images in the image pair to recreate the experience of the user looking at a mirror wearing the virtual garment, by superimposing the image of the garment onto the body of the user.
[0031] The VFR apparatus of this embodiment displays images of a garment that a user wants to try on superimposed onto the body of the user. The user can see how the garment will appear on his or her body, without physically trying it on. The VFR apparatus of this embodiment geometrically transforms the images of the garments to be tried on by estimating the silhouette of the user. Therefore, the user can see his or her image in a standing posture as if he or she were actually trying on the clothes in front of a mirror. As the user moves in front of the cameras, the user can see his or her image, with the garment, undergoing the same movement.
[0032] The VFR apparatus of this embodiment is capable of changing the garment being tried on by using data for different types of garment. Therefore, the user is able to virtually try on garments that are not physically present. This also allows the user to quickly preview garments of different colours and designs.
[0033] The constitution and operation of the VFR apparatus of this embodiment will now be described with reference to the drawings.
[0034] FIG. 1 illustrates the constitution of the VFR apparatus according to the embodiment.
Prior to virtually trying on the garments by using the apparatus, frontal pictures of garments are taken by a camera in the same apparatus or by a camera external to the apparatus. The backgrounds of the garments are manually cropped to reveal the silhouettes of the garments.
They are then saved to the apparatus.
[0035] A binocular acquisition unit 1 is a camera rig with two cameras. This camera rig is mounted on top of a display apparatus, such as a flat-panel display. An arbitrary camera is used as the reference camera. The binocular acquisition unit 1 is for taking an image pair of the user who is going to virtually try on the garment. This unit takes a sequence of image pairs of the user.
[0036] A spatial transform unit 2 transforms each image from the image pair into five spatial components. These components are the three primary colour components, comprising the red, green, and blue channels, a grayscale conversion, and the edge contours of the image. The spatial transform unit 2 essentially creates five spatially transformed pairs for a given image pair.
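As an illustration, the five transformed pairs can be produced with a few lines of image processing. This is a minimal sketch, assuming OpenCV and the Canny operator for the edge contours; the patent does not name a library or a specific edge detector:

```python
import cv2
import numpy as np

def spatial_transforms(image_bgr):
    # Split one camera image into the five spatial components of [0036]:
    # the red, green, and blue channels, a grayscale conversion, and
    # the edge contours of the image.
    b, g, r = cv2.split(image_bgr)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # Canny and its thresholds are assumptions
    return [r, g, b, gray, edges]

def spatially_transformed_pairs(left_bgr, right_bgr):
    # One transformed pair per component: five pairs per input image pair.
    return list(zip(spatial_transforms(left_bgr), spatial_transforms(right_bgr)))
```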
[0037] A depth mapping unit 3 pre-processes each image of a spatially transformed pair from the spatial transform unit 2 by performing an image rectification (Hartley, 1999). A depth map (Binford & Baker, 1981) is computed by comparing the rectified images associated with the same spatially transformed pair. Such a computation always references the rectified image originating from the reference camera in the binocular acquisition unit 1. Every spatial location in the depth map corresponds to a point in the real world. The value at this spatial location is an estimate of how far the corresponding point in the real world is from the binocular acquisition unit 1. A depth threshold is used to filter out any points that are too far away from the binocular acquisition unit 1.
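The rectify-then-match step might look like the following sketch. The stereo calibration (rectification maps and the reprojection matrix Q) is assumed to have been computed offline with cv2.stereoCalibrate and cv2.stereoRectify; the block matcher, its parameters, and the depth threshold are illustrative assumptions, not values from the patent:

```python
def depth_from_pair(left_rect, right_rect, Q, max_depth=2.5):
    # Compare one rectified, spatially transformed pair to obtain a
    # depth map, referencing the reference-camera (left) image.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=9)
    disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0
    points = cv2.reprojectImageTo3D(disparity, Q)  # (x, y, z) per pixel
    depth = points[:, :, 2]
    # Filter out invalid matches and points too far from the camera rig
    # (max_depth in metres is an assumed unit and value).
    depth[(disparity <= 0) | (depth > max_depth)] = 0.0
    return depth
```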
[0038] A silhouette extraction unit 4 determines the maximum response from all depth maps associated with a given image pair. The values from the same spatial location in the depth maps are considered, such that the maximum value is taken as the final response for that spatial location. This response map is then binarised with a pre-defined threshold. A silhouette of the user is generated by assigning a pixel value of 1 to those spatial responses greater than the threshold and a pixel value of 0 to the other pixels. The silhouette extraction unit 4 post-processes the binarised silhouette by performing an image rectification in reverse.
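The fusion rule of [0038] — a per-pixel maximum over the five depth maps, followed by binarisation — is compact in NumPy. The threshold value here is an assumed tuning parameter:

```python
def extract_silhouette(depth_maps, threshold=0.5):
    # Maximum response across all depth maps for the same image pair,
    # then binarise: 1 where the response exceeds the threshold, else 0.
    response = np.maximum.reduce(depth_maps)
    return (response > threshold).astype(np.uint8)
```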
[0039] A garment unit 5 compares the silhouette from the silhouette extraction unit 4 with a generic shape of a human being. Non-human shapes are rejected. If there is a valid silhouette, the intra-scene location of the silhouette is estimated. If the intra-scene location is close to the edge of the image, it is interpreted as a command by the user to select another garment.
Otherwise, there is no change in garment selection. The garment unit 5 retrieves the selected garment data from the garment collection A. The image of the garment is then geometrically transformed to fit the body of the user by fitting to the silhouette generated by the silhouette extraction unit 4.
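A sketch of how the garment unit 5 might interpret the intra-scene location and fit the garment follows. The centroid test, the edge margin, and the bounding-box fit are assumptions; the patent does not specify how "close to the edge" is measured or how the geometric transform is computed:

```python
def garment_command(silhouette, edge_margin=0.1):
    # Interpret the user's intra-scene location: a silhouette centroid
    # near the left or right image edge selects another garment.
    m = cv2.moments(silhouette, binaryImage=True)
    if m["m00"] == 0:
        return "no_user"          # no valid silhouette in the scene
    cx = m["m10"] / m["m00"]      # x-coordinate of the centroid
    width = silhouette.shape[1]
    if cx < edge_margin * width:
        return "previous_garment"
    if cx > (1.0 - edge_margin) * width:
        return "next_garment"
    return "no_change"

def fit_garment(garment_rgba, silhouette):
    # A crude geometric transform: scale the garment image to the
    # silhouette's bounding box (a placeholder for the patent's fitting).
    x, y, w, h = cv2.boundingRect(silhouette)
    return cv2.resize(garment_rgba, (w, h)), (x, y)
```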
[0040] An augmented reality unit 6 displays the images of the garments superimposed on the images of the user. The image of the garment generated by the garment unit 5 is superimposed onto the image from the reference camera in the binocular acquisition unit 1. The resulting image is flipped to create a mirrored image of the user wearing the garment. A virtual clothing rack is rendered on the mirrored image to display thumbnails of garments from the garment collection A. Each garment has a unique identity number and other descriptions displayed, such as pricing, general information, and discounts. The user may use the garment identity number to direct the store assistant to retrieve the actual garment, and proceed to a real fitting room to try it on physically later.
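The superimpose-and-mirror step reduces to alpha blending plus a horizontal flip. This sketch assumes the garment image carries an alpha channel from the manual background cropping described in [0034], and that the fitted garment lies fully inside the frame:

```python
def render_mirror(frame_bgr, garment_rgba, origin):
    # Alpha-blend the fitted garment onto the reference-camera image,
    # then flip horizontally so the display behaves like a mirror.
    x, y = origin
    h, w = garment_rgba.shape[:2]
    roi = frame_bgr[y:y + h, x:x + w].astype(np.float32)
    alpha = garment_rgba[:, :, 3:4].astype(np.float32) / 255.0
    blended = alpha * garment_rgba[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi
    frame_bgr[y:y + h, x:x + w] = blended.astype(np.uint8)
    return cv2.flip(frame_bgr, 1)  # flipCode=1: flip about the vertical axis
```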
[0041] As described above, the VFR apparatus of this embodiment of the invention is capable of displaying garments superimposed on an image of the user, regardless of his or her shape, size, and location within the observed scene. The VFR apparatus does not require any scans of the user in advance or require the user to hold or wear a contraption during use.
[0042] The binocular acquisition unit 1 is optimally mounted on top of a display apparatus, without the need for a dedicated facility to support an extensive array of cameras with awkward mounting angles. The steps performed by the spatial transform unit 2, the depth mapping unit 3, and the silhouette extraction unit 4 create a more robust depth map than the classic approach of generating depth maps from grayscale images alone. This allows the VFR apparatus to filter out the background in the image. Thus, the VFR apparatus may face a scene of moving shoppers, racks of clothing, or store ornamentation to maximise the use of retail space, instead of being restricted to a uniform background. Some clearance from the binocular acquisition unit 1 is required during use. Otherwise, the area is open to the flow of shopper traffic.
[0043] The VFR apparatus allows a hands-free approach to garment selection. A user selects garments by stepping towards the left or right edge of the image, while a virtual clothing rack scrolls garments in unison for ease of navigation through the garment collection A. This form of human-computer interaction is hygienic and does not induce hand fatigue after prolonged use. It also makes the VFR apparatus accessible to users who are unable to use their hands.
MODIFIED EXAMPLE
[0044] The modified example deals with a VFR apparatus capable of a more accurate estimation of the user’s silhouette. FIG. 2 illustrates a VFR apparatus according to the modified example. In
FIG. 2, the modified part 100 represents the portion of the apparatus that differs between the modified example and the above embodiment.
[0045] A trinocular acquisition unit 7 is a camera rig with three cameras. This rig is mounted on top of a display apparatus, such as a flat-panel display. Two cameras are mounted next to each other in a row. An arbitrary camera in this group is chosen as the reference camera 7A. A third camera 7B is mounted on top of the row between the two cameras. The trinocular acquisition unit 7 is for taking an image triplet of the user who is going to virtually try on the garment. This unit takes a sequence of image triplets of the user.
[0046] A spatial transform unit 8 transforms the image pair from the bottom row cameras in the trinocular acquisition unit 7 in a similar fashion to how the spatial transform unit 2 transforms the image pair from the binocular acquisition unit 1. The spatial transform unit 8 also transforms each image in the image pair from the reference camera 7A and the top camera 7B in a similar fashion to how the spatial transform unit 2 transforms the image pair from the binocular acquisition unit 1. The spatial transform unit 8 essentially creates 10 spatially transformed pairs for a given image triplet.
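The count of 10 pairs follows from the two camera pairings — the bottom-row pair, and the reference camera 7A paired with the top camera 7B — each producing the five spatial components. A sketch, reusing the binocular helper above (itself an assumption):

```python
def transform_triplet(ref_bgr, right_bgr, top_bgr):
    # Five transformed pairs from the bottom-row cameras plus five from
    # the reference camera 7A paired with the top camera 7B: 10 in total.
    bottom = spatially_transformed_pairs(ref_bgr, right_bgr)
    vertical = spatially_transformed_pairs(ref_bgr, top_bgr)
    return bottom + vertical
```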
[0047] A depth mapping unit 9 pre-processes each image of a spatially transformed pair from the spatial transform unit 8 by performing an image rectification. This depth mapping unit 9 essentially creates 10 rectified image pairs. A depth map is computed by comparing the images in a rectified image pair. Every spatial location in the depth map corresponds to a point in the real world. The value at this spatial location is an estimate of how far the corresponding point in the real world is from the trinocular acquisition unit 7. A depth threshold is used to filter out any points that are too far away from the trinocular acquisition unit 7.
[0048] A depth map is always computed by referencing the rectified image originating from the reference camera 7A in the trinocular acquisition unit 7. A reverse image rectification is performed on all depth maps generated by the depth mapping unit 9. Each spatial location in the depth maps then corresponds to the same point in the real world.
[0049] A silhouette extraction unit 10 determines the maximum response from all depth maps associated with a given image pair in the image triplet. The silhouette extraction unit 10 pre-processes each depth map from the depth mapping unit 9 by performing an image rectification in reverse. The values from the same spatial location in the de-rectified depth maps are then considered, such that the maximum value is taken as the final response for that spatial location.
This response map is then binarised with a pre-defined threshold. A silhouette of the user is generated by assigning a pixel value of 1 to those spatial responses greater than the threshold and by assigning a pixel value of 0 to the other pixels.
[0050] The garment unit 5 validates the silhouette from the silhouette extraction unit 10 and then determines the intra-scene location of the user for garment selection. The garment unit 5 retrieves the selected garment data from the garment collection A and geometrically transforms the image of the garment to fit the body of the user.
[0051] This modified example is potentially capable of generating a more complete silhouette of the user than the preferred embodiment of the present invention.

Claims

    [0052] Although the present invention has been described in terms of preferred and modified embodiments, it is not intended that the invention be limited to these embodiments. Modifications within the spirit of the invention will be apparent to those skilled in the art. The scope of the present invention is defined by the claims that follow.

    WHAT IS CLAIMED:
    [0053] 1. A VFR apparatus comprising:
    [0054] a binocular camera to obtain an image pair of a user who tries on clothing;
    [0055] a storage unit to store garment data (A);
    [0056] a spatial transform unit for performing different spatial transformations of the image pair;
    [0057] a depth mapping unit for generating depth maps of the observed scene using a plurality of spatially transformed image pairs;
    [0058] a silhouette extraction unit for extracting the silhouette of the user using a plurality of depth maps;
    [0059] a garment unit for interpreting the user selection of garments and the fitting of the selected garment to the body of the user; and
    [0060] an augmented reality unit for superimposing images of garments on the user and rendering a virtual clothing rack for garment navigation.
    [0061] 2. A computer-readable medium storing a VFR computer program for causing a computer to execute instructions to perform steps of:
    [0062] inputting the image pair of a user;
    [0063] spatially transforming each image of the image pair;
    [0064] generating depth maps from the transformed image pairs;
    [0065] estimating the silhouette of the user from the depth maps;
    [0066] estimating the intra-scene location of the user for garment selection;
    [0067] making reference to garment data stored in the storage unit;
    [0068] geometrically transforming images of garments to fit the silhouette of the user by using the garment data stored in the storage unit;
    [0069] superimposing transformed images of garments on the user; and
    [0070] superimposing a virtual clothing rack for navigation through the stored garments.
    [0071] 3. A VFR apparatus according to claim 1, wherein:
    [0072] the binocular camera is replaced by a trinocular camera to obtain an image triplet instead of an image pair;
    [0073] the spatial transform unit performs spatial transformations on an image triplet instead of an image pair;
    [0074] the depth mapping unit inputs an additional image pair that is spatially transformed; and
    [0075] the silhouette extraction unit inputs additional depth maps.
    [0076] 4. A computer-readable medium according to claim 2, wherein:
    [0077] the step of spatial transformation involves an additional image pair;
    [0078] the step of generating depth maps involves additional image pairs that are spatially transformed; and
    [0079] the step of estimating the user's silhouette involves additional depth maps.
SG2010081750A 2010-11-08 2010-11-08 A practical virtual fitting room for retail stores SG181178A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG2010081750A SG181178A1 (en) 2010-11-08 2010-11-08 A practical virtual fitting room for retail stores

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
SG2010081750A SG181178A1 (en) 2010-11-08 2010-11-08 A practical virtual fitting room for retail stores

Publications (1)

Publication Number Publication Date
SG181178A1 true SG181178A1 (en) 2012-06-28

Family

ID=46384606

Family Applications (1)

Application Number Title Priority Date Filing Date
SG2010081750A SG181178A1 (en) 2010-11-08 2010-11-08 A practical virtual fitting room for retail stores

Country Status (1)

Country Link
SG (1) SG181178A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT202100005693A1 (en) * 2021-03-11 2021-06-11 Habits Italia S R L Augmented reality reflective device


Similar Documents

Publication Publication Date Title
RU2668408C2 (en) Devices, systems and methods of virtualising mirror
US9369638B2 (en) Methods for extracting objects from digital images and for performing color change on the object
US8976160B2 (en) User interface and authentication for a virtual mirror
US8982110B2 (en) Method for image transformation, augmented reality, and teleperence
US8970569B2 (en) Devices, systems and methods of virtualizing a mirror
CN105843386B (en) A kind of market virtual fitting system
JP6392756B2 (en) System and method for obtaining accurate body size measurements from a two-dimensional image sequence
CN109598798B (en) Virtual object fitting method and virtual object fitting service system
KR101707707B1 (en) Method for fiiting virtual items using human body model and system for providing fitting service of virtual items
JP2019510297A (en) Virtual try-on to the user's true human body model
CN106530404A (en) Inspection system of house for sale based on AR virtual reality technology and cloud storage
JP2004537082A (en) Real-time virtual viewpoint in virtual reality environment
Jimeno-Morenilla et al. Augmented and virtual reality techniques for footwear
CN106504073A (en) House for sale based on AR virtual reality technologies is investigated and decorating scheme Ask-Bid System
CN106504337A (en) House for sale based on AR virtual reality technologies is investigated and collaboration decorations system
JP7341736B2 (en) Information processing device, information processing method and program
JP2004030408A (en) Three-dimensional image display apparatus and display method
JP2022524787A (en) Methods, systems, and programs for object detection range estimation
TWI821220B (en) Apparatus and method of image capture
SG181178A1 (en) A practical virtual fitting room for retail stores
Liu et al. Real-time 3D virtual dressing based on users' skeletons
KR102287939B1 (en) Apparatus and method for rendering 3dimensional image using video
Dayik et al. Real-time virtual clothes try-on system
Takeda et al. 3D human reconstruction from an image for mobile telepresence systems
WO2023127649A1 (en) Information processing device, information processing method, and program