WO2021179919A1 - System and method for virtual fitting during live streaming - Google Patents

System and method for virtual fitting during live streaming

Info

Publication number
WO2021179919A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
user
product
media content
pose
Prior art date
Application number
PCT/CN2021/078259
Other languages
French (fr)
Inventor
Yuan Tian
Yi Xu
Shuxue Quan
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180014731.XA (published as CN115104319A)
Publication of WO2021179919A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0641Shopping interfaces
    • G06Q30/0643Graphical representation of items or shoppers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4318Generation of visual interfaces for content selection or interaction; Content or additional data rendering by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/47815Electronic shopping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8451Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/16Cloth

Definitions

  • the present disclosure relates generally to methods and systems related to virtual fitting applications. More particularly, embodiments of the present disclosure provide methods and systems for augmenting streaming media data to include virtual fit data.
  • Embodiments of the disclosure address these and other problems individually and collectively.
  • the methods involve assessing streaming data to obtain a first three-dimensional (3D) model associated with a product.
  • the methods further involve obtaining a second 3D model associated with a user.
  • the first 3D model is then fitted onto the second 3D model and the fitted models are posed in a manner estimated from a presenter within the streaming data.
  • the posed models are then rendered and displayed to a viewer alongside the streaming data.
  • Embodiments of the present disclosure are applicable to a variety of applications in virtual reality and computer-based fitting systems.
  • One embodiment of the disclosure is directed to a method comprising receiving an indication of media content being viewed by a user, identifying a product associated with the media content, obtaining a first 3D model representative of the product, obtaining a second 3D model representative of the user, determining a presentation pose from the media content, applying the presentation pose to the second 3D model, generating a third 3D model by fitting the second 3D model with the first 3D model, and presenting the third 3D model to the user in the presentation pose.
  • Another embodiment of the disclosure is directed to a system comprising: a processor; and a memory including instructions that, when executed with the processor, cause the system to, at least receive an indication of media content being viewed by a user, identify a product associated with the media content, obtain a first 3D model representative of the product, obtain a second 3D model representative of the user, determine a presentation pose from the media content, apply the presentation pose to the second 3D model, generate a third 3D model by fitting the second 3D model with the first 3D model, and present the third 3D model to the user in the presentation pose.
  • Yet another embodiment of the disclosure is directed to a non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least receive an indication of media content being viewed by a user, identify a product associated with the media content, obtain a first 3D model representative of the product, obtain a second 3D model representative of the user, determine a presentation pose from the media content, apply the presentation pose to the second 3D model, generate a third 3D model by fitting the second 3D model with the first 3D model, and present the third 3D model to the user in the presentation pose.
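  • For illustration only, the sequence of operations recited in the embodiments above can be sketched in Python. Every class, function, and field name below (e.g., Model3D, identify_product, load_user_model) is a hypothetical placeholder rather than part of the disclosure; each helper is stubbed so that the overall flow is runnable.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the 3D models and helpers named in the claims.
@dataclass
class Model3D:
    name: str
    vertices: list

def identify_product(media_content: dict) -> str:
    # e.g., read a product identifier attached to the stream as metadata
    return media_content.get("product_id", "unknown-product")

def load_product_model(product_id: str) -> Model3D:
    return Model3D(name=product_id, vertices=[])          # stub

def load_user_model(user_id: str) -> Model3D:
    return Model3D(name=f"user-{user_id}", vertices=[])   # stub

def estimate_presenter_pose(media_content: dict) -> dict:
    return media_content.get("pose", {})                   # stub

def apply_pose(model: Model3D, pose: dict) -> Model3D:
    return model                                            # stub

def fit(product: Model3D, user: Model3D) -> Model3D:
    return Model3D(name=f"{user.name}+{product.name}", vertices=user.vertices)

def virtual_fit(media_content: dict, user_id: str) -> Model3D:
    """One pass of the claimed method: identify the product, obtain both
    models, determine and apply the presentation pose, fit, and return the
    result for presentation alongside the media content."""
    product_id = identify_product(media_content)
    product_model = load_product_model(product_id)          # first 3D model
    user_model = load_user_model(user_id)                   # second 3D model
    pose = estimate_presenter_pose(media_content)
    posed_user = apply_pose(user_model, pose)
    return fit(product_model, posed_user)                   # third 3D model

if __name__ == "__main__":
    stream = {"product_id": "shirt-001", "pose": {"left_elbow": 90.0}}
    print(virtual_fit(stream, user_id="42").name)
```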
  • streaming media data is augmented with virtual fit data for a user.
  • product models are identified in relation to the streaming media data and a user model is obtained in relation to a viewer of the streaming media data.
  • the product models are fitted onto the user model, which is posed in a manner similar to a presenter figure within the streaming media data.
  • the product models and user model are then rendered and presented alongside (e.g., augmented within) the streaming media data.
  • FIG. 1 depicts an illustrative example of a system in which a streaming video may be augmented with virtual fit information in accordance with at least some embodiments;
  • FIG. 2 depicts a system architecture for a system that augments streaming data with virtual fit information in accordance with at least some embodiments
  • FIG. 3 is a simplified flowchart illustrating a method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure
  • FIG. 4 depicts an illustrative example of a technique for obtaining 3D models using sensor data in accordance with at least some embodiments
  • FIG. 5 depicts an example graphical user interface (GUI) demonstrating features that may be implemented in accordance with embodiments described herein;
  • FIG. 6 illustrates a flow diagram depicting a process for presenting virtual fit data to a user in accordance with at least some embodiments.
  • FIG. 7 illustrates examples of components of a computer system according to certain embodiments.
  • FIG. 8 illustrates a block diagram depicting an apparatus for presenting virtual fit data to a user in accordance with at least some embodiments.
  • the present disclosure relates generally to methods and systems related to virtual reality applications. More particularly, embodiments of the present disclosure provide methods and systems for determining a level of fit for a user and product. Embodiments of the present disclosure are applicable to a variety of applications in virtual reality and computer-based fitting systems.
  • FIG. 1 depicts an illustrative example of a system in which a streaming video may be augmented with virtual fit information in accordance with at least some embodiments.
  • a user device 102 may be used to provide a request to a mobile application server 104 for virtual fit information.
  • the user device in some cases, may be used to obtain user data 106, which may be provided to the mobile application server 104 to be used in generating the virtual fit information.
  • the user device 102 represents a suitable computing device that includes one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure.
  • user device 102 can be any of a smartphone, a tablet, a laptop, a personal computer, a gaming console, or a smart television.
  • the user device 102 may additionally include a range camera (i.e., depth sensor) and/or an RGB optical sensor, such as a camera.
  • the user device may be used to capture and/or generate user data 106.
  • User data 106 may include information related to a particular user (e.g., a user of the user device 102) for which virtual fit data should be created.
  • the user data 106 may include data about the user which may be used to generate virtual fit data.
  • user data 106 may include dimensions of the user.
  • User data 106 may be captured in any suitable format.
  • user data 106 may include a point cloud, a 3D mesh or model, or a string of characters that includes measurements at predetermined locations.
  • capturing user data 106 may involve receiving information about the user which is manually input into the user device 102. For example, a user may input measurements for various parts of the user’s body via a keypad.
  • capturing user data 106 may involve using a camera and/or a depth sensor to capture images /depth information related to the user.
  • the user device 102 may be further configured to generate a 3D model from the captured images/depth information. This process is described in greater detail with respect to FIG. 4 below.
  • the mobile application server 104 may include any computing device capable of generating a data stream which is augmented with virtual fit data for a user in accordance with the techniques described herein.
  • the mobile application server 104 may receive user data 106 from the user device 102.
  • the mobile application server 104 may also receive user data 106 prior to, and independent of, any request to generate virtual fit data.
  • the mobile application server 104 may receive the user data 106 during an enrollment phase during which a user establishes an account with the mobile application server 104.
  • the request for virtual fit data may indicate a streaming data 108.
  • Streaming data 108 may be a streaming video (e.g., a live stream) or other suitable dynamic media content.
  • the streaming data 108 may depict at least a presenter 110 and at least one product 112.
  • the mobile application server 104 may obtain, from the streaming data 108, an identifier for the at least one product 112 (product identifier 114) and data related to a pose of the presenter (pose data 116) .
  • one or more of the product identifier 114 or the pose data 116 may be associated with the streaming data 108 via metadata attached to the streaming data 108.
  • one or more machine vision techniques may be used to determine one or more of the product identifier 114 and/or the pose data 116 from imagery within the streaming data 108.
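  • As a minimal sketch of the metadata path described above (assuming, purely for illustration, that the broadcaster attaches keys such as "product_sku" and "presenter_pose" to the stream), the product identifier 114 and pose data 116 could be read as follows; when either key is missing, the machine-vision path would be used instead.

```python
def product_and_pose_from_metadata(stream_metadata: dict):
    """Read a product identifier and presenter-pose data attached to the
    stream as metadata. The key names are assumptions for illustration; a
    caller would fall back to machine vision when a value is None."""
    product_id = stream_metadata.get("product_sku")
    pose_data = stream_metadata.get("presenter_pose")
    return product_id, pose_data

# Example metadata a broadcaster might attach to a live stream (illustrative).
meta = {"product_sku": "SKU-12345",
        "presenter_pose": {"left_elbow": {"rotation": (0.0, 0.0, 85.0)}}}
print(product_and_pose_from_metadata(meta))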
  • the mobile application server 104 may include, or have access to, object model data 118 from which product data 120 may be obtained in order to complete the request.
  • Object model data 118 may include any computer-readable storage medium having stored thereon one or more 3D models.
  • the object model data 118 may be a database maintained by the mobile application server 104 or another server.
  • the 3D models stored in object model data 118 may be representative of products which can be worn by a user, such as clothing items (e.g., garments) or accessories.
  • the object model data 118 may store 3D models for multiple versions of a product (e.g., different sizes and/or styles) .
  • the mobile application server 104 retrieves product data 120, which includes a 3D model associated with the particular product, from the object model data 118.
  • the mobile application server 104 may be configured to combine the user data 106 and the product data 120 in order to generate a fitted avatar for the user.
  • the mobile application server 104 may also pose the fitted avatar in accordance with the pose data 116.
  • the mobile application server 104 may augment the streaming data 108 with the fitted avatar to generate augmented streaming data 122.
  • the augmented streaming data 122 may be provided back to the user device 102, where it may be rendered on a display for the user to view.
  • For clarity, a certain number of components are shown in FIG. 1. It is understood, however, that embodiments of the disclosure may include more than one of each component. In addition, some embodiments of the disclosure may include fewer or more components than are shown in FIG. 1. The components in FIG. 1 may communicate via any suitable communication medium (including the Internet), using any suitable communication protocol.
  • FIG. 2 depicts a system architecture for a system that augments streaming data with virtual fit information in accordance with at least some embodiments.
  • a user device 202 may be in communication with a number of other components, including at least a mobile application server 204.
  • the mobile application server 204 may perform at least a portion of the processing functions required by a mobile application installed upon the user device.
  • the user device 202 and mobile application server 204 may be examples of the user device 102 and mobile application server 104 respectively described with respect to FIG. 1.
  • a user device 202 may be any suitable electronic device that is capable of providing at least a portion of the capabilities described herein.
  • the user device 202 may be any electronic device capable of capturing user data and/or presenting an augmented data stream on a display.
  • a user device may be capable of establishing a communication session with another electronic device (e.g., mobile application server 204) and transmitting /receiving data from that electronic device.
  • a user device may include the ability to download and/or execute mobile applications.
  • User devices may include mobile communication devices as well as personal computers and thin-client devices.
  • a user device may comprise any portable electronic device that has a primary function related to communication.
  • a user device may be a smart phone, a personal data assistant (PDA) , or any other suitable handheld device.
  • the user device can be implemented as a self-contained unit with various components (e.g., input sensors, one or more processors, memory, etc. ) integrated into the user device.
  • Reference in this disclosure to an “output” of a component or an “output” of a sensor does not necessarily imply that the output is transmitted outside of the user device. Outputs of various components might remain inside a self-contained unit that defines a user device.
  • the user device 202 may include at least one memory 206 and one or more processing units (or processor (s) ) 208.
  • the processor (s) 208 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 208 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
  • the user device 202 may also include one or more input sensors 210 for receiving user and/or environmental input. There may be a variety of input sensors 210 capable of detecting user or environmental input, such as an accelerometer, a camera device, a depth sensor, a microphone, a global positioning system (e.g., GPS) receiver, etc.
  • the one or more input sensors 210 may include a range camera device (e.g., a depth sensor) capable of generating a range image, as well as a camera device configured to capture image information.
  • a range camera may be any device configured to identify a distance or range of an object or objects from the range camera.
  • the range camera may generate a range image (or range map) , in which pixel values correspond to the detected distance for that pixel.
  • the pixel values can be obtained directly in physical units (e.g., meters) .
  • the user device may employ a range camera that operates using structured light.
  • in a range camera that operates using structured light, a projector projects light onto an object or objects in a structured pattern. The light may be of a range that is outside of the visible range (e.g., infrared or ultraviolet).
  • the range camera may be equipped with one or more camera devices configured to obtain an image of the object with the reflected pattern. Distance information may then be generated based on distortions in the detected pattern. It should be noted that although this disclosure focuses on the use of a range camera using structured light, any suitable type of range camera, including those that operate using stereo triangulation, sheet of light triangulation, time-of-flight, interferometry, coded aperture, or any other suitable technique for range detection, would be useable by the described system.
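  • Whatever range-detection technique is used, the resulting range image can be back-projected into a 3D point cloud once the camera intrinsics are known. The following is a minimal numpy sketch assuming a pinhole camera model with focal lengths fx, fy and principal point cx, cy, and depth values in meters; it is illustrative rather than a required implementation.

```python
import numpy as np

def range_image_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a range image (H x W, metres) into an N x 3 point cloud
    using a pinhole camera model; invalid (zero) depths are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```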
  • the memory 206 may store program instructions that are loadable and executable on the processor (s) 208, as well as data generated during the execution of these programs.
  • the memory 206 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) .
  • the user device 202 may also include additional storage 212, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
  • the disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
  • the memory 206 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM.
  • the memory 206 may include an operating system 214 and one or more application programs or services for implementing the features disclosed herein including at least a mobile application 216.
  • the memory 206 may also include application data 218, which provides information to be generated by and/or consumed by the mobile application 216.
  • the application data 218 may be stored in a database.
  • a mobile application may be any set of computer executable instructions installed upon, and executed from, a user device 202.
  • Mobile applications may be installed on a user device by a manufacturer of the user device or by another entity.
  • the mobile application 216 may cause a user device to establish a communication session with a mobile application server 204 that provides backend support for the mobile application 216.
  • a mobile application server 204 may maintain account information associated with a particular user device and/or user.
  • a user may be required to log into a mobile application in order to access functionality provided by the mobile application 216.
  • the mobile application 216 is configured to provide user information to the mobile application server 204 and to present information received from the mobile application server 204 to a user. More particularly, the mobile application 216 is configured to obtain measurement data for a user and to submit that measurement data to a mobile application server 204 in relation to a request for a streaming data augmented with virtual fit data. In some embodiments, the mobile application 216 may also receive an indication of a data stream which should be augmented with virtual fit data.
  • the mobile application 216 may receive output from the input sensors 210 and generate a 3D model based upon that output.
  • the mobile application 216 may receive depth information (e.g., a range image) from a depth sensor (e.g., a range camera) , such as the depth sensors previously described with respect to input sensors 210 as well as image information from a camera input sensor. Based on this information, the mobile application 216 may determine the bounds of an object (e.g., a user) to be identified. For example, a sudden variance in depth within the depth information may indicate a border or outline of an object. In another example, the mobile application 216 may utilize one or more machine vision techniques and/or machine learning to identify the bounds of an object.
  • the mobile application 216 may receive image information from a camera input sensor 210 and may identify potential objects within the image information based on variances in color or texture data detected within the image or based on learned patterns. In some embodiments, the mobile application 216 may cause the user device 202 to transmit the output obtained from the input sensors 210 to the mobile application server 204, which may then perform one or more object recognition techniques upon that output in order to generate a 3D model of the object.
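  • One possible realization of the "sudden variance in depth" heuristic mentioned above is to threshold the pixel-to-pixel depth differences and, optionally, keep only pixels nearer than the scene median. The 0.10 m jump threshold and the median rule in the sketch below are illustrative assumptions, not values taken from the disclosure.

```python
import numpy as np

def object_bounds_from_depth(depth, jump=0.10):
    """Mark pixels where depth changes abruptly between neighbouring pixels
    (candidate object borders), and keep pixels nearer than the median of the
    valid depths as a rough foreground mask."""
    dz_rows = np.abs(np.diff(depth, axis=0, prepend=depth[:1, :]))
    dz_cols = np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    border = (dz_rows > jump) | (dz_cols > jump)
    foreground = depth < np.median(depth[depth > 0])
    return border, foreground
```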
  • the user device 202 may also contain a communications interface (s) 220 that enables the user device 202 to communicate with any other suitable electronic devices.
  • the communication interface 220 may enable the user device 202 to communicate with other electronic devices on a network (e.g., on a private network) .
  • the user device 202 may include a BLUETOOTH TM wireless communication module, which allows it to communicate with another electronic device.
  • the user device 202 may also include input/output (I/O) device (s) and/or ports 222, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
  • the user device 202 may communicate with the mobile application server 204 via a communication network.
  • the communication network may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks.
  • the communication network may comprise multiple different networks.
  • the user device 202 may utilize a wireless local area network (WLAN) to communicate with a wireless router, which may then route the communication over a public network (e.g., the Internet) to the mobile application server 204.
  • the mobile application server 204 may be any computing device or plurality of computing devices configured to perform one or more calculations on behalf of the mobile application 216 on the user device 202.
  • the mobile application 216 may be in periodic communication with the mobile application server 204.
  • the mobile application 216 may receive updates, push notifications, or other instructions from the mobile application server 204.
  • the mobile application 216 and mobile application server 204 may utilize a proprietary encryption and/or decryption scheme to secure communications between the two.
  • the mobile application server 204 may be executed by one or more virtual machines implemented in a hosted computing environment.
  • the hosted computing environment may include one or more rapidly provisioned and released computing resources, which computing resources may include computing, networking, and/or storage devices.
  • a hosted computing environment may also be referred to as a cloud-computing environment.
  • the mobile application server 204 may include at least one memory 224 and one or more processing units (or processor (s) ) 226.
  • the processor (s) 226 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof.
  • Computer-executable instruction or firmware implementations of the processor (s) 226 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
  • the memory 224 may store program instructions that are loadable and executable on the processor (s) 226, as well as data generated during the execution of these programs.
  • the memory 224 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) .
  • the mobile application server 204 may also include additional storage 228, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
  • the disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
  • the memory 224 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM.
  • the memory 224 may include an operating system 230 and one or more application programs or services for implementing the features disclosed herein including at least a module for fitting a 3D model of a product onto a 3D model of a user (fitting module 232) and/or a module for determining and applying a pose to a 3D model of a product and a 3D model of a user (pose module 234) .
  • the memory 224 may also include account data 236, which provides information associated with user accounts maintained by the described system, user model data 238, which maintains 3D models associated with each user of an account, and/or object model data 240, which maintains 3D models associated with a number of objects (products) .
  • account data 236, the user model data 238, or the object model data 240 may be stored in a database.
  • object model data 240 may be an electronic catalog that includes data related to objects available for sale from a resource provider, such as a retailer or other suitable merchant.
  • the memory 224 and the additional storage 228, both removable and non-removable, are examples of computer-readable storage media.
  • computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • modules may refer to programming modules executed by computing systems (e.g., processors) that are installed on and/or executed from the mobile application server 204.
  • the mobile application server 204 may also contain communications connection (s) 242 that allow the mobile application server 204 to communicate with a stored database, another computing device or server, user terminals, and/or other components of the described system.
  • the mobile application server 204 may also include input/output (I/O) device (s) and/or ports 244, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
  • the memory 224 may include the fitting module 232, the pose module 234, the database containing account data 236, the database containing user model data 238, and/or the database containing object model data 240.
  • the fitting module 232 may be configured to, in conjunction with the processors 226, apply deformations to a 3D model of a product in order to fit it onto a 3D model of a user.
  • the fitting module 232 may have access to one or more rules which delineate how specific product types (e.g., shirt, pants, etc.) should be deformed (e.g., stretched and/or bent) in order to be fit onto a user model.
  • the fitting module 232 may snap certain portions of the 3D model of the product onto specific portions of the 3D model of the user.
  • the 3D model of a shirt may be positioned so that sleeves of the 3D model of the shirt encompass the arms of a 3D model of a user. Additionally, the 3D model of the shirt may also be positioned so that the collar of the 3D model of the shirt encompasses a neck of the 3D model of the user. The remainder of the 3D model of the shirt may then be deformed, by stretching and bending its portions, such that the interior of the 3D model of the shirt lies outside of or along the exterior of the 3D model of the user.
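  • A crude stand-in for such a stretch-and-bend rule is to push each garment vertex outward until it lies at least a small clearance outside the nearest body vertex. The brute-force nearest-neighbour search and the clearance value below are assumptions for illustration and would not scale to dense meshes.

```python
import numpy as np

def push_garment_outside_body(garment_verts, body_verts, clearance=0.005):
    """For each garment vertex, find the closest body vertex and, if the
    garment lies closer than `clearance` metres, push it outward along the
    body-to-garment direction."""
    out = garment_verts.copy()
    for i, g in enumerate(garment_verts):
        d = np.linalg.norm(body_verts - g, axis=1)
        j = np.argmin(d)
        if d[j] < clearance:
            direction = g - body_verts[j]
            norm = np.linalg.norm(direction)
            direction = direction / norm if norm > 1e-9 else np.array([0.0, 0.0, 1.0])
            out[i] = body_verts[j] + direction * clearance
    return out
```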
  • the pose module 234 may be configured to, in conjunction with the processors 226, identify a pose of a presenter (i.e., a human figure) within a streaming data and apply that pose to a combination of a 3D model of a product fitted onto a 3D model of a user as generated by the fitting module 232. This may involve using one or more pose estimation techniques to determine a current pose of a presenter within the data stream. For example, the pose module 234 may use machine learning to determine a pose of the presenter within a data stream. One skilled in the art would recognize that a number of suitable pose estimation techniques are available. In some embodiments, the pose module 234 may then apply the determined pose to the user model (having fitted onto it the product model) .
  • the pose module 234 may monitor the pose of the presenter within the streaming data and may adjust a pose of the user model to match the pose of the presenter as changes in the presenter’s pose are detected.
  • the pose module 234 may render the combined user model and product model.
  • the combined user model and product model may be rendered in a small window, which is then placed in a discreet location within the streaming data.
  • where the streaming data is a video, the combined user model and product model may be rendered within a window located in a lower corner of the video. The rendering would allow the user to visualize the product on him/herself.
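  • Compositing the rendered window into the video can be as simple as pasting a downscaled avatar frame into a corner of each streamed frame. The scale, margin, and nearest-neighbour downsampling below are illustrative choices, not requirements of the disclosure.

```python
import numpy as np

def composite_picture_in_picture(stream_frame, avatar_frame, scale=0.25, margin=12):
    """Paste a rendered avatar frame into the lower-right corner of a video
    frame (both H x W x 3 uint8 arrays), using simple nearest-neighbour
    subsampling to avoid extra dependencies."""
    h, w = stream_frame.shape[:2]
    th, tw = int(h * scale), int(w * scale)
    ys = np.linspace(0, avatar_frame.shape[0] - 1, th).astype(int)
    xs = np.linspace(0, avatar_frame.shape[1] - 1, tw).astype(int)
    thumb = avatar_frame[ys][:, xs]
    out = stream_frame.copy()
    out[h - th - margin:h - margin, w - tw - margin:w - margin] = thumb
    return out
```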
  • although the pose module 234 and the fitting module 232 are described with respect to the mobile application server 204, the functionality described as being performed by one or more of these modules may instead be performed by the mobile application on the user device 202.
  • each of the object entries within the object model database 240 may be associated with a 3D model of that object.
  • the 3D model may be combined with a second 3D model of a user and provided to the mobile application 216 such that the user device 202 is caused to display the combination of the 3D models on a display of the user device as augmented into streaming data.
  • the mobile application 216 may dynamically update a pose of the combination of the 3D models on the display of the user device as a pose of a presenter within the streaming data is updated.
  • FIG. 3 is a simplified flowchart illustrating a method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure.
  • the flow is described in connection with a computer system that is an example of the computer systems described herein.
  • Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
  • the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
  • Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
  • one or more products 302 are scanned to generate product model data 304.
  • the product models 304 are generated as 3D virtual representations of the products 302, for example by scanning each product 302 using cameras and/or depth sensors from multiple viewing angles.
  • a number of generated product models 304 are provided to the mobile application server 204 to be stored in object model data 240.
  • Product models 304 may be generated by a number of different entities. For example, a product model for a particular product may be generated by a manufacturer of that product.
  • a user is scanned using a camera and/or depth sensor installed on a user device 202 to generate user model data 308 at step 2 of process 300.
  • the user model data 308 is transmitted to the mobile application server 204 to be stored in user model data 238.
  • the user model data 308 may be stored in relation to an account maintained for the user that was scanned.
  • at step 3 of process 300, the mobile application server 204 receives a request from a user to consume streaming data 306. Upon receiving a request to consume streaming data 306, the mobile application server 204 retrieves the streaming data 306 from its location at step 4 of process 300. In some embodiments, the streaming data 306 may be maintained by the mobile application server 204. In some embodiments, streaming data 306 may be maintained by an entity separate from the mobile application server 204. For example, a user may request, via the mobile application server 204, which provides support for a mobile application installed upon the user’s mobile device, to watch a video file hosted by YOUTUBE.COM TM. In this example, the user may provide a uniform resource locator (URL) or other identifier of the video.
  • the mobile application server 204 may then retrieve the video file by visiting the URL. Once the streaming data 306 has been retrieved, the mobile application server 204 identifies one or more relevant products as well as a presenter pose. As described elsewhere, this may be done using a fitting module 232 and/or a pose module 234 as described with respect to FIG. 2 above.
  • the process 300 involves determining a pose of a presenter within the streaming data 306 at step 5. This may involve first identifying the presenter within the streaming data (e.g., using one or more machine vision techniques) and then estimating a pose of that presenter using any suitable pose estimation technique.
  • a pose for an object indicates a location and orientation of that object.
  • the estimated pose includes a record of locations and orientations of various body parts or joints of the presenter.
  • the pose module 234 retrieves a user model at step 6 of process 300.
  • the user model is a 3D model representative of a person which is stored in user model data 238 in association with that person or an account linked to the person.
  • the pose module 234 may retrieve the user model associated with that person from the user model data 238.
  • the pose module 234 applies the estimated pose of the presenter to the retrieved user model. To do this, the pose module 234 repositions various body parts of the user model to match the record of locations and orientations of various body parts of the presenter.
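  • If the estimated pose is represented as a record of per-joint locations and orientations, applying it to the user model amounts to copying those values onto the corresponding joints of the user skeleton. The dictionary layout and joint naming below are assumptions for illustration.

```python
def apply_presenter_pose(user_skeleton, presenter_pose):
    """Copy per-joint rotations from an estimated presenter pose onto a user
    skeleton. Both are plain dicts here:
    {joint_name: {"rotation": (rx, ry, rz), "position": (x, y, z)}}."""
    posed = {name: dict(data) for name, data in user_skeleton.items()}
    for joint, data in presenter_pose.items():
        if joint in posed:
            posed[joint]["rotation"] = data.get("rotation", posed[joint].get("rotation"))
    return posed

# Example: rotate the user's left elbow to match the presenter's pose.
user = {"left_elbow": {"rotation": (0.0, 0.0, 0.0), "position": (0.3, 1.2, 0.0)}}
presenter = {"left_elbow": {"rotation": (0.0, 0.0, 85.0)}}
print(apply_presenter_pose(user, presenter))
```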
  • the posed user model is then provided to the fitting module 232 at step 7 of process 300.
  • the process 300 involves retrieving one or more product models at step 8.
  • the fitting module 232 identifies one or more products associated with the streaming data 306.
  • the streaming data 306 may include an indication of one or more products.
  • the streaming data 306 may have attached metadata which indicates a Stock Keeping Unit (SKU) or other product identifier associated with the streaming data 306.
  • the one or more products may be identified from the streaming data 306 using machine vision techniques (e.g., object recognition) .
  • the fitting module 232 may identify a particular product (e.g., a shirt or pants) being worn by a presenter within the streaming data 306 by comparing visual properties of the product with attributes stored in relation to a number of products in an electronic catalog maintained by the mobile application server 204. In this example, the fitting module 232 may identify the product in the electronic catalog which most closely matches the product being worn by the presenter. Once the fitting module 232 has identified one or more products associated with the streaming data 306, product models for those products are retrieved from object model data.
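  • A simple sketch of this catalog-matching idea is to prefer a SKU carried in the stream metadata and otherwise pick the catalog entry whose stored colour histogram best matches a crop of the product region. The catalog format and the use of colour histograms as the visual attribute are assumptions made for illustration.

```python
import numpy as np

def identify_product(frame_crop, stream_metadata, catalog):
    """Return a product identifier: prefer a SKU carried in stream metadata,
    otherwise pick the catalog entry whose stored colour histogram (assumed
    format: {sku: histogram}) is closest to the cropped product region."""
    sku = stream_metadata.get("sku")
    if sku:
        return sku
    hist = colour_histogram(frame_crop)
    return min(catalog, key=lambda k: np.linalg.norm(catalog[k] - hist))

def colour_histogram(image, bins=8):
    """Normalised joint histogram over the three colour channels of an
    H x W x 3 uint8 image."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return (hist / hist.sum()).ravel()
```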
  • the process involves fitting the one or more product models obtained at step 8 onto the posed user model provided at step 7.
  • the process of fitting a product model onto the posed user model is done by adjusting a set of parameters that control the deformation of one or more regions of interest of the product model until the product model fits the user model.
  • the set of parameters can be defined as a set of measurements, such as the displacements from each vertex of the product model.
  • This process can be formulated as an optimization process, where a few different optimization algorithms can be used to find the best set of parameters that minimizes one or more cost functions.
  • the cost functions can be defined as the number of penetrations between the meshes of the two 3D models, the average distance between the vertices of the body mesh and the garment mesh, etc.
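  • A toy version of this optimization, using per-vertex displacements along the garment normals as the parameters and combining a penetration penalty with the average garment-to-body distance as the cost, might look as follows. The weights, the gradient-free solver, and the brute-force distance computation are illustrative assumptions rather than the disclosed method.

```python
import numpy as np
from scipy.optimize import minimize

def fit_by_optimization(garment_verts, garment_normals, body_verts, clearance=0.005):
    """Optimise per-vertex displacements of the garment along its normals so
    the garment lies just outside the body surface. Suitable only for small
    meshes because of the brute-force distance computation."""
    def nearest_distances(verts):
        d = np.linalg.norm(verts[:, None, :] - body_verts[None, :, :], axis=-1)
        return d.min(axis=1)

    def cost(displacements):
        moved = garment_verts + displacements[:, None] * garment_normals
        dists = nearest_distances(moved)
        penetration = np.clip(clearance - dists, 0.0, None)  # too close / inside
        return 100.0 * penetration.sum() + dists.mean()

    result = minimize(cost, np.zeros(len(garment_verts)), method="Powell")
    return garment_verts + result.x[:, None] * garment_normals
```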
  • the models are rendered at step 10.
  • Rendering is a process by which a 3D model is given the appearance of being solid using shading and color.
  • the streaming data 306 is augmented with the rendered models.
  • the rendered models may be placed, as augmented visual data, within a small window inside of the streaming data 306, such that it may be viewed by a viewer of the streaming data 306 (e.g., the user) at the same time as the streaming data 306 is being viewed.
  • the rendered models are then provided to the user device 202 at step 11 of process 300.
  • the user device 202 may present the rendered models to a user.
  • the user device 202 may play the augmented streaming data via a media player application.
  • additional processing may be performed when the augmented streaming data 306 is consumed.
  • a user device 202 which is presenting the augmented streaming data 306 may collect image information for a user which is viewing the augmented streaming data 306 via a front-facing camera installed upon the user device 202.
  • the user’s facial data may be extracted from the image information and overlaid onto the rendered user model, so that the user model is given the user’s face and facial expression data.
  • FIG. 3 provides a particular method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure.
  • other sequences of steps may also be performed according to alternative embodiments.
  • alternative embodiments of the present disclosure may perform the steps outlined above in a different order.
  • the individual steps illustrated in FIG. 3 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
  • additional steps may be added or removed depending on the particular applications.
  • One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 4 depicts an illustrative example of a technique for obtaining 3D models using sensor data in accordance with at least some embodiments.
  • sensor data 402 may be obtained from one or more input sensors installed upon a user device.
  • the captured sensor data 402 includes image information 404 captured by a camera device as well as depth map information 406 captured by a depth sensor.
  • the sensor data 402 may include image information 404.
  • One or more image processing techniques may be used on image information 404 in order to identify one or more objects within that image information 404.
  • edge detection may be used to identify a section 408 within the image information 404 that includes an object.
  • discontinuities in brightness, color, and/or texture may be identified across an image in order to detect edges of various objects within the image.
  • Section 408 depicts an illustrative example image of a chair in which such discontinuities have been emphasized.
  • the sensor data 402 may include depth information 406.
  • depth information 406 a value may be assigned to each pixel that represents a distance between the user device and a particular point corresponding to the location of that pixel.
  • the depth information 406 may be analyzed to detect sudden variances in depth within the depth information 406. For example, sudden changes in distance may indicate an edge or a border of an object within the depth information 406.
  • the sensor data 402 may include both image information 404 and depth information 406.
  • objects may first be identified in either the image information 404 or the depth information 406 and various attributes of the objects may be determined from the other information.
  • edge detection techniques may be used to identify a section of the image information 404 that includes an object 408.
  • the section 408 may then be mapped to a corresponding section 410 in the depth information to determine depth information for the identified object (e.g., a point cloud) .
  • a section 410 that includes an object may first be identified within the depth information 406.
  • the section 410 may then be mapped to a corresponding section 408 in the image information to determine appearance attributes for the identified object (e.g., color or texture values) .
  • various attributes (e.g., color, texture, point cloud data, object edges) of the object determined from the sensor data 402 may then be used to identify or generate a 3D model of the object.
  • a point cloud for the object may be generated from the depth information and/or image information and compared to point cloud data stored in a database to identify a closest matching 3D model.
  • a 3D model of an object (e.g., a user or a product) may be generated from the captured sensor data. For example, a mesh may be created from point cloud data obtained from a section 410 of depth information 406. The system may then map appearance data from a section of image information 404 corresponding to section 410 onto the mesh to generate a basic 3D model.
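  • Mapping appearance data onto the mesh can be sketched by projecting each mesh vertex back into the aligned image with a pinhole model and sampling the nearest pixel. The intrinsics and the assumption that vertices are expressed in the camera frame with positive depth are illustrative.

```python
import numpy as np

def texture_mesh_vertices(vertices, image, fx, fy, cx, cy):
    """Assign an RGB colour to each mesh vertex by projecting it back into the
    aligned H x W x 3 image and sampling the nearest pixel. Vertices are
    assumed to be in the camera frame (metres) with positive depth."""
    h, w = image.shape[:2]
    u = np.clip((fx * vertices[:, 0] / vertices[:, 2] + cx).round().astype(int), 0, w - 1)
    v = np.clip((fy * vertices[:, 1] / vertices[:, 2] + cy).round().astype(int), 0, h - 1)
    return image[v, u]  # one RGB triple per vertex
```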
  • sensor data captured by a user device may be used to generate a 3D model of a user using the techniques described above.
  • This 3D model of a user may then be provided to a mobile application server as user data.
  • sensor data may be used to generate a 3D model of a product, which may then be stored in the object model data 240.
  • a user wishing to sell a product may capture sensor data related to the product from the user’s device.
  • the user’s user device may then generate a 3D model in the manner outlined above and may provide that 3D model to the mobile application server.
  • FIG. 5 depicts an example graphical user interface (GUI) demonstrating some example features that may be implemented in accordance with embodiments described herein.
  • an example user device 502 is depicted as having a display screen upon which visual data may be presented.
  • User device 502 is an example of user device 202 described with respect to FIG. 2 above.
  • a GUI of a software application installed upon user device 502 may be used to present streaming data 504.
  • the streaming data 504 may include at least a presenter 506, which is a person depicted within streaming data 504, and a product 508.
  • Product 508 may be worn or otherwise presented by presenter 506 within the streaming data 504.
  • a posed and fitted model 510 may be presented alongside the streaming data 504.
  • the model 510 may be presented within a separate window 512 in a location which minimizes any obstruction of a view of the streaming data 504, sometimes referred to as a picture-in-picture.
  • the model 510 may include a user model representative of a current viewer of the streaming data 504, which has been posed in a manner similar to that of the presenter 506 and onto which a product model representative of product 508 has been fitted.
  • FIG. 6 illustrates a flow diagram depicting a process for presenting virtual fit data to a user in accordance with at least some embodiments.
  • the process 600 depicted in FIG. 6 may be performed by a mobile application server (e.g., mobile application server 204 of FIG. 2) in communication with a user device (e.g., user device 202 of FIG. 2) .
  • the process 600 involves receiving an indication of media content being consumed by a user.
  • an indication may be received that a user is viewing a streaming video, which is a type of media content.
  • the indicated media content may include a depiction of a presenter, which is a person different from the user.
  • the indicated media content may also include a depiction of a product being presented by the presenter.
  • the product might be a clothing item worn by the presenter within the media content.
  • the process 600 involves identifying a product associated with the media content.
  • the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  • the identifier associated with the product is a stock keeping unit (SKU) number.
  • the product associated with the media content is identified via object recognition.
  • the process 600 involves obtaining a first 3D model representative of the product.
  • a 3D model associated with the product identified at 604 is retrieved from a database having stored therein object model data (e.g., object model data 240 of FIG. 2) .
  • an appropriate size and/or style of the product may be selected based on information stored in relation to a user which is viewing the media content.
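  • As an illustrative sketch of such a size selection (assuming the catalog stores a per-size chest measurement and the account stores the user's chest measurement in the same units), the smallest size that accommodates the user could be chosen as follows.

```python
def select_size(user_measurements, size_chart):
    """Pick the smallest catalogued size whose chest measurement is at least
    the user's; fall back to the largest size otherwise. The size-chart format
    and the single-measurement rule are assumptions for illustration."""
    sizes = sorted(size_chart.items(), key=lambda kv: kv[1]["chest"])
    for name, dims in sizes:
        if dims["chest"] >= user_measurements["chest"]:
            return name
    return sizes[-1][0]

chart = {"S": {"chest": 92}, "M": {"chest": 100}, "L": {"chest": 108}}
print(select_size({"chest": 97}, chart))   # -> "M"
```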
  • the process 600 involves obtaining a second 3D model representative of the user.
  • user models may be stored in relation to one or more accounts.
  • the second 3D model may be identified and retrieved by virtue of being stored in relation to an account which is being used to view the media content.
  • a second 3D model representative of the user may be received from a user device which is being used to view the media content.
  • the process 600 involves determining a presentation pose from the media content.
  • the presentation pose is determined as a current pose of the presenter within the media content. This may be done using any suitable pose estimation technique.
  • the determined presentation pose may include an indication of various parts (e.g., body parts) of a user model and their respective locations and orientations.
  • the process 600 involves applying the presentation pose to the second 3D model.
  • locations and orientations of various parts (e.g., body parts) of the second 3D model may be adjusted so that they match the corresponding locations and orientations in the presentation pose data.
  • the process 600 involves generating a third 3D model by fitting the first 3D model onto the second 3D model. This may involve deforming the first 3D model to minimize distances between the first 3D model and the second 3D model.
  • the process 600 involves presenting the third 3D model to the user. This involves rendering the third 3D model and providing the third 3D model to a user device which is presenting the media content.
  • the third 3D model is caused to be presented alongside the media content.
  • the media content may be augmented to include the third 3D model (e.g., in a separate window within the media content) .
  • FIG. 6 provides a particular method of presenting virtual fit data to a user according to an embodiment of the present disclosure.
  • other sequences of steps may also be performed according to alternative embodiments.
  • alternative embodiments of the present disclosure may perform the steps outlined above in a different order.
  • the individual steps illustrated in FIG. 6 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
  • additional steps may be added or removed depending on the particular applications.
  • One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 7 illustrates examples of components of a computer system 700 according to certain embodiments.
  • the computer system 700 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 700, the computer system 700 can also be distributed.
  • the computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712.
  • the interface bus 712 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 700.
  • the memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example FLASH TM memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
  • the memory 704 and the storage device 706 also include computer readable signal media.
  • a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
  • a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
  • the memory 704 includes an operating system, programs, and applications.
  • the processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
  • the memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
  • the I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
  • the I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712.
  • the communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
  • FIG. 8 illustrates a block diagram depicting an apparatus for presenting virtual fit data to a user in accordance with at least some embodiments.
  • the apparatus 800 depicted in FIG. 8 may be implemented as a mobile application server (e.g., mobile application server 204 of FIG. 2) in communication with a user device (e.g., user device 202 of FIG. 2) .
  • the apparatus 800 may include a receiving module 802 configured to receive an indication of media content being consumed by a user. For example, an indication may be received that a user is viewing a streaming video, which is a type of media content.
  • the indicated media content may include a depiction of a presenter, which is a person different from the user.
  • the indicated media content may also include a depiction of a product being presented by the presenter.
  • the product might be a clothing item worn by the presenter within the media content.
  • the apparatus 800 may further include an identifying module 804 configured to identify a product associated with the media content.
  • the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  • the identifier associated with the product is a stock keeping unit (SKU) number.
  • the product associated with the media content is identified via object recognition.
  • the apparatus 800 may further include an obtaining module 806 configured to obtain a first 3D model representative of the product.
  • a 3D model associated with the product identified at 604 is retrieved from a database having stored therein object model data (e.g., object model data 240 of FIG. 2) .
  • an appropriate size and/or style of the product may be selected based on information stored in relation to a user which is viewing the media content.
  • the obtaining module 806 may be further configured to obtain a second 3D model representative of the user.
  • user models may be stored in relation to one or more accounts.
  • the second 3D model may be identified and retrieved by virtue of being stored in relation to an account which is being used to view the media content.
  • a second 3D model representative of the user may be received from a user device which is being used to view the media content.
  • the apparatus 800 may further include a determining module 808 configured to determine a presentation pose from the media content.
  • the presentation pose is determined as a current pose of the presenter within the media content. This may be done using any suitable pose estimation technique.
  • the determined presentation pose may include an indication of various parts (e.g., body parts) of a user model and their respective locations and orientations.
  • the apparatus 800 may further include an applying module 810 configured to apply the presentation pose to the second 3D model. To do this, locations and orientations of various parts (e.g., body parts) of the second 3D model may be adjusted so that they match the corresponding locations and orientations in the presentation pose data.
  • the apparatus 800 may further include a generating module 812 configured to generate a third 3D model by fitting the first 3D model onto the second 3D model. This may involve deforming the first 3D model to minimize distances between the first 3D model and the second 3D model.
  • the apparatus 800 may further include a presenting module 814 configured to present the third 3D model to the user. This involves rendering the third 3D model and providing the third 3D model to a user device which is presenting the media content.
  • the third 3D model is caused to be presented alongside the media content.
  • the media content may be augmented to include the third 3D model (e.g., in a separate window within the media content) .
  • a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
  • the order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
  • use of the term “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Abstract

Described herein are methods and systems for augmenting streaming media data to include virtual fit data. The methods involve assessing streaming data to obtain a first three-dimensional (3D) model associated with a product. The methods further involve obtaining a second 3D model associated with a user. The first 3D model is then fitted onto the second 3D model and the fitted models are posed in a manner estimated from a presenter within the streaming data. The posed models are then rendered and displayed to a viewer alongside the streaming data. Embodiments of the present disclosure are applicable to a variety of applications in virtual reality and computer-based fitting systems.

Description

SYSTEM AND METHOD FOR VIRTUAL FITTING DURING LIVE STREAMING
CROSS-REFERENCE TO RELATED APPLICATION
This application is based on and claims priority of U.S. Provisional Application No. 62/987,474, filed on March 10, 2020 and entitled “SYSTEM AND METHOD FOR VIRTUAL FITTING DURING LIVE STREAMING” , the disclosure of which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates generally to methods and systems related to virtual fitting applications. More particularly, embodiments of the present disclosure provide methods and systems for augmenting streaming media data to include virtual fit data.
BACKGROUND
When considering whether to purchase products online, users often have difficulty visualizing how the product will look or perform. This is especially true when it comes to garments or other wearable products. In order to get a better idea of a product’s properties, users often resort to viewing images and/or videos in which presenters provide product ratings and reviews for the product. These images and videos may show the product in question on the presenter, who may discuss various benefits or drawbacks for the product. However, because people have varying body types, even seeing the product on the presenter may not be helpful in allowing the user to fully visualize how that product will look or perform on him/herself.
Embodiments of the disclosure address these and other problems individually and collectively.
SUMMARY
The methods involve assessing streaming data to obtain a first three-dimensional (3D) model associated with a product. The methods further involve obtaining a second 3D model associated with a user. The first 3D model is then fitted onto the second 3D model and the fitted models are posed in a manner estimated from a presenter within the streaming data. The posed models are then rendered and displayed to a viewer alongside the streaming data. Embodiments of the present disclosure are applicable to a variety of applications in virtual reality and computer-based fitting systems.
One embodiment of the disclosure is directed to a method comprising receiving an indication of media content being viewed by a user, identifying a product associated with the media content, obtaining a first 3D model representative of the product, obtaining a second 3D model representative of the user, determining a presentation pose from the media content, applying the presentation pose to the second 3D model, generating a third 3D model by fitting the second 3D model with the first 3D model, and presenting the third 3D model to the user in the presentation pose.
Another embodiment of the disclosure is directed to a system comprising: a processor; and a memory including instructions that, when executed with the processor, cause the system to at least receive an indication of media content being viewed by a user, identify a product associated with the media content, obtain a first 3D model representative of the product, obtain a second 3D model representative of the user, determine a presentation pose from the media content, apply the presentation pose to the second 3D model, generate a third 3D model by fitting the second 3D model with the first 3D model, and present the third 3D model to the user in the presentation pose.
Yet another embodiment of the disclosure is directed to a non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a  processor, cause a computer system to at least receive an indication of media content being viewed by a user, identify a product associated with the media content, obtain a first 3D model representative of the product, obtain a second 3D model representative of the user, determine a presentation pose from the media content, apply the presentation pose to the second 3D model, generate a third 3D model by fitting the second 3D model with the first 3D model, and present the third 3D model to the user in the presentation pose.
Numerous benefits are achieved by way of the present system over conventional systems. For example, embodiments of the present disclosure involve methods and systems that provide a user with a more accurate assessment as to how a garment or other wearable product would look on himself/herself. In the described system, streaming media data is augmented with virtual fit data for a user. To do this, product models are identified in relation to the streaming media data and a user model is obtained in relation to a viewer of the streaming media data. The product models are fitted onto the user model, which is posed in a manner similar to a presenter figure within the streaming media data. The product models and user model are then rendered and presented alongside (e.g., augmented within) the streaming media data.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an illustrative example of a system in which a streaming video may be augmented with virtual fit information in accordance with at least some embodiments;
FIG. 2 depicts a system architecture for a system that augments streaming data with virtual fit information in accordance with at least some embodiments;
FIG. 3 is a simplified flowchart illustrating a method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure;
FIG. 4 depicts an illustrative example of a technique for obtaining 3D models using sensor data in accordance with at least some embodiments;
FIG. 5 depicts an example graphical user interface (GUI) demonstrating features that may be implemented in accordance with embodiments described herein;
FIG. 6 illustrates a flow diagram depicting a process for presenting virtual fit data to a user in accordance with at least some embodiments;
FIG. 7 illustrates examples of components of a computer system according to certain embodiments; and
FIG. 8 illustrates a block diagram depicting an apparatus for presenting virtual fit data to a user in accordance with at least some embodiments.
DETAILED DESCRIPTION
The present disclosure relates generally to methods and systems related to virtual reality applications. More particularly, embodiments of the present disclosure provide methods and systems for determining a level of fit for a user and product. Embodiments of the present disclosure are applicable to a variety of applications in virtual reality and computer-based fitting systems.
FIG. 1 depicts an illustrative example of a system in which a streaming video may be augmented with virtual fit information in accordance with at least some embodiments. In FIG. 1, a user device 102 may be used to provide a request to a mobile application server 104 for virtual fit information. The user device, in some cases, may be used to obtain user data 106, which may be provided to the mobile application server 104 to be used in generating the virtual fit information.
In an example, the user device 102 represents a suitable computing device that includes one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the  present disclosure. For instance, user device 102 can be any of a smartphone, a tablet, a laptop, a personal computer, a gaming console, or a smart television. The user device 102 may additionally include a range camera (i.e., depth sensor) and/or a RGB optical sensor, such as a camera.
The user device may be used to capture and/or generate user data 106. User data 106 may include information related to a particular user (e.g., a user of the user device 102) for whom virtual fit data should be created. The user data 106 may include data about the user which may be used to generate virtual fit data. For example, user data 106 may include dimensions of the user. User data 106 may be captured in any suitable format. For example, user data 106 may include a point cloud, a 3D mesh or model, or a string of characters that includes measurements at predetermined locations. In some cases, capturing user data 106 may involve receiving information about the user which is manually input into the user device 102. For example, a user may input measurements for various parts of the user’s body via a keypad. In some cases, capturing user data 106 may involve using a camera and/or a depth sensor to capture images/depth information related to the user. The user device 102 may be further configured to generate a 3D model from the captured images/depth information. This process is described in greater detail with respect to FIG. 4 below.
The mobile application server 104 may include any computing device capable of generating a data stream which is augmented with virtual fit data for a user in accordance with the techniques described herein. In order to generate an augmented data stream, the mobile application server 104 may receive user data 106 from the user device 102. It should be noted that while the user data 106 may be received by the mobile application server 104 at the same time that the mobile application server 104 receives a request to generate virtual fit data, the mobile application server 104 may also receive user data 106 prior to, and independent of, any request to generate virtual fit data. For example, the mobile application server 104 may receive the user data 106 during an enrollment phase during which a user establishes an account with the mobile application server 104.
The request for virtual fit data may indicate a streaming data 108. Streaming data 108 may be a streaming video (e.g., a live stream) or other suitable dynamic media content. The streaming data 108 may depict at least a presenter 110 and at least one product 112. The mobile application server 104 may obtain, from the streaming data 108, an identifier for the at least one product 112 (product identifier 114) and data related to a pose of the presenter (pose data 116) . In some embodiments, one or more of the product identifier 114 or the pose data 116 may be associated with the streaming data 108 via metadata attached to the streaming data 108. In some embodiments, one or more machine vision techniques may be used to determine one or more of the product identifier 114 and/or the pose data 116 from imagery within the streaming data 108.
The mobile application server 104 may include, or have access to, object model data 118 from which product data 120 may be obtained in order to complete the request. Object model data 118 may include any computer-readable storage medium having stored thereon one or more 3D models. For example, the object model data 118 may be a database maintained by the mobile application server 104 or another server. The 3D models stored in object model data 118 may be representative of products which can be worn by a user, such as clothing items (e.g., garments) or accessories. In some embodiments, the object model data 118 may store 3D models for multiple versions of a product (e.g., different sizes and/or styles) . Upon receiving a product identifier 114 for a particular product, the mobile application server 104 retrieves product data 120, which includes a 3D model associated with the particular product, from the object model data 118.
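By way of a non-limiting illustration, the retrieval of product data 120 from object model data 118 can be pictured as a keyed lookup into a catalog of pre-built garment models, with a fallback when the requested size is unavailable. The following Python sketch assumes a simple in-memory store; the names ObjectModelStore, ProductModel, and get_product_model are illustrative placeholders rather than elements of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Optional

@dataclass
class ProductModel:
    """Minimal stand-in for a stored 3D garment model (product data 120)."""
    sku: str
    size: str
    mesh_path: str  # location of the stored 3D mesh

@dataclass
class ObjectModelStore:
    """Hypothetical in-memory stand-in for object model data 118."""
    models: Dict[Tuple[str, str], ProductModel] = field(default_factory=dict)

    def add(self, model: ProductModel) -> None:
        self.models[(model.sku, model.size)] = model

    def get_product_model(self, sku: str, size: str = "M") -> Optional[ProductModel]:
        # Prefer the requested size; otherwise fall back to any stored size of the product.
        return self.models.get((sku, size)) or next(
            (m for (s, _), m in self.models.items() if s == sku), None)

store = ObjectModelStore()
store.add(ProductModel(sku="SHIRT-001", size="M", mesh_path="models/shirt_001_m.obj"))
print(store.get_product_model("SHIRT-001", size="L"))  # falls back to the stored "M" model
```

A production system would back the same lookup with a database query keyed on the product identifier 114 and on the size or style selected for the viewer.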
The mobile application server 104 may be configured to combine the user data 106 and the product data 120 in order to generate a fitted avatar for the user. The mobile application server 104 may also pose the fitted avatar in accordance with the pose data 116. Once the fitted avatar has been generated, the mobile application server 104 may augment the streaming data 108 with the fitted avatar to generate an augmented streaming data 122. Once generated, the augmented streaming data 122 may be provided back to the user device 102, where it may be rendered on a display for the user to view.
For clarity, a certain number of components are shown in FIG. 1. It is understood, however, that embodiments of the disclosure may include more than one of each component. In addition, some embodiments of the disclosure may include fewer than or greater than all of the components shown in FIG. 1. In addition, the components in FIG. 1 may communicate via any suitable communication medium (including the internet) , using any suitable communication protocol.
FIG. 2 depicts a system architecture for a system that augments streaming data with virtual fit information in accordance with at least some embodiments. In FIG. 2, a user device 202 may be in communication with a number of other components, including at least a mobile application server 204. The mobile application server 204 may perform at least a portion of the processing functions required by a mobile application installed upon the user device. The user device 202 and mobile application server 204 may be examples of the user device 102 and mobile application server 104 respectively described with respect to FIG. 1.
A user device 202 may be any suitable electronic device that is capable of providing at least a portion of the capabilities described herein. In particular, the user device 202 may be any electronic device capable of capturing user data and/or presenting an augmented data stream on a display. In some embodiments, a user device may be capable of establishing a communication session with another electronic device (e.g., mobile application server 204) and transmitting /receiving data from that electronic device. A user device may include the ability to download and/or execute mobile applications. User devices may include mobile communication devices as well as personal computers and thin-client devices. In some embodiments, a user device may comprise any portable electronic device that has a primary function related to communication. For example, a user device may be a smart phone, a personal data assistant (PDA) , or any other suitable handheld device. The user device can be implemented as a self-contained unit with various components (e.g., input sensors, one or more processors, memory, etc. ) integrated into the user device. Reference in this disclosure to an “output” of a component or an “output” of a sensor does not necessarily imply that the output is transmitted outside of the user device. Outputs of various components might remain inside a self-contained unit that defines a user device.
In one illustrative configuration, the user device 202 may include at least one memory 206 and one or more processing units (or processor (s) ) 208. The processor (s) 208 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 208 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described. The user device 202 may also include one or more input sensors 210 for receiving user and/or environmental input. There may be a variety of input sensors 210 capable of detecting user or environmental input, such as an accelerometer, a camera device, a depth sensor, a microphone, a global positioning system (e.g., GPS) receiver, etc. The one or more input sensors 210 may include a range camera device (e.g., a depth sensor) capable of generating a range image, as well as a camera device configured to capture image information.
For the purposes of this disclosure, a range camera (e.g., a depth sensor) may be any device configured to identify a distance or range of an object or objects from the range camera. In some embodiments, the range camera may generate a range image (or range map) , in which pixel values correspond to the detected distance for that pixel. The pixel values can be obtained directly in physical units (e.g., meters) . In at least some embodiments of the disclosure, the user device may employ a range camera that operates using structured light. In a range camera that operates using structured light, a projector projects light onto an object or objects in a structured  pattern. The light may be of a range that is outside of the visible range (e.g., infrared or ultraviolet) . The range camera may be equipped with one or more camera devices configured to obtain an image of the object with the reflected pattern. Distance information may then be generated based on distortions in the detected pattern. It should be noted that although this disclosure focuses on the use of a range camera using structured light, any suitable type of range camera, including those that operate using stereo triangulation, sheet of light triangulation, time-of-flight, interferometry, coded aperture, or any other suitable technique for range detection, would be useable by the described system.
The memory 206 may store program instructions that are loadable and executable on the processor (s) 208, as well as data generated during the execution of these programs. Depending on the configuration and type of user device 202, the memory 206 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) . The user device 202 may also include additional storage 212, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices. In some implementations, the memory 206 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM. Turning to the contents of the memory 206 in more detail, the memory 206 may include an operating system 214 and one or more application programs or services for implementing the features disclosed herein including at least a mobile application 216. The memory 206 may also include application data 218, which provides information to be generated by and/or consumed by the mobile application 216. In some embodiments, the application data 218 may be stored in a database.
For the purposes of this disclosure, a mobile application may be any set of computer executable instructions installed upon, and executed from, a user device 202. Mobile applications may be installed on a user device by a manufacturer of the user device or by another entity. In some embodiments, the mobile application 216 may cause a user device to establish a communication session with a mobile application server 204 that provides backend support for the mobile application 216. A mobile application server 204 may maintain account information associated with a particular user device and/or user. In some embodiments, a user may be required to log into a mobile application in order to access functionality provided by the mobile application 216.
In accordance with at least some embodiments, the mobile application 216 is configured to provide user information to the mobile application server 204 and to present information received from the mobile application server 204 to a user. More particularly, the mobile application 216 is configured to obtain measurement data for a user and to submit that measurement data to a mobile application server 204 in relation to a request for a streaming data augmented with virtual fit data. In some embodiments, the mobile application 216 may also receive an indication of a data stream which should be augmented with virtual fit data.
In accordance with at least some embodiments, the mobile application 216 may receive output from the input sensors 210 and generate a 3D model based upon that output. For example, the mobile application 216 may receive depth information (e.g., a range image) from a depth sensor (e.g., a range camera) , such as the depth sensors previously described with respect to input sensors 210 as well as image information from a camera input sensor. Based on this information, the mobile application 216 may determine the bounds of an object (e.g., a user) to be identified. For example, a sudden variance in depth within the depth information may indicate a border or outline of an object. In another example, the mobile application 216 may utilize one or more machine vision techniques and/or machine learning to identify the bounds of an object. In this example, the mobile application 216 may receive image information from a camera input sensor 210 and may identify potential objects within the image information based  on variances in color or texture data detected within the image or based on learned patterns. In some embodiments, the mobile application 216 may cause the user device 202 to transmit the output obtained from the input sensors 210 to the mobile application server 204, which may then perform one or more object recognition techniques upon that output in order to generate a 3D model of the object.
The user device 202 may also contain a communications interface (s) 220 that enables the user device 202 to communicate with any other suitable electronic devices. In some embodiments, the communication interface 220 may enable the user device 202 to communicate with other electronic devices on a network (e.g., on a private network) . For example, the user device 202 may include a BLUETOOTH TM wireless communication module, which allows it to communicate with another electronic device. The user device 202 may also include input/output (I/O) device (s) and/or ports 222, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
In some embodiments, the user device 202 may communicate with the mobile application server 204 via a communication network. The communication network may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks. In addition, the communication network may comprise multiple different networks. For example, the user device 202 may utilize a wireless local area network (WLAN) to communicate with a wireless router, which may then route the communication over a public network (e.g., the Internet) to the mobile application server 204.
The mobile application server 204 may be any computing device or plurality of computing devices configured to perform one or more calculations on behalf of the mobile application 216 on the user device 202. In some embodiments, the mobile application 216 may be in periodic communication with the mobile application server 204. For example, the mobile application 216 may receive updates, push notifications, or other instructions from the mobile application server 204. In some embodiments, the mobile application 216 and mobile application server 204 may utilize a proprietary encryption and/or decryption scheme to secure communications between the two. In some embodiments, the mobile application server 204 may be executed by one or more virtual machines implemented in a hosted computing environment. The hosted computing environment may include one or more rapidly provisioned and released computing resources, which computing resources may include computing, networking, and/or storage devices. A hosted computing environment may also be referred to as a cloud-computing environment.
In one illustrative configuration, the mobile application server 204 may include at least one memory 224 and one or more processing units (or processor (s) ) 226. The processor (s) 226 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 226 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
The memory 224 may store program instructions that are loadable and executable on the processor (s) 226, as well as data generated during the execution of these programs. Depending on the configuration and type of mobile application server 204, the memory 224 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) . The mobile application server 204 may also include additional storage 228, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices. In some implementations, the memory 224 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM.  Turning to the contents of the memory 224 in more detail, the memory 224 may include an operating system 230 and one or more application programs or services for implementing the features disclosed herein including at least a module for fitting a 3D model of a product onto a 3D model of a user (fitting module 232) and/or a module for determining and applying a pose to a 3D model of a product and a 3D model of a user (pose module 234) . The memory 224 may also include account data 236, which provides information associated with user accounts maintained by the described system, user model data 238, which maintains 3D models associated with each user of an account, and/or object model data 240, which maintains 3D models associated with a number of objects (products) . In some embodiments, one or more of the account data 236, the user model data 238, or the object model data 240 may be stored in a database. In some embodiments, the object model data 240 may be an electronic catalog that includes data related to objects available for sale from a resource provider, such as a retailer or other suitable merchant.
The memory 224 and the additional storage 228, both removable and non-removable, are examples of computer-readable storage media. For example, computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. As used herein, the term “modules” may refer to programming modules executed by computing systems (e.g., processors) that are installed on and/or executed from the mobile application server 204. The mobile application server 204 may also contain communications connection (s) 242 that allow the mobile application server 204 to communicate with a stored database, another computing device or server, user terminals, and/or other components of the described system. The mobile application server 204 may also include input/output (I/O) device (s) and/or ports 244, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
Turning to the contents of the memory 224 in more detail, the memory 224 may include the fitting module 232, the pose module 234, the database containing account data 236, the database containing user model data 238, and/or the database containing object model data 240.
In some embodiments, the fitting module 232 may be configured to, in conjunction with the processors 226, apply deformations to a 3D model of a product in order to fit it onto a 3D model of a user. The fitting module 232 may have access to one or more rules which delineate how specific product types (e.g., shirt, pants, etc. ) should be deformed (e.g., stretched and/or bent) in order to be fit onto a user model. In order to fit a 3D model of a product onto a 3D model of a user, the fitting module 232 may snap certain portions of the 3D model of the product onto specific portions of the 3D model of the user. For example, the 3D model of a shirt may be positioned so that sleeves of the 3D model of the shirt encompass the arms of a 3D model of a user. Additionally, the 3D model of a shirt may also be positioned so that the collar of the 3D model of the shirt encompasses a neck of a 3D model of a user. The remainder of the 3D model of the shirt may then be deformed such that the interior of the 3D model of the shirt lies outside of or along the exterior of the 3D model of the user by stretching and bending the portions of the 3D model of the shirt.
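As a rough, non-limiting sketch of the "snap and deform" idea described above, the following Python snippet pushes each garment vertex just outside its nearest body vertex so that the garment lies along or outside the body surface. The margin value, the centroid-based outward direction, and the helper name push_outside are simplifying assumptions made for illustration only, not the fitting rules of the disclosure.

```python
import numpy as np

def push_outside(garment_verts: np.ndarray, body_verts: np.ndarray,
                 margin: float = 0.005) -> np.ndarray:
    """Move garment vertices so they sit at least `margin` outside the body surface.

    Both arrays are (N, 3). The body surface is approximated by its vertices;
    each garment vertex is pushed along the outward direction from the body
    centroid whenever it does not already lie beyond its nearest body vertex.
    """
    centroid = body_verts.mean(axis=0)
    fitted = garment_verts.copy()
    for i, v in enumerate(fitted):
        # Nearest body vertex to this garment vertex.
        j = np.argmin(np.linalg.norm(body_verts - v, axis=1))
        nearest = body_verts[j]
        outward = nearest - centroid
        outward /= (np.linalg.norm(outward) + 1e-9)
        # If the garment vertex is not farther out than the body vertex, push it out.
        if np.dot(v - centroid, outward) < np.dot(nearest - centroid, outward) + margin:
            fitted[i] = nearest + outward * margin
    return fitted

body = np.random.rand(200, 3)                      # toy "user" vertices
garment = body + np.random.randn(200, 3) * 0.01    # toy "shirt" vertices near the body
print(push_outside(garment, body).shape)           # (200, 3)
```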
In some embodiments, the pose module 234 may be configured to, in conjunction with the processors 226, identify a pose of a presenter (i.e., a human figure) within a streaming data and apply that pose to a combination of a 3D model of a product fitted onto a 3D model of a user as generated by the fitting module 232. This may involve using one or more pose estimation techniques to determine a current pose of a presenter within the data stream. For example, the pose module 234 may use machine learning to determine a pose of the presenter within a data stream. One skilled in the art would recognize that a number of suitable pose estimation techniques are available. In some embodiments, the pose module 234 may then apply the determined pose to the user model (having fitted onto it the product model) . This may involve  repositioning one or more appendages or body parts of the user model until the determined pose is achieved. In some embodiments, the pose module 234 may monitor the pose of the presenter within the streaming data and may adjust a pose of the user model to match the pose of the presenter as changes in the presenter’s pose are detected.
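To make the monitoring behavior concrete, the sketch below shows a per-frame loop that re-estimates the presenter's pose and only re-poses and re-renders the user model when a joint has moved by more than a small threshold. The functions estimate_pose and retarget_and_render are placeholders standing in for a real pose-estimation model and renderer; they, and the threshold value, are assumptions for illustration only.

```python
import numpy as np

# --- Hypothetical stand-ins for the pose estimator and renderer -------------
def estimate_pose(frame: np.ndarray) -> dict:
    """Placeholder pose estimator: returns joint name -> 3D position."""
    rng = np.random.default_rng(int(frame.sum()) % 1000)
    return {j: rng.random(3) for j in ("head", "l_shoulder", "r_shoulder", "hips")}

def retarget_and_render(user_model: dict, pose: dict) -> str:
    """Placeholder: apply `pose` to the user model and render a frame."""
    return f"rendered frame with {len(pose)} joints applied"

def pose_changed(prev, curr: dict, threshold: float = 0.02) -> bool:
    """Only re-pose the avatar when some joint has moved more than `threshold`."""
    if prev is None:
        return True
    return any(np.linalg.norm(curr[j] - prev[j]) > threshold for j in curr)

# --- Per-frame monitoring loop over the stream ------------------------------
user_model = {"name": "viewer avatar"}
last_pose = None
for frame in (np.full((4, 4), i, dtype=float) for i in range(3)):  # toy "video"
    pose = estimate_pose(frame)
    if pose_changed(last_pose, pose):
        print(retarget_and_render(user_model, pose))
        last_pose = pose
```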
Once the pose module has adjusted a pose of the combined user model and product model, the pose module 234 may render the combined user model and product model. In some embodiments, the combined user model and product model may be rendered in a small window, which is then placed in a discreet location within the streaming data. For example, where the streaming data is a video, the combined user model and product model may be rendered within a window located in a lower corner of the video. The rendering would allow the user to visualize the product on him/herself. Note that while the pose module 234 and the fitting module 232 are described with respect to the mobile application server 204, the functionality described as being performed by one or more of the modules may be performed by the mobile application on the user device 202 instead.
In some embodiments, each of the object entries within the object model database 240 may be associated with a 3D model of that object. In these embodiments, the 3D model may be combined with a second 3D model of a user and provided to the mobile application 216 such that the user device 202 is caused to display the combination of the 3D models on a display of the user device as augmented into streaming data. The mobile application 216 may dynamically update a pose of the combination of the 3D models on the display of the user device as a pose of a presenter within the streaming data is updated.
FIG. 3 is a simplified flowchart illustrating a method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure. The flow is described in connection with a computer system that is an example of the computer systems described herein. Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations. Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
To begin the process 300, one or more products 302 are scanned to generate product model data 304. The product models 304 are 3D virtual representations of the products 302, which may be generated by scanning each product 302 using cameras and/or depth sensors from multiple view angles. At step 1 of process 300, a number of generated product models 304 are provided to the mobile application server 204 to be stored in object model data 240. Product models 304 may be generated by a number of different entities. For example, a product model for a particular product may be generated by a manufacturer of that product.
Separately, a user is scanned using a camera and/or depth sensor installed on a user device 202 to generate a user model data 308 at step 2 of process 300. Some example techniques for generating a model of an object (e.g., a user) are described in greater detail with respect to FIG. 4 below. At step 3 of process 300, the user model data 308 is transmitted to the mobile application server 204 to be stored in user model data 238. In some cases, the user model data 308 may be stored in relation to an account maintained for the user that was scanned.
The mobile application server 204 receives a request from a user to consume streaming data 306. Upon receiving a request to consume streaming data 306, the mobile application server 204 retrieves the streaming data 306 from its location at step 4 of process 300. In some embodiments, the streaming data 306 may be maintained by the mobile application server 204. In some embodiments, streaming data 306 may be maintained by an entity separate  from the mobile application server 204. For example, a user may request, via the mobile application server 204 which provides support for a mobile application installed upon the user’s mobile device, to watch a video file hosted by YOUTUBE. COM TM. In this example, the user may provide a uniform resource locator (URL) or other identifier of the video. The mobile application server 204 may then retrieve the video file by visiting the URL. Once the streaming data 306 has been retrieved, the mobile application server 204 identifies one or more relevant products as well as a presenter pose. As described elsewhere, this may be done using a fitting module 232 and/or a pose module 234 as described with respect to FIG. 2 above.
Turning to the steps performed by the pose module 234, the process 300 involves determining a pose of a presenter within the streaming data 306 at step 5. This may involve first identifying the presenter within the streaming data (e.g., using one or more machine vision techniques) and then estimating a pose of that presenter using any suitable pose estimation technique. One skilled in the art should recognize that a number of suitable techniques are available. In general, a pose for an object indicates a location and orientation of that object. With respect to a presenter, the estimated pose includes a record of locations and orientations of various body parts or joints of the presenter.
Once a pose of the presenter has been estimated, that pose is applied to a user model by the pose module 234. To do this, the pose module 234 retrieves a user model at step 6 of process 300. The user model is a 3D model representative of a person which is stored in user model data 238 in association with that person or an account linked to the person. When a request to consume streaming data 306 is received in relation to a particular user, the pose module 234 may retrieve the user model associated with that person from the user model data 238. Once retrieved, the pose module 234 applies the estimated pose of the presenter to the retrieved user model. To do this, the pose module 234 repositions various body parts of the user model to match the record of locations and orientations of various body parts of the presenter. The posed user model is then provided to the fitting module 232 at step 7 of process 300.
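For illustration, applying the estimated pose can be as simple as copying limb directions from the presenter's recorded joint locations onto the user model while preserving the user's own bone lengths. The joint names, bone lengths, and the apply_pose helper below are assumed toy values; a real system would typically retarget full joint orientations rather than positions alone.

```python
import numpy as np

# Estimated presenter pose: joint name -> 3D location (step 5 of process 300).
presenter_pose = {
    "hips": np.array([0.0, 0.00, 1.0]),
    "spine": np.array([0.0, 0.05, 1.3]),
    "head": np.array([0.0, 0.10, 1.6]),
}
# Simple user skeleton: child joint -> (parent joint, the user's own bone length).
user_bones = {"spine": ("hips", 0.35), "head": ("spine", 0.28)}

def apply_pose(pose: dict, bones: dict, root: str = "hips") -> dict:
    """Reposition the user's joints so limb directions match the presenter's pose
    while keeping the user's own bone lengths."""
    joints = {root: pose[root].copy()}
    for child, (parent, length) in bones.items():
        direction = pose[child] - pose[parent]
        direction /= (np.linalg.norm(direction) + 1e-9)
        joints[child] = joints[parent] + direction * length
    return joints

print(apply_pose(presenter_pose, user_bones))
```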
Turning to the steps performed by the fitting module 232, the process 300 involves retrieving one or more product models at step 8. First, the fitting module 232 identifies one or more products associated with the streaming data 306. In some embodiments, the streaming data 306 may include an indication of one or more products. For example, the streaming data 306 may have attached metadata which indicates a Stock Keeping Unit (SKU) or other product identifier associated with the streaming data 306. In some embodiments, the one or more products may be identified from the streaming data 306 using machine vision techniques (e.g., object recognition) . For example, the fitting module 232 may identify a particular product (e.g., a shirt or pants) being worn by a presenter within the streaming data 306 by comparing visual properties of the product with attributes stored in relation to a number of products in an electronic catalog maintained by the mobile application server 204. In this example, the fitting module 232 may identify the product in the electronic catalog which most closely matches the product being worn by the presenter. Once the fitting module 232 has identified one or more products associated with the streaming data 306, product models for those products are retrieved from object model data.
At step 9 of process 300, the process involves fitting the one or more product models obtained at step 8 onto the posed user model provided at step 7. The process of fitting a product model onto the posed user model is done by adjusting a set of parameters that control the deformation of one or more regions of interest of the product model until the product model fits the user model. The set of parameters can be defined as a set of measurements, such as the displacements from each vertex of the product model. This process can be formulated as an optimization process, where a few different optimization algorithms can be used to find the best set of parameters that minimizes one or more cost functions. The cost functions can be defined as, for example, the number of penetrations between the meshes of the two 3D models, the average distance from the vertices of the body mesh to the garment mesh, etc. Further examples of techniques for fitting a product model onto a user model are described in greater detail with respect to U.S. Patent Application No. 62/987,196, entitled “SYSTEM AND METHOD FOR VIRTUAL FITTING, ” (Attorney Docket No. 105184-1175451-007700US) , which is hereby incorporated by reference in its entirety for all purposes.
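The optimization view described above can be sketched with a generic minimizer over per-vertex displacement parameters and a cost that combines the average garment-to-body distance with a crude penetration count. The toy geometry, the centroid-based penetration test, and the 0.1 weight below are illustrative assumptions only, not the cost functions or optimizer of the referenced application.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
body = rng.random((12, 3))                            # toy user-mesh vertices
garment0 = body + rng.normal(0.0, 0.05, body.shape)   # toy garment vertices, roughly draped

def cost(displacements: np.ndarray) -> float:
    """Average garment-to-body vertex distance plus a penalty for penetrations.

    `displacements` is the flattened per-vertex offset applied to the garment,
    i.e. the set of parameters being optimized.
    """
    garment = garment0 + displacements.reshape(garment0.shape)
    dists = np.linalg.norm(garment[:, None, :] - body[None, :, :], axis=2)
    nearest_idx = dists.argmin(axis=1)
    avg_distance = dists.min(axis=1).mean()
    # Crude penetration test: a garment vertex that sits closer to the body
    # centroid than its nearest body vertex is counted as penetrating the body.
    centroid = body.mean(axis=0)
    penetrations = np.sum(
        np.linalg.norm(garment - centroid, axis=1)
        < np.linalg.norm(body[nearest_idx] - centroid, axis=1))
    return avg_distance + 0.1 * penetrations

result = minimize(cost, x0=np.zeros(garment0.size), method="Powell",
                  options={"maxiter": 20})
print("initial cost:", round(cost(np.zeros(garment0.size)), 4),
      "optimized cost:", round(result.fun, 4))
```

Because the penetration term is discrete, derivative-free methods such as Powell or Nelder-Mead are a natural fit for this toy formulation.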
Once the product model has been fitted onto the user model, the models are rendered at step 10. Rendering is a process by which a 3D model is given the appearance of being solid using shading and color. One skilled in the art would recognize that there are a number of suitable techniques for rendering a product model fitted onto a posed user model. In some embodiments, the streaming data 306 is augmented with the rendered models. For example, the rendered models may be placed, as augmented visual data, within a small window inside of the streaming data 306, such that it may be viewed by a viewer of the streaming data 306 (e.g., the user) at the same time as the streaming data 306 is being viewed.
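A minimal compositing sketch of the "small window" placement is shown below: it simply pastes a rendered avatar image into the lower-right corner of a decoded video frame. The frame sizes, margin, and the overlay_render helper are assumptions for illustration.

```python
import numpy as np

def overlay_render(frame: np.ndarray, render: np.ndarray,
                   margin: int = 10) -> np.ndarray:
    """Paste the rendered avatar image into the lower-right corner of a video frame."""
    out = frame.copy()
    h, w = render.shape[:2]
    H, W = frame.shape[:2]
    out[H - h - margin:H - margin, W - w - margin:W - margin] = render
    return out

video_frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # one frame of streaming data
avatar_render = np.full((180, 120, 3), 255, dtype=np.uint8)  # toy rendered model window
augmented = overlay_render(video_frame, avatar_render)
print(augmented.shape, augmented[700, 1200])  # corner pixels now show the render
```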
The rendered models (e.g., the augmented streaming data) are then provided to the user device 202 at step 11 of process 300. Upon receiving the rendered models, the user device 202 may present the rendered models to a user. For example, the user device 202 may play the augmented streaming data via a media player application.
In some embodiments, additional processing may be performed when the augmented streaming data 306 is consumed. For example, a user device 202 which is presenting the augmented streaming data 306 may collect image information for a user which is viewing the augmented streaming data 306 via a front-facing camera installed upon the user device 202. In this example, the user’s facial data may be extracted from the image information and overlaid onto the rendered user model, so that the user model is given the user’s face and facial expression data.
It should be appreciated that the specific steps illustrated in FIG. 3 provide a particular method of presenting a data stream augmented with virtual fit data according to an embodiment of the present disclosure. As noted above, other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present disclosure may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 3 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 4 depicts an illustrative example of a technique for obtaining 3D models using sensor data in accordance with at least some embodiments. In accordance with at least some embodiments, sensor data 402 may be obtained from one or more input sensors installed upon a user device. The captured sensor data 402 includes image information 404 captured by a camera device as well as depth map information 406 captured by a depth sensor.
As stated above, the sensor data 402 may include image information 404. One or more image processing techniques may be used on image information 404 in order to identify one or more objects within that image information 404. For example, edge detection may be used to identify a section 408 within the image information 404 that includes an object. To do this, discontinuities in brightness, color, and/or texture may be identified across an image in order to detect edges of various objects within the image. Section 408 depicts an illustrative example image of a chair in which such discontinuities have been emphasized.
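As a concrete, non-limiting example of such edge detection, the snippet below runs the Canny detector from OpenCV on a synthetic grayscale image; the thresholds and the bright square standing in for an object are assumed values.

```python
import numpy as np
import cv2

# Synthetic grayscale image with a bright square standing in for an object.
image = np.zeros((200, 200), dtype=np.uint8)
image[60:140, 60:140] = 200

# Canny responds to discontinuities in brightness, which trace object outlines.
edges = cv2.Canny(image, 50, 150)

ys, xs = np.nonzero(edges)
print(f"edge pixels: {len(xs)}, bounding box: "
      f"x {xs.min()}-{xs.max()}, y {ys.min()}-{ys.max()}")
```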
As also stated above, the sensor data 402 may include depth information 406. In depth information 406, a value may be assigned to each pixel that represents a distance between the user device and a particular point corresponding to the location of that pixel. The depth information 406 may be analyzed to detect sudden variances in depth within the depth information 406. For example, sudden changes in distance may indicate an edge or a border of an object within the depth information 406.
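The same idea can be applied to the range image directly: a large jump in depth between neighboring pixels suggests an object border. A short sketch with assumed values follows.

```python
import numpy as np

# Synthetic range image: background at 3.0 m with an object at 1.2 m.
depth = np.full((200, 200), 3.0)
depth[60:140, 60:140] = 1.2

# Sudden variances in depth between neighboring pixels mark candidate borders.
dy, dx = np.gradient(depth)
border = np.hypot(dx, dy) > 0.5   # threshold in meters per pixel (assumed)
print("border pixels found:", int(border.sum()))
```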
In some embodiments, the sensor data 402 may include both image information 404 and depth information 406. In at least some of these embodiments, objects may first be identified in either the image information 404 or the depth information 406 and various attributes of the objects may be determined from the other information. For example, edge detection techniques may be used to identify a section of the image information 404 that includes an object 408. The section 408 may then be mapped to a corresponding section 410 in the depth information to determine depth information for the identified object (e.g., a point cloud) . In another example, a section 410 that includes an object may first be identified within the depth information 406. In this example, the section 410 may then be mapped to a corresponding section 408 in the image information to determine appearance attributes for the identified object (e.g., color or texture values) .
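Once a section has been identified in the image information, the corresponding depth pixels can be back-projected into a point cloud using standard pinhole-camera geometry. The intrinsics below (fx, fy, cx, cy) are assumed values, and the sketch assumes the depth map is registered to the color image so that the same pixel coordinates address both.

```python
import numpy as np

# Assumed pinhole intrinsics for the depth sensor (focal lengths and principal point).
fx = fy = 525.0
cx, cy = 100.0, 100.0

depth = np.full((200, 200), 3.0)
depth[60:140, 60:140] = 1.2          # object region, in meters

# Section identified in the image information (e.g., via edge detection),
# mapped to the corresponding rows/columns of the registered depth map.
x0, y0, x1, y1 = 60, 60, 140, 140
us, vs = np.meshgrid(np.arange(x0, x1), np.arange(y0, y1))
zs = depth[y0:y1, x0:x1]

# Standard pinhole back-projection of each depth pixel to a 3D point.
xs = (us - cx) * zs / fx
ys = (vs - cy) * zs / fy
point_cloud = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
print(point_cloud.shape)   # (6400, 3) points for the identified object
```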
In some embodiments, various attributes (e.g., color, texture, point cloud data, object edges) of an object identified in sensor data 402 may be used as input to a machine learning module in order to identify or generate a 3D model 412 that matches the identified object. In some embodiments, a point cloud for the object may be generated from the depth information and/or image information and compared to point cloud data stored in a database to identify a closest matching 3D model. Alternatively, a 3D model of an object (e.g., a user or a product) may be generated using the sensor data 402. To do this, a mesh may be created from point cloud data obtained from a section 410 of depth information 406. The system may then map appearance data from a section of image information 404 corresponding to section 410 to the mesh to generate a basic 3D model. Although particular techniques are described, it should be noted that there are a number of techniques for identifying particular objects from sensor output.
As described elsewhere, sensor data captured by a user device (e.g., user device 102 of FIG. 1) may be used to generate a 3D model of a user using the techniques described above. This 3D model of a user may then be provided to a mobile application server as user data. In some embodiments, sensor data may be used to generate a 3D model of a product, which may then be stored in an object model database 240. For example, a user wishing to sell a product may capture sensor data related to the product from the user’s device. The user’s user device may then generate a 3D model in the manner outlined above and may provide that 3D model to the mobile application server.
FIG. 5 depicts an example graphical user interface (GUI) demonstrating some example features that may be implemented in accordance with embodiments described herein. In FIG. 5, an example user device 502 is depicted as having a display screen upon which visual data may be presented. User device 502 is an example of user device 202 described with respect to FIG. 2 above.
As depicted in FIG. 5, a GUI of a software application (e.g., a media viewer application) installed upon user device 502 may be used to present streaming data 504. The streaming data 504 may include at least a presenter 506, which is a person depicted within streaming data 504, and a product 508. Product 508 may be worn or otherwise presented by presenter 506 within the streaming data 504.
As described elsewhere, a posed and fitted model 510 may be presented alongside the streaming data 504. For example, the model 510 may be presented within a separate window 512 in a location which minimizes any obstruction of a view of the streaming data 504, sometimes referred to as a picture-in-picture. The model 510 may include a user model representative of a current viewer of the streaming data 504 which has been posed in a manner similar to that of the presenter 506 and onto which a product model representative of product 508 has been fitted.
FIG. 6 illustrates a flow diagram depicting a process for presenting virtual fit data to a user in accordance with at least some embodiments. The process 600 depicted in FIG. 6 may be performed by a mobile application server (e.g., mobile application server 204 of FIG. 2) in communication with a user device (e.g., user device 202 of FIG. 2) .
At 602, the process 600 involves receiving an indication of media content being consumed by a user. For example, an indication may be received that a user is viewing a streaming video, which is a type of media content. The indicated media content may include a depiction of a presenter, which is a person different from the user. The indicated media content may also include a depiction of a product being presented by the presenter. For example, the product might be a clothing item worn by the presenter within the media content.
At 604, the process 600 involves identifying a product associated with the media content. In some embodiments, the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content. In one example, the identifier associated with the product is a stock keeping unit (SKU) number. In some embodiments, the product associated with the media content is identified via object recognition.
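A minimal sketch of this two-path identification, with metadata preferred and visual recognition as a fallback, is shown below. The metadata key product_sku and the recognize_product placeholder are assumptions; the disclosure does not prescribe a particular metadata schema or recognition model.

```python
from typing import Optional

def identify_product(metadata: dict, frame=None) -> Optional[str]:
    """Return a product identifier for the media content.

    Prefers a SKU carried in the stream's metadata; falls back to a visual
    recognizer when no identifier is attached.
    """
    sku = metadata.get("product_sku")
    if sku:
        return sku
    return recognize_product(frame)

def recognize_product(frame) -> Optional[str]:
    # Placeholder: a real implementation would match the frame against an
    # electronic catalog using an object-recognition model.
    return None

print(identify_product({"product_sku": "SHIRT-001", "title": "Spring look"}))  # "SHIRT-001"
print(identify_product({"title": "Spring look"}))                              # None (fallback)
```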
At 606, the process 600 involves obtaining a first 3D model representative of the product. To do this, a 3D model associated with the product identified at 604 is retrieved from a database having stored therein object model data (e.g., object model data 240 of FIG. 2) . In some embodiments, an appropriate size and/or style of the product may be selected based on information stored in relation to a user which is viewing the media content.
At 608, the process 600 involves obtaining a second 3D model representative of the user. In some embodiments, user models may be stored in relation to one or more accounts. In these embodiments, the second 3D model may be identified and retrieved by virtue of being stored in relation to an account which is being used to view the media content. In some embodiments, a second 3D model representative of the user may be received from a user device which is being used to view the media content.
At 610, the process 600 involves determining a presentation pose from the media content. The presentation pose is determined as a current pose of the presenter within the media content. This may be done using any suitable pose estimation technique. The determined presentation pose may include an indication of various parts (e.g., body parts) of a user model and their respective locations and orientations.
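As one non-limiting example of how a presentation pose could be represented, the following Python sketch computes a single joint angle from 2D keypoints such as those produced by an off-the-shelf pose estimator; the keypoint coordinates and the joint naming are invented for illustration, and the embodiments are not limited to this representation.

    # One way a presentation pose could be represented: per-joint angles
    # derived from 2D keypoints returned by any pose estimation technique.
    import numpy as np

    def joint_angle(a, b, c) -> float:
        """Angle (degrees) at joint b formed by segments b->a and b->c."""
        a, b, c = map(np.asarray, (a, b, c))
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    keypoints = {"shoulder": (0.50, 0.30), "elbow": (0.60, 0.45), "wrist": (0.55, 0.60)}
    presentation_pose = {"right_elbow": joint_angle(keypoints["shoulder"],
                                                    keypoints["elbow"],
                                                    keypoints["wrist"])}
    print(presentation_pose)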
At 612, the process 600 involves applying the presentation pose to the second 3D model. To do this, locations and orientations of various parts (e.g., body parts) of the second 3D model may be adjusted so that they match the corresponding locations and orientations in the presentation pose data.
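The following Python sketch illustrates, under simplified assumptions, how one body part of the second 3D model might be rigidly rotated about a joint so that its orientation matches a target angle from the presentation pose; production systems would typically apply skeletal skinning across the full mesh, which is not shown here.

    # Minimal sketch of step 612: rotate an isolated vertex group of the user
    # model about a joint position to match a pose angle (2D, z-axis rotation).
    import numpy as np

    def rotate_part(vertices: np.ndarray, joint: np.ndarray, angle_deg: float) -> np.ndarray:
        theta = np.radians(angle_deg)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        return (vertices - joint) @ rot.T + joint

    forearm = np.array([[0.60, 0.45], [0.58, 0.52], [0.55, 0.60]])
    posed_forearm = rotate_part(forearm, joint=np.array([0.60, 0.45]), angle_deg=-20.0)
    print(posed_forearm)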
At 614, the process 600 involves generating a third 3D model by fitting the first 3D model onto the second 3D model. This may involve deforming the first 3D model to minimize distances between the first 3D model and the second 3D model.
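As a rough, non-authoritative sketch of the distance-minimization idea in step 614, the following Python snippet pulls each vertex of a garment point set part-way toward its nearest neighbor on the posed body point set; the point sets and the single-pass update are illustrative simplifications, not the actual fitting algorithm of the embodiments.

    # Rough sketch of step 614: deform the product (garment) points toward the
    # posed user model by moving each vertex toward its nearest body vertex.
    import numpy as np

    def fit_garment(garment: np.ndarray, body: np.ndarray, step: float = 0.5) -> np.ndarray:
        fitted = garment.copy()
        for i, v in enumerate(garment):
            nearest = body[np.argmin(np.linalg.norm(body - v, axis=1))]
            fitted[i] = v + step * (nearest - v)  # move part-way toward the body surface
        return fitted

    garment_pts = np.array([[0.0, 1.2, 0.0], [0.1, 1.1, 0.2]])
    body_pts = np.array([[0.0, 1.0, 0.0], [0.1, 1.0, 0.1], [0.0, 0.9, 0.2]])
    print(fit_garment(garment_pts, body_pts))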
At 616, the process 600 involves presenting the third 3D model to the user. This involves rendering the third 3D model and providing the third 3D model to a user device which is presenting the media content. The third 3D model is caused to be presented alongside the media content. For example, the media content may be augmented to include the third 3D model (e.g., in a separate window within the media content) .
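The picture-in-picture presentation could, for example, be approximated by compositing an already-rendered image of the fitted model into a corner of each video frame, as in the following Python sketch; the frame dimensions and the direct pixel paste are assumptions for illustration, and an actual client might instead draw the overlay in a separate window of the media viewer.

    # Sketch of one presentation option for step 616: paste a rendered image of
    # the fitted model into the top-right corner of the incoming video frame.
    import numpy as np

    def overlay_render(frame: np.ndarray, render: np.ndarray, margin: int = 10) -> np.ndarray:
        out = frame.copy()
        h, w = render.shape[:2]
        out[margin:margin + h, -(w + margin):-margin] = render  # top-right corner
        return out

    frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # incoming video frame
    render = np.full((180, 120, 3), 255, dtype=np.uint8)    # rendered fitted model
    print(overlay_render(frame, render).shape)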
It should be appreciated that the specific steps illustrated in FIG. 6 provide a particular method of presenting virtual fit data to a user according to an embodiment of the present disclosure. As noted above, other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present disclosure may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 6 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 7 illustrates examples of components of a computer system 700 according to certain embodiments. The computer system 700 is an example of the computer system described herein above. Although these components are illustrated as belonging to the same computer system 700, the computer system 700 can also be distributed.
The computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712. The interface bus 712 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 700. The memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example FLASH™ memory, and other tangible storage media. Any of such computer-readable storage media can be configured to store instructions or program code embodying aspects of the disclosure. The memory 704 and the storage device 706 also include computer-readable signal media. A computer-readable signal medium includes a propagated data signal with computer-readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer-readable signal medium includes any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
Further, the memory 704 includes an operating system, programs, and applications. The processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712. The communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
FIG. 8 illustrates a block diagram depicting an apparatus for presenting virtual fit data to a user in accordance with at least some embodiments. The apparatus 800 depicted in FIG. 8 may be implemented as a mobile application server (e.g., mobile application server 204 of FIG. 2) in communication with a user device (e.g., user device 202 of FIG. 2) .
The apparatus 800 may include a receiving module 802 configured to receive an indication of media content being consumed by a user. For example, an indication may be received that a user is viewing a streaming video, which is a type of media content. The indicated media content may include a depiction of a presenter, which is a person different from the user. The indicated media content may also include a depiction of a product being presented by the presenter. For example, the product might be a clothing item worn by the presenter within the media content.
The apparatus 800 may further include an identifying module 804 configured to identify a product associated with the media content. In some embodiments, the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content. In one example, the identifier associated with the product is a stock keeping unit (SKU) number. In some embodiments, the product associated with the media content is identified via object recognition.
The apparatus 800 may further include an obtaining module 806 configured to obtain a first 3D model representative of the product. To do this, a 3D model associated with the product identified by the identifying module 804 is retrieved from a database having stored therein object model data (e.g., object model data 240 of FIG. 2). In some embodiments, an appropriate size and/or style of the product may be selected based on information stored in relation to the user who is viewing the media content.
The obtaining module 806 may be further configured to obtain a second 3D model representative of the user. In some embodiments, user models may be stored in relation to one or more accounts. In these embodiments, the second 3D model may be identified and retrieved by virtue of being stored in relation to an account which is being used to view the media content. In some embodiments, a second 3D model representative of the user may be received from a user device which is being used to view the media content.
The apparatus 800 may further include a determining module 808 configured to determine a presentation pose from the media content. The presentation pose is determined as a current pose of the presenter within the media content. This may be done using any suitable pose estimation technique. The determined presentation pose may include an indication of various parts (e.g., body parts) of a user model and their respective locations and orientations.
The apparatus 800 may further include an applying module 810 configured to apply the presentation pose to the second 3D model. To do this, locations and orientations of various parts (e.g., body parts) of the second 3D model may be adjusted so that they match the corresponding locations and orientations in the presentation pose data.
The apparatus 800 may further include a generating module 812 configured to generate a third 3D model by fitting the first 3D model onto the second 3D model. This may involve deforming the first 3D model to minimize distances between the first 3D model and the second 3D model.
The apparatus 800 may further include a presenting module 814 configured to present the third 3D model to the user. This involves rendering the third 3D model and providing the third 3D model to a user device which is presenting the media content. The third 3D model is caused to be presented alongside the media content. For example, the media content may be augmented to include the third 3D model (e.g., in a separate window within the media content) .
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject  matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular example.
The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.

Claims (28)

  1. A method comprising:
    receiving an indication of media content being viewed by a user;
    identifying a product associated with the media content;
    obtaining a first 3D model representative of the product;
    obtaining a second 3D model representative of the user;
    determining a presentation pose from the media content;
    applying the presentation pose to the second 3D model;
    generating a third 3D model by fitting the second 3D model with the first 3D model; and
    presenting the third 3D model to the user in the presentation pose.
  2. The method of claim 1 wherein the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  3. The method of claim 2 wherein the identifier associated with the product is a stock keeping unit (SKU) number.
  4. The method of claim 1 wherein the first 3D model representative of the product is obtained from a catalog of 3D models.
  5. The method of claim 1 wherein the media content comprises a streaming video.
  6. The method of claim 1 wherein the product is a clothing item worn by a presenter within the media content, and wherein the presentation pose comprises a pose of the presenter.
  7. The method of claim 6 wherein the presenter comprises a second user different from the user.
  8. A system comprising:
    a processor; and
    a memory including instructions that, when executed with the processor, cause the system to, at least:
    receive an indication of media content being viewed by a user;
    identify a product associated with the media content;
    obtain a first 3D model representative of the product;
    obtain a second 3D model representative of the user;
    determine a presentation pose from the media content;
    apply the presentation pose to the second 3D model;
    generate a third 3D model by fitting the second 3D model with the first 3D model; and
    present the third 3D model to the user in the presentation pose.
  9. The system of claim 8 wherein the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  10. The system of claim 9 wherein the identifier associated with the product is a stock keeping unit (SKU) number.
  11. The system of claim 8 wherein the first 3D model representative of the product is obtained from a catalog of 3D models.
  12. The system of claim 8 wherein the product is a clothing item worn by a presenter within the media content, and wherein the presentation pose comprises a pose of the presenter.
  13. The system of claim 12 wherein the presenter comprises a second user different from the user.
  14. The system of claim 8 wherein the media content comprises a streaming video.
  15. A non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least:
    receive an indication of media content being viewed by a user;
    identify a product associated with the media content;
    obtain a first 3D model representative of the product;
    obtain a second 3D model representative of the user;
    determine a presentation pose from the media content;
    apply the presentation pose to the second 3D model;
    generate a third 3D model by fitting the second 3D model with the first 3D model; and
    present the third 3D model to the user in the presentation pose.
  16. The non-transitory computer readable medium of claim 15 wherein the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  17. The non-transitory computer readable medium of claim 16 wherein the identifier associated with the product is a stock keeping unit (SKU) number.
  18. The non-transitory computer readable medium of claim 15 wherein the first 3D model representative of the product is obtained from a catalog of 3D models.
  19. The non-transitory computer readable medium of claim 15 wherein the product is a clothing item worn by a presenter within the media content, and wherein the presentation pose comprises a pose of the presenter.
  20. The non-transitory computer readable medium of claim 15 wherein the media content comprises a streaming video.
  21. An apparatus comprising:
    a receiving module, configured to receive an indication of media content being viewed by a user;
    an identifying module, configured to identify a product associated with the media content;
    an obtaining module, configured to obtain a first 3D model representative of the product, and obtain a second 3D model representative of the user;
    a determining module, configured to determine a presentation pose from the media content;
    an applying module, configured to apply the presentation pose to the second 3D model;
    a generating module, configured to generate a third 3D model by fitting the second 3D model with the first 3D model; and
    a presenting module, configured to present the third 3D model to the user in the presentation pose.
  22. The apparatus of claim 21 wherein the product associated with the media content is identified via an identifier associated with the product included within metadata for the media content.
  23. The apparatus of claim 22 wherein the identifier associated with the product is a stock keeping unit (SKU) number.
  24. The apparatus of claim 21 wherein the first 3D model representative of the product is obtained from a catalog of 3D models.
  25. The apparatus of claim 21 wherein the media content comprises a streaming video.
  26. The apparatus of claim 21 wherein the product is a clothing item worn by a presenter within the media content, and wherein the presentation pose comprises a pose of the presenter.
  27. The apparatus of claim 26 wherein the presenter comprises a second user different from the user.
  28. A computer program, wherein the computer program, when executed by a processor, causes the processor to execute the method of any of claims 1 to 7.
PCT/CN2021/078259 2020-03-10 2021-02-26 System and method for virtual fitting during live streaming WO2021179919A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180014731.XA CN115104319A (en) 2020-03-10 2021-02-26 System and method for virtual fitting during live broadcast

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062987474P 2020-03-10 2020-03-10
US62/987,474 2020-03-10

Publications (1)

Publication Number Publication Date
WO2021179919A1 true WO2021179919A1 (en) 2021-09-16

Family

ID=77671217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078259 WO2021179919A1 (en) 2020-03-10 2021-02-26 System and method for virtual fitting during live streaming

Country Status (2)

Country Link
CN (1) CN115104319A (en)
WO (1) WO2021179919A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030110507A1 (en) * 2001-12-11 2003-06-12 Koninklijke Philips Electronics N.V. System for and method of shopping through television
US20060230123A1 (en) * 2005-04-07 2006-10-12 Simmons Bonnie J Online Shopping and Item Selection System and Method
CN102156810A (en) * 2011-03-30 2011-08-17 北京触角科技有限公司 Augmented reality real-time virtual fitting system and method thereof
CN103533449A (en) * 2012-12-20 2014-01-22 Tcl集团股份有限公司 Method and system for realizing three-dimensional fitting based on intelligent three-dimensional television
CN103678836A (en) * 2012-08-30 2014-03-26 北京三星通信技术研究有限公司 Virtual fit system and method
CN109963201A (en) * 2017-12-26 2019-07-02 深圳Tcl新技术有限公司 A kind of real-time shopping method, system, intelligent TV network and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104902345A (en) * 2015-05-26 2015-09-09 多维新创(北京)技术有限公司 Method and system for realizing interactive advertising and marketing of products
CN105872839B (en) * 2016-05-05 2018-05-15 北京京东尚科信息技术有限公司 Video sharing implementation method and system
CN109598541A (en) * 2018-11-19 2019-04-09 中信国安广视网络有限公司 Large-size screen monitors advertisement recommended method based on video identification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030110507A1 (en) * 2001-12-11 2003-06-12 Koninklijke Philips Electronics N.V. System for and method of shopping through television
US20060230123A1 (en) * 2005-04-07 2006-10-12 Simmons Bonnie J Online Shopping and Item Selection System and Method
CN102156810A (en) * 2011-03-30 2011-08-17 北京触角科技有限公司 Augmented reality real-time virtual fitting system and method thereof
CN103678836A (en) * 2012-08-30 2014-03-26 北京三星通信技术研究有限公司 Virtual fit system and method
CN103533449A (en) * 2012-12-20 2014-01-22 Tcl集团股份有限公司 Method and system for realizing three-dimensional fitting based on intelligent three-dimensional television
CN109963201A (en) * 2017-12-26 2019-07-02 深圳Tcl新技术有限公司 A kind of real-time shopping method, system, intelligent TV network and storage medium

Also Published As

Publication number Publication date
CN115104319A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
US10586395B2 (en) Remote object detection and local tracking using visual odometry
US10733801B2 (en) Markerless image analysis for augmented reality
US10572720B2 (en) Virtual reality-based apparatus and method to generate a three dimensional (3D) human face model using image and depth data
US11544884B2 (en) Virtual clothing try-on
CN111787242B (en) Method and apparatus for virtual fitting
CN109743626B (en) Image display method, image processing method and related equipment
CN104823152B (en) Augmented reality is realized using Eye-controlling focus
CN110716645A (en) Augmented reality data presentation method and device, electronic equipment and storage medium
US20230419537A1 (en) Computerized system and method for providing a mobile augmented reality item display and selection experience
EP3384458A1 (en) Automatic-guided image capturing and presentation
US10339597B1 (en) Systems and methods for virtual body measurements and modeling apparel
CN107609946B (en) Display control method and computing device
JP2016514865A (en) Real-world analysis visualization
CN115917600A (en) Texture-based gesture verification
US10147240B2 (en) Product image processing method, and apparatus and system thereof
CN112783700A (en) Computer readable medium for network-based remote assistance system
US11562548B2 (en) True size eyewear in real time
JP2016105279A (en) Device and method for processing visual data, and related computer program product
CN111340865B (en) Method and apparatus for generating image
WO2019134501A1 (en) Method and device for simulating fit of garment on user, storage medium, and mobile terminal
US20180150957A1 (en) Multi-spectrum segmentation for computer vision
WO2015172229A1 (en) Virtual mirror systems and methods
WO2021179919A1 (en) System and method for virtual fitting during live streaming
US11127218B2 (en) Method and apparatus for creating augmented reality content
WO2021179936A9 (en) System and method for virtual fitting

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21768171

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21768171

Country of ref document: EP

Kind code of ref document: A1