CN117876579A - Platform for enabling multiple users to generate and use neural radiance field models

Info

Publication number: CN117876579A
Application number: CN202311736189.6A
Authority: CN (China)
Prior art keywords: user, radiance field, view, images, objects
Legal status: Pending
Other languages: Chinese (zh)
Inventors: I·博纳奇, A·萨德尔
Current assignee: Google LLC
Original assignee: Google LLC
Priority claimed from US 18/169,425 (published as US20240202987A1)
Application filed by Google LLC

Classification

  • Processing or Creating Images

Abstract

Systems and methods for enabling a user to generate and utilize a neural radiance field model may include obtaining user image data and training one or more neural radiance field models based on the user image data. The systems and methods may include obtaining a user image based on a determination that the user image depicts an object of a particular object type. The trained neural radiance field model may then be used for view synthesis image generation of the particular user object.

Description

Platform for enabling multiple users to generate and use neural radiance field models
Technical Field
The present disclosure relates generally to a platform for enabling multiple users to generate and use neural radiance field models to generate virtual representations of user objects. More particularly, the present disclosure relates to obtaining user imagery and training one or more neural radiance field models to generate one or more novel view synthesis images of one or more objects depicted in the user imagery.
Background
Users may not have access to three-dimensional modeling, object segmentation, and novel view rendering. Such features may facilitate searching, visualizing rearranged environments, understanding objects, and comparing objects without having to physically bring the objects side-by-side. Previous techniques for virtually viewing objects have relied heavily on photography and/or large amounts of data, which may include video. A photograph includes a two-dimensional view from a single viewpoint and/or a limited number of viewpoints. Video is similarly limited to explicitly captured data. Due to time costs and/or a lack of familiarity with modeling programs, a user may not be able to use three-dimensional modeling techniques.
In addition, a picture may provide only a limited amount of information to the user. An object's size and its compatibility with a new environment may be difficult to judge from an image. For example, a user may wish to rearrange their rooms; however, physically rearranging the rooms merely to view the possibilities may be cumbersome. A user working from images must depend largely on imagination to understand size, lighting, and orientation.
Disclosure of Invention
Aspects and advantages of embodiments of the disclosure will be set forth in part in the description which follows, or may be learned from the description, or may be learned by practice of the embodiments.
One example aspect of the present disclosure relates to a computing system. The system may include one or more processors and one or more non-transitory computer-readable media collectively storing instructions that, when executed by the one or more processors, cause the computing system to perform operations. The operations may include obtaining user image data and request data. The user image data may describe one or more images including one or more user objects. The one or more images may have been generated using a user computing device. The operations may include training one or more neural radiance field models based on the user image data. The one or more neural radiance field models may be trained to generate view syntheses of the one or more objects. The operations may include generating one or more view composite images using the one or more neural radiance field models based on the request data. In some implementations, the one or more view composite images can include one or more renderings of the one or more objects.
Another example aspect of the present disclosure relates to a computer-implemented method for virtual closet generation. The method may include obtaining, by a computing system including one or more processors, a plurality of user images. Each of the plurality of user images may include one or more pieces of clothing. In some embodiments, the plurality of user images may be associated with a plurality of different pieces of clothing. The method may include training, by the computing system, a respective neural radiance field model for each respective garment of the plurality of different garments. Each respective neural radiance field model may be trained to generate one or more view-synthesis renderings of the particular respective piece of clothing. The method may include storing, by the computing system, each respective neural radiance field model in a collection database. The method may include providing, by the computing system, a virtual closet interface. The virtual closet interface may provide a plurality of garment view composite renderings for display based on the plurality of respective neural radiance field models. The plurality of garment view composite renderings may be associated with at least a subset of the plurality of different garments.
Another example aspect of the disclosure relates to one or more non-transitory computer-readable media collectively storing instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations. The operations may include obtaining a plurality of user image datasets. Each of the plurality of user image datasets may describe one or more images including one or more objects. In some implementations, the one or more images may have been generated using a user computing device. The operations may include processing the plurality of user image datasets with one or more classification models to determine a subset of the plurality of user image datasets that includes features describing one or more particular object types. The operations may include training a plurality of neural radiance field models based on the subset of the plurality of user image datasets. In some implementations, each respective neural radiance field model may be trained to generate a view synthesis of one or more particular objects of a respective user image dataset of the subset of the plurality of user image datasets. The operations may include generating a plurality of view synthesis renderings using the plurality of neural radiance field models. The plurality of view synthesis renderings may describe a plurality of different objects of a particular object type. The operations may include providing a user interface for viewing the plurality of view synthesis renderings.
The systems and methods may be used to learn a three-dimensional representation of a user object, which may then be used to generate a virtual catalog of user objects. Additionally and/or alternatively, the systems and methods may be used to compare user objects and/or other objects. The comparison may be aided by view-synthesis renderings that render different objects with uniform lighting and/or uniform poses. For example, images of different objects may depict the objects under different illumination, at different locations, and/or at different distances. The systems and methods disclosed herein may be used to learn a three-dimensional representation of an object and may generate view-synthesis renderings of different objects with uniform lighting, uniform poses, and/or uniform scaling.
Other aspects of the disclosure relate to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.
These and other features, aspects, and advantages of various embodiments of the present disclosure will become better understood with reference to the following description and appended claims. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate example embodiments of the disclosure and, together with the description, serve to explain the related principles.
Drawings
A detailed discussion of one embodiment directed to one of ordinary skill in the art is set forth in the specification in connection with the accompanying drawings, in which:
FIG. 1 depicts a block diagram of an example view composite image generation system, according to an example embodiment of the present disclosure.
FIG. 2 depicts a block diagram of example virtual object set generation, according to an example embodiment of the present disclosure.
FIG. 3 depicts a flowchart of an example method for performing view synthesis image generation, according to an example embodiment of the present disclosure.
FIG. 4 depicts a block diagram of example view composite image generation, according to an example embodiment of the present disclosure.
FIG. 5 depicts a block diagram of example neural radiance field model training and utilization, according to an example embodiment of the present disclosure.
FIG. 6 depicts a diagram of an example virtual closet interface, according to an example embodiment of the present disclosure.
FIG. 7 depicts a flowchart of an example method for performing virtual closet generation, according to an example embodiment of the present disclosure.
FIG. 8 depicts a flowchart of an example method for performing virtual object user interface generation and display, according to an example embodiment of the present disclosure.
FIG. 9A depicts a block diagram of an example computing system performing view synthesis image generation, according to an example embodiment of the disclosure.
FIG. 9B depicts a block diagram of an example computing device performing view synthesis image generation, according to an example embodiment of the disclosure.
FIG. 9C depicts a block diagram of an example computing device performing view synthesis image generation, according to an example embodiment of the disclosure.
FIG. 10 depicts a block diagram of example neural radiance field model training, according to an example embodiment of the present disclosure.
FIG. 11 depicts a block diagram of an example augmented environment generation system, according to an example embodiment of the present disclosure.
FIG. 12 depicts a block diagram of example virtual environment generation, according to an example embodiment of the present disclosure.
FIG. 13 depicts a block diagram of example augmented image data generation, according to an example embodiment of the present disclosure.
Repeated reference characters across the several figures are intended to identify identical features in the various embodiments.
Detailed Description
In general, the present disclosure relates to systems and methods for providing a platform for a user to train and/or utilize a neural radiance field model for user object rendering. In particular, the systems and methods disclosed herein may leverage one or more neural radiance field models and user imagery to learn a three-dimensional representation of a user object. The trained neural radiance field model may allow users to rearrange their environment via augmented reality, which may circumvent the physically burdensome process of manually rearranging a room only to move objects back if the appearance is not what the user desired. Additionally and/or alternatively, the trained neural radiance field model may be used to generate a virtual catalog (e.g., a virtual closet) of user objects. The virtual catalog may include unified-pose and/or unified-lighting renderings, which may allow for uniform depiction of user objects (e.g., for object comparison). Additionally and/or alternatively, novel view synthesis image generation may be used to view various objects from various locations and directions without physically traversing the environment.
The trained neural radiance field models may enable users to create their own geometry-aware virtual try-on experiences. Utilization at an individual level may allow each individual user and/or collection of users to train neural radiance fields that are personalized for their objects and/or the objects in their environment. Personalization may enable virtual rearrangement, virtual comparison, and/or geometry-aware (and location-aware) visualization anywhere. Additionally and/or alternatively, the systems and methods may include view-synthesis rendering of different objects with uniform lighting, uniform poses, uniform locations, and/or uniform scaling to provide a platform for comparing objects with context-aware rendering.
With previous techniques, users have traditionally not had access to three-dimensional modeling, object segmentation, and novel view rendering. Such features may facilitate searching, visualizing rearranged environments, understanding objects, and comparing objects without having to physically bring the objects side-by-side.
The systems and methods disclosed herein may provide a platform for making neural radiance field (NeRF) technology available to users, allowing users to create, store, share, and view high quality 3D content broadly. The systems and methods may facilitate remodeling, outfit design, object comparison, and/or catalog generation (e.g., merchants may build high quality 3D content for products and add the content to their websites).
In some implementations, the systems and methods disclosed herein may be used to generate a three-dimensional model of a user object and render composite images of the user object. The systems and methods may learn the three-dimensional model of an object from a set of the user's photographs. The learned three-dimensional model may be used to render particular combinations of objects and environments, which may be controlled by the user and/or may be controlled based on context such as the user's search history. Additionally and/or alternatively, the user may manipulate the rendering (e.g., "scroll"). For example, novel view synthesis using a trained neural radiance field model may allow a user to view objects from different views without physically traversing the environment.
A platform for enabling users to generate and utilize neural radiance field models may allow users to visualize "their" objects alongside other objects and/or in other environments. Additionally and/or alternatively, a particular combination of objects and/or environments may be a function of unique inputs such as search history and of object characteristics such as availability, price, and the like, which may provide context awareness and a user-specific experience.
Additionally and/or alternatively, the systems and methods disclosed herein may include a platform that provides an interface for a user to train a neural radiance field model on user-generated content (e.g., images) provided by the user. The trained neural radiance field model may be used to generate virtual representations of user objects that may be added to a user collection, which may be used for organization, comparison, sharing, and the like. The user collection may include a virtual closet, a virtual furniture catalog, a virtual collectibles catalog (e.g., a virtual representation of a physical collection (e.g., a bobblehead collection) that the user may generate), and/or a virtual trophy collection. The platform may enable a user to generate photorealistic view renderings from a plurality of different perspectives, which may be accessed and displayed even when the user is not in proximity to the physical object. The platform may include sharing among users, which may be used for social media, marketing, and/or messaging.
The systems and methods of the present disclosure provide a number of technical effects and benefits. As one example, the systems and methods may learn a three-dimensional representation of a user object based on user images to provide view-synthesis renderings of the user object. In particular, images captured with a user computing device may be processed to classify and/or segment objects. A three-dimensional modeled representation may then be learned for one or more objects in the user images by training a neural radiance field model. The trained neural radiance field model may then be used for augmented reality rendering, novel view synthesis, and/or instance interpolation.
Another technical benefit of the systems and methods of the present disclosure is the ability to provide a virtual catalog of user objects via one or more view composite images. For example, multiple neural radiance field models may be utilized to generate multiple view composite renderings of multiple user objects. The view-synthesis renderings may be generated and/or enhanced based on unified lighting, unified scaling, and/or unified poses. The multiple view renderings may be provided via a user interface to allow users to easily view their objects from their phones or other computing devices.
Another example of technical effects and benefits relates to improved computing efficiency and improvements in the functioning of computing systems. For example, the systems and methods disclosed herein may reduce the computational cost of searching online for images of objects by instead relying on user images, and may ensure that the correct object is being modeled.
Referring now to the drawings, example embodiments of the present disclosure will be discussed in more detail.
FIG. 1 depicts a block diagram of an example view composite image generation system 10, according to an example embodiment of the present disclosure. In particular, the view composite image generation system 10 may obtain user image data 14 and/or request data 18 from a user 12 (e.g., from a user computing system). The user image data 14 and/or the request data 18 may be obtained in response to a timed event, one or more user inputs, an application download and profile settings, and/or a trigger event determination. The user image data 14 and/or the request data 18 may be obtained via one or more interactions with a platform (e.g., a web platform). In some implementations, an application programming interface associated with the platform may obtain and/or generate the user image data 14 and/or the request data 18 in response to one or more inputs. The user 12 may be an individual, a retailer, a manufacturer, a service provider, and/or another entity.
The user image data 14 may be used to generate a three-dimensional model 16 of a user object depicted in the user image data 14. Generating the three-dimensional model 16 may include learning a three-dimensional representation of the corresponding object by training one or more neural radiance field models on the user image data 14.
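As a concrete (non-limiting) illustration of the kind of model that can be trained here, the sketch below shows a minimal neural radiance field-style network in PyTorch that maps a 3D position and a viewing direction to a volume density and an RGB color. The architecture, layer sizes, and names are assumptions made for illustration; the disclosure does not prescribe a specific network.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Minimal radiance-field sketch: (position, view direction) -> (density, color)."""

    def __init__(self, pos_dim=3, dir_dim=3, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)            # volume density (sigma)
        self.color_head = nn.Sequential(                    # view-dependent RGB
            nn.Linear(hidden + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        feat = self.trunk(xyz)
        sigma = torch.relu(self.density_head(feat))         # keep density non-negative
        rgb = self.color_head(torch.cat([feat, view_dir], dim=-1))
        return sigma, rgb
```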
The rendering block 20 may process the request data 18 and may render one or more view composite images 22 of the object using the generated three-dimensional model. The request data 18 may describe an explicit user request to generate a view-synthesis rendering (e.g., an augmented reality rendering) in the user's environment and/or a user request to render one or more objects in combination with one or more additional objects or features. The request data 18 may describe contexts and/or parameters (e.g., lighting, size of environmental objects, time of day, location and orientation of other objects in the environment, and/or other contexts associated with generation) that may affect how the objects are rendered. The request data 18 may be generated and/or obtained in response to a user's context.
The view composite image 22 of the object may be provided via a viewfinder, a still image, a catalog user interface, and/or a virtual reality experience. The generated view composite image 22 may be stored locally and/or on a server in association with the user profile. In some implementations, the view composite image 22 of the object can be stored by the platform via one or more server computing systems associated with the platform. Additionally and/or alternatively, the view composite image 22 of the object may be provided for display and/or interaction via a user interface associated with the platform. The user may add the view composite image 22 of the object to one or more collections associated with the user, which may then be viewed collectively via a collection user interface.
FIG. 2 depicts a block diagram of example virtual object set generation 200, according to an example embodiment of the disclosure. In particular, virtual object set generation 200 may include obtaining user image data 214 (e.g., images from a user-specific image library, which may be stored locally and/or on a server computing system) and/or request data 218 (e.g., manual requests, context-based requests, and/or application-initiated requests) from a user 212 (e.g., from a user computing system, which may include a user computing device (e.g., a mobile computing device)). The user image data 214 and/or the request data 218 may be obtained in response to a timed event (e.g., a given interval for updating a virtual object catalog based on a user-specific image database), one or more user inputs (e.g., one or more inputs to a user interface for learning a three-dimensional representation of a specific object in an environment and/or one or more user inputs for importing or exporting images from user-specific storage), an application download and profile settings (e.g., a virtual closet application and/or an object modeling application), and/or a trigger event determination (e.g., a user location, search query acquisition, and/or a knowledge-based trigger event). The user 212 may be an individual, a retailer, a manufacturer, a service provider, and/or another entity.
The user image data 214 may include images generated and/or obtained by the user 212 (e.g., via an image capture device (e.g., a camera of a mobile computing device)). Alternatively and/or additionally, the user image data 214 may include user-selected data. For example, the user-selected data may include one or more images and/or image datasets selected by a user via one or more user inputs. The user-selected data may include images from web pages, image data published on a social media platform, images in a user's "camera roll," image data stored locally in an image folder, and/or data stored in one or more other databases. The user-selected data may be selected via one or more user inputs, which may include gesture inputs, tap inputs, cursor inputs, text inputs, and/or any other form of input. The user image data 214 may be stored locally and/or on one or more server computing systems. The user image data 214 may be specifically associated with a particular user and/or may be shared data (e.g., shared among a defined group and/or shared via a network and/or web page) selected by the user to generate a virtual object, which may then be stored in a collection and/or provided for display. The user image data 214 may include automatically selected image data. The automatic selection may be based on one or more object detections, one or more object classifications, and/or one or more image classifications. For example, a plurality of image datasets may be processed to determine a subset of the image datasets that includes image data describing one or more objects of a particular object type. The subset may then be selected for processing.
The user image data 214 may be used to generate a three-dimensional model 216 of a user object depicted in the user image data 214 (e.g., by training parameters of a neural radiance field model to learn a three-dimensional representation of color values and density values of the user object). Generating the three-dimensional model 216 may include learning a three-dimensional representation of the corresponding object by training one or more neural radiance field models on the user image data 214.
The rendering block 220 (e.g., one or more layers for prompting one or more neural radiance field models and/or one or more application programming interfaces for obtaining and/or utilizing the neural radiance field models) may process the request data 218 and may utilize the generated three-dimensional model to render one or more view composite images 222 of the object. The request data 218 may describe an explicit user request to generate a view-synthesis rendering (e.g., an augmented reality rendering) in the user's environment and/or a user request to render one or more objects in conjunction with one or more additional objects or features. The request data 218 may describe a context and/or parameters (e.g., lighting, size of the environmental objects, time of day, location and orientation of other objects in the environment, and/or other contexts associated with generation) that may affect how the object is rendered. The request data 218 may be generated and/or obtained in response to the user's context.
The view composite image 222 of the object may be provided via a viewfinder, a still image, a directory user interface, and/or via a virtual reality experience. The generated view composite image 222 may be stored locally and/or on a server in association with a user profile.
In some implementations, the view composite image 222 may be rendered by the rendering block 220 based on one or more unified parameters 224. The unified parameters 224 may include a unified pose (e.g., facing a particular direction (e.g., forward-facing)), a unified location (e.g., the object centered in the image), unified lighting (e.g., no shadows, front lighting, natural lighting, etc.), and/or a unified scale (e.g., rendered objects may be scaled to a shared scale such that the renderings have a uniform one-inch-to-two-pixel scale). The unified parameters 224 may be used to provide coherent renderings of the objects, which may provide a more informed and/or more coherent comparison of the objects.
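One possible way to express such unified parameters in code is sketched below; the field names, default values, and the renderer callback are illustrative assumptions rather than the disclosure's own interface.

```python
from dataclasses import dataclass

@dataclass
class UnifiedRenderParams:
    """Shared parameters so every object is rendered comparably (assumed fields)."""
    azimuth_deg: float = 0.0                    # unified forward-facing pose
    elevation_deg: float = 10.0
    pixels_per_inch: float = 2.0                # unified scale, e.g. 1 inch -> 2 pixels
    light_direction: tuple = (0.0, 0.0, 1.0)    # unified frontal lighting
    background_rgb: tuple = (1.0, 1.0, 1.0)

def render_catalog(models, render_fn, params=None):
    """Render every object's radiance field with the same unified parameters.

    `models` maps object ids to trained models; `render_fn(model, params)` is an
    assumed renderer that honors the shared pose, lighting, and scale.
    """
    params = params or UnifiedRenderParams()
    return {object_id: render_fn(model, params) for object_id, model in models.items()}
```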
Additionally and/or alternatively, the one or more view composite images 222 may be added to a catalog 226. For example, one or more view composite images 222 may depict one or more clothing objects and may be added to a virtual closet catalog. The virtual closet catalog may include a plurality of user clothing renderings, which may be used for outfit planning, for clothing shopping, and/or for clothing comparison. The catalog 226 may be a user-specific catalog, a product database for retailers and/or manufacturers, and/or a group-specific catalog for group sharing. In some implementations, the generated catalog can be processed to determine object suggestions for users and/or groups of users. For example, user preferences, styles, and/or deficiencies may be determined based on the depictions in a user-specific catalog. The wear level of clothing, the color palette, the style, the quantity of a particular object type, and/or the set of objects may be determined and utilized to determine suggestions to be provided to the user. The systems and methods may provide for specific object suggestions based on existing objects having a high level of wear. Additionally and/or alternatively, a style of the user may be determined, and other objects of that style may be suggested.
FIG. 3 depicts a flowchart of an example method for performing view synthesis image generation according to an example embodiment of the present disclosure. Although FIG. 3 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particular order or arrangement shown. The various steps of method 300 may be omitted, rearranged, combined, and/or adjusted in various ways without departing from the scope of the present disclosure.
At 302, a computing system may obtain user image data and request data. The user image data may describe one or more images including one or more user objects. One or more images may have been generated using a user computing device. Alternatively and/or additionally, the user image data may include user-selected data (e.g., one or more images obtained from a web page and/or web platform for generating virtual objects). In some implementations, the request data may describe a request to generate an object type specific set. The request data may be associated with a context. In some implementations, the context can describe at least one of an object context or an environmental context.
At 304, the computing system may train one or more neural radiance field models based on the user image data. The one or more neural radiance field models may be trained to generate view syntheses of the one or more objects. The one or more neural radiance field models may be configured to predict color values and/or density values associated with the object in order to generate view synthesis renderings, which may include renderings of the object from a novel view not depicted in the user image data.
In some implementations, the computing system may process the user image data to determine that the one or more objects are of a particular object type and store the one or more neural radiance field models in a collection database. The collection database may be associated with a collection that is object type specific. The particular object type may be associated with one or more pieces of clothing.
At 306, the computing system may generate one or more view composite images using one or more neural radiation field models based on the request data. The one or more view composite images may include one or more renderings of one or more objects.
In some embodiments, the one or more view synthesis images may be generated by processing a position and a view direction with the one or more neural radiance field models to generate one or more predicted density values and one or more color values, and generating the one or more view synthesis images based on the one or more predicted density values and the one or more color values.
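The sketch below illustrates the standard volume-rendering step implied by this description: densities and colors predicted at samples along a camera ray are alpha-composited into a single pixel color. It assumes the `TinyNeRF` interface sketched earlier and fixed uniform sampling along the ray; both are illustrative choices, not requirements of the disclosure.

```python
import torch

def render_ray(model, origin, direction, near=0.1, far=4.0, n_samples=64):
    """Composite predicted (density, color) samples along one ray into an RGB value."""
    t = torch.linspace(near, far, n_samples)                     # sample depths along the ray
    points = origin + t[:, None] * direction                     # (n_samples, 3) positions
    dirs = direction.expand(n_samples, 3)                        # same view direction per sample
    sigma, rgb = model(points, dirs)                             # predicted density and color
    delta = torch.full((n_samples,), (far - near) / n_samples)   # distance between samples
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)          # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)     # accumulated transmittance
    weights = alpha * trans                                      # compositing weights
    return (weights[:, None] * rgb).sum(dim=0)                   # final pixel color
```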
In some implementations, the request data may describe one or more adjustment settings. Generating the one or more view composite images using the one or more neural radiance field models based on the request data may include adjusting one or more color values of a set of predicted values generated by the one or more neural radiance field models.
Additionally and/or alternatively, the request data may describe a particular location and a particular view direction. Generating the one or more view composite images using the one or more neural radiance field models based on the request data may include processing the particular location and the particular view direction with the one or more neural radiance field models to generate a view rendering of the one or more objects depicting a view associated with the particular location and the particular view direction.
In some implementations, the computing system may provide the one or more view composite images to the user computing system for display. For example, the view synthesis renderings may be provided for display via one or more user interfaces, which may include a grid view, a carousel view, a thumbnail view, and/or an expanded view.
Additionally and/or alternatively, the computing system may provide a virtual object user interface to the user computing system. The virtual object user interface may provide one or more view composite images for display. One or more objects may be isolated from the original environment depicted in the user image data.
In some implementations, the computing system may obtain a plurality of additional user image datasets. Each of the plurality of additional user image datasets may have been generated using the user computing device. The computing system may process each of the plurality of additional user image datasets using one or more object determination models to determine that a subset of the plurality of additional user image datasets includes a corresponding object of a particular object type. The computing system may train a respective additional neural radiance field model for each respective additional user image dataset of the subset of the plurality of additional user image datasets and store each respective additional neural radiance field model in the collection database.
FIG. 4 depicts a block diagram of example view composite image generation 400, according to an example embodiment of the present disclosure. View composite image generation 400 may include obtaining user-specific imagery of a user object (e.g., one or more images of the object captured by a user computing device) 402. In some embodiments, a specific user interface may be utilized to instruct a user on how to capture images for training a neural radiance field model. Additionally and/or alternatively, a user interface and/or one or more applications may be used to capture an array of images of the object for training one or more neural radiance field models to generate view renderings of the one or more objects. Alternatively and/or additionally, the user-specific imagery of the user object 402 may be obtained from a user-specific image database (e.g., an image library associated with the user).
The user-specific imagery 402 of the user object may be used to generate a three-dimensional model 404 of the user object (e.g., the user-specific imagery 402 of the user object may be used to train one or more neural radiance field models to learn one or more three-dimensional representations of the object). The generated three-dimensional model may be used to generate rendering data 406 (e.g., trained neural radiance field models and/or parameter data) for the object. The rendering data may be stored in association with object-specific data, which may include classification data (e.g., object tags), source image data, metadata, and/or user annotations.
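For illustration, the stored rendering data and its associated object-specific data could be organized as a simple record such as the following; every field name here is an assumption made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class RenderingRecord:
    """Illustrative record pairing trained rendering data with object-specific data."""
    object_id: str
    object_label: str                               # classification data, e.g. "chair", "dress"
    model_weights_path: str                         # trained radiance field parameters
    source_image_ids: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)    # capture device, timestamps, etc.
    user_annotations: list = field(default_factory=list)
```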
The stored rendering data may then be selected 408 based on the context information 410. For example, the rendering data may be selected 408 based on context information 410, which context information 410 may include user search history, user search queries, budgets, other selected objects, user location, time, and/or aesthetics associated with the user's current environment.
The selected rendering data may then be processed by a rendering block 412 to render one or more view composite images 414 of the selected object. In some implementations, multiple rendering datasets associated with multiple different user objects may be obtained to render one or more images having multiple user objects. Additionally and/or alternatively, user objects may be rendered into a user environment, a template environment, and/or a user-selected environment. One or more user objects may be rendered adjacent to a suggested object (e.g., a suggested item to purchase).
FIG. 5 depicts a block diagram of example neural radiance field model training and utilization 500, according to an example embodiment of the present disclosure. Specifically, a plurality of images 502 may be obtained. The plurality of images 502 may include a first image, a second image, a third image, a fourth image, a fifth image, and/or an nth image. The plurality of images 502 may be obtained from a user-specific database (e.g., local storage and/or cloud storage associated with the user). In some implementations, the plurality of images 502 can be obtained via a capture device associated with the user computing system.
The plurality of images 502 may be processed using one or more classification models 504 (and/or one or more detection models and/or one or more segmentation models) to determine a subset 506 of the images that includes one or more objects associated with one or more object types. For example, the classification models 504 may determine that the subset 506 of images includes objects of a particular object type (e.g., a clothing object type, a furniture object type, and/or a particular product type). The subset 506 of images may include the first image, the third image, and the nth image. Different images of the subset 506 of images may depict different objects. In some implementations, images depicting the same object can be identified and utilized to generate an object-specific dataset for refined training.
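A minimal sketch of this filtering step is shown below, assuming a generic `classifier` callable that returns a label and a confidence score; the target types and threshold are placeholders, not values from the disclosure.

```python
def select_subset(images, classifier, target_types=("clothing", "furniture"), threshold=0.8):
    """Keep only images whose top classification matches a target object type."""
    subset = []
    for image in images:
        label, score = classifier(image)          # assumed interface: (label, confidence)
        if label in target_types and score >= threshold:
            subset.append((image, label))
    return subset
```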
The subset 506 of images may then be used to train a plurality of neural radiance field models 508 (e.g., a first NeRF model associated with the object of the first image (e.g., a first object), a third NeRF model associated with the object of the third image (e.g., a third object), and an nth NeRF model associated with the object of the nth image (e.g., an nth object)). Each neural radiance field model can be trained to generate view-synthesis renderings of a different object. The different sets of neural radiance field data (e.g., the neural radiance field models 508 and/or learned parameters) may be stored.
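One way this per-object training and storage could look in code is sketched below. `fit_radiance_field` stands in for whatever routine optimizes a model on one object's posed images (for example, a wrapper around the training-step sketch given later); it and `TinyNeRF` are illustrative assumptions.

```python
def build_object_models(image_subsets, fit_radiance_field):
    """Train and store one radiance field per object.

    `image_subsets` maps an object id to that object's posed images;
    `fit_radiance_field(model, images)` is an assumed training helper.
    """
    stored_parameters = {}
    for object_id, images in image_subsets.items():
        model = TinyNeRF()                                   # fresh per-object model
        fit_radiance_field(model, images)                    # per-object optimization
        stored_parameters[object_id] = model.state_dict()    # learned parameter set
    return stored_parameters
```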
A user may then interact with a user interface 510. Based on the user interface interactions, one or more of the neural radiance field datasets may be obtained. The one or more selected neural radiance field datasets may be utilized by a rendering block 512 to generate one or more view composite images 514 depicting the one or more user objects.
One or more additional user interface interactions may be received, which may prompt the rendering of additional view composite renderings based on one or more adjustments associated with the one or more inputs.
FIG. 6 depicts a diagram of an example virtual closet interface 600, according to an example embodiment of the present disclosure. In particular, a plurality of images 602 may be obtained and/or processed to generate a plurality of rendering datasets 604 associated with the garments depicted in the plurality of images 602. The plurality of images 602 may be obtained from a user-specific database (e.g., local storage and/or cloud storage associated with the user). In some implementations, one or more of the plurality of images 602 and/or one or more of the plurality of rendering datasets 604 may be obtained from a database associated with one or more other users (e.g., rendering datasets associated with retailers and/or manufacturers selling products (e.g., dresses or shirts)).
The plurality of rendering datasets 604 may be selected and/or accessed in response to one or more interactions with a user interface 606. One or more rendering datasets may be selected and processed by a rendering block 608 to generate one or more view synthesis renderings. For example, the systems and methods may be used to put together an outfit to be worn. Garments selected by the user and/or suggested to the user may be rendered with consistent poses and lighting for review. The generated renderings may include virtual closet renderings 610 segmented from the user and/or virtual closet renderings 612 rendered on the user or on a template individual. In some implementations, a user may scroll through view-synthesis renderings of different pieces of clothing, may determine an outfit to be visualized on a particular user (e.g., an augmented reality try-on and/or a template image of the user or another individual), and may render the selected outfit on the user.
In some implementations, each clothing subtype may include multiple view composite renderings associated with different objects of the clothing subtype. A carousel interface may be provided for each clothing subtype, and multiple carousel interfaces may be provided simultaneously to scroll through each subtype individually and/or in concert, which may allow for assembling coherent outfits. The user may then select a try-on user interface element to render the selected outfit on the user and/or a template individual.
Similar user interfaces may be implemented for interior design, landscape design, and/or game design. The virtual closet interface and/or other similar user interfaces may include one or more suggestions determined based on user objects, user search history, user browsing history, and/or user preferences. The suggested rendering datasets may be obtained from a server database. The server database may include rendering datasets generated by other users (e.g., retailers, manufacturers, and/or peer-to-peer sellers). The suggestions may be based on availability, size, and/or price range for the particular user.
FIG. 7 depicts a flowchart of an example method for performing virtual closet generation according to an example embodiment of the present disclosure. Although FIG. 7 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particular order or arrangement shown. The various steps of method 700 may be omitted, rearranged, combined, and/or adjusted in various ways without departing from the scope of the present disclosure.
At 702, a computing system may obtain a plurality of user images. Each of the plurality of user images may include one or more pieces of clothing. The plurality of user images may be associated with a plurality of different pieces of clothing. In some implementations, the plurality of user images may be obtained automatically from a storage database associated with a particular user based on obtained request data. Additionally and/or alternatively, the plurality of user images may be selected from a corpus of user images based on metadata, one or more user inputs, and/or one or more classifications.
In some implementations, a computing system may access a stored database associated with a user and process a corpus of user images using one or more classification models to determine a plurality of user images including one or more objects classified as clothing.
At 704, the computing system may train a respective neural radiance field model for each respective garment of the plurality of different garments. Each respective neural radiance field model may be trained to generate one or more view-synthesis renderings of the particular respective piece of clothing.
At 706, the computing system can store each respective neural radiance field model in a collection database. The collection database may be associated with object types (e.g., clothing, furniture, plants, etc.) and/or object subtypes (e.g., pants, shirts, shoes, tables, lamps, chairs, lilies, orchids, shrubs, etc.). The collection database may be associated with a particular user and/or a particular marketplace. The user may supplement the collection database with products discovered via one or more online marketplaces, social media posts from a social media platform, and/or suggested rendering datasets associated with suggested objects (or products). The suggestions may be based on a determined user need, a determined user style, a determined user aesthetic, and/or a user context. The suggested products may be associated with known dimensions, known availability, and/or known price compatibility.
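A toy in-memory version of such a collection database, keyed by object type and subtype, is sketched below; a deployed platform would presumably use a server-side datastore, so this structure is purely illustrative.

```python
from collections import defaultdict

class CollectionDatabase:
    """Illustrative store of per-object model records keyed by type and subtype."""

    def __init__(self):
        # object_type -> object_subtype -> list of stored records
        self._store = defaultdict(lambda: defaultdict(list))

    def add(self, object_type, object_subtype, record):
        self._store[object_type][object_subtype].append(record)

    def query(self, object_type, object_subtype=None):
        if object_subtype is None:
            return [r for records in self._store[object_type].values() for r in records]
        return list(self._store[object_type][object_subtype])

# Example usage with the RenderingRecord sketched earlier (assumed):
# db = CollectionDatabase()
# db.add("clothing", "shirts", RenderingRecord("obj-1", "shirt", "weights/obj-1.pt"))
# shirts = db.query("clothing", "shirts")
```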
At 708, the computing system may provide a virtual closet interface. The virtual closet interface may provide a plurality of garment view composite renderings for display based on the plurality of respective neural radiance field models. The plurality of garment view composite renderings may be associated with at least a subset of the plurality of different garments. In some embodiments, the virtual closet interface may include one or more interface features for viewing a clothing ensemble that includes two or more pieces of clothing displayed simultaneously. The plurality of garment view composite renderings may be generated based on one or more unified pose parameters and one or more unified lighting parameters.
FIG. 8 depicts a flowchart of an example method for performing virtual object user interface generation and display, in accordance with an example embodiment of the present disclosure. Although FIG. 8 depicts steps performed in a particular order for purposes of illustration and discussion, the methods of the present disclosure are not limited to the particular order or arrangement shown. The various steps of method 800 may be omitted, rearranged, combined, and/or adjusted in various ways without departing from the scope of the present disclosure.
At 802, a computing system may obtain a plurality of user image datasets. Each of the plurality of user image datasets may describe one or more images including one or more objects. The one or more images may have been generated with a user computing device (e.g., a mobile computing device having an image capture component). The capture and/or generation of a user image dataset may be facilitated by one or more user interface elements for capturing an array of images of each particular object.
At 804, the computing system may process the plurality of user image datasets using the one or more classification models to determine a subset of the plurality of user image datasets that includes features describing one or more particular object types. In some implementations, the determination may involve one or more additional machine learning models (e.g., one or more detection models, one or more segmentation models, and/or one or more feature extractors). The one or more classification models may have been trained to classify one or more particular object types (e.g., a clothing type, a furniture type, etc.). One or more classified objects may be segmented to generate a plurality of segmented images.
At 806, the computing system may train a plurality of neural radiance field models based on the subset of the plurality of user image datasets. Each respective neural radiance field model may be trained to generate a view synthesis of one or more particular objects of a respective user image dataset of the subset of the plurality of user image datasets. In some implementations, the subset can be processed to generate a plurality of training patches associated with each particular user image dataset of the subset. The patches may be used to train the neural radiance field models.
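The patch extraction mentioned here could be as simple as the sketch below, which cuts random square crops out of each training image; the patch size and count are arbitrary assumptions.

```python
import numpy as np

def sample_patches(image, patch_size=32, n_patches=16, rng=None):
    """Randomly crop square training patches from an (H, W, C) image array."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    patches = []
    for _ in range(n_patches):
        top = int(rng.integers(0, h - patch_size + 1))
        left = int(rng.integers(0, w - patch_size + 1))
        patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches
```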
In some implementations, the computing system may determine a first set of user image datasets that includes features describing a first object subtype. The computing system may associate a first set of corresponding neural radiance field models with a first object subtype label, may determine a second set of user image datasets that includes features describing a second object subtype, and may associate a second set of corresponding neural radiance field models with a second object subtype label.
At 808, the computing system may generate a plurality of view synthesis renderings using the plurality of neural radiance field models. The plurality of view synthesis renderings may depict a plurality of different objects (e.g., different pieces of clothing) of a particular object type (e.g., a clothing object type and/or a furniture object type).
At 810, the computing system may provide a user interface for viewing the multiple view composite renderings. The user interface may include a rendering pane for viewing the multiple view composite renderings. The multiple view composite renderings may be provided via a carousel interface, multiple thumbnails, and/or a compiled rendering with multiple objects displayed in a single environment.
In some implementations, the computing system may receive an integrated rendering request. The integrated rendering request may describe a request to generate a view rendering of a first object of the first object subtype and a second object of the second object subtype. The computing system may generate an integrated view rendering using a first neural radiance field model of the first set of corresponding neural radiance field models and a second neural radiance field model of the second set of corresponding neural radiance field models. The integrated view rendering may include image data depicting the first object and the second object in a shared environment.
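One plausible way to realize such an integrated rendering is to sample both radiance fields along each camera ray and composite them by density, so the nearer, denser object occludes the other; the sketch below shows that idea using the `TinyNeRF`-style interface assumed earlier. The disclosure does not prescribe this particular compositing scheme.

```python
import torch

def combined_field(model_a, model_b, points, dirs, offset_b):
    """Evaluate two radiance fields as one scene; object B is translated by `offset_b`."""
    sigma_a, rgb_a = model_a(points, dirs)
    sigma_b, rgb_b = model_b(points - offset_b, dirs)       # place object B elsewhere in the scene
    sigma = sigma_a + sigma_b                               # densities add where fields overlap
    weight_a = sigma_a / (sigma_a + sigma_b + 1e-8)         # density-weighted color blend
    rgb = weight_a * rgb_a + (1.0 - weight_a) * rgb_b
    return sigma, rgb
```

The combined field can then be passed to the same per-ray compositing routine sketched earlier to produce the shared-environment image.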
In some implementations, one or more neural radiance field models may be used to generate augmented reality assets and/or virtual reality experiences. For example, one or more neural radiance field models may be used to generate multiple view-synthesis renderings of one or more objects and/or environments, which may be used to provide an augmented reality experience and/or a virtual reality experience to a user. The augmented reality experience may be used to view objects (e.g., user objects) at different locations and/or positions in the environment in which the user is currently located, which may be a different environment than the one in which the physical object currently resides. The virtual reality experience may be used to provide a virtual walk-through experience, which may be used for remodeling, virtual visits (e.g., a virtual visit to a housing complex or an escape room), apartment previews, and/or social media sharing of an environment that users may share to allow friends and/or family to view the environment. Additionally and/or alternatively, view synthesis renderings may be used for video game development and/or other content generation.
Fig. 9A depicts a block diagram of an example computing system 100 that performs view synthesis image generation, according to an example embodiment of the disclosure. The system 100 includes a user computing device 102, a server computing system 130, and a training computing system 150 communicatively coupled by a network 180.
The user computing device 102 may be any type of computing device, such as, for example, a personal computing device (e.g., a laptop computer or desktop computer), a mobile computing device (e.g., a smart phone or tablet computer), a game console or controller, a wearable computing device, an embedded computing device, or any other type of computing device.
The user computing device 102 includes one or more processors 112 and memory 114. The one or more processors 112 may be any suitable processing device (e.g., processor core, microprocessor, ASIC, FPGA, controller, microcontroller, etc.), and may be one processor or multiple processors operatively connected. Memory 114 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and the like, and combinations thereof. Memory 114 may store data 116 and instructions 118 that are executed by processor 112 to cause user computing device 102 to perform operations.
In some implementations, the user computing device 102 may store or include one or more neural radiance field models 120. For example, the neural radiance field models 120 may be or may otherwise include various machine-learned models, such as neural networks (e.g., deep neural networks) or other types of machine-learned models, including non-linear models and/or linear models. Neural networks may include feed-forward neural networks, recurrent neural networks (e.g., long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks. Example neural radiance field models 120 are discussed with reference to FIGS. 1-2 and 4-6.
In some implementations, the one or more neural radiance field models 120 may be received from the server computing system 130 over the network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112. In some implementations, the user computing device 102 can implement multiple parallel instances of a single neural radiance field model 120 (e.g., to perform parallel three-dimensional modeling of user objects across multiple instances of objects in the user images).
More specifically, the neural radiance field model 120 may be configured to process a three-dimensional position and a two-dimensional view direction to determine one or more predicted color values and one or more predicted density values in order to generate a view synthesis of one or more objects from the position and view direction. A particular neural radiance field model may be associated with one or more tags. A particular neural radiance field model may be obtained based on its association with a given tag and/or a given object. The neural radiance field models 120 may be used to synthesize an image having a plurality of objects, to virtually view objects, and/or for augmented reality rendering.
Additionally or alternatively, one or more neural radiance field models 140 may be included in the server computing system 130 or otherwise stored and implemented by the server computing system 130, which communicates with the user computing device 102 according to a client-server relationship. For example, the neural radiance field models 140 may be implemented by the server computing system 130 as part of a web service (e.g., a view synthesis image generation service). Accordingly, one or more models 120 may be stored and implemented at the user computing device 102 and/or one or more models 140 may be stored and implemented at the server computing system 130.
The user computing device 102 may also include one or more user input components 122 that receive user input. For example, the user input component 122 may be a touch-sensitive component (e.g., a touch-sensitive display screen or touchpad) that is sensitive to the touch of a user input object (e.g., a finger or stylus). The touch sensitive component may be used to implement a virtual keyboard. Other example user input components include a microphone, a conventional keyboard, or other means by which a user may provide user input.
The server computing system 130 includes one or more processors 132 and memory 134. The one or more processors 132 may be any suitable processing device (e.g., processor core, microprocessor, ASIC, FPGA, controller, microcontroller, etc.), and may be one processor or multiple processors operatively connected. Memory 134 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and the like, and combinations thereof. Memory 134 may store data 136 and instructions 138 that are executed by processor 132 to cause server computing system 130 to perform operations.
In some implementations, the server computing system 130 includes or is otherwise implemented by one or more server computing devices. In instances where the server computing system 130 includes multiple server computing devices, such server computing devices may operate according to a sequential computing architecture, a parallel computing architecture, or some combination thereof.
As described above, the server computing system 130 may store or otherwise include one or more machine-learned neural radiance field models 140. For example, the models 140 may be or may otherwise include various machine-learned models. Example machine-learned models include neural networks or other multi-layer non-linear models. Example neural networks include feed-forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks. Example models 140 are discussed with reference to FIGS. 1-2 and 4-6.
The user computing device 102 and/or the server computing system 130 may train the models 120 and/or 140 via interactions with a training computing system 150 communicatively coupled via a network 180. The training computing system 150 may be separate from the server computing system 130 or may be part of the server computing system 130.
The training computing system 150 includes one or more processors 152 and memory 154. The one or more processors 152 may be any suitable processing device (e.g., processor core, microprocessor, ASIC, FPGA, controller, microcontroller, etc.), and may be one processor or multiple processors operatively connected. The memory 154 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and the like, and combinations thereof. Memory 154 may store data 156 and instructions 158 that are executed by processor 152 to cause training computing system 150 to perform operations. In some implementations, the training computing system 150 includes or is otherwise implemented by one or more server computing devices.
The training computing system 150 may include a model trainer 160 that trains the machine learning models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors. For example, a loss function may be backpropagated through the model to update one or more parameters of the model (e.g., based on a gradient of the loss function). Various loss functions may be used, such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions. Gradient descent techniques may be used to iteratively update the parameters over a number of training iterations.
In some implementations, performing backpropagation of errors may include performing truncated backpropagation through time. The model trainer 160 may perform a number of generalization techniques (e.g., weight decay, dropout, etc.) to improve the generalization capability of the model being trained.
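The following sketch shows one plausible shape of a training step consistent with the description above, assuming a PyTorch-style model and optimizer; the choice of mean squared error and the weight-decay value are illustrative assumptions rather than details of the disclosure.

```python
import torch

def training_step(model, batch, optimizer, loss_fn=torch.nn.functional.mse_loss):
    """One illustrative update: forward pass, loss evaluation, backpropagation
    of errors, and a gradient-descent step on the model parameters."""
    model.train()                     # enables dropout layers, if the model uses any
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()                   # backpropagate the loss through the model
    optimizer.step()                  # update parameters based on the gradient
    return loss.item()

# Hypothetical setup illustrating weight decay as a generalization technique:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
```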
In particular, the model trainer 160 may train the neural radiation field models 120 and/or 140 based on a set of training data 162. The training data 162 may include, for example, user images, metadata, additional training images, ground truth labels, example training renderings, example feature annotations, example anchor points, and/or training video data.
In some implementations, if the user has provided consent, the training examples can be provided by the user computing device 102. Thus, in such embodiments, the model 120 provided to the user computing device 102 may be trained by the training computing system 150 on user-specific data received from the user computing device 102. In some instances, this process may be referred to as personalizing the model.
Model trainer 160 includes computer logic for providing the desired functionality. Model trainer 160 may be implemented in hardware, firmware, and/or software controlling a general-purpose processor. For example, in some embodiments, model trainer 160 includes program files stored on a storage device, loaded into memory, and executed by one or more processors. In other implementations, model trainer 160 includes one or more sets of computer-executable instructions stored in a tangible computer-readable storage medium such as RAM, a hard disk, or optical or magnetic media.
The network 180 may be any type of communications network, such as a local area network (e.g., an intranet), a wide area network (e.g., the internet), or some combination thereof, and may include any number of wired or wireless links. In general, communication over network 180 may be carried via any type of wired and/or wireless connection using a variety of communication protocols (e.g., TCP/IP, HTTP, SMTP, FTP), encodings or formats (e.g., HTML, XML), and/or protection schemes (e.g., VPN, secure HTTP, SSL).
The machine learning model described in this specification may be used in a variety of tasks, applications, and/or use cases.
In some implementations, the input to the machine learning model of the present disclosure can be image data. The machine learning model may process the image data to generate an output. As an example, the machine learning model may process the image data to generate an image recognition output (e.g., a recognition of the image data, a latent embedding of the image data, an encoded representation of the image data, a hash of the image data, etc.). As another example, the machine learning model may process the image data to generate an image segmentation output. As another example, the machine learning model may process the image data to generate an image classification output. As another example, the machine learning model may process the image data to generate an image data modification output (e.g., an alteration of the image data, etc.). As another example, the machine learning model may process the image data to generate an encoded image data output (e.g., an encoded and/or compressed representation of the image data, etc.). As another example, the machine learning model may process the image data to generate an upscaled image data output. As another example, the machine learning model may process the image data to generate a prediction output.
In some implementations, the input to the machine learning model of the present disclosure can be text or natural language data. The machine learning model may process the text or natural language data to generate an output. As an example, the machine learning model may process the natural language data to generate a language encoding output. As another example, the machine learning model may process the text or natural language data to generate a latent text embedding output. As another example, the machine learning model may process the text or natural language data to generate a translation output. As another example, the machine learning model may process the text or natural language data to generate a classification output. As another example, the machine learning model may process the text or natural language data to generate a text segmentation output. As another example, the machine learning model may process the text or natural language data to generate a semantic intent output. As another example, the machine learning model may process the text or natural language data to generate an upscaled text or natural language output (e.g., text or natural language data that is higher quality than the input text or natural language, etc.). As another example, the machine learning model may process the text or natural language data to generate a prediction output.
In some implementations, the input to the machine learning model of the present disclosure can be latent encoding data (e.g., a latent space representation of the input, etc.). The machine learning model may process the latent encoding data to generate an output. As an example, the machine learning model may process the latent encoding data to generate a recognition output. As another example, the machine learning model may process the latent encoding data to generate a reconstruction output. As another example, the machine learning model may process the latent encoding data to generate a search output. As another example, the machine learning model may process the latent encoding data to generate a re-clustering output. As another example, the machine learning model may process the latent encoding data to generate a prediction output.
In some implementations, the input to the machine learning model of the present disclosure can be statistical data. The machine learning model may process the statistical data to generate an output. As an example, the machine learning model may process the statistical data to generate a recognition output. As another example, the machine learning model may process the statistical data to generate a prediction output. As another example, the machine learning model may process the statistical data to generate a classification output. As another example, the machine learning model may process the statistical data to generate a segmentation output. As another example, the machine learning model may process the statistical data to generate a visualization output. As another example, the machine learning model may process the statistical data to generate a diagnostic output.
In some implementations, the input to the machine learning model of the present disclosure can be sensor data. The machine learning model may process the sensor data to generate an output. As an example, the machine learning model may process the sensor data to generate a recognition output. As another example, the machine learning model may process the sensor data to generate a prediction output. As another example, the machine learning model may process the sensor data to generate a classification output. As another example, the machine learning model may process the sensor data to generate a segmentation output. As another example, the machine learning model may process the sensor data to generate a visualization output. As another example, the machine learning model may process the sensor data to generate a diagnostic output. As another example, the machine learning model may process the sensor data to generate a detection output.
In some cases, the machine learning model may be configured to perform tasks that include encoding input data for reliable and/or efficient transmission or storage (and/or corresponding decoding). For example, the task may be an audio compression task. The input may comprise audio data and the output may comprise compressed audio data. In another example, the input includes visual data (e.g., one or more images or videos), the output includes compressed visual data, and the task is a visual data compression task. In another example, a task may include generating an embedding of input data (e.g., input audio or visual data).
In some cases, the input includes visual data and the task is a computer vision task. In some cases, the input includes pixel data for one or more images and the task is an image processing task. For example, the image processing task may be image classification, wherein the output is a set of scores, each score corresponding to a different object class and representing a likelihood that the one or more images depict an object belonging to that object class. The image processing task may be object detection, wherein the image processing output identifies one or more regions in the one or more images and, for each region, a likelihood that the region depicts an object of interest. As another example, the image processing task may be image segmentation, wherein the image processing output defines, for each pixel in the one or more images, a respective likelihood for each category in a predetermined set of categories. For example, the set of categories may be foreground and background. As another example, the set of categories may be object classes. As another example, the image processing task may be depth estimation, wherein the image processing output defines, for each pixel in the one or more images, a respective depth value. As another example, the image processing task may be motion estimation, wherein the network input includes a plurality of images, and the image processing output defines, for each pixel of one of the input images, a motion of the scene depicted at the pixel between the images in the network input.
FIG. 9A illustrates one example computing system that may be used to implement the present disclosure. Other computing systems may also be used. For example, in some implementations, the user computing device 102 may include a model trainer 160 and a training data set 162. In such implementations, the model 120 may be trained and used locally at the user computing device 102. In some such implementations, the user computing device 102 may implement the model trainer 160 to personalize the model 120 based on user-specific data.
FIG. 9B depicts a block diagram of an example computing device 40 that performs according to example embodiments of the present disclosure. Computing device 40 may be a user computing device or a server computing device.
Computing device 40 includes a plurality of applications (e.g., application 1 through application N). Each application contains its own machine learning library and machine learning model. For example, each application may include a machine learning model. Example applications include text messaging applications, email applications, dictation applications, virtual keyboard applications, browser applications, and the like.
As shown in FIG. 9B, each application may communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some implementations, each application can communicate with each device component using an API (e.g., a public API). In some implementations, the API used by each application is specific to that application.
FIG. 9C depicts a block diagram of an example computing device 50 that performs according to example embodiments of the present disclosure. Computing device 50 may be a user computing device or a server computing device.
Computing device 50 includes a plurality of applications (e.g., application 1 through application N). Each application communicates with a central intelligence layer. Example applications include text messaging applications, email applications, dictation applications, virtual keyboard applications, browser applications, and the like. In some implementations, each application can communicate with the central intelligence layer (and the models stored therein) using an API (e.g., a common API across all applications).
The central intelligence layer includes a plurality of machine learning models. For example, as shown in FIG. 9C, a respective machine learning model may be provided for each application and managed by the central intelligence layer. In other implementations, two or more applications may share a single machine learning model. For example, in some embodiments, the central intelligence layer may provide a single model for all of the applications. In some implementations, the central intelligence layer is included within or otherwise implemented by the operating system of computing device 50.
The central intelligence layer may communicate with a central device data layer. The central device data layer may be a centralized repository of data for computing device 50. As shown in FIG. 9C, the central device data layer may communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some implementations, the central device data layer can communicate with each device component using an API (e.g., a private API).
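As a sketch of how such a central intelligence layer could route requests from applications to managed models through a common API, consider the following; the class, the method names, and the fallback behavior are hypothetical and not taken from the disclosure.

```python
class CentralIntelligenceLayer:
    """Illustrative dispatcher: applications request predictions from the layer
    instead of bundling their own models (the arrangement of FIG. 9C)."""

    def __init__(self):
        self._models = {}                     # application name -> model

    def register_model(self, app_name: str, model) -> None:
        self._models[app_name] = model

    def predict(self, app_name: str, inputs):
        # Fall back to a shared "default" model when no app-specific model exists,
        # mirroring the single-model-for-all-applications option described above.
        model = self._models.get(app_name) or self._models.get("default")
        if model is None:
            raise LookupError(f"no model registered for {app_name!r}")
        return model(inputs)

# Hypothetical usage from an application through the common API:
# layer = CentralIntelligenceLayer()
# layer.register_model("default", shared_model)
# prediction = layer.predict("virtual_keyboard", user_features)
```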
FIG. 10 depicts a block diagram of an example neural radiation field model training 1000, according to an example embodiment of the present disclosure. Training the neural radiation field model 1006 may include processing one or more training data sets. The one or more training data sets may be specific to one or more objects and/or one or more environments. For example, the neural radiation field model 1006 may process the training position 1002 (e.g., a three-dimensional position) and the training view direction 1004 (e.g., a two-dimensional view direction and/or vector) to generate one or more predicted color values 1008 and/or one or more predicted density values 1010. The one or more predicted color values 1008 and the one or more predicted density values 1010 may be used to generate a view rendering 1012.
A training image 1014 associated with the training position 1002 and the training view direction 1004 may be obtained. The training image 1014 and the view rendering 1012 may be used to evaluate a loss function 1016. The evaluation may then be utilized to adjust one or more parameters of the neural radiation field model 1006. For example, the training image 1014 and the view rendering 1012 may be used to evaluate the loss function 1016 to generate a gradient, which may be backpropagated to adjust one or more parameters. The loss function 1016 may include an L2 loss function, a perceptual loss function, a mean squared error loss function, a cross entropy loss function, and/or a hinge loss function.
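A compressed sketch of this training loop is shown below, reusing the illustrative TinyNeRF module from earlier. The near/far bounds, the sample count, and the plain L2 (mean squared error) photometric loss follow the standard NeRF formulation and are assumptions rather than values recited in the disclosure.

```python
import torch

def render_ray(model, origin, direction, near=2.0, far=6.0, n_samples=64):
    """Volume-render one ray: query color/density at sampled positions along the
    ray and alpha-composite them into a single pixel color (view rendering 1012)."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction        # sampled 3D positions (cf. 1002)
    dirs = direction.expand(n_samples, 3)           # view direction per sample (cf. 1004)
    rgb, sigma = model(points, dirs)                # predicted colors 1008, densities 1010
    delta = torch.full((n_samples,), (far - near) / n_samples)
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(dim=0)      # composited RGB pixel

def train_step(model, optimizer, origins, directions, target_pixels):
    """origins/directions describe rays through pixels of a training image 1014;
    target_pixels holds the ground-truth colors of those pixels."""
    rendered = torch.stack([render_ray(model, o, d) for o, d in zip(origins, directions)])
    loss = torch.nn.functional.mse_loss(rendered, target_pixels)   # loss function 1016
    optimizer.zero_grad()
    loss.backward()                                 # adjust parameters of model 1006
    optimizer.step()
    return loss.item()
```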
In some implementations, the systems and methods disclosed herein may be used to generate and/or render an enhanced environment based on user image data, one or more neural radiation field models, grids, and/or suggested data sets.
FIG. 11 depicts a block diagram of an example enhanced environment generation system 1100, according to an example embodiment of the disclosure. In particular, FIG. 11 depicts an enhanced environment generation system 1100 that obtains user data 1102 (e.g., search queries, search parameters, preference data, historical user data, user-selected data, and/or image data) and outputs an interactive user interface 1110 to a user, the interactive user interface 1110 depicting a three-dimensional representation of an enhanced environment 1108 that includes a plurality of objects 1104 rendered into an environment 1106.
For example, user data 1102 associated with a user may be obtained. The user data 1102 may include search queries (e.g., one or more keywords and/or one or more query images), historical data (e.g., a user's search history, a user's browser history, and/or a user's purchase history), preference data (e.g., explicitly entered preferences, learned preferences, and/or weighted adjustments to preferences), user-selected data, refinement parameters (e.g., price ranges, locations, brands, ratings, and/or sizes), user image data, and/or generated collections (e.g., collections generated by a user, which may include shopping carts, virtual object catalogs, and/or virtual interest boards).
The user data 1102 may be used to determine one or more objects 1104. The one or more objects 1104 may be responsive to the user data 1102. For example, the one or more objects 1104 may be associated with search results that are responsive to a search query and/or one or more refinement parameters. In some implementations, the one or more objects 1104 can be determined by processing the user data 1102 with one or more machine learning models trained to suggest objects.
One or more rendering data sets associated with the one or more objects 1104 may be obtained to augment the environment 1106 and generate the enhanced environment 1108, which may be provided in the interactive user interface 1110. The one or more rendering data sets may include one or more meshes for each particular object and one or more neural radiation field data sets (e.g., one or more neural radiation field models having one or more learned parameters associated with the object).
The enhanced environment 1108 may be provided with mesh renderings in the environment 1106 during instances of environment navigation, and may be provided with neural radiation field renderings in the environment 1106 during instances in which a threshold time has elapsed while viewing the enhanced environment 1108 from a particular position and view direction.
Navigation and pausing may occur in response to interactions with the interactive user interface 1110. The interactive user interface 1110 may include pop-up elements for providing additional information about the one or more objects 1104 and/or may be used to replace, add, or delete objects 1104.
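One way the mesh-versus-neural-radiation-field switching policy described above might look in code is sketched here; the dwell-time threshold, the class name, and the render_mesh/render_nerf callables are hypothetical placeholders, not elements of the disclosure.

```python
import time

NERF_DWELL_THRESHOLD_S = 0.5   # assumed threshold; the disclosure does not specify a value

class HybridEnvironmentRenderer:
    """Mesh rendering while the user navigates, neural radiation field rendering once
    the position and view direction have been held past a threshold time."""

    def __init__(self, mesh_asset, nerf_model):
        self.mesh_asset = mesh_asset
        self.nerf_model = nerf_model
        self._last_pose = None
        self._pose_since = 0.0

    def render(self, position, view_direction, render_mesh, render_nerf):
        pose = (tuple(position), tuple(view_direction))
        now = time.monotonic()
        if pose != self._last_pose:                  # user is still navigating
            self._last_pose, self._pose_since = pose, now
        if now - self._pose_since >= NERF_DWELL_THRESHOLD_S:
            return render_nerf(self.nerf_model, position, view_direction)   # high-fidelity view
        return render_mesh(self.mesh_asset, position, view_direction)       # fast navigation path
```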
The environment 1106 may be a template environment and/or may be a user environment generated based on one or more user inputs (e.g., virtual model generation and/or one or more input images).
FIG. 12 depicts a block diagram of an example virtual environment generation 1200, according to an example embodiment of the disclosure. In particular, FIG. 12 depicts user data 1202 processed to generate a virtual environment 1216, which can be provided for display via an interactive user interface 1218.
User data 1202 may be obtained from a user computing system. The user data 1202 may include search queries, historical data (e.g., search history, browsing history, purchase history, and/or interaction history), preference data, user-selected data, and/or user profile data. The user data 1202 may be processed by the suggestion block 1204 to determine one or more objects 1206 associated with the user data 1202. The one or more objects 1206 may be associated with one or more products for purchase. One or more rendering data sets 1210 may then be obtained from the rendering asset database 1208 based on the one or more objects 1206. The one or more rendering data sets 1210 may be obtained by querying the rendering asset database 1208 with data associated with the one or more objects 1206. In some implementations, the one or more rendering data sets 1210 may be pre-associated with the one or more objects 1206 (e.g., via one or more tags).
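A minimal sketch of retrieving pre-associated rendering data sets for the suggested objects follows, assuming a simple SQLite table that maps object identifiers to a mesh reference and trained neural radiation field weights; the schema and column names are hypothetical.

```python
import sqlite3

def fetch_rendering_datasets(db_path: str, object_ids: list) -> list:
    """Look up (object_id, mesh_uri, nerf_weights_uri) rows for the given objects
    from a hypothetical rendering_assets table."""
    placeholders = ",".join("?" for _ in object_ids)
    query = (
        "SELECT object_id, mesh_uri, nerf_weights_uri "
        f"FROM rendering_assets WHERE object_id IN ({placeholders})"
    )
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query, object_ids).fetchall()
```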
One or more templates 1212 may then be obtained. The one or more templates 1212 may be associated with one or more example environments (e.g., example rooms, example lawns, and/or example automobiles). One or more templates 1212 may be determined based on the user data 1202 and/or based on the one or more objects 1206. The template 1212 may include image data, mesh data, a trained neural radiation field model, a three-dimensional representation, and/or a virtual reality experience.
The one or more templates 1212 and the one or more rendering datasets 1210 may be processed using the rendering model 1214 to generate a virtual environment 1216. The rendering model 1214 may include one or more neural radiation field models (e.g., one or more neural radiation field models trained on other user data sets and/or one or more neural radiation field models trained on user's image data sets), one or more enhancement models, and/or one or more mesh models.
The virtual environment 1216 may describe one or more objects 1206 that are rendered into the template environment. The virtual environment 1216 may be generated based on the one or more templates 1212 and the one or more rendering datasets 1210. A virtual environment 1216 may be provided for display in the interactive user interface 1218. In some implementations, a user may be able to interact with the interactive user interface 1218 to view the virtual environment 1216 from different angles and/or at different scales.
FIG. 13 depicts a block diagram of an example enhanced image data generation 1300, according to an example embodiment of the disclosure. In particular, FIG. 13 depicts user data 1302 and image data 1312 processed to generate enhanced image data 1316, which may be provided for display via an interactive user interface 1318.
User data 1302 may be obtained from a user computing system. User data 1302 can include search queries, historical data (e.g., search history, browsing history, purchase history, and/or interaction history), preference data, and/or user profile data. User data 1302 can be processed by suggestion block 1304 to determine one or more objects 1306 associated with user data 1302. One or more objects 1306 may be associated with one or more products for purchase. One or more rendering data sets 1310 may then be obtained from rendering asset database 1308 based on the one or more objects 1306. One or more rendering data sets 1310 may be obtained by querying rendering asset database 1308 with data associated with one or more objects 1306. In some implementations, one or more rendering data sets 1310 may be pre-associated with one or more objects 1306 (e.g., via one or more tags).
Image data 1312 may then be obtained. The image data 1312 may be associated with one or more user environments (e.g., a living room of the user, a bedroom of the user, a current environment in which the user is located, a lawn of the user, and/or a particular car associated with the user). The image data 1312 may be obtained in response to one or more selections by the user. The image data 1312 may include one or more images of the environment. In some implementations, the image data 1312 may be used to train one or more machine learning models (e.g., one or more neural radiation field models).
The image data 1312 and the one or more rendering data sets 1310 may be processed using a rendering model 1314 to generate enhanced image data 1316. The rendering model 1314 may include one or more neural radiation field models, one or more enhancement models, and/or one or more mesh models.
Enhanced image data 1316 may describe the one or more objects 1306 rendered into a user environment. The enhanced image data 1316 may be generated based on the image data 1312 and the one or more rendering data sets 1310. The enhanced image data 1316 may be provided for display in the interactive user interface 1318. In some implementations, a user may be able to interact with the interactive user interface 1318 to view one or more renderings of the enhanced image data 1316 depicting the enhanced user environment from different angles and/or at different zoom levels.
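A minimal sketch of the final compositing step is shown below, assuming the neural radiation field model has already produced an RGBA view synthesis rendering aligned to the camera pose of the user photo; camera pose estimation and lighting harmonization are omitted, and the function name and array layout are assumptions.

```python
import numpy as np

def composite_object_into_image(user_image: np.ndarray,
                                object_rgba: np.ndarray,
                                top_left: tuple) -> np.ndarray:
    """Alpha-blend an RGBA object rendering into an RGB user-environment image
    at the given (row, column) offset, producing enhanced image data."""
    out = user_image.astype(np.float32).copy()
    h, w = object_rgba.shape[:2]
    y, x = top_left
    region = out[y:y + h, x:x + w, :3]
    alpha = object_rgba[..., 3:4].astype(np.float32) / 255.0
    region[:] = alpha * object_rgba[..., :3] + (1.0 - alpha) * region
    return out.astype(np.uint8)
```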
The technology discussed herein refers to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a variety of possible configurations, combinations, and divisions of tasks and functions between and among components. For example, the processes discussed herein may be implemented using a single device or component or multiple devices or components operating in combination. The database and applications may be implemented on a single system or distributed across multiple systems. The distributed components may operate serially or in parallel.
While the present subject matter has been described in detail with respect to various specific example embodiments thereof, each example is provided by way of explanation and not limitation of the present disclosure. Modifications, variations and equivalents to those embodiments will readily occur to those skilled in the art upon attaining an understanding of the foregoing. Accordingly, the subject disclosure does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. For example, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Accordingly, the present disclosure is intended to cover such alternatives, modifications, and equivalents.

Claims (20)

1. A computing system, the system comprising:
one or more processors; and
one or more non-transitory computer-readable media collectively storing instructions that, when executed by the one or more processors, cause the computing system to perform operations comprising:
obtaining user image data and request data, wherein the user image data describes one or more images comprising one or more user objects, wherein the one or more images are generated using a user computing device;
training one or more neural radiation field models based on the user image data, wherein the one or more neural radiation field models are trained to generate a view synthesis of the one or more objects; and
generating one or more view synthesis images using the one or more neural radiation field models based on the request data, wherein the one or more view synthesis images comprise one or more renderings of the one or more objects.
2. The system of claim 1, wherein the request data describes a request to generate an object type specific collection; and
wherein the operations further comprise:
processing the user image data to determine that the one or more objects are of a particular object type; and
storing the one or more neural radiation field models in a collection database, wherein the collection database is associated with the object type specific collection.
3. The system of claim 2, wherein the operations further comprise:
obtaining a plurality of additional user image data sets, wherein each of the plurality of additional user image data sets is generated with the user computing device;
processing each of the plurality of additional user image data sets using one or more object determination models to determine that a subset of the plurality of additional user image data sets includes a respective object of the particular object type;
training a respective additional neural radiation field model for each respective additional user image dataset of the subset of the plurality of additional user image datasets; and
storing each respective additional neural radiation field model in the collection database.
4. The system of claim 2, wherein the particular object type is associated with one or more pieces of clothing.
5. The system of claim 1, wherein the operations further comprise:
providing the one or more view synthesis images to a user computing system for display.
6. The system of claim 1, wherein the request data is associated with a context, wherein the context describes at least one of an object context or an environmental context.
7. The system of claim 1, wherein the operations further comprise:
providing a virtual object user interface to a user computing system, wherein the virtual object user interface provides the one or more view synthesis images for display, wherein the one or more objects are isolated from an original environment depicted in the user image data.
8. The system of claim 1, wherein the one or more view synthesis images are generated by:
processing a position and a view direction using the one or more neural radiation field models to generate one or more predicted density values and one or more color values; and
generating the one or more view synthesis images based on the one or more predicted density values and the one or more color values.
9. The system of claim 1, wherein the request data describes one or more adjustment settings; and
wherein generating the one or more view synthesis images with the one or more neural radiation field models based on the request data comprises adjusting one or more color values of a set of predicted values generated by the one or more neural radiation field models.
10. The system of claim 1, wherein the request data describes a particular location and a particular view direction; and
wherein generating the one or more view synthesis images using the one or more neural radiation field models based on the request data comprises: processing the particular location and the particular view direction with the one or more neural radiation field models to generate a view rendering of the one or more objects, the view rendering describing a view associated with the particular location and the particular view direction.
11. A computer-implemented method for virtual closet generation, the method comprising:
obtaining, by a computing system comprising one or more processors, a plurality of user images, wherein each of the plurality of user images comprises one or more pieces of clothing, wherein the plurality of user images are associated with a plurality of different pieces of clothing;
training, by the computing system, a respective neural radiation field model for each respective piece of clothing of the plurality of different pieces of clothing, wherein each respective neural radiation field model is trained to generate one or more view synthesis renderings of a particular respective piece of clothing;
storing, by the computing system, each respective neural radiation field model in a collective database; and
providing, by the computing system, a virtual closet interface, wherein the virtual closet interface provides a plurality of clothing view synthesis renderings for display based on the plurality of respective neural radiation field models, wherein the plurality of clothing view synthesis renderings are associated with at least a subset of the plurality of different pieces of clothing.
12. The method of claim 11, wherein the plurality of user images are automatically obtained from a stored database associated with a particular user based on the obtained request data.
13. The method of claim 11, wherein the plurality of user images are selected from a corpus of user images based on at least one of metadata, one or more user inputs, or one or more classifications.
14. The method of claim 11, further comprising:
accessing, by the computing system, a storage database associated with a user; and
processing, by the computing system, a corpus of user images using one or more classification models to determine the plurality of user images that include one or more objects classified as clothing.
15. The method of claim 11, wherein the virtual closet interface comprises one or more interface features to view a clothing combination comprising two or more pieces of clothing displayed simultaneously.
16. The method of claim 11, wherein the plurality of clothing view synthesis renderings are generated based on one or more unified pose parameters and one or more unified lighting parameters.
17. One or more non-transitory computer-readable media collectively storing instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations comprising:
obtaining a plurality of user image datasets, wherein each user image dataset of the plurality of user image datasets describes one or more images comprising one or more objects, wherein the one or more images are generated using a user computing device;
processing the plurality of user image datasets using one or more classification models to determine a subset of the plurality of user image datasets that includes features describing one or more particular object types;
training a plurality of neural radiation field models based on the subset of the plurality of user image datasets, wherein each respective neural radiation field model is trained to generate a view synthesis of one or more particular objects of a respective user image dataset of the subset of the plurality of user image datasets;
generating a plurality of view synthesis renderings using the plurality of neural radiation field models, wherein the plurality of view synthesis renderings describe a plurality of different objects of the particular object type; and
providing a user interface for viewing the plurality of view synthesis renderings.
18. The one or more non-transitory computer-readable media of claim 17, wherein the user interface comprises a rendering pane for viewing the plurality of view synthesis renderings.
19. The one or more non-transitory computer-readable media of claim 17, wherein the operations further comprise:
determining a first set of user image data sets comprising features describing a first object subtype;
associating a first set of corresponding neural radiation field models with a first object subtype tag;
determining a second set of user image data sets comprising features describing a second object subtype; and
associating a second set of corresponding neural radiation field models with a second object subtype tag.
20. The one or more non-transitory computer-readable media of claim 19, wherein the operations further comprise:
receiving an integrated rendering request, wherein the integrated rendering request describes a request to generate a view rendering of a first object of the first object subtype and a second object of the second object subtype; and
generating an integrated view rendering using a first neural radiation field model of the first set of corresponding neural radiation field models and a second neural radiation field model of the second set of corresponding neural radiation field models, wherein the integrated view rendering comprises image data describing the first object and the second object in a shared environment.
CN202311736189.6A 2022-12-16 2023-12-15 Platform for enabling multiple users to generate and use neural radiation field models Pending CN117876579A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/433,111 2022-12-16
US63/433,559 2022-12-19
US18/169,425 US20240202987A1 (en) 2022-12-16 2023-02-15 Platform for Enabling Multiple Users to Generate and Use Neural Radiance Field Models
US18/169,425 2023-02-15

Publications (1)

Publication Number Publication Date
CN117876579A true CN117876579A (en) 2024-04-12

Family

ID=90580135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311736189.6A Pending CN117876579A (en) 2022-12-16 2023-12-15 Platform for enabling multiple users to generate and use neural radiation field models

Country Status (1)

Country Link
CN (1) CN117876579A (en)

Similar Documents

Publication Publication Date Title
US11403829B2 (en) Object preview in a mixed reality environment
US10402917B2 (en) Color-related social networking recommendations using affiliated colors
US12014467B2 (en) Generating augmented reality prerenderings using template images
US20200342320A1 (en) Non-binary gender filter
US20170031952A1 (en) Method and system for identifying a property for purchase using image processing
US10203847B1 (en) Determining collections of similar items
JP7331054B2 (en) Intelligent system and method for visual search queries
US10019143B1 (en) Determining a principal image from user interaction
JPWO2020090054A1 (en) Information information system, information processing device, server device, program, or method
CN116868187A (en) Computing platform for facilitating augmented reality experience with third party assets
US20180330393A1 (en) Method for easy accessibility to home design items
US20240202987A1 (en) Platform for Enabling Multiple Users to Generate and Use Neural Radiance Field Models
CN117876579A (en) Platform for enabling multiple users to generate and use neural radiation field models
JP7487399B1 (en) User-context-aware selection of rendering datasets
EP4386581A1 (en) User-context aware rendering dataset selection
US11941678B1 (en) Search with machine-learned model-generated queries
US20240202796A1 (en) Search with Machine-Learned Model-Generated Queries
KR102552621B1 (en) Method for custominzing interior of 3-dimensional and gan
US20240212021A1 (en) Device of providing content recommendation information based on relationship learning model of rendering information and method of operating same
CN113379482B (en) Article recommendation method, computing device and storage medium
US20240161423A1 (en) Systems and methods for using machine learning models to effect virtual try-on and styling on actual users
WO2024137088A1 (en) Search with machine-learned model-generated queries
Rayavel et al. Optimizing User Satisfaction and Streamlining Operations in Online Retail Using CNN Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination