US12417487B2 - Systems, method, and computer storage medium for creating listing for items for sale in an electronic marketplace based on video analysis - Google Patents
Info
- Publication number
- US12417487B2 (application number US17/562,423)
- Authority
- US
- United States
- Prior art keywords
- video
- item
- listing
- items
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0276—Advertisement creation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0613—Electronic shopping [e-shopping] using intermediate agents
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Electronic shopping [e-shopping] utilising user interfaces specially adapted for shopping
- G06Q30/0643—Electronic shopping [e-shopping] utilising user interfaces specially adapted for shopping graphically representing goods, e.g. 3D product representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Definitions
- Various applications such as electronic marketplace applications, are commonly utilized by users to perform various on-line tasks, such as selling and/or buying items in an electronic marketplace.
- In order to create a listing for an item for sale in the electronic marketplace, a user typically utilizes a computer or another device to provide various details about the item, to research the marketplace to determine an appropriate price for the item, etc.
- a user may have multiple items that the user may wish to list for sale in the electronic marketplace.
- a small business owner may wish to create an electronic store in the electronic marketplace and list their physical inventory for sale in the electronic marketplace.
- creating the multiple listings may be cumbersome and time consuming.
- the user may be inept at creating an electronic marketplace store and/or unable to perform necessary research, etc., for example due to lack of access to a computer.
- a listing application may receive a video from a user device, such as a mobile phone equipped with a camera.
- the video may include depictions and audio descriptions of multiple items that the user wishes to list in the electronic marketplace.
- the listing application may identify the multiple items depicted in the video, obtain images (e.g., screen shots) of the items from the video, extract attributes of respective ones of the multiple items from the content of the video, etc.
- the listing application may then automatically generate respective listings for the items using the images, item attributes, etc. extracted from the content of the video.
- the generated listings may then be displayed to the user for editing and/or approval by the user.
- the listing application may also generate an electronic marketplace store for the user, and may list the items in the electronic marketplace store.
- the user may easily and efficiently list an inventory of multiple items in the electronic marketplace by simply recording a video depicting and describing the multiple items that the user wishes to sell in the electronic marketplace.
- a system for assisting users in listing items for sale in an electronic marketplace comprises a processor and memory including instructions which, when executed by the processor, cause the processor to perform operations.
- the operations include receiving a video from a user device associated with a user, the video including a video stream depicting a plurality of items to be listed for sale in the electronic marketplace.
- the operations also include obtaining, from the video stream, respective images depicting respective items among the plurality of items and extracting, from the video, respective attributes of the respective items among the plurality of items.
- the operations further include generating, based at least in part on the respective attributes of the respective items among the plurality of items, respective listings for sale of the respective items.
- the operations additionally include causing the respective listings for sale of the respective items to be displayed to the user.
- a method for customizing experience of a user of an electronic marketplace application includes receiving a video from a user device associated with a user, the video including a video stream depicting a plurality of items to be listed for sale in the electronic marketplace.
- the method also includes obtaining, from the video stream, respective images depicting respective items among the plurality of items, and extracting, from the video, respective attributes of the respective items among the plurality of items.
- the method further includes generating, based at least in part on the respective attributes of the respective items among the plurality of items, respective listings for sale of the respective items.
- the method additionally includes causing the respective listings for sale of the respective items to be displayed to the user.
- a computer storage medium encodes computer executable instructions that, when executed by at least one processor, perform a method.
- the method includes receiving a video from a user device associated with a user, the video including a video stream depicting a plurality of items to be listed for sale in the electronic marketplace.
- the method also includes obtaining, from the video stream, respective images depicting respective items among the plurality of items.
- the method further includes generating, based at least in part on the respective video images obtained from the video stream, respective listings for sale of the respective items.
- the method further still includes generating an electronic store for the user, the electronic store including the respective listings for sale of the respective items.
- the method additionally includes causing the electronic store to be displayed to at least one potential buyer.
- FIG. 1 illustrates an exemplary system for creating listings for sale in an electronic marketplace based on identifying items in a video feed, in accordance with aspects of the present disclosure.
- FIG. 2 depicts an example listing generator engine, in accordance with aspects of the present disclosure.
- FIG. 3 depicts an example video processing system, in accordance with aspects of the present disclosure.
- FIGS. 4 A-C depict example features that may be provided by an electronic marketplace application to assist a user in creating listings for items that the user may wish to list for sale in an electronic marketplace, in accordance with aspects of the present disclosure.
- FIG. 5 depicts an example electronic marketplace store generated for a user, in accordance with aspects of the present disclosure.
- FIG. 6 is a block diagram illustrating example physical components of a computing device with which aspects of the disclosure may be practiced.
- aspects of the disclosure are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific example aspects.
- different aspects of the disclosure may be implemented in many different ways and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the aspects to those skilled in the art.
- aspects may be practiced as methods, systems, or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
- a listing application may receive a video from a user device, such as a mobile phone equipped with a camera, the video including depictions and, in some cases, audio descriptions of items that the user wishes to list for sale in the electronic marketplace.
- the listing application may identify, based on the content of the video, various items that the user wishes to sell, obtain images (e.g., screen shots) of the items, extract attributes of the items, etc.
- the listing application may utilize one or more trained machine learning (ML) models to process the video, convert the audio description of the items in the video to a text output including textual descriptions of the items, determine locations (e.g., timestamps) of where the items are depicted in the video, process the textual descriptions to recognize named entities and other item attributes mentioned in the video, etc.
- the listing application may then generate respective listings for the multiple items in the video based on the information gleaned from the video, such as images depicting the items, attributes of the items, etc.
- the listing application may search a products catalog to find similar items listed in the electronic marketplace.
- the listing application may obtain additional item attributes based on the similar items listed in the electronic marketplace, and may include the additional attributes in the listings generated for the items in the video. Additionally or alternatively, the listing application may determine, based on trending prices of the similar items listed in the electronic marketplace, prices that the listing application may suggest to the user for the items in the video. The generated listings may then be displayed to the user for editing and/or final approval by the user.
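The price suggestion described above can be pictured as a summary of recent prices of similar items. The following is a minimal sketch under stated assumptions: the function name `suggest_price` and the use of the median are illustrative choices, since the description only says that suggested prices are based on trending prices of similar items.

```python
from statistics import median
from typing import Optional, Sequence


def suggest_price(similar_prices: Sequence[float]) -> Optional[float]:
    """Suggest a price from recent prices of similar listings.

    Assumption: the median is used because it is robust to outliers;
    the description does not name a specific statistic.
    """
    if not similar_prices:
        return None  # no comparable listings, so no suggestion
    return round(median(similar_prices), 2)


# Hypothetical prices of similar items found in the marketplace.
print(suggest_price([49.99, 54.00, 52.50, 120.00]))
```

The user would still be free to edit the suggested value before approving the listing.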
- the listing application may also generate an electronic marketplace store for the user and may list the items in the electronic marketplace store.
- the electronic marketplace store may include a digital storefront, a title, a logo, a billboard, etc. that may enhance the user's ability to sell the items in the electronic marketplace. In these ways, the user may easily and efficiently list an inventory of multiple items in the electronic marketplace by simply recording a video depicting and describing the multiple items that the user wishes to sell in the electronic marketplace.
- FIG. 1 illustrates an exemplary system 100 for creating listings for sale in an electronic marketplace, in accordance with aspects of the present disclosure.
- the system 100 may include a user device 102 that may be configured to run or otherwise execute a client application 104 .
- the user device 102 may be a mobile device, such as a smartphone, equipped with a camera. Although a single user device 102 is illustrated in FIG. 1 , the system 100 may generally include multiple user devices 102 configured to run or otherwise execute client applications 104 .
- the user devices 102 may include, but are not limited to, laptops, tablets, smartphones, and the like.
- the applications 104 may include applications that allow users to engage with an electronic marketplace (sometimes referred to herein as “electronic marketplace applications”), for example to allow users to sell items and/or to buy items in the electronic marketplace.
- the client applications 104 may include web applications, where such client applications 104 may run or otherwise execute instructions within web browsers.
- the client applications 104 may additionally or alternatively include native client applications residing on the user devices 102 .
- the one or more user devices 102 may be communicatively coupled to an electronic marketplace server 106 via a network 108 .
- the network 108 may be a wide area network (WAN) such as the Internet, a local area network (LAN), or any other suitable type of network.
- the network 108 may be a single network or may be made up of multiple different networks, in some examples.
- the system 100 may include a database 109 .
- the database 109 may be communicatively coupled to the electronic marketplace server 106 and/or to the user device 102 via the communication network 108 , as illustrated in FIG. 1 , or may be coupled to the electronic marketplace server 106 and/or to the user device 102 in other suitable manners.
- the database 109 may be directly connected to the electronic marketplace server 106 , or may be included as part of the electronic marketplace server 106 , in some examples.
- the database 109 may be a single database or may include multiple different databases.
- the database 109 may store one or more product catalogs 111 that may include information about various items that may be listed for sale in the electronic marketplace, such as descriptions of the items, prices of the items, etc.
- the client application 104 running or otherwise executing on the user device 102 may be configured to assist a user 110 of the user device 102 in generating listings for items that the user 110 may wish to sell in the electronic marketplace.
- the client application 104 may allow the user 110 to record a video 112 depicting multiple items 114 that the user 110 wishes to sell in the electronic marketplace, and to upload the video 112 to the electronic marketplace server 106 .
- the user 110 may own or otherwise operate a physical store, for example, and the items 114 may include inventory of the physical store that the user 110 may wish to list for sale in the electronic marketplace.
- the user 110 may wish to convert the physical store to an electronic store in the electronic marketplace.
- the items 114 may include personal items that the user 110 may wish to list for sale in the electronic marketplace.
- the items 114 may thus include items in a same category (e.g., electronics, shoes, etc.) or may include items in different categories.
- the items 114 may include new items (e.g., items in unopened boxes, etc.) and/or may include used items in varying conditions.
- the user 110 may record the video 112 using a camera with which the user device 102 may be equipped or otherwise associated.
- the client application 104 may prompt the user 110 to create listings for items by recording a video using the camera of the user device 102 .
- the client application 104 may then access the camera of the user device 102 and may allow the user 110 to record the video 112 from within the client application 104 .
- the video 112 may be recorded in other suitable manners and/or using other suitable recording devices.
- the video 112 may include a video stream depicting respective ones of the multiple items 114 and an audio stream including descriptions of the respective ones of the multiple items 114 .
- the user 110 may record the video 112 , for example, by taking a sweep of the items 114 with the camera, stopping to describe each item 114 while the camera is hovering over the item 114 . While describing each item 114 , the user 110 may provide various details describing the item 114 , such as type, brand, model, color, size, condition (e.g., new, in a box, used, good condition, etc.), etc. of the item 114 . The user 110 may then move the camera to the next item 114 , and, while the camera is hovering over the next item 114 , the user 110 may provide details describing the next item 114 , and so on. In some aspects, the user 110 may provide audio cues for transitioning between items 114 , for example by saying “the next item is . . . ” or the like prior to, or in the process of, moving the camera to the next item 114 .
- the user device 102 may transmit the video 112 via the network 108 to the electronic marketplace server 106 .
- the electronic marketplace server 106 may receive the video 112 and may provide the video 112 to a listing application 116 that may be running or otherwise executing on the electronic marketplace server 106 .
- the listing application 116 may process a video stream of the video 112 to identify items 114 depicted in the video 112 and extract respective attributes of the items 114 from the content of the video 112 .
- the listing application 116 may convert the audio stream of the video 112 to a text output, recognize named entities and descriptions in the text output, identify and process video frames that depict the items 114 in the video 112 , etc.
- the listing application 116 may generate respective listings 118 for respective ones of the items 114 .
- the listing application 116 may search one or more product catalogs 111 stored in the database 109 to find one or more similar items that may be listed for sale in the electronic marketplace.
- the listing application 116 may generate, based on information gleaned from the video 112 , a representation (e.g., a vector representation) of the item 114 , and may utilize the representation of the item 114 to query the one or more product catalogs 111 on the database 109 .
- the listing application 116 may utilize information (e.g., attributes, descriptions, etc.) from one or more matching product entries obtained from the product catalogs 111 to populate fields of the listing 118 for the item 114 .
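The catalog query described above can be sketched as a nearest-neighbor search over vectors. This is a simplified illustration, not the disclosed implementation: the catalog contents, the three-dimensional vectors, and the helper names `cosine` and `best_match` are hypothetical, and cosine similarity is an assumed choice of similarity measure.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query: List[float], catalog: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the id and similarity of the catalog entry closest to the query.

    Assumes a non-empty catalog; max() raises ValueError otherwise.
    """
    scored = ((pid, cosine(query, vec)) for pid, vec in catalog.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical catalog embeddings and item representation.
catalog = {"headset-red": [0.9, 0.1, 0.0], "sneaker-blue": [0.0, 0.2, 0.9]}
product_id, score = best_match([0.8, 0.2, 0.1], catalog)
print(product_id, round(score, 3))
```

A production catalog would likely use an approximate nearest-neighbor index rather than a linear scan; the linear scan is used here only to keep the sketch self-contained.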
- the listing application 116 is further configured to generate an electronic store 120 for the user and to associate the listings 118 with the electronic store 120 generated for the user.
- the listing application 116 may generate the electronic storefront based on the items 114 (e.g., based on the type of items 114 , etc.) and/or based on additional information that may be provided by the user 110 via the client application 104 , for example.
- the electronic storefront may include a store title, a store logo, a particular color scheme, etc.
- the items 114 may then be displayed to potential buyers within the electronic storefront generated for the user 110 .
- the user 110 may thus be able to quickly and efficiently list the multiple items 114 that the user wishes to sell in the electronic marketplace, such as inventory of a physical store that the user 110 may wish to sell in the electronic marketplace.
- the item identifier engine 202 may be configured to identify locations (e.g., timestamps corresponding to the locations) within the video 208 at which respective items 114 are depicted and/or described in the video 208 .
- the item identifier engine 202 may identify timestamps corresponding to locations within the video 208 at which particular items are being described by the user.
- the item identifier engine 202 may identify the locations additionally or alternatively based on audio cues (e.g., transition cues), for example based on detecting phrases such as “next I have . . . ,” “my next item is . . . ,” etc.
- the item identifier engine 202 may identify the locations using motion detection, for example to detect locations at which the user is hovering over the items in the video 208 .
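As an illustration of the audio-cue approach, the sketch below scans timestamped transcript segments for transition phrases and returns the times at which a new item appears to start. The segment format, the `CUE` pattern, and the function name `item_start_times` are assumptions made for this example; the phrases are taken from the examples in the description, and a real list would presumably be longer.

```python
import re
from typing import List, Tuple

# Cue phrases drawn from the examples in the description; the exact list is an assumption.
CUE = re.compile(
    r"\b(next i have|my (first|next) item is|the next item is)\b",
    re.IGNORECASE,
)


def item_start_times(segments: List[Tuple[float, str]]) -> List[float]:
    """Return start timestamps (seconds) of segments that contain a transition cue."""
    return [start for start, text in segments if CUE.search(text)]


# Hypothetical speech-to-text output: (start time in seconds, recognized text).
segments = [
    (0.0, "My first item is a red gaming headset"),
    (6.5, "it connects over USB and is in good condition"),
    (12.0, "Next I have a pair of running shoes, size ten"),
]
print(item_start_times(segments))
```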
- the text output generated by the speech recognition engine 210 - 1 , along with identifiers of locations at which the items are depicted and/or described in the video 208 , may be provided to the attribute extraction engine 210 - 2 .
- the attribute extraction engine 210 - 2 may comprise a named entity recognition model trained to recognize item attributes based on textual descriptions of items that may be depicted in the video 208 , for example.
- the attribute extraction engine 210 - 2 may process the text output from the speech recognition engine 210 - 1 to extract item attributes from the content of the video 208 .
- Item attributes extracted from the content of the video 208 for a particular item 114 may include a brand of the item 114 , a model of the item 114 , a size of the item 114 , condition of the item 114 , etc.
- the item attributes extracted by the attribute extraction engine 210 - 2 may be provided to the item representation generator 210 - 3 .
- the item identifier engine 202 may extract images (e.g., video frames or screen shots) depicting the items 114 from the video 208 , and may provide the images as inputs to the item representation generator 210 - 3 .
- the item representation generator 210 - 3 may be configured to generate a single representation of an item that may be, for example, in a form of a vector representing the item.
- the item representation generator 210 - 3 may comprise a machine learning model trained to generate an item representation (e.g., a vector) for an item from various modalities (e.g., images, textual descriptions, etc.) corresponding to the item.
- the item representation generator 210 - 3 may thus generate respective representations for the multiple items 114 based on the information (images, attributes, etc.) that may be gleaned from the content of the video 208 .
- the item identifier engine 202 may utilize the item representations generated by the item representation generator 210 - 3 as queries to search a product catalog to find products closely matching the items 114 . Information from an entry in the product catalog matching an item 114 may then be provided to the listing generator 204 and may be utilized by the listing generator 204 to populate fields of a listing that the listing generator 204 may generate for the item 114 . Additionally or alternatively, information (e.g., images, item attributes, etc.) gleaned from the video 208 may be provided to the listing generator 204 and may be utilized for generating listings by the listing generator 204 .
- the listing generator 204 may generate a listing for an item 114 to include an image extracted from the video 208 depicting the item 114 .
- the listing generator 204 may use one or more attributes (e.g., brand, model, color, size, etc.) extracted from the content of the video 208 for an item 114 to directly populate corresponding fields in a listing generated for the item 114 .
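The field population described above might look like the following sketch. The `Listing` dataclass, its field names, and the rule that attributes spoken in the video take precedence over catalog values are all assumptions for illustration; the description does not specify how conflicts are resolved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Listing:
    image_path: str
    brand: Optional[str] = None
    model: Optional[str] = None
    color: Optional[str] = None
    size: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)


def build_listing(image_path: str,
                  video_attrs: Dict[str, str],
                  catalog_attrs: Optional[Dict[str, str]] = None) -> Listing:
    """Fill listing fields from video attributes, falling back to a catalog match."""
    # Assumption: values extracted from the video win over catalog values.
    merged = {**(catalog_attrs or {}), **video_attrs}
    known = {"brand", "model", "color", "size"}
    listing = Listing(image_path=image_path,
                      **{k: v for k, v in merged.items() if k in known})
    # Attributes without a dedicated field are kept as free-form extras.
    listing.extra = {k: v for k, v in merged.items() if k not in known}
    return listing


# Hypothetical inputs: a frame saved from the video plus two attribute sources.
listing = build_listing(
    "frames/item_0.jpg",
    {"color": "Red", "size": "One size"},
    {"model": "Chroma", "color": "Black", "connectivity": "USB"},
)
print(listing)
```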
- the store generator 206 may generate an electronic store for the user, and may include the listings generated by the listing generator 204 in the electronic store generated for the user.
- the store generator 206 may generate an electronic storefront for the electronic store based on the items 114 (e.g., based on the type of items 114 , etc.) and/or based on additional information that may be provided by the user 110 via the application 104 , for example.
- the electronic storefront may include a store title, a store logo, a particular color scheme, etc.
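One simple way to derive a storefront from the items is to key it on the most common item type. The sketch below is only an illustration: the function name `generate_storefront`, the title format, and the mapping from item type to color scheme are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def generate_storefront(owner: str, item_types: List[str]) -> Dict[str, str]:
    """Derive a store title and color scheme from the most common item type."""
    if not item_types:
        return {"title": f"{owner}'s Store", "color_scheme": "neutral"}
    top_type, _count = Counter(item_types).most_common(1)[0]
    schemes = {"electronics": "dark", "shoes": "bright"}  # hypothetical mapping
    return {
        "title": f"{owner}'s {top_type.title()} Store",
        "color_scheme": schemes.get(top_type, "neutral"),
    }


print(generate_storefront("Alex", ["electronics", "shoes", "electronics"]))
```

Additional information supplied by the user, such as a preferred store name or logo, could override these defaults.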
- the listing application 200 may generate a listing output 214 that may include the multiple listings generated by the listing generator 204 and/or the electronic store generated by the store generator 206 , and may provide the output 214 to the client application 104 for display to the user 110 in the user interface 122 of the client application 104 .
- FIG. 3 depicts an example item identifier engine 300 , in accordance with aspects of the present disclosure.
- the item identifier engine 300 corresponds to the item identifier engine 202 of FIG. 2 .
- the item identifier engine 300 is described with reference to FIG. 2 .
- the item identifier engine 300 may be utilized with an application different from the listing application 200 of FIG. 2 .
- the item identifier engine 300 may include a video processing engine 302 , a named entity recognition engine 304 and a multimodal model 306 .
- the video processing engine 302 may be configured to process a video 308 (e.g., corresponding to the video 208 of FIG. 2 ) that may include a video stream depicting multiple items (e.g., items 114 ) that a user wishes to list for sale in the electronic marketplace.
- the video 308 may also include an audio stream providing description of the multiple items in the video 308 .
- the video processing engine 302 may use one or more of a general speech recognition model 310 - 1 , an electronic commerce aware language model 310 - 2 and one or more product category models 310 - 3 to convert at least a portion of the audio stream of the video 308 to a text output containing words or “tokens” that may be recognized in the audio stream describing the multiple items in the video 308 .
- the general speech recognition model 310 - 1 may be a general-purpose automatic speech recognition (ASR) engine or service.
- the electronic commerce aware language model 310 - 2 may be a model (e.g., an ML model such as a neural network) trained to recognize electronic marketplace specific terms, words, expressions, etc., for example.
- the one or more product category language models 310 - 3 may include category models such as electronics category language model, shoes category language model, etc., configured to boost speech recognition of hot words, terms, expressions, etc. in the corresponding product category.
- a product category language model 310 - 3 for the electronics category may boost brand names such as Razer, iPhone, etc. to enhance ability of the video processing engine 302 to distinguish between the brand names and similar-sounding words such as razor and phone.
- the video processing engine 302 may be configured to select one or more appropriate product category models 310 - 3 , for example based on prior knowledge of categories of the items 114 and/or based on identifying the categories of the items 114 from the content of the video 308 , and to utilize the selected one or more product category models to boost appropriate words, terms, expressions, etc. when converting speech to text in the video 308 .
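Hot-word boosting can be illustrated by rescoring a list of candidate transcriptions. This is a minimal sketch assuming the recognizer exposes an n-best list with scores where higher is better; the function name `rescore`, the score values, and the bonus weights are hypothetical, and real speech recognition services typically apply boosting inside the decoder rather than as a post-processing step.

```python
from typing import Dict, List, Tuple


def rescore(hypotheses: List[Tuple[str, float]], boosts: Dict[str, float]) -> str:
    """Pick the hypothesis with the best score after adding hot-word bonuses."""
    def boosted(hyp: Tuple[str, float]) -> float:
        text, score = hyp
        words = text.lower().split()
        # Add the bonus of every boosted term that appears as a whole word.
        return score + sum(b for term, b in boosts.items() if term.lower() in words)

    return max(hypotheses, key=boosted)[0]


# Hypothetical n-best list with log-probability-like scores.
n_best = [("an i phone twelve in a box", -3.9), ("an iphone twelve in a box", -4.2)]
electronics_boosts = {"iphone": 0.5}  # hypothetical electronics category boosts
print(rescore(n_best, electronics_boosts))
```

Without the category boosts, the first hypothesis would be selected because its raw score is higher.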
- the video processing engine 302 may be configured to identify, based on the content of the video 308 , locations at which the respective items 114 are depicted and/or described in the video 308 . For example, the video processing engine 302 may determine a location at which an item is depicted in the video 308 using motion detection to detect a location at which a camera was hovering over an item. Additionally or alternatively, the video processing engine 302 may determine a location at which an item is depicted in the video 308 based on audio cues, such as based on detecting transitional phrases such as “my first item is . . . ,” “my next item is . . . ,” in the audio stream of the video 308 .
- the video processing engine 302 may generate timestamps or other suitable indicators to indicate locations at which the respective items 114 are depicted and/or described in the video 308 .
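The motion-based approach can be sketched as finding stretches of low motion that last long enough to count as hovering. The example assumes a per-frame motion magnitude has already been computed (for instance, from frame differences); the function name `hover_intervals`, the threshold, and the minimum duration are illustrative assumptions.

```python
from typing import List, Tuple


def hover_intervals(motion: List[float], fps: float, threshold: float = 1.0,
                    min_seconds: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds where motion stays below threshold."""
    intervals = []
    start = None
    # A sentinel value at the end closes a low-motion run that reaches the last frame.
    for i, magnitude in enumerate(motion + [float("inf")]):
        if magnitude < threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) / fps >= min_seconds:
                intervals.append((start / fps, i / fps))
            start = None
    return intervals


# Hypothetical per-frame motion magnitudes sampled at 2 frames per second.
motion = [5.0, 0.2, 0.1, 0.3, 6.0, 7.0, 0.4, 0.2, 0.1, 0.2]
print(hover_intervals(motion, fps=2.0))
```

The start of each interval can serve as the timestamp for the corresponding item.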
- the video processing engine 302 may extract, from the video 308 , respective images (e.g., screen shots) that depict the respective items 114 .
- the video processing engine 302 may obtain an image for an item by taking a screen shot at a location, in the video 308 , corresponding to the timestamp identified for the item 114 .
- the video processing engine 302 may be configured to identify one or more best images depicting an item 114 , for example by aligning images with the text output including a description of the item.
- the video processing engine 302 may split the video 308 into multiple videos depicting respective items 114 , for example based on motion detection and/or transition phrase detection in the video 308 .
- the multiple videos may then be processed in parallel by the item identifier engine 300 to expedite creation of listings for the multiple items 114 .
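Splitting and parallel processing can be sketched with a thread pool from the Python standard library. The helper names `split_by_timestamps` and `process_segment` are hypothetical, and the per-segment work is a placeholder for the actual frame extraction and transcription.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def split_by_timestamps(starts: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn item start times into consecutive (start, end) segments."""
    bounds = sorted(starts) + [duration]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def process_segment(segment: Tuple[float, float]) -> Dict[str, float]:
    """Placeholder for per-item work such as frame extraction and transcription."""
    start, end = segment
    return {"start": start, "end": end, "length": end - start}


# Hypothetical item start times in a 45-second video.
segments = split_by_timestamps([0.0, 12.0, 30.5], duration=45.0)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_segment, segments))  # map preserves input order
print(results)
```

For CPU-bound work such as decoding video frames, a process pool or a job queue may be a better fit than threads; threads are used here only to keep the example short.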
- the text output generated by the video processing engine 302 and the indicators (e.g., timestamps) indicating locations that depict and/or describe the respective items 114 in the video 308 may be provided to the named entity recognition engine 304 .
- the named entity recognition engine 304 may be configured to extract attributes of the items 114 from the text output generated by the video processing engine 302 .
- the named entity recognition engine 304 may comprise a multilingual model configured to recognize a language (e.g., English, Spanish, etc.) of the text output generated by the video processing engine 302 , and extract attributes of the items 114 in the corresponding language.
- the attributes extracted from the text output generated by the video processing engine 302 may comprise values corresponding to respective attribute categories, such as brand, model, type, color, size, condition, electrical feature (e.g., connectivity type), etc.
- the named entity recognition engine 304 may analyze a portion of the text output corresponding to a particular item 114 to generate attribute pairs, each attribute pair including an attribute category and an attribute value extracted from the text output generated by the video processing engine 302 .
- attribute pairs may include pairs such as [Brand:Razer], [ProductID:Chroma], [Type:Headset], [Connectivity:USB], [Color:Red], and so on.
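The shape of the attribute-pair output can be illustrated with a simple dictionary lookup. This sketch deliberately swaps the trained named entity recognition model for a small hand-written gazetteer; the `GAZETTEER` contents and the function name `extract_attribute_pairs` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a trained named entity recognition model.
GAZETTEER: Dict[str, List[str]] = {
    "Type": ["headset", "shoes"],
    "Connectivity": ["usb", "bluetooth"],
    "Color": ["red", "black", "blue"],
    "Condition": ["new", "used"],
}


def extract_attribute_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (category, value) pairs for gazetteer terms found in the text."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    pairs = []
    for category, values in GAZETTEER.items():
        for value in values:
            if value in tokens:
                pairs.append((category, value))
    return pairs


print(extract_attribute_pairs("My next item is a red headset, USB, used."))
```

A trained model would also handle brands, model names, and multi-word values that a fixed word list cannot cover.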
- the attributes of an item 114 extracted by the named entity recognition engine 304 may be provided as a modality descriptive of the item 114 to the multimodal model 306 .
- One or more additional modalities descriptive of the item 114 may also be provided to the multimodal model 306 .
- the one or more images depicting the item 114 obtained from the video 308 may be provided as an additional modality descriptive of the item 114 to the multimodal model 306 .
- the multimodal model 306 may generate a representation of the item 114 based on the multiple modalities descriptive of the item 114 .
- the multimodal model 306 may comprise a trained ML multimodal model, such as a vision-text Bidirectional Encoder Representations from Transformers (VT-BERT) model.
- VT-BERT vision-text Bidirectional Encoder Representations from Transformer
- the multimodal model 306 may be implemented in other suitable manners.
- the multimodal model 306 may generate the item representations for the items 114 in the form of item embeddings, or vectors representing the items 114 .
- the item representations generated by the multimodal model 306 may be utilized to search a product catalog to find similar items for generating listings for the items 114 as described above.
- the item representations generated by the multimodal model 306 may be stored in association with the listings generated for the items 114 so that the listings may subsequently be searched based on queries that may be provided by potential buyers, for example.
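As a rough illustration of how an item embedding might be used to search a product catalog, the sketch below ranks catalog entries by cosine similarity to a query vector. The choice of cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions made for the example; the description does not specify the similarity measure or the embedding dimensionality.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_items(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k catalog entries most similar to the query embedding."""
    scored = [(item_id, cosine_similarity(query, emb)) for item_id, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog of precomputed item embeddings.
catalog = {
    "headset-a": [1.0, 0.0],
    "mouse-b": [0.0, 1.0],
    "headset-c": [0.9, 0.1],
}
matches = find_similar_items([1.0, 0.0], catalog, top_k=2)
```

The same ranking function could serve both uses mentioned above: finding similar catalog items when a listing is generated, and matching buyer queries against stored listing embeddings, provided the query is embedded into the same vector space.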
- FIGS. 4 A-C depict example features that may be provided by an electronic marketplace application 400 to assist a user in creating listings for items that the user may wish to list for sale in an electronic marketplace.
- the electronic marketplace application 400 may correspond to the electronic marketplace application 104 of the system 100 of FIG. 1 .
- the electronic marketplace application 400 is described with reference to FIG. 1 .
- the electronic marketplace application 400 may be utilized with a system different from the system 100 of FIG. 1 .
- an example user interface prompt 402 may be displayed in a user interface 404 of the electronic marketplace application 400 to prompt the user to initiate creation of an electronic store, in accordance with aspects of the present disclosure.
- the user interface 404 may correspond to the user interface 116 of FIG. 1 .
- the user interface prompt 402 may be displayed in a profile screen 406 in the user interface 404 when the user is logged into the electronic marketplace application 400 .
- the profile screen 406 may display a username 408 and a logo 410 associated with the user logged into the electronic marketplace application 400 .
- the username 408 may correspond to a business name associated with a business of the user, such as a physical store owned or otherwise operated by the user.
- the logo 410 may be, for example, a logo of the physical store owned or otherwise operated by the user.
- the username 408 and/or the logo 410 may not be associated with a physical store.
- the username 408 and/or the logo 410 may be stored in a database in association with a user account of the user, and may be retrieved from the database for display in the profile screen 406 .
- the user interface prompt 402 may comprise a clickable button or icon, for example.
- the user may click on, or otherwise engage with, the user interface prompt 402 to initiate creation of an electronic store.
- the electronic marketplace application 400 may access a camera on the user device 102 (e.g., a mobile phone), and may allow the user to record a video (e.g., the video 112 ) depicting and describing multiple items (e.g., the items 114 ) that the user wishes to include as inventory in the electronic store.
- the electronic marketplace application 400 may transmit the video recorded by the user to the electronic marketplace server 106 for generation of listings for the multiple items and, in some cases, a storefront for the electronic store, by the listing application 116 .
- an edit store screen 452 may display listings 454 generated for the user based on the video recorded by the user to allow the user to view and, if necessary and/or desired, edit details of the listings.
- the edit store screen 452 may additionally display one or more fields for entering or editing details of an electronic store generated for the user.
- the edit store screen 452 may include a title field 456 that may allow the user to enter a name or a title for the electronic store.
- the title field 456 may include a title or name that may be generated by the listing application 116 for the electronic store, for example based on the content of the video recorded by the user, and may allow the user to edit the generated name or title for the electronic store.
- the edit store screen 452 may display a billboard field 458 that may allow the user to upload a billboard image or a logo for the electronic store.
- the edit store screen 452 may additionally or alternatively display other suitable fields for prompting the user to provide and/or edit details for the electronic store.
- the edit store screen 452 may additionally display a publish icon or button 460 for publishing the electronic store. The user may click on or otherwise engage with the publish icon or button 460 to approve and publish the electronic store in the electronic marketplace.
- a store screen 470 may display an electronic store generated for the user.
- the store screen 470 may include a title field 472 displaying the name or title generated for and/or entered or edited by the user via the edit store screen 452 .
- the store screen 470 may also include a logo field 474 displaying a logo for the electronic store that may have been provided by the user (e.g., via the edit store screen 452 ) or otherwise generated for the electronic store associated with the user.
- the store screen 470 may additionally provide a display of the listings for the items that were depicted in the video recorded by the user.
- the store screen 470 may display a search field 476 to allow potential buyers to search for products in the electronic store.
- FIG. 5 illustrates an exemplary method 500 for generating listings for items for sale in an electronic marketplace, in accordance with aspects of the present disclosure.
- the method 500 may be performed by a server device to generate respective listings for multiple items based on a video received from a user device.
- a general order of the operations for the method 500 is shown in FIG. 5 .
- the method 500 may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 5 .
- the method 500 can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer readable medium. Further, the method 500 can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, a SOC or other hardware device.
- Method 500 begins at operation 502 at which a video is received from a user device.
- the video may include a video stream depicting a plurality of items to be listed for sale in the electronic marketplace.
- the video 112 including a video stream depicting the items 114 is received.
- another suitable video depicting suitable items different from the items 114 is received.
- the video may have been recorded by the user using a camera of the user device, such as a mobile phone, for example.
- the user may have recorded the video by taking a sweep (e.g., a 360 degree sweep) of an inventory of a physical store that is operated by the user, for example.
- respective images depicting respective items among the plurality of items are obtained from the video stream of the video.
- the respective images are obtained based on determining locations (e.g., timestamps) corresponding to video frames, in the video stream, in which the items are depicted.
- the locations are determined based on the audio stream, for example by determining locations in the video at which the items are being described by the user. Additionally or alternatively, the locations are determined based on motion detection, detecting that the user is hovering over the item in the video. In other aspects, other suitable factors are additionally or alternatively utilized to determine locations at which the items are depicted in the video.
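One way to picture the audio-based location step is the sketch below, which assumes the speech-to-text stage has already produced time-stamped segments labeled with the item being described. The `DescriptionSegment` type, the `item_id` labels, and the rule of taking the midpoint of the longest segment are all illustrative assumptions; the description only states that locations may be determined from the audio stream, motion detection, or other suitable factors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DescriptionSegment:
    item_id: str    # hypothetical label for the item being described
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds


def frame_timestamps(segments: List[DescriptionSegment]) -> Dict[str, float]:
    """Map each item to the midpoint of its longest description segment."""
    longest: Dict[str, DescriptionSegment] = {}
    for seg in segments:
        current = longest.get(seg.item_id)
        if current is None or (seg.end_s - seg.start_s) > (current.end_s - current.start_s):
            longest[seg.item_id] = seg
    return {item: (seg.start_s + seg.end_s) / 2.0 for item, seg in longest.items()}


segments = [
    DescriptionSegment("headset", 2.0, 6.0),
    DescriptionSegment("keyboard", 7.0, 9.0),
    DescriptionSegment("headset", 10.0, 11.0),
]
timestamps = frame_timestamps(segments)
```

The resulting timestamps would then be used to pull the corresponding frames from the video stream as the item images.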
- respective attributes of the respective items among the plurality of items are extracted from content of the video.
- the attributes are extracted from descriptions of the items in the video.
- at least a portion of the audio stream of the video may be converted to a text output containing textual descriptions of the respective items, and the attributes of the items may be extracted from the textual descriptions of the respective items.
- portions of the textual descriptions corresponding to respective items may be analyzed by a named entity recognition model to extract attributes of the corresponding items.
- the attributes of the respective items may be extracted from content of the video in other suitable manners.
- respective listings for sale of the respective items are generated.
- the respective listings are generated based at least in part on the respective attributes of the respective items extracted from the content of the video.
- a listing for a particular item is generated to include the attributes of the particular item that are extracted at block 506 .
- the listing may further include an image of the particular item obtained at block 504 .
- a products catalog may be searched to find one or more similar items listed in the electronic marketplace, and the listing may be generated based on attributes that may be associated with the similar items in the electronic marketplace.
- a representation (e.g., a vector representation) of a particular item may be generated (e.g., using a trained multimodal model) based on multiple modalities descriptive of the particular item, such as one or more images of the particular item obtained from the video, the attributes of the item extracted from content of the video, etc., and the representation of the particular item may be utilized to search the product catalog to find one or more similar items.
- other suitable techniques may be utilized to generate the listings for the respective items.
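The listing-generation step can be sketched as a merge of the attributes extracted from the video with attributes borrowed from a similar catalog item. The `build_listing` function, the rule that extracted values take precedence over catalog values, and the way the title is assembled are assumptions made for illustration; the description does not prescribe how the two sources are combined.

```python
from typing import Dict, Optional


def build_listing(
    extracted: Dict[str, str],
    image_ref: Optional[str],
    similar_item: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Combine extracted attributes, an item image, and catalog attributes into a listing.

    Attributes from the similar catalog item fill gaps only; values extracted
    from the video win on conflict (an assumed precedence rule).
    """
    attributes = dict(similar_item or {})
    attributes.update(extracted)
    # Assumed title convention: brand, product identifier, then type, when present.
    title_parts = [attributes[key] for key in ("Brand", "ProductID", "Type") if key in attributes]
    return {"title": " ".join(title_parts), "attributes": attributes, "image": image_ref}


listing = build_listing(
    extracted={"Brand": "Razar", "Type": "Headset"},
    image_ref="frame_0004.jpg",  # hypothetical reference to a frame taken from the video
    similar_item={"Brand": "RAZAR INC", "ProductID": "Chroma", "Connectivity": "USB"},
)
```

Keeping the merged attributes as a plain mapping makes it straightforward for the user to review and edit individual fields before publishing.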
- the respective listings for sale of the respective items are provided for display to the user.
- the respective listings may be transmitted to the user device and may be displayed to the user via a user interface of a marketplace application that may be running or otherwise executing on the user device.
- the user may then edit the listings as needed and may approve the listings for publishing in the electronic marketplace.
- the user may thus easily and efficiently list an inventory of multiple items in the electronic marketplace by simply recording a video depicting and describing the items that the user wishes to sell in the electronic marketplace.
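The overall flow of the method 500 can be summarized as a small orchestration function. This is a sketch under stated assumptions: the function name `generate_listings`, the callable parameters, and the dictionary shapes are hypothetical, and each stage is passed in as a stub rather than implemented, since the description leaves the concrete techniques open.

```python
from typing import Any, Callable, Dict, List


def generate_listings(
    video: Any,
    obtain_images: Callable[[Any], Dict[str, Any]],
    extract_attributes: Callable[[Any], Dict[str, Dict[str, str]]],
    make_listing: Callable[[str, Any, Dict[str, str]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Run the stages in order on a received video (operation 502 supplies `video`)."""
    images = obtain_images(video)           # images per item (block 504)
    attributes = extract_attributes(video)  # attributes per item (block 506)
    listings = []
    for item_id, image in images.items():   # listing generation step
        listings.append(make_listing(item_id, image, attributes.get(item_id, {})))
    return listings                         # returned so they can be displayed to the user


# Toy stand-ins for the real stages, for illustration only.
video = {"frames": {"a": "img_a", "b": "img_b"}, "speech": {"a": {"Color": "Red"}}}
listings = generate_listings(
    video,
    obtain_images=lambda v: v["frames"],
    extract_attributes=lambda v: v["speech"],
    make_listing=lambda item_id, image, attrs: {"item": item_id, "image": image, "attributes": attrs},
)
```

Passing the stages in as callables mirrors the description's point that each step may be carried out in other suitable manners without changing the overall order of operations.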
- FIG. 6 illustrates a simplified block diagram of a device with which aspects of the present disclosure may be practiced.
- One or more of the present aspects may be implemented in an operating environment 600 .
- This is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality.
- Other well-known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics such as smartphones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
- the operating environment 600 typically includes at least one processing unit 602 and memory 604 .
- memory 604 (storing instructions to perform customization of applications as described herein) may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two.
- This most basic configuration is illustrated in FIG. 6 by dashed line 606 .
- the operating environment 600 may also include storage devices (removable, 608 , and/or non-removable, 610 ) including, but not limited to, magnetic or optical disks or tape.
- the operating environment 600 may also have input device(s) 614 such as keyboard, mouse, pen, voice input, on-board sensors, etc., and/or output device(s) 616 such as a display, speakers, printer, motors, etc.
- Also included in the environment may be one or more communication connections, 612 , such as LAN, WAN, a near-field communications network, point to point, etc.
- Operating environment 600 typically includes at least some form of computer readable media.
- Computer readable media can be any available media that can be accessed by at least one processing unit 602 or other devices comprising the operating environment.
- Computer readable media may comprise computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible, non-transitory medium which can be used to store the desired information.
- Computer storage media does not include communication media.
- Computer storage media does not include a carrier wave or other propagated or modulated data signal.
- Communication media embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
- the term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
- the operating environment 600 may be a single computer operating in a networked environment using logical connections to one or more remote computers.
- the remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above as well as others not so mentioned.
- the logical connections may include any method supported by available communications media.
- Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/562,423 US12417487B2 (en) | 2021-12-27 | 2021-12-27 | Systems, method, and computer storage medium for creating listing for items for sale in an electronic marketplace based on video analysis |
| CN202211462495.0A CN116362822A (en) | 2021-12-27 | 2022-11-17 | System and method for creating a listing of items for sale in an electronic marketplace |
| EP22209585.3A EP4202818A1 (en) | 2021-12-27 | 2022-11-25 | Systems and methods for creating listings for items for sale in an electronic marketplace |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/562,423 US12417487B2 (en) | 2021-12-27 | 2021-12-27 | Systems, method, and computer storage medium for creating listing for items for sale in an electronic marketplace based on video analysis |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230206314A1 (en) | 2023-06-29 |
| US12417487B2 (en) | 2025-09-16 |
Family
ID=84363237
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/562,423 Active 2042-01-01 US12417487B2 (en) | 2021-12-27 | 2021-12-27 | Systems, method, and computer storage medium for creating listing for items for sale in an electronic marketplace based on video analysis |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12417487B2 (en) |
| EP (1) | EP4202818A1 (en) |
| CN (1) | CN116362822A (en) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090012878A1 (en) * | 2005-08-09 | 2009-01-08 | Tedesco Daniel E | Apparatus, Systems and Methods for Facilitating Commerce |
| US20100217684A1 (en) | 2009-02-24 | 2010-08-26 | Ryan Melcher | System and method to create listings using image and voice recognition |
| US20110099085A1 (en) * | 2009-10-23 | 2011-04-28 | Scot Fraser Hamilton | Product identification using multiple services |
| US20140337174A1 (en) * | 2013-05-13 | 2014-11-13 | A9.Com, Inc. | Augmented reality recomendations |
| US20170024791A1 (en) * | 2007-11-20 | 2017-01-26 | Theresa Klinger | System and method for interactive metadata and intelligent propagation for electronic multimedia |
| WO2019183061A1 (en) | 2018-03-20 | 2019-09-26 | A9.Com, Inc. | Object identification in social media post |
| US20200151837A1 (en) | 2018-11-08 | 2020-05-14 | Sony Interactive Entertainment LLC | Method for performing legal clearance review of digital content |
| US20220115043A1 (en) * | 2020-10-08 | 2022-04-14 | Adobe Inc. | Enhancing review videos |
| US11432046B1 (en) * | 2015-06-12 | 2022-08-30 | Veepio Holdings, Llc | Interactive, personalized objects in content creator's media with e-commerce link associated therewith |
Non-Patent Citations (2)
| Title |
|---|
| Extended European Search Report Received for European Patent Application No. 22209585.3, mailed on Apr. 3, 2023, 10 pages. |
| Florez, O. U. (2013). Knowledge extraction in video through the interaction analysis of activities knowledge extraction in video through the interaction analysis of activities (Order No. 3587563). Retrieved from https://dialog.proquest.com/professional/docview/1427350585?accountid=131444. * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116362822A (en) | 2023-06-30 |
| US20230206314A1 (en) | 2023-06-29 |
| EP4202818A1 (en) | 2023-06-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110134931B (en) | Medium title generation method, medium title generation device, electronic equipment and readable medium | |
| US20120078626A1 (en) | Systems and methods for converting speech in multimedia content to text | |
| CN107832338B (en) | Method and system for recognizing core product words | |
| CN113836950B (en) | Commodity title text translation method and device, equipment and medium thereof | |
| US11443180B1 (en) | Mapping content to an item repository | |
| WO2024188044A1 (en) | Video tag generation method and apparatus, electronic device, and storage medium | |
| US20180012282A1 (en) | Image-based shopping system | |
| CN114896452A (en) | Video retrieval method and device, electronic equipment and storage medium | |
| US11514913B2 (en) | Collaborative content management | |
| KR20220130863A (en) | Apparatus for Providing Multimedia Conversion Content Creation Service Based on Voice-Text Conversion Video Resource Matching | |
| US12417487B2 (en) | Systems, method, and computer storage medium for creating listing for items for sale in an electronic marketplace based on video analysis | |
| CN111259131B (en) | Information processing method, medium, device and computing equipment | |
| KR20220079029A (en) | Method for providing automatic document-based multimedia content creation service | |
| CN113127597B (en) | Search information processing method and device and electronic equipment | |
| KR20250016409A (en) | Method of generating review video customized user using feedback data and serrver performing thereof | |
| KR20220079026A (en) | A apparatus for providing general document-based multimedia image content production service | |
| CN109739970B (en) | Information processing method and device and electronic equipment | |
| US20250110985A1 (en) | Personalized ai assistance using ambient context | |
| KR102435243B1 (en) | A method for providing a producing service of transformed multimedia contents using matching of video resources | |
| KR20220079042A (en) | Program recorded medium for providing service | |
| KR20220079060A (en) | Resource database device for document-based video resource matching and multimedia conversion content production | |
| KR20220079073A (en) | Production interface device for multimedia conversion content production service providing device | |
| KR20220079019A (en) | A program for providing multimedia contents production service | |
| EP4386651B1 (en) | Product identification assistance techniques in an electronic marketplace application | |
| KR20220079024A (en) | Program recording medium for providing services that convert documents into multimedia contents |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: EBAY INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PERIYATHAMBI, RAMESH;LANCEWICKI, TOMER;HEWAVITHARANA, SANJIKA;AND OTHERS;SIGNING DATES FROM 20211215 TO 20211227;REEL/FRAME:058482/0340 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: EBAY INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:REEVES, RYAN;REEL/FRAME:060647/0023 Effective date: 20220727 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | STCV | Information on status: appeal procedure | Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |