WO2023220172A1 - Systems and methods for ingesting and processing enrichable content

Info

Publication number
WO2023220172A1
Authority
WO
WIPO (PCT)
Prior art keywords
client device
content
action
hash
enrichable
Prior art date
Application number
PCT/US2023/021728
Other languages
English (en)
Inventor
Michael Muller
Sharmil HASSAN
Original Assignee
Michael Muller
Hassan Sharmil
Priority date
Filing date
Publication date
Application filed by Michael Muller and Sharmil Hassan
Publication of WO2023220172A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/002 Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K 17/00 Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations

Definitions

  • Embodiments described herein relate to systems and methods for ingesting and processing enrichable content and delivering enriched content to a client device.
  • An object or media may be affixed or rendered with a label, barcode, beacon, or tag that encodes information about that object or media, such as an identification number, a uniform resource locator (URL), or the like, linking to supplemental information about the object or media.
  • Examples include Quick Response (QR) codes, bar codes, Bluetooth Low Energy (BLE) beacons, Near-Field Communication (NFC) tags, radio frequency identification tags (RFID), and so on.
  • a barcode such as a QR code is a static graphical matrix encoding information that can be read, and acted upon, by an electronic device with a camera or scanning laser in close proximity to that code.
  • a code may only appear for a brief period of time (e.g., during a commercial), significantly limiting the ability of a viewer to access the content linked to by that code.
  • a printed code may be damaged either maliciously or by environmental exposure over time.
  • an electronic tag such as a BLE beacon may be a low-power electronic device that regularly broadcasts encoded (often static) information that can be wirelessly received, and acted upon, by a suitably-capable electronic device within a few meters of the beacon.
  • An NFC tag is typically an unpowered electronic circuit encoding static information that can be read, and acted upon, by an NFC-capable electronic device within a few centimeters of the tag. Similar limitations are present for RFID tags. Such electronic devices have significantly limited range and may have a limited service life, as battery capacity drains over time.
  • each of these and other conventional information encoding techniques exhibits several drawbacks.
  • a conventional code, label, tag, or beacon occupies physical space, obscuring a portion of the object or media to which it is affixed.
  • conventional codes, labels, tags, and beacons encode static information that cannot be changed or updated remotely without additional intermediate proxies or redirects, many of which are blocked by certain web security policies.
  • conventional codes, labels, tags, and beacons are subject to failure, tampering, and/or damage, often rendering them completely unusable for intended purposes.
  • conventional codes, labels, beacons, and tags require a specific threshold proximity to an electronic device attempting to read them. More specifically, neither a QR code, nor an NFC tag, nor a BLE beacon can be scanned from a large distance, even if undamaged and operating normally.
  • Embodiments described herein take the form of a system for ingesting enrichable content and associating the enrichable content with an action, the system including at least an application server including at least a memory allocation storing executable instructions, and a processor allocation operably coupled with the memory allocation and configured to load from the memory allocation the executable instructions thereby instantiating an instance of a backend application configured to communicably couple to a frontend application instance instantiated over a client device in network communication with the application server, the backend application instance configured to receive a structured data object with a static image of the enrichable content and an attribute identifying an action to associate to the enrichable content, provide the static image as input to a high-accuracy object classifier, receive as output from the high-accuracy object classifier a set of objects and corresponding bounding boxes identified within the static image, generate an object graph based on the set of corresponding bounding boxes, generate a hash corresponding to the object graph, and store the hash and the action in a database.
  • Related and additional embodiments include a configuration in which the client device includes a personal electronic device with a camera.
  • Related and additional embodiments include a configuration in which the structured data object includes a set of static images, and the static image may be a member of the set of static images.
  • Related and additional embodiments include a configuration in which the set of static images are associated with frames of a captured video.
  • Related and additional embodiments include a configuration in which the backend application may be configured to, for each respective static image of the set of static images, provide the respective static image as input to the high-accuracy object classifier, receive as output from the high-accuracy object classifier a respective set of objects and corresponding bounding boxes identified within the respective static image, generate a respective object graph based on the respective set of corresponding bounding boxes, and generate a respective hash corresponding to the respective object graph.
  • Related and additional embodiments include a configuration in which the backend application may be configured to generate a single hash from each respective hash.
  • Related and additional embodiments include a configuration in which the single hash may be associated to the action in the database.
  • Related and additional embodiments include a configuration in which the structured data object includes location information of the client device.
  • Related and additional embodiments include a configuration in which the hash may be based, at least in part, on the location information.
  • Related and additional embodiments include a configuration in which the structured data object includes orientation information of the client device.
  • Related and additional embodiments include a configuration in which the hash may be based, at least in part, on the orientation information.
  • orientation information may be based at least in part on accelerometer data of the client device, compass data of the client device, or gyroscope information of the client device.
  • Related and additional embodiments include a configuration in which the structured data object includes network connectivity information of the client device.
  • Related and additional embodiments include a configuration in which the hash may be based, at least in part, on the network connectivity information.
  • Related and additional embodiments include a configuration in which the network connectivity information corresponds to a Wi-Fi connection, a cellular connection, or a Bluetooth connection.
  • Related and additional embodiments include a configuration in which the action includes an instruction to cause the client device to load a URL.
  • Related and additional embodiments include a configuration in which the action includes an instruction to cause the client device to render a virtual reality scene on a display of the client device.
  • Related and additional embodiments include a configuration in which the action includes an instruction to cause the client device to render an augmented reality scene on a display of the client device.
  • Embodiments described herein take the form of a system for identifying enrichable content and causing to be executed at least one action associated with the enrichable content, the system including at least an application server including at least a memory allocation storing executable instructions, and a processor allocation operably coupled with the memory allocation and configured to load from the memory allocation the executable instructions thereby instantiating an instance of a backend application configured to communicably couple to a frontend application instance instantiated over a client device in network communication with the application server, the backend application instance configured to receive a structured data object with a static image, provide the static image as input to a high-speed object classifier, receive as output from the high-speed object classifier a set of objects and corresponding bounding boxes identified within the static image, generate an object graph based on the set of corresponding bounding boxes, generate a hash corresponding to the object graph, determine whether the hash is equivalent to a hash previously stored in a database, and, in response to determining that the hash is equivalent to a previously stored hash, cause the at least one action associated with the previously stored hash to be executed.
  • Related and additional embodiments include a configuration in which the scene includes an active television displaying a broadcast.
  • Related and additional embodiments include a configuration in which the structured data object includes at least one of location information of the client device, or orientation information of the client device.
  • Embodiments described herein take the form of a method of operating a server application to ingest media content and associate the media content with one or more actions to be performed by at least one of a client device, a third-party service, or a first-party service, the method including at least receiving a structured data object with a static image of a scene and an attribute identifying an action to associate to the enrichable content, providing the static image as input to a high-accuracy object classifier, receiving as output from the high-accuracy object classifier a set of objects and corresponding bounding boxes identified within the static image, generating an object graph based on the set of corresponding bounding boxes, generating a hash corresponding to the object graph, and storing the hash and the action in a database.
  • Related and additional embodiments include a configuration in which the static image includes a QR code.
  • Related and additional embodiments include extracting data encoded by the QR code.
  • Related and additional embodiments include associating the action with the extracted data of the QR code.
  • Related and additional embodiments include a configuration in which the structured data includes information obtained from an NFC tag, RFID, or Bluetooth tag disposed within the scene.
  • Related and additional embodiments include associating the action with the information.
  • FIG. 1 depicts an example computing network in, or over which embodiments presented in this disclosure may be implemented.
  • FIG. 2 depicts an example intake system, as described herein.
  • FIG. 3 depicts an example retrieval system, in accordance with some embodiments.
  • FIG. 4A depicts an example computing environment corresponding to an intake system, as described herein.
  • FIG. 4B depicts an example computing environment corresponding to a retrieval system, as described herein.
  • FIGs. 5A-5G depict various example use cases or practical applications of embodiments described herein.
  • FIG. 6 depicts an example user interface of a client application executing on a client device, in accordance with some embodiments.
  • FIGs. 7A-7B depict an example of object identification and generation of object clusters, in accordance with some embodiments.
  • FIG. 8 depicts a flowchart corresponding to example operations of a method being performed by an intake system, in accordance with some embodiments.
  • FIG. 9A depicts a flowchart corresponding to example operations of a method being performed by a retrieval system, in accordance with some embodiments.
  • FIG. 9B depicts another flowchart corresponding to example operations of a method being performed by the retrieval system, in accordance with some embodiments.
  • FIG. 10 depicts a flowchart corresponding to example operations of a method being performed by a retrieval system, in particular, for an occluded QR code and/or an out-of-range NFC tag, RFID tag, and/or a BLE beacon, in accordance with some embodiments.
  • Embodiments described herein relate to systems and methods for enriching a user’s engagement with physical objects, scenes, and/or media.
  • For example, a user may operate a portable electronic device, such as a cellular phone, that includes a camera to image a scene within a field of view of the camera.
  • Frames of the scene, captured by the camera at any suitable frame rate and/or any suitable resolution, can be processed by a cloud service as described herein so as to recognize content within the scene and relative positions between recognized content, and to cause the electronic device that imaged the scene to perform one or more actions (thereby “enriching” the scene or object with supplemental content, actions, tasks, media, and so on) based on, or associated with, the recognized content.
  • Example enrichments/actions that can be associated with a recognized scene as described herein can include, without limitation: loading a website; launching an application; rendering an augmented reality object or media over a live view of the scene imaged by the portable electronic device; facilitating a purchase; passing sensor data to a remote service or server; creating or passing ownership in a non-fungible token or crypto-asset; creating a digital twin object in a digital environment (e.g., a “metaverse”); issuing a coupon for a particular service or product; generating a promotion code for a particular product or service; rendering a video; playing or causing to be shown selected media; rendering a particular object or media within a virtual reality (VR) environment (e.g., for a user wearing a VR headset) and so on.
  • one or more frames captured by the camera can be uploaded to a cloud service configured to leverage a trained machine learning model to identify and/or label or classify objects, items, text, or persons in the imaged scene.
  • the trained machine learning model can be additionally configured to segment one or more frames into a grid, processing subsets of a complete frame individually (and/or in parallel).
  • the trained machine learning model can be configured to determine relative positions between individual identified objects (in particular, relative positions of bounding boxes, as one example; for example, geometric centers of bounding boxes can be located in a coordinate space and an object graph can be constructed in which nodes of the graph are associated with bounding boxes and edges of the graph connect nearest neighbor bounding boxes) in the imaged scene.
  • These sets of identified/classified objects and associated relative positions therebetween can be collapsed into and/or otherwise represented by a fingerprint, hash, or vector that, in turn, can be used to compare against a database of vector representations of various scenes or objects (e.g., for use with a sparse representation classifier, as one example).
  • an action database can be queried to access and return one or more actions associated with the recognized scene or object. Thereafter, an action distributor service can cause each action obtained from the database to be performed by an appropriate device or software instance.
  • embodiments described herein relate to systems and methods for associating arbitrary actions (one or more) with arbitrary real-world scenes, objects, and/or media.
  • a scene, person, object, or media that can be uniquely identifiable by methods described herein can be referred to as “enrichable content.”
  • a scene, person, object, or media that has been uniquely identified and associated with one or more actions or tasks can be referred to as “enriched content,” “enriched media,” and/or “enriched objects.”
  • embodiments described herein generally and broadly relate to systems and methods for ingesting (e.g., identifying and associating actions to) enrichable content and, additionally, to systems and methods for automatically causing to be executed one or more actions in response to subsequent identification of a previously-ingested enrichable content.
  • an enrichable content as described herein may be a social media post that includes a single face, several commercial products in the foreground, and several background objects (e.g., picture frames, houseplants, furniture, and so on).
  • An author of the social media post can upload the post to an intake system, as described herein, so that the content of that social media post (e.g., recognized objects, faces, text, and so on) and the unique arrangement of that content (e.g., relative positions of objects within clusters, and relative positions of clusters of objects, and so on) can be associated to, and/or otherwise collapsed into, a unique identifier (sometimes referred to herein as vectorization or hashing of one or more object clusters).
  • a list of recognized objects may be captured as a set of bounding boxes surrounding respective recognized/classified/labeled content, a set of confidences, and a graph data structure associating relative positions between bounding boxes.
  • clusters of objects (within a threshold distance of one another) can be grouped together.
  • the list of objects and clusters may be presented, stored, and/or transmitted as a JavaScript Object Notation (JSON) object, such as:
    [
      {
        "index": 2,
        "bounds": [345, 1233, 667, 988],
        "object_id": "73C2E6788FCCA",
        "label": "product_COMPANY Sunscreen SKU: 45678",
        "label_id": "FA1914846B010BD1",
        "label_confidence": 0.76
      },
      {
        "index": 3,
        "bounds": [15, 678, 27, 2569],
        "object_id": "A0558FFB854B0",
        "label": "furniture couch",
        "label_id": "64F395BD34C",
        "label_confidence": 0.96
      },
      {
        "index": 4,
        "bounds": [250, 260, 35, 37],
        "object_id": "31EA478C0934",
        "label": "picture frame",
        "label_id": "6EC258F246855",
        "label_confidence": 0.82
      }
    ]
  • This data object may be parsed by a system as described herein to generate a graph for each cluster that in turn can be hashed into a single output value representing the unique arrangement of objects in each cluster.
  • the foregoing example algorithm creates one or more graph data structures having edges defined by distances and angles separating individual bounding boxes of objects identified within input data (e.g., the foregoing JSON object). Once an edge is defined (including an angle and distance, which may be based on pixel counts and/or normalized to a standard scale) between each pairing of individual nodes (objects) that edge can be added to a list of edges between labeled objects defining a graph of objects recognized within a particular cluster.
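  • As an illustrative, non-limiting sketch of the cluster-graph construction just described: the following Python fragment pairs every two bounding boxes, derives a distance and angle for each edge, and collapses the sorted edge list into a single value. The [x, y, width, height] reading of "bounds", the helper names, and the use of SHA-256 are assumptions for illustration only.

    import hashlib
    import json
    import math

    # Object list in the shape of the JSON object above (abbreviated).
    objects = [
        {"index": 2, "bounds": [345, 1233, 667, 988], "label_id": "FA1914846B010BD1"},
        {"index": 3, "bounds": [15, 678, 27, 2569], "label_id": "64F395BD34C"},
        {"index": 4, "bounds": [250, 260, 35, 37], "label_id": "6EC258F246855"},
    ]

    def center(bounds):
        # Treat bounds as [x, y, width, height]; return the geometric center.
        x, y, w, h = bounds
        return (x + w / 2.0, y + h / 2.0)

    def cluster_edges(objs):
        # One edge per pairing of nodes, carrying a distance and an angle.
        edges = []
        for i, a in enumerate(objs):
            for b in objs[i + 1:]:
                (ax, ay), (bx, by) = center(a["bounds"]), center(b["bounds"])
                dist = math.hypot(bx - ax, by - ay)
                angle = math.degrees(math.atan2(by - ay, bx - ax))
                edges.append((a["label_id"], b["label_id"], round(dist), round(angle)))
        return sorted(edges)

    def cluster_hash(objs):
        # Collapse the edge list into a single value representing the cluster.
        payload = json.dumps(cluster_edges(objs)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    print(cluster_hash(objects))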
  • positional relationships between clusters can be used to define a higher-order graph data structure defining an arrangement of clusters of objects within a particular scene.
  • the graph data structures associated with each cluster, and the overarching graph defining positional relationships between clusters themselves can each be hashed to a unique value.
  • the hash may be an ordered hash function or an unordered hash function. For embodiments in which the hash is ordered, similar arrangements of objects generate similar hash values.
  • the hashes associated with individual clusters and the hash associated with a graph of clusters within a scene can be concatenated (optionally with one or more salt values) and re-hashed to define a single hash function or identifier representing the arrangement of objects within a particular scene.
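  • A minimal sketch of that concatenate-and-re-hash step, continuing the Python illustration above; the "|" separator, the sorting of cluster hashes, and the salt value are assumptions, and SHA-256 (an unordered hash) stands in for whatever ordered or unordered hash a given embodiment uses:

    import hashlib

    def scene_hash(cluster_hashes, inter_cluster_graph_hash, salt="v1"):
        # Concatenate per-cluster hashes, the inter-cluster graph hash, and an
        # optional salt, then re-hash into one identifier for the scene.
        payload = "|".join(sorted(cluster_hashes) + [inter_cluster_graph_hash, salt])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    # Usage with illustrative inputs:
    h1 = hashlib.sha256(b"cluster-1-edges").hexdigest()
    h2 = hashlib.sha256(b"cluster-2-edges").hexdigest()
    hg = hashlib.sha256(b"inter-cluster-graph").hexdigest()
    print(scene_hash([h1, h2], hg))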
  • the set of identified objects and one or more hashes representing arrangements of those objects can be used to reliably identify a particular object set within a particular scene, such as the social media post of the preceding example.
  • the author of the social media post - after having uploaded the social media post to be processed in the manner described above and/or elsewhere herein - may also associate one or more actions, which may be preselected from a list of actions and/or may be customized (e.g., arbitrary code provided by the author or a third party, which may execute as a lambda function or other serverless function, as one example), to the unique identifier.
  • persons viewing the social media post at a later time can cause their respective electronic devices to attempt to identify the social media post by leveraging a system such as described herein.
  • the social media post may be uploaded to a content identification system, an action retrieval system, or more simply a “retrieval” system which can perform similar operations to the intake system; in particular, the retrieval system can be configured to identify one or more objects, faces, or text within the social media post, and identify a relative arrangement of those objects and/or clusters thereof.
  • the retrieval system can generate an identifier, fingerprint, or hash which can be compared (for threshold similarity and/or identity) against a database of unique identifiers of enrichable content previously consumed by the intake system. If a match is determined by the retrieval system, the unique identifier can be used to query an action database to retrieve one or more actions, some or all of which may have been selected by the author of the social media post. Thereafter, the retrieval system can cause the actions to be performed, executed, or otherwise scheduled. In some cases, an action may be triggered at the respective electronic device of the person viewing the social media post. In other cases, an action may be triggered at a third-party system, such as a content interaction tracking service. These examples are not exhaustive.
  • the social media post is enriched with additional features and functionality that may not be provided by the social media platform hosting the post.
  • the author of the post may link - via an enrichment as described herein - to a merchandise purchase store, may provide special content (e.g., as an augmented reality (AR) overlay rendered over the social media post), may automatically generate a coupon or promotion code, and so on.
  • the social media post may be enriched with one or more opaque actions that occur in the background, such as for engagement tracking and/or copyright enforcement.
  • For example, if the content-based hash (including positional relationships of recognized objects, color histograms, facial recognition output, and so on) of later-imaged content falls within a threshold distance (e.g., cosine distance) of the registered content's hash, copyright enforcement actions (e.g., DMCA takedown notices) may be triggered automatically.
  • the content enrichment features described above are not limited to presentations of the social media post on the social media platform. More specifically, because systems described herein leverage machine learning, generative pretrained transformers, AI, and/or computer vision to identify objects, faces, text, and/or other content within arbitrary frames captured by a camera (and/or uploaded as an image file), the enrichment features follow the social media post wherever that post is reproduced.
  • For example, if the social media post is printed, a user can direct a camera of their cellular phone to image the printed social media post and the associated enriching actions can be performed (as the retrieval system will still recognize the same set of identified objects and the same relative arrangement thereof).
  • the original author maintains control over the enrichment features associated therewith.
  • a retrieval system as described herein can be configured to operate within given tolerances. For example, exact matches of objects and arrangements thereof may not be required; suitably close matches (which may vary from embodiment to embodiment) can be considered as matches for purposes of providing enriched content, as described herein.
  • content hashes can be generated as ordered hashes between which distances may be calculated to infer similarity between different hashed content.
  • any appropriate distance measurement technique can be used.
  • a threshold can be defined so as to binarize a determination of whether two content hashes represent the same content. In particular, if a distance between two content hashes satisfies the threshold (e.g., is below a threshold distance), then the two underlying contents are defined as the same.
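  • A sketch of this binarization, assuming cosine distance over equal-length content vectors; the 0.15 threshold is an arbitrary illustrative value, not a value taken from the disclosure:

    import math

    def cosine_distance(u, v):
        # 1 - cosine similarity between two equal-length content vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / (norm_u * norm_v)

    MATCH_THRESHOLD = 0.15  # assumed tolerance; tuned per deployment

    def same_content(u, v, threshold=MATCH_THRESHOLD):
        # Distances below the threshold are treated as the same content.
        return cosine_distance(u, v) < threshold

    print(same_content([0.9, 0.1, 0.4], [0.88, 0.12, 0.41]))  # True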
  • Otherwise, the underlying contents are defined as different contents.
  • distance measurements may be recorded and logged over time to determine and/or verify whether threshold decisions are appropriate; if many near-miss distances are received, threshold values may be increased.
  • The foregoing tolerance-based matching approach can provide further advantages to, as one example, the author of the social media post of the preceding example. For example, if another person crops a watermark out of the original post, the original content may still be recognized.
  • a retrieval system as described herein can be configured to segment an input image into a grid (or other subarea of arbitrary shape or size; examples include concentric circles, randomly-distributed rectangular areas, and so on), and each grid element can be independently processed by the retrieval system.
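  • One way such grid segmentation might look in Python (a sketch; the 3x3 default and the tile tuple layout are assumptions for illustration):

    def segment_into_grid(width, height, rows=3, cols=3):
        # Yield (left, top, right, bottom) tiles covering the frame; each tile
        # can then be processed independently (and/or in parallel).
        tile_w, tile_h = width // cols, height // rows
        for r in range(rows):
            for c in range(cols):
                left, top = c * tile_w, r * tile_h
                right = width if c == cols - 1 else left + tile_w
                bottom = height if r == rows - 1 else top + tile_h
                yield (left, top, right, bottom)

    for tile in segment_into_grid(1920, 1080):
        print(tile)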
  • For example, if the social media post of the preceding example is broadcast in a news segment, viewers of the news segment can provide a photo or video of the segment as input to the retrieval system (e.g., by directing a camera of a personal cellular phone to image the news segment), which in turn can optionally segment the image as described above.
  • At least one segment of the segmented input image set contains at least a portion of the enriched content (e.g., the social media post) and can be identified by the retrieval system, and the actions associated therewith can be performed.
  • enrichable content can be identified and associated to one or more actions, as described herein.
  • a person may upload a photo of their face as an item of enrichable content.
  • the person may associate a personal website, a contact card, and/or any other suitable information relevant to and/or selected by the person.
  • the associated actions can be caused to be performed.
  • a category of specific objects may be identified by a system as described herein, such as particular apparel or accessories from a particular manufacturer (e.g., shoes, handbags, clothing, jewelry, and so on).
  • an associated action may be to purchase a similar or identical item, to create a digital twin of the item in a virtual interaction environment (e.g., metaverse or other simulated environment), to update a loyalty points database with a particular manufacturer, and so on.
  • embodiments described herein can be leveraged to enrich media content, such as video advertisements, billboard advertisements, television programs, movies, and so on.
  • a commercial may be received by the intake system on a frame-by-frame basis and/or as a subset of frames.
  • each frame provided as input to the intake system can result in a particular unique identifier, such as described above.
  • virtual objects can be associated to real world geographic locations.
  • scanned enrichable content may or may not be associated with the action (i.e., virtual object rendering, in AR as one example) taken in response to scanning that content.
  • a system as described herein can be used as a virtual geocaching system or a capture-the-flag game in which searchers scan real-world content to reveal potential virtual-world objects.
  • virtual objects associated with geographic locations may be directly associated with the real-world objects or scenes, such as placing a virtual for-sale sign over real property upon taking a photo of the real property location.
  • Further embodiments do not rely on a single frame to generate a content hash as described above.
  • a sequence of frames - and in particular the set of identified objects and relative positions thereof - can be collapsed into a single identifier, in a similar manner as with individual static frames as described above.
  • content hashes of individual frames can be graphed together with edges corresponding to the duration between the frames that are captured, thereby introducing time-variance of the scene as yet another hashable property.
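  • A sketch of hashing such a frame sequence, chaining per-frame hashes with the durations between frames so that time-variance contributes to the identifier; the chaining scheme and SHA-256 are illustrative assumptions:

    import hashlib

    def sequence_hash(frame_hashes, timestamps_ms):
        # Interleave per-frame content hashes with inter-frame durations.
        parts = []
        for i, frame_hash in enumerate(frame_hashes):
            parts.append(frame_hash)
            if i + 1 < len(frame_hashes):
                parts.append(str(timestamps_ms[i + 1] - timestamps_ms[i]))
        return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

    frames = [hashlib.sha256(f"frame-{i}".encode()).hexdigest() for i in range(3)]
    print(sequence_hash(frames, [0, 33, 66]))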
  • a retrieval system may be configured to receive as input one or more frames from a portable electronic device imaging a broadcast of a commercial.
  • the frames can be processed by the retrieval system in a similar manner as by the intake system, and the sequence of processed frames can be collapsed into a vector, hash, or fingerprint and compared against previously-hashed content now stored in the identifier/vector database.
  • In this manner, video media content captured by a camera or imaging device can be recognized and enriched by a system as described herein, even if the capturing device does not precisely frame the target video media content, does not align precisely in time with the start of the target content, and so on.
  • a commercial may be identified by methods described herein and associated with an enrichment that causes a device that scans/images the commercial (e.g., a personal cellular phone) to display a graphical user interface including an option to buy an advertised product, to initiate a trial of an advertised service, to initiate a videocall or telephone call to a company or person, to initiate a crypto-asset transaction, render a digital twin version (in AR or VR) of an advertised product, or any other suitable enrichment action.
  • an enrichment action can be leveraged as a mechanism for enforcement of copyright.
  • an enrichment action may cause a client device that images a particular enrichable object or scene to upload information (e.g., as a structured data object having one or more attributes, encoding one or more images or video or other media, sensor data such as location information and so on) to a server under the control of (or otherwise accessible to) a content owner.
  • the content owner can be made aware of the existence of copies, whether authorized or not, of content owned by the content owner.
  • an enrichment action can be based on and/or selected in view of a particular context in which an enrichable content is imaged as described herein.
  • For example, when a user leverages a personal cell phone to scan a particular storefront (that has been previously imaged and provided as input to an intake system), different actions may be performed depending upon a global position of the user at the time of the scan. More particularly, if the user is standing in front of the store (e.g., a GPS location corresponds to the storefront’s GPS location), a menu overlay may be rendered on the user’s device.
  • If the user is located farther from the store, an enrichment action may instead be to direct the user’s browser to a reservation page, so that the user may make a reservation at the restaurant.
  • a content enrichment system as described herein includes an intake portion and a retrieval portion, both of which can be leveraged to identify particular objects, faces, items, and so on and arrangements thereof in a particular scene (static) or in a sequence of scenes (e.g., video images).
  • an intake system or a retrieval system as described herein can leverage machine learning, artificial intelligence, sensor systems, data aggregators, computer vision, color histogram analysis modules, sound detection and classification systems, or any other suitable software instances or hardware apparatuses.
  • an intake system leverages a higher-performance object classification technique (which may be slower and/or more computationally intensive) than a retrieval system which may leverage a high-speed object classification technique.
  • intake operations may be more computationally expensive than retrieval operations which may be configured to execute as quickly as possible.
  • preprocessing and/or object classification operations of a retrieval and/or an intake system can be performed on a client device.
  • object classification operations of a retrieval system can be performed in part by a user’s device.
  • the user’s device may be configured to de-skew, rotate, color correct, scale, or otherwise modify one or more frames of an imaged scene before transmitting those frames to a remote server configured to execute other operations of a retrieval system as described herein.
  • the remote server may include functionality to automatically identify and crop and de-skew content received from a user.
  • a retrieval system and/or an intake system as described herein can be configured to segment an input image, processing individual portions of an image as though each was a separate image.
  • sub-portions of an image can be processed in parallel, or, in other cases, sub-portions may be processed in a particular pattern or sequence. For example, in some cases, a central tile/segment of a particular image may be processed first, whereas corner or edge segments of the same image may be processed last.
  • retrieval operations may be staged in multiple stages.
  • a first stage may be configured to perform crude object detection so as to locate a candidate segment of the image to scan first.
  • a first stage may be a stage configured to identify a particular trigger symbol or fiducial, such as an icon or watermark.
  • a crude object detection algorithm may operate to detect a television or other rectangular object within a scene. Thereafter, a second stage can be configured to attempt to identify a watermark in a particular location within a bounding box surrounding the identified television. In some cases, the watermark or icon can be located in a bottom corner of the screen. Upon identifying the icon, content of the television can be accurately cropped, de-skewed, and scaled, and transmitted to a remote retrieval system for further identification and action triggering.
  • an intake system can be configured to scan for barcodes, QR codes, or other optical codes (or plaintext associated with other information, such as a web address or a phone number; any particular text recognized with a suitable data detector or regular expression may be processed) within a particular scene.
  • the intake system can thereafter associate the function(s) of the scanned codes with the identified scene.
  • a scene may include a QR code that links to a website.
  • the intake system can record this web address and present the web address as an optional action to users who subsequently image the scene.
  • The same approach applies to actionable plaintext (e.g., web addresses, email addresses, and so on). More particularly, because an intake system as described herein reads these codes (and/or OCRs the plaintext, potentially recognizing certain types of text with a data detector) on intake, a user who subsequently scans the same scene does not need to be close enough to actually scan the QR code for the action associated with that QR code to be offered to the user, as sketched below.
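  • A sketch of this intake-side decoding, assuming OpenCV's QR detector is available; the action dictionary shape and the file name are hypothetical:

    import cv2  # assumes the opencv-python package is installed

    def extract_qr_action(image_path):
        # On intake, decode any QR code visible in the scene so its payload
        # (e.g., a URL) can be stored as an action alongside the scene hash.
        img = cv2.imread(image_path)
        if img is None:
            return None
        data, _, _ = cv2.QRCodeDetector().detectAndDecode(img)
        return {"action": "load_url", "url": data} if data else None

    # e.g., extract_qr_action("storefront.jpg") might yield
    # {"action": "load_url", "url": "https://example.com/menu"}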
  • a storefront may post a QR code that links to the store’s menu.
  • the storefront itself may be provided as input to an intake system, which can read the QR code and record the URL pointing to the store’s menu.
  • a potential customer in a vehicle passing the store may capture an image of the storefront.
  • the potential customer is not only moving, but is likely too far from the QR code itself for even a high-resolution camera to properly resolve or decode the QR code.
  • the storefront itself can be identified (by its unique collection of objects, distributions thereof, global location and/or other identifying sensor input), and the URL encoded by the QR code - which could not be scanned by the potential customer in the passing vehicle - is presented to the potential customer.
  • a system as described herein can effectively enable functionality of distant QR codes otherwise impossible to scan.
  • a system as described herein can effectively enable functionality of an occluded QR code that is otherwise not visible.
  • a system as described herein can effectively enable functionality of an irreparably damaged QR code that is so damaged that in-built error correction and redundancy fails.
  • a system as described herein can present a URL to a user that is written in plain text but is too far away to be read by the user.
  • an intake system as described herein can record NFC tags, RFID tags, and data from BLE beacons, in order to enable functionality thereof in much the same manner as described above with respect to QR codes, barcodes, and plaintext.
  • a system as described herein can effectively enable functionality of distant NFC tags, RFID tags, and BLE beacons otherwise impossible to scan.
  • a system as described herein can enable functionality of NFC tags and BLE beacons for personal electronic devices not capable of scanning NFC tags or BLE beacons.
  • a system as described herein can effectively enable functionality of an occluded NFC tag or BLE beacon that is otherwise not scannable due to the occlusion.
  • a system as described herein can effectively enable functionality of an irreparably damaged NFC tag, or a damaged or unpowered BLE beacon, for which in-built error correction and redundancy fails.
  • identification operations as described herein can be assisted with other sensor inputs, such as GPS input.
  • GPS information retrieved from a user’s device can be used to filter a possible set of enriched objects to only those objects known to be within the geographic region in which the scan takes place.
  • many fast food franchise storefronts may include similar distributions of physical objects or classifiable objects, but each occupies a different physical location; filtering by physical location can improve recognition accuracy of a particular object over other similar objects.
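  • A sketch of such location filtering, assuming each stored record carries a registered GPS fix and that a haversine radius check suffices; the 200-meter radius and record fields are illustrative:

    import math

    def haversine_m(lat1, lon1, lat2, lon2):
        # Great-circle distance in meters between two GPS fixes.
        earth_radius_m = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * earth_radius_m * math.asin(math.sqrt(a))

    def filter_by_location(candidates, scan_lat, scan_lon, radius_m=200):
        # Keep only enriched records registered within the scan's vicinity.
        return [c for c in candidates
                if haversine_m(c["lat"], c["lon"], scan_lat, scan_lon) <= radius_m]

    stores = [{"id": "store-a", "lat": 40.7128, "lon": -74.0060},
              {"id": "store-b", "lat": 34.0522, "lon": -118.2437}]
    print(filter_by_location(stores, 40.7130, -74.0055))  # only store-a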
  • an intake system can be configured by an administrator or content uploader to ignore particular objects or classes of objects. For example, a storefront owner may instruct an intake system to ignore human persons. In other cases, a social media content uploader may instruct the intake system to ignore certain background objects.
  • a retrieval system can be configured to ignore particular object classes when determining content of a particular imaged scene.
  • the retrieval system may be configured to ignore movable or moving objects such as persons, animals, vehicles, and the like.
  • different instances or different threads of the retrieval system may be configured to ignore different sets of objects.
  • a single content item or object can be associated with — and may be matched by comparing to — many different hashes, each of which may be based on different segments, portions, object classes or types, or other combinations thereof.
  • enrichable content may include one or more images, and/or one or more video frames uploaded by a user to an application server of an intake system.
  • the user may associate one or more actions with the uploaded enrichable content, and the content and the actions may be stored in a database that is communicatively coupled to the application server.
  • Enrichable content, once associated with the one or more corresponding user actions, becomes content that is enriched by the one or more corresponding actions.
  • an application server of an intake system may ingest the uploaded enrichable content and process the uploaded enrichable content using an artificial intelligence algorithm, a machine learning algorithm, a facial attribute classifier, a generative pretrained transformer, and/or a computer vision algorithm, which may be referred to herein collectively as “AI/ML/CV,” “machine learning algorithm,” “trained classifier,” “predictive model,” or “transformers.”
  • the enrichable content uploaded by a user may be processed to identify one or more objects in the uploaded enrichable content, as well as a spatial relationship and/or a temporal relationship (e.g., how objects and/or clusters of objects move between sequences of frames) between the one or more objects identified using the machine learning algorithm.
  • At least one instance of a server application may maintain a library or database based on the enrichable content and associate one or more actions with the enrichable content.
  • where the enrichable content includes an image or other media, the library may include multiple different images generated based on the image uploaded by the user. For example, multiple images may be generated by varying properties of the uploaded image, such as brightness, contrast, temperature, tint, hue, gamma, color, blur, rotation, scale, aspect ratio, and so on, as sketched below.
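  • A sketch of deriving such variants with the Pillow imaging library; the particular factors, angles, and scale are arbitrary illustrative choices:

    from PIL import Image, ImageEnhance  # assumes the Pillow package is installed

    def derive_variants(image_path):
        # Produce perturbed copies of an uploaded image so the library covers
        # plausible capture conditions (brightness, contrast, rotation, scale).
        base = Image.open(image_path).convert("RGB")
        variants = [base]
        for factor in (0.8, 1.2):
            variants.append(ImageEnhance.Brightness(base).enhance(factor))
            variants.append(ImageEnhance.Contrast(base).enhance(factor))
        for angle in (-5, 5):
            variants.append(base.rotate(angle, expand=True))
        variants.append(base.resize((base.width // 2, base.height // 2)))
        return variants

    # e.g., derive_variants("upload.jpg") yields the original plus eight variants.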
  • a machine learning algorithm may be used to create a list of one or more objects within an image (or, more generally, a scene) and generate an object descriptor or label corresponding to each object.
  • the machine learning algorithm may be a supervised machine learning algorithm.
  • other types of machine learning algorithms may be used in place of and/or with a supervised algorithm.
  • a generative pretrained transformer or other artificial neural networks may be configured to determine object labels and/or object sets based, in part, on language models encoding object label co-occurrences in a single scene. More simply, a transformer may be leveraged as a label filter to prevent nonsensical identification of different objects in the same scene.
  • the system may be configured to determine that a low-confidence identification of a shark from a naive CV classifier is incorrect if the same scene includes a high-confidence identification of a farmhouse.
  • the object descriptor may include a list of objects identified in the image.
  • the list of objects identified in the image may be accompanied by and/or may include a bounding box surrounding each identified object or content item.
  • a bounding box may also describe a spatial location of an object in an image relative to a coordinate system particular to an anchor point of the image, such as a corner thereof or a geometric center thereof. Accordingly, as noted above, spatial relationships between two or more objects identified in an image may also be determined.
  • each bounding box may be further divided into a number of sections, for example, 16 sections (4x4 sections) or 64 sections (8x8 sections) or an arbitrary non-square number of sections.
  • a number of sections for dividing each bounding box may be configurable.
  • each bounding box may be further divided into a number of sections, each of a fixed size.
  • an object attribute classifier may be executed to further characterize the identified object.
  • a vector, hash, or other fingerprint may be generated based on the number of sections representing each bounding box identifying each object.
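  • One plausible reading of that section-based fingerprint, sketched in Python: each bounding box is divided into a 4x4 grid of sections whose center coordinates are flattened into one numeric vector per scene. The flattening scheme is an assumption for illustration, not the disclosed encoding.

    def box_sections(bounds, rows=4, cols=4):
        # Divide a bounding box [x, y, width, height] into rows x cols
        # sections and return the center point of each section.
        x, y, w, h = bounds
        sw, sh = w / cols, h / rows
        return [(x + (c + 0.5) * sw, y + (r + 0.5) * sh)
                for r in range(rows) for c in range(cols)]

    def scene_vector(object_bounds):
        # Flatten every object's section centers into one numeric vector.
        vec = []
        for bounds in object_bounds:
            for px, py in box_sections(bounds):
                vec.extend((px, py))
        return vec

    # Two boxes x 16 sections x 2 coordinates = 64 vector components.
    print(len(scene_vector([[345, 123, 667, 98], [15, 678, 27, 256]])))  # 64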
  • a dictionary of vectors may be leveraged by a machine learning algorithm (e.g., sparse representation classification algorithm) as described herein to identify a candidate scene as a scene already processed by the intake system.
  • Each set of vectors in the dictionary may correspond to object lists and relative positions within an uploaded image (and/or other images derived from that image, such as images of different scale or color content), as described herein.
  • each vector of the dictionary may be normalized to a particular uniform length.
  • Each vector corresponding to each scene or object identified as a result of operation of the intake system can be associated with a particular index number, each of which may be stored in an index database (e.g., an index database, communicatively coupled with the application server of the intake system), which can in turn be associated with actions to be performed whenever the scene corresponding to a particular index is scanned.
  • a retrieval system may include an application server executing an instance of a server application configured to receive an enrichable content, such as an image or a video, from a client device over a user interface, as described herein.
  • the user interface may present, to a user, options regarding a particular usage intended by the user. Accordingly, when the user indicates the user is uploading the image or video to retrieve one or more user actions associated with the image or video being uploaded by the user, the instance of the application server may process the uploaded image or video to identify one or more user actions that may be associated with the image or video uploaded by the user.
  • the instance of the application server may leverage a machine learning algorithm to identify one or more objects captured in an image uploaded by a user, or one or more objects captured in a video (e.g., a series of images).
  • one or more bounding boxes surrounding the one or more objects identified in the image or the series of images may be used to generate a vector, as described herein, with reference to the intake system.
  • the vector may be then compared with multiple vectors stored in the vector table to identify the closest matching vector with a predetermined or preconfigured threshold matching error, such as described above.
  • a database record index associated with the closest matching vector in the vector table may then be searched in the vector set table to identify a vector set and corresponding database record index corresponding to the matched vector(s).
  • the database record index of the vector set may then be used to query an action database to identify one or more user actions that are associated with the image or video uploaded by the user, some of which may be performed by the user’s device, some of which may be performed by a third-party system, some of which may be performed by the retrieval system itself.
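  • The retrieval lookup chain described above (closest vector, then record index, then actions) might be sketched as follows; the Euclidean matching error, table shapes, and record identifiers are all assumptions for illustration:

    def retrieve_actions(query_vec, vector_table, vector_set_table, action_db,
                         max_error=0.15):
        # 1. Find the closest stored vector within the allowed matching error.
        best_key, best_dist = None, float("inf")
        for key, vec in vector_table.items():
            dist = sum((a - b) ** 2 for a, b in zip(query_vec, vec)) ** 0.5
            if dist < best_dist:
                best_key, best_dist = key, dist
        if best_key is None or best_dist > max_error:
            return []
        # 2. Resolve the matched vector to its database record index.
        record_index = vector_set_table[best_key]
        # 3. Query the action database for actions bound to that record.
        return action_db.get(record_index, [])

    vector_table = {"v1": [0.1, 0.9], "v2": [0.8, 0.2]}
    vector_set_table = {"v1": "rec-42", "v2": "rec-77"}
    action_db = {"rec-42": [{"action": "load_url", "url": "https://example.com"}]}
    print(retrieve_actions([0.12, 0.88], vector_table, vector_set_table, action_db))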
  • Many configurations are possible.
  • a particular image uploaded by a user to a retrieval system may have been associated with an action that may allow the user to purchase or lease specific content for use in a metaverse. For example, when a user takes and uploads an image of some apparel previously scanned by the intake system, the retrieval system may retrieve an action to generate a temporary digital twin/copy of the apparel item in a digital environment, which may be associated to a user account, may be offered for purchase to the user, or may be offered for lease to the user.
  • a user-uploaded image may include an image of a purchased item.
  • the user may have purchased a new toaster, which may be still in the original packaging.
  • a user may capture an image including the purchased item in its original packaging.
  • the user may be offered an option to complete registration for the purchased item.
  • many of the details for the online registration may be automatically filled based on metadata associated with the uploaded image and/or information corresponding to a client device used for uploading the image (e.g., a user account). Such information may be transmitted as a structured data object with multiple attributes, each of which may correspond to an image captured by the client device and/or sensor information of sensors of the client device, such as GPS information, accelerometer information, gyroscope information, compass information, and network connection information (e.g., Wi-Fi, Bluetooth, UWB, cellular connections, and so on).
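  • A hypothetical structured data object of the kind described above, shown as JSON; every attribute name here is an illustrative assumption rather than a schema from the disclosure:

    {
      "image": "<base64-encoded frame>",
      "metadata": {"captured_at": "2023-05-10T14:22:05Z"},
      "sensors": {
        "gps": {"lat": 40.7128, "lon": -74.0060},
        "accelerometer": [0.01, -0.02, 9.81],
        "gyroscope": [0.001, 0.000, 0.002],
        "compass_heading_deg": 271.5,
        "network": {"type": "wifi", "ssid_hash": "ab12cd34"}
      },
      "account": {"user_id": "user-123"}
    }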
  • location data included in the metadata of the uploaded image and/or a GPS location of the client device may be used to complete residence/business address information.
  • a username associated with an account authorized on the client device may also be used to complete the online registration for the product.
  • a search may be performed to determine whether the particular product is already registered at an address identified as described above, or is already associated with the user identified as described above. If the particular product is found not to have been registered at the address or associated with the user, an option for online registration may be presented to the user. As described above, many of the details for the online registration may be automatically filled based on metadata associated with the uploaded image and/or information corresponding to a client device used for uploading the image.
  • a user may leverage a system as described herein to quickly and efficiently complete an inventory of items purchased and owned by the user and placed within the user’s home, for example, for homeowners or renters’ insurance underwriting or claims purposes.
  • a user may leverage a system as described herein to create digital twins of the user’s real-world possessions in a virtual environment.
  • a user may upload the image to the user’s one or more social network accounts. Based on the number of other users’ visits to the particular uploaded image, the user may be awarded reward points by a third party or by the social media platform.
  • a user may be provided a suggestion that there are other features or actions associated with a specific object that may be captured using a camera and uploaded.
  • a logo design or a symbol may be placed on the object or rendered over a media item for broadcast (e.g., watermark).
  • the user upon seeing the particular logo or symbol may leverage a nearby camera to capture an image of the scene or media or object displaying the logo or symbol and upload it to a retrieval system as described herein.
  • FIG. 1 depicts an example computing network in, or over which embodiments as described herein may be implemented.
  • In particular, FIG. 1 depicts an example computing network implemented as a system 100.
  • the system 100 includes a client device 102 communicatively coupled via a network to a gateway 104.
  • the gateway 104 may further provide coupling between the client device 102 and an intake system 106 and/or a retrieval system 108.
  • the system 100 can be leveraged by the client device 102 in an intake mode or a retrieval mode, both of which are described in greater detail below.
  • the client device 102 is configured to upload one or more images of an object or scene to the intake system 106.
  • the intake system 106 is configured to identify enrichable content within the scene and to associate one or more selected actions with that enrichable content.
  • the client device 102 When operated in a retrieval mode, the client device 102 is configured to upload one or more images of an object or scene to the retrieval system 108. With this input, the retrieval system 108 is configured to leverage previously-identified content (e.g., content previously uploaded to the intake system 106) and to cause to be executed one or more actions associated with the identified content.
  • Each of these operations is described in greater detail below; however, it may be appreciated that in many embodiments a client device operating the system 100 in an intake mode may be different from a client device operating the system 100 in a retrieval mode; the client device 102 is depicted as operable in both modes only for simplicity of description and illustration.
  • the client device 102 may be a phone, a tablet, a smartwatch, a laptop, a computer, an Internet-of-Things (IoT) device, or another electronic device that has at least one imaging system (e.g., a camera) and a transceiver to communicate with the gateway 104 via the network.
  • a user of the client device 102 may cause the device to transmit an image or video (and/or sequence of images) captured with a camera system of the client device 102 to either the intake system 106 or the retrieval system 108 via the gateway 104 for further processing.
  • the subject(s) of an image or video may be referenced herein as enrichable content, as one or more actions may be associated with the image or video. The content may be enriched with additional information, such as one or more actions that may be triggered when the same or another user captures an image or a video that is similar to the image or video enriched with the one or more associated actions.
  • the image or video may be captured using an application executing on the client device 102, or using a web interface providing a web connection to the gateway 104. Accordingly, in the intake mode, the application executing on the client device 102 or the web interface providing the web connection to the gateway 104 may present options to the user as to whether the user would like to associate one or more actions with the enrichable content or would like to retrieve any actions that may be associated with the image or video. In some embodiments, based on the user’s selection of one or more options (presented by the application, for example in a graphical user interface thereof, or by the web interface), the captured images or video may be transmitted to the gateway 104 for forwarding to the intake system 106. To route these transmissions, the gateway 104 can leverage a flag, attribute, or header indicating that the images are intended for the intake system 106. An illustrative routing sketch follows this paragraph.
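  • By way of illustration only, the sketch below wraps captured frames in a message envelope bearing such a routing flag and shows the gateway-side dispatch; the field names, the hex encoding, and the destination names are assumptions for this example, not the gateway’s actual protocol.

```python
# A minimal sketch, assuming a JSON envelope; the "destination" field, the
# hex encoding of frames, and the destination names are illustrative only.
import json

def build_upload_message(frames, mode):
    """Wrap captured frames with a routing flag ("intake" or "retrieval")."""
    if mode not in ("intake", "retrieval"):
        raise ValueError("mode must be 'intake' or 'retrieval'")
    envelope = {
        "destination": mode,                   # flag the gateway inspects
        "frame_count": len(frames),
        "frames": [f.hex() for f in frames],   # hex-encode binary payloads
    }
    return json.dumps(envelope).encode("utf-8")

def route(envelope_bytes):
    """Gateway-side dispatch based on the routing flag."""
    envelope = json.loads(envelope_bytes)
    return "intake_system" if envelope["destination"] == "intake" else "retrieval_system"

msg = build_upload_message([b"\x00\x01"], "intake")
print(route(msg))  # intake_system
```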
  • the intake system 106 may receive from the client device 102 via the gateway 104 over the network one or more images and/or one or more frames of a video (e.g., the frames 112) at an intake pipeline 110.
  • the intake pipeline 110 may comprise one or more servers, applications, libraries, functions, modules, algorithms, and/or machine-learning models/algorithms, and so on, for processing of the frames 112 and providing as output an index identifying particular content within the object/scene and associating that content with one or more actions/enrichments.
  • the index(es) generated by the pipeline can be stored in a database 114.
  • the retrieval system 108 may receive from the client device 102 via the gateway 104 over the network one or more images and/or one or more frames of a video (e.g., the frames 118) at a retrieval pipeline 116.
  • the retrieval pipeline 116 may comprise one or more servers, applications, libraries, functions, modules, algorithms, and/or machine learning models/algorithms, and so on, for processing of the frames 118, which may have a different resolution and/or may be presented at a different frame rate than the frames 112. Processing of the frames 118 by the retrieval pipeline 116 is described in detail below.
  • the retrieval pipeline 116 may determine an index of a database record whose stored enrichable content matches the received content. The retrieval pipeline 116 may then identify one or more actions stored in a database 120, which may be a part of the retrieval system 108 or the intake system 106; in either construction, the database 120 may be communicably coupled to either or both the intake system 106 and the retrieval system 108.
  • the retrieval pipeline 116 may then communicate the retrieved one or more actions to an action distributor 122.
  • the action distributor 122 may analyze the one or more actions - and/or metadata or descriptive data thereof - and transmit the one or more actions to appropriate destinations for execution and/or other handling.
  • the action distributor 122 may be configured to identify at least one action that should be executed by the client device 102; in other words, certain actions may be intended to benefit, or receive input from, or provide output to a user 124, which may be a user of the client device 102.
  • the action distributor 122 may generate one or more messages or notifications or instructions to be received by the client device 102, which in response may modify a graphical user interface, may retrieve sensor output (GPS, accelerometer, compass, and so on) and forward that sensor output to another service, may generate a notification and so on. Many potential actions may be performed by the client device 102.
  • the action distributor 122 may also communicate with an administrator or an admin console 126 for various purposes such as logging, recording, and/or troubleshooting, and so on.
  • the action distributor 122 may also communicate with a third-party system 128, for example, to retrieve additional actions from the third-party system 128 and/or to update the third-party system about the particular enriched content uploaded by a user.
  • the action distributor 122 may also communicate with an action executor 130.
  • the action executor 130 may cause the one or more actions to be performed based on the user’s selection. A sketch of the distributor’s dispatch logic follows.
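  • As a non-limiting sketch of the dispatch referenced above, the example below forwards each retrieved action to a destination-specific handler; the "target" field and the handler names are assumptions, not the system’s actual API.

```python
# An illustrative dispatch loop for an action distributor; the "target"
# field and the handler names are assumptions for this example.
def distribute(actions, handlers):
    """actions: list of dicts like {"target": "client", "payload": ...};
    handlers: dict mapping a target name to a callable destination."""
    for action in actions:
        handler = handlers.get(action.get("target"))
        if handler is not None:
            handler(action["payload"])

handlers = {
    "client":      lambda p: print("notify client device:", p),
    "admin":       lambda p: print("log to admin console:", p),
    "third_party": lambda p: print("forward to third party:", p),
    "executor":    lambda p: print("hand to action executor:", p),
}
distribute([{"target": "client", "payload": "show trailer overlay"}], handlers)
```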
  • FIG. 2 depicts an example intake system, as described herein.
  • the intake system 200 may include one or more application servers executing instances of one or more algorithms, e.g., machine-learning algorithms, artificial intelligence-based algorithms, computer-vision-based algorithms, which may be referenced as a machine-learning algorithm in the present disclosure.
  • the one or more algorithms may be configured to process one or more images and/or one or more frames 204 captured of an object or scene uploaded by a client device 202.
  • the one or more images and/or one or more frames of a video may be received at an intake system via a gateway, such as the gateway 104 of FIG. 1.
  • the high-performance machine learning algorithm 206 may analyze the received enrichable content, and identify one or more objects in the scene, and relative positions thereof.
  • the high-performance machine learning algorithm 206 may be used to identify one or more objects included in the scene and provide an object list to an object cluster identifier 208.
  • the object cluster identifier 208 may be configured to identify groupings of objects identified by the high-performance machine learning algorithm 206.
  • the list of objects identified in the scene may be created using a bounding box surrounding each object in the enrichable content that can be identified.
  • a bounding box may also describe a spatial location of an object in the scene, in two or three dimensions. Accordingly, spatial relationships between two or more objects identified in the enrichable content may also be determined. In some embodiments, when the scene contains an object (e.g., an enrichable content item) that is movable, a temporal relationship between objects identified in the enrichable content may also be determined.
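  • The sketch below illustrates one way such pairwise spatial relationships might be derived from bounding boxes; the (x_min, y_min, x_max, y_max) box format and the relation labels are assumptions for this example, not the system’s actual vocabulary.

```python
# A sketch assuming (x_min, y_min, x_max, y_max) boxes in image coordinates.
from itertools import combinations

def center(box):
    """Return the center point of a (x_min, y_min, x_max, y_max) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def spatial_relations(objects):
    """objects: list of (label, box); returns tuples like ("tv", "above", "console")."""
    relations = []
    for (label_a, box_a), (label_b, box_b) in combinations(objects, 2):
        (ax, ay), (bx, by) = center(box_a), center(box_b)
        relations.append((label_a, "left_of" if ax < bx else "right_of", label_b))
        # y grows downward in image coordinates, so a smaller y means "above"
        relations.append((label_a, "above" if ay < by else "below", label_b))
    return relations

print(spatial_relations([("tv", (10, 10, 90, 60)), ("console", (30, 70, 90, 95))]))
```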
  • a single identifier can be used to reference the one or more clusters of objects 210 output by the object cluster identifier 208.
  • a vectorizer 212 can be configured to consume a set of clusters and to provide as output a single vector 214, which in turn can be identified by a single index value.
  • each recognizable enrichable object or scene can have its content summarized by a single vector, which in turn can be indexed.
  • more than one vector can be used to describe a single scene.
  • some vectors may be created based on different image manipulations of input images (e.g., color adjustments, scale changes, and so on) and some vectors may be based on different sets of identifiable objects. For example, a first vector describing a scene may reject or ignore human persons whereas another vector describing the same scene may retain labels of human persons. In this manner, recognition accuracy by a retrieval system such as described herein can be improved.
  • each vector 214 corresponding to the one or more clusters of objects 210 may be normalized in accordance with criteria including one or more of a data size of a vector, a particular physical dimension of a vector, and so on.
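  • A minimal sketch of such a vectorizer follows: each object cluster is summarized as one fixed-length, L2-normalized vector, and an ignore list supports the variant vectors described above (e.g., a vector that rejects human persons). The label vocabulary and vector layout are assumptions, not the actual encoding used by the vectorizer 212.

```python
# A minimal sketch assuming a small label vocabulary and a layout of
# [per-label counts, mean center x, mean center y].
import math

VOCAB = ["tv", "console", "logo", "person", "table"]  # assumed vocabulary

def vectorize_cluster(objects, ignore=()):
    """objects: list of (label, (x_min, y_min, x_max, y_max)) pairs.
    Returns a single L2-normalized vector summarizing the cluster."""
    counts = [0.0] * len(VOCAB)
    cxs, cys = [], []
    for label, (x0, y0, x1, y1) in objects:
        if label in ignore:            # supports variant vectors, e.g. ignoring persons
            continue
        if label in VOCAB:
            counts[VOCAB.index(label)] += 1.0
        cxs.append((x0 + x1) / 2.0)
        cys.append((y0 + y1) / 2.0)
    vec = counts + [sum(cxs) / len(cxs) if cxs else 0.0,
                    sum(cys) / len(cys) if cys else 0.0]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0   # L2 normalization
    return [v / norm for v in vec]

print(vectorize_cluster([("tv", (10, 10, 90, 60)), ("person", (0, 0, 5, 5))],
                        ignore=("person",)))
```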
  • the classifier 216 may be executing on the one or more application servers of the intake system 200 and may store the set of vectors into an index database 218.
  • An index corresponding to a database record storing the set of vectors in the index database 218 may be retrieved by the classifier 216 and may be further associated with the set of actions stored in an action database 220.
  • the enrichable content uploaded by a user of the client device, and other generated enrichable content having different properties as described herein, may be associated with the set of actions 222 as provided by the user of the client device, and an acknowledgement or confirmation may be sent back to the user for display on the client device 202.
  • the intake system 200 receives enrichable content with one or more associated actions, which may then be retrieved using a retrieval system described with reference to FIG. 3.
  • FIG. 3 depicts an example retrieval system, as described herein.
  • a retrieval system 300 may include one or more application servers executing a scene or a boundary detector 306 and algorithms 308, e.g., machine-learning algorithms, artificial intelligence-based algorithms, computer-vision based algorithms, which may be referenced as a machine-learning algorithm 308 in the present disclosure.
  • the scene or boundary detector 306 may receive frames 304 from a client device 302 via a gateway, such as the gateway 104, as described herein.
  • the frames 304 may contain enrichable content previously uploaded to an intake system by the same or another user.
  • the scene or boundary detector 306 may identify one or more objects in the frames 304 using techniques described herein. Each object identified in the content may have a particular spatial relationship with each other classified object. If the uploaded content is a sequence of frames, then objects identified in the content may also have a temporal relationship with each other.
  • An output of the scene or boundary detector 306, which may include one or more objects with their corresponding bounding boxes, may then be processed through the machine-learning algorithm 308.
  • the machine-learning algorithm 308 may be a high-speed algorithm that prioritizes speed over accuracy or precision.
  • the machine-learning algorithm 308 may operate with an object cluster identifier 310 to generate object clusters, which may be represented as structured data and/or as the object clusters 312.
  • a classifier 318 may compare the vector 316 with one or more vectors stored in an index database 320 (which may be the index database 218 of FIG. 2).
  • the classifier 318 may identify a vector stored in the index database 320 that matches with the vector 316 according to particular criteria.
  • the particular criteria may include at least a specific number of objects common to the vector 316 and the matching vector stored in the index database 320, spatial relationships between two or more objects that match above a particular threshold value, and so on.
  • sparse representation classification can be used to identify which among the set of vectors is a closest match to the vector 316. Once a match is determined within a tolerance or threshold, an index corresponding to that match can be returned.
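  • As an illustration of this matching step, the sketch below substitutes cosine similarity with a fixed threshold for sparse representation classification; it returns the index of the closest stored vector, or None when no candidate clears the tolerance.

```python
# A sketch of the matching step; cosine similarity with a threshold stands
# in for sparse representation classification and the other criteria.
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def best_match(query, index_db, threshold=0.9):
    """index_db: dict mapping index -> stored vector.
    Returns the index of the closest match above the threshold, else None."""
    best_idx, best_sim = None, threshold
    for idx, stored in index_db.items():
        sim = cosine(query, stored)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx

index_db = {7: [0.6, 0.8], 12: [1.0, 0.0]}
print(best_match([0.6, 0.8], index_db))  # 7
```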
  • the index corresponding to the vector stored in the index database 320 that matches the vector 316 may be used to search an action database 322.
  • the action database 322 may be the action database 220.
  • an index of records storing one or more actions associated with the frames 304 may be used to retrieve an actions list 324.
  • the retrieved actions list 324 may be then sent to an action distributor 326 for distributing to various stakeholders, such as a third-party 328, an administrator 332, and/or a user of the client device 302.
  • the retrieved actions list 324 may be reported to the third-party 328, for example, for informing the third-party 328 about a location of the user of the client device 302.
  • the third-party may also display or push information about their services and/or sales to the client device 302 through the action distributor 326.
  • the third-party 328 may be any party, person, and/or entity which may be interested to know that the user of the client device 302 has uploaded frames 304 that are associated with one or more actions identified in the actions list 324.
  • the administrator 332 may receive information of the frames 304 and the corresponding one or more actions of the actions list 324 for logging, monitoring, and/or troubleshooting, and so on.
  • the retrieved actions list 324 may be displayed on the client device 302 of the user, and may be performed by an action executor 330 in accordance with selection of a particular action of the action list displayed on the client device.
  • FIG. 4A depicts an example computing environment corresponding to an intake system in, or over which, embodiments as described herein may be implemented.
  • a client device 402 may include a processor 404, a memory 406, a display 408, an input system 410, a camera 412 (which may be an event-driven/neuromorphic camera, a CMOS camera, a CCD camera, or any other suitable camera operating in visible bands, IR bands, UV bands, or multiple bands thereof), and a sensor 414.
  • A single instance of each of a processor, a memory, a display, an input system, a camera, and a sensor is shown in FIG. 4A, but it is appreciated that there may be more than one instance of the processor 404, the memory 406, the display 408, the input system 410, the camera 412, and the sensor 414 present in the client device 402.
  • the client device 402 can leverage the processor 404 to access an executable asset from the memory 406 to instantiate an instance of software configured to access an intake system and/or a retrieval system, such as described herein.
  • the instance of software can be referred to as a client application, a frontend application, a browser application, a native application, or by another name.
  • the instance of software may be a portion of a kernel or operating system of the client device 402.
  • An intake system may include an application server 416, which may include one or more resource allocations 418, including processing resources and memory resources.
  • the application server 416 (also referred to as a host server) may be connected to an index database 420 and an action database 422.
  • the application server 416 can leverage the processor allocation to access an executable asset from the memory allocation to instantiate an instance of software configured to operate as at least a portion of an intake system and/or a retrieval system, such as described herein.
  • the instance of software can be referred to as a server application, a backend application, a host service, or by another name.
  • the instance of software may be a portion of a kernel or operating system of the application server 416.
  • the index database 420 and the action database 422 are shown separate, but this is not required. There may be only one database having tables corresponding to the index database 420 and the action database 422. As described herein the index database 420 may store identifiers of enrichable content as database records, and the action database 422 may store one or more actions associated with enrichable content as cross referenced with one or more indexes of the index database 420.
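  • One possible realization of this arrangement, offered only as a sketch, keeps both stores as cross-referenced tables in a single SQLite database; the schema, column names, and serialized-vector format are assumptions for illustration.

```python
# A sketch of the index and action stores as two cross-referenced tables
# in one database; the schema is an assumption, not the actual design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE index_db (
    idx    INTEGER PRIMARY KEY AUTOINCREMENT,
    vector TEXT NOT NULL              -- serialized normalized vector
);
CREATE TABLE action_db (
    action_id INTEGER PRIMARY KEY AUTOINCREMENT,
    idx       INTEGER NOT NULL REFERENCES index_db(idx),
    action    TEXT NOT NULL           -- e.g., 'play_trailer', 'buy_tickets'
);
""")
cur = conn.execute("INSERT INTO index_db (vector) VALUES (?)", ("[0.6, 0.8]",))
idx = cur.lastrowid
conn.executemany("INSERT INTO action_db (idx, action) VALUES (?, ?)",
                 [(idx, "play_trailer"), (idx, "buy_tickets")])
rows = conn.execute("SELECT action FROM action_db WHERE idx = ?", (idx,)).fetchall()
print([a for (a,) in rows])  # ['play_trailer', 'buy_tickets']
```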
  • the client device 402 may be communicatively coupled with the application server 416 of the intake system via a gateway, such as the gateway 104 over a network, although this is not required.
  • the network may include the open internet.
  • the client device 402 may be a phone, a computer, a laptop, a tablet, a smartwatch, and/or another suitable device, whether portable or stationary.
  • the processor 404 of the client device 402 may be any suitable computing device or logical circuit configured to execute one or more instructions to perform or coordinate one or more operations on or to digital data.
  • the processor or processors of the client device 402 may be a physical processor, although this is not required of all embodiments; virtual components may be suitable in some implementations.
  • the memory 406 of the client device 402 may be configured and/or implemented in a number of suitable ways and may be partially or completely virtualized.
  • the processor 404 of the client device 402 is configured to access at least one executable asset from the memory 406 of the client device 402. More particularly, the processor 404 of the client device 402 may be configured to access a data store portion of the memory to load, into a working portion of the memory, at least one executable asset or executable program instruction. In response to loading the instruction or executable asset into working memory, the processor 404 of the client device 402 may instantiate an instance of software referred to herein as a client application. The client application and its corresponding user interface is described using, for example, FIG. 6 below.
  • a user of the client device 402 may perform various functions, including but not limited to launching a client application and capturing one or more images and/or video of an object and/or a scene using the camera 412 of the client device 402.
  • the sensor 414 may be a GPS sensor which may be used to determine a current location of the client device 402; this data can be received by the intake system and may be associated with particular enriched content.
  • there may be other types of sensors as well, such as an inertial measurement unit sensor for detecting speed and movement-related information of the client device 402 according to the movement of a user of the client device 402.
  • the client device may communicate to the application server 416 one or more images (data), one or more frames of video, and/or one or more actions, and receive an acknowledgement or confirmation according to an outcome of associating the one or more actions with the one or more images and/or the one or more frames of video as specified by the user of the client device 402.
  • the one or more resource allocation functions/modules 418 may allocate resources, including but not limited to, a processor or a computational resource, a memory, network usage or bandwidth, and so on.
  • the application server 416 may be executing a server application. In some cases, there may be more than one instance of the server application executing on the application server 416.
  • FIG. 4B depicts an example computing environment corresponding to a retrieval system in, or over which, embodiments as described herein may be implemented.
  • a client device 424 may include a processor 426, a memory 428, a display 430, an input system 432, a camera 434, and a sensor 436.
  • a retrieval system may include an application server 438, which may include one or more resource allocation functions/modules 440.
  • the application server 438 may be connected to an index database 442 and an action database 444.
  • the index database 442 and the action database 444 are shown as separate, but this is not required; there may be only one database storing tables corresponding to the index database 442 and the action database 444.
  • the index database 442 may store identifiers of the one or more objects of the enrichable content as database records.
  • the action database 444 may store one or more actions associated with the enrichable content as cross-referenced with one or more indexes of the index database 442 corresponding to the enrichable content and other content having different properties generated from the enrichable content.
  • the client device 424 (which may be the same client device as shown in FIG. 4A or may be a different client device) may be communicatively coupled with the application server 438 of the retrieval system via a gateway, such as the gateway 104 over a network, although this is not required.
  • the network includes the open internet.
  • the client device 424 may be a phone, a computer, a laptop, a tablet, a smartwatch, and/or another suitable client device.
  • the processor 426 of the client device 424 may be any suitable computing device or logical circuit configured to execute one or more instructions to perform or coordinate one or more operations on or to digital data.
  • the processor or processors of the client device 424 may be a physical processor, although this is not required of all embodiments; virtual components may be suitable in some implementations.
  • the memory 428 of the client device 424 may be configured and/or implemented in a number of suitable ways and may be partially or completely virtualized.
  • the processor 426 of the client device 424 is configured to access at least one executable asset from the memory 428 of the client device 424. More particularly, the processor 426 of the client device 424 may be configured to access a data store portion of the memory to load, into a working portion of the memory, at least one executable asset or executable program instruction. In response to loading the instruction or executable asset into working memory, the processor 426 of the client device 424 may instantiate an instance of software referred to herein as a client application. The client application and its corresponding user interface is described with reference to FIG. 6 below.
  • a user of the client device 424 may perform various functions, including but not limited to launching a client application and capturing one or more images and/or video of an object and/or a scene using the camera 434 of the client device 424.
  • the sensor 436 may be a GPS sensor which may be used to determine a current location of the client device 424.
  • there may be other types of sensors as well, such as an inertial measurement unit sensor for detecting speed and movement-related information of the client device 424 according to the movement of a user of the client device 424.
  • the client device 424 may communicate to the application server 438 one or more images and/or one or more frames of video, and receive one or more actions corresponding to the one or more images and/or frames of video according to an outcome of processing of the one or more images and/or frames of video by the application server 438.
  • the one or more resource allocation functions/modules 440 may allocate resources, including but not limited to, a processor or a computational resource, a memory, network usage or bandwidth, and so on.
  • the application server 438 may be executing a server application. In some cases, there may be more than one instance of the server application executing on the application server 438.
  • a host server supporting the backend may be a cluster of different computing resources, which may be geographically separated from one another.
  • the application server and the client device may be referred to, simply, as “computing resources” configured to execute purpose-configured software (e.g., the client application and the backend application).
  • computing resource (along with other similar terms and phrases, including, but not limited to, “computing device” and “computing network”) may be used to refer to any physical and/or virtual electronic device or machine component, or set or group of interconnected and/or communicably coupled physical and/or virtual electronic devices or machine components, suitable to execute or cause to be executed one or more arithmetic or logical operations on digital data.
  • Example computing resources contemplated herein include, but are not limited to: single or multi-core processors; single or multi-thread processors; purpose-configured co-processors (e.g., graphics processing units, motion processing units, sensor processing units, and the like); volatile or non-volatile memory; application-specific integrated circuits; field-programmable gate arrays; input/output devices and systems and components thereof (e.g., keyboards, mice, trackpads, generic human interface devices, video cameras, microphones, speakers, and the like); networking appliances and systems and components thereof (e.g., routers, switches, firewalls, packet shapers, content filters, network interface controllers or cards, access points, modems, and the like); embedded devices and systems and components thereof (e.g., system(s)-on-chip, Internet-of-Things devices, and the like); and industrial control or automation devices and systems and components thereof (e.g., programmable logic controllers, programmable relays, supervisory control and data acquisition controllers, and the like).
  • FIGs. 5A-5G depict various example use cases or practical applications of embodiments described herein.
  • FIG. 5A depicts an example operation of a retrieval system 500a as described herein.
  • the figure includes a client device 502, such as a smartphone, shown in front of a television 504.
  • a program with a current scene 508 may have a particular logo or symbol 506 displayed on a display screen of the television 504.
  • the particular logo or symbol may suggest to a user present in the room that there may be one or more actions associated with particular content being displayed on the television 504.
  • the logo may be animated or static, may be positioned suitably anywhere within the television active display area, may be added over the media content by a broadcaster and/or may be encoded into the media itself, and so on.
  • the logo may be rendered by a set top box. In some cases, the logo may be colored to maximize contrast with surrounding pixels.
  • a user may use an input system of the client device 502 to launch an application that may leverage a camera (CMOS, CCD, neuromorphic, and so on) of the client device 502 to capture one or more images and/or one or more frames of the program as a video (“content”).
  • the application executing on the client device may then communicate the content to a retrieval system, as described herein.
  • the retrieval system may then process the content to identify one or more actions associated with the content received from the client device 502, and transmit them to the client device for the user to select and take an action.
  • the camera of the client device 502 may capture an image including the current scene 508 being displayed on the display screen of the television 504, a television frame, and other objects, such as a portion of a television console, and so on.
  • the particular logo or symbol 506 may not (in some embodiments) carry any information; in other embodiments the logo may encode information by its structure.
  • one or more actions may be associated with a particular object within a scene rendered on the television.
  • when a user uploads one or more images taken using a client application executing on the client device 502, one or more objects may be identified from the uploaded one or more images by a retrieval system as described herein.
  • the retrieval system may identify one or more actions that may be associated with an object recognized within the active display area of the television.
  • the one or more actions retrieved from a database may be communicated to the client device 502 for the user to select and execute.
  • a user 510 using a client device 512 may capture one or more images or videos of a particular program being displayed on a display screen of a television 514.
  • a currently displayed scene may include a particular logo or symbol 518, which may suggest to the user 510 that there are one or more actions associated with the program being displayed on the television.
  • the user may retrieve one or more actions associated with the program by taking one or more images or videos using a camera of the client device 512 via a client application executing on the client device 512.
  • the user may capture image and/or video from any angle and/or under any lighting condition.
  • relevant objects from the one or more images and/or video frames may be identified, and one or more actions associated with the relevant objects may be identified.
  • one or more actions may have been associated with enrichable content uploaded by a user, and also with content that may be generated having different properties, such as brightness, contrast, temperature, tint, hue, gamma, color, blur, color tint, an angle at which an image may be taken, and/or an aspect ratio, and so on. Accordingly, the content uploaded by the user 510 is not required to be exactly identical to the content used to associate one or more actions with the content.
  • a view 500c illustrates a client application view of a client application executing on a client device 520.
  • the client application executing on the client device 520 may use a camera of the client device for taking one or more images and/or videos of a program currently being displayed on a television, shown as a view 522.
  • a notification 524 or other visual or haptic indication of detecting enrichable content may be displayed or triggered.
  • the notification may be displayed based on detection of the particular logo or symbol 518 in a known location. Alternatively, the notification may be displayed whenever a camera is used by the client application to capture one or more images and/or videos, whether or not one or more actions are associated with the particular content.
  • a view 500d illustrates one or more actions retrieved by a retrieval system corresponding to the one or more images and/or videos uploaded to the retrieval system by the user 510.
  • one or more actions displayed on a client device 526 may include a movie trailer shown as a picture-in-picture or an overlay, buy movie tickets 530, buy merchandise 532, use digital content as digital twin 534, and/or other actions 536.
  • the retrieved actions, in the present example, may be associated with, for example, an upcoming movie, and the user may have taken one or more images and/or videos of a particular movie trailer or advertisement being displayed on the television 514. Accordingly, any number of actions and/or any type of actions may be associated with the enrichable content for retrieval by a user.
  • FIG. 5E illustrates a use case 500e in which functionality of a QR code can be enabled even though the QR code itself is unreadable by a particular user’s client device. As noted above, for successful scanning of a QR code, the user would ordinarily be required to be within a threshold distance of the QR code.
  • a user may capture an image of a building having an entrance door 540 with a QR code 542 displayed.
  • the features and objects of the entrance door 540, along with optionally supplemental information such as GPS information, can be used by an intake system to uniquely identify the entrance door 540 and/or the building.
  • the retrieval system may identify that the recognized building is associated with a QR code.
  • the user may be notified with a graphical user interface element 544 that a distant QR code has been detected and a user may execute an associated action by selecting a second user interface element 546 to trigger an action associated with the QR code 542.
  • FIG. 5F illustrates a use case 500f, in which a distant QR code is occluded, hidden, or blocked.
  • the distant QR code may also have been damaged. Accordingly, a user may capture an image, using a camera of a client device 548 through a client application executing on the client device 548, of the building in which the entrance door 540 is seen with the QR code 542 blocked, for example, by a person standing in front of the QR code 542.
  • a retrieval system may identify, based on the image transmitted to it from the client device 548, that the user has transmitted an image of the building, that the building has a QR code, and a corresponding action associated with the QR code. As a result, a notification may be displayed to the user that an occluded QR code has been detected, and the user can select a user interface element to trigger an action associated with the QR code 542.
  • FIG. 5G illustrates yet another use case 500g, in which an out-of-range NFC and/or a Bluetooth low energy beacon may be detected as within an image of a particular already-scanned scene.
  • a user may associate one or more actions that may be accessible only when a client device is within a specific distance range supported by near-field communication (NFC) and/or Bluetooth low energy (BLE).
  • a user may upload enrichable content to an intake system and associate it with actions available when a client device is within NFC and/or BLE range.
  • an NFC tag 562 may be present on a table 560, and a user may trigger one or more actions by enabling an NFC mode on a client device.
  • a user having a client device 558 that is not within the NFC range may capture an image of the table 560, for example, and transmit the image to a retrieval system.
  • the retrieval system may identify one or more objects in the image uploaded from the client device 558 and identify that the user has transmitted an image that has one or more actions available through NFC and/or BLE. Accordingly, a notification 564 may be displayed on the client device 558 indicating that an out-of-range NFC tag and/or BLE beacon has been detected, so that the user may select and execute functions available via NFC and/or BLE without being within the particular NFC and/or BLE range of the NFC tag and/or BLE beacon.
  • FIG. 6 depicts an example user interface of a client application executing on a client device, in accordance with some embodiments.
  • a client application executing on a client device 602 may provide one or more menu options to a user while communicating with an intake system described herein in accordance with some embodiments.
  • a user may upload enrichable content including one or more images and/or video frames of a video and associate one or more actions to the uploaded enrichable content.
  • a client application executing on the client device 602 may present to a user of the client device options, such as upload or scan content 604, set action(s) 606, add action(s) 608, update action(s) 610, delete action(s) 612, and/or delete content 614.
  • the user may save or transmit the enrichable content and/or actions by selecting a save option 616.
  • the upload or scan content option 604 may allow the user to access a camera of the client device to capture an image and/or a video of a particular scene and/or object.
  • the user may then associate one or more actions using set action(s) 606 options.
  • the set of actions may include, for example, purchasing a ticket, playing a trailer, purchasing merchandise, purchasing/leasing digital content for use in the metaverse, and so on.
  • the set of actions may also include an action associated with a specific QR code, an NFC tag, and/or a BLE beacon.
  • the user may add additional actions using add action(s) 608 options and/or update or delete actions using update action(s) 610 or delete action(s) 612 when the user has previously associated one or more actions with the particular enrichable content.
  • the user may delete enrichable content using delete content 614 option, and thereby, could also remove all actions that may have been associated with the deleted enrichable content.
  • the save option 616 may allow the user to transmit the enrichable content and corresponding user action associated with the enrichable content to an intake system described herein in accordance with some embodiments.
  • FIGs. 7A-7B depict an example of object identification and generation of object clusters, in accordance with some embodiments.
  • a user may be located anywhere in a room in which a television 704 is present and showing a program having a scene as shown in the view 700a.
  • an image captured by a user may include many objects, such as a television 704, a television console, and so on.
  • raw imagery captured by the client device 702 can be subdivided into multiple sections, and each section can be provided to a retrieval system as described herein and/or to a crude object detector, such as described herein.
  • segments of an image can be sequentially scanned, starting centrally and working outward.
  • only a central portion 706 of the image may be analyzed for identifying one or more objects in the image. Accordingly, extraneous information included in content may be excluded from processing by the intake system or retrieval system.
  • the central portion 706 of the image may be further divided into a number of sections 706A-706I for identifying one or more objects in the central portion 706, and spatial relationship between the one or more objects identified in the central portion 706.
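  • A minimal sketch of this subdivision follows: it crops a central fraction of the image and splits the crop into a three-by-three grid of sections echoing the 706A-706I layout; the 0.6 crop fraction and the grid size are assumptions for this example.

```python
# A sketch: crop a central fraction of the image and split it into a
# rows-by-cols grid of section boxes.
def central_sections(width, height, fraction=0.6, rows=3, cols=3):
    """Return (x0, y0, x1, y1) boxes for a rows x cols grid over the
    central `fraction` of a width x height image."""
    cw, ch = int(width * fraction), int(height * fraction)
    ox, oy = (width - cw) // 2, (height - ch) // 2   # crop origin
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((ox + c * cw // cols,
                          oy + r * ch // rows,
                          ox + (c + 1) * cw // cols,
                          oy + (r + 1) * ch // rows))
    return boxes

print(central_sections(1920, 1080))  # nine section boxes, e.g. 706A-706I
```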
  • object clusters may be generated as described herein by the intake system and/or retrieval system.
  • FIG. 8 depicts a flowchart corresponding to example operations of a method being performed by an intake system, in accordance with some embodiments.
  • the method 800 includes the operation 802 at which enrichable content and a set of actions may be received at an intake system, as shown herein in accordance with some embodiments in FIG. 1, FIG. 2, and/or FIG. 4A.
  • the enrichable content received at the intake system may include one or more images and/or videos taken using a camera of a client device, such as the client device 102, 202, and/or 402.
  • a user of the client device 102, 202, and/or 402 may launch an application and/or a web interface to take the one or more images and/or video using the camera of the client device.
  • a user may transmit the enrichable content, and one or more actions associated with the enrichable content using an interface shown in FIG. 6.
  • the enrichable content may be received at the intake system via a gateway over a network.
  • object clusters may be generated in which one or more objects in the received enrichable content may be identified.
  • a predetermined area of the enrichable content may be analyzed for determining one or more objects.
  • the predetermined area may be a central portion of an image.
  • the predetermined area may be selected based on blurring of an image. Accordingly, a picture area without blurring may be selected for analysis to generate object clusters as described herein.
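  • By way of example only, a simple gradient-energy score could stand in for a production blur detector when choosing which region to analyze; the scoring function in the sketch below is an illustrative assumption.

```python
# An illustrative blur screen: higher gradient energy suggests a sharper,
# non-blurred region better suited for object-cluster generation.
def sharpness(gray):
    """gray: 2D list of pixel intensities; higher score means sharper."""
    total, n = 0.0, 0
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            dx = gray[y][x + 1] - gray[y][x]
            dy = gray[y + 1][x] - gray[y][x]
            total += dx * dx + dy * dy
            n += 1
    return total / n if n else 0.0

def pick_sharpest(regions):
    """regions: dict of name -> 2D intensity list; returns the sharpest name."""
    return max(regions, key=lambda name: sharpness(regions[name]))

regions = {"center": [[0, 9], [9, 0]], "edge": [[5, 5], [5, 5]]}
print(pick_sharpest(regions))  # center
```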
  • an identifier of a number of objects in object clusters generated based on the enrichable content received at the operation 802 may be generated.
  • one or more objects of the object clusters may be converted into a single vector representing the one or more objects and/or object clusters.
  • the generated vector may be normalized. The generated vector may then be stored as a database record in a database, for example, an index database.
  • a set of actions received at the operation 802 from a client device using a user interface shown in FIG. 6 may be associated with a vector generated at the operation 806 above, and stored as a database record in a database, such as an action database, along with an index of a database record storing the generated vector.
  • the action database and the index database may be a single database.
  • FIG. 9A depicts a flowchart corresponding to example operations of a method being performed by a retrieval system, in accordance with some embodiments.
  • in a method 900a, at the operation 902, enrichable content may be received at a retrieval system, as shown herein in accordance with some embodiments in FIG. 1, FIG. 3, and/or FIG. 4B.
  • the enrichable content received at the retrieval system may include one or more images and/or videos taken using a camera of a client device, such as the client device 102, 302, and/or 424.
  • a user of the client device 102, 302, and/or 424 may launch an application and/or a web interface to take the one or more images and/or video using the camera of the client device.
  • a user may transmit the enrichable content to the retrieval system via a gateway over a network.
  • object clusters may be generated in which one or more objects in the received scene image may be identified.
  • a predetermined area of the enrichable content may be analyzed for determining one or more objects.
  • the predetermined area may be a central portion of an image.
  • the predetermined area may be selected based on blurring of an image. Accordingly, a picture area without blurring may be selected for analysis to generate object clusters as described herein.
  • an identifier of a number of objects in object clusters generated based on the enrichable content received at the operation 902 may be generated.
  • one or more objects of the object clusters may be converted into a single vector representing the one or more objects of the object clusters.
  • the generated vector may be normalized, as described herein, in accordance with criteria including one or more of: a data size of a vector, a particular physical dimension of a vector, and so on. The generated vector may then be compared with one or more stored vectors in a database, for example, an index database, at the operation 906.
  • each of the one or more vectors stored in the index database may be checked to determine whether it corresponds to the same enrichable content and/or content generated having different properties based on the received enrichable content. If it is determined that the one or more vectors stored in the index database and found to match a vector generated at the operation 906 all correspond to the same enrichable content and/or content generated having different properties based on the received enrichable content, then an index corresponding to any of the one or more vectors may be identified for further processing at the operation 908.
  • an index corresponding to a vector having the best match may be identified for further processing at the operation 908.
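  • The selection logic of the two items above might be sketched as follows; the data shapes (a list of (index, similarity) matches and an index-to-content map) are assumptions for illustration.

```python
# A sketch of index selection: if every matching stored vector maps to the
# same content, any of their indexes will do; otherwise take the best match.
def select_index(matches, content_of):
    """matches: list of (index, similarity); content_of: dict index -> content id."""
    if not matches:
        return None
    contents = {content_of[idx] for idx, _ in matches}
    if len(contents) == 1:          # all candidates refer to the same content,
        return matches[0][0]        # so any of their indexes may be used
    return max(matches, key=lambda m: m[1])[0]   # otherwise take the best match

print(select_index([(3, 0.97), (9, 0.95)], {3: "sceneA", 9: "sceneA"}))  # 3
```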
  • a set of actions associated with an index of the vector stored in the index database, as determined at the operation 906 above, may be retrieved from an actions database.
  • the retrieved set of actions may be returned to the user for displaying on the user’s client device. The user may then select one or more actions for execution.
  • FIG. 9B depicts another flowchart corresponding to example operations of a method being performed by a retrieval system, in accordance with some embodiments.
  • the set of actions retrieved at the operation 908 may include an action identifying whether the enrichable content is available for use in the metaverse. For example, a user may have uploaded an image of particular clothing, which may be available not only as a physical purchase but also as a digital purchase or lease.
  • one or more options related to purchase and/or lease of the digital content in the metaverse may be presented to a user.
  • FIG. 10 depicts a flowchart corresponding to example operations of a method being performed by a retrieval system, in particular, for an occluded QR code and/or an out-of-range NFC and/or BLE beacon, in accordance with some embodiments.
  • enrichable content may be received at a retrieval system, as shown herein in accordance with some embodiments in FIG. 1, FIG. 3, and/or FIG. 4B.
  • the enrichable content received at the retrieval system may include one or more images and/or videos taken using a camera of a client device, such as the client device 102, 302, and/or 424.
  • a user of the client device 102, 302, and/or 424 may launch an application and/or a web interface to take the one or more images and/or video using the camera of the client device.
  • a user may transmit the enrichable content to the retrieval system via a gateway over a network.
  • object clusters may be generated in which one or more objects in the received enrichable content may be identified.
  • a predetermined area of the enrichable content may be analyzed for determining one or more objects.
  • the predetermined area may be a central portion of an image.
  • the predetermined area may be selected based on blurring of an image. Accordingly, a picture area without blurring may be selected for analysis to generate object clusters as described herein.
  • an identifier of a number of objects in object clusters generated based on the enrichable content received at the operation 1002 may be generated.
  • one or more objects of the object clusters may be converted into a single vector representing the one or more objects of the object clusters.
  • the generated vector may be normalized, as described herein, in accordance with criteria including one or more of: a data size of a vector, a particular physical dimension of a vector, and so on. The generated vector may then be compared with one or more stored vectors in a database, for example, an index database, at the operation 1006.
  • each of the one or more vectors stored in the index database may be checked to determine whether it corresponds to the same enrichable content and/or content generated having different properties based on the received enrichable content. If it is determined that the one or more vectors stored in the index database and found to match a vector generated at the operation 1006 all correspond to the same enrichable content and/or content generated having different properties based on the received enrichable content, then an index corresponding to any of the one or more vectors may be identified for further processing at the operation 1008.
  • an index corresponding to a vector having the best match may be identified for further processing at the operation 1008.
  • a set of actions associated with an index of the vector stored in the index database, as determined at the operation 1006 above, may be retrieved from an actions database.
  • the retrieved set of actions may be analyzed, and it may be determined that the enrichable content received at the operation 1002 corresponds with a QR code, an NFC tag, and/or a BLE beacon.
  • the user may be within an NFC and/or BLE range or may be outside of the NFC and/or BLE beacon range.
  • the QR code may be scannable or may be occluded, hidden, and/or damaged.
  • a user may be notified of the occluded QR code and/or the out-of-range NFC tag and/or BLE beacon.
  • a user input may be received in which the user may have selected an option to execute an action associated with the occluded QR code and/or the out-of-range NFC tag and/or BLE beacon. Accordingly, a user can execute functions associated with an out-of-range NFC tag and/or BLE beacon even when the user is not within an NFC and/or BLE beacon range. Similarly, a user can execute functions associated with a QR code even when the QR code is unscannable, occluded, hidden, and/or damaged.
  • each microservice may be configured to provide data output and receive data input across an encrypted data channel.
  • each microservice may be configured to store its own data in a dedicated encrypted database; in others, microservices can store encrypted data in a common database; whether such data is stored in tables shared by multiple microservices or whether microservices may leverage independent and separate tables/schemas can vary from embodiment to embodiment.
  • the phrase “at least one of” preceding a series of items, with the term “and” or “or” to separate any of the items, modifies the list as a whole, rather than each member of the list.
  • the phrase “at least one of” does not require selection of at least one of each item listed; rather, the phrase allows a meaning that includes at a minimum one of any of the items, and/or at a minimum one of any combination of the items, and/or at a minimum one of each of the items.
  • the phrases “at least one of A, B, and C” or “at least one of A, B, or C” each refer to only A, only B, or only C; any combination of A, B, and C; and/or one or more of each of A, B, and C.
  • an order of elements presented for a conjunctive or disjunctive list provided herein should not be construed as limiting the disclosure to only that order provided.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are a system and method for ingesting enrichable content and at least one associated action. At least one object is identified from the received enrichable content, and object clusters are generated. Based on each object of said clusters, an identifier of the object clusters is generated and saved as a record in a database. The at least one action corresponding to the enrichable content is stored in at least one database and associated with an index of a database record representing the identifier of the object clusters.
PCT/US2023/021728 2022-05-10 2023-05-10 Systèmes et procédés d'ingestion et de traitement de contenu enrichissable WO2023220172A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263340101P 2022-05-10 2022-05-10
US63/340,101 2022-05-10

Publications (1)

Publication Number Publication Date
WO2023220172A1 true WO2023220172A1 (fr) 2023-11-16

Family

ID=88730910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/021728 WO2023220172A1 (fr) 2022-05-10 2023-05-10 Systèmes et procédés d'ingestion et de traitement de contenu enrichissable

Country Status (1)

Country Link
WO (1) WO2023220172A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140247278A1 (en) * 2013-03-01 2014-09-04 Layar B.V. Barcode visualization in augmented reality
US9002831B1 (en) * 2011-06-10 2015-04-07 Google Inc. Query image search
US20160117061A1 (en) * 2013-06-03 2016-04-28 Miworld Technologies Inc. System and method for image based interactions
US20170286901A1 (en) * 2016-03-29 2017-10-05 Bossa Nova Robotics Ip, Inc. System and Method for Locating, Identifying and Counting Items

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002831B1 (en) * 2011-06-10 2015-04-07 Google Inc. Query image search
US20140247278A1 (en) * 2013-03-01 2014-09-04 Layar B.V. Barcode visualization in augmented reality
US20160117061A1 (en) * 2013-06-03 2016-04-28 Miworld Technologies Inc. System and method for image based interactions
US20170286901A1 (en) * 2016-03-29 2017-10-05 Bossa Nova Robotics Ip, Inc. System and Method for Locating, Identifying and Counting Items

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHWANI KUMAR; ZUOPENG JUSTIN ZHANG; HONGBO LYU: "Object detection in real time based on improved single shot multi-box detector algorithm", EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, BIOMED CENTRAL LTD, LONDON, UK, vol. 2020, no. 1, 17 October 2020 (2020-10-17), London, UK, pages 1 - 18, XP021282983, DOI: 10.1186/s13638-020-01826-x *
LAI ET AL.: "Instance-aware hashing for multi-label image retrieval", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 25, no. 6, 2016, pages 2469 - 2479, XP011606255, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/abstract/document/7438833> [retrieved on 20230629], DOI: 10.1109/TIP.2016.2545300 *

Similar Documents

Publication Publication Date Title
US10900772B2 (en) Apparatus and methods for facial recognition and video analytics to identify individuals in contextual video streams
US10140549B2 (en) Scalable image matching
US10891671B2 (en) Image recognition result culling
US9436883B2 (en) Collaborative text detection and recognition
US9424461B1 (en) Object recognition for three-dimensional bodies
CN105659286B (zh) 自动化图像裁剪和分享
JP5621897B2 (ja) 処理方法、コンピュータプログラム及び処理装置
US10169684B1 (en) Methods and systems for recognizing objects based on one or more stored training images
US20170078756A1 (en) Visual hash tags via trending recognition activities, systems and methods
CN106663196A (zh) 视频中的计算机显著人物识别
US20140078174A1 (en) Augmented reality creation and consumption
JP2014170314A (ja) 情報処理システム、情報処理方法およびプログラム
US9904866B1 (en) Architectures for object recognition
US11302045B2 (en) Image processing apparatus, image providing apparatus,control methods thereof, and medium
US12033190B2 (en) System and method for content recognition and data categorization
US10600060B1 (en) Predictive analytics from visual data
JP2013195725A (ja) 画像表示システム
US20180039626A1 (en) System and method for tagging multimedia content elements based on facial representations
WO2023220172A1 (fr) Systèmes et procédés d'ingestion et de traitement de contenu enrichissable
Bouma et al. WPSS: Watching people security services
US10860821B1 (en) Barcode disambiguation
US9922052B1 (en) Custom image data store
US9275394B2 (en) Identifying user-target relation
Savadatti-Kamath Video analysis and compression for surveillance applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23804210

Country of ref document: EP

Kind code of ref document: A1