US11494377B2 - Multi-detector probabilistic reasoning for natural language queries - Google Patents
Multi-detector probabilistic reasoning for natural language queries
- Publication number
- US11494377B2 (application US16/819,947)
- Authority
- US
- United States
- Prior art keywords
- query
- detector
- dag
- detectors
- ontology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/732—Query formulation
- G06F16/7343—Query language or query format
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G06N7/005—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
Definitions
- the present invention relates to video processing and more particularly to searching image media.
- a system for solving queries on image data.
- the system includes a processor device coupled to a memory device.
- the system includes a detector manager with a detector application programming interface (API) to allow external detectors to be inserted into the system by exposing capabilities of the external detectors and providing a predetermined way to execute the external detectors.
- An ontology manager exposes knowledge bases regarding ontologies to a reasoning engine.
- a query parser transforms a natural query into a query directed acyclic graph (DAG).
- the system includes a reasoning engine that uses the query DAG, the ontology manager and the detector API to plan an execution list of detectors.
- the reasoning engine uses the query DAG, a scene representation DAG produced by the external detectors and the ontology manager to answer the natural query.
- a method for solving queries on image data.
- the method includes implementing a detector manager with a detector application programming interface (API) to allow external detectors to be inserted into the system by exposing capabilities of the external detectors and providing a predetermined way to execute the external detectors.
- the method includes implementing an ontology manager that exposes knowledge bases regarding ontologies to a reasoning engine.
- the method includes implementing a query parser that transforms a natural query into a query directed acyclic graph (DAG).
- the method includes implementing a reasoning engine that uses the query DAG, the ontology manager and the detector API to plan an execution list of detectors.
- the reasoning engine uses the query DAG, a scene representation DAG produced by the external detectors and the ontology manager to answer the natural query.
- FIG. 1 is a block diagram showing a high-level system for multi-detector probabilistic reasoning, in accordance with an embodiment of the present invention
- FIG. 2 is a block diagram illustrating a flowchart of a high-level system for multi-detector probabilistic reasoning, in accordance with an embodiment of the present invention
- FIG. 3 is a block diagram illustrating components of a probabilistic logic engine, in accordance with an embodiment of the present invention
- FIG. 4 is a block diagram illustrating a query parsed into a directed acyclic graph (DAG), in accordance with the present principles
- FIG. 5 is a block diagram illustrating a DAG of objects, attributes and relations in a scene, in accordance with an embodiment of the present invention
- FIG. 6 is a block diagram illustrating an image result of a query, in accordance with the present principles.
- FIG. 7 is a block diagram illustrating a method for solving queries on image data, in accordance with an embodiment of the present invention.
- the systems include a detector manager, an ontology manager, a query parser, and a reasoning engine.
- the detector manager has a detector application programming interface (API) that allows external detectors to be inserted into the system by exposing capabilities of the detectors and providing a predetermined way to execute the detectors.
- the ontology manager exposes knowledge bases regarding ontologies to the reasoning engine.
- the query parser transforms each natural query into a query directed acyclic graph (DAG).
- the reasoning engine uses the query DAG, the ontology manager and the detector API to plan an execution list of detectors.
- the reasoning engine can then use the query DAG, a scene representation DAG produced by the detectors and the ontology manager to answer the natural query.
- the system can implement multi-detector probabilistic reasoning.
- the system can provide immediate answers to complex queries on vast amounts of surveillance data.
- the system provides a flexible and expandable probabilistic logic framework that goes beyond end-to-end learning approaches by leveraging these approaches in concert with ontologies to solve complex image/video queries.
- the system is designed to be efficiently tailored to individual needs of specific applications and therefore can be deployed in a short amount of time without requiring a full-fledged supervised training cycle.
- Embodiments described herein may be entirely hardware, entirely software, or may include both hardware and software elements.
- the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- the medium may include a computer-readable storage medium such as a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random-access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
- Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
- the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
- I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- FIG. 1 is a block diagram showing an exemplary processing system 100 for multi-detector probabilistic reasoning, in accordance with an embodiment of the present invention.
- the processing system 100 includes a set of processing units (e.g., CPUs) 101 , a set of GPUs 102 , a set of memory devices 103 , a set of communication devices 104 , and a set of peripherals 105 .
- the CPUs 101 can be single or multi-core CPUs.
- the GPUs 102 can be single or multi-core GPUs.
- the one or more memory devices 103 can include caches, RAMs, ROMs, and other memories (flash, optical, magnetic, etc.).
- the communication devices 104 can include wireless and/or wired communication devices (e.g., network (e.g., WIFI, etc.) adapters, etc.).
- the peripherals 105 can include a display device, a user input device, a printer, an imaging device, and so forth. Elements of processing system 100 are connected by one or more buses or networks (collectively denoted by the figure reference numeral 110 ).
- memory devices 103 can store specially programmed software modules to transform the computer processing system into a special purpose computer configured to implement various aspects of the present invention.
- special purpose hardware (e.g., Application Specific Integrated Circuits, Field Programmable Gate Arrays (FPGAs), and so forth) can also be used to implement various aspects of the present invention.
- memory devices 103 store program code for implementing one or more of the following: application programming interfaces (APIs) 130 , a probabilistic logic (for example, reasoning) engine 140 , a natural language query parser 150 , ontologies 160 , etc.
- the ontologies 160 form a knowledge base and encode useful knowledge into logic terms.
- processing system 100 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements.
- various other input devices and/or output devices can be included in processing system 100 , depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art.
- various types of wireless and/or wired input and/or output devices can be used.
- additional processors, controllers, memories, and so forth, in various configurations can also be utilized.
- FIG. 2 illustratively depicts a block diagram of a high-level system 200 for multi-detector probabilistic reasoning, in accordance with an embodiment of the present invention. Although a particular number of each type of component and/or layer of the system is illustrated, it should be understood that more or fewer of each component and/or layer can be included.
- system 200 includes API 130 , probabilistic logic engine 140 , ontologies 160 , a query dashboard interface 205 , an ontologies dashboard interface 210 , a detector library 220 , and detectors 230 .
- System 200 can combine the components into a flexible and expandable software platform.
- detector API 130 can encapsulate detectors 230 , such as object detectors 240 (for example, off-the-shelf trained models such as You Only Look Once (YOLO) 242 , Regions with convolutional neural networks (R-CNN) 244 , Fast R-CNN, Faster R-CNN, etc.).
- Probabilistic logic engine 140 includes a natural language query parser and can parse the natural language query received into a directed acyclic graph (DAG) which identifies language elements and groups them hierarchically, such as described below with respect to FIG. 3 .
- a directed acyclic graph is a finite directed graph with no directed cycles.
- a DAG has finitely many vertices and edges, with each edge directed from one vertex to another, such that there is no way to start at any particular vertex and follow a consistently-directed sequence of edges that eventually loops back to that particular vertex.
- a DAG is a directed graph that has a topological ordering, a sequence of the vertices such that every edge is directed from earlier to later in the sequence.
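The topological-ordering property can be checked mechanically. The following is a minimal sketch using Python's standard `graphlib` module; the example edges (a relation pointing at its objects, an object pointing at its attribute) are an assumed layout for illustration and are not taken from the patent figures.

```python
from graphlib import TopologicalSorter, CycleError


def topological_order(edges):
    """Return the vertices so that every edge (u, v) has u before v.

    Raises ValueError if the edges contain a directed cycle, i.e. if the
    graph is not a DAG.
    """
    sorter = TopologicalSorter()
    for src, dst in edges:
        # TopologicalSorter.add(node, *predecessors): src must precede dst.
        sorter.add(dst, src)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("graph contains a directed cycle") from exc


# Hypothetical query graph for "a person near a white vehicle".
edges = [("near", "person"), ("near", "vehicle"), ("vehicle", "white")]
order = topological_order(edges)
print(order)
```

Any order returned here places "near" before both objects and "vehicle" before "white"; adding an edge back from "white" to "near" would make the function raise.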
- Ontologies 160 encode useful knowledge into logic terms ( 150 ). Ontologies 160 can extend the capabilities of the detectors 230 by describing higher level concepts in terms of basic objects, thus linking natural language queries to the objects that the detectors 230 can detect.
- Query dashboard interface 205 provides an interface for the user to enter the query and, in some embodiments, optionally displays the results of query parsing with an interface to correct the query if needed.
- the ontologies dashboard interface 210 provides a user interface by which a user can specify which ontologies to use, and, optionally, add new domain knowledge.
- Detector library 220 can link concepts from the query with those that can be detected by the detectors 230 .
- Concepts can include objects, attributes and relations.
- Detectors 230 can include object detectors 240 (for example, different types of object detectors, such as YOLO 242 , R-CNN 244 , etc.), attribute detectors 250 (for example, different attributes, such as color 252 , shape 254 , etc.), and relation detectors 260 (for example, different relations, such as near 262 , behind 264 , etc.).
- the example embodiments can leverage (for example, existing, off-the-shelf, proprietary, etc.) trained models (for example, YOLO, R-CNN, etc.) and integrate the models into a probabilistic logic framework (thereby expanding their usefulness).
- Models can be added to the framework by encapsulating the models into an API 130 that provides a means for a detector 230 to advertise its capabilities in detecting objects, actions, relations or attributes.
- a user-inputted natural language query can then be parsed and encoded into a set of basic logic facts using the ontologies 160 and the detectors' 230 capabilities.
- the probabilistic logic engine 140 can solve the set of facts and return the top matches.
- the top matches can be determined based on one or more threshold values associated with the objects, actions, relations and attributes, as well as predetermined numbers used to limit (for example, “cap”) a volume of answers to a query.
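Selecting top matches with a threshold and a cap can be sketched as follows. The function name, the default threshold of 0.5, the cap of 2 and the sample scores are all assumptions chosen for illustration; the source does not specify concrete values.

```python
def top_matches(scored_answers, threshold=0.5, cap=2):
    """Keep answers whose probability reaches the threshold, best first,
    and limit the result to at most `cap` entries."""
    kept = [(answer, p) for answer, p in scored_answers if p >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:cap]


# Hypothetical (image, probability) pairs produced by a reasoning step.
candidates = [
    ("image_01", 0.91),
    ("image_02", 0.32),
    ("image_03", 0.77),
    ("image_04", 0.64),
]
best = top_matches(candidates)
print(best)
```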
- probabilistic logic engine 140 can be used to solve queries without specific training.
- Detectors 230 can include pre-trained, off-the-shelf models (and/or proprietary, or trained models, etc.) that are integrated into system 200 via an API 130 that allows the detectors 230 to advertise their capabilities.
- Ontologies 160 extend the capabilities of the detectors 230 by describing higher level concepts in terms of the basic objects and thereby link the natural language query to the objects that are detectable by the detectors 230 .
- Object relations and attributes are handled in a similar way via detectors 230 and are deduced from the query by NLP parsers.
- the system 200 can be applied to realize (determine answers to, results for, etc.) queries efficiently and help retrieve information from video streams for applications (such as security in public spaces). For example, surveillance cameras can produce constant streams of video.
- the system 200 can be applied to finding useful content, for example, in instances in which most of the content is uninteresting (or not relevant, etc.).
- the system 200 can find relevant information from video streams in a manner that avoids tedious, time consuming and error-prone work for human operators.
- the information that the system 200 can be directed to finding can include information based on high-level queries (for example, search instructions) regarding particular objects and/or actions that occur in the video streams. For example, the information hidden in the huge amount of video that an operator wants to retrieve can generally be described in high-level queries, such as “find people falling”, “find animal crossing the road”, “find tandem motorbikes”, etc.
- the system 200 includes a flexible and expandable probabilistic logic framework that goes beyond end-to-end learning approaches by leveraging them in concert with ontologies 160 to solve complex image/video queries.
- the system 200 can be efficiently tailored to individual needs of specific applications and therefore can be deployed in a short amount of time without requiring a full-fledged supervised training cycle.
- FIG. 3 illustratively depicts a block diagram 300 of components of a probabilistic logic engine 140 and associated interfaces, in accordance with one embodiment of the present invention.
- probabilistic logic engine 140 interfaces with a user interface (hereinafter, “intf”) dashboard 304 , a detector API (D-API) 350 and a knowledge base (KB) API 352 .
- the user interface dashboard 304 can include, for example, a web server (for example, an ORBIT™ web server) that accesses (for example, receives) answers (via answer interface 326 and collection manager 324 ) to natural language queries 302 such as “find a person near a white car” and outputs the results of the processing (for example, particular media that match the query) by probabilistic logic engine 140 on a user display 306 , such as further described herein below with respect to FIG. 6 by way of non-limiting example.
- the user interface dashboard 304 can also access a vocabulary interface 364 that is connected to a detector manager 362 .
- the probabilistic logic engine 140 receives the natural language query 302 via a query interface 308 .
- the natural language query 302 is processed via a query processor 310 .
- the probabilistic logic engine 140 (also known as the “reasoning” engine) can parse the natural language query 302 into a query directed acyclic graph (QDAG) 312 which identifies language elements and groups them hierarchically.
- the QDAG 312 is then translated into logic statements that are appropriate for the particular probabilistic logic engine used.
- the QDAG 312 is also provided to the grounding unit 314 .
- FIG. 4 illustratively depicts a block diagram 400 of a query parsed into a directed acyclic graph (DAG), in accordance with an embodiment of the present invention.
- a query is provided to illustrate an implementation of the system 200 .
- the system 200 can receive a natural language query 302 such as: “Find a person near a white vehicle”.
- the natural language query 302 is parsed into a DAG 400 by the natural language processing module (query processor 310 ) (for example, leveraging structural information retrieved by an associated neural network for semantic extraction such as SENNA™).
- the DAG 400 includes different types of elements, such as relation 405 (near, type: relation), object 410 (person, type: object) and 415 (vehicle, type: object) and attribute 420 (white, type: attribute).
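One way to hold such a query DAG in memory is a small typed node structure. This is a minimal sketch: the `QueryNode` class, its field names, and the choice to hang the attribute under the object it modifies are assumptions for illustration, since the source only states that elements are typed and grouped hierarchically.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryNode:
    label: str                      # e.g. "near", "person", "white"
    kind: str                       # "relation", "object" or "attribute"
    children: List["QueryNode"] = field(default_factory=list)


# "Find a person near a white vehicle"
white = QueryNode("white", "attribute")
person = QueryNode("person", "object")
vehicle = QueryNode("vehicle", "object", [white])
near = QueryNode("near", "relation", [person, vehicle])


def walk(node, depth=0):
    """Yield (depth, node) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


for depth, node in walk(near):
    print("  " * depth + f"{node.label} (type: {node.kind})")
```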
- An ontology manager 370 can control access to the ontologies 160 .
- QDAG 312 is then converted into logic facts using rules that depend on the underlying logic engine used (for example, a mechanical translation).
- Ontologies 160 (such as proprietary ontologies 354 , for example, domain specific, semantic templates, detector assignments, etc., and/or public ontologies 356 , for example, synonym service, concept relations, ConceptNet Numberbatch™, etc.) can be accessed by ontology manager 370 via knowledge base (KB)-API 352 .
- Ontologies 160 can include a set of concepts and categories in a subject area or domain that shows their properties and the relations between them.
- the ontology manager 370 can then use ontologies 160 to expand high-level concepts into lower level ones.
- Concepts include objects, attributes and relations.
- the detector library 220 is used to link concepts from the parsed query with those that can be detected by the detectors 230 .
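Expanding a high-level concept into concepts that detectors can actually find can be sketched as a graph walk over the ontology. The toy ontology, the set of detectable concepts and the `expand` function are hypothetical and serve only to illustrate the idea.

```python
def expand(concept, ontology, detectable):
    """Return the detectable concepts reachable from `concept`.

    `ontology` maps a concept to its more specific concepts;
    `detectable` is the set of concepts some detector advertises.
    """
    found, stack, seen = [], [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in detectable:
            found.append(current)
        stack.extend(ontology.get(current, []))
    return sorted(found)


# Hypothetical knowledge base and detector vocabulary.
ontology = {
    "vehicle": ["car", "truck", "motorbike"],
    "motorbike": ["tandem motorbike"],
}
detectable = {"car", "truck", "person"}

print(expand("vehicle", ontology, detectable))
```

With this data, a query about a "vehicle" is linked to the "car" and "truck" detections, while "motorbike" is dropped because no registered detector covers it.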
- the query DAG, its dependencies and data types, as well as the retrieved ontology rules are translated to the corresponding logic language for grounding.
- the system 200 converts the two query objects ‘person’ and ‘vehicle’ to the following logic representation: ‘is_a(X, person)’, ‘is_a(Y, vehicle)’, attributes become ‘is_of(Y, white)’ and relations are encoded to ‘is_near(X,Y)’.
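The mechanical translation from query elements to logic terms can be sketched in a few lines. The `query_to_facts` helper and its argument layout are assumptions; only the output predicates (`is_a`, `is_of`, `is_near`) follow the representation given in the text.

```python
def query_to_facts(objects, attributes, relations):
    """Translate query elements into logic terms.

    objects:    list of object names, e.g. ["person", "vehicle"]
    attributes: list of (object, attribute) pairs
    relations:  list of (relation, subject, object) triples
    Supports up to six objects, one logic variable per object.
    """
    variables = dict(zip(objects, "XYZUVW"))
    facts = [f"is_a({variables[o]}, {o})" for o in objects]
    facts += [f"is_of({variables[o]}, {a})" for o, a in attributes]
    facts += [f"is_{r}({variables[s]},{variables[o]})" for r, s, o in relations]
    return facts


facts = query_to_facts(
    objects=["person", "vehicle"],
    attributes=[("vehicle", "white")],
    relations=[("near", "person", "vehicle")],
)
for fact in facts:
    print(fact)
```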
- ontology rules from public ontologies (e.g. ConceptNet) 356 and proprietary ontologies 354 are translated into corresponding logic language. For instance, in Problog language, such rules could be:
- the system 200 can include various object and relation extractors (detectors 230 ), each of which can each advertise detection capabilities.
- the object detectors 240 can include object detectors such as maskRCNN detector, YOLO detector, Hat detector, torso detector, etc.
- the attribute detectors 250 can include attribute detectors such as an RGB (red, green and blue) color detector, center color detector, HSV (hue, saturation, value) color detector, size detector, etc.
- the relation detectors 260 can include relation detectors such as a bounding box relation detector, a vicinity detector, etc.
- a detector manager 362 can access the D-API 350 to allow external detectors 230 to be inserted into the system by exposing their capabilities and providing a predetermined (for example, standardized, preset, etc.) way to execute them.
- the detectors 230 can be registered on the detector registry 360 .
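A detector interface of this kind can be sketched as a registry in which each detector advertises what it can detect and exposes a uniform callable. All class, field and detector names below are hypothetical; the source does not define the D-API at this level of detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Capability:
    kind: str      # "object", "attribute", "relation" or "action"
    concept: str   # e.g. "person", "white", "near"


@dataclass
class Detector:
    name: str
    capabilities: List[Capability]
    run: Callable[[object], list]   # image -> detections (format left open)


@dataclass
class DetectorRegistry:
    detectors: Dict[str, Detector] = field(default_factory=dict)

    def register(self, detector: Detector) -> None:
        self.detectors[detector.name] = detector

    def find(self, kind: str, concept: str) -> List[str]:
        """Names of all registered detectors advertising this capability."""
        wanted = Capability(kind, concept)
        return sorted(
            name for name, det in self.detectors.items()
            if wanted in det.capabilities
        )


registry = DetectorRegistry()
# Placeholder run functions stand in for real model inference.
registry.register(Detector("yolo", [Capability("object", "person"),
                                    Capability("object", "vehicle")],
                           run=lambda image: []))
registry.register(Detector("mask_rcnn", [Capability("object", "person")],
                           run=lambda image: []))
registry.register(Detector("vicinity", [Capability("relation", "near")],
                           run=lambda image: []))

print(registry.find("object", "person"))
```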
- the grounding unit (GU 314 ) and execution planner 330 determine the appropriate set of object 240 , attribute 250 and relation detectors 260 (via D-API 350 ) to be used in order to answer the query 302 .
- Grounding unit 314 takes the logic translations of the query, ontology rules and detector capabilities and identifies all possible ways the query might be answered using the existing detectors. The different combinations are recorded and combined into an execution list which is passed to the extraction engine 340 to run the corresponding detectors 230 to answer the query.
- the output of the grounding unit 314 is a preliminary execution list that can then be passed to the execution planner 330 for final assembly and enrichment via Input Backtracking, to ensure that a detector 230 which cannot run on the image bounding box can receive a bounding box containing an input concept type that it understands.
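Enumerating the ways a query could be answered can be sketched as a Cartesian product over the detectors that cover each query concept. This is an illustrative simplification and not the patent's grounding procedure; the `ground` function and the capability table are assumptions.

```python
from itertools import product


def ground(query_concepts, capabilities):
    """Return every assignment of one capable detector per query concept.

    `capabilities` maps a detector name to the set of concepts it can
    detect. An empty list means the query cannot be answered with the
    registered detectors.
    """
    options = []
    for concept in query_concepts:
        candidates = sorted(
            name for name, concepts in capabilities.items()
            if concept in concepts
        )
        if not candidates:
            return []
        options.append(candidates)
    return [dict(zip(query_concepts, combo)) for combo in product(*options)]


# Hypothetical detector capabilities.
capabilities = {
    "yolo": {"person", "vehicle"},
    "mask_rcnn": {"person"},
    "rgb_color": {"white"},
    "vicinity": {"near"},
}
plans = ground(["person", "vehicle", "white", "near"], capabilities)
for plan in plans:
    print(plan)
```

Here two plans result, differing only in which detector finds the person.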
- a bounding box can include coordinates of the rectangular border that fully encloses a digital image when it is placed over a bi-dimensional background.
- Backtracking can include a technique for solving problems recursively by building a solution incrementally, one piece at a time, removing solutions that fail to satisfy the constraints of the problem at any point of time. For example, detection of a first type of object can be contingent on detection of a second type of object in association with the first type of object.
- for example, when a hat detector requires a person bounding box as input, a person detector will be added to the execution list so that the extraction engine 340 can run both (for example, to identify a person and a corresponding hat).
- the detector can be specifically trained to only find hats on people's heads (for example, the contingency can also be relative position dependent).
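The enrichment step can be sketched as dependency resolution: each detector that needs a particular input gets its prerequisite detector scheduled first. The `requires` table and detector names are hypothetical, and this sketch covers only the input-dependency aspect described in the hat example.

```python
def backtrack_inputs(execution_list, requires):
    """Return an execution list in which every detector is preceded by
    the detector that produces the input it needs.

    `requires` maps a detector to the detector whose output (for example,
    a person bounding box) it consumes; detectors that run on the full
    image are absent from the mapping.
    """
    planned = []

    def add(detector, chain=()):
        if detector in planned:
            return
        if detector in chain:
            raise ValueError(f"circular input requirement at {detector!r}")
        needed = requires.get(detector)
        if needed is not None:
            add(needed, chain + (detector,))
        planned.append(detector)

    for detector in execution_list:
        add(detector)
    return planned


# Hypothetical requirement: the hat detector only works on person boxes.
requires = {"hat_detector": "person_detector"}
plan = backtrack_inputs(["hat_detector", "rgb_color"], requires)
print(plan)
```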
- the finalized execution list 332 is then passed to the extraction engine 340 which dispatches the right bounding boxes to the corresponding detectors 230 , while minimizing overhead to run via keeping track of detection history, caching previous detections (for example, using caching system 342 ) and batch processing in instances in which the detector 230 provides that feature.
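Caching previous detections can be sketched with a dictionary keyed by detector, image and bounding box, so a detector runs at most once per input. The `CachingExtractor` class, the fake detector and the key layout are assumptions for illustration.

```python
class CachingExtractor:
    """Runs detectors and remembers results per (detector, image, box)."""

    def __init__(self):
        self._cache = {}
        self.runs = 0   # number of real detector invocations

    def detect(self, name, detector, image_id, box):
        key = (name, image_id, box)
        if key not in self._cache:
            self.runs += 1
            self._cache[key] = detector(image_id, box)
        return self._cache[key]


def fake_person_detector(image_id, box):
    # Stand-in for a real model: returns (label, confidence, box) tuples.
    return [("person", 0.9, box)]


extractor = CachingExtractor()
full_image = (0, 0, 640, 480)
first = extractor.detect("person_detector", fake_person_detector,
                         "img_001", full_image)
second = extractor.detect("person_detector", fake_person_detector,
                          "img_001", full_image)
print(extractor.runs)
```

The second call is served from the cache, so the detector function is invoked only once for the same image and box.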
- the extraction engine 340 executes the different detectors 230 and registers all detections into the representation DAG 344 of the corresponding image.
- the fully populated representation DAG 344 consists of all found objects, their attributes and relations.
- the result is a representation directed acyclic graph (DAG) 344 of the objects, their attributes and their relations in the scene (for example, a scene as shown in FIG. 6 ), including the corresponding relations between the objects as shown in FIG. 5 .
- the graph 500 includes relations (for example, to left of 510 , to right of 515 , near 520 , in front of 525 , etc.) between objects (for example, car 530 , man 535 , bicycle 540 , etc.) and attributes (color 550 , white 555 , etc.).
- Each node may contain additional information such as the confidence of the detection, the location (bounding box) of the object detected, the detector used, etc.
- the Query DAG 312 from FIG. 3 and the representation DAG 344 (from FIG. 5 ) are then passed to the logic processor 322 where both are parsed into the corresponding logic language for inference by the probabilistic logic engine 140 , where the PLE 316 is used.
- the results can take a similar form as shown in the following, where the probabilities correspond to the confidences reported by the detectors 230 for a given detection:
- n5 and n7 are the node IDs of the nodes in the representation DAG.
- the translation to logic may use the following mapping:
- the converted facts for each image are then evaluated by the PLE 316 with regard to correspondence to the original query, also taking into account the extending ontological rule set in order to evaluate the probability for the query to be answered successfully.
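How detector confidences might combine into an answer probability can be sketched for the single conjunctive query "a person near a white vehicle". This sketch assumes the detections are independent, so the probability of one match is the product of its confidences; a real probabilistic logic engine such as ProbLog performs more general inference. The node IDs reuse n5 and n7 from the text, while n9 and all confidence values are invented for illustration.

```python
from math import prod


def person_near_white_vehicle(objects, attributes, relations):
    """Score every (person, vehicle) pair that satisfies the query.

    objects:    {node_id: (label, confidence)}
    attributes: {(node_id, attribute): confidence}
    relations:  {(relation, node_a, node_b): confidence}
    Returns [((person_id, vehicle_id), probability), ...], best first.
    """
    answers = []
    for (relation, x, y), p_rel in relations.items():
        if relation != "near":
            continue
        label_x, p_x = objects[x]
        label_y, p_y = objects[y]
        p_white = attributes.get((y, "white"), 0.0)
        if label_x == "person" and label_y == "vehicle" and p_white > 0.0:
            # Independence assumption: multiply the confidences.
            answers.append(((x, y), prod([p_x, p_y, p_white, p_rel])))
    return sorted(answers, key=lambda item: item[1], reverse=True)


# Hypothetical detections for one image.
objects = {"n5": ("person", 0.9), "n7": ("vehicle", 0.8), "n9": ("vehicle", 0.7)}
attributes = {("n7", "white"): 0.5}
relations = {("near", "n5", "n7"): 1.0, ("near", "n5", "n9"): 1.0}

answers = person_near_white_vehicle(objects, attributes, relations)
print(answers)
```

Only the pair (n5, n7) qualifies, with a probability of about 0.36, because n9 has no "white" attribute.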
- the result, consisting of the answer probability and the objects contained in the corresponding answer set, is registered by the logic processor 322 in the answer_set field of the representation DAG 344 and handed to the collection manager module 324 which populates the bounding boxes, pixel masks, and labels onto the respective images for presentation.
- Such a result can be represented in a similar manner as follows, where n5 and n7 are the object IDs of the objects involved in the answer set. The number after is the probability assigned by the probabilistic logic.
- FIG. 6 is a block diagram 600 illustrating an image result of a query in accordance with an embodiment of the present invention.
- the images finalized by the collection manager 324 are stored to a predefined output location and displayed for use in the user interface dashboard 304 (for example, of the web interface).
- the user interface dashboard 304 can be accessed via a graphical user interface of an associated device.
- the displayed image includes bounding boxes, pixel masks, and labels.
- a bounding box 670 (with broken lines to indicate highlighting, for example, via color, luminescence, etc.) is illustrated around the person 625 and another bounding box 660 (with broken lines) identifies a car 635 that the person 625 is “near”.
- Other persons, cars and objects can be identified with different types of bounding boxes 650 (for example, that indicate the type of object that does not meet the criteria of the query).
- the displayed image can include identifiers for each of the persons and cars corresponding to, for example, identifiers for each object stored in an associated database (not shown).
- FIG. 7 is a block diagram of a method 700 for solving a query on image data, in accordance with an embodiment of the present invention.
- system 200 implements a detector manager 362 with an API (D-API 350 ) to allow external detectors 230 to be inserted into the system 200 by exposing their capabilities.
- the detector manager 362 also provides a predetermined way to execute the detectors 230 .
- system 200 implements an ontology manager 370 (for example, that uses ontologies 160 and KB-API 352 ) that exposes knowledge bases to the reasoning engine 140 .
- the implementation of the reasoning engine 140 can be based on any of the following: Markov Logic Networks, Probabilistic Logic (Problog), Bayesian Logic (BLOG), Probabilistic Similarity Logic (PSL), etc.
- system 200 implements a query parser (for example, query processor 310 ) that transforms the natural query 302 into a query directed acyclic graph 312 .
- the query parser can use trained language models to parse the query.
- system 200 uses the query DAG 312 , ontology manager 370 and detector API 350 to plan an execution list of detectors 230 .
- system 200 uses the query DAG 312 , scene representation DAG 344 produced by the detectors 230 and the ontology manager 370 to answer the query 302 .
- the system 200 can return answers to the natural query in a predetermined format that highlights subjects of the natural query.
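The planning step of method 700 (choosing which detectors to run from the query DAG, the ontology, and the detector capabilities) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `ONTOLOGY` and `DETECTOR_CAPABILITIES` tables, the detector names, and the `expand` and `plan_detectors` helpers are all hypothetical, and the query DAG is reduced to a flat list of concepts.

```python
from typing import Dict, List, Set

# Hypothetical ontology fragment: each concept maps to its more specific concepts.
ONTOLOGY: Dict[str, Set[str]] = {
    "vehicle": {"car", "suv", "automobile", "truck"},
}

# Hypothetical capability table, as detectors might expose it through the D-API.
DETECTOR_CAPABILITIES: Dict[str, Set[str]] = {
    "object_detector": {"person", "car", "truck"},
    "color_detector": {"white", "red"},
    "proximity_detector": {"near"},
    "face_detector": {"face"},
}


def expand(concept: str, ontology: Dict[str, Set[str]]) -> Set[str]:
    """Return the concept plus every concept reachable below it in the ontology."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, ()))
    return seen


def plan_detectors(query_concepts: List[str]) -> List[str]:
    """List detectors whose capabilities overlap the expanded query concepts."""
    wanted: Set[str] = set()
    for concept in query_concepts:
        wanted |= expand(concept, ONTOLOGY)
    # Sorting makes the execution list deterministic.
    return sorted(name for name, caps in DETECTOR_CAPABILITIES.items() if caps & wanted)


# Concepts as they might be read off the nodes of a query DAG for a
# hypothetical query such as "person near a white vehicle".
plan = plan_detectors(["person", "vehicle", "white", "near"])
```

Expanding "vehicle" through the ontology is what lets a detector that only knows "car" and "truck" be selected for a query that never mentions either word.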
- the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks.
- the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.).
- the one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.).
- the hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.).
- the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).
- the hardware processor subsystem can include and execute one or more software elements.
- the one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.
- the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result.
- Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs).
- any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
- As a further example, for three listed options, as in “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
- This may be extended for as many items listed.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
-
- is_a(X,vehicle):-is_a(X,car).
- is_a(X,vehicle):-is_a(X,suv).
- is_a(X,vehicle):-is_a(X,automobile).
- is_a(X,vehicle):-is_a(X,truck).
-
- 0.25011::is_a(n5,person).
- 0.17231::is_a(n7,car).
- 0.663::is_of(n7,white).
- 0.027290636064548::is_near(n5,n7).
-
- Objects→detector_confidence::is_a(X,object_concept)
- Attributes→detector_confidence::is_of(X,attribute_concept)
- Relations→detector_confidence::is_relation_concept(X,Y).
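The mapping above can be sketched as a few string-formatting helpers that turn detector outputs into ProbLog-style probabilistic facts. The helper names and the assumption that every relation predicate is spelled `is_<relation>` are illustrative choices, not something the patent specifies.

```python
from typing import List


def object_fact(confidence: float, obj_id: str, concept: str) -> str:
    """Objects: detector_confidence::is_a(X,object_concept)."""
    return f"{confidence}::is_a({obj_id},{concept})."


def attribute_fact(confidence: float, obj_id: str, concept: str) -> str:
    """Attributes: detector_confidence::is_of(X,attribute_concept)."""
    return f"{confidence}::is_of({obj_id},{concept})."


def relation_fact(confidence: float, relation: str, subj_id: str, obj_id: str) -> str:
    """Relations: detector_confidence::is_relation_concept(X,Y)."""
    return f"{confidence}::is_{relation}({subj_id},{obj_id})."


# Example detector outputs, reusing the confidences shown in this section.
facts: List[str] = [
    object_fact(0.25011, "n5", "person"),
    object_fact(0.17231, "n7", "car"),
    attribute_fact(0.663, "n7", "white"),
    relation_fact(0.027290636064548, "near", "n5", "n7"),
]
```

Each resulting string is one line of a probabilistic logic program, so the full fact base for a scene is simply the list joined with newlines.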
-
- ans(n5,n7): 0.00077977395.
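As a worked check of the answer probability, the sketch below assumes that the four probabilistic facts listed for n5 and n7 are independent and that the answer is their conjunction (a person, a car, the car being white, and the person being near the car). The query rule itself is not shown in this section, so that reading is an assumption; under it, the probability of the single proof is the product of the fact probabilities.

```python
import math
from typing import Dict, Iterable

# Probabilistic facts as listed in this section (detector confidences).
facts: Dict[str, float] = {
    "is_a(n5,person)": 0.25011,
    "is_a(n7,car)": 0.17231,
    "is_of(n7,white)": 0.663,
    "is_near(n5,n7)": 0.027290636064548,
}


def conjunction_probability(probabilities: Iterable[float]) -> float:
    """Probability that all independent facts hold at once (their product)."""
    return math.prod(probabilities)


answer_probability = conjunction_probability(facts.values())
```

Under these assumptions the product agrees with the listed value for ans(n5,n7) to the printed precision. It also shows why multi-fact answers tend to have small absolute probabilities: every additional uncertain fact multiplies the result by a factor below one.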
Claims (19)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/819,947 US11494377B2 (en) | 2019-04-01 | 2020-03-16 | Multi-detector probabilistic reasoning for natural language queries |
PCT/US2020/023148 WO2020205230A1 (en) | 2019-04-01 | 2020-03-17 | Multi-detector probabilistic reasoning for natural language queries |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962827272P | 2019-04-01 | 2019-04-01 | |
US16/819,947 US11494377B2 (en) | 2019-04-01 | 2020-03-16 | Multi-detector probabilistic reasoning for natural language queries |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200311072A1 (en) | 2020-10-01 |
US11494377B2 (en) | 2022-11-08 |
Family
ID=72607939
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/819,947 (Active, expires 2040-12-15), US11494377B2 (en) | 2019-04-01 | 2020-03-16 | Multi-detector probabilistic reasoning for natural language queries |
Country Status (2)
Country | Link |
---|---|
US (1) | US11494377B2 (en) |
WO (1) | WO2020205230A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11494377B2 (en) * | 2019-04-01 | 2022-11-08 | Nec Corporation | Multi-detector probabilistic reasoning for natural language queries |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6266053B1 (en) * | 1998-04-03 | 2001-07-24 | Synapix, Inc. | Time inheritance scene graph for representation of media content |
US20040249809A1 (en) * | 2003-01-25 | 2004-12-09 | Purdue Research Foundation | Methods, systems, and data structures for performing searches on three dimensional objects |
US6912293B1 (en) * | 1998-06-26 | 2005-06-28 | Carl P. Korobkin | Photogrammetry engine for model construction |
US20120310916A1 (en) * | 2010-06-04 | 2012-12-06 | Yale University | Query Execution Systems and Methods |
US20140236578A1 (en) * | 2013-02-15 | 2014-08-21 | Nec Laboratories America, Inc. | Question-Answering by Recursive Parse Tree Descent |
US20140324864A1 (en) * | 2013-04-12 | 2014-10-30 | Objectvideo, Inc. | Graph matching by sub-graph grouping and indexing |
CN104462084A (en) * | 2013-09-13 | 2015-03-25 | Sap欧洲公司 | Search refinement advice based on multiple queries |
US20150331929A1 (en) * | 2014-05-16 | 2015-11-19 | Microsoft Corporation | Natural language image search |
US20170024460A1 (en) * | 2015-07-23 | 2017-01-26 | International Business Machines Corporation | Context sensitive query expansion |
US20170124432A1 (en) * | 2015-11-03 | 2017-05-04 | Baidu Usa Llc | Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering |
US20180096192A1 (en) * | 2016-10-04 | 2018-04-05 | Disney Enterprises, Inc. | Systems and Methods for Identifying Objects in Media Contents |
US20180232648A1 (en) * | 2017-02-14 | 2018-08-16 | Cognitive Scale, Inc. | Navigating a Hierarchical Abstraction of Topics via an Augmented Gamma Belief Network Operation |
US10168899B1 (en) * | 2015-03-16 | 2019-01-01 | FiftyThree, Inc. | Computer-readable media and related methods for processing hand-drawn image elements |
US20190278771A1 (en) * | 2015-09-11 | 2019-09-12 | Google Llc | Disambiguating join paths for natural language queries |
US10503775B1 (en) * | 2016-12-28 | 2019-12-10 | Shutterstock, Inc. | Composition aware image querying |
US10789288B1 (en) * | 2018-05-17 | 2020-09-29 | Shutterstock, Inc. | Relational model based natural language querying to identify object relationships in scene |
US20200311072A1 (en) * | 2019-04-01 | 2020-10-01 | Nec Laboratories America, Inc. | Multi-detector probabilistic reasoning for natural language queries |
US20200356829A1 (en) * | 2019-05-08 | 2020-11-12 | Accenture Global Solutions Limited | Multi-modal visual question answering system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8856156B1 (en) * | 2011-10-07 | 2014-10-07 | Cerner Innovation, Inc. | Ontology mapper |
2020
- 2020-03-16: US application US16/819,947, patent US11494377B2 (en), status Active
- 2020-03-17: WO application PCT/US2020/023148, publication WO2020205230A1 (en), status Application Filing
Non-Patent Citations (12)
Title |
---|
Abhijit Suprem; "Approximate Query Matching for Image Retrieval" School of Computer Science, Georgia Tech Mar. 15, 2018 (Year: 2018). * |
Belongie, et al., "Color-and Texture-Based Image Segmentation Using EM and its Application to Content-Based Image Retrieval", Sixth International Conference on Computer Vision, Feb. 1998, pp. 1-8. |
Carneiro, et al., "Supervised Learning of Semantic Classes for Image Annotation and Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, Mar. 2007, pp. 394-410, vol. 29, No. 3. |
Chaorui Deng et al. "Visual Grounding via Accumulated Attention" in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 18, 2018, (pp. 7746-7755, sections 3.1.2, 3.3). |
Collobert, et al, "Fast semantic extraction using a novel neural network architecture", 45th Annual Meeting of the Association of Computational Linguistics, Jun. 2007, 8 pages. |
Damien Teney et al., "Graph-Structured Representations for Visual Question Answering" in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Jul. 21, 2017, (pp. 1-9, Sections 1-2 and figure 2). |
Jain, A., Mittal, K. & Tayal, D.K. Automatically incorporating context meaning for query expansion using graph connectivity measures. Prog Artif Intell 2, 129-139 (2014). https://doi.org/10.1007/s13748-014-0041-x (Year: 2014). * |
Liu, et al., "ConceptNet - a Practical Commonsense Reasoning Tool-Kit", BT Technology Journal, Oct. 2004, pp. 211-226, vol. 22, No. 4. |
M. Peng, Q. Lin, Y. Tian, M. Yang, Y. Xiao and B. Ni, "Query expansion based on Conceptual Word Cluster Space Graph," The 5th International Conference on New Trends in Information Science and Service Science, 2011, pp. 128-133. (Year: 2011). * |
Ma, et al., "Attend and Interact: Higher-Order Object Interactions for Video Understanding", arXiv:1711.06330v2 [cs.CV] Mar. 20, 2018, pp. 1-18. |
Teney et al. "Graph-Structured Representations for Visual Question Answering"; Australian Centre for Visual Technologies, the University of Adelaide; 2016 (Year: 2016). * |
Yikang Li et al. "Scene Graph Generation from Objects, Phrases and Region Captions" in IEEE International Conference on Computer Vision (ICCV), Oct. 22, 2017, pp. 1261-1270 (sections 3.1-3.2, 3.4; and figures 2-4). |
Also Published As
Publication number | Publication date |
---|---|
WO2020205230A1 (en) | 2020-10-08 |
US20200311072A1 (en) | 2020-10-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9002066B2 (en) | Methods, systems and processor-readable media for designing a license plate overlay decal having infrared annotation marks | |
US20190370587A1 (en) | Attention-based explanations for artificial intelligence behavior | |
CN110232689A (en) | Semantic classes positions digital environment | |
CN113095346A (en) | Data labeling method and data labeling device | |
CN113841161A (en) | Extensible architecture for automatically generating content distribution images | |
WO2024045474A1 (en) | Image copywriting generation method, device, and computer storage medium | |
US20230281826A1 (en) | Panoptic segmentation with multi-database training using mixed embedding | |
US11250299B2 (en) | Learning representations of generalized cross-modal entailment tasks | |
US11423262B2 (en) | Automatically filtering out objects based on user preferences | |
US11494377B2 (en) | Multi-detector probabilistic reasoning for natural language queries | |
US20220284343A1 (en) | Machine teaching complex concepts assisted by computer vision and knowledge reasoning | |
CN111523351A (en) | Neural network training method and device and electronic equipment | |
Hu et al. | Xaitk-saliency: An open source explainable ai toolkit for saliency | |
CN111881900B (en) | Corpus generation method, corpus translation model training method, corpus translation model translation method, corpus translation device, corpus translation equipment and corpus translation medium | |
CN114943877A (en) | Model training method and device, electronic equipment and storage medium | |
CN113705559B (en) | Character recognition method and device based on artificial intelligence and electronic equipment | |
CN112115928B (en) | Training method and detection method of neural network based on illegal parking vehicle labels | |
CN114332798A (en) | Processing method and related device for network car booking environment information | |
CN114638973A (en) | Target image detection method and image detection model training method | |
CN112734778A (en) | Vehicle matting method, system, equipment and storage medium based on neural network | |
CN113792569A (en) | Object identification method and device, electronic equipment and readable medium | |
WO2024045641A1 (en) | Image annotation method and apparatus | |
Kawano et al. | TAG: Guidance-free Open-Vocabulary Semantic Segmentation | |
US11887379B2 (en) | Road sign content prediction and search in smart data management for training machine learning model | |
JP7362075B2 (en) | Information processing device, information processing method, and information processing program |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: COSATTO, ERIC; NICULESCU-MIZIL, ALEXANDRU; REEL/FRAME: 052126/0306; Effective date: 20200313 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
AS | Assignment | Owner name: NEC CORPORATION, JAPAN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NEC LABORATORIES AMERICA, INC.; REEL/FRAME: 061237/0243; Effective date: 20220928 |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |