EP4115326A2 - Generation and usage of semantic features for detection and correction of perception errors - Google Patents
Generation and usage of semantic features for detection and correction of perception errors
- Publication number
- EP4115326A2 (EP21770086.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- perception
- list
- semantic feature
- set forth
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
Definitions
- the present invention relates to perception systems and, more specifically, to a perception system using semantic features for detection and correction of perception errors.
- Perception systems are commonly used for object recognition and tracking.
- deep-learning powers a major portion of a state-of-the-art perception system (see the List of Incorporated Literature References, Literature Reference No. 4).
- These systems are inherently hard to decipher and understand, which makes reasoning about their successes and failures a difficult task.
- most perception systems today operate solely on the basis of appearance-based information, be it three-dimensional depth information from a light detection and ranging system (LiDAR) or visual information in the form of an image from an electro-optical sensor.
- current state-of-the-art perception systems lack conceptual information about physics of the world, notions of relationships between entities, and task-specific context.
- the system comprises one or more processors and a memory.
- the memory is a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform several operations, such as generating a list of detected objects from perception data of a scene; generating a list of background classes from backgrounds in the perception data associated with the list of detected objects; for each detected object in the list of detected objects, identifying a closest background class from the list of background classes; determining an object embedding vector for the object class; determining a background class embedding vector for the closest background class; and determining a semantic feature based on a distance between the object embedding vector and the background class embedding vector.
- the system performs operations of generating a probabilistic distribution for the semantic feature, the probabilistic distribution having true positive and false positive distributions; identifying lower and upper bounds for the true positive distribution such that an area between the lower and upper bounds represents a confidence probability, P_TP, of a true positive probe, such that the confidence probability, P_TP, is an axiom for an input perception parameter; adjusting the input perception parameter based on the axiom to generate an optimal perception parameter; and adjusting one or more perception parameters of the perception system based on the optimal perception parameter.
- the semantic feature is a cosine similarity metric.
- the semantic feature is a conditional random fields (CRF) feature where co-occurrence statistics are obtained through a probabilistic framework, with a maximum a posteriori probability inference used to determine a likelihood of co-occurring objects.
- the system performs an operation of causing an autonomous vehicle to initiate a physical operation based on the optimal perception parameter.
- the present invention also includes a computer program product and a computer implemented method.
- the computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors, such that upon execution of the instructions, the one or more processors perform the operations listed herein.
- the computer implemented method includes an act of causing a computer to execute such instructions and perform the resulting operations.
- FIG. 1 is a block diagram depicting the components of a system according to various embodiments of the present invention.
- FIG. 2 is an illustration of a computer program product embodying an aspect of the present invention.
- FIG. 3 is an illustration depicting pre-requisites for generating semantic features according to various embodiments of the present invention.
- FIG. 4 is a flowchart depicting a semantic feature generation process according to various embodiments of the present invention.
- FIG. 5 is an illustration depicting a structure of conditional random fields
- FIG. 6 is an illustration depicting example misclassifications in object detection according to various embodiments of the present invention.
- FIG. 7 is a flowchart depicting an overview of the PSTL framework according to various embodiments of the present invention.
- FIG. 8 is a graph depicting sample probability distributions of a probe according to various embodiments of the present invention.
- the present invention relates to a perception system using semantic features for detection and correction of perception errors.
- the following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications.
- Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of aspects.
- the present invention is not intended to be limited to the aspects presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
- Various embodiments of the invention include three “principal” aspects.
- the first is a perception system using semantic features for detection and correction of perception errors.
- the system is typically in the form of a computer system operating software or in the form of a “hard-coded” instruction set. This system may be incorporated into a wide variety of devices that provide different functionalities.
- the second principal aspect is a method, typically in the form of software, operated using a data processing system (computer).
- the third principal aspect is a computer program product.
- the computer program product generally represents computer-readable instructions stored on a non-transitory computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape.
- Other, non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories.
- FIG. 1 provides a block diagram depicting an example of a system (i.e., computer system 100).
- the computer system 100 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm.
- certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform specific actions and exhibit specific behavior, such as described herein.
- the computer system 100 can be embodied in any device(s) that operates to perform the functions as described herein as applicable to the particular application, such as a desktop computer, a mobile or smart phone, a tablet computer, a computer embodied in a mobile platform, or any other device or devices that can individually and/or collectively execute the instructions to perform the related operations/processes.
- the computer system 100 may include an address/data bus 102 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 104 (or processors), are coupled with the address/data bus 102.
- the processor 104 is configured to process information and instructions.
- the processor 104 is a microprocessor.
- the processor 104 may be a different type of processor such as a parallel processor, application-specific integrated circuit (ASIC), programmable logic array (PLA), complex programmable logic device (CPLD), or a field programmable gate array (FPGA) or any other processing component operable for performing the relevant operations.
- the computer system 100 is configured to utilize one or more data storage units.
- the computer system 100 may include a volatile memory unit 106 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 102, wherein a volatile memory unit 106 is configured to store information and instructions for the processor 104.
- the computer system 100 further may include a non-volatile memory unit 108 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with the address/data bus 102, wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104.
- the computer system 100 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing.
- the computer system 100 also may include one or more interfaces, such as an interface 110, coupled with the address/data bus 102.
- the one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems.
- the communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
- the computer system 100 may include an input device 112 coupled with the address/data bus 102, wherein the input device 112 is configured to communicate information and command selections to the processor 104.
- the input device 112 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys.
- the input device 112 may be an input device other than an alphanumeric input device.
- the computer system 100 may include a cursor control device 114 coupled with the address/data bus 102, wherein the cursor control device 114 is configured to communicate user input information and/or command selections to the processor 104.
- the cursor control device 114 is implemented using a device such as a mouse, a track-ball, a trackpad, an optical tracking device, or a touch screen.
- the cursor control device 114 is directed and/or activated via input from the input device 112, such as in response to the use of special keys and key sequence commands associated with the input device 112.
- the cursor control device 114 is configured to be directed or guided by voice commands.
- the computer system 100 further may include one or more optional computer usable data storage devices, such as a storage device 116, coupled with the address/data bus 102.
- the storage device 116 is configured to store information and/or computer executable instructions.
- the storage device 116 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive (“HDD”), floppy diskette, compact disk read only memory (“CD-ROM”), digital versatile disk (“DVD”)).
- a display device 118 is coupled with the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics.
- the display device 118 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
- the computer system 100 presented herein is an example computing environment in accordance with an aspect.
- the non-limiting example of the computer system 100 is not strictly limited to being a computer system.
- the computer system 100 represents a type of data processing analysis that may be used in accordance with various aspects described herein.
- other computing systems may also be implemented.
- the spirit and scope of the present technology is not limited to any single data processing environment.
- one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer.
- program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types.
- an aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.
- An illustrative diagram of a computer program product (i.e., storage device) embodying the present invention is depicted in FIG. 2.
- the computer program product is depicted as a floppy disk 200 or an optical disk 202 such as a CD or DVD.
- the computer program product generally represents computer-readable instructions stored on any compatible non-transitory computer-readable medium.
- the term “instructions” as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules.
- Non-limiting examples of “instruction” include computer program code (source or object code) and “hard-coded” electronics (i.e. computer operations coded into a computer chip).
- the “instruction” is stored on any non-transitory computer-readable medium, such as in the memory of a computer or on a floppy disk, a CD-ROM, or a flash drive. In either event, the instructions are encoded on a non-transitory computer-readable medium.
- the present disclosure is directed to a perception system.
- the disclosure provides a system and method of generating features from semantic information and using such information for detecting and correcting errors in perception systems.
- the process enables the creation of feature embedding vectors in semantic space.
- the embedding may encode informational cues including, but not limited to, object co-occurrence, spatial relations, object/background taxonomy, word ontologies, scenegraph-based context and etymological relations.
- An understanding of word ontologies and scenegraph-based context can be found in Literature Reference Nos. 1 and 2, respectively.
- the ability to encode such information in an embedding vector helps a machine make sense of the context in perception systems instead of relying purely on appearance-based features.
- the disclosed system takes advantage of such semantic features using a probabilistic signal temporal logic framework to detect and correct perception errors including, but not limited to, object misclassification, missed object detections, broken object tracks and false positive object detection.
- An understanding of probabilistic signal temporal logic can be found in Literature Reference No. 3.
- the system may use the semantic information to create one or more constraints for the probabilistic signal temporal logic framework which is used to detect and correct the perception errors in such a perception system.
- In detecting and correcting perception errors, the system of the present disclosure is more robust and computationally less expensive than the prior art and, importantly, is highly performant for essential applications.
- the system of the present disclosure provides several unique advantages over the prior art, including (1) the conversion of contextual semantic information from images and other sensor output to machine-understandable feature embedding vectors, (2) the usage of semantic feature embeddings to create constraints and axioms in a probabilistic temporal logic framework, (3) the improved evaluation and correction of perception errors aided by semantic context in a formally verifiable manner, and (4) the formulation of a dependency model between objects in the scene, allowing for use of the dependency model to improve detection accuracy.
- the system has several applications that employ a perception system. Some non-limiting examples include use in intelligence, surveillance and reconnaissance applications (ISR), autonomous vehicles and other unmanned aerial systems, as well as object recognition and tracking.
- The system described in the invention may also be used to benchmark other perception systems. Specific details are provided below.
- the present disclosure describes a method and system to generate a feature embedding that encodes semantic information and that uses such feature embeddings to detect and correct errors.
- the system first generates the semantic features to be used as probes. Using these probes, the system sets up a probabilistic signal temporal logic (PSTL), which provides axioms. With these axiom-based constraints, an optimization problem is solved to synthesize controls for the perception system which reduce perception errors.
- the system must first obtain a list of detected objects 302 in the image with their locations (via instance extraction) and a list of background classes 306 (defined as “stuff” in COCO-stuff; see Literature Reference No. 5) along with their image masks (via background extraction).
- the list of objects/classes detected in the image is provided by a perception system.
- the system of the present disclosure sits on top of (i.e., works with) a perception system; in other words, the system of the present disclosure uses the perception system’s outputs and, based on the additional semantic context, confirms/improves the results of the perception system.
- the system of the present disclosure can generate semantic features for any object class from prior domain knowledge but, in one aspect, the present system is not responsible for detecting these objects in the image.
- These pre-requisites can be computed from the original input perception data 300, being, for example, electro-optical sensor images, lidar depth-maps, radar detections or any combination thereof.
- images from a camera will be used.
- these pre-requisites can be computed using a state-of-the-art panoptic segmentation 304 model (see Literature Reference Nos. 6, 7, and 8).
- any performant object detection technique (e.g., see Literature Reference Nos. 8, 9, and 10) can be used in tandem with a semantic segmentation technique (see Literature Reference No.
- detected_objects ← list of detected objects and their locations
- background_classes ← list of detected background classes and their locations
- for each object in detected_objects:
-     background_class ← findClosestBackgroundClass(object.location, background_classes)
-     object_embedding ← concept_embedding(object.class)
-     background_embedding ← concept_embedding(background_class)
-     semantic_feature ← cosineSimilarity(object_embedding, background_embedding)
- the system finds the closest background 402 from the list of background classes 306 by using the minimum Euclidean distance from the object to the background pixels. This is done by comparing the detected object 400 against each background class in the list of background classes 306 to identify the background class 404 with the minimum distance.
- the previous example is not the only way of computing the closest background class. Any method of computing the closest background class could work as part of the process of the present disclosure.
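The minimum-distance search described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the object location is reduced to a single (x, y) point and that each background mask is available as a list of pixel coordinates. The function name `find_closest_background_class` and the toy masks are hypothetical.

```python
import math


def find_closest_background_class(object_location, background_pixels):
    """Return (class_name, distance) for the background class whose nearest
    mask pixel has the minimum Euclidean distance to the object location.

    object_location: (x, y) point for the detected object (assumption: one point).
    background_pixels: dict mapping class name -> list of (x, y) mask pixels.
    """
    best_class, best_distance = None, math.inf
    for name, pixels in background_pixels.items():
        for pixel in pixels:
            distance = math.dist(object_location, pixel)
            if distance < best_distance:
                best_class, best_distance = name, distance
    return best_class, best_distance


# Hypothetical toy masks for illustration only; not data from the source.
masks = {"sky": [(0, 0), (1, 0)], "road": [(5, 9), (6, 9)]}
closest, dist = find_closest_background_class((5, 7), masks)
```

For real masks, a distance transform or a spatial index would avoid the exhaustive pixel loop; the brute-force form is kept here only because it mirrors the description directly.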
- Cosine similarity is just one example of a metric that can be used as a feature. It is only used as an example as the invention is just as applicable when using a different similarity metric or creating a different type of semantic feature apart from similarity.
- the system uses a conceptual word embedding to find the embedding vector for the object class 408 and the closest background class 404, via object embedding 410 and background embedding 406, respectively.
- the cosine similarity is computed as cos(θ) = (A · B) / (‖A‖ ‖B‖), where theta (θ) is the angle between the two vectors A and B in high-dimensional space.
- the cosine of theta is an indicator of the similarity between vectors A and B.
- the calculated cosine similarity 412 can be used as a semantic feature 414.
- This feature 414 should encode whether the objects are closely related or not by providing a similarity score between -1 and 1. Since a conceptual embedding is used instead of a simple word embedding, like word2vec or fasttext (see Literature Reference Nos. 13 through 15), the embedding already encodes semantic information instead of just using linguistic context.
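As a sketch of how the cosine-similarity feature could be computed from two embedding vectors, consider the following. The embedding lookup is assumed to be a plain dictionary, and the three-dimensional vectors are hypothetical toy values; a real conceptual embedding would be much higher-dimensional and would come from a trained model or knowledge base.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; result lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def semantic_feature(object_class, background_class, concept_embedding):
    """Semantic feature for an object/background pair.

    concept_embedding: dict mapping class name -> embedding vector (assumption).
    """
    return cosine_similarity(
        concept_embedding[object_class], concept_embedding[background_class]
    )


# Hypothetical toy embeddings for illustration only.
toy_embedding = {
    "car": [0.9, 0.1, 0.1],
    "road": [0.8, 0.3, 0.0],
    "sky": [0.0, 0.1, 0.9],
}
car_road = semantic_feature("car", "road", toy_embedding)
car_sky = semantic_feature("car", "sky", toy_embedding)
```

With these toy vectors, the car/road pair scores higher than the car/sky pair, which is the kind of ordering the feature is meant to expose.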
- the system can also calculate object-object coherence.
- Two objects can be said to be coherent if their co-existence is reasonable. For example, if one were to detect a handbag right next to a person in an image, then it can be said that it is reasonable because people carry handbags. If, however, one were to detect a handbag right next to a traffic light, one might be suspicious about the results of the perception system.
- the system can compute the cosine similarity for the concept embeddings of two object classes located close or adjacent to each other in the source image.
- the system uses conditional co-occurrence statistics between objects to further assess potential errors in object detection. The statistics are ‘conditional’ because considering the relationships between all semantic objects for a given scenario would be redundant, if not computationally excessive, and they are ‘co-occurrence statistics’ because they are used to learn the dependencies between objects.
- CRF conditional random fields
- MAP maximum a posteriori probability
- a goal is to construct a graphical model that describes correlations between objects in the scene and make predictions conditioned on the scene.
- FIG. 5 illustrates the range of possible cliques in a particular scenario.
- the objective here is to exploit the mutual information within objects (for example, information that different objects have in common with one another, conditioned on a particular scene, e.g., an urban driving area) to characterize each model.
- the mutual information is also not limited to commonality.
- the term mutual is used to be general because sometimes the pattern is not common but occurs consistently between two objects; the statistical consistency is what is referred to as mutual information.
- Feature functions define the connections between nodes.
- two feature functions are used: (1) the unary feature function, which is computed from the confidence of the object detector (before semantic feature generation), and (2) the pairwise feature function, which is computed from the cosine similarity between the semantic features.
- the number of feature functions that each clique c has is a design choice.
- each clique potential is factorized over a set of feature functions f_k^c(·), where k is the index of feature functions in the clique c.
- the significance (weight) of each feature function f_k^c is learned from co-occurrence statistics (for example, that a handbag and a person are often detected next to each other, conditioned on an urban scene); in this framework, however, the distance measure is at the conceptual level.
- each feature function f_k^c has associated with it a weighting factor or “significance”. The higher the weight, the more discriminative the pair is in classifying a given object in the scene.
- each weighting parameter is directly influenced by the sparsity of the data (which is either the data used for training or potentially from prior knowledge, such as from ConceptNet). Increasing the sparsity improves the expressive power of CRF structures, which is also the underlying rationale in the case of Hidden Markov Models (HMM) or Dynamic Bayesian Networks (DBN).
- Maximum a posteriori (MAP) inference is used for each test sample to select the label whose parameters return the highest conditional likelihood. For example, conditioned on an urban scenario, the objects ‘Car’ and ‘Bus’ would have a higher correlation than ‘Car’ and ‘Airplane’.
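The weighted combination of a unary and a pairwise feature function, followed by MAP label selection, can be illustrated with a deliberately small sketch. It assumes a single clique with one neighboring object, a log-linear score normalized over the candidate labels, and hand-picked weights; in the framework described above the weights would be learned from co-occurrence statistics. All names and numbers here are hypothetical.

```python
import math


def map_label(candidates, unary, pairwise, weights):
    """Pick the candidate label with the highest conditional probability.

    candidates: list of candidate labels for one detected object.
    unary: dict label -> detector confidence (unary feature value).
    pairwise: dict label -> similarity to the neighboring object's concept
        (pairwise feature value).
    weights: (w_unary, w_pairwise), the significance of each feature function.
    """
    w_unary, w_pairwise = weights
    # Log-linear score: weighted sum of feature function values.
    scores = {c: w_unary * unary[c] + w_pairwise * pairwise[c] for c in candidates}
    # Normalize over the candidates to obtain conditional probabilities.
    partition = sum(math.exp(s) for s in scores.values())
    posterior = {c: math.exp(s) / partition for c, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Hypothetical toy values: the neighboring object is a car in an urban scene.
candidates = ["bus", "airplane"]
unary = {"bus": 0.45, "airplane": 0.55}    # detector slightly prefers airplane
pairwise = {"bus": 0.8, "airplane": 0.2}   # similarity to "car" (toy numbers)
best, posterior = map_label(candidates, unary, pairwise, weights=(1.0, 2.0))
```

In this toy setting, the pairwise term outweighs the detector's slight preference, so the MAP label is the one that is more coherent with the neighboring object.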
- This generated semantic feature can be used to confirm object-background coherence.
- an airplane 600 is misclassified as a surfboard.
- using the semantic feature can be beneficial.
- the closest background classes for the surfboard detected in the image are sky and road.
- the system can calculate the semantic feature based on object-background coherence. If an object detection network can provide additional potential classes/categories for the detected object, then the system could also calculate the semantic feature for those classes.
- the same semantic feature for an airplane is shown in the following table.
- Based on the computed semantic feature values in the table above, it is clear that surfboard and sky are not close in semantic space and are very dissimilar.
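When a detector supplies several candidate classes, the object-background coherence check can be sketched as a simple re-ranking. The sketch below assumes that coherence is averaged over the closest background classes, which is one possible aggregation and not one stated in the source. The embeddings are hypothetical toy vectors and do not reproduce the table values referred to above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_candidate_classes(candidate_classes, background_classes, embedding):
    """Rank candidate object classes by mean coherence with the backgrounds.

    Returns a list of (class_name, score) pairs, most coherent first.
    """
    scores = {}
    for candidate in candidate_classes:
        similarities = [
            cosine_similarity(embedding[candidate], embedding[background])
            for background in background_classes
        ]
        scores[candidate] = sum(similarities) / len(similarities)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings for illustration only.
toy_embedding = {
    "airplane": [0.9, 0.3, 0.1],
    "surfboard": [0.1, 0.2, 0.9],
    "sky": [0.8, 0.2, 0.1],
    "road": [0.6, 0.7, 0.1],
}
ranking = rank_candidate_classes(
    ["surfboard", "airplane"], ["sky", "road"], toy_embedding
)
```

A low score for the top-confidence class relative to an alternative candidate is the signal that the detection may be a misclassification.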
- the invention of the present disclosure employs a Probabilistic Signal Temporal Logic (PSTL) 702 framework (see Literature Reference No. 16 for PSTL) to use the semantic features described above.
- the semantic features (illustrated as element 414 in FIG. 4) are generated as part of the Perception Probe Generation 700 process and used as the probes.
- the PSTL framework 702 uses the generated probes including, but not limited to, object size, aspect ratio, contrast and entropy.
- the semantic features will be used as a probe in the PSTL 702.
- statistical analysis is used to generate true positive and false positive probabilistic distributions based on true positive and false positive detections.
- the probabilistic function can also be multi-dimensional. Integrating all the available axioms from x provides a “multi-dimensional range” of the corresponding detection or recognition.
- x as used here is a probe or semantic feature, not an axiom.
- These axioms 704 can then be used to optimize for ideal perception control parameters 708 which will provide the best true positive to false positive ratio.
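
One possible way to turn true positive and false positive statistics into probe bounds is sketched below. It assumes Gaussian fits to each set of probe values and takes the bounds to be the range in which the true positive density exceeds the false positive density; both choices, along with the sample values and function names, are assumptions made for illustration rather than the procedure of the disclosure.

```python
from statistics import NormalDist

# Hypothetical probe values (e.g., a semantic feature) for labelled detections.
TP_VALUES = [0.55, 0.60, 0.65, 0.70, 0.62, 0.58]
FP_VALUES = [0.10, 0.15, 0.20, 0.25, 0.18, 0.12]


def fit_distributions(tp_values, fp_values):
    """Fit one Gaussian to the true positives and one to the false positives."""
    return NormalDist.from_samples(tp_values), NormalDist.from_samples(fp_values)


def probe_bounds(tp_dist, fp_dist, lo, hi, steps=1000):
    """Return (lower, upper) grid points where the TP density exceeds the FP density."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    inside = [x for x in grid if tp_dist.pdf(x) > fp_dist.pdf(x)]
    if not inside:
        return None
    return min(inside), max(inside)


def tp_probability(tp_dist, bounds):
    """Probability mass of the TP distribution that falls inside the bounds."""
    lower, upper = bounds
    return tp_dist.cdf(upper) - tp_dist.cdf(lower)


tp_dist, fp_dist = fit_distributions(TP_VALUES, FP_VALUES)
bounds = probe_bounds(tp_dist, fp_dist, lo=0.0, hi=1.0)
print(bounds, tp_probability(tp_dist, bounds))
```

For this sample data the upper bound coincides with the end of the search grid, so only the lower bound is informative.
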
- the perception control parameters can be adjusted based on the optimization.
- the perception parameters that are adjusted may be used to modify the input to the perception system, or to modify a parameter in the hardware of the perception system, or to modify a parameter in the model inside the perception system. All three of these cases, input, hardware and model, are considered to be a part of the perception system.
- a PSTL-constraint based optimization 706 is used (see, for example,
- $x_t$ is the probe state at time $t$ and $x_{t'}$ is the predicted probe in the next time frame, $t'$.
- $x$ is defined to be a probe state; it is a semantic feature that is being used as a probe.
- $f_t(\cdot)$ is the state transition function and $g_t(\cdot)$ is the input function with the control input, $u_t$ (it should be noted that the state transition function $f_t$ is not to be confused with the probe sequence $f$ or the feature function $f_k$).
- a goal is to find the optimal $u_t$ that reduces perception errors and, in doing so, to generate the optimal perception parameters 708 (i.e., $u_{OPT}$).
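
The search for an optimal control input can be illustrated with a deliberately simple scalar sketch. Everything specific here is an assumption made for the example: the additive form of the prediction (state transition plus input function), the identity transition, the linear input gain, the quadratic cost, and the grid search are hypothetical stand-ins for the optimization used in the disclosure.

```python
def f_t(x):
    """Hypothetical state transition: the probe persists between frames."""
    return x


def g_t(u, gain=0.5):
    """Hypothetical input function: the control input shifts the probe linearly."""
    return gain * u


def predict(x_t, u_t):
    """Predicted probe at the next frame, assuming the additive form f_t(x_t) + g_t(u_t)."""
    return f_t(x_t) + g_t(u_t)


def cost(x_next, u_t, lower_bound, effort_weight=0.01):
    """Penalize a probe that falls below the bound, plus a small control-effort term."""
    violation = max(0.0, lower_bound - x_next)
    return violation ** 2 + effort_weight * u_t ** 2


def optimal_control(x_t, lower_bound, candidates):
    """Grid search for the candidate control input that minimizes the cost."""
    return min(candidates, key=lambda u: cost(predict(x_t, u), u, lower_bound))


candidates = [i / 10 for i in range(-10, 11)]  # u in [-1.0, 1.0]
u_opt = optimal_control(x_t=0.05, lower_bound=0.2, candidates=candidates)
print(u_opt, predict(0.05, u_opt))
```

In this sketch a probe that already satisfies the bound yields a control input of zero, so the perception parameters are adjusted only when a violation is predicted.
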
- z is the semantic feature.
- the process can be designed to disregard the upper bound and form a one sided lower-bound only constraint.
- the semantic object-background coherence feature value should be greater than 0.2 at least 99% of the time over the interval $t_s : t_e$ so that PSTL isn’t violated.
- the semantic feature threshold and $P_{TP_z}$ will differ from one class to the other.
- $P_{TP_z}$ is the probabilistic constraint on the bounds for true positives.
- Z and X represent different constraints.
- $P_{TP_x}$ could be the constraint for the airplane bounds.
- $P_{TP_z}$ could be the constraint for the handbag bounds.
- Z and X are identifiers in this context; they could be A, B, C or any other letter or phrase.
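
A one-sided, lower-bound-only constraint of the kind described above can be checked as in the following sketch. It approximates the probabilistic constraint by the empirical fraction of samples in the interval that exceed the threshold, which is a simplification. The 0.2 threshold and 99% level come from the example in the text; assigning them to the airplane class, the handbag values, and all function names are hypothetical.

```python
def satisfies_lower_bound(z_values, threshold, min_probability):
    """True if z > threshold for at least min_probability of the samples."""
    if not z_values:
        raise ValueError("need at least one sample in the interval")
    fraction = sum(1 for z in z_values if z > threshold) / len(z_values)
    return fraction >= min_probability


# Hypothetical class-specific constraints: (feature threshold, required probability).
CONSTRAINTS = {
    "airplane": (0.2, 0.99),
    "handbag": (0.3, 0.95),
}


def violates(detected_class, z_values):
    """Flag a potential perception error when the class constraint is not met."""
    threshold, min_probability = CONSTRAINTS[detected_class]
    return not satisfies_lower_bound(z_values, threshold, min_probability)


# Semantic feature sampled over the frames in the interval t_s : t_e.
steady = [0.5] * 100
noisy = [0.5] * 98 + [0.1] * 2
print(violates("airplane", steady), violates("airplane", noisy))
```

Keeping the thresholds in a per-class table reflects the statement that both the semantic feature threshold and the probabilistic bound differ from one class to another.
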
- the output of the present system is a set of constraints that are used to identify and correct perception errors. These perception errors can be corrected in systems with a variety of different applications including, but not limited to, Autonomous Urban Driving, Autonomous Flying, Intelligence, Surveillance and Reconnaissance, and Search and Rescue. An incorrect detection could lead to the autonomous system following an unwanted trajectory. Thus, the output of the system of the present disclosure corrects the wrong detection by adjusting perception parameters, which in turn affects the planning and decision making of this tangible physical autonomous system. For example, the system could cause the autonomous system to alter its movements to correct the trajectory of travel.
- the system of the present disclosure provides a set of one or more constraints, such as those referenced above. These constraints are used to identify perception errors. Then, other object candidates are checked in the image to finally correct the identified error. In one aspect, this error identification and correction are the outputs of the system described in this invention, which can be relayed to a planner to affect, modify and plan a better and safer trajectory for the autonomous system.
- a self-driving vehicle may misclassify a bicycle as a person. Without the present system, the self-driving car would have expected the pedestrian to stay on the sidewalk, when in reality a bicycle is not constrained to a sidewalk.
- the present system is able to correct this error and accurately inform the self-driving vehicle about a bicycle.
- the self-driving vehicle will take a tangibly different route in the physical world, keeping out of the bike lane to avoid a potential accident.
- the system can cause the self-driving vehicle to initiate physical operations through other systems in the vehicle, such as the accelerator, brake, or steering, to avoid collision with the detected object (e.g., bicycle in this example).
- the self-driving vehicle will automatically adapt/modify its trajectory to account for the results of the method and system as described herein.
- this concept can be extended to a variety of applications, such as unmanned aerial vehicles, robotic equipment in a factory, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062984728P | 2020-03-03 | 2020-03-03 | |
US17/030,354 US11334767B2 (en) | 2019-09-24 | 2020-09-23 | System and method of perception error evaluation and correction by solving optimization problems under the probabilistic signal temporal logic based constraints |
US17/133,345 US11350039B2 (en) | 2019-09-24 | 2020-12-23 | Contrast and entropy based perception adaptation using probabilistic signal temporal logic based optimization |
PCT/US2021/020555 WO2021206828A2 (en) | 2020-03-03 | 2021-03-02 | Generation and usage of semantic features for detection and correction of perception errors |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4115326A2 (en) | 2023-01-11 |
Family
ID=78026175
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21770086.3A Pending EP4115326A2 (en) | 2020-03-03 | 2021-03-02 | Generation and usage of semantic features for detection and correction of perception errors |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4115326A2 (en) |
CN (1) | CN114930406A (en) |
WO (1) | WO2021206828A2 (en) |
2021
- 2021-03-02 CN CN202180008841.5A patent/CN114930406A/en active Pending
- 2021-03-02 WO PCT/US2021/020555 patent/WO2021206828A2/en unknown
- 2021-03-02 EP EP21770086.3A patent/EP4115326A2/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2021206828A3 (en) | 2022-03-03 |
CN114930406A (en) | 2022-08-19 |
WO2021206828A2 (en) | 2021-10-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20220909 |
| | AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | P01 | Opt-out of the competence of the unified patent court (upc) registered | Effective date: 20230525 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |