WO2019198076A1 - Real-time raw data- and sensor fusion - Google Patents

Real-time raw data- and sensor fusion

Info

Publication number
WO2019198076A1
Authority
WO
WIPO (PCT)
Prior art keywords
sensory
datasets
sensors
predefined
sensor
Application number
PCT/IL2019/050400
Other languages
French (fr)
Inventor
Thorsten BUCHMEIER
Ehud Spiegel
Yoram Elichai
Ronen Korman
Original Assignee
Ionterra Transportation And Aviation Technologies Ltd.
Application filed by Ionterra Transportation And Aviation Technologies Ltd.
Publication of WO2019198076A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/86 Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01S17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S17/931 Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • G01S7/41 Details of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/4802 Details of systems according to group G01S17/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section

Definitions

  • the present invention in some embodiments thereof, relates to detecting objects in a scene, and, more specifically, but not exclusively, to detecting objects in a scene based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types.
  • Machine based object detection has gained significant importance, mainly due to rapidly evolving automated systems, applications and services requiring image recognition, ranging from partially automated control systems to fully autonomous vehicles, which heavily rely on the ability to detect and recognize objects in a scene. Moreover, many of such applications, specifically the autonomous vehicle applications, may require high speed real-time object detection capabilities in order to properly adapt and respond to the highly dynamic surrounding environment.
  • Such object detection systems typically employ a plurality of sensors employing a plurality of sensing and capturing technologies for collecting sensory data depicting the scene, for example, imaging, Laser Imaging Detection and Ranging (LiDAR), Radio Detection And Ranging (RADAR), Sound Navigation And Ranging (SONAR) and/or the like.
  • the sensory datasets captured by the sensors may be analyzed to detect object(s) present in the scene.
  • a computer implemented method of detecting objects based on fusion of sensory data received from a plurality of sensors comprising executing a code by one or more processors for:
  • a system for detecting objects based on fusion of sensory data received from a plurality of sensors comprising:
  • sensors adapted to capture a plurality of radiation types in a common scene.
  • the sensory datasets received from the plurality of different technology sensors are processed at a low level to first detect the low level predefined primitive elements. Since each sensing and capturing technology presents advantages and limitations compared to each other, detecting the predefined primitive elements in each of the different technology sensory datasets may allow for significantly more accurate, elaborate and/or complete detection of the predefined primitive elements compared to processing each of the different technology sensory datasets separately.
  • the fused object(s) created by complementing the potential object(s) through joining (fusing) together the predefined primitive elements may therefore significantly improve the accuracy, comprehensiveness and/or completeness of the fused object(s).
  • detecting the (fused) object(s) based on at least partial detection by the different technology sensors may provide significant redundancy and overlap between sensory datasets thus significantly increasing confidence in the detection.
  • low level processing of the sensory datasets may prevent bottlenecks in the processing sequence since there is no need to wait for each of multiple processing pipelines to fully complete its processing cycle of a sensory dataset(s) received from a single type of sensor.
  • Since the fusion engine detects the primitive elements in the sensory datasets based on comparison to predefined primitive elements, the fusion engine may be significantly more immune to machine learning adversarial attacks.
  • the plurality of sensors include at least some members of a group consisting of: an imaging sensor, a laser detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor and a sound navigation and ranging (SONAR) sensor, the plurality of sensors are adapted to capture respective radiation types which are members of a group consisting of: visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves. Since each sensing and capturing technology presents advantages and limitations compared to each other, supporting sensors employing a wide range of sensing technologies may further improve accuracy, comprehensiveness and/or completeness of the detection of the predefined primitive elements.
  • LiDAR laser detection and ranging
  • RADAR radio detection and ranging
  • SONAR sound navigation and ranging
  • each of the one or more predefined primitive elements is a member of a group consisting of: an edge, a curve, a line, a corner, a surface, a shape, a texture and a color. Detecting low level simple predefined primitive elements may be done by a fast analysis, thus significantly reducing the required computing resources. Moreover, the low level predefined primitive elements may be used as building blocks for a plurality of higher level objects and therefore a wide variety of the predefined primitive elements may allow construction of a wide range of higher level objects.
  • the association is based on aligning each of the one or more predefined primitive elements detected in each of the at least some sensory datasets according to a common reference. Aligning the predefined primitive elements detected in the sensory datasets received from the different technology sensors may significantly improve accuracy of the correlation of the predefined primitive elements detected in the different technology sensory datasets with each other.
  • the aligning comprising one or more members of a group consisting of: a temporal alignment, a spatial alignment, a resolution alignment and a distribution alignment.
  • Aligning (adjusting) the predefined primitive elements in the space and time dimensions as well as with respect to the capturing capabilities of the different technologies sensors may further improve accuracy of correlation between the predefined primitive elements detected in the different technology sensory datasets.
  • a descriptive dataset of the fused object is outputted, the descriptive dataset comprising one or more members of a group consisting of: temporal information, spatial information and relational information descriptive of relation of the fused object with one or more other objects detected in at least some of the sensory datasets.
  • the additional descriptive dataset may significantly improve comprehending the scene as the additional descriptive information may describe relations between detected objects, position, timing and/or other descriptive attributes of the detected objects.
  • one or more failed sensors of the plurality of sensors are detected by identifying incompliance of the one or more predefined primitive elements detected in the sensory dataset of the failed sensor with respect to the one or more predefined primitive elements detected in at least another one of the plurality of sensory datasets. Automatically detecting failure(s) in one or more of the sensors may significantly improve integrity, reliability and/or robustness of the objects detection system since such failed sensors may be easily detected in real-time using unbiased sensory data received from the different technology(s) sensor(s).
  • one or more of the plurality of sensors are calibrated according to a comparison of the one or more predefined primitive elements detected in the sensory dataset of the respective sensor with respect to the one or more predefined primitive elements detected in at least another one of the plurality of sensory datasets. Automatically calibrating one or more of the sensors may further improve integrity, reliability and/or robustness of the objects detection system since the sensors may be calibrated automatically, dynamically and/or in real-time.
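  • As a non-limiting illustration of this cross-check, the following Python sketch flags a sensor whose detected primitive elements are largely uncorroborated by the other sensors; the record layout, the distance gate and the agreement threshold are assumptions for illustration rather than values taken from this disclosure.

```python
from dataclasses import dataclass

@dataclass
class PrimitiveDetection:
    label: str       # e.g. "vertical_line", "circle" -- assumed label vocabulary
    position: tuple  # (x, y) in a common, already-aligned reference frame (meters)

def agreement_ratio(own, others, max_dist=0.5):
    """Fraction of this sensor's detections confirmed by at least one other sensor."""
    if not own:
        return 1.0  # nothing to contradict
    confirmed = 0
    for d in own:
        for o in others:
            dx, dy = o.position[0] - d.position[0], o.position[1] - d.position[1]
            if o.label == d.label and (dx * dx + dy * dy) ** 0.5 <= max_dist:
                confirmed += 1
                break
    return confirmed / len(own)

def detect_failed_sensors(detections_by_sensor, threshold=0.3):
    """Return ids of sensors whose detections are largely uncorroborated by the rest --
    a possible trigger for flagging a failed sensor or scheduling re-calibration."""
    failed = []
    for sensor_id, own in detections_by_sensor.items():
        others = [d for sid, dets in detections_by_sensor.items()
                  if sid != sensor_id for d in dets]
        if agreement_ratio(own, others) < threshold:
            failed.append(sensor_id)
    return failed
```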
  • Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
  • a data processor such as a computing platform for executing a plurality of instructions.
  • the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data.
  • a network connection is provided as well.
  • a display and/or a user input device such as a keyboard or mouse are optionally provided as well.
  • FIG. 1 is a flowchart of an exemplary process of detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention
  • FIG. 2 is a schematic illustration of an exemplary system for detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention
  • FIG. 3A and FIG. 3B are exemplary image captures presenting detection of predefined primitive elements in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention.
  • FIG. 4 is an exemplary image capture presenting creation of fused objects by joining together predefined primitive elements detected in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention.
  • the present invention in some embodiments thereof, relates to detecting objects in a scene, and, more specifically, but not exclusively, to detecting objects in a scene based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types.
  • detecting objects in a scene, in particular a scene monitored by a plurality of sensors of a vehicle, based on fusion of sensory datasets received from a plurality of sensors employing different sensing and/or capturing technologies and adapted to capture different radiation types.
  • the sensors may include, for example, imaging sensor(s) (e.g. camera, night vision camera, thermal camera, etc.), LiDAR sensor(s), RADAR sensor(s), SONAR sensor(s) and/or the like and may hence be adapted to capture different radiation types, for example, visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves and/or the like.
  • a fusion engine may receive the sensory datasets from at least some of the different technology sensors and analyze the sensory datasets to detect one or more low level predefined primitive elements, for example, an edge, a curve, a corner, a surface, a shape, a texture, a color and/or any combination thereof.
  • the fusion engine may detect the predefined primitive elements by comparing them to predefined primitive elements defined in advance and stored in a primitive elements dataset, for example, a record, a file, a database and/or the like.
  • Each of the predefined primitive elements may further be associated in the primitive elements dataset as building blocks for one or more higher level objects, for example, a vehicle (e.g. car, truck, motorcycle, bicycle, etc.), a structure, (e.g. a building, a tree, etc.), a transportation infrastructure element (e.g. traffic light, traffic sign, light columns, road edges, road markings, sidewalk, traffic circle edges, etc.) and/or the like.
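  • A minimal sketch of such a primitive elements dataset is given below, assuming a simple in-memory mapping from each predefined primitive element to the higher level objects it may serve as a building block for; the concrete entries and the helper candidate_objects() are hypothetical, and a real system might instead use a record, a file or a database as mentioned above.

```python
# Assumed in-memory layout of a primitive elements dataset; labels and associations
# are illustrative only.
PRIMITIVE_ELEMENTS = {
    "circle":          {"dims": 2, "objects": ["traffic_sign", "car_wheel", "traffic_light_housing", "head"]},
    "vertical_line":   {"dims": 2, "objects": ["traffic_sign_pole", "light_column", "building_edge"]},
    "horizontal_line": {"dims": 2, "objects": ["road_edge", "sidewalk_edge"]},
    "rectangle":       {"dims": 2, "objects": ["traffic_sign", "window", "road_marking"]},
    "box":             {"dims": 3, "objects": ["truck", "building"]},
}

def candidate_objects(detected_labels):
    """Map a set of detected primitive labels to the higher level objects they may build."""
    candidates = set()
    for label in detected_labels:
        candidates.update(PRIMITIVE_ELEMENTS.get(label, {}).get("objects", []))
    return candidates

print(candidate_objects({"circle", "vertical_line"}))
# e.g. {'traffic_sign', 'car_wheel', 'traffic_sign_pole', 'light_column', ...}
```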
  • the predefined primitive elements detected by the fusion engine in each of the sensory datasets may vary according to the capturing capabilities of the respective sensors.
  • the fusion engine may associate the predefined primitive elements detected in the plurality of datasets with one or more potential objects which may be present in the scene according to the association and relations predefined in the primitive elements dataset.
  • the fusion engine may further correlate between multiple predefined primitive elements detected in the sensory datasets received from the different technology sensors and further associate the correlated multitude of predefined primitive elements with respective potential object(s).
  • the fusion engine correlates the multitude of predefined primitive elements together based on a spatial and/or temporal correlation detected in the sensory datasets, for example, proximity, distance, timing of capture and/or the like.
  • the fusion engine may first align the predefined primitive elements with respect to a common reference, for example, a common coordinate system, a common object detected in the scene by at least some of the sensors and/or the like.
  • the fusion engine may then complement (complete) in real-time the potential object(s) by joining together the predefined primitive elements detected in at least some of the sensory datasets and correlated to the same potential object(s) to create a respective fused object for one or more of the potentially detected objects.
  • the predefined primitive elements detected in the different sensory datasets may each only partially portray (depict) the potential object(s) due to limitations of the respective sensing and/or capturing technologies. Therefore, joining the potentially partial elements detected in the plurality of sensory datasets captured by the plurality of different technology sensors may significantly improve the accuracy, comprehensiveness and/or completeness of the respective fused object(s).
  • the fusion engine may classify the fused object(s) to respective predefined and labeled objects according to predefined classification data and labels.
  • the fusion engine may output the classification data (e.g. label) optionally coupled with additional data relating to the respective fused object(s), for example, spatial information, temporal information, relational information descriptive of relation (spatial relation, temporal relation, etc.) of the fused object(s) with one or more other objects in the scene and/or the like.
  • the fusion engine may further calculate a probability score for one or more of the fused objects to indicate a probability level that the respective fused object is accurately classified to the respective labeled objects.
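  • The following sketch illustrates one possible (assumed) way to combine classification with such a probability score: coverage of the expected primitive elements of each labeled object is weighted against the number of different sensor technologies that contributed to the fused object. The EXPECTED_PRIMITIVES table and the 0.7/0.3 weighting are illustrative assumptions, not taken from this disclosure.

```python
EXPECTED_PRIMITIVES = {
    "traffic_sign": {"circle", "vertical_line"},
    "crosswalk":    {"rectangle"},
}

def classify_fused_object(detected_primitives, contributing_sensors):
    """detected_primitives: set of primitive labels joined into the fused object.
    contributing_sensors: set of sensor technologies that detected at least one of them.
    Returns (label, probability-like score in [0, 1])."""
    best_label, best_score = None, 0.0
    for label, expected in EXPECTED_PRIMITIVES.items():
        coverage = len(expected & detected_primitives) / len(expected)
        redundancy = min(len(contributing_sensors), 3) / 3.0  # more technologies -> more confidence
        score = 0.7 * coverage + 0.3 * redundancy
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

print(classify_fused_object({"circle", "vertical_line"}, {"camera", "lidar"}))
# ('traffic_sign', ~0.9)
```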
  • one or more failures and/or malfunctions of one or more of the sensors may be automatically detected based on analysis of sensory data, specifically analysis of predefined primitive element(s) detected by analyzing the sensory dataset received from one or more other sensors, in particular other sensor(s) employing different sensing technology(s) and adapted to capture different radiation type(s).
  • one or more of the sensors may be automatically calibrated according to sensory data, specifically according to predefined primitive element(s) detected by analyzing the sensory dataset(s) of one or more other sensors, in particular other sensor(s) employing different sensing technology(s) and adapted to capture different radiation type(s).
  • Detecting the fused objects in the scene may present significant advantages and benefits compared to currently existing methods and systems for detecting objects in a scene.
  • the existing methods and systems may typically apply object detection processes to each of the sensory datasets separately.
  • the sensory dataset received from each technology type of sensor may be processed independently.
  • imaging sensory data, LiDAR sensory data, RADAR sensory data and/or SONAR sensory data may be each processed independently in a separate processing pipeline to detect higher level objects in the scene.
  • This may present major limitations.
  • First, each of the sensing and capturing technologies, and hence each of the sensor types, may exhibit inherent capturing limitations.
  • imaging sensors may present difficulties in differentiating between objects having substantially similar visible characteristics, for example, color, texture and/or the like.
  • LiDAR, RADAR and/or SONAR sensors may present difficulties in differentiating between objects located at a substantially similar distance from the sensor(s).
  • the correlation between the detected higher level objects may be done only after all (or at least the majority) of the different technology datasets are processed. This may cause a dependency on the slowest processing pipeline which may stall, halt and/or delay the higher speed processing pipelines.
  • the fusion approach on the other hand easily overcomes these limitations.
  • the sensory datasets received from the plurality of different technology sensors are processed at a low level to first detect the low level predefined primitive elements. Since each sensing and capturing technology presents advantages and limitations compared to each other, detecting the predefined primitive elements in each of the different technology sensory datasets may allow for significantly more accurate, elaborate and/or complete detection of the predefined primitive elements compared to processing each of the different technology sensory datasets separately. Therefore, the fused object(s) created by complementing the potential object(s) through joining (fusing) together the predefined primitive elements detected by the different technology sensors and associated with common potential object(s) may significantly improve the accuracy, comprehensiveness and/or completeness of the fused object(s).
  • detecting the (fused) object(s) based on at least partial detection by the different technology sensors may provide significant redundancy and overlap between sensory datasets thus significantly increasing confidence in the detection.
  • True positive and/or true negative detections may therefore be significantly increased while false positive and/or false negative detections may be significantly reduced thus further increasing the accuracy of the objects detection.
  • low level processing of the sensory datasets may prevent bottlenecks in the processing sequence since there is no need to wait for each of multiple processing pipelines to fully complete its processing cycle as may be done by the existing methods and systems. This may significantly improve resource utilization, for example, processing resources, storage resources, networking resources and/or the like since the waiting time may be significantly reduced compared to the existing methods. Reducing and potentially preventing the waiting time all together may significantly increase the processing speed which may be essential and/or crucial for real-time object detection applications, for example, autonomous vehicles, safety systems and/or the like. Also, detecting the low level predefined primitive elements which are simple elements may require significantly reduced computing resources compared to the existing methods which may be adapted to directly detect high level objects.
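  • One possible arrangement consistent with this description, though not prescribed by it, is to run only the cheap low level primitive detection per sensor in its own thread and push the partial results into a shared queue consumed by the fusion stage, as sketched below; all names and the wiring in the comments are illustrative assumptions.

```python
import queue
import threading

primitive_queue = queue.Queue()

def sensor_pipeline(sensor_id, read_frames, detect_primitives):
    """read_frames() yields raw sensory data frames; detect_primitives() is the fast,
    low level step that extracts predefined primitive elements from one frame."""
    for frame in read_frames():
        primitive_queue.put((sensor_id, detect_primitives(frame)))

def fusion_loop(handle_primitives):
    """Consumes partial (per-sensor) primitive detections as soon as any sensor is ready,
    so fusion never waits for the slowest full pipeline to finish."""
    while True:
        sensor_id, primitives = primitive_queue.get()
        handle_primitives(sensor_id, primitives)

# Hypothetical wiring (read_camera_frames, detect_image_primitives, etc. are assumed):
# threading.Thread(target=sensor_pipeline,
#                  args=("camera", read_camera_frames, detect_image_primitives),
#                  daemon=True).start()
# threading.Thread(target=sensor_pipeline,
#                  args=("lidar", read_lidar_frames, detect_lidar_primitives),
#                  daemon=True).start()
```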
  • the fusion engine may be significantly more immune to machine learning adversarial attacks compared to the existing methods and systems which typically employ machine learning (e.g. deep learning) for detecting the objects in the scene.
  • Machine learning adversarial attacks relate to inputs (training samples) to machine learning models which are intentionally designed by an adversary to cause the machine learning model to make a mistake.
  • the fusion engine may be deployed in conjunction with one or more of the existing methods for redundancy purposes and/or for verification of the detected objects by comparing between detected objects (or not detected) in a plurality of independent object detection systems, i.e. the fusion engine and one or more of the existing systems.
  • the probability score calculated by the fusion engine for one or more of the fused objects may be further evaluated in conjunction with the detection of the existing methods to further refine the detection accuracy.
  • Aligning the predefined primitive elements detected in the sensory datasets received from the different technology sensors may significantly improve the accuracy of correlation with each other of the predefined primitive elements detected in the different technology sensory datasets.
  • Automatically detecting failure(s) in one or more of the sensors according to sensory data received from other sensor(s), specifically other sensor(s) adapted to capture different radiation type(s) may significantly improve integrity, reliability and/or robustness of the objects detection system since such failed sensors may be easily detected in real-time using unbiased sensory data received from the different technology(s) sensor(s).
  • aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Computer Program code comprising computer readable program instructions embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • the program code can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the Figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • FIG. 1 illustrates a flowchart of an exemplary process of detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention.
  • An exemplary process 100 may be executed by a fusion engine to detect one or more objects in a scene, in particular a scene monitored by a plurality of sensors of a vehicle.
  • the fusion engine may join together (fuse) low level predefined primitive elements detected by analyzing sensory datasets received from at least some of the plurality of sensors employing different sensing and/or capturing technologies, for example, imaging, LiDAR, RADAR, SONAR and/or the like and are hence adapted to capture different radiation types, for example, visible light, infrared light, laser light, RF waves, ultrasonic waves and/or the like.
  • FIG. 2 is a schematic illustration of an exemplary system for detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention.
  • An exemplary fusion system 200 may include a network interface 202 for connecting to a network 230, an Input/Output (I/O) interface 204 for connecting to a plurality of sensors 220, a processor(s) 206 for executing a process such as the process 100 and storage 208 for storing program code and/or data.
  • the fusion system 200 may be installed, mounted, integrated and/or embedded in the vehicle.
  • the network interface 202 may include one or more network interfaces providing the fusion system 200 connectivity to the network 230 which may include one or more networks, specifically wireless networks, for example, a Radio Frequency (RF) link, a cellular network, a Wireless Local Area Network (WLAN) and/or the like.
  • the fusion system 200 may connect to one or more remote network resources, for example, a remote server 240, a cloud service 250 and/or the like.
  • the I/O interface 204 may include one or more wired and/or wireless interfaces providing the fusion system 200 connectivity to the sensors 220, for example, an RF communication channel, a WLAN communication channel, a Controller Area Network (CAN) interface, a Universal Serial Bus (USB) interface, a serial interface, a single wire interface and/or the like.
  • the fusion system 200 may receive sensory datasets from the sensors 220.
  • the fusion system 200 receives the sensory dataset (in its raw format) from one or more devices, systems and/or applications, for example, Autosar and/or the like adapted to collect the sensory datasets from one or more of the sensors 220 and distribute the collected sensory dataset(s) to other devices, for example, the fusion system 200.
  • the processor(s) 206 may include one or more processing nodes arranged for parallel processing, as clusters and/or as one or more multi core processor(s).
  • the storage 208 may include one or more non-transitory memory devices, for example, persistent non-volatile devices such as a Read Only Memory (ROM), a Flash array, a hard drive, a Solid State Drive (SSD) and/or the like.
  • the storage 208 may also include one or more volatile devices, for example, a Random Access Memory (RAM) device, a cache memory and/or the like.
  • the processor(s) 206 may execute one or more software modules, for example, a process, a script, an application, an agent, a utility, a tool and/or the like each comprising a plurality of program instructions stored in a non-transitory medium such as the storage 208 and executed by one or more processors such as the processor(s) 206.
  • the processor(s) 206 may execute a fusion engine 210 for executing the process 100 to analyze the sensory datasets captured by the sensors 220 monitoring the scene to detect one or more objects, in particular to fuse together low level primitive elements detected by different sensors 220 to improve an accuracy of the detection.
  • the primitive elements may be predefined and characterized in advance and stored in a primitive elements dataset 212 in the storage 208, for example, a record, a file, a database and/or the like constituting a collection, a library and/or the like of predefined primitive elements.
  • the sensors 220 may typically be installed, mounted, integrated and/or embedded in the vehicle.
  • the sensors 220 may include a plurality of sensors employing at least some of a plurality of different sensing and/or capturing technologies, for example, imaging, LiDAR, RADAR, SONAR and/or the like.
  • the sensors 220 are therefore adapted to monitor and capture different radiation types, for example, visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves and/or the like.
  • the sensors 220 may include one or more imaging sensors 220A adapted to capture visible light waves and/or infrared light waves, for example, a camera, a video camera, a stereoscopic camera, a night vision imaging sensor and/or the like.
  • the imaging sensors 220A may further include one or more thermal imaging sensors adapted to capture thermal radiation.
  • the imaging sensors 220A may also include one or more hyperspectral imaging sensors adapted to capture a wide spectrum of waves.
  • the sensors 220 may include one or more LiDAR sensors 220B adapted to determine range, angle and/or velocity of objects in the scene by emitting pulsed laser light beams and intercepting returning laser light beams reflected and/or deflected from the objects in the scene.
  • the sensors 220 may include one or more RADAR sensors 220C adapted to determine range, angle and/or velocity of objects in the scene by emitting RF waves and intercepting returning RF waves reflected and/or deflected from the objects in the scene.
  • the sensors 220 may include one or more SONAR sensors 220D adapted to determine range, angle and/or velocity of objects in the scene by emitting pulsed ultrasonic waves and intercepting echoes of the emitted ultrasonic waves resulting from reflection and/or deflection from the objects in the scene.
  • the process 100 starts with the fusion engine 210 receiving sensory datasets from a plurality of the sensors 220.
  • the sensory dataset received from each of the sensors 220 may naturally depend on the type of the respective sensor technology employed by the respective sensor 220.
  • the sensory dataset received from the imaging sensor(s) 220A may include one or more images, heat maps, spectral maps and/or the like of the depicted scene.
  • the sensory datasets received from the LiDAR sensor(s) 220B, the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D may include one or more images, maps and/or presentations mapping distances to objects in the scene.
  • the sensory datasets may include 2-Dimension (2D) data, 3- Dimension (3D) data, depth data, monochrome data, color data, reflection data and/or the like.
  • the fusion engine 210 receives sensory datasets from sensors 220 which employ different sensing technologies and are adapted to monitor and capture different radiation types.
  • the fusion engine 210 may receive sensory dataset(s) captured by one or more of the imaging sensor(s) 220A and sensory dataset(s) captured by one or more of the LiDAR sensor(s) 220B.
  • the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A and sensory datasets captured by one or more of the RADAR sensor(s) 220C.
  • the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A and sensory datasets captured by one or more of the SONAR sensor(s) 220D. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the RADAR sensor(s) 220C. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the SONAR sensor(s) 220D.
  • the fusion engine 210 may receive sensory datasets captured by one or more of the RADAR sensor(s) 220C and sensory datasets captured by one or more of the SONAR sensor(s) 220D. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A, sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the RADAR sensor(s) 220C.
  • the sensory datasets received by the fusion engine 210 from the plurality of sensors 220 relate to a substantially common scene as viewed from the vehicle. This may naturally depend on the installation locations of the sensors 220 in the vehicle.
  • the sensors 220 may be installed and adapted to monitor substantially common scenes despite the fact that the exact positioning and/or view angle of each of the sensors 220 may vary with respect to each other.
  • the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a front view of the vehicle and sensory datasets captured by one or more LiDAR sensors 220B installed to monitor substantially the same front view of the vehicle.
  • the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a front view of the vehicle and sensory datasets captured by one or more LiDAR sensors 220B installed to monitor a 360 degrees view of the vehicle. In another example, the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a right side view of the vehicle and sensory datasets captured by one or more RADAR sensors 220C installed to monitor substantially the same right side view of the vehicle.
  • the fusion engine 210 may receive sensory datasets captured by one or more LiDAR sensors 220B installed to monitor a back side view of the vehicle and sensory datasets captured by one or more SONAR sensors 220D installed to monitor substantially the same back side view of the vehicle.
  • the sensory datasets received by the fusion engine 210 may typically include a timing indication, for example, a time tag and/or the like indicating the timing of interception, i.e. the time of capture by the respective sensor 220.
  • the timing indication may be an absolute value with respect to a real time clock and/or a relative timing calculated with respect to some reference time, for example, a counter, a timer and/or the like.
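  • For illustration only, a relative timing indication such as a free-running counter value may be converted to an absolute capture time given a reference epoch and the counter tick period; the 1 MHz tick rate in this small assumed helper is not taken from the disclosure.

```python
def counter_to_absolute(counter_value, epoch_s, tick_period_s=1e-6):
    """Assumed helper: capture time = reference epoch + counter ticks * tick period
    (here a 1 MHz counter, i.e. 1 microsecond per tick)."""
    return epoch_s + counter_value * tick_period_s

print(counter_to_absolute(2_500_000, epoch_s=1_700_000_000.0))  # 2.5 s after the epoch
```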
  • the fusion engine 210 may analyze the sensory datasets received from the plurality of sensors 220 to detect one or more predefined primitive elements depicted by the respective sensors 220.
  • the fusion engine 210 may apply one or more analysis methods, techniques, tools and/or algorithms for analyzing the sensory datasets.
  • the fusion engine 210 may apply one or more image processing techniques such as, for example, color similarity, edge detection, edges grouping and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the imaging sensor(s) 220A.
  • the fusion engine 210 may apply one or more signal processing techniques such as, for example, depth analysis and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the LiDAR sensor(s) 220B.
  • the fusion engine 210 may apply one or more signal processing techniques such as, for example, blob analysis and differentiation and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D.
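  • The short Python sketch below illustrates, on synthetic data, two of the low level analyses mentioned above: edge detection on a camera frame (using OpenCV's Canny operator) and depth analysis on a LiDAR range image (detecting large range discontinuities). The data shapes and thresholds are assumptions for demonstration only.

```python
import cv2
import numpy as np

# Imaging sensor path: edge detection on a (synthetic) camera frame.
image = np.zeros((120, 160), dtype=np.uint8)
cv2.rectangle(image, (40, 30), (110, 90), 255, -1)   # a bright object in the frame
edges = cv2.Canny(image, 50, 150)                    # edge primitive candidates

# LiDAR path: depth analysis on a (synthetic) range image (rows = elevation, cols = azimuth).
ranges = np.full((32, 180), 40.0)                    # 40 m background
ranges[10:20, 60:90] = 8.0                           # an object 8 m away
depth_jump = np.abs(np.gradient(ranges, axis=1))     # large range discontinuities
lidar_edges = depth_jump > 5.0                       # boolean map of depth-edge primitives

print(edges.any(), lidar_edges.any())                # both paths yield edge primitives: True True
```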
  • the fusion engine 210 may apply one or more comparison metrics, for example, a shape similarity, a size similarity, a color similarity, a texture similarity and/or the like to compare the detected predefined primitive element(s) to the predefined primitive elements in the primitive elements dataset, search for a match and label the detected predefined primitive element(s) according to the match.
  • the fusion engine 210 may compare one or more edges and/or curves detected by analyzing image(s) received from one or more of the imaging sensors 220A to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212.
  • the fusion engine 210 may compare an outline and/or part thereof detected by analyzing growth and detecting edges of cloud point(s) received from one or more of the LiDAR sensors 220B to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212. In another example, the fusion engine 210 may compare an outline and/or part thereof detected by analyzing RADAR blobs received from one or more of the RADAR sensors 220C to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212.
  • the fusion engine 210 may further apply one or more machine learning methods, algorithms and/or models, for example, deep learning models such as a neural network, a Support Vector Machine (SVM), a decision tree learning algorithm, a K-Nearest Neighbors algorithm and/or any other learning algorithm known in the art, trained to detect the predefined primitive elements in the sensory datasets and label them according to matching predefined primitive elements in the primitive elements dataset.
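  • A hedged sketch of the comparison step follows: a detected contour is matched against stored primitive outlines using a Hu-moment based shape similarity metric (OpenCV's cv2.matchShapes); the template shapes and the acceptance threshold are illustrative assumptions rather than the disclosure's own metrics.

```python
import cv2
import numpy as np

def contour_of(draw):
    """Rasterize a shape and return its outer contour (OpenCV 4 return signature)."""
    canvas = np.zeros((200, 200), dtype=np.uint8)
    draw(canvas)
    contours, _ = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours[0]

# Assumed template library of predefined primitive outlines.
TEMPLATES = {
    "circle":    contour_of(lambda c: cv2.circle(c, (100, 100), 60, 255, -1)),
    "rectangle": contour_of(lambda c: cv2.rectangle(c, (40, 60), (160, 140), 255, -1)),
}

def label_primitive(detected_contour, max_distance=0.1):
    """Label a detected contour by its closest template under the Hu-moment metric."""
    best_label, best_dist = None, float("inf")
    for label, template in TEMPLATES.items():
        dist = cv2.matchShapes(detected_contour, template, cv2.CONTOURS_MATCH_I1, 0.0)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None

detected = contour_of(lambda c: cv2.circle(c, (80, 90), 35, 255, -1))  # smaller, shifted circle
print(label_primitive(detected))  # "circle": the metric is scale/translation invariant
```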
  • the predefined primitive elements may include, for example, an edge, a curve, a line, a corner, a surface, a shape (either 2D shapes such as triangle, rectangle, square, circle, ellipse, arbitrary closed shape, etc. and/or 3D shape such as ball, box, cone, etc.), a texture, a color and/or the like.
  • the predefined primitive elements may further include partial shapes, for example, a part of a circle, part of a rectangle, part of a box, part of a ball and/or the like.
  • the predefined primitive elements may typically be part of and/or associated with one or more higher level objects, for example, a vehicle (e.g. car, truck, motorcycle, bicycle, etc.), a structure (e.g. a building, a tree, etc.), a transportation infrastructure element (e.g. traffic light, traffic sign, light column, road edge, road marking, sidewalk, traffic circle edge, etc.) and/or the like.
  • the higher level objects may further be hierarchically constructed such that an object of a higher level is constructed of a plurality of lower level objects.
  • a car wheel may be defined as a higher level object of a certain level while a car object comprising at least one wheel object may be defined as a higher level object of a level higher than the level of the wheel object.
  • the predefined primitive elements may be regarded as building blocks for constructing, detecting and/or identifying the higher level objects.
  • the predefined primitive elements may be 2D primitive elements and/or 3D primitive elements which are distinctly labeled accordingly.
  • a circular 2D shape may be labeled as a circle while a circular 3D shape may be labeled as a ball.
  • a rectangular 2D shape may be labeled as a rectangle while a rectangular 3D shape may be labeled as a box, a pyramid and/or the like.
  • the predefined primitive elements which may be defined during a learning phase may be locally stored by the fusion system 200, for example, in the primitive elements dataset 212.
  • the learning phase may be conducted by the fusion engine 210 and/or by one or more other software modules, tools, applications and/or the like adapted to conduct and/or control at least part of the learning phase.
  • the predefined primitive elements may be defined by analyzing a plurality of training sensory datasets such as the sensory datasets received from the sensors 220 to detect, isolate, identify and label a plurality of primitive elements.
  • the training sensory datasets may include actual sensory datasets captured by one or more sensors such as the sensors 220.
  • the training sensory datasets may further include sensory datasets received from other sources, for example, synthetically constructed sensory datasets created manually and/or automatically by one or more simulation systems and/or applications.
  • the labeled predefined primitive elements may be stored in the primitive elements dataset 212.
  • One or more of the primitive elements may be manually predefined by defining one or more higher level objects and constructing the higher level object(s) from respective predefined primitive elements optionally having predefined relations with other predefined primitive elements of the respective higher level object.
  • a light (lamp) column may be defined by two thin rectangle primitive elements substantially perpendicular to each other (e.g. 80-120 degrees between them) where the first rectangle is substantially vertical extending upwards from ground level to a height of at least two meters.
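  • The light column example above can be expressed as a simple rule check, sketched below under an assumed simplified rectangle representation (angle, base height and length); the tolerances are illustrative, not values from the disclosure.

```python
def is_light_column(rect_a, rect_b):
    """rect_a / rect_b: thin-rectangle primitives described by 'angle_deg' (0 = horizontal,
    90 = vertical), 'base_height_m' and 'length_m' -- an assumed simplified representation."""
    between = abs(rect_a["angle_deg"] - rect_b["angle_deg"])
    return (80.0 <= between <= 120.0                      # roughly perpendicular arms
            and abs(rect_a["angle_deg"] - 90.0) <= 10.0   # first arm substantially vertical
            and rect_a["base_height_m"] <= 0.2            # extends upwards from ground level
            and rect_a["length_m"] >= 2.0)                # at least two meters high

pole = {"angle_deg": 88.0, "base_height_m": 0.0, "length_m": 5.5}
arm  = {"angle_deg": -5.0, "base_height_m": 5.3, "length_m": 1.2}
print(is_light_column(pole, arm))  # True
```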
  • one or more of the primitive elements may be predefined by applying an automated rule based analysis to detect one or more of the primitive elements according to a set of rules defining the type, characteristics, attributes, relations and/or the like of the primitive element(s).
  • an automated analysis may be applied to identify higher level object(s) in the training sensory datasets followed by a segmentation process to construct the higher level object(s) from a plurality of predefined primitive elements, for example, an edge, a curve, a shape and/or the like.
  • one or more of the primitive elements may be predefined by applying one or more machine learning algorithms, for example, deep learning algorithms to the training sensory datasets for detecting marked (labeled) higher level object(s), extracting primitive elements (features) of the higher level object(s) and clustering, classifying and/or labeling the extracted primitive elements accordingly.
  • one or more of the predefined primitive elements may be predefined using inheritance from already defined higher level object(s). For example, assuming a certain light column is already defined with respective predefined primitive elements, one or more other light columns may have substantially the same shape but with some different dimensions, for example, height, distance between the structural rectangles and/or the like. In such a case the other light column(s) may be defined using similar and/or slightly adjusted predefined primitive elements, optionally with adjusted relations with other predefined primitive elements of the respective light column.
  • the training sensory datasets may be selected, designed, constructed and/or configured to provide a diverse view of the higher level objects of interest in order to adapt the predefined primitive elements to a plurality of view conditions, sensory data capture conditions, operation parameters of the sensor(s) 220 and/or the like.
  • the training sensory datasets may include sensory data relating to one or more of the higher level objects captured by the sensors 220 at different distances, at different elevations, from different view angles and/or the like.
  • the training sensory datasets may include sensory data relating to one or more of the higher level objects which may be partially visible to the sensors 220, for example, obscured by other objects, partially visible due to weather conditions and/or the like.
  • the training sensory datasets may include sensory data captured by the sensors 220 having various operation parameters, for example, different resolutions, different fields of view, different dynamic ranges, different zoom-in values, different cloud point distributions (specifically with respect to the LiDAR sensor(s) 220B, the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D), different distortion levels and/or the like.
  • the training sensory datasets may be selected, constructed and/or configured to include sensory data captured by the sensors 220 depicting different aspects of the higher level object(s), for example, different objects of the same type, different views of the higher level object(s) and/or the like in order to adapt the predefined primitive elements accordingly.
  • the training sensory datasets may include sensory data captured by the sensors 220 depicting a plurality of car models, a plurality of truck models, a plurality of pedestrians, a plurality of traffic light structures, a plurality of road markings outlines and/or the like.
  • the training sensory datasets may include sensory data captured by the sensors 220 depicting different views of the higher level object(s), for example, a front view, a back view, a side view, a top view and/or the like.
  • the training sensory datasets may include sensory data captured for the higher level object(s) which may be only partially visible to the sensors 220, for example, obscured by other object(s), partially visible due to weather conditions and/or the like.
  • each of the predefined primitive elements may be associated with one or more of the higher level objects which are also classified and labeled.
  • the association between the higher level objects and their predefined primitive element building blocks may be defined in the primitive elements dataset.
  • circle shapes may be associated with one or more traffic signs.
  • a substantially horizontal edge line may be associated with a road outline, a sidewalk outline, a building outline and/or the like.
  • a substantially vertical edge line may be associated with a traffic sign column outline, a building outline and/or the like.
  • multiple predefined primitive elements may be correlated with one or more respective higher level objects according to one or more relational conditions.
  • two vertical lines which are within a predefined distance (which may be adjusted according to the distance of the lines from the sensor(s) 220) from each other may be associated with the same traffic sign pole, the same light column and/or the like.
  • a circular shape and a detected vertical line attached to the circular shape may be associated with the same traffic sign pole, the same light column and/or the like.
  • a plurality of horizontal 2D rectangles located spaced from each other in a predefined distance may be associated with a pedestrian crosswalk (zebra crossing) road marking.
  • a plurality of curves having certain predefined positions and/or locations with respect to each other may define one or more outlines associated with one or more vehicles (e.g. car, truck, motorcycle, bicycle, police car, ambulance, etc.), one or more pedestrians (e.g. man, woman, child, group of people, etc.) and/or the like.
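  • The relational conditions exemplified above may be captured by simple predicates, as in the following sketch; the coordinates, spacing and thresholds are assumed values for illustration only.

```python
def same_pole(line_a_x, line_b_x, max_gap=0.3):
    """Two substantially vertical lines close enough horizontally (meters, common frame)
    may belong to the same traffic sign pole or light column."""
    return abs(line_a_x - line_b_x) <= max_gap

def looks_like_crosswalk(stripe_centers_x, expected_gap=1.0, tolerance=0.3, min_stripes=3):
    """Several horizontal 2D rectangles spaced from each other at a roughly constant,
    predefined distance may form a pedestrian crosswalk (zebra crossing) marking."""
    xs = sorted(stripe_centers_x)
    if len(xs) < min_stripes:
        return False
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return all(abs(g - expected_gap) <= tolerance for g in gaps)

print(same_pole(4.10, 4.25))                         # True: likely the same pole/column
print(looks_like_crosswalk([0.0, 1.1, 2.0, 3.05]))   # True: evenly spaced stripes
```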
  • the fusion engine 210 may detect one or more of the plurality of predefined primitive elements. Due to the different technologies applied by the different types of the sensors 220, the fusion engine 210 may detect one or more predefined primitive elements in one sensory dataset while detecting other predefined primitive element(s) in one or more other sensory datasets. For example, one or more predefined primitive elements may be detected in the sensory dataset(s) received from one or more of the imaging sensors 220A while detecting other predefined primitive element(s) in the sensory dataset(s) received from one or more of the LiDAR sensors 220B.
  • FIG. 3A and FIG. 3B are exemplary image captures presenting detection of predefined primitive elements in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention.
  • FIG. 3A and FIG. 3B present two image captures 302 and 304 where 302 is an image capture (sensory data) received from an imaging sensor such as the imaging sensor 220A and 304 is an image capture (sensory data) received from a LiDAR sensor such as the LiDAR sensor 220B. Both the imaging sensor 220A and the LiDAR sensor 220B depict the exact same scene.
  • the two sensory datasets 302 and 304 may be processed by a fusion engine such as the fusion engine 210.
  • the fusion engine 210 may detect a plurality of predefined primitive elements in the sensory dataset 302 received from the imaging device 220A while failing to detect these predefined primitive elements in the sensory dataset 304 received from the LiDAR device 220B. Specifically, by analyzing the sensory dataset 302, the fusion engine 210 may detect the following predefined primitive elements (marked in red): a traffic sign (1), an arrow marking on a traffic sign (2), a car specific outline (3), road markings (4), a bicycle wheel (5), a window (6) and the feet of a pedestrian crossing a street (7). However, when analyzing the sensory dataset 304, the fusion engine 210 may fail to detect the predefined primitive elements detected in the sensory dataset 302.
  • the fusion engine 210 may fail to detect the car specific outline (3), the window (6) and the feet of the pedestrian (7) since they are within significantly the same distance as other objects in their background and the LiDAR technology may therefore fail to distinguish them from the background.
  • the fusion engine 210 may fail to detect the arrow marking on traffic sign (2) and the road markings (4) since the LiDAR technology may be limited in detecting markings which present no distance difference with respect to the surface on which the markings are made.
  • the fusion engine 210 may fail to detect the traffic sign (1) as it blends with surrounding objects making the traffic sign (1) indistinguishable from the surrounding objects.
  • the fusion engine 210 may fail to detect the bicycle wheel (5) as it may be too thin to reflect the laser light pulses (beams) and may therefore not be detected by the LiDAR sensor 220B.
  • the fusion engine 210 may detect a plurality of predefined primitive elements in the sensory dataset 304 received from the LiDAR device 220B while failing to detect these predefined primitive elements in the sensory dataset 302 received from the imaging device 220A. Specifically, by analyzing the sensory dataset 304, the fusion engine 210 may detect the following predefined primitive elements (marked with white outline): traffic sign (A), traffic sign (B), traffic sign pole (C) and a head of a pedestrian crossing a street (D). However, when analyzing the sensory dataset 302, the fusion engine 210 may fail to detect the predefined primitive elements detected in the sensory dataset 304.
  • the fusion engine 210 may fail to detect the traffic sign (A), the traffic sign (B) and the traffic sign pole (C) which may blend with surrounding objects making the traffic sign (A), the traffic sign (B) and the traffic sign pole (C) indistinguishable from their surrounding objects.
  • the fusion engine 210 may fail to detect the head of the pedestrian (D) which has substantially similar color as the background thus making the head of the pedestrian (D) indistinguishable from the background.
  • the fusion engine 210 may associate detected predefined primitive elements with one or more potential (higher level) objects which may potentially be present in the scene monitored by the sensors 220.
  • the fusion engine may check which higher level objects are associated in the primitive elements dataset with the detected predefined primitive elements and may associate the detected predefined primitive elements with one or more potential objects accordingly. For example, assuming the fusion engine 210 detects a circular shape (predefined primitive element), the fusion engine 210 may estimate the potential object is, for example, a traffic sign, a head of a person, a wheel of a car, a light housing of a traffic light and/or the like since these objects may be associated with the circular shape in the primitive elements dataset.
  • in another example, in case of detecting a vertical line (predefined primitive element), the fusion engine 210 may estimate the potential object is, for example, a traffic sign pole, a traffic light pole, a building edge, a vehicle edge and/or the like since these objects may be associated with the vertical line in the primitive elements dataset.
  • in yet another example, in case of detecting another predefined primitive element, for example, a curve, the fusion engine 210 may estimate the potential object is, for example, an edge of a road, an edge of a sidewalk and/or the like since these objects may be associated with that predefined primitive element in the primitive elements dataset.
  • the fusion engine 210 may associate the detected predefined primitive elements with the respective potential object(s) according to a correlation between predefined primitive elements detected in different sensory datasets received from different sensors 220.
  • the fusion engine 210 may first align the predefined primitive elements detected in different datasets received from different sensors 220 in order to identify a temporal and/or spatial correlation between the predefined primitive elements and thus accurately correlate between the predefined primitive elements.
  • the sensors 220 may not be positioned (i.e. mounted, installed, deployed, etc.), for example, in the vehicle, to capture the potential objects in the scene in exact spatial alignment with each other.
  • the sensors 220 may be positioned at different locations thus having different spatial capturing parameters, for example, view angle, elevation, distance and/or the like.
  • the fusion engine 210 may be provided with the positioning information of the sensors 220, for example, the view angle, the elevation, the distance (depth) and/or the like with respect to the positioning of the vehicle and therefore with respect to the scene.
  • the fusion engine 210 may therefore align the sensory datasets with each other such that the predefined primitive elements detected in the sensory datasets received from the sensors 220 are adjusted to compensate for the different spatial capturing parameters of their respective sensors 220 and thus aligned with each other in the space.
  • the fusion engine 210 may correlate predefined primitive element(s) detected in one of the sensory datasets with predefined primitive element(s) detected in one or more other sensory datasets and optionally associate them with one or more potential objects.
  • the fusion engine 210 may align the sensory datasets according to a common reference. For example, assuming the mounting, installation and/or integration information of the sensors 220 (e.g. position, elevation, view angle, etc.) is available to the fusion engine 210, the fusion engine 210 may compensate and/or adjust the sensory data according to a common coordinate system, for example, a common coordinate system of the vehicle. In another example, the fusion engine 210 may align the sensory datasets according to one or more common objects, for example, a predefined primitive element, a higher level object and/or the like detected in at least some of the sensory datasets, which may be used to align the sensory datasets with respect to each other.
  • the fusion engine 210 may further use the timing information, for example, the time tag assigned to the sensory datasets received from one or more of the sensors 220 to correlate between predefined primitive element(s) detected at different times. For example, assuming that by analyzing a certain sensory dataset received at time t from a certain one of the sensors 220, the fusion engine 210 detects a certain predefined primitive element. Further assuming that by analyzing a certain sensory dataset received at time t' (t' later than t) from the same and/or another one of the sensors 220, the fusion engine 210 detects a predefined primitive element which is substantially similar to the certain predefined primitive element. In such case the fusion engine 210 may correlate the certain predefined primitive element detected in the two sensory datasets and optionally associate them with a respective potential object.
  • the fusion engine 210 may use sensory datasets received from one or more other sensors, for example, a Global Positioning System (GPS) sensor to correlate the predefined primitive element(s) detected at different times by using GPS information to calculate a location, a speed, an acceleration and/or the like of the vehicle and associate the predefined primitive element(s) in time and space accordingly.
  • the sensory datasets received from the plurality of sensors 220 may not be synchronized in time.
  • a sensory dataset may be received from a certain sensor 220 with a time shift with respect to the sensory datasets received from one or more other sensors 220.
  • the time shift may lead to a difference in the detected predefined primitive elements in each of the sensory datasets, in particular in case the sensors 220 are installed in a moving vehicle since the timing difference may affect the spatial capturing parameters of the sensors 220.
  • the fusion engine 210 may therefore align the sensory datasets with each other such that the predefined primitive elements detected in the sensory datasets received from the sensors 220 are adjusted to compensate for the different timing capturing parameters of their respective sensors 220 and thus aligned with each other in time.
  • the fusion engine 210 may use the timing indication such as, for example, the time tag assigned to the sensory datasets for identifying their timing of capture by the respective sensors 220. Based on the identified timing, the fusion engine 210 may adjust the sensory datasets accordingly and correlate the predefined primitive elements detected in the sensory datasets received from the sensors 220 with each other and optionally associate them with one or more respective potential objects.
  • the fusion engine 210 may further temporally align sensory datasets received from multiple sensors 220 having different sampling rates. For example, assuming a certain imaging sensor 220A provides sensory datasets (e.g. images) at a relatively high rate while a certain LiDAR sensor 220B provides sensory datasets depicting the same scene as the certain imaging sensor 220A at a significantly lower rate. The fusion engine 210 may adjust the sensory data of the certain imaging sensor 220A and/or the certain LiDAR sensor 220B to align in time the sensory data of the two sensors.
  • the sensors 220 may further differ from each other in other capturing parameters, for example, resolution, distribution, distortion (e.g. scanning distortion, Parallax distortion, etc.), scaling and/or the like.
  • the fusion engine 210 may adjust one or more of the sensory datasets received from the plurality of sensors 220 to compensate for different resolutions and/or distributions of the different sensors 220 and align the predefined primitive elements detected in the various sensory datasets. Based on the aligned sensory datasets the fusion engine 210 may correlate together one or more of the predefined primitive elements detected in different sensory datasets and further associate them with one or more respective potential objects. For example, the sensory dataset received from a certain imaging sensor 220A may have a higher resolution than the sensory dataset received from a certain LiDAR sensor 220B.
  • the fusion engine 210 may adjust one or more predefined primitive elements detected in one or more of the sensory datasets received from the imaging sensor 220A and the LiDAR sensor 220B to compensate for the different resolution and correlate one or more predefined primitive elements detected in these sensory datasets with each other and optionally with one or more respective potential objects.
  • the sensory dataset received from a certain imaging sensor 220A may include a matrix of pixels which are substantially evenly distributed while the sensory dataset received from a certain LiDAR sensor 220B may include a point cloud having an uneven distribution, an effect which may be inherent to the LiDAR capturing technology.
  • the fusion engine 210 may adjust one or more predefined primitive elements detected in one or more of the sensory datasets received from the imaging sensor 220A and the LiDAR sensor 220B to compensate for the different distribution to correlate one or more predefined primitive elements detected in these sensory datasets with each other and optionally associate them with one or more respective potential objects.
  • the fusion engine 210 may create one or more fused objects by joining together associated respective predefined primitive elements to complement accordingly the potential object(s) detected in the scene.
  • the fusion engine 210 joins together predefined primitive elements detected in sensory datasets received from sensors 220 employing different sensing and capturing technologies and hence adapted to capture different radiation types.
  • the fused object created by the fusion engine 210 may be significantly more accurate, elaborate, complete and/or comprehensive.
  • the fusion engine 210 may join together horizontal line sections (predefined primitive elements) associated with a common potential object where one or more sections are detected by analyzing the sensory dataset of a certain imaging sensor 220A and one or more sections are detected by analyzing the sensory dataset of a certain LiDAR sensor 220B.
  • the two sections may overlap, partially overlap or not overlap.
  • the fused object resulting from joining together the horizontal line sections may therefore be significantly more detailed and/or comprehensive.
  • the fusion engine 210 may join together a vertical line (predefined primitive element) and a circular shape (predefined primitive element) associated with a common potential object where the vertical line is detected by analyzing the sensory dataset of a certain imaging sensor 220A and the circular shape is detected by analyzing the sensory dataset of a certain LiDAR sensor 220B.
  • the vertical line and the circular shape may overlap, partially overlap or not overlap.
  • the fused object resulting from joining together the vertical line and the circular shape may therefore be significantly more detailed and/or comprehensive.
  • the fusion engine 210 may join together vertical lines (predefined primitive element) detected by analyzing the sensory dataset of the imaging sensor 220A and the LiDAR sensor 220B which are correlated together and optionally associated with a certain potential object, for example, a traffic sign pole, a light column and/or the like.
  • the fusion engine 210 may join together a plurality of horizontal 2D rectangles (predefined primitive elements) spaced from each other at a predefined distance detected by analyzing the sensory dataset of the imaging sensor 220A with one or more predefined primitive elements portraying an edge of a road detected by analyzing the sensory dataset of the LiDAR sensor 220B.
  • the 2D rectangles and the road outline are correlated together and optionally associated with a pedestrian crosswalk object.
  • creation of fused object(s) may be demonstrated by joining together predefined primitive elements detected by the imaging sensor 220A and the LiDAR sensor 220B, as shown in FIG. 3A and FIG. 3B, which are correlated together and optionally associated with respective potential object(s).
  • the fusion engine 210 may join together the arrow marking on traffic sign (2) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the traffic sign (B) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. an arrow traffic sign.
  • the fusion engine 210 may join together the feet of the pedestrian crossing the street (7) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the upper part and head of the pedestrian crossing the street (D) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the pedestrian crossing the street.
  • the fusion engine 210 may join together the bicycle wheel (5) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the person riding a bicycle (only the person is detected while the bicycle is not) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the person riding the bicycle.
  • the fusion engine 210 may join together the road markings (4) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of road detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the road including markings.
  • the presented examples are naturally highly simplified.
  • the fusion engine 210 may join together complex predefined primitive elements, optionally a large number of such predefined primitive elements, to complement potential objects and create fused objects of higher complexity.
  • FIG. 4 is an exemplary image capture presenting creation of fused objects by joining together predefined primitive elements detected in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention.
  • An exemplary image capture 402 presents creation of a plurality of fused objects by a fusion engine such as the fusion engine 210 which may join together a plurality of associated predefined primitive elements detected based on analysis of sensory datasets received from one or more imaging sensors such as the imaging sensor 220A and one or more LiDAR sensors such as the LiDAR sensor 220B.
  • the fusion engine 210 joins together a plurality of curves (predefined primitive elements) detected in the sensory dataset received from the imaging sensor 220A (marked RED) and a plurality of curves (predefined primitive elements) detected in the sensory dataset received from the LiDAR sensor 220B (marked GREEN).
  • the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying the lower part of a pedestrian 404 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying the upper part of the pedestrian 404 to create a fused object of the pedestrian 404.
  • the fused object of the pedestrian 404 is significantly more accurate and complete compared to the partial shapes constructed by only the RED curves or only the GREEN curves.
  • the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying the lower part of a bicycle rider 406 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying the upper part of the bicycle rider 406 to create a fused object of the bicycle rider 406.
  • the fused object of the bicycle rider 406 is significantly more accurate and complete compared to the partial shapes constructed by only the RED curves or only the GREEN curves.
  • the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying arrow markings on traffic sign 410 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying an outline of the traffic sign 410 to create a fused object of the arrow traffic sign 410.
  • the fused object of the arrow traffic sign 410 is significantly more accurate and complete compared to the partial construction achievable based on only the RED curves or only the GREEN curves.
  • the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying markings 408 in a street intersection with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying objects detected in the street intersection, for example, the pedestrian 404, the bicycle rider 406, the arrow traffic sign 410 and a vehicle 412 to create a fused picture of the street intersection.
  • the fused picture of the street intersection is significantly more accurate and complete compared to the partial construction achievable based on only the RED curves or only the GREEN curves.
  • the fusion engine 210 classifies each of the fused object(s) with a respective label of a matching object of a plurality of predefined and labeled objects.
  • the fusion engine 210 may classify each fused object according to the labels of the predefined primitive elements associated with the respective fused object based on the association of the associated predefined primitive elements in the primitive elements dataset.
  • the fusion engine 210 may also apply one or more of the comparison metrics, for example, the shape similarity, the size similarity, the color similarity, the texture similarity and/or the like to compare the fused object to the predefined objects.
  • the fusion engine 210 may further apply one or more machine learning, specifically deep learning methods, algorithms and/or models, for example, a neural network, an SVM, a decision tree learning algorithm, a K-Nearest neighbors algorithm and/or any other learning algorithm as known in the art trained to identify the fused object as one of the predefined and labeled objects.
  • the fusion engine 210 may be able to classify one or more of the fused object(s) even if the respective fused object does not perfectly match a respective predefined and labeled object. For example, assuming a certain object in the scene, for example, a light (lamp) column is partially hidden (blocked) by another object, for example, a car, the fusion engine 210 may still be able to classify the partially blocked light column according to detection of a corner (predefined primitive element) having an angle of 80-120 degrees connecting two rectangles (predefined primitive elements) at a height of approximately 2 meters above ground level.
  • the fusion engine 210 may be able to classify the partially blocked light column according to detection of the visible predefined primitive elements associated with the predefined labeled light column object.
  • the fusion engine 210 may further apply a match threshold to determine a positive or negative match of one or more of the fused objects with respective predefined labeled object(s).
  • the match threshold may be set globally for all the predefined labeled objects. Alternatively, the match threshold may be set individually for one or more of the predefined labeled objects.
  • the match threshold may be expressed in one or more metrics, for example, percentage and/or the like. The fusion engine 210 may therefore evaluate a match level between a certain fused object and a respective predefined labeled object.
  • in case the match level exceeds the match threshold, the fusion engine 210 may classify the certain fused object accordingly, and in case the match level does not exceed (is below) the match threshold, the fusion engine 210 may refrain from classifying the certain fused object.
  • the fusion engine 210 classifies one or more of the fused objects and assigns each of them a calculated probability score indicating a probability that the fused object matches a respective predefined labeled object (i.e. a match level).
  • the fusion engine 210 may further calculate a plurality of probability scores for a certain fused object indicating a probability that the certain fused object matches a respective one of a plurality of predefined labeled objects.
  • weights are assigned to one or more of the sensors 220 to adjust a contribution of the sensory dataset received from the respective sensor 220 to the construction and/or classification of the fused object(s) (an illustrative weighting sketch follows this list).
  • the fusion engine 210 may apply the weights to the predefined primitive elements detected in the sensory dataset(s) of the respective sensor(s). For example, assuming a certain imaging sensor 220A is assigned a first weight and a certain LiDAR sensor 220B is assigned a second weight which is lower than the first weight.
  • the fusion engine 210 detects one or more predefined primitive elements in each of the sensory datasets received from the certain imaging sensor 220A and the certain LiDAR sensor 220B which are associated with a certain potential object.
  • the fusion engine 210 may apply the first and second weights such that the predefined primitive element(s) detected in the sensory dataset received from the certain imaging sensor 220A have higher significance, importance and/or precedence over the predefined primitive element(s) detected in the sensory dataset received from the certain LiDAR sensor 220B.
  • the fusion engine 210 applying the first and second weights may assign higher significance, importance and/or precedence for the sensory data received from the certain imaging sensor 220A and lower significance, importance and/or precedence for the sensory data received from the certain LiDAR sensor 220B.
  • the contribution and/or impact of the certain imaging sensor 220A to the construction and/or classification of the certain fused object is higher than that of the certain LiDAR sensor 220B.
  • while the fusion engine 210 may classify fused objects as the predefined higher level objects, the fusion engine 210 may also classify one or more arbitrary higher level objects which are not specifically predefined in advance but are detected by joining a plurality of predefined primitive elements. For example, the fusion engine 210 may detect a certain arbitrary object having an arbitrary shape which resides on a road, for example, a trash pile, debris, a hole in the road and/or the like. The fusion engine 210 may classify such fused arbitrary objects as, for example, an obstacle, a hazard and/or the like.
  • the fusion engine 210 may output the classification, for example, the label of the fused object(s) detected in the scene.
  • the fusion engine 210 may further output a descriptive dataset for one or more fused object(s) detected in the scene.
  • the descriptive dataset may include temporal information relating to the fused object(s), for example, a time of capture, a timing of the detection of the fused object in a sequence of sensory datasets captured during a certain time period, detection timing with respect to detection of one or more other objects and/or the like.
  • the descriptive dataset may also include spatial information relating to the fused object(s), for example, a location of the fused object with respect to the vehicle (in which the sensors 220 are positioned), a location of the fused object with respect to one or more other objects and/or the like.
  • the relational information of the fused object, i.e. the temporal and/or spatial information with respect to other object(s), may further include additional relational information, for example, a relative positioning between the objects, an elevation difference between the objects, an obscurity of the fused object which may be blocked by another object(s) and/or the like.
  • the fusion engine 210 may extract, calculate and/or derive the spatial information, the temporal information, the relational information and/or part thereof from the analysis of one or more of the sensory datasets received from one or more of the sensors 220.
  • the spatial information or part thereof may be calculated based on one or more of the sensory datasets received from one or more of the sensors 220, for example, the imaging sensor(s) 220A, the LiDAR sensor(s) 220B and/or the like.
  • the fusion engine 210 may further extract, calculate and/or derive the spatial information and/or part thereof based on information received from one or more geolocation sensors, for example, GPS information received from one or more GPS sensors, navigation systems and/or the like.
  • the fusion engine 210 may further transmit data relating to the operational environment of the fusion system 200 to, and/or receive such data from, one or more of the remote network resources, for example, the remote server 240, the cloud service 250 and/or the like.
  • the fusion engine 210 may transmit detection information describing the detected objects (e.g. type, location, position with respect to the sensor(s) 220, position with respect to other objects, etc.) and/or the like. This information may be used for creating one or more models, for example, a map model mapping the detected objects, a scenery model presenting the detected objects, a big data model and/or the like.
  • one or more failures and/or malfunctions of one or more of the sensors 220 may be automatically detected based on analysis of sensory data, specifically analysis of predefined primitive element(s) detected by analyzing the sensory dataset of other sensor(s) 220, in particular other sensor(s) 220 employing different sensing technology(s) and adapted to capture different radiation type(s).
  • assuming that by analyzing the sensory dataset received from a certain imaging sensor 220A, the fusion engine 210 detects an outline of a certain traffic light (predefined primitive element) in the scene depicted by the certain imaging sensor 220A. Further assuming that, based on the analysis of the sensory dataset received from the certain imaging sensor 220A, the fusion engine 210 determines that the outline of the certain traffic light and/or part thereof should be detected by analyzing the sensory dataset received from a certain LiDAR sensor 220B depicting substantially the same scene.
  • in case the outline is not detected in the sensory dataset received from the certain LiDAR sensor 220B, the fusion engine 210 may determine that there is a high probability that the certain LiDAR sensor 220B has failed, is blocked from properly monitoring the scene and/or the like.
  • in another example, assuming that by analyzing the sensory dataset received from a certain SONAR sensor 220D, the fusion engine 210 detects an outline of a certain object (predefined primitive element) in the scene depicted by the certain SONAR sensor 220D. Further assuming that, based on the analysis of the sensory dataset received from the SONAR sensor 220D, the fusion engine 210 determines that the outline of the certain object and/or part thereof should be detected by analyzing the sensory dataset received from a certain RADAR sensor 220C depicting substantially the same scene.
  • in case the outline is not detected in the sensory dataset received from the certain RADAR sensor 220C, the fusion engine 210 may determine that there is a high probability that the certain RADAR sensor 220C has failed, is blocked from properly monitoring the scene and/or the like.
  • one or more of the sensors 220 may be automatically calibrated according to sensory data, specifically according to predefined primitive element(s) detected by analyzing the sensory dataset(s) of other sensor(s) 220, in particular other sensor(s) 220 employing different sensing technology(s) and adapted to capture different radiation type(s).
  • assuming that by analyzing the sensory dataset received from a certain imaging sensor 220A, the fusion engine 210 detects a street corner (predefined primitive element) in a scene depicted by the certain imaging sensor 220A. Also assuming that by analyzing the sensory dataset received from a certain LiDAR sensor 220B, the fusion engine 210 detects a street corner (predefined primitive element) in a scene depicted by the certain LiDAR sensor 220B which at least partially overlaps the scene depicted by the certain imaging sensor 220A. Further assuming the fusion engine 210 correlates together the street corners detected in the sensory dataset of the certain imaging sensor 220A and the sensory dataset of the certain LiDAR sensor 220B as described in step 106.
  • the fusion engine 210 may calibrate and/or compensate the sensory dataset received from the certain imaging sensor 220A according to the sensory dataset received from the certain LiDAR sensor 220B or vice versa.
  • the fusion engine 210 may preferably attempt to detect a common (associated) predefined primitive element(s) in the datasets received from a plurality of sensors 220 (more than two) and calibrate and/or compensate according to a majority decision, i.e. determine the correct calibration according to detection of the common predefined primitive element(s) in a majority of the sensory datasets.
  • as used herein, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
  • the description of numerical ranges in a range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
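The sensor weighting described in the list above may, purely as a non-limiting illustration, be sketched in a few lines of Python. The sensor names, weight values and confidence figures below are hypothetical assumptions made only for this sketch; it merely shows how a higher-weighted sensor's detections could contribute more to a fused result.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float]   # (primitive element kind, detection confidence)

# Assumed per-sensor weights adjusting each sensor's contribution; the values
# and the sensor names are illustrative only.
SENSOR_WEIGHTS: Dict[str, float] = {"imaging": 0.7, "lidar": 0.3}


def weighted_support(detections_by_sensor: Dict[str, List[Detection]]) -> Dict[str, float]:
    """Accumulate, per primitive element kind, the weighted support it receives
    from all sensors; higher-weighted sensors contribute more to the fused result."""
    support: Dict[str, float] = {}
    for sensor, detections in detections_by_sensor.items():
        weight = SENSOR_WEIGHTS.get(sensor, 0.5)   # assumed default weight
        for kind, confidence in detections:
            support[kind] = support.get(kind, 0.0) + weight * confidence
    return support


# Example: both sensors report a circular shape, but the imaging sensor's
# report dominates because of its higher weight.
print(weighted_support({
    "imaging": [("circular_shape", 0.9)],
    "lidar": [("circular_shape", 0.6), ("vertical_line", 0.8)],
}))
```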

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Traffic Control Systems (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

A computer implemented method of detecting objects based on fusion of sensory data received from a plurality of sensors, comprising using one or more processors for receiving a plurality of sensory datasets captured by a plurality of sensors adapted to capture a plurality of radiation types in a common scene, detecting in real-time one or more predefined primitive elements in each of at least some of the plurality of sensory datasets, associating the predefined primitive element(s) detected in each of the at least some sensory datasets with one or more potential objects, complementing one or more of the potential objects by joining together respective associated predefined primitive elements to create a respective fused object, classifying the respective fused object according to a match with one or more of a plurality of predefined objects and outputting the classification of the respective fused object.

Description

REAL-TIME RAW DATA- AND SENSOR FUSION
RELATED APPLICATIONS
This application claims the benefit of priority under 35 USC § 119(e) of U.S. Provisional Patent Application No. 62/655,835 filed on April 11, 2018 and U.S. Patent Application No. 62/671,443 filed on May 15, 2018. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.
BACKGROUND
The present invention, in some embodiments thereof, relates to detecting objects in a scene, and, more specifically, but not exclusively, to detecting objects in a scene based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types.
Machine based object detection has gained significance importance mainly due to the rapidly evolving automated systems, applications and services requiring image recognition ranging from partially automated control systems to fully autonomous vehicles which heavily rely on the ability to detect and recognize objects in a scene. Moreover, many of such applications, specifically the autonomous vehicle applications may require high speed real-time object detection capabilities in order to properly adapt and respond to the highly dynamic surrounding environment.
Such object detection systems typically employ a plurality of sensors employing a plurality of sensing and capturing technologies for collecting sensory data depicting the scene, for example, imaging, Laser Imaging Detection and Ranging (LiDAR), Radio Detection And Ranging (RADAR), Sound Navigation And Ranging (SONAR) and/or the like. The sensory datasets captured by the sensors may be analyzed to detect object(s) present in the scene.
SUMMARY
According to a first aspect of the present invention there is provided a computer implemented method of detecting objects based on fusion of sensory data received from a plurality of sensors, comprising executing a code by one or more processors for:
Receiving a plurality of sensory datasets captured by a plurality of sensors adapted to capture a plurality of radiation types in a common scene.
Detecting in real-time one or more predefined primitive elements in each of at least some of the plurality of sensory datasets.
Associating the one or more predefined primitive elements detected in each of the at least some sensory datasets with one or more potential objects. Complementing the one or more potential object by joining together respective associated predefined primitive elements to create a respective fused object.
Classifying the respective fused object according to a match with one or more of a plurality of predefined objects.
Outputting the classification of the respective fused object.
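Purely for illustration, the sequence of steps listed above may be sketched as follows in Python. All class names, fields and helper routines (SensoryDataset, Primitive, detect_primitives, fuse_and_classify) are hypothetical editorial examples rather than part of the claimed method; the association and classification logic is deliberately naive and only shows one possible arrangement of the receive, detect, associate, complement, classify and output steps.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SensoryDataset:
    """Hypothetical container for the raw data received from one sensor."""
    sensor_id: str        # e.g. "camera_front" or "lidar_roof"
    radiation_type: str   # e.g. "visible_light", "laser"
    timestamp: float      # capture time tag (seconds)
    samples: list         # raw samples (pixels, point cloud, echoes, ...)


@dataclass
class Primitive:
    """A detected predefined primitive element."""
    kind: str             # e.g. "edge", "curve", "corner", "circular_shape"
    geometry: list        # points describing the element in a common frame
    source: str           # identifier of the sensor that produced it


def detect_primitives(dataset: SensoryDataset) -> List[Primitive]:
    """Placeholder for the low-level primitive-element detection step."""
    return []             # a real detector would analyze dataset.samples


def fuse_and_classify(datasets: List[SensoryDataset]) -> List[Dict]:
    """Sketch of the receive -> detect -> associate -> complement -> classify flow."""
    # 1-2. Receive the datasets and detect primitive elements in each of them.
    primitives = [p for ds in datasets for p in detect_primitives(ds)]

    # 3. Associate primitives with potential objects (here: naively by kind).
    potential: Dict[str, List[Primitive]] = {}
    for p in primitives:
        potential.setdefault(p.kind, []).append(p)

    # 4-5. Complement each potential object by joining its primitives and
    #      classify it (here: the label is simply the shared kind).
    results = []
    for kind, elements in potential.items():
        fused_geometry = [pt for e in elements for pt in e.geometry]
        results.append({"label": kind, "geometry": fused_geometry,
                        "sources": sorted({e.source for e in elements})})

    # 6. Output the classification of every fused object.
    return results
```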
According to a second aspect of the present invention there is provided a system for detecting objects based on fusion of sensory data received from a plurality of sensors, comprising:
One or more processors adapted for executing a code, the code comprising:
Code instructions to receive a plurality of sensory datasets captured by a plurality of sensors adapted to capture a plurality of radiation types in a common scene.
Code instructions to detect in real-time one or more predefined primitive elements in each of at least some of the plurality of sensory datasets.
Code instructions to associate the one or more predefined primitive elements detected in each of the at least some sensory datasets with one or more potential objects.
Code instructions to complement the one or more potential objects by joining together respective associated predefined primitive elements to create a respective fused object.
Code instructions to classify the respective fused object according to a match with one or more of a plurality of predefined objects.
Code instructions to output the classification of the respective fused object.
The sensory datasets received from the plurality of different technology sensors are processed at a low level to first detect the low level predefined primitive elements. Since each sensing and capturing technology presents advantages and limitations compared to each other, detecting the predefined primitive elements in each of the different technology sensory datasets may allow for significantly more accurate, elaborate and/or complete detection of the predefined primitive elements compared to processing each of the different technology sensory datasets separately. The fused object(s) created by complementing the potential object(s) through joining (fusing) together the predefined primitive elements may therefore significantly improve the accuracy, comprehensiveness and/or completeness of the fused object(s). Moreover, detecting the (fused) object(s) based on at least partial detection by the different technology sensors may provide significant redundancy and overlap between sensory datasets thus significantly increasing confidence in the detection. Furthermore, low level processing of the sensory datasets may prevent bottlenecks in the processing sequence since there is no need to wait for each of multiple processing pipelines to fully complete its processing cycle of a sensory dataset(s) received from a single type of sensor. In addition, since the fusion engine detects the primitive elements in the sensory datasets based on comparison to predefined primitive elements, the fusion engine may be significantly more immune to machine learning adversarial attacks.
In a further implementation form of the first and/or second aspects, the plurality of sensors include at least some members of a group consisting of: an imaging sensor, a laser detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor and a sound navigation and ranging (SONAR) sensor, the plurality of sensors are adapted to capture respective radiation types which are members of a group consisting of: visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves. Since each sensing and capturing technology presents advantages and limitations compared to each other, supporting sensors employing a wide range of sensing technologies may further improve accuracy, comprehensiveness and/or completeness of the detection of the predefined primitive elements.
In a further implementation form of the first and/or second aspects, each of the one or more predefined primitive elements is a member of a group consisting of: an edge, a curve, a line, a corner, a surface, a shape, a texture and a color. Detecting low level simple predefined primitive elements may be done by a fast analysis thus significantly reducing required computing resources. Moreover, the low level predefined primitive elements may be used as building blocks for a plurality of higher level objects and therefore a wide variety of the predefined primitive elements may allow construction of a wide range of higher level objects.
In an optional implementation form of the first and/or second aspects, the association is based on aligning each of the one or more predefined primitive elements detected in each of the at least some sensory datasets according to a common reference. Aligning the predefined primitive elements detected in the sensory datasets received from the different technology sensors may significantly improve accuracy of the correlation of the predefined primitive elements detected in the different technology sensory datasets with each other.
In a further implementation form of the first and/or second aspects, the aligning comprising one or more members of a group consisting of: a temporal alignment, a spatial alignment, a resolution alignment and a distribution alignment. Aligning (adjusting) the predefined primitive elements in the space and time dimensions as well as with respect to the capturing capabilities of the different technologies sensors may further improve accuracy of correlation between the predefined primitive elements detected in the different technology sensory datasets.
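As a hedged illustration of the temporal alignment member of the above group, the following Python sketch pairs the datasets of a fast-sampling sensor with the nearest-in-time datasets of a slower sensor. The stream layout, the max_skew tolerance and the example sampling rates are assumptions made only for this sketch, not part of the described alignment itself.

```python
from bisect import bisect_left
from typing import List, Tuple


def temporally_align(
    fast_stream: List[Tuple[float, object]],   # (time tag, dataset) at a high rate
    slow_stream: List[Tuple[float, object]],   # (time tag, dataset) at a lower rate
    max_skew: float = 0.05,                    # assumed tolerance in seconds
) -> List[Tuple[object, object]]:
    """Pair each slow-rate dataset with the nearest-in-time fast-rate dataset.

    Both streams are assumed to be sorted by their time tags. Pairs whose
    time tags differ by more than max_skew are dropped rather than fused.
    """
    fast_times = [t for t, _ in fast_stream]
    pairs = []
    for t_slow, slow_data in slow_stream:
        i = bisect_left(fast_times, t_slow)
        # Candidate neighbours: the samples just before and just after t_slow.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(fast_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(fast_times[k] - t_slow))
        if abs(fast_times[j] - t_slow) <= max_skew:
            pairs.append((fast_stream[j][1], slow_data))
    return pairs


# Example: a camera at ~30 Hz paired with a LiDAR at ~10 Hz.
camera = [(k / 30.0, f"image_{k}") for k in range(30)]
lidar = [(k / 10.0, f"sweep_{k}") for k in range(10)]
print(temporally_align(camera, lidar)[:3])
```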
In an optional implementation form of the first and/or second aspects, a descriptive dataset of the fused object is outputted, the descriptive dataset comprising one or more members of a group consisting of: temporal information, spatial information and relational information descriptive of relation of the fused object with one or more other objects detected in at least some of the sensory datasets. The additional descriptive dataset may significantly improve comprehending the scene as the additional descriptive information may describe relations between detected objects, position, timing and/or other descriptive attributes of the detected objects.
In an optional implementation form of the first and/or second aspects, one or more failed sensors of the plurality of sensors are detected by identifying incompliance of the one or more predefined primitive elements detected in the sensory dataset of the failed sensor with respect to the one or more predefined primitive elements detected in at least another one of the plurality of sensory datasets. Automatically detecting failure(s) in one or more of the sensors may significantly improve integrity, reliability and/or robustness of the objects detection system since such failed sensors may be easily detected in real-time using unbiased sensory data received from the different technology(s) sensor(s).
In an optional implementation form of the first and/or second aspects, one or more of the plurality of sensors are calibrated according to a comparison of the one or more predefined primitive elements detected in the sensory dataset of the respective sensor with respect to the one or more predefined primitive elements detected in at least another one of the plurality of sensory datasets. Automatically calibrating one or more of the sensors may further improve integrity, reliability and/or robustness of the objects detection system since the sensors may be calibrated automatically, dynamically and/or in real-time.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a flowchart of an exemplary process of detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention;
FIG. 2 is a schematic illustration of an exemplary system for detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention;
FIG. 3A and FIG. 3B are exemplary image captures presenting detection of predefined primitive elements in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention; and
FIG. 4 is an exemplary image capture presenting creation of fused objects by joining together predefined primitive elements detected in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to detecting objects in a scene, and, more specifically, but not exclusively, to detecting objects in a scene based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types.
According to some embodiments of the present invention, there are provided methods and systems for detecting objects in a scene, in particular a scene monitored by a plurality of sensors of a vehicle, based on fusion of sensory datasets received from a plurality of sensors employing different sensing and/or capturing technologies and adapted to capture different radiation types.
The sensors may include, for example, imaging sensor(s) (e.g. camera, night vision camera, thermal camera, etc.), LiDAR sensor(s), RADAR sensor(s), SONAR sensor(s) and/or the like and may hence be adapted to capture different radiation types, for example, visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves and/or the like.
A fusion engine may receive the sensory datasets from at least some of the different technology sensors and analyze the sensory datasets to detect one or more low level predefined primitive elements, for example, an edge, a curve, a corner, a surface, a shape, a texture, a color and/or any combination thereof. The fusion engine may detect the predefined primitive elements by comparing them to predefined primitive elements defined in advance and stored in a primitive elements dataset, for example, a record, a file, a database and/or the like. Each of the predefined primitive elements may further be associated in the primitive elements dataset as building blocks for one or more higher level objects, for example, a vehicle (e.g. car, truck, motorcycle, bicycle, etc.), a structure (e.g. a building, a tree, etc.), a transportation infrastructure element (e.g. traffic light, traffic sign, light columns, road edges, road markings, sidewalk, traffic circle edges, etc.) and/or the like.
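As one possible, non-limiting illustration, the primitive elements dataset may be thought of as a simple lookup table mapping each predefined primitive element to the higher level objects it may build. The element and object names below are hypothetical examples chosen for this sketch only.

```python
# A minimal sketch of a primitive elements dataset, assuming it is kept as a
# plain lookup table; the element and object names below are illustrative only.
PRIMITIVE_ELEMENTS_DATASET = {
    "circular_shape": ["traffic_sign", "head_of_person", "car_wheel",
                       "traffic_light_housing"],
    "vertical_line": ["traffic_sign_pole", "traffic_light_pole",
                      "building_edge", "vehicle_edge", "light_column"],
    "horizontal_2d_rectangle": ["pedestrian_crosswalk_marking"],
    "curve": ["vehicle_outline", "pedestrian_outline", "road_edge"],
}


def potential_objects_for(detected_elements):
    """Return the higher level objects associated with the detected elements."""
    candidates = set()
    for element in detected_elements:
        candidates.update(PRIMITIVE_ELEMENTS_DATASET.get(element, []))
    return candidates


# Example: a circular shape attached to a vertical line may indicate, among
# other candidates, a traffic sign mounted on a pole.
print(potential_objects_for(["circular_shape", "vertical_line"]))
```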
Since the sensors apply different sensing and/or capturing technologies the predefined primitive elements detected by the fusion engine in each of the sensory datasets may vary according to the capturing capabilities of the respective sensors.
The fusion engine may associate the predefined primitive elements detected in the plurality of datasets with one or more potential objects which may be present in the scene according to the association and relations predefined in the primitive elements dataset.
The fusion engine may further correlate between multiple predefined primitive elements detected in the sensory datasets received from the different technology sensors and further associate the correlated multitude of predefined primitive elements with respective potential object(s). The fusion engine may correlate together the multitude of predefined primitive elements based on a spatial and/or temporal correlation detected in the sensory datasets, for example, proximity, distance, timing of capture and/or the like. In order to identify the spatial and/or temporal correlation, the fusion engine may first align the predefined primitive elements with respect to a common reference, for example, a common coordinate system, a common object detected in the scene by at least some of the sensors and/or the like.
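The alignment to a common reference may be illustrated, under simplifying assumptions, by a two-dimensional rigid transform that maps sensor-frame detections into a common vehicle coordinate system using each sensor's mounting position and view angle. The function name and the mounting values in the example are hypothetical and serve only as a sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_vehicle_frame(points: List[Point], mount_x: float, mount_y: float,
                     mount_yaw_deg: float) -> List[Point]:
    """Transform sensor-frame points into a common vehicle frame (2-D sketch).

    mount_x / mount_y / mount_yaw_deg describe where the sensor is installed
    on the vehicle; in practice such values would come from the mounting and
    integration information made available to the fusion engine.
    """
    yaw = math.radians(mount_yaw_deg)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    return [(mount_x + cos_y * x - sin_y * y,
             mount_y + sin_y * x + cos_y * y) for x, y in points]


# Example: the same corner seen by a roof LiDAR and a front camera, each
# reported in its own sensor frame, lands on (roughly) the same vehicle-frame
# coordinates once the (assumed) mounting parameters are compensated for.
lidar_corner = [(10.0, 2.0)]
camera_corner = [(8.5, 2.0)]
print(to_vehicle_frame(lidar_corner, mount_x=0.0, mount_y=0.0, mount_yaw_deg=0.0))
print(to_vehicle_frame(camera_corner, mount_x=1.5, mount_y=0.0, mount_yaw_deg=0.0))
```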
The fusion engine may then complement (complete) in real-time the potential object(s) by joining together the predefined primitive elements detected in at least some of the sensory datasets and correlated to the same potential object(s) to create a respective fused object for one or more of the potentially detected objects. The predefined primitive elements detected in the different sensory datasets may each only partially portray (depict) the potential object(s) due to limitations of the respective sensing and/or capturing technologies. Therefore, joining the potentially partial elements detected in the plurality of sensory datasets captured by the plurality of different technology sensors may significantly improve the accuracy, comprehensiveness and/or completeness of the respective fused object(s).
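A minimal sketch of this joining (complementing) step is given below, assuming each sensor contributes polylines associated with the same potential object and that near-duplicate points are merged. The gap tolerance and the pedestrian example are illustrative assumptions, not the described fusion itself.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def join_elements(elements_by_sensor: Dict[str, List[List[Point]]],
                  gap_tolerance: float = 0.5) -> List[Point]:
    """Join partial primitive elements from several sensors into one outline.

    Each sensor contributes a list of polylines (lists of points) that were
    associated with the same potential object. The sketch simply concatenates
    them, dropping points that duplicate an already-kept point to within
    gap_tolerance (an assumed value); overlapping, partially overlapping and
    non-overlapping contributions are all allowed.
    """
    fused: List[Point] = []
    for polylines in elements_by_sensor.values():
        for polyline in polylines:
            for p in polyline:
                if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 > gap_tolerance ** 2
                       for q in fused):
                    fused.append(p)
    return fused


# Example: a camera sees the lower part of a pedestrian and a LiDAR sees the
# upper part; the fused outline covers both parts.
fused_pedestrian = join_elements({
    "camera": [[(0.0, 0.0), (0.0, 0.9)]],   # feet and legs
    "lidar": [[(0.0, 0.9), (0.0, 1.8)]],    # torso and head
})
print(fused_pedestrian)
```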
The fusion engine may classify the fused object(s) to respective predefined and labeled objects according to predefined classification data and labels. The fusion engine may output the classification data (e.g. label) optionally coupled with additional data relating to the respective fused object(s), for example, spatial information, temporal information, relational information descriptive of relation (spatial relation, temporal relation, etc.) of the fused object(s) with one or more other objects in the scene and/or the like. The fusion engine may further calculate a probability score for one or more of the fused objects to indicate a probability level that the respective fused object is accurately classified as the respective labeled object.
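The classification against predefined labeled objects, the match threshold and the probability score may be illustrated by the following sketch, in which the match level is naively taken as the fraction of a labeled object's expected primitive elements that were actually detected. The object catalogue and the threshold value are assumptions of this sketch, not the described classification method itself.

```python
from typing import Dict, Optional, Tuple

# Assumed catalogue of predefined labeled objects, each described by the set
# of primitive elements that typically builds it (names are illustrative).
LABELED_OBJECTS: Dict[str, set] = {
    "arrow_traffic_sign": {"circular_shape", "arrow_marking", "vertical_line"},
    "pedestrian": {"curve", "circular_shape"},
    "pedestrian_crosswalk": {"horizontal_2d_rectangle", "road_edge"},
}


def classify(fused_elements: set,
             match_threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (label, probability score) for a fused object.

    The match level is sketched here as the fraction of a labeled object's
    elements that were actually detected; if no label reaches the (assumed)
    match threshold, the fused object is left unclassified.
    """
    best_label, best_score = None, 0.0
    for label, required in LABELED_OBJECTS.items():
        score = len(required & fused_elements) / len(required)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= match_threshold:
        return best_label, best_score
    return None, best_score


# Example: two out of three elements of an arrow traffic sign were detected.
print(classify({"circular_shape", "arrow_marking"}))   # ('arrow_traffic_sign', ~0.67)
```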
According to some embodiments of the present invention, one or more failures and/or malfunctions of one or more of the sensors may be automatically detected based on analysis of sensory data, specifically analysis of predefined primitive element(s) detected by analyzing the sensory dataset received from one or more other sensors, in particular other sensor(s) employing different sensing technology(s) and adapted to capture different radiation type(s).
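One possible way to realize such cross-sensor failure detection is sketched below: a sensor that misses most of the primitive elements corroborated by the other sensors is flagged as possibly failed or blocked. The miss-ratio threshold and the example detections are illustrative assumptions rather than part of the described embodiments.

```python
from typing import Dict, Set


def suspect_failed_sensors(detections: Dict[str, Set[str]],
                           expected: Set[str],
                           miss_ratio_threshold: float = 0.8) -> Set[str]:
    """Flag sensors that miss most of the primitive elements which, according
    to the other sensors, should be visible in the commonly monitored scene.

    detections maps a sensor identifier to the primitive elements it reported;
    expected is the set of elements corroborated by the other sensors; the
    miss-ratio threshold is an assumed tuning value.
    """
    suspects = set()
    for sensor, found in detections.items():
        should_see = expected - found
        miss_ratio = len(should_see) / max(len(expected), 1)
        if miss_ratio >= miss_ratio_threshold:
            suspects.add(sensor)
    return suspects


# Example: the camera corroborates a traffic-light outline and a pole, but the
# LiDAR reports nothing at all -> the LiDAR is flagged as possibly failed or blocked.
print(suspect_failed_sensors(
    detections={"camera": {"traffic_light_outline", "vertical_line"},
                "lidar": set()},
    expected={"traffic_light_outline", "vertical_line"},
))
```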
According to some embodiments of the present invention, one or more of the sensors may be automatically calibrated according to sensory data, specifically according to predefined primitive element(s) detected by analyzing the sensory dataset(s) of one or more other sensors, in particular other sensor(s) employing different sensing technology(s) and adapted to capture different radiation type(s).
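The calibration according to a majority decision may be illustrated by the following sketch, in which the per-axis median of the positions reported for a commonly detected primitive element (e.g. a street corner) serves as the majority-agreed reference and each sensor receives a compensating offset. The sensor names and coordinates are hypothetical values chosen only for this sketch.

```python
from statistics import median
from typing import Dict, Tuple

Point = Tuple[float, float]


def calibration_offsets(corner_per_sensor: Dict[str, Point]) -> Dict[str, Point]:
    """Estimate a per-sensor correction from a commonly detected primitive
    element (e.g. a street corner), using the per-axis median of all reports
    as the majority-agreed reference position.
    """
    xs = [p[0] for p in corner_per_sensor.values()]
    ys = [p[1] for p in corner_per_sensor.values()]
    reference = (median(xs), median(ys))
    return {sensor: (reference[0] - p[0], reference[1] - p[1])
            for sensor, p in corner_per_sensor.items()}


# Example: three sensors report the same street corner; the camera deviates
# from the majority position and receives a compensating offset.
print(calibration_offsets({
    "camera": (12.4, 3.1),
    "lidar": (12.0, 3.0),
    "radar": (12.0, 3.0),
}))
```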
Detecting the fused objects in the scene may present significant advantages and benefits compared to currently existing methods and systems for detecting objects in a scene. The existing methods and systems may typically apply object detection processes to each of the sensory datasets separately. This means that the sensory dataset received from each technology type of sensor may be processed independently. For example, imaging sensory data, LiDAR sensory data, RADAR sensory data and/or SONAR sensory data may be each processed independently in a separate processing pipeline to detect higher level objects in the scene. This may present major limitations. First, due to limitations in each of the sensing and capturing technologies, each of the sensor types may exhibit inherent capturing limitations. For example, imaging sensors may present difficulties in differentiating between objects having substantially similar visible characteristics, for example, color, texture and/or the like. In another example, LiDAR, RADAR and/or SONAR sensors may present difficulties in differentiating between objects located at substantially similar distance from the sensor(s). Moreover, since each of the sensory datasets is processed independently to detect the higher level objects, the correlation between the detected higher level objects may be done only after all (or at least a majority) of the different technology datasets are processed. This may cause a dependency on the slowest processing pipeline which may stall, halt and/or delay the higher speed processing pipelines.
The fusion approach on the other hand easily overcomes these limitations. The sensory datasets received from the plurality of different technology sensors are processed at a low level to first detect the low level predefined primitive elements. Since each sensing and capturing technology presents advantages and limitations compared to each other, detecting the predefined primitive elements in each of the different technology sensory datasets may allow for significantly more accurate, elaborate and/or complete detection of the predefined primitive elements compared to processing each of the different technology sensory datasets separately. Therefore, the fused object(s) created by complementing the potential object(s) through joining (fusing) together the predefined primitive elements detected by the different technology sensors and associated with common potential object(s) may significantly improve the accuracy, comprehensiveness and/or completeness of the fused object(s). Furthermore, detecting the (fused) object(s) based on at least partial detection by the different technology sensors may provide significant redundancy and overlap between sensory datasets thus significantly increasing confidence in the detection. True positive and/or true negative detections may therefore be significantly increased while false positive and/or false negative detections may be significantly reduced thus further increasing the accuracy of the objects detection.
Moreover, low level processing of the sensory datasets may prevent bottlenecks in the processing sequence since there is no need to wait for each of multiple processing pipelines to fully complete its processing cycle as may be done by the existing methods and systems. This may significantly improve resource utilization, for example, processing resources, storage resources, networking resources and/or the like since the waiting time may be significantly reduced compared to the existing methods. Reducing and potentially preventing the waiting time all together may significantly increase the processing speed which may be essential and/or crucial for real-time object detection applications, for example, autonomous vehicles, safety systems and/or the like. Also, detecting the low level predefined primitive elements which are simple elements may require significantly reduced computing resources compared to the existing methods which may be adapted to directly detect high level objects.
Furthermore, since the fusion engine detects the primitive elements in the sensory datasets based on comparison to primitive elements predefined and stored in the primitive elements dataset, the fusion engine may be significantly more immune to machine learning adversarial attacks compared to the existing methods and systems which typically employ machine learning (e.g. deep learning) for detecting the objects in the scene. Machine learning adversarial attacks relate to inputs (training samples) to machine learning models which are intentionally designed by an adversary to cause the machine learning model to make a mistake.
In addition, the fusion engine may be deployed in conjunction with one or more of the existing methods for redundancy purposes and/or for verification of the detected objects by comparing objects detected (or not detected) by a plurality of independent object detection systems, i.e. the fusion engine and one or more of the existing systems. Moreover, the probability score calculated by the fusion engine for one or more of the fused objects may be further evaluated in conjunction with the detection of the existing methods to further refine the detection accuracy.
Aligning the predefined primitive elements detected in the sensory datasets received from the different technology sensors may significantly improve the accuracy of correlating the predefined primitive elements detected in the different technology sensory datasets with each other.
Automatically detecting failure(s) in one or more of the sensors according to sensory data received from other sensor(s), specifically other sensor(s) adapted to capture different radiation type(s) may significantly improve integrity, reliability and/or robustness of the objects detection system since such failed sensors may be easily detected in real-time using unbiased sensory data received from the different technology(s) sensor(s).
Automatically calibrating one or more of the sensors according to sensory data received from other sensor(s), specifically other sensor(s) adapted to capture different radiation type(s), may further improve integrity, reliability and/or robustness of the objects detection system since the sensors may be calibrated automatically, dynamically and/or in real-time.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Computer program code comprising computer readable program instructions embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). The program code can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Referring now to the drawings, FIG. 1 illustrates a flowchart of an exemplary process of detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention. An exemplary process 100 may be executed by a fusion engine to detect one or more objects in a scene, in particular a scene monitored by a plurality of sensors of a vehicle. The fusion engine may join together (fuse) low level predefined primitive elements detected by analyzing sensory datasets received from at least some of the plurality of sensors, which employ different sensing and/or capturing technologies, for example, imaging, LiDAR, RADAR, SONAR and/or the like, and are hence adapted to capture different radiation types, for example, visible light, infrared light, laser light, RF waves, ultrasonic waves and/or the like.
Reference is also made to FIG. 2, which is a schematic illustration of an exemplary system for detecting objects based on fusion of sensory datasets received from a plurality of sensors adapted to capture different radiation types, according to some embodiments of the present invention. An exemplary fusion system 200 may include a network interface 202 for connecting to a network 230, an Input/Output (I/O) interface 204 for connecting to a plurality of sensors 220, a processor(s) 206 for executing a process such as the process 100 and a storage 208 for storing program code and/or data. According to some embodiments of the present invention the fusion system 200 may be installed, mounted, integrated and/or embedded in the vehicle.
The network interface 202 may include one or more network interfaces providing the fusion system 200 connectivity to the network 230 which may include one or more networks, specifically wireless networks, for example, a Radio Frequency (RF) link, a cellular network, a Wireless Local Area Network (WLAN) and/or the like. Through the network interface 202, the fusion system 200 may connect to one or more remote network resources, for example, a remote server 240, a cloud service 250 and/or the like.
The I/O interface 204 may include one or more wired and/or wireless interfaces providing the fusion system 200 connectivity to the sensors 220, for example, an RF communication channel, a WLAN communication channel, a Controller Area Network (CAN) interface, a Universal Serial Bus (USB) interface, a serial interface, a single wire interface and/or the like. Through the I/O interface 204, the fusion system 200 may receive sensory datasets from the sensors 220. Optionally, the fusion system 200 receives the sensory dataset (in its raw format) from one or more devices, systems and/or applications, for example, Autosar and/or the like, adapted to collect the sensory datasets from one or more of the sensors 220 and distribute the collected sensory dataset(s) to other devices, for example, the fusion system 200.
The processor(s) 206, homogeneous or heterogeneous, may include one or more processing nodes arranged for parallel processing, as clusters and/or as one or more multi-core processor(s).
The storage 208 may include one or more non-transitory memory devices, for example, persistent non-volatile devices such as a Read Only Memory (ROM), a Flash array, a hard drive, a Solid State Drive (SSD) and/or the like. The storage 208 may also include one or more volatile devices, for example, a Random Access Memory (RAM) device, a cache memory and/or the like.
The processor(s) 206 may execute one or more software modules, for example, a process, a script, an application, an agent, a utility, a tool and/or the like each comprising a plurality of program instructions stored in a non-transitory medium such as the storage 208 and executed by one or more processors such as the processor(s) 206. For example, the processor(s) 206 may execute a fusion engine 210 for executing the process 100 to analyze the sensory datasets captured by the sensors 220 monitoring the scene to detect one or more objects, in particular to fuse together low level primitive elements detected by different sensors 220 to improve an accuracy of the detection. The primitive elements may be predefined and characterized in advance and stored in a primitive elements dataset 212 in the storage 208, for example, a record, a file, a database and/or the like constituting a collection, a library and/or the like of predefined primitive elements.
The sensors 220 may typically be installed, mounted, integrated and/or embedded in the vehicle. The sensors 220 may include a plurality of sensors employing at least some of a plurality of different sensing and/or capturing technologies, for example, imaging, LiDAR, RADAR, SONAR and/or the like. The sensors 220 are therefore adapted to monitor and capture different radiation types, for example, visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves and/or the like. For example, the sensors 220 may include one or more imaging sensors 220A adapted to capture visible light waves and/or infrared light waves, for example, a camera, a video camera, a stereoscopic camera, a night vision imaging sensor and/or the like. The imaging sensors 220A may further include one or more thermal imaging sensors adapted to capture thermal radiation. The imaging sensors 220A may also include one or more hyperspectral imaging sensors adapted to capture a wide spectrum of waves. The sensors 220 may include one or more LiDAR sensors 220B adapted to determine range, angle and/or velocity of objects in the scene by emitting pulsed laser light beams and intercepting returning laser light beams reflected and/or deflected from the objects in the scene. The sensors 220 may include one or more RADAR sensors 220C adapted to determine range, angle and/or velocity of objects in the scene by emitting RF waves and intercepting returning RF waves reflected and/or deflected from the objects in the scene. The sensors 220 may include one or more SONAR sensors 220D adapted to determine range, angle and/or velocity of objects in the scene by emitting pulsed ultrasonic waves and intercepting echoes of the emitted ultrasonic waves resulting from reflection and/or deflection from the objects in the scene.
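By way of illustration only, the mapping between the sensors 220 and their sensing technologies may be sketched as a simple configuration registry, for example in Python as below. The identifiers, field names and mounting descriptions are hypothetical and serve only to make the division by capturing technology concrete; they are not part of any actual implementation of the fusion system 200.

    from dataclasses import dataclass
    from enum import Enum, auto

    class Technology(Enum):
        IMAGING = auto()   # visible light, infrared, thermal, hyperspectral
        LIDAR = auto()     # pulsed laser light
        RADAR = auto()     # RF waves
        SONAR = auto()     # ultrasonic waves

    @dataclass
    class SensorInfo:
        sensor_id: str          # hypothetical identifier, e.g. "cam_front"
        technology: Technology  # sensing/capturing technology of the sensor
        mounting: str           # e.g. "front", "right-side", "360"

    # Hypothetical registry of sensors monitoring a substantially common scene
    SENSORS = [
        SensorInfo("cam_front", Technology.IMAGING, "front"),
        SensorInfo("lidar_roof", Technology.LIDAR, "360"),
        SensorInfo("radar_front", Technology.RADAR, "front"),
        SensorInfo("sonar_rear", Technology.SONAR, "rear"),
    ]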
As shown at 102, the process 100 starts with the fusion engine 210 receiving sensory datasets from a plurality of the sensors 220. The sensory dataset received from each of the sensors 220 may naturally depend on the type of the respective sensor technology employed by the respective sensor 220. For example, the sensory dataset received from the imaging sensor(s) 220A may include one or more images, heat maps, spectral maps and/or the like of the depicted scene. In another example, the sensory datasets received from the LiDAR sensor(s) 220B, the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D may include one or more images, maps and/or presentations mapping distances to objects in the scene. The sensory datasets may include 2-Dimension (2D) data, 3-Dimension (3D) data, depth data, monochrome data, color data, reflection data and/or the like.
In particular the fusion engine 210 receives sensory datasets from sensors 220 which employ different sensing technologies and adapted to monitor and capture different radiation types. For example, the fusion engine 210 may receive sensory dataset(s) captured by one or more of the imaging sensor(s) 220A and sensory dataset(s) captured by one or more of the LiDAR sensor(s) 220B. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A and sensory datasets captured by one or more of the RADAR sensor(s) 220C. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A and sensory datasets captured by one or more of the SONAR sensor(s) 220D. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the RADAR sensor(s) 220C. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the SONAR sensor(s) 220D. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the RADAR sensor(s) 220C and sensory datasets captured by one or more of the SONAR sensor(s) 220D. In another example, the fusion engine 210 may receive sensory datasets captured by one or more of the imaging sensor(s) 220A, sensory datasets captured by one or more of the LiDAR sensor(s) 220B and sensory datasets captured by one or more of the RADAR sensor(s) 220C.
Moreover, the sensory datasets received by the fusion engine 210 from the plurality of sensors 220 relate to a substantially common scene as viewed from the vehicle. This may naturally depend on the installation locations of the sensors 220 in the vehicle. The sensors 220 may be installed and adapted to monitor substantially common scenes despite the fact that the exact positioning and/or view angle of each of the sensors 220 may vary with respect to each other. For example, the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a front view of the vehicle and sensory datasets captured by one or more LiDAR sensors 220B installed to monitor substantially the same front view of the vehicle. In another example, the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a front view of the vehicle and sensory datasets captured by one or more LiDAR sensors 220B installed to monitor a 360 degrees view of the vehicle. In another example, the fusion engine 210 may receive sensory datasets captured by one or more imaging sensors 220A installed to monitor a right side view of the vehicle and sensory datasets captured by one or more RADAR sensors 220C installed to monitor substantially the same right side view of the vehicle. In another example, the fusion engine 210 may receive sensory datasets captured by one or more LiDAR sensors 220B installed to monitor a back side view of the vehicle and sensory datasets captured by one or more SONAR sensors 220D installed to monitor substantially the same back side view of the vehicle.
The sensory datasets received by the fusion engine 210 may typically include a timing indication, for example, a time tag and/or the like indicating the timing of interception, i.e. the time of capture by the respective sensor 220. The timing indication may be an absolute value with respect to a real time clock and/or a relative timing calculated with respect to some reference time, for example, a counter, a timer and/or the like.
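A sensory dataset carrying such a timing indication may be represented, purely as an illustrative sketch, by a small container such as the following. The class and field names (SensoryDataset, time_tag, etc.) and the array shapes are hypothetical assumptions, not the raw formats actually produced by the sensors 220.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SensoryDataset:
        sensor_id: str      # which sensor 220 produced the data
        technology: str     # "imaging", "lidar", "radar" or "sonar"
        time_tag: float     # capture time in seconds (absolute or relative)
        data: np.ndarray    # image, range map, point cloud, etc. in raw form

    # Example: an image frame and a LiDAR range map captured about 20 ms apart
    frame = SensoryDataset("cam_front", "imaging", 12.500,
                           np.zeros((480, 640, 3), dtype=np.uint8))
    sweep = SensoryDataset("lidar_roof", "lidar", 12.520,
                           np.zeros((64, 1024), dtype=np.float32))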
As shown at 104, the fusion engine 210 may analyze the sensory datasets received from the plurality of sensors 220 to detect one or more predefined primitive elements depicted by the respective sensors 220. The fusion engine 210 may apply one or more analysis methods, techniques, tools and/or algorithms for analyzing the sensory datasets. For example, the fusion engine 210 may apply one or more image processing techniques such as, for example, color similarity, edge detection, edges grouping and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the imaging sensor(s) 220A. In another example, the fusion engine 210 may apply one or more signal processing techniques such as, for example, depth analysis and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the LiDAR sensor(s) 220B. In another example, the fusion engine 210 may apply one or more signal processing techniques such as, for example, blobs analysis and differentiation and/or the like to detect one or more of the predefined primitive elements in the sensory dataset(s) received from the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D.
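As a non-limiting sketch of the edge detection mentioned above for the imaging sensory dataset(s), the following Python snippet uses standard OpenCV calls (Canny edge detection followed by contour extraction). The function name, thresholds and minimum arc length are illustrative assumptions and do not represent the actual analysis performed by the fusion engine 210.

    import cv2
    import numpy as np

    def detect_edge_primitives(image_bgr, low=50, high=150):
        # Detect candidate edge/curve primitives in an image frame and return
        # a list of contours (point sequences) that may later be compared
        # against the predefined primitive elements. Thresholds are illustrative.
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, low, high)
        # OpenCV 4.x returns (contours, hierarchy)
        contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
        # Discard very short fragments unlikely to match any primitive element
        return [c for c in contours if cv2.arcLength(c, False) > 20.0]

    # Usage on a synthetic frame (in practice the frame comes from an imaging sensor 220A)
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    cv2.rectangle(frame, (100, 100), (200, 300), (255, 255, 255), 2)
    primitives = detect_edge_primitives(frame)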
The fusion engine 210 may apply one or more comparison metrics, for example, a shape similarity, a size similarity, a color similarity, a texture similarity and/or the like to compare the detected predefined primitive element(s) to the predefined primitive elements in the primitive elements dataset, search for a match and label the detected predefined primitive element(s) according to the match. For example, the fusion engine 210 may compare one or more edges and/or curves detected by analyzing image(s) received from one or more of the imaging sensors 220A to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212. In another example, the fusion engine 210 may compare an outline and/or part thereof detected by analyzing growth and detecting edges of cloud point(s) received from one or more of the LiDAR sensors 220B to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212. In another example, the fusion engine 210 may compare an outline and/or part thereof detected by analyzing RADAR blobs received from one or more of the RADAR sensors 220C to an outline of one or more of the predefined primitive elements in the primitive elements dataset 212. The fusion engine 210 may further apply one or more machine learning, for example, deep learning methods, algorithms and/or models, for example, a neural network, a support vector machine (SVM), a decision tree learning algorithm, a K-Nearest neighbors algorithm and/or any other learning algorithm as known in the art trained to detect the predefined primitive elements in the sensory datasets and label them according to matching predefined primitive elements in the primitive elements dataset.
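One possible instance of such a shape-similarity comparison is sketched below using the Hu-moment based cv2.matchShapes metric, which is scale and rotation tolerant. The structure of the primitive elements dataset (a simple label-to-outline dictionary named PRIMITIVES), the helper names and the cutoff value are hypothetical and intended only to illustrate the idea of matching a detected outline against predefined, labeled primitives.

    import cv2
    import numpy as np

    def make_circle_outline(radius=50, points=64):
        ang = np.linspace(0.0, 2.0 * np.pi, points, endpoint=False)
        pts = np.stack([radius * np.cos(ang), radius * np.sin(ang)], axis=1)
        return pts.astype(np.float32).reshape(-1, 1, 2)

    # Hypothetical stand-in for the primitive elements dataset 212:
    # label -> reference outline (contour)
    PRIMITIVES = {
        "circle": make_circle_outline(),
        "rectangle": np.array([[0, 0], [100, 0], [100, 40], [0, 40]],
                              dtype=np.float32).reshape(-1, 1, 2),
    }

    def label_primitive(contour, max_distance=0.2):
        # cv2.matchShapes returns a Hu-moment based dissimilarity (0 = identical).
        # max_distance is an illustrative cutoff, not a tuned value.
        best_label, best_score = None, float("inf")
        for label, reference in PRIMITIVES.items():
            score = cv2.matchShapes(contour, reference, cv2.CONTOURS_MATCH_I1, 0.0)
            if score < best_score:
                best_label, best_score = label, score
        return best_label if best_score <= max_distance else None

    print(label_primitive(make_circle_outline(radius=80)))  # expected: "circle"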
The predefined primitive elements may include, for example, an edge, a curve, a line, a corner, a surface, a shape (either 2D shapes such as triangle, rectangle, square, circle, ellipse, arbitrary closed shape, etc. and/or 3D shapes such as ball, box, cone, etc.), a texture, a color and/or the like. The predefined primitive elements may further include partial shapes, for example, a part of a circle, part of a rectangle, part of a box, part of a ball and/or the like. The predefined primitive elements may typically be part of and/or associated with one or more higher level objects, for example, a vehicle (e.g. car, truck, motorcycle, bicycle, etc.), a structure (e.g. a building, a tree, etc.), a transportation infrastructure element (e.g. traffic light, traffic sign, light columns, road edges, road markings, sidewalk, traffic circle edges, etc.) and/or the like. The higher level objects may further be hierarchically constructed such that an object of a higher level is constructed of a plurality of lower level objects. For example, a car wheel may be defined as a higher level object of a certain level while a car object comprising at least one wheel object may be defined as a higher level object of a level higher than the level of the wheel object. As such, the predefined primitive elements may be regarded as building blocks for constructing, detecting and/or identifying the higher level objects. The predefined primitive elements may be 2D primitive elements and/or 3D primitive elements which are labeled distinctively accordingly. For example, a circular 2D shape may be labeled as a circle while a circular 3D shape may be labeled as a ball. Similarly, a rectangular 2D shape may be labeled as a rectangle while a rectangular 3D shape may be labeled as a box, a pyramid and/or the like.
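The hierarchical relation between primitive elements and higher level objects (e.g. circle, wheel, car) may be illustrated by the following minimal sketch. The class and field names are hypothetical assumptions chosen only to make the wheel/car hierarchy described above concrete.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Primitive:
        label: str       # e.g. "circle", "vertical_line", "rectangle"
        dims: int = 2    # 2 for 2D primitives, 3 for 3D primitives (e.g. "ball", "box")

    @dataclass
    class HigherLevelObject:
        label: str
        level: int                                  # position in the object hierarchy
        primitives: List[Primitive] = field(default_factory=list)
        parts: List["HigherLevelObject"] = field(default_factory=list)  # lower level objects

    # A wheel is a lower level object built from a circle primitive; a car is a
    # higher level object built from wheels plus outline curves and rectangles.
    wheel = HigherLevelObject("wheel", level=1, primitives=[Primitive("circle")])
    car = HigherLevelObject("car", level=2,
                            primitives=[Primitive("curve"), Primitive("rectangle")],
                            parts=[wheel, wheel])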
The predefined primitive elements, which may be defined during a learning phase, may be locally stored by the fusion system 200, for example, in the primitive elements dataset 212. The learning phase may be conducted by the fusion engine 210 and/or by one or more other software modules, tools, applications and/or the like adapted to conduct and/or control at least part of the learning phase.
During the learning phase one or more methods, techniques and/or tools may be applied for defining the predefined primitive elements and labeling them accordingly. Specifically the predefined primitive elements may be defined by analyzing a plurality of training sensory datasets such as the sensory datasets received from the sensors 220 to detect, isolate, identify and label a plurality of primitive elements. The training sensory datasets may include actual sensory datasets captured by one or more sensors such as the sensors 220. However, the training sensory datasets may further include sensory datasets received from other sources, for example, synthetically constructed sensory datasets created manually and/or automatically by one or more simulation systems and/or applications. Once defined, the labeled predefined primitive elements may be stored in the primitive elements dataset 212.
One or more of the primitive elements may be manually predefined by defining one or more higher level objects and constructing the higher level object(s) from respective predefined primitive elements optionally having predefined relations with other predefined primitive elements of the respective higher level object. For example, a light (lamp) column may be defined by two thin rectangle primitive elements substantially perpendicular to each other (e.g. 80-120 degrees between them) where the first rectangle is substantially vertical extending upwards from ground level to a height of at least two meters.
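The light column example above may be expressed, as a purely illustrative sketch, by a relational rule over two rectangle primitives. The data layout, field names and the tolerance on the base height are assumptions introduced only for this example.

    from dataclasses import dataclass

    @dataclass
    class RectPrimitive:
        angle_deg: float       # orientation relative to the ground plane (90 = vertical)
        length_m: float        # extent of the rectangle in meters
        base_height_m: float   # height of the rectangle's lower end above ground level

    def is_light_column(rect_a, rect_b):
        # Hypothetical relational rule for the light-column example: a substantially
        # vertical rectangle at least 2 m tall, extending upwards from ground level,
        # joined to a second rectangle at an 80-120 degree angle.
        vertical, arm = (rect_a, rect_b) if rect_a.angle_deg > rect_b.angle_deg else (rect_b, rect_a)
        corner_angle = abs(vertical.angle_deg - arm.angle_deg)
        return (80.0 <= corner_angle <= 120.0
                and vertical.base_height_m <= 0.1
                and vertical.length_m >= 2.0)

    pole = RectPrimitive(angle_deg=90.0, length_m=3.0, base_height_m=0.0)
    arm = RectPrimitive(angle_deg=0.0, length_m=1.0, base_height_m=3.0)
    print(is_light_column(pole, arm))  # True under these illustrative values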
Moreover, one or more of the primitive elements may be predefined by applying an automated rule based analysis to detect one or more of the primitive elements according to a set of rules defining the type, characteristics, attributes, relations and/or the like of the primitive element(s). For example, an automated analysis may be applied to identify higher level object(s) in the training sensory datasets followed by a segmentation process to construct the higher level object(s) from a plurality of predefined primitive elements, for example, an edge, a curve, a shape and/or the like. Furthermore, one or more of the primitive elements may be predefined by applying one or more machine learning algorithms, for example, deep learning algorithms to the training sensory datasets for detecting marked (labeled) higher level object(s), extracting primitive elements (features) of the higher level object(s) and clustering, classifying and/or labeling the extracted primitive elements accordingly.
In addition, one or more of the predefined primitive elements may be predefined using inheritance from already defined higher level object(s). For example, assume a certain light column is already defined with respective predefined primitive elements. One or more other light columns may have substantially the same shape but with some different dimensions, for example, height, distance between the structural rectangles and/or the like. In such a case the other light column(s) may be defined using similar and/or slightly adjusted predefined primitive elements, optionally with adjusted relations with other predefined primitive elements of the respective light column.
The training sensory datasets may be selected, designed, constructed and/or configured to provide a diverse view of the higher level objects of interest in order to adapt the predefined primitive elements to a plurality of view conditions, sensory data capture conditions, operation parameters of the sensor(s) 220 and/or the like. For example, the training sensory datasets may include sensory data relating to one or more of the higher level objects captured by the sensors 220 at different distances, at different elevations, from different view angles and/or the like. In another example, the training sensory datasets may include sensory data relating to one or more of the higher level objects which may be partially visible to the sensors 220, for example, obscured by other objects, partially visible due to weather conditions and/or the like. In another example, the training sensory datasets may include sensory data captured by the sensors 220 having various operation parameters, for example, different resolutions, different fields of view, different dynamic ranges, different zoom-in values, different cloud point distributions (specifically with respect to the LiDAR sensor(s) 220B, the RADAR sensor(s) 220C and/or the SONAR sensor(s) 220D), different distortion levels and/or the like.
Moreover, the training sensory datasets may be selected, constructed and/or configured to include sensory data captured by the sensors 220 depicting different aspects of the higher level object(s), for example, different objects of the same type, different views of the higher level object(s) and/or the like in order to adapt the predefined primitive elements accordingly. For example, the training sensory datasets may include sensory data captured by the sensors 220 depicting a plurality of car models, a plurality of truck models, a plurality of pedestrians, a plurality of traffic light structures, a plurality of road markings outlines and/or the like. In another example, the training sensory datasets may include sensory data captured by the sensors 220 depicting different views of the higher level object(s), for example, a front view, a back view, a side view, a top view and/or the like. In another example, the training sensory datasets may include sensory data captured for the higher level object(s) which may be only partially visible to the sensors 220, for example, obscured by other object(s), partially visible due to weather conditions and/or the like.
During the learning phase each of the predefined primitive elements may be associated with one or more of the higher level objects which are also classified and labeled. The association between the higher level objects and their predefined primitive elements building block may be defined in the primitive elements dataset. For example, circle shapes may be associated with one or more traffic signs. In another example, a substantially horizontal edge line may be associated with a road outline, a sidewalk outline, a building outline and/or the like. In another example, a substantially vertical edge line may be associated with a traffic sign column outline, a building outline and/or the like.
Moreover, multiple predefined primitive elements may be correlated with one or more respective higher level objects according to one or more relational conditions. For example, two vertical lines which are within a predefined distance (which may be adjusted according to the distance of the lines from the sensor(s) 220) from each other may be associated with the same traffic sign pole, the same light column and/or the like. In another example, a circular shape and a detected vertical line attached to the circular shape may be associated with the same traffic sign pole, the same light column and/or the like. In another example, a plurality of horizontal 2D rectangles located spaced from each other in a predefined distance may be associated with a pedestrian crosswalk (zebra crossing) road marking. In another example, a plurality of curves having certain predefined positions and/or locations with respect to each other may define one or more outlines associated with one or more vehicles (e.g. car, truck, motorcycle, bicycle, police car, ambulance, etc.), one or more pedestrians (e.g. man, woman, child, group of people, etc.) and/or the like.
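The first relational condition above (two vertical lines belonging to the same pole) may be sketched as follows. For simplicity the threshold is fixed in a common metric coordinate system rather than adjusted with the distance from the sensor(s); the function names, coordinates and the 0.3 m value are illustrative assumptions.

    import math

    def lateral_separation_m(line_a_xy, line_b_xy):
        # Ground-plane distance between two vertical lines, each given by its
        # (x, y) position in the common vehicle coordinate system (meters).
        return math.hypot(line_a_xy[0] - line_b_xy[0], line_a_xy[1] - line_b_xy[1])

    def belong_to_same_pole(line_a_xy, line_b_xy, max_gap_m=0.3):
        # Hypothetical relational condition: two substantially vertical lines
        # within max_gap_m of each other are correlated with the same potential
        # traffic sign pole or light column.
        return lateral_separation_m(line_a_xy, line_b_xy) <= max_gap_m

    print(belong_to_same_pole((10.0, 2.1), (10.0, 2.3)))   # True  - same pole
    print(belong_to_same_pole((10.0, 2.1), (14.0, -1.0)))  # False - separate objects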
Analyzing the sensory datasets received from the sensors 220, the fusion engine 210 may detect one or more of the plurality of predefined primitive elements. Due to the different technologies applied by the different types of the sensors 220, the fusion engine 210 may detect one or more predefined primitive elements in one sensory dataset while detecting other predefined primitive element(s) in one or more other sensory datasets. For example, one or more predefined primitive elements may be detected in the sensory dataset(s) received from one or more of the imaging sensors 220A while detecting other predefined primitive element(s) in the sensory dataset(s) received from one or more of the LiDAR sensors 220B.
Reference is now made to FIG. 3A and FIG. 3B, which are exemplary image captures presenting detection of predefined primitive elements in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention. FIG. 3A and FIG. 3B present two image captures 302 and 304, where 302 is an image capture (sensory data) received from an imaging sensor such as the imaging sensor 220A and 304 is an image capture (sensory data) received from a LiDAR sensor such as the LiDAR sensor 220B. Both the imaging sensor 220A and the LiDAR sensor 220B depict the exact same scene. The two sensory datasets 302 and 304 may be processed by a fusion engine such as the fusion engine 210.
As seen in FIG. 3A, which compares the two sensory datasets (images) 302 and 304, the fusion engine 210 may detect a plurality of predefined primitive elements in the sensory dataset 302 received from the imaging sensor 220A while failing to detect these predefined primitive elements in the sensory dataset 304 received from the LiDAR sensor 220B. Specifically, by analyzing the sensory dataset 302, the fusion engine 210 may detect the following predefined primitive elements (marked red): traffic sign (1), arrow marking on traffic sign (2), car specific outline (3), road markings (4), bicycle wheel (5), window (6) and feet of a pedestrian crossing a street (7). However, when analyzing the sensory dataset 304, the fusion engine 210 may fail to detect the predefined primitive elements detected in the sensory dataset 302. This is due to the different technology of the LiDAR sensor 220B which is based on mapping distance to objects in the scene. For example, the fusion engine 210 may fail to detect the car specific outline (3), the window (6) and the feet of the pedestrian (7) since they are within significantly the same distance as other objects in their background and the LiDAR technology may therefore fail to distinguish them from the background. In another example, the fusion engine 210 may fail to detect the arrow marking on traffic sign (2) and the road markings (4) since the LiDAR technology may be limited in detecting markings which present no distance difference with respect to the surface on which the markings are made. In another example, the fusion engine 210 may fail to detect the traffic sign (1) as it blends with surrounding objects making the traffic sign (1) indistinguishable from the surrounding objects. In another example, the fusion engine 210 may fail to detect the bicycle wheel (5) as it may be too thin to reflect the laser light pulses (beams) and may therefore not be detected by the LiDAR sensor 220B.
As seen in FIG. 3B, which compares the two sensory datasets (images) 302 and 304, the fusion engine 210 may detect a plurality of predefined primitive elements in the sensory dataset 304 received from the LiDAR sensor 220B while failing to detect these predefined primitive elements in the sensory dataset 302 received from the imaging sensor 220A. Specifically, by analyzing the sensory dataset 304, the fusion engine 210 may detect the following predefined primitive elements (marked with white outline): traffic sign (A), traffic sign (B), traffic sign pole (C) and a head of a pedestrian crossing a street (D). However, when analyzing the sensory dataset 302, the fusion engine 210 may fail to detect the predefined primitive elements detected in the sensory dataset 304. This is due to the different technology of the imaging sensor 220A which may fail to distinguish between objects due to color similarity, pattern similarity, partial obscurity, blending and/or the like. For example, the fusion engine 210 may fail to detect the traffic sign (A), the traffic sign (B) and the traffic sign pole (C) which may blend with surrounding objects making the traffic sign (A), the traffic sign (B) and the traffic sign pole (C) indistinguishable from their surrounding objects. In another example, the fusion engine 210 may fail to detect the head of the pedestrian (D) which has a substantially similar color as the background, thus making the head of the pedestrian (D) indistinguishable from the background.
Reference is made once again to FIG. 1.
As shown at 106, the fusion engine 210 may associate detected predefined primitive elements with one or more potential (higher level) objects which may potentially be present in the scene monitored by the sensors 220. The fusion engine may check which higher level objects are associated in the primitive elements dataset with the detected predefined primitive elements and may associate the detected predefined primitive elements with one or more potential objects accordingly. For example, assuming the fusion engine 210 detects a circular shape (predefined primitive element), the fusion engine 210 may estimate the potential object is, for example, a traffic sign, a head of a person, a wheel of a car, a light housing of a traffic light and/or the like since these objects may be associated with the circular shape in the primitive elements dataset. In another example, assuming the fusion engine 210 detects a substantially vertical line (predefined primitive element) extending upwards, the fusion engine 210 may estimate the potential object is, for example, a traffic sign pole, a traffic light pole, a building edge, a vehicle edge and/or the like since these objects may be associated with the vertical line in the primitive elements dataset. In another example, assuming the fusion engine 210 detects a substantially horizontal line (predefined primitive element) extending forward, sideways, a combination thereof and/or the like, the fusion engine 210 may estimate the potential object is, for example, an edge of a road, an edge of a sidewalk and/or the like since these objects may be associated with the horizontal line in the primitive elements dataset.
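This lookup of potential objects for a detected primitive element may be sketched as a simple reverse association table; the table contents and function name below are illustrative assumptions standing in for the associations stored in the primitive elements dataset 212.

    # Hypothetical association table: detected primitive label -> higher level
    # objects it may belong to (a stand-in for the primitive elements dataset 212).
    ASSOCIATIONS = {
        "circle": ["traffic_sign", "pedestrian_head", "car_wheel", "traffic_light_housing"],
        "vertical_line": ["traffic_sign_pole", "traffic_light_pole", "building_edge", "vehicle_edge"],
        "horizontal_line": ["road_edge", "sidewalk_edge"],
    }

    def candidate_objects(detected_labels):
        # Collect all potential (higher level) objects associated with the
        # primitives detected so far; duplicates are kept only once.
        candidates = set()
        for label in detected_labels:
            candidates.update(ASSOCIATIONS.get(label, []))
        return sorted(candidates)

    print(candidate_objects(["circle", "vertical_line"]))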
Moreover, the fusion engine 210 may associate the detected predefined primitive elements with the respective potential object(s) according to a correlation between predefined primitive elements detected in different sensory datasets received from different sensors 220. The fusion engine 210 may first align the predefined primitive elements detected in different datasets received from different sensors 220 in order to identify a temporal and/or spatial correlation between the predefined primitive elements and thus accurately correlate between the predefined primitive elements.
The sensors 220 may not be positioned (i.e. mounted, installed, deployed, etc.), for example, in the vehicle, to capture the potential objects in the scene in exact spatial alignment with each other. For example, the sensors 220 may be positioned at different locations thus having different spatial capturing parameters, for example, view angle, elevation, distance and/or the like. The fusion engine 210 may be provided with the positioning information of the sensors 220, for example, the view angle, the elevation, the distance (depth) and/or the like with respect to the positioning of the vehicle and therefore with respect to the scene. The fusion engine 210 may therefore align the sensory datasets with each other such that the predefined primitive elements detected in the sensory datasets received from the sensors 220 are adjusted to compensate for the different spatial capturing parameters of their respective sensors 220 and are thus aligned with each other in space. As a result, the fusion engine 210 may correlate predefined primitive element(s) detected in one of the sensory datasets with predefined primitive element(s) detected in one or more other sensory datasets and optionally associate them with one or more potential objects.
The fusion engine 210 may align the sensory datasets according to a common reference. For example, assuming the mounting, installation and/or integration information of the sensors 220 (e.g. position, elevation, view angle, etc.) is available to the fusion engine 210, the fusion engine 210 may compensate and/or adjust the sensory data according to a common coordinate system, for example, a common coordinate system of the vehicle. In another example, the fusion engine 210 may align the sensory datasets according to one or more common objects, for example, a predefined primitive element, a higher level object and/or the like detected in at least some of the sensory datasets, which may be used to align the sensory datasets with respect to each other.
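A minimal sketch of the first option, transforming primitive coordinates from each sensor's own frame into a common vehicle coordinate system using the sensor's mounting pose, is shown below. The rotation matrices, translation vectors and point values are illustrative assumptions; real mounting information would come from the installation data mentioned above.

    import numpy as np

    def to_vehicle_frame(points_sensor, rotation, translation):
        # Transform Nx3 points from a sensor's own coordinate system into the
        # common vehicle coordinate system, given the sensor's mounting pose.
        return points_sensor @ rotation.T + translation

    # Hypothetical mounting poses: camera at the windshield, LiDAR on the roof.
    R_cam = np.eye(3)
    t_cam = np.array([1.5, 0.0, 1.2])     # meters forward / left / up of the vehicle origin
    R_lidar = np.eye(3)
    t_lidar = np.array([1.0, 0.0, 1.8])

    cam_primitive = np.array([[10.0, 2.0, 0.5]])     # a point on a primitive seen by the camera
    lidar_primitive = np.array([[10.5, 2.0, -0.1]])  # candidate counterpart from the LiDAR

    p1 = to_vehicle_frame(cam_primitive, R_cam, t_cam)
    p2 = to_vehicle_frame(lidar_primitive, R_lidar, t_lidar)
    # Primitives whose aligned positions nearly coincide may be correlated with
    # the same potential object.
    print(np.linalg.norm(p1 - p2))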
The fusion engine 210 may further use the timing information, for example, the time tag assigned to the sensory datasets received from one or more of the sensors 220, to correlate between predefined primitive element(s) detected at different times. For example, assume that by analyzing a certain sensory dataset received at time t from a certain one of the sensors 220, the fusion engine 210 detects a certain predefined primitive element. Further assume that by analyzing a certain sensory dataset received at time t' (t' later than t) from the same and/or another one of the sensors 220, the fusion engine 210 detects a predefined primitive element which is substantially similar to the certain predefined primitive element. In such a case the fusion engine 210 may correlate the certain predefined primitive element detected in the two sensory datasets and optionally associate them with a respective potential object.
Moreover, the fusion engine 210 may use sensory datasets received from one or more other sensors, for example, a Global Positioning System (GPS) sensor to correlate the predefined primitive element(s) detected at different times by using GPS information to calculate a location, a speed, an acceleration and/or the like of the vehicle and associate the predefined primitive element(s) in time and space accordingly.
In addition the sensory datasets received from the plurality of sensors 220 may not be synchronized in time. For example, sensory dataset may be received from a certain sensor 220 with a time shift with respect to the sensory dataset received from one or more other sensors 220. The time shift may lead to a difference in the detected predefined primitive elements in each of the sensory datasets, in particular in case the sensors 220 are installed in a moving vehicle since the timing difference may affect the spatial capturing parameters of the sensors 220. The fusion engine 210 may therefore align the sensory datasets with each other such that the predefined primitive elements detected in the sensory datasets received from the sensors 220 are adjusted to compensate for the different timing capturing parameters of their respective sensors 220 and thus aligned with each other in time. For example, the fusion engine 210 may use the timing indication such as, for example, the time tag assigned to the sensory datasets for identifying their timing of capture by the respective sensors 220. Based on the identified timing, the fusion engine 210 may adjust the sensory datasets accordingly and correlate the predefined primitive elements detected in the sensory datasets received from the sensors 220 with each other and optionally associate them with one or more respective potential objects.
The fusion engine 210 may further temporally align sensory datasets received from multiple sensors 220 having different sampling rates. For example, assuming a certain imaging sensor 220A provides sensory datasets (e.g. images) at a relatively high rate while a certain LiDAR sensor 220B provides sensory datasets depicting the same scene as the certain imaging sensor 220A at a significantly lower rate. The fusion engine 210 may adjust the sensory data of the certain imaging sensor 220A and/or the certain LiDAR sensor 220B to align in time the sensory data of the two sensors.
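One simple way to pair datasets captured at different rates is nearest-in-time matching over the time tags, sketched below. The sampling rates, the skew limit and the function name are illustrative assumptions; the actual temporal alignment may use interpolation or other compensation.

    import numpy as np

    def nearest_in_time(target_time, time_tags, max_skew_s=0.05):
        # Pick the index of the dataset whose time tag is closest to target_time,
        # or None if everything is further away than max_skew_s (illustrative limit).
        time_tags = np.asarray(time_tags)
        idx = int(np.argmin(np.abs(time_tags - target_time)))
        return idx if abs(time_tags[idx] - target_time) <= max_skew_s else None

    # A 30 Hz imaging sensor and a 10 Hz LiDAR sensor: match each LiDAR sweep
    # to the image frame captured closest to it in time.
    image_times = np.arange(0.0, 1.0, 1.0 / 30.0)
    lidar_times = np.arange(0.0, 1.0, 1.0 / 10.0)
    pairs = [(t, nearest_in_time(t, image_times)) for t in lidar_times]
    print(pairs[:3])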
The sensors 220 may further differ from each other in other capturing parameters, for example, resolution, distribution, distortion (e.g. scanning distortion, Parallax distortion, etc.), scaling and/or the like. The fusion engine 210 may adjust one or more of the sensory datasets received from the plurality of sensors 220 to compensate for different resolutions and/or distributions of the different sensors 220 and align the predefined primitive elements detected in the various sensory datasets. Based on the aligned sensory datasets the fusion engine 210 may correlate together one or more of the predefined primitive elements detected in different sensory datasets and further associate them with one or more respective potential objects. For example, sensory dataset received from a certain imaging sensor 220A may have a higher resolution than the sensory dataset received from a certain LiDAR sensor 220B. In such case the fusion engine 210 may adjust one or more predefined primitive elements detected in one or more of the sensory datasets received from the imaging sensor 220A and the LiDAR sensor 220B to compensate for the different resolution and correlate one or more predefined primitive elements detected in these sensory datasets with each other and optionally with one or more respective potential object. In another example, sensory dataset received from a certain imaging sensor 220A may include a matrix of pixels which are substantially evenly distributed while the sensory dataset received from a certain LiDAR sensor 220B may include a cloud point having an uneven distribution, an effect which may be inherent to the LiDAR capturing technology. In such case the fusion engine 210 may adjust one or more predefined primitive elements detected in one or more of the sensory datasets received from the imaging sensor 220A and the LiDAR sensor 220B to compensate for the different distribution to correlate one or more predefined primitive elements detected in these sensory datasets with each other and optionally associate them with one or more respective potential objects.
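The resolution compensation mentioned above may be as simple as proportionally rescaling primitive coordinates from one sensor's pixel grid to another's, as in the sketch below. The resolutions and coordinates are illustrative; compensating for uneven cloud point distributions or distortion would require additional, sensor-specific processing.

    import numpy as np

    def rescale_coordinates(points_px, src_resolution, dst_resolution):
        # Rescale 2D primitive coordinates detected at one sensor resolution to
        # the pixel grid of another sensor so the primitives can be overlaid.
        # Resolutions are (width, height); simple proportional scaling is used.
        src_w, src_h = src_resolution
        dst_w, dst_h = dst_resolution
        scale = np.array([dst_w / src_w, dst_h / src_h])
        return np.asarray(points_px, dtype=float) * scale

    # A contour detected in a 1920x1080 camera frame mapped onto a 640x480 grid
    # that a lower-resolution range map was rasterized to (illustrative sizes).
    contour_hi = [(960, 540), (1000, 540), (1000, 600)]
    print(rescale_coordinates(contour_hi, (1920, 1080), (640, 480)))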
As shown at 108, the fusion engine 210 may create one or more fused objects by joining together associated respective predefined primitive elements to complement accordingly the potential object(s) detected in the scene. In particular, the fusion engine 210 joins together predefined primitive elements detected in sensory datasets received from sensors 220 employing different sensing and capturing technology and hence adapted to capture different radiation types. By joining together the associated respective predefined primitive elements and complementing the potential object(s), the fused object created by the fusion engine 210 may be significantly more accurate, elaborate, complete and/or comprehensive.
For example, the fusion engine 210 may join together horizontal line sections (predefined primitive elements) associated with a common potential object where one or more sections are detected by analyzing the sensory dataset of a certain imaging sensor 220A and one or more sections are detected by analyzing the sensory dataset of a certain LiDAR sensor 220B. The two sections may overlap, partially overlap or not overlap. The fused object resulting from joining together the horizontal line sections may therefore be significantly more detailed and/or comprehensive. In another example, the fusion engine 210 may join together a vertical line (predefined primitive element) and a circular shape (predefined primitive element) associated with a common potential object where the vertical line is detected by analyzing the sensory dataset of a certain imaging sensor 220A and the circular shape is detected by analyzing the sensory dataset of a certain LiDAR sensor 220B. The vertical line and the circular shape may overlap, partially overlap or not overlap. The fused object resulting from joining together the vertical line and the circular shape may therefore be significantly more detailed and/or comprehensive. In another example, the fusion engine 210 may join together vertical lines (predefined primitive elements) detected by analyzing the sensory datasets of the imaging sensor 220A and the LiDAR sensor 220B which are correlated together and optionally associated with a certain potential object, for example, a traffic sign pole, a light column and/or the like. In another example, the fusion engine 210 may join together a plurality of horizontal 2D rectangles (predefined primitive elements) located spaced from each other at a predefined distance, detected by analyzing the sensory dataset of the imaging sensor 220A, with one or more predefined primitive elements portraying an edge of a road detected by analyzing the sensory dataset of the LiDAR sensor 220B. The 2D rectangles and the road outline are correlated together and optionally associated with a pedestrian crosswalk object.
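The act of joining correlated primitives from different technology sensors into a single fused object may be sketched as follows; the class names, fields and coordinates are hypothetical and intended only to show that a fused object aggregates primitives originating from more than one sensor, whether or not those primitives overlap.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class DetectedPrimitive:
        label: str                          # e.g. "vertical_line", "circle"
        sensor_id: str                      # which sensor 220 it was detected by
        points: List[Tuple[float, float]]   # aligned coordinates in the common frame

    @dataclass
    class FusedObject:
        potential_label: str
        primitives: List[DetectedPrimitive] = field(default_factory=list)

        @property
        def contributing_sensors(self):
            return sorted({p.sensor_id for p in self.primitives})

    def fuse(potential_label, correlated_primitives):
        # Join together primitives detected in different technology sensory
        # datasets and correlated with the same potential object; the primitives
        # may overlap, partially overlap or not overlap at all.
        return FusedObject(potential_label, list(correlated_primitives))

    # A traffic sign pole complemented from a camera-detected line and a
    # LiDAR-detected line segment covering a different part of the pole.
    cam_line = DetectedPrimitive("vertical_line", "cam_front", [(10.0, 2.0), (10.0, 1.0)])
    lidar_line = DetectedPrimitive("vertical_line", "lidar_roof", [(10.0, 1.1), (10.0, 0.0)])
    pole = fuse("traffic_sign_pole", [cam_line, lidar_line])
    print(pole.contributing_sensors)  # ['cam_front', 'lidar_roof']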
Another example of the creation of fused object(s) may be demonstrated by joining together predefined primitive elements detected by the imaging sensor 220A and the LiDAR sensor 220B as shown in FIG. 3A and FIG. 3B, which are correlated together and optionally associated with respective potential object(s). For example, the fusion engine 210 may join together the arrow marking on traffic sign (2) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the traffic sign (B) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. an arrow traffic sign. In another example, the fusion engine 210 may join together the feet of the pedestrian crossing the street (7) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the upper part and head of the pedestrian crossing the street (D) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the pedestrian crossing the street. In another example, the fusion engine 210 may join together the bicycle wheel (5) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the person riding a bicycle (only the person is detected while the bicycle is not) detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the person riding the bicycle. In another example, the fusion engine 210 may join together the road markings (4) detected by analyzing the sensory dataset received from the imaging sensor 220A and the outline of the road detected by analyzing the sensory dataset received from the LiDAR sensor 220B to create a fused object, i.e. the road including markings. For brevity, the presented examples are naturally highly simplistic. However, the fusion engine 210 may join together complex predefined primitive elements, optionally a large number of such predefined primitive elements, to complement potential objects and create higher complexity fused objects.
Reference is now made to FIG. 4, which is an exemplary image capture presenting creation of fused objects by joining together predefined primitive elements detected in sensory datasets captured by an imaging sensor and a LiDAR sensor, according to some embodiments of the present invention. An exemplary image capture 402 presents creation of a plurality of fused objects by a fusion engine such as the fusion engine 210 which may join together a plurality of associated predefined primitive elements detected based on analysis of sensory datasets received from one or more imaging sensors such as the imaging sensor 220A and one or more LiDAR sensors such as the LiDAR sensor 220B. In particular, the fusion engine 210 joins together a plurality of curves (predefined primitive elements) detected in the sensory dataset received from the imaging sensor 220A (marked RED) and a plurality of curves (predefined primitive elements) detected in the sensory dataset received from the LiDAR sensor 220B (marked GREEN).
For example, the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying the lower part of a pedestrian 404 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying the upper part of the pedestrian 404 to create a fused object of the pedestrian 404. As evident the fused object of the pedestrian 404 is significantly more accurate and complete compared to the partial shapes constructed by only the RED curves or only the GREEN curves. In another example, the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying the lower part of a bicycle rider 406 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying the upper part of the bicycle rider 406 to create a fused object of the bicycle rider 406. As evident the fused object of the bicycle rider 406 is significantly more accurate and complete compared to the partial shapes constructed by only the RED curves or only the GREEN curves. In another example, the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying arrow markings on traffic sign 410 with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying an outline of the traffic sign 410 to create a fused object of the arrow traffic sign 410. As evident the fused object of the arrow traffic sign 410 is significantly more accurate and complete compared to the partial construction achievable based on only the RED curves or only the GREEN curves. In another example, the fusion engine 210 may join together a plurality of the RED curves (extracted from the sensory dataset of the imaging sensor 220A) portraying markings 408 in a street intersection with a plurality of the GREEN curves (extracted from the sensory dataset of the LiDAR sensor 220B) portraying objects detected in the street intersection, for example, the pedestrian 404 the bicycle rider 406, the arrow traffic sign 410 and a vehicle 412 to create a fused picture of the street intersection. As evident the fused picture of the street intersection is significantly more accurate and complete compared to the partial construction achievable based on only the RED curves or only the GREEN curves.
Reference is made once again to FIG. 1.
As shown at 110, the fusion engine 210 classifies each of the fused object(s) with a respective label of a matching object of a plurality of predefined and labeled objects. The fusion engine 210 may classify each fused object according to the labels of the predefined primitive elements associated with the respective fused object based on the association of the associated predefined primitive elements in the primitive elements dataset. The fusion engine 210 may also apply one or more of the comparison metrics, for example, the shape similarity, the size similarity, the color similarity, the texture similarity and/or the like to compare the fused object to the predefined objects. The fusion engine 210 may further apply one or more machine learning, specifically deep learning methods, algorithms and/or models, for example, a neural network, an SVM, a decision tree learning algorithm, a K-Nearest neighbors algorithm and/or any other learning algorithm as known in the art trained to identify the fused object as one of the predefined and labeled objects.
The fusion engine 210 may be able to classify one or more of the fused object(s) even if the respective fused object does not perfectly match a respective predefined and labeled object. For example, assuming a certain object in the scene, for example, a light (lamp) column, is partially hidden (blocked) by another object, for example, a car, the fusion engine may still be able to classify the partially blocked light column according to detection of a corner (predefined primitive element) having an angle of 80-120 degrees connecting two rectangles (predefined primitive elements) at a height of approximately 2 meters above ground level. While a vertical rectangle of the two rectangles may be partially blocked, for example, its section extending from the ground is not visible, the fusion engine 210 may be able to classify the partially blocked light column according to detection of the visible predefined primitive elements associated with the predefined labeled light column object.
The fusion engine 210 may further apply a match threshold to determine a positive or negative match of one or more of the fused objects with respective predefined labeled object(s). The match threshold may be set globally for all the predefined labeled objects. However, the match threshold may be set individually for one or more of the predefined labeled objects. The match threshold may be expressed in one or more metrics, for example, percentage and/or the like. The fusion engine 210 may therefore evaluate a match level between a certain fused object and a respective predefined labeled object. In case the match level exceeds (is above) the match threshold, the fusion engine 210 may classify the certain fused object accordingly and in case the match level does not exceed (is below) the match threshold, the fusion engine 210 may refrain from classifying the certain fused object.
Optionally, the fusion engine 210 classifies one or more of the fused objects and assigns each a calculated probability score indicating a probability that the fused object matches a respective predefined labeled object (i.e. a match level). The fusion engine 210 may further calculate a plurality of probability scores for a certain fused object, each indicating a probability that the certain fused object matches a respective one of a plurality of predefined labeled objects.
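A minimal, illustrative sketch of applying a global and/or per-object match threshold to the calculated probability scores (match levels) may look as follows; the threshold values and object names are assumptions made for the example.

```python
# Illustrative sketch only: applying a global or per-object match threshold to
# the match level (probability score) computed for a fused object.
from typing import Dict, Optional

GLOBAL_THRESHOLD = 0.7
PER_OBJECT_THRESHOLD: Dict[str, float] = {"pedestrian": 0.8}   # stricter for safety-critical objects


def decide_label(scores: Dict[str, float]) -> Optional[str]:
    """Return the best-matching label whose score exceeds its threshold, else None."""
    best_label, best_score = max(scores.items(), key=lambda item: item[1])
    threshold = PER_OBJECT_THRESHOLD.get(best_label, GLOBAL_THRESHOLD)
    return best_label if best_score > threshold else None


print(decide_label({"pedestrian": 0.85, "light_column": 0.4}))   # -> "pedestrian"
print(decide_label({"pedestrian": 0.75, "light_column": 0.4}))   # -> None (below 0.8)
```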
Optionally, weights are assigned to one or more of the sensors 220 to adjust the contribution of the sensory dataset received from the respective sensor 220 to the construction and/or classification of the fused object(s). When classifying the fused object(s) and calculating the probability score(s), the fusion engine 210 may apply the weights to the predefined primitive elements detected in the sensory dataset(s) of the respective sensor(s). For example, assume a certain imaging sensor 220A is assigned a first weight and a certain LiDAR sensor 220B is assigned a second weight which is lower than the first weight. Further assume the fusion engine 210 detects one or more predefined primitive elements in each of the sensory datasets received from the certain imaging sensor 220A and the certain LiDAR sensor 220B which are associated with a certain potential object. When constructing the respective fused object, the fusion engine 210 may apply the first and second weights such that the predefined primitive element(s) detected in the sensory dataset received from the certain imaging sensor 220A have higher significance, importance and/or precedence over the predefined primitive element(s) detected in the sensory dataset received from the certain LiDAR sensor 220B. Similarly, when calculating the probability score for the certain fused object, the fusion engine 210, applying the first and second weights, may assign higher significance, importance and/or precedence to the sensory data received from the certain imaging sensor 220A and lower significance, importance and/or precedence to the sensory data received from the certain LiDAR sensor 220B. In other words, the contribution and/or impact of the certain imaging sensor 220A to the construction and/or classification of the certain fused object is higher than that of the certain LiDAR sensor 220B.
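The weighting described above may, for example, be realized by combining per-sensor evidence into a single probability score using the assigned weights; the following sketch uses a simple weighted average, which is an assumption made for illustration rather than a formula defined by the embodiments.

```python
# Illustrative sketch only: weighting per-sensor evidence when scoring a fused
# object. The weight values and the weighted-average rule are assumptions.
SENSOR_WEIGHTS = {"imaging_220A": 0.7, "lidar_220B": 0.3}

# Per-sensor match evidence for one fused object: fraction of the expected
# primitive elements actually detected in that sensor's dataset.
evidence = {"imaging_220A": 0.9, "lidar_220B": 0.5}


def weighted_probability(evidence, weights):
    """Combine per-sensor evidence into one probability score using sensor weights."""
    total_weight = sum(weights[s] for s in evidence)
    return sum(weights[s] * evidence[s] for s in evidence) / total_weight


print(round(weighted_probability(evidence, SENSOR_WEIGHTS), 3))   # the imaging sensor dominates
```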
Moreover, while the fusion engine 210 may classify fused objects as the predefined higher level objects, the fusion engine 210 may also classify one or more arbitrary higher level objects which are not specifically predefined in advance but are detected by joining a plurality of predefined primitive elements. For example, the fusion engine 210 may detect a certain arbitrary object having an arbitrary shape which resides on a road, for example, a trash pile, debris, a hole in the road and/or the like. The fusion engine 210 may classify such fused arbitrary objects as, for example, an obstacle, a hazard and/or the like.
As shown at 112, the fusion engine 210 may output the classification, for example, the label of the fused object(s) detected in the scene.
The fusion engine 210 may further output a descriptive dataset for one or more fused object(s) detected in the scene. The descriptive dataset may include temporal information relating to the fused object(s), for example, a time of capture, a timing of the detection of the fused object in a sequence of sensory datasets captured during a certain time period, detection timing with respect to detection of one or more other objects and/or the like. The descriptive dataset may also include spatial information relating to the fused object(s), for example, a location of the fused object with respect to the vehicle (in which the sensors 220 are positioned), a location of the fused object with respect to one or more other objects and/or the like. The relational information of the fused object, i.e. the temporal and/or spatial information with respect to other object(s), may further include additional relational information, for example, a relative positioning between the objects, an elevation difference between the objects, an indication that the fused object is obscured (blocked) by another object and/or the like. The fusion engine 210 may extract, calculate and/or derive the spatial information, the temporal information, the relational information and/or part thereof from the analysis of one or more of the sensory datasets received from one or more of the sensors 220. For example, the spatial information or part thereof may be calculated based on one or more of the sensory datasets received from one or more of the sensors 220, for example, the imaging sensor(s) 220A, the LiDAR sensor(s) 220B and/or the like. The fusion engine 210 may further extract, calculate and/or derive the spatial information and/or part thereof based on information received from one or more geolocation sensors, for example, GPS information received from one or more GPS sensors, navigation systems and/or the like.
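For illustration only, a descriptive dataset record could be structured as in the following sketch; the field names and values are hypothetical and do not define the schema used by the embodiments.

```python
# Illustrative sketch only: one possible record layout for the descriptive
# dataset accompanying a classified fused object.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DescriptiveDataset:
    label: str                                     # classification of the fused object
    capture_time: float                            # temporal information (e.g. UNIX time)
    location_rel_vehicle: Tuple[float, float]      # spatial information (x, y) in meters
    gps_fix: Optional[Tuple[float, float]] = None  # optional geolocation (lat, lon)
    relations: List[str] = field(default_factory=list)  # relational information


record = DescriptiveDataset(
    label="pedestrian",
    capture_time=1554902400.0,
    location_rel_vehicle=(12.5, -1.2),
    relations=["partially occluded by vehicle 412", "0.4 m above curb"],
)
print(record)
```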
The fusion engine 210 may further transmit data relating to the operational environment of the fusion system 200 to, and/or receive such data from, one or more of the remote network resources, for example, the remote server 240, the cloud service 250 and/or the like. For example, assuming the fusion system 200 is installed in the vehicle, the fusion engine 210 may transmit detection information describing the detected objects (e.g. type, location, position with respect to the sensor(s) 220, position with respect to other objects, etc.) and/or the like. This information may be used for creating one or more models, for example, a map model mapping the detected objects, a scenery model presenting the detected objects, a big data model and/or the like. According to some embodiments of the present invention, one or more failures and/or malfunctions of one or more of the sensors 220 may be automatically detected based on analysis of sensory data, specifically analysis of predefined primitive element(s) detected by analyzing the sensory dataset(s) of other sensor(s) 220, in particular other sensor(s) 220 employing different sensing technology(s) and adapted to capture different radiation type(s).
For example, assume that by analyzing the sensory dataset received from a certain imaging sensor 220A, the fusion engine 210 detects an outline of a certain traffic light (predefined primitive element) in the scene depicted by the certain imaging sensor 220A. Further assume that, based on the analysis of the sensory dataset received from the certain imaging sensor 220A, the fusion engine 210 determines that the outline of the certain traffic light and/or part thereof should be detected by analyzing the sensory dataset received from a certain LiDAR sensor 220B depicting substantially the same scene. In case the fusion engine 210 is unable to detect the outline of the certain traffic light and/or part thereof by analyzing the sensory dataset received from the certain LiDAR sensor 220B, the fusion engine 210 may determine that there is a high probability that the certain LiDAR sensor 220B has failed, is blocked from properly monitoring the scene and/or the like.
In another example, assume that by analyzing the sensory dataset received from a certain SONAR sensor 220D, the fusion engine 210 detects an outline of a certain object (predefined primitive element) in the scene depicted by the certain SONAR sensor 220D. Further assume that, based on the analysis of the sensory dataset received from the SONAR sensor 220D, the fusion engine 210 determines that the outline of the certain object and/or part thereof should be detected by analyzing the sensory dataset received from a certain RADAR sensor 220C depicting substantially the same scene. In case the fusion engine 210 is unable to detect the outline of the certain object and/or part thereof by analyzing the sensory dataset received from the certain RADAR sensor 220C, the fusion engine 210 may determine that there is a high probability that the certain RADAR sensor 220C has failed, is blocked from properly monitoring the scene and/or the like.
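A simplified, illustrative cross-check for such failure detection could compare, per sensor, the primitive elements actually detected against those expected based on the overlapping datasets of the other sensors; the miss-ratio rule below is an assumption made for the example, not the disclosed algorithm.

```python
# Illustrative sketch only: flag a sensor as suspect when it repeatedly misses
# primitive elements that the other, overlapping sensors agree on.
from typing import Dict, Set


def suspect_failed_sensors(detections: Dict[str, Set[str]],
                           expected_common: Set[str],
                           miss_ratio_threshold: float = 0.5) -> Set[str]:
    """Return sensors that miss too many of the commonly expected primitives."""
    suspects = set()
    for sensor, found in detections.items():
        missed = expected_common - found
        if expected_common and len(missed) / len(expected_common) > miss_ratio_threshold:
            suspects.add(sensor)
    return suspects


detections = {
    "imaging_220A": {"traffic_light_outline", "pole", "crosswalk"},
    "lidar_220B": {"pole"},   # misses most primitives the imaging sensor reports
}
print(suspect_failed_sensors(detections,
                             expected_common={"traffic_light_outline", "pole", "crosswalk"}))
```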
According to some embodiments of the present invention, one or more of the sensors 220 may be automatically calibrated according to sensory data, specifically according to predefined primitive element(s) detected by analyzing the sensory dataset(s) of other sensor(s) 220, in particular other sensor(s) 220 employing different sensing technology(s) and adapted to capture different radiation type(s).
For example, assume that by analyzing the sensory dataset received from a certain imaging sensor 220A, the fusion engine 210 detects a street corner (predefined primitive element) in a scene depicted by the certain imaging sensor 220A. Also assume that, by analyzing the sensory dataset received from a certain LiDAR sensor 220B, the fusion engine 210 detects a street corner (predefined primitive element) in a scene depicted by the certain LiDAR sensor 220B which at least partially overlaps the scene depicted by the certain imaging sensor 220A. Further assume the fusion engine 210 correlates together the street corners detected in the sensory dataset of the certain imaging sensor 220A and the sensory dataset of the certain LiDAR sensor 220B as described in step 106. In case the fusion engine 210 detects that there is an incompliance, for example, a spatial shift, a temporal shift and/or the like between the street corners detected in the sensory dataset of the certain imaging sensor 220A and the sensory dataset of the certain LiDAR sensor 220B, the fusion engine 210 may calibrate and/or compensate the sensory dataset received from the certain imaging sensor 220A according to the sensory dataset received from the certain LiDAR sensor 220B or vice versa.
In order to improve calibration of one or more of the sensors 220, the fusion engine 210 may preferably attempt to detect a common (associated) predefined primitive element(s) in the sensory datasets received from a plurality of sensors 220 (more than two) and calibrate and/or compensate according to a majority decision, i.e. determine the correct calibration according to detection of the common predefined primitive element(s) in a majority of the sensory datasets.
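One possible, purely illustrative realization of the majority decision is to take the per-axis median of the positions reported for a common primitive element and derive a compensation offset for each sensor, as in the following sketch; the median rule and the numeric values are assumptions made for illustration.

```python
# Illustrative sketch only: each sensor reports the position of a common
# primitive element (e.g. a street corner) in a shared frame; the sensor whose
# report deviates from the majority (median) receives the largest correction.
from statistics import median
from typing import Dict, Tuple

corner_positions: Dict[str, Tuple[float, float]] = {
    "imaging_220A": (10.0, 5.0),
    "lidar_220B": (10.1, 5.0),
    "radar_220C": (11.4, 5.9),   # spatially shifted -> likely needs calibration
}


def calibration_offsets(positions):
    """Offset of each sensor from the per-axis median (majority) position."""
    ref = (median(p[0] for p in positions.values()),
           median(p[1] for p in positions.values()))
    return {sensor: (ref[0] - p[0], ref[1] - p[1]) for sensor, p in positions.items()}


print(calibration_offsets(corner_positions))   # radar_220C gets the largest correction
```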
It is expected that during the life of a patent maturing from this application many relevant systems, methods and computer programs will be developed, and the scope of the terms sensor technology and machine learning model is intended to include all such new technologies a priori.
As used herein the term "about" refers to ± 10%.
The terms "comprises", "comprising", "includes", "including",“having” and their conjugates mean "including but not limited to".
The term "consisting of" means "including and limited to".
As used herein, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range. Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.

Claims

WHAT IS CLAIMED IS:
1. A computer implemented method of detecting objects based on fusion of sensory data received from a plurality of sensors, comprising:
executing a code by at least one processor for:
receiving a plurality of sensory datasets captured by a plurality of sensors adapted to capture a plurality of radiation types in a common scene;
detecting in real-time at least one predefined primitive element in each of at least some of the plurality of sensory datasets;
associating the at least one predefined primitive element detected in each of the at least some sensory datasets with at least one potential object;
complementing the at least one potential object by joining together respective associated predefined primitive elements to create a respective fused object;
classifying the respective fused object according to a match with at least one of a plurality of predefined objects; and
outputting the classification of the respective fused object.
2. The computer implemented method of claim 1, wherein the plurality of sensors include at least some members of a group consisting of: an imaging sensor, a laser detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor and a sound navigation and ranging (SONAR) sensor, the plurality of sensors are adapted to capture respective radiation types which are members of a group consisting of: visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves.
3. The computer implemented method of claim 1, wherein the at least one predefined primitive element is a member of a group consisting of: an edge, a curve, a line, a corner, a surface, a shape, a texture and a color.
4. The computer implemented method of claim 1, wherein the association is based on aligning each of the at least one predefined primitive element detected in each of the at least some sensory datasets according to a common reference.
5. The computer implemented method of claim 4, wherein the aligning comprises at least one member of a group consisting of: a temporal alignment, a spatial alignment, a resolution alignment and a distribution alignment.
6. The computer implemented method of claim 1, further comprising outputting a descriptive dataset of the fused object, the descriptive dataset comprising at least one member of a group consisting of: temporal information, spatial information and relational information descriptive of a relation of the fused object with at least one other object detected in at least some of the sensory datasets.
7. The computer implemented method of claim 1, further comprising detecting at least one failed sensor of the plurality of sensors by identifying incompliance of the at least one predefined primitive element detected in the sensory dataset of the failed sensor with respect to the at least one predefined primitive element detected in at least another one of the plurality of sensory datasets.
8. The computer implemented method of claim 1, further comprising calibrating at least one of the plurality of sensors according to a comparison of the at least one predefined primitive element detected in the sensory dataset of the respective sensor with respect to the at least one predefined primitive element detected in at least another one of the plurality of sensory datasets.
9. A system for detecting objects based on fusion of sensory data received from a plurality of sensors, comprising:
at least one processor adapted for executing a code, the code comprising:
code instructions to receive a plurality of sensory datasets captured by a plurality of sensors adapted to capture a plurality of radiation types in a common scene;
code instructions to detect in real-time at least one predefined primitive element in each of at least some of the plurality of sensory datasets;
code instructions to associate the at least one predefined primitive element detected in each of the at least some sensory datasets with at least one potential object;
code instructions to complement the at least one potential object by joining together respective associated predefined primitive elements to create a respective fused object;
code instructions to classify the respective fused object according to a match with at least one of a plurality of predefined objects; and
code instructions to output the classification of the respective fused object.
10. The system of claim 9, wherein the plurality of sensors include at least some members of a group consisting of: an imaging sensor, a laser detection and ranging (LiDAR) sensor, a radio detection and ranging (RADAR) sensor and a sound navigation and ranging (SONAR) sensor, the plurality of sensors are adapted to capture respective radiation types which are members of a group consisting of: visible light waves, infrared light waves, laser light waves, hyperspectral waves, heat radiation, radio frequency waves and ultrasonic waves.
11. The system of claim 9, wherein the at least one predefined primitive element is a member of a group consisting of: an edge, a curve, a line, a corner, a surface, a shape, a texture and a color.
12. The system of claim 9, wherein the association is based on aligning each of the at least one predefined primitive element detected in each of the at least some sensory datasets according to a common reference.
13. The system of claim 12, wherein the aligning comprises at least one member of a group consisting of: a temporal alignment, a spatial alignment, a resolution alignment and a distribution alignment.
14. The system of claim 9, further comprising outputting a descriptive dataset of the fused object, the descriptive dataset comprising at least one member of a group consisting of: temporal information, spatial information and relational information descriptive of a relation of the fused object with at least one other object detected in at least some of the sensory datasets.
15. The system of claim 9, further comprising detecting at least one failed sensor of the plurality of sensors by identifying incompliance of the at least one predefined primitive element detected in the sensory dataset of the failed sensor with respect to the at least one predefined primitive element detected in at least another one of the plurality of sensory datasets.
16. The system of claim 9, further comprising calibrating at least one of the plurality of sensors according to a comparison of the at least one predefined primitive element detected in the sensory dataset of the respective sensor with respect to the at least one predefined primitive element detected in at least another one of the plurality of sensory datasets.
PCT/IL2019/050400 2018-04-11 2019-04-10 Real-time raw data- and sensor fusion WO2019198076A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862655835P 2018-04-11 2018-04-11
US62/655,835 2018-04-11
US201862671443P 2018-05-15 2018-05-15
US62/671,443 2018-05-15

Publications (1)

Publication Number Publication Date
WO2019198076A1 true WO2019198076A1 (en) 2019-10-17

Family

ID=68164623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2019/050400 WO2019198076A1 (en) 2018-04-11 2019-04-10 Real-time raw data- and sensor fusion

Country Status (1)

Country Link
WO (1) WO2019198076A1 (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9097800B1 (en) * 2012-10-11 2015-08-04 Google Inc. Solid object detection system using laser and radar sensor fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RICARDO OMAR CHAVEZ GARCIA: "Multiple Sensor Fusion for Detection, Classification and Tracking of Moving Objects in Driving Environments", pages 1 - 176, 25 September 2014 (2014-09-25), XP055643862, Retrieved from the Internet <URL:https://www.theses.fr/2014GRENM034.pdf> *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021161300A1 (en) * 2020-02-12 2021-08-19 Israel Aerospace Industries Ltd. Specific object detection in multi-sensor images
CN112132810A (en) * 2020-09-24 2020-12-25 西安电子科技大学 Image significance detection method based on perception logic and feature comparison
CN112132810B (en) * 2020-09-24 2023-09-12 西安电子科技大学 Image significance detection method based on perception logic and feature contrast
CN112730454A (en) * 2020-12-23 2021-04-30 中国人民解放军空军工程大学 Intelligent damage detection method for composite material based on fusion of optics, infrared thermal waves and ultrasonic waves
WO2022168091A1 (en) * 2021-02-03 2022-08-11 Elbit Systems C4I and Cyber Ltd. System and method for remote multimodal sensing of a measurement volume
IL280630A (en) * 2021-02-03 2022-09-01 Elbit Systems C4I And Cyber Ltd System and method for remote multimodal sensing of a measurement volume
EP4266211A4 (en) * 2021-02-27 2024-05-22 Huawei Tech Co Ltd Information processing method and related device
CN113359104A (en) * 2021-05-12 2021-09-07 武汉中仪物联技术股份有限公司 Laser radar data preprocessing method and device
WO2022247915A1 (en) * 2021-05-27 2022-12-01 北京百度网讯科技有限公司 Fusion positioning method and apparatus, device, storage medium and program product
CN113434713A (en) * 2021-06-18 2021-09-24 广东翁源滃江源国家湿地公园管理处 Forestry big data construction method and system based on one graph
CN113434713B (en) * 2021-06-18 2024-03-12 广东翁源滃江源国家湿地公园管理处 Forestry big data construction method and system based on one graph
CN113460057A (en) * 2021-07-28 2021-10-01 中汽创智科技有限公司 Method and device for monitoring health of people in vehicle and terminal

Similar Documents

Publication Publication Date Title
WO2019198076A1 (en) Real-time raw data- and sensor fusion
CN106503653B (en) Region labeling method and device and electronic equipment
US11885910B2 (en) Hybrid-view LIDAR-based object detection
US20230014874A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
CN107161141B (en) Unmanned automobile system and automobile
US11094112B2 (en) Intelligent capturing of a dynamic physical environment
US9083856B2 (en) Vehicle speed measurement method and system utilizing a single image capturing unit
KR101784611B1 (en) A human detecting apparatus and method using a lidar sensor and a radar sensor
CN109583415B (en) Traffic light detection and identification method based on fusion of laser radar and camera
Ai et al. Critical assessment of an enhanced traffic sign detection method using mobile LiDAR and INS technologies
US20200082182A1 (en) Training data generating method for image processing, image processing method, and devices thereof
US11216705B2 (en) Object detection based on machine learning combined with physical attributes and movement patterns detection
US11280630B2 (en) Updating map data
JP2017129410A (en) Object detection device and object detection method
WO2021253245A1 (en) Method and device for identifying vehicle lane changing tendency
TW202036478A (en) Camera calibration method, roadside sensing device, and smart transportation system
US11055894B1 (en) Conversion of object-related traffic sensor information at roadways and intersections for virtual dynamic digital representation of objects
US20230126901A1 (en) Method, apparatus, server, and computer program for collision accident prevention
CN114252884A (en) Method and device for positioning and monitoring roadside radar, computer equipment and storage medium
Nambi et al. ALT: towards automating driver license testing using smartphones
US20220234588A1 (en) Data Recording for Advanced Driving Assistance System Testing and Validation
CN113988197A (en) Multi-camera and multi-laser radar based combined calibration and target fusion detection method
CN114252883B (en) Target detection method, apparatus, computer device and medium
CN114252859A (en) Target area determination method and device, computer equipment and storage medium
Imad et al. Navigation system for autonomous vehicle: A survey

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19784273

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19784273

Country of ref document: EP

Kind code of ref document: A1