GB2609620A - System and computer-implemented method for performing object detection for objects present in 3D environment - Google Patents

System and computer-implemented method for performing object detection for objects present in 3D environment

Info

Publication number
GB2609620A
GB2609620A (application GB2111294.1A / GB202111294A)
Authority
GB
United Kingdom
Prior art keywords
object detection
computer
objects
environment
detection system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2111294.1A
Inventor
Tanksale Tejas
Kannan Srividhya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Continental Automotive GmbH
Original Assignee
Continental Automotive GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Continental Automotive GmbH filed Critical Continental Automotive GmbH
Priority to GB2111294.1A priority Critical patent/GB2609620A/en
Publication of GB2609620A publication Critical patent/GB2609620A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19107Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure relate to a system and method for performing object detection for objects present in a 3D environment, specifically for tracking objects around autonomous vehicles. For the object detection, one or more clusters of points are detected in a 3D representation of the 3D environment. Each of the one or more clusters indicates an object from one or more objects present in the 3D environment. Further, a cuboid is generated for each of the one or more objects in the 3D representation, based on the one or more clusters. At least one point of interest from the corresponding cuboid of each of the one or more clusters is projected onto a 2D image of at least part of the 3D environment. One or more regions of interest of the one or more objects are located on the 2D image based on the projection. These regions of interest are provided for detecting and classifying the one or more objects accurately and with less processing time.

Description

SYSTEM AND COMPUTER-IMPLEMENTED METHOD FOR PERFORMING
OBJECT DETECTION FOR OBJECTS PRESENT IN 3D ENVIRONMENT
TECHNICAL FIELD
[1] The present disclosure relates generally to autonomous systems, and more particularly, but not exclusively, to a system and computer-implemented method for performing object detection for objects present in the 3D environment of an autonomous system.
BACKGROUND
[2] Detection and tracking of objects around an autonomous vehicle is essential for safe operation. Various sensors may be implemented in such an autonomous vehicle for the detection of the objects. Presently, depth-based systems such as Light Detection and Ranging (Lidar) and Radio Detection and Ranging (Radar) systems may be implemented to retrieve 3-Dimensional (3D) data for object detection. Such systems focus on projecting 2D data into 3D space, performing object detection in 3D space using 3D convolutions, or performing 3D object detection from Red, Green and Blue (RGB)-Depth (D) data. Upon detecting the objects, a classifier may be used to classify the detected objects.
[3] In the case of projection of 2D data into 3D space, the input to such systems comprises multiple 2D-projected arrays for different points of view. The time taken for the projection and for processing the object detection is high for such systems. In the case of 3D convolutions, the basic premise is that feature extraction in 3D space is more accurate than in 2D images. However, 3D point clouds from Lidar are usually not dense, which may result in loss of accuracy. Also, the time taken for 3D object detection is much higher than for 2D, which does not allow real-time object detection. Object detection may be performed using only the 3D data. However, sparsity in the 3D data may lead to accuracy problems in the detection. When focusing more on accuracy rather than on speed, using real-time 3D data for object detection is unfeasible.
[4] Some techniques generate proposals for object detection in either 2D or 3D. In the case of 2D camera-only input using RGB images, conventional image processing techniques or CNNs are used for proposal generation. In the case of 3D data, such as RGB-D, bounding boxes are generated in 3D. However, most approaches are slow in the 3D case. More and more implementations in autonomous vehicles use a camera and a Lidar together. However, such implementations include complex neural networks. Thus, high-speed detection may not be possible with current approaches using such an implementation.
[5] The information disclosed in this background of the disclosure section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
BRIEF DESCRIPTION OF THE INVENTION
[6] The present disclosure addresses the problems arising during projection of 3D data onto 2D data. The complexity of existing techniques, which implement neural networks, is reduced. The proposed disclosure teaches a simpler technique by eliminating the need for complex neural networks for the projection of the 3D data onto the 2D data and for the detection of objects. Thus, the processing time required for object detection may be reduced. As a result, the proposed method and system may be implemented for real-time and dynamic detection of objects.
[7] In a preferred embodiment, the present disclosure relates to a computer-implemented method for performing object detection for objects present in a 3-Dimensional (3D) environment. The computer-implemented method is performed by an object detection system. For the object detection, at step A, one or more clusters of points are detected in a 3D representation of a 3D environment. Each of the one or more clusters indicates an object from one or more objects present in the 3D environment. Further, at step B, a cuboid is generated for each of the one or more objects in the 3D representation, based on the one or more clusters detected at step A. At step C, at least one point of interest from the corresponding cuboid of each of the one or more clusters of the 3D representation is projected onto a 2-Dimensional (2D) image of at least part of the 3D environment. One or more regions of interest of the one or more objects are located on the 2D image based on the projection of the at least one point of interest from step C, for detecting the one or more objects.
[8] In a preferred embodiment, the computer-implemented method comprises providing the one or more regions of interest of the one or more objects from the 2D image to a trained classifier, for classifying the one or more objects present in the 3D environment.
[009] In a preferred embodiment, the 3D environment is the surroundings of an autonomous vehicle, and the object detection system is implemented in the autonomous vehicle.
[0010] In a preferred embodiment, each cluster from the one or more clusters in the 3D representation is used to generate the cuboid to enclose boundary points of respective object from the one or more objects.
[0011] In a preferred embodiment, the at least one point of interest of a cluster from the one or more clusters comprises at least one of a point at the centroid, a point on an edge and a point at a vertex of the cuboid associated with the cluster.
[0012] In a preferred embodiment, detecting the one or more clusters comprises correcting misplaced points in the 3D representation to be part of the one or more clusters, to update the one or more clusters.
[0013] In a preferred embodiment, the present disclosure relates to object detection system for performing object detection for objects present in a 3D environment. The object detection system includes a processor and a memory communicatively coupled to the processor. The memory stores processor-executable instructions, which on execution cause the processor to perform the object detection disclosed in the computer-implemented method.
[0014] In a preferred embodiment, the object detection system further comprises sensors for detecting 3D representations for use in step A of the computer-implemented method.
[0015] In a preferred embodiment, the object detection system further comprises sensors for detecting 2-Dimensional (2D) images for use in step C of the computer-implemented method.
[0016] In a preferred embodiment, the object detection system further comprises an interface configured to receive data of a 3D representation of a 3D environment and/or data of 2-Dimensional (2D) images.
[0017] In a preferred embodiment, the present disclosure relates to a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the computer-implemented method.
[0018] In a preferred embodiment, the present disclosure relates to a computer program product comprising instructions to cause the object detection system to perform object detection and execute the steps of the computer-implemented method.
[0019] In a preferred embodiment, the present disclosure relates to a computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the computer-implemented method.
[0020] As used in this summary, in the description below, in the claims below, and in the accompanying drawings, the term "3D environment" means a physical space comprising a plurality of objects. The 3D environment may be, but is not limited to, a road, a parking space, a park, a highway and so on.
[0021] As used in this summary, in the description below, in the claims below, and in the accompanying drawings, the term "objects" means a person or a thing present in a physical space. The objects may include, but are not limited to, a vehicle, a tree, a person, a building, an animal and so on.
[0022] As used in this summary, in the description below, in the claims below, and in the accompanying drawings, the term "cuboid" means a convex polyhedron bounded by six quadrilateral faces, whose polyhedral graph is the same as that of a cube.
[0023] As used in this summary, in the description below, in the claims below, and in the accompanying drawings, the term "at least" followed by a number is used to denote the start of a range beginning with that number (which may be a range having an upper limit or no upper limit, depending on the variable being defined). For example, "at least one" means one or more than one.
[0024] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0025] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
[0026] Figure 1 shows an exemplary environment of an object detection system for performing object detection for objects present in a 3D environment, in accordance with some embodiments of the present disclosure;
[0027] Figure 2 shows a detailed block diagram of an object detection system for performing object detection for objects present in a 3D environment, in accordance with some embodiments of the present disclosure;
[0028] Figures 3a-3e illustrate exemplary embodiments for performing object detection for objects present in a 3D environment, in accordance with some embodiments of the present disclosure;
[0029] Figure 4 illustrates a flowchart showing an exemplary method for performing object detection for objects present in a 3D environment, in accordance with some embodiments of the present disclosure; and
[0030] Figure 5 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
[0031] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0032] In the present document, the word "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0033] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure.
[0034] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus proceeded by "comprises.., a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0035] The terms "includes", -including", or any other variations thereof, are i ntended to cover a non-exclusive inclusion, such that a setup, device, or method that includes a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus proceeded by -includes.., a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0036] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0037] The present disclosure relates to a computer-implemented method and system for performing object detection for objects present in a 3D environment. Particularly, the present disclosure proposes to locate regions of interest in a 2D image of at least part of the 3D environment to detect and classify the objects faster and accurately. A 3D representation of the 3D environment is considered and points in the 3D representation are clustered for each of the objects. A cuboid is generated for each of the one or more objects by using the corresponding cluster. Further, points of interest from the cuboid are projected onto the 2D image to locate the regions of interest for each of the one or more objects. Thus, the regions of interest are provided for efficient classification of the one or more objects.
[0038] Figure 1 shows an exemplary environment 100 of an object detection system 101 for detecting objects in a 3D environment, in accordance with some embodiments of the present disclosure. The exemplary environment 100 may include the object detection system 101 in communication with a 3D representation capture unit 102, a 2D image capture unit 103 and a trained classifier 104. The 3D environment may be the surroundings of a system implementing the object detection system 101. In a preferred embodiment, the object detection system 101 may be implemented in an autonomous vehicle and thus the 3D environment may be the surroundings of the autonomous vehicle. In another example, the object detection system 101 may be implemented in a robotic system in an industry and thus the 3D environment may be the surroundings of the robotic system. In an alternative embodiment, the object detection system 101 may be implemented in any system where objects in the surroundings of that system need to be detected and classified. For example, in the autonomous vehicle the object detection system 101 may be implemented to detect objects in front of the autonomous vehicle as one of a car, a bicycle, a tree, a building and so on.
[0039] The 3D representation capture unit 102 may be configured to capture data points related to objects in the 3D environment. In the preferred embodiment, the data points may represent the coordinates of each of the objects in the 3D environment. In the preferred embodiment, the 3D representation capture unit 102 may be a Light Detection and Ranging (Lidar) system and the 3D representation obtained for the 3D environment may be point cloud data. Alternatively, the 3D representation may be mesh grid data, voxel data, volumetric data and so on. The 3D representation may be of any other form of data points of the 3D environment known to a person skilled in the art. The 2D image capture unit 103 may be configured to capture a 2D image of at least part of the 3D environment. For example, the 2D image capture unit 103 may be a camera which captures the front view of the system implementing the object detection system 101. In the preferred embodiment, the 2D image capture unit 103 may be directed in such a manner that the 2D image is captured such that objects covered in the 3D representation are covered in the 2D image as well. In the preferred embodiment, the point of view of the 3D representation capture unit 102 for capturing the 3D representation may be the same as the point of view of the 2D image capture unit 103. In the alternative, the point of view of the 3D representation capture unit 102 for capturing the 3D representation may not be the same as the point of view of the 2D image capture unit 103. In the preferred embodiment, the 3D representation capture unit 102 and the 2D image capture unit 103 may be part of the system implementing the object detection system 101. For example, consider that the system is the autonomous vehicle, the 3D representation capture unit 102 is the Lidar system, and the 2D image capture unit 103 is the camera. The Lidar system and the camera may be coupled with the autonomous vehicle and be in communication with the object detection system 101.
[0040] The object detection system 101 may include one or more processors 105, an Input/Output (I/O) interface 106 and a memory 107. In some embodiments, the memory 107 may be communicatively coupled to the one or more processors 105. The memory 107 stores instructions, executable by the one or more processors 105, which, on execution, may cause the object detection system 101 to perform object detection in the 3D environment, as proposed in the present disclosure. In the preferred embodiment, the memory 107 may include one or more modules 108 and data 109. The one or more modules 108 may be configured to perform the steps of the present disclosure using the data 109, to perform the object detection. In the preferred embodiment, each of the one or more modules 108 may be a hardware unit which may be outside the memory 107 and coupled with the object detection system 101. In the preferred embodiment, the object detection system 101 may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a Personal Computer (PC), a notebook, a smartphone, a tablet, e-book readers, a server, a network server, a cloud server, an Artificial Intelligence (AI) accelerator, Graphics Processing Unit (GPU) enabled devices and the like. In the preferred embodiment, the object detection system 101 may be a dedicated server associated with a single system, to perform object detection for objects present in the 3D environment of the single system. In the alternative embodiment, the object detection system 101 may be a cloud-based server associated with a plurality of systems. Such an object detection system 101 may be configured to perform object detection for objects present in the 3D environment of each of the plurality of systems.
[0041] The object detection system 101 may be in communication with at least one of the 3D representation capture unit 102, the 2D image capture unit 103 and the trained classifier 104, for performing the detection of the objects in the 3D environment. In the preferred embodiment, the object detection system 101 may communicate with at least one of the 3D representation capture unit 102, the 2D image capture unit 103 and the trained classifier 104 via a communication network (not shown in the figure). The communication network may include, without limitation, a direct interconnection, Local Area Network (LAN), Wide Area Network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, and the like. In the preferred embodiment, a dedicated communication network may be implemented to establish communication between the object detection system 101 and each of the 3D representation capture unit 102, the 2D image capture unit 103 and the trained classifier 104.
[0042] For performing the object detection, the object detection system 101 may be configured to detect one or more clusters of points in the 3D representation of the 3D environment. The object detection system 101 may receive the 3D representation from the 3D representation capture unit 102. Each of the one or more clusters indicates an object from one or more objects present in the 3D environment. In the preferred embodiment, the object detection system 101 may be configured to correct misplaced points in the 3D representation to be part of the one or more clusters, to update the one or more clusters.
[0043] Further, the object detection system 101 may be configured to generate a cuboid for each of the one or more objects in the 3D representation. The cuboid for an object from the one or more objects is generated based on the corresponding cluster from the one or more clusters. The corresponding cluster is the cluster detected for the object using points related to the object in the 3D representation. In the preferred embodiment, each cluster from the one or more clusters is used to generate the cuboid to enclose boundary points of the respective object from the one or more objects.
[0044] Upon generating the cuboid for each of the one or more objects, the object detection system 101 may be configured to project at least one point of interest from the corresponding cuboid of each of the one or more clusters onto the 2D image of at least part of the 3D environment. The object detection system 101 may be configured to receive the 2D image from the 2D image capture unit 103. In the preferred embodiment, the at least one point of interest of a cluster from the one or more clusters may be at least one of a point at the centroid, a point on an edge and a point at a vertex of the cuboid associated with the cluster.
[0045] Upon the projection, the object detection system 101 may be configured to locate one or more regions of interest of the one or more objects on the 2D image based on the projection. Each of the one or more regions of interest represents an object in the 3D environment and hence the one or more objects are detected by the object detection system 101.
[0046] In the preferred embodiment, output from the object detection system 101 may be provided to the trained classifier 104. That is, the object detection system 101 may be configured to transmit the one or more regions of interest of the one or more objects from the 2D image to the trained classifier 104. In the preferred embodiment, the trained classifier 104 may be configured to classify the one or more objects present in the 3D environment using the output of the object detection system 101. In the preferred embodiment, the trained classifier 104 may be a deep learning network, a convolutional neural network, a classically trained network and so on. In the preferred embodiment, the object detection system 101 may be in communication with the trained classifier 104 via a communication network. The communication network may include, without limitation, a direct interconnection, Local Area Network (LAN), Wide Area Network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, and the like. In the preferred embodiment, the trained classifier 104 may be an integral part of the object detection system 101.
[0047] In another embodiment, the output of the object detection system 101 may be provided to a display unit (not shown in the figures) associated with the object detection system 101, to display the one or more regions of interest on the 2D image. In the alternative embodiment, the 2D image along with the one or more regions of interest may be displayed. The one or more regions of interest may be used to identify and locate objects in the 3D environment.
[0048] In the preferred embodiment, the object detection system 101 may receive data for performing the object detection via the I/O interface 106. The received data may include, but is not limited to, at least one of the 3D representation and the 2D image of at least part of the 3D environment. Also, the object detection system 101 may transmit data, for detecting the objects, via the I/O interface 106. The transmitted data may include, but is not limited to, the regions of interest, the one or more clusters, the output of the projection, information on the cuboids and so on.
[0049] Figure 2 shows a detailed block diagram of the object detection system 101 for performing object detection for objects present in the 3D environment, in accordance with some embodiments of the present disclosure.
[0050] The data 109 and the one or more modules 108 in the memory 107 of the object detection system 101 is described herein in detail.
[0051] In one implementation, the one or more modules 108 may include, but are not limited to, a cluster detection module 201, a cuboid generation module 202, a projection module 203, a locating module 204 and one or more other modules 205, associated with the object detection system 101.
[0052] In the preferred embodiment, the data 109 in the memory 107 may include 3D representation data 206, 2D image data 207, cluster data 208, cuboid data 209, projection data 210, ROI data 211 and other data 212 associated with the object detection system 101.
[0053] In the preferred embodiment, the data 109 in the memory 107 may be processed by the one or more modules 108 of the object detection system 101. In the preferred embodiment, the one or more modules 108 may be implemented as dedicated units and, when implemented in such a manner, said modules may be configured with the functionality defined in the present disclosure to result in novel hardware. As used herein, the term module may refer to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a Field-Programmable Gate Array (FPGA), a Programmable System-on-Chip (PSoC), a combinational logic circuit, and/or other suitable components that provide the described functionality.
[0054] The one or more modules 108 of the present disclosure function to perform the object detection for objects present in the 3D environment. The one or more modules 108 along with the data 109, may be implemented in any system, for performing detection of the objects.
[0055] The object detection may be implemented in a system where there is a need for detecting objects in the surroundings of the system. The object detection system 101 may be associated with the system, for performing detection of the objects. In the preferred embodiment, the system implementing the object detection system 101 may be an autonomous system. The object detection system 101 may be triggered to detect the objects in the 3D environment by a user of the system. In the preferred embodiment, the object detection system 101 may be automatically triggered based on one or more factors. The one or more factors may vary based on the application of the object detection system 101. For example, when the system is an autonomous vehicle or a robotic system, the object detection system 101 may be triggered when the system is switched on or in movement.
[0056] For performing the detection of the objects, initially, the object detection system 101 may be configured to receive the 3D representation and the 2D image of at least part of the 3D environment. In the preferred embodiment, the object detection system 101 may receive the 3D representation from the 3D representation capture unit 102 and the 2D image from the 2D image capture unit 103. In the preferred embodiment, the 3D representation and the 2D image may be received by the object detection system 101 in real-time. For example, the 3D representation capture unit 102 may dynamically communicate the 3D representation to the object detection system 101 upon capturing it. Similarly, the 2D image capture unit 103 may dynamically communicate the 2D image to the object detection system 101 upon capturing it. In the preferred embodiment, the 3D representation and the 2D image may be pre-stored in a memory unit associated with the system and may be retrieved by the object detection system 101 when performing detection of the objects. In the preferred embodiment, the object detection system 101 may retrieve the 3D representation and the 2D image and store them in the memory 107. The 3D representation and the 2D image in the memory 107 may be used when performing the detection. In the preferred embodiment, when the system implementing the object detection system 101 is in movement, the 3D representation and the 2D image may be captured simultaneously and provided dynamically to the object detection system 101. The object detection system 101 may use the 3D representation and the 2D image in real-time to detect objects in the 3D environment associated with said system. In the preferred embodiment, the one or more modules of the object detection system 101 may operate to perform the detection of the objects at regular intervals of time. In the preferred embodiment, the one or more modules of the object detection system 101 may operate to perform the detection of the objects continuously during movement of the system implementing the object detection system 101. In such an embodiment, the 3D representation and the 2D image may be captured continuously for the detection.
[0057] Consider an embodiment illustrated in Figure 3a. The object detection system 101 may be implemented in an autonomous vehicle. When on a road, it may be required for the autonomous vehicle to identify objects in the surroundings, so that one or more actions may be executed for the autonomous vehicle. The one or more actions may include, but are not limited to, moving forward, turning right or left, overtaking a front vehicle, applying brakes, reducing the speed of the autonomous vehicle, parking the autonomous vehicle and so on. Thus, the object detection system 101 may be implemented in such an autonomous vehicle for detecting the objects in the surroundings of the autonomous vehicle. The 2D image capture unit 103 of the autonomous vehicle may capture a 2D image 300 as illustrated in Figure 3a. The 2D image 300 may represent the 3D environment of the autonomous vehicle in 2D form. The 2D image 300 may include one or more objects like front vehicles, trees, buildings and so on. Consider the one or more objects in the 3D environment to be first object 301.1, second object 301.2, third object 301.3, fourth object 301.4, fifth object 301.5 and sixth object 301.6. Simultaneously, the 3D representation of the 3D environment may also be captured by the 3D representation capture unit 102, for the autonomous vehicle. The 3D representation is communicated to the object detection system 101. In the preferred embodiment, the 3D representation may include data points representing each of the objects in the 3D environment of the autonomous vehicle. In the preferred embodiment, the 3D representation may provide a 3D visualization of the 3D environment using multiple points. Each of the points represents the X, Y and Z geometric coordinates of a single point on an underlying object from the objects in the 3D environment.
[0058] Upon receiving the 3D representation and the 2D image, the object detection system 101 may be configured to store the 3D representation as the 3D representation data 206 and the 2D image as the 2D image data 207 in the memory 107. The cluster detection module 201 of the object detection system 101 may be configured to use the 3D representation to detect one or more clusters of points in the 3D representation. In the preferred embodiment, the cluster detection module 201 may detect the one or more clusters by implementing one or more techniques such as model-based clustering, edge-based clustering, region-based clustering, k-means clustering, DBSCAN clustering, optimization-based clustering and so on. In the preferred embodiment, by implementing the clustering technique, agglomerations of 3D points in the 3D representation are determined. The points are then segmented into one or more clusters, as in the illustrative sketch below.
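By way of illustration only, the following Python sketch shows what such a clustering step might look like when the 3D representation is an N x 3 NumPy array of Lidar points and DBSCAN is chosen from the techniques listed above; the function name and the eps/min_samples values are illustrative assumptions, not part of the disclosure.

```python
# Illustrative sketch only: segment an (N, 3) Lidar point cloud into clusters
# with DBSCAN; parameter values are placeholders chosen for illustration.
import numpy as np
from sklearn.cluster import DBSCAN

def detect_clusters(points: np.ndarray, eps: float = 0.7, min_samples: int = 10):
    """Return ({cluster id: (M, 3) member points}, leftover noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    clusters = {cid: points[labels == cid] for cid in set(labels) if cid != -1}
    noise = points[labels == -1]   # points not yet assigned to any cluster
    return clusters, noise
```

Points labelled as noise by DBSCAN correspond to the misplaced points discussed in the next paragraph and may be corrected in a later update step.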
[0059] One or more other clustering techniques, known to a person skilled in the art, may be implemented to detect the one or more clusters. The one or more clusters are detected such that each of the one or more clusters indicates an object from one or more objects present in the 3D environment. In the preferred embodiment, the cluster detection module 201 may be configured to update the detected one or more clusters. In the preferred embodiment, updating of the one or more clusters may be initiated when one or more points in the 3D representation are identified to be not part of any of the one or more clusters. Thus, for updating the one or more clusters, the cluster detection module 201 may be configured to correct misplaced points in the 3D representation to be part of the one or more clusters. In the preferred embodiment, the cluster detection module 201 may be configured to perform updating of the one or more clusters until all the points in the 3D representation are part of the one or more clusters. One or more techniques, known to a person skilled in the art, may be implemented to update the one or more clusters. The one or more clusters detected by the cluster detection module 201 may be stored as the cluster data 208 in the memory 107. An exemplary illustration of a 3D representation 302 with one or more clusters for the 3D environment of an autonomous vehicle is provided in Figure 3b. The 3D representation 302 may be of the 3D environment illustrated in Figure 3a. First cluster 303.1 may be detected for the first object 301.1. Second cluster 303.2 may be detected for the second object 301.2. Third cluster 303.3 may be detected for the third object 301.3. Fourth cluster 303.4 may be detected for the fourth object 301.4. Fifth cluster 303.5 may be detected for the fifth object 301.5. Sixth cluster 303.6 may be detected for the sixth object 301.6.
[0060] Upon detecting the one or more clusters, the cuboid generation module 202 of the object detection system 101 may be configured to generate a cuboid for each of the one or more objects in the 3D representation. The cuboid for an object from the one or more objects is generated based on the corresponding cluster from the one or more clusters. In the preferred embodiment, the cuboid encloses boundary points of the respective object from the one or more objects. One or more techniques, known to a person skilled in the art, may be implemented to generate the cuboid in the 3D representation; one simple possibility is sketched below. Cuboids generated by the cuboid generation module 202 may be stored as the cuboid data 209 in the memory 107. An exemplary embodiment illustrating the 3D representation with generated cuboids for each of the one or more clusters is shown in Figure 3c. First cuboid 304.1 may be generated for the first cluster 303.1. Second cuboid 304.2 may be generated for the second cluster 303.2. Third cuboid 304.3 may be generated for the third cluster 303.3. Fourth cuboid 304.4 may be generated for the fourth cluster 303.4. Fifth cuboid 304.5 may be generated for the fifth cluster 303.5. Sixth cuboid 304.6 may be generated for the sixth cluster 303.6.
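As a non-limiting sketch under the assumption of an axis-aligned cuboid, the box enclosing the boundary points of one cluster can be derived from the per-axis minima and maxima; an oriented (rotated) cuboid could equally be used. The function name is hypothetical, and the centroid computed here is one possible point of interest for the projection step.

```python
# Illustrative sketch: axis-aligned cuboid enclosing the boundary points of one cluster.
import numpy as np
from itertools import product

def cluster_cuboid(cluster_points: np.ndarray):
    """Return (eight corner vertices, centroid) of the enclosing cuboid."""
    mins = cluster_points.min(axis=0)   # (x_min, y_min, z_min)
    maxs = cluster_points.max(axis=0)   # (x_max, y_max, z_max)
    # The eight vertices are all combinations of the per-axis min/max values.
    vertices = np.array(list(product(*zip(mins, maxs))))   # shape (8, 3)
    centroid = (mins + maxs) / 2.0      # a possible point of interest
    return vertices, centroid
```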
[0061] Further, the projection module 203 of the object detection system 101 may be configured to project at least one point of interest from the corresponding cuboid of each of the one or more clusters onto the 2D image of at least part of the 3D environment. In the preferred embodiment, the at least one point of interest of a cluster from the one or more clusters may be at least one of a point at the centroid, a point on an edge and a point at a vertex of the cuboid associated with the cluster. One or more techniques, known to a person skilled in the art, may be implemented for the projection; a pinhole-camera sketch is given below. Information related to the projection performed by the projection module 203 may be stored as the projection data 210 in the memory 107. Figure 3d illustrates projection of cuboids onto the 2D image 300. For example, the first cuboid 304.1 of the first cluster 303.1 relating to the first object 301.1 is projected onto the 2D image 300. Points of the edges of the first cuboid 304.1 are projected onto the 2D image 300. Similarly, the sixth cuboid 304.6 of the sixth cluster 303.6 relating to the sixth object 301.6 is projected onto the 2D image 300. Points of the edges of the sixth cuboid 304.6 are projected onto the 2D image 300. The projection of the at least one point of interest for each of the other cuboids, i.e., the second cuboid 304.2, the third cuboid 304.3, the fourth cuboid 304.4 and the fifth cuboid 304.5, is performed in the same manner.
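One common way to realise such a projection, given here only as a hedged sketch, is a pinhole-camera transform using a 3 x 3 intrinsic matrix K and a 4 x 4 Lidar-to-camera extrinsic matrix T obtained from calibration; neither matrix nor the function name is specified by the disclosure, and both are assumptions of this example.

```python
# Illustrative sketch of the 3D-to-2D projection under a pinhole camera model.
import numpy as np

def project_points(points_3d: np.ndarray, K: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Project (N, 3) Lidar-frame points to (N, 2) pixel coordinates."""
    homog = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])   # (N, 4)
    cam = (T @ homog.T).T[:, :3]    # transform into the camera frame
    uv = (K @ cam.T).T              # perspective projection
    return uv[:, :2] / uv[:, 2:3]   # normalise by depth
```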
[0062] Upon projection, a bounding box may be formed for each of the objects on the 2D image. The locating module 204 of the object detection system 101, based on the projection, may be configured to locate one or more regions of interest of the one or more objects on the 2D image. Each bounding box may be considered the region of interest for the corresponding object, as in the sketch below. The one or more regions of interest located by the locating module 204 may be stored as the Region of Interest (ROI) data 211 in the memory 107. As illustrated in Figure 3d, a first region of interest 305.1 may be located for the first cuboid 304.1 relating to the first object 301.1. Similarly, a sixth region of interest 305.6 may be located for the sixth cuboid 304.6 relating to the sixth object 301.6. As shown in Figure 3e, regions of interest for the other cuboids are also located on the 2D image 300. Second region of interest 305.2 may be located for the second cuboid 304.2 relating to the second object 301.2. Third region of interest 305.3 may be located for the third cuboid 304.3 relating to the third object 301.3. Fourth region of interest 305.4 may be located for the fourth cuboid 304.4 relating to the fourth object 301.4. Fifth region of interest 305.5 may be located for the fifth cuboid 304.5 relating to the fifth object 301.5. Thus, each of the one or more regions of interest represents an object in the 3D environment and hence the one or more objects are detected or identified by the object detection system 101. In the preferred embodiment, using the one or more regions of interest, the presence of the one or more objects may be identified.
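Given the projected vertices of one cuboid, the corresponding region of interest can be located as the 2D bounding box that encloses them, clipped to the image bounds, as in the following sketch (the function name is assumed for illustration).

```python
# Illustrative sketch: 2D bounding box around the projected cuboid vertices.
import numpy as np

def region_of_interest(projected_uv: np.ndarray, image_w: int, image_h: int):
    """Return (u_min, v_min, u_max, v_max), clipped to the image bounds."""
    u_min, v_min = projected_uv.min(axis=0)
    u_max, v_max = projected_uv.max(axis=0)
    u_min, u_max = np.clip([u_min, u_max], 0, image_w - 1)
    v_min, v_max = np.clip([v_min, v_max], 0, image_h - 1)
    return float(u_min), float(v_min), float(u_max), float(v_max)
```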
[0063] The other data 212 may store data, including temporary data and temporary files, generated by modules for performing the various functions of the object detection system 101. The one or more modules 108 may also include other modules 205 to perform various miscellaneous functionalities of the object detection system 101. It will be appreciated that such modules may be represented as a single module or a combination of different modules.
[0064] Figure 4 illustrates a flowchart showing an exemplary method for performing detection of the objects in the 3D environment. The proposed computer-implemented method may be implemented in a system where there is a need for detecting objects in the surroundings, i.e., the 3D environment, of such a system. In the preferred embodiment, the computer-implemented method may be performed by the object detection system 101 upon receiving the 3D representation and the 2D image of at least part of the 3D environment.
[0065] Initially, at block 401, the object detection system 101 may be configured to detect one or more clusters of points in the 3D representation of the 3D environment. Each of the one or more clusters may indicate an object from one or more objects present in the 3D environment. In the preferred embodiment, detection of the one or more clusters may include correction of misplaced points in the 3D representation to be part of the one or more clusters. In the preferred embodiment, the correction may be an iterative process and may be performed until no points are identified to be misplaced. In the preferred embodiment, for the correction, the point cloud of the 3D representation is checked for the density of points. In areas where the density is low, points are interpolated or extrapolated from neighbouring points to update the one or more clusters. A simple re-assignment variant is sketched below.
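One simple way such a correction could be realised, shown here only as an assumed sketch and not as the claimed procedure, is to re-assign each leftover point to the nearest existing cluster when it lies within a distance threshold of that cluster's centroid; the function name and threshold are illustrative.

```python
# Illustrative sketch of the cluster-update step: attach leftover (misplaced)
# points to the nearest cluster when they are close enough to its centroid.
import numpy as np

def correct_misplaced_points(clusters: dict, noise: np.ndarray, max_dist: float = 1.0):
    if not clusters:
        return clusters
    centroids = {cid: pts.mean(axis=0) for cid, pts in clusters.items()}
    for p in noise:
        cid, dist = min(((c, np.linalg.norm(p - mu)) for c, mu in centroids.items()),
                        key=lambda t: t[1])
        if dist <= max_dist:
            clusters[cid] = np.vstack([clusters[cid], p])
    return clusters
```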
[0066] At block 402, the object detection system 101 may be configured to generate a cuboid for each of the one or more objects in the 3D representation. The cuboid for an object from the one or more objects is generated based on the corresponding cluster from the one or more clusters. In the preferred embodiment, each cluster from the one or more clusters is used to generate the cuboid to enclose boundary points of the respective object from the one or more objects.
[0067] At block 403, the object detection system 101 may be configured to project at least one point of interest from the corresponding cuboid of each of the one or more clusters onto the 2D image of at least part of the 3D environment. The object detection system 101 may be configured to receive the 2D image from the 2D image capture unit 103. In the preferred embodiment, the at least one point of interest of a cluster from the one or more clusters may be at least one of a point at the centroid, a point on an edge and a point at a vertex of the cuboid associated with the cluster. In the preferred embodiment, the projection is a 3D-to-2D transform and gives the location of the coordinates of the one or more objects in the 2D image.
[0068] At block 404, the object detection system 101 may be configured to locate one or more regions of interest of the one or more objects on the 2D image based on the projection. Each of the one or more regions of interest represents an object in the 3D environment and hence the one or more objects are detected by the object detection system 101. An end-to-end sketch chaining blocks 401-404 is given below.
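Purely for illustration, the per-step sketches given earlier can be chained into one pipeline covering blocks 401-404; the calibration matrices K and T and all function names are assumptions of those sketches, not features of the disclosure.

```python
# Illustrative end-to-end sketch reusing detect_clusters, correct_misplaced_points,
# cluster_cuboid, project_points and region_of_interest from the earlier sketches.
import numpy as np

def detect_objects(points: np.ndarray, image: np.ndarray, K: np.ndarray, T: np.ndarray):
    clusters, noise = detect_clusters(points)              # block 401
    clusters = correct_misplaced_points(clusters, noise)   # block 401 (cluster update)
    h, w = image.shape[:2]
    rois = []
    for pts in clusters.values():
        vertices, _ = cluster_cuboid(pts)                  # block 402
        uv = project_points(vertices, K, T)                # block 403
        rois.append(region_of_interest(uv, w, h))          # block 404
    return rois
```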
[0069] In the preferred embodiment, output from the object detection system 101 may be provided to the trained classifier 104. That is, the object detection system 101 may be configured to transmit the one or more regions of interest of the one or more objects from the 2D image to the trained classifier 104. In the preferred embodiment, the trained classifier 104 may be configured to classify the one or more objects present in the 3D environment using the output of the object detection system 101. For example, the regions of interest in Figure 3e may be provided to the trained classifier 104. In the preferred embodiment, the trained classifier 104 may be trained using a plurality of classes associated with objects in the 3D environment. For the exemplary 3D environment illustrated in Figures 3a-3e, the plurality of classes may include, but are not limited to, car, bicycle, tree, building, pedestrian and so on. Using the regions of interest, the trained classifier 104 may be configured to classify the one or more objects as being associated with a class from the plurality of classes. For example, using the regions of interest 305.1, 305.2...305.6, objects within the regions of interest are detected and classified as one of a car, a bicycle and so on (see the sketch below).
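As a hedged illustration of this classification stage, the located regions of interest could be cropped from the 2D image and passed to any trained classifier; the ResNet-18 backbone and ImageNet weights below are stand-ins only, and in practice the classifier would be trained on the classes of interest (car, bicycle, tree, building, pedestrian and so on). The function name and preprocessing are assumptions of this sketch.

```python
# Illustrative sketch: classify image crops taken from the located regions of
# interest. The backbone and weights are placeholders, not the patented classifier.
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
classifier = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def classify_rois(image, rois):
    """image: (H, W, 3) uint8 array; rois: list of (u_min, v_min, u_max, v_max)."""
    predictions = []
    with torch.no_grad():
        for u0, v0, u1, v1 in rois:
            crop = image[int(v0):int(v1) + 1, int(u0):int(u1) + 1]
            logits = classifier(preprocess(crop).unsqueeze(0))
            predictions.append(int(logits.argmax(dim=1)))
    return predictions
```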
[0070] As illustrated in Figure 4, the method 400 may include one or more blocks for executing processes in the object detection system 101. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions or implement particular abstract data types.
[0071] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the computer-implemented method. Additionally, individual blocks may be deleted from the method without departing from the scope of the subject matter described herein. Furthermore, the computer-implemented method can be implemented in any suitable hardware, software, firmware, or combination thereof.
[0072] The proposed disclosure teaches simpler techniques by eliminating the need for complex neural networks for the projection of the 3D data onto the 2D data and for the detection of objects. Thus, the complexity of the overall system and the processing time required for object detection may be reduced. As a result, the proposed method and system may be implemented for real-time and dynamic detection of objects.
[0073] The present disclosure does not use any complex deep learning models. The proposed computer-implemented method is independent of any input data biases and performs well without any heavy processing power.
Computing System
[0074] Figure 5 illustrates a block diagram of an exemplary computer system 500 for implementing embodiments consistent with the present disclosure. In an embodiment, the computer system 500 is used to implement the object detection system 101 for performing detection of the objects in the 3D environment. The computer system 500 may include a central processing unit ("CPU" or "processor") 502. The processor 502 may include at least one data processor for executing processes in a Virtual Storage Area Network. The processor 502 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
[0075] The processor 502 may be disposed in communication with one or more input/output (I/O) devices 509 and 510 via an I/O interface 501. The I/O interface 501 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), radio frequency (RF) antennas, S-Video, VGA, IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[0076] Using the I/O interface 501, the computer system 500 may communicate with the one or more I/O devices 509 and 510. For example, the input devices 509 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, stylus, scanner, storage device, transceiver, video device/source, etc. The output devices 510 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, Plasma Display Panel (PDP), Organic light-emitting diode display (OLED) or the like), audio speaker, etc.
[0077] In some embodiments, the computer system 500 may consist of the object detection system 101. The processor 502 may be disposed in communication with a communication network (not shown in the figure) via a network interface 503. The network interface 503 may communicate with the communication network. The network interface 503 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 503 and the communication network, the computer system 500 may communicate with at least one of a 3D representation capture unit 511, a 2D image capture unit 512 and a trained classifier 513, for performing the detection of the one or more objects in the 3D environment.
[0078] The communication network includes, but is not limited to, a direct interconnection, an e-commerce network, a peer-to-peer (P2P) network, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, Wi-Fi, and such. The first network and the second network may either be dedicated networks or shared networks, which represent an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the first network and the second network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
[0079] In some embodiments, the processor 502 may be disposed in communication with a memory 505 (e.g., RAM, ROM, etc., not shown in Figure 5) via a storage interface 504. The storage interface 504 may connect to the memory 505 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fibre channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
[0080] The memory 505 may store a collection of program or database components, including, without limitation, a user interface 506, an operating system 507, a web browser 508, etc. In some embodiments, the computer system 500 may store user/application data, such as the data, variables, records, etc., as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle® or Sybase®.
[0081] The operating system 507 may facilitate resource management and operation of the computer system 500. Examples of operating systems include, without limitation, APPLE MACINTOSH® OS X, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION™ (BSD), FREEBSD™, NETBSD™, OPENBSD™, etc.), LINUX DISTRIBUTIONS™ (e.g., RED HAT™, UBUNTU™, KUBUNTU™, etc.), IBM™ OS/2, MICROSOFT™ WINDOWS™ (XP™, VISTA™/7/8/10, etc.), APPLE® iOS™, GOOGLE® ANDROID™, BLACKBERRY® OS, or the like.
[0082] In some embodiments, the computer system 500 may implement a web browser 508 stored program component. The web browser 508 may be a hypertext viewing application, such as Microsoft Internet Explorer, Google Chrome, Mozilla Firefox, Apple Safari, etc. Secure web browsing may be provided using Hypertext Transport Protocol Secure (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), etc. Web browsers 508 may utilize facilities such as AJAX, DHTML, Adobe Flash, JavaScript, Java, Application Programming Interfaces (APIs), etc. In some embodiments, the computer system 500 may implement a mail server stored program component. The mail server may be an Internet mail server such as Microsoft Exchange, or the like. The mail server may utilize facilities such as ASP, ActiveX, ANSI C++/C#, Microsoft .NET, Common Gateway Interface (CGI) scripts, Java, JavaScript, PERL, PHP, Python, WebObjects, etc. The mail server may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), Microsoft Exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like. In some embodiments, the computer system 500 may implement a mail client stored program component. The mail client may be a mail viewing application, such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Mozilla Thunderbird, etc.
[0083] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, non-volatile memory, hard drives, Compact Disc (CD) ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0084] The described operations may be implemented as a method, system or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The described operations may be implemented as code maintained in a "non-transitory computer readable medium", where a processor may read and execute the code from the computer readable medium. The processor is at least one of a microprocessor and a processor capable of processing and executing the queries. A non-transitory computer readable medium may include media such as magnetic storage media (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, DVDs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, flash memory, firmware, programmable logic, etc.), etc. Further, non-transitory computer-readable media include all computer-readable media except for a transitory, propagating signal. The code implementing the described operations may further be implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.).
[0085] An "article of manufacture" includes non-transitory computer readable medium, and /or hardware logic, in which code may be implemented. A device in which the code implementing the described embodiments of operations is encoded may include a computer readable medium or hardware logic. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the invention, and that the article of manufacture may include suitable information bearing medium known in the art.
[0086] The terms "an embodiment", "embodiment", "embodiments", "the embodiment", "the embodiments", "one or more embodiments", "some embodiments", and "one embodiment" mean "one or more (but not all) embodiments of the invention(s)" unless expressly specified otherwise.
[0087] The terms "including", "comprising", "having" and variations thereof mean "including but not limited to", unless expressly specified otherwise.
[0088] The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise.
[0089] The terms "a", "an" and "the" mean "one or more", unless expressly specified otherwise.
[0090] A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention.
[0091] When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article, or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may alternatively be embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
[0092] The illustrated operations of Figure 4 show certain events occurring in a certain order. In alternative embodiments, certain operations may be performed in a different order, modified, or removed. Moreover, steps may be added to the above-described logic and still conform to the described embodiments. Further, operations described herein may occur sequentially or certain operations may be processed in parallel. Yet further, operations may be performed by a single processing unit or by distributed processing units.
[0093] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
[0094] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Reference numerals:
Reference Number	Description
100	Exemplary environment
101	Object detection system
102	3D representation capture unit
103	2D image capture unit
104	Trained classifier
105	Processor
106	I/O interface
107	Memory
108	Modules
109	Data
201	Cluster detection module
202	Cuboid generation module
203	Projection module
204	Locating module
205	Other modules
206	3D representation data
207	2D image data
208	Cluster data
209	Cuboid data
210	Projection data
211	ROI data
212	Other data
300	2D image
301.1	First object
301.2	Second object
301.3	Third object
301.4	Fourth object
301.5	Fifth object
301.6	Sixth object
302	3D representation
303.1	First cluster
303.2	Second cluster
303.3	Third cluster
303.4	Fourth cluster
303.5	Fifth cluster
303.6	Sixth cluster
304.1	First cuboid
304.2	Second cuboid
304.3	Third cuboid
304.4	Fourth cuboid
304.5	Fifth cuboid
304.6	Sixth cuboid
305.1	First region of interest
305.2	Second region of interest
305.3	Third region of interest
305.4	Fourth region of interest
305.5	Fifth region of interest
305.6	Sixth region of interest
500	Computer System
501	I/O Interface
502	Processor
503	Network Interface
504	Storage Interface
505	Memory
506	User Interface
507	Operating System
508	Web Browser
509	Input Devices
510	Output Devices
511	3D representation capture unit
512	2D image capture unit
513	Trained classifier

Claims (13)

1. We claim: A computer-implemented method for performing object detection for objects present in a 3-Dimensional (3D) environment, the computer-implemented method, performed by an object detection system, comprising: A) detecting one or more clusters of points in a 3D representation of a 3D environment, wherein each of the one or more clusters indicates an object from one or more objects present in the 3D environment; B) generating a cuboid for each of the one or more objects in the 3D representation, based on the one or more clusters detected in step A); C) projecting at least one point of interest from the corresponding cuboid of each of the one or more clusters of the 3D representation onto a 2-Dimensional (2D) image of at least a part of the 3D environment; and D) locating one or more regions of interest of the one or more objects on the 2D image based on the projection of the at least one point of interest from step C), for detecting the one or more objects.
2. The computer-implemented method of claim 1, further comprising: providing, by the object detection system, the one or more regions of interest of the one or more objects from the 2D image to a trained classifier, for classifying the one or more objects present in the 3D environment.
3. The computer-implemented method of any of claims 1 to 2, wherein the 3D environment is the surroundings of an autonomous vehicle, and the object detection system is implemented in the autonomous vehicle.
4. The computer-implemented method of any of claims 1 to 3, wherein each cluster from the one or more clusters in the 3D representation is used to generate the cuboid to enclose boundary points of the respective object from the one or more objects.
5. The computer-implemented method of any of claims 1 to 4, wherein the at least one point of interest of a cluster from the one or more clusters comprises at least one of a centroid point, an edge point and a vertex point of the cuboid associated with the cluster.
6. The computer-implemented method of any of claims 1 to 5, wherein detecting the one or more clusters comprises correcting misplaced points in the 3D representation to be part of the one or more clusters, to update the one or more clusters.
7. An object detection system for performing object detection for objects present in a 3-Dimensional (3D) environment, the object detection system comprising: a processor; and a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which, on execution, cause the processor to perform the computer-implemented method of any of the preceding claims.
8. The object detection system of claim 7, further comprising sensors for detecting 3D representations for use in step A).
9. The object detection system of any of claims 7 to 8, further comprising sensors for detecting 2-Dimensional (2D) images for use in step C).
10. The object detection system of any of claims 7 to 9, further comprising an interface configured to receive data of a 3D representation of a 3D environment and/or data of 2-Dimensional (2D) images.
11. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the computer-implemented method of claims 1 to 6.
12. A computer program product comprising instructions to cause the object detection system of claims 7 to 10 to execute the steps of the computer-implemented method of claims 1 to 6.
13. A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the computer-implemented method of claims 1 to 6.
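Illustrative sketch for the method of claims 1 to 4: steps A) to D) describe a point-cloud-to-image pipeline. The Python fragment below is a minimal, non-limiting sketch of one possible realisation, assuming a LiDAR-style point cloud given as an N x 3 array and a pinhole camera with intrinsic matrix K, rotation R and translation t; the use of DBSCAN for step A), the function names and the parameter values are assumptions for illustration only and are not part of the claimed subject matter.

import numpy as np
from sklearn.cluster import DBSCAN

def detect_clusters(points, eps=0.7, min_samples=10):
    # Step A): group 3D points into clusters, one cluster per candidate object.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == k] for k in set(labels) if k != -1]

def generate_cuboid(cluster):
    # Step B): axis-aligned cuboid enclosing the boundary points of the cluster.
    return cluster.min(axis=0), cluster.max(axis=0)

def project_point(p_world, K, R, t):
    # Step C): pinhole projection of a 3D point of interest onto the 2D image plane.
    u, v, w = K @ (R @ p_world + t)
    return np.array([u / w, v / w])

def locate_roi(lo, hi, K, R, t):
    # Step D): project the eight cuboid vertices and take their 2D bounding box as the ROI.
    xs, ys, zs = zip(lo, hi)
    corners = [np.array([x, y, z]) for x in xs for y in ys for z in zs]
    pixels = np.array([project_point(c, K, R, t) for c in corners])
    return pixels.min(axis=0), pixels.max(axis=0)  # (u_min, v_min), (u_max, v_max)

The image patch delimited by each ROI could then be passed to a trained classifier, as recited in claim 2.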
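Illustrative sketch for claim 5: the points of interest of a cuboid (centroid point, edge point, vertex point) can be derived directly from the axis-aligned bounds used in the sketch above. The helper below is an assumed example, not a definition of the claimed points of interest.

import numpy as np

def cuboid_points_of_interest(lo, hi):
    # Centroid of the axis-aligned cuboid.
    centroid = (lo + hi) / 2.0
    # The eight vertices: all combinations of the minimum/maximum coordinate per axis.
    xs, ys, zs = zip(lo, hi)
    vertices = np.array([[x, y, z] for x in xs for y in ys for z in zs])
    # One representative edge point: the midpoint of the edge joining the first two vertices.
    edge_point = (vertices[0] + vertices[1]) / 2.0
    return centroid, edge_point, vertices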
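Illustrative sketch for claim 6: one simple way to correct misplaced points, chosen here purely as an assumed strategy, is to reassign points that the clustering step left unassigned to the nearest existing cluster whenever they lie within a distance tolerance of that cluster's centroid.

import numpy as np

def correct_misplaced_points(points, clusters, max_dist=1.0):
    # Reassign stray points to the nearest cluster whose centroid lies within max_dist.
    centroids = [c.mean(axis=0) for c in clusters]
    clustered = {tuple(p) for c in clusters for p in c}
    updated = [list(c) for c in clusters]
    for p in points:
        if tuple(p) in clustered:
            continue  # already part of a cluster
        dists = [np.linalg.norm(p - m) for m in centroids]
        k = int(np.argmin(dists))
        if dists[k] <= max_dist:
            updated[k].append(p)  # the misplaced point becomes part of the nearest cluster
    return [np.array(c) for c in updated]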
GB2111294.1A 2021-08-05 2021-08-05 System and computer-implemented method for performing object detection for objects present in 3D environment Pending GB2609620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2111294.1A GB2609620A (en) 2021-08-05 2021-08-05 System and computer-implemented method for performing object detection for objects present in 3D environment


Publications (1)

Publication Number Publication Date
GB2609620A true GB2609620A (en) 2023-02-15

Family

ID=78049481


Country Status (1)

Country Link
GB (1) GB2609620A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200193606A1 (en) * 2017-04-11 2020-06-18 Zoox, Inc. Perspective Conversion for Multi-Dimensional Data Analysis
US20210004566A1 (en) * 2019-07-02 2021-01-07 GM Global Technology Operations LLC Method and apparatus for 3d object bounding for 2d image data
US20210232871A1 (en) * 2018-07-05 2021-07-29 Optimum Semiconductor Technologies Inc. Object detection using multiple sensors and reduced complexity neural networks



Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20230216 AND 20230222