CN115359457A - 3D target detection method and system based on fisheye image

3D target detection method and system based on fisheye image

Info

Publication number
CN115359457A
Authority
CN
China
Prior art keywords
fisheye
information
target
vehicle
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211019368.3A
Other languages
Chinese (zh)
Inventor
宋京
吴子章
王晓权
吴昀哲
王凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zongmu Technology Shanghai Co Ltd
Original Assignee
Zongmu Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zongmu Technology Shanghai Co Ltd filed Critical Zongmu Technology Shanghai Co Ltd
Priority to CN202211019368.3A
Publication of CN115359457A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure relates to a 3D target detection method and system based on multiple fisheye images. The method comprises the following steps: receiving a plurality of fisheye images captured at the same time from a plurality of fisheye cameras disposed around a vehicle; performing 3D target detection on each of the plurality of fisheye images to obtain the 3D information, and a corresponding confidence, of each target included in each fisheye image; mapping the 3D information of the targets detected in each fisheye image into a single coordinate system to obtain 3D perception information of the vehicle's surroundings; performing a deduplication operation on the 3D perception information, based on the confidences corresponding to the 3D information of the detected targets, to remove duplicate 3D information relating to the same target; and using the deduplicated 3D perception information for automatic driving and/or assisted driving of the vehicle.

Description

3D target detection method and system based on fisheye image
Technical Field
The disclosure relates to a 3D target detection method and system based on fisheye images.
Background
With the rapid development of fields such as robotics and autonomous driving, 3D object detection is becoming increasingly important. In the field of intelligent driving, various perception algorithms are required to acquire information such as the relative positions, sizes, and orientations of pedestrians, vehicles, and other road users, so that the ego vehicle can be controlled to avoid them.
Currently, monocular 3D object detection methods based on a single RGB image require additional annotations beyond traditional 2D detection labels, such as the 3D size of an object, its 3D coordinates in the camera coordinate system, and its deflection angle relative to the observer. However, most of these monocular 3D detection techniques rely on narrow-angle pinhole cameras, whose drawbacks are a small field of view and blind zones. For intelligent driving, sensing the surrounding environment with such cameras typically requires more than ten of them, which greatly increases processing time and fails to meet the real-time requirements of target detection.
The present disclosure improves upon, but is not limited to, the factors discussed above.
Disclosure of Invention
Therefore, the disclosure provides a 3D target detection method and system based on multiple fisheye cameras. The disclosed method uses a convolutional neural network to extract features and performs 3D target detection directly on the raw fisheye image, i.e., without distortion correction, thereby making full use of the information in the fisheye image (distortion correction loses image information, typically about one third), so that the detection results are more accurate. The disclosed method achieves not only higher accuracy in 3D target detection but also better real-time performance. Combined with a strategy for coordinate conversion and fusion of the detection results of multiple fisheye images captured at the same time, the method achieves accurate detection of targets in the environment around an intelligently driven vehicle, solves the problem that single-camera fisheye 3D detection performs poorly in severely distorted regions, and provides more reliable environment perception information to an automatic driving and/or assisted driving system, enabling the vehicle to make more reliable control and decision planning based on that information.
According to a first aspect of the present disclosure, there is provided a method for 3D target detection based on multiple fisheye images, comprising: receiving a plurality of fisheye images captured at the same time from a plurality of fisheye cameras disposed around a vehicle; performing 3D target detection on each of the plurality of fisheye images to obtain the 3D information, and corresponding confidence, of each target included in each fisheye image; mapping the 3D information of the targets detected in each fisheye image into a single coordinate system to obtain 3D perception information of the vehicle's surroundings; performing a deduplication operation on the 3D perception information, based on the confidences corresponding to the 3D information of the detected targets, to remove duplicate 3D information relating to the same target; and using the deduplicated 3D perception information for automatic driving and/or assisted driving of the vehicle.
According to an embodiment, the plurality of fisheye cameras are arranged to cover 360° of the vehicle's surroundings.
According to a further embodiment, performing 3D target detection on each of the plurality of fisheye images comprises: extracting features from the fisheye image through a neural network to obtain a feature map; for each pixel of the feature map: constructing a matrix by combining the pixel coordinates with distortion parameters and depth information associated with the fisheye camera that captured the fisheye image; processing the resulting matrix through the neural network to obtain a position code corresponding to the pixel's position; and combining the feature map with the obtained position codes through an attention mechanism to obtain a new feature map; and performing target detection on the new feature map through the neural network to obtain the 3D information of a target and the corresponding confidence.
According to a further embodiment, the neural network comprises four concatenated residual networks.
According to yet another embodiment, the method includes cropping each of the plurality of fisheye images, before performing 3D target detection, to remove image regions where targets appear infrequently.
According to a further embodiment, mapping the 3D information of the targets detected in each fisheye image into a single coordinate system comprises converting each fisheye image from its pixel coordinate system to the corresponding camera coordinate system, and from the corresponding camera coordinate system to the single coordinate system, thereby obtaining 3D perception information of the vehicle's surroundings in the single coordinate system.
According to a further embodiment, performing a deduplication operation on the 3D perception information comprises performing the deduplication using non-maximum suppression.
According to a second aspect of the present disclosure, there is provided a 3D target detection system based on multiple fisheye images, comprising: a plurality of fisheye cameras disposed on a vehicle; and an on-board computer, wherein the plurality of fisheye cameras are configured to capture a plurality of fisheye images at the same time and transmit the fisheye images to the on-board computer, and wherein the on-board computer is configured to: perform 3D target detection on each of the plurality of fisheye images to obtain the 3D information and corresponding confidence of the targets included in each fisheye image; map the 3D information of the targets detected in each fisheye image into a single coordinate system to obtain 3D perception information of the vehicle's surroundings; perform a deduplication operation on the 3D perception information, based on the confidences corresponding to the 3D information of detected targets, to remove duplicate 3D information relating to the same target; and use the deduplicated 3D perception information for automatic driving and/or assisted driving of the vehicle.
According to an embodiment, the on-board computer is further configured to crop each of the plurality of fisheye images to remove image regions where objects appear less frequently prior to 3D object detection.
According to a further embodiment, performing a deduplication operation on the 3D perception information comprises performing the deduplication using non-maximum suppression.
According to a third aspect of the present disclosure, there is provided a motor vehicle comprising a 3D object detection system according to the second aspect of the present disclosure.
Aspects generally include methods, apparatus, systems, computer program products, and processing systems substantially as described herein with reference to and as illustrated by the accompanying drawings.
The foregoing has outlined rather broadly the features and technical advantages of an example in accordance with the present disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter. The conception and specific examples disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Such equivalent constructions do not depart from the scope of the appended claims. The features of the concepts disclosed herein, both as to their organization and method of operation, together with associated advantages, will be better understood from the following description when considered in connection with the accompanying figures. Each of the figures is provided for the purpose of illustration and description and does not define the limitations of the claims.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects. The same reference numbers in different drawings may identify the same or similar elements.
Fig. 1 is a flow diagram of an example 3D object detection method based on multiple fisheye images, according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram of 3D perception information according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an example 3D object detection system based on multiple fisheye images, in accordance with an embodiment of the disclosure; and
FIG. 4 is a schematic view of an example motor vehicle according to an embodiment of the present disclosure.
Detailed Description
As described above, existing 3D target detection based on narrow-angle pinhole cameras often requires more than ten such cameras, which greatly increases processing time and fails to meet the real-time requirements of target detection.
The inventors recognized that surround-view fisheye cameras have a wide viewing angle, leave no blind zones, and can reduce occlusion between targets, which makes 3D target detection from fisheye images worthwhile. However, fisheye images exhibit severe target deformation and large image distortion, which complicates image processing, so existing image-based target detection methods do not transfer well to fisheye images. Research on 3D target detection from surround-view fisheye images is still immature; in particular, detection in severely distorted regions is poor and target recognition accuracy is low.
Therefore, the present disclosure proposes a 3D target detection method and system based on multiple fisheye cameras. The disclosed method uses a convolutional neural network to extract features and performs 3D target detection directly on the raw fisheye image, i.e., without distortion correction, thereby making full use of the information in the fisheye image (distortion correction loses image information, typically about one third), so that the detection results are more accurate. The disclosed method achieves not only higher accuracy in 3D target detection but also better real-time performance. Combined with a strategy for coordinate conversion and fusion of the detection results of multiple fisheye images captured at the same time, the method achieves accurate detection of targets in the environment around an intelligently driven vehicle, solves the problem that single-camera fisheye 3D detection performs poorly in severely distorted regions, and provides more reliable environment perception information to an automatic driving and/or assisted driving system, enabling the vehicle to make more reliable control and decision planning based on that information.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details to provide a thorough understanding of the various concepts. It will be apparent, however, to one skilled in the art that these concepts may be practiced without these specific details.
Referring to fig. 1, a flow diagram of an example 3D object detection method 100 based on multiple fisheye images is shown, according to an embodiment of the disclosure.
As shown in fig. 1, method 100 may include, at block 110, receiving a plurality of fisheye images captured at the same time from a plurality of fisheye cameras disposed around a vehicle.
In an embodiment of the present disclosure, the plurality of fisheye cameras may be arranged to cover 360° of the vehicle's surroundings; for example, one fisheye camera may be mounted on each of the front, rear, left, and right sides of the vehicle body. A small number of fisheye cameras can thus suffice, keeping the processing time short enough to meet the real-time requirements of 3D target detection. In this embodiment, the four fisheye cameras may be configured to capture images of the environment in front of, behind, to the left of, and to the right of the vehicle at a given moment, enabling 3D detection of objects of interest such as vehicles, pedestrians, various obstacles, road signs, and so on. For example, a fisheye camera mounted on the front of the vehicle (such as a monocular fisheye camera oriented along the vehicle's heading and laterally centered on the transverse axis of the vehicle body coordinate system) captures fisheye road images during driving. It will be appreciated that the plurality of fisheye cameras capture their images simultaneously to facilitate the subsequent detection, fusion, and other processing of the fisheye images.
In an embodiment of the present disclosure, the captured fisheye images are not distortion-corrected, and preferably the proportion of road in each captured image should not fall below a certain value (e.g., a predetermined threshold). Those skilled in the art will understand that the predetermined threshold can be any suitable value set as desired, and it is not described in detail here.
Next, at block 120, the method 100 may include performing 3D object detection according to each of the plurality of fisheye images to obtain 3D information and corresponding confidence of the object included in each fisheye image.
In an embodiment of the present disclosure, performing 3D target detection on each of the plurality of fisheye images may begin with extracting features from the fisheye image through a neural network to obtain a feature map. Then, for each pixel of the feature map, its coordinates in the first and second dimensions are combined with the distortion parameters and depth information associated with the fisheye camera that captured the image to construct a matrix; the matrix is processed by the neural network to obtain a position code corresponding to the pixel's position; and the feature map is combined with the obtained position codes through an attention mechanism to yield a new feature map. Finally, target detection is performed on the new feature map through the neural network to obtain the 3D information of each target and the corresponding confidence. In an embodiment of the present disclosure, the 3D information of a target may include its category, size, heading angle, position, and so on. Further according to this embodiment, different targets may be distinguished by labels; for example, a vehicle may be labeled 0, a pedestrian 1, and so on.
One example of 3D object detection of the present disclosure is given below.
In this example, the fisheye image may be a two-dimensional image of h × w pixels, where h is the number of pixels in the first dimension and w in the second. Feature extraction through the neural network yields c feature maps representing the approximate outlines of targets, where c is the number of channels (typically 128, 256, etc.). Each feature map is a two-dimensional map of h′ × w′ pixels, where h′ and w′ are fixed fractions of the fisheye image's h and w, respectively. The inventors recognized that if h′ and w′ are too large, the search area becomes too big, training becomes harder, and both the real-time performance and accuracy of detection suffer; if they are too small, the features become too coarse and detection accuracy degrades. Thus, in a preferred embodiment of the present disclosure, h′ and w′ are chosen to be 1/8 or 1/4 of h and w, respectively, to balance the accuracy and real-time performance of 3D target detection. Then, for each of the h′ × w′ pixels of a feature map, a matrix is constructed combining the pixel coordinates with the distortion parameters and depth information associated with the fisheye camera that captured the image. Further according to this example, the depth information typically spans a range of 0-20 meters with one depth point every 0.5 meters, i.e., 40 depth points in total. One matrix is thus obtained per depth point, for a total of 40 matrices in this example. As those skilled in the art will understand, the distortion parameters are intrinsic constants of the fisheye camera and are not detailed here; the depth range may be chosen as any suitable range, and the depth-point step size may be any suitable value other than the 0.5 meters of this example. The matrices are then passed through the neural network to produce a position code corresponding to the pixel position (generally of the same dimension as the image features), after which the feature map is combined with the position codes via attention to obtain a new feature map. The new feature map is then used to detect the 3D information of targets and to derive the confidence corresponding to each target's 3D information.
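To make the position-encoding step above concrete, the following is a minimal PyTorch sketch under the assumptions of this example: a feature map at reduced resolution, 40 depth points covering 0-20 m at 0.5 m steps, and four fisheye distortion coefficients. The names (PositionEncoder, D_BINS, etc.), the MLP, and the single-head attention are illustrative assumptions, not the patent's actual network:

```python
import torch
import torch.nn as nn

H, W, C = 24, 40, 128   # feature-map height, width, channels (small sizes for illustration)
D_BINS = 40             # depth points: 0-20 m, one every 0.5 m, as in the example above
DIST_PARAMS = 4         # assumed number of fisheye distortion coefficients (k1..k4)

class PositionEncoder(nn.Module):
    """Maps (u, v, depth bins, distortion params) per pixel to a C-dim position code."""
    def __init__(self, c: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(D_BINS * (2 + 1 + DIST_PARAMS), c),
            nn.ReLU(inplace=True),
            nn.Linear(c, c),
        )

    def forward(self, dist: torch.Tensor) -> torch.Tensor:
        # Pixel-coordinate grid of the feature map.
        v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                              torch.arange(W, dtype=torch.float32), indexing="ij")
        depths = torch.arange(D_BINS, dtype=torch.float32) * 0.5  # 0.0, 0.5, ..., 19.5 m
        # Per pixel and per depth bin: [u, v, d, k1..k4] -> shape (H, W, D_BINS, 7).
        per_bin = torch.cat([
            u[..., None, None].expand(H, W, D_BINS, 1),
            v[..., None, None].expand(H, W, D_BINS, 1),
            depths[None, None, :, None].expand(H, W, D_BINS, 1),
            dist[None, None, None, :].expand(H, W, D_BINS, DIST_PARAMS),
        ], dim=-1)
        return self.mlp(per_bin.reshape(H, W, -1))   # (H, W, C) position codes

feat = torch.randn(1, C, H, W)                       # backbone feature map
codes = PositionEncoder(C)(torch.tensor([0.1, -0.02, 0.003, -0.0001]))

# Combine features and position codes through attention (single head for brevity):
attn = nn.MultiheadAttention(embed_dim=C, num_heads=1, batch_first=True)
q = feat.flatten(2).transpose(1, 2)      # (1, H*W, C): queries from image features
kv = codes.reshape(1, H * W, C)          # keys/values from position codes
new_feat, _ = attn(q, kv, kv)
new_feat = new_feat.transpose(1, 2).reshape(1, C, H, W)  # new feature map for the head
```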
In yet another embodiment of the present disclosure, the neural network employed by the method 100 may be a convolutional neural network, exploiting its strong feature extraction capability to perform 3D target detection directly on fisheye images. In this embodiment a single-stage detection approach is used, and the main structure of the neural network is similar to CenterNet, divided into a backbone and a detection head, with the backbone employing four serially connected residual networks (such as ResNet-18) to balance detection speed. Meanwhile, to avoid complicated mechanisms such as anchors, the method 100 uses Gaussian distributions to convert each target into a representation by one key point (e.g., the center point) or several key points (the center point plus several vertices), thereby directly establishing a relationship between the input image and a heatmap.
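As an illustration of this Gaussian key-point representation, the following sketch splats a target's center point as a 2D Gaussian on a per-class heatmap, in the style of CenterNet-like single-stage detectors; the radius choice and function names are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def draw_center_heatmap(heatmap: np.ndarray, cx: float, cy: float, radius: int) -> None:
    """Splat a 2D Gaussian at a target's center onto a class heatmap (in place)."""
    h, w = heatmap.shape
    sigma = max(radius / 3.0, 1e-6)
    for y in range(max(0, int(cy) - radius), min(h, int(cy) + radius + 1)):
        for x in range(max(0, int(cx) - radius), min(w, int(cx) + radius + 1)):
            g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heatmap[y, x] = max(heatmap[y, x], g)  # keep the peak where targets overlap

# One heatmap channel per class (e.g. 0 = vehicle, 1 = pedestrian) at feature-map
# resolution; the detection head then regresses 3D size, position, and heading for
# each heatmap peak.
heatmaps = np.zeros((2, 160, 240), dtype=np.float32)
draw_center_heatmap(heatmaps[0], cx=120.4, cy=80.7, radius=6)  # a vehicle center
```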
In yet another embodiment of the present disclosure, to improve the efficiency of object detection, the method 100 may crop each of the plurality of fisheye images to remove image regions where objects appear less frequently before performing 3D object detection.
For example, in the fisheye images of the present disclosure, the upper portion is generally sky and the lower portion is the area immediately next to the vehicle, and target objects to be detected generally do not appear in these regions. Corresponding proportions of the top and bottom of the fisheye image can therefore be cropped off without losing target information, reducing the image content to be processed and improving the speed and efficiency of target detection. For example, for a 1920×1280-pixel fisheye image, 200 pixel rows at the top and 210 at the bottom may be cut away, leaving a cropped image of 1920×870 pixels. Those skilled in the art will understand that any suitable cropping scheme may be used as long as no target information is lost, and it is not detailed here.
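A minimal sketch of this cropping step, using the 1920×1280 example above (the exact margins are per-setup choices, not values mandated by the disclosure):

```python
import numpy as np

TOP_CROP, BOTTOM_CROP = 200, 210  # pixel rows removed at top (sky) and bottom (near hood)

def crop_fisheye(image: np.ndarray) -> np.ndarray:
    """Drop the top and bottom rows, where targets to be detected rarely appear."""
    return image[TOP_CROP : image.shape[0] - BOTTOM_CROP, :, :]

img = np.zeros((1280, 1920, 3), dtype=np.uint8)    # an H x W x 3 fisheye frame
assert crop_fisheye(img).shape == (870, 1920, 3)   # 1280 - 200 - 210 = 870
```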
With continued reference to fig. 1, at block 130, method 100 may include mapping the 3D information of the targets detected in each fisheye image into a single coordinate system.
In an embodiment of the present disclosure, mapping the 3D information of the targets detected in each fisheye image into a single coordinate system may include fusing the multiple fisheye images in the single coordinate system through coordinate conversion. For example, the method 100 may convert each fisheye image from its pixel coordinate system to the corresponding camera coordinate system, and then from the corresponding camera coordinate system to the single coordinate system, thereby obtaining 3D perception information of the vehicle's surroundings.
In an example, the single coordinate system may be a world coordinate system associated with a bird's-eye view of the vehicle (e.g., a coordinate system with the center of the vehicle's front bumper as origin), whereby the plurality of fisheye images and the 3D information of the targets detected in them can be mapped into a single bird's-eye view. Those skilled in the art will appreciate that the single coordinate system may be any suitable coordinate system, which is not detailed here. Continuing with the example, the conversion relationship between the world coordinate system associated with the single bird's-eye view and the camera coordinate system of each fisheye camera may be:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T$$
where $[X_c, Y_c, Z_c]^T$ denotes coordinates in the camera coordinate system and $[X, Y, Z]^T$ coordinates in the world coordinate system. R is a rotation matrix and T a translation matrix. R and T are independent of the camera's internal structure and are therefore called the camera's "extrinsic parameters". It will be appreciated that the offset between the origins of the two coordinate systems has three degrees of freedom, since it is composed of components along the three directions x, y, and z.
The world coordinate system is converted into the camera coordinate system by rotating first about the z axis, then about the y axis, and then about the x axis, with rotation angles yaw, pitch, and roll respectively; the rotation matrix from the camera coordinate system to the world coordinate system is then defined as R = R_x · R_y · R_z, where:
$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\mathrm{roll}) & -\sin(\mathrm{roll}) \\ 0 & \sin(\mathrm{roll}) & \cos(\mathrm{roll}) \end{bmatrix}$$
$$R_y = \begin{bmatrix} \cos(\mathrm{pitch}) & 0 & \sin(\mathrm{pitch}) \\ 0 & 1 & 0 \\ -\sin(\mathrm{pitch}) & 0 & \cos(\mathrm{pitch}) \end{bmatrix}$$
$$R_z = \begin{bmatrix} \cos(\mathrm{yaw}) & -\sin(\mathrm{yaw}) & 0 \\ \sin(\mathrm{yaw}) & \cos(\mathrm{yaw}) & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
The translation matrix T, whose components $\mathrm{cam}_X$, $\mathrm{cam}_Y$, and $\mathrm{cam}_Z$ are extrinsic parameters of the camera, is:
$$T = \begin{bmatrix} \mathrm{cam}_X \\ \mathrm{cam}_Y \\ \mathrm{cam}_Z \end{bmatrix}$$
The relationship between the pixel coordinate system and the camera coordinate system is:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$
where $c_x$ and $c_y$ are offsets that are introduced because, owing to installation accuracy, the principal point is often not exactly at the center of the image plane, and $f_x$ and $f_y$ are the pixel focal lengths of the camera obtained through calibration.
The relationship between the world coordinate system and the pixel coordinate system is:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \left( R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T \right)$$
thus, the pixel coordinates can be converted into world coordinates by the coordinate conversion relationship.
In an embodiment of the present disclosure, it is taken into account that different fisheye images may include the same target; for example, images captured by the front and left fisheye cameras may both contain a target to the front-left of the vehicle. 3D target detection on the left and front fisheye images will then detect 3D information for the same target, making that 3D information redundant. Accordingly, the method 100 may include, at block 140, performing a deduplication operation on the 3D perception information of detected targets, based on the confidences corresponding to their 3D information, to remove duplicate 3D information relating to the same target.
For example, a first and a second fisheye image may have an overlap region in which a target (e.g., a vehicle, a pedestrian, etc.) appears, so that the method 100 detects the target's 3D information in both images, with a confidence of 0.7 in the first fisheye image and 0.9 in the second. In this case, the method 100 may remove the 3D information of the target detected in the first fisheye image within the overlap region and retain that detected in the second fisheye image, thereby overcoming the poor detection performance of single-camera fisheye detection, especially in severely distorted regions.
In a preferred embodiment of the present disclosure, performing the deduplication operation on the 3D perception information may include using non-maximum suppression (NMS) to find the best 3D information for each target (including the target's size, position, heading, etc.). For example, fig. 2 shows a schematic diagram of 3D perception information according to an embodiment of the present disclosure. The left side of fig. 2 shows the result of mapping the 3D detection results of multiple fisheye images into a single coordinate system: the detected target (to the right of the vehicle) has a "ghost", indicating that both the front and right fisheye cameras detected it. The right side of fig. 2 shows the best 3D information of the target after NMS deduplication.
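A simplified sketch of this confidence-based deduplication, as greedy non-maximum suppression over bird's-eye-view boxes pooled from all cameras; axis-aligned boxes are an assumption to keep the example short (a real system would likely use rotated-box IoU):

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x_min, y_min, x_max, y_max, score)

def bev_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned bird's-eye-view boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Keep the highest-confidence box among overlapping detections of one target."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(bev_iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept

# Same target seen by two cameras: the 0.9 detection survives, the 0.7 is removed.
front_cam = (4.0, 2.0, 8.5, 4.0, 0.7)
right_cam = (4.2, 2.1, 8.6, 4.1, 0.9)
print(nms([front_cam, right_cam]))
```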
In an embodiment of the present disclosure, the 3D perception information plays a crucial role in path planning and control in subsequent automatic driving scenarios. Thus, the method 100 may include, at block 150, using the deduplicated 3D perception information for automatic driving and/or assisted driving of the vehicle.
Referring to fig. 3, a schematic diagram of a 3D target detection system 300 based on multiple fisheye images is shown, according to an embodiment of the disclosure. As shown, the system 300 may include a plurality of fisheye cameras (such as fisheye cameras 301 and 303) and an on-board computer 307. In an embodiment of the present disclosure, the fisheye cameras are arranged on the vehicle and preferably cover 360° of the vehicle's surroundings. Although two fisheye cameras 301 and 303 are shown in fig. 3, it will be appreciated that the system 300 may include any suitable number of fisheye cameras, as indicated by ellipsis 305. Preferably, the system 300 may include four fisheye cameras disposed at the front, rear, left, and right of the vehicle.
According to an embodiment of the present disclosure, the plurality of fisheye cameras may be configured to capture a plurality of fisheye images at the same time and transmit them to the on-board computer 307, and the on-board computer 307 may be configured to: perform 3D target detection on each of the plurality of fisheye images to obtain the 3D information and corresponding confidence of the targets included in each fisheye image; map the 3D information of the targets detected in each fisheye image into a single coordinate system to obtain 3D perception information of the vehicle's surroundings; perform a deduplication operation on the 3D perception information of detected targets, based on the confidences corresponding to the 3D information, to remove duplicate 3D information relating to the same target; and use the deduplicated 3D perception information for automatic driving and/or assisted driving of the vehicle.
In yet another embodiment of the present disclosure, to improve detection efficiency, the on-board computer 307 may be configured to crop each of the plurality of fisheye images, before performing 3D target detection, to remove image regions where targets rarely appear. For example, in the fisheye images of the present disclosure, the upper portion is generally sky and the lower portion is the area immediately next to the vehicle, and target objects to be detected generally do not appear in these regions. Corresponding proportions of the top and bottom of the fisheye image can therefore be cropped off without losing target information, reducing the image content to be processed, improving the speed and efficiency of target detection, and lowering the false detection rate. For example, for a 1920×1280-pixel fisheye image, 200 pixel rows at the top and 210 at the bottom may be cut away, leaving a cropped image of 1920×870 pixels. Those skilled in the art will understand that any suitable cropping scheme may be used as long as no target information is lost, and it is not detailed here.
In an embodiment of the present disclosure, it is taken into account that different fisheye images may include the same target; for example, images captured by the front and left fisheye cameras may both contain a target to the front-left of the vehicle, so that 3D target detection on the left and front fisheye images detects 3D information for the same target, making that 3D information redundant. Thus, the on-board computer 307 may also be configured to perform a deduplication operation on the 3D perception information of detected targets, based on the confidences corresponding to their 3D information, to remove duplicate 3D information relating to the same target. In a preferred embodiment of the present disclosure, this deduplication may use non-maximum suppression (NMS) to find the best 3D information for each target (including its size, position, heading, etc.).
FIG. 4 shows a schematic view of an example motor vehicle 400 according to an embodiment of the present disclosure. In this embodiment, the motor vehicle 400 may include the 3D target detection system shown and described with reference to fig. 3.
As is apparent from the above description, a plurality of fisheye images without distortion correction are input to a neural network (including a 3D target detection model) for detection, the categories and 3D attributes of targets in the fisheye images are output, and the detection results are converted into 3D perception information of the corresponding vehicle environment, which is then used for automatic driving and/or assisted driving of the vehicle.
In summary, the multi-camera fisheye 3D target detection method, system, device, and computer-readable storage medium provided by the present disclosure can adapt to variations of target objects across different road environments through the strong feature extraction capability of the 3D target detection network, and can accurately describe the position of a target object in the vehicle body coordinate system by effectively obtaining its specific information, with low time consumption, high accuracy, and real-time capability. The present disclosure thus effectively overcomes various shortcomings of the prior art and has high industrial value.
It will be appreciated that although embodiments of the present disclosure are described herein with respect to automatic/assisted driving of a vehicle, the methods and systems of the present disclosure are equally applicable to other vehicles, such as boats, aircraft, and the like.
The foregoing detailed description includes references to the accompanying drawings, which form a part hereof. The drawings illustrate specific embodiments that can be practiced by way of illustration. These embodiments are also referred to herein as "examples". Such examples may include elements other than those shown or described. However, examples including the elements shown or described are also contemplated. Moreover, it is contemplated to use examples of any combination or permutation of those elements shown or described, or to refer to a particular example (or one or more aspects thereof) shown or described herein, or to refer to other examples (or one or more aspects thereof) shown or described herein.
In the appended claims, the terms "comprises," "comprising," and "includes" are open-ended; that is, a system, device, article, or process that includes elements in addition to those recited after such a term in a claim is still considered to fall within the scope of that claim. Furthermore, in the appended claims, the terms "first," "second," "third," etc. are used merely as labels and are not intended to impose a numerical order on their objects.
In addition, the order of operations illustrated in this specification is exemplary. In alternative embodiments, the operations may be performed in a different order than illustrated in the figures, and the operations may be combined into a single operation or split into additional operations.
The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in conjunction with other embodiments. Other embodiments may be used, such as by one of ordinary skill in the art, after reviewing the above description. The abstract allows the reader to quickly ascertain the nature of the technical disclosure. This Abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Furthermore, in the foregoing detailed description, various features may be grouped together to streamline the disclosure. However, the claims may not recite every feature disclosed herein because embodiments may characterize a subset of the features. Moreover, embodiments may include fewer features than are disclosed in a particular example. Thus the following claims are hereby incorporated into the detailed description, with one claim standing on its own as a separate embodiment. The scope of the embodiments disclosed herein should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (11)

1. A 3D target detection method based on a plurality of fisheye images, comprising:
receiving a plurality of fisheye images photographed at the same time from a plurality of fisheye cameras disposed around a vehicle;
performing 3D target detection according to each of the plurality of fisheye images to obtain 3D information and corresponding confidence of a target included in each fisheye image;
mapping the 3D information of the target detected in each fisheye image into a single coordinate system to obtain 3D perception information of the surroundings of the vehicle;
performing a deduplication operation on 3D perception information of a detected target based on a confidence corresponding to the 3D information to remove duplicate 3D information related to the same target; and
using the 3D perception information subjected to the deduplication operation for automatic driving and/or assisted driving of the vehicle.
2. The method of claim 1, wherein the plurality of fisheye cameras are arranged to cover 360 ° of the surroundings of the vehicle.
3. The method of claim 1, wherein performing 3D object detection from each of the plurality of fisheye images comprises:
extracting features from the fisheye image through a neural network to obtain a feature map;
for each pixel of the feature map:
constructing a matrix by combining the pixel coordinates with distortion parameters and depth information associated with a fisheye camera taking the fisheye image;
processing the obtained matrix by the neural network to obtain a position code corresponding to the position of the pixel; and
combining the feature map with the obtained position code by an attention mechanism to obtain a new feature map; and
performing target detection on the new feature map through the neural network to obtain the 3D information of the target and the corresponding confidence.
4. The method of claim 3, wherein the neural network comprises four concatenated residual networks.
5. The method of claim 1, wherein each of the plurality of fisheye images is cropped to remove image regions where objects appear less frequently before 3D object detection.
6. The method of claim 1, wherein mapping the 3D information of the object detected in each fisheye image into a single coordinate system comprises:
converting each fisheye image from the pixel coordinate system to the corresponding camera coordinate system, and then from the corresponding camera coordinate system to the single coordinate system, thereby obtaining 3D perception information of the surrounding environment of the vehicle in the single coordinate system.
7. The method of claim 1, wherein performing a deduplication operation on the 3D perception information comprises performing the deduplication operation using non-maximum suppression.
8. A multiple fisheye image based 3D object detection system comprising:
a plurality of fisheye cameras disposed on the vehicle; and
an on-board computer,
wherein the plurality of fisheye cameras are configured to capture a plurality of fisheye images at the same time and transmit the fisheye images to the on-board computer,
and wherein the on-board computer is configured to:
perform 3D target detection on each of the plurality of fisheye images to obtain 3D information and a corresponding confidence of a target included in each fisheye image;
map the 3D information of the target detected in each fisheye image into a single coordinate system to obtain 3D perception information of the surrounding environment of the vehicle;
perform a deduplication operation on the 3D perception information, based on a confidence corresponding to the 3D information of a detected target, to remove duplicate 3D information related to the same target; and
use the 3D perception information subjected to the deduplication operation for automatic driving and/or assisted driving of the vehicle.
9. The system of claim 8, wherein the on-board computer is further configured to crop each of the plurality of fisheye images to remove image regions where objects appear less frequently prior to 3D object detection.
10. The system of claim 8, wherein performing a deduplication operation on the 3D perception information comprises performing the deduplication operation using non-maximum suppression.
11. A motor vehicle comprising a 3D object detection system according to any of claims 8-10.
CN202211019368.3A 2022-08-24 2022-08-24 3D target detection method and system based on fisheye image Pending CN115359457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211019368.3A 2022-08-24 2022-08-24 3D target detection method and system based on fisheye image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211019368.3A 2022-08-24 2022-08-24 3D target detection method and system based on fisheye image

Publications (1)

Publication Number Publication Date
CN115359457A 2022-11-18

Family

ID=84005001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211019368.3A Pending CN115359457A (en) 2022-08-24 2022-08-24 3D target detection method and system based on fisheye image

Country Status (1)

Country Link
CN (1) CN115359457A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495741A (en) * 2023-12-29 2024-02-02 成都货安计量技术中心有限公司 Distortion restoration method based on large convolution contrast learning
CN117495741B (en) * 2023-12-29 2024-04-12 成都货安计量技术中心有限公司 Distortion restoration method based on large convolution contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination