CN111712828A - Object detection method, electronic device and movable platform - Google Patents

Object detection method, electronic device and movable platform

Info

Publication number
CN111712828A
CN111712828A (application CN201980012209.0A)
Authority
CN
China
Prior art keywords
dimensional
acquiring
compensation value
information
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980012209.0A
Other languages
Chinese (zh)
Inventor
张磊杰
陈晓智
徐斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN111712828A publication Critical patent/CN111712828A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584 - Recognition of vehicle lights or traffic lights
    • G06V 20/60 - Type of objects
    • G06V 20/64 - Three-dimensional objects
    • G06V 20/647 - Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 - Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An object detection method, an electronic device, and a movable platform. The object detection method comprises the following steps: acquiring sparse point cloud data and an image of a scene to be detected (S201); projecting the sparse point cloud data and the image into a target coordinate system to acquire data to be processed (S202); and performing three-dimensional detection on the data to be processed to obtain a detection result of an object included in the scene to be detected (S203). Because object detection is realized by acquiring sparse point cloud data and an image, the required point cloud density is reduced, which in turn reduces the cost of object detection.

Description

Object detection method, electronic device and movable platform
Technical Field
Embodiments of the present application relate to the technical field of movable platforms, and in particular to an object detection method, an electronic device, and a movable platform.
Background
Obstacle detection is one of the key technologies of an automatic driving system. It detects obstacles such as vehicles and pedestrians in a road scene using sensors mounted on the vehicle, such as cameras, laser radars (lidars), and millimeter-wave radars. In an automatic driving scenario, the automatic driving system needs not only to obtain the position of an obstacle in the image but also to predict its three-dimensional positioning information. The accuracy of the three-dimensional positioning of obstacles directly affects the safety and reliability of the autonomous vehicle.
Currently, obstacle detection and positioning methods generally rely on accurate depth-measurement sensors, such as lidar sensors, that can obtain high-density point cloud data. However, accurate depth-measurement sensors are costly, which makes the automatic driving system expensive.
Disclosure of Invention
The embodiments of the present application provide an object detection method, an electronic device, and a movable platform, which can reduce the cost of object detection.
In a first aspect, an embodiment of the present application provides an object detection method, including:
acquiring sparse point cloud data and an image of a scene to be detected;
projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed;
and carrying out three-dimensional detection on the data to be processed to obtain a detection result of the object included in the scene to be detected.
In a second aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing a computer program;
a processor for executing the computer program, in particular for:
acquiring sparse point cloud data and an image of a scene to be detected;
projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed;
and carrying out three-dimensional detection on the data to be processed to obtain a detection result of the object included in the scene to be detected.
In a third aspect, an embodiment of the present application provides a movable platform, including: an electronic device provided by the second aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer storage medium, in which a computer program is stored, and the computer program, when executed, implements the object detection method according to the first aspect.
The embodiments of the present application provide an object detection method, an electronic device, and a movable platform. Since only sparse point cloud data needs to be acquired, the density of the point cloud data is reduced, which lowers the complexity of and the requirements on the electronic device and thereby reduces the cost of object detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application;
fig. 2 is a flowchart of an object detection method according to an embodiment of the present application;
fig. 3 is another flowchart of an object detection method according to an embodiment of the present disclosure;
fig. 4 is another flowchart of an object detection method according to an embodiment of the present disclosure;
fig. 5 is another flowchart of an object detection method according to an embodiment of the present disclosure;
fig. 6 is another flowchart of an object detection method according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a laser radar according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present application, "a plurality" means two or more than two.
In addition, in order to clearly describe the technical solutions of the embodiments of the present application, terms such as "first" and "second" are used in the embodiments to distinguish identical or similar items having substantially the same functions and effects. Those skilled in the art will appreciate that the terms "first", "second", etc. do not limit quantity or execution order, nor do they denote relative importance.
The embodiments of the present application can be applied to any field in which objects need to be detected. For example, in intelligent driving fields such as automatic driving and assisted driving, obstacles such as vehicles and pedestrians in a road scene can be detected. For another example, in the field of unmanned aerial vehicles, obstacles in the flight scene of the unmanned aerial vehicle can be detected. For another example, in the security field, objects entering a designated area can be detected. The object detection method provided by the embodiments of the present application is suitable for a low-complexity neural network and, while ensuring object detection accuracy, gives the detection scheme universality across multiple platforms.
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application. As shown in fig. 1, an intelligent driving vehicle includes a detection device. While the intelligent driving vehicle is driving, the detection device can identify and detect objects (such as fallen rocks, dropped objects, dead branches, pedestrians, vehicles, and the like) in the lane ahead, obtain detection information such as their three-dimensional positions, orientations, and three-dimensional sizes, and plan the driving behavior accordingly, such as changing lanes, decelerating, or stopping.
Alternatively, the detection device may include radar, ultrasonic detection device, Time of flight (TOF) ranging detection device, vision detection device, laser detection device, image sensor, and the like, and combinations thereof. The number and implementation types of the sensors are not limited in the embodiments of the present application. For example, the image sensor may be a camera, a video camera, or the like. The radar may be a general purpose lidar or a specific lidar meeting the requirements of a specific scene, such as a rotary scanning multiline lidar with multiple-transmit-multiple-receive sensors, etc.
It should be noted that fig. 1 is a schematic view of an application scenario of the present application, and the application scenario of the embodiment of the present application includes, but is not limited to, that shown in fig. 1.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart of an object detection method according to an embodiment of the present disclosure. In the object detection method provided by this embodiment, the execution subject may be an electronic device. As shown in fig. 2, the object detection method provided in this embodiment includes:
s201, acquiring sparse point cloud data and images of a scene to be detected.
Point cloud data refers to a set of points on the outer surface of an object obtained by a measuring device. Point cloud data may be classified into sparse point cloud data and dense point cloud data according to different criteria, for example according to the spacing between points and the number of points. When the spacing between points is large and the number of points is small, the data may be called sparse point cloud data. When the spacing between points is small and the number of points is large, the data may be called dense point cloud data or high-density point cloud data.
Acquiring a dense point cloud requires a lidar with a large number of beams scanning at a higher frequency. Such high-line-count lidars are costly to use, and continuously operating a lidar at a higher scanning frequency shortens its service life. Other methods of obtaining dense point clouds, such as stitching together the point clouds of multiple single-line lidars, require complex algorithms and make the system relatively less robust.
Since only sparse point cloud data needs to be acquired, the difficulty of acquiring the point cloud data is reduced compared with acquiring high-density point cloud data, as are the requirements on the equipment and its cost. Therefore, in everyday application scenarios, sparse point clouds offer better practical value than dense point clouds.
It should be noted that the scene to be detected is not limited in this embodiment and may differ according to the type of the electronic device and the application scenario. For example, when the electronic device is applied to an autonomous vehicle, the scene to be detected may be the road ahead of the vehicle. When the electronic device is applied to an unmanned aerial vehicle (UAV), the scene to be detected may be the flight environment of the UAV.
Optionally, acquiring sparse point cloud data and an image of a scene to be detected may include:
and acquiring the sparse point cloud data through a radar sensor, and acquiring the image through an image sensor.
The number, installation position and implementation manner of the radar sensors and the image sensors are not limited in this embodiment. For example, the image sensor may be a camera, a webcam, or the like. The number of image sensors may be one. The number of radar sensors may be one or more than one.
Optionally, when the number of the radar sensors is greater than 1, acquiring sparse point cloud data by the radar sensors may include:
and respectively acquiring corresponding first sparse point cloud data through each radar sensor.
And projecting the first sparse point cloud data corresponding to each radar sensor into a target radar coordinate system according to the external parameters of at least one radar sensor to obtain sparse point cloud data.
Specifically, there are a plurality of radar sensors. The radar coordinate systems corresponding to the radar sensors have a certain conversion relationship, which may be determined by the external parameters of the radar sensors and is not limited in this embodiment. The external parameters of a radar sensor include, but are not limited to, its arrangement, position, orientation angle, carrier velocity, acceleration, and the like. The first sparse point cloud data acquired by each radar sensor can be projected into a target radar coordinate system, so as to acquire the sparse point cloud data in the target radar coordinate system. Optionally, the target radar coordinate system may be the radar coordinate system corresponding to any one of the plurality of radar sensors, or it may be another radar coordinate system that has a certain conversion relationship with the radar coordinate system corresponding to each radar sensor.
This is illustrated by way of example.
Assume that there are 2 radar sensors, referred to as radar sensor 1 and radar sensor 2. The target radar coordinate system may be the radar coordinate system 1 corresponding to radar sensor 1. The sparse point cloud data may then include the sparse point cloud data acquired by radar sensor 1 together with the sparse point cloud data acquired by radar sensor 2 after it has been projected into radar coordinate system 1.
Optionally, if overlapping point cloud data exists after the first sparse point cloud data corresponding to each radar sensor is projected into the target radar coordinate system, data deduplication is performed.
Through data deduplication processing, the effectiveness of the acquired sparse point cloud data is improved, and therefore the accuracy of object detection is improved.
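As an illustrative, non-limiting sketch of the projection and deduplication steps above (NumPy is assumed, and the 4 x 4 transform T_1_2 derived from the external parameters is a name introduced here, not part of the original disclosure), the fusion of two radar sensors' first sparse point cloud data into radar coordinate system 1 could look as follows:

```python
import numpy as np

def fuse_radar_point_clouds(points_radar1, points_radar2, T_1_2, grid=0.05):
    """Project radar sensor 2's first sparse point cloud data into radar
    coordinate system 1 and merge it with radar sensor 1's data.

    points_radar1, points_radar2: (N, 3) arrays of (x, y, z) in each sensor's frame.
    T_1_2: 4x4 homogeneous transform from radar frame 2 to radar frame 1,
           derived from the sensors' external parameters (position, orientation, etc.).
    """
    ones = np.ones((points_radar2.shape[0], 1))
    pts2_h = np.hstack([points_radar2, ones])            # homogeneous coordinates, (N, 4)
    pts2_in_1 = (T_1_2 @ pts2_h.T).T[:, :3]              # radar 2 points in radar frame 1

    merged = np.vstack([points_radar1, pts2_in_1])

    # Optional deduplication: points that coincide after rounding to a small
    # grid (illustrative 5 cm cell) are treated as overlapped data and kept once.
    keys = np.round(merged / grid).astype(np.int64)
    _, unique_idx = np.unique(keys, axis=0, return_index=True)
    return merged[np.sort(unique_idx)]
```

The overlap criterion used here (a fixed grid cell) is only one possible choice; the embodiment does not prescribe how overlapped points are identified.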
Alternatively, the sparse point cloud data may include three-dimensional position coordinates of each point, which may be labeled (x, y, z). Optionally, the sparse point cloud data may further include a laser reflection intensity value for each point.
S202, projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed.
Specifically, by projecting the sparse point cloud data and the image into a target coordinate system, the sparse point cloud data can be matched with the pixels in the image, which improves the effectiveness of the data to be processed. Optionally, the target coordinate system may be the image coordinate system corresponding to the image sensor; it may also be another coordinate system, which is not limited in this embodiment.
Optionally, projecting the sparse point cloud data and the image into a target coordinate system to obtain data to be processed, which may include:
and projecting the sparse point cloud data and the image into an image coordinate system through external parameters of the radar sensor and the image sensor to acquire data to be processed.
Specifically, a certain conversion relationship exists between the radar coordinate system corresponding to the radar sensor and the image coordinate system corresponding to the image sensor; this conversion relationship may be determined by the external parameters of the radar sensor and the image sensor, which is not limited in this embodiment. In this implementation, the target coordinate system is the image coordinate system corresponding to the image sensor. By projecting the sparse point cloud data and the image into the image coordinate system, the sparse point cloud data can be accurately mapped to and matched with some of the pixels in the image, and the sparse point cloud data falling outside the image coordinate system can be filtered out. For example, assume that the image has a length H and a width W. By projecting the sparse point cloud data and the image into the image coordinate system, the sparse point cloud data outside the H x W range can be filtered out, and the data to be processed is obtained.
Optionally, the data to be processed may include: the coordinate values and reflectivity of each point of the sparse point cloud data projected into the target coordinate system, and the coordinate values of the pixel points of the image in the target coordinate system.
It should be noted that when the sparse point cloud data and the image are projected into the target coordinate system, the points in the sparse point cloud data and the pixel points in the image are not necessarily fully matched. Where a point is matched to a pixel, the reflectivity of the corresponding point in the target coordinate system may be the laser reflection intensity value of that point in the sparse point cloud data. Where a pixel has no matching point, the reflectivity of the corresponding point in the target coordinate system may be set to 0.
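The projection and matching described above can be sketched as follows. This is a minimal, non-limiting example assuming NumPy, a known 4 x 4 radar-to-camera transform T_cam_radar, and a 3 x 3 camera intrinsic matrix K (the variable names are introduced here for illustration); it filters out points outside the H x W range and sets the reflectivity of unmatched pixels to 0:

```python
import numpy as np

def build_data_to_be_processed(points, intensities, image, T_cam_radar, K):
    """Project sparse point cloud data into the image coordinate system.

    points:       (N, 3) point coordinates in the radar coordinate system.
    intensities:  (N,) laser reflection intensity value per point.
    image:        (H, W, 3) image from the image sensor.
    T_cam_radar:  4x4 transform from the radar frame to the camera frame (assumed known).
    K:            3x3 camera intrinsic matrix (assumed known).
    Returns the image with 4 extra channels: projected (x, y, z) and reflectivity,
    left as zeros where no point maps to a pixel.
    """
    H, W = image.shape[:2]
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_radar @ pts_h.T).T[:, :3]          # points in the camera frame
    in_front = pts_cam[:, 2] > 0                        # keep points in front of the camera
    pts_cam, intensities = pts_cam[in_front], intensities[in_front]

    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                         # pixel coordinates (u, v)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)

    # Filter out sparse point cloud data outside the H x W image range.
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v = u[inside], v[inside]
    pts_cam, intensities = pts_cam[inside], intensities[inside]

    extra = np.zeros((H, W, 4), dtype=np.float32)       # unmatched pixels stay 0
    extra[v, u, :3] = pts_cam
    extra[v, u, 3] = intensities
    return np.concatenate([image.astype(np.float32), extra], axis=-1)
```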
And S203, carrying out three-dimensional detection on the data to be processed, and acquiring a detection result of an object included in the scene to be detected.
Optionally, the information of the object may include at least one of: three-dimensional position information, orientation information, three-dimensional size information, and depth values of objects.
As can be seen, according to the object detection method provided by this embodiment, by acquiring the sparse point cloud data and the image of the scene to be detected, the detection result of the object included in the scene to be detected can be acquired based on the sparse point cloud data and the image. As only sparse point cloud data needs to be acquired, the density of the point cloud data is reduced, the complexity and requirements of electronic equipment are reduced, and the cost of object detection is reduced.
Fig. 3 is another flowchart of an object detection method according to an embodiment of the present disclosure. As shown in fig. 3, in the step S203, performing three-dimensional detection on the data to be processed to obtain a detection result of an object included in the scene to be detected may include:
s301, inputting the data to be processed into the basic network model, and acquiring the characteristic diagram.
The basic network model can be pre-trained and used for outputting a characteristic diagram according to data to be processed. It should be noted that, in this embodiment, implementation of the basic network model is not limited, and different neural network models, for example, convolutional neural network models, may be adopted according to actual requirements. The basic network model can comprise a plurality of layers of convolution and pooling operations according to actual requirements, and finally outputs a feature map.
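Purely for illustration, a minimal basic network model of this kind could be written as follows (PyTorch is assumed; the number of layers and channels, and the 7 input channels of image plus projected point-cloud data, are illustrative choices rather than part of the embodiment):

```python
import torch
import torch.nn as nn

class BasicNetwork(nn.Module):
    """Minimal convolution-and-pooling backbone that maps the data to be
    processed (image channels plus projected point-cloud channels) to a
    feature map. The layer sizes are illustrative only."""

    def __init__(self, in_channels=7, feat_channels=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, feat_channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):                 # x: (B, in_channels, H, W)
        return self.features(x)           # feature map: (B, feat_channels, H/4, W/4)
```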
S302, inputting the feature map into the candidate area network model, and obtaining a two-dimensional frame of the candidate object.
The candidate area network model may be trained in advance and used for outputting the two-dimensional frames of candidate objects according to the feature map. It should be noted that this embodiment does not limit the implementation of the candidate area network model; different neural network models, for example convolutional neural network models, may be adopted according to actual requirements. The two-dimensional frames of the candidate objects correspond to the objects included in the scene to be detected, and each object included in the scene to be detected may correspond to the two-dimensional frames of a plurality of candidate objects.
It should be noted that this step does not distinguish the specific type of the object. For example, assuming that the objects included in the scene to be detected are 2 vehicles and 1 pedestrian, 100 two-dimensional frames of candidate objects may be acquired; the 2 vehicles and 1 pedestrian then jointly correspond to these 100 two-dimensional frames. In the subsequent steps, it can be determined to which object each of the 100 two-dimensional frames corresponds.
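Likewise, a minimal candidate area network head that predicts, for every feature-map position, an object probability and a two-dimensional frame could be sketched as follows (PyTorch assumed; the layer layout and the (x1, y1, x2, y2) box encoding are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CandidateAreaNetwork(nn.Module):
    """Minimal candidate-area head: for every pixel of the feature map it
    predicts the probability of belonging to an object and a two-dimensional
    frame. The head layout is illustrative only."""

    def __init__(self, feat_channels=128):
        super().__init__()
        self.conv = nn.Conv2d(feat_channels, feat_channels, 3, padding=1)
        self.objectness = nn.Conv2d(feat_channels, 1, 1)   # per-pixel object probability
        self.box = nn.Conv2d(feat_channels, 4, 1)          # per-pixel 2D frame (x1, y1, x2, y2)

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        prob = torch.sigmoid(self.objectness(h))           # (B, 1, H', W')
        boxes = self.box(h)                                 # (B, 4, H', W')
        return prob, boxes
```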
S303, determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object, and acquiring a compensation value of the information of the object.
Specifically, which objects are included in the scene to be detected may be determined according to the two-dimensional frames of the candidate objects, and a compensation value of the information of each such object may be obtained.
Alternatively, the compensation value of the information of the object may include, but is not limited to, at least one of the following: a compensation value for an orientation of the object, a compensation value for three-dimensional position information of the object, a compensation value for a three-dimensional size of the object, and a compensation value for a two-dimensional frame of the object.
Wherein the compensation value of the orientation of the object is a difference value between the actual value of the orientation of the object and the preset orientation. The compensation value of the three-dimensional position information of the object is a difference value between an actual value of the three-dimensional position of the object and a preset three-dimensional position. The compensation value of the three-dimensional size of the object is a difference value between an actual value of the three-dimensional size of the object and a preset three-dimensional size. The compensation value of the two-dimensional frame of the object is the difference value between the actual value and the preset value of the two-dimensional frame of the object. It should be noted that, in this embodiment, specific values of the preset orientation, the preset three-dimensional position, the preset three-dimensional size, and the preset value of the two-dimensional frame of the object are not limited. For example, for a vehicle, the preset three-dimensional position may be a three-dimensional position of a center point of a chassis of the vehicle. The preset three-dimensional size may be different according to the model of the vehicle.
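In other words, for any quantity q of the object (orientation, three-dimensional position, three-dimensional size, or two-dimensional frame), the compensation value is the offset from its preset reference, so the actual value can later be recovered by adding the compensation value back (the notation below is introduced here for clarity):

```latex
\Delta q = q_{\mathrm{actual}} - q_{\mathrm{preset}}
\qquad\Longrightarrow\qquad
q_{\mathrm{actual}} = q_{\mathrm{preset}} + \Delta q
```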
And S304, acquiring the information of the object according to the compensation value of the information of the object.
In the object detection method provided by this embodiment, the data to be processed is sequentially input into the basic network model and the candidate area network model, so that the two-dimensional frames of the candidate objects can be obtained; the objects included in the scene to be detected and the compensation values of their information are determined according to the two-dimensional frames of the candidate objects, and the information of each object is then obtained according to the compensation value of that information. Compared with directly acquiring the information of the object, first acquiring the compensation value of the information of the object is easier to implement and more accurate, which improves the accuracy of object detection.
Fig. 4 is another flowchart of an object detection method according to an embodiment of the present disclosure. As shown in fig. 4, the step S302 of inputting the feature map into the candidate area network model to obtain the two-dimensional frame of the candidate object may include:
s401, obtaining the probability that each pixel point in the image belongs to the object according to the characteristic diagram.
S402, if the first pixel is determined to belong to the object according to the probability that each pixel point belongs to the object, a two-dimensional frame of the object corresponding to the first pixel is obtained.
And S403, acquiring a two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel.
The following description is made with reference to examples.
Assume that the resolution of the image is 100 x 50, i.e. 5000 pixels. The probability that each pixel point in the 5000 pixel points belongs to the object can be obtained according to the feature map. Assuming that the probability that the pixel 1 belongs to the object is P1, it is determined that the pixel 1 belongs to the object according to the probability P1, and the pixel 1 may be referred to as a first pixel, so that the two-dimensional frame 1 of the object corresponding to the pixel 1 may be obtained. The probability that the pixel point 2 belongs to the object is P2, and the pixel point 2 is determined not to belong to the object according to the probability P2. The probability that the pixel point 3 belongs to the object is P3, and according to the probability P3, it can be determined that the pixel point 3 belongs to the object, and the pixel point 3 can be called as a first pixel point, so that the two-dimensional frame 3 of the object corresponding to the pixel point 3 can be obtained. Assuming that 200 first pixels are determined according to the probability that 5000 pixel points belong to the objects respectively, then two-dimensional frames of 200 objects can be obtained. Then, the two-dimensional frames of the object corresponding to the 200 first pixels and the 200 first pixels are further screened, and the two-dimensional frame of the candidate object is obtained from the two-dimensional frames of the 200 objects. For example, the number of the finally acquired two-dimensional frames of the candidate object may be 50.
It should be noted that the above numbers are merely examples, and the present embodiment does not limit the present invention.
In the object detection method provided by this embodiment, based on the probability that each pixel point in the image belongs to an object, the two-dimensional frames of the objects corresponding to a subset of the pixels are obtained first, and these two-dimensional frames are then further screened to determine the two-dimensional frames of the candidate objects, which improves the accuracy of acquiring the two-dimensional frames of the candidate objects.
Optionally, in S402, determining whether a pixel belongs to an object according to the probability that the pixel belongs to the object may include:
and if the probability that the pixel point belongs to the object is greater than or equal to the preset value, determining that the pixel point belongs to the object.
And if the probability that the pixel point belongs to the object is smaller than the preset value, determining that the pixel point does not belong to the object.
The specific value of the preset value is not limited in this embodiment.
Optionally, in S403, acquiring a two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel, may include:
and acquiring a first pixel to be processed from a first set consisting of a plurality of first pixels, deleting the first pixel to be processed from the first set, and acquiring an updated first set. The first pixel to be processed is the first pixel in the first set with the highest probability of belonging to the object.
And for each first pixel in the updated first set, acquiring a correlation value between each first pixel and the first pixel to be processed. The correlation value is used to indicate the degree of coincidence between the two-dimensional frame of the object corresponding to each first pixel and the two-dimensional frame of the object corresponding to the first pixel to be processed.
Deleting the first pixels with the correlation values larger than the preset value from the updated first set, and re-executing the steps of acquiring the first pixels to be processed and updating the first set until the first set does not include the first pixels, and determining the two-dimensional frames of the objects corresponding to all the first pixels to be processed as the two-dimensional frames of the candidate objects.
The following description is made with reference to examples.
Assume that there are 4 first pixels, labeled pixels 1-4, whose probabilities of belonging to an object are P1-P4, where P2 > P3 > P1 > P4. Pixels 1-4 constitute the initial first set. First, since the probability P2 of pixel 2 is the largest in the first set, pixel 2 is taken from the first set as the first pixel to be processed, and the first set is updated to { pixel 1, pixel 3, pixel 4 }. The correlation values between pixel 1, pixel 3, pixel 4 and pixel 2 are then obtained, denoted Q12, Q32, and Q42 respectively. If Q12 > the preset value Q and Q42 > the preset value Q, the two-dimensional frames of the objects corresponding to pixel 1 and pixel 4 overlap heavily with the two-dimensional frame of the object corresponding to pixel 2, so pixel 1 and pixel 4 are deleted from the first set, completing the deduplication with respect to the two-dimensional frame of the object corresponding to pixel 2. At this point the first set includes only pixel 3. Pixel 3 is then taken from the first set as the first pixel to be processed. After pixel 3 is deleted, the first set no longer includes any first pixel. The two-dimensional frames of the objects corresponding to pixel 2 and pixel 3 are therefore determined as the two-dimensional frames of the candidate objects, giving 2 candidate two-dimensional frames.
Therefore, a deduplication operation is performed on the two-dimensional frames of the objects corresponding to the first pixels, which greatly reduces the number of two-dimensional frames of candidate objects that are obtained, improves their effectiveness, and helps improve the accuracy of object detection.
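The deduplication over the first pixels is, in effect, a greedy non-maximum-suppression loop. The following non-limiting sketch (NumPy assumed) uses the intersection-over-union of the two-dimensional frames as the correlation value, which is one possible measure of the degree of coincidence; the embodiment does not fix a particular measure:

```python
import numpy as np

def iou(box_a, box_b):
    """Correlation value: overlap of two 2D frames, measured here as IoU (an assumption)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    ix1, iy1 = max(xa1, xb1), max(ya1, yb1)
    ix2, iy2 = min(xa2, xb2), min(ya2, yb2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    return inter / (area_a + area_b - inter + 1e-9)

def select_candidate_boxes(probs, boxes, overlap_threshold=0.5):
    """Greedy deduplication over the first pixels (S403).

    probs: (M,) probability that each first pixel belongs to an object.
    boxes: (M, 4) two-dimensional frame (x1, y1, x2, y2) of the object per first pixel.
    Returns the two-dimensional frames of the candidate objects.
    """
    remaining = list(np.argsort(-probs))        # first set, sorted by probability
    keep = []
    while remaining:
        i = remaining.pop(0)                    # first pixel to be processed
        keep.append(i)
        # Delete first pixels whose frames coincide too much with the processed one.
        remaining = [j for j in remaining
                     if iou(boxes[i], boxes[j]) <= overlap_threshold]
    return boxes[keep]
```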
Fig. 5 is another flowchart of an object detection method according to an embodiment of the present disclosure. As shown in fig. 5, in the step S303, determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object may include:
s501, inputting the two-dimensional frame of the candidate object into the first three-dimensional detection network model, and obtaining the probability that the candidate object belongs to each object in the preset objects.
The first three-dimensional detection network model may be trained in advance and configured to output, according to the two-dimensional frame of the candidate object, the probability that the candidate object belongs to each of the preset objects. It should be noted that this embodiment does not limit the implementation of the first three-dimensional detection network model; different neural network models, for example convolutional neural network models, may be adopted according to actual requirements. This embodiment also does not limit the specific categories of the preset objects; for example, the preset objects may include, but are not limited to, vehicles, bicycles, and pedestrians.
This is illustrated by way of example.
Assuming that the number of the two-dimensional frames of the candidate object is 3, the candidate object can be marked as candidate objects 1-3, and the two-dimensional frames of the candidate object can be marked as two-dimensional frames 1-3. The preset objects include vehicles and pedestrians. Then, after the two-dimensional frame of the candidate object is input into the first three-dimensional detection network model, the output result is as follows:
the object candidate 1: the probability of belonging to a vehicle is P11, and the probability of belonging to a pedestrian is P12.
The object candidate 2: the probability of belonging to a vehicle is P21, and the probability of belonging to a pedestrian is P22.
The object candidate 3: the probability of belonging to a vehicle is P31, and the probability of belonging to a pedestrian is P32.
S502, obtaining the objects included in the scene to be detected according to the probability that the candidate objects belong to each object in the preset objects.
Optionally, in one implementation, if the probability that a candidate object belongs to a first object among the preset objects is greater than a preset threshold corresponding to the first object, it is determined that the candidate object is an object included in the scene to be detected. This is explained using the example from S501. Assume that the preset threshold corresponding to vehicles is Q1 and the preset threshold corresponding to pedestrians is Q2. If P11 > Q1, P21 < Q1, and P31 > Q1, it can be determined that the scene to be detected includes candidate object 1 and candidate object 3, and that both are vehicles. If P12 < Q2, P22 < Q2, and P32 < Q2, it is determined that no pedestrian is included in the scene to be detected.
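A minimal sketch of this per-class threshold decision, following the example above (the dictionary layout and the example thresholds are illustrative assumptions), is:

```python
def select_objects(candidate_probs, thresholds):
    """Decide which candidate objects are objects in the scene to be detected (S502).

    candidate_probs: list of per-candidate dicts mapping preset object class to probability,
                     e.g. [{"vehicle": 0.92, "pedestrian": 0.03}, ...].
    thresholds:      preset threshold per class, e.g. {"vehicle": 0.5, "pedestrian": 0.5}.
    Returns (candidate_index, class_name) for every accepted candidate.
    """
    detections = []
    for idx, probs in enumerate(candidate_probs):
        for cls, p in probs.items():
            if p > thresholds[cls]:               # probability exceeds the class threshold
                detections.append((idx, cls))
    return detections
```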
It should be noted that, in S502, the object included in the scene to be detected is obtained according to the probability that the candidate object belongs to each object in the preset objects, and other implementation manners may also be provided, which is not limited in this embodiment.
Optionally, in S303, acquiring a compensation value of the information of the object may include:
by inputting the two-dimensional frame of the candidate object into the first three-dimensional detection network model, at least one of the following compensation values is also obtained: a compensation value for the orientation of the candidate object, a compensation value for the three-dimensional position information of the candidate object, a compensation value for the two-dimensional frame of the candidate object, and a compensation value for the three-dimensional size of the candidate object.
And if the candidate object is determined to be the object included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects, determining the compensation value corresponding to the candidate object as the compensation value of the information of the object.
Specifically, through the first three-dimensional detection network model, the probability that the candidate object belongs to each of the preset objects can be output according to the two-dimensional frame of the candidate object, and the compensation value of the information of the candidate object can be output at the same time. If the candidate object is determined to be an object included in the scene to be detected according to the probability that it belongs to each of the preset objects (see the related description of S502), the compensation value corresponding to the candidate object may be determined as the compensation value of the information of the object. Continuing the example from S501: since candidate object 1 and candidate object 3 are vehicles included in the scene to be detected, the compensation values corresponding to candidate object 1 and candidate object 3, respectively, may be determined as the compensation values of the information of the vehicles included in the scene to be detected.
Next, the compensation value will be described by taking the compensation value of the orientation of the object candidate as an example.
For the compensation value of the orientation of the candidate object, [ -180 °, 180 ° ] may be equally divided into a plurality of sections, with the center of each section set as the preset orientation. For example, for the interval [ -160 °, -140 ° ], the preset orientation may be-150 °. For each two-dimensional frame of the candidate object, the first three-dimensional detection network model can output a section to which the orientation of the candidate object belongs and a compensation value of the orientation of the candidate object, wherein the compensation value is a difference value between an actual value of the orientation of the candidate object and the center of the section to which the orientation of the candidate object belongs.
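A small sketch of this orientation encoding and decoding, assuming equal 20-degree sections to match the [-160°, -140°] example (the section width is otherwise not fixed by the embodiment), is:

```python
import numpy as np

BIN_WIDTH = 20.0   # degrees; matches the [-160 deg, -140 deg] example interval

def encode_orientation(actual_deg):
    """Return (section index, compensation value) for an orientation angle in degrees."""
    wrapped = ((actual_deg + 180.0) % 360.0) - 180.0     # wrap into [-180, 180)
    bin_idx = int(np.floor((wrapped + 180.0) / BIN_WIDTH))
    center = -180.0 + (bin_idx + 0.5) * BIN_WIDTH        # preset orientation (section center)
    return bin_idx, wrapped - center                     # compensation value

def decode_orientation(bin_idx, compensation_deg):
    """Recover the orientation from the section it belongs to and its compensation value."""
    center = -180.0 + (bin_idx + 0.5) * BIN_WIDTH
    return center + compensation_deg
```

For instance, an actual orientation of -145° falls in the section [-160°, -140°] with preset orientation -150°, so the compensation value is 5°.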
In the object detection method provided by this embodiment, the two-dimensional frame of the candidate object is input into the first three-dimensional detection network model, and the probability that the candidate object belongs to each object in the preset objects and the compensation value of the information of the candidate object can be obtained, so that the compensation value of the information of the object can be obtained simultaneously when the candidate object is determined to be an object included in the scene to be detected.
Fig. 6 is another flowchart of an object detection method according to an embodiment of the present disclosure. As shown in fig. 6, in the above S303, determining the object included in the scene to be detected according to the two-dimensional frame of the object candidate may include:
s601, inputting the two-dimensional frame of the candidate object into a semantic prediction network model, and acquiring the probability that the candidate object belongs to each object in preset objects.
S602, determining the objects included in the scene to be detected according to the probability that the candidate objects belong to each object in the preset objects.
The semantic prediction network model may be trained in advance and used for outputting, according to the two-dimensional frame of the candidate object, the probability that the candidate object belongs to each of the preset objects. It should be noted that this embodiment does not limit the implementation of the semantic prediction network model; different neural network models, for example convolutional neural network models, may be adopted according to actual requirements. This embodiment also does not limit the specific categories of the preset objects; for example, the preset objects may include, but are not limited to, vehicles, bicycles, and pedestrians.
Optionally, in S303, acquiring a compensation value of the information of the object may include:
inputting a two-dimensional frame of an object included in a scene to be detected into a second three-dimensional detection network model, and acquiring a compensation value of information of the object, wherein the compensation value comprises at least one of the following items: a compensation value for an orientation of the object, a compensation value for three-dimensional position information of the object, a compensation value for a two-dimensional frame of the object, and a compensation value for a three-dimensional size of the object.
The second three-dimensional detection network model may be a pre-trained compensation value for outputting information of the object according to the two-dimensional frame of the object. It should be noted that, in this embodiment, an implementation manner of the second three-dimensional detection network model is not limited, and different neural network models, for example, convolutional neural network models, may be adopted according to actual requirements.
This embodiment differs from the embodiment shown in fig. 5 in that two models are involved here: the semantic prediction network model and the second three-dimensional detection network model. The output of the semantic prediction network model is the probability that the candidate object belongs to each of the preset objects. After the candidate object is determined, according to this probability, to be an object included in the scene to be detected, the compensation value of the information of the object is output through the second three-dimensional detection network model. In the embodiment shown in fig. 5, by contrast, only the first three-dimensional detection network model is involved, and that model simultaneously outputs the probability that the candidate object belongs to each of the preset objects and the compensation value of the information of the candidate object.
On the basis of the above-described embodiment, in one possible implementation, the compensation value of the information of the object includes a compensation value of an orientation of the object. In S304, acquiring the information of the object according to the compensation value of the information of the object may include:
and acquiring the central angle of the preset orientation interval to which the object belongs.
And acquiring the orientation information of the object according to the compensation value of the orientation of the object and the central angle of the preset orientation interval to which the object belongs.
In this implementation, the orientation information of the object may be acquired according to the compensation value of the orientation of the object.
Optionally, in another possible implementation manner, the compensation value of the information of the object includes a compensation value of three-dimensional position information of the object. In S304, acquiring the information of the object according to the compensation value of the information of the object may include:
three-dimensional position information of a reference point of an object is acquired.
And acquiring the three-dimensional position information of the object according to the compensation value of the three-dimensional position information of the object and the three-dimensional position information of the reference point of the object.
In this implementation, the three-dimensional position information of the object may be acquired according to the compensation value of the three-dimensional position information of the object.
Optionally, in another possible implementation, the compensation value of the information of the object includes a compensation value of a three-dimensional size of the object. In S304, acquiring the information of the object according to the compensation value of the information of the object may include:
and acquiring a reference value of the three-dimensional size of the object corresponding to the object.
And acquiring three-dimensional size information of the object according to the compensation value of the three-dimensional size of the object and the reference value of the three-dimensional size of the object corresponding to the object.
In this implementation, the three-dimensional size information of the object may be acquired from the compensation value of the three-dimensional size of the object.
Optionally, in another possible implementation, the compensation value of the information of the object includes a compensation value of a two-dimensional frame of the object. In S304, acquiring the information of the object according to the compensation value of the information of the object may include:
and acquiring a reference value of the two-dimensional frame corresponding to the object.
And acquiring the position information of the two-dimensional frame of the object according to the compensation value of the two-dimensional frame of the object and the reference value of the two-dimensional frame corresponding to the object.
And acquiring the depth value of the object according to the position information of the two-dimensional frame of the object.
In this implementation, the depth value of the object may be obtained from the compensation value of the two-dimensional frame of the object.
It should be noted that the compensation value of the information of the object may include different contents, and the above implementations may be combined with one another, so that at least one of the following pieces of information of the object may be acquired: orientation information of the object, three-dimensional position information of the object, three-dimensional size information of the object, and the depth value of the object.
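Taken together, the implementations above recover the information of the object by adding each compensation value back to its reference value; the depth value is then obtained separately from the position information of the decoded two-dimensional frame. The following non-limiting sketch assumes that the reference values are available to the caller (the dictionary keys are introduced here for illustration):

```python
def decode_object_info(comp, refs):
    """Combine each compensation value with its reference value to obtain the
    information of the object (orientation, 3D position, 3D size, 2D frame).

    comp: compensation values output by the detection network, e.g.
          {"orientation": d_theta, "position": (dx, dy, dz),
           "size": (dl, dw, dh), "box2d": (du1, dv1, du2, dv2)}.
    refs: reference values assumed to be available: the center angle of the
          preset orientation section, the 3D position of the reference point,
          the reference 3D size for the object's class, and the reference 2D frame.
    """
    return {
        "orientation": refs["preset_orientation"] + comp["orientation"],
        "position": tuple(r + c for r, c in zip(refs["reference_point"], comp["position"])),
        "size": tuple(r + c for r, c in zip(refs["reference_size"], comp["size"])),
        "box2d": tuple(r + c for r, c in zip(refs["reference_box2d"], comp["box2d"])),
    }
```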
Optionally, in an implementation, obtaining the depth value of the object according to the position information of the two-dimensional frame of the object may include:
and inputting the position information of the two-dimensional frame of the object into the first region segmentation network model to obtain sparse point cloud data on the surface of the object.
And clustering and partitioning the sparse point cloud data on the surface of the object to obtain the sparse point cloud data of a target point on the surface of the object.
And determining the depth value of the object according to the sparse point cloud data of the target point.
The first region segmentation network model may be pre-trained and configured to output sparse point cloud data on the surface of the object according to the position information of the two-dimensional frame of the object. It should be noted that, in this embodiment, an implementation manner of the first region segmentation network model is not limited, and different neural network models, for example, convolutional neural network models, may be adopted according to actual requirements.
In this implementation, accurate sparse point cloud data on the surface of the object can be acquired through the first region segmentation network model. And then, clustering and segmenting the sparse point cloud data on the surface of the object, so as to obtain the sparse point cloud data of a target point on the surface of the object, and finally determining the depth value of the object.
It should be noted that the present embodiment does not limit the position of the target point. For example, the target point on the vehicle may be a raised point on the rear of the vehicle. The target point of the pedestrian may be a point on the head of the pedestrian, and so on.
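As a non-limiting sketch of the clustering and depth step (the clustering algorithm is not specified by the embodiment; DBSCAN from scikit-learn and the median depth of the largest cluster are assumptions made here for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN   # the clustering algorithm is an assumption

def depth_from_surface_points(surface_points, eps=0.5, min_samples=5):
    """Cluster the sparse point cloud data on the object surface and take the
    depth of a target point from the dominant cluster.

    surface_points: (N, 3) points on the object surface output by the region
                    segmentation network (camera-frame coordinates, z forward).
    Returns an estimated depth value for the object, or None if no cluster is found.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(surface_points)
    valid = labels >= 0
    if not np.any(valid):
        return None
    # Keep the largest cluster; it is taken to contain the target point.
    largest = np.argmax(np.bincount(labels[valid]))
    cluster = surface_points[labels == largest]
    # Use the median z of the cluster as the target point's depth (illustrative choice).
    return float(np.median(cluster[:, 2]))
```

In the second implementation described below, the clustering step could be replaced by taking the same depth statistic directly over the sparse point cloud data of the segmented target surface.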
Optionally, in another implementation, obtaining the depth value of the object according to the position information of the two-dimensional frame of the object may include:
and inputting the position information of the two-dimensional frame of the object into the second region segmentation network model, and acquiring the sparse point cloud data of the target surface on the surface of the object.
And obtaining the depth value of the object according to the sparse point cloud data of the target surface.
The second region segmentation network model may be pre-trained sparse point cloud data for outputting a target surface on the surface of the object according to the position information of the two-dimensional frame of the object. It should be noted that, in this embodiment, an implementation manner of the second region segmentation network model is not limited, and different neural network models, for example, convolutional neural network models, may be adopted according to actual requirements.
In this implementation, by using the second region segmentation network model, sparse point cloud data of a target surface on the surface of an object can be acquired, so that a depth value of the object can be determined.
It should be noted that the position of the target surface is not limited in this embodiment. For example, if the traveling direction of the vehicle coincides with the moving direction of the electronic device, the target surface on the vehicle may be the rear of the vehicle. The target surface on the vehicle may be the front of the vehicle if the direction of travel of the vehicle is opposite to the direction of movement of the electronic device. The target surface of the pedestrian may be the head of the pedestrian, and the like.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device provided by this embodiment is configured to execute the object detection method provided by any one of the implementation manners of fig. 2 to 6. As shown in fig. 7, the electronic device provided in this embodiment may include:
a memory 12 for storing a computer program;
a processor 11, configured to execute the computer program, and specifically configured to:
acquiring sparse point cloud data and an image of a scene to be detected;
projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed;
and carrying out three-dimensional detection on the data to be processed to obtain a detection result of the object included in the scene to be detected.
Optionally, the processor 11 is specifically configured to:
inputting the data to be processed into a basic network model to obtain a feature map;
inputting the feature map into a candidate area network model to obtain a two-dimensional frame of a candidate object;
determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object, and acquiring a compensation value of information of the object;
and acquiring the information of the object according to the compensation value of the information of the object.
Optionally, the processor 11 is specifically configured to:
acquiring the probability that each pixel point in the image belongs to an object according to the feature map;
if the first pixel is determined to belong to the object according to the probability that each pixel point belongs to the object, acquiring a two-dimensional frame of the object corresponding to the first pixel;
and acquiring the two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel.
Optionally, the processor 11 is specifically configured to:
acquiring a first pixel to be processed from a first set consisting of a plurality of first pixels, deleting the first pixel to be processed from the first set, and acquiring an updated first set; the first pixel to be processed is the first pixel with the highest probability of belonging to the object in the first set;
for each first pixel in the updated first set, obtaining a correlation value between each first pixel and the to-be-processed first pixel; the correlation value is used for indicating the coincidence degree of the two-dimensional frame of the object corresponding to each first pixel and the two-dimensional frame of the object corresponding to the first pixel to be processed;
and deleting the first pixels with correlation values larger than the preset value from the updated first set, and re-executing the steps of acquiring the first pixel to be processed and updating the first set until the first set no longer includes any first pixel, and determining the two-dimensional frames of the objects corresponding to all the first pixels to be processed as the two-dimensional frames of the candidate objects.
Optionally, the processor 11 is specifically configured to:
inputting the two-dimensional frame of the candidate object into a first three-dimensional detection network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and acquiring the objects included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects.
Optionally, the processor 11 is specifically configured to:
by inputting the two-dimensional frame of the candidate object into the first three-dimensional detection network model, at least one of the following compensation values is also obtained: a compensation value of an orientation of the candidate object, a compensation value of three-dimensional position information of the candidate object, a compensation value of a two-dimensional frame of the candidate object, and a compensation value of a three-dimensional size of the candidate object;
and if the candidate object is determined to be the object included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects, determining the compensation value corresponding to the candidate object as the compensation value of the information of the object.
Optionally, the processor 11 is specifically configured to:
inputting the two-dimensional frame of the candidate object into a semantic prediction network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and determining the objects included in the scene to be detected according to the probability that the candidate objects belong to each object in preset objects.
Optionally, the processor 11 is specifically configured to:
inputting a two-dimensional frame of an object included in the scene to be detected into a second three-dimensional detection network model, and acquiring a compensation value of information of the object, wherein the compensation value includes at least one of the following items: a compensation value for an orientation of the object, a compensation value for three-dimensional position information of the object, a compensation value for a two-dimensional frame of the object, and a compensation value for a three-dimensional size of the object.
Optionally, the compensation value of the information of the object includes a compensation value of an orientation of the object, and the processor 11 is specifically configured to:
acquiring a central angle of a preset orientation interval to which the object belongs;
and acquiring orientation information of the object according to the compensation value of the orientation of the object and the central angle of the preset orientation interval to which the object belongs.
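As a worked illustration of the orientation recovery above: if the full circle is divided into a number of preset orientation intervals (bins), the network only needs to regress a small residual relative to the central angle of the interval the object falls into. The bin count of 12 is an assumption made for illustration.

    import math

    NUM_ORIENTATION_BINS = 12                        # assumed number of preset orientation intervals
    BIN_WIDTH = 2.0 * math.pi / NUM_ORIENTATION_BINS

    def bin_center_angle(bin_index):
        """Central angle of the preset orientation interval to which the object belongs."""
        return (bin_index + 0.5) * BIN_WIDTH

    def decode_orientation(bin_index, orientation_offset):
        """Orientation = central angle of the interval + compensation value of the orientation."""
        angle = bin_center_angle(bin_index) + orientation_offset
        return math.atan2(math.sin(angle), math.cos(angle))   # wrap to (-pi, pi]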
Optionally, the compensation value of the information of the object includes a compensation value of three-dimensional position information of the object, and the processor 11 is specifically configured to:
acquiring three-dimensional position information of a reference point of the object;
and acquiring the three-dimensional position information of the object according to the compensation value of the three-dimensional position information of the object and the three-dimensional position information of the reference point of the object.
Optionally, the compensation value of the information of the object includes a compensation value of a three-dimensional size of the object, and the processor 11 is specifically configured to:
acquiring a reference value of the three-dimensional size of an object corresponding to the object;
and acquiring three-dimensional size information of the object according to the compensation value of the three-dimensional size of the object and the reference value of the three-dimensional size of the object corresponding to the object.
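The three-dimensional position and size follow the same "reference value plus compensation value" pattern, sketched below; the per-class size priors and the purely additive decoding are assumptions (log-space or multiplicative residuals would be an equally plausible reading), and the two-dimensional frame compensation described next decodes analogously.

    import numpy as np

    # Assumed per-class reference three-dimensional sizes (length, width, height) in metres.
    SIZE_PRIORS = {"car": np.array([4.5, 1.8, 1.6]),
                   "pedestrian": np.array([0.8, 0.6, 1.7])}

    def decode_position(reference_point_xyz, position_offset_xyz):
        """3D position = three-dimensional position of the reference point + compensation value."""
        return np.asarray(reference_point_xyz) + np.asarray(position_offset_xyz)

    def decode_size(class_name, size_offset_lwh):
        """3D size = reference value of the three-dimensional size + compensation value."""
        return SIZE_PRIORS[class_name] + np.asarray(size_offset_lwh)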
Optionally, the compensation value of the information of the object includes a compensation value of a two-dimensional frame of the object, and the processor 11 is specifically configured to:
acquiring a reference value of a two-dimensional frame corresponding to the object;
acquiring position information of the two-dimensional frame of the object according to the compensation value of the two-dimensional frame of the object and the reference value of the two-dimensional frame corresponding to the object;
and acquiring the depth value of the object according to the position information of the two-dimensional frame of the object.
Optionally, the processor 11 is specifically configured to:
inputting the position information of the two-dimensional frame of the object into a first region segmentation network model to obtain sparse point cloud data on the surface of the object;
performing clustering segmentation on the sparse point cloud data on the surface of the object to obtain sparse point cloud data of a target point on the surface of the object;
and determining the depth value of the object according to the sparse point cloud data of the target point.
Optionally, the processor 11 is specifically configured to:
inputting the position information of the two-dimensional frame of the object into a second region segmentation network model, and acquiring sparse point cloud data of a target surface on the surface of the object;
and acquiring the depth value of the object according to the sparse point cloud data of the target surface.
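Both alternatives above reduce to the same idea: keep only the sparse points that fall inside the object's two-dimensional frame and derive a single depth from them. The sketch below groups those points by depth and returns the median of the nearest sufficiently large group, approximating the clustering variant; taking the median of the target-surface points approximates the second variant. The gap and minimum-size thresholds are assumptions.

    import numpy as np

    def depth_from_box_points(point_depths, gap=0.8, min_points=5):
        """point_depths: depths of the sparse points inside the object's 2D frame.
        Splits the sorted depths wherever consecutive points are more than `gap`
        metres apart, then returns the median depth of the nearest cluster with at
        least `min_points` points (all thresholds are illustrative assumptions)."""
        d = np.sort(np.asarray(point_depths, dtype=float))
        if d.size == 0:
            return None
        splits = np.where(np.diff(d) > gap)[0] + 1        # indices where a new cluster starts
        clusters = np.split(d, splits)                    # clusters are ordered near-to-far
        for cluster in clusters:
            if cluster.size >= min_points:
                return float(np.median(cluster))          # depth of the target points
        return float(np.median(d))                        # fall back to all in-box points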
Optionally, the information of the object includes at least one of the following: three-dimensional position information, orientation information, three-dimensional size information, and depth values of the object.
Optionally, the processor 11 is specifically configured to:
and acquiring the sparse point cloud data through at least one radar sensor, and acquiring the image through an image sensor.
Optionally, the number of the radar sensors is greater than 1; the processor 11 is specifically configured to:
respectively acquiring corresponding first sparse point cloud data through each radar sensor;
and according to the external parameters of the at least one radar sensor, projecting the first sparse point cloud data corresponding to each radar sensor to a target radar coordinate system to obtain the sparse point cloud data.
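Fusing several radar sensors amounts to applying each sensor's extrinsic transform to its own points so that everything is expressed in one target radar coordinate system. A minimal sketch, assuming 4x4 homogeneous extrinsic matrices obtained from calibration:

    import numpy as np

    def fuse_point_clouds(per_sensor_points, per_sensor_extrinsics):
        """per_sensor_points: list of (N_i, 3) arrays, one per radar sensor.
        per_sensor_extrinsics: list of 4x4 matrices mapping each sensor's frame
        into the target radar coordinate system (assumed known from calibration)."""
        fused = []
        for pts, T in zip(per_sensor_points, per_sensor_extrinsics):
            homo = np.hstack([pts, np.ones((pts.shape[0], 1))])   # (N_i, 4) homogeneous points
            fused.append((homo @ T.T)[:, :3])                     # into the target radar frame
        return np.vstack(fused)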
Optionally, the processor 11 is specifically configured to:
and projecting the sparse point cloud data and the image into a camera coordinate system through an external parameter matrix between the radar sensor and the image sensor to acquire the data to be processed.
Optionally, the data to be processed includes: and the sparse point cloud data is projected to the coordinate value and the reflectivity of each point in the target coordinate system, and the coordinate value of the pixel point in the image in the target coordinate system.
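A minimal sketch of how the fused point cloud and the image might be brought into one coordinate system and packed into the data to be processed. The camera intrinsic matrix, the nearest-pixel registration and the exact channel layout (image channels plus per-pixel depth and reflectivity) are assumptions made for illustration.

    import numpy as np

    def build_data_to_process(points_xyz, reflectivity, image, lidar_to_camera, K):
        """points_xyz: (N, 3) fused sparse point cloud in the radar frame.
        reflectivity: (N,) per-point reflectivity values.
        lidar_to_camera: 4x4 external parameter matrix between radar and image sensor.
        K: 3x3 camera intrinsic matrix (assumed known from calibration)."""
        h, w = image.shape[:2]
        homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
        cam = (homo @ lidar_to_camera.T)[:, :3]            # points in the camera coordinate system
        keep = cam[:, 2] > 0                               # keep points in front of the camera
        cam, refl = cam[keep], np.asarray(reflectivity)[keep]
        uv = cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3]                        # pixel coordinates of each projected point
        depth_map = np.zeros((h, w), dtype=np.float32)
        refl_map = np.zeros((h, w), dtype=np.float32)
        u = np.clip(uv[:, 0].astype(int), 0, w - 1)
        v = np.clip(uv[:, 1].astype(int), 0, h - 1)
        depth_map[v, u] = cam[:, 2]                        # coordinate (depth) channel
        refl_map[v, u] = refl                              # reflectivity channel
        return np.dstack([image.astype(np.float32), depth_map, refl_map])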
Optionally, the electronic device may further include a radar sensor 13 and an image sensor 14. In this embodiment, the number and the installation positions of the radar sensors 13 and the image sensors 14 are not limited.
The electronic device provided in this embodiment is configured to execute the object detection method provided in any one of the implementation manners of fig. 2 to 6, and the technical solution and the technical effect are similar, which are not repeated herein.
Embodiments of the present application further provide a movable platform, which may include the electronic device provided in the embodiment shown in fig. 7. It should be noted that the type of the movable platform is not limited in this embodiment; it may be any device that needs to perform object detection, for example, a drone, a vehicle, or another vehicle.
As shown in fig. 8, in an alternative embodiment, the ranging apparatus 200 includes a ranging module 210, and the ranging module 210 includes an emitter 203 (e.g., transmit circuitry), a collimating element 204, a detector 205 (e.g., which may include receive circuitry, sampling circuitry, and arithmetic circuitry), and an optical path changing element 206. The ranging module 210 is configured to emit a light beam, receive return light, and convert the return light into an electrical signal. The emitter 203 may be configured to emit a sequence of light pulses; in one embodiment, the emitter 203 may emit a sequence of laser pulses. Optionally, the laser beam emitted by the emitter 203 is a narrow-bandwidth beam with a wavelength outside the visible range. The collimating element 204 is disposed on the emitting light path of the emitter 203 and is configured to collimate the light beam emitted from the emitter 203 into parallel light directed to the scanning module. The collimating element 204 also serves to converge at least a portion of the return light reflected by the detection object onto the detector 205. The collimating element 204 may be a collimating lens or another element capable of collimating a light beam.
In the embodiment shown in fig. 8, the transmit and receive optical paths within the distance measuring device are combined by the optical path changing element 206 before the collimating element 204, so that the transmit and receive optical paths can share the same collimating element, making the optical path more compact. In other implementations, the emitter 203 and the detector 205 may use respective collimating elements, and the optical path changing element 206 may be disposed in the optical path after the collimating elements.
In the embodiment shown in fig. 8, since the beam aperture of the light beam emitted from the emitter 203 is small and the beam aperture of the return light received by the distance measuring device is large, the optical path changing element can adopt a small-area mirror to combine the emission optical path and the reception optical path. In other implementations, the optical path changing element may also be a mirror with a through hole, where the through hole transmits the outgoing light from the emitter 203 and the mirror reflects the return light to the detector 205. Compared with the case of using a small mirror, this can reduce the blocking of the return light by the small mirror's support.
In the embodiment shown in fig. 8, the optical path altering element is offset from the optical axis of the collimating element 204. In other implementations, the optical path altering element may also be located on the optical axis of the collimating element 204.
The ranging device 200 also includes a scanning module 202. The scanning module 202 is disposed on the emitting light path of the distance measuring module 210, and the scanning module 202 is configured to change the transmission direction of the collimated light beam 219 emitted by the collimating element 204, project the collimated light beam to the external environment, and project the return light beam to the collimating element 204. The return light is converged by the collimating element 204 onto the detector 205.
In one embodiment, the scanning module 202 may include at least one optical element for altering the propagation path of the light beam, where the optical element may alter the propagation path by reflecting, refracting, or diffracting the light beam. For example, the scanning module 202 includes a lens, a mirror, a prism, a galvanometer, a grating, a liquid crystal, an optical phased array, or any combination thereof. In one example, at least a portion of the optical element is moved, for example, by a driving module, and the moving optical element can reflect, refract, or diffract the light beam to different directions at different times. In some embodiments, multiple optical elements of the scanning module 202 may rotate or oscillate about a common axis 209, with each rotating or oscillating optical element serving to continuously change the propagation direction of the incident beam. In one embodiment, the multiple optical elements of the scanning module 202 may rotate at different rotational speeds or oscillate at different speeds. In another embodiment, at least some of the optical elements of the scanning module 202 may rotate at substantially the same rotational speed. In some embodiments, the multiple optical elements of the scanning module may also rotate about different axes, and may rotate, or oscillate, in the same direction or in different directions; this is not limited here.
In one embodiment, the scanning module 202 includes a first optical element 214 and a driver 216 coupled to the first optical element 214, the driver 216 configured to drive the first optical element 214 to rotate about the rotation axis 209, such that the first optical element 214 redirects the collimated light beam 219. The first optical element 214 projects the collimated beam 219 into different directions. In one embodiment, the angle between the direction of the collimated beam 219 after it is altered by the first optical element and the axis of rotation 209 changes as the first optical element 214 is rotated. In one embodiment, the first optical element 214 includes a pair of opposing non-parallel surfaces through which the collimated light beam 219 passes. In one embodiment, the first optical element 214 includes a prism having a thickness that varies along at least one radial direction. In one embodiment, the first optical element 214 comprises a wedge angle prism that refracts the collimated beam 219.
In one embodiment, the scanning module 202 further comprises a second optical element 215, the second optical element 215 rotating around the rotation axis 209, the rotation speed of the second optical element 215 being different from that of the first optical element 214. The second optical element 215 is used to change the direction of the light beam projected by the first optical element 214. In one embodiment, the second optical element 215 is coupled to another driver 217, and the driver 217 drives the second optical element 215 to rotate. The first optical element 214 and the second optical element 215 may be driven by the same or different drivers, such that the first optical element 214 and the second optical element 215 rotate at different speeds and/or in different directions, thereby projecting the collimated light beam 219 into different directions in the ambient space and scanning a larger spatial range. In one embodiment, the controller 218 controls the drivers 216 and 217 to drive the first optical element 214 and the second optical element 215, respectively. The rotation speeds of the first optical element 214 and the second optical element 215 can be determined according to the region and pattern expected to be scanned in the actual application. The drivers 216 and 217 may include motors or other drivers.
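The two-wedge arrangement above is essentially a Risley-prism scanner. To first order, each wedge deflects the beam by a fixed small angle in the direction of its current rotation phase, and the two deflections add; spinning the wedges at different speeds therefore traces a rosette-like pattern. The wedge angles and speeds below are illustrative assumptions, and the paraxial model is only an approximation of the refraction described above.

    import numpy as np

    def risley_scan_pattern(delta1_deg, delta2_deg, rpm1, rpm2, duration_s, dt=1e-4):
        """First-order sketch of the scan pattern of two wedge prisms rotating about a
        common axis: each prism deflects the beam by a fixed small angle in the
        direction of its rotation phase, and the two deflections are summed."""
        t = np.arange(0.0, duration_s, dt)
        phi1 = 2.0 * np.pi * rpm1 / 60.0 * t              # rotation phase of the first prism
        phi2 = 2.0 * np.pi * rpm2 / 60.0 * t              # rotation phase of the second prism
        d1, d2 = np.radians(delta1_deg), np.radians(delta2_deg)
        theta_x = d1 * np.cos(phi1) + d2 * np.cos(phi2)   # horizontal deflection (rad)
        theta_y = d1 * np.sin(phi1) + d2 * np.sin(phi2)   # vertical deflection (rad)
        return theta_x, theta_y

    # e.g. two ~10-degree wedges spinning at 6000 rpm and -4600 rpm trace a rosette pattern
    tx, ty = risley_scan_pattern(10.0, 10.0, 6000, -4600, duration_s=0.05)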
In one embodiment, second optical element 215 includes a pair of opposing non-parallel surfaces through which the light beam passes. In one embodiment, second optical element 215 includes a prism having a thickness that varies along at least one radial direction. In one embodiment, second optical element 215 comprises a wedge angle prism.
In one embodiment, the scanning module 202 further comprises a third optical element (not shown) and a driver for driving the third optical element to move. Optionally, the third optical element comprises a pair of opposed non-parallel surfaces through which the light beam passes. In one embodiment, the third optical element comprises a prism having a thickness that varies along at least one radial direction. In one embodiment, the third optical element comprises a wedge angle prism. At least two of the first, second, and third optical elements rotate at different rotational speeds and/or in different rotational directions.
Rotation of the optical elements in the scanning module 202 may project light in different directions, such as directions 211 and 213, and thus scan the space around the ranging device 200. When the light 211 projected by the scanning module 202 hits the detection object 201, a part of the light is reflected by the detection object 201 to the distance measuring device 200 in the opposite direction to the projected light 211. The return light 212 reflected by the object 201 passes through the scanning module 202 and then enters the collimating element 204.
The detector 205 is placed on the same side of the collimating element 204 as the emitter 203, and the detector 205 is used to convert at least part of the return light passing through the collimating element 204 into an electrical signal.
In one embodiment, each optical element is coated with an antireflection coating. Optionally, the thickness of the antireflection film is equal to or close to the wavelength of the light beam emitted by the emitter 203, which can increase the intensity of the transmitted light beam.
In one embodiment, a filter layer is coated on a surface of a component in the distance measuring device located on the light beam propagation path, or a filter is disposed on the light beam propagation path, to transmit at least the wavelength band of the light beam emitted from the emitter 203 and reflect other wavelength bands, thereby reducing the noise that ambient light introduces at the receiver.
In some embodiments, the emitter 203 may include a laser diode through which laser pulses on the order of nanoseconds are emitted. Further, the laser pulse reception time may be determined, for example, by detecting the rising edge time and/or the falling edge time of the electrical signal pulse. In this manner, the ranging apparatus 200 may calculate the time of flight (TOF) using the pulse reception time information and the pulse emission time information, thereby determining the distance from the detection object 201 to the ranging apparatus 200.
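The distance computation itself is the familiar time-of-flight relation: the round-trip time between pulse emission and pulse reception, multiplied by the speed of light and halved. A minimal sketch:

    SPEED_OF_LIGHT = 299_792_458.0   # metres per second

    def tof_distance(emit_time_s, receive_time_s):
        """Distance from the detection object 201 to the ranging apparatus 200 for one pulse."""
        return (receive_time_s - emit_time_s) * SPEED_OF_LIGHT / 2.0

    # e.g. a pulse received 200 ns after emission corresponds to roughly 30 m
    print(tof_distance(0.0, 200e-9))   # ~29.98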
The laser radar provided in this embodiment enables the acquisition of the laser radar point cloud data described above.
Embodiments of the present application further provide a computer storage medium, which is used to store computer software instructions for the above object detection and which, when run on a computer, enables the computer to execute the various possible object detection methods in the above method embodiments. When the computer-executable instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application may be generated in whole or in part. The computer instructions may be stored in a computer storage medium or transmitted from one computer storage medium to another website, computer, server, or data center, for example, by wireless means (e.g., cellular, infrared, short-range wireless, or microwave). The computer storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., an SSD), among others.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (41)

1. An object detection method, comprising:
acquiring sparse point cloud data and an image of a scene to be detected;
projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed;
and carrying out three-dimensional detection on the data to be processed to obtain a detection result of the object included in the scene to be detected.
2. The method according to claim 1, wherein the three-dimensional detection of the data to be processed to obtain a detection result of an object included in the scene to be detected comprises:
inputting the data to be processed into a basic network model to obtain a feature map;
inputting the feature map into a candidate area network model to obtain a two-dimensional frame of a candidate object;
determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object, and acquiring a compensation value of information of the object;
and acquiring the information of the object according to the compensation value of the information of the object.
3. The method of claim 2, wherein inputting the feature map into a candidate area network model to obtain a two-dimensional frame of a candidate object comprises:
acquiring the probability that each pixel point in the image belongs to an object according to the feature map;
if the first pixel is determined to belong to the object according to the probability that each pixel point belongs to the object, acquiring a two-dimensional frame of the object corresponding to the first pixel;
and acquiring the two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel.
4. The method according to claim 3, wherein the obtaining the two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel comprises:
acquiring a first pixel to be processed from a first set consisting of a plurality of first pixels, deleting the first pixel to be processed from the first set, and acquiring an updated first set; the first pixel to be processed is the first pixel with the highest probability of belonging to the object in the first set;
for each first pixel in the updated first set, obtaining a correlation value between each first pixel and the to-be-processed first pixel; the correlation value is used for indicating the coincidence degree of the two-dimensional frame of the object corresponding to each first pixel and the two-dimensional frame of the object corresponding to the first pixel to be processed;
deleting the first pixels with the correlation values larger than the preset value from the updated first set, and re-executing the steps of acquiring the first pixels to be processed and updating the first set until the first set does not include the first pixels, and determining the two-dimensional frames of the objects corresponding to all the first pixels to be processed as the two-dimensional frames of the candidate objects.
5. The method according to claim 2, wherein the determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object comprises:
inputting the two-dimensional frame of the candidate object into a first three-dimensional detection network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and acquiring the objects included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects.
6. The method of claim 5, wherein the obtaining a compensation value for the information of the object comprises:
by inputting the two-dimensional frame of the candidate object into the first three-dimensional detection network model, at least one of the following compensation values is also obtained: a compensation value of an orientation of the candidate object, a compensation value of three-dimensional position information of the candidate object, a compensation value of a two-dimensional frame of the candidate object, and a compensation value of a three-dimensional size of the candidate object;
and if the candidate object is determined to be the object included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects, determining the compensation value corresponding to the candidate object as the compensation value of the information of the object.
7. The method according to claim 2, wherein the determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object comprises:
inputting the two-dimensional frame of the candidate object into a semantic prediction network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and determining the objects included in the scene to be detected according to the probability that the candidate objects belong to each object in preset objects.
8. The method of claim 7, wherein the obtaining a compensation value for the information of the object comprises:
inputting a two-dimensional frame of an object included in the scene to be detected into a second three-dimensional detection network model, and acquiring a compensation value of information of the object, wherein the compensation value includes at least one of the following items: a compensation value for an orientation of the object, a compensation value for three-dimensional position information of the object, a compensation value for a two-dimensional frame of the object, and a compensation value for a three-dimensional size of the object.
9. The method according to any one of claims 5 to 8, wherein the compensation value of the information of the object comprises a compensation value of an orientation of the object, and the obtaining the information of the object according to the compensation value of the information of the object comprises:
acquiring a central angle of a preset orientation interval to which the object belongs;
and acquiring orientation information of the object according to the compensation value of the orientation of the object and the central angle of the preset orientation interval to which the object belongs.
10. The method according to any one of claims 5 to 8, wherein the compensation value of the information of the object includes a compensation value of three-dimensional position information of the object, and the acquiring the information of the object according to the compensation value of the information of the object includes:
acquiring three-dimensional position information of a reference point of the object;
and acquiring the three-dimensional position information of the object according to the compensation value of the three-dimensional position information of the object and the three-dimensional position information of the reference point of the object.
11. The method according to any one of claims 5 to 8, wherein the compensation value of the information of the object includes a compensation value of a three-dimensional size of the object, and the acquiring the information of the object according to the compensation value of the information of the object includes:
acquiring a reference value of the three-dimensional size of an object corresponding to the object;
and acquiring three-dimensional size information of the object according to the compensation value of the three-dimensional size of the object and the reference value of the three-dimensional size of the object corresponding to the object.
12. The method according to any one of claims 5 to 8, wherein the compensation value of the information of the object comprises a compensation value of a two-dimensional frame of the object, and the obtaining the information of the object according to the compensation value of the information of the object comprises:
acquiring a reference value of a two-dimensional frame corresponding to the object;
acquiring position information of the two-dimensional frame of the object according to the compensation value of the two-dimensional frame of the object and the reference value of the two-dimensional frame corresponding to the object;
and acquiring the depth value of the object according to the position information of the two-dimensional frame of the object.
13. The method according to claim 12, wherein the obtaining the depth value of the object according to the position information of the two-dimensional frame of the object comprises:
inputting the position information of the two-dimensional frame of the object into a first region segmentation network model to obtain sparse point cloud data on the surface of the object;
performing clustering segmentation on the sparse point cloud data on the surface of the object to obtain sparse point cloud data of a target point on the surface of the object;
and determining the depth value of the object according to the sparse point cloud data of the target point.
14. The method according to claim 12, wherein the obtaining the depth value of the object according to the position information of the two-dimensional frame of the object comprises:
inputting the position information of the two-dimensional frame of the object into a second region segmentation network model, and acquiring sparse point cloud data of a target surface on the surface of the object;
and acquiring the depth value of the object according to the sparse point cloud data of the target surface.
15. The method according to any one of claims 1 to 14, wherein the information of the object comprises at least one of: three-dimensional position information, orientation information, three-dimensional size information, and depth values of the object.
16. The method according to any one of claims 1 to 14, wherein the acquiring sparse point cloud data and images of the scene to be detected comprises:
and acquiring the sparse point cloud data through at least one radar sensor, and acquiring the image through an image sensor.
17. The method of claim 16, wherein the number of radar sensors is greater than 1; the acquiring the sparse point cloud data by at least one radar sensor comprises:
respectively acquiring corresponding first sparse point cloud data through each radar sensor;
and according to the external parameters of the at least one radar sensor, projecting the first sparse point cloud data corresponding to each radar sensor to a target radar coordinate system to obtain the sparse point cloud data.
18. The method of claim 16, wherein the projecting the sparse point cloud data and the image into a target coordinate system, obtaining data to be processed, comprises:
and projecting the sparse point cloud data and the image into an image coordinate system through external parameters of the radar sensor and the image sensor to acquire the data to be processed.
19. The method according to any one of claims 1 to 14, wherein the data to be processed comprises: and the sparse point cloud data is projected to the coordinate value and the reflectivity of each point in the target coordinate system, and the coordinate value of the pixel point in the image in the target coordinate system.
20. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program, in particular for:
acquiring sparse point cloud data and an image of a scene to be detected;
projecting the sparse point cloud data and the image to a target coordinate system to obtain data to be processed;
and carrying out three-dimensional detection on the data to be processed to obtain a detection result of the object included in the scene to be detected.
21. The electronic device of claim 20, wherein the processor is specifically configured to:
inputting the data to be processed into a basic network model to obtain a feature map;
inputting the feature map into a candidate area network model to obtain a two-dimensional frame of a candidate object;
determining the object included in the scene to be detected according to the two-dimensional frame of the candidate object, and acquiring a compensation value of information of the object;
and acquiring the information of the object according to the compensation value of the information of the object.
22. The electronic device of claim 21, wherein the processor is specifically configured to:
acquiring the probability that each pixel point in the image belongs to an object according to the feature map;
if the first pixel is determined to belong to the object according to the probability that each pixel point belongs to the object, acquiring a two-dimensional frame of the object corresponding to the first pixel;
and acquiring the two-dimensional frame of the candidate object according to the probability that the first pixel belongs to the object and the two-dimensional frame of the object corresponding to the first pixel.
23. The electronic device of claim 22, wherein the processor is specifically configured to:
acquiring a first pixel to be processed from a first set consisting of a plurality of first pixels, deleting the first pixel to be processed from the first set, and acquiring an updated first set; the first pixel to be processed is the first pixel with the highest probability of belonging to the object in the first set;
for each first pixel in the updated first set, obtaining a correlation value between each first pixel and the to-be-processed first pixel; the correlation value is used for indicating the coincidence degree of the two-dimensional frame of the object corresponding to each first pixel and the two-dimensional frame of the object corresponding to the first pixel to be processed;
and deleting the first pixels with the correlation values larger than the preset value from the updated first set, and re-executing the steps of acquiring the first pixel to be processed and updating the first set until the first set does not include the first pixels, and determining the two-dimensional frames of the objects corresponding to all the first pixels to be processed as the two-dimensional frames of the candidate objects.
24. The electronic device of claim 21, wherein the processor is specifically configured to:
inputting the two-dimensional frame of the candidate object into a first three-dimensional detection network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and acquiring the objects included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects.
25. The electronic device of claim 24, wherein the processor is specifically configured to:
by inputting the two-dimensional frame of the candidate object into the first three-dimensional detection network model, at least one of the following compensation values is also obtained: a compensation value of an orientation of the candidate object, a compensation value of three-dimensional position information of the candidate object, a compensation value of a two-dimensional frame of the candidate object, and a compensation value of a three-dimensional size of the candidate object;
and if the candidate object is determined to be the object included in the scene to be detected according to the probability that the candidate object belongs to each object in the preset objects, determining the compensation value corresponding to the candidate object as the compensation value of the information of the object.
26. The electronic device of claim 21, wherein the processor is specifically configured to:
inputting the two-dimensional frame of the candidate object into a semantic prediction network model, and acquiring the probability that the candidate object belongs to each object in preset objects;
and determining the objects included in the scene to be detected according to the probability that the candidate objects belong to each object in preset objects.
27. The electronic device of claim 26, wherein the processor is specifically configured to:
inputting a two-dimensional frame of an object included in the scene to be detected into a second three-dimensional detection network model, and acquiring a compensation value of information of the object, wherein the compensation value includes at least one of the following items: a compensation value for an orientation of the object, a compensation value for three-dimensional position information of the object, a compensation value for a two-dimensional frame of the object, and a compensation value for a three-dimensional size of the object.
28. The electronic device of any of claims 24-27, wherein the compensation value for the information of the object comprises a compensation value for an orientation of the object, and wherein the processor is specifically configured to:
acquiring a central angle of a preset orientation interval to which the object belongs;
and acquiring orientation information of the object according to the compensation value of the orientation of the object and the central angle of the preset orientation interval to which the object belongs.
29. The electronic device of any of claims 24-27, wherein the compensated value for the information of the object comprises a compensated value for three-dimensional position information of the object, and wherein the processor is specifically configured to:
acquiring three-dimensional position information of a reference point of the object;
and acquiring the three-dimensional position information of the object according to the compensation value of the three-dimensional position information of the object and the three-dimensional position information of the reference point of the object.
30. The electronic device of any of claims 24-27, wherein the compensation value for the information of the object comprises a compensation value for a three-dimensional size of the object, and wherein the processor is specifically configured to:
acquiring a reference value of the three-dimensional size of an object corresponding to the object;
and acquiring three-dimensional size information of the object according to the compensation value of the three-dimensional size of the object and the reference value of the three-dimensional size of the object corresponding to the object.
31. The electronic device of any of claims 24 to 27, wherein the compensation value of the information of the object comprises a compensation value of a two-dimensional frame of the object, and wherein the processor is specifically configured to:
acquiring a reference value of a two-dimensional frame corresponding to the object;
acquiring position information of the two-dimensional frame of the object according to the compensation value of the two-dimensional frame of the object and the reference value of the two-dimensional frame corresponding to the object;
and acquiring the depth value of the object according to the position information of the two-dimensional frame of the object.
32. The electronic device of claim 31, wherein the processor is specifically configured to:
inputting the position information of the two-dimensional frame of the object into a first region segmentation network model to obtain sparse point cloud data on the surface of the object;
performing clustering segmentation on the sparse point cloud data on the surface of the object to obtain sparse point cloud data of a target point on the surface of the object;
and determining the depth value of the object according to the sparse point cloud data of the target point.
33. The electronic device of claim 31, wherein the processor is specifically configured to:
inputting the position information of the two-dimensional frame of the object into a second region segmentation network model, and acquiring sparse point cloud data of a target surface on the surface of the object;
and acquiring the depth value of the object according to the sparse point cloud data of the target surface.
34. The electronic device of any of claims 20-33, wherein the information of the object comprises at least one of: three-dimensional position information, orientation information, three-dimensional size information, and depth values of the object.
35. The electronic device of any of claims 20-33, wherein the processor is specifically configured to:
and acquiring the sparse point cloud data through at least one radar sensor, and acquiring the image through an image sensor.
36. The electronic device of claim 35, wherein the number of radar sensors is greater than 1; the processor is specifically configured to:
respectively acquiring corresponding first sparse point cloud data through each radar sensor;
and according to the external parameters of the at least one radar sensor, projecting the first sparse point cloud data corresponding to each radar sensor to a target radar coordinate system to obtain the sparse point cloud data.
37. The electronic device of claim 35, wherein the processor is specifically configured to:
and projecting the sparse point cloud data and the image into a camera coordinate system through an external parameter matrix between the radar sensor and the image sensor to acquire the data to be processed.
38. The electronic device of any of claims 20-23, wherein the data to be processed comprises: and the sparse point cloud data is projected to the coordinate value and the reflectivity of each point in the target coordinate system, and the coordinate value of the pixel point in the image in the target coordinate system.
39. A movable platform, comprising: an electronic device as claimed in any of claims 20 to 38.
40. The movable platform of claim 39, wherein the movable platform is a vehicle or a drone.
41. A computer storage medium, characterized in that the storage medium has stored therein a computer program which, when executed, implements an object detection method as claimed in any one of claims 1-19.
CN201980012209.0A 2019-06-06 2019-06-06 Object detection method, electronic device and movable platform Pending CN111712828A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/090393 WO2020243962A1 (en) 2019-06-06 2019-06-06 Object detection method, electronic device and mobile platform

Publications (1)

Publication Number Publication Date
CN111712828A (en) 2020-09-25

Family

ID=72536815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980012209.0A Pending CN111712828A (en) 2019-06-06 2019-06-06 Object detection method, electronic device and movable platform

Country Status (2)

Country Link
CN (1) CN111712828A (en)
WO (1) WO2020243962A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030423A (en) * 2023-03-29 2023-04-28 浪潮通用软件有限公司 Regional boundary intrusion detection method, equipment and medium

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634439B (en) * 2020-12-25 2023-10-31 北京奇艺世纪科技有限公司 3D information display method and device
CN112799067A (en) * 2020-12-30 2021-05-14 神华黄骅港务有限责任公司 Anti-collision early warning method, device and system for chute of ship loader and early warning equipment
CN112734855B (en) * 2020-12-31 2024-04-16 网络通信与安全紫金山实验室 Space beam pointing method, system and storage medium
CN112926461B (en) * 2021-02-26 2024-04-19 商汤集团有限公司 Neural network training and driving control method and device
CN113625288A (en) * 2021-06-15 2021-11-09 中国科学院自动化研究所 Camera and laser radar pose calibration method and device based on point cloud registration
CN113808096B (en) * 2021-09-14 2024-01-30 成都主导软件技术有限公司 Non-contact bolt loosening detection method and system
CN114723715B (en) * 2022-04-12 2023-09-19 小米汽车科技有限公司 Vehicle target detection method, device, equipment, vehicle and medium
CN116299310A (en) * 2022-09-07 2023-06-23 劢微机器人科技(深圳)有限公司 Tray positioning method, device, equipment and readable storage medium
CN116755441B (en) * 2023-06-19 2024-03-12 国广顺能(上海)能源科技有限公司 Obstacle avoidance method, device, equipment and medium of mobile robot
CN116973939B (en) * 2023-09-25 2024-02-06 中科视语(北京)科技有限公司 Safety monitoring method and device
CN117611592B (en) * 2024-01-24 2024-04-05 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093191A (en) * 2012-12-28 2013-05-08 中电科信息产业有限公司 Object recognition method with three-dimensional point cloud data and digital image data combined
US20180224863A1 (en) * 2016-01-18 2018-08-09 Tencent Technology (Shenzhen) Company Limited Data processing method, apparatus and terminal
CN108509918A (en) * 2018-04-03 2018-09-07 中国人民解放军国防科技大学 Target detection and tracking method fusing laser point cloud and image
CN109191509A (en) * 2018-07-25 2019-01-11 广东工业大学 A kind of virtual binocular three-dimensional reconstruction method based on structure light

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105783878A (en) * 2016-03-11 2016-07-20 三峡大学 Small unmanned aerial vehicle remote sensing-based slope deformation detection and calculation method
CN106504328A (en) * 2016-10-27 2017-03-15 电子科技大学 A kind of complex geological structure modeling method reconstructed based on sparse point cloud surface
CN108734728A (en) * 2018-04-25 2018-11-02 西北工业大学 A kind of extraterrestrial target three-dimensional reconstruction method based on high-resolution sequence image
CN109685886A (en) * 2018-11-19 2019-04-26 国网浙江杭州市富阳区供电有限公司 A kind of distribution three-dimensional scenic modeling method based on mixed reality technology

Also Published As

Publication number Publication date
WO2020243962A1 (en) 2020-12-10

Similar Documents

Publication Publication Date Title
CN111712828A (en) Object detection method, electronic device and movable platform
US11604277B2 (en) Apparatus for acquiring 3-dimensional maps of a scene
Liu et al. TOF lidar development in autonomous vehicle
Kocić et al. Sensors and sensor fusion in autonomous vehicles
US20200217950A1 (en) Resolution of elevation ambiguity in one-dimensional radar processing
US20240045060A1 (en) Light Detection and Ranging (LIDAR) Device Range Aliasing Resilience by Multiple Hypotheses
CN114080625A (en) Absolute pose determination method, electronic equipment and movable platform
US11941888B2 (en) Method and device for generating training data for a recognition model for recognizing objects in sensor data of a sensor, in particular, of a vehicle, method for training and method for activating
WO2022198637A1 (en) Point cloud noise filtering method and system, and movable platform
CN111999744A (en) Unmanned aerial vehicle multi-azimuth detection and multi-angle intelligent obstacle avoidance method
WO2019180020A1 (en) Methods and systems for identifying material composition of objects
CN113253298A (en) Vehicle lidar polarization
Gazis et al. Examining the sensors that enable self-driving vehicles
CN111587381A (en) Method for adjusting motion speed of scanning element, distance measuring device and mobile platform
JP2019128350A (en) Image processing method, image processing device, on-vehicle device, moving body and system
US20210333401A1 (en) Distance measuring device, point cloud data application method, sensing system, and movable platform
Steinbaeck et al. Occupancy grid fusion of low-level radar and time-of-flight sensor data
CN112136018A (en) Point cloud noise filtering method of distance measuring device, distance measuring device and mobile platform
CN114026461A (en) Method for constructing point cloud frame, target detection method, distance measuring device, movable platform and storage medium
WO2020237663A1 (en) Multi-channel lidar point cloud interpolation method and ranging apparatus
Rana et al. Comparative study of Automotive Sensor technologies used for Unmanned Driving
WO2022256976A1 (en) Method and system for constructing dense point cloud truth value data and electronic device
CN114080545A (en) Data processing method and device, laser radar and storage medium
Islam et al. Autonomous Driving Vehicle System Using LiDAR Sensor
JP7363835B2 (en) Object recognition system and object recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200925