CN117152693A - Object detection method, device, electronic apparatus, storage medium, and program product - Google Patents

Object detection method, device, electronic apparatus, storage medium, and program product

Info

Publication number
CN117152693A
Authority
CN
China
Prior art keywords
proposal, area, point cloud, laser radar, cloud data
Prior art date
Legal status
Pending
Application number
CN202210563596.0A
Other languages
Chinese (zh)
Inventor
王珂
Current Assignee
Tianjin Carl Power Technology Co ltd
Original Assignee
Tianjin Carl Power Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Carl Power Technology Co ltd
Priority to CN202210563596.0A
Publication of CN117152693A


Classifications

    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/40: Extraction of image or video features
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/766: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using regression, e.g. by projecting features on hyperplanes
    • G06V10/803: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of input or preprocessed data
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V2201/07: Target detection

Abstract

Embodiments of the present disclosure relate to a target detection method and apparatus, an electronic device, a storage medium, and a program product. The method comprises the following steps: acquiring point cloud data of a laser radar and sensing data of other sensing devices; performing region proposal processing based on the point cloud data to obtain a first area proposal, the first area proposal being used for representing a candidate area where a target object detected by the laser radar is located; performing region proposal processing based on the sensing data to obtain a second area proposal, the second area proposal being used for representing a candidate area where a target object detected by the other sensing devices is located; and performing detection according to the first area proposal and the second area proposal to obtain a target detection result, the target detection result comprising a target area where the target object is located and a category of the target object. The method can improve the perception capability of an autonomous vehicle.

Description

Object detection method, device, electronic apparatus, storage medium, and program product
Technical Field
Embodiments of the present disclosure relate to the technical field of target detection, and in particular to a target detection method and apparatus, an electronic device, a storage medium, and a program product.
Background
With the development of automotive technology, autonomous driving has emerged. Currently, an autonomous vehicle often detects targets using point cloud data collected by a laser radar, so as to perceive the environment around the vehicle.
Since environmental perception has a very important influence on the safe driving of an autonomous vehicle, how to improve the perception capability of the autonomous vehicle becomes a technical problem to be solved.
Disclosure of Invention
Embodiments of the present disclosure provide a target detection method, apparatus, electronic device, storage medium, and program product that can be used to improve the perception capability of an autonomous vehicle.
In a first aspect, an embodiment of the present disclosure provides a target detection method, the method including:
acquiring point cloud data of a laser radar and sensing data of other sensing devices;
performing region proposal processing based on the point cloud data to obtain a first area proposal; the first area proposal is used for representing a candidate area where a target object detected by the laser radar is located;
performing region proposal processing based on the sensing data to obtain a second area proposal; the second area proposal is used for representing a candidate area where a target object detected by the other sensing devices is located;
performing detection according to the first area proposal and the second area proposal to obtain a target detection result; the target detection result comprises a target area where the target object is located and a category of the target object.
In a second aspect, embodiments of the present disclosure provide an object detection apparatus, the apparatus comprising:
the data acquisition module is used for acquiring point cloud data of the laser radar and sensing data of other sensing devices;
a first area proposal module, configured to perform region proposal processing based on the point cloud data to obtain a first area proposal; the first area proposal is used for representing a candidate area where a target object detected by the laser radar is located;
a second area proposal module, configured to perform region proposal processing based on the sensing data to obtain a second area proposal; the second area proposal is used for representing a candidate area where a target object detected by the other sensing devices is located;
a target detection module, configured to perform detection according to the first area proposal and the second area proposal to obtain a target detection result; the target detection result comprises a target area where the target object is located and a category of the target object.
In a third aspect, an embodiment of the disclosure provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method of the first aspect when the processor executes the computer program.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect described above.
With the target detection method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present disclosure, point cloud data of the laser radar and sensing data of other sensing devices are acquired; region proposal processing is performed based on the point cloud data to obtain a first area proposal; region proposal processing is performed based on the sensing data to obtain a second area proposal; and detection is performed according to the first area proposal and the second area proposal to obtain a target detection result. In the embodiments of the present disclosure, all candidate areas where target objects may be located are obtained based on both the point cloud data of the laser radar and the sensing data of the other sensing devices, and target detection is then performed on all of these candidate areas, so a more accurate target detection result can be obtained. Compared with the existing approach of perceiving with the laser radar alone, combining the other sensing devices with the laser radar for perception can improve the perception capability of an autonomous vehicle.
Drawings
FIG. 1 is a diagram of an application environment for a target detection method in one embodiment;
FIG. 2 is a flow chart of a method of detecting targets in one embodiment;
FIG. 3 is a flow chart illustrating the detection steps according to the first region proposal and the second region proposal in one embodiment;
FIG. 4 is a flowchart illustrating the steps for obtaining a laser radar feature map based on point cloud data in one embodiment;
FIG. 5 is a flow diagram of the training steps of a two-dimensional instance segmentation model in one embodiment;
FIG. 6 is a flow chart of a method for detecting targets according to another embodiment;
FIG. 7 is a first block diagram of an object detection device in one embodiment;
FIG. 8 is a second block diagram of an object detection device in one embodiment;
FIG. 9 is a third block diagram of an object detection device in one embodiment;
FIG. 10 is an internal structural diagram of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the embodiments of the present disclosure will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosed embodiments and are not intended to limit the disclosed embodiments.
First, before the technical solutions of the embodiments of the present disclosure are described in detail, the technical background and technical evolution on which the embodiments are based are briefly introduced. In general, an autonomous vehicle often detects targets using point cloud data collected by a laser radar, so as to perceive the environment around the vehicle. Since environmental perception has a very important influence on the safe driving of an autonomous vehicle, how to improve the perception capability of the autonomous vehicle has become a technical problem to be solved.
Embodiments of the present disclosure provide a target detection scheme: point cloud data of a laser radar and sensing data of other sensing devices are acquired; region proposal processing is performed based on the point cloud data to obtain a first area proposal; region proposal processing is performed based on the sensing data to obtain a second area proposal; and detection is performed according to the first area proposal and the second area proposal to obtain a target detection result. In the embodiments of the present disclosure, all candidate areas where target objects may be located are obtained based on both the point cloud data of the laser radar and the sensing data of the other sensing devices, and target detection is then performed on all of these candidate areas, so a more accurate target detection result can be obtained. Compared with the existing approach of perceiving with the laser radar alone, combining the other sensing devices with the laser radar for perception can improve the perception capability of an autonomous vehicle. It should be noted that, from identifying the technical solution of combining other sensing devices with the laser radar to arriving at the embodiments described below, the applicant has made a great deal of creative effort.
The following describes a technical scheme related to an embodiment of the present disclosure in conjunction with a scenario in which the embodiment of the present disclosure is applied.
The target detection method provided by the embodiments of the present disclosure can be applied to the application environment shown in fig. 1. The application environment includes a vehicle 102 in which an electronic device, a laser radar, and other sensing devices are disposed. The laser radar and the other sensing devices perceive the environment around the vehicle, and the electronic device controls the vehicle according to the data they collect, thereby realizing automatic driving of the vehicle. The other sensing devices may include, but are not limited to, millimeter-wave radar and image acquisition devices such as various cameras, surround-view cameras, fisheye cameras, and infrared cameras.
In one embodiment, as shown in fig. 2, a target detection method is provided, and the method is applied to the electronic device in fig. 1 for illustration, and includes the following steps:
step 201, acquiring point cloud data of a laser radar and sensing data of other sensing devices.
During target detection, a laser radar mounted on the vehicle emits laser pulses into the surrounding environment and detects the returned laser pulses, thereby obtaining point cloud data. Other sensing devices mounted on the vehicle sense the environment around the vehicle to obtain sensing data. For example, a camera mounted on the vehicle photographs the surroundings of the vehicle to obtain an environment image. The embodiments of the present disclosure do not limit the other sensing devices or the sensing data.
The electronic device acquires the point cloud data from the laser radar and the sensing data from the other sensing devices.
Step 202, performing area proposal processing based on the point cloud data to obtain a first area proposal.
Here, the first area proposal is used for representing a candidate area where a target object detected by the laser radar is located. The target object may include at least one of a vehicle, a person, an animal, a plant, and a building; the embodiments of the present disclosure do not limit the target object. If there are multiple target objects, the first area proposal may include the candidate area where each target object is located, i.e. a plurality of region detection boxes.
After acquiring the point cloud data, the electronic device performs region proposal processing based on the point cloud data to obtain the first area proposal. Optionally, the point cloud data is input into a first Region Proposal Network, which outputs the first area proposal.
Step 203, performing region proposal processing based on the sensing data to obtain a second area proposal.
The second area proposal is used for representing a candidate area where a target object detected by the other sensing devices is located. If there are multiple target objects, the second area proposal includes the candidate area where each target object is located, i.e. a plurality of region detection boxes.
After acquiring the sensing data, the electronic device performs region proposal processing based on the sensing data to obtain the second area proposal. Optionally, the sensing data is input into a second region proposal network, which outputs the second area proposal.
For example, an environment image collected by a camera is input into the second region proposal network, which outputs a plurality of region detection boxes.
A region detection box may include the coordinates of the center point of the target object in the world coordinate system, the length, width, and height of the target object, the rotation angle of the target object about the z-axis of the world coordinate system, and the like. The embodiments of the present disclosure do not limit the region detection box; it can be set according to the actual situation.
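As a rough illustration only (the field names below are assumptions, not terms from the disclosure), such a region detection box could be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class DetectionBox3D:
    """One region detection box: a candidate 3D box for a target object.

    The disclosure only states that a box may carry the center position,
    size and yaw of the object; these field names are illustrative.
    """
    cx: float      # center x of the target object in the world coordinate system
    cy: float      # center y of the target object in the world coordinate system
    cz: float      # center z of the target object in the world coordinate system
    length: float  # object length
    width: float   # object width
    height: float  # object height
    yaw: float     # rotation angle around the z-axis of the world coordinate system
```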
Step 204, performing detection according to the first area proposal and the second area proposal to obtain a target detection result.
Here, the target detection result includes the target area where the target object is located and the category of the target object. The target area can be represented by a region detection box, and the categories of the target object can include vehicle, person, animal, plant, building, and the like. Alternatively, the category of the target object may be given at pixel level, i.e. each pixel has a corresponding category. The embodiments of the present disclosure do not limit the category of the target object.
Once the first area proposal and the second area proposal have been obtained, all candidate areas where target objects may be located are known. Detection is performed on all of these candidate areas to obtain the target area where each target object is located, and the target area is then recognized to obtain the category of the target object. The embodiments of the present disclosure do not limit the detection mode, which can be selected according to the actual situation.
In the above target detection method, point cloud data of the laser radar and sensing data of other sensing devices are acquired; region proposal processing is performed based on the point cloud data to obtain a first area proposal; region proposal processing is performed based on the sensing data to obtain a second area proposal; and detection is performed according to the first area proposal and the second area proposal to obtain a target detection result. In the embodiments of the present disclosure, all candidate areas where target objects may be located are obtained based on both the point cloud data of the laser radar and the sensing data of the other sensing devices, and target detection is then performed on all of these candidate areas, so a more accurate target detection result can be obtained. Compared with the existing approach of perceiving with the laser radar alone, combining the other sensing devices with the laser radar for perception can improve the perception capability of an autonomous vehicle.
In one embodiment, as shown in fig. 3, the process of detecting according to the first area proposal and the second area proposal to obtain the target detection result may include the following steps:
step 301, merging the first area proposal and the second area proposal to obtain the target area proposal.
Here, the target area proposal is used for representing the candidate areas where target objects detected by the laser radar and the other sensing devices are located.
When the first area proposal and the second area proposal are obtained, two groups of region detection boxes for the target objects are obtained, and the two groups can be merged to obtain the target area proposal. The target area proposal may include all region detection boxes of both groups, or the union of the two groups. The embodiments of the present disclosure do not limit the merging mode.
It can be understood that merging the first area proposal and the second area proposal combines the candidate areas where targets detected by the laser radar are located with the candidate areas where targets detected by the other sensing devices are located, so that all candidate areas where target objects may be located are covered.
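A minimal sketch of this merging step, assuming both proposals are lists of DetectionBox3D objects (as in the sketch above) expressed in the same coordinate system; the bird's-eye-view IoU deduplication and its threshold are assumptions, since the disclosure leaves the concrete merging mode open:

```python
from typing import List

def bev_iou(a: "DetectionBox3D", b: "DetectionBox3D") -> float:
    """Axis-aligned bird's-eye-view IoU of two boxes; yaw is ignored for simplicity."""
    ax0, ax1 = a.cx - a.length / 2, a.cx + a.length / 2
    ay0, ay1 = a.cy - a.width / 2, a.cy + a.width / 2
    bx0, bx1 = b.cx - b.length / 2, b.cx + b.length / 2
    by0, by1 = b.cy - b.width / 2, b.cy + b.width / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a.length * a.width + b.length * b.width - inter
    return inter / union if union > 0 else 0.0

def merge_proposals(first: List["DetectionBox3D"],
                    second: List["DetectionBox3D"],
                    iou_threshold: float = 0.7) -> List["DetectionBox3D"]:
    """Union of the two groups of region detection boxes.

    Boxes from the second proposal that overlap an already kept box by more
    than iou_threshold are treated as duplicates of that box and dropped.
    """
    merged = list(first)
    for box in second:
        if all(bev_iou(box, kept) < iou_threshold for kept in merged):
            merged.append(box)
    return merged
```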
Step 302, acquiring a laser radar feature map according to the point cloud data.
After acquiring the point cloud data, the electronic device can perform feature extraction based on the point cloud data to obtain a laser radar feature map. The feature extraction may be performed with a neural network model or in other ways; this is not limited in the embodiments of the present disclosure and can be set according to the actual situation.
Step 303, determining an object feature map corresponding to the target object according to the target area proposal and the laser radar feature map.
After the target area proposal and the laser radar feature map are determined, the area of each target object in the laser radar feature map is determined according to the target area proposal, and the object feature map corresponding to the target object is extracted from the laser radar feature map according to that area.
Optionally, when the object feature maps are extracted from the laser radar feature map, the feature maps of different target objects may be resampled with a preset algorithm to the same resolution, so that the subsequent instance segmentation can be performed in batch, improving the detection efficiency. The preset algorithm may include a bilinear interpolation algorithm; it is not limited in the embodiments of the present disclosure.
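A minimal sketch of this resampling, assuming the laser radar feature map is a torch tensor and each candidate region has already been expressed as an axis-aligned rectangle in feature-map pixel coordinates; torchvision's roi_align is used here as one possible bilinear resampler, which the disclosure does not prescribe:

```python
import torch
from torchvision.ops import roi_align

def crop_object_features(feature_map: torch.Tensor,
                         regions: torch.Tensor,
                         out_size: int = 14) -> torch.Tensor:
    """Extract one fixed-resolution object feature map per candidate region.

    feature_map: (1, C, H, W) laser radar feature map.
    regions:     (N, 4) candidate regions as (x1, y1, x2, y2) in feature-map pixels.
    Returns an (N, C, out_size, out_size) tensor: bilinear sampling brings every
    target object to the same resolution, so instance segmentation can run in batch.
    """
    regions = regions.to(feature_map.dtype)
    batch_index = torch.zeros((regions.shape[0], 1), dtype=regions.dtype)
    rois = torch.cat([batch_index, regions], dim=1)  # (N, 5): [batch_idx, x1, y1, x2, y2]
    return roi_align(feature_map, rois, output_size=(out_size, out_size), aligned=True)
```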
Step 304, performing three-dimensional instance segmentation on the object feature map to obtain a target detection result.
After the object feature map is determined, three-dimensional instance segmentation is performed on it. Optionally, the object feature map is input into a pre-trained three-dimensional instance segmentation model, and the model outputs the target detection result.
Instance segmentation detects different instances with a target detection algorithm and then labels each instance region pixel by pixel with a semantic segmentation algorithm.
Optionally, when computations such as target tracking are required, the target detection result output by the three-dimensional instance segmentation model may include only the target area where the target object is located, which reduces the computing power required for subsequent target tracking.
In the above embodiment, the first area proposal and the second area proposal are combined to obtain the target area proposal; acquiring a laser radar feature map according to the point cloud data; determining an object feature map corresponding to the target object according to the target area proposal and the laser radar feature map; and carrying out three-dimensional instance segmentation on the object feature map to obtain a target detection result. According to the embodiment of the disclosure, the laser radar is combined with other sensing equipment to perform environment sensing, so that the sensing capability of the automatic driving vehicle can be improved.
In one embodiment, as shown in fig. 4, the process of obtaining a laser radar feature map according to the point cloud data may include the following steps:
Step 3021, performing mapping processing on the point cloud data according to the mapping relationship between the point cloud data and the depth map to obtain a laser radar depth map.
The mapping relationship between the point cloud data and the depth map can be preset in the electronic device; after the point cloud data is acquired, it is mapped according to this relationship to obtain the corresponding laser radar depth map.
The mapping process may include: mapping each point cloud point in the point cloud data from a reference coordinate system to a spherical coordinate system to obtain the position of each point in the spherical coordinate system; determining the angle between that position and the x-axis of the spherical coordinate system as a first included angle, and the angle between that position and the z-axis of the spherical coordinate system as a second included angle; and taking the first included angle as the abscissa, the second included angle as the ordinate, and the surface reflectivity information and/or depth information of the point cloud point as the pixel value to obtain the laser radar depth map.
For example, suppose the reference coordinate system is the world coordinate system and the position of point cloud point 1 in the world coordinate system is (x1, y1, z1). Mapping this position according to the mapping relationship gives its position in the spherical coordinate system as (r1, θ1, φ1), where r1 is the distance from the point to the origin of the spherical coordinate system, θ1 is the first included angle, and φ1 is the second included angle. Taking θ1 as the abscissa, φ1 as the ordinate, and the surface reflectivity information of point cloud point 1 as the pixel value yields the laser radar depth map. Alternatively, taking θ1 as the abscissa, φ1 as the ordinate, the surface reflectivity information of point cloud point 1 as the pixel value of one channel, and the depth information of point cloud point 1 as the pixel value of another channel also yields the laser radar depth map. The pixel value can also use prior information such as projection errors and classification results; it is not limited here and can be set according to the actual situation.
For the mechanical lidar, the mapping relationship between the point cloud data and the depth map includes: each row of pixels on the laser radar depth map corresponds to one laser scanning line (beam); each pixel in each row corresponds to one scan point of the lidar.
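Putting the above together, a minimal numpy sketch of the point-cloud-to-depth-map mapping could look as follows; the image size, the assumed angular ranges and the nearest-pixel rasterisation are illustrative choices rather than values given in the disclosure:

```python
import numpy as np

def spherical_pixel(points: np.ndarray, height: int = 64, width: int = 1024):
    """Map (N, 3) points to (row, col) depth-map pixels via the two included angles."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)                           # distance to the origin
    azimuth = np.arctan2(y, x)                                      # first included angle (with the x-axis)
    polar = np.arccos(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))  # second included angle (with the z-axis)
    col = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(np.int64)
    row = (polar / np.pi * (height - 1)).astype(np.int64)           # assumed full [0, pi] vertical range
    return row, col, r

def point_cloud_to_depth_map(points: np.ndarray, reflectivity: np.ndarray,
                             height: int = 64, width: int = 1024) -> np.ndarray:
    """Build a two-channel laser radar depth map: channel 0 reflectivity, channel 1 depth.

    For a mechanical lidar each row would correspond to one beam; here the
    second included angle is simply discretised into `height` bins.
    """
    row, col, r = spherical_pixel(points, height, width)
    depth_map = np.zeros((height, width, 2), dtype=np.float32)
    depth_map[row, col, 0] = reflectivity
    depth_map[row, col, 1] = r
    return depth_map
```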
Optionally, before the mapping, the partial point clouds collected by multiple laser radars may be stitched together to obtain the complete point cloud data, and the mapping is then applied to all of the point cloud data to obtain the laser radar depth map.
Optionally, after obtaining the laser radar depth map, downsampling may be performed on the laser radar depth map to reduce the resolution of the laser radar depth map, so as to reduce the computational effort requirements of subsequent feature extraction processes.
Step 3022, performing feature extraction on the laser radar depth map to obtain a laser radar feature map.
After the laser radar depth map is obtained, feature extraction is performed on it. Optionally, the laser radar depth map is input into a pre-trained backbone network for feature extraction, and the backbone network outputs the laser radar feature map.
The backbone network can be a combination of a convolutional neural network and a feature pyramid network, or a Transformer-based feature extraction network. The embodiments of the present disclosure do not limit the backbone network.
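Purely as an illustration of the feature extraction step (the disclosure does not prescribe a network), a small convolutional backbone over the two-channel depth map could be sketched as follows; a real implementation would more likely use a pre-trained CNN plus feature pyramid or a Transformer-based extractor as mentioned above:

```python
import torch
import torch.nn as nn

class DepthMapBackbone(nn.Module):
    """Deliberately small stand-in for the pre-trained backbone network."""

    def __init__(self, in_channels: int = 2, out_channels: int = 128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, depth_map: torch.Tensor) -> torch.Tensor:
        # depth_map: (B, 2, H, W) laser radar depth map -> (B, 128, H/4, W/4) feature map
        return self.stem(depth_map)

# Usage sketch: a batch containing one 64 x 1024 two-channel depth map.
lidar_feature_map = DepthMapBackbone()(torch.randn(1, 2, 64, 1024))
```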
In the above embodiment, the point cloud data is mapped according to the mapping relationship between the point cloud data and the depth map to obtain a laser radar depth map, and feature extraction is performed on the laser radar depth map to obtain the laser radar feature map. Compared with current methods that rasterize the point cloud data into a grid, the laser radar depth map retains more three-dimensional spatial information and is better suited to learning by a neural network model, so a larger receptive field and more global information can be obtained, improving the detection precision of target detection.
In one embodiment, the process of performing region proposal processing based on the point cloud data to obtain the first area proposal may include: inputting the laser radar feature map into a pre-trained two-dimensional instance segmentation model to obtain the first area proposal output by the two-dimensional instance segmentation model.
A pre-trained two-dimensional instance segmentation model can be stored in the electronic device. When region proposal processing is performed based on the point cloud data, the point cloud data is first mapped according to the mapping relationship between the point cloud data and the depth map to obtain the laser radar depth map, and feature extraction is then performed on the laser radar depth map to obtain the laser radar feature map. The extracted laser radar feature map is then input into the two-dimensional instance segmentation model, which outputs the first area proposal.
It can be appreciated that, by using a pre-trained two-dimensional instance segmentation model, the candidate areas where target objects detected by the laser radar are located can be obtained quickly and accurately.
On the basis of the above embodiment, as shown in fig. 5, the training process of the two-dimensional instance segmentation model may include the following steps:
step 401, obtaining a plurality of sample point cloud data and labels corresponding to the sample point cloud data.
Here, a label marks the three-dimensional region where a sample object is located.
A plurality of sample point cloud data are acquired, and the labels corresponding to the sample point cloud data can be obtained by manual annotation. Other annotation methods may also be used; the embodiments of the present disclosure are not limited in this regard.
Step 402, obtaining a corresponding sample feature map according to each sample point cloud data.
Each sample point cloud data is mapped according to the mapping relationship between the point cloud data and the depth map to obtain the corresponding sample depth map, and feature extraction is then performed on each sample depth map to obtain the corresponding sample feature map. For the mapping and the feature extraction, reference may be made to the description of the above embodiments, which is not repeated here.
Step 403, mapping the labels according to the mapping relationship between the point cloud data and the depth map to obtain the two-dimensional regions where the sample objects are located.
Similarly, the label corresponding to each sample point cloud data is also mapped, i.e. the three-dimensional region where the sample object is located is mapped into the two-dimensional space to obtain the two-dimensional region where the sample object is located. The two-dimensional region can be represented by a region detection box surrounding the sample object.
Optionally, the label corresponding to the sample point cloud data further includes a three-dimensional segmentation mask, and after mapping the label, the two-dimensional segmentation mask can also be obtained.
It can be understood that under the condition of the existing three-dimensional annotation, the two-dimensional annotation can be obtained quickly and conveniently according to the mapping relation, so that the model training efficiency is improved.
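Reusing the spherical_pixel sketch from the depth-map section above, the mapping of one annotated 3D region into a 2D region could look roughly like this (the corner-projection-plus-bounding-rectangle strategy is an assumption, not a step spelled out in the disclosure):

```python
import numpy as np

def box3d_label_to_2d_region(corners_3d: np.ndarray,
                             height: int = 64, width: int = 1024):
    """Project the 8 annotated corners of a sample object's 3D box into the
    depth map and take their bounding rectangle as the 2D region label.

    corners_3d: (8, 3) corner coordinates in the lidar reference frame.
    Returns (row_min, col_min, row_max, col_max) in depth-map pixels.
    """
    row, col, _ = spherical_pixel(corners_3d, height, width)
    return row.min(), col.min(), row.max(), col.max()
```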
Step 404, performing model training according to the sample feature maps and the two-dimensional regions where the sample objects are located to obtain a two-dimensional instance segmentation model.
After the sample feature maps and the two-dimensional regions where the sample objects are located have been determined, model training is performed with the sample feature maps as the model input. A loss value between the model output and the two-dimensional region where the sample object is located is computed; if the loss value does not meet a preset convergence condition, the model parameters are adjusted and training continues. Training ends when the loss value meets the preset convergence condition, and the model at that point is taken as the two-dimensional instance segmentation model. The embodiments of the present disclosure do not limit the preset convergence condition.
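A compressed sketch of that loop, assuming a torch model, a dataloader yielding (sample feature map, two-dimensional region) pairs, and a simple loss-threshold convergence test; the optimizer, the loss and the threshold are placeholders, not choices made by the disclosure:

```python
import torch

def train_2d_instance_segmentation(model, dataloader, epochs: int = 50,
                                   loss_threshold: float = 0.05, lr: float = 1e-3):
    """Train until the loss meets an assumed preset convergence condition."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.SmoothL1Loss()   # stand-in for a real detection/segmentation loss
    for _ in range(epochs):
        for sample_feature_map, region_2d in dataloader:
            prediction = model(sample_feature_map)
            loss = criterion(prediction, region_2d)
            optimizer.zero_grad()         # adjust model parameters while not yet converged
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:  # assumed form of the preset convergence condition
            break
    return model
```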
In the above embodiment, a plurality of sample point cloud data and the labels corresponding to the sample point cloud data are acquired; a corresponding sample feature map is obtained from each sample point cloud data; the labels are mapped according to the mapping relationship between the point cloud data and the depth map to obtain the two-dimensional regions where the sample objects are located; and model training is performed according to the sample feature maps and the two-dimensional regions to obtain the two-dimensional instance segmentation model. Because the two-dimensional instance segmentation model is trained in advance, the first area proposal can be obtained quickly and accurately when region proposal processing is performed based on the point cloud data, which improves the efficiency of target detection.
In one embodiment, the process of performing the region proposal based on the perception data to obtain the second region proposal may include: performing target detection on the perception data to obtain a third region proposal; and carrying out coordinate conversion processing on the third region proposal according to the position relation between other sensing devices and the laser radar and the mapping relation between the point cloud data and the depth map to obtain a second region proposal.
A preset detection algorithm is used to perform target detection on the sensing data to obtain the third area proposal. From the intrinsic and extrinsic parameters of the other sensing devices and of the laser radar, and the positional relationship between the other sensing devices and the laser radar, a coordinate transformation between the sensing data and the point cloud data can be established, and the third area proposal can be transformed into the coordinate system of the point cloud data accordingly. The region proposal in the point cloud coordinate system is then mapped into the depth map coordinate system according to the mapping relationship between the point cloud data and the depth map, yielding the second area proposal.
Taking an environment image as the sensing data for example, target detection is performed on the environment image to obtain the third area proposal, which is then converted into the point cloud coordinate system and further into the depth map coordinate system to obtain the second area proposal. It can be appreciated that camera vision algorithms can provide very robust, long-range detection results, i.e. the third area proposal has high accuracy when the sensing data is image data.
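A rough sketch of the two-stage conversion, assuming the third area proposal already carries 3D positions in the camera coordinate system and that the extrinsic transform between the camera and the laser radar is known; the 4x4 matrix form and the reuse of the spherical_pixel mapping from the earlier sketch are assumptions:

```python
import numpy as np

def third_proposal_to_depth_map(points_cam: np.ndarray,
                                cam_to_lidar: np.ndarray,
                                height: int = 64, width: int = 1024) -> np.ndarray:
    """Convert (N, 3) positions from the camera frame to depth-map pixel coordinates.

    cam_to_lidar: 4x4 homogeneous transform built from the relative pose
    (calibration) of the camera and the laser radar.
    """
    homogeneous = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])
    points_lidar = (cam_to_lidar @ homogeneous.T).T[:, :3]      # step 1: into the point cloud coordinate system
    row, col, _ = spherical_pixel(points_lidar, height, width)  # step 2: same mapping as the lidar depth map
    return np.stack([row, col], axis=1)
```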
In the above embodiment, the target detection is performed on the sensing data to obtain the third area proposal; and carrying out coordinate conversion processing on the third region proposal according to the position relation between other sensing devices and the laser radar and the mapping relation between the point cloud data and the depth map to obtain a second region proposal. The embodiment of the disclosure converts the third region proposal detected from the perception data into the second region proposal under the depth map coordinate system, which is convenient for combining the second region proposal and the first region proposal, thereby obtaining all the candidate regions where the target object is located and providing detection basis for target detection.
In one embodiment, as shown in fig. 6, a target detection method is provided, and the method is applied to the electronic device in fig. 1 for illustration, and includes the following steps:
step 501, obtaining point cloud data of a laser radar and sensing data of other sensing devices.
Step 502, performing mapping processing on the point cloud data according to the mapping relationship between the point cloud data and the depth map to obtain a laser radar depth map.
Step 503, performing feature extraction on the laser radar depth map to obtain a laser radar feature map.
Step 504, inputting the laser radar feature map into a pre-trained two-dimensional instance segmentation model to obtain a first area proposal output by the two-dimensional instance segmentation model.
Step 505, performing target detection on the sensing data to obtain a third area proposal, and performing coordinate transformation on the third area proposal according to the positional relationship between the other sensing devices and the laser radar and the mapping relationship between the point cloud data and the depth map to obtain a second area proposal.
Step 506, merging the first area proposal and the second area proposal to obtain the target area proposal.
Step 507, determining an object feature map corresponding to the target object according to the target area proposal and the laser radar feature map.
Step 508, performing three-dimensional instance segmentation on the object feature map to obtain a target detection result.
In the above embodiment, point cloud data of the laser radar and sensing data of other sensing devices are acquired, the point cloud data is mapped into a laser radar depth map, and feature extraction is performed on the laser radar depth map to obtain a laser radar feature map. A first area proposal is then obtained using the two-dimensional instance segmentation model and the laser radar feature map. While the first area proposal is determined, a second area proposal is determined from the sensing data. The first area proposal and the second area proposal are then merged to obtain a target area proposal; the object feature maps of the target objects are extracted from the laser radar feature map; and finally detection is performed according to the target area proposal and the object feature maps to obtain a target detection result. Because the other sensing devices are combined with the laser radar for environment perception, the perception capability of an autonomous vehicle can be improved.
It should be understood that, although the steps in the flowcharts of figs. 2 to 6 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in figs. 2 to 6 may include sub-steps or stages that are not necessarily completed at the same moment but may be executed at different moments, and these sub-steps or stages are not necessarily executed in sequence but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided an object detection apparatus including:
the data acquisition module 601 is configured to acquire point cloud data of a laser radar and sensing data of other sensing devices;
a first area proposal module 602, configured to perform region proposal processing based on the point cloud data to obtain a first area proposal; the first area proposal is used for representing a candidate area where a target object detected by the laser radar is located;
a second area proposal module 603, configured to perform region proposal processing based on the sensing data to obtain a second area proposal; the second area proposal is used for representing a candidate area where a target object detected by the other sensing devices is located;
a target detection module 604, configured to perform detection according to the first area proposal and the second area proposal to obtain a target detection result; the target detection result comprises a target area where the target object is located and a category of the target object.
In one embodiment, as shown in FIG. 8, the object detection module 604 includes:
a merging submodule 6041, configured to merge the first area proposal and the second area proposal to obtain a target area proposal; the target area proposal is used for representing a candidate area where a target object detected by the laser radar and the other sensing devices is located;
a first feature map acquisition submodule 6042 for acquiring a laser radar feature map according to the point cloud data;
a second feature map acquisition submodule 6043, configured to determine an object feature map corresponding to the target object according to the target area proposal and the lidar feature map;
the detection submodule 6044 is used for performing three-dimensional instance segmentation on the object feature map to obtain a target detection result.
In one embodiment, the first feature map obtaining submodule 6042 is specifically configured to map the point cloud data according to a mapping relationship between the point cloud data and the depth map, so as to obtain a laser radar depth map; and extracting features of the laser radar depth map to obtain a laser radar feature map.
In one embodiment, the first feature map obtaining submodule 6042 is specifically configured to input a laser radar depth map into a pre-trained backbone network for feature extraction, so as to obtain a laser radar feature map output by the backbone network.
In one embodiment, the first feature map obtaining submodule 6042 is specifically configured to map each point cloud point in the point cloud data from a reference coordinate system to a spherical coordinate system, obtain a position of each point cloud point in the spherical coordinate system, determine an included angle between the position of each point cloud point in the spherical coordinate system and an x-axis of the spherical coordinate system as a first included angle, and determine an included angle between the position of each point cloud point in the spherical coordinate system and a z-axis of the spherical coordinate system as a second included angle; and taking the first included angle as an abscissa, the second included angle as an ordinate, and the surface reflectivity information and/or the depth information of the point cloud point as pixel values to obtain the laser radar depth map.
In one embodiment, the detection submodule 6044 is specifically configured to input the object feature map into a pre-trained three-dimensional instance segmentation model, so as to obtain a target detection result output by the three-dimensional instance segmentation model.
In one embodiment, the first region proposal module 602 is specifically configured to input the lidar feature map into a pre-trained two-dimensional instance segmentation model, and obtain a first region proposal output by the two-dimensional instance segmentation model.
In one embodiment, as shown in fig. 9, the apparatus further includes:
the sample obtaining module 605, configured to obtain a plurality of sample point cloud data and the labels corresponding to the sample point cloud data; the labels mark the three-dimensional regions where sample objects are located;
the sample feature map obtaining module 606 is configured to obtain a corresponding sample feature map according to each sample point cloud data;
the annotation mapping module 607 is configured to map the annotation according to the mapping relationship between the point cloud data and the depth map, so as to obtain a two-dimensional area where the sample object is located;
and the model training module is used for carrying out model training according to the sample feature map and the two-dimensional area where the sample object is positioned to obtain a two-dimensional instance segmentation model.
In one embodiment, the second area proposal module 603 is specifically configured to perform target detection on the sensing data to obtain a third area proposal, and to perform coordinate transformation on the third area proposal according to the positional relationship between the other sensing devices and the laser radar and the mapping relationship between the point cloud data and the depth map to obtain the second area proposal.
For specific limitations of the object detection device, reference may be made to the above limitations of the object detection method, and no further description is given here. The respective modules in the above-described object detection apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the electronic device, or may be stored in software in a memory in the electronic device, so that the processor may call and execute operations corresponding to the above modules.
Fig. 10 is a block diagram of an electronic device 1300, according to an example embodiment. For example, the electronic device 1300 may be an in-vehicle center control, a mobile phone, a digital broadcast terminal, a messaging device, a tablet device, a personal digital assistant, or the like.
Referring to fig. 10, an electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316. Wherein the memory has stored thereon a computer program or instructions that run on the processor.
The processing component 1302 generally controls overall operation of the electronic device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 1302 can include one or more modules that facilitate interactions between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operations at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly 1306 provides power to the various components of the electronic device 1300. The power components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1300.
The multimedia component 1308 includes a touch-sensitive display screen that provides an output interface between the electronic device 1300 and a user. In some embodiments, the touch display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1300 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1300 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1314 includes one or more sensors for providing status assessments of various aspects of the electronic device 1300. For example, the sensor assembly 1314 may detect an on/off state of the electronic device 1300 and the relative positioning of components such as its display and keypad; it may also detect a change in position of the electronic device 1300 or of one of its components, the presence or absence of user contact with the electronic device 1300, the orientation or acceleration/deceleration of the electronic device 1300, and changes in its temperature. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate communication between the electronic device 1300 and other devices, either wired or wireless. The electronic device 1300 may access a wireless network based on a communication standard, such as WiFi,2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1316 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described object detection methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as memory 1304, including instructions executable by processor 1320 of electronic device 1300 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided, which, when being executed by a processor, may implement the above-mentioned method. The computer program product includes one or more computer instructions. When loaded and executed on a computer, these computer instructions may implement some or all of the methods described above, in whole or in part, in accordance with the processes or functions described in embodiments of the present disclosure.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the flows of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above embodiments merely represent a few implementations of the present disclosure; although they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the disclosed embodiments, and such modifications and improvements all fall within the scope of protection. Accordingly, the scope of protection of the disclosed embodiments shall be subject to the appended claims.

Claims (13)

1. A method of target detection, the method comprising:
acquiring point cloud data of a laser radar and sensing data of other sensing devices;
performing region proposal processing based on the point cloud data to obtain a first region proposal; the first region proposal is used for representing a candidate area where the target object detected by the laser radar is located;
performing region proposal processing based on the sensing data to obtain a second region proposal; the second region proposal is used for representing a candidate area where the target object detected by the other sensing devices is located;
detecting according to the first region proposal and the second region proposal to obtain a target detection result; the target detection result comprises a target area where the target object is located and a category of the target object.
2. The method of claim 1, wherein the detecting according to the first region proposal and the second region proposal to obtain the target detection result comprises:
combining the first region proposal and the second region proposal to obtain a target region proposal; the target region proposal is used for representing a candidate area where the target object detected by the laser radar and the other sensing devices is located;
acquiring a laser radar feature map according to the point cloud data;
determining an object feature map corresponding to the target object according to the target region proposal and the laser radar feature map;
and carrying out three-dimensional instance segmentation on the object feature map to obtain the target detection result.
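For illustration only, a minimal Python sketch of the fusion-and-crop steps recited in claim 2, assuming that proposals are axis-aligned boxes in the depth-map pixel plane and that the laser radar feature map is an (H, W, C) array; the box format, the merge rule, and the array shapes are assumptions, and the three-dimensional instance segmentation step itself is omitted.

import numpy as np
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (u_min, v_min, u_max, v_max) in depth-map pixels

def merge_proposals(first: List[Box], second: List[Box]) -> List[Box]:
    # Simplest possible fusion: the union of both candidate sets.
    # A real system might additionally deduplicate overlapping boxes.
    return list(first) + list(second)

def crop_object_features(feature_map: np.ndarray, box: Box) -> np.ndarray:
    # Slice the object feature map for one target region proposal.
    u0, v0, u1, v1 = box
    return feature_map[v0:v1, u0:u1, :]

feature_map = np.random.rand(64, 1024, 32).astype(np.float32)   # assumed lidar feature map
proposals = merge_proposals([(100, 10, 180, 40)], [(400, 5, 460, 30)])
object_features = [crop_object_features(feature_map, b) for b in proposals]
# Each crop would then be fed to the three-dimensional instance segmentation model (claim 6).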
3. The method of claim 2, wherein the acquiring a laser radar feature map according to the point cloud data comprises:
performing mapping processing on the point cloud data according to a mapping relation between the point cloud data and the depth map to obtain a laser radar depth map;
and extracting features of the laser radar depth map to obtain the laser radar feature map.
4. The method of claim 3, wherein the extracting features of the laser radar depth map to obtain the laser radar feature map comprises:
and inputting the laser radar depth map into a pre-trained backbone network for feature extraction, and obtaining the laser radar feature map output by the backbone network.
5. The method of claim 3, wherein the mapping the point cloud data according to the mapping relationship between the point cloud data and the depth map to obtain the laser radar depth map comprises:
mapping each point cloud point in the point cloud data from a reference coordinate system to a spherical coordinate system to obtain the position of each point cloud point in the spherical coordinate system, determining an included angle between the position of each point cloud point in the spherical coordinate system and the x-axis of the spherical coordinate system as a first included angle, and determining an included angle between the position of each point cloud point in the spherical coordinate system and the z-axis of the spherical coordinate system as a second included angle;
and taking the first included angle as an abscissa, the second included angle as an ordinate, and the surface reflectivity information and/or the depth information of the point cloud point as pixel values to obtain the laser radar depth map.
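As an illustrative reading of claim 5 only: a numpy sketch that computes the two included angles for each point cloud point and writes depth and reflectivity into a depth map. The (x, y, z, reflectivity) array layout, the image size, and the angular scaling are assumptions, not values fixed by the claim.

import numpy as np

def point_cloud_to_depth_map(points: np.ndarray,
                             width: int = 1024,
                             height: int = 64) -> np.ndarray:
    # points: (N, 4) array of (x, y, z, reflectivity) in the lidar reference frame.
    x, y, z, refl = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    depth = np.sqrt(x**2 + y**2 + z**2)

    azimuth = np.arctan2(y, x)                                              # first included angle (with the x-axis)
    elevation = np.arccos(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))  # second included angle (with the z-axis)

    # Abscissa from the first angle, ordinate from the second angle.
    u = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    v = (elevation / np.pi * (height - 1)).astype(int)

    # Pixel values: depth information and surface reflectivity information.
    depth_map = np.zeros((height, width, 2), dtype=np.float32)
    depth_map[v, u, 0] = depth
    depth_map[v, u, 1] = refl
    return depth_map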
6. The method according to claim 2, wherein the performing three-dimensional instance segmentation on the object feature map to obtain the target detection result includes:
and inputting the object feature map into a pre-trained three-dimensional instance segmentation model to obtain the target detection result output by the three-dimensional instance segmentation model.
7. The method according to any one of claims 2 to 6, wherein the performing region proposal processing based on the point cloud data to obtain a first region proposal includes:
and inputting the laser radar feature map into a pre-trained two-dimensional instance segmentation model to obtain the first region proposal output by the two-dimensional instance segmentation model.
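A hedged sketch of the inference step in claim 7, assuming, purely for illustration, that the pre-trained two-dimensional instance segmentation model is a torchvision Mask R-CNN and that the laser radar feature map has been rendered to three channels; the patent does not name a specific architecture or input format.

import torch
import torchvision

# Stand-in for the pre-trained two-dimensional instance segmentation model;
# in practice it would be trained on lidar depth maps as described in claim 8.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Assumption: the laser radar feature map rendered as a 3-channel tensor (C, H, W).
lidar_image = torch.rand(3, 64, 1024)

with torch.no_grad():
    outputs = model([lidar_image])

# Each detected instance box is treated as one first region proposal.
first_region_proposals = outputs[0]["boxes"]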
8. The method of claim 7, wherein the training process of the two-dimensional instance segmentation model comprises:
acquiring a plurality of sample point cloud data and labels corresponding to the sample point cloud data; the label is a three-dimensional area where the sample object is located;
acquiring a corresponding sample feature map according to each sample point cloud data;
mapping the label according to the mapping relation between the point cloud data and the depth map to obtain a two-dimensional area where the sample object is located;
and performing model training according to the sample feature map and the two-dimensional area where the sample object is located to obtain the two-dimensional instance segmentation model.
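A minimal sketch of the label-projection step in claim 8: the three-dimensional label, assumed here to be given by the eight corners of a box in the lidar frame, is pushed through the same angle mapping used for the depth map, and the two-dimensional area is taken as the bounding box of the projected corners. The corner layout and the reuse of the claim-5 angle formulas are assumptions; the subsequent model-training step is omitted.

import numpy as np

def project_3d_label_to_2d(corners: np.ndarray,
                           width: int = 1024,
                           height: int = 64) -> tuple:
    # corners: (8, 3) array of box-corner coordinates in the lidar reference frame.
    x, y, z = corners[:, 0], corners[:, 1], corners[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                                           # first included angle
    elevation = np.arccos(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))   # second included angle
    u = (azimuth + np.pi) / (2 * np.pi) * (width - 1)
    v = elevation / np.pi * (height - 1)
    # Two-dimensional area where the sample object is located.
    return (int(u.min()), int(v.min()), int(u.max()), int(v.max()))

corners = np.array([[5, 1, 0], [5, 2, 0], [6, 1, 0], [6, 2, 0],
                    [5, 1, 1], [5, 2, 1], [6, 1, 1], [6, 2, 1]], dtype=float)
print(project_3d_label_to_2d(corners))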
9. The method according to any one of claims 1 to 6, wherein the performing region proposal processing based on the sensing data to obtain a second region proposal comprises:
performing target detection on the sensing data to obtain a third region proposal;
and carrying out coordinate conversion processing on the third region proposal according to the positional relationship between the other sensing devices and the laser radar and the mapping relation between the point cloud data and the depth map, to obtain the second region proposal.
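Purely illustrative: one way to realize the conversion in claim 9, under the assumptions that the third region proposal carries 3D points in the other sensing device's frame, that the positional relationship is a rigid transform (R, t), and that the point-cloud-to-depth-map mapping is the spherical one sketched under claim 5.

import numpy as np

def third_proposal_to_second_proposal(points_sensor: np.ndarray,
                                      R: np.ndarray, t: np.ndarray,
                                      width: int = 1024, height: int = 64):
    # points_sensor: (N, 3) points of the third region proposal in the other sensor's frame.
    # R, t: assumed rotation and translation from that frame to the lidar frame.
    points_lidar = points_sensor @ R.T + t
    x, y, z = points_lidar[:, 0], points_lidar[:, 1], points_lidar[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    u = (np.arctan2(y, x) + np.pi) / (2 * np.pi) * (width - 1)
    v = np.arccos(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0)) / np.pi * (height - 1)
    # Second region proposal: bounding box of the projected points in the depth map.
    return (int(u.min()), int(v.min()), int(u.max()), int(v.max()))

R, t = np.eye(3), np.array([0.0, 0.0, 0.2])                      # assumed extrinsics
pts = np.array([[4.0, 0.5, 0.0], [4.5, 1.0, 0.5], [5.0, 0.8, 0.2]])
print(third_proposal_to_second_proposal(pts, R, t))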
10. An object detection device, the device comprising:
the data acquisition module is used for acquiring point cloud data of the laser radar and sensing data of other sensing devices;
the first area proposal module is used for carrying out area proposal processing based on the point cloud data to obtain a first area proposal; the first area proposal is used for representing an alternative area where the target object detected by the laser radar is located;
The second area proposal module is used for carrying out area proposal processing based on the perception data to obtain a second area proposal; the second area proposal is used for representing an alternative area where the target object detected by the other perception equipment is located;
the target detection module is used for detecting according to the first area proposal and the second area proposal to obtain a target detection result; the target detection result comprises a target area where the target object is located and a category of the target object.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.
12. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
CN202210563596.0A 2022-05-23 2022-05-23 Object detection method, device, electronic apparatus, storage medium, and program product Pending CN117152693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210563596.0A CN117152693A (en) 2022-05-23 2022-05-23 Object detection method, device, electronic apparatus, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210563596.0A CN117152693A (en) 2022-05-23 2022-05-23 Object detection method, device, electronic apparatus, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN117152693A true CN117152693A (en) 2023-12-01

Family

ID=88906776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210563596.0A Pending CN117152693A (en) 2022-05-23 2022-05-23 Object detection method, device, electronic apparatus, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN117152693A (en)

Similar Documents

Publication Publication Date Title
CN109870157B (en) Method and device for determining pose of vehicle body and mapping method
CN106778773B (en) Method and device for positioning target object in picture
CN110443366B (en) Neural network optimization method and device, and target detection method and device
US20210158560A1 (en) Method and device for obtaining localization information and storage medium
CN106557759B (en) Signpost information acquisition method and device
CN110751659B (en) Image segmentation method and device, terminal and storage medium
CN114267041B (en) Method and device for identifying object in scene
CN110930351A (en) Light spot detection method and device and electronic equipment
CN115641518A (en) View sensing network model for unmanned aerial vehicle and target detection method
CN116824533A (en) Remote small target point cloud data characteristic enhancement method based on attention mechanism
CN115861741B (en) Target calibration method and device, electronic equipment, storage medium and vehicle
CN113627277A (en) Method and device for identifying parking space
US20230048952A1 (en) Image registration method and electronic device
CN113450459A (en) Method and device for constructing three-dimensional model of target object
EP4261565A1 (en) Object detection method and apparatus for vehicle, device, vehicle and medium
CN111832338A (en) Object detection method and device, electronic equipment and storage medium
CN111444749A (en) Method and device for identifying road surface guide mark and storage medium
CN115223143A (en) Image processing method, apparatus, device, and medium for automatically driving vehicle
CN117152693A (en) Object detection method, device, electronic apparatus, storage medium, and program product
CN113065392A (en) Robot tracking method and device
CN113627276A (en) Method and device for detecting parking space
CN116740158B (en) Image depth determining method, device and storage medium
CN116757965B (en) Image enhancement method, device and storage medium
CN116434016B (en) Image information enhancement method, model training method, device, equipment and medium
CN115082473B (en) Dirt detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination