CN113761999A - Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN113761999A
Authority
CN
China
Prior art keywords
point cloud
dimensional image
dimensional
determining
detection
Prior art date
Legal status
Granted
Application number
CN202010931022.5A
Other languages
Chinese (zh)
Other versions
CN113761999B (en)
Inventor
刘浩
Current Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202010931022.5A
Publication of CN113761999A
Application granted
Publication of CN113761999B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Abstract

The embodiment of the invention discloses a target detection method, a target detection device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining the outline of a detection target candidate area based on three-dimensional point cloud data acquired by scanning a physical space; projecting the outline onto an initial two-dimensional image obtained by shooting the physical space to obtain a first two-dimensional image corresponding to a detection target; and determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image. The technical scheme of the embodiment of the invention reduces the amount of computation in target detection and thereby improves the target detection speed.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a target detection method, a target detection device, electronic equipment and a storage medium.
Background
In the field of automatic driving, in order to ensure the running safety of an automatic driving vehicle, an obstacle which possibly obstructs the running of the vehicle needs to be detected in real time in the automatic driving process, so that reasonable avoidance actions are executed according to different obstacle types and states, and the running safety of the automatic driving vehicle is ensured. At present, two commonly used target detection methods exist, wherein one method is detection based on laser radar point cloud, specifically, three-dimensional point cloud data is converted into image data of a bird's-eye view, and then target detection is performed through a two-dimensional target detection algorithm. The other method is visual detection based on RGB images, and specifically comprises the steps of performing feature extraction according to an original image to obtain a feature image, and then performing target recognition according to each recognition unit of the feature image.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
the detection method based on the laser radar point cloud is limited by the characteristics of the laser radar, the identification precision and stability of a distant target are poor, the problem is particularly obvious when the laser radar with a small number of scanning lines is used, and the cost of automatically driving the vehicle is undoubtedly increased by using the laser radar with a large number of scanning lines. In the visual detection method based on the RGB image, since the size and the position of the detected target are unknown, and the global search needs to be performed on the feature image in the process of respectively performing target identification according to each identification unit of the feature image, the detection method has the problems of large calculation amount, large consumed calculation resources and slow detection speed.
Disclosure of Invention
The embodiment of the invention provides a target detection method, a target detection device, electronic equipment and a storage medium, which improve the target detection precision and speed and reduce the target detection calculation amount.
In a first aspect, an embodiment of the present invention provides a target detection method, where the method includes:
determining the outline of a detection target candidate area based on three-dimensional point cloud data acquired by scanning aiming at a physical space;
projecting the outline to an initial two-dimensional image obtained by shooting aiming at a physical space to obtain a first two-dimensional image corresponding to a detection target;
determining a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, where the apparatus includes:
the candidate area determining module is used for determining the outline of the detection target candidate area based on the three-dimensional point cloud data acquired by aiming at the physical space scanning;
the projection module is used for projecting the outline to an initial two-dimensional image obtained by shooting aiming at a physical space to obtain a first two-dimensional image corresponding to a detection target;
a detection module for determining a position and/or a type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
In a third aspect, an embodiment of the present invention further provides a target detection system, including: the system comprises a three-dimensional point cloud acquisition device, a two-dimensional image acquisition device and a processor;
the three-dimensional point cloud acquisition device is in communication connection with the processor and is used for scanning acquired three-dimensional point cloud data aiming at a physical space and sending the three-dimensional point cloud data to the processor;
the two-dimensional image acquisition device is in communication connection with the processor and is used for shooting a physical space to acquire an initial two-dimensional image and sending the initial two-dimensional image to the processor;
the processor is configured to perform the steps of the target detection method of the embodiments of the invention based on the three-dimensional point cloud data and the initial two-dimensional image.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps of the object detection method as provided by any embodiment of the invention.
In a fifth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the object detection method provided in any embodiment of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
the contour of the candidate region of the detection target is determined based on the three-dimensional point cloud data obtained by scanning aiming at the physical space, and the fine features of the detection target are not directly extracted from the three-dimensional point cloud data, so that the calculation amount is reduced; acquiring a first two-dimensional image corresponding to a detection target by projecting the contour to an initial two-dimensional image acquired by shooting aiming at a physical space; and the technical means for determining the position and/or type of the detected target based on the first two-dimensional image and the initial two-dimensional image realizes the purpose of improving the target detection speed and precision.
Drawings
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention;
fig. 2 is a flowchart of a target detection method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a target detection method according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a manner of setting an anchor according to a third embodiment of the present invention;
FIG. 5 is a block diagram of a target detection algorithm according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an object detection apparatus according to a fourth embodiment of the present invention;
fig. 7 is a schematic structural diagram of a target detection system according to a fifth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention. The embodiment is applicable to a scene in which an obstacle that may obstruct the vehicle from running is detected in the field of automatic driving. The method may be performed by a target detection apparatus, which may be implemented in software and/or hardware, and integrated into a target electronic device, such as an autonomous vehicle or a server.
As shown in fig. 1, the target detection method specifically includes the following steps:
step 110, determining the outline of the detection target candidate area based on the three-dimensional point cloud data acquired by scanning aiming at the physical space.
The three-dimensional point cloud data can be obtained by scanning a physical space with a vehicle-mounted laser radar, and the physical space can be the physical space of the driving environment of an automatic driving vehicle. The detection target may be a movable object, such as a vehicle or a pedestrian, travelling on a road. The three-dimensional point cloud data is composed of a large number of point cloud points, each of which comprises four-dimensional information (x, y, z, intensity), where x, y, and z represent the x, y, and z coordinate values of the point cloud point in the point cloud coordinate system, and intensity represents the intensity information of the point cloud point.
Illustratively, the determining the profile of the detection target candidate region based on the three-dimensional point cloud data acquired for the physical space scanning includes:
preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object in the three-dimensional point cloud data;
performing clustering operation on the preprocessed three-dimensional point cloud data by setting a clustering algorithm to obtain a clustering result containing at least one clustering cluster;
determining a minimum bounding box of each clustering cluster in the clustering result;
and determining the area where each minimum bounding box is positioned as the outline of the detection target candidate area.
Wherein the preset static object is, for example, the ground, a flower bed, a telegraph pole, a curb, or the like. By removing the point cloud data of the preset static object in advance, the number of clustering clusters in a subsequent clustering result is greatly reduced, the subsequent matching times on the characteristic image are further reduced, the calculated amount of characteristic fusion is reduced, and the target detection speed is improved.
Further, if the preset static object is the ground, the preprocessing is performed on the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object in the three-dimensional point cloud data, and the method includes:
determining the dip angle between two point cloud points obtained by scanning two adjacent laser emitting ends at the same time in the three-dimensional point cloud data;
if the inclination angle is smaller than an inclination angle threshold value, marking the two point cloud points as ground point cloud data;
removing the ground point cloud data from the three-dimensional point cloud data.
Taking a 16-line radar as an example, single-frame point cloud data is the point cloud data scanned by 16 laser emission ends simultaneously rotating through one circle. When the laser emitting device is installed, the elevation angles of the 16 laser emission ends are uniformly spaced, generally at 2-degree intervals. Each laser emission end scans about 1800 point cloud points per rotation (determined by the scanning frequency), so a single frame of point cloud data consists of 16 x 1800 point cloud points, forming a matrix with 16 rows and 1800 columns. Because the ground is flat, for ground point cloud data the inclination angle between two point cloud points in the same column and adjacent rows is no larger than the difference between the elevation angles of the two adjacent laser emission ends, so this characteristic can be used to distinguish ground point cloud data from non-ground point cloud data in single-frame point cloud data, where the single-frame point cloud data is obtained by at least two mutually adjacent laser emission ends each rotating and scanning through one circle.
The inclination angle between two point cloud points obtained by scanning two adjacent laser emission ends at the same time is determined based on the following formula:

$$\alpha(i,j) = \operatorname{atan2}\left(\lvert z_{i+1,j} - z_{i,j}\rvert,\ \sqrt{(x_{i+1,j}-x_{i,j})^2 + (y_{i+1,j}-y_{i,j})^2}\right)$$

where α(i, j) represents the inclination angle between the point cloud point in row i, column j of the single-frame point cloud data set and the point cloud point in row (i+1), column j; x_{i,j}, y_{i,j}, and z_{i,j} represent the x, y, and z coordinate values of the point cloud point in row i, column j of the single-frame point cloud data set. Row elements of the single-frame point cloud data set represent point cloud points obtained by scanning with the same laser emission end at different moments, and column elements represent point cloud points obtained by scanning with different laser emission ends at the same moment. The expression atan2(y, x) means the angle, on a coordinate plane, between the positive direction of the x-axis and the ray starting from the origin of the coordinates and pointing to the point (x, y).
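For illustration, the following Python sketch applies this test to an organized point cloud; the array layout follows the 16 x 1800 matrix described above, while the function name and the use of the 2-degree adjacent-elevation gap as the threshold are assumptions.

```python
import numpy as np

def mark_ground_points(cloud, angle_threshold_deg=2.0):
    """Mark ground points in an organized (16, 1800, 3) point cloud whose
    rows are laser channels, columns are firing instants, and whose last
    axis holds (x, y, z)."""
    dx = cloud[1:, :, 0] - cloud[:-1, :, 0]
    dy = cloud[1:, :, 1] - cloud[:-1, :, 1]
    dz = cloud[1:, :, 2] - cloud[:-1, :, 2]
    # alpha(i, j): inclination between the row-i and row-(i+1) points of column j
    alpha = np.arctan2(np.abs(dz), np.hypot(dx, dy))
    flat = alpha < np.deg2rad(angle_threshold_deg)
    ground = np.zeros(cloud.shape[:2], dtype=bool)
    ground[:-1][flat] = True  # a flat pair marks both of its points as ground
    ground[1:][flat] = True
    return ground
```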
Further, if the preset static object is a static object (such as a flower bed, a telegraph pole, a garbage can, a curb, etc.) other than the ground, the preprocessing the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object in the three-dimensional point cloud data includes:
determining a world coordinate value of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing, from the three-dimensional point cloud data, the three-dimensional point cloud points whose world coordinate values fall within a set range, wherein the set range is determined according to the world coordinate values of the preset static object. The world coordinate values specifically refer to coordinate values in the world coordinate system; each static, immobile object in the physical space (such as a flower bed, a telegraph pole, a garbage can, a curb, etc.) corresponds to unique coordinate values in the world coordinate system, so the three-dimensional point cloud points of a static object can be identified using the object's world coordinate values and then removed.
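A minimal sketch of this removal step, assuming the set ranges are stored as axis-aligned boxes in the world frame; the function and variable names are illustrative.

```python
import numpy as np

def remove_static_points(points_world, static_ranges):
    """points_world: (n, 3) point cloud already transformed to world
    coordinates; static_ranges: iterable of (lo, hi) corner pairs, one
    axis-aligned box per preset static object (flower bed, pole, curb...)."""
    keep = np.ones(len(points_world), dtype=bool)
    for lo, hi in static_ranges:
        inside = np.all((points_world >= np.asarray(lo)) &
                        (points_world <= np.asarray(hi)), axis=1)
        keep &= ~inside  # drop every point that falls inside a static box
    return points_world[keep]
```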
Further, the set clustering algorithm may be, for example, a density clustering algorithm landmark FN-DBSCAN, or a mesh clustering algorithm STING.
The determining of the minimum bounding box of each cluster C in the clustering result {C} is specifically to calculate the minimum circumscribed cuboid of each cluster C, where the cuboid may be a rectangular solid or a cube. The frames of the minimum circumscribed cuboids enclose the three-dimensional point cloud points of each cluster C and constitute the contours of the detection target candidate areas:

proposal_regions = {P}, P = (id, px, py, pz, pd, ph, pw)

where id is the category index of cluster C, (px, py, pz) is the position of the geometric center of the current candidate region, and (pd, ph, pw) are its length, height, and width.
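The clustering and bounding-box computation could be sketched as below. Plain DBSCAN from scikit-learn stands in for the landmark FN-DBSCAN named above, and the eps/min_samples values and the (pd, ph, pw) axis ordering are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN  # stand-in for landmark FN-DBSCAN

def candidate_regions(points):
    """Cluster a preprocessed (n, >=3) point cloud and return one minimum
    axis-aligned bounding box P = (id, px, py, pz, pd, ph, pw) per cluster."""
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points[:, :3])
    regions = []
    for cid in sorted(set(labels) - {-1}):  # -1 marks DBSCAN noise points
        pts = points[labels == cid, :3]
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        cx, cy, cz = (lo + hi) / 2.0        # geometric center of the box
        d, w, h = hi - lo                   # extents along x, y, z
        regions.append({"id": cid, "px": cx, "py": cy, "pz": cz,
                        "pd": d, "ph": h, "pw": w})
    return regions
```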
And 120, projecting the contour to an initial two-dimensional image obtained by shooting aiming at a physical space to obtain a first two-dimensional image corresponding to the detection target.
The initial two-dimensional image may specifically be a color image captured by an RGB camera.
Specifically, based on coordinate transformation, the area contour of the candidate area is projected to the corresponding position of the initial two-dimensional image, and the position and/or type of the detection target is determined based on the projected first two-dimensional image. Because the possible position area of the detection target (namely, the candidate area) is determined in advance, at coarse granularity, from the three-dimensional point cloud data, projecting the contour of the candidate area onto the initial two-dimensional image narrows the detection area: target detection only needs to be performed on the small area blocks of the initial two-dimensional image corresponding to the candidate areas, not on the whole initial two-dimensional image, which greatly reduces the amount of detection computation, improves the detection speed, and preserves detection precision.
Illustratively, the projecting the contour onto an initial two-dimensional image obtained by shooting the physical space to obtain a first two-dimensional image corresponding to a detection target includes the following steps (a sketch follows these steps):
determining a coordinate transformation matrix according to calibration parameters of the laser radar and the camera;
and projecting the contour to an initial two-dimensional image obtained by shooting aiming at a physical space based on the coordinate conversion matrix to obtain a first two-dimensional image corresponding to the detection target.
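Under a standard pinhole camera model, the projection step might look like the sketch below; it assumes the calibration parameters have already been combined into a 4 x 4 extrinsic matrix T_cam_lidar and a 3 x 3 intrinsic matrix K (both names illustrative).

```python
import numpy as np

def project_to_image(corners_lidar, T_cam_lidar, K):
    """Project (n, 3) contour corners from the lidar frame to pixel
    coordinates; returns an (n, 2) array."""
    homo = np.hstack([corners_lidar, np.ones((len(corners_lidar), 1))])
    cam = (T_cam_lidar @ homo.T)[:3]   # corners expressed in the camera frame
    uv = K @ cam                       # apply the intrinsics
    return (uv[:2] / uv[2]).T          # perspective divide to pixels
```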
Step 130, determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
Illustratively, the determining the position and/or the type of the detection target based on the first two-dimensional image and the initial two-dimensional image comprises:
inputting the initial two-dimensional image into a preset detection model;
determining the output result of the intermediate layer of the preset detection model as a characteristic image corresponding to the initial two-dimensional image;
mapping the first two-dimensional image to the characteristic image according to the downsampling multiplying power corresponding to the characteristic image to obtain a second two-dimensional image corresponding to a detection target;
determining the position and/or type of the detection target based on the second two-dimensional image.
Feature extraction of the initial two-dimensional image may be performed by a neural network model, such as a convolutional neural network model. It should be noted that, in order to improve the target detection accuracy, the solution of this embodiment is to finally determine the position and/or type (for example, the type is a person, a vehicle, etc.) of the detection target by fusing the three-dimensional point cloud feature of the detection target with the two-dimensional image feature, so that the feature image is not the feature image output by the last layer of the neural network detection model, but the feature image output by the intermediate layer, and the position and/or type of the detection target is predicted by combining the candidate region based on the feature image output by the intermediate layer.
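As a rough sketch of the two steps above (intermediate-layer features plus ratio-based mapping), the code below uses a torchvision VGG-16 truncated at conv4_3 (8x downsampling) as a stand-in for the backbone; the layer choice, image size, and stride are assumptions for illustration only.

```python
import torch
import torchvision

backbone = torchvision.models.vgg16(weights=None).features[:23]  # up to conv4_3
image = torch.randn(1, 3, 600, 800)        # stand-in for the initial 2D image
feature_map = backbone(image)              # (1, 512, 75, 100): 8x downsampled

def map_box_to_feature(box_xyxy, stride=8):
    """Scale a pixel-space box (x1, y1, x2, y2) onto the feature map by the
    accumulated downsampling ratio."""
    return [int(round(v / stride)) for v in box_xyxy]

print(map_box_to_feature((160, 240, 320, 480)))  # -> [20, 30, 40, 60]
```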
According to the technical scheme of the embodiment, firstly, the coarse-grained target contour is detected based on three-dimensional point cloud data, then the contour of the detected candidate region is projected to the initial two-dimensional image, and further, the target detection is performed on the region small blocks of the candidate region contour corresponding to the initial two-dimensional image, instead of extracting the fine features of the detected target directly based on the three-dimensional point cloud data, so that the detection operand is greatly reduced, and the detection speed is improved; the contour of the determined detection target candidate region is projected to the initial two-dimensional image, so that the purpose of reducing the detection region is achieved, only the region small block corresponding to the candidate region in the initial two-dimensional image needs to be subjected to target detection, and the whole initial two-dimensional image does not need to be subjected to target detection, so that the detection calculation amount is greatly reduced, the detection speed is improved, and the detection precision is ensured.
Example two
Fig. 2 is a flowchart of a target detection method according to a second embodiment of the present invention. The present embodiment is based on the above embodiment, and embodies the step 130 "determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image". Compared with the target detection method provided by the embodiment, the target detection method provided by the embodiment can make up for the influence of the characteristics of the laser radar on the acquired three-dimensional point cloud data. Specifically, if the characteristics of the laser radar are poor, the three-dimensional point cloud data corresponding to a distant object cannot be acquired well, for example, when the number of scanning lines of the laser radar is small, the above problem is particularly serious, and no three-dimensional point cloud data with good quality is used as a basis, so that a guarantee cannot be provided for subsequent accurate detection of a target. And purposely improving the performance of lidar will undoubtedly increase hardware costs. In view of the above problems, the technical solution of the present embodiment provides a corresponding solution. Wherein explanations of the same or corresponding terms as those of the above-described embodiments are omitted.
Referring to fig. 2, the target detection method includes the steps of:
step 210, determining the outline of the detection target candidate area based on the three-dimensional point cloud data acquired by scanning aiming at the physical space.
Step 220, projecting the contour to an initial two-dimensional image obtained by shooting aiming at a physical space, and obtaining a first two-dimensional image corresponding to the detection target.
Step 230, inputting the initial two-dimensional image into a preset detection model; and determining the output result of the preset detection model intermediate layer as a characteristic image corresponding to the initial two-dimensional image.
Step 240, extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance to supplement the main body edge of the detection target.
And step 250, mapping the first two-dimensional image to the characteristic image according to the downsampling multiplying power corresponding to the characteristic image to obtain a second two-dimensional image corresponding to the detection target.
The feature image is an intermediate image in the target detection process, not yet a mature, accurate result; it is the output of an intermediate layer of the preset detection model. Any target detection task obtains its final detection result through some detection pipeline, during which the detection algorithm extracts various features of the original image step by step; the feature image is an image produced during this process and contains, to a greater or lesser extent, features of the detected target. Taking the preset detection model as a convolutional neural network model as an example, the feature image may be the output of the second convolutional layer of the eighth block in the model structure and is usually downsampled relative to the input image, so the first two-dimensional image is mapped to the feature image according to the downsampling ratio corresponding to the feature image to obtain the second two-dimensional image corresponding to the detection target.
Further, limited by the characteristics of the laser radar, three-dimensional point cloud data corresponding to a distant object may not be acquired well when the radar's characteristics are poor, so the main body of the target object on the feature image within the mapping area corresponding to the candidate area may be incomplete, which in turn affects the subsequent detection of the target object's position. In view of this problem, the technical solution of this embodiment adds the following step:
before the first two-dimensional image is mapped to the feature image, extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance to supplement the main body edge of the detection target. Specifically, the extension direction may be determined based on image contour features of the three-dimensional point cloud data in the candidate region corresponding to the first two-dimensional image, and the set distance may be extended along the determined extension direction. The contour of the candidate region may also be extended along a straight line direction in which the corresponding image contour in the first two-dimensional image is located.
Step 260, determining the position and/or type of the detection target based on the second two-dimensional image.
Specifically, the second two-dimensional image may be divided into grid regions of a fixed size, and then the position of the anchor may be further determined by dividing according to the grid, and then target detection may be performed within the anchor.
Assuming that the number of determined contours of detection target candidate regions is Q, that the second two-dimensional image is divided into a fixed-size 3 x 3 grid, and that the number of corresponding anchors is T, target detection requires only Q x T matching operations. In the prior art, by contrast, since the size and position of the detection target are unknown, a global search must be performed on the feature image, that is, matching is computed for the anchors in every grid cell of the feature image; if the feature image is divided into M x N grid regions and K anchors are pre-allocated to each cell, the number of matching operations is M x N x K, and usually M x N x K is far greater than Q x T.
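A tiny worked example with assumed sizes makes the comparison concrete:

```python
M, N, K = 50, 38, 9   # assumed global feature grid and per-cell anchor count
Q, T = 6, 15          # assumed cluster count; 15 anchors per 3 x 3 local grid
print(M * N * K)      # 17100 matches for the global search
print(Q * T)          # 90 matches when only candidate regions are searched
```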
According to the technical scheme, the outline of the detection target candidate area is determined based on three-dimensional point cloud data acquired by scanning aiming at a physical space; projecting the outline to an initial two-dimensional image obtained by shooting aiming at a physical space to obtain a first two-dimensional image corresponding to a detection target; extending the image contour corresponding to the contour of the candidate area in the first two-dimensional image by a set distance to supplement the main body edge of the detection target, and mapping the first two-dimensional image to the characteristic image according to the downsampling multiplying power corresponding to the characteristic image to obtain a second two-dimensional image corresponding to the detection target; the position and/or type of the detection target are/is determined based on the second two-dimensional image, the problem that three-dimensional point cloud data of a distant object is incomplete due to the characteristics of the laser radar is solved, and the requirements on the characteristics of the laser radar are reduced; the target detection is realized, and the detection speed and precision are improved.
EXAMPLE III
Fig. 3 is a flowchart of a target detection method according to a third embodiment of the present invention. Based on the above embodiments, this embodiment embodies the step 260 of determining the position and/or type of the detection target based on the second two-dimensional image, and provides several ways of setting the anchor, which is helpful to further improve the target detection speed and accuracy. Wherein explanations of the same or corresponding terms as those of the above-described embodiments are omitted.
Referring to fig. 3, the target detection method includes the steps of:
step 310, determining the outline of the detection target candidate area based on the three-dimensional point cloud data acquired by scanning aiming at the physical space.
And step 320, projecting the contour to an initial two-dimensional image obtained by shooting aiming at a physical space to obtain a first two-dimensional image corresponding to the detection target.
Step 330, inputting the initial two-dimensional image into a preset detection model; and determining the output result of the preset detection model intermediate layer as a characteristic image corresponding to the initial two-dimensional image.
Step 340, extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance to supplement the main body edge of the detection target.
And 350, mapping the first two-dimensional image to the characteristic image according to the downsampling multiplying power corresponding to the characteristic image to obtain a second two-dimensional image corresponding to the detection target.
Step 360, dividing the second two-dimensional image into a set number of grid areas, and determining at least one frame area according to the grid areas; performing frame coarse regression operation according to the frame region by using a set neural network model to obtain a segmentation image; and determining the position and/or type of the detection target according to the segmentation image.
It is assumed that the second two-dimensional image is divided into 3 × 3 mesh regions, and 15 anchors are assigned to the 3 × 3 mesh regions, and the assignment manner of the 15 anchors is shown in fig. 4.
Illustratively, the determining at least one frame region (i.e., an anchor box region) according to the grid region includes at least one of the following manners (a sketch enumerating these manners follows the list):
determining all the grid areas as first frame areas;
determining the grid areas in the same row as a second frame area;
determining the grid areas in the same column as a third frame area;
determining the grid areas of two adjacent rows as a fourth frame area;
determining the grid areas of two adjacent columns as a fifth frame area;
and determining the grid area forming the square as a sixth frame area, wherein the side length of the square is larger than that of the single grid area and is smaller than that of the second two-dimensional image.
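One plausible enumeration of these six manners for a 3 x 3 grid is sketched below; notably it yields exactly the 15 anchors mentioned alongside fig. 4, although the figure's actual assignment is not reproduced here.

```python
def frame_regions(n=3):
    """Enumerate the six kinds of frame regions over an n x n grid as
    (row0, col0, row1, col1) cell spans, endpoints inclusive."""
    regions = [(0, 0, n - 1, n - 1)]                          # 1) all cells
    regions += [(r, 0, r, n - 1) for r in range(n)]           # 2) single rows
    regions += [(0, c, n - 1, c) for c in range(n)]           # 3) single columns
    regions += [(r, 0, r + 1, n - 1) for r in range(n - 1)]   # 4) adjacent row pairs
    regions += [(0, c, n - 1, c + 1) for c in range(n - 1)]   # 5) adjacent column pairs
    for s in range(2, n):                                     # 6) s x s squares, 1 < s < n
        regions += [(r, c, r + s - 1, c + s - 1)
                    for r in range(n - s + 1) for c in range(n - s + 1)]
    return regions

print(len(frame_regions(3)))  # 15
```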
The anchor mechanism is specifically as follows: a point on the feature map corresponds to a small area of the original image; a plurality of anchor boxes can be generated on that small area, whether a target may exist is then detected within each anchor box, and coordinates are further regressed to obtain the position information of the target. In the technical scheme of this embodiment, the contour of the detection target is first determined based on the three-dimensional point cloud data, the detection area of the two-dimensional feature image is narrowed based on the determined contour, and only the areas of the two-dimensional feature image corresponding to the detected contours are retained, which greatly reduces the detection computation; the anchor mechanism then detects whether the anchors corresponding to each pixel point contain the detection target.
Further, the determining the position and/or the type of the detection target according to the segmented image includes the following (see the sketch after this list):
inputting the segmentation image into a target classification model to obtain the type of a detection target;
and inputting the segmentation image into a full convolution model to obtain the position of the detection target.
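A minimal sketch of the two heads, assuming PyTorch; the channel width, class count, and 1 x 1 convolutions are illustrative choices rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Classification head (softmax over target types) plus a fully
    convolutional box-regression head fed by a pooled feature map."""
    def __init__(self, in_ch=512, num_classes=4):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, kernel_size=1)
        self.box = nn.Conv2d(in_ch, 4, kernel_size=1)  # (dx, dy, dw, dh)

    def forward(self, f):                  # f: (B, in_ch, H, W)
        return torch.softmax(self.cls(f), dim=1), self.box(f)
```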
Correspondingly, referring to the schematic framework of the object detection algorithm shown in fig. 5, the algorithm includes two modules: a 3D point cloud data processing module (also called 3D-Projection-Projector) and a 2D image detection module (also called 2D-Feature-Object-Detector). The 3D-Projection-Projector is responsible for extracting the contours of the detection target candidate regions from the point cloud and mapping them to the corresponding positions of the feature map extracted by the 2D-Feature-Object-Detector. The 2D-Feature-Object-Detector is responsible for performing object detection on the candidate-region contours extracted by the 3D-Projection-Projector and for completing feature extraction.
Different from existing fusion schemes, the algorithm does not directly extract features from or run detection on the point cloud; instead it clusters the point cloud in an unsupervised manner, quickly obtaining rough three-dimensional candidate regions of the detection target. Next, the three-dimensional candidate regions are mapped into the 2D coordinate system of the camera through the calibration parameters of hardware such as the radar and the camera. While the point cloud is being processed, the two-dimensional image is sent to a CNN backbone network for feature extraction to obtain the feature map of the whole image. Then, according to the downsampling ratio, the positions of the 3D candidate regions obtained in the previous step are projected directly to the corresponding positions of the feature map and cropped, yielding several small local feature maps. Note that the local feature maps at this point correspond to a clustering result from which the preset static point clouds have been filtered out; because those points were filtered, the number of clusters in the clustering result is greatly reduced, which in turn reduces the number of matching operations the detection stage performs on the whole feature map. In addition, conventional detection methods pre-allocate K anchors to every pixel of the global feature map (assume its size is M x N), so the computation count is M x N x K. In this scheme the region to be detected is not the global feature map but several local feature maps (corresponding to the extracted 3D candidate regions), each of which has a high probability of containing one or a few detection targets. Because detection targets that are close in three-dimensional space have similar scales on the feature map within the same category, each local feature map to be detected can be dynamically divided into a fixed-size grid, such as 3 x 3, which is then further divided spatially into anchors, and detection is performed inside each anchor. Assuming the number of detected clusters is Q, the fixed grid size of a local feature map is 3 x 3, and the corresponding anchor count is T, detection requires only Q x T matching operations. Generally M x N x K is far greater than Q x T, which is the root cause of this scheme's speed advantage.
Specifically, referring to fig. 5, the algorithm flow of the 3D point cloud data processing module (also called 3D-projection-Projector) is as follows:
1) the input is a point cloud matrix PC, the format of which is (n,4), wherein n represents the number of point cloud points, 4 represents four dimensions (x, y, z, intensity) of the point cloud points, and the three-dimensional coordinates of the space and the intensity of the point cloud are respectively represented;
2) Data preprocessing is performed on the point cloud matrix PC, for example using a ground removal algorithm to remove the ground point cloud and a static obstacle removal algorithm to remove environmental point clouds (e.g., the point cloud of a utility pole, the point cloud of a flower bed, the point cloud of a trash bin, etc.).
3) And clustering the point cloud by using a clustering algorithm, such as a landmark FN-DBSCAN spatial clustering algorithm, to obtain a clustering result { C }, wherein C is a cluster.
4) The minimum bounding box (namely, the minimum circumscribed cuboid) of every cluster C in the clustering result is calculated to form the three-dimensional point cloud candidate regions of the detection target:

proposal_regions = {P}, P = (id, px, py, pz, pd, ph, pw)

where id is the category index of cluster C, (px, py, pz) is the position of the geometric center of the current candidate region, and (pd, ph, pw) are its length, height, and width.

5) Using the calibration parameters of the laser radar and the camera, the contours of the three-dimensional candidate regions proposal_regions are projected to the 2D coordinates of the image to obtain the first two-dimensional image:

PR = {p}, p = (id, px, py, ph, pw)

where id is the category index of cluster C, (px, py) is the center position of the current candidate region in the 2D coordinate system, and (ph, pw) are its height and width. It is understood that the number of three-dimensional candidate regions may be 1 or at least two; usually there is more than one. If the number of three-dimensional candidate regions is 1, the first two-dimensional image obtained after projection contains one two-dimensional image contour; if there are multiple three-dimensional candidate regions, the first two-dimensional image obtained after projection contains multiple two-dimensional image contours.
The algorithm flow of the 2D image detection module (also called 2D-Feature-Object-Detector) is as follows:
1) The input is a two-dimensional image (namely, a two-dimensional image obtained by the vehicle-mounted camera shooting the physical space) of size (800, 600, 3). The image is fed into a backbone network VGG, and the output of convolution layer conv8_2 is taken as the full-image feature map, denoted G, of size (50, 38, 512).
2) The downsampling ratio is calculated from the convolution operations involved in downsampling (kernel size, padding, stride, the number of convolution layers, pooling parameters, and so on), and each image contour in the first two-dimensional image PR is mapped onto the feature image G according to this ratio to obtain the second two-dimensional image PR2; each image contour corresponds to a small local area block in the second two-dimensional image, called a local feature image. It is understood that each cluster corresponds to one candidate region, each candidate region corresponds to one region block on the first two-dimensional image, and each region block on the first two-dimensional image corresponds to one region block on the second two-dimensional image, so the projection and mapping operations yield a number of local region blocks on the second two-dimensional image. These local region blocks are the regions where detection targets may exist; subsequent target detection only needs to be performed on these local region blocks rather than on every pixel region of the whole feature image, which greatly reduces the amount of computation.
3) And respectively cutting the local area small blocks, namely cutting each local area small block from the second two-dimensional image to obtain a local characteristic image. It can be understood that, if the number of the candidate regions is one, a local region small block corresponds to the second two-dimensional image, and at this time, no cropping is required, and the second two-dimensional image is the local feature image.
4) Anchor regions are designed on each local feature image of the second two-dimensional image PR2: whether each region is foreground (the foreground being a detection target, such as a pedestrian or a vehicle) or background (other objects that are not the detection target) is judged, together with a coarse bounding-box regression; this is implemented through an RPN. Regions whose intersection-over-union (IoU) with the ground truth is greater than 0.7 may be taken as positive samples and those below 0.3 as negative samples; smooth L1 plus a regularization term serves as the regression loss, and binary_focal_loss serves as the foreground/background classification loss. Each local feature image is divided into 3 x 3 grid areas, and 15 anchors are allocated in total. All foreground targets from the RPN stage are selected, and the corresponding local feature maps are cropped according to the coarsely regressed coordinates. SPP pooling is performed on the cropped feature maps, with SPP parameters {5 x 5, 3 x 3, 2 x 2, 1 x 1}, to obtain feature images of uniform size 39 x 1, denoted F. F is sent into a CNN branch network to extract an embedding, with cosine loss as the loss function, outputting a 128 x 1 visual feature vector that provides the visual features of the detected object to the downstream (tracking) module of the algorithm. F is sent into a fully convolutional layer and a softmax layer to classify the detection targets, with a cross-entropy loss. F is also sent into a fully convolutional layer for fine bounding-box regression, with smooth L1 as the loss function.
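The SPP pooling step with the {5 x 5, 3 x 3, 2 x 2, 1 x 1} parameters can be sketched as follows; adaptive max pooling is assumed as the pooling operator. The 25 + 9 + 4 + 1 = 39 pooled cells per channel match the uniform 39 x 1 size mentioned above.

```python
import torch
import torch.nn.functional as F

def spp(feature, bins=(5, 3, 2, 1)):
    """Pool a (C, H, W) local feature map at each pyramid level and
    concatenate, giving a fixed-length vector regardless of crop size."""
    pooled = [F.adaptive_max_pool2d(feature.unsqueeze(0), b).flatten()
              for b in bins]
    return torch.cat(pooled)  # C * (25 + 9 + 4 + 1) = C * 39 values

vec = spp(torch.randn(512, 7, 9))  # any crop size works
print(vec.shape)                   # torch.Size([19968])
```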
The target detection method provided by this embodiment greatly reduces the computational complexity and improves the response speed of target detection in automatic driving; at the same time, since the laser radar point cloud is not used directly for detection, a laser radar product with fewer scanning lines can be used, effectively reducing the cost of the electronic equipment.
The following is an embodiment of the object detection apparatus provided in the embodiments of the present invention, which belongs to the same inventive concept as the object detection methods in the embodiments described above, and reference may be made to the embodiments of the object detection method for details that are not described in detail in the embodiments of the object detection apparatus.
Example four
Fig. 6 is a schematic structural diagram of a target detection apparatus according to a fourth embodiment of the present invention. The embodiment is applicable to a scene in which an obstacle that may obstruct the vehicle from running is detected in the field of automatic driving. The device may be integrated into an autonomous vehicle or a server.
As shown in fig. 6, the apparatus includes: a candidate region determination module 610, a projection module 620, and a detection module 630.
The candidate region determining module 610 is configured to determine a contour of a detection target candidate region based on three-dimensional point cloud data obtained by scanning for a physical space; a projection module 620, configured to project the contour to an initial two-dimensional image obtained by shooting for a physical space, so as to obtain a first two-dimensional image corresponding to a detection target; a detection module 630 for determining a location and/or a type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
Further, the candidate region determining module 610 includes:
the preprocessing unit is used for preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object in the three-dimensional point cloud data;
the clustering unit is used for performing clustering operation on the preprocessed three-dimensional point cloud data through a set clustering algorithm to obtain a clustering result containing at least one clustering cluster;
the first determining unit is used for determining the minimum bounding box of each clustering cluster in the clustering result;
and the second determining unit is used for determining the area where each minimum bounding box is positioned as the outline of the detection target candidate area.
Further, if the preset static object is the ground, the preprocessing unit is specifically configured to:
determining the dip angle between two point cloud points obtained by scanning two adjacent laser emitting ends at the same time in the three-dimensional point cloud data;
if the inclination angle is smaller than an inclination angle threshold value, marking the two point cloud points as ground point cloud data;
removing the ground point cloud data from the three-dimensional point cloud data.
Further, if the preset static object is a static object other than the ground, the preprocessing unit is specifically configured to:
determining a world coordinate value of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing the three-dimensional point cloud points with the world coordinate values falling within a set range from the three-dimensional point cloud data, wherein the set range is determined according to the world coordinate values of the preset static object.
Further, the projection module 620 includes:
the first determining unit is used for determining a coordinate transformation matrix according to calibration parameters of the laser radar and the camera;
and the projection unit is used for projecting the outline of the candidate area to an initial two-dimensional image acquired by shooting aiming at a physical space based on the coordinate conversion matrix to obtain a first two-dimensional image corresponding to the detection target.
Further, the detecting module 630 includes:
the mapping unit is used for mapping the first two-dimensional image to the characteristic image according to the downsampling multiplying power corresponding to the characteristic image to obtain a second two-dimensional image corresponding to a detection target;
a second determination unit for determining a position and/or a type of the detection target based on the second two-dimensional image.
Further, the detecting module 630 further includes: a supplementing unit, configured to extend an image contour corresponding to the contour of the candidate region in the first two-dimensional image by a set distance before mapping the first two-dimensional image to the feature image, so as to supplement a body edge of a detection target.
Further, the second determination unit includes:
a dividing subunit, configured to divide the second two-dimensional image into a set number of mesh regions;
a first determining subunit, configured to determine at least one frame region according to the grid region;
the operation subunit is used for performing frame coarse regression operation according to the frame region by using a set neural network model to obtain a segmentation image;
and the second determining subunit is used for determining the position and/or the type of the detection target according to the segmented image.
Further, the first determining subunit is specifically configured to perform at least one of:
determining all the grid areas as first frame areas;
determining the grid areas in the same row as a second frame area;
determining the grid areas in the same column as a third frame area;
determining the grid areas of two adjacent rows as a fourth frame area;
determining the grid areas of two adjacent columns as a fifth frame area;
and determining the grid area forming the square as a sixth frame area, wherein the side length of the square is larger than that of the single grid area and is smaller than that of the second two-dimensional image.
Further, the second determining subunit is specifically configured to:
inputting the segmentation image into a target classification model to obtain the type of a detection target;
and inputting the segmentation image into a full convolution model to obtain the position of the detection target.
According to the technical scheme of the embodiment, firstly, the coarse-grained target contour is detected based on three-dimensional point cloud data, then the contour of the detected candidate region is projected to the two-dimensional feature image, the target detection is further performed on the region small blocks of the candidate region contour corresponding to the two-dimensional feature image, and the extraction of the fine feature of the detected target is not directly performed based on the three-dimensional point cloud data, so that the detection operand is greatly reduced, and the detection speed is improved; the contour of the determined detection target candidate region is projected to the two-dimensional characteristic image, so that the purpose of reducing the detection region is achieved, only the region small block corresponding to the candidate region in the characteristic image needs to be subjected to target detection, and the whole characteristic image does not need to be subjected to target detection, so that the detection calculation amount is greatly reduced, the detection speed is improved, and the detection precision is ensured.
The target detection device provided by the embodiment of the invention can execute the target detection method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the target detection method.
EXAMPLE five
Fig. 7 is a schematic structural diagram of a target detection system according to a fifth embodiment of the present invention, and as shown in fig. 7, the system includes: a three-dimensional point cloud acquisition device 710, a two-dimensional image acquisition device 720 and a processor 730;
the three-dimensional point cloud acquisition device 710 is in communication connection with the processor 730, and is used for scanning acquired three-dimensional point cloud data aiming at a physical space and sending the three-dimensional point cloud data to the processor;
the two-dimensional image acquisition device 720 is in communication connection with the processor 730 and is used for acquiring an initial two-dimensional image for physical space shooting and sending the initial two-dimensional image to the processor;
the processor 730 is configured to perform the steps of the target detection method according to any of the above embodiments based on the three-dimensional point cloud data and the initial two-dimensional image.
The target detection method comprises the steps of firstly detecting a coarse-grained target contour based on three-dimensional point cloud data, then projecting the contour of a detected candidate region to a two-dimensional feature image, and further performing target detection on a region small block corresponding to the candidate region contour in the two-dimensional feature image instead of extracting a fine feature of the detected target directly based on the three-dimensional point cloud data, so that the detection operand is greatly reduced, and the detection speed is improved; the contour of the determined detection target candidate region is projected to the two-dimensional characteristic image, so that the purpose of reducing the detection region is achieved, only the region small block corresponding to the candidate region in the characteristic image needs to be subjected to target detection, and the whole characteristic image does not need to be subjected to target detection, so that the detection calculation amount is greatly reduced, the detection speed is improved, and the detection precision is ensured.
EXAMPLE six
Fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention. FIG. 8 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in FIG. 8, electronic device 12 is embodied in the form of a general purpose computing electronic device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 8, and commonly referred to as a "hard drive"). Although not shown in Fig. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 by one or more data media interfaces. The system memory 28 may include at least one program product having a set of program modules (e.g., at least one candidate region determination module 610, feature image determination module 620, and detection module 630) configured to perform the functions of embodiments of the present invention.
A program/utility 40 having a set of program modules 42 (e.g., the at least one candidate region determination module 610, feature image determination module 620, and detection module 630) may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally carry out the functions and/or methods of the embodiments of the invention described herein.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and performs object detection by running programs stored in the system memory 28, for example, implementing the object detection method provided by the embodiments of the present invention, which includes:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image obtained by photographing the physical space, to obtain a first two-dimensional image corresponding to a detection target;
determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
It will be understood by those skilled in the art that the processor may also implement the technical solution of the target detection method provided in any embodiment of the present invention.
EXAMPLE seven
The seventh embodiment provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the object detection method provided in any embodiment of the present invention. The method includes:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image obtained by photographing the physical space, to obtain a first two-dimensional image corresponding to a detection target;
determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
The computer storage medium of the embodiments of the present invention may employ any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter cases, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. A method of object detection, comprising:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image obtained by photographing the physical space, to obtain a first two-dimensional image corresponding to a detection target;
determining a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
2. The method of claim 1, wherein the determining of the contour of the detection target candidate region based on the three-dimensional point cloud data acquired by scanning the physical space comprises:
preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object from the three-dimensional point cloud data;
performing a clustering operation on the preprocessed three-dimensional point cloud data using a preset clustering algorithm, to obtain a clustering result containing at least one cluster;
determining a minimum bounding box of each cluster in the clustering result;
and determining the region where each minimum bounding box is located as the contour of a detection target candidate region.
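Claim 2 leaves the clustering algorithm and box type open; a minimal sketch, assuming the clusters are already separated and taking OpenCV's minimum-area rotated rectangle in the ground plane plus a height range as the "minimum bounding box":

```python
import numpy as np
import cv2

def min_bounding_box_xy(cluster_pts: np.ndarray):
    """Minimum-area rotated rectangle of one cluster in the X-Y (ground) plane.

    cluster_pts: (N, 3) points of a single cluster.
    Returns ((cx, cy), (w, h), angle_deg) and the cluster's (z_min, z_max).
    """
    rect = cv2.minAreaRect(cluster_pts[:, :2].astype(np.float32))
    return rect, (cluster_pts[:, 2].min(), cluster_pts[:, 2].max())
```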
3. The method of claim 2, wherein, if the preset static object is the ground, the preprocessing of the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object comprises:
determining an inclination angle between two point cloud points in the three-dimensional point cloud data that are obtained by two adjacent laser emitters scanning at the same moment;
if the inclination angle is smaller than an inclination angle threshold, marking the two point cloud points as ground point cloud data;
and removing the ground point cloud data from the three-dimensional point cloud data.
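A minimal sketch of the ground test in claim 3, assuming the cloud is organized ring-by-ring (one row per laser emitter, columns aligned by firing moment) and using an illustrative 10-degree threshold; neither assumption is specified by the claim:

```python
import numpy as np

def mark_ground(rings: np.ndarray, angle_thresh_deg: float = 10.0) -> np.ndarray:
    """Mark ground points via the inclination between vertically adjacent rings.

    rings: (R, C, 3) array; rings[r, c] and rings[r + 1, c] are the returns of
           two adjacent laser emitters scanning at the same moment.
    Returns a boolean (R, C) mask of ground point cloud data.
    """
    d = rings[1:] - rings[:-1]                               # (R-1, C, 3)
    horiz = np.linalg.norm(d[..., :2], axis=-1)              # horizontal span
    incl = np.degrees(np.arctan2(np.abs(d[..., 2]), horiz))  # inclination angle
    flat = incl < angle_thresh_deg
    ground = np.zeros(rings.shape[:2], dtype=bool)
    ground[:-1] |= flat                                      # mark both points
    ground[1:] |= flat
    return ground
```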
4. The method of claim 2, wherein, if the preset static object is a static object other than the ground, the preprocessing of the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object comprises:
determining a world coordinate value of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing, from the three-dimensional point cloud data, the three-dimensional point cloud points whose world coordinate values fall within a set range, wherein the set range is determined according to the world coordinate values of the preset static object.
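A sketch of the range test in claim 4; representing each preset static object by an axis-aligned world-coordinate range (lo, hi), e.g. taken from a high-definition map, is an assumption made here for illustration:

```python
import numpy as np

def remove_static_points(world_pts: np.ndarray, static_ranges: list) -> np.ndarray:
    """Drop points whose world coordinates fall within any set range.

    world_pts:     (N, 3) point cloud already transformed to world coordinates.
    static_ranges: list of (lo, hi) pairs of (3,) arrays, each bounding one
                   preset static object.
    """
    keep = np.ones(len(world_pts), dtype=bool)
    for lo, hi in static_ranges:
        inside = np.all((world_pts >= lo) & (world_pts <= hi), axis=1)
        keep &= ~inside
    return world_pts[keep]
```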
5. The method of claim 1, wherein the projecting of the contour onto the initial two-dimensional image obtained by photographing the physical space to obtain the first two-dimensional image corresponding to the detection target comprises:
determining a coordinate transformation matrix according to calibration parameters of a laser radar and a camera;
and projecting the contour onto the initial two-dimensional image based on the coordinate transformation matrix, to obtain the first two-dimensional image corresponding to the detection target.
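Claim 5 leaves the form of the coordinate transformation matrix open; one common composition, assumed here, multiplies the camera intrinsic matrix by the lidar-to-camera extrinsic parameters obtained from calibration:

```python
import numpy as np

def lidar_to_pixel_matrix(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Compose a (3, 4) lidar-to-pixel projection from calibration parameters.

    K: (3, 3) camera intrinsics; R: (3, 3) rotation and t: (3,) translation
    mapping lidar coordinates into the camera frame.
    """
    return K @ np.hstack([R, t.reshape(3, 1)])

# Usage: multiply the resulting matrix by a homogeneous point [x, y, z, 1] and
# divide by the third component to obtain pixel coordinates for each contour point.
```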
6. The method of claim 1, wherein the determining of the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image comprises:
inputting the initial two-dimensional image into a preset detection model;
determining an output result of an intermediate layer of the preset detection model as a feature image corresponding to the initial two-dimensional image;
mapping the first two-dimensional image onto the feature image according to a downsampling ratio corresponding to the feature image, to obtain a second two-dimensional image corresponding to the detection target;
and determining the position and/or type of the detection target based on the second two-dimensional image.
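The mapping by downsampling ratio in claim 6 amounts to scaling ROI coordinates by the stride of the intermediate layer; a sketch, where the floor/ceiling rounding choice and the example stride are assumptions:

```python
import math

def map_roi_to_feature(roi, stride: int, feat_h: int, feat_w: int):
    """Map an ROI from initial-image pixels onto a feature map downsampled
    by `stride` (e.g. stride = 8 for a mid-level backbone layer).

    roi: (x1, y1, x2, y2) in initial-image pixel coordinates.
    Returns the corresponding inclusive cell range on the feature image.
    """
    x1, y1, x2, y2 = roi
    fx1, fy1 = int(x1 // stride), int(y1 // stride)      # round outward (floor)
    fx2 = min(feat_w - 1, math.ceil(x2 / stride))        # round outward (ceil)
    fy2 = min(feat_h - 1, math.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2
```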
7. The method of claim 6, wherein, before the mapping of the first two-dimensional image onto the feature image, the method further comprises:
extending the contour of the candidate region in the first two-dimensional image outward by a set distance, so as to cover the complete body edge of the detection target.
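Claim 7's extension by a set distance can be realized, under the assumption that the projected contour is kept as a rectangular ROI, by growing the ROI outward by a pixel margin clipped to the image bounds:

```python
def expand_roi(roi, margin: int, img_w: int, img_h: int):
    """Grow an ROI outward by `margin` pixels to cover the target's full edge."""
    x1, y1, x2, y2 = roi
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(img_w - 1, x2 + margin), min(img_h - 1, y2 + margin))
```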
8. The method of claim 6, wherein the determining of the position and/or type of the detection target based on the second two-dimensional image comprises:
dividing the second two-dimensional image into a set number of grid regions;
determining at least one frame region according to the grid regions;
performing a frame coarse-regression operation on the frame regions using a preset neural network model, to obtain a segmented image;
and determining the position and/or type of the detection target according to the segmented image.
9. The method of claim 8, wherein the determining of at least one frame region according to the grid regions comprises at least one of:
determining all of the grid regions as a first frame region;
determining the grid regions in a same row as a second frame region;
determining the grid regions in a same column as a third frame region;
determining the grid regions in two adjacent rows as a fourth frame region;
determining the grid regions in two adjacent columns as a fifth frame region;
and determining the grid regions forming a square as a sixth frame region, wherein the side length of the square is larger than that of a single grid region and smaller than that of the second two-dimensional image.
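Taken together, claims 8 and 9 amount to enumerating candidate frame regions over an S x S grid of the second two-dimensional image; the sketch below is one literal reading of that enumeration, with the square side length of 2 cells as an assumed example (the claim only bounds it between one cell and the whole image):

```python
from itertools import product

def frame_regions(S: int, square: int = 2) -> list:
    """Enumerate the six kinds of frame regions of claim 9 on an S x S grid.

    Each region is ((row1, col1), (row2, col2)), an inclusive cell range.
    """
    regions = [((0, 0), (S - 1, S - 1))]                        # 1: all cells
    regions += [((r, 0), (r, S - 1)) for r in range(S)]         # 2: single rows
    regions += [((0, c), (S - 1, c)) for c in range(S)]         # 3: single cols
    regions += [((r, 0), (r + 1, S - 1)) for r in range(S - 1)] # 4: row pairs
    regions += [((0, c), (S - 1, c + 1)) for c in range(S - 1)] # 5: col pairs
    regions += [((r, c), (r + square - 1, c + square - 1))      # 6: squares
                for r, c in product(range(S - square + 1), repeat=2)]
    return regions
```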
10. The method of claim 8, wherein the determining of the position and/or type of the detection target according to the segmented image comprises:
inputting the segmented image into a target classification model to obtain the type of the detection target;
and inputting the segmented image into a full convolution model to obtain the position of the detection target.
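A hedged PyTorch sketch of the two-head readout in claim 10; the channel counts, class count, and layer layout are assumptions for illustration, not an architecture taken from the patent:

```python
import torch
import torch.nn as nn

class SegmentHeads(nn.Module):
    """Two heads over a segmented feature patch: a classifier for the target
    type and a fully convolutional head for a per-pixel position map."""

    def __init__(self, in_ch: int = 256, num_classes: int = 4):
        super().__init__()
        self.cls_head = nn.Sequential(          # target classification model
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_classes))
        self.pos_head = nn.Sequential(          # full convolution model
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1))                # coarse location heat map

    def forward(self, x: torch.Tensor):
        return self.cls_head(x), self.pos_head(x)
```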
11. An object detection device, comprising:
the candidate region determination module is configured to determine the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
the projection module is configured to project the contour onto an initial two-dimensional image obtained by photographing the physical space, to obtain a first two-dimensional image corresponding to a detection target;
and the detection module is configured to determine a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
12. An object detection system, comprising: the system comprises a three-dimensional point cloud acquisition device, a two-dimensional image acquisition device and a processor;
the three-dimensional point cloud acquisition device is in communication connection with the processor, and is configured to scan a physical space to acquire three-dimensional point cloud data and send the three-dimensional point cloud data to the processor;
the two-dimensional image acquisition device is in communication connection with the processor, and is configured to photograph the physical space to acquire an initial two-dimensional image and send the initial two-dimensional image to the processor;
and the processor is configured to perform the steps of the target detection method according to any one of claims 1-10 based on the three-dimensional point cloud data and the initial two-dimensional image.
13. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps of the object detection method according to any one of claims 1-10.
14. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the object detection method according to any one of claims 1-10.
CN202010931022.5A 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium Active CN113761999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010931022.5A CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010931022.5A CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113761999A (en) 2021-12-07
CN113761999B CN113761999B (en) 2024-03-05

Family

ID=78785707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010931022.5A Active CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113761999B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110037184A (en) * 2009-10-06 2011-04-13 한국과학기술원 Pipelining computer system combining neuro-fuzzy system and parallel processor, method and apparatus for recognizing objects using the computer system in images
CN107203754A (en) * 2017-05-26 2017-09-26 北京邮电大学 A kind of license plate locating method and device based on deep learning
CN109100741A (en) * 2018-06-11 2018-12-28 长安大学 A kind of object detection method based on 3D laser radar and image data
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN111191582A (en) * 2019-12-27 2020-05-22 深圳市越疆科技有限公司 Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Robert Manzke: "Automatic Segmentation of Rotational X-Ray Images for Anatomic Intra-Procedural Surface Generation in Atrial Fibrillation Ablation Procedures", IEEE Transactions on Medical Imaging, Vol. 29, Issue 2, February 2010 *
LI Wenbin; HE Ran: "Aircraft Target Detection in Remote Sensing Images Based on Deep Neural Networks", Computer Engineering, No. 07 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170221A (en) * 2021-12-23 2022-03-11 深圳市铱硙医疗科技有限公司 Method and system for confirming brain diseases based on images
CN114387346A (en) * 2022-03-25 2022-04-22 阿里巴巴达摩院(杭州)科技有限公司 Image recognition and prediction model processing method, three-dimensional modeling method and device
CN114743169A (en) * 2022-04-11 2022-07-12 南京领行科技股份有限公司 Object abnormity detection method and device, electronic equipment and storage medium
CN114663438A (en) * 2022-05-26 2022-06-24 浙江银轮智能装备有限公司 Track detection method, system, apparatus, storage medium and computer program product
CN115082662A (en) * 2022-07-15 2022-09-20 深圳市速腾聚创科技有限公司 Target area positioning method and target area positioning device
CN115131525A (en) * 2022-07-26 2022-09-30 白犀牛智达(北京)科技有限公司 Road tooth detection method
CN115131525B (en) * 2022-07-26 2024-04-05 白犀牛智达(北京)科技有限公司 Curb detection method
CN117611592A (en) * 2024-01-24 2024-02-27 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium
CN117611592B (en) * 2024-01-24 2024-04-05 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113761999B (en) 2024-03-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant