CN111179329B

CN111179329B - Three-dimensional target detection method and device and electronic equipment

Info

Publication number: CN111179329B
Application number: CN201911410518.1A
Authority: CN
Inventors: 黄浴
Original assignee: Zhiche Youxing Technology Shanghai Co ltd
Current assignee: Zhiche Youxing Technology Shanghai Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2023-09-08
Anticipated expiration: 2039-12-31
Also published as: CN111179329A

Abstract

Embodiments of the disclosure provide a three-dimensional target detection method, a three-dimensional target detection device, and electronic equipment. The method comprises: acquiring an original point cloud collected by a laser radar and an original image collected by a camera; determining a first quality evaluation index corresponding to the original point cloud and a second quality evaluation index corresponding to the original image; determining, according to the first quality evaluation index and the second quality evaluation index, first target data that passes quality evaluation among the original point cloud and the original image; and obtaining a three-dimensional target detection result according to the first target data with a corresponding target detection strategy. Compared with the prior art, in which three-dimensional target detection is performed directly on the image collected by the camera, the embodiments of the disclosure can effectively ensure the reliability of the detection result.

Description

Three-dimensional target detection method and device and electronic equipment

Technical Field

The disclosure relates to the technical field of target detection, and in particular relates to a three-dimensional target detection method, a three-dimensional target detection device and electronic equipment.

Background

For an automatic driving system, three-dimensional target detection is a very important step. At present, three-dimensional target detection generally relies only on images collected by a camera, so once the quality of those images is poor, the reliability of the detection result is also very poor.

Disclosure of Invention

The present disclosure has been made in order to solve the above technical problems. The embodiment of the disclosure provides a three-dimensional target detection method, a three-dimensional target detection device and electronic equipment.

According to an aspect of the embodiments of the present disclosure, there is provided a three-dimensional object detection method including:

acquiring an original point cloud acquired by a laser radar and an original image acquired by a camera;

determining a first quality evaluation index corresponding to the original point cloud and a second quality evaluation index corresponding to the original image;

determining first target data passing quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index;

and obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

According to another aspect of an embodiment of the present disclosure, there is provided a three-dimensional object detection apparatus including:

the first acquisition module is used for acquiring an original point cloud acquired by the laser radar and an original image acquired by the camera;

the first determining module is used for determining a first quality evaluation index corresponding to the original point cloud and a second quality evaluation index corresponding to the original image;

the second determining module is used for determining first target data passing quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index;

and the second acquisition module is used for acquiring a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

the processor is used for reading the executable instructions from the memory and executing the instructions to realize the three-dimensional target detection method.

According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above three-dimensional object detection method.

In the embodiments of the disclosure, after a first quality evaluation index is determined for the original point cloud collected by the laser radar and a second quality evaluation index is determined for the original image collected by the camera, the first target data that passes quality evaluation among the original point cloud and the original image can be determined according to the two indices; a three-dimensional target detection result can then be obtained from the first target data with a corresponding target detection strategy. The embodiments thus use a multi-sensor system that includes both a laser radar and a camera, and three-dimensional target detection is performed on the first target data that passes quality evaluation among the original point cloud collected by the laser radar and the original image collected by the camera. In other words, as long as the data collected by at least one of the laser radar and the camera is reliable, a reliable three-dimensional target detection result can be obtained. Compared with the prior art, in which three-dimensional target detection is performed directly on the image collected by the camera, the embodiments of the disclosure can therefore effectively ensure the reliability of the detection result. The technical solution of the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart of a three-dimensional object detection method according to an exemplary embodiment of the present disclosure;

FIG. 2 is a block diagram of a three-dimensional object detection system in an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of three-dimensional object detection in an exemplary embodiment of the present disclosure;

FIG. 4 is a block diagram of a three-dimensional object detection device provided in an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram of a three-dimensional object detection device provided in accordance with another exemplary embodiment of the present disclosure;

FIG. 6 is a block diagram of a three-dimensional object detection device provided in accordance with yet another exemplary embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

The embodiments described here are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.

It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning or necessarily logical order; "plurality" may mean two or more, and "at least one" may mean one, two or more.

It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may generally be understood as one or more, unless explicitly limited or the context indicates otherwise.

In addition, the term "and/or" in this disclosure merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" in the present disclosure generally indicates that the objects before and after it are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail. It should be appreciated that for ease of description, the dimensions of the various parts shown in the figures are not drawn to actual scale.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the above systems, and the like.

An electronic device such as a terminal device, a computer system, or the like may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

In addition, the present disclosure may be applied to, but not limited to, other technical fields such as security monitoring, robot navigation, and intelligent traffic management systems.

Exemplary method

Fig. 1 is a flow chart illustrating a three-dimensional object detection method according to an exemplary embodiment of the present disclosure. The method shown in fig. 1 may include step 101, step 102, step 103, and step 104, each of which is described below.

Step 101, acquiring an original point cloud acquired by a laser radar and an original image acquired by a camera.

Here, the raw point cloud acquired by the lidar and the raw image acquired by the camera may be time-synchronized. If the lidar continuously collects the original point clouds to obtain a point cloud sequence composed of a plurality of original point clouds, and the camera continuously collects the original images to obtain an image sequence composed of a plurality of original images, the frame rates of the point cloud sequence and the image sequence may be the same.

Here, the number of cameras may be one or at least two. For example, in the three-dimensional object detection system shown in fig. 2, there are two cameras, camera 1 and camera 2, which may form a binocular camera; in this case, the original image involved in step 101 may include the original image collected by camera 1 and the original image collected by camera 2 at the same time.

Step 102, determining a first quality evaluation index corresponding to the original point cloud, and determining a second quality evaluation index corresponding to the original image.

Here, the first quality evaluation index may be used to evaluate the quality of the original point cloud. The first quality evaluation index may take various forms, which are described by way of example later for clarity of layout.

Here, the second quality evaluation index may be used to evaluate the quality of the original image, and may be a conventional measure from image processing and video frame acquisition, such as peak signal-to-noise ratio (PSNR) or structural similarity (SSIM).
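As a rough illustration of one possible second quality evaluation index, the sketch below computes PSNR for a grayscale image against a reference image. The function name `psnr`, the list-of-rows image format, and the availability of a reference image are assumptions made for illustration; the disclosure does not state how a reference is obtained.

```python
import math


def psnr(image, reference, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities (an assumed format).
    """
    total = 0.0
    count = 0
    for row, ref_row in zip(image, reference):
        for pixel, ref_pixel in zip(row, ref_row):
            total += (pixel - ref_pixel) ** 2
            count += 1
    mse = total / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the image is closer to the reference, so a threshold on this value could serve as a pass/fail criterion.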

Step 103, determining first target data passing the quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index.

Because the first quality evaluation index is used to evaluate the quality of the original point cloud and the second quality evaluation index is used to evaluate the quality of the original image, whether the original point cloud and the original image each pass quality evaluation can be determined from the respective indices, and the corresponding first target data can be obtained from the determination result. Several possible compositions of the first target data are described below by way of example with reference to fig. 2.

In the first case, the quality evaluation of the original point cloud collected by the laser radar passes, and the quality evaluation of the original image collected by the camera 1 and the quality evaluation of the original image collected by the camera 2 pass, and at this time, the first target data may include the original point cloud, the original image collected by the camera 1, and the original image collected by the camera 2.

In the second case, the quality evaluation of the original point cloud acquired by the laser radar passes, and the quality evaluation of the original image acquired by the camera 1 and the quality evaluation of the original image acquired by the camera 2 do not pass, and at this time, the first target data may include only the original point cloud.

In the third case, the quality evaluation of the original point cloud acquired by the laser radar does not pass, and the quality evaluation of at least one of the original image acquired by the camera 1 and the original image acquired by the camera 2 passes. At this time, the first target data may include only those of the original images acquired by the camera 1 and the camera 2 whose quality evaluation passes.

In the fourth case, the quality evaluation of the original point cloud acquired by the laser radar passes, and the quality evaluation of only one of the original image acquired by the camera 1 and the original image acquired by the camera 2 passes. At this time, the first target data may include only the original point cloud and the one original image whose quality evaluation passes.
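The four cases above can be sketched as a single filtering step. This is a minimal sketch, not the method of the disclosure: the function name `select_first_target_data`, the per-sensor thresholds, and the convention that a higher score means better quality are all assumptions introduced for illustration.

```python
def select_first_target_data(point_cloud, images, cloud_score, image_scores,
                             cloud_threshold, image_threshold):
    """Keep only the sensor data whose quality score reaches its threshold.

    `images` and `image_scores` are dicts keyed by camera name (hypothetical
    keys such as "camera_1"). Higher scores are assumed to mean better quality.
    """
    selected = {"point_cloud": None, "images": {}}
    if cloud_score >= cloud_threshold:
        selected["point_cloud"] = point_cloud
    for camera, image in images.items():
        if image_scores[camera] >= image_threshold:
            selected["images"][camera] = image
    return selected
```

With two cameras, the returned dictionary reproduces each case: point cloud plus both images, point cloud only, images only, or point cloud plus the single passing image.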

Step 104, obtaining a three-dimensional target detection result according to the first target data and the corresponding target detection strategy.

Here, three-dimensional may be abbreviated as 3D, and two-dimensional as 2D.

Here, when the first target data includes only the original point cloud, the three-dimensional target detection result may be obtained from the first target data with a point-cloud-based target detection strategy; when the first target data includes only at least one of the original image collected by the camera 1 and the original image collected by the camera 2, the result may be obtained with an image-based target detection strategy; and when the first target data includes both the original point cloud and at least one of the original images collected by the camera 1 and the camera 2, the result may be obtained with a target detection strategy based on point cloud and image fusion.
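The choice among the three strategies can be sketched as a small dispatch function. The function name and the string labels are hypothetical; the disclosure names the strategies only in prose.

```python
def choose_detection_strategy(has_point_cloud, num_images):
    """Return a label for the detection strategy matching the available data.

    `has_point_cloud` says whether the original point cloud passed quality
    evaluation; `num_images` is the number of original images that passed.
    """
    if has_point_cloud and num_images > 0:
        return "fusion"        # point cloud and image fusion
    if has_point_cloud:
        return "point_cloud"   # point-cloud-based detection
    if num_images > 0:
        return "image"         # image-based detection
    return None                # no data passed quality evaluation
```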

Specifically, the three-dimensional object detection result may include the type of the object, the center position of the 3D bounding box of the object, the size (e.g., length, width, height) of the object, and the like.

In the embodiments of the disclosure, after a first quality evaluation index is determined for the original point cloud collected by the laser radar and a second quality evaluation index is determined for the original image collected by the camera, the first target data that passes quality evaluation among the original point cloud and the original image can be determined according to the two indices; a three-dimensional target detection result can then be obtained from the first target data with a corresponding target detection strategy. The embodiments thus use a multi-sensor system that includes both a laser radar and a camera, and three-dimensional target detection is performed on the first target data that passes quality evaluation among the original point cloud collected by the laser radar and the original image collected by the camera. In other words, as long as the data collected by at least one of the laser radar and the camera is reliable, a reliable three-dimensional target detection result can be obtained. Compared with the prior art, in which three-dimensional target detection is performed directly on the image collected by the camera, the embodiments of the disclosure can therefore effectively ensure the reliability of the detection result.

In an alternative example, obtaining the three-dimensional target detection result according to the first target data with a corresponding target detection strategy includes:

obtaining a two-dimensional instance segmentation result according to the original image when the first target data includes the original image;

and obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result.

Here, a two-dimensional instance segmentation result may be obtained from an original image, and a specific embodiment of obtaining the two-dimensional instance segmentation result will be described by way of example.

In a specific embodiment, the number of cameras is one, and obtaining the two-dimensional instance segmentation result according to the original image includes:

performing instance segmentation on the original image in the first target data to obtain the two-dimensional instance segmentation result.

Here, the three-dimensional object detection system may include a two-dimensional object detection module and a two-dimensional segmentation module, and the combination of the two may be regarded as an instance segmentation module. In general, the instance segmentation model employed by the instance segmentation module may be Mask R-CNN (an instance segmentation algorithm), Sequential Grouping Networks (SGN), or the like.

Assuming that the first target data includes only the original image collected by the camera 1 in fig. 2, as shown in fig. 2, the two-dimensional object detection module may perform 2D detection on that image, and the two-dimensional segmentation module may perform 2D segmentation on it; the detection result obtained by 2D detection and the segmentation result obtained by 2D segmentation together form the two-dimensional instance segmentation result. The detection result obtained by 2D detection may include the type of the target, the position of the 2D bounding box of the target, the size (such as length and width) of the target, and the like.

In this embodiment, the two-dimensional instance segmentation result can be obtained very conveniently by using the original image in the first target data.

In another embodiment, the number of cameras is two, and obtaining the two-dimensional instance segmentation result according to the original image includes:

under the condition that the first target data comprises original images respectively acquired by two cameras, selecting the original image acquired by one of the two cameras from the first target data;

and performing instance segmentation on the selected original image to obtain a two-dimensional instance segmentation result.

Assuming that the original image collected by the camera 1 and the original image collected by the camera 2 in fig. 2 are included in the first target data, the original image collected by one of the camera 1 and the camera 2 may be selected from the first target data.

Specifically, the original image with the best image quality may be selected from the original image collected by the camera 1 and the original image collected by the camera 2, according to the second quality evaluation index corresponding to each of the two images. Alternatively, either one of the two original images may be selected arbitrarily.
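Selecting the image with the best quality can be sketched as follows. The function name, the dictionary layout, and the assumption that a larger second quality evaluation index (for example a PSNR value) means better quality are all illustrative.

```python
def select_best_image(images, image_scores):
    """Return (camera_name, image) for the image with the highest quality score.

    `images` and `image_scores` are dicts keyed by camera name; a larger score
    is assumed to mean better image quality.
    """
    best_camera = max(images, key=lambda camera: image_scores[camera])
    return best_camera, images[best_camera]
```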

Then, similar to the above embodiment, the two-dimensional object detection module and the two-dimensional segmentation module may be used to perform 2D segmentation and 2D detection on the selected original image, so as to obtain a two-dimensional instance segmentation result.

Regardless of how the two-dimensional instance segmentation result is obtained, once it is available, a three-dimensional target detection result can be obtained from it. It can be seen that, in the embodiments of the present disclosure, when the first target data includes the original image, the information in the original image can be used in the three-dimensional target detection process to ensure the reliability of the detection result.

In an alternative example, obtaining the three-dimensional object detection result according to the two-dimensional instance segmentation result includes:

determining whether the first target data comprises an original point cloud or not to obtain a determination result;

obtaining a target point cloud according to the determination result;

and obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result and the target point cloud.

Here, the data in the first target data may be traversed to obtain a determination result for characterizing whether the original point cloud is included in the first target data. Next, a target point cloud may be obtained according to the determination result.

In a specific embodiment, obtaining the target point cloud according to the determination result includes:

when the determination result indicates that the first target data includes the original point cloud, taking the original point cloud as the target point cloud;

when the determination result indicates that the first target data does not include the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.

Here, the calibration parameters of the camera may include intrinsic parameters and extrinsic parameters. The intrinsic parameters may include the focal length, principal point offset, and distortion coefficients of the camera; the extrinsic parameters may include the pose of the camera with respect to the world coordinate system, for example a rotation matrix and a translation matrix of the camera with respect to the world coordinate system.

Here, the three-dimensional object detection system may include a depth estimation module in addition to the two-dimensional object detection module and the two-dimensional segmentation module.

When the determination result characterizes that the first target data includes the original point cloud, the original point cloud in the first target data may be directly used as the target point cloud.

When the determination result indicates that the first target data does not include the original point cloud, as shown in fig. 2, the depth estimation module may perform depth estimation on the original image in the first target data to obtain a corresponding depth map. Specifically, when the first target data includes both the original image collected by the camera 1 and the original image collected by the camera 2, the depth estimation model employed by the depth estimation module may be a binocular stereo depth estimation model, for example the Pyramid Stereo Matching Network (PSMNet) or the Group-wise Correlation Stereo Network (GwcNet); when the first target data includes only one of the original images collected by the camera 1 and the camera 2, the depth estimation model may be a monocular depth estimation model, such as GeoNet (an unsupervised learning framework for monocular depth, optical flow, and camera motion estimation in video) or the Structure-Aware Residual Pyramid Network (SARPN).

After the depth map is obtained, the depth map can be subjected to inverse perspective transformation according to calibration parameters of the camera to obtain a pseudo point cloud, and then the pseudo point cloud is taken as a target point cloud. It should be noted that the calibration parameters according to which the inverse perspective transformation is performed may be the calibration parameters of the camera corresponding to the original image used for performing the instance segmentation.
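The inverse perspective transformation of a depth map can be sketched with a simple pinhole camera model. This is an illustrative sketch under stated assumptions: it uses only the intrinsic parameters `fx`, `fy`, `cx`, `cy` (hypothetical names), ignores lens distortion and the extrinsic parameters, and returns points in the camera coordinate system.

```python
def depth_map_to_pseudo_point_cloud(depth_map, fx, fy, cx, cy):
    """Back-project a depth map into 3D points in the camera coordinate system.

    `depth_map` is a list of rows of depth values; `fx`, `fy` are focal lengths
    in pixels and `cx`, `cy` is the principal point (pinhole model, no
    distortion; all assumed for illustration).
    """
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if depth <= 0:
                continue  # skip pixels without a valid depth estimate
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            points.append((x, y, depth))
    return points
```

A full implementation would additionally apply the extrinsic rotation and translation if the pseudo point cloud needs to be expressed in the world or laser radar coordinate system.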

In this embodiment, whether the first target data includes the original point cloud or not, the target point cloud can be determined very conveniently. After the target point cloud is obtained, a three-dimensional target detection result can be obtained according to the two-dimensional instance segmentation result and the target point cloud.

In the embodiment of the disclosure, the three-dimensional target detection result can be obtained by combining the two-dimensional instance segmentation result of the original image acquired by the camera and the target point cloud, so that the reliability of the three-dimensional target detection result can be better ensured.

In an alternative example, obtaining a three-dimensional target detection result from the two-dimensional instance segmentation result and the target point cloud includes:

generating a first target view according to the target point cloud; wherein the first target view comprises at least one of a first bird's eye view and a first front view;

performing feature extraction according to the first target view to obtain a first extraction result;

performing feature extraction according to the two-dimensional instance segmentation result to obtain a second extraction result;

inputting the first extraction result and the second extraction result into a three-dimensional region proposal network to obtain a proposal result output by the three-dimensional region proposal network;

inputting the obtained proposal result into a three-dimensional object detector to obtain a three-dimensional object detection result output by the three-dimensional object detector.

Here, the three-dimensional object detection system may also include a three-dimensional region proposal network (Region Proposal Network, RPN), which may be simply referred to as a 3D RPN, and a three-dimensional object detector.

In the embodiment of the present disclosure, after the target point cloud is obtained, the target point cloud may be projected to a vertical direction to obtain the first aerial view, or may be projected to a forward camera direction to obtain the first front view, and in this way, the first target view including at least one of the first aerial view and the first front view may be obtained.
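Projecting the target point cloud in the vertical direction can be sketched as binning points into a ground-plane grid. The axis convention (x forward, y to the side, z up), the grid ranges, and the use of a per-cell point count as the cell value are assumptions made for illustration; the disclosure does not specify how the bird's eye view is encoded.

```python
def point_cloud_to_bev(points, x_range, y_range, cell_size):
    """Project 3D points along the z axis into a 2D grid of point counts.

    `points` is an iterable of (x, y, z) tuples; `x_range` and `y_range` are
    (min, max) extents of the grid in the same units as `cell_size`.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int((x_max - x_min) / cell_size)
    cols = int((y_max - y_min) / cell_size)
    grid = [[0 for _ in range(cols)] for _ in range(rows)]
    for x, y, _z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # point falls outside the grid
        # Clamp indices to guard against floating-point rounding at the edge.
        row = min(int((x - x_min) / cell_size), rows - 1)
        col = min(int((y - y_min) / cell_size), cols - 1)
        grid[row][col] += 1
    return grid
```

A front view could be built in a similar way by binning points by their projected image coordinates instead of their ground-plane coordinates.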

Assuming that the first target view includes both the first aerial view and the first front view, as shown in fig. 3, feature extraction may be performed on two views in the first target view by using two feature extraction encoders, respectively, to obtain two first extraction results. In addition, as shown in fig. 3, a feature extraction encoder may be further used to perform feature extraction on the two-dimensional instance segmentation result to obtain a second extraction result. Alternatively, each feature extraction encoder may extract features using a residual network or a dense network. In particular, the residual network may also be referred to as ResNet and the dense network may also be referred to as DenseNet.

Next, both the first extraction result and the second extraction result may be input into a three-dimensional region proposal network for 3D proposal generation, resulting in several proposals, each of which may include a possible target location, size, orientation, etc.

Thereafter, the three-dimensional region proposal network may rank the several proposals, and the three-dimensional region proposal network may output proposal results including the first N proposals in the rank; wherein N may be 20, 25, 30, etc.
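Keeping the first N proposals of the ranking can be sketched as below. The disclosure does not state the ranking criterion; sorting by a confidence value stored under a hypothetical `"score"` key is an assumption.

```python
def top_n_proposals(proposals, n=20):
    """Return the n highest-scoring proposals, best first.

    Each proposal is a dict with a numeric "score" entry (assumed format).
    """
    ranked = sorted(proposals, key=lambda proposal: proposal["score"], reverse=True)
    return ranked[:n]
```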

The proposal result output by the three-dimensional region proposal network may be input to the three-dimensional object detector, as shown in fig. 2 and fig. 3, and the three-dimensional object detector may perform 3D object detection according to the proposal result to obtain the three-dimensional object detection result. Here, the three-dimensional region proposal network and the three-dimensional object detector may be composed of fully connected layers (FCL) plus a non-maximum suppression (NMS) module.
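The NMS step can be sketched as the usual greedy procedure. As a simplifying assumption, boxes here are axis-aligned rectangles `(x1, y1, x2, y2)` in the bird's eye view rather than oriented 3D boxes, and the threshold value is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; keeps the best box of each cluster."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in remaining:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```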

It can be seen that in the embodiments of the present disclosure, by feature extraction and combining the three-dimensional region proposal network and the three-dimensional object detector, the three-dimensional object detection result can be obtained conveniently and reliably.

Generating a second target view according to the original point cloud under the condition that the first target data comprises the original point cloud and the first target data does not comprise the original image; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

according to the second target view, extracting features to obtain a third extraction result;

inputting the third extraction result into a three-dimensional area proposal network to obtain a proposal result output by the three-dimensional area proposal network;

Here, in the case where the first target data includes only the original point cloud, the original point cloud may be projected along the vertical direction to obtain the second bird's-eye view, and may be projected along the forward camera direction to obtain the second front view. In this way, the second target view including at least one of the second bird's-eye view and the second front view can be obtained.
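The two projections can be sketched as follows. The sketch assumes a coordinate convention (x forward, y lateral, z up), an orthographic projection onto a regular grid, and particular grid ranges and cell size; none of these are specified in the text, and the function names are hypothetical.

```python
import numpy as np


def bird_eye_view(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Project (N, 3) points along the vertical (z) axis into a max-height grid."""
    points = np.asarray(points, dtype=float)
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.full((nx, ny), -np.inf)
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # Keep the highest z value that falls into each cell.
    np.maximum.at(bev, (ix[inside], iy[inside]), points[inside, 2])
    bev[np.isinf(bev)] = 0.0  # empty cells
    return bev


def front_view(points, y_range=(-20.0, 20.0), z_range=(-2.0, 2.0), cell=0.5):
    """Project points along the forward (x) axis, keeping the nearest range per cell."""
    points = np.asarray(points, dtype=float)
    points = points[points[:, 0] > 0]  # keep only points in front of the sensor
    ny = int(round((y_range[1] - y_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    fv = np.full((nz, ny), np.inf)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    iz = np.floor((points[:, 2] - z_range[0]) / cell).astype(int)
    inside = (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    # Keep the smallest forward distance that falls into each cell.
    np.minimum.at(fv, (iz[inside], iy[inside]), points[inside, 0])
    fv[np.isinf(fv)] = 0.0  # empty cells
    return fv
```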

Assuming that the second target view includes both the second bird's-eye view and the second front view, as shown in fig. 3, feature extraction may be performed on the two views of the second target view by using two feature extraction encoders, respectively, to obtain two third extraction results. Next, the two third extraction results may be input into the three-dimensional region proposal network. For the subsequent steps, reference may be made to the description of the processing performed after the two first extraction results and the second extraction result are input into the three-dimensional region proposal network, which is not repeated here.

In an alternative example, determining a first quality evaluation index corresponding to the original point cloud includes:

projecting the original point cloud to an image plane of a camera to obtain a projection image;

calculating a correlation between gradient information of the projection image and image edge information of the original image, and taking the correlation as the first quality evaluation index corresponding to the original point cloud; or determining the Rényi quadratic entropy of the projection image, and taking the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud.

Taking the case that only one camera is included in the three-dimensional target detection system as an example, after the original point cloud collected by the laser radar and the original image collected by the camera are obtained, the original point cloud can be projected to the image plane of the camera to obtain a projected image, and at this time, the coordinate systems of the laser radar and the camera can be considered to be calibrated. Next, a first quality evaluation index corresponding to the original point cloud is determined from the projection image, and a specific embodiment of determining the first quality evaluation index is described below by way of example.
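The projection of the original point cloud onto the image plane can be sketched with a pinhole camera model. The intrinsic matrix `K`, the lidar-to-camera rotation `R` and translation `t`, and the function name `project_to_image` are assumptions for this illustration; they stand in for the calibration between the laser radar and the camera that the text takes as given.

```python
import numpy as np


def project_to_image(points_lidar, K, R, t, image_shape):
    """Project (N, 3) lidar points to pixel coordinates with a pinhole model.

    K: (3, 3) camera intrinsics; R, t: lidar-to-camera rotation and translation.
    Returns (pixels, depths): integer (row, col) pairs and camera-frame depths
    for the points that lie in front of the camera and inside the image.
    """
    pts = np.asarray(points_lidar, dtype=float)
    cam = pts @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    cam = cam[cam[:, 2] > 1e-6]                 # drop points behind the camera
    uvw = cam @ np.asarray(K, dtype=float).T    # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]
    height, width = image_shape
    cols = np.floor(uv[:, 0]).astype(int)
    rows = np.floor(uv[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    pixels = np.stack([rows[inside], cols[inside]], axis=1)
    return pixels, cam[inside, 2]
```

The returned pixel positions and depths can then be rasterized into the projection image from which the first quality evaluation index is computed.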

In one embodiment, gradient information of the projection image and image edge information of the original image may be calculated, and a correlation of the gradient information and the image edge information may be calculated. Specifically, the formula used to calculate the correlation may be:

where Jc is the correlation, w is the window size in video frames, f is the image frame, (i, j) is the pixel location in the image, p is a 3D point of the point cloud, X is the point cloud data collected by the lidar, and D is the image gradient map (which characterizes the gradient information).

After calculating the correlation, the correlation may be used as a first quality evaluation index corresponding to the original point cloud. Here, a correlation threshold may be preset, and if the determined correlation is greater than the correlation threshold, the quality evaluation of the original point cloud may be considered to pass; otherwise, the quality evaluation of the original point cloud may be considered not to pass.
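A minimal sketch of this check is given below. It assumes that the correlation is accumulated as the sum, over the projected points, of a per-point value X (for example, a range-discontinuity value) multiplied by the value of the image gradient map D at the pixel (i, j) onto which the point projects. The finite-difference gradient, the single-frame window, and all function names are assumptions for illustration only.

```python
import numpy as np


def gradient_magnitude(image):
    """Image gradient map D from finite differences (a simple edge-strength proxy)."""
    grad_rows, grad_cols = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(grad_cols, grad_rows)


def correlation_index(pixels, point_values, gradient_map):
    """Sum, over projected points, of the point value times D at its (row, col) pixel."""
    pixels = np.asarray(pixels, dtype=int)
    values = np.asarray(point_values, dtype=float)
    rows, cols = pixels[:, 0], pixels[:, 1]
    return float(np.sum(values * gradient_map[rows, cols]))


def point_cloud_passes(correlation, correlation_threshold):
    """The quality evaluation passes when the correlation exceeds the preset threshold."""
    return correlation > correlation_threshold
```

Points that project onto strong image edges contribute large terms, so a well-aligned and informative point cloud tends to yield a larger value than a degraded one.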

It is easy to see that the first quality evaluation index can be determined very conveniently and quickly by the embodiment, and the quality evaluation of the original point cloud can be realized very conveniently and quickly.

In another embodiment, the Rényi quadratic entropy of the projection image may be calculated. Specifically, the formula used to calculate the Rényi quadratic entropy may be:

where RQE is the Rényi quadratic entropy and G(a, b) is a Gaussian distribution function with mean a and variance b. Here, RQE serves as the quality measure: it is a measure of the compactness of the point cloud distribution, which is defined in the form of a Gaussian mixture model (GMM).

After the Rényi quadratic entropy is calculated, it may be used as the first quality evaluation index corresponding to the original point cloud. Specifically, a Rényi quadratic entropy threshold may be preset, and if the calculated Rényi quadratic entropy is greater than the threshold, the quality evaluation of the original point cloud may be considered to pass; otherwise, the quality evaluation of the original point cloud may be considered not to pass.
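For illustration, the Rényi quadratic entropy of a point set can be computed in closed form when the distribution is modeled as a GMM with one isotropic Gaussian of standard deviation sigma centered on each point. That modeling choice, the value of `sigma`, and the function name are assumptions of this sketch. Under them, the integral of the squared density equals the average of G(x_i - x_j, 2·sigma²·I) over all point pairs, and the entropy is the negative logarithm of that average.

```python
import numpy as np


def renyi_quadratic_entropy(points, sigma=0.1):
    """RQE of a point set modeled as a GMM with one isotropic Gaussian per point.

    Computes -log( (1 / N^2) * sum_i sum_j G(x_i - x_j, 2 * sigma^2 * I) ).
    """
    x = np.asarray(points, dtype=float)
    n, dim = x.shape
    diff = x[:, None, :] - x[None, :, :]        # (N, N, dim) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)
    var = 2.0 * sigma ** 2                      # variance of the convolved kernel
    norm = (2.0 * np.pi * var) ** (-dim / 2.0)
    kernel = norm * np.exp(-sq_dist / (2.0 * var))
    return float(-np.log(kernel.sum() / n ** 2))
```

The pairwise computation needs memory on the order of N², so a large point cloud would have to be subsampled or processed in blocks.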

It is easy to see that the first quality evaluation index can be determined very conveniently by the implementation mode, and the quality evaluation of the original point cloud can be realized very conveniently.

In the embodiment of the disclosure, the correlation or the Rényi quadratic entropy can be conveniently calculated from the projection image obtained by projecting the original point cloud to the image plane of the camera, so that the first quality evaluation index is determined according to the correlation or the Rényi quadratic entropy, and the quality evaluation of the original point cloud is realized.

It should be noted that the quality evaluation of the original image may be performed in a manner similar to the quality evaluation of the original point cloud. Specifically, a peak signal-to-noise ratio (PSNR) threshold may be preset, and the PSNR may be calculated for the original image. In the case where the calculated PSNR is greater than the PSNR threshold, the quality evaluation of the original image may be considered to pass; otherwise, the quality evaluation of the original image may be considered not to pass.
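A sketch of the PSNR check follows. PSNR is defined relative to a reference image, and the text does not state which reference is used, so the `reference` argument (for example, a filtered version of the same frame) is an assumption, as are the function names and the 8-bit maximum value of 255.

```python
import numpy as np


def psnr(image, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between an image and a reference image."""
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((image - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def image_passes(psnr_value, psnr_threshold):
    """The quality evaluation passes when the PSNR exceeds the preset threshold."""
    return psnr_value > psnr_threshold
```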

The working principle of the three-dimensional object detection system will be described with reference to fig. 2.

As shown in fig. 2, the three-dimensional object detection system may include a laser radar, a camera 1, and a camera 2. In addition to a control switch A corresponding to the laser radar, a control switch B corresponding to the camera 1, and a control switch C corresponding to the camera 2, a control switch D and a control switch E may be provided.

The control switch D can control whether the three operations of depth estimation, 2D detection and 2D segmentation are performed. The control switch E may have two working positions: a first working position, in which the control switch E is connected to the output of the laser radar, and a second working position, in which the control switch E is connected to the output of the pseudo-lidar point cloud generation (which outputs the pseudo point cloud described above).

After the laser radar collects original point clouds and the camera 1 and the camera 2 collect original images respectively, point cloud data quality evaluation can be performed on the original point clouds, and image quality evaluation can be performed on the original images collected by the camera 1 and the original images collected by the camera 2 respectively. There may then be four cases:

In the first case, only the quality evaluation of the original point cloud passes. At this time, the control switch A may be placed in the closed state, the control switch B, the control switch C, and the control switch D may be placed in the open state, and the control switch E may be switched to the first working position. The three-dimensional region proposal network and the three-dimensional object detector may then generate the three-dimensional object detection result based solely on the original point cloud.

In the second case, the quality evaluation of the original point cloud does not pass, and the quality evaluation of the original image acquired by only one of the camera 1 and the camera 2 passes, for example, only that of the original image acquired by the camera 1. At this time, the control switch B and the control switch D may be placed in the closed state, the control switch A and the control switch C may be placed in the open state, and the control switch E may be switched to the second working position. Then, the pseudo point cloud may be generated, and the three-dimensional region proposal network and the three-dimensional object detector may generate the three-dimensional object detection result according to the pseudo point cloud and the two-dimensional instance segmentation result obtained based on the original image acquired by the camera 1.

In the third case, the quality evaluation of the original point cloud passes, and the quality evaluation of the original image acquired by only one of the camera 1 and the camera 2 passes, for example, that of the original image acquired by the camera 1. At this time, the control switch A, the control switch B, and the control switch D may all be placed in the closed state, the control switch C may be placed in the open state, and the control switch E may be switched to the first working position. Then, the three-dimensional region proposal network and the three-dimensional object detector may generate the three-dimensional object detection result from the original point cloud and the two-dimensional instance segmentation result obtained based on the original image acquired by the camera 1.

In the fourth case, the quality evaluation of the original point cloud passes, and the quality evaluations of the original images acquired by the camera 1 and the camera 2 both pass. At this time, the control switch A, the control switch B, the control switch C and the control switch D may all be placed in the closed state, and the control switch E may be switched to the first working position. Then, the three-dimensional object detection result may be generated from the original point cloud and the two-dimensional instance segmentation result obtained based on the original image acquired by one of the camera 1 and the camera 2.
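The four cases can be summarized as a mapping from the quality evaluation results to the switch states. In the sketch below, the function name, the dictionary representation (True for closed, False for open, and 1 or 2 for the working position of switch E), and the symmetric handling of the camera 2 where the text gives the camera 1 as its example are assumptions. Combinations that the four cases do not describe return None.

```python
from typing import Dict, Optional


def switch_states(cloud_ok: bool, cam1_ok: bool, cam2_ok: bool) -> Optional[Dict[str, object]]:
    """Map quality evaluation results to the states of switches A to E.

    True means closed and False means open; "E" is 1 (first working position)
    or 2 (second working position). Returns None for undescribed combinations.
    """
    one_camera_ok = cam1_ok != cam2_ok
    if cloud_ok and not cam1_ok and not cam2_ok:   # first case: point cloud only
        return {"A": True, "B": False, "C": False, "D": False, "E": 1}
    if not cloud_ok and one_camera_ok:             # second case: one image only
        return {"A": False, "B": cam1_ok, "C": cam2_ok, "D": True, "E": 2}
    if cloud_ok and one_camera_ok:                 # third case: point cloud and one image
        return {"A": True, "B": cam1_ok, "C": cam2_ok, "D": True, "E": 1}
    if cloud_ok and cam1_ok and cam2_ok:           # fourth case: everything passes
        return {"A": True, "B": True, "C": True, "D": True, "E": 1}
    return None
```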

It should be noted that the data of any sensor, whether a camera or a lidar, has some defects. For example, camera data is affected by the illumination environment and may suffer from noise, low resolution, brightness saturation, or images that are too dark to be clear. Lidar data may be sparse (limited by the number of beams), limited in range (due to insufficient laser emitting and receiving capability), and may contain "black holes" caused by non-reflective materials. Therefore, the embodiments of the present disclosure can provide a multi-sensor fusion three-dimensional object detection system (including a laser radar and a camera) that does not rely on any single sensor; instead, the two sensors cooperate and compensate for each other's deficiencies. The system can continue to operate if either sensor fails or produces bad data, as long as one sensor provides reliable data. Within a deep learning framework, the embodiments of the present disclosure can flexibly give priority to reliable sensor data and complementarily enhance object detection performance, so that the robustness and adaptability of the perception module of the entire automatic driving system can be improved.

Exemplary apparatus

Fig. 4 is a block diagram of a three-dimensional object detection device provided in an exemplary embodiment of the present disclosure. The apparatus shown in fig. 4 comprises a first acquisition module 401, a first determination module 402, a second determination module 403 and a second acquisition module 404.

The first obtaining module 401 is configured to obtain an original point cloud collected by the laser radar and obtain an original image collected by the camera;

a first determining module 402, configured to determine a first quality assessment index corresponding to an original point cloud, and determine a second quality assessment index corresponding to an original image;

a second determining module 403, configured to determine, according to the first quality evaluation index and the second quality evaluation index, first target data passing the quality evaluation in the original point cloud and the original image;

the second obtaining module 404 is configured to obtain a three-dimensional target detection result according to the first target data and a corresponding target detection policy.
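The data flow between the four modules can be sketched as follows. The class name and the four callables are hypothetical placeholders standing in for modules 401 to 404; the sketch only illustrates the order in which the modules hand data to one another, not how each module is implemented.

```python
from typing import Any, Callable, Tuple


class DetectionDevice:
    """Hypothetical wiring of modules 401 to 404; every callable is a placeholder."""

    def __init__(
        self,
        acquire: Callable[[], Tuple[Any, Any]],
        assess: Callable[[Any, Any], Tuple[float, float]],
        select: Callable[[Any, Any, float, float], Any],
        detect: Callable[[Any], Any],
    ) -> None:
        self.acquire = acquire  # first acquisition module 401
        self.assess = assess    # first determination module 402
        self.select = select    # second determination module 403
        self.detect = detect    # second acquisition module 404

    def run(self) -> Any:
        cloud, image = self.acquire()
        q_cloud, q_image = self.assess(cloud, image)
        first_target_data = self.select(cloud, image, q_cloud, q_image)
        return self.detect(first_target_data)
```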

In an alternative example, as shown in fig. 5, the second acquisition module 404 includes:

a first obtaining submodule 4041, configured to obtain a two-dimensional instance segmentation result according to the original image in a case where the original image is included in the first target data;

the second obtaining sub-module 4042 is configured to obtain a three-dimensional target detection result according to the two-dimensional instance segmentation result.

In an alternative example, the second acquisition sub-module includes:

a first acquisition unit configured to determine whether an original point cloud is included in first target data to obtain a determination result;

the second acquisition unit is used for acquiring a target point cloud according to the determination result;

and the third acquisition unit is used for acquiring a three-dimensional target detection result according to the two-dimensional instance segmentation result and the target point cloud.

In an alternative example, the second obtaining unit is specifically configured to:

taking the original point cloud as the target point cloud in the case where the determination result indicates that the first target data includes the original point cloud; and, in the case where the determination result indicates that the first target data does not include the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.
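The selection of the target point cloud can be sketched as follows. The sketch assumes that a depth map has already been estimated from the original image and that the calibration parameters reduce to the pinhole intrinsics fx, fy, cx and cy. The function names and the convention of passing None when the original point cloud is not in the first target data are hypothetical.

```python
import numpy as np


def pseudo_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (M, 3) point cloud in the camera frame."""
    depth = np.asarray(depth, dtype=float)
    rows, cols = np.indices(depth.shape)
    valid = depth > 0                      # pixels without a depth estimate are skipped
    z = depth[valid]
    x = (cols[valid] - cx) * z / fx
    y = (rows[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def target_point_cloud(original_cloud, depth, intrinsics):
    """Use the original point cloud if available, otherwise the pseudo point cloud."""
    if original_cloud is not None:
        return np.asarray(original_cloud, dtype=float)
    return pseudo_point_cloud(depth, *intrinsics)
```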

In an alternative example, the third acquisition unit includes:

the generation subunit is used for generating a first target view according to the target point cloud; wherein the first target view comprises at least one of a first bird's eye view and a first front view;

the first acquisition subunit is used for extracting features according to the first target view so as to obtain a first extraction result;

The second acquisition subunit is used for carrying out feature extraction according to the two-dimensional instance segmentation result so as to obtain a second extraction result;

a third obtaining subunit, configured to input the first extraction result and the second extraction result into the three-dimensional area proposal network, so as to obtain a proposal result output by the three-dimensional area proposal network;

and a fourth acquisition subunit for inputting the obtained proposal result into the three-dimensional object detector to obtain a three-dimensional object detection result output by the three-dimensional object detector.

In an alternative example,

the number of cameras is one, and the first acquisition sub-module is specifically configured to:

performing instance segmentation on an original image in first target data to obtain a two-dimensional instance segmentation result;

or,

the number of cameras is two, and the first acquisition sub-module includes:

the selecting unit is used for selecting the original image acquired by one of the two cameras from the first target data under the condition that the first target data comprises the original images acquired by the two cameras respectively;

and a fourth acquisition unit for performing instance segmentation on the selected original image to obtain a two-dimensional instance segmentation result.

In an alternative example, the second acquisition module 404 includes:

The generation sub-module is used for generating a second target view according to the original point cloud under the condition that the first target data comprise the original point cloud and the first target data do not comprise the original image; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

the third acquisition sub-module is used for extracting features according to the second target view so as to obtain a third extraction result;

a fourth obtaining sub-module, configured to input the third extraction result into the three-dimensional area proposal network, so as to obtain a proposal result output by the three-dimensional area proposal network;

and a fifth acquisition sub-module for inputting the obtained proposal result into the three-dimensional target detector to obtain a three-dimensional target detection result output by the three-dimensional target detector.

In an alternative example, as shown in fig. 6, the first determining module 402 includes:

a sixth obtaining submodule 4021, configured to project an original point cloud to an image plane of the camera to obtain a projected image;

the processing sub-module 4022 is configured to calculate a correlation between gradient information of the projection image and image edge information of the original image, and take the correlation as the first quality evaluation index corresponding to the original point cloud; or to determine the Rényi quadratic entropy of the projection image, and take the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud.

Exemplary electronic device

Next, an electronic device 70 according to an embodiment of the present disclosure is described with reference to fig. 7. The electronic device 70 may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.

As shown in fig. 7, the electronic device 70 includes one or more processors 71 and memory 72.

Processor 71 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities that controls other components in electronic device 70 to perform desired functions.

Memory 72 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include Random Access Memory (RAM) and/or cache memory (cache), among others. The nonvolatile memory may include Read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer readable storage medium and the processor 71 may execute the program instructions to implement the three-dimensional object detection methods and/or other desired functions of the various embodiments disclosed herein above.

In one example, the electronic device 70 may further include: an input device 73 and an output device 74, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown). The input device 73 may include a keyboard, a mouse, etc. Output devices 74 may include a display, speakers, remote output devices, and the like.

Of course, only some of the components of the electronic device 70 that are relevant to the present disclosure are shown in fig. 7, with components such as buses, input/output interfaces, etc. omitted for simplicity. In addition, the electronic device 70 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer readable storage Medium

In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a three-dimensional object detection method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification.

The computer program product may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in the three-dimensional object detection method according to the various embodiments of the present disclosure described in the above "exemplary method" section of the present description.

A computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may comprise an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The basic principles of the present disclosure have been described above in connection with specific embodiments, it should be noted that the advantages, benefits, effects, etc. mentioned in this disclosure are merely examples and are not to be considered as necessarily possessed by the various embodiments of the present disclosure. The specific details disclosed herein are merely for purposes of example and understanding, and are not intended to limit the disclosure to the specific details described above.

In this specification, each embodiment is described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same or similar parts between the embodiments are referred to each other. Since the system embodiment basically corresponds to the method embodiment, the description is simpler, and the relevant points refer to the description of the method embodiment.

The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith.

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

In the apparatus, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent to the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. A three-dimensional object detection method, comprising:

Determining a first quality evaluation index corresponding to the original point cloud and determining a second quality evaluation index corresponding to the original image, wherein the second quality evaluation index is used for evaluating the quality of the original image and is a measure of traditional image processing and video frame acquisition;

determining first target data passing through quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index, wherein the first target data comprises the original point cloud and/or the original image passing through the first quality evaluation index and the second quality evaluation index;

obtaining a two-dimensional instance segmentation result according to the original image under the condition that the original image is included in the first target data; obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result;

the determining a first quality evaluation index corresponding to the original point cloud includes:

projecting the original point cloud to an image plane of the camera to obtain a projection image; calculating a correlation between gradient information of the projection image and image edge information of the original image, and taking the correlation as the first quality evaluation index corresponding to the original point cloud; or determining the Rényi quadratic entropy of the projection image, and taking the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud.

2. The method according to claim 1, wherein the obtaining a three-dimensional object detection result from the two-dimensional instance segmentation result comprises:

determining whether the original point cloud is included in the first target data to obtain a determination result;

obtaining a target point cloud according to the determination result;

3. The method according to claim 2, wherein the obtaining the target point cloud according to the determination result includes:

taking the original point cloud as a target point cloud when the determination result characterizes the first target data to include the original point cloud;

and, when the determination result indicates that the first target data does not include the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.

4. The method according to claim 2, wherein the obtaining a three-dimensional object detection result from the two-dimensional instance segmentation result and the object point cloud comprises:

Extracting features according to the first target view to obtain a first extraction result;

5. The method of claim 1, wherein

the number of cameras is one, and the obtaining a two-dimensional instance segmentation result according to the original image comprises the following steps:

performing instance segmentation on the original image in the first target data to obtain a two-dimensional instance segmentation result;

or,

the number of cameras is two, and the obtaining a two-dimensional instance segmentation result according to the original image comprises the following steps:

under the condition that the first target data comprises the original images respectively acquired by the two cameras, selecting the original image acquired by one of the two cameras from the first target data;

6. The method according to claim 1, wherein obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection policy includes:

generating a second target view according to the original point cloud under the condition that the original point cloud is included in the first target data and the original image is not included in the first target data; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

performing feature extraction according to the second target view to obtain a third extraction result;

7. A three-dimensional object detection device, comprising:

The first determining module is used for determining a first quality evaluation index corresponding to the original point cloud and determining a second quality evaluation index corresponding to the original image, wherein the second quality evaluation index is used for evaluating the quality of the original image and is a measure of traditional image processing and video frame acquisition;

the second determining module is used for determining first target data passing through quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index, wherein the first target data comprises the original point cloud and/or the original image passing through the first quality evaluation index and the second quality evaluation index;

the second acquisition module is used for acquiring a three-dimensional target detection result according to the first target data and a corresponding target detection strategy;

wherein the first determining module includes:

a sixth obtaining sub-module, configured to project the original point cloud to an image plane of the camera, so as to obtain a projection image; the processing sub-module is used for calculating a correlation between gradient information of the projection image and image edge information of the original image, and taking the correlation as the first quality evaluation index corresponding to the original point cloud; or determining the Rényi quadratic entropy of the projection image, and taking the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud;

and wherein the second acquisition module includes:

the first acquisition sub-module is used for acquiring a two-dimensional instance segmentation result according to the original image under the condition that the original image is included in the first target data;

and the second acquisition sub-module is used for acquiring a three-dimensional target detection result according to the two-dimensional instance segmentation result.
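The first quality evaluation index named in claim 7 can be illustrated with a short Python sketch that projects a point cloud onto the camera image plane and then computes either an edge correlation or a Rényi quadratic entropy. This is a minimal sketch under stated assumptions, not the claimed implementation: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the depth-valued projection image, the central-difference gradient, the Pearson correlation, and the histogram bin count are all illustrative choices that the claims do not specify, and the points are assumed to be already expressed in the camera frame.

```python
import math


def project_to_image(points, fx, fy, cx, cy, width, height):
    """Project camera-frame points (x, y, z) to a depth image; 0 marks empty pixels."""
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # point is behind the camera
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # keep the nearest return for each pixel
            if image[v][u] == 0.0 or z < image[v][u]:
                image[v][u] = z
    return image


def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            gx = (image[v][u + 1] - image[v][u - 1]) / 2.0
            gy = (image[v + 1][u] - image[v - 1][u]) / 2.0
            grad[v][u] = math.hypot(gx, gy)
    return grad


def correlation(a, b):
    """Pearson correlation between two equally sized 2D maps (assumed measure)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def renyi_quadratic_entropy(image, bins=16):
    """H2 = -log(sum_i p_i^2) over a histogram of the non-empty pixel values."""
    values = [p for row in image for p in row if p > 0]
    if not values:
        return 0.0
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for val in values:
        counts[min(int((val - lo) / span * bins), bins - 1)] += 1
    total = len(values)
    return -math.log(sum((c / total) ** 2 for c in counts))
```

With these helpers, the correlation variant of the index could be computed as `correlation(gradient_magnitude(projection), gradient_magnitude(gray_image))`, where `gray_image` is a hypothetical grayscale copy of the original image with the same size as the projection image.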

8. The apparatus of claim 7, wherein the second acquisition sub-module comprises:

a first obtaining unit, configured to determine whether the first target data includes the original point cloud, so as to obtain a determination result;

9. The apparatus according to claim 8, wherein the second acquisition unit is specifically configured to:

taking the original point cloud as a target point cloud when the determination result indicates that the first target data includes the original point cloud; and, when the determination result indicates that the first target data does not include the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.
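The choice of target point cloud in claim 9 can be sketched as follows. The claim does not say how the pseudo point cloud is derived from the original image, so this sketch assumes that a per-pixel depth map `depth` has already been estimated from the image by some depth estimator (not shown) and that the calibration parameters reduce to pinhole intrinsics `fx`, `fy`, `cx`, `cy`. The function names are hypothetical.

```python
def depth_to_pseudo_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # no valid depth estimate at this pixel
                continue
            # invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def select_target_point_cloud(lidar_points, depth, fx, fy, cx, cy):
    """Return the LiDAR cloud if it passed quality evaluation, else a pseudo cloud.

    `lidar_points` is None when the original point cloud is not part of the
    first target data (an assumed convention for this sketch).
    """
    if lidar_points is not None:
        return lidar_points
    return depth_to_pseudo_point_cloud(depth, fx, fy, cx, cy)
```

Passing `None` for `lidar_points` models the case in which the original point cloud failed quality evaluation, so the downstream steps still receive a target point cloud.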

10. The apparatus of claim 8, wherein the third acquisition unit comprises:

a generation subunit, configured to generate a first target view according to the target point cloud; wherein the first target view comprises at least one of a first bird's eye view and a first front view;

the second obtaining subunit is used for extracting features according to the two-dimensional instance segmentation result so as to obtain a second extraction result;

a third obtaining subunit, configured to input the first extraction result and the second extraction result into a three-dimensional area proposal network, so as to obtain a proposal result output by the three-dimensional area proposal network;

and a fourth acquisition subunit for inputting the obtained proposal result into a three-dimensional target detector to obtain a three-dimensional target detection result output by the three-dimensional target detector.
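The first target view of claim 10 may include a bird's eye view generated from the target point cloud. The following Python sketch shows one common way to build such a view, a point-density grid over the ground plane. The axis convention (x and y spanning the ground plane), the ranges, the cell size, and the use of point counts as the cell value are assumptions made for illustration; the claims do not fix any of them.

```python
def bird_eye_view(points, x_range, y_range, cell):
    """Rasterize (x, y, z) points into a top-down grid of per-cell point counts."""
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        # skip points outside the region of interest
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        col = min(int((x - x_range[0]) / cell), cols - 1)
        row = min(int((y - y_range[0]) / cell), rows - 1)
        grid[row][col] += 1
    return grid
```

A grid like this could then be passed to a feature extractor, whose output would play the role of an extraction result fed to the three-dimensional area proposal network.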

11. The apparatus of claim 7, wherein

the number of cameras is one, and the first acquisition sub-module is specifically configured to:

or,

the number of cameras is two, and the first acquisition sub-module includes:

a selecting unit, configured to select, from the first target data, the original image acquired by one of the two cameras, in a case where the first target data includes the original images acquired by the two cameras, respectively;

and a fourth acquisition unit, configured to perform instance segmentation on the selected original image, so as to obtain a two-dimensional instance segmentation result.

12. The apparatus of claim 7, wherein the second acquisition module comprises:

a generating sub-module, configured to generate a second target view according to the original point cloud when the first target data includes the original point cloud and the first target data does not include the original image; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

the third obtaining sub-module is used for extracting features according to the second target view so as to obtain a third extraction result;

a fourth obtaining sub-module, configured to input the third extraction result into a three-dimensional area proposal network, so as to obtain a proposal result output by the three-dimensional area proposal network;

and a fifth acquisition sub-module, configured to input the obtained proposal result into a three-dimensional target detector, so as to obtain a three-dimensional target detection result output by the three-dimensional target detector.
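The second target view of claim 12 may include a front view generated from the original point cloud alone. One possible construction, sketched below in Python, is a spherical range image in which each pixel stores the distance of the nearest point along that viewing direction. The field-of-view limits, the image size, and the axis convention (x forward, y left, z up) are assumptions for illustration and are not taken from the claims.

```python
import math


def front_view(points, h_fov, v_fov, width, height):
    """Build a range image from (x, y, z) points; 0 marks pixels with no return.

    h_fov and v_fov are (min, max) azimuth and elevation limits in radians.
    """
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue
        azimuth = math.atan2(y, x)
        elevation = math.asin(z / rng)
        if not (h_fov[0] <= azimuth < h_fov[1] and v_fov[0] <= elevation < v_fov[1]):
            continue
        col = min(int((azimuth - h_fov[0]) / (h_fov[1] - h_fov[0]) * width), width - 1)
        # row 0 corresponds to the top of the vertical field of view
        row = min(int((v_fov[1] - elevation) / (v_fov[1] - v_fov[0]) * height), height - 1)
        if image[row][col] == 0.0 or rng < image[row][col]:
            image[row][col] = rng  # keep the nearest return per pixel
    return image
```

In the point-cloud-only branch, a view of this kind would be the input to feature extraction, and the resulting third extraction result would go to the three-dimensional area proposal network.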

13. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the three-dimensional target detection method according to any one of claims 1 to 6.

14. A computer-readable storage medium storing a computer program for executing the three-dimensional target detection method according to any one of claims 1 to 6.