CN111179329A

CN111179329A - Three-dimensional target detection method and device and electronic equipment

Info

Publication number: CN111179329A
Application number: CN201911410518.1A
Authority: CN
Inventors: 黄浴
Original assignee: Zhiche Youxing Technology Shanghai Co ltd
Current assignee: Zhiche Youxing Technology Shanghai Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-19
Anticipated expiration: 2039-12-31
Also published as: CN111179329B

Abstract

The embodiment of the disclosure discloses a three-dimensional target detection method and device and electronic equipment. The method comprises the following steps: acquiring an original point cloud acquired by a laser radar and acquiring an original image acquired by a camera; determining a first quality evaluation index corresponding to the original point cloud and determining a second quality evaluation index corresponding to the original image; determining first target data which passes the quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index; and obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection strategy. Compared with the situation that the three-dimensional target detection is directly carried out according to the image collected by the camera in the prior art, the embodiment of the disclosure can effectively ensure the reliability of the detection result.

Description

Three-dimensional target detection method and device and electronic equipment

Technical Field

The present disclosure relates to the field of target detection technologies, and in particular, to a three-dimensional target detection method and apparatus, and an electronic device.

Background

For an automatic driving system, three-dimensional target detection is a very important link. At present, three-dimensional target detection is generally performed using only the image acquired by a camera, so that once the quality of the image acquired by the camera is poor, the reliability of the detection result is very poor.

Disclosure of Invention

The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a three-dimensional target detection method and device and electronic equipment.

According to an aspect of the embodiments of the present disclosure, there is provided a three-dimensional target detection method, including:

acquiring an original point cloud acquired by a laser radar and acquiring an original image acquired by a camera;

determining a first quality evaluation index corresponding to the original point cloud, and determining a second quality evaluation index corresponding to the original image;

determining first target data which passes quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index;

and obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

According to another aspect of the embodiments of the present disclosure, there is provided a three-dimensional object detecting apparatus including:

the first acquisition module is used for acquiring original point cloud acquired by the laser radar and acquiring an original image acquired by the camera;

the first determining module is used for determining a first quality evaluation index corresponding to the original point cloud and determining a second quality evaluation index corresponding to the original image;

the second determination module is used for determining first target data which passes quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index;

and the second acquisition module is used for acquiring a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

and the processor is used for reading the executable instruction from the memory and executing the instruction to realize the three-dimensional target detection method.

According to still another aspect of an embodiment of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described three-dimensional object detection method.

In the embodiment of the disclosure, after a first quality evaluation index is determined for an original point cloud acquired by a laser radar and a second quality evaluation index is determined for an original image acquired by a camera, first target data which passes quality evaluation in the original point cloud and the original image can be determined according to the first quality evaluation index and the second quality evaluation index; and then, according to the first target data, obtaining a three-dimensional target detection result by using a corresponding target detection strategy. It can be seen that, a multi-sensor system is used in the embodiment of the present disclosure, the multi-sensor system may include a laser radar and a camera at the same time, and when detecting a three-dimensional target, the first target data that passes the quality evaluation is based on the original point cloud collected by the laser radar and the original image collected by the camera, that is, as long as data collected by at least one of the laser radar and the camera is reliable, a reliable three-dimensional target detection result may be obtained, and therefore, compared with a case where detecting the three-dimensional target directly according to the image collected by the camera in the prior art, the embodiment of the present disclosure may effectively ensure the reliability of the detection result. The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart of a three-dimensional target detection method according to an exemplary embodiment of the present disclosure;

FIG. 2 is a block diagram of a three-dimensional target detection system in an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of three-dimensional target detection in an exemplary embodiment of the present disclosure;

FIG. 4 is a block diagram of a three-dimensional target detection apparatus according to an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram of a three-dimensional target detection apparatus according to another exemplary embodiment of the present disclosure;

FIG. 6 is a block diagram of a three-dimensional target detection apparatus according to still another exemplary embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.

Detailed Description

The described embodiments are only a few embodiments of the present disclosure and not all embodiments of the present disclosure, and it should be understood that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those within the art that the terms "first", "second", etc. in the embodiments of the present disclosure are used merely to distinguish one step, device or module from another, and do not denote any particular technical meaning or necessary logical order; "plurality" may mean two or more, and "at least one" may mean one, two or more.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects, and means that three kinds of relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In the present disclosure, the character "/" indicates that the former and latter associated objects are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity. It should be understood that the dimensions of the various features shown in the drawings are not drawn to scale for ease of illustration.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, and the like may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

In addition, the method can also be applied to other technical fields such as security monitoring, robot navigation and intelligent traffic management systems.

Exemplary method

Fig. 1 is a schematic flow chart of a three-dimensional target detection method according to an exemplary embodiment of the present disclosure. The method shown in fig. 1 may include step 101, step 102, step 103 and step 104, which are described separately below.

Step 101, acquiring an original point cloud acquired by a laser radar and acquiring an original image acquired by a camera.

Here, the raw point cloud collected by the lidar and the raw image collected by the camera may be time synchronized. If the laser radar continuously collects the original point clouds to obtain a point cloud sequence consisting of a plurality of original point clouds and the camera continuously collects the original images to obtain an image sequence consisting of a plurality of original images, the frame rates of the point cloud sequence and the image sequence may be the same.

Here, the number of cameras may be one; alternatively, the number of cameras may be at least two. Specifically, as can be seen from the three-dimensional target detection system shown in fig. 2, the number of the cameras may be two, the two cameras are the camera 1 and the camera 2 respectively, and the camera 1 and the camera 2 may constitute a binocular camera, in which case, the original image related to the step 101 may include the original image collected by the camera 1 and the original image collected by the camera 2 at the same time.

Step 102, determining a first quality evaluation index corresponding to the original point cloud, and determining a second quality evaluation index corresponding to the original image.

Here, the first quality evaluation index may be used to evaluate the quality of the original point cloud. The first quality evaluation index may be of various types, which, for clarity of layout, are described later by way of example.

Here, the second quality evaluation index may be used to evaluate the quality of the original image, and the second quality evaluation index may be a conventional measure used in image processing and video frame acquisition, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and the like.
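By way of illustration only, PSNR may be computed as in the following minimal sketch. PSNR is a full-reference measure, so the sketch assumes that a reference image of the same size is available; the disclosure does not state where such a reference comes from, and the function name `psnr` and the list-of-rows grayscale image layout are hypothetical choices made for this example.

```python
import math


def psnr(reference, image, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities (an assumed layout).
    """
    if len(reference) != len(image) or any(
        len(ref_row) != len(img_row) for ref_row, img_row in zip(reference, image)
    ):
        raise ValueError("images must have the same shape")
    pixel_count = sum(len(row) for row in reference)
    squared_error = sum(
        (a - b) ** 2
        for ref_row, img_row in zip(reference, image)
        for a, b in zip(ref_row, img_row)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the image is closer to the reference; how the value is turned into a pass or fail decision is left open here.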

Step 103, determining first target data which passes the quality evaluation in the original point cloud and the original image according to the first quality evaluation index and the second quality evaluation index.

The first quality evaluation index is used for evaluating the quality of the original point cloud, the second quality evaluation index is used for evaluating the quality of the original image, and whether the quality evaluation of the original point cloud and the quality evaluation of the original image pass or not can be respectively determined according to the first quality evaluation index and the second quality evaluation index, so that corresponding first target data can be obtained according to the determination result. Several possible scenarios of the data composition of the first target data are exemplified below in connection with fig. 2.

In the first case, the quality evaluation of the original point cloud collected by the laser radar is passed, the quality evaluation of the original image collected by the camera 1 and the quality evaluation of the original image collected by the camera 2 are both passed, and at this time, the first target data may include the original point cloud, the original image collected by the camera 1, and the original image collected by the camera 2 at the same time.

In the second case, the quality evaluation of the original point cloud collected by the laser radar is passed, and the quality evaluation of the original image collected by the camera 1 and the quality evaluation of the original image collected by the camera 2 are not passed, at this time, the first target data may only include the original point cloud.

In the third case, the quality evaluation of the original point cloud acquired by the lidar does not pass, and the quality evaluation of at least one of the original image acquired by the camera 1 and the original image acquired by the camera 2 passes, at this time, the first target data may only include the original image whose quality evaluation passes in the original image acquired by the camera 1 and the original image acquired by the camera 2.

In the fourth case, the quality evaluation of the original point cloud acquired by the lidar is passed, and the quality evaluation of only one of the original image acquired by the camera 1 and the original image acquired by the camera 2 is passed, and at this time, the first target data may only include the original point cloud and whichever of the original image acquired by the camera 1 and the original image acquired by the camera 2 has passed the quality evaluation.
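The cases above may be sketched as follows. This is a minimal illustration under stated assumptions: the disclosure does not specify how a quality evaluation index is turned into a pass or fail decision, so the sketch assumes that a larger index is better and that each index is compared with a threshold. The names `FirstTargetData`, `select_first_target_data`, `point_cloud_threshold` and `image_threshold` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FirstTargetData:
    """Sensor data that passed the quality evaluation (hypothetical container)."""
    point_cloud: Optional[Any] = None
    images: Dict[str, Any] = field(default_factory=dict)


def select_first_target_data(point_cloud, point_cloud_index, images, image_indices,
                             point_cloud_threshold, image_threshold):
    """Keep only the data whose quality index reaches its threshold.

    Assumption: larger index means better quality, and passing means
    index >= threshold.
    """
    selected = FirstTargetData()
    if point_cloud_index >= point_cloud_threshold:
        selected.point_cloud = point_cloud
    for name, image in images.items():
        if image_indices[name] >= image_threshold:
            selected.images[name] = image
    return selected
```

For example, with a passing point cloud and only the image of camera 1 passing, the returned data contains the point cloud and the image of camera 1, which corresponds to the fourth case.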

Step 104, obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

Three dimensions may also be referred to as 3D, and two dimensions may also be referred to as 2D.

Here, in the case where only the original point cloud is included in the first target data, a three-dimensional target detection result may be obtained according to the first target data by a target detection strategy based on the point cloud; in the case that the first target data only includes at least one of the original image collected by the camera 1 and the original image collected by the camera 2, a three-dimensional target detection result may be obtained according to the first target data by a target detection strategy based on the image; in the case that the first target data includes the original point cloud and at least one of the original image acquired by the camera 1 and the original image acquired by the camera 2, a three-dimensional target detection result may be obtained according to the first target data by a target detection strategy based on the fusion of the point cloud and the image.
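The choice among the three strategies may be sketched as a simple dispatch. The function name `choose_detection_strategy` and the string labels are hypothetical and serve only to illustrate the selection logic described above.

```python
def choose_detection_strategy(has_point_cloud: bool, num_images: int) -> str:
    """Return a label for the detection strategy matching the available data."""
    if has_point_cloud and num_images > 0:
        return "fusion"        # point cloud and at least one image passed
    if has_point_cloud:
        return "point_cloud"   # only the point cloud passed
    if num_images > 0:
        return "image"         # only one or more images passed
    # The disclosure does not describe this situation; raising is an assumption.
    raise ValueError("no data passed the quality evaluation")
```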

Specifically, the three-dimensional target detection result may include a type of the target, a 3D frame center position of the target, a size (e.g., length, width, height) of the target, and the like.

In the embodiment of the disclosure, after a first quality evaluation index is determined for an original point cloud acquired by a laser radar and a second quality evaluation index is determined for an original image acquired by a camera, first target data which passes quality evaluation in the original point cloud and the original image can be determined according to the first quality evaluation index and the second quality evaluation index; and then, according to the first target data, obtaining a three-dimensional target detection result by using a corresponding target detection strategy. It can be seen that, a multi-sensor system is used in the embodiment of the present disclosure, the multi-sensor system may include a laser radar and a camera at the same time, and when detecting a three-dimensional target, the first target data that passes the quality evaluation is based on the original point cloud collected by the laser radar and the original image collected by the camera, that is, as long as data collected by at least one of the laser radar and the camera is reliable, a reliable three-dimensional target detection result may be obtained, and therefore, compared with a case where detecting the three-dimensional target directly according to the image collected by the camera in the prior art, the embodiment of the present disclosure may effectively ensure the reliability of the detection result.

In an optional example, obtaining a three-dimensional target detection result according to the first target data and a corresponding target detection strategy includes:

under the condition that the first target data comprises an original image, obtaining a two-dimensional instance segmentation result according to the original image;

and obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result.

Here, a two-dimensional instance segmentation result may be obtained from the original image, and specific embodiments of obtaining the two-dimensional instance segmentation result are described below by way of example.

In a specific embodiment, the number of the cameras is one, and obtaining a two-dimensional instance segmentation result according to an original image includes:

performing instance segmentation on the original image in the first target data to obtain a two-dimensional instance segmentation result.

Here, the three-dimensional target detection system may include a two-dimensional target detection module and a two-dimensional segmentation module, and a combination of the two-dimensional target detection module and the two-dimensional segmentation module may be regarded as an instance segmentation module. In general, the instance segmentation model employed by the instance segmentation module may be Mask R-CNN (which is an instance segmentation algorithm), Sequential Grouping Networks (SGN), and the like.

Assuming that the first target data only includes the original image acquired by the camera 1 in fig. 2, as shown in fig. 2, the two-dimensional target detection module may be used to perform 2D detection on the original image acquired by the camera 1, the two-dimensional segmentation module may be used to perform 2D segmentation on the original image acquired by the camera 1, and the detection result obtained by the 2D detection and the segmentation result obtained by the 2D segmentation may together form a two-dimensional instance segmentation result; the detection result obtained by the 2D detection may include the type of the target, the 2D frame position of the target, the size (e.g., length and width) of the target, and the like.

In this embodiment, the two-dimensional instance segmentation result can be obtained very conveniently using the original image in the first target data.

In another specific embodiment, the number of the cameras is two, and obtaining a two-dimensional instance segmentation result according to the original image includes:

under the condition that the first target data comprises original images respectively collected by the two cameras, selecting the original image collected by one of the two cameras from the first target data;

and performing instance segmentation on the selected original image to obtain a two-dimensional instance segmentation result.

Assuming that the original image captured by the camera 1 and the original image captured by the camera 2 in fig. 2 are included in the first target data, the original image captured by one of the cameras 1 and 2 may be selected from the first target data.

Specifically, an original image with the best image quality may be selected from the original image collected by the camera 1 and the original image collected by the camera 2 according to the second quality assessment index corresponding to the original image collected by the camera 1 and the second quality assessment index corresponding to the original image collected by the camera 2. Alternatively, the original image captured by one camera may be arbitrarily selected from the original image captured by the camera 1 and the original image captured by the camera 2.
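The quality-based selection may be sketched as follows, assuming that a larger second quality evaluation index means better image quality (as is the case for PSNR and SSIM). The function name `select_best_image` and the dictionary layout keyed by camera name are hypothetical.

```python
def select_best_image(images, quality_indices):
    """Return (camera name, image) for the image with the largest quality index.

    `images` and `quality_indices` are dictionaries keyed by camera name
    (an assumed layout); larger index is assumed to mean better quality.
    """
    best_name = max(images, key=lambda name: quality_indices[name])
    return best_name, images[best_name]
```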

Then, similar to the previous embodiment, the two-dimensional target detection module and the two-dimensional segmentation module may be utilized to perform 2D detection and 2D segmentation on the selected original image, respectively, so as to obtain a two-dimensional instance segmentation result.

In this embodiment, the two-dimensional instance segmentation result can also be obtained very conveniently using the original image in the first target data.

No matter which implementation is adopted to obtain the two-dimensional instance segmentation result, after the two-dimensional instance segmentation result is obtained, the three-dimensional target detection result can be obtained according to the two-dimensional instance segmentation result. It can be seen that, in the embodiment of the present disclosure, in the case that the original image is included in the first target data, information in the original image may be used in the three-dimensional target detection process to ensure reliability of the detection result.

In an alternative example, obtaining a three-dimensional target detection result according to a two-dimensional instance segmentation result includes:

determining whether the original point cloud is included in the first target data to obtain a determination result;

obtaining a target point cloud according to the determination result;

and obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result and the target point cloud.

Here, the data in the first target data may be traversed to obtain a determination result for characterizing whether the original point cloud is included in the first target data. Next, a target point cloud may be obtained according to the determination result.

In one embodiment, obtaining the target point cloud according to the determination result includes:

taking the original point cloud as a target point cloud under the condition that the determined result represents that the first target data comprises the original point cloud;

and under the condition that the determined result represents that the first target data does not comprise the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.

Here, the calibration parameters of the camera may include internal parameters and external parameters; the internal parameters may include the camera focal length, the principal point offset and the distortion coefficients; the external parameters may include a pose of the camera relative to a world coordinate system, including, for example, a rotation matrix and a translation matrix of the camera relative to the world coordinate system.

Here, the three-dimensional object detection system may include a depth estimation module in addition to the two-dimensional object detection module and the two-dimensional segmentation module.

In the case that the determination result indicates that the first target data includes the original point cloud, the original point cloud in the first target data may be directly used as the target point cloud.

In the case that the determination result indicates that the first target data does not include the original point cloud, as shown in fig. 2, the depth estimation module and the original image in the first target data may be utilized to perform depth estimation to obtain a corresponding depth map. Specifically, in the case that the first target data includes an original image acquired by the camera 1 and an original image acquired by the camera 2, the depth estimation model adopted by the depth estimation module may be a binocular stereo depth estimation model, such as a Pyramid Stereo Matching Network (PSMNet), a Group-wise Correlation Stereo Network (GwcNet), and the like; in the case where only one of the original image acquired by the camera 1 and the original image acquired by the camera 2 is included in the first target data, the depth estimation model adopted by the depth estimation module may be a monocular depth estimation model, such as GeoNet (which is an unsupervised learning framework for monocular depth, optical flow, and camera motion estimation in video), a Structure-Aware Residual Pyramid Network (SARPN), and the like.

After the depth map is obtained, inverse perspective transformation can be performed on the depth map according to the calibration parameters of the camera to obtain a pseudo point cloud, and then the pseudo point cloud is used as the target point cloud. It should be noted that the calibration parameters used for the inverse perspective transformation may be the calibration parameters of the camera corresponding to the original image on which the instance segmentation is performed.
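The inverse perspective transformation may be illustrated by the following minimal sketch, which back-projects each pixel of a depth map through a pinhole camera model. The sketch makes simplifying assumptions that are not taken from the disclosure: lens distortion is ignored, the external parameters are not applied (the points stay in the camera coordinate system), and pixels with non-positive depth are treated as invalid. The names `depth_map_to_pseudo_point_cloud`, `fx`, `fy`, `cx` and `cy` are hypothetical.

```python
def depth_map_to_pseudo_point_cloud(depth_map, fx, fy, cx, cy):
    """Back-project a depth map into (x, y, z) points in the camera frame.

    depth_map: list of rows of depth values, indexed as depth_map[v][u].
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if depth <= 0:
                continue  # assumed convention: non-positive depth is invalid
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            points.append((x, y, depth))
    return points
```

To express the pseudo point cloud in the world or lidar coordinate system, the rotation and translation from the external parameters would additionally be applied to each point.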

In this embodiment, the target point cloud can be determined very conveniently regardless of whether the original point cloud is included in the first target data. After the target point cloud is obtained, a three-dimensional target detection result can be obtained according to the two-dimensional instance segmentation result and the target point cloud.

In the embodiment of the disclosure, the three-dimensional target detection result can be obtained by combining the two-dimensional instance segmentation result of the original image acquired by the camera and the target point cloud, so that the reliability of the three-dimensional target detection result can be better ensured.

In an alternative example, obtaining a three-dimensional target detection result according to a two-dimensional instance segmentation result and a target point cloud includes:

generating a first target view according to the target point cloud; wherein the first target view comprises at least one of a first bird's eye view and a first front view;

according to the first target view, feature extraction is carried out to obtain a first extraction result;

according to the two-dimensional instance segmentation result, performing feature extraction to obtain a second extraction result;

inputting the first extraction result and the second extraction result into a three-dimensional area proposal network to obtain a proposal result output by the three-dimensional area proposal network;

and inputting the obtained proposal result into a three-dimensional target detector to obtain a three-dimensional target detection result output by the three-dimensional target detector.

Here, the three-dimensional target detection system may further include a three-dimensional area proposal network (i.e., a three-dimensional region proposal network, RPN), which may be referred to as a 3D RPN for short, and a three-dimensional target detector.

In the embodiments of the present disclosure, after obtaining the target point cloud, the target point cloud may be projected in a vertical direction to obtain a first bird's-eye view, and the target point cloud may also be projected in a forward camera direction to obtain a first front view, by which a first target view including at least one of the first bird's-eye view and the first front view can be obtained.
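The projection in the vertical direction may be sketched as a rasterization of the point cloud onto a ground-plane grid. The sketch below assumes a coordinate system with x pointing forward, y pointing sideways and z pointing up, and stores the maximum height per cell; the disclosure does not specify the encoding of the bird's-eye view, so these choices and the names `bird_eye_view`, `x_range`, `y_range` and `cell_size` are hypothetical.

```python
def bird_eye_view(points, x_range, y_range, cell_size):
    """Rasterize (x, y, z) points into a grid holding the maximum z per cell.

    Cells that receive no point hold None.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int((x_max - x_min) / cell_size)
    cols = int((y_max - y_min) / cell_size)
    grid = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # point lies outside the region covered by the view
        # Clamp to guard against floating-point rounding at the upper border.
        r = min(int((x - x_min) / cell_size), rows - 1)
        c = min(int((y - y_min) / cell_size), cols - 1)
        if grid[r][c] is None or z > grid[r][c]:
            grid[r][c] = z
    return grid
```

A front view may be produced analogously by rasterizing along the forward direction instead of the vertical direction.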

Assuming that the first target view includes the first bird's-eye view and the first front view at the same time, as shown in fig. 3, two feature extraction encoders may be used to perform feature extraction on the two views of the first target view respectively to obtain two first extraction results. In addition, as shown in fig. 3, a further feature extraction encoder may be used to perform feature extraction on the two-dimensional instance segmentation result to obtain a second extraction result. Optionally, each feature extraction encoder may extract features using a residual network or a dense network. In particular, the residual network may also be referred to as ResNet and the dense network may also be referred to as DenseNet.

Next, both the first extraction result and the second extraction result may be input into a three-dimensional area proposal network for 3D proposal generation, resulting in several proposals, each of which may include a possible target location, size, orientation, etc.

The three-dimensional area proposal network can sort the proposals and output proposal results comprising the first N proposals in the sort; wherein, N can be 20, 25, 30, etc.
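The sorting and truncation may be sketched as follows. The disclosure does not state the criterion by which proposals are sorted; the sketch assumes that each proposal carries a confidence score under a hypothetical `"score"` key, and the function name `top_n_proposals` is likewise hypothetical.

```python
def top_n_proposals(proposals, n=20):
    """Return the n proposals with the highest score, best first.

    Each proposal is a dictionary with a "score" entry (an assumed layout).
    """
    return sorted(proposals, key=lambda p: p["score"], reverse=True)[:n]
```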

The proposal result output by the three-dimensional area proposal network can be input to the three-dimensional target detector, as shown in fig. 2 and 3, and the three-dimensional target detector can perform 3D target detection accordingly to obtain a three-dimensional target detection result. Here, the three-dimensional area proposal network and the three-dimensional target detector may each be composed of fully connected layers (i.e., FC layers), followed by a non-maximum suppression (NMS) module.
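Non-maximum suppression may be illustrated by the following greedy sketch. For simplicity it operates on axis-aligned boxes in the bird's-eye view given as (x1, y1, x2, y2), whereas boxes in a real detector are generally oriented three-dimensional boxes; the function names `iou` and `non_maximum_suppression` and the default threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```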

Therefore, in the embodiment of the disclosure, the three-dimensional target detection result can be conveniently and reliably obtained by feature extraction and the combination of the three-dimensional area proposal network and the three-dimensional target detector.

under the condition that the first target data comprise the original point cloud and the first target data do not comprise the original image, generating a second target view according to the original point cloud; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

according to the second target view, feature extraction is carried out to obtain a third extraction result;

inputting the third extraction result into a three-dimensional area proposal network to obtain a proposal result output by the three-dimensional area proposal network;

Here, in the case where only the original point cloud is included in the first target data, the original point cloud may be projected in the vertical direction to obtain the second bird's-eye view, and the original point cloud may also be projected in the forward camera direction to obtain the second front view; in this way, the second target view including at least one of the second bird's-eye view and the second front view can be obtained.

Assuming that the second target view includes both the second bird's-eye view and the second front view, as shown in fig. 3, two feature extraction encoders may be used to perform feature extraction on the two views of the second target view respectively, to obtain two third extraction results. Next, the two third extraction results may be input into the three-dimensional area proposal network. For the subsequent steps, refer to the above description of the process after the two first extraction results and the second extraction result are input into the three-dimensional area proposal network; the details are not repeated here.

In one optional example, determining a first quality assessment indicator corresponding to the original point cloud comprises:

projecting the original point cloud to an image plane of a camera to obtain a projected image;

calculating the correlation degree between the gradient information of the projected image and the image edge information of the original image, and taking the correlation degree as the first quality evaluation index corresponding to the original point cloud; or determining the Rényi quadratic entropy of the projected image, and taking the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud.

Taking the case that the three-dimensional target detection system only includes one camera as an example, after the original point cloud acquired by the laser radar and the original image acquired by the camera are acquired, the original point cloud can be projected to the image plane of the camera to obtain a projected image, and at this time, the coordinate systems of the laser radar and the camera can be considered to be calibrated. Next, a first quality evaluation index corresponding to the original point cloud is determined according to the projection image, and a specific embodiment of determining the first quality evaluation index is described below by way of example.
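Projecting the point cloud onto the camera image plane can be sketched with a pinhole model. The intrinsic matrix `K`, the lidar-to-camera rotation `R` and translation `t`, and the function name `project_to_image` are assumptions for illustration; the text only states that the lidar and camera coordinate systems are taken to be calibrated.

```python
import numpy as np

def project_to_image(points, K, R, t, width, height):
    """Project lidar points (N, 3) into a depth image of shape (height, width)."""
    points = np.asarray(points, dtype=float)
    cam = points @ R.T + t                 # lidar frame -> camera frame
    cam = cam[cam[:, 2] > 0]               # keep points in front of the camera
    uvw = cam @ K.T                        # apply the camera intrinsics
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    projected = np.zeros((height, width), dtype=np.float32)
    # Each valid pixel stores the depth of a point that lands on it.
    projected[v[inside], u[inside]] = cam[inside, 2]
    return projected
```

The resulting array plays the role of the projected image from which the first quality evaluation index is then computed.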

In one embodiment, gradient information of the projection image and image edge information of the original image may be calculated, and a correlation of the gradient information and the image edge information may be calculated. Specifically, the formula used for calculating the correlation may be:
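The formula itself did not survive in this text. One form that is consistent with the symbol definitions that follow is given here as an assumption, not as the original formula; the symbol n (the index of the most recent image) is introduced only for this illustration:

```latex
J_c = \sum_{f = n - w}^{n} \; \sum_{p = 1}^{|X^{f}|} X^{f}_{p} \cdot D^{f}_{i,j}
```

Under this reading, each projected point p of image f contributes the product of its point cloud value and the image gradient value at the pixel (i, j) onto which it projects.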

where Jc is the degree of correlation, w is the video window size, f is the image, (i, j) is the pixel location in the image, p is a 3-D point of the point cloud, X is the point cloud data acquired by the lidar, and D is the image gradient map (which is used to characterize the gradient information).

After the correlation degree is calculated, the correlation degree can be used as a first quality evaluation index corresponding to the original point cloud. Here, a correlation threshold may be preset, and if the determined correlation is greater than the correlation threshold, the quality evaluation of the original point cloud may be considered to pass; otherwise, the quality assessment of the original point cloud may be deemed to fail.
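The correlation-then-threshold check can be sketched as follows. This is a simplified single-image variant under stated assumptions: the gradient magnitude from `np.gradient` stands in for both the gradient information and the image edge information, and the function names and the threshold value are hypothetical.

```python
import numpy as np

def gradient_magnitude(image):
    """Per-pixel gradient magnitude from central differences."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(gx, gy)

def correlation(projected, image):
    """Sum of products of the two gradient maps; larger means better alignment."""
    return float(np.sum(gradient_magnitude(projected) * gradient_magnitude(image)))

def point_cloud_passes(projected, image, threshold):
    """Quality evaluation passes only if the correlation exceeds the preset threshold."""
    return correlation(projected, image) > threshold
```

A projected image whose depth discontinuities coincide with edges in the camera image yields a large value, while a featureless projection yields zero and fails the check.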

It is easy to see that this implementation determines the first quality evaluation index conveniently, and therefore also makes the quality evaluation of the original point cloud convenient.

In another embodiment, the Rényi quadratic entropy of the projected image may be calculated. Specifically, the formula adopted for calculating the Rényi quadratic entropy may be:
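The formula itself did not survive in this text. A commonly used kernel-based form that matches the description of G(a, b) in the following paragraph is given here as an assumption, not as the original formula; N (the number of points), x_i (the i-th point), sigma (the kernel width), and I (the identity matrix) are symbols introduced only for this illustration:

```latex
\mathrm{RQE} = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} G\!\left(x_{i} - x_{j},\; 2\sigma^{2} I\right)
```

Under this reading, the value grows as the points cluster more tightly, which agrees with treating a larger value as a pass in the threshold test.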

where RQE is the Rényi quadratic entropy and G(a, b) is a Gaussian distribution function with mean a and variance b. RQE is a density measure of the point cloud distribution defined in the form of a Gaussian Mixture Model (GMM), and can be used here as the quality measure.

After the Rényi quadratic entropy is calculated, it may be used as the first quality evaluation index corresponding to the original point cloud. Specifically, a threshold for the Rényi quadratic entropy may be preset; if the calculated Rényi quadratic entropy is greater than this threshold, the quality evaluation of the original point cloud may be considered to pass; otherwise, the quality evaluation of the original point cloud may be deemed to fail.

It is easy to see that this implementation also determines the first quality evaluation index conveniently, and therefore also makes the quality evaluation of the original point cloud convenient.

In the embodiment of the disclosure, either the correlation degree or the Rényi quadratic entropy can be calculated conveniently from the projected image obtained by projecting the original point cloud onto the image plane of the camera, so that the first quality evaluation index is determined accordingly and the quality evaluation of the original point cloud is realized.
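A kernel-based density measure of this kind, followed by the threshold test, can be sketched as below. The pairwise Gaussian form, the kernel width `sigma`, and the function names are assumptions for illustration; the pairwise computation needs memory proportional to the square of the number of points, so it suits small or subsampled point sets only.

```python
import numpy as np

def rqe_measure(points, sigma=0.5):
    """Mean of Gaussian kernels G(x_i - x_j, 2*sigma^2*I) over all point pairs."""
    pts = np.asarray(points, dtype=float)
    d = pts.shape[1]
    diff = pts[:, None, :] - pts[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)   # Gaussian normalisation constant
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * var))))

def point_cloud_passes_rqe(points, threshold, sigma=0.5):
    """Quality evaluation passes only if the measure exceeds the preset threshold."""
    return rqe_measure(points, sigma) > threshold
```

Tightly clustered points give a larger value than widely scattered ones, which is why a value above the threshold is treated as a pass.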

It should be noted that the quality evaluation of the original image may be performed in a manner similar to that of the original point cloud. Specifically, a PSNR threshold may be preset, and the PSNR may be calculated for the original image, and when the calculated PSNR is greater than the PSNR threshold, it may be considered that the quality evaluation of the original image passes; otherwise, the quality assessment of the original image may be deemed to fail.
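The PSNR check can be sketched as below. PSNR is defined relative to a reference image, and the text does not say which reference is used, so the `reference` argument, the 8-bit peak value, and the function names are assumptions for illustration.

```python
import numpy as np

def psnr(image, reference, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between an image and a reference."""
    diff = np.asarray(image, dtype=float) - np.asarray(reference, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

def image_passes(image, reference, threshold_db):
    """Quality evaluation passes only if the PSNR exceeds the preset threshold."""
    return psnr(image, reference) > threshold_db
```

For example, a uniform error of 25.5 grey levels on an 8-bit scale gives a PSNR of 20 dB, which would pass a 15 dB threshold and fail a 30 dB one.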

The working principle of the three-dimensional target detection system will be described with reference to fig. 2.

As shown in fig. 2, the three-dimensional target detection system may include a laser radar, a camera 1, and a camera 2. In addition to a control switch A provided for the laser radar, a control switch B provided for the camera 1, and a control switch C provided for the camera 2, the system may also be provided with a control switch D and a control switch E.

The control switch D simultaneously controls whether the three operations of depth estimation, 2D detection, and 2D segmentation are performed. The control switch E may have two working positions, namely a first working position and a second working position. In the first working position, the control switch E is connected to the output end of the laser radar; in the second working position, the control switch E is connected to the output end of the pseudo laser radar point cloud generation (which outputs the pseudo point cloud described above).

After the laser radar collects the original point cloud, and the camera 1 and the camera 2 respectively collect original images, point cloud data quality evaluation can be performed on the original point cloud, and image quality evaluation can be performed on the original images collected by the camera 1 and the original images collected by the camera 2 respectively. The following four cases are possible:

In the first case, only the quality evaluation of the original point cloud passes. At this time, the control switch A may be placed in the closed state, the control switch B, the control switch C, and the control switch D may be placed in the open state, and the control switch E may be switched to the first working position. Then, the three-dimensional area proposal network and the three-dimensional target detector may generate a three-dimensional target detection result from the original point cloud alone.

In the second case, the quality evaluation of the original point cloud fails, and the quality evaluation of the original image acquired by one of the camera 1 and the camera 2 passes; for example, only the quality evaluation of the original image acquired by the camera 1 passes. At this time, the control switch B and the control switch D may be placed in the closed state, the control switch A and the control switch C may be placed in the open state, and the control switch E may be switched to the second working position. Then, a pseudo point cloud may be generated, and the three-dimensional area proposal network and the three-dimensional target detector may generate a three-dimensional target detection result according to the pseudo point cloud and a two-dimensional instance segmentation result obtained based on the original image acquired by the camera 1.

In the third case, the quality evaluation of the original point cloud passes, and the quality evaluation of the original image acquired by one of the camera 1 and the camera 2 passes; for example, the quality evaluation of the original image acquired by the camera 1 passes. At this time, the control switch A, the control switch B, and the control switch D may all be placed in the closed state, the control switch C may be placed in the open state, and the control switch E may be switched to the first working position. Then, the three-dimensional area proposal network and the three-dimensional target detector may generate a three-dimensional target detection result from the original point cloud and a two-dimensional instance segmentation result obtained based on the original image acquired by the camera 1.

In the fourth case, the quality evaluation of the original point cloud passes, and the quality evaluations of the original images acquired by the camera 1 and the camera 2 both pass. At this time, the control switch A, the control switch B, the control switch C, and the control switch D may all be placed in the closed state, and the control switch E may be switched to the first working position. Then, a three-dimensional target detection result may be generated from the original point cloud and a two-dimensional instance segmentation result obtained based on the original image acquired by one of the camera 1 and the camera 2.
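The switch settings for the four cases can be summarised in a small decision function. This is a sketch only: the function name and the dictionary layout are hypothetical, `True` stands for a closed switch, and the handling of the two combinations the text does not enumerate (nothing passes, or both cameras pass while the point cloud fails) is an assumption.

```python
from typing import Dict, Optional, Union

def switch_states(cloud_ok: bool, cam1_ok: bool,
                  cam2_ok: bool) -> Optional[Dict[str, Union[bool, str]]]:
    """Map quality evaluation results to switch settings (True = closed)."""
    any_camera = cam1_ok or cam2_ok
    if not cloud_ok and not any_camera:
        return None  # assumption: no reliable data, so no detection is attempted
    return {
        "A": cloud_ok,     # laser radar branch
        "B": cam1_ok,      # camera 1 branch
        "C": cam2_ok,      # camera 2 branch
        "D": any_camera,   # depth estimation, 2D detection, 2D segmentation
        # First position: real point cloud; second position: pseudo point cloud.
        "E": "first" if cloud_ok else "second",
    }
```

For instance, `switch_states(False, True, False)` reproduces the second case: B and D closed, A and C open, and E in the second working position.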

It should be noted that both the camera and the lidar have defects in their sensor data. For example, camera data is affected by the illumination environment and may suffer from noise, insufficient resolution, brightness saturation, or images that are too dark to be clear, while the lidar suffers from data sparsity (a limited number of beams), limited range (insufficient laser transmitting and receiving capability), and "black holes" caused by non-reflective materials. Therefore, the embodiment of the present disclosure may provide a multi-sensor fused three-dimensional target detection system (which includes both a laser radar and a camera). Such a system does not rely on any single sensor; instead, the two sensors cooperate to make up for each other's deficiencies, and the system can continue to operate when either sensor fails or its data deteriorates, as long as one sensor can provide reliable data.

Exemplary devices

Fig. 4 is a block diagram of a three-dimensional object detection device according to an exemplary embodiment of the present disclosure. The apparatus shown in fig. 4 includes a first obtaining module 401, a first determining module 402, a second determining module 403, and a second obtaining module 404.

A first obtaining module 401, configured to obtain an original point cloud collected by a laser radar and obtain an original image collected by a camera;

a first determining module 402, configured to determine a first quality assessment indicator corresponding to the original point cloud, and determine a second quality assessment indicator corresponding to the original image;

a second determining module 403, configured to determine, according to the first quality assessment indicator and the second quality assessment indicator, first target data that passes quality assessment in the original point cloud and the original image;

and a second obtaining module 404, configured to obtain a three-dimensional target detection result according to the first target data and a corresponding target detection strategy.

In an alternative example, as shown in fig. 5, the second obtaining module 404 includes:

the first obtaining sub-module 4041 is configured to, when the first target data includes an original image, obtain a two-dimensional instance segmentation result according to the original image;

and the second obtaining sub-module 4042 is configured to obtain a three-dimensional target detection result according to the two-dimensional instance segmentation result.

In an optional example, the second obtaining sub-module includes:

a first acquisition unit configured to determine whether an original point cloud is included in the first target data to obtain a determination result;

the second acquisition unit is used for acquiring a target point cloud according to the determination result;

and the third acquisition unit is used for acquiring a three-dimensional target detection result according to the two-dimensional instance segmentation result and the target point cloud.

In an optional example, the second obtaining unit is specifically configured to:

taking the original point cloud as a target point cloud under the condition that the determined result represents that the first target data comprises the original point cloud; and under the condition that the determined result represents that the first target data does not comprise the original point cloud, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as the target point cloud.

In an optional example, the third obtaining unit includes:

the generating subunit is used for generating a first target view according to the target point cloud; wherein the first target view comprises at least one of a first bird's eye view and a first front view;

the first acquisition subunit is used for performing feature extraction according to the first target view to obtain a first extraction result;

the second obtaining subunit is used for carrying out feature extraction according to the two-dimensional instance segmentation result so as to obtain a second extraction result;

the third acquisition subunit is used for inputting the first extraction result and the second extraction result into the three-dimensional area proposal network so as to obtain a proposal result output by the three-dimensional area proposal network;

and the fourth acquisition subunit is used for inputting the obtained proposal result into the three-dimensional target detector to obtain a three-dimensional target detection result output by the three-dimensional target detector.

In an optional example,

the number of cameras is one, and the first obtaining sub-module is specifically configured to:

perform instance segmentation on the original image in the first target data to obtain a two-dimensional instance segmentation result;

or,

the number of cameras is two, and the first obtaining sub-module includes:

a selection unit, configured to select, from the first target data, the original image acquired by one of the two cameras in the case that the first target data includes the original images acquired by the two cameras respectively;

and a fourth acquisition unit, configured to perform instance segmentation on the selected original image to obtain a two-dimensional instance segmentation result.

In an optional example, the second obtaining module 404 includes:

the generation submodule is used for generating a second target view according to the original point cloud under the condition that the first target data comprises the original point cloud and the first target data does not comprise the original image; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

the third obtaining submodule is used for carrying out feature extraction according to the second target view so as to obtain a third extraction result;

the fourth obtaining submodule is used for inputting the third extraction result into the three-dimensional area proposal network so as to obtain a proposal result output by the three-dimensional area proposal network;

and the fifth acquisition submodule is used for inputting the obtained proposed result into the three-dimensional target detector so as to obtain a three-dimensional target detection result output by the three-dimensional target detector.

In an alternative example, as shown in fig. 6, the first determining module 402 includes:

a sixth obtaining sub-module 4021, configured to project the original point cloud to an image plane of the camera to obtain a projection image;

the processing submodule 4022 is configured to calculate the correlation degree between the gradient information of the projected image and the image edge information of the original image, and use the correlation degree as the first quality evaluation index corresponding to the original point cloud; or to determine the Rényi quadratic entropy of the projected image, and use the Rényi quadratic entropy as the first quality evaluation index corresponding to the original point cloud.

Exemplary electronic device

Next, an electronic apparatus 70 according to an embodiment of the present disclosure is described with reference to fig. 7. The electronic device 70 may be either or both of the first device and the second device, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the acquired input signals therefrom.

As shown in fig. 7, the electronic device 70 includes one or more processors 71 and a memory 72.

Processor 71 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities that controls other components in electronic device 70 to perform desired functions.

Memory 72 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include Random Access Memory (RAM), cache memory, and the like. The non-volatile memory may include Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processor 71 to implement the three-dimensional object detection methods of the various embodiments disclosed above and/or other desired functions.

In one example, the electronic device 70 may further include: an input device 73 and an output device 74, which are interconnected by a bus system and/or other form of connection mechanism (not shown). The input device 73 may include a keyboard, a mouse, and the like. Output device 74 may include a display, speakers, a remote output device, and the like.

Of course, for simplicity, only some of the components of the electronic device 70 relevant to the present disclosure are shown in fig. 7, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 70 may include any other suitable components, depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the three-dimensional object detection method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.

The program code of the computer program product, for performing the operations of embodiments of the present disclosure, may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the three-dimensional object detection method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, and it is noted that the advantages, effects, etc., presented in the present disclosure are merely examples and are not limiting, which should not be considered essential to the various embodiments of the present disclosure. The specific details disclosed above are for the purpose of illustration and understanding only and are not intended to be limiting, since the above-described details do not limit the disclosure to the specific details described above.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. Because the system embodiment basically corresponds to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description of the method embodiment.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith.

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

In the apparatus, devices and methods of the present disclosure, components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. A three-dimensional target detection method is characterized by comprising the following steps:

2. The method of claim 1, wherein obtaining three-dimensional object detection results according to the first object data and corresponding object detection strategies comprises:

under the condition that the first target data comprises the original image, obtaining a two-dimensional instance segmentation result according to the original image;

3. The method according to claim 2, wherein the obtaining a three-dimensional target detection result according to the two-dimensional instance segmentation result comprises:

obtaining a target point cloud according to the determination result;

4. The method of claim 3, wherein obtaining a target point cloud according to the determination comprises:

taking the original point cloud as a target point cloud under the condition that the determined result indicates that the original point cloud is included in the first target data;

and under the condition that the determined result represents that the original point cloud is not included in the first target data, generating a pseudo point cloud according to the original image and the calibration parameters of the camera, and taking the pseudo point cloud as a target point cloud.

5. The method of claim 3, wherein obtaining a three-dimensional target detection result from the two-dimensional instance segmentation result and the target point cloud comprises:

according to the two-dimensional instance segmentation result, feature extraction is carried out to obtain a second extraction result;

6. The method of claim 2,

the number of the cameras is one, and the obtaining of the two-dimensional instance segmentation result according to the original image comprises the following steps:

performing instance segmentation on the original image in the first target data to obtain a two-dimensional instance segmentation result;

or,

the number of the cameras is two, and the obtaining of the two-dimensional instance segmentation result according to the original image comprises the following steps:

under the condition that the first target data comprises the original images respectively collected by the two cameras, selecting the original image collected by one of the two cameras from the first target data;

and carrying out instance segmentation on the selected original image to obtain a two-dimensional instance segmentation result.

7. The method of claim 1, wherein obtaining three-dimensional object detection results according to the first object data and corresponding object detection strategies comprises:

under the condition that the original point cloud is included in the first target data and the original image is not included in the first target data, generating a second target view according to the original point cloud; wherein the second target view comprises at least one of a second bird's eye view and a second front view;

8. A three-dimensional object detecting device, comprising:

9. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the three-dimensional object detection method of any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored, characterized in that the computer program is configured to execute the three-dimensional object detection method of any one of the preceding claims 1 to 7.