CN115082886B - Target detection method, device, storage medium, chip and vehicle

Target detection method, device, storage medium, chip and vehicle

Info

Publication number
CN115082886B
CN115082886B (application CN202210786934.7A)
Authority
CN
China
Prior art keywords: dimensional, detection result, target, frame, dimensional detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210786934.7A
Other languages
Chinese (zh)
Other versions
CN115082886A (en)
Inventor
赵燕顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd filed Critical Xiaomi Automobile Technology Co Ltd
Priority to CN202210786934.7A priority Critical patent/CN115082886B/en
Publication of CN115082886A publication Critical patent/CN115082886A/en
Application granted granted Critical
Publication of CN115082886B publication Critical patent/CN115082886B/en


Classifications

    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06N3/08 Learning methods (neural networks)
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/85 Stereo camera calibration
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/64 Three-dimensional objects
    • G06T2207/10028 Range image; depth image; 3D point clouds
    • G06T2207/30252 Vehicle exterior; vicinity of vehicle

Abstract

The disclosure relates to a target detection method, apparatus, storage medium, chip and vehicle, and relates to the field of automatic driving. The method includes: acquiring image information of a target area to be detected and point cloud information of the target area; acquiring a first three-dimensional detection result of a target object in the target area according to the image information, and acquiring a second three-dimensional detection result of the target object in the target area according to the point cloud information; determining a pending three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result; acquiring a two-dimensional detection result of the target object in the target area according to the image information; and determining a target three-dimensional detection result from the pending three-dimensional detection results according to the two-dimensional detection result. In this way, the pending three-dimensional detection results can be verified against the two-dimensional detection result, false detections among the pending three-dimensional detection results are reduced, the accuracy of target detection is improved, and the safety and reliability of automatic driving of the vehicle are improved.

Description

Target detection method, device, storage medium, chip and vehicle
Technical Field
The present disclosure relates to the field of automatic driving, and in particular to a target detection method, apparatus, storage medium, chip, and vehicle.
Background
In vehicle automatic driving technology, target detection results can be obtained through multiple target detection techniques. Fusing the results of multiple target detection techniques provides higher reliability for target detection in automatic driving and improves driving safety.
In the related art, fusion of multiple target detection results is generally performed on three-dimensional detection results. Because three-dimensional detection results in the related art contain many false detections, the fused detection results also have a high false detection rate, which degrades the user experience.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a method, apparatus, storage medium, chip, and vehicle for object detection.
According to a first aspect of embodiments of the present disclosure, there is provided a method of target detection, the method comprising:
acquiring image information of a target area to be detected and point cloud information of the target area;
acquiring a first three-dimensional detection result of a target object in the target area according to the image information, and acquiring a second three-dimensional detection result of the target object in the target area according to the point cloud information;
Determining a to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result;
acquiring a two-dimensional detection result of the target object in the target area according to the image information;
and determining a target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result.
Optionally, the pending three-dimensional detection result includes a pending three-dimensional labeling frame in the target area for labeling the target object, the target three-dimensional detection result includes a target three-dimensional labeling frame, and determining, according to the two-dimensional detection result, a target three-dimensional detection result from the pending three-dimensional detection result includes:
acquiring camera calibration parameters corresponding to the image information;
projecting the to-be-determined three-dimensional annotation frame onto the image information according to the camera calibration parameters to obtain annotation frame projection of the to-be-determined three-dimensional annotation frame on the image information;
and determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frame according to the two-dimensional detection result and the annotation frame projection.
Optionally, the two-dimensional detection result includes a two-dimensional labeling frame for labeling the target object in the target area, and determining, according to the two-dimensional detection result and the labeling frame projection, the target three-dimensional labeling frame from the to-be-determined three-dimensional labeling frames includes:
Matching the two-dimensional annotation frame with the projection of the annotation frame;
and taking the successfully matched to-be-determined three-dimensional labeling frame as the target three-dimensional labeling frame.
Optionally, the matching the two-dimensional annotation frame and the annotation frame projection includes:
under the condition that an overlapping area exists between the two-dimensional annotation frame and the annotation frame projection, determining the total area of the two-dimensional annotation frame and the annotation frame projection in the image information;
determining an overlapping proportion of the overlapping area in the total area;
and under the condition that the overlapping proportion is greater than or equal to a preset overlapping proportion threshold, determining that the two-dimensional annotation frame and the annotation frame projection are successfully matched.
Optionally, the two-dimensional detection result further includes a first object recognition type, the pending three-dimensional detection result further includes a second object recognition type, and taking the successfully matched pending three-dimensional labeling frame as the target three-dimensional labeling frame includes:
and under the condition that the corresponding relation exists between the first object recognition type and the second object recognition type, taking the to-be-determined three-dimensional annotation frame successfully matched as the target three-dimensional annotation frame.
Optionally, the target three-dimensional detection result further includes a target object type corresponding to the target object;
the determining a target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result further comprises:
and taking the first object identification type as the target object type.
Optionally, the first three-dimensional detection result includes a first three-dimensional labeling frame in the target area for labeling the target object, the second three-dimensional detection result includes a second three-dimensional labeling frame in the target area for labeling the target object, and determining the to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result includes:
matching the first three-dimensional annotation frame with the second three-dimensional annotation frame;
and determining the undetermined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result according to the matching result.
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for target detection, the apparatus comprising:
the first acquisition module is configured to acquire image information of a target area to be detected and point cloud information of the target area;
The second acquisition module is configured to acquire a first three-dimensional detection result of a target object in the target area according to the image information and acquire a second three-dimensional detection result of the target object in the target area according to the point cloud information;
a determining module configured to determine a pending three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result;
a third acquisition module configured to acquire a two-dimensional detection result of the target object in the target area according to the image information;
and the detection module is configured to determine a target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result.
Optionally, the pending three-dimensional detection result includes a pending three-dimensional labeling frame in the target area for labeling the target object, the target three-dimensional detection result includes a target three-dimensional labeling frame, and the detection module is further configured to:
acquiring camera calibration parameters corresponding to the image information;
projecting the to-be-determined three-dimensional annotation frame onto the image information according to the camera calibration parameters to obtain annotation frame projection of the to-be-determined three-dimensional annotation frame on the image information;
And determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frame according to the two-dimensional detection result and the annotation frame projection.
Optionally, the two-dimensional detection result includes a two-dimensional labeling frame in the target area for labeling the target object, and the detection module is further configured to:
matching the two-dimensional annotation frame with the projection of the annotation frame;
and taking the successfully matched to-be-determined three-dimensional labeling frame as the target three-dimensional labeling frame.
Optionally, the detection module is further configured to:
under the condition that an overlapping area exists between the two-dimensional annotation frame and the annotation frame projection, determining the total area of the two-dimensional annotation frame and the annotation frame projection in the image information;
determining an overlapping proportion of the overlapping area in the total area;
and under the condition that the overlapping proportion is greater than or equal to a preset overlapping proportion threshold, determining that the two-dimensional annotation frame and the annotation frame projection are successfully matched.
Optionally, the two-dimensional detection result further includes a first object recognition type, the pending three-dimensional detection result further includes a second object recognition type, and the detection module is further configured to:
And under the condition that the corresponding relation exists between the first object recognition type and the second object recognition type, taking the to-be-determined three-dimensional annotation frame successfully matched as the target three-dimensional annotation frame.
Optionally, the target three-dimensional detection result further includes a target object type corresponding to the target object, and the detection module is further configured to:
and taking the first object identification type as the target object type.
Optionally, the first three-dimensional detection result includes a first three-dimensional labeling frame in the target area for labeling the target object, the second three-dimensional detection result includes a second three-dimensional labeling frame in the target area for labeling the target object, and the determining module is further configured to:
matching the first three-dimensional annotation frame with the second three-dimensional annotation frame;
and determining the undetermined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result according to the matching result.
According to a third aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any of the first aspects.
According to a fourth aspect of embodiments of the present disclosure, there is provided a chip comprising a processor and an interface; the processor is configured to read instructions to perform the method of any of the first aspects.
According to a fifth aspect of embodiments of the present disclosure, there is provided a vehicle comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any of the first aspects.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
According to the above technical solution, image information of a target area to be detected and point cloud information of the target area are acquired; a first three-dimensional detection result of a target object in the target area is acquired according to the image information, and a second three-dimensional detection result of the target object in the target area is acquired according to the point cloud information; a pending three-dimensional detection result is determined from the first three-dimensional detection result and the second three-dimensional detection result; a two-dimensional detection result of the target object in the target area is acquired according to the image information; and a target three-dimensional detection result is determined from the pending three-dimensional detection results according to the two-dimensional detection result. In this way, the pending three-dimensional detection results can be verified against the two-dimensional detection result, false detections among the pending three-dimensional detection results are reduced, the accuracy of target detection is improved, and the safety and reliability of automatic driving of the vehicle are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure, but do not constitute a limitation of the disclosure.
FIG. 1 is a flow chart illustrating a method of object detection according to an exemplary embodiment.
FIG. 2 is a flow chart illustrating another method of object detection according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating yet another method of object detection, according to an exemplary embodiment.
FIG. 4 is a flowchart illustrating yet another method of object detection, according to an example embodiment.
Fig. 5 is a block diagram illustrating an apparatus for object detection according to an exemplary embodiment.
FIG. 6 is a functional block diagram of a vehicle, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims. It should be understood that the detailed description herein is merely illustrative and explanatory of the disclosure and does not limit the disclosure.
It should be noted that all acquisition of signals, information or data in the present application is performed in compliance with the applicable data protection laws and policies of the relevant jurisdiction and with the authorization of the owner of the corresponding device.
The present disclosure is described below in connection with specific embodiments.
FIG. 1 is a flowchart illustrating a method of target detection according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps.
in step S101, image information of a target area to be detected, and point cloud information of the target area are acquired.
For example, the target area may be an area to be detected within a range around the vehicle. In some possible implementations, the image information of the target area may be acquired by a camera arranged at a first preset position of the vehicle, and the point cloud information of the target area may be acquired by a lidar arranged at a second preset position of the vehicle, where the point cloud information includes a plurality of three-dimensional coordinate points and corresponding feature information such as reflectivity. There may be one or more cameras and one or more lidars, which is not limited in this disclosure; acquiring image information through a camera and point cloud information through a lidar is described in the related art and is not repeated here.
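For illustration only, the sketches in this description assume the raw sensor data is held in simple array containers like the following; the shapes, dtypes and variable names are assumptions and are not prescribed by the disclosure.

```python
import numpy as np

# Hypothetical containers for the sensor data of the target area (shapes are illustrative).
image = np.zeros((1080, 1920, 3), dtype=np.uint8)        # H x W x 3 camera image of the target area
# One row per lidar point: x, y, z coordinates plus a feature such as reflectivity.
point_cloud = np.zeros((120_000, 4), dtype=np.float32)
```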
In step S102, a first three-dimensional detection result of the target object in the target area is obtained from the image information, and a second three-dimensional detection result of the target object in the target area is obtained from the point cloud information.
In some possible implementations, the first three-dimensional detection result of the target object in the target area may be obtained from the image information using a monocular three-dimensional detection technique such as FCOS3D (Fully Convolutional One-Stage Monocular 3D Object Detection). When multiple cameras are used, the first three-dimensional detection result may also be obtained using a multi-view three-dimensional detection technique such as DETR3D (Multi-view 3D Object Detection). Details of obtaining the first three-dimensional detection result from the image information with monocular or multi-view three-dimensional detection can be found in the related art and are not repeated here; the present disclosure does not limit the technique used to obtain the first three-dimensional detection result from the image information.
In some possible implementations, a pre-trained neural network model, for example SIENet (Spatial Information Enhancement Network), may be used to obtain the second three-dimensional detection result of the target object in the target area from the point cloud information. Details can be found in the related art and are not repeated here; the present disclosure does not limit the technique used to obtain the second three-dimensional detection result from the point cloud information.
In step S103, a pending three-dimensional detection result is determined from the first three-dimensional detection result and the second three-dimensional detection result.
The first three-dimensional detection result comprises a first three-dimensional labeling frame used for labeling the target object in the target area, and the second three-dimensional detection result comprises a second three-dimensional labeling frame used for labeling the target object in the target area.
For example, the first three-dimensional labeling frame and the second three-dimensional labeling frame may be represented by the following formula 1 and formula 2, respectively:

B1_i = (x_i, y_i, z_i, l_i, w_i, h_i, θ_i)   (formula 1)

where (x_i, y_i, z_i) are the coordinates of the center point of the first three-dimensional labeling frame of the i-th target object, (l_i, w_i, h_i) are the length, width and height of the first three-dimensional labeling frame of the i-th target object, and θ_i is the angle information of the first three-dimensional labeling frame of the i-th target object, which may be, for example, the angle between the length direction of the first three-dimensional labeling frame and the vehicle forward direction.

B2_j = (x_j, y_j, z_j, l_j, w_j, h_j, θ_j)   (formula 2)

where (x_j, y_j, z_j) are the coordinates of the center point of the second three-dimensional labeling frame of the j-th target object, (l_j, w_j, h_j) are the length, width and height of the second three-dimensional labeling frame of the j-th target object, and θ_j is the angle information of the second three-dimensional labeling frame of the j-th target object.
The coordinate system in which the coordinates are located may be a vehicle body coordinate system, for example, the vehicle body coordinate system is a right-hand coordinate system, and the center of the rear axle of the vehicle is taken as the origin of coordinates.
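A minimal sketch of how the three-dimensional labeling frames of formulas 1 and 2 could be carried in code, assuming a plain container with the fields named in the formulas; the class and field names are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    """Three-dimensional labeling frame in the vehicle body coordinate system (cf. formulas 1 and 2)."""
    x: float        # center point coordinates
    y: float
    z: float
    length: float   # length, width and height of the labeling frame
    width: float
    height: float
    yaw: float      # angle between the frame's length direction and the vehicle forward direction
    obj_type: str   # three-dimensional recognition type reported by the detection branch
```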
Fig. 2 is a flowchart illustrating another method of object detection according to an exemplary embodiment, and as shown in fig. 2, step S103 may include the steps of:
in step S1031, the first three-dimensional label frame and the second three-dimensional label frame are matched.
In some possible implementations, the first three-dimensional labeling frame and the second three-dimensional labeling frame may be projected onto the ground plane of the vehicle body coordinate system, and the matching result is determined from the IoU (Intersection over Union) of the two ground projections. If the IoU is greater than or equal to a preset IoU threshold, the corresponding first three-dimensional labeling frame and second three-dimensional labeling frame are determined to be successfully matched. If the IoU between a first three-dimensional labeling frame and every second three-dimensional labeling frame is smaller than the preset IoU threshold, that first three-dimensional labeling frame is determined to be unmatched. Likewise, if the IoU between a second three-dimensional labeling frame and every first three-dimensional labeling frame is smaller than the preset IoU threshold, that second three-dimensional labeling frame is determined to be unmatched.
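A sketch of the ground-projection IoU matching described above, assuming the Box3D container from the earlier sketch and the shapely library for polygon intersection; the corner construction, the all-pairs matching loop and the threshold value of 0.3 are illustrative assumptions (the disclosure only requires comparing the ground-projection IoU against a preset threshold).

```python
import math
from shapely.geometry import Polygon

def bev_polygon(box: Box3D) -> Polygon:
    """Rectangle obtained by projecting a 3D labeling frame onto the ground plane of the vehicle body frame."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    dx, dy = box.length / 2.0, box.width / 2.0
    corners = [(box.x + cx * c - cy * s, box.y + cx * s + cy * c)
               for cx, cy in ((dx, dy), (dx, -dy), (-dx, -dy), (-dx, dy))]
    return Polygon(corners)

def bev_iou(a: Box3D, b: Box3D) -> float:
    """Intersection over union of the two ground projections."""
    pa, pb = bev_polygon(a), bev_polygon(b)
    union = pa.union(pb).area
    return pa.intersection(pb).area / union if union > 0 else 0.0

def match_3d_boxes(first: list, second: list, iou_threshold: float = 0.3) -> list:
    """Index pairs (i, j) whose ground projections overlap enough to count as successfully matched."""
    return [(i, j)
            for i, a in enumerate(first)
            for j, b in enumerate(second)
            if bev_iou(a, b) >= iou_threshold]
```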
In step S1032, a pending three-dimensional detection result is determined from the first three-dimensional detection result and the second three-dimensional detection result according to the matching result.
The first three-dimensional detection result further comprises a first three-dimensional identification type, and the second three-dimensional detection result further comprises a second three-dimensional identification type.
In some possible implementations, when a first three-dimensional labeling frame and a second three-dimensional labeling frame are successfully matched, the second three-dimensional labeling frame and the second three-dimensional recognition type are used as a pending three-dimensional detection result. When a first three-dimensional labeling frame is unmatched, that first three-dimensional labeling frame and the first three-dimensional recognition type are used as a pending three-dimensional detection result; when a second three-dimensional labeling frame is unmatched, that second three-dimensional labeling frame and the second three-dimensional recognition type are used as a pending three-dimensional detection result.
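Continuing the same sketch, the selection of pending three-dimensional detection results could then look like the following; it assumes at most one match per labeling frame and is only one possible reading of the step above.

```python
def select_pending(first: list, second: list, matches: list) -> list:
    """Keep the point-cloud (second) box for matched pairs and keep all unmatched boxes from both branches."""
    matched_first = {i for i, _ in matches}
    matched_second = {j for _, j in matches}
    pending = []
    # Successfully matched pair: take the second three-dimensional labeling frame and its recognition type.
    pending.extend(second[j] for _, j in matches)
    # Unmatched first three-dimensional labeling frames become pending detection results.
    pending.extend(box for i, box in enumerate(first) if i not in matched_first)
    # Unmatched second three-dimensional labeling frames become pending detection results as well.
    pending.extend(box for j, box in enumerate(second) if j not in matched_second)
    return pending
```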
For example, the pending three-dimensional detection result includes a pending three-dimensional labeling frame for labeling the target object in the target area, where the pending three-dimensional labeling frame can be represented by the following formula 3:

Bp_k = (x_k, y_k, z_k, l_k, w_k, h_k, θ_k)   (formula 3)

where (x_k, y_k, z_k) are the coordinates of the center point of the pending three-dimensional labeling frame of the k-th target object, (l_k, w_k, h_k) are the length, width and height of the pending three-dimensional labeling frame of the k-th target object, and θ_k is the angle information of the pending three-dimensional labeling frame of the k-th target object.
In step S104, a two-dimensional detection result of the target object in the target area is acquired from the image information.
In some possible implementations, the two-dimensional detection result of the target object in the target area may be obtained from the image information by a deep-learning image recognition technique in the related art, for example Faster R-CNN (Faster Region-based Convolutional Neural Network). The two-dimensional detection result may include a two-dimensional labeling frame of the target object in the image information; in Faster R-CNN, for example, the two-dimensional labeling frame may be an ROI (Region of Interest). The two-dimensional labeling frame may be represented, for example, by the following formula 4:

D_m = (u_m, v_m, w_m, h_m)   (formula 4)

where (u_m, v_m) are the center point coordinates of the two-dimensional labeling frame of the m-th target object, and w_m and h_m are the width and the height of the two-dimensional labeling frame of the m-th target object, respectively.
The coordinate system in which the coordinates in the formula are located may be a pixel coordinate system, and specifically, reference may be made to descriptions in the related art, which are not described herein.
In some embodiments, the two-dimensional detection result may further include a first recognition type of the target object, which may be determined, for example, by a deep-learning image recognition technique in the related art.
Details of obtaining the two-dimensional detection result of the target object in the target area from the image information can be found in the related art and are not repeated here; the present disclosure does not limit the technique used to obtain the two-dimensional detection result.
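For the later sketches, the two-dimensional detection result of formula 4 is assumed to be carried in a container like the following; the field names are assumptions, and any 2D detector that outputs a box and a recognition type (for example Faster R-CNN, as mentioned above) would fit.

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Two-dimensional labeling frame in the pixel coordinate system (cf. formula 4)."""
    u: float        # center point coordinates in pixels
    v: float
    w: float        # width of the labeling frame in pixels
    h: float        # height of the labeling frame in pixels
    obj_type: str   # first object recognition type reported by the 2D detector
```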
In step S105, a target three-dimensional detection result is determined from the pending three-dimensional detection results based on the two-dimensional detection result.
Fig. 3 is a flowchart illustrating yet another method of object detection, according to an exemplary embodiment, as shown in fig. 3, step S105 may include the steps of:
in step S1051, camera calibration parameters corresponding to the image information are acquired.
In some possible implementations, the camera calibration parameters may include the intrinsic and extrinsic parameters of the camera: the intrinsic parameters are used to convert the coordinates of the target object between the pixel coordinate system and the camera coordinate system, and the extrinsic parameters are used to convert the coordinates of the target object between the camera coordinate system and the vehicle body coordinate system. The specific values of the calibration parameters depend on properties of the camera (such as its focal length); see the descriptions of camera calibration parameters in the related art.
In step S1052, the to-be-determined three-dimensional labeling frame is projected onto the image information according to the camera calibration parameters, and the labeling frame projection of the to-be-determined three-dimensional labeling frame on the image information is obtained.
For example, the coordinates of the pending three-dimensional labeling frame of formula 3 may first be converted into coordinates in the camera coordinate system using the extrinsic parameters among the camera calibration parameters, and those coordinates may then be projected into the pixel coordinate system using the intrinsic parameters, yielding the labeling-frame projection. In some possible implementations, the labeling-frame projection can be represented by the following formula 5:

P_k = (u_k, v_k, w_k, h_k)   (formula 5)

where (u_k, v_k) are the center point coordinates of the labeling-frame projection of the k-th target object, and w_k and h_k are the width and the height of the labeling-frame projection of the k-th target object, respectively.
Obtaining the projection of the pending three-dimensional labeling frame on the image information from the extrinsic and intrinsic parameters among the camera calibration parameters is described in the related art and is not repeated here.
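A sketch of steps S1051 and S1052 under common pinhole-camera assumptions: K denotes a 3x3 intrinsic matrix and T_cam_body a 4x4 extrinsic transform from the vehicle body frame to the camera frame (both names and the corner-based bounding-rectangle construction are assumptions; the disclosure does not prescribe a calibration format). The eight corners of the pending frame are projected and their axis-aligned bounding rectangle is used as the labeling-frame projection of formula 5.

```python
import math
import numpy as np

def box3d_corners(box: Box3D) -> np.ndarray:
    """Eight corners of a pending 3D labeling frame in the vehicle body frame, shape (8, 3)."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    dx, dy, dz = box.length / 2.0, box.width / 2.0, box.height / 2.0
    return np.array([(box.x + sx * c - sy * s, box.y + sx * s + sy * c, box.z + sz)
                     for sx in (dx, -dx) for sy in (dy, -dy) for sz in (dz, -dz)])

def project_box(box: Box3D, K: np.ndarray, T_cam_body: np.ndarray) -> Box2D:
    """Project a pending 3D labeling frame onto the image and return its 2D bounding rectangle."""
    pts = box3d_corners(box)                          # (8, 3), vehicle body frame
    pts_h = np.hstack([pts, np.ones((8, 1))])         # homogeneous coordinates
    cam = (T_cam_body @ pts_h.T)[:3]                  # (3, 8), camera frame (extrinsic parameters)
    pix = K @ cam                                     # pinhole projection (intrinsic parameters)
    u, v = pix[0] / pix[2], pix[1] / pix[2]
    u_min, u_max, v_min, v_max = u.min(), u.max(), v.min(), v.max()
    return Box2D(u=(u_min + u_max) / 2.0, v=(v_min + v_max) / 2.0,
                 w=u_max - u_min, h=v_max - v_min, obj_type=box.obj_type)
```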
In step S1053, a target three-dimensional labeling frame is determined from the to-be-determined three-dimensional labeling frames according to the two-dimensional detection result and the labeling frame projection.
The two-dimensional detection result comprises a two-dimensional annotation frame used for annotating the target object in the target area.
In some embodiments, the target three-dimensional annotation frame may be determined from the pending three-dimensional annotation frames by the following steps.
And step 1, matching the two-dimensional annotation frame with the projection of the annotation frame.
In some possible implementations, first, in the case where there is an overlapping region of the two-dimensional annotation frame and the annotation frame projection, a total region of the two-dimensional annotation frame and the annotation frame projection in the image information is determined.
For example, the union of the two-dimensional annotation frame represented by equation 4 and the corresponding annotation frame projection represented by equation 5 may be used as the total area of the two-dimensional annotation frame and the annotation frame projection in the image information.
Next, the overlapping proportion of the overlapping region in the total region is determined.
For example, the intersection of the two-dimensional annotation frame represented by equation 4 and the corresponding annotation frame projection represented by equation 5 may be used as the overlapping region of the two-dimensional annotation frame and the annotation frame projection in the image information. The overlapping proportion of the overlapping region in the total region is obtained, and for example, a ratio of the first number of pixels of the overlapping region to the second number of pixels of the total region may be used as the overlapping proportion.
Finally, when the overlapping proportion is greater than or equal to a preset overlapping proportion threshold, it is determined that the two-dimensional annotation frame and the annotation frame projection are successfully matched.
And 2, taking the successfully matched to-be-determined three-dimensional labeling frame as a target three-dimensional labeling frame.
For example, when the overlapping proportion is greater than or equal to the preset overlapping proportion threshold, this indicates that the target object of the pending three-dimensional detection result corresponding to the labeling-frame projection also exists in the two-dimensional detection result, and the corresponding pending three-dimensional labeling frame is taken as a target three-dimensional labeling frame. In some possible implementations, the target three-dimensional labeling frame can be represented by the following formula 6:

Bt_n = (x_n, y_n, z_n, l_n, w_n, h_n, θ_n)   (formula 6)

where (x_n, y_n, z_n) are the coordinates of the center point of the target three-dimensional labeling frame of the n-th target object, (l_n, w_n, h_n) are the length, width and height of the target three-dimensional labeling frame of the n-th target object, θ_n is the angle information of the target three-dimensional labeling frame of the n-th target object, and n is less than or equal to k in formula 5.
In another embodiment, when the overlapping proportion between a labeling-frame projection and every two-dimensional labeling frame is smaller than the preset overlapping proportion threshold, or no overlapping area exists between the labeling-frame projection and any two-dimensional labeling frame, the pending three-dimensional labeling frame corresponding to that projection is regarded as a false detection. In some possible implementations, the three-dimensional detection result corresponding to the falsely detected three-dimensional labeling frame may be deleted from the pending three-dimensional detection results.
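A sketch of the verification in steps 1 and 2 above, reusing the projection sketch and treating both boxes as axis-aligned rectangles in the pixel coordinate system; here the overlap proportion is computed as intersection area over union area, which corresponds to the pixel-count ratio described in the text, and the threshold value of 0.5 is an illustrative assumption.

```python
def overlap_ratio(a: Box2D, b: Box2D) -> float:
    """Area of the overlapping region divided by the total (union) area of the two 2D frames."""
    ax1, ay1, ax2, ay2 = a.u - a.w / 2, a.v - a.h / 2, a.u + a.w / 2, a.v + a.h / 2
    bx1, by1, bx2, by2 = b.u - b.w / 2, b.v - b.h / 2, b.u + b.w / 2, b.v + b.h / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def verify_pending(pending: list, boxes_2d: list, K, T_cam_body, overlap_threshold: float = 0.5) -> list:
    """Keep only pending 3D labeling frames whose image projection matches some 2D labeling frame."""
    verified = []
    for box3d in pending:
        proj = project_box(box3d, K, T_cam_body)
        if any(overlap_ratio(proj, box2d) >= overlap_threshold for box2d in boxes_2d):
            verified.append(box3d)        # kept as a target three-dimensional labeling frame
        # otherwise the pending frame is treated as a false detection and dropped
    return verified
```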
By adopting the above technical solution, the pending three-dimensional detection results can be verified against the two-dimensional detection result, false detections among the pending three-dimensional detection results are reduced, the accuracy of target detection is improved, and the safety and reliability of automatic driving of the vehicle are improved.
In another embodiment, the two-dimensional detection result further comprises a first object recognition type, and the pending three-dimensional detection result further comprises a second object recognition type.
In step S1053, the to-be-determined three-dimensional labeling frame that is successfully matched may be used as the target three-dimensional labeling frame as follows.
For example, the second object recognition type may be obtained through the technical solution in step S102, and since the accuracy of the second object recognition type is lower than that of the first object recognition type in the two-dimensional detection result, the to-be-determined three-dimensional labeling frame that is successfully matched may be used as the target three-dimensional labeling frame when it is determined that the first object recognition type has a correspondence with the second object recognition type.
For example, whether the first object recognition type and the second object recognition type have a correspondence relationship may be determined by the preset object recognition type correspondence relationship shown in table 1.
TABLE 1
It should be specifically noted that Table 1 is only an example of a preset object recognition type correspondence; those skilled in the art may determine the preset correspondence according to the specific target detection algorithms used and the training data of those algorithms.
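A preset object recognition type correspondence such as the one referenced in Table 1 could take the form of a simple lookup like the following sketch; the category names here are purely hypothetical examples and are not taken from the disclosure.

```python
# Hypothetical correspondence between first (2D) object recognition types and the
# second (3D) recognition types each one is allowed to match; illustrative only.
TYPE_CORRESPONDENCE = {
    "car":        {"car", "van"},
    "pedestrian": {"pedestrian"},
    "cyclist":    {"cyclist", "bicycle"},
}

def types_correspond(first_type: str, second_type: str) -> bool:
    """True if the first and second object recognition types have a preset correspondence."""
    return second_type in TYPE_CORRESPONDENCE.get(first_type, set())
```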
By adopting the technical scheme, under the condition that the corresponding relation exists between the first object identification type and the second object identification type, the to-be-determined three-dimensional annotation frame which is successfully matched is taken as the target three-dimensional annotation frame. The accuracy of target detection can be further improved, and the safety and reliability of automatic driving of the vehicle are improved.
Fig. 4 is a flowchart illustrating yet another method of object detection, according to an exemplary embodiment, as shown in fig. 4, step S105 may further include the steps of:
in step S1054, the first object recognition type is set as the target object type.
The target three-dimensional detection result further comprises a target object type corresponding to the target object.
In some embodiments, the three-dimensional recognition type in step S1032 may be taken as the target object type.
In another embodiment, since the first object recognition type in the two-dimensional detection result obtained from the image information is more accurate, the corresponding first object recognition type may be taken as the target object type.
By adopting the technical scheme, after the target three-dimensional labeling frame is determined from the to-be-determined three-dimensional labeling frame according to the two-dimensional detection result and the labeling frame projection, the first object identification type in the two-dimensional detection result is used as the target object type, so that the accuracy of target detection can be further improved, and the safety and reliability of automatic driving of the vehicle are improved.
Fig. 5 is a block diagram illustrating an apparatus 500 for object detection, according to an exemplary embodiment, as shown in fig. 5, the apparatus 500 for object detection includes:
A first obtaining module 501 configured to obtain image information of a target area to be detected and point cloud information of the target area;
a second obtaining module 502 configured to obtain a first three-dimensional detection result of the target object in the target area according to the image information, and obtain a second three-dimensional detection result of the target object in the target area according to the point cloud information;
a determining module 503 configured to determine a pending three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result;
a third obtaining module 504 configured to obtain a two-dimensional detection result of the target object in the target area according to the image information;
the detection module 505 is configured to determine a target three-dimensional detection result from the pending three-dimensional detection results according to the two-dimensional detection result.
Optionally, the pending three-dimensional detection result includes a pending three-dimensional labeling frame for labeling the target object in the target area, the target three-dimensional detection result includes a target three-dimensional labeling frame, and the detection module 505 is further configured to:
acquiring camera calibration parameters corresponding to the image information;
projecting the to-be-determined three-dimensional annotation frame onto the image information according to the camera calibration parameters to obtain the annotation frame projection of the to-be-determined three-dimensional annotation frame on the image information;
And determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frames according to the two-dimensional detection result and the annotation frame projection.
Optionally, the two-dimensional detection result includes a two-dimensional labeling frame in the target area for labeling the target object, and the detection module 505 is further configured to:
matching the two-dimensional annotation frame with the projection of the annotation frame;
and taking the successfully matched to-be-determined three-dimensional labeling frame as a target three-dimensional labeling frame.
Optionally, the detection module 505 is further configured to:
under the condition that the two-dimensional annotation frame and the annotation frame projection have an overlapping area, determining the total area of the two-dimensional annotation frame and the annotation frame projection in the image information;
determining the overlapping proportion of the overlapping region in the total region;
and under the condition that the overlapping proportion is greater than or equal to a preset overlapping proportion threshold, determining that the two-dimensional annotation frame and the annotation frame projection are successfully matched.
Optionally, the two-dimensional detection result further includes a first object recognition type, the pending three-dimensional detection result further includes a second object recognition type, and the detection module 505 is further configured to:
and under the condition that the corresponding relation exists between the first object recognition type and the second object recognition type, taking the to-be-determined three-dimensional annotation frame successfully matched as a target three-dimensional annotation frame.
Optionally, the target three-dimensional detection result further includes a target object type corresponding to the target object, and the detection module 505 is further configured to:
the first object recognition type is taken as a target object type.
Optionally, the first three-dimensional detection result includes a first three-dimensional labeling frame in the target area for labeling the target object, the second three-dimensional detection result includes a second three-dimensional labeling frame in the target area for labeling the target object, and the determining module 503 is further configured to:
matching the first three-dimensional labeling frame with the second three-dimensional labeling frame;
and determining a to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result according to the matching result.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
By adopting the above scheme, the pending three-dimensional detection results can be verified against the two-dimensional detection result, false detections among the pending three-dimensional detection results are reduced, the accuracy of target detection is improved, and the safety and reliability of automatic driving of the vehicle are improved.
The present disclosure also provides a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of object detection provided by the present disclosure.
The apparatus 500 for target detection may be a stand-alone electronic device or a part of an electronic device. For example, in one embodiment, the apparatus 500 may be an integrated circuit (IC) or a chip, where the integrated circuit may be one IC or a collection of multiple ICs, and the chip may include, but is not limited to, a GPU (Graphics Processing Unit), CPU (Central Processing Unit), FPGA (Field Programmable Gate Array), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), SoC (System on Chip), and the like. The integrated circuit or chip described above may be used to execute executable instructions (or code) to implement the method of target detection of the first aspect of the present disclosure. The executable instructions may be stored on the integrated circuit or chip, or may be obtained from another device or apparatus; for example, the integrated circuit or chip may include a processor, a memory, and an interface for communicating with other devices. The executable instructions may be stored in the memory and, when executed by the processor, implement the method of target detection of the first aspect of the present disclosure; alternatively, the integrated circuit or chip may receive executable instructions through the interface and transmit them to the processor for execution, so as to implement the method of target detection of the first aspect of the present disclosure.
Referring to fig. 6, fig. 6 is a functional block diagram of a vehicle 600, according to an exemplary embodiment. The vehicle 600 may be configured in a fully or partially autonomous mode. For example, the vehicle 600 may obtain environmental information of its surroundings through the perception system 620 and derive an automatic driving strategy based on analysis of the surrounding environmental information to achieve full automatic driving, or present the analysis results to the user to achieve partial automatic driving.
The vehicle 600 may include various subsystems, such as an infotainment system 610, a perception system 620, a decision control system 630, a drive system 640, and a computing platform 650. Alternatively, vehicle 600 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, each of the subsystems and components of vehicle 600 may be interconnected via wires or wirelessly.
In some embodiments, the infotainment system 610 may include a communication system 611, an entertainment system 612, and a navigation system 613.
The communication system 611 may comprise a wireless communication system that communicates wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system may use 3G cellular communication such as CDMA, EVDO, or GSM/GPRS, 4G cellular communication such as LTE, or 5G cellular communication. The wireless communication system may communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system may communicate directly with a device using an infrared link, Bluetooth, or ZigBee, or may use other wireless protocols such as various vehicle communication systems; for example, the wireless communication system may include one or more dedicated short-range communications (DSRC) devices, which may include public and/or private data communication between vehicles and/or roadside stations.
The entertainment system 612 may include a display device, a microphone, and a speaker. Based on the entertainment system, a user may listen to broadcasts and play music in the vehicle; alternatively, a mobile phone may communicate with the vehicle and mirror its screen onto the display device. The display device may be a touch screen, and the user may operate it by touching the screen.
In some cases, a voice signal of the user may be acquired through the microphone, and certain control of the vehicle 600 by the user, such as adjusting the temperature inside the vehicle, may be implemented based on analysis of the voice signal. In other cases, music may be played to the user through the speaker.
The navigation system 613 may include a map service provided by a map provider to provide navigation of a travel route for the vehicle 600, and the navigation system 613 may be used with the global positioning system 621 and the inertial measurement unit 622 of the vehicle. The map service provided by the map provider may be a two-dimensional map or a high-precision map.
The perception system 620 may include several types of sensors that sense information about the environment surrounding the vehicle 600. For example, the perception system 620 may include a global positioning system 621 (which may be a GPS system, a BeiDou system, or another positioning system), an inertial measurement unit (IMU) 622, a lidar 623, a millimeter-wave radar 624, an ultrasonic radar 625, and a camera 626. The perception system 620 may also include sensors that monitor internal systems of the vehicle 600 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, speed, etc.). Such detection and identification is a critical function for the safe operation of the vehicle 600.
The global positioning system 621 is used to estimate the geographic location of the vehicle 600.
The inertial measurement unit 622 is configured to sense a change in the pose of the vehicle 600 based on inertial acceleration. In some embodiments, inertial measurement unit 622 may be a combination of an accelerometer and a gyroscope.
The lidar 623 uses a laser to sense objects in the environment in which the vehicle 600 is located. In some embodiments, lidar 623 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
The millimeter-wave radar 624 utilizes radio signals to sense objects within the surrounding environment of the vehicle 600. In some embodiments, millimeter-wave radar 624 may be used to sense the speed and/or heading of an object in addition to sensing the object.
The ultrasonic radar 625 may utilize ultrasonic signals to sense objects around the vehicle 600.
The image pickup device 626 is used to capture image information of the surrounding environment of the vehicle 600. The image capturing device 626 may include a monocular camera, a binocular camera, a structured light camera, a panoramic camera, etc., and the image information acquired by the image capturing device 626 may include still images or video stream information.
The decision control system 630 includes a computing system 631 that makes analysis decisions based on information acquired by the perception system 620, and the decision control system 630 also includes a vehicle controller 632 that controls the powertrain of the vehicle 600, as well as a steering system 633, throttle 634, and braking system 635 for controlling the vehicle 600.
The computing system 631 may be operable to process and analyze the various information acquired by the perception system 620 in order to identify targets, objects, and/or features in the environment surrounding the vehicle 600. The targets may include pedestrians or animals and the objects and/or features may include traffic signals, road boundaries, and obstacles. The computing system 631 may use object recognition algorithms, in-motion restoration structure (Structure from Motion, SFM) algorithms, video tracking, and the like. In some embodiments, the computing system 631 may be used to map the environment, track objects, estimate the speed of objects, and so forth. The computing system 631 may analyze the acquired various information and derive control strategies for the vehicle.
The vehicle controller 632 may be configured to coordinate control of the power battery and the engine 641 of the vehicle to enhance the power performance of the vehicle 600.
Steering system 633 is operable to adjust the direction of travel of vehicle 600. For example, in one embodiment may be a steering wheel system.
Throttle 634 is used to control the operating speed of engine 641 and thereby the speed of vehicle 600.
The braking system 635 is used to control deceleration of the vehicle 600. The braking system 635 may use friction to slow the wheels 644. In some embodiments, the braking system 635 may convert the kinetic energy of the wheels 644 into electrical energy. The braking system 635 may also take other forms to slow the rotational speed of the wheels 644 and thereby control the speed of the vehicle 600.
The drive system 640 may include components that provide powered movement of the vehicle 600. In one embodiment, the drive system 640 may include an engine 641, an energy source 642, a transmission 643, and wheels 644. The engine 641 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine of a gasoline engine and an electric motor, or a hybrid engine of an internal combustion engine and an air compression engine. The engine 641 converts the energy source 642 into mechanical energy.
Examples of energy sources 642 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. The energy source 642 may also provide energy to other systems of the vehicle 600.
The transmission 643 may transfer mechanical power from the engine 641 to the wheels 644. The transmission 643 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 643 may also include other devices, such as a clutch. The drive shaft may include one or more axles that may be coupled to one or more of the wheels 644.
Some or all of the functions of the vehicle 600 are controlled by the computing platform 650. The computing platform 650 may include at least one processor 651, and the processor 651 may execute instructions 653 stored in a non-transitory computer-readable medium, such as memory 652. In some embodiments, computing platform 650 may also be a plurality of computing devices that control individual components or subsystems of vehicle 600 in a distributed manner.
The processor 651 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor 651 may also include, for example, a graphics processing unit (GPU), a field programmable gate array (FPGA), a system on chip (SOC), an application-specific integrated circuit (ASIC), or a combination thereof. Although FIG. 6 functionally illustrates the processor, memory, and other elements of a computer in the same block, it will be understood by those of ordinary skill in the art that the processor, computer, or memory may in fact comprise multiple processors, computers, or memories that may or may not be stored within the same physical housing. For example, the memory may be a hard disk drive or other storage medium located in a housing different from that of the computer. Thus, references to a processor or computer will be understood to include references to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the deceleration component, may each have their own processor that performs only the calculations related to that component's function.
In the disclosed embodiments, the processor 651 may perform the method of object detection described above.
In various aspects described herein, the processor 651 may be located remotely from the vehicle and communicate with it wirelessly. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle while others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, memory 652 may contain instructions 653 (e.g., program logic), which instructions 653 may be executed by processor 651 to perform various functions of vehicle 600. Memory 652 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of infotainment system 610, perception system 620, decision control system 630, drive system 640.
In addition to instructions 653, memory 652 may store data such as road maps, route information, vehicle location, direction, speed, and other such vehicle data, as well as other information. Such information may be used by the vehicle 600 and the computing platform 650 during operation of the vehicle 600 in autonomous, semi-autonomous, and/or manual modes.
The computing platform 650 may control the functions of the vehicle 600 based on inputs received from various subsystems (e.g., the drive system 640, the perception system 620, and the decision control system 630). For example, computing platform 650 may utilize input from decision control system 630 in order to control steering system 633 to avoid obstacles detected by perception system 620. In some embodiments, computing platform 650 is operable to provide control over many aspects of vehicle 600 and its subsystems.
Alternatively, one or more of these components may be mounted separately from or associated with vehicle 600. For example, the memory 652 may exist partially or completely separate from the vehicle 600. The above components may be communicatively coupled together in a wired and/or wireless manner.
Alternatively, the above components are only an example; in practical applications, components in the above modules may be added or deleted according to actual needs, and FIG. 6 should not be construed as limiting the embodiments of the present disclosure.
An autonomous car traveling on a road, such as the vehicle 600 above, may identify objects within its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and its characteristics, such as its current speed, acceleration, and spacing from the vehicle, may be used to determine the speed to which the autonomous car is to adjust.
Alternatively, the vehicle 600, or a sensing and computing device associated with the vehicle 600 (e.g., the computing system 631 or the computing platform 650), may predict the behavior of an identified object based on the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.). Alternatively, because the behavior of each identified object may depend on the behavior of the others, all of the identified objects can also be considered together to predict the behavior of a single identified object. The vehicle 600 is able to adjust its speed based on the predicted behavior of the identified objects. In other words, the autonomous car is able to determine, based on the predicted behavior of the objects, which stable state the vehicle needs to adjust to (e.g., accelerate, decelerate, or stop). In this process, other factors may also be considered to determine the speed of the vehicle 600, such as the lateral position of the vehicle 600 in the road on which it is traveling, the curvature of the road, and the proximity of static and dynamic objects.
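As a non-limiting illustration of this kind of speed adjustment, the following deliberately simplified Python sketch selects a target speed from predicted object states; the function name, the gap threshold, and the (gap, speed) object representation are assumptions made for the example and do not reflect an actual control law of the vehicle 600.

def adjust_speed(ego_speed_mps, predicted_objects, min_gap_m=10.0):
    """Return a target speed: slow to the speed of the slowest predicted
    object whose predicted gap to the ego vehicle is below min_gap_m,
    otherwise keep the current speed. Each object is a tuple of
    (predicted_gap_m, predicted_speed_mps)."""
    limiting_speeds = [speed for gap, speed in predicted_objects if gap < min_gap_m]
    return min([ego_speed_mps] + limiting_speeds)

# Ego vehicle at 20 m/s; a vehicle predicted 8 m ahead at 15 m/s forces a slowdown.
print(adjust_speed(20.0, [(8.0, 15.0), (40.0, 25.0)]))   # -> 15.0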
In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 600 so that the autonomous vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in the vicinity of the autonomous vehicle (e.g., vehicles in adjacent lanes on a roadway).
The vehicle 600 may be any of various types of vehicles, such as a car, a truck, a motorcycle, a bus, a ship, an airplane, a helicopter, a recreational vehicle, a train, and the like, and the embodiments of the present disclosure are not particularly limited in this respect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles thereof and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A method of target detection, the method comprising:
acquiring image information of a target area to be detected and point cloud information of the target area;
acquiring a first three-dimensional detection result of a target object in the target area according to the image information, and acquiring a second three-dimensional detection result of the target object in the target area according to the point cloud information;
determining a to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result;
acquiring a two-dimensional detection result of the target object in the target area according to the image information;
determining a target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result;
the to-be-determined three-dimensional detection result comprises a to-be-determined three-dimensional annotation frame used for annotating the target object in the target area, the target three-dimensional detection result comprises a target three-dimensional annotation frame, and the determining the target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result comprises:
acquiring camera calibration parameters corresponding to the image information;
projecting the to-be-determined three-dimensional annotation frame onto the image information according to the camera calibration parameters to obtain annotation frame projection of the to-be-determined three-dimensional annotation frame on the image information;
determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frame according to the two-dimensional detection result and the annotation frame projection;
the two-dimensional detection result comprises a two-dimensional annotation frame used for annotating the target object in the target area, and the determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frame according to the two-dimensional detection result and the annotation frame projection comprises:
matching the two-dimensional annotation frame with the projection of the annotation frame;
and taking the successfully matched to-be-determined three-dimensional annotation frame as the target three-dimensional annotation frame.
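By way of a non-limiting illustration of the projection step in claim 1, the following minimal Python sketch projects the eight corners of a to-be-determined three-dimensional annotation frame through a camera projection matrix and returns the enclosing rectangle on the image; the pinhole-camera model, the 3x4 matrix P, and all names and numeric values are assumptions made for the example rather than details fixed by the claim.

import numpy as np

def project_box_to_image(corners_3d, P):
    """Project the 8 corners of a 3D annotation frame (camera coordinates,
    shape 8x3) through a 3x4 camera projection matrix P and return the
    axis-aligned rectangle enclosing the projection: (x_min, y_min, x_max, y_max)."""
    corners_h = np.hstack([corners_3d, np.ones((8, 1))])   # homogeneous coordinates, 8x4
    pts = (P @ corners_h.T).T                               # projected points, 8x3
    pts = pts[:, :2] / pts[:, 2:3]                          # perspective divide
    return (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())

# Illustrative calibration: focal length 700 px, principal point (640, 360).
P = np.array([[700.0,   0.0, 640.0, 0.0],
              [  0.0, 700.0, 360.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])
# A 2 m wide, 2 m tall, 4 m long box roughly 10 m in front of the camera.
x, y, z = np.meshgrid([-1.0, 1.0], [-1.0, 1.0], [8.0, 12.0])
corners = np.stack([x.ravel(), y.ravel(), z.ravel()], axis=1)
print(project_box_to_image(corners, P))

A to-be-determined annotation frame whose projection sufficiently overlaps a two-dimensional annotation frame (for example, by the overlap ratio sketched after claim 2) would then be taken as the target three-dimensional annotation frame.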
2. The method of claim 1, wherein said matching the two-dimensional annotation frame with the annotation frame projection comprises:
determining the total area of the two-dimensional annotation frame and the annotation frame projection in the image information under the condition that an overlapping region exists between the two-dimensional annotation frame and the annotation frame projection;
determining an overlap ratio of the overlapping region to the total area;
and under the condition that the overlap ratio is greater than or equal to a preset overlap ratio threshold, determining that the two-dimensional annotation frame and the annotation frame projection are successfully matched.
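A minimal Python sketch of this overlap test follows, assuming that the "total area" of the two frames is the area of their union, so the ratio is the familiar intersection-over-union; the boxes are axis-aligned (x_min, y_min, x_max, y_max) tuples and the 0.5 threshold is purely illustrative.

def overlap_ratio(box_a, box_b):
    """Ratio of the overlap area to the total (union) area of two
    axis-aligned boxes given as (x_min, y_min, x_max, y_max).
    Returns 0.0 when the boxes do not overlap."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A projection and a 2D detection are matched when the ratio meets the threshold.
OVERLAP_THRESHOLD = 0.5   # illustrative value; the claim leaves the threshold open
print(overlap_ratio((0, 0, 10, 10), (5, 0, 15, 10)) >= OVERLAP_THRESHOLD)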
3. The method according to claim 1, wherein the two-dimensional detection result further includes a first object recognition type, the to-be-determined three-dimensional detection result further includes a second object recognition type, and the taking the successfully matched to-be-determined three-dimensional annotation frame as the target three-dimensional annotation frame includes:
under the condition that a correspondence exists between the first object recognition type and the second object recognition type, taking the successfully matched to-be-determined three-dimensional annotation frame as the target three-dimensional annotation frame.
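A minimal Python sketch of such a type-correspondence check follows; the correspondence table and all label strings are hypothetical, since the claim only requires that some correspondence between the first and second object recognition types exists.

# Hypothetical correspondence between 2D detector labels and 3D detector labels.
TYPE_CORRESPONDENCE = {
    "car": {"vehicle", "car"},
    "pedestrian": {"person", "pedestrian"},
    "cyclist": {"bicycle", "cyclist"},
}

def types_correspond(first_type_2d, second_type_3d):
    """Check whether a 2D recognition type and a 3D recognition type refer
    to the same kind of object according to the correspondence table."""
    return second_type_3d in TYPE_CORRESPONDENCE.get(first_type_2d, set())

print(types_correspond("pedestrian", "person"))   # -> True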
4. The method according to claim 3, wherein the target three-dimensional detection result further includes a target object type corresponding to the target object, and the determining the target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result further includes:
and taking the first object recognition type as the target object type.
5. The method of any one of claims 1 to 4, wherein the first three-dimensional detection result includes a first three-dimensional annotation frame used for annotating the target object in the target area, the second three-dimensional detection result includes a second three-dimensional annotation frame used for annotating the target object in the target area, and the determining a to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result includes:
matching the first three-dimensional annotation frame with the second three-dimensional annotation frame;
and determining the to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result according to the matching result.
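The claim leaves both the matching criterion and the selection rule open; the Python sketch below is one plausible reading under stated assumptions — frames are represented only by their 3D centers, matching is a greedy nearest-center pairing with a 2 m distance threshold, and both frames of each matched pair are kept as the to-be-determined detection result.

import numpy as np

def match_3d_frames(centers_img, centers_pcd, max_center_dist=2.0):
    """Greedily pair image-based and point-cloud-based 3D annotation frames
    (each represented here only by its 3D center) whose centers lie within
    max_center_dist metres, and keep both frames of every matched pair."""
    pending, used = [], set()
    for ci in centers_img:
        candidates = [(np.linalg.norm(np.asarray(ci) - np.asarray(cj)), j)
                      for j, cj in enumerate(centers_pcd) if j not in used]
        if not candidates:
            continue
        dist, j = min(candidates)
        if dist <= max_center_dist:
            used.add(j)
            pending.append((ci, centers_pcd[j]))
    return pending

# One image-based frame matches the nearby point-cloud frame; the distant one is left out.
print(match_3d_frames([(10.0, 0.0, 1.0)], [(10.5, 0.2, 1.0), (30.0, 5.0, 1.0)]))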
6. An apparatus for target detection, the apparatus comprising:
the first acquisition module is configured to acquire image information of a target area to be detected and point cloud information of the target area;
the second acquisition module is configured to acquire a first three-dimensional detection result of a target object in the target area according to the image information and acquire a second three-dimensional detection result of the target object in the target area according to the point cloud information;
a determining module configured to determine a to-be-determined three-dimensional detection result from the first three-dimensional detection result and the second three-dimensional detection result;
a third acquisition module configured to acquire a two-dimensional detection result of the target object in the target area according to the image information;
the detection module is configured to determine a target three-dimensional detection result from the to-be-determined three-dimensional detection results according to the two-dimensional detection result;
the to-be-determined three-dimensional detection result comprises a to-be-determined three-dimensional annotation frame used for annotating the target object in the target area, the target three-dimensional detection result comprises a target three-dimensional annotation frame, and the detection module is further configured to:
acquiring camera calibration parameters corresponding to the image information;
projecting the to-be-determined three-dimensional annotation frame onto the image information according to the camera calibration parameters to obtain annotation frame projection of the to-be-determined three-dimensional annotation frame on the image information;
determining the target three-dimensional annotation frame from the to-be-determined three-dimensional annotation frame according to the two-dimensional detection result and the annotation frame projection;
the two-dimensional detection result comprises a two-dimensional annotation frame used for annotating the target object in the target area, and the detection module is further configured to:
matching the two-dimensional annotation frame with the projection of the annotation frame;
and taking the successfully matched to-be-determined three-dimensional annotation frame as the target three-dimensional annotation frame.
7. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps in the method of any of claims 1 to 5.
8. A chip, comprising a processor and an interface; the processor is configured to read instructions to perform the method of any one of claims 1 to 5.
9. A vehicle, characterized by comprising:
A memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1 to 5.