CN115049700A - Target detection method and device

Target detection method and device

Info

Publication number
CN115049700A
Authority
CN
China
Prior art keywords
target
point cloud
image
target tracking
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110256851.2A
Other languages
Chinese (zh)
Inventor
吴家俊
梁振宝
周伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110256851.2A priority Critical patent/CN115049700A/en
Priority to PCT/CN2022/078611 priority patent/WO2022188663A1/en
Publication of CN115049700A publication Critical patent/CN115049700A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/003 Reconstruction from projections, e.g. tomography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Abstract

The application relates to the field of intelligent driving, and discloses a target detection method and device, which are used for improving the accuracy and the real-time performance of target detection. The method comprises the following steps: acquiring a point cloud from a three-dimensional scanning device and an image from a visual sensor; inputting the point cloud and the three-dimensional space position of the target predicted by the at least one target tracking track in the point cloud into a target detection model for processing to obtain the three-dimensional space position of the at least one first target; determining a two-dimensional space position of at least one second target in the image according to the projection of the three-dimensional space position of the at least one first target in the image and the two-dimensional space position of the at least one target tracking track in the image; and determining the three-dimensional space position of the at least one second target in the point cloud according to the projection of the two-dimensional space position of the at least one second target in the point cloud.

Description

Target detection method and device
Technical Field
The embodiment of the application relates to the field of intelligent driving, in particular to a target detection method and device.
Background
With the development of cities, traffic has become more congested and driving more tiring. To meet people's travel needs, intelligent driving (including assisted driving and unmanned driving) has emerged. Reliably detecting targets in the environment is therefore very important for intelligent driving decisions.
Most current target detection methods are based on a single type of sensor, for example relying only on a lidar to acquire point clouds or only on a camera to acquire images. A point cloud provides three-dimensional information about targets and copes well with mutual occlusion between targets, but it is sparse and gives a low recognition rate for target features. Compared with a point cloud, an image carries much richer information, but it is strongly affected by illumination, weather and the like, so the reliability of detection and tracking is poor; moreover, an image only contains two-dimensional plane information and cannot capture occluded targets, so targets are easily lost or mis-detected. Fusing the point cloud and the image makes full use of their complementarity and improves detection robustness. However, research on multi-sensor-fusion target detection is still limited, and the accuracy and real-time performance of target detection remain to be improved.
Disclosure of Invention
The embodiment of the application provides a target detection method and device, which are used for improving the accuracy and the real-time performance of target detection.
In a first aspect, an embodiment of the present application provides a target detection method, where the method includes: acquiring a point cloud from a three-dimensional scanning device and an image from a visual sensor; inputting the point cloud and the three-dimensional space position of the target predicted by the at least one target tracking track in the point cloud into a target detection model for processing to obtain the three-dimensional space position of the at least one first target, wherein the target detection model is obtained by training a plurality of point cloud samples of the three-dimensional space position of the predicted target corresponding to the known target tracking track and three-dimensional space position detection results of a plurality of targets corresponding to the plurality of point cloud samples one by one; determining a two-dimensional space position of at least one second target in the image according to the projection of the three-dimensional space position of the at least one first target in the image and the two-dimensional space position of the at least one target tracking track in the image; and determining the three-dimensional space position of the at least one second target in the point cloud according to the projection of the two-dimensional space position of the at least one second target in the point cloud.
In the embodiment of the application, a target tracking track feedback mechanism is added, when the target is detected in the point cloud and the image, the area where the target tracking track is predicted in the point cloud and the image is more concerned, the detection omission can be effectively reduced, and the accuracy of target detection is improved.
In one possible design, the method further includes: matching the at least one target tracking track and the at least one second target according to the target characteristics corresponding to the at least one target tracking track and the target characteristics of the at least one second target; and associating the matched target tracking track with the second target. Optionally, the target features include one or more of: position, size, speed, direction, category, point cloud point number, coordinate value distribution of each direction of the point cloud, point cloud reflection intensity distribution, appearance characteristic, depth characteristic and the like.
In the design, the detected target can be associated with the existing target tracking track based on the target characteristics, so that the complete target tracking track can be acquired, and the upcoming position of the target at the next moment can be predicted.
In one possible design, the method further includes: and for the second target which is not matched with the target tracking track, establishing a target tracking track corresponding to the second target.
In the design, a new ID can be given to a newly appeared target, a target tracking track corresponding to the target is established, and the tracking of all the appeared targets is favorably realized.
In one possible design, the method further includes: for the target tracking trajectory that does not match to the second target, associating the target tracking trajectory with a predicted target of the target tracking trajectory in the point cloud and/or the image.
In the design, for the target tracking track of which the corresponding target is not detected in the point cloud and the image, the target tracking track can be associated with the predicted target of the target tracking track in the point cloud and/or the image, so that the problem that the same target corresponds to a plurality of target tracking tracks due to missing detection is avoided, and the reliability of target tracking is improved.
In one possible design, before associating the target tracking trajectory with a predicted target of the target tracking trajectory in the point cloud and/or the image for the target tracking trajectory that does not match to the second target, the method further comprises: and when the frequency of the target tracking track associated with the predicted target is greater than or equal to a first threshold value, deleting the target tracking track.
In the design, the target tracking track of the corresponding target which is not detected in the acquired point cloud and/or image for many times is deleted, so that the processing resource is saved.
In one possible design, the method further includes: acquiring a calibration object point cloud from three-dimensional scanning equipment and a calibration object image from a visual sensor; and determining a projection matrix of a point cloud coordinate system and an image coordinate system according to the three-dimensional coordinates of a plurality of calibration points in the calibration object in the point cloud of the calibration object and the two-dimensional coordinates in the image of the calibration object.
In the design, the three-dimensional scanning equipment and the visual sensor can be jointly calibrated through the calibration object, and the projection matrix of the point cloud coordinate system and the image coordinate system (also called as a pixel coordinate system) is determined, so that the integration of target detection results in the point cloud and the image is facilitated, and the accuracy of target detection is improved.
In a second aspect, an embodiment of the present application provides an object detection apparatus, where the apparatus has a function of implementing the method in the first aspect or any one of the possible designs of the first aspect, where the function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units (modules) corresponding to the above functions, such as an acquisition unit and a processing unit.
In a third aspect, an embodiment of the present application provides an object detection apparatus, including at least one processor and an interface, where the processor is configured to call and run a computer program from the interface, and when the processor executes the computer program, the method may be implemented as described in the first aspect or any one of the possible designs of the first aspect.
In a fourth aspect, an embodiment of the present application provides a terminal, which includes the apparatus in the second aspect. Optionally, the terminal may be an on-board device, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot, a road side unit, or the like. Or, the terminal may also be an intelligent device that needs target detection or tracking, such as an intelligent home, an intelligent manufacturing, and the like.
In a fifth aspect, an embodiment of the present application provides a chip system, where the chip system includes: a processor and an interface, the processor being configured to invoke and run a computer program from the interface, the computer program, when executed by the processor, being capable of implementing the method as set forth in the first aspect or any one of the possible designs of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program for executing the method of the first aspect or any one of the possible designs of the first aspect.
In a seventh aspect, this application further provides a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed, the method described in the first aspect or any possible design of the first aspect may be implemented.
For technical effects achieved by the second aspect to the seventh aspect, please refer to the technical effects achieved by the first aspect, which will not be repeated herein.
Drawings
FIG. 1 is a schematic diagram of a target detection system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a target detection process provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an intelligent driving scenario provided in an embodiment of the present application;
FIG. 4 is a first schematic diagram of a multi-sensor-fusion-based target detection scheme provided in an embodiment of the present application;
FIG. 5 is a second schematic diagram of a multi-sensor-fusion-based target detection scheme provided in an embodiment of the present application;
FIG. 6 is a third schematic diagram of a multi-sensor-fusion-based target detection scheme provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a target detection method provided in an embodiment of the present application;
FIG. 8 is a first schematic diagram of a target detection apparatus provided in an embodiment of the present application;
FIG. 9 is a second schematic diagram of a target detection apparatus provided in an embodiment of the present application.
Detailed Description
Fig. 1 is a schematic diagram of a target detection system provided in the present application, which includes a data preprocessing module, a joint calibration module, a point cloud detection module, an image region of interest acquisition module, a point cloud domain prediction module, an image domain prediction module, a prediction decision module, a data association module, and a trajectory management module.
In conjunction with the schematic diagram of the target detection process shown in fig. 2:
The data preprocessing module is mainly used to filter the point cloud and remove ground points, to perform distortion correction on the image, and the like.
The joint calibration module is mainly used to jointly calibrate the point cloud and the image acquired by the three-dimensional scanning device and the visual sensor, so as to obtain the projection matrix between the point cloud coordinate system and the image coordinate system.
The point cloud detection module is mainly used to input the point cloud obtained at the current moment and the fed-back result of target tracking track management (such as the three-dimensional spatial position of the target predicted by at least one target tracking track in the point cloud obtained at the current moment) into a trained target detection model (such as a deep neural network model) to obtain a target detection result.
The image region of interest acquisition module is mainly used to project the target detection result obtained from the point cloud into the image using the projection matrix, and to obtain regions of interest by combining the fed-back result of target tracking track management (such as the two-dimensional spatial position of the target predicted by at least one target tracking track in the image obtained at the current moment).
The prediction decision module is mainly used to back-project the target detection result of the image into the point cloud, compare it with the target detection result of the point cloud, and decide on a more accurate target detection result.
The data association module is mainly used to perform association matching between the target detection result after the prediction decision and the target tracking tracks.
The track management module is mainly used to manage and update all target tracking tracks according to the data association result.
The point cloud domain prediction module is mainly used to predict the three-dimensional spatial position of the target in the point cloud acquired at the next moment based on the updated target tracking tracks.
The image domain prediction module is mainly used to predict the two-dimensional spatial position of the target in the image acquired at the next moment based on the updated target tracking tracks.
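For illustration only (this is not part of the patent text), the modules above could be chained for one point-cloud/image frame pair roughly as in the following Python sketch; every class, attribute and method name here is a hypothetical placeholder.

```python
# Hypothetical sketch of one processing cycle of the system in fig. 1.
# "modules" is assumed to bundle the modules described above; all names are illustrative.
def process_frame(point_cloud, image, tracks, modules):
    # Data preprocessing: ground-point removal and image distortion correction
    pc = modules.preprocess.filter_ground(point_cloud)
    img = modules.preprocess.undistort(image)

    # Point cloud detection, guided by the 3D positions fed back by existing tracks
    predicted_3d = [t.predicted_box_3d for t in tracks]
    first_targets_3d = modules.point_cloud_detector.detect(pc, predicted_3d)

    # Image regions of interest: projected 3D detections plus track-predicted 2D positions
    predicted_2d = [t.predicted_box_2d for t in tracks]
    rois = modules.roi_module.build(first_targets_3d, predicted_2d,
                                    modules.joint_calibration.projection_matrix)

    # Prediction decision: back-project image detections and fuse with the point cloud result
    second_targets = modules.prediction_decision.fuse(rois, first_targets_3d, pc)

    # Data association and track management, then predict positions for the next frame
    matches = modules.data_association.match(tracks, second_targets)
    tracks = modules.track_manager.update(tracks, matches, second_targets)
    for t in tracks:
        t.predicted_box_3d = modules.point_cloud_domain_predictor.predict(t)
        t.predicted_box_2d = modules.image_domain_predictor.predict(t)
    return second_targets, tracks
```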
It is to be understood that the structure of the target detection system illustrated in the embodiments of the present application does not constitute a specific limitation to the target detection system. In other embodiments of the present application, the object detection system may include more or fewer modules than shown, or some modules may be combined, some modules may be split, or a different arrangement of modules.
The target detection scheme provided by the embodiment of the application can be suitable for a terminal which is applied with a target detection system as shown in fig. 1, the terminal can be equipment such as vehicle-mounted equipment, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot and a Road Side Unit (RSU), and the target detection scheme is suitable for scenes such as monitoring, intelligent driving, unmanned aerial vehicle navigation and robot traveling. In the following description of the embodiment of the present application, a terminal to which the target detection system shown in fig. 1 is applied in an intelligent driving scenario is taken as an example for description. As shown in fig. 3, a terminal (e.g., vehicle a) may acquire a point cloud and an image of a surrounding environment through a three-dimensional scanning device(s) and a visual sensor(s) disposed on the terminal, and may detect and track targets such as vehicles (e.g., vehicle B, vehicle C, etc.), pedestrians, bicycles (not shown in the figure), trees (not shown in the figure), and the like in the surrounding environment.
At present, multi-sensor-fusion-based target detection schemes mainly include the following:
The first scheme: As shown in fig. 4, after a point cloud is obtained from the lidar, the three-dimensional spatial position of the target is detected and point cloud features are extracted using a deep convolutional neural network. An image is acquired from a monocular camera, the three-dimensional boundary of the target detected in the point cloud is projected onto the image, and image features of the projected area are extracted using a deep convolutional neural network. Then, similarity matrices between the detected targets and the target tracking tracks are calculated for the point cloud three-dimensional spatial position, the point cloud features and the image features; the three similarity matrices are combined, the bipartite-graph matching between targets and target tracking tracks is computed from the combined similarity matrix with the Hungarian algorithm, and the state of the target tracking tracks is estimated with a Kalman filter, thereby tracking the targets in the point cloud. However, this scheme uses deep networks to extract features from both the image and the point cloud at the same time, so resource consumption is high, computational efficiency is low, and practicality is poor; moreover, once a detection is missed in the point cloud obtained from the lidar, the missed target cannot be recovered through the image, so the accuracy is low.
The second scheme: As shown in fig. 5, this scheme first obtains target detection information in the captured image and point cloud using deep learning algorithms. For example, a deep-learning image target detection algorithm is used to obtain the two-dimensional (2D) detection frame category, center-point pixel coordinates and length-width size of targets in the image, and a deep-learning point cloud target detection algorithm is used to obtain the category, center-point spatial coordinates and length-width-height size of the three-dimensional (3D) detection frame of targets in the point cloud. Then, based on the minimum distance between detection frames, the Hungarian algorithm is used to optimally match the detection frames of targets in the images and point clouds acquired at adjacent moments, so as to realize target tracking, and target tracking tracks are established for the image and the point cloud separately. However, this scheme also uses deep learning algorithms for feature extraction in both the image and the point cloud at the same time, so resource consumption is high and real-time performance is poor; in addition, without a true tracking algorithm, distance-based matching of detection frames is prone to errors when targets are dense or when a target is displaced by a large amount between adjacent moments.
The third scheme: As shown in fig. 6, this scheme collects point clouds of the target, filters the collected point clouds and outputs the point data remaining after ground points are filtered out, maps the obtained point data to generate a range image and a reflection-intensity image, performs point cloud segmentation and clustering on the point data according to the range image, the reflection-intensity image and echo intensity information to obtain a plurality of point cloud regions, and screens out the point cloud regions of suspected targets from these regions according to prior knowledge of the target; features are then extracted from each target point cloud region and the extracted feature vectors are classified to identify the target, obtaining a first target detection result. An image is collected and preprocessed, a region of interest is extracted from the preprocessed image using a projection transformation matrix, image features are extracted within the region of interest, and the target is identified according to the extracted image features, obtaining a second target detection result. If the first and second target detection results are the same, either of them is output as the final target detection result; if they are different, the two results are fused and judged based on Bayesian decision to obtain and output the final target detection result. Finally, a multi-target tracking method based on Markov decision processes (MDP) is used for tracking. However, point-cloud-based target detection in this scheme relies on a large amount of prior knowledge and has poor accuracy, and when a detection is missed in the point cloud, the missed target cannot be recovered through the image, so the accuracy is low.
In view of this, the embodiments of the present application provide a target detection scheme in which the target detection result in the point cloud is corrected by the target detection result in the image, and a target tracking track feedback mechanism is used to reduce the missed-detection rate, thereby improving the accuracy and real-time performance of target detection.
Before describing the embodiments of the present application, some terms in the embodiments of the present application will be explained to facilitate understanding by those skilled in the art.
1) Point cloud: the set of point data on an object's surface obtained by scanning with a three-dimensional scanning device is referred to as a point cloud. A point cloud is a collection of vectors in a three-dimensional coordinate system. These vectors are usually expressed as x, y, z coordinates and are mainly used to represent the shape of the external surface of an object. In addition to the geometric position information represented by (x, y, z), a point cloud point may also carry RGB color, gray value, depth, surface reflection intensity and the like. The point cloud coordinate system referred to in the embodiments of the present application is the three-dimensional (x, y, z) coordinate system in which the point cloud points are located.
2) Image coordinate system: also referred to as a pixel coordinate system, it is usually a two-dimensional coordinate system established with the upper-left corner of the image as the origin, in units of pixels. Its two coordinate axes are denoted u and v, and the coordinates of a point in the image coordinate system may be written as (u, v).
3) Corner points: points whose attributes are particularly prominent in some respect, i.e. representative and robust points in point clouds and images, such as the intersection of two edges.
4) Region of interest (ROI): in image processing, a region to be processed that is delineated from the image by a box, circle, ellipse, irregular polygon or the like is called a region of interest. In the embodiments of the present application, a region of interest may be regarded as a region of the image in which a target exists.
In addition, it is to be understood that, in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In the description of the text of this application, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, unless stated to the contrary, the embodiments of the present application refer to the ordinal numbers "first", "second", etc. for distinguishing a plurality of objects, and do not limit the order, timing, priority, or importance of the plurality of objects, and the descriptions of "first", "second", etc. do not limit the objects to be necessarily different. The various numerical designations referred to in this application are merely for ease of description and distinction and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of the processes should be determined by their functions and inherent logic. In this application, the words "exemplary" or "for example" are used to mean examples, illustrations or descriptions and any embodiment or design described as "exemplary" or "for example" is not to be construed as preferred or advantageous over other embodiments or designs. The use of the terms "exemplary" or "such as" are intended to present relevant concepts in a concrete fashion for ease of understanding. The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 7 is a schematic diagram of a target detection method provided in an embodiment of the present application, where the method includes:
s701: the terminal acquires a point cloud from the three-dimensional scanning device and an image from the vision sensor.
The three-dimensional scanning equipment can be a laser radar, a millimeter wave radar, a depth camera and the like, and the vision sensor can be a monocular camera, a multi-view camera and the like.
In a possible implementation, the terminal may be equipped with at least one three-dimensional scanning device and at least one visual sensor, and the terminal may scan an object around the terminal (or in a certain direction, such as a traveling direction) through the three-dimensional scanning device to collect a point cloud of the object around the terminal (or in the certain direction); the object around the terminal (or in a certain direction) can be scanned by the vision sensor, and the image of the object around the terminal (or in a certain direction) can be acquired. The point cloud may be a set of point cloud points, information of each point cloud point in the set includes three-dimensional coordinates (x, y, z) of the point cloud point, and when the three-dimensional scanning device is a laser radar or a millimeter wave radar, the information of each point cloud point may further include information such as laser reflection intensity or millimeter wave reflection intensity.
In addition, to avoid inconsistency between the acquisition times of the point cloud and the image, when the terminal acquires the point cloud from the three-dimensional scanning device and the image from the visual sensor, it can also obtain their acquisition timestamps from the two devices, and align the point cloud and the image in time according to these timestamps, so that the point cloud and the image used as one group for target detection are consistent in acquisition time.
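As a purely illustrative sketch (not from the patent), this time alignment could be done by pairing each point cloud frame with the image whose acquisition timestamp is closest; the frame representation and the tolerance value are assumptions.

```python
def align_by_timestamp(pc_frames, img_frames, max_offset=0.05):
    """pc_frames / img_frames: lists of (timestamp_seconds, data) tuples, assumed sorted."""
    pairs = []
    for t_pc, pc in pc_frames:
        # pick the image whose acquisition time is closest to this point cloud frame
        t_img, img = min(img_frames, key=lambda f: abs(f[0] - t_pc))
        if abs(t_img - t_pc) <= max_offset:   # keep only groups with consistent acquisition time
            pairs.append((pc, img))
    return pairs
```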
In some implementations, after the point cloud and the image are acquired, the terminal may also perform data preprocessing operations on the point cloud and/or the image. For example, the terminal can filter the point cloud, remove the cloud points of the ground points, reduce the data volume of the point cloud, and improve the target detection efficiency; barrel distortion, pincushion distortion, or the like existing in the captured image may be corrected based on internal and external parameters of the vision sensor (which are generally provided by a vision sensor manufacturer).
As an example, the terminal may remove point cloud points satisfying a predetermined condition (e.g., a z coordinate of the point cloud point is less than a certain threshold) from the point cloud, and filter out point cloud points on the ground, thereby reducing the data amount of the point cloud and improving the target detection efficiency.
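A minimal sketch of this ground-point filtering, assuming the point cloud is an N x 4 NumPy array of (x, y, z, intensity); the threshold value is illustrative rather than taken from the patent.

```python
import numpy as np

def remove_ground_points(points: np.ndarray, z_threshold: float = -1.5) -> np.ndarray:
    """Keep only point cloud points whose z coordinate is above the given threshold."""
    return points[points[:, 2] >= z_threshold]
```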
S702: and the terminal inputs the point cloud and the three-dimensional space position of the target predicted by the at least one target tracking track in the point cloud into a target detection model for processing to obtain the three-dimensional space position of the at least one first target.
The three-dimensional space position of the target comprises information such as a central point coordinate, a length, a width, a height and the like, and can also be called a three-dimensional detection frame or a three-dimensional bounding box (3D BBox), and the target detection model is trained based on a plurality of point cloud samples of the three-dimensional space position of a predicted target corresponding to a known target tracking track and three-dimensional space position detection results of a plurality of targets corresponding to the plurality of point cloud samples one by one.
In the embodiment of the present application, one target tracking track corresponds to one target, and the target tracking track records information of the target, such as an identity identification number (ID), a target feature, an existence time, a three-dimensional spatial position in each frame of point cloud where the target exists, a two-dimensional spatial position in each frame of image where the target exists, and the like. Tracking a target in the point clouds through a Kalman (kalman) algorithm and the like, and predicting the three-dimensional spatial position of the target in the next frame of point clouds (namely, the point clouds collected at the next moment) according to the three-dimensional spatial position of the target in each frame of point clouds of the target in a target tracking track corresponding to the target, namely obtaining the three-dimensional spatial position of the target predicted by the target tracking track in the next frame of point clouds; the target can be tracked in the image through an optical flow algorithm and the like, and according to the two-dimensional space position of the target in each frame of image in which the target exists in the target tracking track corresponding to the target, the two-dimensional space position of the target in the next frame of image (namely the image collected at the next moment) can be predicted through the optical flow algorithm and the like, namely the two-dimensional space position of the target predicted by the target tracking track in the next frame of image can be obtained.
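For illustration, a constant-velocity Kalman predictor for a track's three-dimensional center could look like the sketch below; the state layout, time step and noise values are assumptions and stand in for whatever tracking filter is actually used.

```python
import numpy as np

class CenterKalman3D:
    """Predicts a track's 3D center in the next point cloud frame (constant-velocity model)."""
    def __init__(self, center_xyz, dt=0.1):
        self.x = np.hstack([np.asarray(center_xyz, dtype=float), np.zeros(3)])  # [x, y, z, vx, vy, vz]
        self.P = np.eye(6)                     # state covariance
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)        # position += velocity * dt
        self.Q = 0.01 * np.eye(6)              # process noise (illustrative value)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                      # predicted 3D center for the next frame
```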
When target detection is performed on the point cloud, the probability that a target appears in the area where an existing target tracking track predicts the target's three-dimensional spatial position is obviously higher than that of other areas in the point cloud, so this predicted area is the area that needs to be focused on when performing target detection on the point cloud.
When performing target detection on the point cloud, the terminal can process the point cloud and the three-dimensional spatial position of the target predicted by at least one target tracking track in the point cloud through the target detection model. Specifically, the target detection model may be obtained by the training device by training on a plurality of point cloud samples, maintained in a sample set, for which the three-dimensional spatial positions of the predicted targets corresponding to known target tracking tracks are known, together with the three-dimensional spatial position detection results of a plurality of targets corresponding one to one to the plurality of point cloud samples. When the target detection model is trained, the training device may add a three-dimensional spatial position label vector (for example, a label vector composed of information such as center-point coordinates, length, width and height) to each point cloud sample according to the three-dimensional spatial position of the target corresponding to that point cloud sample. In addition, it should be understood that if the three-dimensional spatial positions of multiple targets exist in a point cloud sample, multiple three-dimensional spatial position label vectors are added to that point cloud sample; the label vectors of the multiple targets may also exist in matrix form and correspond to the multiple targets one by one.
After the three-dimensional spatial position label vector of the target is added to each point cloud sample in the training set, the training device inputs the point cloud sample and the three-dimensional spatial position of the predicted target corresponding to the target tracking track(s) into the target detection model for processing, and obtains the predicted three-dimensional spatial position of the target(s) output by the target detection model. The training device then calculates the loss of the target detection model through a loss function, based on the predicted three-dimensional spatial position of the output target and the three-dimensional spatial position label vector of the real target corresponding to the point cloud sample; the higher the loss, the larger the difference between the predicted three-dimensional spatial position output by the target detection model and the label vector of the real target. The training device adjusts the parameters in the target detection model according to the loss, for example by updating the parameters of the neurons in the target detection model with a stochastic gradient descent method, so the training process of the target detection model is a process of reducing the loss as much as possible. The target detection model is trained continuously with the point cloud samples in the sample set, and when the loss falls within a preset range, the trained target detection model is obtained. The target detection model may be a deep neural network or the like.
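A highly simplified training-loop sketch, under the assumptions that the model takes the point cloud plus the track-predicted boxes as input, that predictions and labels correspond one to one, and that a smooth L1 loss is used; the dataset field names and the loss choice are illustrative, not the patent's.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)     # stochastic gradient descent
    for _ in range(epochs):
        for batch in loader:
            # point cloud sample + 3D positions predicted by the known target tracking tracks
            pred_boxes = model(batch["point_cloud"], batch["track_predicted_boxes"])
            # compare with the 3D spatial position label vectors of the real targets
            loss = F.smooth_l1_loss(pred_boxes, batch["label_boxes"])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```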
It should be understood that the point cloud samples in the training set may be obtained by pre-sampling, such as pre-collecting the point cloud samples by the terminal, predicting the three-dimensional spatial position of the predicted target in the collected point cloud samples according to the target tracking track(s), and recording the predicted three-dimensional spatial position, while labeling the three-dimensional spatial position of the real target existing in the point cloud samples.
The training device may be a personal computer (PC), a laptop, a server, or the terminal itself. If the training device and the terminal are not the same device, the training device may, after completing training of the target detection model, deploy the trained target detection model to the terminal, so that the terminal can detect the first target in the acquired point cloud.
S703: and the terminal determines the two-dimensional space position of at least one second target in the image according to the projection of the three-dimensional space position of the at least one first target in the image and the two-dimensional space position of the at least one target tracking track prediction target in the image.
Through the projection matrixes of the point cloud coordinate system (three-dimensional) and the image coordinate system (two-dimensional), the three-dimensional space position in the point cloud can be projected into the image to obtain the two-dimensional space position in the image, and the two-dimensional space position in the image can also be projected into the point cloud to obtain the three-dimensional space position in the point cloud.
In some implementations, the projection matrix may be determined by presetting a plurality of calibration objects (e.g., a three-dimensional carton with a plurality of edges) in a common field of view of a three-dimensional scanning device and a visual sensor, acquiring a calibration object point cloud and a calibration object image by the three-dimensional scanning device and the visual sensor, selecting a plurality of calibration points (e.g., corner points of the three-dimensional carton) in the acquired calibration object point cloud and calibration object image, obtaining three-dimensional coordinates of the plurality of calibration points in the calibration object point cloud and two-dimensional coordinates in the calibration object image, and solving the projection matrix of the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of the plurality of calibration points in the calibration object point cloud and the two-dimensional coordinates in the calibration object image.
As an example: assuming that (x, y, z) and (u, v) are coordinates of the calibration point in the point cloud coordinate system and the image coordinate system, respectively, the transformation relationship between the two coordinate systems can be obtained as follows:
s·[u, v, 1]^T = K·[R, T]·[x, y, z, 1]^T = M·[x, y, z, 1]^T
where s is a scale factor, K is the intrinsic parameter matrix of the visual sensor (fixed when the sensor leaves the factory, and usually provided by the manufacturer or obtained by a calibration algorithm), and [R, T] is the extrinsic parameter matrix of the visual sensor. The projection matrix M from the point cloud coordinate system to the image coordinate system can be solved from the three-dimensional coordinates of a plurality of (at least 3) calibration points in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.
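Under the assumption that enough well-spread calibration points are available for a well-conditioned solution, M can be estimated with a direct linear transform as in the sketch below; this is illustrative and not the patent's specific procedure.

```python
import numpy as np

def solve_projection_matrix(pts_3d: np.ndarray, pts_2d: np.ndarray) -> np.ndarray:
    """pts_3d: K x 3 point cloud coordinates; pts_2d: K x 2 pixel coordinates of the same calibration points."""
    rows = []
    for (x, y, z), (u, v) in zip(pts_3d, pts_2d):
        X = [x, y, z, 1.0]
        rows.append(X + [0.0, 0.0, 0.0, 0.0] + [-u * c for c in X])   # equation from the u coordinate
        rows.append([0.0, 0.0, 0.0, 0.0] + X + [-v * c for c in X])   # equation from the v coordinate
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)        # smallest singular vector gives M up to scale
    return vt[-1].reshape(3, 4)
```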
In addition, although feedback of the targets predicted by the target tracking tracks is added when detecting the first targets in the point cloud, which reduces the missed-detection rate, the three-dimensional spatial position detection result of the first targets output by the target detection model may still miss targets. Therefore, in some embodiments, the terminal may also add feedback of the targets predicted by the target tracking tracks when detecting the second targets in the image: the two-dimensional spatial positions obtained by projecting the at least one first target into the image and the two-dimensional spatial positions of the targets predicted by the at least one target tracking track in the image are both regarded as existing targets, and both are output as the two-dimensional spatial positions where the second targets exist.
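A minimal sketch of forming these second-target two-dimensional positions: the projected boxes of the first targets and the track-predicted boxes are merged, with near-duplicates suppressed by IoU; the box format and the threshold value are assumptions.

```python
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def merge_2d_boxes(projected_boxes, track_predicted_boxes, iou_thresh=0.5):
    """Boxes are (u1, v1, u2, v2); the union of both sources is output as the second targets."""
    second_targets = list(projected_boxes)
    for pb in track_predicted_boxes:
        if all(iou(pb, b) < iou_thresh for b in second_targets):
            second_targets.append(pb)          # keep predicted boxes not already covered
    return second_targets
```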
S704: and the terminal determines the three-dimensional space position of the at least one second target in the point cloud according to the projection of the two-dimensional space position of the at least one second target in the point cloud.
And the terminal projects the two-dimensional space position of at least one second target in the image into the point cloud, so that the three-dimensional space position of the at least one second target in the point cloud can be obtained, and the final target detection result of the point cloud is obtained and output.
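A simplified stand-in for this back-projection, assuming the projection matrix M from S703 is available and an axis-aligned box is fitted to the point cloud points whose projections fall inside the two-dimensional box:

```python
import numpy as np

def box_2d_to_3d(points: np.ndarray, box_2d, M: np.ndarray):
    """points: N x 3 point cloud; box_2d: (u1, v1, u2, v2); M: 3 x 4 projection matrix."""
    homo = np.hstack([points, np.ones((len(points), 1))])     # homogeneous coordinates
    proj = (M @ homo.T).T
    u, v = proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]
    u1, v1, u2, v2 = box_2d
    inside = (u >= u1) & (u <= u2) & (v >= v1) & (v <= v2)
    if not inside.any():
        return None
    sel = points[inside]
    center = (sel.min(axis=0) + sel.max(axis=0)) / 2.0        # 3D center point
    size = sel.max(axis=0) - sel.min(axis=0)                  # length, width, height
    return center, size
```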
For any second object, the features of the second object may include object features in three-dimensional spatial locations in the point cloud and object features in two-dimensional spatial locations in the image. The target features in the three-dimensional spatial positions in the point cloud may include a position (such as a central point coordinate), a size (such as a length, a width and a height), a speed, a direction, a category, a point cloud point number, numerical distribution of coordinates of each direction in the point cloud, reflection intensity distribution of the point cloud (such as a point cloud reflection intensity distribution histogram), a depth feature, and the like, and the target features in the two-dimensional spatial positions in the image may include a position (a central point coordinate), a size (such as a length and a width), a speed, a direction, a category, an appearance feature (such as an image color histogram, a direction gradient histogram), and the like.
For target tracking, one target tracking track corresponds to one target, and the target tracking track records information of the target, such as an ID, target characteristics, existence time, a three-dimensional space position in each frame of point cloud where the target exists, a two-dimensional space position in each frame of image where the target exists, and the like. And associating the second target matched with the target tracking track to perfect the existing target tracking track.
As an example, the degree of matching (or similarity) between the target features of the at least one target tracking track and the target features of the at least one second target may be used as a cost matrix, and the Hungarian algorithm may be adopted to perform globally optimal matching between the at least one target tracking track and the at least one second target. The Hungarian algorithm is a combinatorial optimization algorithm that solves the assignment problem in polynomial time. When the terminal calculates the similarity between the target features of a target tracking track and the target features of a second target, one or more of the target features may be considered, such as position, size, speed, direction and category (each in the point cloud and/or the image), number of point cloud points, numerical distribution of the point cloud coordinates in each direction, point cloud reflection intensity distribution, appearance features and depth features; when multiple target features are considered, different weights can be given to different target features, with all weights summing to 1.
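For illustration, the assignment step could be implemented with SciPy's Hungarian solver as below; the similarity function, the gating threshold and the data layout are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks_to_targets(tracks, targets, similarity, min_similarity=0.3):
    if not tracks or not targets:
        return [], list(range(len(tracks))), list(range(len(targets)))
    sim = np.array([[similarity(t, d) for d in targets] for t in tracks])
    rows, cols = linear_sum_assignment(-sim)   # maximize the total weighted similarity
    matches, un_tracks, un_targets = [], set(range(len(tracks))), set(range(len(targets)))
    for r, c in zip(rows, cols):
        if sim[r, c] >= min_similarity:        # reject weak matches
            matches.append((r, c))
            un_tracks.discard(r)
            un_targets.discard(c)
    return matches, sorted(un_tracks), sorted(un_targets)
```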
A second target that matches an existing target tracking track is assigned the ID of the matched target tracking track, and the existing target tracking track is perfected; for a second target that does not match any target tracking track, the terminal can assign a new target tracking track ID to the target and create a new target tracking track.
For the target tracking track which is not matched with the second target, the terminal can associate the target tracking track with the predicted target of the target tracking track in the point cloud and/or the image, perfect the target tracking track and avoid that the same target corresponds to a plurality of target tracking tracks due to omission and the like.
It should be understood that although the second targets already cover the targets predicted by the target tracking tracks, if a target does not actually appear at the three-dimensional spatial position predicted in the point cloud or at the two-dimensional spatial position predicted in the image, its target features may not be successfully matched with the target features of the corresponding target tracking track.
In addition, in order to avoid wasting processing resources for detecting and tracking the target which is moved out of the detection range, for the target tracking track which is not matched with the second target, before the target tracking track is associated with the predicted target of the target tracking track in the point cloud and/or the image, if the frequency of associating the predicted target with the target tracking track is greater than or equal to the first threshold value, the terminal deletes the target tracking track.
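A small sketch of this rule for an unmatched target tracking track, with illustrative field names and an assumed value for the patent's "first threshold":

```python
def handle_unmatched_track(track, tracks, first_threshold=5):
    # if the track has already been associated with its own prediction too many times,
    # the target has probably left the detection range, so the track is deleted
    if track.prediction_assoc_count >= first_threshold:
        tracks.remove(track)
    else:
        track.associate(track.predicted_box_3d, track.predicted_box_2d)
        track.prediction_assoc_count += 1
```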
The above-mentioned scheme provided by the present application is introduced mainly from the perspective of method flow, and the following describes in detail the technical scheme of the embodiment of the present application from the perspective of hardware or logic partitioning module. It is understood that, in order to implement the above functions, the apparatus may include a corresponding hardware structure and/or software module for performing each function. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In case of an integrated unit, fig. 8 shows a possible exemplary block diagram of the object detection device referred to in the embodiments of the present application, and the object detection device 800 may exist in the form of a software module or a hardware module. The object detection device 800 may include: an acquisition unit 803 and a processing unit 802. In one example, the apparatus may be a chip.
Optionally, the apparatus 800 may further comprise a storage unit 801 for storing program codes and/or data of the apparatus 800.
Specifically, in one embodiment, an acquisition unit 803 for acquiring a point cloud from a three-dimensional scanning apparatus and an image from a vision sensor;
a processing unit 802, configured to input the point cloud and the three-dimensional spatial position of the target predicted by the at least one target tracking track in the point cloud to a target detection model for processing, so as to obtain the three-dimensional spatial position of the at least one first target, where the target detection model is obtained by training a plurality of point cloud samples of the three-dimensional spatial position of the predicted target corresponding to known target tracking tracks and three-dimensional spatial position detection results of a plurality of targets corresponding to the plurality of point cloud samples one to one;
the processing unit 802 is further configured to determine a two-dimensional spatial position of at least one second target in the image according to a projection of the three-dimensional spatial position of the at least one first target in the image and a two-dimensional spatial position of a predicted target of the at least one target tracking trajectory in the image;
the processing unit 802 is further configured to determine a three-dimensional spatial position of the at least one second target in the point cloud according to a projection of the two-dimensional spatial position of the at least one second target in the point cloud.
In a possible design, the processing unit 802 is further configured to match the at least one target tracking track and the at least one second target according to a target feature corresponding to the at least one target tracking track and a target feature of the at least one second target; and associating the matched target tracking track with the second target.
In a possible design, the processing unit 802 is further configured to, for the second target that is not matched to the target tracking trajectory, establish a target tracking trajectory corresponding to the second target.
In one possible design, the processing unit 802 is further configured to, for the target tracking trajectory that is not matched to the second target, associate the target tracking trajectory with a predicted target of the target tracking trajectory in the point cloud and/or the image.
In one possible design, the processing unit 802 is further configured to delete the target tracking trajectory that is not matched to the second target when the number of times the target tracking trajectory is associated with a predicted target in the point cloud and/or the image is greater than or equal to a first threshold before associating the target tracking trajectory with the predicted target in the point cloud and/or the image.
In one possible design, the target features include one or more of: position, length, width, height, speed, direction, category, point cloud point number, coordinate value distribution of each direction of the point cloud, point cloud reflection intensity distribution, appearance characteristic and depth characteristic.
In a possible design, the acquiring unit 803 is further configured to acquire a point cloud of the calibration object from the three-dimensional scanning device and an image of the calibration object from the vision sensor;
the processing unit 802 is further configured to determine a projection matrix of a point cloud coordinate system and an image coordinate system according to a three-dimensional coordinate of a plurality of calibration points in the calibration object in the point cloud of the calibration object and a two-dimensional coordinate in the calibration object image.
As shown in fig. 9, an object detection apparatus 900 is further provided in the embodiment of the present application, and as shown in fig. 9, the object detection apparatus 900 includes at least one processor 902 and an interface circuit. Further, the apparatus may further comprise at least one memory 901, wherein the at least one memory 901 is coupled to the processor 902. The interface circuit is used for providing data and/or information input and output for the at least one processor. The memory 901 is configured to store computer executable instructions, and when the object detection apparatus 900 runs, the processor 902 executes the computer executable instructions stored in the memory 901, so that the object detection apparatus 900 implements the object detection method, where the implementation of the object detection method may refer to the description above and the related description of the drawings, and is not described herein again.
As another form of the present embodiment, there is provided a computer-readable storage medium on which a program or instructions are stored, which, when executed, can perform the object detection method in the above-described method embodiments.
As another form of the present embodiment, there is provided a computer program product containing instructions that, when executed, may perform the object detection method in the above-described method embodiments.
As another form of this embodiment, a chip may be provided, where the chip may be coupled with a memory, and is used to call a computer program product stored in the memory, so as to implement the target detection method in the foregoing method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.
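By way of illustration only, and not as a limitation of the embodiments or claims, the determination of the projection matrix between the point cloud coordinate system and the image coordinate system from calibration point correspondences can be sketched as follows. The use of the direct linear transform (DLT), the function names, and the NumPy dependency are assumptions introduced for this sketch rather than requirements of the application.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P mapping homogeneous point cloud
    coordinates to homogeneous image pixel coordinates via the direct
    linear transform (DLT) over calibration point correspondences.

    points_3d: (N, 3) calibration point coordinates in the point cloud.
    points_2d: (N, 2) corresponding pixel coordinates in the image.
    At least six non-degenerate correspondences are needed; P is
    determined only up to scale.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows)
    # The solution is the right singular vector associated with the
    # smallest singular value of A, reshaped into a 3x4 matrix.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)

def project_points(P, points_3d):
    """Project (N, 3) point cloud coordinates to (N, 2) pixel coordinates."""
    points_3d = np.asarray(points_3d, dtype=float)
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    projected = homogeneous @ P.T
    return projected[:, :2] / projected[:, 2:3]
```

In such a sketch, the correspondences would be taken from the calibration object point cloud and the calibration object image mentioned above, and the resulting matrix could then be reused for projecting detected targets between the point cloud and the image.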

Claims (19)

1. A method of target detection, comprising:
acquiring a point cloud from a three-dimensional scanning device and an image from a visual sensor;
inputting the point cloud and a three-dimensional spatial position, in the point cloud, of a target predicted by at least one target tracking trajectory into a target detection model for processing, to obtain a three-dimensional spatial position of at least one first target;
determining a two-dimensional spatial position of at least one second target in the image according to a projection of the three-dimensional spatial position of the at least one first target in the image and a two-dimensional spatial position of a predicted target of the at least one target tracking trajectory in the image;
and determining a three-dimensional spatial position of the at least one second target in the point cloud according to a projection of the two-dimensional spatial position of the at least one second target in the point cloud.
2. The method of claim 1, wherein the method further comprises:
matching the at least one target tracking trajectory and the at least one second target according to target features corresponding to the at least one target tracking trajectory and target features of the at least one second target;
and associating the matched target tracking trajectory with the second target.
3. The method of claim 2, wherein the method further comprises:
and for the second target that is not matched to the target tracking trajectory, establishing a target tracking trajectory corresponding to the second target.
4. The method of claim 2 or 3, further comprising:
for the target tracking trajectory that is not matched to the second target, associating the target tracking trajectory with a predicted target of the target tracking trajectory in the point cloud and/or the image.
5. The method of claim 4, wherein, for the target tracking trajectory that is not matched to the second target, before the associating of the target tracking trajectory with the predicted target of the target tracking trajectory in the point cloud and/or the image, the method further comprises:
deleting the target tracking trajectory when the number of times the target tracking trajectory has been associated with a predicted target is greater than or equal to a first threshold.
6. The method of any one of claims 2-5, wherein the target features include one or more of:
position, size, speed, direction, category, number of point cloud points, point cloud coordinate value distribution in each direction, point cloud reflection intensity distribution, appearance feature, and depth feature.
7. The method of any one of claims 1-6, further comprising:
acquiring a calibration object point cloud from a three-dimensional scanning device and a calibration object image from a visual sensor;
and determining a projection matrix between a point cloud coordinate system and an image coordinate system according to three-dimensional coordinates of a plurality of calibration points of the calibration object in the calibration object point cloud and two-dimensional coordinates of the calibration points in the calibration object image.
8. A target detection apparatus, comprising:
an acquisition unit, configured to acquire a point cloud from a three-dimensional scanning device and an image from a visual sensor;
a processing unit, configured to input the point cloud and a three-dimensional spatial position, in the point cloud, of a target predicted by at least one target tracking trajectory into a target detection model for processing, to obtain a three-dimensional spatial position of at least one first target;
the processing unit is further configured to determine a two-dimensional spatial position of at least one second target in the image according to a projection of the three-dimensional spatial position of the at least one first target in the image and a two-dimensional spatial position of a predicted target of the at least one target tracking trajectory in the image;
the processing unit is further configured to determine a three-dimensional spatial position of the at least one second target in the point cloud according to a projection of the two-dimensional spatial position of the at least one second target in the point cloud.
9. The apparatus according to claim 8, wherein the processing unit is further configured to match the at least one target tracking trajectory with the at least one second target according to target features corresponding to the at least one target tracking trajectory and target features of the at least one second target, and to associate the matched target tracking trajectory with the second target.
10. The apparatus of claim 9, wherein the processing unit is further configured to, for the second target that is not matched to the target tracking trajectory, establish a target tracking trajectory corresponding to the second target.
11. The apparatus of claim 9 or 10, wherein the processing unit is further configured to, for the target tracking trajectory that is not matched to the second target, associate the target tracking trajectory with a predicted target of the target tracking trajectory in the point cloud and/or the image.
12. The apparatus of claim 11, wherein, for the target tracking trajectory that is not matched to the second target, the processing unit is further configured to delete the target tracking trajectory when the number of times the target tracking trajectory has been associated with a predicted target is greater than or equal to a first threshold, before associating the target tracking trajectory with the predicted target of the target tracking trajectory in the point cloud and/or the image.
13. The apparatus of any one of claims 9-12, wherein the target features include one or more of: position, size, speed, direction, category, number of point cloud points, point cloud coordinate value distribution in each direction, point cloud reflection intensity distribution, appearance feature, and depth feature.
14. The apparatus according to any one of claims 8-13, wherein the acquisition unit is further configured to acquire a calibration object point cloud from a three-dimensional scanning device and a calibration object image from a visual sensor;
the processing unit is further configured to determine a projection matrix between a point cloud coordinate system and an image coordinate system according to three-dimensional coordinates of a plurality of calibration points in the calibration object point cloud and two-dimensional coordinates of the calibration points in the calibration object image.
15. A target detection apparatus, comprising at least one processor and an interface;
the at least one processor is configured to invoke and run a computer program from the interface, which when executed by the at least one processor implements the method of any one of claims 1-7.
16. A system-on-chip, the system-on-chip comprising: at least one processor and an interface;
the at least one processor is configured to invoke and run a computer program from the interface, which when executed by the at least one processor implements the method of any one of claims 1-7.
17. A computer-readable storage medium, in which a computer program is stored which, when executed by a computer, causes the computer to carry out the method according to any one of claims 1 to 7.
18. A terminal, wherein the terminal comprises the target detection apparatus according to any one of claims 8-14.
19. The terminal of claim 18, wherein the terminal is a vehicle, a drone, or a robot.
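The following sketch is illustrative only and does not form part of the claims. It shows one possible way the projection of first-target positions into the image could be combined with the two-dimensional positions predicted by the target tracking trajectories, as recited in claims 1 and 2; the function names, the pixel-distance association metric and threshold, the greedy assignment, and the NumPy dependency are assumptions introduced here, since the claims do not prescribe a particular association rule.

```python
import numpy as np

def fuse_detections_with_tracks(P, first_targets_3d, track_preds_2d,
                                max_pixel_dist=50.0):
    """Project 3D first-target centers into the image with projection
    matrix P (3x4) and greedily associate them with 2D positions predicted
    by the target tracking trajectories.

    first_targets_3d: (M, 3) centers of first targets in the point cloud.
    track_preds_2d:   (K, 2) predicted 2D positions of the tracks in the image.
    Returns a list of (detection_index, track_index or None) pairs.
    """
    first_targets_3d = np.asarray(first_targets_3d, dtype=float)
    track_preds_2d = np.asarray(track_preds_2d, dtype=float)

    homogeneous = np.hstack([first_targets_3d,
                             np.ones((len(first_targets_3d), 1))])
    projected = homogeneous @ P.T
    det_2d = projected[:, :2] / projected[:, 2:3]   # projected 2D detections

    assignments = []
    used_tracks = set()
    for i, d in enumerate(det_2d):
        if track_preds_2d.size == 0:
            assignments.append((i, None))
            continue
        dists = np.linalg.norm(track_preds_2d - d, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= max_pixel_dist and j not in used_tracks:
            used_tracks.add(j)                      # matched detection/track
            assignments.append((i, j))
        else:
            assignments.append((i, None))           # unmatched detection
    return assignments
```

Any other association strategy over the target features listed in claim 6 (for example size, category, appearance features, or depth features) could replace the pixel-distance rule used in this sketch.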
CN202110256851.2A 2021-03-09 2021-03-09 Target detection method and device Pending CN115049700A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110256851.2A CN115049700A (en) 2021-03-09 2021-03-09 Target detection method and device
PCT/CN2022/078611 WO2022188663A1 (en) 2021-03-09 2022-03-01 Target detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110256851.2A CN115049700A (en) 2021-03-09 2021-03-09 Target detection method and device

Publications (1)

Publication Number Publication Date
CN115049700A (en) 2022-09-13

Family

ID=83156444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110256851.2A Pending CN115049700A (en) 2021-03-09 2021-03-09 Target detection method and device

Country Status (2)

Country Link
CN (1) CN115049700A (en)
WO (1) WO2022188663A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071231B (en) * 2022-12-16 2023-12-29 群滨智造科技(苏州)有限公司 Method, device, equipment and medium for generating ink-dispensing process track of glasses frame
CN115830079B (en) * 2023-02-15 2023-05-26 天翼交通科技有限公司 Traffic participant trajectory tracking method, device and medium
CN115965824B (en) * 2023-03-01 2023-06-06 安徽蔚来智驾科技有限公司 Point cloud data labeling method, point cloud target detection method, equipment and storage medium
CN116430338A (en) * 2023-03-20 2023-07-14 北京中科创益科技有限公司 Method, system and equipment for tracking moving target
CN116952988B (en) * 2023-09-21 2023-12-08 斯德拉马机械(太仓)有限公司 2D line scanning detection method and system for ECU (electronic control Unit) product
CN117252992B (en) * 2023-11-13 2024-02-23 整数智能信息技术(杭州)有限责任公司 4D road scene labeling method and device based on time sequence data and electronic equipment
CN117523379A (en) * 2023-11-20 2024-02-06 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117576166A (en) * 2024-01-15 2024-02-20 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871129B (en) * 2016-09-27 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for handling point cloud data
CN110163904B (en) * 2018-09-11 2022-04-22 腾讯大地通途(北京)科技有限公司 Object labeling method, movement control method, device, equipment and storage medium
US10846818B2 (en) * 2018-11-15 2020-11-24 Toyota Research Institute, Inc. Systems and methods for registering 3D data with 2D image data
CN110675431B (en) * 2019-10-08 2020-09-11 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN111709923B (en) * 2020-06-10 2023-08-04 中国第一汽车股份有限公司 Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN112102409B (en) * 2020-09-21 2023-09-01 杭州海康威视数字技术股份有限公司 Target detection method, device, equipment and storage medium
CN112270272B (en) * 2020-10-31 2022-07-29 武汉中海庭数据技术有限公司 Method and system for extracting road intersections in high-precision map making

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664790A (en) * 2023-07-26 2023-08-29 昆明人为峰科技有限公司 Three-dimensional terrain analysis system and method based on unmanned aerial vehicle mapping
CN116664790B (en) * 2023-07-26 2023-11-17 昆明人为峰科技有限公司 Three-dimensional terrain analysis system and method based on unmanned aerial vehicle mapping

Also Published As

Publication number Publication date
WO2022188663A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
CN115049700A (en) Target detection method and device
CN111626217B (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
CN112396650B (en) Target ranging system and method based on fusion of image and laser radar
US9846812B2 (en) Image recognition system for a vehicle and corresponding method
US11556745B2 (en) System and method for ordered representation and feature extraction for point clouds obtained by detection and ranging sensor
Zhou et al. Self‐supervised learning to visually detect terrain surfaces for autonomous robots operating in forested terrain
AU2020202249A1 (en) Feature extraction from mobile lidar and imagery data
CN110197173B (en) Road edge detection method based on binocular vision
CN111213153A (en) Target object motion state detection method, device and storage medium
CN111738033B (en) Vehicle driving information determination method and device based on plane segmentation and vehicle-mounted terminal
CN111213154A (en) Lane line detection method, lane line detection equipment, mobile platform and storage medium
Silva et al. Monocular trail detection and tracking aided by visual SLAM for small unmanned aerial vehicles
EP3555854B1 (en) A method of tracking objects in a scene
CN115861968A (en) Dynamic obstacle removing method based on real-time point cloud data
Wen et al. Research on 3D point cloud de-distortion algorithm and its application on Euclidean clustering
Giosan et al. Superpixel-based obstacle segmentation from dense stereo urban traffic scenarios using intensity, depth and optical flow information
Gökçe et al. Recognition of dynamic objects from UGVs using Interconnected Neuralnetwork-based Computer Vision system
Zhao et al. Omni-Directional Obstacle Detection for Vehicles Based on Depth Camera
Oniga et al. A fast ransac based approach for computing the orientation of obstacles in traffic scenes
Huang et al. A coarse-to-fine LiDar-based SLAM with dynamic object removal in dense urban areas
Wei et al. Robust obstacle segmentation based on topological persistence in outdoor traffic scenes
Alonso et al. Footprint-based classification of road moving objects using occupancy grids
Dekkiche et al. Vehicles detection in stereo vision based on disparity map segmentation and objects classification
CN111815667B (en) Method for detecting moving target with high precision under camera moving condition

Legal Events

Date Code Title Description
PB01 Publication