WO2022188663A1 - Target detection method and apparatus - Google Patents

Target detection method and apparatus

Info

Publication number
WO2022188663A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
point cloud
image
tracking trajectory
target tracking
Prior art date
Application number
PCT/CN2022/078611
Other languages
French (fr)
Chinese (zh)
Inventor
吴家俊 (WU Jiajun)
梁振宝 (LIANG Zhenbao)
周伟 (ZHOU Wei)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022188663A1 publication Critical patent/WO2022188663A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/003 Reconstruction from projections, e.g. tomography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • the embodiments of the present application relate to the field of intelligent driving, and in particular, to a target detection method and device.
  • Most current object detection methods are based on a single type of sensor, for example relying only on lidar to obtain point clouds or only on cameras to obtain images.
  • a point cloud can provide three-dimensional information about a target and copes better with mutual occlusion between targets, but point clouds are relatively sparse, so the recognition rate of target features is not high.
  • images carry richer information, but they are greatly affected by lighting, weather, and so on, so the reliability of detection and tracking is poor.
  • moreover, an image only contains two-dimensional plane information and cannot provide information about occluded targets, making it easy to lose a target or introduce errors.
  • Embodiments of the present application provide a target detection method and device, so as to improve the accuracy and real-time performance of target detection.
  • an embodiment of the present application provides a target detection method, the method including: acquiring a point cloud from a three-dimensional scanning device and an image from a vision sensor; inputting the point cloud, together with the three-dimensional space position of the target predicted in the point cloud by at least one target tracking trajectory, into a target detection model for processing to obtain the three-dimensional space position of at least one first target, where the target detection model is obtained by training on multiple point cloud samples carrying the three-dimensional space positions of predicted targets corresponding to known target tracking trajectories, together with the three-dimensional space position detection results of multiple targets corresponding one-to-one to the multiple point cloud samples; determining the two-dimensional space position of at least one second target in the image according to the projection of the three-dimensional space position of the at least one first target in the image and the two-dimensional space position of the target predicted in the image by the at least one target tracking trajectory; and determining the three-dimensional space position of the at least one second target in the point cloud according to the projection of the two-dimensional space position of the at least one second target in the point cloud.
  • in this method, a target tracking trajectory feedback mechanism is added: when performing target detection in the point cloud and the image, more attention is paid to the areas of the point cloud and the image where the target tracking trajectories predict targets will appear, which can effectively reduce missed detections and improve the accuracy of target detection.
  • the method further includes: matching the at least one target tracking trajectory with the at least one second target according to the target features corresponding to the at least one target tracking trajectory and the target features of the at least one second target; and associating the matched target tracking trajectory with the second target.
  • the target features include one or more of the following: position, size, speed, direction, category, number of point cloud points, numerical distribution of the point cloud coordinates in each direction, distribution of point cloud reflection intensity, appearance features, depth features, etc.
  • the detected target can be associated with the existing target tracking trajectory based on the target feature, which is conducive to obtaining a complete target tracking trajectory and predicting the position where the target will appear at the next moment.
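As a concrete illustration of associating detected targets with existing target tracking trajectories by target features, the sketch below matches 3D centre positions with a greedy nearest-pair rule. The function name, the use of bare positions as features, and the `max_dist` gate are all illustrative assumptions; the text's bipartite matching over richer feature vectors would replace this simplification.

```python
import numpy as np

def associate(tracks, detections, max_dist=2.0):
    """Greedily match track feature vectors to detection feature vectors.

    Simplified stand-in for the bipartite matching described in the text;
    features here are just 3D centre positions (lists of [x, y, z]).
    Returns (matches, unmatched_track_idx, unmatched_detection_idx)."""
    matches = []
    unmatched_t = set(range(len(tracks)))
    unmatched_d = set(range(len(detections)))
    if tracks and detections:
        # pairwise Euclidean distance between every track and detection
        cost = np.linalg.norm(
            np.asarray(tracks)[:, None, :] - np.asarray(detections)[None, :, :],
            axis=2)
        while True:
            # repeatedly take the globally cheapest remaining pair
            t, d = np.unravel_index(np.argmin(cost), cost.shape)
            if cost[t, d] > max_dist:
                break
            matches.append((t, d))
            unmatched_t.discard(t)
            unmatched_d.discard(d)
            cost[t, :] = np.inf   # each track / detection matched at most once
            cost[:, d] = np.inf
            if not np.isfinite(cost).any():
                break
    return matches, sorted(unmatched_t), sorted(unmatched_d)
```

Unmatched detections then spawn new trajectories and unmatched trajectories fall back on their predicted targets, as the following designs describe.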
  • the method further includes: for the second target that is not matched to the target tracking trajectory, establishing a target tracking trajectory corresponding to the second target.
  • a new ID can be given to the target, and a target tracking trajectory corresponding to the target can be established, which is conducive to tracking all the targets that appear.
  • the method further includes: for a target tracking trajectory that is not matched to any second target, associating the target tracking trajectory with the target predicted by that trajectory in the point cloud and/or the image.
  • in this way, a target tracking trajectory whose target is missed by detection can still be associated with the target it predicts in the point cloud and/or image, which helps avoid the problem of the same target corresponding to multiple target tracking trajectories caused by missed detections, and improves the reliability of target tracking.
  • the method further includes: when the number of times the target tracking trajectory is associated with the predicted target is greater than or equal to a first threshold, deleting the target tracking trajectory.
  • deleting the target tracking trajectories for which the corresponding target is not detected in the acquired point cloud and/or image for many times is beneficial to save processing resources.
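The trajectory lifecycle described above (reset on match, fall back on the predicted target when unmatched, delete after the "first threshold" of consecutive misses, create new trajectories for unmatched detections) can be sketched as follows. All names and fields here are hypothetical illustrations, not the patent's actual data structures.

```python
class Track:
    """Minimal track record; fields are illustrative assumptions."""
    def __init__(self, track_id, position):
        self.track_id = track_id
        self.position = position   # last known 3D position
        self.misses = 0            # consecutive frames with no matched detection

def update_tracks(tracks, matched_ids, next_id, new_positions, miss_threshold=3):
    """Update the track list after data association.

    - matched tracks reset their miss counter;
    - unmatched tracks rely on their predicted target and their miss counter
      grows; at miss_threshold consecutive misses the track is deleted
      (the 'first threshold' in the text);
    - each unmatched detection spawns a new track with a fresh ID."""
    kept = []
    for t in tracks:
        if t.track_id in matched_ids:
            t.misses = 0
            kept.append(t)
        else:
            t.misses += 1
            if t.misses < miss_threshold:
                kept.append(t)   # keep alive via the predicted position
    for pos in new_positions:
        kept.append(Track(next_id, pos))
        next_id += 1
    return kept, next_id
```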
  • the method further includes: acquiring a calibration object point cloud from the three-dimensional scanning device and a calibration object image from the vision sensor; and determining the projection matrix between the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of calibration points in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.
  • in this way, the three-dimensional scanning device and the vision sensor can be jointly calibrated with a calibration object, and the projection matrix between the point cloud coordinate system and the image coordinate system (also called the pixel coordinate system) can be determined, which facilitates fusing the target detection results of the point cloud and the image to improve the accuracy of target detection.
  • an embodiment of the present application provides a target detection device, the device having the function of implementing the method in the first aspect or any possible design of the first aspect; the function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units (modules) corresponding to the above functions, such as an acquisition unit and a processing unit.
  • an embodiment of the present application provides a target detection apparatus, including at least one processor and an interface, where the processor is configured to call and run a computer program from the interface, and when the processor executes the computer program, the method described in the first aspect or any possible design of the first aspect can be implemented.
  • an embodiment of the present application provides a terminal, where the terminal includes the device described in the second aspect above.
  • the terminal may be a vehicle-mounted device, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot, a roadside unit, or the like.
  • the terminal may also be a smart device that needs to perform target detection or tracking, such as smart home and smart manufacturing.
  • an embodiment of the present application provides a chip system, the chip system including a processor and an interface, where the processor is configured to call and run a computer program from the interface, and when the processor executes the computer program, the method described in the first aspect or any possible design of the first aspect can be implemented.
  • an embodiment of the present application provides a computer-readable storage medium storing a computer program for executing the method described in the first aspect or any possible design of the first aspect.
  • an embodiment of the present application further provides a computer program product, including a computer program or instructions; when the computer program or instructions are executed, the method described in the first aspect or any possible design of the first aspect can be implemented.
  • FIG. 1 is a schematic diagram of a target detection system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a target detection process provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an intelligent driving scenario provided by an embodiment of the present application.
  • FIG. 4 is one of the schematic diagrams of a target detection solution based on multi-sensor fusion provided by an embodiment of the present application
  • FIG. 5 is the second schematic diagram of the target detection solution based on multi-sensor fusion provided by the embodiment of the present application.
  • FIG. 6 is a third schematic diagram of a target detection solution based on multi-sensor fusion provided by an embodiment of the present application.
  • FIG. 7 is a schematic process diagram of a target detection method provided by an embodiment of the present application.
  • FIG. 8 is one of the schematic diagrams of the target detection apparatus provided by the embodiment of the present application.
  • FIG. 9 is a second schematic diagram of a target detection apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a target detection system provided by an embodiment of this application, including a data preprocessing module, a joint calibration module, a point cloud detection module, an image region of interest acquisition module, a point cloud domain prediction module, an image domain prediction module, a prediction decision module, a data association module, and a trajectory management module.
  • the data preprocessing module is mainly used to filter point clouds, remove ground points, and perform distortion correction on images.
  • Joint calibration module: mainly used to jointly calibrate the point cloud and image obtained from the 3D scanning device and the vision sensor, and obtain the projection matrix between the point cloud coordinate system and the image coordinate system.
  • Point cloud detection module: mainly used to input the point cloud obtained at the current moment, together with the fed-back results of target tracking trajectory management (such as the three-dimensional space position of the target predicted in the current point cloud by at least one target tracking trajectory), into a trained target detection model (such as a deep neural network model) to obtain the target detection results.
  • Image region of interest acquisition module: mainly used to project the target detection results obtained from the point cloud into the image using the projection matrix, and to combine them with the fed-back results of target tracking trajectory management (such as the two-dimensional space position of the target predicted in the image obtained at the current moment by at least one target tracking trajectory) to obtain the regions of interest.
  • Prediction decision module: mainly used to back-project the target detection results of the image into the point cloud and compare them with the target detection results of the point cloud, so as to decide on a more accurate target detection result.
  • Data association module: mainly used to associate and match the target detection results after the prediction decision with the target tracking trajectories.
  • Trajectory management module: mainly used to manage and update all target tracking trajectories according to the data association results.
  • Point cloud domain prediction module: mainly used to predict, based on the updated target tracking trajectories, the three-dimensional space position at which each trajectory's target will appear in the point cloud obtained at the next moment.
  • Image domain prediction module: mainly used to predict, based on the updated target tracking trajectories, the two-dimensional space position at which each trajectory's target will appear in the image obtained at the next moment.
  • the structure of the target detection system illustrated in the embodiments of the present application does not constitute a specific limitation on the target detection system.
  • the target detection system may include more or fewer modules than shown, some modules may be combined or split, or the modules may be arranged differently.
  • the target detection solution provided in the embodiments of the present application can be applied to a terminal that uses the target detection system shown in FIG. 1; the terminal can be a vehicle-mounted device, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot, a roadside unit (road side unit, RSU), or other equipment, and is suitable for scenarios such as monitoring, intelligent driving, drone navigation, and robot travel.
  • a terminal to which the target detection system shown in FIG. 1 is applied in an intelligent driving scenario is used as an example for description.
  • a terminal (such as vehicle A) can obtain point clouds and images of the surrounding environment through the three-dimensional scanning device(s) and vision sensor(s) installed on the terminal, and can detect and track objects in the surrounding environment, such as vehicles (e.g., vehicle B and vehicle C), pedestrians, bicycles (not shown in the figure), and trees (not shown in the figure).
  • the target detection solutions based on multi-sensor fusion mainly include the following:
  • the first scheme uses a deep convolutional neural network to detect the three-dimensional spatial position of the target and extract the point cloud features after obtaining the point cloud from the lidar.
  • after acquiring an image from a monocular camera, this scheme projects the 3D bounding box of the target detected from the point cloud into the image, and uses a deep convolutional neural network to extract image features of the projected area.
  • the bipartite graph matching relationship between the target and the target tracking trajectory is combined with the Kalman filter to estimate the state of the target tracking trajectory, so as to achieve the tracking of the target in the point cloud.
  • this scheme uses deep networks for feature extraction in both images and point clouds at the same time, which consumes more resources, has low computational efficiency, and is difficult to deploy; moreover, once a target is missed in the point cloud obtained from the lidar, it cannot be recovered through the image, so the accuracy is low.
  • this scheme first uses the deep learning algorithm to obtain the target detection information in the collected images and point clouds.
  • this scheme applies a deep learning image target detection algorithm to the image to obtain the category, center-point pixel coordinates, and length and width of each two-dimensional (2D) detection frame in the image, and applies a deep learning point cloud target detection algorithm to the point cloud to obtain the category, center-point spatial coordinates, and length, width, and height of each three-dimensional (3D) detection frame in the point cloud.
  • the Hungarian algorithm is used to optimally match the detection frame of the image obtained at the adjacent moment and the target in the point cloud to achieve target tracking, and establish the target tracking trajectory of the image and the point cloud respectively.
  • this scheme also uses deep learning algorithms for feature extraction in images and point clouds at the same time, which consumes more resources and has poor real-time performance; in addition, there is no real tracking algorithm, and distance-based matching between detection frames is error-prone.
  • the third scheme, as shown in Figure 6, collects the point cloud of the target, filters the collected point cloud, and outputs the object point data after filtering out the ground points; it then maps the object point data to generate a distance image and a reflection intensity image, and performs point cloud segmentation and clustering on the object point data according to the distance image, reflection intensity image, and echo intensity information to obtain multiple point cloud regions.
  • target point cloud regions of suspected targets are screened out from the point cloud regions; feature extraction is performed on each target point cloud region, and the extracted feature vectors are used to classify and identify the targets, yielding the first target detection result.
  • the purpose of this application is to provide a target detection solution.
  • in this solution, the target detection result in the point cloud is corrected by the target detection result in the image, and a target tracking trajectory feedback mechanism is used to reduce the missed detection rate and improve the accuracy and real-time performance of target detection.
  • Point cloud: the set of point data on the surface of an object scanned by a 3D scanning device can be called a point cloud.
  • a point cloud is a collection of vectors in a three-dimensional coordinate system. These vectors are usually expressed as (x, y, z) three-dimensional coordinates and are generally used to represent the outer surface shape of an object. In addition to the geometric position information represented by (x, y, z), a point cloud point can also carry the RGB color, gray value, depth, reflection intensity of the object's surface, and so on.
  • the point cloud coordinate system involved in the embodiments of the present application is the three-dimensional (x, y, z) coordinate system where the point cloud points in the point cloud are located.
  • Image coordinate system: also known as the pixel coordinate system, it is usually a two-dimensional coordinate system established with the upper left corner of the image as the origin, with pixel as the unit.
  • the two coordinate axes of the image coordinate system are u and v.
  • the coordinates of a point in the image coordinate system can be written as (u, v).
  • Corner points: points with particularly prominent attributes in some respect; they are representative and robust points in point clouds and images, such as the intersection of two edges.
  • Region of interest: in image processing, the area to be processed, outlined from the image in the form of a box, circle, ellipse, irregular polygon, etc., is called the region of interest.
  • the region of interest may be considered as a region in an image where a target exists.
  • FIG. 7 is a schematic diagram of a target detection method provided by an embodiment of the present application, and the method includes:
  • S701 The terminal acquires the point cloud from the three-dimensional scanning device and the image from the vision sensor.
  • the three-dimensional scanning device can be a lidar, a millimeter-wave radar, a depth camera, etc.
  • the visual sensor can be a monocular camera, a multi-eye camera, and the like.
  • At least one three-dimensional scanning device and at least one vision sensor may be installed on the terminal. The terminal can scan objects around it (or in a certain direction, such as the direction of travel) with the three-dimensional scanning device and collect the point cloud of those objects; it can likewise observe the objects around it (or in a certain direction) with the vision sensor and collect images of those objects.
  • the point cloud may be a collection of point cloud points, and the information of each point cloud point in the collection includes the three-dimensional coordinates (x, y, z) of the point cloud point.
  • the information of each point cloud point can also include information such as laser reflection intensity or millimeter wave reflection intensity.
  • when the terminal acquires the point cloud from the 3D scanning device and the image from the vision sensor, it can also obtain their acquisition times from the 3D scanning device and the vision sensor, and time-align the point cloud and the image according to these acquisition times, so as to ensure that the point cloud and image used together for target detection have the same acquisition time.
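The time alignment step above can be sketched as a nearest-timestamp pairing. The function name and the `tol` tolerance are illustrative assumptions; real systems typically use hardware triggering or interpolation as well.

```python
def time_align(cloud_stamps, image_stamps, tol=0.05):
    """Pair each point cloud timestamp with the nearest image timestamp.

    Only pairs whose acquisition times differ by at most `tol` seconds
    (an assumed tolerance) are treated as the same frame.
    Returns a list of (cloud_index, image_index) pairs."""
    if not image_stamps:
        return []
    pairs = []
    for i, tc in enumerate(cloud_stamps):
        # index of the image whose timestamp is closest to this cloud's
        j = min(range(len(image_stamps)), key=lambda k: abs(image_stamps[k] - tc))
        if abs(image_stamps[j] - tc) <= tol:
            pairs.append((i, j))
    return pairs
```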
  • the terminal may further perform data preprocessing operations on the point cloud and/or the image. For example, the terminal can filter the point cloud, removing the ground point cloud points, to reduce the data volume of the point cloud and improve target detection efficiency; it can also correct the barrel or pincushion distortion in the collected image based on the internal and external parameters of the vision sensor (usually provided by the vision sensor manufacturer).
  • specifically, according to a pre-given condition that point cloud points belonging to the ground should meet (for example, the z-coordinate of the point cloud point being less than a certain threshold), the terminal can remove the point cloud points in the point cloud that meet this condition, thereby filtering out the ground points, reducing the data volume of the point cloud, and improving the efficiency of target detection.
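The z-threshold ground filter described above is a one-line array operation; the threshold value below is a hypothetical example that would depend on the sensor's mounting height.

```python
import numpy as np

def remove_ground(points, z_threshold=-1.4):
    """Filter out ground points by the simple height rule described above:
    any point whose z coordinate is below z_threshold (an assumed value
    derived from the sensor mounting height) is treated as ground.

    points: (N, 3) array-like of (x, y, z) coordinates."""
    points = np.asarray(points, dtype=float)
    return points[points[:, 2] >= z_threshold]
```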
  • S702 The terminal inputs the point cloud, together with the three-dimensional space position of the target predicted in the point cloud by the at least one target tracking trajectory, into the target detection model for processing, and obtains the three-dimensional space position of at least one first target.
  • the three-dimensional space position of the target includes information such as center point coordinates, length, width and height, which can also be called a three-dimensional detection box or a three-dimensional bounding box (3D BBox).
  • the target detection model is obtained by training on multiple point cloud samples carrying the three-dimensional space positions of predicted targets corresponding to known target tracking trajectories, together with the three-dimensional space position detection results of multiple targets corresponding one-to-one to the multiple point cloud samples.
  • a target tracking trajectory corresponds to one target, and the target tracking trajectory records information about that target, such as its identity document (ID), target features, existence time, the three-dimensional space position in each frame of point cloud in which the target exists, and the two-dimensional space position in each frame of image in which the target exists.
  • by tracking the target in the point cloud (for example, with a Kalman filter), the three-dimensional space position at which the target will appear in the next frame of point cloud (that is, the point cloud collected at the next moment) can be predicted from the three-dimensional space positions recorded in the target's tracking trajectory; in other words, the three-dimensional space position of the target predicted by the target tracking trajectory in the next frame of point cloud is obtained. Likewise, by tracking the target in the image, the two-dimensional space position at which the target will appear in the next frame of image (that is, the image collected at the next moment) can be predicted from the two-dimensional space positions recorded in the target's tracking trajectory (for example, with an optical flow algorithm); that is, the two-dimensional space position of the target predicted by the target tracking trajectory in the next frame of image is obtained.
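The prediction step can be illustrated with a constant-velocity extrapolation, which is the prediction half of the Kalman filtering mentioned above, stripped of its covariance bookkeeping. The function name and `dt` are illustrative assumptions.

```python
import numpy as np

def predict_next_position(trajectory_positions, dt=0.1):
    """Predict where a tracked target will appear in the next frame from the
    positions recorded in its tracking trajectory, using a constant-velocity
    model (a simplification of the Kalman prediction step).

    trajectory_positions: list of past 3D centre positions, oldest first.
    dt: assumed frame interval in seconds."""
    p = np.asarray(trajectory_positions, dtype=float)
    if len(p) < 2:
        return p[-1]                     # no motion history yet
    velocity = (p[-1] - p[-2]) / dt      # finite-difference velocity estimate
    return p[-1] + velocity * dt         # extrapolate one frame ahead
```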
  • in the areas of the current point cloud where existing target tracking trajectories predict targets, the probability of a target appearing is significantly higher than in other areas of the point cloud.
  • the terminal can obtain the three-dimensional space position of the first target by processing the point cloud, together with the three-dimensional space positions of the targets predicted in the point cloud by at least one target tracking trajectory, with the target detection model.
  • the target detection model can be obtained by a training device through training on multiple point cloud samples, in which the known target tracking trajectories maintained in the sample set predict the three-dimensional space positions of targets, together with the three-dimensional space position detection results of the targets corresponding one-to-one to the multiple point cloud samples.
  • the training device can add a three-dimensional space position label vector (such as a label vector containing center point coordinates, length, width, and height information) to each point cloud sample according to the three-dimensional space positions of the targets corresponding to that sample; multiple label vectors, corresponding one-to-one to multiple targets, can be added to a single point cloud sample.
  • the spatial location label vector can also exist in the form of a matrix.
  • for each point cloud sample, the training device can input the sample and the three-dimensional space positions of the predicted target(s) corresponding to its target tracking trajectory(ies) into the target detection model for processing, and obtain the predicted three-dimensional space position value(s) of the target(s) output by the model; according to the output predicted values and the three-dimensional space position label vectors of the real targets corresponding to the sample, the training device can calculate the loss of the target detection model through a loss function, and then adjust the parameters in the target detection model according to the loss.
  • the training process of the target detection model is thus the process of reducing the loss as much as possible.
  • the target detection model is continuously trained through the point cloud samples in the sample set. When the loss is reduced to a preset range, the trained target detection model can be obtained.
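The train-until-the-loss-is-small loop above can be illustrated on a toy stand-in model. The real model in the text is a deep neural network over point clouds; here a linear map from an assumed 8-dimensional sample feature vector to a 3D position label is used purely to show the loss computation, the parameter update, and the preset stopping range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 64 "point cloud samples" with 8 features each, and
# ground-truth 3D position labels generated from a hidden linear map.
X = rng.normal(size=(64, 8))
W_true = rng.normal(size=(8, 3))
Y = X @ W_true

W = np.zeros((8, 3))                    # model parameters to be trained
lr, loss_target = 0.1, 1e-4
for step in range(2000):
    pred = X @ W
    loss = np.mean((pred - Y) ** 2)     # MSE loss between prediction and label
    if loss < loss_target:              # "loss reduced to a preset range"
        break
    grad = 2 * X.T @ (pred - Y) / len(X)
    W -= lr * grad                      # adjust parameters according to the loss
```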
  • the target detection model may be a deep neural network or the like.
  • the point cloud samples in the training set can be obtained by pre-sampling, for example by collecting point cloud samples through the terminal in advance; the three-dimensional space positions of the targets predicted in the collected samples by the target tracking trajectory(ies) are recorded, and the three-dimensional space positions of the real targets existing in the samples are labeled at the same time.
  • the above training device can be a personal computer (PC), a notebook computer, a server, etc., or the terminal itself. If the training device and the terminal are not the same device, then after the training device completes the training of the target detection model, the trained target detection model can be imported into the terminal, so that the terminal can detect the first target in the acquired point cloud.
  • S703 The terminal determines the two-dimensional space position of at least one second target in the image according to the projection of the three-dimensional space position of the at least one first target in the image and the two-dimensional space position of the target predicted in the image by the at least one target tracking trajectory.
  • through the projection matrix between the point cloud coordinate system and the image coordinate system, a three-dimensional space position in the point cloud can be projected into the image to obtain a two-dimensional space position in the image; conversely, a two-dimensional space position in the image can be projected into the point cloud to obtain a three-dimensional space position in the point cloud.
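Projecting a 3D point into the image with the projection matrix M amounts to a homogeneous matrix multiply and a division by the scale factor; a minimal sketch (function name assumed):

```python
import numpy as np

def project_to_image(point_xyz, M):
    """Project a 3D point cloud coordinate into the image using the 3x4
    projection matrix M from the joint calibration, i.e.
    s * [u, v, 1]^T = M * [X, Y, Z, 1]^T."""
    M = np.asarray(M, dtype=float)
    p = M @ np.append(np.asarray(point_xyz, dtype=float), 1.0)  # homogeneous
    return p[:2] / p[2]                                         # divide by s
```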
  • to determine the projection matrix, several calibration objects (such as a three-dimensional carton with multiple edges and corners) can be placed in advance in the common field of view of the 3D scanning device and the vision sensor, and the calibration object point cloud and calibration object image are collected by the 3D scanning device and the vision sensor; multiple calibration points (such as the corners of the three-dimensional carton) are then selected in the collected calibration object point cloud and calibration object image, and their three-dimensional coordinates in the calibration object point cloud and two-dimensional coordinates in the calibration object image are obtained.
  • the projection matrix between the point cloud coordinate system and the image coordinate system can then be solved from the three-dimensional coordinates of the multiple calibration points in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.
  • K is the internal parameter (intrinsic) matrix of the vision sensor; it is fixed after the sensor leaves the factory and is usually provided by the manufacturer or obtained through a calibration algorithm. [R, T] is the external parameter (extrinsic) matrix of the vision sensor.
  • from the three-dimensional coordinates of the calibration points in the calibration object point cloud and their two-dimensional coordinates in the calibration object image, the projection matrix M from the point cloud coordinate system to the image coordinate system can be solved.
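The mapping described above can be sketched numerically. The sketch below uses hypothetical intrinsic values and identity extrinsics (assumptions, not any real sensor's calibration); it only illustrates how M = K[R, T] maps a homogeneous point-cloud coordinate to pixel coordinates:

```python
import numpy as np

# Hypothetical intrinsic matrix K (focal lengths fx, fy; principal point cx, cy).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                        # rotation: identity for this sketch
T = np.array([[0.0], [0.0], [0.0]])  # translation: zero for this sketch

# Projection matrix M (3x4) from the point cloud coordinate system to pixels.
M = K @ np.hstack([R, T])

def project_to_image(point_3d):
    """Project one 3-D point [x, y, z] to pixel coordinates (u, v)."""
    p = np.append(point_3d, 1.0)     # homogeneous coordinates
    uvw = M @ p
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

u, v = project_to_image([2.0, 1.0, 10.0])
# With identity extrinsics: u = 700*2/10 + 640 = 780, v = 700*1/10 + 360 = 430
```

The inverse direction (image to point cloud) additionally needs a depth estimate for each pixel, since a single 2-D point corresponds to a ray in 3-D space.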
  • when detecting the second target in the image, the terminal may also add feedback from the predicted targets of the target tracking trajectories: both the two-dimensional spatial positions obtained by projecting the at least one first target into the image and the two-dimensional spatial positions of the targets predicted in the image by the at least one target tracking trajectory are treated as targets, and both sets of two-dimensional spatial positions are output as the two-dimensional spatial positions of the second targets.
  • S704: The terminal determines the three-dimensional spatial position of the at least one second target in the point cloud according to the projection of the two-dimensional spatial position of the at least one second target in the point cloud.
  • that is, the terminal projects the two-dimensional spatial position of the at least one second target in the image into the point cloud to obtain the three-dimensional spatial position of the at least one second target in the point cloud, which is output as the final target detection result for the point cloud.
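One way to merge projected detections with trajectory-predicted boxes is a simple overlap test: predicted boxes that duplicate a projected detection are suppressed, and the rest are kept as extra candidates. This is only an illustrative stand-in; the box format `(x1, y1, x2, y2)` and the threshold are assumptions, as the text does not specify the decision rule:

```python
def iou_2d(a, b):
    """Axis-aligned intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def fuse_detections(projected, predicted, thresh=0.5):
    """Merge boxes projected from the point cloud with boxes predicted by
    the target tracking trajectories: heavy overlaps (IoU >= thresh) are
    kept once, the rest are added as extra second-target candidates."""
    fused = list(projected)
    for p in predicted:
        if all(iou_2d(p, q) < thresh for q in fused):
            fused.append(p)
    return fused

boxes = fuse_detections([(0, 0, 10, 10)], [(1, 1, 11, 11), (50, 50, 60, 60)])
# (1,1,11,11) overlaps the projected box heavily, so only the distant
# predicted box (50,50,60,60) is added as an extra candidate.
```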
  • the features of the second target may include target features at its three-dimensional spatial position in the point cloud and target features at its two-dimensional spatial position in the image.
  • the target features at the three-dimensional spatial position in the point cloud may include position (such as center point coordinates), size (such as length, width, and height), speed, direction, category, number of point cloud points, coordinate value distribution along each axis of the point cloud, point cloud reflection intensity distribution (such as a reflection intensity histogram), depth features, and so on.
  • the target features at the two-dimensional spatial position in the image may include position (such as center point coordinates), size (such as length and width), speed, direction, category, appearance features (such as an image color histogram or a histogram of oriented gradients), and so on.
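Several of these point cloud features can be computed directly with NumPy. The function below is an illustrative sketch (the function name, feature names, and bin count are assumptions), covering point count, per-axis extent, and a reflection intensity histogram:

```python
import numpy as np

def point_cloud_features(points, intensities, n_bins=8):
    """Features of a detected target's point cloud segment: point count,
    per-axis coordinate spread, and a reflection intensity histogram."""
    points = np.asarray(points, dtype=float)  # shape (N, 3)
    return {
        "num_points": len(points),
        # per-axis extent approximates length / width / height
        "extent": points.max(axis=0) - points.min(axis=0),
        # normalised reflection intensity distribution over fixed bins
        "intensity_hist": np.histogram(intensities, bins=n_bins,
                                       range=(0.0, 1.0), density=True)[0],
    }

pts = [[0, 0, 0], [2, 1, 0.5], [1, 0.5, 1.5]]
f = point_cloud_features(pts, intensities=[0.1, 0.4, 0.9])
# f["num_points"] == 3, f["extent"] == [2.0, 1.0, 1.5]
```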
  • one target tracking trajectory corresponds to one target, and the trajectory records the target's information, such as its ID, target features, existence time, the three-dimensional spatial position in each frame of point cloud in which the target exists, and the two-dimensional spatial position in each frame of image in which the target exists, in order to achieve tracking of the same target. In some embodiments, the terminal can match the at least one target tracking trajectory with the at least one second target according to the target features corresponding to the existing at least one target tracking trajectory and the detected target features of the at least one second target, and associate each second target matched to a target tracking trajectory with that trajectory, so as to extend the existing target tracking trajectory.
  • the matching degree (or similarity) between the target features of the at least one target tracking trajectory and the target features of the at least one second target can be used as a cost matrix, and the Hungarian algorithm can be used to perform globally optimal matching between the at least one target tracking trajectory and the at least one second target.
  • the Hungarian algorithm is a combinatorial optimization algorithm that solves the task assignment problem in polynomial time.
  • when calculating the similarity between the target features of a target tracking trajectory and the target features of a second target, the terminal considers one or more of the following target features: position (in the point cloud and/or in the image), size (in the point cloud and/or in the image), speed (in the point cloud and/or in the image), direction (in the point cloud and/or in the image), category (in the point cloud and/or in the image), number of point cloud points, coordinate value distribution along each axis of the point cloud, point cloud reflection intensity distribution, appearance features, depth features, and so on. When multiple target features are considered, different target features can be assigned different weights, with the weights summing to 1.
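A minimal sketch of this global matching step: a similarity matrix (assumed to be already combined from the weighted target features) is maximised over one-to-one assignments, then gated by a threshold. For brevity this uses exhaustive search rather than a true Hungarian implementation; a production system would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`. The matrix values and the gating threshold are assumptions:

```python
from itertools import permutations

import numpy as np

# Hypothetical similarity of 3 existing trajectories vs. 2 second targets,
# already combined from several weighted target features (weights sum to 1).
similarity = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.3, 0.2]])

def best_assignment(sim):
    """Globally optimal track-to-detection assignment by exhaustive search.
    A stand-in for the Hungarian algorithm, which solves the same problem
    in polynomial time. Assumes at least as many tracks as detections."""
    n_tracks, n_dets = sim.shape
    best, best_score = [], -1.0
    for perm in permutations(range(n_tracks), n_dets):
        score = sum(sim[t, d] for d, t in enumerate(perm))
        if score > best_score:
            best_score, best = score, [(t, d) for d, t in enumerate(perm)]
    return best

MIN_SIMILARITY = 0.5                 # assumed gating threshold
matches = [(t, d) for t, d in best_assignment(similarity)
           if similarity[t, d] >= MIN_SIMILARITY]
# matches == [(0, 0), (1, 1)]; trajectory 2 stays unmatched
```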
  • for a second target that is not matched to any target tracking trajectory, the terminal can assign the target a new target tracking trajectory ID and create a new target tracking trajectory.
  • for a target tracking trajectory that is not matched to any second target, the terminal can associate the target tracking trajectory with its predicted target in the point cloud and/or the image, so as to extend the target tracking trajectory and avoid a missed detection causing the same target to correspond to multiple target tracking trajectories.
  • before associating the target tracking trajectory with its predicted target in the point cloud and/or the image, if the number of times the target tracking trajectory has been associated with a predicted target is greater than or equal to the first threshold, the terminal deletes the target tracking trajectory.
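The trajectory lifecycle described above (update matched trajectories, open new ones for unmatched detections, let unmatched trajectories coast on their predicted target, and delete trajectories that coasted too long) can be sketched as follows. All names and the threshold value are illustrative assumptions:

```python
class Track:
    """Minimal target tracking trajectory record (fields are illustrative)."""
    def __init__(self, track_id, feature):
        self.track_id = track_id
        self.feature = feature
        self.missed = 0              # times associated with a predicted target

MAX_PREDICTED = 3                    # stands in for the "first threshold"

def update_tracks(tracks, matches, unmatched_dets, unmatched_tracks, next_id):
    """One frame of trajectory management as described above."""
    for track, feature in matches:       # matched: refresh with the detection
        track.feature = feature
        track.missed = 0
    for feature in unmatched_dets:       # new target -> new trajectory
        tracks.append(Track(next_id, feature))
        next_id += 1
    for track in unmatched_tracks:       # unmatched: coast on the prediction
        track.missed += 1
    # delete trajectories that relied on predicted targets too many times
    tracks = [t for t in tracks if t.missed < MAX_PREDICTED]
    return tracks, next_id

tracks, next_id = update_tracks([], [], ["new_target"], [], next_id=0)
# one new trajectory (ID 0) is opened for the unmatched detection
```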
  • the apparatus may include corresponding hardware structures and/or software modules for performing each function.
  • the present application can be implemented in hardware, or in a combination of hardware and computer software, in conjunction with the units and algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 8 shows a possible exemplary block diagram of the target detection apparatus involved in the embodiment of the present application, and the target detection apparatus 800 may exist in the form of a software module or a hardware module.
  • the target detection apparatus 800 may include: an acquisition unit 803 and a processing unit 802 .
  • the device may be a chip.
  • the apparatus 800 may further include a storage unit 801 for storing program codes and/or data of the apparatus 800 .
  • the acquiring unit 803 is configured to acquire the point cloud from the three-dimensional scanning device and the image from the vision sensor;
  • the processing unit 802 is configured to input the point cloud, together with the three-dimensional spatial position of the target predicted in the point cloud by at least one target tracking trajectory, into the target detection model for processing, to obtain the three-dimensional spatial position of at least one first target, where the target detection model is trained on multiple point cloud samples with known three-dimensional spatial positions of the predicted targets of target tracking trajectories, and on three-dimensional spatial position detection results of multiple targets in one-to-one correspondence with the multiple point cloud samples;
  • the processing unit 802 is further configured to determine the two-dimensional spatial position of at least one second target in the image according to the projection of the three-dimensional spatial position of the at least one first target in the image and the two-dimensional spatial position of the target predicted in the image by the at least one target tracking trajectory;
  • the processing unit 802 is further configured to determine the three-dimensional spatial position of the at least one second target in the point cloud according to the projection of the two-dimensional spatial position of the at least one second target in the point cloud.
  • the processing unit 802 is further configured to match the at least one target tracking trajectory with the at least one second target according to the target features corresponding to the at least one target tracking trajectory and the target features of the at least one second target, and to associate each matched target tracking trajectory with its second target.
  • the processing unit 802 is further configured to establish a target tracking trajectory corresponding to the second target for the second target that is not matched to the target tracking trajectory.
  • the processing unit 802 is further configured to, for a target tracking trajectory that is not matched to any second target, associate the target tracking trajectory with its predicted target in the point cloud and/or the image.
  • before associating a target tracking trajectory that is not matched to any second target with its predicted target in the point cloud and/or the image, the processing unit 802 is further configured to delete the target tracking trajectory when the number of times the trajectory has been associated with a predicted target is greater than or equal to a first threshold.
  • the target features include one or more of the following: position, length, width, height, speed, direction, category, number of point cloud points, coordinate value distribution along each axis of the point cloud, point cloud reflection intensity distribution, appearance features, and depth features.
  • the acquiring unit 803 is further configured to acquire the calibration object point cloud from the three-dimensional scanning device and the calibration object image from the vision sensor;
  • the processing unit 802 is further configured to determine the projection matrix between the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of multiple calibration points of the calibration object in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.
  • an embodiment of the present application further provides a target detection apparatus 900 .
  • the target detection apparatus 900 includes at least one processor 902 and an interface circuit. Further, the apparatus further includes at least one memory 901 , and the at least one memory 901 is connected to the processor 902 .
  • the interface circuit is used to provide input and output of data and/or information for the at least one processor.
  • the memory 901 is used to store the computer-executed instructions.
  • the processor 902 executes the computer-executed instructions stored in the memory 901, so that the target detection device 900 can realize the above-mentioned target detection method.
  • a computer-readable storage medium on which a program or an instruction is stored, and when the program or instruction is executed, the target detection method in the above method embodiment can be executed.
  • a computer program product including an instruction is provided, and when the instruction is executed, the target detection method in the above method embodiment can be executed.
  • a chip is provided.
  • the chip can be coupled with a memory and is used to call a computer program product stored in the memory to implement the target detection method in the above method embodiments.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • computer-usable storage media include, but are not limited to, disk storage, CD-ROM, optical storage, and the like.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of intelligent driving. Disclosed are a target detection method and apparatus, which are used for improving the accuracy and real-time performance of target detection. The method comprises: acquiring a point cloud from a three-dimensional scanning device and an image from a visual sensor; inputting, into a target detection model, the point cloud, and the three-dimensional spatial position of a predicted target of at least one target tracking trajectory in the point cloud, and processing same, so as to obtain the three-dimensional spatial position of at least one first target; according to the projection of the three-dimensional spatial position of the at least one first target in the image and the two-dimensional spatial position of the predicted target of the at least one target tracking trajectory in the image, determining the two-dimensional spatial position of at least one second target in the image; and according to the projection of the two-dimensional spatial position of the at least one second target in the point cloud, determining the three-dimensional spatial position of the at least one second target in the point cloud.

Description

A target detection method and apparatus

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to the Chinese patent application filed with the Intellectual Property Office of the People's Republic of China on March 9, 2021, with application number 202110256851.2 and entitled "A target detection method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field

Embodiments of this application relate to the field of intelligent driving, and in particular to a target detection method and apparatus.
Background

As cities develop, traffic becomes increasingly congested and drivers increasingly fatigued. To meet people's travel needs, intelligent driving (including assisted driving and unmanned driving) has emerged. Reliably detecting targets in the environment is crucial to intelligent driving decision-making.

Most current target detection methods are based on a single type of sensor, for example relying only on a lidar to obtain point clouds or only on a camera to obtain images. A point cloud provides three-dimensional information about a target and copes well with mutual occlusion between targets, but point clouds are sparse, so the recognition rate of target features is low. An image carries richer information than a point cloud, but images are strongly affected by illumination, weather, and the like, so detection and tracking are less reliable. Moreover, an image contains only two-dimensional plane information and cannot capture occluded targets, so targets are easily lost or misdetected. Fusing point clouds and images can fully exploit their complementarity and improve detection robustness. However, research on multi-sensor-fusion target detection is currently limited, and the accuracy and real-time performance of target detection need to be improved.
Summary

Embodiments of this application provide a target detection method and apparatus to improve the accuracy and real-time performance of target detection.

In a first aspect, an embodiment of this application provides a target detection method. The method includes: acquiring a point cloud from a three-dimensional scanning device and an image from a vision sensor; inputting the point cloud, together with the three-dimensional spatial position of the target predicted in the point cloud by at least one target tracking trajectory, into a target detection model for processing, to obtain the three-dimensional spatial position of at least one first target, where the target detection model is trained on multiple point cloud samples with known three-dimensional spatial positions of the predicted targets of target tracking trajectories, and on three-dimensional spatial position detection results of multiple targets in one-to-one correspondence with the multiple point cloud samples; determining the two-dimensional spatial position of at least one second target in the image according to the projection of the three-dimensional spatial position of the at least one first target in the image and the two-dimensional spatial position of the target predicted in the image by the at least one target tracking trajectory; and determining the three-dimensional spatial position of the at least one second target in the point cloud according to the projection of the two-dimensional spatial position of the at least one second target in the point cloud.

In the embodiments of this application, a target tracking trajectory feedback mechanism is added. When detecting targets in the point cloud and the image, more attention is paid to the regions where the target tracking trajectories predict targets to be located, which can effectively reduce missed detections and improve the accuracy of target detection.
In a possible design, the method further includes: matching the at least one target tracking trajectory with the at least one second target according to the target features corresponding to the at least one target tracking trajectory and the target features of the at least one second target; and associating each matched target tracking trajectory with its second target. Optionally, the target features include one or more of the following: position, size, speed, direction, category, number of point cloud points, coordinate value distribution along each axis of the point cloud, point cloud reflection intensity distribution, appearance features, depth features, and the like.

In this design, detected targets can be associated with existing target tracking trajectories based on target features, which helps obtain complete target tracking trajectories and predict where a target will appear at the next moment.
In a possible design, the method further includes: for a second target that is not matched to any target tracking trajectory, establishing a target tracking trajectory corresponding to that second target.

In this design, a newly appearing target can be assigned a new ID and a corresponding target tracking trajectory can be established, which helps track all targets that appear.
In a possible design, the method further includes: for a target tracking trajectory that is not matched to any second target, associating the target tracking trajectory with its predicted target in the point cloud and/or the image.

In this design, for a target tracking trajectory whose corresponding target is not detected in the point cloud or image, the trajectory can be associated with its predicted target in the point cloud and/or the image, which helps avoid the same target corresponding to multiple target tracking trajectories due to missed detection and improves the reliability of target tracking.
In a possible design, before associating a target tracking trajectory that is not matched to any second target with its predicted target in the point cloud and/or the image, the method further includes: deleting the target tracking trajectory when the number of times it has been associated with a predicted target is greater than or equal to a first threshold.

In this design, deleting target tracking trajectories whose corresponding targets have repeatedly gone undetected in the acquired point clouds and/or images helps save processing resources.
In a possible design, the method further includes: acquiring a calibration object point cloud from the three-dimensional scanning device and a calibration object image from the vision sensor; and determining the projection matrix between the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of multiple calibration points of the calibration object in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.

In this design, the three-dimensional scanning device and the vision sensor can be jointly calibrated with a calibration object to determine the projection matrix between the point cloud coordinate system and the image coordinate system (also called the pixel coordinate system), which helps fuse the target detection results in the point cloud and the image and improves the accuracy of target detection.
In a second aspect, an embodiment of this application provides a target detection apparatus that has the function of implementing the method of the first aspect or any possible design thereof. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units (modules) corresponding to the above function, for example an acquisition unit and a processing unit.

In a third aspect, an embodiment of this application provides a target detection apparatus including at least one processor and an interface. The processor is configured to call and run a computer program from the interface, and when the processor executes the computer program, the method of the first aspect or any possible design thereof can be implemented.

In a fourth aspect, an embodiment of this application provides a terminal including the apparatus of the second aspect. Optionally, the terminal may be a vehicle-mounted device, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot, a roadside unit (RSU), or the like. Alternatively, the terminal may be a smart device that needs to perform target detection or tracking, for example in smart home or smart manufacturing.
In a fifth aspect, an embodiment of this application provides a chip system including a processor and an interface. The processor is configured to call and run a computer program from the interface, and when the processor executes the computer program, the method of the first aspect or any possible design thereof can be implemented.

In a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program for executing the method of the first aspect or any possible design thereof.

In a seventh aspect, an embodiment of this application further provides a computer program product including a computer program or instructions; when the computer program or instructions are executed, the method of the first aspect or any possible design thereof can be implemented.

For the technical effects achievable by the second to seventh aspects, refer to the technical effects achievable by the first aspect; details are not repeated here.
Brief Description of Drawings

FIG. 1 is a schematic diagram of a target detection system provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of a target detection process provided by an embodiment of this application;

FIG. 3 is a schematic diagram of an intelligent driving scenario provided by an embodiment of this application;

FIG. 4 is the first schematic diagram of a multi-sensor-fusion target detection solution provided by an embodiment of this application;

FIG. 5 is the second schematic diagram of a multi-sensor-fusion target detection solution provided by an embodiment of this application;

FIG. 6 is the third schematic diagram of a multi-sensor-fusion target detection solution provided by an embodiment of this application;

FIG. 7 is a schematic diagram of the process of a target detection method provided by an embodiment of this application;

FIG. 8 is the first schematic diagram of a target detection apparatus provided by an embodiment of this application;

FIG. 9 is the second schematic diagram of a target detection apparatus provided by an embodiment of this application.
Detailed Description
FIG. 1 is a schematic diagram of a target detection system provided by an embodiment of this application, including a data preprocessing module, a joint calibration module, a point cloud detection module, an image region-of-interest acquisition module, a point cloud domain prediction module, an image domain prediction module, a prediction decision module, a data association module, and a trajectory management module.

With reference to the target detection flow shown in FIG. 2, the data preprocessing module is mainly used to filter the point cloud and remove ground points, and to correct distortion in the image, among other things.

Joint calibration module: mainly used to jointly calibrate the point clouds and images acquired by the three-dimensional scanning device and the vision sensor, and to obtain the projection matrix between the point cloud coordinate system and the image coordinate system.

Point cloud detection module: mainly used to input the point cloud acquired at the current moment, together with the fed-back result of target tracking trajectory management (such as the three-dimensional spatial position of the target predicted by at least one target tracking trajectory in the point cloud acquired at the current moment), into a trained target detection model (such as a deep neural network model) to obtain the target detection result.

Image region-of-interest acquisition module: mainly used to project the target detection result obtained from the point cloud into the image using the projection matrix, and to combine it with the fed-back result of target tracking trajectory management (such as the two-dimensional spatial position of the target predicted by at least one target tracking trajectory in the image acquired at the current moment) to obtain regions of interest.

Prediction decision module: mainly used to back-project the target detection result of the image into the point cloud, compare it with the target detection result of the point cloud, and decide on a more accurate target detection result.

Data association module: mainly used to associate and match the target detection result after the prediction decision with the target tracking trajectories.

Trajectory management module: mainly used to manage and update all target tracking trajectories according to the data association result.

Point cloud domain prediction module: mainly used, based on the updated target tracking trajectories, to predict the three-dimensional spatial position of the target of each trajectory in the point cloud acquired at the next moment.

Image domain prediction module: mainly used, based on the updated target tracking trajectories, to predict the two-dimensional spatial position of the target of each trajectory in the image acquired at the next moment.
It can be understood that the structure of the target detection system illustrated in the embodiments of this application does not constitute a specific limitation on the target detection system. In other embodiments of this application, the target detection system may include more or fewer modules than shown, or combine some modules, or split some modules, or arrange the modules differently.
The target detection solution provided by the embodiments of this application is applicable to a terminal that uses the target detection system shown in FIG. 1. The terminal may be a vehicle-mounted device, a vehicle, a monitoring controller, an unmanned aerial vehicle, a robot, a road side unit (RSU), or similar equipment, and is suitable for scenarios such as surveillance, intelligent driving, drone navigation, and robot travel. In the subsequent description of the embodiments of this application, a terminal that applies the target detection system shown in FIG. 1 in an intelligent driving scenario is used as an example. As shown in FIG. 3, a terminal (for example, vehicle A) can acquire point clouds and images of the surrounding environment through one or more three-dimensional scanning devices and one or more vision sensors mounted on the terminal, and can detect and track targets in the surrounding environment such as vehicles (for example, vehicle B and vehicle C), pedestrians, bicycles (not shown in the figure), and trees (not shown in the figure).
Current target detection solutions based on multi-sensor fusion mainly include the following:
First solution: as shown in FIG. 4, after acquiring a point cloud from a lidar, this solution uses a deep convolutional neural network to detect the three-dimensional spatial position of the target and extract point cloud features. An image is acquired from a monocular camera, the three-dimensional bounding box of the target detected in the point cloud is projected into the image, and a deep convolutional neural network is used to extract image features of the projected region. Next, similarity matrices between the detected targets and the target tracking trajectories are computed over the three-dimensional spatial positions in the point cloud, the point cloud features, and the image features; the three similarity matrices are merged, the Hungarian algorithm is applied to the merged similarity matrix to compute the bipartite-graph matching between targets and target tracking trajectories, and a Kalman filter is used to estimate the state of the target tracking trajectories, thereby tracking the targets in the point cloud. However, this solution uses deep networks for feature extraction in both the image and the point cloud at the same time, so it consumes more resources, has low computational efficiency, and is hard to deploy. Moreover, once a target is missed in the point cloud acquired by the lidar, the missed target cannot be recovered from the image, so the accuracy is low.
Second solution: as shown in FIG. 5, this solution first uses deep learning algorithms to obtain target detection information from the collected images and point clouds. For example, a deep learning image target detection algorithm is applied to the image to obtain the category, center-point pixel coordinates, and length and width of the two-dimensional (2D) detection box of each target in the image; a deep learning point cloud target detection algorithm is applied to the point cloud to obtain the category, center-point spatial coordinates, and length, width, and height of the three-dimensional (3D) detection box of each target in the point cloud. Then, based on the minimum distance between detection boxes, the Hungarian algorithm is used to optimally match the detection boxes of targets in images and point clouds acquired at adjacent moments, thereby achieving target tracking and establishing separate target tracking trajectories for the image and the point cloud. However, this solution likewise uses deep learning algorithms for feature extraction in both the image and the point cloud at the same time, which consumes more resources and has poor real-time performance. In addition, there is no true tracking algorithm: matching by the distance between detection boxes is error-prone when targets are dense or move a large distance between adjacent moments.
Third solution: as shown in FIG. 6, this solution collects a point cloud of the target, filters the collected point cloud, and outputs the object-point data remaining after ground points are filtered out. The object-point data is mapped to generate a range image and a reflection-intensity image, and the object-point data is segmented and clustered into multiple point cloud regions according to the range image, the reflection-intensity image, and echo-intensity information. Candidate point cloud regions suspected of containing a target are screened out of these regions according to prior knowledge of the target; features are extracted from each candidate region, and the extracted feature vectors are classified to identify the target, yielding a first target detection result. An image is collected and preprocessed, a region of interest is extracted from the preprocessed image using a projection transformation matrix, image features are extracted within the region of interest, and the target is identified from the extracted image features, yielding a second target detection result.
If the first target detection result and the second target detection result are the same, the first or second target detection result is output as the final target detection result; if they differ, the two results are fused and judged based on Bayesian decision-making to obtain the final target detection result. Finally, a multi-target tracking method based on a Markov decision process (MDP) is used for tracking. However, point-cloud-based target detection that relies on a large amount of prior knowledge has poor accuracy, and when a target is missed in the point cloud, the missed target cannot be recovered from the image, so the accuracy is low.
This application aims to provide a target detection solution in which the target detection result in the point cloud is corrected by the target detection result in the image, and a target tracking trajectory feedback mechanism is used to reduce the missed detection rate and improve the accuracy and real-time performance of target detection.
Before introducing the embodiments of this application, some terms used in the embodiments are first explained to facilitate understanding by those skilled in the art.
1) Point cloud: the set of point data on the surface of an object obtained by scanning with a three-dimensional scanning device may be called a point cloud. A point cloud is a set of vectors in a three-dimensional coordinate system. These vectors are usually expressed as (x, y, z) three-dimensional coordinates and are generally used to represent the shape of the outer surface of an object. In addition to the geometric position represented by (x, y, z), a point in a point cloud may also carry an RGB color, a gray value, a depth, a reflection intensity of the object surface, and so on. The point cloud coordinate system involved in the embodiments of this application is the three-dimensional (x, y, z) coordinate system in which the points of the point cloud are located.
2) Image coordinate system: also called the pixel coordinate system, it is usually a two-dimensional coordinate system whose origin is the upper-left corner of the image, with the pixel as its unit. The two coordinate axes of the image coordinate system are u and v, and the coordinates of a point in the image coordinate system can be written as (u, v).
3) Corner point: a point whose attributes are particularly prominent in some respect; in point clouds and images, it refers to a representative and robust point, such as the intersection of two edges.
4) Region of interest (ROI): in image processing, a region to be processed that is outlined in the processed image by a box, circle, ellipse, irregular polygon, or the like is called a region of interest. In the embodiments of this application, a region of interest can be regarded as a region of the image in which a target exists.
In addition, it should be understood that in this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural. In the text of this application, the character "/" generally indicates an "or" relationship between the associated objects. In addition, unless stated otherwise, ordinal terms such as "first" and "second" mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the order, timing, priority, or importance of those objects; the descriptions "first" and "second" also do not require the objects to be different. The various numerals involved in this application are merely for convenience of description and are not intended to limit the scope of the embodiments of this application. The sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic.
In this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation, and any embodiment or design described as "exemplary" or "for example" should not be construed as more preferred or advantageous than other embodiments or designs. The use of words such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner to facilitate understanding. The embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 7 is a schematic diagram of a target detection method provided by an embodiment of this application. The method includes:
S701: A terminal acquires a point cloud from a three-dimensional scanning device and an image from a vision sensor.
The three-dimensional scanning device may be a lidar, a millimeter-wave radar, a depth camera, or the like, and the vision sensor may be a monocular camera, a multi-camera rig, or the like.
In a possible implementation, at least one three-dimensional scanning device and at least one vision sensor may be installed on the terminal. The terminal may scan objects around the terminal (or in a certain direction, such as the direction of travel) through the three-dimensional scanning device to collect a point cloud of those objects, and may likewise scan objects around the terminal (or in a certain direction) through the vision sensor to collect images of those objects. The point cloud may be a set of points, where the information of each point includes its three-dimensional coordinates (x, y, z); when the three-dimensional scanning device is a lidar or a millimeter-wave radar, the information of each point may further include information such as the laser reflection intensity or the millimeter-wave reflection intensity.
In addition, to avoid inconsistency between the acquisition times of the point cloud and the image, when the terminal starts to acquire the point cloud from the three-dimensional scanning device and the image from the vision sensor, it may also obtain the acquisition times of the point cloud and the image from the two devices and time-align the point clouds and images according to those acquisition times, ensuring that each point cloud and image pair used for target detection was acquired at the same time.
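The time alignment described above can be sketched as a nearest-timestamp pairing. The function name, the frame-list representation, and the tolerance value below are illustrative assumptions, not details from this application:

```python
def align_frames(cloud_stamps, image_stamps, tolerance=0.05):
    """Pair each point-cloud frame with the image frame closest in time.

    Returns (cloud_index, image_index) pairs whose acquisition times
    differ by at most `tolerance` seconds; unmatched frames are dropped.
    """
    pairs = []
    for ci, ct in enumerate(cloud_stamps):
        # image frame whose timestamp is nearest to this cloud frame
        ii = min(range(len(image_stamps)), key=lambda k: abs(image_stamps[k] - ct))
        if abs(image_stamps[ii] - ct) <= tolerance:
            pairs.append((ci, ii))
    return pairs
```

A cloud frame that has no image frame within the tolerance is simply skipped, which matches the goal of only feeding time-consistent pairs into detection.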
In some implementations, after acquiring the point cloud and the image, the terminal may further perform data preprocessing operations on the point cloud and/or the image. For example, the terminal may filter the point cloud to remove ground points, reducing the data volume of the point cloud and improving target detection efficiency; it may also perform distortion correction on barrel distortion, pincushion distortion, or the like present in the collected image according to the intrinsic and extrinsic parameters of the vision sensor (usually provided by the vision sensor manufacturer).
As an example, the terminal may remove from the point cloud the points that satisfy a pre-specified condition for belonging to the ground (for example, the z coordinate of the point being smaller than a certain threshold), thereby filtering out the ground points, reducing the data volume of the point cloud, and improving target detection efficiency.
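A minimal sketch of this z-threshold ground filter, assuming points are (x, y, z) tuples; the threshold value is an illustrative assumption, not taken from this application:

```python
def remove_ground(points, z_threshold=-1.5):
    """Keep only points whose z coordinate is at or above the threshold.

    Points with z below `z_threshold` are treated as ground and removed.
    """
    return [p for p in points if p[2] >= z_threshold]
```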
S702: The terminal inputs the point cloud, together with the three-dimensional spatial position of a target predicted by at least one target tracking trajectory in the point cloud, into a target detection model for processing to obtain the three-dimensional spatial position of at least one first target.
The three-dimensional spatial position of a target includes information such as center-point coordinates and length, width, and height, and may also be called a three-dimensional detection box or three-dimensional bounding box (3D BBox). The target detection model is trained on multiple point cloud samples, each associated with the three-dimensional spatial positions of targets predicted by known target tracking trajectories, together with the three-dimensional spatial position detection results of targets corresponding one-to-one to those point cloud samples.
In the embodiments of this application, each target tracking trajectory corresponds to one target, and the trajectory records the information of that target, such as its identity document (ID), target features, existence time, three-dimensional spatial position in every point cloud frame in which the target exists, and two-dimensional spatial position in every image frame in which the target exists. The target can be tracked in the point cloud using a Kalman algorithm or the like: from the three-dimensional spatial position of the target in every point cloud frame recorded in its trajectory, the three-dimensional spatial position at which the target will appear in the next point cloud frame (that is, the point cloud collected at the next moment) can be predicted; in other words, the three-dimensional spatial position of the target predicted by the trajectory in the next point cloud frame is obtained. Similarly, the target can be tracked in the image using an optical flow algorithm or the like: from the two-dimensional spatial position of the target in every image frame recorded in its trajectory, the two-dimensional spatial position at which the target will appear in the next image frame (that is, the image collected at the next moment) can be predicted; in other words, the two-dimensional spatial position of the target predicted by the trajectory in the next image frame is obtained.
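The per-trajectory prediction in the point cloud can be sketched as the predict step of a constant-velocity Kalman filter. The state layout [x, y, z, vx, vy, vz], the time step, and the process-noise value are illustrative assumptions rather than details from this application:

```python
import numpy as np

def predict_next_position(state, P, dt=0.1, q=0.01):
    """Kalman predict step for a constant-velocity 3-D motion model.

    `state` is [x, y, z, vx, vy, vz] and P is its 6x6 covariance;
    returns the predicted state and covariance for the next frame.
    """
    F = np.eye(6)
    F[0, 3] = F[1, 4] = F[2, 5] = dt  # position advances by velocity * dt
    Q = q * np.eye(6)                 # assumed process noise
    state_pred = F @ state
    P_pred = F @ P @ F.T + Q
    return state_pred, P_pred
```

The first three entries of the predicted state give the center of the region where the trajectory expects the target in the next point cloud frame.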
When performing target detection on the point cloud, a target is significantly more likely to appear in the region where an existing target tracking trajectory predicts the target's three-dimensional spatial position in the current point cloud than in other regions of the point cloud, so this region deserves particular attention during target detection.
For target detection on the point cloud, the terminal can process the point cloud and the three-dimensional spatial positions of targets predicted by at least one target tracking trajectory through the target detection model. Specifically, the target detection model may be trained by a training device on multiple point cloud samples maintained in a sample set, each associated with the three-dimensional spatial positions of targets predicted by known target tracking trajectories, together with the three-dimensional spatial position detection results corresponding one-to-one to those samples. When training the target detection model, the training device may add to each point cloud sample a three-dimensional spatial position label vector (for example, a label vector composed of center-point coordinates and length, width, and height) according to the three-dimensional spatial position of the target corresponding to that sample. It should also be understood that if multiple targets exist in a point cloud sample, multiple three-dimensional spatial position label vectors are added to that sample, one per target; the label vectors of the multiple targets may also be stored in the form of a matrix.
After the label vectors are added to every point cloud sample in the training set, the training device may input each point cloud sample, together with the three-dimensional spatial positions of targets predicted by the target tracking trajectory (or trajectories), into the target detection model to obtain the predicted three-dimensional spatial positions of the target (or targets). From the predicted positions output by the model and the true label vectors of the sample, the training device computes the loss of the model through a loss function; a higher loss indicates a larger difference between the predicted three-dimensional spatial positions and the true label vectors. The training device adjusts the parameters of the target detection model according to the loss, for example updating the parameters of the neurons with stochastic gradient descent, so that training the model becomes the process of shrinking this loss as far as possible. The model is trained continuously on the point cloud samples in the sample set, and when the loss shrinks into a preset range, the trained target detection model is obtained. The target detection model may be a deep neural network or the like.
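The loss-driven update described above can be illustrated with a plain mean-squared-error loss on three-dimensional box vectors (center x, y, z plus length, width, height) and one gradient-descent step on a linear stand-in for the detection network. The actual model in this application is a deep neural network; every name and value below is an assumption made only for illustration:

```python
import numpy as np

def box_loss(pred, label):
    """Mean squared error between predicted and labelled box vectors."""
    pred, label = np.asarray(pred, float), np.asarray(label, float)
    return float(np.mean((pred - label) ** 2))

def sgd_step(W, x, label, lr=0.1):
    """One stochastic-gradient-descent update of a linear model W @ x."""
    residual = W @ x - np.asarray(label, float)
    grad = 2.0 * np.outer(residual, x) / residual.size  # d(loss)/dW
    return W - lr * grad
```

Each update moves the parameters in the direction that reduces the loss, which is the "shrink this loss as far as possible" process described above.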
It should be understood that the point cloud samples in the training set may be obtained by pre-sampling: for example, the terminal collects point cloud samples in advance, predicts and records the three-dimensional spatial positions of targets in the collected samples according to the target tracking trajectory (or trajectories), and at the same time annotates the three-dimensional spatial positions of the real targets present in the samples.
The training device may be a personal computer (PC), a notebook computer, a server, or similar equipment, or it may be the terminal itself. If the training device and the terminal are not the same device, the trained target detection model can be imported into the terminal after training is complete, so that the terminal can detect the first target in the acquired point cloud.
S703: The terminal determines the two-dimensional spatial position of at least one second target in the image according to the projection of the three-dimensional spatial position of the at least one first target into the image and the two-dimensional spatial position of the target predicted by the at least one target tracking trajectory in the image.
Through the projection matrix between the point cloud coordinate system (three-dimensional) and the image coordinate system (two-dimensional), a three-dimensional spatial position in the point cloud can be projected into the image to obtain a two-dimensional spatial position in the image, and a two-dimensional spatial position in the image can likewise be projected into the point cloud to obtain a three-dimensional spatial position in the point cloud.
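Projecting a point-cloud position into the image through the 3x4 projection matrix amounts to a homogeneous-coordinate multiplication followed by division by the scale factor. The function name and the sample matrix used in testing are illustrative assumptions:

```python
import numpy as np

def project_to_image(M, point_xyz):
    """Map a 3-D point (x, y, z) to pixel coordinates (u, v) via the
    3x4 projection matrix M, dividing out the homogeneous scale."""
    p = np.append(np.asarray(point_xyz, dtype=float), 1.0)  # homogeneous
    su, sv, s = M @ p
    return su / s, sv / s
```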
In some implementations, to determine the projection matrix, several calibration objects (for example, a three-dimensional carton with multiple corners) may be placed in advance in the common field of view of the three-dimensional scanning device and the vision sensor. A calibration-object point cloud and a calibration-object image are collected by the two devices, multiple calibration points (for example, the corner points of the carton) are selected in both, and the three-dimensional coordinates of the calibration points in the calibration-object point cloud and their two-dimensional coordinates in the calibration-object image are obtained. From these coordinate pairs, the projection matrix between the point cloud coordinate system and the image coordinate system can be solved.
As an example, suppose (x, y, z) and (u, v) are the coordinates of a calibration point in the point cloud coordinate system and the image coordinate system, respectively. The conversion relationship between the two coordinate systems can be obtained as follows:
s [u, v, 1]^T = K [R, T] [x, y, z, 1]^T = M [x, y, z, 1]^T
where s is a scale factor, K is the intrinsic parameter matrix of the vision sensor, which is fixed after the sensor leaves the factory and is usually provided by the manufacturer or obtained through a calibration algorithm, and [R, T] is the extrinsic parameter matrix of the vision sensor. Using the three-dimensional coordinates of multiple (at least three) calibration points in the calibration-object point cloud and their two-dimensional coordinates in the calibration-object image, the projection matrix M from the point cloud coordinate system to the image coordinate system can be solved.
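A hedged sketch of recovering the 3x4 projection matrix M from 3-D/2-D calibration-point pairs via the direct linear transform (DLT). Note that estimating the full 11-degree-of-freedom M generally needs at least six non-degenerate correspondences; the smaller count stated above applies when the intrinsic matrix K is already known and only the extrinsic parameters must be found. The function name is an illustrative assumption:

```python
import numpy as np

def solve_projection_matrix(points_3d, points_2d):
    """Least-squares DLT estimate of M (up to scale) from point pairs."""
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z, -u])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z, -v])
    # the right-singular vector of the smallest singular value solves A m = 0
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)
```

Because M is recovered only up to scale, it should be judged by reprojection error rather than by elementwise comparison with a reference matrix.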
In addition, although feedback of the targets predicted by the target tracking trajectories has already been added to the detection of the first target in the point cloud to reduce the missed detection rate, the three-dimensional spatial position detection results of the first target output by the target detection model may still miss targets. Therefore, in some embodiments, the terminal may also add the feedback of targets predicted by the target tracking trajectories when detecting the second target in the image: both the two-dimensional spatial positions obtained by projecting the at least one first target into the image and the two-dimensional spatial positions of targets predicted in the image by the at least one target tracking trajectory are considered to contain a target, and both are output as two-dimensional spatial positions at which a second target exists.
S704: The terminal determines the three-dimensional spatial position of the at least one second target in the point cloud according to the projection of the two-dimensional spatial position of the at least one second target into the point cloud.
The terminal projects the two-dimensional spatial position of the at least one second target in the image into the point cloud to obtain the three-dimensional spatial position of the at least one second target in the point cloud, which is output as the final target detection result of the point cloud.
For any second target, the features of the second target may include target features at its three-dimensional spatial position in the point cloud and target features at its two-dimensional spatial position in the image. The target features at the three-dimensional spatial position in the point cloud may include position (for example, center-point coordinates), size (for example, length, width, and height), speed, direction, category, number of points, the numerical distribution of point coordinates in each direction, the point cloud reflection intensity distribution (for example, a histogram of reflection intensity), depth features, and so on. The target features at the two-dimensional spatial position in the image include position (center-point coordinates), size (for example, length and width), speed, direction, category, appearance features (for example, an image color histogram or a histogram of oriented gradients), and so on.
对于目标跟踪,一个目标跟踪轨迹对应一个目标,目标跟踪轨迹记录有该目标的信息,如ID、目标特征、存在时间、在存在该目标的每一帧点云中的三维空间位置、在存在该目标的每一帧图像中的二维空间位置等,为了实现对同一目标的跟踪,在一些实施例中,终端可以根据已有至少一个目标跟踪轨迹对应的目标特征以及检测到至少一个第二目标的目标特征,对所述至少一个目标跟踪轨迹和所述至少一个第二目标进行匹配。将匹配到目标跟踪轨迹的第二目标与目标跟踪轨迹进行关联,完善已有的目标跟踪轨迹。For target tracking, one target tracking trajectory corresponds to one target, and the trajectory records that target's information, such as its ID, target features, existence time, three-dimensional spatial position in each point cloud frame in which the target exists, and two-dimensional spatial position in each image frame in which the target exists. To track the same target over time, in some embodiments the terminal may match the at least one target tracking trajectory against the at least one second target according to the target features of the existing at least one target tracking trajectory and the target features of the detected at least one second target, associate each second target matched to a trajectory with that trajectory, and thereby refine the existing target tracking trajectory.
作为一种示例,可以将至少一个目标跟踪轨迹的目标特征和至少一个第二目标的目标特征之间的匹配度(或相似度)作为成本矩阵,采用匈牙利算法对至少一个目标跟踪轨迹和至少一个第二目标进行全局最优匹配。其中匈牙利算法是一种在多项式时间内求解任务分配问题的组合优化算法。终端在计算目标跟踪轨迹的目标特征和第二目标的目标特征之间的相似度时,考虑了位置(点云中和/或图像中)、尺寸(点云中和/或图像中)、速度(点云中和/或图像中)、方向(点云中和/或图像中)、类别(点云中和/或图像中)、点云点数、点云各方向坐标数值分布、点云反射强度分布、外观特征、深度特征等目标特征中的一项或多项,当考虑了多项目标特征时,可以对不同的目标特征赋予不同的权值,且所有权值的和为1。As an example, the matching degree (or similarity) between the target features of the at least one target tracking trajectory and the target features of the at least one second target may be used as a cost matrix, and the Hungarian algorithm may be applied to perform a globally optimal matching between the at least one target tracking trajectory and the at least one second target. The Hungarian algorithm is a combinatorial optimization algorithm that solves the assignment problem in polynomial time. When computing the similarity between the target features of a trajectory and those of a second target, the terminal considers one or more of the following target features: position (in the point cloud and/or image), size (in the point cloud and/or image), speed (in the point cloud and/or image), direction (in the point cloud and/or image), category (in the point cloud and/or image), number of point cloud points, coordinate value distribution in each direction of the point cloud, point cloud reflection intensity distribution, appearance features, and depth features. When multiple target features are considered, different features may be assigned different weights, with the weights summing to 1.
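The weighted-cost matching described above can be illustrated as follows. Since the text only specifies a cost matrix built from weighted feature similarities and a globally optimal assignment, this sketch uses an exhaustive search as a stand-in for the Hungarian algorithm (which returns the same minimum-cost matching in polynomial time); the per-feature similarity matrices and the 0.5/0.3/0.2 weights are made-up examples.

```python
import numpy as np
from itertools import permutations

def optimal_assignment(cost):
    """Globally optimal trajectory-detection assignment by exhaustive search.
    Illustrative stand-in for the Hungarian algorithm; practical only for
    small cost matrices."""
    n_tracks, n_dets = cost.shape
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(n_dets), min(n_tracks, n_dets)):
        pairs = list(enumerate(perm))            # (trajectory index, detection index)
        total = sum(cost[t, d] for t, d in pairs)
        if total < best_cost:
            best_cost, best_pairs = total, pairs
    return best_pairs

# Cost matrix = 1 - weighted feature similarity; weights sum to 1.
w_pos, w_size, w_app = 0.5, 0.3, 0.2              # hypothetical feature weights
sim = (w_pos  * np.array([[0.9, 0.2], [0.1, 0.8]])   # position similarity
     + w_size * np.array([[0.8, 0.3], [0.2, 0.9]])   # size similarity
     + w_app  * np.array([[0.7, 0.1], [0.3, 0.6]]))  # appearance similarity
pairs = optimal_assignment(1.0 - sim)             # -> [(0, 0), (1, 1)]
```

In practice the exhaustive search would be replaced by a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`; the cost construction is unchanged.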
对于匹配上已有目标跟踪轨迹的第二目标,将第二目标赋予匹配的目标跟踪轨迹的ID,完善已有目标跟踪轨迹;对于未匹配上目标跟踪轨迹的第二目标,终端可以为该目标赋予一个新的目标跟踪轨迹ID,新建一条目标跟踪轨迹。For a second target matched to an existing target tracking trajectory, the second target is assigned the ID of the matched trajectory, refining that trajectory; for a second target not matched to any target tracking trajectory, the terminal may assign the target a new target tracking trajectory ID and create a new target tracking trajectory.
对于未匹配上第二目标的目标跟踪轨迹,终端可以将所述目标跟踪轨迹与所述目标跟踪轨迹在点云和/或图像中的预测目标关联,完善该目标跟踪轨迹,避免因漏检等原因,造成同一目标对应多个目标跟踪轨迹。For a target tracking trajectory not matched to any second target, the terminal may associate the trajectory with its predicted target in the point cloud and/or the image, refining the trajectory and preventing a missed detection or similar cause from producing multiple target tracking trajectories for the same target.
需要理解的是,虽然第二目标中已涵盖有目标跟踪轨迹的预测目标,但是如果该预测目标在点云中的三维空间位置和在图像中的二维空间位置未真实出现,则目标特征仍不会与该目标跟踪轨迹的目标特征匹配成功。It should be understood that although the second targets already cover the predicted targets of the target tracking trajectories, if a predicted target does not actually appear at its three-dimensional spatial position in the point cloud and its two-dimensional spatial position in the image, its target features will still fail to match the target features of that target tracking trajectory.
另外,为了避免对已移出检测范围的目标进行检测及跟踪浪费处理资源,对于未匹配到第二目标的目标跟踪轨迹,将该目标跟踪轨迹与该目标跟踪轨迹在点云和/或图像中的预测目标关联之前,如果该目标跟踪轨迹关联预测目标的次数大于或等于第一阈值,终端删除该目标跟踪轨迹。In addition, to avoid wasting processing resources on detecting and tracking targets that have moved out of the detection range, for a target tracking trajectory not matched to any second target, before associating the trajectory with its predicted target in the point cloud and/or the image, the terminal deletes the trajectory if the number of times it has been associated with a predicted target is greater than or equal to a first threshold.
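The trajectory lifecycle rules above — reset on a matched detection, create a new trajectory for an unmatched detection, coast an unmatched trajectory on its predicted target, and delete a trajectory once its consecutive predicted-target associations reach the first threshold — can be sketched as follows. The class and field names are illustrative; real trajectories per the text also store target features and per-frame 3D/2D positions.

```python
class Track:
    """Minimal trajectory record (sketch; names are hypothetical)."""
    _next_id = 0
    def __init__(self):
        self.id = Track._next_id
        Track._next_id += 1
        self.miss_count = 0   # consecutive associations with a predicted target

def update_tracks(tracks, matches, unmatched_dets, unmatched_tracks,
                  first_threshold=3):
    """One per-frame update implementing the lifecycle rules above."""
    for trk, _det in matches:
        trk.miss_count = 0                      # associated with a real detection
    for trk in unmatched_tracks:
        if trk.miss_count >= first_threshold:
            tracks.remove(trk)                  # delete stale trajectory
        else:
            trk.miss_count += 1                 # coast on the predicted target
    for _det in unmatched_dets:
        tracks.append(Track())                  # new trajectory for a new target
    return tracks

# Three frames with no detections: the lone trajectory coasts twice, then is
# deleted once its predicted-target associations reach the threshold of 2.
tracks = [Track()]
for _ in range(3):
    update_tracks(tracks, [], [], list(tracks), first_threshold=2)
new_tracks = update_tracks([], [], ["detection"], [])  # unmatched detection
```

The choice of threshold trades latency for robustness: a larger value tolerates longer occlusions at the cost of keeping stale trajectories alive.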
上述主要从方法流程的角度对本申请提供的方案进行了介绍,下述将从硬件或逻辑划分模块的角度对本申请实施例的技术方案进行详细阐述。可以理解的是,为了实现上述功能,装置可以包括执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。The solutions provided by the present application are mainly introduced from the perspective of method flow above, and the technical solutions of the embodiments of the present application will be described in detail below from the perspective of hardware or logical division modules. It can be understood that, in order to realize the above-mentioned functions, the apparatus may include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should easily realize that the present application can be implemented in hardware or a combination of hardware and computer software with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
在采用集成的单元的情况下,图8示出了本申请实施例中所涉及的目标检测装置的可能的示例性框图,该目标检测装置800可以以软件模块或硬件模块的形式存在。目标检测装置800可以包括:获取单元803和处理单元802。一种示例中,该装置可以为芯片。In the case of using an integrated unit, FIG. 8 shows a possible exemplary block diagram of the target detection apparatus involved in the embodiment of the present application, and the target detection apparatus 800 may exist in the form of a software module or a hardware module. The target detection apparatus 800 may include: an acquisition unit 803 and a processing unit 802 . In one example, the device may be a chip.
可选的,装置800还可以包括存储单元801,用于存储装置800的程序代码和/或数据。Optionally, the apparatus 800 may further include a storage unit 801 for storing program codes and/or data of the apparatus 800 .
具体地,在一个实施例中,获取单元803,用于获取来自三维扫描设备的点云和来自视觉传感器的图像;Specifically, in one embodiment, the acquiring unit 803 is configured to acquire the point cloud from the three-dimensional scanning device and the image from the vision sensor;
处理单元802,用于将所述点云和至少一个目标跟踪轨迹在所述点云中预测目标的三维空间位置输入到目标检测模型进行处理,得到至少一个第一目标的三维空间位置,其中所述目标检测模型是基于已知目标跟踪轨迹对应的预测目标的三维空间位置的多个点云样本,以及与所述多个点云样本一一对应的多个目标的三维空间位置检测结果训练得到的;The processing unit 802 is configured to input the point cloud, together with the three-dimensional spatial positions of targets predicted in the point cloud by at least one target tracking trajectory, into a target detection model for processing, to obtain the three-dimensional spatial position of at least one first target, where the target detection model is trained on multiple point cloud samples with known three-dimensional spatial positions of targets predicted by target tracking trajectories, and on three-dimensional spatial position detection results of multiple targets in one-to-one correspondence with the multiple point cloud samples;
所述处理单元802,还用于根据所述至少一个第一目标的三维空间位置在所述图像中的投影和所述至少一个目标跟踪轨迹在所述图像中预测目标的二维空间位置,确定所述图像中至少一个第二目标的二维空间位置;The processing unit 802 is further configured to determine the two-dimensional spatial position of at least one second target in the image according to the projection of the three-dimensional spatial position of the at least one first target in the image and the two-dimensional spatial positions of targets predicted in the image by the at least one target tracking trajectory;
所述处理单元802,还用于根据所述至少一个第二目标的二维空间位置在所述点云中的投影,确定所述点云中所述至少一个第二目标的三维空间位置。The processing unit 802 is further configured to determine the three-dimensional spatial position of the at least one second target in the point cloud according to the projection of the two-dimensional spatial position of the at least one second target in the point cloud.
在一种可能的设计中,所述处理单元802,还用于根据所述至少一个目标跟踪轨迹对应的目标特征以及所述至少一个第二目标的目标特征,对所述至少一个目标跟踪轨迹和所述至少一个第二目标进行匹配;将匹配的所述目标跟踪轨迹和所述第二目标关联。In a possible design, the processing unit 802 is further configured to match the at least one target tracking trajectory against the at least one second target according to the target features corresponding to the at least one target tracking trajectory and the target features of the at least one second target, and to associate the matched target tracking trajectory with the second target.
在一种可能的设计中,所述处理单元802,还用于对于未匹配到所述目标跟踪轨迹的所述第二目标,建立所述第二目标对应的目标跟踪轨迹。In a possible design, the processing unit 802 is further configured to establish a target tracking trajectory corresponding to the second target for the second target that is not matched to the target tracking trajectory.
在一种可能的设计中,所述处理单元802,还用于对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联。In a possible design, the processing unit 802 is further configured to, for a target tracking trajectory not matched to the second target, associate the target tracking trajectory with the predicted target of the target tracking trajectory in the point cloud and/or the image.
在一种可能的设计中,所述处理单元802对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联之前,还用于当所述目标跟踪轨迹关联预测目标的次数大于或等于第一阈值时,删除所述目标跟踪轨迹。In a possible design, before associating a target tracking trajectory not matched to the second target with the predicted target of that trajectory in the point cloud and/or the image, the processing unit 802 is further configured to delete the target tracking trajectory when the number of times the trajectory has been associated with a predicted target is greater than or equal to a first threshold.
在一种可能的设计中,所述目标特征包括以下中的一项或多项:位置、长宽高尺寸、速度、方向、类别、点云点数、点云各方向坐标数值分布、点云反射强度分布、外观特征、深度特征。In a possible design, the target features include one or more of the following: position, length-width-height size, speed, direction, category, number of point cloud points, coordinate value distribution in each direction of the point cloud, point cloud reflection intensity distribution, appearance features, and depth features.
在一种可能的设计中,所述获取单元803,还用于获取来自三维扫描设备的标定物点云和来自视觉传感器的标定物图像;In a possible design, the acquiring unit 803 is further configured to acquire the calibration object point cloud from the three-dimensional scanning device and the calibration object image from the vision sensor;
所述处理单元802,还用于根据标定物中多个标定点在所述标定物点云中的三维坐标以及在所述标定物图像中的二维坐标,确定点云坐标系和图像坐标系的投影矩阵。The processing unit 802 is further configured to determine a projection matrix between the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of multiple calibration points of a calibration object in the calibration object point cloud and their two-dimensional coordinates in the calibration object image.
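The calibration step above can be sketched with a direct linear transform (DLT): given the 3D coordinates of calibration points in the point cloud and their 2D pixel coordinates in the image, a least-squares solution of the homogeneous projection equations yields the projection matrix. This is a generic textbook method offered as an illustration — the text does not specify the estimation algorithm — and the synthetic matrix and points below are made up.

```python
import numpy as np

def dlt_projection_matrix(pts3d, pts2d):
    """Estimate the 3x4 projection matrix mapping point-cloud coordinates to
    image pixels from >= 6 calibration-point correspondences (basic DLT; a
    real calibration would normalize the data and refine nonlinearly)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 4)      # null-space vector -> projection matrix

# Synthetic check: recover a known projection from 6 non-coplanar points
P_true = np.array([[500.0, 0.0, 320.0, 10.0],
                   [0.0, 500.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 1.0]])
pts3d = np.array([[0, 0, 2], [1, 0, 3], [0, 1, 4],
                  [1, 1, 5], [2, 1, 3], [1, 2, 4]], dtype=float)
homo = np.hstack([pts3d, np.ones((6, 1))]) @ P_true.T
pts2d = homo[:, :2] / homo[:, 2:3]           # exact 2D observations
P_est = dlt_projection_matrix(pts3d, pts2d)
P_est /= P_est[2, 3]   # projection matrices are defined only up to scale
```

With noisy measurements, more correspondences and a normalized DLT followed by reprojection-error minimization would be used; the equation setup is the same.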
如图9所示,本申请实施例还提供一种目标检测装置900,如图9所示,目标检测装置900包括至少一个处理器902以及接口电路。进一步,所述装置还包括至少一个存储器901,所述至少一个存储器901和处理器902连接。所述接口电路用于为所述至少一个处理器提供数据和/或信息的输入输出。存储器901用于存储计算机执行指令,当目标检测装置900运行时,处理器902执行存储器901中存储的计算机执行指令,以使目标检测装置900实现上述目标检测方法,具体目标检测方法的实现可参考上文及其附图的相关描述,在此不做赘述。As shown in FIG. 9, an embodiment of the present application further provides a target detection apparatus 900. The target detection apparatus 900 includes at least one processor 902 and an interface circuit. Further, the apparatus also includes at least one memory 901 connected to the processor 902. The interface circuit is used to provide input and output of data and/or information for the at least one processor. The memory 901 is used to store computer-executable instructions; when the target detection apparatus 900 runs, the processor 902 executes the computer-executable instructions stored in the memory 901 so that the target detection apparatus 900 implements the above target detection method. For the implementation of the specific target detection method, refer to the relevant descriptions above and in the accompanying drawings, which are not repeated here.
作为本实施例的另一种形式,提供一种计算机可读存储介质,其上存储有程序或指令,该程序或指令被执行时可以执行上述方法实施例中的目标检测方法。As another form of this embodiment, a computer-readable storage medium is provided, on which a program or an instruction is stored, and when the program or instruction is executed, the target detection method in the above method embodiment can be executed.
作为本实施例的另一种形式,提供一种包含指令的计算机程序产品,该指令被执行时可以执行上述方法实施例中的目标检测方法。As another form of this embodiment, a computer program product including an instruction is provided, and when the instruction is executed, the target detection method in the above method embodiment can be executed.
作为本实施例的另一种形式,提供一种芯片,所述芯片可以与存储器耦合,用于调用存储器中存储的计算机程序产品,以实现上述方法实施例中的目标检测方法。As another form of this embodiment, a chip is provided. The chip can be coupled with a memory and is used to call a computer program product stored in the memory to implement the target detection method in the above method embodiments.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。While the preferred embodiments of the present application have been described, additional changes and modifications to these embodiments may occur to those skilled in the art once the basic inventive concepts are known. Therefore, the appended claims are intended to be construed to include the preferred embodiment and all changes and modifications that fall within the scope of this application.
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的精神和范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.

Claims (19)

  1. 一种目标检测方法,其特征在于,包括:A target detection method, comprising:
    获取来自三维扫描设备的点云和来自视觉传感器的图像;Acquire point clouds from 3D scanning equipment and images from vision sensors;
    将所述点云和至少一个目标跟踪轨迹在所述点云中预测目标的三维空间位置输入到目标检测模型进行处理,得到至少一个第一目标的三维空间位置;inputting the point cloud, and three-dimensional spatial positions of targets predicted in the point cloud by at least one target tracking trajectory, into a target detection model for processing, to obtain a three-dimensional spatial position of at least one first target;
    根据所述至少一个第一目标的三维空间位置在所述图像中的投影和所述至少一个目标跟踪轨迹在所述图像中预测目标的二维空间位置,确定所述图像中至少一个第二目标的二维空间位置;determining a two-dimensional spatial position of at least one second target in the image according to a projection of the three-dimensional spatial position of the at least one first target in the image and a two-dimensional spatial position of the target predicted in the image by the at least one target tracking trajectory;
    根据所述至少一个第二目标的二维空间位置在所述点云中的投影,确定所述点云中所述至少一个第二目标的三维空间位置。According to the projection of the two-dimensional spatial position of the at least one second target in the point cloud, the three-dimensional spatial position of the at least one second target in the point cloud is determined.
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:The method of claim 1, wherein the method further comprises:
    根据所述至少一个目标跟踪轨迹对应的目标特征以及所述至少一个第二目标的目标特征,对所述至少一个目标跟踪轨迹和所述至少一个第二目标进行匹配;matching the at least one target tracking trajectory and the at least one second target according to the target feature corresponding to the at least one target tracking trajectory and the target feature of the at least one second target;
    将匹配的所述目标跟踪轨迹和所述第二目标关联。Associate the matched target tracking trajectory with the second target.
  3. 如权利要求2所述的方法,其特征在于,所述方法还包括:The method of claim 2, wherein the method further comprises:
    对于未匹配到所述目标跟踪轨迹的所述第二目标,建立所述第二目标对应的目标跟踪轨迹。For the second target that is not matched to the target tracking trajectory, a target tracking trajectory corresponding to the second target is established.
  4. 如权利要求2或3所述的方法,其特征在于,所述方法还包括:The method of claim 2 or 3, wherein the method further comprises:
    对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联。For the target tracking trajectory that is not matched to the second target, the target tracking trajectory is associated with a predicted target of the target tracking trajectory in the point cloud and/or the image.
  5. 如权利要求4所述的方法,其特征在于,所述对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联之前,所述方法还包括:The method according to claim 4, wherein before associating the target tracking trajectory that is not matched to the second target with the predicted target of the target tracking trajectory in the point cloud and/or the image, the method further comprises:
    当所述目标跟踪轨迹关联预测目标的次数大于或等于第一阈值时,删除所述目标跟踪轨迹。When the number of times that the target tracking trajectory is associated with the predicted target is greater than or equal to a first threshold, the target tracking trajectory is deleted.
  6. 如权利要求2-5中任一项所述的方法,其特征在于,所述目标特征包括以下中的一项或多项:The method of any one of claims 2-5, wherein the target feature comprises one or more of the following:
    位置、尺寸、速度、方向、类别、点云点数、点云各方向坐标数值分布、点云反射强度分布、外观特征、深度特征。Position, size, speed, direction, category, number of point cloud points, numerical distribution of coordinates in each direction of point cloud, distribution of reflection intensity of point cloud, appearance feature, depth feature.
  7. 如权利要求1-6中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-6, wherein the method further comprises:
    获取来自三维扫描设备的标定物点云和来自视觉传感器的标定物图像;Obtain the calibration object point cloud from the 3D scanning device and the calibration object image from the vision sensor;
    根据标定物中多个标定点在所述标定物点云中的三维坐标以及在所述标定物图像中的二维坐标,确定点云坐标系和图像坐标系的投影矩阵。The projection matrix of the point cloud coordinate system and the image coordinate system is determined according to the three-dimensional coordinates of the multiple calibration points in the calibration object in the point cloud of the calibration object and the two-dimensional coordinates in the image of the calibration object.
  8. 一种目标检测装置,其特征在于,包括:A target detection device, comprising:
    获取单元,用于获取来自三维扫描设备的点云和来自视觉传感器的图像;an acquisition unit for acquiring the point cloud from the 3D scanning device and the image from the vision sensor;
    处理单元,用于将所述点云和至少一个目标跟踪轨迹在所述点云中预测目标的三维空间位置输入到目标检测模型进行处理,得到至少一个第一目标的三维空间位置;a processing unit, configured to input the point cloud, and three-dimensional spatial positions of targets predicted in the point cloud by at least one target tracking trajectory, into a target detection model for processing, to obtain a three-dimensional spatial position of at least one first target;
    所述处理单元,还用于根据所述至少一个第一目标的三维空间位置在所述图像中的投影和所述至少一个目标跟踪轨迹在所述图像中预测目标的二维空间位置,确定所述图像中至少一个第二目标的二维空间位置;The processing unit is further configured to determine a two-dimensional spatial position of at least one second target in the image according to a projection of the three-dimensional spatial position of the at least one first target in the image and a two-dimensional spatial position of the target predicted in the image by the at least one target tracking trajectory;
    所述处理单元,还用于根据所述至少一个第二目标的二维空间位置在所述点云中的投影,确定所述点云中所述至少一个第二目标的三维空间位置。The processing unit is further configured to determine the three-dimensional space position of the at least one second target in the point cloud according to the projection of the two-dimensional space position of the at least one second target in the point cloud.
  9. 如权利要求8所述的装置,其特征在于,所述处理单元,还用于根据所述至少一个目标跟踪轨迹对应的目标特征以及所述至少一个第二目标的目标特征,对所述至少一个目标跟踪轨迹和所述至少一个第二目标进行匹配;将匹配的所述目标跟踪轨迹和所述第二目标关联。The apparatus according to claim 8, wherein the processing unit is further configured to match the at least one target tracking trajectory with the at least one second target according to the target feature corresponding to the at least one target tracking trajectory and the target feature of the at least one second target, and to associate the matched target tracking trajectory with the second target.
  10. 如权利要求9所述的装置,其特征在于,所述处理单元,还用于对于未匹配到所述目标跟踪轨迹的所述第二目标,建立所述第二目标对应的目标跟踪轨迹。The apparatus according to claim 9, wherein the processing unit is further configured to establish a target tracking trajectory corresponding to the second target for the second target that is not matched to the target tracking trajectory.
  11. 如权利要求9或10所述的装置,其特征在于,所述处理单元,还用于对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联。The apparatus according to claim 9 or 10, wherein the processing unit is further configured to, for the target tracking trajectory that is not matched to the second target, associate the target tracking trajectory with the predicted target of the target tracking trajectory in the point cloud and/or the image.
  12. 如权利要求11所述的装置,其特征在于,所述处理单元对于未匹配到所述第二目标的所述目标跟踪轨迹,将所述目标跟踪轨迹与所述目标跟踪轨迹在所述点云和/或所述图像中的预测目标关联之前,还用于当所述目标跟踪轨迹关联预测目标的次数大于或等于第一阈值时,删除所述目标跟踪轨迹。The apparatus according to claim 11, wherein, for the target tracking trajectory that is not matched to the second target, before associating the target tracking trajectory with the predicted target of the target tracking trajectory in the point cloud and/or the image, the processing unit is further configured to delete the target tracking trajectory when the number of times the target tracking trajectory is associated with a predicted target is greater than or equal to a first threshold.
  13. 如权利要求9-12中任一项所述的装置,其特征在于,所述目标特征包括以下中的一项或多项:位置、尺寸、速度、方向、类别、点云点数、点云各方向坐标数值分布、点云反射强度分布、外观特征、深度特征。The apparatus according to any one of claims 9-12, wherein the target feature comprises one or more of the following: position, size, speed, direction, category, number of point cloud points, coordinate value distribution in each direction of the point cloud, point cloud reflection intensity distribution, appearance features, and depth features.
  14. 如权利要求8-13中任一项所述的装置,其特征在于,所述获取单元,还用于获取来自三维扫描设备的标定物点云和来自视觉传感器的标定物图像;The device according to any one of claims 8-13, wherein the acquisition unit is further configured to acquire a calibration object point cloud from a three-dimensional scanning device and a calibration object image from a vision sensor;
    所述处理单元,还用于根据标定物中多个标定点在所述标定物点云中的三维坐标以及在所述标定物图像中的二维坐标,确定点云坐标系和图像坐标系的投影矩阵。The processing unit is further configured to determine a projection matrix between the point cloud coordinate system and the image coordinate system according to the three-dimensional coordinates of a plurality of calibration points of the calibration object in the calibration object point cloud and the two-dimensional coordinates in the calibration object image.
  15. 一种目标检测装置,其特征在于,包括至少一个处理器和接口;A target detection device, comprising at least one processor and an interface;
    所述至少一个处理器用于从所述接口调用并运行计算机程序,当所述至少一个处理器执行所述计算机程序时,实现如权利要求1-7中任一项所述的方法。The at least one processor is configured to invoke and run a computer program from the interface, and when the computer program is executed by the at least one processor, the method according to any one of claims 1-7 is implemented.
  16. 一种芯片系统,其特征在于,所述芯片系统包括:至少一个处理器和接口;A chip system, characterized in that the chip system includes: at least one processor and an interface;
    所述至少一个处理器用于从所述接口调用并运行计算机程序,当所述至少一个处理器执行所述计算机程序时,实现如权利要求1-7中任一项所述的方法。The at least one processor is configured to invoke and run a computer program from the interface, and when the computer program is executed by the at least one processor, the method according to any one of claims 1-7 is implemented.
  17. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行如权利要求1-7中任一项所述的方法。A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a computer, the computer is made to execute any one of claims 1-7 the method described.
  18. 一种终端,其特征在于,所述终端包括如权利要求8-14中任一项所述的目标检测装置。A terminal, characterized in that the terminal comprises the target detection device according to any one of claims 8-14.
  19. 如权利要求18所述的终端,其特征在于,所述终端为车辆、无人机或机器人。The terminal of claim 18, wherein the terminal is a vehicle, a drone or a robot.
PCT/CN2022/078611 2021-03-09 2022-03-01 Target detection method and apparatus WO2022188663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110256851.2A CN115049700A (en) 2021-03-09 2021-03-09 Target detection method and device
CN202110256851.2 2021-03-09

Publications (1)

Publication Number Publication Date
WO2022188663A1 true WO2022188663A1 (en) 2022-09-15

Family

ID=83156444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078611 WO2022188663A1 (en) 2021-03-09 2022-03-01 Target detection method and apparatus

Country Status (2)

Country Link
CN (1) CN115049700A (en)
WO (1) WO2022188663A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830079A (en) * 2023-02-15 2023-03-21 天翼交通科技有限公司 Method, device and medium for tracking trajectory of traffic participant
CN115965824A (en) * 2023-03-01 2023-04-14 安徽蔚来智驾科技有限公司 Point cloud data labeling method, point cloud target detection equipment and storage medium
CN116071231A (en) * 2022-12-16 2023-05-05 群滨智造科技(苏州)有限公司 Method, device, equipment and medium for generating ink-dispensing process track of glasses frame
CN116430338A (en) * 2023-03-20 2023-07-14 北京中科创益科技有限公司 Method, system and equipment for tracking moving target
CN116952988A (en) * 2023-09-21 2023-10-27 斯德拉马机械(太仓)有限公司 2D line scanning detection method and system for ECU (electronic control Unit) product
CN117252992A (en) * 2023-11-13 2023-12-19 整数智能信息技术(杭州)有限责任公司 4D road scene labeling method and device based on time sequence data and electronic equipment
CN117523379A (en) * 2023-11-20 2024-02-06 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117576166A (en) * 2024-01-15 2024-02-20 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664790B (en) * 2023-07-26 2023-11-17 昆明人为峰科技有限公司 Three-dimensional terrain analysis system and method based on unmanned aerial vehicle mapping

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871129A (en) * 2016-09-27 2018-04-03 北京百度网讯科技有限公司 Method and apparatus for handling cloud data
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
US20200160542A1 (en) * 2018-11-15 2020-05-21 Toyota Research Institute, Inc. Systems and methods for registering 3d data with 2d image data
CN111709923A (en) * 2020-06-10 2020-09-25 中国第一汽车股份有限公司 Three-dimensional object detection method and device, computer equipment and storage medium
CN112102409A (en) * 2020-09-21 2020-12-18 杭州海康威视数字技术股份有限公司 Target detection method, device, equipment and storage medium
CN112270272A (en) * 2020-10-31 2021-01-26 武汉中海庭数据技术有限公司 Method and system for extracting road intersections in high-precision map making
US20210043002A1 (en) * 2018-09-11 2021-02-11 Tencent Technology (Shenzhen) Company Limited Object annotation method and apparatus, movement control method and apparatus, device, and storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071231A (en) * 2022-12-16 2023-05-05 群滨智造科技(苏州)有限公司 Method, device, equipment and medium for generating ink-dispensing process track of glasses frame
CN116071231B (en) * 2022-12-16 2023-12-29 群滨智造科技(苏州)有限公司 Method, device, equipment and medium for generating ink-dispensing process track of glasses frame
CN115830079A (en) * 2023-02-15 2023-03-21 天翼交通科技有限公司 Method, device and medium for tracking trajectory of traffic participant
CN115965824A (en) * 2023-03-01 2023-04-14 安徽蔚来智驾科技有限公司 Point cloud data labeling method, point cloud target detection equipment and storage medium
CN116430338B (en) * 2023-03-20 2024-05-10 北京中科创益科技有限公司 Method, system and equipment for tracking moving target
CN116430338A (en) * 2023-03-20 2023-07-14 北京中科创益科技有限公司 Method, system and equipment for tracking moving target
CN116952988A (en) * 2023-09-21 2023-10-27 斯德拉马机械(太仓)有限公司 2D line scanning detection method and system for ECU (electronic control Unit) product
CN116952988B (en) * 2023-09-21 2023-12-08 斯德拉马机械(太仓)有限公司 2D line scanning detection method and system for ECU (electronic control Unit) product
CN117252992A (en) * 2023-11-13 2023-12-19 整数智能信息技术(杭州)有限责任公司 4D road scene labeling method and device based on time sequence data and electronic equipment
CN117252992B (en) * 2023-11-13 2024-02-23 整数智能信息技术(杭州)有限责任公司 4D road scene labeling method and device based on time sequence data and electronic equipment
CN117523379B (en) * 2023-11-20 2024-04-30 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117523379A (en) * 2023-11-20 2024-02-06 广东海洋大学 Underwater photographic target positioning method and system based on AI
CN117576166A (en) * 2024-01-15 2024-02-20 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117576166B (en) * 2024-01-15 2024-04-30 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar

Also Published As

Publication number Publication date
CN115049700A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
WO2022188663A1 (en) Target detection method and apparatus
CN111337941B (en) Dynamic obstacle tracking method based on sparse laser radar data
WO2020043041A1 (en) Method and device for point cloud data partitioning, storage medium, and electronic device
CN111665842B (en) Indoor SLAM mapping method and system based on semantic information fusion
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
CN111260683A (en) Target detection and tracking method and device for three-dimensional point cloud data
US11556745B2 (en) System and method for ordered representation and feature extraction for point clouds obtained by detection and ranging sensor
CN114842438A (en) Terrain detection method, system and readable storage medium for autonomous driving vehicle
CN111213153A (en) Target object motion state detection method, device and storage medium
CN113192091A (en) Long-distance target sensing method based on laser radar and camera fusion
CN113345008A (en) Laser radar dynamic obstacle detection method considering wheel type robot position and posture estimation
CN114454875A (en) Urban road automatic parking method and system based on reinforcement learning
CN113936198A (en) Low-beam laser radar and camera fusion method, storage medium and device
CN114972968A (en) Tray identification and pose estimation method based on multiple neural networks
Choe et al. Fast point cloud segmentation for an intelligent vehicle using sweeping 2D laser scanners
CN115376109B (en) Obstacle detection method, obstacle detection device, and storage medium
CN115861968A (en) Dynamic obstacle removing method based on real-time point cloud data
CN116109601A (en) Real-time target detection method based on three-dimensional laser radar point cloud
CN114998276A (en) Robot dynamic obstacle real-time detection method based on three-dimensional point cloud
CN115201849A (en) Indoor map building method based on vector map
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
Zhao et al. Omni-Directional Obstacle Detection for Vehicles Based on Depth Camera
CN115685237A (en) Multi-mode three-dimensional target detection method and system combining viewing cones and geometric constraints
CN116863325A (en) Method for multiple target detection and related product
Alonso et al. Footprint-based classification of road moving objects using occupancy grids

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 22766189
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 22766189
Country of ref document: EP
Kind code of ref document: A1