CN114937177A - Automatic marking and detection model training and target recognition method and electronic equipment - Google Patents


Info

Publication number
CN114937177A
CN114937177A (application number CN202210631423.8A)
Authority
CN
China
Prior art keywords
tracking
target
point cloud
detection
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210631423.8A
Other languages
Chinese (zh)
Inventor
刘袁
蔡思佳
陈静远
邓兵
黄建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Alibaba China Co Ltd
Priority to CN202210631423.8A
Publication of CN114937177A
Legal status: Pending


Classifications

    (Top-level sections: G - Physics; G06 - Computing, calculating or counting; G06V - Image or video recognition or understanding; G06N - Computing arrangements based on specific computational models; G06T - Image data processing or generation, in general.)
    • G06V 10/764 - Recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06N 3/02, G06N 3/04, G06N 3/045 - Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06T 17/00 - Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/20 - Image analysis; analysis of motion
    • G06T 7/70, G06T 7/73 - Determining position or orientation of objects or cameras; using feature-based methods
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/74 - Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 - Recognition or understanding using neural networks
    • G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/10028 - Range image; depth image; 3D point clouds
    • G06T 2207/20081 - Training; learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06V 2201/07 - Target detection

Abstract

The embodiments of the present application provide an automatic marking method, a detection model training method, a target recognition method, and electronic equipment. The automatic marking method includes: determining tracking targets that are not tracking-matched according to the results of target detection on adjacent frames of 3D point cloud data; setting a tracking task for each tracking target that is not tracking-matched and tracking it continuously until no 3D point cloud data of the tracking target exists in its corresponding detection frame, or until the tracking target is determined, according to the 3D point cloud data, to have exceeded a preset range; and marking data for the tracking target that is not tracking-matched according to the tracking result. With the embodiments of the present application, data marking is more accurate, the marking quality is higher, and the detection model is trained more effectively. When applied to target recognition, the method also yields more accurate recognition results.

Description

Automatic marking and detection model training and target recognition method and electronic equipment
Technical Field
The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to an automatic marking method, a detection model training method, a target recognition method, and corresponding electronic equipment.
Background
With the development of laser radar technology, laser radar is widely used in scenarios such as unmanned driving, vehicle-road cooperation, and cleaning robots. In these scenarios, target detection is performed on the 3D point cloud data acquired by the laser radar, so that information such as the category, scale, and distance of obstacles around a device or vehicle can be perceived in real time, providing an effective basis for the decision planning and control of unmanned driving, vehicle-road cooperation, cleaning robots, and the like. Target detection is mostly realized through a detection model, so a target detection model based on 3D point cloud data can be regarded as the cornerstone that allows laser radar to play its role in each of these scenarios.
A target detection model needs to be obtained through training on training samples of 3D point cloud data. Besides the 3D point cloud data itself, the training samples also need marking data for the 3D point cloud data. The better the 3D point cloud data is marked, the better the training effect of the target detection model and the higher its robustness. Therefore, the marking quality of the 3D point cloud data directly influences the training effect of the target detection model.
However, in terms of data marking, due to the complexity and data characteristics of 3D point cloud data and the differences among laser radar use scenarios, the marking quality of 3D point cloud data is often poor and the marking information inaccurate.
Therefore, how to perform efficient and accurate marking on the 3D point cloud data becomes an urgent problem to be solved.
Disclosure of Invention
In view of the above, embodiments of the present application provide an automatic marking and detection model training scheme to at least partially solve the above problems.
According to a first aspect of the embodiments of the present application, there is provided an automatic marking method, including: determining a tracking target that is not tracking-matched according to a detection result of target detection on adjacent frames of 3D point cloud data; setting a tracking task for the tracking target that is not tracking-matched and continuously tracking it until no 3D point cloud data of the tracking target exists in the detection frame corresponding to the tracking target, or until it is determined, according to the 3D point cloud data, that the tracking target exceeds a preset range; and marking data for the tracking target that is not tracking-matched according to the tracking result.
According to a second aspect of the embodiments of the present application, there is provided a detection model training method, where the detection model includes a backbone network portion, a detection head network portion, and an auxiliary network portion; the method comprises the following steps: acquiring 3D point cloud sample data, and performing feature extraction on the 3D point cloud sample data through a backbone network part of the detection model to obtain corresponding feature vectors; performing target detection based on the characteristic vector through a detection head network part of the detection model to obtain a detection result; predicting whether each position in the 3D point cloud sample data belongs to a target object sample or not through an auxiliary network part of the detection model based on the characteristic vector, and outputting prediction information; obtaining a first loss value according to the detection result and a first loss function corresponding to the detection result; obtaining a second loss value according to the prediction information and a second loss function corresponding to the prediction information; and training the detection model according to the first loss value and the second loss value.
According to a third aspect of the embodiments of the present application, there is provided a target identification method, including: acquiring a 3D point cloud data stream acquired in real time; determining a tracking target matched with a tracking target and a tracking target not matched with the tracking target according to a detection result of target detection on adjacent frames of 3D point cloud data in the 3D point cloud data stream; tracking a tracking target which is tracked and matched based on a detection result, setting a tracking task for the tracking target which is not tracked and matched, and continuously tracking until 3D point cloud data of the tracking target do not exist in a detection frame corresponding to the tracking target, or determining that the tracking target exceeds a preset range according to the 3D point cloud data; and identifying the tracked target according to the tracking result of the tracked and matched tracked target and the tracking result of the tracked and unmatched tracked target.
According to a fourth aspect of the embodiments of the present application, there is provided electronic equipment, including: an image acquisition device, a laser radar, a display, a processor, a communication interface, and a communication bus, wherein the image acquisition device, the laser radar, the display, the processor, and the communication interface communicate with one another through the communication bus; wherein: the laser radar is used for acquiring 3D point cloud data of the surrounding environment in real time to form a 3D point cloud data stream; the processor is used for determining tracking targets that are tracking-matched and tracking targets that are not tracking-matched according to a detection result of target detection on adjacent frames of 3D point cloud data in the 3D point cloud data stream; tracking the tracking-matched tracking targets based on the detection result, setting a tracking task for each tracking target that is not tracking-matched, and continuously tracking it until no 3D point cloud data of the tracking target exists in the detection frame corresponding to the tracking target, or until it is determined according to the 3D point cloud data that the tracking target exceeds a preset range; identifying the tracking targets according to the tracking results of the tracking-matched tracking targets and the tracking results of the tracking targets that are not tracking-matched; and superposing the result of the tracking target recognition on a preset image, or carrying out 3D modeling based on the result of the tracking target recognition and images acquired by the image acquisition device in real time to obtain a 3D virtual scene; the display is configured to display an image on which the result of the tracking target recognition is superimposed, or to display the 3D virtual scene.
According to a fifth aspect of embodiments herein, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in the first or second or third aspect.
According to a sixth aspect of embodiments herein, there is provided a computer program product comprising computer instructions for instructing a computing device to perform operations corresponding to the method of the first aspect, the second aspect or the third aspect.
According to the automatic marking scheme provided by the embodiments of the present application, whether there are tracking targets that cannot be tracking-matched is determined based on the detection result of the 3D point cloud data. Unlike the conventional approach, in which such a tracked object is deleted if it is not detected in the following N frames, in the scheme of the embodiments of the present application these tracked objects are continuously tracked until no 3D point cloud data of the tracked object remains in the corresponding detection frame or the tracked object exceeds a preset distance range. Therefore, through this tracking approach, detection is carried out by combining multiple frames of 3D point cloud data, the detection result is more accurate, and the problems of missed detection and false detection caused by occlusion of a tracking target or sparse point clouds are effectively avoided. Further, data marking based on this approach is more accurate, and the marking quality is higher.
According to the detection model training scheme provided by the embodiment of the application, an auxiliary network part for detecting whether each position in the 3D point cloud sample data belongs to a target object sample or not is added in the detection model to assist the training of the whole detection model, so that the trained detection model can extract more detailed and comprehensive 3D point cloud data characteristics, and the detection model has better detection performance and robustness. Therefore, the detection model is more accurate in subsequent detection and tracking of the 3D point cloud data, and the marking of the 3D point cloud data based on the detection and tracking results is more accurate and high in quality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the description below are only some embodiments described in the embodiments of the present application, and other drawings can be obtained by those skilled in the art according to these drawings.
FIG. 1 is a schematic diagram of an exemplary system suitable for use with embodiments of the present application;
fig. 2A is a flowchart illustrating steps of an automatic marking method according to a first embodiment of the present disclosure;
FIG. 2B is a diagram illustrating a back tracking detection in the embodiment of FIG. 2A;
FIG. 3A is a flowchart illustrating steps of a detection model training method according to a second embodiment of the present disclosure;
FIG. 3B is a schematic diagram of a detection model in the embodiment shown in FIG. 3A;
FIG. 3C is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 2A;
FIG. 4 is a flowchart illustrating steps of a method for identifying a target according to a third embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of the protection of the embodiments in the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
FIG. 1 illustrates an exemplary system to which embodiments of the present application may be applied. As shown in fig. 1, the system 100 may include a cloud server 102, a communication network 104, and/or one or more user devices 106, illustrated in fig. 1 as a plurality of user devices.
Cloud server 102 may be any suitable device for storing information, data, programs, and/or any other suitable type of content, including but not limited to distributed storage system devices, server clusters, computing cloud server clusters, and the like. In some embodiments, cloud server 102 may perform any suitable functions. For example, in some embodiments, the cloud server 102 may be used to automatically mark 3D point cloud data. As an optional example, in some embodiments, the cloud server 102 may determine a tracking target that is not tracking-matched based on the detection result of the 3D point cloud data; then set a tracking task for that tracking target and continuously track it until no 3D point cloud data of the tracking target exists in the corresponding detection frame, or the tracking target exceeds a preset range; and then automatically mark the tracking target according to the tracking result. As another example, in some embodiments, a detection model may be provided in the cloud server 102, through which target detection and target tracking are performed on the 3D point cloud data. As another example, in some embodiments, cloud server 102 may also train the detection model. In some embodiments, the cloud server 102 may mark the 3D point cloud data according to a request sent by the user device 106. In some embodiments, the cloud server 102 may also perform target tracking and recognition using the tracking approach employed in automatic marking.
In some embodiments, the communication network 104 may be any suitable combination of one or more wired and/or wireless networks. For example, the communication network 104 may include, but is not limited to, the internet, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a Digital Subscriber Line (DSL) network, a frame relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network. The user device 106 can be connected to the communication network 104 via one or more communication links (e.g., communication link 112), and the communication network 104 can be linked to the cloud server 102 via one or more communication links (e.g., communication link 114). A communication link may be any link suitable for communicating data between the user device 106 and the cloud server 102, such as a network link, dial-up link, wireless link, hardwired link, any other suitable communication link, or any suitable combination of such links.
User devices 106 may include any one or more user devices suitable for interacting with a user, presenting information to a user, and accepting user input. In some embodiments, the user device 106 may send a request to the cloud server 102 to mark the 3D point cloud data, and receive the marking result fed back by the cloud server 102. The 3D point cloud data may be sent to the cloud server 102 by the user device 106, or the user device may carry a storage address of the 3D point cloud data in the request, and the cloud server 102 marks the 3D point cloud data after obtaining it from the storage address. In some embodiments, the user device 106 may also send to the cloud server 102 a request for target tracking and recognition on a 3D point cloud data stream formed from 3D point cloud data acquired in real time, and receive the result fed back by the cloud server 102. In some embodiments, user devices 106 may comprise any suitable type of device. For example, in some embodiments, the user device 106 may comprise a mobile device, a tablet computer, a laptop computer, a desktop computer, and so forth.
Based on the above system, the scheme provided by the present application is explained by a plurality of embodiments below.
Embodiment One
Referring to fig. 2A, a flow chart illustrating steps of an automatic marking method according to a first embodiment of the present application is shown.
The automatic marking method comprises the following steps:
step S202: and determining an untracked and matched tracked target according to a detection result of target detection on the adjacent frame of 3D point cloud data.
In many scenarios related to automatic driving, such as unmanned driving, vehicle-road cooperation, and cleaning robots, perception data, namely 3D point cloud data, needs to be acquired by a laser radar. Based on the 3D point cloud data, automatic driving control and decision-making can be effectively realized, but the premise is a detection model with good performance. To obtain a detection model with good performance, the training sample data used for training the detection model needs to have accurate labels, and the process of generating labels for the training sample data is the marking in the embodiments of the present application.
In order to mark accurately, in this embodiment target detection is first performed on adjacent frames of 3D point cloud data; generally, for a detected target that can be tracked and matched, the marking effect is good. However, a target that fails to be tracking-matched may be temporarily fully or partially occluded, or its data in some frames may be sparse, which easily leads to missed detection and false detection. For this case, in the embodiments of the present application, the tracking targets that cannot be tracking-matched are first determined according to the detection result of target detection on adjacent frames of 3D point cloud data (usually the two adjacent frames before and after, but not limited thereto; if detection is performed on frames at intervals, the two adjacent detected frames are used).
In one feasible manner, the improved detection model provided by the embodiments of the present application can be used. The detection model is a machine learning model trained with a point cloud segmentation auxiliary task, and target detection can be performed on the 3D point cloud data through this improved detection model. The point cloud segmentation auxiliary task is used for predicting whether each position in the 3D point cloud data belongs to a target object sample.
The detection model may include a backbone network portion, a detection head network portion, and an auxiliary network portion. Based on this, the detection model may be trained using 3D point cloud sample data. In the training process, the backbone network part is used for extracting the characteristics of the 3D point cloud sample data and outputting corresponding characteristic vectors; the detection head network part is used for carrying out target detection based on the characteristic vectors output by the backbone network part; and the auxiliary network part is used for predicting whether each position in the 3D point cloud sample data belongs to the target object sample or not based on the characteristic vector output by the backbone network part and outputting prediction information. Specifically, 3D point cloud sample data may be input into a detection model to be trained; extracting the characteristics of the 3D point cloud sample data through a backbone network part of a detection model, and outputting a corresponding characteristic vector; target detection is carried out on the basis of the characteristic vector through a detection head network part of the detection model to obtain a detection result; predicting whether each position in the 3D point cloud sample data belongs to a target object sample or not by detecting an auxiliary network part of the model based on the characteristic vector, and outputting prediction information; obtaining a first loss value according to the detection result and a first loss function corresponding to the detection result; obtaining a second loss value according to the prediction information and a second loss function corresponding to the prediction information; and training the detection model according to the first loss value and the second loss value. The above-mentioned specific training process for the detection model will be described in detail in embodiment two, and will not be described in detail here.
Based on the detection result of target detection on adjacent frames of 3D point cloud data and the temporal relationship between the adjacent frames, the tracking targets that are tracking-matched and the tracking targets that are not tracking-matched (those whose corresponding detection frames have an IoU below the threshold) can be determined.
An exemplary tracking process may include three phases of state prediction, target association, and state update.
In the state prediction phase, for each detected target, its state can be represented as a 10-dimensional state vector (x, y, z, θ, l, w, h, v_x, v_y, v_z). The first 3 components (x, y, z) represent the position of the target's centre, the 4th component θ represents the horizontal angle of the 3D frame corresponding to the target, the 5th, 6th, and 7th components (l, w, h) represent the length, width, and height of the 3D frame, and the 8th, 9th, and 10th components (v_x, v_y, v_z) represent the velocity of the target in the 3 directions. The velocity is calculated after the target is tracking-matched between the current frame and the previous frame during tracking. Further, the position of the target in the current frame can be predicted based on its position and velocity in the previous frame.
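For illustration only, the following Python sketch shows one possible form of the 10-dimensional state vector and of the constant-velocity position prediction described above; the frame interval dt and the helper name predict_state are assumptions, not part of the embodiments.

    import numpy as np

    def predict_state(state: np.ndarray, dt: float) -> np.ndarray:
        """Predict the target state for the current frame from the previous frame."""
        assert state.shape == (10,)              # (x, y, z, theta, l, w, h, v_x, v_y, v_z)
        predicted = state.copy()
        predicted[0:3] += state[7:10] * dt       # new centre = old centre + velocity * dt
        # orientation and box size (theta, l, w, h) are assumed unchanged by prediction
        return predicted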
In the target association stage, the detection result (detection frame) of the current frame for the target is matched with the result predicted in state prediction (prediction frame), and an affinity matrix is generated based on the corresponding 3D IoU. For example, matching can be done with the Hungarian algorithm while a threshold is set, below which an IoU is considered not matched. The threshold can be set by a person skilled in the art as appropriate according to actual needs and is not limited in the embodiments of the present application; it may be, for example, 0.3.
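The target association stage can, for example, be sketched as follows in Python; the routine iou_3d is a placeholder for an actual 3D IoU computation, and the threshold value 0.3 merely follows the example above.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(det_boxes, pred_boxes, iou_3d, iou_thresh=0.3):
        """Hungarian matching on a 3D-IoU affinity matrix; low-IoU pairs stay unmatched."""
        if len(det_boxes) == 0 or len(pred_boxes) == 0:
            return [], list(range(len(det_boxes))), list(range(len(pred_boxes)))
        affinity = np.array([[iou_3d(d, p) for p in pred_boxes] for d in det_boxes])
        rows, cols = linear_sum_assignment(-affinity)            # maximise total IoU
        matches = [(r, c) for r, c in zip(rows, cols) if affinity[r, c] >= iou_thresh]
        matched_d = {r for r, _ in matches}
        matched_p = {c for _, c in matches}
        unmatched_dets = [i for i in range(len(det_boxes)) if i not in matched_d]
        unmatched_preds = [j for j in range(len(pred_boxes)) if j not in matched_p]
        return matches, unmatched_dets, unmatched_preds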
In this way, targets that are tracking-matched and targets that are not tracking-matched can be determined; the embodiments of the present application focus on the latter, i.e., the tracking targets that are not tracking-matched. In addition, for subsequent detection and tracking, the state of each target needs to be updated, i.e., the state update phase is entered.
In the state update stage, the state of the target may be updated to (x', y', z', θ', l', w', h', v_x', v_y', v_z') based on a weighted sum, where the weights are determined by uncertainties, for example by determining the weights and updating the state through Kalman filtering. The updated state is used as the state of the current frame, and the target tracking process can continue by returning to the state prediction stage.
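A minimal sketch of the state update, assuming the Kalman gain is collapsed into a single blending weight; a full Kalman filter would also propagate covariances.

    import numpy as np

    def update_state(predicted: np.ndarray, detection: np.ndarray, gain: float = 0.5) -> np.ndarray:
        """Blend prediction and matched detection; `gain` stands in for the Kalman gain."""
        updated = predicted.copy()
        updated[0:7] = (1.0 - gain) * predicted[0:7] + gain * detection[0:7]
        # velocity components are re-estimated elsewhere from the change in centre between frames
        return updated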
Step S204: and setting a tracking task for the tracking target which is not tracked and matched, and continuously tracking until the 3D point cloud data of the tracking target does not exist in the detection frame corresponding to the tracking target, or determining that the tracking target exceeds a preset range according to the 3D point cloud data.
In the conventional approach, a tracking target that is not tracking-matched is deleted if it is not detected in the next N frames. The embodiments of the present application handle this differently: a tracking task is set for the tracking target that is not tracking-matched, and tracking is continued. For example, any detection frame that does not match an existing track may immediately be given a new track and continue to be tracked. In this way, the recall rate of tracking targets can be greatly improved.
For a moving target, tracking can continue until the detection frame corresponding to the target no longer contains any 3D point cloud data of the target, or the target exceeds a preset range. Thus, even if the target fails to match a detection frame many times, tracking is not terminated, and the recall rate can be further improved. The preset range can be set by a person skilled in the art according to the actual scene; for example, for scenes dense with targets such as people and vehicles, the preset range can be set to 10 meters, while for scenes where such targets are sparse, the preset range can be set farther, such as 50-100 meters.
For a stationary target, tracking does not stop as long as the target is within the preset range. Thus, even after the target has been occluded for a long time, tracking can continue. As mentioned above, the preset range may be set by a person skilled in the art according to the actual scene, for example 10 meters for dense scenes and 50-100 meters for sparser ones.
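For illustration, a hedged sketch of the continuation and termination rule described above; the helper names and the 10-meter default range are assumptions.

    import numpy as np

    def should_terminate(points_in_box: int, centre_xyz: np.ndarray,
                         is_moving: bool, preset_range: float = 10.0) -> bool:
        """Return True when a track should stop being tracked."""
        out_of_range = np.linalg.norm(centre_xyz[:2]) > preset_range
        if out_of_range:
            return True                      # both moving and stationary tracks end outside the range
        if is_moving and points_in_box == 0:
            return True                      # moving target whose box no longer contains any points
        return False                         # stationary target in range keeps being tracked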
Therefore, more accurate detection and tracking results can be obtained. However, in order to further improve the accuracy of detection and tracking, in a possible manner, the tracking frame obtained by tracking may also be processed, including: based on a continuous tracking process, obtaining a tracking frame sequence corresponding to the tracking target; judging whether to filter the tracking frame corresponding to the tracking frame sequence or not at least according to the tracking information corresponding to the tracking frame sequence; and performing corresponding operation on the tracking frame sequence according to the judgment result, and determining the tracking result of the tracking target which is not matched with the tracking according to the operation result. Therefore, invalid sequences can be filtered from the tracking frame sequence, and the subsequent marking efficiency is improved.
The determination of whether to filter the tracking frame corresponding to the tracking frame sequence may be implemented in one or more of the following manners according to at least the tracking information corresponding to the tracking frame sequence.
The first method is as follows: and judging whether all tracking frames in the tracking frame sequence are filtered according to the relation between the matching information of the tracking frames in the tracking frame sequence and the corresponding detection frames and a preset matching threshold.
The matching information may be the number of matches, or the ratio of the number of matches to all tracking frames. Correspondingly, the preset matching threshold may be a preset match-count threshold or a preset match-ratio threshold. The specific values of the two thresholds can be set by a person skilled in the art as appropriate according to actual needs, which is not limited by the embodiments of the present application; for example, the preset match-ratio threshold may be 0.3. In this way, if the tracking frames match detection frames only a few times or in a low proportion, the probability that the track corresponds to a real target object is low, and the tracking frame sequence can therefore be filtered out.
The second method comprises the following steps: and judging whether all tracking frames in the tracking frame sequence are filtered or not according to the relation between the duration time length of the tracking frame sequence and a preset time length threshold.
The preset time length threshold can be set by a person skilled in the art as appropriate according to the actual situation, for example 0.5 s. In this way, the shorter the duration of a tracking frame sequence, the more likely it is to be an invalid tracking target, and it can be filtered out.
The third method comprises the following steps: and judging whether all tracking frames in the tracking frame sequence are filtered according to the relation between the number of the 3D point cloud data of the tracking target in each tracking frame in the tracking frame sequence and a preset number threshold.
The preset number threshold may be set by those skilled in the art as appropriate according to actual conditions, such as 15 points for example. In this manner, if the number of the 3D point cloud data of the tracked target in each tracking frame in the sequence of tracking frames is continuously lower than the preset number threshold, it is more indicative that the tracked target may be an invalid tracking target, such as noise, and therefore, the tracked target may be filtered.
Through the method, effective tracking frame sequences can be obtained, and the tracking frames with higher reference value for subsequent marking can be determined based on the tracking frame sequences. For example, the remaining tracking frames can be determined according to the operation result, and the tracking frames with the quantity and the sequence of the 3D point cloud data in the tracking frames meeting the preset sequencing standard are selected from the remaining tracking frames; and determining a tracking result of the tracking target which is not matched with the tracking target based on the selected tracking frame.
Illustratively, the following operations may be performed on the obtained tracking frame sequences: (1) for a given tracking frame sequence (track list), if the calculated hit ratio (the ratio of tracking frames matched to detection frames) is less than 0.3, delete the track list; (2) if the time span of a track list is less than 0.5 s, delete the track list; (3) if the number of points in the tracking frames of a track list never exceeds 15, delete the track list; (4) from the tracking frames remaining in a track list, select the top-3 frames with the most points, whose scale estimates are the most accurate, and take their average as the scale estimate for all tracking frames in the sequence, which significantly improves the accuracy of the scale estimation. The remaining tracking frames can then be used as the tracking result for subsequent data marking.
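The filtering and scale-refinement operations (1)-(4) above may, for example, be sketched as follows; the TrackFrame container is an assumption, and the thresholds simply follow the example values in the text.

    from dataclasses import dataclass
    from typing import List, Optional
    import numpy as np

    @dataclass
    class TrackFrame:
        num_points: int          # points of the target inside this tracking frame
        matched: bool            # whether this tracking frame matched a detection frame
        timestamp: float         # seconds
        size_lwh: np.ndarray     # (l, w, h) estimate of the 3D frame

    def filter_and_refine(frames: List[TrackFrame]) -> Optional[List[TrackFrame]]:
        if not frames:
            return None
        hit_ratio = sum(f.matched for f in frames) / len(frames)
        duration = frames[-1].timestamp - frames[0].timestamp
        if hit_ratio < 0.3 or duration < 0.5 or max(f.num_points for f in frames) <= 15:
            return None                                    # drop the whole track list
        top3 = sorted(frames, key=lambda f: f.num_points, reverse=True)[:3]
        refined_size = np.mean([f.size_lwh for f in top3], axis=0)
        for f in frames:                                   # share the refined scale estimate
            f.size_lwh = refined_size
        return frames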
Step S206: and marking data for the tracked target which is not tracked and matched according to the tracking result.
Based on the foregoing tracking process, the tracking results (such as a detection frame or a tracking frame) corresponding to the tracked target that is not tracked and matched can be determined by integrating the detection results of the multiple frames of 3D point cloud data, so that accurate data marking can be performed on the tracked target that is not tracked and matched in these frames, that is, a label is set, and the information of the label includes, but is not limited to, the position information, the direction information, and the scale information of the tracked target, which can be obtained based on the state vector corresponding to the tracked target in the corresponding frame.
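For illustration only, the marking step can be sketched as building a label record from the per-frame state vector; the field names are assumptions.

    def state_to_label(track_id: int, frame_index: int, state) -> dict:
        """Turn a tracking-result state vector into a label for one frame."""
        x, y, z, theta, l, w, h = state[:7]
        return {
            "track_id": track_id,
            "frame": frame_index,
            "position": (x, y, z),      # frame centre
            "heading": theta,           # horizontal orientation of the 3D frame
            "size": (l, w, h),          # scale of the 3D frame
        }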
Further, in many cases a target drives from far away toward the sensor. At a far distance the 3D point cloud data is sparse, making the target difficult to detect, while detection at close range is relatively easy. Based on this, the embodiments of the present application also provide a back-tracking approach to improve the recall rate for far-away, point-cloud-sparse targets that travel from far to near.
Specifically, this can be implemented as follows: if, according to the detection result of target detection on adjacent frames of 3D point cloud data, a tracking-matched tracking target is determined to be one that moves from a far position to a near position, the tracking frame sequence corresponding to that tracking target is obtained; the earliest tracking frame in the tracking frame sequence and the initial time of the 3D point cloud data frame corresponding to that earliest tracking frame are determined; a preset number of 3D point cloud data frames earlier than the initial time are acquired; those frames are tracked based on the earliest tracking frame; and data marking is performed for the tracking target in that preset number of 3D point cloud data frames according to the tracking result.
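A sketch of the back-tracking idea under stated assumptions: the prediction and matching routines are passed in as placeholders, and the earlier point cloud frames are visited in reverse time order.

    def back_track(initial_state, frames_before, predict_fn, match_fn, dt):
        """frames_before: the preset number of earlier point cloud frames, newest first."""
        state = initial_state                    # state of the earliest tracking frame
        recovered = []
        for idx, frame in enumerate(frames_before):
            state = predict_fn(state, -dt)       # negative dt: step backwards in time
            box = match_fn(frame, state)         # try to find the target near the prediction
            if box is None:
                break
            state = box                          # adopt the matched box as the new state
            recovered.append((idx, state))
        return recovered                         # used to mark the earlier frames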
An example of the back tracking detection is shown in fig. 2B, where the upper row in fig. 2B represents object detection in the conventional manner and the lower row represents object detection in the back tracking manner.
As can be seen from fig. 2B, in the conventional approach, because the 3D point cloud data of the target at a far position (the first 3 frames, marked with an X) is sparse, the target is difficult to detect until it comes near (frames 4 to 7). In practice, however, the target is present in the first 3 frames, which the conventional approach cannot detect.
With the back-tracking detection scheme, after the target is detected in frame 4, the previous 3 frames (marked by dashed boxes) can be detected again in reverse order. Because the detection result of the later frame serves as the basis, prediction and tracking for an earlier frame can be carried out on top of the existing result, so that a detection result for the target can also be obtained from earlier frames whose 3D point cloud data is relatively sparse. This improves the target recall rate and detection accuracy, and the data marking on this basis is therefore more accurate and of higher quality.
With the present embodiment, whether there are tracking targets that cannot be tracking-matched is determined based on the detection result of the 3D point cloud data. Unlike the conventional approach, in which such a tracked object is deleted if it is not detected in the following N frames, in the scheme of this embodiment these tracked objects are continuously tracked until no 3D point cloud data of the tracked object remains in the corresponding detection frame or the tracked object exceeds the preset distance range. Therefore, through this tracking approach, detection is carried out by combining multiple frames of 3D point cloud data, the detection result is more accurate, and the problems of missed detection and false detection caused by occlusion of a tracking target or sparse point clouds are effectively avoided. Further, the data marking based on this is more accurate, and the marking quality is higher.
Embodiment Two
In this embodiment, the scheme of the embodiments of the present application is described with emphasis on the training of the detection model used in Embodiment One. For illustration, the structure of the detection model will be described first, as shown in FIG. 3B.
As can be seen from fig. 3B, the detection model includes a backbone network portion (schematically illustrated as Backbone), a detection head network portion (schematically illustrated as Detection head), and an auxiliary network portion (schematically illustrated as Auxiliary network).
The backbone network portion is used for extracting features from the 3D point cloud sample data and outputting corresponding feature vectors. In one example, the backbone network portion can be implemented as an encoder structure that includes at least a convolutional layer and a pooling layer, which can extract robust 3D point cloud features from the 3D point cloud data.
The detection head network portion is used for performing target detection based on the feature vectors output by the backbone network portion. In one example, the detection head network portion may be implemented as any suitable target detector, including but not limited to an SSD (Single Shot MultiBox Detector) detector, or a detector capable of matching prior boxes to the ground truth using 2D intersection over union (IoU). Illustratively, a structure similar to the Backbone and Detection head in the PointPillars model may be employed. Through the detection head portion, the target position and target category can be output based on the 3D point cloud features extracted by the Backbone.
In order to make the features extracted by the backbone network portion (Backbone) more robust, and thereby further improve target detection performance, a point cloud segmentation auxiliary task is introduced in the detection model of this embodiment to help the backbone network portion extract more robust 3D point cloud features from the raw point cloud data. This is the auxiliary network portion, which is used for predicting, based on the feature vectors output by the backbone network portion, whether each position in the 3D point cloud sample data belongs to a target object sample, and for outputting the prediction information.
As shown in fig. 3B, if the scale at the Backbone input is N × N, then at the Backbone output the scale becomes N/4 × N/4 due to pooling. In the introduced auxiliary network portion, the scale is restored to N × N by adding deconvolution. Subsequently, it is determined by semantic segmentation whether each position in the N × N grid belongs to a target object sample such as an obstacle. Specifically, a sigmoid activation function is added at the end of the auxiliary network portion to convert the output at each position x into a probability P(x) in [0, 1].
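As an assumed, PyTorch-based illustration of the auxiliary network portion (a sketch, not the patent's implementation): a deconvolution restores the N/4 × N/4 backbone features to N × N, and a 1×1 convolution followed by a sigmoid yields the per-position probability P(x).

    import torch
    import torch.nn as nn

    class AuxSegHead(nn.Module):
        def __init__(self, in_channels: int):
            super().__init__()
            # kernel/stride 4 restores the spatial scale from N/4 x N/4 back to N x N
            self.upsample = nn.ConvTranspose2d(in_channels, 64, kernel_size=4, stride=4)
            self.classifier = nn.Conv2d(64, 1, kernel_size=1)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            x = self.upsample(feats)                  # (B, 64, N, N) from (B, C, N/4, N/4)
            return torch.sigmoid(self.classifier(x))  # per-position probability map in [0, 1]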
In the point cloud segmentation auxiliary task, the following loss function is adopted:
E = ∑ -w(x) log( y·P(x) + (1-y)·(1-P(x)) )
where y denotes the ground truth of the detection frame, i.e., whether position x belongs to a target object sample such as an obstacle; P(x) denotes the probability, predicted by the detection model, that the position belongs to a target object sample such as an obstacle; and w(x) denotes the weight given to each position in the loss function for class balancing of the class to which the target object sample corresponds.
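The auxiliary loss above can, for example, be written as the following position-weighted binary cross-entropy sketch; w(x) is assumed to be supplied per position for class balancing.

    import torch

    def aux_seg_loss(p: torch.Tensor, y: torch.Tensor, w: torch.Tensor,
                     eps: float = 1e-7) -> torch.Tensor:
        """p, y, w: tensors of shape (B, 1, N, N); y is the 0/1 ground truth."""
        likelihood = y * p + (1.0 - y) * (1.0 - p)        # probability assigned to the true class
        return -(w * torch.log(likelihood.clamp_min(eps))).sum()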
In the training process, the whole detection model is trained through the loss function, and the overall performance of the detection model can be remarkably improved. In the subsequent application process, although the auxiliary network part is not used, the whole detection model can obtain more robust 3D point cloud characteristics because of the participation of the detection model in training.
Based on the detection model structure, a flow of the detection model training method provided by the embodiment of the present application is shown in fig. 3A. The detection model training method comprises the following steps:
step S302: and acquiring 3D point cloud sample data, and performing feature extraction on the 3D point cloud sample data through a backbone network part of a detection model to be trained to obtain a corresponding feature vector.
As previously mentioned, the Backbone network portion may be implemented as an encoder structure, illustratively, as a Backbone network Backbone structure in the pointpilar model. Through the backbone network portion, 3D point cloud features, i.e., the feature vectors, can be obtained.
Step S304: and carrying out target detection based on the characteristic vector through a detection head network part of the detection model to obtain a detection result.
As previously mentioned, this Detection head network portion can be illustratively implemented as a Detection head structure in the Pointpilar model. Through the detection head network part, target detection can be carried out, and a corresponding detection result can be obtained. The first loss function is arranged in the part and comprises a classification part and a regression part, the classification of the target can be obtained through the classification loss function part, and the position of the target can be obtained through the regression loss function part. It should be noted that, in the embodiment of the present application, a specific form of the first loss function is not limited, and only the first loss function needs to have the corresponding functions described above.
Step S306: and predicting whether each position in the 3D point cloud sample data belongs to a target object sample through the auxiliary network portion of the detection model, based on the feature vector, and outputting prediction information.
As previously mentioned, the auxiliary network portion may include a deconvolution layer (or upsampling layer) and a semantic segmentation layer. In one possible manner, semantic segmentation is performed based on the feature vector through the auxiliary network portion of the detection model; whether each position in the 3D point cloud sample data belongs to a target object sample is then predicted according to the semantic segmentation result, and the prediction information is output. Optionally, performing semantic segmentation based on the feature vector may be implemented as: upsampling the feature vector to obtain a feature vector with the same dimensions as the 3D point cloud sample data input to the backbone network portion; and performing semantic segmentation on the 3D point cloud sample data based on the feature vector of the same dimensions.
For example, in combination with the structure of the auxiliary network portion, the scale of the feature vector output by the main network portion may be recovered by the deconvolution layer, and the semantic segmentation layer performs semantic segmentation on the feature vector after scale recovery to determine whether each position point in the 3D point cloud data belongs to the target object sample.
In addition, a corresponding second loss function is provided in the auxiliary network portion, namely the loss function E = ∑ -w(x) log( y·P(x) + (1-y)·(1-P(x)) ) described above.
The meaning of each parameter in the second loss function is as described above and will not be described herein.
Step S308: obtaining a first loss value according to the detection result and a first loss function corresponding to the detection result; and obtaining a second loss value according to the prediction information and a second loss function corresponding to the prediction information.
It should be noted that, the operations of obtaining the first loss value and obtaining the second loss value may not be in a sequential order, or may be performed in parallel.
Step S310: and training the detection model according to the first loss value and the second loss value.
The detection model is trained by combining the two loss values, so that the detection model can improve the overall performance of the detection model on the whole by extracting the 3D point cloud features with higher robustness.
The training of the detection model is a loop iteration process until a training termination condition is reached, for example, a preset training time is reached, or a loss value meets a preset threshold, and the like.
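One training iteration combining the first and second loss values might, as an illustrative assumption, look like the following; model is assumed to return both the detection output and the segmentation probability map, aux_weight is an assumed weighting factor, and aux_seg_loss refers to the sketch above.

    def train_step(model, optimizer, detection_loss_fn, points, det_targets,
                   seg_targets, seg_weights, aux_weight: float = 1.0):
        optimizer.zero_grad()
        det_out, seg_prob = model(points)
        loss_det = detection_loss_fn(det_out, det_targets)           # first loss value
        loss_aux = aux_seg_loss(seg_prob, seg_targets, seg_weights)  # second loss value
        loss = loss_det + aux_weight * loss_aux                      # combined objective
        loss.backward()
        optimizer.step()
        return loss.item()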
After the training of the detection model is completed, the detection model can be put into use for detection and tracking as described in Embodiment One. For example, the detection model is used to perform target detection on adjacent frames of 3D point cloud data to obtain detection results; the tracking targets that are not tracking-matched are then determined based on the detection results; a tracking task is set for each such tracking target and tracking continues until no 3D point cloud data of the tracking target exists in its corresponding detection frame, or until it is determined from the 3D point cloud data that the tracking target exceeds the preset range; and data marking is then performed for the tracking target that is not tracking-matched according to the tracking result.
According to the detection model training scheme provided by the embodiment, an auxiliary network part for detecting whether each position in the 3D point cloud sample data belongs to a target object sample is added in the detection model so as to assist the training of the whole detection model, so that the trained detection model can extract more detailed and comprehensive 3D point cloud data characteristics, and the detection model has better detection performance and robustness. Therefore, the detection model is more accurate in detection and tracking of the 3D point cloud data in follow-up use, and the marking of the 3D point cloud data based on the detection and tracking results is more accurate and high in quality.
In the following, a specific scenario is taken as an example, and an automatic marking process based on the detection model is exemplarily described, as shown in fig. 3C.
In fig. 3C, N continuous frames of 3D point cloud data are input to the detection model. Suppose that, according to the detection results of the detection model for frames 30 and 31, the target X framed by detection frame 1 in frame 30 does not appear in frame 31; target X is then determined to be a tracking target that is not tracking-matched (schematically illustrated as a car in the figure). A new track task is assigned to target X and it is continuously tracked. Suppose it reappears in frame 48 and persists through frames 49-100. First, it is determined whether the hit ratio of the tracking frame sequence (track list) corresponding to frames 30-100 is less than 0.3; in this example it is assumed to be 0.7, so the track list corresponding to target X in frames 30-100 is kept. Next, the duration of the track list is determined; in this example it is assumed to be 2 seconds, which is greater than the preset duration threshold of 0.5 s, so the track list is still kept. Further, the number of 3D point cloud data points corresponding to target X in each tracking frame of the track list is determined; in this example it is assumed that there are tracking frames with more than 15 point cloud data points, so the track list continues to be kept.
Then the number of 3D point cloud data points corresponding to target X in each tracking frame of the track list is examined; suppose the tracking frames in frames 90, 92, and 95 have the most 3D point cloud data points. The three tracking frames corresponding to target X in those three frames are selected, and target X is marked according to them, for example by obtaining the position, direction, and scale of target X based on the 3D point cloud data in these three tracking frames.
Of course, the above is only a simple example with a single target X; real implementations are far more complex. However, each detected target, especially a target that is not tracking-matched, can be detected and tracked in the above manner and then automatically marked according to the tracking result.
Practice has shown that the point-cloud-segmentation-based auxiliary task can effectively improve the performance of the detection model, and that for tracking targets that are not tracking-matched, the tracking approach in the embodiments of the present application optimizes the marking result based on multiple frames of point cloud data. The two improvements were applied to the KITTI dataset, with a traditional 3D point cloud detection model such as PointPillars as the baseline; the detection accuracies are compared in the following table:
(Table comparing detection accuracy on the KITTI dataset; provided as an image in the original publication and not reproduced here.)
It can be seen that adding the point cloud segmentation auxiliary function to the 3D detection model significantly improves the detection performance of the model. On this basis, using the multi-frame continuous tracking approach of the embodiments of the present application and marking based on the multi-frame tracking results further improves the model performance. Accordingly, the marking of the 3D data is also more accurate and of higher quality.
Example three
Referring to fig. 4, a flowchart of steps of a target identification method according to a third embodiment of the present application is shown.
In this embodiment, the tracking mode adopted in the first embodiment is applied to the process of target tracking and recognition, so that the accuracy of target tracking and recognition can be improved.
The target identification method of the embodiment comprises the following steps:
step S402: and acquiring a 3D point cloud data stream acquired in real time.
For a device provided with a laser radar, including but not limited to autonomous vehicles, autonomous driving robots, and some AR (augmented reality) or VR (virtual reality) devices, 3D point cloud data of the surrounding environment is acquired in real time while the device is operating, thereby forming a 3D point cloud data stream; that is, the data stream is formed from the continuously acquired multi-frame 3D point cloud data.
Step S404: determining the tracking targets that are tracking-matched and the tracking targets that are not tracking-matched according to the detection results of target detection on adjacent frames of 3D point cloud data in the 3D point cloud data stream.
Because this embodiment is mainly intended for accurate target tracking, not only the tracking targets that are not tracking-matched but also the tracking targets that are tracking-matched are determined. For the specific way of determining, based on the detection results, which tracking targets are not tracking-matched, reference may be made to the description of the relevant parts in the first embodiment, which is not repeated here.
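As an illustration of how detections in adjacent frames might be associated, the following sketch greedily matches the current frame's detections to the previous frame's detections by 3D centre distance and returns the matched pairs and the unmatched detections. The (x, y, z, theta, l, w, h) box layout and the 2.0 m distance gate are assumptions for the example; the embodiments do not prescribe this particular association rule.

```python
import numpy as np


def split_matched(prev_boxes: np.ndarray, curr_boxes: np.ndarray, gate: float = 2.0):
    """prev_boxes / curr_boxes: (N, 7) arrays of (x, y, z, theta, l, w, h)."""
    matched, unmatched = [], []
    used = set()
    for i, box in enumerate(curr_boxes):
        if len(prev_boxes) == 0:
            unmatched.append(i)
            continue
        # Distance from this detection's centre to every previous detection's centre.
        dists = np.linalg.norm(prev_boxes[:, :3] - box[:3], axis=1)
        j = int(np.argmin(dists))
        if dists[j] < gate and j not in used:
            used.add(j)
            matched.append((i, j))      # current detection i continues previous target j
        else:
            unmatched.append(i)         # not tracking-matched: new or reappearing target
    return matched, unmatched
```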
Step S406: tracking the tracking-matched tracking targets based on the detection results; and setting a tracking task for each tracking target that is not tracking-matched and tracking it continuously until no 3D point cloud data of the tracking target exists in the detection frame corresponding to the tracking target, or until it is determined from the 3D point cloud data that the tracking target exceeds a preset range.
For the tracking-matched tracking targets, tracking can be carried out in a conventional manner; for the tracking targets that are not tracking-matched, tracking can be performed in the manner described in the first embodiment. Corresponding tracking results are obtained for both.
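A schematic continuation of the association sketch above: matched targets are updated with a simple constant-velocity model standing in for a conventional tracker such as a Kalman filter, unmatched targets start a new tracking task, and a task ends once no 3D points remain in its box or the target leaves a preset range. The Track class, the frame interval and the 80 m range gate are illustrative assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Track:
    box: np.ndarray                  # (7,) = (x, y, z, theta, l, w, h)
    velocity: np.ndarray             # (3,) estimated velocity

    def update(self, new_box: np.ndarray, dt: float = 0.1) -> None:
        # Constant-velocity stand-in for a conventional (e.g. Kalman) update.
        self.velocity = (new_box[:3] - self.box[:3]) / dt
        self.box = new_box


def step_tracks(tracks, matched, unmatched, curr_boxes, count_points_in_box,
                max_range: float = 80.0):
    # tracks is assumed to be index-aligned with prev_boxes from the matching sketch.
    for i, j in matched:
        tracks[j].update(curr_boxes[i])                       # conventional update
    for i in unmatched:                                       # new tracking task
        tracks.append(Track(box=curr_boxes[i], velocity=np.zeros(3)))
    alive = []
    for t in tracks:
        has_points = count_points_in_box(t.box) > 0           # 3D points still in the box?
        in_range = np.linalg.norm(t.box[:2]) <= max_range     # still within the preset range?
        if has_points and in_range:
            alive.append(t)                                   # keep tracking this target
    return alive
```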
Step S408: identifying the tracking targets according to the tracking results of the tracking-matched tracking targets and the tracking results of the tracking targets that are not tracking-matched.
Whether a tracking target is tracking-matched or not, its tracking result contains the same information, for example the 10-dimensional state vector (x, y, z, θ, l, w, h, v_x, v_y, v_z) described above, or at least position information, direction information, scale information, and the like. Based on this, tracking target recognition can be achieved, including but not limited to position recognition, direction recognition, scale recognition, as well as category recognition (e.g., people, vehicles, obstacles, etc.), travel path recognition, and so on.
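The 10-dimensional state and the recognition outputs read from it can be sketched as follows. The size-based category rule is purely a placeholder heuristic for illustration; in practice the category comes from the detection model.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackState:
    x: float
    y: float
    z: float          # position
    theta: float      # heading / direction
    l: float
    w: float
    h: float          # scale
    v_x: float
    v_y: float
    v_z: float        # velocity


def recognize(state: TrackState) -> dict:
    speed = math.hypot(state.v_x, state.v_y)
    # Placeholder size-based category guess, purely for illustration.
    category = "vehicle" if state.l > 2.5 else ("pedestrian" if state.h > 1.2 else "obstacle")
    return {
        "position": (state.x, state.y, state.z),
        "direction": state.theta,
        "scale": (state.l, state.w, state.h),
        "speed": speed,
        "category": category,
    }
```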
Optionally, the recognition results can also be applied to subsequent practical applications according to practical needs. To this end, in one possible manner, the present embodiment may further include the following step S410.
Step S410: superimposing the result of the tracking target recognition on a preset image and displaying it; or performing 3D modeling based on the result of the tracking target recognition and the images acquired in real time by the image acquisition device, to obtain and display a 3D virtual scene.
For example, in driving planning or navigation, the result of the tracking target recognition may be superimposed on an electronic map to more clearly show the information in the environment where the current vehicle or robot or AR/VR device is located.
Alternatively, if the device is provided with both a laser radar and an image acquisition device such as a camera, the image acquisition device can acquire image data of the surrounding environment in real time while the laser radar acquires the 3D point cloud data of the surrounding environment in real time. On this basis, the results of the aforementioned tracking target recognition may be combined with the image data to achieve the desired application.
For example, during driving planning or navigation, the result of the tracking target recognition may be superimposed on the image acquired in real time by the image acquisition device, so that the display shows more clearly the image of the environment where the current vehicle, robot or AR/VR device is located, together with the information corresponding to each tracking target in the image.
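As a sketch of this overlay step, the following assumes a calibrated pinhole camera: the tracked target's 3D position is transformed from the LiDAR frame to the camera frame with a known extrinsic matrix, projected with the intrinsic matrix, and drawn with OpenCV. The calibration matrices and drawing style are assumptions; this only illustrates one possible display path, not the claimed method.

```python
import cv2
import numpy as np


def overlay_target(image: np.ndarray, position_lidar: np.ndarray,
                   K: np.ndarray, T_cam_lidar: np.ndarray, label: str) -> np.ndarray:
    """Draw a tracked target's projected position and label onto the camera image.

    K: (3, 3) camera intrinsics; T_cam_lidar: (4, 4) LiDAR-to-camera transform.
    """
    # Transform the 3D point from the LiDAR frame to the camera frame.
    p = T_cam_lidar @ np.append(position_lidar, 1.0)
    if p[2] <= 0:                       # behind the camera, nothing to draw
        return image
    uv = K @ (p[:3] / p[2])             # pinhole projection
    u, v = int(uv[0]), int(uv[1])
    cv2.circle(image, (u, v), 6, (0, 255, 0), -1)
    cv2.putText(image, label, (u + 8, v), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return image
```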
For another example, in some virtual reality scenes, such as a simulated scene display or a game scene, 3D modeling may be performed based on the result of the tracking target recognition and the real-time images acquired by the image acquisition device to obtain a corresponding 3D virtual scene; the 3D virtual scene is then displayed through the display, so that an immersive experience of the corresponding scene can be obtained through the VR device.
Therefore, with this embodiment, when the tracking mode for tracking targets that are not tracking-matched provided by the embodiments of the present application is applied to a tracking and recognition scenario, the tracking and recognition accuracy can be improved, achieving a better tracking effect.
Example four
Referring to fig. 5, a schematic structural diagram of an electronic device according to a fourth embodiment of the present application is shown; the embodiments of the present application do not limit the specific implementation of the electronic device. For example, it may be implemented as a vehicle, a robot, an AR device, or a VR device.
As shown in fig. 5, the electronic device may include: an image capture device 502, a lidar 504, a display 506, a processor 508, a communication interface 510, and a communication bus 512.
Wherein:
the image capturing device 502 is configured to capture image data of a surrounding environment in real time.
And the laser radar 504 is used for acquiring the 3D point cloud data of the surrounding environment in real time to form a 3D point cloud data stream.
A processor 508, configured to: determine the tracking targets that are tracking-matched and the tracking targets that are not tracking-matched according to the detection results of target detection performed on adjacent frames of 3D point cloud data in the 3D point cloud data stream; track the tracking-matched tracking targets based on the detection results, set a tracking task for each tracking target that is not tracking-matched, and track it continuously until no 3D point cloud data of the tracking target exists in the detection frame corresponding to the tracking target, or until it is determined from the 3D point cloud data that the tracking target exceeds a preset range; identify the tracking targets according to the tracking results of the tracking-matched tracking targets and the tracking results of the tracking targets that are not tracking-matched; and superimpose the result of the tracking target recognition on a preset image (including an image acquired by the image acquisition device 502 or an image from another source, such as an electronic map), or perform 3D modeling based on the result of the tracking target recognition and the images acquired in real time by the image acquisition device, to obtain a 3D virtual scene. A schematic processing loop tying these operations together is sketched below, after the component descriptions.
A display 506 for displaying an image on which a result of the tracking target recognition is superimposed, or displaying a 3D virtual scene.
Image capture device 502, lidar 504, display 506, processor 508, and communication interface 510 communicate with each other via communication bus 512.
And a communication interface 510 for communicating with other electronic devices or cloud servers.
The processor 508 may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device may comprise one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
Optionally, the electronic device of this embodiment may further include a memory 514 for storing various data in the electronic device, such as image data, 3D point cloud data, various intermediate data generated in the tracking process, and the like. The memory 514 may comprise a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
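Putting the components together, the processor's work in this embodiment can be pictured as a simple loop like the following sketch. All component interfaces (lidar.next_frame, camera.next_frame, detector, tracker, display.show) are hypothetical names used only for this illustration, not a real driver API.

```python
def run_device(lidar, camera, detector, tracker, display):
    """Schematic main loop for the processor 508; shows only the order of operations."""
    prev_detections = None
    while True:
        cloud = lidar.next_frame()            # real-time 3D point cloud frame (lidar 504)
        image = camera.next_frame()           # synchronized image (image capture device 502)
        detections = detector(cloud)          # target detection on the current frame
        if prev_detections is not None:
            # Split into tracking-matched and not-tracking-matched targets, then track.
            matched, unmatched = tracker.associate(prev_detections, detections)
            tracker.step(matched, unmatched, detections)
        results = tracker.recognize()         # position / direction / scale / category, etc.
        # Superimpose the recognition results on the image (or build a 3D virtual
        # scene from them) and show the result on the display 506.
        display.show(image, results)
        prev_detections = detections
```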
For specific implementation of the above operation of the electronic device in this embodiment, reference may be made to corresponding steps and corresponding descriptions in units in the above method embodiments, and corresponding beneficial effects are provided, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
Embodiments of the present application further provide a computer program product, which includes computer instructions that instruct a computing device to perform operations corresponding to any one of the methods in the foregoing method embodiments.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to embodiments of the present application may be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium downloaded through a network and to be stored in a local recording medium, so that the methods described herein may be stored in such software processes on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by a computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (14)

1. An automatic marking method, comprising:
determining an untracked and matched tracking target according to a detection result of target detection on the adjacent frame 3D point cloud data;
setting a tracking task for the tracking target which is not tracked and matched, and continuously tracking until 3D point cloud data of the tracking target does not exist in a detection frame corresponding to the tracking target, or determining that the tracking target exceeds a preset range according to the 3D point cloud data;
and marking data for the tracked target which is not tracked and matched according to the tracking result.
2. The method of claim 1, wherein prior to the data-marking of the 3D point cloud data according to the tracking results, the method further comprises:
obtaining a tracking frame sequence corresponding to the tracking target based on a continuous tracking process;
judging whether to filter the tracking frame corresponding to the tracking frame sequence at least according to the tracking information corresponding to the tracking frame sequence;
and performing corresponding operation on the tracking frame sequence according to the judgment result, and determining the tracking result of the tracking target which is not matched with the tracking according to the operation result.
3. The method according to claim 2, wherein the determining whether to filter the tracking frame corresponding to the tracking frame sequence according to at least the tracking information corresponding to the tracking frame sequence includes at least one of:
judging whether all tracking frames in the tracking frame sequence are filtered according to the relation between the matching information of the tracking frames in the tracking frame sequence and the corresponding detection frames and a preset matching threshold;
judging whether all tracking frames in the tracking frame sequence are filtered according to the relation between the duration length of the tracking frame sequence and a preset time length threshold;
and judging whether all tracking frames in the tracking frame sequence are filtered according to the relation between the number of the 3D point cloud data of the tracking target in each tracking frame in the tracking frame sequence and a preset number threshold.
4. The method of claim 3, wherein the determining a tracking result of the tracking target which is not matched with the tracking according to the operation result comprises:
determining the reserved tracking frames according to the operation result, and selecting the tracking frames of which the quantity sequence of the 3D point cloud data in the tracking frames meets the preset sequencing standard;
and determining a tracking result of the tracking target which is not matched with the tracking target based on the selected tracking frame.
5. The method of any of claims 1-4, wherein the method further comprises:
if the tracking target matched with the tracking is determined, according to the detection result of target detection on the 3D point cloud data of the adjacent frames, to be a tracking target moving from far to near, acquiring a tracking frame sequence corresponding to the tracking target;
determining the most initial tracking frame in the tracking frame sequence and the initial time of the 3D point cloud data frame corresponding to the most initial tracking frame;
acquiring a 3D point cloud data frame with the time earlier than the initial time by a preset frame number;
tracking the 3D point cloud data frame with the preset frame number based on the most initial tracking frame;
and marking data for a tracking target in the 3D point cloud data frame with a preset frame number according to a tracking result.
6. The method of any one of claims 1-4, wherein prior to the determining an untracked tracked target from the detection of the target detection on the adjacent frame 3D point cloud data, the method further comprises:
performing target detection on the 3D point cloud data through a detection model for performing target detection based on the 3D point cloud data; the detection model is a machine learning model trained based on a point cloud segmentation auxiliary task.
7. The method of claim 6, wherein the method further comprises:
training the detection model by using 3D point cloud sample data;
wherein the detection model comprises a backbone network part, a detection head network part and an auxiliary network part; the backbone network part is used for extracting the characteristics of the 3D point cloud sample data and outputting corresponding characteristic vectors; the detection head network part is used for carrying out target detection based on the characteristic vector output by the main network part; and the auxiliary network part is used for predicting whether each position in the 3D point cloud sample data belongs to the target object sample or not based on the characteristic vector output by the main network part and outputting prediction information.
8. The method of claim 7, wherein the training the detection model using 3D point cloud sample data comprises:
inputting the 3D point cloud sample data into the detection model to be trained;
extracting the characteristics of the 3D point cloud sample data through a backbone network part of the detection model, and outputting a corresponding characteristic vector;
performing target detection based on the characteristic vector through a detection head network part of the detection model to obtain a detection result;
predicting whether each position in the 3D point cloud sample data belongs to a target object sample or not through an auxiliary network part of the detection model based on the characteristic vector, and outputting prediction information;
obtaining a first loss value according to the detection result and a first loss function corresponding to the detection result; obtaining a second loss value according to the prediction information and a second loss function corresponding to the prediction information;
and training the detection model according to the first loss value and the second loss value.
9. A detection model training method, the detection model includes a main network part, a detection head network part and an auxiliary network part; the method comprises the following steps:
acquiring 3D point cloud sample data, and performing feature extraction on the 3D point cloud sample data through a backbone network part of the detection model to obtain corresponding feature vectors;
performing target detection based on the characteristic vector through a detection head network part of the detection model to obtain a detection result;
predicting whether each position in the 3D point cloud sample data belongs to a target object sample or not through an auxiliary network part of the detection model based on the characteristic vector, and outputting prediction information;
obtaining a first loss value according to the detection result and a first loss function corresponding to the detection result; obtaining a second loss value according to the prediction information and a second loss function corresponding to the prediction information;
and training the detection model according to the first loss value and the second loss value.
10. The method of claim 9, wherein said predicting, by the auxiliary network portion of the detection model, whether each location in the 3D point cloud sample data belongs to a target object sample based on the feature vectors and outputting prediction information comprises:
performing semantic segmentation based on the feature vectors by an auxiliary network portion of the detection model;
and judging whether each position in the 3D point cloud sample data belongs to a target object sample to predict according to the result of semantic segmentation, and outputting prediction information.
11. The method of claim 10, wherein the semantically segmenting based on the feature vector comprises:
up-sampling the feature vectors to obtain feature vectors with the same dimensionality as the 3D point cloud sample data input to the backbone network part;
and performing semantic segmentation on the 3D point cloud sample data based on the feature vectors with the same dimensionality.
12. An object recognition method, comprising:
acquiring a 3D point cloud data stream acquired in real time;
determining a tracking target matched with a tracking target and a tracking target not matched with the tracking target according to a detection result of target detection on adjacent frames of 3D point cloud data in the 3D point cloud data stream;
tracking a tracking target which is tracked and matched based on a detection result, setting a tracking task for the tracking target which is not tracked and matched, and continuously tracking until 3D point cloud data of the tracking target does not exist in a detection frame corresponding to the tracking target, or determining that the tracking target exceeds a preset range according to the 3D point cloud data;
and identifying the tracked target according to the tracking result of the tracked and matched tracked target and the tracking result of the tracked and unmatched tracked target.
13. The method of claim 12, wherein the method further comprises:
superposing the result of the tracking target identification on a preset image and displaying;
or,
and 3D modeling is carried out based on the tracking target recognition result and the image acquired by the image acquisition equipment in real time, and a 3D virtual scene is obtained and displayed.
14. An electronic device, comprising: the system comprises image acquisition equipment, a laser radar, a display, a processor, a communication interface and a communication bus, wherein the image acquisition equipment, the laser radar, the display, the processor and the communication interface finish mutual communication through the communication bus;
wherein:
the laser radar is used for acquiring 3D point cloud data of the surrounding environment in real time to form a 3D point cloud data stream;
the processor is used for determining a tracking target matched with a tracking target and a tracking target not matched with the tracking target according to a detection result of target detection on adjacent frames of 3D point cloud data in the 3D point cloud data stream; tracking a tracking target which is tracked and matched based on a detection result, setting a tracking task for the tracking target which is not tracked and matched, and continuously tracking until 3D point cloud data of the tracking target does not exist in a detection frame corresponding to the tracking target, or determining that the tracking target exceeds a preset range according to the 3D point cloud data; identifying the tracked target according to the tracking result of the tracked and matched tracked target and the tracking result of the tracked and unmatched tracked target; superposing the result of the tracking target identification on a preset image, or carrying out 3D modeling based on the result of the tracking target identification and the image acquired by the image acquisition equipment in real time to obtain a 3D virtual scene;
the display is used for displaying the image on which the result of the tracking target recognition is superimposed, or displaying the 3D virtual scene.
CN202210631423.8A 2022-06-06 2022-06-06 Automatic marking and detection model training and target recognition method and electronic equipment Pending CN114937177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210631423.8A CN114937177A (en) 2022-06-06 2022-06-06 Automatic marking and detection model training and target recognition method and electronic equipment

Publications (1)

Publication Number Publication Date
CN114937177A true CN114937177A (en) 2022-08-23

Family

ID=82866423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210631423.8A Pending CN114937177A (en) 2022-06-06 2022-06-06 Automatic marking and detection model training and target recognition method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114937177A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641462A (en) * 2022-12-26 2023-01-24 电子科技大学 Radar image target identification method
CN115641462B (en) * 2022-12-26 2023-03-17 电子科技大学 Radar image target identification method
CN116168062A (en) * 2023-04-21 2023-05-26 深圳佑驾创新科技有限公司 3D target tracking method and device
CN116168062B (en) * 2023-04-21 2023-09-29 深圳佑驾创新科技股份有限公司 3D target tracking method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination