CN116363565B - Target track determining method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116363565B
CN116363565B
Authority
CN
China
Prior art keywords
frame
track
target
tracks
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310639967.3A
Other languages
Chinese (zh)
Other versions
CN116363565A (en)
Inventor
吴亚军
蒋召
黄泽元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310639967.3A priority Critical patent/CN116363565B/en
Publication of CN116363565A publication Critical patent/CN116363565A/en
Application granted granted Critical
Publication of CN116363565B publication Critical patent/CN116363565B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of computer technology and provides a target track determining method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a motion video of a target; predicting, with a predictor, prediction frames of the target in the i-th and j-th frames of the video; acquiring, with a target detection algorithm, detection frames of the target in the i-th and j-th frames; in response to the prediction frame of the i-th frame failing to associate with the detection frame of the i-th frame while the prediction frame of the j-th frame associates with the detection frame of the j-th frame, generating a virtual track from the i-th and j-th frame detection frames; updating the predictor based on the virtual track; predicting a prediction frame of the target in the (j+n)-th frame of the video with the updated predictor; acquiring a detection frame of the target in the (j+n)-th frame with the target detection algorithm; associating the prediction frame and detection frame of the (j+n)-th frame to obtain the position of the target in that frame; and determining the track of the target based on the position. The method improves the accuracy with which the target track is determined.

Description

Target track determining method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for determining a target track, an electronic device, and a storage medium.
Background
The motion models of existing tracking algorithms assume that target motion is linear and require continuous detection values to update the predictor during tracking. These algorithms are therefore particularly sensitive to occlusion along the track and to nonlinear target motion, and problems of identity switches and track splitting often occur.
To address target tracking in complex scenes, the related art proposes enhancing motion modeling with camera motion compensation, determining the target track from the detection frame produced by a target detection algorithm and the prediction frame produced by a predictor, and updating the predictor state in real time from the current detection result. In real scenes, however, when the target is occluded or moves nonlinearly during tracking, no target detection frame is associated with the existing track for a long time, so noise in the predictor's state vector gradually accumulates and causes prediction bias.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for determining a target track, so as to solve the problem in the prior art that the accuracy of determining the target track is not high.
In a first aspect of an embodiment of the present application, a method for determining a target track is provided, including:
acquiring a target motion video;
predicting a prediction frame of a target in an ith frame and a jth frame of the video by using a predictor;
acquiring detection frames of a target in an ith frame and a jth frame of the video by using a target detection algorithm;
generating a virtual track by using the ith frame detection frame and the jth frame detection frame in response to the prediction frame of the ith frame being unassociated with the detection frame of the ith frame and the prediction frame of the jth frame being associated with the detection frame of the jth frame;
updating the predictor based on the virtual track;
predicting a prediction frame of the target in a j+n frame of the video based on the updated predictor;
acquiring a detection frame of a target in a j+n frame of the video by using a target detection algorithm;
correlating a prediction frame of the target in the j+n frame with the detection frame to obtain the position of the target in the j+n frame;
determining a track of the target based on the position of the target in the j+n frame;
wherein i, j and n are positive integers, and j is greater than i.
In a second aspect of the embodiment of the present application, there is provided a target track determining apparatus, including:
an acquisition device configured to acquire a target motion video;
a prediction means configured to predict a prediction frame of a target at an i-th frame and a j-th frame of the video using a predictor;
A detection device configured to acquire detection frames of a target at an i-th frame and a j-th frame of a video using a target detection algorithm;
generating means configured to generate a virtual track using the i-th frame detection frame and the j-th frame detection frame in response to the prediction frame of the i-th frame being unassociated with the detection frame of the i-th frame and the prediction frame of the j-th frame being associated with the detection frame of the j-th frame;
updating means configured to update the predictor based on the virtual track;
the prediction means is further configured to predict a prediction frame of the target at the j+n frame of the video based on the updated predictor;
the detection device is further configured to acquire a detection frame of the target in the video j+n frame by using a target detection algorithm;
the association device is configured to associate the prediction frame and the detection frame of the target in the j+n frame to obtain the position of the target in the j+n frame;
a determining means configured to determine a trajectory of the target based on a position of the target in the j+n frame;
wherein i, j and n are positive integers, and j is greater than i.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the application have the following beneficial effect: after a lost track is successfully re-associated, a virtual track is determined from the detection frames before the loss and after the re-association, and the predictor state is updated from the target frames of the virtual track. In other words, the predictor is smoothed backward with the current detection value, which avoids error accumulation in the predictor and improves the accuracy of the determined target track.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a target tracking method based on motion detection compensation.
Fig. 2 is a flowchart of a target track determining method according to an embodiment of the present application.
Fig. 3 is a flowchart of a target track determining method according to an embodiment of the present application.
Fig. 4 (a) is a schematic diagram of a target track without track splitting according to an embodiment of the present application.
Fig. 4 (b) is a schematic diagram of a target track with track splitting according to an embodiment of the present application.
Fig. 5 is a flowchart of a method for determining association degree of any two tracks corresponding to the same time interval in N tracks by using a trained prediction model according to an embodiment of the present application.
Fig. 6 is a flowchart of a method for determining a degree of association between a first track and a second track based on an aggregate feature of the first track and the second track according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a method for offline associating a fracture track according to an embodiment of the present application.
Fig. 8 is a flowchart of a method for training a predictive model according to an embodiment of the application.
Fig. 9 is a schematic diagram of a target track determining apparatus according to an embodiment of the present application.
Fig. 10 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
As mentioned above, in order to solve the object tracking problem in complex scenes, the related art proposes to enhance a motion modeling method using camera motion compensation. Fig. 1 is a flow chart of a target tracking method based on motion detection compensation. As shown in fig. 1, the method comprises the steps of:
in step S101, an object is detected.
The target detection algorithm may be used to extract the detection object frames of the current frame and filter out frames whose confidence falls below the object-frame threshold. The target detection algorithm may be a two-stage (Two-Stage) algorithm such as one based on Region-based Convolutional Neural Networks (R-CNN), the Spatial Pyramid Pooling Network (SPP-Net), the Fast Region-based Convolutional Neural Network (Fast R-CNN), or the Region-based Fully Convolutional Network (R-FCN), or a single-stage (One-Stage) algorithm such as OverFeat, YOLO (You Only Look Once) v1, YOLO v3, the Single Shot MultiBox Detector (SSD), or RetinaNet. The quality of the target detection algorithm has a great influence on target tracking accuracy.
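The confidence-filtering part of this step can be sketched as follows. This is an illustrative sketch rather than the patent's implementation: the detector itself is mocked, the `(x, y, w, h, confidence)` box layout is an assumption, and `filter_detections` is a hypothetical name.

```python
from typing import List, Tuple

# A detection object frame is (x, y, w, h, confidence); this layout and the
# function name below are assumptions made for illustration.
Detection = Tuple[float, float, float, float, float]

def filter_detections(detections: List[Detection],
                      conf_threshold: float = 0.5) -> List[Detection]:
    """Keep only detection frames whose confidence reaches the threshold;
    low-confidence frames are filtered out before track association."""
    return [d for d in detections if d[4] >= conf_threshold]

raw = [(10, 10, 50, 80, 0.92),   # confident detection, kept
       (200, 40, 60, 90, 0.15),  # low confidence, dropped
       (120, 30, 40, 70, 0.77)]  # confident detection, kept
kept = filter_detections(raw)
```

The threshold value is the "object frame threshold" mentioned above; its concrete value would be tuned per deployment.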
Step S102, predictor estimation and predictor state update.
The predictor is used to predict the prediction object frames of all targets in the current frame, and the predictor state is updated with the detection result of the current frame. The predictor may be a Kalman predictor or another predictor.
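A minimal sketch of this predict/update cycle with a constant-velocity Kalman filter follows. It tracks only a scalar position for brevity (a real box tracker would carry the box center, aspect ratio, and height in its state vector); the class name and the noise values are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

class SimpleKalman:
    """Minimal constant-velocity Kalman filter over a scalar position."""

    def __init__(self, x0: float):
        self.x = np.array([x0, 0.0])            # state: [position, velocity]
        self.P = np.eye(2)                      # state covariance
        self.F = np.array([[1.0, 1.0],
                           [0.0, 1.0]])         # constant-velocity transition
        self.H = np.array([[1.0, 0.0]])         # only position is observed
        self.Q = np.eye(2) * 1e-2               # process noise (assumed value)
        self.R = np.array([[1e-1]])             # measurement noise (assumed)

    def predict(self) -> float:
        """Estimation: propagate the state one frame forward."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return float(self.x[0])

    def update(self, z: float) -> None:
        """State update: fold in the current detection value z."""
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
```

Feeding the filter a linearly moving target (z = 1, 2, 3, …) makes its one-step prediction converge to the true next position, which is exactly the linear-motion assumption the Background section identifies as the weakness of such predictors under occlusion.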
Step S103, track association, deletion and new creation.
The detection object frames and prediction object frames of the current frame are associated. If the association succeeds, a target frame can be determined from the detected and predicted object frames, and a track identification (ID) is assigned to the current detection object based on the target frame. If a track is not successfully associated with any detection object frame within a certain time range, the track is deleted. If a detection object frame with high detection confidence cannot be associated with any existing track, a new track is created from it.
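The association step can be illustrated with a greedy IoU matcher. Practical trackers usually solve this assignment with the Hungarian algorithm; the greedy variant below is a simplified sketch under that substitution, and all names in it are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_associate(predictions, detections, iou_threshold=0.3):
    """Greedily match prediction boxes to detection boxes by descending IoU.
    Returns (matches, unmatched_prediction_ids, unmatched_detection_ids)."""
    pairs = sorted(((iou(p, d), pi, di)
                    for pi, p in enumerate(predictions)
                    for di, d in enumerate(detections)), reverse=True)
    matches, used_p, used_d = [], set(), set()
    for score, pi, di in pairs:
        if score < iou_threshold or pi in used_p or di in used_d:
            continue
        matches.append((pi, di))
        used_p.add(pi)
        used_d.add(di)
    return (matches,
            [i for i in range(len(predictions)) if i not in used_p],
            [i for i in range(len(detections)) if i not in used_d])
```

Unmatched predictions correspond to tracks that may later be deleted; unmatched high-confidence detections seed new tracks, as described above.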
However, in real scenes, when there is an occlusion or a nonlinear motion of an object during tracking, the state vector noise of the predictor is gradually accumulated when there is no target detection frame associated with the existing track for a long time, thereby causing prediction bias.
In view of this, the embodiment of the application provides a target track determining method, which determines a virtual track based on detection frames before loss and after re-association after the track is lost and updates the state of a predictor based on the target frame corresponding to the virtual track, namely, the predictor is updated reversely and smoothly through the current detection value, so that error accumulation of the predictor is avoided, and the target track determining precision is improved.
Fig. 2 is a flowchart of a target track determining method according to an embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
in step S201, a target motion video is acquired.
In step S202, a predictor is used to predict prediction frames of the target at the i-th and j-th frames of the video.
In step S203, the detection frames of the target at the i-th frame and the j-th frame of the video are acquired using the target detection algorithm.
In step S204, in response to the prediction frame of the ith frame being unassociated with the detection frame of the ith frame and the prediction frame of the jth frame being associated with the detection frame of the jth frame, a virtual track is generated using the detection frames of the ith frame and the jth frame.
In step S205, the predictor is updated based on the virtual track.
In step S206, a prediction frame of the target in the j+n frame of the video is predicted based on the updated predictor.
In step S207, a detection frame of the object at the j+n frame of the video is acquired using the object detection algorithm.
In step S208, the prediction frame and the detection frame of the target in the j+n frame are associated, so as to obtain the position of the target in the j+n frame.
In step S209, the trajectory of the target is determined based on the position of the target in the j+n frame.
Wherein i, j and n are positive integers, and j is greater than i.
In the embodiment of the application, the target track determining method can be executed by a terminal device or a server. The terminal device may be hardware or software. When the terminal device is hardware, it may be a variety of electronic devices having a display screen and supporting communication with a server, including but not limited to smartphones, tablet computers, laptop and desktop computers, and the like; when the terminal device is software, it may be installed in the electronic device as described above. The terminal device may be implemented as a plurality of software or software modules, or as a single software or software module, as embodiments of the application are not limited in this regard. Further, various applications may be installed on the terminal device, such as a data processing application, an instant messaging tool, social platform software, a search class application, a shopping class application, and the like.
The server may be a server that provides various services, for example, a background server that receives a request transmitted from a terminal device with which communication connection is established, and the background server may perform processing such as receiving and analyzing the request transmitted from the terminal device and generate a processing result. The server may be a server, a server cluster formed by a plurality of servers, or a cloud computing service center, which is not limited in this embodiment of the present application.
The server may be hardware or software. When the server is hardware, it may be various electronic devices that provide various services to the terminal device. When the server is software, it may be a plurality of software or software modules that provide various services to the terminal device, or may be a single software or software module that provides various services to the terminal device, which is not limited in this embodiment of the present application.
The target may be any target that requires motion tracking. Such as characters and objects in moving vehicles, boats, aircraft, animals, games, etc. applications, objects in medical images, etc.
In the embodiment of the application, the motion video of the target can be acquired to determine the motion trail of the target based on the video content. In an example, the motion video of the target may be a surveillance video including vehicles, and motion trajectories of some of the vehicles may be determined based on contents of the surveillance video, respectively. It will be appreciated that the motion video of the object may also be a captured motion video for other objects such as ships, aircraft, animals, etc., without limitation.
In the embodiment of the application, a predictor can be used for predicting the predicted frames of the target in the ith frame and the jth frame of the video. The predictor may be a Kalman predictor, or other predictor, without limitation. The predictor may first predict a prediction frame of the target at the i-th frame and then predict a prediction frame of the target at the j-th frame. j is a positive integer greater than i, i.e., the operation of predicting the prediction frame of the j-th frame is performed after predicting the prediction frame of the i-th frame.
In the embodiment of the application, the detection frames of the target in the ith frame and the jth frame of the video can be obtained by using a target detection algorithm. The target detection algorithm may be Two Stage target detection algorithm or One Stage target detection algorithm, which is not limited herein. Also, the detection of the target at the detection frame of the i-th frame may be first performed using the target detection algorithm, and then the detection of the target at the detection frame of the j-th frame, that is, the operation of detecting the detection frame of the j-th frame is performed after the detection frame of the i-th frame.
In the embodiment of the application, whether the predicted frame of the ith frame is successfully associated with the detection frame of the ith frame can be judged. When it is determined that the prediction frame of the ith frame is not successfully associated with the detection frame of the ith frame, it is further determined whether the prediction frame of the jth frame is successfully associated with the detection frame of the jth frame. When it is determined that the prediction frame of the j-th frame and the detection frame of the j-th frame can be successfully associated, a virtual track is generated using the detection frame of the i-th frame and the detection frame of the j-th frame.
Further, the predictor may be updated based on the generated virtual track. That is, the target frames of the i-th to (j-1)-th frames in the generated virtual track may be acquired, and the predictor parameters updated using the target frame of each frame in place of that frame's detection frame. Specifically, the state of the predictor at the (i-1)-th frame is obtained, and the target frame of the i-th frame is used as a detection frame to update it, yielding the state of the predictor at the i-th frame. Next, the target frame of the (i+1)-th frame is used as a detection frame to update the state at the i-th frame, yielding the state at the (i+1)-th frame, and so on until the state at the (j-1)-th frame is obtained. Finally, the state of the predictor at the (j-1)-th frame is updated with the actual detection frame of the j-th frame to obtain the state at the j-th frame.
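The virtual-track replay can be sketched as follows. The patent does not fix how the virtual track is derived from the two detections; linear interpolation between the i-th and j-th frame boxes is assumed here for illustration, and the function name is hypothetical.

```python
def generate_virtual_track(box_i, box_j, i, j):
    """Linearly interpolate target frames for the lost frames i .. j-1.
    box_i is the detection before the loss, box_j the re-associated detection;
    boxes are (x1, y1, x2, y2). Linear interpolation is an assumption -- the
    patent only requires that a virtual track be generated from the two
    detections."""
    span = j - i
    virtual = {}
    for t in range(i, j):
        alpha = (t - i) / span
        virtual[t] = tuple(bi + alpha * (bj - bi)
                           for bi, bj in zip(box_i, box_j))
    return virtual

# Track lost at frame i = 2, re-associated at frame j = 5: frames 2-4 get
# evenly spaced boxes. Each virtual box is then fed to the predictor's update
# step in place of that frame's missing detection, and the real detection of
# frame j is applied last.
virtual = generate_virtual_track((0.0, 0.0, 10.0, 10.0),
                                 (30.0, 0.0, 40.0, 10.0), 2, 5)
```

Replaying these boxes through the predictor's update step, frame by frame, is what keeps the state covariance from inflating over the occluded span.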
Compared with the processing method for updating the state of the predictor only according to the j-th frame detection frame in the related art, the technical scheme of the embodiment of the application can reduce the accumulated error of the predictor and improve the calculation precision in the subsequent track determination process.
In the embodiment of the application, the i-th frame of the video may be the first frame in which the prediction frame fails to associate with the detection frame, and the j-th frame may be the first frame that re-associates after the failure. That is, when the prediction frame and detection frame of the current frame fail to associate, the current frame is recorded as the i-th frame. Then, starting from the (i+1)-th frame, it is determined frame by frame whether the prediction frame and detection frame associate successfully. If the prediction frame of the (i+k)-th frame associates successfully with its detection frame, the (i+k)-th frame is recorded as the j-th frame, where k is a positive integer.
In the embodiment of the application, the prediction frame of the target in the j+n frame of the video can be predicted based on the updated predictor, and the detection frame of the target in the j+n frame of the video can be obtained by using a target detection algorithm. And associating the predicted frame of the target in the j+n frame with the detection frame to obtain the position of the target in the j+n frame, and determining the track of the target based on the position of the target in the j+n frame.
According to the technical scheme provided by the embodiment of the application, after a lost track is successfully re-associated, a virtual track is determined from the detection frames before the loss and after the re-association, and the predictor state is updated from the target frames of the virtual track. That is, the predictor is smoothed backward with the current detection value, which avoids the predictor error accumulation caused by long-time occlusion or nonlinear target motion during tracking and improves the accuracy of the determined target track.
That is, to solve the gradual accumulation of tracking noise in complex scenes, the technical scheme of the embodiment of the application corrects the predictor history based on detection values: when a lost track is re-associated, the predictor is smoothed backward with the current detection value. Specifically, suppose a track is lost at time t1 and re-associated at time t2. A virtual track is generated from the detection values at t1 and t2, the target frames of the virtual track replace the detection frames of the lost stage, and the predictor state is updated from time t1 onward using those target frames, so that error accumulation in the predictor is avoided.
In the embodiment of the present application, the prediction frame of the i-th frame failing to associate with the detection frame of the i-th frame may include: no detection frame is acquired for the i-th frame; or the acquired detection frame of the i-th frame cannot be associated with the prediction frame. That is, if no detection frame is detected in the i-th frame, or the detected frame is filtered out for falling below a preset detection-frame threshold, the prediction frame of the i-th frame may be determined to be unassociated with the detection frame. On the other hand, if a detection frame is detected in the i-th frame and exceeds the preset threshold but cannot be matched with the prediction frame, the prediction frame may likewise be determined to be unassociated. The preset detection-frame threshold may be set according to actual needs and is not limited here.
In the related art, defects in the target tracking algorithm may cause track splitting: the determined target track, which should normally be one complete curve for a single target, instead consists of multiple tracks, at least two of which overlap in time, producing redundant segments. In this case the related art extracts the appearance information of the target in each track segment with a re-identification (ReID) model and then associates the corresponding tracks based on time and appearance information to determine the correct track. However, the ReID model is slow and occupies considerable computing resources, so the association is inefficient.
In view of this, the target track determining method provided by the embodiment of the application associates split tracks through offline track association, solving the low association efficiency of the ReID-based approach.
Fig. 3 is a flowchart of a target track determining method according to an embodiment of the present application. Steps S301 to S309 in the embodiment shown in fig. 3 are the same as steps S201 to S209 in the embodiment shown in fig. 2, and will not be described here again. As shown in fig. 3, the method further comprises the steps of:
in step S310, in response to the determined track of the target including N tracks, where at least two tracks in the N tracks overlap in time, the association degree of any two tracks corresponding to the same time interval in the N tracks is determined using the trained prediction model.
In step S311, the trajectory of the target in each time interval is determined based on the degree of association.
In step S312, the trajectory of the target is determined based on the trajectories of the respective time zones.
Wherein N is a positive integer greater than 1.
In the embodiment of the application, whether the track splitting occurs can be judged by judging whether the track of the determined target comprises N sections of tracks and at least two sections of tracks in the N sections of tracks overlap in time. Wherein the at least two tracks overlap in time may comprise the at least two tracks completely overlapping in time or the at least two tracks partially overlapping in time.
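The time-overlap test used to flag track splitting can be sketched directly. Track segments are represented here simply as (start_frame, end_frame) pairs; that representation and the function names are assumptions for illustration.

```python
def tracks_overlap(s1, e1, s2, e2):
    """Two track segments overlap in time iff their frame intervals intersect
    (complete overlap is a special case of partial overlap)."""
    return max(s1, s2) <= min(e1, e2)

def has_track_split(tracks):
    """tracks: list of (start_frame, end_frame) segments attributed to one
    target. Track splitting is flagged when any two segments overlap."""
    return any(
        tracks_overlap(*tracks[a], *tracks[b])
        for a in range(len(tracks))
        for b in range(a + 1, len(tracks))
    )

split = has_track_split([(0, 10), (5, 15), (16, 30)])   # first two overlap
clean = has_track_split([(0, 10), (11, 20), (21, 30)])  # disjoint segments
```

Only when this check fires does the offline association of the following steps need to run.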
Fig. 4 (a) is a schematic diagram of a target track without track splitting provided in an embodiment of the present application, and fig. 4 (b) is a schematic diagram of a target track with track splitting provided in an embodiment of the present application. As shown in fig. 4 (a), curve 1 is a complete curve representing a target track in which no splitting has occurred; no track segments overlap in time. As shown in fig. 4 (b), curves 1 to 6 together form a target track in which splitting has occurred and cannot be joined into one complete curve: curve 1 overlaps in time with curves 2 and 3, curve 3 overlaps with curve 4, curve 4 with curve 5, and curve 5 with curve 6. Note that in some cases a split target track may still form a complete curve while redundant track segments exist alongside it.
When track splitting is determined to have occurred, the degree of association of any two of the N segments corresponding to the same time interval can be determined using a trained prediction model. "The same time interval" means that the track start time and track end time of the target track are obtained, the span from the start time to the end time is divided into several time intervals, and each time interval corresponds to at least two track segments. The way the time intervals are divided may be determined by the distribution of the segments or by other methods, which is not limited here. The divided time intervals do not overlap each other in time.
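The patent leaves the interval-division method open; one possible sketch, under the assumption that each segment is a (start_frame, end_frame) span keyed by a track id, is to cut time at every segment boundary and keep the intervals in which at least two segments coexist:

```python
# Hypothetical division scheme (one of several the patent would permit):
# interval boundaries are taken at every segment start/end frame, and each
# kept interval records the ids of the segments alive throughout it.

def divide_into_intervals(segments):
    """segments: {track_id: (start_frame, end_frame)} ->
    list of (interval_start, interval_end, alive_track_ids)."""
    bounds = sorted({t for span in segments.values() for t in span})
    intervals = []
    for lo, hi in zip(bounds, bounds[1:]):
        alive = [tid for tid, (s, e) in segments.items() if s <= lo and hi <= e]
        if len(alive) >= 2:            # each kept interval has >= 2 segments
            intervals.append((lo, hi, alive))
    return intervals

segs = {1: (0, 60), 2: (40, 100), 3: (50, 100)}
print(divide_into_intervals(segs))
# -> [(40, 50, [1, 2]), (50, 60, [1, 2, 3]), (60, 100, [2, 3])]
```

The resulting intervals are non-overlapping by construction, matching the requirement stated above.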
After the degree of association of any two segments within a time interval is determined using the trained prediction model, the track of the target in each time interval can be determined based on it. Specifically, the segment pairs whose degree of association exceeds a preset association threshold can first be screened out, the screened pairs sorted, and the two segments with the highest degree of association selected and associated to obtain the track of that time interval.
In some embodiments, after the segments whose degree of association exceeds the preset threshold are screened out, the tracks of the time intervals immediately before and after the current interval may also be obtained. The degree of association between each screened segment and the track of the previous interval, and between each screened segment and the track of the next interval, is then predicted. The two most associated segments are determined jointly from the screened associations within the interval and the associations with the previous and next intervals, and those two segments are associated to obtain the track of the current interval.
Further, the track of the target may be determined from the tracks of the individual time intervals; that is, concatenating the tracks of the time intervals in chronological order yields the track of the target. The value of the preset association threshold can be set according to actual needs and is not limited here.
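The per-interval selection step described above (threshold, then pick the most associated pair) can be sketched as follows; the function and dictionary names are illustrative assumptions, not the patent's:

```python
# Sketch of selecting, within one time interval, the two track segments to
# associate: keep pairs whose predicted association degree exceeds a preset
# threshold, then choose the pair with the highest degree.

def select_association(pair_scores, threshold=0.5):
    """pair_scores: {(track_a, track_b): association_degree}.
    Returns the most associated pair above the threshold, or None."""
    candidates = {pair: s for pair, s in pair_scores.items() if s > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

scores = {(1, 2): 0.91, (1, 3): 0.42, (2, 3): 0.66}
print(select_association(scores))  # (1, 2): highest degree above 0.5
```

Returning `None` models an interval in which no pair clears the threshold, so no association is made there.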
According to the technical scheme provided by the embodiment of the application, the split tracks are directly associated using an offline track association algorithm, so there is no need to additionally extract the target's appearance information; this reduces the computation required for target track determination, increases the processing speed, and improves the efficiency of track determination.
Fig. 5 is a flowchart of a method for determining association degree of any two tracks corresponding to the same time interval in N tracks by using a trained prediction model according to an embodiment of the present application. As shown in fig. 5, the method comprises the steps of:
in step S501, the temporal features of the first track and the second track are respectively aggregated by a temporal module with channel attention in the prediction model, so as to obtain intermediate features of the first track and the second track.
In step S502, the fusion features of the intermediate features of the first track and the second track are respectively aggregated by a fusion module with spatial attention in the prediction model, so as to obtain the aggregation features of the first track and the second track.
In step S503, the degree of association of the first track and the second track is determined based on the aggregated features of the first track and the second track.
In the embodiment of the application, when the trained prediction model is used to determine the degree of association of any two segments corresponding to the same time interval, the temporal features of the first track and the second track can be aggregated separately by a temporal module with channel attention in the prediction model, yielding the intermediate features of the first and second tracks. Within the same track, all target frames are correlated in time, so aggregating the track's features along the time dimension yields richer features. The features of a track include the target object features of each frame.
Furthermore, the intermediate features of the first and second tracks can be aggregated separately by a fusion module with spatial attention in the prediction model, yielding the aggregated features of the first and second tracks. That is, the prediction model first aggregates a track's features along the time dimension and then sends the aggregated features to the fusion module with spatial attention, which further aggregates information across the track's different feature dimensions to obtain the track's fused feature. The feature dimensions may include at least one of: the track identifier, the frames corresponding to the track, and the coordinates of the target frames in the track.
It can be understood that each track segment has features in both the time dimension and the space dimension. For example, if two segments are far apart in time, they cannot be split parts of the same track and should not be associated; such segments are therefore not aggregated by the temporal module with channel attention. Conversely, if two segments differ greatly in the spatial dimensions, such as identifier or coordinates, they are unlikely to belong to the same target object, and are therefore not aggregated by the fusion module with spatial attention.
Further, after the aggregated features of the first and second tracks are determined, the degree of association of the two tracks may be determined based on them. That is, after passing through the temporal module with channel attention and the fusion module with spatial attention, the extracted track features are strongly correlated in both time and space, and further processing of these features, for example with a classification algorithm, yields the degree of association of the two tracks.
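The patent does not disclose the internal layers of the two modules; the minimal numpy sketch below is therefore an assumption of one plausible form (squeeze-and-excitation-style gating), shown only to make the two aggregation stages concrete — per-frame features are gated per channel and pooled over time, then re-weighted across feature dimensions:

```python
# Assumed toy network, NOT the patent's actual architecture. Shapes:
# a segment is a (T, C) matrix of per-frame target features.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_channel_attention(feats, w_ch):
    """feats: (T, C) per-frame features; w_ch: (C, C) learned weights.
    Squeeze over time, gate channels, aggregate along the time dimension."""
    squeezed = feats.mean(axis=0)          # (C,) temporal squeeze
    gate = sigmoid(squeezed @ w_ch)        # (C,) channel-attention weights
    return (feats * gate).mean(axis=0)     # (C,) intermediate feature

def spatial_attention_fuse(inter, w_sp):
    """inter: (C,) intermediate feature; w_sp: (C, C) learned weights.
    Re-weight the feature dimensions (id, frame, box coordinates, ...)."""
    gate = sigmoid(inter @ w_sp)
    return inter * gate                    # (C,) aggregated feature

rng = np.random.default_rng(0)
track_feats = rng.normal(size=(8, 4))      # 8 frames, 4 feature dimensions
w_ch, w_sp = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
agg = spatial_attention_fuse(temporal_channel_attention(track_feats, w_ch), w_sp)
print(agg.shape)  # (4,): one aggregated feature vector per segment
```

In a trained model the weight matrices would be learned; here they are random placeholders, so only the data flow, not the values, is meaningful.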
Fig. 6 is a flowchart of a method for determining a degree of association between a first track and a second track based on an aggregate feature of the first track and the second track according to an embodiment of the present application. As shown in fig. 6, the method includes the steps of:
In step S601, global average pooling processing is performed on the aggregate features of the first track and the second track, so as to obtain a first pooling result and a second pooling result.
In step S602, the first pooling result and the second pooling result are fused, so as to obtain a fusion feature.
In step S603, the fusion feature is processed by using the full connection layer and the classification layer, so as to obtain the association degree between the first track and the second track.
In the embodiment of the application, the aggregation characteristics of the first track and the second track can be subjected to global average pooling treatment to obtain a first pooling result and a second pooling result. And then fusing the first pooling result and the second pooling result to obtain a fusion characteristic. And finally, processing the fusion characteristics by using the full connection layer and the classification layer to obtain the association degree of the first track and the second track.
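Steps S601 to S603 can be sketched in a few lines; again the weights, layer sizes, and activation choices are illustrative assumptions, since the patent specifies only "global average pooling", "fusion", and "fully connected layer and classification layer":

```python
# Hedged sketch of S601-S603: pooling, additive fusion, FC + classification.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def association_degree(agg_a, agg_b, w_fc, b_fc, w_cls, b_cls):
    """agg_a, agg_b: (T, C) aggregated features of the two segments."""
    pooled_a = agg_a.mean(axis=0)          # S601: global average pooling
    pooled_b = agg_b.mean(axis=0)
    fused = pooled_a + pooled_b            # S602: fusion, e.g. by addition
    hidden = np.maximum(fused @ w_fc + b_fc, 0.0)  # S603: FC layer (ReLU assumed)
    return float(sigmoid(hidden @ w_cls + b_cls))  # classification -> [0, 1]

rng = np.random.default_rng(1)
a, b = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
w_fc, b_fc = rng.normal(size=(4, 4)), np.zeros(4)
w_cls, b_cls = rng.normal(size=4), 0.0
score = association_degree(a, b, w_fc, b_fc, w_cls, b_cls)
print(0.0 <= score <= 1.0)  # True: the output is a valid association degree
```

The sigmoid output maps naturally onto the preset association threshold used later when screening segment pairs.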
Fig. 7 is a schematic diagram of a method for offline association of broken tracks according to an embodiment of the present application. As shown in fig. 7, the tracks are first formatted into the form (track id, frame, target frame coordinates); tracks that coexist within a certain time range are then grouped, giving groups at different times, i.e. a division into several distinct time intervals. Within each group, i.e. each time interval, track 1 and track 2 are selected and each sent to the temporal module with channel attention, which aggregates the track features along the time dimension; the aggregated features are then sent to the fusion module with spatial attention, which further aggregates information across the tracks' feature dimensions. Next, the aggregated features undergo global average pooling, and the two pooling results are fused, for example by addition, to obtain the fused feature. Finally, the fused feature passes through a fully connected layer and a classification layer, yielding the degree of association of the two track segments.
According to the technical scheme provided by the embodiment of the application, the split tracks are directly associated using an offline track association algorithm, so there is no need to additionally extract the target's appearance information; this reduces the computation required for target track determination, increases the processing speed, and improves the efficiency of track determination.
According to the technical scheme provided by the embodiment of the application, track confirmation is realized through both online track smoothing and offline track association, which significantly alleviates the technical problems of occlusion and low tracking accuracy in scenes with nonlinear target motion. Specifically, against the continual accumulation of predictor errors under occlusion and nonlinear target motion, an online track smoothing algorithm corrects the predictor over the historical loss period using the associated detection frames, improving the predictor's prediction accuracy. For split tracks, on the other hand, an offline track association model associates the tracks directly, so no cross-camera tracking model is needed to extract the target's appearance information; track association is achieved quickly without additional computation, and the degree of track splitting is reduced.
Fig. 8 is a flowchart of a method for training a predictive model according to an embodiment of the application. As shown in fig. 8, the method includes the steps of:
In step S801, a history continuous track is acquired.
Wherein the historical continuous track comprises a historical continuous track of the target, or a historical continuous track of other targets.
In step S802, the historical continuous track is randomly segmented, and an M-segment track is obtained.
Wherein M is a positive integer greater than 1.
In step S803, the M-segment track is formatted.
In step S804, a time range corresponding to the historical continuous track is acquired, the time range is divided into L time intervals, and the formatted M-segment track is saved to the corresponding time interval.
Wherein L is a positive integer.
In step S805, two tracks in any time interval are acquired, and a prediction correlation degree of the two tracks is determined using a prediction model.
In step S806, a predicted trajectory of the target in each time zone is determined based on the predicted correlation degree, and a predicted trajectory of the target is determined based on the predicted trajectory in each time zone.
In step S807, parameters of the predictive model are modified in response to the difference between the predicted trajectory and the historical continuous trajectory being greater than a preset loss threshold until the model converges.
In the embodiment of the application, the prediction model can be trained in advance based on historical tracks. A historical track can be a historical continuous track of the target whose track is to be determined, or a historical continuous track of another target. Since a historical track is a determined track forming a complete curve, it must first be processed so that it can represent a track in which splitting has occurred.
In the embodiment of the application, the historical continuous track can be randomly segmented into M segments, which are then used to represent a track in which splitting has occurred. Random segmentation means cutting the historical continuous track over several time spans, some of which overlap in time; this yields M segments of which some overlap in time.
Furthermore, to represent a split track more accurately, the head and tail of each of the M segments can be randomly discarded, where the maximum discarded span can be set to Q frames, Q being a positive integer. In one example, Q may take the value 30.
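A sketch of this training-data construction, under assumptions the patent does not fix (frame-indexed tracks, uniform random cut points straddling the track's midpoint so that segments overlap, head/tail drops capped at Q frames), might look like:

```python
# Hypothetical training-data builder: split one historical continuous track
# into m overlapping segments and randomly drop up to q head/tail frames.
import random

def random_segments(track_frames, m, q=30, seed=0):
    """track_frames: sorted list of frame indices of the continuous track.
    Returns m segments (lists of frames), possibly overlapping in time."""
    rng = random.Random(seed)
    n = len(track_frames)
    segments = []
    for _ in range(m):
        start = rng.randrange(0, n // 2)
        end = rng.randrange(n // 2, n)   # spans straddle the middle -> overlap
        seg = track_frames[start:end + 1]
        head = rng.randrange(0, min(q, len(seg) // 4) + 1)  # drop <= q head frames
        tail = rng.randrange(0, min(q, len(seg) // 4) + 1)  # drop <= q tail frames
        segments.append(seg[head:len(seg) - tail])
    return segments

frames = list(range(200))
segs = random_segments(frames, m=3)
print([(s[0], s[-1]) for s in segs])  # remaining span of each segment
```

Capping the drop at a quarter of a segment's length (in addition to Q) is a safety choice of this sketch so that no segment is emptied; the patent only bounds the drop by Q frames.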
After the M segments are obtained, they may be formatted. Each formatted segment can be expressed as (track id, frame, target frame coordinates).
Furthermore, the time range corresponding to the historical continuous track can be obtained, divided into L time intervals, and the formatted M segments saved to their corresponding intervals. That is, the track start time and track end time of the historical continuous track may be obtained and the span between them divided into several time intervals, each corresponding to at least two segments. The way the intervals are divided may be determined by the distribution of the segments or by other methods, which is not limited here. The divided time intervals do not overlap each other in time.
And then, acquiring two sections of tracks in any time interval, and determining the prediction association degree of the two sections of tracks by using a prediction model. The specific implementation process of determining the prediction association degree of the two tracks by using the prediction model may refer to the descriptions of the embodiments shown in fig. 5 and fig. 6, which are not repeated herein.
After the predicted degree of association of any two segments in each time interval is determined, the segment pairs whose degree exceeds a preset association threshold can first be screened out, the screened pairs sorted, and the two segments with the highest degree selected and associated to obtain the predicted track of that interval. Further, the predicted track of the target may be determined from the tracks of the intervals; that is, associating the intervals' tracks in chronological order yields the predicted track of the target.
The determined predicted track can then be compared with the historical continuous track. When the difference between them exceeds the preset loss threshold, the predicted track obtained from the prediction model deviates too far from the ground truth, the model's prediction is poor, and its parameters need to be corrected. After the parameters are corrected, the modified model is used again to determine the predicted association of two segments in each time interval, the predicted track of the target in each interval based on that association, and the predicted track of the target from the per-interval tracks; the predicted track is then compared with the historical continuous track once more, and this repeats until the model converges, that is, until the difference between the predicted track and the historical continuous track is less than or equal to the preset loss threshold. The value of the preset loss threshold can be set according to actual needs and is not limited here.
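The predict-compare-correct cycle just described amounts to the loop below; every name, the toy "model", and the halving correction are placeholders invented for illustration, since the patent does not state how parameters are corrected:

```python
# Skeleton of the training cycle: predict, compare with the historical
# continuous track, correct parameters while the loss exceeds the threshold.
def train_until_converged(model_params, predict_fn, correct_fn,
                          history, loss_fn, loss_threshold, max_iters=100):
    for _ in range(max_iters):
        predicted = predict_fn(model_params, history)
        if loss_fn(predicted, history) <= loss_threshold:
            break                          # converged: difference small enough
        model_params = correct_fn(model_params)
    return model_params

# Toy usage: the "model" is a single offset; correction halves it each step.
final = train_until_converged(
    8.0,
    predict_fn=lambda p, h: [x + p for x in h],
    correct_fn=lambda p: p / 2,
    history=[0.0, 1.0, 2.0],
    loss_fn=lambda pred, h: max(abs(a - b) for a, b in zip(pred, h)),
    loss_threshold=0.5,
)
print(final)  # 0.5: first offset whose maximum difference is <= the threshold
```

In practice the correction step would be a gradient update of the prediction model's weights rather than this toy halving rule.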
According to the technical scheme provided by the embodiment of the application, training data is constructed from historical continuous tracks to train the prediction model that predicts the degree of association between two track segments; no large quantity of sample data needs to be collected and annotated, so the scheme is simple to implement and efficient.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 9 is a schematic diagram of a target track determining apparatus according to an embodiment of the present application. As shown in fig. 9, the apparatus includes:
an acquisition module 901 configured to acquire a target motion video.
A prediction module 902 configured to predict a prediction frame of a target at an i-th frame and a j-th frame of video using a predictor.
The detection module 903 is configured to acquire a detection frame of the target in the i-th frame and the j-th frame of the video using a target detection algorithm.
The generating module 904 is configured to generate a virtual track using the i-th frame detection frame and the j-th frame detection frame in response to the i-th frame prediction frame not being associated with the i-th frame detection frame and the j-th frame prediction frame being associated with the j-th frame detection frame.
An update module 905 configured to update the predictor based on the virtual track.
The prediction module is further configured to predict a prediction frame of the target in the j+n frame of the video based on the updated predictor, and the detection module is further configured to acquire a detection frame of the target in the j+n frame of the video using the target detection algorithm.
And an association module 906 configured to associate the prediction frame and the detection frame of the target in the j+n frame, so as to obtain the position of the target in the j+n frame.
A determination module 907 configured to determine the trajectory of the target based on the position of the target in the j+n frame.
Wherein i, j and n are positive integers, and j is greater than i.
According to the technical scheme provided by the embodiment of the application, after a track is successfully re-associated following a loss, a virtual track is determined based on the detection frames before and after the re-association, and the predictor's state is updated based on the target frames corresponding to the virtual track; that is, the predictor is smoothed backward using the current detection values, which avoids error accumulation in the predictor and improves the accuracy of target track determination.
In the embodiment of the present application, the prediction frame of the ith frame is not associated with the detection frame of the ith frame, including: the detection frame of the ith frame is not acquired; or, the acquired detection frame of the ith frame cannot be associated with the prediction frame of the ith frame.
In the embodiment of the application, the method further comprises: in response to the determined track of the target comprising N track segments of which at least two overlap in time, determining, using a trained prediction model, the degree of association of any two of the N segments corresponding to the same time interval; determining the track of the target in each time interval based on the degree of association; and determining the track of the target based on the tracks of the time intervals. Wherein N is a positive integer greater than 1.
According to the technical scheme provided by the embodiment of the application, the split tracks are directly associated using an offline track association algorithm, so there is no need to additionally extract the target's appearance information; this reduces the computation required for target track determination, increases the processing speed, and improves the efficiency of track determination.
In the embodiment of the application, any two sections of tracks in the same time interval comprise a first track and a second track, and the correlation degree of any two sections of tracks corresponding to the same time interval in N sections of tracks is determined by using a trained prediction model, and the method comprises the following steps: respectively aggregating the time characteristics of the first track and the second track through a time module with channel attention in a prediction model to obtain the intermediate characteristics of the first track and the second track; respectively aggregating the fusion characteristics of the intermediate characteristics of the first track and the second track through a fusion module with spatial attention in the prediction model to obtain the aggregation characteristics of the first track and the second track; and determining the association degree of the first track and the second track based on the aggregation characteristics of the first track and the second track.
In an embodiment of the present application, determining a degree of association between a first track and a second track based on an aggregate feature of the first track and the second track includes: global average pooling treatment is carried out on the aggregation characteristics of the first track and the second track respectively, so that a first pooling result and a second pooling result are obtained; fusing the first pooling result and the second pooling result to obtain a fusion characteristic; and processing the fusion characteristics by using the full connection layer and the classification layer to obtain the association degree of the first track and the second track.
In an embodiment of the present application, the fusion feature comprises at least one of: track identification, frames corresponding to the track and coordinates of a target frame in the track.
According to the technical scheme provided by the embodiment of the application, the split tracks are directly associated using an offline track association algorithm, so there is no need to additionally extract the target's appearance information; this reduces the computation required for target track determination, increases the processing speed, and improves the efficiency of track determination.
In the embodiment of the application, the prediction model is obtained by training in the following way: acquiring a historical continuous track, wherein the historical continuous track comprises a historical continuous track of a target or a historical continuous track of other targets; randomly segmenting the historical continuous track to obtain M sections of tracks, wherein M is a positive integer greater than 1; formatting the M sections of tracks; acquiring a time range corresponding to the historical continuous track, dividing the time range into L time intervals, and storing the formatted M sections of tracks to the corresponding time intervals; acquiring two sections of tracks in any time interval, and determining the prediction association degree of the two sections of tracks by using a prediction model; determining a predicted track of the target in each time interval based on the predicted association degree, and determining a predicted track of the target based on the predicted track of each time interval; and modifying parameters of the prediction model until the model converges in response to the difference between the predicted track and the historical continuous track being greater than a preset loss threshold.
According to the technical scheme provided by the embodiment of the application, the training data is constructed by using the historical continuous track, the prediction model for predicting the association degree between two sections of tracks is trained, a large amount of sample data is not required to be acquired and indexed, and the simple realization and high efficiency are realized.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 10 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device 10 of this embodiment includes: a processor 1001, a memory 1002 and a computer program 1003 stored in the memory 1002 and executable on the processor 1001. The steps of the various method embodiments described above are implemented by the processor 1001 when executing the computer program 1003. Alternatively, the processor 1001 implements the functions of the modules/units in the above-described respective device embodiments when executing the computer program 1003.
The electronic device 10 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 10 may include, but is not limited to, a processor 1001 and a memory 1002. It will be appreciated by those skilled in the art that fig. 10 is merely an example of the electronic device 10 and does not limit it; the device may include more or fewer components than shown, or different components.
The processor 1001 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 1002 may be an internal storage unit of the electronic device 10, for example, a hard disk or a memory of the electronic device 10. The memory 1002 may also be an external storage device of the electronic device 10, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 10. Memory 1002 may also include both internal and external storage units of electronic device 10. The memory 1002 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A target trajectory determination method, comprising:
acquiring a target motion video;
predicting a predicted frame of the target at the i-th and j-th frames of the video using a predictor;
acquiring detection frames of the target in an ith frame and a jth frame of the video by using a target detection algorithm;
generating a virtual track by using the ith frame detection frame and the jth frame detection frame in response to the fact that the ith frame prediction frame is not associated with the ith frame detection frame and the jth frame prediction frame is associated with the jth frame detection frame;
updating the predictor based on the virtual track;
predicting a prediction frame of the target in a j+n frame of the video based on the updated predictor;
Acquiring a detection frame of the target in a j+n frame of the video by using a target detection algorithm;
associating a prediction frame and a detection frame of the target in the j+n frame to obtain the position of the target in the j+n frame;
determining a track of the target based on the position of the target in the j+n frame;
responding to the determined track of the target comprises N sections of tracks, at least two sections of tracks in the N sections of tracks are overlapped in time, and determining the association degree of any two sections of tracks corresponding to the same time interval in the N sections of tracks by using a trained prediction model;
determining the track of the target in each time interval based on the association degree;
determining the track of the target based on the tracks of the time intervals;
wherein the any two sections of tracks in the same time interval comprise a first track and a second track, and the determining, by using the trained prediction model, the degree of association of any two sections of tracks in the N sections of tracks that correspond to the same time interval comprises:
aggregating temporal features of the first track and the second track, respectively, through a time module with channel attention in the prediction model, to obtain intermediate features of the first track and the second track;
aggregating, through a fusion module with spatial attention in the prediction model, fusion features of the intermediate features of the first track and the second track, respectively, to obtain aggregated features of the first track and the second track;
determining a degree of association of the first track and the second track based on the aggregated features of the first track and the second track;
wherein i, j and n are positive integers, j is greater than i, and N is a positive integer greater than 1.
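The virtual-track step of claim 1 can be sketched concretely. The claim only requires that a track be generated from the detection frames of the i-th and j-th frames; one common choice — used here purely as an illustrative assumption, not as the patent's prescribed method — is to linearly interpolate the two detection boxes over the missing frames, and the interpolated boxes can then be replayed into the predictor as observations to update it:

```python
import numpy as np

def generate_virtual_track(box_i, box_j, frame_i, frame_j):
    """Linearly interpolate between two detection boxes (cx, cy, w, h)
    to form a virtual track over frames frame_i..frame_j inclusive."""
    box_i = np.asarray(box_i, dtype=float)
    box_j = np.asarray(box_j, dtype=float)
    track = {}
    for f in range(frame_i, frame_j + 1):
        alpha = (f - frame_i) / (frame_j - frame_i)  # 0 at frame i, 1 at frame j
        track[f] = (1 - alpha) * box_i + alpha * box_j
    return track

# The target was last detected at frame 3 and re-detected at frame 7;
# the virtual track fills the gap with interpolated boxes.
virtual = generate_virtual_track((10, 10, 4, 4), (20, 30, 4, 4), 3, 7)
```

The interpolated box at frame 5 lies exactly halfway between the two detections, which is the behaviour a motion predictor (e.g. a Kalman filter) would expect from an approximately constant-velocity target.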
2. The method of claim 1, wherein the prediction frame of the i-th frame not being associated with the detection frame of the i-th frame comprises:
the detection frame of the i-th frame is not acquired; or
the acquired detection frame of the i-th frame cannot be associated with the prediction frame of the i-th frame.
3. The method of claim 1, wherein the determining the degree of association of the first track and the second track based on the aggregated features of the first track and the second track comprises:
performing global average pooling on the aggregated features of the first track and the second track, respectively, to obtain a first pooling result and a second pooling result;
fusing the first pooling result and the second pooling result to obtain a fusion feature; and
processing the fusion feature by using a fully connected layer and a classification layer to obtain the degree of association of the first track and the second track.
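A minimal numpy sketch of the claim 3 pipeline follows. All names, shapes, and the two-class softmax head are illustrative assumptions (the claim does not fix the feature dimensions or the classifier form): each track's aggregated feature map is global-average-pooled over time, the pooled vectors are concatenated as the fusion feature, and a single fully connected layer plus softmax produces a probability treated as the degree of association.

```python
import numpy as np

rng = np.random.default_rng(0)

def association_degree(feat_a, feat_b, w, b):
    """GAP over time -> concatenate -> fully connected layer -> softmax."""
    pooled_a = feat_a.mean(axis=-1)                # first pooling result
    pooled_b = feat_b.mean(axis=-1)                # second pooling result
    fused = np.concatenate([pooled_a, pooled_b])   # fusion feature
    logits = w @ fused + b                         # fully connected layer
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                        # classification layer (2-way softmax)
    return probs[1]                                # P(same target) as the association degree

C, T = 8, 16                                       # channels x time steps, illustrative only
feat_a = rng.normal(size=(C, T))                   # aggregated feature of the first track
feat_b = rng.normal(size=(C, T))                   # aggregated feature of the second track
w, b = rng.normal(size=(2, 2 * C)), np.zeros(2)    # untrained FC weights, for shape only
score = association_degree(feat_a, feat_b, w, b)
```

With trained weights, two segments of the same target would yield a score near 1 and unrelated segments a score near 0; here the weights are random, so only the shape of the computation is meaningful.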
4. The method of claim 1, wherein the fusion features comprise at least one of:
a track identification, frames corresponding to the track, and coordinates of a target frame in the track.
5. The method of claim 1, wherein the predictive model is trained by:
acquiring a historical continuous track, wherein the historical continuous track comprises a historical continuous track of the target or a historical continuous track of other targets;
randomly segmenting the historical continuous track to obtain M sections of tracks, wherein M is a positive integer greater than 1;
formatting the M sections of tracks;
obtaining a time range corresponding to the historical continuous track, dividing the time range into L time intervals, and storing the formatted M sections of tracks to the corresponding time intervals, wherein L is a positive integer;
acquiring two sections of tracks in any time interval, and determining the prediction association degree of the two sections of tracks by using the prediction model;
determining a predicted track of the target in each time interval based on the predicted association degree, and determining the predicted track of the target based on the predicted track of each time interval;
and in response to a difference between the predicted track and the historical continuous track being greater than a preset loss threshold, modifying parameters of the prediction model until the model converges.
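The training-data preparation of claim 5 — randomly cutting a historical continuous track into M segments and binning them into L time intervals — can be sketched as follows. The cut-point strategy and the rule of assigning a segment to the interval containing its first frame are illustrative assumptions; the claim itself only requires random segmentation and interval storage.

```python
import random

def segment_track(track_frames, m, num_intervals, seed=0):
    """Randomly split a continuous track into m segments, then store each
    segment in the time interval that contains its first frame."""
    rng = random.Random(seed)
    n = len(track_frames)
    cuts = sorted(rng.sample(range(1, n), m - 1))   # m-1 distinct random cut points
    segments, start = [], 0
    for c in cuts + [n]:
        segments.append(track_frames[start:c])
        start = c
    lo, hi = track_frames[0], track_frames[-1]
    width = (hi - lo + 1) / num_intervals           # divide the time range into L intervals
    intervals = {k: [] for k in range(num_intervals)}
    for seg in segments:
        k = min(int((seg[0] - lo) / width), num_intervals - 1)
        intervals[k].append(seg)
    return segments, intervals

frames = list(range(100))                           # a 100-frame historical continuous track
segments, intervals = segment_track(frames, m=5, num_intervals=4)
```

The model is then asked to re-associate segment pairs within each interval, and the reconstructed track is compared against the original continuous track to drive the loss in the final training step of claim 5.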
6. A target trajectory determination device, characterized by comprising:
the acquisition module is configured to acquire a target motion video;
a prediction module configured to predict a prediction frame of the target at an i-th frame and a j-th frame of the video using a predictor;
the detection module is configured to acquire detection frames of the target in an ith frame and a jth frame of the video by using a target detection algorithm;
a generation module configured to, in response to the prediction frame of the i-th frame not being associated with the detection frame of the i-th frame and the prediction frame of the j-th frame being associated with the detection frame of the j-th frame, generate a virtual track by using the detection frame of the i-th frame and the detection frame of the j-th frame;
an update module configured to update the predictor based on the virtual track;
the prediction module is further configured to predict a prediction frame of the target at a j+n frame of the video based on the updated predictor;
the detection module is further configured to acquire a detection frame of the target in a j+n frame of the video by using a target detection algorithm;
the association module is configured to associate the prediction frame and the detection frame of the target in the j+n frame to obtain the position of the target in the j+n frame;
a determining module configured to determine a trajectory of the target based on a position of the target at a j+n frame;
in response to the determined track of the target comprising N sections of tracks, wherein at least two of the N sections of tracks overlap in time, determining, by using a trained prediction model, a degree of association of any two sections of tracks in the N sections of tracks that correspond to the same time interval;
determining the track of the target in each time interval based on the association degree;
determining the track of the target based on the tracks of the time intervals;
wherein the any two sections of tracks in the same time interval comprise a first track and a second track, and the determining, by using the trained prediction model, the degree of association of any two sections of tracks in the N sections of tracks that correspond to the same time interval comprises:
aggregating temporal features of the first track and the second track, respectively, through a time module with channel attention in the prediction model, to obtain intermediate features of the first track and the second track;
aggregating, through a fusion module with spatial attention in the prediction model, fusion features of the intermediate features of the first track and the second track, respectively, to obtain aggregated features of the first track and the second track;
determining a degree of association of the first track and the second track based on the aggregated features of the first track and the second track;
wherein i, j and n are positive integers, j is greater than i, and N is a positive integer greater than 1.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN202310639967.3A 2023-06-01 2023-06-01 Target track determining method and device, electronic equipment and storage medium Active CN116363565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310639967.3A CN116363565B (en) 2023-06-01 2023-06-01 Target track determining method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116363565A CN116363565A (en) 2023-06-30
CN116363565B (en) 2023-08-11

Family

ID=86939940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310639967.3A Active CN116363565B (en) 2023-06-01 2023-06-01 Target track determining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116363565B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671296A (en) * 2023-12-19 2024-03-08 珠海市欧冶半导体有限公司 Target tracking method, apparatus, computer device, and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN110334111A (en) * 2019-06-13 2019-10-15 武汉市公安局视频侦查支队 A kind of multidimensional trajectory analysis method and device
CN111179311A (en) * 2019-12-23 2020-05-19 全球能源互联网研究院有限公司 Multi-target tracking method and device and electronic equipment
CN111445501A (en) * 2020-03-25 2020-07-24 苏州科达科技股份有限公司 Multi-target tracking method, device and storage medium
CN112465866A (en) * 2020-11-27 2021-03-09 杭州海康威视数字技术股份有限公司 Multi-target track acquisition method, device, system and storage medium
CN115236603A (en) * 2022-07-26 2022-10-25 西安电子科技大学 Method for processing abnormal track measured by millimeter wave radar based on space-time relation in tunnel
CN115438856A (en) * 2022-09-05 2022-12-06 重庆邮电大学 Pedestrian trajectory prediction method based on space-time interaction characteristics and end point information
CN115690146A (en) * 2021-07-29 2023-02-03 北京图森智途科技有限公司 Multi-target tracking method and device, computing equipment and storage medium
CN116012418A (en) * 2023-02-06 2023-04-25 深圳须弥云图空间科技有限公司 Multi-target tracking method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP6070689B2 (en) * 2012-02-29 2017-02-01 日本電気株式会社 Flow line information generation system, flow line information generation method, and flow line information generation program
JP6822906B2 (en) * 2017-06-23 2021-01-27 株式会社東芝 Transformation matrix calculation device, position estimation device, transformation matrix calculation method and position estimation method


Also Published As

Publication number Publication date
CN116363565A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
US20230077355A1 (en) Tracker assisted image capture
US9767570B2 (en) Systems and methods for computer vision background estimation using foreground-aware statistical models
US11836931B2 (en) Target detection method, apparatus and device for continuous images, and storage medium
CN108875465B (en) Multi-target tracking method, multi-target tracking device and non-volatile storage medium
US9852511B2 (en) Systems and methods for tracking and detecting a target object
CN110853078B (en) On-line multi-target tracking method based on shielding pair
CN108734107B (en) Multi-target tracking method and system based on human face
CN110751674A (en) Multi-target tracking method and corresponding video analysis system
CN109145771B (en) Face snapshot method and device
US20170177947A1 (en) Methods, devices and computer programs for tracking targets using independent tracking modules associated with cameras
CN116363565B (en) Target track determining method and device, electronic equipment and storage medium
US10319095B2 (en) Method, an apparatus and a computer program product for video object segmentation
CN110930434B (en) Target object following method, device, storage medium and computer equipment
CN110335313B (en) Audio acquisition equipment positioning method and device and speaker identification method and system
CN107992790B (en) Target long-time tracking method and system, storage medium and electronic terminal
KR102489113B1 (en) Method and Apparatus for Detecting Objects from High Resolution Image
CN110674886B (en) Video target detection method fusing multi-level features
CN110009662A (en) Method, apparatus, electronic equipment and the computer readable storage medium of face tracking
CN112668524A (en) Multi-target tracking system and method
Stadler et al. BYTEv2: Associating more detection boxes under occlusion for improved multi-person tracking
Jiang et al. Surveillance from above: A detection-and-prediction based multiple target tracking method on aerial videos
KR20140141239A (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
CN116330658B (en) Target tracking method, device and system based on depth image and image pickup equipment
CN109815861B (en) User behavior information statistical method based on face recognition
Stadler et al. Past information aggregation for multi-person tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant