CN115953434A - Track matching method and device, electronic equipment and storage medium

Info

Publication number
CN115953434A
Authority
CN
China
Prior art keywords
target
image
bounding box
sequence
track
Prior art date
Legal status
Granted
Application number
CN202310118712.2A
Other languages
Chinese (zh)
Other versions
CN115953434B (en)
Inventor
路金诚
张伟
谭啸
李莹莹
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310118712.2A
Publication of CN115953434A
Application granted
Publication of CN115953434B
Status
Active

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a track matching method, which relates to artificial intelligence technologies such as computer vision, image processing, and deep learning, and can be applied to scenarios such as autonomous and unmanned driving. The specific implementation scheme is as follows: determining n first tracks from a first image sequence from a first perception device and m second tracks from a second image sequence from a second perception device, the first tracks comprising a first bounding box sequence and a first feature sequence, the second tracks comprising a second bounding box sequence and a second feature sequence; calculating the distance relationship between the n first tracks and the m second tracks according to the first feature sequences and the second feature sequences; determining, according to the distance relationship, a mutual nearest neighbor set for each first bounding box in the n first tracks; and determining, among the m second tracks, the second track matched with each first track according to the mutual nearest neighbor sets. The disclosure also provides a track matching device, an electronic device, and a storage medium.

Description

Track matching method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to the field of computer vision, image processing, and deep learning technology, and is applicable to automatic driving, unmanned driving, and other scenes. More particularly, the present disclosure provides a trajectory matching method, apparatus, electronic device, and storage medium.
Background
The main tasks of multi-target tracking include locating multiple targets simultaneously in a given video, maintaining their respective identities, and recording their respective trajectories. Multi-target tracking is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, aerospace, and autonomous driving.
Continuous multi-target tracking across cameras yields the complete trajectory of a target over the combined fields of view of multiple cameras, and can be used in scenarios such as urban and highway road management and digital twins.
Disclosure of Invention
The disclosure provides a track matching method, a track matching device, track matching equipment and a storage medium.
According to a first aspect, there is provided a trajectory matching method, the method comprising: determining n first tracks from a first image sequence from a first perception device, and m second tracks from a second image sequence from a second perception device, n and m each being an integer greater than 1, the first tracks comprising a first bounding box sequence and a first feature sequence, the second tracks comprising a second bounding box sequence and a second feature sequence; calculating the distance relationship between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence; determining, according to the distance relationship, a mutual nearest neighbor set of each first bounding box in the n first tracks, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes that are mutual nearest neighbors of the first bounding box; and determining, among the m second tracks, a second track matched with the first track according to the mutual nearest neighbor set.
According to a second aspect, there is provided a trajectory matching device, the device comprising: a trajectory determination module for determining n first tracks from a first image sequence from a first perception device, and m second tracks from a second image sequence from a second perception device, n and m each being an integer greater than 1, the first tracks including a first bounding box sequence and a first feature sequence, the second tracks including a second bounding box sequence and a second feature sequence; a calculation module for calculating the distance relationship between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence; a nearest neighbor determining module for determining, according to the distance relationship, a mutual nearest neighbor set of each first bounding box in the n first tracks, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes that are mutual nearest neighbors of the first bounding box; and a track matching module for determining, among the m second tracks, a second track matched with the first track according to the mutual nearest neighbor set.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform methods provided in accordance with the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided in accordance with the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program, when executed by a processor, implementing the method provided according to the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary system architecture to which the trajectory matching method and apparatus may be applied, according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a trajectory matching method according to one embodiment of the present disclosure;
FIG. 3 is a system architecture diagram of a trajectory matching method according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a cross-camera track matching method of bounding box granularity according to one embodiment of the present disclosure;
FIG. 5 is an effect diagram of cross-camera multi-target tracking according to one embodiment of the present disclosure;
FIG. 6 is a block diagram of a trajectory matching device according to one embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device of a trajectory matching method according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, most mainstream multi-target tracking methods are detection-based. A target may be a vehicle, a pedestrian, a robot, or the like.
For example, a single-camera multi-target tracking method obtains a detection result for each frame in a sequence of consecutive frames captured by one camera, where the detection result of a frame includes a bounding box for each of the multiple targets in that frame, the bounding box encoding position and size. Features are extracted from the image patch inside each target's bounding box to obtain each target's features. The features of the targets in every pair of adjacent frames are then compared to match targets across the two frames, and by extension across the whole sequence, yielding a bounding box sequence for each target under the single camera. Each bounding box sequence serves as that target's track, i.e., the single-camera multi-target tracking result.
For multiple consecutive cameras separated by some distance, the multi-target tracking result under each single camera can be obtained first, and track matching can then be performed according to track-level features of the targets (such as the feature sequence corresponding to each bounding box sequence), yielding each target's track across the consecutive cameras, i.e., the cross-camera multi-target tracking result.
For example, track matching based on track-level features may use the average of a track's feature sequence, or the features of key frames in the sequence, as the features of the entire track. However, this approach easily ignores the features of some specific bounding boxes, making the target features less distinctive and easily causing tracking confusion. Especially in complex scenes, or when highly similar targets are present, target tracking is easily confused and poorly robust.
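For illustration, a minimal Python sketch of this track-level baseline follows; the array layout and function name are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def track_level_match(first_feats, second_feats):
    """Prior-art baseline: match tracks via the mean of their feature sequences.

    first_feats, second_feats: lists of (h, d) arrays, one array of per-frame
    ReID features per track. Returns, for each first track, the index of the
    nearest second track under cosine distance of the averaged features.
    """
    # Collapsing each track to one averaged vector is exactly where the
    # features of specific bounding boxes get ignored, as noted above.
    a = np.stack([f.mean(axis=0) for f in first_feats])   # (n, d)
    b = np.stack([f.mean(axis=0) for f in second_feats])  # (m, d)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return (1.0 - a @ b.T).argmin(axis=1)                 # (n,) matched indices
```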
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the trajectory matching method and apparatus may be applied, according to one embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop computers, and the like.
The trajectory matching method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the track matching device provided by the embodiment of the present disclosure may be generally disposed in the server 105.
The image sequences acquired by each of the plurality of cameras in succession may be uploaded by the terminals 101, 102, 103 to the server 105 via the network 104. The server 105 executes the trajectory matching method provided by the embodiment of the disclosure, and then obtains a cross-camera multi-target tracking result. The server 105 can also feed back the cross-camera multi-target tracking result to the terminals 101, 102 and 103 through the network 104, and the terminals 101, 102 and 103 or other devices connected with the terminals 101, 102 and 103 display the motion tracks of the multiple targets under the continuous multiple cameras based on the tracking result.
FIG. 2 is a flow diagram of a trajectory matching method according to one embodiment of the present disclosure.
As shown in fig. 2, the trajectory matching method 200 includes operations S210 to S240.
In operation S210, n first tracks are determined from a first image sequence from a first perception device, and m second tracks are determined from a second image sequence from a second perception device. m and n are integers larger than 1, the first track comprises a first bounding box sequence and a first characteristic sequence, and the second track comprises a second bounding box sequence and a second characteristic sequence.
The first perception device and the second perception device can be two cameras adjacent to each other in geographic positions, the first image sequence is obtained by shooting through the first perception device, and the second image sequence is obtained by shooting through the second perception device.
The n first tracks may be the tracks of n first targets in the first image sequence, each first track including the first bounding box sequence and first feature sequence of one of the n first targets. A first bounding box represents the position and size of a first target, and a first feature represents the appearance information of a first target. The first bounding box sequence and the first feature sequence of each first target correspond to each other.
The m second tracks may be the tracks of m second targets in the second image sequence, each second track including the second bounding box sequence and second feature sequence of one of the m second targets. A second bounding box represents the position and size of a second target, and a second feature represents the appearance information of a second target. The second bounding box sequence and the second feature sequence of each second target correspond to each other.
The first target and the second target may both be vehicles.
In operation S220, distance relationships between the n first tracks and the m second tracks are calculated according to the first feature sequence and the second feature sequence.
For example, the similarity between each first bounding box in the n first tracks and each second bounding box in the m second tracks may be calculated, and the distance relationship between the n first tracks and the m second tracks may be determined according to those similarities.
For example, the n first bounding box sequences of the n first tracks may be represented as a first matrix, where each element in the first matrix represents one first bounding box and carries the first feature corresponding to that bounding box. The m second bounding box sequences of the m second tracks are likewise represented as a second matrix, where each element in the second matrix represents one second bounding box and carries the second feature corresponding to that bounding box.
For example, the similarity calculation is performed on the first feature of each element in the first matrix and the second feature of each element in the second matrix, so as to obtain a similarity matrix. The similarity matrix may represent a distance relationship between the n first tracks and the m second tracks.
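A sketch of this step, assuming the features of all bounding boxes of the first tracks and of the second tracks have been flattened into two arrays (an illustrative layout, not mandated by the disclosure):

```python
import numpy as np

def box_distance_matrix(first_feats, second_feats):
    """Bounding-box-granularity distance relation.

    first_feats:  (N, d) array, one ReID feature per first bounding box
                  (all boxes of the n first tracks, concatenated).
    second_feats: (M, d) array for the second bounding boxes.
    Returns D with D[i, j] = 1 - cos(f_i, f_j), the similarity distance.
    """
    a = first_feats / np.linalg.norm(first_feats, axis=1, keepdims=True)
    b = second_feats / np.linalg.norm(second_feats, axis=1, keepdims=True)
    return 1.0 - a @ b.T
```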
In operation S230, a mutual nearest neighbor set of each first bounding box in the n first tracks is determined according to the distance relationship, where the mutual nearest neighbor set includes a plurality of second bounding boxes that are nearest neighbors to the first bounding box.
For example, for each first bounding box, k (k is an integer greater than 1, e.g., k = 3) second bounding boxes nearest to the first bounding box may be found according to the distance relationship. For each second bounding box, k first bounding boxes nearest to the second bounding box may also be searched according to the distance relationship. Based on the k-nearest neighbors of each first bounding box and the k-nearest neighbors of each second bounding box, a k-mutual nearest neighbor relationship can be determined.
For example, if the first bounding box X belongs to the k-nearest neighbor of the second bounding box Y, and the second bounding box Y belongs to the k-nearest neighbor of the first bounding box X, then the first bounding box X and the second bounding box Y are k-nearest neighbors of each other.
For each first bounding box, all second bounding boxes that are k-nearest neighbors to the first bounding box may be grouped into a k-mutual nearest neighbor set.
In operation S240, a second track, which matches the first track, of the m second tracks is determined according to the mutual nearest neighbor set.
According to the embodiment of the disclosure, for each first bounding box, the mutual nearest neighbor set of the first bounding box includes a plurality of second bounding boxes, some or all of which may belong to the same second track; that second track, composed of those second bounding boxes, may be determined as the second track matched with the first bounding box.
For example, the k-mutual nearest neighbor set of the first bounding box A1 is {second bounding box B1, second bounding box B2, second bounding box C1}. The second bounding box B1 and the second bounding box B2 belong to second track B. Accordingly, second track B can be determined as the second track matching the first bounding box A1.
According to the embodiment of the disclosure, for each first track, every first bounding box in the first track determines a matching second track in this way. The second tracks matched with the first bounding boxes of the first track are combined into a candidate track set, and the second track appearing most often in the candidate track set is determined as the second track matched with the first track.
For example, first track A includes a first bounding box A1, a first bounding box A2, and a first bounding box A3. The k-mutual nearest neighbor set of the first bounding box A1 is {second bounding box B1, second bounding box B2, second bounding box C1}, so the second track matching the first bounding box A1 is second track B, which is added to the candidate track set as a candidate second track.
The k-mutual nearest neighbor set of the first bounding box A2 is {second bounding box C1, second bounding box C2}, so the second track matching the first bounding box A2 is second track C, which is added to the candidate track set as a candidate second track.
The k-mutual nearest neighbor set of the first bounding box A3 is {second bounding box C1, second bounding box C2, second bounding box C3}, so the second track matching the first bounding box A3 is second track C, which is added to the candidate track set as a candidate second track.
Therefore, the candidate track set is {second track B, second track C, second track C}; the most frequent candidate in the set is second track C, so second track C may be determined as the second track matching first track A.
Since first track A and second track C match each other, they can be determined to be tracks of the same target, and the motion track of that target across cameras can then be determined.
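The box-level matching and track-level voting described above can be sketched as follows; the dictionary-based data layout is an assumption made for illustration:

```python
from collections import Counter

def match_tracks(mutual_neighbors, box_to_track, first_tracks):
    """Vote a matching second track for every first track.

    mutual_neighbors: dict mapping a first-box id to the set of second-box
                      ids in its mutual nearest neighbor set.
    box_to_track:     dict mapping a second-box id to its second-track id.
    first_tracks:     dict mapping a first-track id to its list of first-box ids.
    """
    matches = {}
    for track_id, boxes in first_tracks.items():
        candidates = []
        for box in boxes:
            # Box-level match: the second track contributing the most
            # mutual neighbors of this first bounding box.
            tracks = [box_to_track[nb] for nb in mutual_neighbors.get(box, ())]
            if tracks:
                candidates.append(Counter(tracks).most_common(1)[0][0])
        if candidates:
            # Track-level match: majority vote over the box-level candidates.
            matches[track_id] = Counter(candidates).most_common(1)[0][0]
    return matches
```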
Compared with performing track matching according to track-level feature sequences, this embodiment calculates a distance relationship at bounding box granularity between the first tracks and second tracks from different cameras, and performs track matching at bounding box granularity according to that relationship, so the track matching is more accurate.
According to an embodiment of the present disclosure, operation S220 includes: for each bounding box of the n first tracks, calculating a similarity matrix according to the similarities between the features of that bounding box and the features of each bounding box of the m second tracks; and, for each element in the similarity matrix, optimizing the element according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain the optimized similarity matrix as the distance relationship.
For example, the i-th track of the n first tracks may be represented as $T_i = \{B^i_1, B^i_2, \ldots, B^i_{h_i}\}$, $i \in [1, n]$, where $h_i$ denotes the length of the i-th track, $B^i_1$ represents the first bounding box in frame 1 of the i-th track, and $B^i_{h_i}$ represents the first bounding box in the last frame of the i-th track.
Similarly, the j-th track of the m second tracks may be represented as $T'_j = \{B'^j_1, B'^j_2, \ldots, B'^j_{h_j}\}$, $j \in [1, m]$, where $h_j$ denotes the length of the j-th track, $B'^j_1$ represents the second bounding box in frame 1 of the j-th track, and $B'^j_{h_j}$ represents the second bounding box in the last frame of the j-th track.
The similarity between each first bounding box in the n first tracks and each second bounding box in the m second tracks is calculated, giving a similarity matrix $D$ whose elements are
$$D_{I,J} = 1 - \cos(f_I, f_J)$$
where $\cos(f_I, f_J)$ denotes the cosine similarity between the features $f_I$ and $f_J$ of two bounding boxes $I$ and $J$, so that $1 - \cos(f_I, f_J)$ represents the similarity distance between the two bounding boxes. The element $D_{I,J}$ thus denotes the similarity distance between bounding box $I$ and bounding box $J$.
For each element $D_{I,J}$ in the similarity matrix $D$, the element is optimized according to the respective occlusion ratios of bounding box $I$ and bounding box $J$. The element may be optimized based on the following equation (1), given here in a form consistent with the surrounding description (the exact expression is rendered as an image in the original):
$$\widetilde{D}_{I,J} = \begin{cases} 1 - (1 + \alpha_o)\,(1 - D_{I,J}), & r_o > r_{thre} \\ D_{I,J}, & r_o \le r_{thre} \end{cases} \qquad (1)$$
where $D_{I,J}$ represents an element in the similarity matrix, $I$ and $J$ represent the two bounding boxes corresponding to the element, $r_o$ represents the maximum of the occlusion ratio of bounding box $I$ and the occlusion ratio of bounding box $J$, and $r_{thre}$ and $\alpha_o$ are hyperparameters.
When $r_o$ is greater than $r_{thre}$, that is, when the larger of the two occlusion ratios exceeds a threshold (for example, 50%), a large portion of the bounding box is occluded by another vehicle, so the similarity between the two bounding boxes is unreliable and can be suppressed. For example, with $\alpha_o < 0$, the adjustment term $\alpha_o\,(1 - D_{I,J})$ decreases monotonically in the similarity, so the greater the similarity, the more it is suppressed.
The optimized similarity matrix may be used as the distance matrix between the n first tracks and the m second tracks, that is, the distance relationship.
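A sketch of the occlusion-aware optimization, using the reconstructed form of equation (1) above; since the exact expression in the source is an image, this should be read as one plausible implementation rather than the patented formula:

```python
import numpy as np

def suppress_occluded(D, occ_first, occ_second, r_thre=0.5, alpha_o=-0.5):
    """Occlusion-aware adjustment of the similarity-distance matrix D.

    occ_first:  (N,) array of occlusion ratios of the first bounding boxes.
    occ_second: (M,) array of occlusion ratios of the second bounding boxes.
    Where the larger of the two boxes' occlusion ratios exceeds r_thre, the
    similarity 1 - D is scaled by (1 + alpha_o) with alpha_o < 0, so the
    greater the similarity, the more it is suppressed.
    """
    r_o = np.maximum(occ_first[:, None], occ_second[None, :])  # (N, M)
    suppressed = 1.0 - (1.0 + alpha_o) * (1.0 - D)
    return np.where(r_o > r_thre, suppressed, D)
```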
The track matching method provided by the embodiment of the present disclosure is described in detail below with reference to fig. 3.
FIG. 3 is a system architecture diagram of a trajectory matching method according to one embodiment of the present disclosure.
As shown in fig. 3, the system architecture diagram of the present embodiment includes a target detection module 310, a feature extraction module 320, a single-camera tracking module 330, and a cross-camera association module 340. Image sequences 301 to 304 come from a series of consecutive cameras, for example image sequence 301 from camera A, image sequence 302 from camera B, image sequence 303 from camera C, and image sequence 304 from camera D, where cameras A to D are geographically consecutive (mutually adjacent) cameras.
The image sequence 301 is input into the target detection module 310 to obtain bounding boxes of the targets in the image sequence 301. The target detection module 310 may be implemented by a convolutional neural network whose network structure is PP-YOLOE (a PaddlePaddle YOLO detector obtained by improving on the PP-YOLO series of models); the model takes a single image as input and outputs the position, category, and confidence score of each target's bounding box in the image.
The feature extraction module 320 inputs the image patch enclosed by each target's bounding box, as output by the target detection module 310, into a convolutional neural network to obtain the target's ReID (Re-identification) feature 321. Unlike single-dimension features such as color or shape, the ReID feature 321 represents the overall appearance information of the target. For a particular target, its ReID feature can be used to determine whether the target appears in other images. The backbone network of the feature extraction module 320 may be a High-Resolution Network (HRNet).
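For illustration, a sketch of this crop-then-embed step; `embed_model` stands in for the HRNet-backbone ReID network, and its callable interface is an assumption:

```python
import numpy as np

def extract_reid_features(image, boxes, embed_model):
    """Crop each detected bounding box and embed it with a ReID network.

    image:       (H, W, 3) frame from the camera.
    boxes:       iterable of (x1, y1, x2, y2) integer boxes from the detector.
    embed_model: callable mapping a crop to a feature vector; stands in for
                 the HRNet-backbone ReID model (interface is an assumption).
    """
    feats = []
    for x1, y1, x2, y2 in boxes:
        crop = image[y1:y2, x1:x2]            # the small image inside the box
        f = np.asarray(embed_model(crop), dtype=float)
        feats.append(f / (np.linalg.norm(f) + 1e-12))  # L2-normalize
    return np.stack(feats) if feats else np.empty((0, 0))
```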
The single-camera tracking module 330 may be a modification of DeepSORT (an extension of the earlier SORT tracker). Its input is an image together with the target detection result for that image, which includes the bounding box information output by the target detection module 310 and the ReID features 321 output by the feature extraction module 320. Its output is a tracking identifier (tracking ID) for each target under the current camera.
For example, the target detection result of the current image is obtained, comprising the bounding box and ReID feature of each target in the current image. The target tracking result of the previous image is also obtained, comprising the bounding box, ReID feature, and tracking ID of each target in the previous image. Targets in the current image are association-matched with targets in the previous image according to the ReID features to obtain successfully matched target pairs; a successfully matched pair indicates that the two detections are the same target. For each successfully matched pair, the tracking ID of the target from the previous image is assigned to the target from the current image, so that the same target keeps the same tracking ID.
A target in the current image that is not successfully matched may belong to the first image in the image sequence 301, or may be a new target that has just entered the camera's shooting range. For such a target, a new tracking ID is created.
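A greatly simplified Python sketch of this association step follows; the real module is DeepSORT-based and also uses bounding-box motion cues, so the greedy feature-only matching and the function interface below are illustrative assumptions:

```python
import numpy as np
from itertools import count

_next_id = count()  # global source of fresh tracking IDs

def associate(prev_feats, prev_ids, cur_feats, max_dist=0.4):
    """Greedy frame-to-frame association on L2-normalized ReID features.

    prev_feats: (P, d) features of tracked targets in the previous image.
    prev_ids:   list of P tracking IDs for those targets.
    cur_feats:  (C, d) features of detections in the current image.
    Returns a list of C tracking IDs: reused when a detection matches a
    previous target, newly created otherwise.
    """
    if len(prev_ids) == 0:
        return [next(_next_id) for _ in range(len(cur_feats))]
    D = 1.0 - cur_feats @ prev_feats.T        # cosine distances, (C, P)
    ids, used = [], set()
    for i in range(len(cur_feats)):
        j = int(np.argmin(D[i]))
        if D[i, j] < max_dist and j not in used:
            ids.append(prev_ids[j])           # same target keeps its ID
            used.add(j)
        else:
            ids.append(next(_next_id))        # unmatched: create a new ID
    return ids
```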
The target detection results of the images in the image sequence 301 are continuously input into the single-camera tracking module 330 according to the image sequence, so that the target tracking result of each image can be obtained, and further the single-camera multi-target tracking result of the image sequence 301 can be obtained. The bounding box sequence and the ReID feature sequence of the target with the same tracking ID in the image sequence 301 serve as the trajectory of the target, and therefore, the single-camera multi-target tracking result of the image sequence 301 includes the respective trajectories of the multiple targets.
Similarly, single-camera multi-target tracking results of the image sequence 302 to the image sequence 304 can also be obtained.
The input of the cross-camera association module 340 is the single-camera multi-target tracking result of each of the continuous image sequences, and the multi-target tracks in the adjacent image sequences are matched to obtain the cross-camera multi-target tracking result.
According to the embodiment of the disclosure, after the cross-camera multi-target tracking result is obtained, determining a first target and a second target respectively corresponding to a first track and a second track which are determined to be matched with each other as the same target; and assigning the same global identification to the first target and the second target which are determined to be the same target.
For example, if the trajectory 341 matches the trajectory 342, it may be determined that the target corresponding to the trajectory 341 is the same target as the target corresponding to the trajectory 342. A global identification (global ID) may be created for the same target so that the target has the same numeric ID under consecutive cameras, not easily confused with other targets.
The cross-camera association module 340 matches multi-target tracks in adjacent image sequences, specifically using a cross-camera track matching method at bounding box granularity. The cross-camera track matching method at bounding box granularity provided by the present disclosure is described below with reference to fig. 4.
Fig. 4 is a schematic diagram of a cross-camera track matching method of bounding box granularity according to one embodiment of the present disclosure.
As shown in fig. 4, the cross-camera setting of the present embodiment may include an upstream camera and a downstream camera that are geographically adjacent.
The video 410 may be from an upstream camera, and a plurality of (e.g., 3) bounding box sequences 411 are obtained by framing the video 410 into an image sequence and performing object detection, where each bounding box sequence corresponds to an object in the video 410. Feature sequences corresponding to the bounding box sequences are obtained by performing feature extraction on the bounding box sequences 411, and the plurality of bounding box sequences 411 and the corresponding feature sequences form a plurality of upstream tracks 412.
For example, the plurality of upstream tracks 412 includes upstream track $B_1$, upstream track $B_2$, and upstream track $B_3$. By padding zeros at featureless frame positions, upstream tracks $B_1$, $B_2$, and $B_3$ are processed to a uniform length and form the first matrix 413.
The video 420 may come from a downstream camera, and a plurality of (e.g., 3) bounding box sequences 421 are obtained by framing the video 420 into an image sequence and performing object detection, where each bounding box sequence corresponds to an object in the video 420. Feature extraction is performed on the bounding box sequence 421 to obtain a feature sequence corresponding to the bounding box sequence, and the plurality of bounding box sequences 421 and the corresponding feature sequences form a plurality of downstream tracks 422.
For example, the plurality of downstream tracks 422 includes downstream track $B'_1$, downstream track $B'_2$, and downstream track $B'_3$. Downstream tracks $B'_1$, $B'_2$, and $B'_3$ are likewise processed to a uniform length and form the second matrix 423.
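A sketch of how variable-length tracks can be padded with zeros at featureless frame positions and stacked into the first matrix 413 or second matrix 423; the per-track dictionary layout is an assumption:

```python
import numpy as np

def tracks_to_matrix(tracks, num_frames, feat_dim):
    """Stack variable-length tracks into a (num_tracks, num_frames, feat_dim)
    array, padding featureless frame positions with zeros.

    tracks: list of per-track dicts mapping frame_index -> feature vector.
    """
    mat = np.zeros((len(tracks), num_frames, feat_dim))
    for t, track in enumerate(tracks):
        for frame, feat in track.items():
            mat[t, frame] = feat              # frames absent from the dict stay 0
    return mat
```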
Similarity between each element in the first matrix 413 and each element in the second matrix 423 is calculated, and a similarity matrix is obtained. And performing similarity matrix optimization through the formula (1) to obtain a distance matrix 430 of the bounding box granularity, wherein elements in the distance matrix 430 represent similarity distances between two bounding boxes.
Next, a bounding box-granularity trajectory matching is performed using a k-mutual neighbor matching algorithm based on the distance matrix 430.
The k-nearest neighbor relationship can be defined by the following equation (2):
$$N(B^i_h, k) = \{g_1, g_2, \ldots, g_k\} \qquad (2)$$
where $B^i_h$ represents the bounding box of the h-th frame in the i-th track, and $N(B^i_h, k)$ represents the set of the k bounding boxes nearest to $B^i_h$ under the distance matrix 430.
The k-mutual nearest neighbor relationship can be defined by the following equation (3):
$$R(B^i_h, k) = \{\,g \mid g \in N(B^i_h, k) \wedge B^i_h \in N(g, k)\,\} \qquad (3)$$
where $B^i_h$ represents the bounding box of the h-th frame in the i-th track, and $R(B^i_h, k)$ represents the k-mutual nearest neighbor set of $B^i_h$.
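Equations (2) and (3) can be sketched directly on top of the distance matrix 430; row-wise argsort gives the forward lookup and column-wise argsort the reverse lookup:

```python
import numpy as np

def k_mutual_neighbors(D, k=3):
    """k-mutual nearest neighbor sets per equations (2) and (3).

    D[i, j] is the distance between upstream box i and downstream box j.
    Returns, for each upstream box i, the set of downstream boxes j such
    that j is among the k nearest of i AND i is among the k nearest of j.
    """
    knn_down = np.argsort(D, axis=1)[:, :k]   # forward lookup per upstream box
    knn_up = np.argsort(D, axis=0)[:k, :].T   # reverse lookup per downstream box
    return [
        {int(j) for j in knn_down[i] if i in knn_up[j]}
        for i in range(D.shape[0])
    ]
```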
For convenience of description, the bounding boxes in the upstream tracks will be referred to as upstream bounding boxes, and the bounding boxes in the downstream tracks will be referred to as downstream bounding boxes.
For each upstream bounding box, the k (e.g., k = 3) downstream bounding boxes nearest to it may be found. For example, for each upstream bounding box in upstream track $B_1$, the k nearest downstream bounding boxes are found according to the distance matrix 430, giving the nearest neighbor relation set 441 of upstream track $B_1$. Similarly, the nearest neighbor relation set 442 of upstream track $B_2$ and the nearest neighbor relation set 443 of upstream track $B_3$ can be obtained.
For each downstream bounding box, the k (e.g., k = 3) upstream bounding boxes nearest to it may be looked up in reverse. For example, sets 451 to 453 are the reverse-lookup nearest neighbor sets corresponding to sets 441 to 443, respectively.
For example, in the nearest neighbor relation set 441 of upstream track $B_1$, suppose the k-nearest neighbor set of the upstream bounding box $B^1_1$ is $\{B'^1_1, B'^1_2, B'^2_1\}$, where $B'^1_1$ and $B'^1_2$ both lie on downstream track $B'_1$. Looking up the k-nearest neighbors of $B'^1_1$ and $B'^1_2$ in reverse through the distance matrix 430, the k-nearest neighbor set of each of them contains $B^1_1$ (in set 451, the gray boxes corresponding to $B'^1_1$ and $B'^1_2$ each contain $B^1_1$). Accordingly, $B'^1_1$ and $B'^1_2$ belong to the k-mutual nearest neighbor set of $B^1_1$. Since $B'^1_1$ and $B'^1_2$ belong to downstream track $B'_1$, downstream track $B'_1$ matches the upstream bounding box $B^1_1$, and downstream track $B'_1$ can be taken as a candidate track for upstream track $B_1$.
Similarly, in the nearest neighbor relation set 441, suppose the k-nearest neighbor set of the upstream bounding box $B^1_2$ is $\{B'^1_2, B'^1_3, B'^2_2\}$, where $B'^1_2$ and $B'^1_3$ belong to downstream track $B'_1$. Looking up the k-nearest neighbors of $B'^1_2$ and $B'^1_3$ in reverse through the distance matrix 430, the k-nearest neighbor set of each of them contains $B^1_2$, so $B'^1_2$ and $B'^1_3$ belong to the k-mutual nearest neighbor set of $B^1_2$. Since $B'^1_2$ and $B'^1_3$ belong to downstream track $B'_1$, downstream track $B'_1$ matches the upstream bounding box $B^1_2$ and can again be taken as a candidate track for upstream track $B_1$.
Similarly, for the upstream bounding box $B^1_3$ of upstream track $B_1$, it may be determined that, say, downstream track $B'_2$ matches $B^1_3$, so downstream track $B'_2$ can be taken as a candidate track for upstream track $B_1$.
Thus, for upstream track $B_1$, the candidate track set is $\{B'_1, B'_1, B'_2\}$. By voting, the most frequent downstream track in the candidate track set, $B'_1$, is taken as the downstream track matched with upstream track $B_1$; that is, upstream track $B_1$ matches downstream track $B'_1$.
Similarly, it may be determined that upstream track $B_2$ matches downstream track $B'_2$, and upstream track $B_3$ matches downstream track $B'_3$.
In the embodiment, the distance matrix of the bounding box granularity is calculated, and the trajectory matching of the bounding box granularity is performed by using a k-mutual nearest neighbor matching algorithm, so that the trajectory matching is more accurate.
FIG. 5 is an effect diagram of cross-camera multi-target tracking according to one embodiment of the present disclosure.
As shown in fig. 5, the targets include a vehicle with ID 65 and a vehicle with ID 67. The ID of the vehicle is a global ID created for each target according to the matching method provided by the present disclosure. Since the same global ID is given to the same object, the vehicles have the same digital identification under a plurality of continuous cameras.
For example, the vehicle with ID 65 is displayed in the frames captured by camera A, camera B, and camera C in the form of a bounding box and the ID (65). Similarly, the vehicle with ID 67 is displayed in the frames captured by camera A, camera B, and camera C in the form of a bounding box and the ID (67).
According to the embodiment, targets are tracked across cameras and displayed in the form of bounding boxes and IDs, making the tracking clearer and less prone to confusion.
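For illustration, a minimal OpenCV sketch of this display step (the drawing style is an assumption):

```python
import cv2

def draw_tracking(frame, boxes, global_ids):
    """Render targets as bounding boxes plus global IDs, so a vehicle keeps
    the same displayed number (e.g., 65) under cameras A, B, and C."""
    for (x1, y1, x2, y2), gid in zip(boxes, global_ids):
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, str(gid), (x1, max(y1 - 5, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```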
FIG. 6 is a block diagram of a trajectory matching device according to one embodiment of the present disclosure.
As shown in fig. 6, the trajectory matching apparatus 600 includes a trajectory determination module 601, a calculation module 602, a nearest neighbor determination module 603, and a trajectory matching module 604.
The trajectory determination module 601 is configured to determine n first tracks from a first image sequence from a first perception device, and m second tracks from a second image sequence from a second perception device, where n and m are integers greater than 1, the first tracks include a first bounding box sequence and a first feature sequence, and the second tracks include a second bounding box sequence and a second feature sequence.
The calculating module 602 is configured to calculate distance relationships between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence.
The nearest neighbor determining module 603 is configured to determine, according to the distance relationship, a mutual nearest neighbor set of each first bounding box in the n first tracks, where the mutual nearest neighbor set includes a plurality of second bounding boxes that are nearest neighbors to the first bounding box.
The track matching module 604 is configured to determine a second track, which is matched with the first track, from the m second tracks according to the mutual nearest neighbor set.
According to an embodiment of the present disclosure, the bounding boxes in the first sequence of bounding boxes correspond to features in the first sequence of features. The calculation module 602 includes a calculation unit and an optimization unit.
The calculation unit is used for calculating, for each bounding box in the n first tracks, a similarity matrix according to the similarities between the features of that bounding box and the features of each bounding box in the m second tracks.
The optimization unit is used for optimizing each element in the similarity matrix according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain the optimized similarity matrix as the distance relationship.
The optimization unit optimizes the elements according to equation (1) above, where $D_{I,J}$ represents an element in the similarity matrix, $I$ and $J$ represent the two bounding boxes corresponding to the element, $r_o$ represents the maximum of the occlusion ratio of bounding box $I$ and the occlusion ratio of bounding box $J$, and $r_{thre}$ and $\alpha_o$ are hyperparameters.
Nearest neighbor determination module 603 includes a first nearest neighbor determination unit, a second nearest neighbor determination unit, a mutual nearest neighbor determination unit, and a combination unit.
The first nearest neighbor determining unit is used for determining, according to the distance relationship, the k second bounding boxes nearest to each first bounding box, where k is an integer greater than 1.
The second nearest neighbor determining unit is used for determining, according to the distance relationship, the k first bounding boxes nearest to each second bounding box.
The mutual nearest neighbor determining unit is used for determining that a specific first bounding box and a specific second bounding box are mutual nearest neighbors, in response to the specific first bounding box belonging to the nearest neighbors of the specific second bounding box and the specific second bounding box belonging to the nearest neighbors of the specific first bounding box.
The combining unit is used for combining, for each first bounding box, all second bounding boxes that are mutual nearest neighbors with the first bounding box into a mutual nearest neighbor set.
The trajectory matching module 604 includes a trajectory matching unit and a filtering unit.
The track matching unit is used for determining, for each first bounding box, the second track formed by a plurality of second bounding boxes in the mutual nearest neighbor set of the first bounding box as the second track matched with the first bounding box.
The screening unit is used for combining, for each first track, the second tracks matched with each first bounding box in the first track into a candidate track set, and determining the second track appearing most often in the candidate track set as the second track matched with the first track.
According to an embodiment of the present disclosure, the first trajectory is a trajectory of a first target and the second trajectory is a trajectory of a second target. The trajectory matching device 600 further includes a targeting module and an identification module.
The target determination module is used for determining a first target and a second target respectively corresponding to the first track and the second track which are determined to be matched with each other as the same target.
The identification module is used for endowing the same global identification to the first target and the second target which are determined to be the same target.
The track determining module 601 includes a first detection result obtaining unit, a first tracking result obtaining unit, a first target pair matching unit, a first identification unit, and a first track determining unit.
The first detection result obtaining unit is configured to obtain a first target detection result of a current first image in the first image sequence, where the first target detection result includes a first bounding box and a first feature of at least one first target in the current first image.
The first tracking result acquiring unit is used for acquiring a first target tracking result of a previous first image of the current first image, wherein the first target tracking result comprises a first bounding box, a first feature, and a first tracking identifier of each first target in the previous first image.
The first target pair matching unit is used for matching at least one first target in the current first image with at least one first target in the previous first image according to the first bounding boxes and the first features, to obtain a successfully matched first target pair.
The first identification unit is used for assigning, for each successfully matched first target pair, the first tracking identifier of the first target from the previous first image to the first target from the current first image.
The first track determining unit is used for determining a first surrounding frame sequence and a first feature sequence of a first target with the same first tracking identifier in the first image sequence as a first track.
The trajectory determination module 601 further includes a second detection result acquisition unit, a second tracking result acquisition unit, a second target pair matching unit, a second identification unit, and a second trajectory determination unit.
The second detection result obtaining unit is configured to obtain a second target detection result of a current second image in the second image sequence, where the second target detection result includes a second bounding box and a second feature of at least one second target in the current second image.
The second tracking result obtaining unit is configured to obtain a second target tracking result of a previous second image of the current second image, where the second target tracking result includes a second bounding box, a second feature, and a second tracking identifier of each of at least one second target in the previous second image.
The second target pair matching unit is used for matching at least one second target in the current second image with at least one second target in the previous second image according to the second bounding boxes and the second features, to obtain a successfully matched second target pair.
The second identification unit is used for assigning a second tracking identification of a second target from a previous second image in the second target pair to the second target from the current second image aiming at the second target pair successfully matched.
The second track determining unit is used for determining a second bounding box sequence and a second feature sequence of a second target with the same second tracking identifier in the second image sequence as a second track.
The trajectory matching device 600 further comprises a first creating module and a second creating module.
The first creating module is used for creating a first tracking identifier for a first target that is not successfully matched in the current first image.
The second creating module is used for creating a second tracking identifier for a second target that is not successfully matched in the current second image.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A number of components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of various general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the respective methods and processes described above, such as the trajectory matching method. For example, in some embodiments, the trajectory matching method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the trajectory matching method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the trajectory matching method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; this is not limited herein, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A trajectory matching method, comprising:
determining n first tracks from a first sequence of images from a first perceiving device and m second tracks from a second sequence of images from a second perceiving device, n and m each being an integer greater than 1, the first tracks including a first sequence of bounding boxes and a first sequence of features, the second tracks including a second sequence of bounding boxes and a second sequence of features;
calculating a distance relation between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence;
determining, according to the distance relation, a mutual nearest neighbor set of each first bounding box in the n first tracks, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes that are mutually nearest neighbors with the first bounding box; and
determining, from the m second tracks according to the mutual nearest neighbor set, a second track that matches the first track.
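For illustration only (not part of the claims), the following is a minimal Python sketch of the distance relation recited in claim 1, assuming cosine similarity between appearance features; the claims do not fix a particular similarity measure, and all function and variable names here are hypothetical.

```python
import numpy as np

def distance_relation(first_feats, second_feats):
    """Pairwise similarity between every bounding-box feature of the n first
    tracks and every bounding-box feature of the m second tracks.

    first_feats:  (P, d) array of box features gathered from the first tracks
    second_feats: (Q, d) array of box features gathered from the second tracks
    returns:      (P, Q) similarity matrix (higher means closer)
    """
    a = first_feats / (np.linalg.norm(first_feats, axis=1, keepdims=True) + 1e-12)
    b = second_feats / (np.linalg.norm(second_feats, axis=1, keepdims=True) + 1e-12)
    return a @ b.T  # cosine similarity of L2-normalized features
```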
2. The method of claim 1, wherein a bounding box in the first sequence of bounding boxes corresponds to a feature in the first sequence of features; and the calculating a distance relation between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence comprises:
for each bounding box in the n first tracks, calculating a similarity matrix according to similarities between the feature of the bounding box and the feature of each bounding box in the m second tracks; and
for each element in the similarity matrix, optimizing the element according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain an optimized similarity matrix as the distance relation.
3. The method according to claim 2, wherein the optimizing the element according to the respective occlusion ratios of the two bounding boxes corresponding to the element comprises:
optimizing the element according to the following formula:

[formula image FDA0004083233870000011, not reproduced in this text]

wherein the symbol shown in formula image FDA0004083233870000012 represents an element in the similarity matrix, I and J represent the two bounding boxes corresponding to the element, r_o represents the maximum of the occlusion ratio of bounding box I and the occlusion ratio of bounding box J, and r_thre and α_o are hyper-parameters.
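Since the formula image is not reproduced in this text, its exact form is unknown. Purely as an illustration, one plausible reading of the claim wording is to suppress a similarity element when the larger of the two occlusion ratios crosses a threshold; the function below is an assumption, not the patented formula, and its default values are arbitrary.

```python
def optimize_element(c_ij, r_i, r_j, r_thre=0.5, alpha_o=0.5):
    """c_ij: raw similarity of bounding boxes I and J.
    r_i, r_j: occlusion ratios of I and J.
    r_thre, alpha_o: hyper-parameters (illustrative defaults only)."""
    r_o = max(r_i, r_j)        # maximum occlusion ratio of the two boxes
    if r_o >= r_thre:          # heavily occluded pair: appearance unreliable
        return alpha_o * c_ij  # down-weight the similarity element
    return c_ij                # lightly occluded pair: keep raw similarity
```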
4. The method of claim 1, wherein the determining a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relation comprises:
for each first bounding box, determining k second bounding boxes nearest to the first bounding box according to the distance relation, wherein k is an integer greater than 1;
for each second bounding box, determining k first bounding boxes nearest to the second bounding box according to the distance relation;
in response to a particular first bounding box being among the nearest neighbors of a particular second bounding box and the particular second bounding box being among the nearest neighbors of the particular first bounding box, determining that the particular first bounding box and the particular second bounding box are mutually nearest neighbors; and
for each first bounding box, combining all second bounding boxes that are mutually nearest neighbors with the first bounding box into the mutual nearest neighbor set.
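An illustrative sketch of the mutual nearest-neighbor construction of claim 4 (not part of the claims); the function name and the default k are assumptions.

```python
import numpy as np

def mutual_nearest_neighbors(sim, k=5):
    """sim: (P, Q) similarity matrix between first boxes and second boxes.
    Returns a dict mapping each first-box index p to the set of second-box
    indices q such that p and q are each among the other's k nearest."""
    nn_of_first = np.argsort(-sim, axis=1)[:, :k]      # k nearest q per p
    nn_of_second = np.argsort(-sim, axis=0)[:k, :].T   # k nearest p per q
    mnn = {p: set() for p in range(sim.shape[0])}
    for p in range(sim.shape[0]):
        for q in nn_of_first[p]:
            if p in nn_of_second[q]:                   # mutual condition
                mnn[p].add(int(q))
    return mnn
```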
5. The method of claim 1 or 4, wherein the determining, from the m second tracks according to the mutual nearest neighbor set, a second track that matches the first track comprises:
for each first bounding box, determining a second track formed by a plurality of second bounding boxes in the mutual nearest neighbor set of the first bounding box as a second track matched with the first bounding box; and
for each first track, combining the second tracks matched with the respective first bounding boxes in the first track into a candidate track set, and determining the second track occurring most frequently in the candidate track set as the second track matched with the first track.
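An illustrative sketch of the voting step of claim 5 (not part of the claims). It flattens the per-box track matching of claim 5 into a single vote over the mutual nearest-neighbor sets, which is a simplification; all names are hypothetical.

```python
from collections import Counter

def vote_track_match(mnn_sets, box_to_second_track):
    """mnn_sets: one set of second-box indices per first bounding box of a
    first track; box_to_second_track: maps a second-box index to the id of
    the second track containing it. Returns the winning track id, or None."""
    votes = Counter()
    for box_set in mnn_sets:
        for q in box_set:
            votes[box_to_second_track[q]] += 1  # builds the candidate set
    return votes.most_common(1)[0][0] if votes else None
```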
6. The method of claim 1 or 5, wherein the first track is a track of a first target and the second track is a track of a second target; the method further comprising:
determining the first target and the second target respectively corresponding to a first track and a second track that are determined to match each other as the same target; and
assigning a same global identifier to the first target and the second target that are determined to be the same target.
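An illustrative sketch of claim 6 (not part of the claims): targets on both devices whose tracks match receive one shared global identifier. The counter-based identifier scheme is an assumption.

```python
import itertools

_global_ids = itertools.count()  # illustrative source of fresh global ids

def assign_global_ids(matched_pairs):
    """matched_pairs: iterable of (first_target_key, second_target_key)
    tuples for tracks determined to match. Returns key -> global id."""
    table = {}
    for first_key, second_key in matched_pairs:
        gid = next(_global_ids)   # one shared id: the same physical target
        table[first_key] = gid
        table[second_key] = gid
    return table
```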
7. The method of claim 1, wherein the determining n first tracks from the first sequence of images from the first perceiving device comprises:
acquiring a first target detection result of a current first image in the first image sequence, wherein the first target detection result comprises a first bounding box and a first feature of at least one first target in the current first image;
acquiring a first target tracking result of a previous first image of the current first image, wherein the first target tracking result comprises a first bounding box, a first feature, and a first tracking identifier of at least one first target in the previous first image;
matching, according to the first bounding boxes and the first features, the at least one first target in the current first image with the at least one first target in the previous first image to obtain a successfully matched first target pair;
for the successfully matched first target pair, assigning the first tracking identifier of the first target from the previous first image to the first target from the current first image; and
determining a first bounding box sequence and a first feature sequence of first targets having a same first tracking identifier in the first image sequence as a first track.
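Claims 7 and 8 (claim 8 follows below) describe the per-device, frame-to-frame association. The sketch here assumes a Hungarian assignment over a combined IoU and appearance cost; the claims only require that targets be matched, so the solver, the cost weights, and the threshold are all assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(prev, curr, next_id, cost_thre=0.7):
    """prev/curr: lists of dicts {'box', 'feat', 'id'} ('id' set on prev).
    Matched current detections inherit the previous tracking identifier
    (claim 7); unmatched ones receive a newly created one (claim 8)."""
    cost = np.zeros((len(curr), len(prev)))  # every entry filled below
    for i, c in enumerate(curr):
        for j, p in enumerate(prev):
            app = float(np.dot(c['feat'], p['feat']))  # unit-norm features
            cost[i, j] = 1.0 - 0.5 * (iou(c['box'], p['box']) + app)
    rows, cols = linear_sum_assignment(cost) if len(prev) else ([], [])
    matched = set()
    for i, j in zip(rows, cols):
        if cost[i, j] < cost_thre:       # accept only plausible pairs
            curr[i]['id'] = prev[j]['id']
            matched.add(i)
    for i, c in enumerate(curr):
        if i not in matched:
            c['id'] = next_id()          # fresh tracking identifier
    return curr
```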
8. The method of claim 7, further comprising:
creating a first tracking identifier for a first target that is not successfully matched in the current first image.
9. The method of claim 1, wherein the determining m second tracks from the second sequence of images from the second perceiving device comprises:
acquiring a second target detection result of a current second image in the second image sequence, wherein the second target detection result comprises a second bounding box and a second feature of at least one second target in the current second image;
acquiring a second target tracking result of a previous second image of the current second image, wherein the second target tracking result comprises a second bounding box, a second feature, and a second tracking identifier of at least one second target in the previous second image;
matching, according to the second bounding boxes and the second features, the at least one second target in the current second image with the at least one second target in the previous second image to obtain a successfully matched second target pair;
for the successfully matched second target pair, assigning the second tracking identifier of the second target from the previous second image to the second target from the current second image; and
determining a second bounding box sequence and a second feature sequence of second targets having a same second tracking identifier in the second image sequence as a second track.
10. The method of claim 9, further comprising:
creating a second tracking identifier for a second target that is not successfully matched in the current second image.
11. A trajectory matching apparatus, comprising:
a trajectory determination module, configured to determine n first tracks from a first sequence of images from a first perceiving device and m second tracks from a second sequence of images from a second perceiving device, n and m each being an integer greater than 1, the first tracks including a first sequence of bounding boxes and a first sequence of features, and the second tracks including a second sequence of bounding boxes and a second sequence of features;
a calculating module, configured to calculate, according to the first feature sequence and the second feature sequence, a distance relation between the n first tracks and the m second tracks;
a nearest neighbor determining module, configured to determine, according to the distance relation, a mutual nearest neighbor set of each first bounding box in the n first tracks, wherein the mutual nearest neighbor set includes a plurality of second bounding boxes that are mutually nearest neighbors with the first bounding box; and
a track matching module, configured to determine, from the m second tracks according to the mutual nearest neighbor set, a second track that matches the first track.
12. The apparatus of claim 11, wherein a bounding box in the first sequence of bounding boxes corresponds to a feature in the first sequence of features; and the calculating module comprises:
a calculating unit, configured to calculate, for each bounding box in the n first tracks, a similarity matrix according to similarities between the feature of the bounding box and the feature of each bounding box in the m second tracks; and
an optimization unit, configured to optimize, for each element in the similarity matrix, the element according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain an optimized similarity matrix as the distance relation.
13. The apparatus of claim 12, wherein the optimization unit is configured to optimize the element according to the following formula:

[formula image FDA0004083233870000041, not reproduced in this text]

wherein the symbol shown in formula image FDA0004083233870000051 represents an element in the similarity matrix, I and J represent the two bounding boxes corresponding to the element, r_o represents the maximum of the occlusion ratio of bounding box I and the occlusion ratio of bounding box J, and r_thre and α_o are hyper-parameters.
14. The apparatus of claim 11, wherein the nearest neighbor determining module comprises:
a first nearest neighbor determining unit, configured to determine, for each first bounding box, k second bounding boxes nearest to the first bounding box according to the distance relation, wherein k is an integer greater than 1;
a second nearest neighbor determining unit, configured to determine, for each second bounding box, k first bounding boxes nearest to the second bounding box according to the distance relation;
a mutual nearest neighbor determining unit, configured to determine, in response to a specific first bounding box being among the nearest neighbors of a specific second bounding box and the specific second bounding box being among the nearest neighbors of the specific first bounding box, that the specific first bounding box and the specific second bounding box are mutually nearest neighbors; and
a combining unit, configured to combine, for each first bounding box, all second bounding boxes that are mutually nearest neighbors with the first bounding box into the mutual nearest neighbor set.
15. The apparatus of claim 11 or 14, wherein the track matching module comprises:
a track matching unit, configured to determine, for each first bounding box, a second track composed of a plurality of second bounding boxes in a mutually nearest neighbor set of the first bounding box as a second track matching the first bounding box;
a screening unit, configured to combine, for each first track, the second tracks matched with the respective first bounding boxes in the first track into a candidate track set, and to determine the second track occurring most frequently in the candidate track set as the second track matched with the first track.
16. The apparatus of claim 11 or 15, wherein the first track is a track of a first target and the second track is a track of a second target; and the apparatus further comprises:
a target determining module, configured to determine the first target and the second target respectively corresponding to a first track and a second track that are determined to match each other as the same target; and
an identification module, configured to assign a same global identifier to the first target and the second target that are determined to be the same target.
17. The apparatus of claim 11, wherein the trajectory determination module comprises:
a first detection result obtaining unit, configured to obtain a first target detection result of a current first image in the first image sequence, where the first target detection result includes a first bounding box and a first feature of at least one first target in the current first image;
a first tracking result obtaining unit, configured to obtain a first target tracking result of a previous first image of the current first image, where the first target tracking result includes a first bounding box, a first feature, and a first tracking identifier of each of at least one first target in the previous first image;
a first target pair matching unit, configured to match, according to the first bounding box and the first feature, at least one first target in the current first image and at least one first target in the previous first image, so as to obtain a successfully matched first target pair;
a first identification unit, configured to assign, for the successfully matched first target pair, the first tracking identifier of the first target from the previous first image to the first target from the current first image; and
a first track determining unit, configured to determine a first bounding box sequence and a first feature sequence of first targets having a same first tracking identifier in the first image sequence as a first track.
18. The apparatus of claim 17, further comprising:
a first creating module, configured to create a first tracking identifier for a first target that is not successfully matched in the current first image.
19. The apparatus of claim 11, wherein the trajectory determination module further comprises:
a second detection result obtaining unit, configured to obtain a second target detection result of a current second image in the second image sequence, where the second target detection result includes a second bounding box and a second feature of at least one second target in the current second image;
a second tracking result obtaining unit, configured to obtain a second target tracking result of a previous second image of the current second image, where the second target tracking result includes a second bounding box, a second feature, and a second tracking identifier of each second target in the previous second image;
a second target pair matching unit, configured to match, according to the second bounding box and the second feature, at least one second target in the current second image and at least one second target in the previous second image, so as to obtain a successfully matched second target pair;
a second identification unit, configured to assign, for the successfully matched second target pair, the second tracking identifier of the second target from the previous second image to the second target from the current second image; and
a second track determining unit, configured to determine a second bounding box sequence and a second feature sequence of second targets having a same second tracking identifier in the second image sequence as a second track.
20. The apparatus of claim 19, further comprising:
a second creating module, configured to create a second tracking identifier for a second target that is not successfully matched in the current second image.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
22. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 10.
23. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10.
CN202310118712.2A 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium Active CN115953434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310118712.2A CN115953434B (en) 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310118712.2A CN115953434B (en) 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115953434A true CN115953434A (en) 2023-04-11
CN115953434B CN115953434B (en) 2023-12-19

Family

ID=87287812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310118712.2A Active CN115953434B (en) 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115953434B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576167A (en) * 2024-01-16 2024-02-20 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148390A1 (en) * 2014-11-24 2016-05-26 Raytheon Company Method and system for processing a sequence of images to identify, track, and/or target an object on a body of water
US20190005657A1 (en) * 2017-06-30 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd . Multiple targets-tracking method and apparatus, device and storage medium
CN110751017A (en) * 2019-09-03 2020-02-04 上海交通大学 Online unsupervised people group segmentation method based on reinforcement learning and terminal
CN110852321A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Candidate frame filtering method and device and electronic equipment
CN111651573A (en) * 2020-05-26 2020-09-11 上海智臻智能网络科技股份有限公司 Intelligent customer service dialogue reply generation method and device and electronic equipment
US20210065381A1 (en) * 2019-08-29 2021-03-04 Boe Technology Group Co., Ltd. Target tracking method, device, system and non-transitory computer readable medium
CN113901922A (en) * 2021-10-11 2022-01-07 北京大学深圳研究生院 Hidden representation decoupling network-based occluded pedestrian re-identification method and system
CN114612521A (en) * 2022-03-22 2022-06-10 中国科学技术大学 Multi-target multi-camera tracking method, system, equipment and storage medium
CN115641359A (en) * 2022-10-17 2023-01-24 北京百度网讯科技有限公司 Method, apparatus, electronic device, and medium for determining motion trajectory of object

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148390A1 (en) * 2014-11-24 2016-05-26 Raytheon Company Method and system for processing a sequence of images to identify, track, and/or target an object on a body of water
US20190005657A1 (en) * 2017-06-30 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd . Multiple targets-tracking method and apparatus, device and storage medium
US20210065381A1 (en) * 2019-08-29 2021-03-04 Boe Technology Group Co., Ltd. Target tracking method, device, system and non-transitory computer readable medium
CN110751017A (en) * 2019-09-03 2020-02-04 上海交通大学 Online unsupervised people group segmentation method based on reinforcement learning and terminal
CN110852321A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Candidate frame filtering method and device and electronic equipment
CN111651573A (en) * 2020-05-26 2020-09-11 上海智臻智能网络科技股份有限公司 Intelligent customer service dialogue reply generation method and device and electronic equipment
CN113901922A (en) * 2021-10-11 2022-01-07 北京大学深圳研究生院 Hidden representation decoupling network-based occluded pedestrian re-identification method and system
CN114612521A (en) * 2022-03-22 2022-06-10 中国科学技术大学 Multi-target multi-camera tracking method, system, equipment and storage medium
CN115641359A (en) * 2022-10-17 2023-01-24 北京百度网讯科技有限公司 Method, apparatus, electronic device, and medium for determining motion trajectory of object

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Yi: "A Survey of Research on Content-Based Image Retrieval Technology", Journal of Guangxi Radio and TV University, no. 03 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576167A (en) * 2024-01-16 2024-02-20 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium
CN117576167B (en) * 2024-01-16 2024-04-12 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium

Also Published As

Publication number Publication date
CN115953434B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
JP7429796B2 (en) Vehicle tracking methods, devices and electronic equipment
CN111950543B (en) Target detection method and device
CN112560684B (en) Lane line detection method, lane line detection device, electronic equipment, storage medium and vehicle
US20240320840A1 (en) Target tracking method, target tracking apparatus, electronic device and computer readable medium
CN113593219B (en) Traffic flow statistical method and device, electronic equipment and storage medium
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN115641359B (en) Method, device, electronic equipment and medium for determining movement track of object
CN110968718A (en) Target detection model negative sample mining method and device and electronic equipment
CN112528927A (en) Confidence determination method based on trajectory analysis, roadside equipment and cloud control platform
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN112917467A (en) Robot positioning and map building method and device and terminal equipment
CN115953434B (en) Track matching method, track matching device, electronic equipment and storage medium
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN116309714A (en) Head-shoulder tracking method, device and storage medium based on improved YOLOv5 and deep SORT
CN113361519B (en) Target processing method, training method of target processing model and device thereof
CN112861811B (en) Target identification method, device, equipment, storage medium and radar
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN111967299B (en) Unmanned aerial vehicle inspection method, unmanned aerial vehicle inspection device, unmanned aerial vehicle inspection equipment and storage medium
CN112507957B (en) Vehicle association method and device, road side equipment and cloud control platform
CN115330841A (en) Method, apparatus, device and medium for detecting projectile based on radar map
CN112991446A (en) Image stabilization method and device, road side equipment and cloud control platform
CN113962383A (en) Model training method, target tracking method, device, equipment and storage medium
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
Bui et al. CAMTrack: a combined appearance-motion method for multiple-object tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant