CN115953434B - Track matching method, track matching device, electronic equipment and storage medium


Info

Publication number: CN115953434B
Application number: CN202310118712.2A
Authority: CN (China)
Prior art keywords: target, bounding box, track, image, sequence
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN115953434A
Inventors: 路金诚, 张伟, 谭啸, 李莹莹
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310118712.2A


Abstract

The disclosure provides a track matching method, relating to fields of artificial intelligence such as computer vision, image processing, and deep learning, and applicable to scenarios such as autonomous and unmanned driving. The specific implementation scheme is as follows: determining n first tracks from a first image sequence from a first perception device and m second tracks from a second image sequence from a second perception device, the first tracks comprising a first bounding box sequence and a first feature sequence, the second tracks comprising a second bounding box sequence and a second feature sequence; calculating a distance relationship between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence; determining a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relationship, the set comprising a plurality of second bounding boxes that are mutual nearest neighbors with the first bounding box; and determining, among the m second tracks, a second track matching each first track according to the mutual nearest neighbor sets. The disclosure also provides a track matching device, an electronic device, and a storage medium.

Description

Track matching method, track matching device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, image processing, and deep learning, and can be applied to scenarios such as autonomous and unmanned driving. More particularly, the present disclosure provides a track matching method, apparatus, electronic device, and storage medium.
Background
The main tasks of multi-object tracking include locating multiple objects simultaneously in a given video, maintaining the identity of each object, and recording each object's trajectory. Multi-target tracking is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, aerospace, and autonomous driving.
Continuous multi-target tracking across cameras can obtain the complete track of a target under the fields of view of multiple cameras, and can be used in scenarios such as road management for cities and highways, and digital twinning.
Disclosure of Invention
The present disclosure provides a track matching method, apparatus, device, and storage medium.
According to a first aspect, there is provided a track matching method, the method comprising: determining n first trajectories from a first image sequence from a first perception device and m second trajectories from a second image sequence from a second perception device, n and m each being integers greater than 1, the first trajectories comprising a first bounding box sequence and a first feature sequence, the second trajectories comprising a second bounding box sequence and a second feature sequence; according to the first characteristic sequence and the second characteristic sequence, calculating the distance relation between n first tracks and m second tracks; determining a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relation, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes which are nearest neighbors to the first bounding boxes; and determining a second track matched with the first track in the m second tracks according to the mutually nearest neighbor set.
According to a second aspect, there is provided a track matching device, the device comprising: the track determining module is used for determining n first tracks from a first image sequence from a first sensing device and determining m second tracks from a second image sequence from a second sensing device, wherein n and m are integers greater than 1, the first tracks comprise a first bounding box sequence and a first feature sequence, and the second tracks comprise a second bounding box sequence and a second feature sequence; the calculating module is used for calculating the distance relation between the n first tracks and the m second tracks according to the first characteristic sequence and the second characteristic sequence; the nearest neighbor determining module is used for determining a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relation, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes which are nearest neighbors to the first bounding boxes; and the track matching module is used for determining a second track matched with the first track in the m second tracks according to the mutually nearest neighbor set.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary system architecture to which trajectory matching methods and apparatus may be applied, according to one embodiment of the present disclosure;
FIG. 2 is a flow chart of a trajectory matching method according to one embodiment of the present disclosure;
FIG. 3 is a system architecture diagram of a track matching method according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a cross-camera trajectory matching method of bounding box granularity, according to one embodiment of the present disclosure;
FIG. 5 is an effect diagram of cross-camera multi-target tracking according to one embodiment of the present disclosure;
FIG. 6 is a block diagram of a track matching device according to one embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device of a trajectory matching method according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Currently, most of the mainstream multi-target tracking methods are detection-based tracking methods. The target may refer to a vehicle, a pedestrian, a robot, or the like.
For example, in a single-camera multi-target tracking method, a detection result is obtained for each frame of a continuous multi-frame sequence under a single camera, where the detection result of each frame comprises the bounding boxes of the targets in that frame, each bounding box encoding position and size. Features are extracted from the image patch inside each target's bounding box to obtain the target's features. By comparing the features of targets in every two adjacent frames, targets are matched across adjacent frames and then across the continuous multi-frame sequence, yielding a bounding box sequence for each of the multiple targets under the single camera; these bounding box sequences serve as the targets' tracks, i.e., the single-camera multi-target tracking result.
A cross-camera multi-target tracking method can first obtain single-camera multi-target tracking results for multiple consecutive, spatially separated cameras, and then perform track matching according to track-level features of the targets (e.g., the feature sequences corresponding to the bounding box sequences) to obtain the track of each target under the multiple consecutive cameras, i.e., the cross-camera multi-target tracking result.
For example, performing track matching according to track-level features may include taking the average of a feature sequence, or the features of key frames in the sequence, as the features of the entire track. However, this approach easily ignores the features of specific bounding boxes, so target features become indistinct and tracking confusion occurs easily, especially in complex scenes with highly similar targets, where tracking is easily confused and robustness is poor.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
FIG. 1 is a schematic diagram of an exemplary system architecture to which trajectory matching methods and apparatus may be applied, according to one embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. The terminal devices 101, 102, 103 may be a variety of electronic devices including, but not limited to, smartphones, tablets, laptop portable computers, and the like.
The trajectory matching method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the track matching device provided by the embodiments of the present disclosure may be generally disposed in the server 105.
The image sequences acquired by each of the continuous plurality of cameras can be uploaded by the terminals 101, 102, 103 to the server 105 via the network 104. The server 105 executes the track matching method provided by the embodiment of the disclosure, so as to obtain a multi-target tracking result across cameras. The server 105 may also feed back the multi-target tracking result across the cameras to the terminals 101, 102, 103 via the network 104, and the terminals 101, 102, 103 or other devices connected to the terminals 101, 102, 103 may display the motion trajectories of the multiple targets under the continuous multiple cameras based on the tracking result.
Fig. 2 is a flow chart of a trajectory matching method according to one embodiment of the present disclosure.
As shown in fig. 2, the trajectory matching method 200 includes operations S210 to S240.
In operation S210, n first trajectories are determined from a first image sequence from a first perception device, and m second trajectories are determined from a second image sequence from a second perception device. m and n are integers greater than 1, the first track includes a first bounding box sequence and a first feature sequence, and the second track includes a second bounding box sequence and a second feature sequence.
The first sensing device and the second sensing device may be two cameras with adjacent geographic positions, the first image sequence is obtained by shooting by the first sensing device, and the second image sequence is obtained by shooting by the second sensing device.
The n first trajectories may be trajectories of n first objects in the first image sequence, comprising a first bounding box sequence and a first feature sequence for each of the n first objects. The first bounding box represents the position and size of a first object, and the first feature represents its appearance information. Each first object's bounding box sequence and feature sequence correspond to each other frame by frame.
The m second trajectories may be trajectories of m second objects in the second image sequence, comprising a second bounding box sequence and a second feature sequence for each of the m second objects. The second bounding box represents the position and size of a second object, and the second feature represents its appearance information. Each second object's bounding box sequence and feature sequence correspond to each other frame by frame.
The first target and the second target may both be vehicles.
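For concreteness, a track as described above can be viewed as a pair of frame-aligned sequences. Below is a minimal Python sketch, assuming a simple array layout; the disclosure does not prescribe any particular data structure, and all names here are illustrative:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Track:
    """A single-camera track: frame-aligned bounding-box and feature sequences."""
    track_id: int
    boxes: np.ndarray     # shape (h, 4): per-frame (x, y, w, h) bounding box
    features: np.ndarray  # shape (h, d): per-frame appearance (e.g., ReID) feature

    def __post_init__(self):
        # The bounding box sequence and feature sequence correspond frame by frame.
        assert len(self.boxes) == len(self.features)
```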
In operation S220, a distance relationship between the n first tracks and the m second tracks is calculated according to the first feature sequence and the second feature sequence.
For example, a similarity between each of the n first bounding boxes and each of the m second bounding boxes may be calculated. And determining the distance relation between the n first tracks and the m second tracks according to the similarity.
For example, the n first bounding box sequences of the n first tracks may be represented as a first matrix, each element of which represents one first bounding box and carries the first feature corresponding to that bounding box. Similarly, the m second bounding box sequences of the m second tracks may be represented as a second matrix, each element of which represents one second bounding box and carries the corresponding second feature.
For example, similarity calculation is performed on the first feature of each element in the first matrix and the second feature of each element in the second matrix, so as to obtain a similarity matrix. The similarity matrix may represent a distance relationship between the n first tracks and the m second tracks.
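A minimal sketch of this pairwise computation, assuming the bounding box features of each side have already been stacked row-wise into matrices (the stacking and the small epsilon guard against zero-padded rows are illustrative assumptions):

```python
import numpy as np

def similarity_distance_matrix(first_feats: np.ndarray,
                               second_feats: np.ndarray) -> np.ndarray:
    """first_feats: (N, d) features of all first bounding boxes (n tracks flattened).
    second_feats: (M, d) features of all second bounding boxes (m tracks flattened).
    Returns the (N, M) matrix D with D[I, J] = 1 - cos(f_I, f_J)."""
    a = first_feats / (np.linalg.norm(first_feats, axis=1, keepdims=True) + 1e-12)
    b = second_feats / (np.linalg.norm(second_feats, axis=1, keepdims=True) + 1e-12)
    return 1.0 - a @ b.T  # cosine similarity -> similarity distance
```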
In operation S230, a set of mutually nearest neighbors of each first bounding box in the n first tracks is determined according to the distance relationship, where the set of mutually nearest neighbors includes a plurality of second bounding boxes that are mutually nearest neighbors to the first bounding box.
For example, for each first bounding box, the k (k an integer greater than 1, e.g., k=3) second bounding boxes nearest to it may be found according to the distance relationship. For each second bounding box, the k first bounding boxes nearest to it may likewise be found. The mutual k-nearest-neighbor relationship can then be determined from the k-nearest neighbors of each first bounding box and of each second bounding box.
For example, if a first bounding box X belongs to the k-nearest neighbors of a second bounding box Y, and Y belongs to the k-nearest neighbors of X, then X and Y are mutual k-nearest neighbors.
For each first bounding box, all second bounding boxes that are mutual k-nearest neighbors with it may be grouped into its mutual nearest neighbor set.
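A sketch of this mutual nearest-neighbor search over the distance relationship, under the same assumptions as the snippet above (function and variable names are illustrative):

```python
import numpy as np

def mutual_nearest_sets(D: np.ndarray, k: int = 3) -> list[set[int]]:
    """D: (N, M) distance matrix between first and second bounding boxes.
    Returns, for each first bounding box I, the set of second bounding boxes J
    such that I and J are k-nearest neighbors of each other."""
    nn_first = np.argsort(D, axis=1)[:, :k]     # k nearest second boxes per first box
    nn_second = np.argsort(D, axis=0)[:k, :].T  # k nearest first boxes per second box
    mutual = []
    for I in range(D.shape[0]):
        mutual.append({int(J) for J in nn_first[I] if I in nn_second[J]})
    return mutual
```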
In operation S240, a second track matching the first track from among the m second tracks is determined according to the set of mutually nearest neighbors.
According to embodiments of the disclosure, the mutual nearest neighbor set of a first bounding box contains multiple second bounding boxes, some or all of which may belong to the same second track; that second track can then be determined as the second track matching the first bounding box.
For example, the mutual nearest neighbor set of first bounding box A1 is {second bounding box B1, second bounding box B2, second bounding box C1}. Second bounding boxes B1 and B2 belong to second track B. Thus, second track B can be determined as the second track matching first bounding box A1.
According to embodiments of the disclosure, for each first track, each of its first bounding boxes determines a matching second track as above. The second tracks matched by the first bounding boxes of a first track are combined into a candidate track set, and the second track occurring most frequently in the candidate track set is determined as the second track matching the first track.
For example, the first track a includes a first bounding box A1, a first bounding box A2, and a first bounding box A3. The k-nearest neighbor set of the first bounding box A1 is { second bounding box B1, second bounding box B2, second bounding box C1}, and therefore, the second track matching the first bounding box A1 is the second track B, which is added to the candidate track set as a candidate second track.
The k-nearest neighbor set of the first bounding box A2 is { second bounding box C1, second bounding box C2}, and therefore, the second track matching the first bounding box A2 is the second track C, which is added as a candidate second track to the candidate track set.
The k-nearest neighbor set of the first bounding box A3 is { second bounding box C1, second bounding box C2, second bounding box C3}, and therefore, the second track matching the first bounding box A3 is the second track C, which is added to the candidate track set as a candidate second track.
Therefore, the candidate track set is { second track B, second track C }, wherein the candidate track with the largest number of candidate track sets is second track C, and thus, second track C can be determined as the second track matching first track a.
Because the first track A and the second track C are matched with each other, the first track A and the second track C can be determined to be tracks of the same target, and further the movement track of the same target across the camera can be determined.
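Assuming a mapping from each second bounding box to the second track containing it, the candidate-set voting described above could be sketched as follows (names are hypothetical):

```python
from collections import Counter

def match_track(mutual_sets_of_first_track: list[set[int]],
                box_to_track: dict[int, int]) -> int | None:
    """mutual_sets_of_first_track: the mutual nearest neighbor set of each
    first bounding box in one first track. box_to_track: maps a second
    bounding box index to its second-track id. Returns the most-voted
    second track, or None if no box had any mutual neighbor."""
    votes = Counter()
    for neighbor_set in mutual_sets_of_first_track:
        # Each first bounding box votes for the second track contributing
        # the most bounding boxes to its mutual nearest neighbor set.
        track_counts = Counter(box_to_track[j] for j in neighbor_set)
        if track_counts:
            votes[track_counts.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None
```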
Compared with track matching based on track-level feature sequences in the related art, embodiments of the disclosure compute a bounding-box-granularity distance relationship between the first tracks and second tracks from different cameras, and performing bounding-box-granularity track matching according to this distance relationship makes track matching more accurate.
According to an embodiment of the disclosure, operation S220 includes: for each bounding box in the n first tracks, calculating a similarity matrix according to the similarities between the bounding box's features and the features of each bounding box in the m second tracks; and optimizing each element of the similarity matrix according to the occlusion ratios of the two bounding boxes corresponding to that element, obtaining an optimized similarity matrix as the distance relationship.
For example, the i-th track of the n first tracks may be represented as

$$T_i^1 = \{ b_{i,1}^1, b_{i,2}^1, \ldots, b_{i,h_i}^1 \}, \quad i \in [1, n],$$

where $h_i$ denotes the length of the i-th track, $b_{i,1}^1$ denotes the first bounding box (frame 1) of the i-th track, and $b_{i,h_i}^1$ denotes the bounding box of the last frame of the i-th track.
Similarly, the j-th track of the m second tracks may be represented as

$$T_j^2 = \{ b_{j,1}^2, b_{j,2}^2, \ldots, b_{j,h_j}^2 \}, \quad j \in [1, m],$$

where $h_j$ denotes the length of the j-th track, $b_{j,1}^2$ denotes the first bounding box (frame 1) of the j-th track, and $b_{j,h_j}^2$ denotes the bounding box of the last frame of the j-th track.
The similarity between each first bounding box in the n first tracks and each second bounding box in the m second tracks is calculated, and the resulting similarity matrix D is expressed as:

$$D_{IJ} = 1 - \cos\big(f(I), f(J)\big),$$

where $\cos(\cdot)$ denotes the cosine similarity between the features $f(\cdot)$ of two bounding boxes, so that $1 - \cos(\cdot)$ represents the similarity distance between the two bounding boxes and is an element of the similarity matrix D. That is, $D_{IJ}$ denotes the similarity distance between bounding box I and bounding box J.
For each element $D_{IJ}$ in the similarity matrix D, the element can be optimized according to the occlusion ratios of bounding box I and bounding box J, per the following formula (1):

$$\hat{D}_{IJ} = \begin{cases} D_{IJ} \cdot e^{\alpha_o (1 + r_o)}, & r_o > r_{thre} \\ D_{IJ}, & r_o \le r_{thre} \end{cases} \tag{1}$$

where $D_{IJ}$ denotes an element of the similarity matrix, I and J denote the two bounding boxes corresponding to the element, $r_o$ denotes the maximum of the occlusion ratio of bounding box I and the occlusion ratio of bounding box J, and $r_{thre}$ and $\alpha_o$ are hyperparameters.
When $r_o$ is greater than $r_{thre}$, i.e., the maximum occlusion ratio of the pair exceeds the threshold (e.g., 50%), a bounding box is occluded by other vehicles over a large proportion of its area, so the similarity between the two bounding boxes is unreliable and the similarity distance between them can be suppressed. For example, with $\alpha_o < 0$, the factor $e^{\alpha_o (1 + r_o)}$ is monotonically decreasing in $r_o$, so the heavier the occlusion, the more the distance is suppressed.
The optimized similarity distance matrix can be used as the distance matrix between the n first tracks and the m second tracks, i.e., the distance relationship.
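A sketch of the optimization in formula (1). How the occlusion ratios themselves are estimated (e.g., from bounding-box overlaps) is outside this snippet, so they are assumed precomputed, and the threshold and hyperparameter values shown are illustrative:

```python
import numpy as np

def suppress_by_occlusion(D: np.ndarray, occ_first: np.ndarray,
                          occ_second: np.ndarray,
                          r_thre: float = 0.5, alpha_o: float = -1.0) -> np.ndarray:
    """D: (N, M) similarity-distance matrix. occ_first: (N,) occlusion ratio of
    each first bounding box; occ_second: (M,) likewise for second boxes.
    Implements formula (1): where the larger occlusion ratio r_o of a pair
    exceeds r_thre, multiply the element by exp(alpha_o * (1 + r_o))."""
    r_o = np.maximum(occ_first[:, None], occ_second[None, :])  # pairwise max ratio
    factor = np.where(r_o > r_thre, np.exp(alpha_o * (1.0 + r_o)), 1.0)
    return D * factor
```

With alpha_o below 0, the factor is less than 1 and shrinks as occlusion grows, so heavily occluded (unreliable) pairs contribute smaller distances, as described above.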
The track matching method provided by the embodiment of the present disclosure is described in detail below with reference to fig. 3.
Fig. 3 is a system architecture diagram of a track matching method according to one embodiment of the present disclosure.
As shown in fig. 3, the system architecture of this embodiment includes a target detection module 310, a feature extraction module 320, a single-camera tracking module 330, and a cross-camera association module 340. Image sequences 301 to 304 are, for example, image sequences from a plurality of consecutive cameras: image sequence 301 from camera A, image sequence 302 from camera B, image sequence 303 from camera C, and image sequence 304 from camera D, where cameras A to D are geographically consecutive (mutually adjacent) cameras.
The image sequence 301 is input into the object detection module 310, yielding bounding boxes of the targets in the image sequence 301. The object detection module 310 may be implemented by a convolutional neural network with a PP-YOLOE structure (PaddlePaddle-YOLO; PP-YOLOE is improved from the PP-YOLO series of models). The model takes a single image as input and outputs the bounding box position, category, and confidence score of each target in the image.
The feature extraction module 320 inputs the image patch enclosed by each target's bounding box output by the object detection module 310 into a convolutional neural network to obtain the target's ReID (Re-identification) feature 321. Unlike single-dimension features such as color or shape, the ReID feature 321 represents the target's overall appearance information. For a particular target, its ReID feature can be used to determine whether the target appears in other images. The backbone of the feature extraction module 320 may be HRNet (High-Resolution Network).
The network structure of the single-camera tracking module 330 may be based on DeepSORT (an extension of SORT, Simple Online and Realtime Tracking). The inputs of the single-camera tracking module 330 are images and their target detection results, which include the bounding box information output by the object detection module 310 and the ReID features 321 output by the feature extraction module 320. It outputs a tracking identification (tracking ID) for each target under the current camera.
For example, the target detection result of the current image is acquired, including the bounding box and ReID feature of each target in the current image. The target tracking result of the previous image is acquired, including the bounding box, ReID feature, and tracking ID of each target in the previous image. Targets in the current image are associated and matched with targets in the previous image according to the ReID features, yielding successfully matched target pairs; a successfully matched pair indicates the same target. For each successfully matched pair, the tracking ID of the target from the previous image is assigned to the target from the current image, so that the same target keeps the same tracking ID.
A target in the current image that is not successfully matched may be a target in the first image of the image sequence 301, or a new target that has just entered the camera's shooting range. For such targets, a new tracking ID may be created.
The target detection results of the images in the image sequence 301 are continuously input into the single-camera tracking module 330 according to the image sequence, so that the target tracking result of each image can be obtained, and further the single-camera multi-target tracking result of the image sequence 301 is obtained. The bounding box sequence and the ReID feature sequence of the targets having the same tracking ID in the image sequence 301 serve as the trajectories of the targets, and thus, the single-camera multi-target tracking result of the image sequence 301 includes the respective trajectories of the plurality of targets.
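The single-camera tracking module described above builds on DeepSORT; as a simplified, hypothetical stand-in, the following sketch matches current detections to previous targets greedily by ReID feature distance (DeepSORT additionally uses motion cues and Hungarian assignment; all names and the threshold here are illustrative):

```python
import numpy as np

def associate(prev_feats: np.ndarray, prev_ids: list[int],
              cur_feats: np.ndarray, max_dist: float = 0.3) -> list[int]:
    """Greedily match current detections to previous targets by ReID features.
    Returns a tracking ID per current detection, creating a fresh ID for each
    unmatched detection (a new target, or a target's first appearance)."""
    a = prev_feats / (np.linalg.norm(prev_feats, axis=1, keepdims=True) + 1e-12)
    b = cur_feats / (np.linalg.norm(cur_feats, axis=1, keepdims=True) + 1e-12)
    dist = 1.0 - a @ b.T
    cur_ids, used, next_id = [], set(), max(prev_ids, default=-1) + 1
    for j in range(len(cur_feats)):
        i = int(np.argmin(dist[:, j])) if len(prev_feats) else -1
        if i >= 0 and i not in used and dist[i, j] < max_dist:
            cur_ids.append(prev_ids[i]); used.add(i)   # same target, same ID
        else:
            cur_ids.append(next_id); next_id += 1      # new target: new tracking ID
    return cur_ids
```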
Similarly, a single-camera multi-target tracking result for each of the image sequences 302-304 may also be obtained.
The input of the cross-camera association module 340 is a single-camera multi-target tracking result of each continuous image sequence, and the multi-target tracks in the adjacent image sequences are matched, so that the cross-camera multi-target tracking result can be obtained.
According to the embodiment of the disclosure, after obtaining the multi-target tracking result across the camera, determining a first target and a second target which respectively correspond to the first track and the second track determined to be matched with each other as the same target; and assigning the same global identity to the first target and the second target determined to be the same target.
For example, where track 341 and track 342 match, it may be determined that the object corresponding to track 341 and the object corresponding to track 342 are the same object. A global identification (global ID) may be created for the same object so that the object has the same digital ID under a continuous plurality of cameras, which is not easily confused with other objects.
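A sketch of propagating one global ID per matched chain of per-camera tracks, using a simple union-find over matched pairs (the track key format and the union-find choice are illustrative assumptions):

```python
def assign_global_ids(matches: list[tuple[str, str]]) -> dict[str, int]:
    """matches: pairs of per-camera track keys (e.g., ('camA:3', 'camB:7'))
    determined to be the same target. Returns a global ID for every track key,
    so the same target keeps one ID across consecutive cameras."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)  # union the two tracks

    roots = {find(x) for x in parent}
    root_ids = {r: gid for gid, r in enumerate(sorted(roots))}
    return {x: root_ids[find(x)] for x in parent}
```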
The cross-camera association module 340 matches multi-target tracks in adjacent image sequences, specifically using a cross-camera track matching method of bounding box granularity. This method is described in detail below with reference to fig. 4.
Fig. 4 is a schematic diagram of a cross-camera trajectory matching method of bounding box granularity according to one embodiment of the present disclosure.
As shown in fig. 4, the cross-camera setting of this embodiment may include an upstream camera and a downstream camera that are geographically adjacent.
The video 410 may come from the upstream camera. By extracting frames of the video 410 into an image sequence and performing object detection, a plurality of (e.g., 3) bounding box sequences 411 are obtained, each corresponding to one target in the video 410. Extracting features from the bounding box sequences 411 yields the corresponding feature sequences, and the bounding box sequences 411 together with their feature sequences form a plurality of upstream tracks 412.
For example, the plurality of upstream tracks 412 includes upstream tracks $B_1$, $B_2$, and $B_3$. By padding zeros at frame positions without features, the lengths of $B_1$, $B_2$, and $B_3$ are made uniform, and they form a first matrix 413.
The video 420 may be derived from a downstream camera by extracting frames of the video 420 into a sequence of images and performing object detection to obtain a plurality (e.g., 3) bounding box sequences 421, each bounding box sequence corresponding to an object in the video 420. By extracting features from the bounding box sequences 421, feature sequences corresponding to the bounding box sequences are obtained, and the plurality of bounding box sequences 421 and the corresponding feature sequences form a plurality of downstream tracks 422.
For example, the plurality of downstream tracks 422 includes downstream tracks $B'_1$, $B'_2$, and $B'_3$. Their lengths are likewise made uniform by zero padding, and they form a second matrix 423.
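A sketch of the zero-padding that makes track lengths uniform before forming such matrices (the array layout is an assumption for illustration):

```python
import numpy as np

def pad_tracks(feature_seqs: list[np.ndarray]) -> np.ndarray:
    """feature_seqs: per-track feature sequences of shapes (h_i, d).
    Returns an array of shape (num_tracks, max_h, d), padding frame
    positions without features with zeros so all tracks align."""
    max_h = max(seq.shape[0] for seq in feature_seqs)
    d = feature_seqs[0].shape[1]
    out = np.zeros((len(feature_seqs), max_h, d), dtype=feature_seqs[0].dtype)
    for i, seq in enumerate(feature_seqs):
        out[i, :seq.shape[0]] = seq
    return out
```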
The similarity between each element of the first matrix 413 and each element of the second matrix 423 is calculated, resulting in a similarity matrix. The similarity matrix is then optimized through formula (1) to obtain a bounding-box-granularity distance matrix 430, whose elements represent the similarity distances between pairs of bounding boxes.
Next, bounding box granularity trajectory matching is performed using a k-nearest neighbor matching algorithm based on distance matrix 430.
The k-nearest-neighbor relationship can be defined by the following formula (2):

$$N(b_{i,h}^1, k) = \{ b_1^2, b_2^2, \ldots, b_k^2 \} \tag{2}$$

where $b_{i,h}^1$ denotes the bounding box of the h-th frame in the i-th upstream track, and $N(b_{i,h}^1, k)$ denotes its k nearest neighbors under the distance matrix 430.
The mutual k-nearest-neighbor relationship can be defined by the following formula (3):

$$R(b_{i,h}^1, k) = \{ b^2 \mid b^2 \in N(b_{i,h}^1, k) \wedge b_{i,h}^1 \in N(b^2, k) \} \tag{3}$$

where $R(b_{i,h}^1, k)$ denotes the mutual nearest neighbor set of $b_{i,h}^1$.
For convenience of description, the bounding box in the upstream track will be referred to as an upstream bounding box, and the bounding box in the downstream track will be referred to as a downstream bounding box.
For each upstream bounding box, the k (e.g., k=3) downstream bounding boxes nearest to it may be found. For example, for each upstream bounding box in upstream track $B_1$, the k nearest downstream bounding boxes are searched according to the distance matrix 430, yielding the nearest-neighbor relation set 441 of upstream track $B_1$. Similarly, the nearest-neighbor relation set 442 of upstream track $B_2$ and the nearest-neighbor relation set 443 of upstream track $B_3$ can be obtained.
For each downstream bounding box, the k (e.g., k=3) upstream bounding boxes nearest to it may be found in reverse. For example, sets 451 to 453 may be the reverse-searched nearest-neighbor relation sets corresponding to sets 441 to 443, respectively.
For example, in the nearest-neighbor relation set 441 of upstream track $B_1$, the k nearest neighbors of the first upstream bounding box include two downstream bounding boxes belonging to the same downstream track. Searching back from these two downstream bounding boxes through the distance matrix 430, the k nearest neighbors of each include that upstream bounding box (shown as the gray boxes in set 451), so both downstream bounding boxes belong to the mutual nearest neighbor set of the upstream bounding box. Since both belong to the same downstream track, that downstream track matches the upstream bounding box and can serve as a candidate track for upstream track $B_1$.
Similarly, for the second and third upstream bounding boxes in the nearest-neighbor relation set 441, the downstream bounding boxes in their mutual nearest neighbor sets are found by the reverse search through the distance matrix 430, and the downstream track to which those bounding boxes belong is taken as a candidate track for upstream track $B_1$.
Thus, a candidate track set is obtained for upstream track $B_1$. Counting by voting, the downstream track occurring most frequently in the candidate track set is the downstream track matching upstream track $B_1$.
Similarly, the downstream tracks matching upstream tracks $B_2$ and $B_3$ can be determined.
In this embodiment, a bounding-box-granularity distance matrix is calculated and a mutual k-nearest-neighbor matching algorithm performs bounding-box-granularity track matching, making track matching more accurate.
Fig. 5 is an effect diagram of cross-camera multi-target tracking according to one embodiment of the present disclosure.
As shown in fig. 5, the target includes a vehicle with an ID of 65 and a vehicle with an ID of 67. The ID of the vehicle is a global ID created for each target according to the matching method provided by the present disclosure. Since the same global ID is assigned to the same target, the vehicle has the same numerical identification under a plurality of continuous cameras.
For example, the vehicle with ID 65 is displayed with its bounding box and ID (65) in the frames captured by camera A, camera B, and camera C. Similarly, the vehicle with ID 67 is displayed with its bounding box and ID (67) in the frames captured by camera A, camera B, and camera C.
According to this embodiment, targets are tracked and displayed across cameras in the form of bounding boxes and IDs, making targets clearer and tracking less easily confused.
Fig. 6 is a block diagram of a trajectory matching device according to one embodiment of the present disclosure.
As shown in fig. 6, the track matching device 600 includes a track determining module 601, a calculating module 602, a nearest neighbor determining module 603, and a track matching module 604.
The track determining module 601 is configured to determine n first tracks from a first image sequence from a first sensing device, and determine m second tracks from a second image sequence from a second sensing device, where n and m are integers greater than 1, the first tracks include a first bounding box sequence and a first feature sequence, and the second tracks include a second bounding box sequence and a second feature sequence.
The calculation module 602 is configured to calculate a distance relationship between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence.
The nearest neighbor determining module 603 is configured to determine, according to a distance relationship, a set of mutually nearest neighbors of each first bounding box in the n first tracks, where the set of mutually nearest neighbors includes a plurality of second bounding boxes that are nearest neighbors to the first bounding box.
The track matching module 604 is configured to determine a second track matching the first track from the m second tracks according to the set of mutually nearest neighbors.
According to an embodiment of the present disclosure, the bounding boxes in the first bounding box sequence correspond to features in the first feature sequence. The calculation module 602 includes a calculation unit and an optimization unit.
The calculation unit is used for calculating a similarity matrix according to the similarity between the features of the bounding box and the features of each bounding box in the m second tracks for each bounding box in the n first tracks.
The optimization unit is used for optimizing the elements according to the respective shielding proportion of the two bounding boxes corresponding to each element in the similarity matrix to obtain an optimized similarity matrix serving as a distance relation.
The optimizing unit is used for optimizing the element according to the following formula:

$$\hat{D}_{IJ} = \begin{cases} D_{IJ} \cdot e^{\alpha_o (1 + r_o)}, & r_o > r_{thre} \\ D_{IJ}, & r_o \le r_{thre} \end{cases}$$

where $D_{IJ}$ denotes an element of the similarity matrix, I and J denote the two bounding boxes corresponding to the element, $r_o$ denotes the maximum of the occlusion ratio of bounding box I and the occlusion ratio of bounding box J, and $r_{thre}$ and $\alpha_o$ are hyperparameters.
The nearest neighbor determining module 603 includes a first nearest neighbor determining unit, a second nearest neighbor determining unit, a mutual nearest neighbor determining unit, and a combining unit.
The first nearest neighbor determining unit is used for determining k second bounding boxes nearest to each first bounding box according to the distance relation, wherein k is an integer greater than 1.
The second nearest neighbor determining unit is used for determining k first bounding boxes nearest to each second bounding box according to the distance relation.
The mutual nearest neighbor determining unit is used for determining that the specific first bounding box and the specific second bounding box are nearest neighbors to each other in response to the fact that the specific first bounding box belongs to the nearest neighbor of the specific second bounding box and the specific second bounding box belongs to the nearest neighbor of the specific first bounding box.
The combining unit is used for combining all second bounding boxes which are nearest neighbors to each other with the first bounding box into a mutually nearest neighbor set for each first bounding box.
The track matching module 604 includes a track matching unit and a screening unit.
The track matching unit is used for determining a second track formed by a plurality of second bounding boxes in the mutually nearest neighbor set of the first bounding boxes as a second track matched with the first bounding box for each first bounding box.
The screening unit is used for combining, for each first track, the second tracks matched with each first bounding box in the first track into a candidate track set, and determining the second track occurring most frequently in the candidate track set as the second track matching the first track.
According to an embodiment of the present disclosure, the first track is a track of a first object and the second track is a track of a second object. The trajectory matching device 600 further comprises a target determination module and an identification module.
The target determination module is used for determining a first target and a second target which are respectively corresponding to the first track and the second track determined to be matched with each other as the same target.
The identification module is used for assigning the same global identification to a first target and a second target which are determined to be the same target.
The track determining module 601 includes a first detection result acquiring unit, a first tracking result acquiring unit, a first target pair matching unit, a first identification unit, and a first track determining unit.
The first detection result acquisition unit is used for acquiring a first target detection result of a current first image in the first image sequence, wherein the first target detection result comprises a first bounding box and a first feature of each of at least one first target in the current first image.
The first tracking result acquisition unit is used for acquiring a first target tracking result of a previous first image of the current first image, wherein the first target tracking result comprises a first bounding box, a first feature and a first tracking identifier of at least one first target in the previous first image.
The first target pair matching unit is used for matching at least one first target in the current first image with at least one first target in the previous first image according to the first bounding box and the first feature, and a successfully matched first target pair is obtained.
The first identification unit is used for giving a first tracking identification of a first target from a previous first image in the first target pair to the first target from the current first image aiming at the successfully matched first target pair.
The first track determining unit is used for determining a first bounding box sequence and a first feature sequence of a first target with the same first tracking identifier in the first image sequence as a first track.
The track determining module 601 further includes a second detection result acquiring unit, a second tracking result acquiring unit, a second target pair matching unit, a second identification unit, and a second track determining unit.
The second detection result obtaining unit is used for obtaining a second target detection result of a current second image in the second image sequence, the second target detection result comprising a second bounding box and a second feature of each of at least one second target in the current second image.
The second tracking result obtaining unit is used for obtaining a second target tracking result of a previous second image of the current second image, the second target tracking result comprising a second bounding box, a second feature, and a second tracking identifier of at least one second target in the previous second image.
The second target pair matching unit is used for matching at least one second target in the current second image with at least one second target in the previous second image according to the second bounding box and the second feature, obtaining a successfully matched second target pair.
The second identification unit is used for assigning, for the successfully matched second target pair, the second tracking identifier of the second target from the previous second image to the second target from the current second image.
The second track determining unit is used for determining a second bounding box sequence and a second feature sequence of a second target having the same second tracking identifier in the second image sequence as a second track.
The track matching device 600 further comprises a first creating module and a second creating module.
The first creating module is used for creating a first tracking identifier for a first target that is not successfully matched in the current first image.
The second creating module is used for creating a second tracking identifier for a second target that is not successfully matched in the current second image.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a trajectory matching method. For example, in some embodiments, the trajectory matching method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the trajectory matching method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the trajectory matching method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (22)

1. A track matching method, comprising:
determining n first tracks from a first image sequence from a first sensing device, and determining m second tracks from a second image sequence from a second sensing device, wherein n and m are integers greater than 1, the first tracks comprise a first bounding box sequence and a first feature sequence which correspond to each other, and the second tracks comprise a second bounding box sequence and a second feature sequence which correspond to each other;
according to the first characteristic sequence and the second characteristic sequence, calculating the distance relation between the n first tracks and the m second tracks;
determining a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relation, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes which are nearest neighbors to the first bounding boxes; and
Determining a second track matched with the first track in the m second tracks according to the mutually nearest neighbor set;
wherein said calculating a distance relationship between said n first tracks and said m second tracks comprises:
for each bounding box in the n first tracks, calculating a similarity matrix according to the similarity between the features of the bounding box and the features of each bounding box in the m second tracks;
for each similarity element in the similarity matrix, determining the maximum value of the occlusion ratios of the two bounding boxes corresponding to the similarity element, and, in response to the maximum occlusion ratio being greater than a threshold, multiplying the value of the similarity element by the output value of a function, wherein the function is an exponential function with base e, and the exponent of the exponential function is the sum of 1 and the maximum occlusion ratio, multiplied by a hyperparameter smaller than 0.
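For illustration only, the following is a minimal sketch of the occlusion-penalized similarity computation recited in claim 1, assuming NumPy and cosine similarity over L2-normalized appearance features; the names `occlusion_aware_similarity`, `r_thre`, and `alpha_o` are ours and do not appear in the patent:

```python
# Illustrative sketch of claim 1's occlusion-penalized similarity (NumPy).
# Assumptions: cosine similarity over L2-normalized appearance features;
# the names r_thre and alpha_o are illustrative, not the patent's.
import numpy as np

def occlusion_aware_similarity(feats_a, feats_b, occ_a, occ_b,
                               r_thre=0.4, alpha_o=-1.0):
    # feats_a: (N, D) features of boxes in the n first tracks;
    # feats_b: (M, D) features of boxes in the m second tracks;
    # occ_a, occ_b: per-box occlusion ratios in [0, 1].
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                                     # (N, M) similarities
    r_o = np.maximum(occ_a[:, None], occ_b[None, :])  # worse occlusion per pair
    penalty = np.exp(alpha_o * (1.0 + r_o))           # alpha_o < 0 => factor < 1
    return np.where(r_o > r_thre, sim * penalty, sim)
```

Because alpha_o is negative, the factor e^{alpha_o(1 + r_o)} lies below 1 and shrinks as occlusion grows, so heavily occluded box pairs weigh less in the subsequent matching.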
2. The method of claim 1, wherein the calculating a distance relation between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence further comprises:
and optimizing each element in the similarity matrix according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain an optimized similarity matrix as the distance relation.
3. The method of claim 2, wherein, for each element in the similarity matrix, the optimizing the element according to the respective occlusion ratios of the two bounding boxes corresponding to the element comprises:
optimizing the element according to the following formula:

$$s'_{I,J}=\begin{cases}s_{I,J}\cdot e^{\alpha_o\,(1+r_o)}, & r_o>r_{thre}\\ s_{I,J}, & \text{otherwise}\end{cases}$$

wherein $s_{I,J}$ represents an element in the similarity matrix, $I$ and $J$ represent the two bounding boxes corresponding to the element, $r_o$ represents the maximum of the occlusion ratio of bounding box $I$ and the occlusion ratio of bounding box $J$, and $r_{thre}$ and $\alpha_o$ ($\alpha_o<0$) are hyperparameters.
4. The method of claim 1, wherein the determining the mutually nearest neighbor set of each first bounding box in the n first tracks according to the distance relationship comprises:
for each first bounding box, determining k second bounding boxes nearest to the first bounding box according to the distance relation, wherein k is an integer greater than 1;
for each second bounding box, determining k first bounding boxes nearest to the second bounding box according to the distance relation;
in response to a particular first bounding box belonging to the k nearest neighbors of a particular second bounding box, and the particular second bounding box belonging to the k nearest neighbors of the particular first bounding box, determining that the particular first bounding box and the particular second bounding box are mutual nearest neighbors; and
for each first bounding box, combining all second bounding boxes that are mutual nearest neighbors of the first bounding box into the mutual nearest neighbor set.
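A sketch of the mutual nearest neighbor construction of claim 4, assuming the similarity matrix `sim` from the previous sketch (larger values mean closer) and illustrative function and variable names:

```python
# Illustrative sketch of claim 4's mutual nearest neighbors.
import numpy as np

def mutual_nearest_neighbors(sim, k=5):
    knn_rows = np.argsort(-sim, axis=1)[:, :k]    # k nearest second boxes per first box
    knn_cols = np.argsort(-sim, axis=0)[:k, :].T  # k nearest first boxes per second box
    mutual = []
    for i in range(sim.shape[0]):
        # j is kept only if i and j appear in each other's k-nearest lists.
        mutual.append({int(j) for j in knn_rows[i] if i in knn_cols[j]})
    return mutual  # mutual[i] is the mutual nearest neighbor set of first box i
```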
5. The method of claim 1 or 4, wherein the determining, from the set of mutually nearest neighbors, a second track of the m second tracks that matches the first track comprises:
for each first bounding box, determining a second track formed by a plurality of second bounding boxes in the mutually nearest neighbor set of the first bounding box as a second track matched with the first bounding box;
and, for each first track, combining the second tracks matched with each first bounding box in the first track into a candidate track set, and determining the second track occurring most frequently in the candidate track set as the second track matched with the first track.
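The selection in claim 5 amounts to a majority vote. A sketch, assuming an auxiliary mapping `box_to_track_b` from a second bounding box index to the identifier of the second track containing it (a bookkeeping structure not named in the claim):

```python
# Illustrative sketch of claim 5's selection as a majority vote.
from collections import Counter

def match_track(first_track_boxes, mutual_sets, box_to_track_b):
    votes = Counter(box_to_track_b[j]
                    for i in first_track_boxes   # every first box in the track
                    for j in mutual_sets[i])     # votes via its mutual neighbors
    return votes.most_common(1)[0][0] if votes else None
```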
6. The method of claim 1, wherein the first track is a track of a first target and the second track is a track of a second target; the method further comprising:
determining a first target and a second target, which correspond to the first track and the second track determined to be matched with each other, respectively, as the same target; and
assigning the same global identity to the first target and the second target determined to be the same target.
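A minimal illustration of the global identity assignment in claim 6; `matched_pairs` and the "cam_a"/"cam_b" keys are illustrative assumptions:

```python
# Minimal illustration of claim 6: matched track pairs share one global id.
from itertools import count

def assign_global_ids(matched_pairs):
    fresh = count()
    global_ids = {}
    for tid_a, tid_b in matched_pairs:  # (first track id, second track id)
        gid = next(fresh)               # one fresh global identity per pair
        global_ids[("cam_a", tid_a)] = gid
        global_ids[("cam_b", tid_b)] = gid
    return global_ids
```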
7. The method of claim 1, wherein the determining n first tracks from the first image sequence from the first perception device comprises:
acquiring a first target detection result of a current first image in the first image sequence, wherein the first target detection result comprises a first bounding box and a first feature of each of at least one first target in the current first image;
acquiring a first target tracking result of a previous first image of the current first image, wherein the first target tracking result comprises a first bounding box, a first feature and a first tracking identifier of each of at least one first target in the previous first image;
according to the first bounding box and the first feature, matching at least one first target in the current first image with at least one first target in the previous first image to obtain a successfully matched first target pair;
for the successfully matched first target pair, assigning a first tracking identifier of the first target from the previous first image in the first target pair to the first target from the current first image; and
and determining a first bounding box sequence and a first feature sequence of a first target with the same first tracking identifier in the first image sequence as a first track.
8. The method of claim 7, further comprising:
and creating a first tracking identifier for a first target that is not successfully matched in the current first image.
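Claims 7-8 describe a tracking-by-detection loop (claims 9-10 recite the symmetric procedure for the second perception device). The sketch below is one plausible reading: the claims only require matching "according to the first bounding box and the first feature", so the mixed IoU/appearance cost and the Hungarian solver are assumptions, not the patent's mandated method:

```python
# Illustrative sketch of the tracking step in claims 7-8.
# Features are assumed to be L2-normalized numpy vectors.
import numpy as np
from itertools import count
from scipy.optimize import linear_sum_assignment

_next_id = count()

def track_step(prev_targets, detections, iou, max_cost=0.7):
    # prev_targets: [{"box", "feat", "track_id"}] from the previous image;
    # detections:   [{"box", "feat"}] detected in the current image.
    if prev_targets and detections:
        cost = np.empty((len(prev_targets), len(detections)))
        for i, t in enumerate(prev_targets):
            for j, d in enumerate(detections):
                appearance = 1.0 - float(t["feat"] @ d["feat"])
                cost[i, j] = 0.5 * (1.0 - iou(t["box"], d["box"])) + 0.5 * appearance
        for i, j in zip(*linear_sum_assignment(cost)):
            if cost[i, j] < max_cost:  # successfully matched target pair:
                detections[j]["track_id"] = prev_targets[i]["track_id"]
    for d in detections:               # unmatched: create a new tracking id
        d.setdefault("track_id", next(_next_id))
    return detections
```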
9. The method of claim 1, wherein the determining m second tracks from the second image sequence from the second perception device comprises:
acquiring a second target detection result of a current second image in the second image sequence, wherein the second target detection result comprises a second bounding box and a second feature of each of at least one second target in the current second image;
acquiring a second target tracking result of a previous second image of the current second image, wherein the second target tracking result comprises a second bounding box, a second feature and a second tracking identifier of each of at least one second target in the previous second image;
matching, according to the second bounding box and the second feature, at least one second target in the current second image with at least one second target in the previous second image, to obtain a successfully matched second target pair;
assigning, for the successfully matched second target pair, a second tracking identifier of a second target from a previous second image in the second target pair to a second target from a current second image; and
and determining a second bounding box sequence and a second feature sequence of a second target having the same second tracking identifier in the second image sequence as a second track.
10. The method of claim 9, further comprising:
and creating a second tracking identifier for a second target which is not successfully matched in the current second image.
11. A track matching device, comprising:
a track determining module, configured to determine n first tracks from a first image sequence from a first perception device, and determine m second tracks from a second image sequence from a second perception device, where n and m are integers greater than 1, the first tracks include a first bounding box sequence and a first feature sequence that correspond to each other, and the second tracks include a second bounding box sequence and a second feature sequence that correspond to each other;
a calculating module, configured to calculate a distance relation between the n first tracks and the m second tracks according to the first feature sequence and the second feature sequence;
a nearest neighbor determining module, configured to determine a mutual nearest neighbor set of each first bounding box in the n first tracks according to the distance relation, wherein the mutual nearest neighbor set comprises a plurality of second bounding boxes that are mutual nearest neighbors of the first bounding box; and
The track matching module is used for determining a second track matched with the first track in the m second tracks according to the mutually nearest neighbor set;
wherein the calculating module is configured to: calculate, for each bounding box in the n first tracks, a similarity matrix according to the similarity between the features of the bounding box and the features of each bounding box in the m second tracks; and, for each similarity element in the similarity matrix, determine the maximum value of the occlusion ratios of the two bounding boxes corresponding to the similarity element, and, in response to the maximum occlusion ratio being greater than a threshold, multiply the value of the similarity element by the output value of a function, wherein the function is an exponential function with base e, and the exponent of the exponential function is the sum of 1 and the maximum occlusion ratio, multiplied by a hyperparameter smaller than 0.
12. The apparatus of claim 11, wherein the calculating module comprises:
an optimizing unit, configured to optimize each element in the similarity matrix according to the respective occlusion ratios of the two bounding boxes corresponding to the element, to obtain an optimized similarity matrix as the distance relation.
13. The apparatus of claim 12, wherein the optimizing unit is configured to optimize the element according to the following formula:

$$s'_{I,J}=\begin{cases}s_{I,J}\cdot e^{\alpha_o\,(1+r_o)}, & r_o>r_{thre}\\ s_{I,J}, & \text{otherwise}\end{cases}$$

wherein $s_{I,J}$ represents an element in the similarity matrix, $I$ and $J$ represent the two bounding boxes corresponding to the element, $r_o$ represents the maximum of the occlusion ratio of bounding box $I$ and the occlusion ratio of bounding box $J$, and $r_{thre}$ and $\alpha_o$ ($\alpha_o<0$) are hyperparameters.
14. The apparatus of claim 11, wherein the nearest neighbor determination module comprises:
the first nearest neighbor determining unit is used for determining k second bounding boxes nearest to each first bounding box according to the distance relation, wherein k is an integer greater than 1;
the second nearest neighbor determining unit is used for determining k first bounding boxes nearest to each second bounding box according to the distance relation;
a mutual nearest neighbor determining unit, configured to determine that a specific first bounding box and a specific second bounding box are mutual nearest neighbors in response to the specific first bounding box belonging to the k nearest neighbors of the specific second bounding box and the specific second bounding box belonging to the k nearest neighbors of the specific first bounding box; and
a combining unit, configured to combine, for each first bounding box, all second bounding boxes that are mutual nearest neighbors of the first bounding box into the mutual nearest neighbor set.
15. The apparatus of claim 11 or 14, wherein the track matching module comprises:
a track matching unit, configured to determine, for each first bounding box, a second track composed of a plurality of second bounding boxes in a set of mutually nearest neighbors of the first bounding box as a second track matched with the first bounding box;
and a screening unit, configured to combine, for each first track, the second tracks matched with each first bounding box in the first track into a candidate track set, and determine the second track occurring most frequently in the candidate track set as the second track matched with the first track.
16. The apparatus of claim 11, wherein the first track is a track of a first target and the second track is a track of a second target; the apparatus further comprising:
the target determining module is used for determining a first target and a second target which are respectively corresponding to the first track and the second track determined to be matched with each other as the same target; and
and the identification module is used for assigning the same global identification to the first target and the second target which are determined to be the same target.
17. The apparatus of claim 11, wherein the track determining module comprises:
a first detection result obtaining unit, configured to obtain a first target detection result of a current first image in the first image sequence, where the first target detection result includes a first bounding box and a first feature of each of at least one first target in the current first image;
a first tracking result obtaining unit, configured to obtain a first target tracking result of a previous first image of the current first image, where the first target tracking result includes a first bounding box, a first feature, and a first tracking identifier of each of at least one first target in the previous first image;
the first target pair matching unit is used for matching at least one first target in the current first image and at least one first target in the previous first image according to the first bounding box and the first feature to obtain a successfully matched first target pair;
a first identification unit, configured to assign, for the successfully matched first target pair, a first tracking identifier of a first target from a previous first image in the first target pair to a first target from a current first image; and
And the first track determining unit is used for determining a first bounding box sequence and a first feature sequence of a first target with the same first tracking identifier in the first image sequence as a first track.
18. The apparatus of claim 17, further comprising:
and the first creation module is used for creating a tracking identifier for the first target which is not successfully matched in the current first image.
19. The apparatus of claim 11, wherein the track determining module further comprises:
a second detection result obtaining unit, configured to obtain a second target detection result of a current second image in the second image sequence, where the second target detection result includes a second bounding box and a second feature of each of at least one second target in the current second image;
a second tracking result obtaining unit, configured to obtain a second target tracking result of a previous second image of the current second image, where the second target tracking result includes a second bounding box, a second feature, and a second tracking identifier of each of at least one second target in the previous second image;
a second target pair matching unit, configured to match, according to the second bounding box and the second feature, at least one second target in the current second image with at least one second target in the previous second image, to obtain a successfully matched second target pair;
A second identification unit, configured to assign, for the successfully matched second target pair, a second tracking identifier of a second target from a previous second image in the second target pair to a second target from a current second image; and
and a second track determining unit, configured to determine a second bounding box sequence and a second feature sequence of a second target having the same second tracking identifier in the second image sequence as a second track.
20. The apparatus of claim 19, further comprising:
and the second creating module is used for creating a second tracking identifier for a second target which is not successfully matched in the current second image.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 10.
CN202310118712.2A 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium Active CN115953434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310118712.2A CN115953434B (en) 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115953434A CN115953434A (en) 2023-04-11
CN115953434B true CN115953434B (en) 2023-12-19

Family

ID=87287812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310118712.2A Active CN115953434B (en) 2023-01-31 2023-01-31 Track matching method, track matching device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115953434B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576167B (en) * 2024-01-16 2024-04-12 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751017A (en) * 2019-09-03 2020-02-04 上海交通大学 Online unsupervised people group segmentation method based on reinforcement learning and terminal
CN110852321A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Candidate frame filtering method and device and electronic equipment
CN111651573A (en) * 2020-05-26 2020-09-11 上海智臻智能网络科技股份有限公司 Intelligent customer service dialogue reply generation method and device and electronic equipment
CN113901922A (en) * 2021-10-11 2022-01-07 北京大学深圳研究生院 Hidden representation decoupling network-based occluded pedestrian re-identification method and system
CN114612521A (en) * 2022-03-22 2022-06-10 中国科学技术大学 Multi-target multi-camera tracking method, system, equipment and storage medium
CN115641359A (en) * 2022-10-17 2023-01-24 北京百度网讯科技有限公司 Method, apparatus, electronic device, and medium for determining motion trajectory of object

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9355463B1 (en) * 2014-11-24 2016-05-31 Raytheon Company Method and system for processing a sequence of images to identify, track, and/or target an object on a body of water
CN109214238B (en) * 2017-06-30 2022-06-28 阿波罗智能技术(北京)有限公司 Multi-target tracking method, device, equipment and storage medium
CN110517293A (en) * 2019-08-29 2019-11-29 京东方科技集团股份有限公司 Method for tracking target, device, system and computer readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Content-Based Image Retrieval Technology; Zhang Yi; Journal of Guangxi Radio and TV University (Issue 03); full text *

Also Published As

Publication number Publication date
CN115953434A (en) 2023-04-11

Similar Documents

Publication Publication Date Title
CN109035304B (en) Target tracking method, medium, computing device and apparatus
CN113221677B (en) Track abnormality detection method and device, road side equipment and cloud control platform
CN109559330B (en) Visual tracking method and device for moving target, electronic equipment and storage medium
CN113284168A (en) Target tracking method and device, electronic equipment and storage medium
CN113392794B (en) Vehicle line crossing identification method and device, electronic equipment and storage medium
CN110502962B (en) Method, device, equipment and medium for detecting target in video stream
CN110909712A (en) Moving object detection method and device, electronic equipment and storage medium
CN115953434B (en) Track matching method, track matching device, electronic equipment and storage medium
CN116311063A (en) Personnel fine granularity tracking method and system based on face recognition under monitoring video
Li et al. Time-spatial multiscale net for vehicle counting and traffic volume estimation
CN113256683B (en) Target tracking method and related equipment
CN115641359B (en) Method, device, electronic equipment and medium for determining movement track of object
CN111681264A (en) Real-time multi-target tracking method for monitoring scene
Khan et al. Foreground detection using motion histogram threshold algorithm in high-resolution large datasets
CN114332509B (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN112861811B (en) Target identification method, device, equipment, storage medium and radar
CN115331151A (en) Video speed measuring method and device, electronic equipment and storage medium
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN113379799B (en) Anchor frame-free target tracking method based on contour estimation
CN116228834B (en) Image depth acquisition method and device, electronic equipment and storage medium
CN114820700B (en) Object tracking method and device
CN114091587A (en) Method, apparatus, device and medium for determining object class for high-precision map
Nemcev et al. Visual Objects Tracking on Road Sequences Using Information about Scene Perspective Transform.
CN117372477A (en) Target tracking matching method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant