CN116245916A - Unmanned ship-oriented infrared ship target tracking method and device - Google Patents
- Publication number
- CN116245916A (application CN202310524458.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
- G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches; using context analysis
- G06V10/806: Fusion, i.e. combining data from various sources, of extracted features
- G06V10/82: Image or video recognition or understanding using neural networks
- G06T2207/10016: Video; Image sequence
- G06T2207/10048: Infrared image
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30241: Trajectory
- G06V2201/07: Target detection
- Y02A90/30: Assessment of water resources
Abstract
An unmanned-ship-oriented infrared ship target tracking method and device. The method comprises the following steps: constructing an infrared ship target video data set; dividing the data set into a training set and a test set; training a ship target detector; constructing and training an appearance feature extraction network model; constructing and training a ship target tracker; detecting ship targets in the test set with the trained detector; extracting appearance feature vectors of the detected ship targets with the trained appearance feature extraction network model; testing the test set with the trained ship target tracker, using the detected targets and their appearance feature vectors; and deploying the ship target detector, the appearance feature extraction network model, and the ship target tracker to track ship targets. The infrared ship target tracking device performs target tracking for the unmanned ship. The method offers good robustness and high precision, and is suitable for short-term occlusion and nonlinear motion scenes.
Description
Technical Field
The invention relates to the field of infrared ship target tracking, and in particular to an unmanned-ship-oriented infrared ship target tracking method and device.
Background
Infrared detection systems have unique advantages for detecting ship targets under smoke, dust, and low-illumination conditions; they enable continuous, all-weather, day-and-night passive detection and are well suited to tracking ship targets in complex water-surface scenes. Infrared detection and tracking systems have important military and civil value, for example in sea-surface reconnaissance, navigation escort, port security, and ship traffic management.
Three tracking strategies are currently used to realize infrared ship target tracking: tracking based on initialized features, integrated detection-and-tracking, and tracking-by-detection. Tracking based on initialized features assumes that the number of targets does not change, so it can hardly meet the infrared ship target tracking requirements of an unmanned ship. Integrated detection-and-tracking combines an existing detector and tracker in the same framework, which saves computation and reduces model size; however, such models are not flexible enough when scenes change frequently. Tracking-by-detection offers higher operating efficiency and tracking precision and can cope with flexible, changeable application scenarios, and it is the most commonly used tracking method at present.
However, infrared ship target tracking poses significant technical challenges. Mutual occlusion between ship targets, and nonlinear motion of targets in the image caused by water-surface shaking or by occlusion from fixed objects such as islands and piers, make tracking very difficult. Because infrared ship images exhibit indistinct features and weak textures, both cosine-distance similarity measurement between target features and convolutional network (CNN) feature extraction are severely challenged. In addition, when a target is occluded, no observations reach the Kalman filter prediction module, which then updates its parameters recursively from its own predictions alone; without the supervision of observations, the error of the Kalman filter (KF) linear-motion hypothesis accumulates quadratically over time, so predicted results are matched incorrectly to newly detected results and tracking fails.
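The quadratic error growth described above can be seen in a minimal numpy sketch of a constant-velocity Kalman filter run in predict-only mode (all matrices here are illustrative, not taken from the patent): during occlusion no update step corrects the covariance, so the position variance grows faster than linearly.

```python
import numpy as np

# Constant-velocity Kalman filter, 1-D state [position, velocity], dt = 1.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # state transition matrix
Q = np.diag([0.01, 0.01])           # process noise (illustrative values)
P = np.eye(2)                       # initial state covariance

pos_var = []
for _ in range(10):                 # 10 predict-only steps: occlusion, no updates
    P = F @ P @ F.T + Q             # covariance prediction without correction
    pos_var.append(P[0, 0])

# Per-step increase of the position variance keeps increasing itself,
# i.e. the uncertainty accumulates quadratically, as the text claims.
growth = [b - a for a, b in zip(pos_var, pos_var[1:])]
```

Feeding even one observation back through the KF update step would shrink this covariance again, which is why the patent's imaginary-track strategy matters.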
In summary, achieving stable infrared ship target tracking in complex scenes is a technical problem that currently needs to be solved.
Disclosure of Invention
The invention aims to overcome the defects of the background art and to provide an unmanned-ship-oriented infrared ship target tracking method and device that offer good robustness and high precision and are suitable for short-term occlusion and nonlinear motion scenes.
The technical scheme adopted to solve this technical problem is an unmanned-ship-oriented infrared ship target tracking method comprising the following steps:
S1: in an actual water surface environment, recording infrared video data of ship targets, constructing an infrared ship target video data set, and preprocessing the video data;
S2: dividing the infrared ship target video data set into a training set and a test set;
S3: training the ship target detector on the training set to obtain a trained ship target detector;
S4: building an appearance feature extraction network model and training it on the training set to obtain a trained appearance feature extraction network model;
S5: constructing a ship target tracker and training it on the training set to obtain a trained ship target tracker;
S6: detecting ship targets in the test set with the trained ship target detector;
S7: extracting appearance feature vectors of the detected ship targets with the trained appearance feature extraction network model;
S8: testing the test set with the trained ship target tracker, using the detected ship targets and their appearance feature vectors;
S9: deploying the ship target detector, the appearance feature extraction network model, and the ship target tracker to the processing platform of the unmanned ship for practical application, and tracking ship targets in the infrared video data acquired in real time.
In step S1, preprocessing refers to extracting frames from the video data, labeling each ship target in each frame image with a rectangular box, and labeling each ship target's identity serial number.
Further, in step S4, the appearance feature extraction network model comprises a feature extraction backbone network and a refined feature extraction network. The ship target image detected by the ship target detector passes through the backbone network for feature extraction, and the backbone's output serves as the input feature map of the refined feature extraction network. The refined network has three branches: a first, a second, and a third branch. In the first branch, a 1×1 convolution is applied to the input feature map to obtain the first input feature map; global max pooling then produces a 2048-dimensional feature vector, which a 1×1 convolution compresses to 256 dimensions. In the second branch, a 3×3 convolution on the input feature map produces the second input feature map; in the third branch, a 5×5 convolution produces the third input feature map. The second and third input feature maps pass through a cross fusion module, yielding the first and second fused feature maps. In the second branch, global max pooling of the first fused feature map produces a 2048-dimensional vector, compressed to 256 dimensions by a 1×1 convolution; in addition, the fused feature map is divided evenly into 2 parts in the vertical direction, each part is globally max pooled and compressed by a 1×1 convolution into a 256-dimensional vector, so the second branch yields 3 256-dimensional feature vectors in total. In the third branch, global max pooling of the second fused feature map produces a 2048-dimensional vector, compressed to 256 dimensions by a 1×1 convolution; in addition, the fused feature map is divided evenly into 3 parts in the vertical direction, each part is globally max pooled and compressed into a 256-dimensional vector, so the third branch yields 4 256-dimensional feature vectors in total. Across the three branches the model obtains 8 256-dimensional feature vectors, which are concatenated into one 2048-dimensional vector; a final 1×1 convolution over this concatenated vector produces the 2048-dimensional appearance feature vector of the ship target.
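The branch-and-pool arithmetic above (8 vectors of 256 dimensions concatenated into one 2048-dimensional appearance vector) can be checked with a numpy sketch. The 1×1 convolutions on pooled vectors reduce to linear projections; all weights and spatial sizes here are illustrative, not the trained model's:

```python
import numpy as np
rng = np.random.default_rng(0)

C, H, W = 2048, 12, 4               # illustrative channel/spatial sizes
fmap = rng.standard_normal((C, H, W))

def gmp(x):
    # Global max pooling over the spatial dimensions -> C-dim vector.
    return x.reshape(x.shape[0], -1).max(axis=1)

def conv1x1(v, out_dim=256):
    # A 1x1 convolution applied to a pooled vector is a linear projection.
    Wp = rng.standard_normal((out_dim, v.shape[0])) / np.sqrt(v.shape[0])
    return Wp @ v

# Branch 1: one global 256-d vector.
b1 = [conv1x1(gmp(fmap))]
# Branch 2: one global vector + 2 vertical halves -> 3 vectors total.
halves = np.array_split(fmap, 2, axis=1)
b2 = [conv1x1(gmp(fmap))] + [conv1x1(gmp(p)) for p in halves]
# Branch 3: one global vector + 3 vertical thirds -> 4 vectors total.
thirds = np.array_split(fmap, 3, axis=1)
b3 = [conv1x1(gmp(fmap))] + [conv1x1(gmp(p)) for p in thirds]

# 8 x 256 concatenated -> the 2048-d appearance feature vector.
feature = np.concatenate(b1 + b2 + b3)
```

The vertical stripe pooling gives the descriptor part-based sensitivity (bow, midship, stern regions of a ship image), while the global vectors keep whole-target context.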
Further, the specific structure of the cross fusion module is as follows. In the second branch, the second input feature map passes through three 1×1 convolutions to generate the corresponding Q1, K1, and V1 tensors, and multiplying Q1 by K1 yields the attention matrix A1. In the third branch, the third input feature map passes through three 1×1 convolutions to generate the corresponding Q2, K2, and V2 tensors, and multiplying Q2 by K2 yields the attention matrix A2. The attention matrix A1 of the second branch is multiplied by V2 of the third branch, and the result is added to the second input feature map to obtain the first fused feature map; the attention matrix A2 of the third branch is multiplied by V1 of the second branch, and the result is added to the third input feature map to obtain the second fused feature map.
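A shape-level numpy sketch of the cross fusion module follows. The patent only states that Q and K are multiplied to form the attention matrix; the softmax normalization below is an added assumption, and the flattened-spatial layout and all dimensions are illustrative:

```python
import numpy as np
rng = np.random.default_rng(1)

C, N = 64, 48                       # channels and flattened spatial positions
x2 = rng.standard_normal((C, N))    # second input feature map (3x3 branch)
x3 = rng.standard_normal((C, N))    # third input feature map (5x5 branch)

def proj(x):
    # 1x1 convolution == per-position linear projection over channels.
    Wp = rng.standard_normal((C, C)) / np.sqrt(C)
    return Wp @ x

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

q1, k1, v1 = proj(x2), proj(x2), proj(x2)   # second-branch Q1, K1, V1
q2, k2, v2 = proj(x3), proj(x3), proj(x3)   # third-branch Q2, K2, V2

a1 = softmax(q1.T @ k1)             # attention matrix A1 (second branch)
a2 = softmax(q2.T @ k2)             # attention matrix A2 (third branch)

# Cross fusion: each branch's attention re-weights the OTHER branch's values,
# with a residual connection back to that branch's own input feature map.
fused1 = v2 @ a1.T + x2             # first fused feature map
fused2 = v1 @ a2.T + x3             # second fused feature map
```

The cross-wiring (A1 with V2, A2 with V1) is what lets the 3×3 and 5×5 receptive-field branches exchange information before pooling.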
Further, in step S9, the infrared ship target tracker includes a cost matrix calculation module, a first cascade matching module, a second cascade matching module, a conventional kalman filter parameter update module, and an imaginary track kalman filter parameter update module; the cost matrix computing module is connected with the first cascade matching module, the first cascade matching module is connected with the second cascade matching module and the conventional Kalman filter parameter updating module, the second cascade matching module is connected with the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module, and the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module are respectively connected with the cost matrix computing module;
The cost matrix calculation module is used for calculating a total cost matrix, wherein the total cost matrix comprises an appearance similarity matrix, a position measurement matrix and a motion direction measurement matrix;
the first cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the total cost matrix;
the second cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the position measurement matrix;
the conventional Kalman filter parameter updating module is used for updating the Kalman filter parameters of the detection frame according to the continuous track;
the imaginary track Kalman filter parameter updating module is used for generating an imaginary track and updating Kalman filter parameters of the detection frame according to the imaginary track.
Further, in step S9, the ship target tracker tracks ship targets according to the following principle:
the detection frames and the corresponding prediction frames are used as the input of a ship target tracker, the prediction frames are divided into a prediction frame of a confirmed state track and a prediction frame of an unconfirmed state track, wherein the detection frames and the prediction frame of the confirmed state track are used as the input of a cost matrix calculation module, and the prediction frame of the unconfirmed state track is used as the input of a second bight matching module; the detection frame is a detection result obtained by detecting a ship target in the current frame image through a ship target detector, and is marked by a rectangular frame; the corresponding prediction frame refers to a prediction result of a detection frame in a previous frame image in a current frame image, and is marked by a rectangular frame;
The cost matrix calculation module calculates the total cost matrix. According to the total cost matrix, the first cascade matching module performs the first cascade matching between the detection frames and the prediction frames of confirmed-state tracks. For detection frames matched with a prediction frame in the first cascade matching, the conventional Kalman filter parameter update module updates the Kalman filter parameters to obtain the prediction frame for the next frame. The prediction frames of unconfirmed-state tracks, together with the detection frames and prediction frames left unmatched by the first cascade matching, serve as the input of the second cascade matching module.
The second cascade matching module recalculates the position measurement matrix and, according to it, performs the second cascade matching over the prediction frames of unconfirmed-state tracks and the detection frames and prediction frames left unmatched by the first cascade matching. For detection frames matched with a prediction frame in the second cascade matching, an imaginary track is generated, and the Kalman filter parameters of those detection frames are updated according to the imaginary track to obtain the prediction frame for the next frame. Detection frames left unmatched by the second cascade matching have their Kalman filter parameters updated by the conventional Kalman filter parameter update module to obtain the prediction frame for the next frame, and prediction frames left unmatched by the second cascade matching are deleted.
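The two-stage matching flow can be sketched as follows. A greedy assignment stands in for whatever assignment solver the tracker actually uses (commonly the Hungarian algorithm), and all costs and thresholds below are illustrative:

```python
# Sketch of two-stage cascade matching over cost matrices.
def greedy_match(cost, thresh):
    # cost: dict keyed by (detection_idx, prediction_idx); lower is better.
    # Returns (matches, unmatched_detections, unmatched_predictions).
    pairs = sorted(cost, key=cost.get)
    used_d, used_p, matches = set(), set(), []
    for d, p in pairs:
        if d not in used_d and p not in used_p and cost[(d, p)] < thresh:
            matches.append((d, p))
            used_d.add(d); used_p.add(p)
    dets = {d for d, _ in cost}
    preds = {p for _, p in cost}
    return matches, dets - used_d, preds - used_p

# Stage 1: detections vs. confirmed-track predictions, using the TOTAL cost
# (appearance + position + direction, collapsed here into a single number).
total_cost = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.95}
m1, det_left, pred_left = greedy_match(total_cost, thresh=0.7)

# Stage 2: leftovers (plus unconfirmed-track predictions) re-matched on a
# position-only cost; re-linked pairs would spawn imaginary tracks, leftover
# detections start new tracks, leftover predictions are deleted.
pos_cost = {(1, 1): 0.4}
m2, _, _ = greedy_match(pos_cost, thresh=0.5)
```

Relaxing stage 2 to a position-only cost reflects the text: during and just after occlusion, appearance features are unreliable, but position overlap can still recover the link.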
Further, in the first and second cascade matching, detection frames are matched across 3 consecutive frames: if the detection frames in 3 consecutive frame images form a track, that track is a confirmed track and its detection frames are detection frames of a confirmed track; otherwise the track is an unconfirmed track and its detection frames are detection frames of an unconfirmed track. A prediction frame obtained by Kalman filter parameter update of a confirmed track's detection frame is a prediction frame of a confirmed track, and a prediction frame obtained from an unconfirmed track's detection frame is a prediction frame of an unconfirmed track. An imaginary track is formed by filling the broken, discontinuous portion of a confirmed track; a continuous track is a confirmed track that runs continuously, without breaks.
In the appearance similarity matrix, each element is the inner product of the transposed appearance feature vector of a detected ship target with the appearance feature vector of the ship target in the last frame image associated with the k-th track.
In the position measurement matrix, each element combines: the ratio of the intersection area to the union area of the detection frame and the prediction frame; the Euclidean distance between the center points of the detection frame and the prediction frame, normalized by the diagonal length of the smallest enclosing frame; and an aspect-ratio similarity term scaled by a weight function.
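The position measurement equations are rendered as images in the source and are lost, but the surviving variable definitions describe a CIoU-style metric. The sketch below follows that description (IoU, center distance normalized by the enclosing-box diagonal, aspect-ratio similarity scaled by a weight function); the patent's exact weighting may differ:

```python
import math

def position_cost(det, pred):
    # det/pred = (x1, y1, x2, y2). Lower cost = better positional match.
    ix1, iy1 = max(det[0], pred[0]), max(det[1], pred[1])
    ix2, iy2 = min(det[2], pred[2]), min(det[3], pred[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(det) + area(pred) - inter
    iou = inter / union                             # intersection over union

    cx = lambda b: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    (dx, dy), (px, py) = cx(det), cx(pred)
    ex1, ey1 = min(det[0], pred[0]), min(det[1], pred[1])
    ex2, ey2 = max(det[2], pred[2]), max(det[3], pred[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2     # enclosing-box diagonal^2
    center = ((dx - px) ** 2 + (dy - py) ** 2) / diag2

    ar = lambda b: math.atan((b[2] - b[0]) / (b[3] - b[1]))
    v = (4 / math.pi ** 2) * (ar(det) - ar(pred)) ** 2  # aspect-ratio term
    alpha = v / (1 - iou + v + 1e-9)                # trade-off weight function
    return 1.0 - (iou - center - alpha * v)

c_same = position_cost((0, 0, 10, 10), (0, 0, 10, 10))   # identical boxes
c_far  = position_cost((0, 0, 10, 10), (20, 0, 30, 10))  # disjoint boxes
```

Identical boxes cost 0, while disjoint boxes are penalized both for zero overlap and for center offset.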
In the motion direction measurement matrix, each element is the radian value of the angle between two directions: the direction of the line connecting the ship target's center positions in two earlier frame images, and the direction of the line connecting its center positions in two later frame images, where the indices denote frame numbers.
The total cost matrix is computed from the appearance similarity matrix, the position measurement matrix, and the motion direction measurement matrix.
For the imaginary track: given the last observed track record and its corresponding time, and the track record re-linked by the second cascade matching and its corresponding time, the interval between the two times corresponds to the period during which the ship target was lost; the imaginary track fills this interval.
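The imaginary-track record keeping above (last observed record and time, re-linked record and time) suggests filling the occlusion gap by interpolation. The sketch below is one plausible reading, with illustrative values; in the tracker, the resulting virtual observations would then be replayed through the Kalman filter update step in temporal order, so the filter's covariance reflects the gap rather than pure prediction drift.

```python
import numpy as np

def virtual_observations(z1, t1, z2, t2):
    # z1: last observed box center (x, y) at time t1;
    # z2: re-linked box center at time t2 (second cascade matching).
    # Linearly interpolate one virtual observation per missing time step.
    steps = t2 - t1
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    return [tuple(z1 + (z2 - z1) * k / steps) for k in range(1, steps)]

# Target lost at t=10 at (0, 0), re-linked at t=14 at (8, 4):
gap = virtual_observations((0.0, 0.0), 10, (8.0, 4.0), 14)
```

Replaying these interpolated points through the KF update is what distinguishes the imaginary-track parameter update module from the conventional one, which only sees real detections.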
An infrared ship target tracking device facing an unmanned ship comprises a ship target detector, an appearance characteristic extraction network structure model and a ship target tracker, wherein the ship target detector is respectively connected with the appearance characteristic extraction network structure model and the ship target tracker, and the appearance characteristic extraction network structure model is connected with the ship target tracker; the ship target detector is used for detecting ship targets; the appearance feature extraction network structure model is used for extracting appearance feature vectors of ship targets; the ship target tracker is used for tracking ship targets.
Compared with the prior art, the invention has the following advantages:
The invention designs a more robust appearance feature extraction network model for re-identification, better distinguishing individual targets and reducing identity switches during tracking. The designed ship target tracker introduces a second cascade matching and an imaginary-track strategy: occlusions lasting a reasonable time interval can be handled, and parameters are optimized along the imaginary track, alleviating the error accumulation that Kalman filtering suffers when observations are missing and yielding better estimates of the target position, finally achieving robust, stable ship target tracking. Compared with existing infrared ship target tracking methods, the proposed method offers good robustness and high precision and is suitable for short-term occlusion and nonlinear motion scenes.
Drawings
Fig. 1 is a schematic structural diagram of an appearance feature extraction network structure model in an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a cross fusion module according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a ship target tracker in an embodiment of the invention.
FIG. 4 is a graph comparing tracking performance indicators of the method of the present invention with other methods.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and specific examples.
The unmanned-ship-oriented infrared ship target tracking method of this embodiment comprises the following steps:
S1: in an actual water surface environment, recording infrared video data of ship targets, constructing an infrared ship target video data set, and preprocessing the video data; preprocessing refers to extracting frames from the video data, labeling each ship target in each frame image with a rectangular box, and labeling each ship target's identity serial number (ID);
S2: dividing the infrared ship target video data set into a training set and a test set;
S3: training the ship target detector on the training set to obtain a trained ship target detector;
S4: building an appearance feature extraction network model and training it on the training set to obtain a trained appearance feature extraction network model; the appearance feature extraction network model extracts appearance feature vectors of ship targets;
S5: constructing a ship target tracker and training it on the training set to obtain a trained ship target tracker;
S6: detecting ship targets in the test set with the trained ship target detector;
S7: extracting appearance feature vectors of the detected ship targets with the trained appearance feature extraction network model;
S8: testing the test set with the trained ship target tracker, using the detected ship targets and their appearance feature vectors, and calculating tracking performance indicators;
S9: deploying the ship target detector, the appearance feature extraction network model, and the ship target tracker to the processing platform of the unmanned ship for practical application, and tracking ship targets in the infrared video data acquired in real time.
In this embodiment, an unmanned-ship-oriented infrared ship target tracking system is constructed. The system comprises an unmanned ship and a shore-based platform. A processing platform is arranged on the unmanned ship, and the infrared ship target tracking device is deployed on it. The infrared ship target tracking device comprises a ship target detector, an appearance feature extraction network model, and a ship target tracker; the ship target detector is connected to the appearance feature extraction network model and to the ship target tracker, and the appearance feature extraction network model is connected to the ship target tracker. The processing platform is an NVIDIA Jetson Orin NX. The unmanned ship carries an infrared measurement pod, an electronic compass, a cabin radio station, a cabin antenna, and a GPS positioning device. The infrared measurement pod is powered at 24 V and is connected to the processing platform through an SDI acquisition card, transmitting infrared images to the processing platform in real time; the processing platform tracks ship targets in the infrared images acquired in real time. The electronic compass acquires attitude information such as the pitch angle and azimuth angle of the unmanned ship, and the GPS positioning device acquires the unmanned ship's longitude and latitude coordinates. The shore-based platform comprises a shore base station and a display platform; the shore base station comprises a base station antenna and a base station radio station, the display platform is connected to the base station radio station, the base station radio station is connected to the base station antenna, and the base station antenna communicates wirelessly with the cabin antenna.
The processing platform is connected with the cabin radio station, the unmanned ship communicates with the shore-based platform through the cabin radio station and the cabin antenna, and the calculation result of the processing platform on the unmanned ship is transmitted to the display platform.
The experimental test is carried out in a park, on a lake surface measuring 100 m × 200 m. The infrared camera of the infrared measurement pod operates in the 8–12 μm band, and the acquired video has a frame rate of 25 frames per second and a resolution of 640 × 480 pixels.
The recorded ships comprise a military ship and a civil ship; the civil ship, a pleasure boat, is used as the target ship. 20 segments of original video are collected, with a total duration of about 405 seconds and 10133 frames of images in total. The training set and the test set are divided at a ratio of 3:2: 12 videos are used for training and 8 videos for testing. Frame extraction is performed on the videos, each frame image is stored, the ship target in each image is labeled with a rectangular frame, and the identity serial number (ID) of the ship target is recorded.
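As a concrete illustration of the bookkeeping above (20 videos split 3:2 into 12 training and 8 test videos, with per-frame rectangular-box and ID annotation), the following sketch may help; the video names and annotation record layout are illustrative assumptions, not the format used in the embodiment.

```python
# Sketch of the 3:2 train/test split described above. Video IDs and the
# annotation record layout are illustrative assumptions.
def split_videos(video_ids, train_ratio=(3, 2)):
    """Split a list of video IDs 3:2 into train and test subsets."""
    a, b = train_ratio
    n_train = len(video_ids) * a // (a + b)
    return video_ids[:n_train], video_ids[n_train:]

videos = [f"ir_clip_{i:02d}" for i in range(20)]   # 20 recorded segments
train, test = split_videos(videos)
assert (len(train), len(test)) == (12, 8)

# One annotation record per ship target per frame (hypothetical layout):
# (frame index, target identity serial number, x, y, w, h)
annotation = (0, 1, 120, 85, 40, 22)
```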
In this embodiment, the ship target detector adopts an existing detector based on the YOLOv5 detection algorithm.
Referring to Fig. 1, in this embodiment, the constructed appearance feature extraction network structure model comprises a feature extraction backbone network and a refined feature extraction network. The ship target image detected by the ship target detector undergoes feature extraction through the feature extraction backbone network, and the output of the backbone network serves as the input feature map of the refined feature extraction network. The refined feature extraction network comprises three branches, namely a first branch, a second branch and a third branch. In the first branch, a 1×1 convolution operation is performed on the input feature map to obtain a first input feature map; global maximum pooling is then applied to the first input feature map to generate a 2048-dimensional feature vector, which is compressed into a 256-dimensional feature vector through a 1×1 convolution. In the second branch, a 3×3 convolution operation is performed on the input feature map to obtain a second input feature map; in the third branch, a 5×5 convolution operation is performed on the input feature map to obtain a third input feature map. The second and third input feature maps pass through a cross fusion module to obtain a first fusion feature map and a second fusion feature map. In the second branch, global maximum pooling is applied to the first fusion feature map to generate a 2048-dimensional feature vector, which is compressed into a 256-dimensional feature vector through a 1×1 convolution; in addition, the feature map output by the fusion module is uniformly divided into 2 parts in the vertical direction, global maximum pooling is applied to each part, and each result is compressed into a 256-dimensional feature vector through a 1×1 convolution, so the second branch yields 3 256-dimensional feature vectors in total.
In the third branch, global maximum pooling is applied to the second fusion feature map to generate a 2048-dimensional feature vector, which is compressed into a 256-dimensional feature vector through a 1×1 convolution; in addition, the feature map output by the fusion module is uniformly divided into 3 parts in the vertical direction, global maximum pooling is applied to each part, and each result is compressed into a 256-dimensional feature vector through a 1×1 convolution, so the third branch yields 4 256-dimensional feature vectors in total. The appearance feature extraction network structure model thus obtains 8 256-dimensional feature vectors through the three branches; these are spliced into 1 2048-dimensional feature vector, the spliced 2048-dimensional feature vector is passed through a 1×1 convolution, and the convolved 2048-dimensional feature vector serves as the appearance feature vector F of the ship target.
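The dimension bookkeeping of the three branches (1 + 3 + 4 = 8 vectors of 256 dimensions, spliced into one 2048-dimensional appearance vector F) can be checked with a small sketch; the max-pooling and `[:256]` truncation below are stand-ins for the real pooling and 1×1 convolutions, which this sketch does not implement.

```python
import numpy as np

# Dimension bookkeeping for the refined feature extraction head: the three
# branches yield 1 + 3 + 4 = 8 vectors of 256 dimensions, spliced into one
# 2048-dimensional appearance vector F. The [:256] truncation is a stand-in
# for the 1x1 convolution compression, not the real operation.
rng = np.random.default_rng(0)

def branch_vectors(feat, n_parts):
    """One global max-pooled vector plus n_parts vertical strips."""
    vecs = [feat.max(axis=(1, 2))[:256]]                  # global vector
    for strip in np.array_split(feat, n_parts, axis=1):   # vertical split
        vecs.append(strip.max(axis=(1, 2))[:256])
    return vecs

feat = rng.standard_normal((2048, 12, 4))   # C x H x W output of the backbone
b1 = [feat.max(axis=(1, 2))[:256]]          # first branch: 1 vector
b2 = branch_vectors(feat, 2)                # second branch: 1 + 2 = 3 vectors
b3 = branch_vectors(feat, 3)                # third branch: 1 + 3 = 4 vectors
F = np.concatenate(b1 + b2 + b3)            # 8 x 256 = 2048-dim vector F
assert F.shape == (2048,) and len(b2) == 3 and len(b3) == 4
```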
Referring to Fig. 2, in this embodiment, a cross fusion structure is adopted between the second branch and the third branch of the appearance feature extraction network structure model. Specifically, in the second branch, the second input feature map generates corresponding Q1, K1 and V1 vectors through three 1×1 convolutions, and multiplying Q1 by K1 yields an attention matrix A1. In the third branch, the third input feature map generates corresponding Q2, K2 and V2 vectors through three 1×1 convolutions, and multiplying Q2 by K2 yields an attention matrix A2. The attention matrix A1 of the second branch is multiplied by the V2 vector of the third branch and then added to the second input feature map to obtain the first fusion feature map; the attention matrix A2 of the third branch is multiplied by the V1 vector of the second branch and then added to the third input feature map to obtain the second fusion feature map.
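A minimal numerical sketch of the cross fusion module follows, with the feature maps flattened to token matrices. The softmax normalisation is my assumption, since the text only specifies vector multiplication, and the three 1×1 convolutions are replaced by plain matrix multiplications.

```python
import numpy as np

# Minimal sketch of the cross fusion module: attention weights computed in
# one branch are applied to the value vectors of the OTHER branch, then
# added back to the first branch's input feature map.
rng = np.random.default_rng(1)
N, d = 16, 32                        # tokens (H*W) and channel dim (illustrative)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def qkv(x, w):                       # stand-in for three 1x1 convolutions
    return x @ w[0], x @ w[1], x @ w[2]

x2, x3 = rng.standard_normal((N, d)), rng.standard_normal((N, d))
w2 = rng.standard_normal((3, d, d)) * 0.1
w3 = rng.standard_normal((3, d, d)) * 0.1

q1, k1, v1 = qkv(x2, w2)             # second branch
q2, k2, v2 = qkv(x3, w3)             # third branch
a1 = softmax(q1 @ k1.T)              # attention matrix A1
a2 = softmax(q2 @ k2.T)              # attention matrix A2

fused1 = a1 @ v2 + x2                # A1 applied to the third branch's V2
fused2 = a2 @ v1 + x3                # A2 applied to the second branch's V1
assert fused1.shape == fused2.shape == (N, d)
```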
Referring to Fig. 3, in this embodiment, the constructed infrared ship target tracker comprises a cost matrix calculation module, a first cascade matching module, a second cascade matching module, a conventional Kalman filter parameter updating module, and an imaginary track Kalman filter parameter updating module.
The cost matrix computing module is connected with the first cascade matching module, the first cascade matching module is connected with the second cascade matching module and the conventional Kalman filter parameter updating module, the second cascade matching module is connected with the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module, and the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module are respectively connected with the cost matrix computing module.
The cost matrix calculation module is used for calculating a total cost matrix, wherein the total cost matrix comprises an appearance similarity matrix, a position measurement matrix and a motion direction measurement matrix.
And the first cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the total cost matrix.
And the second cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the position measurement matrix.
The conventional Kalman filter parameter updating module is used for updating the Kalman filter parameters of the detection frame according to the continuous track.
The imaginary track Kalman filter parameter updating module is used for generating an imaginary track and updating the Kalman filter parameters of the detection frame according to the imaginary track.
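The Kalman filter parameter updating that both modules perform follows the standard predict/update recursion. The embodiment does not state its state vector, so the sketch below uses an assumed two-state (position, velocity) constant-velocity model on a single box coordinate, with the 0.04 s frame interval from the embodiment.

```python
import numpy as np

# Constant-velocity Kalman predict/update on one box coordinate. The
# 2-state (position, velocity) model is an assumption; the embodiment's
# actual state vector is not specified in this text.
dt = 0.04                                   # one frame = 0.04 s (25 fps)
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition
H = np.array([[1.0, 0.0]])                  # only position is measured
Q = np.eye(2) * 1e-4                        # process noise
R = np.array([[1e-2]])                      # measurement noise

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0]), np.eye(2)
for frame in range(1, 26):                  # target moving at 2 units/s
    x, P = predict(x, P)
    x, P = update(x, P, np.array([2.0 * frame * dt]))
assert abs(x[0] - 2.0) < 0.05               # position estimate converged
assert abs(x[1] - 2.0) < 0.1                # velocity estimate converged
```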
The ship target tracker tracks the ship target according to the following principle:
the detection frames and the corresponding prediction frames serve as the input of the ship target tracker. A detection frame is the detection result obtained by the ship target detector for a ship target in the current frame image, marked by a rectangular frame; the corresponding prediction frame is the prediction, in the current frame image, of a detection frame from the previous frame image, also marked by a rectangular frame. Prediction frames are divided into prediction frames of confirmed-state tracks and prediction frames of unconfirmed-state tracks: the detection frames and the prediction frames of confirmed-state tracks serve as input to the cost matrix calculation module, while the prediction frames of unconfirmed-state tracks serve as input to the second cascade matching module.
The cost matrix calculation module calculates the total cost matrix, and the first cascade matching module performs the first cascade matching between the detection frames and the prediction frames of confirmed-state tracks according to the total cost matrix. The first cascade matching adopts the Hungarian algorithm, taking the total cost matrix as its input to obtain a maximum matching between detection frames and prediction frames. In the first cascade matching, each detection frame matched to a prediction frame has its Kalman filter parameters updated by the conventional Kalman filter parameter updating module to obtain the prediction frame for the next frame. The prediction frames of unconfirmed-state tracks, together with the detection frames and prediction frames left unmatched by the first cascade matching, serve as the input of the second cascade matching module.
The second cascade matching module performs the second cascade matching on the prediction frames of unconfirmed-state tracks and the detection frames and prediction frames left unmatched by the first cascade matching: the position measurement matrix is recalculated and, again using the Hungarian algorithm with the position measurement matrix as input, a maximum matching between detection frames and prediction frames is obtained. For each detection frame matched to a prediction frame in the second cascade matching, the imaginary track Kalman filter parameter updating module generates an imaginary track and updates the Kalman filter parameters of the detection frame according to the imaginary track to obtain the prediction frame for the next frame. An imaginary track is a track formed by filling in the broken, discontinuous part of a confirmed track; a continuous track is a confirmed track that continues without breaking. Detection frames left unmatched by the second cascade matching have their Kalman filter parameters updated by the conventional Kalman filter parameter updating module to obtain the prediction frame for the next frame, and prediction frames left unmatched by the second cascade matching are deleted.
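Both cascade matching stages solve an assignment problem over a cost matrix; the patent uses the Hungarian algorithm, for which a self-contained greedy approximation is sketched below (the gating threshold is an illustrative assumption).

```python
# Greedy stand-in for the Hungarian assignment used by both cascade
# matching stages: repeatedly take the cheapest remaining
# (detection, prediction) pair whose cost is within the gate.
def greedy_match(cost, gate=0.7):
    pairs = sorted((cost[r][c], r, c)
                   for r in range(len(cost)) for c in range(len(cost[0])))
    used_r, used_c, matches = set(), set(), []
    for c_val, r, c in pairs:
        if c_val > gate:                 # remaining pairs are all too costly
            break
        if r not in used_r and c not in used_c:
            matches.append((r, c))
            used_r.add(r)
            used_c.add(c)
    unmatched_det = set(range(len(cost))) - used_r
    unmatched_pred = set(range(len(cost[0]))) - used_c
    return matches, unmatched_det, unmatched_pred

# 3 detections vs 3 predictions; detection 2 / prediction 2 stay unmatched.
cost = [[0.10, 0.90, 0.80],
        [0.80, 0.20, 0.90],
        [0.90, 0.90, 0.95]]
m, ud, up = greedy_match(cost)
assert m == [(0, 0), (1, 1)] and ud == {2} and up == {2}
```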
In the first cascade matching and the second cascade matching, if detection frames are successfully matched over 3 consecutive frames, the track formed by those detection frames in the 3 frame images is a confirmed track and the detection frames are detection frames of a confirmed track; otherwise the track is an unconfirmed track and the detection frames are detection frames of an unconfirmed track. The prediction frame obtained by updating the Kalman filter parameters of a detection frame of a confirmed track is a prediction frame of a confirmed-state track, and the prediction frame obtained by updating the Kalman filter parameters of a detection frame of an unconfirmed track is a prediction frame of an unconfirmed-state track.
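The confirmation rule above (a track becomes confirmed after its detection frames are matched in 3 consecutive frames) can be sketched as a small state holder; the reset-on-miss behaviour is an assumption consistent with the definition of a continuous track.

```python
# Sketch of the confirmation rule: a track becomes "confirmed" once its
# detections have been matched in 3 consecutive frames. Resetting the
# counter on a missed frame is an assumption, not stated in the text.
class Track:
    CONFIRM_HITS = 3

    def __init__(self):
        self.consecutive_hits = 0
        self.confirmed = False

    def mark_matched(self):
        self.consecutive_hits += 1
        if self.consecutive_hits >= self.CONFIRM_HITS:
            self.confirmed = True

    def mark_missed(self):
        self.consecutive_hits = 0    # a break ends the consecutive run

t = Track()
t.mark_matched()
t.mark_matched()
assert not t.confirmed               # only 2 consecutive matches so far
t.mark_matched()
assert t.confirmed                   # matched in 3 consecutive frames
```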
The appearance similarity term is computed from the transpose of the appearance feature vector of the detected ship target and the appearance feature vector of the ship target in the last frame image associated with the k-th track.
The position measurement term is computed from the intersection area of the detection frame and the prediction frame, the union area of the detection frame and the prediction frame, the Euclidean distance between the center points of the detection frame and the prediction frame, the diagonal length of the smallest enclosing frame, a weight function, and the aspect ratio similarity.
The motion direction term is computed from the direction formed by connecting the ship target center positions in two successive frame images of the track, the direction formed by connecting the ship target center positions in the following pair of frame images, and the radian value of the angle between these two directions, frames being indexed by frame number.
The total cost matrix is formed from the appearance similarity matrix, the position measurement matrix and the motion direction measurement matrix.
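The three terms of the total cost matrix can be sketched with standard choices consistent with the quantities listed above: a cosine distance over appearance feature vectors, an IoU-based position term (the embodiment additionally uses the center-point distance, enclosing-box diagonal, weight function and aspect ratio, i.e. a CIoU-style term, omitted here for brevity), and the angle between successive motion directions. The embodiment's exact formulas and weights are not reproduced here, so this is an assumed sketch.

```python
import numpy as np

def appearance_cost(f_det, f_trk):
    """Cosine distance between L2-normalised appearance vectors."""
    f_det = f_det / np.linalg.norm(f_det)
    f_trk = f_trk / np.linalg.norm(f_trk)
    return 1.0 - float(f_det @ f_trk)

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def direction_cost(p_prev, p_curr, p_next):
    """Angle (radians) between successive center-point displacements."""
    u = np.asarray(p_curr, float) - np.asarray(p_prev, float)
    v = np.asarray(p_next, float) - np.asarray(p_curr, float)
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

assert appearance_cost(np.array([1.0, 0.0]), np.array([1.0, 0.0])) == 0.0
assert abs(iou((0, 0, 2, 2), (1, 0, 3, 2)) - 1 / 3) < 1e-9
assert abs(direction_cost((0, 0), (1, 0), (2, 0))) < 1e-9  # straight line
```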
In the present embodiment, the virtual track is generated as follows: from the last observed track record and its corresponding time, together with the track record re-linked by the second cascade matching and its corresponding time, track records are generated for the times at which the ship target was lost, these times lying between the two recorded times. Time is expressed by the corresponding frame number, and the duration of one frame is 0.04 seconds.
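The listed quantities (last observed record and time, re-linked record and time, lost times in between) suggest filling the gap by interpolation; linear interpolation is assumed in the sketch below, since the embodiment's exact formula is not reproduced in this text.

```python
# Hypothetical linear interpolation filling the broken part of a confirmed
# track between the last observed box (at frame t1) and the re-linked box
# (at frame t2). Linearity is an assumption; times are frame counts, one
# frame lasting 0.04 s at 25 fps.
def virtual_track(x_t1, t1, x_t2, t2):
    """Return {t: box} for the lost frames t1 < t < t2."""
    out = {}
    for t in range(t1 + 1, t2):
        a = (t - t1) / (t2 - t1)               # interpolation weight
        out[t] = tuple((1 - a) * p + a * q for p, q in zip(x_t1, x_t2))
    return out

gap = virtual_track((0.0, 0.0, 10.0, 10.0), 10, (8.0, 0.0, 18.0, 10.0), 14)
assert set(gap) == {11, 12, 13}
assert gap[12] == (4.0, 0.0, 14.0, 10.0)       # midpoint of the 4-frame gap
```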
In this example, the tracking effect and index performance of the method of the present invention on the test set were compared with the ByteTrack, FairMOT, CStrack, GTR and TransTrack methods. The ByteTrack method matches both high-confidence and low-confidence detection frames against tracking tracks, which effectively reduces missed targets caused by occlusion and motion blur and improves tracking accuracy; however, because more detection frames participate in the matching, its computation is heavier and its efficiency lower. The FairMOT method adopts a one-stage multi-target tracker in which the detection model and the target appearance feature model run simultaneously, improving running speed, but its appearance feature extraction is insufficient. The CStrack method reconsiders the relation between detection and appearance feature extraction in the tracking task and observes that the two compete when sharing network parameters. The GTR method is a global multi-target tracking architecture based on the Transformer neural network model: the network takes a short sequence of frames as input and generates global tracks for all objects, its core component being a global tracking Transformer that operates on the objects in all frames of the sequence. The TransTrack method is based on a joint detection-and-tracking framework: it first tracks the existing targets of the current frame and detects new targets, producing output by simply matching detection frames with track frames. From the comparison results, ID switches appear in the results of the GTR, CStrack, ByteTrack, FairMOT and TransTrack methods, whereas the method of the present invention tracks the ship targets in the images continuously and stably with IDs kept consistent, showing that it copes well with complex scenes.
In Fig. 4, the performance indexes are MOTP (multiple object tracking precision, measuring the difference between the tracking result and the true target position), MOTA (multiple object tracking accuracy, a combined value of the false positive rate, false negative rate and mismatch rate that measures how accurately the algorithm tracks targets), DetA (detection accuracy, the ratio of detected targets exceeding a set threshold to the actual total number of targets), HOTA (higher order tracking accuracy, the geometric mean of detection accuracy and association accuracy), AssA (association accuracy, the average over all association Jaccard indexes of matched detections exceeding the localization threshold), AssRe (association recall, the average over all matched detections of results exceeding the localization threshold), IDF1 (the proportion of detected and tracked targets that are assigned the correct identity number), Frag (fragmentation, a multi-target tracking index giving the total number of track fragmentations, i.e. the number of track interruptions during tracking) and IDs (identity switches, the number of times the identity number (ID) assigned to a target changes). Beside each index, ↑ indicates that a larger value means better tracking performance and ↓ indicates that a smaller value means better tracking performance.
Compared with the other algorithms, the MOTP, MOTA, DetA, HOTA, AssA, AssRe and IDF1 indexes of the method are at the highest level; for example, its MOTP and MOTA are 9.241 and 22.406 higher than those of ByteTrack, respectively, showing that the method is better suited to infrared ship target tracking, where detailed information is lacking. In addition, the IDs index of the method is the smallest, 26 and 66 lower than those of ByteTrack and TransTrack respectively, showing better tracking stability: even when infrared ship targets cross frequently and camera shake makes tracking difficult, the method copes well with these situations.
The method meets practical application requirements and realizes infrared ship target tracking in sea surface scenes with good robustness and high precision; it remains applicable under short-term occlusion and nonlinear motion, and has strong practical value.
Various modifications and variations of the present invention may be made by those skilled in the art; provided that they fall within the scope of the appended claims and their equivalents, they are also within the scope of the present invention.
What is not described in detail in the specification is prior art known to those skilled in the art.
Claims (10)
1. An unmanned ship-oriented infrared ship target tracking method is characterized by comprising the following steps of: the method comprises the following steps:
s1: in an actual water surface environment, recording infrared video data of a ship target, constructing an infrared ship target video data set, and preprocessing the video data;
s2: dividing an infrared ship target video data set into a training set and a testing set;
s3: training the ship target detector based on the training set to obtain a trained ship target detector;
s4: building an appearance characteristic extraction network structure model, and training the appearance characteristic extraction network model based on a training set to obtain a trained appearance characteristic extraction network structure model;
s5: constructing a ship target tracker, and training the ship target tracker based on a training set to obtain a trained ship target tracker;
s6: detecting ship targets of the test set by using the trained ship target detector;
s7: extracting appearance feature vectors of the detected ship targets by using the trained appearance feature extraction network structure model;
s8: according to the detected ship targets and the appearance feature vectors of the ship targets, testing the testing set by using a trained ship target tracker;
S9: the ship target detector, the appearance characteristic extraction network structure model and the ship target tracker are deployed to a processing platform of the unmanned ship for practical application, and ship target tracking is carried out on the ship target infrared video data acquired in real time.
2. The unmanned ship-oriented infrared ship target tracking method according to claim 1, wherein: in step S1, the preprocessing refers to frame extraction processing of video data, rectangular frame labeling of ship targets in each frame of image, and labeling of identity serial numbers of the ship targets.
3. The unmanned ship-oriented infrared ship target tracking method according to claim 1, wherein: in step S4, the appearance feature extraction network structure model comprises a feature extraction backbone network and a refined feature extraction network; the feature extraction backbone network is used for feature extraction of the ship target image detected by the ship target detector, and the output of the feature extraction backbone network serves as the input feature map of the refined feature extraction network; the refined feature extraction network comprises three branches, namely a first branch, a second branch and a third branch; in the first branch, a 1×1 convolution operation is performed on the input feature map to obtain a first input feature map, global maximum pooling is then applied to the first input feature map to generate a 2048-dimensional feature vector, and the 2048-dimensional feature vector is compressed into a 256-dimensional feature vector through a 1×1 convolution; in the second branch, a 3×3 convolution operation is performed on the input feature map to obtain a second input feature map; in the third branch, a 5×5 convolution operation is performed on the input feature map to obtain a third input feature map; the second input feature map and the third input feature map pass through a cross fusion module to obtain a first fusion feature map and a second fusion feature map; in the second branch, global maximum pooling is applied to the first fusion feature map to generate a 2048-dimensional feature vector, which is compressed into a 256-dimensional feature vector through a 1×1 convolution; in addition, the feature map output by the fusion module is uniformly divided into 2 parts in the vertical direction, global maximum pooling is applied to each part, and each result is compressed into a 256-dimensional feature vector through a 1×1 convolution, the second branch obtaining 3 256-dimensional feature vectors in total; in the third branch, global maximum pooling is applied to the second fusion feature map to generate a 2048-dimensional feature vector, which is compressed into a 256-dimensional feature vector through a 1×1 convolution; in addition, the feature map output by the fusion module is uniformly divided into 3 parts in the vertical direction, global maximum pooling is applied to each part, and each result is compressed into a 256-dimensional feature vector through a 1×1 convolution, the third branch obtaining 4 256-dimensional feature vectors in total; the appearance feature extraction network structure model obtains 8 256-dimensional feature vectors through the three branches in total, which are spliced into 1 2048-dimensional feature vector; the spliced 2048-dimensional feature vector is passed through a 1×1 convolution, and the convolved 2048-dimensional feature vector serves as the appearance feature vector F of the ship target.
4. The unmanned ship-oriented infrared ship target tracking method according to claim 3, wherein the specific structure of the cross fusion module is as follows: in the second branch, the second input feature map generates corresponding Q1, K1 and V1 vectors through three 1×1 convolutions, and multiplying Q1 by K1 yields an attention matrix A1; in the third branch, the third input feature map generates corresponding Q2, K2 and V2 vectors through three 1×1 convolutions, and multiplying Q2 by K2 yields an attention matrix A2; the attention matrix A1 of the second branch is multiplied by the V2 vector of the third branch and then added to the second input feature map to obtain the first fusion feature map; the attention matrix A2 of the third branch is multiplied by the V1 vector of the second branch and then added to the third input feature map to obtain the second fusion feature map.
5. The unmanned ship-oriented infrared ship target tracking method according to claim 1, wherein: in step S9, the infrared ship target tracker includes a cost matrix calculation module, a first cascade matching module, a second cascade matching module, a conventional kalman filter parameter updating module, and an imaginary track kalman filter parameter updating module; the cost matrix computing module is connected with the first cascade matching module, the first cascade matching module is connected with the second cascade matching module and the conventional Kalman filter parameter updating module, the second cascade matching module is connected with the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module, and the conventional Kalman filter parameter updating module and the imaginary track Kalman filter parameter updating module are respectively connected with the cost matrix computing module;
The cost matrix calculation module is used for calculating a total cost matrix, wherein the total cost matrix comprises an appearance similarity matrix, a position measurement matrix and a motion direction measurement matrix;
the first cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the total cost matrix;
the second cascade matching module is used for cascade matching the detection frame and the corresponding prediction frame according to the position measurement matrix;
the conventional Kalman filter parameter updating module is used for updating the Kalman filter parameters of the detection frame according to the continuous track;
the imaginary track Kalman filter parameter updating module is used for generating an imaginary track and updating Kalman filter parameters of the detection frame according to the imaginary track.
6. The unmanned ship-oriented infrared ship target tracking method according to claim 5, wherein: in step S9, the following principle is adopted for the ship target tracker to track the ship target:
the detection frames and the corresponding prediction frames are used as the input of a ship target tracker, the prediction frames are divided into a prediction frame of a confirmed state track and a prediction frame of an unconfirmed state track, wherein the detection frames and the prediction frame of the confirmed state track are used as the input of a cost matrix calculation module, and the prediction frame of the unconfirmed state track is used as the input of a second bight matching module; the detection frame is a detection result obtained by detecting a ship target in the current frame image through a ship target detector, and is marked by a rectangular frame; the corresponding prediction frame refers to a prediction result of a detection frame in a previous frame image in a current frame image, and is marked by a rectangular frame;
the cost matrix calculation module calculates the total cost matrix, and according to the total cost matrix, the first cascade matching module performs the first cascade matching between the detection frames and the prediction frames of confirmed-state tracks; in the first cascade matching, each detection frame matched to a prediction frame has its Kalman filter parameters updated by the conventional Kalman filter parameter updating module to obtain the prediction frame of the next frame; the prediction frames of unconfirmed-state tracks and the detection frames and prediction frames left unmatched by the first cascade matching serve as the input of the second cascade matching module;
the second cascade matching module performs the second cascade matching on the prediction frames of unconfirmed-state tracks and the detection frames and prediction frames left unmatched by the first cascade matching: the position measurement matrix is recalculated and the second cascade matching is performed according to the position measurement matrix; for each detection frame matched to a prediction frame in the second cascade matching, an imaginary track is generated, and the Kalman filter parameters of the detection frame are updated according to the imaginary track to obtain the prediction frame of the next frame; detection frames left unmatched by the second cascade matching have their Kalman filter parameters updated by the conventional Kalman filter parameter updating module to obtain the prediction frame of the next frame; and prediction frames left unmatched by the second cascade matching are deleted.
7. The unmanned ship-oriented infrared ship target tracking method according to claim 6, wherein: in the first cascade matching and the second cascade matching, if detection frames are successfully matched over 3 consecutive frames, the track formed by those detection frames in the 3 frame images is a confirmed track and the detection frames are detection frames of a confirmed track; otherwise the track is an unconfirmed track and the detection frames are detection frames of an unconfirmed track; the prediction frame obtained by updating the Kalman filter parameters of a detection frame of a confirmed track is a prediction frame of a confirmed-state track, and the prediction frame obtained by updating the Kalman filter parameters of a detection frame of an unconfirmed track is a prediction frame of an unconfirmed-state track; the imaginary track is formed by filling in the broken, discontinuous part of a confirmed track; the continuous track refers to a confirmed track that continues without breaking.
8. The unmanned ship-oriented infrared ship target tracking method according to claim 5, wherein the total cost matrix is calculated as follows: the appearance similarity term is computed from the transpose of the appearance feature vector of the detected ship target and the appearance feature vector of the ship target in the last frame image associated with the k-th track; the position measurement term is computed from the intersection area of the detection frame and the prediction frame, the union area of the detection frame and the prediction frame, the Euclidean distance between the center points of the detection frame and the prediction frame, the diagonal length of the smallest enclosing frame, a weight function, and the aspect ratio similarity; the motion direction term is computed from the direction formed by connecting the ship target center positions in two successive frame images of the track, the direction formed by connecting the ship target center positions in the following pair of frame images, and the radian value of the angle between these two directions, frames being indexed by frame number.
9. The unmanned ship-oriented infrared ship target tracking method according to claim 5, wherein the imaginary track is generated as follows: from the last observed track record and its corresponding time, together with the track record re-linked by the second cascade matching and its corresponding time, track records are generated for the times at which the ship target was lost, these times lying between the two recorded times.
10. An unmanned ship-oriented infrared ship target tracking device applied to the unmanned ship-oriented infrared ship target tracking method of any one of claims 1 to 9, characterized in that: the system comprises a ship target detector, an appearance characteristic extraction network structure model and a ship target tracker, wherein the ship target detector is respectively connected with the appearance characteristic extraction network structure model and the ship target tracker, and the appearance characteristic extraction network structure model is connected with the ship target tracker; the ship target detector is used for detecting ship targets; the appearance feature extraction network structure model is used for extracting appearance feature vectors of ship targets; the ship target tracker is used for tracking ship targets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310524458.6A CN116245916B (en) | 2023-05-11 | 2023-05-11 | Unmanned ship-oriented infrared ship target tracking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116245916A true CN116245916A (en) | 2023-06-09 |
CN116245916B CN116245916B (en) | 2023-07-28 |
Family
ID=86635326
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310524458.6A Active CN116245916B (en) | 2023-05-11 | 2023-05-11 | Unmanned ship-oriented infrared ship target tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116245916B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160350938A1 (en) * | 2014-02-07 | 2016-12-01 | Safran Electronics & Defense | Method for detecting and tracking targets |
WO2018123948A1 (en) * | 2016-12-28 | 2018-07-05 | 三井造船株式会社 | Autopilot system for vessel, vessel, and autopilot method for vessel |
CN110472732A (en) * | 2019-08-19 | 2019-11-19 | 杭州凝眸智能科技有限公司 | Optimize feature extracting method and its neural network structure |
CN113034548A (en) * | 2021-04-25 | 2021-06-25 | 安徽科大擎天科技有限公司 | Multi-target tracking method and system suitable for embedded terminal |
CN114529799A (en) * | 2022-01-06 | 2022-05-24 | 浙江工业大学 | Aircraft multi-target tracking method based on improved YOLOV5 algorithm |
CN114724082A (en) * | 2022-04-08 | 2022-07-08 | 河海大学 | Pedestrian tracking method based on improved YOLOv5 and DeepsORT |
US20220301183A1 (en) * | 2021-08-24 | 2022-09-22 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for tracking object, electronic device, and readable storage medium |
CN115471782A (en) * | 2022-09-22 | 2022-12-13 | 中国人民解放军国防科技大学 | Unmanned ship-oriented infrared ship target detection method and device |
CN115761534A (en) * | 2022-11-08 | 2023-03-07 | 长春理工大学 | Method for detecting and tracking small target of infrared unmanned aerial vehicle under air background |
CN115984319A (en) * | 2022-11-15 | 2023-04-18 | 南京航空航天大学 | Unmanned aerial vehicle target tracking method based on improved YoLov7 and Deepsort |
CN115984969A (en) * | 2023-02-10 | 2023-04-18 | 沈阳大学 | Lightweight pedestrian tracking method in complex scene |
Non-Patent Citations (2)
Title |
---|
HUAJUN SONG ET AL.: "Detection and tracking of safety helmet based on DeepSort and YOLOv5", Multimedia Tools and Applications, no. 82, page 10781 *
YANG, Maolin: "Application of spatio-temporal integrated passive localization in maneuvering target tracking", Journal of Sichuan Ordnance (四川兵工学报), no. 07, pages 92-97 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116977367A (en) * | 2023-07-14 | 2023-10-31 | 陕西师范大学 | Campus multi-target tracking method based on transform and Kalman filtering |
CN117173221A (en) * | 2023-09-19 | 2023-12-05 | 浙江大学 | Multi-target tracking method based on authenticity grading and occlusion recovery |
CN117173221B (en) * | 2023-09-19 | 2024-04-19 | 浙江大学 | Multi-target tracking method based on authenticity grading and occlusion recovery |
CN117541625A (en) * | 2024-01-05 | 2024-02-09 | 大连理工大学 | Video multi-target tracking method based on domain adaptation feature fusion |
CN117541625B (en) * | 2024-01-05 | 2024-03-29 | 大连理工大学 | Video multi-target tracking method based on domain adaptation feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN116245916B (en) | 2023-07-28 |
Similar Documents
Publication | Title |
---|---|
CN116245916B (en) | Unmanned ship-oriented infrared ship target tracking method and device | |
CN110850403B (en) | Multi-sensor decision-level fusion intelligent ship water-surface target perception and identification method |
CN103822635B (en) | The unmanned plane during flying spatial location real-time computing technique of view-based access control model information | |
CN107451593B (en) | High-precision GPS positioning method based on image feature points | |
CN103679674A (en) | Method and system for splicing images of unmanned aircrafts in real time | |
CN111968046B (en) | Target association fusion method for radar photoelectric sensor based on topological structure | |
CN109657541A (en) | A kind of ship detecting method in unmanned plane image based on deep learning | |
CN105184816A (en) | Visual inspection and water surface target tracking system based on USV and detection tracking method thereof | |
CN112346096B (en) | High-low orbit remote sensing satellite ship target track and point track correlation method and system | |
CN111275760A (en) | Unmanned aerial vehicle target tracking system and method based on 5G and depth image information | |
CN107990940A (en) | A kind of moving object method for tracing based on stereo vision measuring technology | |
CN107817679A (en) | Based on infrared and naval vessel water cannon control system and method for visible ray fusion tracking | |
CN115861591B (en) | Unmanned aerial vehicle positioning method based on transformer key texture coding matching | |
CN105021190A (en) | Anti-satellite navigation fraud method and unmanned system based on the method | |
CN110458089B (en) | Marine target association system and method based on high-low orbit optical satellite observation | |
Xiang et al. | UAV based target tracking and recognition | |
Huang et al. | Identity recognition on waterways: A novel ship information tracking method based on multimodal data | |
Lu et al. | Fusion of camera-based vessel detection and ais for maritime surveillance | |
Wu et al. | Cooperative unmanned surface vehicles and unmanned aerial vehicles platform as a tool for coastal monitoring activities | |
CN114429435A (en) | Wide-field-of-view range target searching device, system and method in degraded visual environment | |
CN114067142A (en) | Method for realizing scene structure prediction, target detection and lane level positioning | |
CN110751077B (en) | Optical remote sensing picture ship detection method based on component matching and distance constraint | |
CN111652276B (en) | All-weather portable multifunctional bionic positioning and attitude-determining viewing system and method | |
CN113432594A (en) | Unmanned aerial vehicle automatic navigation system based on map and environment | |
CN112880678A (en) | Unmanned ship navigation planning method in complex water area environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||