CN115937254A - Multi-aerial flight target tracking method and system based on semi-supervised learning - Google Patents

Publication number: CN115937254A (application CN202211496180.8A; granted as CN115937254B)
Original language: Chinese (zh)
Inventors: 丁锋, 欧阳志宏, 薛磊, 徐英, 房明星, 桂树, 李达
Assignee: National University of Defense Technology
Legal status: Active (granted)

Abstract

The invention provides a multi-aerial flight target tracking method and system based on semi-supervised learning, belonging to the technical field of computer vision and comprising the following steps: acquiring a flight target image sequence and dividing it into a first flight target image sequence and a second flight target image sequence; obtaining a labeled flight target image set; constructing a supervision network model; inputting the labeled flight target image set into the supervision network model; inputting the second flight target image sequence into the supervision network model after supervised training for pseudo-label annotation; applying data enhancement to the pseudo-label flight target image set and inputting it into the supervision network model after supervised training so as to optimize the model; and acquiring an image sequence of flight targets to be detected and inputting it into the optimized supervision network model. The invention achieves fast and accurate positioning and tracking of aerial flight targets without annotating massive amounts of data.

Description

Multi-aerial flight target tracking method and system based on semi-supervised learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a multi-aerial flight target tracking method and system based on semi-supervised learning.
Background
Target tracking is a fundamental research topic in the field of computer vision and has advanced considerably in practical applications such as video surveillance, robotics, and gaze tracking. To achieve long-term target tracking against a sky background, a tracking algorithm must handle not only target motion blur, occlusion, and background interference, but also complex background changes caused by factors such as weather and illumination.
Conventional methods that track directly from the target's appearance information, such as KCF, Struck, and TLD, rely on computing the similarity of the target's appearance across a video sequence, and struggle with scale changes of the target and complex background changes during tracking. In existing tracking methods based on semi-supervised learning, the tracking effect is usually limited by video quality and by the categories of moving objects in the video content, so such methods lack robustness in practical applications. End-to-end tracking models based on deep learning train a deep network on massive video sequences to extract deep target features; in particular, the supervised learning approach adopted by deep trackers represented by Siamese CNNs requires massive manually annotated datasets as the basis for model training.
Disclosure of Invention
In view of the above, one of the objectives of the present invention is to provide a multi-aerial flying target tracking method based on semi-supervised learning, which can realize accurate identification, accurate positioning and tracking of multiple aerial flying targets on the premise of performing flying target class labeling on a small number of flying target images.
The second objective of the present invention is to provide a multi-aerial flight target tracking system based on semi-supervised learning.
In order to achieve one of the purposes, the invention adopts the following technical scheme:
a multi-air flying target tracking method based on semi-supervised learning comprises the following steps:
s1, acquiring a flight target image sequence and dividing the flight target image sequence into a first flight target image sequence in front and a second flight target image sequence behind;
s2, obtaining each flight target position from each flight target image in the first flight target image sequence and carrying out category marking to obtain a labeled flight target image set;
s3, constructing a supervision network model; inputting the flying target images with the labels into a supervising network model for supervised training;
s4, inputting each flight target image in the second flight target image sequence into a supervised network model after supervision training for pseudo label labeling to obtain a pseudo label flight target image set;
s5, sequentially performing data enhancement processing on each pseudo tag flying target image in the pseudo tag flying target image set;
s6, inputting each pseudo tag flying target image of the pseudo tag flying target image set and the corresponding pseudo tag flying target image subjected to data enhancement processing into a supervised network model subjected to supervised training for unsupervised training so as to optimize the supervised network model;
and S7, acquiring an image sequence of the flight targets to be detected, and inputting the image sequence into the optimized supervising network model to acquire the positions of the flight targets at the next moment.
Further, in step S2, the specific process of acquiring the tagged flying target image set includes:
s21, respectively carrying out scale invariant feature point detection on adjacent flight target images in the first flight target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flight target images;
s22, carrying out position matching on the feature points in the two groups of position feature point vectors to obtain a position feature point pair set;
step S23, calculating the pixel distance of each pair of position characteristic point pairs in the position characteristic point pair set;
s24, judging whether the pixel distance is larger than a threshold value, if so, determining that the position characteristic point corresponding to the pixel distance belongs to a flight target area, and entering S25; if not, the position characteristic point corresponding to the pixel distance does not belong to the flight target area, and the operation is finished;
s25, clustering all position characteristic points in the flight target area to determine the flight target area of each category;
and S26, acquiring the same flying target area position as each flying target area from each flying target image and carrying out class marking to form a labeled flying target image set.
Further, the supervision network model comprises a backbone network, an RPN (Region Proposal Network), a region-of-interest pooling module and a target classification network CLS;
the input ends of the RPN network and the region-of-interest pooling module are connected with the output end of the backbone network; the output end of the region-of-interest pooling module is connected with the target classification network CLS;
the backbone network is used for acquiring the depth image characteristics of each flying target from the labeled flying target image set;
the RPN is used for sequentially carrying out foreground-background classification and foreground position regression on the depth image characteristics of each flying target according to the position information and the category information of each flying target area in the labeled flying target image set so as to determine a candidate target area window of each flying target;
the region-of-interest pooling module is used for adjusting the candidate target region windows of the flight targets according to the sizes of the corresponding depth image features and then performing a region-of-interest pooling operation to obtain a plurality of candidate target region-of-interest windows and the corresponding depth image features;
and the object classification network CLS is used for sequentially carrying out category judgment and window regression correction on the candidate object region-of-interest windows according to the depth image characteristics.
Further, in step S5, the data enhancement processing includes one or a combination of two or more of morphological processing, radial transformation, scale transformation, color transformation, and motion blur processing;
the specific implementation process of the motion blur processing comprises the following steps:
s51, sequentially carrying out binary processing and fuzzy convolution operation on each pseudo tag flying target image in the pseudo tag flying target image set to determine a time domain image after each pseudo tag flying target image is fuzzy;
and S52, sequentially carrying out Fourier transform, wiener filtering and inverse Fourier transform on the time domain image after each pseudo tag flying target image is blurred.
Further, the specific implementation process of step S7 includes:
s71, inputting the image sequence of the flight targets to be detected into the optimized supervision network model to obtain the observation positions of the flight targets at different moments in each image of the flight targets to be detected;
s72, constructing a motion state equation of each flight target according to the observation position of each flight target at different moments;
s73, estimating the position of each flight target at the next moment by adopting the motion state equation of each flight target;
s74, inputting the position estimation result of the next moment into an RPN network in the optimized supervision network model, and inputting the images of the flight targets to be detected at the corresponding moments into a main network to obtain the observation positions of the flight targets at the corresponding moments;
and S75, adjusting the motion state equation of each flight target according to the observed position of each flight target at the next moment to determine the final position of each flight target at the next moment.
In order to achieve the second purpose, the invention adopts the following technical scheme:
a multi-air-flight target tracking system based on semi-supervised learning, the multi-air-flight target tracking system comprising:
the first acquisition module is used for acquiring a flight target image sequence and dividing the flight target image sequence into a first flight target image sequence in front and a second flight target image sequence behind;
the second obtaining module is used for obtaining each flying target position from each flying target image in the first flying target image sequence and carrying out category marking so as to obtain a labeled flying target image set;
the supervised training module is used for constructing a supervising network model; inputting the flying target images with the labels into a supervising network model for supervised training;
the pseudo label labeling module is used for inputting each flight target image in the second flight target image sequence into a supervision network model after supervision training for pseudo label labeling to obtain a pseudo label flight target image set;
the data enhancement processing module is used for sequentially enhancing the data of each pseudo tag flying target image in the pseudo tag flying target image set;
the unsupervised training module is used for inputting each pseudo label flying target image of the pseudo label flying target image set and the corresponding pseudo label flying target image subjected to data enhancement processing into a supervised network model subjected to supervised training for unsupervised training so as to optimize the supervised network model;
and the prediction module is used for acquiring the image sequence of the flight targets to be detected and inputting the image sequence into the optimized supervising network model so as to acquire the positions of the flight targets at the next moment.
Further, the second obtaining module includes:
the detection submodule is used for respectively carrying out scale-invariant feature point detection on adjacent flying target images in the first flying target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flying target images;
the position matching submodule is used for carrying out position matching on the characteristic points in the two groups of position characteristic point vectors to obtain a position characteristic point pair set;
the calculation submodule is used for calculating the pixel distance of each pair of position characteristic point pairs in the position characteristic point pair set;
the judgment submodule is used for judging whether the pixel distance is larger than a threshold value, if so, the position characteristic point corresponding to the pixel distance belongs to a flight target area, and the position characteristic point is transmitted to the clustering submodule; if not, the position characteristic point corresponding to the pixel distance does not belong to the flight target area, and the operation is finished;
the clustering submodule is used for clustering all the position characteristic points in the flight target area so as to determine the flight target area of each category;
and the class marking sub-module is used for acquiring the flight target area position which is the same as each flight target area from each flight target image and carrying out class marking to form a labeled flight target image set.
Further, the data enhancement processing module comprises:
the fuzzy convolution operation sub-module is used for sequentially carrying out binary processing and fuzzy convolution operation on each pseudo tag flying target image in the pseudo tag flying target image set so as to determine a time domain image after each pseudo tag flying target image is fuzzy;
and the inverse Fourier transform sub-module is used for sequentially performing Fourier transform, wiener filtering and inverse Fourier transform on the time domain image after each pseudo tag flying target image is blurred.
Further, the prediction module comprises:
the first acquisition sub-module is used for inputting the sequence of the flight target images to be detected into the optimized supervising network model so as to acquire the observation positions of each flight target at different moments in each flight target image to be detected;
the construction sub-module is used for constructing a motion state equation of each flight target according to the observation position of each flight target at different moments;
the estimation sub-module is used for estimating the position of each flight target at the next moment by adopting the motion state equation of each flight target;
the second acquisition sub-module is used for inputting the position estimation result at the next moment into an RPN network in the optimized supervision network model and inputting the images of the flight targets to be detected at the corresponding moments into the main network so as to acquire the observation positions of the flight targets at the corresponding moments;
and the adjusting sub-module is used for adjusting the motion state equation of each flight target according to the observed position of each flight target at the next moment so as to determine the final position of each flight target at the next moment.
In summary, the scheme provided by the invention has the following technical effects:
according to the method, a flying target image set with a label is obtained through a first flying target image sequence; the supervised training of the supervising network model is realized through the labeled flying target image set; generating a pseudo-label flying target image set by using a second flying target image sequence through a supervision network model after supervision training; through the non-tag flying target image set, the unsupervised training of the supervision network model is realized, and the optimized supervision network model is obtained; inputting the image sequence of the flying target to be detected into the optimized supervising network model to obtain the position of each flying target at the next moment, thereby realizing the aerial target tracking; the invention can realize accurate identification and accurate positioning of a plurality of aerial flight targets on the premise of labeling a small amount of flight target images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of the multi-aerial flight target tracking method based on semi-supervised learning according to the present invention;
FIG. 2 is a schematic diagram of a supervision network model structure before optimization according to the present invention;
FIG. 3 is a diagram of the RPN network structure of the present invention;
fig. 4 is a schematic structural diagram of an optimized supervision network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment provides a multi-aerial flight target tracking method based on semi-supervised learning; referring to fig. 1, the method includes:
S1, acquiring a flight target image sequence and dividing it into an earlier first flight target image sequence and a later second flight target image sequence.
To adapt to complex background changes during target tracking, the flight target image sequence of this embodiment includes a large number of flight target images captured under different weather and lighting conditions. The sequence is divided into an earlier first flight target image sequence and a later second flight target image sequence.
S2, obtaining the position of each flight target from each flight target image in the first flight target image sequence and carrying out category marking to obtain a labeled flight target image set.
The first flight target image sequence (the earlier part) is the flight target image sequence acquired in the initial stage and is used to construct the labeled flight target image set. To overcome interference caused by camera shake and illumination changes, this embodiment adopts the SIFT (Scale-Invariant Feature Transform) feature point detection algorithm together with an ARIT (Action Recognition with Improved Trajectories) method: a motion region is extracted from each flight target image, and the moving-target category to which each motion region belongs is labeled, thereby completing the annotation of the labeled dataset. In this embodiment, the specific process of obtaining the labeled flight target image set includes:
s21, respectively carrying out scale-invariant feature point detection on adjacent flight target images in the first flight target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flight target images;
in this embodiment, an SIFT feature point detection method is adopted to detect several frames of images (i.e., the first flying target image) at the initial stage, and two sets of scale-invariant feature points between adjacent frames (i.e., adjacent flying target images) are obtained.
S22, carrying out position matching on the feature points in the two groups of position feature point vectors to obtain a position feature point pair set;
in this embodiment, a Random Sample Consensus method (RANSAC Random Sample Consensus) is used to perform position matching on two sets of scale-invariant feature points (i.e., two sets of position feature point vectors).
Step S23, calculating the pixel distance of each pair of position characteristic point pairs in the position characteristic point pair set;
s24, judging whether the pixel distance is larger than a threshold value, if so, determining that the position characteristic point corresponding to the pixel distance belongs to a flight target area, and entering S25; if not, the position characteristic point corresponding to the pixel distance does not belong to the flight target area, and the operation is finished;
When the Euclidean distance between the points of a matched pair is larger than the set threshold, the position feature points belong to a flight target area; otherwise, the pair's pixel displacement between adjacent frames is attributed to camera shake, and the points do not belong to a flight target area.
S25, clustering all position feature points in the flight target area to determine the flight target area of each category;
and S26, acquiring the same flying target area position as each flying target area from each flying target image and carrying out class marking to form a labeled flying target image set.
In this embodiment, the histogram-of-oriented-gradients (HOG) feature of the designated flight target area is computed, the KCF kernelized correlation filtering method is used to find the corresponding position sequence of the target in the subsequent flight target images, and this sequence is combined with the corresponding flight target label information to serve as the labeled dataset of training samples. A sketch of this labeling pipeline follows.
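The following is a minimal sketch of the labeled-set construction in steps S21-S26, assuming OpenCV and scikit-learn; the function names, the homography-residual shake compensation, and the DBSCAN clustering parameters are illustrative assumptions, not the patent's exact procedure.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def moving_feature_points(img_a, img_b, dist_thresh=3.0):
    """Matched SIFT points whose displacement, after compensating global
    camera motion, exceeds the threshold (S21-S24)."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    # RANSAC homography absorbs camera shake: inliers follow the background,
    # so a large residual displacement indicates a moving flight target.
    H, _ = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
    warped = cv2.perspectiveTransform(pts_a.reshape(-1, 1, 2), H).reshape(-1, 2)
    residual = np.linalg.norm(pts_b - warped, axis=1)
    return pts_b[residual > dist_thresh]

def target_regions(points, eps=20.0, min_samples=3):
    """Cluster the moving feature points into per-target regions (S25)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == c] for c in set(labels) if c != -1]
```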
S3, constructing a supervision network model; and inputting the flying target images with the labels into a supervising network model for supervised training.
In this embodiment, supervised training of the supervision network model provides the basis for generating pseudo labels for the unlabeled dataset in the subsequent process.
The supervision network model of this embodiment, referring to fig. 2, includes a backbone network, an RPN network, a region-of-interest pooling module, and a target classification network CLS. The input ends of the RPN network and the region-of-interest pooling module are connected with the output end of the backbone network. The output end of the region-of-interest pooling module is connected to the target classification network CLS.
The backbone network of this embodiment is configured to obtain depth image features of each flight target from the labeled flight target image set, and generally employs a 50-layer residual network (ResNet-50). The backbone network is trained with the multiclass cross-entropy (softmax cross-entropy) loss function L shown in formula (1) to obtain the initial backbone parameters; the trained backbone then discriminates flight targets from background and among flight target classes, producing depth image features of the different flight targets:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log p_{ic} \qquad (1)$$

where N is the number of samples, M is the number of flight target categories, y_{ic} is 1 if the class of the i-th sample (flight target image) is c and 0 otherwise, and p_{ic} is the predicted probability that the i-th sample belongs to class c.
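As a quick illustration, equation (1) in numpy form (a hedged sketch; the function name and the numerical-stability epsilon are ours):

```python
import numpy as np

def softmax_cross_entropy(p, y):
    """p: (N, M) predicted class probabilities; y: (N, M) one-hot labels.
    Returns the mean multiclass cross-entropy of equation (1)."""
    return -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))
```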
In this embodiment, RPN training is completed from the initialized backbone parameters. Referring to fig. 3, for a 3×224×224 input flight target image, the backbone feeds the corresponding 256×56×56 depth image features into the RPN: the input image is downsampled 4× in width and height and the depth dimension is stretched to 256, producing what is called a depth feature map. Each channel of the depth feature map passes through a 3×3 convolution and a fully connected operation in turn; each resulting matrix element position serves as an anchor point that generates 9 search windows from three aspect ratios (1:1, 1:2, 2:1) and three scales (0.75, 1, and 1.5 of the original image scale). All search windows are then fed into two 1×1 convolution branches; each window is inverse-mapped back to the original image according to its position, ratio, and scale and combined with the label information, and the two branches respectively perform foreground/background binary classification and regression of the bounding rectangle of the region containing the flight target. The total loss of these two parts is composed jointly of the classification loss L_CLS of each search region and the regression loss L_REG of each search box:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{CLS}}\sum_i L_{CLS}(p_i, p_i^*) + \lambda\,\frac{1}{N_{REG}}\sum_i p_i^*\, L_{REG}(t_i, t_i^*) \qquad (2)$$

L_CLS in equation (2) follows the definition of binary cross-entropy loss, and N_CLS is taken as twice the number of search windows generated for all flight targets; p_i is the predicted class probability distribution of the i-th anchor position during training; p_i^* is obtained from the actual target labels by the IoU (Intersection over Union) ratio: a ratio above 0.5 is regarded as foreground (p_i^* = 1), a ratio below 0.1 is judged as background (p_i^* = 0), and the remaining cases are discarded; t_i denotes the window position of the i-th anchor, a 4-dimensional vector describing the size and position of a search window; t_i^* is the label-data window corresponding to p_i^* = 1. The search-window regression loss is defined in smooth L1 norm form. The RPN parameters are updated by gradient back-propagation of this loss function.
The RPN network of this embodiment is configured to sequentially perform foreground-background classification and foreground position regression on the depth image features of each flight target, according to the position and category information of each flight target region in the labeled flight target image set, so as to determine a candidate target region window for each flight target. The RPN structure, referring to fig. 3, includes a downsampling layer, a depth-dimension stretching layer, a first 3×3 convolutional layer, a fully connected layer, and two second 1×1 convolutional layers. The depth-dimension stretching layer, the first 3×3 convolutional layer, and the fully connected layer are connected in this order, and the input ends of the two 1×1 convolutional layers are both connected to the output end of the fully connected layer. The depth-dimension stretching layer stretches the depth image features of the flight targets in the width and height dimensions to obtain the depth feature map of each flight target. The first convolutional layer performs the 3×3 convolution on the channels of each depth feature map. The fully connected layer performs the fully connected operation on each convolved depth feature map to obtain a number of target-area search windows per depth feature map. The two 1×1 convolutional layers perform position and size mapping between each target-area search window and each flight target in each flight target image, realizing foreground-background classification and foreground position regression.
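For illustration, a minimal sketch of the 9-anchor generation described above (3 aspect ratios × 3 scales per feature-map position); the stride matches the 4× downsampling, while base_size and the function name are assumptions:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=4, base_size=32,
                 ratios=(1.0, 0.5, 2.0), scales=(0.75, 1.0, 1.5)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) pixels."""
    shapes = [(base_size * s * np.sqrt(r), base_size * s / np.sqrt(r))
              for r in ratios for s in scales]        # 9 (w, h) pairs
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # anchor center
            boxes += [[cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
                      for w, h in shapes]
    return np.asarray(boxes, dtype=np.float32)
```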
The region-of-interest pooling module of this embodiment adjusts the candidate target region windows of the respective flight targets according to the size of the corresponding depth image features. Because the proposal windows output by the RPN are sized for the input image, all proposal windows must be scaled down to the size of the backbone's depth features; the corresponding positions are then located on each channel of the depth feature map according to the adjusted search windows, and the depth features within these regions are expanded into vectors of uniform length that represent the original image content, called depth feature proposal vectors. In other words, several regions of interest are generated at the flight target's position in the original image, a process called Region-of-Interest Pooling (RoI Pooling).
The target classification network CLS of this embodiment sequentially performs category judgment and window regression correction on the candidate target region-of-interest windows. For each depth feature proposal vector output by RoI Pooling, the CLS network judges the flight target category through a fully connected layer and the multiclass cross-entropy loss function, using the image category labels in the label data; it also performs regression correction of the bounding rectangle on the flight target position according to the position label information. The CLS parameters are obtained by optimizing these two loss functions, and both networks (the RPN and CLS) are then fine-tuned on the labeled data starting from their current parameters.
And S4, inputting each flight target image in the second flight target image sequence into a supervised network model after supervision training for pseudo label labeling to obtain a pseudo label flight target image set.
After the second flight target image sequence is input into the supervised network model after supervised training, the embodiment performs pseudo label labeling on the second flight target image sequence by using a FlexMatch method. Referring to fig. 2, a specific implementation process of the pseudo tag labeling includes:
and S41, inputting all the flying target images in the second flying target image sequence into the trained backbone network to obtain the depth characteristics of each flying target.
Step S42: the RPN generates a set of possible candidate areas for each flight target from its depth features. The classification branch of each candidate area is compared against a constant confidence-based threshold to decide whether the area is foreground or background, producing candidate position areas where a flight target may exist; the regression branch then corrects the positions of these image areas to form a more accurate sequence of search proposal windows. After non-maximum suppression (NMS) of this sequence filters out anchor points with high error probability (see the NMS sketch after this list), the remaining candidate areas of each flight target form the proposal windows (i.e., the candidate target region windows).
Step S43, performing region-of-interest pooling, namely adjusting the candidate target region windows of each flight target according to the size of the corresponding depth image feature to obtain a plurality of candidate target region-of-interest windows and corresponding depth image features;
and S44, sequentially carrying out category judgment and window regression correction on the candidate target region-of-interest windows according to the depth image characteristics.
The target classification network (CLS) of this embodiment pre-judges the category of a moving target according to the multiclass (softmax) cross-entropy loss, and assigns the pseudo label of a given flight target class to each region whose prediction exceeds the category judgment threshold τ. The region to which the pseudo label belongs is then corrected by the window regression branch to obtain the target region position and the corresponding pseudo label. For the setting of the category judgment threshold τ, following the CPL (Curriculum Pseudo Labeling) method proposed in FlexMatch, and considering that different classes of flight target images differ in learning difficulty during training and thus affect the training result, the judgment threshold for a flight target class is set as a dynamically changing adaptive function τ_t(c) rather than a fixed value τ:

$$\tau_t(c) = M(\beta_t(c)) \cdot \tau, \qquad \beta_t(c) = \frac{\sigma_t(c)}{\max_{c'} \sigma_t(c')} \qquad (3)$$

with

$$\sigma_t(c) = \sum_{n=1}^{N} \mathbb{1}\!\left(\max_y p_t(y \mid x_n) > \tau\right)\cdot \mathbb{1}\!\left(\arg\max_y p_t(y \mid x_n) = c\right) \qquad (4)$$

where p_t(y | x_n) is the classification prediction of the CLS network in the supervision network model on unlabeled data (unlabeled flight target images), and σ_t(c) is the count, over all N unlabeled flight target images at time t, of inputs x_n judged to be class-c flight targets while satisfying the judgment threshold τ; it reflects the current pseudo-label learning efficiency of class c. The number of unlabeled training samples whose CLS predictions exceed the judgment threshold and are judged to be class-c flight targets thus characterizes how well the supervision network model has learned that class at the current moment. The category judgment threshold τ_t(c) of a flight target class is obtained as follows (see the sketch after this list):
1. set an initial category judgment threshold for each flight target class;
2. input each flight target image in the second flight target image sequence into the supervision network model after supervised training, to obtain the conditional probability p_t(y|x_n) of each flight target image belonging to each flight target category;
3. calculate the pseudo-label learning efficiency σ_t(c) of each flight target category from these conditional probabilities;
4. take the maximum of the pseudo-label learning efficiencies σ_t(c), and compute the ratio of each σ_t(c) to this maximum to obtain the learning efficiency coefficient β_t(c) of each flight target category;
5. apply a convex function mapping to each learning efficiency coefficient β_t(c) to adjust the initial category judgment threshold of each flight target class.
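A minimal sketch of the dynamic-threshold computation in equations (3)-(4); the convex mapping M(x) = x/(2−x) is the one used in the FlexMatch paper and is assumed here, as are all names:

```python
import numpy as np

def dynamic_thresholds(probs, tau=0.95):
    """probs: (N, C) class probabilities of the N unlabeled images.
    Returns the per-class threshold tau_t(c)."""
    conf = probs.max(axis=1)            # max predicted probability per image
    pred = probs.argmax(axis=1)         # predicted class per image
    C = probs.shape[1]
    # sigma_t(c): samples of class c already passing the fixed threshold tau
    sigma = np.array([np.sum((conf > tau) & (pred == c)) for c in range(C)])
    beta = sigma / max(sigma.max(), 1)  # learning-efficiency coefficient
    return (beta / (2.0 - beta)) * tau  # convex mapping M, then scale by tau
```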
And S5, sequentially performing data enhancement processing on each pseudo tag flying target image in the pseudo tag flying target image set.
The data enhancement processing in this embodiment includes one, or a combination of two or more, of morphological processing, radial transformation, scale transformation, color transformation, and motion blur processing. In real-time monitoring of flight targets, motion blur caused by lens defocus, insufficient exposure, and similar situations is unavoidable; this embodiment therefore applies motion blur to the pseudo-label region using a linear blur simulation method based on the point spread function (PSF), and then inverse-transforms the blurred region with Wiener filtering to obtain a motion-blur restoration result. The specific implementation of the motion blur processing is as follows:
s51, sequentially carrying out binary processing and fuzzy convolution operation on each pseudo tag flying target image in the pseudo tag flying target image set to determine a time domain image after each pseudo tag flying target image is fuzzy;
A flight target pseudo-label judgment result image is selected at random, and each pseudo-label flight target image is binarized with an adaptive threshold. A Canny edge detection operator is then used for the blur convolution operation, and the contour region in the binary image is extracted; the flight target image region inside the contour is set to 1 and the remaining regions to 0. Singular value decomposition is then performed on this matrix, and the motion direction θ is taken as the angle between the vector corresponding to the largest singular value and the horizontal axis; writing that vector as (v_x, v_y),

$$\theta = \arctan\left(\frac{v_y}{v_x}\right)$$
And S52, sequentially carrying out Fourier transform, wiener filtering and inverse Fourier transform on the time domain image after each pseudo tag flying target image is blurred.
According to the definition of linear motion blur in the frequency domain, the frequency-domain transform F(u,v) of the original pseudo-label flight target image I(x,y), after convolution with the linear blur H(u,v), gives the result G(u,v):

$$G(u,v) = F(u,v)\,H(u,v) \qquad (5)$$

If the motion parameters x_0(t) and y_0(t) along the X and Y axes are known, the linear blur H(u,v) is known, and the original flight target image can be recovered from G(u,v) in the frequency domain. With the PSF method, the time-domain representation is obtained by inverse-transforming H(u,v). In general, the PSF blur at a pixel position, described by a blur length L and a motion direction angle θ, is expressed as the blur simulation distortion:

$$h(x,y) = \begin{cases} \dfrac{1}{L}, & \sqrt{x^2 + y^2} \le \dfrac{L}{2},\ \ \dfrac{y}{x} = -\tan\theta \\[4pt] 0, & \text{otherwise} \end{cases}$$

The blur convolution kernel h(x,y) is obtained from the blur length L and the motion direction angle θ; convolving it with the original flight target image applies motion blur to the pseudo-label region image and yields the blurred time-domain image of the pseudo-label flight target image.
Then, in the inverse transformation process, a Wiener filter is defined from the Fourier transform H(u,v) of the blur convolution kernel h(x,y) and its conjugate H*(u,v). Multiplying the frequency-domain blur result of the pseudo-label flight target image by this filter and applying the inverse Fourier transform yields a partial restoration I'(x,y) of the original pseudo-label image, realizing the linear-motion-blur data enhancement. This process can be randomly combined with other data enhancement operations, such as radial transformation, scale transformation, and color transformation, to realize complex data enhancement of the flight target's pseudo-label image region.
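A minimal sketch of the blur-and-restore pipeline of S51-S52, assuming a grayscale image as a 2-D float array; the noise-to-signal constant K of the Wiener filter is an assumed regularizer:

```python
import numpy as np

def motion_psf(shape, length, theta):
    """Line-shaped PSF of the given blur length and direction angle."""
    psf = np.zeros(shape)
    cy, cx = shape[0] // 2, shape[1] // 2
    for t in np.linspace(-length / 2, length / 2, int(length) * 2 + 1):
        y = int(round(cy + t * np.sin(theta)))
        x = int(round(cx + t * np.cos(theta)))
        if 0 <= y < shape[0] and 0 <= x < shape[1]:
            psf[y, x] = 1.0
    return psf / psf.sum()

def blur_and_restore(img, length=9, theta=0.3, K=0.01):
    """Blur img in the frequency domain (eq. 5), then restore it with the
    Wiener filter H* / (|H|^2 + K) and the inverse Fourier transform (S52)."""
    H = np.fft.fft2(np.fft.ifftshift(motion_psf(img.shape, length, theta)))
    G = np.fft.fft2(img) * H                   # blurred spectrum G = F * H
    W = np.conj(H) / (np.abs(H) ** 2 + K)      # Wiener filter
    blurred = np.real(np.fft.ifft2(G))
    restored = np.real(np.fft.ifft2(G * W))
    return blurred, restored
```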
And S6, inputting each pseudo label flying target image of the pseudo label flying target image set and the corresponding pseudo label flying target image subjected to data enhancement processing into a supervision network model subjected to supervision training for unsupervised training so as to optimize the supervision network model.
The original pseudo-label flight target image generated by the supervision network model and its data-enhanced counterpart are required to receive consistent category judgments (the consistency regularization method), so that the depth image information (i.e., the depth image features) extracted by the backbone network remains representative even when the flight target's appearance changes due to illumination, scale, rotation, and similar factors, giving the flight target tracking model (i.e., the final supervision network model) generalization ability. The pseudo-label loss L_u used for the unlabeled data has the same form as the loss L_s used for the labeled data, as given in equation (2). The process is written as:
$$L_u(T(x_u, s^*), q^*) = L_u(x_u, q^*, s^*) = L_s(x_T, q^*, s_T^*) \qquad (6)$$

where x_u denotes an unlabeled image; q^* = argmax(q(x_u)), with q(x_u) the pseudo-label probability distribution of the unlabeled data x_u produced by the supervision network model, so that q^* represents the probability density of belonging to a given flight target class; s^* denotes the position label box of the pseudo-label flight target, a vector of length 4 containing position and size information; and T(x_u, s^*) denotes the data enhancement result of the pseudo-label flight target image. The unlabeled data x_u becomes x_T after the data enhancement operation T, and since geometric transformations are among the enhancement operations, the position label box s^* is changed accordingly into s_T^*.
The supervision network model is optimized by minimizing the loss function: all losses are minimized so that the parameters of the moving target detection model (i.e., the supervision network model) are updated during back-propagation. The unsupervised loss of the pseudo-label generation process is the total loss of the supervision network model, composed jointly of the classification (CLS) loss and the bounding-window regression (REG) loss, consistent with the loss of the supervised training in step S3; the final global loss function consists of the supervised loss L_s and the unsupervised loss L_u together:

$$L = L_s(x, p^*, t^*) + L_u(T(x_u, s^*), q^*) \qquad (7)$$
in this embodiment, a random Gradient Descent method (SGD Stochastic Gradient component) is adopted, a global loss function is optimized to converge to a minimum value, and in a back propagation process of training, a supervision network model parameter is subjected to a supervision network Gradient, so as to finally obtain a moving target tracking model (i.e., a final supervision network model).
And S7, acquiring an image sequence of the flight targets to be detected, and inputting the image sequence into the optimized supervision network model to acquire the positions of the flight targets at the next moment.
As shown in fig. 4, the optimized supervision network model and the motion state equation are used together to process the image sequence of flight targets to be detected and obtain the trajectory of each flight target. The specific implementation process includes:
s71, inputting the image sequence of the flight targets to be detected into the optimized supervision network model to obtain the observation positions of the flight targets at different moments in each image of the flight targets to be detected;
in the initial stage of the moving target tracking process of the flying target, the optimized supervising network model is used for monitoring the first t flying target images (namely I) in the flying target image sequence to be detected 0 ~I t ) And processing to obtain the observation positions of the flight targets at different moments (0-t moments) according to the optimized CLS network in the supervision network model.
And S72, constructing a motion state equation of each flight target according to the observation position of each flight target at different moments.
The motion state equation of each flight target in this embodiment is generally a differential equation of the form:

$$\dot{x} = f(x, t) + u, \qquad z = g(x) + v \qquad (8)$$

where x and z denote the theoretical image position of the flight target and the position output by the CLS part of the tracking network (i.e., the observed position), respectively; \dot{x} denotes the motion velocity of the flight target on the image; the functions f(x,t) and g(x) are the state transition equation and the observation equation; and the variables u and v denote the state transition noise and the observation noise, both following Gaussian distributions. With the camera exposure interval Δt known, the motion state equation is expressed in discrete form over the acquisition times:

$$x_t = A\,x_{t-1} + u, \qquad z_t = C\,x_t + v \qquad (9)$$

The two equations are the state transition equation and the observation equation, with state transition matrix A and observation matrix C; the transition noise u has covariance Q, the observation noise v has covariance R, and both obey zero-mean Gaussian distributions:

$$p(u) \sim N(0, Q), \qquad p(v) \sim N(0, R) \qquad (10)$$

Suppose the theoretical image position of the flight target at time t is x_t, the position predicted from time t−1 is \hat{x}_t^-, and the optimally solved estimate obtained at this moment from the observed position z_t is \hat{x}_t. The prior state error between the theoretical position and the predicted position at time t, and the posterior state error between the theoretical position and the optimally solved position, are defined as:

$$e_t^- = x_t - \hat{x}_t^-, \qquad e_t = x_t - \hat{x}_t \qquad (11)$$

and the prior covariance P_t^- of the theoretical position and the predicted position, and the posterior covariance P_t of the theoretical position and the optimally solved position, are computed as:

$$P_t^- = E\!\left[e_t^- (e_t^-)^T\right], \qquad P_t = E\!\left[e_t\, e_t^T\right] \qquad (12)$$

At time t, the optimal solution state \hat{x}_t is obtained from the flight target observation position z_t output by the current CLS network and the prediction \hat{x}_t^- of the flight target state from time t−1, namely:

$$\hat{x}_t = \hat{x}_t^- + K_t\left(z_t - C\,\hat{x}_t^-\right) \qquad (13)$$

The gain matrix K_t (the harmonic matrix), whose values range from 0 to 1, balances the difference between the prediction of the previous moment and the observation of the current moment, and is computed as:

$$K_t = P_t^-\, C^T \left(C\, P_t^-\, C^T + R\right)^{-1} \qquad (14)$$

Establishing the motion state is a cyclic iteration of two computations: state transition estimation and optimal state solution. After initializing the covariance Q of the transition noise u and the covariance R of the observation noise v, the initial position x_0 of the flight target and the initial motion velocity of the flight target in the image are specified at time 0; combined with the output of the flight target tracking network CLS, this yields the observed position sequence {z_t} from time 0 to t, and the posterior covariance P_0 is initialized. In the state transition stage, the state transition equation updates the state from time t−1 to time t, giving the estimated state \hat{x}_t^- of the flight target, and the prior covariance matrix P_t^- is computed. In the optimal solution stage, the gain matrix K_t is computed from the observation noise R and the prior covariance P_t^-, and the estimated position is corrected with the actual observed position z_t output by the current flight target tracking network to obtain the optimal solution state \hat{x}_t, after which the posterior covariance P_t is computed. Once the covariance P_t and the gain matrix K have converged, the discrete-form motion state equation of the flight target is established.

S73, the motion state equation of each flight target is then used to estimate the position of each flight target at the next moment; a sketch of this predict-update cycle follows.
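A minimal constant-velocity Kalman-filter sketch of the motion state equations (9)-(14); the 4-dimensional state (x, y, v_x, v_y), the noise magnitudes, and dt are illustrative assumptions:

```python
import numpy as np

class KalmanTracker:
    def __init__(self, x0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.asarray(x0, dtype=float)   # state (x, y, vx, vy)
        self.A = np.eye(4)                     # state transition matrix A
        self.A[0, 2] = self.A[1, 3] = dt
        self.C = np.eye(2, 4)                  # observation matrix C
        self.P = np.eye(4)                     # posterior covariance P
        self.Q = q * np.eye(4)                 # transition noise covariance Q
        self.R = r * np.eye(2)                 # observation noise covariance R

    def predict(self):
        """State transition stage: prior estimate and prior covariance."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]                      # predicted image position

    def update(self, z):
        """Optimal solution stage: gain K_t (eq. 14) and update (eq. 13)."""
        S = self.C @ self.P @ self.C.T + self.R
        K = self.P @ self.C.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.C @ self.x)
        self.P = (np.eye(4) - K @ self.C) @ self.P
        return self.x[:2]                      # optimally solved position
```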
S74, the position estimation result for the next moment is input into the RPN network of the optimized supervision network model, and the flight target image to be detected at the corresponding moment is input into the backbone network, so as to obtain the observed position of each flight target at that moment;
and S75, adjusting the motion state equation of each flight target according to the observed position of each flight target at the next moment to determine the final position of each flight target at the next moment.
After the motion state equation is established, at time t+1 the depth feature vector of the input image is computed through the backbone network, and the estimated image position \hat{x}_{t+1}^- of the flight target is obtained from the state transition equation. A scale factor is computed from the image size and the size of the depth image features output by the backbone network; the flight target's image position is mapped onto the depth features by this factor, and the resulting position is passed to the RPN network as the anchor position for generating flight target search windows, which reduces the number of search windows in the RPN network and improves search efficiency. When two flight targets occlude each other, their respective motion states at the next moment can be inferred from their respective motion state equations, so that different flight targets can be distinguished during the occlusion and after it is resolved.

After the optimized supervision network model obtains the observed position z_{t+1} at time t+1, the gain matrix K and the posterior covariance P_{t+1} of the motion state equation are updated, and the optimal solution position \hat{x}_{t+1} is obtained from the prediction \hat{x}_{t+1}^- of the flight target state at time t, the current gain matrix K, and the observed position z_{t+1}.
In the embodiment, a labeled flying target image set is obtained through a first flying target image sequence; the supervised training of the supervising network model is realized through the labeled flying target image set; generating a pseudo label flying target image set by using a second flying target image sequence through a supervision network model after supervision training; unsupervised training of a supervision network model is achieved through the pseudo label flying target image set, and the optimized supervision network model is obtained; inputting the image sequence of the flying target to be detected into the optimized supervising network model to obtain the position of each flying target at the next moment, thereby realizing the aerial target tracking; according to the method and the device, accurate identification and accurate positioning of a plurality of aerial flight targets can be realized on the premise of labeling a small number of flight target images.
The method of this embodiment can be implemented with the multi-aerial flight target tracking system based on semi-supervised learning provided by the following embodiment:
another embodiment provides a multi-air flying target tracking system based on semi-supervised learning, including:
the first acquisition module is used for acquiring a flight target image sequence and dividing the flight target image sequence into a first flight target image sequence in front and a second flight target image sequence behind;
the second obtaining module is used for obtaining each flying target position from each flying target image in the first flying target image sequence and carrying out category marking so as to obtain a labeled flying target image set;
the supervised training module is used for constructing a supervising network model; inputting all the flight target images in the labeled flight target image set into a supervision network model for supervision training;
the pseudo label labeling module is used for inputting each flight target image in the second flight target image sequence into a supervision network model after supervision training for pseudo label labeling to obtain a pseudo label flight target image set;
the data enhancement processing module is used for sequentially enhancing the data of each pseudo tag flying target image in the pseudo tag flying target image set;
the unsupervised training module is used for inputting each pseudo label flying target image in the pseudo label flying target image set and the corresponding pseudo label flying target image subjected to data enhancement processing into a supervised network model subjected to supervised training for unsupervised training so as to optimize the supervised network model;
and the prediction module is used for acquiring the image sequence of the flight targets to be detected and inputting the image sequence into the optimized supervising network model so as to acquire the positions of the flight targets at the next moment.
Further, the second acquisition module includes the following submodules (a code sketch follows this list):
the detection sub-module is used for respectively carrying out scale-invariant feature point detection on adjacent flight target images in the first flight target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flight target images;
the position matching submodule is used for carrying out position matching on the characteristic points in the two groups of position characteristic point vectors to obtain a position characteristic point pair set;
the calculation submodule is used for calculating the pixel distance of each position characteristic point pair in the position characteristic point pair set;
the judgment submodule is used for judging whether the pixel distance is larger than a threshold value or not, if so, the position characteristic point corresponding to the pixel distance belongs to a flight target area, and the position characteristic point is transmitted to the clustering submodule; if not, the position characteristic point corresponding to the pixel distance does not belong to the flight target area, and the process is finished;
the clustering submodule is used for clustering all the position characteristic points in the flight target area so as to determine the flight target area of each category;
and the class marking submodule is used for acquiring, from each flight target image, the flight target area position that is the same as each flight target area and carrying out class marking, so as to form the labeled flying target image set.
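As a rough sketch of this labelling chain, assuming OpenCV's SIFT detector, a brute-force matcher and DBSCAN clustering stand in for the detection, matching and clustering submodules (the distance threshold and clustering parameters are illustrative, not values from the patent):

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def flying_target_regions(img_a, img_b, dist_thresh=5.0):
    """Locate moving-target regions between two adjacent frames by matching
    scale-invariant feature points and keeping the pairs whose pixel
    displacement exceeds a threshold."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    moving = []
    for m in matcher.match(des_a, des_b):
        pa = np.array(kp_a[m.queryIdx].pt)
        pb = np.array(kp_b[m.trainIdx].pt)
        if np.linalg.norm(pa - pb) > dist_thresh:  # large motion => target
            moving.append(pb)
    if not moving:
        return []

    # Cluster the moving feature points; each cluster is one target region.
    pts = np.array(moving)
    labels = DBSCAN(eps=20.0, min_samples=3).fit_predict(pts)
    regions = []
    for k in set(labels) - {-1}:                   # -1 marks DBSCAN noise
        cluster = pts[labels == k]
        x0, y0 = cluster.min(axis=0)
        x1, y1 = cluster.max(axis=0)
        regions.append((x0, y0, x1, y1))           # bounding box per target
    return regions
```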
Further, the data enhancement processing module comprises the following submodules (see the sketch after this list):
the blur convolution operation submodule is used for sequentially carrying out binarization processing and blur convolution operation on each pseudo label flying target image in the pseudo label flying target image set, so as to determine the blurred time-domain image of each pseudo label flying target image;
and the inverse Fourier transform submodule is used for sequentially performing Fourier transform, Wiener filtering and inverse Fourier transform on the blurred time-domain image of each pseudo label flying target image.
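A minimal numpy sketch of this motion-blur augmentation, assuming a horizontal motion kernel and a fixed noise-to-signal ratio for the Wiener filter (both are our assumptions; the patent fixes neither):

```python
import numpy as np

def motion_blur_augment(img, length=9, nsr=0.01):
    """Blur a 2-D grayscale image with a motion kernel, then restore it
    with a Wiener filter; the imperfectly restored image serves as an
    additional augmented training view.

    img    : 2-D float array
    length : motion-blur kernel length in pixels
    nsr    : noise-to-signal ratio used by the Wiener filter
    """
    # Normalised horizontal motion kernel, zero-padded to the image size.
    kernel = np.zeros(img.shape)
    kernel[0, :length] = 1.0 / length

    # Blur via multiplication in the Fourier domain (circular convolution).
    H = np.fft.fft2(kernel)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * H))

    # Wiener filter H* / (|H|^2 + NSR), then inverse Fourier transform.
    wiener = np.conj(H) / (np.abs(H) ** 2 + nsr)
    restored = np.real(np.fft.ifft2(np.fft.fft2(blurred) * wiener))
    return blurred, restored
```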
Further, the prediction module comprises the following submodules (a sketch of the per-target tracking step follows this list):
the first acquisition sub-module is used for inputting the sequence of the flight target images to be detected into the optimized supervising network model so as to acquire the observation positions of each flight target at different moments in each flight target image to be detected;
the construction submodule is used for constructing a motion state equation of each flight target according to the observation position of each flight target at different moments;
the estimation sub-module is used for estimating the position of each flight target at the next moment by adopting the motion state equation of each flight target;
the second acquisition sub-module is used for inputting the position estimation result at the next moment into the RPN network in the optimized supervising network model and inputting the images of the flight targets to be detected at the corresponding moments into the backbone network, so as to acquire the observation position of each flying target at the corresponding moment;
and the adjusting submodule is used for adjusting the motion state equation of each flight target according to the observed position of each flight target at the next moment so as to determine the final position of each flight target at the next moment.
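Putting these submodules together, and reusing the measurement update shown earlier, one tracking step per flying target might look like the following sketch, where `observe` stands in for the optimized supervising network fed through the RPN, and a linear motion state equation is assumed (all names are ours):

```python
import numpy as np

def track_step(x, P, F, Q, H, R, observe):
    """Predict a flying target's next position from its motion state
    equation, let the detector observe near the prediction, then correct
    the state with the returned observation.

    x, P    : current state and covariance
    F, Q    : state-transition matrix and process noise covariance
    H, R    : observation matrix and observation noise covariance
    observe : callable mapping a predicted position to the network's
              observed position at the next moment (assumed interface)
    """
    # Predict with the motion state equation.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # The RPN search window is seeded with the predicted position.
    z = observe(H @ x_pred)

    # Correct: update the gain matrix, the state and the covariance.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```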
The principles, terminology and formulas involved in the above method embodiments apply equally to this system embodiment and are not described again here.
It should be noted that the technical features of the above embodiments can be combined arbitrarily; for brevity, not every possible combination is described. However, any combination of these technical features that involves no contradiction shall be considered within the scope of this specification. The above examples express only several embodiments of the present application; their description is relatively specific and detailed, but shall not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and improvements without departing from the concept of the present application, and all such variations fall within its scope of protection. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (9)

1. A multi-aerial flight target tracking method based on semi-supervised learning, characterized by comprising the following steps:
S1, acquiring a flight target image sequence and dividing it into an earlier first flight target image sequence and a later second flight target image sequence;
S2, obtaining each flying target position from each flight target image in the first flight target image sequence and carrying out category marking to obtain a labeled flying target image set;
S3, constructing a supervising network model, and inputting the labeled flying target images into the supervising network model for supervised training;
S4, inputting each flight target image in the second flight target image sequence into the supervising network model after supervised training for pseudo label labeling to obtain a pseudo label flying target image set;
S5, sequentially performing data enhancement processing on each pseudo label flying target image in the pseudo label flying target image set;
S6, inputting each pseudo label flying target image in the pseudo label flying target image set, together with the corresponding data-enhanced pseudo label flying target image, into the supervising network model after supervised training for unsupervised training, so as to optimize the supervising network model;
and S7, acquiring an image sequence of the flight targets to be detected and inputting it into the optimized supervising network model to acquire the position of each flying target at the next moment.
2. The multi-aerial flight target tracking method according to claim 1, wherein in step S2 the specific process of acquiring the labeled flying target image set includes:
S21, respectively carrying out scale-invariant feature point detection on adjacent flight target images in the first flight target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flight target images;
S22, carrying out position matching on the feature points in the two groups of position feature point vectors to obtain a position feature point pair set;
S23, calculating the pixel distance of each position feature point pair in the position feature point pair set;
S24, judging whether the pixel distance is larger than a threshold value; if so, the position feature point corresponding to the pixel distance belongs to a flight target area, and the process proceeds to S25; if not, the position feature point does not belong to a flight target area, and the process ends;
S25, clustering all the position feature points belonging to flight target areas to determine the flight target area of each category;
and S26, acquiring, from each flight target image, the flight target area position that is the same as each flight target area and carrying out class marking to form the labeled flying target image set.
3. The multi-aerial flight target tracking method according to claim 2, wherein the supervision network model comprises a backbone network, an RPN network, a region-of-interest pooling module and a target classification network CLS;
the input ends of the RPN network and the interested region pooling module are connected with the output end of the backbone network; the output end of the interested region pooling module is connected with the target classification network CLS;
the backbone network is used for acquiring the depth image characteristics of each flying target from the labeled flying target image set;
the RPN is used for sequentially carrying out foreground-background classification and foreground position regression on the depth image characteristics of each flying target according to the position information and the category information of each flying target area in the labeled flying target image set so as to determine a candidate target area window of each flying target;
the region-of-interest pooling module is used for adjusting the candidate target area windows of the flying targets according to the sizes of the corresponding depth image features and then performing the region-of-interest pooling operation, so as to obtain a plurality of candidate target region-of-interest windows and the corresponding depth image features;
and the target classification network CLS is used for sequentially carrying out category judgment and window regression correction on the candidate target region-of-interest windows according to the depth image features.
4. The multi-aerial-flight target tracking method according to claim 3, wherein in step S5, the data enhancement process includes one or a combination of two or more of morphological processing, radial transformation, scale transformation, color transformation, and motion blur processing;
the specific implementation process of the motion blur processing comprises the following steps:
S51, sequentially carrying out binarization processing and blur convolution operation on each pseudo label flying target image in the pseudo label flying target image set to determine the blurred time-domain image of each pseudo label flying target image;
and S52, sequentially carrying out Fourier transform, Wiener filtering and inverse Fourier transform on the blurred time-domain image of each pseudo label flying target image.
5. The multi-aerial flight target tracking method according to claim 4, wherein the specific implementation process of step S7 comprises:
S71, inputting the image sequence of the flight targets to be detected into the optimized supervising network model to obtain the observation positions of the flying targets at different moments in each flight target image to be detected;
S72, constructing a motion state equation of each flying target according to its observation positions at different moments;
S73, estimating the position of each flying target at the next moment by using its motion state equation;
S74, inputting the position estimation result for the next moment into the RPN network of the optimized supervising network model, and inputting the flight target images to be detected at the corresponding moments into the backbone network, to obtain the observation position of each flying target at the corresponding moment;
and S75, adjusting the motion state equation of each flying target according to its observed position at the next moment to determine its final position at the next moment.
6. A multi-aerial flight target tracking system based on semi-supervised learning, the multi-aerial flight target tracking system comprising:
the first acquisition module is used for acquiring a flight target image sequence and dividing it into an earlier first flight target image sequence and a later second flight target image sequence;
the second acquisition module is used for acquiring each flight target position from each flight target image in the first flight target image sequence and performing class marking to acquire a labeled flight target image set;
the supervised training module is used for constructing a supervising network model and inputting the labeled flying target images into the supervising network model for supervised training;
the pseudo label labeling module is used for inputting each flight target image in the second flight target image sequence into the supervising network model after supervised training for pseudo label labeling, so as to obtain a pseudo label flying target image set;
the data enhancement processing module is used for sequentially performing data enhancement processing on each pseudo label flying target image in the pseudo label flying target image set;
the unsupervised training module is used for inputting each pseudo label flying target image in the pseudo label flying target image set, together with the corresponding data-enhanced pseudo label flying target image, into the supervising network model after supervised training for unsupervised training, so as to optimize the supervising network model; and the prediction module is used for acquiring the image sequence of the flight targets to be detected and inputting it into the optimized supervising network model, so as to acquire the position of each flying target at the next moment.
7. The multi-aerial flight target tracking system of claim 6, wherein the second acquisition module comprises:
the detection submodule is used for respectively carrying out scale-invariant feature point detection on adjacent flying target images in the first flying target image sequence to obtain two groups of position feature point vectors corresponding to the adjacent flying target images;
the position matching submodule is used for carrying out position matching on the characteristic points in the two groups of position characteristic point vectors to obtain a position characteristic point pair set;
the calculation submodule is used for calculating the pixel distance of each position characteristic point pair in the position characteristic point pair set;
the judgment submodule is used for judging whether the pixel distance is larger than a threshold value, if so, the position characteristic point corresponding to the pixel distance belongs to a flight target area, and the position characteristic point is transmitted to the clustering submodule; if not, the position characteristic point corresponding to the pixel distance does not belong to the flight target area, and the operation is finished;
the clustering submodule is used for clustering all the position characteristic points in the flight target area so as to determine the flight target area of each category;
and the class marking sub-module is used for acquiring the flight target area position which is the same as each flight target area from each flight target image and carrying out class marking to form a labeled flight target image set.
8. The multi-aerial flight target tracking system of claim 7, wherein the data enhancement processing module comprises:
the blur convolution operation submodule is used for sequentially carrying out binarization processing and blur convolution operation on each pseudo label flying target image in the pseudo label flying target image set, so as to determine the blurred time-domain image of each pseudo label flying target image;
and the inverse Fourier transform submodule is used for sequentially performing Fourier transform, Wiener filtering and inverse Fourier transform on the blurred time-domain image of each pseudo label flying target image.
9. The multi-aerial flight target tracking system of claim 8, wherein the prediction module comprises:
the first acquisition sub-module is used for inputting the image sequence of the flying targets to be detected into the optimized supervision network model so as to acquire the observation positions of the flying targets at different moments in each image of the flying targets to be detected;
the construction submodule is used for constructing a motion state equation of each flight target according to the observation position of each flight target at different moments;
the estimation submodule is used for estimating the position of each flight target at the next moment by adopting the motion state equation of each flight target;
the second acquisition sub-module is used for inputting the position estimation result at the next moment into the RPN network in the optimized supervising network model and inputting the images of the flight targets to be detected at the corresponding moments into the backbone network, so as to obtain the observation position of each flying target at the corresponding moment;
and the adjusting submodule is used for adjusting the motion state equation of each flight target according to the observed position of each flight target at the next moment so as to determine the final position of each flight target at the next moment.