CN112766102A - Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion

Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion

Info

Publication number
CN112766102A
CN112766102A (application CN202110018918.9A; granted publication CN112766102B)
Authority
CN
China
Prior art keywords
frame
hyperspectral
branch
tracking
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110018918.9A
Other languages
Chinese (zh)
Other versions
CN112766102B (en)
Inventor
王心宇
刘桢杞
钟燕飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110018918.9A priority Critical patent/CN112766102B/en
Publication of CN112766102A publication Critical patent/CN112766102A/en
Application granted granted Critical
Publication of CN112766102B publication Critical patent/CN112766102B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion. A deep-learning-based hyperspectral target tracking method is designed by combining it with the cycle consistency principle, so that the hyperspectral target tracking deep learning model can be trained without supervision, saving the cost of manual annotation. On the basis of a Siamese tracking framework, an RGB branch (spatial branch) and a hyperspectral branch are designed; the spatial branch is trained with RGB video data, the trained RGB model is then loaded into the network with its parameters fixed, and the hyperspectral branch is trained at the same time, so that fused features with higher robustness and discriminative power are obtained. Finally, the fused features are input into a discriminative correlation filter (DCF) to obtain the tracking result. The method addresses the need to manually annotate hyperspectral video data and the scarcity of hyperspectral training samples for deep learning models, and can effectively improve the precision and speed of a hyperspectral video tracking model.

Description

Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion
Technical Field
The invention relates to the field of computer vision, in particular to an unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion.
Background
Target tracking in hyperspectral video (high spatial resolution, high temporal resolution and high spectral resolution) is an emerging direction: given the target information in an initial frame of the hyperspectral video, the state of the target is predicted in subsequent frames. Compared with RGB video target tracking, hyperspectral video target tracking provides, in addition to spatial information, spectral information that can distinguish different materials. Even if targets have the same shape, hyperspectral video can still be used to track a target as long as the materials differ, which is an advantage RGB video target tracking does not have. Hyperspectral video target tracking can therefore play an important role in fields such as camouflaged target tracking and small target tracking, and it is attracting the attention of more and more researchers.
Meanwhile, hyperspectral video target tracking is a difficult task. Firstly, existing hyperspectral video target tracking algorithms represent the target with traditional hand-crafted features, which limits their performance. Secondly, hyperspectral video must be captured with a dedicated hyperspectral video camera, so training samples are limited; as a result, there is currently no hyperspectral video target tracking algorithm that is truly based on deep learning. Thirdly, supervised deep learning algorithms require a large number of manually annotated samples, and video annotation in particular is time-consuming and labor-intensive. Because of these problems, existing hyperspectral video target tracking algorithms perform poorly.
Disclosure of Invention
The invention aims to provide an unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion.
The unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion has the following three notable characteristics. Firstly, the cycle consistency principle is used so that the whole deep-learning-based hyperspectral target tracking algorithm can be trained unsupervised, without any manual annotation. Secondly, a correlation-filter hyperspectral video target tracking framework with spatial-spectral feature fusion is designed, which alleviates to some extent the scarcity of hyperspectral video training samples while fusing RGB and hyperspectral features to obtain features with higher robustness and discriminative power. Thirdly, a channel attention module is designed that computes the weights of the feature channels only in the initial frame, so that the network can dynamically aggregate different feature-channel weights for different targets.
The invention provides an unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion, which comprises the following steps of:
step 1, preprocessing video data;
step 2, randomly initializing a bounding box and obtaining, from the initialized bounding box, a template frame Z_i and subsequent search frames Z_{i+x}; the template frame Z_i and the search frames Z_{i+x} are RGB video frames or hyperspectral video frames;
step 3, the RGB branch, also called the spatial branch, is trained unsupervised using the cycle consistency principle, finally obtaining an optimized spatial branch model;
the spatial branch comprises template branch 1 and search branch 1, wherein template branch 1 takes the template frame Z_i containing the tracking target as its input image frame (the template frame Z_i here is an RGB video frame), and search branch 1 takes the search frame Z_{i+x}, i.e. a subsequent video frame, as its input image frame, with x > 0; the hyperspectral branch is removed when training the spatial branch, and only the spatial branch is trained;
the template branch 1 and the search branch 1 have the same structure and comprise a convolution layer, a nonlinear active layer, a convolution layer and a local response normalization layer;
step 4, the hyperspectral branch is trained unsupervised using the cycle consistency principle, finally obtaining an optimized spatial-hyperspectral model;
the hyperspectral branch comprises template branch 2 and search branch 2, wherein template branch 2 takes the template frame Z_i containing the tracked target as its input image frame, and search branch 2 takes the search frame Z_{i+x}, i.e. a subsequent video frame, as its input image frame, with x > 0; the trained spatial branch model is loaded when training the hyperspectral branch, and the spatial branch parameters are frozen so that they do not participate in back propagation;
the template branch 2 comprises a plurality of spectral feature extraction modules and a channel attention module which are connected in series, wherein the first two spectral feature extraction modules comprise a convolutional layer-batch normalization layer-nonlinear active layer, the third spectral feature extraction module comprises a convolutional layer-batch normalization layer-nonlinear active layer-convolutional layer, the channel attention module comprises a global average pooling layer-full connection layer-nonlinear active layer-full connection layer-Softmax, and the plurality of search branches 2 comprise only the spectral feature extraction modules which are connected in series and do not comprise the channel attention module;
step 5, the hyperspectral video frame X_1 containing the target to be tracked is input into the template branch of the trained network model, and the subsequent frames X_2, X_3, X_4, ..., X_i are sequentially input into the search branch of the network model, yielding the tracking result for each frame.
Further, the specific implementation manner of step 1 is as follows,
firstly, the video data are converted into a sequence of single-frame images X_i, where X_i is an RGB video frame or a hyperspectral video frame;
then every unlabeled video image frame X_i is resized to a video image frame Y_i of a fixed size.
Further, the step 2 is realized as follows,
on the basis of step 1, in the unlabeled video frame Y_i, a region of 90 × 90 pixels centered at coordinates [x, y] is selected as the target to be tracked; this region is the initialized BBOX. The 90 × 90 region is resized to Z_i of 125 × 125 pixels. At the same time, two frames Y_{i+a} and Y_{i+b} are randomly selected from the 10 frames Y_{i+1} to Y_{i+10}, with 10 >= a > 0, 10 >= b > 0 and a > b or a < b; likewise, 90 × 90 pixel regions centered at coordinates [x, y] are selected and resized to Z_{i+a} and Z_{i+b} of 125 × 125 pixels.
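To make this sampling concrete, the following is a small NumPy/OpenCV sketch of one way to implement it; the crop centre, the 90 × 90 → 125 × 125 resize and the choice of two frames among the next ten follow the text above, while `cv2.resize` and the helper name are illustrative assumptions, not taken from the filing:

```python
import numpy as np
import cv2

def sample_training_triplet(frames, i, rng=np.random):
    """Crop a 90x90 pseudo-target region in frame Y_i and the same window in two of
    the following ten frames, then resize each crop to 125x125 (Z_i, Z_{i+a}, Z_{i+b})."""
    h, w = frames[i].shape[:2]
    x = rng.randint(45, w - 45)                      # random centre coordinates [x, y]
    y = rng.randint(45, h - 45)
    a, b = sorted(rng.choice(np.arange(1, 11), size=2, replace=False))
    crops = []
    for idx in (i, i + a, i + b):
        patch = frames[idx][y - 45:y + 45, x - 45:x + 45]
        crops.append(cv2.resize(patch, (125, 125)))
    return crops                                     # [Z_i, Z_{i+a}, Z_{i+b}]
```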
Further, the specific implementation manner of the step 3 is as follows,
step 3.1, template branch 1 takes the template frame Z_i as its input image frame and search branch 1 takes the search frame Z_{i+x} as its input image frame; the hyperspectral branch is removed when training the spatial branch, and only the spatial branch is trained;
step 3.2, the template frame Z_i enters template branch 1; at this point Z_i is an RGB video frame; Z_i sequentially passes through a convolutional layer, a nonlinear activation layer, a convolutional layer and a local response normalization layer to obtain the feature F_t;
step 3.3, Z_{i+a} is input into search branch 1; at this point Z_{i+a} is an RGB video frame; Z_{i+a} sequentially passes through a convolutional layer, a nonlinear activation layer, a convolutional layer and a local response normalization layer to obtain the feature F_s;
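A minimal PyTorch sketch of a backbone with this layer order (convolution → nonlinear activation → convolution → local response normalization) is given below; the channel widths, kernel sizes and LRN size are assumptions, since the filing does not specify them:

```python
import torch
import torch.nn as nn

class SpatialBranch(nn.Module):
    """Shared Siamese backbone for template branch 1 and search branch 1."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3),   # convolutional layer
            nn.ReLU(inplace=True),                       # nonlinear activation layer
            nn.Conv2d(32, 32, kernel_size=3),            # convolutional layer
            nn.LocalResponseNorm(size=5),                # local response normalization layer
        )

    def forward(self, x):
        return self.features(x)

backbone = SpatialBranch()                       # the same weights produce F_t and F_s
F_t = backbone(torch.randn(1, 3, 125, 125))      # template feature from Z_i
F_s = backbone(torch.randn(1, 3, 125, 125))      # search feature from Z_{i+a}
```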
step 3.4, a ridge regression loss function is solved:

\min_{w} \left\| w \star F_t - H \right\|_2^2 + \lambda \left\| w \right\|_2^2

obtaining the filter w, where H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w} = \frac{\hat{F}_t^{\ast} \odot \hat{H}}{\hat{F}_t^{\ast} \odot \hat{F}_t + \lambda}

where \hat{w} is the Fourier transform of w, \hat{F}_t is the Fourier transform of F_t, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise (dot) product;
step 3.5, the final response R is computed from the filter w and the feature F_s of the subsequent frame:

R = \mathcal{F}^{-1}\left( \hat{w}^{\ast} \odot \hat{F}_s \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform;
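The two closed-form operations above can be written with FFTs as in the NumPy sketch below; this is standard correlation-filter algebra applied to a single-channel feature map, offered as an illustration of steps 3.4-3.5 rather than the filing's exact implementation:

```python
import numpy as np

def solve_filter(F_t, H, lam=1e-4):
    """Ridge regression in the Fourier domain:
    w_hat = conj(F_t_hat) * H_hat / (conj(F_t_hat) * F_t_hat + lam)."""
    F_t_hat, H_hat = np.fft.fft2(F_t), np.fft.fft2(H)
    return np.conj(F_t_hat) * H_hat / (np.conj(F_t_hat) * F_t_hat + lam)

def dcf_response(w_hat, F_s):
    """Response R = IFFT(conj(w_hat) * F_s_hat) computed on the search-frame feature."""
    return np.real(np.fft.ifft2(np.conj(w_hat) * np.fft.fft2(F_s)))

# toy usage with a Gaussian label H centred in a 125x125 map
yy, xx = np.mgrid[0:125, 0:125]
H = np.exp(-((xx - 62) ** 2 + (yy - 62) ** 2) / (2 * 5.0 ** 2))
F_t, F_s = np.random.rand(125, 125), np.random.rand(125, 125)
R = dcf_response(solve_filter(F_t, H), F_s)      # the peak of R locates the target
```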
step 3.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b} (the three frames constitute a training pair, and b > a), obtaining the tracking responses R_{i+a} and R_{i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_i;
step 3.7, the motion weight M_{motion} is calculated:
M_{motion}^{m} = \left\| R_{i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i, H_{i+a} is the ideal Gaussian output of Z_{i+a}, and m indexes the m different training pairs; the motion weight M_{motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target, and if a dynamic target is present, M_{motion} is weighted more heavily than when no dynamic target is present;
step 3.8, constructing a loss function:
L = \frac{1}{n} \sum_{m=1}^{n} M_{motion}^{m} \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_i is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{motion} reduces the influence of non-dynamic targets on network training;
step 3.9, the loss value, i.e. the loss function value L in step 3.8, is back-propagated to update the network model parameters; the network parameters in step 3.2 are updated by the stochastic gradient descent (SGD) algorithm, finally obtaining the optimized spatial branch model; a consolidated sketch of one such training step is given below.
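Steps 3.4-3.9 can be combined into one unsupervised training step as sketched below (PyTorch, differentiable FFT operations). The exact motion-weight and loss expressions are not reproduced in this text, so the forms used here simply follow the verbal description above and should be read as assumptions, as should the per-channel filter simplification:

```python
import torch

def unsupervised_step(backbone, optimizer, Z_i, Z_ia, Z_ib, H_i, H_ia, lam=1e-4):
    """Forward track Z_i -> Z_{i+a} -> Z_{i+b}, track back to Z_i, then minimise the
    motion-weighted consistency loss between the returned response R_i and the label H_i.
    H_i and H_ia are Gaussian labels with the same spatial size as the backbone features."""
    def solve(F_t, H):                                   # ridge regression in Fourier domain
        Ft, Hh = torch.fft.fft2(F_t), torch.fft.fft2(H)
        return torch.conj(Ft) * Hh / (torch.conj(Ft) * Ft + lam)

    def respond(w_hat, F_s):                             # correlation response, summed over channels
        r = torch.fft.ifft2(torch.conj(w_hat) * torch.fft.fft2(F_s)).real
        return r.sum(dim=1, keepdim=True)

    F_i, F_ia, F_ib = backbone(Z_i), backbone(Z_ia), backbone(Z_ib)
    R_ia = respond(solve(F_i, H_i), F_ia)                # forward: Z_i -> Z_{i+a}
    R_ib = respond(solve(F_ia, R_ia), F_ib)              # forward: Z_{i+a} -> Z_{i+b}
    R_i = respond(solve(F_ib, R_ib), F_i)                # backward: Z_{i+b} -> Z_i

    # Motion weight (assumed form): larger when the pseudo target actually moved.
    err = ((R_ia - H_ia) ** 2).flatten(1).sum(1) + ((R_i - H_i) ** 2).flatten(1).sum(1)
    m_motion = (err / err.mean().clamp_min(1e-12)).detach()

    loss = (m_motion * ((R_i - H_i) ** 2).flatten(1).sum(1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

When training step 4, the same loop would be reused with the fused spatial-hyperspectral feature extractor in place of `backbone`, the spatial-branch weights frozen, and the loss written as L_f.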
Further, the implementation manner of the step 4 is as follows,
step 4.1, template branch 2 takes the template frame Z_i as its input image frame (at this point Z_i is a hyperspectral video frame), and search branch 2 takes the search frame Z_{i+x} as its input image frame (at this point Z_{i+x} is a hyperspectral video frame); the trained spatial branch model is loaded when training the hyperspectral branch, and the spatial branch parameters are frozen so that they do not participate in back propagation;
step 4.2, the template frame Z_i enters template branch 2; three bands are selected from Z_i to form a pseudo-color video frame, which is passed through the spatial branch to obtain the feature F_t_rgb; at the same time, Z_i sequentially passes through a network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_t_hsi (the structure of the 3 spectral feature extraction modules is shown in FIG. 2); a weight function a of the F_t_hsi feature channels is calculated sequentially through a global average pooling layer, a fully connected layer, a nonlinear activation layer, a fully connected layer and Softmax, finally obtaining the weighted hyperspectral feature aF_t_hsi;
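A PyTorch sketch of the hyperspectral template path described in this step is shown below; the number of bands, channel widths and the fully connected reduction ratio are assumed values (the filing only fixes the layer order):

```python
import torch
import torch.nn as nn

def spectral_module(cin, cout, last=False):
    """First two modules: conv-BN-ReLU; the third module appends an extra conv layer."""
    layers = [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if last:
        layers.append(nn.Conv2d(cout, cout, 3, padding=1))
    return nn.Sequential(*layers)

class ChannelAttention(nn.Module):
    """Global average pooling -> FC -> ReLU -> FC -> Softmax, run only on the template frame."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Softmax(dim=1),
        )

    def forward(self, feat):                              # feat: (N, C, H, W)
        a = self.fc(feat.mean(dim=(2, 3)))                # GAP, then the FC stack
        return a.view(feat.size(0), -1, 1, 1)             # per-channel weight function a

bands = 16                                                # assumed number of spectral bands
hsi_net = nn.Sequential(spectral_module(bands, 32),
                        spectral_module(32, 32),
                        spectral_module(32, 32, last=True))
attention = ChannelAttention(32)

Z_i_hsi = torch.randn(1, bands, 125, 125)
F_t_hsi = hsi_net(Z_i_hsi)
a = attention(F_t_hsi)                                    # computed once, on the template frame
aF_t_hsi = a * F_t_hsi                                    # weighted hyperspectral feature
```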
step 4.3, Z_{i+a} is input into search branch 2; three bands are selected from Z_{i+a} to form a pseudo-color video frame, which is passed through the spatial branch to obtain the feature F_s_rgb; in the same way, Z_{i+a} sequentially passes through the network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_s_hsi (the structure of the 3 spectral feature extraction modules is shown in FIG. 2); the weight a calculated in step 4.2 is used to finally obtain the weighted hyperspectral feature aF_s_hsi;
step 4.4, a ridge regression loss function is solved:

\min_{w_f} \left\| w_f \star F_{t,f} - H \right\|_2^2 + \lambda \left\| w_f \right\|_2^2

obtaining the filter w_f, where F_{t,f} = aF_t_hsi + F_t_rgb is the fused template feature, H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w}_f = \frac{\hat{F}_{t,f}^{\ast} \odot \hat{H}}{\hat{F}_{t,f}^{\ast} \odot \hat{F}_{t,f} + \lambda}

where \hat{w}_f is the Fourier transform of w_f, \hat{F}_{t,f} is the Fourier transform of F_{t,f}, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise product;
step 4.5, the final response R_f is computed from the filter w_f and the fused feature F_{s,f} = F_s_rgb + aF_s_hsi of the subsequent frame:

R_f = \mathcal{F}^{-1}\left( \hat{w}_f^{\ast} \odot \hat{F}_{s,f} \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform;
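The fusion used in steps 4.4-4.5 is a weighted sum taken directly from the description (F_f = F_rgb + a·F_hsi); the short sketch below assumes, additionally, that both branches output feature maps of identical shape:

```python
import torch

def fuse(F_rgb, F_hsi, a):
    """Spatial-spectral fusion F_f = F_rgb + a * F_hsi, with a computed on the template frame."""
    return F_rgb + a * F_hsi

# both branches are assumed to output feature maps of identical shape (N, C, H, W)
F_t_rgb, F_t_hsi = torch.randn(1, 32, 121, 121), torch.randn(1, 32, 121, 121)
a = torch.softmax(torch.randn(1, 32), dim=1).view(1, 32, 1, 1)
F_t_f = fuse(F_t_rgb, F_t_hsi, a)        # fused template feature fed to the filter w_f
F_s_rgb, F_s_hsi = torch.randn(1, 32, 121, 121), torch.randn(1, 32, 121, 121)
F_s_f = fuse(F_s_rgb, F_s_hsi, a)        # fused search feature used to compute R_f
```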
step 4.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b}, b > a, obtaining the tracking responses R_{f,i+a} and R_{f,i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_{f,i};
step 4.7, the motion weight M_{f,motion} is calculated:
M_{f,motion}^{m} = \left\| R_{f,i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i and H_{i+a} is the ideal Gaussian output of Z_{i+a}; the weight parameter M_{f,motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target, and if a dynamic target is present, M_{f,motion} is weighted more heavily than when no dynamic target is present;
step 4.8, constructing a loss function:
L_f = \frac{1}{n} \sum_{m=1}^{n} M_{f,motion}^{m} \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_{f,i} is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{f,motion} reduces the influence of non-dynamic targets on network training;
step 4.9, the loss value, i.e. the loss function value L_f in step 4.8, is back-propagated to update the network model parameters; the network parameters in step 4.2 are updated, finally obtaining the optimized spatial-hyperspectral model.
The method of the invention has the following notable effects: (1) the network is trained unsupervised based on the cycle consistency principle, saving labeling labor; (2) a tracking model fusing RGB features and hyperspectral features is trained end to end with deep learning, and inference is fast, tens of times faster than traditional hand-crafted feature methods; (3) a channel attention mechanism aggregates, in the initial frame, the features that are most effective for the target to be tracked, increasing the network's ability to discriminate the target.
Drawings
FIG. 1 is a schematic view of the cycle consistency in step 4 of embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of the hyperspectral branch in step 4 of embodiment 1 of the present invention.
FIG. 3 is a schematic diagram of the spatial branch in step 3 of embodiment 1 of the present invention.
FIG. 4 is a schematic diagram of the tracking result in step 3 of embodiment 1 of the present invention, in which the numbers indicate the 4th frame and the 12th frame respectively; the box represents the position and size of the tracked target and moves and changes with the movement and deformation of the target (the box grows as the target grows and shrinks as the target shrinks).
Fig. 5 is a flowchart of embodiment 1 of the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example 1:
the embodiment of the invention provides an unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion, which comprises the following steps of:
step 1, video data preprocessing, the step further comprising:
step 1.1, the video data are converted into a sequence of single-frame images X_i (RGB video frames or hyperspectral video frames).
Step 1.2, every unlabeled video image frame X_i is resized to a video image frame Y_i of 200 × 200 pixels.
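For steps 1.1-1.2, a possible preprocessing sketch is shown below. OpenCV is used here only to illustrate the RGB case; a hyperspectral cube would be loaded with whatever reader the camera vendor provides and then resized band by band in the same way:

```python
import cv2

def video_to_resized_frames(video_path, size=(200, 200)):
    """Decode a video file into a list of frames X_i resized to 200 x 200 frames Y_i."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()                  # frame X_i
        if not ok:
            break
        frames.append(cv2.resize(frame, size))  # frame Y_i
    cap.release()
    return frames
```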
Step 2, randomly initializing a Bounding Box (BBOX), and the step further comprises:
On the basis of step 1, a region of 90 × 90 pixels centered at coordinates [x, y] is randomly selected in the unlabeled video frame Y_i as the target to be tracked (this region is the initialized BBOX). The 90 × 90 region is resized to Z_i of 125 × 125 pixels. At the same time, two frames Y_{i+a} and Y_{i+b} (10 >= a > 0, 10 >= b > 0, a > b or a < b) are randomly selected from the 10 frames Y_{i+1} to Y_{i+10}; likewise, 90 × 90 pixel regions centered at coordinates [x, y] are selected and resized to Z_{i+a} and Z_{i+b} of 125 × 125 pixels.
Step 3, unsupervised training of RGB branches (spatial branches) by using a cycle consistency principle, the step further comprises:
and 3.1, forming the whole network structure by using a Siamese network basis, and dividing the whole network structure into a template branch and a search branch. Template branching by template frame Zi(including the target to be tracked, in this case ZiRepresenting RGB video frames) are input image frames, and the template branches are further divided into spatial branches and hyperspectral branches. Searching for branches to search for frame Zi+x(subsequent video frame, x)>0) The input image frame is also divided into a spatial branch and a hyperspectral branch. And (4) removing the hyperspectral branches when training the spatial branches, and only training the spatial branches.
Step 3.2, template frame ZiEnter template branch, at this time ZiIs an RGB video frame. ZiThe characteristic F _ t is obtained through a convolutional layer, a nonlinear active layer, a convolutional layer and a local response normalization layer in sequence.
Step 3.3, adding Zi+a(suppose b>a) Enter search Branch, at this time Zi+aIs an RGB video frame. Zi+aThe characteristic F _ s is obtained through a convolutional layer, a nonlinear active layer, a convolutional layer and a local response normalization layer in sequence.
Step 3.4, a ridge regression loss function is solved:

\min_{w} \left\| w \star F_t - H \right\|_2^2 + \lambda \left\| w \right\|_2^2

The resulting filter is w, where H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w} = \frac{\hat{F}_t^{\ast} \odot \hat{H}}{\hat{F}_t^{\ast} \odot \hat{F}_t + \lambda}

where \hat{w} is the Fourier transform of w, \hat{F}_t is the Fourier transform of F_t, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise (dot) product.
Step 3.5, the final response R can be calculated from the filter w and the feature F_s of the subsequent frame:

R = \mathcal{F}^{-1}\left( \hat{w}^{\ast} \odot \hat{F}_s \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform.
Step 3.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b} (the three frames constitute a training pair), obtaining the tracking responses R_{i+a} and R_{i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_i.
Step 3.7, the motion weight M_{motion} is calculated:
M_{motion}^{m} = \left\| R_{i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i, H_{i+a} is the ideal Gaussian output of Z_{i+a}, and m indexes the m different training pairs. The motion weight M_{motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target (if a dynamic target is present, M_{motion} is weighted more heavily than when no dynamic target is present).
Step 3.8, constructing a loss function:
L = \frac{1}{n} \sum_{m=1}^{n} M_{motion}^{m} \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_i is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{motion} can reduce the influence of non-dynamic targets on network training.
Step 3.9, the loss value is back-propagated to update the network model parameters; the network parameters in step 3.2 are updated by the stochastic gradient descent (SGD) algorithm, finally obtaining the optimized spatial branch model.
Step 4, unsupervised training of the hyperspectral branch by using the cycle consistency principle, the step further comprising:
and 4.1, forming the whole network structure by using a Siamese network basis, and dividing the whole network structure into a template branch and a search branch. Template branching by template frame Zi(including the target to be tracked, and the input video frames Zi and the like all represent hyperspectral video frames at the moment) are input image frames, and the template branches are divided into space branches and hyperspectral branches. Searching for branches to search for frame Zi+x(subsequent video frame, x)>0) The input image frame is also divided into a spatial branch and a hyperspectral branch. Model for loading spatial branches during training of hyperspectral branches
Figure BDA0002888002740000104
While the frozen spatial branch parameters do not participate in the back propagation.
Step 4.2, the template frame Z_i is input into the template branch. Three bands are selected from Z_i to form a pseudo-color video frame, from which the feature F_t_rgb is obtained (spatial branch). At the same time, Z_i sequentially passes through a network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_t_hsi. The structure of the 3 spectral feature extraction modules is shown in FIG. 2. A weight function a of the F_t_hsi feature channels (the weight function is calculated only in the template frame; a is used directly in subsequent frames) is calculated sequentially through a global average pooling layer, a fully connected layer, a nonlinear activation layer, a fully connected layer and Softmax (channel attention mechanism), finally obtaining the weighted hyperspectral feature aF_t_hsi.
Step 4.3, Z_{i+a} (assuming b > a) enters the search branch. Three bands (the same band composition as in step 4.2) are selected from Z_{i+a} to form a pseudo-color video frame, from which the feature F_s_rgb is obtained. In the same way, Z_{i+a} sequentially passes through the network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_s_hsi. The structure of the 3 spectral feature extraction modules is shown in FIG. 2. The weight a calculated in step 4.2 is used to finally obtain the weighted hyperspectral feature aF_s_hsi.
Step 4.4, a ridge regression loss function is solved:

\min_{w_f} \left\| w_f \star F_{t,f} - H \right\|_2^2 + \lambda \left\| w_f \right\|_2^2

obtaining the filter w_f, where F_{t,f} = aF_t_hsi + F_t_rgb is the fused template feature, H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w}_f = \frac{\hat{F}_{t,f}^{\ast} \odot \hat{H}}{\hat{F}_{t,f}^{\ast} \odot \hat{F}_{t,f} + \lambda}

where \hat{w}_f is the Fourier transform of w_f, \hat{F}_{t,f} is the Fourier transform of F_{t,f}, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise product.
Step 4.5, the final response R_f can be calculated from the filter w_f and the fused feature F_{s,f} = F_s_rgb + aF_s_hsi of the subsequent frame:

R_f = \mathcal{F}^{-1}\left( \hat{w}_f^{\ast} \odot \hat{F}_{s,f} \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform.
Step 4.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b}, obtaining the tracking responses R_{f,i+a} and R_{f,i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_{f,i}.
Step 4.7, the motion weight M_{f,motion} is calculated:
M_{f,motion}^{m} = \left\| R_{f,i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i and H_{i+a} is the ideal Gaussian output of Z_{i+a}. The weight parameter M_{f,motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target (if a dynamic target is present, M_{f,motion} is weighted more heavily than when no dynamic target is present).
Step 4.8, constructing a loss function:
L_f = \frac{1}{n} \sum_{m=1}^{n} M_{f,motion}^{m} \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_{f,i} is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{f,motion} can reduce the influence of non-dynamic targets on network training.
Step 4.9, the loss value is back-propagated to update the network model parameters; the network parameters in step 4.2 are updated, finally obtaining the optimized spatial-hyperspectral model.
Step 5, the hyperspectral video frame X_1 containing the target to be tracked is input into the template branch of the trained network model, and the subsequent frames X_2, X_3, X_4, ..., X_i are sequentially input into the search branch of the network model, yielding the tracking result for each frame.
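Read as code, step 5 is a simple loop in which the template branch runs once and the search branch runs per frame. The sketch below is a hedged outline: `backbone`, `solve_filter`, `dcf_response`, `crop` and `bbox_from_peak` are placeholder callables standing for the trained fused feature extractor, the ridge-regression filter of step 4.4, the response of step 4.5, a patch-cropping helper and a peak-to-box conversion; none of these names come from the filing:

```python
def track_sequence(frames, init_bbox, backbone, solve_filter, dcf_response,
                   crop, bbox_from_peak):
    """Template branch on X_1 once, then DCF search on every subsequent frame."""
    template_feat = backbone(crop(frames[0], init_bbox))   # template branch, run once
    w = solve_filter(template_feat)                         # filter from the fused template feature
    bbox, results = init_bbox, []
    for frame in frames[1:]:                                # X_2, X_3, ..., X_i
        search_feat = backbone(crop(frame, bbox))           # search branch
        response = dcf_response(w, search_feat)
        bbox = bbox_from_peak(response, bbox)               # move the box to the response peak
        results.append(bbox)
    return results
```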
The method of the invention has the following notable effects: (1) the network is trained unsupervised based on the cycle consistency principle, saving labeling labor; (2) a tracking model fusing RGB features and hyperspectral features is trained end to end with deep learning, and inference is fast, tens of times faster than traditional hand-crafted feature methods; (3) a channel attention mechanism aggregates, in the initial frame, the features that are most effective for the target to be tracked, increasing the network's ability to discriminate the target. The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments, or alternatives may be employed, by those skilled in the art without departing from the spirit of the invention or the scope of the appended claims.

Claims (5)

1. An unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion is characterized by comprising the following steps of:
step 1, preprocessing video data;
step 2, randomly initializing a bounding box and obtaining, from the initialized bounding box, a template frame Z_i and subsequent search frames Z_{i+x}; the template frame Z_i and the search frames Z_{i+x} are RGB video frames or hyperspectral video frames;
step 3, the RGB branch, also called the spatial branch, is trained unsupervised using the cycle consistency principle, finally obtaining an optimized spatial branch model;
the spatial branch comprises template branch 1 and search branch 1, wherein template branch 1 takes the template frame Z_i containing the tracking target as its input image frame (the template frame Z_i here is an RGB video frame), and search branch 1 takes the search frame Z_{i+x}, i.e. a subsequent video frame, as its input image frame, with x > 0; the hyperspectral branch is removed when training the spatial branch, and only the spatial branch is trained;
the template branch 1 and the search branch 1 have the same structure and comprise a convolution layer, a nonlinear active layer, a convolution layer and a local response normalization layer;
step 4, the hyperspectral branch is trained unsupervised using the cycle consistency principle, finally obtaining an optimized spatial-hyperspectral model;
the hyperspectral branch comprises template branch 2 and search branch 2, wherein template branch 2 takes the template frame Z_i containing the tracked target as its input image frame (the template frame Z_i here is a hyperspectral video frame), and search branch 2 takes the search frame Z_{i+x}, i.e. a subsequent video frame, as its input image frame, with x > 0; the trained spatial branch model is loaded when training the hyperspectral branch, and the spatial branch parameters are frozen so that they do not participate in back propagation;
the template branch 2 comprises a plurality of spectral feature extraction modules and a channel attention module which are connected in series, wherein the first two spectral feature extraction modules comprise a convolutional layer-batch normalization layer-nonlinear active layer, the third spectral feature extraction module comprises a convolutional layer-batch normalization layer-nonlinear active layer-convolutional layer, the channel attention module comprises a global average pooling layer-full-connection layer-nonlinear active layer-full-connection layer-Softmax, and the plurality of search branches 2 comprise only the spectral feature extraction modules which are connected in series and do not comprise the channel attention module;
step 5, the hyperspectral video frame X_1 containing the target to be tracked is input into the template branch of the trained network model, and the subsequent video frames X_2, X_3, X_4, ..., X_i are sequentially input into the search branch of the network model, yielding the tracking result for each frame.
2. The unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion of claim 1, characterized in that the specific implementation of step 1 is as follows,
firstly, the video data are converted into a sequence of single-frame images X_i, where X_i is an RGB video frame or a hyperspectral video frame;
then every unlabeled video image frame X_i is resized to a video image frame Y_i of a fixed size.
3. The unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion of claim 1, characterized in that the implementation of step 2 is as follows,
on the basis of step 1, in the unlabeled video frame Y_i, a region of 90 × 90 pixels centered at coordinates [x, y] is selected as the target to be tracked; this region is the initialized BBOX. The 90 × 90 region is resized to Z_i of 125 × 125 pixels. At the same time, two frames Y_{i+a} and Y_{i+b} are randomly selected from the 10 frames Y_{i+1} to Y_{i+10}, with 10 >= a > 0, 10 >= b > 0 and a > b or a < b; likewise, 90 × 90 pixel regions centered at coordinates [x, y] are selected and resized to Z_{i+a} and Z_{i+b} of 125 × 125 pixels.
4. The unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion of claim 1, characterized in that the specific implementation of step 3 is as follows,
step 3.1, template branch 1 takes the template frame Z_i as its input image frame and search branch 1 takes the search frame Z_{i+x} as its input image frame; the hyperspectral branch is removed when training the spatial branch, and only the spatial branch is trained;
step 3.2, the template frame Z_i enters template branch 1; at this point Z_i is an RGB video frame; Z_i sequentially passes through a convolutional layer, a nonlinear activation layer, a convolutional layer and a local response normalization layer to obtain the feature F_t;
step 3.3, Z_{i+a} is input into search branch 1; at this point Z_{i+a} is an RGB video frame; Z_{i+a} sequentially passes through a convolutional layer, a nonlinear activation layer, a convolutional layer and a local response normalization layer to obtain the feature F_s;
step 3.4, a ridge regression loss function is solved:

\min_{w} \left\| w \star F_t - H \right\|_2^2 + \lambda \left\| w \right\|_2^2

obtaining the filter w, where H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w} = \frac{\hat{F}_t^{\ast} \odot \hat{H}}{\hat{F}_t^{\ast} \odot \hat{F}_t + \lambda}

where \hat{w} is the Fourier transform of w, \hat{F}_t is the Fourier transform of F_t, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise (dot) product;
step 3.5, the final response R is computed from the filter w and the feature F_s of the subsequent frame:

R = \mathcal{F}^{-1}\left( \hat{w}^{\ast} \odot \hat{F}_s \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform;
step 3.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b} (the three frames constitute a training pair, and b > a), obtaining the tracking responses R_{i+a} and R_{i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_i;
step 3.7, the motion weight M_{motion} is calculated:
M_{motion}^{m} = \left\| R_{i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i, H_{i+a} is the ideal Gaussian output of Z_{i+a}, and m indexes the m different training pairs; the motion weight M_{motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target, and if a dynamic target is present, M_{motion} is weighted more heavily than when no dynamic target is present;
step 3.8, constructing a loss function:
L = \frac{1}{n} \sum_{m=1}^{n} M_{motion}^{m} \left\| R_{i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_i is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{motion} reduces the influence of non-dynamic targets on network training;
step 3.9, the loss value, i.e. the loss function value L in step 3.8, is back-propagated to update the network model parameters; the network parameters in step 3.2 are updated based on the stochastic gradient descent algorithm, finally obtaining the optimized spatial branch model.
5. The unsupervised hyperspectral video target tracking method based on spatial-spectral feature fusion of claim 1, characterized in that the implementation of step 4 is as follows,
step 4.1, template branch 2 takes the template frame Z_i as its input image frame, and search branch 2 takes the search frame Z_{i+x} as its input image frame; the trained spatial branch model is loaded when training the hyperspectral branch, and the spatial branch parameters are frozen so that they do not participate in back propagation;
step 4.2, the template frame Z_i enters template branch 2; three bands are selected from Z_i to form a pseudo-color video frame, which is passed through the spatial branch to obtain the feature F_t_rgb; at the same time, Z_i sequentially passes through a network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_t_hsi; a weight function a of the F_t_hsi feature channels is calculated sequentially through a global average pooling layer, a fully connected layer, a nonlinear activation layer, a fully connected layer and Softmax, finally obtaining the weighted hyperspectral feature aF_t_hsi;
step 4.3, Z_{i+a} is input into search branch 2; three bands are selected from Z_{i+a} to form a pseudo-color video frame, which is passed through the spatial branch to obtain the feature F_s_rgb; in the same way, Z_{i+a} sequentially passes through the network consisting of 3 serially connected spectral feature extraction modules to obtain the feature F_s_hsi; the weight a calculated in step 4.2 is used to finally obtain the weighted hyperspectral feature aF_s_hsi;
step 4.4, a ridge regression loss function is solved:

\min_{w_f} \left\| w_f \star F_{t,f} - H \right\|_2^2 + \lambda \left\| w_f \right\|_2^2

obtaining the filter w_f, where F_{t,f} = aF_t_hsi + F_t_rgb is the fused template feature, H is an ideal Gaussian response and λ is a constant; in the Fourier domain the solution is

\hat{w}_f = \frac{\hat{F}_{t,f}^{\ast} \odot \hat{H}}{\hat{F}_{t,f}^{\ast} \odot \hat{F}_{t,f} + \lambda}

where \hat{w}_f is the Fourier transform of w_f, \hat{F}_{t,f} is the Fourier transform of F_{t,f}, \hat{H} is the Fourier transform of H, the superscript \ast denotes the complex conjugate, and \odot denotes the element-wise product;
step 4.5, the final response R_f is computed from the filter w_f and the fused feature F_{s,f} = F_s_rgb + aF_s_hsi of the subsequent frame:

R_f = \mathcal{F}^{-1}\left( \hat{w}_f^{\ast} \odot \hat{F}_{s,f} \right)

where \mathcal{F}^{-1} denotes the inverse Fourier transform;
step 4.6, forward tracking is performed first, with the tracking sequence Z_i → Z_{i+a} → Z_{i+b}, b > a, obtaining the tracking responses R_{f,i+a} and R_{f,i+b}; backward tracking is then performed with the tracking sequence Z_{i+b} → Z_i, obtaining the tracking response R_{f,i};
step 4.7, the motion weight M_{f,motion} is calculated:
M_{f,motion}^{m} = \left\| R_{f,i+a}^{m} - H_{i+a}^{m} \right\|_2^2 + \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where H_i is the ideal Gaussian output of the initial frame Z_i and H_{i+a} is the ideal Gaussian output of Z_{i+a}; the weight parameter M_{f,motion} is calculated to judge whether the randomly initialized bounding box contains a dynamic target, and if a dynamic target is present, M_{f,motion} is weighted more heavily than when no dynamic target is present;
step 4.8, constructing a loss function:
L_f = \frac{1}{n} \sum_{m=1}^{n} M_{f,motion}^{m} \left\| R_{f,i}^{m} - H_{i}^{m} \right\|_2^2
where n represents the maximum value of the batch size, R_{f,i} is the tracking response from Z_{i+b} back to Z_i, and H_i is the ideal Gaussian response of the initial frame Z_i; M mini-batches are used for training simultaneously, each mini-batch being a training pair of three image frames, and the weight parameter M_{f,motion} reduces the influence of non-dynamic targets on network training;
step 4.9, the loss value, i.e. the loss function value L_f in step 4.8, is back-propagated to update the network model parameters; the network parameters in step 4.2 are updated, finally obtaining the optimized spatial-hyperspectral model.
CN202110018918.9A 2021-01-07 2021-01-07 Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion Active CN112766102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018918.9A CN112766102B (en) 2021-01-07 2021-01-07 Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110018918.9A CN112766102B (en) 2021-01-07 2021-01-07 Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion

Publications (2)

Publication Number Publication Date
CN112766102A true CN112766102A (en) 2021-05-07
CN112766102B CN112766102B (en) 2024-04-26

Family

ID=75700670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018918.9A Active CN112766102B (en) 2021-01-07 2021-01-07 Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion

Country Status (1)

Country Link
CN (1) CN112766102B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN117689692A (en) * 2023-12-20 2024-03-12 中国人民解放军海军航空大学 Attention mechanism guiding matching associated hyperspectral and RGB video fusion tracking method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038684A (en) * 2017-04-10 2017-08-11 南京信息工程大学 A kind of method for lifting TMI spatial resolution
CN108765280A (en) * 2018-03-30 2018-11-06 徐国明 A kind of high spectrum image spatial resolution enhancement method
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN111062888A (en) * 2019-12-16 2020-04-24 武汉大学 Hyperspectral image denoising method based on multi-target low-rank sparsity and spatial-spectral total variation
CN111325116A (en) * 2020-02-05 2020-06-23 武汉大学 Remote sensing image target detection method capable of evolving based on offline training-online learning depth
CN111724411A (en) * 2020-05-26 2020-09-29 浙江工业大学 Multi-feature fusion tracking method based on hedging algorithm
WO2020199205A1 (en) * 2019-04-04 2020-10-08 合刃科技(深圳)有限公司 Hybrid hyperspectral image reconstruction method and system
US20200327679A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deeply and densely connected neural network
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038684A (en) * 2017-04-10 2017-08-11 南京信息工程大学 A kind of method for lifting TMI spatial resolution
CN108765280A (en) * 2018-03-30 2018-11-06 徐国明 A kind of high spectrum image spatial resolution enhancement method
WO2020199205A1 (en) * 2019-04-04 2020-10-08 合刃科技(深圳)有限公司 Hybrid hyperspectral image reconstruction method and system
US20200327679A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deeply and densely connected neural network
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 A kind of visual target tracking method based on adaptive main body sensitivity
CN111062888A (en) * 2019-12-16 2020-04-24 武汉大学 Hyperspectral image denoising method based on multi-target low-rank sparsity and spatial-spectral total variation
CN111325116A (en) * 2020-02-05 2020-06-23 武汉大学 Remote sensing image target detection method capable of evolving based on offline training-online learning depth
CN111724411A (en) * 2020-05-26 2020-09-29 浙江工业大学 Multi-feature fusion tracking method based on hedging algorithm
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HONG KE: "Fusion of RGB and hyperspectral images based on superpixel segmentation", Electronic Technology & Software Engineering, no. 03 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113628244B (en) * 2021-07-05 2023-11-28 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN117689692A (en) * 2023-12-20 2024-03-12 中国人民解放军海军航空大学 Attention mechanism guiding matching associated hyperspectral and RGB video fusion tracking method

Also Published As

Publication number Publication date
CN112766102B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN108665496B (en) End-to-end semantic instant positioning and mapping method based on deep learning
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN112884064B (en) Target detection and identification method based on neural network
CN110570371B (en) Image defogging method based on multi-scale residual error learning
Yu et al. High-resolution deep image matting
CN112766102A (en) Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion
CN107680106A (en) A kind of conspicuousness object detection method based on Faster R CNN
CN112507777A (en) Optical remote sensing image ship detection and segmentation method based on deep learning
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN109919246A (en) Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN113095371B (en) Feature point matching method and system for three-dimensional reconstruction
CN115512251A (en) Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement
CN115049945B (en) Unmanned aerial vehicle image-based wheat lodging area extraction method and device
CN114219824A (en) Visible light-infrared target tracking method and system based on deep network
CN115545166A (en) Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof
CN114612709A (en) Multi-scale target detection method guided by image pyramid characteristics
CN106504219B (en) Constrained path morphology high-resolution remote sensing image road Enhancement Method
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Peng et al. RSBNet: One-shot neural architecture search for a backbone network in remote sensing image recognition
CN111339342B (en) Three-dimensional model retrieval method based on angle ternary center loss

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant