CN113283407A - Twin network target tracking method based on channel and space attention mechanism - Google Patents
- Publication number
- CN113283407A (application number CN202110828947.1A)
- Authority
- CN
- China
- Prior art keywords
- target
- attention mechanism
- channel
- target image
- network model
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a twin network target tracking method based on a channel and space attention mechanism, which comprises the following steps: processing a video or image data set to obtain a plurality of target images of uniform image size; constructing a novel backbone network model based on a convolutional neural network model, a channel attention mechanism and a spatial attention mechanism; extracting training samples from the plurality of target images to train the novel backbone network model; extracting deep features of a target image sample from the plurality of target images by using the trained novel backbone network model, and performing similarity matching on the deep features in a target image candidate region to obtain a plurality of target candidate blocks, wherein each target candidate block corresponds to a similarity score; and tracking the target by using the target candidate block with the maximum similarity score. The appearance model of the tracking algorithm designed by the invention has better robustness and accuracy.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a twin network target tracking method based on a channel and space attention mechanism.
Background
Target tracking is an important topic in computer vision, with practical applications in autonomous driving, video surveillance, video analysis, medicine, the military and other fields. Because the application scenarios of target tracking are broad and complex, a tracked target against a complex background often undergoes deformation, and challenging problems such as motion blur and occlusion arise. Given the demands of commercial, industrial, military and medical applications, research on target tracking technology is therefore extremely valuable.
Generally, target tracking algorithms fall into two types: discriminative algorithms and generative algorithms. Discriminative model-based algorithms can effectively distinguish the tracked object from the surrounding background. Generative model-based algorithms compare candidates within a given search region using a similarity function learned between the target image sample and candidate-region image samples. In recent years, with the advent of large-scale publicly labeled image data sets and the rapid development of computer hardware and software, deep learning has been highly successful in many areas of image processing. Among these methods, discriminative correlation filters based on deep learning have been successfully applied to target tracking because of their fast operation speed. In addition, twin (Siamese) network-based tracking algorithms have gained wide attention in target tracking tasks: a twin network architecture performs template matching on detected target candidate samples and computes the highest similarity between the target region and the candidate region to obtain the position of the target image.
However, in the prior art, when performing visual target tracking, the convolutional neural network model, the channel attention mechanism and the spatial attention mechanism are not combined at the same time, and the accuracy and robustness of performing target tracking are not ideal.
Disclosure of Invention
In view of the above situation, there is a need to solve the problem in the prior art that, when performing visual target tracking, the accuracy and robustness of target tracking are not ideal without simultaneously combining the convolutional neural network model, the channel and the spatial attention mechanism.
The embodiment of the invention provides a twin network target tracking method based on a channel and space attention mechanism, wherein the method comprises the following steps:
the method comprises the following steps: processing the video or image data set to obtain a plurality of target images of uniform image size;
step two: constructing and obtaining a novel backbone network model based on a convolutional neural network model, a channel attention mechanism and a space attention mechanism;
step three: extracting training samples from the plurality of target images to train the novel backbone network model;
step four: extracting deep features of a target image sample from the multiple target images by using the trained novel backbone network model, and performing similarity matching on the deep features of the target image sample in a target image candidate region to obtain multiple target candidate blocks, wherein each target candidate block corresponds to a similarity score;
step five: and tracking the target by using the acquired target candidate block with the maximum similarity score.
The invention provides a twin network target tracking method based on a channel and space attention mechanism, which comprises the steps of firstly processing a video or image data set to obtain a target image with a uniform image size, then jointly constructing a novel backbone network model based on a convolutional neural network model, the channel attention mechanism and the space attention mechanism, then extracting a training sample from the target image, training the novel backbone network model, extracting deep features of the target image sample from the target image by using the trained novel backbone network model, further performing similarity matching in a target image candidate area to obtain a plurality of target candidate blocks, and finally performing target tracking by using the obtained target candidate blocks with the maximum similarity score.
According to the method, GOT-10k is used as the training set to adjust the model parameters during offline training, so that targets in the video can be represented more accurately; feature extraction is then performed by a lightweight convolutional neural network model. The appearance model of the tracking algorithm designed by the invention has better robustness and accuracy.
The twin network target tracking method based on the channel and space attention mechanism is characterized in that the novel backbone network model is a twin network framework, and the twin network framework comprises a template branch and a search branch;
wherein the step of extracting training samples from the plurality of target images comprises:
when the sub-window searching the target image extends beyond the range of the target image, the missing image portion is filled with the average RGB values.
The twin network target tracking method based on the channel and space attention mechanism comprises the following steps of:
inputting a target image through the template branch and the search branch respectively, and acquiring deep features of a target image sample according to the template branch and the search branch;
in the twin network framework, the following formula exists:

h(L_{kτ}x) = L_τ h(x)

where h denotes the mapping function of the input-output signal, k denotes the stride length, τ is the translation value of the active area in the input-output signal, L_{kτ} and L_τ both represent translation operators, and x represents the input target image.
The twin network target tracking method based on the channel and space attention mechanism is characterized in that in the fourth step, the similarity score is expressed by the following formula:

f(z, x) = φ(z) ∗ φ(x) + b·𝟙

where f(z, x) represents the similarity score between the two input target images, b represents the deviation (bias) value, ℝ represents the set of real numbers (b ∈ ℝ), φ(z) and φ(x) represent the output features of the two input target images after passing through the twin network framework, z and x represent the two input target images, and φ represents the convolution embedding function.
The twin network target tracking method based on the channel and space attention mechanism comprises the following steps of, in the step of extracting deep features of target image samples from the plurality of target images by using the trained novel backbone network model, executing the following steps by the channel attention mechanism:
obtaining the characteristics of the target images of the two channels through maximum pooling and global average pooling;
inputting the features of the target images of the two channels obtained after the maximum pooling and the global average pooling into a multilayer perceptron network, and obtaining feature vectors after element summation;
passing the feature vector through a Sigmoid activation function to obtain a first weight coefficient, and multiplying the first weight coefficient with the input target image Z to obtain a first weighted new feature.
The twin network target tracking method based on the channel and space attention mechanism is characterized in that the first weight coefficient is expressed as:

M_c = σ(W_1(δ(W_0(AvgPool(Z)))) + W_1(δ(W_0(MaxPool(Z)))))

where M_c is the first weight coefficient, σ denotes the Sigmoid activation function, W_0 and W_1 represent the weights of the shared multilayer perceptron network, δ represents the ReLU function, AvgPool(·) is the global average pooling function, MaxPool(·) is the maximum pooling function, and Z represents the input target image;

the first weighted new feature is represented as:

Z′ = M_c ⊗ Z

where Z′ represents the first weighted new feature and ⊗ represents element-wise multiplication.
The twin network target tracking method based on the channel and spatial attention mechanism is characterized in that, in the step of extracting deep features of target image samples from the plurality of target images by using the trained novel backbone network model, the spatial attention mechanism executes the following steps:
obtaining the characteristics of the target images of the two channels through maximum pooling and global average pooling, and splicing the characteristics of the target images of the two channels through the first convolution layer;
calculating the characteristics of the target images of the two spliced channels through a second convolution layer and a Sigmoid activation function to obtain a second weight coefficient;
and multiplying the second weighting coefficient and the first weighting new characteristic to obtain a second weighting new characteristic.
The twin network target tracking method based on the channel and space attention mechanism is characterized in that the second weight coefficient is expressed as:

M_s = σ(f^{7×7}([AvgPool(Z′); MaxPool(Z′)]))

where M_s is the second weight coefficient, f^{7×7} represents a convolution whose receptive field is 7 × 7, Z′ represents the first weighted new feature, and σ represents the Sigmoid activation function;

the second weighted new feature is represented as:

Z″ = M_s ⊗ Z′
The twin network target tracking method based on the channel and space attention mechanism comprises the steps of constructing a novel backbone network model based on a convolutional neural network model, a channel attention mechanism and a space attention mechanism,
training with the plurality of target images as a training data set, wherein the training data set contains 560 moving objects and 87 motion pattern classes;
a stochastic gradient descent method is used in the training construction, where the momentum is set to 0.9.
The twin network target tracking method based on the channel and space attention mechanism is characterized in that the target image feature sizes respectively extracted by the template branch and the search branch in the twin network framework are 6 × 6 × 128 and 22 × 22 × 128.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a twin network target tracking method based on a channel and space attention mechanism according to the present invention;
FIG. 2 is a schematic diagram of a twin network target tracking method based on a channel and space attention mechanism according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
In the prior art, when visual target tracking is carried out, a convolutional neural network model, a channel attention mechanism and a space attention mechanism are not combined at the same time, and the accuracy and robustness of target tracking are not ideal.
In order to solve the technical problem, the present invention provides a twin network target tracking method based on channel and spatial attention mechanism, please refer to fig. 1 and fig. 2, wherein the method includes the following steps:
s101, processing the video or image data set to obtain a plurality of target images with uniform image sizes.
In this step, the images in the video or image data set need to be processed to a uniform size. It should be noted that the target images with uniform size are processed, which is convenient for subsequent input and extraction of deep features of images with uniform size in the tracking stage.
And S102, constructing and obtaining a novel backbone network model based on the convolutional neural network model, the channel attention mechanism and the space attention mechanism.
In this embodiment, the novel backbone network model is a twin network framework, which includes a template branch and a search branch. As shown in FIG. 2, z corresponds to the template branch and x corresponds to the search branch.
Additionally, in FIG. 2, a convolutional neural network model, a channel attention module and a spatial attention module are integrated to construct the novel backbone network model within the middle dashed box. The convolutional neural network model comprises convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4 and convolutional layer 5, with the channel attention module and the spatial attention module located between convolutional layer 1 and convolutional layer 2. These modules process the deep features of the target image sample extracted from the target image in the subsequent steps.
S103, extracting training samples from the target images to train the novel backbone network model.
During training, the size of the picture needs to be determined according to the complexity of the model and the size of the video memory. In the present invention, the sample image size input to the template branch is 127 × 127 × 3, and the target image size input to the search branch is 255 × 255 × 3.
In the step of extracting the training samples from the plurality of target images, it is additionally noted that:
when the sub-window searching for the target image extends beyond the range of the target image, the missing image portion is filled with the average RGB values. In the subsequent testing stage (including step S104 and step S105), the target images of the two channels are respectively introduced into the template branch and the search branch of the twin network framework to obtain the deep features of the target image sample.
It should be noted that the plurality of target images serve as the training data set, which contains 560 moving objects and 87 motion pattern classes. In addition, the training data set provides video clips of over 10,000 real-world moving objects and over 1.5 million manually labeled bounding boxes. The novel backbone network model designed above can thus be trained end to end on the large-scale data set GOT-10k.
In addition, a stochastic gradient descent method (SGD) is used in the training construction, in which the momentum is set to 0.9. The learning rate decays per iteration from an initial learning rate of 0.01 to a final learning rate of 0.00001. The novel backbone network model proposed in the present invention is trained for a total of 50 epochs, with the weight decay set to 0.0005 and a batch size of 16.
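A minimal sketch of this training configuration, under the assumption that the learning rate decays geometrically between the stated endpoints (the patent gives only the initial and final values, not the decay rule):

```python
import numpy as np

EPOCHS, LR_START, LR_END = 50, 1e-2, 1e-5
# Geometric decay from the initial to the final learning rate over 50 epochs.
lrs = np.logspace(np.log10(LR_START), np.log10(LR_END), EPOCHS)

def sgd_step(w, grad, velocity, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD-with-momentum update (momentum 0.9, weight decay 0.0005)."""
    grad = grad + weight_decay * w              # weight decay as an L2 gradient term
    velocity = momentum * velocity - lr * grad  # momentum accumulation
    return w + velocity, velocity
```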
And S104, extracting deep features of the target image sample from the plurality of target images by using the trained novel backbone network model, and performing similarity matching on the deep features of the target image sample in a target image candidate region to obtain a plurality of target candidate blocks, wherein each target candidate block corresponds to a similarity score.
Specifically, in the above-mentioned novel backbone network model, the convolutional neural network model (CNN model) comprises 5 convolutional layers but no fully connected layer. The channel attention mechanism and the spatial attention mechanism are realized by a channel attention module and a spatial attention module, which are inserted after the first convolutional layer and its pooling layer, arranged sequentially as channel attention module followed by spatial attention module. Additionally, the receptive field of the spatial attention module employs a 7 × 7 convolution kernel.
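The stated feature sizes can be checked with a simple size walkthrough. The kernel and stride values below are SiamFC/AlexNet-style assumptions — the patent specifies only the five convolutional layers and the resulting 6 × 6 and 22 × 22 feature maps:

```python
def conv_out(n, k, s):
    """Spatial output size of a valid (no-padding) convolution or pooling."""
    return (n - k) // s + 1

# (kernel, stride, followed-by-3x3-stride-2-max-pool) for the five conv layers;
# assumed SiamFC/AlexNet-style parameters, not values stated in the patent.
LAYERS = [(11, 2, True), (5, 1, True), (3, 1, False), (3, 1, False), (3, 1, False)]

def feature_size(n):
    for k, s, pooled in LAYERS:
        n = conv_out(n, k, s)
        if pooled:               # 3x3 max pooling with stride 2
            n = conv_out(n, 3, 2)
    return n
```

Under these assumptions the 127 × 127 template input maps to a 6 × 6 feature and the 255 × 255 search input maps to a 22 × 22 feature, matching the sizes given later in the description.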
In the twin network framework, the following formula exists:

h(L_{kτ}x) = L_τ h(x)

where h denotes the mapping function of the input-output signal, k denotes the stride length, τ is the translation value of the active area in the input-output signal, L_{kτ} and L_τ both represent translation operators, and x represents the input target image.
Furthermore, a convolution embedding function φ is typically used to correlate the two input target images z and x and generate an output response map representing the similarity score between the deep features of the target image samples after the two input target images pass through the twin network framework.
Wherein, the formula of the similarity score is expressed as:

f(z, x) = φ(z) ∗ φ(x) + b·𝟙

where f(z, x) represents the similarity score between the two input target images, b represents the deviation (bias) value, ℝ represents the set of real numbers (b ∈ ℝ), φ(z) and φ(x) represent the output features of the two input target images after passing through the twin network framework, z and x represent the two input target images, and φ represents the convolution embedding function.
For the channel attention module described above, each channel of the feature map acts as a specific detector when extracting relevant features of the input target image. Therefore, measures need to be taken so that the channel attention module focuses on the specific features that are informative for the input target image.
Specifically, in the step of extracting deep features of the target image sample from the plurality of target images by using the trained novel backbone network model, the channel attention mechanism performs the following steps:
and A1, obtaining the characteristics of the target images of the two channels through maximum pooling and global average pooling.
In the present invention, the size of the input target image Z is "H × W × C", and Max-Pooling (maximum Pooling) and Global Average-Pooling (Global Average Pooling) are used to obtain the features of the target images of two channels, and the size of the features of the target images of two channels is "1 × 1 × C".
And B1, inputting the features of the target images of the two channels obtained after the maximum pooling and the global average pooling into the multilayer perceptron network, and summing them element-wise to obtain a feature vector.
The features of the target images of the two channels obtained after the maximum pooling and the global average pooling are then input into a multi-layered perceptron network (i.e., MLP). Wherein, the number of the first layer of neurons is C/r, the activation function is ReLU, and the number of the second layer of neurons is C. Wherein the neural network parameters of the two layers are shared. And summing the elements to output a feature vector.
C1, passing the feature vector through a Sigmoid activation function to obtain the first weight coefficient, and multiplying the first weight coefficient with the input target image Z to obtain the first weighted new feature.
In this step, the first weight coefficient is expressed as:

M_c = σ(W_1(δ(W_0(AvgPool(Z)))) + W_1(δ(W_0(MaxPool(Z)))))

where M_c is the first weight coefficient, σ denotes the Sigmoid activation function, W_0 and W_1 represent the weights of the shared multilayer perceptron network, δ represents the ReLU function, AvgPool(·) is the global average pooling function, and MaxPool(·) is the maximum pooling function;

the first weighted new feature is represented as:

Z′ = M_c ⊗ Z

where Z′ represents the first weighted new feature, ⊗ represents element-wise multiplication, and Z represents the input target image.
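A minimal NumPy sketch of the channel attention computation above; the weight shapes and the reduction ratio r = 4 in the test are illustrative assumptions, not values from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(Z, W0, W1):
    """CBAM-style channel attention: a shared two-layer MLP is applied to the
    global-average-pooled and max-pooled channel descriptors, the results are
    summed and passed through a Sigmoid, and the input is reweighted.
    Z is H x W x C; W0 has shape (C, C//r) and W1 has shape (C//r, C)."""
    avg = Z.mean(axis=(0, 1))                   # global average pooling -> (C,)
    mx = Z.max(axis=(0, 1))                     # global max pooling -> (C,)
    mlp = lambda v: np.maximum(v @ W0, 0) @ W1  # shared MLP with ReLU in between
    Mc = sigmoid(mlp(avg) + mlp(mx))            # first weight coefficient, (C,)
    return Mc * Z                               # Z' = Mc (x) Z, element-wise
```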
Further, after the channel attention module, a spatial attention module is introduced to focus on which features in the input target image are meaningful. Specifically, in the step of extracting deep features of the target image sample from the plurality of target images by using the trained novel backbone network model, the spatial attention mechanism performs the following steps:
and A2, obtaining the characteristics of the target images of the two channels through maximum pooling and global average pooling, and splicing the characteristics of the target images of the two channels through the first convolution layer.
Similar to the channel attention module, the input feature Z′ has size H × W × C. Maximum pooling and global average pooling along the channel dimension are used to obtain two feature maps of size H × W × 1, which are spliced together and passed through a standard convolutional layer (the first convolutional layer).
And B2, calculating the characteristics of the target images of the two spliced channels through a second convolution layer and a Sigmoid activation function to obtain a second weight coefficient.
Then, the second weight coefficient M_s is obtained through a 7 × 7 convolutional layer and a Sigmoid activation function. Finally, M_s is multiplied with the first weighted new feature Z′ to obtain the second weighted new feature Z″.
Wherein the second weight coefficient is expressed as:

M_s = σ(f^{7×7}([AvgPool(Z′); MaxPool(Z′)]))

where M_s is the second weight coefficient, f^{7×7} represents a convolution whose receptive field is 7 × 7, Z′ represents the first weighted new feature, and σ represents the Sigmoid activation function.

And C2, multiplying the second weight coefficient with the first weighted new feature to obtain the second weighted new feature.

The second weighted new feature is represented as:

Z″ = M_s ⊗ Z′
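The spatial attention computation can be sketched likewise; the kernel contents in the test are illustrative, and only the 7 × 7 receptive field and the two channel-pooled maps come from the description:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(Zp, kernel):
    """CBAM-style spatial attention: channel-wise average and max pooling give
    two H x W x 1 maps, which are stacked and passed through a single 7x7
    convolution and a Sigmoid; `kernel` has shape (7, 7, 2)."""
    h, w, _ = Zp.shape
    pooled = np.stack([Zp.mean(axis=2), Zp.max(axis=2)], axis=2)  # H x W x 2
    pad = kernel.shape[0] // 2
    padded = np.pad(pooled, ((pad, pad), (pad, pad), (0, 0)))     # same-size output
    Ms = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            Ms[i, j] = np.sum(padded[i:i + 7, j:j + 7] * kernel)
    Ms = sigmoid(Ms)               # second weight coefficient, one value per pixel
    return Ms[..., None] * Zp      # Z'' = Ms (x) Z'
```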
For the above step S104, in summary, in the test tracking phase, the convolution feature between the target images of the two branches in the original twin network structure does not contain background context information. Therefore, the tracker has difficulty in distinguishing the target from the complex background information, and is prone to tracking drift and failure. Based on the method, the deep features of the target image sample are extracted by using a trained novel backbone network model, and the deep features of the target image sample are distinguished from background information so as to focus on important features and inhibit useless information.
The sequentially arranged channel attention module and spatial attention module then assign greater weight to discriminative features and play an important role in improving the discrimination capability of the tracker. Finally, the sizes of the target image features extracted by the template branch and the search branch in the twin network framework are 6 × 6 × 128 and 22 × 22 × 128, respectively.
Further, similarity matching is performed in the target image candidate region; that is, the similarity of all translated sub-windows is calculated on a dense grid. As shown in formula (2) above, the convolution embedding function φ correlates the two feature maps to generate an output response map representing the similarity scores between the deep features of the target image samples after the two input target images pass through the twin network framework.
Here, the target candidate blocks described here are all obtained by search branching, and the corresponding size is "22 × 22 × 128". The similarity score is obtained by comparing the similarity between the target candidate block (which is also the target image feature in nature) in the search branch and the sample image feature in the template branch.
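The dense similarity matching between the 6 × 6 × 128 template feature and the 22 × 22 × 128 search feature can be sketched as a sliding-window cross-correlation producing a 17 × 17 map of similarity scores (22 − 6 + 1 = 17). The naive loop is illustrative; b is the bias term from the similarity formula:

```python
import numpy as np

def response_map(template, search, bias=0.0):
    """Dense cross-correlation f(z, x) = phi(z) * phi(x) + b: slide the
    template feature over the search feature and record one score per shift."""
    th, tw, _ = template.shape
    sh, sw, _ = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + th, j:j + tw] * template) + bias
    return out
```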
And S105, performing target tracking by using the acquired target candidate block with the maximum similarity score.
In this step, the target candidate block with the obtained maximum similarity score is used for target tracking. The method specifically comprises the following steps: and calculating and comparing the similarity between the deep features of the target image sample (in the template branch) and the deep features of the candidate target image sample (in the search branch), and finding the target image with the region with the highest similarity score in the subsequent frame as a predicted result, thereby realizing target tracking.
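Mapping the maximum-score candidate back to a target position can be sketched as follows; the total network stride of 8 is an assumption typical of SiamFC-style backbones, not a value stated in the patent:

```python
import numpy as np

def locate_target(response, total_stride=8):
    """Map the peak of the response map back to a displacement in the search
    image: the offset of the maximum-score candidate from the map centre,
    scaled by the network's total stride (assumed to be 8 here)."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    center = (response.shape[0] - 1) / 2.0
    return (r - center) * total_stride, (c - center) * total_stride
```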
The invention provides a twin network target tracking method based on a channel and space attention mechanism, which comprises the steps of firstly processing a video or image data set to obtain a target image with a uniform image size, then jointly constructing a novel backbone network model based on a convolutional neural network model, the channel attention mechanism and the space attention mechanism, then extracting a training sample from the target image, training the novel backbone network model, extracting deep features of the target image sample from the target image by using the trained novel backbone network model, further performing similarity matching in a target image candidate area to obtain a corresponding similarity score, and finally performing target tracking according to an obtained target candidate block with the maximum similarity score.
According to the method, GOT-10k is used as the training set to adjust the model parameters during offline training, so that targets in the video can be represented more accurately; feature extraction is then performed by a lightweight convolutional neural network model. The appearance model of the tracking algorithm designed by the invention has better robustness and accuracy.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.
Claims (10)
1. A twin network target tracking method based on a channel and space attention mechanism is characterized by comprising the following steps:
the method comprises the following steps: processing the video or image data set to obtain a plurality of target images of uniform image size;
step two: constructing and obtaining a novel backbone network model based on a convolutional neural network model, a channel attention mechanism and a space attention mechanism;
step three: extracting training samples from the plurality of target images to train the novel backbone network model;
step four: extracting deep features of a target image sample from the multiple target images by using the trained novel backbone network model, and performing similarity matching on the deep features of the target image sample in a target image candidate region to obtain multiple target candidate blocks, wherein each target candidate block corresponds to a similarity score;
step five: and tracking the target by using the acquired target candidate block with the maximum similarity score.
2. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 1, wherein the novel backbone network model adopts a twin network framework, and the twin network framework comprises a template branch and a search branch;
wherein the step of extracting training samples from the plurality of target images comprises:
when the sub-window searching the target image extends beyond the range of the target image, the missing image portion is filled with the average RGB values.
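By way of illustration only (not part of the claims), the mean-value padding described above can be sketched in NumPy; the function name `crop_with_mean_pad` and the H × W × 3 array layout are assumptions made for this example:

```python
import numpy as np

def crop_with_mean_pad(image, cx, cy, size):
    """Crop a square sub-window centered at (cx, cy); any region that
    falls outside the image is filled with the per-channel mean RGB value."""
    h, w, c = image.shape
    mean_rgb = image.mean(axis=(0, 1))                     # average colour per channel
    half = size // 2
    # canvas pre-filled with the mean colour
    patch = np.tile(mean_rgb, (size, size, 1)).astype(image.dtype)
    x0, y0 = cx - half, cy - half
    # overlap between the requested window and the image
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x0 + size, w), min(y0 + size, h)
    if sx0 < sx1 and sy0 < sy1:
        patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return patch
```

A sub-window that extends past the top-left corner, for instance, returns a full-size patch whose out-of-range pixels all carry the image's mean colour.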
3. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 2, wherein, in the twin network framework, the method comprises:
inputting a target image through the template branch and the search branch respectively, and acquiring deep features of a target image sample according to the template branch and the search branch;
in the twin network framework, the following formula exists:
f(z, x) = g(φ(z), φ(x))
wherein z and x represent the target images input through the template branch and the search branch respectively, φ represents the convolution embedding function, and g represents a similarity measure function.
4. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 3, wherein in the fourth step, the formula of the similarity score is expressed as:
f(z, x) = φ(z) ∗ φ(x) + b·𝟙
wherein f(z, x) represents the similarity score between the two input target images, b·𝟙 represents the bias value, with b ∈ ℝ and ℝ representing the set of real numbers, φ(z) and φ(x) represent the output features of the two input target images after passing through the twin network framework, z and x represent the two input target images, and φ represents the convolution embedding function.
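As a non-authoritative sketch of the similarity matching described above, the cross-correlation between template features φ(z) and search features φ(x) can be computed naively in NumPy; the function name and the H × W × C feature layout are illustrative assumptions:

```python
import numpy as np

def cross_correlation_score(template_feat, search_feat, bias=0.0):
    """Slide the template feature map phi(z) over the search feature map
    phi(x); each position gets the score f(z, x) = phi(z) * phi(x) + b."""
    th, tw, c = template_feat.shape
    sh, sw, _ = search_feat.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[i:i + th, j:j + tw]
            out[i, j] = np.sum(window * template_feat) + bias
    return out
```

The position of the maximum in the returned score map corresponds to the target candidate block with the largest similarity score.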
5. The method of claim 4, wherein, in the step of extracting deep features of the target image samples from the plurality of target images by using the trained novel backbone network model, the channel attention mechanism performs the following steps:
obtaining two channel feature descriptors of the target image through maximum pooling and global average pooling, respectively;
inputting the two channel feature descriptors obtained after the maximum pooling and the global average pooling into a shared multilayer perceptron network, and obtaining a feature vector after element-wise summation;
and passing the feature vector through a Sigmoid activation function to obtain a first weight coefficient, and multiplying the first weight coefficient by the input target image feature to obtain a first weighted new feature.
6. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 5, wherein the first weight coefficient is expressed as:
M_c(F) = σ(W₁(W₀(AvgPool(F))) + W₁(W₀(MaxPool(F))))
wherein M_c(F) is the first weight coefficient, σ represents the Sigmoid activation function, W₀ and W₁ represent the weights of the shared multilayer perceptron network, with the ReLU function applied after W₀, AvgPool is the global average pooling function, MaxPool is the maximum pooling function, and F represents the input target image feature;
the first weighted new feature is represented as:
F′ = M_c(F) ⊗ F
wherein ⊗ denotes element-wise multiplication.
7. The twin network target tracking method based on the channel and spatial attention mechanism of claim 6, wherein, in the step of extracting deep features of target image samples from the plurality of target images by using the trained novel backbone network model, the spatial attention mechanism performs the following steps:
obtaining two spatial feature maps of the target image through channel-wise maximum pooling and average pooling, and concatenating the two spatial feature maps through the first convolution layer;
calculating the concatenated feature maps through a second convolution layer and a Sigmoid activation function to obtain a second weight coefficient;
and multiplying the second weight coefficient by the first weighted new feature to obtain a second weighted new feature.
8. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 7, wherein
the second weight coefficient is expressed as:
M_s(F′) = σ(f^{7×7}([AvgPool(F′); MaxPool(F′)]))
wherein M_s(F′) is the second weight coefficient, f^{7×7} represents a convolution whose kernel has a receptive field of 7 × 7, F′ represents the first weighted new feature, and σ represents the Sigmoid activation function;
the second weighted new feature is represented as:
F″ = M_s(F′) ⊗ F′
wherein ⊗ denotes element-wise multiplication.
9. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 1, wherein, in the step of training the novel backbone network model constructed based on the convolutional neural network model, the channel attention mechanism and the space attention mechanism,
training is performed with the plurality of target images as a training data set, wherein the training data set contains 560 moving objects and 87 motion pattern classes;
a stochastic gradient descent method is used during training, wherein the momentum is set to 0.9.
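The stochastic gradient descent update with momentum 0.9 mentioned above can be written out as a single step; the parameter containers and helper name are assumptions for this sketch:

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update applied in place.
    velocity accumulates an exponentially decaying sum of past gradients,
    which smooths the descent direction across noisy mini-batches."""
    for p, g, v in zip(params, grads, velocity):
        v *= momentum          # decay the accumulated velocity
        v -= lr * g            # add the (negated) current gradient step
        p += v                 # move the parameter along the velocity
    return params, velocity
```

With momentum 0.9, each past gradient keeps contributing to later steps with weight 0.9^k, so consistent gradient directions accelerate while oscillating ones cancel.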
10. The twin network target tracking method based on the channel and space attention mechanism as claimed in claim 2, wherein the sizes of the target image features extracted by the template branch and the search branch in the twin network framework are 6 × 6 × 128 and 22 × 22 × 128, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110828947.1A CN113283407A (en) | 2021-07-22 | 2021-07-22 | Twin network target tracking method based on channel and space attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113283407A (en) | 2021-08-20 |
Family
ID=77287159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110828947.1A Pending CN113283407A (en) | 2021-07-22 | 2021-07-22 | Twin network target tracking method based on channel and space attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113283407A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140313752A1 (en) * | 2009-11-21 | 2014-10-23 | Volkswagen Ag | Method for Controlling a Headlamp System for a Vehicle, and Headlamp System |
CN109035297A (en) * | 2018-07-19 | 2018-12-18 | 深圳市唯特视科技有限公司 | A kind of real-time tracing method based on dual Siam's network |
CN109978921A (en) * | 2019-04-01 | 2019-07-05 | 南京信息工程大学 | A kind of real-time video target tracking algorithm based on multilayer attention mechanism |
CN109993774A (en) * | 2019-03-29 | 2019-07-09 | 大连理工大学 | Online Video method for tracking target based on depth intersection Similarity matching |
CN110120064A (en) * | 2019-05-13 | 2019-08-13 | 南京信息工程大学 | A kind of depth related objective track algorithm based on mutual reinforcing with the study of more attention mechanisms |
CN110210551A (en) * | 2019-05-28 | 2019-09-06 | 北京工业大学 | A kind of visual target tracking method based on adaptive main body sensitivity |
CN110335290A (en) * | 2019-06-04 | 2019-10-15 | 大连理工大学 | Twin candidate region based on attention mechanism generates network target tracking method |
CN110675423A (en) * | 2019-08-29 | 2020-01-10 | 电子科技大学 | Unmanned aerial vehicle tracking method based on twin neural network and attention model |
CN111192292A (en) * | 2019-12-27 | 2020-05-22 | 深圳大学 | Target tracking method based on attention mechanism and twin network and related equipment |
CN111354017A (en) * | 2020-03-04 | 2020-06-30 | 江南大学 | Target tracking method based on twin neural network and parallel attention module |
CN111462175A (en) * | 2020-03-11 | 2020-07-28 | 华南理工大学 | Space-time convolution twin matching network target tracking method, device, medium and equipment |
CN112348849A (en) * | 2020-10-27 | 2021-02-09 | 南京邮电大学 | Twin network video target tracking method and device |
CN112785624A (en) * | 2021-01-18 | 2021-05-11 | 苏州科技大学 | RGB-D characteristic target tracking method based on twin network |
CN112837344A (en) * | 2019-12-18 | 2021-05-25 | 沈阳理工大学 | Target tracking method for generating twin network based on conditional confrontation |
Non-Patent Citations (7)
Title |
---|
MD. MAKLACHUR RAHMAN ET AL: "Efficient Visual Tracking With Stacked Channel-Spatial Attention Learning", 《IEEE ACCESS》 *
YINGSEN ZENG ET AL: "Learning Spatial-Channel Attention for Visual Tracking", 《2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC)》 * |
ZHOU, DIYA ET AL: "Target Tracking Method Based on Siamese Network and Attention Mechanism", Information & Communications *
YANG, KANG ET AL: "Real-Time Visual Tracking Based on a Dual-Attention Siamese Network", Journal of Computer Applications *
DONG, HANG: "Object Detection and Tracking Based on Deep Learning", China Masters' Theses Full-text Database *
ZHONG, SHA ET AL: "Tracking of a UAV-Designated Target Based on a Siamese Region Proposal Network", Journal of Computer Applications *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705588A (en) * | 2021-10-28 | 2021-11-26 | 南昌工程学院 | Twin network target tracking method and system based on convolution self-attention module |
CN114519847A (en) * | 2022-01-13 | 2022-05-20 | 东南大学 | Target consistency judging method suitable for vehicle-road cooperative sensing system |
CN114782488A (en) * | 2022-04-01 | 2022-07-22 | 燕山大学 | Underwater target tracking method based on channel perception |
CN115018878A (en) * | 2022-04-21 | 2022-09-06 | 哈尔滨工业大学 | Attention mechanism-based target tracking method in complex scene, storage medium and equipment |
CN115223193A (en) * | 2022-06-19 | 2022-10-21 | 浙江爱达科技有限公司 | Capsule endoscope image focus identification method based on focus feature importance |
CN115223193B (en) * | 2022-06-19 | 2023-07-04 | 浙江爱达科技有限公司 | Capsule endoscope image focus identification method based on focus feature importance |
CN115147456A (en) * | 2022-06-29 | 2022-10-04 | 华东师范大学 | Target tracking method based on time sequence adaptive convolution and attention mechanism |
CN115063445A (en) * | 2022-08-18 | 2022-09-16 | 南昌工程学院 | Target tracking method and system based on multi-scale hierarchical feature representation |
CN115063445B (en) * | 2022-08-18 | 2022-11-08 | 南昌工程学院 | Target tracking method and system based on multi-scale hierarchical feature representation |
CN115661207A (en) * | 2022-11-14 | 2023-01-31 | 南昌工程学院 | Target tracking method and system based on space consistency matching and weight learning |
CN117437459A (en) * | 2023-10-08 | 2024-01-23 | 昆山市第一人民医院 | Method for realizing user knee joint patella softening state analysis based on decision network |
CN117437459B (en) * | 2023-10-08 | 2024-03-22 | 昆山市第一人民医院 | Method for realizing user knee joint patella softening state analysis based on decision network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113283407A (en) | Twin network target tracking method based on channel and space attention mechanism | |
Wu et al. | Radio Galaxy Zoo: CLARAN–a deep learning classifier for radio morphologies | |
Zhu et al. | I can find you! boundary-guided separated attention network for camouflaged object detection | |
Hu et al. | SAC-Net: Spatial attenuation context for salient object detection | |
Rahmon et al. | Motion U-Net: Multi-cue encoder-decoder network for motion segmentation | |
CN111626176B (en) | Remote sensing target rapid detection method and system based on dynamic attention mechanism | |
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
CN108520203B (en) | Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature | |
CN111611851B (en) | Model generation method, iris detection method and device | |
Wimmer et al. | Convolutional neural network architectures for the automated diagnosis of celiac disease | |
US11468296B2 (en) | Relative position encoding based networks for action recognition | |
CN115359074B (en) | Image segmentation and training method and device based on hyper-voxel clustering and prototype optimization | |
CN111915644A (en) | Real-time target tracking method of twin guiding anchor frame RPN network | |
CN111428664A (en) | Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision | |
CN117037215B (en) | Human body posture estimation model training method, estimation device and electronic equipment | |
Cheng et al. | A vision-based robot grasping system | |
CN111582091A (en) | Pedestrian identification method based on multi-branch convolutional neural network | |
CN112308825A (en) | SqueezeNet-based crop leaf disease identification method | |
CN116597224A (en) | Potato defect detection method based on improved YOLO V8 network model | |
CN115004316A (en) | Multi-functional computer-assisted gastroscopy system and method employing optimized integrated AI solutions | |
Liu et al. | Semi-supervised keypoint detector and descriptor for retinal image matching | |
CN114925320A (en) | Data processing method and related device | |
CN112036250B (en) | Pedestrian re-identification method, system, medium and terminal based on neighborhood cooperative attention | |
CN113033371A (en) | CSP model-based multi-level feature fusion pedestrian detection method | |
Liu et al. | MSSTResNet-TLD: A robust tracking method based on tracking-learning-detection framework by using multi-scale spatio-temporal residual network feature model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210820 |
|