CN113379806B - Target tracking method and system based on learnable sparse conversion attention mechanism - Google Patents
- Publication number
- CN113379806B (application CN202110929160.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- image
- frame
- search area
- learnable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides a target tracking method and system based on a learnable sparse conversion attention mechanism, comprising the following steps: initializing the image in a given first-frame target frame to generate a target template image; in subsequent frames, taking the target center of the image in the previous frame's target frame as the central point and obtaining a plurality of search area images through a multi-scale strategy; inputting the target template image and the search area images into a weight-sharing convolutional neural network model and extracting their features through the convolutional neural network; performing spatial conversion and channel conversion on the extracted features based on the learnable sparse model; and taking the target template depth feature as a convolution kernel, performing a sliding-window operation on the search area images to obtain a plurality of score maps, and inferring the relative displacement and scale change of the target from the position of the maximum score value to realize target tracking. The method has good robustness and real-time performance and achieves a good target image tracking effect.
Description
Technical Field
The invention relates to the technical field of computer vision and digital image processing, in particular to a target tracking method and system based on a learnable sparse conversion attention mechanism.
Background
In recent years, visual tracking has been a research hotspot in computer vision; it estimates the target position in subsequent video frames from the initial target state given in the first frame image. With the rapid development of deep learning in particular, significant progress has been driven in the field of target tracking. However, in complex scenes, robust and accurate target tracking remains highly challenging because of factors such as occlusion, motion blur, scale change, and illumination change.
In general, visual tracking algorithms fall into two categories: discriminative algorithms and generative algorithms. Specifically, (1) an algorithm based on a discriminative model can be regarded as solving a binary classification problem: target and background information are extracted simultaneously to train a classifier that distinguishes the target from the background of the current frame, yielding the target position in the current frame; (2) an algorithm based on a generative model establishes a motion model through online learning and then searches for the candidate region with the minimum reconstruction error to realize target tracking. Meanwhile, in recent years, methods based on deep learning have exploited the strong characterization capability of depth features, greatly improved the robustness and accuracy of tracking algorithms, and gradually become the mainstream.
Specifically, tracking algorithms based on deep learning mainly exploit the strong feature extraction and expression capacity of convolutional neural networks to extract target features and distinguish foreground from background in order to identify the tracked target. Video tracking algorithms based on twin (Siamese) networks convert the tracking problem into a matching problem, enable offline end-to-end training on large-scale data sets, and greatly improve both speed and accuracy.
However, in the prior art, the robustness and accuracy of the appearance models of some visual tracking algorithms are not ideal, and they cannot handle well the influence of appearance changes such as motion blur, illumination change, complex background, and occlusion.
Disclosure of Invention
In view of the above, it is necessary to address the problem in the prior art that the robustness and accuracy of the appearance models of some visual tracking algorithms are not ideal, and that the influence of appearance changes such as motion blur, illumination change, complex background, and occlusion cannot be handled well.
The embodiment of the invention provides a target tracking method based on a learnable sparse conversion attention mechanism, wherein the method comprises the following steps:
the method comprises the following steps: initializing an image in a given first frame target frame to generate a target template image;
step two: in a second frame and a subsequent frame, taking the target center of the image in the target frame of the previous frame as a central point, obtaining a plurality of search area images through a multi-scale strategy, and adjusting the plurality of search area images to be the same in size;
step three: inputting the target template image and the search area image into a convolutional neural network model sharing weight values, and respectively extracting a target template depth feature and a search area depth feature through a convolutional neural network;
step four: performing space conversion and channel conversion on the depth features of the target template and the depth features of the search area based on a learnable sparse model to reduce space feature redundancy and inter-channel redundancy;
step five: taking the depth features of the target template processed by the learnable sparse model as convolution kernels, and performing sliding window operation on the image of the search area to obtain a plurality of score maps;
step six: and according to the position with the maximum score value in the score maps, the relative displacement of the target center of the image in the target frame of the previous frame in the current frame is estimated, and the scale change of the target tracking image is obtained through a multi-scale strategy so as to realize the tracking of the target.
The invention provides a target tracking method based on a learnable sparse conversion attention mechanism, which combines a convolutional neural network model with a learnable sparse conversion model and can obtain sparser and more robust target template image features and search area image features; in addition, similarity between the target template image features and the search area image features is computed through cross-correlation, and a multi-scale strategy is used to adapt to changes in target scale. The target tracking method provided by the invention has good robustness and real-time performance, can better handle appearance changes including occlusion, illumination change, and motion blur, and finally achieves a good target image tracking effect.
The target tracking method based on the learnable sparse conversion attention mechanism, wherein in step one, the coordinates of the center of the target to be tracked in the first frame target frame are (x0, y0), and the height and width of the target to be tracked in the first frame target frame are h and w, respectively;
the target tracking method based on the learnable sparse conversion attention mechanism is characterized in that in the step one, correlation coefficients are usedObtaining side lengths of target template imagesThe corresponding expression is:
the target tracking method based on the learnable sparse conversion attention mechanism is characterized in that in the second step, the side length of the image of the search area is searchedBy correlation coefficientHeight from the image in the previous frame target frameAnd widthAnd calculating to obtain the following concrete expression:
wherein, when the previous frame is the first frame, the height and width of the image are respectivelyAnd。
The target tracking method based on the learnable sparse conversion attention mechanism, wherein in step two, after the step of obtaining the side length s_x of the search area image, the method further comprises:
taking the target center (x, y) of the image in the previous frame's target frame as the central point, and taking scaled values of s_x as different side lengths to obtain different search area images;
In step three, in the step of extracting the depth features through the convolutional neural network, the corresponding convolution operation is expressed as:
Y(m, n) = Σ_c Σ_i Σ_j X_c(m+i−1, n+j−1) · K_c(i, j), with c summed over the C input channels and i, j over the k×k kernel support,
where X is the input image feature, Y is the output feature after the convolution operation, k is the convolution kernel size, C is the number of channels of the input image, X_c(m+i−1, n+j−1) is a pixel of the sliding-window tensor taken from the input feature X, and K_c(i, j) is pixel (i, j) of the c-th convolution kernel.
The target tracking method based on the learnable sparse conversion attention mechanism, wherein in step four, performing the spatial conversion comprises:
decomposing a local area of the input image into different frequency bands through successive row and column transformations, and initializing the corresponding column and row transformation weights;
the concrete expression is:
W = W_col ⊗ W_row, where W represents the weights corresponding to the spatial transformation, ⊗ represents the Kronecker product, and W_col and W_row represent the initial transformation weights for the columns and rows, respectively.
The target tracking method based on the learnable sparse conversion attention mechanism, wherein step six specifically comprises:
finding the position (u, v) with the maximum score value in the three score maps, and calculating the relative displacement between this position and the target center of the image in the previous frame's target frame;
and updating the position of the target center of the current frame's target tracking image according to the relative displacement so as to locate the target.
The target tracking method based on the learnable sparse conversion attention mechanism is characterized by further comprising the following steps:
updating the scale of the target tracking image of the current frame according to the scale of the maximum value of the score values in the three score maps;
wherein the corresponding scale change Δs is determined by s*, the scale at which the maximum value among the three score maps lies.
The invention also provides a target tracking system based on the learnable sparse conversion attention mechanism, wherein the system comprises:
the first processing module is used for initializing the image in the given first frame target frame to generate a target template image;
the second processing module is used for obtaining a plurality of search area images by taking the target center of the image in the target frame of the previous frame as a central point through a multi-scale strategy in the second frame and the subsequent frames and adjusting the plurality of search area images to be the same in size;
the first learning module is used for inputting the target template image and the search area image into a convolutional neural network model sharing weight values and respectively extracting a target template depth feature and a search area depth feature through a convolutional neural network;
the second learning module is used for carrying out space conversion and channel conversion on the depth features of the target template and the depth features of the search area based on a learnable sparse model so as to reduce space feature redundancy and inter-channel redundancy;
the sliding window processing module is used for taking the depth features of the target template processed by the learnable sparse model as convolution kernels and performing sliding window operation on the image of the search area to obtain a plurality of score maps;
and the positioning tracking module is used for estimating the relative displacement of the target center of the image in the target frame of the previous frame in the current frame according to the position with the maximum score value in the score maps, and acquiring the scale change of the target tracking image through a multi-scale strategy so as to realize the tracking of the target.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic diagram of a target tracking method based on a learnable sparse conversion attention mechanism proposed by the present invention;
fig. 2 is a structural diagram of a target tracking system based on a learnable sparse conversion attention mechanism proposed in the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
In the prior art, the robustness and accuracy of the appearance models of some visual tracking algorithms are not ideal, and the influence of appearance changes such as motion blur, illumination change, and occlusion cannot be handled well.
In order to solve the technical problem, the present invention provides a target tracking method based on a learnable sparse transformation attention mechanism, please refer to fig. 1, the method includes the following steps:
s101, initializing the image in the given first frame target frame to generate a target template image.
In this step, the coordinates of the center of the target to be tracked in the first frame target frame are (x0, y0), and the height and width of the target to be tracked in the first frame target frame are h and w, respectively. In addition, a correlation coefficient p is correspondingly set from the target size; its expression is:
where h and w are respectively the height and width of the target to be tracked in the first frame target frame.
Meanwhile, the side length s_z of the target template image is obtained through the correlation coefficient p; the corresponding expression is:
That is, an image block of side length s_z is cropped centered at the coordinates (x0, y0) of the target to be tracked in the first frame target frame, and the target template image is resized to a fixed size.
And S102, in the second frame and the subsequent frames, taking the target center of the image in the target frame of the previous frame as a central point, obtaining a plurality of search area images through a multi-scale strategy, and adjusting the plurality of search area images to be the same in size.
Here, the sampling in step S102 is the same as in step S101, except that a multi-scale strategy is used to obtain the search region images, which are then resized to the same size.
Specifically, in this step, the side length s_x of the search area image is calculated from the correlation coefficient p together with the height h and width w of the image in the previous frame's target frame; the concrete expression is:
wherein, when the previous frame is the first frame, the height and width of the image are h and w, respectively. After the side length s_x of the search area image is obtained, the coordinates (x, y) of the target center in the previous frame's target frame are taken as the central point, and scaled values of s_x are taken as different side lengths to obtain different search area images (regions falling outside the range of the current frame are filled with the mean value). Finally, the search area images are all resized to the same size, yielding three images of equal size.
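The multi-scale cropping with mean-value padding can be sketched as follows; the scale factors, output size, and nearest-neighbour resize are illustrative assumptions the patent text does not fix:

```python
import numpy as np

def crop_search_regions(frame, center, s_x, scales=(0.96, 1.0, 1.04), out_size=64):
    """Crop one square search region per scale around `center` (cx, cy);
    pixels falling outside the frame are filled with the frame mean, and
    every crop is resized (nearest neighbour) to out_size x out_size."""
    crops = []
    mean = frame.mean()
    H, W = frame.shape[:2]
    for s in scales:
        side = max(2, int(round(s_x * s)))
        x0 = int(round(center[0] - side / 2))
        y0 = int(round(center[1] - side / 2))
        patch = np.full((side, side), mean, dtype=frame.dtype)
        # overlap of the crop window with the frame (rest stays mean-filled)
        fx0, fy0 = max(x0, 0), max(y0, 0)
        fx1, fy1 = min(x0 + side, W), min(y0 + side, H)
        if fx1 > fx0 and fy1 > fy0:
            patch[fy0 - y0:fy1 - y0, fx0 - x0:fx1 - x0] = frame[fy0:fy1, fx0:fx1]
        # nearest-neighbour resize to a common size
        idx = (np.arange(out_size) * side / out_size).astype(int)
        crops.append(patch[idx][:, idx])
    return crops
```

This yields the three same-sized search images that the later sliding-window matching consumes.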
s103, inputting the target template image and the search area image into a convolutional neural network model sharing a weight, and respectively extracting a target template depth feature and a search area depth feature through a convolutional neural network.
It is noted that the target template image and the search area image use the same convolutional neural network with shared weights. A fully convolutional network is used in the feature extraction stage. When training the neural network parameters, the parameters are first initialized randomly and then optimized through back propagation of a cross-entropy loss between the ground truth and the predicted value; the loss between predicted and real values is monitored to find a set of parameters that fits the training data well.
In addition, the feature extraction backbone network is AlexNet, of which the first four convolutional layers are used and the fully connected layers are removed; through offline end-to-end training on large-scale data sets, the AlexNet network parameters can cope well with complex changes in target appearance.
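The layer hyper-parameters below follow a standard AlexNet-style convolutional stack (the patent only states that the first convolutional layers are kept and the fully connected layers removed, so kernel sizes and strides are assumptions); the sketch simply traces feature-map sizes through such a truncated, fully convolutional backbone:

```python
# (name, kernel, stride) of a truncated AlexNet-style stack:
# conv1 + maxpool, conv2 + maxpool, conv3, conv4 -- assumed values.
LAYERS = [("conv1", 11, 2), ("pool1", 3, 2), ("conv2", 5, 1),
          ("pool2", 3, 2), ("conv3", 3, 1), ("conv4", 3, 1)]

def feature_map_size(input_size: int) -> int:
    """Trace a square input through the stack with no padding; being fully
    convolutional, any input larger than the receptive field is accepted."""
    size = input_size
    for _name, k, s in LAYERS:
        size = (size - k) // s + 1
    return size
```

Because the network is fully convolutional, the larger search image simply produces a larger feature map than the template, which is what makes the later sliding-window correlation possible.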
In this step, in the step of extracting the depth features through the convolutional neural network, the corresponding convolution operation is expressed as:
Y(m, n) = Σ_c Σ_i Σ_j X_c(m+i−1, n+j−1) · K_c(i, j), with c summed over the C input channels and i, j over the k×k kernel support,
where X is the input image feature, Y is the output feature after the convolution operation, k is the convolution kernel size, C is the number of channels of the input image, X_c(m+i−1, n+j−1) is a pixel of the sliding-window tensor taken from the input feature X, and K_c(i, j) is pixel (i, j) of the c-th convolution kernel.
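A direct, unoptimized sketch of this sliding-window operation, with the template depth feature used as the convolution kernel as described in step five (array shapes are illustrative assumptions):

```python
import numpy as np

def cross_correlation_score(search_feat, template_feat):
    """Slide the template feature (acting as the convolution kernel) over
    the search-area feature and sum products over all channels, producing
    a single-channel score map."""
    C, H, W = search_feat.shape
    c, h, w = template_feat.shape
    assert C == c, "template and search features must share the channel count"
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[:, i:i + h, j:j + w]  # sliding-window tensor
            out[i, j] = float((window * template_feat).sum())
    return out
```

The position of the maximum of this score map is what the final step interprets as the target's relative displacement.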
And S104, performing space conversion and channel conversion on the depth features of the target template and the depth features of the search area based on a learnable sparse model to reduce space feature redundancy and inter-channel redundancy.
When performing the spatial conversion, the main purpose is to reduce the redundancy of spatial features. Specifically, a local region of the input image is decomposed into different frequency bands through successive row and column transformations, and the corresponding column and row transformation weights are initialized;
the concrete expression is:
W = W_col ⊗ W_row, where W represents the weights corresponding to the spatial transformation, ⊗ represents the Kronecker product, and W_col and W_row represent the initial transformation weights for the columns and rows, respectively.
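The Kronecker-product construction of the spatial weight can be shown concretely; the Haar-style low/high-pass initial values below are an assumption for illustration, since the patent does not specify the initialisation:

```python
import numpy as np

def spatial_transform_weight(w_col, w_row):
    """Build the spatial-transformation weight as the Kronecker product of
    the column and row transform weights: W = w_col (x) w_row."""
    return np.kron(w_col, w_row)

# Haar-style low/high-pass pair as an illustrative (assumed) initialisation.
haar = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
W = spatial_transform_weight(haar, haar)  # 4x4: splits a 2x2 patch into 4 bands
```

With orthogonal row/column factors the combined W is itself orthogonal, so the decomposition into frequency bands loses no information.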
In the channel conversion, the main purpose is to reduce redundancy among channels; specifically, the correlation among channels is used to map the input features, thereby changing the number of channels. Meanwhile, a residual structure is adopted, which on the one hand retains important information of the input features and on the other hand highlights the region of interest in the input image.
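The channel mapping with a residual connection can be sketched as follows; the square channel-mixing matrix (so that the residual shapes match) is an illustrative assumption, not the patent's exact design:

```python
import numpy as np

def channel_conversion_residual(feat, proj):
    """Map the C input channels through a channel-mixing matrix `proj`
    (C x C here, so shapes match) and add the input back: the residual
    keeps the original information while the mixing re-weights channels."""
    C, H, W = feat.shape
    mixed = np.tensordot(proj, feat, axes=([1], [0]))  # (C, H, W)
    return feat + mixed
```

With a zero mixing matrix the input passes through unchanged, which is exactly the "retain important information" property the residual structure provides.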
And S105, taking the depth feature of the target template processed by the learnable sparse model as a convolution kernel, and performing sliding window operation on the image of the search area to obtain a plurality of score maps.
And S106, according to the position with the maximum score value in the multiple score maps, the relative displacement of the target center of the image in the target frame of the previous frame in the current frame is estimated, and the scale change of the target tracking image is obtained through a multi-scale strategy so as to realize the tracking of the target.
In this step, the position (u, v) with the largest score value is found among the three score maps, and the relative displacement between this position and the target center of the image in the previous frame's target frame is calculated. The position of the target center of the current frame's target tracking image is then updated according to this relative displacement for localization.
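The centre update from the score-map argmax can be sketched as below; the total network stride used to map score-map offsets back to image pixels is an assumed value:

```python
import numpy as np

def update_center(prev_center, score_map, total_stride=8):
    """Take the argmax of the score map, measure its offset from the map
    centre, scale by the network's total stride (assumed here), and shift
    the previous target centre accordingly."""
    h, w = score_map.shape
    r, c = np.unravel_index(int(score_map.argmax()), score_map.shape)
    dy = (r - (h - 1) / 2) * total_stride
    dx = (c - (w - 1) / 2) * total_stride
    return (prev_center[0] + dx, prev_center[1] + dy)
```

A peak at the exact centre of the map therefore leaves the target position unchanged.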
And meanwhile, updating the scale of the target tracking image of the current frame according to the scale of the maximum value of the score values in the three score maps.
The corresponding scale change Δs is determined by s*, the scale at which the maximum value among the three score maps lies.
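The scale update can be sketched as picking the scale whose score map peaks highest; the damped blend toward the current size is a common stabilisation trick and an assumption here, not stated by the patent:

```python
def update_scale(scales, peak_values, current_size, damping=0.6):
    """Pick the scale whose score map peaks highest, then blend it with the
    current size so the box does not jump; `damping` is an assumed value."""
    best = scales[max(range(len(scales)), key=lambda i: peak_values[i])]
    return (1 - damping) * current_size + damping * current_size * best

# e.g. the largest scale wins, so the box grows slightly:
new_size = update_scale((0.96, 1.0, 1.04), (0.8, 0.7, 0.9), current_size=80.0)
```

When the middle (unit) scale wins, the size is left effectively unchanged, matching the intent of the multi-scale strategy.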
The invention provides a target tracking method based on a learnable sparse conversion attention mechanism, which combines a convolutional neural network model with a learnable sparse conversion model and can obtain sparser and more robust target template image features and search area image features; in addition, similarity between the target template image features and the search area image features is computed through cross-correlation, and a multi-scale strategy is used to adapt to changes in target scale. The target tracking method provided by the invention has good robustness and real-time performance, can better handle appearance changes including occlusion, illumination change, and motion blur, and finally achieves a good target image tracking effect.
Referring to fig. 2, the present invention further provides a target tracking system based on a learnable sparse conversion attention mechanism, wherein the system includes a first processing module 11, a second processing module 12, a first learning module 13, a second learning module 14, a sliding window processing module 15, and a positioning tracking module 16;
a first processing module 11, configured to initialize an image in a given first frame target frame to generate a target template image;
a second processing module 12, which obtains a plurality of search area images through a multi-scale strategy by using a target center of an image in a target frame of a previous frame as a central point in a second frame and a subsequent frame, and adjusts the plurality of search area images to be the same size;
the first learning module 13 is configured to input the target template image and the search area image into a convolutional neural network model sharing a weight, and extract a target template depth feature and a search area depth feature through a convolutional neural network respectively;
the second learning module 14 is configured to perform spatial transformation and channel transformation on the depth feature of the target template and the depth feature of the search area based on a learnable sparse model to reduce spatial feature redundancy and inter-channel redundancy;
the sliding window processing module 15 is configured to perform a sliding window operation on the search area image by using the depth feature of the target template processed by the learnable sparse model as a convolution kernel to obtain a plurality of score maps;
and the positioning and tracking module 16 is configured to estimate, according to the position with the largest score value in the multiple score maps, a relative displacement of a target center of the image in the target frame of the previous frame in the current frame, and obtain, through a multi-scale strategy, a scale change of the target tracking image, so as to implement tracking of the target.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. A target tracking method based on a learnable sparse conversion attention mechanism is characterized by comprising the following steps:
the method comprises the following steps: initializing an image in a given first frame target frame to generate a target template image;
step two: in a second frame and a subsequent frame, taking the target center of the image in the target frame of the previous frame as a central point, obtaining a plurality of search area images through a multi-scale strategy, and adjusting the plurality of search area images to be the same in size;
step three: inputting the target template image and the search area image into a convolutional neural network model sharing weight values, and respectively extracting a target template depth feature and a search area depth feature through a convolutional neural network;
step four: performing space conversion and channel conversion on the depth features of the target template and the depth features of the search area based on a learnable sparse model to reduce space feature redundancy and inter-channel redundancy;
step five: taking the depth features of the target template processed by the learnable sparse model as convolution kernels, and performing sliding window operation on the image of the search area to obtain a plurality of score maps;
step six: according to the position with the maximum score value in the score maps, the relative displacement of the target center of the image in the target frame of the previous frame in the current frame is estimated, and the scale change of the target tracking image is obtained through a multi-scale strategy so as to realize the tracking of the target;
in the third step, in the step of extracting the depth features through the convolutional neural network, the corresponding convolution operation is represented as:
Y(m, n) = Σ_c Σ_i Σ_j X_c(m+i−1, n+j−1) · K_c(i, j), with c summed over the C input channels and i, j over the k×k kernel support,
where X is the input image feature, Y is the output feature after the convolution operation, k is the convolution kernel size, C is the number of channels of the input image, X_c(m+i−1, n+j−1) is a pixel of the sliding-window tensor taken from the input feature X, and K_c(i, j) is pixel (i, j) of the c-th convolution kernel;
in the fourth step, the spatial conversion comprises:
decomposing a local region of the input image into different frequency bands through successive row and column transformations, and initializing the corresponding column and row transformation weights;
the concrete expression is as follows:
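The concrete expression itself is not reproduced in this text, so the sketch below shows only one plausible form of such a separable row/column decomposition: a 2×2 Haar pair used as an assumed initialization of the learnable transform weights, applied as a column transform followed by a row transform:

```python
import numpy as np

def haar_matrix():
    """2x2 Haar pair: a low-pass (average) row and a high-pass
    (difference) row. Used here only as one plausible initialization
    of the learnable row/column transformation weights."""
    return np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def spatial_transform(patch, row_w=None, col_w=None):
    """Decompose a 2x2 local region into frequency sub-bands:
    B = R @ patch @ C.T, where R and C would be learned in training."""
    R = haar_matrix() if row_w is None else row_w
    C = haar_matrix() if col_w is None else col_w
    return R @ patch @ C.T
```

On a constant patch all energy lands in the low-low band, which is what makes the sparsification of the remaining bands possible.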
2. The target tracking method based on the learnable sparse conversion attention mechanism as claimed in claim 1, wherein in the step one, the coordinates of the target center to be tracked in the first frame target frame are (x0, y0), and the height and width of the target to be tracked in the first frame target frame are h and w, respectively;
4. The target tracking method based on the learnable sparse conversion attention mechanism as claimed in claim 2, wherein in the second step, the side length s of the search area image is calculated from a correlation coefficient k and the height h and width w of the image in the previous frame target frame, with the concrete expression as follows:
5. The target tracking method based on the learnable sparse conversion attention mechanism as claimed in claim 4, wherein in the second step, after the step of obtaining the side length s of the search area image, the method further comprises:
taking the target center (x0, y0) of the image in the previous frame target frame as the central point and taking a^i · s as different side lengths respectively, so as to obtain different search area images, wherein i ∈ {-1, 0, 1};
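The claim's concrete expression for the side length is not reproduced in this text; the sketch below therefore uses the common SiamFC-style context padding as an assumed form, and the scale step `a` is a hypothetical parameter:

```python
import numpy as np

def search_side(h, w, k=0.5):
    """Assumed SiamFC-style rule: pad the previous target box of size
    (h, w) by a context margin k*(h+w) and take the geometric mean as
    the square search-area side length s."""
    p = k * (h + w)
    return float(np.sqrt((h + p) * (w + p)))

def multi_scale_sides(s, a=1.04, scales=(-1, 0, 1)):
    """Side lengths a**i * s used to crop the multi-scale search
    regions around the previous target centre."""
    return [a ** i * s for i in scales]
```

All crops are then resized to a common resolution before feature extraction, matching step two's requirement that the search area images share the same size.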
6. The target tracking method based on the learnable sparse conversion attention mechanism as claimed in claim 2, wherein the sixth step specifically comprises:
finding the position with the maximum score value in the three score maps, and calculating the relative displacement between this position and the target center of the image in the previous frame target frame;
and updating the position of the target center of the target tracking image of the current frame according to the relative displacement, so as to locate the target.
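Mapping the score-map peak back to image coordinates can be sketched as below; the backbone stride value is an assumption (it depends on the chosen network), as is the per-scale correction factor:

```python
def update_center(prev_center, peak, map_shape, stride=8, scale=1.0):
    """Add the peak's offset from the score-map centre, scaled by the
    network stride and the current scale factor, to the previous
    target centre. stride=8 is a hypothetical backbone stride."""
    cx, cy = prev_center
    py, px = peak
    my, mx = (map_shape[0] - 1) / 2, (map_shape[1] - 1) / 2
    dx = (px - mx) * stride * scale
    dy = (py - my) * stride * scale
    return (cx + dx, cy + dy)
```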
7. The target tracking method based on the learnable sparse conversion attention mechanism as claimed in claim 6, wherein the method further comprises:
updating the scale of the target tracking image of the current frame according to the scale corresponding to the maximum score value among the three score maps;
wherein the corresponding scale change is represented as:
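The scale-change expression itself is not reproduced in this text; a damped blend between the previous scale and the winning scale, as used in SiamFC-style trackers, is one plausible form (the learning rate `lr` is a hypothetical parameter):

```python
def update_scale(prev_scale, best_scale_factor, lr=0.59):
    """Assumed damped scale update: interpolate between the previous
    scale and the scale of the score map with the maximum value, so a
    single noisy frame cannot change the target size abruptly."""
    return (1 - lr) * prev_scale + lr * prev_scale * best_scale_factor
```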
8. A target tracking system based on a learnable sparse conversion attention mechanism, the system comprising:
the first processing module is configured to initialize the image in a given first frame target frame to generate a target template image;
the second processing module is configured to, in the second frame and each subsequent frame, take the target center of the image in the previous frame target frame as the central point, obtain a plurality of search area images through a multi-scale strategy, and adjust the plurality of search area images to the same size;
the first learning module is configured to input the target template image and the search area images into a weight-sharing convolutional neural network model, and respectively extract target template depth features and search area depth features through the convolutional neural network;
the second learning module is configured to perform spatial conversion and channel conversion on the target template depth features and the search area depth features based on a learnable sparse model, so as to reduce spatial feature redundancy and inter-channel redundancy;
the sliding window processing module is configured to take the target template depth features processed by the learnable sparse model as convolution kernels, and perform a sliding window operation on the search area images to obtain a plurality of score maps;
the positioning and tracking module is configured to estimate the relative displacement of the target center of the image in the previous frame target frame within the current frame according to the position with the maximum score value in the score maps, and obtain the scale change of the target tracking image through the multi-scale strategy, so as to realize tracking of the target;
wherein the first learning module is configured to extract depth features through a convolutional neural network, and the corresponding convolution operation is represented as:
y_j = Σ_{i=1}^{k×k×c} w_{j,i} · x_{Ω,i}
wherein x is the input image feature, y is the output feature after the convolution operation, k is the convolution kernel size, c is the number of channels of the input image, Ω is the sliding window, x_{Ω,i} is the i-th pixel of the tensor taken by the sliding window Ω from the input feature x, and w_{j,i} is the i-th pixel of the j-th convolution kernel;
the second learning module is configured to, when performing the spatial conversion, decompose a local region of the input image into different frequency bands through successive row and column transformations, and to initialize the corresponding column and row transformation weights;
the concrete expression is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110929160.4A CN113379806B (en) | 2021-08-13 | 2021-08-13 | Target tracking method and system based on learnable sparse conversion attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110929160.4A CN113379806B (en) | 2021-08-13 | 2021-08-13 | Target tracking method and system based on learnable sparse conversion attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113379806A CN113379806A (en) | 2021-09-10 |
CN113379806B true CN113379806B (en) | 2021-11-09 |
Family
ID=77577066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110929160.4A Active CN113379806B (en) | 2021-08-13 | 2021-08-13 | Target tracking method and system based on learnable sparse conversion attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113379806B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114119669A (en) * | 2021-11-30 | 2022-03-01 | 南昌工程学院 | Image matching target tracking method and system based on Shuffle attention |
CN115063445B (en) * | 2022-08-18 | 2022-11-08 | 南昌工程学院 | Target tracking method and system based on multi-scale hierarchical feature representation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108492313A (en) * | 2018-02-05 | 2018-09-04 | 绍兴文理学院 | A kind of dimension self-adaption visual target tracking method based on middle intelligence similarity measure |
CN109427055A (en) * | 2017-09-04 | 2019-03-05 | 长春长光精密仪器集团有限公司 | The remote sensing images surface vessel detection method of view-based access control model attention mechanism and comentropy |
CN110675423A (en) * | 2019-08-29 | 2020-01-10 | 电子科技大学 | Unmanned aerial vehicle tracking method based on twin neural network and attention model |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110060274A (en) * | 2019-04-12 | 2019-07-26 | 北京影谱科技股份有限公司 | The visual target tracking method and device of neural network based on the dense connection of depth |
CN111126132A (en) * | 2019-10-25 | 2020-05-08 | 宁波必创网络科技有限公司 | Learning target tracking algorithm based on twin network |
CN111260688A (en) * | 2020-01-13 | 2020-06-09 | 深圳大学 | Twin double-path target tracking method |
CN111291679B (en) * | 2020-02-06 | 2022-05-27 | 厦门大学 | Target specific response attention target tracking method based on twin network |
CN112991385B (en) * | 2021-02-08 | 2023-04-28 | 西安理工大学 | Twin network target tracking method based on different measurement criteria |
- 2021-08-13 CN CN202110929160.4A patent/CN113379806B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109427055A (en) * | 2017-09-04 | 2019-03-05 | 长春长光精密仪器集团有限公司 | The remote sensing images surface vessel detection method of view-based access control model attention mechanism and comentropy |
CN108492313A (en) * | 2018-02-05 | 2018-09-04 | 绍兴文理学院 | A kind of dimension self-adaption visual target tracking method based on middle intelligence similarity measure |
CN110675423A (en) * | 2019-08-29 | 2020-01-10 | 电子科技大学 | Unmanned aerial vehicle tracking method based on twin neural network and attention model |
Also Published As
Publication number | Publication date |
---|---|
CN113379806A (en) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108090919B (en) | Improved kernel correlation filtering tracking method based on super-pixel optical flow and adaptive learning factor | |
US11551333B2 (en) | Image reconstruction method and device | |
WO2022036777A1 (en) | Method and device for intelligent estimation of human body movement posture based on convolutional neural network | |
CN110570458B (en) | Target tracking method based on internal cutting and multi-layer characteristic information fusion | |
WO2020108362A1 (en) | Body posture detection method, apparatus and device, and storage medium | |
CN113379806B (en) | Target tracking method and system based on learnable sparse conversion attention mechanism | |
CN111738344B (en) | Rapid target detection method based on multi-scale fusion | |
CN107424161B (en) | Coarse-to-fine indoor scene image layout estimation method | |
CN113313810B (en) | 6D attitude parameter calculation method for transparent object | |
CN113989301A (en) | Colorectal polyp segmentation method fusing neural networks of multiple attention mechanisms | |
CN111310768B (en) | Saliency target detection method based on robustness background prior and global information | |
CN115393584A (en) | Establishment method based on multi-task ultrasonic thyroid nodule segmentation and classification model, segmentation and classification method and computer equipment | |
EP3872761A2 (en) | Analysing objects in a set of frames | |
CN106407978B (en) | Method for detecting salient object in unconstrained video by combining similarity degree | |
CN110809126A (en) | Video frame interpolation method and system based on adaptive deformable convolution | |
CN111860823A (en) | Neural network training method, neural network training device, neural network image processing method, neural network image processing device, neural network image processing equipment and storage medium | |
CN115187786A (en) | Rotation-based CenterNet2 target detection method | |
CN112270366A (en) | Micro target detection method based on self-adaptive multi-feature fusion | |
CN114119669A (en) | Image matching target tracking method and system based on Shuffle attention | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN113538221A (en) | Three-dimensional face processing method, training method, generating method, device and equipment | |
CN110503093B (en) | Region-of-interest extraction method based on disparity map DBSCAN clustering | |
CN108765384B (en) | Significance detection method for joint manifold sequencing and improved convex hull | |
CN114782455B (en) | Cotton row center line image extraction method for agricultural machine embedded equipment | |
CN106485686A (en) | One kind is based on gravitational spectral clustering image segmentation algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||