CN114140495A - Single target tracking method based on multi-scale Transformer - Google Patents
- Publication number
- CN114140495A CN114140495A CN202111340646.0A CN202111340646A CN114140495A CN 114140495 A CN114140495 A CN 114140495A CN 202111340646 A CN202111340646 A CN 202111340646A CN 114140495 A CN114140495 A CN 114140495A
- Authority
- CN
- China
- Prior art keywords
- target
- feature
- candidate
- features
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a single target tracking method based on a multi-scale Transformer. The method cuts out representations of different spatial sizes from the template features, obtains target feature information in a multi-scale semantic space through convolution kernels of different sizes, and then uses this information to supervise the enhancement of the template features so that they become aware of the target features. An IoU-Net trained offline then evaluates the accuracy of the candidate boxes: a feature modulation vector learned from the target feature acts on the candidate-box features, and the modulated features yield the confidence score of each candidate box through generalization learning. Finally, through multiple iterative optimizations, the candidate box with the highest confidence is taken as the tracking result. The proposed multi-scale Transformer module improves the accuracy of the ATOM tracking method to a certain extent, and the bounding box of the target can be estimated more accurately in complex scenes.
Description
Technical Field
The invention belongs to the technical field of single target tracking, and particularly relates to a multi-scale Transformer feature-guided single target tracking method in a complex environment.
Background
Single target tracking is a basic and challenging task in computer vision. Given any object in the first frame as a priori knowledge, the tracker aims at locating this target and estimating its bounding box for subsequent frames. In recent years, single-target tracking is widely applied to the fields of unmanned aerial vehicles, intelligent video monitoring and the like, and great progress is made, but the tracking errors accumulated continuously can cause that the tracker cannot cope with complex scenes such as deformation, shielding and the like. Therefore, how to accurately estimate the bounding box of the object remains to be studied.
Early single-target trackers performed bounding box estimation with a conventional multi-scale search, simply taking the tracking result of the previous frame as the reference bounding box of the current frame and testing several scales around it. This conventional approach limits tracking accuracy when the target deforms severely in the video stream. With the development of deep learning, many high-precision tracking methods have emerged. The bounding box estimation methods adopted by mainstream trackers today fall broadly into two categories: template matching and candidate box evaluation. A tracker of the first kind crops an image containing context information, centered on the target in the first frame, as the template; extracts features of the template and of subsequent frames with a twin network; and learns, in a fully convolutional manner, the region most similar to the template as the tracking result. This way of estimating the bounding box greatly improves the accuracy of the tracker and can effectively estimate the state of a deforming object. However, using the context together with the target as the template has a drawback: the position, posture and other states of the target are obscured by the large amount of context information. Candidate-box-based evaluation methods were subsequently proposed to solve this problem. They also extract features with a twin network, but the features of the given target in the template are explicitly modeled as prior knowledge; an offline-trained IoU-Net then propagates this prior knowledge to guide the confidence evaluation of the candidate boxes, and the candidate box with the highest confidence is taken as the tracking result.
Due to the specific characterization capability of the target features, methods based on candidate box evaluation can effectively handle some scenes with background interference. However, when similar interfering objects appear in the image, the tracker may still drift: the receptive field of the convolutional neural network is far larger than the target area, so the target features are mixed with redundant information and their characterization capability is insufficient. To further improve tracking accuracy, the method optimizes the characterization of the target during single-target tracking on top of candidate box evaluation.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a single target tracking method based on a multi-scale Transformer, which enhances the characterization of the target with a multi-scale Transformer feature enhancement technique. The invention uses ATOM as the baseline tracking method and thereby achieves more accurate tracking results.
The single target tracking method based on the multi-scale Transformer specifically comprises the following steps:

Step 1, after the template features extracted by the twin network are processed by the multi-scale Transformer module, template feature enhancement is guided by taking target features of different scales as supervision information, and enhanced template features T' are obtained; the specific sub-steps are as follows:

1) cutting out 3 features of different spatial sizes around the center of the template feature map, with scales a×a, 2a×2a and 3a×3a respectively;

2) embedding the features of different spatial sizes into semantic spaces of different scales through 3 channel-preserving convolution layers, and finally flattening each into 2-dimensional form, as given by formula (1);

3) in the multi-head target attention module, reducing the number of feature channels C of Q, V and K to C/4 through a 1×1 linear convolution layer, so as to accelerate the fitting of the model;

4) taking the template features as Q, calculating the similarity matrix A between Q and K;

5) after the similarity matrix is obtained, calculating the output feature O of a single target attention block through the matrix operation

O = A·V (3)

6) extending the target attention block to multiple heads, and obtaining the template features T' enhanced by the multi-scale Transformer through summation and normalization:

T' = MultiHead(Q, K, V) = Norm(Concat(O1, O2, O3)W_o + Q) (4)

where Norm denotes l2 normalization over the whole template features, and W_o is a learnable parameter matrix that adjusts the concatenated 3C channels to the input channel number C.

Step 2, taking the tracking result of the previous frame as the reference box of the current frame, randomly generating candidate boxes of several scales whose length-width scaling factors lie in the interval [1−α, 1+α], and extracting the candidate-box features and the target features in the enhanced template features T' through PrRoI Pooling.

Step 3, training an IoU-Net offline on public data sets; propagating the target information to the candidate boxes, adjusting the candidate-box features by vector modulation, and evaluating the confidence score of each candidate box.

Step 4, in the online testing stage, continuously fine-tuning the candidate box positions through the gradient of their position information, and iterating toward a more accurate bounding box as the tracking result.
Preferably, the backbone of the twin network in step 1 is a ResNet-18 network pre-trained on an ImageNet large data set, and parameters of the ResNet-18 network are shared by the template and subsequent image branches; to be suitable for the tracking task, a ResNet-18 network with the full connectivity layer removed is used as the feature extraction module, with a down-sampling rate of 16.
Preferably, the PrRoI Pooling feature extraction process in step 2 is as follows: first, the quantization problem of region pooling is solved by interpolation, with interpolation coefficient

IC(x, y, i, j) = max(0, 1 − |x − i|) · max(0, 1 − |y − j|) (6)

where (i, j) is an integer coordinate position on the feature map and IC(x, y, i, j) is the interpolation coefficient of the continuous point (x, y); then the interpolated feature region is extracted by double integration over the region of the feature map F with upper-left corner (x1, y1) and lower-right corner (x2, y2).
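A one-line sketch of the interpolation coefficient of formula (6), assuming the standard PrRoI Pooling bilinear form; the function name is hypothetical:

```python
def interp_coef(x, y, i, j):
    # Bilinear interpolation coefficient of the continuous point (x, y)
    # with respect to the integer feature location (i, j):
    # IC = max(0, 1 - |x - i|) * max(0, 1 - |y - j|)
    return max(0.0, 1 - abs(x - i)) * max(0.0, 1 - abs(y - j))

print(interp_coef(2.3, 5.0, 2, 5))   # ≈ 0.7: within one cell, same row
print(interp_coef(2.3, 5.0, 4, 5))   # 0.0: more than one cell away
```

Because the coefficient is a continuous function of (x, y), the pooled feature is differentiable with respect to the box coordinates, which is what makes the gradient-based refinement of step 4 possible.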
Preferably, the common data set in step 3 comprises TrackingNet, LaSOT and COCO; two frames of images of the same video sequence are sampled during the training process as an input image pair of the model, and each image is a 288 × 288 region cut by taking the target as the center.
Preferably, the confidence score of each candidate box is evaluated in step 3, and the steps are as follows:
1) applying a full connection layer on the target feature to obtain a modulation vector x, wherein the formula is as follows:
x=φ(Flatten(Ftarget)) (8)
where the target feature F_target is flattened and passed through the fully connected layer φ(·) to output the modulation vector x;

2) the feature of each candidate box is modulated by the vector learned from the target feature, and the adjusted feature outputs the final confidence score through another fully connected layer θ(·);

where ⊗ denotes broadcast multiplication: the 25 feature values of each channel i of the candidate-box feature are given the same weight x_i (i = 0, 1, 2, …, C−1), and s is the confidence score of each candidate box.
Preferably, the iteration step in step 4 is as follows:
1) in the testing stage, the coordinate position of each candidate box, with upper-left corner (x1, y1) and lower-right corner (x2, y2), is adjusted online by back-propagating the gradient of its position information; thanks to the continuity of the PrRoI Pooling feature extraction operation, the gradient obtained by reverse derivation is accurate;
2) each candidate frame obtains a corresponding gradient value and updates the position and the size of a boundary frame of the candidate frame; the update formula is as follows:
3) repeating the steps 1) and 2) for multiple times, and selecting the candidate box with the maximum confidence score from the output s as the tracking result.
The invention has the following beneficial effects:
1. a multi-scale Transformer module is bridged behind a twin network of the tracking framework based on ATOM, targets with different scales are used as supervision information to enhance template characteristics, interference of a large amount of background information is effectively inhibited, more accurate bounding box evaluation is guided, tracking errors are reduced, and single-target tracking in a complex scene is achieved.
2. The multi-scale Transformer feature enhancement strategy has strong universality and is suitable for most trackers based on candidate box evaluation.
Drawings
FIG. 1: a flow diagram of a single target tracking method based on candidate box evaluation;
FIG. 2: a structure diagram of a multi-scale transform module;
FIG. 3: an improved ATOM tracking framework map;
FIG. 4: the tracking examples are shown in contrast.
Detailed Description
The invention is further explained below with reference to the drawings.
As shown in fig. 1 and 3, the multi-scale Transformer-based single-target tracking method takes the ATOM tracking framework as baseline and improves on it, specifically including the following steps:
Step 1: after the template branch of the twin network structure, the template features are enhanced by the proposed multi-scale Transformer module, as shown in fig. 2. The module takes the template feature T ∈ R^{C×H×W} output by the ResNet-18 template branch, with spatial size H×W and channel number C. To follow the input convention of the self-attention mechanism, the 3-dimensional template T is reshaped to R^{HW×C} as the input Q of the Multi-Head Target-Attention submodule. The template T is also fed into the Pyramid Transformer submodule to extract multi-scale features of the target, which supervise the enhancement of the template features.
The submodule cuts out 3 features of different spatial sizes (4×4, 8×8 and 12×12) around the center of the template feature map, then embeds the features of different spatial sizes into semantic spaces of different scales through 3 channel-preserving convolution layers, and finally flattens each into 2-dimensional form; the overall flow of the Pyramid Transformer is given by formula (1).
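The pyramid cropping described above can be sketched as follows. This is a minimal numpy sketch, not the patent's implementation: the sizes C=64, H=W=18 are hypothetical, the function names are illustrative, and the channel-preserving convolutions are omitted:

```python
import numpy as np

def center_crop(feat, size):
    # feat: (C, H, W); take a size x size window around the spatial center
    c, h, w = feat.shape
    top, left = (h - size) // 2, (w - size) // 2
    return feat[:, top:top + size, left:left + size]

def pyramid_tokens(template, scales=(4, 8, 12)):
    # Crop the three multi-scale target regions and flatten each crop to
    # (s*s, C) token form; the patent additionally passes each crop through
    # a channel-preserving convolution, omitted here for brevity.
    crops = [center_crop(template, s) for s in scales]
    return [c.reshape(c.shape[0], -1).T for c in crops]

template = np.random.rand(64, 18, 18)      # hypothetical C=64, H=W=18
tokens = pyramid_tokens(template)
print([t.shape for t in tokens])           # [(16, 64), (64, 64), (144, 64)]
```

The three flattened crops then serve as the multi-scale K/V inputs of the target attention block.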
in the Multi-Head Target-orientation submodule, first 1 convolution kernel is passed through a linear convolution layer of 1 × 1 sizeAnd the number C of all the characteristic channels of V and K is reduced to C/4, so that the fitting effect of the acceleration model is achieved. Then, the similarity matrix a between Q and K is calculated again using the following equation.
Wherein the output isdkIs characterized in thatOf (c) is calculated. After the similarity matrix is obtained, the output characteristic O of a single target attention block is calculated by the following matrix operation.
O=A*V (3)
where * denotes matrix multiplication. Finally, the Target-Attention is extended to multiple heads, and the template features T' enhanced by the multi-scale Transformer are obtained through summation and normalization.
T' = MultiHead(Q, K, V) = Norm(Concat(O1, O2, O3)W_o + Q) (4)
where Norm denotes l2 normalization over the whole template features, and W_o is a learnable parameter matrix that adjusts the concatenated 3C channels to the input channel number C.
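A minimal numpy sketch of the channel-reduced attention of formulas (2) and (3). The projection matrices `Wk`, `Wv` stand in for the 1×1 convolutions, and sharing the K projection for Q is an assumption of this sketch, not the patent's exact layer layout:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def target_attention(Q, K, V, Wk, Wv):
    # 1x1-conv channel reduction C -> C/4 is a plain matmul on token form;
    # Q shares the K projection in this sketch so Q'K'^T is well defined.
    Qr, Kr, Vr = Q @ Wk, K @ Wk, V @ Wv
    dk = Kr.shape[-1]
    A = softmax(Qr @ Kr.T / np.sqrt(dk))   # similarity matrix, formula (2)
    return A @ Vr                          # O = A * V, formula (3)

C = 64
rng = np.random.default_rng(0)
Q = rng.standard_normal((324, C))          # 18*18 template tokens
K = rng.standard_normal((16, C))           # 4x4 pyramid-crop tokens
V = rng.standard_normal((16, C))
Wk = rng.standard_normal((C, C // 4))
Wv = rng.standard_normal((C, C // 4))
O = target_attention(Q, K, V, Wk, Wv)
print(O.shape)                             # (324, 16)
```

One such block per pyramid scale yields O1, O2, O3, which formula (4) concatenates, projects with W_o, adds to Q and normalizes.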
Step 2: since the target changes little between adjacent frames of a video sequence, the tracking result of the previous frame is used as the reference bounding box. Keeping its center position unchanged, 10 candidate boxes of different sizes are randomly generated as evaluation objects; the scaling factors of their length and width relative to the reference bounding box lie in the interval [0.75, 1.25]. The target feature F_target in the enhanced template from step 1 and the features F_candidate of the candidate boxes are extracted with the PrRoI Pooling operator.
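The candidate-box sampling of step 2 can be sketched as follows; the `(cx, cy, w, h)` box format and the helper name are hypothetical conventions of this sketch:

```python
import random

def generate_candidates(ref_box, n=10, alpha=0.25, seed=0):
    # ref_box = (cx, cy, w, h): previous-frame result; the center stays
    # fixed while width/height are scaled by factors in [1-alpha, 1+alpha].
    rnd = random.Random(seed)
    cx, cy, w, h = ref_box
    return [(cx, cy,
             w * (1 + rnd.uniform(-alpha, alpha)),
             h * (1 + rnd.uniform(-alpha, alpha))) for _ in range(n)]

cands = generate_candidates((100.0, 80.0, 40.0, 60.0))
assert len(cands) == 10
assert all(30.0 <= b[2] <= 50.0 and 45.0 <= b[3] <= 75.0 for b in cands)
```

With alpha = 0.25 this reproduces the [0.75, 1.25] scaling interval stated above.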
Step 3: a fully connected layer is applied to the target feature to obtain the modulation vector x:

x = φ(Flatten(F_target)) (5)

where the target feature F_target is flattened and passed through the fully connected layer φ(·) to output the modulation vector x. The vector learned from the target feature modulates the feature of each candidate box, and the adjusted feature outputs the final confidence score through another fully connected layer θ(·), where ⊗ denotes broadcast multiplication: the 25 feature values of each channel i of the candidate-box feature are given the same weight x_i (i = 0, 1, 2, …, C−1), and s is the confidence score of each candidate box.
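A numpy sketch of the modulation-and-scoring step, assuming 5×5 candidate features (25 values per channel) and plain linear maps standing in for the fully connected layers φ(·) and θ(·); all sizes and weight names are hypothetical:

```python
import numpy as np

def confidence_scores(f_target, f_cands, W_phi, W_theta):
    # x = phi(Flatten(F_target)): one modulation weight per channel
    x = f_target.reshape(-1) @ W_phi                      # (C,)
    # broadcast multiplication: the 25 values of channel i all get x[i]
    modulated = f_cands * x[None, :, None, None]          # (N, C, 5, 5)
    return modulated.reshape(len(f_cands), -1) @ W_theta  # (N,) scores s

rng = np.random.default_rng(1)
C = 8
f_target = rng.standard_normal((C, 5, 5))
f_cands = rng.standard_normal((10, C, 5, 5))
W_phi = rng.standard_normal((C * 25, C))
W_theta = rng.standard_normal((C * 25,))
s = confidence_scores(f_target, f_cands, W_phi, W_theta)
print(s.shape)                                            # (10,)
```

The broadcast multiply is what carries the target prior into every candidate-box feature before scoring.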
Step 4: in the testing stage, the coordinate position of each candidate box, with upper-left corner (x1, y1) and lower-right corner (x2, y2), is adjusted online by back-propagating the gradient of its position information. Thanks to the continuity of the PrRoI Pooling feature extraction operation, the gradient obtained by reverse derivation is accurate.
each candidate box obtains its corresponding gradient value and updates the position and size of its bounding box. After 5 iterations, the candidate box with the largest confidence score is selected from the output s as the tracking result.
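The 5-iteration refinement above can be sketched as gradient ascent on the confidence score. Here a central-difference numerical gradient and a toy quadratic score stand in for backpropagation through PrRoI Pooling / IoU-Net, so the example is purely illustrative:

```python
import numpy as np

def refine_box(box, score_fn, lr=0.1, iters=5, eps=1e-3):
    # Gradient ascent on the confidence score over (x1, y1, x2, y2).
    # The tracker backpropagates through PrRoI Pooling / IoU-Net; here a
    # central-difference numerical gradient stands in for autograd.
    box = np.asarray(box, dtype=float)
    for _ in range(iters):
        grad = np.zeros(4)
        for k in range(4):
            d = np.zeros(4)
            d[k] = eps
            grad[k] = (score_fn(box + d) - score_fn(box - d)) / (2 * eps)
        box += lr * grad
    return box

# Toy differentiable score peaked at a "ground-truth" box
target = np.array([50.0, 50.0, 90.0, 110.0])
score = lambda b: -np.sum((b - target) ** 2)
refined = refine_box([40.0, 45.0, 100.0, 100.0], score)
print(np.round(refined, 1))
```

Each iteration moves all four coordinates toward higher confidence; after the fixed number of iterations, the box with the highest score s is kept as the tracking result.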
Step 5: the experimental environment of the invention is as follows: CPU Intel Core i5-7300HQ @ 2.50 GHz, GPU GTX 1050 Ti with 4 GB video memory, system Ubuntu 18.04.5 LTS (kernel 5.4.0-81-generic), CUDA 10.2, deep learning framework PyTorch 1.6.0. The test results on the public data set OTB100 are as follows:
table 1: performance comparison before and after ATOM tracking method improvement
| Tracker | Accuracy | FPS |
| --- | --- | --- |
| ATOM | 0.655 | 24.35 |
| ATOM + Multi-Scale Transformer (Ours) | 0.664 | 22.10 |
As can be seen from table 1, the proposed multi-scale Transformer module effectively improves the accuracy of the tracker at a small cost in speed. To compare the improvement visually, fig. 4 shows example comparisons on two video sequences (Basketball and Diving) from OTB100.
Claims (6)
1. The single-target tracking method based on the multi-scale Transformer is characterized by comprising the following steps:
step 1, after the template features extracted by the twin network are processed by the multi-scale Transformer module, guiding template feature enhancement by taking target features of different scales as supervision information, obtaining enhanced template features T';
the method comprises the following specific steps:
1) cutting out 3 features of different spatial sizes around the center of the template feature map, with scales a×a, 2a×2a and 3a×3a respectively;
2) embedding the features of different spatial sizes into semantic spaces of different scales through 3 channel-preserving convolution layers, and finally flattening each into 2-dimensional form, as given by formula (1);
3) in the multi-head target attention module, reducing the number of feature channels C of Q, V and K to C/4 through a 1×1 linear convolution layer, so as to accelerate the fitting of the model;
4) calculating a similarity matrix A between Q and K by taking the template characteristics as Q;
5) after the similarity matrix is obtained, calculating the output characteristic O of the single target attention block through the following matrix operation;
O=A*V (3)
6) Expanding the target attention block to multiple heads, and obtaining template characteristics T' enhanced by the multi-scale Transformer through summing and normalization processing;
T' = MultiHead(Q, K, V) = Norm(Concat(O1, O2, O3)W_o + Q) (4)
wherein Norm denotes l2 normalization over the whole template features; W_o is a learnable parameter matrix that adjusts the concatenated 3C channels to the input channel number C;
step 2, taking the tracking result of the previous frame as the reference box of the current frame, randomly generating candidate boxes of several scales whose random length-width scaling factors lie in the interval [1−α, 1+α], and extracting the candidate-box features and the target features in the enhanced template features T' through PrRoI Pooling;
step 3, training an IoU-Net offline on public data sets; propagating the target information to the candidate boxes, adjusting the candidate-box features by vector modulation, and evaluating the confidence score of each candidate box;
and 4, in an online testing stage, continuously fine-tuning the candidate frame position information through the gradient of the candidate frame position information, and iterating a more accurate boundary frame to serve as a tracking result.
2. The multi-scale Transformer-based single-target tracking method of claim 1, wherein: the backbone of the twin network in step 1 is a ResNet-18 network pre-trained on the ImageNet data set, whose parameters are shared by the template branch and the subsequent-image branch; to suit the tracking task, the ResNet-18 network with the fully connected layer removed is used as the feature extraction module, with a down-sampling rate of 16.
3. The multi-scale Transformer-based single-target tracking method of claim 1, wherein the PrRoI Pooling feature extraction process in step 2 is as follows: first, the quantization problem of region pooling is solved by interpolation, with interpolation coefficient

IC(x, y, i, j) = max(0, 1 − |x − i|) · max(0, 1 − |y − j|) (6)

where (i, j) is an integer coordinate position on the feature map and IC(x, y, i, j) is the interpolation coefficient of the continuous point (x, y); then the interpolated feature region is extracted by double integration over the region of the feature map F with upper-left corner (x1, y1) and lower-right corner (x2, y2).
4. The multi-scale Transformer-based single-target tracking method of claim 1, wherein: the public data sets in step 3 include TrackingNet, LaSOT and COCO; two frames of the same video sequence are sampled during training as an input image pair of the model, and each image is a 288 × 288 region cropped with the target at the center.
5. The multi-scale Transformer-based single-target tracking method of claim 1, wherein: the confidence score of each candidate box in step 3 is evaluated as follows:
1) applying a full connection layer on the target feature to obtain a modulation vector x, wherein the formula is as follows:
x=φ(Flatten(Ftarget)) (8)
where the target feature F_target is flattened and passed through the fully connected layer φ(·) to output the modulation vector x;

2) the feature of each candidate box is modulated by the vector learned from the target feature, and the adjusted feature outputs the final confidence score through another fully connected layer θ(·).
6. The multi-scale Transformer-based single-target tracking method of claim 1, wherein: the iteration in step 4 proceeds as follows:
1) in the testing stage, the coordinate position of each candidate box, with upper-left corner (x1, y1) and lower-right corner (x2, y2), is adjusted online by back-propagating the gradient of its position information; thanks to the continuity of the PrRoI Pooling feature extraction operation, the gradient obtained by reverse derivation is accurate;
2) each candidate frame obtains a corresponding gradient value and updates the position and the size of a boundary frame of the candidate frame; the update formula is as follows:
3) repeating the steps 1) and 2) for multiple times, and selecting the candidate box with the maximum confidence score from the output s as the tracking result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111340646.0A CN114140495A (en) | 2021-11-12 | 2021-11-12 | Single target tracking method based on multi-scale Transformer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111340646.0A CN114140495A (en) | 2021-11-12 | 2021-11-12 | Single target tracking method based on multi-scale Transformer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114140495A true CN114140495A (en) | 2022-03-04 |
Family
ID=80393732
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111340646.0A Pending CN114140495A (en) | 2021-11-12 | 2021-11-12 | Single target tracking method based on multi-scale Transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140495A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117853664A (en) * | 2024-03-04 | 2024-04-09 | 云南大学 | Three-dimensional face reconstruction method based on double-branch feature fusion |
CN117853664B (en) * | 2024-03-04 | 2024-05-14 | 云南大学 | Three-dimensional face reconstruction method based on double-branch feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |