CN115330876A - Target template graph matching and positioning method based on twin network and central position estimation - Google Patents
- Publication number: CN115330876A (application CN202211131672.7A)
- Authority: CN (China)
- Prior art keywords: graph, template, network, real, target template
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention belongs to the technical field of image processing and deep learning, and specifically relates to a target template image matching and positioning method based on a Siamese (twin) network and center position estimation, comprising the following steps: S1, construct the target template matching and positioning network; S2, train the target template matching and positioning network; S3, apply the trained network model to match and locate the target template image. Compared with traditional template matching methods, the method can fully exploit the powerful feature extraction and representation capability of the deep Siamese network and the high-precision localization capability of the center position estimation network; by training on an image set covering large differences in source, scale, rotation, viewing angle and so on, it obtains a target template matching and positioning network model that copes with these complex differences.
Description
Technical Field
The invention belongs to the technical field of image processing and deep learning, and particularly relates to a target template map matching positioning method based on a twin network and central position estimation.
Background
Target template image matching and positioning means that, given a template image of a target in advance, the position corresponding to the center of the template image is accurately located in a real-time image acquired by an imaging device, through feature extraction, similarity measurement, and a search for the most similar position. It is a basic technique in computer vision and target recognition, widely used in tasks such as remote sensing, medical image processing, video surveillance and imaging guidance. In practice, the real-time image and the template image come from different acquisition devices, and the shooting time, viewing angle, illumination and other acquisition conditions differ, so the real-time image and the target template image often differ greatly in source, rotation, viewing angle, noise and so on, which poses a great challenge to accurate localization of the target template image.
The survey "Image Registration Methods: A Survey" by Barbara Zitová and Jan Flusser (Image and Vision Computing, 2003, 21(11): 977-1000) divides the template matching and localization task into four elements: feature extraction, similarity measure, search space and search method. Traditional target template matching methods extract hand-crafted features and use simple similarity measures, so their feature extraction and similarity measurement capabilities are weak and struggle with the challenges above. In addition, the search space of traditional methods couples translation, scale, rotation and other dimensions, and the searched matching position easily falls into a local optimum, making the template localization inaccurate or even wrong. The strong feature extraction capability of deep learning provides a new technical route for improving matching and localization performance. The paper "A Robust and Accurate End-to-End Template Matching Method Based on the Siamese Network" by Qiang Ren et al. (IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5) proposes an end-to-end template matching method based on a Siamese network, which treats the template matching task as template classification plus position regression, improving the robustness of template matching and localization to large differences in source, rotation, viewing angle, noise and so on.
However, when localizing the template image, that method densely predicts a rectangular bounding box, i.e., it locates the template center indirectly by predicting a template bounding box, so localization accuracy and robustness are still affected by factors such as source, scale and viewing-angle differences.
Disclosure of Invention
Aiming at the problems of existing target template matching and positioning methods, the invention provides a target template image matching and positioning method based on a deep Siamese network and center position estimation.
To achieve the above object, the present invention provides the following solution: a target template image matching and positioning method based on a deep Siamese network and center position estimation, comprising the following steps:
S1 Constructing a target template graph matching positioning network
The target template matching and positioning network is formed by cascading, in sequence, a feature-extraction Siamese network, a depth-wise correlation convolution network and a center position estimation network. Its inputs are a template image T and a real-time image S, of sizes m × m and n × n respectively, where m and n are positive integers and n > m; its output is a single-channel heatmap P_hm of size m_h × m_h, where m_h is a positive integer. The larger the heat value at a coordinate of the heatmap, the greater the likelihood that this coordinate is the position of the template-image center on the real-time image. The specific steps are as follows:
S1.1 Constructing a feature extraction twin network and extracting feature information of the input template graph and real-time graph
The feature-extraction Siamese network consists of two convolutional neural networks with shared parameters and identical structure. It takes the template image T and the real-time image S as inputs and outputs a template feature map f(T) and a real-time feature map f(S), where f(T) has size m_1 × m_1 × d and f(S) has size n_1 × n_1 × d; m_1 denotes the height and width of f(T), n_1 the height and width of f(S), and d the number of channels, with m_1, n_1 and d positive integers.
The convolutional neural network is obtained by modifying a standard ResNet network (He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition // IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016), with the following specific modifications:
(1) A 3 × 3 convolution is added after the third, fourth and fifth layers of the standard ResNet network to reduce the feature dimension; the resulting feature maps are denoted here f_3, f_4 and f_5, respectively.
(2) A 3 × 3 deconvolution is applied to f_5, the resulting feature map is concatenated with f_4, and a 3 × 3 convolution is applied to the concatenated map to obtain the feature map f_45.
(3) A 3 × 3 deconvolution is applied to f_45 and the resulting feature map is concatenated with f_3, after which the final outputs are obtained: the template feature map f(T) and the real-time feature map f(S).
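The encoder-decoder fusion above can be checked with simple size arithmetic. The sketch below is a hypothetical illustration assuming standard ResNet18 stage strides and "same"-style padding (the patent does not state the exact convolution hyper-parameters, so `down`, `up` and the layer-to-stride mapping are assumptions); with the embodiment's 127 × 127 template and 255 × 255 real-time image it reproduces the stated 16 × 16 and 32 × 32 output feature maps.

```python
def down(n):
    # stride-2 layer with "same"-style padding (7x7 conv pad 3,
    # 3x3 maxpool pad 1, or 3x3 conv pad 1): floor((n - 1) / 2) + 1
    return (n - 1) // 2 + 1

def up(n):
    # 3x3 transposed convolution, stride 2, padding 1, output padding 1:
    # (n - 1) * 2 - 2 * 1 + 3 + 1 = 2 * n  (doubles the spatial size)
    return 2 * n

def backbone_sizes(n):
    """Spatial sizes of the third/fourth/fifth-layer feature maps
    (f_3, f_4, f_5) of a standard ResNet18 for an n x n input."""
    n = down(down(n))        # conv1 (stride 2) + maxpool (stride 2)
    f3 = down(n)             # third layer  -> f_3 (overall stride 8)
    f4 = down(f3)            # fourth layer -> f_4 (overall stride 16)
    f5 = down(f4)            # fifth layer  -> f_5 (overall stride 32)
    return f3, f4, f5

def fused_size(n):
    """Size after the two deconvolution + concatenation steps:
    f_5 is upsampled to f_4's grid, then to f_3's grid."""
    f3, f4, f5 = backbone_sizes(n)
    assert up(f5) == f4 and up(f4) == f3   # grids must align for concat
    return f3

print(fused_size(127), fused_size(255))  # 16 32
```

The assertion inside `fused_size` is the point of the exercise: the deconvolutions must land exactly on the next finer grid for the concatenation in steps (2) and (3) to be well defined.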
S1.2 Fusing the extracted template feature map f(T) and real-time feature map f(S) using the depth-wise correlation convolution network
The depth-wise correlation convolution network takes the template feature map f(T) and the real-time feature map f(S) extracted in S1.1 as input, performs a depth-wise correlation convolution with f(T) as the convolution kernel sliding over f(S), and outputs the fused correlation feature map f_Fusion, of size (m_1 + 1) × (m_1 + 1) × d;
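The depth-wise correlation can be sketched directly in NumPy: each channel of f(T) slides over the matching channel of f(S) as a correlation kernel (valid positions only, stride 1), so an m_1-wide kernel over an n_1-wide map yields an output of width n_1 − m_1 + 1, which equals m_1 + 1 when n_1 = 2·m_1 as in the embodiment. A minimal reference implementation, not the patent's optimized form:

```python
import numpy as np

def depthwise_correlation(fT, fS):
    """Depth-wise (per-channel) cross-correlation of f(T) over f(S);
    valid positions only, stride 1; both inputs have shape (h, w, d)."""
    m1, _, d = fT.shape
    n1 = fS.shape[0]
    out = n1 - m1 + 1
    f_fusion = np.empty((out, out, d), dtype=fT.dtype)
    for i in range(out):
        for j in range(out):
            window = fS[i:i + m1, j:j + m1, :]
            f_fusion[i, j, :] = (window * fT).sum(axis=(0, 1))
    return f_fusion

# Embodiment sizes: f(T) 16x16x128, f(S) 32x32x128 -> f_Fusion 17x17x128
fT = np.random.rand(16, 16, 128).astype(np.float32)
fS = np.random.rand(32, 32, 128).astype(np.float32)
print(depthwise_correlation(fT, fS).shape)  # (17, 17, 128)
```

In a framework this is usually expressed as a grouped convolution with one group per channel; the loop form above makes the sliding-window arithmetic explicit.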
S1.3 Constructing the center position estimation network and computing the single-channel heatmap
The center position estimation network is formed by cascading three 3 × 3 deconvolution layers and one 3 × 3 convolution layer, where each 3 × 3 deconvolution layer has d channels and stride s, with s a positive integer, and the 3 × 3 convolution layer has d channels and stride 1.
The center position estimation network takes the fused correlation feature map f_Fusion from S1.2 as input and outputs the single-channel heatmap P_hm, of size m_h × m_h with m_h = m_1 · s³. Let p_{x,y} denote the heat value of P_hm at position (x, y), with 1 ≤ x, y ≤ m_h; then p_{x,y} takes values in the range [0, 1].
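The heatmap size can be verified with the standard transposed-convolution size formula. The sketch below assumes kernel 3, padding 1 and no output padding (hyper-parameters the patent does not state): three stride-2 deconvolutions then map the embodiment's 17 × 17 correlation map to the stated 129 × 129 heatmap, each layer taking size n to 2n − 1, while the final 3 × 3 convolution with stride 1 preserves the size.

```python
def deconv_out(n, kernel=3, stride=2, padding=1, output_padding=0):
    # transposed-convolution output size (assumed hyper-parameters)
    return (n - 1) * stride - 2 * padding + kernel + output_padding

n = 17                       # f_Fusion is (m_1 + 1) x (m_1 + 1) = 17 x 17
sizes = []
for _ in range(3):           # three cascaded 3x3 deconvolutions, stride 2
    n = deconv_out(n)
    sizes.append(n)
print(sizes)  # [33, 65, 129] -> heatmap P_hm is 129 x 129
```

Note that with these assumed settings the exact output is 129 rather than the nominal m_1 · s³ = 128; the deviation by one pixel comes from the (m_1 + 1)-sized correlation map and the 2n − 1 size rule.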
S2 training target template graph matching positioning network
S2.1 making a training image set
S2.1.1 For various targets such as houses, roads, bridges, vehicles, ships and aircraft, shoot with a visible-light camera and an infrared camera at different times, distances, viewing angles and positions, obtaining a large number of images;
S2.1.2 From the captured images, make n_train image pairs, each consisting of a template image and a real-time image, with n_train ≥ 40000. Specifically: crop an image block containing a certain target from one image, scale it to m × m and take it as the template image, where m is a positive integer; crop image blocks containing the same target from other images, scale them to n × n and take them as real-time images, where n is a positive integer.
S2.1.3 Take the n_train image pairs so produced as the training image set.
As can be seen from the above production procedure, significant differences in source, scale, rotation, viewing angle and so on exist between the template images and the real-time images.
S2.2 calibrating a training image set
When calibrating an image pair composed of a template image and a real-time image in the training set, first calibrate the coordinate c_ref = (x_ref, y_ref) of the template-image center on the real-time image, then map it to the coordinate (x_hm, y_hm) on the heatmap, i.e., the position corresponding to the template-image center on the heatmap.
After the corresponding coordinates of the template-image center on the heatmap are obtained, the heatmap label P̂_hm for this pair of training samples is generated. Unlike calibration methods that directly record positive samples as "1" and negative samples as "0", this step calibrates the heatmap with Gaussian-kernel weighting, in order to control the proportion of negative samples in the loss function and reduce the influence of positive/negative sample imbalance. The specific calibration method is:
p̂_{x,y} = exp( −[(x − x_hm)² + (y − y_hm)²] / (2σ_p²) )
where p̂_{x,y} denotes the calibrated heat value at position (x, y) of the heatmap label P̂_hm, x and y range over [1, m_h], and σ_p is a hyper-parameter related to the template-image size. Computing the heat value at all positions (x, y) yields the heatmap label P̂_hm calibrated for this training sample.
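The Gaussian-kernel calibration above can be sketched in NumPy as follows; coordinates are 1-based as in the text, and the peak value 1 occurs exactly at the mapped center (x_hm, y_hm). The σ_p value used below is an assumption, since the text only ties it to the template size without fixing it.

```python
import numpy as np

def gaussian_label(x_hm, y_hm, m_h, sigma_p):
    """Heatmap label p_hat(x, y) = exp(-((x - x_hm)^2 + (y - y_hm)^2)
    / (2 * sigma_p^2)) for 1 <= x, y <= m_h: a soft, Gaussian-weighted
    alternative to hard 0/1 positive/negative calibration."""
    coords = np.arange(1, m_h + 1)
    x, y = np.meshgrid(coords, coords, indexing="xy")
    return np.exp(-((x - x_hm) ** 2 + (y - y_hm) ** 2) / (2.0 * sigma_p ** 2))

label = gaussian_label(x_hm=65, y_hm=65, m_h=129, sigma_p=8.0)  # sigma assumed
print(label.shape, label[64, 64])  # (129, 129) 1.0 -- peak at the center
```

Positions far from the center receive heat values close to 0, so their contribution to the loss is strongly down-weighted by the (1 − p̂)^β factor of the loss in S2.3.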
S2.3 design loss function
The loss function used for training is designed as:
L = −(1/N) · Σ_{x,y} L_{x,y}, with
L_{x,y} = (1 − p_{x,y})^α · log(p_{x,y})  if p̂_{x,y} = 1
L_{x,y} = (1 − p̂_{x,y})^β · (p_{x,y})^α · log(1 − p_{x,y})  otherwise
where p_{x,y} is the heat value (confidence) that the template-image center lies at position (x, y) of the real-time image, computed by the target template matching and positioning network of S1; p̂_{x,y} is the heat value at position (x, y) of the heatmap label calibrated for the training sample in S2.2; N is the number of positive positions; and α and β are adjustable hyper-parameters, taken as α = 2 and β = 4 in the invention.
S2.4 Using the training image set acquired in S2.1 and calibrated in S2.2, train the network with the stochastic gradient descent (SGD) method (LeCun Y., Boser B., Denker J. S., et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989, 1(4): 541-551), i.e., minimize the loss function designed in S2.3 to obtain the trained target template matching and positioning network model.
S3, matching and positioning the target template picture by applying the trained target template picture matching and positioning network model
The specific process is as follows:
S3.1 Input the template image T (of size m × m) to be matched and the real-time image S (of size n × n) into the trained target template matching and positioning network model of S2.4;
S3.2 Compute and output the heatmap P_hm through the network model;
S3.3 Find the maximum value on the heatmap P_hm and record the coordinate of the maximum point as (x_max, y_max);
S3.4 Substitute (x_max, y_max) into the inverse of the center-coordinate mapping of S2.2 to obtain the position (u, v) of the target template-image center on the real-time image.
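Inference thus amounts to an argmax over the heatmap followed by mapping the peak back into real-time-image coordinates. The patent's exact mapping formula is not reproduced in this text, so the sketch below assumes a uniform rescaling between the two 1-based coordinate grids; with the embodiment's 129 × 129 heatmap and 255 × 255 real-time image this is close to a factor of 2.

```python
import numpy as np

def locate_center(heatmap, n):
    """Peak of the heatmap mapped to (u, v) on the n x n real-time image.
    The linear grid-to-grid rescaling here is an assumption, not the
    patent's formula."""
    m_h = heatmap.shape[0]
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x_max, y_max = col + 1, row + 1              # 1-based peak coordinates
    scale = (n - 1) / (m_h - 1)                  # assumed uniform rescaling
    return (x_max - 1) * scale + 1, (y_max - 1) * scale + 1

hm = np.zeros((129, 129))
hm[64, 64] = 1.0                                 # peak at heatmap center (65, 65)
print(locate_center(hm, 255))  # (128.0, 128.0): the real-time image center
```

Sub-pixel refinement (e.g. a weighted average around the peak) could sharpen this further, but the plain argmax matches steps S3.3-S3.4 as written.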
compared with the traditional template matching method, the target template graph matching and positioning method based on the twin network and the central position estimation can fully utilize the powerful feature extraction and characterization capability of the deep twin network and the high-precision positioning capability of the central position estimation network, and on the basis of a training image set covering large differences such as heterogeneities, dimensions, rotation, visual angles and the like, a target template graph matching and positioning network model corresponding to the complex differences is obtained through training.
Drawings
FIG. 1 is a schematic network structure diagram of a target template map matching and positioning method based on twin network and center position estimation according to the present invention;
FIG. 2 is a schematic diagram of a novel ResNet 18-based feature extraction network structure according to the present invention;
FIG. 3 is an example of a template graph and a real-time graph in a training image set according to the present invention;
fig. 4 shows some of the template matching results provided by the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
The invention provides a target template graph matching and positioning method based on twin networks and central position estimation, which comprises the following steps:
S1 Constructing a target template graph matching positioning network
The target template matching and positioning network is formed by cascading, in sequence, the feature-extraction Siamese network, the depth-wise correlation convolution network and the center position estimation network. Fig. 1 is a schematic diagram of the overall network structure. In the embodiment, the network inputs are a template image T of size 127 × 127 and a real-time image S of size 255 × 255; the output is a single-channel heatmap of size 129 × 129.
S1.1, constructing a feature extraction twin network, and extracting feature information of an input template graph and a real-time graph
The feature-extraction Siamese network consists of two convolutional neural networks with shared parameters and identical structure, taking the template image T and the real-time image S as inputs and outputting the template feature map f(T) and the real-time feature map f(S), respectively. In the embodiment, m_1 = 16, n_1 = 32 and d = 128, i.e., f(T) has size 16 × 16 × 128 and f(S) has size 32 × 32 × 128.
As shown in fig. 2, the convolutional neural network is obtained by modifying on the basis of a standard ResNet network, and the specific modifications are as follows:
(1) A 3 × 3 convolution is added after the third, fourth and fifth layers of the standard ResNet network to reduce the feature dimension; the resulting feature maps are denoted here f_3, f_4 and f_5, respectively.
(2) A 3 × 3 deconvolution is applied to f_5, the resulting feature map is concatenated with f_4, and a 3 × 3 convolution is applied to the concatenated map to obtain the feature map f_45.
(3) A 3 × 3 deconvolution is applied to f_45 and the resulting feature map is concatenated with f_3, after which the final outputs are obtained: the template feature map f(T) and the real-time feature map f(S).
In the embodiment, a ResNet18 network is selected; the 3 × 3 convolutions have 128 channels and stride 1, and the 3 × 3 deconvolutions have 128 channels and stride 2.
S1.2 Fusing the extracted template feature map f(T) and real-time feature map f(S) using the depth-wise correlation convolution network
The depth-wise correlation convolution takes f(T) and f(S) as input, performs the depth-wise convolution with f(T) as the kernel over f(S), and outputs the fused correlation feature map f_Fusion. In the embodiment, f_Fusion has size 17 × 17 × 128.
S1.3 Constructing the center position estimation network and computing the heatmap
The center position estimation network is formed by cascading three 3 × 3 deconvolution layers and one 3 × 3 convolution layer; its input is f_Fusion and its output is the single-channel heatmap P_hm. In the embodiment, each 3 × 3 deconvolution layer has 128 channels and stride 2, the 3 × 3 convolution layer has 128 channels and stride 1, and the output P_hm has size 129 × 129.
S2 training target template graph matching positioning network
S2.1 making a training image set
In this embodiment, a DJI M300 unmanned aerial vehicle carrying a Zenmuse H20 gimbal camera was used to take visible-light and infrared photographs of the ground from the air, and 40000 pairs of template images and real-time images were made as the training image set according to the method of step S2.1, the sizes of the template images and real-time images being 127 × 127 and 255 × 255 pixels, respectively.
S2.2 calibrating a training image set
S2.2.1 For each pair of training samples, calibrate the coordinate c_ref = (x_ref, y_ref) of the template-image center on the real-time image;
S2.2.2 Compute the corresponding position (x_hm, y_hm) of the template-image center on the heatmap;
S2.2.3 After obtaining the corresponding coordinates of the template-image center on the heatmap, generate the heatmap label P̂_hm for this pair of training samples. In the embodiment, the calibrated heat value at each position (x, y) is computed as:
p̂_{x,y} = exp( −[(x − x_hm)² + (y − y_hm)²] / (2σ_p²) )
where 1 ≤ x, y ≤ 129 and σ_p is a hyper-parameter related to the size of the template image.
S2.3 design loss function
The loss function used for training is designed as:
L = −(1/N) · Σ_{x,y} L_{x,y}, with
L_{x,y} = (1 − p_{x,y})^α · log(p_{x,y})  if p̂_{x,y} = 1
L_{x,y} = (1 − p̂_{x,y})^β · (p_{x,y})^α · log(1 − p_{x,y})  otherwise
where p_{x,y} is the heat value (confidence) that the template-image center lies at position (x, y) of the real-time image, computed by the target template matching and positioning network of S1; p̂_{x,y} is the heat value at position (x, y) of the heatmap label calibrated in S2.2; N is the number of positive positions; and α and β are adjustable hyper-parameters, taken as α = 2 and β = 4 in this embodiment.
S2.4 Using the collected training image set and the calibrated data, train the network with the stochastic gradient descent (SGD) method, i.e., minimize the loss function designed in S2.3 to obtain the trained target template matching and positioning network model. In the embodiment, the batch size is set to 128 when training the model (4 GPUs, each loading 32 image pairs), and the parameters momentum and weight_decay are set to 0.9 and 0.001, respectively. The model is trained for 20 epochs in total: over the first 5 epochs the learning rate is increased at equal intervals from 0.001 to 0.005, and over the last 15 epochs it is decayed at equal logarithmic intervals from 0.005 to 0.0005.
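The learning-rate schedule described above (equal-interval warm-up, then equal-logarithmic-interval decay) can be sketched as follows; the exact per-epoch interpretation, with endpoints inclusive, is an assumption.

```python
def learning_rate(epoch):
    """Assumed reading of the schedule: epochs 1-5 warm up linearly
    from 0.001 to 0.005; epochs 6-20 decay log-uniformly from 0.005
    to 0.0005, so consecutive rates differ by a constant ratio."""
    if epoch <= 5:
        return 0.001 + (0.005 - 0.001) * (epoch - 1) / 4
    t = (epoch - 6) / 14                      # 0 at epoch 6, 1 at epoch 20
    return 0.005 * (0.0005 / 0.005) ** t      # geometric interpolation

print([round(learning_rate(e), 6) for e in (1, 5, 6, 20)])
```

Such a warm-up avoids early divergence with a large batch, and the geometric decay gives finer steps late in training; both are standard practice for SGD.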
S3, matching and positioning the target template picture by applying the trained target template picture matching and positioning network model
The specific process is as follows:
S3.1 Input the template image T (of size 127 × 127) to be matched and the real-time image S (of size 255 × 255) into the trained target template matching and positioning network model of S2.4;
S3.2 Compute and output the heatmap P_hm through the network model;
S3.3 Find the maximum value on the heatmap P_hm and record the coordinate of the maximum point as (x_max, y_max);
S3.4 Substitute (x_max, y_max) into the inverse of the center-coordinate mapping of S2.2 to obtain the position (u, v) of the target template-image center on the real-time image.
in order to qualitatively evaluate the template matching method provided by the invention, in the embodiment, a Dajiang M300 unmanned aerial vehicle is used to carry a Zen Si H20 pan-tilt camera, visible light photos and infrared photos of the ground are taken from the air, 350 pairs of image pairs consisting of template images and real-time images are made, and a test data set is constructed and recorded as Hard350. The template graph and the real-time graph in the test data set have great differences of rotation, visual angle, shielding, heterogeneities (visible light and infrared) and the like, and do not appear in the training set. In the present embodiment, the average central error (MCE) defined based on the central error and the matching Success Rate (SR) are used as evaluation indexes, where SR2 represents the matching success rate obtained when the central error is smaller than 2 pixels and the matching is successful.
Table 1 compares the method of the invention with some existing typical template matching methods on the test data set; the representative algorithms include normalized cross-correlation (NCC), normalized mutual information (NMI), a SIFT-based image matching algorithm and a HOG-based image matching algorithm, and "Ours" in the table denotes the method of the invention. The comparison in Table 1 shows that, compared with traditional template matching methods, the proposed method greatly improves the accuracy and robustness of template matching in complex environments.
TABLE 1 test results on Easy150 and Hard350 datasets for different methods
Fig. 4 shows some target template matching and positioning results obtained with the method of the invention under interference from source, viewing-angle, rotation and scale differences. As can be seen from the figure, the proposed method still performs well under these complex, challenging conditions.
In conclusion, the target template matching and positioning method based on a Siamese network and center position estimation provided by the invention achieves good matching and positioning accuracy and robustness under complex, challenging conditions.
Claims (5)
1. A target template map matching positioning method based on a depth twin network and center position estimation is characterized by comprising the following steps:
s1, constructing a target template graph matching positioning network
The target template matching and positioning network is formed by cascading, in sequence, a feature-extraction Siamese network, a depth-wise correlation convolution network and a center position estimation network; its inputs are a template image T and a real-time image S, of sizes m × m and n × n respectively, where m and n are positive integers and n > m; its output is a single-channel heatmap P_hm of size m_h × m_h, where m_h is a positive integer; specifically:
s1.1, constructing a feature extraction twin network, and extracting feature information of an input template graph and a real-time graph
The feature-extraction Siamese network consists of two convolutional neural networks with shared parameters and identical structure; it takes the template image T and the real-time image S as inputs and outputs a template feature map f(T) and a real-time feature map f(S), where f(T) has size m_1 × m_1 × d and f(S) has size n_1 × n_1 × d; m_1 denotes the height and width of f(T), n_1 the height and width of f(S), and d the number of channels, with m_1, n_1 and d positive integers;
the convolutional neural network is obtained by modifying a standard ResNet network, with the following specific modifications:
(1) A 3 × 3 convolution is added after the third, fourth and fifth layers of the standard ResNet network to reduce the feature dimension; the resulting feature maps are denoted here f_3, f_4 and f_5, respectively.
(2) A 3 × 3 deconvolution is applied to f_5, the resulting feature map is concatenated with f_4, and a 3 × 3 convolution is applied to the concatenated map to obtain the feature map f_45.
(3) A 3 × 3 deconvolution is applied to f_45 and the resulting feature map is concatenated with f_3, after which the final outputs are obtained: the template feature map f(T) and the real-time feature map f(S);
S1.2 Fusing the extracted template feature map f(T) and real-time feature map f(S) using the depth-wise correlation convolution network
The depth-wise correlation convolution network takes the template feature map f(T) and the real-time feature map f(S) extracted in S1.1 as input, performs a depth-wise correlation convolution with f(T) as the convolution kernel sliding over f(S), and outputs the fused correlation feature map f_Fusion, of size (m_1 + 1) × (m_1 + 1) × d;
S1.3 Constructing the center position estimation network and computing the single-channel heatmap
The center position estimation network is formed by cascading three 3 × 3 deconvolution layers and one 3 × 3 convolution layer, where each 3 × 3 deconvolution layer has d channels and stride s, with s a positive integer, and the 3 × 3 convolution layer has d channels and stride 1;
the center position estimation network takes the fused correlation feature map f_Fusion from S1.2 as input and outputs the single-channel heatmap P_hm, of size m_h × m_h with m_h = m_1 · s³; let p_{x,y} denote the heat value of P_hm at position (x, y), with 1 ≤ x, y ≤ m_h; then p_{x,y} takes values in the range [0, 1];
S2 training target template graph matching positioning network
S2.1 making a training image set
S2.1.1 For various targets such as houses, roads, bridges, vehicles, ships and aircraft, shoot with a visible-light camera and an infrared camera at different times, distances, viewing angles and positions, obtaining a large number of images;
S2.1.2 From the acquired images, make n_train image pairs, each consisting of a template image and a real-time image;
S2.1.3 Take the n_train image pairs so made as the training image set;
s2.2 calibrating a training image set
When calibrating an image pair consisting of a template graph and a real-time graph in the training image set, first calibrate the coordinate c_ref = (x_ref, y_ref) of the template graph center on the real-time graph, then map it to the coordinate (x_hm, y_hm) on the thermodynamic diagram, namely the position on the thermodynamic diagram corresponding to the template graph center;
After the coordinate of the template graph center on the thermodynamic diagram is obtained, the thermodynamic diagram label corresponding to this pair of training samples is generated. In this step the thermodynamic diagram is calibrated by Gaussian-kernel weighting:

p̂_(x,y) = exp( −((x − x_hm)² + (y − y_hm)²) / (2σ_p²) )

wherein: p̂_(x,y) denotes the calibrated heat value of the thermodynamic diagram label at position (x, y), with x and y ranging over [1, m_h]; σ_p is a hyper-parameter related to the size of the template graph; computing the heat value at all positions (x, y) yields the thermodynamic diagram label calibrated for this training sample.
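The Gaussian-kernel calibration can be sketched in a few lines of NumPy; m_h, the mapped center, and σ_p below are illustrative values:

```python
# Gaussian-kernel calibration of the thermodynamic diagram label: the heat
# value at (x, y) decays with squared distance to the mapped template
# center (x_hm, y_hm). m_h, the center, and sigma_p are assumed values;
# sigma_p would in practice depend on the template graph size.
import numpy as np

def gaussian_heatmap_label(m_h, x_hm, y_hm, sigma_p):
    # grid of all positions with 1 <= x, y <= m_h
    ys, xs = np.mgrid[1:m_h + 1, 1:m_h + 1]
    return np.exp(-((xs - x_hm) ** 2 + (ys - y_hm) ** 2) / (2 * sigma_p ** 2))

label = gaussian_heatmap_label(m_h=64, x_hm=20, y_hm=30, sigma_p=4.0)
print(label.shape, label.max())  # peak value 1.0 at the calibrated center
```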
S2.3 Design the loss function
The loss function used for training is:

L = −(1/N) · Σ_(x,y) { (1 − p_(x,y))^α · log(p_(x,y)),                    if p̂_(x,y) = 1
                       (1 − p̂_(x,y))^β · (p_(x,y))^α · log(1 − p_(x,y)),  otherwise

wherein: p_(x,y) denotes the heat value of the template graph at position (x, y) of the real-time graph, computed by the target template graph matching and positioning network of S1; p̂_(x,y) denotes the heat value at position (x, y) of the thermodynamic diagram label calibrated for the training sample in S2.2; N is the number of positions with p̂_(x,y) = 1; α and β are adjustable hyper-parameters;
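The loss described above has the form of the penalty-reduced focal loss commonly used for heatmap supervision (the α = 2, β = 4 defaults in claim 5 match its standard settings); a minimal NumPy sketch under that assumption:

```python
# Penalty-reduced focal loss over the thermodynamic diagram (assumed form):
# positive positions (label == 1) are weighted by (1 - p)^alpha, negatives
# by (1 - p_hat)^beta * p^alpha, normalised by the number of positives.
import numpy as np

def heatmap_focal_loss(p, p_hat, alpha=2.0, beta=4.0, eps=1e-12):
    pos = p_hat == 1.0                     # calibrated centre positions
    pos_loss = ((1 - p[pos]) ** alpha) * np.log(p[pos] + eps)
    neg = ~pos                             # all other positions
    neg_loss = ((1 - p_hat[neg]) ** beta) * (p[neg] ** alpha) * np.log(1 - p[neg] + eps)
    n_pos = max(int(pos.sum()), 1)
    return -(pos_loss.sum() + neg_loss.sum()) / n_pos

p_hat = np.zeros((4, 4)); p_hat[1, 2] = 1.0  # calibrated label, one positive
p = np.full((4, 4), 0.05); p[1, 2] = 0.9     # predicted heatmap
loss = heatmap_focal_loss(p, p_hat)
print(loss)  # small positive value: the prediction is close to the label
```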
S2.4 Using the training image set acquired in S2.1 and calibrated in S2.2, train the network by stochastic gradient descent, namely minimize the loss function designed in S2.3, to obtain the trained target template graph matching and positioning network model;
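The training step of S2.4 reduces to a standard stochastic-gradient-descent loop; everything below (the one-layer stand-in model, MSE in place of the S2.3 loss, the random stand-in data, and the learning rate) is a placeholder for illustration only:

```python
# Minimal SGD training loop sketch. The model, loss, data and learning
# rate are stand-ins, not the patent's network or hyper-parameters.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Conv2d(1, 1, 3, padding=1)        # stand-in for the full network
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()                       # stand-in for the S2.3 loss

x = torch.rand(4, 1, 16, 16)                 # stand-in fused features
y = torch.rand(4, 1, 16, 16)                 # stand-in calibrated heatmap labels

first = None
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(torch.sigmoid(model(x)), y)
    loss.backward()
    opt.step()
    if first is None:
        first = loss.item()                  # loss before any update
print(first > loss.item())  # the loss decreases as training proceeds
```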
S3 Apply the trained target template graph matching and positioning network model to match and locate the target template graph
The specific process is as follows:
S3.1 Input a template graph T of size m × m and a real-time graph S of size n × n to be matched into the trained target template graph matching and positioning network model of S2.4;
S3.2 Compute and output the thermodynamic diagram P_hm with the target template graph matching and positioning network model;
S3.3 Find the maximum value of the thermodynamic diagram P_hm and record the coordinate of the maximum point as (x_max, y_max);
S3.4 Map (x_max, y_max) back from the thermodynamic diagram to the real-time graph, obtaining the position (u, v) of the target template graph center on the real-time graph.
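Steps S3.3–S3.4 amount to an argmax over the thermodynamic diagram followed by a coordinate back-mapping; the patent's exact back-mapping formula is not reproduced in this text, so the plain scale factor n / m_h below is an assumption for illustration:

```python
# Peak localisation on the thermodynamic diagram. The scale factor
# n / m_h used to map back to real-time-graph coordinates is an assumed
# stand-in for the patent's (elided) mapping formula.
import numpy as np

def locate(P_hm, n):
    m_h = P_hm.shape[0]
    # unravel_index turns the flat argmax into (row, col) = (y, x)
    y_max, x_max = np.unravel_index(np.argmax(P_hm), P_hm.shape)
    scale = n / m_h
    return x_max * scale, y_max * scale  # (u, v) on the real-time graph

P_hm = np.zeros((64, 64)); P_hm[40, 24] = 0.97  # peak at x=24, y=40
u, v = locate(P_hm, n=512)
print(u, v)  # 192.0 320.0
```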
2. The target template graph matching and positioning method based on a deep twin network and central position estimation according to claim 1, wherein: in S2.1.2, the number n_train of image pairs consisting of a template graph and a real-time graph satisfies n_train ≥ 40000.
3. The target template graph matching and positioning method based on a deep twin network and central position estimation according to claim 1, wherein the method of making the n_train image pairs in S2.1.2 is: crop an image block containing a certain target from one image, scale it to size m × m, and take it as the template graph, where m is a positive integer; crop image blocks containing the same target from other images, scale them to size n × n, and take them as the real-time graph, where n is a positive integer.
5. The target template graph matching and positioning method based on a deep twin network and central position estimation according to claim 1, wherein: in S2.3, the adjustable hyper-parameters are α = 2 and β = 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211131672.7A CN115330876B (en) | 2022-09-15 | 2022-09-15 | Target template graph matching and positioning method based on twin network and central position estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115330876A true CN115330876A (en) | 2022-11-11 |
CN115330876B CN115330876B (en) | 2023-04-07 |
Family
ID=83929989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211131672.7A Active CN115330876B (en) | 2022-09-15 | 2022-09-15 | Target template graph matching and positioning method based on twin network and central position estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115330876B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115861595A (en) * | 2022-11-18 | 2023-03-28 | 华中科技大学 | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning |
CN116260765A (en) * | 2023-05-11 | 2023-06-13 | 中国人民解放军国防科技大学 | Digital twin modeling method for large-scale dynamic routing network |
CN115861595B (en) * | 2022-11-18 | 2024-05-24 | 华中科技大学 | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191491A (en) * | 2018-08-03 | 2019-01-11 | 华中科技大学 | The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion |
CN110245678A (en) * | 2019-05-07 | 2019-09-17 | 华中科技大学 | A kind of isomery twinned region selection network and the image matching method based on the network |
US20190332935A1 (en) * | 2018-04-27 | 2019-10-31 | Qualcomm Incorporated | System and method for siamese instance search tracker with a recurrent neural network |
CN112069896A (en) * | 2020-08-04 | 2020-12-11 | 河南科技大学 | Video target tracking method based on twin network fusion multi-template features |
CN113705731A (en) * | 2021-09-23 | 2021-11-26 | 中国人民解放军国防科技大学 | End-to-end image template matching method based on twin network |
CN114022729A (en) * | 2021-10-27 | 2022-02-08 | 华中科技大学 | Heterogeneous image matching positioning method and system based on twin network and supervised training |
CN114581678A (en) * | 2022-03-15 | 2022-06-03 | 中国电子科技集团公司第五十八研究所 | Automatic tracking and re-identifying method for template feature matching |
Non-Patent Citations (4)
Title |
---|
KE LIANG ET AL.: "An Adaptive Kalman-Correlation Based Siamese Network Tracker for Visual Object Tracking" |
QIANG REN ET AL.: "A Robust and Accurate End-to-End Template Matching Method Based on the Siamese Network" |
SHI LULU ET AL.: "Target Tracking Based on a Tiny Darknet Fully Convolutional Siamese Network" |
CHEN YUNFANG ET AL.: "A Survey of Target Tracking Algorithms Based on Siamese Network Structures" |
Also Published As
Publication number | Publication date |
---|---|
CN115330876B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110599537A (en) | Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system | |
CN111160297A (en) | Pedestrian re-identification method and device based on residual attention mechanism space-time combined model | |
JP5417494B2 (en) | Image processing method and system | |
CN106529538A (en) | Method and device for positioning aircraft | |
CN104077760A (en) | Rapid splicing system for aerial photogrammetry and implementing method thereof | |
CN107766864B (en) | Method and device for extracting features and method and device for object recognition | |
CN112102294B (en) | Training method and device for generating countermeasure network, and image registration method and device | |
CN107909018B (en) | Stable multi-mode remote sensing image matching method and system | |
CN104268880A (en) | Depth information obtaining method based on combination of features and region matching | |
JP2017033197A (en) | Change area detection device, method, and program | |
US20210158081A1 (en) | System and method for correspondence map determination | |
CN115330876B (en) | Target template graph matching and positioning method based on twin network and central position estimation | |
Knyaz et al. | Joint geometric calibration of color and thermal cameras for synchronized multimodal dataset creating | |
CN113658147A (en) | Workpiece size measuring device and method based on deep learning | |
CN104392209B (en) | A kind of image complexity evaluation method of target and background | |
CN117218201A (en) | Unmanned aerial vehicle image positioning precision improving method and system under GNSS refusing condition | |
CN109740405B (en) | Method for detecting front window difference information of non-aligned similar vehicles | |
CN114120129B (en) | Three-dimensional identification method for landslide slip surface based on unmanned aerial vehicle image and deep learning | |
CN116758419A (en) | Multi-scale target detection method, device and equipment for remote sensing image | |
CN111563423A (en) | Unmanned aerial vehicle image target detection method and system based on depth denoising automatic encoder | |
CN111222576A (en) | High-resolution remote sensing image classification method | |
CN110135474A (en) | A kind of oblique aerial image matching method and system based on deep learning | |
CN113808256B (en) | High-precision holographic human body reconstruction method combined with identity recognition | |
Lo et al. | Depth estimation based on a single close-up image with volumetric annotations in the wild: A pilot study | |
CN113554754A (en) | Indoor positioning method based on computer vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||