CN115330876A - Target template graph matching and positioning method based on twin network and central position estimation


Info

Publication number: CN115330876A (application CN202211131672.7A; granted as CN115330876B)
Authority: CN (China)
Priority/filing date: 2022-09-15
Publication date: 2022-11-11 (CN115330876A); grant date: 2023-04-07 (CN115330876B)
Prior art keywords: graph, template, network, real, target template
Other languages: Chinese (zh)
Inventors: 郑永斌, 任强, 徐婉莹, 白圣建, 孙鹏, 朱笛, 杨东旭
Original and current assignee: National University of Defense Technology
Legal status: Granted, Active

Classifications

    • G06T7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06V10/751: Image or video recognition or understanding using pattern recognition or machine learning; comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06T2207/20081: Indexing scheme for image analysis or enhancement; training; learning
    • G06T2207/20084: Indexing scheme for image analysis or enhancement; artificial neural networks [ANN]
    • Y02T10/40: Climate change mitigation technologies related to transportation; engine management systems


Abstract

The invention belongs to the technical fields of image processing and deep learning, and particularly relates to a target template graph matching and positioning method based on a twin (Siamese) network and center position estimation, comprising the following steps: S1, constructing a target template graph matching and positioning network; S2, training the target template graph matching and positioning network; and S3, applying the trained network model to match and locate the target template graph. Compared with traditional template matching methods, the proposed method fully exploits the powerful feature extraction and characterization capability of the deep twin network and the high-precision localization capability of the center position estimation network; by training on an image set covering large differences in source, scale, rotation, viewing angle, etc., it obtains a matching and positioning network model robust to such complex differences.

Description

Target template graph matching and positioning method based on twin network and central position estimation
Technical Field
The invention belongs to the technical fields of image processing and deep learning, and particularly relates to a target template graph matching and positioning method based on a twin (Siamese) network and center position estimation.
Background
Target template graph matching and positioning means that, given the template graph of a target in advance, the position corresponding to the center of the template graph is accurately located in a real-time image acquired by an imaging device, through the steps of feature extraction, similarity measurement, and search for the most similar position. It is a fundamental technique in the fields of computer vision and target recognition, and is widely applied in tasks such as remote sensing, medical image processing, video surveillance, and imaging guidance. In practical applications the real-time image and the template graph are acquired by different devices, at different times, viewing angles, and illumination conditions, so they often differ greatly in source, rotation, viewing angle, noise, etc., which poses a great challenge to the accurate positioning of the target template graph.
The survey "Image Registration Methods: A Survey" by Barbara Zitová and Jan Flusser (Image and Vision Computing, 2003, 21(11): 977-1000) divides the template graph matching and localization task into four elements: feature extraction, similarity measure, search space, and search method. Traditional target template matching and positioning methods extract hand-crafted features and adopt simple similarity measures, so their feature extraction and similarity measurement capabilities are weak and cannot meet the challenges above. In addition, the search space of traditional methods couples dimensions such as translation, scale, and rotation, and the searched matching position easily falls into a local optimum, producing inaccurate or even wrong localization of the target template graph. The strong feature extraction and exploitation capability of deep learning provides a new technical approach for improving target template graph matching and localization performance. The paper "A Robust and Accurate End-to-End Template Matching Method Based on the Siamese Network" by Qiang Ren et al. (IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5) proposes an end-to-end template matching method based on a Siamese network, which treats the template matching task as template classification plus position regression, improving the robustness of template matching and localization to large differences in source, rotation, viewing angle, noise, etc. However, when localizing the template graph, that method densely predicts a rectangular bounding box, i.e., the center position of the template graph is localized indirectly by predicting a template bounding box, so its localization accuracy and robustness are still affected by factors such as source, scale, and viewing-angle differences.
Disclosure of Invention
Aiming at the problems of existing target template graph matching and positioning methods, the invention provides a target template graph matching and positioning method based on a depth twin network and center position estimation.
In order to achieve the above object, the invention provides the following solution: a target template graph matching and positioning method based on a depth twin network and center position estimation, comprising the following steps:
S1, constructing a target template graph matching and positioning network
The target template graph matching and positioning network is formed by cascading, in sequence, a feature extraction twin network, a depth correlation convolution network, and a center position estimation network. Its inputs are a template graph T and a real-time graph S, of sizes m×m and n×n respectively, where m and n are positive integers and n > m; its output is a single-channel heatmap P_hm of size m_h × m_h, where m_h is a positive integer. The larger the heatmap value at a coordinate, the greater the likelihood that this coordinate is the position of the template graph center on the real-time graph. The specific steps are as follows:
S1.1, constructing a feature extraction twin network to extract feature information of the input template graph and real-time graph
The feature extraction twin network is formed by cascading two convolutional neural networks with shared parameters and identical structure; it takes the template graph T and the real-time graph S as input and outputs a template graph feature map f(T) and a real-time graph feature map f(S), where f(T) has size m_1 × m_1 × d and f(S) has size n_1 × n_1 × d; m_1 denotes the length and width of f(T), n_1 the length and width of f(S), d the number of channels, and m_1, n_1, and d are positive integers.
The convolutional neural network is obtained by modifying a standard ResNet network (He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition [C] // IEEE Conference on Computer Vision & Pattern Recognition. IEEE Computer Society, 2016), with the following specific modifications:
(1) A 3×3 convolution is added after the third, fourth, and fifth layers of the standard ResNet network to reduce the feature dimensionality; the resulting feature maps are denoted here as F_3, F_4, and F_5, respectively.
(2) A 3×3 deconvolution is applied to F_5, the resulting feature map is concatenated onto F_4, and a 3×3 convolution is applied to the concatenated map, yielding a feature map F_45.
(3) A 3×3 deconvolution is applied to F_45 and the resulting feature map is concatenated onto F_3, giving the final outputs: the template graph feature map f(T) and the real-time graph feature map f(S).
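For concreteness, the following PyTorch sketch shows one way the modified ResNet18 branch described above could be realized; the same instance is applied to both T and S with shared weights, which is what makes the pair a twin network. The stride-2 deconvolutions and the final 3×3 projection back to d channels are illustrative assumptions, not the patent's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FeatureExtractor(nn.Module):
    """Sketch of one branch of the feature extraction twin network (S1.1).

    Stage-3/4/5 outputs are reduced to d channels by 3x3 convolutions
    (F_3, F_4, F_5), then fused top-down with 3x3 deconvolutions and
    concatenation. The final 3x3 projection back to d channels is an
    assumption made so the output matches the stated m1 x m1 x d size.
    """

    def __init__(self, d: int = 128):
        super().__init__()
        trunk = resnet18(weights=None)
        self.stem = nn.Sequential(trunk.conv1, trunk.bn1, trunk.relu, trunk.maxpool)
        self.layer1, self.layer2 = trunk.layer1, trunk.layer2
        self.layer3, self.layer4 = trunk.layer3, trunk.layer4
        # 3x3 dimension-reduction convolutions added at stages 3, 4, 5
        self.red3 = nn.Conv2d(128, d, 3, padding=1)
        self.red4 = nn.Conv2d(256, d, 3, padding=1)
        self.red5 = nn.Conv2d(512, d, 3, padding=1)
        # 3x3 deconvolutions (stride 2, assumed) that upsample for fusion
        self.up5 = nn.ConvTranspose2d(d, d, 3, stride=2, padding=1, output_padding=1)
        self.up45 = nn.ConvTranspose2d(d, d, 3, stride=2, padding=1, output_padding=1)
        self.fuse45 = nn.Conv2d(2 * d, d, 3, padding=1)  # conv after concat -> F_45
        self.proj = nn.Conv2d(2 * d, d, 3, padding=1)    # assumed output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        c3 = self.layer2(self.layer1(x))   # stage 3 (16x16 for a 127x127 input)
        c4 = self.layer3(c3)               # stage 4
        c5 = self.layer4(c4)               # stage 5
        f3, f4, f5 = self.red3(c3), self.red4(c4), self.red5(c5)
        f45 = self.fuse45(torch.cat([f4, self.up5(f5)], dim=1))   # step (2)
        return self.proj(torch.cat([f3, self.up45(f45)], dim=1))  # step (3)
```

With a 127×127 template graph and a 255×255 real-time graph this yields 16×16×d and 32×32×d feature maps, consistent with the embodiment.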
S1.2, fusing the extracted template graph feature map f(T) and real-time graph feature map f(S) with a depth correlation convolution network
The depth correlation convolution network takes the template graph feature map f(T) and the real-time graph feature map f(S) extracted in S1.1 as input, performs a depthwise correlation convolution over f(S) with f(T) as the convolution kernel, and outputs the fused correlation feature map f_fusion, of size (m_1+1) × (m_1+1) × d;
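The depthwise correlation can be written with a grouped convolution, the trick commonly used in Siamese trackers; the batch handling below is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def depthwise_correlation(ft: torch.Tensor, fs: torch.Tensor) -> torch.Tensor:
    """Depthwise correlation of S1.2: f(T) slides over f(S) as a kernel.

    ft: template features, shape (B, d, m1, m1).
    fs: real-time features, shape (B, d, n1, n1).
    Returns a (B, d, n1-m1+1, n1-m1+1) map; with m1=16, n1=32 this is
    (B, d, 17, 17), matching the (m1+1) x (m1+1) x d size in the text.
    """
    b, d, m1, _ = ft.shape
    # Fold the batch into the channel axis so each sample's template
    # correlates only with its own real-time features (groups = b * d).
    fs = fs.reshape(1, b * d, fs.size(2), fs.size(3))
    kernel = ft.reshape(b * d, 1, m1, m1)
    out = F.conv2d(fs, kernel, groups=b * d)
    return out.reshape(b, d, out.size(2), out.size(3))
```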
S1.3, constructing the center position estimation network and computing the single-channel heatmap
The center position estimation network is formed by cascading three 3×3 deconvolution layers and one 3×3 convolution layer, where each 3×3 deconvolution layer has d channels and stride s, s being a positive integer, and the 3×3 convolution layer has d channels and stride 1.
The center position estimation network takes the fused correlation feature map f_fusion from S1.2 as input and outputs the single-channel heatmap P_hm, of size m_h × m_h with m_h = m_1 · s^3. Let p_{x,y} denote the heat value at position (x, y) of P_hm, where 1 ≤ x, y ≤ m_h; then p_{x,y} takes values in [0, 1].
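A minimal sketch of this head follows; the 1-channel output projection and the sigmoid (which keep p_{x,y} in [0, 1], as the text requires), together with the batch-norm/ReLU between deconvolutions, are assumptions consistent with the described single-channel heatmap.

```python
import torch
import torch.nn as nn

class CenterHead(nn.Module):
    """Center position estimation network of S1.3 (assumed details).

    Three 3x3 deconvolutions with stride s=2 take the 17x17 fused map to
    17 -> 33 -> 65 -> 129, the heatmap size of the embodiment. The
    1-channel projection and sigmoid are assumptions made so that the
    output is a single-channel heatmap with values in [0, 1].
    """

    def __init__(self, d: int = 128, s: int = 2):
        super().__init__()
        def deconv() -> nn.Module:
            return nn.Sequential(
                nn.ConvTranspose2d(d, d, 3, stride=s, padding=1),
                nn.BatchNorm2d(d),
                nn.ReLU(inplace=True),
            )
        self.up = nn.Sequential(deconv(), deconv(), deconv())
        self.out = nn.Conv2d(d, 1, 3, stride=1, padding=1)

    def forward(self, f_fusion: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.out(self.up(f_fusion)))
```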
S2, training the target template graph matching and positioning network
S2.1 making a training image set
S2.1.1, for various targets such as houses, roads, bridges, vehicles, ships, and airplanes, images are captured with a visible light camera and an infrared camera at different times of day, from different distances, viewing angles, and positions, yielding a large number of images;
S2.1.2, n_train image pairs, each consisting of a template graph and a real-time graph, are made from the captured images, with n_train ≥ 40000. The specific procedure is: an image block containing a given target is cut from one image, scaled to size m×m, and taken as the template graph, where m is a positive integer; image blocks containing the same target are cut from other images, scaled to n×n, and taken as real-time graphs, where n is a positive integer.
S2.1.3, the n_train image pairs so produced are taken as the training image set.
As can be seen from the above production process, the template graph and the real-time graph differ significantly in source, scale, rotation, viewing angle, etc.
S2.2 calibrating a training image set
When calibrating an image pair consisting of a template graph and a real-time graph in the training image set, the coordinate c_ref = (x_ref, y_ref) of the template graph center on the real-time graph is calibrated first, and is then mapped to the coordinate (x_hm, y_hm) on the heatmap, i.e., the position corresponding to the template graph center on the heatmap. The mapping scales c_ref down to heatmap coordinates and applies ⌊·⌋, the rounding-down (floor) operation; the exact formula is not reproduced here.
After the corresponding coordinates of the template graph center on the heatmap are obtained, the heatmap label P̃ corresponding to this pair of training samples is generated. Unlike calibration methods that directly record positive samples as "1" and negative samples as "0", this step calibrates the heatmap by Gaussian kernel weighting, in order to control the proportion of negative samples in the loss function and reduce the influence of positive/negative sample imbalance. The specific calibration is

$$\tilde{P}_{x,y} = \exp\left(-\frac{(x - x_{hm})^2 + (y - y_{hm})^2}{2\sigma_p^2}\right)$$

where P̃_{x,y} denotes the calibrated heat value at position (x, y) of the heatmap label P̃, x and y range over [1, m_h], and σ_p is a hyper-parameter related to the size of the template graph (the invention's formula for σ_p is not reproduced here). Computing the heat value at all positions (x, y) yields the heatmap label P̃ calibrated for this training sample.
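A sketch of this Gaussian-kernel calibration, assuming the standard unnormalized Gaussian centered at the mapped template-center coordinate; σ_p is left as a parameter because the patent's formula for it is not reproduced.

```python
import numpy as np

def gaussian_heatmap_label(m_h: int, x_hm: int, y_hm: int, sigma_p: float) -> np.ndarray:
    """Gaussian-kernel heatmap label of S2.2 (assumed standard form).

    m_h:         heatmap side length (129 in the embodiment).
    x_hm, y_hm:  template center mapped onto the heatmap, 1-based.
    sigma_p:     template-size-dependent hyper-parameter (its defining
                 formula is not reproduced in the text).
    """
    xs, ys = np.meshgrid(np.arange(1, m_h + 1), np.arange(1, m_h + 1))
    label = np.exp(-((xs - x_hm) ** 2 + (ys - y_hm) ** 2) / (2.0 * sigma_p ** 2))
    # exp(0) = 1 at the center itself, so the peak is the positive sample
    return label.astype(np.float32)
```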
S2.3 design loss function
The loss function used for training is designed as

$$L = -\sum_{x,y}\begin{cases}(1 - p_{x,y})^{\alpha}\log p_{x,y}, & \tilde{P}_{x,y} = 1\\(1 - \tilde{P}_{x,y})^{\beta}\,p_{x,y}^{\alpha}\log(1 - p_{x,y}), & \text{otherwise}\end{cases}$$

where p_{x,y} denotes the heat value (confidence) that the template graph center lies at position (x, y) of the real-time graph, as computed by the target template graph matching and positioning network of S1; P̃_{x,y} denotes the heat value at position (x, y) of the heatmap label calibrated for the training sample in S2.2; and α and β are adjustable hyper-parameters, taken as α = 2 and β = 4 in the invention.
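Assuming the loss is the penalty-reduced focal loss that standardly accompanies Gaussian-weighted heatmap labels with α = 2 and β = 4 (the patent's equation is rendered as an image), a sketch:

```python
import torch

def center_focal_loss(pred: torch.Tensor, label: torch.Tensor,
                      alpha: float = 2.0, beta: float = 4.0) -> torch.Tensor:
    """Penalty-reduced focal loss over the heatmap (assumed form).

    pred:  predicted heatmap p_{x,y} in (0, 1), shape (B, 1, H, W).
    label: Gaussian-calibrated label, same shape, with 1 at the center.
    """
    eps = 1e-6
    pred = pred.clamp(eps, 1.0 - eps)
    pos = label.eq(1.0)
    pos_loss = ((1.0 - pred) ** alpha * torch.log(pred))[pos]
    neg_loss = ((1.0 - label) ** beta * pred ** alpha * torch.log(1.0 - pred))[~pos]
    # Normalizing by the number of positives is a common convention.
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```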
S2.4, using the training image set produced in S2.1 and calibrated in S2.2, network training is performed with the stochastic gradient descent (SGD) method (LeCun Y., Boser B., Denker J. S., et al. Backpropagation applied to handwritten zip code recognition [J]. Neural Computation, 1989, 1(4): 541-551), i.e., the loss function designed in S2.3 is minimized, yielding the trained target template graph matching and positioning network model.
S3, applying the trained target template graph matching and positioning network model to match and locate the target template graph
The specific process is as follows:
S3.1, the template graph T (of size m×m) and the real-time graph S (of size n×n) to be matched and located are input into the target template graph matching and positioning network model trained in S2.4;
S3.2, the heatmap P_hm is computed and output by the target template graph matching and positioning network model;
S3.3, the maximum value on the heatmap P_hm is found, and the coordinate of the maximum point is recorded as (x_max, y_max);
S3.4, (x_max, y_max) is substituted into the inverse of the coordinate mapping of S2.2 to locate the position (u, v) of the target template graph center on the real-time graph; the exact formula is not reproduced here.
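Putting S3.1 to S3.4 together, a hedged inference sketch; the model signature and the plain scale factor n/m_h used to map the heatmap peak back to real-time-graph coordinates are assumptions standing in for the patent's exact (unreproduced) formula.

```python
import torch

@torch.no_grad()
def locate(model: torch.nn.Module, template: torch.Tensor,
           live: torch.Tensor) -> tuple[float, float]:
    """S3.1-S3.4: run the trained network and decode the center position.

    template: (1, C, m, m) tensor; live: (1, C, n, n) tensor.
    model(template, live) is assumed to return the (1, 1, m_h, m_h) heatmap.
    """
    heatmap = model(template, live)
    m_h = heatmap.size(-1)
    idx = int(heatmap.flatten().argmax())
    y_max, x_max = divmod(idx, m_h)        # row and column of the peak
    scale = live.size(-1) / m_h            # assumed inverse of the S2.2 mapping
    return x_max * scale, y_max * scale    # (u, v) on the real-time graph
```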
Compared with traditional template matching methods, the target template graph matching and positioning method based on the twin network and center position estimation can fully exploit the powerful feature extraction and characterization capability of the deep twin network and the high-precision localization capability of the center position estimation network; by training on an image set covering large differences in source, scale, rotation, viewing angle, etc., it obtains a matching and positioning network model robust to such complex differences.
Drawings
FIG. 1 is a schematic diagram of the network structure of the target template graph matching and positioning method based on the twin network and center position estimation according to the invention;
FIG. 2 is a schematic diagram of the modified ResNet18-based feature extraction network structure according to the invention;
FIG. 3 shows examples of template graphs and real-time graphs in the training image set according to the invention;
FIG. 4 shows some of the template matching results produced by the method of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
The invention provides a target template graph matching and positioning method based on a twin network and center position estimation, comprising the following steps:
s1, constructing a target template graph matching positioning network
The target template graph matching and positioning network is formed by cascading, in sequence, a feature extraction twin network, a depth correlation convolution network, and a center position estimation network. Fig. 1 shows the specific structure of the entire network. In the embodiment, the network inputs are a template graph T of size 127×127 and a real-time graph S of size 255×255; the output is a single-channel heatmap of size 129×129.
S1.1, constructing a feature extraction twin network, and extracting feature information of an input template graph and a real-time graph
The feature extraction twin network is formed by cascading two convolutional neural networks with shared parameters and identical structure; it takes the template graph T and the real-time graph S as input and outputs the template graph feature map f(T) and the real-time graph feature map f(S), respectively. In the embodiment, m_1 = 16, n_1 = 32, and d = 128; that is, f(T) has size 16×16×128 and f(S) has size 32×32×128.
As shown in fig. 2, the convolutional neural network is obtained by modifying a standard ResNet network, with the following specific modifications:
(1) A 3×3 convolution is added after the third, fourth, and fifth layers of the standard ResNet network to reduce the feature dimensionality; the resulting feature maps are denoted here as F_3, F_4, and F_5, respectively.
(2) A 3×3 deconvolution is applied to F_5, the resulting feature map is concatenated onto F_4, and a 3×3 convolution is applied to the concatenated map, yielding a feature map F_45.
(3) A 3×3 deconvolution is applied to F_45 and the resulting feature map is concatenated onto F_3, giving the final outputs: the template graph feature map f(T) and the real-time graph feature map f(S).
In the embodiment the ResNet18 network is selected; each 3×3 convolution has 128 channels and stride 1, and each 3×3 deconvolution has 128 channels and stride 2.
S1.2, fusing the extracted template graph feature map f(T) and real-time graph feature map f(S) with the depth correlation convolution network
The inputs of the depth correlation convolution operation are f(T) and f(S); a depthwise correlation convolution is performed over f(S) with f(T) as the kernel, and the output is the fused correlation feature map f_fusion. In the embodiment, f_fusion has size 17×17×128.
S1.3, constructing the center position estimation network and computing the heatmap
The center position estimation network is formed by cascading three 3×3 deconvolution layers and one 3×3 convolution layer; its input is f_fusion and its output is the single-channel heatmap P_hm. In the embodiment, each 3×3 deconvolution layer has 128 channels and stride 2, the 3×3 convolution layer has 128 channels and stride 1, and the output P_hm has size 129×129.
S2, training the target template graph matching and positioning network
S2.1 making a training image set
In the present embodiment, a DJI M300 unmanned aerial vehicle carrying a Zenmuse H20 pan-tilt camera is used to take visible light and infrared pictures of the ground from the air, and 40000 pairs of template graphs and real-time graphs are made as the training image set according to the method of step S2.1, with template graph and real-time graph sizes of 127×127 and 255×255 pixels, respectively.
S2.2 calibrating a training image set
S2.2.1, for each pair of training samples, the coordinate c_ref = (x_ref, y_ref) of the template graph center on the real-time graph is calibrated;
S2.2.2, the position corresponding to the template graph center on the heatmap is computed by scaling c_ref down to heatmap coordinates and applying ⌊·⌋, the rounding-down (floor) operation (the exact formula is not reproduced here);
S2.2.3, after the corresponding coordinate of the template graph center on the heatmap is obtained, the heatmap label P̃ for this pair of training samples is generated. In the embodiment, the calibrated heat value of P̃ at each position (x, y) is computed as

$$\tilde{P}_{x,y} = \exp\left(-\frac{(x - x_{hm})^2 + (y - y_{hm})^2}{2\sigma_p^2}\right)$$

where 1 ≤ x, y ≤ 129 and σ_p is a hyper-parameter related to the size of the template graph.
S2.3 design loss function
The loss function used for training is designed as

$$L = -\sum_{x,y}\begin{cases}(1 - p_{x,y})^{\alpha}\log p_{x,y}, & \tilde{P}_{x,y} = 1\\(1 - \tilde{P}_{x,y})^{\beta}\,p_{x,y}^{\alpha}\log(1 - p_{x,y}), & \text{otherwise}\end{cases}$$

where p_{x,y} denotes the heat value (confidence) that the template graph center lies at position (x, y) of the real-time graph, as computed by the target template graph matching and positioning network of S1; P̃_{x,y} denotes the heat value at position (x, y) of the heatmap label calibrated for the training sample in S2.2; and α and β are adjustable hyper-parameters, taken as α = 2 and β = 4 in this embodiment.
S2.4, using the collected training image set and the calibrated data, network training is performed with the stochastic gradient descent (SGD) method, i.e., the loss function designed in S2.3 is minimized to obtain the trained target template graph matching and positioning network model. In the embodiment, batch_size is set to 128 during training (4 GPUs, with 32 image pairs loaded per GPU), and the parameters momentum and weight_decay are set to 0.9 and 0.001, respectively. The model is trained for 20 epochs in total: over the first 5 epochs the learning rate is increased at equal intervals from 0.001 to 0.005, and over the last 15 epochs it is decayed at equal logarithmic intervals from 0.005 to 0.0005.
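This schedule could be realized as below; interpreting "equal interval" as linear warm-up and "equal logarithmic interval" as log-uniform decay is an assumption.

```python
import numpy as np

def learning_rate(epoch: int) -> float:
    """Per-epoch learning rate for the 20-epoch schedule (assumed spacing).

    Epochs 0-4:   linear warm-up from 0.001 to 0.005.
    Epochs 5-19:  log-uniform decay from 0.005 to 0.0005.
    """
    if epoch < 5:
        return float(np.linspace(0.001, 0.005, 5)[epoch])
    return float(np.logspace(np.log10(0.005), np.log10(0.0005), 15)[epoch - 5])

# Usage sketch: optimizer = torch.optim.SGD(params, lr=learning_rate(0),
# momentum=0.9, weight_decay=0.001); before each epoch, set every param
# group's lr to learning_rate(epoch).
```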
S3, applying the trained target template graph matching and positioning network model to match and locate the target template graph
The specific process is as follows:
S3.1, the template graph T (of size 127×127) and the real-time graph S (of size 255×255) to be matched and located are input into the target template graph matching and positioning network model trained in S2.4;
S3.2, the heatmap P_hm is computed and output by the target template graph matching and positioning network model;
S3.3, the maximum value on the heatmap P_hm is found, and the coordinate of the maximum point is recorded as (x_max, y_max);
S3.4, (x_max, y_max) is substituted into the inverse of the coordinate mapping of S2.2 to locate the position (u, v) of the target template graph center on the real-time graph; the exact formula is not reproduced here.
in order to qualitatively evaluate the template matching method provided by the invention, in the embodiment, a Dajiang M300 unmanned aerial vehicle is used to carry a Zen Si H20 pan-tilt camera, visible light photos and infrared photos of the ground are taken from the air, 350 pairs of image pairs consisting of template images and real-time images are made, and a test data set is constructed and recorded as Hard350. The template graph and the real-time graph in the test data set have great differences of rotation, visual angle, shielding, heterogeneities (visible light and infrared) and the like, and do not appear in the training set. In the present embodiment, the average central error (MCE) defined based on the central error and the matching Success Rate (SR) are used as evaluation indexes, where SR2 represents the matching success rate obtained when the central error is smaller than 2 pixels and the matching is successful.
Table 1 compares the method provided by the invention with several typical existing template matching methods on the test data sets; the representative algorithms include normalized cross-correlation (NCC), normalized mutual information (NMI), a SIFT-based image matching algorithm, and a HOG-based image matching algorithm, and "Ours" in the table denotes the method provided by the invention. The comparison in Table 1 shows that, compared with traditional template matching methods, the proposed method greatly improves the accuracy and robustness of template matching in complex environments.
TABLE 1. Test results of the different methods on the Easy150 and Hard350 data sets (the table is rendered as an image in the original and its values are not reproduced here).
FIG. 4 shows some target template graph matching and positioning results obtained with the method of the invention under interference from source, viewing-angle, rotation, and scale differences. As can be seen from the figure, the proposed method still performs well under these complex challenge conditions.
In conclusion, the target template graph matching and positioning method based on the twin network and center position estimation provided by the invention achieves good matching and positioning accuracy and robustness under complex challenge conditions.

Claims (5)

1. A target template graph matching and positioning method based on a depth twin network and center position estimation, characterized by comprising the following steps:
s1, constructing a target template graph matching positioning network
The target template graph matching and positioning network is formed by cascading, in sequence, a feature extraction twin network, a depth correlation convolution network, and a center position estimation network; its inputs are a template graph T and a real-time graph S, of sizes m×m and n×n respectively, where m and n are positive integers and n > m; its output is a single-channel heatmap P_hm of size m_h × m_h, where m_h is a positive integer; specifically comprising:
s1.1, constructing a feature extraction twin network, and extracting feature information of an input template graph and a real-time graph
The feature extraction twin network is formed by cascading two convolutional neural networks with shared parameters and identical structure; it takes the template graph T and the real-time graph S as input and outputs a template graph feature map f(T) and a real-time graph feature map f(S), where f(T) has size m_1 × m_1 × d and f(S) has size n_1 × n_1 × d; m_1 denotes the length and width of f(T), n_1 the length and width of f(S), d the number of channels, and m_1, n_1, and d are positive integers;
the convolutional neural network is obtained by modifying on the basis of a standard ResNet network, and the specific modification is as follows:
(1) 3 x 3 convolution is added at the third, fourth and fifth layers of the standard ResNet network to realize feature dimension reduction, and the obtained feature maps are respectively marked as
Figure FDA0003848241680000011
And
Figure FDA0003848241680000012
(2) For characteristic diagram
Figure FDA0003848241680000013
Carrying out 3 multiplied by 3 deconvolution to obtain a characteristic diagram which is spliced on the characteristic diagram
Figure FDA0003848241680000014
Then, carrying out 3 x 3 convolution on the spliced feature map to obtain the feature map
Figure FDA0003848241680000015
(3) For characteristic diagram
Figure FDA0003848241680000016
Performing 3 × 3 deconvolution to obtain a feature map, and splicing the feature map
Figure FDA0003848241680000017
After that, the final output is obtained: a template graph feature graph f (T) and a real-time graph feature graph f (S);
s1.2, fusing the extracted template graph feature graph f (T) and the real-time graph feature graph f (S) by using a depth-dependent convolution network
The depth correlation convolution network takes the template graph feature map f(T) and the real-time graph feature map f(S) extracted in S1.1 as input, performs a depthwise correlation convolution over f(S) with f(T) as the convolution kernel, and outputs the fused correlation feature map f_fusion, of size (m_1+1) × (m_1+1) × d;
S1.3, constructing the center position estimation network and computing the single-channel heatmap
The center position estimation network is formed by cascading three 3×3 deconvolution layers and one 3×3 convolution layer, wherein each 3×3 deconvolution layer has d channels and stride s, s being a positive integer, and the 3×3 convolution layer has d channels and stride 1;
the center position estimation network takes the fused correlation feature map f_fusion from S1.2 as input and outputs a single-channel heatmap P_hm of size m_h × m_h, with m_h = m_1 · s^3; let p_{x,y} denote the heat value at position (x, y) of P_hm, where 1 ≤ x, y ≤ m_h; then p_{x,y} takes values in [0, 1];
S2, training the target template graph matching and positioning network
S2.1 making a training image set
S2.1.1, for various targets including houses, roads, bridges, vehicles, ships, and airplanes, images are captured with a visible light camera and an infrared camera at different times of day, from different distances, viewing angles, and positions, yielding a large number of images;
S2.1.2, n_train image pairs, each consisting of a template graph and a real-time graph, are made from the acquired images;
S2.1.3, the n_train image pairs so made are taken as the training image set;
s2.2 calibrating a training image set
when calibrating an image pair consisting of a template graph and a real-time graph in the training image set, the coordinate c_ref = (x_ref, y_ref) of the template graph center on the real-time graph is calibrated first and then mapped to the coordinate (x_hm, y_hm) on the heatmap, i.e., the position corresponding to the template graph center on the heatmap, by scaling down and applying ⌊·⌋, the rounding-down (floor) operation (the exact formula is not reproduced here);
after the corresponding coordinate of the template graph center on the heatmap is obtained, the heatmap label P̃ corresponding to this pair of training samples is generated; in this step the heatmap is calibrated by Gaussian kernel weighting, specifically

$$\tilde{P}_{x,y} = \exp\left(-\frac{(x - x_{hm})^2 + (y - y_{hm})^2}{2\sigma_p^2}\right)$$

where P̃_{x,y} denotes the calibrated heat value at position (x, y) of the heatmap label P̃, x and y range over [1, m_h], and σ_p is a hyper-parameter related to the size of the template graph; computing the heat value at all positions (x, y) yields the heatmap label P̃ calibrated for the training sample;
S2.3 design loss function
The loss function used for training is designed as

$$L = -\sum_{x,y}\begin{cases}(1 - p_{x,y})^{\alpha}\log p_{x,y}, & \tilde{P}_{x,y} = 1\\(1 - \tilde{P}_{x,y})^{\beta}\,p_{x,y}^{\alpha}\log(1 - p_{x,y}), & \text{otherwise}\end{cases}$$

where p_{x,y} denotes the heat value that the template graph center lies at position (x, y) of the real-time graph, as computed by the target template graph matching and positioning network of S1; P̃_{x,y} denotes the heat value at position (x, y) of the heatmap label calibrated for the training sample in S2.2; and α and β are adjustable hyper-parameters;
S2.4, network training is performed with the stochastic gradient descent method using the training image set acquired in S2.1 and calibrated in S2.2, i.e., the loss function designed in S2.3 is minimized, yielding the trained target template graph matching and positioning network model;
s3, matching and positioning the target template picture by applying the trained target template picture matching and positioning network model
The specific process is as follows:
S3.1, the template graph T of size m×m and the real-time graph S of size n×n to be matched and located are input into the target template graph matching and positioning network model trained in S2.4;
S3.2, the heatmap P_hm is computed and output by the target template graph matching and positioning network model;
S3.3, the maximum value on the heatmap P_hm is found, and the coordinate of the maximum point is recorded as (x_max, y_max);
S3.4, (x_max, y_max) is substituted into the inverse of the coordinate mapping of S2.2 to locate the position (u, v) of the target template graph center on the real-time graph; the exact formula is not reproduced here.
2. The target template graph matching and positioning method based on a depth twin network and center position estimation according to claim 1, characterized in that: in S2.1.2, the number n_train of image pairs consisting of template graphs and real-time graphs satisfies n_train ≥ 40000.
3. The target template graph matching and positioning method based on a depth twin network and center position estimation according to claim 1, characterized in that: in S2.1.2, the method for making the n_train image pairs consisting of template graphs and real-time graphs is: an image block containing a given target is cut from one image, scaled to size m×m, and taken as the template graph, where m is a positive integer; image blocks containing the same target are cut from other images, scaled to n×n, and taken as real-time graphs, where n is a positive integer.
4. The target template graph matching and positioning method based on a depth twin network and center position estimation according to claim 1, characterized in that: in S2.2, the hyper-parameter σ_p related to the size of the template graph is set by a template-size-dependent formula, which is not reproduced here.
5. The target template graph matching and positioning method based on a depth twin network and center position estimation according to claim 1, characterized in that: in S2.3, the adjustable hyper-parameters are taken as α = 2 and β = 4.
CN202211131672.7A 2022-09-15 2022-09-15 Target template graph matching and positioning method based on twin network and central position estimation Active CN115330876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131672.7A CN115330876B (en) 2022-09-15 2022-09-15 Target template graph matching and positioning method based on twin network and central position estimation


Publications (2)

Publication Number Publication Date
CN115330876A 2022-11-11
CN115330876B 2023-04-07

Family

ID=83929989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131672.7A Active CN115330876B (en) 2022-09-15 2022-09-15 Target template graph matching and positioning method based on twin network and central position estimation

Country Status (1)

Country Link
CN (1) CN115330876B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190332935A1 (en) * 2018-04-27 2019-10-31 Qualcomm Incorporated System and method for siamese instance search tracker with a recurrent neural network
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN110245678A (en) * 2019-05-07 2019-09-17 华中科技大学 A kind of isomery twinned region selection network and the image matching method based on the network
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features
CN113705731A (en) * 2021-09-23 2021-11-26 中国人民解放军国防科技大学 End-to-end image template matching method based on twin network
CN114022729A (en) * 2021-10-27 2022-02-08 华中科技大学 Heterogeneous image matching positioning method and system based on twin network and supervised training
CN114581678A (en) * 2022-03-15 2022-06-03 中国电子科技集团公司第五十八研究所 Automatic tracking and re-identifying method for template feature matching

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KE LIANG ET AL.: "An Adaptive Kalman-Correlation Based Siamese Network Tracker for Visual Object Tracking"
QIANG REN ET AL.: "A Robust and Accurate End-to-End Template Matching Method Based on the Siamese Network"
史璐璐 et al.: "Target tracking based on a Tiny Darknet fully convolutional Siamese network" (in Chinese)
陈云芳 et al.: "A survey of target tracking algorithms based on Siamese network structures" (in Chinese)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861595A (en) * 2022-11-18 2023-03-28 华中科技大学 Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN115861595B (en) * 2022-11-18 2024-05-24 华中科技大学 Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN116260765A (en) * 2023-05-11 2023-06-13 中国人民解放军国防科技大学 Digital twin modeling method for large-scale dynamic routing network
CN116260765B (en) * 2023-05-11 2023-07-18 中国人民解放军国防科技大学 Digital twin modeling method for large-scale dynamic routing network

Also Published As

Publication number Publication date
CN115330876B (en) 2023-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant