CN113628261A - Infrared and visible light image registration method in power inspection scene - Google Patents
- Publication number
- CN113628261A (application CN202110892727.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- infrared
- visible light
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention relates to an infrared and visible light image registration method in a power inspection scene, which comprises the following steps: step S1, acquiring an infrared image and a visible light image of the power equipment; step S2, respectively extracting edge information of the infrared and visible light images of the power equipment through a Sobel edge detection operator to obtain the infrared and visible light edge images; step S3, respectively detecting the feature points of the two edge images with a SuperPoint feature extraction network and calculating descriptors; step S4, matching the feature points through a SuperGlue feature matching network according to the feature points of the two edge images obtained in step S3, screening to obtain correct feature point matching pairs, and simultaneously removing unmatchable feature points; and step S5, calculating affine transformation model parameters according to the matched feature points, and performing spatial coordinate transformation on the image to be registered through bilinear interpolation to realize image registration. The method achieves accurate registration of the infrared and visible light images of the power equipment and acquires the temperature information of the power equipment against the background of the visible light image.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an infrared and visible light image registration method in a power inspection scene.
Background
The construction and maintenance of the power grid play an important role in the development of the country and the society, and the electric power industry is an important guarantee for the improvement of the comprehensive strength of the country and the high-speed development of the society. The power equipment is an important component in the power grid, and when the power equipment normally works, a certain amount of heat is generated under the action of current, but the temperature should be within a certain range. When the power equipment ages or fails, the abnormal condition of local heating of the equipment can occur, and the safe and stable operation of the power grid is endangered. Therefore, it is necessary to perform safety detection on the power equipment, find abnormal heating conditions of the equipment in time and maintain the equipment.
The existing heating detection of the power equipment is usually carried out by using a temperature measuring instrument or an infrared camera, the detection instrument needs to be arranged in a power system on a large scale, the cost is high, dead corners are easy to miss, the detection result needs to be analyzed and judged by technicians, the cost is high in manpower and material resources, and the efficiency is low. With the development of an image processing technology, a mode of cooperative processing of infrared and visible light images can be adopted, and the functions of acquiring the temperature information of the power equipment in the background of the visible light image are realized by combining the characteristics that the infrared image can detect the temperature of an object, the anti-interference capability is strong, the detail information of the visible light image is rich, and the resolution is high. The detection result information combining the infrared image and the visible light image is rich, easy to observe, and convenient for technical personnel to carry out abnormal heating detection on the power equipment.
The precondition of cooperative processing of infrared and visible light images is registration of the two images. Image registration is the process of identifying the same or similar structures and contents in two or more images acquired from different sensors, different viewing angles or different times, determining transformation parameters between the images according to some similarity measure, and transforming the images into the same coordinate system to obtain the best match at the pixel level. Infrared and visible light image registration belongs to multi-modal image registration. Because the imaging mechanisms of infrared and visible light images differ, the images differ markedly: the infrared image has lower resolution, is blurrier and carries poorer detail information than the visible light image, and the gray-level features of the two differ greatly, so registration is difficult. Most existing methods perform registration based on image point features, such as SIFT and SURF. However, most of these methods have low accuracy and precision when registering infrared and visible light images and cannot achieve reliable registration. To address this problem, a method is needed that can accurately register infrared and visible light images.
Disclosure of Invention
In view of this, the present invention provides an infrared and visible light image registration method in a power inspection scene, which is used for accurately registering infrared and visible light images of a power device and acquiring temperature information of the power device in a background of the visible light image.
In order to achieve the purpose, the invention adopts the following technical scheme:
an infrared and visible light image registration method under a power inspection scene comprises the following steps:
step S1, acquiring an infrared image and a visible light image of the power equipment;
step S2, respectively extracting edge information of the infrared and visible light images of the power equipment through a Sobel edge detection operator to obtain the edge images of the infrared and visible light;
step S3, respectively detecting the characteristic points of the two edge images by using a SuperPoint characteristic extraction network and calculating a descriptor;
s4, matching the feature points through a SuperGlue feature matching network according to the feature points of the two edge images obtained in the step S3, screening to obtain correct feature point matching pairs, and simultaneously removing unmatchable feature points;
and step S5, calculating affine transformation model parameters according to the matched feature points, and performing space coordinate transformation on the image to be registered through bilinear interpolation to realize image registration.
Further, the step S2 is specifically:
step S21, carrying out graying processing on the infrared image A and the visible light image B of the power equipment through a grayscale conversion function, and converting the color image into a grayscale image;
Step S22, extracting the infrared edge image and the visible light edge image of the power equipment by using the Sobel edge detection operator. Let Sobel_x and Sobel_y be the horizontal and vertical convolution factors respectively; performing a convolution operation of the factors with the image yields the horizontal and vertical edge detection result images G_x = Sobel_x * A and G_y = Sobel_y * A respectively. The final edge image is obtained by combining the two: G = sqrt(G_x^2 + G_y^2).
Further, the Sobel convolution factors are specifically:

Sobel_x = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ], Sobel_y = [ -1 -2 -1 ; 0 0 0 ; +1 +2 +1 ]
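Step S22 can be sketched as follows; this is a minimal numpy illustration of the Sobel edge extraction described above (the function name and the use of edge-replication padding are choices of this sketch, not specified by the patent):

```python
import numpy as np

def sobel_edges(gray):
    """Extract an edge image with the Sobel operator (step S22).

    gray: 2-D float array (grayscale image A or B).
    Returns the gradient magnitude G = sqrt(Gx^2 + Gy^2).
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # Sobel_x
    ky = kx.T                                                          # Sobel_y
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")  # replicate borders (a sketch choice)
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.sqrt(gx ** 2 + gy ** 2)
```

A vertical intensity step produces a strong horizontal gradient G_x and a zero vertical gradient G_y, so the magnitude localizes the edge.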
further, the step S3 is specifically:
Step S31, the acquired infrared edge image of size H × W is processed by an encoder network, so that the edge image changes from an H × W single-channel image into a tensor of spatial size Hc × Wc with greater channel depth, wherein Hc < H and Wc < W;
Step S32, the feature point detection network takes the encoder output tensor as input; it first performs two convolution operations through convolution layers to obtain the score of each pixel as a feature point; the score of each pixel is then mapped by the Softmax function to the probability in [0,1] that the corresponding pixel of the infrared edge image is a feature point; finally, the original size is restored through upsampling;
Step S33, the descriptor decoder network computes a coarse descriptor tensor from the encoder output and upsamples it to the original resolution; the output of the network is a fixed-length descriptor normalized by the L2 norm;
Step S34, the visible light edge image is processed in the same way to obtain the feature points and descriptors of the visible light edge image.
Further, the encoder network structure unit is composed of a convolutional layer Conv, a nonlinear activation function Relu, and a pooling layer Pool, and specifically as follows:
a. Convolutional layer: the convolutional layer first pads the boundary of the input image, then performs a convolution operation on the input image with convolution kernels, extracting the features of the image and outputting a feature map;
b. nonlinear activation function: after each convolutional layer, there is a ReLU nonlinear activation function, which increases the nonlinearity of the neural network.
c. A pooling layer: the pooling layer down-samples the feature map obtained by the convolutional layer, reduces the feature map size output by the convolutional layer, and reduces the amount of computation of the network.
Further, the step S32 is specifically:
a. The tensor output by the encoder network is first convolved twice, with kernel sizes 3 × 3 and 1 × 1 in sequence and stride 1; the output after the convolution operations has 65 channels, of which 64 channels correspond to the non-overlapping local 8 × 8 pixel grid regions in the image, plus 1 channel corresponding to no feature point being detected in an 8 × 8 region; the 1 no-feature-point channel is then removed, leaving 64 channels;
b. The Softmax function maps the score of each pixel to [0,1], giving the probability that each pixel is a feature point;
c. The smaller feature map is enlarged by sub-pixel convolution: first, one pixel is taken at the same position of each of the 64 feature maps and the 64 values are spliced into an 8 × 8 patch; the pixels at the other positions of the feature maps are processed in the same way; finally, the feature map is enlarged to 8 times its original size, and a result map whose size is consistent with the initial infrared edge image is output.
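The sub-pixel enlargement in step c is a depth-to-space rearrangement; a minimal numpy sketch (assuming the 64 channels are ordered row-major within each 8 × 8 patch, which the patent does not state explicitly):

```python
import numpy as np

def depth_to_space_8x(heatmap64):
    """Sub-pixel convolution (depth-to-space) step, as a sketch.

    heatmap64: array of shape (64, Hc, Wc) -- per-cell feature-point
    probabilities after the no-feature-point channel has been removed.
    Returns an (8*Hc, 8*Wc) map in which each 64-vector fills one 8x8 patch.
    """
    c, hc, wc = heatmap64.shape
    assert c == 64
    # (64, Hc, Wc) -> (8, 8, Hc, Wc) -> (Hc, 8, Wc, 8) -> (8*Hc, 8*Wc)
    x = heatmap64.reshape(8, 8, hc, wc)
    x = x.transpose(2, 0, 3, 1)
    return x.reshape(hc * 8, wc * 8)
```

Channel k of cell (i, j) lands at pixel (8·i + k//8, 8·j + k%8) of the full-resolution map.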
Further, the step S33 is specifically:
a. The tensor output by the encoder network is first convolved twice, with kernel sizes 3 × 3 and 1 × 1 in sequence and stride 1, yielding a coarse descriptor tensor;
b. The descriptors corresponding to the feature points are extracted: first the image size is normalized and the feature points are moved to the corresponding positions of the normalized image; then K groups of 1 × 1 × 2 tensors are constructed from the normalized feature points, where K denotes the number of feature points and 2 denotes the horizontal and vertical coordinates of each feature point; the feature point positions are inverse-normalized, and the descriptors at the positions of the corresponding key points are obtained by bilinear interpolation; finally, descriptors of uniform length are obtained through L2 norm normalization.
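The bilinear sampling plus L2 normalization of step b can be sketched as follows. For simplicity this sketch takes feature point coordinates directly in coarse-map pixel units, whereas the patent first normalizes full-image coordinates and inverse-normalizes them; the function name is illustrative:

```python
import numpy as np

def sample_descriptors(desc_map, pts):
    """Bilinearly sample per-point descriptors and L2-normalize them.

    desc_map: (D, Hc, Wc) coarse descriptor tensor.
    pts: (K, 2) feature point coordinates (x, y) in desc_map pixel units.
    Returns a (K, D) array of unit-length descriptors.
    """
    d, h, w = desc_map.shape
    out = np.zeros((len(pts), d))
    for k, (x, y) in enumerate(pts):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        ax, ay = x - x0, y - y0
        # weighted average of the four surrounding descriptor columns
        v = ((1 - ay) * (1 - ax) * desc_map[:, y0, x0]
             + (1 - ay) * ax * desc_map[:, y0, x1]
             + ay * (1 - ax) * desc_map[:, y1, x0]
             + ay * ax * desc_map[:, y1, x1])
        out[k] = v / (np.linalg.norm(v) + 1e-12)  # L2 norm normalization
    return out
```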
Further, the step S4 is specifically:
Step S41, after SuperPoint detects the feature point positions p and descriptors d of the infrared edge image and the visible light edge image, M and N feature points are extracted from the two images respectively. For the infrared edge image, the position p_i of each detected feature point and its descriptor d_i are combined to form a feature matching vector, whose initial representation is:

x_i^(0) = d_i + MLP(p_i)

wherein MLP is a multilayer perceptron encoder that embeds the feature point position p_i into a high-dimensional vector; the up-dimensioned vector is then added to the descriptor d_i. The positions and descriptors of the feature points of the visible light edge image are processed in the same way;
Step S42, using a multilayer graph neural network, for each feature point i in each layer, messages along the undirected edges ε_self or ε_cross are aggregated and the updated representation of the vector is calculated; ε_self connects a feature point to all other feature points in the same image, and ε_cross connects the feature point to all feature points in the other image; x_i^(l) denotes the intermediate representation of the vector of feature point i of the infrared edge image at layer l, and m_{ε→i} is the aggregation result over all feature points {j : (i, j) ∈ ε}, ε ∈ {ε_self, ε_cross};

The vector update is calculated as:

x_i^(l+1) = x_i^(l) + MLP([x_i^(l) || m_{ε→i}])

wherein [· || ·] denotes a concatenation operation; starting from layer l = 1, ε = ε_self when l is odd and ε = ε_cross when l is even; by alternately aggregating and updating along the intra-image edges ε_self and the inter-image edges ε_cross, the process by which human beings judge feature matches by browsing back and forth is simulated; after each feature point i of the infrared edge image passes through the L layers of the neural network, the feature matching vector is obtained as

f_i = W · x_i^(L) + b

wherein W and b are the weight and bias; the feature matching vectors f_j of the visible light edge image are obtained correspondingly;
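One common instantiation of the aggregation m_{ε→i} in step S42 is softmax attention over the source set (the other points of the same image for ε_self, or the points of the other image for ε_cross), followed by the residual MLP update. The sketch below uses random, untrained projection matrices purely for illustration; it is not the patent's trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_aggregate(x_query, x_source):
    """Sketch of m_{eps->i}: softmax attention over source vectors.

    W_q, W_k, W_v are illustrative random projections, not trained weights.
    x_query: (M, D) vectors of the points being updated.
    x_source: (S, D) vectors of the points messages come from.
    """
    d = x_query.shape[1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x_query @ wq, x_source @ wk, x_source @ wv
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # rows sum to 1
    return attn @ v                           # (M, D) aggregated messages

def gnn_layer(x, m, mlp):
    """Residual update x^(l+1) = x^(l) + MLP([x^(l) || m])."""
    return x + mlp(np.concatenate([x, m], axis=1))
```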
And step S43, the feature point correspondences of the two images must comply with preset physical constraints: 1) a feature point has at most one corresponding point in the other image; 2) due to factors such as occlusion, some feature points cannot be matched; therefore, the matching of feature points between the images forms a partial assignment between the two feature point sets. For the M and N feature points of the infrared and visible light edge images, a partial assignment matrix P ∈ [0,1]^(M×N) is established, representing all possible matches of the feature points of the two images; each possible correspondence has a confidence value representing the likelihood that it is a correct match, constrained as follows:
P · 1_N ≤ 1_M and P^T · 1_M ≤ 1_N
First, the inner product of the feature matching vectors f_i and f_j is calculated to obtain the feature matching score matrix S_{i,j} = <f_i, f_j>; the feature assignment matrix P is the one that maximizes the total score Σ_{i,j} S_{i,j} P_{i,j} under the constraints; this is treated as an optimal transport problem, and the final feature assignment matrix P is solved by iterating the Sinkhorn algorithm;
The score matrix S is expanded by one extra row and column (dustbin channels) into a matrix S̄ of size (M+1) × (N+1); the redundant channels are used to filter out unmatched feature points;
The constraints become:

P̄ · 1_{N+1} = a and P̄^T · 1_{M+1} = b

wherein a = [1_M^T, N]^T and b = [1_N^T, M]^T. After several iterations of the Sinkhorn algorithm, the extra 1 channel is removed and the feature assignment matrix is restored to P of size M × N.
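Step S43 can be sketched as follows: augment the score matrix with a dustbin row and column, exponentiate, and alternately rescale rows and columns to the marginals a and b. The dustbin score of 0 and the iteration count are illustrative choices of this sketch:

```python
import numpy as np

def sinkhorn_match(scores, iters=50, dustbin=0.0):
    """Sketch of the Sinkhorn step: solve for the partial assignment P.

    scores: (M, N) matching score matrix S.
    Returns the (M, N) assignment matrix with the dustbin removed.
    """
    m, n = scores.shape
    s_aug = np.full((m + 1, n + 1), dustbin)  # dustbin row/column
    s_aug[:m, :n] = scores
    p = np.exp(s_aug)
    a = np.concatenate([np.ones(m), [n]])     # row marginals [1_M, N]
    b = np.concatenate([np.ones(n), [m]])     # column marginals [1_N, M]
    for _ in range(iters):
        p *= (a / p.sum(axis=1))[:, None]     # row normalization
        p *= (b / p.sum(axis=0))[None, :]     # column normalization
    return p[:m, :n]                          # drop the dustbin channels
```

With a strongly diagonal score matrix the mass concentrates on the diagonal matches, while unmatched points drain into the dustbin.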
Further, the step S5 is specifically:
step S51, calculating affine transformation model parameters according to the feature point matching result;
step S52, firstly, establishing a zero matrix with the size consistent with that of the infrared image, and obtaining a corresponding point of each point on the visible light image by carrying out coordinate transformation on each point in the matrix; then, obtaining the pixel value of the point through a bilinear interpolation method, and taking the pixel value as the pixel value of the corresponding point on the image after registration;
and step S53, calculating affine transformation model parameters according to the matched feature points, and performing space coordinate transformation on the image to be registered through bilinear interpolation to obtain a final registered image.
Further, the affine transformation model is expressed as:

x' = a1·x + a2·y + a3
y' = a4·x + a5·y + a6

wherein (x, y) and (x', y') are the coordinates of corresponding points in the two images respectively; the six parameters of the affine transformation model can be determined from correct matching points (at least three non-collinear matching pairs).
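The six parameters above can be estimated from the matched pairs by linear least squares; a minimal sketch (the function name is illustrative, and in practice an outlier-robust estimator such as RANSAC would typically wrap this):

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares estimate of the 6 affine parameters (step S5).

    src_pts, dst_pts: (K, 2) arrays of matched points (x, y) -> (x', y'),
    with K >= 3 non-collinear correspondences.
    Returns [a1, a2, a3, a4, a5, a6].
    """
    k = len(src_pts)
    a = np.zeros((2 * k, 6))
    bvec = np.zeros(2 * k)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src_pts, dst_pts)):
        a[2 * i] = [x, y, 1, 0, 0, 0]      # x' = a1*x + a2*y + a3
        a[2 * i + 1] = [0, 0, 0, x, y, 1]  # y' = a4*x + a5*y + a6
        bvec[2 * i], bvec[2 * i + 1] = xp, yp
    params, *_ = np.linalg.lstsq(a, bvec, rcond=None)
    return params
```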
Compared with the prior art, the invention has the following beneficial effects:
1. according to the invention, infrared and visible light images of the power equipment are cooperatively processed, and the advantages of strong anti-interference capability, rich visible light image detail information and high resolution of the infrared image, which can detect the temperature of an object, are combined, so that the function of acquiring the temperature information of the power equipment in the background of the visible light image is realized, the registered image information is rich, and the detection by technical personnel is facilitated;
2. according to the method, the edge information of the infrared image and the visible light image is respectively extracted through the edge detection operator, the outline of the image is highlighted, the similarity of the infrared image and the visible light image is improved, and meanwhile, the influence of distortion is better eliminated. By reasonably utilizing the edge information of the image, the image registration time is reduced, and the image registration accuracy is improved;
3. the invention can extract a large number of feature points and their descriptors with high precision; most mismatches can be identified and filtered out while correct matching between the feature point sets is achieved; the image registration accuracy and precision are high, and the algorithm is robust;
4. the invention can realize accurate registration of infrared and visible light images shot by different scenes and different cameras, and has strong applicability.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a structure diagram of a SuperPoint network in an embodiment of the present invention;
FIG. 3 is a block diagram of an encoder network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a maximum pooling layer in one embodiment of the present invention;
FIG. 5 is a diagram of a SuperGlue network architecture in accordance with an embodiment of the present invention;
FIG. 6 shows an example of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
In the embodiment, a visual information acquisition part in the infrared and visible light image registration method in the power inspection scene is composed of binocular infrared and visible light cameras, wherein the resolution of the visible light camera is 2048 × 1536, the resolution of the infrared camera is 640 × 480, the two cameras are on the same optical axis, lenses are arranged on the same plane, and the baseline distance is 5-10 cm.
Referring to fig. 1, the invention provides a method for registering infrared and visible light images in a power inspection scene, which includes the following steps:
step S1, acquiring an infrared image A and a visible light image B of the power equipment;
step S2, utilizing a Sobel edge detection operator to carry out edge detection on the infrared and visible light images of the power equipment;
(1) Graying processing. The infrared image A and the visible light image B of the power equipment are grayed through a gray level conversion function, converting the captured color images into grayscale images;
(2) Edge detection. The infrared edge image and the visible light edge image of the power equipment are extracted through the Sobel edge detection operator. The Sobel convolution factors are:

Sobel_x = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ], Sobel_y = [ -1 -2 -1 ; 0 0 0 ; +1 +2 +1 ]

These are the horizontal and vertical convolution factors respectively. Performing a convolution operation of the factors with the image yields the horizontal and vertical edge detection result images G_x = Sobel_x * A and G_y = Sobel_y * A respectively. The final edge image is obtained by combining the two: G = sqrt(G_x^2 + G_y^2).
And step S3, extracting the feature points of the infrared and visible light edge images with the SuperPoint feature extraction network and calculating the descriptors.
Preferably, in this embodiment, the structure diagram of the SuperPoint network is shown in fig. 2, and the specific process of extracting the features is as follows:
(2) Image processing. The infrared edge image is processed by the encoder network, so that the processed edge image changes from an H × W single-channel image into a tensor with a smaller spatial size Hc × Wc and a larger channel depth, wherein Hc = H/8 and Wc = W/8 (the encoder contains three stride-2 pooling stages).
Encoder network architecture as shown in fig. 3, the network structure unit is composed of a convolutional layer, a nonlinear activation function, and a pooling layer (Conv-Relu-Pool):
21) Convolutional layers. The convolutional layer first pads the boundary of the input image, then performs a convolution operation on the input image with convolution kernels, extracting the features of the image and outputting feature maps. The encoder network has eight convolutional layers in total; the first four contain 64 convolution kernels of size 3 × 3 with stride 1, so that an input image passing through these layers yields 64 output feature maps; the last four contain 128 convolution kernels of size 3 × 3 with stride 1, yielding 128 output feature maps.
22) Nonlinear activation function. Each convolutional layer is followed by a ReLU nonlinear activation function, which increases the nonlinearity of the neural network; the nonlinear activation function has stronger expressive power while avoiding the vanishing gradient problem. The ReLU function is as follows:

Relu(x) = max(0, x)

For input x, when x is positive the output is unchanged; when x is negative the output is 0. Through this unilateral suppression, the neurons in the network are sparsely activated, and the features of the image can be extracted better.
23) Pooling layers. The pooling layer down-samples the feature maps obtained from the convolutional layers, reducing the feature map size output by the convolutional layers and reducing the amount of computation of the network. The encoder network has three max pooling layers; a max pooling operation is performed after every two convolutional layers. The pooling kernel size is 2 × 2 with stride 2. As shown in fig. 4, the max pooling layer preserves the main features in each region while reducing the image size.
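The 2 × 2, stride-2 max pooling of step 23) can be sketched in a few lines of numpy (single-channel case for clarity):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2, as used in the encoder network.

    fmap: 2-D feature map with even height and width.
    Returns a map of half the height and width, keeping each block's max.
    """
    h, w = fmap.shape
    assert h % 2 == 0 and w % 2 == 0
    # Group each non-overlapping 2x2 block, then take the block maximum.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```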
(3) Feature point extraction. The feature point detection network takes the encoder output tensor as input; it first performs two convolution operations through convolution layers to obtain the score of each pixel as a feature point; the score of each pixel is then mapped by the Softmax function to the probability in [0,1] that the corresponding pixel of the infrared edge image is a feature point; finally, the original size is restored through upsampling. The process comprises the following steps:
31) The tensor output by the encoder network is first convolved twice, with kernel sizes 3 × 3 and 1 × 1 in sequence and stride 1; the output after the convolution operations has 65 channels, of which 64 channels correspond to the non-overlapping local 8 × 8 pixel grid regions in the image, plus 1 channel corresponding to no feature point being detected in an 8 × 8 region. The 1 no-feature-point channel is then removed, leaving 64 channels.
32) The Softmax function maps the score of each pixel to [0,1], obtaining the probability of each pixel being a feature point. The Softmax function is as follows:

Softmax(z_m) = e^(z_m) / Σ_{n} e^(z_n)

Among all n elements, the m-th element z_m is mapped into [0,1] by the Softmax formula.
33) The smaller feature map is enlarged by sub-pixel convolution: first, one pixel is taken at the same position of each of the 64 feature maps and the 64 values are spliced into an 8 × 8 patch; the pixels at the other positions of the feature maps are processed in the same way; finally, the feature map is enlarged to 8 times its original size, and a result map whose size is consistent with the initial infrared edge image is output.
(4) Descriptor calculation. The descriptor decoder network computes a coarse descriptor tensor from the encoder output and upsamples it to the original resolution; the output of the network is a fixed-length descriptor normalized by the L2 norm. The process is as follows:
41) The tensor output by the encoder network is first convolved twice, with kernel sizes 3 × 3 and 1 × 1 in sequence and stride 1, yielding a coarse descriptor tensor.
42) The descriptors corresponding to the feature points are extracted: the image size is normalized and the feature points are moved to the corresponding positions of the normalized image. Then K groups of 1 × 1 × 2 tensors are constructed from the normalized feature points, where K denotes the number of feature points and 2 denotes the horizontal and vertical coordinates of each feature point. The feature point positions are inverse-normalized, and the descriptors at the positions of the corresponding key points are obtained through bilinear interpolation. Finally, descriptors of uniform length are obtained through L2 norm normalization.
(5) The visible light edge image is processed in the same way to obtain the feature points and descriptors of the visible light edge image.
(6) Loss function. The final loss function is the sum of two intermediate losses, one for the feature point detector and the other for the descriptor.
In the network training process, given an image I, the positions of its feature points are known. An image I′ is generated by applying a random homography matrix to I. With the image pair thus generated, both losses are optimized simultaneously and balanced with a weight λ.
61) The feature point detector loss is a fully convolutional cross-entropy loss over the pixel cells; the set of correct feature points is denoted y_hw ∈ Y, and the image I′ generated by the homography matrix likewise has labels Y′;
Wherein:
62) The correspondence between the (h, w) cells of image I and the (h′, w′) cells of image I′ induced by the homography matrix is: s_hwh′w′ = 1 if ‖Ĥp_hw − p_h′w′‖ ≤ 8, and 0 otherwise;
where H denotes the homography transformation matrix; p_hw denotes the position of the center pixel in the (h, w) cell; and Ĥp_hw denotes the cell position p_hw multiplied by the homography transformation matrix H and then divided by the last coordinate.
The descriptor loss function is:
L_d(D, D′; S) = (1 / (H_c W_c)^2) Σ_{h,w} Σ_{h′,w′} l_d(d_hw, d′_h′w′; s_hwh′w′);
S represents the entire set of correspondences for a pair of images. A hinge loss l_d with positive margin m_p and negative margin m_n is used, with λ_d balancing the two terms:
l_d(d_hw, d′_h′w′; s_hwh′w′) = λ_d · s_hwh′w′ · max(0, m_p − d_hw^T d′_h′w′) + (1 − s_hwh′w′) · max(0, d_hw^T d′_h′w′ − m_n);
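The hinge loss above can be sketched directly. The default values m_p = 1, m_n = 0.2 and λ_d = 250 are the ones commonly used with SuperPoint; they are not stated in this patent and are assumptions here:

```python
import numpy as np

def descriptor_hinge_loss(d, d_prime, s, m_p=1.0, m_n=0.2, lambda_d=250.0):
    """Hinge loss between descriptors d and d' of two cells, with
    correspondence indicator s (1 if the cells match under the
    homography, 0 otherwise)."""
    dot = float(np.dot(d, d_prime))
    # Matched pairs are pushed above the positive margin m_p,
    # unmatched pairs below the negative margin m_n.
    positive = lambda_d * s * max(0.0, m_p - dot)
    negative = (1 - s) * max(0.0, dot - m_n)
    return positive + negative
```

λ_d up-weights the (rare) matched pairs so they are not swamped by the many unmatched ones.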
Step S4, a SuperGlue feature matching network is used to achieve correct matching of the feature points between the infrared and visible light edge images while eliminating unmatchable feature points.
In this embodiment, preferably, the structure of the SuperGlue network is shown in fig. 5; the detailed steps are as follows:
(1) Feature encoding. After SuperPoint has detected the feature point positions p and computed the descriptors d of the infrared edge image and visible light edge image, M and N feature points are extracted from the two images respectively. For the infrared edge image, the detected feature point positions and descriptors are combined into a feature matching vector, whose initial representation is:
x_i^{(0)} = d_i + MLP_enc(p_i);
where MLP_enc is a multilayer perceptron encoder that embeds the feature point positions into a high-dimensional vector; the lifted vector is then added to the descriptor. The feature point positions and descriptors of the visible light edge image are processed in the same way.
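A toy sketch of this keypoint encoder: a small two-layer MLP lifts the 2-D position to the descriptor dimension and the result is added to the visual descriptor. The two-layer shape and ReLU activation are assumptions; the text only says "multilayer perceptron encoder":

```python
import numpy as np

def encode_keypoints(positions, descriptors, w1, b1, w2, b2):
    """positions: (K, 2) keypoint coordinates; descriptors: (K, D).
    A tiny 2-layer MLP (ReLU) embeds each position into D dimensions,
    and the embedding is added to the descriptor."""
    h = np.maximum(positions @ w1 + b1, 0.0)   # (K, hidden)
    pos_embed = h @ w2 + b2                    # (K, D)
    return descriptors + pos_embed
```

This is the additive combination the text describes: descriptor plus position embedding, producing the initial feature matching vector for each point.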
(2) Multi-layer graph neural network. A graph neural network with 7 layers is used; each layer aggregates over the undirected edges ε_self or ε_cross for every feature point i and computes the updated vector representation. ε_self connects a feature point to all other feature points in the same image, and ε_cross connects it to all feature points in the other image. x_i^{(l)} denotes the intermediate representation of feature point i of the infrared edge image at layer l, and m_{ε→i} is the result of aggregating over all feature points {j : (i, j) ∈ ε}, ε ∈ {ε_self, ε_cross}. The vector update is computed as:
x_i^{(l+1)} = x_i^{(l)} + MLP([x_i^{(l)} ‖ m_{ε→i}]);
where [· ‖ ·] denotes concatenation. Layers are counted from 1, with ε = ε_self when l is odd and ε = ε_cross when l is even. By alternately aggregating and updating along the intra-image edges ε_self and the inter-image edges ε_cross, the network imitates the way humans judge feature matches by looking back and forth between the images. After each feature point i of the infrared edge image has passed through the L = 7-layer graph neural network, a feature matching vector is obtained:
f_i^A = W · x_i^{(L)} + b;
where W and b are weights and biases. The feature matching vectors f_j^B of the visible light edge image are obtained correspondingly.
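A deliberately simplified sketch of one alternating message-passing layer: odd layers aggregate over self edges (the point's own image), even layers over cross edges (the other image). SuperGlue uses attention for the aggregation; a plain mean is substituted here to keep the sketch short, so this is an illustration of the alternation, not the actual network:

```python
import numpy as np

def message_pass(x_a, x_b, layer_idx, w, b):
    """One layer of the alternating graph network for image A's points.
    x_a: (M, D) vectors of image A; x_b: (N, D) vectors of image B.
    Odd layer_idx -> self edges (aggregate over x_a);
    even layer_idx -> cross edges (aggregate over x_b)."""
    source = x_a if layer_idx % 2 == 1 else x_b
    # Attention-free mean aggregation (simplification of SuperGlue's attention).
    m = source.mean(axis=0, keepdims=True).repeat(len(x_a), axis=0)
    # Residual update: x_i + MLP([x_i || m]); one linear layer stands in
    # for the MLP here.
    update = np.concatenate([x_a, m], axis=1) @ w + b
    return x_a + update
```

Running 7 such layers with alternating edge sets reproduces the back-and-forth aggregation pattern the text describes.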
(3) Optimal matching network. The feature point correspondences of the two images must obey the following physical constraints: 1) a feature point has at most one correspondence in the other image; 2) some feature points will not match due to occlusion and other factors. The matching of feature points between the images is therefore a partial assignment between the two feature point sets. For the M and N feature points of the infrared and visible light edge images, a partial assignment matrix P ∈ [0,1]^{M×N} is established, representing all possible matches between the feature points of the two images; each possible correspondence has a confidence value representing the likelihood that it is a correct match, constrained as follows:
P · 1_N ≤ 1_M and P^T · 1_M ≤ 1_N.
First, the inner product of the feature matching vectors f_i^A and f_j^B is computed to obtain a feature matching score matrix S_{i,j}; the feature assignment matrix P is obtained by maximizing the total score Σ_{i,j} S_{i,j} P_{i,j} under the constraints. This can be regarded as an optimal transport problem, and the final feature assignment matrix P is solved iteratively by the Sinkhorn algorithm.
Since there may be unmatched points, the score matrix S is expanded by one channel into an (M+1) × (N+1) matrix S̄, whose redundant channel is used to filter out unmatched feature points. The constraints then become:
After 100 iterations of the Sinkhorn algorithm, the extra channel is removed and the feature assignment matrix is restored to P_{M×N}.
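The Sinkhorn step with the extra "dustbin" row and column can be sketched in log space as follows. In SuperGlue the dustbin score α is a learned scalar; a constant is assumed here, and the marginals give each real point unit mass while the dustbins absorb the remainder:

```python
import numpy as np

def logsumexp(a, axis):
    amax = a.max(axis=axis, keepdims=True)
    return np.squeeze(amax, axis=axis) + np.log(np.exp(a - amax).sum(axis=axis))

def sinkhorn_with_dustbin(scores, alpha=1.0, iters=100):
    """Solve the optimal-transport assignment for an (M, N) score matrix
    padded with a dustbin row/column (score alpha) so unmatched points
    can be assigned there; the dustbin is dropped before returning."""
    m, n = scores.shape
    s = np.full((m + 1, n + 1), alpha)
    s[:m, :n] = scores
    # Marginals: each real point carries mass 1; dustbins take the rest.
    log_mu = np.log(np.concatenate([np.ones(m), [n]]))
    log_nu = np.log(np.concatenate([np.ones(n), [m]]))
    u, v = np.zeros(m + 1), np.zeros(n + 1)
    for _ in range(iters):  # alternate row/column normalization in log space
        u = log_mu - logsumexp(s + v[None, :], axis=1)
        v = log_nu - logsumexp(s + u[:, None], axis=0)
    p = np.exp(s + u[:, None] + v[None, :])
    return p[:m, :n]
```

After the iterations, dropping the dustbin row and column recovers the (M, N) partial assignment matrix P, as in the text.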
(4) Loss function. Network training uses supervised learning: given ground-truth feature point matches and the sets of unmatchable feature points, the negative log-likelihood of the partial assignment matrix is minimized, which simultaneously maximizes the precision and recall of the matching. The loss function is defined as:
Step S5, affine transformation model parameters are calculated from the feature point matching result to realize image registration. The detailed steps are as follows:
(1) The affine transformation model can be expressed as:
x′ = a1·x + a2·y + a3, y′ = a4·x + a5·y + a6;
where (x, y) and (x′, y′) are the coordinates of corresponding points in the two images; the 6 parameters of the affine transformation model can be determined from 3 pairs of correct matching points;
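With at least 3 non-collinear matched pairs, the 6 affine parameters can be recovered by linear least squares; a minimal sketch (NumPy, names are illustrative):

```python
import numpy as np

def estimate_affine(src, dst):
    """Solve x' = a1*x + a2*y + a3, y' = a4*x + a5*y + a6 from K >= 3
    matched point pairs by linear least squares."""
    src = np.asarray(src, float)   # (K, 2) points in the first image
    dst = np.asarray(dst, float)   # (K, 2) corresponding points
    A = np.hstack([src, np.ones((len(src), 1))])      # (K, 3): [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2)
    return params.T                # rows: [a1 a2 a3], [a4 a5 a6]
```

Using all correct matches from SuperGlue (rather than exactly 3) makes the least-squares estimate more robust to localization noise.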
(2) First, a zero matrix the same size as the infrared image is established, and each point in the matrix is coordinate-transformed to obtain its corresponding point on the visible light image; the pixel value of that point is then obtained by bilinear interpolation and used as the pixel value of the corresponding point in the registered image. As shown in fig. 6, the affine transformation model parameters are calculated from the matched feature point pairs, and spatial coordinate transformation with bilinear interpolation is applied to the image to be registered to obtain the final registered image.
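The warping step above (zero matrix, coordinate transform, bilinear sampling) can be sketched as follows; out-of-range coordinates simply stay zero, matching the zero-matrix initialization:

```python
import numpy as np

def warp_affine_bilinear(img, M, out_shape):
    """For every pixel of the output (infrared-sized) grid, map its
    coordinates through the 2x3 affine model M into the source image
    and sample a value there by bilinear interpolation."""
    h_out, w_out = out_shape
    h, w = img.shape
    out = np.zeros(out_shape)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    xm = M[0, 0] * xs + M[0, 1] * ys + M[0, 2]
    ym = M[1, 0] * xs + M[1, 1] * ys + M[1, 2]
    valid = (xm >= 0) & (xm <= w - 1) & (ym >= 0) & (ym <= h - 1)
    x0 = np.clip(np.floor(xm).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ym).astype(int), 0, h - 2)
    ax, ay = xm - x0, ym - y0
    # Bilinear blend of the four neighbouring source pixels.
    val = ((1 - ay) * (1 - ax) * img[y0, x0]
           + (1 - ay) * ax * img[y0, x0 + 1]
           + ay * (1 - ax) * img[y0 + 1, x0]
           + ay * ax * img[y0 + 1, x0 + 1])
    out[valid] = val[valid]
    return out
```

Iterating over the output grid and mapping backwards into the source image (inverse mapping) avoids holes in the registered result.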
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
Claims (10)
1. An infrared and visible light image registration method under a power inspection scene is characterized by comprising the following steps:
step S1, acquiring an infrared image and a visible light image of the power equipment;
step S2, extracting edge information from the infrared and visible light images of the power equipment respectively through a Sobel edge detection operator to obtain the infrared and visible light edge images;
step S3, respectively detecting the characteristic points of the two edge images by using a SuperPoint characteristic extraction network and calculating a descriptor;
step S4, matching the feature points through a SuperGlue feature matching network according to the feature points of the two edge images obtained in step S3, screening to obtain correct feature point matching pairs, and simultaneously removing unmatchable feature points;
and step S5, calculating affine transformation model parameters according to the matched feature points, and performing space coordinate transformation on the image to be registered through bilinear interpolation to realize image registration.
2. The method for registering the infrared and visible light images in the power inspection scene according to claim 1, wherein the step S2 specifically includes:
step S21, carrying out graying processing on the infrared image A and the visible light image B of the power equipment through a grayscale conversion function, and converting the color image into a grayscale image;
step S22, extracting the infrared edge image and the visible light edge image of the power equipment through a Sobel edge detection operator: let Sobel_x and Sobel_y be the horizontal and vertical convolution factors respectively; convolving these factors with the image yields the horizontal and vertical edge detection result images G_x = Sobel_x * A and G_y = Sobel_y * A; the two are combined by G = sqrt(G_x^2 + G_y^2) to obtain the final edge image.
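A sketch of the Sobel step in claim 2 (the claim does not state the border mode or the exact combination formula; replicate padding and the gradient magnitude sqrt(Gx^2 + Gy^2) are assumed here):

```python
import numpy as np

def sobel_edges(gray):
    """Apply the horizontal and vertical Sobel factors to a grayscale
    image and combine the two responses into an edge magnitude map."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal factor
    ky = kx.T                                                   # vertical factor
    padded = np.pad(gray.astype(float), 1, mode='edge')         # replicate border
    h, w = gray.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.sqrt(gx ** 2 + gy ** 2)
```

The loop form is for clarity only; in practice a library convolution (e.g. an FFT or im2col implementation) would be used for full-size inspection images.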
4. the method for registering the infrared and visible light images in the power inspection scene according to claim 1, wherein the step S3 specifically includes:
step S31, the acquired infrared edge image has size H × W; the infrared edge image is processed by an encoder network, and the spatial dimension of the processed edge image becomes Hc × Wc, where Hc < H and Wc < W;
step S32, the feature point extraction network takes the encoder output tensor as input; two convolution operations are first performed through convolutional layers to obtain a score for each pixel point as a feature point; the score of each pixel is then mapped to [0,1] by the Softmax function, giving the probability that the corresponding pixel point of the infrared edge image is a feature point; finally, the original size is restored through upsampling;
step S33, the descriptor decoder network transforms the encoder output tensor; the output of the network is a fixed-length descriptor normalized by the L2 norm;
5. The method for registering the infrared and visible light images in the power inspection scene according to claim 4, wherein the encoder network structural unit is composed of a convolutional layer Conv, a nonlinear activation function Relu and a pooling layer Pool, and specifically comprises the following steps:
a. Convolutional layer: the convolutional layer first pads the boundary of the input image, then performs convolution operations on the input image with convolution kernels, extracting image features and outputting a feature map;
b. nonlinear activation function: after each convolutional layer, there is a ReLU nonlinear activation function, which increases the nonlinearity of the neural network.
c. A pooling layer: the pooling layer down-samples the feature map obtained by the convolutional layer, reduces the feature map size output by the convolutional layer, and reduces the amount of computation of the network.
6. The method for registering the infrared and visible light images in the power inspection scene according to claim 4, wherein the step S33 specifically comprises:
a. The tensor output by the encoder network first undergoes two convolutions, with kernel sizes 3 × 3 and 1 × 1 in turn and stride 1; of the 65 output channels, 64 correspond to non-overlapping local 8 × 8 pixel grid regions of the image, plus 1 channel corresponding to no feature point being detected in the 8 × 8 region; the 1 channel without feature points is then removed;
b. The score of each pixel is mapped to [0,1] by the Softmax function, giving the probability that each pixel point is a feature point;
c. The smaller feature map is enlarged by sub-pixel convolution: one pixel point is taken at the same position in each of the 64 feature maps and the values are spliced into an 8 × 8 block; the same processing is then applied to the pixel points at every other position of the feature map; finally, the feature map is enlarged to 8 times its original size, and a result map the same size as the initial infrared edge image is output.
7. The method for registering the infrared and visible light images in the power inspection scene according to claim 4, wherein the step S34 specifically comprises:
a. The tensor output by the encoder network first undergoes two convolutions, with kernel sizes 3 × 3 and 1 × 1 in turn and stride 1;
b. Descriptors corresponding to the feature points are extracted: the image size is first normalized and the feature points are moved to the corresponding positions of the normalized image; K groups of 1 × 1 × 2 tensors are then constructed from the normalized feature points, where K is the number of feature points and the 2 entries are the horizontal and vertical coordinates of each point; the feature point positions are inverse-normalized, and the descriptors are sampled at the corresponding key point positions by bilinear interpolation; finally, L2-norm normalization yields descriptors of uniform length.
8. The method for registering the infrared and visible light images in the power inspection scene according to claim 1, wherein the step S4 specifically includes:
step S41, after SuperPoint has detected the feature point positions p and computed the descriptors d of the infrared edge image and visible light edge image, M and N feature points are extracted from the two images respectively; for the infrared edge image, the detected feature point positions and descriptors are combined into a feature matching vector, whose initial representation is x_i^{(0)} = d_i + MLP_enc(p_i);
where MLP_enc is a multilayer perceptron encoder that embeds the feature point positions into a high-dimensional vector, which is then added to the descriptor; the feature point positions and descriptors of the visible light edge image are processed in the same way;
step S42, using a multi-layer graph neural network, each layer aggregates over the undirected edges ε_self or ε_cross for every feature point i and computes the updated vector representation; ε_self connects a feature point to all other feature points in the same image, and ε_cross connects it to all feature points in the other image; x_i^{(l)} denotes the intermediate representation of feature point i of the infrared edge image at layer l, and m_{ε→i} is the result of aggregating over all feature points {j : (i, j) ∈ ε}, ε ∈ {ε_self, ε_cross};
The vector calculation is updated as:
where [· ‖ ·] denotes concatenation; layers are counted from 1, with ε = ε_self when l is odd and ε = ε_cross when l is even; by alternately aggregating and updating along the intra-image edges ε_self and the inter-image edges ε_cross, the network imitates the way humans judge feature matches by looking back and forth between the images; after each feature point i of the infrared edge image has passed through the L-layer graph neural network, a feature matching vector f_i^A is obtained;
where W and b are weights and biases; the feature matching vectors f_j^B of the visible light edge image are obtained correspondingly;
step S43, the feature point correspondences of the two images must comply with preset physical constraints: 1) a feature point has at most one correspondence in the other image; 2) some feature points cannot be matched due to occlusion and other factors; the matching of feature points between the images is therefore a partial assignment between the two feature point sets; for the M and N feature points of the infrared and visible light edge images, a partial assignment matrix P ∈ [0,1]^{M×N} is established, representing all possible matches between the feature points of the two images; each possible correspondence has a confidence value representing the likelihood that it is a correct match, constrained as follows:
P · 1_N ≤ 1_M and P^T · 1_M ≤ 1_N;
first, the inner product of the feature matching vectors f_i^A and f_j^B is computed to obtain a feature matching score matrix S_{i,j}; the feature assignment matrix P is obtained by maximizing the total score Σ_{i,j} S_{i,j} P_{i,j} under the constraints; this is regarded as an optimal transport problem, and the final feature assignment matrix P is solved iteratively by the Sinkhorn algorithm;
the score matrix S is expanded by one channel into an (M+1) × (N+1) matrix S̄, whose redundant channel is used to filter out unmatched feature points;
the constraint becomes:
9. The method for registering the infrared and visible light images in the power inspection scene according to claim 1, wherein the step S5 specifically includes:
step S51, calculating affine transformation model parameters according to the feature point matching result;
step S52, firstly, establishing a zero matrix with the size consistent with that of the infrared image, and obtaining a corresponding point of each point on the visible light image by carrying out coordinate transformation on each point in the matrix; then, obtaining the pixel value of the point through a bilinear interpolation method, and taking the pixel value as the pixel value of the corresponding point on the image after registration;
and step S53, calculating affine transformation model parameters according to the matched feature points, and performing space coordinate transformation on the image to be registered through bilinear interpolation to obtain a final registered image.
10. The infrared and visible light image registration method in the power inspection scene according to claim 9, wherein the affine transformation model is expressed as:
wherein (x, y) and (x ', y') are coordinates of corresponding points in the two images respectively, and parameters in the affine transformation model can be determined through correct matching points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110892727.5A CN113628261B (en) | 2021-08-04 | 2021-08-04 | Infrared and visible light image registration method in electric power inspection scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113628261A true CN113628261A (en) | 2021-11-09 |
CN113628261B CN113628261B (en) | 2023-09-22 |
Family
ID=78382773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110892727.5A Active CN113628261B (en) | 2021-08-04 | 2021-08-04 | Infrared and visible light image registration method in electric power inspection scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113628261B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114066954A (en) * | 2021-11-23 | 2022-02-18 | 广东工业大学 | Feature extraction and registration method for multi-modal images |
CN114255197A (en) * | 2021-12-27 | 2022-03-29 | 西安交通大学 | Infrared and visible light image self-adaptive fusion alignment method and system |
CN114565781A (en) * | 2022-02-25 | 2022-05-31 | 中国人民解放军战略支援部队信息工程大学 | Image matching method based on rotation invariance |
CN114820733A (en) * | 2022-04-21 | 2022-07-29 | 北京航空航天大学 | Interpretable thermal infrared visible light image registration method and system |
CN116797660A (en) * | 2023-07-04 | 2023-09-22 | 广东工业大学 | Unmanned aerial vehicle all-weather geographic positioning method and system without GNSS work |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106257535A (en) * | 2016-08-11 | 2016-12-28 | 河海大学常州校区 | Electrical equipment based on SURF operator is infrared and visible light image registration method |
CN110263868A (en) * | 2019-06-24 | 2019-09-20 | 北京航空航天大学 | Image classification network based on SuperPoint feature |
CN110428008A (en) * | 2019-08-02 | 2019-11-08 | 深圳市唯特视科技有限公司 | A kind of target detection and identification device and method based on more merge sensors |
CN111369605A (en) * | 2020-02-27 | 2020-07-03 | 河海大学 | Infrared and visible light image registration method and system based on edge features |
CN111583315A (en) * | 2020-04-23 | 2020-08-25 | 武汉卓目科技有限公司 | Novel visible light image and infrared image registration method and device |
CN112396629A (en) * | 2019-08-14 | 2021-02-23 | 河海大学 | River course inspection tracking method based on infrared and visible light cooperation |
WO2021098080A1 (en) * | 2019-11-22 | 2021-05-27 | 大连理工大学 | Multi-spectral camera extrinsic parameter self-calibration algorithm based on edge features |
Non-Patent Citations (1)
Title |
---|
Wang Peng; Jin Lizuo: "Infrared and visible light image registration algorithm based on Canny edge SURF features", Industrial Control Computer, no. 04 *
Also Published As
Publication number | Publication date |
---|---|
CN113628261B (en) | 2023-09-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||