CN112561973A - Method and device for training image registration model and electronic equipment


Info

Publication number
CN112561973A
Authority
CN
China
Prior art keywords
image
pair
image block
mask
registration
Prior art date
Legal status
Withdrawn
Application number
CN202011541901.3A
Other languages
Chinese (zh)
Inventor
龙勇志
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202011541901.3A
Publication of CN112561973A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The application discloses a method and a device for training an image registration model and electronic equipment, and belongs to the field of image processing. The method comprises the following steps: acquiring a dataset comprising a registered pair of images; cutting each image pair in the data set to obtain a target image block pair; calculating first transformation matrixes corresponding to two image blocks in the target image block pair; sequentially inputting target image block pairs serving as training data into an initial image registration model to obtain registration image pairs of the target image block pairs and obtain second transformation matrixes corresponding to two image blocks in the registration image pairs, wherein the image registration model is a deep neural network model; and calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model. The method and the device can improve the registration accuracy of the infrared image and the visible light image.

Description

Method and device for training image registration model and electronic equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method and a device for training an image registration model and electronic equipment.
Background
With the rapid development of sensor imaging technology, imaging from a single sensor can hardly meet daily application requirements, and multi-sensor imaging has driven technical innovation. Image fusion comprehensively processes the image information detected by multiple sensors, so as to achieve a more comprehensive and reliable description of the detected scene.
Infrared and visible light images are the most widely used image types in the field of image processing. An infrared image can efficiently capture scene heat radiation and identify salient targets in a scene, while a visible light image has high resolution and presents detailed scene texture information, so the image information of the two modalities is highly complementary. Therefore, fusing an infrared image with a visible light image can produce a fused image rich in scene information that describes the scene background and targets clearly and accurately.
Image registration refers to the process of matching and aligning two or more images acquired at different times, by different sensors (imaging devices), or under different conditions (weather, illuminance, camera position and angle, etc.). Image registration is an indispensable preprocessing step of an image fusion task and a guarantee of its performance, and the accuracy of registration directly influences the fusion result. However, when the infrared image and the visible light image come from different sensors, the imaging principles of the sensors differ greatly, and the gray values and contrast of the two images also differ greatly. As a result, feature-based image registration algorithms usually find few effective feature points, which causes large error offsets in image registration, in turn producing ghosting or blur in the final fused image and degrading the fusion effect.
Disclosure of Invention
The embodiment of the application aims to provide a method for training an image registration model, which can solve the problem that image registration in the prior art produces large error offsets.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a method for training an image registration model, where the method includes:
acquiring a dataset comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image in the same scene;
cutting each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cut from a visible light image and a second image block cut from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
calculating first transformation matrixes corresponding to two image blocks in the target image block pair;
taking the target image block pair as training data, and sequentially inputting it into an initial image registration model to obtain a registration image pair of the target image block pair and second transformation matrixes corresponding to two image blocks in the registration image pair, wherein the image registration model is a deep neural network model;
and calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating the network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
In a second aspect, an embodiment of the present application provides an apparatus for training an image registration model, where the apparatus includes:
a dataset acquisition module for acquiring a dataset comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image in the same scene;
the image cropping module is used for cropping each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cropped from a visible light image and a second image block cropped from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
the matrix calculation module is used for calculating first transformation matrixes corresponding to two image blocks in the target image block pair;
the data training module is used for taking the target image block pair as training data and sequentially inputting it into an initial image registration model to obtain a registration image pair of the target image block pair and second transformation matrixes corresponding to two image blocks in the registration image pair, wherein the image registration model is a deep neural network model;
and the parameter adjusting module is used for calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating the network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, the registered image pairs collected in advance are cut, and the target image block pairs obtained by cutting are used as training data for training the image registration model. Each image pair can be cut into a plurality of target image block pairs according to a preset size, and the target image block pairs are used as training data, so that each image pair can generate a plurality of training data, the data quantity of the training data can be increased, and the training accuracy is improved. After the training data and the label are obtained, the training data (target image block pairs) are sequentially input into the initial image registration model for training, and the trained image registration model is obtained. The image registration model is obtained through supervised training according to a large amount of accurate training data, and fine registration of the image pair of the infrared image and the visible light image can be realized through the trained image registration model. Compared with an image registration method based on artificial features, the image registration method based on the deep neural network model provided by the application forces the image registration model to learn the image features with high robustness and high consistency in the image pair through given training data with accurate alignment in a network fitting mode, is used for calculating the space transformation between the images, and improves the registration accuracy of the infrared image and the visible light image.
Drawings
FIG. 1 is a flow chart of the steps of one embodiment of a method of training an image registration model of the present application;
FIG. 2 is a schematic flow chart of a pair of cropped target image blocks of the present application;
FIG. 3 is a schematic diagram of a network structure of an image registration model of the present application;
FIG. 4 is a schematic diagram of a network structure of a deep feature extraction network FEB according to the present application;
FIG. 5 is a schematic diagram of the internal structure of an RDN network of the present application;
FIG. 6 is a schematic diagram of a network structure of a mask prediction network MPB according to the present application;
FIG. 7 is a schematic diagram of an equivalent transformation of a first transformation matrix according to the present application;
FIG. 8 is a schematic structural diagram of an embodiment of an apparatus for training an image registration model according to the present application;
FIG. 9 is a schematic structural diagram of an electronic device of the present application;
fig. 10 is a hardware structure diagram of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in orders other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one type, and their number is not limited; for example, the first object may be one or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The method for training the image registration model provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings by specific embodiments and application scenarios thereof.
Referring to fig. 1, a flow chart of steps of an embodiment of a method of training an image registration model of the present application is shown, comprising the steps of:
step 101, acquiring a data set containing registered image pairs, wherein each image pair contains a visible light image and an infrared image in the same scene;
102, cutting each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cut from a visible light image and a second image block cut from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
103, calculating first transformation matrixes corresponding to two image blocks in the target image block pair;
step 104, using the target image block pair as training data, and sequentially inputting it into an initial image registration model to obtain a registration image pair of the target image block pair and second transformation matrixes corresponding to two image blocks in the registration image pair, wherein the image registration model is a deep neural network model;
and 105, calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
The embodiment of the application provides a method for training an image registration model, wherein a target image block pair and a first transformation matrix are generated according to pre-collected registered image pairs, the target image block pair is used as training data, the first transformation matrix is used as the label corresponding to the training data, and a neural network model is obtained through supervised training. By simulating the structure of human visual neurons, the neural network model can decompose and extract richer and more appropriate types of image features, which improves the accuracy of feature extraction.
In an embodiment of the present application, a data set containing registered image pairs is pre-collected, each image pair containing an infrared image and a visible light image of the same scene. For example, the embodiment of the present application may select an appropriate number of image pairs from internationally published public databases of infrared and visible light images and videos, such as INO, TNO and OTCVBS, and use them as training data for the image registration model and for making annotation labels. These internationally published public databases contain finely registered infrared and visible light image pairs.
Further, in the embodiment of the present application, a preset number of image pairs are selected from the data set containing the registered image pairs, and the selected image pairs cover different scenes, for example daytime, night, indoor and outdoor scenes; image pairs containing scene targets such as pedestrians and vehicles are selected as far as possible, so as to improve the objectivity of subsequent registration.
Each image pair in the data set is cut to obtain a target image block pair, the target image block pair comprises a first image block cut from a visible light image and a second image block cut from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset.
Because relatively little training data is available for training the registration model, the embodiment of the application crops the pre-collected registered image pairs and uses the cropped target image block pairs as training data for training the image registration model. Each image pair can be cut into a plurality of target image block pairs according to a preset size, and the target image block pairs are used as training data, so that each image pair can generate a plurality of training data, which increases the quantity of training data and improves the training accuracy. In one example, the preset size is 32 × 32 pixels.
Because the first image block and the second image block in the target image block pair correspond to the same position in the image pair and have a preset random offset, the embodiment of the application calculates the first transformation matrix corresponding to the first image block and the second image block in the target image block pair. The first transformation matrix may be used to represent the offset between the first image block and the second image block. The embodiment of the application takes the first transformation matrix as the label corresponding to the training data, which is used to guide the training of the registration model.
After the training data and the label are obtained, the training data (target image block pair) are sequentially input into an initial image registration model to obtain a registration image pair of the target image block pair, and a second transformation matrix corresponding to two image blocks in the registration image pair is calculated, wherein the image registration model is a deep neural network model.
The image registration model may be obtained by performing supervised training on an existing neural network according to a large amount of training data using a machine learning method. It should be noted that the embodiment of the present application does not limit the model structure or the training method of the image registration model. The image registration model may be a deep neural network model that fuses multiple neural networks, including but not limited to at least one of, or a combination, superposition or nesting of at least two of, the following: CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory) network, RNN (Recurrent Neural Network), attention neural network, and the like.
After calculating corresponding second transformation matrices for two image blocks in a registration image pair output by an image registration model, calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, thereby obtaining the trained image registration model.
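For ease of understanding, the following Python sketch outlines the supervised training loop described above. It assumes PyTorch, a hypothetical RegistrationModel, data loader and loss function, and an SGD optimizer; none of these choices are fixed by the present application.

import torch

def train(model, loader, loss_fn, lr=0.01, threshold=1e-4, max_iters=24000):
    # 'model' maps a target image block pair to a fitted second transformation matrix
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    iteration = 0
    for patch_v, patch_r, h_label in loader:      # training data and its label (first transformation matrix)
        h_pred = model(patch_v, patch_r)          # second transformation matrix fitted by the network
        loss = loss_fn(h_pred, h_label)           # difference value from the preset loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                          # update the network parameters of the image registration model
        iteration += 1
        if loss.item() < threshold or iteration >= max_iters:   # stop once the difference is below the preset threshold
            break
    return model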
The embodiment of the application provides a method for training an image registration model, wherein the image registration model is a deep neural network model, the image registration model is obtained through supervised training according to a large amount of accurate training data, and fine registration of the image pair of an infrared image and a visible light image can be realized through the trained image registration model. Compared with an image registration method based on artificial features, the image registration method based on the deep neural network model provided by the application forces the image registration model to learn the image features with high robustness and high consistency in the image pair through given training data with accurate alignment in a network fitting mode, is used for calculating the space transformation between the images, and improves the registration accuracy of the infrared image and the visible light image.
In an optional embodiment of the present application, the cropping each image pair in the data set to obtain a target image block pair includes:
step S11, randomly determining a first selection frame in a first image in the image pair, and acquiring first coordinates corresponding to four corner points of the first selection frame;
step S12, determining a second selection frame in a second image in the image pair according to the first coordinate;
step S13, according to a preset offset, carrying out random offset on four corner points of the second selection frame to obtain second coordinates corresponding to the four corner points after offset;
step S14, calculating a transformation matrix between the first coordinate and the second coordinate;
step S15, performing perspective transformation on the second selection frame in the second image according to the inverse matrix of the transformation matrix to obtain a third selection frame;
and step S16, cutting the first selection frame in the first image and cutting the third selection frame in the second image to obtain a target image block pair.
In an optional embodiment of the present application, the first image is a visible light image of the pair of images, and the second image is an infrared image of the pair of images; or, the first image is an infrared image in the image pair, and the second image is a visible light image in the image pair. In the embodiment of the present application, the first image is a visible light image, and the second image is an infrared image. Referring to fig. 2, a schematic flow chart of cutting a target image block pair according to an embodiment of the present application is shown.
First, step S10 is executed to select an image pair from the data set. As shown in FIG. 2, the image pair includes Ir and Iv, where Ir is an infrared image and Iv is a visible light image. Step S11 is then executed: for the selected image pair, a first selection frame, denoted Pv, is randomly determined in the visible light image. The first selection frame is a rectangular frame, and the first coordinates corresponding to its four corner points are obtained. It is to be understood that the size of the first selection frame is not limited in the embodiments of the present application; in the embodiment of the present application, a first selection frame of 32 × 32 pixels is taken as an example.
Then, step S12 is executed to determine a second selection frame in the second image of the pair according to the first coordinates. Specifically, the first coordinates of the four corner points of the first selection frame in the visible light image may be mapped into the infrared image to obtain the second selection frame in the infrared image, denoted Pv′.
Next, step S13 is executed: according to a preset offset, the four corner points of the second selection frame are randomly offset to obtain the second coordinates corresponding to the four offset corner points. The offset direction and offset distance of the four corner points of the second selection frame Pv′ are randomly selected within a preset range; step S13 in FIG. 2 shows an example of the offset direction and offset distance of the four corner points of Pv′. The offset second coordinates of the four corner points of the second selection frame in the infrared image form the offset second selection frame, denoted Pr′.
Step S14 is executed to calculate a transformation matrix between the first coordinates and the second coordinates. Specifically, the corner coordinates of the first selection frame Pv in the visible light image and of the offset second selection frame Pr′ in the infrared image are calibrated, and the transformation matrix between Pv and Pr′, i.e. the transformation matrix between the first coordinates and the second coordinates, is calculated. In the embodiment of the present application, the transformation matrix may be a 3 × 3 homography matrix, denoted Hv,r.
Step S15 is executed: according to the inverse matrix of the transformation matrix Hv,r (denoted Hr,v), perspective transformation is performed on the second selection frame in the second image to obtain a third selection frame. In the embodiment of the present application, the second selection frame in the infrared image is perspective-transformed with the inverse matrix Hr,v of the transformation matrix Hv,r. The size of the transformed second selection frame may change, and the transformed frame is designated as the third selection frame Pr″. The position of the third selection frame Pr″ corresponds to the same position in the image pair as the first selection frame Pv in the visible light image, but with a certain offset. Here Hr,v satisfies the transformation relation shown in formula 1, where Hv,r represents the transformation matrix of the visible light image with respect to the infrared image, Hr,v represents the transformation matrix of the infrared image with respect to the visible light image, and I3 represents the 3rd-order identity matrix.
Hv,r·Hr,v=I3 (1)
Step S16 is executed: the first selection frame Pv in the visible light image is cut to obtain a first image block, denoted Pv, and the third selection frame Pr″ in the second image is cut to obtain a second image block, denoted Pr. A target image block pair, denoted (Pv, Pr), is thereby obtained.
Performing the above steps S11 to S16 for each image pair in the data set may obtain a large number of target image block pairs, and inputting the obtained target image block pairs as training data into an initial image registration model for data fitting to train the image registration model.
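As an illustration only, the following Python sketch shows one common way to realize steps S11 to S16 with OpenCV and NumPy: the full infrared image is warped with the inverse homography and then cropped at the location of the first selection frame, which yields the content of the third selection frame. The patch size, offset range, corner ordering and function names are assumptions, not values fixed by the application.

import cv2
import numpy as np

def make_patch_pair(i_v, i_r, patch=32, max_offset=8):
    h, w = i_v.shape[:2]
    x = np.random.randint(0, w - patch)                       # S11: random first selection frame Pv in the visible image
    y = np.random.randint(0, h - patch)
    corners_v = np.float32([[x, y], [x + patch, y],
                            [x + patch, y + patch], [x, y + patch]])   # first coordinates
    # S12/S13: map the frame into the infrared image and randomly offset its four corner points
    corners_r = corners_v + np.random.uniform(-max_offset, max_offset, (4, 2)).astype(np.float32)
    h_vr = cv2.getPerspectiveTransform(corners_v, corners_r)  # S14: 3x3 homography between first and second coordinates
    h_rv = np.linalg.inv(h_vr)                                # S15: inverse matrix used for the perspective transformation
    i_r_warped = cv2.warpPerspective(i_r, h_rv, (w, h))
    p_v = i_v[y:y + patch, x:x + patch]                       # S16: crop the first image block
    p_r = i_r_warped[y:y + patch, x:x + patch]                # S16: crop the second image block (third-frame content)
    return (p_v, p_r), h_vr                                   # target image block pair and its label (first transformation matrix)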
In an optional embodiment of the present application, the calculating a first transformation matrix corresponding to two image blocks in the target image block pair includes:
step S21, after calculating a transformation matrix between the first coordinates and the second coordinates, recording the transformation matrix between the first coordinates and the second coordinates;
step S22, using a transformation matrix between the first coordinate value and the second coordinate value as a first transformation matrix corresponding to two image blocks in the target image block pair.
In the embodiment of the present application, after step S14 is executed, a transformation matrix between the first coordinate and the second coordinate may be recorded, and the transformation matrix between the first coordinate value and the second coordinate value may be used as a first transformation matrix H corresponding to two image blocks in the target image block pairv,r. The first transformation matrix is a label for the training data.
In an optional embodiment of the present application, the image registration model is a deep neural network model including a depth feature extraction network, a mask prediction network, a channel cascade module, and a matrix estimation network, and the obtaining a registration image pair of the target image block pair and a second transformation matrix corresponding to two image blocks in the registration image pair by taking the target image block pair as training data and sequentially inputting it into an initial image registration model includes:
step S31, respectively inputting the first image block and the second image block in the target image block pair into the depth feature extraction network, so as to extract a first depth feature of the first image block and a second depth feature of the second image block;
step S32, inputting a first image block and a second image block in the target image block pair into the mask prediction network respectively to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block;
step S33, weighting the first depth feature by using the first mask to obtain a first feature map, and weighting the second depth feature by using the second mask to obtain a second feature map;
step S34, inputting the first feature map and the second feature map into the channel cascade module to obtain a registered image pair of the target image block pair;
and step S35, inputting the registration image pair into the matrix estimation network to obtain a second transformation matrix corresponding to two image blocks in the registration image pair.
In the embodiments of the application, an image registration framework based on a neural network is designed for the infrared and visible light image registration task, and network model training is conducted by regressing the transformation matrix, so that the trained image registration model network can extract robust features. Referring to fig. 3, a schematic network structure diagram of an image registration model according to an embodiment of the present application is shown. As shown in fig. 3, the image registration model includes a depth feature extraction network (FEB), a mask prediction network (MPB), a channel cascade module, and a matrix estimation network (HEB).
First, step S31 is executed: the first image block Pv and the second image block Pr in the target image block pair are respectively input into the depth feature extraction network FEB to extract the first depth feature fv of the first image block Pv and the second depth feature fr of the second image block Pr.
While step S31 is being performed, step S32 may be performed: the first image block Pv and the second image block Pr in the target image block pair are respectively input into the mask prediction network MPB, which performs hierarchical mask prediction on the target image block pair to obtain the first mask Mv corresponding to the first image block Pv and the second mask Mr corresponding to the second image block Pr.
Then, step S33 is performed: the first depth feature is weighted with the first mask to obtain a first feature map, and the second depth feature is weighted with the second mask to obtain a second feature map. In the embodiment of the application, the first mask Mv and the first depth feature fv are weighted and superposed to obtain the first feature map Gv, which contains the efficient common features of the first mask Mv and the first depth feature fv. Likewise, the second mask Mr and the second depth feature fr are weighted and superposed to obtain the second feature map Gr, which contains the efficient common features of the second mask Mr and the second depth feature fr.
Next, step S34 is executed: the first feature map Gv and the second feature map Gr are input into the channel cascade module to obtain the registered image pair Gr,v of the target image block pair. The channel cascade module performs a channel concatenation operation on the first feature map Gv and the second feature map Gr to obtain a container Gr,v holding the common features of the two feature maps, i.e. the registered image pair of the target image block pair. It should be noted that the physical meaning of the first feature map Gv and the second feature map Gr is that feature information lacking consistency between the two images is removed by means of the feature masks, while the more robust common features of the images are retained.
Step S35 is executed: the registered image pair of the target image block pair is input into the matrix estimation network HEB to obtain the second transformation matrix corresponding to the two image blocks in the registered image pair. That is, the common feature container Gr,v is input into the matrix estimation network HEB for the regression calculation of the registration network, and the second transformation matrix obtained by fitting, denoted Hv,r′, is output by the HEB network module; the second transformation matrix is a 3 × 3 homography matrix.
The difference value between the first transformation matrix Hv,r and the second transformation matrix Hv,r′ is then calculated according to the preset loss function, and the network parameters of the image registration model are updated according to the calculated difference value until the calculated difference value is smaller than the preset threshold value, thereby obtaining the trained image registration model.
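A structural sketch of this forward pass, written in Python with PyTorch, is given below for illustration. The FEB, MPB and HEB sub-networks are passed in as placeholders (their layer configurations follow Tables 1 to 4), and sharing the FEB and MPB weights between the two branches is an assumption made here for brevity.

import torch
import torch.nn as nn

class RegistrationModel(nn.Module):
    def __init__(self, feb: nn.Module, mpb: nn.Module, heb: nn.Module):
        super().__init__()
        self.feb = feb      # depth feature extraction network
        self.mpb = mpb      # mask prediction network
        self.heb = heb      # matrix estimation network

    def forward(self, p_v, p_r):
        f_v, f_r = self.feb(p_v), self.feb(p_r)      # S31: first and second depth features
        m_v, m_r = self.mpb(p_v), self.mpb(p_r)      # S32: first and second masks
        g_v, g_r = f_v * m_v, f_r * m_r              # S33: mask-weighted feature maps (equation 3); shapes assumed broadcastable
        g_rv = torch.cat([g_v, g_r], dim=1)          # S34: channel concatenation -> common feature container Gr,v
        h_pred = self.heb(g_rv)                      # S35: 8-dimensional corner-offset vector (second transformation matrix)
        return h_pred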
In the embodiment of the application, the depth feature extraction network FEB is used for extracting multi-level depth feature information in an image. It should be noted that, in the embodiment of the present application, a network structure of the deep feature extraction network FEB is not limited.
Referring to fig. 4, a network structure diagram of a deep feature extraction network FEB according to an embodiment of the present application is shown. As shown in fig. 4, the depth feature extraction network FEB may include a DFE (Deep Feature Extraction) module and an FCN (Fully Convolutional Network) module. In one example, the network structure parameters of the deep feature extraction network FEB shown in fig. 4 are given in Table 1.
TABLE 1 (network structure parameters of the deep feature extraction network FEB)
In an application example of the present application, the DFE module may be composed of three RDN (residual dense network, a structure widely used in image super-resolution) network structures. Referring to fig. 5, a schematic diagram of the internal structure of an RDN network of the present application is shown, and the network structure parameters of the RDN network shown in fig. 5 are given in Table 2.
TABLE 2 (network structure parameters of the RDN network)
In the embodiment of the application, dense connections are used in the RDN network shown in fig. 5, which improves the reusability of features across front and rear layers and reduces feature computation complexity and structure width. Further, each RDN network may include two branches to enhance the diversity of the extracted features and make full use of the depth features extracted by each convolutional layer (Conv) within the module. Assuming that HRDN,d(·) represents the operation of the d-th RDN module, the feature map of the d-th RDN module is calculated as shown in formula 2:
FRDN,d = HRDN,d(Finput) (2)
where Finput represents the depth features input to this RDN module from the preceding level, and FRDN,d represents the feature map output by the d-th RDN module.
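The following PyTorch sketch illustrates, under assumed channel counts and layer numbers, a densely connected two-branch block in the spirit of the RDN module of fig. 5; it is not the exact structure given in Table 2.

import torch
import torch.nn as nn

class DenseBranch(nn.Module):
    def __init__(self, channels, growth, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1) for i in range(layers))
        self.fuse = nn.Conv2d(channels + layers * growth, channels, 1)   # 1x1 local feature fusion

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))      # dense connections reuse earlier-layer features
        return self.fuse(torch.cat(feats, dim=1))

class RDNBlock(nn.Module):
    def __init__(self, channels, growth=16):
        super().__init__()
        self.branch1 = DenseBranch(channels, growth)
        self.branch2 = DenseBranch(channels, growth)                      # two branches enhance feature diversity

    def forward(self, f_input):
        return f_input + self.branch1(f_input) + self.branch2(f_input)   # residual output of the d-th RDN module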
In an optional embodiment of the present application, the respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block includes:
respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network, so as to generate a first mask which is equal to the first image block size through the mask prediction network learning, marking a first contribution estimation value corresponding to each pixel in the first image block in the first mask, generate a second mask which is equal to the second image block size through the mask prediction network learning, and marking a second contribution estimation value corresponding to each pixel in the second image block in the second mask;
the weighting the first depth feature by using the first mask to obtain a first feature map, and weighting the second depth feature by using the second mask to obtain a second feature map, including:
weighting the first depth feature to obtain a first feature map by using a first contribution estimation value corresponding to each pixel in the first image block labeled in the first mask, and weighting the second depth feature to obtain a second feature map by using a second contribution estimation value corresponding to each pixel in the second image block labeled in the second mask.
In the embodiment of the present application, the mask prediction network MPB generates a first mask equal to the first image block size through network learning by automatically learning common features required for image registration, and marks a first contribution estimation value corresponding to each pixel in the first image block in the first mask. The first contribution estimation value refers to an estimated contribution degree of each element in the first image block to the first transformation matrix, the greater the contribution degree, the greater the probability that the feature of the corresponding element in the first image block is retained, and the smaller the contribution degree, the greater the probability that the feature of the corresponding element in the first image block is filtered out.
Similarly, by automatically learning common features required for image registration, the mask prediction network MPB generates a second mask having the same size as the second image block, and labels in the second mask a second contribution estimation value corresponding to each pixel in the second image block. The second contribution estimation value refers to the estimated contribution degree of each element in the second image block to the first transformation matrix; the greater the contribution degree, the greater the probability that the feature of the corresponding element in the second image block is retained, and the smaller the contribution degree, the greater the probability that the feature of the corresponding element in the second image block is filtered out.
Referring to fig. 6, a schematic diagram of a network structure of a mask prediction network MPB according to an embodiment of the present application is shown, and as shown in fig. 6, the mask prediction network MPB may include a single RDN structure, which follows the structure shown in fig. 5. In one example, the network structure parameters of the mask prediction network MPB shown in fig. 6 are shown in table 3.
TABLE 3 (network structure parameters of the mask prediction network MPB)
After the mask prediction network MPB outputs the first mask Mv corresponding to the first image block Pv and the second mask Mr corresponding to the second image block Pr, the first mask Mv and the first depth feature fv are weighted and superposed to obtain the first feature map Gv, and the second mask Mr and the second depth feature fr are weighted and superposed to obtain the second feature map Gr. The weighted superposition process is shown in equation 3.
Gi=fi×Mi(i=r,v) (3)
where fi (i = r, v) denotes the first depth feature fv and the second depth feature fr extracted by the depth feature extraction network FEB, Mi (i = r, v) denotes the first mask Mv and the second mask Mr output by the mask prediction network MPB, and Gi (i = r, v) denotes the first feature map Gv and the second feature map Gr, respectively.
The channel cascade module performs a channel concatenation operation on the first feature map Gv and the second feature map Gr to obtain the container Gr,v holding their common features, i.e. the registered image pair of the target image block pair; the common feature container Gr,v is then input into the matrix estimation network HEB for the regression calculation of the registration network.
In the embodiment of the present application, the matrix estimation network HEB may generate four sets of two-dimensional offset vectors (total 8-dimensional vectors). The whole matrix estimation process is represented by heb (·), and the calculation process is shown in equation 4.
H=heb(Gr,v) (4)
In an alternative embodiment of the present application, the matrix estimation network HEB may use ResNet34 as the backbone network structure, which contains 34 convolution layers followed by an adaptive pooling layer, so that the HEB module requires only low-dimensional input features to generate a feature matrix of fixed size. In one example, the network structure parameters of the matrix estimation network HEB are shown in Table 4.
TABLE 4 (network structure parameters of the matrix estimation network HEB)
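As an illustrative assumption, the matrix estimation network HEB can be sketched in PyTorch by adapting torchvision's ResNet34: the first convolution is replaced so that it accepts the channels of the cascaded feature container Gr,v, and the final fully connected layer is replaced so that it regresses the 8-dimensional corner-offset vector. The application itself only specifies ResNet34 with 34 convolution layers followed by adaptive pooling as the backbone.

import torch.nn as nn
from torchvision.models import resnet34

def build_heb(in_channels: int) -> nn.Module:
    net = resnet34(weights=None)                              # untrained ResNet34 backbone (assumption)
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                          stride=2, padding=3, bias=False)    # accept the cascaded feature channels
    net.fc = nn.Linear(net.fc.in_features, 8)                 # four two-dimensional corner offsets (8-dimensional vector)
    return net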
It should be noted that the network structure of the image registration model shown in fig. 3, the network structure of the depth feature extraction network FEB shown in fig. 4, the network structure of the RDN network shown in fig. 5, the network structure of the mask prediction network MPB shown in fig. 6, and the parameters corresponding to these network structures in Tables 1 to 4 are all given only as an application example of the present application. The embodiment of the present application does not limit the network structure of the image registration model, the depth feature extraction network FEB, the RDN network or the mask prediction network MPB, nor the specific settings of the corresponding network parameters.
In an optional embodiment of the present application, after calculating the first transformation matrices corresponding to two image blocks in the target image block pair, the method further includes: performing equivalent transformation on the first transformation matrix according to the angular point coordinate offset of the first image block and the second image block to obtain a first equivalent matrix;
after obtaining the second transformation matrices corresponding to the two image blocks in the registered image pair, the method further includes: performing equivalent transformation estimation on the second transformation matrix to obtain a second equivalent matrix;
the calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function includes: and calculating a difference value between the first equivalent matrix and the second equivalent matrix according to a preset loss function.
In the embodiment of the application, the degrees of freedom of the fitted second transformation matrix are taken into account in the network training stage of the image registration model, and the output of the matrix estimation network HEB is set as an 8-dimensional vector. In addition, the parameters in the second transformation matrix represent different transformation types: some parameters represent rotation, scaling and shear transformations, some represent translation, and some represent perspective transformation; that is, the parameters have different dimensions. If parameter regression on the second transformation matrix were performed directly during network training, these different dimensions could affect the final training effect.
In order to solve the problem, in the embodiment of the present application, equivalent transformation processing is performed on the first transformation matrix and the second transformation matrix, so as to solve the problem of different regression parameter dimensions, thereby reducing the complexity of network training and improving the network training effect.
Specifically, after the first transformation matrix corresponding to the two image blocks in the target image block pair is calculated, the first transformation matrix is equivalently transformed according to the corner coordinate offsets of the first image block and the second image block in the target image block pair to obtain a first equivalent matrix. Referring to fig. 7, a schematic diagram of performing an equivalent transformation on the first transformation matrix according to an embodiment of the present application is shown. Through the equivalent transformation shown in fig. 7, the parameters in the first transformation matrix Hv,r are replaced by the corner coordinate offsets of the first image block and the second image block to obtain the first equivalent matrix. Similarly, each parameter in the second transformation matrix Hv,r′ obtained by fitting with the matrix estimation network HEB is replaced by the corner coordinate offsets estimated by the matrix estimation network HEB to obtain a second equivalent matrix.
In the training process of the image registration model, the difference value between the first equivalent matrix and the second equivalent matrix is calculated according to the preset loss function, so as to guide the adjustment of the network parameters of the image registration model.
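For illustration, the correspondence between a 3 × 3 homography and its equivalent four-corner-offset representation can be sketched with OpenCV as follows; the patch size and corner ordering are assumptions.

import cv2
import numpy as np

def homography_to_corner_offsets(h_mat, patch=32):
    corners = np.float32([[0, 0], [patch, 0], [patch, patch], [0, patch]])
    warped = cv2.perspectiveTransform(corners.reshape(-1, 1, 2), h_mat.astype(np.float32)).reshape(4, 2)
    return warped - corners                    # (Δμi, Δvi) offsets of the four corner points, an 8-value equivalent of h_mat

def corner_offsets_to_homography(offsets, patch=32):
    corners = np.float32([[0, 0], [patch, 0], [patch, patch], [0, patch]])
    return cv2.getPerspectiveTransform(corners, corners + offsets.astype(np.float32))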
In an optional embodiment of the present application, the loss function is determined according to an offset of each corner point in the first equivalent matrix and an offset of each corner point in the second equivalent matrix.
Specifically, the loss function of the embodiment of the present application is shown in equation 5:
Loss = Σi=1..4 [(Δμi − Δμi′)^2 + (Δvi − Δvi′)^2] (5)
where the value of i in formula (5) ranges from 1 to 4, representing the four corner points of the first equivalent matrix or the second equivalent matrix. Δμi and Δvi respectively denote the offsets of the i-th corner point in the first equivalent matrix in the x and y directions, and Δμi′ and Δvi′ respectively denote the offsets of the i-th corner point in the second equivalent matrix in the x and y directions. It can be understood that the loss function shown in formula (5) is only an application example of the present application; the embodiment of the present application does not limit the functional form of the loss function.
Further, in the training phase of the image registration model, the pixel values of each target image block pair are first normalized, and the learning rate is set with piecewise constant decay: when the number of network training iterations reaches the preset milestones, such as [5,000, 10,000, 14,000, 18,000, 20,000, 24,000], the network training learning rate is set to the corresponding value in [0.01, 0.007, 0.005, 0.0025, 0.001, 0.0001, 0.00005].
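The piecewise constant learning-rate schedule described above can be expressed, for illustration, by a small lookup function (the milestone and learning-rate lists are those quoted above):

from bisect import bisect_right

MILESTONES = [5000, 10000, 14000, 18000, 20000, 24000]
LR_VALUES = [0.01, 0.007, 0.005, 0.0025, 0.001, 0.0001, 0.00005]

def learning_rate(iteration: int) -> float:
    # bisect_right finds which segment the current iteration count falls into
    return LR_VALUES[bisect_right(MILESTONES, iteration)]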
In an optional embodiment of the present application, after obtaining the trained image registration model, the method further includes:
inputting an image pair to be registered into the trained image registration model, wherein the image pair to be registered comprises an infrared image and a visible light image in the same scene;
and outputting a registration image pair through the trained image registration model.
After the training of the image registration model is completed, the trained image registration model can be used for image registration processing. In the embodiment of the application, the image registration model is an end-to-end model, the infrared image and the visible light image in the same scene to be registered are used as an image pair to be input into the trained image registration model, and then the registered image pair can be output.
Further, after the image pair containing the infrared image and the visible light image of the same scene is registered through the image registration model, the obtained registered image pair can be used for image fusion processing, thereby improving the fusion effect. The application provides a neural-network-based method for fine registration of infrared and visible light images; when infrared and visible light images of the same scene are acquired, fine alignment between the images can be achieved, providing a registration performance guarantee for the subsequent scene image fusion task.
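A usage sketch for the inference stage is given below; the saved-model path, the single-channel 32 × 32 input tensors and the variable names are illustrative assumptions, and the estimated transformation is what would subsequently be used to warp and align the pair before fusion.

import torch

model = torch.load("registration_model.pt", map_location="cpu")   # example path to a trained image registration model
model.eval()

visible_tensor = torch.zeros(1, 1, 32, 32)        # placeholder visible light image block (real preprocessing not shown)
infrared_tensor = torch.zeros(1, 1, 32, 32)       # placeholder infrared image block
with torch.no_grad():
    h_pred = model(visible_tensor, infrared_tensor)   # estimated transformation between the image pair to be registered
# h_pred can then be used to warp the infrared image onto the visible light image,
# yielding the registered image pair that is passed to the image fusion task.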
In summary, the embodiment of the present application cuts the pre-collected registered image pair, and uses the cut target image block pair as training data for training the image registration model. Each image pair can be cut into a plurality of target image block pairs according to a preset size, and the target image block pairs are used as training data, so that each image pair can generate a plurality of training data, the data quantity of the training data can be increased, and the training accuracy is improved. After the training data and the label are obtained, the training data (target image block pairs) are sequentially input into the initial image registration model for training, and the trained image registration model is obtained. The image registration model is obtained through supervised training according to a large amount of accurate training data, and fine registration of the image pair of the infrared image and the visible light image can be realized through the trained image registration model. Compared with an image registration method based on artificial features, the image registration method based on the deep neural network model provided by the application forces the image registration model to learn the image features with high robustness and high consistency in the image pair through given training data with accurate alignment in a network fitting mode, is used for calculating the space transformation between the images, and improves the registration accuracy of the infrared image and the visible light image.
It should be noted that, in the method for training an image registration model provided in the embodiment of the present application, the execution subject may be an apparatus for training an image registration model, or a control module in the apparatus for training an image registration model, which is used for executing the method for training an image registration model. In the embodiment of the present application, a method for executing a training image registration model by using a device for training an image registration model is taken as an example, and the device for training an image registration model provided in the embodiment of the present application is described.
Referring to fig. 8, a schematic structural diagram of an embodiment of an apparatus for training an image registration model according to the present application is shown, the apparatus including:
a dataset acquisition module 801 for acquiring a dataset comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image in the same scene;
an image cropping module 802, configured to crop each image pair in the data set to obtain a target image block pair, where the target image block pair includes a first image block cropped from a visible light image and a second image block cropped from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
a matrix calculation module 803, configured to calculate a first transformation matrix corresponding to two image blocks in the target image block pair;
a data training module 804, configured to take the target image block pair as training data, sequentially input it into an initial image registration model to obtain a registration image pair of the target image block pair, and obtain a second transformation matrix corresponding to two image blocks in the registration image pair, where the image registration model is a deep neural network model;
a parameter adjusting module 805, configured to calculate a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and update a network parameter of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold, so as to obtain a trained image registration model.
Optionally, the image cropping module includes:
the first selection submodule is used for randomly determining a first selection frame in a first image in the image pair and acquiring first coordinates corresponding to four corner points of the first selection frame;
a second selection submodule for determining a second selection frame in a second image of the pair of images according to the first coordinate;
the coordinate offset submodule is used for randomly offsetting the four corner points of the second selection frame according to a preset offset to obtain second coordinates corresponding to the four corner points after offset;
a transformation calculation submodule for calculating a transformation matrix between the first coordinate and the second coordinate;
the third selection submodule is used for carrying out perspective transformation on the second selection frame in the second image according to the inverse matrix of the transformation matrix to obtain a third selection frame;
and the image cropping submodule is used for cropping the first selection frame in the first image and cropping the third selection frame in the second image to obtain a target image block pair.
Optionally, the first image is a visible light image in the image pair, and the second image is an infrared image in the image pair; or, the first image is an infrared image in the image pair, and the second image is a visible light image in the image pair.
Optionally, the image registration model is a deep neural network model including a depth feature extraction network, a mask prediction network, a channel cascade module, and a matrix estimation network, and the data training module includes:
the feature extraction sub-module is used for respectively inputting a first image block and a second image block in the target image block pair into the depth feature extraction network so as to extract a first depth feature of the first image block and a second depth feature of the second image block;
the mask prediction sub-module is used for respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network so as to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block;
the mask superposition submodule is used for weighting the first depth features by using the first mask to obtain a first feature map and weighting the second depth features by using the second mask to obtain a second feature map;
a cascade processing submodule, configured to input the first feature map and the second feature map into the channel cascade module, so as to obtain a registered image pair of the target image block pair;
and the matrix estimation submodule is used for inputting the registration image pair into the matrix estimation network so as to obtain a second transformation matrix corresponding to two image blocks in the registration image pair.
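The four sub-networks above can be pictured with the following PyTorch sketch of the forward pass; the channel counts, layer choices and single-channel (grayscale) inputs are assumptions for illustration, not the network design disclosed by this application.

```python
# Illustrative PyTorch sketch of the forward pass through the four sub-networks above.
# Channel counts, layer choices and single-channel (grayscale) inputs are assumptions.
import torch
import torch.nn as nn

class RegistrationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # depth feature extraction network (shared by both image blocks)
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # mask prediction network: one contribution estimation value per pixel, in [0, 1]
        self.mask = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
        # matrix estimation network: regresses eight corner offsets from the cascaded maps
        self.estimator = nn.Sequential(
            nn.AdaptiveAvgPool2d(16), nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(), nn.Linear(256, 8))

    def forward(self, block_a, block_b):
        feat_a = self.features(block_a) * self.mask(block_a)   # weight features with the mask
        feat_b = self.features(block_b) * self.mask(block_b)
        cascaded = torch.cat([feat_a, feat_b], dim=1)          # channel cascade module
        return self.estimator(cascaded)                        # second transformation (corner offsets)
```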
Optionally, the mask prediction sub-module is specifically configured to input the first image block and the second image block in the target image block pair into the mask prediction network respectively, so that the mask prediction network learns to generate a first mask equal in size to the first image block, in which a first contribution estimation value corresponding to each pixel of the first image block is labeled, and learns to generate a second mask equal in size to the second image block, in which a second contribution estimation value corresponding to each pixel of the second image block is labeled;
the mask superposition sub-module is specifically configured to weight the first depth feature with the first contribution estimation values labeled in the first mask to obtain a first feature map, and to weight the second depth feature with the second contribution estimation values labeled in the second mask to obtain a second feature map.
Optionally, the apparatus further comprises:
the first transformation module is used for performing equivalent transformation on the first transformation matrix according to the corner point coordinate offsets of the first image block and the second image block to obtain a first equivalent matrix;
the second transformation module is used for performing equivalent transformation estimation on the second transformation matrix according to the corner point coordinate offsets in the second transformation matrix to obtain a second equivalent matrix;
the parameter adjusting module is specifically configured to calculate a difference value between the first equivalent matrix and the second equivalent matrix according to a preset loss function.
Optionally, the loss function is determined according to an offset of each corner point in the first equivalent matrix and an offset of each corner point in the second equivalent matrix.
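The application only states that the loss depends on the corner-point offsets of the two equivalent matrices; one plausible (assumed) form is an L1 distance between the predicted and ground-truth offset vectors, as sketched below.

```python
# Assumed L1 form of the corner-offset loss; the application only states that the
# loss depends on the corner-point offsets of the two equivalent matrices.
import torch

def corner_loss(pred_offsets, gt_offsets):
    """pred_offsets, gt_offsets: (batch, 8) tensors of x/y offsets of the four corners."""
    return torch.mean(torch.abs(pred_offsets - gt_offsets))
```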
Optionally, the apparatus further comprises:
the data registration module is used for inputting an image pair to be registered into the trained image registration model, wherein the image pair to be registered comprises an infrared image and a visible light image in the same scene;
and the result output module is used for outputting a registration image pair through the trained image registration model.
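A hypothetical usage sketch of the trained model at inference time follows; the function name, tensor shapes, and the step of converting predicted corner offsets into a warp are assumptions rather than the application's exact interface.

```python
# Hypothetical inference usage; the function name, tensor shapes and the step of turning
# predicted corner offsets into a warp are assumptions, not the application's exact API.
import cv2
import numpy as np
import torch

def register_pair(model, ir_image, vis_image, corners):
    """ir_image, vis_image: (H, W) float32 arrays; corners: (4, 2) float32 image corners."""
    to_tensor = lambda a: torch.from_numpy(a).float()[None, None]   # shape (1, 1, H, W)
    model.eval()
    with torch.no_grad():
        offsets = model(to_tensor(ir_image), to_tensor(vis_image)).numpy().reshape(4, 2)
    H = cv2.getPerspectiveTransform(corners, corners + offsets)     # predicted transformation
    h, w = ir_image.shape
    registered_ir = cv2.warpPerspective(ir_image, H, (w, h))        # warp infrared onto visible
    return registered_ir, vis_image                                 # registered image pair
```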
The device for training the image registration model provided by the embodiment of the application crops pre-collected registered image pairs and uses the resulting target image block pairs as training data for the image registration model. Because each image pair can be cropped into a plurality of target image block pairs of a preset size, every image pair yields multiple training samples, which increases the amount of training data and improves training accuracy. After the training data and labels are obtained, the training data (target image block pairs) are sequentially input into the initial image registration model for training, yielding the trained image registration model. The image registration model is obtained through supervised training on a large amount of accurate training data, and the trained model can achieve fine registration of infrared and visible light image pairs. Compared with image registration methods based on hand-crafted features, the deep-neural-network-based method provided by the application uses accurately aligned training data and network fitting to force the image registration model to learn highly robust and consistent image features in the image pair, which are then used to compute the spatial transformation between the images, improving the registration accuracy of the infrared and visible light images.
The device for training the image registration model in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, or a self-service machine; the embodiments of the present application are not specifically limited in this respect.
The apparatus for training the image registration model in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited in this respect.
The device for training the image registration model provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and is not described here again to avoid repetition.
Optionally, as shown in fig. 9, an electronic device 900 is further provided in this embodiment of the present application, and includes a processor 901, a memory 902, and a program or an instruction stored in the memory 902 and executable on the processor 901, where the program or the instruction is executed by the processor 901 to implement each process of the above embodiment of the method for training an image registration model, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application. The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power-consumption management are implemented through the power management system. The electronic device structure shown in fig. 10 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange components differently, which is not described in detail here.
Wherein the processor 1010 is configured to acquire a data set comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image of the same scene; cropping each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cropped from a visible light image and a second image block cropped from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset; calculating a first transformation matrix corresponding to the two image blocks in the target image block pair; taking the target image block pair as training data, and sequentially inputting the target image block pair into an initial image registration model to obtain a registration image pair of the target image block pair and obtain a second transformation matrix corresponding to the two image blocks in the registration image pair, wherein the image registration model is a deep neural network model; and calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating the network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
According to the image registration method based on the deep neural network model, accurately aligned training data and network fitting force the image registration model to learn highly robust and consistent image features in the image pair; these features are then used to compute the spatial transformation between the images, improving the registration accuracy of the infrared and visible light images.
Optionally, the processor 1010 is further configured to randomly determine a first selection frame in a first image of the image pair, and obtain first coordinates corresponding to four corner points of the first selection frame; determining a second selection frame in a second image of the pair of images according to the first coordinate; randomly offsetting four corner points of the second selection frame according to a preset offset to obtain second coordinates corresponding to the four offset corner points; calculating a transformation matrix between the first coordinate and the second coordinate; performing perspective transformation on a second selection frame in the second image according to the inverse matrix of the transformation matrix to obtain a third selection frame; and cutting a first selection frame in the first image and cutting a third selection frame in the second image to obtain a target image block pair.
Because only a limited amount of training data is usually available for training the registration model, the embodiment of the application crops the pre-collected registered image pairs and uses the cropped target image block pairs as training data for the image registration model. Each image pair can be cropped into a plurality of target image block pairs of a preset size, so every image pair yields multiple training samples, which increases the amount of training data and improves training accuracy.
Optionally, the processor 1010 is further configured to input a first image block and a second image block in the target image block pair into the depth feature extraction network, respectively, so as to extract a first depth feature of the first image block and a second depth feature of the second image block; respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block; weighting the first depth features by using the first mask to obtain a first feature map, and weighting the second depth features by using the second mask to obtain a second feature map; inputting the first feature map and the second feature map into the channel cascade module to obtain a registered image pair of the target image block pair; and inputting the registration image pair into the matrix estimation network to obtain a second transformation matrix corresponding to two image blocks in the registration image pair.
The image registration model is a deep neural network model obtained through supervised training on a large amount of accurate training data, and the trained model can achieve fine registration of infrared and visible light image pairs. Compared with image registration methods based on hand-crafted features, the deep-neural-network-based method provided by the application uses accurately aligned training data and network fitting to force the image registration model to learn highly robust and consistent image features in the image pair, which are then used to compute the spatial transformation between the images, improving the registration accuracy of the infrared and visible light images.
Optionally, the processor 1010 is further configured to perform equivalent transformation on the first transformation matrix according to the offset of the corner coordinate of the first image block and the second image block, so as to obtain a first equivalent matrix; performing equivalent transformation estimation on the second transformation matrix according to the angular point coordinate offset in the second transformation matrix to obtain a second equivalent matrix; and calculating a difference value between the first equivalent matrix and the second equivalent matrix according to a preset loss function.
The embodiment of the application carries out equivalent transformation processing on the first transformation matrix and the second transformation matrix respectively to solve the problem of different regression parameter dimensions, thereby reducing the complexity of network training and improving the network training effect.
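The equivalent transformation can be understood as switching between a 3x3 transformation matrix and its four corner-point offsets (8 values), so that both the label and the prediction are regressed in the same dimensionality; the conversion sketch below, using OpenCV, is an assumed realization rather than the application's exact procedure.

```python
# Assumed realization of the "equivalent transformation": switching between a 3x3
# transformation matrix and its four corner-point offsets (8 values), so that both
# the label and the prediction are regressed in the same dimensionality.
import cv2
import numpy as np

def homography_to_offsets(H, corners):
    """corners: (4, 2) float32 array of an image block's corner coordinates."""
    warped = cv2.perspectiveTransform(corners.reshape(1, 4, 2), H).reshape(4, 2)
    return (warped - corners).flatten()              # equivalent matrix as 8 corner offsets

def offsets_to_homography(offsets, corners):
    warped = corners + offsets.reshape(4, 2).astype(np.float32)
    return cv2.getPerspectiveTransform(corners, warped)
```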
Optionally, the processor 1010 is further configured to input an image pair to be registered into the trained image registration model, where the image pair to be registered includes an infrared image and a visible light image in the same scene; and outputting a registration image pair through the trained image registration model.
The image registration model is an end-to-end model: an infrared image and a visible light image of the same scene to be registered are input as an image pair into the trained image registration model, which then outputs the registered image pair.
It should be understood that in the embodiment of the present application, the input Unit 1004 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the Graphics Processing Unit 1041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. Processor 1010 may integrate an application processor that handles primarily operating systems, user interfaces, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above method for training an image registration model, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above method for training an image registration model, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a chip system, or a system-on-a-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. A method of training an image registration model, the method comprising:
acquiring a dataset comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image in the same scene;
cutting each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cut from a visible light image and a second image block cut from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
calculating first transformation matrixes corresponding to two image blocks in the target image block pair;
taking the target image block pair as training data, and sequentially inputting the target image block pair into an initial image registration model to obtain a registration image pair of the target image block pair and obtain second transformation matrixes corresponding to two image blocks in the registration image pair, wherein the image registration model is a deep neural network model;
and calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating the network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
2. The method of claim 1, wherein said cropping each pair of images in the dataset to obtain a pair of target image blocks comprises:
randomly determining a first selection frame in a first image in the image pair, and acquiring first coordinates corresponding to four corner points of the first selection frame;
determining a second selection frame in a second image of the pair of images according to the first coordinate;
randomly offsetting four corner points of the second selection frame according to a preset offset to obtain second coordinates corresponding to the four offset corner points;
calculating a transformation matrix between the first coordinate and the second coordinate;
performing perspective transformation on a second selection frame in the second image according to the inverse matrix of the transformation matrix to obtain a third selection frame;
and cutting a first selection frame in the first image and cutting a third selection frame in the second image to obtain a target image block pair.
3. The method of claim 2, wherein the first image is a visible light image of the pair of images and the second image is an infrared image of the pair of images; or, the first image is an infrared image in the image pair, and the second image is a visible light image in the image pair.
4. The method according to claim 1, wherein the image registration model is a deep neural network model including a depth feature extraction network, a mask prediction network, a channel cascade module, and a matrix estimation network, and the obtaining of the registration image pair of the target image block pair and the second transformation matrix corresponding to two image blocks in the registration image pair by using the target image block pair as training data and sequentially inputting the target image block pair into the initial image registration model comprises:
inputting a first image block and a second image block in the target image block pair into the depth feature extraction network respectively so as to extract a first depth feature of the first image block and a second depth feature of the second image block;
respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block;
weighting the first depth features by using the first mask to obtain a first feature map, and weighting the second depth features by using the second mask to obtain a second feature map;
inputting the first feature map and the second feature map into the channel cascade module to obtain a registered image pair of the target image block pair;
and inputting the registration image pair into the matrix estimation network to obtain a second transformation matrix corresponding to two image blocks in the registration image pair.
5. The method according to claim 4, wherein the inputting a first image block and a second image block in the target image block pair into the mask prediction network respectively to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block comprises:
respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network, so that the mask prediction network learns to generate a first mask equal in size to the first image block, in which a first contribution estimation value corresponding to each pixel of the first image block is labeled, and learns to generate a second mask equal in size to the second image block, in which a second contribution estimation value corresponding to each pixel of the second image block is labeled;
the weighting the first depth feature by using the first mask to obtain a first feature map, and weighting the second depth feature by using the second mask to obtain a second feature map, including:
weighting the first depth feature to obtain a first feature map by using a first contribution estimation value corresponding to each pixel in the first image block labeled in the first mask, and weighting the second depth feature to obtain a second feature map by using a second contribution estimation value corresponding to each pixel in the second image block labeled in the second mask.
6. The method according to claim 1, wherein after calculating the first transformation matrix corresponding to two image blocks in the target image block pair, further comprising:
performing equivalent transformation on the first transformation matrix according to the corner point coordinate offsets of the first image block and the second image block to obtain a first equivalent matrix;
after obtaining the second transformation matrices corresponding to the two image blocks in the registered image pair, the method further includes:
performing equivalent transformation estimation on the second transformation matrix according to the corner point coordinate offsets in the second transformation matrix to obtain a second equivalent matrix;
the calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function includes:
and calculating a difference value between the first equivalent matrix and the second equivalent matrix according to a preset loss function.
7. The method according to claim 6, characterized in that the loss function is determined from the offset of each corner point in the first equivalent matrix and the offset of each corner point in the second equivalent matrix.
8. The method of claim 1, wherein after obtaining the trained image registration model, further comprising:
inputting an image pair to be registered into the trained image registration model, wherein the image pair to be registered comprises an infrared image and a visible light image in the same scene;
and outputting a registration image pair through the trained image registration model.
9. An apparatus for training an image registration model, the apparatus comprising:
a dataset acquisition module for acquiring a dataset comprising registered image pairs, wherein each image pair comprises a visible light image and an infrared image in the same scene;
the image cropping module is used for cropping each image pair in the data set to obtain a target image block pair, wherein the target image block pair comprises a first image block cropped from a visible light image and a second image block cropped from an infrared image, and the first image block and the second image block correspond to the same position in the image pair and have a preset random offset;
the matrix calculation module is used for calculating first transformation matrixes corresponding to two image blocks in the target image block pair;
the data training module is used for taking the target image block pair as training data, and sequentially inputting the target image block pair into an initial image registration model to obtain a registration image pair of the target image block pair and obtain second transformation matrixes corresponding to two image blocks in the registration image pair, wherein the image registration model is a deep neural network model;
and the parameter adjusting module is used for calculating a difference value between the first transformation matrix and the second transformation matrix according to a preset loss function, and updating the network parameters of the image registration model according to the calculated difference value until the calculated difference value is smaller than a preset threshold value, so as to obtain the trained image registration model.
10. The apparatus of claim 9, wherein the image cropping module comprises:
the first selection submodule is used for randomly determining a first selection frame in a first image in the image pair and acquiring first coordinates corresponding to four corner points of the first selection frame;
a second selection submodule for determining a second selection frame in a second image of the pair of images according to the first coordinate;
the coordinate offset submodule is used for randomly offsetting the four corner points of the second selection frame according to a preset offset to obtain second coordinates corresponding to the four corner points after offset;
a transformation calculation submodule for calculating a transformation matrix between the first coordinate and the second coordinate;
the third selection submodule is used for carrying out perspective transformation on the second selection frame in the second image according to the inverse matrix of the transformation matrix to obtain a third selection frame;
and the image cropping submodule is used for cropping the first selection frame from the first image and cropping the third selection frame from the second image to obtain a target image block pair.
11. The apparatus of claim 10, wherein the first image is a visible light image of the pair of images and the second image is an infrared image of the pair of images; or, the first image is an infrared image in the image pair, and the second image is a visible light image in the image pair.
12. The apparatus of claim 9, wherein the image registration model is a deep neural network model comprising a deep feature extraction network, a mask prediction network, a channel cascade module, and a matrix estimation network, and the data training module comprises:
the feature extraction sub-module is used for respectively inputting a first image block and a second image block in the target image block pair into the depth feature extraction network so as to extract a first depth feature of the first image block and a second depth feature of the second image block;
the mask prediction sub-module is used for respectively inputting a first image block and a second image block in the target image block pair into the mask prediction network so as to obtain a first mask corresponding to the first image block and a second mask corresponding to the second image block;
the mask superposition submodule is used for weighting the first depth features by using the first mask to obtain a first feature map and weighting the second depth features by using the second mask to obtain a second feature map;
a cascade processing submodule, configured to input the first feature map and the second feature map into the channel cascade module, so as to obtain a registered image pair of the target image block pair;
and the matrix estimation submodule is used for inputting the registration image pair into the matrix estimation network so as to obtain a second transformation matrix corresponding to two image blocks in the registration image pair.
13. The apparatus according to claim 12, wherein the mask prediction sub-module is specifically configured to input the first image block and the second image block in the target image block pair into the mask prediction network respectively, so that the mask prediction network learns to generate a first mask equal in size to the first image block, in which a first contribution estimation value corresponding to each pixel of the first image block is labeled, and learns to generate a second mask equal in size to the second image block, in which a second contribution estimation value corresponding to each pixel of the second image block is labeled;
the mask superposition sub-module is specifically configured to weight the first depth feature with the first contribution estimation values labeled in the first mask to obtain a first feature map, and to weight the second depth feature with the second contribution estimation values labeled in the second mask to obtain a second feature map.
14. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the method of training an image registration model according to any one of claims 1 to 8.
CN202011541901.3A 2020-12-23 2020-12-23 Method and device for training image registration model and electronic equipment Withdrawn CN112561973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011541901.3A CN112561973A (en) 2020-12-23 2020-12-23 Method and device for training image registration model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011541901.3A CN112561973A (en) 2020-12-23 2020-12-23 Method and device for training image registration model and electronic equipment

Publications (1)

Publication Number Publication Date
CN112561973A true CN112561973A (en) 2021-03-26

Family

ID=75031758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011541901.3A Withdrawn CN112561973A (en) 2020-12-23 2020-12-23 Method and device for training image registration model and electronic equipment

Country Status (1)

Country Link
CN (1) CN112561973A (en)


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
龙勇志 (Long Yongzhi): "Research on Registration and Fusion Algorithms for Infrared and Visible Light Images", China Excellent Master's Theses Full-text Database, Information Science and Technology Series, no. 07 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706450A (en) * 2021-05-18 2021-11-26 腾讯科技(深圳)有限公司 Image registration method, device, equipment and readable storage medium
CN113596341A (en) * 2021-06-11 2021-11-02 北京迈格威科技有限公司 Image shooting method, image processing device and electronic equipment
CN113596341B (en) * 2021-06-11 2024-04-05 北京迈格威科技有限公司 Image shooting method, image processing device and electronic equipment
CN113487656A (en) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Image registration method and device, training method and device, control method and device
CN114419869A (en) * 2022-03-30 2022-04-29 北京启醒科技有限公司 Urban disaster early warning method and system based on time sequence multi-dimensional prediction
CN114419869B (en) * 2022-03-30 2022-07-26 北京启醒科技有限公司 Urban disaster early warning method and system based on time sequence multi-dimensional prediction

Similar Documents

Publication Publication Date Title
CN112561973A (en) Method and device for training image registration model and electronic equipment
JP2022534337A (en) Video target tracking method and apparatus, computer apparatus, program
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
Yang et al. A multi-task Faster R-CNN method for 3D vehicle detection based on a single image
CN112597941A (en) Face recognition method and device and electronic equipment
WO2022179581A1 (en) Image processing method and related device
CN112561846A (en) Method and device for training image fusion model and electronic equipment
CN107194948B (en) Video significance detection method based on integrated prediction and time-space domain propagation
WO2022052782A1 (en) Image processing method and related device
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113160231A (en) Sample generation method, sample generation device and electronic equipment
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN116977674A (en) Image matching method, related device, storage medium and program product
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
CN111222459A (en) Visual angle-independent video three-dimensional human body posture identification method
CN116580151A (en) Human body three-dimensional model construction method, electronic equipment and storage medium
CN114565777A (en) Data processing method and device
CN113537359A (en) Training data generation method and device, computer readable medium and electronic equipment
Álvarez et al. A new marker design for a robust marker tracking system against occlusions
CN113112522A (en) Twin network target tracking method based on deformable convolution and template updating
Xue et al. An end-to-end multi-resolution feature fusion defogging network
Wang et al. Learning to remove reflections from windshield images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210326