CN113516697A - Image registration method and device, electronic equipment and computer-readable storage medium

Info

Publication number: CN113516697A
Application number: CN202110813026.8A
Authority: CN
Inventors: 李盼盼; 秦勇
Original assignee: Beijing Century TAL Education Technology Co Ltd
Current assignee: Beijing Century TAL Education Technology Co Ltd
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2021-10-19
Anticipated expiration: 2041-07-19
Also published as: CN113516697B

Abstract

The present disclosure provides a method, an apparatus, an electronic device, and a computer-readable storage medium for image registration. The method includes: acquiring a first image and a second image to be registered; inputting the first image and the second image into a homography matrix acquisition network, the homography matrix acquisition network outputting a first homography matrix of the mapping relation between the first image and the second image; and registering the first image and the second image according to the first homography matrix. The homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as input and with a plurality of second homography matrices of the mapping relations between the plurality of third images and the plurality of fourth images as labels. Registration of multiple images is thereby achieved.

Description

Image registration method and device, electronic equipment and computer-readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for image registration, an electronic device, and a computer-readable storage medium.

Background

Image registration and its related technologies are a research hotspot and a difficult problem in the field of image processing. Their purpose is to compare and fuse images of the same object acquired under different conditions (different times, illumination, shooting angles, and the like). Specifically, for two images to be registered, a spatial transformation is obtained through a series of operations and one image is mapped onto the other, so that points at the same spatial position in the two images correspond one to one. The technology is widely applied in fields such as target detection, model reconstruction, motion estimation, feature matching, tumor detection, lesion localization, angiography, geological exploration, and aerial reconnaissance.

Disclosure of Invention

According to an aspect of the present disclosure, there is provided a method of image registration, including:

acquiring a first image and a second image to be registered;

inputting the first image and the second image into a homography matrix acquisition network, and outputting a first homography matrix of the mapping relation of the first image and the second image by the homography matrix acquisition network;

registering the first image and the second image according to the first homography matrix;

the homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as input and with a plurality of second homography matrices of the mapping relations between the plurality of third images and the plurality of fourth images as labels.

According to another aspect of the present disclosure, there is provided an apparatus for image registration, including:

the first acquisition module is used for acquiring a first image and a second image to be registered;

the first processing module is used for inputting the first image and the second image into a homography matrix acquisition network, and the homography matrix acquisition network outputs a first homography matrix of the mapping relation of the first image and the second image;

a registration module, configured to register the first image and the second image according to the first homography matrix.

According to another aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory storing a program,

wherein the program comprises instructions which, when executed by the processor, cause the processor to perform a method of image registration according to any of the above.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of image registration according to any one of the above.

According to one or more technical solutions provided in the embodiments of the present application, registration of a plurality of images can be achieved.

Drawings

Fig. 1 shows a flow chart of an image registration method according to an exemplary embodiment of the present disclosure;

Fig. 2 shows a schematic block diagram of an image registration apparatus according to an exemplary embodiment of the present disclosure;

Fig. 3 shows a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Although image registration has been studied extensively in the prior art, a unified, objective standard for evaluating results is hard to establish, and deep learning has struggled to deliver large performance gains because suitable data sets are difficult to design and obtain. As a result, a large number of registration methods are designed for one specific problem, and general-purpose methods are rare; the main general-purpose approach computes a homography matrix from matched image feature points.

Image feature points are typical, representative pixel points in an image. Their application scenarios mainly include camera calibration, image stitching, dense reconstruction, and scene understanding; scene understanding commonly uses the bag-of-words method, in which a bag of keywords is generated around the feature points to identify the scene. Image feature points play an important role in many computer vision tasks. At present there are three main ways of acquiring them: first, manually designed feature detection algorithms, such as the well-known SIFT, SURF, ORB, FAST, and HOG; second, methods based on deep learning; and third, artificial marker points placed in the scene. Although current deep-learning-based methods can achieve better results, manually designed algorithms still perform well, run faster, are easy to put into practice, and are widely used in industry; the SIFT algorithm in particular has remained in use ever since it was proposed. For a pixel point to qualify as an image feature point, it must meet two basic requirements: distinctiveness and repeatability. Distinctiveness means that the pixel point stands out from its surrounding points with a marked gray-level change, as with corner points and edge points. Repeatability means that the same feature appears repeatedly across different viewing angles and is invariant to rotation, illumination, and scale.

At present, photo-based grading works fairly well for primary school mathematics calculation questions whose logic can be checked automatically, and a large number of applications have appeared. However, these applications cannot meet the need to grade all question types in primary school mathematics: they handle only some conventional calculation types, such as horizontal, vertical, and step-by-step calculations, and cannot handle common question types that carry semantic information, such as multiple-choice, fill-in-the-blank, and true-or-false questions. Such question types are very common in primary school mathematics homework and are numerous, especially in grades four, five, and six; supporting them would greatly expand the scope of photo-based grading and better meet user needs. Despite this strong demand, no method yet grades these question types well. For questions whose logic cannot be checked automatically, grading is generally based on a question bank of images. In this approach, matching the answer areas accurately is critical, and although various approaches exist, registration can achieve a better result.

At present, a relatively general approach to image registration matches image feature points and then computes the homography matrix of the two images to be registered with a method such as random sample consensus (RANSAC). However, images of existing primary school mathematics exercise books suffer from many problems, such as show-through of text (content on one side of a sheet affecting the other side), uneven photocopying, and improper shooting angles, which arise from writing habits, the shooting scene after the homework is completed, and other causes. Moreover, grading all question types in primary school mathematics requires building a question bank of images, and how the correct answers provided in that bank are matched to the corresponding questions in the image to be graded greatly affects the accuracy of photo-based grading. Mapping the answers with an image registration method can work well, but it depends heavily on registration quality, and existing general image registration methods perform poorly on text images, particularly text images with complicated variations. This limits improvement in the accuracy of photo-based grading.

In this embodiment, an image registration method is provided, which may be used in a smart device such as a mobile phone or a tablet computer. Fig. 1 shows a flowchart of the image registration method according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the flow includes the following steps:

Step S101, acquiring a first image and a second image to be registered. The first image and the second image may come from many sources. For example, when student homework is graded, the first image may be the image to be graded and the second image may be the image of the correct answers in the question bank. Those skilled in the art should understand that the sources of the first image and the second image are not limited to this example; other images may be selected according to actual requirements and remain within the scope of this embodiment.

Step S102, inputting the first image and the second image into a homography matrix acquisition network, the homography matrix acquisition network outputting a first homography matrix of the mapping relation between the first image and the second image. Specifically, the homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as input and with a plurality of second homography matrices, each describing the mapping relation between a third image and a fourth image, as labels. By learning from sample images, the network directly infers the homography matrix between the two images to be registered, so image registration is realized with a deep learning method.

Step S103, registering the first image and the second image according to the first homography matrix.
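The mapping performed in step S103 can be illustrated with a minimal sketch. It is not taken from the disclosure: it assumes the first homography matrix is available as a 3 × 3 nested list, and the names `apply_homography`, `map_corners`, and `H_SHIFT` are hypothetical. A pixel is mapped by multiplying its homogeneous coordinates by the matrix and dividing by the third component.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map pixel (x, y) through the 3x3 homography h using homogeneous coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return xh / w, yh / w


def map_corners(h: Matrix3, width: int, height: int) -> List[Tuple[float, float]]:
    """Map the four corners of a width x height image into the other image's frame."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return [apply_homography(h, x, y) for x, y in corners]


# Hypothetical homography for illustration: a pure translation by (5, -3) pixels.
H_SHIFT = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
```

In the homework example, such a mapping could carry the coordinates of an answer area in the question-bank image to the corresponding position in the image to be graded.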

In the prior art, manually designed feature extractors such as SIFT, SURF, and ORB usually describe a detected feature point using statistics of the pixel values in its neighborhood. In a sense, they can describe only low-level features such as edges and contours, not high-level features with stronger representational power, such as semantics. This embodiment instead realizes image registration with a deep learning method: the homography matrix between the two images to be registered is inferred by learning from sample images, and registration is completed based on that homography matrix. This makes registration more accurate and faster, and it can improve the accuracy of photo-based grading when the image to be graded is registered with the image of correct answers in the question bank.

The third images and fourth images used to train the homography matrix acquisition network can be obtained in various ways. In some optional embodiments, before the homography matrix acquisition network is trained, the fourth image is obtained with a variational auto-encoder (VAE). The VAE model is an important generative model composed of two parts, an encoder and a decoder. It usually takes a lower bound on the log-likelihood as its optimization objective, so its loss function generally consists of two parts, a reconstruction loss and a cross-entropy loss. The VAE model encodes its input with the encoder and feeds the encoded result to the decoder to restore the input; in most cases the restored picture is very similar to the original. Compared with other generative models, a VAE model trains more stably and faster. The third image is input to the VAE model, and the VAE model outputs the fourth image. The VAE model is trained with a plurality of fifth images as input and a plurality of sixth images as labels, where each sixth image is obtained by applying noise processing to a fifth image. The noise processing may be, for example, randomly adding scratches, photocopy artifacts, show-through, uneven illumination, and the like. As a specific example, a large number of normal text images (without scratches and the like) are collected as the fifth images. To reduce the amount of computation, a portion of these normal text images is copied, and scratches, photocopy artifacts, show-through, uneven illumination, and the like are added to the copies manually and at random to produce the sixth images. A VAE model is then trained with the normal text images as input and the manually processed images as labels, so that inputting a normal text image yields a randomly degraded text image. With the VAE model, this optional embodiment reduces the workload of manually producing abnormal samples.
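The noise processing that turns a fifth image into a sixth image can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not specify how scratches or uneven illumination are synthesized, so the horizontal scratch, the left-to-right brightness ramp, and the names `add_scratch`, `add_uneven_illumination`, and `make_noisy_label` are all hypothetical. Images are represented as nested lists of gray values in 0..255.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, values in 0..255


def add_scratch(img: Image, rng: random.Random, value: int = 0) -> Image:
    """Return a copy of img with one random horizontal scratch drawn on it."""
    out = [row[:] for row in img]
    height, width = len(out), len(out[0])
    y = rng.randrange(height)
    x0 = rng.randrange(width)
    x1 = rng.randrange(x0, width)
    for x in range(x0, x1 + 1):
        out[y][x] = value
    return out


def add_uneven_illumination(img: Image, strength: float = 0.5) -> Image:
    """Return a copy of img darkened progressively from left to right."""
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x, v in enumerate(row):
            # Scale factor falls linearly from 1.0 (left) to 1.0 - strength (right).
            factor = 1.0 - strength * (x / max(width - 1, 1))
            new_row.append(max(0, min(255, round(v * factor))))
        out.append(new_row)
    return out


def make_noisy_label(clean: Image, rng: random.Random) -> Image:
    """Produce a degraded (sixth) image from a clean (fifth) image."""
    return add_uneven_illumination(add_scratch(clean, rng))
```

Pairs of `clean` and `make_noisy_label(clean, rng)` would then play the roles of input and label when training the VAE model.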

In yet another optional embodiment, the third image and the fourth image are obtained as follows: a seventh image is obtained and input to the VAE model, and the VAE model outputs an eighth image; a third homography matrix is randomly generated, and the eighth image is transformed with the third homography matrix to obtain a ninth image; the seventh image is used as the third image, the ninth image as the fourth image, and the third homography matrix as the second homography matrix. This optional embodiment uses the VAE model to reduce the workload of manually producing abnormal samples and, by fixing the homography matrix (the label) first and then deriving the input image, effectively alleviates the difficulty of collecting registration data for text images.
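The label-first construction of a training sample can be sketched as follows. The sketch rests on assumptions not stated in the disclosure: the random homography is drawn as a small perturbation of the identity with its last entry fixed to 1, warping uses inverse mapping with nearest-neighbour sampling, and the trained VAE model is replaced by a caller-supplied `degrade` function. All function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[int]]
Matrix3 = List[List[float]]


def random_homography(rng: random.Random, max_shift: float = 2.0,
                      max_persp: float = 1e-3) -> Matrix3:
    """Draw a mild random homography close to the identity (last entry fixed to 1)."""
    u = rng.uniform
    return [
        [1.0 + u(-0.05, 0.05), u(-0.05, 0.05), u(-max_shift, max_shift)],
        [u(-0.05, 0.05), 1.0 + u(-0.05, 0.05), u(-max_shift, max_shift)],
        [u(-max_persp, max_persp), u(-max_persp, max_persp), 1.0],
    ]


def invert3(m: Matrix3) -> Matrix3:
    """Invert a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def warp(img: Image, hmat: Matrix3, fill: int = 255) -> Image:
    """Warp img by hmat using inverse mapping with nearest-neighbour sampling."""
    height, width = len(img), len(img[0])
    inv = invert3(hmat)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            w = inv[2][0] * x + inv[2][1] * y + inv[2][2]
            if abs(w) < 1e-12:
                continue
            sx = round((inv[0][0] * x + inv[0][1] * y + inv[0][2]) / w)
            sy = round((inv[1][0] * x + inv[1][1] * y + inv[1][2]) / w)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = img[sy][sx]
    return out


def make_training_sample(seventh: Image, degrade: Callable[[Image], Image],
                         rng: random.Random) -> Tuple[Image, Image, Matrix3]:
    """Return (third image, fourth image, second homography matrix)."""
    eighth = degrade(seventh)         # stand-in for the trained VAE model
    third_h = random_homography(rng)  # the label is chosen first
    ninth = warp(eighth, third_h)     # then the input image is derived from it
    return seventh, ninth, third_h
```

Because the homography is generated before the image pair, every sample comes with an exact label and no manual annotation of point correspondences is needed.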

To measure the accuracy of the first homography matrix, the homography matrix acquisition network further includes a branch network. The branch network is trained with the plurality of third images and the plurality of fourth images as input and with a plurality of registration probability values of the third images and the fourth images as labels. The first image and the second image are input to the branch network, which outputs a registration probability value for the first image and the second image, and the two images are registered according to the first homography matrix and the registration probability value. The probability that the two images can be registered thus serves as a measure of the accuracy of the homography matrix. Image registration means that, for two images with the same content (the same scene) shot from different angles, a homography matrix is obtained by some method (such as feature point matching), and the homography matrix can be used to transform a pixel point of one image to the corresponding position in the other image.
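One possible way of using the registration probability value together with the first homography matrix is sketched below. The disclosure does not say how the two outputs are combined, so the threshold of 0.5, the convention that index 1 of the branch output means "can be registered", and the names `softmax2` and `accept_registration` are assumptions made for illustration only.

```python
import math
from typing import Optional, Sequence, Tuple


def softmax2(logits: Sequence[float]) -> Tuple[float, float]:
    """Numerically stable softmax over two logits."""
    m = max(logits)
    e0, e1 = math.exp(logits[0] - m), math.exp(logits[1] - m)
    total = e0 + e1
    return e0 / total, e1 / total


def accept_registration(homography: Sequence[Sequence[float]],
                        branch_logits: Sequence[float],
                        threshold: float = 0.5) -> Optional[Sequence[Sequence[float]]]:
    """Return the homography only if the branch network deems the pair registrable.

    Assumption: index 1 of branch_logits is the 'can be registered' class.
    """
    _, p_registrable = softmax2(branch_logits)
    return homography if p_registrable >= threshold else None
```

A caller would register the images only when `accept_registration` returns a matrix and treat `None` as a signal that the predicted homography should not be trusted.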

In some optional embodiments, the homography matrix acquisition network includes a plurality of convolutional layers and a plurality of fully connected layers connected in sequence; for consistency of description, the homography matrix acquisition network here corresponds to the first branch described below. The branch network of the homography matrix acquisition network likewise includes a plurality of convolutional layers and a plurality of fully connected layers connected in sequence and, for consistency of description, corresponds to the second branch described below. Specifically, the homography matrix acquisition network is constructed as follows. This embodiment uses and improves the common ResNet18 network, which is built from 4 blocks connected in series, each containing several layers of convolution operations. The feature maps output by the first block are 1/4 of the original image size, those of the second block 1/8, those of the third block 1/16, and those of the fourth block 1/32. Each block outputs 128 feature maps. All 4 groups of feature maps are resized by interpolation to 1/4 of the original image size and concatenated into one group of feature maps with 512 channels. From this point the network splits into two branches. The first branch applies five convolution operations to the 512-channel feature maps, finally producing 32 feature maps at 1/128 of the original image size. These are followed by a fully connected layer with 32 × H × W/128 nodes (where H and W denote the height and width of the image) and then by a fully connected layer with 8 nodes, which represents the homography matrix of the mapping relation between the two input images.
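The shape bookkeeping of the feature fusion, and one way of reading the 8-node output, can be sketched as follows. The sketch makes two assumptions that the disclosure does not state: image sizes are divisible by 32, and the 8 outputs are the first eight entries of a 3 × 3 homography matrix whose ninth entry is fixed to 1 (other parameterizations are possible). The function names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def block_output_shapes(height: int, width: int, channels: int = 128) -> List[Shape]:
    """Shapes of the four block outputs at 1/4, 1/8, 1/16 and 1/32 scale."""
    return [(channels, height // s, width // s) for s in (4, 8, 16, 32)]


def fused_shape(shapes: List[Shape]) -> Shape:
    """Shape after resizing every group to the first group's size and concatenating channels."""
    _, h, w = shapes[0]
    return (sum(c for c, _, _ in shapes), h, w)


def vector_to_homography(params: List[float]) -> List[List[float]]:
    """Arrange 8 outputs into a 3x3 matrix, assuming the ninth entry is fixed to 1."""
    if len(params) != 8:
        raise ValueError("expected 8 parameters")
    p = list(params) + [1.0]
    return [p[0:3], p[3:6], p[6:9]]
```

For a 256 × 512 input, the four groups of 128 feature maps are fused into a single group of 512 channels at 64 × 128, matching the 512-channel feature maps that feed both branches.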

The second branch applies six convolution operations to the 512-channel feature maps, finally producing 10 feature maps at 1/256 of the original image size. These are followed by a fully connected layer with 10 × H × W/256 nodes (where H and W denote the height and width of the original image) and then by a fully connected layer with 2 nodes, which indicates the probability that the two images can be registered. In the branch network of the homography matrix acquisition network, each convolutional layer convolves its input channel feature maps to obtain output channel feature maps with fewer channels than the input; reducing the number of channels reduces the amount of computation and increases processing speed. After the convolution, a window of a preset size is selected, the channel elements at the corresponding positions of a preset number of pixel points within the window are randomly interchanged and rearranged, and convolution is applied again to obtain feature maps with a specified number of channels. The preset size decreases as the number of convolution operations increases, which further reduces computation and increases processing speed. The feature maps with the specified number of channels are reshaped and input to a fully connected layer to obtain the re-determined feature maps. This structure of the branch network increases feature fusion and effectively combines the local and global features of the two images to be registered, which improves the accuracy of the probability value indicating whether the two images can be registered.

More specifically, the second branch first applies a convolution (kernel size 3 × 3) to the input 512-channel feature maps to obtain a 256-channel output and then, using an 8 × 8 window, randomly interchanges and rearranges all channel elements at the corresponding positions of 4 pixels within the window. A second 3 × 3 convolution then gives a 256-channel output, followed by the same operation with a 6 × 6 window. Similarly, the third convolution gives a 128-channel output and uses a 4 × 4 window, and the fourth convolution gives a 128-channel output and uses a 2 × 2 window. The fifth convolution gives a 64-channel output, at which point the feature map size is 8 × 8; the 64 channel values corresponding to each pixel are then rearranged to 8 × 8 (that is, 1 × 64 becomes 8 × 8). A sixth convolution then gives a 10-channel output, which is reshaped and connected to the fully connected layer. Compared with a conventional network design, this design effectively combines the local and global features of the two images to be registered. During training, the first branch uses the smooth L1 loss function and the second branch uses the softmax loss function.
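The two training losses named above can be written out in a minimal sketch. The formulas are the standard definitions of smooth L1 and softmax (cross-entropy) loss; the transition point `beta = 1.0`, the averaging over the outputs, and the function names are assumptions, since the disclosure gives no hyperparameters.

```python
import math
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax loss for one sample: negative log-probability of the true class."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

Under these definitions, `smooth_l1` would be applied to the 8 outputs of the first branch against the labelled homography, and `softmax_cross_entropy` to the 2 outputs of the second branch against the registrable or not-registrable label.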

The third images and the fourth images include both positive samples and negative samples. A positive sample is one in which the third image corresponds to the fourth image; a negative sample is one in which the third image does not correspond to the fourth image. To reduce the number of samples needed to train the homography matrix acquisition network, the ratio of the number of positive samples to the number of negative samples is set to a predetermined value, for example 1:3.
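A sample list with a 1:3 ratio of positive to negative pairs can be assembled as in the following sketch. The disclosure does not describe how negative samples are chosen; pairing each third image with randomly selected non-corresponding fourth images is an assumption, and `build_pairs` is a hypothetical name. Images are referred to by index only.

```python
import random
from typing import List, Tuple

Pair = Tuple[int, int, int]  # (third image index, fourth image index, label)


def build_pairs(num_images: int, negatives_per_positive: int,
                rng: random.Random) -> List[Pair]:
    """Build index pairs with label 1 (corresponding) or 0 (non-corresponding)."""
    if num_images < 2:
        raise ValueError("need at least two images to form negative pairs")
    pairs: List[Pair] = []
    for i in range(num_images):
        pairs.append((i, i, 1))  # positive: fourth image derived from third image i
        others = [j for j in range(num_images) if j != i]
        for _ in range(negatives_per_positive):
            pairs.append((i, rng.choice(others), 0))  # negative: mismatched pair
    return pairs
```

Calling `build_pairs(n, 3, rng)` yields n positive and 3n negative pairs, which matches the example ratio of 1:3.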

In this embodiment, an apparatus for image registration is further provided. The apparatus is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The present embodiment provides an apparatus for image registration, as shown in fig. 2, including:

a first obtaining module 21, configured to obtain a first image and a second image to be registered;

a first processing module 22, configured to input the first image and the second image into a homography matrix acquisition network, and output a first homography matrix of a mapping relationship between the first image and the second image by the homography matrix acquisition network;

a registration module 23, configured to register the first image and the second image according to the first homography matrix.

Optionally, the apparatus further comprises:

a second processing module, configured to input the third image to a variational auto-encoder, the variational auto-encoder outputting the fourth image;

the variational auto-encoder is trained with a plurality of fifth images as input and a plurality of sixth images as labels, and the sixth images are obtained by applying noise processing to the fifth images.

The apparatus for image registration in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functionality.

Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.

An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, is for causing the electronic device to perform a method according to an embodiment of the disclosure.

The disclosed exemplary embodiments also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.

The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.

Referring to Fig. 3, a structural block diagram of an electronic device 300 will now be described. The electronic device 300 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 3, the electronic device 300 includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. The RAM 303 can also store various programs and data required for the operation of the electronic device 300. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

A number of components in the electronic device 300 are connected to the I/O interface 305, including: an input unit 306, an output unit 307, a storage unit 308, and a communication unit 309. The input unit 306 may be any type of device capable of inputting information to the electronic device 300, and the input unit 306 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 307 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 308 may include, but is not limited to, magnetic or optical disks. The communication unit 309 allows the electronic device 300 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 301 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 301 performs the respective methods and processes described above. For example, in some embodiments, the method of image registration may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 300 via the ROM 302 and/or the communication unit 309. In some embodiments, the computing unit 301 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of image registration.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims

1. A method of image registration, comprising:

acquiring a first image and a second image to be registered;

2. The method of image registration of claim 1, wherein acquiring the third image and the fourth image prior to training the homography acquisition network comprises:

inputting the third image to a variational autoencoder, outputting the fourth image by the variational autoencoder;

3. The method of image registration as claimed in claim 2, wherein acquiring the third and fourth images prior to training the homography acquisition network comprises:

acquiring a seventh image;

inputting the seventh image to the variational autoencoder, outputting an eighth image by the variational autoencoder;

randomly generating a third homography matrix;

transforming the eighth image by using the third homography matrix to obtain a ninth image;

and taking the seventh image as the third image, the ninth image as the fourth image and the third homography matrix as the second homography matrix.
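The sample-generation steps of claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the perturbation ranges used for the random matrix, the nearest-neighbor warp, and the convention that the third homography matrix maps eighth-image coordinates to ninth-image coordinates are all hypothetical, and `vae_stub` merely copies its input in place of a trained variational autoencoder.

```python
import random

def random_homography(rng, max_linear=0.1, max_shift=2.0, max_persp=1e-3):
    """Assumed scheme: identity plus small random perturbations, H[2][2] = 1."""
    h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for r in range(2):
        for c in range(2):
            h[r][c] += rng.uniform(-max_linear, max_linear)
        h[r][2] = rng.uniform(-max_shift, max_shift)  # translation in pixels
    h[2][0] = rng.uniform(-max_persp, max_persp)
    h[2][1] = rng.uniform(-max_persp, max_persp)
    return h

def apply_homography(h, x, y):
    """Map the point (x, y) through h using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def warp(image, h, fill=0):
    """Warp a 2D list image by h: each output pixel samples its source pixel
    through the inverse matrix, with nearest-neighbor rounding."""
    h_inv = invert_3x3(h)
    rows, cols = len(image), len(image[0])
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sx, sy = apply_homography(h_inv, x, y)
            ix, iy = round(sx), round(sy)
            if 0 <= ix < cols and 0 <= iy < rows:
                out[y][x] = image[iy][ix]
    return out

def vae_stub(image):
    """Hypothetical stand-in for the trained autoencoder: returns a copy."""
    return [row[:] for row in image]

def make_training_sample(seventh_image, rng):
    """Return (third image, fourth image, second homography matrix)."""
    eighth_image = vae_stub(seventh_image)
    third_h = random_homography(rng)
    ninth_image = warp(eighth_image, third_h)
    return seventh_image, ninth_image, third_h
```

Because the matrix is generated before the warp, the label for each pair is known exactly and no manual annotation is needed.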

4. The method of image registration as claimed in claim 1, wherein the homography acquisition network further comprises a branch network, the branch network of the homography acquisition network being trained with the third images and the fourth images as inputs and registration probability values of the third images and the fourth images as labels;

the method further comprises the following steps:

inputting the first image and the second image into a branch network of the homography acquisition network, outputting registration probability values of the first image and the second image by the branch network of the homography acquisition network; and

registering the first image and the second image according to the first homography matrix and the registration probability value.
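The claim does not state how the registration probability value is combined with the first homography matrix. One plausible reading, shown here purely as an assumption, is to use the probability as a gate: the matrix is applied only when the probability reaches a minimum value. The function names and the default value `min_probability=0.5` are hypothetical.

```python
def apply_homography(h, x, y):
    """Map the point (x, y) through the 3x3 matrix h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def register_points(points, first_h, probability, min_probability=0.5):
    """Assumed gating rule: return the mapped points when the pair is judged
    registrable, otherwise None to signal that the pair should be rejected."""
    if probability < min_probability:
        return None
    return [apply_homography(first_h, x, y) for x, y in points]
```

For example, `register_points([(0, 0)], first_h, 0.9)` maps the origin of the first image into the second image, while a probability of 0.2 makes the call return `None`.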

5. The method of image registration as claimed in claim 4, wherein the branch network of the homography matrix acquisition network comprises: a plurality of convolutional layers and a fully connected layer;

each convolutional layer performs convolution processing on an input channel feature map to obtain an output channel feature map; the number of output channels of the channel feature map is less than the number of input channels of the channel feature map;

after convolution processing, selecting a window with a preset size, randomly exchanging and rearranging channel elements at the corresponding positions of a preset number of pixel points in the window with the preset size, and performing convolution operation again to obtain a feature map with the specified number of channels; the preset size decreases as the number of convolution operations increases;

and inputting the feature map with the specified number of channels into the fully connected layer after carrying out dimension change, so as to obtain the re-determined feature map.
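The channel rearrangement step of claim 5 can be read in more than one way. The sketch below assumes one interpretation: a square window is chosen, a fixed number of pixel positions inside it are sampled, and at each sampled position the values across channels are randomly permuted. The function name, the argument layout (`fmap[channel][row][col]`), and the explicit window position are hypothetical; the convolutions before and after this step are omitted.

```python
import random

def shuffle_channels_in_window(fmap, top, left, size, num_pixels, rng):
    """Return a copy of fmap in which, at num_pixels randomly chosen positions
    inside the size x size window at (top, left), the per-channel values are
    randomly permuted. Positions outside the window are left untouched."""
    out = [[row[:] for row in channel] for channel in fmap]
    positions = [(r, c)
                 for r in range(top, top + size)
                 for c in range(left, left + size)]
    for r, c in rng.sample(positions, num_pixels):
        values = [channel[r][c] for channel in out]
        rng.shuffle(values)  # rearrange the channel elements at this pixel
        for channel, value in zip(out, values):
            channel[r][c] = value
    return out
```

To follow the claim's requirement that the window shrinks as convolutions accumulate, a caller could pass a smaller `size` on each successive stage, for example 4, then 2.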

6. The method of image registration of any of claims 1-5, wherein a ratio of a number of positive samples to a number of negative samples is a predetermined threshold; wherein the positive samples indicate that the third image corresponds to the fourth image, and the negative samples indicate that the third image does not correspond to the fourth image.
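A training set with a fixed positive-to-negative ratio could be assembled as sketched below. The claim does not say how negative samples are produced; pairing a third image with the fourth image of a different pair is an assumption made here, as are the function name and the label encoding (1 for positive, 0 for negative).

```python
import random

def build_samples(pairs, pos_to_neg_ratio, rng):
    """pairs: list of matching (third_image, fourth_image) tuples, length >= 2.
    Returns shuffled (third_image, fourth_image, label) tuples in which
    positives : negatives is approximately pos_to_neg_ratio : 1."""
    positives = [(third, fourth, 1) for third, fourth in pairs]
    num_negatives = round(len(pairs) / pos_to_neg_ratio)
    negatives = []
    while len(negatives) < num_negatives:
        # Two distinct indices guarantee a mismatched pair.
        i, j = rng.sample(range(len(pairs)), 2)
        negatives.append((pairs[i][0], pairs[j][1], 0))
    samples = positives + negatives
    rng.shuffle(samples)
    return samples
```

With six matching pairs and a ratio of 2.0, the function returns six positive and three negative samples.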

7. An apparatus for image registration, comprising:

8. The apparatus for image registration according to claim 7, wherein the apparatus further comprises:

9. An electronic device, comprising:

a processor; and

a memory storing a program,

wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-6.

10. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.