CN113516697B

CN113516697B - Image registration method, device, electronic equipment and computer readable storage medium

Info

Publication number: CN113516697B
Application number: CN202110813026.8A
Authority: CN
Inventors: 李盼盼; 秦勇
Original assignee: Beijing Century TAL Education Technology Co Ltd
Current assignee: Beijing Century TAL Education Technology Co Ltd
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2024-02-02
Anticipated expiration: 2041-07-19
Also published as: CN113516697A

Abstract

The present disclosure provides a method, apparatus, electronic device, and computer-readable storage medium for image registration. The method includes: acquiring a first image and a second image to be registered; inputting the first image and the second image into a homography matrix acquisition network, which outputs a first homography matrix describing the mapping relation between the first image and the second image; and registering the first image and the second image according to the first homography matrix. The homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as inputs and with a plurality of second homography matrices, describing the mapping relations between the third images and the fourth images, as labels. Registration of the plurality of images is thereby achieved.

Description

Image registration method, device, electronic equipment and computer readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for image registration, an electronic device, and a computer readable storage medium.

Background

Image registration is an active and difficult research topic in the field of image processing. It aims to compare and fuse images of the same object acquired under different conditions (different times, illumination, shooting angles and the like). Specifically, a series of operations is applied to the two images to be registered to obtain a spatial transformation that maps one image onto the other, so that points at the same spatial position in the two images are in one-to-one correspondence. The technology is widely applied in target detection, model reconstruction, motion estimation, feature matching, tumor detection, lesion localization, angiography, geological exploration, aerial reconnaissance and other fields.

Disclosure of Invention

According to an aspect of the present disclosure, there is provided a method of image registration, comprising:

acquiring a first image and a second image to be registered;

inputting the first image and the second image into a homography matrix acquisition network, and outputting a first homography matrix of the mapping relation between the first image and the second image by the homography matrix acquisition network;

registering the first image and the second image according to the first homography matrix;

the homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as inputs and with a plurality of second homography matrices, describing the mapping relations between the third images and the fourth images, as labels.

According to another aspect of the present disclosure, there is provided an apparatus for image registration, comprising:

the first acquisition module is used for acquiring a first image and a second image to be registered;

the first processing module is used for inputting the first image and the second image into a homography matrix acquisition network, and outputting a first homography matrix of the mapping relation between the first image and the second image by the homography matrix acquisition network;

the registration module is used for registering the first image and the second image according to the first homography matrix;

According to another aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory in which a program is stored,

wherein the program comprises instructions which, when executed by the processor, cause the processor to perform a method of image registration according to any of the preceding claims.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of image registration according to any one of the above.

According to one or more technical schemes provided by the embodiment of the application, registration of a plurality of images can be achieved.

Drawings

Fig. 1 illustrates a flowchart of an image registration method according to an exemplary embodiment of the present disclosure;

fig. 2 shows a schematic block diagram of an image registration apparatus according to an exemplary embodiment of the present disclosure;

fig. 3 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

In the prior art, although image registration has been studied extensively, it is difficult to establish a unified, objective standard for evaluating results, and it is difficult to improve performance substantially with deep learning because suitable data sets are hard to design and obtain. As a result, most registration methods are designed for a specific problem and general methods are uncommon; the prevailing approach is to match image feature points and then calculate a homography matrix.

Image feature points are representative pixels of an image. Their application scenarios mainly include camera calibration, image stitching, dense reconstruction and scene understanding; in scene understanding the bag-of-words method is commonly used, that is, a bag of keywords is built around the feature points for scene recognition. Image feature points are important to many computer vision tasks. At present there are three ways to obtain them: first, hand-designed feature point detection algorithms such as SIFT, SURF, ORB, FAST and HOG; second, methods based on deep learning; third, using manually placed markers in the scene as feature points. Although deep-learning-based methods can currently achieve better results, hand-designed algorithms still perform well, run faster and are easy to implement, and they are widely used in industry; the SIFT algorithm in particular has not declined in use since it was proposed. For a pixel to qualify as an image feature point, it must meet two basic requirements, distinctiveness and repeatability. Distinctiveness means that the gray level at the pixel changes markedly relative to the surrounding points, as at corner points and edge points. Repeatability means that the same feature reappears under different viewing angles, with invariance to rotation, illumination and scale.

At present, good results have been obtained in automatically correcting primary school oral arithmetic problems, and a large number of applications have appeared. However, they cannot correct every question type in primary school: only conventional calculation questions, such as horizontal, vertical and step-by-step (drop-off) calculations, can be processed, while common questions that carry semantic information, such as multiple-choice, fill-in-the-blank and true/false questions, cannot. Such questions are very common in primary school mathematics homework and their number is not small, especially in grades four, five and six. Handling them would greatly expand the scope of photo-based grading and better meet user needs; yet despite this strong demand, there is still no good method for correcting them. For questions whose logic cannot be checked by computation, correction is generally based on an image question bank, and in that approach matching the answer areas accurately is critical. Although various methods exist, a registration method can achieve a better effect.

At present, the more general approach to image registration is to match image feature points and then calculate the homography matrix of the two images to be registered with methods such as random sample consensus (RANSAC). However, owing to writing habits, shooting scenes and other causes, images of primary school mathematics homework suffer from many problems, such as show-through (writing on one side of a sheet affecting the other side), uneven illumination (caused by the lighting at the time of shooting), photocopying and inaccurate shooting angles. Meanwhile, to correct all question types in primary school mathematics, an image question bank must be established, and how well the correct answers provided in the question bank are matched to the corresponding questions in the image to be corrected has a great influence on the accuracy of photo-based grading. Mapping the answers with an image registration method and then grading can work well, but the result depends heavily on registration quality, and conventional general-purpose image registration methods perform poorly on text images, especially text images with relatively complex variations, which limits the accuracy of photo-based grading.
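As a minimal sketch of the conventional baseline mentioned above, the following pure-Python code computes a homography from exactly four matched point pairs by solving the 8 x 8 linear system with h33 fixed to 1. It is an illustration only: the function names are hypothetical, the point matches are assumed to be given, and a RANSAC loop (not shown) would call such a solver repeatedly on random four-point subsets and keep the result with the most inliers.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4_points(src, dst):
    """Return a 3x3 matrix H (h33 = 1) such that dst ~ H * src for four pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(h, pt):
    """Map a point (x, y) through H using homogeneous coordinates."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, passing the corners of a unit square as `src` and the corners of a scaled and shifted rectangle as `dst` returns a matrix that maps any interior point of the square to the matching point of the rectangle.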

In this embodiment, a method for image registration is provided, which may be used in an intelligent device such as a mobile phone or a tablet computer. Fig. 1 shows a flowchart of an image registration method according to an exemplary embodiment of the present disclosure; as shown in fig. 1, the flow includes the following steps:

Step S101, acquiring a first image and a second image to be registered. The first image and the second image may come from a variety of sources; for example, when correcting student homework, the first image may be an image to be corrected and the second image may be an image of the correct answer in the question bank. Those skilled in the art should understand that the sources of the first image and the second image are not limited to this embodiment; other images selected according to actual needs also fall within the scope of this embodiment.

Step S102, inputting the first image and the second image into a homography matrix acquisition network, and outputting a first homography matrix of the mapping relation between the first image and the second image by the homography matrix acquisition network. Specifically, the homography matrix acquisition network is trained with a plurality of third images and a plurality of fourth images as inputs and with a plurality of second homography matrices, describing the mapping relations between the third images and the fourth images, as labels. The homography matrix between the two images to be registered is thus obtained directly through learning from sample images, and image registration is achieved with a deep learning method.

Step S103, registering the first image and the second image according to the first homography matrix.
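As an illustration of step S103, the sketch below maps a rectangular answer area from one image into the other once a homography matrix is available. It assumes, purely for the example, that the matrix maps coordinates of the second image (the question-bank image) to coordinates of the first image (the image to be corrected); the source does not fix this direction, and the function names and the box format (x0, y0, x1, y1) are hypothetical.

```python
def apply_homography(h, point):
    """Map a point (x, y) through the 3x3 matrix h in homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def map_answer_box(h, box):
    """Map the four corners of box and return their axis-aligned bounding box."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    mapped = [apply_homography(h, c) for c in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return (min(xs), min(ys), max(xs), max(ys))


# Usage: a pure translation by (+5, -2) moves the box without changing its size.
translation = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(map_answer_box(translation, (0, 0, 10, 4)))
```

Taking the bounding box of the mapped corners is a simplification chosen for the example; under a strongly perspective transformation the mapped area is a general quadrilateral.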

In prior-art methods based on hand-designed feature extractors such as SIFT, SURF and ORB, the descriptor of a detected feature point is usually based on statistics of nearby pixel values; in a sense it can describe only low-level features such as edges and contours, rather than high-level features, such as semantics, that have stronger representational power. Using a neural network in place of a hand-designed feature extractor yields more effective features. Matching feature points with these higher-level features gives a better homography matrix, and the strong fitting capability of the network also provides better robustness to scratches and similar defects in real text images. According to this embodiment, image registration is realized with a deep learning method: the homography matrix between the two images to be registered is obtained directly through learning from sample images, and registration of the images is completed based on that homography matrix. This makes image registration more accurate and faster, and improves the accuracy of photo-based grading when the image to be corrected is matched with the image of the correct answer in the question bank.

The third images and fourth images used to train the homography matrix acquisition network may be obtained in various ways. In some alternative embodiments, before the homography matrix acquisition network is trained, the fourth images are obtained by means of a Variational Auto-Encoder (VAE). The VAE model is an important generative model composed of two parts, an encoder and a decoder. It usually takes the lower bound of the log-likelihood as its optimization target, so the loss function of the VAE model generally consists of two parts, a reconstruction loss and a cross-entropy loss. The VAE model encodes the input with the encoder and then passes the encoding to the decoder to restore the input. The third image is input into the VAE model, and the VAE model outputs the fourth image. The VAE model is trained with a plurality of fifth images as inputs and a plurality of sixth images as labels, where the sixth images are obtained by adding noise to the fifth images; the noise-adding processing may consist of operations such as randomly adding scratches, photocopy artifacts, show-through and uneven illumination. In one specific embodiment, a large number of normal text images (without scratches and the like) are first collected (the fifth images described above). To reduce the amount of calculation, a portion of the normal text images is copied, and operations such as adding scratches, photocopy artifacts, show-through and uneven illumination are applied to the copies manually and at random (giving the sixth images described above). A VAE model is then trained with the normal text images as inputs and the manually processed images as labels, so that inputting a normal text image yields a random defective text image.
In this alternative embodiment, the VAE model reduces the workload of manually producing abnormal samples.
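The noise-adding step used to produce the sixth images can be pictured with a small sketch. The code below is an assumption-laden illustration, not the disclosed procedure: it treats an image as a list of rows of gray values in 0..255, models uneven illumination as a linear brightness ramp, models a scratch as one dark straight stroke, and leaves out photocopy and show-through effects. The function name and parameter ranges are hypothetical.

```python
import random


def add_noise(image, rng):
    """Return a degraded copy of a grayscale image (list of rows, values 0..255)."""
    height, width = len(image), len(image[0])

    # Uneven illumination: brightness gain varies linearly from left to right.
    gain_left = rng.uniform(0.6, 1.0)
    gain_right = rng.uniform(0.6, 1.0)
    out = []
    for row in image:
        new_row = []
        for x, value in enumerate(row):
            t = x / (width - 1) if width > 1 else 0.0
            gain = gain_left + (gain_right - gain_left) * t
            new_row.append(int(round(value * gain)))
        out.append(new_row)

    # Scratch: a dark straight stroke between two random pixels.
    x0, y0 = rng.randrange(width), rng.randrange(height)
    x1, y1 = rng.randrange(width), rng.randrange(height)
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        out[y][x] = 0
    return out


# Usage: build one (fifth image, sixth image) training pair for the VAE.
rng = random.Random(0)
fifth = [[200] * 16 for _ in range(8)]
sixth = add_noise(fifth, rng)
```

The input image is left unmodified, so the clean image and its degraded copy can be stored together as an input and label pair.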

In another alternative embodiment, a seventh image is acquired and input to the VAE model, which outputs an eighth image. A third homography matrix is randomly generated, and the eighth image is transformed with the third homography matrix to obtain a ninth image. The seventh image is then used as the third image, the ninth image as the fourth image, and the third homography matrix as the second homography matrix. In this way the VAE model reduces the workload of manually producing abnormal samples, and determining the homography matrix (the label) first and then deriving the input image effectively alleviates the difficulty of collecting registration data for text images.
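The label-first sample construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the random homography is drawn as a small perturbation of the identity (the source does not say how it is generated), the warp uses nearest-neighbour inverse mapping on a list-of-rows grayscale image, and `vae` is a hypothetical callable standing in for the trained VAE model.

```python
import random


def random_homography(rng, jitter=0.05, shift=2.0, persp=1e-3):
    """Draw a homography close to the identity (assumed sampling scheme)."""
    j = lambda: rng.uniform(-jitter, jitter)
    return [[1 + j(), j(), rng.uniform(-shift, shift)],
            [j(), 1 + j(), rng.uniform(-shift, shift)],
            [rng.uniform(-persp, persp), rng.uniform(-persp, persp), 1.0]]


def invert_3x3(m):
    """Invert a 3x3 matrix with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular homography")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def warp(image, h, fill=255):
    """Warp image by h: each output pixel samples the source at inverse(h) * (x, y)."""
    inv = invert_3x3(h)
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            w = inv[2][0] * x + inv[2][1] * y + inv[2][2]
            if abs(w) < 1e-12:
                continue
            sx = round((inv[0][0] * x + inv[0][1] * y + inv[0][2]) / w)
            sy = round((inv[1][0] * x + inv[1][1] * y + inv[1][2]) / w)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


def make_sample(seventh, vae, rng):
    """Return (third image, fourth image, second homography matrix)."""
    eighth = vae(seventh)            # degraded version of the clean image
    label = random_homography(rng)   # the label is fixed before the input exists
    ninth = warp(eighth, label)
    return seventh, ninth, label
```

Because the homography is chosen before the image pair is produced, every generated pair comes with an exact label and no manual annotation is needed.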

To measure the accuracy of the first homography matrix, the homography matrix acquisition network further includes a branch network. The branch network is trained with a plurality of third images and a plurality of fourth images as inputs and with a plurality of registration probability values of the third images and the fourth images as labels. The first image and the second image are input into the branch network, which outputs a registration probability value for the first image and the second image, and the first image and the second image are then registered according to both the first homography matrix and the registration probability value. The probability that the two images can be registered thus serves as a measure of the accuracy of the homography matrix. Image registration means that, for two images with the same content (the same scene) but taken from different angles, a homography matrix is obtained by some method (such as feature point matching), and the pixels of one image can be transformed with it to the corresponding positions in the other image. In practice, however, a homography matrix can be obtained for any two images whether or not their content is the same, and it is not known in advance whether the content is the same. The registration probability value can be regarded as a measure of whether the two images are similar: only if the registration probability value is high, meaning that the two images very probably have the same content, is the homography matrix meaningful.
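One way to picture how the registration probability value gates the use of the homography matrix is the sketch below. It rests on assumptions not stated in the source: the branch is taken to emit two raw scores ordered as (cannot be registered, can be registered), a softmax turns them into a probability, and a threshold of 0.5 decides whether the homography is trusted. The function names are hypothetical.

```python
import math


def registration_probability(logits):
    """Softmax over two scores; returns the probability of the 'registrable' class."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for numerical stability
    return exps[1] / sum(exps)


def register_if_confident(h, logits, threshold=0.5):
    """Return (h, p) when p >= threshold, otherwise (None, p)."""
    p = registration_probability(logits)
    return (h if p >= threshold else None), p


# Usage: a confident pair keeps its homography, an unrelated pair is rejected.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
accepted, p_high = register_if_confident(identity, [0.0, 10.0])
rejected, p_low = register_if_confident(identity, [10.0, 0.0])
```

Returning `None` for a low-probability pair makes the caller handle the case in which the two images probably do not show the same content.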

In some alternative embodiments, the homography matrix acquisition network includes a plurality of convolution layers and a plurality of fully connected layers connected in sequence; for consistency of description, this corresponds to the first branch of the homography matrix acquisition network described below. The branch network of the homography matrix acquisition network likewise includes a plurality of convolution layers and a plurality of fully connected layers connected in sequence and, for consistency of description, corresponds to the second branch of the homography matrix acquisition network described below. Specifically, the homography matrix acquisition network is constructed as follows. In this embodiment a common Resnet18 network is used and modified. The Resnet18 network consists of 4 blocks connected in series, each containing several layers of convolution operations. The feature maps output by the first block are 1/4 the size of the original image, those of the second block 1/8, those of the third block 1/16, and those of the fourth block 1/32. Each block outputs 128 feature maps. The 4 groups of feature maps are all resized by interpolation to 1/4 of the original image size and concatenated, giving one group of feature maps with 512 channels. From this point the network divides into two branches. The first branch applies five convolution operations to the 512-channel feature maps, producing 32 feature maps whose size is 1/128 of the original image. A fully connected layer with 32 x H x W/128 nodes follows (where H and W refer to the height and width of the original image), and then a further fully connected layer whose output represents the homography matrix.

The second branch applies six convolution operations to the 512-channel feature maps, producing 10 feature maps whose size is 1/256 of the original image. A fully connected layer with 10 x H x W/256 nodes follows (where H and W refer to the height and width of the original image), and then a fully connected layer with 2 nodes, which indicates the probability that the two images can be registered. In the branch network of the homography matrix acquisition network, each convolution layer convolves the input channel feature maps to obtain output channel feature maps with fewer channels than the input; reducing the number of channels reduces the amount of calculation and increases processing speed. After a convolution, a window of a preset size is selected, the channel elements at the positions of a preset number of pixels within the window are randomly exchanged and rearranged, and a further convolution is applied, until feature maps with the specified number of channels are obtained. The preset window size decreases as the number of convolution operations increases, which further reduces the amount of calculation and increases processing speed. The feature maps with the specified number of channels are input to the fully connected layers after a change of dimensions, and the fully connected layers produce the output of the branch.
This structure of the branch network increases feature fusion, so that the local and global features of the two images to be registered are combined effectively, improving the accuracy of the probability value indicating whether the two images can be registered. More specifically, the second branch first applies a convolution (kernel size 3 x 3) to the input 512-channel feature maps to obtain a 256-channel output. Then, taking 8 x 8 as the window, all channel elements at the positions of 4 pixels in the window are randomly exchanged and rearranged, and a second 3 x 3 convolution gives a 256-channel output. The same operation is repeated with windows of 6 x 6, 4 x 4 and 2 x 2 before the following convolutions, whose outputs fall from 128 channels to 64 channels, the fifth convolution giving a 64-channel output. At this point the feature map size is 8 x 8. The 64 channel values at each pixel are then taken out and rearranged into 8 x 8 (that is, 1 x 64 becomes 8 x 8), a sixth convolution gives a 10-channel output, and the result is connected to the fully connected layers. Compared with a conventional design, this network design combines the local and global features of the two images to be registered more effectively. During training, the first branch uses a smooth L1 loss function and the second branch uses a softmax loss function.
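The two training losses named above can be written out in a few lines. The sketch below is a plain-Python illustration, not the disclosed training code: it assumes the common definition of smooth L1 with a transition point `beta` of 1 and a mean over the elements, and it reads "softmax loss" as a softmax followed by cross-entropy. Both readings, and the function names, are assumptions.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against the integer class index label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


# Usage: regression loss on homography entries, classification loss on 2 scores.
regression_loss = smooth_l1([0.0, 0.0], [0.5, 2.0])          # (0.125 + 1.5) / 2
classification_loss = softmax_cross_entropy([0.0, 0.0], 1)   # log(2)
```

Smooth L1 is less sensitive to occasional large errors than a squared loss, which is one plausible reason to use it for regressing matrix entries.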

The third images and fourth images include both positive samples and negative samples. To reduce the number of samples needed to train the homography matrix acquisition network, the ratio of the number of positive samples to the number of negative samples may be set to a predetermined value, for example 1:3. A positive sample indicates that the third image corresponds to the fourth image, and a negative sample indicates that the third image does not correspond to the fourth image.
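A 1:3 ratio of positive to negative samples could be assembled as in the following sketch. The construction is an assumption made for illustration: each negative pairs a third image with the fourth image of a different sample, and negatives carry no homography label (`None`), since the source does not say what label they receive. The function name and tuple layout are hypothetical.

```python
import random


def build_pairs(positives, rng, negatives_per_positive=3):
    """positives: list of (third, fourth, homography) tuples.

    Returns shuffled (third, fourth, homography_or_None, label) tuples, where
    label 1 marks a corresponding pair and label 0 a non-corresponding pair.
    """
    samples = []
    for i, (third, fourth, h) in enumerate(positives):
        samples.append((third, fourth, h, 1))
        others = [j for j in range(len(positives)) if j != i]
        count = min(negatives_per_positive, len(others))
        for j in rng.sample(others, count):
            # Mismatched fourth image: the pair should not be registrable.
            samples.append((third, positives[j][1], None, 0))
    rng.shuffle(samples)
    return samples


# Usage with placeholder strings standing in for images and matrices.
positives = [("third%d" % i, "fourth%d" % i, "H%d" % i) for i in range(5)]
samples = build_pairs(positives, random.Random(0))
```

With five positive samples this yields five pairs labelled 1 and fifteen pairs labelled 0.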

This embodiment also provides an image registration device for implementing the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" is a combination of software and/or hardware that can implement a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

The present embodiment provides an apparatus for image registration, as shown in fig. 2, including:

a first acquisition module 21 for acquiring a first image and a second image to be registered;

a first processing module 22, configured to input the first image and the second image to a homography matrix acquisition network, and output a first homography matrix of a mapping relationship between the first image and the second image by the homography matrix acquisition network;

a registration module 23, configured to perform registration of the first image and the second image according to the first homography matrix;

Optionally, the apparatus further comprises:

a second processing module for inputting the third image to a variational auto-encoder, the variational auto-encoder outputting the fourth image;

the variational auto-encoder is trained with a plurality of fifth images as inputs and a plurality of sixth images as labels, the sixth images being obtained by adding noise to the fifth images.

The image registration apparatus in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the functionality described above.

Further functional descriptions of the above respective modules are the same as those of the above corresponding embodiments, and are not repeated here.

The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to embodiments of the present disclosure when executed by the at least one processor.

The present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present disclosure.

The present disclosure also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to embodiments of the disclosure.

Referring to fig. 3, a block diagram of an electronic device 300 will now be described; the device may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 3, the electronic device 300 includes a computing unit 301 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the device 300 may also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

Various components in the electronic device 300 are connected to the I/O interface 305, including: an input unit 306, an output unit 307, a storage unit 308, and a communication unit 309. The input unit 306 may be any type of device capable of inputting information to the electronic device 300, and the input unit 306 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 307 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 308 may include, but is not limited to, magnetic disks and optical disks. The communication unit 309 allows the electronic device 300 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the respective methods and processes described above. For example, in some embodiments, the method of image registration may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 300 via the ROM 302 and/or the communication unit 309. In some embodiments, the computing unit 301 may be configured to perform the method of image registration by any other suitable means (e.g. by means of firmware).

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims

1. A method of image registration, comprising:

acquiring a first image and a second image to be registered;

inputting the first image and the second image into a homography matrix acquisition network and a branch network of the homography matrix acquisition network, outputting a first homography matrix of the mapping relation between the first image and the second image by the homography matrix acquisition network, and outputting registration probability values of the first image and the second image by the branch network of the homography matrix acquisition network, wherein the registration probability values of the first image and the second image represent the probability of whether the first image is similar to the second image;

the homography matrix acquisition network is obtained by training a plurality of third images and a plurality of fourth images serving as inputs and a plurality of second homography matrixes of mapping relations between the third images and the fourth images serving as labels;

the branch network of the homography matrix acquisition network is obtained by training a plurality of third images and a plurality of fourth images serving as inputs and a plurality of registration probability values of the third images and the fourth images serving as labels;

the branch network of the homography matrix acquisition network comprises: a plurality of convolution layers and a full connection layer;

each convolution layer carries out convolution processing on the input channel feature map to obtain an output channel feature map; the number of channels of the output channel feature map is smaller than that of the input channel feature map;

after convolution processing, selecting a window with a preset size, randomly exchanging and rearranging channel elements at positions corresponding to a preset number of pixel points in the window with the preset size, and performing convolution operation again to obtain feature mapping of the number of the designated channels; the predetermined size decreases as the number of convolution operations increases;

and after the feature mapping of the number of the specified channels is subjected to dimensional change, inputting the feature mapping into the full-connection layer to obtain a redetermined feature mapping.
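
For illustration only, the window-based channel rearrangement recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the feature map is assumed to be a nested list indexed [channel][row][column], and the function name, its parameters, and the choice of a uniform random permutation are all hypothetical. The convolution steps before and after the rearrangement are omitted.

```python
import random


def shuffle_channels_in_window(fmap, top, left, size, num_pixels, rng):
    """Return a copy of fmap in which, at num_pixels randomly chosen positions
    inside a size x size window, the values across channels are permuted.

    fmap is indexed as fmap[channel][row][col] (an assumed layout).
    """
    channels = len(fmap)
    out = [[row[:] for row in plane] for plane in fmap]
    # All pixel positions covered by the window.
    positions = [(y, x)
                 for y in range(top, top + size)
                 for x in range(left, left + size)]
    # Pick the preset number of positions and permute their channel elements.
    for y, x in rng.sample(positions, num_pixels):
        values = [fmap[c][y][x] for c in range(channels)]
        rng.shuffle(values)
        for c in range(channels):
            out[c][y][x] = values[c]
    return out
```

In this sketch, a caller would pass a smaller size on each later pass to mirror the statement that the predetermined size decreases as the number of convolution operations increases.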

2. The method of image registration of claim 1, wherein prior to training the homography matrix acquisition network, acquiring the third image and the fourth image comprises:

inputting the third image to a variation self-encoder, and outputting the fourth image by the variation self-encoder;

the variation self-encoder is trained by taking a plurality of fifth images as input and a plurality of sixth images as labels, and the sixth images are obtained by carrying out noise adding processing on the fifth images.
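
As a hedged illustration of the training pairs described in claim 2, the sketch below builds a (fifth image, sixth image) pair by adding noise. The claim only says "noise adding processing"; the Gaussian noise, the sigma value, the grayscale nested-list image layout, and both function names are assumptions made for this example. The variation self-encoder itself (more commonly called a variational autoencoder) is not modeled here.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a grayscale image (list of rows)
    and clamp the result to the range [0, 255]."""
    return [[min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
            for row in image]


def make_encoder_training_pair(fifth_image, sigma=10.0, seed=0):
    """Return (input, label): the fifth image and its noised sixth image."""
    rng = random.Random(seed)
    sixth_image = add_gaussian_noise(fifth_image, sigma, rng)
    return fifth_image, sixth_image
```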

3. The method of image registration of claim 2, wherein prior to training the homography matrix acquisition network, acquiring the third image and the fourth image comprises:

acquiring a seventh image;

inputting the seventh image to the variation self-encoder, outputting an eighth image by the variation self-encoder;

randomly generating a third homography matrix;

transforming the eighth image by using the third homography matrix to obtain a ninth image;

and taking the seventh image as the third image, the ninth image as the fourth image, and the third homography matrix as the second homography matrix.
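
The sample-generation steps of claim 3 can be sketched roughly as below. This is an illustrative sketch only: the way the random homography is drawn (a small perturbation of the identity), the nearest-neighbor inverse warping, the parameter ranges, and all function names are assumptions, and encoder stands in for the trained variation self-encoder as an arbitrary callable.

```python
import random


def random_homography(rng, max_shift=2.0, max_skew=0.05, max_persp=0.001):
    """Draw a 3x3 homography as a small random perturbation of the identity."""
    u = rng.uniform
    return [
        [1.0 + u(-max_skew, max_skew), u(-max_skew, max_skew), u(-max_shift, max_shift)],
        [u(-max_skew, max_skew), 1.0 + u(-max_skew, max_skew), u(-max_shift, max_shift)],
        [u(-max_persp, max_persp), u(-max_persp, max_persp), 1.0],
    ]


def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(h, x, y):
    """Map the point (x, y) through h in homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def warp_image(image, h, fill=0):
    """Warp a grayscale image (list of rows) by h using inverse mapping
    with nearest-neighbor sampling; pixels with no source get fill."""
    h_inv = invert_3x3(h)
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(h_inv, x, y)
            ix, iy = round(sx), round(sy)
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = image[iy][ix]
    return out


def make_training_sample(seventh_image, encoder, rng):
    """Return (third image, fourth image, second homography matrix)."""
    eighth_image = encoder(seventh_image)
    third_homography = random_homography(rng)
    ninth_image = warp_image(eighth_image, third_homography)
    return seventh_image, ninth_image, third_homography
```

Because the homography is generated rather than estimated, the label for each training pair is known exactly, which is what allows the third homography matrix to serve as the second homography matrix.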

4. The method of image registration of claim 1, wherein the method further comprises:

and registering the first image and the second image according to the first homography matrix and the registration probability value.
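
One possible reading of claim 4 is that the registration probability value gates whether the homography is applied at all. The sketch below illustrates that reading only; the claim does not state how the two outputs are combined, so the 0.5 threshold, the decision rule, and the function names are assumptions.

```python
def apply_homography(h, x, y):
    """Map the point (x, y) through the 3x3 homography h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def register_if_confident(points, homography, probability, threshold=0.5):
    """Map points of the first image into the second image's frame, but only
    when the registration probability reaches the (assumed) threshold.

    Returns None when the image pair is judged not to match.
    """
    if probability < threshold:
        return None
    return [apply_homography(homography, x, y) for x, y in points]
```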

5. The method of image registration of any one of claims 1 to 4, wherein the ratio of the number of positive samples to negative samples is a predetermined threshold; wherein the positive sample indicates that the third image corresponds to the fourth image, and the negative sample indicates that the third image does not correspond to the fourth image.
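
As a rough illustration of claim 5, a training batch with a fixed ratio of positive to negative samples could be assembled as shown below. The function name, the batch-based sampling, and the 1.0 / 0.0 labels used as registration probability values are assumptions for this sketch, not details taken from the claim.

```python
import random


def build_batch(positive_pairs, negative_pairs, pos_to_neg_ratio, batch_size, rng):
    """Sample a batch whose positive:negative count approximates the ratio.

    Each item is (pair, label), with label 1.0 for a corresponding
    third/fourth image pair and 0.0 for a non-corresponding pair.
    """
    num_pos = round(batch_size * pos_to_neg_ratio / (1.0 + pos_to_neg_ratio))
    num_neg = batch_size - num_pos
    if num_pos > len(positive_pairs) or num_neg > len(negative_pairs):
        raise ValueError("not enough pairs to honor the requested ratio")
    batch = [(pair, 1.0) for pair in rng.sample(positive_pairs, num_pos)]
    batch += [(pair, 0.0) for pair in rng.sample(negative_pairs, num_neg)]
    rng.shuffle(batch)
    return batch
```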

6. An apparatus for image registration, comprising:

the first processing module is used for inputting the first image and the second image into a homography matrix acquisition network and a branch network of the homography matrix acquisition network, outputting a first homography matrix of the mapping relation between the first image and the second image by the homography matrix acquisition network, and outputting registration probability values of the first image and the second image by the branch network of the homography matrix acquisition network, wherein the registration probability values of the first image and the second image represent the probability of whether the first image is similar to the second image;

the method for obtaining the registration probability value of the first image and the second image by the homography matrix comprises the following steps:

7. The apparatus for image registration of claim 6, wherein the apparatus further comprises:

8. An electronic device, comprising:

a processor; and

a memory in which a program is stored,

wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-5.

9. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.