CN112598718A - Unsupervised multi-view multi-mode intelligent glasses image registration method and device - Google Patents

Info

Publication number
CN112598718A
CN112598718A (application CN202011632743.2A)
Authority
CN
China
Prior art keywords
network
image
registration
feature extraction
registered
Prior art date
Legal status
Granted
Application number
CN202011632743.2A
Other languages
Chinese (zh)
Other versions
CN112598718B (en)
Inventor
王成
高启宇
李一鸣
俞益洲
乔昕
Current Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Original Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Shenrui Bolian Technology Co Ltd and Shenzhen Deepwise Bolian Technology Co Ltd
Priority to CN202011632743.2A
Publication of CN112598718A
Application granted
Publication of CN112598718B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unsupervised multi-view multi-modal intelligent glasses image registration method, which comprises the following steps: alternately training an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network, and training the high-dimensional feature similarity discrimination network with same-modality and/or different-modality images, at the same and/or different view angles, whose features are extracted by the auxiliary feature extraction network; fixing the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network, and training a registration network, wherein the inputs of the registration network are the image to be registered, the template image, and the difference image of the two, the high-dimensional feature vector output by the auxiliary feature extraction network is inserted at the middle layer, and the output is a dense registration field; and predicting the image to be registered with the registration network and the auxiliary feature extraction network to obtain the registered image.

Description

Unsupervised multi-view multi-mode intelligent glasses image registration method and device
Technical Field
The invention relates to the field of computers, in particular to an unsupervised multi-view multi-mode intelligent glasses image registration method and device.
Background
Acquiring images of the environment is an important prerequisite for obstacle avoidance. Various image-acquisition devices exist, such as RGB cameras, structured-light cameras, and TOF cameras; each has its own advantages, disadvantages, and applicable scenes, and a single camera alone often cannot adapt to a real scene, so multiple cameras are usually combined in obstacle-avoidance scenarios. Because the cameras differ in mounting position, imaging time, and imaging form, their imaging results differ, which makes the fusion of multi-view multi-modal image information an important technical link in the obstacle-avoidance task.
Disclosure of Invention
The present invention aims to provide an unsupervised multi-view multi-modal smart glasses image registration method and apparatus that overcome, or at least partially solve, the above-mentioned problems.
To achieve this aim, the technical solution of the invention is realized as follows:
one aspect of the present invention provides an unsupervised multi-view multi-modal smart glasses image registration method, including: alternately training an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network, and training the high-dimensional feature similarity discrimination network by using the same-mode images and/or different-mode images extracted by the auxiliary feature extraction network and the images at the same view angle and/or different view angles; the input images of the auxiliary feature extraction network are template images and images to be registered, and output are high-dimensional feature vectors; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, input data is auxiliary features to extract high-dimensional features output by the network, and the output is a numerical value from 0 to 1; fixing a high-dimensional feature similarity discrimination network and an auxiliary feature extraction network, and training a registration network, wherein the input of the registration network is an image to be registered, a template image and a difference image of the image and the template image, the high output of the auxiliary feature extraction network is inserted in the middle layer as a feature vector, and the output is a dense registration field; and predicting the image to be registered by adopting a registration network and an auxiliary feature extraction network to obtain the registered image.
Wherein training the registration network comprises: calculating the similarity between the registered image and the template image as a similarity loss function, adding a regularization loss function of the registration field, and training the registration network.
The high-dimensional feature similarity distinguishing network adopts a resnet classification network architecture, the auxiliary feature extraction network adopts a resnet classification network architecture, and the registration network adopts a UNet structure.
The auxiliary feature extraction network splices two images of size H × W × L into an image pair of size H × W × L × 2; the registration network splices the two images of size H × W × L and their difference image into an image pair of size H × W × L × 3.
Predicting the image to be registered with the registration network and the auxiliary feature extraction network to obtain the registered image comprises: combining the image to be registered and the template image, both of size H × W × L, with their difference image into an image pair of size H × W × L × 3, and inputting the pair into the registration network and the auxiliary feature extraction network; obtaining abstract features through the down-sampling path, combining them with the high-dimensional features extracted by the auxiliary feature extraction network, and feeding both into the up-sampling path; and outputting a dense registration field of size H × W × L × 3, which is used to sample the original image to obtain the registered image.
In another aspect, the present invention provides an unsupervised multi-view multi-modal smart glasses image registration apparatus, including: a first training module, used for alternately training the auxiliary feature extraction network and the high-dimensional feature similarity discrimination network, and for training the high-dimensional feature similarity discrimination network with same-modality and/or different-modality images, at the same and/or different view angles, whose features are extracted by the auxiliary feature extraction network; the input images of the auxiliary feature extraction network are the template image and the image to be registered, and the output is a high-dimensional feature vector; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, its input data are the high-dimensional features output by the auxiliary feature extraction network, and its output is a value between 0 and 1; a second training module, used for fixing the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network and training the registration network, wherein the inputs of the registration network are the image to be registered, the template image, and the difference image of the two, the high-dimensional feature vector output by the auxiliary feature extraction network is inserted at the middle layer, and the output is a dense registration field; and a prediction module, used for predicting the image to be registered with the registration network and the auxiliary feature extraction network to obtain the registered image.
Wherein the second training module trains the registration network as follows: the second training module is specifically used for calculating the similarity between the registered image and the template image as a similarity loss function, adding a regularization loss function of the registration field, and training the registration network.
The high-dimensional feature similarity distinguishing network adopts a resnet classification network architecture, the auxiliary feature extraction network adopts a resnet classification network architecture, and the registration network adopts a UNet structure.
The auxiliary feature extraction network splices two images of size H × W × L into an image pair of size H × W × L × 2; the registration network splices the two images of size H × W × L and their difference image into an image pair of size H × W × L × 3.
The prediction module predicts the image to be registered with the registration network and the auxiliary feature extraction network as follows: the prediction module is specifically used for combining the image to be registered and the template image, both of size H × W × L, with their difference image into an image pair of size H × W × L × 3, and inputting the pair into the registration network and the auxiliary feature extraction network; obtaining abstract features through the down-sampling path, combining them with the high-dimensional features extracted by the auxiliary feature extraction network, and feeding both into the up-sampling path; and outputting a dense registration field of size H × W × L × 3, which is used to sample the original image to obtain the registered image.
Therefore, the unsupervised multi-view multi-modal intelligent glasses image registration method and device provided by the invention can register multi-view multi-modal images, facilitating subsequent tasks such as depth estimation, detection, and segmentation.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an unsupervised multi-view multi-modal smart glasses image registration method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a network structure in an unsupervised multi-view multi-modal smart glasses image registration method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an unsupervised multi-view multi-modal smart glasses image registration apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flowchart of the unsupervised multi-view multi-modal smart glasses image registration method provided by an embodiment of the present invention. The method is described below with reference to fig. 1 and fig. 2, and comprises the following steps:
s1, alternately training an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network, and training the high-dimensional feature similarity discrimination network by using the same-mode images and/or different-mode images extracted by the auxiliary feature extraction network and the same-view and/or different-view images; the input images of the auxiliary feature extraction network are template images and images to be registered, and output are high-dimensional feature vectors; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, input data are auxiliary features, high-dimensional features output by the network are extracted, and the output is a numerical value from 0 to 1.
Specifically, the method firstly trains an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network.
After the auxiliary feature extraction network has been trained, the high-dimensional feature similarity discrimination network judges the similarity of the input images from the outputs computed for inputs of different modalities and view angles; the training goal is that the auxiliary feature extraction network can extract similar high-dimensional features from input images of different modalities and view angles.
As an optional implementation of the embodiment of the present invention, the auxiliary feature extraction network adopts a resnet classification network architecture; its input images are the template image and the image to be registered, and its output is a high-dimensional feature vector.
The invention trains the high-dimensional feature similarity discrimination network with same-modality/different-modality and same-view/different-view images whose features are extracted by the auxiliary feature extraction network, so that the network output is close to 0 for inputs from different modalities and different view angles, and close to 1 for inputs from the same modality and different view angles, where 0 indicates that the two images are dissimilar and 1 indicates that they are similar.
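As a toy sketch of the labeling targets described above (the helper function and its `(modality, view)` tuples are hypothetical illustrations, not code from the patent), a pair from the same modality gets target 1 (similar) and a cross-modality pair gets target 0 (dissimilar):

```python
def discriminator_target(input_a, input_b):
    """Hypothetical helper: each argument is a (modality, view) tuple.

    Returns 1 when the two inputs share a modality (similar) and 0
    otherwise (dissimilar), matching the training targets above.
    """
    (modality_a, _view_a), (modality_b, _view_b) = input_a, input_b
    return 1 if modality_a == modality_b else 0

# Same modality at different views -> 1; different modalities -> 0.
targets = [
    discriminator_target(("rgb", 0), ("rgb", 1)),
    discriminator_target(("rgb", 0), ("tof", 1)),
]
print(targets)  # [1, 0]
```

In an actual alternating-training loop these scalar targets would supervise the discrimination network's 0-to-1 output while the feature extractor learns to make cross-modality features indistinguishable.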
As an optional implementation of the embodiment of the present invention, the basic structure of the high-dimensional feature similarity discrimination network is a convolutional neural network with a resnet classification architecture. The input data is an image pair: two images of size H × W × L are spliced into a pair of size H × W × L × 2. Features are extracted from the input through multiple convolution layers, and finally a single number between 0 and 1 is output as a probability, where 1 indicates similar and 0 indicates dissimilar.
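The splicing of two volumes into the H × W × L × 2 input pair can be sketched with NumPy (toy sizes and array names are illustrative, not from the patent):

```python
import numpy as np

H, W, L = 4, 4, 4  # toy dimensions for illustration
template = np.zeros((H, W, L), dtype=np.float32)      # template image
to_register = np.ones((H, W, L), dtype=np.float32)    # image to be registered

# Two images of size H x W x L are spliced along a new last axis,
# giving the H x W x L x 2 input pair described above.
pair = np.stack([template, to_register], axis=-1)
print(pair.shape)  # (4, 4, 4, 2)
```

The same `np.stack` pattern with a third channel yields the H × W × L × 3 input of the registration network.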
And S2, fixing a high-dimensional feature similarity discrimination network and an auxiliary feature extraction network, and training a registration network, wherein the input of the registration network is the image to be registered, the template image and a difference image of the image and the template image, the high-dimensional feature vector output by the auxiliary feature extraction network is inserted in the middle layer, and the output is a dense registration field.
Specifically, the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network are fixed, and the registration network is trained. The inputs of the network are the image to be registered, the template image, and their difference image; the high-dimensional features output by the auxiliary feature extraction network are inserted at the middle layer, and the network finally outputs the predicted registration field. The template image serves as the registration reference, and the difference image describes the degree of difference between the image to be registered and the template image.
As an optional implementation of the embodiment of the present invention, the registration network has a UNet structure comprising a down-sampling path and an up-sampling path, with the middle layer being the output of the down-sampling path and the input of the up-sampling path. The inputs of the registration network are the image to be registered, the template image, and their difference image, each of size H × W × L; they are combined into an input of size H × W × L × 3 and fed into the registration network and the auxiliary feature extraction network. Abstract features are first obtained through the down-sampling path, then combined with the high-dimensional features extracted by the auxiliary feature extraction network and passed through the up-sampling path, and finally a dense registration field of size H × W × L × 3 is output.
And S3, predicting the image to be registered with the registration network and the auxiliary feature extraction network to obtain the registered image.
As an optional implementation of the embodiment of the present invention, predicting the image to be registered with the registration network and the auxiliary feature extraction network to obtain the registered image includes: combining the image to be registered and the template image, both of size H × W × L, with their difference image into an image pair of size H × W × L × 3, and inputting the pair into the registration network and the auxiliary feature extraction network; obtaining abstract features through the down-sampling path, combining them with the high-dimensional features extracted by the auxiliary feature extraction network, and feeding both into the up-sampling path; and outputting a dense registration field of size H × W × L × 3, which is used to sample the original image to obtain the registered image.
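The final sampling step can be sketched as a nearest-neighbour lookup in pure NumPy (an illustrative simplification; a real implementation would interpolate rather than round): the dense field stores a displacement per voxel, and the registered image takes the moving image at x plus the displacement at x.

```python
import numpy as np

def warp(moving, phi):
    """Sample `moving` at x + phi(x) with nearest-neighbour rounding.

    `phi` is a dense registration field of shape H x W x L x 3 holding a
    displacement vector for every voxel.
    """
    H, W, L = moving.shape
    grid = np.stack(
        np.meshgrid(np.arange(H), np.arange(W), np.arange(L), indexing="ij"),
        axis=-1,
    )
    coords = np.rint(grid + phi).astype(int)
    for axis, size in enumerate((H, W, L)):
        # clip so displaced coordinates stay inside the volume
        coords[..., axis] = np.clip(coords[..., axis], 0, size - 1)
    return moving[coords[..., 0], coords[..., 1], coords[..., 2]]

# A zero displacement field leaves the moving image unchanged.
M = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
identity = np.zeros((2, 2, 2, 3))
print(np.array_equal(warp(M, identity), M))  # True
```

The function names and the rounding scheme are assumptions for illustration; the patent only specifies that the registration field is used to sample the original image.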
As an optional implementation of the embodiment of the present invention, training the registration network includes: calculating the similarity between the registered image and the template image as a similarity loss function, adding a regularization loss function of the registration field, and training the registration network.
Specifically, the similarity between the registered image and the template image is calculated as a similarity loss function, and a regularization loss function on the registration field is added, so that the two jointly guide the training of the registration network. Denote the template image by F, the image to be registered by M, and the registration field obtained by the network by φ; the registered image is then M∘φ. The added loss function comprises:
a similarity loss function ‖F − M∘φ‖², which measures the gray-level difference between the template image and the registered image as the similarity metric; and
a regularization loss function α·|φ| + β·‖∇φ‖², where |φ| is an absolute-value term that limits the amplitude of the deformation field and plays a regularizing role, α is the weighting coefficient of that term, ∇φ is the first-order differential of the registration field used to constrain its smoothness, and β is the weighting coefficient of that term.
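The combined loss can be sketched numerically with NumPy (the weights and the per-voxel averaging are illustrative assumptions; `M_warped` stands for the registered image, i.e. M warped by the field):

```python
import numpy as np

def registration_loss(F, M_warped, phi, alpha=0.01, beta=0.01):
    """Similarity term plus the two regularization terms described above.

    `F` is the template image, `M_warped` the registered image, and `phi`
    the dense registration field; alpha and beta are illustrative weights.
    """
    similarity = np.mean((F - M_warped) ** 2)    # gray-level difference
    amplitude = np.mean(np.abs(phi))             # limits deformation magnitude
    grads = np.gradient(phi, axis=(0, 1, 2))     # first-order differential
    smoothness = sum(np.mean(g ** 2) for g in grads)
    return similarity + alpha * amplitude + beta * smoothness

# A perfect registration with a zero field incurs zero loss.
F = np.ones((2, 2, 2))
loss = registration_loss(F, F.copy(), np.zeros((2, 2, 2, 3)))
print(loss)  # 0.0
```

Any gray-level mismatch, large displacement, or non-smooth field increases the loss, which is the behaviour the three terms above are meant to enforce.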
Therefore, compared with prior-art approaches, the unsupervised multi-view multi-modal intelligent glasses image registration method provided by the embodiment of the invention removes the cycle-gan module and no longer restores high-dimensional abstract features to a 3-dimensional image. A feature extraction network abstracts high-dimensional features, a neural network judges the similarity of the high-dimensional features of multi-view multi-modal images and supervises the feature extraction network to extract similar features from them, which facilitates the subsequent registration.
The invention reduces the complexity of the algorithm and simplifies the training process into three steps: training the auxiliary feature extraction network, training the high-dimensional feature similarity discrimination network, and training the registration network. The high-dimensional features serve as supplementary information for the registration network, and the degree of coupling is low.
The unsupervised multi-view multi-modal intelligent glasses image registration method provided by the invention constitutes an end-to-end deep learning registration scheme: given a template image and an image to be registered as input, it outputs the registration result. It can process multi-modal images, registering images from different cameras (RGB, TOF, structured-light, and the like) together. Because the algorithm is unsupervised, no data annotation is needed, greatly saving time and cost. Compared with existing methods, the cycle-gan module is removed, improving the stability of the algorithm, and the model has few long skip connections, making it lightweight and convenient to deploy on small devices.
Fig. 3 is a schematic structural diagram of an unsupervised multi-view multi-modal smart glasses image registration apparatus provided in an embodiment of the present invention, to which the above method is applied. Only the structure of the apparatus is briefly explained below; for other matters, refer to the related description of the method above. Referring to fig. 3, the apparatus includes:
the first training module, used for alternately training the auxiliary feature extraction network and the high-dimensional feature similarity discrimination network, and for training the high-dimensional feature similarity discrimination network with same-modality and/or different-modality images, at the same and/or different view angles, whose features are extracted by the auxiliary feature extraction network; the input images of the auxiliary feature extraction network are the template image and the image to be registered, and the output is a high-dimensional feature vector; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, its input data are the high-dimensional features output by the auxiliary feature extraction network, and its output is a value between 0 and 1;
the second training module, used for fixing the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network and training the registration network, wherein the inputs of the registration network are the image to be registered, the template image, and their difference image; the high-dimensional feature vector output by the auxiliary feature extraction network is inserted at the middle layer, and the output is a dense registration field;
and the prediction module is used for predicting the image to be registered by adopting the registration network and the auxiliary feature extraction network to obtain the registered image.
As an optional implementation of the embodiment of the present invention, the second training module trains the registration network as follows: the second training module is specifically used for calculating the similarity between the registered image and the template image as a similarity loss function, adding a regularization loss function of the registration field, and training the registration network.
As an optional implementation manner of the embodiment of the present invention, the high-dimensional feature similarity discrimination network adopts a resnet classification network architecture, the auxiliary feature extraction network adopts a resnet classification network architecture, and the registration network adopts a UNet structure.
As an optional implementation of the embodiment of the present invention, the auxiliary feature extraction network splices two images of size H × W × L into an image pair of size H × W × L × 2; the registration network splices the two images of size H × W × L and their difference image into an image pair of size H × W × L × 3.
As an optional implementation of the embodiment of the present invention, the prediction module predicts the image to be registered with the registration network and the auxiliary feature extraction network as follows: the prediction module is specifically used for combining the image to be registered and the template image, both of size H × W × L, with their difference image into an image pair of size H × W × L × 3, and inputting the pair into the registration network and the auxiliary feature extraction network; obtaining abstract features through the down-sampling path, combining them with the high-dimensional features extracted by the auxiliary feature extraction network, and feeding both into the up-sampling path; and outputting a dense registration field of size H × W × L × 3, which is used to sample the original image to obtain the registered image.
Therefore, compared with prior-art approaches, the unsupervised multi-view multi-modal intelligent glasses image registration device provided by the embodiment of the invention removes the cycle-gan module and no longer restores high-dimensional abstract features to a 3-dimensional image. A feature extraction network abstracts high-dimensional features, a neural network judges the similarity of the high-dimensional features of multi-view multi-modal images and supervises the feature extraction network to extract similar features from them, which facilitates the subsequent registration.
The invention reduces the complexity of the algorithm and simplifies the training process into three steps: training the auxiliary feature extraction network, training the high-dimensional feature similarity discrimination network, and training the registration network. The high-dimensional features serve as supplementary information for the registration network, and the degree of coupling is low.
The unsupervised multi-view multi-modal intelligent glasses image registration device provided by the invention constitutes an end-to-end deep learning registration scheme: given a template image and an image to be registered as input, it outputs the registration result. It can process multi-modal images, registering images from different cameras (RGB, TOF, structured-light, and the like) together. Because the algorithm is unsupervised, no data annotation is needed, greatly saving time and cost. Compared with existing methods, the cycle-gan module is removed, improving the stability of the algorithm, and the model has few long skip connections, making it lightweight and convenient to deploy on small devices.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An unsupervised multi-view multi-modal smart eyewear image registration method, comprising:
alternately training an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network, and training the high-dimensional feature similarity discrimination network with features extracted by the auxiliary feature extraction network from same-modality and/or different-modality images at the same and/or different view angles; the input images of the auxiliary feature extraction network are the template image and the image to be registered, and the output is a high-dimensional feature vector; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, its input data are the high-dimensional features output by the auxiliary feature extraction network, and its output is a value from 0 to 1;
fixing the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network, and training a registration network, wherein the input of the registration network is the image to be registered, the template image and the difference image of the two, the high-dimensional feature vector output by the auxiliary feature extraction network is inserted at the middle layer, and the output is a dense registration field;
and predicting the image to be registered by adopting the registration network and the auxiliary feature extraction network to obtain the registered image.
2. The method of claim 1, wherein training the registration network comprises:
calculating the similarity between the registered image and the template image as a similarity loss function, adding a regularization loss function of the registration field, and training the registration network.
3. The method according to claim 1, wherein the high-dimensional feature similarity discrimination network adopts a ResNet classification network architecture, the auxiliary feature extraction network adopts a ResNet classification network architecture, and the registration network adopts a UNet structure.
4. The method of claim 1, wherein the auxiliary feature extraction network tiles two images of size H×W×L into an image pair of size H×W×L×2;
the registration network tiles two images of size H×W×L and their difference image into an image pair of size H×W×L×3.
5. The method according to claim 1, wherein the predicting the image to be registered by using the registration network and the auxiliary feature extraction network to obtain the registered image comprises:
combining the image to be registered and the template image, both of size H×W×L, with the difference image of the two to form an image pair of size H×W×L×3, and inputting the image pair into the registration network and the auxiliary feature extraction network;
obtaining abstract features through a down-sampling path, combining the abstract features with the high-dimensional features extracted by the auxiliary feature extraction network, and jointly entering the up-sampling path;
and outputting a dense registration field of size H×W×L×3, and sampling the original image with the registration field to obtain the registered image.
6. An unsupervised multi-view multi-modal smart eyewear image registration apparatus, comprising:
the first training module is used for alternately training an auxiliary feature extraction network and a high-dimensional feature similarity discrimination network, and training the high-dimensional feature similarity discrimination network with features extracted by the auxiliary feature extraction network from same-modality and/or different-modality images at the same and/or different view angles; the input images of the auxiliary feature extraction network are the template image and the image to be registered, and the output is a high-dimensional feature vector; the high-dimensional feature similarity discrimination network adopts a convolutional neural network, its input data are the high-dimensional features output by the auxiliary feature extraction network, and its output is a value from 0 to 1;
the second training module is used for fixing the high-dimensional feature similarity discrimination network and the auxiliary feature extraction network, and training a registration network, wherein the input of the registration network is the image to be registered, the template image and the difference image of the two, the high-dimensional feature vector output by the auxiliary feature extraction network is inserted at the middle layer, and the output is a dense registration field;
and the prediction module is used for predicting the image to be registered by adopting the registration network and the auxiliary feature extraction network to obtain the registered image.
7. The apparatus of claim 6, wherein the second training module trains the registration network by:
the second training module is specifically configured to calculate the similarity between the registered image and the template image as a similarity loss function, add a regularization loss function of the registration field, and train the registration network.
8. The apparatus according to claim 6, wherein the high-dimensional feature similarity discrimination network adopts a ResNet classification network architecture, the auxiliary feature extraction network adopts a ResNet classification network architecture, and the registration network adopts a UNet structure.
9. The apparatus of claim 6, wherein the auxiliary feature extraction network tiles two images of size H×W×L into an image pair of size H×W×L×2;
the registration network tiles two images of size H×W×L and their difference image into an image pair of size H×W×L×3.
10. The apparatus according to claim 6, wherein the prediction module predicts the image to be registered by using the registration network and the auxiliary feature extraction network to obtain the registered image by:
the prediction module is specifically configured to combine the image to be registered and the template image, both of size H×W×L, with the difference image of the two to form an image pair of size H×W×L×3, and input the image pair into the registration network and the auxiliary feature extraction network; obtain abstract features through a down-sampling path, combine the abstract features with the high-dimensional features extracted by the auxiliary feature extraction network, and jointly enter the up-sampling path; and output a dense registration field of size H×W×L×3, and sample the original image with the registration field to obtain the registered image.
CN202011632743.2A 2020-12-31 2020-12-31 Unsupervised multi-view multi-mode intelligent glasses image registration method and device Active CN112598718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632743.2A CN112598718B (en) 2020-12-31 2020-12-31 Unsupervised multi-view multi-mode intelligent glasses image registration method and device


Publications (2)

Publication Number Publication Date
CN112598718A true CN112598718A (en) 2021-04-02
CN112598718B CN112598718B (en) 2022-07-12

Family

ID=75206873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011632743.2A Active CN112598718B (en) 2020-12-31 2020-12-31 Unsupervised multi-view multi-mode intelligent glasses image registration method and device

Country Status (1)

Country Link
CN (1) CN112598718B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050249434A1 (en) * 2004-04-12 2005-11-10 Chenyang Xu Fast parametric non-rigid image registration based on feature correspondences
CN109064502A (en) * 2018-07-11 2018-12-21 西北工业大学 The multi-source image method for registering combined based on deep learning and artificial design features
CN109378054A (en) * 2018-12-13 2019-02-22 山西医科大学第医院 A kind of multi-modality images assistant diagnosis system and its building method
CN109767459A (en) * 2019-01-17 2019-05-17 中南大学 Novel ocular base map method for registering
CN110363797A (en) * 2019-07-15 2019-10-22 东北大学 A kind of PET and CT method for registering images inhibited based on excessive deformation
CN110838140A (en) * 2019-11-27 2020-02-25 艾瑞迈迪科技石家庄有限公司 Ultrasound and nuclear magnetic image registration fusion method and device based on hybrid supervised learning
CN111260594A (en) * 2019-12-22 2020-06-09 天津大学 Unsupervised multi-modal image fusion method
US20200184660A1 (en) * 2018-12-11 2020-06-11 Siemens Healthcare Gmbh Unsupervised deformable registration for multi-modal images
CN111369550A (en) * 2020-03-11 2020-07-03 创新奇智(成都)科技有限公司 Image registration and defect detection method, model, training method, device and equipment
CN111414968A (en) * 2020-03-26 2020-07-14 西南交通大学 Multi-mode remote sensing image matching method based on convolutional neural network characteristic diagram
CN112102385A (en) * 2020-08-20 2020-12-18 复旦大学 Multi-modal liver magnetic resonance image registration system based on deep learning
CN112150425A (en) * 2020-09-16 2020-12-29 北京工业大学 Unsupervised intravascular ultrasound image registration method based on neural network


Non-Patent Citations (1)

Title
车统统 (Che Tongtong): "Research and Application of Image Matching Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant