CN113361542A - Local feature extraction method based on deep learning - Google Patents

Local feature extraction method based on deep learning Download PDF

Info

Publication number
CN113361542A
CN113361542A
Authority
CN
China
Prior art keywords
network
local feature
homography
image
descriptor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110611600.1A
Other languages
Chinese (zh)
Other versions
CN113361542B (en)
Inventor
刘晓平
蔡有城
李琳
王冬
黄鑫涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202110611600.1A priority Critical patent/CN113361542B/en
Publication of CN113361542A publication Critical patent/CN113361542A/en
Application granted granted Critical
Publication of CN113361542B publication Critical patent/CN113361542B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a local feature extraction method based on deep learning, which comprises the following steps. First, network training is carried out: a pre-constructed network is trained on the MS-COCO image data set, which is divided into a training set and a validation set containing 82,783 and 40,504 images respectively. Image matching is then performed: in the experiments, the performance of the local feature extraction method is evaluated with a standard local feature pipeline that extracts and matches features from any given pair of images. This is followed by computing the repeatability score (Repeatability), then the matching score (M-Score), and finally by evaluating the homography estimation effect. By postponing the detection step until after description, the method achieves a more flexible feature search process than traditional non-machine-learning approaches, obtains a large number of key points, and improves feature extraction accuracy.

Description

Local feature extraction method based on deep learning
Technical Field
The invention relates to the technical field of deep-learning-based local feature extraction frameworks, and in particular to a local feature extraction method based on deep learning.
Background
In many areas of computer vision, learning-based methods have emerged and begun to outperform traditional methods. Intuitively, the feature extraction process requires only a network of several convolutional layers to model the behaviour of traditional detectors and descriptors by learning the appropriate parameters. Some existing learning-based methods focus on training detectors or descriptors individually, while others succeed in building an end-to-end feature detection and description pipeline. For the former, when the individually optimized detectors or descriptors are integrated into the complete pipeline, the performance gain of these individual components may disappear. For the latter, jointly training detectors and descriptors is more desirable, since it allows them to be optimized synergistically.
However, it is challenging to achieve two different optimization goals by training a single network, because the optimization goal of the detector is repeatability, while the optimization goal of the descriptor is distinctiveness. There is not yet a good set of solutions for unifying and combining the two, and the prior art cannot balance these two optimization goals well.
Disclosure of Invention
The invention aims to provide a local feature extraction method based on deep learning, so as to solve the problems of the prior art set out in the technical background above.
In order to achieve the purpose, the invention provides the following technical scheme: a local feature extraction method based on deep learning comprises the following steps:
S1, network training is performed first
The network is trained on the MS-COCO image data set, which is split into a training set and a validation set containing 82,783 and 40,504 images respectively;
S2, image matching is then performed
In the experiments, the performance of the local feature extraction method is evaluated using a standard local feature pipeline, which extracts and matches features from any given pair of images;
S3, the repeatability score (Repeatability) is computed
The repeatability score is used to evaluate the performance of the detector in the local feature extraction method. More specifically, let ε denote the correct-distance threshold for obtaining correct key-point correspondences between the two detected images in an experiment; the repeatability score is defined as the number of correctly corresponding key points divided by the total number of key points in the image pair;
S4, the matching score (M-Score) is then computed
The matching score is used to evaluate the combined performance of the detector and the descriptor in the local feature extraction method; it is the ratio of the number of correct matches obtained by the matching strategy of the standard local feature pipeline to the total number of matches;
S5, finally, the homography estimation effect is evaluated
The homography estimation effect is used to evaluate the ability of the local feature extraction method to estimate a homography matrix; the homography estimation itself is computed with RANSAC;
the homography estimation effect adopts an indirect comparison in order to accommodate homography matrices of different scales: it measures the average distance between the four image corners transformed by the homography matrix estimated with RANSAC and by the ground-truth homography matrix (a sketch of these evaluation metrics is given below).
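For concreteness, the quantities evaluated in steps S3 and S5 can be sketched as follows. This is an illustrative sketch only: the function names, the use of OpenCV/NumPy and the ε = 3 default are assumptions rather than the patent's reference implementation, and visibility/border handling is omitted.

```python
import numpy as np
import cv2

def repeatability(kps1, kps2, H_gt, eps=3.0):
    """Repeatability score of step S3: correctly corresponding key points divided by
    the total number of key points in the pair. kps1, kps2: (N, 2) arrays of (x, y);
    H_gt: 3x3 ground-truth homography mapping image 1 onto image 2."""
    warped = cv2.perspectiveTransform(
        kps1.reshape(-1, 1, 2).astype(np.float32), H_gt).reshape(-1, 2)
    d = np.linalg.norm(warped[:, None, :] - kps2[None, :, :], axis=-1)
    correct = (d.min(axis=1) < eps).sum() + (d.min(axis=0) < eps).sum()
    return correct / (len(kps1) + len(kps2))

def homography_corner_error(H_est, H_gt, h, w):
    """Indirect comparison of step S5: mean distance between the four image corners
    warped by the RANSAC-estimated and by the ground-truth homography."""
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    c_est = cv2.perspectiveTransform(corners, H_est)
    c_gt = cv2.perspectiveTransform(corners, H_gt)
    return float(np.linalg.norm(c_est - c_gt, axis=-1).mean())
```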
Preferably, the local feature extraction method includes a descriptor, a detector and a loss function, wherein:
the descriptor comprises a Homography Convolutional Network (HCN) and a feature description step; the descriptor operates on the original image and finally obtains dense descriptors with the same resolution as the original image;
the detector comprises a detector CNN network and a key point extraction step; the detector operates on the tensor F obtained by the HCN and finally obtains sparse key point positions;
the loss function:
In order to jointly optimize the detector and the descriptor, the loss function is composed of two intermediate losses, namely a detection loss function and a description loss function. The detection loss function drives the network to produce repeatable key-point positions that are covariant with viewpoint or illumination changes, while the description loss function drives the network to output highly distinctive descriptors that yield reliable matches; jointly optimizing the two losses improves the effect and performance of the detector and the descriptor simultaneously.
Preferably, the Homography Convolutional Network (HCN):
The HCN receives input original image data and uses its homography estimation module to predict different transformations of the original image; the transformed images are provided to the full convolution network instead of forcing the full convolution network to learn the extra geometric changes, so that the network can learn more information of the original image, and the tensor F is obtained;
The feature description is as follows:
① The tensor F ∈ R^(H′×W′×D) computed by the HCN is taken as input, and a tensor O ∈ R^(H×W×D) is output by bicubic interpolation;
② A normalized descriptor vector d is obtained by L2 normalization:
d_ij = o_ij / ‖o_ij‖_2
where i = 1, …, H, j = 1, …, W, H′ = H/4, W′ = W/4, H and W are the height and width of the original image respectively, and D = 256. These descriptor vectors can easily be matched between images by Euclidean distance, thereby obtaining reliable correspondences;
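A minimal sketch of this feature-description step, assuming a PyTorch implementation with a (B, D, H′, W′) tensor layout; the function name and layout are assumptions.

```python
import torch.nn.functional as F_nn

def describe(F_tensor, image_hw):
    """F_tensor: (B, D, H/4, W/4) dense features from the HCN; image_hw: (H, W)."""
    # bicubic upsampling back to the original image resolution
    O = F_nn.interpolate(F_tensor, size=image_hw, mode="bicubic", align_corners=False)
    # per-pixel L2 normalization: d_ij = o_ij / ||o_ij||_2
    return F_nn.normalize(O, p=2, dim=1)      # (B, D, H, W) unit-norm descriptors
```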
the detector CNN network:
The detector CNN network aims to output a pixel-level detection score, where the detection score represents the probability that a position is a key point. The tensor F is input to the detector CNN network to obtain the detection score of every pixel of the original image data. The detector CNN network consists of one convolution layer and two up-convolution layers; the spatial resolution is gradually increased while the number of channels is gradually reduced, and the final result is obtained through a sigmoid activation function;
and (3) extracting the key points:
The key point extraction aims to output sparse key point positions; it takes the detection scores produced by the detector CNN network as input and uses non-maximum suppression (NMS) and a Top-K operation to obtain a specified number of feature points.
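An illustrative sketch of the key-point extraction (NMS followed by Top-K) on the detection score map; the NMS window radius and the default K are assumptions.

```python
import torch
import torch.nn.functional as F_nn

def extract_keypoints(score, k=1000, nms_radius=4):
    """score: (B, 1, H, W) detection scores in [0, 1]; returns a list of (K, 2) (y, x) tensors."""
    pooled = F_nn.max_pool2d(score, kernel_size=2 * nms_radius + 1,
                             stride=1, padding=nms_radius)
    score = torch.where(score == pooled, score, torch.zeros_like(score))  # non-maximum suppression
    keypoints = []
    for s in score:                              # per image in the batch
        W = s.shape[-1]
        _, idx = s.flatten().topk(k)             # Top-K strongest responses (assumes k <= H*W)
        ys = torch.div(idx, W, rounding_mode="floor")
        xs = idx % W
        keypoints.append(torch.stack([ys, xs], dim=1))
    return keypoints
```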
Preferably, the homography estimation module consists of a convolution layer and a linear layer; after passing through the network layers of the homography estimation module, 6 × N_h parameters are predicted from the original image data and are used to obtain the homography transformation matrices;
wherein 1 × N_h parameters are used to compute the scale transformation, 2 × N_h parameters are used to compute the rotation transformation, and 3 × N_h parameters are used to compute the perspective transformation;
the scale can be derived from one parameter:
λ(α) = exp(tanh(α));
the rotation can be computed from two parameters by the following formula:
θ(α, β) = arctan2(tanh(α), tanh(β));
for the perspective transformation matrix A, three parameters processed by the tanh activation function give its representation (a_1, a_2, a_3). Thus, N_h homography transformation matrices can be obtained from the 6 × N_h parameters, where N_h is a hyper-parameter; considering the efficiency and effectiveness of the network, N_h = 4 is set;
Specifically, four corners of the image are set as initial points
x=[(-1,-1),(1,-1),(1,1),(-1,1)],
Four corresponding points are then predicted using the homography estimation module, where the corresponding initial point transform can be expressed as:
[the expression for the transformed corner points x′ is given as a formula image in the original and is not reproduced here]
the homography transformation matrix H is computed from these 4 pairs of corresponding points x and x' in a differentiable manner using the Tensor direct linear transformation (Tensor DLT) as follows:
x′ = Hx.
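The parameter activations and the differentiable DLT can be sketched as follows. How the six activated parameters displace the four corner points is the module's learned mapping and is only represented by a stand-in here; all names are assumptions.

```python
import torch

def activate_params(p):
    """p: (B, 6) raw network outputs -> scale, rotation angle and perspective terms."""
    lam = torch.exp(torch.tanh(p[:, 0]))                           # lambda(alpha) = exp(tanh(alpha))
    theta = torch.atan2(torch.tanh(p[:, 1]), torch.tanh(p[:, 2]))  # theta(alpha, beta)
    persp = torch.tanh(p[:, 3:6])                                  # (a1, a2, a3)
    return lam, theta, persp

def dlt_homography(x, x_prime):
    """Differentiable DLT: x, x_prime are (B, 4, 2) corresponding points; returns (B, 3, 3) with x' = Hx."""
    B = x.shape[0]
    rows = []
    for i in range(4):
        u, v = x[:, i, 0], x[:, i, 1]
        up, vp = x_prime[:, i, 0], x_prime[:, i, 1]
        zero, one = torch.zeros_like(u), torch.ones_like(u)
        rows.append(torch.stack([-u, -v, -one, zero, zero, zero, u * up, v * up, up], dim=1))
        rows.append(torch.stack([zero, zero, zero, -u, -v, -one, u * vp, v * vp, vp], dim=1))
    A = torch.stack(rows, dim=1)                 # (B, 8, 9) linear system A h = 0
    _, _, Vh = torch.linalg.svd(A)               # null space = last right singular vector
    H = Vh[:, -1, :].reshape(B, 3, 3)
    return H / H[:, 2:3, 2:3]                    # normalize so that H[2, 2] = 1
```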
preferably, the detector performs inverse gradient update using a detection loss function;
the detection loss function calculation process is as follows:
Given a pair of real images I_1 and I_2 and a ground-truth correspondence denoted w(·), such that I_1 = w(I_2), i.e. through this w(·) all the pixels of the first image I_1 can be found in the second image I_2. The image pair I_1 and I_2 is input to the network to obtain the detection scores S_1 and S_2. Defining G_1 and G_2 as the ground-truth key-point labels, the detection loss function L_det is defined by a cross-entropy loss:
L_det = L_s(S_1, G_1) + L_s(S_2, G_2)
[the per-pixel cross-entropy expression for L_s, indexed by the coordinate positions (i, j), is given as a formula image in the original]
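Since the explicit expression of L_s is provided only as an image, the sketch below assumes the common per-pixel binary cross-entropy reading of the description above; the averaging over pixels is an assumption.

```python
import torch

def detection_loss(S1, G1, S2, G2, eps=1e-8):
    """S1, S2: (B, 1, H, W) sigmoid score maps; G1, G2: binary ground-truth key-point labels."""
    def Ls(S, G):   # per-pixel cross-entropy between scores and labels
        return -(G * torch.log(S + eps) + (1 - G) * torch.log(1 - S + eps)).mean()
    return Ls(S1, G1) + Ls(S2, G2)               # L_det = L_s(S1, G1) + L_s(S2, G2)
```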
Preferably, the descriptor performs inverse gradient update by using a description loss function;
the description loss function is calculated as follows:
The description loss function is based on a modified hardest-contrastive loss, which is modified with a stricter negative distance: it minimizes the distance between positive (corresponding) examples and maximizes the distance to the nearest negative (non-corresponding) example. It is denoted by L_des:
[the explicit expression of L_des is given as a formula image in the original and is not reproduced here]
Here, d_k^1 and d_k^2 denote the k-th pair of corresponding descriptors of the image pair, and K denotes the number of all corresponding descriptors. The positive distance is therefore expressed as:
p(k) = ‖d_k^1 − d_k^2‖_2
where ‖·‖_2 denotes the Euclidean distance. The negative distance is defined as:
[the explicit expression of the negative distance n(k) is given as a formula image in the original and is not reproduced here]
where n(i, j, k) denotes the minimum distance between the descriptor d_k^1 in image I_1 and all non-corresponding descriptors in image I_2, so that the corresponding term picks out the non-corresponding descriptor with the smallest distance to d_k^1. The threshold C is a safety radius set to exclude feature points that are spatially too close to the correct correspondence. Notably, the description loss function takes into account both the negative distance between the pair of images and the negative distance within an image;
Finally, combining the description loss function L_des and the detection loss function L_det, the final loss function is obtained:
L = L_des + L_det
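Because the explicit L_des and n(k) expressions are provided only as images, the following is a hedged sketch of a hardest-contrastive style description loss built from the ingredients named in the text: the positive distance p(k), the hardest non-corresponding (negative) distance, and the safety radius C on pixel positions. The margins m_p and m_n, the element-wise pooling of the two hardest negatives, and mining negatives only among the sampled correspondences are assumptions.

```python
import torch

def description_loss(d1, d2, pos1, pos2, C=8.0, m_p=0.2, m_n=1.0):
    """d1, d2: (K, D) corresponding descriptors; pos1, pos2: (K, 2) their key-point
    positions expressed in a common image frame (e.g. image 2 warped into image 1)."""
    p = (d1 - d2).norm(dim=1)                                  # positive distances p(k)
    dist = torch.cdist(d1, d2)                                 # all pairwise descriptor distances
    too_close = torch.cdist(pos1.float(), pos2.float()) < C    # inside the safety radius C
    dist = dist.masked_fill(too_close, float("inf"))           # exclude near-correct negatives
    n12 = dist.min(dim=1).values                               # hardest negative for each d1_k
    n21 = dist.min(dim=0).values                               # hardest negative for each d2_k
    n = torch.minimum(n12, n21)                                # stricter negative distance n(k)
    return (torch.clamp(p - m_p, min=0) ** 2
            + torch.clamp(m_n - n, min=0) ** 2).mean()
```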
preferably, during network training, the MS-COCO dataset is processed, the resolutions of all images are adjusted to 320 × 240, then the images are converted into gray scales, in order to generate pixel correspondence, a suitable homography transformation matrix is randomly generated for each training sample, the homography transformed images and the images are simultaneously input into the network for training, and simultaneously the positions of the group-route key points are transformed to generate correspondingly transformed group-route key point labels.
Preferably, in the network test, evaluation is performed on the HPatches data set, which contains 116 image sequences, of which 57 sequences exhibit illumination changes and 59 sequences exhibit viewpoint changes. For each sequence, the first image is taken as the reference image and matched with all subsequent images, resulting in 580 image pairs; the HPatches images are processed at a resolution of 240 × 320 and N = 1000 feature points are extracted. The same mutual nearest neighbor (MNN) matching strategy is employed, which is based on nearest-neighbor search: a match is accepted only when the two descriptors are mutual nearest neighbors. In order to emphasize matching accuracy, a threshold ε (ε = 3) on the corresponding pixels is set, i.e. a match with a reprojection error below this threshold is considered a correct match.
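The mutual nearest neighbor matching and the correctness test can be sketched as follows; the brute-force distance computation and the default ε = 3 follow the description above, while the function names are assumptions.

```python
import numpy as np
import cv2

def mnn_match(desc1, desc2):
    """Mutual nearest neighbor matching: keep (i, j) only if i and j are each other's nearest neighbor."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
    nn12, nn21 = d.argmin(axis=1), d.argmin(axis=0)
    idx1 = np.arange(len(desc1))
    mutual = nn21[nn12[idx1]] == idx1
    return np.stack([idx1[mutual], nn12[idx1[mutual]]], axis=1)

def count_correct(matches, kps1, kps2, H_gt, eps=3.0):
    """A match is correct if its reprojection error under the ground-truth homography is below eps."""
    pts = cv2.perspectiveTransform(
        kps1[matches[:, 0]].reshape(-1, 1, 2).astype(np.float32), H_gt).reshape(-1, 2)
    err = np.linalg.norm(pts - kps2[matches[:, 1]], axis=1)
    return int((err < eps).sum())
```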
Compared with the prior art, the invention has the beneficial effects that:
1. Compared with existing methods, which mostly adopt a detect-then-describe or simultaneous detect-and-describe feature matching scheme, the present application first provides more distinctive descriptors and then performs detection, which greatly improves the effectiveness of feature matching. The homography transformation operations in the existing methods aim only at generating more key points and are unrelated to the generation of descriptors, whereas the HCN uses the homography transformation operation to generate more distinctive descriptors; the present application therefore cannot be derived in an obvious way from the prior art. Moreover, the homography transformations of the HCN are obtained through learning, so that the obtained transformations better fit the characteristics of the descriptors and can produce more distinctive descriptors, whereas the transformations in the existing methods are obtained by sampling with a non-learning method, which cannot be applied to feature description and cannot produce distinctive descriptors.
2. The method adopts a CNN network as the detector network to detect key points and combines it with a self-supervised training strategy so that the obtained key points are more repeatable. It adopts a describe-then-detect strategy: by postponing the detection step until after description, more stable key points are obtained.
3. Two new loss functions are designed to further improve the performance of the descriptor and the detector. A similarity loss is proposed to further improve the repeatability of key-point detection, and a hardest-contrastive loss with a stricter negative-distance constraint is adopted to avoid ambiguous regions and achieve more advanced performance. In determining the loss functions, the feature-description and feature-extraction losses act on the network jointly, so the method considers both the description process that yields more distinctive descriptors and the acquisition of more repeatable key points. The extraction performed by the descriptor HCN is associated with the loss of the subsequent feature extraction, and the whole network runs end to end, which saves time; the strategy of superimposing the two losses gives the network good robustness and makes the feature description and feature extraction of a picture after the HCN better correlated. On the one hand, this drives the HCN to generate distinctive descriptors accurately and quickly; on the other hand, it promotes the use of the distinctive descriptors in the key-point detection process, realizing more accurate key-point detection. The feature matching experiments demonstrate the superiority of the method.
Drawings
Fig. 1 is a flow chart of the RDFeat of the present invention.
Fig. 2 is a diagram of the RDFeat network architecture of the present invention.
Fig. 3 is a diagram of the RDFeat training architecture of the present invention.
Fig. 4 is a diagram of a homography estimation module network architecture of the present invention.
Fig. 5 is a diagram of a homographic transformation matrix based on scale, rotation and symmetric perspective estimation in accordance with the present invention.
FIG. 6 is a diagram of the positive and negative distances of the descriptors of the present invention, with double-arrowed lines representing Euclidean distances.
FIG. 7 is a graph of the qualitative results of the HPatches data set of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A local feature extraction method based on deep learning is provided, referred to in full as Repeatable and Distinctive Detection and Description for Learning Local Features (RDFeat), and is used for obtaining reliable matching correspondences between images. Unlike the classic detect-then-describe framework, it adopts a describe-then-detect strategy: by postponing the detection step until after description, more stable key points are obtained. This work focuses on obtaining repeatable key points and distinguishable descriptors. First, a Homography Convolutional Network (HCN) is provided as the descriptor network to estimate dense descriptors and obtain highly distinctive descriptors. Second, a CNN network is used as the detector network to detect key points, combined with a self-supervised training strategy so that the obtained key points are more repeatable. Finally, two new loss functions are designed to further improve the performance of the descriptor and the detector. RDFeat is trained on the MS-COCO image data set and then evaluated on several benchmark data sets, and the experimental results show that its performance is superior to the latest methods.
As mentioned above, most existing feature extraction methods are detect-then-describe or simultaneous detect-and-describe feature matching methods. The purpose of the homography transformation operations in these methods is only to allow more key points to be generated, independent of the generation of descriptors; therefore they cannot generate distinctive descriptors, and it is difficult for them to improve the effectiveness of feature matching. Our HCN, by contrast, uses a homography transformation operation to generate more distinctive descriptors, so the present application cannot be derived from the prior art. Furthermore, the homography transformations of the HCN are obtained through learning, so that the obtained transformations are more consistent with the characteristics of the descriptors and can generate more distinctive descriptors, whereas the transformations in the compared documents are obtained by sampling with a non-learning method, which cannot be applied to feature description and cannot generate distinctive descriptors. A specific scheme of the present application is introduced below.
Referring to fig. 1-7, the present invention provides a technical solution:
A local feature extraction method based on deep learning, also called RDFeat, comprises the following steps:
S1, network training is performed first
The network is trained on the MS-COCO image data set, which is split into a training set and a validation set containing 82,783 and 40,504 images respectively;
S2, image matching is then performed
In the experiments, the performance of the local feature extraction method is evaluated using a standard local feature pipeline, which extracts and matches features from any given pair of images;
S3, the repeatability score (Repeatability) is computed
The repeatability score is used to evaluate the performance of the detector in the local feature extraction method. More specifically, let ε denote the correct-distance threshold for obtaining correct key-point correspondences between the two detected images in an experiment; the repeatability score is defined as the number of correctly corresponding key points divided by the total number of key points in the image pair;
S4, the matching score (M-Score) is then computed
The matching score is used to evaluate the combined performance of the detector and the descriptor in the local feature extraction method; it is the ratio of the number of correct matches obtained by the matching strategy of the standard local feature pipeline to the total number of matches;
S5, finally, the homography estimation effect is evaluated
The homography estimation effect is used to evaluate the ability of the local feature extraction method to estimate a homography matrix; the homography estimation itself is computed with RANSAC;
the homography estimation effect adopts an indirect comparison in order to accommodate homography matrices of different scales: it measures the average distance between the four image corners transformed by the homography matrix estimated with RANSAC and by the ground-truth homography matrix.
In this embodiment, the local feature extraction method includes a descriptor, a detector, and a loss function, where:
the descriptor comprises a Homography Convolutional Network (HCN) and a feature description step; the descriptor operates on the original image and finally obtains dense descriptors with the same resolution as the original image;
the detector comprises a detector CNN network and a key point extraction step; the detector operates on the tensor F obtained by the HCN and finally obtains sparse key point positions;
The method adopts a describe-then-detect strategy, postponing the detection step until after description so as to obtain more stable key points; after more distinctive descriptors are obtained, key points with high repeatability are obtained in the key-point detection process of the picture by means of a self-supervised mode.
The loss function:
In order to jointly optimize the detector and the descriptor, the loss function is composed of two intermediate losses, namely a detection loss function and a description loss function. The detection loss function drives the network to produce repeatable key-point positions that are covariant with viewpoint or illumination changes, while the description loss function drives the network to output highly distinctive descriptors that yield reliable matches; jointly optimizing the two losses improves the effect and performance of the detector and the descriptor simultaneously.
In this embodiment, the Homography Convolutional Network (HCN):
The HCN receives input original image data and uses its homography estimation module to predict different image transformations; the transformed images are provided to the CNN network instead of forcing the CNN network to learn the extra geometric changes, so that the CNN network can learn more image information and the tensor F is obtained;
As shown in fig. 2, the HCN takes an original image as input and then uses the homography estimation module to obtain N_h homography matrices, which transform the image I into a set of transformed images, where H(I) represents the image I transformed by the homography matrix H. We then apply a full convolution network Q as the descriptor extraction network to extract a dense descriptor f from each transformed image, defined as:
f = Q(H(I))
finally, the different dense feature maps are inversely transformed back and fused into one dense feature map by averaging:
F = (1/N_h) · Σ_{i=1}^{N_h} H_i^{-1}( Q( H_i(I) ) )
This is done for two reasons: first, such a method allows the deep network to learn more of the geometric information of the image, and second, it improves the distinctiveness of the descriptors under different geometric variations. That is, for descriptors of corresponding positions (matches) the vector representations will be sufficiently similar (the Euclidean distance is sufficiently small), and for descriptors of non-corresponding positions (non-matches) the Euclidean distance is sufficiently large, thereby improving the accuracy of image matching. In practice, the full convolution network Q uses a VGG-style encoder consisting of convolution layers, pooling layers and activation functions; note that our encoder uses two max-pooling layers to reduce the resolution to 1/4, and all convolution layers are zero-padded to produce the same output size. We define H × W as the resolution of the input image, where H′ = H/4 and W′ = W/4, and the output tensor is defined as
F ∈ R^(H′×W′×D)
Wherein D is the number of channels;
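A minimal sketch of the HCN forward pass just described, assuming kornia's warp_perspective is available for the differentiable warps; Q and estimate_homographies are stand-ins for the descriptor extraction network and the homography estimation module, and rescaling the homography to feature-map resolution is an implementation assumption.

```python
import torch
from kornia.geometry.transform import warp_perspective

def hcn_forward(image, Q, estimate_homographies):
    """image: (B, 1, H, W); Q: full convolution descriptor network (downsamples by 4);
    estimate_homographies: module returning (B, N_h, 3, 3) homographies."""
    B, _, H, W = image.shape
    Hs = estimate_homographies(image)
    scale = torch.tensor([[0.25, 0, 0], [0, 0.25, 0], [0, 0, 1.0]], device=image.device)
    feats = []
    for i in range(Hs.shape[1]):
        Hi = Hs[:, i]
        warped = warp_perspective(image, Hi, dsize=(H, W))              # H_i(I)
        f = Q(warped)                                                   # f_i = Q(H_i(I)), at 1/4 resolution
        Hi_inv_feat = scale @ torch.inverse(Hi) @ torch.inverse(scale)  # inverse warp at feature scale
        f = warp_perspective(f, Hi_inv_feat, dsize=tuple(f.shape[-2:]))
        feats.append(f)
    return torch.stack(feats, dim=0).mean(dim=0)                        # fuse by averaging -> tensor F
```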
The descriptors obtained by the Homography Convolutional Network (HCN) have scale, rotation and affine invariance. Although a CNN descriptor can also show a certain degree of scale invariance after training, scale invariance is not an inherent property of a CNN; when the scale change is large or the viewing angle changes, the matching effect of CNN descriptors is greatly affected. To handle this limitation, D2-Net uses an image pyramid model to make the descriptors more robust to scale changes, but ignores other geometric changes. Furthermore, LF-Net learns different scales and orientations of feature points and then uses differentiable image patches to compute robust descriptors. In addition, ASLFeat uses a deformable convolutional network (DCN) to predict and apply dense spatial transformations, thereby obtaining the capability of handling geometric changes. In our work, we input images under different transformations into the full convolution network instead of forcing the full convolution network to learn the extra geometric changes, so that the full convolution network can learn more image information; more distinctive descriptors can thus be obtained, and the image matching effect is improved;
A homography describes the mapping of an object's position between the pixel coordinate systems of an image pair, so camera motion with rotation and translation can easily be modeled with a homography. In addition, a homography can easily be estimated from a pair of images, and it is a good model for relating the same physical position of an object; for this reason, homographies are used in our method to model geometric changes;
The feature description is as follows:
① The tensor F ∈ R^(H′×W′×D) computed by the HCN is taken as input, and a tensor O ∈ R^(H×W×D) is output by bicubic interpolation;
② A normalized descriptor vector d is obtained by L2 normalization:
d_ij = o_ij / ‖o_ij‖_2
where i = 1, …, H, j = 1, …, W, H′ = H/4, W′ = W/4, and H and W are the height and width of the original image respectively; D = 256. These descriptor vectors can easily be matched between images by Euclidean distance, thereby obtaining reliable correspondences;
the detector CNN network:
The detector CNN network aims to output a pixel-level detection score, where the detection score represents the probability that a position is a key point. The tensor F is input to the detector CNN network to obtain the detection score of every pixel of the original image data. The detector CNN network consists of one convolution layer and two up-convolution layers; the spatial resolution is gradually increased while the number of channels is gradually reduced, and the final result is obtained through a sigmoid activation function;
A descriptor extraction module is developed by adopting a structure similar to U-Net; although this method introduces additional learned weights, more stable and accurate key points can be obtained, which is reflected in the repeatability of the key points. Meanwhile, different loss functions are proposed to further improve the network performance.
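An illustrative PyTorch module matching the detector description above (one convolution followed by two up-convolutions, ending in a sigmoid score map); the channel widths are assumptions.

```python
import torch.nn as nn

class DetectorCNN(nn.Module):
    """Maps the HCN tensor F (B, 256, H/4, W/4) to a (B, 1, H, W) detection score map."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),                      # per-pixel key-point probability
        )

    def forward(self, F_tensor):
        return self.net(F_tensor)
```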
And (3) extracting the key points:
The key point extraction aims to output sparse key point positions; it takes the detection scores produced by the detector CNN network as input and uses non-maximum suppression (NMS) and a Top-K operation to obtain a specified number of feature points.
In this embodiment, the homography estimation module consists of a convolution layer and a linear layer; after passing through the network layers of the homography estimation module, 6 × N_h parameters are predicted from the original image data and are used to obtain the homography transformation matrices;
wherein 1 × N_h parameters are used to compute the scale transformation, 2 × N_h parameters are used to compute the rotation transformation, and 3 × N_h parameters are used to compute the perspective transformation;
the scale can be derived from one parameter:
λ(α) = exp(tanh(α));
the rotation can be computed from two parameters by the following formula:
θ(α, β) = arctan2(tanh(α), tanh(β));
for the perspective transformation matrix A, three parameters processed by the tanh activation function give its representation (a_1, a_2, a_3). Thus, N_h homography transformation matrices can be obtained from the 6 × N_h parameters, where N_h is a hyper-parameter; considering the efficiency and effectiveness of the network, N_h = 4 is set;
Specifically, four corners of the image are set as initial points
x = [(-1,-1), (1,-1), (1,1), (-1,1)],
Four corresponding points are then predicted using the homography estimation module, where the corresponding initial point transform can be expressed as:
[the expression for the transformed corner points x′ is given as a formula image in the original and is not reproduced here]
the homography transformation matrix H is computed from these 4 pairs of corresponding points x and x′ in a differentiable manner using the Tensor direct linear transformation (Tensor DLT) as follows:
x′ = Hx.
The main idea of the homography estimation module is to use transformation matrices to convert the original picture into 4 pictures different from the original (subjected to scale, rotation and symmetric perspective changes). Each transformation matrix is controlled by 6 parameters, of which 1 parameter controls the scale, 2 parameters control the rotation and 3 parameters control the symmetric perspective, thereby obtaining a homography transformation matrix H;
and (3) coordinate corresponding process of the homography estimation module:
The difficulty of current methods lies in that a suitable matrix cannot be found to directly model the transformation. By restricting the transformation to these 3 types, the present method provides a matrix with 6 parameters, which is not recorded in the prior art. From the implementation point of view, the inverse matrix H′ of the transformation matrix H of the method can easily be found: the transformation of the original image is realized with the transformation matrix H, and the result can then be transformed back to the original image through the inverse matrix H′. Specifically, the method finds the corresponding position coordinates in the transformed image by multiplying the coordinate points of the original image with the obtained H matrix. Descriptors are then extracted from the transformed image, and the position coordinates of the descriptors extracted from the transformed image can be inversely transformed back to the original image using H′; that is, the descriptors extracted from the 4 transformed images can be transformed back to their original positions, thereby enhancing the descriptor information;
in this embodiment, the detector performs inverse gradient update using a detection loss function;
the detection loss function calculation process is as follows:
Given a pair of real images I_1 and I_2 and a ground-truth correspondence denoted w(·), such that I_1 = w(I_2), i.e. through this w(·) the pixels of image I_1 can be found in image I_2. The image pair I_1 and I_2 is input to the network to obtain the detection scores S_1 and S_2. Defining G_1 and G_2 as the ground-truth key-point labels, the detection loss function L_det is defined by a cross-entropy loss:
L_det = L_s(S_1, G_1) + L_s(S_2, G_2)
[the per-pixel cross-entropy expression for L_s is given as a formula image in the original]
Because it is difficult to determine the position of a ground-truth key point, conventional supervised training cannot solve the feature detection problem; as observed in previous work, there is no strict standard defining which positions are key points. We therefore solve the problem with the self-supervised strategy proposed in SuperPoint and supervise the network with the ground-truth key points generated by MagicPoint. MagicPoint is trained on the Synthetic Shapes data set and then extended to real images using the homographic adaptation technique; MagicPoint shows excellent performance in key-point detection, as demonstrated by quantitative indicators such as mean average precision (mAP) and repeatability.
The descriptor updates the inverse gradient by adopting a description loss function;
the description loss function is calculated as follows:
The description loss function is based on a modified hardest-contrastive loss, which is modified with a stricter negative distance: it minimizes the distance between positive (corresponding) examples and maximizes the distance to the nearest negative (non-corresponding) example. It is denoted by L_des:
[the explicit expression of L_des is given as a formula image in the original and is not reproduced here]
Here, d_k^1 and d_k^2 denote the k-th pair of corresponding descriptors of the image pair, and K denotes the number of all corresponding descriptors. The positive distance is therefore expressed as:
p(k) = ‖d_k^1 − d_k^2‖_2
where ‖·‖_2 denotes the Euclidean distance. The negative distance is defined as:
[the explicit expression of the negative distance n(k) is given as a formula image in the original and is not reproduced here]
where n(i, j, k) denotes the minimum distance between the descriptor d_k^1 in image I_1 and all non-corresponding descriptors in image I_2, so that the corresponding term picks out the non-corresponding descriptor with the smallest distance to d_k^1. The threshold C is a safety radius set to exclude feature points that are spatially too close to the correct correspondence. Notably, the description loss function takes into account both the negative distance between the pair of images and the negative distance within an image;
Finally, combining the description loss function L_des and the detection loss function L_det, the final loss function is obtained:
L = L_des + L_det
Final loss function: to jointly optimize the detector and the descriptor, we propose a final loss function consisting of two intermediate losses, the detection loss and the description loss. For detection, we want the network to produce repeatable key-point locations that are covariant with viewpoint or illumination; for description, we want the network to output highly distinctive descriptors that can obtain reliable matches. We therefore jointly optimize the two losses while improving the effect and performance of the detector and the descriptor.
Through the joint action of the feature description and the feature extraction in our network, the method considers both the description process that yields more distinctive descriptors and the acquisition of more repeatable key points. Associating the extraction of the descriptor HCN with the loss of the subsequent feature extraction makes the whole network operate end to end, which saves time, and the strategy of superimposing the two losses gives the network good robustness, so that the feature description and feature extraction of a picture after the HCN are better correlated. On the one hand, this drives the HCN to generate distinctive descriptors accurately and quickly; on the other hand, it promotes the use of the distinctive descriptors in the key-point detection process, realizing more accurate key-point detection. The feature matching experiments demonstrate the superiority of the method.
In this embodiment, during network training on the MS-COCO data set, the resolution of all images is adjusted to 320 × 240 and the images are then converted to gray scale. In order to generate pixel correspondences, a suitable homography transformation matrix is randomly generated for each training sample; the homography-transformed image and the image itself are input into the network simultaneously for training, while the ground-truth key-point positions are transformed to generate the correspondingly transformed ground-truth key-point labels. It should be noted that the random generation of the homography transformation matrix is limited to a reasonable range (generally determined using the settings of SuperPoint) to simulate real-world camera transformations and avoid extreme situations.
In this embodiment, during network testing, evaluation is performed on the HPatches data set, which contains 116 image sequences, 57 of which exhibit illumination changes and 59 of which exhibit viewpoint changes. For each sequence, the first image serves as the reference image and is matched with all subsequent images, giving 580 image pairs. The same mutual nearest neighbor (MNN) matching strategy is employed; the MNN matching strategy is based on nearest-neighbor search, that is, two descriptors are accepted as a match only when they are mutual nearest neighbors. In order to emphasize matching accuracy, a threshold ε (ε = 3) on the corresponding pixels is set, i.e. a match with a reprojection error below this threshold is considered a correct match;
for a fair comparison, all methods are evaluated at a resolution of 240 × 320 with N = 1000 extracted feature points.
TABLE 1 Evaluation results on HPatches (the table itself is provided as an image in the original and is not reproduced here)
As shown in Table 1, our RDFeat is superior to all other methods on almost all indicators. SIFT has the best homography estimation capability at a low error threshold (ε = 1) due to its higher sub-pixel accuracy; when the threshold is larger, RDFeat estimates the homography matrix better. It should be noted that RDFeat and SuperPoint are trained on the same data set, but RDFeat achieves better repeatability and matching scores, proving its superiority;
FIG. 7 shows qualitative results on the HPatches data set; a large shaded area indicates more correct matches and a small shaded area indicates fewer correct matches. Compared with SuperPoint, SIFT and ORB, RDFeat produces the most correct matches, covering the whole image even under extreme rotation and affine changes. Although ORB performs as well as RDFeat in repeatability, its detections tend to form sparse clusters, so it performs poorly on the homography estimation task.
The processing results show that, according to the above scheme, the invention can jointly learn the feature detector and the descriptor in a novel deep network architecture that follows the describe-then-detect approach and combines the learned feature detector and descriptor. Three innovations are provided in feature description, feature detection and the loss functions, which remarkably improve the distinctiveness of the descriptors and the repeatability of the key points. Specifically, we provide a novel HCN to extract dense descriptors, which can collect more geometric image information under different transformations and realize scale-, rotation- and affine-invariant descriptors. In addition, we develop a detector CNN network based on a self-supervised training strategy, realizing effective detection of stable key points. Moreover, considering the different optimization goals of the detector and the descriptor, we design two loss functions to improve the feature performance. Finally, we perform a comprehensive evaluation on several benchmark data sets, and the experimental results show that RDFeat achieves impressive performance. The feature-description and feature-extraction losses act jointly in our network, so the method considers both the description process of more distinctive descriptors and the acquisition of more repeatable key points; the extraction of the descriptor HCN is associated with the loss of the subsequent feature extraction, the whole network operates end to end, time is saved, and the strategy of superimposing the two losses gives the network good robustness, so that the feature description and feature extraction of a picture after the HCN are better correlated. On the one hand, this drives the HCN to generate distinctive descriptors accurately and quickly; on the other hand, it promotes the use of the distinctive descriptors in the key-point detection process, realizing more accurate key-point detection, and the feature matching experiments demonstrate the superiority of the method.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A local feature extraction method based on deep learning is characterized in that: the local feature extraction method comprises the following steps:
S1, network training is performed first
a pre-constructed network is trained on the MS-COCO image data set, which is divided into a training set and a validation set containing 82,783 and 40,504 images respectively;
S2, image matching is then performed
in the experiments, the performance of the local feature extraction method is evaluated using a standard local feature pipeline, which extracts and matches features from any given pair of images;
S3, the repeatability score is computed
the repeatability score is used to evaluate the performance of the detector in the local feature extraction method; more specifically, let ε denote the correct-distance threshold for obtaining correct key-point correspondences between the two detected images in an experiment, and the repeatability score is defined as the number of correctly corresponding key points divided by the total number of key points in the image pair;
S4, the matching score M-Score is then computed
the matching score is used to evaluate the combined performance of the detector and the descriptor in the local feature extraction method, and is the ratio of the number of correct matches obtained by the matching strategy of the standard local feature pipeline to the total number of matches;
S5, finally, the homography estimation effect is evaluated
the homography estimation effect is used to evaluate the ability of the local feature extraction method to estimate a homography matrix, the homography estimation itself being computed with RANSAC;
the homography estimation effect evaluation adopts an indirect comparison in order to accommodate homography matrices of different scales, measuring the average distance between the four image corners transformed by the homography matrix estimated with RANSAC and by the ground-truth homography matrix.
2. The local feature extraction method based on deep learning of claim 1, wherein:
the local feature extraction method comprises a descriptor, a detector and a loss function, wherein:
the descriptor comprises a homography convolutional network HCN and a feature description step; the descriptor operates on the original image and finally obtains dense descriptors with the same resolution as the original image;
the detector comprises a detector CNN network and key point extraction, and the detector operates tensor F obtained by the HCN to finally obtain sparse key point positions;
the loss function:
In order to jointly optimize the detector and the descriptor, the loss function is composed of two intermediate losses, namely a detection loss function and a description loss function. The detection loss function drives the network to produce repeatable key-point positions that are covariant with viewpoint or illumination changes, while the description loss function drives the network to output highly distinctive descriptors that yield reliable matches; jointly optimizing the two losses improves the effect and performance of the detector and the descriptor simultaneously.
3. The local feature extraction method based on deep learning according to claim 2, wherein:
the homography convolutional network HCN:
The HCN receives input original image data and uses its homography estimation module to predict different transformations of the original image; the transformed images are provided to the full convolution network instead of forcing the full convolution network to learn the extra geometric changes, so that the network can learn more information of the original image, and the tensor F is obtained;
The feature description is as follows:
① The tensor F ∈ R^(H′×W′×D) computed by the HCN is taken as input, and a tensor O ∈ R^(H×W×D) is output by bicubic interpolation;
② A normalized descriptor vector d is obtained by L2 normalization:
d_ij = o_ij / ‖o_ij‖_2
where i = 1, …, H, j = 1, …, W, H′ = H/4, W′ = W/4, and H and W are the height and width of the original image respectively; D = 256. These descriptor vectors can easily be matched between images by Euclidean distance, thereby obtaining reliable correspondences;
the detector CNN network:
The detector CNN network aims to output a pixel-level detection score, where the detection score represents the probability that a position is a key point. The tensor F is input to the detector CNN network to obtain the detection score of every pixel of the original image data. The detector CNN network consists of one convolution layer and two up-convolution layers; the spatial resolution is gradually increased while the number of channels is gradually reduced, and the final result is obtained through a sigmoid activation function;
and (3) extracting the key points:
The key point extraction aims to output sparse key point positions; it takes the detection scores produced by the detector CNN network as input and uses non-maximum suppression (NMS) and a Top-K operation to obtain a specified number of feature points.
4. The method of claim 3, wherein the local feature extraction method based on deep learning is characterized in that:
the homography estimation module consists of a convolution layer and a linear layer, and original image data is predicted to be 6 multiplied by N after passing through a network layer of the homography estimation modulehA parameter for obtaining a homography transformation matrix;
wherein, 1 XNhOne parameter for calculating the scale transformation, 2 XNhOne parameter for calculating the rotation transformation, 3 XNhThe parameters are used for calculating perspective transformation;
the scale can be derived from one parameter:
λ(α)=exp(tanh(α));
for rotation, it can be calculated from two parameters by the following formula:
θ(α,β)=arctan2(tanh(α),tanh(β));
for the perspective transformation matrix A, three parameters can be processed by tanh activation function for representation (a)1,a2,a3) Thus, 6 XNhN can be obtained from one parameterhA homographic transformation matrix, NhIs a hyper-parameter, and sets N in consideration of the efficiency and effectiveness of the networkh=4;
Specifically, four corners of the image are set as initial points
x=[(-1,-1),(1,-1),(1,1),(-1,1)],
Four corresponding points are then predicted using the homography estimation module, where the corresponding initial point transform can be expressed as:
[the expression for the transformed corner points x′ is given as a formula image in the original and is not reproduced here]
the homography transformation matrix H is computed from these 4 pairs of corresponding points x and x′ in a differentiable manner using the Tensor direct linear transformation (Tensor DLT) as follows:
x′ = Hx.
5. the local feature extraction method based on deep learning according to claim 2, wherein:
the detector adopts a detection loss function to update the reverse gradient;
the detection loss function calculation process is as follows:
Given a pair of real images I_1 and I_2 and a ground-truth correspondence denoted w(·), such that I_1 = w(I_2), i.e. through the function w(·), the pixels of image I_1 can be found in image I_2. The image pair I_1 and I_2 is input to the network to obtain the detection scores S_1 and S_2. Defining G_1 and G_2 as the corresponding ground-truth key-point labels, the detection loss function L_det is defined by a cross-entropy loss:
L_det = L_s(S_1, G_1) + L_s(S_2, G_2)
[the per-pixel cross-entropy expression for L_s, indexed by the coordinate positions (i, j), is given as a formula image in the original]
6. The method of claim 5, wherein the local feature extraction method based on deep learning is characterized in that:
the descriptor updates the inverse gradient by adopting a description loss function;
the description loss function is calculated as follows:
The description loss function is based on a modified hardest-contrastive loss, which is modified with a stricter negative distance: it minimizes the distance between positive (corresponding) examples and maximizes the distance to the nearest negative (non-corresponding) example. It is denoted by L_des:
[the explicit expression of L_des is given as a formula image in the original and is not reproduced here]
Here, d_k^1 and d_k^2 denote the k-th pair of corresponding descriptors of the image pair, and K denotes the number of all corresponding descriptors. The positive distance is therefore expressed as:
p(k) = ‖d_k^1 − d_k^2‖_2
where ‖·‖_2 denotes the Euclidean distance. The negative distance is defined as:
[the explicit expression of the negative distance n(k) is given as a formula image in the original and is not reproduced here]
where n(i, j, k) denotes the minimum distance between the descriptor d_k^i in image I_i and all non-corresponding descriptors in image I_j, so that the corresponding term picks out the non-corresponding descriptor with the smallest distance to d_k^i. The threshold C is a safety radius set to exclude feature points that are spatially too close to the correct correspondence. Notably, the description loss function takes into account both the negative distance between the pair of images and the negative distance within an image;
finally, combining the description loss function L_des and the detection loss function L_det, the final loss function is obtained:
L = L_des + L_det
7. The local feature extraction method based on deep learning according to any one of claims 1-6, characterized in that: during network training, the resolution of all images of the MS-COCO data set is adjusted to 320 × 240 and the images are then converted to gray scale. In order to generate pixel correspondences, a suitable homography transformation matrix is randomly generated for each training sample; the homography-transformed image and the image itself are input into the network simultaneously for training, while the ground-truth key-point positions are transformed accordingly to generate the correspondingly transformed ground-truth key-point labels.
8. The local feature extraction method based on deep learning according to any one of claims 1-6, wherein: in the network test, evaluation is performed on the HPatches data set, which contains 116 image sequences, of which 57 sequences exhibit illumination changes and 59 sequences exhibit viewing angle changes; for each sequence, the first image is taken as the reference image and matched against all subsequent images, yielding 580 image pairs; evaluation is carried out at a resolution of 240 × 320 with N = 1000 extracted feature points, and the mutual nearest neighbor (MNN) matching strategy is employed, which is based on nearest neighbor search, i.e. a match is accepted only when the two descriptors are each other's nearest neighbors; to emphasize matching accuracy, a threshold ε (ε = 3) is set for the corresponding pixels, i.e. a match whose reprojection error is below this threshold is considered a correct match.
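For illustration only, a NumPy sketch of the mutual nearest neighbor matching and the reprojection-error check described above (function names are assumptions):

```python
import numpy as np

def mutual_nearest_matches(desc1, desc2):
    """Accept a match (i, j) only if desc1[i] and desc2[j] are each other's nearest neighbors."""
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)   # (N1, N2)
    nn12 = dist.argmin(axis=1)   # nearest neighbor in image 2 for each descriptor of image 1
    nn21 = dist.argmin(axis=0)   # nearest neighbor in image 1 for each descriptor of image 2
    ids1 = np.arange(len(desc1))
    mutual = nn21[nn12] == ids1  # mutual consistency check
    return np.stack([ids1[mutual], nn12[mutual]], axis=1)

def count_correct(matches, kp1, kp2, H_gt, eps=3.0):
    """Count matches whose reprojection error under the ground-truth homography is below eps pixels."""
    pts = np.concatenate([kp1[matches[:, 0]], np.ones((len(matches), 1))], axis=1)  # homogeneous coords
    proj = (H_gt @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]          # back to inhomogeneous coordinates
    err = np.linalg.norm(proj - kp2[matches[:, 1]], axis=1)
    return int((err < eps).sum())
```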
CN202110611600.1A 2021-06-02 2021-06-02 Local feature extraction method based on deep learning Active CN113361542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110611600.1A CN113361542B (en) 2021-06-02 2021-06-02 Local feature extraction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110611600.1A CN113361542B (en) 2021-06-02 2021-06-02 Local feature extraction method based on deep learning

Publications (2)

Publication Number Publication Date
CN113361542A true CN113361542A (en) 2021-09-07
CN113361542B CN113361542B (en) 2022-08-30

Family

ID=77531111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110611600.1A Active CN113361542B (en) 2021-06-02 2021-06-02 Local feature extraction method based on deep learning

Country Status (1)

Country Link
CN (1) CN113361542B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337470A1 (en) * 2016-05-20 2017-11-23 Magic Leap, Inc. Method and system for performing convolutional image transformation estimation
US20190147341A1 (en) * 2017-11-14 2019-05-16 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
CN108629301A (en) * 2018-04-24 2018-10-09 重庆大学 A kind of human motion recognition method based on moving boundaries dense sampling and movement gradient histogram
CN108846861A (en) * 2018-06-12 2018-11-20 广州视源电子科技股份有限公司 Image homography matrix calculation method and device, mobile terminal and storage medium
WO2020187705A1 (en) * 2019-03-15 2020-09-24 Retinai Medical Ag Feature point detection
CN110929748A (en) * 2019-10-12 2020-03-27 杭州电子科技大学 Motion blur image feature matching method based on deep learning
CN111652240A (en) * 2019-12-18 2020-09-11 南京航空航天大学 Image local feature detection and description method based on CNN
CN111401384A (en) * 2020-03-12 2020-07-10 安徽南瑞继远电网技术有限公司 Transformer equipment defect image matching method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANIEL DETONE ET AL: "SuperPoint: Self-Supervised Interest Point Detection and Description", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)》 *
JEROME REVAUD ET AL: "R2D2: Repeatable and Reliable Detector and Descriptor", 《ARXIV》 *
PETER HVIID CHRISTIANSEN ET AL: "UNSUPERPOINT: END-TO-END UNSUPERVISED INTEREST POINT DETECTOR AND DESCRIPTOR", 《ARXIV》 *
贾迪 (JIA Di) et al: "A Review of Image Matching Methods", 《Journal of Image and Graphics》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067100A (en) * 2021-10-29 2022-02-18 厦门大学 Feature point matching method for simultaneously generating detector and descriptor under difficult condition
CN114663594A (en) * 2022-03-25 2022-06-24 中国电信股份有限公司 Image feature point detection method, device, medium, and apparatus
CN115170893A (en) * 2022-08-29 2022-10-11 荣耀终端有限公司 Training method of common-view gear classification network, image sorting method and related equipment
CN115170893B (en) * 2022-08-29 2023-01-31 荣耀终端有限公司 Training method of common-view gear classification network, image sorting method and related equipment
CN115860091A (en) * 2023-02-15 2023-03-28 武汉图科智能科技有限公司 Depth feature descriptor learning method based on orthogonal constraint
CN115860091B (en) * 2023-02-15 2023-04-28 武汉图科智能科技有限公司 Depth feature descriptor learning method based on orthogonal constraint
CN116774154A (en) * 2023-08-23 2023-09-19 吉林大学 Radar signal sorting method
CN116774154B (en) * 2023-08-23 2023-10-31 吉林大学 Radar signal sorting method
CN116881430A (en) * 2023-09-07 2023-10-13 北京上奇数字科技有限公司 Industrial chain identification method and device, electronic equipment and readable storage medium
CN116881430B (en) * 2023-09-07 2023-12-12 北京上奇数字科技有限公司 Industrial chain identification method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN113361542B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN113361542B (en) Local feature extraction method based on deep learning
WO2022002150A1 (en) Method and device for constructing visual point cloud map
CN109684924B (en) Face living body detection method and device
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN111401384B (en) Transformer equipment defect image matching method
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN108776975A (en) Visual tracking method based on semi-supervised feature and filter joint learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN108898269A (en) Electric power image-context impact evaluation method based on measurement
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN111898566B (en) Attitude estimation method, attitude estimation device, electronic equipment and storage medium
CN111523586B (en) Noise-aware-based full-network supervision target detection method
CN109993116B (en) Pedestrian re-identification method based on mutual learning of human bones
CN113011359B (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
CN117237858B (en) Loop detection method
Amiri et al. RASIM: a novel rotation and scale invariant matching of local image interest points
CN114120013A (en) Infrared and RGB cross-modal feature point matching method
CN112070181B (en) Image stream-based cooperative detection method and device and storage medium
CN107291813B (en) Example searching method based on semantic segmentation scene
CN117351194A (en) Graffiti type weak supervision significance target detection method based on complementary graph inference network
CN113158870B (en) Antagonistic training method, system and medium of 2D multi-person gesture estimation network
CN114842506A (en) Human body posture estimation method and system
CN109146861A (en) A kind of improved ORB feature matching method
CN110503061B (en) Multi-feature-fused multi-factor video occlusion area detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant