TWI749821B - Image feature comparison processing method and system - Google Patents


Info

Publication number
TWI749821B
Authority
TW
Taiwan
Prior art keywords
feature
image
algorithm
neural network
feature points
Application number
TW109136733A
Other languages
Chinese (zh)
Other versions
TW202217648A (en)
Inventor
陳冠文
楊曉蒨
陳柏亨
陳永昇
莊仁輝
Original Assignee
國立陽明交通大學
Application filed by 國立陽明交通大學 filed Critical 國立陽明交通大學
Priority to TW109136733A priority Critical patent/TWI749821B/en
Application granted granted Critical
Publication of TWI749821B publication Critical patent/TWI749821B/en
Publication of TW202217648A publication Critical patent/TW202217648A/en

Links

Images

Abstract

An image feature comparison processing method is provided. The method comprises: obtaining a first image and applying a feature matching neural network to process the first image; using a detector to find a plurality of feature points from the first image for matching, and applying a deconvolution algorithm to amplify the plurality of feature points; applying a descriptor to detect feature positions of the plurality of feature points, and applying the deconvolution algorithm to effectively describe the feature positions; and generating a second image according to the plurality of feature points and the effective position description.

Description

Image feature comparison processing method and system

The present invention relates to an image feature comparison processing method and system, and in particular to a method and system applied in the field of computer vision that use deep learning, a branch of artificial intelligence, to solve the feature matching problem.

Existing image feature matching techniques suffer from several problems, the two main ones being how the feature detector is trained and the resolution of the feature descriptor. Regarding detector training, keypoint ground truth must be provided directly before the deep network can be trained. These keypoints are usually generated by hand-crafted detectors, so the trained deep network merely "imitates" the behavior of existing hand-crafted detectors and can hardly learn to "discover" new, more representative feature points. SuperPoint avoids this problem by using self-generated synthetic data to teach the deep network generic keypoint structures such as T-junctions and L-junctions, but because that training takes place only in the synthetic-data domain, applying it directly to real-life scenes is difficult: training a SuperPoint network with good generalization additionally requires a complex domain-adaptation procedure. This not only increases the complexity of training the deep network but also indirectly limits the network's capability, because the adaptation procedure is based entirely on homography transformations, whereas real image pairs are not related purely by homographies. Regarding descriptor resolution, although the prior art can avoid the need for keypoint ground truth, it approximates each descriptor's position with a rectangular grid, so keypoint locations cannot be detected precisely, which in turn causes mismatches in the final matching.
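As a hypothetical numeric illustration of this rectangular-grid approximation (the stride below is an assumed example value, not taken from the patent): a descriptor map produced at stride 8 can only place a keypoint at cell centers, so a true keypoint can be off by up to half a cell in each axis.

```python
# Quantization error when descriptor positions are approximated by a
# rectangular grid (the stride is an assumed example value, not from the patent).
def grid_approximation_error(keypoint, stride=8):
    """Snap a keypoint to the center of its grid cell and return the error."""
    x, y = keypoint
    cx = (x // stride) * stride + stride / 2  # cell-center approximation
    cy = (y // stride) * stride + stride / 2
    err = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
    return (cx, cy), err

# A true keypoint at (13, 21) lands in the cell spanning (8..16, 16..24),
# whose center is (12, 20):
center, err = grid_approximation_error((13, 21))
print(center, round(err, 3))  # localization error of ~1.414 px here; up to
                              # (stride/2)*sqrt(2) ~ 5.66 px in the worst case
```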

Therefore, the present invention proposes an image feature comparison processing method and system that mainly use a deep neural network to achieve feature matching.

An embodiment of the present invention provides an image feature comparison processing system. The system includes: a feature matching neural network that obtains a first image and processes it; a detector that finds a plurality of feature points in the first image for matching; a descriptor that detects the feature positions of the feature points; and a processor that receives the feature points, applies a deconvolution algorithm to upsample them, receives the feature positions and applies the deconvolution algorithm to produce an effective position description, and then generates a second image from the feature points and the effective position description.

In one embodiment, the feature matching neural network applies a multi-view stereo algorithm to process the first image.

In one embodiment, the feature matching neural network applies a keypoint-aware loss algorithm to learn and distinguish the feature points, where the keypoint-aware loss is produced from a corner detection algorithm and a loss function, and the corner detection algorithm is used to detect the edges and corners of the first image.

In one embodiment, the processor applies the deconvolution algorithm to describe the effective positions of the feature points.

An embodiment of the present invention provides an image feature comparison processing method. The method includes: obtaining a first image and applying a feature matching neural network to process it; applying a detector to find a plurality of feature points in the first image for matching, and applying a deconvolution algorithm to upsample the feature points; applying a descriptor to detect the feature positions of the feature points, and applying the deconvolution algorithm to produce an effective position description of those positions; and generating a second image from the feature points and the effective position description.

In one embodiment, a multi-view stereo algorithm is applied to process the first image.

In one embodiment, a keypoint-aware loss algorithm is applied so that the feature matching neural network learns to distinguish the feature points; the keypoint-aware loss is produced from a corner detection algorithm and a loss function, and the corner detection algorithm detects the edges and corners of the first image.

In one embodiment, the deconvolution algorithm is applied to describe the effective positions of the feature points.

For a further understanding of the features and technical content of the present invention, please refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.

11: feature matching neural network

12: detector

13: descriptor

14: processor

S201~S205: steps

A, A1: target objects

FIG. 1 shows an image feature comparison processing system according to an embodiment of the present invention.

FIG. 2 shows an image feature comparison processing method according to an embodiment of the present invention.

FIG. 3A shows a result produced without applying the present invention.

FIG. 3B shows a result produced by applying the image comparison processing method of the present invention.

FIG. 1 shows an image feature comparison processing system according to an embodiment of the present invention. The image feature comparison processing system 1 includes a feature matching neural network 11, a detector 12, a descriptor 13, and a processor 14. The feature matching neural network 11 obtains a first image and processes it; the detector 12 finds a plurality of feature points in the first image for matching; the descriptor 13 detects the feature positions of the feature points; and the processor 14 receives the feature points, applies a deconvolution algorithm to upsample them, simultaneously receives the feature positions, applies the deconvolution algorithm to produce an effective position description, and then generates a second image from the feature points and the effective position description. In this embodiment, the feature matching neural network 11 applies a multi-view stereo algorithm to process the first image, applies a keypoint-aware loss algorithm to learn and distinguish the feature points, applies a corner detection algorithm and a loss function to produce the keypoint-aware loss, and applies the corner detection algorithm to detect the edges and corners of the first image. The processor 14 applies the deconvolution algorithm to describe the effective positions of the feature points.

FIG. 2 shows an image feature comparison processing method according to an embodiment of the present invention. Referring to FIG. 1 and FIG. 2 together, the image feature processing method includes: obtaining a first image and applying the feature matching neural network 11 to process it (step S201); applying the detector 12 to find a plurality of feature points in the first image for matching, and applying a deconvolution algorithm to upsample the feature points (step S202); applying the descriptor 13 to detect the feature positions of the feature points, and applying the deconvolution algorithm to produce an effective position description of those positions (step S203); and generating a second image from the feature points and the effective position description (step S204).
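The flow of steps S201~S204 can be sketched as follows. Every function name and placeholder body here is an illustrative assumption for showing the data flow, not the patented network itself.

```python
# A minimal sketch of steps S201-S204. All helpers and their placeholder
# bodies are illustrative assumptions, not the patented implementation.
def process_image(first_image):
    """S201: the feature matching neural network processes the first image
    (stand-in: pass the pixels through unchanged as a 'feature map')."""
    return [row[:] for row in first_image]

def detect_feature_points(feature_map, threshold=1):
    """S202: the detector finds feature points for matching (stand-in:
    every pixel whose value reaches an assumed threshold)."""
    return [(r, c) for r, row in enumerate(feature_map)
            for c, v in enumerate(row) if v >= threshold]

def describe_positions(points):
    """S203: the descriptor attaches a position description to each point."""
    return {p: {"pos": p} for p in points}

def generate_second_image(shape, descriptions):
    """S204: generate a second image from the points and descriptions."""
    h, w = shape
    out = [[0] * w for _ in range(h)]
    for r, c in descriptions:
        out[r][c] = 1
    return out

first = [[0, 1], [1, 0]]
fmap = process_image(first)                    # S201
points = detect_feature_points(fmap)           # S202
descs = describe_positions(points)             # S203
second = generate_second_image((2, 2), descs)  # S204
print(second)  # [[0, 1], [1, 0]]
```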

In this embodiment, the image feature processing method further includes applying a multi-view stereo algorithm to process the first image, and applying a keypoint-aware loss algorithm so that the feature matching neural network learns to distinguish the feature points, where a corner detection algorithm and a loss function are applied to produce the keypoint-aware loss, and the corner detection algorithm detects the edges and corners of the first image. In particular, the deconvolution algorithm can be applied to describe the effective positions of the feature points.

FIG. 3A shows a result produced without applying the present invention, and FIG. 3B shows a result produced by applying the image comparison processing method of the present invention. As shown in FIG. 3A, when the present invention is not applied, few feature points and corresponding feature positions are found during image conversion and comparison, so the resulting image is easily distorted, lacks clarity and recognizability, and may even be warped or contain broken lines. For example, target object A in FIG. 3A becomes distorted and broken when converted into target object A1. After conversion by the present invention, as shown in FIG. 3B, the feature matching network trained with the proposed feature comparison processing method produces more feature points, renders the desired figure more accurately, and improves its clarity. Because the feature points and their corresponding feature positions are numerous and dense, the image is less prone to distortion and has higher clarity and recognizability: when target object A is converted into target object A1, the converted A1 is clearer and less subject to distortion and broken lines. The present invention uses deconvolution to upsample the descriptor tensor back to the resolution of the original input image, which allows the feature detector to be trained efficiently and the pixel position described by each descriptor to be detected more precisely.

As shown in FIG. 3B, in this embodiment, the image feature comparison processing method of the present invention obtains target object A in the first image (left side of FIG. 3B), and the feature matching neural network processes target object A. The detector 12 fairly and randomly finds a plurality of feature points on target object A in the first image for matching to target object A1 in the second image (right side of FIG. 3B), and the deconvolution algorithm upsamples the feature points on target object A. The descriptor 13 detects the feature positions of the feature points, the deconvolution algorithm produces an effective position description of those positions, and target object A1 is located and matched according to the feature points and the effective position description, producing a second image containing target object A1.

In this embodiment, target object A can also be processed with a multi-view stereo algorithm, for example by using a camera device to scan and render target object A from multiple views. A corner detection algorithm then scans and computes the edges and corners of target object A. Next, the corner detection algorithm and a loss function are applied to produce the keypoint-aware loss, where the loss function computes a confidence that each pixel is a keypoint. With the keypoint-aware loss, the feature matching neural network learns to distinguish the feature points, and the deconvolution algorithm describes their effective positions, producing target object A1.
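The corner detection step can be sketched with the classic Harris response, which the patent later names explicitly. The 3x3 window and the sensitivity k = 0.04 below are assumed example settings, not values taken from the patent.

```python
import numpy as np

def box3(a):
    """3x3 box-window sum with zero padding (hand-rolled window filter)."""
    p = np.pad(a, 1)
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3))

def harris_response(img, k=0.04):
    """Per-pixel Harris response R = det(M) - k * trace(M)**2 computed from
    image gradients; the 3x3 window and k = 0.04 are assumed example
    settings, not values from the patent."""
    Iy, Ix = np.gradient(img.astype(float))  # d/drow, d/dcol
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# A bright square on a dark background: responses peak at its corners,
# go negative along its edges, and stay zero in flat regions.
img = np.zeros((12, 12))
img[3:9, 3:9] = 1.0
R = harris_response(img)
corner = tuple(map(int, np.unravel_index(np.argmax(R), R.shape)))
print(corner)  # (3, 3) -- the square's top-left corner
```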

In this embodiment, the present invention uses a plurality of feature points as keypoints and then learns, distinguishes, and collects a large amount of data through the neural network; therefore, a complete figure can be restored from only part of a figure. For example, as shown in FIG. 3B, even if only part of target object A (for example, only the cylinder and the inclined plane) is marked with feature points and feature positions, the figure of target object A1 can still be completely restored through the discrimination of the neural network.

Unlike traditional feature matching, which splits feature detection (feature detector) and feature description (feature descriptor) into two unrelated tasks, the present invention merges the two tasks into a single end-to-end learning process. In addition, traditional detectors and descriptors must be designed with specific domain knowledge, which limits when they can be used and prevents good generalization. In contrast, the deep-learning-based feature matching neural network proposed by the present invention has no such limitation: the deep network learns by itself from the given data while collecting a large amount of data, so it can be used in a much wider range of settings.

The problems to be solved by the present invention are twofold. The first is to build a feature detector that can detect feature points effectively without requiring keypoint ground truth and without relying on complex domain-adaptation. The second is to solve the problem that the feature descriptor localizes keypoints imprecisely because the feature map resolution gradually decreases.

The present invention uses the proposed feature matching neural network to extract reliable dense feature correspondences from images with obvious appearance differences. All previous related work relies on hand-crafted features and trains from the sparse feature correspondences of Structure-from-Motion (SfM), so the knowledge of the prior art is limited to these hand-crafted detectors. In addition, in some existing methods the resolution of the resulting feature description tensor prevents features from being detected precisely. To overcome these problems, the present invention proposes a novel concept: learning the detector and descriptor, together with dense correspondences, simultaneously from Multi-View Stereo (MVS). The present invention does not apply any preconditions when selecting pixels but picks them fairly and at random. The present invention further proposes a keypoint-aware loss that helps the detector learn dense keypoints for matching. For feature description, unlike existing methods that approximate detected feature positions with a square grid, the present invention adopts a deconvolution operation to learn a more effective position description. Applications show that learning the detector and descriptor from dense correspondences significantly improves image matching quality.

Without requiring keypoint ground truth or any domain-adaptation, the present invention uses deconvolution to upsample the descriptor tensor back to the resolution of the original input image, which allows the feature detector to be trained efficiently and the pixel position described by each descriptor to be detected more precisely. In addition, the present invention uses the concept of the Harris corner detector together with Convolutional Neural Network (CNN) operations to design a keypoint-aware loss function. This loss function computes a confidence that each pixel is a keypoint, which effectively raises the probability that true feature points are detected.
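The key operation, upsampling a low-resolution descriptor tensor back toward the input resolution by deconvolution (transposed convolution), can be sketched in one dimension. The kernel and stride below are assumed example values, not parameters from the patent; a trained network would learn the kernel.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Minimal 1-D transposed convolution (deconvolution): each input value
    scatters a scaled copy of the kernel into a stride-spaced output, which
    upsamples the signal. Kernel and stride are assumed example values."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * np.asarray(kernel, float)
    return out

# A coarse 4-sample "descriptor row" upsampled back toward input resolution:
coarse = np.array([1.0, 2.0, 3.0, 4.0])
fine = transposed_conv1d(coarse, kernel=[0.5, 1.0, 0.5], stride=2)
print(len(coarse), "->", len(fine))  # 4 -> 9
```

With this triangular kernel the overlapping contributions linearly interpolate between the coarse samples, which is why transposed convolution is a natural learnable upsampler.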

More specifically, the present invention does not require keypoint ground truth to be provided in advance to train the feature detector, so the proposed detector is not limited by the performance of existing detectors. Furthermore, unlike SuperPoint, the proposed deep network does not need to be trained on synthetic data first and then carried over to real-world scenes with a complex domain-adaptation method, nor does it perform well only between images related by a homography pair. In addition, the present invention uses deconvolution to let the shrunken feature tensor learn to upsample back to the original image size, avoiding the rectangular-grid approximation of traditional techniques and the loss of keypoint localization accuracy it causes.

The proposed keypoint-aware loss does not require keypoint ground truth to be supplied to the Convolutional Neural Network (CNN). The reason is that the present invention uses the reciprocal of the response value R of the corner detection algorithm as a coefficient in the keypoint map (kp map). Each pixel in the keypoint map represents the confidence that the pixel is a keypoint; the higher the value, the more confident the CNN is that the pixel is a keypoint. When the CNN detects that the response at a certain keypoint is very low, so that the triplet margin loss commonly used in the prior art becomes hard to optimize, a very large coefficient is produced (because R is very small, its reciprocal is very large); the CNN then lowers the value of that keypoint in the keypoint map, i.e., lowers its confidence that the pixel is a keypoint. In this way, the CNN learns by itself which pixels are keypoints.
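The reciprocal-of-R weighting can be sketched as follows. The eps stabilizer and the clamping of negative responses to zero are my assumptions, since the patent does not state how very small or negative R values are handled numerically.

```python
import numpy as np

def keypoint_aware_weights(R, eps=1e-6):
    """Reciprocal of the Harris response as a per-pixel loss coefficient.
    eps and the clamping of negative R to zero are assumed stabilizers,
    not values from the patent. Pixels with a weak corner response get a
    large coefficient, which pushes the network to lower their
    keypoint-confidence in the kp map."""
    return 1.0 / (np.maximum(R, 0.0) + eps)

R = np.array([0.5, 0.01, 1e-5])     # strong, weak, near-zero responses
w = keypoint_aware_weights(R)
print(w[2] > w[1] > w[0])  # True: weaker response -> larger coefficient
```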

The image feature comparison processing method and system of the present invention can be implemented with existing computer software and related electronic hardware to realize the deep neural network, detection, description, the multi-view stereo algorithm, the corner detection algorithm, the deconvolution algorithm, the loss function, the keypoint-aware loss, and other techniques, in order to process computer vision images and use deep learning, a branch of artificial intelligence, to solve the feature matching problem.

The content disclosed above is only a preferred feasible embodiment of the present invention and does not limit the scope of the patent claims; accordingly, all equivalent technical changes made using the description and drawings of the present invention fall within the scope of the patent claims of the present invention.


Claims (4)

1. An image feature comparison processing method, comprising: obtaining a first image through a feature matching neural network and processing the first image through a multi-view stereo algorithm to obtain a dense feature correspondence, wherein the feature matching neural network uses a keypoint-aware loss algorithm to learn and distinguish a plurality of feature points, a corner detection algorithm and a loss function are used to produce the keypoint-aware loss algorithm, and the corner detection algorithm is used to detect edges and corners of the first image; processing the first image through the multi-view stereo algorithm of a detector and finding a plurality of feature points in the first image for matching; detecting feature positions of the feature points through a descriptor, and applying a deconvolution algorithm in a processor to the feature tensor of the feature points to upsample the feature points and to produce an effective position description of the feature positions; and generating, by the processor, a second image according to the feature points, the effective position description, and the dense feature correspondence.

2. The image feature comparison processing method of claim 1, wherein the deconvolution algorithm is applied to describe the effective positions of the feature points.
3. An image feature comparison processing system, comprising: a feature matching neural network that obtains a first image and processes it through a multi-view stereo algorithm to obtain a dense feature correspondence, wherein the feature matching neural network learns and distinguishes a plurality of feature points through a keypoint-aware loss algorithm, uses a corner detection algorithm and a loss function to produce the keypoint-aware loss algorithm, and detects edges and corners of the first image through the corner detection algorithm; a detector that processes the first image through the multi-view stereo algorithm and finds a plurality of feature points in the first image for matching; a descriptor that detects feature positions of the feature points; and a processor that receives the feature points, applies a deconvolution algorithm to their feature tensor to upsample the feature points, receives the feature positions, applies the deconvolution algorithm to produce an effective position description of the feature positions, and then generates a second image according to the feature points, the effective position description, and the dense feature correspondence.
4. The image feature comparison processing system of claim 3, wherein the processor applies the deconvolution algorithm to describe the effective positions of the feature points.
TW109136733A 2020-10-22 2020-10-22 Image feature comparison processing method and system TWI749821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109136733A TWI749821B (en) 2020-10-22 2020-10-22 Image feature comparison processing method and system


Publications (2)

Publication Number Publication Date
TWI749821B true TWI749821B (en) 2021-12-11
TW202217648A TW202217648A (en) 2022-05-01

Family

ID=80681100


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651916A (en) * 2016-12-29 2017-05-10 深圳市深网视界科技有限公司 Target positioning tracking method and device
CN108255995A (en) * 2017-12-29 2018-07-06 北京奇虎科技有限公司 A kind of method and device for exporting image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hsiao-Chien Yang, Po-Heng Chen, Kuan-Wen Chen, Chen-Yi Lee, Yong-Sheng Chen, "FADE: Feature Aggregation for Depth Estimation With Multi-View Stereo," IEEE Transactions on Image Processing, vol. 29, pp. 6590-6600, May 22, 2020 *

