KR20230034887A

KR20230034887A - Keypoint extraction method based on deep learning and learning method for keypoint extraction

Info

Publication number: KR20230034887A
Application number: KR1020220104508A
Authority: KR
Inventors: 임종우; 조범진
Original assignee: 한양대학교 산학협력단
Priority date: 2021-09-03
Filing date: 2022-08-22
Publication date: 2023-03-10

Abstract

Disclosed are a keypoint extraction method that is robust to geometric changes in an image and a learning method for keypoint extraction. The deep learning-based keypoint extraction method includes: receiving a target image; generating a first feature map of the target image using a pre-trained first artificial neural network; and, using a pre-trained second artificial neural network, generating from the first feature map a plurality of second feature maps having different receptive field scales and extracting keypoints and descriptors for the target image from each of the second feature maps.

Description

Deep learning-based feature point extraction method and learning method for feature point extraction {KEYPOINT EXTRACTION METHOD BASED ON DEEP LEARNING AND LEARNING METHOD FOR KEYPOINT EXTRACTION}

The present invention relates to a deep learning-based feature point extraction method and a learning method for feature point extraction.

Keypoint extraction is a core technique used as the first step in many computer vision tasks, such as image registration, object recognition, and simultaneous localization and mapping (SLAM). It is very important that the technique finds the same points regardless of changes in camera viewpoint or lighting, even when the shape, size, or position of an object changes.

Feature point extraction methods fall into traditional handcrafted methods and deep learning-based methods, which are under active research. Handcrafted methods mathematically define locations with high cornerness in an image and extract feature points on that basis. Feature points are therefore extracted mainly at corners, where pixel values change sharply, or at the centers of blobs. Deep learning-based methods, in contrast, either learn points obtained through Structure from Motion (SfM) as ground truth or learn the characteristics of points with high repeatability across images, and they show higher performance than the traditional methods.

However, conventional deep learning-based feature point extraction methods are trained mainly for robust operation under viewpoint or lighting changes, so their performance under relatively large geometric scale changes is only similar to that of handcrafted methods. To overcome this limitation, a method has been proposed that builds images of several scales from a single image and extracts feature points from each scale image.

Related prior art includes a patent document, Korean Patent Application Publication No. 2021-0074163, and a non-patent document: Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., & Sattler, T. (2019). D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8092-8101).

An object of the present invention is to provide a feature point extraction method that is robust to geometric changes in an image, and a learning method for feature point extraction.

According to an embodiment of the present invention for achieving the above object, there is provided a deep learning-based feature point extraction method including: receiving a target image; generating a first feature map of the target image using a pre-trained first artificial neural network; and, using a pre-trained second artificial neural network, generating from the first feature map a plurality of second feature maps having different receptive field scales and extracting feature points and descriptors for the target image from each of the second feature maps.

According to another embodiment of the present invention for achieving the above object, there is provided a learning method for feature point extraction including: generating a transformed image for a training image; and performing training, using the training image, the transformed image, and ground-truth values, so that a deep learning model extracts feature points for the training image and the transformed image and descriptors for the feature points, wherein the ground-truth values include position values of the feature points for the training image and the transformed image, and the descriptor includes an identifier assigned differently to each feature point.

According to an embodiment of the present invention, feature points and descriptors are extracted from feature maps generated with a plurality of receptive fields of different scales, so that feature points and descriptors can be extracted effectively even when the scale of the target image changes.

FIG. 1 is a diagram for explaining a deep learning-based feature point extraction method according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining dilated convolution.
FIG. 3 is a diagram for explaining a deep learning model according to an embodiment of the present invention.
FIG. 4 is a diagram showing score maps according to an embodiment of the present invention.
FIG. 5 is a diagram showing feature points extracted according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a learning method for feature point extraction according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining a deep learning-based feature point extraction method according to an embodiment of the present invention, and FIG. 2 is a diagram for explaining dilated convolution.

A feature point extraction method according to an embodiment of the present invention may be performed by a computing device including a processor and a memory.

Referring to FIG. 1, a computing device according to an embodiment of the present invention receives a target image as input (S110) and extracts feature points and descriptors from the target image. The computing device may extract the feature points (keypoints) and descriptors using a deep learning model that includes first and second artificial neural networks. Here, a descriptor is assigned differently to each feature point and corresponds to an identifier that allows the feature point to be identified.

The computing device generates a first feature map of the target image using the pre-trained first artificial neural network (S120). It then generates a plurality of second feature maps from the first feature map using the pre-trained second artificial neural network, and extracts feature points and descriptors for the target image from each of the second feature maps (S130). Here, the second feature maps differ from one another in receptive field scale. In other words, each second feature map is generated with a receptive field of a different scale.

The first and second artificial neural networks may be convolutional neural network (CNN)-based networks, and the second artificial neural network may include a feature point extraction network that extracts feature points and a descriptor extraction network that extracts descriptors. The second feature maps may be generated through dilated convolution or through an artificial neural network such as Res2Net.

Dilated convolution is a convolution method that adjusts the scale of the receptive field by inserting zero padding inside the convolution filter. As shown in FIG. 2, the scale of the receptive field depends on the value of the dilation ratio. FIG. 2 shows the receptive field scale for dilation ratios of 1, 2, and 3. As the dilation ratio increases, the zero padding increases and the receptive field scale grows.
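The relationship between the dilation ratio and the receptive field can be illustrated with a minimal sketch. It assumes a one-dimensional signal and a 3-tap kernel for brevity; the function names `effective_kernel_size` and `dilated_conv1d` are hypothetical and do not come from the patent.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated correlation: kernel taps are `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Skipping (dilation - 1) samples between taps is equivalent to
        # inserting zeros between the kernel weights.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


if __name__ == "__main__":
    for d in (1, 2, 3):  # dilation ratios shown in FIG. 2
        print(f"dilation={d}: a 3-tap kernel spans {effective_kernel_size(3, d)} samples")
```

With a 3-tap kernel, dilation ratios of 1, 2, and 3 span 3, 5, and 7 samples respectively, while the number of learned weights stays at three.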

Likewise, when convolution operations are applied successively to an input image using consecutively arranged convolutional layers, as in Res2Net, the receptive field scale increases as the number of convolution operations increases.
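As a rough illustration of how stacking convolutions enlarges the receptive field, the sketch below uses the standard receptive field recurrence for a chain of convolutional layers. The 3x3 kernel size and stride of 1 are assumptions chosen for the example, and `stacked_receptive_field` is a hypothetical helper name.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a chain of convolutional layers."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump: input-pixel distance between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} stacked 3x3 conv(s): receptive field = "
              f"{stacked_receptive_field([3] * n)}")
```

Under these assumptions each additional 3x3 convolution widens the receptive field by two pixels (3, 5, 7, 9), so feature maps taken after different numbers of convolutions see the input at different scales.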

Accordingly, a computing device according to an embodiment of the present invention may generate second feature maps of different scales by performing dilated convolution with different dilation ratios. Alternatively, it may perform a further convolution operation on a previously generated second feature map, thereby generating a plurality of second feature maps whose receptive field scales differ from that of the previous second feature map.

Thus, according to an embodiment of the present invention, a feature map is generated for each of several receptive field scales, and feature points and descriptors are extracted from these feature maps, so that feature points and descriptors can be extracted effectively even when the scale of the target image changes.

In step S130, the feature point extraction network extracts candidate feature points for the target image from each second feature map. A confidence value is output together with each candidate feature point, and the computing device may determine the candidate feature points whose confidence value is greater than or equal to a threshold as the feature points for the target image.
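The thresholding step can be sketched as follows. The score map is assumed to be a 2-D grid of confidence values with one value per pixel, and the threshold value in the example is arbitrary; `select_keypoints` is a hypothetical name.

```python
def select_keypoints(score_map, threshold):
    """Return (row, col, score) for every pixel whose score is >= threshold."""
    return [(r, c, s)
            for r, row in enumerate(score_map)
            for c, s in enumerate(row)
            if s >= threshold]


if __name__ == "__main__":
    score_map = [
        [0.1, 0.8, 0.2],
        [0.0, 0.3, 0.9],
    ]
    # The threshold of 0.5 is an assumed value for illustration only.
    print(select_keypoints(score_map, 0.5))
```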

The descriptor extraction network likewise extracts candidate descriptors for the feature points from each second feature map. Among the candidate descriptors, the computing device may determine the descriptor corresponding to the position of a feature point as the descriptor for that feature point.

FIG. 3 is a diagram for explaining a deep learning model according to an embodiment of the present invention, and shows the feature maps generated by the deep learning model.

As described above, the deep learning model according to an embodiment of the present invention includes first and second artificial neural networks, and the second artificial neural network includes a feature point extraction network and a descriptor extraction network.

The first artificial neural network includes a plurality of convolutional layers. As an example, it may include nine convolutional layers. The first and second convolutional layers may generate a feature map 310 of the same size as the target image 300. The third and fourth convolutional layers may receive the feature map 310 generated by the first and second convolutional layers and generate a feature map 320 smaller than the target image. The fifth to ninth convolutional layers may receive the feature map 320 generated by the third and fourth convolutional layers and generate a feature map 330 that is smaller than the feature map 320.

The feature map 320 generated by the third and fourth convolutional layers and the feature map 330 generated by the fifth to ninth convolutional layers are resized to the same size as the target image 300, and the feature maps 310, 320, and 330 generated by the first to ninth convolutional layers are concatenated to generate the first feature map 340. The first feature map 340 may be a three-dimensional feature map.

That is, the computing device may repeatedly perform convolution operations on the target image to generate third feature maps of different sizes, and may generate the first feature map 340 by concatenating the third feature maps.
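A minimal sketch of the resize-and-concatenate step is given below. It assumes nearest-neighbor upsampling (the text does not specify the interpolation method) and represents each feature map as nested lists indexed `[channel][row][col]`; the function names are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Resize a [C][H][W] feature map to [C][out_h][out_w] by nearest neighbor."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[r * in_h // out_h][c * in_w // out_w]
              for c in range(out_w)]
             for r in range(out_h)]
            for ch in fmap]


def build_first_feature_map(fmaps, out_h, out_w):
    """Resize every map to the target size, then concatenate along channels."""
    first = []
    for fmap in fmaps:
        first.extend(upsample_nearest(fmap, out_h, out_w))
    return first


if __name__ == "__main__":
    full = [[[0] * 4 for _ in range(4)]]   # stands in for feature map 310 (4x4)
    half = [[[1, 2], [3, 4]]]              # stands in for feature map 320 (2x2)
    quarter = [[[9]]]                      # stands in for feature map 330 (1x1)
    first = build_first_feature_map([full, half, quarter], 4, 4)
    print(len(first), "channels of size", len(first[0]), "x", len(first[0][0]))
```

The result is a three-dimensional array (channels x height x width), which matches the description of the first feature map 340 as a three-dimensional feature map.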

As an embodiment, the feature point extraction network may extract feature points using Res2Net. The feature point extraction network uses convolutional layers to successively perform convolution operations on the first feature map 340, thereby generating second feature maps 351 to 354 with different receptive field scales. FIG. 3 shows an embodiment in which the second feature maps 351 to 354, generated with four different receptive field scales, are used for feature point extraction. The feature point extraction network may generate the second feature maps 351 to 354 so that they have the same size as the first feature map 340.

As the convolution operation is performed repeatedly, the second feature maps 351 to 354 are generated in order from the upper feature map to the lower feature map. In other words, among the second feature maps 351 to 354, an upper feature map is a previous second feature map that was generated earlier, and a lower feature map is generated from the previous second feature map. In this way, the feature point extraction network performs a convolution operation on the previous second feature map to generate a second feature map with an increased receptive field scale, and the receptive field scale increases toward the bottom of the second feature maps 351 to 354.

The feature point extraction network may extract candidate feature points from each of the second feature maps 351 to 354 and generate a score map containing the confidence values of the candidate feature points. It may then determine the points in the score map whose confidence value is greater than or equal to a threshold as feature points for the target image. In FIG. 3, the black points shown on the second feature maps 351 to 354 represent feature points for the target image. The union of the feature points determined in each of the second feature maps 351 to 354 corresponds to the feature points for the target image.
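The union across scales can be sketched as below. The sketch assumes that feature points have already been selected per scale and are given as `(row, col)` positions; recording which scales detected each position is an illustrative addition, and `union_keypoints` is a hypothetical name.

```python
def union_keypoints(per_scale_keypoints):
    """Merge per-scale keypoint lists into {position: [scale indices]}."""
    merged = {}
    for scale_idx, keypoints in enumerate(per_scale_keypoints):
        for pos in keypoints:
            merged.setdefault(pos, []).append(scale_idx)
    return merged


if __name__ == "__main__":
    per_scale = [
        [(1, 2), (3, 4)],   # e.g. detected on second feature map 351
        [(3, 4), (5, 6)],   # e.g. detected on second feature map 352
    ]
    for pos, scales in union_keypoints(per_scale).items():
        print(pos, "found at scale index", scales)
```

A position detected at several scales appears once in the union, and a position detected at only one scale is still kept.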

The descriptor extraction network may extract descriptors using dilated convolution. As an embodiment, the descriptor extraction network may use dilation ratios of 1, 2, 3, and 8 to generate, from the first feature map 340, four second feature maps 355 to 358 with different receptive field scales. The larger the dilation ratio, the larger the receptive field scale. The descriptor extraction network may generate the second feature maps 355 to 358 so that they have the same size as the first feature map 340. In addition, to correspond to the feature point extraction network generating the second feature maps 351 to 354 at four different receptive field scales, the descriptor extraction network may also generate the second feature maps 355 to 358 at four different receptive field scales. That is, the networks may be configured so that the feature point extraction network and the descriptor extraction network generate the same number of second feature maps of different receptive field scales.

The descriptor extraction network may extract candidate descriptors from each of the second feature maps 355 to 358 and determine the descriptor at the position corresponding to a feature point determined by the feature point extraction network as the descriptor for that feature point. In FIG. 3, the black points shown on the second feature maps 355 to 358 represent the descriptors for the feature points, and it can be seen that the positions of the descriptors correspond to the positions of the feature points.
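Selecting descriptors at feature point positions amounts to an index lookup, as in the sketch below. It assumes a dense candidate descriptor map indexed `[row][col]` with one descriptor vector per pixel; the layout and the name `descriptors_at` are assumptions for illustration.

```python
def descriptors_at(descriptor_map, keypoints):
    """Pick the candidate descriptor located at each keypoint position."""
    return {(r, c): descriptor_map[r][c] for r, c in keypoints}


if __name__ == "__main__":
    # 2x2 map of 3-dimensional candidate descriptors (made-up values).
    descriptor_map = [
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
    ]
    keypoints = [(0, 1), (1, 0)]
    print(descriptors_at(descriptor_map, keypoints))
```

Because the second feature maps have the same size as the first feature map, a feature point position can index the descriptor map directly without coordinate rescaling.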

FIG. 4 is a diagram showing score maps according to an embodiment of the present invention, and FIG. 5 is a diagram showing feature points extracted according to an embodiment of the present invention.

FIG. 4 shows score maps for the target image of FIG. 5. The nine score maps shown in FIG. 4 were obtained from second feature maps generated with nine different receptive field scales. In a score map, a bright pixel indicates a higher score, that is, a higher confidence value, than a dark pixel. In FIG. 5, a feature point is located at the center of each red circle, and the size of the circle corresponds to the receptive field scale.

As shown in FIG. 4, the locations of high scores in the foreground and background regions differ from one receptive field scale to another. When a feature map generated with a single receptive field scale is used, feature points may fail to be extracted in the foreground or background region. According to an embodiment of the present invention, however, feature maps generated with a plurality of receptive fields of different scales are used, so that the feature points for the target image are extracted effectively and without omission, as shown in FIG. 5.

FIG. 6 is a diagram for explaining a learning method for feature point extraction according to an embodiment of the present invention, that is, a method for training the deep learning model described above.

Referring to FIG. 6, a computing device according to an embodiment of the present invention generates a transformed image for a training image (S610). To train the deep learning model so that it extracts feature points that are robust to geometric changes in the target image, the computing device may, as an embodiment, generate an image in which the viewpoint of the training image is transformed. The computing device may generate the viewpoint-transformed image of the training image using a homography transformation matrix.
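The effect of a homography on point positions can be sketched as follows. Points are assumed to be `(x, y)` pixel coordinates and the homography a 3x3 matrix applied in homogeneous coordinates; the matrix values and the name `apply_homography` are illustrative assumptions. Mapping known feature point positions this way is one plausible way to obtain their positions in the transformed image, although the text does not state how the ground-truth positions are produced.

```python
def apply_homography(H, points):
    """Map (x, y) points through a 3x3 homography using homogeneous coordinates."""
    warped = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        warped.append((xh / w, yh / w))  # divide by w to leave homogeneous space
    return warped


if __name__ == "__main__":
    # Assumed example matrix with a mild perspective term in the bottom row.
    H = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.1, 0.0, 1.0]]
    print(apply_homography(H, [(10.0, 5.0), (0.0, 0.0)]))
```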

In addition, to train the deep learning model so that it extracts feature points that are robust to lighting changes in the environment where the target image was captured, the computing device may, as an embodiment, generate an image in which the color of the training image is transformed.
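The text does not specify which color transformation is used. As one hedged possibility, the sketch below applies a per-pixel gain and bias to simulate a lighting change; the image layout `[row][col][channel]` with 8-bit values, the parameter names, and the function name `adjust_color` are all assumptions.

```python
def adjust_color(image, gain=1.0, bias=0.0):
    """Scale and shift every channel value, clipping the result to 0..255."""
    return [[[min(255, max(0, round(gain * v + bias))) for v in pixel]
             for pixel in row]
            for row in image]


if __name__ == "__main__":
    image = [[[100, 200, 250], [0, 10, 20]]]  # 1x2 RGB image, made-up values
    print(adjust_color(image, gain=1.2, bias=10))
```

A purely photometric change like this one leaves pixel positions untouched, so feature point positions from the training image would carry over unchanged to the color-transformed image.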

Using the training image, the transformed image, and ground-truth values, the computing device performs training so that the feature point extraction model extracts feature points for the training image and the transformed image, together with descriptors for the feature points (S620). Here, the ground-truth values include position values of the feature points for the training image and the transformed image.

Because a descriptor includes an identifier assigned differently to each feature point, the ground-truth values need not include position values of the descriptors for the feature points, and training may be performed so that a different descriptor is assigned to each feature point.

The technical content described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the present invention has been described above with reference to specific details such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set out below but also everything equal or equivalent to those claims falls within the scope of the spirit of the present invention.

Claims

1. A deep learning-based feature point extraction method comprising:
receiving a target image;
generating a first feature map of the target image using a pre-trained first artificial neural network; and
generating, using a pre-trained second artificial neural network, a plurality of second feature maps having different receptive field scales from the first feature map, and extracting feature points and descriptors for the target image from each of the second feature maps.

2. The deep learning-based feature point extraction method of claim 1, wherein
the second artificial neural network includes a feature point extraction network for extracting the feature points and a descriptor extraction network for extracting the descriptors, and
the sizes of the first and second feature maps are the same as the size of the target image.

3. The deep learning-based feature point extraction method of claim 2, wherein
the feature point extraction network or the descriptor extraction network
generates the second feature maps using dilated convolution, or performs a convolution operation on a previous second feature map to generate a second feature map whose receptive field scale differs from that of the previous second feature map.

4. The deep learning-based feature point extraction method of claim 1, wherein
the extracting of the feature points and descriptors comprises
extracting candidate descriptors for the feature points, and determining, among the candidate descriptors, the descriptor corresponding to the position of a feature point as the descriptor for that feature point.

5. A learning method for feature point extraction comprising:
generating a transformed image for a training image; and
performing training, using the training image, the transformed image, and ground-truth values, so that a deep learning model extracts feature points for the training image and the transformed image and descriptors for the feature points,
wherein the ground-truth values include position values of the feature points for the training image and the transformed image, and
the descriptor includes an identifier assigned differently to each feature point.

6. The learning method for feature point extraction of claim 5, wherein
the generating of the transformed image comprises
generating an image in which the viewpoint or color of the training image is transformed.