KR20220136797A

KR20220136797A - Apparatus and method for image stitching based on priority object

Info

Publication number: KR20220136797A
Application number: KR1020210042934A
Authority: KR
Inventors: 김규헌; 이성배; 박성환; 강전호
Original assignee: 경희대학교 산학협력단
Priority date: 2021-04-01
Filing date: 2021-04-01
Publication date: 2022-10-11
Also published as: KR102630106B1

Abstract

The present invention relates to a method for a computing apparatus operated by at least one processor to match images. The method includes the following steps of: detecting feature points from each input image, and matching identical feature points in the input images to extract a common area; detecting objects located in the common area, and calculating an object-based weight matrix proportional to priorities set for the detected objects; calculating boundary line weights based on boundary line position information derived from input images of consecutive previous views, and generating a boundary line generation matrix for the common area based on the boundary line weights and the object-based weight matrix; and matching the input images based on the boundary line generation matrix. Therefore, the present invention is capable of improving the accuracy of image matching.

Description

Priority object-based image matching device and method {APPARATUS AND METHOD FOR IMAGE STITCHING BASED ON PRIORITY OBJECT}

The present invention relates to a deep learning-based image matching technique.

Image stitching combines images captured from multiple angles by multiple cameras into a single image, and is mainly used to generate panoramic and 360-degree images.

Panoramic and 360-degree images generated by image stitching have a wider field of view than images captured with a conventional camera, and can therefore give users a sense of presence.

However, when the parallax between the capturing cameras is large, parallax distortion may occur during image matching: the stitched region may blur, part of an object may disappear, or an object may appear duplicated. Such parallax distortion feels unnatural to users and hinders their immersion in the content.

Various studies are under way to prevent such parallax distortion. Among them, the seam optimization-based image stitching method sets weights on object detection values based on the boundary line (seam) in the overlapping region of the input images, which can prevent parallax distortion from occurring on objects.

However, the seam optimization-based method is limited in how it can generate the boundary line, depending on the initial position of the boundary line, the performance of the object detector, the arrangement of objects, and similar factors.

In addition, when a video is synthesized with the seam optimization-based method, a boundary line is generated separately for every video frame. If the position of the boundary line generated in the previous frame differs greatly from that generated in the current frame, objects may appear to move abruptly in the synthesized video.

The problem to be solved by the present invention is to provide an image matching apparatus and method that detect one or more objects in input images using deep learning, generate a boundary line generation matrix based on individual weights assigned according to the priorities of the detected objects, and match the input images based on the boundary line generation matrix.

According to an embodiment of the present invention, a method by which a computing device operated by at least one processor matches images includes: detecting feature points in each of the input images and extracting a common area by matching identical feature points across the input images; detecting objects located in the common area and calculating an object-based weight matrix proportional to the priorities set for the detected objects; calculating a boundary line weight based on position information of a boundary line derived from the input images of the consecutive previous time point, and generating a boundary line generation matrix for the common area based on the boundary line weight and the object-based weight matrix; and matching the input images based on the boundary line generation matrix.

According to an embodiment of the present invention, an image matching apparatus includes a memory storing instructions and a processor that executes the instructions to perform image matching between input images. The processor extracts a common area based on feature points detected in the input images, calculates an object-based weight matrix proportional to the priorities of objects detected in the common area and a boundary line weight based on position information of a boundary line derived from the input images of the consecutive previous time point, and matches the input images using a boundary line generation matrix for the common area generated from the boundary line weight and the object-based weight matrix.

According to the present invention, parallax distortion caused by the parallax between cameras can be minimized when matching images captured by multiple cameras.

According to the present invention, a horizontal weight is set based on the boundary line detected in the previous frame, which resolves the problem of objects moving abruptly in the synthesized video.

According to the present invention, deep learning-based object detection is used during image matching to set priorities according to the importance of the detected objects and to construct the boundary line generation matrix. Protecting important objects from parallax distortion in this way improves both the accuracy of image matching and the user's immersion in the content.

According to the present invention, the technique can serve as a general base technology for deep learning-based image stitching. Because it uses deep learning-based object detection during image matching, it can also be procedurally integrated with existing surveillance systems that use deep learning-based object detection.

The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present disclosure belongs.

FIG. 1 is a block diagram of an image matching apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of a priority object setter according to an embodiment of the present invention.
FIG. 3 is a flowchart of an image matching method according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a process of extracting a common area from input images according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram in which a visual perception energy function according to an embodiment of the present invention is applied.
FIG. 6 is an exemplary diagram illustrating segments of detected objects according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram for explaining a priority object-based weight matrix according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram for explaining a configuration for assigning dynamic importance according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram for explaining a process of applying a horizontal weight according to an embodiment of the present invention.
FIG. 10 is an exemplary diagram comparing an image matching result according to an embodiment of the present invention with that of an existing method.
FIG. 11 is a hardware configuration diagram of an image matching apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification refer to a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

The devices described in the present invention consist of hardware including at least one processor, a memory device, a communication device, and the like, and a program that is executed in combination with the hardware is stored in a designated location. The hardware has the configuration and performance needed to carry out the method of the present invention. The program includes instructions that implement the operating method of the present invention described with reference to the drawings, and carries out the present invention in combination with hardware such as the processor and the memory device.

In this specification, "transmission or provision" may include not only direct transmission or provision but also indirect transmission or provision through another device or via a detour path.

In this specification, an expression written in the singular may be interpreted as singular or plural unless an explicit expression such as "one" or "single" is used.

FIG. 1 is a block diagram of an image matching apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the image matching apparatus 100 includes a common area extractor 110, a priority object setter 120, a boundary line generation matrix generator 130, and an image matcher 140.

The common area extractor 110, the priority object setter 120, the boundary line generation matrix generator 130, and the image matcher 140 may be operated by at least one processor. In other words, these components may be implemented in a single computing device or distributed across separate computing devices. When distributed across separate computing devices, they may communicate with one another through a communication interface. Any device capable of executing a software program written to carry out the present invention suffices as the computing device; it may be, for example, a server or a laptop computer.

The image matching apparatus 100 may be directly linked with one or more camera modules, or connected to a database of the camera modules, to collect or receive captured images. The captured images are images that contain regions overlapping one another.

Hereinafter, the captured images collected or received for image matching are referred to as input images.

The common area extractor 110 extracts one or more feature points from each input image and extracts a common area by matching the extracted feature points between the input images.

Here, a feature point is a robust pixel that is detected as the same point even under rotation, brightness change, and translation of the image. The common area extractor 110 matches the points that are identical among the feature points detected in the two input images.

The common area extractor 110 then sets a region that can contain all of the matched feature points as the common area (overlap region).

In other words, the common area extractor 110 may set, as the common area, the largest overlapping region that can contain all of the matched feature points.
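As a minimal sketch of how a region containing all matched feature points could be derived, the snippet below takes the matched point coordinates in one image and returns their axis-aligned bounding rectangle. The function name `common_area_from_matches` and the choice of an axis-aligned rectangle are assumptions made for illustration; the specification does not state the shape of the common area.

```python
import numpy as np

def common_area_from_matches(points):
    """Return (x_min, y_min, x_max, y_max) enclosing all matched feature points.

    `points` is a sequence of (x, y) coordinates of matched feature points
    in one input image. An axis-aligned rectangle is assumed here.
    """
    pts = np.asarray(points, dtype=float)
    # Round outward so that every matched point lies inside the rectangle.
    x_min, y_min = np.floor(pts.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(pts.max(axis=0)).astype(int)
    return int(x_min), int(y_min), int(x_max), int(y_max)
```

Applying the same function to the matched points in the second input image would give the corresponding rectangle there.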

At this point, the common area extractor 110 may compute a homography for the feature points of the common area to estimate the transformation between feature points at the same location, and may perform warping so as to minimize the parallax between the input images.

In detail, the common area extractor 110 calculates a warping residual for the feature point matching results between identical points with respect to the homography. The common area extractor 110 may then transform the input image based on the warping residual and the feature point matching results.

For example, when matching a first input image and a second input image, the tilt of the common area set in the second input image may be transformed with the first input image as the reference.

The priority object setter 120 detects one or more objects in the common area of the input images and calculates an object-based weight matrix in which weights are applied according to the priorities of the detected objects.

To obtain object and background information, the priority object setter 120 inputs the energy values calculated by a visual perception energy function, together with the common area, into a deep learning-based object detection model. It thereby obtains the objects detected in the common area along with each object's type, its position information, and a segment having the same shape as the object's outline.

The object detection model used here may be Mask R-CNN, YOLACT, or the like, and may be implemented with one or more deep learning algorithms.

The priority object setter 120 then sets a priority for each object based on the type and importance of the detected object.

The priority object setter 120 sets the object's segment itself as an additional weight so that no boundary line is generated through the interior of the object. It uses the maximum energy value of the visual perception energy function so that the weight applied inside an object is larger than the weight of the background.

The priority object setter 120 also generates a priority object-based weight matrix (POWM) using the maximum energy value scaled in proportion to each object's priority.

The boundary line generation matrix generator 130 calculates a horizontal weight (PSW, Previous Seam Weight) proportional to the horizontal distance from the boundary line, based on position information of the boundary line derived from the input images of the previous time point, and combines it with the priority object-based weight matrix to generate the seam generation matrix (SGM), that is, the boundary line generation matrix, of the input images.
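The combination of the horizontal weight and the weight matrix can be sketched as follows. This is an illustrative sketch only: it assumes a vertical boundary line stored as one column index per row, takes the horizontal weight as the absolute column distance to the previous boundary line times a hypothetical `scale` factor, and combines the two terms by simple addition. The specification says only that the two are combined, so the addition and the parameter names are assumptions.

```python
import numpy as np

def previous_seam_weight(prev_seam_cols, width, scale=1.0):
    """Horizontal weight that grows with distance from the previous boundary line.

    prev_seam_cols[r] is the column of the previous frame's boundary line in row r.
    `scale` is a hypothetical tuning factor, not taken from the specification.
    """
    cols = np.arange(width)[None, :]              # shape (1, width)
    seam = np.asarray(prev_seam_cols)[:, None]    # shape (height, 1)
    return scale * np.abs(cols - seam)            # shape (height, width)

def seam_generation_matrix(powm, prev_seam_cols, scale=1.0):
    """Combine the object-based weight matrix with the previous-seam weight.

    Element-wise addition is an assumed way of combining the two terms.
    """
    powm = np.asarray(powm, dtype=float)
    return powm + previous_seam_weight(prev_seam_cols, powm.shape[1], scale)
```

With this form, pixels far from the previous boundary line cost more, so a minimum-cost boundary line in the current frame tends to stay near the previous one unless an object forces it away.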

The image matcher 140 then matches the input images based on the generated boundary line generation matrix.

In this way, the image matching apparatus 100 uses the visual perception energy function and a deep learning-based object detection algorithm to generate a priority object-based weight matrix proportional to the priorities of the detected objects, so that the boundary line is not drawn through the interior of an object.

In addition, the image matching apparatus 100 matches the input images using the priority object-based weight matrix together with a weight based on the boundary line position of the previous input images, and can thereby perform image matching that is robust to parallax distortion.

The priority object setter 120 is described in detail below.

FIG. 2 is a block diagram of a priority object setter according to an embodiment of the present invention.

As shown in FIG. 2, the priority object setter 120 includes a boundary line detector 121, an object detector 122, an object tracker 123, and a weight setter 124, and may additionally include a learner (not shown) that trains the object detection model.

The boundary line detector 121, the object detector 122, the object tracker 123, the weight setter 124, and the learner may be operated by at least one processor. Although the learner is described as a separate component for convenience of explanation, it may be implemented as part of the priority object setter 120.

Also, as shown in FIG. 2, the object detection model trained by the learner only needs to be linked with the object detector 122, so the learner and the object detector need not be implemented together.

The boundary line detector 121 obtains object and background information from the common area extracted by the common area extractor 110 using the visual perception energy function.

Here, the visual perception energy value calculated by the visual perception energy function is defined as the sum of the magnitude of the pixel change along the horizontal axis (X axis) and the magnitude of the pixel change along the vertical axis (Y axis) of the input image.
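The definition above can be illustrated with a short NumPy sketch. It assumes a single-channel (grayscale) image and approximates the pixel change with forward differences; the specification does not name a particular gradient operator, so that choice and the function name `visual_perception_energy` are assumptions.

```python
import numpy as np

def visual_perception_energy(image):
    """Per-pixel energy: |change along X| + |change along Y|.

    `image` is a 2-D grayscale array. Forward differences are an assumed
    approximation of the pixel change; the last column/row get zero change.
    """
    img = np.asarray(image, dtype=float)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, :-1] = np.abs(img[:, 1:] - img[:, :-1])   # horizontal (X-axis) change
    dy[:-1, :] = np.abs(img[1:, :] - img[:-1, :])   # vertical (Y-axis) change
    return dx + dy
```

The maximum of the returned array corresponds to the maximum energy value that the boundary line detector passes on to the weight setter.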

The boundary line detector 121 may also calculate the maximum energy value of the visual perception energy function. It calculates this maximum energy value and passes it to the weight setter 124 so that the weight of an object can be set larger than that of the background.

The object detector 122 detects objects in the common area of the input images using a trained object detection algorithm.

So that weights are not detected inside objects, the object detector 122 inputs the visual perception energy values together with the common area into the object detection algorithm. Here, the object detection algorithm is a deep learning-based algorithm that outputs the type of each detected object, its position information, and a segment in the shape of the object.

The object detector 122 may store the obtained object information (object type, position information, segment, and so on) in a separate database.

When an importance value for each object has been set and stored in advance in an object importance DB, the object detector 122 may assign the corresponding importance value to the segment of each detected object based on the object importance DB.

The object tracker 123 recognizes moving objects by comparing the position information of an object in the input images of the consecutive previous time point with the position information of the same object in the input images of the current time point.

The object tracker 123 recognizes moving objects across consecutive input images and sets the weights so that a moving object is assigned a higher weight than a stationary object.

To this end, the object tracker 123 may calculate a dynamic importance by adding the total number of object types that the object detector 122 can detect to the importance set for each object. In other words, the number of object types learned by the object detection algorithm is applied as an additional weight value for moving objects.

In this way, based on the object information detected in the first frame, the object tracker 123 can set the dynamic importance from the second frame onward by comparing the positions of each object.
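The dynamic importance rule can be sketched as below. The sketch treats an object as moving when its position shifts by more than a hypothetical `move_threshold` (in pixels) between consecutive frames; the specification does not give a threshold or a distance measure, so both are assumptions, as is the function name.

```python
def dynamic_importance(base_importance, prev_pos, curr_pos, num_classes,
                       move_threshold=2.0):
    """Add the number of detectable object types to a moving object's importance.

    prev_pos / curr_pos are (x, y) positions of the same object in the previous
    and current frames. `move_threshold` is a hypothetical tolerance in pixels.
    """
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    moved = (dx * dx + dy * dy) ** 0.5 > move_threshold
    # A moving object always outranks every stationary one, because the
    # added offset equals the total number of object types.
    return base_importance + num_classes if moved else base_importance
```

For instance, with 100 object types, a moving object whose preset importance is 23 would receive 123, which exceeds the highest possible importance of any stationary object.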

The weight setter 124 sets the priorities of the objects detected in the common area, and the corresponding weights, based on the object information received from the object detector 122, the dynamic importance received from the object tracker 123, and the contents of the separate object importance DB.

In other words, the weight setter 124 assigns a high importance value to objects that should be preserved from parallax distortion during image matching.

Here, importance is set for each object type according to one or more criteria, such as the purpose of the input images and the speed of moving objects in the input images.

For example, importance may be set according to criteria such as objects that move relatively fast, objects for which parallax distortion is especially jarring, and traffic-related objects. Here, speed means a relative speed estimated from the position information of objects detected in the images.

Priority is a variable that takes graded integer values starting from 0, according to the rank of each detected object when the importance values of the objects in the input image are compared.

In other words, the weight setter 124 compares the importance values set for the objects and sets the priorities in order of importance.
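A minimal sketch of turning importance values into integer priorities is shown below. It assumes that the least important detected object receives priority 0 and that more important objects receive larger integers, so that a weight proportional to priority favors important objects; the specification states only that priorities are graded integers starting from 0. The function name and the handling of ties (equal importance shares one priority) are also assumptions.

```python
def assign_priorities(importance_by_object):
    """Map each object's importance to an integer priority starting from 0.

    Assumed convention: lowest importance -> 0, highest -> largest integer.
    Objects with equal importance share the same priority.
    """
    levels = sorted(set(importance_by_object.values()))
    rank = {value: index for index, value in enumerate(levels)}
    return {name: rank[value] for name, value in importance_by_object.items()}
```

Only the objects actually detected in the common area take part in the ranking, so the priority values depend on the current frame rather than on the full importance table.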

The weight setter 124 may also set the obtained object segment itself as an additional weight. The purpose is to keep the weight applied inside an object always larger than the weight of the background, and to that end the maximum energy value of the visual perception energy function is collected for each object.

The weight setter 124 then adds the maximum energy value, scaled in proportion to the priority of each object, to the existing visual perception energy function to generate the priority object-based weight matrix.
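The construction of the priority object-based weight matrix can be sketched as follows. The sketch adds `(priority + 1) * E_max` to the energy of every pixel inside an object's segment, where `E_max` is the maximum energy value. The `+ 1` is an assumption introduced here so that even a priority-0 object stays above the background; the specification says only that the added value is proportional to the priority. The function and parameter names are hypothetical.

```python
import numpy as np

def priority_object_weight_matrix(energy, segments, priorities):
    """Add a priority-scaled maximum energy to the pixels of each object segment.

    energy:     2-D array of visual perception energy values.
    segments:   list of boolean masks (same shape as `energy`), one per object.
    priorities: list of integer priorities, one per object.
    """
    energy = np.asarray(energy, dtype=float)
    e_max = energy.max()
    powm = energy.copy()
    for mask, priority in zip(segments, priorities):
        # Assumed form: (priority + 1) keeps every object interior above
        # the largest background energy value.
        powm[np.asarray(mask, dtype=bool)] += (priority + 1) * e_max
    return powm
```

Note that where two segments overlap, this sketch adds both contributions; a real implementation might instead keep only the higher-priority one.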

그리고 학습기는 사물 검출 알고리즘을 학습시키기 위해, 입력값과 획득하고자 하는 출력값을 서로 매칭하여 학습데이터를 생성한다. 그리고 학습데이터를 사물 검출 알고리즘에 입력하여, 획득하고자 하는 출력값이 출력되도록 반복하여 학습시킨다. In order to learn the object detection algorithm, the learner generates learning data by matching an input value with an output value to be obtained. Then, the learning data is input to the object detection algorithm, and the learning is repeatedly performed so that the desired output value is output.

학습기는 별도의 검증 과정을 통해 사물 검출 알고리즘의 출력 결과의 정확도를 판단하고, 임계치 이상의 정확도를 획득한 경우, 사물 검출 알고리즘의 학습을 완료할 수 있다. The learner may determine the accuracy of the output result of the object detection algorithm through a separate verification process, and when the accuracy greater than or equal to a threshold is obtained, the learning of the object detection algorithm may be completed.

다음으로는 도 3을 이용하여 영상 정합 장치(100)를 통한 시차 왜곡에 강한 영상 정합 방법에 대해서 상세하게 설명한다. Next, an image matching method strong against parallax distortion through the image matching apparatus 100 will be described in detail with reference to FIG. 3 .

도 3은 본 발명의 실시예에 따른 영상 정합 방법을 나타낸 순서도이다. 3 is a flowchart illustrating an image registration method according to an embodiment of the present invention.

영상 정합 장치(100)는 입력 영상들에서 각각 특징점을 검출하고, 동일한 특징점을 매칭하여 입력 영상들에서 공통 영역을 추출한다(S110). The image matching apparatus 100 detects each feature point in the input images and extracts a common area from the input images by matching the same feature point ( S110 ).

영상 정합 장치(100)는 하나 이상의 입력 영상에 대해서 각각 특징점들을 검출하고, 동일한 특징점들을 매칭하면, 매칭된 특징점들을 모두 포함하도록 공통 영역을 설정할 수 있다. The image matching apparatus 100 may detect feature points for one or more input images, respectively, and if the same feature points are matched, a common area may be set to include all the matched feature points.

Next, the image matching apparatus 100 detects objects located in the common area and calculates an object-based weight matrix proportional to the priorities set for the detected objects (S120).

To obtain object and background information in the warped common area, the image matching apparatus 100 calculates a visual perception energy function as in Equation 1 below.

Here, the visual perception energy value E(x,y) is the sum of the magnitude of the pixel change along the horizontal axis (X-axis) and the magnitude of the pixel change along the vertical axis (Y-axis) of the input image I.
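The description of E(x,y) can be illustrated with a short sketch. It assumes a single-channel (grayscale) image and approximates the pixel change with finite differences via `numpy.gradient`; the exact discretization used in Equation 1 is not stated here, so this choice is an assumption. Arrays are indexed as `[y, x]`.

```python
import numpy as np

def visual_perception_energy(image):
    """E(x, y) = |dI/dx| + |dI/dy| for a 2-D grayscale image (rows = y, cols = x)."""
    img = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    dy, dx = np.gradient(img)
    return np.abs(dx) + np.abs(dy)

# A flat background with one bright vertical stripe at x = 3:
# the energy is non-zero only in the columns beside the stripe.
image = np.zeros((4, 6))
image[:, 3] = 10.0
energy = visual_perception_energy(image)
```

Flat regions yield zero energy, which is why a seam that follows low energy tends to pass through uniform background.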

The image matching apparatus 100 then detects the type, position, and shape segment of each object in the input image using the trained deep learning-based object detection algorithm.

The image matching apparatus 100 sets an importance and a priority for each object located in the common area among the objects detected in the input image. The importance may be set from preset importance information for the objects, which is based on, for example, the frequency with which each type is detected in consecutive input images, its importance, and its position information. The image matching apparatus 100 then assigns priorities in descending order of importance.

For example, assume that road-view content is generated using an object detection algorithm trained on 100 object types such as person, airplane, train, traffic light, sign, bird, cat, and wallet. In this case, the importance values may be set as person (100), airplane (99), train (98), ..., traffic light (89), sign (88), ..., bird (24), cat (23), ..., wallet (1).

These importance values may be entered directly by an administrator or set by a separate algorithm that extracts the object types most relevant to the road-view content.

In addition, when a person, a train, a sign, a traffic light, and a cat are detected in the input image by the deep learning-based object detection algorithm for which importance has been set, the priorities may be set according to importance as person (4), train (3), sign (2), traffic light (1), and cat (0).

Meanwhile, the image matching apparatus 100 calculates the maximum energy value (E_max) of the visual perception energy function using Equation 2.

In addition, as in Equation 3, the image matching apparatus 100 generates the priority object-based weight matrix POWM(x,y) by adding the maximum energy value, scaled in proportion to each object's priority, to the existing visual perception energy function, so that the objects receive weight values that distinguish them from one another.

Here, Kp denotes the priority value.
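One possible reading of Equations 2 and 3 is sketched below: E_max is taken as the largest value of the energy map, and inside each detected object's segment the weight becomes E(x,y) + Kp · E_max, while background pixels keep E(x,y). The boolean segment masks, the function name, and the handling of overlapping segments (the later mask overwrites the earlier one) are assumptions of this sketch, not details given in the description.

```python
import numpy as np

def priority_object_weight_matrix(energy, object_masks, priorities):
    """POWM = E + Kp * E_max inside each object's mask, and E elsewhere."""
    energy = np.asarray(energy, dtype=float)
    e_max = energy.max()                    # assumed reading of Equation 2
    powm = energy.copy()
    for mask, kp in zip(object_masks, priorities):
        powm[mask] = energy[mask] + kp * e_max   # assumed reading of Equation 3
    return powm

energy = np.array([[1.0, 2.0, 0.0],
                   [0.0, 4.0, 1.0]])
person = np.array([[True, False, False],
                   [True, False, False]])   # hypothetical segment, Kp = 3
chair = np.array([[False, False, True],
                  [False, False, False]])   # hypothetical segment, Kp = 1
powm = priority_object_weight_matrix(energy, [person, chair], [3, 1])
```

Because every pixel of a segment is raised by a multiple of E_max, even a low-texture object interior ends up with a larger weight than any background pixel whenever Kp is at least 1.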

Next, the image matching apparatus 100 generates a boundary line generation matrix for the common area based on the object-based weight matrix and a boundary line weight set from the previous input image (S130).

Using Equation 4, the image matching apparatus 100 calculates a horizontal weight PSW(x,y) proportional to the horizontal distance from the coordinates of the boundary line generated in the previous frame.

In other words, the square of the difference between the horizontal (X-axis) position of the previous boundary line and the horizontal (X-axis) position of the current boundary line may be set as the weight value.
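The squared horizontal distance described above can be sketched as follows. The sketch assumes the previous boundary line is stored as one x coordinate per row of the common area; the function name and this storage layout are hypothetical. Arrays are indexed as `[y, x]`.

```python
import numpy as np

def horizontal_weight(prev_seam_x, width):
    """PSW[y, x] = (x - prev_seam_x[y]) ** 2 for every pixel of the common area."""
    prev = np.asarray(prev_seam_x, dtype=float)[:, None]   # shape (height, 1)
    xs = np.arange(width, dtype=float)[None, :]            # shape (1, width)
    return (xs - prev) ** 2                                # broadcast to (height, width)

# Previous boundary line at x = 2, 2, 3 in rows 0, 1, 2 of a 5-pixel-wide area.
psw = horizontal_weight([2, 2, 3], width=5)
```

The weight is zero on the previous boundary line and grows quadratically with distance, so candidate positions far from the previous line become expensive.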

To generate the boundary line, the image matching apparatus 100 then constructs the boundary line matrix S(x,y) by adding the POWM and the horizontal weight (PSW), as in Equation 5.

Equation 5 generates the boundary line matrix by adding the POWM calculated for the overlapping region of the input images to the horizontal weight calculated from the boundary line of the previous frame.

The image matching apparatus 100 then generates the boundary line generation matrix SGM(x,y) using Equation 6 below.

For example, each value of the boundary line generation matrix (SGM) is generated by taking the corresponding value of the boundary line matrix S and adding the smallest of the three matrix values located directly above it in the Y-axis direction.

In other words, the image matching apparatus 100 generates the boundary line generation matrix SGM(x,y) by repeating, from the top row to the bottom row, the process of selecting the lowest value and adding it.
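The accumulation described above resembles the cumulative-cost step of seam carving, and a minimal sketch under that assumption follows. It assumes SGM(x,y) = S(x,y) + min(SGM(x-1,y-1), SGM(x,y-1), SGM(x+1,y-1)), with neighbours outside the matrix ignored; Equation 6 itself is not reproduced here, so this recurrence is an interpretation. Arrays are indexed as `[y, x]`.

```python
import numpy as np

def seam_generation_matrix(s):
    """Accumulate S row by row: SGM[y, x] = S[y, x] + min of the 3 neighbours in row y-1."""
    s = np.asarray(s, dtype=float)
    height, _ = s.shape
    sgm = s.copy()                                      # the top row is copied unchanged
    for y in range(1, height):
        above = sgm[y - 1]
        left = np.concatenate(([np.inf], above[:-1]))   # value at x-1 (none at the left edge)
        right = np.concatenate((above[1:], [np.inf]))   # value at x+1 (none at the right edge)
        sgm[y] = s[y] + np.minimum(np.minimum(left, above), right)
    return sgm

s = np.array([[1.0, 4.0, 3.0],
              [5.0, 2.0, 6.0],
              [7.0, 8.0, 1.0]])
sgm = seam_generation_matrix(s)
```

Each entry of the bottom row then holds the total cost of the cheapest connected path from the top row to that pixel.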

Next, the image matching apparatus 100 matches the input images based on the boundary line generation matrix (S140).

The image matching apparatus 100 selects the point with the lowest value and, as in Equation 7, repeats from that point up to the top row the process of selecting the lowest of the three matrix values located directly above, thereby generating the boundary line Seam(x,y-1) along the minimum path.

Equation 7 expresses the process of generating the boundary line from the boundary line generation matrix.
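The backward trace can be sketched as follows, assuming the starting point is the lowest value in the bottom row of the boundary line generation matrix and that, in each step, the candidates are the three values directly above the current point (two at the left and right edges). The function name and the tie-breaking rule (the leftmost minimum wins) are assumptions of this sketch. Arrays are indexed as `[y, x]`.

```python
import numpy as np

def trace_seam(sgm):
    """Return seam_x[y], the column of the boundary line in each row."""
    sgm = np.asarray(sgm, dtype=float)
    height, width = sgm.shape
    seam_x = np.empty(height, dtype=int)
    seam_x[-1] = int(np.argmin(sgm[-1]))            # lowest value in the bottom row
    for y in range(height - 1, 0, -1):
        x = seam_x[y]
        lo, hi = max(x - 1, 0), min(x + 1, width - 1)
        # Pick the smallest of the (up to) three values directly above.
        seam_x[y - 1] = lo + int(np.argmin(sgm[y - 1, lo:hi + 1]))
    return seam_x

sgm = np.array([[1.0, 4.0, 3.0],
                [6.0, 3.0, 9.0],
                [10.0, 11.0, 4.0]])
seam = trace_seam(sgm)   # x positions for rows 0, 1, 2
```

Because each step moves at most one column, the resulting boundary line is connected from the bottom row to the top row.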

In this way, the image matching apparatus 100 builds on seam optimization-based image stitching, uses deep learning-based object detection to set different weights according to object type, and applies those weights to the boundary line generation matrix, thereby overcoming parallax distortion.

FIG. 4 is an exemplary diagram illustrating a process of extracting a common area from input images according to an embodiment of the present invention.

FIG. 4(a) shows input images whose feature points have been matched, and FIG. 4(b) shows the process of extracting the common area based on the input images.

As shown in FIG. 4, for frame #1, frame #2, and frame #3, which exhibit large parallax, the image matching apparatus 100 extracts feature points from each frame and then sets the common areas through a matching process.

When identical feature points are connected between frame #1 and frame #2, the image matching apparatus 100 may set common areas (#1-1, #2-1) that include all of the connected feature points. Likewise, when identical feature points are connected between frame #2 and frame #3, the image matching apparatus 100 may set common areas (#2-2, #3-1) that include all of the connected feature points.

The image matching apparatus 100 may then derive the transformation relationship between the common areas (#1-1, #2-1) and transform the image accordingly.

FIG. 5 is an exemplary diagram in which the visual perception energy function is applied according to an embodiment of the present invention.

FIG. 5(a) is an input image, and FIG. 5(b) is an exemplary view of the input image after the visual perception energy function has been applied.

In the method that applies the visual perception energy function, the energy values of the image are applied to the boundary line generation matrix, and the boundary line is generated along low-frequency regions where the energy values are small.

When the image matching apparatus 100 applies the visual perception energy function to the input image, the energy value, obtained by adding the magnitude of the horizontal change and the magnitude of the vertical change, resembles the edges of the objects in the image, as shown in FIG. 5(b).

Referring to FIG. 5(b), if the boundary line starts at the red point, it is generated through the interior of the object. Because the energy value is defined as the rate of change of pixel values, the weight is not properly reflected in object regions whose pixel values are similar to the background, such as the circled region at the lower right of FIG. 5(b), and parallax distortion occurs there.

FIG. 6 is an exemplary diagram illustrating segments of detected objects according to an embodiment of the present invention.

As shown in FIG. 6, the image matching apparatus 100 may input the input images to a trained object detection algorithm (for example, Mask R-CNN) and obtain a result in which the detected object regions are segmented.

Here, the image matching apparatus 100 considers only the objects in the common area before warping is performed, and may also use the visual perception energy function values as input.

The image matching apparatus 100 may then segment each object and obtain information such as the object's type and position.

FIG. 7 is an exemplary diagram for explaining the priority object-based weight matrix according to an embodiment of the present invention.

FIG. 7(a) shows an input image, and FIG. 7(b) is an exemplary view of the priority object-based weight matrix generated for that input image.

As shown in FIG. 7, the image matching apparatus 100 detects objects by inputting the input image of FIG. 7(a) to the trained object detection algorithm.

The image matching apparatus 100 then applies to the detected objects the importance and priority assigned to each object type, as shown in Table 1 below.

| | Pole | Chair | Train | Person |
|---|---|---|---|---|
| Importance | 21 | 61 | 98 | 100 |
| Priority | 0 | 1 | 2 | 3 |

The image matching apparatus 100 calculates the object-based weight matrix by applying these values to Equation 3 described above; displaying the result on the input image gives FIG. 7(b).

As shown in FIG. 7(b), the closer a region is to white, the higher its weight. It can therefore be seen that the weights decrease in the order of person, train, chair, and pole.
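The mapping from importance to priority in Table 1 can be sketched as a simple ranking, assuming the least important detected object receives priority 0 and each more important object receives the next integer. The function name is hypothetical, and ties in importance are not addressed by the description.

```python
def priorities_from_importance(importance):
    """Rank detected object types: the lowest importance gets priority 0."""
    ordered = sorted(importance, key=importance.get)
    return {name: rank for rank, name in enumerate(ordered)}

# Importance values taken from Table 1.
importance = {"pole": 21, "chair": 61, "train": 98, "person": 100}
priority = priorities_from_importance(importance)
# priority == {"pole": 0, "chair": 1, "train": 2, "person": 3}
```

Ranking only the objects actually detected keeps the priority values small and contiguous, whatever the absolute importance values are.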

FIG. 8 is an exemplary diagram for explaining a configuration that assigns dynamic importance according to an embodiment of the present invention.

As shown in FIG. 8, the image matching apparatus 100 identifies moving objects among the detected objects, using the importance set for each detected object, and assigns a dynamic importance to the moving objects.

FIG. 8(a) shows object position information in a frame of the previous-view input image, and FIG. 8(b) shows object position information in a frame of the current-view input image.

The image matching apparatus 100 may check the importance set for each detected object, such as a person, airplane, train, car, motorcycle, bird, cat, traffic light, or traffic sign. It may then identify which of the detected objects have moved based on changes in their position information.

Accordingly, for the person, airplane, train, car, and motorcycle identified as moving objects, the dynamic importance is set to a high value based on the number of detected objects.
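The step of identifying moving objects from position changes might be sketched as below. The sketch assumes each object is represented by the (x, y) centre of its detection in the previous and current frames and that an object counts as moving when the centre shifts by at least `min_shift` pixels; the threshold, the function name, and the sample coordinates are all hypothetical. How the dynamic importance value itself is derived is not specified in enough detail to sketch.

```python
def moving_objects(prev_positions, curr_positions, min_shift=5.0):
    """Return the names of objects whose centre moved by at least min_shift pixels."""
    moved = []
    for name, (x1, y1) in curr_positions.items():
        if name not in prev_positions:
            continue                     # newly appeared: no previous position to compare
        x0, y0 = prev_positions[name]
        shift = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        if shift >= min_shift:
            moved.append(name)
    return moved

# Hypothetical object centres in the previous and current frames.
prev = {"person": (40, 80), "train": (200, 60), "traffic light": (310, 20)}
curr = {"person": (52, 80), "train": (230, 60), "traffic light": (310, 20)}
moved = moving_objects(prev, curr)
```

With these sample positions, the person and the train are reported as moving, while the traffic light is not.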

FIG. 9 is an exemplary diagram for explaining a process of applying the horizontal weight according to an embodiment of the present invention.

As shown in FIG. 9, the image matching apparatus 100 derives the horizontal weight matrix using Equation 4 described above, with the weights based on the boundary line generated in the previous frame (previous view frame_SGM).

Based on the X coordinate of the boundary line in the previous frame and the X' coordinate of the boundary line in the current frame, the weight (X - X')² determined by their horizontal distance is added to the priority object-based weight matrix (POWM) of the current frame, from which the boundary line generation matrix (SGM) of the current frame is derived.

Because this causes the boundary line in the current frame to be generated close to the boundary line of the previous frame, it prevents objects from shifting abruptly in the synthesized video.

FIG. 10 is an exemplary diagram comparing an image matching result according to an embodiment of the present invention with the result of an existing method.

FIG. 10(a) shows input images captured at different times and from different viewpoints. The common area of the input images is markedly small relative to the overall image size, and a pillar stands at its center, making it difficult to generate a boundary line that bypasses the object; the probability of parallax distortion is therefore high.

FIG. 10(b) is an image matched using the existing technique, and FIG. 10(c) is an image matched by the method according to an embodiment of the present invention.

Specifically, FIG. 10(b) is the result of a technique that, in the visual perception energy function, applies a weight to pixel values that differ from the average brightness by at least a certain amount. Because the brightness of the pillar is similar to the average brightness of the common area, little weight can be assigned to the pillar, and parallax distortion occurs. Such brightness-based weighting methods are thus limited in that they cannot account for pixel regions whose brightness is close to the average.

In contrast, FIG. 10(c) shows that no parallax distortion occurs at the pillar, because objects are detected through deep learning and individual weights are set according to object type.

Meanwhile, FIG. 11 is a block diagram illustrating a hardware configuration of an image matching apparatus according to an embodiment.

Referring to FIG. 11, the image matching apparatus 100 may be implemented as at least one computing device 200 and may execute a computer program containing instructions written to carry out the operations of the present disclosure.

The hardware of the computing device 200 may include at least one processor 210, a memory 220, a storage 230, and a communication interface 240, which may be connected through a bus. Other hardware, such as input and output devices, may also be included. The computing device 200 may be loaded with various software, including an operating system capable of running the computer program.

The processor 210 controls the operation of the computing device 200 and may be any of various types of processor that process the instructions contained in a computer program, for example a CPU (Central Processing Unit), MPU (Micro Processor Unit), MCU (Micro Controller Unit), or GPU (Graphic Processing Unit). The processor 210 may also perform the computations of a program for executing the method described above.

The memory 220 stores various data, instructions, and/or information. The memory 220 may load the computer program from the storage 230 so that the instructions written to carry out the operations of the present disclosure are processed by the processor 210. The memory 220 may be, for example, a ROM (read-only memory) or a RAM (random access memory). The storage 230 may store the various data, computer programs, and the like required to carry out the operations of the present disclosure. The storage 230 may store the computer program non-transitorily and may be implemented as a non-volatile memory. The processor 210 is connected to a network through the communication interface 240.

The computer program includes instructions executed by the processor 210 and is stored in a non-transitory computer-readable storage medium; the instructions cause the processor 210 to carry out the operations of the present disclosure. The computer program may be downloaded over a network or sold as a product. The object detection model may be implemented as a computer program executed by the processor 210.

Thus, according to the embodiment, the parallax distortion caused by parallax between the capturing cameras can be minimized when matching images captured by multiple cameras.

In addition, because the weight is set based on the boundary line detected in the previous frame, the problem of objects shifting abruptly in the synthesized video can be solved.

Furthermore, during image matching, deep learning-based object detection is used to set priorities according to the importance of the detected objects and to construct the boundary line generation matrix, which protects important objects from parallax distortion and thereby improves both the accuracy of image matching and the user's immersion in the content.

The technique can serve as a general base technology for deep learning-based image stitching, and because it uses deep learning-based object detection during image matching, it can be procedurally integrated with existing surveillance systems that use deep learning-based object detection.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

This research was supported by the Korea Electric Power Corporation's Energy Hub University Cluster Project, launched in 2018 (Project No. R18XA02).

Claims

An image matching method performed by a computing device operated by at least one processor, the method comprising:
detecting feature points in each of input images and extracting a common area by matching identical feature points in the input images;
detecting objects located in the common area and calculating an object-based weight matrix proportional to priorities set for the detected objects;
calculating a boundary line weight based on position information of a boundary line derived from an input image of a consecutive previous view, and generating a boundary line generation matrix for the common area based on the boundary line weight and the object-based weight matrix; and
matching the input images based on the boundary line generation matrix.

The method of claim 1,
wherein the extracting of the common area comprises:
detecting, as feature points, points that are detected as the same point even under rotation, brightness change, and translation of an image, transforming an input image by estimating the transformation relationship between feature points at the same point, and extracting the common area based on the feature points.

The method of claim 1,
wherein the calculating of the object-based weight matrix comprises:
calculating a visual perception energy value in the common area, and detecting objects through an object detection algorithm whose training on the common area and the calculated visual perception energy value has been completed.

The method of claim 3,
wherein the calculating of the object-based weight matrix comprises:
setting, for each object type obtained by the object detection algorithm, an importance based on one or more criteria among a purpose of the input images, a frequency of appearance in the input images, and a speed of a moving object in the input images.

The method of claim 4,
wherein the calculating of the object-based weight matrix comprises:
selecting moving objects by comparing position information of objects in the consecutive input image of the previous view with position information of the objects at the current view, and setting a dynamic importance based on the importance set for each moving object.

The method of claim 5,
wherein the calculating of the object-based weight matrix comprises:
setting a priority for each object based on the importance and the dynamic importance of the object, correcting a maximum energy value in proportion to the priority of the object, and generating the object-based weight matrix by adding the corrected value to the visual perception energy value.

The method of claim 3,
wherein the generating of the boundary line generation matrix comprises:
calculating a boundary line weight proportional to a horizontal distance from the coordinates of the boundary line generated from the previous-view input image.

The method of claim 1,
wherein the generating of the boundary line generation matrix comprises:
generating the boundary line generation matrix by summing minimum matrix values from one end to the other end based on the object-based weight matrix and the boundary line weight.

The method of claim 1,
wherein the matching of the input images comprises:
generating, based on the boundary line generation matrix, a boundary line that follows the smallest value among the pixels adjacent to a specific point.

An image matching apparatus comprising:
a memory containing instructions; and
a processor that performs image matching between input images by executing the instructions,
wherein the processor:
extracts a common area based on feature points detected in the input images;
calculates an object-based weight matrix proportional to priorities of objects detected in the common area, and a boundary line weight based on position information of a boundary line derived from an input image of a consecutive previous view; and
matches the input images through a boundary line generation matrix for the common area generated based on the boundary line weight and the object-based weight matrix.

The apparatus of claim 10,
wherein the processor
calculates a visual perception energy value in the common area and detects objects through an object detection algorithm whose training on the common area and the calculated visual perception energy value has been completed.

The apparatus of claim 11,
wherein the processor
sets, for each object type obtained by the object detection algorithm, an importance based on one or more criteria among a purpose of the input images, a frequency of appearance in the input images, and a speed of a moving object in the input images, and
selects moving objects by comparing position information of objects in the consecutive input image of the previous view with position information of the objects at the current view, and sets a dynamic importance based on the importance set for each moving object.

The apparatus of claim 12,
wherein the processor
sets a priority for each object based on the importance and the dynamic importance of the object, corrects a maximum energy value in proportion to the priority of the object, and generates the object-based weight matrix by adding the corrected value to the visual perception energy value.

The apparatus of claim 10,
wherein the processor
calculates the boundary line weight in proportion to a horizontal distance from the coordinates of the boundary line generated from the previous-view input image.

The apparatus of claim 10,
wherein the processor
generates the boundary line generation matrix by summing minimum matrix values from one end to the other end based on the object-based weight matrix and the boundary line weight, and generates, based on the boundary line generation matrix, a boundary line that follows the smallest value among the pixels adjacent to a specific point.