KR102558095B1

KR102558095B1 - Panoramic texture mapping method with semantic object matching and the system thereof

Info

Publication number: KR102558095B1
Application number: KR1020200175376A
Authority: KR
Inventors: 우운택; 박진우
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2023-07-24
Also published as: KR20220085369A

Abstract

The present invention relates to a panoramic texture mapping method and system through semantic object matching. The method includes: extracting semantic information from street view images; rendering a G-buffer for a camera view image that follows the user's movement, and performing panoramic image mapping that maps each pixel of the G-buffer to the street view images; and performing panoramic texture mapping between the camera view image and the street view images through a depth test, a semantic object matching test, and 3D inpainting, using the semantic information and the G-buffer.

Description

Panoramic texture mapping method and system through semantic object matching

The present invention relates to a panoramic texture mapping method and system through semantic object matching, and more particularly to a rendering technique based on panoramic texture mapping that reproduces street view images in real time.

Photorealistic reproduction of real-world urban scenes has been instrumental in powering a variety of virtual reality (VR) applications that mirror the physical world, including virtual touring, geotagged social media, and information visualization. As a key technique for maximizing the realism of these experiences, Image-Based Rendering (IBR) is one of the most active and important research topics in both computer graphics and computer vision. In particular, along the geometry-image continuum of IBR, it is common to adopt a physically based approach that maps captured real images as textures onto the corresponding virtual proxy geometry, so that novel views can be synthesized over a wider range of free user viewpoints.

However, such traditional IBR methods generally require tens or hundreds of adjacent images around the user's viewpoint to cover a limited area, which makes them difficult to apply to large-scale urban scenes. In addition, earlier methods for synthesizing nearby novel views depended on local geometry information: they reconstructed high-fidelity, dense meshes of local proxy geometry from depth images or point clouds, which limited the scalability of photorealistic novel view synthesis. As a result, such approaches could not support a free walking experience in larger scenes for users with full degrees of freedom (DoF), including translation and rotation.

With the growth of global data acquired by leading companies and research organizations, advanced techniques for handling large-scale scenes have recently been proposed. In particular, given the vast number of street-level panoramic images provided by Google Street View and their corresponding camera parameters, wide urban scenes could be reconstructed for IBR with vision-based algorithms such as Structure from Motion (SfM). Nonetheless, such methods still required source images and computation time in proportion to the scene scale, and the reconstructed point clouds were too large to be used in real-time IBR systems.

Google Earth took a more interactive approach to supporting highly realistic fly-over experiences in world-scale urban scenes: nearly all global scenes were reconstructed in advance, textured with captured aerial photographs, and rendered in real time. Nonetheless, this massive project focused primarily on rendering global scenes from a bird's-eye view, so the quality of street-level geometry and textures was noticeably lower. Geollery, a more suitable method for walking experiences on city streets, built local proxy geometry by efficiently transforming dense spherical meshes based on Google Street View depth maps. However, because this method relied on a single nearby street view and local depth information, it required frequent resource updates as the user's position changed, which led to temporally unstable results.

To cover a variety of urban scenes, the proposed method relies on global street view images from Google Street View and on simplified 3D models that are readily available from open databases, and maps these street view images onto the scene geometry as view-dependent textures (View-Dependent Texture Mapping; VDTM). In particular, because the present invention uses sparsely sampled street view images that carry high-resolution, omnidirectional scene information at their camera positions, it does not need the large number of images that earlier methods used to synthesize nearby novel views, nor does it need to sample street view images frequently. This property provides temporally stable rendering quality regardless of the dynamically changing user viewpoint, at low computational cost.

However, an important problem remains: how to handle the gap between the simplified scene model and the actual geometry captured in the street view images. This gap causes incorrect texture mapping and perceptually undesirable rendering results.

R. Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.

An object of the present invention is to improve both the mapping accuracy and the processing time of texture mapping with street view images by extracting semantic information from the given street view images and using it at the appropriate stages.

A panoramic texture mapping method through semantic object matching according to an embodiment of the present invention includes: extracting semantic information from street view images; rendering a G-buffer for a camera view image that follows the user's movement, and performing panoramic image mapping that maps each pixel of the G-buffer to the street view images; and performing panoramic texture mapping between the camera view image and the street view images through a depth test, a semantic object matching test, and 3D inpainting, using the semantic information and the G-buffer.

A panoramic texture mapping system through semantic object matching according to an embodiment of the present invention includes: a pre-processing unit that extracts semantic information from street view images; an image mapping unit that renders a G-buffer for a camera view image that follows the user's movement and performs panoramic image mapping that maps each pixel of the G-buffer to the street view images; and a texture mapping unit that performs panoramic texture mapping between the camera view image and the street view images through a depth test, a semantic object matching test, and 3D inpainting, using the semantic information and the G-buffer.

According to an embodiment of the present invention, more efficient and accurate panoramic texture mapping can be provided by using the inferred semantic information of the scene at appropriate intermediate stages of the proposed pipeline.

In addition, according to an embodiment of the present invention, effective real-time inpainting that uses both 3D geometry and semantic information can support dynamically changing novel view synthesis without visual holes caused by complex occlusions.

FIG. 1 is an operational flowchart of a panoramic texture mapping method through semantic object matching according to an embodiment of the present invention.
FIG. 2 shows, as images, a process of extracting semantic information from a street view image according to an embodiment of the present invention.
FIG. 3 illustrates a panoramic texture mapping process through semantic object matching according to an embodiment of the present invention.
FIG. 4 illustrates a camera pose refinement result according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining panoramic image projection according to an embodiment of the present invention.
FIGS. 6A and 6B illustrate a visualized panoramic texture mapping process according to an embodiment of the present invention.
FIG. 7 illustrates the effect of the texture filtering tests according to an embodiment of the present invention.
FIG. 8 shows a weighted blending result according to an embodiment of the present invention.
FIG. 9 illustrates a low-pass filtering result according to an embodiment of the present invention.
FIG. 10 is a diagram for explaining semantic 3D Pixmix inpainting according to an embodiment of the present invention.
FIG. 11 shows a result of using the inpainting method according to an embodiment of the present invention.
FIG. 12 is a block diagram illustrating the detailed configuration of a panoramic texture mapping system through semantic object matching according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention belongs are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

Terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, singular forms also include plural forms unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not preclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

The gist of the embodiments of the present invention is a rendering technique based on panoramic texture mapping that can reproduce large-scale, street-level urban scenes photorealistically in real time.

With the ultimate goal of letting users experience highly realistic city streets in virtual reality (VR), the present invention proposes a novel on-the-fly texture mapping system that uses sampled panoramic street view images and open 3D models of large-scale global urban scenes.

Hereinafter, the panoramic texture mapping method and system through semantic object matching are described in detail with reference to FIGS. 1 to 12.

FIG. 1 is an operational flowchart of a panoramic texture mapping method through semantic object matching according to an embodiment of the present invention, FIG. 2 shows, as images, a process of extracting semantic information from a street view image according to an embodiment of the present invention, and FIG. 3 illustrates a panoramic texture mapping process through semantic object matching according to an embodiment of the present invention.

In addition, FIG. 4 shows a camera pose refinement result according to an embodiment of the present invention, FIG. 5 is a diagram for explaining panoramic image projection according to an embodiment of the present invention, and FIGS. 6A and 6B show a visualized panoramic texture mapping process according to an embodiment of the present invention.

In addition, FIG. 7 shows the effect of the texture filtering tests according to an embodiment of the present invention, FIG. 8 shows a weighted blending result according to an embodiment of the present invention, and FIG. 9 shows a low-pass filtering result according to an embodiment of the present invention.

In addition, FIG. 10 is a diagram for explaining semantic 3D Pixmix inpainting according to an embodiment of the present invention, and FIG. 11 shows a result of using the inpainting method according to an embodiment of the present invention.

The method of FIG. 1 is performed by the panoramic texture mapping system through semantic object matching according to an embodiment of the present invention, shown in FIG. 12.

Referring to FIG. 3, the system proposed according to an embodiment of the present invention consists of two main parts: a pre-process (310) for acquiring the input resources and constructing the 3D scene to be rendered, and a real-time process (320) that includes panoramic texture mapping and semantic 3D inpainting. The present invention is characterized by using the semantic information extracted from the street view images at appropriate intermediate stages, such as Google Street View, the Semantic Object Matching Test, and Semantic 3D Inpainting.

Referring to FIGS. 1 and 3, step 110 extracts the semantic information in the pre-process (310).

Step 110 may extract, from the street view images, semantic information consisting of RGB panoramic street view images (PI), a city model (M), object-level semantic images (PS), depth images (PD), and camera parameters (C). The RGB panoramic street view images (PI) are street-level panoramic images with worldwide coverage, an image resource provided by Google Street View on the Google Maps Platform. The city model (M) is virtual model data of city buildings obtained from an open database of 3D geographic information. The object-level semantic images (PS) are per-object semantic label images extracted from the street view images with DeepLabv3+, a deep learning-based method. The camera parameters (C) are data calibrated by aligning the boundary lines of buildings using the city model (M) and the object-level semantic images (PS). The depth images (PD) are the depth information of the city model (M) rendered in panoramic image form using the information of each camera contained in the camera parameters (C).

As described above, the RGB panoramic street view images (PI) and the city model (M) are obtained from open data, and the object-level semantic images (PS) can be obtained from the RGB panoramic street view images (PI) with a deep learning-based method. The camera parameters (C) can then be refined by using the inferred object-level semantic images (PS) in the image-to-model registration between the RGB panoramic street view images (PI) and the city model (M). Finally, the depth images (PD) used for the depth test in real-time texture mapping can be rendered using the city model (M) and the camera parameters (C).

To extract the object-level semantic images (PS), step 110 may apply DeepLabv3+, one of the deep learning-based approaches, to the Google Street View images, using ASPP (Atrous Spatial Pyramid Pooling) to encode multi-scale context information and refine the boundaries of the segmented objects. Because the present invention uses 5000 large-scale urban scene images with high-quality, dense pixel annotations of 30 classes suited to the target scenes, the object-level semantic images (PS) properly delineate the boundaries of sky, buildings, and ground, the basic classes mainly used by the present invention, as shown in the first row of FIG. 2.

In addition, for more accurate panoramic image projection, step 110 may obtain accurate camera parameters (C) for the cameras that captured the source textures. More specifically, step 110 may refine the 6-DoF camera parameters (C) using an effective image-to-3D-model registration based on contour alignment. Unlike camera pose estimation with Structure from Motion (SfM) or bundle adjustment, this method has the advantage that the optimization can be performed for each camera individually. The registration optimizes an energy function that reaches zero when the pixels u_k, projected from the points p_world of the 3D model, exactly overlap the sampled panoramic street view image (PI), so that the contours of the 3D model are aligned with those of the sampled panoramic street view image (PI). To achieve this, the present invention minimizes the loss function of [Equation 1] below, using the raw camera pose from Google Street View as the initial value.

[Equation 1]

Here, the contours of the 3D model can be rendered efficiently through the function L for camera index k in a Unity shader (pink lines in FIG. 4), and the boundary L_k of the sampled panoramic street view image (PI) can be extracted using the previously obtained sampled object-level semantic image (PS; cyan lines in FIG. 4). Each pixel of a contour image has a binary value (0: non-contour pixel, 1: contour pixel), and minimizing [Equation 1] yields the optimal sampled camera parameters (C). To find the best sampled camera parameters (C), the present invention employs particle swarm optimization (PSO), an evolutionary sampling method in which each particle searches for better camera parameter samples by taking into account both the best sample it has found locally and the globally best sample shared by all particles. As shown in FIG. 4, the refined camera poses are qualitatively better aligned than the original camera poses from Google Street View.
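The pose refinement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the original form of [Equation 1] is not reproduced here, so `contour_mismatch` (a count of differing pixels between two binary contour images) is an assumed stand-in for the loss, and `pso_minimize`, its parameters, and the inertia and acceleration constants are hypothetical names and values chosen for the example.

```python
import random


def contour_mismatch(rendered, extracted):
    """Assumed stand-in loss: number of pixels where two binary contour
    images (lists of rows of 0/1) disagree. Zero means perfect overlap."""
    return sum(
        1
        for row_a, row_b in zip(rendered, extracted)
        for a, b in zip(row_a, row_b)
        if a != b
    )


def pso_minimize(loss, init, spread, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm optimization over a parameter vector.

    `init` plays the role of the raw 6-DoF camera pose and `spread` bounds
    the initial per-dimension perturbation (both hypothetical inputs).
    """
    rng = random.Random(seed)
    dim = len(init)
    # The first particle is the initial pose itself, so the returned value
    # can never be worse than the loss of the starting pose.
    pos = [list(init)] + [
        [init[d] + rng.uniform(-spread[d], spread[d]) for d in range(dim)]
        for _ in range(n_particles - 1)
    ]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # best sample found by each particle
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best sample shared by all

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

In a real pipeline the `loss` callback would render the model contour for a candidate pose and compare it with the contour extracted from the semantic image; here any callable that maps a parameter vector to a number can be plugged in.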

In addition, step 110 may prepare for filtering texture pixels with a depth test when textures are mapped to 3D surface points. Specifically, step 110 may render the sampled depth images (PD) in advance from the city model (M), as seen with the sampled camera parameters (C), and keep them available for the depth test.

In addition, step 110 may manage the street view images efficiently using a sparse sampling scheme based on a regularly partitioned global space, similarly to the global geometry. For example, in the pre-process (310), the X-Z plane is divided into small tiles, called districts, according to a width and height chosen by the user. Among the street view cameras located in a district, the camera closest to the central position of the district, indicated by a yellow circle in the Refined 3D Scene Data, is selected as the sample. Relative to the user's current district, up to eight adjacent districts may be used in the real-time texture mapping process. A smaller district tile size allows closer street view images to be sampled, but requires more frequent texture updates because the user's district changes more often.
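The district-based sparse sampling can be illustrated with a short sketch. It assumes cameras are given as a dictionary from an identifier to an (x, z) position and that districts are indexed by integer tile coordinates; the function names `district_of`, `sample_cameras`, and `adjacent_districts` are hypothetical and not taken from the source.

```python
import math


def district_of(x, z, tile_w, tile_h):
    """Integer tile index of the district containing point (x, z)."""
    return (math.floor(x / tile_w), math.floor(z / tile_h))


def sample_cameras(cameras, tile_w, tile_h):
    """Keep, per district, the camera closest to the district center.

    `cameras` maps a camera id to its (x, z) position on the X-Z plane.
    Returns a dict from district index to the selected camera id.
    """
    best = {}
    for cam_id, (x, z) in cameras.items():
        tile = district_of(x, z, tile_w, tile_h)
        center_x = (tile[0] + 0.5) * tile_w
        center_z = (tile[1] + 0.5) * tile_h
        dist2 = (x - center_x) ** 2 + (z - center_z) ** 2
        if tile not in best or dist2 < best[tile][1]:
            best[tile] = (cam_id, dist2)
    return {tile: cam_id for tile, (cam_id, _) in best.items()}


def adjacent_districts(tile):
    """The (up to) eight districts surrounding the user's current district."""
    i, j = tile
    return [(i + di, j + dj)
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]
```

At run time, the textures to load would be the samples of the user's current district and of those adjacent districts that actually contain a sampled camera, which is why the text speaks of "up to" eight neighbors.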

In the real-time process (320), step 120 renders a G-buffer for the camera view image that follows the user's movement, and performs panoramic image mapping that maps each pixel of the G-buffer to the street view images.

Step 120 may use the city model (M) to render a G-buffer, consisting of a position buffer, a segmentation buffer, and a normal buffer, for the camera view image that follows the user's movement; perform a panoramic coordinate transformation on the position buffer based on the information of the adjacent cameras; and map each pixel of the position buffer to the street view images.

For real-time panoramic texture mapping, step 120 may use the city model (M) to render, for the camera view image, a G-buffer consisting of a position buffer (Gp), a segmentation buffer (Gs), and a normal buffer (Gn). The G-buffer is a storage area into which various kinds of geometric information about the scene currently viewed by the user are rendered. It may include the position buffer, which stores the positions of the geometry in the virtual space; the normal buffer, which stores normal vectors; and the segmentation buffer, which holds the semantic information of each object.
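As a rough picture of what the three buffers hold, the following sketch models a G-buffer on the CPU. In the described system these buffers would be rendered on the GPU; the class `GBuffer`, its methods, and the convention that label 0 means sky are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GBuffer:
    """Per-pixel scene information for the current user view."""
    width: int
    height: int
    position: List[List[Optional[Vec3]]]  # Gp: world-space surface position
    normal: List[List[Optional[Vec3]]]    # Gn: surface normal vector
    segment: List[List[int]]              # Gs: per-object semantic label

    @classmethod
    def empty(cls, width: int, height: int, sky_label: int = 0) -> "GBuffer":
        """Create a buffer with no geometry; every pixel is labeled as sky."""
        return cls(
            width,
            height,
            [[None] * width for _ in range(height)],
            [[None] * width for _ in range(height)],
            [[sky_label] * width for _ in range(height)],
        )

    def write(self, x: int, y: int, position: Vec3, normal: Vec3, label: int) -> None:
        """Store the geometry visible at pixel (x, y)."""
        self.position[y][x] = position
        self.normal[y][x] = normal
        self.segment[y][x] = label

    def surface_pixels(self) -> Iterator[Tuple[int, int, Vec3]]:
        """Yield (x, y, position) for every pixel that shows geometry."""
        for y in range(self.height):
            for x in range(self.width):
                if self.position[y][x] is not None:
                    yield x, y, self.position[y][x]
```

Only the pixels returned by `surface_pixels` would need to be projected into the panoramas; pixels without geometry carry no position to project.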

In the real-time process (320), step 120 may use the sampled camera parameters (C) to project a 3D surface point to the corresponding pixel position in the street view image with camera index k (Panoramic Image Projection). Referring to FIG. 5, step 120 may perform the spherical and equirectangular projections in sequence. The camera orientation is given as an axis-angle representation together with the corresponding rotation. A 3D point p_world in world coordinates is transformed into a point in camera coordinates by the view matrix M_k, as in [Equation 2] below.

[Equation 2]

Assuming that the virtual 360° camera model performs the spherical projection E_sphere, the camera-space point can be projected to the point s_k on the unit sphere centered at the origin of camera space. In the 2D spherical coordinate system, the projected point is expressed as in [Equation 3], [Equation 4], and [Equation 5] below.

[Equation 3]

[Equation 4]

[Equation 5]

Here, the two angles are the azimuth and the polar angle, respectively, and r_k is the Euclidean distance from the center of the unit sphere to the camera-space point. Assuming that left-handed coordinates are used, as in the Unity engine in which the system according to the embodiment of the present invention was developed, the mapping from s_k to a pixel of the panoramic image can be performed by the equirectangular projection, as in [Equation 6] and [Equation 7] below.

[Equation 6]

[Equation 7]

Here, the two coordinates are s-t image coordinate values between 0.0 and 1.0. Finally, the projection function E from the 3D point to u_k can be defined as in [Equation 8].

[Equation 8]
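The projection chain (view transform, spherical projection, equirectangular projection) can be sketched as below. The exact forms of [Equation 2] to [Equation 8] are not reproduced in this text, so the angle conventions here are assumptions: a y-up, left-handed camera space with +z forward, azimuth measured around the y axis, polar angle measured from +y, and a t coordinate that increases upward. The function names are hypothetical.

```python
import math


def world_to_camera(view, p_world):
    """Apply a 4x4 view matrix (list of rows) to a 3D world-space point."""
    x, y, z = p_world
    h = (x, y, z, 1.0)
    return tuple(sum(view[r][c] * h[c] for c in range(4)) for r in range(3))


def spherical(p_cam):
    """Azimuth, polar angle, and distance of a camera-space point.

    Assumed convention: azimuth 0 along +z, positive toward +x;
    polar angle 0 along +y (up).
    """
    x, y, z = p_cam
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the camera center")
    azimuth = math.atan2(x, z)
    polar = math.acos(max(-1.0, min(1.0, y / r)))
    return azimuth, polar, r


def equirectangular(azimuth, polar):
    """Map spherical angles to s-t image coordinates in [0, 1]."""
    s = azimuth / (2.0 * math.pi) + 0.5
    t = 1.0 - polar / math.pi
    return s, t


def project_to_panorama(view, p_world):
    """Composite projection: world point -> (s, t) in the panorama."""
    azimuth, polar, _ = spherical(world_to_camera(view, p_world))
    return equirectangular(azimuth, polar)
```

Under these assumptions a point straight ahead of the camera lands at the image center (s = 0.5, t = 0.5). A different axis or handedness convention would change only the arguments of `atan2` and `acos` and the direction of `t`.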

Step 130 may include a first step (Depth Test) of performing a depth test on the pixels projected onto the camera view image to minimize projection errors; a second step (Semantic Object Matching Test) of performing semantic object matching by running an object matching test, using the semantic information, on the camera view image whose projection errors have been minimized; a third step (Weighted Blending) of synthesizing the final color by applying blending weights to the camera view image to interpolate the candidate texture colors; and a fourth step (Semantic 3D Inpainting) of completing the panoramic texture mapping by performing semantic 3D inpainting on the non-sky regions of the camera view image.

Step 120 uses the view matrix M_k and the projection E of [Formula 8] to map every surface pixel in the position buffer (G_p) to the corresponding pixel location in the sampled panoramic street view image (PI) with camera index k. Through this mapping, step 120 can assign texture colors from the sampled panoramic street view images (PI) to the available pixels of the current view, as shown in FIG. 6a. However, as shown in FIG. 7(a), not all of these colors can be used to generate the final color of a given surface pixel, because some of them may result from projection errors.

Accordingly, step 130 according to an embodiment of the present invention selects usable candidate texture colors from the sampled panoramic street view images (PI) through the first to fourth steps and computes appropriate weights for blending them. To improve texture mapping accuracy, the present invention uses the sampled depth images (PD) to exclude texture colors erroneously projected onto occluded surface points, uses the sampled per-object semantic images (PS) to select contextually correct candidates, and then applies weighted blending to render a perceptually natural result, as shown in FIG. 6b.

In the first step, projection errors can be minimized by using the rendered depth image (PD) to remove RGB values that result from occlusion in the current user's view.

The purpose of the depth test is to handle the penetration problem of projective texture mapping. Errors arising at the k-th camera used to capture a sampled panoramic street view image (PI) may appear as pixel colors of that image. Because the sampled depth image (PD) of the 3D scene model has been rendered according to the sampled camera parameters (C), the first step according to an embodiment of the present invention can determine whether any 3D point in the position buffer (G_p) is visible to the k-th camera simply by comparing the distance between the camera position and that point with the depth value of the corresponding pixel of the sampled depth image (PD).
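The visibility comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `passes_depth_test` and the tolerance `eps` are hypothetical and do not appear in the source, and the depth value is assumed to be stored as a Euclidean distance from the camera center.

```python
import math

def passes_depth_test(surface_point, camera_pos, panorama_depth, eps=0.05):
    """Return True if the surface point is visible from the k-th camera.

    surface_point, camera_pos: (x, y, z) tuples in world space.
    panorama_depth: depth stored in the sampled depth image (PD) at the pixel
        the surface point projects to (assumed to be a Euclidean distance).
    eps: hypothetical tolerance absorbing numerical and rasterization error.
    """
    distance = math.dist(surface_point, camera_pos)
    # If the point lies farther away than the depth recorded along the same
    # ray, some other surface occludes it, so its candidate color is rejected.
    return distance <= panorama_depth + eps
```

For example, a point 10 units from the camera passes when the depth image also records 10 at that pixel, and fails when the depth image records 4, because a nearer surface blocks the view.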

Although the depth test eliminates most projection errors, it is not a complete solution. The sampled depth image (PD) is imperfect because it is rendered from a simplified 3D model, and the sampled camera parameters (C) contain calibration errors, so the real geometry captured in the sampled panoramic street view image (PI) and the synthetic scene geometry are misregistered. As a result, texture colors can still be projected onto semantically incorrect target objects, as shown in FIG. 7(b). Therefore, step 130 according to an embodiment of the present invention performs the second, third, and fourth steps to remove the remaining errors precisely.

In the second step, when each pixel of the position buffer is mapped to the RGB panoramic street view image (PI), an object matching test using the per-object semantic image (PS) checks whether the pixel matches the object in the RGB panoramic street view image (PI), so that colors from the sampled panoramic street view image (PI) can be assigned to the target object.

Using the semantic information of each object in the current scene makes it possible to assign texture colors to each object accurately. The second step therefore performs an object matching test in the real-time process 320 using the sampled per-object semantic image (PS), so that colors from the sampled panoramic street view image (PI) are assigned to the correct target object. For example, the sampled per-object semantic image (PS) estimated by DeepLabV3+ in the pre-process 310 classifies each pixel of the sampled panoramic street view image (PI) on the basis of pre-trained object classes and the diverse urban scenes of the CityScapes dataset, which yields accurate and clear boundaries between different object regions. Assuming that the 3D model carries known labels defined in DeepLabV3+, the second step compares the label of a surface point on the model with the label of the estimated class in the sampled per-object semantic image (PS) and filters out incorrect texture colors projected onto non-matching objects. FIG. 7(c) shows the result, in which building (target object) texture colors that had been projected onto the sky are removed.
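The label comparison described above amounts to a simple filter over candidate colors. The sketch below is illustrative only: the dictionary layout (`camera`, `label`, `rgb`) and the function name `semantic_match_filter` are assumptions, and the labels are assumed to share one vocabulary between the 3D model and the segmentation output.

```python
def semantic_match_filter(surface_label, candidates):
    """Keep only candidates whose panorama label equals the surface label.

    surface_label: class label of the 3D model surface point.
    candidates: list of dicts with hypothetical keys
        "camera" (camera index k),
        "label"  (class read from the per-object semantic image PS),
        "rgb"    (color read from the panoramic image PI).
    """
    return [c for c in candidates if c["label"] == surface_label]


candidates = [
    {"camera": 0, "label": "building", "rgb": (120, 110, 100)},
    {"camera": 1, "label": "sky", "rgb": (135, 180, 235)},
]
kept = semantic_match_filter("building", candidates)
```

Here the sky-colored candidate from camera 1 is discarded for a building surface point, which corresponds to the kind of error removed between FIG. 7(b) and FIG. 7(c).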

In the third step, the RGB values of the final color, synthesized by interpolating the candidate texture colors through blending, can be assigned to each pixel of the position buffer.

The final color must be synthesized by interpolating the candidate texture colors selected from the sampled panoramic street view images (PI) that passed the previous steps. Given N candidate colors, the third step interpolates the output color o_i using normalized weights, as in [Formula 9] below.

[Formula 9]

Here, the projected pixel location is obtained by projecting the surface point into the sampled panoramic street view image (PI) with camera index k through the projection function E of [Formula 8], and the pixel color at that location is taken as the candidate color.

The third step uses a new blending weight motivated by the fundamental energy transfer equation, which computes the intensity of light incident on a surface according to the inverse-square law between energy and distance. The proposed weight takes into account not only the geometric relationship between the texture camera parameters (C) and the surface point but also the geometric relationship between the sampled camera parameters (C) and the user's current position, as in [Formula 10] below.

[Formula 10]

Here, n_i denotes the normal vector at the surface point, and v_k denotes the camera vector from the surface point to the k-th camera position t_k. α denotes the smoothness term (α = 4). dist_i-k and dist_user-k denote the distance between the surface point and t_k and the distance between the user's position and t_k, respectively. As shown in FIG. 8, the main advantage of the proposed blending weight is that it computes colors better suited to the user's position while respecting the three main factors described for blending.
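The normalized interpolation of [Formula 9] can be sketched as below. The `blend` function follows directly from the description (candidate colors combined with weights that are normalized to sum to one). The `hypothetical_weight` function is only an assumed form and is not the patent's Formula 10: it combines a clamped cosine between the surface normal and the camera vector raised to α with inverse-square falloff in both distances, which is one plausible reading of the inverse-square motivation.

```python
def blend(colors, weights):
    """Interpolate candidate RGB colors with weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        return None  # no usable candidate; the pixel is left for inpainting
    return tuple(
        sum(w * c[ch] for w, c in zip(weights, colors)) / total
        for ch in range(3)
    )


def hypothetical_weight(normal, view_dir, dist_point_cam, dist_user_cam, alpha=4.0):
    """Assumed weight form (NOT the source's Formula 10).

    normal, view_dir: unit vectors n_i and v_k as (x, y, z) tuples.
    dist_point_cam: distance between the surface point and camera position t_k.
    dist_user_cam: distance between the user's position and t_k.
    """
    cos_term = max(0.0, sum(n * v for n, v in zip(normal, view_dir)))
    d1 = max(dist_point_cam, 1e-6)  # guard against division by zero
    d2 = max(dist_user_cam, 1e-6)
    return cos_term ** alpha / (d1 * d1 * d2 * d2)
```

With weights 1 and 3, a red and a blue candidate blend to one quarter red and three quarters blue. Under the assumed weight, a camera that faces the surface head-on and lies close to both the surface point and the user dominates the blend.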

In the fourth step, a real-time inpainting method is used to handle pixels of the position buffer that have not been filled with RGB values: the sky area of the camera view image is low-pass filtered, and the non-sky area is filled with the RGB values of texture colors that match geometrically and semantically, which completes the panoramic texture mapping. By processing the sky area and the non-sky area of the camera view image separately, the fourth step of step 130 according to an embodiment of the present invention reduces computation time. To this end, the fourth step applies a low-pass filter to the sky area, which handles untextured holes and errors at the same time, and fills the non-sky area with geometrically or semantically matching texture colors.

In more detail, the fourth step can correct the sky area through low-pass filtering. Referring to FIG. 9 as an example, the fourth step classifies the image into a sky area and a non-sky area and fills the untextured sky area with the average sky color. The image is then resized to a reduced resolution using nearest-neighbor downsampling and is upsampled with low-pass filtering, doubling the image width at each pass, until the image returns to its original size. As shown in FIG. 9(f), the result is convincing: it is free of visible seams, projection errors, and untextured pixels while retaining meaningful color variation such as the bright area at the upper right.
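The first part of the sky correction, filling untextured sky pixels with the average sky color, can be sketched as follows. This covers only that initial fill and not the subsequent downsampling and low-pass upsampling passes; the flat-list image layout and the function name `fill_untextured_sky` are assumptions made for the example.

```python
def fill_untextured_sky(pixels, is_sky):
    """Fill untextured sky pixels with the mean color of textured sky pixels.

    pixels: flat list of (r, g, b) tuples, or None where no texture color
        was assigned (hypothetical layout).
    is_sky: flat list of booleans of the same length, True for sky pixels.
    """
    sky_colors = [p for p, sky in zip(pixels, is_sky) if sky and p is not None]
    if not sky_colors:
        return list(pixels)  # nothing to average; leave the image unchanged
    n = len(sky_colors)
    mean = tuple(sum(c[ch] for c in sky_colors) / n for ch in range(3))
    # Only untextured sky pixels are replaced; non-sky holes are left for
    # the semantic 3D inpainting stage.
    return [mean if (sky and p is None) else p for p, sky in zip(pixels, is_sky)]
```

Keeping non-sky holes untouched mirrors the separation of the two regions described above, where the cheap filter is reserved for the low-frequency sky.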

The fourth step may also use a semantic 3D inpainting method based on Pixmix. Unlike the sky area, the untextured non-sky area must be seamlessly filled with high-frequency, semantically appropriate texture colors to produce a realistic urban scene. The present invention therefore adopts an advanced inpainting technique that builds on Pixmix, a high-quality 2D inpainting method, and takes into account the 3D geometric and semantic information of the current scene.

As shown in FIG. 10, Pixmix iteratively searches for a mapping from each target pixel p in the untextured region T to a source pixel f(p) in the already textured region S. Once an appropriate mapping function f has been found for every pixel p, the color of each pixel p is replaced in the final image with the source color determined in the textured region S. To obtain the best mapping function f, Pixmix solves a global minimization problem over a total cost function composed of two types of cost, a spatial cost Cost_s and an appearance cost Cost_a.

[Formula 11]

[Formula 12]

Here, p denotes a pixel location in T, and the neighborhood set denotes the relative positions of the pixels adjacent to p; it contains the offset vectors from p to its adjacent pixels, excluding the zero offset. As in the original work, patch sizes of 3 × 3 and 5 × 5 are used for N_s and N_a, respectively, and the weights are uniform by default. For w_a, however, pixels adjacent to the boundary between T and S can optionally be given larger weights so that they have a stronger influence on Cost_a, which reduces the border effect. Squared Euclidean distance can be used for the distance functions d_s and d_a. Intuitively, minimizing Cost_s means that the neighbors of the target pixel p are mapped close to the neighbors of f(p), so that the surrounding structure is preserved. With h(p) denoting the pixel color at p, minimizing Cost_a gives p surrounding pixel colors similar to those of f(p).

Pixmix produces convincing image inpainting results in real time, but it has difficulty filling holes against non-planar backgrounds and lacks semantic matching between target and source pixels. To address this, the fourth step according to an embodiment of the present invention extends Pixmix to incorporate the 3D geometric and semantic information of the current scene, using a total cost function such as [Formula 13] below.

[Formula 13]

Here, the matching cost Cost_m(p) is 0 when the semantic label of p and the semantic label of f(p) are the same, and infinite otherwise. The label of the target pixel p is obtained from the known class of the 3D model to which p belongs, whereas the label of the source pixel f(p) is obtained from the sampled per-object semantic image (PS) of the sampled panoramic street view image (PI) that was the most influential candidate texture selected in the third step. This finds semantically appropriate source pixels more accurately than using the 3D model's semantic labels for both p and f(p). Furthermore, to take into account the 3D geometry already available in the position buffer (G_p), the spatial cost is computed using the 3D surface positions of the pixels and a 3D distance function d_s. The control parameter is set to 0.5.
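The effect of the matching cost can be illustrated with a small sketch. The zero-or-infinity behavior of the matching cost and the value 0.5 come from the description; the way the three terms are combined in `total_cost` (a convex combination of spatial and appearance costs plus the matching cost) is an assumption, since the body of Formula 13 is not reproduced here, and the names `kappa`, `matching_cost`, `total_cost`, and `squared_distance_3d` are hypothetical.

```python
import math

def squared_distance_3d(a, b):
    """Squared Euclidean distance between two 3D surface positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matching_cost(label_p, label_fp):
    """0 when target and source labels agree, infinity otherwise."""
    return 0.0 if label_p == label_fp else math.inf


def total_cost(cost_spatial, cost_appearance, label_p, label_fp, kappa=0.5):
    """Assumed combination of the three costs (NOT the source's Formula 13).

    kappa is the control parameter, set to 0.5 in the description.
    """
    return (
        kappa * cost_spatial
        + (1.0 - kappa) * cost_appearance
        + matching_cost(label_p, label_fp)
    )
```

Because a label mismatch makes the total cost infinite, such source pixels can never be selected by the minimization, which is how the semantic term shrinks the search range for f(p).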

A major advantage of using semantic information according to an embodiment of the present invention is that the search range for f(p) is effectively reduced by sampling only contextually matching source pixels.

Referring to FIG. 11, FIG. 11(a) shows the captured camera view image, FIG. 11(b) shows the result without the depth test, the semantic object matching test, or inpainting, and FIG. 11(c) shows the result with the depth test added. FIG. 11(d) shows the result with the semantic object matching test also added, and FIG. 11(e) shows the final result of the present invention after inpainting. FIGS. 11(f) and 11(g) show the results of applying existing inpainting methods. As FIG. 11 shows, because the present invention makes use of the 3D geometry, it fills large holes against non-planar backgrounds better than previous work.

FIG. 12 is a block diagram illustrating the detailed configuration of a panoramic texture mapping system through semantic object matching according to an embodiment of the present invention.

Referring to FIG. 12, the panoramic texture mapping system through semantic object matching according to an embodiment of the present invention provides a rendering technique based on panoramic texture mapping that reproduces street view images in real time.

To this end, the panoramic texture mapping system 1200 through semantic object matching according to an embodiment of the present invention includes a pre-processing unit 1210, an image mapping unit 1220, and a texture mapping unit 1230.

The pre-processing unit 1210 extracts semantic information.

The pre-processing unit 1210 may extract, from street view images, semantic information comprising an RGB panoramic street view image (PI), a city model (M), a per-object semantic image (PS), a depth image (PD), and camera parameters (C). The RGB panoramic street view images (PI) are street-level panoramic images with broad worldwide coverage, provided as an image resource by Google Street View on the Google Maps platform. The city model (M) is virtual model data of city buildings obtained from an open database of 3D geographic information. The per-object semantic image (PS) is extracted from the street view images using DeepLabV3+, a deep-learning-based method. The camera parameters (C) are calibrated by aligning building boundaries using the city model (M) and the per-object semantic image (PS). The depth image (PD) is the depth information of the city model (M) rendered in panoramic image form using the information of each camera contained in the camera parameters (C).

The image mapping unit 1220 renders a G-buffer for the camera view image that follows the user's movement and performs panoramic image mapping, which maps each pixel in the G-buffer to the street view image.

The image mapping unit 1220 may use the city model (M) of the street view image to render, for the camera view image that follows the user's movement, a G-buffer consisting of a position buffer, a segmentation buffer, and a normal buffer. It may then perform a panoramic coordinate system transformation based on adjacent camera information in the position buffer and map each pixel of the position buffer to the street view images.

For real-time panoramic texture mapping, the image mapping unit 1220 may use the city model (M) of the street view image to render, according to the camera view image, a G-buffer consisting of a position buffer (G_p), a segmentation buffer (G_s), and a normal buffer (G_n). The G-buffer is a store for rendered geometric information about the scene the user is currently viewing. It may include the position buffer, which stores the positions of geometry in the virtual space, the normal buffer, which stores normal vectors, and the segmentation buffer, which holds per-object semantic information.
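The three buffers can be pictured as per-pixel records. The sketch below is a CPU-side illustration only, assuming one record per screen pixel; in a real renderer these would be GPU render targets, and the names `GBufferPixel` and `make_gbuffer` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GBufferPixel:
    position: Optional[Vec3] = None  # position buffer: world-space surface point
    normal: Optional[Vec3] = None    # normal buffer: surface normal vector
    label: Optional[str] = None      # segmentation buffer: per-object class


def make_gbuffer(width: int, height: int) -> List[List[GBufferPixel]]:
    """Allocate an empty G-buffer with one independent record per pixel."""
    return [[GBufferPixel() for _ in range(width)] for _ in range(height)]


gbuffer = make_gbuffer(4, 2)
gbuffer[0][1] = GBufferPixel((1.0, 2.0, 3.0), (0.0, 1.0, 0.0), "building")
```

The later stages read these fields per pixel: the position feeds the projection and depth test, the label feeds the object matching test, and the normal feeds the blending weight.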

The texture mapping unit 1230 uses the semantic information and the G-buffer to perform panoramic texture mapping on the camera view image and the street view image through a depth test, a semantic object matching test, and 3D inpainting.

The texture mapping unit 1230 may perform four steps: a first step (Depth Test) of performing a depth test to minimize projection errors for pixels projected onto the camera view image; a second step (Semantic Object Matching Test) of performing an object matching test, using the semantic information, on the camera view image in which projection errors have been minimized; a third step (Weighted Blending) of synthesizing a final color by interpolating candidate texture colors with blending weights applied to the camera view image; and a fourth step (Semantic 3D Inpainting) of completing the panoramic texture mapping by performing semantic 3D inpainting on the non-sky area of the camera view image.

Although some description is omitted for the system of FIG. 12, it will be apparent to those skilled in the art that the system according to the present invention may include everything described above with reference to FIGS. 1 to 11.

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but those of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described above with reference to a limited set of examples and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

Extracting semantic information of an RGB panoramic street view image (PI), a city model (M), an object unit semantic image (PS), a depth image (PD), and a camera parameter (C) from a street view image constituting 3D scene data;
rendering a G-buffer for a camera view image according to a user's movement and performing panoramic image mapping to map each pixel in the G-buffer to the street view image; and
performing panoramic texture mapping on the camera view image and the street view image through a depth test, a semantic object matching test, and 3D inpainting using the semantic information and the G-buffer;
including,
The step of performing the panoramic image mapping
Rendering a G-buffer of a position buffer, a segmentation buffer, and a normal buffer according to the camera view image according to the user's movement using the city model M of the street view image;
The step of performing the panoramic texture mapping
performing a depth test to minimize projection errors for pixels projected on the camera view image;
performing a semantic object matching test by performing an object matching test on the camera view image in which the projection error is minimized;
synthesizing a final color by interpolating candidate texture colors by applying a blending weight to the camera view image; and
Completing panoramic texture mapping by performing semantic 3D inpainting on a non-sky area in the camera view image
including,
The step of performing the semantic object matching test
performs, when each pixel of the position buffer is mapped to the RGB panoramic street view image (PI) using the object-unit semantic image (PS), the object matching test that checks for a match with an object in the RGB panoramic street view image (PI), and
Completing the panoramic texture mapping
uses, in order to process pixels not filled with RGB values in the camera view image, a real-time inpainting method that processes the sky region and the non-sky region of the camera view image separately, the method being a panoramic texture mapping method through semantic object matching.

delete

According to claim 1,
The step of performing the panoramic image mapping
Panoramic texture mapping method through semantic object matching, wherein a panoramic coordinate system conversion process is performed based on adjacent camera information in the position buffer, and each pixel of the position buffer is mapped to the street view images.

delete

According to claim 1,
The depth testing step is
A panoramic texture mapping method through semantic object matching, wherein projection error is minimized by removing RGB values due to occlusion from a current user's view using the rendered depth image (PD).

(deleted)

The panoramic texture mapping method through semantic object matching according to claim 1, wherein the synthesizing step assigns, to each pixel of the position buffer, the RGB values of the final color synthesized by interpolating the candidate texture colors through a blending method.
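
The interpolation of candidate texture colors can be illustrated as a normalized weighted average. How the blending weights are derived is not stated in this claim, so the sketch takes them as given inputs; the function name and the candidate layout are illustrative assumptions.

```python
def blend_colors(candidates):
    """Interpolate candidate texture colors with their blending weights.

    candidates is a list of ((r, g, b), weight) pairs. Returns the
    weighted average color, or None when no candidate has positive weight
    (such a pixel would remain unfilled).
    """
    total = sum(weight for _, weight in candidates)
    if total <= 0.0:
        return None
    return tuple(
        sum(color[i] * weight for color, weight in candidates) / total
        for i in range(3)
    )


# Two panoramas see the same surface point; the second is weighted 3x.
final_color = blend_colors([((255, 0, 0), 1.0), ((0, 0, 255), 3.0)])
```

Normalizing by the weight sum keeps the result within the range of the input colors regardless of how many candidates survive the earlier tests.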

(deleted)

The panoramic texture mapping method through semantic object matching according to claim 1, wherein the step of completing the panoramic texture mapping performs low-pass filtering on the sky region of the camera view image, and completes the panoramic texture mapping by filling the non-sky region with RGB values of texture colors that match geometrically and semantically.

A panoramic texture mapping system through semantic object matching, comprising:
a pre-processing unit that extracts semantic information of an RGB panoramic street view image (PI), a city model (M), an object-unit semantic image (PS), a depth image (PD), and camera parameters (C) from a street view image constituting 3D scene data;
an image mapping unit that renders a G-buffer for a camera view image according to a user's movement and performs panoramic image mapping to map each pixel of the G-buffer to the street view image; and
a texture mapping unit that performs panoramic texture mapping on the camera view image and the street view image through a depth test, a semantic object matching test, and 3D inpainting, using the semantic information and the G-buffer,
wherein the image mapping unit renders a G-buffer comprising a position buffer, a segmentation buffer, and a normal buffer for the camera view image according to the user's movement, using the city model (M) of the street view image,
wherein the texture mapping unit performs: a first step of performing a depth test to minimize projection errors for pixels projected onto the camera view image; a second step of performing a semantic object matching test on the camera view image in which the projection errors have been minimized; a third step of synthesizing a final color by interpolating candidate texture colors using blending weights applied for the camera view image; and a fourth step of completing the panoramic texture mapping by performing semantic 3D inpainting on a non-sky region of the camera view image,
wherein the second step uses the object-unit semantic image (PS) to perform an object matching test that checks, when each pixel of the position buffer is mapped to the RGB panoramic street view image (PI), whether that pixel matches an object in the RGB panoramic street view image (PI), and
wherein the fourth step uses a real-time inpainting method that processes a sky region and a non-sky region of the camera view image separately, in order to handle pixels of the camera view image that are not filled with RGB values.