KR20240083448A

KR20240083448A - Method, computer device, and computer program to detect polygonal signs based on deep learning

Info

Publication number: KR20240083448A
Application number: KR1020220167591A
Authority: KR
Inventors: 주명호
Original assignee: 네이버랩스 주식회사
Priority date: 2022-12-05
Filing date: 2022-12-05
Publication date: 2024-06-12
Also published as: WO2024122820A1

Abstract

A method, computer device, and computer program for detecting polygonal signs based on deep learning are disclosed. The method includes: generating sign data in which a sign area included in a training image is labeled as a polygon; and training a sign detection model that detects polygonal sign areas using the sign data.

Description

Method, computer device, and computer program for detecting polygonal signs based on deep learning

The following description relates to technology for detecting store signs that exist in real space.

POI (point of interest) information may refer to information about major places, such as shops, government offices, and schools, in buildings along roads.

To provide accurate information in maps, augmented reality (AR), and similar services, POI information must always be kept up to date.

As an example of related technology, Korean Registered Patent Publication No. 10-1183519 (registered September 11, 2012) discloses a technique for generating POI information, that is, information on key points, using a geographic information panorama in which photographs of actual terrain features are assembled in panoramic form.

Techniques that use sign information contained in images are used to generate POI information automatically.

However, recognition is difficult because signs vary in size and shape, and the font, position, and orientation of the text within them also differ.

In addition, detecting sign areas is not easy because of perspective distortion that depends on the position from which the real space was photographed.

Provided are a method and device capable of detecting the position, size, and shape of a sign while taking into account the perspective view of a street view (road view) image.

Provided are a method and device capable of detecting a sign area as an M-sided polygon through a deep learning model and obtaining a frontal sign image from it.

Provided is a method performed by a computer device, wherein the computer device includes at least one processor configured to execute computer-readable instructions contained in a memory, the method including: generating, by the at least one processor, sign data in which a sign area included in a training image is labeled as a polygon; and training, by the at least one processor, a sign detection model that detects polygonal sign areas using the sign data.

According to one aspect, the generating may include: obtaining position information of the M vertices of the sign area in the training image; obtaining a frontal sign image of the sign area using the position information of the M vertices; and extracting, from the frontal sign image, a sign feature vector, which is a feature vector of a fixed dimension.

According to another aspect, the training may use the sign feature vectors to train the sign detection model so that feature vectors of the same sign are similar and feature vectors of different signs differ.
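
The text states only the training goal (similar vectors for the same sign, different vectors for different signs) and does not name a loss function. As one possible reading, the sketch below uses a triplet margin loss, which is an assumption and not something the source specifies; the function names and the `margin` value are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss (assumed here, not named in the source).

    `positive` is a feature vector of the same sign as `anchor`,
    `negative` is a feature vector of a different sign.
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D feature vectors: two views of one sign and one other sign.
same_sign_a = [1.0, 0.0]
same_sign_b = [0.9, 0.1]
other_sign = [0.0, 1.0]

loss_well_separated = triplet_loss(same_sign_a, same_sign_b, other_sign)
loss_confused = triplet_loss(same_sign_a, other_sign, same_sign_b)
```

With these toy vectors the first loss is zero because the embedding already separates the signs, while the second is positive because the roles of the matching and non-matching sign are swapped.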

According to another aspect, obtaining the frontal sign image may include: creating an enlarged sign area by enlarging each side of the sign area by a fixed factor about the center point of the M-sided polygon; and creating the frontal sign image by warping the enlarged sign area to a predefined shape and size.

According to another aspect, the training may train the sign detection model so that M-side position information and a confidence level for a sign area are output as the sign detection result.

According to another aspect, the method may further include detecting, by the at least one processor, when an arbitrary street view image is given, a polygonal target sign area in the street view image through the sign detection model.

According to another aspect, the detecting may include obtaining M-side position information and a confidence level for each target sign area detected in the street view image.

According to another aspect, the detecting may include calculating the confidence level for the target sign area by performing regression from a reference anchor to the target sign area.

According to another aspect, the detecting may include calculating the IOU (intersection over union) or IOA (intersection over area) between a reference anchor and the target sign area as the confidence level for the target sign area.
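
The passage above names the overlap measures without giving formulas. The following is a minimal sketch of how the overlap between an anchor and a polygonal sign area could be computed, assuming both polygons are convex (Sutherland-Hodgman clipping is used for the intersection). Treating IOA as the intersection divided by the area of the target polygon is an assumption; the source does not say which area is the denominator. All function names are hypothetical.

```python
def _signed_area(poly):
    """Signed shoelace area: positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _side(a, b, p):
    """Cross product (b - a) x (p - a); >= 0 means p is left of or on a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(a, b, p, q):
    """Point where segment p-q crosses the infinite line through a-b."""
    dp, dq = _side(a, b, p), _side(a, b, q)
    t = dp / (dp - dq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def intersection_area(subject, clip):
    """Area of the overlap of two convex polygons (Sutherland-Hodgman)."""
    if _signed_area(clip) < 0:
        clip = clip[::-1]  # the clipping loop expects counter-clockwise order
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        if not output:
            break
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            p_in, q_in = _side(a, b, p) >= 0, _side(a, b, q) >= 0
            if q_in:
                if not p_in:
                    output.append(_intersect(a, b, p, q))
                output.append(q)
            elif p_in:
                output.append(_intersect(a, b, p, q))
    return abs(_signed_area(output)) if len(output) >= 3 else 0.0


def iou(anchor, target):
    """Intersection over union of two convex polygons."""
    inter = intersection_area(target, anchor)
    union = abs(_signed_area(anchor)) + abs(_signed_area(target)) - inter
    return inter / union if union > 0 else 0.0


def ioa(anchor, target):
    """Intersection over the target's area (assumed definition of IOA)."""
    area = abs(_signed_area(target))
    return intersection_area(target, anchor) / area if area > 0 else 0.0


anchor_box = [(0, 0), (2, 0), (2, 2), (0, 2)]
target_sign = [(1, 0), (3, 0), (3, 2), (1, 2)]
iou_value = iou(anchor_box, target_sign)  # overlap 2, union 6
ioa_value = ioa(anchor_box, target_sign)  # overlap 2, target area 4
```

Sign quadrilaterals seen under perspective remain convex, which is why the convexity assumption is usually acceptable here; a general polygon library would be needed for concave shapes.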

According to another aspect, the detecting may include: obtaining a frontal sign image by warping the target sign area; and detecting a POI change based on the image similarity between signs computed from the frontal sign images.

Provided is a computer program stored in a computer-readable recording medium to cause a computer to execute the method.

Provided is a computer device including at least one processor configured to execute computer-readable instructions contained in a memory, wherein the at least one processor performs: a process of generating sign data in which a sign area included in a training image is labeled as a polygon; and a process of training a sign detection model that detects polygonal sign areas using the sign data.

According to embodiments of the present invention, detecting a sign area as an M-sided polygon through a deep learning model and obtaining a frontal sign image from it can provide a sign detection environment that is robust to changes in perspective and can improve sign detection accuracy by resolving the perspective distortion problem.

Figure 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
Figure 2 is a block diagram showing an example of a computer device according to an embodiment of the present invention.
Figure 3 is a flowchart showing an example of a method that can be performed by a computer device according to an embodiment of the present invention.
Figure 4 shows an example of a deep learning-based sign detection model in an embodiment of the present invention.
Figure 5 is a flowchart showing the process of generating training data for a sign detection model in an embodiment of the present invention.
Figures 6 and 7 are exemplary diagrams for explaining the sign image warping process in an embodiment of the present invention.
Figures 8 and 9 are exemplary diagrams for explaining the sign detection confidence calculation process in an embodiment of the present invention.
Figure 10 shows an example of a sign detection result in the form of an M-sided polygon in an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to technology for detecting store signs that exist in real space.

Embodiments, including those specifically disclosed herein, detect a sign area as an M-sided polygon through a deep learning model and obtain a frontal sign image from it, thereby minimizing misrecognition caused by perspective distortion and detecting the sign area accurately.

A sign detection system according to embodiments of the present invention may be implemented by at least one computer device, and a sign detection method according to embodiments of the present invention may be performed by at least one computer device included in the sign detection system. A computer program according to an embodiment of the present invention may be installed and run on the computer device, and the computer device may perform the sign detection method under the control of the running computer program. The computer program may be stored in a computer-readable recording medium so as to be combined with the computer device and cause the computer to execute the sign detection method.

Figure 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of Figure 1 includes a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. Figure 1 is an example for describing the invention, and the number of electronic devices or servers is not limited to that shown in Figure 1. In addition, the network environment of Figure 1 merely describes one example of the environments applicable to the present embodiments, and the applicable environments are not limited to it.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed or mobile terminals implemented as computer devices. Examples include smartphones, mobile phones, navigation devices, computers, laptops, digital broadcasting terminals, PDAs (Personal Digital Assistants), PMPs (Portable Multimedia Players), and tablet PCs. Although Figure 1 shows the shape of a smartphone as an example of the electronic device 110, in embodiments of the present invention the electronic device 110 may refer to any of various physical computer devices capable of communicating with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 over the network 170 using a wireless or wired communication scheme.

The communication scheme is not limited and may include not only schemes that use communication networks the network 170 may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcast network) but also short-range wireless communication between devices. For example, the network 170 may include any one or more of a PAN (personal area network), LAN (local area network), CAN (campus area network), MAN (metropolitan area network), WAN (wide area network), BBN (broadband network), and the Internet. The network 170 may also include any one or more network topologies, including but not limited to a bus network, star network, ring network, mesh network, star-bus network, and tree or hierarchical network.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 over the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides services (for example, a map service or an augmented reality service) to the plurality of electronic devices 110, 120, 130, and 140 connected through the network 170.

Figure 2 is a block diagram showing an example of a computer device according to an embodiment of the present invention. Each of the electronic devices 110, 120, 130, and 140 and each of the servers 150 and 160 described above may be implemented by the computer device 200 shown in Figure 2.

As shown in Figure 2, the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240. The memory 210 is a computer-readable recording medium and may include RAM (random access memory), ROM (read only memory), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as ROM or a disk drive may also be included in the computer device 200 as a separate permanent storage device distinct from the memory 210. The memory 210 may store an operating system and at least one program code. These software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210, such as a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. In other embodiments, the software components may be loaded into the memory 210 through the communication interface 230 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 210 of the computer device 200 based on a computer program installed from files received over the network 170.

The processor 220 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 220 by the memory 210 or the communication interface 230. For example, the processor 220 may be configured to execute received instructions according to program code stored in a recording device such as the memory 210.

The communication interface 230 may provide a function for the computer device 200 to communicate with other devices (for example, the storage devices described above) over the network 170. For example, requests, instructions, data, files, and the like that the processor 220 of the computer device 200 generates according to program code stored in a recording device such as the memory 210 may be transmitted to other devices over the network 170 under the control of the communication interface 230. Conversely, signals, instructions, data, files, and the like from other devices may be received by the computer device 200 through its communication interface 230 via the network 170. Signals, instructions, data, and the like received through the communication interface 230 may be passed to the processor 220 or the memory 210, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 200 may further include.

The input/output interface 240 may be a means for interfacing with an input/output device 250. For example, input devices may include a microphone, keyboard, or mouse, and output devices may include a display or speaker. As another example, the input/output interface 240 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 250 may also be configured as a single device together with the computer device 200.

In other embodiments, the computer device 200 may include fewer or more components than those shown in Figure 2. However, most conventional components need not be explicitly illustrated. For example, the computer device 200 may be implemented to include at least some of the input/output devices 250 described above, or may further include other components such as a transceiver or a database.

Specific embodiments of a method and device for detecting polygonal signs based on deep learning are described below.

In services such as maps and augmented reality, keeping POI information up to date is important for providing accurate information about real space.

To generate POI information automatically, techniques that detect sign information in images of real space are used.

However, depending on the shooting angle, signs in an image are often distorted by perspective rather than seen head-on.

In using sign information contained in street view images as POI information, the present embodiments can implement sign detection that is robust to perspective distortion arising from the position at which the street view image was captured.

A computer-implemented sign detection system may be configured in the computer device 200 according to the present embodiment. For example, the sign detection system may be implemented as an independently operating program, or in an in-app form within a specific application so that it can operate on that application.

The processor 220 of the computer device 200 may be implemented with components for performing the sign detection method described below. Depending on the embodiment, components of the processor 220 may be selectively included in or excluded from the processor 220, and may be separated or merged to express the functions of the processor 220.

The processor 220 and its components may control the computer device 200 to perform the steps of the sign detection method described below. For example, the processor 220 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program contained in the memory 210.

Here, the components of the processor 220 may be representations of different functions performed by the processor 220 according to instructions provided by program code stored in the computer device 200.

The processor 220 may read the necessary instructions from the memory 210, into which instructions related to controlling the computer device 200 have been loaded. In this case, the instructions read may include instructions for controlling the processor 220 to execute the steps described below.

The steps of the sign detection method described below may be performed in an order different from that shown, and some steps may be omitted or additional processes may be included.

The steps of the sign detection method may be performed by the server 150, and depending on the embodiment, at least some of the steps may be performed by one of the electronic devices 110, 120, 130, and 140.

Figure 3 is a flowchart showing an example of a method that can be performed by a computer device according to an embodiment of the present invention.

Referring to Figure 3, in step S310 the processor 220 may generate sign data in which sign areas included in training images are labeled as M-sided polygons. The processor 220 may collect street view images containing signs from various viewpoints as training images, and may label the sign areas in those images as M-sided polygons to collect data for training the sign detection model.
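
To make the labeling in step S310 concrete, the sketch below shows one possible in-memory representation of polygon-labeled sign data. The class names, the `poi_id` field, and the file name are hypothetical; the source does not describe a storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SignLabel:
    """One labeled sign area: M vertices in image pixel coordinates."""
    vertices: List[Point]
    poi_id: str  # hypothetical identifier of the POI the sign belongs to

    def __post_init__(self):
        # A polygon label needs at least three vertices to enclose an area.
        if len(self.vertices) < 3:
            raise ValueError("a polygon label needs at least 3 vertices")


@dataclass
class LabeledImage:
    """A training image together with all sign labels found in it."""
    image_path: str
    signs: List[SignLabel]


# Example with M = 4: a sign seen at an angle, so it is not axis-aligned.
sample = LabeledImage(
    image_path="street_view_0001.jpg",  # hypothetical file name
    signs=[SignLabel([(120, 80), (340, 95), (338, 160), (118, 150)], "poi-001")],
)
```

Storing the vertices rather than an axis-aligned bounding box is what lets the later warping step recover a frontal view of the sign.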

In step S320, the processor 220 may train a deep learning-based sign detection model using the sign data generated from the training images. The model may be trained with the sign data generated in step S310 so that its output is an M-sided polygon.

In step S330, when an arbitrary street view image is given, the processor 220 may detect target sign areas in the form of M-sided polygons in that image using the deep learning-based sign detection model trained on sign data labeled as M-sided polygons.

To detect signs robustly under the perspective distortion of street view images, the present embodiment can provide a sign detection model, a deep learning model trained on sign data labeled as M-sided polygons, that outputs sign detection results in the form of M-sided polygons.

Because signs vary in size and shape, the sign area is in many cases difficult to define. During sign detection, the sign area may be cropped to a uniform size or warped to a fixed size.

However, cropping sign areas to a uniform size loses valid regions or includes unnecessary regions outside the sign, making it difficult to extract correct sign information.

In addition, signs bearing the same business name may differ in size depending on their location (such as the front or side of a building) or type (such as a flex sign or a projecting sign), and this applies to chain stores as well. In such cases, warping the sign areas to a fixed size makes it difficult to match identical signs.

To solve these problems, the present invention trains a deep learning-based model that detects signs as M-sided polygons, thereby providing a sign detection environment that is robust to perspective distortion.

The processor 220 may detect sign areas in the form of M-sided polygons that follow the perspective view in a street view image through the deep learning-based sign detection model. Furthermore, the processor 220 may detect POI changes by warping the sign areas detected by the model into frontal images and measuring the image similarity between signs on the basis of those frontal images.
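
The similarity measure used for POI change detection is not specified in this passage. As an illustration only, the sketch below compares two sign feature vectors (for example, ones extracted from frontal sign images) with cosine similarity and flags a change when the similarity falls below a threshold. The choice of cosine similarity, the function names, and the threshold value are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def poi_changed(previous_vec, current_vec, threshold=0.8):
    """Flag a POI change when similarity drops below a hypothetical threshold."""
    return cosine_similarity(previous_vec, current_vec) < threshold


# Toy vectors: the same sign seen twice, then a different sign at that spot.
unchanged = poi_changed([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
replaced = poi_changed([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

In practice the threshold would have to be tuned on pairs of images known to show the same or different signs.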

도 4는 본 발명의 일실시예에 있어서 딥러닝 기반 간판 검출 모델의 예시를 도시한 것이다.Figure 4 shows an example of a deep learning-based sign detection model in one embodiment of the present invention.

도 4를 참조하면, 본 발명에 따른 간판 검출 모델(40)은 다양한 시점의 거리뷰 이미지로 사전에 수집된 이미지 셋이 입력 데이터가 될 수 있다. 특히, 간판 영역이 M점 폴리곤(polygon) 형태로 라벨링된 거리뷰 이미지 셋을 간판 검출 모델(40)의 입력으로 활용할 수 있다.Referring to FIG. 4, the sign detection model 40 according to the present invention may use a set of images collected in advance as street view images from various viewpoints as input data. In particular, a street view image set in which the sign area is labeled in the form of an M-point polygon can be used as an input to the sign detection model 40.

프로세서(220)는 간판 영역이 M점 폴리곤 형태로 라벨링된 거리뷰 이미지 셋을 이용하여 간판 검출 모델(40)을 학습하는 것으로, 간판 영역에 대한 검출 결과로서 M변 다각형 정보가 출력되도록 간판 검출 모델(40)을 학습할 수 있다.The processor 220 learns the sign detection model 40 using a street view image set in which the sign area is labeled in the form of an M-point polygon, and uses a sign detection model to output M-side polygon information as a detection result for the sign area. (40) can be learned.

The output parameters of the sign detection model 40 may include, for each sign area (POI #1, …, POI #N) detected in a street view image, the M-sided polygon position values (p1, …, pM) and a detection confidence level (conf) for that sign area.
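As a rough illustration of this output format, the following Python sketch models one detection as M polygon positions plus a confidence value. The names `SignDetection`, `vertices`, and `keep_confident`, and the 0.5 cut-off, are assumptions made for illustration; the source specifies only the outputs (p1, …, pM) and conf.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


@dataclass
class SignDetection:
    """One detected sign area (hypothetical container, not from the source)."""
    vertices: List[Point]  # p1, ..., pM: positions of the M-sided polygon
    conf: float            # detection confidence for this sign area


def keep_confident(detections: List[SignDetection],
                   min_conf: float = 0.5) -> List[SignDetection]:
    """Drop detections whose confidence is below an assumed cut-off."""
    return [d for d in detections if d.conf >= min_conf]


# Two example detections (POI #1 and POI #2) with M = 4.
detections = [
    SignDetection([(10, 20), (110, 40), (100, 90), (5, 70)], conf=0.92),
    SignDetection([(200, 30), (260, 35), (258, 60), (198, 55)], conf=0.31),
]
confident = keep_confident(detections)
```

Keeping the polygon and its confidence in one record makes it straightforward to filter detections before any later warping step.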

FIG. 5 is a flowchart showing the process of generating training data for the sign detection model according to an embodiment of the present invention.

Referring to FIG. 5, in step S501, given training images, the processor 220 may apply a sign detector to each image to detect polygon-shaped sign areas. The processor 220 may thereby obtain position information for the M vertices of each sign area included in a training image.

In step S502, the processor 220 may obtain a frontal sign image by warping the M-sided polygonal sign area to a predefined shape and size.

FIGS. 6 and 7 are exemplary diagrams illustrating the sign image warping process according to an embodiment of the present invention.

First, as shown in FIG. 6, the processor 220 may use the position information of the M vertices of the sign area 601 detected in the training image to create an enlarged sign area 602 by scaling each side by a fixed factor (for example, 1.5×) about the center point of the M-sided polygon.
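The enlargement step can be sketched in Python as a uniform scaling of the vertices about a center point. The function name `enlarge_polygon` is hypothetical, and taking the center as the mean of the vertices is an assumption; the source does not say how the center point is computed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def enlarge_polygon(vertices: List[Point], scale: float = 1.5) -> List[Point]:
    """Scale a polygon about its center so each side grows by `scale`.

    Assumption: the center is the mean of the vertex coordinates.
    """
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    # Move every vertex away from the center by the same factor.
    return [(cx + scale * (x - cx), cy + scale * (y - cy)) for x, y in vertices]


sign_area = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
enlarged = enlarge_polygon(sign_area, scale=1.5)
# enlarged == [(-0.5, -0.5), (2.5, -0.5), (2.5, 2.5), (-0.5, 2.5)]
```

Because the scaling is uniform, the enlarged polygon keeps the shape of the detected sign area while taking in a margin of surrounding context.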

Next, as shown in FIG. 7, the processor 220 may warp the enlarged M-sided polygonal sign area 602 into an image of a predefined shape (for example, a rectangle) and size, obtaining a frontal sign image 703 that includes the sign's surroundings.
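For the four-vertex case (M = 4), one common way to realize such a warp is a perspective transform (homography) that maps the four polygon vertices to the corners of the target rectangle. The Python sketch below estimates that transform by solving the standard 8-unknown linear system. The source does not name a warping method, so the homography, the function names, and the 200×50 target size are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: List[Point], dst: List[Point]) -> List[float]:
    """Return h0..h8 (h8 = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def apply_homography(h: List[float], x: float, y: float) -> Point:
    """Map one point through the homography."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Enlarged sign area (quadrilateral) and an assumed 200x50 frontal rectangle.
quad = [(10.0, 20.0), (110.0, 40.0), (100.0, 90.0), (5.0, 70.0)]
rect = [(0.0, 0.0), (200.0, 0.0), (200.0, 50.0), (0.0, 50.0)]
h = homography(quad, rect)
mapped = [apply_homography(h, x, y) for x, y in quad]
```

To produce the actual frontal image, an implementation would typically estimate the transform in the opposite direction (rectangle to quadrilateral) and sample a source pixel for every output pixel; an image-processing library would likely replace the hand-written solver.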

Referring again to FIG. 5, in step S503, the processor 220 may extract a feature vector of a fixed dimension (for example, 1024 dimensions), hereinafter referred to as the 'sign feature vector', from the frontal sign image 703 obtained in step S502. The processor 220 may use the sign feature vector extracted from the frontal sign image 703 as training data for the deep-learning-based sign detection model.

The processor 220 may train the sign detection model using the sign feature vectors extracted from frontal sign images 703. The processor 220 may train the model so that feature vectors of the same sign are similar to one another and feature vectors of different signs differ substantially.
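One well-known objective with this behavior is the pairwise contrastive loss, sketched below in Python. The source does not name a loss function, so this is a swapped-in illustration rather than the disclosed training objective; `contrastive_loss` and the `margin` value are assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same_sign: bool, margin: float = 1.0) -> float:
    """Pairwise contrastive loss (assumed objective, not from the source).

    Same sign: penalize any distance between the two vectors.
    Different signs: penalize only pairs closer than `margin`.
    """
    d = euclidean(a, b)
    if same_sign:
        return d * d
    return max(0.0, margin - d) ** 2


# Toy 2-D vectors stand in for the 1024-dimensional sign feature vectors.
loss_same = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_sign=True)    # 25.0
loss_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_sign=False)    # 0.0
loss_near = contrastive_loss([0.0, 0.0], [0.5, 0.0], same_sign=False)   # 0.25
```

Minimizing such a loss pulls vectors of the same sign together and pushes vectors of different signs at least `margin` apart.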

FIGS. 8 and 9 are exemplary diagrams illustrating the process of calculating sign detection confidence according to an embodiment of the present invention.

The processor 220 may train the model so that, for each sign area detected in an image, a confidence value is output together with the M-sided polygon position values.

The processor 220 may calculate a confidence value for each M-sided polygonal sign area detected in a training image.

For example, referring to FIG. 8, when a four-sided polygonal sign area is detected, the processor 220 may calculate the IOU (intersection over union), a confidence indicator for the sign area, by performing regression from a rectangular reference anchor 80 to the sign area 601.

IOU is a measure of how accurate a bounding box detected in an image is: the area of the intersection of the detected bounding box and the reference bounding box, divided by the area of their union. When IOU is computed between anchors and labeled sign images to assess the confidence of a sign area, the diversity of signs makes it difficult for the anchors to cover every sign. In particular, diagonally oriented sign areas inevitably have a very low IOU.

Therefore, for a diagonally oriented sign 901 as in FIG. 9(A), the anchor 80 may be activated by calculating the IOA (intersection over area) even when the IOU is low, and a long sign 901 as in FIG. 9(B) may be assigned to the longest anchor 80 despite its low IOU. IOA is the ratio of the intersection area of the two bounding boxes to the total area of the anchor.
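The two measures can be compared with a short Python sketch. For simplicity it uses axis-aligned boxes given as (x1, y1, x2, y2), which is an assumption; the sign areas in the source are general polygons, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x1 < x2, y1 < y2


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    """Overlap area of two axis-aligned boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(anchor: Box, sign: Box) -> float:
    """Intersection area divided by union area."""
    inter = intersection_area(anchor, sign)
    return inter / (area(anchor) + area(sign) - inter)


def ioa(anchor: Box, sign: Box) -> float:
    """Intersection area divided by the anchor's own area."""
    return intersection_area(anchor, sign) / area(anchor)


# A long sign that fully covers a square anchor: low IOU, high IOA.
anchor = (0.0, 0.0, 10.0, 10.0)
long_sign = (0.0, 0.0, 40.0, 10.0)
iou_value = iou(anchor, long_sign)  # 100 / 400 = 0.25
ioa_value = ioa(anchor, long_sign)  # 100 / 100 = 1.0
```

The example shows why IOA can activate an anchor that IOU would reject: the union grows with the sign's length, while the anchor's own area does not.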

In other words, the processor 220 may calculate a confidence value for a sign area detected in a training image by computing its IOU. Here the processor 220 evaluates the detection accuracy of the sign area using the IOU, but when, for example, the IOU is below a threshold, it may instead evaluate the detection accuracy of that sign area using the IOA.
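The fallback rule described here can be written as a small Python function that takes precomputed IOU and IOA values. The function name `detection_score` and the 0.5 threshold are assumptions; the source mentions a threshold but gives no value.

```python
def detection_score(iou_value: float, ioa_value: float,
                    iou_threshold: float = 0.5) -> float:
    """Use IOU as the score, falling back to IOA when IOU is below threshold.

    The threshold value is an assumed placeholder.
    """
    if iou_value >= iou_threshold:
        return iou_value
    return ioa_value


score_regular = detection_score(iou_value=0.8, ioa_value=0.9)    # uses IOU
score_diagonal = detection_score(iou_value=0.25, ioa_value=1.0)  # uses IOA
```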

FIG. 10 shows an example of M-sided polygonal sign detection results according to an embodiment of the present invention.

Referring to FIG. 10, given an arbitrary street view image 100, the processor 220 may detect sign areas 1001 seen from various perspectives using the sign detection model 40 trained on sign data labeled as M-sided polygons.

In other words, the processor 220 may detect target sign areas in the given street view image 100 as M-sided polygons. As the sign detection result, the processor 220 may collect the M-sided polygon coordinates that represent each sign area, together with confidence information for the detected area.

Using the deep-learning-based sign detection model, the processor 220 may detect and recognize the position, size, and shape of signs in a street view image as M-sided polygons that follow the perspective of the viewpoint.

The processor 220 may measure image similarity between signs by warping each M-sided polygonal sign area detected by the sign detection model into a frontal sign image that includes the area surrounding the sign.
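A minimal Python sketch of such a similarity check follows, assuming each frontal sign image has already been reduced to a feature vector. The use of cosine similarity, the function names, and the 0.8 threshold are all assumptions; the source states only that image similarity between signs is measured.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def poi_changed(old_vec: Sequence[float], new_vec: Sequence[float],
                threshold: float = 0.8) -> bool:
    """Flag a POI change when similarity falls below an assumed threshold."""
    return cosine_similarity(old_vec, new_vec) < threshold


# Toy 2-D vectors stand in for the sign feature vectors.
unchanged = poi_changed([1.0, 0.0], [1.0, 0.0])  # False: identical vectors
changed = poi_changed([1.0, 0.0], [0.0, 1.0])    # True: orthogonal vectors
```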

The sign detection technology according to the present invention may be applied to fields such as automatically generating POI information from real space to build a database and detecting changes in POIs that exist in real space.

Thus, according to embodiments of the present invention, detecting sign areas as M-sided polygons with a deep learning model and obtaining frontal sign images from them provides sign detection that is robust to changes in perspective, and resolving the perspective distortion problem improves sign detection accuracy.

The device described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or several pieces combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of examples and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order than the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form than described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method performed by a computer device,
wherein the computer device includes at least one processor configured to execute computer-readable instructions contained in a memory,
the method comprising:
generating, by the at least one processor, sign data in which a sign area included in a training image is labeled as a polygon; and
training, by the at least one processor, a sign detection model that detects polygonal sign areas using the sign data.

The method of claim 1,
wherein the generating comprises:
obtaining position information of the M vertices of the sign area in the training image;
obtaining a frontal sign image of the sign area using the position information of the M vertices; and
extracting, from the frontal sign image, a sign feature vector that is a feature vector of a fixed dimension.

The method of claim 2,
wherein the training comprises:
training the sign detection model, using the sign feature vectors, so that feature vectors of the same sign are similar and feature vectors of different signs differ.

The method of claim 2,
wherein the obtaining of the frontal sign image comprises:
creating an enlarged sign area by enlarging each side of the sign area by a fixed factor about the center point of the M-sided polygon; and
creating the frontal sign image by warping the enlarged sign area to a predefined size.

The method of claim 1,
wherein the training comprises:
training the sign detection model so that M-sided polygon position information and a confidence level for the sign area are output as a sign detection result.

The method of claim 1,
further comprising:
detecting, by the at least one processor, when an arbitrary street view image is given, a target sign area as a polygon in the street view image using the sign detection model.

The method of claim 6,
wherein the detecting comprises:
obtaining M-sided polygon position information and a confidence level for each target sign area detected in the street view image.

The method of claim 6,
wherein the detecting comprises:
calculating a confidence level for the target sign area by performing regression from a reference anchor to the target sign area.

The method of claim 6,
wherein the detecting comprises:
calculating the intersection over union (IOU) or intersection over area (IOA) between a reference anchor and the target sign area as the confidence level for the target sign area.

The method of claim 6,
wherein the detecting comprises:
obtaining a frontal sign image by warping the target sign area; and
detecting a point of interest (POI) change based on image similarity between signs using the frontal sign image.

A computer program stored in a computer-readable recording medium to cause a computer to execute the method of any one of claims 1 to 10.

A computer device comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor performs:
a process of generating sign data in which a sign area included in a training image is labeled as a polygon; and
a process of training a sign detection model that detects polygonal sign areas using the sign data.

The computer device of claim 12,
wherein the at least one processor:
obtains position information of the M vertices of the sign area in the training image,
obtains a frontal sign image of the sign area using the position information of the M vertices, and
extracts, from the frontal sign image, a sign feature vector that is a feature vector of a fixed dimension.

The computer device of claim 13,
wherein the at least one processor:
trains the sign detection model, using the sign feature vectors, so that feature vectors of the same sign are similar and feature vectors of different signs differ.

The computer device of claim 13,
wherein the at least one processor:
creates an enlarged sign area by enlarging each side of the sign area by a fixed factor about the center point of the M-sided polygon, and
creates the frontal sign image by warping the enlarged sign area to a predefined size.

The computer device of claim 12,
wherein the at least one processor:
trains the sign detection model so that M-sided polygon position information and a confidence level for the sign area are output as a sign detection result.

The computer device of claim 12,
wherein the at least one processor:
detects, when an arbitrary street view image is given, a target sign area as a polygon in the street view image using the sign detection model.

The computer device of claim 17,
wherein the at least one processor:
obtains M-sided polygon position information and a confidence level for each target sign area detected in the street view image.

The computer device of claim 17,
wherein the detecting comprises:
calculating the IOU or IOA between a reference anchor and the target sign area as the confidence level for the target sign area.

The computer device of claim 17,
wherein the at least one processor:
obtains a frontal sign image by warping the target sign area, and
detects a POI change based on image similarity between signs using the frontal sign image.