KR102546077B1

KR102546077B1 - Apparatus and method for classifying travel images using panoramic image conversion

Info

Publication number: KR102546077B1
Application number: KR1020210089286A
Authority: KR
Inventors: 정지하; 최재훈
Original assignee: Tripbitoz Co., Ltd. (주식회사 트립비토즈)
Priority date: 2021-07-07
Filing date: 2021-07-07
Publication date: 2023-06-22
Also published as: KR20230008530A

Abstract

The present invention relates to an apparatus and method for classifying travel videos using panoramic image conversion. To this end, the invention may provide a representative frame classification module with two parts:
- a CNN module, which takes a representative frame as input and outputs class classification information (the classification of the objects in the representative frame) and object size information (the proportion of the representative frame that each object occupies);
- an artificial neural network module, which takes the class classification information and the object size information as input and outputs zone classification information (the classification of the theme of the zone containing the location information).

The representative frame classification module outputs the zone classification information for the travel video and its video information. As a result, travel videos can be classified into zones that reflect user needs.

Description

Apparatus and method for classifying travel images using panoramic image conversion

The present invention relates to an apparatus and method for classifying travel videos using panoramic image conversion.

The travel industry has changed rapidly since smartphones appeared and has grown steadily every year. It can be broadly divided into airlines, lodging, rental cars, activity tickets, travel guides, and package tours. Lodging reservation is an essential part of the travel industry. It has the largest demand and supply and the most intense competition.

Representative players in the lodging reservation industry include:
- hotel reservations: Hotels.com, Expedia, Booking.com, Agoda, and Hotel Enjoy;
- same-day hotel time commerce: Daily Hotel, Sale Tonight, Hotel Time (Yeogi Eottae), and Hotel Now (Yanolja);
- motel reservations: Yanolja and Yeogi Eottae;
- pension reservations: Yanolja Pension, Woori Pension, and Ddnayo.com;
- guesthouse and B&B reservations: Airbnb, Kozaza, and Allstay;
- hotel metasearch: TripAdvisor, HotelsCombined, Skyscanner, and Trivago.

These diverse travel services have changed how people travel. In the past, travelers simply bought package tours; now they increasingly design their own travel experiences and itineraries. Travel services, in turn, are moving away from pushing products and promotions that customers do not want. Instead, they are focusing on offering products in a variety of ways to create a new consumer experience in travel planning.

Korean Registered Patent No. 10-1979764, "Hotel reservation method and system capable of compensating for the price difference according to lowest-price hotel reservation," Tripbitoz Co., Ltd.
US Registered Patent US 10346402 B2, "Optimized system and method for finding best fares," Expedia, Inc.
US Registered Patent US 7783506 B2, "System and method for managing reservation requests for one or more inventory items," Expedia, Inc.
US Registered Patent US 6826543 B1, "System and method for conducting transactions involving generically identified items," Hotels.com
US Patent Publication US 2016-0078374 A1, "GRAPHICAL USER INTERFACE FOR HOTEL SEARCH SYSTEMS," GOOGLE INC.
US Patent Publication US 2013-0031506 A1, "HOTEL RESULTS INTERFACE," GOOGLE INC.

Social networks such as Instagram, Facebook, TikTok, Snap, and Snow have become video-centric. As a result, the travel industry increasingly needs to present travel destination experiences through video. One way to offer such a video-based experience, and with it a new consumer experience in travel planning, is for travel creators to provide travel videos to users on social networks or travel services.

Until now, however, creators who upload travel videos (moving pictures or still images) to social networks or travel services could earn only advertising revenue, such as banner ads, product placement, sponsored videos, and pre-roll ads. No travel service system has rewarded creators when a travel product is sold through an uploaded travel video. Such a system would encourage the production of travel videos that efficiently lead to purchases. To build it effectively, travel videos must first be classified according to user needs.

Accordingly, an object of the present invention is to provide an apparatus and method for classifying travel videos using panoramic image conversion. The apparatus and method classify travel videos uploaded by their creators into zones distinguished according to travelers' needs.

Specific means for achieving the objects of the present invention are described below.

An object of the present invention can be achieved by providing an artificial-intelligence-based travel video classification apparatus comprising:
a travel video receiving module that receives a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a motion information output module that generates motion information for each of the plurality of travel video frames. The module includes an artificial neural network pre-trained to take a plurality of consecutive travel video frames as input and to output the motion information. The motion information includes either the magnitude of the acceleration of the starting point of the shooting direction vector or the angular speed of the shooting direction vector;
a representative section selection module that arranges the motion information of the travel video frames in frame order and selects at least one frame section with relatively low motion information. For each such section, the module sums the per-frame differences between the motion information and reference motion information. The section with the largest sum becomes the representative section of the travel video;
a representative frame selection module that selects the frame with the lowest motion information within the representative section as the representative frame; and
a representative frame classification module that outputs zone classification information for the travel video and the video information. The module includes:
- a CNN module that takes the representative frame as input and outputs class classification information (the classification of the objects in the representative frame) and object size information (the proportion of the representative frame that each object occupies);
- an artificial neural network module that takes the class classification information and the object size information as input and outputs the zone classification information (the classification of the theme of the zone containing the location information).
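The representative section and representative frame selection steps can be sketched as follows. This is a minimal illustration, not the patented implementation, and it rests on three assumptions not stated in the source:
- the reference motion information is a scalar threshold;
- a "frame section with relatively low motion information" is a maximal run of consecutive frames whose motion value falls below that threshold;
- the summed "difference" is taken as reference minus motion.

The names `low_motion_sections` and `select_representative` are hypothetical.

```python
from typing import List, Optional, Tuple

Section = Tuple[int, int]  # (start, end) frame indices, end exclusive


def low_motion_sections(motion: List[float], reference: float) -> List[Section]:
    """Maximal runs of consecutive frames whose motion is below `reference`
    (assumption: the reference motion information acts as a threshold)."""
    sections: List[Section] = []
    start: Optional[int] = None
    for i, m in enumerate(motion):
        if m < reference and start is None:
            start = i  # a low-motion run begins
        elif m >= reference and start is not None:
            sections.append((start, i))  # the run ends just before frame i
            start = None
    if start is not None:
        sections.append((start, len(motion)))  # the run reaches the last frame
    return sections


def select_representative(
    motion: List[float], reference: float
) -> Optional[Tuple[Section, int]]:
    """Return (representative section, representative frame index), or None."""
    sections = low_motion_sections(motion, reference)
    if not sections:
        return None

    def score(section: Section) -> float:
        start, end = section
        # Sum over the section of how far each frame's motion lies below the reference.
        return sum(reference - m for m in motion[start:end])

    best = max(sections, key=score)
    start, end = best
    # Representative frame: the lowest-motion frame inside the chosen section.
    rep_frame = min(range(start, end), key=lambda i: motion[i])
    return best, rep_frame


motion = [0.9, 0.2, 0.3, 0.8, 0.1, 0.05, 0.1, 0.2, 0.9]
print(select_representative(motion, reference=0.5))  # ((4, 8), 5)
```

Scoring by the summed gap rather than by section length favors sections that are both long and steady. A short, very still section can therefore still beat a longer section that only barely stays under the threshold.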

Another object of the present invention can be achieved by providing an artificial-intelligence-based travel video classification method comprising:
a travel video receiving step in which a travel video receiving module receives a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a motion information output step in which a motion information output module generates motion information for each of the plurality of travel video frames. The module includes an artificial neural network pre-trained to take a plurality of consecutive travel video frames as input and to output the motion information. The motion information includes either the magnitude of the acceleration of the starting point of the shooting direction vector or the angular speed of the shooting direction vector;
a representative section selection step in which a representative section selection module arranges the motion information of the travel video frames in frame order and selects at least one frame section with relatively low motion information. For each such section, the module sums the per-frame differences between the motion information and reference motion information. The section with the largest sum becomes the representative section of the travel video;
a representative frame selection step in which a representative frame selection module selects the frame with the lowest motion information within the representative section as the representative frame; and
a representative frame classification step in which a representative frame classification module outputs zone classification information for the travel video and the video information. The module includes:
- a CNN module that takes the representative frame as input and outputs class classification information (the classification of the objects in the representative frame) and object size information (the proportion of the representative frame that each object occupies);
- an artificial neural network module that takes the class classification information and the object size information as input and outputs the zone classification information (the classification of the theme of the zone containing the location information).
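The motion information is defined in terms of two quantities: the acceleration of the starting point of the shooting direction vector (the camera position) and the angular speed of the shooting direction vector. The source estimates these with a pre-trained neural network and does not say how its training targets are obtained. The sketch below assumes that camera directions and positions are available per frame, for example from device sensors or pose estimation, and computes the two quantities with finite differences. It shows one plausible way to produce such targets and is not the patent's method. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))


def angular_speeds(directions: List[Vec3], dt: float) -> List[float]:
    """Angular speed (rad per unit time) of the shooting direction vector
    between consecutive frames spaced `dt` apart."""
    speeds = []
    for a, b in zip(directions, directions[1:]):
        cos_angle = sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding drift
        speeds.append(math.acos(cos_angle) / dt)
    return speeds


def acceleration_magnitudes(positions: List[Vec3], dt: float) -> List[float]:
    """Acceleration magnitude of the camera position (start point of the
    direction vector), via the central second difference
    (p[t+1] - 2 p[t] + p[t-1]) / dt^2; one value per interior frame."""
    mags = []
    for p0, p1, p2 in zip(positions, positions[1:], positions[2:]):
        acc = [(c2 - 2.0 * c1 + c0) / (dt * dt) for c0, c1, c2 in zip(p0, p1, p2)]
        mags.append(_norm(acc))
    return mags


# A quarter turn between two frames 0.5 time units apart gives an angular speed of pi.
print(angular_speeds([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], dt=0.5))
```

Note that both functions produce one fewer (or two fewer) values than there are frames. A per-frame label would need a padding rule, which is also an assumption.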

Another object of the present invention can be achieved by providing an artificial-intelligence-based travel video classification apparatus comprising a memory module that stores travel video classification program code and a processing module that executes the program code. The program code is configured to perform, on a computer, steps including:
a travel video receiving step of receiving a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a motion information output step of inputting the travel video frames to a pre-trained artificial neural network, which generates motion information for each of the plurality of travel video frames. The network takes a plurality of consecutive travel video frames as input. The motion information includes either the magnitude of the acceleration of the starting point of the shooting direction vector or the angular speed of the shooting direction vector;
a representative section selection step of arranging the motion information of the travel video frames in frame order and selecting at least one frame section with relatively low motion information. For each such section, the per-frame differences between the motion information and reference motion information are summed. The section with the largest sum becomes the representative section of the travel video;
a representative frame selection step of selecting the frame with the lowest motion information within the representative section as the representative frame; and
a representative frame classification step that runs two networks in sequence:
- the representative frame is input to a CNN module, which outputs class classification information (the classification of the objects in the representative frame) and object size information (the proportion of the representative frame that each object occupies);
- the class classification information and the object size information are input to an artificial neural network module, which outputs zone classification information (the classification of the theme of the zone containing the location information) for the travel video and the video information.
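The hand-off from the CNN module to the zone-classification network can be sketched as follows. This is a minimal illustration under several assumptions:
- detections arrive as (class name, bounding box) pairs;
- "object size information" is the share of the frame area covered by each box;
- the classifier input is a fixed-length vector with one slot per class holding the summed area ratio of that class's objects.

The class vocabulary, function names, and the single linear-plus-softmax layer standing in for the artificial neural network module are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical class vocabulary for illustration; the patent does not list its classes.
CLASSES = ["beach", "building", "food", "mountain", "person"]

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def object_size_ratio(box: Box, frame_w: float, frame_h: float) -> float:
    """Fraction of the frame area covered by a bounding box, clipped to the frame."""
    x1, y1, x2, y2 = box
    w = max(0.0, min(x2, frame_w) - max(x1, 0.0))
    h = max(0.0, min(y2, frame_h) - max(y1, 0.0))
    return (w * h) / (frame_w * frame_h)


def zone_input_vector(
    detections: Sequence[Tuple[str, Box]], frame_w: float, frame_h: float
) -> List[float]:
    """One slot per class, holding the summed size ratio of that class's objects."""
    vec = [0.0] * len(CLASSES)
    for cls, box in detections:
        vec[CLASSES.index(cls)] += object_size_ratio(box, frame_w, frame_h)
    return vec


def zone_probabilities(
    vec: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]
) -> List[float]:
    """A single linear layer followed by softmax, standing in for the zone classifier."""
    logits = [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


vec = zone_input_vector([("beach", (0, 0, 50, 50)), ("person", (10, 10, 20, 30))], 100, 100)
print(vec)  # beach covers 25% of the frame, person about 2%
```

Encoding size alongside class lets the classifier tell apart a frame dominated by a beach from one where a beach is a small background element. The source presents this distinction as the point of outputting object size information.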

Another object of the present invention can be achieved by providing a travel video classification apparatus using panoramic image conversion, comprising:
a travel video receiving module that receives a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a representative frame selection module that selects a specific frame of the travel video as the representative frame;
a representative frame classification module that outputs first zone classification information for the travel video and the video information. The module includes a representative frame classification artificial neural network module, which takes the representative frame as input and outputs the first zone classification information (the classification of the theme of the zone containing the location information);
a feature extraction module that includes a CNN-based feature extraction artificial neural network. The network takes the representative frame as input and outputs feature class information (class probability) and feature coordinate information (coordinate data);
a core feature selection module that is connected to a specific convolution layer of the representative frame classification module and receives class activation information (an activation map for each class). Using the class activation information and the feature coordinate information, the module generates feature importance information for each class and selects the class with the largest feature importance as the core feature;
a candidate frame selection module that selects a plurality of candidate frames from among the frames before and after the representative frame chosen by the representative frame selection module. Each candidate frame contains the core feature; and
a frame matching module that generates a panoramic representative frame by stitching the representative frame and the candidate frames together with respect to the core feature.
The panoramic representative frame generated by the frame matching module is input to the representative frame classification module, which then outputs second zone classification information.
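Candidate frame selection can be sketched as follows. The source only says that frames containing the core feature are chosen from those before and after the representative frame. This sketch adds two assumptions:
- the search walks outward from the representative frame in each direction and stops at the first frame where the core feature is no longer detected, so the candidates stay contiguous and overlap enough to stitch;
- a hypothetical `max_offset` limits how far the search reaches.

The per-frame detection sets are assumed to come from the feature extraction module.

```python
from typing import List, Set


def select_candidate_frames(
    detected_classes: List[Set[str]], rep_index: int, core: str, max_offset: int
) -> List[int]:
    """Indices of frames around `rep_index` that still contain the core feature.

    `detected_classes[i]` is the set of class names detected in frame i
    (assumed to come from the feature extraction module)."""
    candidates: List[int] = []
    for step in (-1, 1):  # walk backward, then forward
        i = rep_index + step
        while 0 <= i < len(detected_classes) and abs(i - rep_index) <= max_offset:
            if core not in detected_classes[i]:
                break  # stop at the first frame where the core feature is lost
            candidates.append(i)
            i += step
    return sorted(candidates)


frames = [{"sea"}, {"tower"}, {"tower", "boat"}, {"tower"}, {"tower"}, {"sea"}, {"tower"}]
print(select_candidate_frames(frames, rep_index=2, core="tower", max_offset=3))  # [1, 3, 4]
```

Frame 6 also contains the core feature, but the sketch skips it because frame 5 breaks the run. This reflects the contiguity assumption above, not a rule stated in the source.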

The feature importance information may be computed in either of two ways:
- as the proportion of the heatmap of the class activation information that falls within the bounding box given by each class's feature coordinate information; or
- as the sum of the activation values of the class activation information within each class's bounding box.
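Both feature importance measures can be sketched as follows. This is an illustration under assumptions, not the patent's exact computation:
- the class activation information is computed as a standard class activation map (CAM), that is, a class-weighted sum of the feature maps of the connected convolution layer;
- bounding boxes are already expressed in CAM grid cells;
- a cell is "hot" (part of the heatmap) when its activation is at or above a hypothetical `threshold`.

The function names are hypothetical. Plain nested lists keep the example dependency-free.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in CAM grid cells, end-exclusive


def class_activation_map(feature_maps: List[Grid], class_weights: List[float]) -> Grid:
    """CAM for one class: weighted sum of the convolution layer's feature maps."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def heatmap_ratio_in_box(cam: Grid, box: Box, threshold: float) -> float:
    """Share of hot cells (activation >= threshold) that fall inside the box."""
    x1, y1, x2, y2 = box
    hot = [(y, x) for y, row in enumerate(cam) for x, v in enumerate(row) if v >= threshold]
    if not hot:
        return 0.0
    inside = sum(1 for y, x in hot if y1 <= y < y2 and x1 <= x < x2)
    return inside / len(hot)


def activation_sum_in_box(cam: Grid, box: Box) -> float:
    """Sum of CAM activation values inside the box."""
    x1, y1, x2, y2 = box
    return sum(cam[y][x] for y in range(y1, y2) for x in range(x1, x2))


def select_core_feature(
    cam: Grid, boxes: Dict[str, Box], threshold: Optional[float] = None
) -> str:
    """Class whose box has the highest importance: activation sum by default,
    heatmap-inclusion ratio when a threshold is given."""
    def importance(cls: str) -> float:
        if threshold is None:
            return activation_sum_in_box(cam, boxes[cls])
        return heatmap_ratio_in_box(cam, boxes[cls], threshold)

    return max(boxes, key=importance)


fmap = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
cam = class_activation_map([fmap], [2.0])
boxes = {"tower": (0, 0, 2, 2), "tree": (2, 2, 4, 4)}
print(select_core_feature(cam, boxes))  # tower
```

The two criteria can disagree. The activation sum favors large boxes over strongly activated regions, whereas the inclusion ratio rewards a box that captures most of the heatmap regardless of box size.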

Another object of the present invention can be achieved by providing a travel video classification method using panoramic image conversion, comprising:
a travel video receiving step in which a travel video receiving module receives a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a representative frame selection step in which a representative frame selection module selects a specific frame of the travel video as the representative frame;
a representative frame classification step in which a representative frame classification module outputs first zone classification information for the travel video and the video information. The module includes a representative frame classification artificial neural network module, which takes the representative frame as input and outputs the first zone classification information (the classification of the theme of the zone containing the location information);
a feature extraction step in which a feature extraction module outputs feature class information (class probability) and feature coordinate information (coordinate data). The module includes a CNN-based feature extraction artificial neural network that takes the representative frame as input;
a core feature selection step in which a core feature selection module, connected to a specific convolution layer of the representative frame classification module, receives class activation information (an activation map for each class). Using the class activation information and the feature coordinate information, the module generates feature importance information for each class and selects the class with the largest feature importance as the core feature;
a candidate frame selection step in which a candidate frame selection module selects a plurality of candidate frames from among the frames before and after the representative frame. Each candidate frame contains the core feature; and
a frame matching step in which a frame matching module generates a panoramic representative frame by stitching the representative frame and the candidate frames together with respect to the core feature.
The panoramic representative frame is input to the representative frame classification module, which then outputs second zone classification information.
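The frame matching step can be illustrated with a deliberately simplified sketch. It assumes pure horizontal translation between frames and aligns each frame by the x-coordinate of the core feature's bounding-box center. Overlapping pixels are averaged. Frames are grayscale images of equal height, stored as nested lists.

Production stitching typically matches many keypoints and estimates a homography, for example with OpenCV's stitching module; the sketch swaps in this much cruder alignment only to show how the core feature can serve as the common reference. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale frame: rows of equal length


def horizontal_offsets(core_centers_x: List[float], ref_index: int) -> List[int]:
    """Canvas offset of each frame so that its core-feature center lines up with
    the representative frame's (translation-only assumption)."""
    ref = core_centers_x[ref_index]
    return [round(ref - cx) for cx in core_centers_x]


def stitch_horizontal(frames: List[Image], offsets: List[int]) -> Image:
    """Paste frames onto a shared canvas at the given offsets, averaging overlaps."""
    height = len(frames[0])
    left = min(offsets)
    right = max(off + len(frame[0]) for frame, off in zip(frames, offsets))
    width = right - left
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for frame, off in zip(frames, offsets):
        for y in range(height):
            for x, value in enumerate(frame[y]):
                cx = x + off - left  # canvas column for this pixel
                acc[y][cx] += value
                cnt[y][cx] += 1
    return [[a / c if c else 0.0 for a, c in zip(arow, crow)] for arow, crow in zip(acc, cnt)]


# Core feature at x=3 in the representative frame and at x=1 in a later frame.
offsets = horizontal_offsets([3.0, 1.0], ref_index=0)
panorama = stitch_horizontal([[[1.0] * 4], [[3.0] * 4]], offsets)
print(offsets, panorama)  # [0, 2] [[1.0, 1.0, 2.0, 2.0, 3.0, 3.0]]
```

The resulting panorama is wider than any single frame. This matches the role the source gives the panoramic representative frame: showing the classifier more of the scene around the core feature before the second zone classification.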

Another object of the present invention can be achieved by providing a travel video classification apparatus using panoramic image conversion. The apparatus comprises a memory module that stores travel video classification program code and a processing module that executes the program code. The program code is configured to perform, on a computer, steps including:
a travel video receiving step of receiving a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a representative frame selection step of selecting a specific frame of the travel video as the representative frame;
a representative frame classification step in which a representative frame classification artificial neural network module outputs first zone classification information for the travel video and the video information. The module takes the representative frame as input; the first zone classification information is the classification of the theme of the zone containing the location information;
a feature extraction step in which a CNN-based feature extraction artificial neural network takes the representative frame as input and outputs feature class information (class probability) and feature coordinate information (coordinate data);
a core feature selection step of receiving class activation information (an activation map for each class) from a specific convolution layer of the representative frame classification artificial neural network module. Using the class activation information and the feature coordinate information, feature importance information is generated for each class, and the class with the largest feature importance is selected as the core feature;
a candidate frame selection step of selecting a plurality of candidate frames from among the frames before and after the representative frame. Each candidate frame contains the core feature; and
a frame matching step of generating a panoramic representative frame by stitching the representative frame and the candidate frames together with respect to the core feature.
The panoramic representative frame is input to the representative frame classification module, which then outputs second zone classification information.

Another object of the present invention can be achieved by providing a program stored on a recording medium. The program is configured to perform, on a computer, a travel video classification method using panoramic image conversion, the method comprising:
a travel video receiving step in which a travel video receiving module receives a travel video and its video information. The travel video is generated by a creator client and consists of a plurality of time-series travel video frames; the video information includes location information of the travel video;
a representative frame selection step in which a representative frame selection module selects a specific frame of the travel video as the representative frame;
a representative frame classification step in which a representative frame classification module outputs first zone classification information for the travel video and the video information. The module includes a representative frame classification artificial neural network module, which takes the representative frame as input and outputs the first zone classification information (the classification of the theme of the zone containing the location information);
a feature extraction step in which a feature extraction module outputs feature class information (class probability) and feature coordinate information (coordinate data). The module includes a CNN-based feature extraction artificial neural network that takes the representative frame as input;
a core feature selection step in which a core feature selection module, connected to a specific convolution layer of the representative frame classification module, receives class activation information (an activation map for each class). Using the class activation information and the feature coordinate information, the module generates feature importance information for each class and selects the class with the largest feature importance as the core feature;
a candidate frame selection step in which a candidate frame selection module selects a plurality of candidate frames from among the frames before and after the representative frame. Each candidate frame contains the core feature; and
a frame matching step in which a frame matching module generates a panoramic representative frame by stitching the representative frame and the candidate frames together with respect to the core feature.
The panoramic representative frame is input to the representative frame classification module, which then outputs second zone classification information.

As described above, the present invention has the following effects.

First, according to an embodiment of the present invention, travel images can be classified into zones defined according to user needs.

Second, according to an embodiment of the present invention, the creator who uploaded a travel image can be given a different reward depending on the zone classification information of that travel image.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further convey the technical idea of the present invention; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a schematic diagram showing an artificial intelligence-based travel image classification device according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing the operational relationship of the artificial intelligence-based travel image classification device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram showing the operational relationship of the motion information output module 11 according to an embodiment of the present invention;
FIG. 4 is a schematic diagram showing the motion information output module 11 including a reinforcement learning module according to another embodiment of the present invention;
FIG. 5 is a schematic diagram showing the selection of a representative section by the representative section selection module 12;
FIG. 6 is a schematic diagram showing the configuration of the representative frame classification module 14 according to an embodiment of the present invention;
FIG. 7 is a schematic diagram showing an n-th CNN module according to an embodiment of the present invention;
FIG. 8 is a schematic diagram showing a classification artificial neural network module according to an embodiment of the present invention;
FIG. 9 is a schematic diagram in which the zone classification information output by the zone classification artificial neural network module is mapped onto a map of a travel region according to an embodiment of the present invention;
FIG. 10 is a schematic diagram showing a travel image classification device using panoramic image conversion of images, according to a modified example of the present invention;
FIG. 11 is a schematic diagram showing the operational relationship of the travel image classification device using panoramic image conversion of images, according to a modified example of the present invention;
FIG. 12 is a schematic diagram showing the structure of the feature extraction module 21, a component of the travel image classification device according to a modified example of the present invention;
FIG. 13 is a schematic diagram showing the structure of the feature extraction artificial neural network module according to a modified example of the present invention when it is implemented with YOLO v1 (CVPR 2016);
FIG. 14 is a schematic diagram showing the operational relationship of the core feature selection module 22 according to a modified example of the present invention;
FIG. 15 is a schematic diagram showing the operational relationship of the core feature selection module 22 according to a second modified example of the present invention;
FIG. 16 is a schematic diagram showing the candidate frame selection module 23 according to a modified example of the present invention.

Hereinafter, embodiments that a person of ordinary skill in the art can easily practice are described in detail with reference to the accompanying drawings. However, in describing the operating principle of the preferred embodiments in detail, a detailed description of a related known function or configuration is omitted where it could unnecessarily obscure the subject matter of the present invention.

In addition, the same reference numerals are used throughout the drawings for parts having similar functions and actions. Throughout the specification, when a part is said to be connected to another part, this includes not only the case where they are directly connected but also the case where they are indirectly connected with another element interposed between them. Also, saying that a part includes a certain component means, unless stated otherwise, that it may further include other components, not that it excludes them.

In the following description, a convolutional neural network (a neural network built on convolution operations) may be referred to as a CNN, ConvNet, or the like.

For convenience, the following description uses hotel reservations as the example, but the scope of the present invention is not limited to hotel reservations and may cover all travel products, such as bed-and-breakfasts, hostels, motels, hotel time commerce, activities, package tours, guides, and rental cars.

Hereinafter, a travel image may refer to audiovisual (video) information, in any of various compression formats and codecs, captured in a specific travel region.

AI-based travel image classification device

FIG. 1 is a schematic diagram showing an AI-based travel image classification device according to an embodiment of the present invention. As shown in FIG. 1, the AI-based travel image classification device 1 according to an embodiment may include a travel image reception module 10, a motion information output module 11, a representative section selection module 12, a representative frame selection module 13, and a representative frame classification module 14. FIG. 2 is a schematic diagram showing the operational relationship of the AI-based travel image classification device according to an embodiment. As shown in FIG. 2, an application module on the client of the creator (creator client 100), that is, the user (travel creator) who produces travel images, generates a travel image and transmits it to the travel image reception module 10 of the AI-based travel image classification device 1, and the travel image reception module 10 transmits each frame of the received travel image (a travel image frame) to the motion information output module 11. The motion information output module 11 generates motion information based on a plurality of consecutive travel image frames, and the representative section selection module 12 divides the consecutive travel image frames into a plurality of sections based on the per-frame motion information and selects a representative section among them, generating representative section information. The representative frame selection module 13 selects a specific travel image frame within the representative section as the representative frame and generates representative frame information, and the representative frame classification module 14 may be configured to receive the representative frame corresponding to the representative frame information and output zone classification information. Each component is described in detail below, with examples.

The travel image reception module 10 receives the travel image and image information sent by the application module of the creator client 100. A travel image according to an embodiment of the present invention consists of a plurality of travel image frames in time series, and the image information may include creator information, location information, time information, audio format information, audio sample rate (e.g., in hertz), the number of audio stream channels, file name, size, total number of frames, frames per second, image height (e.g., in pixels), image width (e.g., in pixels), image type, compression codec, and the like. A travel image according to an embodiment may be a video shot by the travel creator (the creator of the travel image) with the camera of the creator client 100, or a video previously stored on the creator client 100. According to another embodiment, a server or database that has received the travel image from the creator client 100 may transmit the travel image and image information to the travel image reception module 10.

The motion information output module 11 receives travel image frames in temporal order and includes an artificial neural network pretrained to output motion information from two consecutive travel image frames given as input data. FIG. 3 is a schematic diagram showing the operational relationship of the motion information output module 11 according to an embodiment. As shown in FIG. 3, the motion information output module 11 may include an artificial neural network module that takes two consecutive travel image frames as input data and outputs motion information as output data, producing motion information in frame order. Using two frames as input is only one embodiment; the scope of the present invention includes using a larger number of travel image frames as input data. As shown in FIG. 3, in the motion information output module 11 according to an embodiment, travel image frames t-3 and t-2 are input to produce motion information t-2, travel image frames t-2 and t-1 are input to produce motion information t-1, and travel image frames t-1 and t are input to produce motion information t.
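The sliding pairing described above (frames t-1 and t produce motion information t) can be sketched as follows. This is an illustration only: `motion_sequence` and `motion_model` are hypothetical names, and `motion_model` stands in for the pretrained network, which the text does not specify. Under this pairing the first frame has no motion value of its own.

```python
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # any frame representation, e.g. a NumPy array


def motion_sequence(
    frames: Sequence[Frame],
    motion_model: Callable[[Frame, Frame], float],
) -> Iterator[Tuple[int, float]]:
    """Yield (t, motion_t) for t = 1 .. len(frames) - 1.

    motion_t is predicted from the consecutive pair (frame t-1, frame t);
    motion_model is a hypothetical stand-in for the pretrained network.
    """
    for t in range(1, len(frames)):
        yield t, motion_model(frames[t - 1], frames[t])


# Usage with scalar "frames" and a trivial stand-in model.
stand_in = lambda prev, cur: abs(cur - prev)
print(list(motion_sequence([0, 1, 3, 6], stand_in)))  # [(1, 1), (2, 2), (3, 3)]
```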

Motion information may mean the magnitude of the acceleration (a scalar) of the capturing camera (according to an embodiment, the camera of the creator client, which is the origin of the shooting direction vector), the angular speed (a scalar) of the shooting direction vector, or a value combining the two. In the training session of the motion information output module 11, the comparison data, i.e., the reference motion information used to compute the loss, may be calculated from the sensing values of the acceleration sensor and the gyro sensor of the creator client 100. For example, the motion information output module 11 may calculate the magnitude of the camera's acceleration as the magnitude of the acceleration vector output by the acceleration sensor of the creator client 100, that is, by squaring the sensor's x-, y-, and z-axis outputs, summing them, and taking the square root, where $x$, $y$, and $z$ are the displacements along the respective axes:

$$\lVert \mathbf{a} \rVert = \sqrt{\left(\frac{d^2x}{dt^2}\right)^2 + \left(\frac{d^2y}{dt^2}\right)^2 + \left(\frac{d^2z}{dt^2}\right)^2}$$

Likewise, the motion information output module 11 may calculate the angular speed of the shooting direction vector as the magnitude of the angular velocity vector output by the gyro sensor of the creator client 100, where $\theta_x$, $\theta_y$, and $\theta_z$ are the rotation angles about the x-, y-, and z-axes:

$$\lVert \boldsymbol{\omega} \rVert = \sqrt{\left(\frac{d\theta_x}{dt}\right)^2 + \left(\frac{d\theta_y}{dt}\right)^2 + \left(\frac{d\theta_z}{dt}\right)^2}$$
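The reference (comparison) motion value for one frame can be computed from raw sensor triples as a square root of summed squares. This is a minimal sketch: the text says only that the two magnitudes may be "combined", so the weighted sum in `reference_motion` and its weights are assumptions. Raw accelerometer readings usually include gravity; the text does not say whether it is removed first.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def magnitude(v: Vec3) -> float:
    """Euclidean norm sqrt(vx^2 + vy^2 + vz^2)."""
    return math.sqrt(sum(c * c for c in v))


def reference_motion(accel: Vec3, gyro: Vec3,
                     w_accel: float = 1.0, w_gyro: float = 1.0) -> float:
    """Reference motion value for one frame.

    accel: accelerometer output (d2x/dt2, d2y/dt2, d2z/dt2)
    gyro:  gyroscope output (dθx/dt, dθy/dt, dθz/dt)
    The weighted sum is an assumed way to combine the two magnitudes;
    set w_gyro=0 (or w_accel=0) to use a single magnitude only.
    """
    return w_accel * magnitude(accel) + w_gyro * magnitude(gyro)


print(magnitude((3.0, 4.0, 0.0)))                          # 5.0
print(reference_motion((3.0, 4.0, 0.0), (0.0, 0.0, 2.0)))  # 7.0
```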

In the training session of the artificial neural network of the motion information output module 11, the module receives a plurality of consecutive travel image frames as input data and outputs motion information for a specific frame among them as output data. Its parameters, such as weights, may then be updated in the direction that reduces the loss of the output data relative to the comparison data, where the loss is computed by a loss function based on the difference between the output data and the comparison data (the motion information precomputed for that frame). The loss function of the motion information output module 11 according to an embodiment may be a cross-entropy loss function, a contrastive loss function, a center loss function, or the like.

In the inference session of the artificial neural network of the motion information output module 11, the module may receive a plurality of consecutive travel image frames as input data and output motion information for a specific frame among them as output data.

Accordingly, accelerometer and gyro sensor outputs no longer need to be received from the creator client 100 for every frame, which simplifies the application module and reduces the amount of information to be transmitted, received, and processed.

According to another embodiment of the present invention, the motion information output module 11 may include a reinforcement learning module. FIG. 4 is a schematic diagram showing the motion information output module 11 including a reinforcement learning module according to another embodiment. As shown in FIG. 4, in this reinforcement learning module the travel image is the Environment and the motion information output module 11 is the Agent; the consecutive travel image frames given as input data form the State, and the motion information (output data) that the Agent outputs for a specific frame in that State is the Action. The smaller the difference between the output data and the comparison data, the higher the Reward generated, and the Reward is used to update the Agent (the motion information output module 11).
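The text requires only that the Reward grow as the gap between the Agent's output and the sensor-derived comparison data shrinks. One possible reward with that property is sketched below; the `1 / (1 + error / scale)` shape and the `scale` parameter are illustrative assumptions, not the patent's formula.

```python
def motion_reward(predicted: float, reference: float, scale: float = 1.0) -> float:
    """Reward in (0, 1]: 1.0 when the predicted motion equals the reference,
    decreasing monotonically as the absolute error grows.

    The functional form and `scale` are assumptions for illustration.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    error = abs(predicted - reference)
    return 1.0 / (1.0 + error / scale)


print(motion_reward(2.0, 2.0))  # 1.0
print(motion_reward(1.0, 3.0))  # 1/3
```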

Based on the motion information generated by the motion information output module 11 for each travel image frame, the representative section selection module 12 selects at least one frame section consisting of consecutive travel image frames that show relatively low camera motion, and then selects from those sections a representative section that represents the travel image. FIG. 5 is a schematic diagram showing the selection of a representative section by the representative section selection module 12. As shown in FIG. 5, when the motion information for each travel image frame is arranged in frame order, a plurality of frame sections may be selected, and a representative section may be selected from among them. The specific method by which the representative section selection module 12 according to an embodiment selects the representative section is as follows.

(1) Select at least one frame section consisting of a plurality of consecutive frames whose motion information is equal to or less than a specific motion value (reference motion information).

(2) If only one frame section is selected, select that frame section as the representative section.

(3) If a plurality of frame sections are selected, select as the representative section the frame section with the largest sum, over all of its frames, of the difference between each frame's motion information and the reference motion information.
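Steps (1) to (3) can be sketched as below. This is a minimal sketch under stated assumptions: motion values are plain floats in frame order, "a plurality of consecutive frames" is read as at least `min_len` frames (default 2), and ties go to the earliest section. Because every frame in a section satisfies motion <= reference, each difference is non-negative, so the score favors sections that are both long and still. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Section = Tuple[int, int]  # (start, end) frame indices, end exclusive


def low_motion_sections(motion: Sequence[float], reference: float,
                        min_len: int = 2) -> List[Section]:
    """Step (1): runs of at least min_len consecutive frames with motion <= reference."""
    sections: List[Section] = []
    start: Optional[int] = None
    for i, m in enumerate(motion):
        if m <= reference:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                sections.append((start, i))
            start = None
    if start is not None and len(motion) - start >= min_len:
        sections.append((start, len(motion)))
    return sections


def select_representative_section(motion: Sequence[float], reference: float,
                                  min_len: int = 2) -> Optional[Section]:
    """Steps (2)-(3): pick the section maximizing the sum of (reference - motion)."""
    sections = low_motion_sections(motion, reference, min_len)
    if not sections:
        return None

    def score(s: Section) -> float:
        return sum(reference - motion[i] for i in range(s[0], s[1]))

    # With a single section, max() simply returns it, which covers step (2).
    return max(sections, key=score)


motion = [5.0, 1.0, 1.0, 5.0, 0.5, 0.5, 0.5, 5.0]
print(select_representative_section(motion, reference=2.0))  # (4, 7)
```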

As a result, a frame section containing the main context of a travel image produced by a travel creator can be extracted automatically with an algorithm of low computational load.

The representative frame selection module 13 selects a representative frame within the representative section selected by the representative section selection module 12. According to an embodiment, the frame with the lowest motion information within the representative section may be selected as the representative frame, and representative frame information may be generated.
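Picking the frame with the lowest motion value inside a section is an argmin over that index range. The sketch below assumes the section is given as `(start, end)` indices with `end` exclusive; the function name is hypothetical, and ties resolve to the earliest frame.

```python
from typing import Sequence


def select_representative_frame(motion: Sequence[float], start: int, end: int) -> int:
    """Index of the frame with the lowest motion value in motion[start:end] (end exclusive)."""
    if not 0 <= start < end <= len(motion):
        raise ValueError("invalid section bounds")
    # min() returns the first index on ties, i.e. the earliest still frame.
    return min(range(start, end), key=lambda i: motion[i])


print(select_representative_frame([5.0, 0.4, 0.2, 0.3, 5.0], 1, 4))  # 2
```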

The representative section selection module 12 and the representative frame selection module 13 prevent a frame from being chosen as the representative frame merely because the camera stopped momentarily. They also allow a representative frame containing the main context of a travel image produced by a travel creator to be extracted automatically with an algorithm of low computational load. Previously, the creator had no choice but to extract a frame representing the context of the travel image manually.

The representative frame classification module 14 may be an artificial neural network module that takes the representative frame selected by the representative frame selection module 13 as input data and outputs zone classification information as output data. Zone classification information is classification information on the theme of the zone in which the image was captured, for example a play zone, drinking zone, eating zone, viewing zone, or shopping zone. According to an embodiment of the present invention, the play zone may be an area containing clubs, beaches, swimming pools, and the like; the drinking zone, an area containing wine bars, cocktail bars, beer pubs, bottle shops, and the like; the eating zone, an area containing various restaurants, popular local eateries, or street food; the viewing zone, an area containing tourist attractions or natural scenery; and the shopping zone, an area containing luxury stores, department stores, outlets, duty-free shops, souvenir shops, street shopping, and the like.

Regarding the configuration of the representative frame classification module 14, FIG. 6 is a schematic diagram showing the configuration of the representative frame classification module 14 according to an embodiment of the present invention. As shown in FIG. 6, the representative frame classification module 14 according to an embodiment may include a plurality of CNN modules and a zone classification artificial neural network module.

The plurality of CNN modules of the representative frame classification module 14 are trained with different training data or configured as networks with different parameters. FIG. 7 is a schematic diagram showing an n-th CNN module according to an embodiment of the present invention. As shown in FIG. 7, the representative frame is fed as source data into the input layer of the n-th CNN module; for example, the input may be a [32x32xn] matrix with a width of 32, a height of 32, and a depth of n channels. The CONV layer is connected through a convolution filter to local regions of the representative frame (the source data) and computes the dot product between each connected region and the filter weights; for example, the output volume of the CONV layer may be [32x32x12]. An activation function such as a ReLU layer is then applied. ReLU is the element-wise activation function max(0, x); it does not change the size of the volume (e.g., it remains [32x32x12]) and produces an activation map. The POOL (pooling) layer downsamples along the spatial (width, height) dimensions and outputs a reduced volume (activation map), for example [16x16x12]. After the FC (fully connected) layer connected to the n-th activation map, class scores are computed, and an output layer containing the n-th classification information, for example of size [m x m x 1], is output. Classification information according to an embodiment of the present invention may mean classification information on objects included in the representative frame, and each CNN module may be configured to output at least one piece of classification information.
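The volume sizes above can be reproduced with a small NumPy forward pass. This is an untrained, illustrative sketch, not the patent's network: it assumes a 3-channel input (the text's n channels), twelve 3x3 filters with stride 1 and zero padding (which keeps 32x32), 2x2 max pooling, and a hypothetical 10-class FC layer producing a flat score vector rather than the [m x m x 1] output mentioned in the text. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stride-1 convolution with zero padding so width and height are preserved.

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k; b: (C_out,).
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    # windows[r, s, c, i, j] == xp[r + i, s + j, c]; shape (H, W, C_in, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(0, 1))
    # Dot product of every local region with every filter.
    return np.einsum("rscij,ijco->rso", windows, w) + b


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)  # element-wise, shape unchanged


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    h, w, c = x.shape  # assumes even h and w
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))                          # stand-in representative frame
conv = conv2d_same(frame, rng.normal(0, 0.1, (3, 3, 3, 12)), np.zeros(12))
act = relu(conv)
pooled = max_pool_2x2(act)
scores = rng.normal(0, 0.01, (10, pooled.size)) @ pooled.reshape(-1)  # FC layer
print(conv.shape, act.shape, pooled.shape, scores.shape)
# (32, 32, 12) (32, 32, 12) (16, 16, 12) (10,)
```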

The classification information of the plurality of CNN modules according to an embodiment of the present invention may be a tensor containing information on the class into which a detected object is classified (class classification information) and information on the proportion of the input representative frame that the object occupies (object size information), and the loss function may include a class loss function and an object size loss function. The loss function of a CNN module is used to update the weights of the artificial neural network during the CNN module's training session; the weights may be updated so as to minimize the loss function (to a global or local minimum). The class loss function may mean the difference between the class classification information contained in the classification information output by the CNN module for the input representative frame and the ground truth (comparison data), i.e., the class of the actual object contained in the representative frame. The object size loss function may mean the difference between the object size information contained in that output and the ground truth (comparison data), i.e., the proportion of the representative frame that the actual object occupies. Class classification information according to an embodiment may, for example, classify objects such as sky, sea, swimming pool, food, drink, table, chair, car, bicycle, and person. Object size information according to an embodiment may be a ratio such as 30% or 4%.
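A per-object loss combining the two terms can be sketched as below. The text names a class loss and an object size loss but not their exact forms, so cross-entropy for the class term, squared error on the area ratio (expressed in [0, 1], e.g. 0.30 for 30%) for the size term, and the `size_weight` balance factor are all assumptions.

```python
import math
from typing import Sequence


def cnn_module_loss(class_logits: Sequence[float], true_class: int,
                    pred_size: float, true_size: float,
                    size_weight: float = 1.0) -> float:
    """Class loss (cross-entropy) plus weighted object size loss (squared error).

    Both loss forms and size_weight are illustrative assumptions.
    """
    m = max(class_logits)
    # Numerically stable log of the softmax normalizer.
    log_z = m + math.log(sum(math.exp(z - m) for z in class_logits))
    class_loss = log_z - class_logits[true_class]  # = -log softmax[true_class]
    size_loss = (pred_size - true_size) ** 2
    return class_loss + size_weight * size_loss


# Two equally scored classes and an exact size prediction: loss = ln 2.
print(cnn_module_loss([0.0, 0.0], 0, 0.30, 0.30))
```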

In addition, the plurality of CNN modules according to an embodiment of the present invention may each be trained with different training data to classify different objects. Depending on how the training data and loss functions are configured, each CNN module may output the class of a different specific object together with the size of that object.

According to another embodiment of the present invention, a CNN architecture for multiple object detection, such as YOLO or R-CNN, may be used instead of the plurality of CNN modules. In that case, the output data tensor (class classification information and object size information) and the loss functions (class loss function and object size loss function) of the plurality of CNN modules of the embodiment described above may be applied mutatis mutandis.

The zone classification artificial neural network module of the representative frame classification module 14 may be an artificial neural network that takes as input data the plurality of pieces of classification information (class classification information and object size information) for the plurality of objects output by the plurality of CNN modules, and outputs zone classification information as output data. FIG. 8 is a schematic diagram showing the zone classification artificial neural network module according to an embodiment of the present invention. As shown in FIG. 8, the network takes the classification information output by the plurality of CNN modules (first classification information, second classification information, ..., n-th classification information) as input data and outputs zone classification information.

In the training session of the zone classification artificial neural network module according to an embodiment of the present invention, the classification information output by the plurality of CNN modules for an input representative frame is fed to the zone classification module as input data. The weights of the module's hidden layers may then be updated by a learning method such as back propagation, based on the difference between the zone classification information output by the module and the actual zone classification information of the corresponding travel image (comparison data, ground truth).
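A minimal sketch of one training update, assuming a softmax output over zones and cross-entropy as the measure of the difference from the ground truth. For brevity the network is reduced to a single linear layer with no hidden layer and no bias, so `forward` and `train_step` are hypothetical simplifications; in a deeper network, back propagation would carry the same output-layer error term `probs[z] - target` further back into the hidden layers.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    """weights[z][i] is the weight from input feature i to zone z (bias omitted)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train_step(weights, x, true_zone, lr=0.1):
    """One gradient-descent update on softmax cross-entropy; returns the loss before the update."""
    probs = forward(weights, x)
    loss = -math.log(probs[true_zone])
    for z, row in enumerate(weights):
        # dLoss/dlogit_z = p_z - target_z for softmax followed by cross-entropy
        grad_logit = probs[z] - (1.0 if z == true_zone else 0.0)
        for i in range(len(row)):
            row[i] -= lr * grad_logit * x[i]
    return loss
```

Repeating `train_step` on a labeled example lowers its loss and raises the probability of the ground-truth zone.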

In the inference session of the zone classification artificial neural network module according to an embodiment of the present invention, the classification information output by the plurality of CNN modules for an input representative frame is fed to the zone classification module as input data, and the zone classification information it outputs may be used as the zone classification information of the corresponding travel image.

For example, when the plurality of CNN modules output classification information such as (food, 12%), (table, 30%), (person, 20%), and (spoon, 3%), and this classification information is input to the zone classification artificial neural network module, the module may be configured to output 'eating zone' as the zone classification information.
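One way to turn the variable-length list of (class, size) outputs in this example into the fixed-length input an artificial neural network needs is to reserve one slot per class. A minimal sketch, assuming a hypothetical class vocabulary drawn from the examples in the text and summing ratios when a class appears more than once:

```python
# Hypothetical class vocabulary built from the example classes mentioned in the text.
CLASSES = ["sky", "sea", "swimming pool", "food", "drink", "table", "chair",
           "car", "bicycle", "person", "spoon"]


def encode(detections, classes=CLASSES):
    """Turn (class, percent) pairs from the CNN modules into a fixed-length input vector.

    Each slot holds the object's share of the frame as a 0..1 ratio; classes that were
    not detected stay 0. Repeated classes have their ratios summed (an assumption).
    """
    index = {name: i for i, name in enumerate(classes)}
    vec = [0.0] * len(classes)
    for name, percent in detections:
        if name in index:  # classes outside the vocabulary are ignored
            vec[index[name]] += percent / 100.0
    return vec
```

For the example above, `encode([("food", 12), ("table", 30), ("person", 20), ("spoon", 3)])` yields 0.12, 0.30, 0.20, and 0.03 in the food, table, person, and spoon slots and zeros elsewhere; a vector of this kind is what a zone classification network could map to a label such as 'eating zone'.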

Accordingly, compared with a structure consisting of a single CNN module that receives a representative frame and outputs zone classification information directly, optimization becomes easier even with a small amount of training data or with training data that is severely imbalanced across classes. Moreover, because the images or videos that can belong to each zone are extremely diverse, a CNN architecture alone has great difficulty achieving high classification performance. According to an embodiment of the present invention, this problem is addressed by having the artificial neural network module reclassify the classification information output by the CNN modules.

Regarding the use of zone classification information, FIG. 9 is a schematic diagram in which the zone classification information output by the zone classification artificial neural network module according to an embodiment of the present invention is mapped onto a map of a travel area. As shown in FIG. 9, the zone classification information is mapped onto the map of a specific travel area based on the location information included in the image information, and the map information of the travel area, including the zone classification information, is displayed on the user client on which the corresponding application module is installed. Based on this map information, the user can travel to a location to enjoy a desired theme, or obtain information about zones related to the desired theme online through the application module.

[Modified Example - Travel Image Classification Device Using Panoramic Image Conversion of Images]

Regarding the travel image classification device using panoramic image conversion of images according to a modified example of the present invention, FIG. 10 is a schematic diagram showing the device, and FIG. 11 is a schematic diagram showing its operational relationships. As shown in FIGS. 10 and 11, the device according to the modified example further includes, in addition to the artificial intelligence-based travel image classification device 1 according to an embodiment of the present invention, a feature extraction module 21, a core feature selection module 22, a candidate frame selection module 23, and a frame matching module 24. After the representative frame selection module 13 selects a representative frame, the feature extraction module 21, the core feature selection module 22, the candidate frame selection module 23, and the frame matching module 24 convert the representative frame into a panoramic image, producing a panoramic representative frame, and the generated panoramic representative frame is input to the representative frame classification module 14.

Regarding the feature extraction module 21, FIG. 12 is a schematic diagram showing the structure of the feature extraction module 21, a component of the travel image classification device according to a modified example of the present invention. As shown in FIG. 12, the feature extraction module 21 includes a feature extraction artificial neural network module that extracts object features from the representative frame selected by the representative frame selection module 13. The feature extraction artificial neural network module may be a ConvNet-based artificial neural network that takes a representative frame or a candidate frame as input data and outputs feature class information (class probability) and feature coordinate information (coordinate data) as output data.

The feature extraction artificial neural network module according to an embodiment of the present invention may include a multi-object detection artificial neural network module that performs both classification and localization. As such a module, two-stage detectors such as RCNN (2013), OverFeat (ICLR 2014), Fast RCNN (ICCV 2015), Faster RCNN (NIPS 2015), and Mask RCNN (ICCV 2017); anchor-based one-stage detectors such as YOLO v1 (CVPR 2016), YOLO v2 (CVPR 2017), YOLO v3 (arXiv 2018), SSD (ECCV 2016), and RetinaNet (ICCV 2017); or non-anchor-based one-stage detectors such as CornerNet (ECCV 2018), ExtremeNet (2019), and CenterNet (2019) may be used.

The output data of the feature extraction artificial neural network module according to an embodiment of the present invention may include feature class information (a confidence score or class probability) indicating a specific class of at least one object and its reliability, and feature coordinate information (coordinate data) of that object. Depending on the configuration of the artificial neural network, the feature coordinate information may include the coordinates of the top-left and bottom-right corners of the bounding box, the coordinates of the center of the bounding box, and the width and height of the bounding box.
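The bounding-box representations listed above carry the same information and can be converted into one another. A small sketch of the conversion between the corner form and the center/width/height form, assuming pixel coordinates; the function names are illustrative:

```python
def corners_to_center(x1, y1, x2, y2):
    """(top-left x, top-left y, bottom-right x, bottom-right y) -> (center x, center y, width, height)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corners(cx, cy, w, h):
    """(center x, center y, width, height) -> (top-left x, top-left y, bottom-right x, bottom-right y)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

A box with corners (10, 20) and (50, 80) has center (30, 50), width 40, and height 60, and converting back recovers the original corners.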

FIG. 13 is a schematic diagram showing the structure of the feature extraction artificial neural network module according to a modified example of the present invention when it is implemented with YOLO v1 (CVPR 2016). For example, as shown in FIG. 13, when the module is implemented with YOLO v1, the DarkNet architecture is used: feature maps are extracted through convolution layers, and the coordinate data of the bounding boxes (feature coordinate information) and the class probability (class confidence, feature class information) are inferred directly through fully connected layers and output as output data. YOLO divides the input image (the representative frame or candidate frame) into an S×S grid and, for each grid cell, computes bounding boxes (S×S×B in total), a confidence score (Pr(Object) × IoU(prediction, ground truth)), and a class probability map (Pr(Class_i | Object)). As an example of a specific network structure, each bounding box predicted for a grid cell may be described by five values, namely four bounding box coordinates (feature coordinate information) and a confidence score (feature class information); the representative frame or candidate frame may be input at a size of 448×448×3; the activation map of the DarkNet architecture may have a size of 7×7×1024; and fully connected layers of size 4096 and 7×7×30 may follow the DarkNet architecture.
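The 7×7×30 output size follows from the YOLO v1 paper's setting of S = 7, B = 2 boxes per cell, and C = 20 classes (PASCAL VOC): each box carries five values and each cell carries one set of class probabilities. The sketch below checks that arithmetic and computes the class-specific confidence score Pr(Class_i | Object) × Pr(Object) × IoU(prediction, ground truth); the helper names are hypothetical.

```python
def yolo_v1_output_shape(S, B, C):
    """Each of the B boxes per cell has 5 values (x, y, w, h, confidence); C class probabilities per cell."""
    return (S, S, B * 5 + C)


def iou(a, b):
    """Intersection over union of two corner-format boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_specific_confidence(p_class_given_obj, p_obj, iou_pred_truth):
    """Pr(Class_i | Object) * Pr(Object) * IoU(pred, truth), as used at test time in YOLO v1."""
    return p_class_given_obj * p_obj * iou_pred_truth
```

With S = 7, B = 2, and C = 20, `yolo_v1_output_shape` returns (7, 7, 30), matching the final fully connected layer described above.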

Regarding the core feature selection module 22, FIG. 14 is a schematic diagram showing the operational relationship of the core feature selection module 22 according to a modified example of the present invention. As shown in FIG. 14, the core feature selection module 22 is connected to the last convolution layer (the convolution layer preceding the fully connected layer) of the n-th CNN module of the representative frame classification module 14. When the input data of the representative frame classification module 14 is the representative frame, the core feature selection module 22 receives activation information (including the activation value at each coordinate) comprising the activation map of one or more dimensions output by that convolution layer. It also receives the coordinate data (feature coordinate information) of each class that the feature extraction module 21 outputs when the representative frame is input to it. The module adjusts either the coordinate data of each class received from the feature extraction module 21 or the coordinates of the activation information according to the size of the representative frame and the size of the activation map, and then outputs, as feature importance information for each class, either the proportion of the hitmap of the activation map of the representative frame classification module 14 that falls within the bounding box defined by that class's coordinate data (activation ratio) or the sum of the activation weights (activation values) within that bounding box. The class with the largest feature importance information is selected as the core feature.
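A minimal sketch of the activation-sum variant of the feature importance computation, assuming the activation map is a 2-D list of values, bounding boxes are given in frame pixel coordinates as (x1, y1, x2, y2), and a map cell counts as inside a box when its center does. The function names and the cell-center rule are assumptions for illustration; the activation-ratio variant described above is not shown.

```python
def scale_box(box, frame_size, map_size):
    """Map a box from frame pixel coordinates to activation-map cell coordinates."""
    (fw, fh), (mw, mh) = frame_size, map_size
    x1, y1, x2, y2 = box
    return (x1 * mw / fw, y1 * mh / fh, x2 * mw / fw, y2 * mh / fh)


def feature_importance(act_map, box, frame_size):
    """Sum of activation values whose cell centers fall inside the rescaled bounding box."""
    mh, mw = len(act_map), len(act_map[0])
    x1, y1, x2, y2 = scale_box(box, frame_size, (mw, mh))
    total = 0.0
    for r in range(mh):
        for c in range(mw):
            if x1 <= c + 0.5 <= x2 and y1 <= r + 0.5 <= y2:  # cell-center test (an assumption)
                total += act_map[r][c]
    return total


def select_core_feature(act_map, boxes, frame_size):
    """boxes maps each class name to its (x1, y1, x2, y2) box from the feature extraction module."""
    scores = {name: feature_importance(act_map, b, frame_size) for name, b in boxes.items()}
    return max(scores, key=scores.get), scores
```

Because the score is a plain sum, a class whose box covers the strongly activated region of the map wins even if another class's box is larger but covers weak activations.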

Regarding a second modified example of the core feature selection module 22, FIG. 15 is a schematic diagram showing the operational relationship of the core feature selection module 22 according to the second modified example of the present invention. As shown in FIG. 15, according to the second modified example, the representative frame classification module 14 is an artificial neural network consisting of a single convolutional network that outputs zone classification information as output data, and a class activation generation module 220 that reduces dimensionality is further included between the core feature selection module 22 and the representative frame classification module 14. The class activation generation module 220 of the second modified example is connected to the last convolution layer (the convolution layer preceding the fully connected layer) of the representative frame classification module 14. When the input data of the representative frame classification module 14 is the representative frame, it receives activation information (including the activation value at each coordinate) comprising the activation map of one or more dimensions output by that convolution layer, and outputs class activation information (including the activation value at each coordinate for each class) comprising an activation map corresponding to each class. It may be trained jointly with the representative frame classification module 14 so that zone classification information is output through a sigmoid function. The core feature selection module 22 receives this class activation information together with the coordinate data (feature coordinate information) of each class that the feature extraction module 21 outputs when the representative frame is input to it. It adjusts the coordinates of the feature coordinate information or of the class activation information according to the size of the representative frame and the size of the class activation map, and outputs, as feature importance information for each class, either the proportion of the hitmap of the class activation map of the representative frame classification module 14 that falls within the bounding box defined by that class's coordinate data (activation ratio) or the sum of the activation weights (activation values) within that bounding box. The class with the largest feature importance information is selected as the core feature.

The class activation generation module 220 may generate the class activation information as follows:

$$M_c(x, y) = \sum_k w_k^c \, f_k(x, y) \qquad \text{(Equation 1)}$$

In Equation 1 above, $M_c(x, y)$ is the activation value at position (x, y) that contributes to classification into class c, $w_k^c$ is the weight of the k-th channel of the activation map for class c, and $f_k(x, y)$ is the activation value at position (x, y) in the k-th channel of the activation map.
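Equation 1 has the same form as class activation mapping (CAM): a per-class weighted sum of the channels of the last convolutional activation map. A minimal sketch in plain Python, assuming the activation map is stored channel-first as `feature_maps[k][y][x]` and the weights as `class_weights[c][k]` (both layouts, and the function name, are assumptions):

```python
def class_activation_map(feature_maps, class_weights, c):
    """Equation 1: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps[k][y][x] holds f_k(x, y); class_weights[c][k] holds w_k^c.
    Returns M_c as a list of rows indexed [y][x].
    """
    K = len(feature_maps)
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(class_weights[c][k] * feature_maps[k][y][x] for k in range(K))
             for x in range(W)]
            for y in range(H)]
```

Each class c gets its own map, so the same activation map can highlight different regions depending on which class's weights are applied.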

According to the second modified example of the present invention, feature importance information for each class is output using a class activation map generated separately for each class, rather than an activation map covering all classes. As a result, an object that matters more for inferring the zone classification information can be selected as the core feature. In addition, because everything from the representative frame classification module 14 through the class activation generation module 220 can be implemented as a single artificial neural network module, the efficiency of training and inference improves.

Regarding the candidate frame selection module 23, FIG. 16 is a schematic diagram showing the candidate frame selection module 23 according to a modified example of the present invention. As shown in FIG. 16, the candidate frame selection module 23 selects, as candidate frames, a plurality of frames containing the core feature from among the frames before and after the representative frame selected by the representative frame selection module 13. The candidate frame selection module 23 inputs the preceding and following frames to the feature extraction module 21 and sequentially selects as candidate frames those frames whose output feature class information has a confidence score (class probability) at or above a specific level for the same class as the core feature. For example, in FIG. 15, when frames with a confidence score of 0.7 or higher for the core feature's class are selected as candidate frames, the frames lying between the representative frame and the frame whose confidence score is 0.65 may be selected in sequence as candidate frames.
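The sequential selection can be read as walking outward from the representative frame in both directions and stopping at the first frame whose confidence score for the core-feature class falls below the threshold. A minimal sketch under that reading, with a hypothetical function name and per-frame scores assumed to be precomputed by the feature extraction module:

```python
def select_candidate_frames(scores, rep_index, threshold=0.7):
    """Return indices of candidate frames around the representative frame.

    scores[i] is the confidence score for the core-feature class in frame i.
    In each direction, the walk stops at the first frame below the threshold.
    """
    before, after = [], []
    i = rep_index - 1
    while i >= 0 and scores[i] >= threshold:  # walk backward in time
        before.append(i)
        i -= 1
    i = rep_index + 1
    while i < len(scores) and scores[i] >= threshold:  # walk forward in time
        after.append(i)
        i += 1
    return sorted(before) + after
```

With scores `[0.5, 0.65, 0.8, 0.9, 0.95, 0.85, 0.72, 0.6]`, representative frame index 4, and a threshold of 0.7, the walk stops at the 0.65 and 0.6 frames and returns `[2, 3, 5, 6]`.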

The frame matching module 24 generates a panoramic representative frame by stitching the representative frame and the candidate frames together with the core feature as the reference. As one stitching method, each candidate frame may be divided into split images with respect to the core feature, and the split images that do not contain the core feature may be stitched to the representative frame to produce the panoramic representative frame. The panoramic representative frame generated by the frame matching module 24 may be input to the representative frame classification module 14 as input data, which then outputs, as output data, second zone classification information with better classification performance.
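The source gives only an outline of the stitching step. One possible reading, sketched below for the simplest case, uses the core feature as the alignment anchor: if the camera pans right, the core feature shifts left in a later candidate frame, and the candidate columns that extend past the right edge of the representative frame (the part on the far side of the core feature) are appended. The function name, the pure horizontal pan, equal frame sizes, and the absence of blending are all assumptions; frames are lists of pixel rows.

```python
def stitch_right(rep_frame, candidate, rep_core_x1, cand_core_x1):
    """Extend rep_frame to the right with new columns from a later candidate frame.

    rep_core_x1 / cand_core_x1: left edge of the core feature's bounding box in each frame,
    used as the alignment anchor. Assumes a pure horizontal pan, equal frame sizes, no blending.
    """
    shift = rep_core_x1 - cand_core_x1  # candidate column j lands at panorama column j + shift
    rep_width = len(rep_frame[0])
    start = rep_width - shift           # first candidate column not already covered
    # If shift <= 0 (no rightward pan), start >= rep_width and nothing is appended.
    return [r + c[start:] for r, c in zip(rep_frame, candidate)]
```

For example, if the representative frame shows scene columns 0 to 5 with the core feature at column 3, and the candidate shows columns 2 to 7 with the core feature at its column 1, the shift is 2 and the stitched rows cover columns 0 to 7.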

According to the modified example of the present invention, zone classification can be inferred after the representative frame is converted into a panoramic image, which offsets the drawback of classifying the zone of an entire travel image from a single representative frame. In addition, because the panoramic image is generated in a way that improves the inference performance of the artificial neural network module that infers the zone classification, the accuracy of zone classification for travel images is higher than with conventional computer-vision panoramic image generation algorithms.

Conventionally, a panoramic image is generated as follows: feature points are extracted with computer vision algorithms such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), or CDVS (Compact Descriptors for Visual Search); outliers among the feature points are removed by applying RANSAC, PROSAC, or a similar algorithm; matching points are selected from the inlier feature points of the two images; a homography matrix is computed from the matching points selected as true feature points (inliers) of each image; and image stitching is performed with the computed homography matrix. Applying this conventional approach to the generation of the panoramic representative frame according to an embodiment of the present invention would require considerable computing resources to extract feature points and remove outliers for every frame of the travel image. Moreover, because existing computer-vision-based algorithms cannot reliably remove outliers, various problems arise, and panoramic images that are entirely unrelated to the zone classification of the travel image may be generated.
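For comparison with the conventional pipeline, the step at which a homography matrix is applied and matches are scored as inliers can be sketched as follows. RANSAC repeatedly estimates a homography from a small random sample of matches and keeps the estimate with the most inliers; the estimation itself is omitted here, and the function names and the 3-pixel tolerance are illustrative assumptions.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by w to return to Cartesian coordinates


def count_inliers(H, matches, tol=3.0):
    """Count matches ((x, y), (x2, y2)) whose reprojection error under H is within tol pixels."""
    count = 0
    for (x, y), (x2, y2) in matches:
        px, py = apply_homography(H, x, y)
        if (px - x2) ** 2 + (py - y2) ** 2 <= tol ** 2:
            count += 1
    return count
```

With a pure translation by (5, -2), two correct matches are counted as inliers while a mismatched pair is rejected; this per-candidate scoring, repeated over many sampled estimates and every frame pair, is part of the cost the paragraph above refers to.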

As described above, those skilled in the art to which the present invention pertains will understand that the present invention can be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

The features and advantages described in this specification are not all-inclusive; in particular, many additional features and advantages will be apparent to those skilled in the art in view of the drawings, the specification, and the claims. Moreover, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, and may not have been chosen to delineate or circumscribe the subject matter of the invention.

The foregoing description of the embodiments of the present invention has been presented for purposes of illustration. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Those skilled in the art will appreciate that many modifications and variations are possible in light of the above disclosure.

Therefore, the scope of the present invention is limited not by the detailed description but by any claims of an application based thereon. Accordingly, the disclosure of the embodiments of the present invention is illustrative, and not limiting, of the scope of the invention set forth in the following claims.

1: AI-based travel image classification device
10: Travel image receiving module
11: Motion information output module
12: Representative section selection module
13: Representative frame selection module
14: Representative frame classification module
21: Feature extraction module
22: Core feature selection module
23: Candidate frame selection module
24: Frame matching module
220: Class activation generation module
100: Creator client

Claims

a travel image receiving module configured to receive a travel image generated by a creator client and composed of a plurality of travel image frames in time series and image information including location information of the travel image;
a representative frame selection module for selecting a specific frame among the frames of the travel video as a representative frame;
a representative frame classification module including a representative frame classification artificial neural network module that takes the representative frame as input data and outputs, as output data, first zone classification information, which is classification information on the theme of a zone including the location information, the representative frame classification module outputting the first zone classification information for the travel image and the image information;
a feature extraction module including a CNN-based feature extraction artificial neural network that takes the representative frame as input data and outputs, as output data, feature class information (class probability) and feature coordinate information (coordinate data) of one or more object features extracted from the representative frame;
It is connected to a specific convolution layer of the representative frame classification module and receives class activation information including an activation map for each class, and uses the class activation information and the feature coordinate information to determine the classification of each class. a core feature selection module that generates feature importance information and selects a class having the largest generated feature importance information as a core feature;
a candidate frame selection module for selecting a plurality of frames including the core feature among frames before and after the representative frame selected by the representative frame selection module as candidate frames; and
a frame matching module that generates a panoramic representative frame by stitching the representative frame and the candidate frames based on the core feature;
including,
characterized in that the panoramic representative frame generated by the frame matching module is input to the representative frame classification module as input data so that second zone classification information is output as output data.
An apparatus for classifying travel images using panoramic image conversion of images.
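The data flow of the apparatus above can be sketched in a few lines. The key point is that the same representative frame classification module is applied twice. It first runs on the representative frame to produce the first zone classification information. It then runs on the stitched panoramic frame to produce the second zone classification information. This is a minimal Python sketch under stated assumptions, not the applicant's implementation. Every callable (`select_representative`, `classify_zone`, `feature_importance`, `contains`, `stitch`) is a hypothetical stand-in for the corresponding claimed module. The `window` parameter is also an assumption, since the claims only speak of "frames before and after the representative frame".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Frame = Any  # an image array in practice; any object works for this sketch


@dataclass
class ZoneResult:
    first_zone: str   # theme predicted from the representative frame alone
    second_zone: str  # theme predicted from the stitched panoramic frame


def select_core_feature(importance: Dict[str, float]) -> str:
    """Return the class with the largest feature importance (the core feature)."""
    if not importance:
        raise ValueError("no feature importance values supplied")
    return max(importance, key=importance.get)


def select_candidate_indices(
    frames: Sequence[Frame],
    rep: int,
    core: str,
    contains: Callable[[Frame, str], bool],
    window: int,
) -> List[int]:
    """Indices of frames within `window` of the representative frame (hypothetical
    parameter) whose content includes the core feature."""
    lo, hi = max(0, rep - window), min(len(frames), rep + window + 1)
    return [i for i in range(lo, hi) if i != rep and contains(frames[i], core)]


def classify_travel_video(
    frames: Sequence[Frame],
    select_representative: Callable[[Sequence[Frame]], int],
    classify_zone: Callable[[Frame], str],
    feature_importance: Callable[[Frame], Dict[str, float]],
    contains: Callable[[Frame, str], bool],
    stitch: Callable[[List[Frame], str], Frame],
    window: int = 5,
) -> ZoneResult:
    rep = select_representative(frames)
    first_zone = classify_zone(frames[rep])              # first zone classification
    core = select_core_feature(feature_importance(frames[rep]))
    idx = select_candidate_indices(frames, rep, core, contains, window)
    order = sorted(idx + [rep])                          # keep temporal order for stitching
    panorama = stitch([frames[i] for i in order], core)
    second_zone = classify_zone(panorama)                # same classifier, panoramic input
    return ZoneResult(first_zone, second_zone)


if __name__ == "__main__":
    # Toy stand-ins: frames are strings of detected labels, stitching is string joining.
    frames = ["street", "beach", "beach sea", "beach", "cafe"]
    result = classify_travel_video(
        frames,
        select_representative=lambda fs: 2,
        classify_zone=lambda f: "coastal" if f.count("beach") >= 2 else "unknown",
        feature_importance=lambda f: {"beach": 0.7, "sea": 0.2},
        contains=lambda f, c: c in f.split(),
        stitch=lambda fs, core: "|".join(fs),
        window=2,
    )
    print(result)  # ZoneResult(first_zone='unknown', second_zone='coastal')
```

Passing the modules in as callables keeps the sketch independent of any particular CNN or stitching library. In a real system the toy string frames would be image arrays, and `stitch` would be an actual panorama stitcher. The toy classifier shows why the second pass can differ: the panorama gives it more context than the single frame did.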

According to claim 1,
wherein the feature importance information is either the ratio of the heatmap of the class activation information that falls within the bounding box given by the feature coordinate information of each class, or the sum of the activation values of the class activation information within the bounding box of each class,
An apparatus for classifying travel images using panoramic image conversion of images.
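The claim above defines the feature importance of each class in two alternative ways. The sketch below computes both for toy class activation maps. It assumes each map has already been resized to the frame resolution. It also assumes each bounding box is given as half-open pixel bounds `(x1, y1, x2, y2)`. The "ratio" variant is read here as the share of the class's total activation that lies inside its box. That reading, the `mode` switch, and all function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]      # H x W class activation map, assumed resized to the frame size
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) pixel bounds, half-open


def activation_sum_in_box(cam: Heatmap, box: Box) -> float:
    """Sum of the class's activation values inside its bounding box."""
    x1, y1, x2, y2 = box
    return sum(sum(row[x1:x2]) for row in cam[y1:y2])


def activation_ratio_in_box(cam: Heatmap, box: Box) -> float:
    """Fraction of the class's total activation that lies inside its bounding box."""
    total = sum(sum(row) for row in cam)
    return activation_sum_in_box(cam, box) / total if total > 0 else 0.0


def feature_importance(
    cams: Dict[str, Heatmap], boxes: Dict[str, Box], mode: str = "ratio"
) -> Dict[str, float]:
    """Feature importance for every class that has both an activation map and a box."""
    if mode not in ("ratio", "sum"):
        raise ValueError("mode must be 'ratio' or 'sum'")
    score = activation_ratio_in_box if mode == "ratio" else activation_sum_in_box
    return {c: score(cams[c], boxes[c]) for c in cams.keys() & boxes.keys()}


if __name__ == "__main__":
    cams = {
        "beach": [[0, 0, 0, 0], [0, 0.8, 0.6, 0], [0, 0.4, 0.2, 0], [0, 0, 0, 0]],
        "person": [[0.5, 0.5, 0, 0], [0.5, 0.25, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1.0]],
    }
    boxes = {"beach": (1, 1, 3, 3), "person": (0, 0, 2, 2)}
    importance = feature_importance(cams, boxes)
    print(sorted((c, round(v, 3)) for c, v in importance.items()))
    # [('beach', 1.0), ('person', 0.636)]
```

Taking `max(importance, key=importance.get)` over the returned dictionary then picks the class with the largest feature importance. That class is the core feature described in claim 1.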

a travel image reception step in which a travel image reception module receives a travel image generated by a creator client and composed of a plurality of travel image frames in time series and image information including location information of the travel image;
a representative frame selection step in which a representative frame selection module selects a specific frame among the frames of the travel video as a representative frame;
a representative frame classification step in which a representative frame classification module, including a representative frame classification artificial neural network module that takes the representative frame as input data and outputs first zone classification information, which is classification information on the theme of the zone including the location information, as output data, outputs the first zone classification information for the image information;
a feature extraction step in which a feature extraction module, comprising a CNN-based feature extraction artificial neural network that takes the representative frame as input data and outputs, as output data, feature class information (class probability) and feature coordinate information (coordinate data) of one or more object features extracted from the representative frame, outputs the feature class information and the feature coordinate information;
a core feature selection step in which a core feature selection module, connected to a specific convolution layer of the representative frame classification module, receives class activation information including an activation map for each class, generates feature importance information for each class using the class activation information and the feature coordinate information, and selects the class with the largest feature importance information as a core feature;
a candidate frame selection step of selecting, by a candidate frame selection module, a plurality of frames including the core feature among frames before and after the representative frame as candidate frames; and
a frame matching step of generating, by a frame matching module, a panoramic representative frame by stitching the representative frame and the candidate frame based on the core feature;
including,
characterized in that the panoramic representative frame is input to the representative frame classification module as input data so that second zone classification information is output as output data.
A travel image classification method using panoramic image conversion of images.

a memory module containing travel image classification program code; and
a processing module that executes the travel image classification program code;
including,
The travel image classification program code,
a travel image reception step of receiving a travel image generated by a creator client and composed of a plurality of travel image frames in time series and image information including location information of the travel image;
a representative frame selection step of selecting a specific frame among the frames of the travel video as a representative frame;
a representative frame classification step of outputting the first zone classification information using a representative frame classification artificial neural network module that takes the representative frame as input data and outputs first zone classification information, which is classification information on the theme of the zone including the location information, as output data;
a feature extraction step of outputting the feature class information and the feature coordinate information using a CNN-based feature extraction artificial neural network that takes the representative frame as input data and outputs, as output data, feature class information (class probability) and feature coordinate information (coordinate data) of one or more object features extracted from the representative frame;
a core feature selection step of receiving class activation information including an activation map for each class through a connection to a specific convolution layer of the representative frame classification artificial neural network module, generating feature importance information for each class using the class activation information and the feature coordinate information, and selecting the class with the largest feature importance information as a core feature;
a candidate frame selection step of selecting a plurality of frames including the core feature among frames before and after the representative frame as candidate frames; and
a frame matching step of generating a panorama representative frame by stitching the representative frame and the candidate frame based on the core feature;
characterized in that it is configured to perform, on a computer, steps including the above,
characterized in that the panoramic representative frame is input to the representative frame classification module as input data so that second zone classification information is output as output data.
An apparatus for classifying travel images using panoramic image conversion of images.

a travel image reception step in which a travel image reception module receives a travel image generated by a creator client and composed of a plurality of travel image frames in time series and image information including location information of the travel image;
a representative frame selection step in which a representative frame selection module selects a specific frame among the frames of the travel video as a representative frame;
a representative frame classification step in which a representative frame classification module, including a representative frame classification artificial neural network module that takes the representative frame as input data and outputs first zone classification information, which is classification information on the theme of the zone including the location information, as output data, outputs the first zone classification information for the image information;
a feature extraction step in which a feature extraction module, comprising a CNN-based feature extraction artificial neural network that takes the representative frame as input data and outputs, as output data, feature class information (class probability) and feature coordinate information (coordinate data) of one or more object features extracted from the representative frame, outputs the feature class information and the feature coordinate information;
a core feature selection step in which a core feature selection module, connected to a specific convolution layer of the representative frame classification module, receives class activation information including an activation map for each class, generates feature importance information for each class using the class activation information and the feature coordinate information, and selects the class with the largest feature importance information as a core feature;
a candidate frame selection step of selecting, by a candidate frame selection module, a plurality of frames including the core feature among frames before and after the representative frame as candidate frames; and
a frame matching step of generating, by a frame matching module, a panoramic representative frame by stitching the representative frame and the candidate frame based on the core feature;
characterized in that it is configured to perform, on a computer, steps including the above,
characterized in that the panoramic representative frame is input to the representative frame classification module as input data so that second zone classification information is output as output data.
A program stored in a recording medium configured to perform a travel image classification method using a panoramic image conversion of an image on a computer.