KR102284913B1 - Users Field of View selection method and apparatus in 360 degree videos

Info

Publication number: KR102284913B1
Application number: KR1020200108412A
Authority: KR
Inventors: 이종원; 무하마드 이르판
Original assignee: 세종대학교산학협력단
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2021-08-02

Abstract

Disclosed are a method and an apparatus for selecting a user's viewing angle in a 360-degree image. The method includes the steps of: (a) preprocessing the 360-degree image and dividing it into three field-of-view (FoV) images, each having a 120-degree horizontal view; (b) calculating a saliency score for each of the FoV images; and (c) selecting and transmitting one of the three FoV images using the saliency score of each FoV image. Bandwidth waste can thus be reduced by transmitting only the FoV of interest from the 360-degree image.

Description

User Field of View selection method and apparatus in 360 degree videos

The present invention relates to a method and an apparatus for selecting a user's viewing angle in a 360-degree image.

A 360-degree camera provides a complete view of the surrounding scene and is therefore more immersive than a conventional camera. Because they offer users an enjoyable experience, 360-degree cameras are being adopted in recent technologies and applications such as virtual reality (VR) and augmented reality (AR).

Owing to high demand, 360-degree content has recently been produced and supported by large technology and social media companies such as Facebook, YouTube, and Google. More than 1 million 360-degree videos and 25 million 360-degree images have recently been uploaded to Facebook.

Compared with conventional 2D video, 360-degree video gives users an exciting experience through the illusion of being inside the virtual content. The field has attracted users, large companies, and researchers, but new problems have emerged as its various applications are explored.

360-degree video content poses problems because of its high resolution and the limited human field of view (FoV) for visual content. Above all, the wide extent of a 360-degree video makes it very difficult for a user to choose where to look. For this reason, as shown in FIG. 1, the conventional approach is to search for and manually select an FoV in the 360-degree video using a wearable device such as a head-mounted display (HMD).

An object of the present invention is to provide a method and an apparatus for selecting a user's viewing angle in a 360-degree image.

Another object of the present invention is to provide a method and an apparatus for selecting a user's viewing angle in a 360-degree image that can automatically select the FoV most interesting to the user in the 360-degree image.

Another object of the present invention is to provide a method and an apparatus for selecting a user's viewing angle in a 360-degree image that detect salient objects in the 360-degree image using a deep learning technique and, on that basis, automatically select the FoV that is most interesting and memorable to the user.

Another object of the present invention is to provide a method and an apparatus for selecting a user's viewing angle in a 360-degree image that can reduce bandwidth waste by transmitting only the FoV of interest from the 360-degree image.

According to one aspect of the present invention, a method for selecting a user's viewing angle in a 360-degree image is provided.

According to an embodiment of the present invention, there may be provided a method for selecting a user's viewing angle in a 360-degree image, the method including: (a) preprocessing a 360-degree image and dividing it into three field-of-view (FoV) images, each having a 120-degree horizontal view; (b) calculating a saliency score for each FoV image; and (c) selecting and transmitting one of the three FoV images using the saliency score of each FoV image.

Step (b) may include: applying each FoV image to an object detection model to detect objects, and deriving an object class and an accuracy value for each detected object; applying each FoV image to a pre-trained memorability calculation model to calculate a memorability score for each FoV image; and calculating a saliency score for each FoV image using the object class and accuracy value of each object and the memorability score.

The saliency score for each FoV image is calculated using the following equation:

[Equation 3]

Here, X, Y, and Z denote the total number of objects included in each object class; P, A, and V denote the individual objects included in each object class; acc denotes the accuracy value of each object; M denotes the memorability score; and the remaining symbol denotes a balancing weight.

In step (c), the FoV image having the highest saliency score among the three FoV images may be selected as the viewport and transmitted.

Each FoV image may be generated by dividing the total number of column pixels of the 360-degree image into three equal parts, so that each has the same 120-degree horizontal field of view.

The object class of each detected object is one of a set of predefined classes: an accuracy value is derived between each predefined class and the detected object, and the class with the highest accuracy value is determined to be the object class of that object.

According to another aspect of the present invention, an apparatus capable of selecting a user's viewing angle in a 360-degree image is provided.

According to an embodiment of the present invention, there may be provided an apparatus including: a memory storing at least one instruction; and a processor executing the instructions stored in the memory, wherein the instructions executed by the processor perform the steps of: (a) preprocessing a 360-degree image and dividing it into three field-of-view (FoV) images, each having a 120-degree horizontal view; (b) calculating a saliency score for each FoV image; and (c) selecting and transmitting one of the three FoV images using the saliency score of each FoV image.

According to an embodiment of the present invention, the FoV most interesting to the user can be automatically selected from a 360-degree image.

In addition, the present invention can detect salient objects in a 360-degree image using a deep learning technique and, on that basis, automatically select the FoV that is most interesting and memorable to the user.

In addition, the present invention can reduce bandwidth waste by transmitting only the FoV of interest from the 360-degree image.

FIG. 1 is a diagram illustrating a conventional method of manually searching for an FoV in a 360-degree video using a head-mounted display (HMD).
FIG. 2 is a flowchart illustrating a method for selecting a user's viewing angle in a 360-degree image according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the resolution of a 360-degree video frame according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the resolution of the divided FoV according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating FoVs equally divided to have the same horizontal field of view according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a saliency table according to an embodiment of the present invention.
FIG. 7 is a block diagram schematically illustrating the internal configuration of an apparatus capable of automatically selecting a user's viewing angle in a 360-degree image according to an embodiment of the present invention.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "including" should not be construed as necessarily including all of the components or steps described in the specification; some of the components or steps may be omitted, or additional components or steps may be included. In addition, terms such as "...unit" and "module" described in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a flowchart illustrating a method for selecting a user's viewing angle in a 360-degree image according to an embodiment of the present invention, FIG. 3 is a diagram for explaining the resolution of a 360-degree video frame according to an embodiment of the present invention, FIG. 4 is a diagram for explaining the resolution of the divided FoV according to an embodiment of the present invention, FIG. 5 is a diagram illustrating FoVs equally divided to have the same horizontal field of view according to an embodiment of the present invention, and FIG. 6 is a diagram illustrating a saliency table according to an embodiment of the present invention.

In step 210, the apparatus 100 preprocesses the input immersive video frame and divides it into three FoV images having the same horizontal view.

This will be described in more detail.

The input immersive video frame is a 360-degree image. For an ordinary 2D image, the resolution is determined by the width and height of the image. For a 360-degree image, however, the situation is somewhat more complicated because of the panorama. In such immersive video (hereinafter referred to as a 360-degree video frame), the content spans 360 degrees horizontally and 180 degrees vertically, and the entire scene can be divided by the viewer's two eyes. The user's FoV is therefore limited to 120 degrees rather than the full 360-degree video frame. Accordingly, in an embodiment of the present invention, the 360-degree video frame may be divided into three field-of-view (FoV) images, each having the same 120-degree horizontal field of view.

As shown in FIG. 3, 360-degree video frames are provided in various resolutions from 2K to 16K. Accordingly, Equation 1 may be used to automatically adjust the resolution of the user's FoV image.

[Equation 1]

Here, r and c denote the number of pixels in the rows and columns, respectively, and pixels denotes the total number of pixels in the 360-degree video frame.

The apparatus 100 may divide the 360-degree video frame into three FoVs having the same 120-degree horizontal field of view, based on the total number of columns of the 360-degree video frame.

For example, this can be expressed as Equation 2.

[Equation 2]

Here, the symbol defined by Equation 2 denotes the 120-degree horizontal view. Using Equation 2, the apparatus divides the 360-degree video frame into FoV images each having a 120-degree horizontal field of view.

An example of dividing a 360-degree video frame into three FoV images having the same 120-degree horizontal field of view is shown in FIG. 4.
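
The column-based division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame is modeled as a plain list of pixel rows, the names `fov_column_ranges` and `split_into_fovs` are hypothetical, and assigning any leftover columns to the last FoV is an assumption made here for widths that are not divisible by three.

```python
from typing import List, Tuple

# A stand-in for a real image array: a list of rows, each a list of pixel values.
Frame = List[List[int]]


def fov_column_ranges(total_columns: int, parts: int = 3) -> List[Tuple[int, int]]:
    """Return half-open column ranges [start, end) splitting the width into equal parts."""
    if total_columns < parts:
        raise ValueError("frame is narrower than the number of FoVs")
    width = total_columns // parts  # columns per FoV
    # Assumption: leftover columns (width not divisible by parts) go to the last FoV.
    return [
        (i * width, total_columns if i == parts - 1 else (i + 1) * width)
        for i in range(parts)
    ]


def split_into_fovs(frame: Frame) -> List[Frame]:
    """Cut every row of the frame at the same column boundaries."""
    ranges = fov_column_ranges(len(frame[0]))
    return [[row[start:end] for row in frame] for start, end in ranges]


# A tiny 2 x 6 "frame" splits into three 2 x 2 FoVs.
tiny = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
left, center, right = split_into_fovs(tiny)
```

For a frame 12000 columns wide, `fov_column_ranges(12000)` gives three ranges of 4000 columns each, which corresponds to three FoVs of 120 degrees.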

The apparatus 100 may divide the input 360-degree video frame into three FoVs, each having a 120-degree field of view, and adjust the resolution of each FoV based on the size of the input 360-degree video frame.

For example, a 12K 360-degree video frame that is 12000 pixels wide may be divided into 4K FoVs, each 4000 pixels wide and having a 120-degree field of view. The resolution of the user's FoV (UFoV) for 360-degree video frames of various input resolutions is shown in FIG. 5.

In step 215, the apparatus 100 applies each FoV image to a deep learning-based object detection model to detect objects, and derives an object class and an accuracy value for each detected object.

The deep learning-based object detection model may be an SSDLite (Single Shot Detector)-based model.

The object detection model is assumed to have been pre-trained on classes for various objects. The classes may be, for example, person, car, and animal; other classes are of course possible.

That is, the apparatus 100 detects objects in each FoV image through the deep learning-based object detection model, derives an accuracy value between each detected object and each predefined class (object class), and determines the class with the highest accuracy value to be the object class of the detected object.

In this way, the apparatus 100 may store the object classes and accuracy values for each FoV image in a table, as shown in FIG. 6.
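
The class decision and the per-FoV table might be organized as in the sketch below. The detector itself is not modeled: each detection is assumed to arrive as a mapping from class name to accuracy value, and `classify` and `build_table` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed detector output for one object: class name -> accuracy value.
Detection = Dict[str, float]


def classify(detection: Detection) -> Tuple[str, float]:
    """Pick the class with the highest accuracy value for one detected object."""
    label = max(detection, key=lambda name: detection[name])
    return label, detection[label]


def build_table(fov_detections: List[List[Detection]]) -> List[Dict[str, List[float]]]:
    """For each FoV, group the accuracy values of its detected objects by class."""
    table = []
    for detections in fov_detections:
        per_class: Dict[str, List[float]] = defaultdict(list)
        for detection in detections:
            label, acc = classify(detection)
            per_class[label].append(acc)
        table.append(dict(per_class))
    return table


# Two objects in the first FoV, none in the second, one in the third.
table = build_table([
    [{"person": 0.9, "car": 0.1}, {"person": 0.2, "animal": 0.7}],
    [],
    [{"car": 0.8, "animal": 0.1}],
])
```

Each table entry then holds, per class, one accuracy value per detected object, so the number of objects in a class is simply the length of its list.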

In step 220, the apparatus 100 applies each FoV image to a pre-trained memorability calculation model to derive a memorability score.

Here, the memorability calculation model is a hybrid-AlexNet model, which may be fine-tuned on a large annotated image database. The memorability calculation model may be pre-trained on memorability scores for various object classes, including humans and beautiful natural landscapes.

Accordingly, the apparatus 100 may calculate a memorability score for each FoV image by applying each FoV image to the trained memorability calculation model.

The memorability calculation model has learned, from an image database, memorability scores for images containing various objects. The apparatus 100 can therefore derive a memorability score for each FoV image by applying that image to the trained model.

The hybrid-AlexNet model for calculating the memorability score is a known model, and any other known method of calculating a memorability score may be applied in the same way.

In step 225, the apparatus 100 calculates a saliency score for each FoV image using the object classes and accuracy values of that FoV image together with its memorability score.

For example, the apparatus 100 may calculate the saliency score for each FoV image using Equation 3.

[Equation 3]

Equation 3 combines the accuracy values of the objects in each object class with the memorability score M, and includes a balancing weight.
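
Because the equation itself is not reproduced in this text, the following sketch shows only one plausible way such a score could be formed. It assumes, without confirmation from the source, that the accuracy values of the objects in the three class groups are summed and that the memorability score M is added after scaling by the balancing weight. The keys "P", "A", and "V" and the parameter `weight` are placeholders named after the symbols in the description.

```python
from typing import Dict, List


def saliency_score(
    acc_by_class: Dict[str, List[float]],
    memorability: float,
    weight: float = 1.0,
) -> float:
    """Assumed form: sum of per-object accuracy values plus weight * memorability."""
    # One list of accuracy values per class group; a missing group contributes zero.
    object_term = sum(sum(acc_by_class.get(name, [])) for name in ("P", "A", "V"))
    # Assumption: the balancing weight scales the memorability term.
    return object_term + weight * memorability


# Two objects in group P, one in group A, none in group V.
score = saliency_score({"P": [0.5, 0.25], "A": [0.25]}, memorability=0.5, weight=2.0)
```

Under these assumptions, a FoV with more confidently detected objects or a higher memorability score receives a higher saliency score; the actual weighting in the patent may differ.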

In step 230, the apparatus 100 selects one of the three FoV images using the saliency score of each FoV image.

For example, the apparatus 100 may select one of the FoV images using Equation 4.

[Equation 4]

That is, the apparatus 100 may select the FoV image with the highest saliency score as the viewport and transmit only the selected viewport image.

In this way, the apparatus 100 automatically selects the FoV image that is most interesting and memorable to the user and transmits only that FoV image, thereby reducing bandwidth waste.
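
Choosing the viewport amounts to taking the index of the largest saliency score. The sketch below assumes the three scores are already available as a list; `select_viewport` is a hypothetical name, and resolving ties in favor of the first FoV is an assumption not stated in the source.

```python
from typing import Sequence


def select_viewport(scores: Sequence[float]) -> int:
    """Return the index of the FoV with the highest saliency score."""
    if not scores:
        raise ValueError("at least one FoV score is required")
    # max() keeps the first of several equal maxima, so ties go to the lowest index.
    return max(range(len(scores)), key=lambda i: scores[i])


viewport = select_viewport([0.4, 1.7, 0.9])  # index of the middle FoV
```

Because the three FoVs have equal width, transmitting only the selected one sends roughly one third of the pixels of the full frame.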

Depending on the implementation, once a specific FoV image has been selected as the viewport, FoV images in subsequent 360-degree video frames may be selected automatically so as to track the salient object in that FoV image.

FIG. 7 is a block diagram schematically illustrating the internal configuration of an apparatus capable of automatically selecting a user's viewing angle in a 360-degree image according to an embodiment of the present invention.

Referring to FIG. 7, the apparatus 100 according to an embodiment of the present invention includes a preprocessor 710, a calculator 715, a viewport selector 720, a memory 725, and a processor 730.

The preprocessor 710 is a means for preprocessing a 360-degree video frame and dividing it into three FoV images having the same 120-degree horizontal field of view.

The calculator 715 is a means for calculating a saliency score for each FoV image.

To calculate the saliency score, the calculator 715 first detects objects in each FoV image, derives the object class and accuracy value of each detected object, and derives a memorability score for each FoV image.

The calculator 715 may then calculate the saliency score for each FoV image using the object classes and accuracy values of the objects detected in that FoV image and its memorability score.

As described above, the calculator 715 may apply each FoV image to the deep learning-based object detection model to detect objects, classify each detected object into one of the predefined classes, and derive the corresponding accuracy value.

The calculator 715 may also derive a memorability score by applying each FoV image to the pre-trained memorability calculation model.

The number of detected objects per class and the accuracy value of each object may be stored in a table for each FoV image. The memorability score derived for each FoV image may of course also be stored in the table.

Accordingly, the calculator 715 may calculate the saliency score for each FoV image from the table, using the object classes, accuracy values, and memorability score of each FoV.

The viewport selector 720 may select one of the three FoV images of a 360-degree video frame as the viewport, based on the saliency score of each FoV image.

For example, the viewport selector 720 may select the FoV image with the highest saliency score as the viewport.

In addition, when selecting one of the FoV images as the viewport using the saliency scores, the viewport selector 720 may select the FoV image that contains the main object (salient object) included in the previously selected FoV image. In that case, if the saliency score of that FoV image is less than or equal to a reference value, the viewport selector 720 may stop following the main object and instead select the FoV image with the highest saliency score as the viewport.
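
The tracking rule with its fallback can be sketched as below. How the main object is located in the current frame is outside this sketch: `tracked_fov` is assumed to be the index of the FoV that currently contains it (or `None` if it was not found), and `select_with_tracking` and `threshold` are hypothetical names for the selector and the reference value.

```python
from typing import Optional, Sequence


def select_with_tracking(
    scores: Sequence[float],
    tracked_fov: Optional[int],
    threshold: float,
) -> int:
    """Follow the main object unless its FoV scores at or below the reference value."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if tracked_fov is not None and scores[tracked_fov] > threshold:
        return tracked_fov  # keep following the main object
    return best  # fall back to the highest-scoring FoV


# The tracked FoV (index 2) is kept while its score stays above the reference value.
kept = select_with_tracking([0.2, 0.9, 0.6], tracked_fov=2, threshold=0.5)
# Once its score drops to the reference value or below, the best FoV is chosen instead.
switched = select_with_tracking([0.2, 0.9, 0.4], tracked_fov=2, threshold=0.5)
```

Keeping the tracked FoV while it remains sufficiently salient avoids switching viewports between consecutive frames whenever another FoV scores slightly higher.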

The memory 725 stores the instructions needed to perform the method for selecting a user's viewing angle in a 360-degree image according to an embodiment of the present invention.

The processor 730 is a means for controlling the internal components of the apparatus 100 according to an embodiment of the present invention (for example, the preprocessor 710, the calculator 715, the viewport selector 720, and the memory 725). Although not described separately, the processor 730 may also execute the instructions stored in the memory, and the instructions executed by the processor may perform each of the steps described with reference to FIGS. 1 to 6.
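
The way the components cooperate on one frame can be outlined as a small pipeline. This is a structural sketch only: `preprocess` and `score` stand in for the preprocessor 710 and the calculator 715 and are passed in as plain functions, and the name `select_fov` is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

FrameT = TypeVar("FrameT")
FovT = TypeVar("FovT")


def select_fov(
    frame: FrameT,
    preprocess: Callable[[FrameT], Sequence[FovT]],
    score: Callable[[FovT], float],
) -> FovT:
    """Split the frame, score each FoV, and return the highest-scoring FoV."""
    fovs = preprocess(frame)               # role of the preprocessor 710
    scores = [score(fov) for fov in fovs]  # role of the calculator 715
    best = max(range(len(fovs)), key=lambda i: scores[i])  # viewport selector 720
    return fovs[best]


# Toy stand-ins: the "frame" is a string, and "x" marks salient content.
chosen = select_fov(
    "axxxab",
    preprocess=lambda s: [s[0:2], s[2:4], s[4:6]],
    score=lambda part: float(part.count("x")),
)
```

Passing the stages in as functions keeps the sketch independent of any particular detector or memorability model.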

The apparatus and method according to an embodiment of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: apparatus
710: preprocessor
715: calculator
720: viewport selector
725: memory
730: processor

Claims

A method for selecting a user's viewing angle in a 360-degree image, the method comprising:
(a) preprocessing a 360-degree image and dividing it into three field-of-view (FoV) images, each having a 120-degree horizontal view;
(b) calculating a saliency score for each FoV image; and
(c) selecting and transmitting one of the three FoV images using the saliency score of each FoV image,
wherein step (b) comprises:
applying each FoV image to an object detection model to detect objects, and deriving an object class and an accuracy value for each detected object;
applying each FoV image to a pre-trained memorability calculation model to calculate a memorability score for each FoV image; and
calculating a saliency score for each FoV image using the object class and accuracy value of each object and the memorability score,
wherein the object class of each detected object is one of a set of predefined classes, and
wherein an accuracy value is derived between each predefined class and each detected object, and the class having the highest accuracy value is determined to be the object class of that detected object.

delete

The method of claim 1,
wherein the saliency score for each FoV image is calculated using the following equation:

[Equation]

where X, Y, and Z denote the total number of objects included in each object class; P, A, and V denote the individual objects included in each object class; acc denotes the accuracy value of each object; M denotes the memorability score; and the remaining symbol denotes a balancing weight.

The method of claim 1,
wherein step (c) comprises selecting the FoV image having the highest saliency score among the three FoV images as a viewport and transmitting it.

The method of claim 1,
wherein each FoV image is generated by dividing the total number of column pixels of the 360-degree image into three equal parts so that each has the same 120-degree horizontal field of view.

delete

A computer-readable recording medium in which a program code for performing the method of any one of claims 1 to 5 is recorded.

An apparatus comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory,
wherein the instructions executed by the processor perform the steps of:
(a) preprocessing a 360-degree image and dividing it into three field-of-view (FoV) images, each having a 120-degree horizontal view;
(b) calculating a saliency score for each FoV image; and
(c) selecting and transmitting one of the three FoV images using the saliency score of each FoV image,
wherein step (b) comprises:
applying each FoV image to an object detection model to detect objects, and deriving an object class and an accuracy value for each detected object;
applying each FoV image to a pre-trained memorability calculation model to calculate a memorability score for each FoV image; and
calculating a saliency score for each FoV image using the object class and accuracy value of each object and the memorability score,
wherein the object class of each detected object is one of a set of predefined classes, and
wherein an accuracy value is derived between each predefined class and each detected object, and the class having the highest accuracy value is determined to be the object class of that detected object.

delete