KR102261544B1 - Streaming server and method for object processing in multi-view video using the same

Info

Publication number: KR102261544B1
Application number: KR1020190144712A
Authority: KR
Inventors: 전성국; 김회민; 윤정록
Original assignee: 한국광기술원
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2021-06-07
Also published as: KR20210057925A

Abstract

The present invention relates to a streaming server and a method for processing objects in a multi-view video using the same. The object processing method, performed by the streaming server, comprises: a) receiving, in real time from a content production terminal, per-viewpoint image data captured using one or more omnidirectionally arranged cameras; b) merging the per-viewpoint image data into a single multi-view image by running an image correction algorithm that performs image stitching and color tone correction; c) setting a region of interest in the merged multi-view image based on the user's viewing angle, extracting one or more object images from the region of interest using an artificial intelligence-based object detection model, and storing metadata for the extracted object images; d) generating augmented reality (AR) content corresponding to the object images, and generating and storing a 360-degree streaming image by integrating the object images, the AR content, and the metadata into the multi-view image; and e) providing the 360-degree streaming image to a user terminal at the user terminal's request and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputting the AR content included in the object of interest in real time. The artificial intelligence-based object detection model collects a plurality of 360-degree streaming images over a network and stores them as training images, performs labeling on the object images among the training images and stores the labeling data in a database, and learns the labeling data based on the training images stored in the database.

Description

Streaming server and method for object processing in multi-view video using the same

The present invention relates to a streaming server that provides an interactive content service using images captured in 360 degrees, and to a method for processing objects in a multi-view video using the same.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

The spread of Internet use and advances in image processing and editing, together with Internet users' growing appetite for richer information, have driven the development of panoramic and multi-view video technologies.

In general, multi-view video refers to technology that acquires images using a plurality of cameras and processes them. Multi-view 3D video is a subset of multi-view video that supports 3D video. It requires a considerably denser camera arrangement, so the viewing range presented to users is somewhat narrower than with general multi-view video.

Element technologies for processing such multi-view video include image acquisition, modeling/rendering, and encoding and transmission.

Advances in information and communication technology have made it possible to provide users with a wide variety of multimedia content, and the development of multimedia systems such as 3D TV has sharply increased demand for immersive media content.

Multi-view video, which is attracting attention as a way to meet users' varied demands for 3D video, is a set of videos in which a single 3D scene is captured in 360 degrees by several cameras. It offers the user an arbitrary viewpoint and can provide a wider picture by synthesizing images from several viewpoints.

Although multi-view 3D video is used in many applications, it poses more difficulties than conventional 2D video in image acquisition, processing, data volume, synchronization, and display.

In addition, as interest grows in immersive content based on spatial information, typified by virtual reality and augmented reality, research on intelligent spatial recognition technologies such as object recognition is being actively conducted. In particular, the development of image visualization devices and the advent of 5G communication have laid the foundation for transmitting, receiving, and visualizing large volumes of image data in real time, which increases the need for intelligent spatial recognition technology for multi-view 3D video.

However, deep learning-based object recognition techniques mostly deal with general 3D video, and research on object recognition for panoramic images or 360-degree streaming images remains insufficient.

Korean Patent Application Publication No. 10-2019-0038134 (Title of the invention: 360 video live streaming service method and server device)

To solve the problems described above, an object of the present invention, according to one embodiment, is to provide a real-time streaming service for the image data of each viewpoint captured by multiple 360-degree cameras, while recognizing and visualizing objects centered on the viewing area that the user is watching in the 360-degree streaming image.

However, the technical problem to be solved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a method for processing objects in a multi-view video, performed by a streaming server according to an embodiment of the present invention, comprises: a) receiving, in real time from a content production terminal, per-viewpoint image data captured using one or more omnidirectionally arranged cameras; b) merging the per-viewpoint image data into a single multi-view image by running an image correction algorithm that performs image stitching and color tone correction; c) setting a region of interest in the merged multi-view image based on the user's viewing angle, extracting one or more object images from the region of interest using an artificial intelligence-based object detection model, and storing metadata for the extracted object images; d) generating augmented reality (AR) content corresponding to the object images, and generating and storing a 360-degree streaming image by integrating the object images, the AR content, and the metadata into the multi-view image; and e) providing the 360-degree streaming image to a user terminal at the user terminal's request and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputting the AR content included in the object of interest in real time. The artificial intelligence-based object detection model collects a plurality of 360-degree streaming images over a network and stores them as training images, performs labeling on the object images among the training images and stores the labeling data in a database, and learns the labeling data based on the training images stored in the database.

Step b) may include: detecting one or more feature points in the per-viewpoint image data; comparing the feature points of the per-viewpoint image data and matching identical feature points between the image data of neighboring viewpoints as the same point to generate corresponding pairs; and calculating a transformation matrix between the image data of the respective viewpoints from the corresponding pairs and performing image stitching using the transformation matrix.

In the step of detecting one or more feature points in the per-viewpoint image data, the feature points may be detected using either the SIFT (Scale Invariant Feature Transform) algorithm or the SURF (Speeded Up Robust Feature) algorithm.

In step b), a rotation matrix and a translation vector may be calculated from the transformation relationship between the matched corresponding pairs, using the RANSAC (RANdom SAmple Consensus) algorithm and the transformation matrix expressed in the homogeneous form of Equation 1 below.

In addition, in step b), a color transformation weight for color correction may be calculated in the region where the image data of neighboring viewpoints merged through the image stitching overlap, and the color tone of the entire image data may be corrected using the color transformation weight. Here, in step b), the minimum value of the color transformation weight may be calculated using Equation 2 below, and that minimum value may be applied to the entire image data.

The metadata may include the object type, object position information, and size information for the object image.
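
The metadata described above, and its use when a user selects an object of interest in step e), can be illustrated with a minimal sketch. The field names, the rectangular bounding box, and the `ar_content_id` link are assumptions made for illustration; the specification states only that the metadata may include the object type, position information, and size information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ObjectMetadata:
    """Metadata for one detected object image (field names are illustrative)."""
    object_type: str                     # kind of object, e.g. "bag"
    x: int                               # left edge of the bounding box, in pixels
    y: int                               # top edge of the bounding box, in pixels
    width: int                           # size information: box width
    height: int                          # size information: box height
    ar_content_id: Optional[str] = None  # hypothetical link to stored AR content

    def contains(self, px: int, py: int) -> bool:
        """Return True if pixel (px, py) lies inside the bounding box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_object(objects: List[ObjectMetadata], px: int, py: int) -> Optional[ObjectMetadata]:
    """Return the first object whose box contains the selected pixel, if any."""
    for obj in objects:
        if obj.contains(px, py):
            return obj
    return None


if __name__ == "__main__":
    detected = [
        ObjectMetadata("bag", x=100, y=50, width=40, height=60, ar_content_id="ad-001"),
        ObjectMetadata("shoe", x=300, y=200, width=30, height=20),
    ]
    hit = select_object(detected, 120, 80)
    print(hit.object_type if hit else None)
```

Storing position and size alongside the object type is what would allow a selection on the user terminal to be resolved to a specific object and its AR content without re-running detection.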

Meanwhile, a streaming server for processing objects in a multi-view video according to another embodiment of the present invention includes: a memory in which a program for performing the method for processing objects in a multi-view video is recorded; and a processor for executing the program. By executing the program, the processor receives, in real time from a content production terminal, per-viewpoint image data captured using one or more omnidirectionally arranged cameras; merges the per-viewpoint image data into a single multi-view image by running an image correction algorithm that performs image stitching and color tone correction; sets a region of interest in the merged multi-view image based on the user's viewing angle, extracts one or more object images from the region of interest using an artificial intelligence-based object detection model, and stores metadata for the extracted object images; generates augmented reality (AR) content corresponding to the object images, and generates and stores a 360-degree streaming image by integrating the object images, the AR content, and the metadata into the multi-view image; and provides the 360-degree streaming image to a user terminal at the user terminal's request and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputs the AR content included in the object of interest in real time. The artificial intelligence-based object detection model collects a plurality of 360-degree streaming images over a network and stores them as training images, performs labeling on the object images among the training images and stores the labeling data in a database, and learns the labeling data based on the training images stored in the database.

The processor detects one or more feature points in the per-viewpoint image data, compares the feature points of the per-viewpoint image data and matches identical feature points between the image data of neighboring viewpoints as the same point to generate corresponding pairs, calculates a transformation matrix between the image data of the respective viewpoints from the corresponding pairs, and performs image stitching using the transformation matrix.

In addition, the processor may calculate a color transformation weight for color correction in the region where the image data of neighboring viewpoints merged through the image stitching overlap, and may correct the color tone of the entire image data using the color transformation weight.

According to the above-described means for solving the problem, a real-time streaming service can be provided for the image data of each viewpoint captured by multiple 360-degree cameras. Because each camera differs in characteristics and position, and the degree of image distortion and the colors are not identical, image stitching and color tone correction are performed on the image data of each viewpoint so that the content production terminal can easily produce a high-quality 360-degree streaming image.

In addition, the present invention trains an artificial intelligence-based object detection model on object images in 360-degree streaming images and detects object images in real time using the trained model. Object recognition is performed on the region of interest rather than on the entire image, which can improve the image processing performance of the server. By including AR content such as advertisement information, product sales information, and game information in the object images, a variety of 360-degree streaming images can be provided in which real-time content such as personal broadcasting media is linked with AR content for advertising, commerce, games, and the like.

FIG. 1 is a diagram showing the configuration of a streaming server for object processing in a multi-view video according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for processing objects in a multi-view video according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating the image stitching process of an image correction algorithm according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating the color tone correction process of an image correction algorithm according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a region of interest according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an artificial intelligence-based object detection model according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the process of generating a 360-degree streaming image according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the process of augmenting AR content by the method for processing objects in a multi-view video according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram illustrating an AR content augmentation screen for an object of interest in the 360-degree streaming image of FIG. 8.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element interposed between them. Likewise, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it should be understood that the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not precluded.

In this specification, a "terminal" may be a wireless communication device that ensures portability and mobility, for example any kind of handheld wireless communication device such as a smartphone, a tablet PC, or a notebook computer. A "terminal" may also be a wired communication device, such as a PC, that can connect to other terminals or servers over a network. A network means a connection structure in which nodes such as terminals and servers can exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks.

Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope. Accordingly, inventions of the same scope that perform the same function as the present invention also fall within the scope of the present invention.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a streaming server for object processing in a multi-view video according to an embodiment of the present invention.

Referring to FIG. 1, the streaming server 100 includes a communication module 110, a memory 120, a processor 130, and a database 140.

In detail, the communication module 110 works with the communication network to provide the communication interface needed to exchange signals, in the form of packet data, between the streaming server 100, the content production terminal 200, and the user terminal 300. Here, the communication module 110 may be a device including the hardware and software needed to transmit and receive signals, such as control signals or data signals, over wired or wireless connections with other network devices.

The memory 120 stores a program for performing the method for processing objects in a multi-view video. It also temporarily or permanently stores the data processed by the processor 130. Here, the memory 120 may include volatile storage media or non-volatile storage media, but the scope of the present invention is not limited thereto.

The processor 130 controls the overall process of providing the method for processing objects in a multi-view video. It can detect object images in a 360-degree streaming image using the artificial intelligence-based object detection model, visualize AR content for the detected object images, and provide it to the user terminal 300. Each operation performed by the processor 130 is described in more detail later.

Here, the processor 130 may include any kind of device capable of processing data. A "processor" may refer, for example, to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such hardware-embedded data processing devices include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

The database 140 stores the data accumulated while the method for processing objects in a multi-view video is performed. For example, the database 140 may store image data for each content production terminal, training data for the object detection model, real-time 360-degree streaming images, object images, metadata, AR content, and the like.

FIG. 2 is a flowchart illustrating a method for processing objects in a multi-view video according to an embodiment of the present invention, FIG. 3 is an exemplary diagram illustrating the image stitching process of an image correction algorithm according to an embodiment of the present invention, and FIG. 4 is an exemplary diagram illustrating the color tone correction process of the image correction algorithm. FIG. 5 is a diagram illustrating a region of interest according to an embodiment of the present invention, and FIG. 6 is a diagram illustrating an artificial intelligence-based object detection model according to an embodiment of the present invention.

Referring to FIG. 2, the streaming server 100 receives, in real time from the content production terminal 200, per-viewpoint image data captured using one or more omnidirectionally arranged 360-degree cameras (S1).

The streaming server 100 merges the per-viewpoint image data into a single multi-view image by running an image correction algorithm that performs image stitching and color tone correction on the per-viewpoint image data (S2).

As shown in FIG. 3, the streaming server 100 detects feature points in the per-viewpoint image data transmitted in real time from the content production terminal 200, using either the SIFT (Scale Invariant Feature Transform) algorithm or the SURF (Speeded Up Robust Feature) algorithm (S31), and compares the feature points of the per-viewpoint image data with one another, matching identical feature points between the image data of neighboring viewpoints as the same point to generate corresponding pairs (S32). The streaming server 100 then calculates a transformation matrix between the image data of the respective viewpoints from the corresponding pairs (S33) and performs image stitching using the transformation matrix (S34).
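
The pairing step S32 can be sketched as follows. This is a minimal illustration under stated assumptions: descriptors are taken to be plain numeric vectors already produced by a detector such as SIFT or SURF, and the matching rule (mutual nearest neighbor under Euclidean distance, with a hypothetical `max_distance` cutoff) is one common choice that the specification does not prescribe.

```python
from math import dist
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def nearest(query: Descriptor, candidates: List[Descriptor]) -> int:
    """Index of the candidate descriptor closest to `query` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: dist(query, candidates[i]))


def match_features(desc_a: List[Descriptor], desc_b: List[Descriptor],
                   max_distance: float = 0.5) -> List[Tuple[int, int]]:
    """Return (index_in_a, index_in_b) pairs that are mutual nearest neighbors.

    `max_distance` is an illustrative threshold, not a value from the source.
    """
    pairs: List[Tuple[int, int]] = []
    if not desc_a or not desc_b:
        return pairs
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        # Keep the pair only if the match is symmetric and close enough.
        if nearest(desc_b[j], desc_a) == i and dist(d, desc_b[j]) <= max_distance:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    view_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    view_b = [(1.1, 1.0), (0.0, 0.1), (9.0, 9.0)]
    print(match_features(view_a, view_b))
```

Requiring the match to hold in both directions discards feature points that appear in only one of the two neighboring views, which keeps obviously wrong pairs out of the later transformation estimate.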

At this time, the streaming server 100 calculates a rotation matrix and a translation vector from the transformation relationship between the matched corresponding pairs, using the RANSAC (RANdom SAmple Consensus) algorithm and the transformation matrix expressed in the homogeneous form of Equation 1 below.

[Equation 1]
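
The equation itself is not reproduced in this text. As a hedged illustration of the idea only, the sketch below estimates a homogeneous transformation matrix made of a rotation and a translation from corresponding pairs using RANSAC. It assumes a simplified 2D rigid model (a 3x3 matrix with a 2x2 rotation block and a translation column); the actual form of Equation 1, the tolerance `tol`, and the iteration count are assumptions, not values from the specification.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]
Matrix = List[List[float]]


def rigid_from_two_pairs(p1: Point, q1: Point, p2: Point, q2: Point) -> Matrix:
    """Homogeneous 3x3 matrix [R | t; 0 0 1] mapping p1->q1 and p2->q2 (2D rigid model)."""
    angle = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    c, s = math.cos(angle), math.sin(angle)
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def apply(h: Matrix, p: Point) -> Point:
    """Apply the homogeneous transform to a 2D point."""
    return (h[0][0] * p[0] + h[0][1] * p[1] + h[0][2],
            h[1][0] * p[0] + h[1][1] * p[1] + h[1][2])


def ransac_rigid(pairs: List[Pair], iterations: int = 200, tol: float = 1.0,
                 seed: int = 0) -> Tuple[Optional[Matrix], int]:
    """Return the model with the most inliers and its inlier count.

    Expects at least two corresponding pairs.
    """
    rng = random.Random(seed)
    best_h: Optional[Matrix] = None
    best_inliers = -1
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        if p1 == p2:
            continue  # degenerate sample: direction is undefined
        h = rigid_from_two_pairs(p1, q1, p2, q2)
        inliers = sum(1 for p, q in pairs if math.dist(apply(h, p), q) <= tol)
        if inliers > best_inliers:
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

A production stitcher would more likely estimate a full homography or a 3D rotation between cameras; the rigid model is used here only because two pairs determine it exactly, which keeps the RANSAC loop short.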

한편, 도 4에 도시된 바와 같이, 스트리밍 서버(100)는 상이한 영상 데이터 간에 색상 톤을 일치시키기 위해, 영상 스티칭을 통해 병합된 서로 이웃한 시점의 영상 데이터들간에 중첩되는 영역에서의 색상 보정을 위한 색상 변형 가중치를 하기 수학식 2를 이용하여 계산한다. On the other hand, as shown in FIG. 4, the streaming server 100 performs color correction in an overlapping area between image data of neighboring views merged through image stitching in order to match color tones between different image data. The color transformation weights for this are calculated using Equation 2 below.

[Equation 2]

In Equation 2, r_a, g_a, and b_a are the color values of each pixel in overlapping region A; r_b, g_b, and b_b are the color values of each pixel in overlapping region B; w is the color transformation weight; and n is the number of pixels in the overlapping region. The streaming server 100 calculates the minimum value of the color transformation weight and applies this minimum value to the entire image data, thereby correcting the color tone of the entire image data.
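Equation 2 itself is not reproduced in this text, so the following sketch is not the patent's formula. It only illustrates one common way to derive a single weight from the n overlapping pixels: a least-squares gain w that minimizes the summed squared difference between w times the colors of region A and the colors of region B. The function names and the 8-bit clipping are assumptions.

```python
def gain_for_overlap(pixels_a, pixels_b):
    """Least-squares gain w minimizing sum((w * a - b) ** 2) over all RGB channels.

    pixels_a and pixels_b are equal-length lists of (r, g, b) tuples taken from
    the same positions in the overlapping regions A and B (an assumed layout).
    """
    num = 0.0
    den = 0.0
    for pa, pb in zip(pixels_a, pixels_b):
        for ca, cb in zip(pa, pb):
            num += ca * cb
            den += ca * ca
    if den == 0:
        raise ValueError("region A is entirely black; gain is undefined")
    return num / den

def apply_gain(pixels, w):
    """Scale every channel by w and clip to the 8-bit range."""
    return [tuple(max(0, min(255, round(c * w))) for c in p) for p in pixels]
```

With several overlapping regions, one weight per region could be computed this way and the smallest of them applied to the whole image, which mirrors the "minimum value" wording above under the stated assumption.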

Returning to FIG. 2, the streaming server 100 sets a region of interest (ROI) in the multi-view image merged through image stitching and color tone correction, based on the user's viewing angle, that is, the viewing area shown in FIG. 5 (S3).

At this time, the streaming server 100 extracts a preset viewing angle range according to the screen size of the user terminal 300 and sets the viewing area determined from the extracted viewing angle range as the ROI. For example, when a 360-degree YouTube video is watched on a smartphone rather than a head-mounted display (HMD), the smartphone's posture or rotation data is estimated through its built-in gyro sensor or inertial measurement unit (IMU) sensor, just as with an HMD, and the viewer sees only the part of the 360-degree video that lies in the direction in which the smartphone is turned.

On the same principle, the streaming server 100 can determine the user's viewing area by using the sensors built into the user terminal 300, such as a smartphone or tablet PC, together with terminal-specific specification information and device information, and sets the ROI on that basis.
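The ROI derivation described above can be sketched as follows. This is an illustration under several assumptions that are not in the source: the merged multi-view image is stored in equirectangular layout with yaw 0 at the horizontal center, the terminal reports yaw and pitch in degrees, and the preset viewing angle ranges in `PRESET_FOV_DEG` are hypothetical values. The rectangle is an approximation that ignores projection distortion near the poles.

```python
# Hypothetical preset viewing angle ranges (horizontal, vertical) in degrees.
PRESET_FOV_DEG = {"phone": (90.0, 60.0), "tablet": (100.0, 70.0)}

def viewport_roi(yaw_deg, pitch_deg, fov_deg, width, height):
    """Return pixel rectangles (left, top, right, bottom) covering the viewing area.

    Two rectangles are returned when the viewing area wraps around the
    left/right seam of the equirectangular image.
    """
    hfov, vfov = fov_deg
    # Longitude -180..180 maps to x 0..width; latitude 90..-90 maps to y 0..height.
    left = ((yaw_deg - hfov / 2 + 180.0) % 360.0) * width / 360.0
    span = hfov * width / 360.0
    top = max(0.0, (90.0 - (pitch_deg + vfov / 2)) * height / 180.0)
    bottom = min(float(height), (90.0 - (pitch_deg - vfov / 2)) * height / 180.0)
    if left + span <= width:
        boxes = [(left, top, left + span, bottom)]
    else:
        boxes = [(left, top, float(width), bottom),
                 (0.0, top, left + span - width, bottom)]
    return [tuple(round(v) for v in box) for box in boxes]
```

For example, `viewport_roi(0, 0, PRESET_FOV_DEG["phone"], 3600, 1800)` yields a single rectangle centered in the frame, while a yaw near the seam yields two rectangles.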

The streaming server 100 extracts one or more object images from the ROI using an artificial-intelligence-based object detection model and then stores metadata for the extracted object images in the database 140 (S4). The metadata includes the type of the object in the object image, the position of the object within the image, and the size of the object.
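One possible shape for such a metadata record is sketched below. The field names, the pixel-based bounding box, and the JSON serialization are assumptions for illustration; the source only states that the type, position, and size of the object are stored.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ObjectMetadata:
    object_type: str  # type of the detected object, e.g. "cup"
    x: int            # left edge of the object in the image, in pixels
    y: int            # top edge of the object in the image, in pixels
    width: int        # object width in pixels
    height: int       # object height in pixels

def to_json(records):
    """Serialize metadata records so they can be stored alongside the video."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False)
```

A record created as `ObjectMetadata("cup", 10, 20, 30, 40)` serializes to a JSON object with the same five fields.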

The streaming server 100 collects a plurality of 360-degree streaming images through the network and stores them in the database 140 as training images, then performs a labeling operation on the object images among the training images and stores the resulting labeling data in the database 140. As shown in FIG. 6, the artificial-intelligence-based object detection model learns the labeling data based on the training images stored in the database 140.

To train the object detection model on objects, training images, parameter files, and a weight file are required; when training completes, a weight file that recognizes the trained objects is generated. In the case of Darknet's YOLO (You Only Look Once) model, the parameter file adjusts neural network settings such as the convolutional layers, the number of training iterations, the image size (height, width), graphics card usage, and the output-layer filters, and tuning these values well can improve object recognition performance.
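As an illustration of the output-layer filter setting, the sketch below parses a Darknet-style parameter file and checks one relationship that holds for YOLOv3-style configurations: the convolutional layer directly before each `[yolo]` section uses (classes + 5) x 3 filters. The sample values in `SAMPLE_CFG` are illustrative only and are not taken from the patent, and the helper names are hypothetical.

```python
SAMPLE_CFG = """
[net]
batch=64
subdivisions=16
width=416
height=416
max_batches=6000

[convolutional]
filters=24

[yolo]
classes=3
"""

def parse_cfg(text):
    """Parse Darknet-style cfg text into an ordered list of (section, options)."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections

def expected_filters(classes, anchors_per_scale=3):
    """Each anchor predicts 4 box values, 1 objectness score, and one score per class."""
    return (classes + 5) * anchors_per_scale

def check_output_filters(sections):
    """Return (section index, found, expected) for every mismatching output layer."""
    problems = []
    for i, (name, opts) in enumerate(sections):
        if name == "yolo" and i > 0:
            prev_name, prev_opts = sections[i - 1]
            want = expected_filters(int(opts["classes"]))
            got = int(prev_opts.get("filters", -1))
            if prev_name != "convolutional" or got != want:
                problems.append((i, got, want))
    return problems
```

Section names repeat in these files, which is why the parser keeps an ordered list rather than a dictionary keyed by section name.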

The object detection model is built on deep learning methods such as YOLO (You Only Look Once), R-CNN, Fast R-CNN, Faster R-CNN, and SSD (Single Shot MultiBox Detector), but it may also be built using other machine learning methods such as Random Forest or Support Vector Machine. Machine learning is broadly classified into supervised learning, unsupervised learning, and reinforcement learning; for reinforcement learning in particular, deep learning, Q-learning, and the Deep Q-Network (DQN) algorithm, which combines deep learning with Q-learning, are typically used. In machine learning, including deep learning, supervised learning learns from labeling data that contains input and output values; because it can be applied in many ways when sufficient data is available, it is the most actively researched area.

The streaming server 100 generates augmented reality (AR) content corresponding to the object image, and integrates the object image, the AR content, and the metadata into the multi-view image to generate and store a 360-degree streaming image (S5).

When the user terminal 300 requests a 360-degree streaming image (S6), the streaming server 100 provides that 360-degree streaming image to the user terminal 300 so that the metadata is visualized (S7).

When an object of interest is selected while the 360-degree streaming image is being played on the user terminal 300 (S8), the streaming server 100 outputs the AR content included in the object of interest in real time (S9).

The per-viewpoint image data acquired in real time using one or more 360-degree cameras differ in degree of image distortion and in color, because each camera has different characteristics and a different position. Accordingly, the content production terminal 200 transmits the image data acquired at each viewpoint to the streaming server 100 for the streaming service until capture of the per-viewpoint image data for the specific content is finished (S10, S11).

The streaming server 100 merges the per-viewpoint image data into one multi-view image through image stitching and color tone correction. Rather than performing object detection on the entire area, it performs object detection only on the region of interest that the user is currently viewing while providing the 360-degree streaming image to the user terminal 300.

Meanwhile, steps S1 to S11 of FIG. 2 may be divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

FIG. 7 is a diagram illustrating the process of generating a 360-degree streaming image according to an embodiment of the present invention, FIG. 8 is a diagram illustrating the process of augmenting AR content by the object processing method in a multi-view video according to an embodiment of the present invention, and FIG. 9 is an exemplary diagram illustrating an AR content augmentation screen for an object of interest in the 360-degree streaming image of FIG. 8.

Referring to FIGS. 7 and 8, the content production terminal 200 transmits the image data of each viewpoint, captured by multiple 360-degree cameras 210, to the streaming server 100 in real time (see (a) of FIG. 7).

The streaming server 100 merges the image data of each viewpoint into one multi-view image through image stitching and color tone correction (see (b) of FIG. 7), detects object images such as wine, a book, a person, and a disposable cup through the object detection model, and stores metadata including the type, position, and size of each object together with the object image (see (c) of FIG. 7).

The streaming server 100 provides the 360-degree streaming image at the request of the user terminal 300 and, when an object of interest is selected by the user terminal 300, presents the AR content stored together with that object of interest in the database 140. The AR content may be advertisement information, sales information, game information, or the like for the object of interest.
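The selection step can be sketched as a hit test of the selected screen position against the stored object metadata, followed by a lookup of the AR content. The dictionary layout, the `id` field, and the rule of preferring the smallest enclosing box are assumptions for illustration and are not described in the source.

```python
def pick_object(objects, x, y):
    """Return the smallest object box containing (x, y), or None if nothing is hit."""
    hits = [o for o in objects
            if o["x"] <= x < o["x"] + o["width"]
            and o["y"] <= y < o["y"] + o["height"]]
    if not hits:
        return None
    # Prefer the smallest box so a small object in front of a large one stays selectable.
    return min(hits, key=lambda o: o["width"] * o["height"])

def ar_content_for(objects, ar_store, x, y):
    """Look up the AR content stored for the object selected at (x, y)."""
    obj = pick_object(objects, x, y)
    return ar_store.get(obj["id"]) if obj else None
```

Here `objects` is a list of metadata dictionaries for the current frame and `ar_store` maps an object identifier to its AR content, for example advertisement or sales information.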

The object processing method in a multi-view video according to the embodiment of the present invention described above may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. Such a recording medium includes computer-readable media, which may be any available media that can be accessed by a computer and include volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media also include computer storage media, which include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is indicated by the claims set out below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: streaming server
110: communication module 120: memory
130: processor 140: database

Claims

An object processing method in a multi-view video, performed by a streaming server, the method comprising:
a) receiving, in real time from a content production terminal, per-viewpoint image data captured using one or more cameras arranged omnidirectionally;
b) merging the per-viewpoint image data into one multi-view image by performing an image correction algorithm that performs image stitching and color tone correction on the per-viewpoint image data;
c) extracting a preset viewing angle range according to the screen size of a user terminal as the user's viewing area, setting a region of interest in the merged multi-view image based on the extracted viewing area, extracting one or more object images from the region of interest using an artificial-intelligence-based object detection model, and then storing metadata for the extracted object images, wherein the object detection model collects a plurality of 360-degree streaming images through a network and stores them as training images, performs a labeling operation on object images among the training images to store labeling data in a database, and learns the labeling data based on the training images stored in the database;
d) generating augmented reality (AR) content corresponding to the object image, and integrating the object image, the AR content, and the metadata into the multi-view image to generate and store a 360-degree streaming image; and
e) providing the 360-degree streaming image to the user terminal at the request of the user terminal and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputting in real time the AR content included in the object of interest,
wherein, in step b), a color transformation weight for color correction in the overlapping region between the image data of neighboring viewpoints merged through the image stitching is calculated using Equation 2 below, the minimum value of the color transformation weight is calculated, and the color tone of the entire image data is corrected by applying the minimum value of the color transformation weight to the entire image data.
[Equation 2]

r_a, g_a, b_a: the color values of each pixel in overlapping region A
r_b, g_b, b_b: the color values of each pixel in overlapping region B
w: the color transformation weight
n: the number of pixels in the overlapping region

The method of claim 1,
wherein step b) comprises:
detecting one or more feature points in the per-viewpoint image data;
generating corresponding pairs by comparing the feature points of the per-viewpoint image data and matching identical feature points between the image data of neighboring viewpoints as the same point; and
calculating a transformation matrix between the image data of each viewpoint from the corresponding pairs, and performing image stitching through the transformation matrix.

The method of claim 2,
wherein the detecting of one or more feature points in the per-viewpoint image data
detects the feature points using either the Scale-Invariant Feature Transform (SIFT) algorithm or the Speeded-Up Robust Features (SURF) algorithm.

The method of claim 2,
wherein step b)
applies the RANdom SAmple Consensus (RANSAC) algorithm to the transformation relationship between the matched corresponding pairs and calculates a rotation matrix and a translation vector through the transformation matrix expressed in the homogeneous form of Equation 1 below.
[Equation 1]

(Deleted)

The method of claim 1,
wherein the metadata includes the object type, object position information, and object size information for the object image.

A streaming server for object processing in a multi-view video, comprising:
a memory in which a program for performing an object processing method in a multi-view video is recorded; and
a processor for executing the program,
wherein the processor, by executing the program,
receives, in real time from a content production terminal, per-viewpoint image data captured using one or more cameras arranged omnidirectionally, and merges the per-viewpoint image data into one multi-view image by performing an image correction algorithm that performs image stitching and color tone correction on the per-viewpoint image data,
extracts a preset viewing angle range according to the screen size of a user terminal as the user's viewing area, sets a region of interest in the merged multi-view image based on the extracted viewing area, extracts one or more object images from the region of interest using an artificial-intelligence-based object detection model that collects a plurality of 360-degree streaming images through a network and stores them as training images, performs a labeling operation on object images among the training images to store labeling data in a database, and learns the labeling data based on the training images stored in the database, and then stores metadata for the extracted object images,
generates augmented reality (AR) content corresponding to the object image, and integrates the object image, the AR content, and the metadata into the multi-view image to generate and store a 360-degree streaming image, and
provides the 360-degree streaming image to the user terminal at the request of the user terminal and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputs in real time the AR content included in the object of interest,
wherein the image correction algorithm calculates, using Equation 2 below, a color transformation weight for color correction in the overlapping region between the image data of neighboring viewpoints merged through the image stitching, calculates the minimum value of the color transformation weight, and corrects the color tone of the entire image data by applying the minimum value of the color transformation weight to the entire image data.
[Equation 2]

The streaming server of claim 8,
wherein the processor
detects one or more feature points in the per-viewpoint image data, compares the feature points of the per-viewpoint image data and matches identical feature points between the image data of neighboring viewpoints as the same point to generate corresponding pairs, calculates a transformation matrix between the image data of each viewpoint from the corresponding pairs, and performs image stitching through the transformation matrix.

The streaming server of claim 8,
wherein the processor
calculates a color transformation weight for color correction in the overlapping region between the image data of neighboring viewpoints merged through the image stitching, and corrects the color tone of the entire image data using the color transformation weight.

A computer-readable recording medium on which a program for performing an object processing method in a multi-view video is recorded,
wherein the program
receives, in real time from a content production terminal, per-viewpoint image data captured using one or more cameras arranged omnidirectionally, and merges the per-viewpoint image data into one multi-view image by performing an image correction algorithm that performs image stitching and color tone correction on the per-viewpoint image data,
extracts a preset viewing angle range according to the screen size of a user terminal as the user's viewing area, sets a region of interest in the merged multi-view image based on the extracted viewing area, extracts one or more object images from the region of interest using an artificial-intelligence-based object detection model that collects a plurality of 360-degree streaming images through a network and stores them as training images, performs a labeling operation on object images among the training images to store labeling data in a database, and learns the labeling data based on the training images stored in the database, and then stores metadata for the extracted object images,
generates augmented reality (AR) content corresponding to the object image, and integrates the object image, the AR content, and the metadata into the multi-view image to generate and store a 360-degree streaming image, and
provides the 360-degree streaming image to the user terminal at the request of the user terminal and, when an object of interest is selected by the user terminal while the 360-degree streaming image is being played, outputs in real time the AR content included in the object of interest,
wherein the image correction algorithm calculates, using Equation 2 below, a color transformation weight for color correction in the overlapping region between the image data of neighboring viewpoints merged through the image stitching, calculates the minimum value of the color transformation weight, and corrects the color tone of the entire image data by applying the minimum value of the color transformation weight to the entire image data.
[Equation 2]