KR20230109113A - Method for controlling camera for autonomous shooting of high-quality video and apparatus therefor

Info

Publication number: KR20230109113A
Application number: KR1020230004166A
Authority: KR
Inventors: 노신영; 김현우; 허환; 박현진; 박진영; 김인재; 원희지
Original assignee: 고려대학교 산학협력단
Priority date: 2022-01-12
Filing date: 2023-01-11
Publication date: 2023-07-19

Abstract

A method for controlling cameras for autonomous shooting of high-quality video according to an embodiment of the present invention includes: (a) a first step of receiving video captured by a plurality of cameras, detecting objects contained in the video, and using the object detection results to perform one or more of registration of the video captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (b) a second step of using the results of the first step to detect how far a detected object deviates from the center of a frame-level image contained in the received video, detecting the degree of tilt of the image, and evaluating the aesthetics of the received video; and (c) a third step of individually controlling the plurality of cameras shooting the objects using the aesthetic evaluation result of the second step.

Description

Method for controlling camera for autonomous shooting of high-quality video and apparatus therefor

The present invention relates to a method for controlling cameras for autonomous shooting of high-quality video and an apparatus therefor. More specifically, it relates to a method, and an apparatus therefor, that can shoot high-quality video by actively selecting objects in a given scene and also choosing the composition and the type of shot.

In an era in which video content of every kind is pouring out at an exponential rate, demand for shooting higher-quality video grows by the day, and the related filming technology is being actively developed.

Conventionally, producing high-quality video meant placing a director of photography or camera operator with professional filming skills on site to shoot the target objects using their own expertise. Their labor costs, however, keep rising, and because shooting a single scene with several cameras at once has become standard practice, employing several such experts on one set drives up content production costs severely.

To address this problem, an approach called passive visual intelligence has recently been developed, in which a device uses artificial intelligence to passively learn from footage shot by professional directors of photography or camera operators and controls the shooting cameras according to what it has learned. Although this has helped reduce labor costs, the device cannot depart far from the shooting style of the experts it learned from, and it cannot itself select the subject to shoot or choose the most effective composition or type of shot for that subject. It therefore cannot fully replace the experts and serves only as an assistant.

Moreover, even though passive visual intelligence has helped reduce labor costs compared with earlier practice, small-scale producers (for example, individual live streamers or solo YouTubers) still find the cost of employing experts burdensome. What is needed is a new and advanced technology that goes beyond assisting experts with professional filming knowledge and can replace them entirely. The present invention concerns such a technology, which this specification names active visual intelligence.

Republic of Korea Patent Publication No. 10-2020-0000104 (2020.01.02)

A technical problem to be solved by the present invention is to provide a method for controlling cameras for autonomous shooting of high-quality video, and an apparatus therefor, that can produce high-quality video at the relatively low cost of the apparatus alone, without hiring any of the professional directors of photography or camera operators whose high labor costs conventional high-quality video production has had to bear.

Another technical problem to be solved by the present invention is to provide a method for controlling cameras for autonomous shooting of high-quality video, and an apparatus therefor, in which the apparatus is not bound to the shooting style of the experts it learned from, as passive visual intelligence is, but itself selects the most effective composition or type of shot for the subject and can thereby act as a full replacement for those experts.

Yet another technical problem to be solved by the present invention is to provide a method for controlling cameras for autonomous shooting of high-quality video, and an apparatus therefor, that fully replaces experts and saves their labor costs, thereby helping small-scale producers produce high-quality video without that burden.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above technical problem, a method for controlling cameras for autonomous shooting of high-quality video according to an embodiment of the present invention includes: (a) a first step of receiving video captured by a plurality of cameras, detecting objects contained in the video, and using the object detection results to perform one or more of registration of the video captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (b) a second step of using the results of the first step to detect how far a detected object deviates from the center of a frame-level image contained in the received video, detecting the degree of tilt of the image, and evaluating the aesthetics of the received video; and (c) a third step of individually controlling the plurality of cameras shooting the objects using the aesthetic evaluation result of the second step.

According to an embodiment, the first step may include one or more of: step 1-1 of receiving video captured by the plurality of cameras; step 1-2 of detecting a plurality of objects by applying an object detection algorithm to frame-level images contained in the received video; step 1-3 of registering the video captured by the plurality of cameras with reference to the plurality of objects detected in the frame-level images; step 1-4 of analyzing interactions between the plurality of objects using information on the plurality of objects detected in the frame-level images and information on the registration of the video captured by the plurality of cameras; and step 1-5 of tracking, within the received video, the positions of the plurality of objects detected in the frame-level images.

According to an embodiment, the object detection algorithm in step 1-2 may be either the YOLO (You Only Look Once) algorithm or the CenterNet algorithm.

According to an embodiment, detecting the plurality of objects in step 1-2 may consist of outputting, as an object bounding box, the region in which each detected object is located within the image.
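
As a minimal sketch of this output, a bounding box can be held as four image coordinates from which its center follows. The `BoundingBox` type, its field names, and the `enclosing_box` helper are illustrative assumptions and are not taken from the disclosure; the enclosing box relates to the combined bounding box of FIG. 6 only in the sense that it contains every individual box.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> Tuple[float, float]:
        # Midpoint of the box along each axis.
        return ((self.x_min + self.x_max) / 2.0,
                (self.y_min + self.y_max) / 2.0)


def enclosing_box(boxes: Iterable[BoundingBox]) -> BoundingBox:
    """Smallest axis-aligned box that contains every box in `boxes`."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one bounding box is required")
    return BoundingBox(
        label="all",
        x_min=min(b.x_min for b in boxes),
        y_min=min(b.y_min for b in boxes),
        x_max=max(b.x_max for b in boxes),
        y_max=max(b.y_max for b in boxes),
    )
```

A detector such as YOLO or CenterNet would supply the coordinates; how its raw output is converted into this form depends on the detector and is left out here.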

According to an embodiment, the interactions between the plurality of objects in step 1-4 may include one or more of: an interaction between a person and a thing, when the plurality of objects are a person and a thing; and an interaction between persons, when the plurality of objects are persons.

According to an embodiment, the method may further include step 1-6 of predicting the future behavior of each of the plurality of objects using one or more of the information on the registration of the video captured by the plurality of cameras, the information on the interactions between the plurality of objects, and the position tracking information of the plurality of objects.

According to an embodiment, the second step may include one or more of: step 2-1 of detecting how far the center of the bounding box, which marks the region where the detected object is located in the image, deviates from the center of the image; step 2-2 of detecting a horizon line in the image and calculating the slope of the detected horizon line to detect the degree of tilt of the image; and step 2-3 of evaluating the aesthetics of the received video using the detection results of step 2-1 and step 2-2.
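
The two measurements of steps 2-1 and 2-2 can be sketched as follows. This is only an illustration under stated assumptions: the box is given as `(x_min, y_min, x_max, y_max)` in pixels, the offset is normalized by the image size, and the detected line is given by two points on it. The function names and the normalization are hypothetical and are not specified in the disclosure.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def center_offset(box: Tuple[float, float, float, float],
                  image_size: Tuple[int, int]) -> Tuple[float, float]:
    """Offset of the box center from the image center, as a fraction of
    image width and height (0, 0 means perfectly centered)."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    box_cx = (x_min + x_max) / 2.0
    box_cy = (y_min + y_max) / 2.0
    return ((box_cx - width / 2.0) / width,
            (box_cy - height / 2.0) / height)


def tilt_degrees(p1: Point, p2: Point) -> float:
    """Angle of the line through p1 and p2 relative to the image x axis,
    folded into the range (-90, 90] so point order does not matter."""
    (x1, y1), (x2, y2) = p1, p2
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle
```

Because image coordinates usually have the y axis pointing down, the sign of the returned angle follows that convention; a caller would need to map it to the rotation direction of the actual camera mount.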

According to an embodiment, the method may further include, between step 2-1 and step 2-2, step 2-1′ of calculating the relative position of the camera that captured the image according to the detection result of step 2-1.

According to an embodiment, when no horizon line is detected in the image, the horizon line detection in step 2-2 may consist of detecting two parallel lines and calculating the vanishing point at which the two detected parallel lines meet, thereby detecting the horizon line.
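
The vanishing-point calculation can be sketched with homogeneous coordinates, in which both the line through two image points and the intersection of two lines are cross products. The sketch assumes that each detected line is supplied as a pair of points lying on it; the function names are hypothetical. How the horizon line is then derived from the vanishing point is not specified in this passage and is not shown.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    """3D cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _line_through(p: Point, q: Point) -> Vec3:
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    return _cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(line1: Tuple[Point, Point],
                    line2: Tuple[Point, Point],
                    eps: float = 1e-12) -> Optional[Point]:
    """Image point where the two lines meet, or None if they are also
    parallel in the image (the intersection lies at infinity)."""
    x, y, w = _cross(_line_through(*line1), _line_through(*line2))
    if abs(w) < eps:
        return None
    return (x / w, y / w)
```

For example, the image lines through `(0, 0)`-`(2, 2)` and `(0, 4)`-`(2, 2)` meet at `(2, 2)`, while two lines that stay parallel in the image yield `None`.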

According to an embodiment, the method may further include, between step 2-2 and step 2-3, step 2-2′ of detecting the degree of tilt of the image by detecting a line of symmetry of the detected object when the image is an indoor image.

According to an embodiment, the result of the aesthetic evaluation in step 2-3 may include one or more of: an aesthetic evaluation result for each video captured by the plurality of cameras; information on the camera that captured the video with the highest aesthetic evaluation result among them; information on a recommended shooting position for the camera that captured the image, based on the detection result of step 2-1; and information on a recommended shooting angle for the camera that captured the image, based on the detection result of step 2-2.
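
One possible shape for such a result is sketched below. The scoring formula, its weights, the field names, and the way corrections are expressed are all assumptions made for illustration; the passage above states only what kinds of information the result may contain, not how they are computed.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CameraAssessment:
    camera_id: str
    offset: Tuple[float, float]  # normalized center offset (step 2-1)
    tilt_deg: float              # detected image tilt in degrees (step 2-2)


def score(a: CameraAssessment,
          offset_weight: float = 1.0,
          tilt_weight: float = 1.0 / 45.0) -> float:
    """Hypothetical score in [0, 1]: 1 for a centered, level image,
    lower as the offset and tilt grow."""
    penalty = (offset_weight * math.hypot(*a.offset)
               + tilt_weight * abs(a.tilt_deg))
    return max(0.0, 1.0 - penalty)


def summarize(assessments: List[CameraAssessment]) -> Dict[str, object]:
    """Per-camera scores, the best-scoring camera, and per-camera
    corrections that would cancel the measured offset and tilt."""
    scores = {a.camera_id: score(a) for a in assessments}
    best = max(scores, key=scores.get)
    recommendations = {
        a.camera_id: {
            "shift": (-a.offset[0], -a.offset[1]),
            "rotate_deg": -a.tilt_deg,
        }
        for a in assessments
    }
    return {"scores": scores,
            "best_camera": best,
            "recommendations": recommendations}
```

The corrections are expressed in image terms only. Turning them into a physical shooting position or angle for a drone, robot, or pan-tilt-zoom camera would depend on that device's geometry and control interface.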

To achieve the above technical problem, an apparatus for controlling cameras for autonomous shooting of high-quality video according to another embodiment of the present invention includes one or more processors, a network interface, a memory into which a computer program executed by the processors is loaded, and storage that stores large-volume network data and the computer program, wherein the computer program causes the one or more processors to execute: (A) a first operation of receiving video captured by a plurality of cameras, detecting objects contained in the video, and using the object detection results to perform one or more of registration of the video captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (B) a second operation of using the results of the first operation to detect how far a detected object deviates from the center of a frame-level image contained in the received video, detecting the degree of tilt of the image, and evaluating the aesthetics of the received video; and (C) an operation of individually controlling the plurality of cameras shooting the objects using the aesthetic evaluation result of the second operation.

To achieve the above technical problem, a computer program stored on a medium according to yet another embodiment of the present invention, in combination with a computing device, executes: (AA) a first step of receiving video captured by a plurality of cameras, detecting objects contained in the video, and using the object detection results to perform one or more of registration of the video captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (BB) a second step of using the results of the first step to detect how far a detected object deviates from the center of a frame-level image contained in the received video, detecting the degree of tilt of the image, and evaluating the aesthetics of the received video; and (CC) a step of individually controlling the plurality of cameras shooting the objects using the aesthetic evaluation result of the second step.

According to the present invention as described above, high-quality video can be produced easily by bearing only a modest cost for the apparatus, without hiring any of the professional directors of photography or camera operators whose high labor costs high-quality video production would otherwise require.

In addition, because camera control focuses on the composition of the frame rather than on learning the shooting style of experts, as passive visual intelligence does, the apparatus itself selects the most effective composition or type of shot for the subject and can thereby act as a full replacement for those experts.

In addition, by fully replacing experts and saving their labor costs, the invention helps small-scale producers produce high-quality video without that burden.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram illustrating by way of example the overall configuration of an apparatus for controlling cameras for autonomous shooting of high-quality video according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating by way of example the overall environment, including the components for performing a method for controlling cameras for autonomous shooting of high-quality video according to a second embodiment of the present invention.
FIG. 3 is a flowchart showing the representative steps of the method for controlling cameras for autonomous shooting of high-quality video according to the second embodiment of the present invention.
FIG. 4 is a flowchart detailing the first step, high-level scene recognition, of the method according to the second embodiment of the present invention.
FIG. 5 is a flowchart detailing the second step, aesthetic evaluation, of the method according to the second embodiment of the present invention.
FIG. 6 is a diagram illustrating by way of example bounding boxes for a plurality of objects, a combined bounding box that encloses all of them, their centers, and the center of the image.
FIG. 7 is a diagram illustrating by way of example the image as changed by controlling the camera with reference to FIG. 6.
FIG. 8 is a diagram showing the apparatus for controlling cameras for autonomous shooting of high-quality video according to the first embodiment of the present invention in terms of its functional components, unlike FIG. 1.

Details of the objects and technical configuration of the present invention, and of the resulting operation and effects, will be understood more clearly from the following detailed description based on the drawings attached to this specification. Embodiments of the present invention are described in detail with reference to the accompanying drawings.

The embodiments disclosed in this specification should not be construed or used as limiting the scope of the present invention. To a person skilled in the art, it is evident that the description, including the embodiments, has a variety of applications. Accordingly, the embodiments set out in the detailed description are examples intended to explain the present invention better and are not intended to limit its scope to those embodiments.

The functional blocks shown in the drawings and described below are only examples of possible implementations. Other implementations may use other functional blocks without departing from the spirit and scope of the detailed description. In addition, although one or more functional blocks of the present invention are shown as separate blocks, one or more of them may be a combination of various hardware and software components that perform the same function.

In addition, a statement that something includes certain components is an open-ended expression that simply indicates those components are present, and should not be understood as excluding additional components.

Furthermore, when a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but it should be understood that other components may be present in between.

Detailed embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram illustrating by way of example the overall configuration of an apparatus 100 for controlling cameras for autonomous shooting of high-quality video according to a first embodiment of the present invention.

However, this is only a preferred embodiment for achieving the object of the present invention; components may be added or removed as needed, and the role of one component may of course be performed jointly by another.

The apparatus 100 for controlling cameras for autonomous shooting of high-quality video according to the first embodiment of the present invention may include a processor 10, a network interface 20, a memory 30, a storage 40, and a data bus 50 connecting them, and may of course further include any additional components required to achieve the object of the present invention.

The processor 10 controls the overall operation of each component. The processor 10 may be a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or any type of processor well known in the technical field to which the present invention belongs.

In addition, the processor 10 may perform computation for at least one application or program for carrying out the method for controlling cameras for autonomous shooting of high-quality video according to the second embodiment of the present invention, and may be an artificial intelligence processor in which a recommendation model is implemented.

The network interface 20 supports wired and wireless Internet communication for the apparatus 100 according to the first embodiment of the present invention, and may also support other known communication methods. The network interface 20 may accordingly include the corresponding communication modules.

The memory 30 stores various information and/or commands, and may load one or more computer programs 41 from the storage 40 in order to carry out the method for controlling cameras for autonomous shooting of high-quality video according to the second embodiment of the present invention. FIG. 1 shows RAM as one example of the memory 30, but various other storage media may of course be used as the memory 30.

The storage 40 may non-transitorily store one or more computer programs 41 and large-volume network information 42. The storage 40 may be a non-volatile memory such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash memory, a hard disk, a removable disk, or any form of computer-readable recording medium well known in the technical field to which the present invention belongs.

The computer program 41 is loaded into the memory 30 and may cause the one or more processors 10 to execute: (A) a first operation of receiving video captured by a plurality of cameras, detecting objects contained in the video, and using the object detection results to perform one or more of registration of the video captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (B) a second operation of using the results of the first operation to detect how far a detected object deviates from the center of a frame-level image contained in the received video, detecting the degree of tilt of the image, and evaluating the aesthetics of the received video; and (C) an operation of individually controlling the plurality of cameras shooting the objects using the aesthetic evaluation result of the second operation.

The operations of the computer program 41 briefly mentioned above can be regarded as functions of the computer program 41; they are described in more detail below in the description of the method for controlling cameras for autonomous shooting of high-quality video according to the second embodiment of the present invention.

The data bus 50 is the path along which commands and/or information move between the processor 10, the network interface 20, the memory 30, and the storage 40 described above.

The apparatus 100 according to the first embodiment of the present invention described above may take the form of a stand-alone device, for example an electronic device or a server (including a cloud server); in the latter case it may be downloaded to and installed on a user terminal as a dedicated application.

Here, the electronic device may be an easily carried portable device such as a smartphone, tablet PC, notebook PC, PDA, or PMP, or a desktop PC or similar device installed and used in a fixed location; any electronic device will do as long as it has a network function.

In what follows, on the premise that the apparatus 100 according to the first embodiment of the present invention is a server in the form of a stand-alone device, the method for controlling cameras for autonomous shooting of high-quality video according to the second embodiment of the present invention is described with reference to FIGS. 2 to 7.

FIG. 2 is a diagram illustrating, by way of example, the overall environment, including the components for performing the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention.

In the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention, the apparatus (100) according to the first embodiment is connected to a plurality of cameras (200) through a network (N), and the plurality of cameras (200) may shoot the same site from mutually different positions and angles.

Here, the same site means a single shooting site that is to be captured on video through the plurality of cameras (200). A plurality of subjects (300), which are objects, may be present at the shooting site. The object or subject (300) is a concept in the broadest sense that covers both people and things. The fact that the plurality of cameras (200) shoot the same shooting site does not mean that all of these cameras shoot the same object or subject (300).

Meanwhile, the plurality of cameras (200) are merely named cameras; any device with a shooting function can serve as a camera (200), for example a camera drone, a camera robot, a pan-tilt-zoom camera, or a smartphone. The cameras (200) need not be of the same type, and their performance may of course differ.

FIG. 3 is a flowchart showing representative steps of the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention.

However, this is only a preferred embodiment for achieving the object of the present invention. Some steps may be added or deleted as necessary, and any one step may be performed as part of another step.

Meanwhile, each step is assumed to be performed by the apparatus (100) for controlling a camera for autonomous shooting of high-quality video according to the first embodiment of the present invention, which is referred to below simply as the "apparatus (100)" for convenience of description.

In addition, the terms "moving picture" (동영상), "video" (영상), and "image" (이미지) used in the following description differ in their dictionary meanings. Because a "moving picture" or "video" is formed by consecutively assembling a plurality of frame-by-frame "images", an "image" below means a still picture at a particular frame of a "moving picture" or "video". Interpreted broadly, the three terms may be used interchangeably without strict distinction.

First, the apparatus (100) receives the videos shot by the plurality of cameras (200), detects the objects (300) contained in the videos, and uses the object (300) detection results to perform one or more of the following: matching the videos shot by the plurality of cameras (200), analyzing interactions between the objects (300), and tracking the objects (300) (S310).

Step S310 is referred to as the first step. More specifically, the first step can be viewed as a step of recognizing the scene by detecting the shooting site at a high level, and is described below with reference to FIG. 4.

FIG. 4 is a flowchart detailing the first step, in which the scene is recognized at a high level, in the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention.

First, the apparatus (100) receives the videos shot by the plurality of cameras (200) (S310-1).

Step S310-1 is referred to as step 1-1. Here, the plurality of cameras (200) are the cameras (200) mentioned in the description of FIG. 2. The apparatus (100) may receive, from each of the cameras (200) individually, the video shot by that camera, and the steps described below may be performed on each video received from an individual camera.

Thereafter, the apparatus (100) detects a plurality of objects (300) by applying an object detection algorithm to the frame-by-frame images contained in the received video (S310-2).

Step S310-2 is referred to as step 1-2. Because a video is formed by a plurality of consecutively connected images, in step 1-2 a plurality of objects (300) may be detected, for each video received from the plurality of cameras (200), in every frame-by-frame image that the video contains.

As noted above, the plurality of objects (300) is a concept in the broadest sense that includes both people and things. Object detection may use either the YOLO (You Only Look Once) algorithm or the CenterNet algorithm, or any other known object detection algorithm.

Meanwhile, when the apparatus (100) detects a plurality of objects (300), it may output the region in which each detected object (300) is located within the image as a bounding box (Object Bounding Box). As many bounding boxes may be output as there are detected objects (300), and each detected object (300) may be entirely contained within its bounding box.

For example, when the apparatus (100) detects one person and one ball in a single image, a total of two bounding boxes may be output: one that entirely contains the person and one that entirely contains the ball. Within a single image, the apparatus (100) may draw the bounding boxes in the same color or in different colors; for the same object recognized in different images, however, it may draw the bounding box in the same color.

Once a plurality of objects (300) have been detected in the images, the videos shot by the plurality of cameras (200) are matched on the basis of the objects (300) detected in the frame-by-frame images (S310-3).

Step S310-3 is referred to as step 1-3. Through step 1-3, videos received from different cameras (200) can be matched with one another.

For example, when a ball detected in the video received from a first camera (not shown) is also detected in the video received from a second camera (not shown), the two videos can be matched on the basis of the ball, the object detected in both. Matching is possible not only when the entire shape of the object is detected in a given video but also when only a part of it is detected, provided that at least enough of the object is detected for it to be recognized as an object and enclosed in a bounding box.

As another example, when a person detected in the video received from the first camera (not shown) is also detected in the video received from a third camera (not shown), those two videos can be matched on the basis of that same person, the object detected in both. Since the videos from the first camera and the second camera can be matched on the basis of the ball as described above, the apparatus (100) can naturally match all of the videos received from the first through third cameras.
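The chaining in the two examples above (camera 1 and camera 2 share the ball, camera 1 and camera 3 share the person, so all three can be matched) can be sketched as a connectivity problem. The following is a minimal illustration, not the patent's matching procedure: it assumes each detected object already carries a cross-camera identity, and the function name `group_cameras` and the identity strings are hypothetical.

```python
from collections import defaultdict


def group_cameras(detections):
    """Group cameras that can be matched through shared objects.

    detections: dict mapping a camera id to the set of object identities
    detected in that camera's video (identities are assumed to be
    consistent across cameras, which is an assumption of this sketch).
    """
    parent = {cam: cam for cam in detections}

    def find(cam):
        # Follow parent links to the group representative (path halving).
        while parent[cam] != cam:
            parent[cam] = parent[parent[cam]]
            cam = parent[cam]
        return cam

    first_seen = {}  # object identity -> first camera that saw it
    for cam, objects in detections.items():
        for obj in objects:
            if obj in first_seen:
                # Two cameras share an object, so their videos can be matched.
                parent[find(cam)] = find(first_seen[obj])
            else:
                first_seen[obj] = cam

    groups = defaultdict(set)
    for cam in detections:
        groups[find(cam)].add(cam)
    return sorted(sorted(group) for group in groups.values())


example = {
    "cam1": {"ball", "person_A"},
    "cam2": {"ball"},
    "cam3": {"person_A"},
    "cam4": {"tree"},
}
print(group_cameras(example))
```

A union-find structure is used here only because it makes the transitive step explicit; any graph connectivity routine would serve equally well.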

Once the videos have been matched, the apparatus (100) analyzes the interactions between the plurality of objects (300), using information on the objects (300) detected in the frame-by-frame images and information on the matching of the videos shot by the plurality of cameras (200) (S310-4).

Step S310-4 is referred to as step 1-4. The interactions between the plurality of objects (300) in step 1-4 may include one or more of interactions between a person and a thing and interactions between a person and a person, and most preferably include both.

Here, an interaction concerns the relationship between the plurality of objects (300), more specifically how each object affects the others. When the objects (300) are one person and one ball, the interaction between them may be, for example, that the person kicks the ball (the ball is kicked by the person), the person throws the ball by hand (the ball is thrown by the person), or the person heads the ball (the ball is headed by the person). To analyze interactions between objects, it is necessary to know how one object can act on another. For this reason the apparatus (100) requires the information on the objects (300) detected in step 1-2 (that is, what each object is), and the apparatus (100) may include a database (not shown) of all examples of actions that can occur between various objects.

For example, when the object is a ball, the database (not shown) may store, with respect to a person as the other object, the actions of being kicked, being thrown by hand, and being headed; when the other object is a bat, it may store the action of being hit with the bat.

Furthermore, the apparatus (100) may also use the information on the matching of the videos shot by the plurality of cameras (200) in step 1-3 for the interaction analysis, because the matching information makes a variety of visual information available. In the earlier example of one person and one ball, the interaction may be kicking, throwing by hand, or heading. If the apparatus (100) used only the information on the objects (300), it would be difficult to determine accurately which of these is taking place. By checking a variety of visual information about the person and the ball through the matching information, the interaction between the person and the ball can be distinguished accurately.

For example, if the matching information shows that the person's foot and the ball are close together, the interaction is that the person kicks the ball; if the person is holding the ball in a hand, the interaction is that the person throws the ball; and if the ball is near the person's head, the interaction is that the person heads the ball. Because all of these would be difficult to confirm from a single video alone, the matching information is used as well.
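The proximity rule above can be illustrated with a short sketch. It assumes that body-part positions (foot, hand, head) are already available in a common coordinate frame after matching, which the source does not specify; the function name `classify_interaction`, the keypoint names, and the distance threshold are all hypothetical.

```python
import math

# Hypothetical mapping from an interaction label to the body part involved.
ACTION_BY_PART = {"kick": "foot", "throw": "hand", "header": "head"}


def classify_interaction(keypoints, ball_center, max_dist=50.0):
    """Return the interaction whose body part is closest to the ball.

    keypoints: dict of body-part name -> (x, y) position.
    ball_center: (x, y) position of the ball.
    max_dist: assumed cut-off beyond which no interaction is reported.
    """
    best_action, best_dist = None, max_dist
    for action, part in ACTION_BY_PART.items():
        if part not in keypoints:
            continue  # this body part was not observed
        dist = math.dist(keypoints[part], ball_center)
        if dist <= best_dist:
            best_action, best_dist = action, dist
    return best_action


person = {"head": (100, 20), "hand": (140, 90), "foot": (110, 200)}
print(classify_interaction(person, (115, 210)))
```

In practice the candidate actions would come from the interaction database mentioned earlier rather than from a fixed dictionary.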

Once the interactions between the plurality of objects (300) have been analyzed, the apparatus (100) tracks, within the received video, the positions of the objects (300) detected in the frame-by-frame images (S310-5), and predicts the future behavior of each of the objects (300) using one or more of the following: the information on the matching of the videos shot by the plurality of cameras (200), the information on the interactions between the objects (300), and the position tracking information of the objects (300) (S310-6).

The former, step S310-5, is referred to as step 1-5, and the latter, step S310-6, as step 1-6. Through these steps, objects can be tracked in real time across the multiple videos shot by the plurality of cameras (200), the number of people appearing across all of the videos can be determined, and the videos in which a desired person appears can be identified.
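As a rough illustration of the bookkeeping that such tracking enables, the sketch below builds an index from tracked object identities to the videos in which they appear. The record layout `(camera_id, frame_index, object_id, label)` and the function name `index_appearances` are assumptions made for this example only.

```python
from collections import defaultdict


def index_appearances(track_records):
    """Summarize tracking output across all videos.

    track_records: iterable of (camera_id, frame_index, object_id, label)
    tuples, a hypothetical record format for this sketch.
    Returns the set of distinct people and, per object, the cameras whose
    videos contain that object.
    """
    videos_by_object = defaultdict(set)
    people = set()
    for camera_id, _frame_index, object_id, label in track_records:
        videos_by_object[object_id].add(camera_id)
        if label == "person":
            people.add(object_id)
    return people, videos_by_object


records = [
    ("cam1", 0, "p1", "person"),
    ("cam1", 1, "p1", "person"),
    ("cam2", 0, "p2", "person"),
    ("cam3", 5, "p1", "person"),
    ("cam2", 0, "ball", "ball"),
]
people, videos = index_appearances(records)
print(len(people), sorted(videos["p1"]))
```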

Furthermore, the plurality of cameras (200) shooting the objects may be individually controlled using the future-behavior prediction for each of the objects (300). For example, when the apparatus (100) predicts that the ball will go beyond the sideline of the field, the camera shooting the ball can be controlled to move toward the sideline, so that the ball is not lost from the frame and its crossing of the sideline is delivered vividly in real time. This belongs to the third step, in which the plurality of cameras (200) are individually controlled, and is described in detail later.
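The source does not state how future behavior is predicted. Purely as an illustration of the sideline example, the sketch below uses constant-velocity extrapolation from the last two tracked positions; the function names, the coordinate convention (the sideline as a vertical line at `sideline_x`), and the prediction horizon are hypothetical.

```python
def predict_position(track, steps_ahead):
    """Extrapolate a position assuming constant velocity.

    track: list of (x, y) positions, one per frame, oldest first.
    steps_ahead: number of future frames to extrapolate over.
    """
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per frame
    return (x1 + vx * steps_ahead, y1 + vy * steps_ahead)


def will_cross_sideline(track, sideline_x, steps_ahead):
    """Report whether the predicted x position lies beyond the sideline."""
    x, _y = predict_position(track, steps_ahead)
    return x > sideline_x


ball_track = [(90.0, 50.0), (94.0, 51.0)]
print(predict_position(ball_track, 2))
print(will_cross_sideline(ball_track, sideline_x=100.0, steps_ahead=2))
```

A learned motion model or a Kalman filter could replace the extrapolation without changing how the prediction feeds into camera control.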

The description now returns to FIG. 3.

Once the first step has been performed, the apparatus (100) uses its result to detect how far the detected object deviates from the center of the frame-by-frame image contained in the received video, detects the degree of tilt of the image, and evaluates the aesthetics of the received video (S320).

Step S320 is referred to as the second step. More specifically, it can be viewed as a step of evaluating the aesthetics of the scene recognized in the first step, and is described below with reference to FIG. 5.

FIG. 5 is a flowchart detailing the second step, in which aesthetics are evaluated, in the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention.

First, the apparatus (100) detects how far the center of the bounding box, which marks the region where the detected object is located in the image, deviates from the center of the image (S320-1).

Step S320-1 is referred to as step 2-1. It implements, among the aesthetic evaluation criteria for video, the criterion that the center of the subject to be shot should be placed at the center of the screen.

As stated in the description of step 1-2, the apparatus (100) outputs the region in which each detected object (300) is located within the image as a bounding box. Because the object is entirely contained in its bounding box, the center of the bounding box can be regarded as the center of the object, and the apparatus (100) can detect how far the center of the bounding box deviates from the center of the image.
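The deviation measured in step 2-1 reduces to a subtraction of two centers. A minimal sketch follows, assuming boxes are given as `(x_min, y_min, x_max, y_max)` in pixels with the origin at the top-left corner; this representation and the function names are assumptions of the example, not something the source fixes.

```python
def box_center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def center_offset(box, image_size):
    """Offset of the box center from the image center, in pixels.

    image_size: (width, height). A negative x offset means the object
    lies left of center; a negative y offset means it lies above center.
    """
    cx, cy = box_center(box)
    width, height = image_size
    return (cx - width / 2, cy - height / 2)


print(center_offset((100, 50, 300, 250), (640, 480)))
```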

Meanwhile, as stated in the description of step 1-2, when there are a plurality of objects, a bounding box is output for each object individually. In that case there are also a plurality of bounding box centers, so the question arises of which center's deviation from the image center should be detected. In the method according to the second embodiment of the present invention, when a plurality of bounding boxes are output because a plurality of objects have been detected, a new bounding box that contains all of those bounding boxes may be generated and output. This is referred to as a union box (Union Box).

The union box is used because it contains all of the bounding boxes, each of which in turn contains one of the objects (300). Its center may deviate from the center of each individual bounding box; viewed as a whole, however, when the center of the union box coincides with the center of the image, the overall center of the subjects can be regarded as lying at the center of the screen.

FIG. 6 shows this by way of example. Referring to FIG. 6, there are two balls in the image, and two bounding boxes are output, each entirely containing one ball. The figure shows the detection of how far the center (p) of the union box, which contains both bounding boxes, deviates from the center (P) of the image.
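The union box of FIG. 6 can be sketched as the smallest axis-aligned box enclosing all individual boxes. As before, the `(x_min, y_min, x_max, y_max)` box format, the function names, and the sample coordinates are assumptions made for illustration; `p` and `P` follow the labels used in FIG. 6.

```python
def union_box(boxes):
    """Smallest axis-aligned box containing every box in `boxes`.

    Each box is (x_min, y_min, x_max, y_max).
    """
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def union_center_offset(boxes, image_size):
    """Offset of the union box center p from the image center P."""
    x_min, y_min, x_max, y_max = union_box(boxes)
    p = ((x_min + x_max) / 2, (y_min + y_max) / 2)
    big_p = (image_size[0] / 2, image_size[1] / 2)
    return (p[0] - big_p[0], p[1] - big_p[1])


two_balls = [(100, 100, 200, 200), (300, 150, 400, 250)]
print(union_box(two_balls))
print(union_center_offset(two_balls, (640, 480)))
```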

The apparatus (100) may detect how far the center of the union box deviates from the center of the image and, according to the detection result, calculate the relative position of the camera that shot the image (S320-1′). This is referred to as step 2-1′, and its result may be used in the third step described later.

Thereafter, the apparatus (100) detects a horizontal line in the image and calculates the slope of the detected horizontal line to detect the degree of tilt of the image (S320-2).

Step S320-2 is referred to as step 2-2. It implements, among the aesthetic evaluation criteria for video, the criterion concerning horizontal composition, namely that the overall level of the screen should be well balanced.

The apparatus (100) may detect the dominant horizontal line in the image, calculate its slope to detect the degree of tilt of the image, and use the result in the third step described later. The horizontal line may be produced by a particular object or by the background behind the objects; in either case it should be one that can be recognized as the dominant horizontal line in the image.
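Once a dominant line has been found, its slope converts directly to a tilt angle. The sketch below assumes the line is already available as two endpoints in pixel coordinates (for instance from a line detector, which the source does not name); the function name `tilt_degrees` is hypothetical.

```python
import math


def tilt_degrees(p1, p2):
    """Angle of the line p1 -> p2 relative to the image x axis, in degrees.

    p1, p2: (x, y) endpoints of the detected horizontal line. With the
    image y axis pointing down, a positive angle means the line drops
    toward the right of the frame.
    """
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


print(tilt_degrees((0, 100), (100, 100)))  # level horizon
print(tilt_degrees((0, 100), (100, 110)))  # slightly tilted horizon
```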

Meanwhile, depending on the image, there may be cases in which no horizontal line is directly visible. In such a case, the apparatus (100) may detect two parallel lines in the image and calculate the vanishing point at which the two detected lines meet, thereby detecting the horizontal line, and may then calculate its slope to detect the degree of tilt of the image.
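The vanishing point of two lines that are parallel in the scene is simply their intersection in the image. The sketch below computes it with homogeneous line coordinates. How the horizontal line is then derived from the vanishing point is not spelled out in the source; one common approach, assumed here only for the usage example, is to join two vanishing points obtained from two different pairs of parallel lines. All function names are hypothetical.

```python
import math


def line_through(p, q):
    """Homogeneous coefficients (a, b, c) of the line a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersection(l1, l2):
    """Intersection of two lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None  # no finite vanishing point
    return ((b1 * c2 - b2 * c1) / w, (a2 * c1 - a1 * c2) / w)


# Two image lines that converge, as scene-parallel lines do under perspective.
vp1 = intersection(line_through((0, 0), (4, 4)), line_through((0, 8), (4, 4)))
print(vp1)

# Assumed usage: a second vanishing point from another pair of lines gives
# a candidate horizon whose slope yields the tilt.
vp2 = (104.0, 14.0)  # illustrative value only
tilt = math.degrees(math.atan2(vp2[1] - vp1[1], vp2[0] - vp1[0]))
print(tilt)
```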

Furthermore, in the method according to the second embodiment of the present invention, when the image is an indoor image, the degree of tilt of the image may be detected after step 2-2 by detecting the line of symmetry of the detected object (S320-2′). This is referred to as step 2-2′, and its detection result may be used together with that of step 2-2 in the third step described later.

Once the degree of tilt of the image has been detected, the apparatus (100) evaluates the aesthetics of the received video using the detection result of step 2-1 and the detection result of step 2-2 (S320-3).

As stated above, the detection result of step 2-1 concerns the criterion that the center of the subject to be shot should be placed at the center of the screen, and the detection result of step 2-2 concerns the criterion that the overall level of the screen should be well balanced. The apparatus (100) may use these results to evaluate the aesthetics of the video.

Here, the aesthetic evaluation is performed individually for each video shot by the plurality of cameras (200). Using the detection results of steps 2-1 and 2-2 means that the composition of the screen is evaluated as part of the aesthetic evaluation of the video. In addition to the composition of the screen, the apparatus (100) may use other criteria that can help judge a video to be of high quality, for example the picture quality of the video, its color, and its dynamism.

Meanwhile, the result of the aesthetic evaluation may be calculated as a numerical value, or, instead of a numerical value, only the ranking of the videos shot by the plurality of cameras (200) in order of aesthetics may be calculated. In addition, the result may include one or more of the following: information on the camera that shot the video with the highest aesthetic evaluation result, information on a recommended shooting position for the camera that shot the image according to the detection result of step 2-1, and information on a recommended shooting angle for the camera that shot the image according to the detection result of step 2-2.
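The source does not give a scoring formula. As one possible way to turn the two detection results into a numerical value and a ranking, the sketch below averages a centering term and a levelness term, each normalized to the range 0 to 1. The equal weights, the 45-degree normalization, and the function names are assumptions made for this example.

```python
import math


def composition_score(offset, image_size, tilt_deg, max_tilt_deg=45.0):
    """Hypothetical composition score in [0, 1]; higher is better.

    offset: (dx, dy) of the box center from the image center, in pixels.
    image_size: (width, height) in pixels.
    tilt_deg: detected tilt of the horizontal line, in degrees.
    """
    width, height = image_size
    half_diagonal = math.hypot(width / 2, height / 2)
    centering = 1.0 - min(math.hypot(*offset) / half_diagonal, 1.0)
    levelness = 1.0 - min(abs(tilt_deg) / max_tilt_deg, 1.0)
    return 0.5 * centering + 0.5 * levelness  # equal weights are assumed


def rank_cameras(scores):
    """Camera ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = {"cam1": 0.4, "cam2": 0.9, "cam3": 0.7}
print(composition_score((0, 0), (640, 480), 0.0))
print(rank_cameras(scores))
```

Additional criteria such as picture quality, color, or dynamism could be added as further weighted terms.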

Once the aesthetics of the video have been evaluated, the apparatus (100) finally uses the aesthetic evaluation result of the second step to individually control the plurality of cameras (200) shooting the objects (S330).

Individual control of the plurality of cameras (200) primarily uses the aesthetic evaluation result of the second step. More specifically, it is most preferable to use both the information on the recommended shooting position for the camera according to the detection result of step 2-1 and the information on the recommended shooting angle for the camera according to the detection result of step 2-2. Furthermore, the accuracy of camera control may be improved by additionally reflecting the information on the relative position of the camera calculated in step 2-1′ and the degree of tilt of the image according to the line of symmetry detected in step 2-2′.

FIG. 7 shows this by way of example: by changing the shooting position of the camera using the information on its recommended shooting position, the center of the union box shown by way of example in FIG. 6 comes to coincide with the center of the image.
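How a pixel offset is converted into a camera command depends on the device, and the source leaves this open. For a pan-tilt camera, one simple possibility is the pinhole-camera relation sketched below, which converts the offset of the union box center into pan and tilt angles. The field-of-view values, the sign conventions, and the function name are assumptions of this example.

```python
import math


def pan_tilt_correction(offset, image_size, hfov_deg, vfov_deg):
    """Approximate pan and tilt angles (degrees) that re-center the subject.

    offset: (dx, dy) of the union box center from the image center, pixels.
    image_size: (width, height) in pixels.
    hfov_deg, vfov_deg: assumed horizontal and vertical fields of view.
    Positive pan means rotate right and positive tilt means rotate down,
    a sign convention chosen for this sketch.
    """
    dx, dy = offset
    width, height = image_size
    # Focal length in pixels under the pinhole model.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    pan = math.degrees(math.atan2(dx, fx))
    tilt = math.degrees(math.atan2(dy, fy))
    return pan, tilt


print(pan_tilt_correction((-70.0, -65.0), (640, 480), hfov_deg=90.0, vfov_deg=60.0))
```

A drone or camera robot would translate the same offset into a change of position rather than a rotation.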

Meanwhile, before individually controlling the plurality of cameras (200), the apparatus (100) may output a recommended bounding box based on the recommended shooting position derived from the detection result of step 2-1 and a recommended horizontal line based on the recommended shooting angle derived from the detection result of step 2-2, displayed so as to be distinguishable from the current bounding box and horizontal line. When, as a result of camera control, the current bounding box and horizontal line come to coincide with the recommended bounding box and recommended horizontal line, a notification that an optimal high-quality video is being shot may be sent to the user of the apparatus (100).
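Exact coincidence is unlikely with real detections, so a practical check needs tolerances. The sketch below treats the boxes as coinciding when their intersection over union (IoU) exceeds a threshold and the horizontal lines as coinciding when their tilt differs by less than a tolerance. IoU, both threshold values, and the function names are choices made for this illustration, not taken from the source.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def matches_recommendation(current_box, recommended_box,
                           current_tilt, recommended_tilt,
                           min_iou=0.9, max_tilt_diff=1.0):
    """True when the box and horizontal line are close enough to notify the user."""
    boxes_match = iou(current_box, recommended_box) >= min_iou
    lines_match = abs(current_tilt - recommended_tilt) <= max_tilt_diff
    return boxes_match and lines_match


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(matches_recommendation((0, 0, 10, 10), (0, 0, 10, 10), 0.3, 0.0))
```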

Separately, the apparatus (100) may continuously learn from the camera control results of the third step, more specifically the camera control results for particular objects, interactions between objects, and aesthetic evaluation results, and use them later to control cameras for the same or similar objects, interactions, and aesthetic evaluation results. The performance of the apparatus (100) may thus improve continuously with use.

So far, the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment of the present invention has been described. According to the present invention, a high-quality video can be produced easily by bearing only a predetermined cost for the device 100, without hiring a professional cinematographer or camera operator, who entail high labor costs. In addition, unlike passive visual intelligence, which learns the shooting style of the experts it was trained on, camera control here focuses on the composition of the frame, so the device 100 itself selects the most effective composition and shot type with respect to the subject and can thereby operate as a complete substitute for these experts. Furthermore, because completely replacing the experts saves their labor costs, the invention can help small-scale producers make high-quality videos without burden.

Meanwhile, the device 100 for controlling a camera for autonomous shooting of high-quality video according to the first embodiment of the present invention may be represented not only as shown in FIG. 1 but also, as illustrated in FIG. 8, as a device 1000 including functional components that each perform a respective function. The device 100 according to the first embodiment and the method for controlling a camera for autonomous shooting of high-quality video according to the second embodiment may also be implemented as a computer program stored in a computer-readable medium according to a third embodiment of the present invention, which includes all of the same technical features.
In this case, the computer program, in combination with a computing device, may execute: (AA) a first step of receiving images captured by a plurality of cameras, detecting objects included in the images, and using the object detection result to perform any one or more of matching of the images captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects; (BB) a second step of using the result of the first step to detect how far a detected object deviates from the center of a frame-unit image included in the received images, detecting the degree of inclination of the image, and evaluating the aesthetics of the received images; and (CC) a step of individually controlling the plurality of cameras photographing the objects using the aesthetic evaluation result of the second step. Although not described in detail in order to avoid redundant description, all technical features applied to the device 100 according to the first embodiment and to the method according to the second embodiment apply equally to the computer program stored in a computer-readable medium according to the third embodiment.
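The three-step structure of the computer program, steps (AA) through (CC), can be outlined as a single control cycle. This is only a structural sketch: `run_control_cycle` and the three callables `analyze`, `evaluate`, and `control` are hypothetical placeholders for the detection, aesthetic evaluation, and camera control logic, none of which is specified here.

```python
from typing import Any, Callable, Dict

# Camera id -> latest frame; the frame type is deliberately left open.
Frames = Dict[str, Any]


def run_control_cycle(
    frames: Frames,
    analyze: Callable[[Frames], Any],
    evaluate: Callable[[Any, Any], Any],
    control: Callable[[str, Any], Any],
) -> Dict[str, Any]:
    """Run steps (AA), (BB), and (CC) once and return one command per camera."""
    # (AA) Detect objects across all cameras; matching, interaction analysis,
    # and tracking are delegated to the hypothetical `analyze` callable.
    analysis = analyze(frames)

    commands: Dict[str, Any] = {}
    for camera_id, frame in frames.items():
        # (BB) Evaluate the aesthetics of this camera's frame.
        score = evaluate(frame, analysis)
        # (CC) Control each camera individually from its own evaluation.
        commands[camera_id] = control(camera_id, score)
    return commands
```

Step (AA) runs once over all frames because matching operates across cameras, while steps (BB) and (CC) run per camera, reflecting the individual control described above.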

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

10: processor
20: network interface
30: memory
40: storage
41: computer program
50: information bus
100: device for controlling a camera for autonomous shooting of high-quality video
200: camera
300: subject, object
N: network

Claims

A method by which a device including a processor and a memory controls a camera for autonomous shooting of high-quality video, the method comprising:
(a) a first step of receiving images captured by a plurality of cameras, detecting objects included in the images, and using the object detection result to perform any one or more of matching of the images captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects;
(b) a second step of using the result of the first step to detect how far a detected object deviates from the center of a frame-unit image included in the received images, detecting the degree of inclination of the image, and evaluating the aesthetics of the received images; and
(c) a third step of individually controlling the plurality of cameras photographing the objects using the aesthetic evaluation result of the second step.

The method of claim 1,
wherein the first step includes any one or more of:
step 1-1 of receiving the images captured by the plurality of cameras;
step 1-2 of detecting a plurality of objects by applying an object detection algorithm to a frame-unit image included in the received images;
step 1-3 of matching the images captured by the plurality of cameras based on the plurality of objects detected in the frame-unit image;
step 1-4 of analyzing interactions between the plurality of objects using information on the plurality of objects detected in the frame-unit image and information on the matching of the images captured by the plurality of cameras; and
step 1-5 of tracking, within the received images, the positions of the plurality of objects detected in the frame-unit image.

The method of claim 2,
wherein the object detection algorithm in step 1-2 is
one of the You Only Look Once (YOLO) algorithm and the CenterNet algorithm.

The method of claim 2,
wherein the detecting of the plurality of objects in step 1-2 includes
outputting, as an object bounding box, the area in which each of the plurality of detected objects is located in the image.

The method of claim 2,
wherein the interactions between the plurality of objects in step 1-4 include
any one or more of an interaction between a person and a thing when the plurality of objects are a person and a thing, and an interaction between persons when the plurality of objects are persons.

The method of claim 2,
further comprising step 1-6 of predicting future behavior of each of the plurality of objects using any one or more of the information on the matching of the images captured by the plurality of cameras, information on the interactions between the plurality of objects, and position tracking information of the plurality of objects.

The method of claim 1,
wherein the second step includes any one or more of:
step 2-1 of detecting how far the center of a bounding box, which outputs the area in which the detected object is located in the image, deviates from the center of the image;
step 2-2 of detecting a horizontal line in the image and calculating the slope of the detected horizontal line to detect the degree of inclination of the image; and
step 2-3 of evaluating the aesthetics of the received images using the detection result of step 2-1 and the detection result of step 2-2.

The method of claim 7,
further comprising, between step 2-1 and step 2-2,
step 2-1' of calculating the relative position of the camera that captured the image according to the detection result of step 2-1.

The method of claim 7,
wherein the detecting of the horizontal line in step 2-2 includes,
when no horizontal line is detected in the image, detecting two parallel lines and calculating the vanishing point at which the two detected parallel lines meet, thereby detecting the horizontal line.

The method of claim 7,
further comprising, between step 2-2 and step 2-3,
step 2-2' of detecting, when the image is an indoor image, the degree of inclination of the image by detecting a line of symmetry of the detected object.

The method of claim 7,
wherein the aesthetic evaluation result of step 2-3 includes
any one or more of: an aesthetic evaluation result for each of the images captured by the plurality of cameras; information on the camera that captured the image with the highest aesthetic evaluation result among those results; information on the recommended shooting position of the camera that captured the image according to the detection result of step 2-1; and information on the recommended shooting angle of the camera that captured the image according to the detection result of step 2-2.

A device for controlling a camera for autonomous shooting of high-quality video, the device comprising:
one or more processors;
a network interface;
a memory into which a computer program executed by the one or more processors is loaded; and
a storage that stores large-capacity network data and the computer program,
wherein the computer program, when executed by the one or more processors, performs:
(A) a first operation of receiving images captured by a plurality of cameras, detecting objects included in the images, and using the object detection result to perform any one or more of matching of the images captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects;
(B) a second operation of using the result of the first operation to detect how far a detected object deviates from the center of a frame-unit image included in the received images, detecting the degree of inclination of the image, and evaluating the aesthetics of the received images; and
(C) an operation of individually controlling the plurality of cameras photographing the objects using the aesthetic evaluation result of the second operation.

A computer program stored on a computer-readable medium that, in combination with a computing device, executes:
(AA) a first step of receiving images captured by a plurality of cameras, detecting objects included in the images, and using the object detection result to perform any one or more of matching of the images captured by the plurality of cameras, analysis of interactions between the objects, and tracking of the objects;
(BB) a second step of using the result of the first step to detect how far a detected object deviates from the center of a frame-unit image included in the received images, detecting the degree of inclination of the image, and evaluating the aesthetics of the received images; and
(CC) a step of individually controlling the plurality of cameras photographing the objects using the aesthetic evaluation result of the second step.