
KR20230099921A - Method for generating synthetic video using optical flow

Info

Publication number: KR20230099921A
Application number: KR1020210189365A
Authority: KR
Inventors: 김은태; 이수현
Original assignee: 연세대학교 산학협력단
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2023-07-05
Also published as: KR102586637B1

Abstract

The present embodiments provide a method and apparatus for generating a video that, given a background image sequence and an object to be synthesized into the video, calculate the optical flow between frames of the background image sequence and use the calculated optical flow to synthesize the object naturally into the background image sequence over time, generating a synthetic video that resembles real video. The apparatus includes a processor and a memory.

Description

Method for generating synthetic video using optical flow {METHOD FOR GENERATING SYNTHETIC VIDEO USING OPTICAL FLOW}

The present invention relates to a method and apparatus for generating a synthetic video.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Deep neural networks have advanced on the basis of large amounts of training data in order to solve tasks essential to autonomous driving, such as object detection and tracking and semantic segmentation. However, when a deep neural network encounters an uncommon situation, it fails to recognize what it was not trained on. For example, if a sign that was absent from the training data appears in a real situation, detection of the sign fails. Solving this problem requires a large amount of video data covering uncommon situations. However, it is almost impossible to directly acquire enough video data to sufficiently train a deep neural network for such a variety of situations.

Korean Patent Laid-Open Publication No. 10-2021-0040882 (2021.04.14)

Ilg, Eddy, et al. "FlowNet 2.0: Evolution of optical flow estimation with deep networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

The main object of the embodiments of the present invention is, given a background image sequence and an object to be synthesized into a video, to calculate the optical flow between frames of the background image sequence and to use the calculated optical flow to synthesize the object naturally into the background image sequence over time, thereby generating a synthetic video that resembles real video.

Other objects of the present invention that are not stated explicitly may additionally be considered within the scope that can easily be inferred from the following detailed description and its effects.

According to one aspect of the present embodiment, there is provided a video generation method performed by a video generation apparatus, the method comprising: designating the position and size of an object to be synthesized in one frame among a plurality of frames; and calculating the position of the object to be synthesized in another frame based on optical flow.

The video generation method may include, after the step of calculating the position of the object to be synthesized, a step of calculating the size of the object to be synthesized in the other frame based on the optical flow.

In the step of designating the position and size of the object to be synthesized, a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object may be set based on a UV coordinate system, and a height and a width corresponding to the size of the object may be set.

In the step of calculating the position of the object to be synthesized, the optical flow between the one frame and the other frame may be converted into pixel data through a conversion model that converts the optical flow on a per-pixel basis.

The conversion model may calculate the direction and distance by which each corresponding pixel has moved according to the motion pattern of objects in the image.

In the step of calculating the position of the object to be synthesized, the position may be calculated using the change in the U-coordinate-based center coordinate and the change in the V-coordinate-based bottom coordinate given by the optical flow.

In the step of calculating the size of the object to be synthesized, the change in width may be calculated using the change in a U-coordinate-based side coordinate and the change in the U-coordinate-based center coordinate given by the optical flow.

In the step of calculating the size of the object to be synthesized, the change in height may be calculated using the ratio of the change in width.

According to another aspect of the present embodiment, there is provided a video generation apparatus including a processor and a memory storing a program executed by the processor, wherein the processor designates the position and size of an object to be synthesized in one frame among a plurality of frames, calculates the position of the object to be synthesized in another frame based on optical flow, and calculates the size of the object to be synthesized in the other frame based on the optical flow.

The processor may set, based on a UV coordinate system, a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object to be synthesized, and may set a height and a width corresponding to the size of the object.

The processor may convert the optical flow between the one frame and the other frame into pixel data through a conversion model that converts the optical flow on a per-pixel basis, and may calculate the position of the object to be synthesized using the change in the U-coordinate-based center coordinate and the change in the V-coordinate-based bottom coordinate given by the optical flow.

The processor may calculate the size of the object to be synthesized by calculating the change in height from the ratio of the change in width.

As described above, according to the embodiments of the present invention, given a background image sequence and an object to be synthesized into a video, the optical flow between frames of the background image sequence is calculated, and the calculated optical flow is used to synthesize the object naturally into the background image sequence over time, so that a synthetic video resembling real video can be generated.

Even effects that are not explicitly mentioned here, namely the effects described in the following specification that are expected from the technical features of the present invention and their potential effects, are treated as if described in the specification of the present invention.

FIG. 1 is a block diagram illustrating a video generation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a background image sequence processed by a video generation apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an object to be synthesized that is processed by a video generation apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a composite image processed by a video generation apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the optical flow between two frames processed by a video generation apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a composite image sequence processed by a video generation apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a video generation method according to another embodiment of the present invention.

Hereinafter, in describing the present invention, detailed descriptions of related known functions are omitted where they are obvious to those skilled in the art and are judged likely to obscure the gist of the present invention unnecessarily, and some embodiments of the present invention are described in detail with reference to exemplary drawings.

Generating synthetic image data is one way to overcome the difficulty of acquiring real images. Given a background image and an object to be synthesized, the object can be composited onto the background image to generate new synthetic image data. To generate synthetic video data, the approach currently in use is to mark by hand, for every image in the background image sequence, the location where the object should be synthesized and the appropriate size of the object. With this manual approach, it is practically impossible to generate enough video data to sufficiently train a deep neural network.

In the present embodiment, once the desired position and size of the object are designated in only the first frame of the background image sequence data, the appropriate position and size for synthesizing the object in every image of the background image sequence are calculated automatically from the optical flow between the images of the sequence, and synthetic video data is generated. Here, optical flow is the pattern of motion of objects in an image, and it indicates the direction and distance by which each pixel has moved between the current frame and the next frame.
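To make the definition of optical flow concrete, the following minimal sketch represents a dense flow field as a grid of per-pixel displacements and looks up where one pixel lands in the next frame. The nested-list layout `flow[v][u] = (du, dv)` and the names `make_uniform_flow` and `move_point` are assumptions chosen for illustration; the patent does not prescribe a data layout.

```python
# Hypothetical toy representation: flow[v][u] = (du, dv) is the displacement,
# in pixels, of the pixel at column u and row v between two consecutive frames.
def make_uniform_flow(height, width, du, dv):
    """Build a flow field in which every pixel moves by the same (du, dv)."""
    return [[(du, dv) for _ in range(width)] for _ in range(height)]


def move_point(flow, u, v):
    """Return where the point (u, v) lands in the next frame.

    The lookup index is rounded and clamped to the image bounds (an assumption).
    """
    rows, cols = len(flow), len(flow[0])
    ui = min(max(int(round(u)), 0), cols - 1)
    vi = min(max(int(round(v)), 0), rows - 1)
    du, dv = flow[vi][ui]
    return u + du, v + dv


flow = make_uniform_flow(4, 6, du=2.0, dv=-1.0)
print(move_point(flow, 3, 2))  # (5.0, 1.0)
```

A real flow field would come from an estimator rather than being uniform; the uniform field only keeps the arithmetic easy to follow.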

FIG. 1 is a block diagram illustrating a video generation apparatus according to an embodiment of the present invention.

The video generation apparatus 110 includes at least one processor 120, a computer-readable storage medium 130, and a communication bus 170.

The processor 120 may perform control so that the device operates as the video generation apparatus 110. For example, the processor 120 may execute one or more programs stored in the computer-readable storage medium 130. The one or more programs may include one or more computer-executable instructions, and the computer-executable instructions, when executed by the processor 120, may be configured to cause the video generation apparatus 110 to perform operations according to an exemplary embodiment.

The computer-readable storage medium 130 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. Computer-executable instructions or program code, program data, and/or other suitable forms of information may also be provided through the input/output interface 150 or the communication interface 160. The program 140 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 120. In one embodiment, the computer-readable storage medium 130 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other forms of storage media that can be accessed by the video generation apparatus 110 and can store the desired information, or a suitable combination thereof.

The communication bus 170 interconnects the various components of the video generation apparatus 110, including the processor 120 and the computer-readable storage medium 130.

The video generation apparatus 110 may also include one or more input/output interfaces 150, which provide interfaces for one or more input/output devices, and one or more communication interfaces 160. The input/output interface 150 and the communication interface 160 are connected to the communication bus 170. An input/output device (not shown) may be connected to other components of the video generation apparatus 110 through the input/output interface 150.

Given a background image sequence and an object, once the position at which the object is to be synthesized is designated in the first frame of the background image sequence, the video generation apparatus 110 uses the optical flow between the images of the sequence to automatically composite the object at an appropriate position and an appropriate size in every image of the sequence, and thereby generates video data.

Using background video data and object data that are easy to acquire in a real environment, the video generation apparatus 110 can readily generate synthetic video that can stand in for video data that is difficult to acquire in a real environment.

FIG. 2 is a diagram illustrating a background image sequence, FIG. 3 is a diagram illustrating an object to be synthesized, FIG. 4 is a diagram illustrating a composite image, FIG. 5 is a diagram illustrating the optical flow between two frames, and FIG. 6 is a diagram illustrating a composite image sequence.

The processor designates the position and size of an object to be synthesized in one frame among a plurality of frames, calculates the position of the object to be synthesized in another frame based on the optical flow, and calculates the size of the object to be synthesized in the other frame based on the optical flow.

The processor may set, based on a UV coordinate system, a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object to be synthesized, and may set a height and a width corresponding to the size of the object.

The processor may convert the optical flow between the one frame and the other frame into pixel data through a conversion model that converts the optical flow on a per-pixel basis, and may calculate the position of the object to be synthesized using the change in the U-coordinate-based center coordinate and the change in the V-coordinate-based bottom coordinate given by the optical flow.

The processor may calculate the size of the object to be synthesized by calculating the change in height from the ratio of the change in width.

Given a sequence of consecutive background images over time as shown in FIG. 2 and an object as shown in FIG. 3, the optical flow between the background images is calculated to generate video data in which the object is composited onto the background images.

First, for the image I₀ of the first frame, the position and size at which the object is to be synthesized are designated as in Equation 1.

Here, uc is the u-coordinate of the center of the object, vd is the v-coordinate of the lowest part of the object, and h and w are the height and width of the object. Using Pos₀, the object is composited onto the first-frame image I₀ as shown in FIG. 4.
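The four quantities that make up Pos₀ can be held in a small record, as in the sketch below. Equation 1 itself is not reproduced in this text, so the field order and the `Placement` and `box` names are assumptions; the sketch also assumes the usual image convention in which v grows downward, so the top of the object lies at vd − h.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical container for the quantities named in the description."""
    uc: float  # u-coordinate of the centre of the object
    vd: float  # v-coordinate of the lowest part of the object
    h: float   # height of the object in pixels
    w: float   # width of the object in pixels

    def box(self):
        """Return (left, top, right, bottom), assuming v grows downward."""
        left = self.uc - self.w / 2
        return (left, self.vd - self.h, left + self.w, self.vd)


pos0 = Placement(uc=100.0, vd=200.0, h=40.0, w=20.0)
print(pos0.box())  # (90.0, 160.0, 110.0, 200.0)
```

Anchoring the object by its bottom centre means the point that touches the ground is the one tracked through the flow, which is presumably why the description uses vd rather than a vertical centre.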

Next, a synthetic video is generated automatically based on the position of the object in the first frame and the optical flow calculated between consecutive frames. First, the optical flow F₀₁ between the first-frame image I₀ and the second-frame image I₁ is calculated on a per-pixel basis using a conversion model; FIG. 5 visualizes the result. The model described in Non-Patent Document 1, among others, may be used as the conversion model.
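The conversion model referenced here is a learned network (FlowNet 2.0), which is too large to show inline. As a stand-in only, the sketch below estimates the displacement of a single pixel by exhaustive block matching with a sum-of-absolute-differences cost. This is a different and much simpler technique than the one the patent cites; the function names, the patch radius `r`, and the search range are all assumptions for illustration.

```python
def patch_sad(a, b, ua, va, ub, vb, r):
    """Sum of absolute differences between two (2r+1)x(2r+1) patches."""
    total = 0
    for dv in range(-r, r + 1):
        for du in range(-r, r + 1):
            total += abs(a[va + dv][ua + du] - b[vb + dv][ub + du])
    return total


def match_displacement(prev, nxt, u, v, r=1, search=2):
    """Return the (du, dv) within +/-search that best matches the patch at (u, v).

    Assumes (u, v) lies at least r pixels inside the border of both frames.
    """
    rows, cols = len(nxt), len(nxt[0])
    best = None
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            ub, vb = u + du, v + dv
            # Skip candidates whose patch would leave the image.
            if not (r <= ub < cols - r and r <= vb < rows - r):
                continue
            cost = patch_sad(prev, nxt, u, v, ub, vb, r)
            if best is None or cost < best[0]:
                best = (cost, du, dv)
    return best[1], best[2]


prev = [[0] * 8 for _ in range(8)]
nxt = [[0] * 8 for _ in range(8)]
prev[3][3] = 9  # bright pixel at (u=3, v=3) in the first frame
nxt[4][5] = 9   # the same pixel at (u=5, v=4) in the next frame
print(match_displacement(prev, nxt, 3, 3))  # (2, 1)
```

Running such a search at every pixel would yield a dense field comparable in shape to F₀₁, though a learned model is expected to be far more robust on real footage.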

The value of each pixel of F₀₁ indicates by how many pixels, in the u, v coordinate system, the corresponding pixel of I₀ has moved in I₁. Using Pos₀ and F₀₁, the position Pos₁ at which the object is to be synthesized in I₁ is calculated as in Equation 2.

ul is the u-coordinate of the left edge of the object in the image. The optical flow F₀₁ is used to calculate the coordinates (ul₁ and uc₁) in the second-frame image I₁ to which ul₀ and uc₀ of the first-frame image I₀ have moved, so that the size of the object to be pasted into the second frame is also obtained from Equation 2.
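Equation 2 is not reproduced in this text. The following is one reconstruction that is consistent with the surrounding description, offered as an assumption rather than as the patent's exact formula. Here F₀₁ᵘ(u, v) and F₀₁ᵛ(u, v) denote the u and v components of the flow at pixel (u, v), ul₀ = uc₀ − w₀/2, and the flow is sampled along the bottom row vd₀ of the object (also an assumption).

```latex
\begin{aligned}
uc_1 &= uc_0 + F_{01}^{u}(uc_0,\, vd_0) \\
vd_1 &= vd_0 + F_{01}^{v}(uc_0,\, vd_0) \\
ul_1 &= ul_0 + F_{01}^{u}(ul_0,\, vd_0) \\
w_1  &= 2\,(uc_1 - ul_1) \\
h_1  &= h_0 \cdot \frac{w_1}{w_0}
\end{aligned}
```

The first two lines move the bottom-centre anchor, the third tracks the left edge, the fourth recovers the new width from the tracked centre and left edge, and the last scales the height by the same ratio as the width so that the aspect ratio of the object is preserved.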

The object is composited onto the second-frame image I₁ according to Equation 2, and this process is repeated through the last image of the background sequence. That is, the optical flow F₁₂ between the second-frame image I₁ and the third-frame image I₂ is obtained and Equation 2 is applied again. When this process has been repeated up to the last frame, the object of FIG. 3 has been composited onto every background image of FIG. 2, and the composited object follows the camera motion naturally, as if it were an object actually present at that location.

Accordingly, as shown in FIG. 6, a video can be produced in which the object is composited naturally onto the background over time.
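The frame-by-frame repetition described above can be sketched as a loop that feeds each placement into the next update. Because Equation 2 is not reproduced in this text, the update in `step` is an assumed reconstruction from the description (track the bottom centre and the left edge, recover the width, scale the height by the width ratio); the names `sample`, `step`, and `propagate`, the tuple layout `(uc, vd, h, w)`, and the toy zoom flow are all hypothetical.

```python
def sample(flow, u, v):
    """Read flow[v][u] = (du, dv), rounding and clamping the index to the image."""
    rows, cols = len(flow), len(flow[0])
    ui = min(max(int(round(u)), 0), cols - 1)
    vi = min(max(int(round(v)), 0), rows - 1)
    return flow[vi][ui]


def step(pos, flow):
    """Advance a placement (uc, vd, h, w) by one frame (assumed form of Equation 2)."""
    uc, vd, h, w = pos
    ul = uc - w / 2                  # left edge of the object
    du_c, dv_c = sample(flow, uc, vd)
    du_l, _ = sample(flow, ul, vd)
    uc1, vd1 = uc + du_c, vd + dv_c  # tracked bottom-centre anchor
    ul1 = ul + du_l                  # tracked left edge
    w1 = 2 * (uc1 - ul1)             # new width from centre and left edge
    h1 = h * w1 / w                  # height scaled by the width ratio (w > 0)
    return (uc1, vd1, h1, w1)


def propagate(pos0, flows):
    """Apply step once per consecutive-frame flow, keeping every placement."""
    positions = [pos0]
    for flow in flows:
        positions.append(step(positions[-1], flow))
    return positions


# Toy flow: every pixel moves away from the image centre (50, 50), as when
# the camera advances toward the scene.
zoom = [[(0.25 * (u - 50), 0.25 * (v - 50)) for u in range(100)] for v in range(100)]
track = propagate((60.0, 70.0, 20.0, 10.0), [zoom, zoom])
print(track[1])  # (62.5, 75.0, 25.0, 12.5)
```

In this toy run the object drifts away from the image centre and grows, which is the behaviour one would expect for a static roadside object as the camera approaches it. Compositing the object image at each placement is left out of the sketch.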

FIG. 7 is a flowchart illustrating a video generation method according to another embodiment of the present invention.

The video generation method may be performed by a video generation apparatus.

In step S10, the position and size of an object to be synthesized are designated in one frame among a plurality of frames.

In step S20, the position of the object to be synthesized in another frame is calculated based on the optical flow.

In step S30, the size of the object to be synthesized in the other frame is calculated based on the optical flow.

In the step of designating the position and size of the object to be synthesized (S10), a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object may be set based on a UV coordinate system, and a height and a width corresponding to the size of the object may be set.

In the step of calculating the position of the object to be synthesized (S20), the optical flow between the one frame and the other frame may be converted into pixel data through a conversion model that converts the optical flow on a per-pixel basis. The conversion model may calculate the direction and distance by which each corresponding pixel has moved according to the motion pattern of objects in the image.

In the step of calculating the position of the object to be synthesized (S20), the position may be calculated using the change in the U-coordinate-based center coordinate and the change in the V-coordinate-based bottom coordinate given by the optical flow.

In the step of calculating the size of the object to be synthesized (S30), the change in width may be calculated using the change in a U-coordinate-based side coordinate and the change in the U-coordinate-based center coordinate given by the optical flow. In the same step (S30), the change in height may be calculated using the ratio of the change in width.

The video generation apparatus may be implemented in a logic circuit by hardware, firmware, software, or a combination thereof, and may also be implemented using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like. The apparatus may also be implemented as a system on chip (SoC) including one or more processors and controllers.

The video generation apparatus may be mounted, in the form of software, hardware, or a combination thereof, on a computing device or server provided with hardware elements. The computing device or server may refer to any of various devices that include all or some of the following: a communication device, such as a communication modem, for communicating with various devices or with wired and wireless communication networks; a memory for storing data used to execute a program; and a microprocessor for executing the program to perform computation and issue commands.

Although FIG. 7 describes the processes as being executed sequentially, this is merely an example. Those skilled in the art may apply various modifications and variations, such as changing the order shown in FIG. 7, executing one or more processes in parallel, or adding other processes, without departing from the essential characteristics of the embodiments of the present invention.

Operations according to the present embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium refers to any medium that participates in providing instructions to a processor for execution. A computer-readable medium may include program instructions, data files, data structures, or a combination thereof. Examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiment can be readily inferred by programmers in the technical field to which the present embodiment belongs.

The present embodiments are intended to explain the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

Claims

1. A video generation method performed by a video generation apparatus, the method comprising:
designating a position and a size of an object to be synthesized in one frame among a plurality of frames; and
calculating a position of the object to be synthesized in another frame based on an optical flow.

2. The method of claim 1, further comprising,
after the calculating of the position of the object to be synthesized,
calculating a size of the object to be synthesized in the other frame based on the optical flow.

3. The method of claim 2,
wherein the designating of the position and the size of the object to be synthesized comprises
setting, based on a UV coordinate system, a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object to be synthesized, and setting a height and a width corresponding to the size of the object to be synthesized.

4. The method of claim 3,
wherein the calculating of the position of the object to be synthesized comprises
converting the optical flow between the one frame and the other frame into pixel data through a conversion model that converts the optical flow on a per-pixel basis.

5. The method of claim 4,
wherein the conversion model calculates a direction and a distance by which a corresponding pixel has moved according to a motion pattern of an object in an image.

6. The method of claim 4,
wherein, in the calculating of the position of the object to be synthesized,
the position is calculated using a change in the U-coordinate-based center coordinate and a change in the V-coordinate-based bottom coordinate corresponding to the optical flow.

7. The method of claim 6,
wherein, in the calculating of the size of the object to be synthesized,
a change in width is calculated using a change in a U-coordinate-based side coordinate and the change in the U-coordinate-based center coordinate corresponding to the optical flow.

8. The method of claim 7,
wherein, in the calculating of the size of the object to be synthesized,
a change in height is calculated using a ratio of the change in width.

9. A video generation apparatus comprising a processor and a memory storing a program executed by the processor,
wherein the processor:
designates a position and a size of an object to be synthesized in one frame among a plurality of frames;
calculates a position of the object to be synthesized in another frame based on an optical flow; and
calculates a size of the object to be synthesized in the other frame based on the optical flow.

10. The apparatus of claim 9,
wherein the processor
sets, based on a UV coordinate system, a U-coordinate-based center coordinate and a V-coordinate-based bottom coordinate corresponding to the position of the object to be synthesized, and sets a height and a width corresponding to the size of the object to be synthesized.

11. The apparatus of claim 10,
wherein the processor
converts the optical flow between the one frame and the other frame into pixel data through a conversion model that converts the optical flow on a per-pixel basis, and
calculates the position of the object to be synthesized using a change in the U-coordinate-based center coordinate and a change in the V-coordinate-based bottom coordinate corresponding to the optical flow.

12. The apparatus of claim 11,
wherein the processor
calculates the size of the object to be synthesized by calculating a change in height using a ratio of a change in width.