KR102521904B1

KR102521904B1 - System and method for automatic video editing using utilization of auto labeling and insertion of design elements

Info

Publication number: KR102521904B1
Application number: KR1020220161618A
Authority: KR
Inventors: 전동혁; 이우섭
Original assignee: (주)비디오몬스터
Priority date: 2022-11-28
Filing date: 2022-11-28
Publication date: 2023-04-17

Abstract

The present invention relates to an automatic image editing system by using auto-labeling and inserting a design element and a method thereof. To this end, image auto-labeling performs image labeling on the basis of time, place, and object, and automatic arrangement, grouping, and border setting of a labeled image are performed by estimating the context of an image using a machine learning technology. In addition, a video file of a life log or Vlog with one episode or storytelling is provided by automatically inserting a transition at a boundary point. The automatic image editing system comprises: an image group generating unit for performing division into a plurality of clip images in accordance with metadata extracted from a plurality of image files and object data recognized therefrom, inferring the context between the clip images on the basis of the metadata and the object data, and arranging and grouping the clip images in accordance with a context inference result to generate one or more image groups including the clip images; a design effect insertion unit for separately recognizing the object data and boundary points between the image groups, and inserting a prepared design effect between the clip images and at least one of the boundary points on the basis of the recognized object data and boundary points; and a video file generating unit for generating a video file by combining the image groups into which the design effect is inserted and the boundary points.

Description

Automatic video editing system and method using auto labeling and inserting design elements

An embodiment of the present invention relates to a system and method for automatic video editing through the use of auto labeling and the insertion of design elements.

In general, ordinary users find it cumbersome to edit footage shot in everyday life with a smartphone or similar device into a single video, and they have difficulty because doing so requires some knowledge of professional editing tools.

Accordingly, automated video editing tools have been developed and are in use that combine footage shot at various places or times and automatically edit it into a single video.

However, most conventional automatic video editing tools simply arrange the footage in chronological order and provide the concatenated result, so the resulting video inevitably has a simple and awkward flow. For this reason, such tools fail to give users much interest or satisfaction.

Laid-open Patent Publication No. 10-2005-0003690 (publication date: January 12, 2005)
Laid-open Patent Publication No. 10-2021-0154957 (publication date: December 21, 2021)

An embodiment of the present invention provides an automatic video editing system and method that performs image auto-labeling to label images based on time, place, and object; automatically arranges, groups, and sets boundaries for the labeled images by estimating their context using machine learning technology; and automatically inserts design effects into clip images and at boundary points, thereby providing a video file of a life log or vlog with a single episode or storytelling.

An automatic video editing system using auto labeling and insertion of design elements according to an embodiment of the present invention includes: an image group generating unit that divides a plurality of image files into a plurality of clip images according to metadata extracted from the image files and object data recognized therefrom, infers the context between the clip images based on the metadata and the object data, and arranges and groups the clip images according to the context inference result to generate at least one image group including a plurality of clip images; a design effect insertion unit that recognizes the object data and the boundary points between the image groups, and inserts a previously prepared design effect between the clip images, at the boundary points, or both, based on the recognized object data and boundary points; and a video file generating unit that generates a video file by combining the image groups into which the design effect has been inserted and the boundary points.

In addition, the system may further include: an image file registration unit that receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; and a video file distribution unit that transmits the video file to the user communication terminal for distribution.

In addition, the image file registration unit may include: an image file selection unit that receives a selection of at least one image file from among video files and photo files; and an image file upload unit that uploads the image file selected through the image file selection unit.

In addition, the system may further include an image preprocessing unit that converts each image file uploaded through the image file registration unit to a preset size, rotates it so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image file, and passes the result to the image group generating unit.

In addition, the image files may include at least one of a video file and a photo file, and the image group generating unit may include: a metadata extraction unit that extracts shooting date-and-time data and shooting location data from each image file; an object recognition unit that recognizes object data in the image file; an image file division unit that, when the image file is a video file, divides it according to the object data to generate a plurality of clip images; an object analysis unit that generates object metadata by analyzing at least one of the gender, age, behavior, and emotion of a person object for each clip image and photo file; an image array forming unit that infers the context between the clip images and photo files based on the shooting date-and-time data, the shooting location data, and the object metadata, and automatically arranges the clip images and photo files according to the context inference result to form an image array; and an image group generating unit that, based on the context inference result, automatically sets each context end point in the image array as a boundary point and groups the image array with respect to the set boundary points to generate the image groups.
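The division of a video file into clip images according to recognized object data can be illustrated with a minimal sketch. It assumes, hypothetically, that an object recognizer has already produced a set of labels for each sampled frame. The names `FrameLabels`, `Clip`, and `split_into_clips`, and the rule "start a new clip whenever the label set changes", are illustrative assumptions and not the division criterion specified in this document.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class FrameLabels:
    """Labels recognized in one sampled frame (hypothetical recognizer output)."""
    time_sec: float
    labels: FrozenSet[str]


@dataclass(frozen=True)
class Clip:
    """A clip image described by its time span and the objects seen in it."""
    start_sec: float
    end_sec: float
    labels: FrozenSet[str]


def split_into_clips(frames: List[FrameLabels], video_end_sec: float) -> List[Clip]:
    """Cut the video wherever the set of recognized objects changes (assumed rule)."""
    clips: List[Clip] = []
    if not frames:
        return clips
    start = frames[0]
    for frame in frames[1:]:
        if frame.labels != start.labels:
            # The label set changed: close the current clip at this frame.
            clips.append(Clip(start.time_sec, frame.time_sec, start.labels))
            start = frame
    # Close the last clip at the end of the video.
    clips.append(Clip(start.time_sec, video_end_sec, start.labels))
    return clips
```

A real system would likely smooth the per-frame labels first, since a recognizer that flickers between labels would otherwise produce many very short clips.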

In addition, the image array forming unit may display the clip images included in the image array so that they can be played back and provide a first user interface for changing the order of the clip images and photo files in the image array by drag and drop, and the image group generating unit may provide a second user interface for changing the position of a context end point by drag and drop.

In addition, the design effect insertion unit may extract, from among previously prepared design effect elements, a design effect element that matches the object metadata and insert the extracted element at the start or end point of the clip image labeled with that object metadata, and may recognize each junction between image groups as a boundary point and insert a previously prepared transition image or a previously prepared design effect image at the recognized boundary point.
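The matching and insertion described above can be sketched as a small timeline builder. Everything named here is hypothetical: the effect catalogue `EFFECTS_BY_LABEL`, the placeholder `TRANSITION`, and the choice to insert the first matching effect at the start of a clip are assumptions made only for illustration.

```python
from typing import Dict, List

# Hypothetical catalogue mapping an object-metadata label to a prepared effect.
EFFECTS_BY_LABEL: Dict[str, str] = {
    "smile": "sparkle_overlay",
    "child": "pastel_frame",
}
TRANSITION = "fade_transition"  # placeholder name for a prepared transition


def build_timeline(groups: List[List[dict]]) -> List[str]:
    """Flatten image groups into a timeline with effects and transitions.

    Each clip is a dict with a "name" and a list of "labels".
    """
    timeline: List[str] = []
    for index, group in enumerate(groups):
        if index > 0:
            # A junction between two image groups is a boundary point.
            timeline.append(TRANSITION)
        for clip in group:
            # Pick the first catalogue effect that matches one of the labels.
            effect = next(
                (EFFECTS_BY_LABEL[label] for label in clip["labels"]
                 if label in EFFECTS_BY_LABEL),
                None,
            )
            if effect is not None:
                timeline.append(effect)  # inserted at the start of the clip
            timeline.append(clip["name"])
    return timeline
```

For two groups, the first holding a "smile" clip and an unlabeled clip and the second holding a "child" clip, the timeline contains one effect before each labeled clip and a single transition between the groups.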

A method for automatic video editing using auto labeling and insertion of design elements according to another embodiment of the present invention includes: an image group generating step in which an image group generating unit divides a plurality of image files into a plurality of clip images according to metadata extracted from the image files and object data recognized therefrom, infers the context between the clip images based on the metadata and the object data, and arranges and groups the clip images according to the context inference result to generate at least one image group including a plurality of clip images; a design effect insertion step in which a design effect insertion unit recognizes the object data and the boundary points between the image groups, and inserts a previously prepared design effect between the clip images, at the boundary points, or both, based on the recognized object data and boundary points; and a video file generating step in which a video file generating unit generates a video file by combining the image groups into which the design effect has been inserted and the boundary points.

In addition, the method may further include: an image file registration step in which an image file registration unit receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; and a video file distribution step in which a video file distribution unit transmits the video file to the user communication terminal for distribution.

In addition, the image file registration step may include: an image file selection step of receiving a selection of at least one image file from among video files and photo files; and an image file upload step of uploading the image file selected in the image file selection step.

In addition, the method may further include an image preprocessing step of converting each image file uploaded in the image file registration step to a preset size, rotating it so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image file, and passing the result on for execution of the image group generating step.

In addition, the image files may include at least one of a video file and a photo file, and the image group generating step may include: a metadata extraction step of extracting shooting date-and-time data and shooting location data from each image file; an object recognition step of recognizing object data in the image file; an image file division step of, when the image file is a video file, dividing it according to the object data to generate a plurality of clip images; an object analysis step of generating object metadata by analyzing at least one of the gender, age, behavior, and emotion of a person object for each clip image and photo file; an image array forming step of inferring the context between the clip images and photo files based on the shooting date-and-time data, the shooting location data, and the object metadata, and automatically arranging the clip images and photo files according to the context inference result to form an image array; and an image group generating step of, based on the context inference result, automatically setting each context end point in the image array as a boundary point and grouping the image array with respect to the set boundary points to generate the image groups.

In addition, the image array forming step may display the clip images included in the image array so that they can be played back and provide a first user interface for changing the order of the clip images and photo files in the image array by drag and drop, and the image group generating step may provide a second user interface for changing the position of a context end point by drag and drop.

In addition, the design effect insertion step may extract, from among previously prepared design effect elements, a design effect element that matches the object metadata and insert the extracted element at the start or end point of the clip image labeled with that object metadata, and may recognize each junction between image groups as a boundary point and insert a previously prepared transition image or a previously prepared design effect image at the recognized boundary point.

According to the present invention, it is possible to provide an automatic video editing system and method that performs image auto-labeling to label images based on time, place, and object; automatically arranges, groups, and sets boundaries for the labeled images by estimating their context using machine learning technology; and automatically inserts design effects into clip images and at boundary points, thereby providing a video file of a life log or vlog with a single episode or storytelling.

FIG. 1 is a schematic diagram illustrating the configuration of an automatic video editing system using auto labeling and insertion of design elements according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an automatic video editing system using auto labeling and insertion of design elements according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of an image file registration unit according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of an image group generating unit according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the automatic video editing process of the present invention.
FIG. 6 is a flowchart showing the configuration of a method for automatic video editing using auto labeling and insertion of design elements according to another embodiment of the present invention.
FIG. 7 is a flowchart showing the configuration of an image file registration step according to another embodiment of the present invention.
FIG. 8 is a flowchart showing the configuration of an image group generating step according to another embodiment of the present invention.

The terms used in this specification are briefly explained first, and then the present invention is described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention; however, they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases, terms arbitrarily chosen by the applicant are also used, in which case their meaning is described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined not by their mere names but by their meaning and the overall content of the present invention.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them. In addition, terms such as "...unit" and "module" used in the specification mean a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

FIG. 1 is a schematic diagram illustrating the configuration of an automatic video editing system using auto labeling and insertion of design elements according to an embodiment of the present invention.

Referring to FIG. 1, the automatic video editing system 1000 using auto labeling and insertion of design elements according to an embodiment of the present invention may be implemented using a user communication terminal 10 and an automatic editing server 20.

The user communication terminal 10 according to an embodiment of the present invention may receive the automatic video editing service through software that carries some of the functions for automatic video editing using auto labeling and insertion of design elements.

More specifically, the user communication terminal 10 may be implemented so that it receives the automatic video editing service either through a dedicated program (for example, an application management program) installed or loaded on the terminal, or by accessing a website through the web browser of the user communication terminal 10. The user communication terminal 10 may include any kind of handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC. Here, a web browser is a program that enables the use of World Wide Web (WWW) services by receiving and displaying hypertext written in HTML (Hypertext Markup Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and may include, for example, an app executed on a mobile terminal (smartphone).

The automatic editing server 20 is connected to a plurality of user communication terminals 10 and receives a plurality of image files (videos and photos) selected on a user communication terminal 10. It automatically labels the images based on the date, time, and location information contained in the metadata of the received image files and on the objects recognized in the image data; automatically arranges, groups, and sets boundaries for the labeled images by estimating their context using machine learning technology; and automatically inserts transitions at the boundary points, thereby providing a video file with a single episode or storytelling.

In terms of hardware, the automatic editing server 20 has the same configuration as a typical web server; in terms of software, it may include program modules that are implemented in various languages such as C, C++, Java, Visual Basic, and Visual C and that perform various functions. It may also be implemented on general server hardware using the web server programs that are provided for various operating systems such as DOS, Windows, Linux, Unix, Macintosh, Android, and iOS.

Meanwhile, an example of the communication network that connects the user communication terminal 10 and the automatic editing server 20 over the Internet is a mobile communication network built according to technical standards or communication methods for mobile communication (for example, GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), EV-DO (Evolution-Data Optimized or Evolution-Data Only), WCDMA (Wideband CDMA), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), and 5G), although the network is not particularly limited thereto. An example of a wired communication network may be a closed network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and an open network such as the Internet is preferable. The Internet refers to the worldwide open computer network structure that provides the TCP/IP protocol and the various services that exist in its upper layers, namely HTTP (HyperText Transfer Protocol), Telnet, FTP (File Transfer Protocol), DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), SNMP (Simple Network Management Protocol), NFS (Network File System), and NIS (Network Information Service).

FIG. 2 is a block diagram showing the configuration of an automatic video editing system using auto labeling and insertion of design elements according to an embodiment of the present invention, FIG. 3 is a block diagram showing the configuration of an image file registration unit according to an embodiment of the present invention, FIG. 4 is a block diagram showing the configuration of an image group generating unit according to an embodiment of the present invention, and FIG. 5 is a diagram for explaining the automatic video editing process of the present invention.

Referring to FIG. 2, the automatic video editing system 1000 using auto labeling and insertion of design elements according to an embodiment of the present invention may include at least one of an image file registration unit 100, an image preprocessing unit 200, an image group generating unit 300, a design effect insertion unit 400, a video file generating unit 500, and a video file distribution unit 600.

The image file registration unit 100 may receive a plurality of image files from the user communication terminal 10 and upload them to the automatic editing server 20.

To this end, as shown in FIG. 3, the image file registration unit 100 may include at least one of an image file selection unit 110 and an image file upload unit 120.

The image file selection unit 110 accesses the album or photo library of the user communication terminal 10 and receives a selection of at least one image file from among the video files and photo files stored there, or allows the user to select any one of the videos or photos shot with the camera of the user communication terminal 10.

The image file upload unit 120 may upload the image files, such as videos and photos, selected through the image file selection unit 110 to the automatic editing server 20 over a wired or wireless Internet communication network.

The image preprocessing unit 200 may convert each image file (video or photo) uploaded through the image file registration unit 100 so that all files are unified to a preset size, and may rotate each image file so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image file. It may then pass the preprocessed image files to the image group generating unit 300. Because the size and shape of an image can differ depending on the shooting direction, that is, whether it was shot in portrait or landscape orientation, the image may be rotated to unify the orientation.
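As a minimal sketch of this normalization, the function below computes the rotation and scale factor that would bring one image to a common orientation and size. The preset size `TARGET_SIZE`, the choice of landscape as the preset direction, and the "scale to fit inside the target" rule are all assumptions; this document does not specify them. Applying the plan to actual pixels would be left to an image library.

```python
from typing import Tuple

TARGET_SIZE: Tuple[int, int] = (1920, 1080)  # assumed preset landscape size


def normalization_plan(width: int, height: int) -> Tuple[int, float]:
    """Return (rotation in degrees, uniform scale factor) for one image.

    Portrait images are rotated by 90 degrees so that every image is
    landscape, then scaled uniformly to fit inside TARGET_SIZE.
    """
    rotation = 90 if height > width else 0
    if rotation:
        # After a 90-degree rotation the width and height swap.
        width, height = height, width
    scale = min(TARGET_SIZE[0] / width, TARGET_SIZE[1] / height)
    return rotation, scale
```

With these assumptions, a 1080x1920 portrait frame is rotated by 90 degrees and left unscaled, while a 3840x2160 landscape frame is kept upright and scaled by one half.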

The image group generation unit (300) may divide a plurality of image files (videos and photos) into a plurality of clip images according to the metadata extracted from them and the object data recognized in them, infer the context between the clip images based on the extracted metadata and the recognized object data, and arrange and group the clip images according to the context inference result to generate at least one image group containing a plurality of clip images.

To this end, as shown in FIG. 4, the image group generation unit (300) may include at least one of a metadata extraction unit (310), an object recognition unit (320), an image file division unit (330), an object analysis unit (340), an image array forming unit (350), and an image group generation unit (360).

The metadata extraction unit (310) may extract metadata from the preprocessed image files (videos, photos, and the like) and may extract shooting date and time data and shooting location data from that metadata. The shooting date and time data includes the year, month, day, and time at which the image file was captured, and the shooting location data may include the GPS location information of the user communication terminal (10) that captured the image file. The shooting date and time data and the shooting location data are used as part of the information for the context inference described later, and may be newly stored or defined as new metadata of the image file for that purpose.

The object recognition unit (320) may recognize object data in the preprocessed image files (videos and photos). It may use a predefined machine learning algorithm for object recognition to recognize specific objects present in the image data of an image file. Objects may include various targets such as people, animals (dogs, cats, and the like), and things (cars, buildings, bridges, traffic lights, and the like), and this embodiment may provide a recognition process for a preset object or group of objects.

When an uploaded image file is a video file, the image file division unit (330) may divide that video file on an object basis, according to the object data recognized in it, to generate a plurality of clip images.

For example, as shown in (a) and (b) of FIG. 5, suppose the uploaded image files are Video01, Video02, Image01, Video03, Image02, and Video04, and that objects 1 and 2 are recognized in Video01, objects 1 and 2 in Video02, objects 2 and 3 in Video03, and objects 2 and 3 in Video04. Each video may then be divided into a plurality of clip images according to the playback sections in which each recognized object appears. When different objects are recognized in different playback sections of one video, the video may be divided section by section. For instance, if objects 1 and 2 are recognized in Video01 in the order object 1, then object 2, then object 1 again, Video01 may be divided into Clip01, Clip02, and Clip03 following that recognition order.
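
The section-by-section division can be sketched as follows. This is an illustration under assumptions not found in the specification: the recognizer is assumed to yield one dominant object label per sampled frame, and the function name `split_by_object` is hypothetical.

```python
from itertools import groupby
from typing import Hashable, Iterable, List, Tuple


def split_by_object(frame_labels: Iterable[Hashable]) -> List[Tuple[Hashable, int, int]]:
    """Return (object_id, start, end_exclusive) for each run of equal labels.

    Each run of consecutive frames showing the same object becomes one clip.
    """
    clips = []
    position = 0
    for object_id, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        clips.append((object_id, position, position + length))
        position += length
    return clips


# Object 1, then object 2, then object 1 again: three clips, as for Video01.
video01_labels = [1, 1, 1, 2, 2, 1, 1]
clips = split_by_object(video01_labels)
# clips == [(1, 0, 3), (2, 3, 5), (1, 5, 7)]
```

Because a new clip starts whenever the label changes, an object that reappears later produces a separate clip rather than being merged with its earlier section.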

The object analysis unit (340) may generate object metadata by analyzing, for each clip image and photo file, object characteristics of a person object covering at least one of gender, age, behavior, and emotion. The object metadata serves as basic information for the context inference described later: before the context across the clip images and photos is inferred, the characteristic elements of the objects appearing in each file (gender, age, behavior, emotion, and the like) may be analyzed using a predefined machine learning algorithm. The analysis result of the object analysis unit (340), that is, the object metadata, includes at least one of object gender meta information, object age meta information, object behavior meta information, and object emotion meta information for each recognized object.

The image array forming unit (350) may infer the context between the individual clip images and photo files based on their shooting date and time data, shooting location data, and object metadata, and may automatically arrange or sort the clip images and photo files according to the context inference result to form an image array.

For example, for the items divided as in (b) of FIG. 5 (Clip01, Clip02, Clip03, Clip04, Clip05, Image01, Clip06, Clip07, Clip08, Image02, Clip09, and Clip10), context inference may be performed by a pre-trained machine learning algorithm that takes into account the shooting time, the shooting location, and the various object characteristics (gender, age, behavior, emotion) of each clip image and photo file, so as to define or determine an order of files that follows a particular story or development. According to this context inference result, an image array in the order Clip02, Clip04, Clip03, Image01, Clip01, Clip05, Clip06, Clip09, Image02, Clip08, Clip10, Clip07 may be formed, as shown in (c) of FIG. 5.
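
The specification does not disclose the pre-trained model, so the following sketch swaps in a plainly hand-written heuristic to show how time, place, and object metadata could jointly drive an ordering. The `Item` fields, the weights inside `context_distance`, and the greedy chaining in `arrange` are all assumptions for illustration, not the patented method.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List


@dataclass(frozen=True)
class Item:
    name: str
    captured_at: datetime
    place: str
    objects: FrozenSet[str]


def context_distance(a: Item, b: Item) -> float:
    """Smaller means the two items more plausibly belong to the same context."""
    hours = abs((a.captured_at - b.captured_at).total_seconds()) / 3600.0
    place_penalty = 0.0 if a.place == b.place else 1.0
    union = a.objects | b.objects
    overlap = len(a.objects & b.objects) / len(union) if union else 1.0
    return hours + place_penalty + (1.0 - overlap)


def arrange(items: List[Item]) -> List[Item]:
    """Start from the earliest item, then repeatedly append the closest one."""
    if not items:
        return []
    remaining = sorted(items, key=lambda item: item.captured_at)
    ordered = [remaining.pop(0)]
    while remaining:
        nearest = min(remaining, key=lambda item: context_distance(ordered[-1], item))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered
```

With this heuristic, two park clips showing the same dog stay adjacent even when a cafe clip was shot between them, which is the kind of non-chronological ordering the example in FIG. 5 (c) suggests.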

The image array forming unit (350) may also provide a first user interface that displays the clip images in the image array so that they can be played back, and that lets the user change the order of the clip images and photo files in the image array by drag and drop.

For example, when the image array is formed in the order Clip02, Clip04, Clip03, Image01, Clip01, Clip05, Clip06, Clip09, Image02, Clip08, Clip10, Clip07 as shown in (c) of FIG. 5, selecting Clip02 and dragging it to a drop position between Clip01 and Clip05, then selecting Image02 and dropping it in front of Clip04, rearranges the image array into the order Image02, Clip04, Clip03, Image01, Clip01, Clip02, Clip05, Clip06, Clip09, Clip08, Clip10, Clip07.
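
The effect of the two drag-and-drop operations on the underlying sequence can be reproduced with a small sketch. The helper `move_item` and its `before` parameter are hypothetical names; item names are assumed to be unique within the array.

```python
from typing import List, Optional


def move_item(sequence: List[str], item: str, before: Optional[str]) -> List[str]:
    """Return a new list with `item` moved directly in front of `before`.

    Passing before=None moves the item to the end of the list.
    """
    reordered = [name for name in sequence if name != item]
    if len(reordered) == len(sequence):
        raise ValueError(f"{item!r} is not in the sequence")
    index = len(reordered) if before is None else reordered.index(before)
    reordered.insert(index, item)
    return reordered


array = ["Clip02", "Clip04", "Clip03", "Image01", "Clip01", "Clip05",
         "Clip06", "Clip09", "Image02", "Clip08", "Clip10", "Clip07"]
array = move_item(array, "Clip02", before="Clip05")   # drop between Clip01 and Clip05
array = move_item(array, "Image02", before="Clip04")  # drop in front of Clip04
# array is now: Image02, Clip04, Clip03, Image01, Clip01, Clip02,
#               Clip05, Clip06, Clip09, Clip08, Clip10, Clip07
```

Returning a new list instead of mutating the input keeps the previous arrangement available, which would make an undo action straightforward in such an interface.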

The image group generation unit (360) may automatically set each context end point in the image array as a boundary point, based on the context inference result from the image array forming unit (350), and may group the image array at the set boundary points to generate image groups.

For example, as shown in (d) of FIG. 5, when A1 and A2 are set as the points at which a particular story flow ends according to the context inference result, three image groups may be defined with respect to A1 and A2: a first image group consisting of Clip02, Clip04, Clip03, and Image01; a second image group consisting of Clip01, Clip05, Clip06, and Clip09; and a third image group consisting of Image02, Clip08, Clip10, and Clip07.

The image group generation unit (360) may also provide a second user interface for changing the position of a context end point, that is, a boundary point, by drag and drop.

For example, with boundary points A1 and A2 initially set as shown in (d) of FIG. 5, selecting A1 and dragging it to a drop position between Clip01 and Clip05 reconstitutes the first image group as Clip02, Clip04, Clip03, Image01, and Clip01, and the second image group as Clip05, Clip06, and Clip09.
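
Grouping at boundary points, and regrouping after a boundary is moved, can be sketched as follows. Representing a boundary as an integer index into the image array is an assumption made for illustration, as is the function name `split_into_groups`.

```python
from typing import List, Sequence


def split_into_groups(sequence: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Cut `sequence` at each boundary; boundary k sits between items k-1 and k."""
    cuts = sorted(set(boundaries))
    if any(k <= 0 or k >= len(sequence) for k in cuts):
        raise ValueError("boundaries must fall strictly inside the sequence")
    edges = [0, *cuts, len(sequence)]
    return [list(sequence[start:end]) for start, end in zip(edges, edges[1:])]


array = ["Clip02", "Clip04", "Clip03", "Image01", "Clip01", "Clip05",
         "Clip06", "Clip09", "Image02", "Clip08", "Clip10", "Clip07"]

# A1 after Image01 (index 4), A2 after Clip09 (index 8).
initial_groups = split_into_groups(array, [4, 8])

# Dragging A1 to between Clip01 and Clip05 moves it to index 5.
regrouped = split_into_groups(array, [5, 8])
```

Here `initial_groups` reproduces the three groups of FIG. 5 (d), and `regrouped` moves Clip01 from the second group into the first while leaving the third group unchanged.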

The design effect insertion unit (400) may recognize the object data (or object metadata) of each clip image generated by the image group generation unit (300) and the boundary points between the image groups, and, based on the recognized object data (or object metadata) and boundary points, may insert a prepared design effect at one or more positions between clip images and at the boundary points.

More specifically, a prepared transition image may be inserted at boundary points A1 and A2. For example, after playback of the first image group ends and before playback of the second image group begins, the transition image is played, so that the edited result switches naturally between the first and second image groups.

Here, when the user sets a theme, concept, category, or the like for the video file to be finally generated, at least one of the prepared transition images may be automatically selected according to that setting and inserted at the boundary point, so that a smooth and effective scene change is produced automatically when switching between clip images or between a clip image and a photo. The transition image of this embodiment may also be one that the user produced or edited and set for automatic insertion, or a transition image whose concept suits the imagery or mood of the surrounding context may be automatically selected or recommended according to the context inference result. Besides transition images, various other prepared design effects, such as animation effects, special effects, and design effect images, may be applied instead.

In addition, design effect elements B1 and B2 may each be inserted between a clip image and the following clip image, selected by a predefined machine learning algorithm according to the object data or object metadata recognized from each clip image. A design effect element may include an animation effect, a special effect, a design effect image, or the like that gives a design effect, such as decoration or emphasis, to the object concerned. More specifically, a design effect element matching the object metadata is retrieved from a database provided for design effects, and the retrieved element may be inserted into the clip image labeled with that object metadata, or at the point where that clip image starts or ends. The design effect element may be determined or matched from a result learned in advance according to the object metadata, and the context inference result may also be taken into account. That is, a design effect that reflects the recognized object and its behavior, situation, emotion, gender, age, and so on may be matched and applied.
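
The lookup of a design effect element from object metadata can be sketched with a rule table. The specification describes a learned matching, so this explicit table is a stand-in; the rule contents, the effect names, and the function `match_effect` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Dict[str, str], str]  # (required metadata values, effect name)

# Hypothetical stand-in for the design effect database.
RULES: List[Rule] = [
    ({"object": "person", "emotion": "happy"}, "confetti_overlay"),
    ({"object": "person"}, "name_caption"),
    ({"object": "dog", "behavior": "running"}, "speed_lines"),
]


def match_effect(metadata: Dict[str, str],
                 rules: List[Rule] = RULES) -> Optional[str]:
    """Return the effect of the most specific rule whose conditions all hold."""
    best_effect, best_size = None, -1
    for conditions, effect in rules:
        satisfied = all(metadata.get(key) == value for key, value in conditions.items())
        if satisfied and len(conditions) > best_size:
            best_effect, best_size = effect, len(conditions)
    return best_effect
```

Preferring the rule with the most conditions means that a clip labeled as a happy person receives the more specific effect, while other person clips fall back to the generic one and unmatched clips receive no effect.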

As shown in (e) of FIG. 5, the video file generation unit (500) may generate a single video file by joining the image groups with the transition images inserted at the boundary points. If the user has uploaded text information for a title, an ending, or the like, clips containing that text may additionally be inserted at the start and end of the video file.
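
The assembly order of the final video can be sketched as a flat timeline. The function `build_timeline`, the `transition` placeholder string, and the `title:` and `ending:` prefixes are illustrative assumptions; actual rendering and encoding are outside the scope of the sketch.

```python
from typing import List, Optional, Sequence


def build_timeline(groups: Sequence[Sequence[str]],
                   transition: str = "transition",
                   title: Optional[str] = None,
                   ending: Optional[str] = None) -> List[str]:
    """Flatten image groups into one playback order.

    A transition entry is placed at every boundary between groups, and
    optional title and ending entries are placed at the start and end.
    """
    timeline: List[str] = []
    if title is not None:
        timeline.append(f"title:{title}")
    for index, group in enumerate(groups):
        if index > 0:
            timeline.append(transition)
        timeline.extend(group)
    if ending is not None:
        timeline.append(f"ending:{ending}")
    return timeline
```

For two groups `["Clip02", "Clip04"]` and `["Clip01"]` with the title "Trip", the timeline is the title entry, the first group, one transition, and then the second group.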

The video file distribution unit (600) may render and compress the video file and then transmit it to the user communication terminal (10) over a wired or wireless Internet communication network. In this way, the plurality of videos and photos originally uploaded is returned as a single edited video file that has a particular story or sequence and smooth, effective scene changes.

FIG. 6 is a flowchart showing an automatic video editing method using auto labeling and insertion of design elements according to another embodiment of the present invention, FIG. 7 is a flowchart showing the image file registration step according to that embodiment, and FIG. 8 is a flowchart showing the image group generation step according to that embodiment.

Referring to FIG. 6, the automatic video editing method (S1000) using auto labeling and insertion of design elements according to another embodiment of the present invention may include at least one of an image file registration step (S100), an image preprocessing step (S200), an image group generation step (S300), a design effect insertion step (S400), a video file generation step (S500), and a video file distribution step (S600).

In the image file registration step (S100), a plurality of image files may be received from the user communication terminal (10) and uploaded to the automatic editing server (20).

To this end, as shown in FIG. 7, the image file registration step (S100) may include at least one of an image file selection step (S110) and an image file upload step (S120).

In the image file selection step (S110), the album or photo gallery of the user communication terminal (10) may be accessed so that the user can select at least one image file from among the video files and photo files stored there, or the user may select a video or photo captured with the camera of the user communication terminal (10).

In the image file upload step (S120), the image files (videos, photos, and the like) selected in the image file selection step (S110) may be uploaded to the automatic editing server (20) over a wired or wireless Internet communication network.

In the image preprocessing step (S200), each image file (video or photo) uploaded in the image file registration step (S100) may be converted to a common preset size and rotated so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image files, and the preprocessed image files may be passed to the image group generation step (S300). Because the size and shape of an image differ depending on whether it was shot vertically or horizontally, the images may be rotated to unify them.

In the image group generation step (S300), a plurality of image files (videos and photos) may be divided into a plurality of clip images according to the metadata extracted from them and the object data recognized in them, the context between the clip images may be inferred based on the extracted metadata and the recognized object data, and the clip images may be arranged and grouped according to the context inference result to generate at least one image group containing a plurality of clip images.

To this end, as shown in FIG. 8, the image group generation step (S300) may include at least one of a metadata extraction step (S310), an object recognition step (S320), an image file division step (S330), an object analysis step (S340), an image array forming step (S350), and an image group generation step (S360).

In the metadata extraction step (S310), metadata may be extracted from the preprocessed image files (videos, photos, and the like), and shooting date and time data and shooting location data may be extracted from that metadata. The shooting date and time data includes the year, month, day, and time at which the image file was captured, and the shooting location data may include the GPS location information of the user communication terminal (10) that captured the image file. The shooting date and time data and the shooting location data are used as part of the information for the context inference described later, and may be newly stored or defined as new metadata of the image file for that purpose.

In the object recognition step (S320), object data may be recognized in the preprocessed image files (videos and photos). This step may use a predefined machine learning algorithm for object recognition to recognize specific objects present in the image data of an image file. Objects may include various targets such as people, animals (dogs, cats, and the like), and things (cars, buildings, bridges, traffic lights, and the like), and this embodiment may provide a recognition process for a preset object or group of objects.

In the image file division step (S330), when an uploaded image file is a video file, that video file may be divided on an object basis, according to the object data recognized in it, to generate a plurality of clip images.

In the object analysis step (S340), object metadata may be generated by analyzing, for each clip image and photo file, object characteristics of a person object covering at least one of gender, age, behavior, and emotion. The object metadata serves as basic information for the context inference described later: before the context across the clip images and photos is inferred, the characteristic elements of the objects appearing in each file (gender, age, behavior, emotion, and the like) may be analyzed using a predefined machine learning algorithm. The analysis result of the object analysis step (S340), that is, the object metadata, includes at least one of object gender meta information, object age meta information, object behavior meta information, and object emotion meta information for each recognized object.

In the image array forming step (S350), the context between the individual clip images and photo files may be inferred based on their shooting date and time data, shooting location data, and object metadata, and the clip images and photo files may be automatically arranged or sorted according to the context inference result to form an image array.

The image array forming step (S350) may also provide a first user interface that displays the clip images in the image array so that they can be played back, and that lets the user change the order of the clip images and photo files in the image array by drag and drop.

In the image group generation step (S360), each context end point in the image array may be automatically set as a boundary point based on the context inference result from the image array forming step (S350), and the image array may be grouped at the set boundary points to generate image groups.

The image group generation step (S360) may also provide a second user interface for changing the position of a context end point, that is, a boundary point, by drag and drop.

In the design effect insertion step (S400), the object data (or object metadata) of each clip image generated in the image group generation step (S300) and the boundary points between the image groups may be recognized, and, based on the recognized object data (or object metadata) and boundary points, a prepared design effect may be inserted at one or more positions between clip images and at the boundary points.

In the video file generation step (S500), as shown in (e) of FIG. 5, a single video file may be generated by joining the image groups with the transition images inserted at the boundary points. If the user has uploaded text information for a title, an ending, or the like, clips containing that text may additionally be inserted at the start and end of the video file.

In the video file distribution step (S600), the video file may be rendered and compressed and then transmitted to the user communication terminal (10) over a wired or wireless Internet communication network. In this way, the plurality of videos and photos originally uploaded is returned as a single edited video file that has a particular story or sequence and smooth, effective scene changes.

What has been described above is merely one embodiment for carrying out the automatic video editing system and method using auto labeling and insertion of design elements according to the present invention. The present invention is not limited to this embodiment; as claimed in the following claims, its technical spirit extends to the full range within which a person of ordinary skill in the art to which the invention belongs could make various modifications without departing from the gist of the invention.

1000: Automatic video editing system using auto labeling and insertion of design elements
100: Image file registration unit
110: Image file selection unit
120: Image file upload unit
200: Image preprocessing unit
300: Image group generation unit
310: Metadata extraction unit
320: Object recognition unit
330: Image file division unit
340: Object analysis unit
350: Image array forming unit
360: Image group generation unit
400: Design effect insertion unit
500: Video file generation unit
600: Video file distribution unit
S1000: Automatic video editing method using auto labeling and insertion of design elements
S100: Image file registration step
S110: Image file selection step
S120: Image file upload step
S200: Image preprocessing step
S300: Image group generation step
S310: Metadata extraction step
S320: Object recognition step
S330: Image file division step
S340: Object analysis step
S350: Image array forming step
S360: Image group generation step
S400: Design effect insertion step
S500: Video file generation step
S600: Video file distribution step

Claims

An automatic video editing method using auto labeling and insertion of design elements, the method comprising:
an image group generation step in which an image group generation unit divides image files into a plurality of clip images according to metadata extracted from a plurality of image files and recognized object data, infers the context between the plurality of clip images based on the metadata and the object data, and arranges and groups the clip images according to a result of the context inference to generate at least one image group including a plurality of clip images;
a design effect insertion step in which a design effect insertion unit recognizes the object data and the boundary points between the image groups, respectively, and inserts a previously prepared design effect into at least one of the positions between the clip images and the boundary points, based on the recognized object data and boundary points; and
a video file generation step in which a video file generation unit generates a video file by combining the image groups into which the design effect has been inserted and the boundary points,
wherein the image file includes at least one of a video file and a photo file, and
wherein the image group generation step comprises:
a metadata extraction step of extracting shooting date and time data and shooting location data from the image file;
an object recognition step of recognizing object data in the image file;
an image file division step of, if the image file is a video file, dividing the video file according to the object data to generate a plurality of clip images;
an object analysis step of generating object metadata by analyzing at least one of the gender, age, behavior, and emotion of a person object for each clip image and photo file;
an image array forming step of inferring the context between the clip images and photo files based on the shooting date and time data, the shooting location data, and the object metadata, and automatically arranging the clip images and photo files according to the result of the context inference to form an image array; and
an image group forming step of automatically setting each context end point in the image array as a boundary point based on the result of the context inference, and grouping the image array based on the set boundary points to form the image group.
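The image array forming and image group forming steps can be illustrated with a minimal sketch. The claim leaves the context inference to a machine learning model; the sketch below swaps in a simple rule (a change of shooting location, or a long gap in shooting time, ends a context) purely for illustration. The `Clip` type, the function names, and the two-hour `max_gap` threshold are all hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Clip:
    """A clip image or photo file with its labels (hypothetical structure)."""
    name: str
    shot_at: datetime      # shooting date and time data
    place: str             # shooting location data
    labels: frozenset = frozenset()  # object metadata


def form_image_array(clips):
    """Arrange clips in shooting order (stand-in for context-based arrangement)."""
    return sorted(clips, key=lambda c: c.shot_at)


def find_boundary_points(array, max_gap=timedelta(hours=2)):
    """Return each index i where a context ends between array[i-1] and array[i].

    Assumed rule: the context ends when the place changes or the time gap
    exceeds max_gap. A learned model would replace this test.
    """
    boundaries = []
    for i in range(1, len(array)):
        prev, cur = array[i - 1], array[i]
        if cur.place != prev.place or cur.shot_at - prev.shot_at > max_gap:
            boundaries.append(i)
    return boundaries


def form_image_groups(array, boundaries):
    """Cut the image array at the boundary points to form image groups."""
    if not array:
        return []
    groups, start = [], 0
    for end in boundaries + [len(array)]:
        groups.append(array[start:end])
        start = end
    return groups
```

With this rule, two clips shot half an hour apart in the same place fall into one group, while a clip shot that evening, or in a different place, starts a new group.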

The method according to claim 1, further comprising:
an image file registration step in which an image file registration unit receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; and
a video file distribution step in which a video file distribution unit transmits and distributes the video file to the user communication terminal.

The method according to claim 2,
wherein the image file registration step comprises:
an image file selection step of receiving a selection of at least one image file from among video files and photo files; and
an image file upload step of uploading the image file selected in the image file selection step.

The method according to claim 2,
further comprising an image pre-processing step of converting each image file uploaded in the image file registration step to a preset size, normalizing the image data included in the image file by rotating the image file so that its orientation is aligned with a preset direction, and passing the normalized image data on so that the image group generation step can be executed.
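As a rough illustration of the pre-processing step, the sketch below computes how a single image file would be rotated and resized to fit a preset frame. The claim only states that files are converted to a preset size and rotated to a preset direction; the 1920x1080 target, the aspect-preserving fit, the padding, and the name `plan_normalization` are assumptions made for this example.

```python
def plan_normalization(width, height, rotation_deg, target=(1920, 1080)):
    """Plan the rotation and resize for one image file.

    rotation_deg is the clockwise rotation (0, 90, 180 or 270) needed to
    bring the file to the preset direction. The result fits inside the
    target frame while keeping the aspect ratio (an assumption; the claim
    does not say whether the aspect ratio is preserved).
    """
    if rotation_deg not in (0, 90, 180, 270):
        raise ValueError("rotation_deg must be 0, 90, 180 or 270")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # A quarter turn swaps width and height.
    if rotation_deg in (90, 270):
        width, height = height, width

    # Largest uniform scale at which the rotated image still fits the frame.
    scale = min(target[0] / width, target[1] / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    return {
        "rotate_cw": rotation_deg,
        "size": (new_w, new_h),
        # Pixels left over in the frame, to be filled with padding.
        "pad": (target[0] - new_w, target[1] - new_h),
    }
```

A portrait clip stored as 1080x1920 that needs a quarter turn fills the frame exactly, whereas a 4:3 photo is scaled down and leaves horizontal padding.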

(Deleted)

The method according to claim 1,
wherein the design effect insertion step comprises:
extracting, from among previously prepared design effect elements, a design effect element that matches the object metadata, and inserting the extracted design effect element at the start point or end point of the clip image labeled with that object metadata; and
recognizing the junctions between the image groups as boundary points, and inserting a previously prepared transition image or a previously prepared design effect image at each recognized boundary point.
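The design effect insertion step can be sketched as a pass over the image groups that builds an edit timeline. The effect library contents, the `"crossfade"` transition name, and the function `insert_design_effects` are hypothetical; the sketch places matched effects at the start point of each clip, which is one of the two positions the claim allows.

```python
def insert_design_effects(groups, effect_library, transition="crossfade"):
    """Build a timeline of (kind, name) entries from grouped clips.

    groups: list of image groups; each group is a list of
            (clip_name, labels) pairs, where labels is a set of
            object metadata labels.
    effect_library: dict mapping a label to a prepared design effect element.
    """
    timeline = []
    for group_index, group in enumerate(groups):
        # The junction between two image groups is a boundary point:
        # insert the prepared transition there.
        if group_index > 0:
            timeline.append(("transition", transition))
        for clip_name, labels in group:
            # Insert every effect whose label matches the clip's metadata
            # at the start point of that clip (sorted for a stable order).
            for label in sorted(labels):
                if label in effect_library:
                    timeline.append(("effect", effect_library[label]))
            timeline.append(("clip", clip_name))
    return timeline
```

Keeping the timeline as plain tuples makes it easy to hand to whatever renderer produces the final video file; that hand-off is outside the scope of this sketch.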

An automatic video editing system using auto labeling and insertion of design elements, the system comprising:
an image group generation unit that divides image files into a plurality of clip images according to metadata extracted from a plurality of image files and recognized object data, infers the context between the plurality of clip images based on the metadata and the object data, and arranges and groups the clip images according to a result of the context inference to generate at least one image group including a plurality of clip images;
a design effect insertion unit that recognizes the object data and the boundary points between the image groups, respectively, and inserts a previously prepared design effect into at least one of the positions between the clip images and the boundary points, based on the recognized object data and boundary points; and
a video file generation unit that generates a video file by combining the image groups into which the design effect has been inserted and the boundary points,
wherein the image file includes at least one of a video file and a photo file, and
wherein the image group generation unit comprises:
a metadata extraction unit that extracts shooting date and time data and shooting location data from the image file;
an object recognition unit that recognizes object data in the image file;
an image file division unit that, if the image file is a video file, divides the video file according to the object data to generate a plurality of clip images;
an object analysis unit that generates object metadata by analyzing at least one of the gender, age, behavior, and emotion of a person object for each clip image and photo file;
an image array forming unit that infers the context between the clip images and photo files based on the shooting date and time data, the shooting location data, and the object metadata, and automatically arranges the clip images and photo files according to the result of the context inference to form an image array; and
an image group forming unit that automatically sets each context end point in the image array as a boundary point based on the result of the context inference, and groups the image array based on the set boundary points to form the image group.
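The role of the image file division unit can be illustrated with a small sketch that cuts a video wherever the recognized object data changes. It assumes the object recognition unit has already produced one set of labels per frame; that per-frame representation, the `fps` parameter, and the name `split_by_object_data` are assumptions made for this example, not details given in the claim.

```python
def split_by_object_data(frame_labels, fps=30.0):
    """Divide a video into clips wherever the recognized objects change.

    frame_labels: list with one set of object labels per frame.
    Returns a list of (start_sec, end_sec, labels) tuples, one per clip.
    """
    clips = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current clip at the end of the video, or as soon as
        # the label set differs from the one the clip started with.
        at_end = i == len(frame_labels)
        if at_end or frame_labels[i] != frame_labels[start]:
            clips.append((start / fps, i / fps, frozenset(frame_labels[start])))
            start = i
    return clips
```

In practice per-frame recognition is noisy, so a real implementation would likely smooth the labels or enforce a minimum clip length before cutting; this sketch omits that to keep the division rule visible.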

The system according to claim 7, further comprising:
an image file registration unit that receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; and
a video file distribution unit that transmits and distributes the video file to the user communication terminal.

The system according to claim 8,
wherein the image file registration unit comprises:
an image file selection unit that receives a selection of at least one image file from among video files and photo files; and
an image file upload unit that uploads the image file selected through the image file selection unit.

The system according to claim 8,
further comprising an image pre-processing unit that converts each image file uploaded through the image file registration unit to a preset size, normalizes the image data included in the image file by rotating the image file so that its orientation is aligned with a preset direction, and passes the normalized image data to the image group generation unit.

(Deleted)

The system according to claim 7,
wherein the design effect insertion unit:
extracts, from among previously prepared design effect elements, a design effect element that matches the object metadata, and inserts the extracted design effect element at the start point or end point of the clip image labeled with that object metadata; and
recognizes the junctions between the image groups as boundary points, and inserts a previously prepared transition image or a previously prepared design effect image at each recognized boundary point.