KR102465870B1

KR102465870B1 - Method and system for generating video content based on text to speech for image

Info

Publication number: KR102465870B1
Application number: KR1020210034735A
Authority: KR
Inventors: 김재민; 이수미; 이주현; 박소현; 전혜인; 손정민; 황소정
Original assignee: 네이버 주식회사; 라인 가부시키가이샤
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2022-11-10
Also published as: JP7277635B2; JP2022145617A; KR20220129868A

Abstract

Disclosed are a method and a system for generating video content based on speech synthesis for images. A method for generating video content according to an embodiment may include: extracting snapshots of images uploaded through a content editing tool; displaying the extracted snapshots along a timeline through the content editing tool; providing, through the content editing tool, a length adjustment function for adjusting the lengths of the displayed snapshots; adjusting the running time of a snapshot whose length has been adjusted through the length adjustment function according to the adjusted length; and generating speech synthesis for text input through the content editing tool and adding it at a selected point on the timeline.

Description

Method and system for generating video content based on speech synthesis for images

The following description relates to a method and system for generating video content based on speech synthesis for images.

Applying audio (including text-to-speech (TTS) output) to material containing images has been cumbersome: for a PowerPoint presentation, for example, a sound source must be added to each slide individually. Moreover, only one sound source can be added per slide, and its playback start time cannot be edited freely.

Accordingly, in a market where demand for producing and consuming video content has grown, conventional video production techniques using speech synthesis are cumbersome and offer only limited forms.

[Prior Art Literature]

Korean Patent Publication No. 10-2014-0147401 (published 2014.12.30)

Provided are a video content generation method and system that can generate, in real time, the speech synthesis a user wants for a plurality of images, dub it at the playback start time the user wants, and generate and provide video content from the plurality of images dubbed with the generated speech synthesis.

Provided is a method for generating video content in a computer device including at least one processor, the method comprising: extracting, by the at least one processor, snapshots of images uploaded through a content editing tool; displaying, by the at least one processor, the extracted snapshots along a timeline through the content editing tool; providing, by the at least one processor, a length adjustment function for adjusting the lengths of the displayed snapshots through the content editing tool; adjusting, by the at least one processor, the running time of a snapshot whose length has been adjusted through the length adjustment function according to the adjusted length; and generating, by the at least one processor, speech synthesis for text input through the content editing tool and adding it at a selected point on the timeline.
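The five claimed steps can be pictured as operations on a small timeline model. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the class and method names, the 4-second default running time, and the `synthesize` callback standing in for the TTS engine are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

DEFAULT_RUNNING_TIME = 4.0  # seconds; assumed default per snapshot


@dataclass
class Snapshot:
    image_id: str
    running_time: float = DEFAULT_RUNNING_TIME  # time the image occupies on the timeline


@dataclass
class SpeechClip:
    text: str
    voice_type: str
    start: float  # seconds from the beginning of the timeline


@dataclass
class Timeline:
    snapshots: List[Snapshot] = field(default_factory=list)
    speech: List[Tuple[SpeechClip, bytes]] = field(default_factory=list)

    def extract_snapshots(self, image_ids: List[str]) -> None:
        # Steps 1-2: one snapshot per uploaded image, laid out in upload order.
        self.snapshots = [Snapshot(image_id) for image_id in image_ids]

    def set_running_time(self, index: int, seconds: float) -> None:
        # Steps 3-4: a length adjustment in the UI becomes a new running time.
        if seconds <= 0:
            raise ValueError("running time must be positive")
        self.snapshots[index].running_time = seconds

    def start_times(self) -> List[float]:
        # Snapshots play back to back, so each start is the sum of earlier running times.
        starts, t = [], 0.0
        for snap in self.snapshots:
            starts.append(t)
            t += snap.running_time
        return starts

    def total_duration(self) -> float:
        return sum(snap.running_time for snap in self.snapshots)

    def add_speech(self, text: str, voice_type: str, at: float,
                   synthesize: Callable[[str, str], bytes]) -> SpeechClip:
        # Step 5: synthesize speech for the text and place it at the selected time point.
        if not 0.0 <= at <= self.total_duration():
            raise ValueError("time point is outside the timeline")
        clip = SpeechClip(text, voice_type, at)
        self.speech.append((clip, synthesize(text, voice_type)))
        return clip
```

For example, after extracting three snapshots and stretching the second one to 6 seconds, the snapshots start at 0, 4, and 10 seconds, and a speech clip can be placed at 4 seconds so that it begins with the second image.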

According to one aspect, the length of each displayed snapshot may be proportional to its running time, which is the time occupied on the timeline by the image corresponding to that snapshot, and the displaying along the timeline may include displaying the extracted snapshots through the content editing tool at a length proportional to a default running time.
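Because display length is proportional to running time, converting between the two is a single scale factor. A minimal sketch, assuming a hypothetical zoom level of 20 pixels per second and the 4-second default that the detailed description later gives as an example:

```python
from typing import List, Tuple

PX_PER_SECOND = 20.0        # hypothetical timeline zoom level
DEFAULT_RUNNING_TIME = 4.0  # seconds; example default from the detailed description


def length_for(running_time: float, px_per_second: float = PX_PER_SECOND) -> float:
    """Display length of a snapshot, proportional to its running time."""
    return running_time * px_per_second


def running_time_for(length: float, px_per_second: float = PX_PER_SECOND) -> float:
    """Inverse mapping, e.g. after the user resizes a snapshot."""
    return length / px_per_second


def initial_layout(n_snapshots: int,
                   px_per_second: float = PX_PER_SECOND) -> List[Tuple[float, float]]:
    """(x_offset, width) of each extracted snapshot at the default running time."""
    width = length_for(DEFAULT_RUNNING_TIME, px_per_second)
    return [(i * width, width) for i in range(n_snapshots)]
```

Keeping the scale factor as a parameter means a zoomed-in or zoomed-out timeline changes only the pixel widths, never the running times themselves.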

According to another aspect, the providing of the length adjustment function may include providing a function of increasing or decreasing the length of a first snapshot among the displayed snapshots according to a user's touch-and-drag or click-and-drag on a predetermined left area or right area of the first snapshot.

According to another aspect, the providing of the length adjustment function may include displaying, while the user's touch or click on the left area or the right area of the first snapshot is maintained, the time point on the timeline corresponding to the left end or the right end of the first snapshot.

According to another aspect, the adjusting of the running time according to the adjusted length may include increasing or decreasing the running time, which is the time occupied on the timeline by the image corresponding to the length-adjusted snapshot, in proportion to the amount by which the length was adjusted.
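The edge-drag behavior described in the preceding aspects (resize from either edge, show the edge's time point while the touch or click is held, and change the running time in proportion to the length change) might look like the sketch below. The 0.5-second minimum, the mm:ss.s label format, and keeping the opposite edge fixed during the drag are assumptions, not details from the source.

```python
PX_PER_SECOND = 20.0    # hypothetical timeline zoom level
MIN_RUNNING_TIME = 0.5  # seconds; assumed lower bound so a snapshot cannot vanish


def format_time(t: float) -> str:
    """Render a timeline position as mm:ss.s for the label shown during the drag."""
    minutes, seconds = divmod(t, 60.0)
    return f"{int(minutes):02d}:{seconds:04.1f}"


def drag_edge(start: float, running_time: float, edge: str, dx_px: float,
              px_per_second: float = PX_PER_SECOND):
    """Return (new_running_time, edge_time_label) for a drag of dx_px pixels.

    The running time changes by dx_px / px_per_second, i.e. in proportion to the
    change in displayed length. The opposite edge stays put during the drag.
    """
    delta = dx_px / px_per_second
    if edge == "right":
        new_rt = max(MIN_RUNNING_TIME, running_time + delta)  # dragging right lengthens
        edge_time = start + new_rt
    elif edge == "left":
        end = start + running_time
        new_rt = max(MIN_RUNNING_TIME, running_time - delta)  # dragging left lengthens
        new_rt = min(new_rt, end)                             # left edge cannot pass time 0
        edge_time = end - new_rt
    else:
        raise ValueError("edge must be 'left' or 'right'")
    return new_rt, format_time(edge_time)
```

For a snapshot starting at 4 s with a 4 s running time, dragging the right edge 40 px to the right yields a 6 s running time and the label "00:10.0".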

According to another aspect, the generating of the speech synthesis and adding it at the selected point on the timeline may include generating the speech synthesis for the text according to a voice type selected through the content editing tool.

According to another aspect, the generating of the speech synthesis and adding it at the selected point on the timeline may include adding the generated speech synthesis at a specific point on the timeline that is selected by moving a time indicator indicating a specific point on the timeline.

According to another aspect, the method for generating video content may further include moving, by the at least one processor, the position on the timeline of the speech synthesis added to the timeline, based on a user input.
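A time indicator that picks the insertion point, plus a move operation driven by user input, can be sketched as follows. Clamping positions to the timeline bounds and all names here are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechClip:
    text: str
    start: float     # seconds from the beginning of the timeline
    duration: float  # length of the synthesized audio in seconds


class TimeIndicator:
    """Playhead-style marker the user drags to choose a time point."""

    def __init__(self, total_duration: float):
        self.total_duration = total_duration
        self.position = 0.0

    def move_to(self, t: float) -> None:
        # Keep the indicator within the timeline.
        self.position = min(max(t, 0.0), self.total_duration)


def add_at_indicator(clips: List[SpeechClip], indicator: TimeIndicator,
                     text: str, duration: float) -> SpeechClip:
    """Add generated speech at the point currently marked by the indicator."""
    clip = SpeechClip(text, indicator.position, duration)
    clips.append(clip)
    return clip


def move_clip(clip: SpeechClip, new_start: float, total_duration: float) -> None:
    """Reposition an already-added clip in response to user input (e.g. a drag)."""
    clip.start = min(max(new_start, 0.0), total_duration)
```

Separating the indicator from the clips keeps "where to insert" independent of "what is already placed", so moving a clip later never disturbs the indicator.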

According to another aspect, the method for generating video content may further include: receiving, by the at least one processor, a selection of one sound effect among a plurality of sound effects provided through the content editing tool; and adding, by the at least one processor, the selected sound effect at a point selected on the timeline in the content editing tool.

According to another aspect, the method for generating video content may further include providing, by the at least one processor, a function for changing the order of the displayed snapshots.
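Changing the order of snapshots is a list move; because snapshots play back to back, start times must be recomputed afterward. A minimal sketch using hypothetical `(image_id, running_time)` pairs:

```python
from typing import List, Tuple

Snapshot = Tuple[str, float]  # (image_id, running_time in seconds)


def move_snapshot(order: List[Snapshot], src: int, dst: int) -> List[Snapshot]:
    """Return a new order with the snapshot at index src moved to index dst."""
    items = list(order)  # copy so the caller's order is left unchanged
    items.insert(dst, items.pop(src))
    return items


def start_times(order: List[Snapshot]) -> List[float]:
    """Each snapshot starts where the previous one ends."""
    starts, t = [], 0.0
    for _, running_time in order:
        starts.append(t)
        t += running_time
    return starts
```

Moving the last of three snapshots (4 s, 6 s, 4 s) to the front gives start times of 0, 4, and 8 seconds.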

According to another aspect, the images may be uploaded in the form of a file including a plurality of pages that can be converted into images.

According to another aspect, the generating of the speech synthesis and adding it at the selected point on the timeline may include, when the running time of a first speech synthesis to be added to the timeline at least partially overlaps that of a second speech synthesis already added to the timeline, adding the first speech synthesis to the timeline on a voice channel different from that of the second speech synthesis.
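The overlap rule resembles interval partitioning. The sketch below places each new speech clip on the lowest-numbered voice channel that has no clip overlapping it. The lowest-free-channel policy and the half-open `[start, end)` overlap test are assumptions; the source only requires that overlapping clips go to different channels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    start: float     # seconds
    duration: float  # seconds
    channel: int = 0

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: Clip, b: Clip) -> bool:
    """Half-open intervals: a clip ending at t does not overlap one starting at t."""
    return a.start < b.end and b.start < a.end


def add_clip(clips: List[Clip], start: float, duration: float) -> Clip:
    """Add a clip on the lowest channel where it overlaps no existing clip."""
    new = Clip(start, duration)
    channel = 0
    while any(c.channel == channel and overlaps(c, new) for c in clips):
        channel += 1
    new.channel = channel
    clips.append(new)
    return new
```

A clip that starts exactly when another ends can reuse that channel, which keeps the number of channels small for back-to-back narration.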

According to another aspect, the generating of the speech synthesis and adding it at the selected point on the timeline may include displaying, through the content editing tool, an indicator for the speech synthesis added at the selected point on the timeline.

According to another aspect, at least part of the text may be displayed through the indicator.

According to another aspect, the length of the indicator may be proportional to the length of the speech synthesis.

According to another aspect, the generating of the speech synthesis and adding it at the selected point on the timeline may include displaying, based on a user input on the indicator, at least one of information on the voice type used to generate the speech synthesis, information on the length of the speech synthesis, and the text.
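The indicator aspects (partial text, length proportional to the audio, and details on user input) could be modeled as below. The pixel scale, the 10-character truncation, and the field names in the details dictionary are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Union

PX_PER_SECOND = 20.0  # hypothetical timeline zoom level
MAX_LABEL_CHARS = 10  # hypothetical truncation limit for the on-timeline text


@dataclass
class SpeechClip:
    text: str
    voice_type: str
    duration: float  # seconds of synthesized audio


def indicator_width(clip: SpeechClip, px_per_second: float = PX_PER_SECOND) -> float:
    """Indicator length is proportional to the length of the speech synthesis."""
    return clip.duration * px_per_second


def indicator_label(clip: SpeechClip, max_chars: int = MAX_LABEL_CHARS) -> str:
    """Show at least part of the text; long text is cut and marked with an ellipsis."""
    if len(clip.text) <= max_chars:
        return clip.text
    return clip.text[: max_chars - 1] + "…"


def indicator_details(clip: SpeechClip) -> Dict[str, Union[str, float]]:
    """Details shown when the user clicks or taps the indicator."""
    return {"voice_type": clip.voice_type, "length_s": clip.duration, "text": clip.text}
```

A 2.5-second clip with the text "Hello, welcome to the tour" would be drawn 50 px wide and labeled "Hello, we…".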

Provided is a computer program stored in a computer-readable recording medium so as to be combined with a computer device and cause the computer device to execute the method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the method is recorded.

Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor extracts snapshots of images uploaded through a content editing tool, displays the extracted snapshots along a timeline through the content editing tool, provides a length adjustment function for adjusting the lengths of the displayed snapshots through the content editing tool, adjusts the running time of a snapshot whose length has been adjusted through the length adjustment function according to the adjusted length, and generates speech synthesis for text input through the content editing tool and adds it at a selected point on the timeline.

The speech synthesis a user wants can be generated in real time for a plurality of images and dubbed at the playback start time the user wants, and video content can be generated and provided from the plurality of images dubbed with the generated speech synthesis.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a video content generation system according to an embodiment of the present invention.
FIGS. 4 to 19 are diagrams illustrating examples of screens of a content editing tool according to an embodiment of the present invention.
FIG. 20 is a flowchart illustrating an example of a method for generating video content according to an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

A content generation system according to embodiments of the present invention may be implemented by at least one computer device, and a content generation method according to embodiments of the present invention may be performed by at least one computer device implementing the content generation system. A computer program according to an embodiment of the present invention may be installed and run on the computer device, and the computer device may perform the content generation method according to embodiments of the present invention under the control of the running computer program. The computer program may be stored in a computer-readable recording medium so as to be combined with the computer device and cause it to execute the content generation method.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 includes a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. FIG. 1 is merely an example for describing the invention, and the numbers of electronic devices and servers are not limited to those shown in FIG. 1. Likewise, the network environment of FIG. 1 is only one example of the environments applicable to the present embodiments, and the applicable environments are not limited to it.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed or mobile terminals implemented as computer devices. Examples include a smartphone, a mobile phone, a navigation device, a computer, a notebook computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC. Although FIG. 1 shows a smartphone as an example of the electronic device 110, in embodiments of the present invention the electronic device 110 may be any of various physical computer devices capable of communicating with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 over the network 170 using a wireless or wired communication method.

The communication method is not limited and may include not only communication using a communication network that the network 170 may include (e.g., a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network) but also short-range wireless communication between devices. For example, the network 170 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), the Internet, and the like. The network 170 may also have any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as one or more computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 over the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides services (e.g., a content providing service, a group call service (or voice conference service), a messaging service, a mail service, a social network service, a map service, a translation service, a financial service, a payment service, or a search service) to the plurality of electronic devices 110, 120, 130, and 140 connected through the network 170.

FIG. 2 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention. Each of the electronic devices 110, 120, 130, and 140 and each of the servers 150 and 160 described above may be implemented by the computer device 200 shown in FIG. 2.

As shown in FIG. 2, the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240. The memory 210 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. A non-volatile mass storage device such as ROM or a disk drive may also be included in the computer device 200 as a separate permanent storage device distinct from the memory 210. The memory 210 may store an operating system and at least one program code. These software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210, such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. In another embodiment, the software components may be loaded into the memory 210 through the communication interface 230 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 210 of the computer device 200 based on a computer program installed from files received over the network 170.

The processor 220 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 220 by the memory 210 or the communication interface 230. For example, the processor 220 may be configured to execute received instructions according to program code stored in a recording device such as the memory 210.

The communication interface 230 may provide a function for the computer device 200 to communicate with other devices (e.g., the storage devices described above) over the network 170. For example, requests, instructions, data, files, and the like generated by the processor 220 of the computer device 200 according to program code stored in a recording device such as the memory 210 may be transmitted to other devices over the network 170 under the control of the communication interface 230. Conversely, signals, instructions, data, files, and the like from other devices may be received by the computer device 200 through its communication interface 230 via the network 170. Signals, instructions, data, and the like received through the communication interface 230 may be passed to the processor 220 or the memory 210, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 200 may further include.

The input/output interface 240 may be a means for interfacing with an input/output device 250. For example, input devices may include a microphone, a keyboard, or a mouse, and output devices may include a display or a speaker. As another example, the input/output interface 240 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 250 may also be integrated with the computer device 200 into a single device.

In other embodiments, the computer device 200 may include fewer or more components than those shown in FIG. 2. However, most conventional components need not be explicitly illustrated. For example, the computer device 200 may be implemented to include at least some of the input/output devices 250 described above, or may further include other components such as a transceiver and a database.

FIG. 3 is a diagram illustrating an example of a video content generation system according to an embodiment of the present invention. FIG. 3 shows a content creation server 300, a plurality of users 310, and a content editing tool 320.

The content creation server 300 may be implemented by at least one computer device 200 and may provide the content editing tool 320 to the plurality of users 310, so that each of the users 310 can use the content editing tool 320 to create video content by dubbing speech synthesis onto images.

Here, "images" may include a plurality of individual images, a bundle of images, or a bundle of images together with at least one individual image. A bundle of images may consist of the pages contained in a single file, such as a PDF file, converted into images.

The plurality of users 310 may be provided with the content editing tool 320 by the content creation server 300 and create video content from images. Each of the users 310 may in practice be a physical electronic device that accesses the content creation server 300 over the network 170 and is provided with the content editing tool 320. Each of these physical electronic devices may also be implemented by the computer device 200 described above with reference to FIG. 2.

The content editing tool 320 may be provided to the plurality of users 310 in a web-based or app-based manner. In the web-based manner, the users 310 visit a web page that is provided by the content creation server 300 and implements the functions of the content editing tool 320, and are provided with the functions for creating video content through that web page. In the app-based manner, the users 310 access the content creation server 300 through an application installed and run on each of their physical electronic devices and are provided with the functions for creating video content. Depending on the embodiment, each physical electronic device of the users 310 may itself process video content creation through an application that includes the functions for creating video content.

In an embodiment, the content creation server 300 may display, on the content editing tool 320 and along a timeline, thumbnails of the images uploaded by the user through the content editing tool 320. If the user uploads a file consisting of a plurality of pages, the content creation server 300 may convert the pages into images and display thumbnails of the resulting page images on the content editing tool 320 along the timeline.
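Turning a mixed upload (individual images and multi-page files) into one ordered sequence of timeline frames might look like this sketch. Rasterizing PDF pages needs a rendering library and is deliberately left out; the `Image` and `Document` types and the `#pageN` naming are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    name: str  # an individually uploaded image file


@dataclass
class Document:
    name: str        # e.g. a PDF whose pages will each become an image
    page_count: int


def expand_uploads(uploads: List[Union[Image, Document]]) -> List[str]:
    """Flatten uploads into frame identifiers, one per future thumbnail, in upload order."""
    frames: List[str] = []
    for item in uploads:
        if isinstance(item, Image):
            frames.append(item.name)
        else:
            # Each page of a multi-page file becomes its own image on the timeline.
            frames.extend(f"{item.name}#page{n}" for n in range(1, item.page_count + 1))
    return frames
```

Flattening first means every later step (thumbnails, running times, reordering) can treat a PDF page and a standalone image identically.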

The content editing tool 320 may provide a function that allows the user to adjust the order of the images on the timeline. Users can use this function to determine the order of the images they have uploaded. The order of the images on the timeline may correspond to the order in which the images appear in the final video content.

The content editing tool 320 may also provide a function that allows the user to delete any image on the timeline. In other words, users can use this function to delete images they do not need from among the images they have uploaded.

The content editing tool 320 may also provide a function that allows the user to adjust the time (or section) that each image occupies on the timeline. The adjusted time may correspond to the time (or section) during which the image appears in the final video content. For example, the horizontal (or vertical) length of each thumbnail displayed on the content editing tool 320 may correspond to the time (or section) the image occupies on the timeline. As an example, the content editing tool 320 may initially display each thumbnail at a length corresponding to a time (or section) of 4 seconds. The content editing tool 320 may then provide a function that lets the user lengthen or shorten a thumbnail by clicking or touching its left and/or right end and dragging. In this case, the time the image occupies on the timeline increases or decreases according to the new length of the thumbnail.

The content editing tool 320 may also provide a function that allows the user to select a desired time point or section on the timeline, together with a user interface for associating any text the user wants with the selected time point or section. When text is associated with the selected time point or section, the content creation server 300 automatically converts the associated text into speech and adds the converted speech at the selected time point or section, so that users can easily dub speech with the content they want onto the images.

FIGS. 4 to 19 are diagrams illustrating example screens of a content editing tool according to an embodiment of the present invention.

FIG. 4 shows a first example screen 400 of the content editing tool 320 described with reference to FIG. 3. The configuration of the content editing tool 320 shown here is only one embodiment and may vary from embodiment to embodiment.

The user may access the content editing tool 320 through his or her electronic device, and the content editing tool 320 may provide a function 410 for uploading the user's images. The first example screen 400 of FIG. 4 illustrates uploading a video or a PDF file. The content editing tool 320 may also provide a function for uploading any of the following:

- a plurality of individual images;
- a single file containing a plurality of images; or
- a combination of such a file and individual images.

The uploaded images may include images stored in the local storage of the electronic device that the user uses to access the content editing tool 320. Depending on the embodiment, the uploaded images may instead be images located on the web rather than in the local storage of the electronic device.

In addition, the content editing tool 320 may provide a function 420 for adding dubbing to an image. As an example, the function 420 may include a voice selection function 421 and a text input function 422. The voice selection function 421 may be used to select one of various predefined voice types. The text input function 422 may be used to receive the text from which text-to-speech (TTS) audio is generated.

For example, the user may select the voice type "Voice 1" in the voice selection function 421 and enter the text "Hello" in the text input function 422. The user may then select the preview button 423 or the add-dubbing button 424, for example by clicking in a PC environment or by touching in a touch screen environment. The entered text "Hello" and the identifier of the selected voice type "Voice 1" may then be transmitted to the content creation server 300 through the content editing tool 320. The content creation server 300 may synthesize speech for the text "Hello" using the voice type "Voice 1" and deliver the synthesized speech to the user's electronic device through the content editing tool 320. If the preview button 423 was selected, the electronic device may play the synthesized speech through its speaker. If the add-dubbing button 424 was selected, the synthesized speech may be added to the timeline in association with the images uploaded through the function 410.

More specifically, the content editing tool 320 may include a timeline display function 440 that visually represents the timeline of the video content to be generated. Where on the timeline the synthesized speech is added is described in more detail below.
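The add-dubbing flow can be sketched as follows. All class and function names are hypothetical. `fake_synthesize` only stands in for the content creation server 300: a real server would return audio rather than just a duration, and the 80 ms-per-character rate is an arbitrary assumption. Placing the new clip at the time indicator follows the later examples of FIGS. 11 and 12.

```python
from dataclasses import dataclass, field


@dataclass
class DubbingRequest:
    text: str
    voice_id: str  # identifier of the selected voice type, e.g. "voice1"


@dataclass
class SpeechClip:
    text: str
    voice_id: str
    start_ms: int
    duration_ms: int


@dataclass
class Timeline:
    time_indicator_ms: int = 0
    speech_clips: list = field(default_factory=list)


def fake_synthesize(req: DubbingRequest) -> int:
    """Hypothetical stand-in for the server: returns the audio duration in ms.

    Assumes roughly 80 ms of speech per character, purely for illustration.
    """
    return 80 * len(req.text)


def on_add_dubbing(timeline: Timeline, req: DubbingRequest, synthesize=fake_synthesize) -> SpeechClip:
    """Send text + voice id for synthesis, then place the result at the time indicator."""
    duration_ms = synthesize(req)
    clip = SpeechClip(req.text, req.voice_id, timeline.time_indicator_ms, duration_ms)
    timeline.speech_clips.append(clip)
    return clip
```

Passing `synthesize` as a parameter keeps the timeline logic independent of how, or where, speech is actually generated.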

Depending on the embodiment, the voice selection function 421 may let the user choose only among voice types the user has registered as favorites. In this case, the user may be given a user interface for registering particular voice types, out of all available voice types, as favorites. For example, when the user selects "View all" in the add-dubbing function 420, a user interface displaying all voice types may be provided. Through this interface, the user may register at least one desired voice type as a favorite. The voice selection function 421 may then let the user choose one of the voices registered as favorites.
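Restricting the voice selection function to favorites might look like the sketch below. The voice catalogue and function names are hypothetical. Listing favorites in catalogue order, rather than registration order, is an arbitrary design choice that is not stated in the source.

```python
ALL_VOICE_TYPES = ["voice1", "voice2", "voice3", "voice4"]  # hypothetical catalogue ("View all")


def register_favorite(favorites: list, voice_id: str) -> None:
    """Register a voice type from the full catalogue as a favorite (no duplicates)."""
    if voice_id not in ALL_VOICE_TYPES:
        raise ValueError(f"unknown voice type: {voice_id}")
    if voice_id not in favorites:
        favorites.append(voice_id)


def selectable_voices(favorites: list) -> list:
    """Voice types offered by the voice selection function: favorites only, in catalogue order."""
    return [v for v in ALL_VOICE_TYPES if v in favorites]
```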

In addition, the content editing tool 320 may provide a sound effect adding function 430 for adding pre-produced sound effects to the timeline in association with the images. The sound effect adding function 430 may include functions for:

- displaying a list of pre-produced sound effects;
- previewing them; and
- adding a sound effect at a specific time on the timeline.

If necessary, the user may also add a desired sound effect from an external file or create one directly.

In addition, the content editing tool 320 may display a time indicator 450 indicating a specific time point on the timeline. FIG. 4 shows an example in which the time indicator 450 indicates the time point 00:00.00 by default.

In addition, the save button 460 shown in the content editing tool 320 of FIG. 4 may provide a function for saving the edits of the current project. The download button 470 may provide a function for generating the video content and downloading it to the user's electronic device.

FIG. 5 shows a second example screen 500 of the content editing tool 320. In this screen, images have been uploaded through the function 410 described with reference to FIG. 4. Some of the thumbnails of the uploaded images are displayed through the timeline display function 440. Each thumbnail is displayed so as to correspond to a preset time interval (4 seconds in the embodiment of FIG. 5). The user may scroll through the timeline and thumbnails by clicking and dragging within the area of the timeline display function 440. In a touch screen environment, the user may do so with a touch-and-drag or swipe gesture.

FIG. 6 shows a third example screen 600 of the content editing tool 320. Here, another part of the timeline display function 440 is displayed after the user clicks and drags within its area. The last thumbnail in the third example screen 600, thumbnail 10, shows that the user has uploaded 10 images. As already described, the 10 images may be uploaded in any of these forms:

- as individual images;
- as a single file containing pages that can be rendered as 10 images; or
- as a combination of a file containing pages that can be rendered as n images and m individual images, where n and m are natural numbers with n + m = 10.

A combination of two or more files and individual images may also be used.
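The n + m = 10 case can be illustrated by flattening a mixed upload, in upload order, into one image per timeline slot. This is a sketch only. `MultiPageFile`, `SingleImage`, and the `name#pageN` naming are hypothetical, and actually rendering PDF pages to images is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MultiPageFile:
    name: str
    page_count: int  # number of pages that can each be rendered as an image


@dataclass
class SingleImage:
    name: str


Upload = Union[MultiPageFile, SingleImage]


def expand_uploads(uploads: List[Upload]) -> List[str]:
    """Flatten uploads, preserving order, into one image (timeline slot) per page or file."""
    images: List[str] = []
    for item in uploads:
        if isinstance(item, MultiPageFile):
            images.extend(f"{item.name}#page{p}" for p in range(1, item.page_count + 1))
        else:
            images.append(item.name)
    return images
```

For example, a 7-page PDF (n = 7) plus three individual images (m = 3) yields 10 thumbnails.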

FIG. 7 shows a fourth example screen 700 of the content editing tool 320, in which the time interval of a thumbnail has been adjusted. In this screen, the horizontal length of each thumbnail in the timeline display function 440 may correspond to the time (or section) that the corresponding image occupies on the timeline. The user lengthens thumbnail 2 by clicking its right end and dragging it to the right. The time that the image corresponding to thumbnail 2 occupies on the timeline (hereinafter, the running time) increases accordingly.

While the user holds the click on the right end of thumbnail 2, the screen displays a user interface 710. This interface shows the time point on the timeline corresponding to the right end of thumbnail 2 (here, 9.9 seconds). The user can therefore adjust the length of thumbnail 2 based on the time displayed in the user interface 710. As thumbnail 2 becomes longer, the start times of the thumbnails after it (for example, thumbnails 3 to 10) change accordingly. The embodiment of FIG. 7 adjusts the length of thumbnail 2 to adjust the running time of its image, but the same applies to every thumbnail in the timeline display function 440.
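A minimal sketch of this "ripple" behaviour follows. It assumes that images sit back to back on one track and that times are integer milliseconds. The function names are hypothetical. Dragging an image's right end to an absolute time changes only that image's running time, and every later image's start time is recomputed from the running times.

```python
from itertools import accumulate


def start_times(running_times_ms):
    """Start of each image = end of the previous one (images are laid out back to back)."""
    ends = list(accumulate(running_times_ms))
    return [end - rt for end, rt in zip(ends, running_times_ms)]


def drag_right_edge_to(running_times_ms, index, new_end_ms):
    """Make image `index` end at `new_end_ms`; every later image shifts by the same amount."""
    start = start_times(running_times_ms)[index]
    if new_end_ms <= start:
        raise ValueError("the right end cannot be dragged past the left end")
    updated = list(running_times_ms)
    updated[index] = new_end_ms - start
    return updated
```

Starting from ten 4-second images, dragging thumbnail 2 (index 1) to 9.9 s gives it a 5.9 s running time, and thumbnail 3 then starts at 9.9 s. Dragging thumbnail 4 afterwards to end at 17 s, as in FIG. 8, leaves it 3.1 s long.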

FIG. 8 shows a fifth example screen 800 of the content editing tool 320, in which the time interval of thumbnail 4 has been reduced. The user shortens thumbnail 4 by clicking its right end and dragging it to the left. The running time of the image corresponding to thumbnail 4 decreases accordingly. While the user holds the click on the right end of thumbnail 4, the screen displays a user interface 810. This interface shows the time point on the timeline corresponding to the right end of thumbnail 4 (here, 17 seconds). As thumbnail 4 becomes shorter, the start times of the thumbnails after it (for example, thumbnails 5 to 10) change accordingly.

In the embodiments of FIGS. 7 and 8, the user clicks the right end of a thumbnail and drags it left or right. This lengthens or shortens the thumbnail and so increases or decreases the running time of the corresponding image. Depending on the embodiment, the content editing tool 320 may likewise let the user click the left end of a thumbnail and drag it left or right, with the same effect on the running time of the corresponding image.

FIG. 9 shows a sixth example screen 900 of the content editing tool 320, in which the order of the thumbnails has been changed. The content editing tool 320 may let the user change the order of the thumbnails by clicking and dragging a specific thumbnail (touching and dragging in a touch screen environment). For example, in the fifth example screen 800, the user may swap thumbnail 1 and thumbnail 2 by clicking thumbnail 1 and dragging it to the right. The sixth example screen 900 shows thumbnail 1 and thumbnail 2 in swapped order.
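Reordering can be sketched as moving one `(name, running_time_ms)` entry within a list and then recomputing start times. The data layout and function names are assumptions made for illustration. The point shown is that an image keeps its own running time when it moves, and only the start times change.

```python
from itertools import accumulate


def move_thumbnail(clips, from_index, to_index):
    """Move one (name, running_time_ms) entry; its running time travels with it."""
    updated = list(clips)
    item = updated.pop(from_index)
    updated.insert(to_index, item)
    return updated


def start_times(clips):
    """Recompute each image's start time from the running times of the images before it."""
    ends = list(accumulate(rt for _, rt in clips))
    return [end - rt for end, (_, rt) in zip(ends, clips)]
```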

FIG. 10 shows a seventh example screen 1000 of the content editing tool 320, in which a specific thumbnail has been deleted. The content editing tool 320 may let the user select a specific thumbnail and then delete it. For example, when the user moves the mouse over a thumbnail (a mouse-over event), a user interface for deleting that thumbnail may be displayed, and the user may delete the thumbnail through it. Thumbnails may also be deleted in other ways. For example, the user may select a thumbnail by clicking it and then press the "Del" key on the keyboard to delete it.
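The keyboard path for deletion can be sketched as follows. The key string `"Delete"` mirrors the value browsers report for the Del key, but here it is simply an assumed input. The function names are hypothetical. After deletion, the remaining images close the gap, so later start times move earlier.

```python
def delete_thumbnail(clips, index):
    """Remove one image; the images after it move up to fill the gap."""
    return clips[:index] + clips[index + 1:]


def on_key_down(clips, selected_index, key):
    """Delete the selected thumbnail when the Delete key is pressed; ignore other keys."""
    if key == "Delete" and selected_index is not None:
        return delete_thumbnail(clips, selected_index)
    return clips
```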

FIGS. 11 and 12 show an eighth example screen 1100 and a ninth example screen 1200 of the content editing tool 320, illustrating the addition of dubbing. As already described, the time indicator 450 may indicate a specific time point on the timeline. For example, the user may move the time indicator 450 by dragging it or by clicking a desired position on the timeline. In the eighth example screen 1100, the time "00:06.00" displayed with the time indicator 450 is the time point on the timeline where the indicator currently sits.

In the eighth example screen 1100, the text "Hello. I am AAA." has been entered through the text input function 422 of the add-dubbing function 420. When the user selects the add-dubbing button 424, a speech synthesis indicator 1210 may be displayed, as in the ninth example screen 1200. This indicator represents the first synthesized speech, corresponding to the text "Hello. I am AAA.", and appears in the area of the timeline display function 440 in association with the thumbnails. As already described, the first synthesized speech may be generated by the content creation server 300 and delivered to the content editing tool 320. The speech synthesis indicator 1210 may display at least part of the corresponding text (in the ninth example screen 1200, "Hello. I"). It may also display the identifier of the voice type used to generate the first synthesized speech, for example the identifier ① 1220 of the voice type "Voice 1".

The length of the speech synthesis indicator 1210 may correspond to the duration of the first synthesized speech, and the amount of text that can be displayed varies with the indicator's length. The time indicator 450 reads "00:06.00" in the eighth example screen 1100 and "00:09.56" in the ninth example screen 1200. In other words, the speech synthesis indicator 1210 for the first synthesized speech spans 3.56 seconds (00:09.56 - 00:06.00 = 00:03.56).
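The subtraction above can be reproduced by parsing the time indicator's `MM:SS.cc` display format into whole hundredths of a second, which keeps the arithmetic exact. The function names are hypothetical, and the two-digit field widths are assumed from the examples in the text.

```python
import re

_TIMESTAMP = re.compile(r"^(\d{2}):(\d{2})\.(\d{2})$")  # assumed MM:SS.cc display format


def parse_timestamp(ts: str) -> int:
    """Parse a time-indicator reading such as "00:09.56" into hundredths of a second."""
    m = _TIMESTAMP.match(ts)
    if m is None:
        raise ValueError(f"bad timestamp: {ts!r}")
    minutes, seconds, hundredths = map(int, m.groups())
    return (minutes * 60 + seconds) * 100 + hundredths


def span_hundredths(start: str, end: str) -> int:
    """Length of an indicator, in hundredths of a second, from its start and end readings."""
    return parse_timestamp(end) - parse_timestamp(start)
```

Here `span_hundredths("00:06.00", "00:09.56")` gives 356, that is, 3.56 seconds.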

When the user selects the preview button 423 in the eighth example screen 1100, the first synthesized speech corresponding to the text "Hello. I am AAA." may be played through the speaker of the user's electronic device. In other words, the electronic device may output the first synthesized speech through its speaker under the control of the content editing tool 320.

FIG. 13 shows a tenth example screen 1300 of the content editing tool 320. When the user generates an input such as a mouse-over on the speech synthesis indicator 1210, speech synthesis information 1310 is displayed at the position of the mouse pointer. In a touch screen environment, it is displayed at the position of a touch held on the speech synthesis indicator 1210. The speech synthesis information 1310 may include:

- the voice type used to generate the synthesized speech (Voice 1);
- the duration of the synthesized speech (3.56 seconds, 00:03.56); and
- the entered text ("Hello. I am AAA.").
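The hover information could be assembled as in this sketch. The field order and the `|` separator of the tooltip are hypothetical layout choices, not taken from the figure. Durations are kept in hundredths of a second, as in the time indicator, and formatted back into its `MM:SS.cc` form.

```python
def format_timestamp(hundredths: int) -> str:
    """Format hundredths of a second in the time indicator's MM:SS.cc style."""
    minutes, rem = divmod(hundredths, 6000)
    seconds, cs = divmod(rem, 100)
    return f"{minutes:02d}:{seconds:02d}.{cs:02d}"


def speech_info(voice_name: str, duration_cs: int, text: str) -> str:
    """Hypothetical one-line tooltip: voice type, duration, and the entered text."""
    return f"{voice_name} | {duration_cs / 100:.2f} s ({format_timestamp(duration_cs)}) | {text}"
```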

FIG. 14 shows an eleventh example screen 1400 of the content editing tool 320, in which the user has shortened thumbnail 3 so that it ends at the time indicator 450. Thumbnail 3 now has a length of 1.56 seconds, so the running time of its image on the timeline of the video content becomes 1.56 seconds.

FIG. 15 shows a twelfth example screen 1500 of the content editing tool 320, in which the user changes the start time of the first synthesized speech. Compared with the eleventh example screen 1400, the speech synthesis indicator 1210 has moved. For example, the user may click the speech synthesis indicator 1210 in the content editing tool 320 and drag it to the left or right. The start time of the first synthesized speech changes according to the indicator's new position. The indicator may also be moved with the keyboard's arrow keys while it is selected (for example, clicked). This way of changing position may apply in common to the speech synthesis indicator 1210 and to each of the other indicators provided by the content editing tool 320.

Several indicators may also be selected as one group. For example, the user may click several indicators one after another while holding down the "Shift" key. The user may then move all the indicators in that group at once by dragging or by pressing the arrow keys.
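Shift-click grouping and group movement can be sketched as below. This is only an illustration under several assumptions:

- The names are hypothetical.
- The 100 ms arrow-key step is arbitrary.
- Clamping at 00:00.00 is a design choice the source does not specify.

The clamp applies one shared offset, so the indicators' relative spacing is preserved.

```python
from dataclasses import dataclass

ARROW_STEP_MS = 100  # hypothetical nudge per arrow-key press


@dataclass(eq=False)  # compare indicators by identity, not by field values
class Indicator:
    name: str
    start_ms: int


class Selection:
    def __init__(self) -> None:
        self.items: list = []

    def click(self, indicator: Indicator, shift: bool = False) -> None:
        """A plain click selects one indicator; Shift+click adds it to the group."""
        if not shift:
            self.items = [indicator]
        elif indicator not in self.items:
            self.items.append(indicator)

    def move_by(self, delta_ms: int) -> None:
        """Move every selected indicator by the same offset, never before 00:00.00."""
        if not self.items:
            return
        # Clamp the shared offset so the earliest indicator stops at 0.
        delta_ms = max(delta_ms, -min(i.start_ms for i in self.items))
        for indicator in self.items:
            indicator.start_ms += delta_ms

    def on_arrow_key(self, key: str) -> None:
        if key == "ArrowLeft":
            self.move_by(-ARROW_STEP_MS)
        elif key == "ArrowRight":
            self.move_by(ARROW_STEP_MS)
```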

FIGS. 16 and 17 show a thirteenth example screen 1600 and a fourteenth example screen 1700 of the content editing tool 320, illustrating the addition of further dubbing.

In the thirteenth example screen 1600, the user first moves the time indicator 450 to "00:05.78". The user then selects the voice type "Voice 2" through the voice selection function 421 and enters the text "Nice to meet you." through the text input function 422. When the user selects the add-dubbing button 424, a speech synthesis indicator 1710 may be displayed, as in the fourteenth example screen 1700. This indicator represents the second synthesized speech, corresponding to the text "Nice to meet you.", and appears in the area of the timeline display function 440 in association with the thumbnails. As already described, the speech synthesis indicator 1710 may display at least part of the corresponding text (in the fourteenth example screen 1700, "Nice to meet"). It may also display the identifier of the voice type used to generate the second synthesized speech, for example the identifier ② 1720 of the voice type "Voice 2".

The length of the speech synthesis indicator 1710 may correspond to the duration of the second synthesized speech, and the amount of text that can be displayed varies with the indicator's length. The time indicator 450 reads "00:06.00" in the thirteenth example screen 1600 and "00:08.24" in the fourteenth example screen 1700. In other words, the speech synthesis indicator 1710 for the second synthesized speech spans 2.24 seconds (00:08.24 - 00:06.00 = 00:02.24).

When the user selects the preview button 423 in the thirteenth example screen 1600, the second synthesized speech corresponding to the text "Nice to meet you." may be played through the speaker of the user's electronic device. In other words, the electronic device may output the second synthesized speech through its speaker under the control of the content editing tool 320.

FIG. 18 shows a fifteenth example screen 1800 of the content editing tool 320, illustrating the addition of a sound effect. In this screen, the user selects sound effect 2 through the sound effect adding function 430, for example by clicking the plus button in the dotted box 1810. An indicator 1820 for sound effect 2 is then added, starting at the time point currently indicated by the time indicator 450. As shown in the dotted box 1810, the indicator 1820 may be 2.46 seconds long. The user may likewise move the indicator 1820 to another time point by clicking and dragging it.

In the embodiments above, information for generating the video content is arranged along the timeline in three channels in total: one for thumbnails, one for synthesized speech, and one for sound effects. Depending on the embodiment, however, two or more channels may be used for synthesized speech, for sound effects, or for both.

FIG. 19 shows a sixteenth example screen 1900 of the content editing tool 320, in which two or more channels are used for synthesized speech. In this screen, the two speech synthesis indicators 1210 and 1710 are displayed partially overlapping. This means the two synthesized speech clips may play simultaneously over at least part of the timeline. The embodiment of FIG. 19 uses two channels for synthesized speech, but three or more may be used. Likewise, two or more channels may be used for sound effects.
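The source does not say how overlapping clips are distributed across channels. One plausible approach is sketched below: greedy interval partitioning, which is a swapped-in standard technique and not the described method. Each clip goes on the lowest-numbered channel that is free at its start time, and a new channel opens only when every existing one is busy. This uses the minimum possible number of channels. The clip tuples and function name are hypothetical.

```python
import heapq


def assign_channels(clips):
    """Greedy interval partitioning of (name, start_ms, duration_ms) clips.

    Returns {name: channel_index}; overlapping clips never share a channel.
    """
    assignment = {}
    busy = []  # min-heap of (end_ms, channel) for channels currently playing a clip
    free = []  # min-heap of channel indices that have become free again
    next_channel = 0
    for name, start, duration in sorted(clips, key=lambda c: c[1]):
        # Release every channel whose clip has finished by this clip's start.
        while busy and busy[0][0] <= start:
            _, ch = heapq.heappop(busy)
            heapq.heappush(free, ch)
        if free:
            ch = heapq.heappop(free)
        else:
            ch = next_channel
            next_channel += 1
        assignment[name] = ch
        heapq.heappush(busy, (start + duration, ch))
    return assignment
```

Two partially overlapping speech clips, as in FIG. 19, end up on channels 0 and 1. A later clip that starts after both have finished reuses channel 0.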

FIG. 20 is a flowchart illustrating an example of a method for generating video content according to an embodiment of the present invention. The method may be performed by the computer device 200, which provides a content editing support service through the content editing tool 320. The processor 220 of the computer device 200 may execute control instructions according to the code of an operating system included in the memory 210 or the code of at least one computer program. According to the control instructions provided by the code stored in the computer device 200, the processor 220 may control the computer device 200 to perform steps 2010 to 2090 of the method of FIG. 20.

In step 2010, the computer device 200 may extract snapshots of the images uploaded through the content editing tool. As already described, the images may be uploaded as individual images, as a single file containing a plurality of images, or as a combination of a file and individual images. In a particular embodiment, the images may be uploaded as a file containing a plurality of pages that can be rendered as images. For example, when a PDF file is uploaded, the computer device 200 may extract images from the PDF file and store them as a plurality of image files. It may then extract a snapshot of each of those image files.

In step 2020, the computer device 200 may display the extracted snapshots along the timeline through the content editing tool. The length of each displayed snapshot may be proportional to its running time, that is, the time that the corresponding image occupies on the timeline. The computer device 200 may initially display the extracted snapshots with lengths proportional to a default running time. FIG. 5, described above, shows snapshots displayed with lengths proportional to a default running time of 4 seconds.

In step 2030, the computer device 200 may provide a function for changing the order of the displayed snapshots. FIGS. 8 and 9 described an example of swapping the positions of thumbnail 1 and thumbnail 2. Depending on the embodiment, the computer device 200 may also provide a function for deleting a specific thumbnail.
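Reordering and deletion can be modeled as list operations followed by recomputing each snapshot's start time. The function names below are hypothetical, and the sketch assumes snapshots are laid end to end with no gaps between them.

```python
def move_snapshot(order: list[str], src: int, dst: int) -> list[str]:
    """Return a copy of `order` with the snapshot at index `src` moved to `dst`."""
    new_order = list(order)
    item = new_order.pop(src)
    new_order.insert(dst, item)
    return new_order


def delete_snapshot(order: list[str], index: int) -> list[str]:
    """Return a copy of `order` without the snapshot at `index`."""
    return order[:index] + order[index + 1:]


def start_times(order: list[str], running_times: dict[str, float]) -> dict[str, float]:
    """Recompute each snapshot's start time after the order changes."""
    starts: dict[str, float] = {}
    t = 0.0
    for name in order:
        starts[name] = t
        t += running_times[name]
    return starts
```

For example, moving thumbnail 1 behind thumbnail 2 with `move_snapshot(["t1", "t2", "t3"], 0, 1)` yields `["t2", "t1", "t3"]`, and `start_times` then shifts thumbnail 1 to begin where thumbnail 2 ends.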

In step 2040, the computer device 200 may provide, through the content editing tool, a length adjustment function for adjusting the length of the displayed snapshots. For example, the computer device 200 may provide a function that increases or decreases the length of a first snapshot among the displayed snapshots in response to the user's touch-and-drag or click-and-drag on a preset left or right area of the first snapshot. In addition, while the user's touch or click on the left or right area of the first snapshot is held, the computer device 200 may display the time point on the timeline that corresponds to the left or right end of the first snapshot. FIGS. 7 and 8 described an example in which the length of a thumbnail is increased or decreased and the corresponding time point on the timeline is displayed at the right end of the snapshot.

In step 2050, the computer device 200 may adjust the running time of the snapshot whose length was adjusted through the length adjustment function, according to the adjusted length. For example, the computer device 200 may increase or decrease the running time, that is, the time occupied on the timeline by the image corresponding to the length-adjusted snapshot, in proportion to the amount by which the length of the snapshot was adjusted.
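Steps 2040 and 2050 together amount to converting a horizontal drag distance into a change in running time, at the same scale used for display. The sketch below assumes a 25 px/s scale, a 0.5-second minimum running time, and contiguous snapshots; none of these values is given in the source, and `resize_by_drag` and `edge_time` are hypothetical names. `edge_time` returns the timeline time point that could be shown beside the held edge, as in FIGS. 7 and 8.

```python
PIXELS_PER_SECOND = 25.0  # assumed timeline scale
MIN_RUNNING_TIME = 0.5    # assumed lower bound so a snapshot never vanishes


def resize_by_drag(running_time: float, drag_dx_px: float, edge: str) -> float:
    """Return the new running time after dragging the 'left' or 'right' edge.

    Assumes snapshots stay contiguous, so dragging either edge changes only
    the running time. drag_dx_px > 0 means the pointer moved to the right.
    """
    if edge not in ("left", "right"):
        raise ValueError("edge must be 'left' or 'right'")
    # Right edge moving right, or left edge moving left, lengthens the snapshot.
    delta_px = drag_dx_px if edge == "right" else -drag_dx_px
    new_time = running_time + delta_px / PIXELS_PER_SECOND
    return max(MIN_RUNNING_TIME, new_time)


def edge_time(start: float, running_time: float, edge: str) -> float:
    """Timeline time point to display next to the edge while it is held."""
    return start if edge == "left" else start + running_time
```

Dragging the right edge of a 4-second snapshot 50 px to the right gives `resize_by_drag(4.0, 50, "right") == 6.0`: the running time grows by 50 percent, exactly as the displayed width does.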

In step 2060, the computer device 200 may generate synthesized speech for text entered through the content editing tool and add it at a selected time point on the timeline. The computer device 200 may generate the synthesized speech for the text according to a voice type selected through the content editing tool. Multiple voice types, differing by age and gender, language (Korean, English, Chinese, Japanese, Spanish, etc.), and emotion (joy, sadness, etc.), may be prepared in advance and offered to the user through the content editing tool, and the user may select in the content editing tool the specific voice type to be used for speech synthesis. The computer device 200 may also add the generated synthesized speech at a specific time point on the timeline, selected by moving a time indicator that marks a specific time point on the timeline. FIGS. 11 and 12, and FIGS. 16 and 17, described examples of adding synthesized speech at a time point selected through the time indicator 450.

Depending on the embodiment, when the running time of first synthesized speech to be added to the timeline at least partially overlaps the running time of second synthesized speech already added to the timeline, the computer device 200 may add the first synthesized speech to the timeline on a voice channel different from that of the second synthesized speech. In other words, dubbing may be performed so that two or more pieces of synthesized speech are output simultaneously in the generated video content. FIG. 19 described an example in which two pieces of synthesized speech are added to the timeline on different channels.
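One simple way to realize this multi-channel behavior is first-fit interval assignment: place each new clip on the lowest-numbered voice channel where it overlaps no existing clip, and open a new channel otherwise. The source only requires that overlapping clips go on different channels; the first-fit strategy, the half-open interval convention, and the `Clip` and `assign_channel` names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical record for one piece of synthesized speech on the timeline."""
    text: str
    start: float     # seconds on the timeline
    duration: float  # output length of the synthesized speech, in seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: Clip, b: Clip) -> bool:
    # Half-open intervals [start, end): clips that merely touch do not overlap.
    return a.start < b.end and b.start < a.end


def assign_channel(channels: list[list[Clip]], clip: Clip) -> int:
    """Append `clip` to the first channel where it overlaps nothing; return its index."""
    for index, channel in enumerate(channels):
        if not any(overlaps(clip, other) for other in channel):
            channel.append(clip)
            return index
    # Every existing channel has a conflict, so open a new one.
    channels.append([clip])
    return len(channels) - 1
```

Under this convention, a clip that starts exactly when another ends can share its channel, which keeps the number of simultaneous channels small.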

The computer device 200 may also display, through the content editing tool, an indicator for the synthesized speech added at the selected time point on the timeline. Depending on the embodiment, at least part of the text may be displayed in the indicator, and the length of the indicator may be proportional to the length of the synthesized speech, that is, the time during which the synthesized speech is output.
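A sketch of how an indicator's width and label might be derived from the synthesized speech: the width scales with the output duration, and the text is truncated to what fits. The 25 px/s scale, the fixed 7 px average character width, and the `indicator` function are assumptions, not details from the source.

```python
PIXELS_PER_SECOND = 25.0  # assumed timeline scale, same as for snapshots
AVG_CHAR_WIDTH_PX = 7.0   # assumed average glyph width of the label font


def indicator(text: str, duration: float) -> tuple[float, str]:
    """Return (width_px, label) for a synthesized-speech indicator."""
    # Width is proportional to how long the synthesized speech plays.
    width = duration * PIXELS_PER_SECOND
    max_chars = int(width // AVG_CHAR_WIDTH_PX)
    if len(text) <= max_chars:
        label = text
    elif max_chars >= 1:
        # Show as much of the text as fits, ending with an ellipsis.
        label = text[: max_chars - 1].rstrip() + "…"
    else:
        label = ""  # too narrow to show any text
    return width, label
```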

In addition, based on a user input on the indicator, the computer device 200 may output speech synthesis information including at least one of information on the voice type used to generate the synthesized speech, information on the length of the synthesized speech, and the text. The speech synthesis information may be displayed in association with the location at which the user input on the indicator occurs. FIG. 13 described an example of displaying the speech synthesis information 1310 for the speech synthesis indicator 1210.

In step 2070, the computer device 200 may move the position on the timeline of synthesized speech that has been added to the timeline, based on the user's input. FIGS. 14 and 15 described examples in which the position of synthesized speech is moved by a user input such as a click-and-drag or touch-and-drag.
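Moving a clip in step 2070 can be treated as shifting its start time by the drag distance converted to seconds. The clamping to the timeline bounds, the 25 px/s default scale, and the `move_clip` name are assumptions; the source does not say how drags beyond the ends of the timeline are handled.

```python
def move_clip(start: float, duration: float, drag_dx_px: float,
              timeline_end: float, pixels_per_second: float = 25.0) -> float:
    """Return the new start time of a synthesized-speech clip after a drag."""
    new_start = start + drag_dx_px / pixels_per_second
    # Assumed policy: keep the whole clip within [0, timeline_end].
    latest_start = max(0.0, timeline_end - duration)
    return min(max(0.0, new_start), latest_start)
```

After a move, the clip may newly overlap other synthesized speech, so the channel assignment described above for overlapping clips would likely need to be re-evaluated.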

In step 2080, the computer device 200 may receive the user's selection of one of a plurality of sound effects provided through the content editing tool. FIG. 18 described how a plurality of sound effects may be offered to the user through the sound effect addition function 430 and how the user may select one of them.

In step 2090, the computer device 200 may add the selected sound effect at the time point selected on the timeline in the content editing tool. FIG. 18 described an example of adding sound effect 2 at a time point selected through the time indicator 450.

Depending on the embodiment, at least some of steps 2010 to 2090 may be performed in parallel. For example, steps 2040 and 2050 may be triggered by a user input for adjusting length, steps 2060 and 2070 by a user input for adding synthesized speech, and steps 2080 and 2090 by a user input for adding a sound effect. Accordingly, the order of steps 2040 to 2090 may change according to the user's inputs.

Later, when the user requests generation of the video content, the computer device 200 may normalize the images to a size suited to the video content and then generate a video. Depending on the embodiment, the computer device 200 may insert a watermark and/or captions into the video content. The computer device 200 may then insert the synthesized speech and/or sound effects into the video according to the timeline to produce the final video content.
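The source does not say how images are normalized to the video size. One common approach, assumed here, is letterboxing: scale each image of size w x h uniformly by s = min(W/w, H/h) so that it fits inside a W x H frame, then center it. `fit_to_frame` is a hypothetical helper that computes only the geometry; the actual resizing and encoding would be done by an image or video library.

```python
def fit_to_frame(img_w: int, img_h: int,
                 frame_w: int, frame_h: int) -> tuple[int, int, int, int]:
    """Letterbox an image into a frame, preserving its aspect ratio.

    Returns (scaled_w, scaled_h, offset_x, offset_y), where the offsets
    center the scaled image inside the frame.
    """
    # Largest uniform scale at which the image still fits in both dimensions.
    scale = min(frame_w / img_w, frame_h / img_h)
    scaled_w = round(img_w * scale)
    scaled_h = round(img_h * scale)
    # Split leftover space evenly to center the image (bars on the sides or top/bottom).
    return scaled_w, scaled_h, (frame_w - scaled_w) // 2, (frame_h - scaled_h) // 2
```

A square 1000 x 1000 slide placed in a 1920 x 1080 frame becomes 1080 x 1080 with 420-pixel bars on the left and right.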

As described above, according to embodiments of the present invention, synthesized speech desired by the user can be generated in real time for a plurality of images and dubbed at a playback start time of the user's choosing, and video content can be generated and provided from the plurality of images dubbed with the generated synthesized speech.

The systems or apparatuses described above may be implemented as hardware components or as a combination of hardware and software components. For example, the apparatuses and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may store a computer-executable program persistently or store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or a combination of several; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and other media configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites or servers that supply or distribute various other software. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or the described components of a system, structure, apparatus, circuit, or the like are combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for generating video content in a computer device including at least one processor, the method comprising:
extracting, by the at least one processor, snapshots of images uploaded through a content editing tool;
displaying, by the at least one processor, the extracted snapshots along a timeline through the content editing tool;
providing, by the at least one processor, through the content editing tool, a length adjustment function for adjusting the length at which the displayed snapshots are displayed in the content editing tool;
adjusting, by the at least one processor, the running time of a snapshot whose length has been adjusted through the length adjustment function, according to the adjusted length; and
generating, by the at least one processor, synthesized speech for text input through the content editing tool, and adding the synthesized speech at a selected time point of the timeline.

The method of claim 1,
wherein the length at which the displayed snapshots are displayed in the content editing tool is proportional to a running time, which is the time occupied on the timeline by the images corresponding to the displayed snapshots, and
wherein the displaying along the timeline comprises
displaying the extracted snapshots through the content editing tool with a length proportional to a default running time.

The method of claim 1,
wherein the providing of the length adjustment function comprises
providing a function of increasing or decreasing the length of a first snapshot among the displayed snapshots according to a user's touch-and-drag or click-and-drag on a left or right area preset for the first snapshot.

The method of claim 3,
wherein the providing of the length adjustment function comprises
displaying, while the user's touch or click on the left or right area of the first snapshot is maintained, the time point on the timeline corresponding to the left or right end of the first snapshot.

The method of claim 1,
wherein the adjusting of the running time according to the adjusted length comprises
increasing or decreasing the running time, which is the time occupied on the timeline by the image corresponding to the length-adjusted snapshot, in proportion to the degree to which the length was adjusted.

The method of claim 1,
wherein the generating of the synthesized speech and the adding of it at the selected time point of the timeline comprise
generating the synthesized speech for the text according to a voice type selected through the content editing tool.

The method of claim 1,
wherein the generating of the synthesized speech and the adding of it at the selected time point of the timeline comprise
adding the generated synthesized speech at a specific time point on the timeline selected by moving a time indicator that indicates a specific time point on the timeline.

The method of claim 1, further comprising:
moving, by the at least one processor, the position on the timeline of the synthesized speech added to the timeline, based on a user input.

The method of claim 1, further comprising:
receiving, by the at least one processor, a selection of one of a plurality of sound effects provided through the content editing tool; and
adding, by the at least one processor, the selected sound effect at a time point selected on the timeline in the content editing tool.

The method of claim 1, further comprising:
providing, by the at least one processor, a function for changing the order of the displayed snapshots.

The method of claim 1,
wherein the images are uploaded in the form of a file including a plurality of pages that can be converted into images.

The method of claim 1,
wherein the generating of the synthesized speech and the adding of it at the selected time point of the timeline comprise,
when the running time of first synthesized speech to be added to the timeline at least partially overlaps the running time of second synthesized speech already added to the timeline, adding the first synthesized speech to the timeline on a voice channel different from that of the second synthesized speech.

The method of claim 1,
wherein the generating of the synthesized speech and the adding of it at the selected time point of the timeline comprise
displaying, through the content editing tool, an indicator for the synthesized speech added at the selected time point of the timeline.

The method of claim 13,
wherein at least a part of the text is displayed through the indicator.

The method of claim 13,
wherein the length of the indicator is proportional to the length of the synthesized speech.

The method of claim 13,
wherein the generating of the synthesized speech and the adding of it at the selected time point of the timeline comprise
displaying, based on a user input to the indicator, at least one of information on the voice type used to generate the synthesized speech, information on the length of the synthesized speech, and the text.

A computer program stored in a computer-readable recording medium to cause, in combination with a computer device, the computer device to execute the method of any one of claims 1 to 16.

A computer-readable recording medium on which a computer program for causing a computer device to execute the method of any one of claims 1 to 16 is recorded.

A computer device comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor:
extracts snapshots of images uploaded through a content editing tool,
displays the extracted snapshots along a timeline through the content editing tool,
provides, through the content editing tool, a length adjustment function for adjusting the length at which the displayed snapshots are displayed in the content editing tool,
adjusts the running time of a snapshot whose length has been adjusted through the length adjustment function, according to the adjusted length, and
generates synthesized speech for text input through the content editing tool and adds the synthesized speech at a selected time point of the timeline.

The computer device of claim 19,
wherein, to provide the length adjustment function, the at least one processor
provides a function of increasing or decreasing the length of a first snapshot among the displayed snapshots according to a user's touch-and-drag or click-and-drag on a left or right area preset for the first snapshot.